Thursday, April 21, 2011

Group similar text together using Fuzzy grouping component In SSIS.

Create Txt file DATA.txt


We can see that the meanings of the “SUHAS” and “suhas” are the same except for the case sensitivity. So we need to group and clean those two rows.

You can do this by referring to the below Steps:

·         Create a flat file named “DATA.txt”, and copy the above data into it.

·         Create a Flat File Connection under the Connection Manager panel. Name it “FuzzyGroupConn” and set the filename to the path of the file “DATA.txt”.

·         Drag a Flat File Source to the Data Flow panel and configure it to use the FuzzyGroupConn connection.

·         Drag a Data Conversion component to the panel, connect the Flat File Source to it and convert the data type of the FNAME column to DT_WSTR, and type the output alias "FNAMENEW".

·         Drag a Fuzzy Grouping component to the Data Flow panel. Connect the output of the Data Conversion to the Fuzzy Grouping. Go to the Fuzzy Grouping’s property editor, select the Column tab, choose "FNAMENEW" as input column. Then go to the “Advanced” tab, and set the Similarity threshold to .50.

·         Drag an OLE DB Destination to the Data Flow Panel, connect the Fuzzy Grouping component to it, right click the green connection line, click Data Viewers…, and click the “Add…” button to add a data viewer. The results look like this

·         You can find that the data is cleaned and grouped by similar title values.

Posted by - SUHAS R. KUDEKAR (MCTS - Microsoft Business Intelligence)
Learning Office 2010 + SharePoint 2010 + SQL Server 2008 R2\

No comments:

Post a Comment