SSIS Remove Duplicate Rows Using Fuzzy Grouping (SSIS Transformations)
Source data may contain duplicate rows that need to be removed as part of a data cleansing task. Fuzzy Grouping is one of the Data Flow transformations in SSIS; it groups similar rows in the source stream. The transformation performs data cleaning by identifying rows that are likely to be duplicates.

Let's generate some duplicate records, as shown in the screenshot below:

Create a Data Flow task. Inside it, drag in an OLE DB Source component and enter the query shown above; this becomes the source data containing the duplicate records. Drag in the Fuzzy Grouping component and open its editor. Go to the Columns tab and check the checkbox against each column that you want to analyze for similarity. After that, go to the Advanced tab, where we can define the similarity threshold.

The important values in the above screenshot are _key_in and _key_out in our example as we ar