Merge Join Transformation in SSIS
Merge Join Transformation is one of the useful tool in SSIS to join two sources. The Merge Join transformation provides an output that is generated by joining two sorted datasets using a FULL, LEFT, or INNER join. Let us see how it works.
You can configure the Merge Join transformation in the following ways:
• Specify the join as FULL, LEFT, or INNER join.
• Specify the columns the join uses.
• Specify whether the transformation handles null values as equal to other nulls.
NOTE: If null values are not treated as equal values, the transformation handles null values like the SQL Server Database Engine does.
You can configure the Merge Join transformation in the following ways:
• Specify the join as FULL, LEFT, or INNER join.
• Specify the columns the join uses.
• Specify whether the transformation handles null values as equal to other nulls.
NOTE: If null values are not treated as equal values, the transformation handles null values like the SQL Server Database Engine does.
This transformation has two inputs and one output. It does not support an error output. Let’s discuss the same with an example. We will start out with a connection manager that is created for the Adventure works database
Now drag and drop a Data Flow Task from the toolbox. Now Double click on Data flow Task.
Now we will create an OLE DB Source called “Products”, in which we will select all of the products (see query below).
Double click on the Products OLE DB Source and the OLE DB Source Editor will appear. We select SQL Command from the Data Access Mode and will specify the query. This query will select the ProductID, Name, ProductNumber, and ProductSubcategoryID from the Product table.
Now Create an OLE DB Source called “Product Subcategory”, in which we will select all of the Product Categories .
Now Double click on the Product Subcategory OLE DB Source and the OLE DB Source Editor will appear. We select SQL Command from the Data Access Mode and will specify the query. This query will select the ProductSubcategoryID, and Name from the ProductCategory table.
Now Create an OLE DB Source called “Purchase Order Detail”, in which we will select all of the Details of the Purchase Orders.
Double click on the Purchase Order Detail OLE DB Source and the OLE DB Source Editor will appear. We select SQL Command from the Data Access Mode and will specify the query. This query will select the PurchaseOrderID, ProductID, and UnitPrice from the PurchaseOrderDetail table.
Now we will create two Sort components and join the pipeline from Products to one of the sort transformations and join the pipeline from Product Subcategory to the other sort transformation. Remember that both datasets that you are joining must be sorted the same before joining with a Merge Join.
Now Click on the Sort that we connected to the Products source and the Sort Transformation Editor will appear. We want to sort this dataset by ProductSubcategoryID, so click on the checkbox to the left of ProductSubcategoryID. This will put ProductSubcategoryID in the lower box. Now we can click OK when we are complete.
Click on the Sort that we connected to the Product Subcategory source and the Sort Transformation Editor will appear. We want to sort this dataset by ProductSubcategoryID, so click on the checkbox to the left of ProductSubcategoryID. This will put ProductSubcategoryID in the lower box. Now we can click OK when we are complete.
Now add a Merge Join transformation. This will allow us to join the Products and Product Subcategory sources together. Drag the pipeline arrow from the Products Sort to the Merge Join.
The Input Output Selection window will appear. Select Merge Join Left Input. This will mean that we are using the Products on the Left hand side of the join. If you are familiar with Left and Right joins in SQL this is a familiar concept. Choosing Left Input doesn’t mean we are necessarily doing an outer or inner join (we define that later), it just defines what side of the join this input will be used as.
Now drag the pipeline arrow from the Product Subcategory Sort to the Merge Join. This time it will most likely not prompt you for which side of the join you want to add this input, as we already selected Left Input for the previous input.
Now we will double click on the Merge Join and the Merge Join Transformation Editor will appear. You will notice that ProductSubcategoryID from both datasets has a line connected to each other. This means that this is the column that we are joining on. You will also notice that the Join Type can be “Inner Join”, “Left Outer Join” or “Full Outer Join”. In order to retain the other columns in the pipeline click on the checkboxes to the left of the column names. If any of the boxes to the left of the columns are not checked, it means that column will no longer be in the pipeline after the Merge Join. Notice that below there is a column called “Output Alias”. This column allows us to rename columns to new names. This allows us to rename “Name” from Products to “ProductName” and renam “Name” from ProductSubcategory to “CategoryName”. So after the Merge Join, this columns will now be known be these alias names. When completed click OK.
Now we will create two Sort components and join the pipeline from the Merge Join to one of the sorts and join the pipeline from the Purchase Order Detail source to the other sort.
Now Click on the Sort that we connected to the Merge Join output and the Sort Transformation Editor will appear. We want to sort this dataset by ProductID, so click on the checkbox to the left of ProductID. This will put ProductID in the lower box. Now we can click OK when we are complete.
Click on the Sort that we connected to the Purchase Order Detail source and the Sort Transformation Editor will appear. We want to sort this dataset by ProductID, so click on the checkbox to the left of ProductID. This will put ProductID in the lower box. Now we can click OK when we are complete.
Now we will add a Merge Join transformation. This will allow us to join the Results of the first Merge Join and the Purchase Order Detail source together. Drag the pipeline arrow from the sort transformation of the first Merge Join to the Merge Join. The Input Output Selection window will appear. Select Merge Join Left Input. Then drag the pipeline arrow from the Purchase Order Detail sort to the Merge Join.
Now we will double click on the Merge Join and the Merge Join Transformation Editor will appear. You will notice that ProductID from both datasets has a line connected to each other. This means that this is the column that we are joining on. You will also notice that the Join Type can be “Inner Join”, “Left Outer Join” or “Full Outer Join”. In order to retain the other columns in the pipeline click on the checkboxes to the left of the column names. If any of the boxes to the left of the columns are not checked, it means that column will no longer be in the pipeline after the Merge Join. Click OK hen completed.
Instead of using OLEDB as destination, I used Audit Transformation for testing purpose and used Data viewer to show output data produced by Merge join Transformation. After testing our package successfully we can delete Audit Transformation and Data Viewer and bring our destination and connect to Merge Join transformation for input columns for our destination.
Now Execute the package to see the Results.
Now we can output our joined dataset to any source we desire. I Hope you are now able to successfully join datasets using SSIS.
0 comments:
Post a Comment