Tuesday, July 28, 2015

Aggregate Transformation in SSIS

What is the Aggregate Transformation?

The Aggregate transformation is used to perform aggregate operations/functions on groups in a dataset. 

  • The aggregate functions available are- Count, Count Distinct, Sum, Average, Minimum and Maximum. 
  • The Aggregate transformation has one input and one or more outputs.
  •  It does not support an error output. 

When would you use the Aggregate Transformation?

As a rough rule, you should use the Aggregate transformation only when the data source cannot efficiently support the Aggregation processes by itself. If you are reading in data from a relational source, usually it will make more sense to have the server aggregate the data in a query before passing it into SSIS. An exception to this may be if you are hitting a live system and cannot afford to (or are not allowed to) load the server with queries. If you were reading from a Flat File source then you would have to use the Aggregate Transformation as the File System doesn’t provide any means to perform data operations.
The Aggregate transformation supports the following operations.
Group By: Divides datasets into groups. Columns of any data type can be used for grouping.
Sum: Sums the values in a column. Only columns with numeric data types can be summed.
Average: Returns the average of the column values in a column. Only columns with numeric data types can be averaged.
Count: Returns the number of items in a group.
Count distinct: Returns the number of unique non null values in a group.
Minimum: Returns the minimum value in a group. This operation can be used only with numeric, date, and time data types.
Maximum: Returns the maximum value in a group. This operation can be used only with numeric, date, and time data types.
The Aggregate transformation handles null values in the same way as the SQL Server relational database engine.
  • In a GROUP BY clause, nulls are treated like other column values. If the grouping column contains more than one null value, the null values are put into a single group.
  • In the COUNT (column name) and COUNT (DISTINCT column name) functions, nulls are ignored and the result excludes rows that contain null values in the named column.
  • In the COUNT (*) function, all rows are counted, including rows with null values.
Now let me demonstrate how you can create an SSIS package with Aggregate   transformation

Go to START ==> Microsoft SQL Server 2008 ==>SQL Server Business Intelligence Development Studio to launch BIDS.

Then go to File menu==> New Project ==>Select “Business Intelligence Projects” in the left tree pane -> Select “Integration Services Projects” and name the project as you wish and click OK.

Here in this example 

we want to get the sum of the sales amount  for each Color and English product name  based on the Dimproduct and FactInternetsales tables  data from AdventureworkDW Database. 


We want to perform database equivalent of SUM(SALESAMOUNT) GROUP BY Color and EnglishproductName  operation.


Here, we have Dimproduct and FactInternetsales  tables are OLEDB Source.

Now Drag and Drop Aggregate Transformation As Show below.
Double-click the Aggregate transform to open the editor. Next in the lower pane we select   the Input Column, se


What is the Aggregate Transformation?

The Aggregate Transformation provides a means of carrying out some simple aggregations on data pushed through SSIS, similar to those found in SQL where using “Group By” clause. The available aggregations are:
  • Group by
  • Sum
  • Average
  • Count
  • Count distinct
  • Minimum
  • Maximum
Below is a snapshot of the output from the example package Data Flow 1, where the SalesOrderHeader table in AdventureWorks is grouped by OrderDate, the Count aggregation is applied (by selecting the “(*)” column in the column selector) and the TaxAmt field is both Summed and Averaged. Note the row counts going in and coming out of the transformation – because of the grouping much fewer rows come out of the transform than are pushed in.


b
Fig 1: The Aggregate Transformation and its output

How to cut the same dataset different ways

The Aggregate Transformation can support multiple outputs – this means you can read the data set into memory once, then cut it up as many ways as you like. By clicking the Advanced button on the Aggregate tab of the component editor, a new grid is revealed. If you enter a new value in the “Aggregation Name” column, the column selector is enabled and you can create a new set of aggregations which will be delivered as a new output for the component, as demonstrated in Data Flow 2 of the example package.


Fig 2: The Aggregate Transformation with multiple outputs
Fig 2: The Aggregate Transformation with multiple outputs

Improving Performance in the Aggregate Transformation

The Aggregate Transformation is pretty quick as it runs in memory, but if you are shifting very large volumes of data through it and it is slowing down there are a few tweaks available. First is the Keysand KeyScale properties. These tell the component how many “Group By” distinct groups it should be prepared to handle. By default the value for KeyScale is “Unspecified”, but can be set to low (up to 500,000 keys), medium (up to 5m keys) or high (25m keys). If you are more certain of how many Keys you will be writing you can use the Keys property, which overrides KeyScale, and you can enter the amount of expected Keys. This can either be set per Aggregation output in the advanced editor grid, or globally using the Advanced tab of the editor. If you are using a CountDistinctaggregation you can set the CountDistinctScale and CountDistinctKeys properties which operate in the same way. Usually there is no need to adjust these properties.

0 comments:

Popular Posts

Recent Posts

Unordered List

Text Widget

Blog Archive