Powered By Blogger

Tuesday, May 24, 2011

Aggregate Transformation

The Aggregate transformation applies aggregate functions, such as Average, to column values and copies the results to the transformation output. Besides aggregate functions, the transformation provides the GROUP BY clause, which you can use to specify groups to aggregate across.
You configure the Aggregate transformation at the transformation, output, and column levels.
  • At the transformation level, you configure the Aggregate transformation for performance by specifying the following values:
    • The number of groups that are expected to result from a Group by operation.
    • The number of distinct values that are expected to result from aCount distinct operation.
    • The percentage by which memory can be extended during the aggregation.
    The Aggregate transformation can also be configured to generate a warning instead of failing when the value of a divisor is zero.

  • At the output level, you configure the Aggregate transformation for performance by specifying the number of groups that are expected to result from a Group by operation. The Aggregate transformation supports multiple outputs, and each can be configured differently.

  • At the column level, you specify the following values:
    • The aggregation that the column performs.
    • The comparison options of the aggregation.
You can also configure the Aggregate transformation for performance by specifying these values:
  • The number of groups that are expected to result from a Group byoperation on the column.
  • The number of distinct values that are expected to result from a Count distinct operation on the column.
You can also identify columns as IsBig if a column contains large numeric values or numeric values with high precision.

The Aggregate transformation is asynchronous, which means that it does not consume and publish data row by row. Instead it consumes the whole rowset, performs its groupings and aggregations, and then publishes the results.
This transformation does not pass through any columns, but creates new columns in the data flow for the data it publishes. Only the input columns to which aggregate functions apply or the input columns the transformation uses for grouping are copied to the transformation output. For example, an Aggregate transformation input might have three columns: CountryRegion, City, and Population. The transformation groups by the CountryRegion column and applies the Sum function to the Population column. Therefore the output does not include the City column.

You can also add multiple outputs to the Aggregate transformation and direct each aggregation to a different output. For example, if the Aggregate transformation applies the Sum and the Average functions, each aggregation can be directed to a different output.

You can apply multiple aggregations to a single input column. For example, if you want the sum and average values for an input column named Sales, you can configure the transformation to apply both the Sum and Average functions to the Sales column.

The Aggregate transformation has one input and one or more outputs. It does not support an error output.
Operations

The Aggregate transformation supports the following operations.

OperationDescription
Group byDivides datasets into groups. Columns of any data type can be used for grouping. For more information, see GROUP BY (Transact-SQL

SumSums the values in a column. Only columns with numeric data

types can be summed. For more information, see SUM (Transact-SQL).
Average

Returns the average of the column values in a column. Only columns with numeric data types can be averaged. For more information, see AVG (Transact-SQL).
CountReturns the number of items in a group. For more information, seeCOUNT (Transact-SQL).
Count distinctReturns the number of unique nonnull values in a group. For more information, see Eliminating Duplicates with DISTINCT.
MinimumReturns the minimum value in a group. For more information, see MIN (Transact-SQL). In contrast to the Transact-SQL MIN function, this operation can be used only with numeric, date, and time data types.
MaximumReturns the maximum value in a group. For more information, see MAX (Transact-SQL). In contrast to the Transact-SQL MAX function, this operation can be used only with numeric, date, and time data types.
The Aggregate transformation handles null values in the same way as the SQL Server relational database engine. The behavior is defined in the SQL-92 standard. The following rules apply:
  • In a GROUP BY clause, nulls are treated like other column values. If the grouping column contains more than one null value, the null values are put into a single group.
  • In the COUNT (column name) and COUNT (DISTINCT column name) functions, nulls are ignored and the result excludes rows that contain null values in the named column.
  • In the COUNT (*) function, all rows are counted, including rows with null values.

No comments: