I am sort of free and quite bored as well, so I thought of writing about the Informatica Aggregator transformation. It can be a little insane to get rid of boredom by bugging others 🙂 but I can't help it.
(Phaltu ka gyan – Unnecessary conversation): Apologies for being sarcastic.
Aggregation in the business IT world: Aggregating atomic data is a very common process used to obtain a desired data view or performance metrics for business and other stakeholders. From an ETL perspective, data aggregation is required to maintain data in aggregate fact tables and OLAP cubes.

Aggregation in Informatica: The Informatica Aggregator transformation performs calculations on groups of data. The Aggregator reads every input record and preserves the data in caches before applying the required aggregation function over the data set. Informatica allows you to use a variety of filter conditions while aggregating the data. Some of the aggregate functions available in Informatica are sum, avg, max, min, count, first, and many more. Let's discuss how the Informatica Aggregator transformation works.
(Phaltu ka gyan – Unnecessary conversation): I hope this unnecessary conversation helps you a bit in understanding why aggregation is used in the industry.
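The Aggregator itself is configured through the Designer GUI rather than code, but the group-and-aggregate logic it applies is equivalent to this minimal Python sketch (the city/amount columns are made up purely for illustration):

```python
from collections import defaultdict

# Hypothetical atomic sales rows: (city, amount)
rows = [
    ("Pune", 100.0),
    ("Delhi", 250.0),
    ("Pune", 175.0),
    ("Delhi", 50.0),
]

# Group by city and apply SUM -- what an Aggregator with a Group By
# port on city and an output expression SUM(amount) would produce.
totals = defaultdict(float)
for city, amount in rows:
    totals[city] += amount

print(dict(totals))  # {'Pune': 275.0, 'Delhi': 300.0}
```

In Informatica you would express the same thing by marking `city` as a Group By port and writing `SUM(amount)` in an output port expression.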
- What will the Informatica Aggregator transformation return if you do not configure a Group By port? Informatica will return the last input row read from the incoming data in the pipeline. When a Group By port is configured, Informatica returns one row for each value of the Group By port; when it is not configured, it returns just the last row read.
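A quick sketch of that difference (the rows are hypothetical; the Aggregator behaves the same way regardless of which ports you use):

```python
rows = [("Pune", 100.0), ("Delhi", 250.0), ("Pune", 175.0)]

# With no Group By port, the entire input is treated as one group, so
# the Aggregator emits a single row whose non-aggregated port values
# come from the LAST row read.
last_row = rows[-1]
print(last_row)  # ('Pune', 175.0)

# With a Group By port on city, one row is returned per distinct
# city value instead.
groups = {city for city, _ in rows}
print(sorted(groups))  # ['Delhi', 'Pune']
```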
- What happens when the Informatica Aggregator transformation is configured with "Sorted Input" but the input rows are not sorted? The Informatica session will fail if the input data is not sorted.
- Explain the two Informatica Aggregator transformation caches. An Aggregator transformation creates and maintains two caches – a Data Cache and an Index Cache. The Index Cache preserves the Group By port values, and the Data Cache preserves the rest, such as aggregate calculations, variable port values, and other input values.
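The actual cache layout is internal to Informatica, but the split of responsibilities can be sketched as a toy model (assuming a Group By on city with SUM and COUNT aggregates):

```python
# Toy model of the two Aggregator caches -- only illustrative,
# not Informatica's real on-disk/in-memory format.
index_cache = []   # Index Cache: distinct Group By key values
data_cache = {}    # Data Cache: running aggregate values per key

rows = [("Pune", 100.0), ("Delhi", 250.0), ("Pune", 175.0)]

for city, amount in rows:
    if city not in data_cache:
        index_cache.append(city)                  # new group key seen
        data_cache[city] = {"sum": 0.0, "count": 0}
    data_cache[city]["sum"] += amount             # running SUM(amount)
    data_cache[city]["count"] += 1                # running COUNT(*)

print(index_cache)  # ['Pune', 'Delhi']
print(data_cache)   # running SUM/COUNT per group key
```

Until the last input row arrives, every group must stay in these caches, which is why unsorted aggregation over large inputs is cache-hungry.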
Different ports in Aggregator Transformation
Below are the different ports that need to be configured in an Aggregator transformation.
1. Input Port. Data elements passed to an Aggregator transformation are known as input ports.
2. Output Port. Values that are needed in the pipeline after the Aggregator transformation are marked as output ports. Generally, Group By ports and aggregated values are also output ports.
3. Group By Port. This port tells the Aggregator transformation how to divide the input data into groups in order to perform the aggregation. For example, if you want to find the highest-spending customer(s) for every city and every month, (Date.month) and (Address.city) would be ideal Group By ports.
4. Variable Port. An optional port in the Aggregator transformation, but a pretty useful one if you intend to perform intermediate calculations before applying aggregation. For example, a variable port is defined to calculate commission in the snapshot below.
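The variable-port idea can also be sketched in Python (the salesperson names and the 5% commission rate are assumptions for illustration): a variable port computes the commission per row, and an output port then sums it per group.

```python
# Hypothetical order rows: (salesperson, sale_amount)
rows = [("Asha", 1000.0), ("Ravi", 2000.0), ("Asha", 500.0)]

COMMISSION_RATE = 0.05  # assumed rate, purely illustrative

totals = {}
for person, amount in rows:
    # Variable port: intermediate value computed for each input row
    v_commission = amount * COMMISSION_RATE
    # Output port: SUM(v_commission), grouped by salesperson
    totals[person] = totals.get(person, 0.0) + v_commission

print(totals)  # total commission per salesperson
```

In the transformation itself this would be a variable port with the expression `sale_amount * 0.05` and an output port `SUM(v_commission)`.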
How to create Aggregator Transformation?
1. Drag the Aggregator transformation to the design space.
2. Drag the ports to the Aggregator transformation.
3. Configure the Group By port, and set default values for your input data if you are expecting NULLs.
4. Add expressions to calculate aggregate values.
5. Configure the cache directory in the properties.
6. Configure the transformation to use sorted input if your input data is already sorted.
Performance Points to Consider for an Aggregator Transformation.
1. Sorting the data on the Group By ports improves performance significantly. It is always a good idea to sort the data before it is passed to the Aggregator transformation. Please note that the session will fail if it is configured to use sorted input but the input data is not correctly sorted.
2. In data warehouses, while handling huge volumes of data, the fastest aggregation strategy is usually incremental aggregation. For example, if you are calculating the maximum-spending customer for a month, it is a good idea to configure the session for incremental aggregation so the cache is preserved and the aggregation is performed incrementally on new data.
3. Contrary to point 2, when your dimension data undergoes major changes with every load, it is better to drop and rebuild your aggregate data.
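The reason sorted input helps can be seen in this simplified sketch (not Informatica's actual implementation): when rows arrive sorted on the Group By key, a group can be flushed downstream as soon as the key changes, so only one group needs to be held in cache at a time.

```python
# Rows pre-sorted on the Group By key (city), as a Sorter transformation
# or sorted source qualifier would deliver them.
rows = [("Delhi", 250.0), ("Delhi", 50.0), ("Pune", 100.0), ("Pune", 175.0)]

results = []
current_key, running_sum = None, 0.0
for city, amount in rows:
    if city != current_key:
        if current_key is not None:
            results.append((current_key, running_sum))  # flush finished group
        current_key, running_sum = city, 0.0
    running_sum += amount
results.append((current_key, running_sum))              # flush the last group

print(results)  # [('Delhi', 300.0), ('Pune', 275.0)]
```

If the same rows arrived unsorted, no group could be flushed early, and every group would have to stay cached until the final input row.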
Informatica aggregator interview questions.
- What will Informatica Aggregator Transformation return if you do not configure Group By port?
- What happens when Informatica Aggregator transformation is configured with “Sorted Input” but input rows are not sorted?
- Explain two Informatica Aggregator Transformation caches?
Do let me know if more practice questions should be added.