Partitioning is a mega feature, so here is a mega post on Informatica partitioning. I tried to keep it concise but could not. Partitioning is a general strategy for faster processing, and it is used in many computing domains such as ETL, databases, and processors.
Intended audience: Advanced Informatica users with some hands-on experience with PowerCenter.
In particular, this post will help you understand the following discussion points.
(1) What is Informatica session partitioning?
(2) How to create Informatica session partitions?
(3) Why is Informatica session partitioning required?
(4) What are partition points in Informatica?
(5) When to partition an Informatica session?
(6) What are the different partition types available in Informatica?
(7) How to choose a partition type?
(8) What are the benefits of session partitions?
(9) How to create partition points in Informatica?
(10) How to partition flat file sources and targets in Informatica?
(11) How to partition database sources and targets in Informatica?
(12) What are the limitations of Informatica partitions?
(13) What is the suitable partition type for your transformation?
What is Partitioning in Informatica?
Partitioning means introducing more processing channels (threads, in Informatica terms) to improve the performance of an Informatica session. Informatica lets you fix the number of partitions at design time or set it dynamically by passing a parameter.
- Dynamic Partitions – The partition type is configured up front, and the “Number of partitions” can be parameterized and passed through the session parameter $DynamicPartitionCount. Besides a user-supplied value, the number of partitions can also be based on the source partitions or on the number of nodes.
- Non-dynamic Partitions – The number of partitions is fixed; you must change the code to change the number of partitions.
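To make the difference concrete, here is a minimal Python sketch of how a partition count could be resolved. It is not Informatica code: the function name and the mode labels ("static", "parameter", "source_partitions", "grid_nodes") are hypothetical names chosen for illustration, and only $DynamicPartitionCount comes from the text above.

```python
def resolve_partition_count(mode, static_count=1, dynamic_partition_count=None,
                            source_partitions=None, grid_nodes=None):
    """Return the partition count for a session (illustrative sketch only).

    The mode labels are hypothetical; they are not Informatica option names.
    """
    if mode == "static":
        # Non-dynamic: the count is fixed in the session definition.
        return static_count
    if mode == "parameter":
        # Dynamic: the count arrives as a parameter value,
        # standing in for $DynamicPartitionCount.
        if dynamic_partition_count is None:
            raise ValueError("a value for the partition count parameter is required")
        return int(dynamic_partition_count)
    if mode == "source_partitions":
        # Dynamic: follow the number of partitions defined on the source.
        return source_partitions
    if mode == "grid_nodes":
        # Dynamic: follow the number of nodes available.
        return grid_nodes
    raise ValueError(f"unknown mode: {mode!r}")
```

With a non-dynamic setup, changing the count means editing `static_count` in the code; with the parameter-driven mode, only the value passed in changes.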
When to consider Partitioning a Session?
The best candidates for partitioning in Informatica sessions are –
- When you are loading a very large volume of data, say a file larger than 2 GB (this number is only indicative; always consider your system resources and define performance benchmarks for your own system).
- When you are loading data from a relational source that already has partitions defined, and using those partitions will improve the session performance.
- When the session has performance bottlenecks or when you want to optimize the Informatica code.
What are Partition Points?
Partition points mark the boundaries between threads in the mapping pipeline. Informatica redistributes data at each partition point, thereby increasing performance.
Informatica has three types of threads – namely Reader, Writer and Transformation. You can add more partition points to increase the number of transformation threads. Adding a partition point to a transformation lets you add partitions to that particular transformation, so its logic runs concurrently for every partition. Adding partitions is subject to certain logical limitations based on the nature of the transformation; these are discussed under Dos & Don’ts in this post.
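The idea of running the same transformation logic concurrently, once per partition, can be sketched in plain Python. This is only an analogy under stated assumptions: `run_stage` is a hypothetical helper, and Python threads stand in for Informatica's transformation threads.

```python
from concurrent.futures import ThreadPoolExecutor


def run_stage(partitions, transform):
    """Apply `transform` to every row, using one worker thread per partition."""
    if not partitions:
        return []

    def work(rows):
        # Each worker processes only the rows of its own partition.
        return [transform(row) for row in rows]

    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        # pool.map returns results in the same order as the input partitions.
        return list(pool.map(work, partitions))


# Two partitions, so the transformation logic runs on two threads.
result = run_stage([["a", "b"], ["c"]], str.upper)
```

Each partition keeps its own rows from input to output; nothing is redistributed inside the stage, which is why redistribution only happens at partition points.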
What are the different Partition types available?
The following partition types are available in Informatica –
- Database partitioning – Informatica queries the database (say, an Oracle database) for the partitions defined on a particular table and reads data from those table partitions. This type is available only for DB2 source and target tables and Oracle source tables. When you use database partitioning, Informatica creates a source filter property for each partition so that you can write a filter condition per partition.
- Key range Partitioning – You define a simple or composite key and specify the range of values for each partition. This works best when a suitable key is available in the source or target instances of the session.
- Round-robin Partitioning – The standard round-robin algorithm is applied if you choose this partitioning type. Rows are distributed evenly across partitions to balance the load during workflow execution
- Pass-through – This is the default partition type that Informatica creates in every session. Rows are passed on without being redistributed, and by default all the data passes through a single partition. You need to increase the number of partitions to get better performance
- Hash partitioning – Informatica distributes rows based on the groups defined in the mapping (either by the user or by Informatica). It is generally suitable for transformations that work on groups of rows, such as Aggregator, Rank and Sorter.
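Two of these distribution strategies are easy to show in a few lines of Python. The sketch below is a simplified illustration, not Informatica's implementation: the function names are hypothetical, and `zlib.crc32` is only a stand-in for whatever hash function the engine actually uses.

```python
import zlib


def round_robin_partition(rows, partition_count):
    """Deal rows out one at a time so partition sizes differ by at most one."""
    partitions = [[] for _ in range(partition_count)]
    for index, row in enumerate(rows):
        partitions[index % partition_count].append(row)
    return partitions


def hash_partition(rows, partition_count, key):
    """Send every row with the same key value to the same partition."""
    partitions = [[] for _ in range(partition_count)]
    for row in rows:
        # crc32 is deterministic, so equal keys always map to the same bucket.
        bucket = zlib.crc32(str(row[key]).encode("utf-8")) % partition_count
        partitions[bucket].append(row)
    return partitions


rows = [{"id": i, "dept": d} for i, d in enumerate("ABACBAC")]
sizes = [len(p) for p in round_robin_partition(rows, 3)]  # [3, 2, 2]
by_dept = hash_partition(rows, 3, "dept")
```

Round-robin balances the load but scatters each department across partitions. Hashing keeps every department together, which is what a group-based transformation such as an Aggregator needs, at the cost of possibly uneven partition sizes.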
Informatica Partitions – Dos & Don’ts
- Partition points cannot be created for some transformations – Source, Sequence Generator, Unconnected transformations
- Partition points cannot be deleted from – Source Qualifier, Normalizer, Target transformations
- Flat file sources always have the Pass-through partition type
- Joiner transformation can have the Hash Auto keys and Pass-through partition types
- Lookup is an amazing transformation in terms of partitions; it can have any partition type except database partitioning
- Database partitioning cannot be used for a Source Qualifier transformation when the session is configured with source-based or user-defined commit and constraint-based loading
- With dynamic partitioning, key ranges must always be closed-ended.
- An Informatica session remains valid even when the partition rules are not configured correctly, but the workflow fails when the session runs.
- Multi-threaded reading is not allowed for XML sources and FTP files
- Database partitioning cannot be used for a Target transformation when –
  - the session is configured with source-based or user-defined commit and constraint-based loading
  - the target tables are partitioned based on key range
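Because a misconfigured session stays valid and only fails at run time, it can help to write a few of these rules down as a checklist you run yourself. The Python sketch below encodes only some of the rules listed above; the transformation and partition type labels are hypothetical strings chosen for this example, not identifiers from any Informatica API.

```python
# Hypothetical labels for the partition types discussed in this post.
PARTITION_TYPES = {
    "database", "hash_auto_keys", "hash_user_keys",
    "key_range", "pass_through", "round_robin",
}

# Transformations that cannot take a partition point at all.
NO_PARTITION_POINT = {"source", "sequence_generator", "unconnected"}

# Transformations whose partition type is restricted.
ALLOWED_TYPES = {
    "flat_file_source": {"pass_through"},
    "joiner": {"hash_auto_keys", "pass_through"},
    "lookup": PARTITION_TYPES - {"database"},
}


def check_partition_point(transformation, partition_type):
    """Return a list of rule violations; an empty list means no rule was hit."""
    problems = []
    if partition_type not in PARTITION_TYPES:
        problems.append(f"unknown partition type: {partition_type}")
    if transformation in NO_PARTITION_POINT:
        problems.append(f"{transformation} cannot have a partition point")
    allowed = ALLOWED_TYPES.get(transformation)
    if allowed is not None and partition_type not in allowed:
        problems.append(f"{transformation} does not allow {partition_type}")
    return problems
```

For example, `check_partition_point("joiner", "hash_auto_keys")` returns an empty list, while `check_partition_point("lookup", "database")` reports one problem. Transformations that are not in the tables are simply not checked, so an empty result is not a guarantee that the session will run.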
How to add / change Partition Points and Partitions?
Here are the steps to add / change partitions
- Double-click the session whose partition points you want to add / change; the “Edit Task” window will open
- Go to the Mapping tab -> select the “Partitions” tab underneath the transformation list. It will open the mapping & partition information in the preview pane.
- Select the transformation where you want to add a new partition point -> click on “Add Partition Point”; the partition point browser will open
- Enter the number of partitions you intend to add and choose the partition type.
- Click OK to create the partition point
- Add partition keys if you configured key range partitioning. You can leave the end range of the last partition empty so that it accommodates all higher key values.
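Key range routing with an open-ended last partition can be sketched as follows. This is an illustration only: the function name is hypothetical, and the sketch assumes each range includes its start value and excludes its end value, which you should verify against your own session before relying on it.

```python
def key_range_partition(rows, key, ranges):
    """Route each row to the first range that covers its key value.

    `ranges` is a list of (start, end) pairs, one per partition.
    Assumption: start is inclusive and end is exclusive.
    A start or end of None means that side of the range is open.
    """
    partitions = [[] for _ in ranges]
    for row in rows:
        value = row[key]
        for index, (start, end) in enumerate(ranges):
            if (start is None or value >= start) and (end is None or value < end):
                partitions[index].append(row)
                break
        else:
            # No range matched, so the row has nowhere to go.
            raise ValueError(f"no partition covers key value {value!r}")
    return partitions


# The last range has no end, so it takes every key from 200 upwards.
ranges = [(1, 100), (100, 200), (200, None)]
rows = [{"id": n} for n in (5, 100, 199, 200, 5000)]
counts = [len(p) for p in key_range_partition(rows, "id", ranges)]  # [1, 2, 2]
```

Leaving the last end range open means new, higher key values never fall outside the configured ranges. With closed ranges everywhere, the same row would hit the `ValueError` branch in this sketch.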
How to Partition Flat files?
Sessions can run on a single node or on multiple nodes (in a grid). Row distribution depends on two factors –
- Number of Threads
- Number of Partitions
A. Flat File Sources
- Partitioning is allowed for both Direct and Indirect flat file sources
- Some transformations downstream of a Source Qualifier, such as a Joiner transformation configured with sorted input, require rows in sorted order.
- In such cases it is important to preserve the row order while distributing rows between partitions. You achieve this by setting an appropriate value for “Concurrent Read Partitioning” in the session properties.
B. Flat File Targets
- When writing a target, you can configure Informatica to write a separate file for each partition, or to write the data of all partitions concurrently.
- The per-partition files can be merged into one file when the process completes, as indicated by the merge type
- To merge the files, either set the properties Merge Type, Merge File Directory and Merge File Name, or write a post-session command
- Informatica lets you merge the target files in three different ways –
  - Sequential Merge – Informatica creates one output file for each partition and merges them sequentially into a single file when the session completes
  - Concurrent Merge – Informatica writes the output data of all partitions concurrently to one merge file. The sort order is not preserved in this type of merge
  - File List – Informatica writes one output file for each partition and writes the directory and name of each file to a separate list file.
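If you go the post-session route instead, the sequential merge and file list behaviours are simple to reproduce yourself. Below is a minimal Python sketch under stated assumptions: the function names and the `target1.out`, `target2.out` naming pattern are made up for this example and do not reflect how Informatica names its partition files. Concurrent merge is not shown.

```python
import os


def write_partition_files(directory, partitions, base_name="target"):
    """Write one output file per partition and return the paths in order."""
    paths = []
    for number, rows in enumerate(partitions, start=1):
        path = os.path.join(directory, f"{base_name}{number}.out")
        with open(path, "w", encoding="utf-8") as handle:
            for row in rows:
                handle.write(row + "\n")
        paths.append(path)
    return paths


def sequential_merge(paths, merge_path):
    """Append the partition files to one merge file, in partition order."""
    with open(merge_path, "w", encoding="utf-8") as merged:
        for path in paths:
            with open(path, encoding="utf-8") as handle:
                merged.write(handle.read())
    return merge_path


def write_file_list(paths, list_path):
    """Write the full path of every partition file to a list file."""
    with open(list_path, "w", encoding="utf-8") as listing:
        for path in paths:
            listing.write(path + "\n")
    return list_path
```

A typical call would write the partition files into a working directory, then use either `sequential_merge` to get one combined file or `write_file_list` to hand the list of files to a downstream process. Because the files are appended in partition order, the row order inside each partition is kept in the merged file.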