“ What is ETL ? ” This month, almost every week I am being asked this question. it's a obvious questions for many of my new colleagues who are in other technologies like Dot net, Share point, antenna mobile, Biztalk, Extraview etc.
Probably , You might also need to know this term because somewhere you would have come across one of following term like “ETL tools”, “ETL technology”, “ETL integration” etc , etc …. and one of these term would have put interest in you to know more about that “what it is?”.
As usual ‘ What is ETL ’ can also be defined in many ways and here I am attempting to write two versions
A Non-geek perspective !!!!!!!!!!!!!!!!!!
Geek perspective !!!!!!!!!!!!!!!!!!
ETL is ECL
What is ETL ? (Common perspective – Non Geek )
ETL is shortened form of EXTRACT, TRANSFORM, LOAD. It is a process or methods where with any means data gets extracted from some source , then this data gets changed (transformed ) as per specific need , further changed ( transformed ) data gets loaded to another system mostly known as target system. Let’s take simple example of school to understand this. Let's Say, Your school principal wants to know that every month how many students are successfully clearing subject tests. In another words, how many (%) students failing and in which subject they are failing?
Here application which will work for principle, Will pull data from exam management system and HR system(called as EXTRACT). Pulled data will be coupled and integrated to sum up student marks (referred as TRANSFORM), further this data will be placed in princy’s application (referred as LOAD). Isn’t simple???
What is ETL ? (Geek perspective)
ETL is generic term which expands as Extract, Transform and Load. This technology heavily leveraged in EAI (Enterprise application integration ), DI(Data integration) and BI(business intelligence ) space. Over period of time ETL has significantly served enterprise IT's application integration needs. Application integration is next big challenge in enterprise IT space where organizations see high potential for tools.
Two Real time scenario
First sample, Fetching product complaints from salesforce.com (Cloud based CRM system) to make this data available to product quality team to further analyze so it can eventually help them to make that product better. In this case ETL can establish process to make this data available in a meaningful way, so it could easily understood by product quality team.
Another Sample, Providing drug adverse effect data to the team of scientist who can further review it on periodic basis. In this case ETL can pull data from multiple systems (Web, standalone, Mainframe etc) and push it to system getting used by scientist.
5 Interesting facts about ETL
- Business intelligence is the most common space for ETL. Where it gets leveraged to pull data from different systems. This data gets transformed. and at last this data gets pushed to data base tables. These tables later get used by another IT application which provides reports to business user.
- This technology works behind or in-between most of the time without any user interface. Many tools are available to perform ETL activities and those tools are generally called as ETL tools.
- moreover helps organization to achieve single version of data. This itself is a big topic and need to be focused separately.
- You might be having questions revolving around “Why should I use tools when I can do all these things with data base SQL” . Ahead are few reasons because of which tools gets used by enterprises.
ETL tools are helpful than internal coded ETL (SQL)
Following are few of the most common reasons to go for tools
- Ease of use
- Built-in objects: Like aggregators
- Easy debugging
- Good Auditing capabilities
- Easy skill availability: Believe me there is a dearth of talent who can code it with SQL or PLSQL.
- Structured development path and extensibility
ETL Application Flow
A typical application data flow looks like this, Data first pulled in staging area from multiple sources. Staging area is a temporary area which helps to perform data standardization , cleansing and scrubbing. Staging area contains standardized form of data. after this data gets transformed as per business needs. further this data put up in target system. this system eventually use this data to perform ad-hoc reporting. Reporting systems are altogether different system and there are different challenges associated with them.