Wednesday, November 14, 2007

EAI or ETL ?

I see a lot of confusion in the industry today about when to use EAI and when to use ETL. A very large EAI implementation at a Canadian processed foods company is a performance dud because it is basically a database integration which could have been done much more efficiently the ETL route.

ETL tools are most appropriate for data integration that consists of data synchronization between applications, and for point-to-point, single step interactive processing. Real-time data oriented integration projects that involve large amounts of data, complex transformations, or data augmentation are appropriate for these tools. You will also get better performance moving and transforming large chunks of data with an ETL tool performing relational database-type operations on large amounts of data.

EAI tools are most appropriate for process integration, where a contract (in the form of a schema say) needs to be exposed for multiple parties to consume and adhere to, and where the end systems are not fixed in stone but can be replaced.
Let's capture these differences definitvely:
  1. EAI is message oriented (act of a single row of data which is treated as a ‘message’ performing conversion, transformation or string operations), while ETL is data-set oriented (able to perform joins, merges, unions, sorts, aggregations, pivots on a huge set of data at once)
  2. ETL can perform heavy-duty data lifting EAI are more geared towards heavy throughput of messages or transactions especially when moving to many destinations.
  3. ETL is better at relational data manipulation/transformation while EAI is better at hierarchical data manipulation (XML, Flat files)
  4. EAI tools are not generally designed to understand the data schemas of the applications and to perform data transformations. They are designed to interact with the applications at an API level. To do relational and complex transformations you have to drop down to writing code. ETL is a natural at this and offers significant performance gains by using the right tool for the right job
  5. EAI is based on a pub-sub model while ETL is based on an on-demand model. EAI subscriptions interface is based on a schema rather than an ERD data model which is far more difficult to share between data stores. Hence EAI subscribers are contract-based and replaceable
  6. EAI connects disparate applications while the destination of ETL is usually a data store or a data warehouse
  7. EAI provides much better workflow and process integration capabilities while ETL is good at integrating data.