ETL Tools — The ELT vs ETL Process
How to ELT or ETL data to a warehouse, data lake, or mart
The “ETL” acronym can describe an electrical safety certification or a data process that will “Extract, Transform and Load” to a target destination. The acronym “ELT” is used to describe two pretty cool things. One is the world’s biggest eye in the sky, the “Extremely Large Telescope”. The other is a data ingestion, loading, and processing approach called “Extract, Load, Transform”.
While it can seem like you are peering into the darkness when working on ETL and ELT projects, we won’t be talking about how to use a telescope to solve those challenges (or electrical safety testing).
In this article, we will provide a high-level, conceptual view of ELT and ETL definitions and example use cases to help illustrate common situations for data ingestion and migration. Lastly, we will show how ETL and ELT can coexist in a data workflow.
What is an ETL Tool?
- Represents an ELT or ETL process for extracting, aggregating, organizing, converting, routing, mapping, and storing data
- Offers frameworks and processes to handle diverse and complex data ecosystems, ensuring ease of access
- Provides consistency and automation for development, testing, and operations
- Increases business velocity for data processing, visualizations, reporting, or statistical models by removing barriers to accessing data
What is an ETL Process?
Business is buried in data. The distance between data and those who need to use it has never been greater. When data resides hidden in systems, it is unseen and untapped for operations, insights, and decision making. These are your data silos, and ETL processes can help break them down.
An ETL process is meant to provide a framework to efficiently and programmatically break down data silos. A process encapsulates people and technology to solve ongoing and expanding challenges for data ingestion and data migration. An ETL process informs the right types of ETL tools that will work best for your data challenges.
Why we use ELT and ETL Tools
ETL has traditionally been considered difficult and resource-intensive given a variety of upstream data sources, ill-defined business logic, lack of expertise and legacy environments. The difficulty is largely centered on the transformation work required in advance of loading data to a target destination. Whether it is ETL or ELT, the heavy lifting required for transformation work is still present in today’s enterprises. This work is not getting any easier considering the data wrangling that occurs across so many data sources.
The purpose of commercial or open source ETL tools is to facilitate easy data consumption. Ultimately, easy data consumption is a principal goal of any effort. ELT and ETL both reflect attempts to achieve efficiencies and remove barriers on the way to that outcome.
ETL vs ELT
When people discuss ETL alternatives, the focus is primarily on ELT. We previously did a deep dive on ETL which briefly referenced ELT. In this article we will go into more depth on both.
As we indicated previously, the transformations or the “T” in both ETL and ELT reflect business logic and models. Both process steps are similar, just in a different order. For example, the data extraction and data loading in both will generally look similar. While the shift only changes the order of operation, this can make all the difference for teams that need rapid access to data (even if the data is less polished).
The fact that “T” is a middle step is considered one of the main ETL implementation challenges. ELT puts the transformation of data after loading where ETL has it in the middle prior to loading. A complex transformation can take time to get defined, designed, implemented and tested. This can be a blocker to accessing less refined, but usable data.
Since potentially complex transformations do not slow access to data, teams can get access to it more quickly with an ELT process. While loaded data may be less refined, the fact that it is easily accessible delivers value to those that prioritize speed over further refinement.
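The ordering difference is easy to see in code. Below is a minimal, hypothetical sketch; the `extract`, `transform`, and `load` functions are illustrative stand-ins, not a real API. The same three steps are simply composed in a different order:

```python
def extract():
    # Pull raw rows from a source system (hard-coded here for illustration).
    return [{"user": "a", "amount": "10.5"}, {"user": "b", "amount": "3"}]

def transform(rows):
    # Apply business logic: cast amounts to float and add a derived field.
    return [{**r,
             "amount": float(r["amount"]),
             "bucket": "high" if float(r["amount"]) > 5 else "low"}
            for r in rows]

def load(rows, destination):
    # Write rows to the target (here, just an in-memory list).
    destination.extend(rows)
    return destination

# ETL: transform before loading -- the warehouse only ever sees refined data.
warehouse_etl = load(transform(extract()), [])

# ELT: load first, transform later -- raw data is queryable immediately.
warehouse_elt = load(extract(), [])
refined_later = transform(warehouse_elt)
```

Note that in the ELT path, `warehouse_elt` is usable (if unpolished) the moment loading finishes, while the ETL path holds the data back until the transformation has run.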
Diagrams for ELT and ETL developers
Below are a few different ELT and ETL flow diagrams for ETL developers. The data source for these flows can reflect batch or streaming data ingestion. These examples reflect a high-level data ingestion pipeline using either ETL or ELT. Normally, both would involve automated data ingestion.
1. ETL process in data warehouse example
In this example, we show data extracted from a source, passed through a set of data transformation tasks, and loaded into a target warehouse or data lake destination. The data extraction in ETL, as well as the transformation and loading, can be seen in this basic ETL BigQuery example:
This end-to-end ETL example reflects a data extraction process, data mapping, and loading to a target data warehouse. It reflects ETL basic concepts for the process.
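To make the flow concrete, here is a minimal ETL sketch in Python. The inline CSV and the sqlite3 database are stand-ins for a real source system and for BigQuery; a production pipeline would use the BigQuery client library and real table names:

```python
import csv
import io
import sqlite3

raw_csv = "order_id,amount\n1,19.99\n2,5.00\n"  # stand-in source extract

# Extract: parse the source feed into rows.
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: apply business logic *before* loading (the "T" sits in the middle).
transformed = [(int(r["order_id"]), float(r["amount"]), float(r["amount"]) >= 10)
               for r in rows]

# Load: write only the refined rows to the warehouse table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, is_large INTEGER)")
db.executemany("INSERT INTO orders VALUES (?, ?, ?)", transformed)

result = db.execute("SELECT order_id, amount, is_large FROM orders").fetchall()
print(result)
```

The defining trait of ETL is visible here: the destination table only ever receives typed, refined rows.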
While we only show one data source, the same would apply to multiple sources as well. Another note: we show BigQuery in the example, but this could easily reflect an Amazon Athena or Redshift ETL process within an AWS pipeline. For systems like Redshift or Athena, the best approach may not be ETL at all. Maybe you should consider ELT (see below).
2. ELT tools example
In this example, we show ELT or the “extract, load, transformation” process. It’s helpful to think of ELT tools and processes as an evolution of traditional ETL methods.
As we mentioned previously, the transformation (“T”) step is transitioned to a post process. In some regards, the “T” might not even be needed depending on the use case. It might just be “EL” or extraction and loading.
If transformation is needed, it is executed at a later date and time, possibly by a different team. The point here is that in ELT, an often complex transformation process is abstracted from the workflow to increase the velocity of data flow.
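Here is the same pipeline reworked as ELT, again with sqlite3 standing in for the warehouse and illustrative table names: raw data is loaded untouched, and the transformation runs later as SQL inside the destination.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Extract + Load: land raw, untyped rows immediately -- no transformation yet.
db.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT)")
db.executemany("INSERT INTO raw_orders VALUES (?, ?)",
               [("1", "19.99"), ("2", "5.00")])

# Transform (later, in-warehouse): the casting and business logic run as SQL
# against already-loaded data, so consumers never waited on this step.
db.execute("""
    CREATE VIEW orders AS
    SELECT CAST(order_id AS INTEGER)  AS order_id,
           CAST(amount AS REAL)       AS amount,
           CAST(amount AS REAL) >= 10 AS is_large
    FROM raw_orders
""")

result = db.execute("SELECT * FROM orders").fetchall()
print(result)
```

If no transformation were needed, the view would simply be skipped and the process would be plain “EL”.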
So how would this change in a BigQuery vs Redshift workflow? Not much.
A Redshift ETL or ELT process will be similar but may vary in the ETL tools used. There is a collection of Redshift ETL best practices, and even some open source tools for parts of this process. However, the overall flow will be similar regardless of destination.
3. ELT vs ETL architecture: A hybrid model
ETL often is used in the context of a data warehouse. Our examples above have used this as a primary destination.
ETL and ELT serve a broader purpose for applications, systems, and destinations like data lakes and data marts. Keep in mind this is not an ETL vs ELT architecture battle; they can work together. Below is an example of a hybrid model:
In this example, there are two parts to the ELT and ETL architecture. The first team uses an ELT process to send Adobe event data to an AWS data lake. The second team then migrates the data from AWS to an enterprise Oracle Cloud environment using Oracle ETL systems. This allows team two to run ETL testing jobs in Oracle and create new data models or applications without impacting downstream workflows.
Why AWS to Oracle? It was the most efficient and cost-effective data consumption pattern for the Oracle BI environment. Using AWS rather than a traditional ETL data migration process was roughly 100X less expensive and much faster.
When looking at data ingestion into an AWS data lake like the example above, some level of curation should be implemented. This is the case for other destinations like marts and warehouses too. Largely, this is a function of the extraction applications you are creating. The extraction methods in data warehouses such as the Oracle example above will vary based on the tools Team Two wanted to use.
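As an illustration of what light curation can look like, here is one common convention (an assumption for this sketch, not a prescription): landing data under partitioned, dated prefixes so downstream extraction jobs can target only the slices they need.

```python
from datetime import date

def lake_key(zone: str, source: str, table: str, day: date) -> str:
    # Build a partitioned object-key prefix for a data lake, e.g.
    # s3://my-lake/curated/adobe/clickstream/dt=2021-03-01/
    # The zone ("raw", "curated", ...) separates untouched landings from
    # cleaned data; the dt= partition lets batch jobs extract one day at a time.
    return f"{zone}/{source}/{table}/dt={day.isoformat()}/"

print(lake_key("curated", "adobe", "clickstream", date(2021, 3, 1)))
```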
4. ETL and ELT tools in business intelligence
This example is a derivative of example 3. It demonstrates an ELT and ETL business intelligence extension.
In our use case, we will assume the Oracle environment Team Two manages does not allow a specific group of analysts access. The analysts are still able to undertake data analytics based on the ELT work Team One completed. The example below reflects how the same workflow supports Tableau data prep as well as a Tableau AWS connection to a data lake and Amazon Athena.
The efficiency of this model enables the BI team to leverage the ELT model from team one and connect Tableau to AWS Athena so they can accomplish their reporting work.
Is Tableau an ETL tool in this use case? No, it is purely used for data analysis. This is not to say Tableau could not undertake some basic transformation work. The answer may change slightly if the question were, “Is Tableau Prep an ETL tool?” With Tableau Prep, you are certainly undertaking some level of ETL work. You could also replace Tableau with some form of Alteryx ETL process.
The point here is that rather than have this be ETL vs ELT, you can have an approach that gives you the flexibility to have a conversation about one or more tools performing transformations on the same data.
5. ELT and ETL data for Adobe clickstream events
This example is a derivative of example 4. Similar to the last example, rather than use commercial offerings like Informatica ETL or Talend ETL for processing Adobe data, a data lake and an Oracle ETL process were used. This shows business intelligence operations occurring from the Oracle EDW rather than the data lake.
The process for the Oracle ETL team is simplified given the use of an optimized data lake architecture and an organized data catalog. The Oracle team simply calls the ETL schema endpoint for each master view they want to tap for data extraction. They undertake batch processing on a set schedule to extract the data they need. Of course, if needed, the underlying data is available in the lake, but using the catalog for data extraction makes the ETL process a snap.
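The batch pattern just described might be sketched as follows. The `catalog` dictionary and `extract_view` function are hypothetical stand-ins for a real data catalog API and warehouse client:

```python
# Hypothetical catalog of master views, mapping each view to an extract query.
catalog = {
    "clickstream_master": "SELECT * FROM lake.clickstream_master",
    "orders_master": "SELECT * FROM lake.orders_master",
}

def extract_view(name, query):
    # In a real pipeline this would run the query against the lake and stage
    # the result for the Oracle load; here it just records the call.
    return f"extracted {name}"

def run_batch():
    # Executed on a set schedule (cron, Airflow, etc.) rather than on demand:
    # walk the catalog and extract each master view the team wants to tap.
    return [extract_view(name, query) for name, query in catalog.items()]

print(run_batch())
```

The key design choice is that the batch job only depends on the catalog, not on the lake’s physical layout, which is what keeps the Oracle side simple.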
In this case, the Oracle ETL developers created a workflow to support business operations and a separate Tableau data analytics workflow. This provided an ETL automation framework that leveraged Adobe, Amazon, and a data lake as a landing zone or staging area.
Open Source ETL Tools
- Talend ETL (Open Studio)
- Apache Airflow (also described in a class of ETL scheduling tools)
- Clover ETL
- Pentaho ETL (Kettle)
Cloud ETL Tools
- AWS Glue
- AWS Data Pipeline
- Azure ETL Tools (Azure Data Factory)
- Google Cloud Dataflow
- Domo ETL Tool (ETL Magic)
Traditional ETL Software
- Informatica ETL Tool
- Oracle ETL (Oracle Data Integrator)
Python is accessible and ubiquitous in ETL and ELT. From Python SDKs that support an ETL extraction process, to transformations in Pandas or PyArrow, to loading into a target destination with SQLAlchemy, you can find Python everywhere. Python is a jack-of-all-trades toolkit, either as a standalone solution or as a component in open source, cloud, or commercial products.
On Quora there is a question asking, “Is SQL an ETL tool?” Technically speaking, SQL could be considered a tool for ETL. You can certainly extract, transform, and load data via SQL. It might not be the best tool, but you could not categorically say it cannot be used as one. So in a basic use case, yes, SQL could be used as a simple ETL tool, though there are likely better options (see Python above).
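For example, a single SQL statement can express a complete extract, transform, and load step. In this sketch, sqlite3 is merely the engine running the SQL, and the table names are illustrative:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE source_events (user_id TEXT, amount TEXT);
    INSERT INTO source_events VALUES ('a', '10.5'), ('b', '3');

    CREATE TABLE target_facts (user_id TEXT, amount REAL);

    -- Extract (SELECT), transform (CAST + filter), and load (INSERT)
    -- all in one statement.
    INSERT INTO target_facts
    SELECT user_id, CAST(amount AS REAL)
    FROM source_events
    WHERE CAST(amount AS REAL) > 5;
""")

result = db.execute("SELECT * FROM target_facts").fetchall()
print(result)
```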
Summary: Building out your ELT and ETL pipeline
If you are an entry-level ETL developer or an old pro, you know the best ETL or ELT is a function of available tools, skills, experience, and attention to detail. You also know that ETL and ELT are not mutually exclusive. Both can be considered as parts of a broader data integration strategy. Each may reflect a stage or a step in a data migration process that occurs within the enterprise.
If you are currently designing ETL processes for data warehousing or a data lake, you may want to consider ELT or a hybrid approach. The hybrid model is typical with data lake ETL and ELT workflows.
Any specific ETL vs ELT pros and cons will be dependent on your situation. For example, if you are using Google Cloud or Amazon Web Services, both offer a collection of services to create or host your cloud ETL or ELT. Each will offer a unique set of capabilities and service offerings to assist in rolling your own tools.
Are you looking for AWS ETL partners? A simple, easy-to-use online ETL tool? The best ETL tools are the ones you don’t have to code or deploy yourself. Openbridge offers ELT data ingestion as a service. We offer both a batch data ingestion service as well as streaming API data ingestion.
In addition to our batch data processing system and streaming API, we also offer pre-built data connectors to popular services like Facebook, Amazon Seller Central, Google Ads, Salesforce, Google Analytics 360, YouTube and many others.
If you do not want to learn ETL tools and are looking for code-free, fully automated, zero-administration ELT data pipelines, we have you covered.
We have launched a code-free, zero-admin, fully automated ETL process to leading cloud warehouses and lakes.
Get started with Amazon Redshift, Google BigQuery, Redshift Spectrum or Amazon Athena for free!
Not ready to get started just yet? If you are trying to learn how to implement the ETL process, or want to explore an ELT model, reach out to our team of data experts to discuss your needs further.
Need a platform and team of experts to kickstart your data and analytic efforts? Our ELT and ETL toolkits can help! Getting traction adopting new technologies, especially if it means your team is working in different and unfamiliar ways, can be a roadblock for success. This is especially true in a self-service only world. If you want to discuss a proof-of-concept, pilot, project or any other effort, the Openbridge platform and team of data experts are ready to help.
Visit us at www.openbridge.com to learn how we are helping other companies with code-free, fully automated ETL Process.