In this lesson we will:
- Introduce the concept of ETL;
- Discuss how we can use ETL tools alongside ClickHouse;
- Discuss dbt, which is a special class of ETL tool.
In the previous lessons we have looked at various methods for manually taking data and ingesting it into ClickHouse.
These methods all place a fair burden on the database administrator, who has to write the ingestion logic and keep it running on an ongoing basis.
Though this might initially look simple, all kinds of complexities rear their heads once we are running ingestion jobs in production.
Fortunately, ETL tools have been created to automate and run this process for us, and today many of them work well with ClickHouse.
ETL stands for extract, transform and load. An ETL tool covers all three stages:
- Extracts the data from the source system or data store;
- Transforms it into our preferred formats;
- Loads it into the data warehouse, in this case ClickHouse.
An ETL tool can also help operationally, for example by monitoring the integration jobs and running them on a schedule.
Writing all of this yourself as a ClickHouse developer is a bigger task than it may initially seem, hence the need to integrate with third-party tools.
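To make the three stages concrete, here is a minimal Python sketch of a hand-written ETL job, including the kind of retry and logging logic an ETL tool would otherwise provide. Everything in it is an assumption made for illustration: the sample order data, the column names, and the `insert` callable, which stands in for whatever call your ClickHouse driver uses to write a batch of rows. It is not how any particular ETL tool is implemented.

```python
import csv
import io
import logging
from datetime import datetime
from typing import Callable, Iterable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl_sketch")

# Hypothetical source data; in practice this would come from a file, API or database.
SOURCE_CSV = """order_id,ordered_at,amount
1,2023-01-05 10:15:00,19.99
2,2023-01-05 11:30:00,not-a-number
3,2023-01-06 09:00:00,5.50
"""

Row = tuple[int, datetime, float]


def extract(raw: str) -> Iterable[dict[str, str]]:
    """Extract: read records from the source system (here, CSV text)."""
    return csv.DictReader(io.StringIO(raw))


def transform(records: Iterable[dict[str, str]]) -> list[Row]:
    """Transform: cast fields to the types the target table expects, skipping bad rows."""
    rows: list[Row] = []
    for rec in records:
        try:
            rows.append((
                int(rec["order_id"]),
                datetime.strptime(rec["ordered_at"], "%Y-%m-%d %H:%M:%S"),
                float(rec["amount"]),
            ))
        except ValueError:
            log.warning("skipping malformed record: %s", rec)
    return rows


def load(rows: list[Row], insert: Callable[[list[Row]], None]) -> int:
    """Load: hand the batch to a writer. `insert` is a stand-in for a ClickHouse driver call."""
    if rows:
        insert(rows)
    return len(rows)


def run_job(insert: Callable[[list[Row]], None], attempts: int = 3) -> int:
    """Run extract -> transform -> load, retrying the whole job if any stage raises."""
    for attempt in range(1, attempts + 1):
        try:
            loaded = load(transform(extract(SOURCE_CSV)), insert)
            log.info("attempt %d: loaded %d rows", attempt, loaded)
            return loaded
        except Exception:
            log.exception("attempt %d failed", attempt)
    raise RuntimeError(f"ETL job failed after {attempts} attempts")


# Usage: an in-memory list plays the role of the target table.
sink: list[Row] = []
loaded = run_job(sink.extend)
```

Even this toy version needs decisions about malformed rows, retries and logging, and it still has no scheduling, alerting or incremental loading. Those are the gaps that dedicated ETL tools are designed to fill.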
Legacy ETL tooling was quite painful. It was a heavyweight, enterprise class of tools with expensive licenses.
Nowadays, we are fortunate to have the more modern ETL tools found in the Modern Data Stack. These are more often than not cloud-based.
One tool in particular, dbt, is very popular and growing quickly for both ETL and ELT.
We have explored how dbt can be combined with ClickHouse in a number of articles.
Please see our course on dbt here.