In this lesson we will:
- Introduce Kafka;
- Describe how Kafka and ClickHouse are typically integrated.
Kafka is the leading platform found in industry for working with streaming and real-time data. For more details about Kafka, please consult our course which focusses specifically on it.
As ClickHouse is also often used for real-time analytics, the combination of ClickHouse and Kafka is a very common one. So much so, in fact, that ClickHouse comes bundled with a native integration.
This integration is exposed as a table engine. We create the table like any other, specifying the address of the broker and the topic to listen to.
To consume a topic, define a ClickHouse table that uses the Kafka engine. The column definitions describe the structure of each message, and the engine settings specify the brokers, the topic, the consumer group and the message format. ClickHouse can then ingest data from Kafka for real-time analytics and processing.

```sql
CREATE TABLE your_kafka_table
(
    key String,
    value String,
    timestamp DateTime
)
ENGINE = Kafka
SETTINGS
    kafka_broker_list = 'your_kafka_broker_1:9092,your_kafka_broker_2:9092',
    kafka_topic_list = 'your_kafka_topic',
    kafka_group_name = 'your_kafka_consumer_group',
    kafka_format = 'JSONEachRow',
    -- Adjust the number of consumers as needed
    kafka_num_consumers = 1;

-- Security options (security protocol, SASL mechanism) and the offset reset
-- policy are librdkafka options. They are typically set in the <kafka> section
-- of the ClickHouse server configuration rather than in the table definition.
```
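Tables backed by the Kafka engine are slightly different to normal tables. They do not store any data themselves. Instead, they mimic a stream: reading from the table consumes messages from the topic, and each message is delivered to the consumer group only once.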
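As a minimal sketch of this stream-like behaviour, a direct query against the example table `your_kafka_table` might look like the following. It assumes a ClickHouse version that provides the `stream_like_engine_allow_direct_select` setting, which recent versions require before they allow a direct read from a Kafka table.

```sql
-- Read a few messages directly from the Kafka table (for debugging only).
-- The messages returned are consumed by the consumer group, so running the
-- query again returns later messages rather than the same rows.
SELECT key, value, timestamp
FROM your_kafka_table
LIMIT 5
SETTINGS stream_like_engine_allow_direct_select = 1;
```

Direct reads like this are best kept for inspection. ClickHouse is expected to reject them once a materialised view is attached to the table.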
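To persist the data, we need to create a materialised view that reads from the stream and writes the rows into a normal table, such as one backed by a MergeTree engine.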
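The following is a sketch of that pattern rather than a definitive setup. The destination table `your_events` and the view `your_kafka_consumer` are hypothetical names, and the columns are assumed to match the example `your_kafka_table` defined earlier.

```sql
-- Hypothetical destination table that actually stores the rows.
CREATE TABLE your_events
(
    key String,
    value String,
    timestamp DateTime
)
ENGINE = MergeTree
ORDER BY (timestamp, key);

-- The materialised view reads from the Kafka table in the background
-- and inserts each consumed batch into the destination table.
CREATE MATERIALIZED VIEW your_kafka_consumer TO your_events AS
SELECT key, value, timestamp
FROM your_kafka_table;
```

Once the view exists, queries should go to `your_events` rather than to the Kafka table. Detaching the view with `DETACH TABLE your_kafka_consumer` pauses consumption, and attaching it again resumes it.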