In this lesson we will:
- Learn more about how tables are managed in a ClickHouse database.
Tables are of course the central concept of any relational database. They are where our data is typically stored and the key abstraction for manipulating or querying your datasets.
In ClickHouse, they can be created using the CREATE table command with a series of typed column names:
CREATE OR REPLACE TABLE people ( id UInt64, name String, updated_at DateTime DEFAULT now() ) ENGINE = MergeTree ORDER BY id;
It is possible to specify one of our rows as a primary key. This is often a unique identifier such as an order or customer id.
In most database, the primary key is implicitly unique and the database will enforce this with a constraint.
This is not the case in ClickHouse. Primary Keys are not unique, and multiple rows can have the same primary key.
Instead, the main function of a primary key is for sorting data physically on disk.
If a primary key is not specified, then the primary key is implcityly the columns in the order by clause because they are logically the same. This is demonstrated in the query above where we did not set a primary key, so instead the ORDER BY clause was used.
Another difference with ClickHouse in comparison with other databases is that you need to specify the table engine which will be used to actually manage the data. Table engines are described in more detail in our course.
It is possible to specify constraints on a ClickHouse table which limit the data which can be entered into the database. For instance, maybe for a given field, all entries must be a positive number, or all entries must come from a specific set of values.
The downside of constraints is that they will slow down your inserts. However, it may be worth this penalty as they can enforce cleaner data and you can assume guarantees about your data after the fact which could make your queries easier to write and more efficient.
ClickHouse will work to compress your raw data as efficiently as possible to reduce the storage requirements.
Compression can be set at the server level, and can also be specified at the column level with a CREATE TABLE query.
Compression is not supported by all table engines.
Once a table has been created, the next task is of course to insert data into it.
This can be done with an ad-hoc insert statement. However, OLAP databases work at their most efficient when inserting batches of rows. ClickHouse can comfortable handle inserts containing millions of rows in each batch.