In this lesson we will:
- Learn about dictionary objects;
- Demonstate how they are used in practice;
- Demonstrate how they can speed up query performance.
In ClickHouse, a dictionary is a mapping between keys and values that is stored in memory. The main use for dictionaries is in improving query performance, particularly with joins, as they are optimised for performance and avoid the need for disk IO.
Dictionaries can be used in a number of ways, such as mapping values from one domain to another in a query, or to speed up joins where we need to do simple lookups and translations which would otherwise require a traditional join.
There are various ways to create dictionaries in Clickhouse. For instance, they can be statically configured in the config.xml file, in external files, or created from data stored in a Clickhouse table.
Say we have a table of data in our uk_property_price_paid dataset.
One example for a dictionary might be to map towns to counties. For instance, the town Basildon is.
AMER --> America EMEA --> Europe, Middle East, Asia
ClickHouse provides a special object type for Dictionaries in order to provide some more efficient optimisations rather than simply creating and joining to lookup tables:
CREATE DICTIONARY counties AS
We can then join with the dictionary:
Dictionaries can be valuable for improving query performance by reducing the need to scan large tables or for mapping values between different domains.
The dictionary data is cached in memory, which makes dictionary lookups very fast.
Additionally, dictionaries can be set to refresh periodically or based on changes in the source data.