In this lesson we will:
- Introduce the concept of dbt properties;
- Explain how to define static tests using properties;
- Show how these tests are processed by dbt to produce compiled sql.
Properties are used to define extra information about our project components in structured YAML files. Properties add further context about the SQL code that we have written in a structured way.
For instance, properties can be used to define details about the columns or schemas of our models:
models: - name: average_trip_distance description: "Average distance of each trip" columns: - name: average_trip_distance description: "The average trip distance in the dataset" tests: - not_null
Note that historically, properties were defined in a file called schema.yml. However, in recent versions of dbt the file can have any name so long as it has the extension .yml and is stored within your models directory. Some documents may still refer to them as schema files or schema.yml.
Some of the easiest tests to define are to use properties to specify tests against our models. For instance, in the example below we have a model named exams, and we have written to ensure that the grade column only has values passed or failed.
- name: grade tests: - accepted_values: values: ['passed', 'failed']
If a row contains a value other than passed or failed then the test will fail.
dbt comes with a small set of tests that we can use to make assertions about our models through properties. These include:
Though this doesn't sound like many, they can go a long way in building confidence in our data before even attempting to do anything more complex. Checking for no nulls, unique values, that all values are in some set, and adding in referential integrity checks is a good base to build on.
We can however extend this base set of property tests in two ways.
Firstly, we can write our own Generic Tests where we define our own tests in parameterised macros. These tests could test something generic about the data, such as whether or not it is a positive number, or something more tailored to your domain. Generic tests are then accessed via properties:
models: - name: orders columns: - name: order_category tests: - is_available_order_category
Secondly, we could use a library of pre-built generic tests. A popular library we have used is this port of the Great Expectations test suite. This will enhance your dbt project with tens of potential property tests.
Again, these are accessed as properties:
tests: - dbt_expectations.expect_column_values_to_be_between: min_value: 0 # (Optional) max_value: 10 # (Optional) row_condition: "id is not null" # (Optional) strictly: false
Between the built in tests, tests provided by third parties and any bespoke generic tests, it should be possible to get very good test coverage of datasets.
When we run a dbt compile or a dbt run, dbt has to go through the process of translating the entries in our YAML properties files into SQL files which can then be executed to check the test.
name: payments_by_type_very_large description: "large payments by type" columns: - name: constant_currency description: "Currency of the payment" tests: - not_null
This SQL can be found in the target folder of our build:
select constant_currency from taxi_production.payments_by_type_very_large where constant_currency is null
Though ordinarily we wouldn't need to check these files, it can be useful to understand this process and how to find the code when we trying to debug a failing list.