In this lesson we will:
- Show how to test seed data.
dbt incorporates the concept of seed data. This is data that we would like to populate our database with to enable our transformation code, such as lookup lists and static values.
As with everything else in our project, it can be useful to test our seed data for correctness. Though this is a niche requirement, it matters because seed data can change over time independently of the model code. If someone were to enter bad seed data, it could violate our assumptions and lead to failing transformations or bad data being delivered. As with sources, it is therefore worth validating the quality of our seed data early in the pipeline.
Imagine we have designed a seed data file in the following format:
code,description
M,Male
F,Female
We would test the seed data in the same way that we would test a model or a source, specifically by describing the seed in a YAML properties file and attaching tests to its columns:
seeds:
  - name: genders
    description: Gender Codes
    columns:
      - name: code
        tests:
          - unique
          - not_null
      - name: description
        tests:
          - unique
          - not_null
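To make the intent of the unique and not_null tests concrete, here is a minimal Python sketch of the same two checks applied directly to the CSV text. It is not how dbt runs its tests (dbt compiles them to SQL and runs them against the loaded table). The helper names check_column and check_seed are hypothetical, and treating an empty field as a missing value is an assumption made for this illustration.

```python
import csv
import io
from collections import Counter

# Contents of the seed file from the lesson, inlined so the sketch is self-contained.
GENDERS_CSV = """code,description
M,Male
F,Female
"""


def check_column(rows, column):
    """Return a list of problems found in one column (hypothetical helper)."""
    values = [row[column] for row in rows]
    problems = []
    # not_null: assumption for this sketch, an empty field counts as a missing value.
    if any(value == "" for value in values):
        problems.append(f"{column}: not_null failed")
    # unique: every non-empty value may appear at most once.
    counts = Counter(value for value in values if value != "")
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    if duplicates:
        problems.append(f"{column}: unique failed for {duplicates}")
    return problems


def check_seed(csv_text):
    """Apply both checks to the two columns described in the YAML (hypothetical helper)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    problems = []
    for column in ("code", "description"):
        problems.extend(check_column(rows, column))
    return problems
```

Calling check_seed(GENDERS_CSV) returns an empty list for the file above. A file with a repeated code or a blank description would produce one entry per violated check, which is the kind of bad seed data the dbt tests are meant to catch.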
These tests would be executed during the usual dbt test cycle:
In this example, we can see that a test has not met its expectation. Either the seed data needs to be corrected, or the transformation logic needs to change to reflect that our assumption no longer holds.
It is also possible to limit our test runs to only testing seed data through the use of a selector:
dbt test --select config.materialized:seed
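If these checks need to run from a script, for example as an early step in a scheduled job, one option is to wrap the commands in Python. The sketch below is only an illustration: the function names are hypothetical, and it assumes the dbt executable is on the PATH and that the script is run from the project directory. It loads the seeds with dbt seed first, so that the tests run against the current contents of the CSV files.

```python
import subprocess
from typing import List

# Selector taken from the lesson; it limits the run to tests attached to seeds.
SEED_SELECTOR = "config.materialized:seed"


def seed_check_commands(selector: str = SEED_SELECTOR) -> List[List[str]]:
    """Return the dbt commands to load the seeds and then test only the seeds."""
    return [
        ["dbt", "seed"],                         # load the CSV files into the database
        ["dbt", "test", "--select", selector],   # run only the seed tests
    ]


def run_seed_checks() -> None:
    """Run each command in order, stopping at the first failure."""
    for command in seed_check_commands():
        # check=True raises CalledProcessError when dbt exits with a non-zero status.
        subprocess.run(command, check=True)
```

Calling run_seed_checks() stops at the first failing command, so a failed seed test prevents any later steps in the same script from running.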