In this lesson we will:
- Introduce dbt, the leading tool for data transformations within the Modern Data Stack;
- Look at an example of a dbt model;
- Explain some of the high-level benefits of using dbt.
dbt incorporates the concept of Sources. A source describes the raw data that we load as inputs to our transformations. For example, common data sources for data teams include SaaS applications such as Google Analytics or Stripe, or extracts from line-of-business applications.
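As a minimal sketch, assuming a project that loads both of the SaaS sources mentioned above into separate warehouse schemas, the source declarations might look like the following. The `schema` values, the `customers` table, and the `google_analytics` source with its `sessions` table are hypothetical names chosen for illustration.

```yaml
sources:
  - name: stripe_extract             # source name used in this lesson's examples
    description: Daily extract from Stripe
    schema: raw_stripe               # hypothetical schema; if omitted, dbt uses the source name
    tables:
      - name: orders
      - name: customers              # hypothetical second table
  - name: google_analytics           # hypothetical second source
    schema: raw_google_analytics     # hypothetical schema
    tables:
      - name: sessions               # hypothetical table
```

Once a source is declared, a model selects from it with dbt's `source()` function, for example `select * from {{ source('stripe_extract', 'orders') }}`, instead of hard-coding the table name. dbt then resolves the physical location and records the source in the project's lineage.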
If this source data contains errors, it could pollute our database or lead us to present bad data to our end users. And if it does not meet our assumptions and business rules, our transformations are likely to fail in strange and unexpected ways. For both reasons, it is worth testing the source data extensively before doing anything else with it.
dbt allows us to test sources using all of the mechanisms previously discussed, including property tests, generic tests and singular tests.
In the example below, we use the built-in property tests to check that the order_id field in our Stripe extract is unique and not null.
```yaml
sources:
  - name: stripe_extract
    description: Daily extract from Stripe
    tables:
      - name: orders
        columns:
          - name: order_id
            description: Primary key of the orders table
            tests:
              - unique
              - not_null
```
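The unique and not_null tests take no arguments, but generic tests that do take arguments can be attached to source columns in the same way. The sketch below assumes two hypothetical columns that are not part of the original Stripe example: a `status` column with a known set of values, and a `customer_id` column that should match the `id` column of a hypothetical `customers` table in the same source. `accepted_values` and `relationships` are both built-in dbt generic tests.

```yaml
sources:
  - name: stripe_extract
    description: Daily extract from Stripe
    tables:
      - name: orders
        columns:
          - name: order_id
            description: Primary key of the orders table
            tests:
              - unique
              - not_null
          - name: status                # hypothetical column
            tests:
              - accepted_values:
                  values: ['pending', 'paid', 'refunded']    # hypothetical allowed values
          - name: customer_id           # hypothetical foreign key column
            tests:
              - relationships:
                  to: source('stripe_extract', 'customers')  # every customer_id must exist in customers.id
                  field: id
      - name: customers                 # hypothetical table referenced by the relationships test
        columns:
          - name: id
```

These tests run alongside the rest of the project's tests with `dbt test`, and `dbt test --select source:stripe_extract` restricts the run to the tests defined on this source.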