We are pleased to announce Ensemble CI, an Open Source Continuous Integration and Continuous Delivery (CI/CD) platform designed specifically for the needs of Data Engineers.
Ensemble CI is designed to help businesses improve the quality, reliability and timeliness of their data pipelines by implementing an automated but controlled development lifecycle around their data transformation code.
Ensemble CI is built around the open source project dbt Core, a new but rapidly adopted tool that is used for defining and executing data transformations using SQL.
Though dbt Core moves Data Engineering forward by enabling best practices such as source controlled SQL, reusable modular code and automated testing, it does not impose a development process or "path to production".
This leaves data teams with the problem of agreeing their development workflow, and then automating it using a CI/CD platform such as Jenkins, GitLab or GitHub Actions, or perhaps an orchestrator such as Airflow or Dagster.
Though these tools are undoubtedly powerful, they are general purpose, and can require complex custom scripting and ongoing maintenance. Building a fully featured deployment pipeline often becomes a distraction for data teams who would prefer to be working on higher value activities than automating deployments.
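To give a sense of the custom scripting involved, here is a minimal sketch of the kind of promotion script a team might write by hand on a general purpose CI/CD tool. It is not how Ensemble CI works internally. The target names `dev`, `test` and `prod` are assumptions and would need to match targets in your own dbt profile. The helper names `stage_commands`, `run_command` and `promote` are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence

# Assumed target names; they must exist in your own profiles.yml.
STAGES = ["dev", "test", "prod"]


def stage_commands(target: str) -> List[List[str]]:
    """Return the dbt commands to run, in order, for one environment."""
    return [
        ["dbt", "deps"],                       # install package dependencies
        ["dbt", "build", "--target", target],  # run and test models against the target
    ]


def run_command(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def promote(
    stages: Sequence[str],
    execute: Callable[[Sequence[str]], int] = run_command,
) -> List[str]:
    """Run each stage in order and stop at the first failing command.

    Returns the list of stages that completed successfully.
    """
    passed: List[str] = []
    for target in stages:
        for cmd in stage_commands(target):
            if execute(cmd) != 0:
                return passed  # do not promote any further
        passed.append(target)
    return passed


if __name__ == "__main__":
    completed = promote(STAGES)
    print("Stages completed:", completed)
```

Even this small script leaves out branch handling, pull request triggers, linting, credentials and reporting, which is where most of the ongoing maintenance effort tends to go.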
Recognising this, Ensemble CI has been developed specifically to solve the problem of CI/CD for Data Engineers who use dbt. This focus allows us to offer a more opinionated, out of the box experience that is tailored to the workflow of Data Engineers.
Data Engineering teams that implement Ensemble would likely experience the following benefits:
As stated, Ensemble avoids the need to build a custom deployment pipeline using a generic CI/CD tool that was designed for Software Engineers. Instead, we offer a simple, opinionated and out of the box user experience tailored to this one problem.
Teams using Ensemble CI will be able to continually combine their code on branches or the mainline, and have it automatically built, deployed and tested on each check-in or pull request. This allows teams to check the accuracy of their combined work and identify integration issues early, when they are easier and cheaper to resolve.
Ensemble CI will support data teams towards a "Continuous Delivery" model, whereby changes are automatically pushed through development, testing and production environments as confidence is increased. This avoids the problem of big, risky releases and gets changes into the hands of users much faster, in smaller increments.
Ensemble CI supports the use of automated tests and quality checks to drive up quality and ensure accuracy in your production data. Any automated test failures will be flagged as early as possible before breaking changes enter your production environment.
Ensemble CI will allow you to define a concrete path to production which developers will never bypass, ensuring that code is linted, tested and deployed to the correct development and test environments according to your team's agreed rules.
Ensemble CI can act as a central hub which members of the team can consult for a graphical overview of their project's code and state. This improves collaboration by acting as a lightweight data dictionary or data catalogue, as well as a single place to see how your project is evolving.
To bring this to life, we have produced a short demonstration of the end to end CI/CD process with Ensemble. This shows the process of fixing models and tests on a branch and then merging them to the mainline via a pull request:
If these benefits sound appealing, give Ensemble a try today! We offer an open source version which can be deployed locally in your environment, or a fully managed Cloud service. Please join our community Slack channel to share any comments or feedback.