This sample creates a simple word count Spark application. It uses the Travis CI GitHub integration to run unit tests, and the Azure DevOps GitHub integration to deploy to Azure Databricks.
CI/CD is set up as follows:
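As a rough illustration of what "word count" means here and what a unit test for it might check, the following is a plain-Python sketch. It is not the sample's Spark code: the function name `count_words` and the tokenization rule (lowercase, split on non-alphanumeric characters) are assumptions made for this example.

```python
import re
from collections import Counter


def count_words(lines):
    """Count word occurrences across an iterable of text lines.

    Hypothetical helper for illustration; the sample's real job runs on Spark.
    """
    counts = Counter()
    for line in lines:
        # Assumed tokenization: lowercase, then split on anything that is
        # not a letter, digit, or apostrophe. Empty tokens are dropped.
        tokens = (w for w in re.split(r"[^a-z0-9']+", line.lower()) if w)
        counts.update(tokens)
    return dict(counts)
```

A unit test would typically feed a few known lines into such a function and compare the resulting counts against hand-computed values, which is the kind of check the validation build runs.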
- When code is checked into a feature branch, Travis CI will kick off a validation build and run unit tests.
- Once the validation build succeeds, the PR can be merged to the master branch.
- Once the PR is merged to the master branch, Azure DevOps will kick off a build. When azure-pipelines.yml exists in the root folder of the project, Azure DevOps will attempt to use it as the build pipeline for the project. Such a pipeline also has continuous integration enabled by default. Alternatively, you can create a build pipeline using the Azure DevOps visual designer, as seen in this equivalent pipeline.
- After the build succeeds, Azure DevOps will run a release pipeline that invokes a bash script that uses the Databricks CLI to create a job in Azure Databricks, run spark-submit of the built jar, and report back if the job ran successfully.
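The "report back" part of the release step amounts to polling the run until it finishes and then checking its result. The sample's actual bash script is not reproduced here; the following is a minimal Python sketch of that polling logic only. The `get_state` callable is hypothetical: it stands in for whatever fetches the run's state (for example, by wrapping the Databricks CLI or the Jobs REST API) and is assumed to return a dict with `life_cycle_state` and, once the run has finished, `result_state`. The poll interval and timeout defaults are arbitrary.

```python
import time

# Life-cycle states after which a Databricks run will not change any further.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def wait_for_run(get_state, run_id, poll_seconds=10.0,
                 timeout_seconds=3600.0, sleep=time.sleep):
    """Poll until the run reaches a terminal state; return True only on SUCCESS.

    get_state is a hypothetical callable (run_id -> state dict); it is injected
    so the polling logic can be exercised without a real workspace.
    """
    waited = 0.0
    while True:
        state = get_state(run_id)
        if state.get("life_cycle_state") in TERMINAL_STATES:
            # SKIPPED and INTERNAL_ERROR carry no SUCCESS result, so they fail.
            return state.get("result_state") == "SUCCESS"
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"run {run_id} did not finish within {timeout_seconds}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

A release step built this way would exit with a non-zero status when `wait_for_run` returns `False` or raises, so that the pipeline marks the release as failed.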
It's also possible to run unit tests for Spark applications using Azure DevOps alone by:
- installing dependencies on the agent,
- or using a private build agent with dependencies already installed,
- or running a build job on a container. See this example.
We use Travis CI because it's simple to use, and to demonstrate that you can integrate GitHub with Azure DevOps for a complete CI/CD process.