Spark job with unit test and integrated with Travis CI and Azure DevOps CD

liupeirong/spark-cicd

Spark CI/CD

This sample creates a simple word count Spark application. It uses the Travis CI GitHub integration to run unit tests and the Azure DevOps GitHub integration to deploy to Azure Databricks.
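As a rough illustration of why a word count job is easy to unit test in CI, the counting logic can be kept in a plain function that runs without a cluster. The sketch below is not the sample's actual code: it is plain Python with no Spark dependency, and the names `count_words` and `test_count_words` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def count_words(lines: Iterable[str]) -> Dict[str, int]:
    """Count whitespace-separated words, ignoring case.

    Hypothetical helper for illustration; the real sample runs on Spark.
    """
    counts: Counter = Counter()
    for line in lines:
        # split() with no arguments collapses repeated whitespace.
        counts.update(word.lower() for word in line.split())
    return dict(counts)


def test_count_words() -> None:
    # A CI validation build would run a test like this on every push.
    result = count_words(["Spark is fast", "spark is simple"])
    assert result == {"spark": 2, "is": 2, "fast": 1, "simple": 1}
```

Keeping the transformation separate from cluster setup means the validation build only needs a test runner, not a running Spark environment.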

CI/CD is set up as follows:

  • When code is checked into a feature branch, Travis CI kicks off a validation build and runs the unit tests.
  • Once the validation build succeeds, the PR can be merged to the master branch.
  • Once the PR is merged to the master branch, Azure DevOps kicks off a build. When azure-pipelines.yml exists in the root folder of the project, Azure DevOps attempts to use it as the build pipeline for the project, with continuous integration enabled by default. Alternatively, you can create a build pipeline using the Azure DevOps visual designer, as seen in this equivalent pipeline.
  • After the build succeeds, Azure DevOps runs a release pipeline. The pipeline invokes a bash script that uses the Databricks CLI to create a job in Azure Databricks, run spark-submit on the built jar, and report whether the job succeeded.
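The last step above, reporting whether the job succeeded, comes down to polling the run until it reaches a terminal state. The sample does this in a bash script with the Databricks CLI; the sketch below shows the same idea in Python under stated assumptions. The `get_run` callable is hypothetical and stands in for whatever fetches the run status, and the `life_cycle_state` and `result_state` field names are assumed to follow the Databricks Jobs API run state, so check them against the API version you use.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal values of life_cycle_state in the Jobs API run state.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def wait_for_run(
    get_run: Callable[[int], Dict[str, Any]],
    run_id: int,
    poll_seconds: float = 30.0,
    max_polls: int = 120,
) -> bool:
    """Poll a run until it finishes; return True only if it succeeded.

    get_run is a hypothetical callable that returns the run description
    as a dict, for example by wrapping a CLI or REST call.
    """
    for _ in range(max_polls):
        state = get_run(run_id).get("state", {})
        if state.get("life_cycle_state") in TERMINAL_STATES:
            return state.get("result_state") == "SUCCESS"
        time.sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")
```

A release step could exit with a non-zero status when `wait_for_run` returns `False`, which is one way to make the pipeline fail when the Spark job fails.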

It's also possible to run unit tests with Spark applications using Azure DevOps alone by

We use Travis CI because it's simple to use, and to demonstrate that you can integrate GitHub with Azure DevOps for a complete CI/CD process.
