A set of tools and examples to run machine learning tests on ML hardware accelerators (TPUs or GPUs) using Google Cloud Platform.
This is not an officially supported Google product.
In this mode, your tests and/or models run on an automated schedule in GKE. Results are collected by the "Metrics Handler" and written to BigQuery.
This route is recommended if you have many tests that run for a long time and produce many metrics that you want to monitor for regressions.
- Install all of our development prerequisites.
- Follow instructions in the deployments directory to set up a Kubernetes Cluster.
- Follow instructions in the images directory to set up the Docker image that your tests will run.
- Deploy the metrics handler to Google Cloud Functions.
- Set up your tests. Here you have 1 of 2 choices:
- (Optional) Set up a dashboard to view test results. See the dashboard directory for instructions.
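To make "monitoring metrics for regressions" concrete, here is a minimal Python sketch of one common approach: compare the latest value of a metric against the mean and standard deviation of its recent history. This is an illustration only, not the Metrics Handler's actual logic. The function name `is_regression`, its parameters, and the sample accuracy values are all hypothetical.

```python
from statistics import mean, stdev


def is_regression(history, latest, num_stddevs=3.0, higher_is_better=True):
    """Return True if `latest` falls outside the tolerated band around `history`.

    Hypothetical helper for illustration; not part of this project.
    """
    if len(history) < 2:
        # Not enough data points to estimate the spread of the metric.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if higher_is_better:
        # For metrics like accuracy, a regression is a drop below the band.
        return latest < mu - num_stddevs * sigma
    # For metrics like step time, a regression is a rise above the band.
    return latest > mu + num_stddevs * sigma


# Made-up accuracy values from previous runs of one test.
accuracy_history = [0.761, 0.759, 0.763, 0.760, 0.762]

print(is_regression(accuracy_history, 0.758))  # small dip, within the band
print(is_regression(accuracy_history, 0.700))  # large drop, outside the band
```

A band based on standard deviations tolerates normal run-to-run noise while still flagging large drops. In practice the threshold would likely need tuning per metric.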
In this mode, your tests run on GKE but are tied to a CI platform like GitHub Actions or CircleCI. Tests can run as presubmits for pending PRs, as postsubmit checks on submitted PRs, or on a timed schedule.
This route is recommended if you want some tie-in with GitHub and your tests are relatively short-running.
- Install all of our development prerequisites.
- Follow instructions in the deployments directory to set up a Kubernetes Cluster.
- See the ci_pytorch directory for the last few setup steps.
Are you interested in using ML Testing Accelerators? E-mail [email protected] and tell us about your use-case. We're happy to help you get started.