Dr. Rob Lambert, PhD edited this page May 14, 2020 · 31 revisions

What is catwalk?

Simply put, catwalk is a model wrapping and serving platform (hence the name) for your Python data science models. It provides a simple, automated way to wrap a generic Python-based model in a production-ready, Dockerised REST API server and to test the result.

The catwalk package is made for:

  • Data scientists looking for an efficient and effective way to productionise their models,
  • Data engineers who want to build, maintain and test production data pipelines, and
  • Infrastructure engineers exploring ways to deploy data science models to production.

What does catwalk do?

  • Makes it quick and easy for data scientists to get their models to production
  • Ensures robustness through thorough testing (of the model, server, Docker image, and input/output data)
  • Handles productionisation of models automatically, using standardised best practices
  • Lets models be versioned and built into deployment-ready, secure and scalable Docker images

This is done through the catwalk command line tool:

  • catwalk test-model tests the model against test data and I/O schema
  • catwalk serve wraps the model and creates a REST API that validates model input and output
  • catwalk test-server tests the model server
  • catwalk build-prep creates standard build files (Dockerfile, nginx configuration, ...)
  • catwalk build builds a secure and scalable docker image
  • catwalk test-image tests the docker image
  • catwalk deploy-prep creates standard deployment files (docker-compose.yml, ...)

Using the above commands you can swiftly wrap models via a CI/CD pipeline for cloud deployment.
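
As a rough illustration of how a CI job might chain these commands, here is a minimal Python sketch. The subcommand names come from the list above; everything else is an assumption of this sketch: that each command can be run with no extra arguments from the model directory, that a non-zero exit code signals failure, and the helper names `PIPELINE_STEPS` and `run_pipeline`, which are hypothetical. `catwalk serve` is left out because it presumably starts a long-running server rather than a one-off step.

```python
import subprocess
from typing import Callable, List, Sequence

# Subcommand names are taken from the list above. Running them without any
# extra arguments is an assumption; the real CLI may need flags or paths.
PIPELINE_STEPS: List[str] = [
    "test-model",
    "test-server",
    "build-prep",
    "build",
    "test-image",
    "deploy-prep",
]


def run_pipeline(
    steps: Sequence[str] = PIPELINE_STEPS,
    runner: Callable[[List[str]], int] = subprocess.call,
) -> List[str]:
    """Run each catwalk step in order and stop at the first failure.

    `runner` receives the command as a list and returns its exit code, so a
    fake runner can be injected for dry runs or tests.
    Returns the list of steps that completed successfully.
    """
    completed: List[str] = []
    for step in steps:
        exit_code = runner(["catwalk", step])
        if exit_code != 0:
            # A failing test or build step should fail the whole CI job.
            raise RuntimeError(f"catwalk {step} failed with exit code {exit_code}")
        completed.append(step)
    return completed
```

Calling `run_pipeline()` from the model directory would then run the steps in sequence and raise as soon as one of them fails, which is usually what a CI job needs in order to be marked as failed.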

Where does catwalk fit into the Data Science Process?

A data scientist can build their model however they wish, using any Python tools they like, and then wrap the result in catwalk. A CI pipeline can then automate the test-build-test-package-test cycle, and an engineer or CD pipeline receives a production-ready artifact to launch into production.

  • catwalk helps guide decisions on productionisation
  • catwalk helps document and package models once they are trained
  • catwalk streamlines the steps from "I have a trained model" to "I have a model ready for production" into two small files and a simple CLI.
  • catwalk sits before your production environment and is agnostic to its details (although by default it assumes REST communication between containers)
  • catwalk is agnostic to the precise CI/CD tools used (although by default it assumes Dockerisation)
  • catwalk is agnostic to the model and the training regime/environment (except that it currently assumes Python)
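
To make the idea of wrapping a model behind validated input and output more concrete, here is a small, purely illustrative Python sketch. It does not use catwalk and does not reproduce catwalk's model interface or schema format; the `Schema` type, `validate` function, `ValidatedModel` class and the toy model are all hypothetical names chosen for this example.

```python
from typing import Any, Callable, Dict

# Hypothetical schema format: field name -> expected Python type.
# This is NOT catwalk's schema format; it only illustrates the idea.
Schema = Dict[str, type]


def validate(record: Dict[str, Any], schema: Schema, label: str) -> None:
    """Raise ValueError if a record has missing, extra, or wrongly typed fields."""
    missing = set(schema) - set(record)
    extra = set(record) - set(schema)
    if missing or extra:
        raise ValueError(f"{label}: missing={sorted(missing)}, extra={sorted(extra)}")
    for name, expected in schema.items():
        if not isinstance(record[name], expected):
            raise ValueError(f"{label}: field {name!r} should be {expected.__name__}")


class ValidatedModel:
    """Wrap a predict function so every call is checked on the way in and out."""

    def __init__(
        self,
        predict: Callable[[Dict[str, Any]], Dict[str, Any]],
        input_schema: Schema,
        output_schema: Schema,
    ) -> None:
        self._predict = predict
        self.input_schema = input_schema
        self.output_schema = output_schema

    def predict(self, record: Dict[str, Any]) -> Dict[str, Any]:
        validate(record, self.input_schema, "input")
        result = self._predict(record)
        validate(result, self.output_schema, "output")
        return result


def toy_predict(record: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a trained model: price grows linearly with room count."""
    return {"price": 1000.0 * record["rooms"]}


model = ValidatedModel(toy_predict, {"rooms": int}, {"price": float})
```

With this sketch, `model.predict({"rooms": 3})` returns `{"price": 3000.0}`, while `model.predict({"rooms": "three"})` raises a `ValueError` before the model is ever called. Keeping the checks in a wrapper rather than in the model leaves the model code free of serving concerns, which is the separation the points above describe.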

How does catwalk compare to similar packages?

catwalk is heavily influenced by several industry-leading projects (Amazon SageMaker, Red Hat OpenShift S2I, Databricks MLflow and Google Kubeflow).

Feature MLflow Kubeflow catwalk
Python support ✔️ ✔️ ✔️
Other languages support ✔️
Command line tool ✔️ ✔️ ✔️
Model training ✔️ ✔️
Model testing ✔️
Model serving ✔️ ✔️ ✔️
Model I/O schema validation ✔️
SSL support ✔️ ✔️
Stateless API ✔️
Docker build ✔️ ✔️ ✔️
Model deployment ✔️

Learn more

Want to learn more about catwalk? Here, you can find some step-by-step guides:

And here you can find further explanations about different parts of catwalk:

Licensing of catwalk

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.