Jupyter Notebook Step by Step Guide

mirandachong edited this page May 13, 2020 · 1 revision

Jupyter notebook walkthrough

You can find a Jupyter Notebook tutorial at tutorials/sklearn/catwalk_sklearn_tutorial

This tutorial is for:

  • Data scientists with a basic understanding of developing models in Jupyter Notebook.
  • Data scientists learning to prepare machine learning models for deployment.

Step by step guide:

  1. Open Jupyter Notebook and load the tutorial notebook.
  2. Run the cells one by one. The first cell installs the dependencies. If they are already installed, you will see Requirement already satisfied; otherwise, wait for the packages to download and install, which should take only a few minutes.
  3. Load the example dataset. The cell should return the following output:
     Number of training examples: 422
     Number of testing examples: 20
  4. Train a very simple linear regression model using the default parameters. You should see the output:
     LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
  5. The model evaluation step shows the Coefficients, Mean squared error, and Coefficient of determination.
  6. Plot the result, which should look like this: [Figure: linear regression example plot]
  7. Save the model properties and structure in pickle format.
  8. The model itself is saved in .py format; in the tutorial it is saved as model.py.
  9. Save the model metadata in YAML format. It contains the model's name, version, contact information, and validation information.
  10. In this tutorial, requirements.txt contains only the package sklearn; you can include any other packages that you used in development.
  11. Test the model and server to check that the implementation behaves as expected. Both tests should return OK.
  12. In a separate terminal, navigate to catwalk/tutorials/sklearn and start the catwalk server with $ catwalk serve --debug. When the server is ready, you will see the message * Running on http://0.0.0.0:9090/.
  13. Run the last two cells in the notebook, which request and return the model metadata and send a value to the model for prediction. The outputs are returned in JSON format.
  14. Stop the server in the terminal with CTRL-C.
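
For readers who want to see what the dataset-loading, training, and evaluation steps above roughly amount to outside the notebook, here is a minimal sketch. It assumes the example dataset is scikit-learn's bundled diabetes data with the last 20 rows held out for testing; that assumption matches the 422/20 split reported above, but the notebook itself is the authoritative source. The choice of a single feature column is also an assumption, made here only so the fit could be drawn as a 2-D plot.

```python
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Assumption: the tutorial uses scikit-learn's bundled diabetes dataset.
X, y = datasets.load_diabetes(return_X_y=True)

# Assumption: keep a single feature (column 2) so the result is easy to plot.
X = X[:, [2]]

# Hold out the last 20 samples for testing; the rest are used for training.
X_train, X_test = X[:-20], X[-20:]
y_train, y_test = y[:-20], y[-20:]
print("Number of training examples:", len(X_train))
print("Number of testing examples:", len(X_test))

# Train a linear regression model with the default parameters.
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on the held-out samples.
y_pred = model.predict(X_test)
print("Coefficients:", model.coef_)
print("Mean squared error: %.2f" % mean_squared_error(y_test, y_pred))
print("Coefficient of determination: %.2f" % r2_score(y_test, y_pred))
```

The printed representation of the trained model varies between scikit-learn versions, so it may not match the LinearRegression(...) output shown in the steps exactly.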

Folder structure

After running the tutorial notebook, you will see the following items in your sklearn folder:

  • catwalk_sklearn_tutorial.ipynb - the tutorial notebook
  • docker-compose.yml - the Docker Compose file containing the Docker configuration
  • Dockerfile - the file containing all the commands and information needed to assemble an image
  • model.pkl - the pickle file you generated, which contains the model artifact
  • model.py - the Python model
  • model.yml - the model metadata
  • requirements.txt - the Python packages required to run the model
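
As an illustration of what the model.pkl entry in the list above holds, the following sketch pickles a trained scikit-learn model and loads it back. It is not the notebook's code: the tiny dataset is invented for the example, and the file is written to a temporary directory rather than to the sklearn folder so that nothing in the tutorial is overwritten.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data following y = 2x + 1 (not the tutorial's dataset).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
model = LinearRegression().fit(X, y)

# Write the trained model to a pickle file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Load the model back and confirm it predicts the same values.
with open(path, "rb") as f:
    restored = pickle.load(f)

same = np.allclose(model.predict(X), restored.predict(X))
print("Predictions match after reload:", same)
```

A pickle file should only be loaded from a trusted source, and ideally with the same scikit-learn version that produced it, since unpickling can execute arbitrary code and the format is not guaranteed to be stable across versions.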