Building your own model
Models should be developed independently in secure development environments. Model servers run in the catwalk base image, a Python 3.7 environment, so it is recommended to use that Python version when developing the model.
Models have two requirements:
- A specification in model.yml
- An implementation in model.py
Models may also contain an optional requirements.txt file to manage pip dependencies.
Note: At this time catwalk only supports dependencies installable via pip.
Models need a specification inside a model.yml file, containing:
name: "Model name (str)"
version: "Model version (str)"
contact:
  name: "Contact name (str)"
  email: "Contact email (str)"
schema:
  input: "The input schema of the model in OpenAPI format (object / array)"
  output: "The output schema of the model in OpenAPI format (object / array)"
This specification file is used to validate the incoming data posted to the server, as well as to form the docker tag from the model name and version.
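To make the role of the specification concrete, here is a minimal sketch of the two uses described above: checking that the expected fields are present and deriving a tag from the name and version. The spec is written as the Python dict that a model.yml would parse to, and all values in it are made up. The helpers `missing_fields` and `docker_tag` are hypothetical and not part of catwalk; in particular, the tag format shown is an assumption, since the actual format catwalk uses is not stated here.

```python
import re

# A model.yml specification, written here as the dict it would parse to.
# Every value below is invented for illustration.
spec = {
    "name": "Random Number Generator",
    "version": "0.1.0",
    "contact": {"name": "Jane Doe", "email": "jane@example.com"},
    "schema": {
        "input": {"type": "object", "properties": {"seed": {"type": "integer"}}},
        "output": {"type": "object", "properties": {"value": {"type": "number"}}},
    },
}

REQUIRED = ("name", "version", "contact", "schema")


def missing_fields(spec):
    """Return the top-level and nested keys that the specification lacks."""
    missing = [key for key in REQUIRED if key not in spec]
    missing += [f"contact.{k}" for k in ("name", "email")
                if k not in spec.get("contact", {})]
    missing += [f"schema.{k}" for k in ("input", "output")
                if k not in spec.get("schema", {})]
    return missing


def docker_tag(spec):
    """Assumed tag format: lower-cased, dash-separated name plus the version."""
    slug = re.sub(r"[^a-z0-9]+", "-", spec["name"].lower()).strip("-")
    return f"{slug}:{spec['version']}"


print(missing_fields(spec))  # an empty list means nothing is missing
print(docker_tag(spec))
```

A check like this can be run locally before building, so that a missing contact or schema entry is caught early.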
Models must be implemented in a model.py file, in a single class called Model. This is the interface:
class Model(object):
    """The Model knows how to load itself, provides test data and runs with `Model::predict`.
    """

    def __init__(self, path="."):
        """The Model constructor.

        Use this to initialise your model, including loading any weights etc.

        :param str path: The full path to the folder in which the model is located.
        """
        pass

    def load_test_data(self, path=".") -> (list, list):
        """Loads and returns test data.

        Format of the returned data is similar to pd.DataFrame.records, a list of key-value pairs.

        :param str path: The full path to the folder in which the model is located.
        :return: Tuple of feature, target lists.
        """
        pass

    def predict(self, X) -> dict:
        """Uses the model to predict a value.

        :param dict X: The features to predict against
        :return: The prediction result
        """
        pass
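As a rough illustration of the interface, the following sketch fills in the three methods with a toy model that returns a pseudo-random number in a requested range. It is not the repo's own example model; the field names `low`, `high` and `value` are assumptions chosen for this sketch.

```python
import random


class Model(object):
    """A toy model that returns a pseudo-random number in a requested range."""

    def __init__(self, path="."):
        # A real model would load weights from `path`; here we only seed an RNG.
        self.path = path
        self.rng = random.Random(42)

    def load_test_data(self, path=".") -> (list, list):
        # Records-style test data: one feature dict and one target dict per row.
        X_test = [{"low": 0, "high": 1}, {"low": 5, "high": 10}]
        y_test = [{"value": 0.5}, {"value": 7.5}]
        return X_test, y_test

    def predict(self, X) -> dict:
        # X is a single record of the form {"low": ..., "high": ...}.
        return {"value": self.rng.uniform(X["low"], X["high"])}


if __name__ == "__main__":
    model = Model()
    X_test, y_test = model.load_test_data()
    for features in X_test:
        print(features, "->", model.predict(features))
```

Keeping all loading work in the constructor means `predict` stays cheap to call once per request.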
Example models are included in this repo for reference and convenience. Run them with your local Python interpreter:
$ cd example_models/rng
$ python model.py
The pandas DataFrame is the go-to tool for many a pythonic Data Scientist.
To add support for DataFrames in the Model.predict() method, specify io_type: PANDAS_DATA_FRAME in the model.yml.
This will ensure that the X argument passed in is a pre-constructed DataFrame.
Note that you must return a DataFrame from the Model.predict() method as well!
Important points:
- The model's IO schema can either be in "records" format ([{column -> value}, … , {column -> value}]) or simplified to a single record ({column -> value}).
- Similarly, the server will accept input data in both these formats.
- pandas must be installed by a model's requirements.txt, to avoid binary or API incompatibilities between versions.
See example_models/dataframe for an example.
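The two accepted input shapes, a single record and a list of records, can be reconciled with a small helper. The sketch below shows one way this normalisation might look in plain Python; `to_records` is a hypothetical name and is not how catwalk necessarily does it internally. The column names are invented.

```python
def to_records(payload):
    """Normalise a single record or a list of records to a list of records."""
    if isinstance(payload, dict):
        # Single record: {column -> value}
        return [payload]
    if isinstance(payload, list) and all(isinstance(row, dict) for row in payload):
        # Already in records format: [{column -> value}, ...]
        return payload
    raise TypeError("expected a record (dict) or a list of records")


single = {"age": 31, "height": 1.8}
many = [{"age": 31, "height": 1.8}, {"age": 45, "height": 1.7}]

print(to_records(single))
print(to_records(many))
```

A list of records in this shape is also what the pandas DataFrame constructor accepts directly, which is why the records format is a convenient common denominator.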
In a CI/CD pipeline, Models are unit tested before they can be safely wrapped and deployed.
Note that a model may have its own set of requirements, which will be installed by catwalk in these tests.
$ catwalk test-model --model-path /path/to/your/model
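Before handing a model to the command above, it can help to run a quick local smoke test against the same interface. The following is only a sketch of the kind of check one might write with the standard unittest module; it does not claim to reproduce what catwalk test-model does. The `Model` class here is a stand-in so the sketch runs on its own, and `ModelSmokeTest` and `run_smoke_tests` are hypothetical names.

```python
import unittest


class Model(object):
    """Stand-in model so the sketch is runnable; swap in the class from your model.py."""

    def __init__(self, path="."):
        self.path = path

    def load_test_data(self, path="."):
        return [{"x": 1}, {"x": 2}], [{"y": 2}, {"y": 4}]

    def predict(self, X):
        return {"y": 2 * X["x"]}


class ModelSmokeTest(unittest.TestCase):
    def setUp(self):
        self.model = Model(path=".")
        self.X_test, self.y_test = self.model.load_test_data(path=".")

    def test_features_and_targets_align(self):
        # Every feature record should have a matching target record.
        self.assertEqual(len(self.X_test), len(self.y_test))

    def test_predict_returns_expected_keys(self):
        # Each prediction should be a dict with the same keys as its target.
        for features, target in zip(self.X_test, self.y_test):
            prediction = self.model.predict(features)
            self.assertIsInstance(prediction, dict)
            self.assertEqual(set(prediction), set(target))


def run_smoke_tests():
    """Run the smoke tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ModelSmokeTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_smoke_tests().wasSuccessful()` gives a single boolean that a local script can act on.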
Copyright 2020 Leap Beyond Emerging Technologies B.V. (CC BY 4.0 )