QuantJia/sklearn-porter


sklearn-porter

Join the chat at https://gitter.im/nok/sklearn-porter

Transpile trained scikit-learn models to C, Java, JavaScript and others.
It's recommended for resource-constrained embedded systems and critical applications where performance matters most.

Machine learning algorithms

Programming languages: C, Java, JavaScript, Go, PHP, Ruby

Classification

- sklearn.svm.SVC
- sklearn.svm.NuSVC
- sklearn.svm.LinearSVC
- sklearn.tree.DecisionTreeClassifier
- sklearn.ensemble.RandomForestClassifier
- sklearn.ensemble.ExtraTreesClassifier
- sklearn.ensemble.AdaBoostClassifier
- sklearn.neighbors.KNeighborsClassifier
- sklearn.neural_network.MLPClassifier
- sklearn.naive_bayes.GaussianNB
- sklearn.naive_bayes.BernoulliNB

Regression

- sklearn.neural_network.MLPRegressor

✓ = is full-featured, ○ = has minor exceptions

Installation

pip install sklearn-porter

If you want the latest bleeding edge changes, you can install the module from the master (development) branch:

pip uninstall -y sklearn-porter
pip install --no-cache-dir https://github.com/nok/sklearn-porter/zipball/master

Minimum requirements

- python>=2.7.3
- scikit-learn>=0.14.1

If you want to transpile a multilayer perceptron (sklearn.neural_network.MLPClassifier), you have to upgrade the scikit-learn package:

- scikit-learn>=0.18.0
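
Before exporting a multilayer perceptron, it can help to check the installed version against this requirement. The following is only a minimal sketch: `parse_version` and `supports_mlp` are hypothetical helper names, not part of sklearn-porter, and the parsing only looks at the leading numeric part of the first three version components.

```python
import re

# Minimum scikit-learn version for MLP models, taken from the requirement above.
MLP_MIN_VERSION = (0, 18, 0)


def parse_version(version):
    """Turn a string such as '0.18.1' or '0.19.dev0' into a tuple of three ints."""
    parts = []
    for piece in version.split('.')[:3]:
        match = re.match(r'\d+', piece)
        if match is None:
            # Stop at the first non-numeric component (e.g. 'dev0').
            break
        parts.append(int(match.group()))
    # Pad short versions such as '1.0' with zeros.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_mlp(version):
    """Return True if the given scikit-learn version meets the MLP requirement."""
    return parse_version(version) >= MLP_MIN_VERSION
```

With scikit-learn installed, the check could be called as `supports_mlp(sklearn.__version__)`.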

Usage

Export

The following example shows how you can port a decision tree model to Java:

from sklearn.datasets import load_iris
from sklearn import tree
from sklearn_porter import Porter

# Load data and train the classifier:
samples = load_iris()
X, y = samples.data, samples.target
clf = tree.DecisionTreeClassifier()
clf.fit(X, y)

# Export:
porter = Porter(clf, language='java')
output = porter.export()
print(output)

The exported result matches the official human-readable version of the decision tree.

Prediction

Run the prediction(s) in the target programming language directly:

# ...
porter = Porter(clf, language='java')

# Prediction(s):
Y_preds = porter.predict(X)
y_pred = porter.predict(X[0])
y_pred = porter.predict([1., 2., 3., 4.])

Accuracy

Always check how closely the predictions of the ported estimator match those of the original estimator:

# ...
porter = Porter(clf, language='java')

# Accuracy:
accuracy = porter.predict_test(X)
print(accuracy) # 1.0
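
One way to read this score is as the share of samples on which the original and the ported estimator agree. The sketch below computes that share by hand under this assumption. `prediction_agreement` is a hypothetical helper, not part of sklearn-porter, and it does not claim to reproduce what `predict_test` does internally.

```python
def prediction_agreement(original_preds, ported_preds):
    """Return the fraction of samples on which both prediction lists agree."""
    original_preds = list(original_preds)
    ported_preds = list(ported_preds)
    if len(original_preds) != len(ported_preds):
        raise ValueError('Both prediction lists must have the same length.')
    if not original_preds:
        raise ValueError('At least one prediction is required.')
    # Count the positions where both estimators predict the same class.
    matches = sum(1 for a, b in zip(original_preds, ported_preds) if a == b)
    return matches / float(len(original_preds))
```

With the objects from the examples above, the helper could be called as `prediction_agreement(clf.predict(X), porter.predict(X))`. A value below 1.0 means that at least one sample is classified differently by the ported model.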

Command-line interface

This example shows how you can port a model from the command line. First, save the trained model in pickle format:

# ...
import joblib  # on older scikit-learn releases: from sklearn.externals import joblib

# Save the estimator:
joblib.dump(clf, 'model.pkl')

After that the model can be transpiled by using the following command:

python -m sklearn_porter --input <pickle_file> [--output <destination_dir>] [--language {c,go,java,js,php,ruby}]
python -m sklearn_porter -i <pickle_file> [-o <destination_dir>] [-l {c,go,java,js,php,ruby}]

The following commands all produce the same result:

python -m sklearn_porter --input model.pkl --language java
python -m sklearn_porter -i model.pkl -l java

By changing the language parameter you can set the target programming language:

python -m sklearn_porter -i model.pkl -l c
python -m sklearn_porter -i model.pkl -l go
python -m sklearn_porter -i model.pkl -l java
python -m sklearn_porter -i model.pkl -l js
python -m sklearn_porter -i model.pkl -l php
python -m sklearn_porter -i model.pkl -l ruby
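
To port one model to several languages from a script, the commands above can be assembled programmatically. This is a minimal sketch that only builds the argument lists and does not run them. `build_command` and `LANGUAGES` are hypothetical names, and the flags are the ones shown in the usage line above.

```python
import sys

# Language choices as listed in the usage line above.
LANGUAGES = ('c', 'go', 'java', 'js', 'php', 'ruby')


def build_command(pickle_file, language, output_dir=None):
    """Build the argument list for one sklearn_porter command-line call."""
    if language not in LANGUAGES:
        raise ValueError('Unsupported language: %r' % (language,))
    command = [sys.executable, '-m', 'sklearn_porter',
               '--input', pickle_file,
               '--language', language]
    # The output directory is optional, as in the usage line.
    if output_dir is not None:
        command += ['--output', output_dir]
    return command


# One command per target language for the same pickle file.
commands = [build_command('model.pkl', language) for language in LANGUAGES]
```

Each entry in `commands` can then be passed to `subprocess.check_call` to run the actual transpilation.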

Use the --help parameter to show further information:

python -m sklearn_porter --help
python -m sklearn_porter -h

Development

Environment

Install the required environment modules by executing the script environment.sh:

./recipes/environment.sh
conda config --add channels conda-forge
conda env create -n sklearn-porter python=2 -f environment.yml
source activate sklearn-porter

Furthermore, Node.js (>=6), Java (>=1.6), PHP (>=7), Ruby (>=1.9.3) and GCC (>=4.2) are required to run all tests.

Testing

The tests cover module functions as well as matching predictions of transpiled models. Run all tests by executing the script test.sh:

./recipes/test.sh
source activate sklearn-porter
python -m unittest discover -vp '*Test.py'
source deactivate

While you are developing new features or fixes, you can reduce the test duration by setting the number of random model tests:

N_RANDOM_FEATURE_SETS=15 N_EXISTING_FEATURE_SETS=30 python -m unittest discover -vp '*Test.py'
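
The same settings can be prepared from Python, for example in a small launcher script. This is a minimal sketch that only builds the environment. `test_environment` is a hypothetical helper, and the two variable names are the ones from the command above.

```python
import os


def test_environment(n_random=15, n_existing=30, base_env=None):
    """Return a copy of the environment with the two test-size variables set."""
    env = dict(os.environ if base_env is None else base_env)
    # Environment variables must be strings.
    env['N_RANDOM_FEATURE_SETS'] = str(n_random)
    env['N_EXISTING_FEATURE_SETS'] = str(n_existing)
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.call` together with the `python -m unittest discover -vp '*Test.py'` arguments.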

Quality

Keeping the code quality high is strongly recommended. I use Pylint for that, which you can run by executing the script lint.sh:

./recipes/lint.sh
source activate sklearn-porter
find ./sklearn_porter -name '*.py' -exec pylint {} \;
source deactivate

Questions?

Don't be shy and feel free to contact me on Twitter or Gitter.

License

The module is Open Source Software released under the MIT license.
