Transpile trained scikit-learn estimators to C, Java, JavaScript and others.
It's recommended for resource-constrained embedded systems and critical applications where runtime performance matters most.
Algorithm | Programming language | |||||
Classifier | Java * | JS | C | Go | PHP | Ruby |
svm.SVC | ✓, ✓ ᴵ | ✓ | ✓ | ✓ | ✓ | |
svm.NuSVC | ✓, ✓ ᴵ | ✓ | ✓ | ✓ | ✓ | |
svm.LinearSVC | ✓, ✓ ᴵ | ✓ | ✓ | ✓ | ✓ | ✓ |
tree.DecisionTreeClassifier | ✓, ✓ ᴱ, ✓ ᴵ | ✓, ✓ ᴱ | ✓, ✓ ᴱ | ✓, ✓ ᴱ | ✓, ✓ ᴱ | ✓, ✓ ᴱ |
ensemble.RandomForestClassifier | ✓ ᴱ, ✓ ᴵ | ✓ ᴱ | ✓ ᴱ | ✓ ᴱ | ✓ ᴱ | |
ensemble.ExtraTreesClassifier | ✓ ᴱ, ✓ ᴵ | ✓ ᴱ | ✓ ᴱ | ✓ ᴱ | ✓ ᴱ | |
ensemble.AdaBoostClassifier | ✓ ᴱ, ✓ ᴵ | ✓ ᴱ, ✓ ᴵ | ✓ ᴱ | |||
neighbors.KNeighborsClassifier | ✓, ✓ ᴵ | ✓, ✓ ᴵ | ||||
naive_bayes.GaussianNB | ✓, ✓ ᴵ | ✓ | ||||
naive_bayes.BernoulliNB | ✓, ✓ ᴵ | ✓ | ||||
neural_network.MLPClassifier | ✓, ✓ ᴵ | ✓, ✓ ᴵ | ||||
Regressor | ||||||
neural_network.MLPRegressor | ✓ |
✓ = full-featured, ᴱ = with embedded model data, ᴵ = with imported model data, * = default language
$ pip install sklearn-porter
If you want the latest changes, you can install the module from the master branch:
$ pip uninstall -y sklearn-porter
$ pip install --no-cache-dir https://github.com/nok/sklearn-porter/zipball/master
- python>=2.7.3
- scikit-learn>=0.14.1
If you want to transpile a multilayer perceptron, you have to upgrade the scikit-learn package:
- python>=2.7.3
- scikit-learn>=0.18.0
The following example shows how you can port a decision tree estimator to Java:
from sklearn.datasets import load_iris
from sklearn import tree
from sklearn_porter import Porter
# Load data and train the classifier:
samples = load_iris()
X, y = samples.data, samples.target
clf = tree.DecisionTreeClassifier()
clf.fit(X, y)
# Export:
porter = Porter(clf, language='java')
output = porter.export(embed_data=True)
print(output)
The exported result matches the official human-readable version of the decision tree.
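To illustrate what "embedded model data" means for a decision tree, here is a minimal sketch that walks tree arrays and emits nested if/else source with the thresholds written directly into the code. The array names mirror the attributes of scikit-learn's fitted `tree_` object, but the function `emit_tree`, the simplified per-node class counts, and the toy values are assumptions made for this illustration; this is not the template sklearn-porter actually uses.

```python
def emit_tree(children_left, children_right, feature, threshold, value,
              node=0, depth=1):
    """Return Java-like nested if/else source for the subtree at `node`."""
    indent = "    " * depth
    # In scikit-learn's tree arrays a leaf is marked by a child index of -1.
    if children_left[node] == -1:
        counts = value[node]
        predicted = counts.index(max(counts))  # class with the most samples
        return "{}return {};\n".format(indent, predicted)
    source = "{}if (features[{}] <= {}) {{\n".format(
        indent, feature[node], threshold[node])
    source += emit_tree(children_left, children_right, feature, threshold,
                        value, children_left[node], depth + 1)
    source += "{}}} else {{\n".format(indent)
    source += emit_tree(children_left, children_right, feature, threshold,
                        value, children_right[node], depth + 1)
    source += "{}}}\n".format(indent)
    return source


# Hand-written toy tree (hypothetical values): one split, two leaves.
children_left = [1, -1, -1]
children_right = [2, -1, -1]
feature = [2, -2, -2]
threshold = [2.45, -2.0, -2.0]
value = [[50, 100], [50, 0], [0, 100]]  # simplified class counts per node

java_body = emit_tree(children_left, children_right, feature, threshold, value)
print(java_body)
```

Because every threshold ends up as a literal in the generated source, the result needs no separate model file at runtime, which is the trade-off the ᴱ marker in the table refers to.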
Run the prediction(s) in the target programming language directly:
# ...
porter = Porter(clf, language='java')
# Prediction(s):
Y_java = porter.predict(X)
y_java = porter.predict(X[0])
y_java = porter.predict([1., 2., 3., 4.])
Always compute and test the integrity between the original and the transpiled estimator:
# ...
porter = Porter(clf, language='java')
# Accuracy:
integrity = porter.integrity_score(X)
print(integrity) # 1.0
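Conceptually, such a check compares the predictions of the original estimator with those of the transpiled one and reports the share of samples on which they agree. The sketch below shows that idea in plain Python; the function name `agreement_ratio` and the sample predictions are hypothetical, and the library's own computation may differ in detail.

```python
def agreement_ratio(original_predictions, ported_predictions):
    """Fraction of samples on which both estimators predict the same class."""
    if len(original_predictions) != len(ported_predictions):
        raise ValueError("Both prediction lists must have the same length.")
    if len(original_predictions) == 0:
        raise ValueError("At least one prediction is required.")
    matches = sum(1 for a, b in zip(original_predictions, ported_predictions)
                  if a == b)
    return matches / float(len(original_predictions))


# Hypothetical predictions: the ported estimator disagrees on one of four samples.
y_python = [0, 1, 2, 2]
y_ported = [0, 1, 2, 1]
score = agreement_ratio(y_python, y_ported)
print(score)  # 0.75
```

A value below 1.0 means that at least one sample is classified differently, for example because of floating-point differences in the target language.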
First, take a quick look at the available arguments:
$ python -m sklearn_porter [-h] --input <PICKLE_FILE> [--output <DEST_DIR>] \
[--class_name <CLASS_NAME>] [--method_name <METHOD_NAME>] \
[--c] [--java] [--js] [--go] [--php] [--ruby] \
[--export] [--checksum] [--data] [--pipe]
The following example shows how you can save a trained estimator in the pickle format:
# ...
import joblib  # older scikit-learn releases: from sklearn.externals import joblib
# Save the estimator:
joblib.dump(clf, 'estimator.pkl')
After that the estimator can be transpiled to JavaScript by using the following command:
$ python -m sklearn_porter -i estimator.pkl --js
The target programming language is changeable on the fly:
$ python -m sklearn_porter -i estimator.pkl --c
$ python -m sklearn_porter -i estimator.pkl --go
$ python -m sklearn_porter -i estimator.pkl --php
$ python -m sklearn_porter -i estimator.pkl --java
$ python -m sklearn_porter -i estimator.pkl --ruby
For further processing, the argument --pipe can be used to pass the result on:
$ python -m sklearn_porter -i estimator.pkl --js --pipe > estimator.js
For instance the result can be minified by using UglifyJS:
$ python -m sklearn_porter -i estimator.pkl --js --pipe | uglifyjs --compress -o estimator.min.js
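The same piping can be driven from a Python script. The following sketch only uses the flags shown above (`-i`, the language switches and `--pipe`); the helper names `build_porter_command` and `transpile` are hypothetical, and `transpile` assumes that sklearn-porter is installed in the current environment.

```python
import subprocess
import sys


def build_porter_command(pickle_path, language):
    """Build the argument list for `python -m sklearn_porter ... --pipe`."""
    supported = ("c", "java", "js", "go", "php", "ruby")
    if language not in supported:
        raise ValueError("Unsupported language: {}".format(language))
    return [sys.executable, "-m", "sklearn_porter",
            "-i", pickle_path, "--" + language, "--pipe"]


def transpile(pickle_path, language):
    """Run the command and return the transpiled source as text."""
    # Requires sklearn-porter to be installed; not executed in this sketch.
    output = subprocess.check_output(build_porter_command(pickle_path, language))
    return output.decode("utf-8")


command = build_porter_command("estimator.pkl", "js")
print(" ".join(command[1:]))
```

Keeping the command construction separate from the execution makes it easy to inspect or log the exact call before running it.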
Further information is shown by using the --help argument:
$ python -m sklearn_porter --help
$ python -m sklearn_porter -h
Install the required environment modules by executing the script environment.sh:
$ bash ./scripts/environment.sh
#!/usr/bin/env bash
conda env create -c conda-forge -n sklearn-porter python=2 -f environment.yml
source activate sklearn-porter
The following compilers or interpreters are required to cover all tests:
The tests cover module functions as well as matching predictions of transpiled estimators. Run all tests by executing the script test.sh:
$ bash ./scripts/test.sh
#!/usr/bin/env bash
python -m unittest discover -vp '*Test.py'
The test files follow a specific naming pattern, '[Algorithm][Language]Test.py', so you can run a subset of them:
$ python -m unittest discover -vp 'RandomForest*Test.py'
$ python -m unittest discover -vp '*JavaTest.py'
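The `-p` option of `unittest discover` matches file names with shell-style wildcards. The sketch below applies the two patterns above to a list of file names using the standard `fnmatch` module; the file names are hypothetical examples that merely follow the naming pattern.

```python
from fnmatch import fnmatch

# Hypothetical file names following the '[Algorithm][Language]Test.py' pattern.
test_files = [
    "RandomForestClassifierJavaTest.py",
    "RandomForestClassifierJSTest.py",
    "DecisionTreeClassifierJavaTest.py",
    "SVCGoTest.py",
]


def select(pattern, file_names):
    """Return the file names that match the shell-style pattern."""
    return [name for name in file_names if fnmatch(name, pattern)]


random_forest_tests = select("RandomForest*Test.py", test_files)
java_tests = select("*JavaTest.py", test_files)
print(random_forest_tests)
print(java_tests)
```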
While you are developing new features or fixes, you can reduce the test duration by setting the number of tests:
$ N_RANDOM_FEATURE_SETS=15 N_EXISTING_FEATURE_SETS=30 python -m unittest discover -vp '*Test.py'
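Inside a test module, such variables could be read as sketched below. The helper `read_test_count` and the fallback value of 30 are assumptions for illustration; the defaults in the actual test suite may differ.

```python
import os


def read_test_count(name, default):
    """Read a non-negative integer from the environment, or use the default."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    count = int(raw)
    if count < 0:
        raise ValueError("{} must not be negative.".format(name))
    return count


# Simulate `N_RANDOM_FEATURE_SETS=15` from the shell and leave the other unset.
os.environ["N_RANDOM_FEATURE_SETS"] = "15"
os.environ.pop("N_EXISTING_FEATURE_SETS", None)

n_random = read_test_count("N_RANDOM_FEATURE_SETS", 30)
n_existing = read_test_count("N_EXISTING_FEATURE_SETS", 30)
print(n_random, n_existing)
```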
Checking the code quality is highly recommended. I use Pylint for that, which you can run by executing the script lint.sh:
$ bash ./scripts/lint.sh
#!/usr/bin/env bash
find ./sklearn_porter -name '*.py' -exec pylint {} \;
If you use this implementation in your work, please add a reference/citation to the paper. You can use the following BibTeX entry:
@misc{SkPoDaMo,
author = {Darius Morawiec},
title = {sklearn-porter: Transpile trained scikit-learn estimators to C, Java, JavaScript and others},
url = {https://github.com/nok/sklearn-porter},
year = {2016--2017}
}
The module is Open Source Software released under the MIT license.
Don't be shy and feel free to contact me on Twitter or Gitter.