Skip to content

Commit

Permalink
ONNX Interface (#39)
Browse files Browse the repository at this point in the history
* Draft Onnx TLearner

* Add tests

* Fix parameters to avoid some warnings

* Add test for categorical features

* Rename variable

* Implement onnx XLearner and tests

* Refactor

* Fix import location

* Implement onnx RLearner and tests

* Fix atol

* Remove skip as fixed

* Remove old code

* Refactor to not overwrite input

* Update metalearners/_utils.py

Co-authored-by: Kevin Klein <[email protected]>

* Update metalearners/_utils.py

Co-authored-by: Kevin Klein <[email protected]>

* Update metalearners/_utils.py

Co-authored-by: Kevin Klein <[email protected]>

* Update metalearners/_utils.py

Co-authored-by: Kevin Klein <[email protected]>

* Update metalearners/_utils.py

Co-authored-by: Kevin Klein <[email protected]>

* Update metalearners/_utils.py

Co-authored-by: Kevin Klein <[email protected]>

* Consistent docs

* Add return type

* Implement onnx DRLearner and tests

* Increase radius to avoid nan

* Move data generation to fixture

* Abstract method and raise error for SLearner

* Fix

* Add docstrings

* Fix spacing

* Update metalearners/rlearner.py

Co-authored-by: Kevin Klein <[email protected]>

* Update metalearners/rlearner.py

Co-authored-by: Kevin Klein <[email protected]>

* Fix variable names

* Update metalearners/metalearner.py

Co-authored-by: Kevin Klein <[email protected]>

* Update metalearners/metalearner.py

Co-authored-by: Kevin Klein <[email protected]>

* Update metalearners/metalearner.py

Co-authored-by: Kevin Klein <[email protected]>

* Update metalearners/metalearner.py

Co-authored-by: Kevin Klein <[email protected]>

* Fix variable names

* Reuse ONNX_PROBABILITIES_OUTPUTS

* Rename _validate_feature_set_all and error

* Update tests/test_rlearner.py

Co-authored-by: Kevin Klein <[email protected]>

* Use itertools.repeat

* Implement necessary_onnx_models

* Raise error multiclass

* Fix pchs

* Add docstring

* Add warning in docstring and build_onnx

* Change import

* Add tutorial

* Modify error msg

* Update CHANGELOG

* Update lock file

* Update metalearners/drlearner.py

Co-authored-by: Christian Bourjau <[email protected]>

* Use const instead of constant

* Update docs/examples/example_onnx.ipynb

Co-authored-by: Christian Bourjau <[email protected]>

* Only check spox installed as depends on onnx

* Avoid indexing

* Fix XLearner float value

* Make methods private

* Clarify what _build_onnx does

* Fix windows errors

* Typo

* Refactor code

* Fix pchs

* Add note about lightgbm

* Fix text

* Update docs/examples/example_onnx.ipynb

Co-authored-by: Christian Bourjau <[email protected]>

* Update docs/examples/example_onnx.ipynb

Co-authored-by: Christian Bourjau <[email protected]>

* Edit.

* Add netron visualization.

* Remove state.

* Add return type annotation.

* Adapt phrasing of doc string and error message.

* Use md instead of html for image.

---------

Co-authored-by: Kevin Klein <[email protected]>
Co-authored-by: Christian Bourjau <[email protected]>
Co-authored-by: kklein <[email protected]>
  • Loading branch information
4 people authored Aug 2, 2024
1 parent 6d8bc68 commit b0be07c
Show file tree
Hide file tree
Showing 19 changed files with 5,536 additions and 2,591 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.rst
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,11 @@ Changelog

* Add :class:`metalearners.utils.FixedBinaryPropensity`.

* Added ``_build_onnx`` to :class:`metalearners.MetaLearner` abstract class and implement it
for :class:`metalearners.TLearner`, :class:`metalearners.XLearner`, :class:`metalearners.RLearner`
and :class:`metalearners.DRLearner`.

* Added ``_necessary_onnx_models`` to :class:`metalearners.MetaLearner`.

0.8.0 (2024-07-22)
------------------
Expand Down
380 changes: 380 additions & 0 deletions docs/examples/example_onnx.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,380 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Converting a MetaLearner to ONNX\n",
"\n",
"```{warning}\n",
"This is a experimental feature which is not subject to deprecation cycles. Use\n",
"it at your own risk!\n",
"```\n",
"\n",
"ONNX is an open standard for representing trained machine learning models.\n",
"By converting a Metalearner\n",
"into an ONNX model, it becomes easier to leverage the model in different environments without\n",
"needing to worry about compatibility or performance issues.\n",
"\n",
"In particular, this conversion also allows models to be run on a variety of hardware setups. Also, ONNX\n",
"models are optimized for efficient computation, enabling faster inference compared to\n",
"the Python interface.\n",
"\n",
"For more information about ONNX, you can check the ONNX [website](https://onnx.ai/).\n",
"\n",
"In this example we will show how most MetaLearners can be converted to ONNX.\n",
"\n",
"## Installation\n",
"\n",
"In order to convert a MetaLearner to ONNX, we first need to install the following packages:\n",
"\n",
"* [onnx](https://github.com/onnx/onnx)\n",
"* [onnxmltools](https://github.com/onnx/onnxmltools)\n",
"* [onnxruntime](https://github.com/microsoft/onnxruntime)\n",
"* [spox](https://github.com/Quantco/spox)\n",
"\n",
"We can do so either via conda and conda-forge:\n",
"\n",
"```console\n",
"$ conda install onnx onnxmltools onnxruntime spox -c conda-forge\n",
"```\n",
"\n",
"or via pip and PyPI\n",
"\n",
"```console\n",
"$ pip install onnx onnxmltools onnxruntime spox\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Usage\n",
"\n",
"```{warning}\n",
"It is important to notice that this method only works for {class}`~metalearners.TLearner`,\n",
"{class}`~metalearners.XLearner`, {class}`~metalearners.RLearner` and {class}`~metalearners.DRLearner`.\n",
"Converting an {class}`~metalearners.SLearner` is highly dependent on the fact that the base\n",
"model supports categorical variables or not and it is not implemented yet. \n",
"```\n",
"\n",
"### Loading the data\n",
"\n",
"Just like in our {ref}`example on estimating CATEs with a MetaLearner <example-basic>`,\n",
"we will first load some experiment data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from pathlib import Path\n",
"from git_root import git_root\n",
"\n",
"df = pd.read_csv(git_root(\"data/learning_mindset.zip\"))\n",
"outcome_column = \"achievement_score\"\n",
"treatment_column = \"intervention\"\n",
"feature_columns = [\n",
" column for column in df.columns if column not in [outcome_column, treatment_column]\n",
"]\n",
"categorical_feature_columns = [\n",
" \"ethnicity\",\n",
" \"gender\",\n",
" \"frst_in_family\",\n",
" \"school_urbanicity\",\n",
" \"schoolid\",\n",
"]\n",
"# Note that explicitly setting the dtype of these features to category\n",
"# allows both lightgbm as well as shap plots to\n",
"# 1. Operate on features which are not of type int, bool or float\n",
"# 2. Correctly interpret categoricals with int values to be\n",
"# interpreted as categoricals, as compared to ordinals/numericals.\n",
"for categorical_feature_column in categorical_feature_columns:\n",
" df[categorical_feature_column] = df[categorical_feature_column].astype(\"category\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we've loaded the experiment data, we can train a MetaLearner.\n",
"\n",
"\n",
"### Training a MetaLearner\n",
"\n",
"Again, mirroring our {ref}`example on estimating CATEs with a MetaLearner\n",
"<example-basic>`, we can train an\n",
"{class}`~metalearners.XLearner` as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from metalearners import XLearner\n",
"from lightgbm import LGBMRegressor, LGBMClassifier\n",
"\n",
"xlearner = XLearner(\n",
" nuisance_model_factory=LGBMRegressor,\n",
" propensity_model_factory=LGBMClassifier,\n",
" treatment_model_factory=LGBMRegressor,\n",
" is_classification=False,\n",
" n_variants=2,\n",
" nuisance_model_params={\"n_estimators\": 5, \"verbose\": -1},\n",
" propensity_model_params={\"n_estimators\": 5, \"verbose\": -1},\n",
" treatment_model_params={\"n_estimators\": 5, \"verbose\": -1},\n",
" n_folds=2,\n",
")\n",
"\n",
"xlearner.fit(\n",
" X=df[feature_columns],\n",
" y=df[outcome_column],\n",
" w=df[treatment_column],\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{note}\n",
"In this example, we used all ``lightgbm`` models because these are the only type of models\n",
"that we managed to get to work with categorical encodings from ``pandas``\n",
"while also being convertible to ONNX. Other ``sklearn`` models which support categoricals such as\n",
"``HistGradientBoostingRegressor`` or ``xgboost`` models do not have support for them\n",
"in their conversion to ONNX. See [this issue](https://github.com/onnx/sklearn-onnx/issues/1051)\n",
"and [this comment](https://github.com/onnx/onnxmltools/issues/469#issuecomment-1993880910).\n",
"```\n",
"\n",
"### Converting the base models to ONNX\n",
"\n",
"Before being able to convert the MetaLearner to ONXX we need to manually convert the necessary\n",
"base models for the prediction. To get the necessary base models that need to be\n",
"converted we can use {meth}`~metalearners.metalearner.MetaLearner._necessary_onnx_models`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"necessary_models = xlearner._necessary_onnx_models()\n",
"necessary_models"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that we need to convert the ``\"propensity_model\"``, the ``\"control_effect_model\"``\n",
"and the ``\"treatment_effect_model\"``. We can do this with the following code where we\n",
"use the ``convert_lightgbm`` function from the ``onnxmltools`` package.\n",
"\n",
"```{note}\n",
"It is important to know that for classifiers we need to pass the ``zipmap=False`` option. This\n",
"is required so the output probabilities are a Matrix and not a list of dictionaries.\n",
"In the case of using a ``sklearn`` model and using the ``convert_sklearn`` function, this\n",
"option needs to be specified with the ``options={\"zipmap\": False}`` parameter.\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import onnx\n",
"from onnxmltools import convert_lightgbm\n",
"from onnxconverter_common.data_types import FloatTensorType\n",
"\n",
"onnx_models: dict[str, list[onnx.ModelProto]] = {}\n",
"\n",
"for model_kind, models in necessary_models.items():\n",
" onnx_models[model_kind] = []\n",
" for model in models:\n",
" onnx_models[model_kind].append(\n",
" convert_lightgbm(\n",
" model,\n",
" initial_types=[(\"X\", FloatTensorType([None, len(feature_columns)]))],\n",
" zipmap=False,\n",
" )\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can call {meth}`~metalearners.metalearner.MetaLearner._build_onnx` which combines\n",
"the the converted ONNX base models into a single ONNX model.\n",
"This combined model has a single 2D input ``\"X\"`` and a single output named ``\"tau\"``.\n",
"The output name can be changed using the ``output_name`` parameter. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"onnx_model = xlearner._build_onnx(onnx_models)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can explore the input and output of the model and see that it expects a matrix with 11\n",
"columns and returns a three dimensional tensor with shape ``(..., 1, 1)`` which is expected\n",
"as there is only two treatment variants and one outcome as it is a regression problem."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"ONNX model input: \", onnx_model.graph.input)\n",
"print(\"ONNX model output: \", onnx_model.graph.output)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also visualize the ONNX model with, e.g. [netron](https://netron.app/):\n",
"![](onnx_netron.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{note}\n",
"We noticed that ``convert_lightgbm`` does not support using native pandas categorical variables.\n",
"This is because a numpy array needs to be passed when predicting, for this reason we need to\n",
"use the categories codes in the input matrix. For more context on this issue see\n",
"[here](https://github.com/onnx/onnxmltools/issues/309) and [here](https://github.com/microsoft/LightGBM/issues/5162).\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"X_onnx = df[feature_columns].copy(deep=True)\n",
"for c in categorical_feature_columns:\n",
" X_onnx[c] = df[c].cat.codes\n",
"X_onnx = X_onnx.to_numpy(np.float32)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can finally use ``onnxruntime`` to perform predictions using our model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import onnxruntime as rt\n",
"\n",
"sess = rt.InferenceSession(\n",
" onnx_model.SerializeToString(), providers=rt.get_available_providers()\n",
")\n",
"\n",
"(pred_onnx,) = sess.run(\n",
" [\"tau\"],\n",
" {\"X\": X_onnx},\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"onnx.save_model(onnx_model, \"model.onnx\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We recommend always doing a final check with some data that the CATEs predicted by the python\n",
"implementation and the ONNX model are the same (up to some tolerance). This can be done with\n",
"the following code:\n",
"\n",
"```{note}\n",
"We have to use the data as if it was out-of-sample with ``oos_method = True`` as when we\n",
"converted the base models we used the ``_overall_estimtor``.\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.testing.assert_allclose(\n",
" xlearner.predict(df[feature_columns], True, \"overall\"), pred_onnx, atol=1e-6\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Further comments\n",
"\n",
"* It would be desirable to work with ``DoubleTensorType`` instead of ``FloatTensorType``\n",
" but we have noted that some converters have issues with it. We recommend try using\n",
" ``DoubleTensorType`` but switching to ``FloatTensorType`` in case the converter fails.\n",
"* In the case the final assertion fails we recommend first testing that the different\n",
" base models have the same base outputs as we discovered some issues with some converters,\n",
" see [this issue](https://github.com/onnx/sklearn-onnx/issues/1117) and\n",
" [this issue](https://github.com/onnx/sklearn-onnx/issues/1116)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
1 change: 1 addition & 0 deletions docs/examples/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -14,3 +14,4 @@ Examples
Generating data <example_data_generation.ipynb>
Estimating CATEs for survival analysis <example_survival.ipynb>
What if I know the propensity score? <example_propensity.ipynb>
Converting a MetaLearner to ONNX <example_onnx.ipynb>
Binary file added docs/examples/onnx_netron.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading

0 comments on commit b0be07c

Please sign in to comment.