Skip to content

Commit

Permalink
Fix documentation (#1145)
Browse files Browse the repository at this point in the history
* Fix documentation

Signed-off-by: xadupre <[email protected]>

* update versions in CI

Signed-off-by: xadupre <[email protected]>

* restore links

Signed-off-by: xadupre <[email protected]>

* fix nan

Signed-off-by: xadupre <[email protected]>

* disable deprecated tests

Signed-off-by: xadupre <[email protected]>

* deprecation

Signed-off-by: xadupre <[email protected]>

* fix issue with murmur

Signed-off-by: xadupre <[email protected]>

* fix overflow

* fix at

Signed-off-by: xadupre <[email protected]>

* doc

Signed-off-by: xadupre <[email protected]>

* fix doc

Signed-off-by: xadupre <[email protected]>

* fix doc

* diable tests

---------

Signed-off-by: xadupre <[email protected]>
  • Loading branch information
xadupre authored Dec 22, 2024
1 parent 1399f55 commit ffb17c5
Show file tree
Hide file tree
Showing 18 changed files with 107 additions and 45 deletions.
13 changes: 10 additions & 3 deletions .github/workflows/linux-ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -7,14 +7,21 @@ jobs:
strategy:
matrix:
os: [ubuntu-latest]
sklearn_version: ['==1.5.2', '==1.4.2', '==1.3.2', '==1.2.2', '==1.1.3']
sklearn_version: ['==1.6.0', '==1.5.2', '==1.4.2', '==1.3.2', '==1.2.2', '==1.1.3']
include:
- sklearn_version: '==1.5.2'
- sklearn_version: '==1.6.0'
documentation: 1
numpy_version: '>=2.0'
scipy_version: '>=1.7.0'
onnx_version: 'onnx>=1.17.0'
onnxrt_version: 'onnxruntime>=1.20.1'
python_version: '3.12'
- sklearn_version: '==1.5.2'
documentation: 0
numpy_version: '>=1.21.1,<2.0'
scipy_version: '>=1.7.0'
onnx_version: 'onnx>=1.17.0'
onnxrt_version: 'onnxruntime>=1.19.2'
onnxrt_version: 'onnxruntime==1.19.2'
python_version: '3.11'
- sklearn_version: '==1.5.2'
documentation: 0
Expand Down
8 changes: 7 additions & 1 deletion .github/workflows/windows-macos-ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,14 @@ jobs:
strategy:
matrix:
os: [windows-latest, macos-latest]
sklearn_version: ['==1.5.2', '==1.4.2', '==1.3.2', '==1.2.2', '==1.1.3']
sklearn_version: ['==1.6.0', '==1.5.2', '==1.4.2', '==1.3.2', '==1.2.2', '==1.1.3']
include:
- sklearn_version: '==1.6.0'
python_version: '3.12'
numpy_version: '>=2.0'
scipy_version: '>=1.7.0'
onnx_version: 'onnx==1.17.0'
onnxrt_version: 'onnxruntime==1.20.1'
- sklearn_version: '==1.5.2'
python_version: '3.12'
numpy_version: '>=1.21.1,<2.0'
Expand Down
15 changes: 15 additions & 0 deletions CHANGELOGS.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,9 @@
# Change Logs

## 1.19.0

In progress.

## 1.18.1

* Fix np.NAN into np.nan,
Expand All @@ -14,6 +18,17 @@
[#1109](https://github.com/onnx/sklearn-onnx/pull/1109)
* Add converter for TunedThresholdClassifierCV,
[#1107](https://github.com/onnx/sklearn-onnx/pull/1107)
* Update and Fix documentation
[#1113](https://github.com/onnx/sklearn-onnx/pull/1113)
* Support fill_value for SimpleImputer with string data
[#1123](https://github.com/onnx/sklearn-onnx/pull/1123)
* Remove unnecessary options for Regressor
[#1124](https://github.com/onnx/sklearn-onnx/pull/1124)
* OrdinalEncoder handle encoded_missing_value and unknown_value
[#1132](https://github.com/onnx/sklearn-onnx/pull/1132)
* Create output_onnx_single_probability.py
[#1139](https://github.com/onnx/sklearn-onnx/pull/1139),
[#1141](https://github.com/onnx/sklearn-onnx/pull/1141)

## 1.17.0 (development)

Expand Down
6 changes: 3 additions & 3 deletions docs/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@
# -- Project information -----------------------------------------------------

project = "sklearn-onnx"
copyright = "2018-2023, Microsoft"
copyright = "2018-2025, Microsoft"
author = "Microsoft"
version = skl2onnx.__version__
release = version
Expand Down Expand Up @@ -82,8 +82,6 @@
"scipy": ("https://docs.scipy.org/doc/scipy/reference", None),
"seaborn": ("https://seaborn.pydata.org/", None),
"scikit-learn": ("https://scikit-learn.org/stable/", None),
"sklearn": ("https://scikit-learn.org/stable/", None),
"skl2onnx": ("https://onnx.ai/sklearn-onnx/", None),
"sklearn-onnx": ("https://onnx.ai/sklearn-onnx/", None),
}

Expand All @@ -105,6 +103,7 @@
epkg_dictionary = {
"C": "https://en.wikipedia.org/wiki/C_(programming_language)",
"C++": "https://en.wikipedia.org/wiki/C%2B%2B",
"CatBoost": "https://catboost.ai/",
"cython": "https://cython.org/",
"DOT": "https://www.graphviz.org/doc/info/lang.html",
"ImageNet": "http://www.image-net.org/",
Expand All @@ -118,6 +117,7 @@
"ONNX operators": "https://onnx.ai/onnx/operators/",
"ONNX ML operators": "https://onnx.ai/onnx/operators/",
"ONNX ML Operators": "https://onnx.ai/onnx/operators/",
"ONNX Zoo": "https://github.com/onnx/models/",
"onnxmltools": "https://github.com/onnx/onnxmltools",
"onnxruntime": "https://microsoft.github.io/onnxruntime/",
"openmp": "https://en.wikipedia.org/wiki/OpenMP",
Expand Down
1 change: 1 addition & 0 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -129,5 +129,6 @@ It is licensed with `Apache License v2.0 <../LICENSE>`_.

**Older versions**

* `1.18.0 <versions/v1.18.0/>`_
* `1.17.0 <versions/v1.17.0/>`_
* `1.16.0 <versions/v1.16.0/>`_
8 changes: 6 additions & 2 deletions docs/index_tutorial.rst
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,11 @@ The tutorial was tested with following version:
.. runpython::
:showcode:

import catboost
try:
import catboost
except Exception as e:
print("Unable to import catboost due to", e)
catboost = None
import numpy
import scipy
import sklearn
Expand All @@ -39,7 +43,7 @@ The tutorial was tested with following version:
mods = [numpy, scipy, sklearn, lightgbm, xgboost, catboost,
onnx, onnxmltools, onnxruntime,
skl2onnx]
mods = [(m.__name__, m.__version__) for m in mods]
mods = [(m.__name__, m.__version__) for m in mods if m is not None]
mx = max(len(_[0]) for _ in mods) + 1
for name, vers in sorted(mods):
print("%s%s%s" % (name, " " * (mx - len(name)), vers))
7 changes: 7 additions & 0 deletions docs/tests/test_documentation_examples.py
Original file line number Diff line number Diff line change
Expand Up @@ -75,6 +75,13 @@ def add_test_methods(cls):
reason = None
if name in {"plot_woe_transformer.py"}:
reason = "dot not available"
if name in {
"plot_catwoe_transformer.py",
"plot_gexternal_xgboost.py",
"plot_pipeline_xgboost.py",
"plot_usparse_xgboost.py",
}:
reason = "unstable, xgboost not ready"

if reason:

Expand Down
9 changes: 8 additions & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,14 @@ select = [
]

[tool.ruff.lint.per-file-ignores]
"**" = ["C413", "C408", "C417", "E731", "PIE808", "RUF012", "RUF015", "SIM103", "SIM108", "SIM114", "SIM910", "UP008", "UP015", "UP028", "UP030", "UP031", "UP032"]
"**" = [
"C413", "C408", "C417",
"E731",
"PIE808",
"RUF012", "RUF015",
"SIM103", "SIM108", "SIM114", "SIM910",
"UP006", "UP008", "UP015", "UP028", "UP030", "UP031", "UP035", "UP032"
]
"**/__init__.py" = ["F401"]
"docs/**" = ["B018", "E402"]
"skl2onnx/algebra/onnx_ops.py" = ["F821"]
Expand Down
2 changes: 1 addition & 1 deletion skl2onnx/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@
Main entry point to the converter from the *scikit-learn* to *onnx*.
"""

__version__ = "1.18.0"
__version__ = "1.19.0"
__author__ = "Microsoft"
__producer__ = "skl2onnx"
__producer_version__ = __version__
Expand Down
3 changes: 0 additions & 3 deletions skl2onnx/convert.py
Original file line number Diff line number Diff line change
Expand Up @@ -290,9 +290,6 @@ def to_onnx(
:class:`OnnxOperatorMixin`, it calls method *to_onnx*
in that case otherwise it calls :func:`convert_sklearn`.
.. versionchanged:: 1.10.0
Parameter *naming* was added.
.. versionchanged:: 1.18.0
The main opset is now equal to target_opset and not a value equal or less
than the given value.
Expand Down
3 changes: 1 addition & 2 deletions skl2onnx/operator_converters/nearest_neighbours.py
Original file line number Diff line number Diff line change
Expand Up @@ -257,8 +257,7 @@ def _convert_nearest_neighbors(operator, container, k=None, radius=None):
single_reg = (
not hasattr(op, "_y")
or len(op._y.shape) == 1
or len(op._y.shape) == 2
and op._y.shape[1] == 1
or (len(op._y.shape) == 2 and op._y.shape[1] == 1)
)
ndim = 1 if single_reg else op._y.shape[1]

Expand Down
12 changes: 9 additions & 3 deletions skl2onnx/operator_converters/power_transformer.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,12 @@
)


def get_nan():
if hasattr(np, "nan"):
return np.nan
return np.NAN


def convert_powertransformer(
scope: Scope, operator: Operator, container: ModelComponentContainer
):
Expand Down Expand Up @@ -78,7 +84,7 @@ def convert_powertransformer(
y_gr0 = OnnxImputer(
y_gr0,
imputed_value_floats=[0.0],
replaced_value_float=getattr(np, "nan", getattr(np, "NAN")),
replaced_value_float=get_nan(),
op_version=opv,
)
y_gr0 = OnnxMul(y_gr0, greater_mask, op_version=opv)
Expand All @@ -104,7 +110,7 @@ def convert_powertransformer(
y_le0 = OnnxImputer(
y_le0,
imputed_value_floats=[0.0],
replaced_value_float=getattr(np, "nan", getattr(np, "NAN")),
replaced_value_float=get_nan(),
op_version=opv,
)
y_le0 = OnnxMul(y_le0, less_mask, op_version=opv)
Expand All @@ -130,7 +136,7 @@ def convert_powertransformer(
y_gr0_l_eq0 = OnnxImputer(
y_gr0_l_eq0,
imputed_value_floats=[0.0],
replaced_value_float=getattr(np, "nan", getattr(np, "NAN")),
replaced_value_float=get_nan(),
op_version=opv,
)
y_gr0_l_eq0 = OnnxMul(y_gr0_l_eq0, lambda_zero_mask, op_version=opv)
Expand Down
3 changes: 3 additions & 0 deletions tests/test_sklearn_adaboost_converter.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@

class TestSklearnAdaBoostModels(unittest.TestCase):
@unittest.skipIf(TARGET_OPSET < 11, reason="not available")
@unittest.skipIf(pv.Version(skl_version) >= pv.Version("1.6"), "deprecated")
def test_ada_boost_classifier_samme_r(self):
if pv.Version(skl_version) < pv.Version("1.2"):
model, X_test = fit_classification_model(
Expand Down Expand Up @@ -62,6 +63,7 @@ def test_ada_boost_classifier_samme_r(self):
)

@unittest.skipIf(TARGET_OPSET < 11, reason="not available")
@unittest.skipIf(pv.Version(skl_version) >= pv.Version("1.6"), "deprecated")
def test_ada_boost_classifier_samme_r_decision_function(self):
if pv.Version(skl_version) < pv.Version("1.2"):
model, X_test = fit_classification_model(
Expand Down Expand Up @@ -101,6 +103,7 @@ def test_ada_boost_classifier_samme_r_decision_function(self):
)

@unittest.skipIf(TARGET_OPSET < 11, reason="not available")
@unittest.skipIf(pv.Version(skl_version) >= pv.Version("1.6"), "deprecated")
def test_ada_boost_classifier_samme_r_logreg(self):
if pv.Version(skl_version) < pv.Version("1.2"):
model, X_test = fit_classification_model(
Expand Down
26 changes: 14 additions & 12 deletions tests/test_sklearn_feature_hasher.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,6 @@
from typing import Optional
import packaging.version as pv
import numpy as np
from sklearn.utils._testing import assert_almost_equal
from pandas import DataFrame
from onnx import TensorProto, __version__ as onnx_version
from onnx.helper import (
Expand Down Expand Up @@ -113,7 +112,10 @@ def test_ort_murmurhash3_string(self):
mat[i, indices[i]] = final[i]

skl = FeatureHasher(n_features, input_type="string", dtype=np.uint32)
expected = skl.transform(feeds["X"].reshape((-1, 1)))
try:
expected = skl.transform(feeds["X"].reshape((-1, 1)))
except OverflowError as e:
raise unittest.SkipTest(f"Unexpected sklearn error: {e}")
dense = expected.todense()
for i, (a, b) in enumerate(zip(dense.tolist(), mat.tolist())):
if a != b:
Expand Down Expand Up @@ -327,10 +329,10 @@ def test_feature_hasher_pipeline(self):
model_onnx.SerializeToString(), providers=["CPUExecutionProvider"]
)
got2 = sess.run(None, dict(cat_features=X_train_ort2))
assert_almost_equal(expected2, got2[0])
np.testing.assert_almost_equal(expected2, got2[0])
got1 = sess.run(None, dict(cat_features=X_train_ort1))
with self.assertRaises(AssertionError):
assert_almost_equal(expected2, got1[0])
np.testing.assert_almost_equal(expected2, got1[0])

# check hash
X_train_ort = X_train.values
Expand All @@ -350,9 +352,9 @@ def test_feature_hasher_pipeline(self):
)
got = sess.run(None, dict(cat_features=X_train_ort))
with self.assertRaises(AssertionError):
assert_almost_equal(expected, got[0])
np.testing.assert_almost_equal(expected, got[0])
got = sess.run(None, dict(cat_features=X_train_ort2))
assert_almost_equal(expected, got[0])
np.testing.assert_almost_equal(expected, got[0])

# transform
X_train_ort = X_train.values
Expand All @@ -367,9 +369,9 @@ def test_feature_hasher_pipeline(self):
)
got = sess.run(None, dict(cat_features=X_train_ort))
with self.assertRaises(AssertionError):
assert_almost_equal(expected, got[0].astype(np.float64))
np.testing.assert_almost_equal(expected, got[0].astype(np.float64))
got = sess.run(None, dict(cat_features=X_train_ort2))
assert_almost_equal(expected, got[0].astype(np.float64))
np.testing.assert_almost_equal(expected, got[0].astype(np.float64))

# classifier
expected = complete_pipeline.predict_proba(X_train)
Expand All @@ -387,9 +389,9 @@ def test_feature_hasher_pipeline(self):
X_train_ort = X_train.values
got = sess.run(None, dict(cat_features=X_train_ort))
with self.assertRaises(AssertionError):
assert_almost_equal(expected, got[1].astype(np.float64))
np.testing.assert_almost_equal(expected, got[1].astype(np.float64))
got = sess.run(None, dict(cat_features=X_train_ort2))
assert_almost_equal(labels, got[0])
np.testing.assert_almost_equal(labels, got[0])

@unittest.skipIf(
pv.Version(onnx_version) < pv.Version("1.11"), reason="onnx is too old"
Expand Down Expand Up @@ -565,12 +567,12 @@ def _run(
onx.SerializeToString(), so, providers=["CPUExecutionProvider"]
)
got = sess.run(None, feeds)
assert_almost_equal(expected, got[0])
np.testing.assert_almost_equal(expected, got[0])

if ReferenceEvaluator is not None:
# The pure python implementation does not correctly implement murmurhash3.
# There are issue with type int.
assert_almost_equal(expected.shape, got_py[0].shape)
np.testing.assert_almost_equal(expected.shape, got_py[0].shape)


if __name__ == "__main__":
Expand Down
2 changes: 1 addition & 1 deletion tests/test_sklearn_sgd_classifier_converter.py
Original file line number Diff line number Diff line change
Expand Up @@ -436,7 +436,7 @@ def test_model_sgd_binary_class_log_bool(self):

def test_model_sgd_multi_class_log_int(self):
model, X = fit_classification_model(
SGDClassifier(loss=LOG_LOSS, random_state=42), 5, is_int=True
SGDClassifier(loss=LOG_LOSS, random_state=42), 2, is_int=True
)
model_onnx = convert_sklearn(
model,
Expand Down
10 changes: 7 additions & 3 deletions tests/test_sklearn_tfidf_transformer_converter_sparse.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@

import packaging.version as pv
import unittest
import urllib.error
import sys
import onnx
from sklearn.datasets import fetch_20newsgroups
Expand Down Expand Up @@ -41,9 +42,12 @@ def test_model_tfidf_transform_bug(self):
"comp.graphics",
"sci.med",
]
twenty_train = fetch_20newsgroups(
subset="train", categories=categories, shuffle=True, random_state=0
)
try:
twenty_train = fetch_20newsgroups(
subset="train", categories=categories, shuffle=True, random_state=0
)
except urllib.error.HTTPError as e:
raise unittest.SkipTest(f"HTTP fails due to {e}")
text_clf = Pipeline(
[("vect", CountVectorizer()), ("tfidf", TfidfTransformer())]
)
Expand Down
Loading

0 comments on commit ffb17c5

Please sign in to comment.