API for customizing OutputField names #361

Open
vruusmann opened this issue Nov 4, 2022 · 4 comments

Comments

@vruusmann
Member

Inspired by this (again!):
#359

There should be a Python-accessible API for instructing the PMML conversion engine to override default field names with user-specified field names.

The Alias decorator typically creates a copy of the field. As a result, the PMML model schema will contain both the old "badly named" field declaration and the new "well named" field declaration. This is confusing for model end users.

I'm thinking about some PMMLPipeline-level attribute, which could be set using a convenience method:

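from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

pipeline = PMMLPipeline([...])
pipeline.fit(X, y)

# THIS! (proposed convenience method, does not exist yet)
mapping = {
  "probability(0)" : "proba_no",
  "probability(1)" : "proba_yes"
}
pipeline.rename_pmml_fields(mapping)

sklearn2pmml(pipeline, "pipeline.pmml")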

The exported PMML file would contain only "proba_no" and "proba_yes" fields.
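To make the intended effect concrete, here is a minimal sketch that applies the same mapping to an already exported document using only the Python standard library. It is an illustration, not the proposed API: the `rename_output_fields` helper and the trimmed `PMML_FRAGMENT` are hypothetical, and the sketch does not update any other element that might refer to an OutputField by its old name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed fragment; a real PMML file is namespaced and much larger
PMML_FRAGMENT = """
<Output>
  <OutputField name="probability(0)" optype="continuous" dataType="double" feature="probability" value="0"/>
  <OutputField name="probability(1)" optype="continuous" dataType="double" feature="probability" value="1"/>
</Output>
""".strip()

def rename_output_fields(root, mapping):
    """Rename OutputField elements in place and return the old names that were changed."""
    renamed = []
    for element in root.iter():
        # Compare local names, so that namespaced documents are handled too
        local_name = element.tag.rsplit("}", 1)[-1]
        old_name = element.get("name")
        if local_name == "OutputField" and old_name in mapping:
            element.set("name", mapping[old_name])
            renamed.append(old_name)
    return renamed

mapping = {
    "probability(0)": "proba_no",
    "probability(1)": "proba_yes",
}

root = ET.fromstring(PMML_FRAGMENT)
renamed = rename_output_fields(root, mapping)
output_names = [field.get("name") for field in root.iter("OutputField")]
```

After the call, the fragment declares only the new names, which is the outcome the proposed method is meant to produce at conversion time.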

@DTKx

DTKx commented Dec 4, 2022

Hello!
I have a related question about attribute naming: currently, how do I pass the names of my attributes when exporting my model to my PMML file? Is there an example of usage you could link me to?

More specifically, I am trying to load the model in Java using this library, and I cannot find a way to get the names.

        <dependency>
            <groupId>org.jpmml</groupId>
            <artifactId>pmml-evaluator-metro</artifactId>
            <version>1.6.4</version>
        </dependency>

Any example link you could share would be helpful.
Many thanks in advance

@vruusmann
Member Author

> I am trying to load the model in java using this library and I cannot find a way to get the names.

See https://github.com/jpmml/jpmml-evaluator#querying-the-data-schema-of-models

> how do I pass the names of my attributes when exporting my model to my pmml file?

  • The values of PMMLPipeline.active_fields show up as org.jpmml.evaluator.Evaluator#getActiveFields().
  • The values of PMMLPipeline.target_fields (i.e. primary results) show up as org.jpmml.evaluator.Evaluator#getTargetFields().
  • The values of estimator object methods (i.e. secondary outputs, such as Scikit-Learn's predict_proba, apply, etc. methods) show up as org.jpmml.evaluator.Evaluator#getOutputFields().
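The three field categories above can also be inspected from Python by reading the exported file directly. The following is a rough sketch using only the standard library; the `list_model_fields` helper and the trimmed `SAMPLE_PMML` document are hypothetical (the sample is not a complete, valid model). It assumes that MiningField elements without a usageType attribute are active fields, and it simply collects names across the whole document, so a nested (ensemble) model could yield duplicates.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed document for illustration only
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <TreeModel functionName="classification">
    <MiningSchema>
      <MiningField name="y" usageType="target"/>
      <MiningField name="x1"/>
      <MiningField name="x2"/>
    </MiningSchema>
    <Output>
      <OutputField name="probability(0)" feature="probability" value="0"/>
      <OutputField name="probability(1)" feature="probability" value="1"/>
    </Output>
  </TreeModel>
</PMML>"""

def list_model_fields(pmml_text):
    """Collect active, target and output field names from a PMML document."""
    fields = {"active": [], "target": [], "output": []}
    root = ET.fromstring(pmml_text)
    for element in root.iter():
        # Strip the "{namespace}" prefix that ElementTree puts in front of tag names
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "MiningField":
            usage_type = element.get("usageType", "active")
            if usage_type in ("target", "predicted"):
                fields["target"].append(element.get("name"))
            elif usage_type == "active":
                fields["active"].append(element.get("name"))
        elif local_name == "OutputField":
            fields["output"].append(element.get("name"))
    return fields

fields = list_model_fields(SAMPLE_PMML)
```

On the Java side, the JPMML-Evaluator methods listed above remain the proper way to query the schema; this sketch is only a quick way to check which names ended up in the file.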

Renaming active and target fields is straightforward - just initialize your PMMLPipeline object correctly.

Renaming output fields will be addressed by this issue. Right now they are generated using fairly reasonable patterns (e.g. all probability outputs are named probability(<target category value>)). However, some people do not like my field naming conventions, and want to go with their own custom names instead.

There's also a distinct category of fields called "derived fields", which correspond to Scikit-Learn transformer objects. They are named after Scikit-Learn class names by default (e.g. a StandardScaler object will give rise to one or more "standardScaler" derived fields). These defaults can be overridden by setting the pmml_name_ attribute.

@vruusmann
Member Author

> Any example link you could share would be helpful.

If you have any field naming questions, backed by concrete Scikit-Learn/Python code snippets, please ask them here. I will do my best to answer them, and maybe they will give me some new ideas for designing a better fix for this issue.

Once this issue is resolved, I hope to write a quick overview in the form of a small technical article at https://openscoring.io/blog/

@vruusmann
Member Author

Currently doable using the Model Customization API:

from sklearn2pmml.pipeline import PMMLPipeline
from sklearn2pmml.util.pmml import make_element

pipeline = PMMLPipeline([...])
pipeline.fit(X, y)

# Define a "skeletal" PMML element that carries only the attributes to be changed.
# Here, only the OutputField@name attribute is changed; all other attributes remain as-is.
updated_output_field = make_element("OutputField", name = "p(0)")

# Point the update action at the existing OutputField element that is named "probability(0)"
pipeline.customize(
  command = "update",
  xpath_expr = "//:OutputField[@name='probability(0)']",
  pmml_element = updated_output_field.tostring().decode("utf-8")
)
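When several output fields need new names, the XPath expressions for the update action can be generated from a mapping. The sketch below covers only that string-building step in plain Python; `output_field_xpath` is a hypothetical helper, and applying the results through `pipeline.customize` is shown only as a comment, because whether repeated `customize` calls accumulate is an assumption that should be checked against the library.

```python
def output_field_xpath(name):
    """Build an XPath expression that selects the OutputField with the given name."""
    # XPath 1.0 string literals have no escape syntax,
    # so pick whichever quote character the name does not contain
    if "'" not in name:
        literal = "'" + name + "'"
    elif '"' not in name:
        literal = '"' + name + '"'
    else:
        raise ValueError("Field name contains both quote characters: " + name)
    return "//:OutputField[@name=" + literal + "]"

mapping = {
    "probability(0)": "proba_no",
    "probability(1)": "proba_yes",
}

# Pairs of (XPath expression, new field name), in mapping order
commands = [(output_field_xpath(old_name), new_name) for old_name, new_name in mapping.items()]

# With a fitted pipeline, each pair could then be applied as in the snippet above (untested assumption):
# for xpath_expr, new_name in commands:
#     element = make_element("OutputField", name = new_name)
#     pipeline.customize(command = "update", xpath_expr = xpath_expr, pmml_element = element.tostring().decode("utf-8"))
```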
