StringNormalizer creates ValueError: Expected 2D array, got 1D array instead
#443
Comments
Do you think the following The
mapper = DataFrameMapper(
    [
        (
            ["color"],
            [
                #StringNormalizer(function="lower", trim_blanks=True),
                ExpressionTransformer("(X[0].lower()).strip()"),
                TargetEncoder(random_state=0),
            ],
        ),
    ],
    df_out=True,
    default=False,
)
Alternatively, you can do exactly what the Python error message tells you to do: reshape the data container to two dimensions.
from sklearn2pmml.util import Reshaper
mapper = DataFrameMapper(
    [
        (
            ["color"],
            [
                StringNormalizer(function="lower", trim_blanks=True),
                Reshaper((-1, 1)),
                TargetEncoder(random_state=0),
            ],
        ),
    ],
    df_out=True,
    default=False,
)
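To see what the (-1, 1) reshape does in isolation, here is a minimal NumPy-only sketch. It assumes the normalized column reaches the next step as a 1D array, which is what the error message suggests; the sample values are made up and no sklearn2pmml classes are involved.

```python
import numpy as np

# Hypothetical sample values standing in for the "color" column.
colors = np.array([" Red", "GREEN ", "blue"])

# Lower-case and trim every value; the result is still one-dimensional.
normalized = np.char.strip(np.char.lower(colors))
print(normalized.shape)

# reshape(-1, 1) keeps the number of rows and adds a single column axis,
# producing the 2D layout that the error message asks for.
column = normalized.reshape(-1, 1)
print(column.shape)
```

The first shape has one axis and the second has two, which is the only difference between the failing and the working input.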
This sub-sequence catches my eye:
SimpleImputer(
    missing_values=pd.NA, strategy="constant", fill_value="missing"
),
ReplaceTransformer(r"^\d+000000", "missing"),
ReplaceTransformer(r"^(41|0041|0)7\d{8}$", "mobile"),
LookupTransformer(
    {"missing": "missing", "mobile": "mobile"}, default_value="fix"
)
I would try to replace it with a single
The JPMML-Python library that handles Python-to-PMML expression translation supports RegEx replace functionality using
Not going to write any Python code for you this time. I'll tag this issue, and perhaps I'll write an example about it sometime in the not-so-distant future on the JPMML documentation site (under construction right now).
Since you'll be using RegEx in
The above pipeline fragment can be simplified to a simple Python UDF:
import re

def phone_number_type(phone_number):
    if re.search(r"^\d+000000", phone_number):
        return "missing"
    elif re.search(r"^(41|0041|0)7\d{8}$", phone_number):
        return "mobile"
    else:
        return "fixed"
Then you'd pass a reference to this Python UDF to
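Because the UDF works on a single scalar value, its branching can be checked in plain Python before it goes anywhere near a pipeline. A minimal sketch: the sample phone numbers are made up for illustration, and the patterns are written as raw strings so the backslashes are passed to the RegEx engine unchanged.

```python
import re

def phone_number_type(phone_number):
    # Placeholder-looking numbers (digits followed by six zeros) count as missing.
    if re.search(r"^\d+000000", phone_number):
        return "missing"
    # Prefix 41, 0041 or 0, then a 7 and exactly eight more digits.
    elif re.search(r"^(41|0041|0)7\d{8}$", phone_number):
        return "mobile"
    else:
        return "fixed"

# Hypothetical inputs, one per branch plus the alternative mobile prefixes.
samples = ["1000000", "0791234567", "41791234567", "0041791234567", "0441234567"]
results = {number: phone_number_type(number) for number in samples}
for number, kind in results.items():
    print(number, kind)
```

The order of the branches matters: a number that matches both patterns is reported as "missing", because that check runs first.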
Thanks for your quick response!
I do not think this function is at fault, because you use it in many other transformer classes. If I understand your code correctly, I suggest that a simple
Thank you for proposing working alternatives!
OK, I will look into Python UDF in conjunction with the
IIRC, you define the Python UDF in terms of scalar values, and the
Just get some experiments going, and ask for my assistance here if you get completely stuck somewhere. Right now, the only reference to this topic is this:
Hello Sir,
long time no see. Hope you're doing well!
I am trying to use the StringNormalizer prior to an sklearn preprocessor (doesn't matter which one) and there is an exception thrown upon fit_transform(). Consider the following example:
The last line throws the following error:
The transformation of the phone_number feature has no direct link with the issue. But since it seems a bit rocky, I thought you might give me some feedback on how to improve it anyway.
Thank you a lot in advance!