SynapseML version
1.0.4
System information
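A short script can gather the versions that belong in this section. This is a minimal sketch, not part of the original report: the helper name `collect_versions` is hypothetical, and the distribution names `pyspark`, `synapseml`, and `mlflow` are assumptions that may need adjusting to the environment.

```python
import platform
from importlib import metadata


def collect_versions(packages=("pyspark", "synapseml", "mlflow")):
    """Return a dict of interpreter, OS, and installed package versions."""
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
    }
    for name in packages:
        try:
            # Looks up the version of the installed distribution by name.
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Assumed placeholder for packages that pip does not know about.
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

If SynapseML is loaded as a Spark package rather than installed with pip, it will show up here as "not installed", and the version has to be read from the Spark configuration instead.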
Describe the problem
I'm trying to train an Isolation Forest model.
However, when I run pipeline.fit(), execution aborts after a few stages with an exception that I can't make sense of:
java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.sql.catalyst.expressions.BoundReference.accessor of type scala.Function2 in instance of org.apache.spark.sql.catalyst.expressions.BoundReference
Has anyone else run into a similar issue?
Code to reproduce issue
I'm doing the same as in the documentation examples:
import uuid

import mlflow
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from synapse.ml.isolationforest import IsolationForest

# df_train and inputCols are defined elsewhere (not shown here)

# Isolation Forest Parameter
contamination = 0.01
num_estimators = 1
max_samples = 1
max_features = 1.0

# MLFlow Experiment
artifact_path = "isolation_forest"
experiment_name = f"/opt/spark-data/iforest/isolation_forest_experiment{uuid.uuid1()}/"
model_name = "isolation-forest-model-v1"

# Isolation Forest Model
# The parentheses let the chained setter calls span several lines.
isolationForest = (
    IsolationForest()
    .setNumEstimators(num_estimators)
    .setBootstrap(False)
    .setMaxSamples(max_samples)
    .setMaxFeatures(max_features)
    .setFeaturesCol("features")
    .setPredictionCol("predictedLabel")
    .setScoreCol("outlierScore")
    .setContamination(contamination)
    .setContaminationError(0.01 * contamination)
    .setRandomSeed(1)
)

mlflow.set_experiment(experiment_name)
with mlflow.start_run():
    va = VectorAssembler(inputCols=inputCols, outputCol="features")
    pipeline = Pipeline(stages=[va, isolationForest])
    model = pipeline.fit(df_train)  # the exception is raised here
    mlflow.spark.log_model(
        model, artifact_path=artifact_path, registered_model_name=model_name
    )
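To narrow down which pipeline stage triggers the exception, the stages can be fitted one at a time instead of through pipeline.fit(). The following is a minimal sketch, not part of the original report: `fit_stage_by_stage` is a hypothetical helper that roughly mimics what Pipeline.fit does, using only the `fit` and `transform` methods that estimators and transformers expose. Because Spark evaluates lazily, an error may still surface in a later stage than the one that causes it.

```python
def fit_stage_by_stage(stages, df):
    """Fit and apply stages one at a time, stopping at the first failure.

    Returns (fitted_stages, None) on success, or
    (fitted_so_far, (index, stage_class_name, exception)) on failure.
    """
    fitted = []
    for index, stage in enumerate(stages):
        try:
            # Estimators expose fit(); plain transformers are used as-is.
            model = stage.fit(df) if hasattr(stage, "fit") else stage
            # Feed the transformed data to the next stage.
            df = model.transform(df)
        except Exception as exc:
            return fitted, (index, type(stage).__name__, exc)
        fitted.append(model)
    return fitted, None
```

With the names from the code above, the call would be `fitted, failure = fit_stage_by_stage([va, isolationForest], df_train)`. If `failure` is not None, it holds the index and class name of the failing stage together with the exception.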
Other info / logs
What component(s) does this bug affect?
area/cognitive: Cognitive project
area/core: Core project
area/deep-learning: DeepLearning project
area/lightgbm: Lightgbm project
area/opencv: Opencv project
area/vw: VW project
area/website: Website
area/build: Project build system
area/notebooks: Samples under notebooks folder
area/docker: Docker usage
area/models: models related issue
What language(s) does this bug affect?
language/scala: Scala source code
language/python: Pyspark APIs
language/r: R APIs
language/csharp: .NET APIs
language/new: Proposals for new client languages
What integration(s) does this bug affect?
integrations/synapse: Azure Synapse integrations
integrations/azureml: Azure ML integrations
integrations/databricks: Databricks integrations