diff --git a/docs/onnx.md b/docs/onnx.md
index 6c06e3a8d8..57c1cd06ff 100644
--- a/docs/onnx.md
+++ b/docs/onnx.md
@@ -9,11 +9,11 @@ description: Learn how to use the ONNX model transformer to run inference for an
[ONNX](https://onnx.ai/) is an open format to represent both deep learning and traditional machine learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools and choose the combination that is best for them.
-MMLSpark now includes a Spark transformer to bring an trained ONNX model to Apache Spark, so you can run inference on your data with Spark's large-scale data processing power.
+SynapseML now includes a Spark transformer to bring a trained ONNX model to Apache Spark, so you can run inference on your data with Spark's large-scale data processing power.
## Usage
-1. Create a `com.microsoft.ml.spark.onnx.ONNXModel` object and use `setModelLocation` or `setModelPayload` to load the ONNX model.
+1. Create a `com.microsoft.azure.synapse.ml.onnx.ONNXModel` object and use `setModelLocation` or `setModelPayload` to load the ONNX model.
For example:
@@ -27,7 +27,7 @@ MMLSpark now includes a Spark transformer to bring an trained ONNX model to Apac
3. Set the appropriate parameters on the `ONNXModel` object.
- The `com.microsoft.ml.spark.onnx.ONNXModel` class provides a set of parameters to control the behavior of the inference.
+ The `com.microsoft.azure.synapse.ml.onnx.ONNXModel` class provides a set of parameters to control the behavior of the inference.
| Parameter | Description | Default Value |
|:------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------|
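For reference, a minimal PySpark sketch of the usage this section describes, assuming the Python wrapper mirrors the Scala `ONNXModel` API; the model path and the `"input"`/`"output"` tensor names below are illustrative placeholders, not from the docs:

```python
from synapse.ml.onnx import ONNXModel

# Load the ONNX model and wire its tensors to DataFrame columns (names are placeholders).
onnx_model = (ONNXModel()
              .setModelLocation("/path/to/model.onnx")  # or .setModelPayload(raw_bytes)
              .setFeedDict({"input": "features"})       # ONNX input name -> DataFrame column
              .setFetchDict({"prediction": "output"}))  # output column <- ONNX output name

scored = onnx_model.transform(df)
```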
diff --git a/docs/vagrant.md b/docs/vagrant.md
index 52763a9040..4c8a1ac823 100644
--- a/docs/vagrant.md
+++ b/docs/vagrant.md
@@ -1,4 +1,4 @@
-# Using the MMLSpark Vagrant Image
+# Using the SynapseML Vagrant Image
## Install Vagrant and Dependencies
@@ -10,7 +10,7 @@ You will need to install a few dependencies before we get started. These instructions ar
## Build the Vagrant Image
-Start powershell as Administrator and go to the `mmlspark/tools/vagrant` directory and run
+Start PowerShell as Administrator, go to the `synapseml/tools/vagrant` directory, and run
vagrant up
diff --git a/docs/vw.md b/docs/vw.md
index ddb0b7f692..2a3a36ace3 100644
--- a/docs/vw.md
+++ b/docs/vw.md
@@ -42,7 +42,7 @@ Furthermore it includes many advances in the area of reinforcement learning (e.g
In PySpark, you can run the `VowpalWabbitClassifier` via:
```python
-from mmlspark.vw import VowpalWabbitClassifier
+from synapse.ml.vw import VowpalWabbitClassifier
model = (VowpalWabbitClassifier(numPasses=5, args="--holdout_off --loss_function logistic")
.fit(train))
```
@@ -50,7 +50,7 @@ model = (VowpalWabbitClassifier(numPasses=5, args="--holdout_off --loss_function
Similarly, you can run the `VowpalWabbitRegressor`:
```python
-from mmlspark.vw import VowpalWabbitRegressor
+from synapse.ml.vw import VowpalWabbitRegressor
model = (VowpalWabbitRegressor(args="--holdout_off --loss_function quantile -q :: -l 0.1")
.fit(train))
```
@@ -62,7 +62,7 @@ example](../notebooks/Vowpal%20Wabbit%20-%20Quantile%20Regression%20for%20Drug%2
### Hyper-parameter tuning
-- Common parameters can also be set through methods enabling the use of SparkMLs ParamGridBuilder and CrossValidator ([example](https://github.com/Azure/mmlspark/blob/master/src/test/scala/com/microsoft/ml/spark/vw/VerifyVowpalWabbitClassifier.scala#L29)). Note if
+- Common parameters can also be set through methods, enabling the use of Spark ML's ParamGridBuilder and CrossValidator ([example](https://github.com/Microsoft/SynapseML/blob/master/src/test/scala/com/microsoft/ml/spark/vw/VerifyVowpalWabbitClassifier.scala#L29)), as sketched below. Note: if
the same parameters are passed through the _args_ property (e.g. args="-l 0.2" and setLearningRate(0.5)), the _args_ value will
take precedence.
parameter
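A hedged sketch of the sweep described above, built from the setter-backed params the text mentions (`numPasses`, `learningRate`); column names, grid values, and fold count are illustrative:

```python
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from synapse.ml.vw import VowpalWabbitClassifier

vw = VowpalWabbitClassifier(labelCol="label", featuresCol="features")

# Grid over the method-settable parameters; anything passed via args would take precedence.
grid = (ParamGridBuilder()
        .addGrid(vw.learningRate, [0.1, 0.5])
        .addGrid(vw.numPasses, [5, 10])
        .build())

cv = CrossValidator(estimator=vw, estimatorParamMaps=grid,
                    evaluator=BinaryClassificationEvaluator(), numFolds=3)
cv_model = cv.fit(train)
```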
@@ -87,7 +87,7 @@ To fluently embed VW into the Spark ML eco system the following adaptions were m
- Pro: best composability with existing Spark ML components.
  - Cons: due to type restrictions (e.g. feature indices are Java integers), the maximum model size is limited to 30 bits. One could overcome this restriction by adding additional type support to the classifier/regressor to directly operate on input features (e.g. strings, int, double, ...).
-- VW hashing is separated out into the [VowpalWabbitFeaturizer](https://github.com/Azure/mmlspark/blob/master/src/test/scala/com/microsoft/ml/spark/vw/VerifyVowpalWabbitFeaturizer.scala#L34) transformer. It supports mapping Spark Dataframe schema into VWs namespaces and sparse
+- VW hashing is separated out into the [VowpalWabbitFeaturizer](https://github.com/Microsoft/SynapseML/blob/master/src/test/scala/com/microsoft/ml/spark/vw/VerifyVowpalWabbitFeaturizer.scala#L34) transformer. It supports mapping a Spark DataFrame schema into VW's namespaces and sparse
features.
- Pro: featurization can be scaled to many nodes, scale independent of distributed learning.
- Pro: hashed features can be cached and efficiently re-used when performing hyper-parameter sweeps.
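As a sketch of that featurizer workflow (the input column names are placeholders):

```python
from synapse.ml.vw import VowpalWabbitFeaturizer

# Hash the raw columns into a sparse VW-compatible feature vector once,
# then cache the result so hyper-parameter sweeps can reuse it.
featurizer = VowpalWabbitFeaturizer(inputCols=["education", "hours_per_week"],
                                    outputCol="features")
featurized = featurizer.transform(df).cache()
```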
diff --git a/docs/your-first-model.md b/docs/your-first-model.md
index 6180791084..978ea95629 100644
--- a/docs/your-first-model.md
+++ b/docs/your-first-model.md
@@ -6,7 +6,7 @@ We also learn how to use Jupyter notebooks for developing and running the model.
### Prerequisites
-- You have installed the MMLSpark package, either as a Docker image or on a
+- You have installed the SynapseML package, either as a Docker image or on a
Spark cluster,
- You have basic knowledge of the Python language,
- You have a basic understanding of machine learning concepts: training, testing,
@@ -14,7 +14,7 @@ We also learn how to use Jupyter notebooks for developing and running the model.
### Working with Jupyter Notebooks
-Once you have the MMLSpark package installed, open Jupyter notebooks folder in
+Once you have the SynapseML package installed, open the Jupyter notebooks folder in
your web browser
- Local Docker: `http://localhost:8888`
@@ -69,12 +69,12 @@ train, test = data.randomSplit([0.75, 0.25], seed=123)
### Training a Model
-To train the classifier model, we use the `mmlspark.TrainClassifier` class. It
+To train the classifier model, we use the `synapse.ml.train.TrainClassifier` class. It
takes in training data and a base SparkML classifier, maps the data into the
format expected by the base classifier algorithm, and fits a model.
```python
-from mmlspark.train import TrainClassifier
+from synapse.ml.train import TrainClassifier
from pyspark.ml.classification import LogisticRegression
model = TrainClassifier(model=LogisticRegression(), labelCol=" income").fit(train)
```
@@ -85,22 +85,22 @@ binarizes the label column.
### Scoring and Evaluating the Model
Finally, let's score the model against the test set, and use
-`mmlspark.ComputeModelStatistics` class to compute metrics — accuracy, AUC,
+`synapse.ml.train.ComputeModelStatistics` class to compute metrics — accuracy, AUC,
precision, recall — from the scored data.
```python
-from mmlspark.train import ComputeModelStatistics
+from synapse.ml.train import ComputeModelStatistics
prediction = model.transform(test)
metrics = ComputeModelStatistics().transform(prediction)
metrics.select('accuracy').show()
```
-And that's it: you've build your first machine learning model using the MMLSpark
-package. For help on mmlspark classes and methods, you can use Python's help()
+And that's it: you've built your first machine learning model using the SynapseML
+package. For help on SynapseML classes and methods, you can use Python's help()
function, for example
```python
-help(mmlspark.train.TrainClassifier)
+help(synapse.ml.train.TrainClassifier)
```
Next, view our other tutorials to learn how to
diff --git a/environment.yaml b/environment.yaml
index 50602326d6..f71705581b 100644
--- a/environment.yaml
+++ b/environment.yaml
@@ -1,4 +1,4 @@
-name: mmlspark
+name: synapseml
channels:
- conda-forge
- default
@@ -11,6 +11,7 @@ dependencies:
- r-dplyr
- r-sparklyr
- r-devtools
+ - r-roxygen2
- pip:
- wheel
- sphinx
diff --git a/lightgbm/src/main/python/mmlspark/lightgbm/LightGBMClassificationModel.py b/lightgbm/src/main/python/synapse/ml/lightgbm/LightGBMClassificationModel.py
similarity index 74%
rename from lightgbm/src/main/python/mmlspark/lightgbm/LightGBMClassificationModel.py
rename to lightgbm/src/main/python/synapse/ml/lightgbm/LightGBMClassificationModel.py
index f174c519e8..ae3f186678 100644
--- a/lightgbm/src/main/python/mmlspark/lightgbm/LightGBMClassificationModel.py
+++ b/lightgbm/src/main/python/synapse/ml/lightgbm/LightGBMClassificationModel.py
@@ -1,12 +1,12 @@
# Copyright (C) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See LICENSE in project root for information.
-from mmlspark.lightgbm._LightGBMClassificationModel import _LightGBMClassificationModel
-from mmlspark.lightgbm.mixin import LightGBMModelMixin
+from synapse.ml.lightgbm._LightGBMClassificationModel import _LightGBMClassificationModel
+from synapse.ml.lightgbm.mixin import LightGBMModelMixin
from pyspark import SparkContext
from pyspark.ml.common import inherit_doc
from pyspark.ml.wrapper import JavaParams
-from mmlspark.core.serialize.java_params_patch import *
+from synapse.ml.core.serialize.java_params_patch import *
@inherit_doc
@@ -17,7 +17,7 @@ def loadNativeModelFromFile(filename):
Load the model from a native LightGBM text file.
"""
ctx = SparkContext._active_spark_context
- loader = ctx._jvm.com.microsoft.ml.spark.lightgbm.LightGBMClassificationModel
+ loader = ctx._jvm.com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassificationModel
java_model = loader.loadNativeModelFromFile(filename)
return JavaParams._from_java(java_model)
@@ -27,7 +27,7 @@ def loadNativeModelFromString(model):
Load the model from a native LightGBM model string.
"""
ctx = SparkContext._active_spark_context
- loader = ctx._jvm.com.microsoft.ml.spark.lightgbm.LightGBMClassificationModel
+ loader = ctx._jvm.com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassificationModel
java_model = loader.loadNativeModelFromString(model)
return JavaParams._from_java(java_model)
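For illustration, loading a native model through the renamed wrapper might look like this (a sketch: the file path is a placeholder, and the class is assumed to be importable from the `synapse.ml.lightgbm` package):

```python
from synapse.ml.lightgbm import LightGBMClassificationModel

# Wraps the JVM loader shown above and returns a fitted Spark model.
model = LightGBMClassificationModel.loadNativeModelFromFile("/models/lgbm_classifier.txt")
scored = model.transform(test_df)
```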
diff --git a/lightgbm/src/main/python/mmlspark/lightgbm/LightGBMRankerModel.py b/lightgbm/src/main/python/synapse/ml/lightgbm/LightGBMRankerModel.py
similarity index 76%
rename from lightgbm/src/main/python/mmlspark/lightgbm/LightGBMRankerModel.py
rename to lightgbm/src/main/python/synapse/ml/lightgbm/LightGBMRankerModel.py
index cfa514d370..8a108a76db 100644
--- a/lightgbm/src/main/python/mmlspark/lightgbm/LightGBMRankerModel.py
+++ b/lightgbm/src/main/python/synapse/ml/lightgbm/LightGBMRankerModel.py
@@ -1,12 +1,12 @@
# Copyright (C) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See LICENSE in project root for information.
-from mmlspark.lightgbm._LightGBMRankerModel import _LightGBMRankerModel
-from mmlspark.lightgbm.mixin import LightGBMModelMixin
+from synapse.ml.lightgbm._LightGBMRankerModel import _LightGBMRankerModel
+from synapse.ml.lightgbm.mixin import LightGBMModelMixin
from pyspark import SparkContext
from pyspark.ml.common import inherit_doc
from pyspark.ml.wrapper import JavaParams
-from mmlspark.core.serialize.java_params_patch import *
+from synapse.ml.core.serialize.java_params_patch import *
@inherit_doc
@@ -17,7 +17,7 @@ def loadNativeModelFromFile(filename):
Load the model from a native LightGBM text file.
"""
ctx = SparkContext._active_spark_context
- loader = ctx._jvm.com.microsoft.ml.spark.lightgbm.LightGBMRankerModel
+ loader = ctx._jvm.com.microsoft.azure.synapse.ml.lightgbm.LightGBMRankerModel
java_model = loader.loadNativeModelFromFile(filename)
return JavaParams._from_java(java_model)
@@ -27,7 +27,7 @@ def loadNativeModelFromString(model):
Load the model from a native LightGBM model string.
"""
ctx = SparkContext._active_spark_context
- loader = ctx._jvm.com.microsoft.ml.spark.lightgbm.LightGBMRankerModel
+ loader = ctx._jvm.com.microsoft.azure.synapse.ml.lightgbm.LightGBMRankerModel
java_model = loader.loadNativeModelFromString(model)
return JavaParams._from_java(java_model)
diff --git a/lightgbm/src/main/python/mmlspark/lightgbm/LightGBMRegressionModel.py b/lightgbm/src/main/python/synapse/ml/lightgbm/LightGBMRegressionModel.py
similarity index 73%
rename from lightgbm/src/main/python/mmlspark/lightgbm/LightGBMRegressionModel.py
rename to lightgbm/src/main/python/synapse/ml/lightgbm/LightGBMRegressionModel.py
index dd9fbf43b7..fad265372f 100644
--- a/lightgbm/src/main/python/mmlspark/lightgbm/LightGBMRegressionModel.py
+++ b/lightgbm/src/main/python/synapse/ml/lightgbm/LightGBMRegressionModel.py
@@ -1,12 +1,12 @@
# Copyright (C) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See LICENSE in project root for information.
-from mmlspark.lightgbm._LightGBMRegressionModel import _LightGBMRegressionModel
-from mmlspark.lightgbm.mixin import LightGBMModelMixin
+from synapse.ml.lightgbm._LightGBMRegressionModel import _LightGBMRegressionModel
+from synapse.ml.lightgbm.mixin import LightGBMModelMixin
from pyspark import SparkContext
from pyspark.ml.common import inherit_doc
from pyspark.ml.wrapper import JavaParams
-from mmlspark.core.serialize.java_params_patch import *
+from synapse.ml.core.serialize.java_params_patch import *
@inherit_doc
class LightGBMRegressionModel(LightGBMModelMixin, _LightGBMRegressionModel):
@@ -16,7 +16,7 @@ def loadNativeModelFromFile(filename):
Load the model from a native LightGBM text file.
"""
ctx = SparkContext._active_spark_context
- loader = ctx._jvm.com.microsoft.ml.spark.lightgbm.LightGBMRegressionModel
+ loader = ctx._jvm.com.microsoft.azure.synapse.ml.lightgbm.LightGBMRegressionModel
java_model = loader.loadNativeModelFromFile(filename)
return JavaParams._from_java(java_model)
@@ -26,6 +26,6 @@ def loadNativeModelFromString(model):
Load the model from a native LightGBM model string.
"""
ctx = SparkContext._active_spark_context
- loader = ctx._jvm.com.microsoft.ml.spark.lightgbm.LightGBMRegressionModel
+ loader = ctx._jvm.com.microsoft.azure.synapse.ml.lightgbm.LightGBMRegressionModel
java_model = loader.loadNativeModelFromString(model)
return JavaParams._from_java(java_model)
diff --git a/deep-learning/src/main/python/mmlspark/cntk/__init__.py b/lightgbm/src/main/python/synapse/ml/lightgbm/__init__.py
similarity index 100%
rename from deep-learning/src/main/python/mmlspark/cntk/__init__.py
rename to lightgbm/src/main/python/synapse/ml/lightgbm/__init__.py
diff --git a/lightgbm/src/main/python/mmlspark/lightgbm/mixin.py b/lightgbm/src/main/python/synapse/ml/lightgbm/mixin.py
similarity index 97%
rename from lightgbm/src/main/python/mmlspark/lightgbm/mixin.py
rename to lightgbm/src/main/python/synapse/ml/lightgbm/mixin.py
index 248d807646..c2b58e1a1f 100644
--- a/lightgbm/src/main/python/mmlspark/lightgbm/mixin.py
+++ b/lightgbm/src/main/python/synapse/ml/lightgbm/mixin.py
@@ -3,7 +3,7 @@
from pyspark.ml.linalg import SparseVector, DenseVector
from pyspark.ml.common import inherit_doc
-from mmlspark.core.serialize.java_params_patch import *
+from synapse.ml.core.serialize.java_params_patch import *
@inherit_doc
class LightGBMModelMixin:
diff --git a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMBase.scala b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMBase.scala
similarity index 96%
rename from lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMBase.scala
rename to lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMBase.scala
index c916952491..f815c3f7e0 100644
--- a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMBase.scala
+++ b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMBase.scala
@@ -1,19 +1,19 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.lightgbm
-
-import com.microsoft.ml.lightgbm.{SWIGTYPE_p_int, lightgbmlib}
-import com.microsoft.ml.spark.core.utils.ClusterUtil
-import com.microsoft.ml.spark.io.http.SharedSingleton
-import com.microsoft.ml.spark.lightgbm.ConnectionState.Finished
-import com.microsoft.ml.spark.lightgbm.LightGBMUtils.{closeConnections, handleConnection, sendDataToExecutors}
-import com.microsoft.ml.spark.lightgbm.TaskTrainingMethods.{isWorkerEnabled, prepareDatasets}
-import com.microsoft.ml.spark.lightgbm.TrainUtils._
-import com.microsoft.ml.spark.lightgbm.booster.LightGBMBooster
-import com.microsoft.ml.spark.lightgbm.dataset._
-import com.microsoft.ml.spark.lightgbm.params._
-import com.microsoft.ml.spark.logging.BasicLogging
+package com.microsoft.azure.synapse.ml.lightgbm
+
+import com.microsoft.azure.synapse.ml.core.utils.ClusterUtil
+import com.microsoft.azure.synapse.ml.io.http.SharedSingleton
+import com.microsoft.azure.synapse.ml.lightgbm.ConnectionState.Finished
+import com.microsoft.azure.synapse.ml.lightgbm.LightGBMUtils.{closeConnections, handleConnection, sendDataToExecutors}
+import com.microsoft.azure.synapse.ml.lightgbm.TaskTrainingMethods.{isWorkerEnabled, prepareDatasets}
+import com.microsoft.azure.synapse.ml.lightgbm.TrainUtils._
+import com.microsoft.azure.synapse.ml.lightgbm.booster.LightGBMBooster
+import com.microsoft.azure.synapse.ml.lightgbm.dataset.{BaseAggregatedColumns, DatasetUtils, LightGBMDataset}
+import com.microsoft.azure.synapse.ml.lightgbm.params._
+import com.microsoft.azure.synapse.ml.logging.BasicLogging
+import com.microsoft.ml.lightgbm.lightgbmlib
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.ml.attribute._
import org.apache.spark.ml.linalg.SQLDataTypes.VectorType
diff --git a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMClassifier.scala b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMClassifier.scala
similarity index 96%
rename from lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMClassifier.scala
rename to lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMClassifier.scala
index 312d624642..350c61dc7d 100644
--- a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMClassifier.scala
+++ b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMClassifier.scala
@@ -1,17 +1,17 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.lightgbm
+package com.microsoft.azure.synapse.ml.lightgbm
-import com.microsoft.ml.spark.lightgbm.booster.LightGBMBooster
-import com.microsoft.ml.spark.lightgbm.params.{ClassifierTrainParams, LightGBMModelParams,
- LightGBMPredictionParams, TrainParams}
-import com.microsoft.ml.spark.logging.BasicLogging
-import org.apache.spark.ml.{ComplexParamsReadable, ComplexParamsWritable}
-import org.apache.spark.ml.param._
-import org.apache.spark.ml.util._
+import com.microsoft.azure.synapse.ml.lightgbm.booster.LightGBMBooster
+import com.microsoft.azure.synapse.ml.lightgbm.params.{
+ ClassifierTrainParams, LightGBMModelParams, LightGBMPredictionParams, TrainParams}
+import com.microsoft.azure.synapse.ml.logging.BasicLogging
import org.apache.spark.ml.classification.{ProbabilisticClassificationModel, ProbabilisticClassifier}
import org.apache.spark.ml.linalg.{Vector, Vectors}
+import org.apache.spark.ml.param._
+import org.apache.spark.ml.util._
+import org.apache.spark.ml.{ComplexParamsReadable, ComplexParamsWritable}
import org.apache.spark.sql._
import org.apache.spark.sql.functions.{col, udf}
diff --git a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMClassifier.txt b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMClassifier.txt
similarity index 100%
rename from lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMClassifier.txt
rename to lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMClassifier.txt
diff --git a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMConstants.scala b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMConstants.scala
similarity index 97%
rename from lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMConstants.scala
rename to lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMConstants.scala
index 97989dcccb..48fefced40 100644
--- a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMConstants.scala
+++ b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMConstants.scala
@@ -1,7 +1,7 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.lightgbm
+package com.microsoft.azure.synapse.ml.lightgbm
object LightGBMConstants {
/** The port for LightGBM Driver server, 0 (random)
diff --git a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMDelegate.scala b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMDelegate.scala
similarity index 93%
rename from lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMDelegate.scala
rename to lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMDelegate.scala
index 956de2c348..ca954effc6 100644
--- a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMDelegate.scala
+++ b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMDelegate.scala
@@ -1,10 +1,10 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.lightgbm
+package com.microsoft.azure.synapse.ml.lightgbm
-import com.microsoft.ml.spark.lightgbm.booster.LightGBMBooster
-import com.microsoft.ml.spark.lightgbm.params.TrainParams
+import com.microsoft.azure.synapse.ml.lightgbm.booster.LightGBMBooster
+import com.microsoft.azure.synapse.ml.lightgbm.params.TrainParams
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.types.StructType
import org.slf4j.Logger
diff --git a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMModelMethods.scala b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMModelMethods.scala
similarity index 97%
rename from lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMModelMethods.scala
rename to lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMModelMethods.scala
index d65cfb8b77..47e9979695 100644
--- a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMModelMethods.scala
+++ b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMModelMethods.scala
@@ -1,9 +1,9 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.lightgbm
+package com.microsoft.azure.synapse.ml.lightgbm
-import com.microsoft.ml.spark.lightgbm.params.LightGBMModelParams
+import com.microsoft.azure.synapse.ml.lightgbm.params.LightGBMModelParams
import org.apache.spark.internal.Logging
import org.apache.spark.ml.linalg.{Vector, Vectors}
diff --git a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMRanker.scala b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMRanker.scala
similarity index 95%
rename from lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMRanker.scala
rename to lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMRanker.scala
index 037bec175f..a7fdb1986f 100644
--- a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMRanker.scala
+++ b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMRanker.scala
@@ -1,19 +1,18 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.lightgbm
+package com.microsoft.azure.synapse.ml.lightgbm
-import com.microsoft.ml.spark.lightgbm.booster.LightGBMBooster
-import com.microsoft.ml.spark.lightgbm.params.{LightGBMModelParams, LightGBMPredictionParams,
- RankerTrainParams, TrainParams}
-import com.microsoft.ml.spark.logging.BasicLogging
-import org.apache.spark.ml.{ComplexParamsReadable, ComplexParamsWritable, Ranker, RankerModel}
+import com.microsoft.azure.synapse.ml.lightgbm.booster.LightGBMBooster
+import com.microsoft.azure.synapse.ml.lightgbm.params.{
+ LightGBMModelParams, LightGBMPredictionParams, RankerTrainParams, TrainParams}
+import com.microsoft.azure.synapse.ml.logging.BasicLogging
+import org.apache.spark.ml.linalg.Vector
import org.apache.spark.ml.param._
import org.apache.spark.ml.util._
-import org.apache.spark.ml.linalg.Vector
+import org.apache.spark.ml.{ComplexParamsReadable, ComplexParamsWritable, Ranker, RankerModel}
import org.apache.spark.sql._
import org.apache.spark.sql.functions.{col, udf}
-import org.apache.spark.sql.types.DataType
object LightGBMRanker extends DefaultParamsReadable[LightGBMRanker]
diff --git a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMRanker.txt b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMRanker.txt
similarity index 100%
rename from lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMRanker.txt
rename to lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMRanker.txt
diff --git a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMRegressor.scala b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMRegressor.scala
similarity index 95%
rename from lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMRegressor.scala
rename to lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMRegressor.scala
index c0333e3e29..b99f689ab0 100644
--- a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMRegressor.scala
+++ b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMRegressor.scala
@@ -1,17 +1,17 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.lightgbm
+package com.microsoft.azure.synapse.ml.lightgbm
-import com.microsoft.ml.spark.lightgbm.booster.LightGBMBooster
-import com.microsoft.ml.spark.lightgbm.params.{LightGBMModelParams, LightGBMPredictionParams,
- RegressorTrainParams, TrainParams}
-import com.microsoft.ml.spark.logging.BasicLogging
-import org.apache.spark.ml.{BaseRegressor, ComplexParamsReadable, ComplexParamsWritable}
-import org.apache.spark.ml.param._
-import org.apache.spark.ml.util._
+import com.microsoft.azure.synapse.ml.lightgbm.booster.LightGBMBooster
+import com.microsoft.azure.synapse.ml.lightgbm.params.{
+ LightGBMModelParams, LightGBMPredictionParams, RegressorTrainParams, TrainParams}
+import com.microsoft.azure.synapse.ml.logging.BasicLogging
import org.apache.spark.ml.linalg.Vector
+import org.apache.spark.ml.param._
import org.apache.spark.ml.regression.RegressionModel
+import org.apache.spark.ml.util._
+import org.apache.spark.ml.{BaseRegressor, ComplexParamsReadable, ComplexParamsWritable}
import org.apache.spark.sql._
import org.apache.spark.sql.functions.{col, udf}
diff --git a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMRegressor.txt b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMRegressor.txt
similarity index 100%
rename from lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMRegressor.txt
rename to lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMRegressor.txt
diff --git a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMUtils.scala b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMUtils.scala
similarity index 95%
rename from lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMUtils.scala
rename to lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMUtils.scala
index 58d5e2eebb..f26c02533c 100644
--- a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMUtils.scala
+++ b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMUtils.scala
@@ -1,13 +1,12 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.lightgbm
+package com.microsoft.azure.synapse.ml.lightgbm
+import com.microsoft.azure.synapse.ml.core.env.NativeLoader
+import com.microsoft.azure.synapse.ml.featurize.{Featurize, FeaturizeUtilities}
+import com.microsoft.azure.synapse.ml.lightgbm.ConnectionState._
import com.microsoft.ml.lightgbm._
-import com.microsoft.ml.spark.core.env.NativeLoader
-import com.microsoft.ml.spark.featurize.{Featurize, FeaturizeUtilities}
-import com.microsoft.ml.spark.lightgbm.ConnectionState._
-import com.microsoft.ml.spark.lightgbm.params.TrainParams
import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.Dataset
import org.apache.spark.{SparkEnv, TaskContext}
diff --git a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/SharedState.scala b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/SharedState.scala
similarity index 92%
rename from lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/SharedState.scala
rename to lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/SharedState.scala
index d61337e19c..45d103730b 100644
--- a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/SharedState.scala
+++ b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/SharedState.scala
@@ -1,18 +1,17 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.lightgbm
+package com.microsoft.azure.synapse.ml.lightgbm
-import java.util.concurrent.CountDownLatch
-
-import com.microsoft.ml.spark.lightgbm.dataset.DatasetUtils._
-import com.microsoft.ml.spark.lightgbm.dataset._
-import com.microsoft.ml.spark.lightgbm.params.TrainParams
-import org.apache.spark.ml.linalg.{DenseVector, SparseVector}
+import com.microsoft.azure.synapse.ml.lightgbm.dataset.DatasetUtils._
+import com.microsoft.azure.synapse.ml.lightgbm.dataset._
+import com.microsoft.azure.synapse.ml.lightgbm.params.TrainParams
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StructType
import org.slf4j.Logger
+import java.util.concurrent.CountDownLatch
+
class SharedState(columnParams: ColumnParams,
schema: StructType,
trainParams: TrainParams) {
diff --git a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/TaskTrainingMethods.scala b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/TaskTrainingMethods.scala
similarity index 90%
rename from lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/TaskTrainingMethods.scala
rename to lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/TaskTrainingMethods.scala
index 1c1944a399..bd09b4d784 100644
--- a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/TaskTrainingMethods.scala
+++ b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/TaskTrainingMethods.scala
@@ -1,10 +1,10 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.lightgbm
+package com.microsoft.azure.synapse.ml.lightgbm
-import com.microsoft.ml.spark.lightgbm.dataset.BaseAggregatedColumns
-import com.microsoft.ml.spark.lightgbm.params.TrainParams
+import com.microsoft.azure.synapse.ml.lightgbm.dataset.BaseAggregatedColumns
+import com.microsoft.azure.synapse.ml.lightgbm.params.TrainParams
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.sql.Row
import org.slf4j.Logger
diff --git a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/TrainUtils.scala b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/TrainUtils.scala
similarity index 97%
rename from lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/TrainUtils.scala
rename to lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/TrainUtils.scala
index 16f554c916..a27beae5fd 100644
--- a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/TrainUtils.scala
+++ b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/TrainUtils.scala
@@ -1,21 +1,21 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.lightgbm
-
-import java.io._
-import java.net._
+package com.microsoft.azure.synapse.ml.lightgbm
+import com.microsoft.azure.synapse.ml.core.env.StreamUtilities._
+import com.microsoft.azure.synapse.ml.core.utils.FaultToleranceUtils
+import com.microsoft.azure.synapse.ml.lightgbm.booster.LightGBMBooster
+import com.microsoft.azure.synapse.ml.lightgbm.dataset.LightGBMDataset
+import com.microsoft.azure.synapse.ml.lightgbm.params.{ClassifierTrainParams, TrainParams}
import com.microsoft.ml.lightgbm._
-import com.microsoft.ml.spark.core.env.StreamUtilities._
-import com.microsoft.ml.spark.core.utils.FaultToleranceUtils
-import com.microsoft.ml.spark.lightgbm.booster.LightGBMBooster
-import com.microsoft.ml.spark.lightgbm.dataset.LightGBMDataset
-import com.microsoft.ml.spark.lightgbm.params.{ClassifierTrainParams, TrainParams}
-import org.apache.spark.{BarrierTaskContext, TaskContext}
import org.apache.spark.sql.types.StructType
+import org.apache.spark.{BarrierTaskContext, TaskContext}
import org.slf4j.Logger
+import java.io._
+import java.net._
+
case class NetworkParams(defaultListenPort: Int, addr: String, port: Int, barrierExecutionMode: Boolean)
case class ColumnParams(labelColumn: String, featuresColumn: String, weightColumn: Option[String],
initScoreColumn: Option[String], groupColumn: Option[String])
diff --git a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/booster/LightGBMBooster.scala b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/booster/LightGBMBooster.scala
similarity index 98%
rename from lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/booster/LightGBMBooster.scala
rename to lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/booster/LightGBMBooster.scala
index c90dc0fdcf..c19f7c9020 100644
--- a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/booster/LightGBMBooster.scala
+++ b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/booster/LightGBMBooster.scala
@@ -1,12 +1,12 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.lightgbm.booster
+package com.microsoft.azure.synapse.ml.lightgbm.booster
+import com.microsoft.azure.synapse.ml.lightgbm.dataset.LightGBMDataset
+import com.microsoft.azure.synapse.ml.lightgbm.swig.SwigUtils
+import com.microsoft.azure.synapse.ml.lightgbm.{LightGBMConstants, LightGBMUtils}
import com.microsoft.ml.lightgbm._
-import com.microsoft.ml.spark.lightgbm.{LightGBMConstants, LightGBMUtils}
-import com.microsoft.ml.spark.lightgbm.dataset.LightGBMDataset
-import com.microsoft.ml.spark.lightgbm.swig.SwigUtils
import org.apache.spark.ml.linalg.{DenseVector, SparseVector, Vector}
import org.apache.spark.sql.{SaveMode, SparkSession}
diff --git a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/dataset/DatasetAggregator.scala b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/dataset/DatasetAggregator.scala
similarity index 98%
rename from lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/dataset/DatasetAggregator.scala
rename to lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/dataset/DatasetAggregator.scala
index 151ce98e36..33ba5cda66 100644
--- a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/dataset/DatasetAggregator.scala
+++ b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/dataset/DatasetAggregator.scala
@@ -1,19 +1,18 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.lightgbm.dataset
+package com.microsoft.azure.synapse.ml.lightgbm.dataset
+import com.microsoft.azure.synapse.ml.lightgbm.dataset.DatasetUtils.getRowAsDoubleArray
+import com.microsoft.azure.synapse.ml.lightgbm.swig._
+import com.microsoft.azure.synapse.ml.lightgbm.{ColumnParams, LightGBMUtils}
import com.microsoft.ml.lightgbm.{SWIGTYPE_p_int, lightgbmlib, lightgbmlibConstants}
-
-import java.util.concurrent.atomic.AtomicLong
-import com.microsoft.ml.spark.lightgbm.{ColumnParams, LightGBMUtils}
-import com.microsoft.ml.spark.lightgbm.dataset.DatasetUtils.getRowAsDoubleArray
-import com.microsoft.ml.spark.lightgbm.swig._
import org.apache.spark.ml.linalg.SQLDataTypes.VectorType
import org.apache.spark.ml.linalg.{DenseVector, SparseVector}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StructType
+import java.util.concurrent.atomic.AtomicLong
import scala.collection.mutable.ListBuffer
private[lightgbm] object ChunkedArrayUtils {
diff --git a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/dataset/DatasetUtils.scala b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/dataset/DatasetUtils.scala
similarity index 95%
rename from lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/dataset/DatasetUtils.scala
rename to lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/dataset/DatasetUtils.scala
index 4fe55bb411..a6664fb88d 100644
--- a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/dataset/DatasetUtils.scala
+++ b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/dataset/DatasetUtils.scala
@@ -1,11 +1,11 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.lightgbm.dataset
+package com.microsoft.azure.synapse.ml.lightgbm.dataset
+import com.microsoft.azure.synapse.ml.lightgbm.ColumnParams
+import com.microsoft.azure.synapse.ml.lightgbm.swig.DoubleChunkedArray
import com.microsoft.ml.lightgbm.{doubleChunkedArray, floatChunkedArray}
-import com.microsoft.ml.spark.lightgbm.ColumnParams
-import com.microsoft.ml.spark.lightgbm.swig.DoubleChunkedArray
import org.apache.spark.ml.linalg.SQLDataTypes.VectorType
import org.apache.spark.ml.linalg.{DenseVector, SparseVector}
import org.apache.spark.sql.Row
diff --git a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/dataset/LightGBMDataset.scala b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/dataset/LightGBMDataset.scala
similarity index 97%
rename from lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/dataset/LightGBMDataset.scala
rename to lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/dataset/LightGBMDataset.scala
index 0c513cfd23..2d7ba5c99d 100644
--- a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/dataset/LightGBMDataset.scala
+++ b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/dataset/LightGBMDataset.scala
@@ -1,12 +1,12 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.lightgbm.dataset
+package com.microsoft.azure.synapse.ml.lightgbm.dataset
+import com.microsoft.azure.synapse.ml.lightgbm.LightGBMUtils
import com.microsoft.lightgbm.SwigPtrWrapper
import com.microsoft.ml.lightgbm._
-import com.microsoft.ml.spark.lightgbm.LightGBMUtils
-import com.microsoft.ml.spark.lightgbm.dataset.DatasetUtils.countCardinality
+import DatasetUtils.countCardinality
import scala.reflect.ClassTag
diff --git a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/params/FObjParam.scala b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/params/FObjParam.scala
similarity index 82%
rename from lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/params/FObjParam.scala
rename to lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/params/FObjParam.scala
index ff166a9d38..79a77bdfe5 100644
--- a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/params/FObjParam.scala
+++ b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/params/FObjParam.scala
@@ -1,9 +1,9 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.lightgbm.params
+package com.microsoft.azure.synapse.ml.lightgbm.params
-import com.microsoft.ml.spark.core.serialize.ComplexParam
+import com.microsoft.azure.synapse.ml.core.serialize.ComplexParam
import org.apache.spark.ml.param.Params
/** Param for FObjTrait. Needed as spark has explicit params for many different
diff --git a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/params/FObjTrait.scala b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/params/FObjTrait.scala
similarity index 81%
rename from lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/params/FObjTrait.scala
rename to lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/params/FObjTrait.scala
index dbc1304e8f..006fdf551e 100644
--- a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/params/FObjTrait.scala
+++ b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/params/FObjTrait.scala
@@ -1,9 +1,9 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.lightgbm.params
+package com.microsoft.azure.synapse.ml.lightgbm.params
-import com.microsoft.ml.spark.lightgbm.dataset.LightGBMDataset
+import com.microsoft.azure.synapse.ml.lightgbm.dataset.LightGBMDataset
trait FObjTrait extends Serializable {
/**
diff --git a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/params/LightGBMBoosterParam.scala b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/params/LightGBMBoosterParam.scala
similarity index 75%
rename from lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/params/LightGBMBoosterParam.scala
rename to lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/params/LightGBMBoosterParam.scala
index 50afdec45b..484d962c67 100644
--- a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/params/LightGBMBoosterParam.scala
+++ b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/params/LightGBMBoosterParam.scala
@@ -1,10 +1,10 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.lightgbm.params
+package com.microsoft.azure.synapse.ml.lightgbm.params
-import com.microsoft.ml.spark.core.serialize.ComplexParam
-import com.microsoft.ml.spark.lightgbm.booster.LightGBMBooster
+import com.microsoft.azure.synapse.ml.core.serialize.ComplexParam
+import com.microsoft.azure.synapse.ml.lightgbm.booster.LightGBMBooster
import org.apache.spark.ml.param.Params
/** Custom ComplexParam for LightGBMBooster, to make it settable on the LightGBM models.
diff --git a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/params/LightGBMParams.scala b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/params/LightGBMParams.scala
similarity index 97%
rename from lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/params/LightGBMParams.scala
rename to lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/params/LightGBMParams.scala
index 6a938e6ad2..b7c27f8821 100644
--- a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/params/LightGBMParams.scala
+++ b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/params/LightGBMParams.scala
@@ -1,12 +1,12 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.lightgbm.params
+package com.microsoft.azure.synapse.ml.lightgbm.params
-import com.microsoft.ml.spark.codegen.Wrappable
-import com.microsoft.ml.spark.core.contracts.{HasInitScoreCol, HasValidationIndicatorCol, HasWeightCol}
-import com.microsoft.ml.spark.lightgbm.booster.LightGBMBooster
-import com.microsoft.ml.spark.lightgbm.{LightGBMConstants, LightGBMDelegate}
+import com.microsoft.azure.synapse.ml.codegen.Wrappable
+import com.microsoft.azure.synapse.ml.core.contracts.{HasInitScoreCol, HasValidationIndicatorCol, HasWeightCol}
+import com.microsoft.azure.synapse.ml.lightgbm.booster.LightGBMBooster
+import com.microsoft.azure.synapse.ml.lightgbm.{LightGBMConstants, LightGBMDelegate}
import org.apache.spark.ml.param._
import org.apache.spark.ml.util.DefaultParamsWritable
@@ -82,7 +82,7 @@ trait LightGBMExecutionParams extends Wrappable {
val numTasks = new IntParam(this, "numTasks",
"Advanced parameter to specify the number of tasks. " +
- "MMLSpark tries to guess this based on cluster configuration, but this parameter can be used to override.")
+ "SynapseML tries to guess this based on cluster configuration, but this parameter can be used to override.")
setDefault(numTasks -> 0)
def getNumTasks: Int = $(numTasks)
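On the Python side this parameter surfaces through the generated wrapper; a sketch (the value 8 is illustrative, and 0, the default, lets SynapseML infer the task count from the cluster):

```python
from synapse.ml.lightgbm import LightGBMClassifier

# Override the auto-detected number of tasks.
lgbm = LightGBMClassifier(labelCol="label", featuresCol="features", numTasks=8)
```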
diff --git a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/params/TrainParams.scala b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/params/TrainParams.scala
similarity index 98%
rename from lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/params/TrainParams.scala
rename to lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/params/TrainParams.scala
index 6bb55ffc59..fc0bf270c1 100644
--- a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/params/TrainParams.scala
+++ b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/params/TrainParams.scala
@@ -1,9 +1,9 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.lightgbm.params
+package com.microsoft.azure.synapse.ml.lightgbm.params
-import com.microsoft.ml.spark.lightgbm.{LightGBMConstants, LightGBMDelegate}
+import com.microsoft.azure.synapse.ml.lightgbm.{LightGBMConstants, LightGBMDelegate}
/** Defines the common Booster parameters passed to the LightGBM learners.
*/
diff --git a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/swig/SwigUtils.scala b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/swig/SwigUtils.scala
similarity index 98%
rename from lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/swig/SwigUtils.scala
rename to lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/swig/SwigUtils.scala
index d84992c6eb..d28ca62c06 100644
--- a/lightgbm/src/main/scala/com/microsoft/ml/spark/lightgbm/swig/SwigUtils.scala
+++ b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/swig/SwigUtils.scala
@@ -1,7 +1,7 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.lightgbm.swig
+package com.microsoft.azure.synapse.ml.lightgbm.swig
import com.microsoft.ml.lightgbm.{SWIGTYPE_p_double, SWIGTYPE_p_float, SWIGTYPE_p_int, doubleChunkedArray,
floatChunkedArray, int32ChunkedArray, lightgbmlib}
diff --git a/lightgbm/src/test/scala/com/microsoft/ml/spark/lightgbm/split1/VerifyLightGBMClassifier.scala b/lightgbm/src/test/scala/com/microsoft/azure/synapse/ml/lightgbm/split1/VerifyLightGBMClassifier.scala
similarity index 98%
rename from lightgbm/src/test/scala/com/microsoft/ml/spark/lightgbm/split1/VerifyLightGBMClassifier.scala
rename to lightgbm/src/test/scala/com/microsoft/azure/synapse/ml/lightgbm/split1/VerifyLightGBMClassifier.scala
index 88f015efd3..8c52c73951 100644
--- a/lightgbm/src/test/scala/com/microsoft/ml/spark/lightgbm/split1/VerifyLightGBMClassifier.scala
+++ b/lightgbm/src/test/scala/com/microsoft/azure/synapse/ml/lightgbm/split1/VerifyLightGBMClassifier.scala
@@ -1,19 +1,16 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.lightgbm.split1
-
-import java.io.File
-import java.nio.file.{Files, Path, Paths}
-
-import com.microsoft.ml.spark.core.test.base.TestBase
-import com.microsoft.ml.spark.core.test.benchmarks.{Benchmarks, DatasetUtils}
-import com.microsoft.ml.spark.core.test.fuzzing.{EstimatorFuzzing, TestObject}
-import com.microsoft.ml.spark.featurize.ValueIndexer
-import com.microsoft.ml.spark.lightgbm._
-import com.microsoft.ml.spark.lightgbm.dataset.LightGBMDataset
-import com.microsoft.ml.spark.lightgbm.params.{FObjTrait, TrainParams}
-import com.microsoft.ml.spark.stages.MultiColumnAdapter
+package com.microsoft.azure.synapse.ml.lightgbm.split1
+
+import com.microsoft.azure.synapse.ml.core.test.base.TestBase
+import com.microsoft.azure.synapse.ml.core.test.benchmarks.{Benchmarks, DatasetUtils}
+import com.microsoft.azure.synapse.ml.core.test.fuzzing.{EstimatorFuzzing, TestObject}
+import com.microsoft.azure.synapse.ml.featurize.ValueIndexer
+import com.microsoft.azure.synapse.ml.lightgbm.dataset.LightGBMDataset
+import com.microsoft.azure.synapse.ml.lightgbm.params.{FObjTrait, TrainParams}
+import com.microsoft.azure.synapse.ml.lightgbm._
+import com.microsoft.azure.synapse.ml.stages.MultiColumnAdapter
import org.apache.commons.io.FileUtils
import org.apache.spark.TaskContext
import org.apache.spark.ml.evaluation.{BinaryClassificationEvaluator, MulticlassClassificationEvaluator}
@@ -22,11 +19,13 @@ import org.apache.spark.ml.linalg.{DenseVector, Vector}
import org.apache.spark.ml.tuning.{ParamGridBuilder, TrainValidationSplit}
import org.apache.spark.ml.util.MLReadable
import org.apache.spark.ml.{Estimator, Model}
-import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.functions._
+import org.apache.spark.sql.{DataFrame, Row}
import org.slf4j.Logger
+import java.io.File
+import java.nio.file.{Files, Path, Paths}
import scala.math.exp
@SerialVersionUID(100L)
diff --git a/lightgbm/src/test/scala/com/microsoft/ml/spark/lightgbm/split2/VerifyLightGBMRanker.scala b/lightgbm/src/test/scala/com/microsoft/azure/synapse/ml/lightgbm/split2/VerifyLightGBMRanker.scala
similarity index 85%
rename from lightgbm/src/test/scala/com/microsoft/ml/spark/lightgbm/split2/VerifyLightGBMRanker.scala
rename to lightgbm/src/test/scala/com/microsoft/azure/synapse/ml/lightgbm/split2/VerifyLightGBMRanker.scala
index a1dcd76db8..e01b39e587 100644
--- a/lightgbm/src/test/scala/com/microsoft/ml/spark/lightgbm/split2/VerifyLightGBMRanker.scala
+++ b/lightgbm/src/test/scala/com/microsoft/azure/synapse/ml/lightgbm/split2/VerifyLightGBMRanker.scala
@@ -1,21 +1,19 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.lightgbm.split2
-
-import com.microsoft.ml.spark.core.test.benchmarks.{Benchmarks, DatasetUtils}
-import com.microsoft.ml.spark.core.test.fuzzing.{EstimatorFuzzing, TestObject}
-import com.microsoft.ml.spark.lightgbm.dataset.{DatasetUtils => CardinalityUtils}
-import com.microsoft.ml.spark.lightgbm.split1.LightGBMTestUtils
-import com.microsoft.ml.spark.lightgbm.{LightGBMRanker, LightGBMRankerModel, LightGBMUtils}
-import org.apache.spark.SparkException
+package com.microsoft.azure.synapse.ml.lightgbm.split2
+
+import com.microsoft.azure.synapse.ml.core.test.benchmarks.{Benchmarks, DatasetUtils}
+import com.microsoft.azure.synapse.ml.core.test.fuzzing.{EstimatorFuzzing, TestObject}
+import com.microsoft.azure.synapse.ml.lightgbm.dataset.DatasetUtils.countCardinality
+import com.microsoft.azure.synapse.ml.lightgbm.split1.LightGBMTestUtils
+import com.microsoft.azure.synapse.ml.lightgbm.{LightGBMRanker, LightGBMRankerModel, LightGBMUtils}
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.util.MLReadable
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, monotonically_increasing_id, _}
import org.apache.spark.sql.types.StructType
-import org.scalatest.Matchers._
//scalastyle:off magic.number
/** Tests to validate the functionality of LightGBM Ranker module. */
@@ -125,15 +123,13 @@ class VerifyLightGBMRanker extends Benchmarks with EstimatorFuzzing[LightGBMRank
}
test("verify cardinality counts: int") {
- val counts = CardinalityUtils.countCardinality(Seq(1, 1, 2, 2, 2, 3))
-
- counts shouldBe Seq(2, 3, 1)
+ val counts = countCardinality(Seq(1, 1, 2, 2, 2, 3))
+ assert(counts === Seq(2, 3, 1))
}
test("verify cardinality counts: string") {
- val counts = CardinalityUtils.countCardinality(Seq("a", "a", "b", "b", "b", "c"))
-
- counts shouldBe Seq(2, 3, 1)
+ val counts = countCardinality(Seq("a", "a", "b", "b", "b", "c"))
+ assert(counts === Seq(2, 3, 1))
}
override def testObjects(): Seq[TestObject[LightGBMRanker]] = {
diff --git a/lightgbm/src/test/scala/com/microsoft/ml/spark/lightgbm/split2/VerifyLightGBMRegressor.scala b/lightgbm/src/test/scala/com/microsoft/azure/synapse/ml/lightgbm/split2/VerifyLightGBMRegressor.scala
similarity index 94%
rename from lightgbm/src/test/scala/com/microsoft/ml/spark/lightgbm/split2/VerifyLightGBMRegressor.scala
rename to lightgbm/src/test/scala/com/microsoft/azure/synapse/ml/lightgbm/split2/VerifyLightGBMRegressor.scala
index a3865e9fad..cbca56fc8a 100644
--- a/lightgbm/src/test/scala/com/microsoft/ml/spark/lightgbm/split2/VerifyLightGBMRegressor.scala
+++ b/lightgbm/src/test/scala/com/microsoft/azure/synapse/ml/lightgbm/split2/VerifyLightGBMRegressor.scala
@@ -1,21 +1,20 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.lightgbm.split2
-
-import com.microsoft.ml.spark.core.test.base.TestBase
-import com.microsoft.ml.spark.core.test.benchmarks.{Benchmarks, DatasetUtils}
-import com.microsoft.ml.spark.core.test.fuzzing.{EstimatorFuzzing, TestObject}
-import com.microsoft.ml.spark.lightgbm.split1.LightGBMTestUtils
-import com.microsoft.ml.spark.lightgbm.{LightGBMRegressionModel, LightGBMRegressor, LightGBMUtils}
-import com.microsoft.ml.spark.stages.MultiColumnAdapter
+package com.microsoft.azure.synapse.ml.lightgbm.split2
+
+import com.microsoft.azure.synapse.ml.core.test.benchmarks.{Benchmarks, DatasetUtils}
+import com.microsoft.azure.synapse.ml.core.test.fuzzing.{EstimatorFuzzing, TestObject}
+import com.microsoft.azure.synapse.ml.lightgbm.split1.LightGBMTestUtils
+import com.microsoft.azure.synapse.ml.lightgbm.{LightGBMRegressionModel, LightGBMRegressor, LightGBMUtils}
+import com.microsoft.azure.synapse.ml.stages.MultiColumnAdapter
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder, TrainValidationSplit}
import org.apache.spark.ml.util.MLReadable
-import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.functions.{avg, col, lit, when}
+import org.apache.spark.sql.{DataFrame, Row}
// scalastyle:off magic.number
diff --git a/notebooks/AzureSearchIndex - Met Artworks.ipynb b/notebooks/AzureSearchIndex - Met Artworks.ipynb
index 70304755ae..f43739bb87 100644
--- a/notebooks/AzureSearchIndex - Met Artworks.ipynb
+++ b/notebooks/AzureSearchIndex - Met Artworks.ipynb
@@ -10,7 +10,7 @@
{
"cell_type": "markdown",
"source": [
- "In this example, we show how you can enrich data using Cognitive Skills and write to an Azure Search Index using MMLSpark. We use a subset of The MET's open-access collection and enrich it by passing it through 'Describe Image' and a custom 'Image Similarity' skill. The results are then written to a searchable index."
+ "In this example, we show how you can enrich data using Cognitive Skills and write to an Azure Search Index using SynapseML. We use a subset of The MET's open-access collection and enrich it by passing it through 'Describe Image' and a custom 'Image Similarity' skill. The results are then written to a searchable index."
],
"metadata": {}
},
@@ -85,8 +85,8 @@
"cell_type": "code",
"execution_count": 7,
"source": [
- "from mmlspark.cognitive import AnalyzeImage\r\n",
- "from mmlspark.stages import SelectColumns\r\n",
+ "from synapse.ml.cognitive import AnalyzeImage\r\n",
+ "from synapse.ml.stages import SelectColumns\r\n",
"\r\n",
"#define pipeline\r\n",
"describeImage = (AnalyzeImage()\r\n",
@@ -124,7 +124,7 @@
"cell_type": "code",
"execution_count": 10,
"source": [
- "from mmlspark.cognitive import *\r\n",
+ "from synapse.ml.cognitive import *\r\n",
"df2.writeToAzureSearch(\r\n",
" subscriptionKey=AZURE_SEARCH_KEY,\r\n",
" actionCol=\"searchAction\",\r\n",
diff --git a/notebooks/Classification - Adult Census with Vowpal Wabbit.ipynb b/notebooks/Classification - Adult Census with Vowpal Wabbit.ipynb
index 7a8641a772..a2a76eb8cf 100644
--- a/notebooks/Classification - Adult Census with Vowpal Wabbit.ipynb
+++ b/notebooks/Classification - Adult Census with Vowpal Wabbit.ipynb
@@ -4,10 +4,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "# Classification - Adult Census using Vowpal Wabbit in MMLSpark\n",
+ "# Classification - Adult Census using Vowpal Wabbit in SynapseML\n",
"\n",
- "In this example, we predict incomes from the *Adult Census* dataset using Vowpal Wabbit (VW) classifier in MMLSpark.\n",
- "First, we read the data and split it into train and test sets as in this [example](https://github.com/Azure/mmlspark/blob/master/notebooks/Classification%20-%20Adult%20Census.ipynb\n",
+ "In this example, we predict incomes from the *Adult Census* dataset using Vowpal Wabbit (VW) classifier in SynapseML.\n",
+ "First, we read the data and split it into train and test sets as in this [example](https://github.com/Microsoft/SynapseML/blob/master/notebooks/Classification%20-%20Adult%20Census.ipynb\n",
")."
]
},
@@ -51,7 +51,7 @@
"source": [
"from pyspark.sql.functions import when, col\n",
"from pyspark.ml import Pipeline\n",
- "from mmlspark.vw import VowpalWabbitFeaturizer, VowpalWabbitClassifier\n",
+ "from synapse.ml.vw import VowpalWabbitFeaturizer, VowpalWabbitClassifier\n",
"\n",
"# Define classification label\n",
"train = train.withColumn(\"label\", when(col(\"income\").contains(\"<\"), 0.0).otherwise(1.0)).repartition(1).cache()\n",
@@ -121,7 +121,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.train import ComputeModelStatistics\n",
+ "from synapse.ml.train import ComputeModelStatistics\n",
"metrics = ComputeModelStatistics(evaluationMetric=\"classification\", \n",
" labelCol=\"label\", \n",
" scoredLabelsCol=\"prediction\").transform(prediction)\n",
diff --git a/notebooks/Classification - Adult Census.ipynb b/notebooks/Classification - Adult Census.ipynb
index 6a271e1835..b1ce8b6149 100644
--- a/notebooks/Classification - Adult Census.ipynb
+++ b/notebooks/Classification - Adult Census.ipynb
@@ -7,7 +7,7 @@
"\n",
"In this example, we try to predict incomes from the *Adult Census* dataset.\n",
"\n",
- "First, we import the packages (use `help(mmlspark)` to view contents),"
+ "First, we import the packages (use `help(synapse)` to view contents),"
],
"metadata": {}
},
@@ -56,7 +56,7 @@
"cell_type": "markdown",
"source": [
"`TrainClassifier` can be used to initialize and fit a model, it wraps SparkML classifiers.\n",
- "You can use `help(mmlspark.train.TrainClassifier)` to view the different parameters.\n",
+ "You can use `help(synapse.ml.train.TrainClassifier)` to view the different parameters.\n",
"\n",
"Note that it implicitly converts the data into the format expected by the algorithm: tokenize\n",
"and hash strings, one-hot encodes categorical variables, assembles the features into a vector\n",
@@ -68,7 +68,7 @@
"cell_type": "code",
"execution_count": null,
"source": [
- "from mmlspark.train import TrainClassifier\r\n",
+ "from synapse.ml.train import TrainClassifier\r\n",
"from pyspark.ml.classification import LogisticRegression\r\n",
"model = TrainClassifier(model=LogisticRegression(), labelCol=\"income\", numFeatures=256).fit(train)"
],
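
The fitted model scores like any SparkML transformer. A minimal sketch, assuming the `model` and `test` variables from the surrounding cells:

```python
# Sketch: score the held-out split with the fitted TrainClassifier model
# ("scored_labels" is treated here as an assumed output column name)
prediction = model.transform(test)
prediction.select("income", "scored_labels").show(5)
```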
diff --git a/notebooks/Classification - Before and After MMLSpark.ipynb b/notebooks/Classification - Before and After MMLSpark.ipynb
index bf0430fc5a..9a0fa1bbda 100644
--- a/notebooks/Classification - Before and After MMLSpark.ipynb
+++ b/notebooks/Classification - Before and After MMLSpark.ipynb
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Classification - Before and After MMLSpark\n",
+ "## Classification - Before and After SynapseML\n",
"\n",
"### 1. Introduction\n",
"\n",
@@ -12,7 +12,7 @@
"\n",
"In this tutorial, we perform the same classification task in two\n",
"different ways: once using plain **`pyspark`** and once using the\n",
- "**`mmlspark`** library. The two methods yield the same performance,\n",
+ "**`synapseml`** library. The two methods yield the same performance,\n",
"but one of the two libraries is drastically simpler to use and iterate\n",
"on (can you guess which one?).\n",
"\n",
@@ -90,7 +90,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.stages import UDFTransformer\n",
+ "from synapse.ml.stages import UDFTransformer\n",
"wordLength = \"wordLength\"\n",
"wordCount = \"wordCount\"\n",
"wordLengthTransformer = UDFTransformer(inputCol=\"text\", outputCol=wordLength, udf=wordLengthUDF)\n",
@@ -214,9 +214,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "### 4b. Classify using mmlspark\n",
+ "### 4b. Classify using synapseml\n",
"\n",
- "Life is a lot simpler when using `mmlspark`!\n",
+ "Life is a lot simpler when using `synapseml`!\n",
"\n",
"1. The **`TrainClassifier`** Estimator featurizes the data internally,\n",
" as long as the columns selected in the `train`, `test`, `validation`\n",
@@ -237,8 +237,8 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.train import TrainClassifier, ComputeModelStatistics\n",
- "from mmlspark.automl import FindBestModel\n",
+ "from synapse.ml.train import TrainClassifier, ComputeModelStatistics\n",
+ "from synapse.ml.automl import FindBestModel\n",
"\n",
"# Prepare data for learning\n",
"train, test, validation = data.randomSplit([0.60, 0.20, 0.20], seed=123)\n",
diff --git a/notebooks/Classification - Twitter Sentiment with Vowpal Wabbit.ipynb b/notebooks/Classification - Twitter Sentiment with Vowpal Wabbit.ipynb
index c7be7427b3..b3bd002a74 100644
--- a/notebooks/Classification - Twitter Sentiment with Vowpal Wabbit.ipynb
+++ b/notebooks/Classification - Twitter Sentiment with Vowpal Wabbit.ipynb
@@ -4,9 +4,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "# Twitter Sentiment Classification using Vowpal Wabbit in MMLSpark\n",
+ "# Twitter Sentiment Classification using Vowpal Wabbit in SynapseML\n",
"\n",
- "In this example, we show how to build a sentiment classification model using Vowpal Wabbit (VW) in MMLSpark. The data set we use to train and evaluate the model is [Sentiment140](http://help.sentiment140.com/for-students/?source=post_page---------------------------) twitter data. First, we import a few packages that we need."
+ "In this example, we show how to build a sentiment classification model using Vowpal Wabbit (VW) in SynapseML. The data set we use to train and evaluate the model is [Sentiment140](http://help.sentiment140.com/for-students/?source=post_page---------------------------) twitter data. First, we import a few packages that we need."
]
},
{
@@ -26,8 +26,8 @@
"from pyspark.sql.types import StructType, StructField, DoubleType, StringType\n",
"from pyspark.ml import Pipeline\n",
"from pyspark.ml.feature import CountVectorizer, RegexTokenizer\n",
- "from mmlspark.vw import VowpalWabbitClassifier\n",
- "from mmlspark.train import ComputeModelStatistics\n",
+ "from synapse.ml.vw import VowpalWabbitClassifier\n",
+ "from synapse.ml.train import ComputeModelStatistics\n",
"from pyspark.mllib.evaluation import BinaryClassificationMetrics\n",
"import matplotlib.pyplot as plt"
]
@@ -165,7 +165,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "## VW MMLSpark Training\n",
+ "## VW SynapseML Training\n",
"\n",
"Now we are ready to define a pipeline which consists of feture engineering steps and the VW model."
]
@@ -336,7 +336,7 @@
"pygments_lexer": "ipython3",
"version": "3.6.8"
},
- "name": "vw_mmlspark_sentiment_classification2",
+ "name": "vw_synapseml_sentiment_classification2",
"notebookId": 2916790739696591
},
"nbformat": 4,
diff --git a/notebooks/CognitiveServices - Celebrity Quote Analysis.ipynb b/notebooks/CognitiveServices - Celebrity Quote Analysis.ipynb
index 81e0d75cd1..cdd55529e5 100644
--- a/notebooks/CognitiveServices - Celebrity Quote Analysis.ipynb
+++ b/notebooks/CognitiveServices - Celebrity Quote Analysis.ipynb
@@ -22,7 +22,7 @@
},
"outputs": [],
"source": [
- "from mmlspark.cognitive import *\n",
+ "from synapse.ml.cognitive import *\n",
"from pyspark.ml import PipelineModel\n",
"from pyspark.sql.functions import col, udf\n",
"from pyspark.ml.feature import SQLTransformer\n",
@@ -123,7 +123,7 @@
},
"outputs": [],
"source": [
- "from mmlspark.stages import UDFTransformer \n",
+ "from synapse.ml.stages import UDFTransformer \n",
"\n",
"recognizeText = RecognizeText()\\\n",
" .setSubscriptionKey(VISION_API_KEY)\\\n",
@@ -185,7 +185,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.stages import SelectColumns\n",
+ "from synapse.ml.stages import SelectColumns\n",
"# Select the final coulmns\n",
"cleanupColumns = SelectColumns().setCols([\"url\", \"firstCeleb\", \"text\", \"sentimentLabel\"])\n",
"\n",
diff --git a/notebooks/CognitiveServices - Overview.ipynb b/notebooks/CognitiveServices - Overview.ipynb
index 92e2e08ba5..4da554df03 100644
--- a/notebooks/CognitiveServices - Overview.ipynb
+++ b/notebooks/CognitiveServices - Overview.ipynb
@@ -30,60 +30,60 @@
"\n",
"### Vision\n",
"[**Computer Vision**](https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/)\n",
- "- Describe: provides description of an image in human readable language ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/DescribeImage.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.DescribeImage))\n",
- "- Analyze (color, image type, face, adult/racy content): analyzes visual features of an image ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/AnalyzeImage.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.AnalyzeImage))\n",
- "- OCR: reads text from an image ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/OCR.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.OCR))\n",
- "- Recognize Text: reads text from an image ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/RecognizeText.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.RecognizeText))\n",
- "- Thumbnail: generates a thumbnail of user-specified size from the image ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/GenerateThumbnails.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.GenerateThumbnails))\n",
- "- Recognize domain-specific content: recognizes domain-specific content (celebrity, landmark) ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/RecognizeDomainSpecificContent.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.RecognizeDomainSpecificContent))\n",
- "- Tag: identifies list of words that are relevant to the in0put image ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/TagImage.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.TagImage))\n",
+ "- Describe: provides description of an image in human readable language ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/DescribeImage.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.DescribeImage))\n",
+ "- Analyze (color, image type, face, adult/racy content): analyzes visual features of an image ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/AnalyzeImage.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.AnalyzeImage))\n",
+ "- OCR: reads text from an image ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/OCR.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.OCR))\n",
+ "- Recognize Text: reads text from an image ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/RecognizeText.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.RecognizeText))\n",
+ "- Thumbnail: generates a thumbnail of user-specified size from the image ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/GenerateThumbnails.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.GenerateThumbnails))\n",
+ "- Recognize domain-specific content: recognizes domain-specific content (celebrity, landmark) ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/RecognizeDomainSpecificContent.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.RecognizeDomainSpecificContent))\n",
+ "- Tag: identifies list of words that are relevant to the in0put image ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/TagImage.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.TagImage))\n",
"\n",
"[**Face**](https://azure.microsoft.com/en-us/services/cognitive-services/face/)\n",
- "- Detect: detects human faces in an image ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/DetectFace.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.DetectFace))\n",
- "- Verify: verifies whether two faces belong to a same person, or a face belongs to a person ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/VerifyFaces.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.VerifyFaces))\n",
- "- Identify: finds the closest matches of the specific query person face from a person group ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/IdentifyFaces.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.IdentifyFaces))\n",
- "- Find similar: finds similar faces to the query face in a face list ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/FindSimilarFace.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.FindSimilarFace))\n",
- "- Group: divides a group of faces into disjoint groups based on similarity ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/GroupFaces.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.GroupFaces))\n",
+ "- Detect: detects human faces in an image ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/DetectFace.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.DetectFace))\n",
+ "- Verify: verifies whether two faces belong to a same person, or a face belongs to a person ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/VerifyFaces.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.VerifyFaces))\n",
+ "- Identify: finds the closest matches of the specific query person face from a person group ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/IdentifyFaces.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.IdentifyFaces))\n",
+ "- Find similar: finds similar faces to the query face in a face list ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/FindSimilarFace.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.FindSimilarFace))\n",
+ "- Group: divides a group of faces into disjoint groups based on similarity ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/GroupFaces.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.GroupFaces))\n",
"\n",
"### Speech\n",
"[**Speech Services**](https://azure.microsoft.com/en-us/services/cognitive-services/speech-services/)\n",
- "- Speech-to-text: transcribes audio streams ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/SpeechToText.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.SpeechToText))\n",
+ "- Speech-to-text: transcribes audio streams ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/SpeechToText.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.SpeechToText))\n",
"\n",
"### Language\n",
"[**Text Analytics**](https://azure.microsoft.com/en-us/services/cognitive-services/text-analytics/)\n",
- "- Language detection: detects language of the input text ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/LanguageDetector.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.LanguageDetector))\n",
- "- Key phrase extraction: identifies the key talking points in the input text ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/KeyPhraseExtractor.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.KeyPhraseExtractor))\n",
- "- Named entity recognition: identifies known entities and general named entities in the input text ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/NER.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.NER))\n",
- "- Sentiment analysis: returns a score betwee 0 and 1 indicating the sentiment in the input text ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/TextSentiment.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.TextSentiment))\n",
+ "- Language detection: detects language of the input text ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/LanguageDetector.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.LanguageDetector))\n",
+ "- Key phrase extraction: identifies the key talking points in the input text ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/KeyPhraseExtractor.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.KeyPhraseExtractor))\n",
+ "- Named entity recognition: identifies known entities and general named entities in the input text ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/NER.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.NER))\n",
+ "- Sentiment analysis: returns a score betwee 0 and 1 indicating the sentiment in the input text ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/TextSentiment.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.TextSentiment))\n",
"\n",
"[**Translator**](https://azure.microsoft.com/en-us/services/cognitive-services/translator/)\n",
- "- Translate: Translates text. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/Translate.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.Translate))\n",
- "- Transliterate: Converts text in one language from one script to another script. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/Transliterate.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.Transliterate))\n",
- "- Detect: Identifies the language of a piece of text. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/Detect.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.Detect))\n",
- "- BreakSentence: Identifies the positioning of sentence boundaries in a piece of text. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/BreakSentence.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.BreakSentence))\n",
- "- Dictionary Lookup: Provides alternative translations for a word and a small number of idiomatic phrases. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/DictionaryLookup.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.DictionaryLookup))\n",
- "- Dictionary Examples: Provides examples that show how terms in the dictionary are used in context. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/DictionaryExamples.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.DictionaryExamples))\n",
- "- Document Translation: Translates documents across all supported languages and dialects while preserving document structure and data format. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/DocumentTranslator.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.DocumentTranslator))\n",
+ "- Translate: Translates text. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/Translate.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.Translate))\n",
+ "- Transliterate: Converts text in one language from one script to another script. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/Transliterate.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.Transliterate))\n",
+ "- Detect: Identifies the language of a piece of text. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/Detect.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.Detect))\n",
+ "- BreakSentence: Identifies the positioning of sentence boundaries in a piece of text. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/BreakSentence.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.BreakSentence))\n",
+ "- Dictionary Lookup: Provides alternative translations for a word and a small number of idiomatic phrases. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/DictionaryLookup.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.DictionaryLookup))\n",
+ "- Dictionary Examples: Provides examples that show how terms in the dictionary are used in context. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/DictionaryExamples.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.DictionaryExamples))\n",
+ "- Document Translation: Translates documents across all supported languages and dialects while preserving document structure and data format. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/DocumentTranslator.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.DocumentTranslator))\n",
"\n",
"### Azure Form Recognizer\n",
"[**Form Recognizer**](https://azure.microsoft.com/en-us/services/form-recognizer/)\n",
- "- Analyze Layout: Extract text and layout information from a given document. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/AnalyzeLayout.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.AnalyzeLayout))\n",
- "- Analyze Receipts: Detects and extracts data from receipts using optical character recognition (OCR) and our receipt model, enabling you to easily extract structured data from receipts such as merchant name, merchant phone number, transaction date, transaction total, and more. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/AnalyzeReceipts.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.AnalyzeReceipts))\n",
- "- Analyze Business Cards: Detects and extracts data from business cards using optical character recognition (OCR) and our business card model, enabling you to easily extract structured data from business cards such as contact names, company names, phone numbers, emails, and more. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/AnalyzeBusinessCards.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.AnalyzeBusinessCards))\n",
- "- Analyze Invoices: Detects and extracts data from invoices using optical character recognition (OCR) and our invoice understanding deep learning models, enabling you to easily extract structured data from invoices such as customer, vendor, invoice ID, invoice due date, total, invoice amount due, tax amount, ship to, bill to, line items and more. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/AnalyzeInvoices.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.AnalyzeInvoices))\n",
- "- Analyze ID Documents: Detects and extracts data from identification documents using optical character recognition (OCR) and our ID document model, enabling you to easily extract structured data from ID documents such as first name, last name, date of birth, document number, and more. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/AnalyzeIDDocuments.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.AnalyzeIDDocuments))\n",
- "- Analyze Custom Form: Extracts information from forms (PDFs and images) into structured data based on a model created from a set of representative training forms. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/AnalyzeCustomModel.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.AnalyzeCustomModel))\n",
+ "- Analyze Layout: Extract text and layout information from a given document. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/AnalyzeLayout.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.AnalyzeLayout))\n",
+ "- Analyze Receipts: Detects and extracts data from receipts using optical character recognition (OCR) and our receipt model, enabling you to easily extract structured data from receipts such as merchant name, merchant phone number, transaction date, transaction total, and more. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/AnalyzeReceipts.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.AnalyzeReceipts))\n",
+ "- Analyze Business Cards: Detects and extracts data from business cards using optical character recognition (OCR) and our business card model, enabling you to easily extract structured data from business cards such as contact names, company names, phone numbers, emails, and more. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/AnalyzeBusinessCards.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.AnalyzeBusinessCards))\n",
+ "- Analyze Invoices: Detects and extracts data from invoices using optical character recognition (OCR) and our invoice understanding deep learning models, enabling you to easily extract structured data from invoices such as customer, vendor, invoice ID, invoice due date, total, invoice amount due, tax amount, ship to, bill to, line items and more. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/AnalyzeInvoices.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.AnalyzeInvoices))\n",
+ "- Analyze ID Documents: Detects and extracts data from identification documents using optical character recognition (OCR) and our ID document model, enabling you to easily extract structured data from ID documents such as first name, last name, date of birth, document number, and more. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/AnalyzeIDDocuments.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.AnalyzeIDDocuments))\n",
+ "- Analyze Custom Form: Extracts information from forms (PDFs and images) into structured data based on a model created from a set of representative training forms. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/AnalyzeCustomModel.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.AnalyzeCustomModel))\n",
"- Get Custom Model: Get detailed information about a custom model. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/GetCustomModel.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/ListCustomModels.html))\n",
- "- List Custom Models: Get information about all custom models. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/ListCustomModels.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.ListCustomModels))\n",
+ "- List Custom Models: Get information about all custom models. ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/ListCustomModels.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.ListCustomModels))\n",
"\n",
"### Decision\n",
"[**Anomaly Detector**](https://azure.microsoft.com/en-us/services/cognitive-services/anomaly-detector/)\n",
- "- Anomaly status of latest point: generates a model using preceding points and determines whether the latest point is anomalous ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/DetectLastAnomaly.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.DetectLastAnomaly))\n",
- "- Find anomalies: generates a model using an entire series and finds anomalies in the series ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/DetectAnomalies.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.DetectAnomalies))\n",
+ "- Anomaly status of latest point: generates a model using preceding points and determines whether the latest point is anomalous ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/DetectLastAnomaly.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.DetectLastAnomaly))\n",
+ "- Find anomalies: generates a model using an entire series and finds anomalies in the series ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/DetectAnomalies.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.DetectAnomalies))\n",
"\n",
"### Search\n",
- "- [Bing Image search](https://azure.microsoft.com/en-us/services/cognitive-services/bing-image-search-api/) ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/BingImageSearch.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.BingImageSearch))\n",
- "- [Azure Cognitive search](https://docs.microsoft.com/en-us/azure/search/search-what-is-azure-search) ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/index.html#com.microsoft.ml.spark.cognitive.AzureSearchWriter$), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/index.html#com.microsoft.ml.spark.cognitive.AzureSearchWriter$))\n"
+ "- [Bing Image search](https://azure.microsoft.com/en-us/services/cognitive-services/bing-image-search-api/) ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/com/microsoft/ml/spark/cognitive/BingImageSearch.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/pyspark/synapse.ml.cognitive.html#module-synapse.ml.cognitive.BingImageSearch))\n",
+ "- [Azure Cognitive search](https://docs.microsoft.com/en-us/azure/search/search-what-is-azure-search) ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/index.html#com.microsoft.azure.synapse.ml.cognitive.AzureSearchWriter$), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc4/scala/index.html#com.microsoft.azure.synapse.ml.cognitive.AzureSearchWriter$))\n"
],
"metadata": {}
},
@@ -92,7 +92,7 @@
"source": [
"## Prerequisites\n",
"\n",
- "1. Follow the steps in [Getting started](https://docs.microsoft.com/en-us/azure/cognitive-services/big-data/getting-started) to set up your Azure Databricks and Cognitive Services environment. This tutorial shows you how to install MMLSpark and how to create your Spark cluster in Databricks.\n",
+ "1. Follow the steps in [Getting started](https://docs.microsoft.com/en-us/azure/cognitive-services/big-data/getting-started) to set up your Azure Databricks and Cognitive Services environment. This tutorial shows you how to install SynapseML and how to create your Spark cluster in Databricks.\n",
"1. After you create a new notebook in Azure Databricks, copy the **Shared code** below and paste into a new cell in your notebook.\n",
"1. Choose a service sample, below, and copy paste it into a second new cell in your notebook.\n",
"1. Replace any of the service subscription key placeholders with your own key.\n",
@@ -115,7 +115,7 @@
"execution_count": null,
"source": [
"from pyspark.sql.functions import udf, col\r\n",
- "from mmlspark.io.http import HTTPTransformer, http_udf\r\n",
+ "from synapse.ml.io.http import HTTPTransformer, http_udf\r\n",
"from requests import Request\r\n",
"from pyspark.sql.functions import lit\r\n",
"from pyspark.ml import PipelineModel\r\n",
@@ -150,7 +150,7 @@
"cell_type": "code",
"execution_count": null,
"source": [
- "from mmlspark.cognitive import *\r\n",
+ "from synapse.ml.cognitive import *\r\n",
"\r\n",
"# A general Cognitive Services key for Text Analytics, Computer Vision and Form Recognizer (or use separate keys that belong to each service)\r\n",
"service_key = os.environ[\"COGNITIVE_SERVICE_KEY\"]\r\n",
@@ -480,7 +480,7 @@
"source": [
"## Azure Cognitive search sample\n",
"\n",
- "In this example, we show how you can enrich data using Cognitive Skills and write to an Azure Search Index using MMLSpark."
+ "In this example, we show how you can enrich data using Cognitive Skills and write to an Azure Search Index using SynapseML."
],
"metadata": {}
},
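
Every transformer in the catalog above follows the same fluent configuration pattern. As one hedged example, a `TextSentiment` sketch; the key variable, region, and column names are assumptions for illustration:

```python
from synapse.ml.cognitive import TextSentiment

# Sketch: each cognitive transformer is configured fluently like this
sentiment = (TextSentiment()
    .setSubscriptionKey(service_key)   # hypothetical key variable
    .setLocation("eastus")             # assumed region
    .setTextCol("text")                # assumed input column
    .setOutputCol("sentiment")
    .setErrorCol("error"))
results = sentiment.transform(df)      # df assumed to hold a "text" column
```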
diff --git a/notebooks/CognitiveServices - Predictive Maintenance.ipynb b/notebooks/CognitiveServices - Predictive Maintenance.ipynb
index 1c7651c0fb..66790b90aa 100644
--- a/notebooks/CognitiveServices - Predictive Maintenance.ipynb
+++ b/notebooks/CognitiveServices - Predictive Maintenance.ipynb
@@ -152,7 +152,7 @@
{
"cell_type": "code",
"source": [
- "from pyspark.sql.functions import col, struct\nfrom mmlspark.cognitive import SimpleDetectAnomalies\nfrom mmlspark.core.spark import FluentAPI\n\ndetector = (SimpleDetectAnomalies()\n .setSubscriptionKey(service_key)\n .setLocation(location)\n .setOutputCol(\"anomalies\")\n .setGroupbyCol(\"grouping\")\n .setSensitivity(95)\n .setGranularity(\"secondly\"))\n\ndf_anomaly = (df_signals\n .where(col(\"unitSymbol\") == 'RPM')\n .withColumn(\"timestamp\", col(\"dateTime\").cast(\"string\"))\n .withColumn(\"value\", col(\"measureValue\").cast(\"double\"))\n .withColumn(\"grouping\", struct(\"deviceId\"))\n .mlTransform(detector)).cache()\n\ndf_anomaly.createOrReplaceTempView('df_anomaly')"
+ "from pyspark.sql.functions import col, struct\nfrom synapse.ml.cognitive import SimpleDetectAnomalies\nfrom synapse.ml.core.spark import FluentAPI\n\ndetector = (SimpleDetectAnomalies()\n .setSubscriptionKey(service_key)\n .setLocation(location)\n .setOutputCol(\"anomalies\")\n .setGroupbyCol(\"grouping\")\n .setSensitivity(95)\n .setGranularity(\"secondly\"))\n\ndf_anomaly = (df_signals\n .where(col(\"unitSymbol\") == 'RPM')\n .withColumn(\"timestamp\", col(\"dateTime\").cast(\"string\"))\n .withColumn(\"value\", col(\"measureValue\").cast(\"double\"))\n .withColumn(\"grouping\", struct(\"deviceId\"))\n .mlTransform(detector)).cache()\n\ndf_anomaly.createOrReplaceTempView('df_anomaly')"
],
"metadata": {
"application/vnd.databricks.v1+cell": {
diff --git a/notebooks/ConditionalKNN - Exploring Art Across Cultures.ipynb b/notebooks/ConditionalKNN - Exploring Art Across Cultures.ipynb
index 9f8480cb69..3386deba53 100644
--- a/notebooks/ConditionalKNN - Exploring Art Across Cultures.ipynb
+++ b/notebooks/ConditionalKNN - Exploring Art Across Cultures.ipynb
@@ -34,7 +34,7 @@
"from pyspark.sql.types import *\n",
"from pyspark.ml.feature import Normalizer\n",
"from pyspark.sql.functions import lit, array, array_contains, udf, col, struct\n",
- "from mmlspark.nn import ConditionalKNN, ConditionalKNNModel\n",
+ "from synapse.ml.nn import ConditionalKNN, ConditionalKNNModel\n",
"from PIL import Image\n",
"from io import BytesIO\n",
"\n",
diff --git a/notebooks/CyberML - Anomalous Access Detection.ipynb b/notebooks/CyberML - Anomalous Access Detection.ipynb
index 2c3996f9ee..b7d97dcf3f 100644
--- a/notebooks/CyberML - Anomalous Access Detection.ipynb
+++ b/notebooks/CyberML - Anomalous Access Detection.ipynb
@@ -34,7 +34,7 @@
"# Create an Azure Databricks cluster and install the following libs\n",
"\n",
"1. In Cluster Libraries install from library source Maven:\n",
- "Coordinates: com.microsoft.ml.spark:mmlspark:1.0.0-rc4\n",
+ "Coordinates: com.microsoft.azure:synapseml:1.0.0-rc4\n",
"Repository: https://mmlspark.azureedge.net/maven\n",
"\n",
"2. In Cluster Libraries install from PyPI the library called plotly"
@@ -54,10 +54,10 @@
"outputs": [],
"source": [
"# this is used to produce the synthetic dataset for this test\n",
- "from mmlspark.cyber.dataset import DataFactory\n",
+ "from synapse.ml.cyber.dataset import DataFactory\n",
"\n",
"# the access anomalies model generator\n",
- "from mmlspark.cyber.anomaly.collaborative_filtering import AccessAnomaly\n",
+ "from synapse.ml.cyber.anomaly.collaborative_filtering import AccessAnomaly\n",
"\n",
"from pyspark.sql import functions as f, types as t"
]
diff --git a/notebooks/DeepLearning - BiLSTM Medical Entity Extraction.ipynb b/notebooks/DeepLearning - BiLSTM Medical Entity Extraction.ipynb
index e095c33421..bfaed7e954 100644
--- a/notebooks/DeepLearning - BiLSTM Medical Entity Extraction.ipynb
+++ b/notebooks/DeepLearning - BiLSTM Medical Entity Extraction.ipynb
@@ -6,7 +6,7 @@
"source": [
"## DeepLearning - BiLSTM Medical Entity Extraction\n",
"\n",
- "In this tutorial we use a Bidirectional LSTM entity extractor from the MMLSPark\n",
+ "In this tutorial we use a Bidirectional LSTM entity extractor from the synapseml\n",
"model downloader to extract entities from PubMed medical abstracts\n",
"\n",
"Our goal is to identify useful entities in a block of free-form text. This is a\n",
@@ -28,8 +28,8 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.cntk import CNTKModel\n",
- "from mmlspark.downloader import ModelDownloader\n",
+ "from synapse.ml.cntk import CNTKModel\n",
+ "from synapse.ml.downloader import ModelDownloader\n",
"from pyspark.sql.functions import udf, col\n",
"from pyspark.sql.types import IntegerType, ArrayType, FloatType, StringType\n",
"from pyspark.sql import Row\n",
diff --git a/notebooks/DeepLearning - CIFAR10 Convolutional Network.ipynb b/notebooks/DeepLearning - CIFAR10 Convolutional Network.ipynb
index 01226b35b5..e50c52d5cb 100644
--- a/notebooks/DeepLearning - CIFAR10 Convolutional Network.ipynb
+++ b/notebooks/DeepLearning - CIFAR10 Convolutional Network.ipynb
@@ -13,8 +13,8 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.cntk import CNTKModel\n",
- "from mmlspark.downloader import ModelDownloader\n",
+ "from synapse.ml.cntk import CNTKModel\n",
+ "from synapse.ml.downloader import ModelDownloader\n",
"from pyspark.sql.functions import udf\n",
"from pyspark.sql.types import IntegerType\n",
"from os.path import abspath"
diff --git a/notebooks/DeepLearning - Flower Image Classification.ipynb b/notebooks/DeepLearning - Flower Image Classification.ipynb
index b76f914837..e050192f63 100644
--- a/notebooks/DeepLearning - Flower Image Classification.ipynb
+++ b/notebooks/DeepLearning - Flower Image Classification.ipynb
@@ -8,7 +8,7 @@
"source": [
"from pyspark.ml import Transformer, Estimator, Pipeline\n",
"from pyspark.ml.classification import LogisticRegression\n",
- "from mmlspark.downloader import ModelDownloader\n",
+ "from synapse.ml.downloader import ModelDownloader\n",
"import os, sys, time"
]
},
@@ -50,10 +50,10 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.opencv import ImageTransformer\n",
- "from mmlspark.image import UnrollImage\n",
- "from mmlspark.cntk import ImageFeaturizer\n",
- "from mmlspark.stages import *\n",
+ "from synapse.ml.opencv import ImageTransformer\n",
+ "from synapse.ml.image import UnrollImage\n",
+ "from synapse.ml.cntk import ImageFeaturizer\n",
+ "from synapse.ml.stages import *\n",
"\n",
"# Make some featurizers\n",
"it = ImageTransformer()\\\n",
diff --git a/notebooks/DeepLearning - Transfer Learning.ipynb b/notebooks/DeepLearning - Transfer Learning.ipynb
index d47dde8318..ef7b204f54 100644
--- a/notebooks/DeepLearning - Transfer Learning.ipynb
+++ b/notebooks/DeepLearning - Transfer Learning.ipynb
@@ -23,8 +23,8 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.cntk import CNTKModel\n",
- "from mmlspark.downloader import ModelDownloader\n",
+ "from synapse.ml.cntk import CNTKModel\n",
+ "from synapse.ml.downloader import ModelDownloader\n",
"import numpy as np, os, urllib, tarfile, pickle, array\n",
"from os.path import abspath\n",
"from pyspark.sql.functions import col, udf\n",
@@ -100,7 +100,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.train import TrainClassifier\n",
+ "from synapse.ml.train import TrainClassifier\n",
"from pyspark.ml.classification import RandomForestClassifier\n",
"\n",
"train,test = featurizedImages.randomSplit([0.75,0.25])\n",
@@ -121,7 +121,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.train import ComputeModelStatistics\n",
+ "from synapse.ml.train import ComputeModelStatistics\n",
"predictions = model.transform(test)\n",
"metrics = ComputeModelStatistics(evaluationMetric=\"accuracy\").transform(predictions)\n",
"metrics.show()"
diff --git a/notebooks/HttpOnSpark - Working with Arbitrary Web APIs.ipynb b/notebooks/HttpOnSpark - Working with Arbitrary Web APIs.ipynb
index 5a40d43b15..c3825ee9e2 100644
--- a/notebooks/HttpOnSpark - Working with Arbitrary Web APIs.ipynb
+++ b/notebooks/HttpOnSpark - Working with Arbitrary Web APIs.ipynb
@@ -36,7 +36,7 @@
"\n",
"from pyspark.sql.functions import struct\n",
"from pyspark.sql.types import *\n",
- "from mmlspark.io.http import *\n",
+ "from synapse.ml.io.http import *\n",
"\n",
"df = spark.createDataFrame([(\"foo\",) for x in range(20)], [\"data\"]) \\\n",
" .withColumn(\"inputs\", struct(\"data\"))\n",
diff --git a/notebooks/HyperParameterTuning - Fighting Breast Cancer.ipynb b/notebooks/HyperParameterTuning - Fighting Breast Cancer.ipynb
index 2832e54178..075b777b51 100644
--- a/notebooks/HyperParameterTuning - Fighting Breast Cancer.ipynb
+++ b/notebooks/HyperParameterTuning - Fighting Breast Cancer.ipynb
@@ -6,7 +6,7 @@
"source": [
"## HyperParameterTuning - Fighting Breast Cancer\n",
"\n",
- "We can do distributed randomized grid search hyperparameter tuning with MMLSpark.\n",
+ "We can do distributed randomized grid search hyperparameter tuning with SynapseML.\n",
"\n",
"First, we import the packages"
]
@@ -51,8 +51,8 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.automl import TuneHyperparameters\n",
- "from mmlspark.train import TrainClassifier\n",
+ "from synapse.ml.automl import TuneHyperparameters\n",
+ "from synapse.ml.train import TrainClassifier\n",
"from pyspark.ml.classification import LogisticRegression, RandomForestClassifier, GBTClassifier\n",
"logReg = LogisticRegression()\n",
"randForest = RandomForestClassifier()\n",
@@ -76,7 +76,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.automl import *\n",
+ "from synapse.ml.automl import *\n",
"\n",
"paramBuilder = \\\n",
" HyperparamBuilder() \\\n",
@@ -140,7 +140,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.train import ComputeModelStatistics\n",
+ "from synapse.ml.train import ComputeModelStatistics\n",
"prediction = bestModel.transform(test)\n",
"metrics = ComputeModelStatistics().transform(prediction)\n",
"metrics.limit(10).toPandas()"
diff --git a/notebooks/Interpretability - Image Explainers.ipynb b/notebooks/Interpretability - Image Explainers.ipynb
index ced39cc113..aac9781e22 100644
--- a/notebooks/Interpretability - Image Explainers.ipynb
+++ b/notebooks/Interpretability - Image Explainers.ipynb
@@ -15,10 +15,10 @@
"cell_type": "code",
"execution_count": null,
"source": [
- "from mmlspark.explainers import *\r\n",
- "from mmlspark.onnx import ONNXModel\r\n",
- "from mmlspark.opencv import ImageTransformer\r\n",
- "from mmlspark.io import *\r\n",
+ "from synapse.ml.explainers import *\r\n",
+ "from synapse.ml.onnx import ONNXModel\r\n",
+ "from synapse.ml.opencv import ImageTransformer\r\n",
+ "from synapse.ml.io import *\r\n",
"from pyspark.ml import Pipeline\r\n",
"from pyspark.ml.classification import LogisticRegression\r\n",
"from pyspark.ml.feature import StringIndexer\r\n",
@@ -74,7 +74,7 @@
"cell_type": "code",
"execution_count": null,
"source": [
- "from mmlspark.io import *\r\n",
+ "from synapse.ml.io import *\r\n",
"\r\n",
"image_df = spark.read.image().load(\"wasbs://publicwasb@mmlspark.blob.core.windows.net/explainers/images/david-lusvardi-dWcUncxocQY-unsplash.jpg\")\r\n",
"display(image_df)\r\n",
diff --git a/notebooks/Interpretability - Tabular SHAP explainer.ipynb b/notebooks/Interpretability - Tabular SHAP explainer.ipynb
index 598684608c..8bde934744 100644
--- a/notebooks/Interpretability - Tabular SHAP explainer.ipynb
+++ b/notebooks/Interpretability - Tabular SHAP explainer.ipynb
@@ -32,7 +32,7 @@
"outputs": [],
"source": [
"import pyspark\n",
- "from mmlspark.explainers import *\n",
+ "from synapse.ml.explainers import *\n",
"from pyspark.ml import Pipeline\n",
"from pyspark.ml.classification import LogisticRegression\n",
"from pyspark.ml.feature import StringIndexer, OneHotEncoder, VectorAssembler\n",
diff --git a/notebooks/Interpretability - Text Explainers.ipynb b/notebooks/Interpretability - Text Explainers.ipynb
index a3acc5f24e..281ace06f3 100644
--- a/notebooks/Interpretability - Text Explainers.ipynb
+++ b/notebooks/Interpretability - Text Explainers.ipynb
@@ -36,8 +36,8 @@
"from pyspark.ml.feature import StopWordsRemover, HashingTF, IDF, Tokenizer\n",
"from pyspark.ml import Pipeline\n",
"from pyspark.ml.classification import LogisticRegression\n",
- "from mmlspark.explainers import *\n",
- "from mmlspark.featurize.text import TextFeaturizer\n",
+ "from synapse.ml.explainers import *\n",
+ "from synapse.ml.featurize.text import TextFeaturizer\n",
"\n",
"vec2array = udf(lambda vec: vec.toArray().tolist(), ArrayType(FloatType()))\n",
"vec_access = udf(lambda v, i: float(v[i]), FloatType())"
@@ -143,7 +143,7 @@
"outputs": [],
"source": [
"def plotConfusionMatrix(df, label, prediction, classLabels):\n",
- " from mmlspark.plot import confusionMatrix\n",
+ " from synapse.ml.plot import confusionMatrix\n",
" import matplotlib.pyplot as plt\n",
"\n",
" fig = plt.figure(figsize=(4.5, 4.5))\n",
diff --git a/notebooks/LightGBM - Overview.ipynb b/notebooks/LightGBM - Overview.ipynb
index 68bfc175bc..b10b1b7ea0 100644
--- a/notebooks/LightGBM - Overview.ipynb
+++ b/notebooks/LightGBM - Overview.ipynb
@@ -184,7 +184,7 @@
"cell_type": "code",
"execution_count": null,
"source": [
- "from mmlspark.lightgbm import LightGBMClassifier\r\n",
+ "from synapse.ml.lightgbm import LightGBMClassifier\r\n",
"model = LightGBMClassifier(objective=\"binary\", featuresCol=\"features\", labelCol=\"Bankrupt?\", isUnbalance=True)"
],
"outputs": [],
@@ -210,7 +210,7 @@
"cell_type": "code",
"execution_count": null,
"source": [
- "from mmlspark.lightgbm import LightGBMClassificationModel\r\n",
+ "from synapse.ml.lightgbm import LightGBMClassificationModel\r\n",
"\r\n",
"if os.environ.get(\"AZURE_SERVICE\", None) == \"Microsoft.ProjectArcadia\":\r\n",
" model.saveNativeModel(\"/models/lgbmclassifier.model\")\r\n",
@@ -279,7 +279,7 @@
"cell_type": "code",
"execution_count": null,
"source": [
- "from mmlspark.train import ComputeModelStatistics\n",
+ "from synapse.ml.train import ComputeModelStatistics\n",
"metrics = ComputeModelStatistics(evaluationMetric=\"classification\", labelCol='Bankrupt?', scoredLabelsCol='prediction').transform(predictions)\n",
"display(metrics)"
],
@@ -354,7 +354,7 @@
"cell_type": "code",
"execution_count": null,
"source": [
- "from mmlspark.lightgbm import LightGBMRegressor\n",
+ "from synapse.ml.lightgbm import LightGBMRegressor\n",
"model = LightGBMRegressor(objective='quantile',\n",
" alpha=0.2,\n",
" learningRate=0.3,\n",
@@ -393,7 +393,7 @@
"cell_type": "code",
"execution_count": null,
"source": [
- "from mmlspark.train import ComputeModelStatistics\n",
+ "from synapse.ml.train import ComputeModelStatistics\n",
"metrics = ComputeModelStatistics(evaluationMetric='regression',\n",
" labelCol='label',\n",
" scoresCol='prediction') \\\n",
@@ -442,7 +442,7 @@
"cell_type": "code",
"execution_count": null,
"source": [
- "from mmlspark.lightgbm import LightGBMRanker\n",
+ "from synapse.ml.lightgbm import LightGBMRanker\n",
"\n",
"features_col = 'features'\n",
"query_col = 'query'\n",
diff --git a/notebooks/ModelInterpretation - Snow Leopard Detection.ipynb b/notebooks/ModelInterpretation - Snow Leopard Detection.ipynb
index 5c9fb57245..6d4052748c 100644
--- a/notebooks/ModelInterpretation - Snow Leopard Detection.ipynb
+++ b/notebooks/ModelInterpretation - Snow Leopard Detection.ipynb
@@ -36,8 +36,8 @@
"cell_type": "code",
"execution_count": null,
"source": [
- "from mmlspark.cognitive import *\n",
- "from mmlspark.core.spark import FluentAPI\n",
+ "from synapse.ml.cognitive import *\n",
+ "from synapse.ml.core.spark import FluentAPI\n",
"from pyspark.sql.functions import lit\n",
"\n",
"def bingPhotoSearch(name, queries, pages):\n",
@@ -199,9 +199,9 @@
"from pyspark.ml.feature import StringIndexer\r\n",
"from pyspark.ml.classification import LogisticRegression\r\n",
"from pyspark.sql.functions import udf\r\n",
- "from mmlspark.downloader import ModelDownloader\r\n",
- "from mmlspark.cntk import ImageFeaturizer\r\n",
- "from mmlspark.stages import UDFTransformer\r\n",
+ "from synapse.ml.downloader import ModelDownloader\r\n",
+ "from synapse.ml.cntk import ImageFeaturizer\r\n",
+ "from synapse.ml.stages import UDFTransformer\r\n",
"from pyspark.sql.types import *\r\n",
"\r\n",
"def getIndex(row):\r\n",
@@ -239,7 +239,7 @@
"execution_count": null,
"source": [
"def plotConfusionMatrix(df, label, prediction, classLabels):\r\n",
- " from mmlspark.plot import confusionMatrix\r\n",
+ " from synapse.ml.plot import confusionMatrix\r\n",
" import matplotlib.pyplot as plt\r\n",
" fig = plt.figure(figsize=(4.5, 4.5))\r\n",
" confusionMatrix(df, label, prediction, classLabels)\r\n",
@@ -258,7 +258,7 @@
"execution_count": null,
"source": [
"import urllib.request\r\n",
- "from mmlspark.lime import ImageLIME\r\n",
+ "from synapse.ml.lime import ImageLIME\r\n",
"\r\n",
"test_image_url = \"https://mmlspark.blob.core.windows.net/graphics/SnowLeopardAD/snow_leopard1.jpg\"\r\n",
"with urllib.request.urlopen(test_image_url) as url:\r\n",
diff --git a/notebooks/ONNX - Inference on Spark.ipynb b/notebooks/ONNX - Inference on Spark.ipynb
index b15266c45c..20d08f4467 100644
--- a/notebooks/ONNX - Inference on Spark.ipynb
+++ b/notebooks/ONNX - Inference on Spark.ipynb
@@ -47,7 +47,7 @@
"execution_count": null,
"source": [
"from pyspark.ml.feature import VectorAssembler\r\n",
- "from mmlspark.lightgbm import LightGBMClassifier\r\n",
+ "from synapse.ml.lightgbm import LightGBMClassifier\r\n",
"\r\n",
"feature_cols = df.columns[1:]\r\n",
"featurizer = VectorAssembler(\r\n",
@@ -119,7 +119,7 @@
"cell_type": "code",
"execution_count": null,
"source": [
- "from mmlspark.onnx import ONNXModel\r\n",
+ "from synapse.ml.onnx import ONNXModel\r\n",
"\r\n",
"onnx_ml = ONNXModel().setModelPayload(model_payload_ml)\r\n",
"\r\n",
diff --git a/notebooks/OpenCV - Pipeline Image Transformations.ipynb b/notebooks/OpenCV - Pipeline Image Transformations.ipynb
index e6b4cda376..34adfcbdc9 100644
--- a/notebooks/OpenCV - Pipeline Image Transformations.ipynb
+++ b/notebooks/OpenCV - Pipeline Image Transformations.ipynb
@@ -31,10 +31,10 @@
" from pyspark.sql import SparkSession\n",
" spark = SparkSession.builder.getOrCreate()\n",
"\n",
- "import mmlspark\n",
+ "import synapse.ml\n",
"import numpy as np\n",
- "from mmlspark.opencv import toNDArray\n",
- "from mmlspark.io import *\n",
+ "from synapse.ml.opencv import toNDArray\n",
+ "from synapse.ml.io import *\n",
"\n",
"imageDir = \"wasbs://publicwasb@mmlspark.blob.core.windows.net/sampleImages\"\n",
"images = spark.read.image().load(imageDir).cache()\n",
@@ -147,7 +147,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.opencv import ImageTransformer\n",
+ "from synapse.ml.opencv import ImageTransformer\n",
"\n",
"tr = (ImageTransformer() # images are resized and then cropped\n",
" .setOutputCol(\"transformed\")\n",
@@ -165,7 +165,7 @@
"metadata": {},
"source": [
"For the advanced image manipulations, use Spark UDFs.\n",
- "The MMLSpark package provides conversion function between *Spark Row* and\n",
+ "The SynapseML package provides conversion function between *Spark Row* and\n",
"*ndarray* image representations."
]
},
@@ -176,7 +176,7 @@
"outputs": [],
"source": [
"from pyspark.sql.functions import udf\n",
- "from mmlspark.opencv import ImageSchema, toNDArray, toImage\n",
+ "from synapse.ml.opencv import ImageSchema, toNDArray, toImage\n",
"\n",
"def u(row):\n",
" array = toNDArray(row) # convert Image to numpy ndarray[height, width, 3]\n",
@@ -204,7 +204,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.image import UnrollImage\n",
+ "from synapse.ml.image import UnrollImage\n",
"\n",
"unroller = UnrollImage().setInputCol(\"noblue\").setOutputCol(\"unrolled\")\n",
"\n",
diff --git a/notebooks/Regression - Auto Imports.ipynb b/notebooks/Regression - Auto Imports.ipynb
index 271a17c751..6ce43ff19e 100644
--- a/notebooks/Regression - Auto Imports.ipynb
+++ b/notebooks/Regression - Auto Imports.ipynb
@@ -13,7 +13,7 @@
"model to predict the automobile's price. The process includes training, testing,\n",
"and evaluating the model on the Automobile Imports data set.\n",
"\n",
- "This sample demonstrates the use of several members of the mmlspark library:\n",
+ "This sample demonstrates the use of several members of the synapseml library:\n",
"- [`TrainRegressor`\n",
" ](http://mmlspark.azureedge.net/docs/pyspark/TrainRegressor.html)\n",
"- [`SummarizeData`\n",
@@ -93,7 +93,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.stages import SummarizeData\n",
+ "from synapse.ml.stages import SummarizeData\n",
"summary = SummarizeData().transform(data)\n",
"summary.toPandas()"
]
@@ -138,7 +138,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.featurize import CleanMissingData\n",
+ "from synapse.ml.featurize import CleanMissingData\n",
"cols = [\"normalized-losses\", \"stroke\", \"bore\", \"horsepower\",\n",
" \"peak-rpm\", \"price\"]\n",
"cleanModel = CleanMissingData().setCleaningMode(\"Median\") \\\n",
@@ -191,7 +191,7 @@
"# train Poisson Regression Model\n",
"from pyspark.ml.regression import GeneralizedLinearRegression\n",
"from pyspark.ml import Pipeline\n",
- "from mmlspark.train import TrainRegressor\n",
+ "from synapse.ml.train import TrainRegressor\n",
"\n",
"glr = GeneralizedLinearRegression(family=\"poisson\", link=\"log\")\n",
"poissonModel = TrainRegressor().setModel(glr).setLabelCol(\"price\").setNumFeatures(256)\n",
@@ -244,7 +244,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.train import ComputeModelStatistics\n",
+ "from synapse.ml.train import ComputeModelStatistics\n",
"poissonMetrics = ComputeModelStatistics().transform(poissonPrediction)\n",
"print(\"Poisson Metrics\")\n",
"poissonMetrics.toPandas()"
@@ -274,7 +274,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.train import ComputePerInstanceStatistics\n",
+ "from synapse.ml.train import ComputePerInstanceStatistics\n",
"def demonstrateEvalPerInstance(pred):\n",
" return ComputePerInstanceStatistics().transform(pred) \\\n",
" .select(\"price\", \"Scores\", \"L1_loss\", \"L2_loss\") \\\n",
diff --git a/notebooks/Regression - Flight Delays with DataCleaning.ipynb b/notebooks/Regression - Flight Delays with DataCleaning.ipynb
index c4340228fc..5eb03604ac 100644
--- a/notebooks/Regression - Flight Delays with DataCleaning.ipynb
+++ b/notebooks/Regression - Flight Delays with DataCleaning.ipynb
@@ -104,7 +104,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.featurize import DataConversion\n",
+ "from synapse.ml.featurize import DataConversion\n",
"flightDelay = DataConversion(cols=[\"Quarter\",\"Month\",\"DayofMonth\",\"DayOfWeek\",\n",
" \"OriginAirportID\",\"DestAirportID\",\n",
" \"CRSDepTime\",\"CRSArrTime\"],\n",
@@ -156,7 +156,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.train import TrainRegressor, TrainedRegressorModel\n",
+ "from synapse.ml.train import TrainRegressor, TrainedRegressorModel\n",
"from pyspark.ml.regression import LinearRegression\n",
"\n",
"trainCat = DataConversion(cols=[\"Carrier\",\"DepTimeBlk\",\"ArrTimeBlk\"],\n",
@@ -200,7 +200,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.train import ComputeModelStatistics\n",
+ "from synapse.ml.train import ComputeModelStatistics\n",
"metrics = ComputeModelStatistics().transform(scoredData)\n",
"metrics.toPandas()"
]
@@ -219,7 +219,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.train import ComputePerInstanceStatistics\n",
+ "from synapse.ml.train import ComputePerInstanceStatistics\n",
"evalPerInstance = ComputePerInstanceStatistics().transform(scoredData)\n",
"evalPerInstance.select(\"ArrDelay\", \"Scores\", \"L1_loss\", \"L2_loss\") \\\n",
" .limit(10).toPandas()"
diff --git a/notebooks/Regression - Flight Delays.ipynb b/notebooks/Regression - Flight Delays.ipynb
index 590915e7cc..74e307c703 100644
--- a/notebooks/Regression - Flight Delays.ipynb
+++ b/notebooks/Regression - Flight Delays.ipynb
@@ -33,7 +33,7 @@
"source": [
"import numpy as np\n",
"import pandas as pd\n",
- "import mmlspark"
+ "import synapse.ml"
]
},
{
@@ -86,7 +86,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.train import TrainRegressor, TrainedRegressorModel\n",
+ "from synapse.ml.train import TrainRegressor, TrainedRegressorModel\n",
"from pyspark.ml.regression import LinearRegression\n",
"from pyspark.ml.feature import StringIndexer\n",
"# Convert columns to categorical\n",
@@ -139,7 +139,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.train import ComputeModelStatistics\n",
+ "from synapse.ml.train import ComputeModelStatistics\n",
"metrics = ComputeModelStatistics().transform(scoredData)\n",
"metrics.toPandas()"
]
@@ -158,7 +158,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.train import ComputePerInstanceStatistics\n",
+ "from synapse.ml.train import ComputePerInstanceStatistics\n",
"evalPerInstance = ComputePerInstanceStatistics().transform(scoredData)\n",
"evalPerInstance.select(\"ArrDelay\", \"Scores\", \"L1_loss\", \"L2_loss\").limit(10).toPandas()"
]
diff --git a/notebooks/Regression - Vowpal Wabbit vs. LightGBM vs. Linear Regressor.ipynb b/notebooks/Regression - Vowpal Wabbit vs. LightGBM vs. Linear Regressor.ipynb
index 51a71519cf..3db12e7222 100644
--- a/notebooks/Regression - Vowpal Wabbit vs. LightGBM vs. Linear Regressor.ipynb
+++ b/notebooks/Regression - Vowpal Wabbit vs. LightGBM vs. Linear Regressor.ipynb
@@ -8,7 +8,7 @@
"\n",
"This notebook shows how to build simple regression models by using \n",
"[Vowpal Wabbit (VW)](https://github.com/VowpalWabbit/vowpal_wabbit) and \n",
- "[LightGBM](https://github.com/microsoft/LightGBM) with MMLSpark.\n",
+ "[LightGBM](https://github.com/microsoft/LightGBM) with SynapseML.\n",
" We also compare the results with \n",
" [Spark MLlib Linear Regression](https://spark.apache.org/docs/latest/ml-classification-regression.html#linear-regression)."
]
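A hedged sketch of the comparison this intro describes: train both regressors on the same featurized data and score them identically. The `train`/`test` DataFrames with "features"/"label" columns are assumptions:

```python
from synapse.ml.vw import VowpalWabbitRegressor
from synapse.ml.lightgbm import LightGBMRegressor
from synapse.ml.train import ComputeModelStatistics

vw_model = VowpalWabbitRegressor(numPasses=20).fit(train)
lgb_model = LightGBMRegressor(numIterations=100).fit(train)
for m in (vw_model, lgb_model):
    ComputeModelStatistics(evaluationMetric="regression",
                           labelCol="label",
                           scoresCol="prediction") \
        .transform(m.transform(test)).show()
```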
@@ -32,9 +32,9 @@
"outputs": [],
"source": [
"import math\n",
- "from mmlspark.train import ComputeModelStatistics\n",
- "from mmlspark.vw import VowpalWabbitRegressor, VowpalWabbitFeaturizer\n",
- "from mmlspark.lightgbm import LightGBMRegressor\n",
+ "from synapse.ml.train import ComputeModelStatistics\n",
+ "from synapse.ml.vw import VowpalWabbitRegressor, VowpalWabbitFeaturizer\n",
+ "from synapse.ml.lightgbm import LightGBMRegressor\n",
"import numpy as np\n",
"import pandas as pd\n",
"from pyspark.ml.feature import VectorAssembler\n",
@@ -392,7 +392,7 @@
"pygments_lexer": "ipython3",
"version": "3.6.8"
},
- "name": "mmlspark example - regression",
+ "name": "synapseml example - regression",
"notebookId": 1395284431467721,
"pycharm": {
"stem_cell": {
diff --git a/notebooks/SparkServing - Deploying a Classifier.ipynb b/notebooks/SparkServing - Deploying a Classifier.ipynb
index 854ae260a0..1fb23adf6e 100644
--- a/notebooks/SparkServing - Deploying a Classifier.ipynb
+++ b/notebooks/SparkServing - Deploying a Classifier.ipynb
@@ -73,7 +73,7 @@
},
"outputs": [],
"source": [
- "from mmlspark.train import TrainClassifier\n",
+ "from synapse.ml.train import TrainClassifier\n",
"from pyspark.ml.classification import LogisticRegression\n",
"model = TrainClassifier(model=LogisticRegression(), labelCol=\"income\", numFeatures=256).fit(train)"
]
@@ -91,7 +91,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.train import ComputeModelStatistics, TrainedClassifierModel\n",
+ "from synapse.ml.train import ComputeModelStatistics, TrainedClassifierModel\n",
"prediction = model.transform(test)\n",
"prediction.printSchema()"
]
@@ -111,7 +111,7 @@
"metadata": {},
"source": [
"First, we will define the webservice input/output.\n",
- "For more information, you can visit the [documentation for Spark Serving](https://github.com/Azure/mmlspark/blob/master/docs/mmlspark-serving.md)"
+ "For more information, you can visit the [documentation for Spark Serving](https://github.com/Microsoft/SynapseML/blob/master/docs/mmlspark-serving.md)"
]
},
{
@@ -121,7 +121,7 @@
"outputs": [],
"source": [
"from pyspark.sql.types import *\n",
- "from mmlspark.io import *\n",
+ "from synapse.ml.io import *\n",
"import uuid\n",
"\n",
"serving_inputs = spark.readStream.server() \\\n",
diff --git a/notebooks/TextAnalytics - Amazon Book Reviews with Word2Vec.ipynb b/notebooks/TextAnalytics - Amazon Book Reviews with Word2Vec.ipynb
index 9cd06cdd91..d65eb4823d 100644
--- a/notebooks/TextAnalytics - Amazon Book Reviews with Word2Vec.ipynb
+++ b/notebooks/TextAnalytics - Amazon Book Reviews with Word2Vec.ipynb
@@ -129,7 +129,7 @@
"execution_count": null,
"source": [
"from pyspark.ml.classification import LogisticRegression, RandomForestClassifier, GBTClassifier\r\n",
- "from mmlspark.train import TrainClassifier\r\n",
+ "from synapse.ml.train import TrainClassifier\r\n",
"import itertools\r\n",
"\r\n",
"lrHyperParams = [0.05, 0.2]\r\n",
@@ -166,7 +166,7 @@
"cell_type": "code",
"execution_count": null,
"source": [
- "from mmlspark.automl import FindBestModel\r\n",
+ "from synapse.ml.automl import FindBestModel\r\n",
"bestModel = FindBestModel(evaluationMetric=\"AUC\", models=trainedModels).fit(ptest)\r\n",
"bestModel.getRocCurve().show()\r\n",
"bestModel.getBestModelMetrics().show()\r\n",
@@ -186,7 +186,7 @@
"cell_type": "code",
"execution_count": null,
"source": [
- "from mmlspark.train import ComputeModelStatistics\r\n",
+ "from synapse.ml.train import ComputeModelStatistics\r\n",
"predictions = bestModel.transform(pvalidation)\r\n",
"metrics = ComputeModelStatistics().transform(predictions)\r\n",
"print(\"Best model's accuracy on validation set = \"\r\n",
diff --git a/notebooks/TextAnalytics - Amazon Book Reviews.ipynb b/notebooks/TextAnalytics - Amazon Book Reviews.ipynb
index e700ea2beb..033fed0bb3 100644
--- a/notebooks/TextAnalytics - Amazon Book Reviews.ipynb
+++ b/notebooks/TextAnalytics - Amazon Book Reviews.ipynb
@@ -54,7 +54,7 @@
"cell_type": "code",
"execution_count": null,
"source": [
- "from mmlspark.featurize.text import TextFeaturizer\r\n",
+ "from synapse.ml.featurize.text import TextFeaturizer\r\n",
"textFeaturizer = TextFeaturizer() \\\r\n",
" .setInputCol(\"text\").setOutputCol(\"features\") \\\r\n",
" .setUseStopWordsRemover(True).setUseIDF(True).setMinDocFreq(5).setNumFeatures(1 << 16).fit(data)"
@@ -108,7 +108,7 @@
"lrHyperParams = [0.05, 0.1, 0.2, 0.4]\r\n",
"logisticRegressions = [LogisticRegression(regParam = hyperParam) for hyperParam in lrHyperParams]\r\n",
"\r\n",
- "from mmlspark.train import TrainClassifier\r\n",
+ "from synapse.ml.train import TrainClassifier\r\n",
"lrmodels = [TrainClassifier(model=lrm, labelCol=\"label\").fit(train) for lrm in logisticRegressions]"
],
"outputs": [],
@@ -125,7 +125,7 @@
"cell_type": "code",
"execution_count": null,
"source": [
- "from mmlspark.automl import FindBestModel, BestModel\r\n",
+ "from synapse.ml.automl import FindBestModel, BestModel\r\n",
"bestModel = FindBestModel(evaluationMetric=\"AUC\", models=lrmodels).fit(test)\r\n",
"bestModel.getRocCurve().show()\r\n",
"bestModel.getBestModelMetrics().show()\r\n",
@@ -145,7 +145,7 @@
"cell_type": "code",
"execution_count": null,
"source": [
- "from mmlspark.train import ComputeModelStatistics\r\n",
+ "from synapse.ml.train import ComputeModelStatistics\r\n",
"predictions = bestModel.transform(validation)\r\n",
"metrics = ComputeModelStatistics().transform(predictions)\r\n",
"print(\"Best model's accuracy on validation set = \"\r\n",
diff --git a/notebooks/Vowpal Wabbit - Overview.ipynb b/notebooks/Vowpal Wabbit - Overview.ipynb
index 0934c36ebc..dab08ad9b9 100644
--- a/notebooks/Vowpal Wabbit - Overview.ipynb
+++ b/notebooks/Vowpal Wabbit - Overview.ipynb
@@ -149,7 +149,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.vw import VowpalWabbitFeaturizer\n",
+ "from synapse.ml.vw import VowpalWabbitFeaturizer\n",
"featurizer = VowpalWabbitFeaturizer(inputCols=df.columns[:-1], outputCol=\"features\")\n",
"train_data = featurizer.transform(train)[\"target\", \"features\"]\n",
"test_data = featurizer.transform(test)[\"target\", \"features\"]"
@@ -177,7 +177,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.vw import VowpalWabbitClassifier\n",
+ "from synapse.ml.vw import VowpalWabbitClassifier\n",
"model = VowpalWabbitClassifier(numPasses=20, labelCol=\"target\", featuresCol=\"features\").fit(train_data)"
]
},
@@ -204,7 +204,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.train import ComputeModelStatistics\n",
+ "from synapse.ml.train import ComputeModelStatistics\n",
"metrics = ComputeModelStatistics(evaluationMetric='classification', labelCol='target', scoredLabelsCol='prediction').transform(predictions)\n",
"display(metrics)"
]
@@ -213,7 +213,7 @@
"source": [
"## Adult Census with VowpalWabbitClassifier\n",
"\n",
- "In this example, we predict incomes from the Adult Census dataset using Vowpal Wabbit (VW) Classifier in MMLSpark."
+ "In this example, we predict incomes from the Adult Census dataset using Vowpal Wabbit (VW) Classifier in SynapseML."
],
"cell_type": "markdown",
"metadata": {}
@@ -256,7 +256,7 @@
"source": [
"from pyspark.sql.functions import when, col\n",
"from pyspark.ml import Pipeline\n",
- "from mmlspark.vw import VowpalWabbitFeaturizer, VowpalWabbitClassifier\n",
+ "from synapse.ml.vw import VowpalWabbitFeaturizer, VowpalWabbitClassifier\n",
"\n",
"# Define classification label\n",
"train = train.withColumn(\"label\", when(col(\"income\").contains(\"<\"), 0.0).otherwise(1.0)).repartition(1)\n",
@@ -334,7 +334,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.train import ComputeModelStatistics\n",
+ "from synapse.ml.train import ComputeModelStatistics\n",
"metrics = ComputeModelStatistics(evaluationMetric=\"classification\", \n",
" labelCol=\"label\", \n",
" scoredLabelsCol=\"prediction\").transform(prediction)\n",
@@ -372,8 +372,8 @@
"from matplotlib.colors import ListedColormap, Normalize\n",
"from matplotlib.cm import get_cmap\n",
"import matplotlib.pyplot as plt\n",
- "from mmlspark.train import ComputeModelStatistics\n",
- "from mmlspark.vw import VowpalWabbitRegressor, VowpalWabbitFeaturizer\n",
+ "from synapse.ml.train import ComputeModelStatistics\n",
+ "from synapse.ml.vw import VowpalWabbitRegressor, VowpalWabbitFeaturizer\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.datasets import load_boston"
@@ -628,7 +628,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.vw import VowpalWabbitRegressor\n",
+ "from synapse.ml.vw import VowpalWabbitRegressor\n",
"model = (VowpalWabbitRegressor(numPasses=20, args=\"--holdout_off --loss_function quantile -q :: -l 0.1\")\n",
" .fit(train))"
]
@@ -656,7 +656,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.train import ComputeModelStatistics\n",
+ "from synapse.ml.train import ComputeModelStatistics\n",
"metrics = ComputeModelStatistics(evaluationMetric='regression',\n",
" labelCol='label',\n",
" scoresCol='prediction') \\\n",
@@ -733,7 +733,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from mmlspark.vw import VowpalWabbitFeaturizer, VowpalWabbitContextualBandit, VectorZipper\n",
+ "from synapse.ml.vw import VowpalWabbitFeaturizer, VowpalWabbitContextualBandit, VectorZipper\n",
"from pyspark.ml import Pipeline\n",
"pipeline = Pipeline(stages=[\n",
" VowpalWabbitFeaturizer(inputCols=['GUser_id'], outputCol='GUser_id_feature'),\n",
diff --git a/opencv/src/main/python/mmlspark/opencv/ImageTransformer.py b/opencv/src/main/python/synapse/ml/opencv/ImageTransformer.py
similarity index 98%
rename from opencv/src/main/python/mmlspark/opencv/ImageTransformer.py
rename to opencv/src/main/python/synapse/ml/opencv/ImageTransformer.py
index 1ef0a210b6..6f50bd825e 100644
--- a/opencv/src/main/python/mmlspark/opencv/ImageTransformer.py
+++ b/opencv/src/main/python/synapse/ml/opencv/ImageTransformer.py
@@ -13,7 +13,7 @@
from pyspark.sql.types import *
from pyspark.sql.types import Row, _create_row
import numpy as np
-from mmlspark.opencv._ImageTransformer import _ImageTransformer
+from synapse.ml.opencv._ImageTransformer import _ImageTransformer
ImageFields = ["origin", "height", "width", "nChannels", "mode", "data"]
diff --git a/deep-learning/src/main/python/mmlspark/onnx/__init__.py b/opencv/src/main/python/synapse/ml/opencv/__init__.py
similarity index 100%
rename from deep-learning/src/main/python/mmlspark/onnx/__init__.py
rename to opencv/src/main/python/synapse/ml/opencv/__init__.py
diff --git a/opencv/src/main/scala/com/microsoft/ml/spark/opencv/ImageSetAugmenter.scala b/opencv/src/main/scala/com/microsoft/azure/synapse/ml/opencv/ImageSetAugmenter.scala
similarity index 91%
rename from opencv/src/main/scala/com/microsoft/ml/spark/opencv/ImageSetAugmenter.scala
rename to opencv/src/main/scala/com/microsoft/azure/synapse/ml/opencv/ImageSetAugmenter.scala
index ae89e80dd9..1054ae684c 100644
--- a/opencv/src/main/scala/com/microsoft/ml/spark/opencv/ImageSetAugmenter.scala
+++ b/opencv/src/main/scala/com/microsoft/azure/synapse/ml/opencv/ImageSetAugmenter.scala
@@ -1,11 +1,11 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.opencv
+package com.microsoft.azure.synapse.ml.opencv
-import com.microsoft.ml.spark.codegen.Wrappable
-import com.microsoft.ml.spark.core.contracts.{HasInputCol, HasOutputCol}
-import com.microsoft.ml.spark.logging.BasicLogging
+import com.microsoft.azure.synapse.ml.codegen.Wrappable
+import com.microsoft.azure.synapse.ml.core.contracts.{HasInputCol, HasOutputCol}
+import com.microsoft.azure.synapse.ml.logging.BasicLogging
import org.apache.spark.ml._
import org.apache.spark.ml.image.ImageSchema
import org.apache.spark.ml.param._
diff --git a/opencv/src/main/scala/com/microsoft/ml/spark/opencv/ImageTransformer.scala b/opencv/src/main/scala/com/microsoft/azure/synapse/ml/opencv/ImageTransformer.scala
similarity index 98%
rename from opencv/src/main/scala/com/microsoft/ml/spark/opencv/ImageTransformer.scala
rename to opencv/src/main/scala/com/microsoft/azure/synapse/ml/opencv/ImageTransformer.scala
index 9caa45c54f..795c5c458c 100644
--- a/opencv/src/main/scala/com/microsoft/ml/spark/opencv/ImageTransformer.scala
+++ b/opencv/src/main/scala/com/microsoft/azure/synapse/ml/opencv/ImageTransformer.scala
@@ -1,12 +1,12 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.opencv
+package com.microsoft.azure.synapse.ml.opencv
-import com.microsoft.ml.spark.codegen.Wrappable
-import com.microsoft.ml.spark.core.contracts.{HasInputCol, HasOutputCol}
-import com.microsoft.ml.spark.core.schema.{BinaryFileSchema, ImageSchemaUtils}
-import com.microsoft.ml.spark.logging.BasicLogging
+import com.microsoft.azure.synapse.ml.codegen.Wrappable
+import com.microsoft.azure.synapse.ml.core.contracts.{HasInputCol, HasOutputCol}
+import com.microsoft.azure.synapse.ml.core.schema.{BinaryFileSchema, ImageSchemaUtils}
+import com.microsoft.azure.synapse.ml.logging.BasicLogging
import org.apache.spark.injections.UDFUtils
import org.apache.spark.ml.image.ImageSchema
import org.apache.spark.ml.param._
@@ -200,7 +200,7 @@ object Flip {
}
/** Blurs the image using a box filter.
- * The com.microsoft.ml.spark.core.serialize.params are a map of the dimensions of the blurring box. Please refer to
+ * The params are a map of the dimensions of the blurring box. Please refer to
* [[http://docs.opencv.org/2.4/modules/imgproc/doc/filtering.html#blur OpenCV]] for more information.
*
* @param params Map of parameters and values
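On the Python side, the box filter described above is reachable through `ImageTransformer`; a hedged sketch (the `blur` argument order, box height then width, is an assumption):

```python
from synapse.ml.opencv import ImageTransformer

tr = ImageTransformer().setOutputCol("blurred").blur(5.0, 5.0)
```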
diff --git a/opencv/src/main/scala/com/microsoft/ml/spark/opencv/OpenCVUtils.scala b/opencv/src/main/scala/com/microsoft/azure/synapse/ml/opencv/OpenCVUtils.scala
similarity index 80%
rename from opencv/src/main/scala/com/microsoft/ml/spark/opencv/OpenCVUtils.scala
rename to opencv/src/main/scala/com/microsoft/azure/synapse/ml/opencv/OpenCVUtils.scala
index 411d4234de..57fa63a0d2 100644
--- a/opencv/src/main/scala/com/microsoft/ml/spark/opencv/OpenCVUtils.scala
+++ b/opencv/src/main/scala/com/microsoft/azure/synapse/ml/opencv/OpenCVUtils.scala
@@ -1,9 +1,9 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.opencv
+package com.microsoft.azure.synapse.ml.opencv
-import com.microsoft.ml.spark.core.env.NativeLoader
+import com.microsoft.azure.synapse.ml.core.env.NativeLoader
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.catalyst.encoders.RowEncoder
@@ -21,12 +21,12 @@ object OpenCVUtils {
new NativeLoader("/nu/pattern/opencv").loadLibraryByName(Core.NATIVE_LIBRARY_NAME)
}
- private[spark] def loadOpenCVFunc[A](it: Iterator[A]) = {
+ private[ml] def loadOpenCVFunc[A](it: Iterator[A]) = {
OpenCVLoader
it
}
- private[spark] def loadOpenCV(df: DataFrame): DataFrame = {
+ private[ml] def loadOpenCV(df: DataFrame): DataFrame = {
val encoder = RowEncoder(df.schema)
df.mapPartitions(loadOpenCVFunc)(encoder)
}
diff --git a/opencv/src/test/scala/com/microsoft/ml/spark/image/ResizeImageTransformerSuite.scala b/opencv/src/test/scala/com/microsoft/azure/synapse/ml/image/ResizeImageTransformerSuite.scala
similarity index 91%
rename from opencv/src/test/scala/com/microsoft/ml/spark/image/ResizeImageTransformerSuite.scala
rename to opencv/src/test/scala/com/microsoft/azure/synapse/ml/image/ResizeImageTransformerSuite.scala
index b20b309bb0..eb6eba5750 100644
--- a/opencv/src/test/scala/com/microsoft/ml/spark/image/ResizeImageTransformerSuite.scala
+++ b/opencv/src/test/scala/com/microsoft/azure/synapse/ml/image/ResizeImageTransformerSuite.scala
@@ -1,20 +1,20 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.image
+package com.microsoft.azure.synapse.ml.image
-import java.io.File
-import java.net.URL
-
-import com.microsoft.ml.spark.core.env.FileUtilities
-import com.microsoft.ml.spark.core.test.fuzzing.{TestObject, TransformerFuzzing}
-import com.microsoft.ml.spark.io.IOImplicits._
-import com.microsoft.ml.spark.opencv.{ImageTransformer, OpenCVTestUtils}
+import com.microsoft.azure.synapse.ml.core.env.FileUtilities
+import com.microsoft.azure.synapse.ml.core.test.fuzzing.{TestObject, TransformerFuzzing}
+import com.microsoft.azure.synapse.ml.io.IOImplicits._
+import com.microsoft.azure.synapse.ml.opencv.{ImageTransformer, OpenCVTestUtils}
import org.apache.commons.io.FileUtils
import org.apache.spark.ml.linalg.DenseVector
import org.apache.spark.ml.util.MLReadable
import org.apache.spark.sql.{DataFrame, Row}
+import java.io.File
+import java.net.URL
+
class ResizeImageTransformerSuite extends TransformerFuzzing[ResizeImageTransformer]
with OpenCVTestUtils {
diff --git a/opencv/src/test/scala/com/microsoft/ml/spark/opencv/ImageSetAugmenterSuite.scala b/opencv/src/test/scala/com/microsoft/azure/synapse/ml/opencv/ImageSetAugmenterSuite.scala
similarity index 78%
rename from opencv/src/test/scala/com/microsoft/ml/spark/opencv/ImageSetAugmenterSuite.scala
rename to opencv/src/test/scala/com/microsoft/azure/synapse/ml/opencv/ImageSetAugmenterSuite.scala
index 427f84d08f..eefbe172bb 100644
--- a/opencv/src/test/scala/com/microsoft/ml/spark/opencv/ImageSetAugmenterSuite.scala
+++ b/opencv/src/test/scala/com/microsoft/azure/synapse/ml/opencv/ImageSetAugmenterSuite.scala
@@ -1,12 +1,12 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.opencv
+package com.microsoft.azure.synapse.ml.opencv
-import com.microsoft.ml.spark.build.BuildInfo
-import com.microsoft.ml.spark.core.test.base.LinuxOnly
-import com.microsoft.ml.spark.core.test.fuzzing.{TestObject, TransformerFuzzing}
-import com.microsoft.ml.spark.io.IOImplicits._
+import com.microsoft.azure.synapse.ml.core.test.base.LinuxOnly
+import com.microsoft.azure.synapse.ml.core.test.fuzzing.{TestObject, TransformerFuzzing}
+import com.microsoft.azure.synapse.ml.io.IOImplicits._
+import com.microsoft.azure.synapse.ml.build.BuildInfo
import org.apache.spark.ml.util.MLReadable
import org.apache.spark.sql.DataFrame
diff --git a/opencv/src/test/scala/com/microsoft/ml/spark/opencv/ImageTransformerSuite.scala b/opencv/src/test/scala/com/microsoft/azure/synapse/ml/opencv/ImageTransformerSuite.scala
similarity index 97%
rename from opencv/src/test/scala/com/microsoft/ml/spark/opencv/ImageTransformerSuite.scala
rename to opencv/src/test/scala/com/microsoft/azure/synapse/ml/opencv/ImageTransformerSuite.scala
index 554b2d0776..9b44b20b68 100644
--- a/opencv/src/test/scala/com/microsoft/ml/spark/opencv/ImageTransformerSuite.scala
+++ b/opencv/src/test/scala/com/microsoft/azure/synapse/ml/opencv/ImageTransformerSuite.scala
@@ -1,13 +1,13 @@
// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.
-package com.microsoft.ml.spark.opencv
+package com.microsoft.azure.synapse.ml.opencv
-import com.microsoft.ml.spark.build.BuildInfo
-import com.microsoft.ml.spark.core.env.FileUtilities
-import com.microsoft.ml.spark.core.test.fuzzing.{TestObject, TransformerFuzzing}
-import com.microsoft.ml.spark.image.{UnrollBinaryImage, UnrollImage}
-import com.microsoft.ml.spark.io.IOImplicits._
+import com.microsoft.azure.synapse.ml.core.env.FileUtilities
+import com.microsoft.azure.synapse.ml.core.test.fuzzing.{TestObject, TransformerFuzzing}
+import com.microsoft.azure.synapse.ml.image.{UnrollBinaryImage, UnrollImage}
+import com.microsoft.azure.synapse.ml.io.IOImplicits._
+import com.microsoft.azure.synapse.ml.build.BuildInfo
import org.apache.hadoop.fs.Path
import org.apache.spark.ml.linalg.DenseVector
import org.apache.spark.ml.param.DataFrameEquality
diff --git a/pipeline.yaml b/pipeline.yaml
index ed4d28fc99..a7794b2b51 100644
--- a/pipeline.yaml
+++ b/pipeline.yaml
@@ -56,7 +56,7 @@ jobs:
azureSubscription: 'MMLSpark Build'
keyVaultName: mmlspark-keys
- bash: |
- source activate mmlspark
+      source activate synapseml
sbt packagePython
sbt publishBlob publishDocs publishR publishPython
sbt genBuildInfo
@@ -97,7 +97,7 @@ jobs:
azureSubscription: 'MMLSpark Build'
keyVaultName: mmlspark-keys
- bash: |
- source activate mmlspark
+ source activate synapseml
sbt packagePython
sbt publishBlob
displayName: Publish Blob Artifacts
@@ -113,7 +113,7 @@ jobs:
inputs:
azureSubscription: 'MMLSpark Build'
scriptLocation: inlineScript
- inlineScript: 'sbt "testOnly com.microsoft.ml.spark.nbtest.DatabricksTests"'
+ inlineScript: 'sbt "testOnly com.microsoft.azure.synapse.ml.nbtest.DatabricksTests"'
condition: and(succeeded(), eq(variables.runTests, 'True'))
- task: PublishTestResults@2
displayName: 'Publish Test Results'
@@ -138,7 +138,7 @@ jobs:
azureSubscription: 'MMLSpark Build'
keyVaultName: mmlspark-keys
- bash: |
- source activate mmlspark
+ source activate synapseml
jupyter nbconvert --to script ./notebooks/*.ipynb*
sbt packagePython
sbt publishBlob
@@ -155,7 +155,7 @@ jobs:
inputs:
azureSubscription: 'MMLSpark Build'
scriptLocation: inlineScript
- inlineScript: 'sbt "testOnly com.microsoft.ml.spark.nbtest.SynapseTests"'
+ inlineScript: 'sbt "testOnly com.microsoft.azure.synapse.ml.nbtest.SynapseTests"'
condition: and(succeeded(), eq(variables.runTests, 'True'))
- task: PublishTestResults@2
displayName: 'Publish Test Results'
@@ -278,7 +278,7 @@ jobs:
azureSubscription: 'MMLSpark Build'
scriptLocation: inlineScript
inlineScript: |
- source activate mmlspark
+ source activate synapseml
(timeout 5m sbt setup) || (echo "retrying" && timeout 5m sbt setup) || (echo "retrying" && timeout 5m sbt setup)
(sbt coverage testPython) || (sbt coverage testPython) || (sbt coverage testPython)
- task: PublishTestResults@2
@@ -328,7 +328,7 @@ jobs:
azureSubscription: 'MMLSpark Build'
scriptLocation: inlineScript
inlineScript: |
- source activate mmlspark
+ source activate synapseml
(timeout 5m sbt setup) || (echo "retrying" && timeout 5m sbt setup) || (echo "retrying" && timeout 5m sbt setup)
sbt coverage testR
- task: PublishTestResults@2
@@ -447,9 +447,9 @@ jobs:
sudo apt-get update && sudo apt-get install ffmpeg libgstreamer1.0-0 \
gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly -y)
export SBT_OPTS="-Xmx2G -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=2G -Xss2M -Duser.timezone=GMT"
- (timeout 20m sbt coverage "testOnly com.microsoft.ml.spark.$(PACKAGE).**") ||
- (${FLAKY:-false} && timeout 20m sbt coverage "testOnly com.microsoft.ml.spark.$(PACKAGE).**") ||
- (${FLAKY:-false} && timeout 20m sbt coverage "testOnly com.microsoft.ml.spark.$(PACKAGE).**")
+ (timeout 20m sbt coverage "testOnly com.microsoft.azure.synapse.ml.$(PACKAGE).**") ||
+ (${FLAKY:-false} && timeout 20m sbt coverage "testOnly com.microsoft.azure.synapse.ml.$(PACKAGE).**") ||
+ (${FLAKY:-false} && timeout 20m sbt coverage "testOnly com.microsoft.azure.synapse.ml.$(PACKAGE).**")
- task: PublishTestResults@2
displayName: 'Publish Test Results'
diff --git a/project/BlobMavenPlugin.scala b/project/BlobMavenPlugin.scala
index de8114172e..7008c69964 100644
--- a/project/BlobMavenPlugin.scala
+++ b/project/BlobMavenPlugin.scala
@@ -11,7 +11,7 @@ object BlobMavenPlugin extends AutoPlugin {
override def trigger = allRequirements
object autoImport {
- val publishBlob = TaskKey[Unit]("publishBlob", "publish the library to mmlspark blob")
+ val publishBlob = TaskKey[Unit]("publishBlob", "publish the library to synapseml blob")
val blobArtifactInfo = SettingKey[String]("blobArtifactInfo")
}
@@ -34,7 +34,7 @@ object BlobMavenPlugin extends AutoPlugin {
},
blobArtifactInfo := {
s"""
- |MMLSpark Build and Release Information
+ |SynapseML Build and Release Information
|---------------
|
|### Maven Coordinates
diff --git a/project/CodegenPlugin.scala b/project/CodegenPlugin.scala
index 91b42c1446..bcc0b21654 100644
--- a/project/CodegenPlugin.scala
+++ b/project/CodegenPlugin.scala
@@ -37,8 +37,11 @@ object CodegenPlugin extends AutoPlugin {
object autoImport {
val pythonizedVersion = settingKey[String]("Pythonized version")
val rVersion = settingKey[String]("R version")
- val genPackageNamespace = settingKey[String]("genPackageNamespace")
+ val genPyPackageNamespace = settingKey[String]("genPyPackageNamespace")
+ val genRPackageNamespace = settingKey[String]("genRPackageNamespace")
+
val genTestPackageNamespace = settingKey[String]("genTestPackageNamespace")
+
val codegenJarName = settingKey[Option[String]]("codegenJarName")
val testgenJarName = settingKey[Option[String]]("testgenJarName")
val codegenArgs = settingKey[String]("codegenArgs")
@@ -75,9 +78,9 @@ object CodegenPlugin extends AutoPlugin {
packageR.value
publishLocal.value
val libPath = join(condaEnvLocation.value, "Lib", "R", "library").toString
- val rSrcDir = join(codegenDir.value, "src", "R", genPackageNamespace.value)
+ val rSrcDir = join(codegenDir.value, "src", "R", genRPackageNamespace.value)
rCmd(activateCondaEnv.value,
- Seq("R", "CMD", "INSTALL", "--no-multiarch", "--with-keep.source", genPackageNamespace.value),
+ Seq("R", "CMD", "INSTALL", "--no-multiarch", "--with-keep.source", genRPackageNamespace.value),
rSrcDir.getParentFile, libPath)
val testRunner = join("tools", "tests", "run_r_tests.R")
if (join(rSrcDir,"tests").exists()){
@@ -91,7 +94,7 @@ object CodegenPlugin extends AutoPlugin {
(Test / compile).value
val arg = testgenArgs.value
Def.task {
- (Test / runMain).toTask(s" com.microsoft.ml.spark.codegen.TestGen $arg").value
+ (Test / runMain).toTask(s" com.microsoft.azure.synapse.ml.codegen.TestGen $arg").value
}
} tag(TestGenTag)
@@ -107,7 +110,7 @@ object CodegenPlugin extends AutoPlugin {
version.value,
pythonizedVersion.value,
rVersion.value,
- genPackageNamespace.value
+ genPyPackageNamespace.value
).toJson.compactPrint
},
testgenArgs := {
@@ -119,7 +122,7 @@ object CodegenPlugin extends AutoPlugin {
version.value,
pythonizedVersion.value,
rVersion.value,
- genPackageNamespace.value
+ genPyPackageNamespace.value
).toJson.compactPrint
},
codegenJarName := {
@@ -141,7 +144,7 @@ object CodegenPlugin extends AutoPlugin {
(Test / compile).value
val arg = codegenArgs.value
Def.task {
- (Compile / runMain).toTask(s" com.microsoft.ml.spark.codegen.CodeGen $arg").value
+ (Compile / runMain).toTask(s" com.microsoft.azure.synapse.ml.codegen.CodeGen $arg").value
}
}.value),
testgen := testGenImpl.value,
@@ -162,7 +165,7 @@ object CodegenPlugin extends AutoPlugin {
packageR := {
createCondaEnvTask.value
codegen.value
- val rSrcDir = join(codegenDir.value, "src", "R", genPackageNamespace.value)
+ val rSrcDir = join(codegenDir.value, "src", "R", genRPackageNamespace.value)
val rPackageDir = join(codegenDir.value, "package", "R")
val libPath = join(condaEnvLocation.value, "Lib", "R", "library").toString
rCmd(activateCondaEnv.value, Seq("R", "-q", "-e", "roxygen2::roxygenise()"), rSrcDir, libPath)
@@ -180,11 +183,11 @@ object CodegenPlugin extends AutoPlugin {
packagePython := {
codegen.value
createCondaEnvTask.value
- val destPyDir = join(targetDir.value, "classes", genPackageNamespace.value)
+ val destPyDir = join(targetDir.value, "classes", genPyPackageNamespace.value)
val packageDir = join(codegenDir.value, "package", "python").absolutePath
val pythonSrcDir = join(codegenDir.value, "src", "python")
if (destPyDir.exists()) FileUtils.forceDelete(destPyDir)
- val sourcePyDir = join(pythonSrcDir.getAbsolutePath, genPackageNamespace.value)
+ val sourcePyDir = join(pythonSrcDir.getAbsolutePath, genPyPackageNamespace.value)
FileUtils.copyDirectory(sourcePyDir, destPyDir)
runCmd(
activateCondaEnv.value ++
@@ -208,8 +211,8 @@ object CodegenPlugin extends AutoPlugin {
version.value + "/" + fn, "pip")
},
mergePyCode := {
- val srcDir = join(codegenDir.value, "src", "python", genPackageNamespace.value)
- val destDir = join(mergePyCodeDir.value, "src", "python", genPackageNamespace.value)
+ val srcDir = join(codegenDir.value, "src", "python", genPyPackageNamespace.value)
+ val destDir = join(mergePyCodeDir.value, "src", "python", genPyPackageNamespace.value)
FileUtils.copyDirectory(srcDir, destDir)
},
testPython := {
@@ -220,7 +223,7 @@ object CodegenPlugin extends AutoPlugin {
activateCondaEnv.value ++ Seq("python",
"-m",
"pytest",
- s"--cov=${genPackageNamespace.value}",
+ s"--cov=${genPyPackageNamespace.value}",
s"--junitxml=${join(mainTargetDir, s"python-test-results-${name.value}.xml")}",
"--cov-report=xml",
genTestPackageNamespace.value
@@ -237,11 +240,14 @@ object CodegenPlugin extends AutoPlugin {
codegenDir := {
join(targetDir.value, "generated")
},
- genPackageNamespace := {
- "mmlspark"
+ genPyPackageNamespace := {
+ "synapse"
+ },
+ genRPackageNamespace := {
+ "synapseml"
},
genTestPackageNamespace := {
- "mmlsparktest"
+ "synapsemltest"
}
)
diff --git a/project/CondaPlugin.scala b/project/CondaPlugin.scala
index 4e3e3ce005..ca9c602f47 100644
--- a/project/CondaPlugin.scala
+++ b/project/CondaPlugin.scala
@@ -18,7 +18,7 @@ object CondaPlugin extends AutoPlugin {
import autoImport._
override lazy val globalSettings: Seq[Setting[_]] = Seq(
- condaEnvName := "mmlspark",
+ condaEnvName := "synapseml",
cleanCondaEnvTask := {
runCmd(Seq("conda", "env", "remove", "--name", condaEnvName.value, "-y"))
},
diff --git a/scalastyle-config.xml b/scalastyle-config.xml
index 8a4b5a81b1..2f2e4e631b 100644
--- a/scalastyle-config.xml
+++ b/scalastyle-config.xml
@@ -12,7 +12,7 @@