diff --git a/README.md b/README.md
index 42987cbd7a..01adb5d27e 100644
--- a/README.md
+++ b/README.md
@@ -137,9 +137,9 @@ MMLSpark can be conveniently installed on existing Spark clusters
via the `--packages` option, examples:

  ```bash
-  spark-shell --packages Azure:mmlspark:0.12
-  pyspark --packages Azure:mmlspark:0.12
-  spark-submit --packages Azure:mmlspark:0.12 MyApp.jar
+  spark-shell --packages Azure:mmlspark:0.13
+  pyspark --packages Azure:mmlspark:0.13
+  spark-submit --packages Azure:mmlspark:0.13 MyApp.jar
  ```

This can be used in other Spark contexts too, for example, you can use MMLSpark
@@ -156,7 +156,7 @@ the above example, or from python:

  ```python
  import pyspark
  spark = pyspark.sql.SparkSession.builder.appName("MyApp") \
-             .config("spark.jars.packages", "Azure:mmlspark:0.12") \
+             .config("spark.jars.packages", "Azure:mmlspark:0.13") \
              .getOrCreate()
  import mmlspark
  ```
@@ -172,7 +172,7 @@ running script actions, see [this
guide](https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-customize-cluster-linux#use-a-script-action-during-cluster-creation).

The script action URL is:
-<https://mmlspark.azureedge.net/buildartifacts/0.12/install-mmlspark.sh>.
+<https://mmlspark.azureedge.net/buildartifacts/0.13/install-mmlspark.sh>.

If you're using the Azure Portal to run the script action, go to `Script
actions` → `Submit new` in the `Overview` section of your cluster blade.  In
@@ -188,7 +188,7 @@ cloud](http://community.cloud.databricks.com), create a new [library from Maven
coordinates](https://docs.databricks.com/user-guide/libraries.html#libraries-from-maven-pypi-or-spark-packages)
in your workspace.

-For the coordinates use: `Azure:mmlspark:0.12`.  Ensure this library is
+For the coordinates use: `Azure:mmlspark:0.13`.  Ensure this library is
attached to all clusters you create.

Finally, ensure that your Spark cluster has at least Spark 2.1 and Scala 2.11.
@@ -202,7 +202,7 @@ your `build.sbt`:

  ```scala
  resolvers += "MMLSpark Repo" at "https://mmlspark.azureedge.net/maven"
-  libraryDependencies += "com.microsoft.ml.spark" %% "mmlspark" % "0.12"
+  libraryDependencies += "com.microsoft.ml.spark" %% "mmlspark" % "0.13"
  ```

### Building from source

diff --git a/docs/R-setup.md b/docs/R-setup.md
index d9665a3eef..96d434f31d 100644
--- a/docs/R-setup.md
+++ b/docs/R-setup.md
@@ -10,7 +10,7 @@ To install the current MMLSpark package for R use:

  ```R
  ...
-  devtools::install_url("https://mmlspark.azureedge.net/rrr/mmlspark-0.12.zip")
+  devtools::install_url("https://mmlspark.azureedge.net/rrr/mmlspark-0.13.zip")
  ...
  ```

@@ -23,7 +23,7 @@ It will take some time to install all dependencies.  Then, run:

  library(sparklyr)
  library(dplyr)
  config <- spark_config()
-  config$sparklyr.defaultPackages <- "Azure:mmlspark:0.12"
+  config$sparklyr.defaultPackages <- "Azure:mmlspark:0.13"
  sc <- spark_connect(master = "local", config = config)
  ...
  ```

diff --git a/docs/docker.md b/docs/docker.md
index f857bdfbe3..1a12bd7566 100644
--- a/docs/docker.md
+++ b/docs/docker.md
@@ -29,7 +29,7 @@ You can now select one of the sample notebooks and run it, or create your own.

In the above, `microsoft/mmlspark` specifies the project and image name that
you want to run.  There is another component implicit here which is the *tag*
(= version) that you want to use — specifying it explicitly looks like
-`microsoft/mmlspark:0.12` for the `0.12` tag.
+`microsoft/mmlspark:0.13` for the `0.13` tag.

Leaving `microsoft/mmlspark` by itself has an implicit `latest` tag, so it is
equivalent to `microsoft/mmlspark:latest`.  The `latest` tag is identical to the
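Since the tag is exactly what these docker.md hunks bump, it can be worth pulling the tagged image explicitly before running it. A minimal sketch, assuming Docker is installed locally and that the `0.13` tag has been published to Docker Hub:

```bash
# Pull the explicitly tagged image instead of relying on the implicit `latest`
docker pull microsoft/mmlspark:0.13

# List the locally available mmlspark images to confirm which tags are present
docker images microsoft/mmlspark
```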
@@ -47,7 +47,7 @@ that you will probably want to use can look as follows:
     -e ACCEPT_EULA=y \
     -p 127.0.0.1:80:8888 \
     -v ~/myfiles:/notebooks/myfiles \
-    microsoft/mmlspark:0.12
+    microsoft/mmlspark:0.13
  ```

In this example, backslashes are used to break things up for readability; you
@@ -59,7 +59,7 @@ path and line breaks look a little different:
     -e ACCEPT_EULA=y `
     -p 127.0.0.1:80:8888 `
     -v C:\myfiles:/notebooks/myfiles `
-    microsoft/mmlspark:0.12
+    microsoft/mmlspark:0.13
  ```

Let's break down this command and go over the meaning of each part:
@@ -143,7 +143,7 @@ Let's break down this command and go over the meaning of each part:
    model.write().overwrite().save('myfiles/myTrainedModel.mml')
    ```

-* **`microsoft/mmlspark:0.12`**
+* **`microsoft/mmlspark:0.13`**

  Finally, this specifies an explicit version tag for the image that we want
  to run.

diff --git a/docs/gpu-setup.md b/docs/gpu-setup.md
index d0bbd61e0d..e066a20f53 100644
--- a/docs/gpu-setup.md
+++ b/docs/gpu-setup.md
@@ -26,7 +26,7 @@ to check availability in your data center.

MMLSpark provides an Azure Resource Manager (ARM) template to create a
default setup that includes an HDInsight cluster and a GPU machine for
training.  The template can be found here:
-<https://mmlspark.azureedge.net/buildartifacts/0.12/deploy-main-template.json>.
+<https://mmlspark.azureedge.net/buildartifacts/0.13/deploy-main-template.json>.

It has the following parameters that configure the HDI Spark cluster and
the associated GPU VM:
@@ -48,16 +48,16 @@ the associated GPU VM:
- `gpuVirtualMachineSize`: The size of the GPU virtual machine to create

There are actually two additional templates used by this main template:
-- [`spark-cluster-template.json`](https://mmlspark.azureedge.net/buildartifacts/0.12/spark-cluster-template.json):
+- [`spark-cluster-template.json`](https://mmlspark.azureedge.net/buildartifacts/0.13/spark-cluster-template.json):
  A template for creating an HDI Spark cluster within a VNet, including
  MMLSpark and its dependencies.  (This template installs MMLSpark using the
  HDI script action:
-  [`install-mmlspark.sh`](https://mmlspark.azureedge.net/buildartifacts/0.12/install-mmlspark.sh).)
-- [`gpu-vm-template.json`](https://mmlspark.azureedge.net/buildartifacts/0.12/gpu-vm-template.json):
+  [`install-mmlspark.sh`](https://mmlspark.azureedge.net/buildartifacts/0.13/install-mmlspark.sh).)
+- [`gpu-vm-template.json`](https://mmlspark.azureedge.net/buildartifacts/0.13/gpu-vm-template.json):
  A template for creating a GPU VM within an existing VNet, including CNTK
  and other dependencies that MMLSpark needs for GPU training.  (This is
  done via a script action that runs
-  [`gpu-setup.sh`](https://mmlspark.azureedge.net/buildartifacts/0.12/gpu-setup.sh).)
+  [`gpu-setup.sh`](https://mmlspark.azureedge.net/buildartifacts/0.13/gpu-setup.sh).)

Note that these child templates can also be deployed independently, if you
don't need both parts of the installation.  In particular, to scale
@@ -69,7 +69,7 @@ GPU VM setup template at experimentation time.

### 1. Deploy an ARM template within the [Azure Portal](https://ms.portal.azure.com/)

[Click here to open the above main
-template](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fmmlspark.azureedge.net%2Fbuildartifacts%2F0.12%2Fdeploy-main-template.json)
+template](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fmmlspark.azureedge.net%2Fbuildartifacts%2F0.13%2Fdeploy-main-template.json)
in the Azure portal.

(If needed, you can click the **Edit template** button to view and edit the
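The main template linked above can also be deployed from the command line with the stock Azure CLI, as an alternative to the portal flow and the wrapper scripts covered next. A minimal sketch, where the resource group name `mmlspark-rg`, the `eastus` location, and the local file `deploy-parameters.json` (a filled-in copy of the `deploy-parameters.template` mentioned below) are assumptions for illustration:

```bash
# Placeholder names: substitute your own resource group and region
az group create --name mmlspark-rg --location eastus

# Deploy the main template directly from its published URL; the parameters
# file is assumed to be a filled-in copy of deploy-parameters.template
az group deployment create \
    --resource-group mmlspark-rg \
    --template-uri https://mmlspark.azureedge.net/buildartifacts/0.13/deploy-main-template.json \
    --parameters @deploy-parameters.json
```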
@@ -87,11 +87,11 @@ We also provide a convenient shell script to create a deployment on the
command line:

* Download the [shell
-  script](https://mmlspark.azureedge.net/buildartifacts/0.12/deploy-arm.sh)
+  script](https://mmlspark.azureedge.net/buildartifacts/0.13/deploy-arm.sh)
  and make a local copy of it

* Create a JSON parameter file by downloading [this template
-  file](https://mmlspark.azureedge.net/buildartifacts/0.12/deploy-parameters.template)
+  file](https://mmlspark.azureedge.net/buildartifacts/0.13/deploy-parameters.template)
  and modify it according to your specifications.

You can now run the script — it takes the following arguments:
@@ -124,7 +124,7 @@ you for all needed values.

### 3. Deploy an ARM template with the MMLSpark Azure PowerShell

MMLSpark also provides a [PowerShell
-script](https://mmlspark.azureedge.net/buildartifacts/0.12/deploy-arm.ps1)
+script](https://mmlspark.azureedge.net/buildartifacts/0.13/deploy-arm.ps1)
to deploy ARM templates, similar to the above bash script.  Run it with `-?`
to see the usage instructions (or use `get-help`).  If needed, install the
Azure PowerShell cmdlets using the instructions in the