Sort out locations
This changes the expected use of the ARM templates to work from a given
URL.
elibarzilay committed Nov 15, 2017
1 parent cdbcf89 commit 7ef335c
Showing 10 changed files with 96 additions and 81 deletions.
66 changes: 42 additions & 24 deletions docs/gpu-setup.md
Expand Up @@ -14,16 +14,19 @@ third party software used by MMLSpark.

### Data Center Compatibility

Not all data centers currently have GPU VMs available. See [the Linux VMs
page](https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/)
Currently, not all data centers have GPU VMs available. See [the Linux
VMs page](https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/)
to check availability in your data center.

## Connect an HDI cluster and a GPU VM via the ARM template

MMLSpark provides an Azure Resource Manager (ARM) template to create a setup
that includes an HDInsight cluster and/or a GPU machine for training. The
[template](../tools/deployment/deploy-main-template.json) has the following
parameters to allow you to configure the HDI Spark cluster and the GPU VM:
template can be found here:
<https://mmlspark.azureedge.net/buildartifacts/0.9/deploy-main-template.json>.

It has the following parameters that configure the HDI Spark cluster and
the associated GPU VM:
- `clusterName`: The name of the HDInsight Spark cluster to create
- `clusterLoginUserName`: These credentials can be used to submit jobs to the
cluster and to log into cluster dashboards
Expand Down Expand Up @@ -51,15 +54,21 @@ For the naming rules and restrictions for Azure resources please refer to the
[Naming conventions
article](https://docs.microsoft.com/en-us/azure/architecture/best-practices/naming-conventions).

MMLSpark provides three ARM templates:
- [`deploy-main-template.json`](../tools/deployment/deploy-main-template.json):
This is the main template. It references the following two child templates.
- [`spark-cluster-template.json`](../tools/deployment/spark-cluster-template.json):
There are actually three templates that are used for deployment:
- [`deploy-main-template.json`](https://mmlspark.azureedge.net/buildartifacts/0.9/deploy-main-template.json):
This is the main template. It references the following two child
templates; these are relative references, so they are expected to be
found at the same base URL.
- [`spark-cluster-template.json`](https://mmlspark.azureedge.net/buildartifacts/0.9/spark-cluster-template.json):
A template for creating an HDI Spark cluster within a VNet, including
MMLSpark and its dependencies.
- [`gpu-vm-template.json`](../tools/deployment/gpu-vm-template.json):
MMLSpark and its dependencies. (This template installs MMLSpark using
the HDI script action:
[`install-mmlspark.sh`](https://mmlspark.azureedge.net/buildartifacts/0.9/install-mmlspark.sh).)
- [`gpu-vm-template.json`](https://mmlspark.azureedge.net/buildartifacts/0.9/gpu-vm-template.json):
A template for creating a GPU VM within an existing VNet, including
CNTK and other dependencies that MMLSpark needs for GPU training.
(This is done via a script action that runs
[`gpu-setup.sh`](https://mmlspark.azureedge.net/buildartifacts/0.9/gpu-setup.sh).)

Note that the last two child templates can also be deployed independently, if
you don't need both parts of the installation.
Expand All @@ -83,28 +92,36 @@ open the template in the Portal. If needed, click the **Edit template** button

![ARM template in Portal](http://image.ibb.co/gZ6iiF/arm_Template_In_Portal.png)

### 2. Deploy an ARM template with [MMLSpark Azure CLI 2.0](../tools/deployment/deploy-arm.sh)
### 2. Deploy an ARM template with [MMLSpark Azure CLI 2.0](https://mmlspark.azureedge.net/buildartifacts/0.9/deploy-arm.sh)

MMLSpark provides an Azure CLI 2.0 script
([`deploy-arm.sh`](../tools/deployment/deploy-arm.sh)) to deploy an ARM
template (such as
[`deploy-main-template.json`](../tools/deployment/deploy-main-template.json))
[`deploy-main-template.json`](https://mmlspark.azureedge.net/buildartifacts/0.9/deploy-main-template.json))
along with a parameter file (see
[deploy-parameters.template](../tools/deployment/deploy-parameters.template)
for a template of such a file).

> Note that you cannot use the
> [template file](../tools/deployment/deploy-main-template.json) from
> the source tree, since it requires additional resources that are
> created by the build (specifically, a working version of
> [`install-mmlspark.sh`](../tools/hdi/install-mmlspark.sh)).
The script takes the following arguments:
- `subscriptionId`: The GUID that identifies your subscription (e.g.,
`01234567-89ab-cdef-0123-456789abcdef`), defaults to setting in your
`az` environment
`az` environment.
- `resourceGroupName` (required): If the name doesn’t exist a new
resource group will be created
resource group will be created.
- `resourceGroupLocation`: The location of the resource group (e.g.,
`East US`), note that this is required if creating a new resource
group
- `deploymentName`: The name for this deployment
- `templateFilePath`: The path to the ARM template file. By default, it
is set to `deploy-main-template.json`
group.
- `deploymentName`: The name for this deployment.
- `templateLocation`: The URL of an ARM template file, or the path to
one. By default, it is set to `deploy-main-template.json` in the same
directory, but note that this will normally not work without the rest
of the required resources.
- `parametersFilePath`: The path to the parameter file, which you need
to create. Use `deploy-parameters.template` as a template for
creating a parameters file.
Expand All @@ -115,17 +132,18 @@ set these arguments:
./deploy-arm.sh -h

If no flags are specified on the command line, the script will prompt
for all values. If needed, install the Azure CLI 2.0 using the
you for all values. If needed, install the Azure CLI 2.0 using the
instructions found in the [Azure CLI Installation
Guide](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli).

### 3. Deploy an ARM template with the [MMLSpark Azure PowerShell](../tools/deployment/deploy-arm.ps1)
### 3. Deploy an ARM template with the [MMLSpark Azure PowerShell](https://mmlspark.azureedge.net/buildartifacts/0.9/deploy-arm.ps1)

MMLSpark also provides a [PowerShell
script](../tools/deployment/deploy-arm.ps1) to deploy ARM templates,
similar to the above bash script, run it with `-?` to see the usage
instructions (or use `get-help`). If needed, install the Azure
PowerShell cmdlets using the instructions in the [Azure PowerShell
script](https://mmlspark.azureedge.net/buildartifacts/0.9/deploy-arm.ps1)
to deploy ARM templates, similar to the above bash script. Run it with
`-?` to see the usage instructions (or use `get-help`). If needed,
install the Azure PowerShell cmdlets using the instructions in the
[Azure PowerShell
Guide](https://docs.microsoft.com/powershell/azureps-cmdlets-docs/).

## Set up passwordless SSH login to the GPU VM
Expand Down
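With `-t` now accepting a URL, the script can be pointed straight at the published main template. A hedged sketch of such an invocation follows; the resource group name and parameters file path are placeholders (not from the commit), and the command is only printed, since a real run needs an Azure login and subscription.

```shell
# Placeholder values (assumptions for illustration); replace with your own.
template_url="https://mmlspark.azureedge.net/buildartifacts/0.9/deploy-main-template.json"
resource_group="my-resource-group"
params_file="./my-parameters.json"

# Print the command instead of running it; remove `echo` to deploy for real.
echo ./deploy-arm.sh -g "$resource_group" -l "East US" \
     -t "$template_url" -p "$params_file"
```

The flags used here (`-g`, `-l`, `-t`, `-p`) are the ones listed in the script's own usage text.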
3 changes: 3 additions & 0 deletions tools/deployment/deploy-arm.ps1
@@ -1,3 +1,6 @@
# Copyright (C) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See LICENSE in project root for information.

<#
.SYNOPSIS
Expand Down
23 changes: 14 additions & 9 deletions tools/deployment/deploy-arm.sh
Expand Up @@ -14,7 +14,7 @@ usage() {
echo "Usage: $(basename "$0") \\"
echo " -i <subscriptionId> -g <resourceGroupName> \\"
echo " -n <deploymentName> -l <resourceGroupLocation> \\"
echo " -t <templateFilePath> -p <parametersFilePath>"
echo " -t <templateLocation> -p <parametersFilePath>"
echo "Run without any arguments for interactive argument reading."
echo "Use \"$here/deploy-parameters.template\" to create your parameters file."
exit
Expand All @@ -28,15 +28,15 @@ subscriptionId=""
resourceGroupName=""
deploymentName=""
resourceGroupLocation=""
templateFilePath=""
templateLocation=""
parametersFilePath=""
while getopts ":i:g:n:l:t:p:" arg; do
case "${arg}" in
( i ) subscriptionId="${OPTARG}" ;;
( g ) resourceGroupName="${OPTARG}" ;;
( n ) deploymentName="${OPTARG}" ;;
( l ) resourceGroupLocation="${OPTARG}" ;;
( t ) templateFilePath="${OPTARG}" ;;
( t ) templateLocation="${OPTARG}" ;;
( p ) parametersFilePath="${OPTARG}" ;;
esac
done
Expand All @@ -57,8 +57,8 @@ readarg() { # [-rf] varname name [default]
echo "Setting $var to default value: \"$dflt\""; X="$dflt"
fi
fi
if [[ $file = 1 && ! -r "$X" ]]; then failwith "$var: $X not found"; fi
if [[ $req = 1 && -z "$X" ]]; then failwith "$name required"; fi
if [[ $file = 1 && ! -r "$X" ]]; then failwith "$var: \"$X\" not found"; fi
}

# login if needed
Expand All @@ -71,7 +71,7 @@ readarg subscriptionId "Subscription ID" "$cursub"
readarg -r resourceGroupName "Resource Group Name"
readarg deploymentName "Deployment Name"
readarg resourceGroupLocation "Resource Group Location"
readarg -f templateFilePath "Template File" "$here/deploy-main-template.json"
readarg templateLocation "Template Location (Path/URL)" "$here/deploy-main-template.json"
readarg -rf parametersFilePath "Parameters File"

if [[ "$subscriptionId" != "$cursub" ]]; then
Expand All @@ -98,9 +98,14 @@ fi
echo "Starting deployment..."
args=()
if [[ -n "$deploymentName" ]]; then args+=(--name "$deploymentName"); fi
args+=(--resource-group "$resourceGroupName"
--template-file "$templateFilePath"
--parameters "@$parametersFilePath")
args+=(--resource-group "$resourceGroupName")
if [[ "$templateLocation" = "http://"* ]]; then args+=(--template-uri)
elif [[ "$templateLocation" = "https://"* ]]; then args+=(--template-uri)
elif [[ -r "$templateLocation" ]]; then args+=(--template-file)
else failwith "templateLocation is neither a URL, nor does it point at a file"
fi
args+=("$templateLocation")
args+=(--parameters "@$parametersFilePath")

az group deployment create "${args[@]}" || failwith "Deployment failed"
echo "Template has been successfully created and deployed"
echo "Template has been successfully deployed"
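The URL-or-file dispatch added in the last hunk can be isolated into a small helper. This is a minimal sketch, not the commit's code: the function name `template_flag` is hypothetical, and it uses a POSIX `case` in place of the script's `[[ ... ]]` chain.

```shell
# Hypothetical helper (not part of the commit): print the `az` flag that
# matches a template location, or fail if it is neither a URL nor a
# readable file.
template_flag() {
  case "$1" in
    http://*|https://*)
      echo "--template-uri" ;;
    *)
      if [ -r "$1" ]; then
        echo "--template-file"
      else
        echo "templateLocation is neither a URL nor a readable file: $1" >&2
        return 1
      fi ;;
  esac
}

template_flag "https://mmlspark.azureedge.net/buildartifacts/0.9/deploy-main-template.json"
```

Checking the URL prefixes before testing for a readable file mirrors the order used in the diff, so a URL is never treated as a local path.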
7 changes: 3 additions & 4 deletions tools/deployment/deploy-main-template.json
Expand Up @@ -91,11 +91,10 @@
}
},
"variables": {
"vnetName": "[concat(parameters('clusterName'),'-vnet')]",
"vnetName": "[concat(parameters('clusterName'), '-vnet')]",
"subnetName": "subnet1",
"templateBaseUrl": "https://raw.githubusercontent.com/Azure/mmlspark/master/tools/deployment/",
"sparkClusterTemplateUrl": "[concat(variables('templateBaseUrl'), 'spark-cluster-template.json')]",
"gpuVmTemplateUrl": "[concat(variables('templateBaseUrl'), 'gpu-vm-template.json')]",
"sparkClusterTemplateUrl": "[uri(deployment().properties.templateLink.uri, 'spark-cluster-template.json')]",
"gpuVmTemplateUrl": "[uri(deployment().properties.templateLink.uri, 'gpu-vm-template.json')]",
"sparkClusterDeploymentName": "sparkClusterTemplate",
"vmDeploymentName": "gpuVmTemplate"
},
Expand Down
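Replacing the hard-coded `templateBaseUrl` with `uri(deployment().properties.templateLink.uri, ...)` means the child templates are looked up next to wherever the main template was fetched from. The shell function below is a rough approximation of that resolution, assuming the base URL ends in a file name (as the CDN links in the docs do). The name `sibling_url` is hypothetical, and this only illustrates the idea; it is not ARM's implementation.

```shell
# Hypothetical illustration: resolve a file name relative to the URL of
# another file in the same directory.
sibling_url() {
  # Drop everything after the last "/" of the base URL, then append the name.
  printf '%s/%s\n' "${1%/*}" "$2"
}

sibling_url \
  "https://mmlspark.azureedge.net/buildartifacts/0.9/deploy-main-template.json" \
  "spark-cluster-template.json"
# prints https://mmlspark.azureedge.net/buildartifacts/0.9/spark-cluster-template.json
```

Because the lookup is relative, all the templates and scripts have to be published together under one base URL, which is what the `build.sh` change in this commit arranges.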
40 changes: 10 additions & 30 deletions tools/deployment/deploy-parameters.template
Expand Up @@ -9,35 +9,15 @@ section when you're done.)
"$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"clusterName": {
"value": "your spark cluster name"
},
"clusterLoginUserName": {
"value": "admin"
},
"clusterLoginPassword": {
"value": "your password"
},
"sshUserName": {
"value": "sshuser"
},
"sshPassword": {
"value": "your password"
},
"headNodeSize": {
"value": "Standard_D3_v2"
},
"workerNodeCount": {
"value": "1"
},
"workerNodeSize": {
"value": "Standard_D3_v2"
},
"gpuVirtualMachineName": {
"value": "your gpu vm name"
},
"gpuVirtualMachineSize": {
"value": "Standard_NC12"
}
"clusterName": { "value": "your spark cluster name" },
"clusterLoginUserName": { "value": "admin" },
"clusterLoginPassword": { "value": "your password" },
"sshUserName": { "value": "sshuser" },
"sshPassword": { "value": "your password" },
"headNodeSize": { "value": "Standard_D3_v2" },
"workerNodeCount": { "value": "1" },
"workerNodeSize": { "value": "Standard_D3_v2" },
"gpuVirtualMachineName": { "value": "your gpu vm name" },
"gpuVirtualMachineSize": { "value": "Standard_NC12" }
}
}
2 changes: 2 additions & 0 deletions tools/deployment/gpu-setup.sh
@@ -1,4 +1,6 @@
#!/usr/bin/env bash
# Copyright (C) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See LICENSE in project root for information.

# Install the prerequisites for MMLSpark on a GPU VM

Expand Down
4 changes: 2 additions & 2 deletions tools/deployment/gpu-vm-template.json
Expand Up @@ -127,7 +127,7 @@
"autoUpgradeMinorVersion": true,
"settings": {
"fileUris": [
"https://raw.githubusercontent.com/Azure/mmlspark/master/tools/deployment/gpu-setup.sh"
"[uri(deployment().properties.templateLink.uri, 'gpu-setup.sh')]"
],
"commandToExecute": "./gpu-setup.sh"
}
Expand Down Expand Up @@ -196,7 +196,7 @@
"outputs": {
"gpuvm": {
"type": "object",
"value": "[reference(resourceId('Microsoft.Compute/virtualMachines',parameters('virtualMachineName')))]"
"value": "[reference(resourceId('Microsoft.Compute/virtualMachines', parameters('virtualMachineName')))]"
}
}
}
18 changes: 9 additions & 9 deletions tools/deployment/spark-cluster-template.json
Expand Up @@ -87,18 +87,18 @@
},
"variables": {
"defaultStorageAccount": {
"name": "[concat('wasbsto00',uniqueString(resourceGroup().id))]",
"name": "[concat('wasbsto00', uniqueString(resourceGroup().id))]",
"type": "Standard_LRS"
},
"vNet": {
"name": "[parameters('virtualNetworkName')]",
"addressSpacePrefix": "10.0.0.0/16",
"subnetName": "[parameters('virtualNetworkSubnetName')]",
"subnetPrefix": "10.0.0.0/24",
"id": "[resourceId('Microsoft.Network/virtualNetworks', concat(parameters('clusterName'),'-vnet'))]",
"subnet": "[concat(resourceId('Microsoft.Network/virtualNetworks', concat(parameters('clusterName'),'-vnet')), concat('/subnets/', parameters('virtualNetworkSubnetName')))]"
"id": "[resourceId('Microsoft.Network/virtualNetworks', concat(parameters('clusterName'), '-vnet'))]",
"subnet": "[concat(resourceId('Microsoft.Network/virtualNetworks', concat(parameters('clusterName'), '-vnet')), '/subnets/', parameters('virtualNetworkSubnetName'))]"
},
"scriptActionUri": "https://mmlspark.azureedge.net/buildartifacts/0.9/install-mmlspark.sh"
"scriptActionUri": "[uri(deployment().properties.templateLink.uri, 'install-mmlspark.sh')]"
},
"resources": [
{
Expand Down Expand Up @@ -139,8 +139,8 @@
"location": "[resourceGroup().location]",
"apiVersion": "2015-03-01-preview",
"dependsOn": [
"[concat('Microsoft.Storage/storageAccounts/',variables('defaultStorageAccount').name)]",
"[concat('Microsoft.Network/virtualNetworks/',variables('vNet').name)]"
"[concat('Microsoft.Storage/storageAccounts/', variables('defaultStorageAccount').name)]",
"[concat('Microsoft.Network/virtualNetworks/', variables('vNet').name)]"
],
"tags": {},
"properties": {
Expand All @@ -163,7 +163,7 @@
"storageProfile": {
"storageaccounts": [
{
"name": "[replace(replace(reference(resourceId('Microsoft.Storage/storageAccounts', variables('defaultStorageAccount').name), '2016-01-01').primaryEndpoints.blob,'https://',''),'/','')]",
"name": "[replace(replace(reference(resourceId('Microsoft.Storage/storageAccounts', variables('defaultStorageAccount').name), '2016-01-01').primaryEndpoints.blob, 'https://', ''), '/', '')]",
"isDefault": true,
"container": "[toLower(parameters('clusterName'))]",
"key": "[listKeys(resourceId('Microsoft.Storage/storageAccounts', variables('defaultStorageAccount').name), '2016-01-01').keys[0].value]"
Expand Down Expand Up @@ -240,11 +240,11 @@
"outputs": {
"vnet": {
"type": "object",
"value": "[reference(resourceId('Microsoft.Network/virtualNetworks',variables('vNet').name))]"
"value": "[reference(resourceId('Microsoft.Network/virtualNetworks', variables('vNet').name))]"
},
"cluster": {
"type": "object",
"value": "[reference(resourceId('Microsoft.HDInsight/clusters',parameters('clusterName')))]"
"value": "[reference(resourceId('Microsoft.HDInsight/clusters', parameters('clusterName')))]"
}
}
}
3 changes: 3 additions & 0 deletions tools/hdi/install-mmlspark.sh
Expand Up @@ -8,6 +8,9 @@

# <=<= this line is replaced with variables defined with `defvar -X` =>=>
DOWNLOAD_URL="$STORAGE_URL/$MML_VERSION"
if [[ -z "$MML_VERSION" ]]; then
echo "Error: this script cannot be executed as-is" 1>&2; exit 1
fi

HDFS_NOTEBOOKS_FOLDER="/HdiNotebooks/Microsoft ML Spark Examples"

Expand Down
11 changes: 8 additions & 3 deletions tools/runme/build.sh
Expand Up @@ -260,14 +260,19 @@ _upload_artifacts_to_storage() {
mkdir -p "$tmp"
( cd "$BUILD_ARTIFACTS"
_ zip -qr9 "$tmp/$(basename "$BUILD_ARTIFACTS.zip")" * )
local f txt
local f txt target
local varlinerx="^(.*)# +<=<= .*? =>=>(.*)\$"
for f in "$TOOLSDIR/hdi/"*; do
for f in "$TOOLSDIR/"{hdi,deployment}"/"*; do
target="$tmp/$(basename "$f")"
if [[ -e "$target" ]]; then
failwith "duplicate file intended for $STORAGE_CONTAINER: $(basename "$f")";
fi
txt="$(< "$f")"
if [[ "$txt" =~ $varlinerx ]]; then
txt="${BASH_REMATCH[1]}$(_show_gen_vars)${BASH_REMATCH[2]}"
fi
echo "$txt" > "$tmp/$(basename "$f")"
# might be useful to allow <{...}> substitutions: _replace_var_substs txt
echo "$txt" > "$target"
done
_ azblob upload-batch --source "$tmp" --destination "$STORAGE_CONTAINER/$MML_VERSION"
_rm "$tmp"
Expand Down
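The new loop flattens `hdi/` and `deployment/` into a single upload directory, so two files with the same base name would overwrite each other without the added check. Below is a minimal standalone sketch of that guard, assuming a hypothetical `stage_files` helper that takes the destination directory as its first argument. The variable-line substitution from the real script is omitted, and plain `cp` stands in for the rewrite step.

```shell
# Hypothetical helper (not part of the commit): copy files into one flat
# directory, refusing to overwrite a file staged earlier under the same name.
stage_files() {
  dest="$1"; shift
  for f in "$@"; do
    target="$dest/$(basename "$f")"
    if [ -e "$target" ]; then
      echo "duplicate file intended for $dest: $(basename "$f")" >&2
      return 1
    fi
    cp "$f" "$target"
  done
}
```

A call such as `stage_files "$tmp" tools/hdi/* tools/deployment/*` would then stop at the first name collision instead of silently keeping whichever file was copied last.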
