Update Azure ML XGBoost instructions to Ubuntu 24.04 #465

Open
wants to merge 33 commits into base: main
Changes from 14 commits
Commits (33)
1df54c4
deleted $ for multiline commands
melodywang060 Oct 8, 2024
9b7088a
Update source/cloud/azure/aks.md
melodywang060 Oct 9, 2024
f1a8682
Update source/cloud/azure/aks.md
melodywang060 Oct 9, 2024
bcf36bc
Update source/cloud/azure/aks.md
melodywang060 Oct 9, 2024
f54e67b
fixed multiline command issue
melodywang060 Oct 9, 2024
c57b8c3
added more detailed instructions
melodywang060 Oct 10, 2024
1205ab9
added clearer user input sections
melodywang060 Oct 10, 2024
33b27db
more descripted title
melodywang060 Oct 10, 2024
2034658
fixed linting errors
melodywang060 Oct 10, 2024
540a35a
fixed small linting error
melodywang060 Oct 10, 2024
ef7a978
updated ubuntu versions
melodywang060 Oct 10, 2024
8a204a9
got rid of outdated package
melodywang060 Oct 10, 2024
8c3a176
added intermediary step for clarity
melodywang060 Oct 10, 2024
59d6343
changed hardcoded lines to FILL-THIS-IN
melodywang060 Oct 10, 2024
fa6613a
Update source/guides/azure/infiniband.md
melodywang060 Oct 10, 2024
3e7aace
Update source/guides/azure/infiniband.md
melodywang060 Oct 10, 2024
c62f7ab
Update source/guides/azure/infiniband.md
melodywang060 Oct 10, 2024
b9c1317
Update source/cloud/azure/aks.md
melodywang060 Oct 10, 2024
1880bc9
fixed backtick error
melodywang060 Oct 11, 2024
7c07bbe
ran black and pretty to format files:
melodywang060 Oct 11, 2024
9802cbf
ran ruff
melodywang060 Oct 11, 2024
6993b09
Merge branch 'main' into xgboost-azure
melodywang060 Oct 11, 2024
b4bce66
fix linting issues
melodywang060 Oct 11, 2024
48c4096
Merge branch 'main' into xgboost-azure
jacobtomlinson Oct 14, 2024
622f326
removed package.json and package-lock.json and added to .gitignore
melodywang060 Oct 14, 2024
e78dfa9
Merge branch 'xgboost-azure' of github.com:rapidsai/deployment into x…
melodywang060 Oct 14, 2024
9265020
Update source/guides/azure/infiniband.md
melodywang060 Oct 15, 2024
2ffcf9a
Update source/guides/azure/infiniband.md
melodywang060 Oct 15, 2024
b1f42dd
fixed merge conflict
melodywang060 Oct 16, 2024
81c0310
fixed merge conflicts
melodywang060 Oct 16, 2024
173935e
Merge branch 'xgboost-azure' of github.com:rapidsai/deployment into x…
melodywang060 Oct 16, 2024
178925f
fixed linting issues
melodywang060 Oct 16, 2024
df3fe1c
Update source/guides/azure/infiniband.md
jacobtomlinson Oct 31, 2024
4 changes: 2 additions & 2 deletions source/_includes/check-gpu-pod-works.md
@@ -1,7 +1,7 @@
Let's create a sample pod that uses some GPU compute to make sure that everything is working as expected.

```console
$ cat << EOF | kubectl create -f -
```bash
cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
8 changes: 4 additions & 4 deletions source/cloud/azure/aks.md
@@ -22,8 +22,8 @@ $ az login

Now we can launch a GPU enabled AKS cluster. First launch an AKS cluster.

```console
$ az aks create -g <resource group> -n rapids \
```bash
az aks create -g <resource group> -n rapids \
--enable-managed-identity \
--node-count 1 \
--enable-addons monitoring \
@@ -91,8 +91,8 @@ $ az extension add --name aks-preview

`````

```console
$ az aks nodepool add \
```bash
az aks nodepool add \
--resource-group <resource group> \
--cluster-name rapids \
--name gpunp \
2 changes: 1 addition & 1 deletion source/cloud/azure/azureml.md
@@ -32,7 +32,7 @@ The compute instance provides an integrated Jupyter notebook service, JupyterLab

Sign in to [Azure Machine Learning Studio](https://ml.azure.com/) and navigate to your workspace on the left-side menu.

Select **Compute** > **+ New** > choose a [RAPIDS compatible GPU](https://medium.com/dropout-analytics/which-gpus-work-with-rapids-ai-f562ef29c75f) VM size (e.g., `Standard_NC12s_v3`)
Select **Compute** > **+ New** (Create compute instance) > choose a [RAPIDS compatible GPU](https://medium.com/dropout-analytics/which-gpus-work-with-rapids-ai-f562ef29c75f) VM size (e.g., `Standard_NC12s_v3`)

![Screenshot of create new notebook with a gpu-instance](../../images/azureml-create-notebook-instance.png)

13 changes: 9 additions & 4 deletions source/examples/rapids-azureml-hpo/notebook.ipynb
@@ -12,7 +12,7 @@
]
},
"source": [
"# Train and Hyperparameter-Tune with RAPIDS"
"# Train and Hyperparameter-Tune with RAPIDS on AzureML"
]
},
{
@@ -97,12 +97,17 @@
"from azure.ai.ml import MLClient\n",
"from azure.identity import DefaultAzureCredential\n",
"\n",
"\n",
"subscription_id = \"FILL IN WITH YOUR AZURE ML CREDENTIALS\"\n",
"resource_group_name = \"FILL IN WITH YOUR AZURE ML CREDENTIALS\"\n",
"workspace_name = \"FILL IN WITH YOUR AZURE ML CREDENTIALS\"\n",
"\n",
"# Get a handle to the workspace\n",
"ml_client = MLClient(\n",
" credential=DefaultAzureCredential(),\n",
" subscription_id=\"fc4f4a6b-4041-4b1c-8249-854d68edcf62\",\n",
" resource_group_name=\"rapidsai-deployment\",\n",
" workspace_name=\"rapids-aml-cluster\",\n",
" subscription_id= subscription_id,\n",
" resource_group_name= resource_group_name,\n",
" workspace_name= workspace_name\n",
")\n",
"\n",
"print(\n",
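The placeholder strings in the cell above can also be filled from the environment instead of being edited in place, which keeps real subscription IDs out of the committed notebook. A minimal sketch, assuming three environment variable names (`AZURE_SUBSCRIPTION_ID`, `AZURE_RESOURCE_GROUP`, `AZURE_ML_WORKSPACE`) that are chosen here for illustration and are not part of this PR; the helper `workspace_settings` is likewise hypothetical:

```python
import os

# Placeholder text used in the notebook cell.
PLACEHOLDER = "FILL IN WITH YOUR AZURE ML CREDENTIALS"

# Hypothetical environment variable names, chosen for illustration only.
ENV_NAMES = {
    "subscription_id": "AZURE_SUBSCRIPTION_ID",
    "resource_group_name": "AZURE_RESOURCE_GROUP",
    "workspace_name": "AZURE_ML_WORKSPACE",
}


def workspace_settings(env=None):
    """Return the three workspace keyword arguments, read from the environment.

    Raises ValueError naming every variable that is unset, empty, or still
    holds the placeholder text.
    """
    env = os.environ if env is None else env
    settings = {key: env.get(name, "") for key, name in ENV_NAMES.items()}
    missing = [
        ENV_NAMES[key]
        for key, value in settings.items()
        if not value or value == PLACEHOLDER
    ]
    if missing:
        raise ValueError("Set these variables first: " + ", ".join(missing))
    return settings
```

The returned dictionary uses the same keyword names as the call in the diff, so it could be unpacked as `MLClient(credential=DefaultAzureCredential(), **workspace_settings())`.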
@@ -178,10 +178,10 @@
"metadata": {},
"outputs": [],
"source": [
"location = \"West US 2\"\n",
"resource_group = \"rapidsai-deployment\"\n",
"vnet = \"rapidsai-deployment-vnet\"\n",
"security_group = \"rapidsaiclouddeploymenttest-nsg\"\n",
"location = \"FILL-THIS-IN\"\n",
"resource_group = \"FILL-THIS-IN\"\n",
"vnet = \"FILL-THIS-IN\"\n",
"security_group = \"FILL-THIS-IN\"\n",
"vm_size = \"Standard_NC12s_v3\" # or choose a different GPU enabled VM type\n",
"\n",
"docker_image = \"{{rapids_container}}\"\n",
22 changes: 14 additions & 8 deletions source/guides/azure/infiniband.md
@@ -13,8 +13,8 @@ for demonstration.
- Select `East US` region.
- Change `Availability options` to `Availability set` and create a set.
- If building multiple instances put additional instances in the same set.
- Use the 2nd Gen Ubuntu 20.04 image.
- Search all images for `Ubuntu Server 20.04` and choose the second one down on the list.
- Use the 2nd Gen Ubuntu 24.04 image.
- Search all images for `Ubuntu Server 24.04` and choose the second one down on the list.
- Change size to `ND40rs_v2`.
- Set password login with credentials.
- User `someuser`
@@ -39,8 +39,8 @@ The commands below should work for Ubuntu. See the [CUDA Toolkit documentation](
```shell
sudo apt-get install -y linux-headers-$(uname -r)
distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-drivers
```
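The `distribution=` line in the block above builds the repository directory name by joining `ID` and `VERSION_ID` from `/etc/os-release` and dropping the dot, so an Ubuntu 24.04 image should yield `ubuntu2404`. The same derivation as a small Python sketch; the function name `cuda_repo_distribution` and the sample input are illustrative assumptions, and whether a repository exists for a given result still has to be checked against the CUDA download site:

```python
def cuda_repo_distribution(os_release_text):
    """Join ID and VERSION_ID and drop dots, as the shell one-liner does."""
    fields = {}
    for line in os_release_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        fields[key] = value.strip().strip("\"'")
    return (fields["ID"] + fields["VERSION_ID"]).replace(".", "")


# Sample content in the os-release format (illustrative, not copied from a VM).
sample = 'NAME="Ubuntu"\nVERSION_ID="24.04"\nID=ubuntu\n'
print(cuda_repo_distribution(sample))  # ubuntu2404
```

On a real machine the input text would come from reading `/etc/os-release`, for example with `pathlib.Path("/etc/os-release").read_text()`.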
@@ -118,11 +118,11 @@ Mon Nov 14 20:32:39 2022

### InfiniBand Driver

On Ubuntu 20.04
On Ubuntu 24.04

```shell
sudo apt-get install -y automake dh-make git libcap2 libnuma-dev libtool make pkg-config udev curl librdmacm-dev rdma-core \
libgfortran5 bison chrpath flex graphviz gfortran tk dpatch quilt swig tcl ibverbs-utils
libgfortran5 bison chrpath flex graphviz gfortran tk quilt swig tcl ibverbs-utils
```

Check install
@@ -247,14 +247,20 @@ wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforg
bash Mambaforge-Linux-x86_64.sh
```

Accept the default and allow conda init to run. Then start a new shell.
Accept the default and allow conda init to run.
``shell
~/mambaforge/bin/conda init

````

Then start a new shell.

Create a conda environment (see [UCX-Py](https://ucx-py.readthedocs.io/en/latest/install.html) docs)

```shell
mamba create -n ucxpy {{ rapids_conda_channels }} {{ rapids_conda_packages }} ipython ucx-proc=*=gpu ucx ucx-py dask distributed numpy cupy pytest pynvml -y
mamba activate ucxpy
```
Member

This doesn't look right to me... were these removed by accident? Please check how this renders in the GitHub UI on your branch (https://github.com/rapidsai/deployment/blob/xgboost-azure/source/guides/azure/infiniband.md)... I can see there that removing these causes some text that isn't meant to be code-formatted to render inside a code block.

Screenshot 2024-10-14 at 2 05 49 PM

In case you're new to working in markdown, I've found these really helpful:

Contributor Author

thanks for the links!
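
The rendering problem raised in this thread (prose swallowed by a code block) typically comes from a fence that is opened but never closed, or from a fence line with too few backticks to count as one. A rough Python sketch of a check for that, under simplifying assumptions: it only considers backtick fences indented by at most three spaces, treats a closing fence as a bare run of backticks at least as long as the opener, and ignores tilde fences and the finer rules of the CommonMark spec. The function name `unclosed_fences` is hypothetical:

```python
import re

# A fence line: up to three leading spaces, then three or more backticks.
FENCE = re.compile(r"^ {0,3}(`{3,})(.*)$")


def unclosed_fences(markdown_text):
    """Return a list holding the 1-based line number of a fence that is still
    open at the end of the text, or an empty list if every fence is closed."""
    open_line = None  # line number of the currently open fence, if any
    open_len = 0      # number of backticks in that opening fence
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        match = FENCE.match(line)
        if not match:
            continue
        ticks, rest = match.group(1), match.group(2).strip()
        if open_line is None:
            open_line, open_len = number, len(ticks)
        elif not rest and len(ticks) >= open_len:
            # A bare run of enough backticks closes the open fence.
            open_line = None
    return [] if open_line is None else [open_line]
```

In the hunk under discussion, a line with only two backticks does not open a fence under this check, so the later four-backtick line is read as an opener and everything after it is treated as code.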

````

Clone UCX-Py repo locally
