`docs/get-started/xgboost-examples/csp/databricks/databricks.md`

This is a getting started guide to XGBoost4J-Spark on Databricks. At the end of this guide, the
reader will be able to run a sample Apache Spark application that runs on NVIDIA GPUs on Databricks.

Prerequisites
-------------

* Apache Spark 3.x running in Databricks Runtime 10.4 ML or 11.3 ML with GPU
  * AWS: 10.4 LTS ML (GPU, Scala 2.12, Spark 3.2.1) or 11.3 LTS ML (GPU, Scala 2.12, Spark 3.3.0)
  * Azure: 10.4 LTS ML (GPU, Scala 2.12, Spark 3.2.1) or 11.3 LTS ML (GPU, Scala 2.12, Spark 3.3.0)

The number of GPUs per node dictates the number of Spark executors that can run in that node. Each
executor should only be allowed to run 1 task at any given time.
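
On the Spark side, this one-task-per-executor rule maps onto Spark's GPU resource scheduling
configs. Here is a minimal sketch, assuming one GPU per node; treat the values as illustrative
rather than Databricks defaults:

```python
# Sketch: pin Spark to one task per GPU-backed executor (illustrative values).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Each executor owns exactly one GPU...
    .config("spark.executor.resource.gpu.amount", "1")
    # ...and each task claims that whole GPU, so an executor runs one task at a time.
    .config("spark.task.resource.gpu.amount", "1")
    .getOrCreate()
)
```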

Start A Databricks Cluster
--------------------------

Create a Databricks cluster by going to "Compute", then clicking `+ Create compute`. Ensure the
cluster meets the prerequisites above by configuring it as follows:
1. Select the Databricks Runtime Version from one of the supported runtimes specified in the
   Prerequisites section.
2. Choose the number of workers that matches the number of GPUs you want to use.
3. Select a worker type. On AWS, use nodes with 1 GPU each such as `p3.2xlarge` or `g4dn.xlarge`.
   p2 nodes do not meet the architecture requirements (Pascal or higher) for the Spark worker
   (although they can be used for the driver node). For Azure, choose GPU nodes such as
   Standard_NC6s_v3. For GCP, choose N1 or A2 instance types with GPUs.
4. Select the driver type. Generally this can be set to be the same as the worker.
5. Start the cluster.
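
If you prefer to script this rather than click through the UI, the same choices can be posted to
the Databricks Clusters API (`POST /api/2.0/clusters/create`). This is a hedged sketch: the
host/token environment variables, cluster name, and node types below are assumptions to adapt.

```python
# Sketch: create an equivalent GPU cluster through the Databricks REST API.
# DATABRICKS_HOST and DATABRICKS_TOKEN are assumed to be set in the environment.
import os
import requests

payload = {
    "cluster_name": "xgboost-gpu-example",       # hypothetical name
    "spark_version": "11.3.x-gpu-ml-scala2.12",  # 11.3 LTS ML (GPU, Scala 2.12)
    "node_type_id": "g4dn.xlarge",               # AWS worker with 1 GPU each
    "driver_node_type_id": "g4dn.xlarge",        # driver same as the worker
    "num_workers": 2,                            # one executor per GPU node
}

resp = requests.post(
    f"{os.environ['DATABRICKS_HOST']}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["cluster_id"])
```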

Advanced Cluster Configuration
------------------------------

We will need to create an initialization script for the cluster that installs the RAPIDS jars to the
cluster.

1. To create the initialization script, import the initialization script notebook from the repo to
your workspace. See [Managing
Notebooks](https://docs.databricks.com/notebooks/notebooks-manage.html#id2) for instructions on
how to import a notebook.
Select the version of the RAPIDS Accelerator for Apache Spark based on the Databricks runtime
version:
- [Databricks 10.4 LTS
ML](https://docs.databricks.com/release-notes/runtime/10.4ml.html#system-environment) has CUDA 11
installed. Users will need to use 22.04.0 or later on Databricks 10.4 LTS ML.
- [Databricks 11.3 LTS
ML](https://docs.databricks.com/release-notes/runtime/11.3ml.html#system-environment) has CUDA 11
installed. Users will need to use 23.04.0 or later on Databricks 11.3 LTS ML.

In both cases use [generate-init-script.ipynb](./generate-init-script.ipynb), which will install
the RAPIDS Spark plugin.
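
For a sense of what that notebook does: it writes an init script into DBFS that downloads the
RAPIDS Accelerator jar onto each node. A minimal sketch, assuming a Databricks notebook context
(`dbutils`) and an illustrative plugin version; the real notebook pins the exact version for your
runtime:

```python
# Sketch of the init-script generation; version and paths are illustrative.
rapids_version = "23.04.0"  # assumed; use the version matching your runtime
jar = f"rapids-4-spark_2.12-{rapids_version}.jar"
url = f"https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/{rapids_version}/{jar}"

# dbutils is available inside Databricks notebooks.
dbutils.fs.mkdirs("dbfs:/databricks/init_scripts/")
dbutils.fs.put(
    "dbfs:/databricks/init_scripts/init.sh",
    f"#!/bin/bash\nsudo wget -O /databricks/jars/{jar} {url}\n",
    True,  # overwrite any existing script
)
```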
2. Once you are in the notebook, click the “Run All” button.
3. Ensure that the newly created init.sh script is present in the output from cell 2 and that the
   contents of the script are correct.
"1. Edit your cluster, adding an initialization script from `dbfs:/databricks/init_scripts/init.sh` in the \"Advanced Options\" under \"Init Scripts\" tab\n",
134
134
"2. Reboot the cluster\n",
135
135
"3. Go to \"Libraries\" tab under your cluster and install `dbfs:/FileStore/jars/xgboost4j-spark-gpu_2.12-1.7.1.jar` in your cluster by selecting the \"DBFS\" option for installing jars\n",
136
-
"4. Import the mortgage example notebook from `https://github.com/NVIDIA/spark-rapids-examples/blob/branch-23.02/examples/XGBoost-Examples/mortgage/notebooks/python/mortgage-gpu.ipynb`\n",
136
+
"4. Import the mortgage example notebook from `https://github.com/NVIDIA/spark-rapids-examples/blob/branch-23.04/examples/XGBoost-Examples/mortgage/notebooks/python/mortgage-gpu.ipynb`\n",
137
137
"5. Inside the mortgage example notebook, update the data paths\n",
"1. Edit your cluster, adding an initialization script from `dbfs:/databricks/init_scripts/init.sh` in the \"Advanced Options\" under \"Init Scripts\" tab\n",
134
+
"2. Reboot the cluster\n",
135
+
"3. Go to \"Libraries\" tab under your cluster and install `dbfs:/FileStore/jars/xgboost4j-spark-gpu_2.12-1.7.3.jar` in your cluster by selecting the \"DBFS\" option for installing jars\n",
136
+
"4. Import the mortgage example notebook from `https://github.com/NVIDIA/spark-rapids-examples/blob/branch-23.04/examples/XGBoost-Examples/mortgage/notebooks/python/mortgage-gpu.ipynb`\n",
137
+
"5. Inside the mortgage example notebook, update the data paths\n",
"1. Edit your cluster, adding an initialization script from `dbfs:/databricks/init_scripts/init.sh` in the \"Advanced Options\" under \"Init Scripts\" tab\n",
134
134
"2. Reboot the cluster\n",
135
135
"3. Go to \"Libraries\" tab under your cluster and install `dbfs:/FileStore/jars/xgboost4j-spark-gpu_2.12-1.7.1.jar` in your cluster by selecting the \"DBFS\" option for installing jars\n",
136
-
"4. Import the mortgage example notebook from `https://github.com/NVIDIA/spark-rapids-examples/blob/branch-23.02/examples/XGBoost-Examples/mortgage/notebooks/python/mortgage-gpu.ipynb`\n",
136
+
"4. Import the mortgage example notebook from `https://github.com/NVIDIA/spark-rapids-examples/blob/branch-23.04/examples/XGBoost-Examples/mortgage/notebooks/python/mortgage-gpu.ipynb`\n",
137
137
"5. Inside the mortgage example notebook, update the data paths\n",