
Commit 41f25c7

Authored by nvliyuan, nvauto, pxLi, wbo4958, and viadea
merge dev-2210 branch to Main branch (#237)
* Init 22.10.0-SNAPSHOT (#214)
* Update version and fix some document errors; add more comments for running XGBoost notebooks on GCP (#215) (#222)
* Update version and fix some document errors; add more comments for running XGBoost notebooks on GCP (#215) (#224)
* Update default cmake to 3.23.X in UDF example dockerfile (#227)
* [xgboost] Remove default parameters (#226)
  * Remove the default parameters for XGBoost examples
  * Remove unused variables for mortgage-ETL
* Add more details/notes for the mortgage performance tests (#229)
* Enable automerge from 22.10 to 22.12 (#230)
* Update versions for v22.10 release (#235)

Signed-off-by: Peixin Li <[email protected]>
Signed-off-by: liyuan <[email protected]>
Signed-off-by: Bobby Wang <[email protected]>
Co-authored-by: Jenkins Automation <[email protected]>
Co-authored-by: Peixin <[email protected]>
Co-authored-by: Bobby Wang <[email protected]>
Co-authored-by: Hao Zhu <[email protected]>
1 parent 998abfb commit 41f25c7

File tree

31 files changed: +60 -87 lines


.github/workflows/auto-merge.yml

+4 -4

@@ -18,7 +18,7 @@ name: auto-merge HEAD to BASE
 on:
   pull_request_target:
     branches:
-      - branch-22.08
+      - branch-22.10
     types: [closed]

 jobs:
@@ -29,13 +29,13 @@ jobs:
     steps:
       - uses: actions/checkout@v2
         with:
-          ref: branch-22.08 # force to fetch from latest upstream instead of PR ref
+          ref: branch-22.10 # force to fetch from latest upstream instead of PR ref

       - name: auto-merge job
         uses: ./.github/workflows/auto-merge
         env:
           OWNER: NVIDIA
           REPO_NAME: spark-rapids-examples
-          HEAD: branch-22.08
-          BASE: branch-22.10
+          HEAD: branch-22.10
+          BASE: branch-22.12
           AUTOMERGE_TOKEN: ${{ secrets.AUTOMERGE_TOKEN }} # use to merge PR
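
For context, the effect of the HEAD/BASE bump: once a PR targeting branch-22.10 closes, the job forward-merges branch-22.10 into branch-22.12. A simplified sketch of the equivalent manual flow (illustrative only; the real logic lives in the ./.github/workflows/auto-merge action):

``` bash
# What the auto-merge job automates, roughly:
git fetch origin
git checkout branch-22.12       # BASE: the next development branch
git merge origin/branch-22.10   # HEAD: the current release branch
git push origin branch-22.12    # authenticated via AUTOMERGE_TOKEN
```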

docs/get-started/xgboost-examples/csp/databricks/generate-init-script-10.4.ipynb

+3 -3

@@ -24,7 +24,7 @@
     "source": [
     "%sh\n",
     "cd ../../dbfs/FileStore/jars/\n",
-    "sudo wget -O rapids-4-spark_2.12-22.08.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.08.0/rapids-4-spark_2.12-22.08.0.jar\n",
+    "sudo wget -O rapids-4-spark_2.12-22.10.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.10.0/rapids-4-spark_2.12-22.10.0.jar\n",
     "sudo wget -O xgboost4j-gpu_2.12-1.6.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-gpu_2.12/1.6.1/xgboost4j-gpu_2.12-1.6.1.jar\n",
     "sudo wget -O xgboost4j-spark-gpu_2.12-1.6.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-spark-gpu_2.12/1.6.1/xgboost4j-spark-gpu_2.12-1.6.1.jar\n",
     "ls -ltr\n",
@@ -60,7 +60,7 @@
     "sudo rm -f /databricks/jars/spark--maven-trees--ml--10.x--xgboost-gpu--ml.dmlc--xgboost4j-spark-gpu_2.12--ml.dmlc__xgboost4j-spark-gpu_2.12__1.5.2.jar\n",
     "\n",
     "sudo cp /dbfs/FileStore/jars/xgboost4j-gpu_2.12-1.6.1.jar /databricks/jars/\n",
-    "sudo cp /dbfs/FileStore/jars/rapids-4-spark_2.12-22.08.0.jar /databricks/jars/\n",
+    "sudo cp /dbfs/FileStore/jars/rapids-4-spark_2.12-22.10.0.jar /databricks/jars/\n",
     "sudo cp /dbfs/FileStore/jars/xgboost4j-spark-gpu_2.12-1.6.1.jar /databricks/jars/\"\"\", True)"
   ]
 },
@@ -133,7 +133,7 @@
     "1. Edit your cluster, adding an initialization script from `dbfs:/databricks/init_scripts/init.sh` in the \"Advanced Options\" under \"Init Scripts\" tab\n",
     "2. Reboot the cluster\n",
     "3. Go to \"Libraries\" tab under your cluster and install `dbfs:/FileStore/jars/xgboost4j-spark-gpu_2.12-1.6.1.jar` in your cluster by selecting the \"DBFS\" option for installing jars\n",
-    "4. Import the mortgage example notebook from `https://github.com/NVIDIA/spark-rapids-examples/blob/branch-22.08/examples/XGBoost-Examples/mortgage/notebooks/python/mortgage-gpu.ipynb`\n",
+    "4. Import the mortgage example notebook from `https://github.com/NVIDIA/spark-rapids-examples/blob/branch-22.10/examples/XGBoost-Examples/mortgage/notebooks/python/mortgage-gpu.ipynb`\n",
     "5. Inside the mortgage example notebook, update the data paths\n",
     "   `train_data = reader.schema(schema).option('header', True).csv('/data/mortgage/csv/small-train.csv')`\n",
     "   `trans_data = reader.schema(schema).option('header', True).csv('/data/mortgage/csv/small-trans.csv')`"

docs/get-started/xgboost-examples/csp/databricks/generate-init-script.ipynb

+3 -3

@@ -24,7 +24,7 @@
     "source": [
     "%sh\n",
     "cd ../../dbfs/FileStore/jars/\n",
-    "sudo wget -O rapids-4-spark_2.12-22.08.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.08.0/rapids-4-spark_2.12-22.08.0.jar\n",
+    "sudo wget -O rapids-4-spark_2.12-22.10.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.10.0/rapids-4-spark_2.12-22.10.0.jar\n",
     "sudo wget -O xgboost4j-gpu_2.12-1.6.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-gpu_2.12/1.6.1/xgboost4j-gpu_2.12-1.6.1.jar\n",
     "sudo wget -O xgboost4j-spark-gpu_2.12-1.6.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-spark-gpu_2.12/1.6.1/xgboost4j-spark-gpu_2.12-1.6.1.jar\n",
     "ls -ltr\n",
@@ -60,7 +60,7 @@
     "sudo rm -f /databricks/jars/spark--maven-trees--ml--9.x--xgboost-gpu--ml.dmlc--xgboost4j-spark-gpu_2.12--ml.dmlc__xgboost4j-spark-gpu_2.12__1.4.1.jar\n",
     "\n",
     "sudo cp /dbfs/FileStore/jars/xgboost4j-gpu_2.12-1.6.1.jar /databricks/jars/\n",
-    "sudo cp /dbfs/FileStore/jars/rapids-4-spark_2.12-22.08.0.jar /databricks/jars/\n",
+    "sudo cp /dbfs/FileStore/jars/rapids-4-spark_2.12-22.10.0.jar /databricks/jars/\n",
     "sudo cp /dbfs/FileStore/jars/xgboost4j-spark-gpu_2.12-1.6.1.jar /databricks/jars/\"\"\", True)"
   ]
 },
@@ -133,7 +133,7 @@
     "1. Edit your cluster, adding an initialization script from `dbfs:/databricks/init_scripts/init.sh` in the \"Advanced Options\" under \"Init Scripts\" tab\n",
     "2. Reboot the cluster\n",
     "3. Go to \"Libraries\" tab under your cluster and install `dbfs:/FileStore/jars/xgboost4j-spark-gpu_2.12-1.6.1.jar` in your cluster by selecting the \"DBFS\" option for installing jars\n",
-    "4. Import the mortgage example notebook from `https://github.com/NVIDIA/spark-rapids-examples/blob/branch-22.08/examples/XGBoost-Examples/mortgage/notebooks/python/mortgage-gpu.ipynb`\n",
+    "4. Import the mortgage example notebook from `https://github.com/NVIDIA/spark-rapids-examples/blob/branch-22.10/examples/XGBoost-Examples/mortgage/notebooks/python/mortgage-gpu.ipynb`\n",
     "5. Inside the mortgage example notebook, update the data paths\n",
     "   `train_data = reader.schema(schema).option('header', True).csv('/data/mortgage/csv/small-train.csv')`\n",
     "   `trans_data = reader.schema(schema).option('header', True).csv('/data/mortgage/csv/small-trans.csv')`"
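
Both init-script notebooks follow the same pattern: fetch the 22.10.0 jars into /dbfs/FileStore/jars/ and overwrite the Databricks-provided XGBoost jars. A quick post-reboot sanity check (illustrative, not part of this commit) is to confirm the new plugin jar is in place and no stale 22.08.0 copy remains:

``` bash
# Run in a %sh notebook cell after the init script has executed.
# Expect exactly one RAPIDS plugin jar, at version 22.10.0.
ls -l /databricks/jars/ | grep rapids-4-spark
# The replaced XGBoost jars should show version 1.6.1.
ls -l /databricks/jars/ | grep xgboost4j
```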

docs/get-started/xgboost-examples/on-prem-cluster/kubernetes-scala.md

+1 -1

@@ -40,7 +40,7 @@ export SPARK_DOCKER_IMAGE=<gpu spark docker image repo and name>
 export SPARK_DOCKER_TAG=<spark docker image tag>

 pushd ${SPARK_HOME}
-wget https://github.com/NVIDIA/spark-rapids-examples/raw/branch-22.08/dockerfile/Dockerfile
+wget https://github.com/NVIDIA/spark-rapids-examples/raw/branch-22.10/dockerfile/Dockerfile

 # Optionally install additional jars into ${SPARK_HOME}/jars/

docs/get-started/xgboost-examples/prepare-package-data/preparation-python.md

+2 -2

@@ -9,7 +9,7 @@ For simplicity export the location to these jars. All examples assume the packag
    * [XGBoost4j-Spark Package](https://repo1.maven.org/maven2/com/nvidia/xgboost4j-spark_3.0/1.4.2-0.3.0/)

 2. Download the RAPIDS Accelerator for Apache Spark plugin jar
-   * [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.08.0/rapids-4-spark_2.12-22.08.0.jar)
+   * [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.10.0/rapids-4-spark_2.12-22.10.0.jar)

 ### Build XGBoost Python Examples

@@ -26,7 +26,7 @@ You need to copy the dataset to `/opt/xgboost`. Use the following links to downl

 ``` bash
 export SPARK_XGBOOST_DIR=/opt/xgboost
-export RAPIDS_JAR=${SPARK_XGBOOST_DIR}/rapids-4-spark_2.12-22.08.0.jar
+export RAPIDS_JAR=${SPARK_XGBOOST_DIR}/rapids-4-spark_2.12-22.10.0.jar
 export XGBOOST4J_JAR=${SPARK_XGBOOST_DIR}/xgboost4j_3.0-1.4.2-0.3.0.jar
 export XGBOOST4J_SPARK_JAR=${SPARK_XGBOOST_DIR}/xgboost4j-spark_3.0-1.4.2-0.3.0.jar
 export SAMPLE_ZIP=${SPARK_XGBOOST_DIR}/samples.zip
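
The exported variables above feed directly into spark-submit. A minimal sketch of the wiring (assumed invocation; `main.py` and the exact arguments of the real example apps are placeholders here):

``` bash
# Attach the RAPIDS plugin and XGBoost jars to a PySpark example run.
${SPARK_HOME}/bin/spark-submit \
  --jars ${RAPIDS_JAR},${XGBOOST4J_JAR},${XGBOOST4J_SPARK_JAR} \
  --py-files ${SAMPLE_ZIP} \
  main.py   # placeholder entry point
```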

docs/get-started/xgboost-examples/prepare-package-data/preparation-scala.md

+2 -2

@@ -5,7 +5,7 @@ For simplicity export the location to these jars. All examples assume the packag
 ### Download the jars

 1. Download the RAPIDS Accelerator for Apache Spark plugin jar
-   * [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.08.0/rapids-4-spark_2.12-22.08.0.jar)
+   * [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.10.0/rapids-4-spark_2.12-22.10.0.jar)

 ### Build XGBoost Scala Examples

@@ -22,6 +22,6 @@ You need to copy the dataset to `/opt/xgboost`. Use the following links to downl

 ``` bash
 export SPARK_XGBOOST_DIR=/opt/xgboost
-export RAPIDS_JAR=${SPARK_XGBOOST_DIR}/rapids-4-spark_2.12-22.08.0.jar
+export RAPIDS_JAR=${SPARK_XGBOOST_DIR}/rapids-4-spark_2.12-22.10.0.jar
 export SAMPLE_JAR=${SPARK_XGBOOST_DIR}/sample_xgboost_apps-0.2.3-jar-with-dependencies.jar
 ```

examples/ML+DL-Examples/Spark-cuML/pca/Dockerfile

+1 -1

@@ -17,7 +17,7 @@

 ARG CUDA_VER=11.5.1
 FROM nvidia/cuda:${CUDA_VER}-devel-ubuntu20.04
-ARG BRANCH_VER=22.08
+ARG BRANCH_VER=22.10

 RUN apt-get update
 RUN apt-get install -y wget ninja-build git

examples/ML+DL-Examples/Spark-cuML/pca/README.md

+2 -2

@@ -12,7 +12,7 @@ User can also download the release jar from Maven central:

 [rapids-4-spark-ml_2.12-22.02.0-cuda11.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-ml_2.12/22.02.0/rapids-4-spark-ml_2.12-22.02.0-cuda11.jar)

-[rapids-4-spark_2.12-22.08.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.08.0/rapids-4-spark_2.12-22.08.0.jar)
+[rapids-4-spark_2.12-22.10.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.10.0/rapids-4-spark_2.12-22.10.0.jar)


 ## Sample code
@@ -48,7 +48,7 @@ It is assumed that a Standalone Spark cluster has been set up, the `SPARK_MASTER

 ``` bash
 RAPIDS_ML_JAR=PATH_TO_rapids-4-spark-ml_2.12-22.02.0-cuda11.jar
-PLUGIN_JAR=PATH_TO_rapids-4-spark_2.12-22.08.0.jar
+PLUGIN_JAR=PATH_TO_rapids-4-spark_2.12-22.10.0.jar

 jupyter toree install \
   --spark_home=${SPARK_HOME} \

examples/ML+DL-Examples/Spark-cuML/pca/pom.xml

+2 -2

@@ -21,7 +21,7 @@
   <groupId>com.nvidia</groupId>
   <artifactId>PCAExample</artifactId>
   <packaging>jar</packaging>
-  <version>22.08.0-SNAPSHOT</version>
+  <version>22.10.0-SNAPSHOT</version>

   <properties>
     <maven.compiler.source>8</maven.compiler.source>
@@ -51,7 +51,7 @@
     <dependency>
       <groupId>com.nvidia</groupId>
      <artifactId>rapids-4-spark-ml_2.12</artifactId>
-      <version>22.08.0-SNAPSHOT</version>
+      <version>22.10.0-SNAPSHOT</version>
     </dependency>
   </dependencies>

examples/ML+DL-Examples/Spark-cuML/pca/spark-submit.sh

+3 -3

@@ -15,8 +15,8 @@
 # limitations under the License.
 #

-ML_JAR=/root/.m2/repository/com/nvidia/rapids-4-spark-ml_2.12/22.08.0-SNAPSHOT/rapids-4-spark-ml_2.12-22.08.0-SNAPSHOT.jar
-PLUGIN_JAR=/root/.m2/repository/com/nvidia/rapids-4-spark_2.12/22.08.0-SNAPSHOT/rapids-4-spark_2.12-22.08.0-SNAPSHOT.jar
+ML_JAR=/root/.m2/repository/com/nvidia/rapids-4-spark-ml_2.12/22.10.0-SNAPSHOT/rapids-4-spark-ml_2.12-22.10.0-SNAPSHOT.jar
+PLUGIN_JAR=/root/.m2/repository/com/nvidia/rapids-4-spark_2.12/22.10.0-SNAPSHOT/rapids-4-spark_2.12-22.10.0-SNAPSHOT.jar

 $SPARK_HOME/bin/spark-submit \
   --master spark://127.0.0.1:7077 \
@@ -38,4 +38,4 @@ $SPARK_HOME/bin/spark-submit \
   --conf spark.network.timeout=1000s \
   --jars $ML_JAR,$PLUGIN_JAR \
   --class com.nvidia.spark.examples.pca.Main \
-  /workspace/target/PCAExample-22.08.0-SNAPSHOT.jar
+  /workspace/target/PCAExample-22.10.0-SNAPSHOT.jar

examples/SQL+DF-Examples/micro-benchmarks/notebooks/micro-benchmarks-gpu.ipynb

+1 -1

@@ -22,7 +22,7 @@
     "import os\n",
     "# Change to your cluster ip:port and directories\n",
     "SPARK_MASTER_URL = os.getenv(\"SPARK_MASTER_URL\", \"spark:your-ip:port\")\n",
-    "RAPIDS_JAR = os.getenv(\"RAPIDS_JAR\", \"/your-path/rapids-4-spark_2.12-22.08.0.jar\")\n"
+    "RAPIDS_JAR = os.getenv(\"RAPIDS_JAR\", \"/your-path/rapids-4-spark_2.12-22.10.0.jar\")\n"
   ]
 },
 {
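
Because the notebook reads both values via os.getenv, pointing an existing deployment at the new plugin needs no notebook edits; exporting the variables before launching Jupyter is enough (the master URL and path below are the notebook's own placeholders):

``` bash
# Set before starting Jupyter so the notebook picks up the 22.10.0 plugin
export SPARK_MASTER_URL=spark://your-ip:port
export RAPIDS_JAR=/your-path/rapids-4-spark_2.12-22.10.0.jar
```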

examples/UDF-Examples/RAPIDS-accelerated-UDFs/Dockerfile

+1 -1

@@ -58,7 +58,7 @@ CUDA_VERSION_MINOR=$(echo $CUDA_VERSION | tr -d '.' | cut -c 3); \
 # Set JDK8 as the default Java
 && update-alternatives --set java /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java

-ARG CMAKE_VERSION=3.20.5
+ARG CMAKE_VERSION=3.23.3

 # Install CMake
 RUN cd /tmp \
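
PR #227 moves the default CMake to the 3.23.x line. The install step that consumes CMAKE_VERSION typically follows the standard Kitware binary-release pattern, much as the cuSpatial Dockerfile later in this commit does for 3.21.3 (a sketch; the actual RUN step sits just below this hunk):

``` bash
# Fetch and unpack the pinned CMake release into /usr/local
CMAKE_VERSION=3.23.3
wget -q https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/cmake-${CMAKE_VERSION}-linux-x86_64.tar.gz
tar -xzf cmake-${CMAKE_VERSION}-linux-x86_64.tar.gz -C /usr/local --strip-components=1
cmake --version   # should now report 3.23.3
```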

examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md

+1 -1

@@ -108,7 +108,7 @@ See above Prerequisites section
 First finish the steps in "Building with Native Code Examples and run test cases" section, then do the following in the docker.

 ### Get jars from Maven Central
-[rapids-4-spark_2.12-22.08.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.08.0/rapids-4-spark_2.12-22.08.0.jar)
+[rapids-4-spark_2.12-22.10.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.10.0/rapids-4-spark_2.12-22.10.0.jar)

 ### Launch a local mode Spark

examples/UDF-Examples/RAPIDS-accelerated-UDFs/pom.xml

+2 -2

@@ -25,7 +25,7 @@
     user defined functions for use with the RAPIDS Accelerator
     for Apache Spark
   </description>
-  <version>22.08.0-SNAPSHOT</version>
+  <version>22.10.0-SNAPSHOT</version>

   <properties>
     <maven.compiler.source>1.8</maven.compiler.source>
@@ -37,7 +37,7 @@
     <cuda.version>cuda11</cuda.version>
     <scala.binary.version>2.12</scala.binary.version>
     <!-- Depends on release version, Snapshot version is not published to the Maven Central -->
-    <rapids4spark.version>22.08.0</rapids4spark.version>
+    <rapids4spark.version>22.10.0</rapids4spark.version>
     <spark.version>3.1.1</spark.version>
     <scala.version>2.12.15</scala.version>
     <udf.native.build.path>${project.build.directory}/cpp-build</udf.native.build.path>

examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/CMakeLists.txt

+5 -5

@@ -14,9 +14,9 @@
 # limitations under the License.
 #=============================================================================

-cmake_minimum_required(VERSION 3.20.1 FATAL_ERROR)
+cmake_minimum_required(VERSION 3.23.1 FATAL_ERROR)

-file(DOWNLOAD https://raw.githubusercontent.com/rapidsai/rapids-cmake/branch-22.08/RAPIDS.cmake
+file(DOWNLOAD https://raw.githubusercontent.com/rapidsai/rapids-cmake/branch-22.10/RAPIDS.cmake
   ${CMAKE_BINARY_DIR}/RAPIDS.cmake)
 include(${CMAKE_BINARY_DIR}/RAPIDS.cmake)

@@ -32,7 +32,7 @@ if(DEFINED GPU_ARCHS)
 endif()
 rapids_cuda_init_architectures(UDFEXAMPLESJNI)

-project(UDFEXAMPLESJNI VERSION 22.08.0 LANGUAGES C CXX CUDA)
+project(UDFEXAMPLESJNI VERSION 22.10.0 LANGUAGES C CXX CUDA)

 option(PER_THREAD_DEFAULT_STREAM "Build with per-thread default stream" OFF)
 option(BUILD_UDF_BENCHMARKS "Build the benchmarks" OFF)
@@ -84,10 +84,10 @@ set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -w --expt-extended-lambda --expt-relax
 set(CUDA_USE_STATIC_CUDA_RUNTIME OFF)

 rapids_cpm_init()
-rapids_cpm_find(cudf 22.08.00
+rapids_cpm_find(cudf 22.10.00
   CPM_ARGS
   GIT_REPOSITORY https://github.com/rapidsai/cudf.git
-  GIT_TAG branch-22.08
+  GIT_TAG branch-22.10
   GIT_SHALLOW TRUE
   SOURCE_SUBDIR cpp
   OPTIONS "BUILD_TESTS OFF"

examples/UDF-Examples/Spark-cuSpatial/Dockerfile

+1 -1

@@ -39,7 +39,7 @@ RUN conda --version
 RUN conda install -c conda-forge openjdk=8 maven=3.8.1 -y

 # install cuDF dependency.
-RUN conda install -c rapidsai -c nvidia -c conda-forge -c defaults libcuspatial=22.08 python=3.8 -y
+RUN conda install -c rapidsai -c nvidia -c conda-forge -c defaults libcuspatial=22.10 python=3.8 -y

 RUN wget --quiet \
     https://github.com/Kitware/CMake/releases/download/v3.21.3/cmake-3.21.3-linux-x86_64.tar.gz \

examples/UDF-Examples/Spark-cuSpatial/Dockerfile.awsdb

+1 -1

@@ -48,7 +48,7 @@ RUN wget -q https://repo.continuum.io/miniconda/Miniconda3-py38_4.9.2-Linux-x86_
     conda config --system --set always_yes True && \
     conda clean --all

-RUN conda install -c rapidsai-nightly -c nvidia -c conda-forge -c defaults libcuspatial=22.08
+RUN conda install -c rapidsai-nightly -c nvidia -c conda-forge -c defaults libcuspatial=22.10
 RUN conda install -c conda-forge libgdal==3.3.1
 RUN pip install jupyter
 ENV JAVA_HOME /usr/lib/jvm/java-1.8.0-openjdk-amd64

examples/UDF-Examples/Spark-cuSpatial/README.md

+3 -3

@@ -65,9 +65,9 @@ Note: The docker env is just for building the jar, not for running the applicati
 4. [cuspatial](https://github.com/rapidsai/cuspatial): install libcuspatial
    ```Bash
    # Install libcuspatial from conda
-   conda install -c rapidsai -c nvidia -c conda-forge -c defaults libcuspatial=22.06
+   conda install -c rapidsai -c nvidia -c conda-forge -c defaults libcuspatial=22.10
    # or below command for the nightly (aka SNAPSHOT) version.
-   conda install -c rapidsai-nightly -c nvidia -c conda-forge -c defaults libcuspatial=22.08
+   conda install -c rapidsai-nightly -c nvidia -c conda-forge -c defaults libcuspatial=22.10
    ```
 5. Build the JAR using `mvn package`.
    ```Bash
@@ -86,7 +86,7 @@ Note: The docker env is just for building the jar, not for running the applicati
 2. Set up [a standalone cluster](/docs/get-started/xgboost-examples/on-prem-cluster/standalone-scala.md) of Spark. Make sure the conda/lib is included in LD_LIBRARY_PATH, so that spark executors can load libcuspatial.so.

 3. Download Spark RAPIDS JAR
-   * [Spark RAPIDS JAR v22.08.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.08.0/rapids-4-spark_2.12-22.08.0.jar) or above
+   * [Spark RAPIDS JAR v22.10.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.10.0/rapids-4-spark_2.12-22.10.0.jar) or above
 4. Prepare sample dataset and JARs. Copy the [sample dataset](../../../datasets/cuspatial_data.tar.gz) to `/data/cuspatial_data/`.
    Copy Spark RAPIDS JAR and `spark-cuspatial-<version>.jar` to `/data/cuspatial_data/jars/`.
    If you build the `spark-cuspatial-<version>.jar` in docker, please copy the jar from docker to local:
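
Step 2 above requires executors to load libcuspatial.so from the conda environment. A minimal sketch of that setup, assuming a standard conda activation (the exact prefix varies by install):

``` bash
# Expose the conda-provided libcuspatial.so to Spark executors
export LD_LIBRARY_PATH=${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH}
# Verify the library is present before launching Spark
ls ${CONDA_PREFIX}/lib | grep -i cuspatial
```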

examples/UDF-Examples/Spark-cuSpatial/gpu-run.sh

+1 -1

@@ -31,7 +31,7 @@ rm -rf $DATA_OUT_PATH
 # the path to keep the jars of spark-rapids & spark-cuspatial
 JARS=$ROOT_PATH/jars

-JARS_PATH=${JARS_PATH:-$JARS/rapids-4-spark_2.12-22.08.0.jar,$JARS/spark-cuspatial-22.08.0-SNAPSHOT.jar}
+JARS_PATH=${JARS_PATH:-$JARS/rapids-4-spark_2.12-22.10.0.jar,$JARS/spark-cuspatial-22.10.0-SNAPSHOT.jar}

 $SPARK_HOME/bin/spark-submit --master spark://$HOSTNAME:7077 \
   --name "Gpu Spatial Join UDF" \

examples/UDF-Examples/Spark-cuSpatial/notebooks/cuspatial_sample_standalone.ipynb

+1 -1

@@ -9,7 +9,7 @@
     "source": [
     "from pyspark.sql import SparkSession\n",
     "import os\n",
-    "jarsPath = os.getenv(\"JARS_PATH\", \"/data/cuspatial_data/jars/rapids-4-spark_2.12-22.08.0.jar,/data/cuspatial_data/jars/spark-cuspatial-22.08.0-SNAPSHOT.jar\")\n",
+    "jarsPath = os.getenv(\"JARS_PATH\", \"/data/cuspatial_data/jars/rapids-4-spark_2.12-22.10.0.jar,/data/cuspatial_data/jars/spark-cuspatial-22.10.0-SNAPSHOT.jar\")\n",
     "spark = SparkSession.builder \\\n",
     "    .config(\"spark.jars\", jarsPath) \\\n",
     "    .config(\"spark.sql.adaptive.enabled\", \"false\") \\\n",

examples/UDF-Examples/Spark-cuSpatial/pom.xml

+2 -2

@@ -24,13 +24,13 @@
   <name>UDF of the cuSpatial case for the RAPIDS Accelerator</name>
   <description>The RAPIDS accelerated user defined function of the cuSpatial case
     for use with the RAPIDS Accelerator for Apache Spark</description>
-  <version>22.08.0-SNAPSHOT</version>
+  <version>22.10.0-SNAPSHOT</version>

   <properties>
     <maven.compiler.source>1.8</maven.compiler.source>
     <maven.compiler.target>1.8</maven.compiler.target>
     <java.major.version>8</java.major.version>
-    <rapids.version>22.08.0</rapids.version>
+    <rapids.version>22.10.0</rapids.version>
     <scala.binary.version>2.12</scala.binary.version>
     <spark.version>3.2.0</spark.version>
     <udf.native.build.path>${project.build.directory}/cpp-build</udf.native.build.path>

examples/UDF-Examples/Spark-cuSpatial/src/main/native/CMakeLists.txt

+1 -1

@@ -16,7 +16,7 @@

 cmake_minimum_required(VERSION 3.20.1 FATAL_ERROR)

-project(SPATIALUDJNI VERSION 22.08.0 LANGUAGES C CXX CUDA)
+project(SPATIALUDJNI VERSION 22.10.0 LANGUAGES C CXX CUDA)

 ###################################################################################################
 # - build type ------------------------------------------------------------------------------------

examples/XGBoost-Examples/README.md

+4

@@ -12,6 +12,10 @@ In the public cloud, better performance can lead to significantly lower costs as

 ![mortgage-speedup](/docs/img/guides/mortgage-perf.png)

+Note that the test result is based on 21 years of [Fannie Mae Single-Family Loan Performance Data](https://capitalmarkets.fanniemae.com/credit-risk-transfer/single-family-credit-risk-transfer/fannie-mae-single-family-loan-performance-data)
+with a cluster of 4 A100 GPUs and 512 CPU vcores; performance is affected by many factors,
+including data size and GPU type.
+
 In this folder, there are three blue prints for users to learn about using
 Spark XGBoost and RAPIDS Accelerator on GPUs :

examples/XGBoost-Examples/agaricus/scala/src/com/nvidia/spark/examples/agaricus/Main.scala

-3

@@ -63,9 +63,6 @@ object Main {
     val xgbClassificationModel = if (xgboostArgs.isToTrain) {
       // build XGBoost classifier
       val paramMap = xgboostArgs.xgboostParams(Map(
-        "eta" -> 0.1,
-        "missing" -> 0.0,
-        "max_depth" -> 2,
         "objective" -> "binary:logistic",
         "eval_sets" -> datasets(1).map(ds => Map("eval" -> ds)).getOrElse(Map.empty)
       ))
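
With PR #226 the example no longer bakes in eta, missing, and max_depth; xgboostArgs.xgboostParams(Map(...)) now merges only user-supplied values over the two remaining entries, so anyone relying on the old defaults must pass them explicitly at launch. A hypothetical sketch (the -key=value flag syntax of the example launcher is an assumption here; SAMPLE_JAR and RAPIDS_JAR come from the preparation docs above):

``` bash
# Hypothetical launch passing the formerly hard-coded XGBoost parameters
# (eta=0.1, missing=0.0, max_depth=2) as application arguments.
${SPARK_HOME}/bin/spark-submit \
  --jars ${RAPIDS_JAR} \
  --class com.nvidia.spark.examples.agaricus.Main \
  ${SAMPLE_JAR} \
  -eta=0.1 -missing=0.0 -max_depth=2
```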
