
Commit dba5adf

Merge pull request #317 from SurajAralihalli/branch-23.08-main-release
Branch 23.08 main release
2 parents 0cb527c + 8e1701f commit dba5adf

File tree

32 files changed: +59 −58 lines changed


.github/workflows/auto-merge.yml (+4 −4)

@@ -18,7 +18,7 @@ name: auto-merge HEAD to BASE
 on:
   pull_request_target:
     branches:
-      - branch-23.06
+      - branch-23.08
     types: [closed]

 jobs:
@@ -29,14 +29,14 @@ jobs:
     steps:
      - uses: actions/checkout@v3
        with:
-         ref: branch-23.06 # force to fetch from latest upstream instead of PR ref
+         ref: branch-23.08 # force to fetch from latest upstream instead of PR ref

      - name: auto-merge job
        uses: ./.github/workflows/auto-merge
        env:
          OWNER: NVIDIA
          REPO_NAME: spark-rapids-examples
-         HEAD: branch-23.06
-         BASE: branch-23.08
+         HEAD: branch-23.08
+         BASE: branch-23.10
          AUTOMERGE_TOKEN: ${{ secrets.AUTOMERGE_TOKEN }} # use to merge PR
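The HEAD/BASE bump above (branch-23.08 merging forward into branch-23.10) follows the RAPIDS two-month release cadence. A small illustrative helper, not part of this repo (the workflow hard-codes HEAD and BASE), can compute the next branch name:

```python
def next_branch(branch: str) -> str:
    """Return the next branch in the two-month RAPIDS release cadence.

    Illustrative only: e.g. branch-23.08 -> branch-23.10,
    and branch-23.12 rolls over to branch-24.02.
    """
    year, month = map(int, branch.removeprefix("branch-").split("."))
    month += 2
    if month > 12:
        year, month = year + 1, month - 12
    return f"branch-{year}.{month:02d}"

next_branch("branch-23.08")  # branch-23.10
```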

dockerfile/Dockerfile (+2 −2)

@@ -1,4 +1,4 @@
-# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2023, NVIDIA CORPORATION. All rights reserved.
 # Licensed to the Apache Software Foundation (ASF) under one or more
 # contributor license agreements. See the NOTICE file distributed with
 # this work for additional information regarding copyright ownership.
@@ -15,7 +15,7 @@
 # limitations under the License.
 #

-FROM nvidia/cuda:11.0-devel-ubuntu18.04
+FROM nvidia/cuda:11.8.0-devel-ubuntu18.04
 ARG spark_uid=185

 # Install java dependencies

docs/get-started/xgboost-examples/csp/databricks/databricks.md (+2 −2)

@@ -21,7 +21,7 @@ Navigate to your home directory in the UI and select **Create** > **File** from
 create an `init.sh` script with contents:
 ```bash
 #!/bin/bash
-sudo wget -O /databricks/jars/rapids-4-spark_2.12-23.06.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.06.0/rapids-4-spark_2.12-23.06.0.jar
+sudo wget -O /databricks/jars/rapids-4-spark_2.12-23.08.1.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.08.1/rapids-4-spark_2.12-23.08.1.jar
 ```
 1. Select the Databricks Runtime Version from one of the supported runtimes specified in the
    Prerequisites section.
@@ -68,7 +68,7 @@ create an `init.sh` script with contents:
 ```bash
 spark.rapids.sql.python.gpu.enabled true
 spark.python.daemon.module rapids.daemon_databricks
-spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-23.06.0.jar:/databricks/spark/python
+spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-23.08.1.jar:/databricks/spark/python
 ```
 Note that the python memory pool requires the cudf library, so you need to install cudf on
 each worker node (`pip install cudf-cu11 --extra-index-url=https://pypi.nvidia.com`) or disable the python memory pool

docs/get-started/xgboost-examples/csp/databricks/init.sh (+1 −1)

@@ -1,7 +1,7 @@
 sudo rm -f /databricks/jars/spark--maven-trees--ml--10.x--xgboost-gpu--ml.dmlc--xgboost4j-gpu_2.12--ml.dmlc__xgboost4j-gpu_2.12__1.5.2.jar
 sudo rm -f /databricks/jars/spark--maven-trees--ml--10.x--xgboost-gpu--ml.dmlc--xgboost4j-spark-gpu_2.12--ml.dmlc__xgboost4j-spark-gpu_2.12__1.5.2.jar

-sudo wget -O /databricks/jars/rapids-4-spark_2.12-23.06.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.06.0/rapids-4-spark_2.12-23.06.0.jar
+sudo wget -O /databricks/jars/rapids-4-spark_2.12-23.08.1.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.08.1/rapids-4-spark_2.12-23.08.1.jar
 sudo wget -O /databricks/jars/xgboost4j-gpu_2.12-1.7.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-gpu_2.12/1.7.1/xgboost4j-gpu_2.12-1.7.1.jar
 sudo wget -O /databricks/jars/xgboost4j-spark-gpu_2.12-1.7.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-spark-gpu_2.12/1.7.1/xgboost4j-spark-gpu_2.12-1.7.1.jar
 ls -ltr
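The jar URLs bumped throughout this commit all follow the standard Maven Central layout (group path / artifact / version / artifact-version.jar). A small hypothetical helper, shown only to illustrate the pattern, can generate them from a version string:

```python
def maven_central_url(group: str, artifact: str, version: str) -> str:
    """Build a Maven Central download URL from the standard repository layout.

    Illustrative helper, not part of this repo.
    """
    path = group.replace(".", "/")  # group ids map dots to directories
    return (f"https://repo1.maven.org/maven2/{path}/{artifact}/{version}/"
            f"{artifact}-{version}.jar")

url = maven_central_url("com.nvidia", "rapids-4-spark_2.12", "23.08.1")
```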

docs/get-started/xgboost-examples/on-prem-cluster/kubernetes-scala.md (+1 −1)

@@ -40,7 +40,7 @@ export SPARK_DOCKER_IMAGE=<gpu spark docker image repo and name>
 export SPARK_DOCKER_TAG=<spark docker image tag>

 pushd ${SPARK_HOME}
-wget https://github.com/NVIDIA/spark-rapids-examples/raw/branch-23.06/dockerfile/Dockerfile
+wget https://github.com/NVIDIA/spark-rapids-examples/raw/branch-23.08/dockerfile/Dockerfile

 # Optionally install additional jars into ${SPARK_HOME}/jars/

docs/get-started/xgboost-examples/prepare-package-data/preparation-python.md (+1 −1)

@@ -5,7 +5,7 @@ For simplicity export the location to these jars. All examples assume the packag
 ### Download the jars

 Download the RAPIDS Accelerator for Apache Spark plugin jar
-* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.06.0/rapids-4-spark_2.12-23.06.0.jar)
+* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.08.1/rapids-4-spark_2.12-23.08.1.jar)

 ### Build XGBoost Python Examples

docs/get-started/xgboost-examples/prepare-package-data/preparation-scala.md (+1 −1)

@@ -5,7 +5,7 @@ For simplicity export the location to these jars. All examples assume the packag
 ### Download the jars

 1. Download the RAPIDS Accelerator for Apache Spark plugin jar
-   * [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.06.0/rapids-4-spark_2.12-23.06.0.jar)
+   * [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.08.1/rapids-4-spark_2.12-23.08.1.jar)

 ### Build XGBoost Scala Examples

examples/MIG-Support/device-plugins/gpu-mig/pom.xml (+1 −1)

@@ -35,7 +35,7 @@
   </licenses>

   <properties>
-    <yarn.version>3.3.0</yarn.version>
+    <yarn.version>3.3.6</yarn.version>
     <java.version>1.8</java.version>
     <maven.compiler.version>3.8.1</maven.compiler.version>
     <maven.jar.plugin.version>3.2.0</maven.jar.plugin.version>

examples/ML+DL-Examples/Spark-cuML/pca/Dockerfile (+4 −4)

@@ -1,6 +1,6 @@
 #!/bin/bash
 #
-# Copyright (c) 2021-2022, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2021-2023, NVIDIA CORPORATION. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -15,9 +15,9 @@
 # limitations under the License.
 #

-ARG CUDA_VER=11.5.1
+ARG CUDA_VER=11.8.0
 FROM nvidia/cuda:${CUDA_VER}-devel-ubuntu20.04
-ARG BRANCH_VER=23.06
+ARG BRANCH_VER=23.08

 RUN apt-get update
 RUN apt-get install -y wget ninja-build git
@@ -42,7 +42,7 @@ RUN conda install -c conda-forge openjdk=8 maven=3.8.1 -y
 RUN conda install -c rapidsai-nightly -c nvidia -c conda-forge cudf=${BRANCH_VER} python=3.8 -y

 RUN wget --quiet \
-    https://github.com/Kitware/CMake/releases/download/v3.21.3/cmake-3.21.3-linux-x86_64.tar.gz \
+    https://github.com/Kitware/CMake/releases/download/v3.26.4/cmake-3.26.4-linux-x86_64.tar.gz \
     && tar -xzf cmake-3.21.3-linux-x86_64.tar.gz \
     && rm -rf cmake-3.21.3-linux-x86_64.tar.gz

examples/ML+DL-Examples/Spark-cuML/pca/README.md (+4 −3)

@@ -12,14 +12,15 @@ User can also download the release jar from Maven central:

 [rapids-4-spark-ml_2.12-22.02.0-cuda11.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-ml_2.12/22.02.0/rapids-4-spark-ml_2.12-22.02.0-cuda11.jar)

-[rapids-4-spark_2.12-23.06.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.06.0/rapids-4-spark_2.12-23.06.0.jar)
+[rapids-4-spark_2.12-23.08.1.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.08.1/rapids-4-spark_2.12-23.08.1.jar)

+Note: This demo only works with the v22.02.0 release.

 ## Sample code

 User can find sample scala code in [`main.scala`](main.scala). In the sample code, we will generate random data with 2048 feature dimensions. Then we use PCA to reduce number of features to 3.

-Just copy the sample code into the spark-shell laucnhed according to [this section](https://github.com/NVIDIA/spark-rapids-ml#how-to-use) and REPL will give out the algorithm results.
+Just copy the sample code into the spark-shell launched according to [this section](https://github.com/NVIDIA/spark-rapids-ml#how-to-use) and the REPL will give out the algorithm results.

 ## Notebook

@@ -48,7 +49,7 @@ It is assumed that a Standalone Spark cluster has been set up, the `SPARK_MASTER

 ``` bash
 RAPIDS_ML_JAR=PATH_TO_rapids-4-spark-ml_2.12-22.02.0-cuda11.jar
-PLUGIN_JAR=PATH_TO_rapids-4-spark_2.12-23.06.0.jar
+PLUGIN_JAR=PATH_TO_rapids-4-spark_2.12-23.08.1.jar

 jupyter toree install \
 --spark_home=${SPARK_HOME} \

examples/ML+DL-Examples/Spark-cuML/pca/pom.xml (+2 −2)

@@ -21,7 +21,7 @@
   <groupId>com.nvidia</groupId>
   <artifactId>PCAExample</artifactId>
   <packaging>jar</packaging>
-  <version>23.06.0</version>
+  <version>23.08.0</version>

   <properties>
     <maven.compiler.source>8</maven.compiler.source>
@@ -51,7 +51,7 @@
     <dependency>
       <groupId>com.nvidia</groupId>
       <artifactId>rapids-4-spark-ml_2.12</artifactId>
-      <version>23.06.0</version>
+      <version>23.02.0</version>
     </dependency>
   </dependencies>

examples/ML+DL-Examples/Spark-cuML/pca/spark-submit.sh (+4 −3)

@@ -15,8 +15,9 @@
 # limitations under the License.
 #

-ML_JAR=/root/.m2/repository/com/nvidia/rapids-4-spark-ml_2.12/23.06.0-SNAPSHOT/rapids-4-spark-ml_2.12-23.06.0-SNAPSHOT.jar
-PLUGIN_JAR=/root/.m2/repository/com/nvidia/rapids-4-spark_2.12/23.06.0-SNAPSHOT/rapids-4-spark_2.12-23.06.0-SNAPSHOT.jar
+ML_JAR=/root/.m2/repository/com/nvidia/rapids-4-spark-ml_2.12/23.04.0-SNAPSHOT/rapids-4-spark-ml_2.12-23.04.0-SNAPSHOT.jar
+PLUGIN_JAR=/root/.m2/repository/com/nvidia/rapids-4-spark_2.12/23.08.0-SNAPSHOT/rapids-4-spark_2.12-23.08.0-SNAPSHOT.jar
+# Note: The last rapids-4-spark-ml release version is 22.02.0; the snapshot version is 23.04.0-SNAPSHOT.

 $SPARK_HOME/bin/spark-submit \
 --master spark://127.0.0.1:7077 \
@@ -38,4 +39,4 @@ $SPARK_HOME/bin/spark-submit \
 --conf spark.network.timeout=1000s \
 --jars $ML_JAR,$PLUGIN_JAR \
 --class com.nvidia.spark.examples.pca.Main \
-/workspace/target/PCAExample-23.06.0-SNAPSHOT.jar
+/workspace/target/PCAExample-23.08.0-SNAPSHOT.jar

examples/SQL+DF-Examples/micro-benchmarks/notebooks/micro-benchmarks-gpu.ipynb (+1 −1)

@@ -22,7 +22,7 @@
     "import os\n",
     "# Change to your cluster ip:port and directories\n",
     "SPARK_MASTER_URL = os.getenv(\"SPARK_MASTER_URL\", \"spark:your-ip:port\")\n",
-    "RAPIDS_JAR = os.getenv(\"RAPIDS_JAR\", \"/your-path/rapids-4-spark_2.12-23.06.0.jar\")\n"
+    "RAPIDS_JAR = os.getenv(\"RAPIDS_JAR\", \"/your-path/rapids-4-spark_2.12-23.08.1.jar\")\n"
    ]
   },
   {

examples/UDF-Examples/RAPIDS-accelerated-UDFs/Dockerfile (+3 −3)

@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2021-2022, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2021-2023, NVIDIA CORPORATION. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -15,7 +15,7 @@
 #

 # A container that can be used to build UDF native code against libcudf
-ARG CUDA_VERSION=11.5.0
+ARG CUDA_VERSION=11.8.0
 ARG LINUX_VERSION=ubuntu18.04

 FROM nvidia/cuda:${CUDA_VERSION}-devel-${LINUX_VERSION}
@@ -58,7 +58,7 @@ CUDA_VERSION_MINOR=$(echo $CUDA_VERSION | tr -d '.' | cut -c 3); \
 # Set JDK8 as the default Java
 && update-alternatives --set java /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java

-ARG CMAKE_VERSION=3.23.3
+ARG CMAKE_VERSION=3.26.4

 # Install CMake
 RUN cd /tmp \

examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md (+1 −1)

@@ -108,7 +108,7 @@ See above Prerequisites section
 First finish the steps in the "Building with Native Code Examples and run test cases" section, then do the following in the docker.

 ### Get jars from Maven Central
-[rapids-4-spark_2.12-23.06.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.06.0/rapids-4-spark_2.12-23.06.0.jar)
+[rapids-4-spark_2.12-23.08.1.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.08.1/rapids-4-spark_2.12-23.08.1.jar)

 ### Launch a local mode Spark

examples/UDF-Examples/RAPIDS-accelerated-UDFs/pom.xml (+2 −2)

@@ -25,7 +25,7 @@
     user defined functions for use with the RAPIDS Accelerator
     for Apache Spark
   </description>
-  <version>23.06.0</version>
+  <version>23.08.0</version>

   <properties>
     <maven.compiler.source>1.8</maven.compiler.source>
@@ -37,7 +37,7 @@
     <cuda.version>cuda11</cuda.version>
     <scala.binary.version>2.12</scala.binary.version>
     <!-- Depends on release version, Snapshot version is not published to the Maven Central -->
-    <rapids4spark.version>23.06.0</rapids4spark.version>
+    <rapids4spark.version>23.08.1</rapids4spark.version>
     <spark.version>3.1.1</spark.version>
     <scala.version>2.12.15</scala.version>
     <udf.native.build.path>${project.build.directory}/cpp-build</udf.native.build.path>

examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/CMakeLists.txt (+4 −4)

@@ -16,7 +16,7 @@

 cmake_minimum_required(VERSION 3.23.1 FATAL_ERROR)

-file(DOWNLOAD https://raw.githubusercontent.com/rapidsai/rapids-cmake/branch-23.06/RAPIDS.cmake
+file(DOWNLOAD https://raw.githubusercontent.com/rapidsai/rapids-cmake/branch-23.08/RAPIDS.cmake
     ${CMAKE_BINARY_DIR}/RAPIDS.cmake)
 include(${CMAKE_BINARY_DIR}/RAPIDS.cmake)

@@ -32,7 +32,7 @@ if(DEFINED GPU_ARCHS)
 endif()
 rapids_cuda_init_architectures(UDFEXAMPLESJNI)

-project(UDFEXAMPLESJNI VERSION 23.06.0 LANGUAGES C CXX CUDA)
+project(UDFEXAMPLESJNI VERSION 23.08.0 LANGUAGES C CXX CUDA)

 option(PER_THREAD_DEFAULT_STREAM "Build with per-thread default stream" OFF)
 option(BUILD_UDF_BENCHMARKS "Build the benchmarks" OFF)
@@ -84,10 +84,10 @@ set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -w --expt-extended-lambda --expt-relax
 set(CUDA_USE_STATIC_CUDA_RUNTIME OFF)

 rapids_cpm_init()
-rapids_cpm_find(cudf 23.06.00
+rapids_cpm_find(cudf 23.08.00
   CPM_ARGS
   GIT_REPOSITORY https://github.com/rapidsai/cudf.git
-  GIT_TAG branch-23.06
+  GIT_TAG branch-23.08
   GIT_SHALLOW TRUE
   SOURCE_SUBDIR cpp
   OPTIONS "BUILD_TESTS OFF"
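The same calendar version appears above in several notations (cudf 23.08.00, GIT_TAG branch-23.08, rapids-cmake branch-23.08). A tiny illustrative helper, not part of this repo, shows how a version string maps onto its release branch name:

```python
def rapids_branch(version: str) -> str:
    """Map a RAPIDS calendar version (e.g. '23.08.00') to its release branch.

    Illustrative only; the CMake file above hard-codes both forms.
    """
    year, month = version.split(".")[:2]  # keep year and zero-padded month
    return f"branch-{year}.{month}"

rapids_branch("23.08.00")  # "branch-23.08"
```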

examples/UDF-Examples/Spark-cuSpatial/Dockerfile (+1 −1)

@@ -39,7 +39,7 @@ RUN conda --version
 RUN conda install -c conda-forge openjdk=8 maven=3.8.1 -y

 RUN wget --quiet \
-    https://github.com/Kitware/CMake/releases/download/v3.21.3/cmake-3.21.3-linux-x86_64.tar.gz \
+    https://github.com/Kitware/CMake/releases/download/v3.26.4/cmake-3.26.4-linux-x86_64.tar.gz \
     && tar -xzf cmake-3.21.3-linux-x86_64.tar.gz \
     && rm -rf cmake-3.21.3-linux-x86_64.tar.gz

examples/UDF-Examples/Spark-cuSpatial/gpu-run.sh (−1)

@@ -32,7 +32,6 @@ rm -rf $DATA_OUT_PATH
 JARS=$ROOT_PATH/jars

 JARS_PATH=${JARS_PATH:-$JARS/rapids-4-spark_2.12-23.02.0.jar,$JARS/spark-cuspatial-23.02.0.jar}
-
 $SPARK_HOME/bin/spark-submit --master spark://$HOSTNAME:7077 \
 --name "Gpu Spatial Join UDF" \
 --executor-memory 20G \

examples/XGBoost-Examples/agaricus/notebooks/python/agaricus-gpu.ipynb (+1 −1)

@@ -73,7 +73,7 @@
     "Setting default log level to \"WARN\".\n",
     "To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).\n",
     "2022-11-30 06:57:40,550 WARN resource.ResourceUtils: The configuration of cores (exec = 2 task = 1, runnable tasks = 2) will result in wasted resources due to resource gpu limiting the number of runnable tasks per executor to: 1. Please adjust your configuration.\n",
-    "2022-11-30 06:57:54,195 WARN rapids.RapidsPluginUtils: RAPIDS Accelerator 23.06.0 using cudf 23.06.0.\n",
+    "2022-11-30 06:57:54,195 WARN rapids.RapidsPluginUtils: RAPIDS Accelerator 23.08.1 using cudf 23.08.1.\n",
     "2022-11-30 06:57:54,210 WARN rapids.RapidsPluginUtils: spark.rapids.sql.multiThreadedRead.numThreads is set to 20.\n",
     "2022-11-30 06:57:54,214 WARN rapids.RapidsPluginUtils: RAPIDS Accelerator is enabled, to disable GPU support set `spark.rapids.sql.enabled` to false.\n",
     "2022-11-30 06:57:54,214 WARN rapids.RapidsPluginUtils: spark.rapids.sql.explain is set to `NOT_ON_GPU`. Set it to 'NONE' to suppress the diagnostics logging about the query placement on the GPU.\n",
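The `RapidsPluginUtils` startup line in these notebook logs is the quickest way to confirm which plugin build is active. A small hypothetical helper (not part of the plugin) can extract the versions from such a line:

```python
import re

def plugin_versions(log_line: str) -> tuple[str, str]:
    """Extract (accelerator, cudf) versions from a RapidsPluginUtils startup line.

    Illustrative helper based on the log format shown in the notebooks above.
    """
    m = re.search(r"RAPIDS Accelerator (\S+) using cudf (\S+)\.", log_line)
    if m is None:
        raise ValueError("not a RapidsPluginUtils version line")
    return m.group(1), m.group(2)

line = ("2022-11-30 06:57:54,195 WARN rapids.RapidsPluginUtils: "
        "RAPIDS Accelerator 23.08.1 using cudf 23.08.1.")
```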

examples/XGBoost-Examples/mortgage/notebooks/python/MortgageETL+XGBoost.ipynb (+1 −1)

@@ -6,7 +6,7 @@
    "source": [
     "# Dataset\n",
     "\n",
-    "Dataset is derived from Fannie Mae’s [Single-Family Loan Performance Data](http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html) with all rights reserved by Fannie Mae. Refer to these [instructions](https://github.com/NVIDIA/spark-rapids-examples/blob/branch-23.06/docs/get-started/xgboost-examples/dataset/mortgage.md) to download the dataset.\n",
+    "Dataset is derived from Fannie Mae’s [Single-Family Loan Performance Data](http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html) with all rights reserved by Fannie Mae. Refer to these [instructions](https://github.com/NVIDIA/spark-rapids-examples/blob/branch-23.08/docs/get-started/xgboost-examples/dataset/mortgage.md) to download the dataset.\n",
     "\n",
     "# ETL + XGBoost train & transform\n",
     "\n",

examples/XGBoost-Examples/mortgage/notebooks/python/MortgageETL.ipynb (+3 −3)

@@ -6,18 +6,18 @@
    "source": [
     "## Prerequirement\n",
     "### 1. Download data\n",
-    "Dataset is derived from Fannie Mae’s [Single-Family Loan Performance Data](http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html) with all rights reserved by Fannie Mae. Refer to these [instructions](https://github.com/NVIDIA/spark-rapids-examples/blob/branch-23.06/docs/get-started/xgboost-examples/dataset/mortgage.md) to download the dataset.\n",
+    "Dataset is derived from Fannie Mae’s [Single-Family Loan Performance Data](http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html) with all rights reserved by Fannie Mae. Refer to these [instructions](https://github.com/NVIDIA/spark-rapids-examples/blob/branch-23.08/docs/get-started/xgboost-examples/dataset/mortgage.md) to download the dataset.\n",
     "\n",
     "### 2. Download needed jars\n",
-    "* [rapids-4-spark_2.12-23.06.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.06.0/rapids-4-spark_2.12-23.06.0.jar)\n",
+    "* [rapids-4-spark_2.12-23.08.1.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.08.1/rapids-4-spark_2.12-23.08.1.jar)\n",
     "\n",
     "\n",
     "### 3. Start Spark Standalone\n",
     "Before running the script, please setup Spark standalone mode\n",
     "\n",
     "### 4. Add ENV\n",
     "```\n",
-    "$ export SPARK_JARS=rapids-4-spark_2.12-23.06.0.jar\n",
+    "$ export SPARK_JARS=rapids-4-spark_2.12-23.08.1.jar\n",
     "$ export PYSPARK_DRIVER_PYTHON=jupyter \n",
     "$ export PYSPARK_DRIVER_PYTHON_OPTS=notebook\n",
     "```\n",

examples/XGBoost-Examples/mortgage/notebooks/python/cv-mortgage-gpu.ipynb (+1 −1)

@@ -63,7 +63,7 @@
     "Setting default log level to \"WARN\".\n",
     "To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).\n",
     "2022-11-25 09:34:43,952 WARN resource.ResourceUtils: The configuration of cores (exec = 4 task = 1, runnable tasks = 4) will result in wasted resources due to resource gpu limiting the number of runnable tasks per executor to: 1. Please adjust your configuration.\n",
-    "2022-11-25 09:34:58,155 WARN rapids.RapidsPluginUtils: RAPIDS Accelerator 23.06.0 using cudf 23.06.0.\n",
+    "2022-11-25 09:34:58,155 WARN rapids.RapidsPluginUtils: RAPIDS Accelerator 23.08.1 using cudf 23.08.1.\n",
     "2022-11-25 09:34:58,171 WARN rapids.RapidsPluginUtils: spark.rapids.sql.multiThreadedRead.numThreads is set to 20.\n",
     "2022-11-25 09:34:58,175 WARN rapids.RapidsPluginUtils: RAPIDS Accelerator is enabled, to disable GPU support set `spark.rapids.sql.enabled` to false.\n",
     "2022-11-25 09:34:58,175 WARN rapids.RapidsPluginUtils: spark.rapids.sql.explain is set to `NOT_ON_GPU`. Set it to 'NONE' to suppress the diagnostics logging about the query placement on the GPU.\n"

examples/XGBoost-Examples/mortgage/notebooks/python/mortgage-gpu.ipynb (+1 −1)

@@ -84,7 +84,7 @@
     "22/11/24 06:14:06 INFO org.apache.spark.SparkEnv: Registering BlockManagerMaster\n",
     "22/11/24 06:14:06 INFO org.apache.spark.SparkEnv: Registering BlockManagerMasterHeartbeat\n",
     "22/11/24 06:14:06 INFO org.apache.spark.SparkEnv: Registering OutputCommitCoordinator\n",
-    "22/11/24 06:14:07 WARN com.nvidia.spark.rapids.RapidsPluginUtils: RAPIDS Accelerator 23.06.0 using cudf 23.06.0.\n",
+    "22/11/24 06:14:07 WARN com.nvidia.spark.rapids.RapidsPluginUtils: RAPIDS Accelerator 23.08.1 using cudf 23.08.1.\n",
     "22/11/24 06:14:07 WARN com.nvidia.spark.rapids.RapidsPluginUtils: spark.rapids.sql.multiThreadedRead.numThreads is set to 20.\n",
     "22/11/24 06:14:07 WARN com.nvidia.spark.rapids.RapidsPluginUtils: RAPIDS Accelerator is enabled, to disable GPU support set `spark.rapids.sql.enabled` to false.\n",
     "22/11/24 06:14:07 WARN com.nvidia.spark.rapids.RapidsPluginUtils: spark.rapids.sql.explain is set to `NOT_ON_GPU`. Set it to 'NONE' to suppress the diagnostics logging about the query placement on the GPU.\n"

0 commit comments
