From 5a3153e8a2d9a13ba9076da49b942b5c3275cffd Mon Sep 17 00:00:00 2001 From: Suraj Aralihalli Date: Tue, 6 Feb 2024 13:11:53 -0800 Subject: [PATCH 1/4] update 24.02.0 docs --- docs/archive.md | 90 ++++++++++++++++++++++++++++++++++++++++++++++++ docs/download.md | 39 ++++++++++----------- 2 files changed, 109 insertions(+), 20 deletions(-) diff --git a/docs/archive.md b/docs/archive.md index de1bd1ac8b9..edd270ae5bc 100644 --- a/docs/archive.md +++ b/docs/archive.md @@ -5,6 +5,96 @@ nav_order: 15 --- Below are archived releases for RAPIDS Accelerator for Apache Spark. +## Release v23.12.2 +### Hardware Requirements: + +The plugin is tested on the following architectures: + + GPU Models: NVIDIA V100, T4, A10/A100, L4 and H100 GPUs + +### Software Requirements: + + OS: Ubuntu 20.04, Ubuntu 22.04, CentOS 7, or Rocky Linux 8 + + NVIDIA Driver*: R470+ + + Runtime: + Scala 2.12, 2.13 + Python, Java Virtual Machine (JVM) compatible with your spark-version. + + * Check the Spark documentation for Python and Java version compatibility with your specific + Spark version. For instance, visit `https://spark.apache.org/docs/3.4.1` for Spark 3.4.1. + + Supported Spark versions: + Apache Spark 3.2.0, 3.2.1, 3.2.2, 3.2.3, 3.2.4 + Apache Spark 3.3.0, 3.3.1, 3.3.2, 3.3.3 + Apache Spark 3.4.0, 3.4.1 + Apache Spark 3.5.0 + + Supported Databricks runtime versions for Azure and AWS: + Databricks 10.4 ML LTS (GPU, Scala 2.12, Spark 3.2.1) + Databricks 11.3 ML LTS (GPU, Scala 2.12, Spark 3.3.0) + Databricks 12.2 ML LTS (GPU, Scala 2.12, Spark 3.3.2) + + Supported Dataproc versions: + GCP Dataproc 2.0 + GCP Dataproc 2.1 + + Supported Dataproc Serverless versions: + Spark runtime 1.1 LTS + +*Some hardware may have a minimum driver version greater than R470. Check the GPU spec sheet +for your hardware's minimum driver version. 
+ +*For Cloudera and EMR support, please refer to the +[Distributions](https://docs.nvidia.com/spark-rapids/user-guide/latest/faq.html#which-distributions-are-supported) section of the FAQ. + +### RAPIDS Accelerator's Support Policy for Apache Spark +The RAPIDS Accelerator maintains support for Apache Spark versions available for download from [Apache Spark](https://spark.apache.org/downloads.html) + +### Download RAPIDS Accelerator for Apache Spark v23.12.2 +- **Scala 2.12:** + - [RAPIDS Accelerator for Apache Spark 23.12.2 - Scala 2.12 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.2/rapids-4-spark_2.12-23.12.2.jar) + - [RAPIDS Accelerator for Apache Spark 23.12.2 - Scala 2.12 jar.asc](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.2/rapids-4-spark_2.12-23.12.2.jar.asc) + +- **Scala 2.13:** + - [RAPIDS Accelerator for Apache Spark 23.12.2 - Scala 2.13 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/23.12.2/rapids-4-spark_2.13-23.12.2.jar) + - [RAPIDS Accelerator for Apache Spark 23.12.2 - Scala 2.13 jar.asc](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/23.12.2/rapids-4-spark_2.13-23.12.2.jar.asc) + +This package is built against CUDA 11.8. It is tested on V100, T4, A10, A100, L4 and H100 GPUs with +CUDA 11.8 through CUDA 12.0. + +### Verify signature +* Download the [PUB_KEY](https://keys.openpgp.org/search?q=sw-spark@nvidia.com). 
+* Import the public key: `gpg --import PUB_KEY` +* Verify the signature for Scala 2.12 jar: + `gpg --verify rapids-4-spark_2.12-23.12.2.jar.asc rapids-4-spark_2.12-23.12.2.jar` +* Verify the signature for Scala 2.13 jar: + `gpg --verify rapids-4-spark_2.13-23.12.2.jar.asc rapids-4-spark_2.13-23.12.2.jar` + +The output of signature verify: + + gpg: Good signature from "NVIDIA Spark (For the signature of spark-rapids release jars) " + +### Release Notes +New functionality and performance improvements for this release include: +* Introduced support for chunked reading of ORC files. +* Enhanced support for additional time zones and added stack function support. +* Enhanced performance for join and aggregation operations. +* Kernel optimizations have been implemented to improve Parquet read performance. +* RAPIDS Accelerator also built and tested with Scala 2.13. +* Last version to support Pascal-based Nvidia GPUs; discontinued in the next release. +* Introduced support for parquet Legacy rebase mode (spark.sql.parquet.datetimeRebaseModeInRead=LEGACY and spark.sql.parquet.int96RebaseModeInRead=LEGACY) +* Introduced support for Percentile function. +* Delta lake 2.3 support. +* Qualification and Profiling tool: + * Profiling Tool now processes Spark Driver log for GPU runs, enhancing feature analysis. + * Auto-tuner recommendations include AQE settings for optimized performance. + * New configurations in Profiler for enabling off-default features: udfCompiler, incompatibleDateFormats, hasExtendedYearValues. + +For a detailed list of changes, please refer to the +[CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md). + ## Release v23.12.1 ### Hardware Requirements: diff --git a/docs/download.md b/docs/download.md index c8ab9f219b2..df1a64fd66f 100644 --- a/docs/download.md +++ b/docs/download.md @@ -18,7 +18,7 @@ cuDF jar, that is either preinstalled in the Spark classpath on all nodes or sub that uses the RAPIDS Accelerator For Apache Spark. 
See the [getting-started guide](https://docs.nvidia.com/spark-rapids/user-guide/latest/getting-started/overview.html) for more details. -## Release v23.12.2 +## Release v24.02.0 ### Hardware Requirements: The plugin is tested on the following architectures: @@ -48,6 +48,7 @@ The plugin is tested on the following architectures: Databricks 10.4 ML LTS (GPU, Scala 2.12, Spark 3.2.1) Databricks 11.3 ML LTS (GPU, Scala 2.12, Spark 3.3.0) Databricks 12.2 ML LTS (GPU, Scala 2.12, Spark 3.3.2) + Databricks 13.3 ML LTS (GPU, Scala 2.12, Spark 3.4.1) Supported Dataproc versions: GCP Dataproc 2.0 @@ -65,14 +66,14 @@ for your hardware's minimum driver version. ### RAPIDS Accelerator's Support Policy for Apache Spark The RAPIDS Accelerator maintains support for Apache Spark versions available for download from [Apache Spark](https://spark.apache.org/downloads.html) -### Download RAPIDS Accelerator for Apache Spark v23.12.2 +### Download RAPIDS Accelerator for Apache Spark v24.02.0 - **Scala 2.12:** - - [RAPIDS Accelerator for Apache Spark 23.12.2 - Scala 2.12 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.2/rapids-4-spark_2.12-23.12.2.jar) - - [RAPIDS Accelerator for Apache Spark 23.12.2 - Scala 2.12 jar.asc](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.2/rapids-4-spark_2.12-23.12.2.jar.asc) + - [RAPIDS Accelerator for Apache Spark 24.02.0 - Scala 2.12 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0.jar) + - [RAPIDS Accelerator for Apache Spark 24.02.0 - Scala 2.12 jar.asc](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0.jar.asc) - **Scala 2.13:** - - [RAPIDS Accelerator for Apache Spark 23.12.2 - Scala 2.13 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/23.12.2/rapids-4-spark_2.13-23.12.2.jar) - - [RAPIDS Accelerator for Apache Spark 23.12.2 - Scala 2.13 
jar.asc](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/23.12.2/rapids-4-spark_2.13-23.12.2.jar.asc) + - [RAPIDS Accelerator for Apache Spark 24.02.0 - Scala 2.13 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.02.0/rapids-4-spark_2.13-24.02.0.jar) + - [RAPIDS Accelerator for Apache Spark 24.02.0 - Scala 2.13 jar.asc](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.02.0/rapids-4-spark_2.13-24.02.0.jar.asc) This package is built against CUDA 11.8. It is tested on V100, T4, A10, A100, L4 and H100 GPUs with CUDA 11.8 through CUDA 12.0. @@ -81,9 +82,9 @@ CUDA 11.8 through CUDA 12.0. * Download the [PUB_KEY](https://keys.openpgp.org/search?q=sw-spark@nvidia.com). * Import the public key: `gpg --import PUB_KEY` * Verify the signature for Scala 2.12 jar: - `gpg --verify rapids-4-spark_2.12-23.12.2.jar.asc rapids-4-spark_2.12-23.12.2.jar` + `gpg --verify rapids-4-spark_2.12-24.02.0.jar.asc rapids-4-spark_2.12-24.02.0.jar` * Verify the signature for Scala 2.13 jar: - `gpg --verify rapids-4-spark_2.13-23.12.2.jar.asc rapids-4-spark_2.13-23.12.2.jar` + `gpg --verify rapids-4-spark_2.13-24.02.0.jar.asc rapids-4-spark_2.13-24.02.0.jar` The output of signature verify: @@ -91,19 +92,17 @@ The output of signature verify: ### Release Notes New functionality and performance improvements for this release include: -* Introduced support for chunked reading of ORC files. -* Enhanced support for additional time zones and added stack function support. -* Enhanced performance for join and aggregation operations. -* Kernel optimizations have been implemented to improve Parquet read performance. -* RAPIDS Accelerator also built and tested with Scala 2.13. -* Last version to support Pascal-based Nvidia GPUs; discontinued in the next release. 
-* Introduced support for parquet Legacy rebase mode (spark.sql.parquet.datetimeRebaseModeInRead=LEGACY and spark.sql.parquet.int96RebaseModeInRead=LEGACY) -* Introduced support for Percentile function. -* Delta lake 2.3 support. +* Discontinued support for NVIDIA GPUs based on the Pascal architecture. +* Set get_json_object functionality to disabled by default. +* Implemented string comparison in AST expressions. +* Expanded timezone support to include options beyond UTC. +* Enhanced security by adding checksum for cached files in Filecache. +* Introduced support for Databricks 13.3 ML LTS. +* Added support for parse_url functionality. * Qualification and Profiling tool: - * Profiling Tool now processes Spark Driver log for GPU runs, enhancing feature analysis. - * Auto-tuner recommendations include AQE settings for optimized performance. - * New configurations in Profiler for enabling off-default features: udfCompiler, incompatibleDateFormats, hasExtendedYearValues. + * Enhanced qualification tool accuracy by incorporating penalties for executors/operators not linked with stages. + * Increased granularity in unsupported operators output within the Qualification Tool to understand potential fallback impacts better. + * Enhanced shuffle partitions recommendation heuristic for more effective Profiling Tool reports. For a detailed list of changes, please refer to the [CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md).
From 8a3e4ed752ceb3a776bd401c643c827494ea7c87 Mon Sep 17 00:00:00 2001 From: Suraj Aralihalli Date: Tue, 6 Feb 2024 13:48:15 -0800 Subject: [PATCH 2/4] introduce arm64 Signed-off-by: Suraj Aralihalli --- docs/download.md | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/docs/download.md b/docs/download.md index df1a64fd66f..7414b7974ed 100644 --- a/docs/download.md +++ b/docs/download.md @@ -56,6 +56,8 @@ The plugin is tested on the following architectures: Supported Dataproc Serverless versions: Spark runtime 1.1 LTS + Spark runtime 2.0 + Spark runtime 2.1 *Some hardware may have a minimum driver version greater than R470. Check the GPU spec sheet for your hardware's minimum driver version. @@ -67,13 +69,23 @@ for your hardware's minimum driver version. The RAPIDS Accelerator maintains support for Apache Spark versions available for download from [Apache Spark](https://spark.apache.org/downloads.html) ### Download RAPIDS Accelerator for Apache Spark v24.02.0 -- **Scala 2.12:** - - [RAPIDS Accelerator for Apache Spark 24.02.0 - Scala 2.12 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0.jar) - - [RAPIDS Accelerator for Apache Spark 24.02.0 - Scala 2.12 jar.asc](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0.jar.asc) - -- **Scala 2.13:** - - [RAPIDS Accelerator for Apache Spark 24.02.0 - Scala 2.13 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.02.0/rapids-4-spark_2.13-24.02.0.jar) - - [RAPIDS Accelerator for Apache Spark 24.02.0 - Scala 2.13 jar.asc](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.02.0/rapids-4-spark_2.13-24.02.0.jar.asc) +- **x86** + - **Scala 2.12:** + - [RAPIDS Accelerator for Apache Spark 24.02.0 - Scala 2.12 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0.jar) + - [RAPIDS Accelerator for Apache Spark 
24.02.0 - Scala 2.12 jar.asc](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0.jar.asc) + + - **Scala 2.13:** + - [RAPIDS Accelerator for Apache Spark 24.02.0 - Scala 2.13 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.02.0/rapids-4-spark_2.13-24.02.0.jar) + - [RAPIDS Accelerator for Apache Spark 24.02.0 - Scala 2.13 jar.asc](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.02.0/rapids-4-spark_2.13-24.02.0.jar.asc) + +- **arm64** + - **Scala 2.12:** + - [RAPIDS Accelerator for Apache Spark 24.02.0 - Scala 2.12 arm64 jar]() + - [RAPIDS Accelerator for Apache Spark 24.02.0 - Scala 2.12 arm64 jar.asc]() + + - **Scala 2.13:** + - [RAPIDS Accelerator for Apache Spark 24.02.0 - Scala 2.13 arm64 jar]() + - [RAPIDS Accelerator for Apache Spark 24.02.0 - Scala 2.13 arm64 jar.asc]() This package is built against CUDA 11.8. It is tested on V100, T4, A10, A100, L4 and H100 GPUs with CUDA 11.8 through CUDA 12.0. From 9a19cd3565e17630bffe7d359e6edc4b8a850d6b Mon Sep 17 00:00:00 2001 From: Suraj Aralihalli Date: Fri, 9 Feb 2024 10:48:01 -0800 Subject: [PATCH 3/4] update readme Signed-off-by: Suraj Aralihalli --- docs/download.md | 33 +++++++++++---------------------- 1 file changed, 11 insertions(+), 22 deletions(-) diff --git a/docs/download.md b/docs/download.md index 7414b7974ed..118ea905feb 100644 --- a/docs/download.md +++ b/docs/download.md @@ -69,23 +69,12 @@ for your hardware's minimum driver version. 
The RAPIDS Accelerator maintains support for Apache Spark versions available for download from [Apache Spark](https://spark.apache.org/downloads.html) ### Download RAPIDS Accelerator for Apache Spark v24.02.0 -- **x86** - - **Scala 2.12:** - - [RAPIDS Accelerator for Apache Spark 24.02.0 - Scala 2.12 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0.jar) - - [RAPIDS Accelerator for Apache Spark 24.02.0 - Scala 2.12 jar.asc](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0.jar.asc) - - - **Scala 2.13:** - - [RAPIDS Accelerator for Apache Spark 24.02.0 - Scala 2.13 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.02.0/rapids-4-spark_2.13-24.02.0.jar) - - [RAPIDS Accelerator for Apache Spark 24.02.0 - Scala 2.13 jar.asc](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.02.0/rapids-4-spark_2.13-24.02.0.jar.asc) - -- **arm64** - - **Scala 2.12:** - - [RAPIDS Accelerator for Apache Spark 24.02.0 - Scala 2.12 arm64 jar]() - - [RAPIDS Accelerator for Apache Spark 24.02.0 - Scala 2.12 arm64 jar.asc]() - - - **Scala 2.13:** - - [RAPIDS Accelerator for Apache Spark 24.02.0 - Scala 2.13 arm64 jar]() - - [RAPIDS Accelerator for Apache Spark 24.02.0 - Scala 2.13 arm64 jar.asc]() +| Processor | Scala Version | Download Jar | Download Signature | +|-----------|---------------|--------------|--------------------| +| x86_64 | Scala 2.12 | [RAPIDS Accelerator 2.12 v24.02.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0.jar.asc) | +| x86_64 | Scala 2.13 | [RAPIDS Accelerator 2.13 v24.02.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.02.0/rapids-4-spark_2.13-24.02.0.jar) | 
[Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.02.0/rapids-4-spark_2.13-24.02.0.jar.asc) | +| arm64 | Scala 2.12 | [RAPIDS Accelerator 2.12 v24.02.0](#) | [Signature](#) | +| arm64 | Scala 2.13 | [RAPIDS Accelerator 2.13 v24.02.0](#) | [Signature](#) | This package is built against CUDA 11.8. It is tested on V100, T4, A10, A100, L4 and H100 GPUs with CUDA 11.8 through CUDA 12.0. @@ -108,13 +97,13 @@ New functionality and performance improvements for this release include: * Set get_json_object functionality to disabled by default. * Implemented string comparison in AST expressions. * Expanded timezone support to include options beyond UTC. -* Enhanced security by adding checksum for cached files in Filecache. +* Added optional checksums for cached files in the file cache. * Introduced support for Databricks 13.3 ML LTS. * Added support for parse_url functionality. -* Qualification and Profiling tool: - * Enhanced qualification tool accuracy by incorporating penalties for executors/operators not linked with stages. - * Increased granularity in unsupported operators output within the Qualification Tool to understand potential fallback impacts better. - * Enhanced shuffle partitions recommendation heuristic for more effective Profiling Tool reports. +* Added support for lazy quantifiers in regular expression functions. +* Added support for the format_number function. +* Enhanced batching support for row-based bounded window functions. +* For RAPIDS Accelerator Tools updates, see the [spark-rapids-tools releases](https://github.com/NVIDIA/spark-rapids-tools/releases) page. For a detailed list of changes, please refer to the [CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md).
From c6ec30f35eb72027006914d8bd864aa303414e8f Mon Sep 17 00:00:00 2001 From: Suraj Aralihalli Date: Fri, 9 Feb 2024 16:40:51 -0800 Subject: [PATCH 4/4] update links for arm64 Signed-off-by: Suraj Aralihalli --- docs/download.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/download.md b/docs/download.md index 118ea905feb..52bb04da4ed 100644 --- a/docs/download.md +++ b/docs/download.md @@ -73,8 +73,8 @@ The RAPIDS Accelerator maintains support for Apache Spark versions available for |-----------|---------------|--------------|--------------------| | x86_64 | Scala 2.12 | [RAPIDS Accelerator 2.12 v24.02.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0.jar.asc) | | x86_64 | Scala 2.13 | [RAPIDS Accelerator 2.13 v24.02.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.02.0/rapids-4-spark_2.13-24.02.0.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.02.0/rapids-4-spark_2.13-24.02.0.jar.asc) | -| arm64 | Scala 2.12 | [RAPIDS Accelerator 2.12 v24.02.0](#) | [Signature](#) | -| arm64 | Scala 2.13 | [RAPIDS Accelerator 2.13 v24.02.0](#) | [Signature](#) | +| arm64 | Scala 2.12 | [RAPIDS Accelerator 2.12 v24.02.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0-arm64.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0-arm64.jar.asc) | +| arm64 | Scala 2.13 | [RAPIDS Accelerator 2.13 v24.02.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.02.0/rapids-4-spark_2.13-24.02.0-arm64.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.02.0/rapids-4-spark_2.13-24.02.0-arm64.jar.asc) | This package is built against CUDA 11.8. 
It is tested on V100, T4, A10, A100, L4 and H100 GPUs with CUDA 11.8 through CUDA 12.0.