Merge branch 'branch-25.02' into dump_write_data
revans2 committed Dec 13, 2024
2 parents 8f8ce17 + edb7a67 commit b5403c4
Showing 9 changed files with 271 additions and 52 deletions.
90 changes: 90 additions & 0 deletions docs/archive.md
@@ -5,6 +5,96 @@ nav_order: 15
---
Below are archived releases for RAPIDS Accelerator for Apache Spark.

## Release v24.10.1
### Hardware Requirements:

The plugin is tested on the following architectures:

GPU Models: NVIDIA V100, T4, A10/A100, L4 and H100 GPUs

### Software Requirements:

OS: Spark RAPIDS is compatible with any Linux distribution with glibc >= 2.28 (check the output of `ldd --version`). glibc 2.28 was released on August 1, 2018.
Tested on Ubuntu 20.04, Ubuntu 22.04, Rocky Linux 8 and Rocky Linux 9

NVIDIA Driver*: R470+

Runtime:
Scala 2.12, 2.13
Python and a Java Virtual Machine (JVM) compatible with your Spark version.

* Check the Spark documentation for Python and Java version compatibility with your specific
Spark version. For instance, visit `https://spark.apache.org/docs/3.4.1` for Spark 3.4.1.

Supported Spark versions:
Apache Spark 3.2.0, 3.2.1, 3.2.2, 3.2.3, 3.2.4
Apache Spark 3.3.0, 3.3.1, 3.3.2, 3.3.3, 3.3.4
Apache Spark 3.4.0, 3.4.1, 3.4.2, 3.4.3
Apache Spark 3.5.0, 3.5.1, 3.5.2

Supported Databricks runtime versions for Azure and AWS:
Databricks 11.3 ML LTS (GPU, Scala 2.12, Spark 3.3.0)
Databricks 12.2 ML LTS (GPU, Scala 2.12, Spark 3.3.2)
Databricks 13.3 ML LTS (GPU, Scala 2.12, Spark 3.4.1)

Supported Dataproc versions (Debian/Ubuntu/Rocky):
GCP Dataproc 2.1
GCP Dataproc 2.2

Supported Dataproc Serverless versions:
Spark runtime 1.1 LTS
Spark runtime 2.0
Spark runtime 2.1
Spark runtime 2.2

*Some hardware may have a minimum driver version greater than R470. Check the GPU spec sheet
for your hardware's minimum driver version.

*For Cloudera and EMR support, please refer to the
[Distributions](https://docs.nvidia.com/spark-rapids/user-guide/latest/faq.html#which-distributions-are-supported) section of the FAQ.

### RAPIDS Accelerator's Support Policy for Apache Spark
The RAPIDS Accelerator maintains support for Apache Spark versions available for download from [Apache Spark](https://spark.apache.org/downloads.html).

### Download RAPIDS Accelerator for Apache Spark v24.10.1

| Processor | Scala Version | Download Jar | Download Signature |
|-----------|---------------|--------------|--------------------|
| x86_64 | Scala 2.12 | [RAPIDS Accelerator v24.10.1](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.10.1/rapids-4-spark_2.12-24.10.1.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.10.1/rapids-4-spark_2.12-24.10.1.jar.asc) |
| x86_64 | Scala 2.13 | [RAPIDS Accelerator v24.10.1](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.10.1/rapids-4-spark_2.13-24.10.1.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.10.1/rapids-4-spark_2.13-24.10.1.jar.asc) |
| arm64 | Scala 2.12 | [RAPIDS Accelerator v24.10.1](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.10.1/rapids-4-spark_2.12-24.10.1-cuda11-arm64.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.10.1/rapids-4-spark_2.12-24.10.1-cuda11-arm64.jar.asc) |
| arm64 | Scala 2.13 | [RAPIDS Accelerator v24.10.1](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.10.1/rapids-4-spark_2.13-24.10.1-cuda11-arm64.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.10.1/rapids-4-spark_2.13-24.10.1-cuda11-arm64.jar.asc) |

This package is built against CUDA 11.8. It is tested on V100, T4, A10, A100, L4 and H100 GPUs with
CUDA 11.8 through CUDA 12.0.

### Verify signature
* Download the [PUB_KEY](https://keys.openpgp.org/[email protected]).
* Import the public key: `gpg --import PUB_KEY`
* Verify the signature for Scala 2.12 jar:
`gpg --verify rapids-4-spark_2.12-24.10.1.jar.asc rapids-4-spark_2.12-24.10.1.jar`
* Verify the signature for Scala 2.13 jar:
`gpg --verify rapids-4-spark_2.13-24.10.1.jar.asc rapids-4-spark_2.13-24.10.1.jar`

The output of the signature verification:

gpg: Good signature from "NVIDIA Spark (For the signature of spark-rapids release jars) <[email protected]>"

### Release Notes
* Optimize scheduling policy for GPU Semaphore
* Support distinct join for right outer joins
* Support MinBy and MaxBy for non-float ordering
* Support ArrayJoin expression
* Optimize Expand and Aggregate expression performance
* Improve JSON related expressions
* For updates on RAPIDS Accelerator Tools, please visit [this link](https://github.com/NVIDIA/spark-rapids-tools/releases)

Note: There is a known issue in the 24.10.1 release when decompressing gzip files on H100 GPUs.
Please find more details in [issue-16661](https://github.com/rapidsai/cudf/issues/16661).

For a detailed list of changes, please refer to the
[CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md).

## Release v24.10.0
### Hardware Requirements:

33 changes: 18 additions & 15 deletions docs/download.md
@@ -18,7 +18,7 @@ cuDF jar, that is either preinstalled in the Spark classpath on all nodes or submitted with each job
that uses the RAPIDS Accelerator For Apache Spark. See the [getting-started
guide](https://docs.nvidia.com/spark-rapids/user-guide/latest/getting-started/overview.html) for more details.

-## Release v24.10.1
+## Release v24.12.0
### Hardware Requirements:

The plugin is tested on the following architectures:
@@ -69,14 +69,14 @@ for your hardware's minimum driver version.
### RAPIDS Accelerator's Support Policy for Apache Spark
The RAPIDS Accelerator maintains support for Apache Spark versions available for download from [Apache Spark](https://spark.apache.org/downloads.html).

-### Download RAPIDS Accelerator for Apache Spark v24.10.1
+### Download RAPIDS Accelerator for Apache Spark v24.12.0

| Processor | Scala Version | Download Jar | Download Signature |
|-----------|---------------|--------------|--------------------|
-| x86_64 | Scala 2.12 | [RAPIDS Accelerator v24.10.1](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.10.1/rapids-4-spark_2.12-24.10.1.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.10.1/rapids-4-spark_2.12-24.10.1.jar.asc) |
-| x86_64 | Scala 2.13 | [RAPIDS Accelerator v24.10.1](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.10.1/rapids-4-spark_2.13-24.10.1.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.10.1/rapids-4-spark_2.13-24.10.1.jar.asc) |
-| arm64 | Scala 2.12 | [RAPIDS Accelerator v24.10.1](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.10.1/rapids-4-spark_2.12-24.10.1-cuda11-arm64.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.10.1/rapids-4-spark_2.12-24.10.1-cuda11-arm64.jar.asc) |
-| arm64 | Scala 2.13 | [RAPIDS Accelerator v24.10.1](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.10.1/rapids-4-spark_2.13-24.10.1-cuda11-arm64.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.10.1/rapids-4-spark_2.13-24.10.1-cuda11-arm64.jar.asc) |
+| x86_64 | Scala 2.12 | [RAPIDS Accelerator v24.12.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.12.0/rapids-4-spark_2.12-24.12.0.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.12.0/rapids-4-spark_2.12-24.12.0.jar.asc) |
+| x86_64 | Scala 2.13 | [RAPIDS Accelerator v24.12.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.12.0/rapids-4-spark_2.13-24.12.0.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.12.0/rapids-4-spark_2.13-24.12.0.jar.asc) |
+| arm64 | Scala 2.12 | [RAPIDS Accelerator v24.12.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.12.0/rapids-4-spark_2.12-24.12.0-cuda11-arm64.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.12.0/rapids-4-spark_2.12-24.12.0-cuda11-arm64.jar.asc) |
+| arm64 | Scala 2.13 | [RAPIDS Accelerator v24.12.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.12.0/rapids-4-spark_2.13-24.12.0-cuda11-arm64.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.12.0/rapids-4-spark_2.13-24.12.0-cuda11-arm64.jar.asc) |

This package is built against CUDA 11.8. It is tested on V100, T4, A10, A100, L4 and H100 GPUs with
CUDA 11.8 through CUDA 12.0.
@@ -85,24 +85,27 @@ CUDA 11.8 through CUDA 12.0.
* Download the [PUB_KEY](https://keys.openpgp.org/[email protected]).
* Import the public key: `gpg --import PUB_KEY`
* Verify the signature for Scala 2.12 jar:
-`gpg --verify rapids-4-spark_2.12-24.10.1.jar.asc rapids-4-spark_2.12-24.10.1.jar`
+`gpg --verify rapids-4-spark_2.12-24.12.0.jar.asc rapids-4-spark_2.12-24.12.0.jar`
* Verify the signature for Scala 2.13 jar:
-`gpg --verify rapids-4-spark_2.13-24.10.1.jar.asc rapids-4-spark_2.13-24.10.1.jar`
+`gpg --verify rapids-4-spark_2.13-24.12.0.jar.asc rapids-4-spark_2.13-24.12.0.jar`

The output of the signature verification:

gpg: Good signature from "NVIDIA Spark (For the signature of spark-rapids release jars) <[email protected]>"

### Release Notes
-* Optimize scheduling policy for GPU Semaphore
-* Support distinct join for right outer joins
-* Support MinBy and MaxBy for non-float ordering
-* Support ArrayJoin expression
-* Optimize Expand and Aggregate expression performance
-* Improve JSON related expressions
+* Add repartition-based algorithm fallback in hash aggregate
+* Support Spark function months_between
+* Support asynchronous writing for Parquet files
+* Add retry support to improve sub hash-join stability
+* Improve JSON scan and from_json
+* Improve performance for CASE WHEN statements comparing a string column against multiple values
+* Fall back to the CPU for ORC boolean writes due to a bug in cudf's ORC writer
+* Fix a device memory leak in the timestamp operator in the `incompatibleDateFormats` case
+* Fix a host memory leak in GpuBroadcastNestedLoopJoinExecBase when `spillableBuiltBatch` is 0
* For updates on RAPIDS Accelerator Tools, please visit [this link](https://github.com/NVIDIA/spark-rapids-tools/releases)

-Note: There is a known issue in the 24.10.1 release when decompressing gzip files on H100 GPUs.
+Note: There is a known issue in the 24.12.0 release when decompressing gzip files on H100 GPUs.
Please find more details in [issue-16661](https://github.com/rapidsai/cudf/issues/16661).

For a detailed list of changes, please refer to the
2 changes: 0 additions & 2 deletions integration_tests/src/main/python/json_test.py
@@ -1012,7 +1012,6 @@ def test_from_json_struct_of_list_with_mismatched_schema():
    'struct<student:array<struct<name:string,class:string>>>',
    'struct<teacher:string,student:array<struct<name:string,class:string>>>'])
@allow_non_gpu(*non_utc_allow)
-@pytest.mark.xfail(reason='https://github.com/rapidsai/cudf/issues/17349')
def test_from_json_struct_of_list_with_mixed_nested_types_input(schema):
    json_string_gen = StringGen(r'{"teacher": "[A-Z]{1}[a-z]{2,5}",' \
                                r'"student": \[{"name": "[A-Z]{1}[a-z]{2,5}", "class": "junior"},' \
@@ -1399,7 +1398,6 @@ def test_spark_from_json_empty_table(data):
                    conf =_enable_all_types_conf)

# SPARK-20549: from_json bad UTF-8
-@pytest.mark.xfail(reason='https://github.com/NVIDIA/spark-rapids/issues/10483')
@allow_non_gpu(*non_utc_allow) # https://github.com/NVIDIA/spark-rapids/issues/10453
def test_spark_from_json_bad_json():
    schema = StructType([StructField("a", IntegerType())])
34 changes: 11 additions & 23 deletions scripts/generate-changelog
@@ -1,6 +1,6 @@
#!/usr/bin/env python

-# Copyright (c) 2020-2023, NVIDIA CORPORATION.
+# Copyright (c) 2020-2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -164,8 +164,7 @@ query ($after: String, $since: DateTime) {
"""


-def process_changelog(resource_type: str, changelog: dict, releases: set, projects: set, no_project_prs: list,
-                      token: str):
+def process_changelog(resource_type: str, changelog: dict, releases: set, projects: set, token: str):
    if resource_type == PULL_REQUESTS:
        items = process_pr(releases=releases, token=token)
        time_field = 'mergedAt'
@@ -178,14 +177,14 @@ def process_changelog(resource_type: str, changelog: dict, releases: set, projec

    for item in items:
        if len(item["projectItems"]["nodes"]) == 0 or not item["projectItems"]["nodes"][0]['roadmap']:
-            # compatibility support for project API V1
-            if len(item['projectCards']['nodes']) == 0:
-                if resource_type == PULL_REQUESTS:
-                    if '[bot]' in item['title']:
-                        continue # skip auto-gen PR
-                    no_project_prs.append(item)
+            if resource_type == PULL_REQUESTS:
+                if '[bot]' in item['title']:
+                    continue # skip auto-gen PR
+                # Obtain the version from the PR's target branch, e.g. branch-x.y --> x.y
+                ver = item['baseRefName'].replace('branch-', '')
+                project = f"{RELEASE} {ver}"
+            else:
                continue
-            project = item['projectCards']['nodes'][0]['project']['name']
        else:
            ver = item["projectItems"]["nodes"][0]['roadmap']['name']
            project = f"{RELEASE} {ver}"
@@ -309,12 +308,6 @@ def form_subsection(issues: dict, subtitle: str):
    return subsection


-def print_no_project_pr(no_project_prs: list):
-    if len(no_project_prs) != 0:
-        print("\nNOTE: Merged Pull Requests w/o Project:")
-        for pr in no_project_prs:
-            print(f"{pr['baseRefName']} #{pr['number']} {pr['title']} {pr['url']}")


def main(rels: str, path: str, token: str):
    print('Generating changelog ...')
@@ -323,25 +316,20 @@
        changelog = {} # changelog dict
        releases = {x.strip() for x in rels.split(',')}
        projects = {f"{RELEASE} {rel}" for rel in releases}
-        no_project_prs = [] # list of merge pr w/o project

        print('Processing pull requests ...')
        process_changelog(resource_type=PULL_REQUESTS, changelog=changelog,
-                          releases=releases, projects=projects,
-                          no_project_prs=no_project_prs, token=token)
+                          releases=releases, projects=projects, token=token)
        print('Processing issues ...')
        process_changelog(resource_type=ISSUES, changelog=changelog,
-                          releases=releases, projects=projects,
-                          no_project_prs=no_project_prs, token=token)
+                          releases=releases, projects=projects, token=token)
        # form doc
        form_changelog(path=path, changelog=changelog)
    except Exception as e: # pylint: disable=broad-except
        print(e)
        sys.exit(1)

    print('Done.')
-    # post action
-    print_no_project_pr(no_project_prs=no_project_prs)


if __name__ == '__main__':
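The net effect of the `generate-changelog` change: merged PRs with no roadmap project attached are no longer collected into a separate "w/o Project" report; instead, the script derives the release from the PR's target branch and files the PR under that release. A minimal sketch of the new fallback, written in Scala for consistency with the examples further below (the script itself is Python, and `RELEASE` is assumed to be the literal prefix `Release` used in `f"{RELEASE} {ver}"`):

```scala
// Illustrative sketch only, not part of the commit: mirrors the fallback in
// process_changelog for PRs that have no project attached.
object ChangelogFallback {
  // "branch-25.02" --> Some("Release 25.02"); anything else is skipped,
  // just as the Python code hits `continue` in its else branch.
  def projectForPr(baseRefName: String): Option[String] =
    if (baseRefName.startsWith("branch-"))
      Some(s"Release ${baseRefName.stripPrefix("branch-")}")
    else None

  def main(args: Array[String]): Unit = {
    println(projectForPr("branch-25.02")) // Some(Release 25.02)
    println(projectForPr("main"))         // None
  }
}
```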
GpuSemaphore.scala
@@ -133,6 +133,7 @@ object GpuSemaphore {
  }

  private val MAX_PERMITS = 1000
+  val DEFAULT_PRIORITY = 0L

  def computeNumPermits(conf: SQLConf): Int = {
    val concurrentStr = conf.getConfString(RapidsConf.CONCURRENT_GPU_TASKS.key, null)
@@ -184,7 +185,8 @@ private final class SemaphoreTaskInfo(val stageId: Int, val taskAttemptId: Long)
   * If this task holds the GPU semaphore or not.
   */
  private var hasSemaphore = false
-  private var lastHeld: Long = 0
+  private var lastAcquired: Long = GpuSemaphore.DEFAULT_PRIORITY
+  private var lastReleased: Long = GpuSemaphore.DEFAULT_PRIORITY

  type GpuBackingSemaphore = PrioritySemaphore[Long]

@@ -256,11 +258,12 @@ private final class SemaphoreTaskInfo(val stageId: Int, val taskAttemptId: Long)
      if (!done && shouldBlockOnSemaphore) {
        // We cannot be in a synchronized block and wait on the semaphore
        // so we have to release it and grab it again afterwards.
-        semaphore.acquire(numPermits, lastHeld, taskAttemptId)
+        semaphore.acquire(numPermits, lastReleased, taskAttemptId)
        synchronized {
          // We now own the semaphore so we need to wake up all of the other tasks that are
          // waiting.
          hasSemaphore = true
+          lastAcquired = System.nanoTime()
          if (trackSemaphore) {
            nvtxRange =
              Some(new NvtxUniqueRange(s"Stage ${stageId} Task ${taskAttemptId} owning GPU",
@@ -296,9 +299,10 @@ private final class SemaphoreTaskInfo(val stageId: Int, val taskAttemptId: Long)
    } else {
      if (blockedThreads.size() == 0) {
        // No other threads for this task are waiting, so we might be able to grab this directly
-        val ret = semaphore.tryAcquire(numPermits, lastHeld, taskAttemptId)
+        val ret = semaphore.tryAcquire(numPermits, lastReleased, taskAttemptId)
        if (ret) {
          hasSemaphore = true
+          lastAcquired = System.nanoTime()
          activeThreads.add(t)
          // no need to notify because there are no other threads and we are holding the lock
          // to ensure that.
@@ -316,7 +320,8 @@
    if (hasSemaphore) {
      semaphore.release(numPermits)
      hasSemaphore = false
-      lastHeld = System.currentTimeMillis()
+      lastReleased = System.nanoTime()
+      GpuTaskMetrics.get.addSemaphoreHoldingTime(lastReleased - lastAcquired)
      nvtxRange.foreach(_.close())
      nvtxRange = None
    }
@@ -333,7 +338,7 @@ private final class GpuSemaphore() extends Logging {
  import GpuSemaphore._

  type GpuBackingSemaphore = PrioritySemaphore[Long]
-  private val semaphore = new GpuBackingSemaphore(MAX_PERMITS)
+  private val semaphore = new GpuBackingSemaphore(MAX_PERMITS, GpuSemaphore.DEFAULT_PRIORITY)
  // A map of taskAttemptId => semaphoreTaskInfo.
  // This map keeps track of all tasks that are both active on the GPU and blocked waiting
  // on the GPU.
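Taken together, the `GpuSemaphore` changes split the old `lastHeld` timestamp in two: `lastAcquired` feeds the new `addSemaphoreHoldingTime` metric, while `lastReleased` becomes the priority a task presents when it re-queues for the GPU. Below is a minimal sketch of that lifecycle, assuming the backing `PrioritySemaphore` wakes the highest priority value first, so a task that just released the semaphore (and likely still has data on the GPU) is favored over a never-started task at `DEFAULT_PRIORITY = 0`:

```scala
// Hedged sketch of the bookkeeping added above. `TaskState` is a hypothetical
// stand-in for SemaphoreTaskInfo; the semaphore calls match the diff.
class TaskState(semaphore: PrioritySemaphore[Long], numPermits: Int,
    taskAttemptId: Long) {
  private var hasSemaphore = false
  private var lastAcquired: Long = 0L  // GpuSemaphore.DEFAULT_PRIORITY
  private var lastReleased: Long = 0L  // doubles as the re-queue priority

  def acquire(): Unit = {
    // A fresh task queues at priority 0; a task that released at time t
    // queues at priority t, so recently active tasks get the GPU back first.
    semaphore.acquire(numPermits, lastReleased, taskAttemptId)
    hasSemaphore = true
    lastAcquired = System.nanoTime()
  }

  def release(): Unit = {
    if (hasSemaphore) {
      semaphore.release(numPermits)
      hasSemaphore = false
      lastReleased = System.nanoTime()
      // lastReleased - lastAcquired is the hold time that the real code
      // reports via GpuTaskMetrics.get.addSemaphoreHoldingTime(...).
    }
  }
}
```

Using `System.nanoTime()` for both fields also makes the hold-time arithmetic monotonic, which the old `System.currentTimeMillis()` clock behind `lastHeld` did not guarantee.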
PrioritySemaphore.scala
@@ -19,7 +19,12 @@ package com.nvidia.spark.rapids
import java.util.PriorityQueue
import java.util.concurrent.locks.{Condition, ReentrantLock}

-class PrioritySemaphore[T](val maxPermits: Int)(implicit ordering: Ordering[T]) {
+import scala.collection.JavaConverters.asScalaIteratorConverter
+
+import org.apache.spark.sql.rapids.GpuTaskMetrics
+
+class PrioritySemaphore[T](val maxPermits: Int, val priorityForNonStarted: T)
+    (implicit ordering: Ordering[T]) {
  // This lock is used to generate condition variables, which affords us the flexibility to notify
  // specific threads at a time. If we use the regular synchronized pattern, we have to either
  // notify randomly, or if we try creating condition variables not tied to a shared lock, they
@@ -69,6 +74,11 @@ class PrioritySemaphore[T](val maxPermits: Int)(implicit ordering: Ordering[T])
    val info = ThreadInfo(priority, condition, numPermits, taskAttemptId)
    try {
      waitingQueue.add(info)
+      // Only count tasks that have held the semaphore before, since they are
+      // very likely to still have data on the GPU.
+      GpuTaskMetrics.get.recordOnGpuTasksWaitingNumber(
+        waitingQueue.iterator().asScala.count(_.priority != priorityForNonStarted))
+
      while (!info.signaled) {
        info.condition.await()
      }
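The constructor change makes callers state explicitly what priority a never-started task carries; `GpuSemaphore` passes `DEFAULT_PRIORITY = 0L`, which is also the value the new waiting-task gauge compares against. A small usage sketch under the assumption that higher priorities are granted first (permits and task IDs are made-up values):

```scala
// Minimal usage sketch of the new constructor and call shapes from the diff.
object PrioritySemaphoreDemo {
  def main(args: Array[String]): Unit = {
    val sem = new PrioritySemaphore[Long](maxPermits = 1000,
      priorityForNonStarted = 0L)

    // A task that has never held the GPU tries at the default priority ...
    val fresh = sem.tryAcquire(1000, 0L, 1L)
    // ... while a returning task presents its release timestamp. When both
    // block in acquire(), only the returning one counts toward the new
    // "on-GPU tasks waiting" metric (its priority != priorityForNonStarted).
    val returning = sem.tryAcquire(1000, System.nanoTime(), 2L)

    println(s"fresh=$fresh returning=$returning") // fresh=true returning=false
    if (fresh) sem.release(1000)
  }
}
```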
The remaining 3 changed files are not shown.