From 9568891e98d4fd2afcb931534f2ebd95719f0999 Mon Sep 17 00:00:00 2001 From: Tim Liu Date: Thu, 15 Feb 2024 13:03:36 +0800 Subject: [PATCH] Init changelog 24.02 [skip ci] (#10380) * Init changelog 24.02 Init changelog with GITHUB_TOKEN=<> scripts/generate-changelog --releases=23.12,24.02 move 23.10 content to archives folder Signed-off-by: Tim Liu * Update Change log Signed-off-by: Tim Liu * Update Change log Signed-off-by: Tim Liu --------- Signed-off-by: Tim Liu --- CHANGELOG.md | 528 ++++++++++-------- ...o_23.08.md => CHANGELOG_23.02_to_23.10.md} | 246 +++++++- 2 files changed, 528 insertions(+), 246 deletions(-) rename docs/archives/{CHANGELOG_23.02_to_23.08.md => CHANGELOG_23.02_to_23.10.md} (80%) diff --git a/CHANGELOG.md b/CHANGELOG.md index b3026739a4b..d0783ace096 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,278 @@ # Change log -Generated on 2024-01-31 +Generated on 2024-02-15 + +## Release 24.02 + +### Features +||| +|:---|:---| +|[#9926](https://github.com/NVIDIA/spark-rapids/issues/9926)|[FEA] Add config option for the parquet reader input read limit.| +|[#10270](https://github.com/NVIDIA/spark-rapids/issues/10270)|[FEA] Add support for single quotes when reading JSON| +|[#10253](https://github.com/NVIDIA/spark-rapids/issues/10253)|[FEA] Enable mixed types as string in GpuJsonToStruct| +|[#9692](https://github.com/NVIDIA/spark-rapids/issues/9692)|[FEA] Remove Pascal support| +|[#8806](https://github.com/NVIDIA/spark-rapids/issues/8806)|[FEA] Support lazy quantifier and specified group index in regexp_extract function| +|[#10079](https://github.com/NVIDIA/spark-rapids/issues/10079)|[FEA] Add string parameter support for `unix_timestamp` for non-UTC time zones| +|[#9667](https://github.com/NVIDIA/spark-rapids/issues/9667)|[FEA][JSON] Add support for non default `dateFormat` in `from_json`| +|[#9173](https://github.com/NVIDIA/spark-rapids/issues/9173)|[FEA] Support format_number | +|[#10145](https://github.com/NVIDIA/spark-rapids/issues/10145)|[FEA] Support to_utc_timestamp| +|[#9927](https://github.com/NVIDIA/spark-rapids/issues/9927)|[FEA] Support to_date with non-UTC timezones without DST| +|[#10006](https://github.com/NVIDIA/spark-rapids/issues/10006)|[FEA] Support ```ParseToTimestamp``` for non-UTC time zones| +|[#9096](https://github.com/NVIDIA/spark-rapids/issues/9096)|[FEA] Add Spark 3.3.4 support| +|[#9585](https://github.com/NVIDIA/spark-rapids/issues/9585)|[FEA] support ascii function| +|[#9260](https://github.com/NVIDIA/spark-rapids/issues/9260)|[FEA] Create Spark 3.4.2 shim and build env| +|[#10076](https://github.com/NVIDIA/spark-rapids/issues/10076)|[FEA] Add performance test framework for non-UTC time zone features.| +|[#9881](https://github.com/NVIDIA/spark-rapids/issues/9881)|[TASK] Remove `spark.rapids.sql.nonUTC.enabled` configuration option| +|[#9801](https://github.com/NVIDIA/spark-rapids/issues/9801)|[FEA] Support DateFormat on GPU with a non-UTC timezone| +|[#6834](https://github.com/NVIDIA/spark-rapids/issues/6834)|[FEA] Support GpuHour expression for timezones other than UTC| +|[#6842](https://github.com/NVIDIA/spark-rapids/issues/6842)|[FEA] Support TimeZone aware operations for value extraction| +|[#1860](https://github.com/NVIDIA/spark-rapids/issues/1860)|[FEA] Optimize row based window operations for BOUNDED ranges| +|[#9606](https://github.com/NVIDIA/spark-rapids/issues/9606)|[FEA] Support unix_timestamp with CST(China Time Zone) support| +|[#9815](https://github.com/NVIDIA/spark-rapids/issues/9815)|[FEA] Support ```unix_timestamp``` for non-DST timezones| +|[#8807](https://github.com/NVIDIA/spark-rapids/issues/8807)|[FEA] support ‘yyyyMMdd’ format in from_unixtime function| +|[#9605](https://github.com/NVIDIA/spark-rapids/issues/9605)|[FEA] Support from_unixtime with CST(China Time Zone) support| +|[#6836](https://github.com/NVIDIA/spark-rapids/issues/6836)|[FEA] Support FromUnixTime for non UTC timezones| +|[#9175](https://github.com/NVIDIA/spark-rapids/issues/9175)|[FEA] Support Databricks 13.3| +|[#6881](https://github.com/NVIDIA/spark-rapids/issues/6881)|[FEA] Support RAPIDS Spark plugin on ARM| +|[#9274](https://github.com/NVIDIA/spark-rapids/issues/9274)|[FEA] Regular deploy process to include arm artifacts| +|[#9844](https://github.com/NVIDIA/spark-rapids/issues/9844)|[FEA] Let Gpu arrow python runners support writing one batch one time for the single threaded model.| +|[#7309](https://github.com/NVIDIA/spark-rapids/issues/7309)|[FEA] Detect multiple versions of the RAPIDS jar on the classpath at the same time| + +### Performance +||| +|:---|:---| +|[#9442](https://github.com/NVIDIA/spark-rapids/issues/9442)|[FEA] For hash joins where the build side can change use the smaller table for the build side| +|[#10142](https://github.com/NVIDIA/spark-rapids/issues/10142)|[TASK] Benchmark existing timestamp functions that work in non-UTC time zone (non-DST)| + +### Bugs Fixed +||| +|:---|:---| +|[#9974](https://github.com/NVIDIA/spark-rapids/issues/9974)|[BUG] host memory Leak in MultiFileCoalescingPartitionReaderBase in UTC time zone| +|[#10359](https://github.com/NVIDIA/spark-rapids/issues/10359)|[BUG] Build failure on Databricks nightly run with `GpuMapInPandasExecMeta`| +|[#10327](https://github.com/NVIDIA/spark-rapids/issues/10327)|[BUG] Unit test FAILED against : SPARK-24957: average with decimal followed by aggregation returning wrong result | +|[#10324](https://github.com/NVIDIA/spark-rapids/issues/10324)|[BUG] hash_aggregate_test.py test FAILED: Type conversion is not allowed from Table {...}| +|[#10291](https://github.com/NVIDIA/spark-rapids/issues/10291)|[BUG] SIGSEGV in libucp.so| +|[#9212](https://github.com/NVIDIA/spark-rapids/issues/9212)|[BUG] `from_json` fails with cuDF error `Invalid list size computation error`| +|[#10264](https://github.com/NVIDIA/spark-rapids/issues/10264)|[BUG] hash aggregate test failures due to type conversion errors| +|[#10262](https://github.com/NVIDIA/spark-rapids/issues/10262)|[BUG] Test "SPARK-24957: average with decimal followed by aggregation returning wrong result" failed.| +|[#9353](https://github.com/NVIDIA/spark-rapids/issues/9353)|[BUG] [JSON] A mix of lists and structs within the same column is not supported| +|[#10099](https://github.com/NVIDIA/spark-rapids/issues/10099)|[BUG] orc_test.py::test_orc_scan_with_aggregate_pushdown fails with a standalone cluster on spark 3.3.0| +|[#10047](https://github.com/NVIDIA/spark-rapids/issues/10047)|[BUG] CudfException during conditional hash join while running nds query64| +|[#9779](https://github.com/NVIDIA/spark-rapids/issues/9779)|[BUG] 330cdh failed test_hash_reduction_sum_full_decimal on CI| +|[#10197](https://github.com/NVIDIA/spark-rapids/issues/10197)|[BUG] Disable GetJsonObject by default and update docs| +|[#10165](https://github.com/NVIDIA/spark-rapids/issues/10165)|[BUG] Databricks 13.3 executor side broadcast failure| +|[#10224](https://github.com/NVIDIA/spark-rapids/issues/10224)|[BUG] DBR builds fails when installing Maven| +|[#10222](https://github.com/NVIDIA/spark-rapids/issues/10222)|[BUG] to_utc_timestamp and from_utc_timestamp fallback when TZ is supported time zone| +|[#10195](https://github.com/NVIDIA/spark-rapids/issues/10195)|[BUG] test_window_aggs_for_negative_rows_partitioned failure in CI| +|[#10182](https://github.com/NVIDIA/spark-rapids/issues/10182)|[BUG] test_dpp_bypass / test_dpp_via_aggregate_subquery failures in CI (databricks)| +|[#10169](https://github.com/NVIDIA/spark-rapids/issues/10169)|[BUG] Host column vector leaks when running `test_cast_timestamp_to_date`| +|[#10050](https://github.com/NVIDIA/spark-rapids/issues/10050)|[BUG] test_cast_decimal_to_decimal[to:DecimalType(1,-1)-from:Decimal(5,-3)] fails with DATAGEN_SEED=1702439569| +|[#10088](https://github.com/NVIDIA/spark-rapids/issues/10088)|[BUG] GpuExplode single row split to fit cuDF limits| +|[#10174](https://github.com/NVIDIA/spark-rapids/issues/10174)|[BUG] json_test.py::test_from_json_struct_timestamp failed on: Part of the plan is not columnar | +|[#10186](https://github.com/NVIDIA/spark-rapids/issues/10186)|[BUG] test_to_date_with_window_functions failed in non-UTC nightly CI| +|[#10154](https://github.com/NVIDIA/spark-rapids/issues/10154)|[BUG] 'spark-test.sh' integration tests FAILED on 'ps: command not found" in Rocky Docker environment| +|[#10175](https://github.com/NVIDIA/spark-rapids/issues/10175)|[BUG] string_test.py::test_format_number_float_special FAILED : AssertionError 'NaN' == | +|[#10166](https://github.com/NVIDIA/spark-rapids/issues/10166)|Detect Undeclared Shim in POM.xml| +|[#10170](https://github.com/NVIDIA/spark-rapids/issues/10170)|[BUG] `test_cast_timestamp_to_date` fails with `TZ=Asia/Hebron`| +|[#10149](https://github.com/NVIDIA/spark-rapids/issues/10149)|[BUG] GPU illegal access detected during delta_byte_array.parquet read| +|[#9905](https://github.com/NVIDIA/spark-rapids/issues/9905)|[BUG] GpuJsonScan incorrect behavior when parsing dates| +|[#10163](https://github.com/NVIDIA/spark-rapids/issues/10163)|Spark 3.3.4 Shim Build Failure| +|[#10105](https://github.com/NVIDIA/spark-rapids/issues/10105)|[BUG] scala:compile is not thread safe unless compiler bridge already exists | +|[#10026](https://github.com/NVIDIA/spark-rapids/issues/10026)|[BUG] test_hash_agg_with_nan_keys failed with a DATAGEN_SEED=1702335559| +|[#10075](https://github.com/NVIDIA/spark-rapids/issues/10075)|[BUG] `non-pinned blocking alloc with spill` unit test failed in HostAllocSuite| +|[#10134](https://github.com/NVIDIA/spark-rapids/issues/10134)|[BUG] test_window_aggs_for_batched_finite_row_windows_partitioned failed on Scala 2.13 with DATAGEN_SEED=1704033145| +|[#10118](https://github.com/NVIDIA/spark-rapids/issues/10118)|[BUG] non-UTC Nightly CI failed| +|[#10136](https://github.com/NVIDIA/spark-rapids/issues/10136)|[BUG] The canonicalized version of `GpuFileSourceScanExec`s that suppose to be semantic-equal can be different | +|[#10110](https://github.com/NVIDIA/spark-rapids/issues/10110)|[BUG] disable collect_list and collect_set for window operations by default.| +|[#10129](https://github.com/NVIDIA/spark-rapids/issues/10129)|[BUG] Unit test suite fails with `Null data pointer` in GpuTimeZoneDB| +|[#10089](https://github.com/NVIDIA/spark-rapids/issues/10089)|[BUG] DATAGEN_SEED= environment does not override the marker datagen_overrides| +|[#10108](https://github.com/NVIDIA/spark-rapids/issues/10108)|[BUG] @datagen_overrides seed is sticky when it shouldn't be| +|[#10064](https://github.com/NVIDIA/spark-rapids/issues/10064)|[BUG] test_unsupported_fallback_regexp_replace failed with DATAGEN_SEED=1702662063| +|[#10117](https://github.com/NVIDIA/spark-rapids/issues/10117)|[BUG] test_from_utc_timestamp failed on Cloudera Env when TZ is Iran| +|[#9914](https://github.com/NVIDIA/spark-rapids/issues/9914)|[BUG] Report GPU OOM on recent passed CI premerges.| +|[#10094](https://github.com/NVIDIA/spark-rapids/issues/10094)|[BUG] spark351 PR check failure MockTaskContext method isFailed in class TaskContext of type ()Boolean is not defined| +|[#10017](https://github.com/NVIDIA/spark-rapids/issues/10017)|[BUG] test_casting_from_double_to_timestamp failed for DATAGEN_SEED=1702329497| +|[#9992](https://github.com/NVIDIA/spark-rapids/issues/9992)|[BUG] conditionals_test.py::test_conditional_with_side_effects_cast[String] failed with DATAGEN_SEED=1701976979| +|[#9743](https://github.com/NVIDIA/spark-rapids/issues/9743)|[BUG][AUDIT] SPARK-45652 - SPJ: Handle empty input partitions after dynamic filtering| +|[#9859](https://github.com/NVIDIA/spark-rapids/issues/9859)|[AUDIT] [SPARK-45786] Inaccurate Decimal multiplication and division results| +|[#9555](https://github.com/NVIDIA/spark-rapids/issues/9555)|[BUG] Scala 2.13 build with JDK 11 or 17 fails OpcodeSuite tests| +|[#10073](https://github.com/NVIDIA/spark-rapids/issues/10073)|[BUG] test_csv_prefer_date_with_infer_schema failed with DATAGEN_SEED=1702847907| +|[#10004](https://github.com/NVIDIA/spark-rapids/issues/10004)|[BUG] If a host memory buffer is spilled, it cannot be unspilled| +|[#10063](https://github.com/NVIDIA/spark-rapids/issues/10063)|[BUG] CI build failure with 341db: method getKillReason has weaker access privileges; it should be public| +|[#10055](https://github.com/NVIDIA/spark-rapids/issues/10055)|[BUG] array_test.py::test_array_transform_non_deterministic failed with non-UTC time zone| +|[#10056](https://github.com/NVIDIA/spark-rapids/issues/10056)|[BUG] Unit tests ToPrettyStringSuite FAILED on spark-3.5.0| +|[#10048](https://github.com/NVIDIA/spark-rapids/issues/10048)|[BUG] Fix ```out of range``` error from ```pySpark``` in ```test_timestamp_millis``` and other two integration test cases| +|[#4204](https://github.com/NVIDIA/spark-rapids/issues/4204)|casting double to string does not match Spark| +|[#9938](https://github.com/NVIDIA/spark-rapids/issues/9938)|Better to do some refactor for the Python UDF code| +|[#10018](https://github.com/NVIDIA/spark-rapids/issues/10018)|[BUG] `GpuToUnixTimestampImproved` off by 1 on GPU when handling timestamp before epoch| +|[#10012](https://github.com/NVIDIA/spark-rapids/issues/10012)|[BUG] test_str_to_map_expr_random_delimiters with DATAGEN_SEED=1702166057 hangs| +|[#10029](https://github.com/NVIDIA/spark-rapids/issues/10029)|[BUG] doc links fail with 404 for shims.md| +|[#9472](https://github.com/NVIDIA/spark-rapids/issues/9472)|[BUG] Non-Deterministic expressions in an array_transform can cause errors| +|[#9884](https://github.com/NVIDIA/spark-rapids/issues/9884)|[BUG] delta_lake_delete_test.py failed assertion [DATAGEN_SEED=1701225104, IGNORE_ORDER...| +|[#9977](https://github.com/NVIDIA/spark-rapids/issues/9977)|[BUG] test_cast_date_integral fails on databricks 3.4.1| +|[#9936](https://github.com/NVIDIA/spark-rapids/issues/9936)|[BUG] Nightly CI of non-UTC time zone reports 'year 0 is out of range' error| +|[#9941](https://github.com/NVIDIA/spark-rapids/issues/9941)|[BUG] A potential data corruption in Pandas UDFs| +|[#9897](https://github.com/NVIDIA/spark-rapids/issues/9897)|[BUG] Error message for multiple jars on classpath is wrong| +|[#9916](https://github.com/NVIDIA/spark-rapids/issues/9916)|[BUG] ```test_cast_string_ts_valid_format``` failed at ```seed = 1701362564```| +|[#9559](https://github.com/NVIDIA/spark-rapids/issues/9559)|[BUG] precommit regularly fails with error trying to download a dependency| +|[#9708](https://github.com/NVIDIA/spark-rapids/issues/9708)|[BUG] test_cast_string_ts_valid_format fails with DATAGEN_SEED=1699978422| + +### PRs +||| +|:---|:---| +|[#10367](https://github.com/NVIDIA/spark-rapids/pull/10367)|Update rapids JNI and private version to release 24.02.0| +|[#10414](https://github.com/NVIDIA/spark-rapids/pull/10414)|[DOC] Fix 24.02.0 documentation errors [skip ci]| +|[#10403](https://github.com/NVIDIA/spark-rapids/pull/10403)|Cherry-pick: Fix a memory leak in json tuple (#10360)| +|[#10387](https://github.com/NVIDIA/spark-rapids/pull/10387)|[DOC] Update docs for 24.02.0 release [skip ci]| +|[#10399](https://github.com/NVIDIA/spark-rapids/pull/10399)|Update NOTICE-binary| +|[#10389](https://github.com/NVIDIA/spark-rapids/pull/10389)|Change version and branch to 24.02 in docs [skip ci]| +|[#10309](https://github.com/NVIDIA/spark-rapids/pull/10309)|[DOC] add custom 404 page and fix some document issue [skip ci]| +|[#10352](https://github.com/NVIDIA/spark-rapids/pull/10352)|xfail mixed type test| +|[#10355](https://github.com/NVIDIA/spark-rapids/pull/10355)|Revert "Support barrier mode for mapInPandas/mapInArrow (#10343)"| +|[#10353](https://github.com/NVIDIA/spark-rapids/pull/10353)|Use fixed seed for test_from_json_struct_decimal| +|[#10343](https://github.com/NVIDIA/spark-rapids/pull/10343)|Support barrier mode for mapInPandas/mapInArrow| +|[#10345](https://github.com/NVIDIA/spark-rapids/pull/10345)|Fix auto merge conflict 10339 [skip ci]| +|[#9991](https://github.com/NVIDIA/spark-rapids/pull/9991)|Start to use explicit memory limits in the parquet chunked reader| +|[#10328](https://github.com/NVIDIA/spark-rapids/pull/10328)|Fix typo in spark-tests.sh [skip ci]| +|[#10279](https://github.com/NVIDIA/spark-rapids/pull/10279)|Run '--packages' only with default cuda11 jar| +|[#10273](https://github.com/NVIDIA/spark-rapids/pull/10273)|Support reading JSON data with single quotes around attribute names and values| +|[#10306](https://github.com/NVIDIA/spark-rapids/pull/10306)|Fix performance regression in from_json| +|[#10272](https://github.com/NVIDIA/spark-rapids/pull/10272)|Add FullOuter support to GpuShuffledSymmetricHashJoinExec| +|[#10260](https://github.com/NVIDIA/spark-rapids/pull/10260)|Add perf test for time zone operators| +|[#10275](https://github.com/NVIDIA/spark-rapids/pull/10275)|Add tests for window Python udf with array input| +|[#10278](https://github.com/NVIDIA/spark-rapids/pull/10278)|Clean up $M2_CACHE to avoid side-effect of previous dependency:get [skip ci]| +|[#10268](https://github.com/NVIDIA/spark-rapids/pull/10268)|Add config to enable mixed types as string in GpuJsonToStruct & GpuJsonScan| +|[#10297](https://github.com/NVIDIA/spark-rapids/pull/10297)|Revert "UCX 1.16.0 upgrade (#10190)"| +|[#10289](https://github.com/NVIDIA/spark-rapids/pull/10289)|Add gerashegalov to CODEOWNERS [skip ci]| +|[#10290](https://github.com/NVIDIA/spark-rapids/pull/10290)|Fix merge conflict with 23.12 [skip ci]| +|[#10190](https://github.com/NVIDIA/spark-rapids/pull/10190)|UCX 1.16.0 upgrade| +|[#10211](https://github.com/NVIDIA/spark-rapids/pull/10211)|Use parse_url kernel for QUERY literal and column key| +|[#10267](https://github.com/NVIDIA/spark-rapids/pull/10267)|Update to libcudf unsigned sum aggregation types change| +|[#10208](https://github.com/NVIDIA/spark-rapids/pull/10208)|Added Support for Lazy Quantifier| +|[#9993](https://github.com/NVIDIA/spark-rapids/pull/9993)|Enable mixed types as string in GpuJsonScan| +|[#10246](https://github.com/NVIDIA/spark-rapids/pull/10246)|Refactor full join iterator to allow access to build tracker| +|[#10257](https://github.com/NVIDIA/spark-rapids/pull/10257)|Enable auto-merge from branch-24.02 to branch-24.04 [skip CI]| +|[#10178](https://github.com/NVIDIA/spark-rapids/pull/10178)|Mark hash reduction decimal overflow test as a permanent seed override| +|[#10244](https://github.com/NVIDIA/spark-rapids/pull/10244)|Use POSIX mode in assembly plugin to avoid issues with large UID/GID| +|[#10238](https://github.com/NVIDIA/spark-rapids/pull/10238)|Smoke test with '--package' to fetch the plugin jar| +|[#10201](https://github.com/NVIDIA/spark-rapids/pull/10201)|Deploy release candidates to local maven repo for dependency check[skip ci]| +|[#10240](https://github.com/NVIDIA/spark-rapids/pull/10240)|Improved inner joins with large build side| +|[#10220](https://github.com/NVIDIA/spark-rapids/pull/10220)|Disable GetJsonObject by default and add tests for as many issues with it as possible| +|[#10230](https://github.com/NVIDIA/spark-rapids/pull/10230)|Fix Databricks 13.3 BroadcastHashJoin using executor side broadcast fed by ColumnarToRow [Databricks]| +|[#10232](https://github.com/NVIDIA/spark-rapids/pull/10232)|Fixed 330db Shims to Adopt the PythonRunner Changes| +|[#10225](https://github.com/NVIDIA/spark-rapids/pull/10225)|Download Maven from apache.org archives [skip ci]| +|[#10210](https://github.com/NVIDIA/spark-rapids/pull/10210)|Add string parameter support for unix_timestamp for non-UTC time zones| +|[#10223](https://github.com/NVIDIA/spark-rapids/pull/10223)|Fix to_utc_timestamp and from_utc_timestamp fallback when TZ is supported time zone| +|[#10205](https://github.com/NVIDIA/spark-rapids/pull/10205)|Deterministic ordering in window tests| +|[#10204](https://github.com/NVIDIA/spark-rapids/pull/10204)|Further prevent degenerative joins in dpp_test| +|[#10156](https://github.com/NVIDIA/spark-rapids/pull/10156)|Update string to float compatibility doc[skip ci]| +|[#10193](https://github.com/NVIDIA/spark-rapids/pull/10193)|Fix explode with carry-along columns on GpuExplode single row retry handling| +|[#10191](https://github.com/NVIDIA/spark-rapids/pull/10191)|Updating the config documentation for filecache configs [skip ci]| +|[#10131](https://github.com/NVIDIA/spark-rapids/pull/10131)|With a single row GpuExplode tries to split the generator array| +|[#10179](https://github.com/NVIDIA/spark-rapids/pull/10179)|Fix build regression against Spark 3.2.x| +|[#10189](https://github.com/NVIDIA/spark-rapids/pull/10189)|test needs marks for non-UTC and for non_supported timezones| +|[#10176](https://github.com/NVIDIA/spark-rapids/pull/10176)|Fix format_number NaN symbol in high jdk version| +|[#10074](https://github.com/NVIDIA/spark-rapids/pull/10074)|Update the legacy mode check: only take effect when reading date/timestamp column| +|[#10167](https://github.com/NVIDIA/spark-rapids/pull/10167)|Defined Shims Should Be Declared In POM | +|[#10168](https://github.com/NVIDIA/spark-rapids/pull/10168)|Prevent a degenerative join in test_dpp_reuse_broadcast_exchange| +|[#10171](https://github.com/NVIDIA/spark-rapids/pull/10171)|Fix `test_cast_timestamp_to_date` when running in a DST time zone| +|[#9975](https://github.com/NVIDIA/spark-rapids/pull/9975)|Improve dateFormat support in GpuJsonScan and make tests consistent with GpuStructsToJson| +|[#9790](https://github.com/NVIDIA/spark-rapids/pull/9790)|Support float case of format_number with format_float kernel| +|[#10144](https://github.com/NVIDIA/spark-rapids/pull/10144)|Support to_utc_timestamp| +|[#10162](https://github.com/NVIDIA/spark-rapids/pull/10162)|Fix Spark 334 Build| +|[#10146](https://github.com/NVIDIA/spark-rapids/pull/10146)|Refactor the window code so it is not mostly kept in a few very large files| +|[#10155](https://github.com/NVIDIA/spark-rapids/pull/10155)|Install procps tools for rocky docker images [skip ci]| +|[#10153](https://github.com/NVIDIA/spark-rapids/pull/10153)|Disable multi-threaded Maven | +|[#10100](https://github.com/NVIDIA/spark-rapids/pull/10100)|Enable to_date (via gettimestamp and casting timestamp to date) for non-UTC time zones| +|[#10140](https://github.com/NVIDIA/spark-rapids/pull/10140)|Removed Unnecessary Whitespaces From Spark 3.3.4 Shim [skip ci]| +|[#10148](https://github.com/NVIDIA/spark-rapids/pull/10148)|fix test_hash_agg_with_nan_keys floating point sum failure| +|[#10150](https://github.com/NVIDIA/spark-rapids/pull/10150)|Increase timeouts in HostAllocSuite to avoid timeout failures on slow machines| +|[#10143](https://github.com/NVIDIA/spark-rapids/pull/10143)|Fix `test_window_aggs_for_batched_finite_row_windows_partitioned` fail| +|[#9887](https://github.com/NVIDIA/spark-rapids/pull/9887)|Reduce time-consuming of pre-merge| +|[#10130](https://github.com/NVIDIA/spark-rapids/pull/10130)|Change unit tests that force ooms to specify the oom type (gpu|cpu)| +|[#10138](https://github.com/NVIDIA/spark-rapids/pull/10138)|Update copyright dates in NOTICE files [skip ci]| +|[#10139](https://github.com/NVIDIA/spark-rapids/pull/10139)|Add Delta Lake 2.3.0 to list of versions to test for Spark 3.3.x| +|[#10135](https://github.com/NVIDIA/spark-rapids/pull/10135)|Fix CI: can't find script when there is pushd in script [skip ci]| +|[#10137](https://github.com/NVIDIA/spark-rapids/pull/10137)|Fix the canonicalizing for GPU file scan| +|[#10132](https://github.com/NVIDIA/spark-rapids/pull/10132)|Disable collect_list and collect_set for window by default| +|[#10084](https://github.com/NVIDIA/spark-rapids/pull/10084)|Refactor GpuJsonToStruct to reduce code duplication and manage resources more efficiently| +|[#10087](https://github.com/NVIDIA/spark-rapids/pull/10087)|Additional unit tests for GeneratedInternalRowToCudfRowIterator| +|[#10082](https://github.com/NVIDIA/spark-rapids/pull/10082)|Add Spark 3.3.4 Shim| +|[#10054](https://github.com/NVIDIA/spark-rapids/pull/10054)|Support Ascii function for ascii and latin-1| +|[#10127](https://github.com/NVIDIA/spark-rapids/pull/10127)|Fix merge conflict with branch-23.12| +|[#10097](https://github.com/NVIDIA/spark-rapids/pull/10097)|[DOC] Update docs for 23.12.1 release [skip ci]| +|[#10109](https://github.com/NVIDIA/spark-rapids/pull/10109)|Fixes a bug where datagen seed overrides were sticky and adds datagen_seed_override_disabled| +|[#10093](https://github.com/NVIDIA/spark-rapids/pull/10093)|Fix test_unsupported_fallback_regexp_replace| +|[#10119](https://github.com/NVIDIA/spark-rapids/pull/10119)|Fix from_utc_timestamp case failure on Cloudera when TZ is Iran| +|[#10106](https://github.com/NVIDIA/spark-rapids/pull/10106)|Add `isFailed()` to MockTaskContext and Remove MockTaskContextBase.scala| +|[#10112](https://github.com/NVIDIA/spark-rapids/pull/10112)|Remove datagen seed override for test_conditional_with_side_effects_cast| +|[#10104](https://github.com/NVIDIA/spark-rapids/pull/10104)|[DOC] Add in docs about memory debugging [skip ci]| +|[#9925](https://github.com/NVIDIA/spark-rapids/pull/9925)|Use threads, cache Scala compiler in GH mvn workflow| +|[#9967](https://github.com/NVIDIA/spark-rapids/pull/9967)|Added Spark-3.4.2 Shims| +|[#10061](https://github.com/NVIDIA/spark-rapids/pull/10061)|Use parse_url kernel for QUERY parsing| +|[#10101](https://github.com/NVIDIA/spark-rapids/pull/10101)|[DOC] Add column order error docs [skip ci]| +|[#10078](https://github.com/NVIDIA/spark-rapids/pull/10078)|Add perf test for non-UTC operators| +|[#10096](https://github.com/NVIDIA/spark-rapids/pull/10096)|Shim MockTaskContext to fix Spark 3.5.1 build| +|[#10092](https://github.com/NVIDIA/spark-rapids/pull/10092)|Implement Math.round using floor on GPU| +|[#10085](https://github.com/NVIDIA/spark-rapids/pull/10085)|Update tests that originally restricted the Spark timestamp range| +|[#10090](https://github.com/NVIDIA/spark-rapids/pull/10090)|Replace GPU-unsupported `\z` with an alternative RLIKE expression| +|[#10095](https://github.com/NVIDIA/spark-rapids/pull/10095)|Temporarily fix date format failed cases for non-UTC time zone.| +|[#9999](https://github.com/NVIDIA/spark-rapids/pull/9999)|Add some odd time zones for timezone transition tests| +|[#9962](https://github.com/NVIDIA/spark-rapids/pull/9962)|Add 3.5.1-SNAPSHOT Shim| +|[#10071](https://github.com/NVIDIA/spark-rapids/pull/10071)|Cleanup usage of non-utc configuration here| +|[#10057](https://github.com/NVIDIA/spark-rapids/pull/10057)|Add support for StringConcatFactory.makeConcatWithConstants (#9555)| +|[#9996](https://github.com/NVIDIA/spark-rapids/pull/9996)|Test full timestamp output range in PySpark| +|[#10081](https://github.com/NVIDIA/spark-rapids/pull/10081)|Add a fallback Cloudera Maven repo URL [skip ci]| +|[#10065](https://github.com/NVIDIA/spark-rapids/pull/10065)|Improve host memory spill interfaces| +|[#10070](https://github.com/NVIDIA/spark-rapids/pull/10070)|Fix 332db build failure| +|[#10060](https://github.com/NVIDIA/spark-rapids/pull/10060)|Fix failed cases for non-utc time zone| +|[#10038](https://github.com/NVIDIA/spark-rapids/pull/10038)|Remove spark.rapids.sql.nonUTC.enabled configuration option| +|[#10059](https://github.com/NVIDIA/spark-rapids/pull/10059)|Fixed Failing ToPrettyStringSuite Test for 3.5.0| +|[#10013](https://github.com/NVIDIA/spark-rapids/pull/10013)|Extended configuration of OOM injection mode| +|[#10052](https://github.com/NVIDIA/spark-rapids/pull/10052)|Set seed=0 for some integration test cases| +|[#10053](https://github.com/NVIDIA/spark-rapids/pull/10053)|Remove invalid user from CODEOWNER file [skip ci]| +|[#10049](https://github.com/NVIDIA/spark-rapids/pull/10049)|Fix out of range error from pySpark in test_timestamp_millis and other two integration test cases| +|[#9721](https://github.com/NVIDIA/spark-rapids/pull/9721)|Support date_format via Gpu for non-UTC time zone| +|[#9845](https://github.com/NVIDIA/spark-rapids/pull/9845)|Use parse_url kernel for HOST parsing| +|[#10024](https://github.com/NVIDIA/spark-rapids/pull/10024)|Support hour minute second for non-UTC time zone| +|[#9973](https://github.com/NVIDIA/spark-rapids/pull/9973)|Batching support for row-based bounded window functions | +|[#10042](https://github.com/NVIDIA/spark-rapids/pull/10042)|Update tests to not have hard coded fallback when not needed| +|[#9816](https://github.com/NVIDIA/spark-rapids/pull/9816)|Support unix_timestamp and to_unix_timestamp with non-UTC timezones (non-DST)| +|[#9902](https://github.com/NVIDIA/spark-rapids/pull/9902)|Some refactor for the Python UDF code| +|[#10023](https://github.com/NVIDIA/spark-rapids/pull/10023)|GPU supports `yyyyMMdd` format by post process for the `from_unixtime` function| +|[#10033](https://github.com/NVIDIA/spark-rapids/pull/10033)|Remove GpuToTimestampImproved and spark.rapids.sql.improvedTimeOps.enabled| +|[#10016](https://github.com/NVIDIA/spark-rapids/pull/10016)|Fix infinite loop in test_str_to_map_expr_random_delimiters| +|[#10030](https://github.com/NVIDIA/spark-rapids/pull/10030)|Update links in shims.md| +|[#10015](https://github.com/NVIDIA/spark-rapids/pull/10015)|Fix array_transform to not recompute the argument| +|[#10011](https://github.com/NVIDIA/spark-rapids/pull/10011)|Add cpu oom retry split handling to InternalRowToColumnarBatchIterator| +|[#10019](https://github.com/NVIDIA/spark-rapids/pull/10019)|Fix auto merge conflict 10010 [skip ci]| +|[#9760](https://github.com/NVIDIA/spark-rapids/pull/9760)|Support split broadcast join condition into ast and non-ast| +|[#9827](https://github.com/NVIDIA/spark-rapids/pull/9827)|Enable ORC timestamp and decimal predicate push down tests| +|[#10002](https://github.com/NVIDIA/spark-rapids/pull/10002)|Use Spark 3.3.3 instead of 3.3.2 for Scala 2.13 premerge builds| +|[#10000](https://github.com/NVIDIA/spark-rapids/pull/10000)|Optimize from_unixtime| +|[#10003](https://github.com/NVIDIA/spark-rapids/pull/10003)|Fix merge conflict with branch-23.12| +|[#9984](https://github.com/NVIDIA/spark-rapids/pull/9984)|Fix 340+(including DB341+) does not support casting date to integral/float| +|[#9972](https://github.com/NVIDIA/spark-rapids/pull/9972)|Fix year 0 is out of range in test_from_json_struct_timestamp | +|[#9814](https://github.com/NVIDIA/spark-rapids/pull/9814)|Support from_unixtime via Gpu for non-UTC time zone| +|[#9929](https://github.com/NVIDIA/spark-rapids/pull/9929)|Add host memory retries for GeneratedInternalRowToCudfRowIterator| +|[#9957](https://github.com/NVIDIA/spark-rapids/pull/9957)|Update cases for cast between integral and (date/time)| +|[#9959](https://github.com/NVIDIA/spark-rapids/pull/9959)|Append new authorized user to blossom-ci whitelist [skip ci]| +|[#9942](https://github.com/NVIDIA/spark-rapids/pull/9942)|Fix a potential data corruption for Pandas UDF| +|[#9922](https://github.com/NVIDIA/spark-rapids/pull/9922)|Fix `allowMultipleJars` recommend setting message| +|[#9947](https://github.com/NVIDIA/spark-rapids/pull/9947)|Fix merge conflict with branch-23.12| +|[#9908](https://github.com/NVIDIA/spark-rapids/pull/9908)|Register default allocator for host memory| +|[#9944](https://github.com/NVIDIA/spark-rapids/pull/9944)|Fix Java OOM caused by incorrect state of shouldCapture when exception occurred| +|[#9937](https://github.com/NVIDIA/spark-rapids/pull/9937)|Refactor to use CLASSIFIER instead of CUDA_CLASSIFIER [skip ci]| +|[#9904](https://github.com/NVIDIA/spark-rapids/pull/9904)|Params for build and test CI scripts on Databricks| +|[#9719](https://github.com/NVIDIA/spark-rapids/pull/9719)|Support fine grained timezone checker instead of type based| +|[#9918](https://github.com/NVIDIA/spark-rapids/pull/9918)|Prevent generation of 'year 0 is out of range' strings in IT| +|[#9852](https://github.com/NVIDIA/spark-rapids/pull/9852)|Avoid generating duplicate nan keys with MapGen(FloatGen)| +|[#9674](https://github.com/NVIDIA/spark-rapids/pull/9674)|Add cache action to speed up mvn workflow [skip ci]| +|[#9900](https://github.com/NVIDIA/spark-rapids/pull/9900)|Revert "Remove Databricks 13.3 from release 23.12 (#9890)"| +|[#9888](https://github.com/NVIDIA/spark-rapids/pull/9888)|Update nightly build and deploy script for arm artifacts [skip ci]| +|[#9656](https://github.com/NVIDIA/spark-rapids/pull/9656)|Update for new retry state machine JNI APIs| +|[#9654](https://github.com/NVIDIA/spark-rapids/pull/9654)|Detect multiple jars on the classpath when init plugin| +|[#9857](https://github.com/NVIDIA/spark-rapids/pull/9857)|Skip redundant steps in nightly build [skip ci]| +|[#9812](https://github.com/NVIDIA/spark-rapids/pull/9812)|Update JNI and private dep version to 24.02.0-SNAPSHOT| ## Release 23.12 @@ -118,14 +391,20 @@ Generated on 2024-01-31 ### PRs ||| |:---|:---| +|[#10384](https://github.com/NVIDIA/spark-rapids/pull/10384)|[DOC] Update docs for 23.12.2 release [skip ci] | +|[#10341](https://github.com/NVIDIA/spark-rapids/pull/10341)|Update changelog for v23.12.2 [skip ci]| |[#10340](https://github.com/NVIDIA/spark-rapids/pull/10340)|Copyright to 2024 [skip ci]| |[#10323](https://github.com/NVIDIA/spark-rapids/pull/10323)|Upgrade version to 23.12.2-SNAPSHOT| +|[#10329](https://github.com/NVIDIA/spark-rapids/pull/10329)|update download page for v23.12.2 release [skip ci]| |[#10274](https://github.com/NVIDIA/spark-rapids/pull/10274)|PythonRunner Changes| |[#10124](https://github.com/NVIDIA/spark-rapids/pull/10124)|Update changelog for v23.12.1 [skip ci]| |[#10123](https://github.com/NVIDIA/spark-rapids/pull/10123)|Change version to v23.12.1 [skip ci]| |[#10122](https://github.com/NVIDIA/spark-rapids/pull/10122)|Init changelog for v23.12.1 [skip ci]| |[#10121](https://github.com/NVIDIA/spark-rapids/pull/10121)|[DOC] update download page for db hot fix [skip ci]| |[#10116](https://github.com/NVIDIA/spark-rapids/pull/10116)|Upgrade to 23.12.1-SNAPSHOT| +|[#10069](https://github.com/NVIDIA/spark-rapids/pull/10069)|Revert "Support split broadcast join condition into ast and non-ast […| +|[#9470](https://github.com/NVIDIA/spark-rapids/pull/9470)|Use float to string kernel| +|[#9481](https://github.com/NVIDIA/spark-rapids/pull/9481)|Use parse_url kernel for PROTOCOL parsing| |[#9935](https://github.com/NVIDIA/spark-rapids/pull/9935)|Init 23.12 changelog [skip ci]| |[#9943](https://github.com/NVIDIA/spark-rapids/pull/9943)|[DOC] Update docs for 23.12.0 release [skip ci]| |[#10014](https://github.com/NVIDIA/spark-rapids/pull/10014)|Add documentation for how to run tests with a fixed datagen seed [skip ci]| @@ -146,6 +425,8 @@ Generated on 2024-01-31 |[#9906](https://github.com/NVIDIA/spark-rapids/pull/9906)|Fix test_initcap to use the intended limited character set| |[#9831](https://github.com/NVIDIA/spark-rapids/pull/9831)|Skip fastparquet timestamp tests when plugin cannot read/write timestamps| |[#9893](https://github.com/NVIDIA/spark-rapids/pull/9893)|Add multiple expression tier regression test for AST| +|[#9889](https://github.com/NVIDIA/spark-rapids/pull/9889)|Fix test_cast_string_ts_valid_format test| +|[#9833](https://github.com/NVIDIA/spark-rapids/pull/9833)|Fix a hang for Pandas UDFs on DB 13.3| |[#9873](https://github.com/NVIDIA/spark-rapids/pull/9873)|Add support for decimal in `to_json`| |[#9890](https://github.com/NVIDIA/spark-rapids/pull/9890)|Remove Databricks 13.3 from release 23.12| |[#9874](https://github.com/NVIDIA/spark-rapids/pull/9874)|Fix zero-scale floor and ceil tests| @@ -193,6 +474,7 @@ Generated on 2024-01-31 |[#9741](https://github.com/NVIDIA/spark-rapids/pull/9741)|Set seed=0 for the delta lake part roundtrip tests| |[#9660](https://github.com/NVIDIA/spark-rapids/pull/9660)|Fully support date/time legacy rebase for nested input| |[#9672](https://github.com/NVIDIA/spark-rapids/pull/9672)|Support String type for AST| +|[#9716](https://github.com/NVIDIA/spark-rapids/pull/9716)|Initiate project version 24.02.0-SNAPSHOT| |[#9732](https://github.com/NVIDIA/spark-rapids/pull/9732)|Temporarily force `datagen_seed=0` for `test_re_replace_all` to unblock CI| |[#9726](https://github.com/NVIDIA/spark-rapids/pull/9726)|Fix leak in BatchWithPartitionData| |[#9717](https://github.com/NVIDIA/spark-rapids/pull/9717)|Encode the file path from Iceberg when converting to a PartitionedFile| @@ -296,250 +578,6 @@ Generated on 2024-01-31 |[#9373](https://github.com/NVIDIA/spark-rapids/pull/9373)|Fix auto merge conflict 9372| |[#9308](https://github.com/NVIDIA/spark-rapids/pull/9308)|Initiate arm64 CI support [skip ci]| |[#9292](https://github.com/NVIDIA/spark-rapids/pull/9292)|Init project version 23.12.0-SNAPSHOT| -|[#9291](https://github.com/NVIDIA/spark-rapids/pull/9291)|Automerge from 23.10 to 23.12 [skip ci]| - -## Release 23.10 - -### Features -||| -|:---|:---| -|[#9220](https://github.com/NVIDIA/spark-rapids/issues/9220)|[FEA] Add GPU support for converting binary data to a hex string in REPL| -|[#9171](https://github.com/NVIDIA/spark-rapids/issues/9171)|[FEA] Add GPU version of ToPrettyString| -|[#5314](https://github.com/NVIDIA/spark-rapids/issues/5314)|[FEA] Support window.rowsBetween(Window.unboundedPreceding, -1) | -|[#9057](https://github.com/NVIDIA/spark-rapids/issues/9057)|[FEA] Add unbounded to unbounded fixers for min and max| -|[#8121](https://github.com/NVIDIA/spark-rapids/issues/8121)|[FEA] Add Spark 3.5.0 shim layer| -|[#9224](https://github.com/NVIDIA/spark-rapids/issues/9224)|[FEA] Allow } and }} to be transpiled to static strings| -|[#8596](https://github.com/NVIDIA/spark-rapids/issues/8596)|[FEA] Support spark.sql.legacy.parquet.datetimeRebaseModeInWrite=LEGACY| -|[#8767](https://github.com/NVIDIA/spark-rapids/issues/8767)|[AUDIT][SPARK-43302][SQL] Make Python UDAF an AggregateFunction| -|[#9055](https://github.com/NVIDIA/spark-rapids/issues/9055)|[FEA] Support Spark 3.3.3 official release| -|[#8672](https://github.com/NVIDIA/spark-rapids/issues/8672)|[FEA] Make GPU readers easier to debug on failure (any failure including OOM)| -|[#8965](https://github.com/NVIDIA/spark-rapids/issues/8965)|[FEA] Enable Bloom filter join acceleration by default| -|[#8625](https://github.com/NVIDIA/spark-rapids/issues/8625)|[FEA] Support outputTimestampType being INT96| - -### Performance -||| -|:---|:---| -|[#9512](https://github.com/NVIDIA/spark-rapids/issues/9512)|[DOC] Multi-Threaded shuffle documentation is not accurate on the read side| -|[#7803](https://github.com/NVIDIA/spark-rapids/issues/7803)|[FEA] Accelerate Bloom filtered joins | - -### Bugs Fixed -||| -|:---|:---| -|[#8662](https://github.com/NVIDIA/spark-rapids/issues/8662)|[BUG] Dataproc spark-rapids.sh fails due to cuda driver version issue| -|[#9428](https://github.com/NVIDIA/spark-rapids/issues/9428)|[Audit] SPARK-44448 Wrong results for dense_rank() <= k| -|[#9485](https://github.com/NVIDIA/spark-rapids/issues/9485)|[BUG] GpuSemaphore can deadlock if there are multiple threads per task| -|[#9498](https://github.com/NVIDIA/spark-rapids/issues/9498)|[BUG] spark 3.5.0 shim spark-shell is broken in spark-rapids 23.10 and 23.12| -|[#9060](https://github.com/NVIDIA/spark-rapids/issues/9060)|[BUG] OOM error in split and retry with multifile coalesce reader with parquet data| -|[#8916](https://github.com/NVIDIA/spark-rapids/issues/8916)|[BUG] Databricks - move init scripts off DBFS| -|[#9416](https://github.com/NVIDIA/spark-rapids/issues/9416)|[BUG] CDH build failed due to missing dependencies| -|[#9357](https://github.com/NVIDIA/spark-rapids/issues/9357)|[BUG] json_test failed on "NameError: name 'TimestampNTZType' is not defined"| -|[#9271](https://github.com/NVIDIA/spark-rapids/issues/9271)|[BUG] ThreadPool size is deduced incorrectly in MultiFileReaderThreadPool on YARN clusters| -|[#9309](https://github.com/NVIDIA/spark-rapids/issues/9309)|[BUG] bround and round do not return the correct result for some decimal values.| -|[#9153](https://github.com/NVIDIA/spark-rapids/issues/9153)|[BUG] netty OOM with MULTITHREADED shuffle| -|[#9311](https://github.com/NVIDIA/spark-rapids/issues/9311)|[BUG] test_hash_groupby_collect_list fails| -|[#9180](https://github.com/NVIDIA/spark-rapids/issues/9180)|[FEA][AUDIT][SPARK-44641] Incorrect result in certain scenarios when SPJ is not triggered| -|[#9290](https://github.com/NVIDIA/spark-rapids/issues/9290)|[BUG] delta_lake_test FAILED on "column mapping mode id is not supported for this Delta version"| -|[#9255](https://github.com/NVIDIA/spark-rapids/issues/9255)|[BUG] Unable to read DeltaTable with columnMapping.mode = name| -|[#9261](https://github.com/NVIDIA/spark-rapids/issues/9261)|[BUG] Leaks and Double Frees in Unit Tests| -|[#9246](https://github.com/NVIDIA/spark-rapids/issues/9246)|[BUG] `test_predefined_character_classes` failed with seed 4| -|[#9208](https://github.com/NVIDIA/spark-rapids/issues/9208)|[BUG] SplitAndRetryOOM query14_part1 at 100TB with spark.executor.cores=64| -|[#9106](https://github.com/NVIDIA/spark-rapids/issues/9106)|[BUG] Configuring GDS breaks new host spillable buffers and batches| -|[#9131](https://github.com/NVIDIA/spark-rapids/issues/9131)|[BUG] ConcurrentModificationException in ScalableTaskCompletion| -|[#9263](https://github.com/NVIDIA/spark-rapids/issues/9263)|[BUG] Unit test logging is not captured when running against Spark 3.5.0| -|[#9168](https://github.com/NVIDIA/spark-rapids/issues/9168)|[BUG] Calling RmmSpark.getAndResetNumRetryThrow from tests is not working| -|[#8776](https://github.com/NVIDIA/spark-rapids/issues/8776)|[BUG] FileCacheIntegrationSuite intermittent failure| -|[#9223](https://github.com/NVIDIA/spark-rapids/issues/9223)|[BUG] Failed to create memory map on query14_part1 at 100TB with spark.executor.cores=64| -|[#9116](https://github.com/NVIDIA/spark-rapids/issues/9116)|[BUG] spark350 shim build failed in mvn-verify github checks and nightly due to dependencies not released| -|[#8984](https://github.com/NVIDIA/spark-rapids/issues/8984)|[BUG] Check that keys are not null when creating a map| -|[#9233](https://github.com/NVIDIA/spark-rapids/issues/9233)|[BUG] test_parquet_testing_error_files - Failed: DID NOT RAISE in databricks runtime 12.2| -|[#9142](https://github.com/NVIDIA/spark-rapids/issues/9142)|[BUG] AWS EMR 6.12 NDS SF3k query9 Failure on g4dn.4xlarge| -|[#9214](https://github.com/NVIDIA/spark-rapids/issues/9214)|[BUG] mvn resolve dependencies failed missing rapids-4-spark-sql-plugin-api_2.12 of 311 shim| -|[#9204](https://github.com/NVIDIA/spark-rapids/issues/9204)|[BUG] SplitAndRetryOOM query78 at 100TB with spark.executor.cores=64| -|[#9213](https://github.com/NVIDIA/spark-rapids/issues/9213)|[BUG] Missing revision info in databricks shims failed nightly build| -|[#9206](https://github.com/NVIDIA/spark-rapids/issues/9206)|[BUG] test_datetime_roundtrip_with_legacy_rebase failed in databricks runtimes| -|[#9165](https://github.com/NVIDIA/spark-rapids/issues/9165)|[BUG] Data gen for key groups produces type-mismatch columns| -|[#9129](https://github.com/NVIDIA/spark-rapids/issues/9129)|[BUG] Writing Parquet map(map) column can not set the outer key as non-null.| -|[#9194](https://github.com/NVIDIA/spark-rapids/issues/9194)|[BUG] missing sql-plugin-api databricks artifacts in the nightly CI | -|[#9167](https://github.com/NVIDIA/spark-rapids/issues/9167)|[BUG] Ensure no udf-compiler internal nodes escape| -|[#9092](https://github.com/NVIDIA/spark-rapids/issues/9092)|[BUG] NDS query 64 falls back to CPU only for a shuffle| -|[#9071](https://github.com/NVIDIA/spark-rapids/issues/9071)|[BUG] `test_numeric_running_sum_window_no_part_unbounded` failed in MT tests| -|[#9154](https://github.com/NVIDIA/spark-rapids/issues/9154)|[BUG] Spark 3.5.0 nightly build failures (test_parquet_testing_error_files)| -|[#9149](https://github.com/NVIDIA/spark-rapids/issues/9149)|[BUG] compile failed in databricks runtimes due to new added TestReport| -|[#9041](https://github.com/NVIDIA/spark-rapids/issues/9041)|[BUG] Fix regression in Python UDAF support when running against Spark 3.5.0| -|[#9064](https://github.com/NVIDIA/spark-rapids/issues/9064)|[BUG][Spark 3.5.0] Re-enable test_hive_empty_simple_udf when 3.5.0-rc2 is available| -|[#9065](https://github.com/NVIDIA/spark-rapids/issues/9065)|[BUG][Spark 3.5.0] Reinstate cast map/array to string tests when 3.5.0-rc2 is available| -|[#9119](https://github.com/NVIDIA/spark-rapids/issues/9119)|[BUG] Predicate pushdown doesn't work for parquet files written by GPU| -|[#9103](https://github.com/NVIDIA/spark-rapids/issues/9103)|[BUG] test_select_complex_field fails in MT tests| -|[#9086](https://github.com/NVIDIA/spark-rapids/issues/9086)|[BUG] GpuBroadcastNestedLoopJoinExec can assert in doUnconditionalJoin| -|[#8939](https://github.com/NVIDIA/spark-rapids/issues/8939)|[BUG] q95 odd task failure in query95 at 30TB| -|[#9082](https://github.com/NVIDIA/spark-rapids/issues/9082)|[BUG] Race condition while spilling and aliasing a RapidsBuffer (regression)| -|[#9069](https://github.com/NVIDIA/spark-rapids/issues/9069)|[BUG] ParquetFormatScanSuite does not pass locally| -|[#8980](https://github.com/NVIDIA/spark-rapids/issues/8980)|[BUG] invalid escape sequences in pytests| -|[#7807](https://github.com/NVIDIA/spark-rapids/issues/7807)|[BUG] Round robin partitioning sort check falls back to CPU for cases that can be supported| -|[#8482](https://github.com/NVIDIA/spark-rapids/issues/8482)|[BUG] Potential leak on SplitAndRetry when iterator not fully drained| -|[#8942](https://github.com/NVIDIA/spark-rapids/issues/8942)|[BUG] NDS query 14 parts 1 and 2 both fail at SF100K| -|[#8778](https://github.com/NVIDIA/spark-rapids/issues/8778)|[BUG] GPU Parquet output for TIMESTAMP_MICROS is misinteterpreted by fastparquet as nanos| - -### PRs -||| -|:---|:---| -|[#9304](https://github.com/NVIDIA/spark-rapids/pull/9304)|Specify recoverWithNull when reading JSON files| -|[#9474](https://github.com/NVIDIA/spark-rapids/pull/9474)| Improve configuration handling in BatchWithPartitionData| -|[#9289](https://github.com/NVIDIA/spark-rapids/pull/9289)|Add tests to check compatibility with pyarrow| -|[#9522](https://github.com/NVIDIA/spark-rapids/pull/9522)|Update 23.10 changelog [skip ci]| -|[#9501](https://github.com/NVIDIA/spark-rapids/pull/9501)|Fix GpuSemaphore to support multiple threads per task| -|[#9500](https://github.com/NVIDIA/spark-rapids/pull/9500)|Fix Spark 3.5.0 shell classloader issue with the plugin| -|[#9230](https://github.com/NVIDIA/spark-rapids/pull/9230)|Fix reading partition value columns larger than cudf column size limit| -|[#9427](https://github.com/NVIDIA/spark-rapids/pull/9427)|[DOC] Update docs for 23.10.0 release [skip ci]| -|[#9421](https://github.com/NVIDIA/spark-rapids/pull/9421)|Init changelog of 23.10 [skip ci]| -|[#9445](https://github.com/NVIDIA/spark-rapids/pull/9445)|Only run test_csv_infer_schema_timestamp_ntz tests with PySpark >= 3.4.1| -|[#9420](https://github.com/NVIDIA/spark-rapids/pull/9420)|Update private and jni dep version to released 23.10.0| -|[#9415](https://github.com/NVIDIA/spark-rapids/pull/9415)|[BUG] fix docker modified check in premerge [skip ci]| -|[#9407](https://github.com/NVIDIA/spark-rapids/pull/9407)|[Doc]Update docs for 23.08.2 version[skip ci]| -|[#9392](https://github.com/NVIDIA/spark-rapids/pull/9392)|Only run test_json_ts_formats_round_trip_ntz tests with PySpark >= 3.4.1| -|[#9401](https://github.com/NVIDIA/spark-rapids/pull/9401)|Remove using mamba before they fix the incompatibility issue [skip ci]| -|[#9381](https://github.com/NVIDIA/spark-rapids/pull/9381)|Change the executor core calculation to take into account the cluster manager| -|[#9351](https://github.com/NVIDIA/spark-rapids/pull/9351)|Put back in full decimal support for format_number| -|[#9374](https://github.com/NVIDIA/spark-rapids/pull/9374)|GpuCoalesceBatches should throw SplitAndRetyOOM on GPU OOM error| -|[#9238](https://github.com/NVIDIA/spark-rapids/pull/9238)|Simplified handling of GPU core dumps| -|[#9362](https://github.com/NVIDIA/spark-rapids/pull/9362)|[DOC] Removing User Guide pages that will be source of truth on docs.nvidia…| -|[#9365](https://github.com/NVIDIA/spark-rapids/pull/9365)|Update DataWriteCommandExec docs to reflect ORC support for nested types| -|[#9277](https://github.com/NVIDIA/spark-rapids/pull/9277)|[Doc]Remove CUDA related requirement from download page.[Skip CI]| -|[#9352](https://github.com/NVIDIA/spark-rapids/pull/9352)|Refine rules for skipping `test_csv_infer_schema_timestamp_ntz_*` tests| -|[#9334](https://github.com/NVIDIA/spark-rapids/pull/9334)|Add NaNs to Data Generators In Floating-Point Testing| -|[#9344](https://github.com/NVIDIA/spark-rapids/pull/9344)|Update MULTITHREADED shuffle maxBytesInFlight default to 128MB| -|[#9330](https://github.com/NVIDIA/spark-rapids/pull/9330)|Add Hao to blossom-ci whitelist| -|[#9328](https://github.com/NVIDIA/spark-rapids/pull/9328)|Building different Cuda versions section profile does not take effect [skip ci]| -|[#9329](https://github.com/NVIDIA/spark-rapids/pull/9329)|Add kuhushukla to blossom ci yml| -|[#9281](https://github.com/NVIDIA/spark-rapids/pull/9281)|Support `format_number`| -|[#9335](https://github.com/NVIDIA/spark-rapids/pull/9335)|Temporarily skip failing tests test_csv_infer_schema_timestamp_ntz*| -|[#9318](https://github.com/NVIDIA/spark-rapids/pull/9318)|Update authorized user in blossom-ci whitelist [skip ci]| -|[#9221](https://github.com/NVIDIA/spark-rapids/pull/9221)|Add GPU version of ToPrettyString| -|[#9321](https://github.com/NVIDIA/spark-rapids/pull/9321)|[DOC] Fix some incorrect config links in doc [skip ci]| -|[#9314](https://github.com/NVIDIA/spark-rapids/pull/9314)|Fix RMM crash in FileCacheIntegrationSuite with ARENA memory allocator| -|[#9287](https://github.com/NVIDIA/spark-rapids/pull/9287)|Allow checkpoint and restore on non-deterministic expressions in GpuFilter and GpuProject| -|[#9146](https://github.com/NVIDIA/spark-rapids/pull/9146)|Improve some CSV integration tests| -|[#9159](https://github.com/NVIDIA/spark-rapids/pull/9159)|Update tests and documentation for `spark.sql.timestampType` when reading CSV/JSON| -|[#9313](https://github.com/NVIDIA/spark-rapids/pull/9313)|Sort results of collect_list test before comparing since it is not guaranteed| -|[#9286](https://github.com/NVIDIA/spark-rapids/pull/9286)|[FEA][AUDIT][SPARK-44641] Incorrect result in certain scenarios when SPJ is not triggered| -|[#9229](https://github.com/NVIDIA/spark-rapids/pull/9229)|Support negative preceding/following for ROW-based window functions| -|[#9297](https://github.com/NVIDIA/spark-rapids/pull/9297)|Append new authorized user to blossom-ci whitelist [skip ci]| -|[#9294](https://github.com/NVIDIA/spark-rapids/pull/9294)|Fix test_delta_read_column_mapping test failures on Spark 3.2.x and 3.3.x| -|[#9285](https://github.com/NVIDIA/spark-rapids/pull/9285)|Add CastOptions to make GpuCast extendible to handle more options| -|[#9279](https://github.com/NVIDIA/spark-rapids/pull/9279)|Fix file format checks to be exact and handle Delta Lake column mapping| -|[#9283](https://github.com/NVIDIA/spark-rapids/pull/9283)|Refactor ExternalSource to move some APIs to converted GPU format or scan| -|[#9264](https://github.com/NVIDIA/spark-rapids/pull/9264)|Fix leak in test and double free in corner case| -|[#9280](https://github.com/NVIDIA/spark-rapids/pull/9280)|Fix some issues found with different seeds in integration tests| -|[#9257](https://github.com/NVIDIA/spark-rapids/pull/9257)|Have host spill use the new HostAlloc API| -|[#9253](https://github.com/NVIDIA/spark-rapids/pull/9253)|Enforce Scala method syntax over deprecated procedure syntax| -|[#9273](https://github.com/NVIDIA/spark-rapids/pull/9273)|Add arm64 profile to build arm artifacts| -|[#9270](https://github.com/NVIDIA/spark-rapids/pull/9270)|Remove GDS spilling| -|[#9267](https://github.com/NVIDIA/spark-rapids/pull/9267)|Roll our own BufferedIterator so we can close cleanly| -|[#9266](https://github.com/NVIDIA/spark-rapids/pull/9266)|Specify correct dependency versions for 350 build| -|[#9262](https://github.com/NVIDIA/spark-rapids/pull/9262)|Add Delta Lake support for Spark 3.4.1 and Delta Lake tests on Spark 3.4.x| -|[#9256](https://github.com/NVIDIA/spark-rapids/pull/9256)|Test Parquet double column stat without NaN| -|[#9254](https://github.com/NVIDIA/spark-rapids/pull/9254)|[Doc]update the emr getting started doc for emr-6130 release[skip ci]| -|[#9228](https://github.com/NVIDIA/spark-rapids/pull/9228)|Add in unbounded to unbounded optimization for min/max| -|[#9252](https://github.com/NVIDIA/spark-rapids/pull/9252)|Add Spark 3.5.0 to list of supported Spark versions [skip ci]| -|[#9251](https://github.com/NVIDIA/spark-rapids/pull/9251)|Enable a couple of retry asserts in internal row to cudf row iterator suite| -|[#9239](https://github.com/NVIDIA/spark-rapids/pull/9239)|Handle escaping the dangling right ] and right } in the regexp transpiler| -|[#9090](https://github.com/NVIDIA/spark-rapids/pull/9090)|Add test cases for Parquet statistics| -|[#9240](https://github.com/NVIDIA/spark-rapids/pull/9240)|Fix flaky ORC filecache test| -|[#9053](https://github.com/NVIDIA/spark-rapids/pull/9053)|[DOC] update the turning guide document issues [skip ci]| -|[#9211](https://github.com/NVIDIA/spark-rapids/pull/9211)|Allow skipping host spill for a direct device->disk spill| -|[#9234](https://github.com/NVIDIA/spark-rapids/pull/9234)|Enable Spark 350 builds| -|[#9237](https://github.com/NVIDIA/spark-rapids/pull/9237)|Check for null keys when creating map| -|[#9235](https://github.com/NVIDIA/spark-rapids/pull/9235)|xfail fixed_length_byte_array.parquet test due to rapidsai/cudf#14104| -|[#9231](https://github.com/NVIDIA/spark-rapids/pull/9231)|Use conda libmamba solver to resolve intermittent libarchive issue [skip ci]| -|[#8404](https://github.com/NVIDIA/spark-rapids/pull/8404)|Add in support for FIXED_LEN_BYTE_ARRAY as binary| -|[#9225](https://github.com/NVIDIA/spark-rapids/pull/9225)|Add in a HostAlloc API for high priority and add in spilling| -|[#9207](https://github.com/NVIDIA/spark-rapids/pull/9207)|Support SplitAndRetry for GpuRangeExec| -|[#9217](https://github.com/NVIDIA/spark-rapids/pull/9217)|Fix leak in aggregate when there are retries| -|[#9200](https://github.com/NVIDIA/spark-rapids/pull/9200)|Fix a few minor things with scale test| -|[#9222](https://github.com/NVIDIA/spark-rapids/pull/9222)|Deploy classified aggregator for Databricks [skip ci]| -|[#9209](https://github.com/NVIDIA/spark-rapids/pull/9209)|Fix tests for datetime rebase in Databricks| -|[#9181](https://github.com/NVIDIA/spark-rapids/pull/9181)|[DOC] address document issues [skip ci]| -|[#9132](https://github.com/NVIDIA/spark-rapids/pull/9132)|Support `spark.sql.parquet.datetimeRebaseModeInWrite=LEGACY`| -|[#9196](https://github.com/NVIDIA/spark-rapids/pull/9196)|Fix host memory leak for R2C| -|[#9192](https://github.com/NVIDIA/spark-rapids/pull/9192)|Throw overflow exception when interval seconds are outside of range [0, 59]| -|[#9150](https://github.com/NVIDIA/spark-rapids/pull/9150)|add error section in report and the rest queries| -|[#9189](https://github.com/NVIDIA/spark-rapids/pull/9189)|Expose host store spill| -|[#9147](https://github.com/NVIDIA/spark-rapids/pull/9147)|Make map column non-nullable when it's a key in another map.| -|[#9193](https://github.com/NVIDIA/spark-rapids/pull/9193)|Support Retry for GpuLocalLimitExec and GpuGlobalLimitExec| -|[#9183](https://github.com/NVIDIA/spark-rapids/pull/9183)|Add test to verify UDT fallback for parquet| -|[#9195](https://github.com/NVIDIA/spark-rapids/pull/9195)|Deploy sql-plugin-api artifact in DBR CI pipelines [skip ci]| -|[#9170](https://github.com/NVIDIA/spark-rapids/pull/9170)|Add in new HostAlloc API| -|[#9182](https://github.com/NVIDIA/spark-rapids/pull/9182)|Consolidate Spark vendor shim dependency management| -|[#9190](https://github.com/NVIDIA/spark-rapids/pull/9190)|Prevent returning internal compiler expressions when compiling UDFs| -|[#9164](https://github.com/NVIDIA/spark-rapids/pull/9164)|Support Retry for GpuTopN and GpuSortEachBatchIterator| -|[#9134](https://github.com/NVIDIA/spark-rapids/pull/9134)|Fix shuffle fallback due to AQE on AWS EMR| -|[#9188](https://github.com/NVIDIA/spark-rapids/pull/9188)|Fix flaky tests in FileCacheIntegrationSuite| -|[#9148](https://github.com/NVIDIA/spark-rapids/pull/9148)|Add minimum Maven module eventually containing all non-shimmable source code| -|[#9169](https://github.com/NVIDIA/spark-rapids/pull/9169)|Add retry-without-split in InternalRowToColumnarBatchIterator| -|[#9172](https://github.com/NVIDIA/spark-rapids/pull/9172)|Remove doSetSpillable in favor of setSpillable| -|[#9152](https://github.com/NVIDIA/spark-rapids/pull/9152)|Add test cases for testing Parquet compression types| -|[#9157](https://github.com/NVIDIA/spark-rapids/pull/9157)|XFAIL parquet `lz4_raw` tests for Spark 3.5.0 or later| -|[#9128](https://github.com/NVIDIA/spark-rapids/pull/9128)|Test parquet predicate pushdown for basic types and fields having dots in names| -|[#9158](https://github.com/NVIDIA/spark-rapids/pull/9158)|Add json4s dependencies for Databricks integration_tests build| -|[#9102](https://github.com/NVIDIA/spark-rapids/pull/9102)|Add retry support to GpuOutOfCoreSortIterator.mergeSortEnoughToOutput | -|[#9089](https://github.com/NVIDIA/spark-rapids/pull/9089)|Add application to run Scale Test| -|[#9143](https://github.com/NVIDIA/spark-rapids/pull/9143)|[DOC] update spark.rapids.sql.concurrentGpuTasks default value in tuning guide [skip ci]| -|[#8476](https://github.com/NVIDIA/spark-rapids/pull/8476)|Use retry with split in GpuCachedDoublePassWindowIterator| -|[#9141](https://github.com/NVIDIA/spark-rapids/pull/9141)|Removed resultDecimalType in GpuIntegralDecimalDivide| -|[#9099](https://github.com/NVIDIA/spark-rapids/pull/9099)|Spark 3.5.0 follow-on work (rc2 support + Python UDAF)| -|[#9140](https://github.com/NVIDIA/spark-rapids/pull/9140)|Bump Jython to 2.7.3| -|[#9136](https://github.com/NVIDIA/spark-rapids/pull/9136)|Moving row column conversion code from cudf to jni| -|[#9133](https://github.com/NVIDIA/spark-rapids/pull/9133)|Add 350 tag to InSubqueryShims| -|[#9124](https://github.com/NVIDIA/spark-rapids/pull/9124)|Import `scala.collection` intead of `collection`| -|[#9122](https://github.com/NVIDIA/spark-rapids/pull/9122)|Fall back to CPU if `spark.sql.execution.arrow.useLargeVarTypes` is true| -|[#9115](https://github.com/NVIDIA/spark-rapids/pull/9115)|[DOC] updates documentation related to java compatibility [skip ci]| -|[#9098](https://github.com/NVIDIA/spark-rapids/pull/9098)|Add SpillableHostColumnarBatch| -|[#9091](https://github.com/NVIDIA/spark-rapids/pull/9091)|GPU support for DynamicPruningExpression and InSubqueryExec| -|[#9117](https://github.com/NVIDIA/spark-rapids/pull/9117)|Temply disable spark 350 shim build in nightly [skip ci]| -|[#9113](https://github.com/NVIDIA/spark-rapids/pull/9113)|Instantiate execution plan capture callback via shim loader| -|[#8969](https://github.com/NVIDIA/spark-rapids/pull/8969)|Initial support for Spark 3.5.0-rc1| -|[#9100](https://github.com/NVIDIA/spark-rapids/pull/9100)|Support broadcast nested loop existence joins with no condition| -|[#8925](https://github.com/NVIDIA/spark-rapids/pull/8925)|Add GpuConv operator for the `conv` 10<->16 expression| -|[#9109](https://github.com/NVIDIA/spark-rapids/pull/9109)|[DOC] adding java 11 to download docs [skip ci]| -|[#9085](https://github.com/NVIDIA/spark-rapids/pull/9085)|Retry with smaller split on `CudfColumnSizeOverflowException`| -|[#8961](https://github.com/NVIDIA/spark-rapids/pull/8961)|Save Databricks init scripts in the workspace| -|[#9088](https://github.com/NVIDIA/spark-rapids/pull/9088)|Add retry and SplitAndRetry support to AcceleratedColumnarToRowIterator| -|[#9095](https://github.com/NVIDIA/spark-rapids/pull/9095)|Support released spark 3.3.3| -|[#9084](https://github.com/NVIDIA/spark-rapids/pull/9084)|Fix race when a rapids buffer is aliased while it is spilled| -|[#9093](https://github.com/NVIDIA/spark-rapids/pull/9093)|Update ParquetFormatScanSuite to not call CUDF directly| -|[#9068](https://github.com/NVIDIA/spark-rapids/pull/9068)|Test ORC predicate pushdown (PPD) with timestamps decimals booleans| -|[#9054](https://github.com/NVIDIA/spark-rapids/pull/9054)|Initial entry point to data generation for scale test| -|[#9070](https://github.com/NVIDIA/spark-rapids/pull/9070)|Spillable host buffer| -|[#9066](https://github.com/NVIDIA/spark-rapids/pull/9066)|Add retry support to RowToColumnarIterator| -|[#9073](https://github.com/NVIDIA/spark-rapids/pull/9073)|Stop using invalid escape sequences| -|[#9018](https://github.com/NVIDIA/spark-rapids/pull/9018)|Add test for selecting a single complex field array and its parent struct array| -|[#9067](https://github.com/NVIDIA/spark-rapids/pull/9067)|Add array support for round robin partition; Refactor pluginSupportedOrderableSig| -|[#9072](https://github.com/NVIDIA/spark-rapids/pull/9072)|Revert "Implement SumUnboundedToUnboundedFixer (#8934)"| -|[#9056](https://github.com/NVIDIA/spark-rapids/pull/9056)|Add in configs for host memory limits| -|[#9061](https://github.com/NVIDIA/spark-rapids/pull/9061)|Fix import order| -|[#8934](https://github.com/NVIDIA/spark-rapids/pull/8934)|Implement SumUnboundedToUnboundedFixer| -|[#9051](https://github.com/NVIDIA/spark-rapids/pull/9051)|Use number of threads on executor instead of driver to set core count| -|[#9040](https://github.com/NVIDIA/spark-rapids/pull/9040)|Fix issues from 23.08 merge in join_test| -|[#9045](https://github.com/NVIDIA/spark-rapids/pull/9045)|Fix auto merge conflict 9043 [skip ci]| -|[#9009](https://github.com/NVIDIA/spark-rapids/pull/9009)|Add in a layer of indirection for task completion callbacks| -|[#9013](https://github.com/NVIDIA/spark-rapids/pull/9013)|Create a two-shim jar by default on Databricks| -|[#8995](https://github.com/NVIDIA/spark-rapids/pull/8995)|Add test case for ORC statistics test| -|[#8970](https://github.com/NVIDIA/spark-rapids/pull/8970)|Add ability to debug dump input data only on errors| -|[#9003](https://github.com/NVIDIA/spark-rapids/pull/9003)|Fix auto merge conflict 9002 [skip ci]| -|[#8989](https://github.com/NVIDIA/spark-rapids/pull/8989)|Mark lazy spillables as allowSpillable in during gatherer construction| -|[#8988](https://github.com/NVIDIA/spark-rapids/pull/8988)|Move big data generator to a separate module| -|[#8987](https://github.com/NVIDIA/spark-rapids/pull/8987)|Fix host memory buffer leaks in SerializationSuite| -|[#8968](https://github.com/NVIDIA/spark-rapids/pull/8968)|Enable GPU acceleration of Bloom filter join expressions by default| -|[#8947](https://github.com/NVIDIA/spark-rapids/pull/8947)|Add ArrowUtilsShims in preparation for Spark 3.5.0| -|[#8946](https://github.com/NVIDIA/spark-rapids/pull/8946)|[Spark 3.5.0] Shim access to StructType.fromAttributes| -|[#8824](https://github.com/NVIDIA/spark-rapids/pull/8824)|Drop the in-range check at INT96 output path| -|[#8924](https://github.com/NVIDIA/spark-rapids/pull/8924)|Deprecate and delegate GpuCV.debug to cudf TableDebug| -|[#8915](https://github.com/NVIDIA/spark-rapids/pull/8915)|Move LegacyBehaviorPolicy references to shim layer| -|[#8918](https://github.com/NVIDIA/spark-rapids/pull/8918)|Output unified diff when GPU output deviates| -|[#8857](https://github.com/NVIDIA/spark-rapids/pull/8857)|Remove the pageable pool| -|[#8854](https://github.com/NVIDIA/spark-rapids/pull/8854)|Fix auto merge conflict 8853 [skip ci]| -|[#8805](https://github.com/NVIDIA/spark-rapids/pull/8805)|Bump up dep versions to 23.10.0-SNAPSHOT| -|[#8796](https://github.com/NVIDIA/spark-rapids/pull/8796)|Init version 23.10.0-SNAPSHOT| ## Older Releases Changelog of older releases can be found at [docs/archives](/docs/archives) diff --git a/docs/archives/CHANGELOG_23.02_to_23.08.md b/docs/archives/CHANGELOG_23.02_to_23.10.md similarity index 80% rename from docs/archives/CHANGELOG_23.02_to_23.08.md rename to docs/archives/CHANGELOG_23.02_to_23.10.md index 7423e378aca..dc2601adac6 100644 --- a/docs/archives/CHANGELOG_23.02_to_23.08.md +++ b/docs/archives/CHANGELOG_23.02_to_23.10.md @@ -1,5 +1,249 @@ + # Change log -Generated on 2023-10-12 +Generated on 2023-10-24 + +## Release 23.10 + +### Features +||| +|:---|:---| +|[#9220](https://github.com/NVIDIA/spark-rapids/issues/9220)|[FEA] Add GPU support for converting binary data to a hex string in REPL| +|[#9171](https://github.com/NVIDIA/spark-rapids/issues/9171)|[FEA] Add GPU version of ToPrettyString| +|[#5314](https://github.com/NVIDIA/spark-rapids/issues/5314)|[FEA] Support window.rowsBetween(Window.unboundedPreceding, -1) | +|[#9057](https://github.com/NVIDIA/spark-rapids/issues/9057)|[FEA] Add unbounded to unbounded fixers for min and max| +|[#8121](https://github.com/NVIDIA/spark-rapids/issues/8121)|[FEA] Add Spark 3.5.0 shim layer| +|[#9224](https://github.com/NVIDIA/spark-rapids/issues/9224)|[FEA] Allow } and }} to be transpiled to static strings| +|[#8596](https://github.com/NVIDIA/spark-rapids/issues/8596)|[FEA] Support spark.sql.legacy.parquet.datetimeRebaseModeInWrite=LEGACY| +|[#8767](https://github.com/NVIDIA/spark-rapids/issues/8767)|[AUDIT][SPARK-43302][SQL] Make Python UDAF an AggregateFunction| +|[#9055](https://github.com/NVIDIA/spark-rapids/issues/9055)|[FEA] Support Spark 3.3.3 official release| +|[#8672](https://github.com/NVIDIA/spark-rapids/issues/8672)|[FEA] Make GPU readers easier to debug on failure (any failure including OOM)| +|[#8965](https://github.com/NVIDIA/spark-rapids/issues/8965)|[FEA] Enable Bloom filter join acceleration by default| +|[#8625](https://github.com/NVIDIA/spark-rapids/issues/8625)|[FEA] Support outputTimestampType being INT96| + +### Performance +||| +|:---|:---| +|[#9512](https://github.com/NVIDIA/spark-rapids/issues/9512)|[DOC] Multi-Threaded shuffle documentation is not accurate on the read side| +|[#7803](https://github.com/NVIDIA/spark-rapids/issues/7803)|[FEA] Accelerate Bloom filtered joins | + +### Bugs Fixed +||| +|:---|:---| +|[#8662](https://github.com/NVIDIA/spark-rapids/issues/8662)|[BUG] Dataproc spark-rapids.sh fails due to cuda driver version issue| +|[#9428](https://github.com/NVIDIA/spark-rapids/issues/9428)|[Audit] SPARK-44448 Wrong results for dense_rank() <= k| +|[#9485](https://github.com/NVIDIA/spark-rapids/issues/9485)|[BUG] GpuSemaphore can deadlock if there are multiple threads per task| +|[#9498](https://github.com/NVIDIA/spark-rapids/issues/9498)|[BUG] spark 3.5.0 shim spark-shell is broken in spark-rapids 23.10 and 23.12| +|[#9060](https://github.com/NVIDIA/spark-rapids/issues/9060)|[BUG] OOM error in split and retry with multifile coalesce reader with parquet data| +|[#8916](https://github.com/NVIDIA/spark-rapids/issues/8916)|[BUG] Databricks - move init scripts off DBFS| +|[#9416](https://github.com/NVIDIA/spark-rapids/issues/9416)|[BUG] CDH build failed due to missing dependencies| +|[#9357](https://github.com/NVIDIA/spark-rapids/issues/9357)|[BUG] json_test failed on "NameError: name 'TimestampNTZType' is not defined"| +|[#9271](https://github.com/NVIDIA/spark-rapids/issues/9271)|[BUG] ThreadPool size is deduced incorrectly in MultiFileReaderThreadPool on YARN clusters| +|[#9309](https://github.com/NVIDIA/spark-rapids/issues/9309)|[BUG] bround and round do not return the correct result for some decimal values.| +|[#9153](https://github.com/NVIDIA/spark-rapids/issues/9153)|[BUG] netty OOM with MULTITHREADED shuffle| +|[#9311](https://github.com/NVIDIA/spark-rapids/issues/9311)|[BUG] test_hash_groupby_collect_list fails| +|[#9180](https://github.com/NVIDIA/spark-rapids/issues/9180)|[FEA][AUDIT][SPARK-44641] Incorrect result in certain scenarios when SPJ is not triggered| +|[#9290](https://github.com/NVIDIA/spark-rapids/issues/9290)|[BUG] delta_lake_test FAILED on "column mapping mode id is not supported for this Delta version"| +|[#9255](https://github.com/NVIDIA/spark-rapids/issues/9255)|[BUG] Unable to read DeltaTable with columnMapping.mode = name| +|[#9261](https://github.com/NVIDIA/spark-rapids/issues/9261)|[BUG] Leaks and Double Frees in Unit Tests| +|[#9246](https://github.com/NVIDIA/spark-rapids/issues/9246)|[BUG] `test_predefined_character_classes` failed with seed 4| +|[#9208](https://github.com/NVIDIA/spark-rapids/issues/9208)|[BUG] SplitAndRetryOOM query14_part1 at 100TB with spark.executor.cores=64| +|[#9106](https://github.com/NVIDIA/spark-rapids/issues/9106)|[BUG] Configuring GDS breaks new host spillable buffers and batches| +|[#9131](https://github.com/NVIDIA/spark-rapids/issues/9131)|[BUG] ConcurrentModificationException in ScalableTaskCompletion| +|[#9263](https://github.com/NVIDIA/spark-rapids/issues/9263)|[BUG] Unit test logging is not captured when running against Spark 3.5.0| +|[#9168](https://github.com/NVIDIA/spark-rapids/issues/9168)|[BUG] Calling RmmSpark.getAndResetNumRetryThrow from tests is not working| +|[#8776](https://github.com/NVIDIA/spark-rapids/issues/8776)|[BUG] FileCacheIntegrationSuite intermittent failure| +|[#9223](https://github.com/NVIDIA/spark-rapids/issues/9223)|[BUG] Failed to create memory map on query14_part1 at 100TB with spark.executor.cores=64| +|[#9116](https://github.com/NVIDIA/spark-rapids/issues/9116)|[BUG] spark350 shim build failed in mvn-verify github checks and nightly due to dependencies not released| +|[#8984](https://github.com/NVIDIA/spark-rapids/issues/8984)|[BUG] Check that keys are not null when creating a map| +|[#9233](https://github.com/NVIDIA/spark-rapids/issues/9233)|[BUG] test_parquet_testing_error_files - Failed: DID NOT RAISE in databricks runtime 12.2| +|[#9142](https://github.com/NVIDIA/spark-rapids/issues/9142)|[BUG] AWS EMR 6.12 NDS SF3k query9 Failure on g4dn.4xlarge| +|[#9214](https://github.com/NVIDIA/spark-rapids/issues/9214)|[BUG] mvn resolve dependencies failed missing rapids-4-spark-sql-plugin-api_2.12 of 311 shim| +|[#9204](https://github.com/NVIDIA/spark-rapids/issues/9204)|[BUG] SplitAndRetryOOM query78 at 100TB with spark.executor.cores=64| +|[#9213](https://github.com/NVIDIA/spark-rapids/issues/9213)|[BUG] Missing revision info in databricks shims failed nightly build| +|[#9206](https://github.com/NVIDIA/spark-rapids/issues/9206)|[BUG] test_datetime_roundtrip_with_legacy_rebase failed in databricks runtimes| +|[#9165](https://github.com/NVIDIA/spark-rapids/issues/9165)|[BUG] Data gen for key groups produces type-mismatch columns| +|[#9129](https://github.com/NVIDIA/spark-rapids/issues/9129)|[BUG] Writing Parquet map(map) column can not set the outer key as non-null.| +|[#9194](https://github.com/NVIDIA/spark-rapids/issues/9194)|[BUG] missing sql-plugin-api databricks artifacts in the nightly CI | +|[#9167](https://github.com/NVIDIA/spark-rapids/issues/9167)|[BUG] Ensure no udf-compiler internal nodes escape| +|[#9092](https://github.com/NVIDIA/spark-rapids/issues/9092)|[BUG] NDS query 64 falls back to CPU only for a shuffle| +|[#9071](https://github.com/NVIDIA/spark-rapids/issues/9071)|[BUG] `test_numeric_running_sum_window_no_part_unbounded` failed in MT tests| +|[#9154](https://github.com/NVIDIA/spark-rapids/issues/9154)|[BUG] Spark 3.5.0 nightly build failures (test_parquet_testing_error_files)| +|[#9149](https://github.com/NVIDIA/spark-rapids/issues/9149)|[BUG] compile failed in databricks runtimes due to new added TestReport| +|[#9041](https://github.com/NVIDIA/spark-rapids/issues/9041)|[BUG] Fix regression in Python UDAF support when running against Spark 3.5.0| +|[#9064](https://github.com/NVIDIA/spark-rapids/issues/9064)|[BUG][Spark 3.5.0] Re-enable test_hive_empty_simple_udf when 3.5.0-rc2 is available| +|[#9065](https://github.com/NVIDIA/spark-rapids/issues/9065)|[BUG][Spark 3.5.0] Reinstate cast map/array to string tests when 3.5.0-rc2 is available| +|[#9119](https://github.com/NVIDIA/spark-rapids/issues/9119)|[BUG] Predicate pushdown doesn't work for parquet files written by GPU| +|[#9103](https://github.com/NVIDIA/spark-rapids/issues/9103)|[BUG] test_select_complex_field fails in MT tests| +|[#9086](https://github.com/NVIDIA/spark-rapids/issues/9086)|[BUG] GpuBroadcastNestedLoopJoinExec can assert in doUnconditionalJoin| +|[#8939](https://github.com/NVIDIA/spark-rapids/issues/8939)|[BUG] q95 odd task failure in query95 at 30TB| +|[#9082](https://github.com/NVIDIA/spark-rapids/issues/9082)|[BUG] Race condition while spilling and aliasing a RapidsBuffer (regression)| +|[#9069](https://github.com/NVIDIA/spark-rapids/issues/9069)|[BUG] ParquetFormatScanSuite does not pass locally| +|[#8980](https://github.com/NVIDIA/spark-rapids/issues/8980)|[BUG] invalid escape sequences in pytests| +|[#7807](https://github.com/NVIDIA/spark-rapids/issues/7807)|[BUG] Round robin partitioning sort check falls back to CPU for cases that can be supported| +|[#8482](https://github.com/NVIDIA/spark-rapids/issues/8482)|[BUG] Potential leak on SplitAndRetry when iterator not fully drained| +|[#8942](https://github.com/NVIDIA/spark-rapids/issues/8942)|[BUG] NDS query 14 parts 1 and 2 both fail at SF100K| +|[#8778](https://github.com/NVIDIA/spark-rapids/issues/8778)|[BUG] GPU Parquet output for TIMESTAMP_MICROS is misinteterpreted by fastparquet as nanos| + +### PRs +||| +|:---|:---| +|[#9304](https://github.com/NVIDIA/spark-rapids/pull/9304)|Specify recoverWithNull when reading JSON files| +|[#9474](https://github.com/NVIDIA/spark-rapids/pull/9474)| Improve configuration handling in BatchWithPartitionData| +|[#9289](https://github.com/NVIDIA/spark-rapids/pull/9289)|Add tests to check compatibility with pyarrow| +|[#9522](https://github.com/NVIDIA/spark-rapids/pull/9522)|Update 23.10 changelog [skip ci]| +|[#9501](https://github.com/NVIDIA/spark-rapids/pull/9501)|Fix GpuSemaphore to support multiple threads per task| +|[#9500](https://github.com/NVIDIA/spark-rapids/pull/9500)|Fix Spark 3.5.0 shell classloader issue with the plugin| +|[#9230](https://github.com/NVIDIA/spark-rapids/pull/9230)|Fix reading partition value columns larger than cudf column size limit| +|[#9427](https://github.com/NVIDIA/spark-rapids/pull/9427)|[DOC] Update docs for 23.10.0 release [skip ci]| +|[#9421](https://github.com/NVIDIA/spark-rapids/pull/9421)|Init changelog of 23.10 [skip ci]| +|[#9445](https://github.com/NVIDIA/spark-rapids/pull/9445)|Only run test_csv_infer_schema_timestamp_ntz tests with PySpark >= 3.4.1| +|[#9420](https://github.com/NVIDIA/spark-rapids/pull/9420)|Update private and jni dep version to released 23.10.0| +|[#9415](https://github.com/NVIDIA/spark-rapids/pull/9415)|[BUG] fix docker modified check in premerge [skip ci]| +|[#9407](https://github.com/NVIDIA/spark-rapids/pull/9407)|[Doc]Update docs for 23.08.2 version[skip ci]| +|[#9392](https://github.com/NVIDIA/spark-rapids/pull/9392)|Only run test_json_ts_formats_round_trip_ntz tests with PySpark >= 3.4.1| +|[#9401](https://github.com/NVIDIA/spark-rapids/pull/9401)|Remove using mamba before they fix the incompatibility issue [skip ci]| +|[#9381](https://github.com/NVIDIA/spark-rapids/pull/9381)|Change the executor core calculation to take into account the cluster manager| +|[#9351](https://github.com/NVIDIA/spark-rapids/pull/9351)|Put back in full decimal support for format_number| +|[#9374](https://github.com/NVIDIA/spark-rapids/pull/9374)|GpuCoalesceBatches should throw SplitAndRetyOOM on GPU OOM error| +|[#9238](https://github.com/NVIDIA/spark-rapids/pull/9238)|Simplified handling of GPU core dumps| +|[#9362](https://github.com/NVIDIA/spark-rapids/pull/9362)|[DOC] Removing User Guide pages that will be source of truth on docs.nvidia…| +|[#9365](https://github.com/NVIDIA/spark-rapids/pull/9365)|Update DataWriteCommandExec docs to reflect ORC support for nested types| +|[#9277](https://github.com/NVIDIA/spark-rapids/pull/9277)|[Doc]Remove CUDA related requirement from download page.[Skip CI]| +|[#9352](https://github.com/NVIDIA/spark-rapids/pull/9352)|Refine rules for skipping `test_csv_infer_schema_timestamp_ntz_*` tests| +|[#9334](https://github.com/NVIDIA/spark-rapids/pull/9334)|Add NaNs to Data Generators In Floating-Point Testing| +|[#9344](https://github.com/NVIDIA/spark-rapids/pull/9344)|Update MULTITHREADED shuffle maxBytesInFlight default to 128MB| +|[#9330](https://github.com/NVIDIA/spark-rapids/pull/9330)|Add Hao to blossom-ci whitelist| +|[#9328](https://github.com/NVIDIA/spark-rapids/pull/9328)|Building different Cuda versions section profile does not take effect [skip ci]| +|[#9329](https://github.com/NVIDIA/spark-rapids/pull/9329)|Add kuhushukla to blossom ci yml| +|[#9281](https://github.com/NVIDIA/spark-rapids/pull/9281)|Support `format_number`| +|[#9335](https://github.com/NVIDIA/spark-rapids/pull/9335)|Temporarily skip failing tests test_csv_infer_schema_timestamp_ntz*| +|[#9318](https://github.com/NVIDIA/spark-rapids/pull/9318)|Update authorized user in blossom-ci whitelist [skip ci]| +|[#9221](https://github.com/NVIDIA/spark-rapids/pull/9221)|Add GPU version of ToPrettyString| +|[#9321](https://github.com/NVIDIA/spark-rapids/pull/9321)|[DOC] Fix some incorrect config links in doc [skip ci]| +|[#9314](https://github.com/NVIDIA/spark-rapids/pull/9314)|Fix RMM crash in FileCacheIntegrationSuite with ARENA memory allocator| +|[#9287](https://github.com/NVIDIA/spark-rapids/pull/9287)|Allow checkpoint and restore on non-deterministic expressions in GpuFilter and GpuProject| +|[#9146](https://github.com/NVIDIA/spark-rapids/pull/9146)|Improve some CSV integration tests| +|[#9159](https://github.com/NVIDIA/spark-rapids/pull/9159)|Update tests and documentation for `spark.sql.timestampType` when reading CSV/JSON| +|[#9313](https://github.com/NVIDIA/spark-rapids/pull/9313)|Sort results of collect_list test before comparing since it is not guaranteed| +|[#9286](https://github.com/NVIDIA/spark-rapids/pull/9286)|[FEA][AUDIT][SPARK-44641] Incorrect result in certain scenarios when SPJ is not triggered| +|[#9229](https://github.com/NVIDIA/spark-rapids/pull/9229)|Support negative preceding/following for ROW-based window functions| +|[#9297](https://github.com/NVIDIA/spark-rapids/pull/9297)|Append new authorized user to blossom-ci whitelist [skip ci]| +|[#9294](https://github.com/NVIDIA/spark-rapids/pull/9294)|Fix test_delta_read_column_mapping test failures on Spark 3.2.x and 3.3.x| +|[#9285](https://github.com/NVIDIA/spark-rapids/pull/9285)|Add CastOptions to make GpuCast extendible to handle more options| +|[#9279](https://github.com/NVIDIA/spark-rapids/pull/9279)|Fix file format checks to be exact and handle Delta Lake column mapping| +|[#9283](https://github.com/NVIDIA/spark-rapids/pull/9283)|Refactor ExternalSource to move some APIs to converted GPU format or scan| +|[#9264](https://github.com/NVIDIA/spark-rapids/pull/9264)|Fix leak in test and double free in corner case| +|[#9280](https://github.com/NVIDIA/spark-rapids/pull/9280)|Fix some issues found with different seeds in integration tests| +|[#9257](https://github.com/NVIDIA/spark-rapids/pull/9257)|Have host spill use the new HostAlloc API| +|[#9253](https://github.com/NVIDIA/spark-rapids/pull/9253)|Enforce Scala method syntax over deprecated procedure syntax| +|[#9273](https://github.com/NVIDIA/spark-rapids/pull/9273)|Add arm64 profile to build arm artifacts| +|[#9270](https://github.com/NVIDIA/spark-rapids/pull/9270)|Remove GDS spilling| +|[#9267](https://github.com/NVIDIA/spark-rapids/pull/9267)|Roll our own BufferedIterator so we can close cleanly| +|[#9266](https://github.com/NVIDIA/spark-rapids/pull/9266)|Specify correct dependency versions for 350 build| +|[#9262](https://github.com/NVIDIA/spark-rapids/pull/9262)|Add Delta Lake support for Spark 3.4.1 and Delta Lake tests on Spark 3.4.x| +|[#9256](https://github.com/NVIDIA/spark-rapids/pull/9256)|Test Parquet double column stat without NaN| +|[#9254](https://github.com/NVIDIA/spark-rapids/pull/9254)|[Doc]update the emr getting started doc for emr-6130 release[skip ci]| +|[#9228](https://github.com/NVIDIA/spark-rapids/pull/9228)|Add in unbounded to unbounded optimization for min/max| +|[#9252](https://github.com/NVIDIA/spark-rapids/pull/9252)|Add Spark 3.5.0 to list of supported Spark versions [skip ci]| +|[#9251](https://github.com/NVIDIA/spark-rapids/pull/9251)|Enable a couple of retry asserts in internal row to cudf row iterator suite| +|[#9239](https://github.com/NVIDIA/spark-rapids/pull/9239)|Handle escaping the dangling right ] and right } in the regexp transpiler| +|[#9090](https://github.com/NVIDIA/spark-rapids/pull/9090)|Add test cases for Parquet statistics| +|[#9240](https://github.com/NVIDIA/spark-rapids/pull/9240)|Fix flaky ORC filecache test| +|[#9053](https://github.com/NVIDIA/spark-rapids/pull/9053)|[DOC] update the turning guide document issues [skip ci]| +|[#9211](https://github.com/NVIDIA/spark-rapids/pull/9211)|Allow skipping host spill for a direct device->disk spill| +|[#9234](https://github.com/NVIDIA/spark-rapids/pull/9234)|Enable Spark 350 builds| +|[#9237](https://github.com/NVIDIA/spark-rapids/pull/9237)|Check for null keys when creating map| +|[#9235](https://github.com/NVIDIA/spark-rapids/pull/9235)|xfail fixed_length_byte_array.parquet test due to rapidsai/cudf#14104| +|[#9231](https://github.com/NVIDIA/spark-rapids/pull/9231)|Use conda libmamba solver to resolve intermittent libarchive issue [skip ci]| +|[#8404](https://github.com/NVIDIA/spark-rapids/pull/8404)|Add in support for FIXED_LEN_BYTE_ARRAY as binary| +|[#9225](https://github.com/NVIDIA/spark-rapids/pull/9225)|Add in a HostAlloc API for high priority and add in spilling| +|[#9207](https://github.com/NVIDIA/spark-rapids/pull/9207)|Support SplitAndRetry for GpuRangeExec| +|[#9217](https://github.com/NVIDIA/spark-rapids/pull/9217)|Fix leak in aggregate when there are retries| +|[#9200](https://github.com/NVIDIA/spark-rapids/pull/9200)|Fix a few minor things with scale test| +|[#9222](https://github.com/NVIDIA/spark-rapids/pull/9222)|Deploy classified aggregator for Databricks [skip ci]| +|[#9209](https://github.com/NVIDIA/spark-rapids/pull/9209)|Fix tests for datetime rebase in Databricks| +|[#9181](https://github.com/NVIDIA/spark-rapids/pull/9181)|[DOC] address document issues [skip ci]| +|[#9132](https://github.com/NVIDIA/spark-rapids/pull/9132)|Support `spark.sql.parquet.datetimeRebaseModeInWrite=LEGACY`| +|[#9196](https://github.com/NVIDIA/spark-rapids/pull/9196)|Fix host memory leak for R2C| +|[#9192](https://github.com/NVIDIA/spark-rapids/pull/9192)|Throw overflow exception when interval seconds are outside of range [0, 59]| +|[#9150](https://github.com/NVIDIA/spark-rapids/pull/9150)|add error section in report and the rest queries| +|[#9189](https://github.com/NVIDIA/spark-rapids/pull/9189)|Expose host store spill| +|[#9147](https://github.com/NVIDIA/spark-rapids/pull/9147)|Make map column non-nullable when it's a key in another map.| +|[#9193](https://github.com/NVIDIA/spark-rapids/pull/9193)|Support Retry for GpuLocalLimitExec and GpuGlobalLimitExec| +|[#9183](https://github.com/NVIDIA/spark-rapids/pull/9183)|Add test to verify UDT fallback for parquet| +|[#9195](https://github.com/NVIDIA/spark-rapids/pull/9195)|Deploy sql-plugin-api artifact in DBR CI pipelines [skip ci]| +|[#9170](https://github.com/NVIDIA/spark-rapids/pull/9170)|Add in new HostAlloc API| +|[#9182](https://github.com/NVIDIA/spark-rapids/pull/9182)|Consolidate Spark vendor shim dependency management| +|[#9190](https://github.com/NVIDIA/spark-rapids/pull/9190)|Prevent returning internal compiler expressions when compiling UDFs| +|[#9164](https://github.com/NVIDIA/spark-rapids/pull/9164)|Support Retry for GpuTopN and GpuSortEachBatchIterator| +|[#9134](https://github.com/NVIDIA/spark-rapids/pull/9134)|Fix shuffle fallback due to AQE on AWS EMR| +|[#9188](https://github.com/NVIDIA/spark-rapids/pull/9188)|Fix flaky tests in FileCacheIntegrationSuite| +|[#9148](https://github.com/NVIDIA/spark-rapids/pull/9148)|Add minimum Maven module eventually containing all non-shimmable source code| +|[#9169](https://github.com/NVIDIA/spark-rapids/pull/9169)|Add retry-without-split in InternalRowToColumnarBatchIterator| +|[#9172](https://github.com/NVIDIA/spark-rapids/pull/9172)|Remove doSetSpillable in favor of setSpillable| +|[#9152](https://github.com/NVIDIA/spark-rapids/pull/9152)|Add test cases for testing Parquet compression types| +|[#9157](https://github.com/NVIDIA/spark-rapids/pull/9157)|XFAIL parquet `lz4_raw` tests for Spark 3.5.0 or later| +|[#9128](https://github.com/NVIDIA/spark-rapids/pull/9128)|Test parquet predicate pushdown for basic types and fields having dots in names| +|[#9158](https://github.com/NVIDIA/spark-rapids/pull/9158)|Add json4s dependencies for Databricks integration_tests build| +|[#9102](https://github.com/NVIDIA/spark-rapids/pull/9102)|Add retry support to GpuOutOfCoreSortIterator.mergeSortEnoughToOutput | +|[#9089](https://github.com/NVIDIA/spark-rapids/pull/9089)|Add application to run Scale Test| +|[#9143](https://github.com/NVIDIA/spark-rapids/pull/9143)|[DOC] update spark.rapids.sql.concurrentGpuTasks default value in tuning guide [skip ci]| +|[#8476](https://github.com/NVIDIA/spark-rapids/pull/8476)|Use retry with split in GpuCachedDoublePassWindowIterator| +|[#9141](https://github.com/NVIDIA/spark-rapids/pull/9141)|Removed resultDecimalType in GpuIntegralDecimalDivide| +|[#9099](https://github.com/NVIDIA/spark-rapids/pull/9099)|Spark 3.5.0 follow-on work (rc2 support + Python UDAF)| +|[#9140](https://github.com/NVIDIA/spark-rapids/pull/9140)|Bump Jython to 2.7.3| +|[#9136](https://github.com/NVIDIA/spark-rapids/pull/9136)|Moving row column conversion code from cudf to jni| +|[#9133](https://github.com/NVIDIA/spark-rapids/pull/9133)|Add 350 tag to InSubqueryShims| +|[#9124](https://github.com/NVIDIA/spark-rapids/pull/9124)|Import `scala.collection` intead of `collection`| +|[#9122](https://github.com/NVIDIA/spark-rapids/pull/9122)|Fall back to CPU if `spark.sql.execution.arrow.useLargeVarTypes` is true| +|[#9115](https://github.com/NVIDIA/spark-rapids/pull/9115)|[DOC] updates documentation related to java compatibility [skip ci]| +|[#9098](https://github.com/NVIDIA/spark-rapids/pull/9098)|Add SpillableHostColumnarBatch| +|[#9091](https://github.com/NVIDIA/spark-rapids/pull/9091)|GPU support for DynamicPruningExpression and InSubqueryExec| +|[#9117](https://github.com/NVIDIA/spark-rapids/pull/9117)|Temply disable spark 350 shim build in nightly [skip ci]| +|[#9113](https://github.com/NVIDIA/spark-rapids/pull/9113)|Instantiate execution plan capture callback via shim loader| +|[#8969](https://github.com/NVIDIA/spark-rapids/pull/8969)|Initial support for Spark 3.5.0-rc1| +|[#9100](https://github.com/NVIDIA/spark-rapids/pull/9100)|Support broadcast nested loop existence joins with no condition| +|[#8925](https://github.com/NVIDIA/spark-rapids/pull/8925)|Add GpuConv operator for the `conv` 10<->16 expression| +|[#9109](https://github.com/NVIDIA/spark-rapids/pull/9109)|[DOC] adding java 11 to download docs [skip ci]| +|[#9085](https://github.com/NVIDIA/spark-rapids/pull/9085)|Retry with smaller split on `CudfColumnSizeOverflowException`| +|[#8961](https://github.com/NVIDIA/spark-rapids/pull/8961)|Save Databricks init scripts in the workspace| +|[#9088](https://github.com/NVIDIA/spark-rapids/pull/9088)|Add retry and SplitAndRetry support to AcceleratedColumnarToRowIterator| +|[#9095](https://github.com/NVIDIA/spark-rapids/pull/9095)|Support released spark 3.3.3| +|[#9084](https://github.com/NVIDIA/spark-rapids/pull/9084)|Fix race when a rapids buffer is aliased while it is spilled| +|[#9093](https://github.com/NVIDIA/spark-rapids/pull/9093)|Update ParquetFormatScanSuite to not call CUDF directly| +|[#9068](https://github.com/NVIDIA/spark-rapids/pull/9068)|Test ORC predicate pushdown (PPD) with timestamps decimals booleans| +|[#9054](https://github.com/NVIDIA/spark-rapids/pull/9054)|Initial entry point to data generation for scale test| +|[#9070](https://github.com/NVIDIA/spark-rapids/pull/9070)|Spillable host buffer| +|[#9066](https://github.com/NVIDIA/spark-rapids/pull/9066)|Add retry support to RowToColumnarIterator| +|[#9073](https://github.com/NVIDIA/spark-rapids/pull/9073)|Stop using invalid escape sequences| +|[#9018](https://github.com/NVIDIA/spark-rapids/pull/9018)|Add test for selecting a single complex field array and its parent struct array| +|[#9067](https://github.com/NVIDIA/spark-rapids/pull/9067)|Add array support for round robin partition; Refactor pluginSupportedOrderableSig| +|[#9072](https://github.com/NVIDIA/spark-rapids/pull/9072)|Revert "Implement SumUnboundedToUnboundedFixer (#8934)"| +|[#9056](https://github.com/NVIDIA/spark-rapids/pull/9056)|Add in configs for host memory limits| +|[#9061](https://github.com/NVIDIA/spark-rapids/pull/9061)|Fix import order| +|[#8934](https://github.com/NVIDIA/spark-rapids/pull/8934)|Implement SumUnboundedToUnboundedFixer| +|[#9051](https://github.com/NVIDIA/spark-rapids/pull/9051)|Use number of threads on executor instead of driver to set core count| +|[#9040](https://github.com/NVIDIA/spark-rapids/pull/9040)|Fix issues from 23.08 merge in join_test| +|[#9045](https://github.com/NVIDIA/spark-rapids/pull/9045)|Fix auto merge conflict 9043 [skip ci]| +|[#9009](https://github.com/NVIDIA/spark-rapids/pull/9009)|Add in a layer of indirection for task completion callbacks| +|[#9013](https://github.com/NVIDIA/spark-rapids/pull/9013)|Create a two-shim jar by default on Databricks| +|[#8995](https://github.com/NVIDIA/spark-rapids/pull/8995)|Add test case for ORC statistics test| +|[#8970](https://github.com/NVIDIA/spark-rapids/pull/8970)|Add ability to debug dump input data only on errors| +|[#9003](https://github.com/NVIDIA/spark-rapids/pull/9003)|Fix auto merge conflict 9002 [skip ci]| +|[#8989](https://github.com/NVIDIA/spark-rapids/pull/8989)|Mark lazy spillables as allowSpillable in during gatherer construction| +|[#8988](https://github.com/NVIDIA/spark-rapids/pull/8988)|Move big data generator to a separate module| +|[#8987](https://github.com/NVIDIA/spark-rapids/pull/8987)|Fix host memory buffer leaks in SerializationSuite| +|[#8968](https://github.com/NVIDIA/spark-rapids/pull/8968)|Enable GPU acceleration of Bloom filter join expressions by default| +|[#8947](https://github.com/NVIDIA/spark-rapids/pull/8947)|Add ArrowUtilsShims in preparation for Spark 3.5.0| +|[#8946](https://github.com/NVIDIA/spark-rapids/pull/8946)|[Spark 3.5.0] Shim access to StructType.fromAttributes| +|[#8824](https://github.com/NVIDIA/spark-rapids/pull/8824)|Drop the in-range check at INT96 output path| +|[#8924](https://github.com/NVIDIA/spark-rapids/pull/8924)|Deprecate and delegate GpuCV.debug to cudf TableDebug| +|[#8915](https://github.com/NVIDIA/spark-rapids/pull/8915)|Move LegacyBehaviorPolicy references to shim layer| +|[#8918](https://github.com/NVIDIA/spark-rapids/pull/8918)|Output unified diff when GPU output deviates| +|[#8857](https://github.com/NVIDIA/spark-rapids/pull/8857)|Remove the pageable pool| +|[#8854](https://github.com/NVIDIA/spark-rapids/pull/8854)|Fix auto merge conflict 8853 [skip ci]| +|[#8805](https://github.com/NVIDIA/spark-rapids/pull/8805)|Bump up dep versions to 23.10.0-SNAPSHOT| +|[#8796](https://github.com/NVIDIA/spark-rapids/pull/8796)|Init version 23.10.0-SNAPSHOT| ## Release 23.08