Generated on 2025-02-17
#11648 | [FEA] It would be nice if we could support org.apache.spark.sql.catalyst.expressions.Bin
#11891 | [FEA] Support Spark 3.5.4 release
#11928 | [FEA] Make maxCpuBatchSize in GpuPartitioning configurable
#10505 | [FOLLOW UP] Support row_number() filters for GpuWindowGroupLimitExec
#11853 | [FEA] Ability to dump tables on a write
#11804 | [FEA] Support TruncDate expression
#11674 | [FEA] HiveHash supports nested types
#11342 | [FEA] Put file writes in a background thread
#11729 | [FEA] Optimize the multi-contains generated by rlike
#11860 | [FEA] Kernel for date_trunc and trunc that has a scalar format
#11812 | [FEA] Support escape characters in search list when rewriting regexp_replace to string replace
#12091 | [BUG] An assertion error in the sized hash join
#12096 | [BUG] CI_PART1 for DBR 14.3 hangs in the nightly pre-release pipeline |
#12076 | [BUG] ExtraPlugins might be loaded duplicated |
#11433 | [BUG] Spark UT framework: SPARK-34212 Parquet should read decimals correctly |
#12038 | [BUG] spark321 failed with a core dump in nightly
#12046 | [BUG] orc_test fails non-UTC cases with "Part of the plan is not columnar class org.apache.spark.sql.execution.FileSourceScanExec"
#12039 | [BUG] HostAllocSuite failed "Maximum pool size exceeded" in nightly UT tests |
#12036 | [BUG] The check in assertIsOnTheGpu method to test if a plan is on the GPU is not accurate |
#11989 | [BUG] ParquetCachedBatchSerializer does not grab the GPU semaphore and does not have retry blocks |
#11651 | [BUG] Parse regular expressions using JDK to make error behavior more consistent between CPU and GPU |
#11628 | [BUG] Spark UT framework: select one deep nested complex field after join, IOException parsing parquet |
#11629 | [BUG] Spark UT framework: select one deep nested complex field after outer join, IOException parsing parquet |
#11620 | [BUG] Spark UT framework: "select a single complex field and partition column" causes java.lang.IndexOutOfBoundsException |
#11621 | [BUG] Spark UT framework: partial schema intersection - select missing subfield causes java.lang.IndexOutOfBoundsException |
#11619 | [BUG] Spark UT framework: "select a single complex field" causes java.lang.IndexOutOfBoundsException |
#11975 | [FOLLOWUP] We should have a separate version definition for the rapids-4-spark-hybrid dependency |
#11976 | [BUG] scala 2.13 rapids_integration test failed |
#11971 | [BUG] scala213 nightly build failed rapids-4-spark-tests_2.13 of spark400 |
#11903 | [BUG] Unexpected large output batches due to implementation defects |
#11914 | [BUG] Nightly CI does not upload sources and Javadoc JARs as the release script does |
#11896 | [BUG] [BUILD] CI passes without checking for operatorsScore.csv, supportedExprs.csv update
#11895 | [BUG] BasePythonRunner has a new parameter metrics in Spark 4.0 |
#11107 | [BUG] Rework RapidsShuffleManager initialization for Apache Spark 4.0.0 |
#11897 | [BUG] JsonScanRetrySuite is failing in the CI. |
#11885 | [BUG] data corruption with spill framework changes |
#11762 | [BUG] Non-nullable bools in a nullable struct fail
#11526 | Fix Arithmetic tests on Databricks 14.3 |
#11866 | [BUG] The CHANGELOG is generated based on the project's roadmap rather than the target branch.
#11749 | [BUG] Include Databricks 14.3 shim into the dist jar |
#11822 | [BUG] [Spark 4] Type mismatch Exceptions from DFUDFShims.scala with Spark-4.0.0 expressions.Expression |
#11760 | [BUG] isTimestamp leaks a Scalar |
#11796 | [BUG] populate-daily-cache action masks errors |
#10901 | from_json throws an exception when the JSON's structure only partially matches the provided schema
#11736 | [BUG] Orc writes don't fully support Booleans with nulls |
#12129 | Update dependency version JNI, private, hybrid to 25.02.0 [skip ci] |
#12102 | [DOC] Update the download page for the 25.02 release [skip ci]
#12112 | HybridParquetScan: Fix velox runtime error in hybrid scan when filtering on timestamps
#12092 | Fix an assertion error in the sized hash join |
#12114 | Fix HybridParquetScan over select(1) |
#12109 | Revert UCX 1.18 upgrade
#12103 | Revert "Enable event log for qualification & profiling tools testing … |
#12058 | Upgrade jucx to 1.18
#12077 | Fix the issue of ExtraPlugins loading multiple times |
#12080 | Quick fix for hybrid tests without git information. |
#12068 | Do not build Spark-4.0.0-SNAPSHOT [skip ci] |
#12064 | Run mvn with the project's pom.xml in hybrid_execution.sh |
#12060 | Relax decimal metadata checks for mismatched precision/scale |
#12054 | Update the version of the rapids-hybrid-execution dependency. |
#11970 | Explicitly set Delta table props to accommodate for different defaults |
#12044 | Set CI=true for complete failure reason in summary |
#12050 | Fixed FileSourceScanExec and BatchScanExec inadvertently falling back to the CPU in non-UTC ORC tests
#12000 | HybridParquetScan: Refine filter push down to avoid double evaluation |
#12037 | Removed the assumption that if a plan is columnar it is probably on the GPU
#11991 | Grab the GPU Semaphore when reading cached batch data with the GPU |
#11880 | Perform handle spill IO outside of locked section in SpillFramework |
#11997 | Configure 14.3 support at runtime |
#11977 | Use bounce buffer pools in the Spill Framework |
#11912 | Ensure Java Compatibility Check for Regex Patterns |
#11984 | Include the size information when printing an SCB
#11889 | Change order of initialization so pinned pool is available for spill framework buffers |
#11956 | Enable tests in RapidsParquetSchemaPruningSuite |
#11981 | Protect the batch read by a retry block in agg |
#11967 | Add support for org.apache.spark.sql.catalyst.expressions.Bin |
#11982 | Use common add-to-project action [skip ci] |
#11978 | Try to fix Scala 2.13 nightly failure: can not find version-def.sh |
#11973 | Minor change: Make Hybrid version a separate config like the private repo
#11969 | Support raise_error() on 14.3, Spark 4. |
#11972 | Update MockTaskContext to support new functions added in Spark-4.0 |
#11906 | Enable Hybrid test cases in premerge/nightly CIs |
#11720 | Introduce hybrid (CPU) scan for Parquet read |
#11911 | Avoid concatenating multiple host buffers when reading Parquet
#11960 | Remove jlowe as committer since he retired |
#11958 | Update to use vulnerability-scan runner [skip ci] |
#11955 | Add Spark 3.5.4 shim |
#11959 | Remove inactive user from github workflow[skip ci] |
#11952 | Fix auto merge conflict 11948 [skip ci] |
#11908 | Fix two potential OOM issues in GPU aggregate. |
#11936 | Add throttle time metrics for async write |
#11929 | Make maxCpuBatchSize in GpuPartitioning configurable
#11939 | [DOC] Update release notes to add Spark 3.5.3 support [skip ci]
#11920 | Remove Alluxio support |
#11938 | Update codeowners file to use team [skip ci] |
#11915 | Deploy the sources and Javadoc JARs in the nightly CICD [skip ci] |
#11917 | Fix issue with CustomerShuffleReaderExec metadata copy |
#11910 | Fix bug: enable if_modified_files check for all shims in GitHub Actions [skip ci]
#11909 | Update copyright year in NOTICE [skip ci] |
#11907 | Fix generated doc for xxhash64 for Spark 400 |
#11905 | Fix the build error for Spark 400 |
#11904 | Eagerly initialize RapidsShuffleManager for SPARK-45762 |
#11865 | Async write support for ORC |
#11816 | Address some comments for #11792
#11789 | Improve the retry support for nondeterministic expressions |
#11898 | Add missing json reader options for JsonScanRetrySuite |
#11859 | Xxhash64 supports nested types |
#11890 | Update operatorsScore, supportedExprs for TruncDate, TruncTimestamp
#11886 | Support group-limit optimization for ROW_NUMBER |
#11887 | Make sure that the chunked packer bounce buffer is released after the synchronize
#11894 | Fix bug: add timeout for cache deps steps [skip ci] |
#11810 | Use faster multi-contains in rlike regex rewrite |
#11882 | Add metrics GpuPartitioning.CopyToHostTime |
#11864 | Add support for dumping write data to try and reproduce error cases |
#11781 | Fix non-nullable under nullable struct write |
#11877 | Fix auto merge conflict 11873 [skip ci] |
#11833 | Support trunc and date_trunc SQL function |
#11660 | Add HiveHash support for nested types |
#11855 | Add integration test for parquet async writer |
#11747 | Spill framework refactor for better performance and extensibility |
#11870 | Workaround: Exclude cudf_log.txt in RAT check |
#11867 | Generate the CHANGELOG based on the PR's target branch [skip ci] |
#11821 | Add a few more stage-level metrics
#11856 | Document Hive text write serialization format checks |
#11805 | Enable some integration tests for from_json |
#11840 | Support running Databricks CI_PART2 integration tests with JARs built by CI_PART1 |
#11847 | Some small improvements |
#11811 | Fix bug: populate cache deps [skip ci] |
#11817 | Optimize Databricks Jenkins scripts [skip ci] |
#11829 | Some minor improvements identified during benchmark |
#11827 | Deal with Spark changes for column<->expression conversions |
#11826 | Balance the pre-merge CI job's time for the ci_1 and ci_2 tests |
#11784 | Add support for kudo write metrics |
#11783 | Fix the task count check in TrafficController |
#11813 | Support some escape chars when rewriting regexp_replace to stringReplace |
#11819 | Add the 'test_type' parameter for Databricks script |
#11786 | Enable license header check |
#11791 | Incorporate checksum of internal dependencies in the GH cache key [skip ci] |
#11788 | Support running Databricks CI_PART2 integration tests with JARs built by CI_PART1 |
#11778 | Remove unnecessary toBeReturned field from serialized batch iterators |
#11785 | Update advanced configs introduced by private repo [skip ci] |
#11772 | Update rapids JNI and private dependency to 25.02.0-SNAPSHOT |
#11756 | Remove excluded release shim and TODO
#11630 | [FEA] enable from_json and json scan by default |
#11709 | [FEA] Add support for MonthsBetween |
#11666 | [FEA] support task limit profiling for specified stages |
#11662 | [FEA] Support Apache Spark 3.4.4 |
#11657 | [FEA] Support format 'yyyyMMdd HH:mm:ss' for legacy mode |
#11419 | [FEA] Support Spark 3.5.3 release |
#11505 | [FEA] Support yyyymmdd format for GetTimestamp for LEGACY mode. |
#8391 | [FEA] Do a hash-based re-partition instead of a sort-based fallback for hash aggregate
#11560 | [FEA] Improve GpuJsonToStructs performance |
#11458 | [FEA] enable prune_columns for from_json |
#11842 | [BUG] udf-examples-native case failed with a core dump
#11718 | [BUG] Update date/time APIs in cuDF Java to avoid deprecated functions
#10907 | from_json throws an exception when parsing a column containing an empty array.
#11807 | [BUG] Mismatched CPU and GPU result in test_lead_lag_for_structs_with_arrays intermittently
#11793 | [BUG] "Time in Heuristic" should not include the previous operator's compute time
#11798 | [BUG] Mismatched CPU and GPU result in test_months_between_first_day[DATAGEN_SEED=1733006411, TZ=Africa/Casablanca]
#11790 | [BUG] test_hash_* failed "java.util.NoSuchElementException: head of empty list" or "Too many times of repartition, may hit a bug?" |
#11643 | [BUG] Support AQE with Broadcast Hash Join and DPP on Databricks 14.3 |
#10910 | from_json: RAPIDS throws an exception when the input is an empty object.
#10891 | Parsing a column containing invalid JSON into StructureType with a schema throws an exception.
#11741 | [BUG] Fix spark400 build due to writeWithV1 return value change |
#11533 | Fix JSON Matrix tests on Databricks 14.3 |
#11722 | [BUG] Spark 4.0.0 has moved NullIntolerant and builds are breaking because they are unable to find it. |
#11726 | [BUG] Databricks 14.3 nightly deploy fails due to incorrect DB_SHIM_NAME |
#11293 | [BUG] A user query with from_json failed with "JSON Parser encountered an invalid format at location" |
#9592 | [BUG][JSON] from_json to Map type should produce null for invalid entries |
#11715 | [BUG] parquet_testing_test.py failed on "AssertionError: GPU and CPU boolean values are different" |
#11716 | [BUG] delta_lake_write_test.py failed on "AssertionError: GPU and CPU boolean values are different" |
#11684 | [BUG] 24.12 Precommit fails with wrong number of arguments in GpuDataSource |
#11168 | [BUG] reserve allocation should be displayed when erroring due to lack of memory on startup |
#7585 | [BUG] [Regexp] Line anchor '$' incorrect matching of unicode line terminators |
#11622 | [BUG] GPU Parquet scan filter pushdown fails with timestamp/INT96 column |
#11646 | [BUG] NullPointerException in GpuRand |
#10498 | [BUG] Unit tests failed: [INTERVAL_ARITHMETIC_OVERFLOW] integer overflow. Use 'try_add' to tolerate overflow and return NULL instead |
#11659 | [BUG] parse_url throws exception if partToExtract is invalid while Spark returns null |
#10894 | Parsing a column containing a nested structure to JSON throws an exception
#10895 | Converting a column containing a map into JSON throws an exception
#10896 | Converting a column containing an array into JSON throws an exception
#10915 | to_json throws an exception when converting an array
#10916 | to_json function doesn't support map[string, struct] to JSON conversion.
#10919 | to_json converting map[string, integer] to JSON throws an exception
#10920 | to_json converting an array with maps throws an exception. |
#10921 | to_json - array with single map |
#10923 | [BUG] Spark UT framework: to_json converting an array with a single empty row to a JSON string throws an exception.
#10924 | [BUG] Spark UT framework: to_json converting an empty array into JSON throws an exception.
#11024 | Fix test failures in parquet_write_test.py
#11174 | Opcode Suite fails for Scala 2.13.8+ |
#10483 | [BUG] JsonToStructs fails to parse all empty dicts and invalid lines |
#10489 | [BUG] from_json does not support input with \n in it. |
#10347 | [BUG] Failures in Integration Tests on Dataproc Serverless |
#11021 | Fix test failures in orc_cast_test.py
#11609 | [BUG] test_hash_repartition_long_overflow_ansi_exception failed on 341DB |
#11600 | [BUG] regex_test failed: mismatched CPU and GPU values in UT and IT
#11611 | [BUG] Spark 4.0 build failure - value cannotSaveIntervalIntoExternalStorageError is not a member of object org.apache.spark.sql.errors.QueryCompilationErrors |
#10922 | from_json does not support line separators in the input string.
#11009 | Fix test failures in cast_test.py
#11572 | [BUG] MultiFileReaderThreadPool may flood the console with log messages |
#11950 | Update latest changelog [skip ci] |
#11947 | Update version to 24.12.1-SNAPSHOT [skip ci] |
#11943 | Update rapids JNI dependency to 24.12.1 |
#11944 | Update download page for 24.12.1 hot fix release [skip ci] |
#11876 | Update latest changelog [skip ci] |
#11874 | Remove 350db143 shim's build [skip ci] |
#11851 | Update latest changelog [skip ci] |
#11849 | Update rapids JNI and private dependency to 24.12.0 |
#11841 | [DOC] Update doc for the 24.12 release [skip ci]
#11857 | Increase the pre-merge CI timeout to 6 hours |
#11845 | Fix leak in isTimeStamp |
#11823 | Fix for LEAD/LAG window function test failures. |
#11832 | Fix leak in GpuBroadcastNestedLoopJoinExecBase |
#11763 | Orc writes don't fully support Booleans with nulls |
#11794 | Exclude the previous operator's time from firstBatchHeuristic
#11802 | Fall back to CPU for non-UTC months_between |
#11792 | [BUG] Fix issue #11790
#11768 | Fix dpp_test.py failures on 14.3 |
#11752 | Ability to decompress snappy and zstd Parquet files via CPU |
#11777 | Append knoguchi22 to blossom-ci whitelist [skip ci] |
#11712 | Repartition-based fallback for hash aggregate v3
#11771 | Fix query hang when using rapids multithread shuffle manager with kudo |
#11759 | Avoid using StringBuffer in single-threaded methods. |
#11766 | Fix Kudo batch serializer to only read header in hasNext |
#11730 | Add support for asynchronous writing for parquet |
#11750 | Fix aqe_test failures on 14.3. |
#11753 | Enable JSON Scan and from_json by default |
#11733 | Print out the current attempt object when OOM inside a retry block |
#11618 | Execute from_json with struct schema using JSONUtils.fromJSONToStructs |
#11725 | Host watermark metric
#11746 | Remove batch size bytes limits |
#11723 | Add NVIDIA Copyright |
#11721 | Add a few more JSON tests for MAP<STRING,STRING> |
#11744 | Do not package the Databricks 14.3 shim into the dist jar [skip ci] |
#11724 | Integrate with kudo |
#11739 | Update to Spark 4.0 changing signature of SupportsV1Write.writeWithV1 |
#11737 | Add in support for months_between |
#11700 | Fix leak with RapidsHostColumnBuilder in GpuUserDefinedFunction |
#11727 | Widen type promotion for decimals with larger scale in Parquet Read |
#11719 | Skip from_json overflow tests for 14.3 |
#11708 | Support profiling for specific stages on a limited number of tasks |
#11731 | Add NullIntolerantShim to adapt to Spark 4.0 removing NullIntolerant |
#11413 | Support multi string contains |
#11728 | Change Databricks 14.3 shim name to spark350db143 [skip ci] |
#11702 | Improve JSON scan and from_json |
#11635 | Added shims for Databricks 14.3 support
#11714 | Let AWS Databricks automatically choose an Availability Zone |
#11703 | Simplify $ transpiling and fix newline character bug |
#11707 | impalaFile cannot be found by the UT framework.
#11697 | Make delta-lake shim dependencies parametrizable |
#11710 | Add shim version 344 to LogicalPlanShims.scala |
#11706 | Add retry support in sub hash join |
#11673 | Fix Parquet Writer tests on 14.3 |
#11669 | Fix string_test for 14.3 |
#11692 | Add Spark 3.4.4 Shim |
#11695 | Fix spark400 build due to LogicalRelation signature changes |
#11689 | Update the Maven repository to download Spark JAR files [skip ci] |
#11670 | Fix misc_expr_test for 14.3 |
#11652 | Fix skipping fixed_length_char ORC tests on > 13.3 |
#11644 | Skip AQE-join-DPP tests for 14.3 |
#11667 | Preparation for the coming Kudo support |
#11685 | Exclude shimplify-generated files from scalastyle |
#11282 | Reserve allocation should be displayed when erroring due to lack of memory on startup |
#11671 | Use the new host memory allocation API |
#11682 | Fix auto merge conflict 11679 [skip ci] |
#11663 | Simplify Transpilation of $ with Extended Line Separator Support in cuDF Regex |
#11672 | Fix race condition with Parquet filter pushdown modifying shared hadoop Configuration |
#11596 | Add a new NVTX range for task GPU ownership |
#11664 | Fix orc_write_test.py for 14.3 |
#11656 | [DOC] Update the supported OS in the download page [skip ci]
#11665 | Generate classes identical up to the shim package name |
#11647 | Fix an NPE issue in GpuRand
#11658 | Support format 'yyyyMMdd HH:mm:ss' for legacy mode |
#11661 | Support invalid partToExtract for parse_url |
#11520 | UT: adjust the checkScanSchemata override and enable the exclude_by_suffix feature UTs
#11634 | Put DF_UDF plugin code into the main uber jar. |
#11522 | Adjust the UT for SPARK-26677: negated null-safe equality comparison
#11521 | Fix datetime rebasing issue
#11642 | Update to_json to be more generic and fix some bugs |
#11615 | Spark 4 parquet_writer_test.py fixes |
#11623 | Fix collection_ops_test for 14.3 |
#11553 | Fix udf-compiler scala2.13 internal return statements |
#11640 | Disable date/timestamp types by default when parsing JSON |
#11570 | Add support for Spark 3.5.3 |
#11591 | Spark UT framework: adjust the "Read Parquet file generated by parquet-thrift" UT case for RAPIDS
#11631 | Update JSON tests based on closed/fixed issues
#11617 | Quick fix for the build script failure of Scala 2.13 jars [skip ci] |
#11614 | Ensure repartition overflow test always overflows |
#11612 | Revert "Disable regex tests to unblock CI (#11606)" |
#11597 | install_deps changes for Databricks 14.3 |
#11608 | Use mvn -f scala2.13/ in the build scripts to build the 2.13 jars |
#11610 | Change DataSource calendar interval error to fix spark400 build |
#11549 | Adopt JSONUtils.concatenateJsonStrings for concatenating JSON strings |
#11595 | Remove an unused config shuffle.spillThreads |
#11606 | Disable regex tests to unblock CI |
#11605 | Fix auto merge conflict 11604 [skip ci] |
#11587 | Avoid long tail tasks due to PrioritySemaphore (remaining part)
#11574 | Avoid long tail tasks due to PrioritySemaphore
#11559 | [Spark 4.0] Address test failures in cast_test.py |
#11579 | Fix merge conflict with branch-24.10 |
#11571 | Log reconfigure multi-file thread pool only once |
#11564 | Disk spill metric |
#11561 | Add in a basic plugin for dataframe UDF support in Apache Spark |
#11563 | Fix the latest merge conflict in integration tests |
#11542 | Update rapids JNI and private dependency to 24.12.0-SNAPSHOT [skip ci] |
#11493 | Support legacy mode for yyyymmdd format |
The changelog of older releases can be found at docs/archives.