Generated on 2025-02-17
#11525 | [FEA] If dump always is enabled dump before decoding the file |
#11461 | [FEA] Support non-UTC timezone for casting from date to timestamp |
#11445 | [FEA] Support format 'yyyyMMdd' in GetTimestamp operator |
#11442 | [FEA] Add in support for setting row group sizes for parquet |
#11330 | [FEA] Add companion metrics for all nsTiming metrics to measure time elapsed excluding semaphore wait |
#5223 | [FEA] Support array_join |
#10968 | [FEA] support min_by function |
#10437 | [FEA] Add Spark 3.5.2 snapshot support |
#10799 | [FEA] Optimize count distinct performance optimization with null columns reuse and post expand coalesce |
#8301 | [FEA] semaphore prioritization |
#11234 | Explore swapping build table for left outer joins |
#11263 | [FEA] Cluster/pack multi_get_json_object paths by common prefixes |
#11558 | [BUG] test_sortmerge_join_ridealong fails on DB 13.3 |
#11573 | [BUG] very long tail task is observed when many tasks are contending for PrioritySemaphore |
#11367 | [BUG] Error "table_view.cpp:36: Column size mismatch" when using approx_percentile on a string column |
#11543 | [BUG] test_yyyyMMdd_format_for_legacy_mode[DATAGEN_SEED=1727619674, TZ=UTC] failed GPU and CPU are not both null |
#11500 | [BUG] dataproc serverless Integration tests failing in json_matrix_test.py |
#11384 | [BUG] "rs. shuffle write time" negative values seen in app history log |
#11509 | [BUG] buildall no longer works |
#11501 | [BUG] test_yyyyMMdd_format_for_legacy_mode failed in Dataproc Serverless integration tests |
#11502 | [BUG] IT script failed to get jars as we stopped deploying intermediate jars since 24.10 |
#11479 | [BUG] spark400 build failed do not conform to class UnaryExprMeta's type parameter |
#8558 | [BUG] from_json generated inconsistent result comparing with CPU for input column with nested json strings |
#11485 | [BUG] Integration tests failing in join_test.py |
#11481 | [BUG] non-utc integration tests failing in json_test.py |
#10911 | from_json: when input is a bad json string, rapids would throw an exception. |
#10457 | [BUG] ScanJson and JsonToStructs allow unquoted control chars by default |
#10479 | [BUG] JsonToStructs and ScanJson should return null for non-numeric, non-boolean non-quoted strings |
#10534 | [BUG] Need Improved JSON Validation |
#11436 | [BUG] Mortgage unit tests fail with RAPIDS shuffle manager |
#11437 | [BUG] array and map casts to string tests failed |
#11463 | [BUG] hash_groupby_approx_percentile failed assert is None |
#11465 | [BUG] java.lang.NoClassDefFoundError: org/apache/spark/BuildInfo$ in non-databricks environment |
#11359 | [BUG] a couple of arithmetic_ops_test.py cases failed mismatching cpu and gpu values with [DATAGEN_SEED=1723985531, TZ=UTC, INJECT_OOM] |
#11392 | [AUDIT] Handle IgnoreNulls Expressions for Window Expressions |
#10770 | [BUG] Slow/no progress with cascaded pandas udfs/mapInPandas in Databricks |
#11397 | [BUG] We should not be using copyWithBooleanColumnAsValidity unless we can prove it is 100% safe |
#11372 | [BUG] spark400 failed compiling datagen_2.13 |
#11364 | [BUG] Missing numRows in the ColumnarBatch created in GpuBringBackToHost |
#11350 | [BUG] spark400 compile failed in scala213 |
#11346 | [BUG] Databricks nightly failing, not able to get spark-version-info.properties |
#9604 | [BUG] Delta Lake metadata query detection can trigger extra file listing jobs |
#11318 | [BUG] GPU query is case sensitive on Hive text table's column name |
#10596 | [BUG] ScanJson and JsonToStructs does not deal with escaped single quotes properly |
#10351 | [BUG] test_from_json_mixed_types_list_struct failed |
#11294 | [BUG] binary-dedupe leaves around a copy of "unshimmed" class files in spark-shared |
#11183 | [BUG] Failed to split an empty string with error "ai.rapids.cudf.CudfException: parallel_for failed: cudaErrorInvalidDevice: invalid device ordinal" |
#11008 | Fix test failures in ast_test.py |
#11265 | [BUG] segfaults seen in cuDF after prefetch calls intermittently |
#11683 | [DOC] update download page for 2410 hot fix release [skip ci] |
#11680 | Update latest changelog [skip ci] |
#11678 | Update version to 24.10.1-SNAPSHOT [skip ci] |
#11676 | Fix race condition with Parquet filter pushdown modifying shared hadoop Configuration |
#11626 | Update latest changelog [skip ci] |
#11624 | Update the download link [skip ci] |
#11577 | Update latest changelog [skip ci] |
#11576 | Update rapids JNI and private dependency to 24.10.0 |
#11582 | [DOC] update doc for 24.10 release [skip ci] |
#11414 | Fix collection_ops_tests for Spark 4.0 |
#11588 | backport fixes of #11573 to branch 24.10 |
#11569 | Have "dump always" dump input files before trying to decode them |
#11544 | Update test case related to LEGACY datetime format to unblock nightly CI |
#11567 | Fix test case unix_timestamp(col, 'yyyyMMdd') failed for Africa/Casablanca timezone and LEGACY mode |
#11519 | Spark 4: Fix parquet_test.py |
#11496 | Update test now that code is fixed |
#11548 | Fix negative rs. shuffle write time |
#11545 | Update test case related to LEGACY datetime format to unblock nightly CI |
#11515 | Propagate default DIST_PROFILE_OPT profile to Maven in buildall |
#11497 | Update from_json to use new cudf features |
#11516 | Deploy all submodules for default sparkver in nightly [skip ci] |
#11484 | Fix FileAlreadyExistsException in LORE dump process |
#11457 | GPU device watermark metrics |
#11507 | Replace libmamba-solver with mamba command [skip ci] |
#11503 | Download artifacts via wget [skip ci] |
#11490 | Use UnaryLike instead of UnaryExpression |
#10798 | Optimizing Expand+Aggregate in sqls with many count distinct |
#11366 | Enable parquet suites from Spark UT |
#11477 | Install cuDF-py against python 3.10 on Databricks |
#11462 | Support non-UTC timezone for casting from date type to timestamp type |
#11449 | Support yyyyMMdd in GetTimestamp operator for LEGACY mode |
#11456 | Enable tests for all JSON white space normalization |
#11483 | Use reusable auto-merge workflow [skip ci] |
#11482 | Fix a json test for non utc time zone |
#11464 | Use improved CUDF JSON validation |
#11474 | Enable tests after string_split was fixed |
#11473 | Revert "Skip test_hash_groupby_approx_percentile byte and double test… |
#11466 | Replace scala.util.Try with a try statement in the DBR buildinfo |
#11469 | Skip test_hash_groupby_approx_percentile byte and double tests tempor… |
#11429 | Fixed some of the failing parquet_tests |
#11455 | Log DBR BuildInfo |
#11451 | xfail array and map cast to string tests |
#11331 | Add companion metrics for all nsTiming metrics without semaphore |
#11421 | [DOC] remove the redundant archive link [skip ci] |
#11308 | Dynamic Shim Detection for build Process |
#11427 | Update CI scripts to work with the "Dynamic Shim Detection" change [skip ci] |
#11425 | Update signoff usage [skip ci] |
#11420 | Add in array_join support |
#11418 | stop using copyWithBooleanColumnAsValidity |
#11411 | Fix asymmetric join crash when stream side is empty |
#11395 | Fix a Pandas UDF slowness issue |
#11371 | Support MinBy and MaxBy for non-float ordering |
#11399 | stop using copyWithBooleanColumnAsValidity |
#11389 | prevent duplicate queueing in the prio semaphore |
#11291 | Add distinct join support for right outer joins |
#11396 | Drop cudf-py python 3.9 support [skip ci] |
#11393 | Revert work-around for empty split-string |
#11334 | Add support for Spark 3.5.2 |
#11388 | JSON tests for corrected date, timestamp, and mixed types |
#11375 | Fix spark400 build in datagen and tests |
#11376 | Create a PrioritySemaphore to back the GpuSemaphore |
#11383 | Fix nightly snapshots being downloaded in premerge build |
#11368 | Move SparkRapidsBuildInfoEvent to its own file |
#11329 | Change reference to MapUtils into JSONUtils |
#11365 | Set numRows for the ColumnBatch created in GpuBringBackToHost |
#11363 | Fix failing test compile for Spark 4.0.0 |
#11362 | Add tests for repeated JSON columns/keys |
#11321 | conform dependency list in 341db to previous versions style |
#10604 | Add string escaping JSON tests to the test_json_matrix |
#11328 | Swap build side for outer joins when natural build side is explosive |
#11358 | Fix download doc [skip ci] |
#11357 | Fix auto merge conflict 11354 [skip ci] |
#11347 | Revert "Fix the mismatching default configs in integration tests (#11283)" |
#11323 | replace inputFiles with location.rootPaths.toString |
#11340 | Audit script - Check commits from sql-hive directory [skip ci] |
#11283 | Fix the mismatching default configs in integration tests |
#11327 | Make hive column matches not case-sensitive |
#11324 | Append ustcfy to blossom-ci whitelist [skip ci] |
#11325 | Fix auto merge conflict 11317 [skip ci] |
#11319 | Update passing JSON tests after list support added in CUDF |
#11307 | Safely close multiple resources in RapidsBufferCatalog |
#11313 | Fix auto merge conflict 10845 11310 [skip ci] |
#11312 | Add jihoonson as an authorized user for blossom-ci [skip ci] |
#11302 | Fix display issue of lore.md |
#11301 | Skip deploying non-critical intermediate artifacts [skip ci] |
#11299 | Enable get_json_object by default and remove legacy version |
#11289 | Use the new chunked API from multi-get_json_object |
#11295 | Remove redundant classes from the dist jar and unshimmed list |
#11284 | Use distinct count to estimate join magnification factor |
#11288 | Move easy unshimmed classes to sql-plugin-api |
#11285 | Remove files under tools/generated_files/spark31* [skip ci] |
#11280 | Asynchronously copy table data to the host during shuffle |
#11258 | Explicitly disable ANSI mode for ast_test.py |
#11267 | Update the rapids JNI and private dependency version to 24.10.0-SNAPSHOT |
#9259 | [FEA] Create Spark 4.0.0 shim and build env |
#10366 | [FEA] It would be nice if we could support Hive-style write bucketing table |
#10987 | [FEA] Implement lore framework to support all operators. |
#11087 | [FEA] Support regex pattern with brackets when rewrite to PrefixRange pattern in rlike |
#22 | [FEA] Add support for bucketed writes |
#9939 | [FEA] GpuInsertIntoHiveTable supports parquet format |
#8750 | [FEA] Rework GpuSubstringIndex to use cudf::slice_strings |
#7404 | [FEA] explore a hash agg passthrough on partial aggregates |
#10976 | Rewrite `pattern1 |
#11287 | [BUG] String split APIs on empty string produce incorrect result |
#11270 | [BUG] test_regexp_replace[DATAGEN_SEED=1722297411, TZ=UTC] hanging there forever in pre-merge CI intermittently |
#9682 | [BUG] Casting FLOAT64 to DECIMAL(12,7) produces different rows from Apache Spark CPU |
#10809 | [BUG] cast(9.95 as decimal(3,1)), actual: 9.9, expected: 10.0 |
#11266 | [BUG] test_broadcast_hash_join_constant_keys failed in databricks runtimes |
#11243 | [BUG] ArrayIndexOutOfBoundsException on a left outer join |
#11030 | Fix test failures in string_test.py |
#11245 | [BUG] mvn verify for the source-javadoc fails and no pre-merge check catches it |
#11223 | [BUG] Remove unreferenced CUDF_VER=xxx in the CI script |
#11114 | [BUG] Update nightly tests for Scala 2.13 to use JDK 17 only |
#11229 | [BUG] test_delta_name_column_mapping_no_field_ids fails on Spark |
#11031 | Fix test failures in multiple files |
#10948 | Figure out why MapFromArrays appears in the tests for hive parquet write |
#11018 | Fix test failures in hash_aggregate_test.py |
#11173 | [BUG] The rs. serialization time metric is misleading |
#11017 | Fix test failures in url_test.py |
#11201 | [BUG] Delta Lake tables with name mapping can throw exceptions on read |
#11175 | [BUG] Clean up unused and duplicated 'org/roaringbitmap' folder in the spark3xx shims |
#11196 | [BUG] pipeline failed due to class not found exception: NoClassDefFoundError: com/nvidia/spark/rapids/GpuScalar |
#11189 | [BUG] regression in NDS after PR #11170 |
#11167 | [BUG] UnsupportedOperationException during delta write with optimize() |
#11172 | [BUG] get_json_object returns wrong output with wildcard path |
#11148 | [BUG] Integration test test_write_hive_bucketed_table fails |
#11155 | [BUG] ArrayIndexOutOfBoundsException in BatchWithPartitionData.splitColumnarBatch |
#11152 | [BUG] LORE dumping consumes too much memory. |
#11029 | Fix test failures in subquery_test.py |
#11150 | [BUG] hive_parquet_write_test.py::test_insert_hive_bucketed_table failure |
#11070 | [BUG] numpy2 fail fastparquet cases: numpy.dtype size changed |
#11136 | UnaryPositive expression doesn't extend UnaryExpression |
#11122 | [BUG] UT MetricRange failed 651070526 was not less than 1.5E8 in spark313 |
#11119 | [BUG] window_function_test.py::test_window_group_limits_fallback_for_row_number fails in a distributed environment |
#11023 | Fix test failures in dpp_test.py |
#11026 | Fix test failures in map_test.py |
#11020 | Fix test failures in grouping_sets_test.py |
#11113 | [BUG] Update premerge tests for Scala 2.13 to use JDK 17 only |
#11027 | Fix test failures in sort_test.py |
#10775 | [BUG] Issues found by Spark UT Framework on RapidsStringExpressionsSuite |
#11033 | [BUG] CICD failed a case: cmp_test.py::test_empty_filter[>] |
#11103 | [BUG] UCX Shuffle With scala.MatchError |
#11007 | Fix test failures in array_test.py |
#10801 | [BUG] JDK17 nightly build after Spark UT Framework is merged |
#11019 | Fix test failures in window_function_test.py |
#11063 | [BUG] op time for GpuCoalesceBatches is more than actual |
#11006 | Fix test failures in arithmetic_ops_test.py |
#10995 | Fallback TimeZoneAwareExpression that only support UTC with zoneId instead of timeZone config |
#8652 | [BUG] array_item test failures on Spark 3.3.x |
#11053 | [BUG] Build on Databricks 330 fails |
#10925 | Concat cannot accept no parameter |
#10975 | [BUG] regex ^.*literal cannot be rewritten as contains(literal) for multiline strings |
#10956 | [BUG] hive_parquet_write_test.py: test_write_compressed_parquet_into_hive_table integration test failures |
#10772 | [BUG] Issues found by Spark UT Framework on RapidsDataFrameAggregateSuite |
#10986 | [BUG] Cast from string to float using hand-picked values failed in CastOpSuite |
#10972 | Spark 4.0 compile errors |
#10794 | [BUG] Incorrect cast of string columns containing various infinity notations with trailing spaces |
#10964 | [BUG] Improve stability of pre-merge jenkinsfile |
#10714 | Signature changed for PythonUDFRunner.writeUDFs |
#10712 | [AUDIT] BatchScanExec/DataSourceV2Relation to group splits by join keys if they differ from partition keys |
#10673 | [AUDIT] Rename plan nodes for PythonMapInArrowExec |
#10710 | [AUDIT] uncacheTableOrView changed in CommandUtils |
#10711 | [AUDIT] Match DataSourceV2ScanExecBase changes to groupPartitions method |
#10669 | Supporting broadcast of multiple filtering keys in DynamicPruning |
#11400 | [DOC] update notes in download page for the decompressing gzip issue [skip ci] |
#11355 | Update changelog for the v24.08 release [skip ci] |
#11353 | Update download doc for v24.08.1 [skip ci] |
#11352 | Update version to 24.08.1-SNAPSHOT [skip ci] |
#11337 | Update changelog for the v24.08 release [skip ci] |
#11335 | Fix Delta Lake truncation of min/max string values |
#11304 | Update changelog for v24.08.0 release [skip ci] |
#11303 | Update rapids JNI and private dependency to 24.08.0 |
#11296 | [DOC] update doc for 2408 release [skip CI] |
#11309 | [DOC] Update lore doc about the range [skip ci] |
#11292 | Add work around for string split with empty input. |
#11278 | Fix formatting of advanced configs doc |
#10917 | Adopt changes from JNI for casting from float to decimal |
#11269 | Revert "upgrade ucx to 1.17.0" |
#11260 | Mitigate intermittent test_buckets and shuffle_smoke_test OOM issue |
#11268 | Fix degenerate conditional nested loop join detection |
#11244 | Fix ArrayIndexOutOfBoundsException on join counts with constant join keys |
#11259 | CI Docker to support integration tests with Rocky OS + jdk17 [skip ci] |
#11247 | Fix string_test.py errors on Spark 4.0 |
#11246 | Rework Maven Source Plugin Skip |
#11149 | Rework on substring index |
#11236 | Remove the unused vars from the version-def CI script |
#11237 | Fork jvm for maven-source-plugin |
#11200 | Multi-get_json_object |
#11230 | Skip test where Delta Lake may not be fully compatible with Spark |
#11220 | Avoid failing spark bug SPARK-44242 while generating run_dir |
#11226 | Fix auto merge conflict 11212 |
#11129 | Spark 4: Fix miscellaneous tests including logic, repart, hive_delimited. |
#11163 | Support MapFromArrays on GPU |
#11219 | Fix hash_aggregate_test.py to run with ANSI enabled |
#11186 | from_json Json to Struct Exception Logging |
#11180 | More accurate estimation for the result serialization time in RapidsShuffleThreadedWriterBase |
#11194 | Fix ANSI mode test failures in url_test.py |
#11202 | Fix read from Delta Lake table with name column mapping and missing Parquet IDs |
#11185 | Fix multi-release jar problem |
#11144 | Build the Scala2.13 dist jar with JDK17 |
#11197 | Fix class not found error: com/nvidia/spark/rapids/GpuScalar |
#11191 | Fix dynamic pruning regression in GpuFileSourceScanExec |
#10994 | Add Spark 4.0.0 Build Profile and Other Supporting Changes |
#11192 | Append new authorized user to blossom-ci whitelist [skip ci] |
#11179 | Allow more expressions to be tiered |
#11141 | Enable some Rapids config in RapidsSQLTestsBaseTrait for Spark UT |
#11170 | Avoid listFiles or inputFiles on relations with static partitioning |
#11159 | Drop spark31x shims |
#10951 | Case when performance improvement: reduce the copy_if_else |
#11165 | Fix some GpuBroadcastToRowExec by not dropping columns |
#11126 | Coalesce batches after a logical coalesce operation |
#11164 | fix the bucketed write error for non-utc cases |
#11132 | Add deletion vector metrics for low shuffle merge. |
#11156 | Fix batch splitting for partition column size on row-count-only batches |
#11153 | Fix LORE dump oom. |
#11102 | Fix ANSI mode failures in subquery_test.py |
#11151 | Fix the test error of the bucketed write for the non-utc case |
#11147 | upgrade ucx to 1.17.0 |
#11138 | Update fastparquet to 2024.5.0 for numpy2 compatibility |
#11137 | Handle the change for UnaryPositive now extending RuntimeReplaceable |
#11094 | Add HiveHash support on GPU |
#11139 | Improve MetricsSuite to allow more gc jitter |
#11133 | Fix test_window_group_limits_fallback |
#11097 | Fix miscellaneous integ tests for Spark 4 |
#11118 | Fix issue with DPP and AQE on reused broadcast exchanges |
#11043 | Dataproc serverless test fixes |
#10965 | Profiler: Disable collecting async allocation events by default |
#11117 | Update Scala2.13 premerge CI against JDK17 |
#11084 | Introduce LORE framework. |
#11099 | Spark 4: Handle ANSI mode in sort_test.py |
#11115 | Fix match error in RapidsShuffleIterator.scala [scala2.13] |
#11088 | Support regex patterns with brackets when rewriting to PrefixRange pattern in rlike. |
#10950 | Add a heuristic to skip second or third agg pass |
#11048 | Fixed array_tests for Spark 4.0.0 |
#11049 | Fix some cast_tests for Spark 4.0.0 |
#11066 | Replaced spark3xx-common references to spark-shared |
#11083 | Exclude a case based on JDK version in Spark UT |
#10997 | Fix some test issues in Spark UT and keep RapidsTestSettings up-to-date |
#11073 | Disable ANSI mode for window function tests |
#11076 | Improve the diagnostics for 'conv' fallback explain |
#11092 | Add GpuBucketingUtils shim to Spark 4.0.0 |
#11062 | fix duplicate counted metrics like op time for GpuCoalesceBatches |
#11044 | Fixed Failing tests in arithmetic_ops_tests for Spark 4.0.0 |
#11086 | upgrade blossom-ci actions version [skip ci] |
#10957 | Support bucketing write for GPU |
#10979 | [FEA] Introduce low shuffle merge. |
#10996 | Fallback non-UTC TimeZoneAwareExpression with zoneId |
#11072 | Workaround numpy2 failed fastparquet compatibility tests |
#11046 | Calculate parallelism to speed up pre-merge CI |
#11054 | fix flaky array_item test failures |
#11051 | [FEA] Increase parallelism of deltalake test on databricks |
#10993 | binary-dedupe changes for Spark 4.0.0 |
#11060 | Add in the ability to fingerprint JSON columns |
#11059 | Revert "Add in the ability to fingerprint JSON columns (#11002)" [skip ci] |
#11039 | Concat() Exception bug fix |
#11002 | Add in the ability to fingerprint JSON columns |
#10977 | Rewrite multiple literal choice regex to multiple contains in rlike |
#11035 | Fix auto merge conflict 11034 [skip ci] |
#11040 | Append new authorized user to blossom-ci whitelist [skip ci] |
#11036 | Update blossom-ci ACL to secure format [skip ci] |
#11032 | Fix a hive write test failure for Spark 350 |
#10998 | Improve log to print more lines in build [skip ci] |
#10992 | Addressing the Named Parameter change in Spark 4.0.0 |
#10943 | Fix Spark UT issues in RapidsDataFrameAggregateSuite |
#10963 | Add rapids configs to enable GPU running in Spark UT |
#10978 | More compilation fixes for Spark 4.0.0 |
#10953 | Speed up the integration tests by running them in parallel on the Databricks cluster |
#10958 | Fix a hive write test failure |
#10970 | Move Support for RaiseError to a Shim Excluding Spark 4.0.0 |
#10966 | Add default value for REF of premerge jenkinsfile to avoid bad overwritten [skip ci] |
#10959 | Add new ID to blossom-ci allow list [skip ci] |
#10952 | Add shims to take care of the signature change for writeUDFs in PythonUDFRunner |
#10931 | Add Support for Renaming of PythonMapInArrow |
#10949 | Change dependency version to 24.08.0-SNAPSHOT |
#10857 | [Spark 4.0] Account for PartitionedFileUtil.splitFiles signature change. |
#10912 | GpuInsertIntoHiveTable supports parquet format |
#10863 | [Spark 4.0] Account for CommandUtils.uncacheTableOrView signature change. |
#10944 | Added Shim for BatchScanExec to Support Spark 4.0 |
#10946 | Unarchive Spark test jar for spark.read(ability) |
#10945 | Add Support for Multiple Filtering Keys for Subquery Broadcast |
#10871 | Add classloader diagnostics to initShuffleManager error message |
#10933 | Fixed Databricks build |
#10929 | Append new authorized user to blossom-ci whitelist [skip ci] |
#10850 | [FEA] Refine the test framework introduced in #10745 |
#6969 | [FEA] Support parse_url |
#10496 | [FEA] Drop support for CentOS7 |
#10760 | [FEA] Support ArrayFilter |
#10721 | [FEA] Dump the complete set of build-info properties to the Spark eventLog |
#10666 | [FEA] Create Spark 3.4.3 shim |
#8963 | [FEA] Use custom kernel for parse_url |
#10817 | [FOLLOW ON] Combining regex parsing in transpiling and regex rewrite in rlike |
#10821 | Rewrite pattern[A-B]{X,Y} (a pattern string followed by X to Y chars in range A - B) in RLIKE to a custom kernel |
#10928 | [BUG] 24.06 test_conditional_with_side_effects_case_when test failed on Scala 2.13 with DATAGEN_SEED=1716656294 |
#10941 | [BUG] Failed to build on databricks due to GpuOverrides.scala:4264: not found: type GpuSubqueryBroadcastMeta |
#10902 | Spark UT failed: SPARK-37360: Timestamp type inference for a mix of TIMESTAMP_NTZ and TIMESTAMP_LTZ |
#10899 | [BUG] format_number Spark UT failed because Type conversion is not allowed |
#10913 | [BUG] rlike with empty pattern failed with 'NoSuchElementException' when enabling regex rewrite |
#10774 | [BUG] Issues found by Spark UT Framework on RapidsRegexpExpressionsSuite |
#10606 | [BUG] Update Plugin to use the new getPartitionedFile method |
#10806 | [BUG] orc_write_test.py::test_write_round_trip_corner failed with DATAGEN_SEED=1715517863 |
#10831 | [BUG] Failed to read data from iceberg |
#10810 | [BUG] NPE when running ParseUrl tests in RapidsStringExpressionsSuite |
#10797 | [BUG] udf_test test_single_aggregate_udf, test_group_aggregate_udf and test_group_apply_udf_more_types failed on DB 13.3 |
#10719 | [BUG] test_exact_percentile_groupby FAILED: hash_aggregate_test.py::test_exact_percentile_groupby with DATAGEN seed 1713362217 |
#10738 | [BUG] test_exact_percentile_groupby_partial_fallback_to_cpu failed with DATAGEN_SEED=1713928179 |
#10768 | [DOC] Dead links with tools pages |
#10751 | [BUG] Cascaded Pandas UDFs not working as expected on Databricks when plugin is enabled |
#10318 | [BUG] fs.azure.account.keyInvalid configuration issue while reading from Unity Catalog Tables on Azure DB |
#10722 | [BUG] "Could not find any rapids-4-spark jars in classpath" error when debugging UT in IDEA |
#10724 | [BUG] Failed to convert string with invisible characters to float |
#10633 | [BUG] ScanJson and JsonToStructs can give almost random errors |
#10659 | [BUG] from_json ArrayIndexOutOfBoundsException in 24.02 |
#10656 | [BUG] Databricks cache tests failing with host memory OOM |
#11222 | Update change log for v24.06.1 release [skip ci] |
#11221 | Change cudf version back to 24.06.0-SNAPSHOT [skip ci] |
#11217 | Update latest changelog [skip ci] |
#11211 | Use fixed seed for test_from_json_struct_decimal |
#11203 | Update version to 24.06.1-SNAPSHOT |
#11205 | Update docs for 24.06.1 release [skip ci] |
#11056 | Update latest changelog [skip ci] |
#11052 | Add spark343 shim for scala2.13 dist jar |
#10981 | Update latest changelog [skip ci] |
#10984 | [DOC] Update docs for 24.06.0 release [skip ci] |
#10974 | Update rapids JNI and private dependency to 24.06.0 |
#10830 | Use ErrorClass to Throw AnalysisException |
#10947 | Prevent contains-PrefixRange optimization if not preceded by wildcards |
#10934 | Revert "Add Support for Multiple Filtering Keys for Subquery Broadcast " |
#10870 | Add support for self-contained profiling |
#10903 | Use upper case for LEGACY_TIME_PARSER_POLICY to fix a spark UT |
#10900 | Fix type convert error in format_number scalar input |
#10868 | Disable default cuDF pinned pool |
#10914 | Fix NoSuchElementException when rlike with empty pattern |
#10858 | Add Support for Multiple Filtering Keys for Subquery Broadcast |
#10861 | refine ut framework including Part 1 and Part 2 |
#10872 | [DOC] ignore released plugin links to reduce the bother info [skip ci] |
#10839 | Replace anonymous classes for SortOrder and FilterExec overrides |
#10873 | Auto merge PRs to branch-24.08 from branch-24.06 [skip ci] |
#10860 | [Spark 4.0] Account for PartitionedFileUtil.getPartitionedFile signature change. |
#10822 | Rewrite regex pattern literal[a-b]{x} to custom kernel in rlike |
#10833 | Filter out unused json_path tokens |
#10855 | Fix auto merge conflict 10845 [skip ci] |
#10826 | Add NVTX ranges to identify Spark stages and tasks |
#10836 | Catch exceptions when trying to examine Iceberg scan for metadata queries |
#10824 | Support zstd for GPU shuffle compression |
#10828 | Added DateTimeUtilsShims [Databricks] |
#10829 | Fix Inheritance Shadowing to add support for Spark 4.0.0 |
#10811 | Fix NPE in GpuParseUrl for null keys. |
#10723 | Implement chunked ORC reader |
#10715 | Rewrite some rlike expression to StartsWith/Contains |
#10820 | workaround #10801 temporarily |
#10812 | Replace ThreadPoolExecutor creation with ThreadUtils API |
#10813 | Fix the errors for Pandas UDF tests on DB13.3 |
#10795 | Remove fixed seed for exact percentile integration tests |
#10805 | Drop Support for CentOS 7 |
#10800 | Add number normalization test and address followup for getJsonObject |
#10796 | fixing build break on DBR |
#10791 | Fix auto merge conflict 10779 [skip ci] |
#10636 | Update actions version [skip ci] |
#10743 | initial PR for the framework reusing Vanilla Spark's unit tests |
#10767 | Add rows-only batches support to RebatchingRoundoffIterator |
#10763 | Add in the GpuArrayFilter command |
#10766 | Fix dead links related to tools documentation [skip ci] |
#10644 | Add logging to Integration test runs in local and local-cluster mode |
#10756 | Fix Authorization Failure While Reading Tables From Unity Catalog |
#10752 | Add SparkRapidsBuildInfoEvent to the event log |
#10754 | Substitute whoami for $USER |
#10755 | [DOC] Update README for prioritize-commits script [skip ci] |
#10728 | Let big data gen set nullability recursively |
#10740 | Use parse_url kernel for PATH parsing |
#10734 | Add short circuit path for get-json-object when there is separate wildcard path |
#10725 | Initial definition for Spark 4.0.0 shim |
#10635 | Use new getJsonObject kernel for json_tuple |
#10739 | Use fixed seed for some random failed tests |
#10720 | Add Shims for Spark 3.4.3 |
#10716 | Remove the mixedType config for JSON as it has no downsides any longer |
#10733 | Fix "Could not find any rapids-4-spark jars in classpath" error when debugging UT in IDEA |
#10718 | Change parameters for memory limit in Parquet chunked reader |
#10292 | Upgrade to UCX 1.16.0 |
#10709 | Removing some authorizations for departed users [skip ci] |
#10726 | Append new authorized user to blossom-ci whitelist [skip ci] |
#10708 | Updated dump tool to verify get_json_object |
#10706 | Fix auto merge conflict 10704 [skip ci] |
#10675 | Fix merge conflict with branch-24.04 [skip ci] |
#10678 | Append new authorized user to blossom-ci whitelist [skip ci] |
#10662 | Audit script - Check commits from shuffle and storage directories [skip ci] |
#10655 | Update rapids jni/private dependency to 24.06 |
#10652 | Substitute murmurHash32 for spark32BitMurmurHash3 |
#10263 | [FEA] Add support for reading JSON containing structs where rows are not consistent |
#10436 | [FEA] Move Spark 3.5.1 out of snapshot once released |
#10430 | [FEA] Error out when running on an unsupported GPU architecture |
#9750 | [FEA] Review JsonToStruct and JsonScan and consolidate some testing and implementation |
#8680 | [AUDIT][SPARK-42779][SQL] Allow V2 writes to indicate advisory shuffle partition size |
#10429 | [FEA] Drop support for Databricks 10.4 ML LTS |
#10334 | [FEA] Turn on memory limits for parquet reader |
#10344 | [FEA] support barrier mode for mapInPandas/mapInArrow |
#10578 | [FEA] Support project expression rewrite for the case stringinstr(str_col, substr) > 0 to contains(str_col, substr) |
#10570 | [FEA] See if we can optimize sort for a single batch |
#10531 | [FEA] Support "WindowGroupLimit" optimization on GPU for Databricks 13.3 ML LTS+ |
#5553 | [FEA][Audit] - Push down StringEndsWith/Contains to Parquet |
#8208 | [FEA][AUDIT][SPARK-37099][SQL] Introduce the group limit of Window for rank-based filter to optimize top-k computation |
#10249 | [FEA] Support common subexpression elimination for expand operator |
#10301 | [FEA] Improve performance of from_json |
#10700 | [BUG] get_json_object cannot handle ints or boolean values |
#10645 | [BUG] java.lang.IllegalStateException: Expected to only receive a single batch |
#10665 | [BUG] Need to update private jar's version to v24.04.1 for spark-rapids v24.04.0 release |
#10589 | [BUG] ZSTD version mismatch in integration tests |
#10255 | [BUG] parquet_tests are skipped on Dataproc CI |
#10624 | [BUG] Deploy script "gpg:sign-and-deploy-file failed: 401 Unauthorized" |
#10631 | [BUG] pending BlockState leaks blocks if the shuffle read doesn't finish successfully |
#10349 | [BUG] Test in json_test.py failed: test_from_json_struct_decimal |
#9033 | [BUG] GpuGetJsonObject does not expand escaped characters |
#10216 | [BUG] GetJsonObject fails at spark unit test $.store.book[*].reader |
#10217 | [BUG] GetJsonObject fails at spark unit test $.store.basket[0][*].b |
#10537 | [BUG] GetJsonObject throws exception when json path contains a name starting with ' |
#10194 | [BUG] GetJsonObject does not validate the input is JSON in the same way as Spark |
#10196 | [BUG] GetJsonObject does not process escape sequences in returned strings or queries |
#10212 | [BUG] GetJsonObject should return null for invalid query instead of throwing an exception |
#10218 | [BUG] GetJsonObject does not normalize non-string output |
#10591 | [BUG] test_column_add_after_partition failed on EGX Standalone cluster |
#10277 | Add monitoring for GH action deprecations |
#10627 | [BUG] Integration tests FAILED on: "nvCOMP 2.3/2.4 or newer is required for Zstandard compression" |
#10585 | [BUG] Test simple pinned blocking alloc Failed nightly tests |
#10586 | [BUG] YARN EGX IT build failing parquet_testing_test can't find file |
#10133 | [BUG] test_hash_reduction_collect_set_on_nested_array_type failed in a distributed environment |
#10378 | [BUG] test_range_running_window_float_decimal_sum_runs_batched fails intermittently |
#10486 | [BUG] StructsToJson does not fall back to the CPU for unsupported timeZone options |
#10484 | [BUG] JsonToStructs does not fallback when columnNameOfCorruptRecord is set |
#10460 | [BUG] JsonToStructs should reject float numbers for integer types |
#10468 | [BUG] JsonToStructs and ScanJson should not treat quoted strings as valid integers |
#10470 | [BUG] ScanJson and JsonToStructs should support parsing quoted decimal strings that are formatted by locale (at least for en-US) |
#10494 | [BUG] JsonToStructs parses INF wrong when nonNumericNumbers is enabled |
#10456 | [BUG] allowNonNumericNumbers OFF supported for JSON Scan, but not JsonToStructs |
#10467 | [BUG] JsonToStructs should reject 1. as a valid number |
#10469 | [BUG] ScanJson should accept "1." as a valid Decimal |
#10559 | [BUG] test_spark_from_json_date_with_format FAILED on : Part of the plan is not columnar class org.apache.spark.sql.execution.ProjectExec |
#10209 | [BUG] Test failure hash_aggregate_test.py::test_hash_reduction_collect_set_on_nested_array_type DATAGEN_SEED=1705515231 |
#10319 | [BUG] Shuffled join OOM with 4GB of GPU memory |
#10507 | [BUG] regexp_test.py FAILED test_regexp_extract_all_idx_positive[DATAGEN_SEED=1709054829, INJECT_OOM] |
#10527 | [BUG] Build on Databricks failed with GpuGetJsonObject.scala:19: object parsing is not a member of package util |
#10509 | [BUG] scalar leaks when running nds query51 |
#10214 | [BUG] GetJsonObject does not support unquoted array like notation |
#10215 | [BUG] GetJsonObject removes leading space characters |
#10213 | [BUG] GetJsonObject supports array index notation without a root |
#10452 | [BUG] JsonScan and from_json share fallback checks, but have hard coded names in the results |
#10455 | [BUG] JsonToStructs and ScanJson do not fall back/support it properly if single quotes are disabled |
#10219 | [BUG] GetJsonObject sees a double quote in a single quoted string as invalid |
#10431 | [BUG] test_casting_from_overflow_double_to_timestamp DID NOT RAISE <class 'Exception'> |
#10499 | [BUG] Unit tests core dump as below |
#9325 | [BUG] test_csv_infer_schema_timestamp_ntz fails |
#10422 | [BUG] test_get_json_object_single_quotes failure |
#10411 | [BUG] Some fast parquet tests fail if the time zone is not UTC |
#10410 | [BUG] delta_lake_update_test.py::test_delta_update_partitions[['a', 'b']-False] failed by DATAGEN_SEED=1707683137 |
#10404 | [BUG] GpuJsonTuple memory leak |
#10382 | [BUG] Compile failed on branch-24.04 : literals.scala:32: object codec is not a member of package org.apache.commons |
#10844 | Update rapids private dependency to 24.04.3 |
#10788 | [DOC] Update archive page for v24.04.1 [skip ci] |
#10784 | Update latest changelog [skip ci] |
#10782 | Update latest changelog [skip ci] |
#10780 | [DOC] Update download page for v24.04.1 [skip ci] |
#10778 | Update version to 24.04.1-SNAPSHOT |
#10777 | Update rapids JNI dependency: private to 24.04.2 |
#10683 | Update latest changelog [skip ci] |
#10681 | Update rapids JNI dependency to 24.04.0, private to 24.04.1 |
#10660 | Ensure an executor broadcast is in a single batch |
#10676 | [DOC] Update docs for 24.04.0 release [skip ci] |
#10654 | Add a config to switch back to old impl for getJsonObject |
#10667 | Update rapids private dependency to 24.04.1 |
#10664 | Remove build link from the premerge-CI workflow |
#10657 | Revert "Host Memory OOM handling for RowToColumnarIterator (#10617)" |
#10625 | Pin to 3.1.0 maven-gpg-plugin in deploy script [skip ci] |
#10637 | Cleanup async state when multi-threaded shuffle readers fail |
#10617 | Host Memory OOM handling for RowToColumnarIterator |
#10614 | Use random seed for test_from_json_struct_decimal |
#10581 | Use new jni kernel for getJsonObject |
#10630 | Fix removal of internal metadata information in 350 shim |
#10623 | Auto merge PRs to branch-24.06 from branch-24.04 [skip ci] |
#10616 | Pass metadata extractors to FileScanRDD |
#10620 | Remove unused shared lib in Jenkins files |
#10615 | Turn off state logging in HostAllocSuite |
#10610 | Do not replace TableCacheQueryStageExec |
#10599 | Call globStatus directly via PY4J in hdfs_glob to avoid calling hadoop command |
#10602 | Remove InMemoryTableScanExec support for Spark 3.5+ |
#10608 | Update perfio.s3.enabled doc to fix build failure [skip ci] |
#10598 | Update CI script to build and deploy using the same CUDA classifier [skip ci] |
#10575 | Update JsonToStructs and ScanJson to have white space normalization |
#10597 | add guardword to hide cloud info |
#10540 | Handle minimum GPU architecture supported |
#10584 | Add in small optimization for instr comparison |
#10590 | Turn on transition logging in HostAllocSuite |
#10572 | Improve performance of Sort for the common single batch use case |
#10568 | Add configuration to share JNI pinned pool with cuIO |
#10550 | Enable window-group-limit optimization on |
#10542 | Make JSON parsing common between JsonToStructs and ScanJson |
#10562 | Fix test_spark_from_json_date_with_format when run in a non-UTC TZ |
#10564 | Enable specifying specific integration test methods via TESTS environment |
#10563 | Append new authorized user to blossom-ci safelist [skip ci] |
#10520 | Distinct left join |
#10538 | Move K8s cloud name into common lib for Jenkins CI |
#10552 | Fix issues when no value can be extracted from a regular expression |
#10522 | Fix missing scala-parser-combinators dependency on Databricks |
#10549 | Update to latest branch-24.02 [skip ci] |
#10544 | Fix merge conflict from branch-24.02 |
#10503 | Distinct inner join |
#10512 | Move to parsing from_json input preserving quoted strings. |
#10528 | Fix auto merge conflict 10523 |
#10519 | Replicate HostColumnVector.ColumnBuilder in plugin to enable host memory oom work |
#10521 | Fix Spark 3.5.1 build |
#10516 | One more metric for expand |
#10500 | Support "WindowGroupLimit" optimization on GPU |
#10508 | Move 351 shims into noSnapshot buildvers |
#10510 | Fix scalar leak in SumBinaryFixer |
#10466 | Use parser from spark to normalize json path in GetJsonObject |
#10490 | Start working on a more complete json test matrix json |
#10497 | Add minValue overflow check in ORC double-to-timestamp cast |
#10501 | Fix scalar leak in WindowRetrySuite |
#10474 | Remove Support for Databricks 10.4 |
#10418 | Enable GpuShuffledSymmetricHashJoin by default |
#10450 | Improve internal row to columnar host memory by using a combined spillable buffer |
#10440 | Generate CSV data per Spark version for tools |
#10449 | [DOC] Fix table rendering issue in github.io download UI page [skip ci] |
#10438 | Integrate perfio.s3 reader |
#10423 | Disable Integration Test:test_get_json_object_single_quotes on DB 10.4 |
#10419 | Export TZ in tests when default TZ is used |
#10426 | Fix auto merge conflict 10425 [skip ci] |
#10427 | Update test doc for 24.04 [skip ci] |
#10396 | Remove inactive user from github workflow [skip ci] |
#10421 | Use withRetry when manifesting spillable batch in GpuShuffledHashJoinExec |
#10420 | Disable JsonTuple by default |
#10407 | Enable Single Quote Support in getJSONObject API with GetJsonObjectOptions |
#10415 | Avoid comparing Delta logs when writing partitioned tables |
#10247 | Improve GpuExpand by pre-projecting some columns |
#10248 | Group-by aggregation based optimization for UNBOUNDED collect_set window function |
#10406 | Enabled subPage chunking by default |
#10361 | Add in basic support for JSON generation in BigDataGen and improve performance of from_json |
#10158 | Add in framework for unbounded to unbounded window agg optimization |
#10394 | Fix auto merge conflict 10393 [skip ci] |
#10375 | Support barrier mode for mapInPandas/mapInArrow |
#10356 | Update locate_parquet_testing_files function to support hdfs input path for dataproc CI |
#10369 | Revert "Support barrier mode for mapInPandas/mapInArrow (#10364)" |
#10358 | Disable Spark UI by default for integration tests |
#10360 | Fix a memory leak in json tuple |
#10364 | Support barrier mode for mapInPandas/mapInArrow |
#10348 | Remove redundant joinOutputRows metric |
#10321 | Bump up dependency version to 24.04.0-SNAPSHOT |
#10330 | Add tryAcquire to GpuSemaphore |
#10258 | Init project version 24.04.0-SNAPSHOT |
#9926 | [FEA] Add config option for the parquet reader input read limit. |
#10270 | [FEA] Add support for single quotes when reading JSON |
#10253 | [FEA] Enable mixed types as string in GpuJsonToStruct |
#9692 | [FEA] Remove Pascal support |
#8806 | [FEA] Support lazy quantifier and specified group index in regexp_extract function |
#10079 | [FEA] Add string parameter support for unix_timestamp for non-UTC time zones |
#9667 | [FEA][JSON] Add support for non default dateFormat in from_json |
#9173 | [FEA] Support format_number |
#10145 | [FEA] Support to_utc_timestamp |
#9927 | [FEA] Support to_date with non-UTC timezones without DST |
#10006 | [FEA] Support ParseToTimestamp for non-UTC time zones |
#9096 | [FEA] Add Spark 3.3.4 support |
#9585 | [FEA] support ascii function |
#9260 | [FEA] Create Spark 3.4.2 shim and build env |
#10076 | [FEA] Add performance test framework for non-UTC time zone features. |
#9881 | [TASK] Remove spark.rapids.sql.nonUTC.enabled configuration option |
#9801 | [FEA] Support DateFormat on GPU with a non-UTC timezone |
#6834 | [FEA] Support GpuHour expression for timezones other than UTC |
#6842 | [FEA] Support TimeZone aware operations for value extraction |
#1860 | [FEA] Optimize row based window operations for BOUNDED ranges |
#9606 | [FEA] Support unix_timestamp with CST(China Time Zone) support |
#9815 | [FEA] Support unix_timestamp for non-DST timezones |
#8807 | [FEA] support ‘yyyyMMdd’ format in from_unixtime function |
#9605 | [FEA] Support from_unixtime with CST(China Time Zone) support |
#6836 | [FEA] Support FromUnixTime for non UTC timezones |
#9175 | [FEA] Support Databricks 13.3 |
#6881 | [FEA] Support RAPIDS Spark plugin on ARM |
#9274 | [FEA] Regular deploy process to include arm artifacts |
#9844 | [FEA] Let Gpu arrow python runners support writing one batch one time for the single threaded model. |
#7309 | [FEA] Detect multiple versions of the RAPIDS jar on the classpath at the same time |
#9442 | [FEA] For hash joins where the build side can change use the smaller table for the build side |
#10142 | [TASK] Benchmark existing timestamp functions that work in non-UTC time zone (non-DST) |
#10548 | [BUG] test_dpp_bypass / test_dpp_via_aggregate_subquery failures in CI Databricks 13.3 |
#10530 | test_delta_merge_match_delete_only java.lang.OutOfMemoryError: GC overhead limit exceeded |
#10464 | [BUG] spark334 and spark342 shims missed in scala2.13 dist jar |
#10473 | [BUG] Leak when running RANK query |
#10432 | Plug-in Build Failing for Databricks 11.3 |
#9974 | [BUG] host memory Leak in MultiFileCoalescingPartitionReaderBase in UTC time zone |
#10359 | [BUG] Build failure on Databricks nightly run with GpuMapInPandasExecMeta |
#10327 | [BUG] Unit test FAILED against : SPARK-24957: average with decimal followed by aggregation returning wrong result |
#10324 | [BUG] hash_aggregate_test.py test FAILED: Type conversion is not allowed from Table {...} |
#10291 | [BUG] SIGSEGV in libucp.so |
#9212 | [BUG] from_json fails with cuDF error Invalid list size computation error |
#10264 | [BUG] hash aggregate test failures due to type conversion errors |
#10262 | [BUG] Test "SPARK-24957: average with decimal followed by aggregation returning wrong result" failed. |
#9353 | [BUG] [JSON] A mix of lists and structs within the same column is not supported |
#10099 | [BUG] orc_test.py::test_orc_scan_with_aggregate_pushdown fails with a standalone cluster on spark 3.3.0 |
#10047 | [BUG] CudfException during conditional hash join while running nds query64 |
#9779 | [BUG] 330cdh failed test_hash_reduction_sum_full_decimal on CI |
#10197 | [BUG] Disable GetJsonObject by default and update docs |
#10165 | [BUG] Databricks 13.3 executor side broadcast failure |
#10224 | [BUG] DBR builds fails when installing Maven |
#10222 | [BUG] to_utc_timestamp and from_utc_timestamp fallback when TZ is supported time zone |
#10195 | [BUG] test_window_aggs_for_negative_rows_partitioned failure in CI |
#10182 | [BUG] test_dpp_bypass / test_dpp_via_aggregate_subquery failures in CI (databricks) |
#10169 | [BUG] Host column vector leaks when running test_cast_timestamp_to_date |
#10050 | [BUG] test_cast_decimal_to_decimal[to:DecimalType(1,-1)-from:Decimal(5,-3)] fails with DATAGEN_SEED=1702439569 |
#10088 | [BUG] GpuExplode single row split to fit cuDF limits |
#10174 | [BUG] json_test.py::test_from_json_struct_timestamp failed on: Part of the plan is not columnar |
#10186 | [BUG] test_to_date_with_window_functions failed in non-UTC nightly CI |
#10154 | [BUG] 'spark-test.sh' integration tests FAILED on 'ps: command not found' in Rocky Docker environment |
#10175 | [BUG] string_test.py::test_format_number_float_special FAILED : AssertionError 'NaN' == |
#10166 | Detect Undeclared Shim in POM.xml |
#10170 | [BUG] test_cast_timestamp_to_date fails with TZ=Asia/Hebron |
#10149 | [BUG] GPU illegal access detected during delta_byte_array.parquet read |
#9905 | [BUG] GpuJsonScan incorrect behavior when parsing dates |
#10163 | Spark 3.3.4 Shim Build Failure |
#10105 | [BUG] scala:compile is not thread safe unless compiler bridge already exists |
#10026 | [BUG] test_hash_agg_with_nan_keys failed with a DATAGEN_SEED=1702335559 |
#10075 | [BUG] non-pinned blocking alloc with spill unit test failed in HostAllocSuite |
#10134 | [BUG] test_window_aggs_for_batched_finite_row_windows_partitioned failed on Scala 2.13 with DATAGEN_SEED=1704033145 |
#10118 | [BUG] non-UTC Nightly CI failed |
#10136 | [BUG] The canonicalized versions of GpuFileSourceScanExecs that are supposed to be semantically equal can be different |
#10110 | [BUG] disable collect_list and collect_set for window operations by default. |
#10129 | [BUG] Unit test suite fails with Null data pointer in GpuTimeZoneDB |
#10089 | [BUG] DATAGEN_SEED= environment does not override the marker datagen_overrides |
#10108 | [BUG] @datagen_overrides seed is sticky when it shouldn't be |
#10064 | [BUG] test_unsupported_fallback_regexp_replace failed with DATAGEN_SEED=1702662063 |
#10117 | [BUG] test_from_utc_timestamp failed on Cloudera Env when TZ is Iran |
#9914 | [BUG] Report GPU OOM on recent passed CI premerges. |
#10094 | [BUG] spark351 PR check failure MockTaskContext method isFailed in class TaskContext of type ()Boolean is not defined |
#10017 | [BUG] test_casting_from_double_to_timestamp failed for DATAGEN_SEED=1702329497 |
#9992 | [BUG] conditionals_test.py::test_conditional_with_side_effects_cast[String] failed with DATAGEN_SEED=1701976979 |
#9743 | [BUG][AUDIT] SPARK-45652 - SPJ: Handle empty input partitions after dynamic filtering |
#9859 | [AUDIT] [SPARK-45786] Inaccurate Decimal multiplication and division results |
#9555 | [BUG] Scala 2.13 build with JDK 11 or 17 fails OpcodeSuite tests |
#10073 | [BUG] test_csv_prefer_date_with_infer_schema failed with DATAGEN_SEED=1702847907 |
#10004 | [BUG] If a host memory buffer is spilled, it cannot be unspilled |
#10063 | [BUG] CI build failure with 341db: method getKillReason has weaker access privileges; it should be public |
#10055 | [BUG] array_test.py::test_array_transform_non_deterministic failed with non-UTC time zone |
#10056 | [BUG] Unit tests ToPrettyStringSuite FAILED on spark-3.5.0 |
#10048 | [BUG] Fix out of range error from pySpark in test_timestamp_millis and other two integration test cases |
#4204 | casting double to string does not match Spark |
#9938 | Better to do some refactor for the Python UDF code |
#10018 | [BUG] GpuToUnixTimestampImproved off by 1 on GPU when handling timestamp before epoch |
#10012 | [BUG] test_str_to_map_expr_random_delimiters with DATAGEN_SEED=1702166057 hangs |
#10029 | [BUG] doc links fail with 404 for shims.md |
#9472 | [BUG] Non-Deterministic expressions in an array_transform can cause errors |
#9884 | [BUG] delta_lake_delete_test.py failed assertion [DATAGEN_SEED=1701225104, IGNORE_ORDER... |
#9977 | [BUG] test_cast_date_integral fails on databricks 3.4.1 |
#9936 | [BUG] Nightly CI of non-UTC time zone reports 'year 0 is out of range' error |
#9941 | [BUG] A potential data corruption in Pandas UDFs |
#9897 | [BUG] Error message for multiple jars on classpath is wrong |
#9916 | [BUG] test_cast_string_ts_valid_format failed at seed = 1701362564 |
#9559 | [BUG] precommit regularly fails with error trying to download a dependency |
#9708 | [BUG] test_cast_string_ts_valid_format fails with DATAGEN_SEED=1699978422 |
#10555 | Update change log [skip ci] |
#10551 | Try to make degenerative joins here impossible for these tests |
#10546 | Update changelog [skip ci] |
#10541 | Fix Delta log cache size settings during integration tests |
#10525 | Update changelog for v24.02.0 release [skip ci] |
#10465 | Add missed shims for scala2.13 |
#10511 | Update rapids jni and private dependency version to 24.02.1 |
#10513 | Fix scalar leak in SumBinaryFixer (#10510) |
#10475 | Fix scalar leak in RankFixer |
#10461 | Preserve tags on FileSourceScanExec |
#10459 | [DOC] Fix table rendering issue in github.io download UI page on branch-24.02 [skip ci] |
#10443 | Update change log for v24.02.0 release [skip ci] |
#10439 | Reverts #10232 and fixes the plugin build on Databricks 11.3 |
#10380 | Init changelog 24.02 [skip ci] |
#10367 | Update rapids JNI and private version to release 24.02.0 |
#10414 | [DOC] Fix 24.02.0 documentation errors [skip ci] |
#10403 | Cherry-pick: Fix a memory leak in json tuple (#10360) |
#10387 | [DOC] Update docs for 24.02.0 release [skip ci] |
#10399 | Update NOTICE-binary |
#10389 | Change version and branch to 24.02 in docs [skip ci] |
#10384 | [DOC] Update docs for 23.12.2 release [skip ci] |
#10309 | [DOC] add custom 404 page and fix some document issue [skip ci] |
#10352 | xfail mixed type test |
#10355 | Revert "Support barrier mode for mapInPandas/mapInArrow (#10343)" |
#10353 | Use fixed seed for test_from_json_struct_decimal |
#10343 | Support barrier mode for mapInPandas/mapInArrow |
#10345 | Fix auto merge conflict 10339 [skip ci] |
#9991 | Start to use explicit memory limits in the parquet chunked reader |
#10328 | Fix typo in spark-tests.sh [skip ci] |
#10279 | Run '--packages' only with default cuda11 jar |
#10273 | Support reading JSON data with single quotes around attribute names and values |
#10306 | Fix performance regression in from_json |
#10272 | Add FullOuter support to GpuShuffledSymmetricHashJoinExec |
#10260 | Add perf test for time zone operators |
#10275 | Add tests for window Python udf with array input |
#10278 | Clean up $M2_CACHE to avoid side-effect of previous dependency:get [skip ci] |
#10268 | Add config to enable mixed types as string in GpuJsonToStruct & GpuJsonScan |
#10297 | Revert "UCX 1.16.0 upgrade (#10190)" |
#10289 | Add gerashegalov to CODEOWNERS [skip ci] |
#10290 | Fix merge conflict with 23.12 [skip ci] |
#10190 | UCX 1.16.0 upgrade |
#10211 | Use parse_url kernel for QUERY literal and column key |
#10267 | Update to libcudf unsigned sum aggregation types change |
#10208 | Added Support for Lazy Quantifier |
#9993 | Enable mixed types as string in GpuJsonScan |
#10246 | Refactor full join iterator to allow access to build tracker |
#10257 | Enable auto-merge from branch-24.02 to branch-24.04 [skip CI] |
#10178 | Mark hash reduction decimal overflow test as a permanent seed override |
#10244 | Use POSIX mode in assembly plugin to avoid issues with large UID/GID |
#10238 | Smoke test with '--package' to fetch the plugin jar |
#10201 | Deploy release candidates to local maven repo for dependency check [skip ci] |
#10240 | Improved inner joins with large build side |
#10220 | Disable GetJsonObject by default and add tests for as many issues with it as possible |
#10230 | Fix Databricks 13.3 BroadcastHashJoin using executor side broadcast fed by ColumnarToRow [Databricks] |
#10232 | Fixed 330db Shims to Adopt the PythonRunner Changes |
#10225 | Download Maven from apache.org archives [skip ci] |
#10210 | Add string parameter support for unix_timestamp for non-UTC time zones |
#10223 | Fix to_utc_timestamp and from_utc_timestamp fallback when TZ is supported time zone |
#10205 | Deterministic ordering in window tests |
#10204 | Further prevent degenerative joins in dpp_test |
#10156 | Update string to float compatibility doc [skip ci] |
#10193 | Fix explode with carry-along columns on GpuExplode single row retry handling |
#10191 | Updating the config documentation for filecache configs [skip ci] |
#10131 | With a single row GpuExplode tries to split the generator array |
#10179 | Fix build regression against Spark 3.2.x |
#10189 | test needs marks for non-UTC and for non_supported timezones |
#10176 | Fix format_number NaN symbol in high jdk version |
#10074 | Update the legacy mode check: only take effect when reading date/timestamp column |
#10167 | Defined Shims Should Be Declared In POM |
#10168 | Prevent a degenerative join in test_dpp_reuse_broadcast_exchange |
#10171 | Fix test_cast_timestamp_to_date when running in a DST time zone |
#9975 | Improve dateFormat support in GpuJsonScan and make tests consistent with GpuStructsToJson |
#9790 | Support float case of format_number with format_float kernel |
#10144 | Support to_utc_timestamp |
#10162 | Fix Spark 334 Build |
#10146 | Refactor the window code so it is not mostly kept in a few very large files |
#10155 | Install procps tools for rocky docker images [skip ci] |
#10153 | Disable multi-threaded Maven |
#10100 | Enable to_date (via gettimestamp and casting timestamp to date) for non-UTC time zones |
#10140 | Removed Unnecessary Whitespaces From Spark 3.3.4 Shim [skip ci] |
#10148 | fix test_hash_agg_with_nan_keys floating point sum failure |
#10150 | Increase timeouts in HostAllocSuite to avoid timeout failures on slow machines |
#10143 | Fix test_window_aggs_for_batched_finite_row_windows_partitioned fail |
#9887 | Reduce the time consumed by pre-merge |
#10130 | Change unit tests that force ooms to specify the oom type (gpu |
#10138 | Update copyright dates in NOTICE files [skip ci] |
#10139 | Add Delta Lake 2.3.0 to list of versions to test for Spark 3.3.x |
#10135 | Fix CI: can't find script when there is pushd in script [skip ci] |
#10137 | Fix the canonicalizing for GPU file scan |
#10132 | Disable collect_list and collect_set for window by default |
#10084 | Refactor GpuJsonToStruct to reduce code duplication and manage resources more efficiently |
#10087 | Additional unit tests for GeneratedInternalRowToCudfRowIterator |
#10082 | Add Spark 3.3.4 Shim |
#10054 | Support Ascii function for ascii and latin-1 |
#10127 | Fix merge conflict with branch-23.12 |
#10097 | [DOC] Update docs for 23.12.1 release [skip ci] |
#10109 | Fixes a bug where datagen seed overrides were sticky and adds datagen_seed_override_disabled |
#10093 | Fix test_unsupported_fallback_regexp_replace |
#10119 | Fix from_utc_timestamp case failure on Cloudera when TZ is Iran |
#10106 | Add isFailed() to MockTaskContext and Remove MockTaskContextBase.scala |
#10112 | Remove datagen seed override for test_conditional_with_side_effects_cast |
#10104 | [DOC] Add in docs about memory debugging [skip ci] |
#9925 | Use threads, cache Scala compiler in GH mvn workflow |
#9967 | Added Spark-3.4.2 Shims |
#10061 | Use parse_url kernel for QUERY parsing |
#10101 | [DOC] Add column order error docs [skip ci] |
#10078 | Add perf test for non-UTC operators |
#10096 | Shim MockTaskContext to fix Spark 3.5.1 build |
#10092 | Implement Math.round using floor on GPU |
#10085 | Update tests that originally restricted the Spark timestamp range |
#10090 | Replace GPU-unsupported \z with an alternative RLIKE expression |
#10095 | Temporarily fix date format failed cases for non-UTC time zone. |
#9999 | Add some odd time zones for timezone transition tests |
#9962 | Add 3.5.1-SNAPSHOT Shim |
#10071 | Cleanup usage of non-utc configuration here |
#10057 | Add support for StringConcatFactory.makeConcatWithConstants (#9555) |
#9996 | Test full timestamp output range in PySpark |
#10081 | Add a fallback Cloudera Maven repo URL [skip ci] |
#10065 | Improve host memory spill interfaces |
#10069 | Revert "Support split broadcast join condition into ast and non-ast [… |
#10070 | Fix 332db build failure |
#10060 | Fix failed cases for non-utc time zone |
#10038 | Remove spark.rapids.sql.nonUTC.enabled configuration option |
#10059 | Fixed Failing ToPrettyStringSuite Test for 3.5.0 |
#10013 | Extended configuration of OOM injection mode |
#10052 | Set seed=0 for some integration test cases |
#10053 | Remove invalid user from CODEOWNER file [skip ci] |
#10049 | Fix out of range error from pySpark in test_timestamp_millis and other two integration test cases |
#9721 | Support date_format via Gpu for non-UTC time zone |
#9470 | Use float to string kernel |
#9845 | Use parse_url kernel for HOST parsing |
#10024 | Support hour minute second for non-UTC time zone |
#9973 | Batching support for row-based bounded window functions |
#10042 | Update tests to not have hard coded fallback when not needed |
#9816 | Support unix_timestamp and to_unix_timestamp with non-UTC timezones (non-DST) |
#9902 | Some refactor for the Python UDF code |
#10023 | GPU supports yyyyMMdd format by post process for the from_unixtime function |
#10033 | Remove GpuToTimestampImproved and spark.rapids.sql.improvedTimeOps.enabled |
#10016 | Fix infinite loop in test_str_to_map_expr_random_delimiters |
#9481 | Use parse_url kernel for PROTOCOL parsing |
#10030 | Update links in shims.md |
#10015 | Fix array_transform to not recompute the argument |
#10011 | Add cpu oom retry split handling to InternalRowToColumnarBatchIterator |
#10019 | Fix auto merge conflict 10010 [skip ci] |
#9760 | Support split broadcast join condition into ast and non-ast |
#9827 | Enable ORC timestamp and decimal predicate push down tests |
#10002 | Use Spark 3.3.3 instead of 3.3.2 for Scala 2.13 premerge builds |
#10000 | Optimize from_unixtime |
#10003 | Fix merge conflict with branch-23.12 |
#9984 | Fix 340+(including DB341+) does not support casting date to integral/float |
#9972 | Fix year 0 is out of range in test_from_json_struct_timestamp |
#9814 | Support from_unixtime via Gpu for non-UTC time zone |
#9929 | Add host memory retries for GeneratedInternalRowToCudfRowIterator |
#9957 | Update cases for cast between integral and (date/time) |
#9959 | Append new authorized user to blossom-ci whitelist [skip ci] |
#9942 | Fix a potential data corruption for Pandas UDF |
#9922 | Fix allowMultipleJars recommend setting message |
#9947 | Fix merge conflict with branch-23.12 |
#9908 | Register default allocator for host memory |
#9944 | Fix Java OOM caused by incorrect state of shouldCapture when exception occurred |
#9937 | Refactor to use CLASSIFIER instead of CUDA_CLASSIFIER [skip ci] |
#9904 | Params for build and test CI scripts on Databricks |
#9719 | Support fine grained timezone checker instead of type based |
#9918 | Prevent generation of 'year 0 is out of range' strings in IT |
#9852 | Avoid generating duplicate nan keys with MapGen(FloatGen) |
#9674 | Add cache action to speed up mvn workflow [skip ci] |
#9900 | Revert "Remove Databricks 13.3 from release 23.12 (#9890)" |
#9889 | Fix test_cast_string_ts_valid_format test |
#9888 | Update nightly build and deploy script for arm artifacts [skip ci] |
#9833 | Fix a hang for Pandas UDFs on DB 13.3 |
#9656 | Update for new retry state machine JNI APIs |
#9654 | Detect multiple jars on the classpath when init plugin |
#9857 | Skip redundant steps in nightly build [skip ci] |
#9812 | Update JNI and private dep version to 24.02.0-SNAPSHOT |
#9716 | Initiate project version 24.02.0-SNAPSHOT |