forked from NVIDIA/spark-rapids
Rebase Databricks 14.3 feature branch to 24.12 #5
Closed
mythrocks wants to merge 63 commits into razajafri:SP-10661-db-14.3 from mythrocks:databricks-14.3-rebased-to-24.12
…arily (NVIDIA#11469) Signed-off-by: Alessandro Bellina <[email protected]>
…abricks] (NVIDIA#11466) * Switch to a regular try Signed-off-by: Gera Shegalov <[email protected]> * drop Maven tarball Signed-off-by: Gera Shegalov <[email protected]> * unused import Signed-off-by: Gera Shegalov <[email protected]> * repro Signed-off-by: Gera Shegalov <[email protected]> --------- Signed-off-by: Gera Shegalov <[email protected]>
…s temporarily (NVIDIA#11469)" (NVIDIA#11473) This reverts commit 5beeba8. Signed-off-by: Alessandro Bellina <[email protected]>
Signed-off-by: Robert (Bobby) Evans <[email protected]>
Signed-off-by: Robert (Bobby) Evans <[email protected]>
Signed-off-by: Robert (Bobby) Evans <[email protected]>
Signed-off-by: Peixin Li <[email protected]>
Signed-off-by: Robert (Bobby) Evans <[email protected]>
NVIDIA#11449) * Support yyyyMMdd in GetTimestamp operator for LEGACY mode Signed-off-by: Chong Gao <[email protected]> Co-authored-by: Chong Gao <[email protected]>
… [databricks] (NVIDIA#11462) * Support non-UTC timezone for casting from date type to timestamp type Signed-off-by: Chong Gao <[email protected]> Co-authored-by: Chong Gao <[email protected]>
* Install cuDF-py against python 3.10 on Databricks Fix on Databricks runtime for : NVIDIA#11394 Enable the udf_cudf_test test case for Databricks-13.3 Rapids 24.10+ drops python 3.9 or below conda packages. ref: https://docs.rapids.ai/notices/rsn0040/ Install cuDF-py packages against python 3.10 and above on Databricks runtime to run UDF cuDF tests, because on DB-13.3 Conda is not installed by default. Signed-off-by: timl <[email protected]> * Check if 'conda' exists to make the if/else expression more readable Signed-off-by: timl <[email protected]> --------- Signed-off-by: timl <[email protected]>
* add parquet column index ut test Signed-off-by: fejiang <[email protected]> * change Signed-off-by: fejiang <[email protected]> * added parquet suite Signed-off-by: fejiang <[email protected]> * pom changed Signed-off-by: fejiang <[email protected]> * DeltaEncoding Suite Signed-off-by: fejiang <[email protected]> * enable more suites Signed-off-by: fejiang <[email protected]> * remove ignored case Signed-off-by: fejiang <[email protected]> * format Signed-off-by: fejiang <[email protected]> * added ignored cases Signed-off-by: fejiang <[email protected]> * change to parquet hadoop version Signed-off-by: fejiang <[email protected]> * remove parquet.version Signed-off-by: fejiang <[email protected]> * adding scope and classifier Signed-off-by: fejiang <[email protected]> * pom remove unused Signed-off-by: fejiang <[email protected]> * pom chang3 2.13 Signed-off-by: fejiang <[email protected]> * add schema suite Signed-off-by: fejiang <[email protected]> * remove dataframe Signed-off-by: fejiang <[email protected]> * RapidsParquetThriftCompatibilitySuite Signed-off-by: fejiang <[email protected]> * ThriftCompaSuite added Signed-off-by: fejiang <[email protected]> * more suites but the RowIndexSuite one Signed-off-by: fejiang <[email protected]> * formatting issues Signed-off-by: fejiang <[email protected]> * exlude SPARK-36803: Signed-off-by: fejiang <[email protected]> * setting change Signed-off-by: fejiang <[email protected]> * setting change Signed-off-by: fejiang <[email protected]> * adjust order Signed-off-by: fejiang <[email protected]> * adjust settings Signed-off-by: fejiang <[email protected]> * adjust settings Signed-off-by: fejiang <[email protected]> * RapidsParquetThriftCompatibilitySuite settings * known issue added Signed-off-by: fejiang <[email protected]> * format new line Signed-off-by: fejiang <[email protected]> * known issue added Signed-off-by: fejiang <[email protected]> * RapidsParquetDeltaByteArrayEncodingSuite Signed-off-by: fejiang 
<[email protected]> * RapidsParquetAvroCompatibilitySuite Signed-off-by: fejiang <[email protected]> * ParquetFiledIdSchemaSuite and Avro suite added * pom Avro suite modified * ParquetFileFormatSuite added * RapidsParquetRebaseDatetimeSuite and QuerySuite added * RapidsParquetSchemaPruningSuite added * setting adjust Signed-off-by: fejiang <[email protected]> * setting adjust Signed-off-by: fejiang <[email protected]> * UT adjuct exclude added Signed-off-by: fejiang <[email protected]> * RapidsParquetThriftCompatibilitySuite adjust setting Signed-off-by: fejiang <[email protected]> * comment Create parquet table with compression Signed-off-by: fejiang <[email protected]> * SPARK_HOME NOT FOUND issue solved. Signed-off-by: fejiang <[email protected]> * enabling more suite Signed-off-by: fejiang <[email protected]> * remove exclude from RapidsParquetFieldIdIOSuite Signed-off-by: fejiang <[email protected]> * formate and remove parquet files Signed-off-by: fejiang <[email protected]> * comment setting Signed-off-by: fejiang <[email protected]> * pom modified and remove unnecess case Signed-off-by: fejiang <[email protected]> --------- Signed-off-by: fejiang <[email protected]> Signed-off-by: fejiang <[email protected]> Co-authored-by: fejiang <[email protected]>
Keep the rapids JNI and private dependency version at 24.10.0-SNAPSHOT until the nightly CI for the branch-24.12 branch is complete. Track the dependency update process at: NVIDIA#11492 Signed-off-by: nvauto <[email protected]>
…0798) * optimzing Expand+Aggregate in sqlw with many count distinct Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * simplify Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * add comment Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * address comments Signed-off-by: Hongbin Ma (Mahone) <[email protected]> --------- Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
Signed-off-by: Alessandro Bellina <[email protected]>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
To fix: NVIDIA#11502 Download jars using wget instead of 'mvn dependency:get' to fix 'missing intermediate jars' failures, as we stopped deploying these intermediate jars since version 24.10 Signed-off-by: timl <[email protected]>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
* Support legacy mode for yyyymmdd format Signed-off-by: Chong Gao <[email protected]> Co-authored-by: Chong Gao <[email protected]>
* quick workaround to make image build work Signed-off-by: Peixin Li <[email protected]> * use mamba directly --------- Signed-off-by: Peixin Li <[email protected]>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
* add max memory watermark metric Signed-off-by: Zach Puller <[email protected]> --------- Signed-off-by: Zach Puller <[email protected]>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
* Updated parameters to enable file overwriting when dumping. Signed-off-by: ustcfy <[email protected]> * Validate LORE dump root path before execution Signed-off-by: ustcfy <[email protected]> * Add loreOutputRootPathChecked map for tracking lore output root path checks. Signed-off-by: ustcfy <[email protected]> * Delay path and filesystem initialization until actually needed. Signed-off-by: ustcfy <[email protected]> * Add test and update dev/lore.md doc. Signed-off-by: ustcfy <[email protected]> * Format code to ensure line length does not exceed 100 characters Signed-off-by: ustcfy <[email protected]> * Format code to ensure line length does not exceed 100 characters Signed-off-by: ustcfy <[email protected]> * Improved resource management by using withResource. Signed-off-by: ustcfy <[email protected]> * Update docs/dev/lore.md Co-authored-by: Renjie Liu <[email protected]> * Improved resource management by using withResource. Signed-off-by: ustcfy <[email protected]> * Removed for FileSystem instance. Signed-off-by: ustcfy <[email protected]> * Update docs/dev/lore.md Co-authored-by: Gera Shegalov <[email protected]> --------- Signed-off-by: ustcfy <[email protected]> Signed-off-by: ustcfy <[email protected]> Co-authored-by: Renjie Liu <[email protected]> Co-authored-by: Gera Shegalov <[email protected]>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
Signed-off-by: Peixin Li <[email protected]>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
Signed-off-by: Robert (Bobby) Evans <[email protected]>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
…lanca timezone and LEGACY mode (NVIDIA#11567) Signed-off-by: Chong Gao <[email protected]>
…CI (NVIDIA#11544) Signed-off-by: Chong Gao <[email protected]> Co-authored-by: Chong Gao <[email protected]>
…IA#11561) Signed-off-by: Robert (Bobby) Evans <[email protected]>
* implement watermark Signed-off-by: Zach Puller <[email protected]> * consolidate/fix disk spill metric Signed-off-by: Zach Puller <[email protected]> --------- Signed-off-by: Zach Puller <[email protected]>
…DIA#11569) Signed-off-by: Robert (Bobby) Evans <[email protected]>
Signed-off-by: Jason Lowe <[email protected]>
Signed-off-by: Gera Shegalov <[email protected]>
Fix merge conflict with branch-24.10
…A#11559) * Spark 4: Addressed cast_test.py failures. Fixes NVIDIA#11009 and NVIDIA#11530. This commit addresses the test failures in cast_test.py, on Spark 4.0. These generally have to do with changes in behaviour of Spark when ANSI mode is enabled. In these cases, the tests have been split out into ANSI=on and ANSI=off. The bugs uncovered from the tests have been spun into their own issues; fixing all of them was beyond the scope of this change. Signed-off-by: MithunR <[email protected]>
* use task id as tie breaker Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * save threadlocal lookup Signed-off-by: Hongbin Ma (Mahone) <[email protected]> --------- Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
* avoid long tail tasks due to PrioritySemaphore (NVIDIA#11574) * use task id as tie breaker Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * save threadlocal lookup Signed-off-by: Hongbin Ma (Mahone) <[email protected]> --------- Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * addressing jason's comment Signed-off-by: Hongbin Ma (Mahone) <[email protected]> --------- Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
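As a rough illustration of the "use task id as tie breaker" idea in the commit above, the following is a minimal Python sketch, not the plugin's actual `PrioritySemaphore` code. The names `Waiter` and `wake_order` are hypothetical, and the choice that the lower task id wins a tie is an assumption made for the example.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Waiter:
    # heapq is a min-heap, so the priority is negated: the highest
    # priority sorts first. task_id is the second sort key, so ties on
    # priority are broken deterministically (assumed: lower id first).
    neg_priority: int
    task_id: int
    permits: int = field(compare=False)


def wake_order(requests):
    """Return task ids in the order their waiters would be served.

    `requests` is an iterable of (priority, task_id, permits) tuples.
    """
    heap = []
    for priority, task_id, permits in requests:
        heapq.heappush(heap, Waiter(-priority, task_id, permits))
    order = []
    while heap:
        order.append(heapq.heappop(heap).task_id)
    return order


if __name__ == "__main__":
    # Tasks 10 and 30 share priority 5; task 20 has the higher priority 9.
    print(wake_order([(5, 30, 1), (5, 10, 1), (9, 20, 1)]))  # [20, 10, 30]
```

Without the second sort key, waiters with equal priority would have no defined order relative to each other, which is one way a single task could keep losing out and become a long tail.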
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
* Fix collection_ops_tests for Spark 4.0. Fixes NVIDIA#11011.
This commit fixes the failures in `collection_ops_tests` on Spark 4.0.
On all versions of Spark, when a Sequence is collected with rows that exceed MAX_INT, an exception is thrown indicating that the collected Sequence/array is larger than permissible. The different versions of Spark vary in the contents of the exception message.
On Spark 4, one sees that the error message now contains more information than all prior versions, including:
1. The name of the op causing the error
2. The errant sequence size
This commit introduces a shim to make this new information available in the exception.
Note that this shim does not fit cleanly in RapidsErrorUtils, because there are differences within major Spark versions. For instance, Spark 3.4.0-1 have a different message as compared to 3.4.2 and 3.4.3. Likewise, the differences in 3.5.0, 3.5.1, 3.5.2.
Signed-off-by: MithunR <[email protected]>
* Fixed formatting error.
* Review comments. This moves the construction of the long-sequence error strings into RapidsErrorUtils. The process involved introducing many new RapidsErrorUtils classes, and using mix-ins of concrete implementations for the error-string construction.
* Added missing shim tag for 3.5.2.
* Review comments: Fixed code style.
* Reformatting, per project guideline.
* Fixed missed whitespace problem.
---------
Signed-off-by: MithunR <[email protected]>
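The mix-in approach mentioned in the commit above can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the real code is Scala, every class and function name here is hypothetical, the message strings are placeholders rather than Spark's actual wording, and only two versions are mapped for brevity.

```python
class LegacySequenceMessage:
    """Mix-in: message without details (placeholder wording)."""

    def too_long_sequence(self, size, function_name):
        return "sequence is too long"


class DetailedSequenceMessage:
    """Mix-in: message carrying the op name and the sequence size
    (placeholder wording)."""

    def too_long_sequence(self, size, function_name):
        return f"{function_name}: sequence of {size} elements is too long"


class ErrorUtilsBase:
    """Stand-in for error helpers shared by every shim."""


# One concrete error-utils class per shim; each picks the mix-in whose
# message construction matches that Spark version.
class ErrorUtils340(LegacySequenceMessage, ErrorUtilsBase):
    pass


class ErrorUtils400(DetailedSequenceMessage, ErrorUtilsBase):
    pass


# Hypothetical version-to-shim lookup.
SHIMS = {"3.4.0": ErrorUtils340, "4.0.0": ErrorUtils400}


def error_utils_for(spark_version):
    """Instantiate the error-utils class registered for a Spark version."""
    return SHIMS[spark_version]()


if __name__ == "__main__":
    print(error_utils_for("3.4.0").too_long_sequence(10, "sequence"))
    print(error_utils_for("4.0.0").too_long_sequence(10, "sequence"))
```

Keeping the string construction in small mix-ins means versions within the same major line can share or swap a message implementation without duplicating the rest of the error helpers.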
Signed-off-by: liyuan <[email protected]>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
Wait for the pre-merge CI job to SUCCEED Signed-off-by: nvauto <[email protected]>
* Update latest changelog [skip ci] Update change log with CLI: scripts/generate-changelog --token=<GIT_TOKEN> --releases=24.08,24.10 Signed-off-by: nvauto <[email protected]> * Update changelog Signed-off-by: timl <[email protected]> * Update changelog Signed-off-by: timl <[email protected]> --------- Signed-off-by: nvauto <[email protected]> Signed-off-by: timl <[email protected]> Co-authored-by: timl <[email protected]>
…-11604 Fix auto merge conflict 11604 [skip ci]
* xfail regexp tests to unblock CI Signed-off-by: Jason Lowe <[email protected]> * Disable failing regexp unit test to unblock CI --------- Signed-off-by: Jason Lowe <[email protected]>
* Remove an unused config shuffle.spillThreads Signed-off-by: Alessandro Bellina <[email protected]> * update configs.md --------- Signed-off-by: Alessandro Bellina <[email protected]>
Needed minor modifications. Signed-off-by: MithunR <[email protected]>
Closing this PR in favour of NVIDIA#11635.
I have rebased `SP-10661-db-14.3` to `branch-24.12`, if only to make the recent `RapidsErrorUtils` refactor available in this branch.