forked from apache/spark
[SPARK-49872] Spark History UI -- StreamConstraintsException: String length (20054016) exceeds the maximum length (20000000) #6
Closed
…MP_1142

### What changes were proposed in this pull request?

This PR proposes to assign proper error condition & sqlstate for _LEGACY_ERROR_TEMP_1142

### Why are the changes needed?

To improve the error message by assigning proper error condition and SQLSTATE

### Does this PR introduce _any_ user-facing change?

No, only user-facing error message improved

### How was this patch tested?

Updated the existing tests

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#48451 from itholic/SPARK-49952.

Authored-by: Haejoon Lee <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
…metic overflow

### What changes were proposed in this pull request?

Improvement on default branch for try suggestion.

### Why are the changes needed?

When we hit default branch in codeGen, we need to return a default value that would specify that we do not know the function, and not just a blank string.

### Does this PR introduce _any_ user-facing change?

Yes.

### How was this patch tested?

No branch hits this behaviour so far, but we need to safeguard from the possible errors.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#48450 from mihailom-db/binaryArithmeticOverflowFollowup.

Authored-by: Mihailo Milosevic <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
…ng JSON string RDD

### What changes were proposed in this pull request?

This is a followup of apache#42979 , to fix a regression. For the `spark.read.json(rdd)` API, there is never corrupted file, and we should not fail if the string value is null with non-failfast parsing mode. This PR is a partial revert of apache#42979 , to not treat `RuntimeException` as corrupted file when we are not reading from files.

### Why are the changes needed?

A query used to work in 3.5 should still work in 4.0

### Does this PR introduce _any_ user-facing change?

no as this regression is not released yet.

### How was this patch tested?

new test

### Was this patch authored or co-authored using generative AI tooling?

no

Closes apache#48453 from cloud-fan/json.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
### What changes were proposed in this pull request?

Implement support for hashing and comparison for trim collation.

### Why are the changes needed?

To have full support for trim collation.

### How was this patch tested?

Add tests in CollationFactorySuite and CollationSqlExpressionSuite.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#48386 from jovanpavl-db/implement_hashing.

Authored-by: Jovan Pavlovic <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
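The reason hashing and comparison have to be implemented together can be sketched in plain Python. This is not Spark code: `RTrimKey` is a hypothetical wrapper, and it assumes, for illustration only, a collation that ignores trailing spaces. The point is that whatever normalization decides equality must also feed the hash, otherwise hash-based operations (grouping, joins, sets) would treat equal strings as distinct.

```py
class RTrimKey:
    """Hypothetical key type: equality ignores trailing spaces (an assumption)."""

    def __init__(self, value: str) -> None:
        self.value = value

    def _normalized(self) -> str:
        # Single normalization shared by comparison and hashing.
        return self.value.rstrip(" ")

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, RTrimKey):
            return NotImplemented
        return self._normalized() == other._normalized()

    def __hash__(self) -> int:
        # Must agree with __eq__: equal keys produce equal hashes.
        return hash(self._normalized())


# Equal under the collation, so a set keeps only one of them.
distinct = {RTrimKey("spark"), RTrimKey("spark   ")}
```

Leading spaces are still significant in this sketch, so `RTrimKey(" spark")` and `RTrimKey("spark")` remain different keys.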
### What changes were proposed in this pull request?

Support box plots with plotly backend on both Spark Connect and Spark classic.

### Why are the changes needed?

While Pandas on Spark supports plotting, PySpark currently lacks this feature. The proposed API will enable users to generate visualizations. This will provide users with an intuitive, interactive way to explore and understand large datasets directly from PySpark DataFrames, streamlining the data analysis workflow in distributed environments.

See more at [PySpark Plotting API Specification](https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit?usp=sharing) in progress.

Part of https://issues.apache.org/jira/browse/SPARK-49530.

### Does this PR introduce _any_ user-facing change?

Yes. Box plots are supported as shown below.

```py
>>> data = [
...     ("A", 50, 55),
...     ("B", 55, 60),
...     ("C", 60, 65),
...     ("D", 65, 70),
...     ("E", 70, 75),
...     # outliers
...     ("F", 10, 15),
...     ("G", 85, 90),
...     ("H", 5, 150),
... ]
>>> columns = ["student", "math_score", "english_score"]
>>> sdf = spark.createDataFrame(data, columns)
>>> fig1 = sdf.plot.box(column=["math_score", "english_score"])
>>> fig1.show()  # see below
>>> fig2 = sdf.plot(kind="box", column="math_score")
>>> fig2.show()  # see below
```

fig1:
![newplot (17)](https://github.com/user-attachments/assets/8c36c344-f6de-47e3-bd63-c0f3b57efc43)

fig2:
![newplot (18)](https://github.com/user-attachments/assets/9b7b60f6-58ec-4eff-9544-d5ab88a88631)

### How was this patch tested?

Unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#48447 from xinrong-meng/box.

Authored-by: Xinrong Meng <[email protected]>
Signed-off-by: Xinrong Meng <[email protected]>
### What changes were proposed in this pull request?

This PR aims to upgrade ASM from `9.7` to `9.7.1`.

### Why are the changes needed?

- xbean-asm9-shaded 4.26 upgrade to use `ASM 9.7.1` and `ASM 9.7.1` is for `Java 24`.
  apache/geronimo-xbean#41
- https://asm.ow2.io/versions.html
  <img width="809" alt="image" src="https://github.com/user-attachments/assets/6ca57af9-2b03-467f-9a31-31b6d7eb4d53">

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#48465 from panbingkun/SPARK-49965.

Authored-by: panbingkun <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
### What changes were proposed in this pull request?

This test improves a unit test case where json strings with duplicate keys are prohibited by checking the cause of the exception instead of just the root exception.

### Why are the changes needed?

Earlier, the test only checked the top error class but not the cause of the error which should be `VARIANT_DUPLICATE_KEY`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?

NA

Closes apache#48464 from harshmotw-db/harshmotw-db/minor_test_fix.

Authored-by: Harsh Motwani <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
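The testing idea described above, asserting on the cause of a wrapped exception rather than only on the outer one, can be illustrated with a small standalone Python sketch. It does not use Spark or its test suites: `DuplicateKeyError`, `reject_duplicates`, and `parse_strict` are hypothetical names, and the standard `json` module stands in for the variant parser.

```py
import json


class DuplicateKeyError(ValueError):
    """Hypothetical stand-in for a specific duplicate-key error condition."""


def reject_duplicates(pairs):
    # Called by json.loads for every JSON object with its (key, value) pairs.
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise DuplicateKeyError(key)
        seen[key] = value
    return seen


def parse_strict(text: str):
    try:
        return json.loads(text, object_pairs_hook=reject_duplicates)
    except DuplicateKeyError as exc:
        # Wrap in a generic outer error, keeping the specific one as the cause.
        raise RuntimeError("failed to parse JSON") from exc


# A test that only checks for RuntimeError would pass for any parse failure;
# checking __cause__ pins down the actual reason.
try:
    parse_strict('{"a": 1, "a": 2}')
    raised = None
except RuntimeError as exc:
    raised = exc
```

After this runs, `raised.__cause__` is the `DuplicateKeyError`, which is the part a stricter test would assert on.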
### What changes were proposed in this pull request?

In this PR, I propose to disallow collated strings in `collect_set` expression.

### Why are the changes needed?

Proposed changes are necessary in order to achieve correct behavior of the expressions mentioned above.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

This patch was tested by modifying existing test case in `CollationSQLExpressionSuite`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#48456 from vladanvasi-db/vladanvasi-db/collect-set-collated-disablement.

Authored-by: Vladan Vasić <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
Why are the changes needed?
Does this PR introduce any user-facing change?
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?