[SPARK-49872] Spark History UI -- StreamConstraintsException: String length (20054016) exceeds the maximum length (20000000) #6

roczei · 2024-10-15T14:35:26Z

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

…MP_1142

### What changes were proposed in this pull request?

This PR proposes to assign proper error condition & sqlstate for _LEGACY_ERROR_TEMP_1142

### Why are the changes needed?

To improve the error message by assigning proper error condition and SQLSTATE

### Does this PR introduce _any_ user-facing change?

No, only user-facing error message improved

### How was this patch tested?

Updated the existing tests

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#48451 from itholic/SPARK-49952.

Authored-by: Haejoon Lee <[email protected]>
Signed-off-by: Max Gekk <[email protected]>

…metic overflow

### What changes were proposed in this pull request?

Improvement on default branch for try suggestion.

### Why are the changes needed?

When we hit default branch in codeGen, we need to return a default value that would specify that we do not know the function, and not just a blank string.

### Does this PR introduce _any_ user-facing change?

Yes.

### How was this patch tested?

No branch hits this behaviour so far, but we need to safeguard from the possible errors.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#48450 from mihailom-db/binaryArithmeticOverflowFollowup.

Authored-by: Mihailo Milosevic <[email protected]>
Signed-off-by: Max Gekk <[email protected]>

…ng JSON string RDD

### What changes were proposed in this pull request?

This is a followup of apache#42979 to fix a regression. For the `spark.read.json(rdd)` API there is never a corrupted file, and we should not fail if the string value is null in a non-failfast parsing mode. This PR is a partial revert of apache#42979, so that `RuntimeException` is not treated as a corrupted file when we are not reading from files.

### Why are the changes needed?

A query that used to work in 3.5 should still work in 4.0.

### Does this PR introduce _any_ user-facing change?

No, as this regression is not released yet.

### How was this patch tested?

New test.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#48453 from cloud-fan/json.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Max Gekk <[email protected]>

### What changes were proposed in this pull request?

Implement support for hashing and comparison for trim collation.

### Why are the changes needed?

To have full support for trim collation.

### How was this patch tested?

Add tests in CollationFactorySuite and CollationSqlExpressionSuite.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#48386 from jovanpavl-db/implement_hashing.

Authored-by: Jovan Pavlovic <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
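The commit above says only that hashing and comparison are supported for trim collation. As a rough illustration of why the two have to be implemented together, here is a minimal Python sketch. It is not Spark's implementation: the function names are hypothetical, and the trimming rule (ignore trailing ASCII spaces) is an assumption made for the example. The point it shows is that values which compare equal under a collation must also hash to the same value, which is easiest to guarantee by deriving both from one normalized key.

```python
# Illustrative sketch only; not Spark's collation code.
# Assumption: the collation ignores trailing ASCII spaces.
# All function names here are hypothetical.


def collation_key(value: str) -> str:
    """Normalize a string so values equal under the collation share one key."""
    return value.rstrip(" ")


def collation_equals(left: str, right: str) -> bool:
    """Equality under the collation: compare the normalized keys."""
    return collation_key(left) == collation_key(right)


def collation_compare(left: str, right: str) -> int:
    """Three-way comparison on normalized keys: -1, 0, or 1."""
    left_key, right_key = collation_key(left), collation_key(right)
    return (left_key > right_key) - (left_key < right_key)


def collation_hash(value: str) -> int:
    """Hash the normalized key so equal values always hash alike."""
    return hash(collation_key(value))
```

Because every operation goes through `collation_key`, a hash-based operator (grouping, hash join) and a sort-based operator would agree on which strings count as the same value.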

### What changes were proposed in this pull request?

Support box plots with plotly backend on both Spark Connect and Spark classic.

### Why are the changes needed?

While Pandas on Spark supports plotting, PySpark currently lacks this feature. The proposed API will enable users to generate visualizations. This will provide users with an intuitive, interactive way to explore and understand large datasets directly from PySpark DataFrames, streamlining the data analysis workflow in distributed environments.

See more at [PySpark Plotting API Specification](https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit?usp=sharing) in progress.

Part of https://issues.apache.org/jira/browse/SPARK-49530.

### Does this PR introduce _any_ user-facing change?

Yes. Box plots are supported as shown below.

```py
>>> data = [
...     ("A", 50, 55),
...     ("B", 55, 60),
...     ("C", 60, 65),
...     ("D", 65, 70),
...     ("E", 70, 75),
...     # outliers
...     ("F", 10, 15),
...     ("G", 85, 90),
...     ("H", 5, 150),
... ]
>>> columns = ["student", "math_score", "english_score"]
>>> sdf = spark.createDataFrame(data, columns)
>>> fig1 = sdf.plot.box(column=["math_score", "english_score"])
>>> fig1.show()  # see below
>>> fig2 = sdf.plot(kind="box", column="math_score")
>>> fig2.show()  # see below
```

fig1:
![newplot (17)](https://github.com/user-attachments/assets/8c36c344-f6de-47e3-bd63-c0f3b57efc43)

fig2:
![newplot (18)](https://github.com/user-attachments/assets/9b7b60f6-58ec-4eff-9544-d5ab88a88631)

### How was this patch tested?

Unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#48447 from xinrong-meng/box.

Authored-by: Xinrong Meng <[email protected]>
Signed-off-by: Xinrong Meng <[email protected]>

### What changes were proposed in this pull request?

This PR aims to upgrade ASM from `9.7` to `9.7.1`.

### Why are the changes needed?

- xbean-asm9-shaded 4.26 upgrade to use `ASM 9.7.1` and `ASM 9.7.1` is for `Java 24`.
  apache/geronimo-xbean#41
- https://asm.ow2.io/versions.html

<img width="809" alt="image" src="https://github.com/user-attachments/assets/6ca57af9-2b03-467f-9a31-31b6d7eb4d53">

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#48465 from panbingkun/SPARK-49965.

Authored-by: panbingkun <[email protected]>
Signed-off-by: yangjie01 <[email protected]>

### What changes were proposed in this pull request?

This PR improves a unit test case in which JSON strings with duplicate keys are prohibited. The test now checks the cause of the exception instead of just the root exception.

### Why are the changes needed?

Earlier, the test only checked the top error class but not the cause of the error, which should be `VARIANT_DUPLICATE_KEY`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?

NA

Closes apache#48464 from harshmotw-db/harshmotw-db/minor_test_fix.

Authored-by: Harsh Motwani <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
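The idea of asserting on the cause of an exception, rather than only on the outermost error, can be sketched in plain Python. This is not the Spark test itself: `DuplicateKeyError`, `ParseError`, and `parse_strict` are hypothetical names invented for the example, with `DuplicateKeyError` standing in for a condition like `VARIANT_DUPLICATE_KEY`. The sketch relies only on the standard `json` module and Python's `raise ... from ...` exception chaining.

```python
import json


class DuplicateKeyError(ValueError):
    """Hypothetical stand-in for a specific condition such as VARIANT_DUPLICATE_KEY."""


class ParseError(Exception):
    """Hypothetical top-level error that wraps a more specific cause."""


def _reject_duplicates(pairs):
    # json.loads passes each object's (key, value) pairs here, in document order.
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise DuplicateKeyError(key)
        seen[key] = value
    return seen


def parse_strict(text):
    """Parse JSON, wrapping a duplicate-key failure in a generic ParseError."""
    try:
        return json.loads(text, object_pairs_hook=_reject_duplicates)
    except DuplicateKeyError as exc:
        # "from exc" records the specific error as __cause__ on the wrapper.
        raise ParseError("failed to parse JSON") from exc
```

A test that only checks for `ParseError` would pass for any parse failure. Checking `err.__cause__` as well confirms that the failure was specifically the duplicate key:

```python
try:
    parse_strict('{"a": 1, "a": 2}')
except ParseError as err:
    assert isinstance(err.__cause__, DuplicateKeyError)
```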

### What changes were proposed in this pull request?

In this PR, I propose to disallow collated strings in `collect_set` expression.

### Why are the changes needed?

Proposed changes are necessary in order to achieve correct behavior of the expressions mentioned above.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

This patch was tested by modifying existing test case in `CollationSQLExpressionSuite`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#48456 from vladanvasi-db/vladanvasi-db/collect-set-collated-disablement.

Authored-by: Vladan Vasić <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

itholic and others added 9 commits October 14, 2024 15:26

Fix the jackson issue

10e932d

github-actions bot added BUILD SQL PYTHON labels Oct 15, 2024

roczei closed this Oct 15, 2024
