Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change AQE default minPartitionNum to defaultParallelism #674

Merged
merged 6 commits into from
May 1, 2020

Conversation

rshkv
Copy link

@rshkv rshkv commented Apr 29, 2020

Applying changes from SPARK-31124 / apache/spark#27879.

What changes were proposed in this pull request?

Authoritative description on apache/spark#27879.

Summary: When coalescing partitions in AQE, the minimum is no longer 1 but default parallelism. The default parallelism is spark.default.parallelism when set or the number of available CPUs at any given point in time. This ensures that we don't lose parallelism by applying AQE.

Why didn't you just take the upstream commit? The AQE framework was refactored in SPARK-31037, which in turn relies on changes we haven't picked yet. And the sum of those was more than I felt comfortable with taking in one batch. So yes this is lame. We can just revert this later.

How was this patch tested? I wrote a unit test which ensures that the partition number is the default parallelism value where it would otherwise be 1.

@rshkv rshkv requested a review from robert3005 April 30, 2020 10:30
@@ -523,6 +525,38 @@ class ReduceNumShufflePartitionsSuite extends SparkFunSuite with BeforeAndAfterA
}
}

// TODO(rshkv): Remove after taking SPARK-31124

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please file an issue and reference it here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robert3005 robert3005 changed the title Change AQE default minPartitionNum to defaultParallelism [SPARK-31124] Change AQE default minPartitionNum to defaultParallelism Apr 30, 2020
@robert3005 robert3005 changed the title [SPARK-31124] Change AQE default minPartitionNum to defaultParallelism Change AQE default minPartitionNum to defaultParallelism Apr 30, 2020
@robert3005
Copy link

LGTM 👍

@rshkv rshkv merged commit da8a7ff into master May 1, 2020
@rshkv rshkv deleted the wr/aqe-min-parallelism branch May 1, 2020 10:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants