make maxCpuBatchSize in GpuPartitioning configurable #11929
base: branch-25.02
Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
build
val SHUFFLE_PARTITIONING_MAX_CPU_BATCH_SIZE =
  conf("spark.rapids.shuffle.partitioning.maxCpuBatchSize")
    .doc("The maximum size of a sliced batch output to the CPU side " +
      "when GPU partitioning shuffle data. This is used to limit the peak on-heap memory used " +
One thing I would point out is that this wasn't introduced to limit peak memory per se; it was introduced because we cannot go above this number (#45). If we are going to make it configurable, we should perhaps add a max check so we don't go above it. The comment could reflect this constraint.
+1 on the internal comment from @winningsix
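The max check suggested above could take roughly the following shape. This is a minimal, self-contained sketch, not the spark-rapids conf builder API: `MaxCpuBatchSizeConf`, `validate`, and `HardLimit` are hypothetical names, the value is assumed to be in bytes, and `HardLimit` is only a placeholder because the actual ceiling referenced in #45 is not stated in this thread.

```scala
import scala.util.Try

// Hypothetical sketch of the suggested max check; not the spark-rapids conf API.
object MaxCpuBatchSizeConf {
  val Key = "spark.rapids.shuffle.partitioning.maxCpuBatchSize"

  // Placeholder ceiling (assumption): the real limit referenced in #45 is not
  // given in this thread, so Int.MaxValue bytes stands in for it here.
  val HardLimit: Long = Int.MaxValue.toLong

  /** Parses a raw setting and accepts it only if it lies in (0, HardLimit]. */
  def validate(raw: String): Either[String, Long] =
    Try(raw.trim.toLong).toOption match {
      case None                     => Left(s"$Key must be an integer, got '$raw'")
      case Some(v) if v <= 0        => Left(s"$Key must be positive, got $v")
      case Some(v) if v > HardLimit => Left(s"$Key must not exceed $HardLimit, got $v")
      case Some(v)                  => Right(v)
    }
}
```

Rejecting an out-of-range value when the configuration is read, rather than silently clamping it, makes a misconfiguration visible to the user; clamping to `HardLimit` would be the softer alternative.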
"It was introduced because we cannot go above this number": yes, I'm aware of this too, but it happens that we can take advantage of it to limit peak memory. I changed the wording from "This is used to" to "This can be used to" to be more precise.
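To illustrate how a per-slice size cap can double as a bound on peak on-heap memory, here is a minimal sketch that splits a batch into contiguous row ranges whose total byte size stays at or below the cap. It assumes per-row sizes are known in advance; `CpuBatchSlicer` and `sliceRanges` are hypothetical and do not reflect how `GpuPartitioning` actually slices batches.

```scala
// Hypothetical sketch: not GpuPartitioning's real slicing logic.
// Assumes the byte size of every row is known before slicing.
object CpuBatchSlicer {
  /** Returns (startRow, endRowExclusive) pairs covering all rows in order. */
  def sliceRanges(rowSizes: IndexedSeq[Long], maxCpuBatchSize: Long): Seq[(Int, Int)] = {
    require(maxCpuBatchSize > 0, "maxCpuBatchSize must be positive")
    val ranges = scala.collection.mutable.ArrayBuffer.empty[(Int, Int)]
    var start = 0
    var current = 0L
    for (i <- rowSizes.indices) {
      val size = rowSizes(i)
      // Close the current slice if adding this row would exceed the cap,
      // unless the slice is still empty (an oversized row must go somewhere).
      if (i > start && current + size > maxCpuBatchSize) {
        ranges += ((start, i))
        start = i
        current = 0L
      }
      current += size
    }
    // Emit the trailing slice, if any rows remain.
    if (start < rowSizes.length) ranges += ((start, rowSizes.length))
    ranges.toSeq
  }
}
```

For row sizes `40, 40, 40, 100, 10` and a cap of 100, this yields the ranges `(0,2), (2,3), (3,4), (4,5)`: each slice, and therefore each CPU-side buffer, holds at most 100 bytes unless a single row alone exceeds the cap, in which case that row gets a slice of its own.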
@@ -1976,6 +1976,17 @@ val SHUFFLE_COMPRESSION_LZ4_CHUNK_SIZE = conf("spark.rapids.shuffle.compression.
    .integerConf
    .createWithDefault(20)

  val SHUFFLE_PARTITIONING_MAX_CPU_BATCH_SIZE =
Copyright needs to be updated.
Fixed. I have a stupid question: why don't we update the license headers of all files in one batch? It's troublesome to take care of the license header every time.
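On the question of updating headers in bulk: a minimal sketch of the per-file text rewrite such a batch job would need. It assumes headers of the form `Copyright (c) 2019-2024, NVIDIA CORPORATION.`, which is a guess at the project's convention and is not confirmed in this thread. `CopyrightBump` is a hypothetical helper; walking the source tree and writing files back are left out.

```scala
import scala.util.matching.Regex

// Hypothetical helper. Assumed header shapes (not confirmed by the project):
//   "Copyright (c) 2020, NVIDIA CORPORATION."
//   "Copyright (c) 2020-2024, NVIDIA CORPORATION."
object CopyrightBump {
  private val Header: Regex =
    """Copyright \(c\) (\d{4})(?:-(\d{4}))?, NVIDIA CORPORATION""".r

  /** Extends the year range in every matching header so that it ends at `year`. */
  def bump(text: String, year: Int): String =
    Header.replaceAllIn(text, (m: Regex.Match) => {
      val start = m.group(1)
      // group(2) is null when the header has a single year.
      val end = Option(m.group(2)).getOrElse(start)
      val updated =
        if (end.toInt >= year) m.matched // already current: leave untouched
        else s"Copyright (c) $start-$year, NVIDIA CORPORATION"
      // Escape '$' and '\' so the text is inserted literally.
      Regex.quoteReplacement(updated)
    })
}
```

For example, `CopyrightBump.bump("Copyright (c) 2024, NVIDIA CORPORATION.", 2025)` returns `"Copyright (c) 2024-2025, NVIDIA CORPORATION."`, while a header already ending in 2025 is left unchanged.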
This PR closes #11928