[FEA] explore setting spark.locality.wait in the plugin/auto-tuner #12131

Open
revans2 opened this issue Feb 13, 2025 · 0 comments
Labels
feature request (New feature or request)

Comments

@revans2
Collaborator

revans2 commented Feb 13, 2025

Is your feature request related to a problem? Please describe.
The following Spark configs deal with the scheduler waiting to schedule a task in an attempt to get better locality.

  • spark.locality.wait
  • spark.locality.wait.node
  • spark.locality.wait.rack
  • spark.locality.wait.process
  • spark.shuffle.reduceLocality.enabled
  • spark.sql.sources.ignoreDataLocality (internal config)
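
For reference, the effective values can be checked on a running application with a small Scala snippet along these lines (a minimal sketch; the placeholder text printed for unset keys is just illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
val conf = spark.sparkContext.getConf

// The node/rack/process variants fall back to spark.locality.wait when unset,
// and spark.locality.wait itself defaults to 3s.
Seq(
  "spark.locality.wait",
  "spark.locality.wait.node",
  "spark.locality.wait.rack",
  "spark.locality.wait.process",
  "spark.shuffle.reduceLocality.enabled"
).foreach { key =>
  println(s"$key = ${conf.getOption(key).getOrElse("<unset, Spark default applies>")}")
}

// The last config is an internal SQL conf, so read it through the SQL conf API instead.
println("spark.sql.sources.ignoreDataLocality = " +
  spark.conf.getOption("spark.sql.sources.ignoreDataLocality").getOrElse("<unset>"))
```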

There are a few issues with this for us. The default for all of the wait times is 3 seconds. Also, most modern Spark setups have split storage and compute. Even if you are running on YARN with HDFS on the same nodes, it is likely that the small subset of the YARN cluster your Spark application is allocated will not have much data locality on it.

The primary issue is that, if there is little to no chance a task will actually get node-level locality, a task with locality information attached to it will still sit for at least 3 seconds before it is scheduled. If our average task time is much shorter than that of CPU tasks, this wait time compounds the problem.

In most of our benchmark runs we set spark.locality.wait to 0. But we should run some experiments to see if a different setting might generally be better. The config has millisecond-level granularity, so we could do something like 10ms, where we might still get some locality when it is available but will not slow down tasks much to get it.
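
For example, one candidate configuration for such an experiment might look like the sketch below (10ms is only the illustrative value from above, not a validated recommendation):

```scala
import org.apache.spark.sql.SparkSession

// Keep a token locality wait so node-local scheduling can still win when it is
// immediately available, without stalling short tasks for the default 3s.
val spark = SparkSession.builder()
  .config("spark.locality.wait", "10ms")
  .getOrCreate()
```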

We might also want to look at setting spark.locality.wait.process separately, because this is how long Spark will wait to read cached data from the same process instead of shipping it somewhere else.

I don't think we want to touch the other configs, but I included them for completeness.

First off, we need to see if we can set these in the plugin at all. If we cannot, then this needs to be converted into an auto-tuner-only issue. If we can, we probably also want to see if we can include this in the auto-tuner as well. I would want the plugin to only set it if the user has not already set it, but the auto-tuner can have more leeway in recommending to a customer what to do.
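
If setting it from the plugin turns out to be possible, the logic would presumably be a respect-the-user check along these lines (a minimal sketch using Spark's public DriverPlugin API; the class name is hypothetical, and whether a value changed at this point is actually honored by the scheduler is exactly what needs to be verified first):

```scala
import java.util.{Collections, Map => JMap}

import org.apache.spark.SparkContext
import org.apache.spark.api.plugin.{DriverPlugin, PluginContext}

// Hypothetical driver-side hook: only override spark.locality.wait when the
// user has not set it explicitly, leaving explicit user choices alone.
class LocalityWaitTuning extends DriverPlugin {
  override def init(sc: SparkContext, ctx: PluginContext): JMap[String, String] = {
    val conf = sc.getConf
    if (!conf.contains("spark.locality.wait")) {
      conf.set("spark.locality.wait", "0ms") // candidate default; see experiments above
    }
    Collections.emptyMap[String, String]()
  }
}
```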

@revans2 added the "? - Needs Triage" and "feature request" labels Feb 13, 2025
@mattahrens removed the "? - Needs Triage" label Feb 18, 2025