RaptorX: Why only use hash of path to choose preferred worker? #15720

baibaichen · 2021-02-15T03:35:34Z

baibaichen
Feb 15, 2021

Hi

From reading "RaptorX: Building a 10X Faster Presto", I learned that Presto has already implemented a local cache, which is a nice feature. I am interested in how a split is assigned to workers. From the code:

    @Override
    public List<HostAddress> getPreferredNodes(List<HostAddress> sortedCandidates)
    {
        if (sortedCandidates == null || sortedCandidates.isEmpty()) {
            throw new PrestoException(NO_NODES_AVAILABLE, "sortedCandidates is null or empty for HiveSplit");
        }

        if (getNodeSelectionStrategy() == SOFT_AFFINITY) {
            // Use + 1 as secondary hash for now, would always get a different position from the first hash.
            int size = sortedCandidates.size();
            int mod = path.hashCode() % size;
            int position = mod < 0 ? mod + size : mod;
            return ImmutableList.of(
                    sortedCandidates.get(position),
                    sortedCandidates.get((position + 1) % size));
        }
        return addresses;
    }

It surprises me that Presto uses only the hash of the path to determine the preferred nodes. If a file is large enough to be divided into multiple splits, all of those splits prefer the same worker, so the worker caching that file can become overloaded, which makes the cache a little less efficient.
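
To make the concern concrete, here is a minimal standalone sketch of the same path-based selection, with plain strings standing in for `HostAddress`. The class name `PathAffinitySketch`, the method signature, and the use of `Math.floorMod` are assumptions for illustration; this is not the Presto code itself. Because only the path is hashed, every split of a file gets the same two preferred workers regardless of its offset.

```java
import java.util.List;

// Illustrative only: strings stand in for Presto's HostAddress.
final class PathAffinitySketch {
    private PathAffinitySketch() {}

    // Returns the primary and secondary preferred workers for a file path.
    static List<String> preferredNodes(String path, List<String> sortedCandidates) {
        if (sortedCandidates == null || sortedCandidates.isEmpty()) {
            throw new IllegalArgumentException("sortedCandidates is null or empty");
        }
        int size = sortedCandidates.size();
        // floorMod keeps the index non-negative even when hashCode() is negative.
        int position = Math.floorMod(path.hashCode(), size);
        return List.of(
                sortedCandidates.get(position),
                sortedCandidates.get((position + 1) % size));
    }

    public static void main(String[] args) {
        List<String> workers = List.of("worker-0", "worker-1", "worker-2", "worker-3");
        // The split offset never enters the computation, so all splits of this file agree.
        System.out.println(preferredNodes("hdfs://warehouse/t/part-0.orc", workers));
    }
}
```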

shixuan-fan · 2021-02-16T19:14:58Z

shixuan-fan
Feb 16, 2021
Collaborator

Hi @baibaichen, this is a good question. With soft affinity split scheduling, we do the following:

1. By default, we schedule the split to the primary preferred worker.
2. If the primary preferred worker is overloaded (controlled by node-scheduler.max-splits-per-node), we schedule the split to the secondary preferred worker.
3. If both are overloaded, we schedule the split to the least loaded worker (the default split scheduling behavior) and turn off caching for it so that the worker's cache is not polluted.
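
The fallback order above can be sketched roughly as follows. This is a simplified model, not Presto's scheduler: the class `SoftAffinityFallbackSketch`, the `Placement` type, the per-worker split-count map, and the exact overload comparison are all assumptions made for illustration, and `maxSplitsPerNode` merely stands in for `node-scheduler.max-splits-per-node`.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of the soft-affinity fallback; not Presto's actual classes.
final class SoftAffinityFallbackSketch {
    // Where a split goes, and whether the worker should cache the data it reads.
    static final class Placement {
        final String worker;
        final boolean cacheable;

        Placement(String worker, boolean cacheable) {
            this.worker = worker;
            this.cacheable = cacheable;
        }
    }

    static Placement place(List<String> preferred, Map<String, Integer> splitsPerWorker, int maxSplitsPerNode) {
        if (splitsPerWorker.isEmpty()) {
            throw new IllegalArgumentException("no workers available");
        }
        // Try the primary preferred worker first, then the secondary.
        // Assumption: a worker counts as overloaded once it holds maxSplitsPerNode splits;
        // a worker missing from the map is treated as idle.
        for (String worker : preferred) {
            if (splitsPerWorker.getOrDefault(worker, 0) < maxSplitsPerNode) {
                return new Placement(worker, true);
            }
        }
        // Both preferred workers are overloaded: pick the least loaded worker
        // and disable caching so its cache is not polluted.
        String leastLoaded = null;
        int minSplits = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> entry : splitsPerWorker.entrySet()) {
            if (leastLoaded == null || entry.getValue() < minSplits) {
                leastLoaded = entry.getKey();
                minSplits = entry.getValue();
            }
        }
        return new Placement(leastLoaded, false);
    }
}
```

Only the third branch returns `cacheable = false`, which mirrors the point that a split landing on an arbitrary worker should not fill that worker's cache.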

@kewang1024 might have a better idea on how it works than me :)

Of course, the number of splits is just a heuristic for the workload on a worker. There have been initiatives that tried to treat worker CPU/memory/IO utilization as resources and use them to determine split placement.


baibaichen · 2021-02-17T06:30:10Z

baibaichen
Feb 17, 2021
Author

I know that. I am just curious why the split's hash isn't used to choose the preferred worker.

The only reason I can think of is the footer. For example, Parquet stores its metadata in the footer, so the current approach makes footer caching simpler.

4 replies

shixuan-fan Feb 17, 2021
Collaborator

The footer is one part of it, because otherwise you would need to cache the same footer on multiple workers. In general that is fine, though, because the footer is relatively small.

A split's hash does not work well in the following case: one split reads file A from 0 to 1000, and another split (potentially from a different query) reads the same file from, say, 0 to 1001. The two splits would hash differently even though they cover almost the same bytes, so they could be sent to different workers. We treat the file as the granularity because currently one split contains only one file.
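
A small sketch of that case, assuming a hypothetical split key built from the path and byte range with `Objects.hash`. The class and method names here are made up for illustration and do not come from Presto.

```java
import java.util.Objects;

// Hypothetical comparison of path-based and split-based worker selection.
final class SplitHashSketch {
    private SplitHashSketch() {}

    // Index chosen from the file path only (the current behavior described in this thread).
    static int byPath(String path, int workerCount) {
        return Math.floorMod(path.hashCode(), workerCount);
    }

    // Index chosen from an assumed split key of (path, start, end).
    static int bySplit(String path, long start, long end, int workerCount) {
        return Math.floorMod(Objects.hash(path, start, end), workerCount);
    }

    public static void main(String[] args) {
        String path = "hdfs://warehouse/t/fileA.orc";
        int workers = 4;
        // Nearly identical ranges produce unrelated split-based indexes...
        System.out.println(bySplit(path, 0, 1000, workers));
        System.out.println(bySplit(path, 0, 1001, workers));
        // ...while the path-based index is the same for both.
        System.out.println(byPath(path, workers));
    }
}
```

Different split keys do not always map to different workers, but nothing keeps them together, whereas the path-based index is identical by construction.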

baibaichen Feb 18, 2021
Author

> We treat file as the granularity because currently one split only contains one file.

That sounds reasonable, but it requires us to generate files carefully. For Parquet, how about one row group per split?

shixuan-fan Feb 18, 2021
Collaborator

I'm not familiar with Parquet, but I assume it is similar to ORC.

There could be multiple row groups per split, and we don't necessarily want to schedule one row group per split (more splits mean more overhead).
Accessing row groups during scheduling means the coordinator would need to actually read the footers, which is a non-trivial change to the abstraction. Currently the coordinator only talks to the metastore and lists files from the underlying storage system. It does not look into file details.

baibaichen Feb 19, 2021
Author

I see, thanks!
