RaptorX: Why only use hash of path to choose preferred worker? #15720
-
Hi @baibaichen, this is a good question. When doing soft affinity split scheduling, what we are doing is:
@kewang1024 might have a better idea of how it works than I do :) Of course, the number of splits is just a heuristic for the workload on a worker. There have been initiatives to take worker CPU, memory, and I/O utilization into account as resources and use them to determine split placement.
-
I know that; I'm just curious why Presto doesn't use the split's hash to choose the preferred worker. The only reason I can think of is the footer: Parquet, for example, stores its metadata in the file footer, so the current approach, which gives every split of a file the same preferred worker, keeps the footer cache simpler.
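To make the footer argument concrete, here is a minimal sketch in Python. It is not Presto code: `stable_hash`, `count_footer_misses`, the worker names, the file path, and the `path#offset` split key are all hypothetical, and it assumes each worker keeps its own footer cache and that every split needs the file's footer once. It counts how many workers would have to fetch the same footer under each hashing choice.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def count_footer_misses(path, offsets, workers, key_fn):
    """Count footer fetches, assuming one footer cache per worker.

    key_fn(path, offset) returns the string that is hashed to pick the
    preferred worker for a split (hypothetical, for illustration only).
    """
    cached = set()  # workers that already hold this file's footer
    misses = 0
    for offset in offsets:
        worker = workers[stable_hash(key_fn(path, offset)) % len(workers)]
        if worker not in cached:
            misses += 1  # this worker must read the footer from storage
            cached.add(worker)
    return misses


workers = [f"worker-{i}" for i in range(4)]
path = "/warehouse/t/part-0.parquet"
offsets = [i * 64 * 2**20 for i in range(16)]  # 16 splits of 64 MiB each

# Hash only the path: every split of the file prefers the same worker.
by_path = count_footer_misses(path, offsets, workers, lambda p, o: p)
# Hash path + offset: splits of one file may prefer different workers.
by_split = count_footer_misses(path, offsets, workers, lambda p, o: f"{p}#{o}")

print("footer fetches, hash(path):", by_path)
print("footer fetches, hash(path + offset):", by_split)
```

With path hashing the footer is fetched once per file. With per-split hashing it can be fetched on as many workers as the file's splits land on, so the same footer may be cached several times across the cluster.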
-
Hi
From reading RaptorX: Building a 10X Faster Presto, I learned that Presto has already implemented a local cache, which is a nice feature. I am interested in how a split is assigned to workers. From the code:
It surprises me that Presto uses only the hash of the path to determine the preferred nodes. If a file is large enough to be divided into multiple splits, the worker that caches that file will be overloaded, which makes the cache a little bit inefficient.
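The concern can be illustrated with a small Python sketch. This is not the Presto implementation: the helper names, the four-worker cluster, the file path, and the `path#offset` key are assumptions made up for illustration, and it looks only at the preferred worker, ignoring any fallback the scheduler applies when that worker is busy. It compares how the splits of one large file are spread when the preferred worker is chosen from the path alone versus from the path plus the split offset.

```python
import hashlib
from collections import Counter


def stable_hash(key: str) -> int:
    """Deterministic hash so results are the same on every run."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def preferred_by_path(path: str, offset: int, workers: list) -> str:
    # Only the file path is hashed; the split offset is ignored.
    return workers[stable_hash(path) % len(workers)]


def preferred_by_split(path: str, offset: int, workers: list) -> str:
    # Hypothetical alternative: hash the path together with the offset.
    return workers[stable_hash(f"{path}#{offset}") % len(workers)]


WORKERS = [f"worker-{i}" for i in range(4)]
PATH = "/warehouse/big_table/part-00000.parquet"
SPLIT_SIZE = 64 * 1024 * 1024
OFFSETS = list(range(0, 64 * SPLIT_SIZE, SPLIT_SIZE))  # 64 splits of one file

path_load = Counter(preferred_by_path(PATH, o, WORKERS) for o in OFFSETS)
split_load = Counter(preferred_by_split(PATH, o, WORKERS) for o in OFFSETS)

print("hash(path):         ", dict(path_load))
print("hash(path + offset):", dict(split_load))
```

In this sketch, path hashing gives all 64 splits the same preferred worker, while hashing per split can spread them over the cluster, at the cost of the file's data and footer being cached on more than one worker.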