Design or prototype distributed execution #25

fpetkovski · 2022-09-23T07:26:27Z

We would likely need a logical plan first, but we can already start thinking about distributed query execution and how we can implement it in the engine.

I would avoid adding networking specifics to the engine itself, and defer that responsibility to the library user. This way, a project that uses the engine might decide to inject an implementation based on gRPC, plain HTTP, or some other protocol for communicating between engine instances.

The engine itself would define an interface for inter-engine communication, and would make sure the query is properly decomposed and merged back after all parts of the plan complete successfully.
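To make the idea concrete, here is a rough sketch of what such an interface could look like. Everything in it is hypothetical and not part of the engine: the names `RemoteEngine`, `Sample`, `ExecuteDistributed` and the in-memory transport are assumptions made for illustration, and the "merge" step is a plain concatenation rather than a real plan-aware merge.

```go
package main

import (
	"context"
	"errors"
	"sort"
)

// Sample is a hypothetical, simplified result row: a series identity and one value.
type Sample struct {
	Series string
	Value  float64
}

// RemoteEngine is a hypothetical interface the engine could define.
// The library user supplies the implementation (gRPC, plain HTTP, ...).
type RemoteEngine interface {
	// Execute runs a query fragment on another engine instance.
	Execute(ctx context.Context, fragment string) ([]Sample, error)
}

// inMemoryEngine is a stand-in transport used only for illustration.
type inMemoryEngine struct {
	results map[string][]Sample
}

// Execute looks the fragment up in a fixed table instead of doing any networking.
func (e inMemoryEngine) Execute(_ context.Context, fragment string) ([]Sample, error) {
	res, ok := e.results[fragment]
	if !ok {
		return nil, errors.New("unknown fragment: " + fragment)
	}
	return res, nil
}

// ExecuteDistributed sends the same fragment to every remote engine and
// concatenates the partial results, sorted by series. It returns an error
// as soon as any part fails, so a result is only produced when all parts
// complete successfully.
func ExecuteDistributed(ctx context.Context, engines []RemoteEngine, fragment string) ([]Sample, error) {
	var merged []Sample
	for _, e := range engines {
		part, err := e.Execute(ctx, fragment)
		if err != nil {
			return nil, err
		}
		merged = append(merged, part...)
	}
	sort.Slice(merged, func(i, j int) bool { return merged[i].Series < merged[j].Series })
	return merged, nil
}
```

The point of the sketch is only the separation of concerns: the engine depends on the interface, while the transport lives entirely in the user's code.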

alanprot · 2022-09-28T20:40:08Z

This seems very interesting...

Besides running a single query on multiple pods, this would also allow computing parts of the query closer to the data (on the storage nodes) and reduce network traffic quite a bit, right? We would not need to transfer raw data anymore.

GiedriusS · 2022-09-29T07:15:07Z

I have been thinking a lot about this topic, and I think the main problem, at least in the Thanos space, is that we do read-time deduplication instead of write-time deduplication. In my opinion, the problem with read-time deduplication is that we don't know whether there are any gaps in the data stored in a node (or in any block), so we need to download everything (all matching blocks) to be able to deduplicate effectively. If we had identical copies of deduplicated data on multiple replicas, then we could execute a given query on a replica as long as all of the data needed for that query resides in that replica.

alanprot · 2022-10-04T01:51:16Z

Indeed, another possibility would be to support sharding natively, so we could split aggregation over multiple pods?
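As a minimal sketch of what splitting an aggregation could mean for a `sum`: each shard computes partial sums over the samples it holds, and a coordinator adds the partials together. The `sample` type and the `sumShard` and `mergeSums` functions are hypothetical names chosen for illustration, and the sketch assumes that no sample is stored in more than one shard.

```go
package main

// sample is a hypothetical, simplified data point: a grouping key and a value.
type sample struct {
	group string
	value float64
}

// sumShard computes a per-group sum over the samples held by one shard.
func sumShard(samples []sample) map[string]float64 {
	out := make(map[string]float64)
	for _, s := range samples {
		out[s.group] += s.value
	}
	return out
}

// mergeSums combines per-shard partial sums into the final result.
// It is only correct if every sample lives in exactly one shard;
// duplicated samples would be counted more than once.
func mergeSums(partials ...map[string]float64) map[string]float64 {
	out := make(map[string]float64)
	for _, p := range partials {
		for g, v := range p {
			out[g] += v
		}
	}
	return out
}
```

This works directly for `sum`, `count`, `min` and `max`; something like `avg` would need each shard to return both a sum and a count so that the coordinator can divide at the end.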

fpetkovski · 2022-10-04T10:24:12Z

That would be the way to go I think. But it should really be up to the user to decide where the engine will run. So in theory, if you can guarantee unique data in a Store, you could also embed an engine there.
