MPI+SparseTimeFunction: More efficient _dist_gather #1013

FabioLuporini · 2019-11-25T09:44:30Z

In the context of checkpointing, it's a significant overhead that, upon returning from C-land, we redistribute the entire SparseTimeFunction even though potentially only a relatively small number of time iterations have been computed.

This should be easily fixable by plumbing the args down to _arg_apply and then to _dist_gather, so that we only retain the written region of the data array.
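A minimal sketch of the idea, not Devito's actual implementation: it assumes the time dimension is the outermost axis and that the iteration bounds reach the gather step as two inclusive integers. The names `written_rows`, `gather_written`, `t_first`, `t_last` and `owned_points` are hypothetical, and plain Python lists stand in for the distributed arrays so the example runs without MPI.

```python
def written_rows(t_first, t_last, nt):
    """Clamp the inclusive range [t_first, t_last] to the valid rows [0, nt-1]."""
    lo = max(0, t_first)
    hi = min(nt - 1, t_last)
    return range(lo, hi + 1)  # empty if hi < lo


def gather_written(global_data, local_data, owned_points, t_first, t_last):
    """Copy back only the time rows written by the last run.

    global_data : list of rows (time) x all sparse points
    local_data  : list of rows (time) x points owned by this rank
    owned_points: global column index of each local column
    Returns the number of values copied.
    """
    copied = 0
    for t in written_rows(t_first, t_last, len(global_data)):
        for j, p in enumerate(owned_points):
            global_data[t][p] = local_data[t][j]
            copied += 1
    return copied


# 6 time steps, 4 sparse points; this "rank" owns points 1 and 3
global_data = [[0] * 4 for _ in range(6)]
local_data = [[1, 1] for _ in range(6)]

# Only time steps 2 and 3 were computed, so only those rows are copied
n = gather_written(global_data, local_data, [1, 3], 2, 3)
print(n, global_data[2], global_data[0])
```

With a full gather, all 6 rows would be moved; here the cost scales with the number of time iterations actually computed, which is the case that matters for checkpointing.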


tjb900 · 2019-11-25T14:30:38Z

Probably a very similar fix if so, but does this potentially apply to _dist_scatter before going to C also?

mloubout · 2019-11-26T20:33:03Z

Should we drop the "gather" completely and only scatter once and be done?

FabioLuporini · 2019-11-27T07:13:38Z

I think there might be side effects if one expects certain data to be on a certain rank while it is actually somewhere else.

Perhaps we can:

scatter upon startup (i.e., at the very first SparseFunction initialization)
gather on-the-fly each time sf.data is accessed (i.e., we return a view)

This way there would be basically zero overhead during forward and backward propagation.
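The proposal above can be sketched as follows. This is an illustration under stated assumptions, not Devito code: the class `LazySparseData` and its attributes are hypothetical, ranks are simulated inside one process as a list of per-rank chunks, and the points are split into contiguous blocks. Unlike the view suggested in the comment, this pure-Python version reassembles a copy on each access.

```python
class LazySparseData:
    """Scatter once at construction; gather only when .data is read."""

    def __init__(self, global_data, nranks):
        # One-off scatter: split the points into contiguous per-rank blocks
        base, rem = divmod(len(global_data), nranks)
        self.local = []
        start = 0
        for r in range(nranks):
            size = base + (1 if r < rem else 0)
            self.local.append(list(global_data[start:start + size]))
            start += size
        self.gather_count = 0  # how many times a gather was triggered

    @property
    def data(self):
        # On-the-fly gather: reassemble the per-rank blocks in rank order
        self.gather_count += 1
        out = []
        for chunk in self.local:
            out.extend(chunk)
        return out


sf = LazySparseData([10, 11, 12, 13, 14], nranks=2)

# Propagation only touches rank-local storage; no gather happens here
sf.local[1][0] = 99
print(sf.gather_count)

# The gather cost is paid only when the user actually reads the data
print(sf.data, sf.gather_count)
```

In this arrangement repeated operator runs never move data between ranks; the trade-off is that every read of `data` pays for a gather, so code that accesses it in a tight loop would want to cache the result.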

FabioLuporini added enhancement labels Nov 25, 2019

FabioLuporini added the bug-performance-py label Nov 29, 2019

FabioLuporini removed enhancement labels Apr 30, 2021
