Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Array Partitioning #174

Closed
steiltre opened this issue Jul 18, 2023 · 1 comment
Closed

Array Partitioning #174

steiltre opened this issue Jul 18, 2023 · 1 comment

Comments

@steiltre
Copy link
Collaborator

@youd3 brought an imbalance in the array partitioning to my attention. In its current implementation, if P processes are storing P+1 items in a ygm::container::array, the first approximately P/2 ranks will each get 2 items, leaving almost half of the ranks completely empty. This occurs because we determine a global block size of ceil(k/P) for storing k items on P processes, and we assign full blocks until we have run out of data.

This imbalance becomes less severe as the average number of items per rank increases. I believe the fraction of completely empty ranks is bounded by 1 - floor(k/P)/ceil(k/P) in the worst case. This is enough to realistically leave entire compute nodes empty at large scales when k/P is relatively small (measured in hundreds or possibly thousands).

The fix is relatively simple. It just requires two different chunk sizes to be used for the data, and a little bit more math to determine which rank owns a particular index. To maintain consistency, the bag.rebalance() method needs to be updated when array partitioning is changed.

@steiltre
Copy link
Collaborator Author

Addressed in PR #187.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant