gpu assignment scaling #46
Conversation
How about a unit test for the benchmarking? Just to make sure the process doesn't take an inordinate amount of time for small to mid-size specs, and doesn't result in final penalties that are too crazy.
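A hedged sketch of what such a test could look like. The names `make_spec`, `assign_gpus`, the `penalty` attribute, and the bound are hypothetical stand-ins, not the actual config-config API:

```python
import time

import pytest

# Hypothetical stand-ins for the real config-config entry points; the actual
# function and attribute names will differ.
from config_config.gpu_assignment import assign_gpus, make_spec

PENALTY_BOUND = 100.0  # illustrative upper bound on an acceptable final penalty


@pytest.mark.parametrize("n_tasks", [4, 32, 128])  # small to mid-size specs
def test_assignment_is_fast_and_sane(n_tasks):
    spec = make_spec(n_tasks=n_tasks, n_gpus=8)

    start = time.monotonic()
    assignment = assign_gpus(spec, time_budget_s=5.0)
    elapsed = time.monotonic() - start

    # The search should respect the time budget (plus a little slack for
    # setup) and should not end with an absurdly large final penalty.
    assert elapsed < 10.0
    assert assignment.penalty <= PENALTY_BOUND
```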
Tagging @shaoxiongji for documentation purposes; this PR would add two flags to the config-config. The behavior, if I understand correctly, is as follows:
LGTM, but please fix the linting before merging
If the heuristic only penalizes for the existence of empty GPUs, then reducing the number of empty GPUs by one will not be visible in the loss unless it is the last one. The penalty should be applied for each empty GPU individually.
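In code form, the difference is roughly the following (a minimal sketch; `extra_empty_penalty` is the config value named in the PR, while the assignment layout — a mapping from GPU to its list of slots, `None` for an empty slot — is an assumption):

```python
def empty_gpu_penalty(assignment, extra_empty_penalty):
    """Penalty term for empty GPUs, applied once per empty GPU.

    Scaling by the number of empty GPUs (instead of adding one flat penalty
    whenever any empty GPU exists) makes every swap that fills one more GPU
    visible in the loss, so the local search has something to follow.
    """
    n_empty = sum(1 for slots in assignment.values() if not any(slots))
    # Buggy variant: return extra_empty_penalty * (n_empty > 0)
    return extra_empty_penalty * n_empty
```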
Start by assigning a ready task to the first slot of each GPU
In this optimization iteration, one task per GPU is selected as a candidate. The selected slot is the one that is locally optimal to get rid of, i.e. without considering the destination of the swap.
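A sketch of how that candidate selection might look, assuming the assignment maps each GPU to a list of task slots and `slot_cost` is an illustrative per-slot contribution to the loss (not the actual config-config code):

```python
def pick_swap_candidates(assignment, slot_cost):
    """Select one candidate slot per GPU: the slot that is locally the most
    attractive to swap away, ignoring where the task would end up."""
    candidates = []
    for gpu, slots in assignment.items():
        occupied = [i for i, task in enumerate(slots) if task is not None]
        if not occupied:
            continue  # nothing to move off an empty GPU
        # "Locally optimal to get rid of": the slot contributing the most to
        # this GPU's cost; the destination of the swap is not considered here.
        best = max(occupied, key=lambda i: slot_cost(gpu, slots[i]))
        candidates.append((gpu, best))
    return candidates
```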
It flickers past too fast anyhow.
fbb3cd5 to 735b096
Still LGTM, can merge
Instead of considering all swaps of all GPU slots, make some educated guesses interleaved with random subsetting.
Parameter --time_budget_s allows setting a fixed time budget for the GPU assignment.
Fix a bug in the extra_empty_penalty: apply extra_empty_penalty for each empty GPU.
If the heuristic only penalizes for the existence of empty GPUs, then
reducing the number of empty GPUs by one will not be visible in the loss
unless it is the last one. The penalty should be applied for each empty
GPU individually.
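Putting the pieces together, a possible shape for the time-budgeted search. This is illustrative only: `candidate_fn`, `total_loss`, `swap`, and the default values are assumptions, not the actual config-config implementation.

```python
import itertools
import random
import time


def swap(assignment, gpu_a, slot_a, gpu_b, slot_b):
    """Exchange the tasks held by two GPU slots, in place."""
    a, b = assignment[gpu_a][slot_a], assignment[gpu_b][slot_b]
    assignment[gpu_a][slot_a], assignment[gpu_b][slot_b] = b, a


def optimize_assignment(assignment, total_loss, candidate_fn,
                        time_budget_s=30.0, subset_size=64):
    """Swap-based local search under a fixed time budget.

    Rather than scoring every pairwise swap of GPU slots, each round combines
    "educated guesses" (one candidate slot per GPU from candidate_fn) with a
    random subset of the resulting swap pairs.
    """
    deadline = time.monotonic() + time_budget_s
    best_loss = total_loss(assignment)

    while time.monotonic() < deadline:
        pairs = list(itertools.combinations(candidate_fn(assignment), 2))
        random.shuffle(pairs)

        improved = False
        for (gpu_a, slot_a), (gpu_b, slot_b) in pairs[:subset_size]:
            # Try the swap; keep it only if the overall loss goes down.
            swap(assignment, gpu_a, slot_a, gpu_b, slot_b)
            loss = total_loss(assignment)
            if loss < best_loss:
                best_loss, improved = loss, True
            else:
                swap(assignment, gpu_a, slot_a, gpu_b, slot_b)  # undo
            if time.monotonic() >= deadline:
                break
        if not improved:
            break  # no swap in this round helped; stop early
    return assignment, best_loss
```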
Some misc improvements to config-config, e.g. use tqdm as a progress indicator instead of spewing garbage all over stdout.
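For reference, the tqdm pattern amounts to wrapping the loop; a minimal illustration (the loop body is a stand-in, not the actual config-config code):

```python
import time

from tqdm import tqdm

# Wrapping the optimization loop in tqdm gives a single self-updating
# progress line instead of a flood of per-iteration prints on stdout.
for _ in tqdm(range(1000), desc="gpu assignment"):
    time.sleep(0.001)  # stand-in for one optimization step
```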