gpu assignment scaling #46

Waino · 2024-01-22T08:29:42Z

Instead of considering all swaps between all GPU slots, make some educated guesses interleaved with random subsets of slots.
The new parameter --time_budget_s sets a fixed time budget for the GPU assignment.
Fix a bug in extra_empty_penalty: the penalty is now applied once for each empty GPU.

If the heuristic only penalizes for the existence of empty GPUs, then
reducing the number of empty GPUs by one will not be visible in the loss
unless it is the last one. The penalty should be applied for each empty
GPU individually.
Some miscellaneous improvements to config-config, e.g. using tqdm as a progress indicator instead of spewing garbage all over stdout.
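
To illustrate the extra_empty_penalty fix described above, here is a minimal sketch comparing the two behaviors. The data layout (a list of GPUs, each a list of slots holding a task name or None) and all function names are assumptions made for illustration; they are not taken from the config-config code.

```python
from typing import List, Optional

# Hypothetical representation: each GPU is a list of slots, and a slot holds
# a task name, or None when it is unused.
Assignment = List[List[Optional[str]]]


def count_empty_gpus(assignment: Assignment) -> int:
    """Count the GPUs that have no task in any slot."""
    return sum(1 for gpu in assignment if all(slot is None for slot in gpu))


def penalty_existence_only(assignment: Assignment, extra_empty_penalty: float) -> float:
    """Behavior before the fix, as described: a flat penalty if any GPU is empty."""
    return extra_empty_penalty if count_empty_gpus(assignment) > 0 else 0.0


def penalty_per_empty_gpu(assignment: Assignment, extra_empty_penalty: float) -> float:
    """Behavior after the fix, as described: the penalty grows with each empty GPU."""
    return extra_empty_penalty * count_empty_gpus(assignment)


# Moving task "b" onto an empty GPU reduces the number of empty GPUs from 3 to 2.
three_empty: Assignment = [["a", "b"], [None, None], [None, None], [None, None]]
two_empty: Assignment = [["a", None], ["b", None], [None, None], [None, None]]

for name, penalty in [
    ("existence only", penalty_existence_only),
    ("per empty GPU", penalty_per_empty_gpu),
]:
    print(name, penalty(three_empty, 100.0), penalty(two_empty, 100.0))
```

With the existence-only variant both assignments receive the same penalty, so the search gets no signal that the second one is better. The per-GPU variant lowers the penalty with every GPU that is filled.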

TimotheeMickus · 2024-01-22T08:45:59Z

How about a unit test for the benchmarking? Just to make sure the process doesn't take inordinate amounts of time for small to mid-size specs, and doesn't result in final penalties that are too crazy.

TimotheeMickus · 2024-01-22T08:49:06Z

Tagging @shaoxiongji for documentation purposes; this PR would add two flags to the config-config. The behavior, if I understand correctly, is as follows:

--time_budget_s controls the maximum allowed runtime before forcing the gpu assignment to stop
--log_name dumps intermediate info into a file, rather than to stdout as we had previously
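
As a rough sketch of what a time budget like --time_budget_s implies, the loop below stops a simple swap-based local search once the budget is spent and returns the best state found so far. Everything here is hypothetical: the function name, the toy cost function, and the random pairwise swap are illustrative stand-ins, not the actual config-config search.

```python
import random
import time
from typing import Callable, List, Optional


def optimize_with_budget(
    state: List[int],
    cost: Callable[[List[int]], float],
    time_budget_s: Optional[float] = None,
    max_iterations: int = 1000,
    seed: int = 0,
) -> List[int]:
    """Greedy swap search that stops early once time_budget_s has elapsed."""
    rng = random.Random(seed)
    best = list(state)
    best_cost = cost(best)
    start = time.monotonic()
    for _ in range(max_iterations):
        # Check the budget before each iteration; None means "no budget".
        if time_budget_s is not None and time.monotonic() - start >= time_budget_s:
            break
        # Propose swapping two distinct positions (a stand-in for swapping GPU slots).
        i, j = rng.sample(range(len(best)), 2)
        candidate = list(best)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        candidate_cost = cost(candidate)
        # Keep the swap only if it strictly improves the cost.
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best


def misplaced(state: List[int]) -> int:
    """Toy cost: the number of items that are not at their own index."""
    return sum(1 for index, value in enumerate(state) if value != index)


start_state = [3, 2, 1, 0]
print(optimize_with_budget(start_state, misplaced, time_budget_s=0.5))
```

Because the best state is tracked throughout, ending early still yields a valid (if less optimized) result, which is the property a forced stop relies on.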

TimotheeMickus

LGTM, but please fix the linting before merging


TimotheeMickus · 2024-03-04T12:40:56Z

Still LGTM, can merge

Waino requested a review from TimotheeMickus January 22, 2024 08:29

TimotheeMickus approved these changes Feb 6, 2024


Waino added 9 commits March 4, 2024 11:53

Script to generate dummy configs for benchmarking gpu assignment

8b96730

TQDM for progress tracking and jsonl output of stats

34e716c

extra_empty_penalty for each empty GPU

ef2f4e3

If the heuristic only penalizes for the existence of empty GPUs, then reducing the number of empty GPUs by one will not be visible in the loss unless it is the last one. The penalty should be applied for each empty GPU individually.

Better initialization

46ffee4

Start by assigning a ready task to the first slot of each GPU

Interleave subsets consisting of least favorite tasks

6f17807

In this optimization iteration, one task per GPU is selected as a candidate. The selected slot is the one that is locally optimal to get rid of, i.e. without considering the destination of the swap.

Remove innermost tqdm

51128a6

It flickers past too fast anyhow.

Parameter --time_budget_s to end gpu assignment early

bdb192e

Only delete node_gpu if it is set

e208b5c

pep8

735b096
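
Two of the commit messages above ("Better initialization" and "Interleave subsets consisting of least favorite tasks") describe steps that a small sketch may make more concrete. The code below is an illustration under assumptions: the slot layout, the notion of a list of ready tasks, the function names, and the toy local cost are all hypothetical and not taken from the actual commits.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical representation: a GPU is a list of slots holding a task name or None.
Gpu = List[Optional[str]]


def initial_assignment(n_gpus: int, n_slots: int, ready_tasks: Sequence[str]) -> List[Gpu]:
    """Place one ready task in the first slot of each GPU; other slots stay empty."""
    assignment: List[Gpu] = [[None] * n_slots for _ in range(n_gpus)]
    # zip stops at the shorter sequence, so surplus tasks or GPUs are left alone.
    for gpu, task in zip(assignment, ready_tasks):
        gpu[0] = task
    return assignment


def least_favorite_slots(
    assignment: List[Gpu],
    local_cost: Callable[[Gpu], float],
) -> List[Tuple[int, int]]:
    """For each GPU, pick the occupied slot whose removal gives the lowest local cost.

    Only the source GPU is evaluated; the destination of a later swap is ignored.
    """
    candidates: List[Tuple[int, int]] = []
    for gpu_idx, gpu in enumerate(assignment):
        best_slot: Optional[int] = None
        best_cost: Optional[float] = None
        for slot_idx, task in enumerate(gpu):
            if task is None:
                continue
            # Cost of this GPU if the task in slot_idx were taken away.
            without = gpu[:slot_idx] + [None] + gpu[slot_idx + 1:]
            cost = local_cost(without)
            if best_cost is None or cost < best_cost:
                best_slot, best_cost = slot_idx, cost
        if best_slot is not None:
            candidates.append((gpu_idx, best_slot))
    return candidates


def distinct_groups(gpu: Gpu) -> int:
    """Toy local cost: the number of distinct task groups (first letter) on one GPU."""
    return len({task[0] for task in gpu if task is not None})


example = [["a1", "a2", "b1"], ["c1", None, None]]
print(initial_assignment(2, 3, ["t1", "t2", "t3"]))
print(least_favorite_slots(example, distinct_groups))
```

In the example, task "b1" is the odd one out on the first GPU, so its slot is the candidate there. Restricting the swap search to such candidates is one way to evaluate far fewer swaps than trying every pair of slots.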

Waino force-pushed the feat/gpu_assignment_scaling branch from fbb3cd5 to 735b096 Compare March 4, 2024 09:55

Waino merged commit 1ba7af9 into main Mar 11, 2024
2 checks passed
