Load unbalance is a very likely candidate for the scaling issues we faced. This PR introduces a couple of new flags to enforce equal load across nodes, which seems to result in reasonable communication performance.
This can be achieved with:
- `--pad_to_max_length`
- `--max_length`, to which the input tensors will be trimmed (hence the sequence dimension of input tensors will be constant)
- `--batch_type "sents"` (hence the batch dimension of input tensors will be constant), along with a low enough batch size

Using a max length of 128 and a batch size of 300, we get 6k tok/sec on a 4-GPU / 1-node Europarl job, with a fairly constant GPU utilization rate (90 to 100%).
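To make the shape-stabilizing idea above concrete, here is a minimal sketch (not the PR's actual implementation; `pad_or_trim`, `fixed_shape_batches`, and the pad id are hypothetical names for illustration) of how trimming/padding to `max_length` plus a fixed sentence count per batch yields constant `(batch_size, max_length)` tensors:

```python
PAD_ID = 0  # assumed padding token id

def pad_or_trim(token_ids, max_length, pad_id=PAD_ID):
    """Return exactly max_length ids: trim long sequences, pad short ones."""
    trimmed = token_ids[:max_length]
    return trimmed + [pad_id] * (max_length - len(trimmed))

def fixed_shape_batches(examples, batch_size, max_length):
    """Yield batches of constant shape (batch_size, max_length)."""
    batch = []
    for ex in examples:
        batch.append(pad_or_trim(ex, max_length))
        if len(batch) == batch_size:
            yield batch
            batch = []
    # The last partial batch is dropped so every node sees identical shapes.

batches = list(fixed_shape_batches(
    [[1, 2, 3], [4, 5], [6], [7, 8]], batch_size=2, max_length=4))
# every batch is 2 x 4, regardless of the raw sequence lengths
```

With every rank producing identically shaped tensors, no rank waits on another's larger batch, which is what keeps GPU utilization roughly constant.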
Todos before undrafting this PR:
- `numel_fn`: allow it to track the max lengths across the batch items
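A rough sketch of what the `numel_fn` todo could look like (this is my reading of the item, not the PR's code; the function name and signature are assumptions): once batches are padded, the element count of the batch tensor is the running max length times the number of items, not the sum of raw token counts.

```python
def padded_numel_fn(item_lengths):
    """Element count of the padded batch tensor for the given item lengths.

    Tracks the max length across batch items: after padding, every item
    occupies max(item_lengths) positions, so numel = max_len * n_items.
    """
    if not item_lengths:
        return 0
    return max(item_lengths) * len(item_lengths)

padded_numel_fn([3, 5, 2])  # 5 * 3 = 15 elements after padding
```

Counting padded elements rather than raw tokens is what makes the per-batch cost estimate match the tensors actually exchanged between nodes.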