#55 implements fixed-size minibatches: each batch has a fixed number of sentences and a fixed number of tokens per sentence, with truncation and padding used to enforce the length.
This seems to give good throughput and GPU utilization under some conditions, but it is sensitive to the --max_length parameter. A value that is too short fills up the batches but truncates long sentences, so they cannot be translated in full. A value that is too long leads to excessive padding, which wastes compute.
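To make the trade-off concrete, here is a minimal sketch that estimates, for a candidate max_length, what fraction of sentences would be truncated and what fraction of the batch slots would be padding. The function name `padding_stats` and its inputs are hypothetical and are not part of #55; it only assumes a list of sentence lengths measured in tokens.

```python
from typing import Sequence, Tuple


def padding_stats(lengths: Sequence[int], max_length: int) -> Tuple[float, float]:
    """Return (truncated_fraction, padding_fraction) for a given max_length.

    Hypothetical helper for illustration only, not code from #55.
    Every sentence is assumed to occupy exactly max_length token slots.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if max_length <= 0:
        raise ValueError("max_length must be positive")

    # Sentences longer than max_length lose their tail.
    truncated = sum(1 for n in lengths if n > max_length)

    # Slots holding real tokens after truncation; the rest is padding.
    real_tokens = sum(min(n, max_length) for n in lengths)
    total_slots = len(lengths) * max_length

    return truncated / len(lengths), 1.0 - real_tokens / total_slots


if __name__ == "__main__":
    lengths = [2, 4, 10]  # made-up sentence lengths
    for max_length in (4, 10):
        trunc, pad = padding_stats(lengths, max_length)
        print(f"max_length={max_length}: truncated={trunc:.2f}, padding={pad:.2f}")
```

Running this over the length distribution of the actual training data for a few candidate values would be one way to pick --max_length before committing to a long run.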
The implementation in #55 is built on top of the spiral LookAheadBucketing, which was removed in #66. It should be reimplemented as a standalone component.
Fixed size batching entails using
- "batch_type: sents" to fix the batch dimension to batch_size, and
- "pad_to_max_length: true" together with "max_length" to fix the sequence length dimension.
Closes #67
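As a rough illustration of what fixing both dimensions means, the sketch below groups token-id sequences into batches of exactly batch_size rows, each padded or truncated to exactly max_length tokens. It is a standalone sketch and not the implementation from this commit: the names `fix_length`, `fixed_size_batches`, and `PAD_ID` are hypothetical, and filling the final partial batch with all-padding rows is an assumption made here to keep the shape constant.

```python
from typing import Iterable, Iterator, List, Sequence

PAD_ID = 0  # hypothetical padding token id, assumed for this sketch


def fix_length(tokens: Sequence[int], max_length: int, pad_id: int = PAD_ID) -> List[int]:
    """Truncate or right-pad a token sequence to exactly max_length."""
    clipped = list(tokens[:max_length])
    return clipped + [pad_id] * (max_length - len(clipped))


def fixed_size_batches(
    sentences: Iterable[Sequence[int]],
    batch_size: int,
    max_length: int,
    pad_id: int = PAD_ID,
) -> Iterator[List[List[int]]]:
    """Yield batches with shape (batch_size, max_length) every time."""
    batch: List[List[int]] = []
    for tokens in sentences:
        batch.append(fix_length(tokens, max_length, pad_id))
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Assumption: the trailing partial batch is filled with all-padding
    # rows so that its shape matches the other batches.
    if batch:
        while len(batch) < batch_size:
            batch.append([pad_id] * max_length)
        yield batch


if __name__ == "__main__":
    data = [[1, 2, 3, 4, 5], [6], [7, 8]]
    for b in fixed_size_batches(data, batch_size=2, max_length=3):
        print(b)
```

Because every batch has the same shape, the batcher does not need to look ahead or bucket by length, which is what makes a standalone version possible.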