When encoding texts, the pipeline (https://github.com/facebookresearch/SONAR/blob/main/sonar/inference_pipelines/text.py#L169) currently reads them in the provided order, groups them into batches, and collates each batch by padding every text to the length of the longest text in that batch. This sometimes produces batches in which most tokens are padding, so the computation spent on them is wasted.
To avoid this waste and speed up the pipeline, we could sort the texts by length before batching them, as in the sketch below.
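A minimal sketch of the idea (not SONAR's actual implementation): `encode_batch` is a placeholder for whatever function embeds one padded batch, `batch_size` is an assumed parameter, and character length is used as a rough proxy for token length. The results are scattered back so the output order still matches the input order.

```python
from typing import Callable, List, Sequence


def encode_sorted(
    texts: Sequence[str],
    encode_batch: Callable[[List[str]], List[list]],  # hypothetical batch encoder
    batch_size: int = 32,
) -> List[list]:
    # Sort indices by text length so each batch groups texts of similar length,
    # minimizing padding to the longest text in the batch.
    order = sorted(range(len(texts)), key=lambda i: len(texts[i]))

    embeddings: List[list] = [None] * len(texts)  # type: ignore[list-item]
    for start in range(0, len(order), batch_size):
        batch_indices = order[start : start + batch_size]
        batch_embeddings = encode_batch([texts[i] for i in batch_indices])
        # Scatter results back into the original input order.
        for idx, emb in zip(batch_indices, batch_embeddings):
            embeddings[idx] = emb
    return embeddings
```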