Feat: Multi-GPU Evaluation #3611
base: master
Conversation
Force-pushed from da4e8f0 to 7734bdd
Looks like the checks are hitting an unrelated type error.
@MattGPT-ai this is due to a new mypy version that affected a deprecated class. I just fixed it in #3613. If you update this branch to current master, the error should disappear.
…gather functions to distributed utils. This works by using a DistributedSampler to allocate samples across GPU processes, then aggregating the predictions and losses from all processes before running the evaluation. Broadcast is used to ensure all processes return the same valid result.
Force-pushed from 7734bdd to c0f7e7d
Awesome, that worked, checks passed!
We have been using this change successfully in our fork for about a month now; it's been a major speed improvement, especially when evaluation sets are large!
Flair now supports multi-GPU training, but not multi-GPU evaluation. This means that the work of n-1 GPUs is wasted during evaluation, which can dramatically reduce the benefit of multi-GPU training if your eval set is considerable in size. Even worse, I believe it can be slower than single-GPU evaluation, since the CPU portions of the evaluation code are repeated n times while sharing the same CPU and memory resources.
This PR implements multi-GPU acceleration for `evaluate` in the `Classifier`, `TextRegressor`, and `TextPairRegressor` model types. It uses the `DistributedSampler` to split the eval set between the GPUs; predictions are run on each GPU, and the results of inference are aggregated across processes before the metrics are calculated in the main process and returned.
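As a rough sketch of that flow (not the PR's actual code; `model.predict` and `compute_metrics` are placeholders standing in for Flair's prediction and metric logic, not real Flair APIs):

```python
import torch.distributed as dist
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def evaluate_distributed(model, dataset, compute_metrics, batch_size=32):
    # Give each process a disjoint shard of the eval set. Note that
    # DistributedSampler pads by repeating samples when the dataset size
    # is not divisible by the world size, so exact metrics require
    # trimming or deduplicating those extras.
    sampler = DistributedSampler(dataset, shuffle=False)
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)

    # Run inference only on this process's shard.
    local_predictions = []
    for batch in loader:
        local_predictions.extend(model.predict(batch))

    # Aggregate every process's predictions before scoring.
    gathered = [None] * dist.get_world_size()
    dist.all_gather_object(gathered, local_predictions)
    all_predictions = [p for shard in gathered for p in shard]

    # Calculate metrics once, in the main process, then broadcast the
    # result so every process returns the same value.
    metrics = compute_metrics(all_predictions) if dist.get_rank() == 0 else None
    container = [metrics]
    dist.broadcast_object_list(container, src=0)
    return container[0]
```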