[Question] Speed up evaluate using large eval batch size #1556
Comments
@trannhuthuat96 Hello, I'm sorry for the late reply.
Hi @Wicknight, thanks for your response. For question 1, just to clarify: loading the whole item set means we must be able to fit the item set in memory (which may not hold when memory is limited). For question 2, I tried with MacridVAE. I ran it multiple times with the same large batch size on the MovieLens-1M benchmark (data provided by another paper for fair comparison) and got identical results (Recall and NDCG). But different large `eval_batch_size` values produced different results on the same dataset. Thanks,
Hello @trannhuthuat96. For question 1: we do this because, for each user, we must score all items to get the results, so `num_item` is finally taken as the basic unit of the eval batch size. For question 2: experiments have verified that this problem does exist; we are currently investigating the cause. If you have any suggestions, you are welcome to post them here.
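The recalculation described above can be sketched as follows. This is a minimal illustration, not RecBole's actual code; the function name `users_per_eval_batch` and the example item count of 3,706 are my own assumptions:

```python
# Hypothetical sketch (not RecBole's actual implementation) of the idea
# described above: in full-sort evaluation every user must be scored
# against all items, so `num_item` becomes the basic unit of the
# eval batch size.

def users_per_eval_batch(eval_batch_size: int, num_item: int) -> int:
    """How many users' full item-score rows fit in one eval batch."""
    # Guarantee at least one user per batch even if the configured
    # eval_batch_size is smaller than the item count.
    return max(1, eval_batch_size // num_item)

# With eval_batch_size = 512 * num_item, each batch holds exactly 512 users.
print(users_per_eval_batch(512 * 3706, 3706))  # -> 512
print(users_per_eval_batch(1000, 3706))        # -> 1
```

This also explains why setting `eval_batch_size` below `num_item` gives no further speedup: the effective batch can never go below one user.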
Hi @Wicknight, I recently trained a DIN model on Amazon_Electronic and ran into this issue as well. Here is my config file:
The tqdm bar showed evaluation was going to take more than an hour. Do you have any ideas?
When I use 2 GPUs instead of 4 for training and evaluation, the eval time drops to 15 minutes. It's so weird...
Hi Team,
Thanks for developing such a great library.
When using RecBole, I found that evaluation in full-sort evaluation mode runs slowly. After reading the code, I traced the cause to this line:

RecBole/recbole/data/dataloader/general_dataloader.py, line 242 (commit d3d421d)

What is the purpose of re-calculating the batch size? (My model is auto-encoder structured, so each batch contains only `user_id` during both training and evaluation.)

To speed up evaluation, `eval_batch_size` is set to a large value, i.e. `eval_batch_size = 512 * num_item`, so that the real eval batch size (in the case of a user-oriented dataloader) is 512. But different large `eval_batch_size` values result in different Recall and NDCG scores (same random seed, same GPU machine). Could you help explain this situation?

Thanks,
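One possible (and unconfirmed) contributor to the batch-size-dependent metrics reported in this thread is floating-point non-associativity: if per-batch partial results are aggregated, different batch groupings can change low-order bits, and tiny score differences can in turn flip ties in a ranking. A minimal illustration in plain Python:

```python
# Floating-point addition is not associative, so summing the same values
# in different batch groupings can give (slightly) different totals.
scores = [0.1, 0.2, 0.3]

grouped_a = (scores[0] + scores[1]) + scores[2]  # "batch of 2, then 1"
grouped_b = scores[0] + (scores[1] + scores[2])  # "batch of 1, then 2"

print(grouped_a == grouped_b)  # -> False
print(grouped_a, grouped_b)
```

Whether this fully explains the observed Recall/NDCG differences is unknown; it is only one candidate mechanism alongside anything batch-size-dependent in the model itself (e.g. batch-level normalization or sampling).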