[BUG] Building/updating top-k item index for retrieval models evaluations is quite slow #339
Comments
@marcromeyn - I had trouble getting an environment working well enough to reproduce this issue, but I did confirm with a toy example that the "model-reloading" overhead (which I suggested as a possible problem offline) is probably not the problem here. One thing that could be an issue is the use of a bare `compute()` call (which falls back to Dask's default scheduler).

Clarification: The compute issue may be a red herring in this case, but I do think it is best for us to formalize that process either way.
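For reference, a toy comparison along the lines described above could look like the following. This is an illustrative sketch only, not the actual toy code mentioned in the comment; the `encode` function and the random data are placeholders.

```python
# Compare a map_partitions + compute pipeline under Dask's default threaded
# scheduler vs. the synchronous scheduler.
import time

import dask.dataframe as dd
import numpy as np
import pandas as pd


def encode(df):
    # Stand-in for a per-partition encoding step
    return df.assign(embedding=np.sqrt(df["x"].to_numpy()))


pdf = pd.DataFrame({"x": np.random.rand(1_000_000)})
ddf = dd.from_pandas(pdf, npartitions=1)

for scheduler in ("threads", "synchronous"):
    start = time.perf_counter()
    _ = ddf.map_partitions(encode).compute(scheduler=scheduler)
    print(f"{scheduler}: {time.perf_counter() - start:.3f}s")
```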
Thanks for looking into this @rjzamora! I added the one-line change to a PR; @gabrielspmoreira will now check whether this indeed speeds up the batch-predict issues. We will keep you posted!
Thanks @marcromeyn and @rjzamora. Here is the line and its timing before the change:

```python
# 1m50s
embedding_df = data.map_partitions(model_encode, filter_input_columns=[id_column]).compute()
```

After the suggested change:

```python
# 20s
embedding_ddf = data.map_partitions(model_encode, filter_input_columns=[id_column])
# 1m20s
embedding_df = embedding_ddf.compute(scheduler="synchronous")
```

I have also noticed that repartitioning the data makes it slower:

```python
data = data.to_ddf().repartition(npartitions=10)
...
# 3m33s
embedding_df = embedding_ddf.compute(scheduler="synchronous")
```
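For what it's worth, the partition layout can be inspected before deciding whether to repartition. A hedged sketch: `data` and `to_ddf()` are taken from the snippet above, the rest is illustrative.

```python
ddf = data.to_ddf()
print(ddf.npartitions)                    # 1 partition -> a single model_encode task
print(ddf.map_partitions(len).compute())  # rows per partition

# Repartitioning to 10 partitions schedules 10 separate model_encode tasks; on a
# single GPU this mostly adds per-task overhead rather than parallelism.
ddf10 = ddf.repartition(npartitions=10)
print(ddf10.npartitions)
```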
Thanks for testing @gabrielspmoreira! At first, I was going to say that [...]. Now for the compute call: when I suggested passing `scheduler="synchronous"` [...]

If you only have one GPU, then it definitely makes sense that you are getting better performance with a single large partition than with multiple smaller ones (since they cannot execute in parallel anyway). Since you only have a single partition anyway, would you mind trying the same logic without using dask/`map_partitions`?

```python
df_data = data.compute(scheduler="synchronous")
embedding_df = model_encode(df_data, filter_input_columns=[id_column])
```

I'm interested to know if this runs any faster.
Hey @rjzamora. Sorry for the delay on your suggested test; I was pretty busy these days with other tasks. I tested the current approach again with the latest code:

```python
# 18s
embedding_ddf = data.map_partitions(model_encode, filter_input_columns=[id_column])
# 51s
embedding_df = embedding_ddf.compute(scheduler="synchronous")
```

Then I replaced those lines with your latest suggestion above:

```python
# 4s
df_data = data.compute(scheduler="synchronous")
# 3m18s
embedding_df = model_encode(df_data, filter_input_columns=[id_column])
```

And I see low GPU usage (~1-5%) in both cases while those commands execute.
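One way to confirm the low utilization numerically is to sample the GPU while the encoding step runs. This is a sketch under assumptions: it uses the optional pynvml package, and `data`, `model_encode`, and `id_column` are the names from the snippets above.

```python
# Illustrative sketch (not from the issue): sample GPU utilization in a background
# thread while the encoding step runs.
import threading
import time

import pynvml


def sample_gpu_util(stop_event, samples, interval=0.5):
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    while not stop_event.is_set():
        samples.append(pynvml.nvmlDeviceGetUtilizationRates(handle).gpu)
        time.sleep(interval)
    pynvml.nvmlShutdown()


stop, samples = threading.Event(), []
watcher = threading.Thread(target=sample_gpu_util, args=(stop, samples))
watcher.start()

# The encoding step under investigation (names as used earlier in the thread)
embedding_df = data.map_partitions(model_encode, filter_input_columns=[id_column]).compute(
    scheduler="synchronous"
)

stop.set()
watcher.join()
print(f"mean GPU util: {sum(samples) / max(len(samples), 1):.1f}%")
```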
@gabrielspmoreira to retest using the new API.
@gabrielspmoreira, please set the milestone we should target this bug fix for.
@EvenOldridge @viswa-nvidia Have re-tested this with [...]
Bug description
The evaluation of retrieval models is pretty slow. In particular, the slow steps are related to batch-predict: (1) saving the model and (2) building / updating the top-k index, which runs `df.map_partitions()` over all the item features to generate item embeddings.

Steps/Code to reproduce bug
Step through `RetrievalModel._load_or_update_candidates_index()` and check the runtimes for the following lines during that process.
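The specific lines referred to above are not reproduced here; a minimal sketch of the kind of line-level timing being described, assuming the map_partitions/compute calls shown earlier in this thread are the ones of interest:

```python
# Illustrative only: simple timers around the two suspected lines inside
# RetrievalModel._load_or_update_candidates_index().
import time

t0 = time.perf_counter()
embedding_ddf = data.map_partitions(model_encode, filter_input_columns=[id_column])
print(f"map_partitions: {time.perf_counter() - t0:.1f}s")

t0 = time.perf_counter()
embedding_df = embedding_ddf.compute(scheduler="synchronous")
print(f"compute: {time.perf_counter() - t0:.1f}s")
```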