How to prevent caching? #1263

Closed · keithachorn-intel opened this issue Jun 24, 2024 · 7 comments

Comments

@keithachorn-intel

I am using cm to download the MLPerf DLRM model (~100G). However, I want to specify the final location of this model. By default, it lands in a 'cache' directory with a pseudo-random key in the file path, so I cannot predict the final location beforehand. Ideally, I want to simply specify the output directory, or prevent caching so that the model lands in the local directory.

However, despite searching the documentation in this repo for a way to do this (and trying '--no-cache'), the model continues to be cached. Any guidance here?

@arjunsuresh
Contributor

Hi @keithachorn-intel, we'll add the --no-cache option soon. In the meantime, you can use the --to=<download path> option to change the location of the model download. Please let us know if this works for you.

https://github.com/GATEOverflow/cm4mlops/blob/mlperf-inference/script/get-ml-model-dlrm-terabyte/_cm.json#L21
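For example, something along these lines should work (the exact tags are my guess from the get-ml-model-dlrm-terabyte script linked above, so please check its _cm.json; the path is just a placeholder):

```
# Sketch only: tags inferred from the script name get-ml-model-dlrm-terabyte;
# /data/models/dlrm is a placeholder download path.
cm run script --tags=get,ml-model,dlrm,terabyte --to=/data/models/dlrm
```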

@anandhu-eng we can follow up on our discussion about --no-cache

@anandhu-eng
Contributor

Sure @arjunsuresh 🤝

@keithachorn-intel
Author

I am returning to this thread for a separate download attempt.

This is the package I'm trying to download: https://github.com/mlcommons/cm4mlops/tree/mlperf-inference/script/get-ml-model-llama2

It appears to download fully to the cache, but I cannot get it to land in the intended directory. I've tried:

  • Setting the '--to' flag
  • Setting the '--outdirname' flag
  • Setting the environment variables LLAMA2_CHECKPOINT_PATH and CM_ML_MODEL_PATH

None appeared effective at setting the final model download location. Any suggestions?
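For reference, the attempts looked roughly like this (tags inferred from the get-ml-model-llama2 script above; paths are placeholders):

```
# Approximate reconstruction of the attempts listed above;
# /data/models/llama2 is a placeholder path.
cm run script --tags=get,ml-model,llama2 --to=/data/models/llama2
cm run script --tags=get,ml-model,llama2 --outdirname=/data/models/llama2
export LLAMA2_CHECKPOINT_PATH=/data/models/llama2
export CM_ML_MODEL_PATH=/data/models/llama2
cm run script --tags=get,ml-model,llama2
```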

@arjunsuresh
Contributor

arjunsuresh commented Feb 13, 2025

@keithachorn-intel

Based on your previous request, we now have --outdirname, which is uniform across all scripts. The previous --to option was only applicable to scripts that implemented it.

Also, we now support the MLPerf automations via MLCFlow in the MLPerf Automations repository, so I'm not sure whether this option works in the cm4mlops repository, which we no longer have access to.

For the llama2-70b checkpoint from MLCommons (for submission) you can do

```
pip install mlc-scripts
mlcr get,ml-model,llama2,_70b --outdirname=<myout_dir>
```

For the 7b model:

```
mlcr get,ml-model,llama2,_7b --outdirname=<myout_dir>
```

For the llama2-70b checkpoint from Hugging Face you can do

```
mlcr get,ml-model,llama2,_hf,_70b --outdirname=<myout_dir>
```
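For example, end to end (the path below is just a placeholder; replace it with your own):

```
# End-to-end example; /data/models/llama2-70b is a placeholder path.
pip install mlc-scripts
mlcr get,ml-model,llama2,_70b --outdirname=/data/models/llama2-70b
ls /data/models/llama2-70b   # the checkpoint files should land here
```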

@keithachorn-intel
Author

Hi @arjunsuresh. Thank you for the quick reply. I did try adding 'outdirname' (mentioned above), but it only worked for the dataset script, not the model script. However, your 'mlcr' command did work for my needs. Thank you.

@arjunsuresh
Contributor

You're welcome, @keithachorn-intel. Glad that it worked. Sorry, there was an issue with the model variants when downloading from MLCommons rather than Hugging Face; it's just been fixed. Please see the updated commands.

@gfursin
Contributor

gfursin commented Feb 14, 2025

I’m glad the issue is resolved, @keithachorn-intel! I will go ahead and close this ticket. Please don’t hesitate to reach out if you have any further questions!

@gfursin gfursin closed this as completed Feb 14, 2025