How to prevent caching? #1263

Closed · keithachorn-intel opened this issue Jun 24, 2024 · 7 comments

Comments

@keithachorn-intel

I am using cm to download the MLPerf DLRM model (~100G). However, I want to specify the final location of this model. By default, it lands in a 'cache' directory with a pseudo-random key in the file path, so I cannot predict the final location beforehand. Ideally, I want to simply specify the output directory, or prevent caching so that the model lands in the local directory.

However, despite searching the documentation in this repo for a way to do this (and trying '--no-cache'), the model continues to be cached. Any guidance here?

@arjunsuresh
Contributor

Hi @keithachorn-intel, we'll add the --no-cache option soon. In the meantime, you can use the --to=<download path> option to change the location of the model download. Please let us know if this works for you.

https://github.com/GATEOverflow/cm4mlops/blob/mlperf-inference/script/get-ml-model-dlrm-terabyte/_cm.json#L21
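For example, something along these lines should work (the exact tags are my guess from the get-ml-model-dlrm-terabyte script linked above, so please check its _cm.json; the path is just a placeholder):

```
# Sketch only: tags inferred from the script name get-ml-model-dlrm-terabyte;
# /data/models/dlrm is a placeholder download path.
cm run script --tags=get,ml-model,dlrm,terabyte --to=/data/models/dlrm
```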

@anandhu-eng we can follow up on our discussion about --no-cache

@anandhu-eng
Contributor

Sure @arjunsuresh 🤝

@keithachorn-intel
Author

I am returning to this thread for a separate download attempt.

This is the package I'm trying to download: https://github.com/mlcommons/cm4mlops/tree/mlperf-inference/script/get-ml-model-llama2

It appears to download fully to the cache, but I cannot get it to land in the intended directory. I've tried:

  • Setting the '--to' flag
  • Setting the '--outdirname' flag
  • Setting the environment variables LLAMA2_CHECKPOINT_PATH and CM_ML_MODEL_PATH

None appeared effective at setting the final model download location. Any suggestions?
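For reference, the attempts looked roughly like this (tags inferred from the get-ml-model-llama2 script above; paths are placeholders):

```
# Approximate reconstruction of the attempts listed above;
# /data/models/llama2 is a placeholder path.
cm run script --tags=get,ml-model,llama2 --to=/data/models/llama2
cm run script --tags=get,ml-model,llama2 --outdirname=/data/models/llama2
export LLAMA2_CHECKPOINT_PATH=/data/models/llama2
export CM_ML_MODEL_PATH=/data/models/llama2
cm run script --tags=get,ml-model,llama2
```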

@arjunsuresh
Contributor

arjunsuresh commented Feb 13, 2025

@keithachorn-intel

Based on your previous request, we now have --outdirname, which is uniform across all scripts. The previous --to option was only applicable to scripts that implemented it.

Also, we now support the MLPerf automations via MLCFlow in the MLPerf Automations repository, so I'm not sure whether this option works in the cm4mlops repository, which we no longer have access to.

For the llama2-70b checkpoint from MLCommons (for submission) you can do

```
pip install mlc-scripts
mlcr get,ml-model,llama2,_70b --outdirname=<myout_dir>
```

For the 7b model:

```
mlcr get,ml-model,llama2,_7b --outdirname=<myout_dir>
```

For the llama2-70b checkpoint from Hugging Face you can do

```
mlcr get,ml-model,llama2,_hf,_70b --outdirname=<myout_dir>
```
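For example, end to end (the path below is just a placeholder; replace it with your own):

```
# End-to-end example; /data/models/llama2-70b is a placeholder path.
pip install mlc-scripts
mlcr get,ml-model,llama2,_70b --outdirname=/data/models/llama2-70b
ls /data/models/llama2-70b   # the checkpoint files should land here
```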

@keithachorn-intel
Author

Hi @arjunsuresh. Thank you for the quick reply. I did try adding 'outdirname' (mentioned above), but it only worked for the dataset script, not the model script. However, your 'mlcr' command did work for my needs. Thank you.

@arjunsuresh
Contributor

You're welcome, @keithachorn-intel. Glad that it worked. Sorry, there was an issue with the model variants when downloading from MLCommons rather than Hugging Face; it's just been fixed. Please see the updated commands.

@gfursin
Contributor

gfursin commented Feb 14, 2025

I’m glad the issue is resolved, @keithachorn-intel! I will go ahead and close this ticket. Please don’t hesitate to reach out if you have any further questions!

@gfursin gfursin closed this as completed Feb 14, 2025