
Don't request model from HuggingFace when running prediction #243

Open
s-jse opened this issue Feb 4, 2022 · 3 comments

Comments

s-jse (Member) commented Feb 4, 2022

When running genienlp predict on a local model, we should not send any requests to HuggingFace servers at all. I think the transformers library sends a request by default to resolve model names using the most up-to-date model list.
Yesterday, HF servers were down, and my runs on our local server would crash.

Setting TRANSFORMERS_OFFLINE=1 as an environment variable when we are loading the model from disk should work. There might be other solutions as well.
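
A minimal sketch of the environment-variable approach (the model name and cache directory below are placeholders; in genienlp they would come from args.pretrained_model and args.embeddings, and the variable should be set before transformers is imported):

```python
import os

# TRANSFORMERS_OFFLINE is read by the transformers library; setting it
# before the import ensures no code path contacts HF servers.
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from transformers import AutoConfig, AutoModelForSeq2SeqLM

# Placeholder values standing in for args.pretrained_model / args.embeddings.
pretrained_model = "facebook/bart-large"
cache_dir = ".embeddings"

# In offline mode these calls resolve entirely from cache_dir and raise
# an error, instead of downloading, when the files are missing locally.
config = AutoConfig.from_pretrained(pretrained_model, cache_dir=cache_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(pretrained_model, cache_dir=cache_dir, config=config)
```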

Mehrad0711 (Member) commented

From this post, it seems passing local_files_only=True when loading the model works too.
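
A minimal sketch of that approach, assuming the same args.pretrained_model and args.embeddings fields discussed in this thread (the SimpleNamespace stands in for genienlp's parsed arguments):

```python
from types import SimpleNamespace

from transformers import AutoConfig, AutoModelForSeq2SeqLM

# Stand-in for genienlp's parsed arguments; the values are placeholders.
args = SimpleNamespace(pretrained_model="facebook/bart-large", embeddings=".embeddings")

# With local_files_only=True, transformers loads from cache_dir and raises
# an error instead of contacting HF servers when files are not cached.
config = AutoConfig.from_pretrained(
    args.pretrained_model,
    cache_dir=args.embeddings,
    local_files_only=True,
)
model = AutoModelForSeq2SeqLM.from_pretrained(
    args.pretrained_model,
    cache_dir=args.embeddings,
    config=config,
    local_files_only=True,
)
```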

s-jse (Member, Author) commented Aug 20, 2022

The download happens only when the .embeddings directory does not exist or does not contain the model being used. In that case, whenever we reach a line like config = AutoConfig.from_pretrained(args.pretrained_model, cache_dir=args.embeddings) in TransformerSeq2Seq, HF automatically downloads the config.json file from its servers; we then pass the resulting config to the HF model's constructor by calling super().__init__(config). This config is stored on the object and determines the correct behavior of various internal methods. We never save this config file when we save a model, so there is no local copy of it.

All this to say that resolving this issue is not easy.
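
For illustration, a hypothetical sketch of what saving a local copy could look like (save_checkpoint is not an existing genienlp function):

```python
import os

def save_checkpoint(model, save_dir):
    """Hypothetical helper: persist the HF config next to our checkpoint."""
    os.makedirs(save_dir, exist_ok=True)
    # model.config is the config object that was passed to
    # super().__init__(config); save_pretrained writes it out as
    # save_dir/config.json.
    model.config.save_pretrained(save_dir)
    # ... existing genienlp checkpoint-saving logic would follow here ...
```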

Mehrad0711 (Member) commented

Should we start saving the HF config files then?
We could then set local_files_only to True only when a config file is detected in --path, and leave it False otherwise (for backward compatibility); see the sketch below.
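
A hypothetical sketch of that logic (load_config and the args fields are illustrative, not existing genienlp code):

```python
import os

from transformers import AutoConfig

def load_config(args):
    """Hypothetical helper implementing the proposal above."""
    local_config = os.path.join(args.path, "config.json")
    if os.path.exists(local_config):
        # A config was saved with the model: load it, never touch the network.
        return AutoConfig.from_pretrained(args.path, local_files_only=True)
    # Backward compatibility: older checkpoints have no saved config, so
    # fall back to the old behavior (which may contact HF servers).
    return AutoConfig.from_pretrained(args.pretrained_model, cache_dir=args.embeddings)
```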
