Do not check trace for diffusers, saving memory and time for FLUX #1064
base: main
Awesome! Thanks @mvafin. Please check and fix the quality:
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Done
Not a fan of patches. A long-term solution would be to decouple the export process by doing `torch.jit.trace` with the arguments we want, then `ov.convert_model` with the resulting model.
I agree. I see that `torch.jit.trace` is called in one place in the Optimum-Intel code. @mvafin, why did you not just pass the argument there?
Okay, we can go with a patch, but let's at least make it a clean one. The naming in the PR is a bit confusing, and I don't think the patching should be in
What does this PR do?
This is a further optimization of memory consumption during diffusers conversion, a continuation of this PR: #1033

When `check_trace=True`, the TorchScript graph is generated a second time and compared with the graph generated the first time. This is sometimes useful for catching an incorrectly traced graph, but in optimum we control which models are supported, so such issues shouldn't happen. Currently this is only introduced for `diffusers`, but it can be done for all models. The largest impact on memory is demonstrated for FLUX; for other diffusers models it significantly reduces conversion time.
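As a rough illustration of the patching idea discussed in this thread, the sketch below temporarily changes a tracer's `check_trace` default to `False` and restores the original on exit. It is not the code from this PR: `fake_trace`, `tracer_module`, and `disable_trace_check` are hypothetical stand-ins used so the example runs without PyTorch. The only detail taken from the real API is that `torch.jit.trace` accepts a `check_trace` argument that defaults to `True`.

```python
import functools
from contextlib import contextmanager
from types import SimpleNamespace


def fake_trace(func, example_inputs, check_trace=True):
    """Hypothetical stand-in for a tracer; counts how often func is executed."""
    runs = 1
    result = func(*example_inputs)
    if check_trace:
        # Verification pass: run the function again and compare the outputs.
        runs += 1
        assert func(*example_inputs) == result
    return {"result": result, "runs": runs}


# Hypothetical namespace playing the role of the module that owns the tracer.
tracer_module = SimpleNamespace(trace=fake_trace)


@contextmanager
def disable_trace_check(module):
    """Temporarily make check_trace default to False on module.trace."""
    original = module.trace
    module.trace = functools.partial(original, check_trace=False)
    try:
        yield
    finally:
        # Always restore the original function, even if conversion fails.
        module.trace = original


def add(a, b):
    return a + b


default_run = tracer_module.trace(add, (1, 2))
with disable_trace_check(tracer_module):
    patched_run = tracer_module.trace(add, (1, 2))
restored = tracer_module.trace is fake_trace
```

In this sketch the default call executes the function twice and the patched call executes it once, which mirrors why skipping the check saves time and memory. Because `functools.partial` only changes the default, a caller that passes `check_trace=True` explicitly still gets the verification pass.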
Fixes # (issue)
Before submitting