merge PV-Tuning into AQLM main #110
Conversation
# Conflicts:
#	README.md
#	convert_to_hf.py
#	src/modelutils.py
README - still need to copy from pv-tuning branch
@@ -0,0 +1,214 @@
"""
Do we need this file now? Why can't we save models in the same data format?
This file allows backward compatibility with models quantized with an older version.
args.load_dtype = getattr(torch, args.load_dtype) if args.load_dtype != "auto" else "auto"
args.code_dtype = getattr(torch, args.code_dtype) if args.code_dtype is not None else None

if not args.monkeypatch_old_pickle:
The fact that there are three ways to load the model, and that it depends on the user choosing the correct argument, is, to put it mildly, not great.
True. As a small consolation, using the primary way with --monkeypatch_old_pickle will always work; it's just not the most efficient one.
We ultimately decided that this is a grave, but currently necessary, evil.
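For readers skimming this thread, below is a minimal, self-contained sketch of the general technique that a flag like --monkeypatch_old_pickle suggests (aliasing an old module path so legacy pickles still resolve); the module and class names aq_engine / QuantizedLinear are illustrative assumptions, not necessarily the repo's actual layout.

```python
import sys
import types


class QuantizedLinear:
    """Stand-in for a class whose import path changed between versions (illustrative)."""


def monkeypatch_old_pickle(old_module_name: str = "aq_engine") -> None:
    # Register the legacy module path so that pickle / torch.load can resolve
    # classes that were serialized under the old location. (Sketch only.)
    legacy = types.ModuleType(old_module_name)
    legacy.QuantizedLinear = QuantizedLinear
    sys.modules[old_module_name] = legacy


# Usage: apply the patch before loading a checkpoint produced by the old code.
monkeypatch_old_pickle()
```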
    trust_remote_code=args.trust_remote_code,
)

for module in quantized_model.modules():
I suppose this happens precisely because there are three ways to save the model.
It is a bit unclear. Please elaborate.
Resolution: this is a temporary backwards-compatibility patch for users that trained with the previous main branch and finetune afterwards. It should be deleted in a follow-up PR after 2-4 weeks.
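To make the intent of the temporary patch concrete, here is a hedged sketch of the kind of fix-up pass such a loop might perform; the attribute name code_dtype is borrowed from the arguments above purely for illustration and may not be what the actual loop touches.

```python
import torch
import torch.nn as nn


def patch_legacy_modules(quantized_model: nn.Module) -> None:
    # Walk every submodule and backfill fields that checkpoints from the
    # previous main branch did not save (attribute name is hypothetical).
    for module in quantized_model.modules():
        if not hasattr(module, "code_dtype"):
            module.code_dtype = torch.int16
```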
def _update_flat_codes(_flat_reference, _flat_codes):
    """update _flat_codes [num_groups, num_codebooks] to approximate _flat_reference [num_groups, group_size]"""
    if num_codebooks == 1 and beam_size == 1 and stochastic_rounding_tau == 0 and not force_update:
Minor: num_codebooks, stochastic_rounding_tau, etc. are obtained from the outer function. Because this is an inner function it is not critical, but in the future, when changing the beam_search_optimal_codes function, one should be careful to track changes to these variables.
Yep, this is a closure. Would it help if we commented on this fact somewhere?
TODO: yozh will post a commentary explaining this.
The function uses variables from the outer scope.
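A stripped-down sketch of the closure pattern under discussion (signature and fast path simplified; not the actual implementation):

```python
import torch


def beam_search_optimal_codes(
    reference: torch.Tensor,
    codes: torch.Tensor,
    num_codebooks: int = 1,
    beam_size: int = 1,
    stochastic_rounding_tau: float = 0.0,
    force_update: bool = False,
) -> torch.Tensor:
    def _update_flat_codes(_flat_reference, _flat_codes):
        # num_codebooks, beam_size, stochastic_rounding_tau and force_update are
        # captured from the enclosing scope; changing the outer signature silently
        # changes this inner function's behaviour.
        if num_codebooks == 1 and beam_size == 1 and stochastic_rounding_tau == 0 and not force_update:
            return _flat_codes  # trivial case: nothing to re-search
        return _flat_codes  # placeholder for the actual beam search

    return _update_flat_codes(reference, codes)
```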
    trust_ratio: Optional[float] = None,
) -> torch.Tensor:
    """
    Update codes using beam search to minimize L2 error in code values (regardless of activations)
So basically this is the beam search from the original AQ?
This is a close relative of the beam search from AQ. It solves the same problem as AQ, but also incorporates tricks from LSQ that can be used on GPU.
Namely, this beam search starts from the previous solution, whereas AQ's beam search starts from scratch:
https://github.com/arbabenko/Quantizations/blob/master/aqCoding.py#L37
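A toy illustration of the distinction, with beam_size=1 and hypothetical shapes rather than the repo's actual API: an AQ-style search ignores any previous assignment and rebuilds the codes from scratch, whereas the PV-tuning variant refines the codes it already has.

```python
from typing import Optional

import torch


def refine_codes_one_pass(
    reference: torch.Tensor,                     # [num_groups, group_size]
    codebooks: torch.Tensor,                     # [num_codebooks, codebook_size, group_size]
    prev_codes: Optional[torch.Tensor] = None,   # [num_groups, num_codebooks]
) -> torch.Tensor:
    num_groups = reference.shape[0]
    num_codebooks = codebooks.shape[0]
    if prev_codes is None:
        # cold start, as in the original AQ reference implementation
        prev_codes = torch.zeros(num_groups, num_codebooks, dtype=torch.long)
    codes = prev_codes.clone()
    # greedy coordinate pass: re-pick each codebook's code with the others held fixed
    for c in range(num_codebooks):
        others = sum(codebooks[j][codes[:, j]] for j in range(num_codebooks) if j != c)
        residual = reference - others
        codes[:, c] = torch.cdist(residual, codebooks[c]).argmin(dim=1)
    return codes
```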
@torch.no_grad()
def evaluate_perplexity(
Why is this migrated here?
To the best of my knowledge, this function did not migrate per se; we originally wrote it here. This evaluation code is intended for evaluation during PV-tuning.
Would you prefer that we move it somewhere else (e.g. finetune.py)?
Resolution: keep it here.
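For context, a rough sketch of windowed perplexity evaluation of the kind this function performs, assuming an HF-style causal LM that returns .logits; the real function's signature and batching details may differ.

```python
import math

import torch
import torch.nn.functional as F


@torch.no_grad()
def evaluate_perplexity(model, input_ids: torch.Tensor, seqlen: int = 2048) -> float:
    model.eval()
    total_nll, total_tokens = 0.0, 0
    for start in range(0, input_ids.shape[1] - seqlen + 1, seqlen):
        window = input_ids[:, start : start + seqlen]
        logits = model(window).logits              # [batch, seqlen, vocab]
        shift_logits = logits[:, :-1, :].flatten(0, 1)
        shift_labels = window[:, 1:].flatten()     # each position predicts the next token
        total_nll += F.cross_entropy(shift_logits, shift_labels, reduction="sum").item()
        total_tokens += shift_labels.numel()
    return math.exp(total_nll / total_tokens)
```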