merge PV-Tuning into AQLM main #110

Merged
merged 542 commits into main on Aug 21, 2024
Conversation

@justheuristic (Collaborator) commented Jul 2, 2024

Required

  • all fixes from PV updated galqiwi/AQLM#1
  • update readme to use new finetuning
  • explain model type conversion in readme; explain how to run old finetuning
  • update conversion to include args.pt for future exporting
  • support legacy finetuning mode as part of finetune_fsdp, deprecate finetune_legacy
  • black/isort

Optional

  • memory-efficient loss for large vocabularies (see the sketch below this list)
  • better offloading variants for PV finetuning
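
For the memory-efficient loss item above, a minimal sketch of one common approach: a chunked cross-entropy that never materializes the full [num_tokens, vocab_size] logits tensor at once. Function and argument names are illustrative, not this PR's implementation.

import torch
import torch.nn.functional as F

def chunked_cross_entropy(hidden: torch.Tensor, lm_head_weight: torch.Tensor,
                          labels: torch.Tensor, chunk_size: int = 1024) -> torch.Tensor:
    """Cross-entropy over a large vocabulary, computed chunk by chunk along the token
    axis so that only [chunk_size, vocab_size] logits are in memory at any time."""
    hidden = hidden.flatten(0, -2)   # [num_tokens, hidden_dim]
    labels = labels.flatten()        # [num_tokens]
    total_loss, total_tokens = 0.0, 0
    for start in range(0, hidden.shape[0], chunk_size):
        chunk = hidden[start:start + chunk_size]
        chunk_labels = labels[start:start + chunk_size]
        logits = chunk @ lm_head_weight.T  # materialize logits for this chunk only
        total_loss = total_loss + F.cross_entropy(logits, chunk_labels, reduction="sum")
        total_tokens += chunk_labels.numel()
    return total_loss / total_tokens

(A fully memory-efficient variant would also recompute the chunked logits inside backward rather than keeping them for autograd; the sketch above only bounds the forward-pass peak.)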

Validation

  • full training cycle from initial calibration to P and PV to conversion, using the new code, for some 7B model
  • verify quality vs main
  • verify P/PV it/sec does not degrade (...much)
  • verify that 70B works, maybe not to convergence

@justheuristic requested a review from Vahe1994 August 15, 2024 13:40
@justheuristic marked this pull request as ready for review August 15, 2024 13:41
@justheuristic (Collaborator, Author):
README - still need to copy from pv-tuning branch

@@ -0,0 +1,214 @@
"""

Owner:
Do we need this file now? Why can't we save models in the same data format?


Collaborator (Author):
This file allows backward compatibility with models quantized with an older version of the code.

args.load_dtype = getattr(torch, args.load_dtype) if args.load_dtype != "auto" else "auto"
args.code_dtype = getattr(torch, args.code_dtype) if args.code_dtype is not None else None

if not args.monkeypatch_old_pickle:

Owner:
The fact that there are 3 ways to load the model, all depending on the user choosing the correct argument, is - to put it mildly - not great.


Collaborator (Author):
True. As a small consolation, using the primary way with --monkeypatch_old_pickle will always work; it's just not the most efficient way.


Collaborator (Author):
We ultimately decided that this is a grave, but currently necessary, evil.
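
For reference, the usual shape of such a monkeypatch loading path is to alias the old import path in sys.modules before unpickling, so that torch.load can resolve classes that have since moved. The module names below are placeholders, not the actual AQLM layout.

import sys
import torch

def load_legacy_checkpoint(path: str):
    """Unpickle a checkpoint that references classes by an outdated import path.
    'old_pkg.quant_linear' / 'new_pkg.quant_linear' are placeholder names."""
    import new_pkg.quant_linear as quant_linear         # where the classes live now (placeholder)
    sys.modules["old_pkg.quant_linear"] = quant_linear  # alias the path the old pickle expects
    try:
        return torch.load(path, map_location="cpu")
    finally:
        sys.modules.pop("old_pkg.quant_linear", None)   # undo the patch after loading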

trust_remote_code=args.trust_remote_code,
)

for module in quantized_model.modules():

Owner:
I suppose this happens precisely because there are 3 ways to save the model.


Collaborator (Author):
It is a bit unclear. Please elaborate.


Collaborator (Author):
Resolution: this is a temporary backwards-compatibility patch for users that trained with the previous main branch and finetune afterwards. It should be deleted in a follow-up PR after 2-4 weeks.


def _update_flat_codes(_flat_reference, _flat_codes):
"""update _flat_codes [num_groups, num_codebooks] to approximate _flat_reference [num_groups, group_size]"""
if num_codebooks == 1 and beam_size == 1 and stochastic_rounding_tau == 0 and not force_update:

Owner:
Minor: num_codebooks, stochastic_rounding_tau, etc. are obtained from the outer function. Because this is an inner function it is not critical, but in the future, when changing the beam_search_optimal_codes function, one should be careful to track the changes in these variables.


Collaborator (Author):
Yep, this is a closure. Would it help if we somehow comment on this fact?


Collaborator (Author):
TODO yozh will post a commentary explaining this


Owner:
The function is using variables from the outer scope.
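
To make this concrete, a toy illustration of the closure (not the actual AQLM code): the inner helper reads names defined in the enclosing function, so edits to the outer function silently change the inner one's behavior.

def beam_search_optimal_codes_toy(num_codebooks: int, beam_size: int, codes):
    """Toy outer function; all names here are illustrative."""

    def _update_flat_codes_toy(flat_codes):
        # num_codebooks and beam_size are not parameters of this inner function:
        # they are read from the enclosing scope at call time (a closure), which is
        # why refactoring the outer function requires tracking these names carefully.
        if num_codebooks == 1 and beam_size == 1:
            return flat_codes  # fast path, mirroring the shortcut discussed above
        return flat_codes      # placeholder for the general (beam search) path

    return _update_flat_codes_toy(codes)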

trust_ratio: Optional[float] = None,
) -> torch.Tensor:
"""
Update codes using beam search to minimize L2 error in code values (regardless of activations)

Owner:
So basically this is beam search for the original AQ?


Collaborator (Author):
This is a close relative of beam search from AQ. It solves the same problem as AQ, but also incorporates the tricks from LSQ that can be used on GPU.

Namely, this beam search starts from the previous solution, whereas AQ beam search starts from scratch:
https://github.com/arbabenko/Quantizations/blob/master/aqCoding.py#L37
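
Schematically, the difference in initialization looks like the toy sketch below (beam_size=1 shown as a greedy sweep; shapes and names are illustrative, not the code in this PR):

import torch

def warm_start_code_update(reference, codes, codebooks):
    """Toy warm-started update: start from the existing `codes` [num_groups, num_codebooks]
    rather than from scratch, and accept a new code word only if it lowers the per-group
    L2 reconstruction error. `codebooks` is [num_codebooks, codebook_size, group_size]."""
    def reconstruct(c):
        return sum(codebooks[i][c[:, i]] for i in range(c.shape[1]))  # [num_groups, group_size]

    best_codes = codes.clone()
    best_err = (reconstruct(best_codes) - reference).pow(2).sum(-1)   # per-group L2 error
    for i in range(codes.shape[1]):                      # sweep codebooks one at a time
        for candidate in range(codebooks[i].shape[0]):   # try every code word in codebook i
            proposal = best_codes.clone()
            proposal[:, i] = candidate
            err = (reconstruct(proposal) - reference).pow(2).sum(-1)
            improved = err < best_err                    # keep only per-group improvements
            best_codes[improved] = proposal[improved]
            best_err = torch.minimum(best_err, err)
    return best_codes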



@torch.no_grad()
def evaluate_perplexity(

Owner:
Why is this migrated here?


Collaborator (Author):
To the best of my knowledge, this function did not migrate per se, but we originally wrote it here. This evaluation code is intended for evaluation during PV-tuning.

Would you prefer to migrate this somewhere? (e.g. finetune.py)


Collaborator (Author):
Resolution: keep it here.
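
For context, a minimal sketch of what such an evaluation usually looks like (not the exact evaluate_perplexity in this PR; it assumes a HuggingFace-style causal LM that returns .logits): run non-overlapping windows of a tokenized evaluation stream under torch.no_grad(), accumulate token-level negative log-likelihood, and exponentiate the mean.

import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def perplexity_sketch(model, input_ids: torch.Tensor, seq_len: int = 4096, device: str = "cuda") -> float:
    """Perplexity over non-overlapping windows of a 1D token stream (illustrative only)."""
    model.eval()
    nll_sum, token_count = 0.0, 0
    for i in range(input_ids.numel() // seq_len):
        window = input_ids[i * seq_len : (i + 1) * seq_len].unsqueeze(0).to(device)
        logits = model(window).logits                    # [1, seq_len, vocab_size]
        shift_logits = logits[:, :-1, :].flatten(0, 1)   # predict token t+1 from token t
        shift_labels = window[:, 1:].flatten()
        nll_sum += F.cross_entropy(shift_logits.float(), shift_labels, reduction="sum").item()
        token_count += shift_labels.numel()
    return math.exp(nll_sum / token_count)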

@Vahe1994 merged commit a441a3f into main Aug 21, 2024
2 checks passed