merge PV-Tuning into AQLM main #110

Merged
merged 542 commits into main on Aug 21, 2024
Conversation

@justheuristic (Collaborator) commented Jul 2, 2024

Required

  • all fixes from PV updated galqiwi/AQLM#1
  • update readme to use new finetuning
  • explain model type conversion in readme; explain how to run old finetuning
  • update conversion to include args.pt for future exporting
  • support legacy finetuning mode as part of finetune_fsdp, deprecate finetune_legacy
  • black/isort

Optional

  • memory-efficient loss for large vocabularies (see the sketch below this list)
  • better offloading variants for PV finetuning
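
For the memory-efficient loss item above, a minimal sketch of one common approach: a chunked cross-entropy that never materializes the full [num_tokens, vocab_size] logits tensor at once. Function and argument names are illustrative, not this PR's implementation.

import torch
import torch.nn.functional as F

def chunked_cross_entropy(hidden: torch.Tensor, lm_head_weight: torch.Tensor,
                          labels: torch.Tensor, chunk_size: int = 1024) -> torch.Tensor:
    """Cross-entropy over a large vocabulary, computed chunk by chunk along the token
    axis so that only [chunk_size, vocab_size] logits are in memory at any time."""
    hidden = hidden.flatten(0, -2)   # [num_tokens, hidden_dim]
    labels = labels.flatten()        # [num_tokens]
    total_loss, total_tokens = 0.0, 0
    for start in range(0, hidden.shape[0], chunk_size):
        chunk = hidden[start:start + chunk_size]
        chunk_labels = labels[start:start + chunk_size]
        logits = chunk @ lm_head_weight.T  # materialize logits for this chunk only
        total_loss = total_loss + F.cross_entropy(logits, chunk_labels, reduction="sum")
        total_tokens += chunk_labels.numel()
    return total_loss / total_tokens

(A fully memory-efficient variant would also recompute the chunked logits inside backward rather than keeping them for autograd; the sketch above only bounds the forward-pass peak.)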

Validation

  • full training cycle from initial calibration to P and PV to conversion, using the new code, for some 7B model
  • verify quality vs main
  • verify P/PV it/sec does not degrade (...much)
  • verify that 70B works, maybe not to convergence

@justheuristic requested a review from Vahe1994 August 15, 2024 13:40
@justheuristic marked this pull request as ready for review August 15, 2024 13:41
@justheuristic (Collaborator, Author):
README - still need to copy from pv-tuning branch

@@ -0,0 +1,214 @@
"""

Owner:
Do we need this file now? Why can't we save models in the same data format?


Collaborator (Author):
This file allows backward compatibility with models quantized with an older version of the code.

args.load_dtype = getattr(torch, args.load_dtype) if args.load_dtype != "auto" else "auto"
args.code_dtype = getattr(torch, args.code_dtype) if args.code_dtype is not None else None

if not args.monkeypatch_old_pickle:

Owner:
The fact that there are 3 ways to load the model, all depending on the user choosing the correct argument, is - to put it mildly - not great.


Collaborator (Author):
True. As a small consolation, using the primary way with --monkeypatch_old_pickle will always work; it's just not the most efficient way.


Collaborator (Author):
We ultimately decided that this is a grave, but currently necessary, evil.
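
For reference, the usual shape of such a monkeypatch loading path is to alias the old import path in sys.modules before unpickling, so that torch.load can resolve classes that have since moved. The module names below are placeholders, not the actual AQLM layout.

import sys
import torch

def load_legacy_checkpoint(path: str):
    """Unpickle a checkpoint that references classes by an outdated import path.
    'old_pkg.quant_linear' / 'new_pkg.quant_linear' are placeholder names."""
    import new_pkg.quant_linear as quant_linear         # where the classes live now (placeholder)
    sys.modules["old_pkg.quant_linear"] = quant_linear  # alias the path the old pickle expects
    try:
        return torch.load(path, map_location="cpu")
    finally:
        sys.modules.pop("old_pkg.quant_linear", None)   # undo the patch after loading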

trust_remote_code=args.trust_remote_code,
)

for module in quantized_model.modules():

Owner:
I suppose this happens precisely because there are 3 ways to save the model.


Collaborator (Author):
It is a bit unclear. Please elaborate.


Collaborator (Author):
Resolution: this is a temporary backwards-compatibility patch for users that trained with the previous main branch and finetune afterwards. It should be deleted in a follow-up PR after 2-4 weeks.


def _update_flat_codes(_flat_reference, _flat_codes):
"""update _flat_codes [num_groups, num_codebooks] to approximate _flat_reference [num_groups, group_size]"""
if num_codebooks == 1 and beam_size == 1 and stochastic_rounding_tau == 0 and not force_update:

Owner:
Minor: num_codebooks, stochastic_rounding_tau, etc. are obtained from the outer function. Because this is an inner function it is not critical, but in the future, when changing the beam_search_optimal_codes function, one should be careful to track the changes in these variables.


Collaborator (Author):
Yep, this is a closure. Would it help if we somehow comment on this fact?


Collaborator (Author):
TODO yozh will post a commentary explaining this


Owner:
The function is using variables from the outer scope.
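
To make this concrete, a toy illustration of the closure (not the actual AQLM code): the inner helper reads names defined in the enclosing function, so edits to the outer function silently change the inner one's behavior.

def beam_search_optimal_codes_toy(num_codebooks: int, beam_size: int, codes):
    """Toy outer function; all names here are illustrative."""

    def _update_flat_codes_toy(flat_codes):
        # num_codebooks and beam_size are not parameters of this inner function:
        # they are read from the enclosing scope at call time (a closure), which is
        # why refactoring the outer function requires tracking these names carefully.
        if num_codebooks == 1 and beam_size == 1:
            return flat_codes  # fast path, mirroring the shortcut discussed above
        return flat_codes      # placeholder for the general (beam search) path

    return _update_flat_codes_toy(codes)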

trust_ratio: Optional[float] = None,
) -> torch.Tensor:
"""
Update codes using beam search to minimize L2 error in code values (regardless of activations)

Owner:
So basically this is beam search for the original AQ?


Collaborator (Author):
This is a close relative of beam search from AQ. It solves the same problem as AQ, but also incorporates the tricks from LSQ that can be used on GPU.

Namely, this beam search starts from the previous solution, whereas AQ beam search starts from scratch:
https://github.com/arbabenko/Quantizations/blob/master/aqCoding.py#L37
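
Schematically, the difference in initialization looks like the toy sketch below (beam_size=1 shown as a greedy sweep; shapes and names are illustrative, not the code in this PR):

import torch

def warm_start_code_update(reference, codes, codebooks):
    """Toy warm-started update: start from the existing `codes` [num_groups, num_codebooks]
    rather than from scratch, and accept a new code word only if it lowers the per-group
    L2 reconstruction error. `codebooks` is [num_codebooks, codebook_size, group_size]."""
    def reconstruct(c):
        return sum(codebooks[i][c[:, i]] for i in range(c.shape[1]))  # [num_groups, group_size]

    best_codes = codes.clone()
    best_err = (reconstruct(best_codes) - reference).pow(2).sum(-1)   # per-group L2 error
    for i in range(codes.shape[1]):                      # sweep codebooks one at a time
        for candidate in range(codebooks[i].shape[0]):   # try every code word in codebook i
            proposal = best_codes.clone()
            proposal[:, i] = candidate
            err = (reconstruct(proposal) - reference).pow(2).sum(-1)
            improved = err < best_err                    # keep only per-group improvements
            best_codes[improved] = proposal[improved]
            best_err = torch.minimum(best_err, err)
    return best_codes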



@torch.no_grad()
def evaluate_perplexity(

Owner:
Why is this migrated here?


Collaborator (Author):
To the best of my knowledge, this function did not migrate per se, but we originally wrote it here. This evaluation code is intended for evaluation during PV-tuning.

Would you prefer to migrate this somewhere? (e.g. finetune.py)


Collaborator (Author):
Resolution: keep it here.
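
For context, a minimal sketch of what such an evaluation usually looks like (not the exact evaluate_perplexity in this PR; it assumes a HuggingFace-style causal LM that returns .logits): run non-overlapping windows of a tokenized evaluation stream under torch.no_grad(), accumulate token-level negative log-likelihood, and exponentiate the mean.

import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def perplexity_sketch(model, input_ids: torch.Tensor, seq_len: int = 4096, device: str = "cuda") -> float:
    """Perplexity over non-overlapping windows of a 1D token stream (illustrative only)."""
    model.eval()
    nll_sum, token_count = 0.0, 0
    for i in range(input_ids.numel() // seq_len):
        window = input_ids[i * seq_len : (i + 1) * seq_len].unsqueeze(0).to(device)
        logits = model(window).logits                    # [1, seq_len, vocab_size]
        shift_logits = logits[:, :-1, :].flatten(0, 1)   # predict token t+1 from token t
        shift_labels = window[:, 1:].flatten()
        nll_sum += F.cross_entropy(shift_logits.float(), shift_labels, reduction="sum").item()
        token_count += shift_labels.numel()
    return math.exp(nll_sum / token_count)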

@Vahe1994 merged commit a441a3f into main Aug 21, 2024
2 checks passed