From 960d4c6edad10ebf1b0b5e3241624772ede4ad15 Mon Sep 17 00:00:00 2001
From: Qubitium-ModelCloud
Date: Tue, 24 Dec 2024 09:36:52 +0800
Subject: [PATCH] prepare for 1.4.5 release (#962)

* prepare for 1.4.5 release

* Update README.md
---
 README.md | 11 ++++++-----
 setup.py  |  2 +-
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index ab166810..bcab1bf0 100644
--- a/README.md
+++ b/README.md
@@ -16,12 +16,13 @@
 * 12/10/2024 [1.4.0](https://github.com/ModelCloud/GPTQModel/releases/tag/v1.4.0) `EvalPlus` harness integration merged upstream. We now support both `lm-eval` and `EvalPlus`. Added pure torch `Torch` kernel. Refactored the `Cuda` kernel into the `DynamicCuda` kernel. `Triton` kernel is now auto-padded for max model support. `Dynamic` quantization now supports both positive `+:` matching (the default) and negative `-:` matching, which allows matched modules to be skipped entirely during quantization. Fixed auto-`Marlin` kernel selection. Added auto-kernel fallback for unsupported kernel/module pairs. Lots of internal refactoring and cleanup in preparation for the transformers/optimum/peft upstream PR merge. Deprecated saving in the `Marlin` weight format since `Marlin` supports auto-conversion of the `gptq` format to `Marlin` at runtime.
 * 11/29/2024 [1.3.1](https://github.com/ModelCloud/GPTQModel/releases/tag/v1.3.1) Olmo2 model support. Intel XPU acceleration via IPEX. Model sharding Transformers compat fix due to API deprecation in HF. Removed hard `triton` dependency; the `Triton` kernel now optionally depends on the `triton` pkg.
-* 11/26/2024 [1.3.0](https://github.com/ModelCloud/GPTQModel/releases/tag/v1.3.0) Zero-Day Hymba model support. Removed `tqdm` and `rogue` dependencies.
-* 11/24/2024 [1.2.3](https://github.com/ModelCloud/GPTQModel/releases/tag/v1.2.3) HF GLM model support. ClearML logging integration. Replaced the `gputil` + `psutil` dependencies with `device-smi`. Fixed model unit tests.
 
 Archived News:
+* 11/26/2024 [1.3.0](https://github.com/ModelCloud/GPTQModel/releases/tag/v1.3.0) Zero-Day Hymba model support. Removed `tqdm` and `rogue` dependencies.
+* 11/24/2024 [1.2.3](https://github.com/ModelCloud/GPTQModel/releases/tag/v1.2.3) HF GLM model support. ClearML logging integration. Replaced the `gputil` + `psutil` dependencies with `device-smi`. Fixed model unit tests.
+
 * 11/11/2024 🚀 [1.2.1](https://github.com/ModelCloud/GPTQModel/releases/tag/v1.2.1) Meta MobileLLM model support added. `lm-eval[gptqmodel]` integration merged upstream. Intel/IPEX CPU inference merged, replacing QBits (deprecated). Auto-fix/patch ChatGLM-3/GLM-4 compat with latest transformers. New `.load()` and `.save()` API.
 
 * 10/29/2024 🚀 [1.1.0](https://github.com/ModelCloud/GPTQModel/releases/tag/v1.1.0) IBM Granite model support. Full auto-buildless wheel install from PyPI. Reduced max CPU memory usage by >20% during quantization. 100% CI model/feature coverage.
@@ -79,7 +80,7 @@ Public tests/papers and ModelCloud's internal tests have shown that GPTQ is on-p
 * 🚀 40% faster `packing` stage in quantization (Llama 3.1 8B). 50% faster PPL calculations (OPT).
 
 ## Quality: GPTQModel 4bit can match BF16:
-🤗 [ModelCloud quantized ultra-high recovery vortex-series models on HF](https://huggingface.co/collections/ModelCloud/vortex-673743382af0a52b2a8b9fe2)
+🤗 [ModelCloud quantized Vortex models on HF](https://huggingface.co/collections/ModelCloud/vortex-673743382af0a52b2a8b9fe2)
 
 ![image](https://github.com/user-attachments/assets/7b2db012-b8af-4d19-a25d-7023cef19220)
@@ -184,8 +185,8 @@ GPTQModel inference is integrated into both [lm-eval](https://github.com/Eleuthe
 We highly recommend avoiding `ppl` and instead using `lm-eval`/`evalplus` to validate post-quantization model quality. `ppl` should only be used for regression tests and is not a good indicator of model output quality.
 
 ```
-# gptqmodel is integrated into lm-eval >= v0.4.6
-pip install lm-eval>=0.4.6
+# gptqmodel is integrated into lm-eval >= v0.4.7
+pip install lm-eval>=0.4.7
 ```
 
 ```
diff --git a/setup.py b/setup.py
index 6efe8a72..276a4be6 100644
--- a/setup.py
+++ b/setup.py
@@ -277,7 +277,7 @@ def run(self):
         'ipex': ["intel_extension_for_pytorch>=2.5.0"],
         'auto_round': ["auto_round>=0.3"],
         'logger': ["clearml", "random_word", "plotly"],
-        'eval': ["lm_eval>=0.4.6", "evalplus>=0.3.1"],
+        'eval': ["lm_eval>=0.4.7", "evalplus>=0.3.1"],
         'triton': ["triton>=2.0.0"]
     },
     include_dirs=include_dirs,
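
For reviewers unfamiliar with the `Dynamic` matching called out in the 1.4.0 notes above, here is a minimal sketch of how the positive `+:` / negative `-:` patterns are used, assuming the `dynamic` argument of `QuantizeConfig`; the module regexes, model id, and one-line calibration set below are illustrative only and are not taken from this PR:

```
# Sketch only: module-name regexes, model id, and calibration text are made up.
from gptqmodel import GPTQModel, QuantizeConfig

dynamic = {
    # `+:` (positive match, the default) overrides quantize params for matched modules
    r"+:.*\.18\..*gate.*": {"bits": 8, "group_size": 64},
    # `-:` (negative match) skips matched modules entirely during quantization
    r"-:.*\.19\..*gate.*": {},
}

quant_config = QuantizeConfig(bits=4, group_size=128, dynamic=dynamic)
model = GPTQModel.load("meta-llama/Llama-3.2-1B", quant_config)
model.quantize(["GPTQModel is an LLM quantization toolkit."])  # tiny calibration set
model.save("Llama-3.2-1B-gptq-4bit")
```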
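
And a quick way to exercise the `lm-eval>=0.4.7` bump this patch makes, via lm-eval's Python API; treat the `gptqmodel=True` model arg and the quantized model path as assumptions based on the upstream integration noted in the README, not as verified v0.4.7 behavior:

```
# Sketch: evaluates a GPTQ-quantized checkpoint through lm-eval's simple_evaluate.
# `gptqmodel=True` and the model path are assumptions, not verified against v0.4.7.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Llama-3.2-1B-gptq-4bit,gptqmodel=True",
    tasks=["arc_challenge"],
)
print(results["results"])
```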