What's Changed
⚡ Added optimized quantization support for the multi-modal (image-to-text) models Qwen 2-VL and Ovis 1.6-VL. Previous image-to-text model quantizations did not use image calibration data, which led to suboptimal post-quantization results. Version 1.5.0 is the first release to provide a stable path for multi-modal quantization: only text layers are quantized.
🐛 Fixed Qwen 2-VL model quantization VRAM usage and the post-quantization copy of relevant config files.
🐛 Fixed installation and compilation in environments with an incorrect TORCH_CUDA_ARCH_LIST set (Nvidia Docker images).
🐛 Added a warning about a bad torch[cuda] install on Windows.
- Fix backend not ipex by @CSY-ModelCloud in #930
- Fix broken ipex check by @Qubitium in #933
- Fix dynamic_cuda validation by @CSY-ModelCloud in #936
- Fix bdist_wheel does not exist on old setuptools by @CSY-ModelCloud in #939
- Add cuda warning on windows by @CSY-ModelCloud in #942
- Add torch inference benchmark by @CL-ModelCloud in #940
- Add modality to BaseModel by @ZX-ModelCloud in #937
- [FIX] qwen_vl_utils should be locally import by @ZX-ModelCloud in #946
- Filter torch cuda arch < 6.0 by @CSY-ModelCloud in #955
- [FIX] wrong filepath was used when model_id_or_path was hugging model id by @ZX-ModelCloud in #956
- Fix import error was not caught by @CSY-ModelCloud in #961
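
To illustrate the TORCH_CUDA_ARCH_LIST fixes above (the install fix and "Filter torch cuda arch < 6.0"), here is a minimal sketch of how such a filter could work. It is not the code from #955: the function name `filter_cuda_arch_list` and its `min_major` parameter are hypothetical, and it assumes the usual space- or semicolon-separated format with optional `+PTX` suffixes.

```python
def filter_cuda_arch_list(arch_list: str, min_major: int = 6) -> str:
    """Drop numeric entries below compute capability `min_major`.x.

    Hypothetical helper for illustration only; not the project's actual code.
    """
    kept = []
    # Entries may be separated by spaces or semicolons, e.g. "5.2;6.0 8.6+PTX".
    for entry in arch_list.replace(";", " ").split():
        version = entry.split("+", 1)[0]  # strip an optional "+PTX" suffix
        try:
            major = int(version.split(".", 1)[0])
        except ValueError:
            # Non-numeric entries (e.g. architecture names) are left untouched.
            kept.append(entry)
            continue
        if major >= min_major:
            kept.append(entry)
    return " ".join(kept)
```

For example, `filter_cuda_arch_list("5.2;6.0 8.6+PTX")` keeps only `6.0` and `8.6+PTX`. A build script could apply this to `os.environ["TORCH_CUDA_ARCH_LIST"]` before compiling extensions, though whether the release does it that way is an assumption.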
Full Changelog: v1.4.5...v1.5.0