nebullvm 0.4.4 Release Notes

This release of Nebullvm provides new optimizers and various improvements in code stability.

New Features

Update notebooks to use the new API.
Improve test coverage.
Add Intel Neural Compressor pruning and quantization.
Model latency is now computed over all the data, not only the first sample.
Dynamic shape handling for OpenVINO has been updated to use the new method available from version 2
The optimized model is now discarded if its result differs from that of the original model (metric_drop_ths=0).
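
The two behavioural changes above (latency measured over every sample, and discarding an optimized model whose output differs from the original) can be illustrated with a minimal sketch. This is not nebullvm's implementation: the function names `average_latency` and `keep_optimized` are hypothetical, and the "metric" used here (largest absolute difference between outputs) is an assumption chosen only to show how a threshold of 0 rejects any deviation.

```python
import time
from statistics import mean
from typing import Any, Callable, Sequence


def average_latency(model: Callable[[Any], Any], samples: Sequence[Any]) -> float:
    """Mean per-sample latency in seconds, measured over every sample."""
    if not samples:
        raise ValueError("at least one sample is required")
    timings = []
    for sample in samples:
        start = time.perf_counter()
        model(sample)  # run inference on this sample
        timings.append(time.perf_counter() - start)
    return mean(timings)


def keep_optimized(
    original_outputs: Sequence[float],
    optimized_outputs: Sequence[float],
    metric_drop_ths: float = 0.0,
) -> bool:
    """Return True when the optimized outputs stay within the allowed drop."""
    # Hypothetical metric: largest absolute difference between paired outputs.
    drop = max(
        (abs(a - b) for a, b in zip(original_outputs, optimized_outputs)),
        default=0.0,
    )
    return drop <= metric_drop_ths
```

With the default threshold of 0, `keep_optimized([1.0, 2.0], [1.0, 2.5])` rejects the optimized model, while passing `metric_drop_ths=0.5` would accept it.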

Bug Fixes

Fix an issue during ONNX quantization; it is now much faster than before.
Fix a TensorRT bug in static quantization with the ONNX interface.
Fixes and improvements to the TorchScript compiler: it now also supports trace and torch.fx for tracing the model.
Fix a bug on macOS related to ONNX and int8 quantization.
Fix a bug in SparseML that prevented it from working on Colab.
Bug fixes in the DeepSparse compiler.
Fixes and improvements to the internal ONNX model handling.
Fix an issue in the TensorFlow backend.
Fixes for torch and ONNX TensorRT with transformers.
Fix a bug in TensorRT static quantization when using a new version of Polygraphy.
Fix a bug with Hugging Face models when passing the tokenizer to the optimize_model function.
Fix a bug when using quantization with only a few data samples.