Initial support for running quantized models has been released as part of v0.16.0. The quantization guide has more details and steps for quantizing ONNX models with recommended settings.
This issue tracks the work involved in an MVP of 8-bit quantization support. The goal is to be able to convert and run:

- A GPT-2 Large model that has been quantized using `quantize_dynamic` from the `onnxruntime.quantization` package (a sketch of this follows the list). This uses `MatMulInteger`.
- A model that uses `ConvInteger`.
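As a reference for the first goal, here is a minimal sketch of dynamic quantization with onnxruntime. The file paths are hypothetical, and the quantization guide documents the settings actually recommended for RTen:

```python
# Minimal sketch: dynamically quantize an ONNX model with onnxruntime.
# Paths are placeholders; see the RTen quantization guide for the
# recommended settings.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="gpt2-large.onnx",         # hypothetical input path
    model_output="gpt2-large-quant.onnx",  # hypothetical output path
    # i8 weights with the default u8 activations yield the
    # u8 x i8 -> i32 MatMulInteger combination targeted by this issue.
    weight_type=QuantType.QInt8,
)
```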
- Support i8 and u8 tensors for operator inputs and outputs (Support u8 and i8 tensors in operator inputs, outputs and model files #345)
- Support storing i8 and u8 tensors in rten model files (Support u8 and i8 tensors in operator inputs, outputs and model files #345)
- Support QuantizeLinear, DequantizeLinear, DynamicQuantizeLinear ops (Implement QuantizeLinear, DequantizeLinear, DynamicQuantizeLinear ops #346); a sketch of these ops' semantics follows this list
- Support u8 tensors in `Gather` operator (Support u8 tensors in `Gather` operator #349)
- Add a script which documents the process of quantizing an ONNX model using dynamic quantization
- Add initial non-optimized MatMulInteger implementation; a reference sketch of the op's semantics also follows this list
- Implement an initial matmul kernel for u8 x i8 -> i32
- Modify MatMulInteger to use the GEMM kernel
- Implement optimized u8 x i8 -> i32 matmul kernel for AVX2 (Implement u8 x i8 -> i32 GEMM kernel for x86_64 using AVX2 intrinsics #535)
- Implement optimized u8 x i8 -> i32 matmul kernel for AVX-512 (Add AVX-512 int8 GEMM using VNNI #537)
- Implement optimized u8 x i8 -> i32 matmul kernel for VNNI ("DL Boost") (Add AVX-512 int8 GEMM using VNNI #537)
- Implement optimized u8 x i8 -> i32 mat-vec kernels for x86
- Implement optimized u8 x i8 -> i32 kernel for Arm (for CPUs supporting UDOT)
- Implement optimized u8 x i8 -> i32 mat-vec kernels for Arm (for CPUs supporting UDOT)
- Implement optimized u8 x i8 -> i32 kernel for Arm (for older CPUs not supporting UDOT)
- Implement optimized u8 x i8 -> i32 mat-vec kernels for Arm (for older CPUs not supporting UDOT)
- Implement optimized u8 x i8 -> i32 kernel for WASM
- Implement optimized u8 x i8 -> i32 mat-vec kernels for WASM
- Implement ConvInteger
- Add initial support for signedness combinations other than `u8 x i8 -> i32` in MatMulInteger?
- Write documentation that covers how to prepare and run quantized models with RTen, plus the data type compatibility and performance issues that are useful for users to know
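For reference on the QuantizeLinear / DequantizeLinear / DynamicQuantizeLinear item, here is a rough numpy sketch of the u8 semantics these ops are specified to have in ONNX. Function names are illustrative, not RTen APIs:

```python
import numpy as np

def quantize_linear(x, scale, zero_point):
    # QuantizeLinear (u8): y = saturate(round(x / scale) + zero_point)
    return np.clip(np.rint(x / scale) + zero_point, 0, 255).astype(np.uint8)

def dequantize_linear(y, scale, zero_point):
    # DequantizeLinear: x = (y - zero_point) * scale
    return (y.astype(np.int32) - zero_point) * scale

def dynamic_quantize_linear(x):
    # DynamicQuantizeLinear (u8): derive scale and zero point from the
    # observed range (extended to include zero so that zero is exactly
    # representable), then quantize.
    rmin, rmax = min(float(x.min()), 0.0), max(float(x.max()), 0.0)
    scale = (rmax - rmin) / 255.0
    if scale == 0.0:  # all-zero input; avoid dividing by zero
        scale = 1.0
    zero_point = np.uint8(np.clip(np.rint(-rmin / scale), 0, 255))
    return quantize_linear(x, scale, zero_point), scale, zero_point
```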
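Similarly, a reference sketch of what the non-optimized MatMulInteger implementation needs to compute for the u8 x i8 -> i32 case (again illustrative numpy, not the actual RTen code):

```python
import numpy as np

def matmul_integer(a_u8, b_i8, a_zero_point=0, b_zero_point=0):
    # MatMulInteger (u8 x i8 -> i32): widen both inputs to i32 and
    # subtract their zero points before multiplying, so the products
    # and sums accumulate in i32 without overflow.
    a = a_u8.astype(np.int32) - np.int32(a_zero_point)
    b = b_i8.astype(np.int32) - np.int32(b_zero_point)
    return a @ b
```

The optimized kernels in the list above compute the same result without widening the inputs up front, typically by using 8-bit dot-product instructions (e.g. VPDPBUSD with VNNI on x86, UDOT on Arm) and folding the zero-point adjustment into correction terms.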