diff --git a/CHANGELOG.md b/CHANGELOG.md
index f8b163f6d..0990bd247 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,11 +4,17 @@
 All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [0.0.28.post3] - TBD
-### Fixed:
-- Creating a `LowerTriangularMask` no longer creates a CUDA tensor
+## [0.0.29] - 2024-12-27
+### Improved:
+- [fMHA] Creating a `LowerTriangularMask` no longer creates a CUDA tensor
+- [fMHA] Updated Flash-Attention to `v2.7.2.post1`
+- [fMHA] Flash-Attention v3 will now be used by `memory_efficient_attention` by default when available, unless the operator is enforced with the `op` keyword-argument. Switching from Flash2 to Flash3 can make transformer trainings ~10% faster end-to-end on H100s
+- [fMHA] Fixed a performance regression with the `cutlass` backend for the backward pass (facebookresearch/xformers#1176) - mostly used on older GPUs (e.g. V100)
+- Fixed `swiglu` operator compatibility with `torch.compile` in PyTorch 2.6
+- Fixed activation checkpointing of SwiGLU when AMP is enabled (facebookresearch/xformers#1152)
 ### Removed:
 - Following PyTorch, xFormers no longer builds binaries for conda. Pip is now the only recommended way to get xFormers
+- Removed unmaintained/deprecated components in `xformers.components.*` (see facebookresearch/xformers#848)
 
 ## [0.0.28.post3] - 2024-10-30
 Pre-built binary wheels require PyTorch 2.5.1
diff --git a/version.txt b/version.txt
index 369bd4c2a..f092e2be2 100644
--- a/version.txt
+++ b/version.txt
@@ -1 +1 @@
-0.0.29
+0.0.30
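For context on the `op` keyword-argument mentioned in the 0.0.29 entry, below is a minimal sketch of how a caller lets the dispatcher pick a backend (Flash-Attention v3 where available) versus enforcing one explicitly. It assumes the public `xformers.ops.memory_efficient_attention` API and the `xformers.ops.fmha.flash` operators; tensor shapes and dtypes are illustrative only.

```python
import torch
import xformers.ops as xops
from xformers.ops import fmha

# Illustrative [batch, seq_len, heads, head_dim] half-precision tensors on GPU
q = torch.randn(1, 1024, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 1024, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 1024, 8, 64, device="cuda", dtype=torch.float16)

# Default: the dispatcher selects the best available operator
# (Flash-Attention v3 on supported GPUs, per the 0.0.29 changelog entry).
out = xops.memory_efficient_attention(
    q, k, v, attn_bias=xops.LowerTriangularMask()
)

# Enforcing an operator via `op` (a forward/backward pair) bypasses
# the automatic selection, e.g. to pin the Flash-Attention backend.
out_pinned = xops.memory_efficient_attention(
    q, k, v,
    attn_bias=xops.LowerTriangularMask(),
    op=(fmha.flash.FwOp, fmha.flash.BwOp),
)
```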