diff --git a/CHANGELOG.md b/CHANGELOG.md
index f8b163f6d..0990bd247 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,11 +4,17 @@
 All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [0.0.28.post3] - TBD
-### Fixed:
-- Creating a `LowerTriangularMask` no longer creates a CUDA tensor
+## [0.0.29] - 2024-12-27
+### Improved:
+- [fMHA] Creating a `LowerTriangularMask` no longer creates a CUDA tensor
+- [fMHA] Updated Flash-Attention to `v2.7.2.post1`
+- [fMHA] Flash-Attention v3 will now be used by `memory_efficient_attention` by default when available, unless the operator is enforced with the `op` keyword-argument. Switching from Flash2 to Flash3 can make transformer trainings ~10% faster end-to-end on H100s
+- [fMHA] Fixed a performance regression with the `cutlass` backend for the backward pass (facebookresearch/xformers#1176) - mostly used on older GPUs (e.g. V100)
+- Fixed `swiglu` operator compatibility with `torch.compile` in PyTorch 2.6
+- Fixed activation checkpointing of SwiGLU when AMP is enabled (facebookresearch/xformers#1152)
 ### Removed:
 - Following PyTorch, xFormers no longer builds binaries for conda. Pip is now the only recommended way to get xFormers
+- Removed unmaintained/deprecated components in `xformers.components.*` (see facebookresearch/xformers#848)
 
 ## [0.0.28.post3] - 2024-10-30
 Pre-built binary wheels require PyTorch 2.5.1
diff --git a/version.txt b/version.txt
index 369bd4c2a..f092e2be2 100644
--- a/version.txt
+++ b/version.txt
@@ -1 +1 @@
-0.0.29
+0.0.30
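For context on the `op` keyword-argument mentioned in the 0.0.29 entry, below is a minimal sketch of how a caller lets the dispatcher pick a backend (Flash-Attention v3 where available) versus enforcing one explicitly. It assumes the public `xformers.ops.memory_efficient_attention` API and the `xformers.ops.fmha.flash` operators; tensor shapes and dtypes are illustrative only.

```python
import torch
import xformers.ops as xops
from xformers.ops import fmha

# Illustrative [batch, seq_len, heads, head_dim] half-precision tensors on GPU
q = torch.randn(1, 1024, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 1024, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 1024, 8, 64, device="cuda", dtype=torch.float16)

# Default: the dispatcher selects the best available operator
# (Flash-Attention v3 on supported GPUs, per the 0.0.29 changelog entry).
out = xops.memory_efficient_attention(
    q, k, v, attn_bias=xops.LowerTriangularMask()
)

# Enforcing an operator via `op` (a forward/backward pair) bypasses
# the automatic selection, e.g. to pin the Flash-Attention backend.
out_pinned = xops.memory_efficient_attention(
    q, k, v,
    attn_bias=xops.LowerTriangularMask(),
    op=(fmha.flash.FwOp, fmha.flash.BwOp),
)
```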