
Releases: facebookresearch/xformers

Performance improvements for `memory_efficient_attention`

23 May 21:04

[0.0.20] - 2023-05-23

Improved

  • fMHA/cutlass (backward): Massive performance improvements when batch_size * num_heads is low (10x+)
  • fMHA/cutlass: Further performance improvements for both the forward & backward kernels
  • fMHA (backward): Now dispatching to cutlass when embed_dim>64
  • fMHA: Updated Flash-Attention to v1.0.5

Added

  • fMHA now runs on H100 (support is experimental)
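
For readers new to the operator, a minimal usage sketch (shapes and values are illustrative, not taken from the release notes); the backend kernel (cutlass, Flash-Attention, ...) is selected automatically from the inputs and GPU:

```python
# Minimal sketch of calling memory_efficient_attention with the
# [batch, seq_len, num_heads, head_dim] layout that fMHA expects.
import torch
import xformers.ops as xops

B, M, H, K = 2, 1024, 8, 64  # illustrative sizes
q = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
k = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
v = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)

# Backend selection (cutlass / Flash-Attention) happens automatically.
out = xops.memory_efficient_attention(q, k, v)  # same shape as q
```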

Bugfixes & perf improvement for `memory_efficient_attention`

28 Apr 08:35

[0.0.19] - 2023-04-28

Added

  • Display the nvcc version used to compile xformers in `python -m xformers.info`

Fixed

  • Fixed performance regression with nvcc>11.6 (#712)
  • fMHA/cutlass: Fixed nan in the output when using a torch.Tensor with -inf prefixes as attn_bias (#722)
  • fMHA/cutlass: Fixed nan in the output when the sequence length is larger than 2 ** 15 (#719)
  • fMHA/cutlass: Significant performance improvements (up to 2x) for both the forward and backward passes
  • fMHA/cutlass: The kernels are now deterministic
  • fMHA/cutlass: Fixed backward pass correctness when using dropout (#724)
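
As context for the attn_bias fixes above (#722), a small sketch of passing an additive torch.Tensor bias whose masked positions are -inf; this is assumed usage, not taken from the release notes:

```python
# Sketch: additive attention bias with -inf "prefixes" (first keys fully masked).
# Shapes and values are illustrative.
import torch
import xformers.ops as xops

B, M, H, K = 1, 256, 4, 64
q = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
k = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
v = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)

prefix = 16
bias = torch.zeros(B, H, M, M, device="cuda", dtype=torch.float16)
bias[..., :prefix] = float("-inf")  # mask out the first `prefix` keys for every query

out = xops.memory_efficient_attention(q, k, v, attn_bias=bias)  # no NaNs expected
```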

Open sourcing indexing operators

30 Mar 15:38
OpenSource experimental indexing ops

Pull Request resolved: https://github.com/fairinternal/xformers/pull/536


Binaries for PT 2.0, mem-eff with bias & dropout, and varying seqlen

28 Mar 12:59

This release brings several improvements to `memory_efficient_attention`.

Pip wheels now target PyTorch 2.0.0; conda builds are available for PyTorch 2.0.0, 1.13.1, and 1.12.1.

Fixed

  • fMHA: Fixed BW pass on Sm86/Sm89 GPUs when K > 64 (RTX 3090, RTX 4090, A6000, ...) [#631]

Added

  • fMHA/CUTLASS: Added tensor attn bias support [#587] - contribution from @jfc4050
  • fMHA/CUTLASS: Added tensor attn bias grad support [#587] - contribution from @jfc4050
  • fMHA/CUTLASS: Added dropout support [#587] - contribution from @jfc4050
  • fMHA: Added support for varying sequence lengths [#500]
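
A short sketch of how the new features combine, assuming the `xformers.ops.fmha.BlockDiagonalMask` helper and the `p` dropout argument (exact names may differ across versions): varying sequence lengths are expressed by packing all tokens into one batch and keeping the sequences separate with a block-diagonal mask.

```python
# Sketch: varying sequence lengths via a block-diagonal attn_bias, plus dropout.
import torch
import xformers.ops as xops
from xformers.ops import fmha

seqlens = [64, 128, 32]  # three sequences of different lengths
attn_bias = fmha.BlockDiagonalMask.from_seqlens(seqlens)

H, K = 8, 64
total = sum(seqlens)
# Packed layout: a single "batch" of all tokens; the mask keeps sequences separate.
q = torch.randn(1, total, H, K, device="cuda", dtype=torch.float16)
k = torch.randn(1, total, H, K, device="cuda", dtype=torch.float16)
v = torch.randn(1, total, H, K, device="cuda", dtype=torch.float16)

out = xops.memory_efficient_attention(q, k, v, attn_bias=attn_bias, p=0.1)  # p: dropout prob
```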

v0.0.17rc482

23 Mar 12:40
Pre-release
Fix conda with GLIBC (attempt 2)

Pull Request resolved: https://github.com/fairinternal/xformers/pull/510


v0.0.17rc481

21 Mar 18:23
Pre-release
Fix CI - anaconda upload + disable fairinternal wheels

Pull Request resolved: https://github.com/fairinternal/xformers/pull/505


Pip wheels, improvements to mem-eff and more

31 Jan 12:27

This release contains many improvements to `memory_efficient_attention`, along with pip wheels now available on Windows and Linux!

Improvements

  • Strip lineinfo from binaries, reducing the binary size [#549]
  • fMHA: Stricter inputs validation to avoid CUDA errors for unsupported inputs [#592]
  • fMHA/Flash-Attention: Updated to Dao-AILab/flash-attention@a1f49a2 with multiple changes from @TriDao that make the operator up to 20% faster
  • Updated triton dependency [#418]
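
To illustrate the stricter validation (#592), a hypothetical sketch; the exact error type and message are assumptions, the point is that unsupported inputs are rejected in Python rather than failing inside a CUDA kernel:

```python
# Hypothetical sketch: mixing dtypes across Q/K/V is rejected up front
# instead of triggering a CUDA error.
import torch
import xformers.ops as xops

q = torch.randn(1, 128, 4, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 128, 4, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 128, 4, 64, device="cuda", dtype=torch.float32)  # wrong dtype on purpose

try:
    xops.memory_efficient_attention(q, k, v)
except (ValueError, NotImplementedError) as e:  # exact exception type is an assumption
    print("rejected before any kernel launch:", e)
```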

Bug fixes

  • Fixed compatibility with Python 3.7 [#541] - thanks to @susumuota
  • fMHA: Fixed strides for QKV gradients for cutlass attention [#535]
  • fMHA/Flash-Attention: Fixed backward pass wrapper, where non-contiguous gradients could give the wrong result [#548]

v0.0.13

26 Sep 19:07

Lots of improvements and bug fixes around memory-efficient attention.

v0.0.12

08 Aug 15:24

[0.0.12] - 2022-08-08

Fixed

  • Removed duplicated biases in the FusedMLP layers [#317]
  • Rotary embeddings respecting input types [#326]
  • Poolformer style instantiating useless projection layers [#349]
  • Fix layer position not being properly tracked, causing extra layernorms for programmatic xformers [#348]
  • Pass use_triton flag to LayerNorm module [#336]

Added

  • Four blocksparsity layouts from DeepSpeed [#320]
  • Support several initialization options [#312]
  • Conv2DFeedforward feedforward part [#321]
  • VisualAttention [#329]
  • Automatic blocksparse for causal attention [#334]
  • Better hierarchical transformer generation [#345]
  • Fused operations with AOTAutograd/NVFuser, integration into MLP [#357]
  • Refactor LRA code to use Pytorch Lightning [#343]

v0.0.11

30 May 21:25

[0.0.11] - 2022-05-30

Fixed

  • Fix some torchscriptability [#246]
  • Fix FourierMix compatibility with AMP [#258]
  • Better asserts on QKV dimensions [#264]
  • Better performance for FusedMLP and FusedLinearLayer [#283]
  • Deepnorm init missing self-attention [#284]

Added

  • Simplicial Embeddings [#259]
  • Mem efficient attention, FW pass [#267]
  • MHA benchmark
  • MLP benchmark
  • Move all triton kernels to triton v2 [#272]
  • Mem efficient attention, BW pass [#281]
  • Metaformer support [#294]
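
For reference, a small sanity-check sketch (not part of the release, and written against the operator's present-day entry point) comparing the memory-efficient forward pass against plain softmax attention in PyTorch:

```python
# Sketch: compare memory_efficient_attention against a naive softmax-attention reference.
import torch
import xformers.ops as xops

B, M, H, K = 2, 128, 4, 32
q = torch.randn(B, M, H, K, device="cuda")
k = torch.randn(B, M, H, K, device="cuda")
v = torch.randn(B, M, H, K, device="cuda")

out = xops.memory_efficient_attention(q, k, v)

# Reference: softmax(Q @ K^T / sqrt(K)) @ V, with heads moved to a separate dim.
qh, kh, vh = (t.transpose(1, 2) for t in (q, k, v))   # [B, H, M, K]
scores = qh @ kh.transpose(-1, -2) / K**0.5           # [B, H, M, M]
ref = (scores.softmax(dim=-1) @ vh).transpose(1, 2)   # back to [B, M, H, K]

print("max abs diff vs reference:", (out - ref).abs().max().item())
```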