Releases: facebookresearch/xformers
Performance improvements for `memory_efficient_attention`
[0.0.20] - 2023-05-23
Improved
- fMHA/cutlass (backward): Massive performance improvements when `batch_size * num_heads` is low (10x+)
- fMHA/cutlass: Further performance improvements for both the forward & backward kernels
- fMHA (backward): Now dispatching to cutlass when `embed_dim > 64`
- fMHA: Updated Flash-Attention to `v1.0.5`
Added
- fMHA now runs on H100 (support is experimental)
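For orientation, here is a minimal sketch of calling `xformers.ops.memory_efficient_attention`, the operator these fMHA notes refer to; the tensor shapes, sizes, and fp16/CUDA choices below are illustrative assumptions, not part of the release.

```python
# Minimal usage sketch; inputs use the [batch, seq_len, num_heads, head_dim]
# layout, and the concrete sizes here are arbitrary examples.
import torch
import xformers.ops as xops

q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)

# Dispatches automatically to the best available backend
# (cutlass, Flash-Attention, ...); H100 support is experimental per this release.
out = xops.memory_efficient_attention(q, k, v)  # same shape as q
```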
Bugfixes & perf improvement for `memory_efficient_attention`
[0.0.19] - 2023-04-28
Added
- Display `nvcc` version used to compile `xformers` in `python -m xformers.info`
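A convenience sketch for running that report from a script; the canonical form is simply invoking `python -m xformers.info` from a shell.

```python
# Run the xformers info report (which now includes the nvcc version
# used to build the wheel) from inside a Python script.
import subprocess
import sys

subprocess.run([sys.executable, "-m", "xformers.info"], check=True)
```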
Fixed
- Fixed performance regression with `nvcc>11.6` (#712)
- fMHA/cutlass: Fixed `nan` in the output when using a `torch.Tensor` with `-inf` prefixes as `attn_bias` (#722) - see the sketch after this list
- fMHA/cutlass: Fixed `nan` in the output when the sequence length is larger than `2 ** 15` (#719)
- fMHA/cutlass: Significant performance improvements (up to 2x) for both the forward pass and backward pass
- fMHA/cutlass: The kernels are now deterministic
- fMHA/cutlass: Fixed backward pass correctness when using dropout (#724)
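Here is what such a tensor `attn_bias` can look like in practice. This is a hedged sketch: the `[batch, heads, q_len, kv_len]` bias layout, the masked prefix length, and all sizes are assumptions for illustration, and alignment/broadcasting requirements for tensor biases vary between xformers versions.

```python
# Additive attention bias passed as a plain torch.Tensor: entries set to
# -inf mask out the corresponding key positions ("-inf prefixes").
import torch
import xformers.ops as xops

B, M, H, K = 2, 256, 8, 64
q = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
k = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
v = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)

# The bias is added to the attention logits before the softmax;
# here the first 16 key positions are masked for every query.
bias = torch.zeros(B, H, M, M, device="cuda", dtype=torch.float16)
bias[:, :, :, :16] = float("-inf")

out = xops.memory_efficient_attention(q, k, v, attn_bias=bias)
```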
Open sourcing indexing operators
Open-sources experimental indexing ops. Pull Request resolved: https://github.com/fairinternal/xformers/pull/536
Binaries for PT 2.0, mem-eff with bias & dropout, and varying seqlen
This release brings some improvements to the `memory_efficient_attention` operator (a short sketch of the bias & dropout usage named in the title follows below). Pip wheels now target PyTorch 2.0.0 - conda builds are available for PT 2.0.0, 1.13.1 and 1.12.1.
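A minimal sketch of the bias & dropout combination from this release's title, assuming a causal mask via `xformers.ops.LowerTriangularMask` and an arbitrary dropout probability; shapes and values are illustrative only.

```python
# Causal attention bias plus attention dropout with the
# memory-efficient attention operator.
import torch
import xformers.ops as xops

q = torch.randn(1, 512, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 512, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 512, 8, 64, device="cuda", dtype=torch.float16)

out = xops.memory_efficient_attention(
    q, k, v,
    attn_bias=xops.LowerTriangularMask(),  # causal masking
    p=0.1,                                 # attention dropout probability
)
```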
Fixed
- fMHA: Fixed BW pass on Sm86/Sm89 GPUs when `K > 64` (RTX 3090, RTX 4090, A6000, ...) [#631]
Added
v0.0.17rc482
Fix conda with GLIBC (attempt 2). Pull Request resolved: https://github.com/fairinternal/xformers/pull/510
v0.0.17rc481
Fix CI - anaconda upload + disable fairinternal wheels. Pull Request resolved: https://github.com/fairinternal/xformers/pull/505
Pip wheels, improvements to mem-eff and more
This release contains many improvements to `memory_efficient_attention`, along with pip wheels now available on Windows and Linux!
New Features
- Added support for pip wheels [#588, #573, #534, #523, ...] big thanks to @AbdBarho!
- fMHA: Added Triton operator for forward pass from Flash-Attention authored by @TriDao, will be automatically used on A100 when compatible
- fMHA: Added `xformers.ops.memory_efficient_attention_forward`, `xformers.ops.memory_efficient_attention_forward_requires_grad`, `xformers.ops.memory_efficient_attention_backward` for power-users who write custom autograd functions [#560]
- fMHA: Support for custom scaling for the CUTLASS-based kernel [#530] - contribution from @comaniac (see the sketch after this list)
- fMHA: Separate each operator into forward and backward operators. It's now possible to use any combination of forward+backward (for instance Triton forward and Flash-Attention backward) [#560]
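A hedged illustration of the custom-scaling option and the forward-only entry point named above; the shapes and the `scale` value are arbitrary, and keyword defaults are assumptions that may differ between versions.

```python
# Overriding the default softmax scale (normally 1 / sqrt(head_dim)),
# and calling the inference-only forward entry point directly.
import torch
import xformers.ops as xops

q = torch.randn(4, 128, 4, 32, device="cuda", dtype=torch.float16)
k = torch.randn(4, 128, 4, 32, device="cuda", dtype=torch.float16)
v = torch.randn(4, 128, 4, 32, device="cuda", dtype=torch.float16)

# Custom scaling of the attention logits for the CUTLASS-based kernel.
out = xops.memory_efficient_attention(q, k, v, scale=0.1)

# Forward-only variant for power users building custom autograd functions.
out_no_grad = xops.memory_efficient_attention_forward(q, k, v)
```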
Improvements
- Strip lineinfo from binaries, reducing the binary size [#549]
- fMHA: Stricter inputs validation to avoid CUDA errors for unsupported inputs [#592]
- fMHA/Flash-Attention: Updated to Dao-AILab/flash-attention@a1f49a2 with multiple changes from @TriDao that make the operator up to 20% faster
- Updated triton dependency [#418]
Bug fixes
- Fixed compatibility with Python 3.7 [#541] - thanks to @susumuota
- fMHA: Fixed strides for QKV gradients for cutlass attention [#535]
- fMHA/Flash-Attention: Fixed backward pass wrapper, where non-contiguous gradients could give the wrong result [#548]
v0.0.13
Lots of improvements and bug fixes around the memory efficient attention.
v0.0.12
[0.0.12] - 2022-08-08
Fixed
- Removed duplicated biases in the FusedMLP layers [#317]
- Rotary embeddings respecting input types [#326]
- Poolformer style instantiating useless projection layers [#349]
- Fix layer position not being properly tracked, causing extra layernorms for programmatic xformers [#348]
- Pass use_triton flag to LayerNorm module [#336]
Added
- Four blocksparsity layouts from DeepSpeed [#320]
- Support several initialization options [#312]
- Conv2DFeedforward feedforward part [#321]
- VisualAttention [#329]
- Automatic blocksparse for causal attention [#334]
- Better hierarchical transformer generation [#345]
- Fused operations with AOTAutograd/NVFuser, integration into MLP [#357]
- Refactor LRA code to use Pytorch Lightning [#343]