
Commit 2af959d

[chore] v0.0.2 release (#64)
1 parent: 0ae331a · commit: 2af959d

6 files changed, +18 -9 lines changed


CHANGELOG.md

+10-3
@@ -4,9 +4,16 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [next rel] - TBD
+## NEXT - TBD
 ### Fixed
-- This very important bug is fixed (#xxx this is the PR number)
 
 ### Added
-- This awesome feature has been added (#xxx this is the PR number)
+
+
+## [0.0.2] - 2021-11-01
+### Fixed
+- More robust blocksparse [#24]
+
+### Added
+- Rotary embeddings [#32]
+- More flexible layernorm [#50]

HOWTO.md

+2-2
@@ -282,7 +282,7 @@ Let's look at an example:
 mem_use(pytorch_multihead, {"query": query, "key": query, "value": query, "attn_mask": causal_mask}, "PyTorch")
 ```
 
-On a V100, with PyTorch 1.9, Triton 1.1 and xFormers 0.0.1 this reports something along the lines of:
+On a V100, with PyTorch 1.9, Triton 1.1 and xFormers 0.0.2 this reports something along the lines of:
 
 ```bash
 Blocksparse - Peak memory use: 151MB - 6.619ms
@@ -565,7 +565,7 @@ The equivalent to the PyTorch example above would look like the following. You c
 
 Note that this exposes quite a few more knobs than the PyTorch Transformer interface, but in turn is probably a little more flexible. There are a couple of repeated settings here (dimensions mostly), this is taken care of in the [LRA benchmarking config](benchmarks/LRA/code/config.json)
 
-You can compare the speed and memory use of the vanilla PyTorch Transformer Encoder and an equivalent from xFormers, there is an existing benchmark for that ([see](xformers/benchmarks/benchmark_pytorch_transformer.py)). It can be run with `python3 xformers/benchmarks/benchmark_pytorch_transformer.py`, and returns the loss values for every step along with the training time for a couple of shapes that you can customize. Current results are as follows, on a nVidia V100 (PyTorch 1.9, Triton 1.1, xFormers 0.0.1):
+You can compare the speed and memory use of the vanilla PyTorch Transformer Encoder and an equivalent from xFormers, there is an existing benchmark for that ([see](xformers/benchmarks/benchmark_pytorch_transformer.py)). It can be run with `python3 xformers/benchmarks/benchmark_pytorch_transformer.py`, and returns the loss values for every step along with the training time for a couple of shapes that you can customize. Current results are as follows, on a nVidia V100 (PyTorch 1.9, Triton 1.1, xFormers 0.0.2):
 
 --- Transformer training benchmark - runtime ---
 | Units: s | emb 128 - heads 8 | emb 1024 - heads 8 | emb 2048 - heads 8 |
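
For context on the first hunk: `mem_use` is a small helper defined earlier in HOWTO.md, and its body is not part of this diff. A minimal sketch of what such a helper might look like, assuming it simply resets the CUDA peak-memory counter, runs the attention block once, and prints peak memory plus elapsed time in the same format as the reported `Blocksparse - Peak memory use: 151MB - 6.619ms` line:

```python
import time

import torch


def mem_use(fn, kwargs: dict, backend: str):
    """Hypothetical sketch of a mem_use-style helper (not the one shipped in HOWTO.md)."""
    # Start from a clean slate so the recorded peak reflects this call only.
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()

    start = time.time()
    _ = fn(**kwargs)
    torch.cuda.synchronize()
    elapsed_ms = (time.time() - start) * 1e3

    # Peak memory allocated by tensors during the call, in MB.
    peak_mb = torch.cuda.max_memory_allocated() // 2**20
    print(f"{backend} - Peak memory use: {peak_mb}MB - {elapsed_ms:.3f}ms")
```

The call sites in the hunk above pass the attention module, its keyword arguments, and a backend label, which is the signature this sketch assumes.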

docs/source/conf.py

+1-1
@@ -35,7 +35,7 @@
 author = "Facebook AI Research"
 
 # The full version, including alpha/beta/rc tags
-release = "0.0.1"
+release = "0.0.2"
 
 
 # -- General configuration ---------------------------------------------------

docs/source/tutorials/blocksparse.rst

+1-1
@@ -90,7 +90,7 @@ Let's look at an example:
 mem_use(multi_head, {"query": query, "key": query, "value": query, "att_mask": causal_mask}, "Blocksparse")
 mem_use(pytorch_multihead, {"query": query, "key": query, "value": query, "attn_mask": causal_mask}, "PyTorch")
 
-On a V100, with PyTorch 1.9, Triton 1.1 and xFormers 0.0.1 this reports something along the lines of:
+On a V100, with PyTorch 1.9, Triton 1.1 and xFormers 0.0.2 this reports something along the lines of:
 
 .. code-block:: bash

docs/source/tutorials/pytorch_encoder.rst

+3-1
@@ -238,7 +238,9 @@ There's also an added flexibility with xFormers in that attention mechanisms can
 
 Note that this exposes quite a few more knobs than the PyTorch Transformer interface, but in turn is probably a little more flexible. There are a couple of repeated settings here (dimensions mostly), this is taken care of in the [LRA benchmarking config](benchmarks/LRA/code/config.json)
 
-You can compare the speed and memory use of the vanilla PyTorch Transformer Encoder and an equivalent from xFormers, there is an existing benchmark for that ([see](xformers/benchmarks/benchmark_pytorch_transformer.py)). It can be run with `python3 xformers/benchmarks/benchmark_pytorch_transformer.py`, and returns the loss values for every step along with the training time for a couple of shapes that you can customize. Current results are as follows, on a nVidia V100 (PyTorch 1.9, Triton 1.1, xFormers 0.0.1):
+You can compare the speed and memory use of the vanilla PyTorch Transformer Encoder and an equivalent from xFormers, there is an existing benchmark for that ([see](xformers/benchmarks/benchmark_pytorch_transformer.py)).
+It can be run with `python3 xformers/benchmarks/benchmark_pytorch_transformer.py`, and returns the loss values for every step along with the training time for a couple of shapes that you can customize.
+Current results are as follows, on a nVidia V100 (PyTorch 1.9, Triton 1.1, xFormers 0.0.2):
 
 .. code-block:: bash

xformers/__init__.py

+1-1
@@ -6,7 +6,7 @@
 import logging
 
 # Please update the doc version in docs/source/conf.py as well.
-__version__ = "0.0.1"
+__version__ = "0.0.2"
 
 _is_sparse_available = True
 
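
One practical consequence of this bump: the version reported at runtime should now agree with the `release` string set in docs/source/conf.py above. A quick sanity check, assuming a build of this commit is installed:

```python
import xformers

# __version__ is set in xformers/__init__.py; after this commit it reads
# "0.0.2" and should match `release` in docs/source/conf.py.
print(xformers.__version__)
assert xformers.__version__ == "0.0.2"
```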
