
Commit 2af959d

[chore] v0.0.2 release (#64)
1 parent: 0ae331a · commit: 2af959d

6 files changed, +18 -9 lines changed


CHANGELOG.md

+10-3
@@ -4,9 +4,16 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [next rel] - TBD
+## NEXT - TBD
 ### Fixed
-- This very important bug is fixed (#xxx this is the PR number)
 
 ### Added
-- This awesome feature has been added (#xxx this is the PR number)
+
+
+## [0.0.2] - 2021-11-01
+### Fixed
+- More robust blocksparse [#24]
+
+### Added
+- Rotary embeddings [#32]
+- More flexible layernorm [#50]

HOWTO.md

+2-2
@@ -282,7 +282,7 @@ Let's look at an example:
 mem_use(pytorch_multihead, {"query": query, "key": query, "value": query, "attn_mask": causal_mask}, "PyTorch")
 ```
 
-On a V100, with PyTorch 1.9, Triton 1.1 and xFormers 0.0.1 this reports something along the lines of:
+On a V100, with PyTorch 1.9, Triton 1.1 and xFormers 0.0.2 this reports something along the lines of:
 
 ```bash
 Blocksparse - Peak memory use: 151MB - 6.619ms
@@ -565,7 +565,7 @@ The equivalent to the PyTorch example above would look like the following. You c
 
 Note that this exposes quite a few more knobs than the PyTorch Transformer interface, but in turn is probably a little more flexible. There are a couple of repeated settings here (dimensions mostly), this is taken care of in the [LRA benchmarking config](benchmarks/LRA/code/config.json)
 
-You can compare the speed and memory use of the vanilla PyTorch Transformer Encoder and an equivalent from xFormers, there is an existing benchmark for that ([see](xformers/benchmarks/benchmark_pytorch_transformer.py)). It can be run with `python3 xformers/benchmarks/benchmark_pytorch_transformer.py`, and returns the loss values for every step along with the training time for a couple of shapes that you can customize. Current results are as follows, on a nVidia V100 (PyTorch 1.9, Triton 1.1, xFormers 0.0.1):
+You can compare the speed and memory use of the vanilla PyTorch Transformer Encoder and an equivalent from xFormers, there is an existing benchmark for that ([see](xformers/benchmarks/benchmark_pytorch_transformer.py)). It can be run with `python3 xformers/benchmarks/benchmark_pytorch_transformer.py`, and returns the loss values for every step along with the training time for a couple of shapes that you can customize. Current results are as follows, on a nVidia V100 (PyTorch 1.9, Triton 1.1, xFormers 0.0.2):
 
 --- Transformer training benchmark - runtime ---
 | Units: s | emb 128 - heads 8 | emb 1024 - heads 8 | emb 2048 - heads 8 |
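
For context on the first hunk: `mem_use` is a small helper defined earlier in HOWTO.md, and its body is not part of this diff. A minimal sketch of what such a helper might look like, assuming it simply resets the CUDA peak-memory counter, runs the attention block once, and prints peak memory plus elapsed time in the same format as the reported `Blocksparse - Peak memory use: 151MB - 6.619ms` line:

```python
import time

import torch


def mem_use(fn, kwargs: dict, backend: str):
    """Hypothetical sketch of a mem_use-style helper (not the one shipped in HOWTO.md)."""
    # Start from a clean slate so the recorded peak reflects this call only.
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()

    start = time.time()
    _ = fn(**kwargs)
    torch.cuda.synchronize()
    elapsed_ms = (time.time() - start) * 1e3

    # Peak memory allocated by tensors during the call, in MB.
    peak_mb = torch.cuda.max_memory_allocated() // 2**20
    print(f"{backend} - Peak memory use: {peak_mb}MB - {elapsed_ms:.3f}ms")
```

The call sites in the hunk above pass the attention module, its keyword arguments, and a backend label, which is the signature this sketch assumes.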

docs/source/conf.py

+1-1
@@ -35,7 +35,7 @@
 author = "Facebook AI Research"
 
 # The full version, including alpha/beta/rc tags
-release = "0.0.1"
+release = "0.0.2"
 
 
 # -- General configuration ---------------------------------------------------

docs/source/tutorials/blocksparse.rst

+1-1
@@ -90,7 +90,7 @@ Let's look at an example:
 mem_use(multi_head, {"query": query, "key": query, "value": query, "att_mask": causal_mask}, "Blocksparse")
 mem_use(pytorch_multihead, {"query": query, "key": query, "value": query, "attn_mask": causal_mask}, "PyTorch")
 
-On a V100, with PyTorch 1.9, Triton 1.1 and xFormers 0.0.1 this reports something along the lines of:
+On a V100, with PyTorch 1.9, Triton 1.1 and xFormers 0.0.2 this reports something along the lines of:
 
 .. code-block:: bash

docs/source/tutorials/pytorch_encoder.rst

+3-1
@@ -238,7 +238,9 @@ There's also an added flexibility with xFormers in that attention mechanisms can
 
 Note that this exposes quite a few more knobs than the PyTorch Transformer interface, but in turn is probably a little more flexible. There are a couple of repeated settings here (dimensions mostly), this is taken care of in the [LRA benchmarking config](benchmarks/LRA/code/config.json)
 
-You can compare the speed and memory use of the vanilla PyTorch Transformer Encoder and an equivalent from xFormers, there is an existing benchmark for that ([see](xformers/benchmarks/benchmark_pytorch_transformer.py)). It can be run with `python3 xformers/benchmarks/benchmark_pytorch_transformer.py`, and returns the loss values for every step along with the training time for a couple of shapes that you can customize. Current results are as follows, on a nVidia V100 (PyTorch 1.9, Triton 1.1, xFormers 0.0.1):
+You can compare the speed and memory use of the vanilla PyTorch Transformer Encoder and an equivalent from xFormers, there is an existing benchmark for that ([see](xformers/benchmarks/benchmark_pytorch_transformer.py)).
+It can be run with `python3 xformers/benchmarks/benchmark_pytorch_transformer.py`, and returns the loss values for every step along with the training time for a couple of shapes that you can customize.
+Current results are as follows, on a nVidia V100 (PyTorch 1.9, Triton 1.1, xFormers 0.0.2):
 
 .. code-block:: bash

xformers/__init__.py

+1-1
@@ -6,7 +6,7 @@
 import logging
 
 # Please update the doc version in docs/source/conf.py as well.
-__version__ = "0.0.1"
+__version__ = "0.0.2"
 
 _is_sparse_available = True
 
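
One practical consequence of this bump: the version reported at runtime should now agree with the `release` string set in docs/source/conf.py above. A quick sanity check, assuming a build of this commit is installed:

```python
import xformers

# __version__ is set in xformers/__init__.py; after this commit it reads
# "0.0.2" and should match `release` in docs/source/conf.py.
print(xformers.__version__)
assert xformers.__version__ == "0.0.2"
```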
