Commit

Merge branch 'main' into improve-documentation
dirkgr authored Nov 26, 2024
2 parents 4e256a9 + 9c677c9 commit 71abc2c
Showing 166 changed files with 63,869 additions and 1,030 deletions.
6 changes: 5 additions & 1 deletion CHANGELOG.md
@@ -14,6 +14,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- `one_in_eight` configuration for activation checkpointing
- New tokenizer in the source instead of from huggingface
- Improved support for GCS
- `torch.compile()` now only compiles each block, not the whole model.
- Support for `torch.compile()` with `dynamic=True`
- The `torch.compile()` state is now reset after every evaluation, because evaluation interferes with the compiled model (see the sketch after this list).
- Added more in-loop evaluation tasks to pick from, mostly for scaling-law experiments.
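
A minimal sketch of what the per-block compilation and post-evaluation reset might look like (the `blocks` attribute name and the wiring are assumptions for illustration, not OLMo's actual code):

```python
from typing import Optional

import torch
import torch._dynamo


def compile_blocks(model: torch.nn.Module, dynamic: Optional[bool] = None) -> None:
    """Compile each transformer block individually instead of the whole model.

    Per-block compilation keeps each compiled graph small, and with
    dynamic=True Inductor tries to emit shape-polymorphic kernels up front,
    so changing sequence lengths do not force recompilation.
    """
    for i, block in enumerate(model.blocks):  # `blocks` is an assumed attribute
        model.blocks[i] = torch.compile(block, dynamic=dynamic)


def reset_after_eval() -> None:
    # Clearing Dynamo's caches after an evaluation pass forces a fresh
    # compile on the next training step, so guards recorded against
    # eval-time shapes cannot linger into training.
    torch._dynamo.reset()
```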


## [v0.5.1](https://github.com/allenai/OLMo/releases/tag/v0.5.1) - 2024-10-17
@@ -55,7 +59,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Swapped in correct flan data mix.
- Fixed a bug where the attention norm, when applied before the attention block, was modifying the residual stream (see the sketch after this list).
- Fixed `OLMo.from_checkpoint()` so that it correctly loads `olmo_core` and `torch_new` style checkpoints.
- Fixed `preserve_rng_state` being incorrectly set to False when doing gradient checkpointing with dropout
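
For context on the attention-norm fix above, a minimal illustrative sketch (not OLMo's actual block code): a norm applied before attention must feed only the attention input, leaving the residual stream itself untouched.

```python
import torch


def block_correct(x: torch.Tensor, norm, attn) -> torch.Tensor:
    # Pre-norm done right: the skip connection carries the original x,
    # and only attention sees the normalized activations.
    return x + attn(norm(x))


def block_buggy(x: torch.Tensor, norm, attn) -> torch.Tensor:
    # The bug: rebinding x to norm(x) means the skip connection now
    # carries norm(x) instead of x, so the norm has modified the
    # residual stream.
    x = norm(x)
    return x + attn(x)
```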


## [v0.4.0](https://github.com/allenai/OLMo/releases/tag/v0.4.0) - 2024-07-11

Large diffs for 24 files are not rendered by default.

1,206 changes: 1,206 additions & 0 deletions configs/annealing/peteish7-weka-anneal-from-928646-50B-nowup-refine-rw.yaml

Large diffs are not rendered by default.

1,381 changes: 1,381 additions & 0 deletions configs/peteish1-google.yaml

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion configs/peteish1-weka.yaml
@@ -84,7 +84,7 @@ save_num_unsharded_checkpoints_to_keep: -1
load_path: null

max_duration: 1ep
-global_train_batch_size: 1024
+global_train_batch_size: 512
device_train_microbatch_size: 4

precision: amp_bf16
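As an aside on how these two settings interact: each rank runs `global_train_batch_size / (device_train_microbatch_size × world_size)` gradient-accumulation micro-steps per optimizer step. A small sketch (the helper name and the 64-GPU figure are hypothetical, not from the config):

```python
def grad_accum_steps(global_batch: int, micro_batch: int, world_size: int) -> int:
    """Micro-batches each rank processes before one optimizer step."""
    per_micro_step = micro_batch * world_size
    # The global batch must divide evenly across ranks and micro-steps.
    assert global_batch % per_micro_step == 0
    return global_batch // per_micro_step


# With the values above on a hypothetical 64-GPU run:
# grad_accum_steps(512, 4, 64) -> 2 micro-steps per optimizer step
```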
1,380 changes: 1,380 additions & 0 deletions configs/peteish13-google.yaml

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion configs/peteish13-s3.yaml
@@ -84,7 +84,7 @@ save_num_unsharded_checkpoints_to_keep: -1
load_path: null

max_duration: 1ep
-global_train_batch_size: 1024
+global_train_batch_size: 2048
device_train_microbatch_size: 2

precision: amp_bf16
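The same arithmetic applies here: with `global_train_batch_size: 2048` and `device_train_microbatch_size: 2`, a hypothetical 256-GPU run would need 2048 / (2 × 256) = 4 micro-steps per optimizer step (see the sketch above).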
1,380 changes: 1,380 additions & 0 deletions configs/peteish13-weka.yaml

Large diffs are not rendered by default.

1,382 changes: 1,382 additions & 0 deletions configs/peteish7-google.yaml

Large diffs are not rendered by default.

11 changes: 11 additions & 0 deletions olmo/config.py
@@ -696,6 +696,17 @@ class CompilerConfig(BaseConfig):
The backend to use.
"""

dynamic: Optional[bool] = None
"""
From the torch docs:
Use dynamic shape tracing. When this is True, we will up-front attempt to generate a kernel that is as dynamic
as possible to avoid recompilations when sizes change. This may not always work as some
operations/optimizations will force specialization; use TORCH_LOGS=dynamic to debug overspecialization. When
this is False, we will NEVER generate dynamic kernels, we will always specialize. By default (None), we
automatically detect if dynamism has occurred and compile a more dynamic kernel upon recompile.
"""


class DistributedStrategy(StrEnum):
ddp = "ddp"
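A minimal sketch of how the new `dynamic` field might be threaded through to `torch.compile()` (the wiring function is my illustration; `backend` and `dynamic` are the config fields shown in the diff above):

```python
import torch


def compile_with_config(block: torch.nn.Module, cfg: "CompilerConfig") -> torch.nn.Module:
    # dynamic=None (the default) lets torch auto-detect dynamism and
    # recompile a more dynamic kernel; dynamic=True requests
    # shape-polymorphic kernels up front; dynamic=False always
    # specializes on the observed shapes.
    return torch.compile(block, backend=cfg.backend, dynamic=cfg.dynamic)
```

As the docstring notes, `TORCH_LOGS=dynamic` can help debug cases where an operation forces specialization despite `dynamic=True`.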