Commit d7623fb

Merge remote-tracking branch 'origin/main' into dave/annealing_peteish_v2

dirkgr committed Nov 26, 2024
2 parents 06d1e9a + 767047c

Showing 250 changed files with 62,900 additions and 1,123 deletions.
1 change: 0 additions & 1 deletion .github/workflows/main.yml
@@ -148,7 +148,6 @@ jobs:
     constraints:
       cluster:
         - ai2/general-cirrascale
-        - ai2/general-cirrascale-a100-80g-ib
         - ai2/allennlp-cirrascale
     envVars:
       - name: COMMIT_SHA
41 changes: 39 additions & 2 deletions CHANGELOG.md
@@ -9,21 +9,58 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

- A bunch of annealing configs
- `constant_with_warmup` learning rate schedule
- `one_in_eight` configuration for activation checkpointing
- New tokenizer shipped in the source tree instead of pulled from Hugging Face
- Improved support for GCS
- `torch.compile()` now only compiles each block, not the whole model (see the sketch after this list)
- Support for `torch.compile()` with `dynamic=True`
- Resetting `torch.compile()` after every evaluation, because evaluation interferes with the compiled versions
- Added more in-loop evaluation tasks to pick from, mostly for scaling-law experiments
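
A minimal sketch of the per-block compilation idea from the entries above, assuming a model that exposes its transformer blocks as an `nn.ModuleList` named `blocks` (a hypothetical attribute name, not necessarily OLMo's):

```python
import torch
import torch.nn as nn

def compile_blocks(model: nn.Module) -> None:
    # Compile each transformer block separately instead of the whole model,
    # with dynamic shapes enabled so varying sequence lengths do not trigger
    # constant recompilation.
    for i, block in enumerate(model.blocks):
        model.blocks[i] = torch.compile(block, dynamic=True)
```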


## [v0.5.1](https://github.com/allenai/OLMo/releases/tag/v0.5.1) - 2024-10-17

### Added

- Added ability to try loading latest checkpoint from save folder using `--try_load_latest_save`.
- Added support for flash attention and gradient checkpointing to `hf_olmo` (see the sketch after this list).
- Added to `scripts.compare_wandb_configs.py` the ability to more easily compare differences in data mixes and evaluation tasks.
- Added `effective_n_kv_heads` to OLMoConfig for hacky vLLM support.
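
A hedged sketch of how flash attention and gradient checkpointing are typically enabled through the standard `transformers` API; whether `hf_olmo` is wired to exactly these switches is an assumption, and the model id is illustrative:

```python
import torch
from transformers import AutoModelForCausalLM

olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-7B-hf",                     # illustrative model id
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # assumption: hf_olmo honors this switch; requires flash-attn
)
olmo.gradient_checkpointing_enable()          # standard HF call, relevant during training
```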

## [v0.5.0](https://github.com/allenai/OLMo/releases/tag/v0.5.0) - 2024-08-26

- Fixed conversion to HuggingFace model for DDP-trained models.
- Added support for remote source and destination for HuggingFace model conversion.

### Added

- Added support for document masking via flash-attn during training with `--data.generate_doc_lengths`.
- Added config options for `model.norm_after`, `model.scale_emb_init`, and `auxiliary_loss_multiplier` (used with zloss).
- Added scripts for running experiments on qk_norm, norm reordering, and zloss.
- Added `model.rope_theta` configuration option.
- Added `model.embedding_layer_norm` configuration option for adding a LN to the embeddings.
- Added `model.emb_init_std` configuration option to override the standard deviation used to initialize the embeddings.
- Added downstream eval task for requests dumped from oe-eval tasks.
- Added `CosLinearEnvelope` scheduler, which is a pointwise product of a cosine schedule and a linear decay (see the sketch after this list).
- Added ability to save outputs of submodules for debugging purposes.
- Added a number of tasks from oe-eval to the downstream eval tasks.
- Versioned the dolma flan change in `named_data_mix.py`.
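
A minimal sketch of the pointwise product behind `CosLinearEnvelope`, assuming a schedule that decays from `lr_max` to zero over `t_max` steps; the actual scheduler's warmup and minimum-LR handling may differ:

```python
import math

def cos_linear_envelope(step: int, t_max: int, lr_max: float) -> float:
    # Pointwise product of a cosine schedule (1 -> 0) and a linear decay
    # (1 -> 0), so the envelope reaches zero at t_max.
    frac = min(step / t_max, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * frac))
    linear = 1.0 - frac
    return lr_max * cosine * linear
```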

### Changed

- Changed default distributed training strategy from single-GPU to FSDP
- Fixed behavior of `effective_memmap_dtype` to prevent unrecognized dtypes from being parsed as `uint16`.

### Fixed

- Fixed restarting a training run in later epochs so that we no longer need to set the flag `--epoch=INT`.
- Swapped in correct flan data mix.
- Fixed a bug where the attention norm, when applied before the attention block, was modifying the residual stream.
- Fixed `OLMo.from_checkpoint()` so that it correctly loads `olmo_core` and `torch_new` style checkpoints.
- Fixed `preserve_rng_state` being incorrectly set to False when doing gradient checkpointing with dropout.

## [v0.4.0](https://github.com/allenai/OLMo/releases/tag/v0.4.0) - 2024-07-11

14 changes: 10 additions & 4 deletions README.md
@@ -48,6 +48,8 @@ The core models in the OLMo family released so far are (all trained on the [Dolm
| [OLMo 1B](https://huggingface.co/allenai/OLMo-1B) | 3 Trillion | 2048 | [configs/official/OLMo-1B.yaml](https://github.com/allenai/OLMo/blob/main/configs/official/OLMo-1B.yaml) | [wandb.ai/…/OLMo-1B](https://wandb.ai/ai2-llm/OLMo-1B/reports/OLMo-1B--Vmlldzo2NzY1Njk1) | [epoch 1](https://olmo-checkpoints.org/ai2-llm/olmo-small/46zc5fly/train_data/global_indices.npy) |
| [OLMo 7B](https://huggingface.co/allenai/OLMo-7B) | 2.5 Trillion | 2048 | [configs/official/OLMo-7B.yaml](https://github.com/allenai/OLMo/blob/main/configs/official/OLMo-7B.yaml) | [wandb.ai/…/OLMo-7B](https://wandb.ai/ai2-llm/OLMo-7B/reports/OLMo-7B--Vmlldzo2NzQyMzk5) | [epoch 1](https://olmo-checkpoints.org/ai2-llm/olmo-medium/wvc30anm/train_data/global_indices.npy), [epoch 2](https://olmo-checkpoints.org/ai2-llm/olmo-medium/wd2gxrza/train_data/global_indices.npy) |
| [OLMo 7B Twin 2T](https://huggingface.co/allenai/OLMo-7B-Twin-2T) | 2 Trillion | 2048 | [configs/official/OLMo-7B.yaml](https://github.com/allenai/OLMo/blob/main/configs/official/OLMo-7B.yaml) | [wandb.ai/…/OLMo-7B-Twin-2T](https://wandb.ai/ai2-llm/OLMo-7B/reports/OLMo-7B-Twin-2T--Vmlldzo2NzU0NTIz) | [epoch 1](https://olmo-checkpoints.org/ai2-llm/olmo-medium/wvc30anm/train_data/global_indices.npy) |
| [OLMo 7B April 2024](https://huggingface.co/allenai/OLMo-7B-0424-hf) | 2.05 Trillion | 4096 | [configs/official/OLMo-7B-0424.yaml](https://github.com/allenai/OLMo/blob/main/configs/official/OLMo-7B-0424.yaml) | *Coming soon* | *Coming soon* |
| [OLMo 7B July 2024](https://huggingface.co/allenai/OLMo-7B-0724-hf) | 2.75 Trillion | 4096 | [configs/official/OLMo-7B-0724.yaml](https://github.com/allenai/OLMo/blob/main/configs/official/OLMo-7B-0724.yaml) | *Coming soon* | *Coming soon* |

> *See [Inspecting training data](#inspecting-training-data) below for usage.*
@@ -67,8 +69,8 @@ You can utilize our Hugging Face integration to run inference on the OLMo Transf
```diff
 from transformers import AutoModelForCausalLM, AutoTokenizer

-olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1.7-7B-hf")
-tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1.7-7B-hf")
+olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B-0724-hf")
+tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B-0724-hf")

 message = ["Language modeling is "]
 inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
```
@@ -80,7 +82,7 @@

Alternatively, with the Hugging Face pipeline abstraction:

```diff
 from transformers import pipeline
-olmo_pipe = pipeline("text-generation", model="allenai/OLMo-1.7-7B-hf")
+olmo_pipe = pipeline("text-generation", model="allenai/OLMo-7B-0724-hf")
 print(olmo_pipe("Language modeling is"))
```

@@ -95,7 +95,7 @@ python scripts/convert_olmo_to_hf_new.py --input_dir /path/to/olmo/checkpoint --output_dir
### Quantization

```diff
-olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1.7-7B-hf", torch_dtype=torch.float16, load_in_8bit=True) # requires bitsandbytes
+olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B-0724-hf", torch_dtype=torch.float16, load_in_8bit=True) # requires bitsandbytes
```

The quantized model is more sensitive to input types and CUDA device placement, so it is recommended to pass the inputs as `inputs.input_ids.to('cuda')` to avoid potential issues, as in the sketch below.
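
For example, a minimal sketch reusing the `olmo` and `tokenizer` objects from the inference example above (the generation settings are illustrative, not the README's):

```python
message = ["Language modeling is "]
inputs = tokenizer(message, return_tensors="pt", return_token_type_ids=False)

# Move the token IDs onto the GPU explicitly; the 8-bit model expects
# CUDA inputs.
response = olmo.generate(input_ids=inputs.input_ids.to("cuda"), max_new_tokens=100)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```
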
@@ -204,6 +206,10 @@ Note: passing CLI overrides like `--reset_trainer_state` is only necessary if yo

Additional tools for evaluating OLMo models are available at the [OLMo Eval](https://github.com/allenai/ai2-olmo-eval) repo.

## Debugging

See [Debugging](https://github.com/allenai/OLMo/blob/main/docs/NOTES.md#debugging).

## Citing

```bibtex
```
File renamed without changes.