Releases: allenai/OLMo
v0.6.0
What's new
Added 🎉
- A bunch of annealing configs
- `constant_with_warmup` learning rate schedule
- `one_in_eight` configuration for activation checkpointing
- New tokenizer in the source instead of from huggingface
- Improved support for GCS
- `torch.compile()` now only compiles each block, not the whole model.
- Support for `torch.compile()` with `dynamic=True`
- Resetting the `torch.compile()` after every evaluation, because evaluation messes with the compiled versions
- Added more in-loop evaluation tasks to pick from, mostly for scaling law.
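As a rough illustration of what a `constant_with_warmup` schedule usually means, here is a minimal sketch in plain Python. It assumes a linear warmup from zero followed by a flat learning rate; the function signature and the starting value of the warmup are assumptions for illustration, not taken from the OLMo source.

```python
def constant_with_warmup(step: int, peak_lr: float, warmup_steps: int) -> float:
    """Ramp linearly from 0 to peak_lr over warmup_steps, then hold peak_lr.

    Hypothetical helper: the real scheduler may warm up from a nonzero
    minimum learning rate instead of 0.
    """
    if warmup_steps > 0 and step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr


# Sample the schedule at a few steps.
for step in (0, 500, 1000, 5000):
    print(step, constant_with_warmup(step, peak_lr=3e-4, warmup_steps=1000))
```

Unlike a cosine or linear-decay schedule, the value never drops after warmup, which is convenient when a separate annealing run is meant to handle the decay.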
Commits
b41634f One more hint for what's going on.
d74e835 A little more help for getting started
24ce0ca Merge pull request #756 from allenai/MoreCheckpoints2
69d1e4e Note about and link to Huggingface
3c6d515 Merge pull request #754 from allenai/MoreCheckpoints
645587e Merge branch 'main' of https://github.com/allenai/LLM
a346674 Fix link
4f0d7d1 We use safetensors now.
a6e6e2b Remove links that don't work
e6f6b45 Remove obsolete docs
0d14158 Merge pull request #745 from allenai/improve-documentation
1048c16 Merge pull request #750 from allenai/dave/annealing_peteish_v2
767047c Merge pull request #749 from allenai/mattj/legalwhammy2-augusta
9c677c9 Merge pull request #748 from allenai/oeeval-ladder-testtrain
7e81a6c Merge pull request #739 from allenai/peteish13-augusta
31c385f Merge pull request #742 from allenai/GoogleStorage
afd728f Merge pull request #738 from allenai/annealing_peteish_v2_neweval
837a4ff Merge pull request #687 from allenai/kylel/config-diff
v0.5.1
What's new
Added 🎉
- Added ability to try loading latest checkpoint from save folder using `--try_load_latest_save`.
- Added support for flash attention and gradient checkpointing to `hf_olmo`.
- Added `effective_n_kv_heads` to OLMoConfig for hacky VLLM support.
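The idea behind `--try_load_latest_save` can be sketched as a search for the newest checkpoint in the save folder. The sketch below assumes checkpoints live in directories named `step<N>`; that naming scheme and the helper name `find_latest_checkpoint` are assumptions for illustration and may not match what the trainer does.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming scheme: one directory per checkpoint, e.g. "step1000".
STEP_DIR = re.compile(r"^step(\d+)$")


def find_latest_checkpoint(save_folder: Path) -> Optional[Path]:
    """Return the checkpoint directory with the highest step, or None."""
    if not save_folder.is_dir():
        return None
    best_step, best_path = -1, None
    for child in save_folder.iterdir():
        match = STEP_DIR.match(child.name)
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path
```

Comparing the step numbers as integers matters here: a plain string sort would rank `step20` above `step100`. Returning `None` when nothing is found lets the caller fall back to a fresh start, which fits the "try" in the flag name.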
Commits
6889991 Merge pull request #735 from allenai/true-version-0.5.1
4d81b1b Merge pull request #733 from allenai/version-0.5.1
76ad758 Merge pull request #724 from allenai/shanea/lumi-24.03-2
885bc22 Merge pull request #721 from allenai/hey-my-first-pr-to-olmo
aa1863e Merge pull request #725 from allenai/shanea/fix-build-errors
59360be add missing function
d2b655a Merge pull request #720 from allenai/shanea/set-device-early
47f8f5a Merge pull request #719 from allenai/shanea/hf-olmo-gradient-checkpointing
0b92077 Merge pull request #718 from allenai/ot-fix-mmlu-bpb
ca81901 Merge pull request #717 from allenai/shanea/try-load-latest-save-2
46f06cb Merge pull request #712 from allenai/ot-fix-oe-eval-bpb
v0.5.0
What's new
- Fixed conversion to HuggingFace model for DDP-trained models.
- Added support for remote source and destination for HuggingFace model conversion.
Added 🎉
- Added support for document masking via flash-attn during training with `--data.generate_doc_lengths`.
- Added config options for `model.norm_after`, `model.scale_emb_init`, and `auxiliary_loss_multiplier` (used with zloss).
- Added scripts for running experiments on qk_norm, norm reordering, and zloss.
- Added `model.rope_theta` configuration option.
- Added `model.embedding_layer_norm` configuration option for adding a LN to the embeddings.
- Added `model.emb_init_std` configuration option to override the standard deviation used to initialize the embeddings.
- Added downstream eval task for requests dumped from oe-eval tasks
- Added `CosLinearEnvelope` scheduler, which is a pointwise product of a cosine schedule and a linear decay.
- Added ability to save outputs of submodules for debugging purposes.
- Version dolma flan change in named_data_mix.py
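The `CosLinearEnvelope` entry above describes the schedule as a pointwise product of a cosine schedule and a linear decay. A minimal sketch of that product follows; it leaves out warmup, and the function name and the `min_lr` parameter are assumptions for illustration rather than the scheduler's actual interface.

```python
import math


def cos_linear_envelope(step: int, max_steps: int, peak_lr: float, min_lr: float = 0.0) -> float:
    """Cosine decay multiplied pointwise by a linear decay (no warmup).

    Hypothetical helper: both factors fall from 1 to 0 over max_steps, so
    their product decays faster than either schedule alone.
    """
    if step >= max_steps:
        return min_lr
    progress = step / max_steps
    cosine = (1 + math.cos(math.pi * progress)) / 2  # falls from 1 to 0
    linear = 1 - progress                            # falls from 1 to 0
    return min_lr + (peak_lr - min_lr) * cosine * linear


# Sample the schedule at the start, quarter points, and end.
for step in (0, 250, 500, 750, 1000):
    print(step, cos_linear_envelope(step, max_steps=1000, peak_lr=3e-4))
```

Because the linear factor is at most 1, the result never exceeds the plain cosine schedule at the same step.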
Changed ⚠️
- Changed default distributed training strategy from single-GPU to FSDP
- Fixed behavior of `effective_memmap_dtype` to prevent unrecognized dtypes from being parsed as `uint16`.
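The `effective_memmap_dtype` fix above amounts to failing loudly on an unknown dtype name instead of silently falling back to a default. A minimal sketch of that pattern, using a hypothetical allow-list and helper name (the dtypes OLMo actually accepts may differ):

```python
# Hypothetical allow-list mapping dtype names to bytes per token ID.
ITEM_SIZES = {"uint8": 1, "uint16": 2, "uint32": 4, "uint64": 8}


def memmap_item_size(dtype_name: str) -> int:
    """Return bytes per token ID for a recognized dtype name, else raise."""
    try:
        return ITEM_SIZES[dtype_name]
    except KeyError:
        raise ValueError(
            f"Unrecognized memmap dtype {dtype_name!r}; expected one of {sorted(ITEM_SIZES)}"
        ) from None
```

A silent fallback is risky for memory-mapped token files in particular: reading `uint32` data as `uint16` does not crash, it just yields twice as many wrong token IDs.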
Fixed ✅
- Fixed restarting a training run in later epochs so that we no longer need to set the flag `--epoch=INT`.
- Swapped in correct flan data mix.
- Fix bug where the attention norm, when applied before the attention block, was modifying the residual stream.
- Fixed `OLMo.from_checkpoint()` so that it correctly loads `olmo_core` and `torch_new` style checkpoints.
- Fixed `preserve_rng_state` being incorrectly set to False when doing gradient checkpointing with dropout
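The attention-norm fix in the list above concerns a class of bug that is easy to show on a toy example: in a pre-norm block, the normalized activations should feed the attention function, while the residual connection should still add the original input. The sketch below is not the OLMo code; it uses plain lists, an RMS norm as a stand-in for the layer norm, and hypothetical function names.

```python
from typing import Callable, List

Vector = List[float]


def rms_norm(x: Vector, eps: float = 1e-6) -> Vector:
    """Scale x so that its root-mean-square is approximately 1."""
    scale = (sum(v * v for v in x) / len(x) + eps) ** 0.5
    return [v / scale for v in x]


def block_buggy(x: Vector, attn: Callable[[Vector], Vector]) -> Vector:
    """Overwrites x with its normalized version, so the residual is normalized too."""
    x = rms_norm(x)
    return [r + a for r, a in zip(x, attn(x))]


def block_fixed(x: Vector, attn: Callable[[Vector], Vector]) -> Vector:
    """Normalizes only the attention input; the residual stream stays untouched."""
    h = rms_norm(x)
    return [r + a for r, a in zip(x, attn(h))]
```

With an attention function that returns zeros, `block_fixed` passes its input through unchanged, while `block_buggy` returns the normalized input, which makes the difference easy to check in a unit test.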
Commits
cee1a5d Merge pull request #710 from allenai/version-dolma-flan-change
213a639 Merge pull request #711 from allenai/epwalsh/fix-unbound-qkv
4575d40 Fix Conversion Issues + add support for remote upload. (#694)
78d79a5 Merge pull request #709 from allenai/shanea/debugging-docs
9147889 Merge pull request #685 from allenai/ot-oe-eval-requests
6cdc4cc Merge pull request #698 from allenai/shanea/compare-model-state
e5217cf Merge pull request #705 from allenai/dave/checkpoint_style_naming
f4b386e Merge pull request #704 from allenai/shanea/fix-olmo-1.7-batch-size
1e71ce3 Merge pull request #547 from allenai/shanea/add-olmo-1.7-7b-to-readme
6c4d53f Merge pull request #702 from chrisc36/main
0bc7f6c Merge pull request #690 from allenai/shanea/trace-model-outputs-2
4332c32 Merge pull request #691 from allenai/dave/cosine_linear_envelope
6587ddb Merge pull request #674 from allenai/dave/flan_data_mix
7d63fe0 Merge pull request #671 from allenai/s3_unshard_to_hf
c322b9a Merge pull request #686 from allenai/fix-from-checkpoint
c482df7 Merge pull request #680 from allenai/shanea/fix-incorrect-attn-norm
3e30710 Merge pull request #629 from allenai/epwalsh/amberish
4e00460 Add support for document masking during training (#661)
b45002e make epoch logging less confusing
1b7d275 Fix restarts in later epochs (#670)
345edc6 Merge branch 'main' of https://github.com/allenai/LLM
66d2be7 Revert "Update Beaker image"
0757223 Merge pull request #649 from allenai/ModelLadder
90b3889 Merge pull request #660 from allenai/fix_convert_olmo_to_hf
dfb7212 Merge pull request #616 from allenai/chameleon
d627c94 Merge pull request #665 from allenai/ddp-ckpt-fix
ab63296 Improving memmap type parser (#663)
b55fb5f Merge pull request #662 from allenai/tiny-olmo-config-fix
56d1fe0 Merge pull request #657 from allenai/shanea/lumi-torch2.3-3
26c2d53 Merge pull request #648 from allenai/shanea/default-fsdp-strategy
65f1fff Merge pull request #656 from jeqcho/patch-1
20b82f8 Merge pull request #653 from allenai/shanea/olmo-v0.4.0
v0.4.0
What's new
Added 🎉
- Added clipping fix to `Optimizer` class to make it work with FSDP `no_shard` and DDP.
- Added tests to compare grad norm differences between torch optimizer and clipping and OLMo optimizer and clipping on both CPU and GPU.
- Expose memmap dtype in data config
- Added support for DDP training.
- Added caching to disk of HF datasets used in downstream evals
- Added FLOPs logging
- Added configs for OLMo tiny set of models
- Added configuration field `optimizer.record_update_metrics`, which defaults to `False`, but when set to `True` will trigger AdamW to collect the step size norm and absolute max for each parameter.
- Added configuration field `optimizer.selective_updates`, which defaults to `False`, but when set to `True` will tell the optimizer to skip updating the parameter and state when the corresponding gradient is 0.
- Added `olmo_data`, a package holding data files like tokenizers.
- Added ability to load tokenizers from `olmo_data` package data.
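To make the `optimizer.selective_updates` and `optimizer.record_update_metrics` options above more concrete, here is a minimal sketch of an AdamW-style step on plain Python lists. It treats each list element as its own parameter, whereas the real optimizer works on tensors; the function name, signature, and metric keys are assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple


def adamw_step(
    params: List[float],
    grads: List[float],
    exp_avg: List[float],
    exp_avg_sq: List[float],
    step: int,  # 1-indexed optimizer step, used for bias correction
    lr: float = 1e-3,
    betas: Tuple[float, float] = (0.9, 0.999),
    eps: float = 1e-8,
    weight_decay: float = 0.0,
    selective_updates: bool = False,
    record_update_metrics: bool = False,
) -> Dict[str, float]:
    """Apply one in-place AdamW-style update; optionally skip zero gradients."""
    beta1, beta2 = betas
    bias1 = 1 - beta1 ** step
    bias2 = 1 - beta2 ** step
    sq_sum, abs_max = 0.0, 0.0
    for i, g in enumerate(grads):
        if selective_updates and g == 0.0:
            continue  # leave both the parameter and its optimizer state untouched
        exp_avg[i] = beta1 * exp_avg[i] + (1 - beta1) * g
        exp_avg_sq[i] = beta2 * exp_avg_sq[i] + (1 - beta2) * g * g
        denom = math.sqrt(exp_avg_sq[i] / bias2) + eps
        update = -lr * (exp_avg[i] / bias1 / denom + weight_decay * params[i])
        params[i] += update
        sq_sum += update * update
        abs_max = max(abs_max, abs(update))
    if record_update_metrics:
        return {"step_norm": math.sqrt(sq_sum), "step_abs_max": abs_max}
    return {}
```

Without the skip, an entry with a zero gradient still moves, because the momentum term from earlier steps is nonzero and weight decay still applies. That is the behavior `selective_updates` is described as turning off.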
Changed ⚠️
- Added original legacy unsharding implementation back, as the default. The new shared memory implementation can be used by passing `use_legacy_shared_mem_impl` to `unshard.py`.
- Refactor weight initialization. IMPORTANT: this does not maintain backwards-compatibility with older configs; the jobs will still run, but may produce different outputs.
- Changed the behavior of the Lion optimizer to only record the update cosine similarity when `optimizer.record_update_metrics` is `True` in order to be consistent with the API.
- Added HF datasets into `olmo_data`, and changed downstream eval to load from the package.
Fixed ✅
- Changed from `ignored_index` to `ignore_index` for `cross_entropy_loss` when `flash-attn>=2.5.8`.
- Make `hf_olmo` support `AutoModelForCausalLM` and similar HF methods again.
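The `ignored_index` to `ignore_index` rename above is gated on the installed flash-attn version. One way to sketch such a gate without extra dependencies is to compare numeric version tuples; the helper names below are hypothetical, and the way OLMo actually detects the version may differ.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric components, e.g. '2.5.8.post1' -> (2, 5, 8)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def ignore_index_kwarg(flash_attn_version: str) -> str:
    """Pick the keyword name for cross_entropy_loss given a flash-attn version."""
    if parse_version(flash_attn_version) >= (2, 5, 8):
        return "ignore_index"
    return "ignored_index"
```

Comparing tuples of integers avoids the usual string-comparison trap, where `"2.10.0"` sorts before `"2.5.8"`.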
Commits
d423c11 Merge pull request #652 from allenai/shanea/update-to-torch2.3
b10ab4b Merge pull request #651 from allenai/shanea/lumi-torch2.3-2
a101b31 Merge pull request #646 from allenai/shanea/hf-datasets-from-package
429a752 Merge pull request #647 from allenai/shanea/fix-tokenizer-break
bc60b8a Add option to skip optim steps for 0 grad params (#636)
cbc7c25 Merge pull request #645 from allenai/shanea/tokenizer-package-data
1b2658b Add option to record step size metrics from AdamW (#605)
a3e2ea7 multiple epoch fix
a1f118a Merge pull request #628 from allenai/olmo-tiny
d7994c8 Fix Z-loss calculation (#634)
a5539f4 Merge pull request #631 from allenai/shanea/hf-olmo-auto-model
d72a262 Merge pull request #626 from allenai/shanea/inspect-train-data-improvements
2417b11 Make olmo-core checkpointer more robust on weka (#624)
ddc8847 Merge pull request #612 from allenai/ddp
41ed20a Merge pull request #623 from allenai/shanea/hf-save-to-disk-2
a33caa9 Merge pull request #604 from allenai/WandbDiff
e5d63a3 Merge pull request #619 from allenai/shanea/add-olmo-1.7-7b-checkpoints
e207df7 Officially add OLMo-core as a dependency (#615)
72159ae Merge pull request #614 from allenai/shanea/pass-include-instance-metadata
c2cedbc Merge pull request #607 from allenai/rewrite-init
578234d Merge pull request #611 from allenai/shanea/hf-get-tokenizer-from-config-2
de43ee8 Merge pull request #610 from allenai/shanea/hf-get-tokenizer-from-config
2639279 Merge pull request #594 from NeuralFabricAI/lx/expose-data-dtype
9e89408 Create sensible filenames
02a8a58 Merge pull request #603 from allenai/shanea/unshard-without-passing-type
ae84d47 Merge pull request #602 from allenai/no_shard_ddp_clip
40210bb Merge pull request #599 from allenai/train-olmo-large
55c1e2f Merge pull request #601 from allenai/no_shard_ddp_clip
5789cfe Merge pull request #593 from allenai/shanea/inspect-train-data-no-indices
eafd154 Merge pull request #579 from MLgdg/main
652c745 Merge pull request #590 from allenai/shanea/update-readme-to-olmo-1.7
8ec2809 Merge pull request #589 from allenai/shanea/update-main-readme-hf
6e714b8 Merge pull request #588 from allenai/shanea/hf-olmo-docs-auto-methods
65d5575 Merge pull request #587 from allenai/shanea/storage-cleaner-improvemnts
0bddfe0 Merge pull request #585 from allenai/shanea/add-hf-docs
e6430a0 Merge pull request #582 from allenai/shanea/hybrid-shard-as-no-shard
c29787a Merge pull request #569 from allenai/Muennighoff/fix-torchv
7a462c5 Merge pull request #580 from allenai/shanea/update-ignore-index-kwarg
4f917fb Merge pull request #575 from allenai/shanea/add-weka
5c721cc Fix GPU tests CI (#574)
467adcc Merge remote-tracking branch 'origin/train-olmo-large'
4b2d12e Merge pull request #565 from allenai/readme
ccc49fd Merge pull request #564 from allenai/shanea/add-new-hf-converter
b17abd0 Merge pull request #512 from liaoleo/main
295d309 Merge pull request #561 from allenai/shanea/delay-device-mesh-import
4e8746d Merge pull request #562 from allenai/shanea/re-add-easy-legacy-unshard-impl
f38de95 Merge pull request #558 from allenai/shanea/release-v0.3.0
829f1d6 Merge pull request #520 from allenai/add-ce-loss-metric
v0.3.0
What's new
Added 🎉
- Added support for Grouped Query Attention.
- Added commonsense_qa and social_iqa downstream evaluation tasks
- Makes it possible to read from http/https the same way we read from s3/r2.
- Added MMLU multiple choice (A/B/C/D) 5-shot variant downstream tasks
- Tokenizer patch
- Added option to specify number of model replicas when using hybrid sharding.
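For readers unfamiliar with Grouped Query Attention, mentioned in the list above, the core bookkeeping is that several query heads share one key/value head. A minimal sketch of that mapping, assuming the common convention of contiguous groups (the function name is hypothetical, and the source does not state which grouping OLMo uses):

```python
def kv_head_for_query_head(q_head: int, n_heads: int, n_kv_heads: int) -> int:
    """Map a query head index to the key/value head it shares.

    Assumes contiguous grouping: with 8 query heads and 2 KV heads,
    query heads 0-3 use KV head 0 and query heads 4-7 use KV head 1.
    """
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be divisible by n_kv_heads")
    if not 0 <= q_head < n_heads:
        raise ValueError("q_head out of range")
    group_size = n_heads // n_kv_heads
    return q_head // group_size
```

Setting `n_kv_heads` equal to `n_heads` recovers standard multi-head attention, and setting it to 1 recovers multi-query attention.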
Changed ⚠️
- Rename `Olmo` to `OLMo` everywhere in the codebase
- Disabled automatic garbage collection during training; instead, we run it manually at regular intervals to avoid ranks getting out-of-sync with their own gc.
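The garbage-collection change above can be sketched with the standard `gc` module: disable automatic collection, then collect at fixed step boundaries so that every rank pauses at the same point. The loop below is a stand-in for a real training loop, and the `gc_interval` parameter name is an assumption for illustration.

```python
import gc


def train(num_steps: int, gc_interval: int = 100) -> int:
    """Run a dummy loop, collecting garbage only every gc_interval steps.

    Returns the number of manual collections that were triggered.
    """
    gc.disable()  # stop the interpreter from collecting at arbitrary points
    collections = 0
    try:
        for step in range(1, num_steps + 1):
            # ... forward pass, backward pass, and optimizer step go here ...
            if step % gc_interval == 0:
                gc.collect()
                collections += 1
    finally:
        gc.enable()  # restore the default behavior when training ends
    return collections
```

In a distributed job, a rank that happens to hit an automatic collection stalls while the others wait for it at the next collective operation; tying collection to the step counter keeps those pauses aligned across ranks.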
Removed 👋
- Removed `AMDLayerNorm`, since the original layer norm bug has been fixed and we don't need this workaround anymore.
- Removed `OLMoParallelBlock`.
Fixed ✅
- Don't log garbage on nodes that aren't rank 0
- Don't crash in the HF code when we are referring to a tokenizer in a local file
- Point official training scripts to publicly available URLs
- Corrected the `resize_token_embeddings` method in the `OLMoForCausalLM` class to properly update the token embeddings when resizing the vocabulary.
- Changed `tie_weights` method to a no-op as weight tying is handled in olmo/model.py
- Fixed the size calculation for qk layer norm
- Fixed pipeline test failure that occurs due to a bug in transformers version 4.39.1
- Make `hf_olmo` compatible with transformers versions >=4.40.0
Commits
3b16e21 Merge pull request #556 from allenai/shanea/make-hf-olmo-support-new-transformers
ccf7bf0 Merge pull request #555 from allenai/shanea/wandb-cancel-failure-bypass
7be71cd use correct PG when collecting metrics with HYBRID shard (#551)
06786a7 Merge pull request #548 from allenai/shanea/fix-olmo-name-hf
4ed135e Merge pull request #540 from allenai/shanea/hybrid-sharding-num-groups-2
2eae988 Merge pull request #546 from allenai/shanea/add-olmo-1.7-7b-checkpoints
d2afcaa Add cfg option --scheduler.warmup_min_lr (#542)
9d40898 Merge pull request #537 from allenai/AkshitaB-tokenizer-patch
62c7954 Merge pull request #536 from allenai/shanea/storage-cleaner-wandb-path-from-checkpoint
657a55e Merge pull request #494 from allenai/shanea/storage-cleaner-move-entry
9a0a84a Merge pull request #527 from allenai/PublicTrainingData
0de5fdc Merge pull request #501 from djliden/dl/fix-embedding-resize
4792f94 Adds a new experimental sharded checkpointer from OLMo-core (#532)
1c12980 make garbage collection interval configurable (#533)
db2dee2 Merge pull request #503 from djliden/dl/hf-weight-tying
8fad649 Merge pull request #534 from allenai/shanea/fix-transformer-cache-position-regression
71f7014 Merge pull request #528 from allenai/add-mmlu-mc-5shot
8472d0b Merge pull request #521 from allenai/davidbrandfonbrener-patch-1
194012a Merge pull request #523 from allenai/davidbrandfonbrener-patch-2
8949bd8 Added deprecation for memmap (#517)
83cc8b1 Merge pull request #464 from allenai/olmo7-ablations
f8aef84 Merge pull request #509 from allenai/epwalsh/manual-gc
0ac82a9 Merge pull request #508 from allenai/RunDataloader
74de51d Merge pull request #414 from allenai/mitchish65-2
417af0e Merge pull request #504 from allenai/add-csqa-siqa
666da70 Patch other S3 methods with 404 detection fix
0b6e28c Fix checking HTTP status code for boto3 responses
0b835a8 Merge pull request #500 from allenai/shanea/expose-official-checkpoints
50da7a4 Add work-arounds for new-style checkpointing issues
6d42d7a Fix hang when training is canceled
7eb7f3d Merge pull request #455 from gahdritz/main
ed47c29 Merge pull request #453 from hxdtest/only_rank0_log_metrics
ad8198e Merge pull request #495 from allenai/add-basic-math
1511fed Merge pull request #487 from allenai/fix-mmlu-prompt-bug
c2840e4 Merge pull request #493 from allenai/shanea/storage-cleaner-move-improvements
658f7cc Merge pull request #466 from allenai/rename
eb5b2da Merge pull request #490 from allenai/RemoveAMDLN
752353b Merge pull request #488 from allenai/shanea/optimize-unsharding-2
v0.2.5
What's new
Fixed ✅
- Fixed default value of `--tokenizer` argument to `scripts/prepare_tulu_data.py` to be an absolute path rather than a relative path, so the script can be run from other directories.
- Added the option to directly pass input embeddings to `OLMo` and `OLMoForCausalLM`.
- Added support for Python 3.8.
- Added code to throw an error if `output_attentions` is set to `True` in forward call to `OLMoForCausalLM`. This functionality hasn't been implemented yet.
- Fixed running with data loading workers on LUMI
Added 🎉
- Added `output_hidden_states` argument and associated functionality to `OLMo` and `OLMoForCausalLM` to return model intermediate hidden states.
- Added MMLU downstream evaluation tasks, with prompt variations.
- Added support for PyTorch v2.2.
- Added ability to show logs from all ranks
- Added option for QKV clipping.
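QKV clipping, the last item above, generally means clamping the query, key, and value projections to a symmetric range before attention. The sketch below shows the clamp on a plain list of floats; the function name and the use of `None` to mean "clipping disabled" are assumptions for illustration, and the real option operates on tensors.

```python
from typing import List, Optional


def clamp_projection(values: List[float], clip: Optional[float]) -> List[float]:
    """Clamp each value to [-clip, clip]; return a copy unchanged if clip is None."""
    if clip is None:
        return list(values)
    if clip <= 0:
        raise ValueError("clip must be positive")
    return [max(-clip, min(clip, v)) for v in values]


# Example: values outside [-8, 8] are pulled back to the boundary.
print(clamp_projection([-10.0, 0.5, 9.0], clip=8.0))
```

Handling the disabled case explicitly keeps the call site free of conditionals, so the same code path runs whether or not clipping is configured.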
Changed ⚠️
- Refactor torch.load monkey patching for legacy checkpoint unsharding in anticipation of unsharding implementation change.
Commits
c499632 Add option for QKV clipping (#489)
31d8528 Pull checkpoint patch from mitchish-gqa-2
03d7643 Merge pull request #486 from allenai/shanea/monkey-patch-ctx-manager
fd3a57b Merge pull request #483 from allenai/shanea/storage-cleaner-unshard-improvements
1d264e4 Merge pull request #481 from allenai/WorkersOnLumi
70ad30c Merge pull request #480 from allenai/Firehose
493c0b8 Add MMLU prompt variants (#484)
cb711e2 Add support for PyTorch v2.2 (#476)
67d24f5 Merge pull request #468 from allenai/mmlu-downstream
0c58bee Fix bug when clipping is disabled
922db6a Only run the profiler through a single cycle (#463)
37ca789 Merge pull request #462 from allenai/epwalsh/fsdp-wrap-patch
cc36709 Add attn bias arg to HF wrapper (#458)
7f7abbb Merge pull request #451 from sarahwie/main
9fd9130 Add support for Python 3.8 (#448)
d9c0993 Require Python>=3.9 for now
97296e6 Merge pull request #442 from allenai/shanea/add-input-embedding-arg
3be4c1e add link to W&B logs for 1B run
d7d4de4 Add link to OLMo-7B-Twin-2T W&B logs
cf12108 Update README.md (#429)
15af668 freeze official configs for reproductions (#421)
7739fe1 Add link to W&B logs for OLMo-7B
80db5e3 Fix default value of --tokenizer
6765317 Add link to paper in README badge
v0.2.4
What's new
Fixed ✅
- Fixed an issue with the HuggingFace integration where we were inadvertently using a feature that was introduced in Python 3.10, causing an error for older Python versions.
Commits
8a3f2d8 Fix HF integration for Python < 3.10 (#426)
49c8647 Use temp branding GIF for logo (for now) (#419)
v0.2.3
What's new
Commits
98c115c Bump version to v0.2.3 for release
0e53b33 specify dependencies in pyproject.toml (#418)
18e5dad update PyPI release process
141cc94 Merge pull request #415 from allenai/readme-inf
2587240 Merge pull request #417 from allenai/Muennighoff/ckpt
a5a01a2 Merge pull request #416 from allenai/nol_rdme
98425a5 Merge pull request #413 from allenai/shanea/storage-cleaner-s3-upload-cleanup
3053bfa Update install instructions in README
f36ac42 Merge pull request #410 from allenai/epwalsh/fine-tune-with-label-masking
dcae8e8 Merge pull request #411 from allenai/epwalsh/lr-schedule-tokens
45ed078 Add more mcli configs
905359e fix bug with saving unsharded checkpoint
3e3df71 Merge pull request #409 from allenai/epwalsh/tulu-fine-tune
a2e1d13 Merge pull request #368 from allenai/mitchish-lumi
5a735dd Merge pull request #350 from allenai/mitchish
df19554 Merge pull request #388 from allenai/mitchish65
23eb949 Train a few steps after time limit reached (#362)
ac1aee1 Merge pull request #408 from allenai/NixLogz
6da42cf ensure we save checkpoint at end of loop
568a3d8 Merge pull request #406 from allenai/hf-olmo-loading
3c51402 Merge pull request #407 from allenai/shanea/storage-cleaner-avoid-redundant-copy
53217d2 Merge pull request #405 from allenai/shanea/storage-cleaner-fix-upload-path
5eb26aa Merge pull request #404 from allenai/shanea/storage-cleaner-minor-fixes
87ed747 backwards compat fix
1c13e5f Merge pull request #403 from allenai/shanea/storage-cleaner-fix-max-archive-size
685d11b Merge pull request #400 from allenai/shanea/storage-cleaner-wandb
5bdccc3 Merge pull request #402 from allenai/shanea/storage-cleaner-is-run-improvement
75d6738 Merge pull request #401 from allenai/shanea/storage-cleaner-is-file-no-key
0475f3a Make logo a little smaller
1184050 Add logo to README
e2d77c4 Ephemeral checkpoints (#397)
6f2abfb Merge pull request #399 from allenai/shane/storage-cleaner-fix-s3-upload
f8beb5b Merge pull request #398 from allenai/shanea/storage-cleaner-move-run
185d7e2 Move remaining top-level mkd docs into docs folder (#395)
5d03d38 Merge pull request #396 from allenai/shanea/storage-cleaner-delete-temp-files
fe49693 Merge pull request #382 from allenai/shanea/storage-cleaner-unsharding-legacy
1ede949 Merge pull request #381 from allenai/shanea/storage-cleaner-unsharding-2
9cc7154 update some links to new repo (#394)