
Update moe.md (huggingface#1961)
typo :)
gante authored Apr 4, 2024
1 parent 08bb67c commit 8268373
Showing 1 changed file with 1 addition and 1 deletion.
moe.md (2 changes: 1 addition & 1 deletion)
@@ -227,7 +227,7 @@ More experts lead to improved sample efficiency and faster speedup, but these ar

## Fine-tuning MoEs

- > Mixtral is supported with version 4.36.0 of transformers. You can install it with `pip install "transformers==4.36.0 --upgrade`
+ > Mixtral is supported with version 4.36.0 of transformers. You can install it with `pip install transformers==4.36.0 --upgrade`
The overfitting dynamics are very different between dense and sparse models. Sparse models are more prone to overfitting, so we can explore higher regularization (e.g. dropout) within the experts themselves (e.g. we can have one dropout rate for the dense layers and another, higher, dropout for the sparse layers).
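
Below is a minimal, hypothetical sketch (not part of this commit or the original post) of the idea in the quoted passage: using a lower dropout rate in dense feed-forward layers and a higher one inside the experts of a sparse model. The class and parameter names (`DenseFFN`, `ExpertFFN`, `dense_dropout`, `expert_dropout`) are illustrative assumptions, not transformers APIs.

```python
# Hypothetical sketch: separate dropout rates for dense vs. expert (sparse) layers.
import torch.nn as nn

class DenseFFN(nn.Module):
    """Standard feed-forward block with a modest dropout rate."""
    def __init__(self, d_model: int, d_ff: int, dense_dropout: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Dropout(dense_dropout),   # lower dropout for dense layers
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x):
        return self.net(x)

class ExpertFFN(nn.Module):
    """Expert feed-forward block with higher dropout for extra regularization."""
    def __init__(self, d_model: int, d_ff: int, expert_dropout: float = 0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Dropout(expert_dropout),  # higher dropout inside the experts
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x):
        return self.net(x)
```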

