Update colab examples #86

Merged
merged 7 commits on Aug 22, 2024
Changes from 4 commits
8 changes: 6 additions & 2 deletions examples/language-modeling/gemma_tuning.ipynb
@@ -249,7 +249,7 @@
"metadata": {},
"outputs": [],
"source": [
"from optimum.tpu import AutoModelForCausalLM\n",
"from transformers import AutoModelForCausalLM\n",
"model = AutoModelForCausalLM.from_pretrained(model_id, use_cache=False)"
]
},
@@ -297,7 +297,11 @@
"from transformers import TrainingArguments\n",
"\n",
"# Set up the FSDP arguments\n",
"fsdp_training_args = fsdp_v2.get_fsdp_training_args(model)\n",
"cls_to_wrap = \"GemmaDecoderLayer\"\n",
"fsdp_training_args = {\n",
" \"fsdp\": \"full_shard\",\n",
" \"fsdp_config\": fsdp_v2.get_fsdp_config(cls_to_wrap),\n",
"}\n",
Collaborator: Well, that was the point of using get_fsdp_training_args: you do not need to know which classes to wrap on supported models. I would revert this bit.

Contributor Author: Thanks for the quick review! get_fsdp_training_args accepts only the optimum.tpu model class, not the transformers one. I updated the get_fsdp_training_args function, so it should now work.

"\n",
"# Set up the trainer\n",
"trainer = SFTTrainer(\n",
9 changes: 6 additions & 3 deletions examples/language-modeling/llama_tuning.md
@@ -47,8 +47,7 @@ Then, the tokenizer and model need to be loaded. We will choose [`meta-llama/Met

```python
import torch
-from transformers import AutoTokenizer
-from optimum.tpu import AutoModelForCausalLM
+from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
@@ -69,7 +68,11 @@ data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)
You then need to specify the FSDP training arguments to enable the sharding feature; the function will deduce the classes that should be sharded:

```python
-fsdp_training_args = fsdp_v2.get_fsdp_training_args(model)
+cls_to_wrap = "LlamaDecoderLayer"
+fsdp_training_args = {
+    "fsdp": "full_shard",
+    "fsdp_config": fsdp_v2.get_fsdp_config(cls_to_wrap),
+}
wenxindongwork marked this conversation as resolved.
```

Now training can be done as simply as using the standard `Trainer` class:
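
The cell that follows is collapsed in this diff, so as a hedged sketch only, standard Trainer usage with the FSDP arguments defined above could look like the following; the hyperparameters, output path, and "train" split name are placeholder assumptions, not values from the tutorial.

```python
# Hedged sketch of a standard Trainer setup reusing fsdp_training_args from above;
# the concrete values below are placeholders, not the tutorial's actual settings.
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token (assumption)

trainer = Trainer(
    model=model,
    train_dataset=data["train"],          # assuming data is a DatasetDict with a "train" split
    args=TrainingArguments(
        per_device_train_batch_size=8,    # placeholder value
        num_train_epochs=1,               # placeholder value
        output_dir="./llama-tuned",       # placeholder path
        optim="adafactor",
        dataloader_drop_last=True,        # keeps batch shapes static for XLA sharding (assumption)
        **fsdp_training_args,             # fsdp="full_shard" + fsdp_config from above
    ),
    # mlm=False makes the collator copy input_ids into labels for causal LM training
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```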