Audience For This Repo #51
Comments
To add to this:
See config
base_model: meta-llama/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: true
strict: false
# note I have my own dataset here that isn't part of the examples
datasets:
  - path: train.jsonl
    type: sharegpt
dataset_prepared_path:
val_set_size: 0
output_dir: ./out/qlora-llama3-70b
adapter: qlora
lora_model_dir:
sequence_len: 512
sample_packing: false
pad_to_sequence_len: true
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 2
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.00001
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
special_tokens:
  eos_token: "<|im_end|>"
  pad_token: "<|end_of_text|>"
tokens:
  - "<|im_start|>"
Happy to make a PR
@hamelsmu Even as a newcomer to axolotl, the discrepancy between the data flags in the two frameworks is really confusing to me. It would be helpful to have a guide describing the difference between how the flags are being used by the different frameworks (as a start)
Agree with @JUNIORCO. It would be great to have a conversational dataset example that works with a model like Llama3-8B-Instruct. I made a few attempts based on axolotl's example config and the example configs provided in this repo, but none seem to work with Llama3-8B-Instruct's format. Additionally, it would also be great to have more details about the docker container and axolotl version used by Modal.
Carrying over discussion with @mwaskom from this thread:
axolotl, this --data flag was really confusing to me, because a key parameter in my config that I am used to using is being completely ignored with an extra layer of indirection. I actually got stuck on this personally as an experienced axolotl user, so I found the need to provide these two caveats.
cc: @charlesfrye @winglian curious what you think
Originally posted by @hamelsmu in #48 (comment)
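To illustrate the kind of indirection described above, here is a minimal sketch in Python. It is not the code used by this repo or by axolotl: the function name apply_data_override and the file name my_data.jsonl are hypothetical, and the config is a plain dict standing in for a parsed YAML file. It only shows how a command-line data argument could silently replace the datasets path from the config, and how a warning would make that visible.

```python
import copy
import warnings
from typing import Optional


def apply_data_override(config: dict, data_path: Optional[str]) -> dict:
    """Return a copy of `config` whose dataset paths are replaced by `data_path`.

    Hypothetical helper for illustration only; it mimics a launcher that
    lets a CLI data argument take precedence over the config file.
    """
    cfg = copy.deepcopy(config)  # leave the caller's config untouched
    if data_path is None:
        return cfg
    for ds in cfg.get("datasets", []):
        if ds.get("path") != data_path:
            # Surface the override instead of ignoring the config value silently.
            warnings.warn(
                f"datasets.path {ds.get('path')!r} from the config is ignored; "
                f"using {data_path!r} from the command line instead"
            )
            ds["path"] = data_path
    return cfg


# A dict standing in for the parsed config shown earlier in this thread.
config = {"datasets": [{"path": "train.jsonl", "type": "sharegpt"}]}
effective = apply_data_override(config, "my_data.jsonl")
print(effective["datasets"][0]["path"])
```

With a warning like this, a user who sets the dataset path in the config would at least see that the value was replaced, rather than discovering it after a training run.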