
Audience For This Repo #51

Open · hamelsmu opened this issue Apr 22, 2024 · 3 comments

Comments

hamelsmu (Contributor) commented Apr 22, 2024

Carrying over a discussion with @mwaskom from this thread:

  • I think this repo is pretty difficult to reason about if you aren't familiar with axolotl. What are these configs? How do they work? How are my prompts assembled, exactly? What does the dataset format need to be? Are there other dataset formats? How do I check the prompt construction? And so on. I had actually been assuming that the user is indeed familiar with axolotl. (One way to check prompt construction is sketched below.)
  • If you are very familiar with axolotl, this --data flag was really confusing to me, because a key parameter in my config that I am used to setting is completely ignored in favor of an extra layer of indirection. I got stuck on this personally as an experienced axolotl user, which is why I felt the need to provide these two caveats.

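For what it's worth, the "how do I check the prompt construction" question has a quick answer once you know where axolotl puts things: decode a row of the tokenized dataset it writes out. A minimal sketch, assuming a preprocessing run has already saved its output under the default dataset_prepared_path (./last_run_prepared, which axolotl fills with a hashed subdirectory) and that the tokenizer matches the config's base model; both paths and the model name here are assumptions:

```python
# Sketch: decode one row of the tokenized dataset axolotl writes out, to see
# exactly how the prompt was assembled. Paths and model name are assumptions;
# axolotl saves to a hashed subdirectory under dataset_prepared_path.
import glob

from datasets import load_from_disk
from transformers import AutoTokenizer

prepared_dir = sorted(glob.glob("last_run_prepared/*"))[0]  # pick the hashed subdir
dataset = load_from_disk(prepared_dir)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# The fully assembled prompt, special tokens included:
print(tokenizer.decode(dataset[0]["input_ids"]))
```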
cc: @charlesfrye @winglian, curious what you think.

Originally posted by @hamelsmu in #48 (comment)

hamelsmu changed the title from "My observations" to "Audience For This Repo" on Apr 22, 2024
JUNIORCO commented May 1, 2024

To add to this:

  • It would be great to add a Llama 3 example config; here's mine:
```yaml
base_model: meta-llama/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: true   # 4-bit base weights for QLoRA
strict: false

# note I have my own dataset here that isn't part of the examples
datasets:
  - path: train.jsonl
    type: sharegpt
dataset_prepared_path:
val_set_size: 0
output_dir: ./out/qlora-llama3-70b

adapter: qlora
lora_model_dir:

sequence_len: 512
sample_packing: false
pad_to_sequence_len: true

# LoRA hyperparameters
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 2
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.00001

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
special_tokens:
  eos_token: "<|im_end|>"
  pad_token: "<|end_of_text|>"
tokens:
  - "<|im_start|>"
```
  • A conversational dataset example would be nice for folks coming from the OpenAI fine-tuning world, since OpenAI forces a dataset of this format. What was a bit confusing is that there's no axolotl dataset format that matches the OpenAI format exactly, so I had to modify my dataset slightly to fit the sharegpt type (a sketch of that conversion follows this list).
  • A bit more effort on inference: make it a POST request that exposes an OpenAI-compatible endpoint, like this. That's what a lot of folks are interested in doing, IMO (a client-side sketch is below as well).
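To make that first point concrete, the conversion I mean is roughly this; a minimal sketch, assuming an OpenAI-style JSONL with a top-level messages list (the file names are placeholders):

```python
# Sketch: convert OpenAI fine-tuning records ({"messages": [{"role", "content"}]})
# into axolotl's sharegpt format ({"conversations": [{"from", "value"}]}).
# File names are placeholders.
import json

ROLE_TO_SHAREGPT = {"system": "system", "user": "human", "assistant": "gpt"}

with open("openai_train.jsonl") as src, open("train.jsonl", "w") as dst:
    for line in src:
        messages = json.loads(line)["messages"]
        conversations = [
            {"from": ROLE_TO_SHAREGPT[m["role"]], "value": m["content"]}
            for m in messages
        ]
        dst.write(json.dumps({"conversations": conversations}) + "\n")
```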

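And on the inference point, the client side I have in mind would look something like this; a sketch assuming a server that exposes an OpenAI-compatible /v1/chat/completions route (the URL and model name are hypothetical, not part of this repo):

```python
# Sketch: hit a self-hosted, OpenAI-compatible endpoint with the official client.
# The base_url and model name are placeholders, not part of this repo.
from openai import OpenAI

client = OpenAI(
    base_url="https://my-inference-app.example.com/v1",  # hypothetical deployment URL
    api_key="unused",  # many self-hosted servers ignore the key, but the client requires one
)

response = client.chat.completions.create(
    model="qlora-llama3-8b",  # whatever name the server registers for the adapter
    messages=[{"role": "user", "content": "Write a SQL query that counts users by country."}],
)
print(response.choices[0].message.content)
```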
Happy to make a PR.

shamikbose commented

> If you are very familiar with axolotl, this --data flag was really confusing to me, because a key parameter in my config that I am used to setting is completely ignored in favor of an extra layer of indirection. I got stuck on this personally as an experienced axolotl user, which is why I felt the need to provide these two caveats.

@hamelsmu Even as a newcomer to axolotl, the discrepancy between the data flags in the two frameworks is really confusing to me. It would be helpful to have a guide describing, as a start, the difference between how the two frameworks use these flags.

devanshrj commented

Agree with @JUNIORCO. It would be great to have a conversational dataset example that works with a model like Llama3-8B-Instruct. I made a few attempts based on axolotl's example config and the example configs provided in this repo, but none seem to produce Llama3-8B-Instruct's expected prompt format.
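For anyone stuck on the same thing, the target format is easy to inspect directly from the tokenizer's chat template; a minimal sketch (assuming access to the gated HF model repo):

```python
# Sketch: print the exact prompt string Llama3-8B-Instruct expects, straight
# from its chat template (requires access to the gated HF repo).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
prompt = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)  # shows the <|begin_of_text|><|start_header_id|>... structure
```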

It would also be great to have more details about the Docker container and the axolotl version used by Modal.
