
Audience For This Repo #51

Open · hamelsmu opened this issue Apr 22, 2024 · 3 comments

Comments

hamelsmu (Contributor) commented Apr 22, 2024

Carrying over a discussion with @mwaskom from this thread:

  • I think this repo is pretty difficult to reason about if you aren't familiar with axolotl. What are these configs? How do they work? How are my prompts assembled, exactly? What does the dataset format need to be? Are there other dataset formats? How do I check the prompt construction? And so on. I had actually been assuming that the user is indeed familiar with axolotl. (One way to check prompt construction is sketched below.)
  • If you are very familiar with axolotl, this --data flag was really confusing to me, because a key parameter in my config that I am used to setting is completely ignored in favor of an extra layer of indirection. I got stuck on this personally as an experienced axolotl user, which is why I felt the need to provide these two caveats.

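For what it's worth, the "how do I check the prompt construction" question has a quick answer once you know where axolotl puts things: decode a row of the tokenized dataset it writes out. A minimal sketch, assuming a preprocessing run has already saved its output under the default dataset_prepared_path (./last_run_prepared, which axolotl fills with a hashed subdirectory) and that the tokenizer matches the config's base model; both paths and the model name here are assumptions:

```python
# Sketch: decode one row of the tokenized dataset axolotl writes out, to see
# exactly how the prompt was assembled. Paths and model name are assumptions;
# axolotl saves to a hashed subdirectory under dataset_prepared_path.
import glob

from datasets import load_from_disk
from transformers import AutoTokenizer

prepared_dir = sorted(glob.glob("last_run_prepared/*"))[0]  # pick the hashed subdir
dataset = load_from_disk(prepared_dir)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# The fully assembled prompt, special tokens included:
print(tokenizer.decode(dataset[0]["input_ids"]))
```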
cc: @charlesfrye @winglian, curious what you think.

Originally posted by @hamelsmu in #48 (comment)

hamelsmu changed the title from "My observations" to "Audience For This Repo" on Apr 22, 2024
JUNIORCO commented May 1, 2024

To add to this:

  • It would be great to add a Llama 3 example config; here's mine:
```yaml
base_model: meta-llama/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: true   # 4-bit base weights for QLoRA
strict: false

# note I have my own dataset here that isn't part of the examples
datasets:
  - path: train.jsonl
    type: sharegpt
dataset_prepared_path:
val_set_size: 0
output_dir: ./out/qlora-llama3-70b

adapter: qlora
lora_model_dir:

sequence_len: 512
sample_packing: false
pad_to_sequence_len: true

# LoRA hyperparameters
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 2
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.00001

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
special_tokens:
  eos_token: "<|im_end|>"
  pad_token: "<|end_of_text|>"
tokens:
  - "<|im_start|>"
```
  • A conversational dataset example would be nice for folks coming from the OpenAI fine-tuning world, since OpenAI forces a dataset of this format. What was a bit confusing is that there's no axolotl dataset format that matches the OpenAI format exactly, so I had to modify my dataset slightly to fit the sharegpt type (a sketch of that conversion follows this list).
  • A bit more effort on inference: make it a POST request that exposes an OpenAI-compatible endpoint, like this. That's what a lot of folks are interested in doing, IMO (a client-side sketch is below as well).
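To make that first point concrete, the conversion I mean is roughly this; a minimal sketch, assuming an OpenAI-style JSONL with a top-level messages list (the file names are placeholders):

```python
# Sketch: convert OpenAI fine-tuning records ({"messages": [{"role", "content"}]})
# into axolotl's sharegpt format ({"conversations": [{"from", "value"}]}).
# File names are placeholders.
import json

ROLE_TO_SHAREGPT = {"system": "system", "user": "human", "assistant": "gpt"}

with open("openai_train.jsonl") as src, open("train.jsonl", "w") as dst:
    for line in src:
        messages = json.loads(line)["messages"]
        conversations = [
            {"from": ROLE_TO_SHAREGPT[m["role"]], "value": m["content"]}
            for m in messages
        ]
        dst.write(json.dumps({"conversations": conversations}) + "\n")
```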

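And on the inference point, the client side I have in mind would look something like this; a sketch assuming a server that exposes an OpenAI-compatible /v1/chat/completions route (the URL and model name are hypothetical, not part of this repo):

```python
# Sketch: hit a self-hosted, OpenAI-compatible endpoint with the official client.
# The base_url and model name are placeholders, not part of this repo.
from openai import OpenAI

client = OpenAI(
    base_url="https://my-inference-app.example.com/v1",  # hypothetical deployment URL
    api_key="unused",  # many self-hosted servers ignore the key, but the client requires one
)

response = client.chat.completions.create(
    model="qlora-llama3-8b",  # whatever name the server registers for the adapter
    messages=[{"role": "user", "content": "Write a SQL query that counts users by country."}],
)
print(response.choices[0].message.content)
```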
Happy to make a PR.

shamikbose commented

> If you are very familiar with axolotl, this --data flag was really confusing to me, because a key parameter in my config that I am used to setting is completely ignored in favor of an extra layer of indirection. I got stuck on this personally as an experienced axolotl user, which is why I felt the need to provide these two caveats.

@hamelsmu Even as a newcomer to axolotl, the discrepancy between the data flags in the two frameworks is really confusing to me. It would be helpful to have a guide describing, as a start, the difference between how the two frameworks use these flags.

devanshrj commented

Agree with @JUNIORCO. It would be great to have a conversational dataset example that works with a model like Llama3-8B-Instruct. I made a few attempts based on axolotl's example config and the example configs provided in this repo, but none seem to produce Llama3-8B-Instruct's expected prompt format.
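For anyone stuck on the same thing, the target format is easy to inspect directly from the tokenizer's chat template; a minimal sketch (assuming access to the gated HF model repo):

```python
# Sketch: print the exact prompt string Llama3-8B-Instruct expects, straight
# from its chat template (requires access to the gated HF repo).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
prompt = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)  # shows the <|begin_of_text|><|start_header_id|>... structure
```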

It would also be great to have more details about the Docker container and the axolotl version used by Modal.
