
Test Llama-7b-model #250

Open · 1 task
Ssukriti opened this issue Oct 27, 2023 · 3 comments

Description

As a developer of caikit NLP, I want to test Llama-7b models with prompt-tuning and fine-tuning techniques to evaluate quality and performance.

Discussion

a. We want to test Llama 7b models on these tasks, with both PEFT prompt tuning (RANDOM init) and fine tuning (a minimal setup sketch is included at the end of this description):

  1. classification (one dataset)
  2. summarization

b. We want to measure train time while varying gradient accumulation steps (accumulate_steps)

Acceptance Criteria

  • Results are posted in this issue
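
For reference, a minimal sketch of the PEFT prompt-tuning setup with random initialization, written directly against the Hugging Face peft library rather than the caikit NLP API; the checkpoint name and num_virtual_tokens are assumptions, not values from this issue:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

# Assumed checkpoint; the issue only says "Llama-7b".
model_name = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# RANDOM init corresponds to the "PEFT tuning (RANDOM init)" variant above.
peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.RANDOM,
    num_virtual_tokens=8,  # assumed value, not specified in the issue
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only the soft-prompt embeddings are trainable
```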
Ssukriti commented Oct 27, 2023

This needs to be assigned to me, @gkumbhat.

Results on train time
Parameters used: torch_dtype: bfloat16, batch_size: 64, learning_rate: 0.3, train_data_set_size: 100 examples, num_epochs: 50
Hardware: 1 A100 GPU on CCC

With PEFT RANDOM init:

accumulate_steps: 1
Time per iteration: 6.23 sec ([00:12<00:00, 6.23s/it])
Total time to train + evaluate + initial load: 13 mins

accumulate_steps: 16
Time per iteration: 6.23 sec ([00:12<00:00, 6.23s/it])
Total time to train + evaluate + initial load: 13 mins

accumulate_steps: 32
Time per iteration: 6.23 sec ([00:12<00:00, 6.23s/it])
Total time to train + evaluate + initial load: 12 mins

Varying accumulation steps while keeping all other parameters constant does not seem to affect train time.
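
One plausible reason: with 100 training examples at batch size 64 there are only two micro-batches per epoch, so accumulation settings of 16 or 32 do not change the amount of forward/backward work per epoch. A sketch of how such a sweep could be driven with transformers.TrainingArguments (the actual caikit NLP entry point may differ; Trainer construction is elided):

```python
import time
from transformers import TrainingArguments

# Hypothetical sweep matching the runs reported above.
for accumulate_steps in (1, 16, 32):
    args = TrainingArguments(
        output_dir=f"llama7b-peft-accum{accumulate_steps}",
        per_device_train_batch_size=64,
        gradient_accumulation_steps=accumulate_steps,
        learning_rate=0.3,
        num_train_epochs=50,
        bf16=True,
        logging_steps=1,
    )
    start = time.time()
    # ... build a Trainer with the PEFT-wrapped model and the 100-example
    #     train split, then call trainer.train()
    print(f"accumulate_steps={accumulate_steps}: {time.time() - start:.0f}s")
```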

Ssukriti commented Oct 27, 2023

Quality scores (editing as I run summarization):

For sentiment evaluation: 100 examples from train, 100 from predict, using PEFT with random init, 50 epochs.

F1 micro: 31%

This is lower than the number we saw with flan-t5-xl, but flan-t5 models are known to perform well on these tasks.
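
For reference, micro F1 here can be computed with scikit-learn once gold and predicted labels are collected; the labels below are placeholders, not data from this run:

```python
from sklearn.metrics import f1_score

# Placeholder labels; in practice these come from the 100-example predict split.
y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "negative", "positive", "negative"]

print(f"F1 micro: {f1_score(y_true, y_pred, average='micro'):.1%}")
```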

chakrn moved this from ToDo to In Progress in caikit ecosystem, Oct 30, 2023
Ssukriti commented Nov 1, 2023

Testing fine tuning is blocked on #257.
