
Add HuggingFace arg so that arch is automatic #39

Merged

Quentin-Anthony merged 12 commits into EleutherAI:main on Aug 19, 2024

Conversation

bhavnicksm
Copy link
Contributor

This pull request adds automated parameter calculation for all Hugging Face models.

Expected Behaviour:

python transformer_mem.py --hf_model_name_or_path meta-llama/Llama-2-7b-hf --num-gpus 8 --zero-stage 3 --batch-size-per-gpu 2 --sequence-length 4096

Ref: [ #1 ]
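A minimal sketch of the intended behaviour, assuming the transformers library's AutoConfig (the helper name and the field mapping are illustrative, not necessarily this PR's actual code):

from transformers import AutoConfig

def arch_args_from_hf(model_name_or_path):
    # Pull architecture hyperparameters from the Hugging Face model config.
    cfg = AutoConfig.from_pretrained(model_name_or_path)
    return {
        "vocab_size": cfg.vocab_size,
        "hidden_size": cfg.hidden_size,
        "num_attention_heads": cfg.num_attention_heads,
        "num_layers": cfg.num_hidden_layers,
        "ffn_expansion_factor": cfg.intermediate_size / cfg.hidden_size,
        # GQA models expose num_key_value_heads; fall back to MHA otherwise.
        "kv_size_ratio": getattr(cfg, "num_key_value_heads", cfg.num_attention_heads) / cfg.num_attention_heads,
    }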

@CLAassistant

CLAassistant commented May 8, 2024

CLA assistant check
All committers have signed the CLA.

@bhavnicksm
Contributor Author

@Quentin-Anthony,

If the user passes a value that conflicts with the config of the Hugging Face model, do we ignore it or take it into consideration?

For example:

python transformer_mem.py \
--hf_model_name_or_path meta-llama/Llama-2-7b-hf \
--num-gpus 8 \
--zero-stage 3 \
--batch-size-per-gpu 2 \
--sequence-length 4096 \
--num_attention_heads 16

In the above example, num_attention_heads is both explicitly passed and implicitly provided by the model config.

@bhavnicksm
Contributor Author

Currently, I'm getting some wrong parameter estimates:

Calculating memory with training configuration: {'hf_model_name_or_path': 'NousResearch/Hermes-2-Pro-Llama-3-8B', 'num_gpus': 8, 'tensor_parallel_size': 1, 'pipeline_parallel_size': 1, 'partition_activations': False, 'zero_stage': 3, 'zero_allgather_bucket_size': 500000000.0, 'zero3_max_live_params': 1000000000.0, 'checkpoint_activations': False, 'batch_size_per_gpu': 2, 'sequence_length': 4096, 'vocab_size': 128288, 'hidden_size': 4096, 'num_attention_heads': 32, 'num_layers': 32, 'ffn_expansion_factor': 3.5, 'infer': False, 'kv_size_ratio': 0.25, 'is_mixed_precision': True, 'high_prec_bytes_per_val': 4, 'low_prec_bytes_per_val': 2, 'bytes_per_grad_ele': 4, 'num_experts': 0, 'expert_parallelism': 1, 'misc_mem_gib': 0}

Number of Parameters: 6.17 B
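For reference, a rough hand count from the printed config (a sketch that assumes an untied output head and a gated SwiGLU FFN with intermediate size ffn_expansion_factor * hidden_size, as in Llama-3) lands near 8 B rather than 6.17 B:

vocab, hidden, layers = 128288, 4096, 32
kv_ratio, ffn_factor = 0.25, 3.5

embed = 2 * vocab * hidden                       # input embeddings + output head
attn = hidden * hidden * (2 + 2 * kv_ratio)      # Q and O full size; K and V shrunk by the GQA ratio
ffn = 3 * hidden * int(ffn_factor * hidden)      # gate, up, and down projections
total = embed + layers * (attn + ffn)
print(f"{total / 1e9:.2f} B")                    # ~8.03 B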

@Quentin-Anthony
Member

> @Quentin-Anthony,
>
> If the user passes a value that conflicts with the config of the Hugging Face model, do we ignore it or take it into consideration?
>
> For example:
>
> python transformer_mem.py \
> --hf_model_name_or_path meta-llama/Llama-2-7b-hf \
> --num-gpus 8 \
> --zero-stage 3 \
> --batch-size-per-gpu 2 \
> --sequence-length 4096 \
> --num_attention_heads 16
>
> In the above example, num_attention_heads is both explicitly passed and implicitly provided by the model config.

I think if the user provides an arg, we overwrite the HF config on that value. All overwritten values should get a print (e.g. "overwriting HF num_attention_heads config value (x) with user arg (y)")

@bhavnicksm
Contributor Author

bhavnicksm commented May 9, 2024

> I think if the user provides an arg, we overwrite the HF config on that value. All overwritten values should get a print (e.g. "overwriting HF num_attention_heads config value (x) with user arg (y)")

How do we check if the value is user-provided or a default value?

Say the user passes num_attention_heads as 64, which is also the default value; the args alone would not be able to tell the two cases apart.

Instead, maybe we could keep the default values in a separate dictionary and have the parser use None as the default for every argument, so we can tell when we have user input and when we are falling back to a default value.

What do you think? @Quentin-Anthony
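A minimal sketch of the None-defaults scheme described above (argument names and defaults are illustrative, not the final implementation):

import argparse

# Real defaults live in a dictionary; the parser defaults are all None,
# so a user-provided value is detectable even when it equals the default.
DEFAULTS = {"num_attention_heads": 64, "hidden_size": 4096}

parser = argparse.ArgumentParser()
parser.add_argument("--num_attention_heads", type=int, default=None)
parser.add_argument("--hidden_size", type=int, default=None)
args = parser.parse_args()

for name, default in DEFAULTS.items():
    if getattr(args, name) is None:   # not user-provided -> fall back to the default
        setattr(args, name, default)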

@Quentin-Anthony
Member

>> I think if the user provides an arg, we overwrite the HF config on that value. All overwritten values should get a print (e.g. "overwriting HF num_attention_heads config value (x) with user arg (y)")
>
> How do we check if the value is user-provided or a default value?
>
> Say the user passes num_attention_heads as 64, which is also the default value; the args alone would not be able to tell the two cases apart.
>
> Instead, maybe we could keep the default values in a separate dictionary and have the parser use None as the default for every argument, so we can tell when we have user input and when we are falling back to a default value.
>
> What do you think? @Quentin-Anthony

But this would mean that we have no default values and that the user needs to pass everything? If I'm misunderstanding, maybe just implement what you're describing real quick and we can iterate.

bhavnicksm changed the title from "WIP Add HuggingFace arg so that arch is automatic" to "Add HuggingFace arg so that arch is automatic" on May 24, 2024
bhavnicksm marked this pull request as ready for review on May 24, 2024 07:00
@bhavnicksm
Contributor Author

Hi @Quentin-Anthony,
Added a default-value dictionary to handle the None defaults: user input is detected when an arg is not None, and the remaining values are "replaced" from the HF config (values the user already passed are skipped).
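Roughly, the precedence being described is user args > HF config > defaults; a sketch of that merge (hf_config stands for the loaded Hugging Face config as a dict, and the names are illustrative rather than the PR's exact code):

def merge_args(args, hf_config, defaults):
    for name, default in defaults.items():
        user_value = getattr(args, name)   # None unless the user passed the flag
        hf_value = hf_config.get(name)
        if user_value is not None:
            if hf_value is not None and user_value != hf_value:
                print(f"overwriting HF {name} config value ({hf_value}) with user arg ({user_value})")
            continue                        # user arg wins; skip replacement
        setattr(args, name, hf_value if hf_value is not None else default)
    return args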

@bhavnicksm
Contributor Author

Hi @Quentin-Anthony, I wanted to check in about this PR. Is it still required? Is something missing here?

@Quentin-Anthony
Member

> Hi @Quentin-Anthony, I wanted to check in about this PR. Is it still required? Is something missing here?

Yep still needed! Reviewing now.

@Quentin-Anthony
Member

Quentin-Anthony commented Aug 19, 2024

I rebased, and for some reason this PR's "files changed" view is now showing all the rebase changes. Going to try closing and reopening to see if that fixes it.

EDIT: That did it!

Member

@Quentin-Anthony left a comment


I cleaned things up a bit, rebased, and tested. All looks great to me. Thank you!

Quentin-Anthony merged commit efb225c into EleutherAI:main on Aug 19, 2024
1 check passed
@bhavnicksm
Contributor Author

Thank you, @Quentin-Anthony!
I enjoyed working on this with you; it was my first open-source PR. :)
