
Add HuggingFace arg so that arch is automatic #39

Merged

Quentin-Anthony merged 12 commits into EleutherAI:main on Aug 19, 2024

Conversation

bhavnicksm
Copy link
Contributor

This pull request adds automated parameter calculation for all Hugging Face models.

Expected Behaviour:

python transformer_mem.py --hf_model_name_or_path meta-llama/Llama-2-7b-hf --num-gpus 8 --zero-stage 3 --batch-size-per-gpu 2 --sequence-length 4096

Ref: [ #1 ]
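A minimal sketch of the intended behaviour, assuming the transformers library's AutoConfig (the helper name and the field mapping are illustrative, not necessarily this PR's actual code):

from transformers import AutoConfig

def arch_args_from_hf(model_name_or_path):
    # Pull architecture hyperparameters from the Hugging Face model config.
    cfg = AutoConfig.from_pretrained(model_name_or_path)
    return {
        "vocab_size": cfg.vocab_size,
        "hidden_size": cfg.hidden_size,
        "num_attention_heads": cfg.num_attention_heads,
        "num_layers": cfg.num_hidden_layers,
        "ffn_expansion_factor": cfg.intermediate_size / cfg.hidden_size,
        # GQA models expose num_key_value_heads; fall back to MHA otherwise.
        "kv_size_ratio": getattr(cfg, "num_key_value_heads", cfg.num_attention_heads) / cfg.num_attention_heads,
    }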

@CLAassistant

CLAassistant commented May 8, 2024

CLA assistant check
All committers have signed the CLA.

@bhavnicksm
Contributor Author

@Quentin-Anthony,

If the user passes a value that conflicts with the config of the Hugging Face model, do we ignore it or take it into consideration?

For example:

python transformer_mem.py \
--hf_model_name_or_path meta-llama/Llama-2-7b-hf \
--num-gpus 8 \
--zero-stage 3 \
--batch-size-per-gpu 2 \
--sequence-length 4096 \
--num_attention_heads 16

In the above example, num_attention_heads is both explicitly passed and implicitly provided by the model config.

@bhavnicksm
Contributor Author

Currently, I'm getting some wrong parameter estimates:

Calculating memory with training configuration: {'hf_model_name_or_path': 'NousResearch/Hermes-2-Pro-Llama-3-8B', 'num_gpus': 8, 'tensor_parallel_size': 1, 'pipeline_parallel_size': 1, 'partition_activations': False, 'zero_stage': 3, 'zero_allgather_bucket_size': 500000000.0, 'zero3_max_live_params': 1000000000.0, 'checkpoint_activations': False, 'batch_size_per_gpu': 2, 'sequence_length': 4096, 'vocab_size': 128288, 'hidden_size': 4096, 'num_attention_heads': 32, 'num_layers': 32, 'ffn_expansion_factor': 3.5, 'infer': False, 'kv_size_ratio': 0.25, 'is_mixed_precision': True, 'high_prec_bytes_per_val': 4, 'low_prec_bytes_per_val': 2, 'bytes_per_grad_ele': 4, 'num_experts': 0, 'expert_parallelism': 1, 'misc_mem_gib': 0}

Number of Parameters: 6.17 B
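For reference, a rough hand count from the printed config (a sketch that assumes an untied output head and a gated SwiGLU FFN with intermediate size ffn_expansion_factor * hidden_size, as in Llama-3) lands near 8 B rather than 6.17 B:

vocab, hidden, layers = 128288, 4096, 32
kv_ratio, ffn_factor = 0.25, 3.5

embed = 2 * vocab * hidden                       # input embeddings + output head
attn = hidden * hidden * (2 + 2 * kv_ratio)      # Q and O full size; K and V shrunk by the GQA ratio
ffn = 3 * hidden * int(ffn_factor * hidden)      # gate, up, and down projections
total = embed + layers * (attn + ffn)
print(f"{total / 1e9:.2f} B")                    # ~8.03 B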

@Quentin-Anthony
Member

> @Quentin-Anthony,
>
> If the user passes a value that conflicts with the config of the Hugging Face model, do we ignore it or take it into consideration?
>
> For example:
>
> python transformer_mem.py \
> --hf_model_name_or_path meta-llama/Llama-2-7b-hf \
> --num-gpus 8 \
> --zero-stage 3 \
> --batch-size-per-gpu 2 \
> --sequence-length 4096 \
> --num_attention_heads 16
>
> In the above example, num_attention_heads is both explicitly passed and implicitly provided by the model config.

I think if the user provides an arg, we overwrite the HF config on that value. All overwritten values should get a print (e.g. "overwriting HF num_attention_heads config value (x) with user arg (y)")

@bhavnicksm
Contributor Author

bhavnicksm commented May 9, 2024

> I think if the user provides an arg, we overwrite the HF config on that value. All overwritten values should get a print (e.g. "overwriting HF num_attention_heads config value (x) with user arg (y)")

How do we check if the value is user-provided or a default value?

Say the user passes num_attention_heads as 64, which is also the default value; the args alone would not be able to tell the two cases apart.

Instead, maybe we could keep the default values in a separate dictionary and have the parser use None as the default for every argument, so we can tell when we have user input and when we are falling back to a default value.

What do you think? @Quentin-Anthony
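A minimal sketch of the None-defaults scheme described above (argument names and defaults are illustrative, not the final implementation):

import argparse

# Real defaults live in a dictionary; the parser defaults are all None,
# so a user-provided value is detectable even when it equals the default.
DEFAULTS = {"num_attention_heads": 64, "hidden_size": 4096}

parser = argparse.ArgumentParser()
parser.add_argument("--num_attention_heads", type=int, default=None)
parser.add_argument("--hidden_size", type=int, default=None)
args = parser.parse_args()

for name, default in DEFAULTS.items():
    if getattr(args, name) is None:   # not user-provided -> fall back to the default
        setattr(args, name, default)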

@Quentin-Anthony
Member

>> I think if the user provides an arg, we overwrite the HF config on that value. All overwritten values should get a print (e.g. "overwriting HF num_attention_heads config value (x) with user arg (y)")
>
> How do we check if the value is user-provided or a default value?
>
> Say the user passes num_attention_heads as 64, which is also the default value; the args alone would not be able to tell the two cases apart.
>
> Instead, maybe we could keep the default values in a separate dictionary and have the parser use None as the default for every argument, so we can tell when we have user input and when we are falling back to a default value.
>
> What do you think? @Quentin-Anthony

But this would mean that we have no default values and that the user needs to pass everything? If I'm misunderstanding, maybe just implement what you're describing real quick and we can iterate.

bhavnicksm changed the title from "WIP Add HuggingFace arg so that arch is automatic" to "Add HuggingFace arg so that arch is automatic" on May 24, 2024
bhavnicksm marked this pull request as ready for review on May 24, 2024 07:00
@bhavnicksm
Contributor Author

Hi @Quentin-Anthony,
Added a default-value dictionary to handle the None defaults: user input is detected when an arg is not None, and the remaining values are "replaced" from the HF config (values the user already passed are skipped).
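Roughly, the precedence being described is user args > HF config > defaults; a sketch of that merge (hf_config stands for the loaded Hugging Face config as a dict, and the names are illustrative rather than the PR's exact code):

def merge_args(args, hf_config, defaults):
    for name, default in defaults.items():
        user_value = getattr(args, name)   # None unless the user passed the flag
        hf_value = hf_config.get(name)
        if user_value is not None:
            if hf_value is not None and user_value != hf_value:
                print(f"overwriting HF {name} config value ({hf_value}) with user arg ({user_value})")
            continue                        # user arg wins; skip replacement
        setattr(args, name, hf_value if hf_value is not None else default)
    return args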

@bhavnicksm
Contributor Author

Hi @Quentin-Anthony, I wanted to check in about this PR. Is it still required? Is something missing here?

@Quentin-Anthony
Member

> Hi @Quentin-Anthony, I wanted to check in about this PR. Is it still required? Is something missing here?

Yep still needed! Reviewing now.

@Quentin-Anthony
Member

Quentin-Anthony commented Aug 19, 2024

I rebased, and for some reason this PR's "files changed" view is now showing all the rebase changes. Going to try closing and reopening to see if that fixes it.

EDIT: That did it!

Member

@Quentin-Anthony left a comment


I cleaned things up a bit, rebased, and tested. All looks great to me. Thank you!

Quentin-Anthony merged commit efb225c into EleutherAI:main on Aug 19, 2024
1 check passed
@bhavnicksm
Contributor Author

Thank you, @Quentin-Anthony!
I enjoyed working on this with you; it was my first open-source PR. :)
