
Why does LargeDiT 3B actually print 4.2B parameters? #177

Open
nemonameless opened this issue Mar 11, 2024 · 6 comments

Comments

@nemonameless

And why does 7B actually print 7.2B parameters? @ChrisLiu6

@gaopengpjlab
Contributor

Large-DiT-3B follows the naming practice of LLaMA-3B. Because the diffusion model adds AdaLN-Zero, which dynamically predicts bias/norm terms to modulate the diffusion backbone, the actual parameter count increases to 4.2 billion.
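
For intuition, a minimal sketch of where the extra parameters come from (the width and depth below are hypothetical, not the released config): in AdaLN-Zero, each transformer block gains a zero-initialized linear head that predicts six modulation vectors (shift, scale, and gate for the attention and MLP branches) from the conditioning embedding, adding roughly 6·dim² parameters per block.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaLNZeroModulation(nn.Module):
    """Per-block AdaLN-Zero head: predicts (shift, scale, gate) for the
    attention and MLP branches from the conditioning embedding c.
    Zero-initialized so each block starts out as an identity mapping."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, 6 * dim)  # ~6 * dim**2 extra weights
        nn.init.zeros_(self.proj.weight)
        nn.init.zeros_(self.proj.bias)

    def forward(self, c: torch.Tensor):
        # Six chunks: shift/scale/gate for attention, shift/scale/gate for MLP.
        return self.proj(F.silu(c)).chunk(6, dim=-1)

# Hypothetical sizes, for illustration only (not the released 3B config):
dim, n_layers = 2560, 32
extra = n_layers * sum(p.numel() for p in AdaLNZeroModulation(dim).parameters())
print(f"extra AdaLN-Zero parameters: {extra / 1e9:.2f}B")  # ~1.26B at these sizes
```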

Best Wishes

@nemonameless
Author

Then why did 7B only increase to 7.2B?
And is there a standard LLaMA configuration for 3B?

@nemonameless
Author

@gaopengpjlab

@gaopengpjlab
Contributor

We will adjust our naming practices to reflect the true parameter counts of our models in the future. Thanks for your suggestion.

@nemonameless
Author

And LargeDiT-T2I 3B actually prints 5B parameters...

@gaopengpjlab
Contributor

@nemonameless The key-query weights of the zero-init attention module contribute an extra 1B parameters. We will clarify this in the future. Thanks for your timely feedback.
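
As a back-of-the-envelope check (a sketch under assumed sizes; the width and depth below are hypothetical, not the released T2I config), a separate key and query projection per block contributes about 2·dim² parameters per block:

```python
import torch.nn as nn

def zero_init_attn_extra_params(dim: int, n_layers: int) -> int:
    """Count the extra key/query projections that a zero-init (gated)
    cross-attention module adds to each transformer block."""
    wq = nn.Linear(dim, dim, bias=False)  # extra query projection per block
    wk = nn.Linear(dim, dim, bias=False)  # extra key projection per block
    per_block = sum(p.numel() for p in [*wq.parameters(), *wk.parameters()])
    return n_layers * per_block

# Hypothetical sizes, for illustration only:
print(f"{zero_init_attn_extra_params(4096, 32) / 1e9:.2f}B")  # ~1.07B
```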
