Why does LargeDiT 3B actually print 4.2B parameters? #177
Comments
Large-DiT-3B follows the naming practice of LLaMA-3B. Because the diffusion model adds AdaLN-Zero, which dynamically predicts the scale/shift (bias/norm) used to modulate the diffusion backbone, the actual parameter count increases to 4.2 billion. Best wishes.
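For illustration, here is a minimal sketch (not the actual Large-DiT code) of where the extra parameters come from when each block carries a per-block adaLN-Zero modulation linear, as in the original DiT. The hidden size and depth below are placeholder assumptions, not the released configs.

```python
# Sketch: parameters added by per-block adaLN-Zero modulation.
# Each block gains a linear layer predicting 6 modulation vectors
# (shift/scale/gate for attention and for the MLP) from the conditioning embedding.
import torch.nn as nn

def adaln_zero_overhead(dim: int, n_blocks: int) -> int:
    """Total parameters added by a per-block adaLN-Zero modulation linear."""
    modulation = nn.Linear(dim, 6 * dim)  # weight: 6*dim*dim, bias: 6*dim
    per_block = sum(p.numel() for p in modulation.parameters())
    return per_block * n_blocks

# Illustrative numbers only; dim/depth are assumptions.
print(adaln_zero_overhead(dim=3072, n_blocks=32) / 1e9, "B extra params")
```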
Then why does the 7B model only increase to 7.2B?
We will adjust our naming practice to reflect the true parameter counts of our models in the future. Thanks for your suggestion.
And LargeDiT-T2I 3B actually prints 5B parameters...
@nemonameless The key-query weights of the zero-init attention module contribute an extra 1B parameters. We will clarify this in the future. Thanks for your timely feedback.
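As a rough sketch of what that overhead could look like: extra key/query projections for attending to text conditioning, gated by a zero-initialized scalar, still count toward the total even though the module starts as an identity. The class name, dimensions, and depth below are assumptions for illustration, not the released T2I configuration.

```python
# Sketch (assumed shapes, not the actual Large-DiT-T2I code):
# extra key/query weights introduced by a zero-init attention module.
import torch
import torch.nn as nn

class ZeroInitCrossAttnExtra(nn.Module):
    def __init__(self, dim: int, text_dim: int):
        super().__init__()
        self.wq = nn.Linear(dim, dim, bias=False)       # extra query projection
        self.wk = nn.Linear(text_dim, dim, bias=False)  # extra key projection
        self.gate = nn.Parameter(torch.zeros(1))        # zero-init gate

def extra_params(dim: int, text_dim: int, n_blocks: int) -> int:
    block = ZeroInitCrossAttnExtra(dim, text_dim)
    return sum(p.numel() for p in block.parameters()) * n_blocks

# Illustrative only; real hidden sizes and depth are not stated in this thread.
print(extra_params(dim=3072, text_dim=4096, n_blocks=32) / 1e9, "B extra params")
```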
And 7B actually prints 7.2B parameters? @ChrisLiu6