
Why does LargeDiT 3B actually print 4.2B parameters? #177

Open
nemonameless opened this issue Mar 11, 2024 · 6 comments

Comments

@nemonameless

And why does 7B actually print 7.2B parameters? @ChrisLiu6

@gaopengpjlab
Contributor

Large-DiT-3B follows the naming practice of LLaMA-3B. Because the diffusion model adds AdaLN-Zero, which dynamically predicts bias/norm terms to modulate the diffusion backbone, the actual parameter count increases to 4.2 billion.
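
For intuition, a minimal sketch of where the extra parameters come from (the width and depth below are hypothetical, not the released config): in AdaLN-Zero, each transformer block gains a zero-initialized linear head that predicts six modulation vectors (shift, scale, and gate for the attention and MLP branches) from the conditioning embedding, adding roughly 6·dim² parameters per block.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaLNZeroModulation(nn.Module):
    """Per-block AdaLN-Zero head: predicts (shift, scale, gate) for the
    attention and MLP branches from the conditioning embedding c.
    Zero-initialized so each block starts out as an identity mapping."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, 6 * dim)  # ~6 * dim**2 extra weights
        nn.init.zeros_(self.proj.weight)
        nn.init.zeros_(self.proj.bias)

    def forward(self, c: torch.Tensor):
        # Six chunks: shift/scale/gate for attention, shift/scale/gate for MLP.
        return self.proj(F.silu(c)).chunk(6, dim=-1)

# Hypothetical sizes, for illustration only (not the released 3B config):
dim, n_layers = 2560, 32
extra = n_layers * sum(p.numel() for p in AdaLNZeroModulation(dim).parameters())
print(f"extra AdaLN-Zero parameters: {extra / 1e9:.2f}B")  # ~1.26B at these sizes
```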

Best Wishes

@nemonameless
Author

Then why did 7B only increase to 7.2B?
And is there a standard LLaMA configuration for 3B?

@nemonameless
Author

@gaopengpjlab

@gaopengpjlab
Contributor

We will adjust our naming practices to reflect the true parameter counts of our models in the future. Thanks for your suggestion.

@nemonameless
Author

And LargeDiT-T2I 3B actually prints 5B parameters...

@gaopengpjlab
Contributor

@nemonameless The key-query weights of the zero-init attention module contribute an extra 1B parameters. We will clarify this in the future. Thanks for your timely feedback.
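
As a back-of-the-envelope check (a sketch under assumed sizes; the width and depth below are hypothetical, not the released T2I config), a separate key and query projection per block contributes about 2·dim² parameters per block:

```python
import torch.nn as nn

def zero_init_attn_extra_params(dim: int, n_layers: int) -> int:
    """Count the extra key/query projections that a zero-init (gated)
    cross-attention module adds to each transformer block."""
    wq = nn.Linear(dim, dim, bias=False)  # extra query projection per block
    wk = nn.Linear(dim, dim, bias=False)  # extra key projection per block
    per_block = sum(p.numel() for p in [*wq.parameters(), *wk.parameters()])
    return n_layers * per_block

# Hypothetical sizes, for illustration only:
print(f"{zero_init_attn_extra_params(4096, 32) / 1e9:.2f}B")  # ~1.07B
```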
