add Kandinsky 2.0 - the first multilingual text2image model #1761
0-NiK-0
started this conversation in
Feature suggestions
Ability to write prompts in more than 100 languages.
Kandinsky 2.0
https://github.com/ai-forever/Kandinsky-2.0
https://huggingface.co/sberbank-ai/Kandinsky_2.0
https://fusionbrain.ai/diffusion
Model architecture:
It is a latent diffusion model with two multilingual text encoders:
mCLIP-XLMR: 560M parameters
mT5-encoder-small: 146M parameters
Together with the multilingual training datasets, these encoders enable genuinely multilingual text-to-image generation.
Kandinsky 2.0 was trained on a large multilingual dataset of 1B samples, including the samples that were used to train Kandinsky.
In terms of diffusion architecture, Kandinsky 2.0 uses a UNet with 1.2B parameters.
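For anyone weighing whether this fits on their GPU, here is a rough back-of-the-envelope sketch that just adds up the parameter counts quoted above. It is not taken from the Kandinsky code. The fp16 estimate assumes 2 bytes per parameter and counts weights only; the image autoencoder, activations, and any framework overhead are not included because their sizes are not stated in this post. The names `COMPONENT_PARAMS_M`, `total_params_m`, and `fp16_weights_gb` are made up for illustration.

```python
# Parameter counts as quoted in the post, in millions.
# Assumption: the image autoencoder is omitted because its size is not stated here.
COMPONENT_PARAMS_M = {
    "mCLIP-XLMR text encoder": 560,
    "mT5-encoder-small text encoder": 146,
    "UNet (diffusion)": 1200,
}


def total_params_m(components):
    """Sum the parameter counts (in millions) of all listed components."""
    return sum(components.values())


def fp16_weights_gb(params_m):
    """Approximate weight size in GB, assuming 2 bytes per parameter (fp16)."""
    return params_m * 1e6 * 2 / 1e9


total = total_params_m(COMPONENT_PARAMS_M)
print(f"total: {total} M parameters")
print(f"approx. fp16 weights: {fp16_weights_gb(total):.2f} GB")
```

So the listed components come to roughly 1.9B parameters, or a bit under 4 GB of weights in fp16 under these assumptions. Actual VRAM use during generation will be higher.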
[Figure: Kandinsky 2.0 architecture overview]