Welcome to DALLE-pytorch Discussions! #14
10 comments · 8 replies
-
Thanks for starting the discussion, @lucidrains! I'm Meng Lee, a machine learning engineer based in Tokyo, and I also write a blog (in Chinese) at leemeng.tw.
-
Thanks for this and all the efforts in recreating this amazing work! I'm Theodore, and I'm only adjacent to the ML field. My work focuses on developing ML workflows for architectural design, and I'm really interested in exploring the potential of DALL-E in generative design through natural language.
-
So I was looking at CLIP and contrastive bag-of-words objectives, and I had a thought and a question; I hope this is a good place for it. Would an adversarial type of training be interesting for this text+image domain? Something like learning representations by paraphrasing the text input (there have been a few papers on that recently, like MARGE from Facebook). Would that sort of shift in the space between the two text representations also be coupled with a shift in the visual-domain representations? Could it be a way to contrastively train a DALL-E kind of system (if that is even a thing)? I apologize in advance if I'm using some terms loosely here; I'm not an expert in this by any means.
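For concreteness, here is a minimal sketch of the kind of contrastive (InfoNCE) objective CLIP uses to couple text and image representations, which is the coupling the question is about. The embedding shapes and temperature value are illustrative assumptions, not anything from this repo:

```python
import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    image_emb, text_emb: (batch, dim) tensors from any two encoders;
    matching rows are treated as positive pairs, all other rows as negatives.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature      # (batch, batch) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i = F.cross_entropy(logits, targets)            # image -> text direction
    loss_t = F.cross_entropy(logits.t(), targets)        # text -> image direction
    return (loss_i + loss_t) / 2
```

A paraphrased caption could, in principle, be fed through the same text encoder and treated as an extra positive, which is roughly the coupling the question asks about.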
-
To foster this discussion, I want to mention that yesterday I stumbled upon a recent project from fellow southern-Germany researchers in Heidelberg: https://compvis.github.io/taming-transformers/. Aside from the amazing visuals, the repository contains rich examples and code for the very purpose of generating high-resolution images by means of quantization. It seems that OpenAI and the Heidelberg group came up with this codebook technique at the same time.
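As a rough illustration of the codebook technique mentioned above, here is a minimal sketch of the nearest-neighbour quantization step shared by VQ-VAE-style models and the taming-transformers approach; tensor names and shapes are illustrative only, not code from either repo:

```python
import torch

def quantize(z, codebook):
    """Map continuous encoder outputs to their nearest codebook entries.

    z:        (batch, n_tokens, dim) continuous latents from an encoder
    codebook: (vocab_size, dim) learnable embedding table
    Returns the quantized latents plus the discrete indices that a
    transformer would later be trained to predict autoregressively.
    """
    # Squared L2 distance between every latent vector and every codebook entry
    dists = (z.pow(2).sum(-1, keepdim=True)
             - 2 * z @ codebook.t()
             + codebook.pow(2).sum(-1))                  # (batch, n_tokens, vocab_size)
    indices = dists.argmin(dim=-1)                       # (batch, n_tokens)
    z_q = codebook[indices]                              # (batch, n_tokens, dim)
    # Straight-through estimator so gradients still reach the encoder
    z_q = z + (z_q - z).detach()
    return z_q, indices
```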
-
Maybe this is a more appropriate place to ask this question. I am wondering: if instead of text one has another image modality (say, the left image of a stereo camera pair, where the right image has been used to train the VAE), how would one go about using this in DALL-E? According to the discussion section of this repo, the conditioning image has to be tokenized. I am contemplating whether it makes more sense to use another VAE for the second stream of images and rely on its resulting codebook indices, or whether it is more reasonable to use e.g. a ViT prior to token concatenation and feeding into the main transformer of DALL-E. Maybe even a simple trainable ViT embedding layer within the forward pass of DALL-E, before the concatenation step, would suffice? I am just spitballing here and would be grateful for yours or anyone else's take on this.
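To make the first option concrete, here is a toy sketch (not DALLE-pytorch's actual API) of embedding the left image's codebook indices from a second, separately trained VAE and concatenating them in front of the right image's tokens, analogous to how DALL-E prepends text tokens; all names and sizes below are hypothetical:

```python
import torch
import torch.nn as nn

# Illustrative sizes only
LEFT_VOCAB, RIGHT_VOCAB, DIM = 8192, 8192, 512

class StereoConditioner(nn.Module):
    """Toy sketch: embed the left image's discrete codes (from a second VAE)
    and concatenate them before the right image's tokens, so the main
    transformer attends over [conditioning tokens, target tokens]."""

    def __init__(self):
        super().__init__()
        self.left_emb = nn.Embedding(LEFT_VOCAB, DIM)    # conditioning stream
        self.right_emb = nn.Embedding(RIGHT_VOCAB, DIM)  # generation target

    def forward(self, left_indices, right_indices):
        # left_indices, right_indices: (batch, seq_len) long tensors of codebook indices
        left_tokens = self.left_emb(left_indices)
        right_tokens = self.right_emb(right_indices)
        # The concatenated sequence plays the role that text + image tokens play in DALL-E
        return torch.cat((left_tokens, right_tokens), dim=1)
```

The ViT-based alternative would simply replace `left_emb` with a patch encoder producing continuous tokens instead of discrete codebook indices.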
-
Hi. Can you steer me a bit on the CLIP part? I didn't get it from the paper. Is there a way to use pretrained CLIP and then fine-tune it on a custom dataset? If I have outputs from different GANs, say a few dozen pics, and the task is to pick the most relevant pic, can I pass these image embeddings to CLIP without fine-tuning? How much data and compute is required to train the released CLIP?
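One way to do the ranking without any fine-tuning is to score every candidate with the released CLIP model (openai/CLIP) and keep the image whose embedding is closest to the prompt's. A minimal sketch, where the file paths and the prompt are placeholders:

```python
import clip                    # pip install git+https://github.com/openai/CLIP.git
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder paths to a few dozen candidate GAN outputs
candidates = ["gan_out_00.png", "gan_out_01.png", "gan_out_02.png"]
prompt = "a red armchair in the shape of an avocado"

with torch.no_grad():
    images = torch.stack([preprocess(Image.open(p)) for p in candidates]).to(device)
    image_features = model.encode_image(images)
    text_features = model.encode_text(clip.tokenize([prompt]).to(device))

    # Cosine similarity between the prompt and every candidate image
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    scores = (image_features @ text_features.t()).squeeze(-1)

best = candidates[scores.argmax().item()]
print(f"most relevant image: {best}")
```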
-
Hello, I am really interested in DALL-E and exploring that sort of thing, but I have absolutely no knowledge in terms of coding or anything of that nature, and this question is probably going to be pretty dumb. Is this the OpenAI portion of DALL-E? Can I use it like the examples on the OpenAI website, by giving prompts and having images generated? And if so, how do I go about installing or interacting with something like this? Thanks.
-
I guess the DALL-E paper and the dVAE code with a pretrained model are out. 🎉
-
Hi all, thank you for your amazing work!
-
Good morning, and thank you so much for sharing this project. I'm new to machine learning in general, so apologies from the start if my questions are trivial. I have some difficulties reading the readme.
Thank you very much.
-
👋 Welcome!
We’re using Discussions as a place to connect with other members of our community. We hope that you:
- Ask questions you're wondering about.
- Share ideas.
- Engage with other community members.
- Welcome others and are open-minded. Remember that this is a community we build together 💪.
To get started, comment below with an introduction of yourself and tell us about what you do with this community.