
Mesh conditioning instead of text conditioning #77

Open
pathquester opened this issue Apr 17, 2024 · 6 comments

Comments

@pathquester

I was wondering whether this has been discussed before. The idea is to condition on existing meshes rather than text, which would be particularly useful for training the model to retopologize existing meshes.

@MarcusLoppe
Contributor

Do you mean taking a mesh and encoding it into a vector embedding, which you could then use to generate refined or different versions of it?

It's possible. I don't think the author will do it since he has moved on, but it can be done with the current lib: the text conditioner is just a class wrapping the text embedding model.

The transformer never sees the actual "text", only the embedding vector, so in theory it's an easy replacement.
You can fork the lib and create your own embedding model (a dirty solution is to just leave the text-related parts empty), then preprocess the meshes and set the "text_embedding" vector using the mesh encoder.
The transformer won't know the difference.
https://github.com/lucidrains/classifier-free-guidance-pytorch/blob/main/classifier_free_guidance_pytorch/bge.py
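The swap described above could be sketched roughly like this: a module that stands in for the text conditioner but projects a precomputed mesh embedding into the conditioning space. This is a minimal sketch under assumptions; the class name, dimensions, and interface are all illustrative, not the library's actual API.

```python
import torch
import torch.nn as nn

class MeshConditioner(nn.Module):
    """Hypothetical drop-in for a text conditioner: instead of embedding
    text, it projects a precomputed per-mesh feature vector into the
    conditioning dimension the transformer expects."""

    def __init__(self, dim_mesh_feat=192, dim_cond=512):
        super().__init__()
        # simple linear projection from mesh-feature space to conditioning space
        self.proj = nn.Linear(dim_mesh_feat, dim_cond)

    def forward(self, mesh_embeds):
        # mesh_embeds: (batch, dim_mesh_feat), computed offline by a mesh encoder
        return self.proj(mesh_embeds)

cond = MeshConditioner()
out = cond(torch.randn(2, 192))
print(out.shape)  # torch.Size([2, 512])
```

The transformer would then receive `out` wherever it currently receives the text embedding, which is the "won't know the difference" point made above.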

However, training a model to produce a good embedding of a mesh is another matter, and I'm not 100% sure how to even approach that.

@pathquester
Author

Yes, is the current autoencoder a good fit for creating mesh embeddings for this purpose?

@MarcusLoppe
Contributor

> Yes, is the current autoencoder a good fit for creating mesh embeddings for this purpose?

Kind of. It will encode the mesh into a list of tokens/codes, and you could then build some kind of vector from those.
But that's a lot of information to capture in a single embedding, and the model will probably struggle to generalize without a lot of training.

@pathquester
Author

Is the face_embed_output that it produces not suitable for this?

@MarcusLoppe
Contributor

> Is the face_embed_output that it produces not suitable for this?

The encoder outputs F×192, i.e. one embedding per triangle rather than one for the entire mesh. So no. :(
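If someone wanted to experiment anyway, one crude option is to pool the per-face embeddings over the face axis to get a single mesh-level vector. This is only a sketch with stand-in data; whether mean pooling preserves enough geometric information is exactly the open question raised above.

```python
import torch

# Stand-in for the encoder's per-face output: F faces, 192 features each.
face_embeds = torch.randn(1000, 192)

# Mean-pool over the face axis to get one global descriptor for the mesh.
# (Max-pooling or an attention-based pooler would be alternatives.)
mesh_embed = face_embeds.mean(dim=0)

print(mesh_embed.shape)  # torch.Size([192])
```

This loses per-face detail by construction, which is why a learned pooling or a dedicated mesh-embedding model would likely be needed in practice.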

@lucidrains
Owner

lucidrains commented May 11, 2024

what I would recommend is just to encode the prompt and response meshes, and use a separator token in between

will require work to handle the special separator token
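The layout lucidrains suggests could look like this at the token level: reserve one fresh id outside the codebook as the separator, then concatenate prompt codes, separator, and response codes. The vocabulary size and code values below are toy assumptions, not values from the library.

```python
# Toy sketch of a prompt/response sequence with a special separator token.
vocab_size = 16384
SEP_ID = vocab_size            # reserve a fresh id just past the codebook

prompt_codes = [12, 7, 301]    # codes of the conditioning (input) mesh
response_codes = [44, 9]       # codes of the target (output) mesh

sequence = prompt_codes + [SEP_ID] + response_codes
print(sequence)  # [12, 7, 301, 16384, 44, 9]
```

The "work" mentioned above would include growing the transformer's embedding table by one to accommodate `SEP_ID` and masking the loss so the model is only trained to predict the response portion.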
