
Mesh conditioning instead of text conditioning #77

Open
pathquester opened this issue Apr 17, 2024 · 6 comments

Comments

@pathquester

I was wondering whether this has been discussed before. The idea is to condition on existing meshes rather than text, which would be particularly useful for training the model to retopologize existing meshes.

@MarcusLoppe
Contributor

Do you mean taking a mesh and encoding it into a vector embedding, which you could then use to generate refined or different versions of it?

It's possible. I don't think the author will do it since he has moved on, but it can be done with the current lib: the text conditioner is just a class wrapping the text embedding model.

The transformer never sees the actual "text", only the embedding vector, so in theory it's an easy replacement.
You can fork the lib and create your own embedding model (a dirty solution is to just leave the text-related parts empty), then preprocess the meshes and set the "text_embedding" vector using the mesh encoder.
The transformer won't know the difference.
https://github.com/lucidrains/classifier-free-guidance-pytorch/blob/main/classifier_free_guidance_pytorch/bge.py
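The swap described above could be sketched roughly like this: a module that stands in for the text conditioner but projects a precomputed mesh embedding into the conditioning space. This is a minimal sketch under assumptions; the class name, dimensions, and interface are all illustrative, not the library's actual API.

```python
import torch
import torch.nn as nn

class MeshConditioner(nn.Module):
    """Hypothetical drop-in for a text conditioner: instead of embedding
    text, it projects a precomputed per-mesh feature vector into the
    conditioning dimension the transformer expects."""

    def __init__(self, dim_mesh_feat=192, dim_cond=512):
        super().__init__()
        # simple linear projection from mesh-feature space to conditioning space
        self.proj = nn.Linear(dim_mesh_feat, dim_cond)

    def forward(self, mesh_embeds):
        # mesh_embeds: (batch, dim_mesh_feat), computed offline by a mesh encoder
        return self.proj(mesh_embeds)

cond = MeshConditioner()
out = cond(torch.randn(2, 192))
print(out.shape)  # torch.Size([2, 512])
```

The transformer would then receive `out` wherever it currently receives the text embedding, which is the "won't know the difference" point made above.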

However, training a model to produce a good embedding of a mesh is another matter, and I'm not 100% sure how to even approach that.

@pathquester
Author

Yes, is the current autoencoder a good fit for creating mesh embeddings for this purpose?

@MarcusLoppe
Contributor

> Yes, is the current autoencoder a good fit for creating mesh embeddings for this purpose?

Kind of. It will encode the mesh into a list of tokens/codes, and you could then build some kind of vector from those.
But that's a lot of information to capture in a single embedding, and the model will probably struggle to generalize without a lot of training.

@pathquester
Author

Is the face_embed_output that it produces not suitable for this?

@MarcusLoppe
Contributor

> Is the face_embed_output that it produces not suitable for this?

The encoder outputs F×192, i.e. one embedding per triangle rather than one for the entire mesh. So no. :(
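If someone wanted to experiment anyway, one crude option is to pool the per-face embeddings over the face axis to get a single mesh-level vector. This is only a sketch with stand-in data; whether mean pooling preserves enough geometric information is exactly the open question raised above.

```python
import torch

# Stand-in for the encoder's per-face output: F faces, 192 features each.
face_embeds = torch.randn(1000, 192)

# Mean-pool over the face axis to get one global descriptor for the mesh.
# (Max-pooling or an attention-based pooler would be alternatives.)
mesh_embed = face_embeds.mean(dim=0)

print(mesh_embed.shape)  # torch.Size([192])
```

This loses per-face detail by construction, which is why a learned pooling or a dedicated mesh-embedding model would likely be needed in practice.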

@lucidrains
Owner

lucidrains commented May 11, 2024

what I would recommend is just to encode the prompt and response meshes, and use a separator token in between

will require work to handle the special separator token
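The layout lucidrains suggests could look like this at the token level: reserve one fresh id outside the codebook as the separator, then concatenate prompt codes, separator, and response codes. The vocabulary size and code values below are toy assumptions, not values from the library.

```python
# Toy sketch of a prompt/response sequence with a special separator token.
vocab_size = 16384
SEP_ID = vocab_size            # reserve a fresh id just past the codebook

prompt_codes = [12, 7, 301]    # codes of the conditioning (input) mesh
response_codes = [44, 9]       # codes of the target (output) mesh

sequence = prompt_codes + [SEP_ID] + response_codes
print(sequence)  # [12, 7, 301, 16384, 44, 9]
```

The "work" mentioned above would include growing the transformer's embedding table by one to accommodate `SEP_ID` and masking the loss so the model is only trained to predict the response portion.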
