Thanks for the excellent work! I have a question about the `sample_sparse_structure` function. Since it samples a different number of coordinates from each image's encoded latents, how can this operation work with a batch size greater than one during training?
Hi,
Batched attention over sequences of different lengths can be handled efficiently by modern attention implementations such as flash-attn and xformers. See the code here for more details.
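To illustrate the idea (this is a plain NumPy sketch of the block-diagonal masking that flash-attn's varlen kernels and xformers' `BlockDiagonalMask` implement much more efficiently, not the actual library code): variable-length samples are concatenated into one packed sequence, and a block-diagonal mask keeps tokens from attending across sample boundaries.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def packed_attention(q, k, v, seqlens):
    """Attention over variable-length sequences packed along one axis.

    q, k, v: (total_tokens, dim) -- all samples concatenated.
    seqlens: per-sample lengths; sum(seqlens) == total_tokens.
    A block-diagonal mask prevents attention across samples, which is
    the effect flash-attn's varlen API / xformers' BlockDiagonalMask
    achieve without materializing the mask.
    """
    total, dim = q.shape
    # cumulative offsets, analogous to the cu_seqlens that varlen kernels take
    cu = np.concatenate([[0], np.cumsum(seqlens)])
    scores = q @ k.T / np.sqrt(dim)
    mask = np.full((total, total), -np.inf)
    for s, e in zip(cu[:-1], cu[1:]):
        mask[s:e, s:e] = 0.0  # tokens may attend only within their own sample
    return softmax(scores + mask) @ v

rng = np.random.default_rng(0)
seqlens = [3, 5, 2]  # three samples with different coordinate counts
total = sum(seqlens)
q = rng.standard_normal((total, 8))
k = rng.standard_normal((total, 8))
v = rng.standard_normal((total, 8))
out = packed_attention(q, k, v, seqlens)

# the packed result matches running attention on each sample separately
start = 0
for n in seqlens:
    ref = packed_attention(q[start:start+n], k[start:start+n],
                           v[start:start+n], [n])
    assert np.allclose(out[start:start+n], ref)
    start += n
```

The check at the end confirms that packing plus a block-diagonal mask is equivalent to per-sample attention, which is why a single fused kernel can serve a whole batch of differently sized coordinate sets.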