This repository has been archived by the owner on Nov 29, 2023. It is now read-only.

New paper: PolyViT #109

Open
jacobbieker opened this issue Nov 29, 2021 · 2 comments
Labels
enhancement New feature or request

Comments

@jacobbieker
Member

Detailed Description

https://arxiv.org/abs/2111.12993

Seems similar to Perceiver, and they mention Perceiver as a related model, but their training is a bit different.

Still, it is a multimodal model, so it could be a good one to try.

Context

Possible Implementation

@jacobbieker jacobbieker added the enhancement New feature or request label Nov 29, 2021
@JackKelly
Member

Very cool! Lots of interesting details about training.

The very last sentence of the paper confuses me though:

> We also do not currently fuse multiple modalities together (ie video and audio) to make a better
> prediction, and aim to do so in future.

@jacobbieker
Member Author

Yeah, I mostly skimmed it, to be honest, but it seemed like they train one network on multiple tasks with various input types, and the network handles each of those inputs separately, whereas Perceiver takes all of the modalities in together.
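
For what it's worth, here is a rough sketch of how I read that setup: one shared backbone, with each step handling a single task and a single modality, and no fusion across modalities. It is only a toy in plain Python, not the paper's code. The task names, the `tokenize`, `shared_encoder`, and `task_head` functions, and the scalar "weights" are all made up for illustration.

```python
import random

# Hypothetical task registry (names are illustrative, not from the paper):
# each task is tied to exactly one input modality.
TASKS = {
    "image_classification": "image",
    "video_classification": "video",
    "audio_classification": "audio",
}


def tokenize(modality, raw):
    """Stand-in for a modality-specific tokenizer (assumed, toy scaling only)."""
    scale = {"image": 1.0, "video": 0.5, "audio": 0.25}[modality]
    return [scale * x for x in raw]


def shared_encoder(tokens, weight):
    """Stand-in for the shared backbone: the same weight is used for every task."""
    return sum(weight * t for t in tokens) / len(tokens)


def task_head(task, feature, head_weights):
    """Stand-in for a task-specific output head."""
    return head_weights[task] * feature


def forward(task, raw, weight, head_weights):
    """One forward pass sees a single modality; nothing is fused across modalities."""
    modality = TASKS[task]
    tokens = tokenize(modality, raw)
    return task_head(task, shared_encoder(tokens, weight), head_weights)


def co_training_schedule(num_steps, seed=0):
    """Pick one task per training step (uniform sampling is an assumption here)."""
    rng = random.Random(seed)
    names = sorted(TASKS)
    return [rng.choice(names) for _ in range(num_steps)]


if __name__ == "__main__":
    heads = {name: 1.0 for name in TASKS}
    for step, task in enumerate(co_training_schedule(5)):
        out = forward(task, [1.0, 2.0, 3.0], weight=2.0, head_weights=heads)
        print(step, task, out)
```

A Perceiver-style model would instead combine the inputs from all modalities into one array before the encoder, which is the "all together" part.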


2 participants