Support for non-text modalities (images, speech, video) #316
For model selection, we could use an LLM to determine the modality from the user prompt and then retrieve an appropriate dataset and model. Dataset generation would entail another model-retriever module that selects a generative model for the modality of interest, but only if that improves performance; otherwise, only dataset retrieval would be used. For evaluating non-text output, we could retrieve an appropriate evaluation metric from the Hugging Face `evaluate` library.
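
A minimal sketch of the metric-retrieval part of this idea, assuming the Hugging Face `evaluate` library. The modality detector below is a keyword placeholder standing in for the LLM call described above, and the modality-to-metric mapping is purely illustrative:

```python
# Illustrative sketch only: map a detected task modality to an evaluation
# metric loaded from the Hugging Face `evaluate` library.
import evaluate

# Assumed mapping; the metric names are standard `evaluate` metrics.
MODALITY_TO_METRIC = {
    "speech-to-text": "wer",            # word error rate for ASR
    "image-classification": "accuracy",
    "image-segmentation": "mean_iou",
    "text-to-text": "bleu",
}


def detect_modality(prompt: str) -> str:
    """Placeholder for the LLM-based modality classifier described above."""
    prompt = prompt.lower()
    if "transcribe" in prompt or "speech" in prompt:
        return "speech-to-text"
    if "segment" in prompt:
        return "image-segmentation"
    if "image" in prompt or "photo" in prompt:
        return "image-classification"
    return "text-to-text"


def load_metric_for_prompt(prompt: str):
    """Pick an evaluation metric based on the modality inferred from the prompt."""
    modality = detect_modality(prompt)
    return modality, evaluate.load(MODALITY_TO_METRIC[modality])


modality, metric = load_metric_for_prompt("Transcribe this speech recording into text")
print(modality)          # speech-to-text
metric.add_batch(predictions=["hello world"], references=["hello word"])
print(metric.compute())  # WER between predictions and references
```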
Cool. Some HCI faculty at Tsinghua have also talked with me about a multi-modality Prompt2Model.
For other modalities (e.g. visual QA, video anomaly detection, image generation, speech-to-text, text-to-speech, etc.), it would be nice to start by simply proposing existing datasets and/or models, since prompt2model is advertised as a better way to retrieve datasets/models than search engines and manual human searching.
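
As a rough sketch of what "just propose existing datasets and/or models" could look like, the snippet below queries the Hugging Face Hub for the most-downloaded models of a given task and for datasets matching a search string. The task string, search query, and download-based ranking are assumptions, not prompt2model's actual retriever logic:

```python
# Rough sketch, not prompt2model's retriever: suggest existing Hub models and
# datasets for a non-text task by querying the Hugging Face Hub API.
from huggingface_hub import HfApi

api = HfApi()


def suggest_candidates(task: str, query: str, limit: int = 5):
    """Return the most-downloaded models for `task` and datasets matching `query`."""
    models = api.list_models(task=task, sort="downloads", direction=-1, limit=limit)
    datasets = api.list_datasets(search=query, sort="downloads", direction=-1, limit=limit)
    return [m.id for m in models], [d.id for d in datasets]


# Example: a speech-to-text request.
model_ids, dataset_ids = suggest_candidates(
    task="automatic-speech-recognition", query="speech recognition"
)
print("Candidate models:", model_ids)
print("Candidate datasets:", dataset_ids)
```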
Currently prompt2model is limited to text-input, text-output tasks. The underlying framework can certainly handle different modalities, and it would be great to see prompt2model handle other types of tasks as well (such as image classification/generation, speech tasks, etc.).
But we'll probably need to think through several things first.
We can start discussing the necessary steps on this issue and implement the necessary pieces bit-by-bit. We'd be happy for contributions!