Support audio in multimodal messages #370

rachwalk · 2025-01-16T15:29:48Z

Is your feature request related to a problem? Please describe.

LLM APIs have started supporting audio input, so it would be beneficial for RAIMultimodalMessages to support audio as well.

Describe the solution you'd like
The MultimodalMessage class (rai/src/rai/rai/messages/multimodal.py, line 38 in 5d3a8f3) should support audio input. That line currently reads:

if self.audios not in [None, []]:

Describe alternatives you've considered

This is the only suitable solution within the current architecture.

Additional context

mdimado · 2025-01-17T02:33:52Z

From the issue, I understand that the changes are to be made in messages/multimodal.py, and that they are:

Delete the if self.audios not in [None, []]: check that is blocking audio support.
Add support for base64-encoded audio files in the __init__ method.
Create audio content entries, similar to how images are handled, using an appropriate MIME type for audio (e.g., "audio/wav").
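The content entries described above can be sketched as follows, assuming the target is OpenAI's Chat Completions input_audio content part (the message format used by gpt-4o-audio-preview), where the base64 data is paired with a "format" field rather than a data-URL MIME type. The helper names and the MIME-to-format table are hypothetical and are not part of RAI's MultimodalMessage API.

```python
import base64

# Hypothetical mapping from MIME type to the "format" value expected by an
# OpenAI-style input_audio content part (assumption: wav and mp3 only).
_MIME_TO_FORMAT = {
    "audio/wav": "wav",
    "audio/x-wav": "wav",
    "audio/mpeg": "mp3",
    "audio/mp3": "mp3",
}


def audio_content_entry(audio_bytes: bytes, mime_type: str = "audio/wav") -> dict:
    """Build one audio content entry, analogous to an image entry.

    Hypothetical helper: base64-encodes raw audio bytes and wraps them in an
    OpenAI-style {"type": "input_audio", ...} content part.
    """
    try:
        fmt = _MIME_TO_FORMAT[mime_type]
    except KeyError:
        raise ValueError(f"unsupported audio MIME type: {mime_type}") from None
    data = base64.b64encode(audio_bytes).decode("ascii")
    return {"type": "input_audio", "input_audio": {"data": data, "format": fmt}}


def build_content(text: str, audios: list, mime_type: str = "audio/wav") -> list:
    """Combine one text part with one entry per audio clip (hypothetical)."""
    content = [{"type": "text", "text": text}]
    content.extend(audio_content_entry(a, mime_type) for a in audios)
    return content
```

Under these assumptions, build_content("What is said in this clip?", [wav_bytes]) yields a list that could serve as the content field of a single user message in that format.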

Should I create a pull request with these changes?

Please assign this issue to me; I'll work on it and create a PR.
If I'm missing something, please let me know.

maciejmajek · 2025-01-17T09:17:56Z

Hi @mdimado, yes, please feel free to create a PR for this task! A complete implementation should include:

A preprocess_audio function, similar to preprocess_image, to handle conversion of various audio formats (e.g., .mp3, .wav, np.array with sampling rate) into a standard format accepted by multimodal vendors.
Validation to ensure the model can process and understand the provided audio content (e.g., compatibility with gpt-4o-audio-preview).
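The two requirements above can be sketched using only the Python standard library. This is a minimal sketch, not RAI's implementation: the preprocess_audio signature, the (samples, sample_rate) tuple convention for array input (e.g., an np.array plus its sampling rate), and the choice of WAV and MP3 as target formats (assumed here to be what gpt-4o-audio-preview accepts) are all assumptions. Validation only checks that WAV data parses and is non-empty; confirming that a model actually understands the clip would require a request to the model itself.

```python
import base64
import io
import wave
from pathlib import Path
from typing import Sequence, Tuple, Union

# Hypothetical input type: a file path, or (samples, sample_rate).
AudioInput = Union[str, Path, Tuple[Sequence[float], int]]


def _samples_to_wav(samples: Sequence[float], sample_rate: int) -> bytes:
    """Encode mono float samples in [-1.0, 1.0] as 16-bit PCM WAV bytes."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # clamp out-of-range values
        frames += int(round(clipped * 32767)).to_bytes(2, "little", signed=True)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return buf.getvalue()


def _validate_wav(data: bytes) -> None:
    """Raise ValueError unless data parses as a non-empty WAV file."""
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            if wav.getnframes() == 0:
                raise ValueError("WAV data contains no audio frames")
    except (wave.Error, EOFError) as exc:
        raise ValueError(f"invalid WAV data: {exc}") from exc


def preprocess_audio(audio: AudioInput) -> Tuple[str, str]:
    """Hypothetical counterpart to preprocess_image.

    Returns (base64_data, format), where format is "wav" or "mp3".
    """
    if isinstance(audio, tuple):
        samples, sample_rate = audio
        data, fmt = _samples_to_wav(samples, sample_rate), "wav"
    else:
        path = Path(audio)
        fmt = path.suffix.lower().lstrip(".")
        if fmt not in ("wav", "mp3"):
            raise ValueError(f"unsupported audio file type: {path.suffix}")
        data = path.read_bytes()
    if fmt == "wav":
        _validate_wav(data)
    # MP3 bytes are passed through unchecked; decoding or converting them
    # would need an external tool such as ffmpeg (not assumed here).
    return base64.b64encode(data).decode("ascii"), fmt
```

Encoding arrays as 16-bit mono PCM keeps the sketch dependency-free; resampling or MP3-to-WAV conversion would need an external decoder and is deliberately left out.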

Let me know if you need any further clarification or assistance (here and/or on Discord).

mdimado · 2025-01-17T11:51:06Z

Thanks for the clarification and additional details. After reviewing the task, I realize that implementing the preprocess_audio function and handling validation may require more learning on my part. To ensure timely, high-quality work, I think someone with more expertise could handle this better. Apologies for the inconvenience; I'd like to unassign myself for now.

maciejmajek · 2025-01-17T14:09:09Z

Hey @mdimado, no worries at all! We're all here to learn and grow together—that's what makes this such a great environment. 😊 Feel free to tackle any part of the work you're comfortable with, and don't hesitate to ask for guidance along the way. We’re always happy to help and support you through the process. Looking forward to it! 🚀

rachwalk · 2025-01-17T14:14:38Z

@mdimado I have created sub-issues based on your task description (see #373); feel free to comment there so I can assign you.

rachwalk added the enhancement (New feature or request), good first issue (Good for newcomers), and help wanted (Extra attention is needed) labels on Jan 16, 2025

maciejmajek assigned maciejmajek and mdimado, and unassigned maciejmajek, on Jan 17, 2025
