Issues with Audio Analysis and Multi-turn Dialogue in Baichuan-Omni-1.5 #4

xiexiaoshinick · 2025-02-07T11:46:42Z

I am currently testing the audio transcription (ASR) and audio analysis capabilities of Baichuan-Omni-1.5, but I am experiencing subpar performance compared to the results reported in the Baichuan-Omni paper. Below are the specific issues I have encountered:

Audio Transcription (ASR):
While Baichuan-Omni-1.5 performs better than MiniCPM-Omni in terms of audio transcription (ASR), its audio analysis capabilities seem significantly weaker. The model frequently generates errors such as "Audio file not found" or "Please provide an audio file," even when valid audio input is provided.
Multi-turn Dialogue Issues:
In multi-turn dialogue scenarios, where the model is first tasked with transcribing audio (ASR) and then combining the audio and transcription results to answer follow-up questions, the model often fails to retain context. Specifically, it frequently outputs error messages like "Audio not found," despite the audio being successfully processed in earlier steps.
Lack of Reference Code for Multi-modal Dialogue:
Unlike the MiniCPM-Omni repository, which provides reference implementations for multi-turn dialogue involving audio, images, and videos, the Baichuan-Omni repository does not include similar examples. This makes it difficult to determine whether the issues stem from my implementation or limitations in the model itself.

Request:
Could you please provide reference code or examples for multi-turn dialogue scenarios involving audio, images, and videos? This would help users verify their implementations and ensure that the model's full capabilities are being utilized effectively. Additionally, any insights into resolving the "Audio not found" issue during multi-turn dialogues would be greatly appreciated.

Thank you for your attention to these matters!

Environment Details:

Model Version: Baichuan-Omni-1.5
Framework/Toolkit: [Specify if applicable]
Hardware: [Specify if applicable]

Looking forward to your response!

HaoZeSun2016 · 2025-02-07T13:09:03Z

Could you provide the full prompt and audio files?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issues with Audio Analysis and Multi-turn Dialogue in Baichuan-Omni-1.5 #4

Issues with Audio Analysis and Multi-turn Dialogue in Baichuan-Omni-1.5 #4

xiexiaoshinick commented Feb 7, 2025

HaoZeSun2016 commented Feb 7, 2025

Issues with Audio Analysis and Multi-turn Dialogue in Baichuan-Omni-1.5 #4

Issues with Audio Analysis and Multi-turn Dialogue in Baichuan-Omni-1.5 #4

Comments

xiexiaoshinick commented Feb 7, 2025

HaoZeSun2016 commented Feb 7, 2025