- whisper_cpp_vendor: whisper.cpp 1.6.2 to 1.7.2 release, build changes
- Added live audio transcription streaming
- whisper_server:
  - Holding incoming audio data in a ring buffer (removed BatchBuffer; the oldest audio is dropped when the buffer is full).
  - Transcribing the entire buffer of audio data with whisper.cpp on a timer interrupt
  - Publishing the resulting tokens and their probabilities on topic /whisper/tokens
  - Removing the Action Server
  - New node parameters:
    - active -- Boolean controlling whether whisper.cpp is run.
    - callback_ms -- Integer controlling how often whisper.cpp is called, in milliseconds.
    - buffer_capacity -- Integer number of seconds of past audio that is kept and transcribed.
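
The drop-oldest behaviour of the ring buffer can be illustrated with a short Python sketch. This is not the whisper_server implementation; the class name `AudioRingBuffer`, its methods, and the default 16 kHz sample rate are assumptions made for illustration only.

```python
from collections import deque


class AudioRingBuffer:
    """Hypothetical fixed-capacity buffer that drops the oldest samples."""

    def __init__(self, capacity_seconds, sample_rate=16000):
        # Capacity is given in seconds, mirroring the buffer_capacity
        # parameter; the sample rate is an assumed value.
        self._samples = deque(maxlen=capacity_seconds * sample_rate)

    def enqueue(self, chunk):
        # A deque with maxlen discards items from the left (oldest)
        # once the buffer is full.
        self._samples.extend(chunk)

    def peek(self):
        # Return the whole buffer without consuming it, so the same
        # audio can be transcribed again on the next timer tick.
        return list(self._samples)

    def __len__(self):
        return len(self._samples)


# Tiny capacity (1 second at 4 samples per second) to show the behaviour.
buf = AudioRingBuffer(capacity_seconds=1, sample_rate=4)
buf.enqueue([1, 2, 3])
buf.enqueue([4, 5, 6])
window = buf.peek()
```

Because the buffer is only peeked, not drained, each timer callback sees the most recent window of audio, which matches the "transcribe the entire buffer" behaviour described above.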
- transcript_manager package added:
  - Stores a record of what was previously transcribed.
  - Tracks what is currently being transcribed, aligning and updating the text from the subscribed topic /whisper/tokens.
  - Updates are done on a timer interrupt.
  - Hosts the Action Server, which was previously part of whisper_server.
  - Publishes the entire transcript (previous and current) under /whisper/transcript_stream.
  - The published transcript contains text, estimated segment markings, and segment timestamps.
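
Since the whole buffer is re-transcribed on every tick, consecutive token lists overlap and have to be aligned before the transcript is extended. The sketch below shows one simple way to do this, by finding the longest overlap between the end of the stored transcript and the start of the new tokens. It is a simplified stand-in, not the alignment used by transcript_manager, and the function name `merge_tokens` is hypothetical.

```python
def merge_tokens(transcript, update):
    """Append `update` to `transcript`, skipping the overlapping part.

    Hypothetical helper: looks for the longest suffix of `transcript`
    that equals a prefix of `update` and appends only the remainder.
    """
    max_overlap = min(len(transcript), len(update))
    for k in range(max_overlap, 0, -1):
        if transcript[-k:] == update[:k]:
            return transcript + update[k:]
    # No overlap found: treat the update as entirely new text.
    return transcript + update


previous = ["the", "quick", "brown"]
incoming = ["quick", "brown", "fox"]
merged = merge_tokens(previous, incoming)
```

An exact-match overlap like this breaks as soon as whisper.cpp revises an earlier token, so a real aligner would need to tolerate substitutions, for example by using the published token probabilities.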
- whisper_demos: Add stream node
- whisper_idl:
  - Added msg/WhisperTokens.msg, msg/AudioTranscript.msg
- Added launch/replay.launch.py, which does not bring up audio_listener
- Added
- whisper_util: Changes to run inference directly and then serialize the whisper.cpp model output, including probability data.
Also refer to https://github.com/ros-ai/ros2_whisper/blob/main/CHANGELOG.rst. Work done by @NathanCorral