Releases · ros-ai/ros2_whisper · GitHub

16 Dec 10:12

mhubii

v1.4.0 Latest


whisper_cpp_vendor: whisper.cpp 1.6.2 to 1.7.2 release, build changes
Added live audio transcription streaming
whisper_server:
- Holding incoming audio data in a ring buffer (BatchBuffer removed; the oldest audio is dropped once the buffer is full).
- Transcribing the entire buffer of audio data with whisper.cpp on a timer interrupt
- Publishing the resulting tokens + probabilities on topic /whisper/tokens
- Removing the Action Server
- New Node Parameters:
  - active -- Boolean to control if whisper.cpp should be run or not.
  - callback_ms -- Integer controlling how often whisper.cpp is called.
  - buffer_capacity -- Integer number of seconds of most recent audio that is kept and transcribed.
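
The ring buffer behaviour described above can be illustrated with a minimal Python sketch. This is not the whisper_server implementation (which is C++); the class name `AudioRingBuffer` and its methods are hypothetical, and the 16 kHz default is assumed because that is the sample rate whisper.cpp works with. Only `buffer_capacity` corresponds to a parameter named in these notes.

```python
from collections import deque

SAMPLE_RATE_HZ = 16_000  # assumption: 16 kHz mono audio, as used by whisper.cpp


class AudioRingBuffer:
    """Hypothetical fixed-capacity buffer that discards the oldest samples."""

    def __init__(self, buffer_capacity_s: int, sample_rate_hz: int = SAMPLE_RATE_HZ):
        # Capacity is given in seconds, mirroring the buffer_capacity parameter.
        self._samples = deque(maxlen=buffer_capacity_s * sample_rate_hz)

    def extend(self, chunk):
        # A deque with maxlen drops items from the left once it is full.
        self._samples.extend(chunk)

    def snapshot(self):
        # Copy of the whole buffer, oldest sample first, for one transcription pass.
        return list(self._samples)


# Tiny numbers for illustration: 2 s at 4 Hz gives room for 8 samples.
buf = AudioRingBuffer(buffer_capacity_s=2, sample_rate_hz=4)
buf.extend(range(6))
buf.extend(range(6, 11))
print(buf.snapshot())  # [3, 4, 5, 6, 7, 8, 9, 10]
```

In this sketch a timer callback (every callback_ms) would call `snapshot()` and pass the result to the model, so each pass always sees at most the last buffer_capacity seconds of audio.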
transcript_manager package added:
- Store record of what was previously transcribed.
- Track what is currently being transcribed. Align and update the text using tokens from the subscribed topic /whisper/tokens.
  - Updates done on timer interrupt
- Host the Action Server which was previously part of whisper_server
- Publish the entire transcript (previous and current) under /whisper/transcript_stream
  - The published transcript contains text, estimated segment markings, and segment timestamps
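
Because whisper_server re-transcribes the whole buffer on every timer tick, consecutive token lists overlap, and something has to align them before the transcript grows. The sketch below shows one simple way to do that, a longest suffix/prefix overlap merge. It is an illustration only and not the transcript_manager algorithm, which also has token probabilities available; the function name `merge_tokens` is hypothetical.

```python
def merge_tokens(previous, update):
    """Append `update` to `previous`, collapsing the longest suffix/prefix overlap."""
    max_overlap = min(len(previous), len(update))
    for size in range(max_overlap, 0, -1):
        if previous[-size:] == update[:size]:
            return previous + update[size:]
    # No overlap found: treat the whole update as new text.
    return previous + update


transcript = ["the", "quick", "brown"]
transcript = merge_tokens(transcript, ["quick", "brown", "fox"])
transcript = merge_tokens(transcript, ["fox", "jumps"])
print(" ".join(transcript))  # the quick brown fox jumps
```

An exact-match merge like this breaks down when the model revises earlier words between passes, which is one reason a real aligner would also weigh the per-token probabilities.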
whisper_demos: Add stream node
whisper_idl:
- Added msg/WhisperTokens.msg, msg/AudioTranscript.msg
- Added launch/replay.launch.py which does not bring up audio_listener
whisper_util: Changed to run inference directly and then serialize the whisper.cpp model output, including probability data.

Also refer to https://github.com/ros-ai/ros2_whisper/blob/main/CHANGELOG.rst. Work done by @NathanCorral

Contributors

NathanCorral


01 Jul 16:31

mhubii

v1.3.1

whisper_msgs: Changed to whisper_idl package
whisper_bringup: Changed executor to MultiThreadedExecutor so audio and inference can run in parallel on whisper_server


21 Jun 17:07

mhubii

v1.3.0

Updates to the vendor package, plus downstream parameter updates

whisper_cpp_vendor: whisper.cpp 1.5.4 -> 1.6.2 release
whisper_cpp_vendor: CMake build flag WHISPER_CUBLAS to WHISPER_CUDA
whisper_server: Removed launch mixins
whisper_server: Updated parameters for whisper.yaml
whisper_server: Fixed whisper initialization order


15 Dec 00:22

mhubii

v1.2.1

Changes whisper_cpp_vendor C++ version to C++11


20 Nov 11:07

mhubii

v1.2.0

Now supports the full CUDA backend with whisper.cpp v1.5.0; refer to https://github.com/ggerganov/whisper.cpp/releases/tag/v1.5.0
Fixes small bugs


01 Sep 18:02

mhubii

v1.1.0

Refer to #6

Improved action server for whisper_server
Improved terminal output for whisper_demos


31 Aug 16:22

mhubii

v1.0.0

Refer to #3


15 Aug 14:26

mhubii

alpha-v0.1.0

Basic inference using whisper
Custom audio listener implementation
