Streaming inference from an arbitrary source (FFmpeg input) to STT, using VAD (voice activity detection). A fairly simple example demonstrating the STT streaming API in Node.js.
This example was successfully tested with a mobile phone streaming a live feed to an RTMP server (nginx-rtmp), which this script then read for near-real-time speech recognition.
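As a rough illustration of how FFmpeg can turn an arbitrary input into audio suitable for streaming inference, the sketch below builds an FFmpeg command line that decodes a file or stream URL to raw PCM on stdout. The target format (16 kHz, mono, signed 16-bit little-endian) is an assumption about what the model expects, and `buildFfmpegArgs` and `openPcmStream` are hypothetical helper names, not necessarily what index.js uses.

```javascript
const { spawn } = require('child_process');

// Assumed target format: 16 kHz, mono, signed 16-bit little-endian PCM.
const SAMPLE_RATE = 16000;

// Build the FFmpeg argument list that decodes any input FFmpeg can read
// (local file, rtmp:// URL, ...) to raw PCM written to stdout.
function buildFfmpegArgs(input, sampleRate = SAMPLE_RATE) {
  return [
    '-hide_banner', '-nostats', '-loglevel', 'fatal', // keep stderr quiet
    '-i', input,              // source file or stream URL
    '-vn',                    // ignore any video track
    '-acodec', 'pcm_s16le',   // signed 16-bit little-endian samples
    '-ac', '1',               // downmix to mono
    '-ar', String(sampleRate),// resample
    '-f', 's16le',            // raw output, no container
    'pipe:1',                 // write to stdout
  ];
}

// Start FFmpeg and return its stdout as a readable stream of PCM bytes.
// Requires the ffmpeg binary to be on the PATH.
function openPcmStream(input) {
  const ffmpeg = spawn('ffmpeg', buildFfmpegArgs(input));
  return ffmpeg.stdout;
}
```

The stream returned by `openPcmStream` could then be cut into short frames and handed to the VAD before being fed to the STT stream.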
Install the Node.js dependencies:
npm install
FFmpeg must also be installed:
sudo apt-get install ffmpeg
Here is an example for a local audio file:
node ./index.js --audio <AUDIO_FILE> \
--model $HOME/models/output_graph.pbmm
Here is an example for a remote RTMP stream:
node ./index.js --audio rtmp://<IP>:1935/live/teststream \
--model $HOME/models/output_graph.pbmm
Real-time streaming inference with STT's example audio (audio-0.4.1.tar.gz):
node ./index.js --audio $HOME/audio/2830-3980-0043.wav \
--scorer $HOME/models/kenlm.scorer \
--model $HOME/models/output_graph.pbmm
node ./index.js --audio $HOME/audio/4507-16021-0012.wav \
--scorer $HOME/models/kenlm.scorer \
--model $HOME/models/output_graph.pbmm
node ./index.js --audio $HOME/audio/8455-210777-0068.wav \
--scorer $HOME/models/kenlm.scorer \
--model $HOME/models/output_graph.pbmm
Real-time streaming inference in combination with an RTMP server:
node ./index.js --audio rtmp://<HOST>/<APP>/<KEY> \
--scorer $HOME/models/kenlm.scorer \
--model $HOME/models/output_graph.pbmm
To get the best results for your own scenario, it may help to adjust the VAD_MODE and DEBOUNCE_TIME parameters.
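To make the role of the debounce setting more concrete, here is a minimal sketch of one way a silence debounce can work. It assumes DEBOUNCE_TIME is a duration in milliseconds that silence must last before an utterance is considered finished, and it takes the per-frame voice/silence decision as a plain boolean (in practice that decision would come from the VAD, whose strictness VAD_MODE controls). `createUtteranceTracker` is a hypothetical helper and not necessarily how index.js implements it.

```javascript
// Hypothetical helper: tracks whether we are inside an utterance.
// debounceMs plays the role of DEBOUNCE_TIME (assumed to be milliseconds).
function createUtteranceTracker(debounceMs) {
  let inSpeech = false;     // true between 'start' and 'end'
  let silenceStart = null;  // timestamp of the first silent frame in a run

  // Feed one VAD decision together with its timestamp in milliseconds.
  // Returns 'start' when speech begins, 'end' when silence has lasted
  // at least debounceMs, and null otherwise.
  return function onFrame(isVoice, nowMs) {
    if (isVoice) {
      silenceStart = null;  // any voiced frame resets the silence timer
      if (!inSpeech) {
        inSpeech = true;
        return 'start';
      }
      return null;
    }
    if (!inSpeech) return null;  // silence before any speech: nothing to do
    if (silenceStart === null) silenceStart = nowMs;
    if (nowMs - silenceStart >= debounceMs) {
      inSpeech = false;
      silenceStart = null;
      return 'end';
    }
    return null;
  };
}

// Usage: a short pause (under 60 ms) does not end the utterance.
const onFrame = createUtteranceTracker(60);
const events = [
  onFrame(true, 0),
  onFrame(false, 20),
  onFrame(true, 40),
  onFrame(false, 60),
  onFrame(false, 100),
  onFrame(false, 120),
];
console.log(events);
```

Under this model, a larger debounce value tolerates longer pauses within a sentence but delays the final transcript, while a smaller value returns results sooner at the risk of splitting utterances.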