voskjs testing for Speech to text and ready to deploy to heroku

believeitcode/voskjs-misty

VoskJs

Vosk ASR offline engine transcript APIs for NodeJs developers. Contains a simple HTTP transcript server.

VoskJs can be used for speech recognition processing in different scenarios:

  • Single-user/standalone programs (e.g. perfect for single-user embedded systems)
  • Multi-user/multi-core server architectures

What's Vosk?

Vosk is an open source embedded (offline, on-device) speech-to-text engine that can run in real time even on small devices. It is based on Kaldi, and Nickolay V. Shmyrev's Vosk adds a smart, performant interface on top of it.

Documentation:

What's VoskJs?

The goal of the project is to:

  1. Create a simple function API layer on top of the existing Vosk nodejs binding, supplying the main sentence-based speech-to-text functions:

    • loadModel(modelDirectory)

      Loads a specific Vosk engine model from a model directory into RAM, once.

    • transcriptFromFile(fileName, model, options)

    • transcriptFromBuffer(buffer, model, options)

      At run time, transcribes a speech file or buffer (in WAV/PCM format) through the Vosk engine Recognizer and returns detailed speech-to-text transcript information.

    Using this simple transcript interface you can build your own standalone application, calling async functions suited to an ordinary single-threaded nodejs program.

  2. Use the voskjs command line program to test Vosk transcripts with specific models (some tests and command line usage here).

  3. Use httpServer, a simple HTTP server that transcribes speech files, or build your own server. Some usage examples here.

🛍 Install

1. Install Vosk engine and this nodejs module

  • Install vosk-api engine

    pip3 install vosk 

    See also: https://alphacephei.com/vosk/install

  • Install this module, as a global package if you want to use the voskjs CLI command

    npm install -g @solyarisoftware/voskjs

2. Install/Download Vosk models

mkdir -p your/path/models && cd your/path/models

# English large model
wget https://alphacephei.com/vosk/models/vosk-model-en-us-aspire-0.2.zip
unzip vosk-model-en-us-aspire-0.2.zip

# English small model
wget http://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip

# Italian small model
wget https://alphacephei.com/vosk/models/vosk-model-small-it-0.4.zip
unzip vosk-model-small-it-0.4.zip

More about available Vosk models here: https://alphacephei.com/vosk/models

3. Demo audio files

The audio directory contains some English speech audio files taken from a Mozilla DeepSpeech repo (source: Mozilla DeepSpeech audio samples). These files are used for some tests and comparisons.

Usage

Some transcript usage examples here

🛠 Tests

Some tests / notes here:

To do

  • 💣 Important open issue to be solved: solyarisoftware/voskJs#3 with a temporary workaround: alphacep/vosk-api#516 (comment)

  • httpServer will also reply to an HTTP POST request receiving the speech WAV file as binary data attached in the HTTP request:

     curl -X POST 'http://localhost:3000/transcript' \
      --header "Content-Type: audio/wav" \
      --data-binary "@speech.wav"
    
  • Implement a simplified interface for all Vosk-api functions

  • Deepen grammar usage with examples

  • Review stress and performance tests (especially for the HTTP server)

  • To reduce latency, rethink the transcript interface, maybe with an initialization phase that includes the Model creation and the Recognizer(s) creation
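The planned POST interface could also be called from nodejs. The sketch below uses only the built-in http module. The helper postWav is hypothetical, and the URL, port and /transcript path come from the curl example above rather than from a confirmed server implementation.

```javascript
const http = require('http')

// Hypothetical helper: sends raw WAV bytes in the request body and resolves
// with the HTTP status code and the response body as text.
function postWav (url, wavBuffer) {
  return new Promise((resolve, reject) => {
    const request = http.request(url, {
      method: 'POST',
      headers: {
        'Content-Type': 'audio/wav',
        'Content-Length': wavBuffer.length
      }
    }, (response) => {
      const chunks = []
      response.on('data', (chunk) => chunks.push(chunk))
      response.on('end', () => resolve({
        statusCode: response.statusCode,
        body: Buffer.concat(chunks).toString('utf8')
      }))
    })

    request.on('error', reject)
    // Write the audio bytes and finish the request
    request.end(wavBuffer)
  })
}
```

Assuming a server is listening on that address, postWav('http://localhost:3000/transcript', require('fs').readFileSync('speech.wav')) would send the same request as the curl command. The format of the reply body is not specified here, so the helper returns it as plain text.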

✋ How to contribute

Any contribution is welcome.

  • Discussions. Please open a new discussion (a public chat on GitHub) for any specific open topic, clarification, change request proposal, etc.
  • Issues. Please submit issues for bugs, etc.
  • E-mail. You can contact me privately via email.

🙏 Credits

Thanks to Nickolay V. Shmyrev, author of the Vosk project, for his help with the nodeJs API bindings for multi-threading management.

See also:

License

MIT (c) Giorgio Robino

