Echogarden is an easy-to-use speech toolset that includes a variety of speech processing tools.
- Easy to install, run, and update
- Written in TypeScript, for the Node.js runtime
- Can be used either as a command-line utility, or imported as a standard npm package
- Runs on Windows (x64, ARM64), macOS (x64, ARM64) and Linux (x64, ARM64)
- Doesn't require Python, Docker, or other system-level dependencies
- Doesn't rely on essential platform-specific binaries. Engines are either written in pure TypeScript, ported via WebAssembly, or imported using the ONNX runtime
- Fully open-source (GPL v3)
- Text-to-speech using high-quality Kokoro and VITS offline models for many languages and dialects, and 16 other offline and online engines, including cloud services by Google, Microsoft, Amazon, OpenAI and ElevenLabs
- Speech-to-text using a custom TypeScript/ONNX port of the OpenAI Whisper speech recognition architecture, whisper.cpp, and several other engines, including cloud services by Google, Microsoft, Amazon and OpenAI
- Speech-to-transcript alignment using several variants of dynamic time warping (DTW, DTW-RA), including support for multi-pass (hierarchical) processing, or via guided decoding using Whisper recognition models. Supports 100+ languages
- Speech-to-text translation translates speech in any of the 98 languages supported by Whisper to English, with near word-level timing for the translated transcript
- Speech-to-translated-transcript alignment synchronizes spoken audio in one language to a provided English-translated transcript, using the Whisper engine
- Speech-to-transcript-and-translation alignment synchronizes spoken audio in one language to a translation in a variety of other languages, given both a transcript and its translation
- Text-to-text translation translates text between various languages. Supports the cloud-based Google Translate engine
- Language detection identifies the language of a given audio or text. Includes Whisper or Silero engines for spoken audio, and TinyLD or FastText for text
- Voice activity detection attempts to identify segments of audio where voice is active or inactive. Includes WebRTC VAD, Silero VAD, RNNoise-based VAD and a built-in Adaptive Gate algorithm
- Speech denoising attenuates background noise from spoken audio. Includes the RNNoise and NSNet2 engines
- Source separation isolates voice from any music or background ambience. Includes the MDX-NET deep learning architecture
- Word-level timestamps for all recognition, synthesis, alignment and translation outputs
- Advanced subtitle generation, accounting for sentence and phrase boundaries
- For the Kokoro, VITS and eSpeak-NG synthesis engines, includes enhancements to improve TTS pronunciation accuracy: adds text normalization (e.g. idiomatic date and currency pronunciation), English heteronym disambiguation (based on a simple rule-based model), various pronunciation corrections, and accepts user-provided pronunciation lexicons
- Internal package system that auto-downloads and installs voices, models and other resources, as needed
Ensure you have Node.js v18 or later installed (v22 or later is recommended), then:
npm install -g echogarden@latest
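If you wrap the installation in a setup script, the Node.js requirement can be checked up front. The following is a minimal sketch: `classifyNodeVersion` is a hypothetical helper written for illustration (it is not part of Echogarden), and it only encodes the two thresholds stated above (v18 minimum, v22 recommended).

```typescript
// Hypothetical helper, not part of Echogarden.
// Classifies a Node.js version string against the thresholds stated above.
type NodeSupport = 'unsupported' | 'supported' | 'recommended'

function classifyNodeVersion(version: string): NodeSupport {
	// Accepts both "v20.11.0" and "20.11.0" and extracts the major version
	const match = /^v?(\d+)/.exec(version)
	const major = match ? Number(match[1]) : NaN

	if (Number.isNaN(major) || major < 18) {
		return 'unsupported'
	}

	return major >= 22 ? 'recommended' : 'supported'
}

console.log(classifyNodeVersion('v20.11.0')) // prints "supported"
```

In a real script you would likely pass `process.version` instead of a literal string.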
You can use npm-check-updates to check for a newer version:
npm install -g npm-check-updates
ncu -g echogarden
Then, if an updated version is available, use the command line ncu provides to make the update.
A small sample of command lines:
echogarden speak "Hello World!"
echogarden speak-file story.txt --engine=kokoro
echogarden transcribe speech.mp3
echogarden translate-speech speech.webm subtitles.srt
echogarden align speech.opus transcript.txt
echogarden isolate speech.wav
See the command-line interface guide for more details on the operations supported, and the configuration options reference for a comprehensive list of all options supported.
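If you want to drive the sample commands above from a Node.js script, a minimal sketch for assembling the argument list might look like this. `buildEchogardenArgs` is a hypothetical helper, not part of Echogarden, and it assumes only the `--name=value` option form seen in `--engine=kokoro`. Check the options reference for the option names that actually exist.

```typescript
// Hypothetical helper, not part of Echogarden.
// Builds a CLI argument list: operation, then input files or text, then options.
type OptionValue = string | number | boolean

function buildEchogardenArgs(
	operation: string,
	inputs: string[],
	options: Record<string, OptionValue> = {}
): string[] {
	// Assumption: options are written as `--name=value`, as in `--engine=kokoro`
	const optionArgs = Object.entries(options).map(
		([name, value]) => `--${name}=${value}`
	)

	return [operation, ...inputs, ...optionArgs]
}

const args = buildEchogardenArgs('speak-file', ['story.txt'], { engine: 'kokoro' })

console.log(['echogarden', ...args].join(' '))
// prints "echogarden speak-file story.txt --engine=kokoro"
```

The resulting array can be passed as the argument list to `spawn` from `node:child_process`. Keeping the arguments as an array avoids shell quoting issues for inputs such as `"Hello World!"`.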
If you are a developer, you can also directly import the package as a dependency in your code. The API operations and options closely mirror the CLI.
- Quick guide to the command-line interface
- Options reference
- Full list of all available engines
- Node.js API reference
- Enabling the CUDA ONNX execution provider
- Technical overview and Q&A
- How to help
- Setting up a development environment
- Developer's task list
- Release notes (for releases up to 1.0.0)
This project consolidates and builds upon the efforts of many different individuals and companies, and also contributes a number of original works.
Developed by Rotem Dan (IPA: /ˈʁɒːtem ˈdän/).
GNU General Public License v3
Licenses for components, models and other dependencies are detailed on this page.