Nala

Nala is a voice-assistant framework to quickly build and prototype voice assistants in <5 minutes within the greater context of the emerging large-language-model (LLM) landscape. With Nala you can easily intgrate with state-of-the-art (SOTA) transcription like Whisper API, text-to-speech synthesis engines like Microsoft's SpeechT5 model, and LLMs like Dolly-v2-3b within a nice front-end - across any arbitrary wake word powered with the Web Speech API.

Here are some of Nala's key features:

Extensible Architecture: Nala offers a flexible and modular, python-centric FastAPI architecture that allows developers to extend its functionality with ease. Integrate new response models or TTS voice skins into your projects effortlessly.
Native LLM Integration: Nala integrates directly with the the Dolly-v2-3b LLM model - and makes it easy for you to integrate with others using an easy-to-follow strategy with helper functions.
Multi-Platform Support: Nala is designed to work seamlessly across various platforms and operating systems (e.g. Mac/Linux and Chrome/Safari). Whether you're building web applications, mobile apps, or even IoT devices, Nala can be easily integrated into your technology stack.
Audio-to-Audio API: Nala's FastAPI design allows for you to submit an audio file and get back audio file responses through the query-response model; few projects out there exist to help guide you through how to do this, so this may help accelerate learning for your voice assistant projects.
Simple UI: Nala provides a simple user interface for users to quickly rate responses with thumbs up or thumbs-down to aid in building reinforcement learning models with Reinforcement Learning with Human Feedback.
Privacy and Security: Nala allows for downloads to be administered by superusers as specified in the settings.json - as well as authenticates users and sessions with standard JSON web tokens. Other features like encryption at rest, deletion of audio files, and other defaults are being worked on right now to preserve user privacy.

Note that this is a version 2.0, web-enabled version of a prior voice assistant app here.

getting started

mac (locally)

Install basic dependencies:

sudo apt-get install ffmpeg
git clone git@github.com:jim-schwoebel/nala_assistant.git
cd nala_assistant
virtualenv env 
source env/bin/activate
pip3 install -r requirements.txt
pip3 install git+https://github.com/suno-ai/bark.git

Generate a secret key for SESSION_SECRET, JWT_SECRET_KEY, JWT_REFRESH_SECRET_KEY and environment vars using the following line of code 3 times (save this in .env)

python -c 'import secrets; print(secrets.token_hex())'

Also, you need a WEB_URL and TERMS_URL for your website and the terms of use, accordingly. These also are in the .env file.

To open and edit .env file:

nano .env

Then run the app:

uvicorn app:app --reload

Note if you having trouble with the uvicorn app:app --reload command, you can try:

python3 -m uvicorn app:app --reload

And sometimes this make it work.

You will now be able to visit localhost (http://127.0.0.1:8000) to use application.

linux with GPU (locally)

Install basic dependencies:

sudo apt-get install ffmpeg
git clone git@github.com:jim-schwoebel/nala_assistant.git
cd nala_assistant
virtualenv env 
source env/bin/activate
pip3 install -r gpu_requirements.txt
pip3 install git+https://github.com/suno-ai/bark.git

Generate a secret key for SESSION_SECRET, JWT_SECRET_KEY, JWT_REFRESH_SECRET_KEY and environment vars using the following line of code 3 times (save this in .env)

python -c 'import secrets; print(secrets.token_hex())'

Also, you need a WEB_URL and TERMS_URL for your website and the terms of use, accordingly. These also are in the .env file.

To open and edit .env file:

nano .env

Then run the app:

uvicorn app:app --reload

You will now be able to visit localhost (http://127.0.0.1:8000) to use appication.

api docs (locally)

Once you have setup the app locally, you can get to the api docs @ http://127.0.0.1:8000/docs (for swagger docs) or http://127.0.0.1:8000/redoc (for redoc). The recommended set of docs to use is http://127.0.0.1:8000/docs (swagger) as there is greater support for authentication with JSON web tokens and audio-to-audio routes. A screenshot is shown below of the docs to give you an idea of what they look like. The auto-generated docs via FastAPI make it much easier to expand the routes to your particular need as a developer.

deploying to server (externally)

Follow these instructions to deploy on a server.

Buy a domain on namecheap.com.
Get a vultr account / forward DNS to cloudflare from domain. Note that you will need at least 1 NVIDIA V100 GPU to have a seamless user experience with the Bark model and various LLMs like Dolly.
Get a cert.pem and private.pem file on cloudflare for the server.
Create a virtual machine on vultr or a similar platform, forward CNAME on cloudflare to IP address of host.
Set up the server with at least 1 NVIDIA V100 GPU (e.g. pip3 install -r gpu_requirements.txt), as described in the linux with GPU (locally) section above.
Run the command on the server with uvicorn below.

Enable firewall rules for SSL (port 443)

sudo ufw allow 80
sudo ufw allow 443

nohup gunicorn --bind {ip_address}:443 main:app --certfile=cert.pem --keyfile=private.pem -w 10 --graceful-timeout 30 -t 30 --worker-class=uvicorn.workers.UvicornWorker --workers 10 </dev/null &>/dev/null &

</dev/null &>/dev/null & is a statement means that it is a background job, and you need to change [ip_address] with the right IP adddress.

settings

Here are the current settings that you can edit in th settings.json file:

{"website_name": "Nala",
    "wake_word": "hey", 
    "super_users": ["jim@schwoebel.me"],
    "audio_delete": {"default": false, "options": [true,false]},
    "sounds": {"default": "chime", "options": ["chime", "bell"]}, 
    "voice": {"default": "bark", "options": ["microsoft", "bark"]}, 
    "response_type": {"default": "dolly", "options": ["blender","dolly", "echo"]}, 
    "language": {"default": "en-us", "options": ["en-us"]}}

You can edit the website name, wakeword, super_users (registered users who can download data), sounds (after query), voice (response skin), response_type (e.g. LLM models), and language (e.g. en-us only supported for now) here in the file. Note that the options listed here are currently the only options provided in the repository, but they are easy-to-extend as a framework later in the helpers.py file.

browser compatibility

Currently, Nala works on Chrome and Safari-based browsers based on Web Speech API standards. If you load Nala on any other browser, it will give an error message like this.

Note that you can find a current list of browsers that support the Web Speech API here or in the figure below.

maintainers

This project was incubated as a result of the Erdos Fellowship program - and since has resulted in a larger independent initiative.

Here is a list of active maintainers to this project:

Jim - chief maintainer, Erdos Institute mentor
Jin - Erdos Institute fellow
Nathan - Erdos Institute fellow
Collin - Data scientist @ Indeed.com (project advisor)

If you'd like to help maintain this project, reach out to Jim Schwoebel @ jim@schwoebel.me and he can invite you to our weekly call to ship PRs and delegate work in our sprint cycle.

references

Here is a quick list of references for additional reading.

javascript front-end

audio.js - playback audio alternative (setting)
bootstrap icons - use bootstrap and bootstrap icons for javascript front-end
howlers.js - playback audio for assistant
recorder.js - to record audio files with bootstrap icon buttons
Web Speech API
wavesurfer.js - for enumerating last audio file generated in the browser

feature extraction

python_speech_features - audio feature extraction method useed

ML models (used)

Dolly-v2-3b - LLM (Databricks)
SpeechT5 model - text-to-speech synthesis (Microsoft)
Whisper API - speech-to-text (SOTA)
RLHF - human feedback.

future tools used

auth0 - authentication / tokens
minio - minio is an object storage platform

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Nala

getting started

mac (locally)

linux with GPU (locally)

api docs (locally)

deploying to server (externally)

settings

browser compatibility

maintainers

references

javascript front-end

feature extraction

ML models (used)

future tools used

Files

README.md

Latest commit

History

README.md

File metadata and controls

Nala

getting started

mac (locally)

linux with GPU (locally)

api docs (locally)

deploying to server (externally)

settings

browser compatibility

maintainers

references

javascript front-end

feature extraction

ML models (used)

future tools used