kenarsa committed May 16, 2024
1 parent 3958010 commit 2b8a4c9
recipes/llm-voice-assistant/python/README.md

## AccessKey

AccessKey is your authentication and authorization token for deploying Picovoice SDKs, including picoLLM. Anyone who is
using Picovoice needs to have a valid AccessKey. You must keep your AccessKey secret. You need internet connectivity to
validate your AccessKey with Picovoice license servers, even though LLM inference runs 100% offline and is completely
free for open-weight models. Everyone who signs up for [Picovoice Console](https://console.picovoice.ai/) receives a
unique AccessKey.

## picoLLM Model

picoLLM Inference Engine supports many open-weight models. The models are available for download on
[Picovoice Console](https://console.picovoice.ai/).

## Usage

Install the required packages:

```console
pip install -r requirements.txt
```

Run the demo:

```console
python3 main.py --access_key ${ACCESS_KEY} --picollm_model_path ${PICOLLM_MODEL_PATH}
```

Replace `${ACCESS_KEY}` with your AccessKey obtained from Picovoice Console and `${PICOLLM_MODEL_PATH}` with the path to
the model you downloaded from Picovoice Console.
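If you launch the demo from a script, one way to keep your AccessKey out of files you commit is to read it from the
environment. The sketch below is a hypothetical helper (not part of the demo) that assumes the key and model path are
stored in environment variables named `ACCESS_KEY` and `PICOLLM_MODEL_PATH`, and builds the same command line shown
above. The `--keyword_model_path` and `--profile` options it can append are described in the sections below.

```python
from typing import List, Mapping, Optional


def build_demo_command(
    env: Mapping[str, str],
    keyword_model_path: Optional[str] = None,
    profile: bool = False,
) -> List[str]:
    """Build the argv for `main.py` from environment variables.

    Hypothetical helper: assumes the AccessKey and model path are stored in
    the ACCESS_KEY and PICOLLM_MODEL_PATH environment variables.
    """
    # Fail fast with a clear message instead of passing an empty value.
    missing = [name for name in ("ACCESS_KEY", "PICOLLM_MODEL_PATH") if not env.get(name)]
    if missing:
        raise ValueError(f"missing environment variable(s): {', '.join(missing)}")

    cmd = [
        "python3", "main.py",
        "--access_key", env["ACCESS_KEY"],
        "--picollm_model_path", env["PICOLLM_MODEL_PATH"],
    ]
    # Optional flags documented in this README.
    if keyword_model_path is not None:
        cmd += ["--keyword_model_path", keyword_model_path]
    if profile:
        cmd.append("--profile")
    return cmd
```

For example, `subprocess.run(build_demo_command(os.environ), check=True)` runs the demo from the current directory.
Note that command-line arguments can still be visible to other users of the same machine (e.g., via `ps`).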

To see all available options, type the following:

```console
python3 main.py --help
```

## Custom Wake Word (Optional)

The demo's default wake phrase is `Picovoice`. You can generate your custom (branded) wake word using Picovoice Console
by following the [Porcupine Wake Word documentation](https://picovoice.ai/docs/porcupine/). Once you have trained the
model, pass it to the demo application using the `--keyword_model_path` argument.

## Profiling

To see the runtime profiling metrics, run the demo with the `--profile` argument:

```console
python3 main.py --access_key ${ACCESS_KEY} --picollm_model_path ${PICOLLM_MODEL_PATH} --profile
```

Replace `${ACCESS_KEY}` with your AccessKey obtained from Picovoice Console and `${PICOLLM_MODEL_PATH}` with the path to
the model you downloaded from Picovoice Console.

The demo profiles three metrics: Real-time Factor (RTF), Tokens per Second (TPS), and Latency.

### Real-time Factor (RTF)

RTF is a standard metric for measuring the speed of speech processing (e.g., wake word detection, speech-to-text, and
text-to-speech). RTF is the CPU time divided by the length of the processed (recognized or synthesized) audio. Hence, a
lower RTF means a more efficient engine.
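As a worked example of this definition: an engine that spends 0.5 s of CPU time processing 10 s of audio has an RTF of
0.5 / 10 = 0.05. The sketch below is an illustrative helper, not the demo's own profiler. `process_frame` is a
hypothetical stand-in for any engine call that consumes one frame of audio samples, and CPU time is taken from Python's
`time.process_time()`.

```python
import time
from typing import Callable, Iterable, Sequence


def real_time_factor(cpu_seconds: float, audio_seconds: float) -> float:
    """RTF = CPU time spent processing / length of the audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio length must be positive")
    return cpu_seconds / audio_seconds


def measure_rtf(
    process_frame: Callable[[Sequence[int]], object],
    frames: Iterable[Sequence[int]],
    sample_rate: int,
) -> float:
    """Run the hypothetical `process_frame` over `frames` and return the RTF.

    time.process_time() counts CPU time of the whole process (all threads),
    not wall-clock time, matching the CPU-time definition of RTF.
    """
    cpu_seconds = 0.0
    num_samples = 0
    for frame in frames:
        start = time.process_time()
        process_frame(frame)
        cpu_seconds += time.process_time() - start
        num_samples += len(frame)
    # Audio length in seconds = samples / samples-per-second.
    return real_time_factor(cpu_seconds, num_samples / sample_rate)
```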

### Tokens per Second (TPS)

Tokens per second (TPS) is the standard metric for measuring the speed of LLM inference engines. TPS is the number of
generated tokens divided by the compute time used to generate them. A higher TPS is better.
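A minimal sketch of the TPS computation, assuming tokens arrive as an iterable stream (`token_stream` is a hypothetical
stand-in for an LLM's streamed output). It uses wall-clock time from `time.perf_counter()` as an approximation of
compute time; that choice is an assumption, and the demo's own profiler may measure time differently.

```python
import time
from typing import Iterable


def tokens_per_second(num_tokens: int, compute_seconds: float) -> float:
    """TPS = number of generated tokens / time spent generating them."""
    if compute_seconds <= 0:
        raise ValueError("compute time must be positive")
    return num_tokens / compute_seconds


def measure_tps(token_stream: Iterable[str]) -> float:
    """Consume a (hypothetical) token stream and return its TPS.

    Wall-clock time is measured from just before the first token is
    requested until the last token has been received.
    """
    start = time.perf_counter()
    num_tokens = sum(1 for _ in token_stream)
    elapsed = time.perf_counter() - start
    return tokens_per_second(num_tokens, elapsed)
```

For example, 100 tokens generated in 4 s gives `tokens_per_second(100, 4.0) == 25.0`.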

### Latency

We measure the latency as the delay between the end of the user's utterance (i.e., the time when the user finishes talking) and the
time that the voice assistant generates the first chunk of the audio response (i.e., when the user starts hearing the response).
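One way to take this measurement is to timestamp both events with a monotonic clock. In the sketch below,
`LatencyTimer` and its two hook methods are hypothetical; you would call them from wherever your pipeline detects the
end of the user's utterance and emits the first synthesized audio chunk.

```python
import time
from typing import Optional


class LatencyTimer:
    """Measures the delay from end of utterance to the first audio chunk.

    Hypothetical hooks: call `mark_end_of_utterance()` when speech-to-text
    reports that the user stopped talking, and `mark_first_audio_chunk()`
    each time a synthesized chunk is sent to the audio output.
    """

    def __init__(self) -> None:
        self._end_of_utterance: Optional[float] = None
        self.latency_seconds: Optional[float] = None

    def mark_end_of_utterance(self) -> None:
        # perf_counter is monotonic, so system clock changes don't skew results.
        self._end_of_utterance = time.perf_counter()
        self.latency_seconds = None

    def mark_first_audio_chunk(self) -> None:
        # Only the first chunk after an utterance counts; later chunks are ignored.
        if self._end_of_utterance is not None and self.latency_seconds is None:
            self.latency_seconds = time.perf_counter() - self._end_of_utterance
```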
