Currently, MultiNet, Espressif's ESP32-based speech command recognition model, supports up to 100 Chinese speech commands. Support for English speech commands will be added in the next release of esp-sr.
This example demonstrates the basic process of recognizing Chinese speech commands with ESP32-LyraT-Mini. Please also see a flow diagram below.
For more information about ESP32-LyraT-Mini, please see ESP32-LyraT-Mini Getting Started Guide.
Navigate to Audio Media HAL, and configure the following parameters as instructed.
- Audio hardware board: select ESP32-Lyrat Mini V1.1;
- Audio codec chip: select CODEC IS ES8311;
- use external adc: select use es7243;
- Audio DSP chip: select No DSP chip.
Navigate to Component config -> ESP Speech Recognition, and configure the following parameters as instructed.
- Wake word engine: select WakeNet 5 (quantized);
- Wake word name: select hilexin (WakeNet5);
- speech commands recognition model to us: select MultiNet 1 (quantized);
- langugae: select chinese (MultiNet1);
- The number of speech commands: enter the number of speech command IDs;
- Add speech commands: add the speech commands in pinyin.
Then save the configuration and exit.
Currently, the MultiNet model predefines 4 speech commands. Users can also define their own speech commands and the number of speech command IDs in menuconfig -> Component config -> ESP Speech Recognition, under Add speech commands and The number of speech commands. Note that speech commands must be provided in Pinyin, with spaces between syllables. For example, the command “打开空调”, which means to turn on the air conditioner, should be provided as "da kai kong tiao".
- One speech command ID can correspond to multiple speech command phrases;
- Up to 100 speech command IDs or speech command phrases, including customized commands, are supported;
- Multiple phrases for one command ID must be separated by ','.
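The phrase format described above can be illustrated with a small C sketch. The helper `split_phrases` is hypothetical and is not part of esp-sr; it only shows how a comma-separated string of Pinyin phrases for one command ID breaks down into individual phrases. The second phrase in the usage note is made up for illustration.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PHRASES    8
#define MAX_PHRASE_LEN 64

/* Hypothetical helper (not an esp-sr API): splits a comma-separated
 * phrase string into individual phrases, trimming surrounding spaces.
 * Empty phrases and phrases that do not fit in MAX_PHRASE_LEN are skipped.
 * Returns the number of phrases stored in out. */
int split_phrases(const char *input, char out[][MAX_PHRASE_LEN], int max_out)
{
    int count = 0;
    const char *start = input;

    while (*start != '\0' && count < max_out) {
        /* Find the end of the current phrase: next ',' or end of string. */
        const char *end = strchr(start, ',');
        if (end == NULL) {
            end = start + strlen(start);
        }

        /* Trim leading and trailing spaces of this phrase. */
        const char *first = start;
        const char *last = end;
        while (first < last && *first == ' ') {
            first++;
        }
        while (last > first && *(last - 1) == ' ') {
            last--;
        }

        size_t len = (size_t)(last - first);
        if (len > 0 && len < MAX_PHRASE_LEN) {
            memcpy(out[count], first, len);
            out[count][len] = '\0';
            count++;
        }

        /* Skip past the separator, or stop at the terminating NUL. */
        start = (*end == ',') ? end + 1 : end;
    }
    return count;
}
```

For example, calling `split_phrases("da kai kong tiao, kai kong tiao", phrases, MAX_PHRASES)` with a `char phrases[MAX_PHRASES][MAX_PHRASE_LEN]` buffer yields two phrases that would both map to the same command ID.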
Users can define the action for each command ID in the void speech_commands_action(int command_id) function. For example:
void speech_commands_action(int command_id)
{
    printf("Commands ID: %d.\n", command_id);
    switch (command_id) {
    case 0:
        // action0();
        break;
    case 1:
        // action1();
        break;
    case 2:
        // action2();
        break;
    case 3:
        // action3();
        break;
    // ...
    default:
        break;
    }
}
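As the number of command IDs grows, a lookup table can be easier to maintain than a long switch statement. The following is a minimal sketch of that alternative, not the way the example itself is written. The handler names and the `last_action` variable are placeholders introduced for illustration; the mapping of IDs 0 to 3 follows the four predefined commands printed in the log.

```c
#include <stddef.h>
#include <stdio.h>

/* Illustration only: records which placeholder action ran last. */
static int last_action = -1;

/* Placeholder actions; a real application would drive peripherals here. */
static void air_conditioner_on(void)  { last_action = 0; } /* da kai kong tiao  */
static void air_conditioner_off(void) { last_action = 1; } /* guan bi kong tiao */
static void light_on(void)            { last_action = 2; } /* da kai dian deng  */
static void light_off(void)           { last_action = 3; } /* guan bi dian deng */

typedef void (*command_handler_t)(void);

/* The index into this table is the command ID. */
static const command_handler_t handlers[] = {
    air_conditioner_on,
    air_conditioner_off,
    light_on,
    light_off,
};

void speech_commands_action(int command_id)
{
    const size_t handler_count = sizeof(handlers) / sizeof(handlers[0]);

    printf("Commands ID: %d.\n", command_id);

    /* Ignore IDs that have no handler, like the default case of a switch. */
    if (command_id >= 0 && (size_t)command_id < handler_count) {
        handlers[command_id]();
    }
}
```

Adding a command then only requires a new handler function and one new table entry.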
Run make flash monitor to compile, flash and run this example, and check the output log:
Quantized wakeNet5: wakeNet5_v1_hilexin_5_0.95_0.90, mode:0
Quantized MN1
I (153) MN: ---------------------SPEECH COMMANDS---------------------
I (163) MN: Command ID0, phrase 0: da kai kong tiao
I (163) MN: Command ID1, phrase 1: guan bi kong tiao
I (173) MN: Command ID2, phrase 2: da kai dian deng
I (173) MN: Command ID3, phrase 3: guan bi dian deng
I (183) MN: ---------------------------------------------------------
chunk_num = 200
-----------awaits to be waken up-----------
Find the pre-defined wake word of the board in the printed log. In this example, the wake word is “Hi Lexin” ([Ləsɪ:n]).
Then, say “Hi Lexin” ([Ləsɪ:n]) to wake up the board, which then prints the following log:
hilexin DETECTED.
-----------------LISTENING-----------------
Then, the board enters the Listening status, waiting for new speech commands.
Currently, the MultiNet model has already defined 20 speech commands, which can be seen in MultiNet.
Now, you can give one speech command, for example, “打开空调 (turn on the air conditioner)”.
- If this command exists in the supported speech command list, the board prints out the command ID of this command in its log:
-----------------LISTENING-----------------
phrase ID: 0, prob: 0.866630
Commands ID: 0
-----------awaits to be waken up-----------
- If this command does not exist in the supported speech command list, the board prints the error message "cannot recognize any speech commands" in its log:
-----------------LISTENING-----------------
cannot recognize any speech commands
-----------awaits to be waken up-----------
Also, the board prints -----------awaits to be waken up----------- when it ends the current recognition cycle and re-enters the Waiting-for-wakeup status.
Notices:
The board can only stay in the Listening status for up to six seconds. After that, it ends the current recognition cycle and re-enters the Waiting-for-wakeup status. Therefore, you must give your speech command within six seconds after the board wakes up.
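The two statuses and the six-second limit can be pictured as a small state machine. The sketch below is an illustration under stated assumptions, not the example's actual implementation: the type and function names are hypothetical, the `wake_detected` and `command_id` arguments stand in for the results of the WakeNet and MultiNet models, and the sketch ends the cycle as soon as a command is reported, whereas the real example may only report a result at the end of the listening window.

```c
#include <stdbool.h>

#define FRAME_MS          30    /* duration of one audio frame          */
#define LISTEN_TIMEOUT_MS 6000  /* maximum time in the Listening status */
#define LISTEN_FRAMES     (LISTEN_TIMEOUT_MS / FRAME_MS)

typedef enum {
    STATE_WAITING_FOR_WAKEUP,
    STATE_LISTENING,
} board_state_t;

typedef struct {
    board_state_t state;
    int frames_listened; /* frames consumed since the wake word was detected */
} recognizer_t;

/* Advances the state machine by one audio frame.
 * wake_detected: stand-in for the wake word model's result on this frame.
 * command_id:    stand-in for the command model's result; negative means
 *                that no command has been recognized yet.
 * Returns the recognized command ID when a cycle ends with a command,
 * or -1 otherwise. */
int recognizer_step(recognizer_t *r, bool wake_detected, int command_id)
{
    if (r->state == STATE_WAITING_FOR_WAKEUP) {
        if (wake_detected) {
            r->state = STATE_LISTENING;
            r->frames_listened = 0;
        }
        return -1;
    }

    /* Listening status: count this frame against the six-second window. */
    r->frames_listened++;

    if (command_id >= 0) {
        r->state = STATE_WAITING_FOR_WAKEUP;
        return command_id;
    }

    if (r->frames_listened >= LISTEN_FRAMES) {
        /* Window expired without a command: end the recognition cycle. */
        r->state = STATE_WAITING_FOR_WAKEUP;
    }
    return -1;
}
```

Counting frames rather than reading a clock keeps the timeout tied to the amount of audio actually processed.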
You don't need any special-purpose boards to run the WakeNet and MultiNet examples. Currently, Espressif has launched several audio boards and one of them is ESP32-LyraT-Mini, which is what we use in this example.
For details on the initialization of the ESP32-LyraT-Mini board, please see the code in components/hardware_driver.
If you want to use a development board other than ESP32-LyraT-Mini, please go to esp-adf, Espressif's development framework for building audio applications based on ESP32 products, for more detailed information on hardware drivers.
The board enters the Waiting-for-wakeup status after starting up, during which it picks up audio data with the on-board microphone and feeds it to the WakeNet model frame by frame (30 ms, 16 kHz, 16 bit, mono).
Currently, you cannot customize the wake word yourself. Please contact us for such requests.
During recognition, the board feeds data frame by frame (30 ms, 16 kHz, 16 bit, mono) to the MultiNet model for six seconds. Then, the model compares the speech command received against the pre-defined commands in the list, and returns the command ID or an error message depending on the recognition result.
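The frame format above translates into concrete buffer sizes. The following C sketch works through the arithmetic; the macro and function names are chosen for illustration and do not come from esp-sr.

```c
#include <stddef.h>

#define SAMPLE_RATE_HZ   16000 /* 16 kHz         */
#define BITS_PER_SAMPLE  16    /* 16 bit         */
#define CHANNELS         1     /* mono           */
#define FRAME_MS         30    /* 30 ms frames   */
#define LISTEN_WINDOW_MS 6000  /* six seconds    */

/* Samples in one frame: 16 samples per millisecond * 30 ms = 480. */
size_t samples_per_frame(void)
{
    return (size_t)(SAMPLE_RATE_HZ / 1000) * FRAME_MS * CHANNELS;
}

/* Bytes in one frame: 480 samples * 2 bytes per sample = 960. */
size_t bytes_per_frame(void)
{
    return samples_per_frame() * (BITS_PER_SAMPLE / 8);
}

/* Frames fed to the model during one listening window: 6000 / 30 = 200. */
size_t frames_per_window(void)
{
    return LISTEN_WINDOW_MS / FRAME_MS;
}
```

The result of 200 frames per window appears to correspond to the `chunk_num = 200` line in the startup log, although the log itself does not state what that value means.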
Please see section 1.5 on how to customize your speech command.