diff --git a/.gitbook/assets/keyword-spotting-product-guide/3d-pcb.png b/.gitbook/assets/synthetic-data-pipeline-keyword-spotting/3d-pcb.png similarity index 100% rename from .gitbook/assets/keyword-spotting-product-guide/3d-pcb.png rename to .gitbook/assets/synthetic-data-pipeline-keyword-spotting/3d-pcb.png diff --git a/.gitbook/assets/keyword-spotting-product-guide/classifier.png b/.gitbook/assets/synthetic-data-pipeline-keyword-spotting/classifier.png similarity index 100% rename from .gitbook/assets/keyword-spotting-product-guide/classifier.png rename to .gitbook/assets/synthetic-data-pipeline-keyword-spotting/classifier.png diff --git a/.gitbook/assets/keyword-spotting-product-guide/code.png b/.gitbook/assets/synthetic-data-pipeline-keyword-spotting/code.png similarity index 100% rename from .gitbook/assets/keyword-spotting-product-guide/code.png rename to .gitbook/assets/synthetic-data-pipeline-keyword-spotting/code.png diff --git a/.gitbook/assets/keyword-spotting-product-guide/cover.png b/.gitbook/assets/synthetic-data-pipeline-keyword-spotting/cover.png similarity index 100% rename from .gitbook/assets/keyword-spotting-product-guide/cover.png rename to .gitbook/assets/synthetic-data-pipeline-keyword-spotting/cover.png diff --git a/.gitbook/assets/keyword-spotting-product-guide/data-sources.png b/.gitbook/assets/synthetic-data-pipeline-keyword-spotting/data-sources.png similarity index 100% rename from .gitbook/assets/keyword-spotting-product-guide/data-sources.png rename to .gitbook/assets/synthetic-data-pipeline-keyword-spotting/data-sources.png diff --git a/.gitbook/assets/keyword-spotting-product-guide/deployment.png b/.gitbook/assets/synthetic-data-pipeline-keyword-spotting/deployment.png similarity index 100% rename from .gitbook/assets/keyword-spotting-product-guide/deployment.png rename to .gitbook/assets/synthetic-data-pipeline-keyword-spotting/deployment.png diff --git a/.gitbook/assets/keyword-spotting-product-guide/diagram.jpg 
b/.gitbook/assets/synthetic-data-pipeline-keyword-spotting/diagram.jpg similarity index 100% rename from .gitbook/assets/keyword-spotting-product-guide/diagram.jpg rename to .gitbook/assets/synthetic-data-pipeline-keyword-spotting/diagram.jpg diff --git a/.gitbook/assets/keyword-spotting-product-guide/generate-tts.jpg b/.gitbook/assets/synthetic-data-pipeline-keyword-spotting/generate-tts.jpg similarity index 100% rename from .gitbook/assets/keyword-spotting-product-guide/generate-tts.jpg rename to .gitbook/assets/synthetic-data-pipeline-keyword-spotting/generate-tts.jpg diff --git a/.gitbook/assets/keyword-spotting-product-guide/impulse-design.png b/.gitbook/assets/synthetic-data-pipeline-keyword-spotting/impulse-design.png similarity index 100% rename from .gitbook/assets/keyword-spotting-product-guide/impulse-design.png rename to .gitbook/assets/synthetic-data-pipeline-keyword-spotting/impulse-design.png diff --git a/.gitbook/assets/keyword-spotting-product-guide/nicla.webp b/.gitbook/assets/synthetic-data-pipeline-keyword-spotting/nicla.webp similarity index 100% rename from .gitbook/assets/keyword-spotting-product-guide/nicla.webp rename to .gitbook/assets/synthetic-data-pipeline-keyword-spotting/nicla.webp diff --git a/.gitbook/assets/keyword-spotting-product-guide/openai-api.png b/.gitbook/assets/synthetic-data-pipeline-keyword-spotting/openai-api.png similarity index 100% rename from .gitbook/assets/keyword-spotting-product-guide/openai-api.png rename to .gitbook/assets/synthetic-data-pipeline-keyword-spotting/openai-api.png diff --git a/.gitbook/assets/keyword-spotting-product-guide/pcb-order.png b/.gitbook/assets/synthetic-data-pipeline-keyword-spotting/pcb-order.png similarity index 100% rename from .gitbook/assets/keyword-spotting-product-guide/pcb-order.png rename to .gitbook/assets/synthetic-data-pipeline-keyword-spotting/pcb-order.png diff --git a/.gitbook/assets/keyword-spotting-product-guide/pcb.png 
b/.gitbook/assets/synthetic-data-pipeline-keyword-spotting/pcb.png similarity index 100% rename from .gitbook/assets/keyword-spotting-product-guide/pcb.png rename to .gitbook/assets/synthetic-data-pipeline-keyword-spotting/pcb.png diff --git a/.gitbook/assets/keyword-spotting-product-guide/processing-feature.png b/.gitbook/assets/synthetic-data-pipeline-keyword-spotting/processing-feature.png similarity index 100% rename from .gitbook/assets/keyword-spotting-product-guide/processing-feature.png rename to .gitbook/assets/synthetic-data-pipeline-keyword-spotting/processing-feature.png diff --git a/.gitbook/assets/keyword-spotting-product-guide/schematic.png b/.gitbook/assets/synthetic-data-pipeline-keyword-spotting/schematic.png similarity index 100% rename from .gitbook/assets/keyword-spotting-product-guide/schematic.png rename to .gitbook/assets/synthetic-data-pipeline-keyword-spotting/schematic.png diff --git a/.gitbook/assets/keyword-spotting-product-guide/small-kws.gif b/.gitbook/assets/synthetic-data-pipeline-keyword-spotting/small-kws.gif similarity index 100% rename from .gitbook/assets/keyword-spotting-product-guide/small-kws.gif rename to .gitbook/assets/synthetic-data-pipeline-keyword-spotting/small-kws.gif diff --git a/README.md b/README.md index 3624bb3..70fd339 100644 --- a/README.md +++ b/README.md @@ -126,7 +126,7 @@ Audio classification, keyword spotting, wakeword detection, or other machine lea * [Recognize Voice Commands with the Particle Photon 2](audio-projects/voice-commands-particle-photon-2.md) * [Voice Controlled Power Plug with Syntiant NDP120 (Nicla Voice)](audio-projects/voice-controlled-power-plug-nicla-voice.md) * [Determining Compressor State with Audio Classification - Avnet RaSynBoard](audio-projects/compressor-audio-classification-rasynboard.md) -* [Developing a Voice-Activated Product with Edge Impulse's Synthetic Data Pipeline](audio-projects/keyword-spotting-product-guide.md) +* [Developing a Voice-Activated Product with Edge 
Impulse's Synthetic Data Pipeline](audio-projects/synthetic-data-pipeline-keyword-spotting.md) ### Predictive Maintenance & Fault Classification diff --git a/SUMMARY.md b/SUMMARY.md index 2c8b9eb..3437aec 100644 --- a/SUMMARY.md +++ b/SUMMARY.md @@ -116,7 +116,7 @@ * [Recognize Voice Commands with the Particle Photon 2](audio-projects/voice-commands-particle-photon-2.md) * [Voice Controlled Power Plug with Syntiant NDP120 (Nicla Voice)](audio-projects/voice-controlled-power-plug-nicla-voice.md) * [Determining Compressor State with Audio Classification - Avnet RaSynBoard](audio-projects/compressor-audio-classification-rasynboard.md) -* [Developing a Voice-Activated Product with Edge Impulse's Synthetic Data Pipeline](audio-projects/keyword-spotting-product-guide.md) +* [Developing a Voice-Activated Product with Edge Impulse's Synthetic Data Pipeline](audio-projects/synthetic-data-pipeline-keyword-spotting.md) ## Predictive Maintenance & Fault Classification diff --git a/audio-projects/keyword-spotting-product-guide.md b/audio-projects/synthetic-data-pipeline-keyword-spotting.md similarity index 72% rename from audio-projects/keyword-spotting-product-guide.md rename to audio-projects/synthetic-data-pipeline-keyword-spotting.md index 33b3108..d3296d5 100644 --- a/audio-projects/keyword-spotting-product-guide.md +++ b/audio-projects/synthetic-data-pipeline-keyword-spotting.md @@ -1,6 +1,6 @@ --- description: >- - End-to-end synthetic data pipeline for the creation of a portable LED product equipped with keyword spotting (KWS) capabilities. The project serves as a comprehensive guide for the development of any KWS product, emphasizing the utilization of synthetic data for model training. + End-to-end synthetic data pipeline for the creation of a portable LED product equipped with keyword spotting capabilities. The project serves as a comprehensive guide for development of any KWS product, emphasizing the utilization of synthetic data for model training. 
--- # Developing a Voice-Activated Product with Edge Impulse's Synthetic Data Pipeline @@ -9,21 +9,21 @@ Created By: Samuel Alexander Public Project Link: [https://studio.edgeimpulse.com/public/379737/live](https://studio.edgeimpulse.com/public/379737/live) -Github Project Link: [https://github.com/SamuelAlexander/KWS-lumovoice-edge-impulse/tree/main](https://github.com/SamuelAlexander/KWS-lumovoice-edge-impulse/tree/main) +GitHub Project Link: [https://github.com/SamuelAlexander/KWS-lumovoice-edge-impulse/tree/main](https://github.com/SamuelAlexander/KWS-lumovoice-edge-impulse/tree/main) -![](../.gitbook/assets/keyword-spotting-product-guide/cover.png) +![](../.gitbook/assets/synthetic-data-pipeline-keyword-spotting/cover.png) ## Introduction -In the era of smart devices and voice-controlled technology, developing effective keyword spotting (KWS) systems is crucial for enhancing user experience and interaction. This documentation provides a comprehensive guide for creating a portable LED product with KWS capabilities, using Edge Impulse's end-to-end synthetic data pipeline. Synthetic voice/speech data, generated artificially rather than collected from real-world recordings, offers a scalable and cost-effective solution for training machine learning models. By leveraging AI text-to-speech technologies, we can create diverse and high-quality datasets tailored specifically for our KWS applications. This guide not only serves as a blueprint for building a responsive LED product but also lays the groundwork for a wide range of voice-activated devices, such as cameras that start recording on command, alarms that snooze with a keyword, or garage door that respond to voice prompts. +In the era of smart devices and voice-controlled technology, developing effective keyword spotting (KWS) systems is crucial for enhancing user experience and interaction. 
This documentation provides a comprehensive guide for creating a portable LED product with KWS capabilities, using Edge Impulse's end-to-end synthetic data pipeline. Synthetic voice/speech data, generated artificially rather than collected from real-world recordings, offers a scalable and cost-effective solution for training machine learning models. By leveraging AI text-to-speech technologies, we can create diverse and high-quality datasets tailored specifically for our KWS applications. This guide not only serves as a blueprint for building a responsive LED product but also lays the groundwork for a wide range of voice-activated devices, such as cameras that start recording on command, alarms that snooze with a keyword, or garage doors that respond to voice prompts. ## Problem Exploration -Traditional methods of training KWS models often rely on extensive datasets of human speech, which can be time-consuming and expensive to collect. Moreover, ensuring diversity and representation in these datasets can be challenging, leading to models that may not perform well across different accents, languages, and speaking environments. Synthetic data addresses these challenges by providing a controlled and flexible means of generating speech data. Using AI text-to-speech technology, we can produce vast amounts of speech data with varied voices, tones, and inflections, all tailored to the specific keywords we want our models to detect. +Traditional methods of training keyword spotting models often rely on extensive datasets of human speech, which can be time-consuming and expensive to collect. Moreover, ensuring diversity and representation in these datasets can be challenging, leading to models that may not perform well across different accents, languages, and speaking environments. Synthetic data addresses these challenges by providing a controlled and flexible means of generating speech data. 
Using AI text-to-speech technology, we can produce vast amounts of speech data with varied voices, tones, and inflections, all tailored to the specific keywords we want our models to detect. This approach opens up numerous possibilities for product development. For instance, a smart LED light can be designed to turn on or off in response to specific voice commands, enhancing convenience and accessibility. A camera can be programmed to start recording or take a group photo when a designated keyword is spoken, making it easier to capture moments without physical interaction. Similarly, an alarm system can be configured to snooze with a simple voice command, streamlining the user experience. By utilizing synthetic data, developers can create robust and versatile KWS models that power these innovative applications, ultimately leading to more intuitive and responsive smart devices. -![](../.gitbook/assets/keyword-spotting-product-guide/diagram.jpg) +![](../.gitbook/assets/synthetic-data-pipeline-keyword-spotting/diagram.jpg) ## Project Overview @@ -31,13 +31,13 @@ This project outlines the creation of a keyword spotting (KWS) product using Edg After training, the model is deployed onto the Arduino Nicla Voice hardware. A custom PCB and casing are designed to incorporate LED lights and power circuitry, ensuring portability and ease of use. This guide serves as a practical resource for developers looking to implement KWS functionality in voice-activated devices, demonstrating the efficiency of synthetic data in creating responsive and versatile products. -![](../.gitbook/assets/keyword-spotting-product-guide/small-kws.gif) +![](../.gitbook/assets/synthetic-data-pipeline-keyword-spotting/small-kws.gif) ### Hardware selection: Arduino Nicla Voice The Arduino Nicla Voice is an ideal choice for this project due to its use of the Syntiant NDP120, which offers great power efficiency for always-on listening. 
This efficiency allows the NDP120 to continuously monitor for keywords while consuming minimal power, making it perfect for battery-powered applications. Upon detecting a keyword, the NDP120 can notify the secondary microcontroller, Nordic Semiconductor nRF52832, which can then be programmed to control the lighting system. The compact size of the Nicla Voice also makes it easy to integrate into a small case with a battery. Furthermore, the Nicla Voice's standardized footprint simplifies the prototyping process, allowing for the easy creation of a custom PCB module with LED circuitry that can be easily connected. -![](../.gitbook/assets/keyword-spotting-product-guide/nicla.webp) +![](../.gitbook/assets/synthetic-data-pipeline-keyword-spotting/nicla.webp) ## Hardware Requirements @@ -52,27 +52,27 @@ The Arduino Nicla Voice is an ideal choice for this project due to its use of th ## Dataset Collection -### Creating an OpenAI API secret key +### Creating an OpenAI API Secret Key To create an OpenAI API secret key, start by visiting the [OpenAI website](https://www.openai.com/). If you don't have an account, sign up; otherwise, log in. Once logged in, navigate to the API section by clicking on your profile icon or the navigation menu and selecting "API" or "API Keys." In the API section, click on "Create New Key" or a similar button to generate a new API key. You may be prompted to name your API key for easy identification. After naming it, generate the key and it will be displayed to you. Copy the key immediately and store it securely, as it might not be visible again once you navigate away from the page. You can now use this API key in your applications to authenticate and access OpenAI services, for this project we will use the API key for generating synthetic voice data via Edge Impulse's transformation blocks. Ensure you keep your API key secret and do not expose it in client-side code or public repositories. 
You can manage your keys (regenerate, delete, or rename) in the API section of your OpenAI account. For more detailed instructions or troubleshooting, refer to the [OpenAI API documentation](https://beta.openai.com/docs/) or the help section on the OpenAI website. -![](../.gitbook/assets/keyword-spotting-product-guide/openai-api.png) +![](../.gitbook/assets/synthetic-data-pipeline-keyword-spotting/openai-api.png) -Once you have your secret key, you can navigate to your Edge Impulse organization page and enter your API key. Please also note that this Text to Speech Data Generation feature is only available for Edge Impulse Enterprise accounts only. +Once you have your secret key, you can navigate to your Edge Impulse organization page and enter your API key. Please also note that this Text to Speech Data Generation feature is only available for Edge Impulse Enterprise accounts. -### Generating TTS synthetic data +### Generating TTS Synthetic Data -Now that we have the environment configured out, our OpenAI API configured into the Edge Impulse Studio, we are ready to start a new project and begin generating some synthetic voice data. +Now that we have the environment configured, and our OpenAI API saved in the Edge Impulse Studio, we are ready to start a new project and begin generating some synthetic voice data. On your project's page select Data acquisition --> Data sources --> + Add new data source --> Transformation block --> Whisper Synthetic Voice Generator --> Fill out the details as follow: -![](../.gitbook/assets/keyword-spotting-product-guide/generate-tts.JPG) +![](../.gitbook/assets/synthetic-data-pipeline-keyword-spotting/generate-tts.jpg) **Phrase: illuminate** -We need to generate speech for the word illuminate and extinguish, we can start with illuminate for this 'action' first and then set up another data source 'action' for extinguish after we finish configuring this one. 
+We need to generate speech for the words "_illuminate_" and "_extinguish_", we can start with illuminate for this 'action' first and then set up another data source 'action' for extinguish after we finish configuring this one. **Label: illuminate** @@ -88,24 +88,23 @@ We want to create diversity of voice and accent in our dataset, so choose random **Model: tts-1** -I tested tts-1 and tts-1-hd, I think the quality difference is negligible for this case, but feel free to select either one. Note that tts-1-hd will cost you more OpenAI credit. +I tested tts-1 and tts-1-hd, I think the quality difference is negligible for this case, but feel free to select either one. Note that tts-1-hd will cost you more OpenAI credits. **Speed: 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2** -We want to vary the speed of the voice pronouncing the word we want to generate. 0.6 means 60% of its original speed, 0.6 - 1.2 should give enough range. +We want to vary the speed of the voice pronouncing the word we want to generate. 0.6 means 60% of its original speed, and 0.6 - 1.2 should give enough range. -Now you can run the action, if successful the tts voice generation should begin and it may take a few minutes. If the job failed you should be notified and you can recheck if the API key is entered correctly, then you can retry again. +Now you can run the action. If successful, the tts voice generation should begin and it may take a few minutes to complete. If the job failed you should be notified and you can recheck if the API key is entered correctly, then you can retry again. -Once satisfied with all the data generated, perform train and test split into approximately 80/20 ratio. - -![](../.gitbook/assets/keyword-spotting-product-guide/data-sources.png) +Once satisfied with all the data generated, perform a Train / Test split into approximately 80/20 ratio. 
+![](../.gitbook/assets/synthetic-data-pipeline-keyword-spotting/data-sources.png) ## Impulse Design -![](../.gitbook/assets/keyword-spotting-product-guide/impulse-design.png) +![](../.gitbook/assets/synthetic-data-pipeline-keyword-spotting/impulse-design.png) -The impulse design values are chosen for optimal keyword spotting performance. The 968 ms window size captures enough audio for accurate detection, while the 500 ms window increase balances responsiveness and efficiency. The 16000 Hz frequency is standard for capturing human voice, ensuring quality without excessive data. Using Audio (Syntiant) leverages the NDP120's capabilities for efficient digital signal processing. The Classification block distinguishes between commands, with output classes "extinguish," "illuminate," and "z_openset" allowing for control of the lighting system and handling unknown inputs. +The Impulse design values are chosen for optimal keyword spotting performance. The 968 ms window size captures enough audio for accurate detection, while the 500 ms window increase balances responsiveness and efficiency. The 16000 Hz frequency is standard for capturing human voice, ensuring quality without excessive data. Using the Audio (Syntiant) block leverages the NDP120's capabilities for efficient digital signal processing. The Classification block distinguishes between commands, with output classes "extinguish," "illuminate," and "z_openset" allowing for control of the lighting system and handling unknown inputs. - Window size: 968 ms - Window increase: 500 ms @@ -114,38 +113,41 @@ The impulse design values are chosen for optimal keyword spotting performance. 
T - Classification - Output: extinguish, illuminate, z_openset -![](../.gitbook/assets/keyword-spotting-product-guide/processing-feature.png) +![](../.gitbook/assets/synthetic-data-pipeline-keyword-spotting/processing-feature.png) + +Under Classifier, set the learning rate to 0.0005 and change the architecture preset to use **Dense Network**. -Under classifier, set the learning rate to 0.0005 and change the architecture preset to use **Dense Network**. +![](../.gitbook/assets/synthetic-data-pipeline-keyword-spotting/classifier.png) -![](../.gitbook/assets/keyword-spotting-product-guide/classifier.png) Our audio classifier gives in accuracy of 93.8% which is satisfactory. We can continue tuning the hyperparameters and try using some data augmentation, but for the purpose of this demonstration we are satisfied with the current result and can move to the deployment phase. ## Deployment Now the AI model is ready to be deployed to the Arduino Nicla Voice. Let's select the Arduino Nicla Voice deployment. -![](../.gitbook/assets/keyword-spotting-product-guide/deployment.png) +![](../.gitbook/assets/synthetic-data-pipeline-keyword-spotting/deployment.png) -After building our model, we'll get the new firmware. Follow this guide for more detail on how to flash the audio firmware: https://docs.edgeimpulse.com/docs/edge-ai-hardware/mcu-+-ai-accelerators/arduino-nicla-voice +After building our model, we'll get the new firmware. Follow this guide for more detail on how to flash the audio firmware: [https://docs.edgeimpulse.com/docs/edge-ai-hardware/mcu-+-ai-accelerators/arduino-nicla-voice](https://docs.edgeimpulse.com/docs/edge-ai-hardware/mcu-+-ai-accelerators/arduino-nicla-voice) Please note that we want to flash the firmware that you have just built, instead of the default audio firmware for the Nicla Voice's NDP120. -Once we have flashed the firmware, we can upload the Arduino code using the Arduino IDE. 
You can find the code on my Github repository here: https://github.com/SamuelAlexander/KWS-lumovoice-edge-impulse/tree/main +Once we have flashed the firmware, we can upload the Arduino code using the Arduino IDE. You can find the code on my GitHub repository here: [https://github.com/SamuelAlexander/KWS-lumovoice-edge-impulse/tree/main](https://github.com/SamuelAlexander/KWS-lumovoice-edge-impulse/tree/main) -Once the code is uploaded you can verify if everything works by saying out loud the word 'Illuminate' and 'Extinguish'. When the keyword 'Illuminate' is detected, the blue built-in led will blink and when the keyword 'Extinguish' is detected, the red built-in led will blink. +Once the code is uploaded you can verify if everything works by saying out loud the words 'Illuminate' and 'Extinguish'. When the keyword 'Illuminate' is detected, the blue built-in led will blink and when the keyword 'Extinguish' is detected, the red built-in led will blink. Next we will design and manufacture the PCB socket which holds the LED and power circuitry which can turn on when 'Illuminate' is detected and turn off when 'Extinguish' is detected. -## Designing and building the KWS product +## Designing and Building the KWS Product The schematic and PCB is designed using KiCAD. A single sided aluminum PCB is selected for this project due to its excellent thermal conductivity, which helps dissipate heat generated by the LEDs and other components, ensuring reliable performance and longevity. The design of this PCB is simple enough to make it possible to route using one side only. The schematic, pcb, and gerber (manufacturing) files are accessible in the project's github page. 
-![](../.gitbook/assets/keyword-spotting-product-guide/schematic.png) -![](../.gitbook/assets/keyword-spotting-product-guide/pcb.png) -![](../.gitbook/assets/keyword-spotting-product-guide/pcb-order.png) +![](../.gitbook/assets/synthetic-data-pipeline-keyword-spotting/schematic.png) + +![](../.gitbook/assets/synthetic-data-pipeline-keyword-spotting/pcb.png) + +![](../.gitbook/assets/synthetic-data-pipeline-keyword-spotting/pcb-order.png) | LCSC Part Number | Manufacture Part Number | Manufacturer | Package | Description | Order Qty. | Unit Price($) | Order Price($) | |--------------------|----------------------------------------------------------|--------------------------------------------|-----------|--------------------------------------------------------------------------------------------------------------------|------------|---------------|----------------| @@ -155,7 +157,7 @@ The schematic, pcb, and gerber (manufacturing) files are accessible in the proje | C5440143 | CS3225X7R476K160NRL | Samwha Capacitor | 1210 | 16V 47uF X7R ±10% 1210 Multilayer Ceramic Capacitors MLCC - SMD/SMT ROHS | 5 | 0.0765 | 0.38 | | C153338 | FCR1206J100RP05Z | Ever Ohms Tech | 1206 | 250mW Safety Resistor 200V ±5% 100Ω 1206 Chip Resistor - Surface Mount ROHS | 10 | 0.0541 | 0.54 | -![](../.gitbook/assets/keyword-spotting-product-guide/3d-pcb.png) +![](../.gitbook/assets/synthetic-data-pipeline-keyword-spotting/3d-pcb.png) ## Conclusion