
Speech Testing GUI

Ryan Petschek edited this page Jun 6, 2017 · 1 revision

Prerequisites

Speech Testing GUI

Launching the Speech Testing GUI

  • With a roscore started, run rosrun rqt_speech_testing rqt_speech_testing
    • If the GUI fails to start, you may have to run rqt --force-discover

Loading audio data from a .wav file for recognition

To load and recognize speech from a .wav file, click the open button, or enter the path to the file in the text field and press Enter. The .wav file will then be loaded and keyword recognition will run on it. The recognized speech is displayed in a tree format in the output window towards the bottom.

Audio can also be loaded from a folder of .wav files. This makes it easy to compare results on a dataset before and after tuning the underlying speech recognition system that the Speech Testing GUI interfaces with (the hlpr_speech_recognition Python module).

The .wav format:

For pocketsphinx to properly recognize the speech in your audio files, the .wav files must be exported at a sample rate of 16,000 Hz in the 16-bit signed PCM format. If you're looking for a good audio editor and recording program that can do this easily, check out Audacity.
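The format requirements above can be checked before loading a file into the GUI. The following is a minimal sketch using Python's standard `wave` module; the helper name `check_wav_format` is hypothetical and is not part of hlpr_speech_recognition. It only checks the sample rate and sample width, since those are the two properties stated above.

```python
import contextlib
import wave

EXPECTED_RATE_HZ = 16000        # sample rate stated above
EXPECTED_SAMPLE_WIDTH = 2       # bytes per sample, i.e. 16-bit PCM


def check_wav_format(path):
    """Return a list of format problems; an empty list means the file looks OK.

    Hypothetical helper for illustration only.
    """
    problems = []
    try:
        # contextlib.closing keeps this usable on Python 2.7 as well as 3.x
        with contextlib.closing(wave.open(path, "rb")) as wav:
            rate = wav.getframerate()
            width = wav.getsampwidth()
    except wave.Error as exc:
        # The wave module only reads uncompressed PCM data
        return ["not a readable PCM .wav file: %s" % exc]

    if rate != EXPECTED_RATE_HZ:
        problems.append("sample rate is %d Hz, expected %d Hz"
                        % (rate, EXPECTED_RATE_HZ))
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append("sample width is %d bit, expected %d bit"
                        % (width * 8, EXPECTED_SAMPLE_WIDTH * 8))
    return problems
```

Calling `check_wav_format("Close your hand.wav")` on a correctly exported file should return an empty list; anything else describes what to fix when re-exporting.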

Output window tree format:

  • Full path to the recognized file or Recording if live audio is being recognized (see below)
  • ISO 8601 timestamp of when recognition occurred and the recognized text
    • Note that the recognized text will always be in uppercase because that's how the keywords are defined within the hlpr_speech_recognition/data/ directory
    • Audio that contains speech or other sounds that can't be matched with sufficient confidence will return the recognized text string UNKNOWN

Recording live audio data for recognition

Audio data can also be recorded live from the Kinect or another microphone on your computer. First, ensure that it is the default input device and then start the Speech Testing GUI. Begin recording audio for recognition by clicking the "Record" button. A new entry will be added to the output window labeled "Recording". Any audio recognized will then be output in the same format described above. To stop audio recording, click the "Record" button again.

Actions on output

At the bottom of the window are two buttons for acting on the generated output. The first, "Clear output", empties the output view. The second exports the output view in JSON format, which makes it easy to diff results after tuning or to parse them from other scripts, systems, or applications.

For example, this is the exported JSON of the first screenshot:

[
    {
        "name": "/home/petschekr/Music/HLPR-Speech Test/Close your hand.wav", 
        "recognizedText": [
            {
                "timestamp": "2017-06-06 10:07:40", 
                "text": "CLOSE YOUR HAND"
            }
        ]
    }
]
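As a rough sketch of the before-and-after comparison mentioned above, two exported files could be compared like this. The function names are hypothetical, and the code assumes that every export has the `name`, `recognizedText`, and `text` keys shown in the example and that entry names are unique within one export (repeated "Recording" entries would overwrite each other). Timestamps are ignored because they differ between runs.

```python
import json


def load_export(path):
    """Map each entry's name to the list of text strings recognized for it."""
    with open(path) as f:
        entries = json.load(f)
    return {
        entry["name"]: [result["text"] for result in entry["recognizedText"]]
        for entry in entries
    }


def diff_exports(before, after):
    """Return {name: (texts_before, texts_after)} for entries that changed.

    An entry missing from one export shows up with None on that side.
    """
    changed = {}
    for name in sorted(set(before) | set(after)):
        if before.get(name) != after.get(name):
            changed[name] = (before.get(name), after.get(name))
    return changed
```

For example, `diff_exports(load_export("before.json"), load_export("after.json"))` returns an empty dictionary when tuning did not change any recognized text; the two file names here are placeholders.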

Troubleshooting

  • Speech recognition is handled by the SpeechRecognizer class within the hlpr_speech_recognition module and not by this GUI directly. To tweak the performance of pocketsphinx, you'll want to head there.
    • The pull request that adds this GUI also improved how SpeechRecognizer matches keyphrases. It now applies the recognition threshold relative to the other matches that the engine returns. Previously, recognition results were only accepted if their probability was over 100%, because the minimum probability (returned by the engine as the log10 of the actual probability) was set to 0 and 10^0 = 1.00. The absolute minimum threshold is now set to -1500, which seems to be sufficient to reject background noise but not speech.
  • Check the output of the terminal in which you started the Speech Testing GUI for additional, verbose information about what is going on during speech recognition:
    • When the recognition is restarted, its input arguments are printed
    • When matching a keyphrase, the engine will list possible matches in a list of tuples, e.g.: [('CLOSE YOUR HAND', -758, 3, 74)] where the tuples are in the format (phrase, log10 probability, start frame of match, end frame of match)
    • When matching a keyphrase, the engine will also print whether it detected a keyphrase with sufficient confidence, or whether it could not find a match (or wasn't confident enough in the result) and returned UNKNOWN
  • Weights were added to the keyphrase list to improve accuracy but these require further tuning. See hlpr_speech_recognition/data/kps.txt.
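
The tuple format and absolute minimum threshold described in the notes above can be explored offline with a small sketch like the one below. This is not the actual SpeechRecognizer logic: the function name is hypothetical, the comparison against the other returned matches is reduced here to simply taking the highest-scoring tuple, and whether the real threshold check is inclusive is an assumption.

```python
ABSOLUTE_MIN_SCORE = -1500   # absolute minimum threshold quoted above
UNKNOWN = "UNKNOWN"          # text reported when nothing is accepted


def pick_keyphrase(matches, min_score=ABSOLUTE_MIN_SCORE):
    """Pick a phrase from (phrase, log10 probability, start, end) tuples.

    Illustrative only: returns the highest-scoring phrase if its score
    clears min_score, otherwise UNKNOWN.
    """
    if not matches:
        return UNKNOWN
    # Scores are log probabilities, so a larger (less negative) value is better
    phrase, score, _start_frame, _end_frame = max(matches, key=lambda m: m[1])
    return phrase if score >= min_score else UNKNOWN
```

With the terminal output quoted above, `pick_keyphrase([('CLOSE YOUR HAND', -758, 3, 74)])` returns `'CLOSE YOUR HAND'`, while a single match scoring below -1500 would return `'UNKNOWN'`.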