Document audio nodes & packages (#50)

KuriRobot · Oct 16, 2018 · bf18985
1 parent 690fb1e
Showing 6 changed files with 325 additions and 111 deletions.
diff --git a/assets/images/reference/audio_pipeline.png b/assets/images/reference/audio_pipeline.png
diff --git a/reference/ros-nodes/audio-realtime.md b/reference/ros-nodes/audio-realtime.md
@@ -3,46 +3,134 @@ layout: reference
 title: audio_realtime
 category: node
 tags: 
-- ${tag}
-- ${tag}
-- ${tag}
+- realtime
+- echo cancellation
+- AEC
+- beam forming
+- audio
 ---
 
 ## Description
-${description}
+This node is responsible for processing real-time audio on PulseAudio sources 
+and writing processed audio to PulseAudio sinks. The processing that occurs on
+the audio includes dynamic gain control, echo cancellation, beam forming and
+spatial filtering, and noise reduction. The ``signalessence`` library
+implements the audio processing algorithms, and `audio_realtime` is a wrapper
+that connects PulseAudio buffers/streams to that library's APIs.
 
-## Dependencies
-${dependencies, if any}
+### Explanation of Acoustic Echo Cancellation (AEC) Pipeline
+
+![Audio Pipeline Diagram](/assets/images/reference/audio_pipeline.png)
+
+**Rcv Path**  
+Performs dynamic gain control, equalization, and crossover
+
+***system_out***  
+Identifiers:
+- "far end talker"
+- rin
+- PulseAudio default sink (for media playback)
 
-## Action API
-### Action Subscribed Topics
-``${topic}``  
-``${topic}``  
+***line_out***  
+Identifiers:
+- "loudspeaker"
+- rout
+- PulseAudio sink alsa_output.default
 
-### Action Published Topics
-``${topic}``  
-``${topic}``  
+**Tx Path**  
+Performs acoustic echo cancellation, beam forming and spatial filtering,
+noise reduction, and dynamic gain control
 
-## Subscribed Topics
-``${topic}``  
-``${topic}``  
-``${topic}``  
+***line_in***  
+Identifiers:
+- "near end talker" (sin), "reference signal" (refin)
+- sin + refin
+- microphone input + reference signal copied from line_out
+- PulseAudio source alsa_input.default
 
-## Published Topics
-``${topic}``  
-``${topic}``  
+***system_in***  
+Identifiers:
+- sout
+- PulseAudio default source (post-AEC)
+
+## Dependencies
+- PulseAudio
+- ``signalessence``
 
 ## Services
-``${service}``  
-``${service}``  
+#### `/speaker_phone/get_direction`  
+``GetDirection.srv``  
+Retrieves the direction vector of the most recent audio capture, in Cartesian
+coordinates, along with an angle in degrees (`relative_angle`).
+
+```sh 
+$ rosservice call /speaker_phone/get_direction 0 1000 0
+direction:
+  x: -97.0
+  y: -26.0
+  z: 0.0
+relative_angle: 195
+```
+
+#### `/speaker_phone/get_field`
+``GetField.srv``  
+Retrieves the value of a diagnostic field in the Signal Essence library
+
+```sh 
+$ rosservice call /speaker_phone/get_field sercv_rin_power_db_00
+json_value: -379.2977905273438
+```
 
-## Service Calls
-``${service}``  
-``${service}``  
+#### `/speaker_phone/list_fields`
+``ListFields.srv``  
+Lists all of the available diagnostic fields in the Signal Essence library, 
+including information about type, length, description and read/write mode
+
+```sh 
+$ rosservice call /speaker_phone/list_fields
+fields:
+  -
+    name: aecmon_sin_power_per_mic
+    type: float32
+    length: 4
+    description: ''
+    mode: read|write
+...
+```
+
+#### `/speaker_phone/set_field`
+``SetField.srv``  
+Writes a value to a diagnostic field in the Signal Essence library, provided
+the field is writable (see the `mode` reported by `/speaker_phone/list_fields`).
+> NOTE: There is a known bug that may prevent use of this service.
 
 ## Parameters
-``${parameter}``  
-``${parameter}``  
+#### `/audio_realtime/line_in`
+PulseAudio source associated with the microphone input.
+
+#### `/audio_realtime/line_out`
+PulseAudio sink associated with audio playback on the loudspeakers.
+
+#### `/audio_realtime/params/bulk_delay`
+Configured delay between audio playback and microphone recording, in samples.
+Kuri's XMOS firmware currently handles the loopback of audio playback, so the
+only delay is the time sound takes to travel through the air from the
+loudspeakers to the microphones (~1.5 ms). Because one sample spans 10 ms,
+this delay rounds to 0 samples, so `bulk_delay` is set to 0.
+
+#### `/audio_realtime/system_in`
+PulseAudio source or sink-monitor used by applications and higher-level Kuri
+software that require processed microphone input.
+
+#### `/audio_realtime/system_out`
+PulseAudio sink associated with audio sent by applications and higher-level 
+Kuri software for playback through the loudspeakers.
+
+#### `/audio_realtime/type`
+The name of the library used for audio processing. This should remain set to
+`signal-essence` for the stability of the audio system.
 
 ## Launch File
 ``audio_realtime.launch``  
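The `/speaker_phone/get_direction` example output pairs the direction vector (-97, -26) with a `relative_angle` of 195. Those numbers are consistent with the angle being `atan2(y, x)` converted to degrees and wrapped into [0, 360). That relationship is inferred from this single sample; the service definition does not state it. The following minimal shell sketch reproduces the inference with `awk`. The function name `direction_to_angle` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a direction vector (x, y) into a whole-degree
# angle in [0, 360). ASSUMPTION: relative_angle = atan2(y, x) in degrees,
# which matches the sample output of /speaker_phone/get_direction.
direction_to_angle() {
  awk -v x="$1" -v y="$2" 'BEGIN {
    pi = atan2(0, -1)               # awk has no built-in pi constant
    deg = atan2(y, x) * 180 / pi    # result is in (-180, 180]
    if (deg < 0) deg += 360         # wrap into [0, 360)
    r = int(deg + 0.5)              # round to the nearest degree
    if (r == 360) r = 0             # e.g. 359.7 rounds back to 0
    print r
  }'
}

direction_to_angle -97 -26   # sample vector from get_direction
```

On a robot, `x` and `y` would come from the `direction` field of the service response.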
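The `/audio_realtime/params/bulk_delay` reasoning reduces to dividing the acoustic delay by the 10 ms sample length and keeping the whole number of samples. The sketch below assumes partial samples are rounded down, because the text only says that ~1.5 ms corresponds to 0. The name `delay_to_bulk_delay` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: express an acoustic delay (milliseconds) as a
# bulk_delay in whole samples. ASSUMPTIONS: one "sample" spans 10 ms
# unless a second argument overrides it, and partial samples round down.
delay_to_bulk_delay() {
  delay_ms=$1
  block_ms=${2:-10}
  awk -v d="$delay_ms" -v b="$block_ms" 'BEGIN { print int(d / b) }'
}

delay_to_bulk_delay 1.5   # ~1.5 ms loudspeaker-to-microphone travel time
```

The result could then be applied with `rosparam set /audio_realtime/params/bulk_delay 0`. This reference does not say whether `audio_realtime` re-reads the parameter without being relaunched.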
diff --git a/reference/ros-nodes/audio-voice-delegate.md b/reference/ros-nodes/audio-voice-delegate.md
@@ -3,46 +3,193 @@ layout: reference
 title: audio_voice_delegate
 category: node
 tags: 
-- ${tag}
-- ${tag}
-- ${tag}
+- wake word
+- hey kuri
+- voice commands
+- transcription
 ---
 
 ## Description
-${description}
+This node is responsible for handling Kuri voice interactions. Specifically, 
+`audio_voice_delegate` has two responsibilities:
+- determining if a wake word has been spoken
+- managing audio transcription and voice command matching 
 
-## Dependencies
-${dependencies, if any}
-
-## Action API
-### Action Subscribed Topics
-``${topic}``  
-``${topic}``  
+When a wake word is detected, the node sends audio to the
+[Houndify](https://www.houndify.com/) service to determine the spoken command,
+then publishes the results on the ``/audio/voice_delegate/exchange`` topic.
 
-### Action Published Topics
-``${topic}``  
-``${topic}``  
+## Dependencies
+- PulseAudio
+- ``soundhound_kuri``
 
 ## Subscribed Topics
-``${topic}``  
-``${topic}``  
-``${topic}``  
+- ``/image_wp_server/waypoint_update``    
+- ``/voice_commands/update``  
 
 ## Published Topics
-``${topic}``  
-``${topic}``  
+#### `/audio/voice_delegate/asleep`
+``Asleep.msg``   
+When audio enters the "asleep" state, either because `awake_timeout` expires
+while transcribing or because ``/audio/voice_delegate/snooze`` is called, an
+empty message is published.
+
+#### `/audio/voice_delegate/awake`
+``Awake.msg``  
+When audio enters the "awake" state, either because a wake word is detected or
+because ``/audio/voice_delegate/wake_up`` is called, a message is published. If
+a wake word triggered the state change, the message includes a non-zero
+direction vector for the observed audio source.
+
+```sh 
+$ rostopic echo /audio/voice_delegate/awake
+direction:
+  x: -97.0
+  y: 26.0
+  z: 0.0
+---
+```
+
+#### `/audio/voice_delegate/exchange`
+``Exchange.msg``  
+When Houndify returns a transcription or command, a message is published
+containing the matched command(s), the raw transcription, and any known error
+message.
+
+```sh 
+$ rostopic echo /audio/voice_delegate/exchange
+commands:
+  -
+    name: CustomCommand
+    params:
+      -
+        k: name
+        v: Happy Birthday Song
+error: ''
+transcription: it's my birthday
+---
+commands:
+  -
+    name: StopCommand
+    params: []
+error: ''
+transcription: stop
+---
+commands:
+  -
+    name: TurnCommand
+    params:
+      -
+        k: direction
+        v: -90
+error: ''
+transcription: turn left
+---
+```
 
 ## Services
-``${service}``  
-``${service}``  
+#### `/audio/voice_delegate/deafen`
+``Deafen.srv``    
+Sets mayfield_audio to the "deaf" state. This is the most common way to make
+Kuri temporarily ignore wake words and voice commands. Kuri must be returned to
+the "asleep" state with a service call (for example
+``/audio/voice_delegate/snooze``) before wake words are recognized again.
+
+```sh
+$ rosservice call /audio/voice_delegate/deafen
+state: deaf
+```
+
+#### `/audio/voice_delegate/snooze` 
+``Snooze.srv``  
+Sets mayfield_audio to the "asleep" listening state. This effectively ends any 
+ongoing voice command transcriptions and returns Kuri to a state of listening 
+for the wake word.
+
+```sh
+$ rosservice call /audio/voice_delegate/snooze
+state: asleep
+```
+
+#### `/audio/voice_delegate/stat`  
+``Stat.srv``  
+Reports the current audio status of mayfield_audio. Possible states include 
+"awake" (transcribing audio), "asleep" (listening for wake words), and "deaf" 
+(ignoring all audio inputs).
+
+```sh 
+$ rosservice call /audio/voice_delegate/stat
+state: asleep
+direction: -97.000000, 26.000000, 0.000000
+```
+
+#### `/audio/voice_delegate/wake_up` 
+``WakeUp.srv``  
+Sets mayfield_audio to the "awake" state. This service makes Kuri start
+transcribing audio in an attempt to find a voice command match. Transcription
+times out after ``/audio_voice_delegate/awake_timeout`` seconds, and Kuri then
+returns to the "asleep" state (listening for the wake word).
+
+```sh 
+$ rosservice call /audio/voice_delegate/wake_up
+state: awake
+```
 
 ## Service Calls
-``${service}``  
-``${service}``  
+- ``/speaker_phone/get_direction``  
 
 ## Parameters
-``${parameter}``  
-``${parameter}``  
+#### `/audio_voice_delegate/agent/params/client_id`
+The client ID issued by Houndify, which identifies the account and its
+subscribed domains. Defaults to the Mayfield client/account.
+
+#### `/audio_voice_delegate/agent/params/client_key`
+The client key associated with the Houndify client ID above. Defaults to the
+Mayfield client key.
+
+#### `/audio_voice_delegate/agent/params/user_id`
+A unique identifier for the robot, used to authenticate the robot with the
+Houndify service and to improve location name matching for "go to" voice
+commands. Defaults to the Kuri UUID.
+
+#### `/audio_voice_delegate/agent/type`
+Identifier for the voice transcription service to use. This should remain set
+to `soundhound` for the stability of voice commands.
+
+#### `/audio_voice_delegate/awake_timeout`
+The duration of audio, in seconds, to send for transcription before ending it
+and returning to the "asleep" state.
+
+#### `/audio_voice_delegate/capture/params/channels`
+The number of channels associated with the audio buffer used for listening
+
+#### `/audio_voice_delegate/capture/params/device`
+PulseAudio source or sink-monitor associated with the processed audio input,
+used for wake word detection and audio transcription
+
+#### `/audio_voice_delegate/capture/params/latency_msecs`
+The acceptable latency, in milliseconds, used to configure the audio capture
+stream for listening.
+
+#### `/audio_voice_delegate/capture/params/sample_rate`
+The sample rate of the audio buffer used for listening
+
+#### `/audio_voice_delegate/capture/type`
+System audio API type. Defaults to `pulse`
+
+#### `/audio_voice_delegate/debug`
+Deprecated (unused)
+
+#### `/audio_voice_delegate/direction`
+Deprecated (unused)
+
+#### `/audio_voice_delegate/wake/type`
+Identifier for the wake word library to use. This should remain set to
+`okhound` for the stability of wake word detection.
+
+#### `/audio_voice_delegate/wake_threshold`
+Deprecated (unused) 
 
 ## Launch File
 ``audio_voice_delegate.launch``
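The `audio_voice_delegate` services and topics describe three states ("asleep", "awake", "deaf") and the events that move between them. The shell function below is a hypothetical model of those transitions, written only from the descriptions in this reference. It is not the node's implementation. Two behaviors in it are assumptions: `wake_up` working from every state, and a wake word heard while already "awake".

```shell
#!/bin/sh
# Hypothetical model of audio_voice_delegate state transitions.
# Usage: next_state <current_state> <event>
# Events: wake_word, wake_up, snooze, deafen, awake_timeout
next_state() {
  state=$1
  event=$2
  case "$event" in
    deafen)  echo deaf ;;    # /audio/voice_delegate/deafen
    snooze)  echo asleep ;;  # /audio/voice_delegate/snooze
    wake_up) echo awake ;;   # /audio/voice_delegate/wake_up (ASSUMED valid from any state)
    wake_word)
      # "deaf" ignores wake words; otherwise the node is (ASSUMED to stay) awake.
      if [ "$state" = deaf ]; then echo deaf; else echo awake; fi ;;
    awake_timeout)
      # awake_timeout ends transcription and returns to listening.
      if [ "$state" = awake ]; then echo asleep; else echo "$state"; fi ;;
    *) echo "unknown event: $event" >&2; return 1 ;;
  esac
}

next_state deaf wake_word     # deaf: wake words are ignored
next_state deaf snooze        # asleep: listening for the wake word again
next_state asleep wake_word   # awake: transcribing
```

The model mainly makes one documented rule visible. After `deafen`, wake words stay ignored until a service call such as `snooze` returns the node to "asleep".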
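The `rostopic echo` output shown for `/audio/voice_delegate/exchange` can be reduced to one line per message, pairing each transcription with the command(s) Houndify matched. The minimal `awk` sketch below assumes the exact layout of that sample: command names are indented four spaces and messages are separated by `---`. `summarize_exchange` is a hypothetical name. A subscriber that reads the `Exchange.msg` fields directly would not depend on that text layout.

```shell
#!/bin/sh
# Hypothetical helper: summarize rostopic echo output for
# /audio/voice_delegate/exchange as "transcription|command1,command2".
# ASSUMPTION: the layout matches the sample in this reference (command
# names indented four spaces, messages separated by "---").
summarize_exchange() {
  awk '
    /^    name: / { sub(/^    name: /, ""); cmds = (cmds == "" ? $0 : cmds "," $0); next }
    /^transcription: / { sub(/^transcription: /, ""); text = $0; next }
    /^---$/ { print text "|" cmds; cmds = ""; text = "" }
  '
}

summarize_exchange <<'EOF'
commands:
  -
    name: StopCommand
    params: []
error: ''
transcription: stop
---
commands:
  -
    name: TurnCommand
    params:
      -
        k: direction
        v: -90
error: ''
transcription: turn left
---
EOF
```

On a robot this could be run as `rostopic echo /audio/voice_delegate/exchange | summarize_exchange`.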