Document audio nodes & packages (#50)
cjwilliams authored Oct 16, 2018
1 parent 690fb1e commit bf18985
Showing 6 changed files with 325 additions and 111 deletions.
Binary file added assets/images/reference/audio_pipeline.png
142 changes: 115 additions & 27 deletions reference/ros-nodes/audio-realtime.md
@@ -3,46 +3,134 @@ layout: reference
title: audio_realtime
category: node
tags:
- ${tag}
- ${tag}
- ${tag}
- realtime
- echo cancellation
- AEC
- beam forming
- audio
---

## Description
${description}
This node processes real-time audio from PulseAudio sources and writes the
processed audio to PulseAudio sinks. Processing includes dynamic gain control,
echo cancellation, beam forming and spatial filtering, and noise reduction. The
``signalessence`` library is a dependency that handles the algorithmic
complexity of the audio processing; `audio_realtime` is a wrapper that manages
the interface between PulseAudio buffers/streams and the library's APIs.

## Dependencies
${dependencies, if any}
### Explanation of Acoustic Echo Cancellation (AEC) Pipeline

![Audio Pipeline Diagram](/assets/images/reference/audio_pipeline.png)

**Rcv Path**
Performs dynamic gain control, equalization, and crossover

***system_out***
Identifiers:
- "far end talker"
- rin
- PulseAudio default sink (for media playback)

## Action API
### Action Subscribed Topics
``${topic}``
``${topic}``
***line_out***
Identifiers:
- "loudspeaker"
- rout
- PulseAudio sink alsa_output.default

### Action Published Topics
``${topic}``
``${topic}``
**Tx Path**
Performs acoustic echo cancellation, beam forming and spatial filtering,
noise reduction, and dynamic gain control

## Subscribed Topics
``${topic}``
``${topic}``
``${topic}``
***line_in***
Identifiers:
- "near end talker" (sin), "reference signal" (refin)
- sin + refin
- microphone input + reference signal copied from line_out
- PulseAudio source alsa_input.default

## Published Topics
``${topic}``
``${topic}``
***system_in***
Identifiers:
- sout
- PulseAudio default source (post-AEC)

## Dependencies
- PulseAudio
- ``signalessence``

## Services
``${service}``
``${service}``
#### `/speaker_phone/get_direction`
``GetDirection.srv``
Retrieves the direction vector of the most recent audio capture in Cartesian
coordinates, along with a relative angle in degrees.

```sh
$ rosservice call /speaker_phone/get_direction 0 1000 0
direction:
x: -97.0
y: -26.0
z: 0.0
relative_angle: 195
```
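The sample response above is consistent with `relative_angle` being the angle of the (x, y) direction vector, normalized to the range 0 to 360 degrees. That reading is inferred from the single example shown and is not a documented definition; the helper name `relative_angle_deg` is hypothetical.

```python
import math


def relative_angle_deg(x, y):
    """Angle of the (x, y) vector in degrees, normalized to [0, 360).

    Assumption: relative_angle follows the atan2 convention. This is
    inferred from the sample output, not taken from the node's source.
    """
    return math.degrees(math.atan2(y, x)) % 360.0


# Direction vector from the sample get_direction response.
angle = relative_angle_deg(-97.0, -26.0)
print(round(angle))  # prints 195, matching relative_angle in the sample
```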

#### `/speaker_phone/get_field`
``GetField.srv``
Retrieves the value of a diagnostic field in the Signal Essence library

```sh
$ rosservice call /speaker_phone/get_field sercv_rin_power_db_00
json_value: -379.2977905273438
```
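The response field is named `json_value`, which suggests the value is JSON-encoded. Assuming that is the case, a client could decode it with the standard library as sketched here; `decode_field_value` is a hypothetical helper, not part of the node.

```python
import json


def decode_field_value(json_value):
    """Decode a get_field response value.

    Assumption: json_value holds a JSON-encoded string, as the field
    name suggests. Scalars and arrays are both handled by json.loads.
    """
    return json.loads(json_value)


# Value copied from the sample response for sercv_rin_power_db_00.
power_db = decode_field_value("-379.2977905273438")
print(type(power_db).__name__, power_db)
```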

## Service Calls
``${service}``
``${service}``
#### `/speaker_phone/list_fields`
``ListFields.srv``
Lists all of the available diagnostic fields in the Signal Essence library,
including information about type, length, description and read/write mode

```sh
$ rosservice call /speaker_phone/list_fields
fields:
-
name: aecmon_sin_power_per_mic
type: float32
length: 4
description: ''
mode: read|write
...
```
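The `mode` string in the listing above can be used to check whether a field accepts writes before attempting to change it. The sketch below assumes modes are separated by `|` and that a read-only field reports `read`; both are inferred from the single sample entry, and the function names are hypothetical.

```python
def field_modes(mode):
    """Split a mode string such as 'read|write' into a set of flags."""
    return {flag.strip() for flag in mode.split("|") if flag.strip()}


def is_writable(mode):
    """True if the mode string includes the 'write' flag (assumed name)."""
    return "write" in field_modes(mode)


print(is_writable("read|write"))  # mode from the sample entry
print(is_writable("read"))        # assumed form of a read-only field
```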

#### `/speaker_phone/set_field`
``SetField.srv``
Writes a value to a diagnostic field in the Signal Essence library, assuming
the field is writable
> NOTE: There is a known bug that may prevent use of this service
## Parameters
``${parameter}``
``${parameter}``
#### `/audio_realtime/line_in`
PulseAudio source associated with the microphone input.

#### `/audio_realtime/line_out`
PulseAudio sink associated with audio playback on the loudspeakers.

#### `/audio_realtime/params/bulk_delay`
Configured delay, in samples, between audio playback and microphone recording.
Kuri's XMOS firmware currently handles the loopback of audio playback, so the
only remaining delay is the time sound takes to travel through the air from
the loudspeakers to the microphones (~1.5ms). With a sample size of 10ms, this
is equivalent to 0 samples, so `bulk_delay` is 0.
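The arithmetic behind that value can be sketched as follows, assuming `bulk_delay` counts the whole 10ms units that fit inside the measured delay. The function name and the round-down rule are assumptions for illustration.

```python
import math


def bulk_delay_units(delay_ms, unit_ms=10.0):
    """Number of whole units of unit_ms that fit inside delay_ms.

    Assumption: bulk_delay is expressed in whole 10 ms units and a
    partial unit rounds down.
    """
    return math.floor(delay_ms / unit_ms)


# ~1.5 ms of acoustic travel time is less than one 10 ms unit.
print(bulk_delay_units(1.5))  # prints 0
```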

#### `/audio_realtime/system_in`
PulseAudio source or sink-monitor used by applications and higher-level Kuri
software that requires processed microphone input.

#### `/audio_realtime/system_out`
PulseAudio sink associated with audio sent by applications and higher-level
Kuri software for playback through the loudspeakers.

#### `/audio_realtime/type`
The name of the library used for audio processing. It is recommended to leave
this set to `signal-essence` for the stability of the audio system.

## Launch File
``audio_realtime.launch``
197 changes: 172 additions & 25 deletions reference/ros-nodes/audio-voice-delegate.md
@@ -3,46 +3,193 @@ layout: reference
title: audio_voice_delegate
category: node
tags:
- ${tag}
- ${tag}
- ${tag}
- wake word
- hey kuri
- voice commands
- transcription
---

## Description
${description}
This node is responsible for handling Kuri voice interactions. Specifically,
`audio_voice_delegate` has two responsibilities:
- determining if a wake word has been spoken
- managing audio transcription and voice command matching

## Dependencies
${dependencies, if any}

## Action API
### Action Subscribed Topics
``${topic}``
``${topic}``
When a wake word is detected, the node sends a request to the
[Houndify](https://www.houndify.com/) service to determine the spoken command
and then publishes the results on a ROS topic.

### Action Published Topics
``${topic}``
``${topic}``
## Dependencies
- PulseAudio
- ``soundhound_kuri``

## Subscribed Topics
``${topic}``
``${topic}``
``${topic}``
- ``/image_wp_server/waypoint_update``
- ``/voice_commands/update``

## Published Topics
``${topic}``
``${topic}``
#### `/audio/voice_delegate/asleep`
``Asleep.msg``
If audio enters the "asleep" state as a result of the `awake_timeout` (while
transcribing audio) or a call to ``/audio/voice_delegate/sleep``, an empty
message will be published

#### `/audio/voice_delegate/awake`
``Awake.msg``
If audio enters the "awake" state because a wake word was detected or the
service ``/audio/voice_delegate/wake_up`` was called, a message will be
published. If the wake word triggered this state change, a non-zero direction
vector will be provided for the observed source of audio.

```sh
$ rostopic echo /audio/voice_delegate/awake
direction:
x: -97.0
y: 26.0
z: 0.0
---
```

#### `/audio/voice_delegate/exchange`
``Exchange.msg``
If an audio transcription or command is received from Houndify, a message will
be published containing the matched command(s), the raw transcription, and any
known error messages

```sh
$ rostopic echo /audio/voice_delegate/exchange
commands:
-
name: CustomCommand
params:
-
k: name
v: Happy Birthday Song
error: ''
transcription: it's my birthday
---
commands:
-
name: StopCommand
params: []
error: ''
transcription: stop
---
commands:
-
name: TurnCommand
params:
-
k: direction
v: -90
error: ''
transcription: turn left
---
```
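Each command in the output above carries its parameters as a list of `k`/`v` pairs. A subscriber will often want them as a dictionary. The sketch below uses plain stand-in classes that mirror only the fields visible in the sample output; the real message classes come from the ROS package, and the `v` values are assumed to be strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Param:
    """Stand-in for one k/v entry of a command (hypothetical class)."""
    k: str
    v: str


@dataclass
class Command:
    """Stand-in for one entry of Exchange.commands (hypothetical class)."""
    name: str
    params: List[Param] = field(default_factory=list)


def params_as_dict(command: Command) -> Dict[str, str]:
    """Collapse a command's k/v list into a dictionary keyed by k."""
    return {p.k: p.v for p in command.params}


# Mirrors the "turn left" exchange shown in the sample output.
turn = Command("TurnCommand", [Param("direction", "-90")])
print(turn.name, params_as_dict(turn))
```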
## Services
``${service}``
``${service}``
#### `/audio/voice_delegate/deafen`
``Deafen.srv``
Sets mayfield_audio to the "deaf" state. This is the most common way to make
Kuri temporarily ignore wake words and voice commands. Kuri must be returned
to the "asleep" state via a service call (such as
``/audio/voice_delegate/snooze``) before wake words are recognized again.
```sh
$ rosservice call /audio/voice_delegate/deafen
state: deaf
```
#### `/audio/voice_delegate/snooze`
``Snooze.srv``
Sets mayfield_audio to the "asleep" listening state. This effectively ends any
ongoing voice command transcriptions and returns Kuri to a state of listening
for the wake word.
```sh
$ rosservice call /audio/voice_delegate/snooze
state: asleep
```
#### `/audio/voice_delegate/stat`
``Stat.srv``
Reports the current audio status of mayfield_audio. Possible states include
"awake" (transcribing audio), "asleep" (listening for wake words), and "deaf"
(ignoring all audio inputs).
```sh
$ rosservice call /audio/voice_delegate/stat
state: asleep
direction: -97.000000, 26.000000, 0.000000
```
#### `/audio/voice_delegate/wake_up`
``WakeUp.srv``
Sets mayfield_audio to the "awake" state. This service triggers Kuri to start
transcribing audio in an attempt to find a voice command match. Transcription
times out after ``/audio_voice_delegate/awake_timeout`` seconds, after which
Kuri returns to the "asleep" state (listening for the wake word).
```sh
$ rosservice call /audio/voice_delegate/wake_up
state: awake
```
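The three listening states and the services that move between them can be summarized as a small transition table. This is a reading of the descriptions above, not the node's implementation; in particular, it assumes each service call takes effect from any state, and the event names `wake_word` and `awake_timeout` are labels chosen for illustration.

```python
# Services that force a state, as described in this section.
SERVICE_TARGETS = {
    "deafen": "deaf",
    "snooze": "asleep",
    "wake_up": "awake",
}

# Non-service events: (current state, event) -> next state.
EVENT_TRANSITIONS = {
    ("asleep", "wake_word"): "awake",      # wake word heard while listening
    ("awake", "awake_timeout"): "asleep",  # transcription timed out
}


def next_state(state, event):
    """Return the listening state after a service call or event.

    Assumption: service calls apply from any state; events with no
    entry in the table (e.g. a wake word while deaf) change nothing.
    """
    if event in SERVICE_TARGETS:
        return SERVICE_TARGETS[event]
    return EVENT_TRANSITIONS.get((state, event), state)


print(next_state("asleep", "wake_word"))  # prints awake
print(next_state("deaf", "wake_word"))    # prints deaf
```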
## Service Calls
``${service}``
``${service}``
- ``/speaker_phone/get_direction``
## Parameters
``${parameter}``
``${parameter}``
#### `/audio_voice_delegate/agent/params/client_id`
The client ID provided by Houndify for identification with an account and
subscribed domains. Defaults to the Mayfield client/account
#### `/audio_voice_delegate/agent/params/client_key`
The client key associated with the Houndify client ID, above. Defaults to the
Mayfield client key
#### `/audio_voice_delegate/agent/params/user_id`
A unique identifier for the robot, used to authenticate the robot with the
Houndify service and to improve location name matching for "go to" voice
commands. Defaults to the Kuri UUID
#### `/audio_voice_delegate/agent/type`
Identifier for the voice transcription service to use. It is recommended that
this be set to `soundhound` and not modified for the stability of voice
commands.
#### `/audio_voice_delegate/awake_timeout`
The duration of audio samples to send before ending transcription, measured in
seconds
#### `/audio_voice_delegate/capture/params/channels`
The number of channels associated with the audio buffer used for listening
#### `/audio_voice_delegate/capture/params/device`
PulseAudio source or sink-monitor associated with the processed audio input,
used for wake word detection and audio transcription
#### `/audio_voice_delegate/capture/params/latency_msecs`
The acceptable number of milliseconds of latency, used to configure the stream
for listening to audio
#### `/audio_voice_delegate/capture/params/sample_rate`
The sample rate of the audio buffer used for listening
#### `/audio_voice_delegate/capture/type`
System audio API type. Defaults to `pulse`
#### `/audio_voice_delegate/debug`
Deprecated (unused)
#### `/audio_voice_delegate/direction`
Deprecated (unused)
#### `/audio_voice_delegate/wake/type`
Identifier for the wake word library to use. It is recommended that this be
set to `okhound` and not modified for the stability of wake word detection.
#### `/audio_voice_delegate/wake_threshold`
Deprecated (unused)
## Launch File
``audio_voice_delegate.launch``