GUI to chop, categorize, and annotate audio clips from daylong recordings. This version annotates ambient language characteristics like language spoken (for bilingual environments), child-directed speech, and speaker category. However, the application is easily modifiable for any annotation categories.
- A description of the materials
- An example workflow to process and annotate recordings
- Research output
The following dependencies are required to run the applications and scripts in this repo: ffprobe, ffmpeg (both available here), pydub, matplotlib, scipy, and pandas. In addition, this repo is configured for Python 3.7.0 or older. That doesn't mean it's unworkable with newer versions, but you will likely have to create some workarounds. One solution is to configure a Python 3.7.0 environment within Anaconda:
conda create --name py37 python=3.7.0 pip
conda activate py37
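Once the 3.7 environment is active, the Python dependencies can be installed with pip; this is a suggested setup rather than a script shipped with the repo, and ffmpeg/ffprobe can instead come from your system package manager.
pip install pydub matplotlib scipy pandas
conda install -c conda-forge ffmpeg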
1_chunk-and-label/
- a chopper + vocal activity detector
- chops longform recordings into 30 second clips
- runs a standard vocal activity detector (Usoltsev, 2015)
- creates an accompanying metadata file, config.csv, which contains % vocal activity, timestamp of the clip in the original file, speaker metadata, filename, etc.
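For orientation, the chunking step works roughly like the sketch below. This is a minimal illustration with pydub and a hypothetical input filename, not the repo's split_app.py, which also runs the vocal activity detector and writes config.csv.
# Minimal sketch of 30-second chunking with pydub (not split_app.py itself)
from pydub import AudioSegment

recording = AudioSegment.from_wav("daylong_recording.wav")  # hypothetical filename
clip_ms = 30 * 1000  # pydub slices audio in milliseconds

for i, start in enumerate(range(0, len(recording), clip_ms)):
    clip = recording[start:start + clip_ms]
    clip.export(f"clip_{i:04d}.wav", format="wav")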
2_categorize_app_biling.py
- GUI to walk research assistants through annotating clips derived from 1_chunk-and-label/. Includes annotation options for language choice, speaker, and presence/absence of media. Categorize_app creates a new file, responses.csv, that records annotations made through the application plus speaker and file metadata.
2_categorize_app_CDS.py
- same as above, but additionally includes a child-directed speech versus other-directed speech category
2_categorize_app_CDS_entire.py
- same as above, but samples every other clip from the recording in chronological order, rather than randomly
Several notebooks to reliably estimate ambient language characteristics of daylong audio recordings.
3a_bilingual_ratio.ipynb
- estimates the ratio between language categories for a given speaker using the responses.csv from 2_categorize_app_biling.py or 2_categorize_app_CDS.py.
3b_cds_biling_ratio.ipynb
- estimates the ratio between type-of-speech categories (CDS, ODS) for a given speaker using the responses.csv from 2_categorize_app_CDS.py.
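As a rough illustration of what these ratio notebooks compute, the sketch below tallies the proportion of clips per annotation category with pandas; the 'language' column name is an assumption, so check the header of your responses.csv.
# Sketch only: proportion of annotated clips per language category
import pandas as pd

responses = pd.read_csv("responses.csv")
proportions = responses["language"].value_counts(normalize=True)  # hypothetical column name
print(proportions)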
4_rater_reliability.ipynb
- estimates intra- and inter-rater agreement for language annotation on the basis of clips annotated two and three times. Also concatenates all clips annotated as 'unsure' into filename_relisten.csv to be fed into the Relisten application and listened to again by two annotators.
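One simple way to think about the agreement this notebook reports is percent agreement on clips annotated exactly twice, sketched below with pandas; the 'clip' and 'language' column names are assumptions, and the notebook itself may use a different measure.
# Sketch only: percent agreement on clips annotated exactly twice
import pandas as pd

responses = pd.read_csv("responses.csv")
twice = responses.groupby("clip").filter(lambda g: len(g) == 2)  # hypothetical 'clip' column
first = twice.groupby("clip")["language"].first()
second = twice.groupby("clip")["language"].last()
print(f"Percent agreement: {(first == second).mean():.2%}")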
5_relisten_app.py
- to appear; draws all files that appear in the filename_relisten.csv dataframe to listen to again (without replacement)
collect_responses.py
- to appear; a script that will collect all of the responses.csv files from all of the coded participants and concatenate them into a single .csv for analysis
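Until that script appears, the concatenation step might look like the sketch below; the participants/*/responses.csv path pattern is an assumption about how coded participants are stored.
# Sketch only (not the forthcoming collect_responses.py)
import glob
import pandas as pd

paths = glob.glob("participants/*/responses.csv")  # hypothetical directory layout
combined = pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)
combined.to_csv("all_responses.csv", index=False)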
This is an example of a workflow to get you from a full daylong recording to one annotated for ambient language characteristics like language choice and speech type via random sampling. It assumes that you have all relevant libraries and dependencies installed.
-
(Optional) If you have multiple .wav files from a given day, sew them together and intersperse with white noise such as whiteNoise16.wav (included in this repo). Interspersing white noise allows the annotator to know if a new recording has begun in the middle of a clip. On the command line, type the following:
sox filename1.wav whiteNoise16.wav filename2.wav new_name_of_combined_files.wav
-
Run the chopper and vocal activity detector
Using the command line, move inside of 1_chunk-and-label/ and run the following, where childID is the ID of the participant (any length or format), birthdate is YYMMDD, recordingdate is YYYYMMDD, and gender is any way that you would like to code participant gender (e.g. 'Female', 'Non-binary', etc.):
python3 split_app.py childID birthdate recordingdate gender
An example of this is the following:
python3 split_app.py 6396 180612 20190515 female
After starting the application, you will be prompted with a window to select an audio file on your local machine to process. You will then be prompted with a window to select a directory to store the chopped files and the generated config.csv file. The speed of the VAD depends on your machine, but a good estimate is 1 second/clip, or 24 minutes for a 12-hour recording.
-
(Optional) Delimit parts of the recording that you may not want to annotate.
For example, you may not want to listen to any clips where the child is sleeping or where the researcher is present in the recording. That would be a waste of your time. The following example is our approach to finding and eliminating clips where the child is sleeping. You can always modify the source code of split_app.py to add additional parameters.
Open config.csv and note in the percents_voc column the clips that have a very, very low percentage of vocal activity (in the realm of 0-7%). Listen to the parts of the recording with this low activity and/or compare the timestamps to daily activity logs (if you have them) to ensure that the child is in fact sleeping. Mark all of the clips where you believe the child to be sleeping with a 1 and all of the remaining clips with a 0.
Now when you annotate, you will continue to draw clips where the child is sleeping. However, you will not have to listen to them. Instead, you will be prompted with the message "Child is sleeping. Press Next." on the command line. So press next!
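If it helps, candidate sleeping clips can be pulled out of config.csv with pandas before you listen, as in this sketch; the 0-7% cutoff is the rough range mentioned above, and whether percents_voc is stored as a percentage or a fraction is an assumption to verify against your file.
# Sketch only: list candidate sleeping clips for manual review
import pandas as pd

config = pd.read_csv("config.csv")
candidates = config[config["percents_voc"] <= 7]  # use 0.07 if stored as a fraction
print(candidates)  # listen to these clips before marking 1 (sleeping) or 0 (awake)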
-
Begin annotation
Again on the command line, run the annotation application 2_categorize_app_CDS.py. You will be prompted to select a config.csv file, so it does not matter where 2_categorize_app_CDS.py is stored on your machine.
python3 2_categorize_app_CDS.py
You will be prompted with a window to select the metadata file that you previously created (config.csv). You will then be prompted with a window to enter your name (to calculate inter-rater reliability in the event that you have multiple annotators).
The main interactive window will appear, and it can be easily modified in the source code according to your annotation goals; 2_categorize_app_biling.py and 2_categorize_app_CDS.py display slightly different windows. Press "Play." Various metadata details will display in the command line. After the clip, make your selection from the menus and hit "Next" or "Repeat". You can repeat as many times as you would like. To play the next clip in your current annotation session, always press "Next."
When you have completed that annotation session, you can close the application and restart at any time. There is nothing else that you have to do. Your responses will always be recorded in responses.csv.
-
After you have annotated for a while, you may be interested in your progress. The next step is to run a notebook to evaluate when the ratio between annotation categories has asymptoted for a given recording, indicating that you can stop annotation. An example of this for language annotations in bilingual environments and speech type choices (e.g. child-directed speech) is the following:
Move to the directory where you are storing 3a_bilingual_ratio.ipynb and/or 3b_cds_biling_ratio.ipynb and type:
jupyter notebook
Select the notebook you would like to use from the webpage.
In the second cell of the notebook, write the path name to the responses.csv file on your computer for the participant. Run the remainder of the notebook and evaluate your progress. For example, an area plot in which the proportions of the language categories have stabilized indicates that you can stop annotating that participant's recording.
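If you want a quick stand-alone check outside the notebooks, a cumulative-proportion area plot can be produced along these lines; the 'language' column name is an assumption.
# Sketch only: cumulative proportion of each category over annotation order
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

responses = pd.read_csv("responses.csv")
dummies = pd.get_dummies(responses["language"])  # hypothetical column name
cumulative = dummies.cumsum().div(np.arange(1, len(dummies) + 1), axis=0)
cumulative.plot.area()
plt.xlabel("Number of clips annotated")
plt.ylabel("Proportion of clips per category")
plt.show()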
Cychosz_Villanueva_Weisleder_paper.pdf
- manuscript
Cychosz_Villanueva_Weisleder_poster.pdf
- poster from the recent Many Paths to Language Acquisition Conference demonstrating the validity of the random sampling approach
Villanueva_Cychosz_Weisleder.pdf
- poster from the recent Many Paths to Language Acquisition Conference documenting dual language input in the US and Bolivia
validation_results.Rmd
- R notebook to generate results and analyses for paper
figures/
- directory with figures generated by validation_results.Rmd
Villanueva, A., Cychosz, M., & Weisleder, A. (2020). Dual language input from adults and older children in two communities. Poster given at the Many Paths to Language Acquisition Workshop, Nijmegen, The Netherlands.
Cychosz, M., Villanueva, A., & Weisleder, A. (to appear). Efficient estimation of children’s language exposure in two bilingual communities. Journal of Speech, Language, and Hearing Research. https://psyarxiv.com/dy6v2/
Usoltsev, A. (2015). Voice Activity Detector-Python. GitHub Repository. https://github.com/marsbroshok/VAD-python