del82/Prosodylab-Aligner

 
 

Prosodylab-Aligner, v. 1.0

Scripts for alignment of laboratory speech production data

Funding

  • FQRSC Nouvelle Chercheur NP-132516
  • SSHRC Canada Research Chair 218503
  • SSHRC Digging Into Data Challenge Grant 869-2009-0004

License

See included "LICENSE"

Citation

If you use this tool, we would appreciate it if you cited the following paper:

Gorman, Kyle, Jonathan Howell and Michael Wagner. 2011. Prosodylab-Aligner: A Tool for Forced Alignment of Laboratory Speech. Canadian Acoustics. 39.3. 192–193.

Usage

USAGE: ./align.py [OPTIONS] data_to_be_aligned/

Option              Function

-a                  Perform speaker adaptation,
                    w/ or w/o prior training

-d dictionary       Specify a dictionary file     [default: dictionary.txt]

-h                  Display this message

-m                  List files containing
                    out-of-dictionary words

-n n                Number of training iterations [default: 4]
                    for each step of training
                    (NB: available only with -t)

-s samplerate (Hz)  Samplerate for models         [default: 8000]
                    (NB: available only with -t)

-t training_data/   Perform model training

FAQ

What is forced alignment?

Forced alignment can be thought of as the process of finding the times at which individual sounds and words appear in an audio recording under the constraint that words in the recording follow the same order as they appear in the transcript. This is accomplished much in the same way as traditional speech recognition, but the problem is somewhat easier given the constraints on the "language model" imposed by the transcript.
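
The ordering constraint can be illustrated with a toy dynamic program. This is only a sketch of the general idea, not the HMM-based procedure HTK actually runs: the function name force_align and the scores table (a made-up log-likelihood for each audio frame under each label) are assumptions for illustration.

```python
def force_align(scores):
    """Return one (start, end) frame span per label, with end exclusive.

    scores[t][k] is a hypothetical log-likelihood of frame t under label k.
    Labels must be used in transcript order, each covering >= 1 frame.
    """
    n_frames, n_labels = len(scores), len(scores[0])
    if n_labels > n_frames:
        raise ValueError("need at least one frame per label")
    neg_inf = float("-inf")
    # best[t][k]: best total score for frames 0..t with frame t in label k
    best = [[neg_inf] * n_labels for _ in range(n_frames)]
    # advanced[t][k]: True if label k starts at frame t on the best path
    advanced = [[False] * n_labels for _ in range(n_frames)]
    best[0][0] = scores[0][0]
    for t in range(1, n_frames):
        for k in range(n_labels):
            stay = best[t - 1][k]
            move = best[t - 1][k - 1] if k > 0 else neg_inf
            advanced[t][k] = move > stay
            best[t][k] = max(stay, move) + scores[t][k]
    # Trace back from the last frame, which must belong to the last label.
    spans = []
    k, end = n_labels - 1, n_frames
    for t in range(n_frames - 1, 0, -1):
        if advanced[t][k]:
            spans.append((t, end))
            end = t
            k -= 1
    spans.append((0, end))
    spans.reverse()
    return spans
```

Because each frame may only stay in the current label or move on to the next one, the search space is far smaller than in open-vocabulary recognition, which is why the transcript makes the problem easier.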

What is forced alignment good for?

The primary use of forced alignment is to eliminate the need for human annotation of time-boundaries for acoustic events of interest. Perhaps you are interested in sound change: forced alignment can be used to locate individual vowels in a sociolinguistic interview for formant measurement. Perhaps you are interested in laboratory speech production: forced alignment can be used to locate the target word for pitch measurement.

Can I use Prosodylab-Aligner for languages other than English?

Yes! If you have a few hours of high quality speech and associated word-level transcripts, Prosodylab-Aligner can induce a new acoustic model, then compute the best alignments for said data according to the acoustic model.

What are the limitations of forced alignment?

Forced alignment works well for audio from speakers of similar dialects with little background noise. Aligning data with considerable dialect variation, or speech embedded in noise or music, remains a hard problem at the edge of the current state of the art.

How can I improve alignment quality?

You can train your own acoustic models, using as much training data as possible, or try to reduce the noise in your test data before aligning.

How does Prosodylab-Aligner differ from HTK?

The Hidden Markov Model Toolkit (HTK) is a set of programs for speech recognition and forced alignment. The HTK book describes how to train acoustic models and perform forced alignment. However, the procedure is rather complex and the error messages are cryptic. Prosodylab-Aligner essentially automates the HTK forced alignment workflow.

How does Prosodylab-Aligner differ from the Penn Forced Aligner?

The Penn Forced Aligner (P2FA) provides forced alignment for American English using an acoustic model derived from audio of US Supreme Court oral arguments. Prosodylab-Aligner has a number of additional capabilities, most importantly acoustic model training, and it is possible in theory to use Prosodylab-Aligner to simulate P2FA.

Installing

The scripts require a version of Python no earlier than 2.6, a BASH-compatible shell located in /bin/sh, and curl. All these will be installed on recent Macintosh computers as well as most computers running Linux. The scripts included here also assume that HTK and SoX are installed on your system. While these scripts can also be made to work on Windows computers, it is non-trivial and not described here.

Installing SoX

Linux

On Linux or similar POSIX-based systems, SoX can be obtained from the distribution-specific package manager (apt-get, yum, etc.), or can be compiled from source without too much difficulty.

Mac OS X

On Mac OS X, SoX may be obtained via package managers like Homebrew (http://brew.sh). The SoX maintainers also provide compiled binaries, which can be downloaded from SourceForge: click on the link after "Looking for the latest version?". The zip file can be expanded by double-clicking on it. The resulting files must be placed in a directory on your $PATH. A simple way to do this is to navigate to the resulting directory, and issue the following command:

$ sudo mv rec play sox soxi /usr/local/bin

This will prompt for your password; type it in (it will not "echo", as ***), and hit Enter when you're done.

Checking installation

You can confirm that SoX is installed by issuing the following command in any directory:

$ sox --version
sox: SoX v14.3.2

Note that your version may be different: align.py has been tested with this version, but it should also work with somewhat older versions and with later releases for the foreseeable future.

Installing HTK

You will need first to download HTK's source code.

Note that you will have to make an account and agree to their restricted distribution license. Once you obtain the "tarball", the following command (adjusting for version number) should unpack it:

$ tar -xvzf htk-3.4.1.tar.gz

Note that if your browser automatically attempts to unpack compressed files upon download, you may get the following error:

tar: Must specify one of -c, -r, -t, -u, -x

In this case, use the command (again adjusting for version number):

$ tar -xf htk-3.4.1.tar.gz

Once you extract the application, navigate into the resulting directory:

$ cd htk

64-bit x86 Linux

Run the following commands:

$ ./configure --disable-hslab --disable-hlmtools
...
$ make all
...
$ sudo make install
...

Mac OS X

By default, no C compiler is installed on Mac OS X. There are a few quick ways to get one. You can get a full set of compilers by downloading Xcode from the Mac App Store. This package is really quite large and may take days(!) to download. A good alternative is to download the new Command Line Tools for Xcode package on the Mac App Store, which is much smaller. You will need a free registration to download either package.

Once that's taken care of, execute the following commands in the "htk/" directory you just navigated to:

$ ./configure --disable-hslab --disable-hlmtools
...
$ make all
...
$ sudo make install
...

Checking installation

You can confirm that HTK is installed by issuing the following command in any directory:

$ HCopy -V
HTK Version Information
Module     Version    Who    Date      : CVS Info
HCopy      3.4.1      CUED   12/03/09  : $Id: HCopy.c,v 1.1.1.1 2006/10/11 09:54:59 jal58 Exp $
...
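
Both checks can also be scripted. The following is a minimal sketch, assuming Python 3.3 or later (for shutil.which); the helper name missing_tools is hypothetical and is not part of align.py.

```python
import shutil


def missing_tools(tools=("sox", "HCopy")):
    """Return the names in `tools` that cannot be found on $PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("SoX and HTK executables found.")
```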

Tutorial

Obtaining a dictionary

First, obtain an appropriate pronunciation dictionary. Currently, the aligner comes with a file "dictionary.txt" intended for use with American English. Some dictionaries we have created are available at the prosodylab.dictionaries repository. Other dictionaries can be found online, or written for specific tasks. If you're working with RP speakers, CELEX might be a good choice. For languages with regular, transparent orthographies, you may want to create a simple rule-based grapheme-to-phoneme system as a series of ordered rules.
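
As a rough illustration of the ordered-rule approach, here is a minimal sketch. The rule table is invented for a made-up, Spanish-like orthography, and the phone symbols and the function name g2p are assumptions; a real system needs rules written for the language at hand.

```python
# Hypothetical ordered rules: (graphemes, phone). Earlier rules win, so
# digraphs must be listed before the single letters they contain.
RULES = [
    ("ch", "CH"), ("ll", "Y"), ("qu", "K"),
    ("a", "A"), ("e", "E"), ("i", "I"), ("o", "O"), ("u", "U"),
    ("c", "K"), ("h", ""),  # "h" is silent in this toy orthography
    ("l", "L"), ("m", "M"), ("n", "N"), ("s", "S"), ("t", "T"),
]


def g2p(word, rules=RULES):
    """Convert `word` to a list of phones by scanning left to right."""
    word = word.lower()
    phones = []
    i = 0
    while i < len(word):
        for graphemes, phone in rules:
            if word.startswith(graphemes, i):
                if phone:  # an empty phone marks a silent grapheme
                    phones.append(phone)
                i += len(graphemes)
                break
        else:
            raise ValueError("no rule for %r in %r" % (word[i], word))
    return phones


if __name__ == "__main__":
    for word in ("chile", "llama", "hola"):
        # Assumed dictionary layout: the word, then its phones.
        print(word.upper(), " ".join(g2p(word)))
```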

Aligning one pair

Imagine you simply want to align multiple audio files with their associated label files, in the following format:

$ file data/myexp_1_1_1.*
data/myexp_1_1_1.lab: ASCII text
data/myexp_1_1_1.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 22050 Hz

$ cat data/myexp_1_1_1.lab
BARACK OBAMA WAS TALKING ABOUT HOW THERE'S A MISUNDERSTANDING THAT ONE MINORITY GROUP CAN'T GET ALONG WITH ANOTHER SUCH AS AFRICAN AMERICANS AND LATINOS AND HE'S SAID THAT HE HIMSELF HAS SEEN IT HAPPEN THAT THEY CAN AND HE'S BEEN INVOLVED WITH GROUPS OF OTHER MINORITIES

If you only want to align one .wav/.lab pair, perhaps to test out the system, the script align_ex.sh is provided, and can be used as follows:

$ ./align_ex.sh data/myexp_1_1_1.wav data/myexp_1_1_1.lab
...

Assuming alignment is successful, this script will copy the resulting TextGrid file (called myexp_1_1_1.TextGrid) to the current directory for your inspection.

Aligning multiple pairs

If you'd like to align multiple .wav/.lab file pairs, and they're all in a single directory, aligning them is as simple as:

$ ./align.py data/
...

This will compute the best alignments, and then place them in Praat TextGrids in the data/ directory.

Likely errors

Several errors can occur at this stage.

Unpaired data

First, if a .lab file in data/ is not paired with a .wav file in the same directory, or vice versa, then align.py will quit and report the unpaired data to unpaired.txt. You can read this file to figure out which files are missing, or use it to delete files that are present but unpaired. The following will delete unpaired files, after they are found by align.py and written to unpaired.txt.

$ rm `xargs -d '\n' < unpaired.txt`
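
If you would rather check for unpaired files before running align.py, the pairing rule can be sketched in a few lines of Python. This is an independent illustration, not the code align.py uses; the function name find_unpaired is hypothetical.

```python
import os


def find_unpaired(directory):
    """Return sorted paths of .wav/.lab files whose partner is missing."""
    stems = {".wav": set(), ".lab": set()}
    for name in os.listdir(directory):
        stem, ext = os.path.splitext(name)
        if ext in stems:
            stems[ext].add(stem)
    # A stem seen with only one of the two extensions is unpaired.
    unpaired = [stem + ".wav" for stem in stems[".wav"] - stems[".lab"]]
    unpaired += [stem + ".lab" for stem in stems[".lab"] - stems[".wav"]]
    return sorted(os.path.join(directory, name) for name in unpaired)


if __name__ == "__main__":
    for path in find_unpaired("data"):
        print(path)
```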

Out of dictionary words

Secondly, a word in your .lab files may be missing from the dictionary. Such words are written to outofdict.txt. You can transcribe these in outofdict.txt using a text editor, then mix them back in like so:

$ ./sort.py dictionary.txt outofdict.txt > tmp
$ mv tmp dictionary.txt

If you call align.py with the argument -m, each word in outofdict.txt is paired with a list of .lab files where it occurs. This may be useful for fixing typos in the .lab files.

If you are transcribing new words using the CMU phone set, see this page for IPA equivalents.
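
The kind of report produced with -m can be approximated as follows. This is a sketch under two assumptions that are not confirmed here: each dictionary line starts with the word followed by whitespace-separated phones, and each .lab file holds whitespace-separated words. The function names are hypothetical.

```python
import glob
import os


def load_vocabulary(dictionary_path):
    """Collect the first field of every non-empty dictionary line."""
    vocabulary = set()
    with open(dictionary_path) as source:
        for line in source:
            fields = line.split()
            if fields:
                vocabulary.add(fields[0])
    return vocabulary


def out_of_dictionary(data_dir, dictionary_path):
    """Map each missing word to the .lab files in which it occurs."""
    vocabulary = load_vocabulary(dictionary_path)
    missing = {}
    for lab_path in sorted(glob.glob(os.path.join(data_dir, "*.lab"))):
        with open(lab_path) as source:
            for word in source.read().split():
                if word in vocabulary:
                    continue
                files = missing.setdefault(word, [])
                # Files are visited one at a time, so checking the last
                # entry is enough to avoid listing a file twice.
                if not files or files[-1] != lab_path:
                    files.append(lab_path)
    return missing
```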

SoX not installed

Also, if SoX is not installed but is needed because the audio is in a different format than the provided models expect (mono, sampled at 8000 Hz), an error will result.
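
To find out in advance whether a file would need conversion, you can inspect its header with Python's standard wave module. A minimal sketch, assuming Python 3.4 or later and uncompressed PCM .wav input; the function name needs_resampling is hypothetical.

```python
import wave


def needs_resampling(wav_path, samplerate=8000):
    """Return True if the file is not mono or not at `samplerate` Hz."""
    with wave.open(wav_path, "rb") as wav:
        return wav.getnchannels() != 1 or wav.getframerate() != samplerate
```

For the 22050 Hz example file shown earlier, this would return True, meaning SoX is required to align it with the default models.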

align.py not executable ("Permission denied")

Lastly, the file align.py may not be marked as executable on your system, in which case you'll get an error like the following:

$ ./align.py data/ 
-bash: ./align.py: Permission denied

On Linux or Mac OS X, the following command should do the trick:

$ chmod +x ./align.py

Then, run align.py as above.

Out of space errors

The align.py script makes prodigious use of "temporary" disk space. On Linux (in particular), it is possible that this space is limited by the OS, and align.py will fail with a number of cascading errors referring to disk space. A simple way to fix this is to use a temporary directory located somewhere else. If the environment variable $TMPDIR is defined and it points to a writeable directory, align.py will use it.

$ mkdir ~/tmpdir
$ export TMPDIR=~/tmpdir
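
For reference, Python's standard tempfile module consults $TMPDIR when choosing where to create temporary files. Whether align.py relies on tempfile internally is an assumption here; the sketch below only demonstrates the mechanism, and the function name scratch_dir_under is hypothetical.

```python
import os
import tempfile


def scratch_dir_under(tmpdir):
    """Point TMPDIR at `tmpdir` and create a scratch directory inside it."""
    os.environ["TMPDIR"] = tmpdir
    # tempfile caches its choice of directory; clearing the cache makes
    # it consult the environment again on the next call.
    tempfile.tempdir = None
    return tempfile.mkdtemp(prefix="align_")
```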

Training your own models

The align.py script also allows you to train your own models, where the folder for training is specified by a directory after the -t flag:

$ ./align.py -t test_data/ data/
...

Please note: THIS REQUIRES A LOT OF DATA to work well, and further takes a long time when there is a lot of data. It is also possible to train on your test data, and in fact it is something we do quite often at the lab. That looks like:

$ ./align.py -t data/ data/
...

When -t is specified, a few other command-line options to align.py become available. The -s flag specifies the samplerate for the models used, and if SoX is installed, both training and testing data will be resampled to this rate if they do not match it. For instance, to use 44100 Hz models, you could say:

$ ./align.py -s 44100 -t data data
...

Note that the trailing slash character (/) is not obligatory in specifying directories: align.py assumes these are directory names, possibly including wildcards, and expands the wildcards if possible. For example, with a custom dictionary specified via the -d flag:

$ ./align.py -d MY_DICTIONARY.txt -t data data
...

Lastly, the -n flag may be used to specify the number of training iterations per "round": align.py performs three rounds of training, each of which takes approximately the same time, so increasing this value by one adds approximately three iterations in total. By default, -n is 4 (so 12 iterations of training in all), but the following command would set it to 5 (or 15 iterations of training):

$ ./align.py -n 5 -t data data
...

Other options are documented above.

Importing the module

Users who are familiar with Python are encouraged to import align.py as a Python module if it makes sense for their application.

Making your own default acoustic model

Many users have requested the ability to store an acoustic model for future use. Prosodylab-Aligner is not built with this in mind, but it is certainly possible for technically-inclined users to save their acoustic models for reuse.

  1. Open align.py in a text editor.
    1. Change the global variable DEBUG to True.
    2. Then, edit the global variable CMU_PHONES so that it contains the same phoneset as your training data.
    3. Exit the text editor.
  2. Gather the training data and perform model training with the -t flag.
  3. At the end of training and alignment, align.py will print out the location of the temporary directory where the resulting acoustic models are stored.
    1. Navigate to this directory, then to the subdirectory HMM.
    2. You will see a number of numbered subdirectories here. Go to the second-highest numbered subdirectory (e.g., if the last subdirectory is 9, go to 8).
    3. Copy the files hmmdefs and macros to the subdirectory MOD where Prosodylab-Aligner is located.
  4. To return to normal operation, change the global variable DEBUG in align.py back to False.

Note that this will overwrite the default acoustic model, so you may want to keep multiple copies of the Prosodylab-Aligner directory.
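
The copying in step 3 can be scripted. This is a minimal sketch that follows the directory layout described above (an HMM subdirectory with numbered subdirectories, and a MOD subdirectory next to align.py); the function name and both arguments are hypothetical, and you must supply the temporary directory that align.py printed.

```python
import os
import shutil


def save_acoustic_model(tmp_dir, aligner_dir):
    """Copy hmmdefs and macros from the second-highest HMM subdirectory."""
    hmm_dir = os.path.join(tmp_dir, "HMM")
    # Sort numerically so that, e.g., "10" comes after "9".
    numbered = sorted(
        (name for name in os.listdir(hmm_dir) if name.isdigit()), key=int
    )
    if len(numbered) < 2:
        raise ValueError("expected at least two numbered subdirectories")
    chosen = os.path.join(hmm_dir, numbered[-2])
    mod_dir = os.path.join(aligner_dir, "MOD")
    for name in ("hmmdefs", "macros"):
        # This overwrites any model files already present in MOD.
        shutil.copy(os.path.join(chosen, name), os.path.join(mod_dir, name))
    return chosen
```

Backing up the existing contents of MOD first is a cheap alternative to keeping several copies of the whole Prosodylab-Aligner directory.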

About

Python interface for forced audio alignment using HTK and SoX
