ezra is a web application for producing research-quality datasets of annotated audio files from recordings available on the web.
The web contains vast quantities of recorded speech, and much of it is accompanied by transcripts and is therefore discoverable by search engine queries. The web is thus a potentially valuable source of data for speech research. But before audio files harvested from the web can constitute research data, they must be subjected to processing. Each search engine hit must be manually validated, and each token must be extracted with the required amount of context and annotated with the appropriate metadata.
ezra is a simple but powerful web interface allowing non-expert users to perform this processing efficiently. Its effectiveness as a corpus annotation tool has been demonstrated in the production of corpora consisting of thousands of annotated tokens.
A basic tutorial for annotators is available, as is basic developer documentation. Tutorials for supervisors will be available soon.
- Install RVM (may need to run with the --ruby flag)
- Install ruby 1.9.3:
$ rvm install 1.9.3
- restart the shell
$ git clone git@github.com:del82/ezra.git
$ bundle install
(may need to say gem install bundler first)
- Generate a site-specific secret token.
  - Copy config/initializers/secret_token_template.rb to config/initializers/secret_token.rb
  - In secret_token.rb, replace 'your secret token' with an actual secret token, which can be generated by running rake secret at the shell prompt
  - Or, if you're feeling lazy, run this command from the project root directory:
    touch config/initializers/secret_token.rb; secret=$(rake secret); \
    echo "Ezra::Application.config.secret_token = '$secret'" >> config/initializers/secret_token.rb
- initialize the db:
- (if pulling)
$ rake db:reset
$ rake db:migrate
$ rake db:populate
$ rake db:test:prepare
- annotate the source:
bundle exec annotate
- run the tests:
$ rspec
- If you like, start guard/spork to detect code changes and run tests automatically.
$ guard
- To start a server, run
rails s
- Hack away
Notes:
- Installation on Ubuntu was aided by a Stack Overflow answer, as .bashrc was adding .rvm/bin to $PATH before sourcing .rvm/scripts/rvm
- If you're running Linux, you may need to install a JavaScript runtime such as Node.js if you haven't already.
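
The secret token step above can also be sketched in Ruby. This is a minimal sketch, not part of ezra: the helper name `write_secret_token` is hypothetical, and it assumes that a 128-character hex string from `SecureRandom.hex(64)` is an acceptable token (which, as far as I know, is the form `rake secret` prints in Rails 3). It writes the same `Ezra::Application.config.secret_token` line as the shell one-liner.

```ruby
require 'securerandom'
require 'fileutils'

# Hypothetical helper (not shipped with ezra): creates the secret token
# initializer only if it does not exist yet, so an existing token is
# never overwritten or duplicated.
# Returns true if the file was created, false if it was left unchanged.
def write_secret_token(path = 'config/initializers/secret_token.rb')
  return false if File.exist?(path)

  FileUtils.mkdir_p(File.dirname(path))
  # Assumption: 64 random bytes rendered as 128 hex characters.
  token = SecureRandom.hex(64)
  File.write(path, "Ezra::Application.config.secret_token = '#{token}'\n")
  true
end
```

Calling `write_secret_token` from the project root creates the initializer on the first run and does nothing afterwards. The shell one-liner appends with `>>`, so running it twice leaves two assignments in the file; refusing to touch an existing file avoids that.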