Corpus of Tom Lehrer's music

Tom Lehrer, being an absolute legend, has given free permission to download and use all of the music and lyrics that he wrote and that are hosted at https://tomlehrersongs.com/.

Current contents

I've written a web scraper which downloaded all of the lyric PDFs from the site. The PDFs are under lyrics-pdf/2020-12-26. The code I used is in the code folder.
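
For readers who want a feel for the approach without opening the code folder, here is a minimal sketch of the link-finding step of such a scraper. It is not the code in this repo: the class and function names, the sample HTML, and the `/files/...` paths are all made up for illustration, and it assumes the PDFs are linked from plain `<a href>` tags. It uses only the Python standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://tomlehrersongs.com/"


class PdfLinkParser(HTMLParser):
    """Collect the targets of <a> tags whose href ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Match the extension case-insensitively; resolve relative links
        # against the page URL so every result is absolute.
        if href and href.lower().endswith(".pdf"):
            self.links.append(urljoin(self.base_url, href))


def find_pdf_links(html, base_url):
    """Return absolute URLs of all PDF links found in an HTML string."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page fragment; the real site's markup and paths may differ.
sample_html = """
<ul>
  <li><a href="/files/example-song.pdf">Example Song</a></li>
  <li><a href="another-example.PDF">Another Example</a></li>
  <li><a href="/about/">About</a></li>
</ul>
"""

links = find_pdf_links(sample_html, BASE_URL)
for link in links:
    print(link)
```

Each URL found this way could then be fetched with `urllib.request.urlretrieve` into a dated folder such as lyrics-pdf/2020-12-26. Keeping the parsing separate from the downloading makes it possible to check the link list before hitting the site.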

Goals

I'm planning, at the very least, to create an unannotated NLTK corpus of the lyrics to all of these songs. If I feel like it, I might enlist friends to annotate rhymes or other features of interest. I'd also like to scrape the sheet music PDFs from the website and maybe do something with those, but first I have to make sure I understand which songs were written by Mr. Lehrer and which are copyrighted by others and were parodied or adapted by him.
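
As a rough idea of what the unannotated corpus could look like, here is a sketch that writes one plain-text file per song into a folder and then loads that folder with NLTK's `PlaintextCorpusReader`. None of this exists in the repo yet: the function names, the file-naming scheme, and the placeholder titles and text are assumptions, and it presumes the lyrics have already been extracted from the PDFs into strings.

```python
import re
from pathlib import Path


def slugify(title):
    """Turn a song title into a lowercase, hyphen-separated file stem."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def write_corpus(lyrics_by_title, root):
    """Write one UTF-8 .txt file per song under root; return sorted file ids."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    fileids = []
    for title, text in lyrics_by_title.items():
        fileid = slugify(title) + ".txt"
        (root / fileid).write_text(text.strip() + "\n", encoding="utf-8")
        fileids.append(fileid)
    return sorted(fileids)


def load_corpus(root):
    """Open the folder as an NLTK plaintext corpus (requires nltk)."""
    # Imported lazily so the functions above work without NLTK installed.
    from nltk.corpus.reader import PlaintextCorpusReader

    return PlaintextCorpusReader(str(root), r".*\.txt")
```

With NLTK installed, `load_corpus("corpus").words("example-song.txt")` would give the tokens of a single song, assuming a file of that name had been written. Because there is one file per song, later annotations (rhymes, for instance) could be added per file without touching the rest of the corpus.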