Transcribe Vimeo videos #6

alexdresko · 2018-09-27T13:49:15Z

The plan is to loop through all of the video data at https://github.com/RESTFest/videos.restfest.org/tree/master/_data/videos and pass each video to a transcription API. We'll turn the result into JSON and drop it somewhere here in the repo.

Where to drop the JSON
What transcription API
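A rough sketch of the loop described above, in Python for illustration only. The `transcribe` callable stands in for whichever API gets picked, and the `id` key and the one-file-per-video layout are assumptions, not anything decided here.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List


def transcribe_all(
    videos: Iterable[dict],
    transcribe: Callable[[dict], str],
    out_dir: Path,
) -> List[Path]:
    """Write one JSON transcript file per video and return the paths written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for video in videos:
        # "id" is a hypothetical key; use whatever uniquely identifies a video.
        out_path = out_dir / f"{video['id']}.json"
        if out_path.exists():
            continue  # already transcribed, so reruns skip it
        text = transcribe(video)  # placeholder for the real API call
        payload = {"id": video["id"], "transcript": text}
        out_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
        written.append(out_path)
    return written
```

Skipping files that already exist means the job can be rerun month after month if the chosen API has a quota.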


alexdresko · 2018-09-27T13:52:05Z

@bradgrace

alexdresko · 2018-09-27T14:46:07Z

Here's my initial research:

Important questions

How many hours of video do we have?

General requirements

From what I've seen while investigating various speech-to-text APIs, submitting audio instead of video is always either cheaper or a requirement. Based on that, it looks like our solution will need to download the videos locally for processing. If the chosen API requires audio only, we'll have to extract the audio from the videos, then push the audio to the API.

Currently, downloads are not enabled for videos.restfest.org. Here's what I found about that:

https://help.vimeo.com/hc/en-us/articles/229678128-Downloading-videos

Plus, PRO, Business, and Premium members have the option to enable their videos for download. If you have a Basic membership and upgrade your account, the option for enabling downloads will be automatically turned on.

The availability of videos for download depends on the subscription tier of the video’s creator. Basic members cannot enable their videos to be downloaded; however, if the video belongs to a Plus, Pro, or Business member, they have the option to toggle on the download option.

Plus, PRO, and Business members have the ability to store their original, untranscoded source files right here on Vimeo. This means that, if you are a Plus, PRO, or Business member that chooses to store your source files, you will always be able download your original file, as long as you maintain your paid subscription with us. You can choose to make your original file downloadable by others, too.

While Basic members can indeed download the source files made available by Plus, PRO, and Business members, they do not have the ability to store or share their own source files on Vimeo.

Google Cloud Speech to Text

This is an option, but it wouldn't be free.

https://cloud.google.com/speech-to-text/

Cloudinary

Is this the service Benjamin was talking about? If so, hook me up with some contact deets. If not, what was the service you were talking about and hook me up with some contact deets. :)

Azure Cognitive Services

https://azure.microsoft.com/en-us/pricing/details/cognitive-services/speech-services/

This looks to be the winner for now. The free tier allows 5 hours of transcription per month. It might take a couple months, but we can work with that.

Kaldi?

May be a thing. I mean, it's a thing. Just not sure what kind of thing it is.

http://kaldi-asr.org/doc/index.html

IBM's thing

IBM has a thing. Hard cap at 100 minutes of transcription.

https://www.ibm.com/watson/services/speech-to-text/

The AWS thing

Ultimately, it's got a hard cap at 12 hours.

https://aws.amazon.com/transcribe/pricing/

The non-profit advantage

We could speed this up if RESTFest were a non-profit. There's always a non-profit tier with these services that gives you additional monthly credits. Should we create an issue to track the non-profit effort?

alexdresko · 2018-09-27T14:49:06Z

@bradgrace calculated 75.8 hours of video by summing all the duration properties in the Vimeo JSON data. He's made a PowerShell script that collects and parses all of the data. From that, we've got a good starting point for scripting the transcription.
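For reference, the same kind of sum looks roughly like this in Python. This is not the PowerShell script mentioned above, just a sketch that assumes each video record carries a numeric `duration` in seconds.

```python
def total_hours(videos):
    """Sum the "duration" of every video record and convert to hours.

    Assumes "duration" is in seconds; records without one count as 0.
    """
    return sum(v.get("duration", 0) for v in videos) / 3600


# Made-up records, only to show the shape of the input.
sample = [{"duration": 3600}, {"duration": 1800}, {"name": "no duration"}]
print(total_hours(sample))
```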

Given Azure's limit of 5 hours per month, it would take 16 months to transcribe all of the videos... UNLESS we split the job across multiple computers. :)
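The arithmetic behind that estimate, as a small sketch. The `workers` parameter is an assumption: it only helps if each computer really does get its own free monthly quota, which hasn't been confirmed here.

```python
import math


def months_needed(total_hours, free_hours_per_month, workers=1):
    """Whole months needed to get through total_hours on a monthly quota.

    workers is hypothetical: it assumes each one has a separate quota.
    """
    return math.ceil(total_hours / (free_hours_per_month * workers))


print(months_needed(75.8, 5))             # 75.8 / 5 = 15.16, rounds up to 16
print(months_needed(75.8, 5, workers=4))  # 75.8 / 20 = 3.79, rounds up to 4
```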

alexdresko · 2018-09-27T15:14:14Z

Getting started with C# - the next sample is more important, but this sample at least shows how to get the NuGet package.

Continuous speech recognition from a file

Our goal is to attempt to port those C# samples to PowerShell. PowerShell runs on most OSs now. If PowerShell fails us, we'll switch to .NET Core.

alexdresko · 2018-09-27T15:28:31Z

We'll use https://www.ffmpeg.org/ to extract the wav from the mp4.

https://superuser.com/questions/609740/extracting-wav-from-mp4-while-preserving-the-highest-possible-quality

I confirmed I can use ffmpeg to extract the wav from one of the restfest mp4 files that I downloaded manually.
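A minimal sketch of how that extraction step could be scripted, assuming `ffmpeg` is on the PATH. The 16 kHz, mono, 16-bit PCM settings are an assumption about what a speech API will want, not the exact command I ran, so check them against whichever API we pick.

```python
import subprocess


def build_ffmpeg_command(mp4_path, wav_path):
    """Build the ffmpeg argument list for pulling the audio out as WAV."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", str(mp4_path),     # input video
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit PCM (assumed target format)
        "-ar", "16000",          # 16 kHz sample rate (assumption)
        "-ac", "1",              # mono (assumption)
        str(wav_path),
    ]


def extract_wav(mp4_path, wav_path):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(build_ffmpeg_command(mp4_path, wav_path), check=True)
```

Keeping the command builder separate from the call that runs it makes it easy to print and inspect the command before processing all the videos.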
