validate uploaded vtt files (#144)
* validate uploaded vtt files

* add missing vtt file

Co-authored-by: Sax <[email protected]>
adamthesax and Sax authored Mar 17, 2020
1 parent d36cb7c commit 14c1f7a
Showing 22 changed files with 710 additions and 1 deletion.
4 changes: 3 additions & 1 deletion go.mod
@@ -1,8 +1,9 @@
module github.com/NYTimes/video-captions-api

require (
cloud.google.com/go v0.54.0
cloud.google.com/go v0.54.0 // indirect
cloud.google.com/go/datastore v1.1.0
cloud.google.com/go/storage v1.6.0
github.com/NYTimes/gizmo v1.3.5
github.com/NYTimes/gziphandler v1.1.1
github.com/google/uuid v1.1.1
@@ -11,6 +12,7 @@ require (
github.com/nytimes/threeplay v0.3.2
github.com/sirupsen/logrus v1.4.2
github.com/stretchr/testify v1.5.1
github.com/tdewolff/parse/v2 v2.4.2
)

go 1.13
114 changes: 114 additions & 0 deletions go.sum

Large diffs are not rendered by default.

23 changes: 23 additions & 0 deletions providers/upload.go
@@ -1,10 +1,13 @@
package providers

import (
"bytes"
"fmt"
"path/filepath"

captionsConfig "github.com/NYTimes/video-captions-api/config"
"github.com/NYTimes/video-captions-api/database"
"github.com/NYTimes/video-captions-api/vtt"
log "github.com/sirupsen/logrus"
)

@@ -53,6 +56,12 @@ func (c *UploadProvider) GetProviderJob(job *database.Job) (*database.ProviderJo
// DispatchJob sets the status of the upload job as delivered so
// that a call to check the job status uploads it to the cloud.
func (c *UploadProvider) DispatchJob(job *database.Job) error {
err := c.validateCaptionFile(&job.CaptionFile)

if err != nil {
return err
}

job.Status = "delivered"
job.ProviderParams = map[string]string{
"ProviderID": job.ID,
@@ -65,3 +74,17 @@ func (c *UploadProvider) DispatchJob(job *database.Job) error {
func (c *UploadProvider) CancelJob(job *database.Job) (bool, error) {
return false, nil
}

// validateCaptionFile checks the contents of the uploaded file to
// ensure it's a valid captions file. It uses the file extension to determine
// which type of file check to perform. Currently only .vtt validation
// is supported.
func (c *UploadProvider) validateCaptionFile(file *database.UploadedFile) error {
ext := filepath.Ext(file.Name)

if ext == ".vtt" {
return vtt.Validate(bytes.NewReader(file.File))
}

return nil
}
4 changes: 4 additions & 0 deletions vtt/testdata/cue-invalid-timestamp.vtt
@@ -0,0 +1,4 @@
WEBVTT
00:11.00 --> 00:13.000
We are in New York City
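
The fixture above is invalid because `00:11.00` has only two fractional digits, while a WebVTT timestamp needs exactly three. The sketch below shows one way such a line could be rejected. It is not the `vtt` package's implementation, which is not shown in this diff: `checkCueTiming` and `timestampRE` are hypothetical names, and the check is deliberately strict about whitespace around `-->`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// timestampRE matches [hh:]mm:ss.ttt, with hours optional (two or more
// digits), minutes and seconds in 00-59, and exactly three fractional digits.
var timestampRE = regexp.MustCompile(`^(?:\d{2,}:)?[0-5]\d:[0-5]\d\.\d{3}$`)

// checkCueTiming validates the start and end timestamps of a cue timing
// line. Hypothetical helper for illustration; cue settings after the end
// timestamp are ignored.
func checkCueTiming(line string) error {
	fields := strings.Fields(line)
	if len(fields) < 3 || fields[1] != "-->" {
		return fmt.Errorf("malformed cue timing line: %q", line)
	}
	for _, ts := range []string{fields[0], fields[2]} {
		if !timestampRE.MatchString(ts) {
			return fmt.Errorf("invalid timestamp %q", ts)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkCueTiming("00:11.00 --> 00:13.000"))  // invalid timestamp "00:11.00"
	fmt.Println(checkCueTiming("00:11.000 --> 00:13.000")) // <nil>
}
```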
Empty file added vtt/testdata/empty.vtt
Empty file.
4 changes: 4 additions & 0 deletions vtt/testdata/garbage-cue.vtt
@@ -0,0 +1,4 @@
WEBVTT
00:00.000 -->
cue text
1 change: 1 addition & 0 deletions vtt/testdata/garbage-signature.vtt
@@ -0,0 +1 @@
garbage vtt file
21 changes: 21 additions & 0 deletions vtt/testdata/many-comments.vtt
@@ -0,0 +1,21 @@
WEBVTT
NOTE
This file was written by Jill. I hope
you enjoy reading it. Some things to
bear in mind:
- I was lip-reading, so the cues may
not be 100% accurate
- I didn't pay too close attention to
when the cues should start or end.
00:01.000 --> 00:04.000
Never drink liquid nitrogen.

NOTE check next cue
00:05.000 --> 00:09.000
- It will perforate your stomach.
- You could die.

NOTE end of file
4 changes: 4 additions & 0 deletions vtt/testdata/no-space-cue-times-arrow.vtt
@@ -0,0 +1,4 @@
WEBVTT
00:00.000--> 00:01.000 line:40% size:40%
cue text
4 changes: 4 additions & 0 deletions vtt/testdata/no-space-cue-times-cue-settings.vtt
@@ -0,0 +1,4 @@
WEBVTT
00:00.000 --> 00:01.000line:40% size:40%
cue text
40 changes: 40 additions & 0 deletions vtt/testdata/sample.vtt
@@ -0,0 +1,40 @@
WEBVTT
00:11.000 --> 00:13.000
<v Roger Bingham>We are in New York City

00:13.000 --> 00:16.000
<v Roger Bingham>We’re actually at the Lucern Hotel, just down the street

00:16.000 --> 00:18.000
<v Roger Bingham>from the American Museum of Natural History

00:18.000 --> 00:20.000
<v Roger Bingham>And with me is Neil deGrasse Tyson

00:20.000 --> 00:22.000
<v Roger Bingham>Astrophysicist, Director of the Hayden Planetarium

00:22.000 --> 00:24.000
<v Roger Bingham>at the AMNH.

00:24.000 --> 00:26.000
<v Roger Bingham>Thank you for walking down here.

00:27.000 --> 00:30.000
<v Roger Bingham>And I want to do a follow-up on the last conversation we did.

00:30.000 --> 00:31.500 align:right size:50%
<v Roger Bingham>When we e-mailed—

00:30.500 --> 00:32.500 align:left size:50%
<v Neil deGrasse Tyson>Didn’t we talk about enough in that conversation?

00:32.000 --> 00:35.500 align:right size:50%
<v Roger Bingham>No! No no no no; 'cos 'cos obviously 'cos

00:32.500 --> 00:33.500 align:left size:50%
<v Neil deGrasse Tyson><i>Laughs</i>

00:35.500 --> 00:38.000
<v Roger Bingham>You know I’m so excited my glasses are falling off here.
4 changes: 4 additions & 0 deletions vtt/testdata/signature-bad-comment.vtt
@@ -0,0 +1,4 @@
WEBVTT-I should have a space before this comment

00:00.000 --> 00:02.000
text
4 changes: 4 additions & 0 deletions vtt/testdata/signature-comment.vtt
@@ -0,0 +1,4 @@
WEBVTT - This is totally legit, but don't know why
00:00.000 --> 00:02.000
text
3 changes: 3 additions & 0 deletions vtt/testdata/signature-no-new-line.vtt
@@ -0,0 +1,3 @@
WEBVTT
00:11.000 --> 00:13.000
Test
4 changes: 4 additions & 0 deletions vtt/testdata/signature-space.vtt
@@ -0,0 +1,4 @@
WEBVTT
00:00.000 --> 00:02.000
text
4 changes: 4 additions & 0 deletions vtt/testdata/signature-tab.vtt
@@ -0,0 +1,4 @@
WEBVTT
00:00.000 --> 00:02.000
text
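
The signature fixtures above exercise the first line of the file: `WEBVTT` may be preceded by a UTF-8 byte order mark and must be followed by a space, a tab, or a line terminator. A rough sketch of that rule follows. `validSignature` is a hypothetical function, not the commit's validator, and accepting a file that ends right after `WEBVTT` is an assumption taken from the WebVTT parsing rules rather than from this repository.

```go
package main

import (
	"bytes"
	"fmt"
)

var (
	bom       = []byte{0xEF, 0xBB, 0xBF} // UTF-8 encoding of U+FEFF
	signature = []byte("WEBVTT")
)

// validSignature reports whether data starts with a WebVTT signature.
// Hypothetical helper for illustration only.
func validSignature(data []byte) bool {
	data = bytes.TrimPrefix(data, bom) // an optional BOM is allowed
	if !bytes.HasPrefix(data, signature) {
		return false
	}
	rest := data[len(signature):]
	if len(rest) == 0 {
		return true // assumption: end of input right after the signature is accepted
	}
	switch rest[0] {
	case ' ', '\t', '\n', '\r':
		return true
	}
	return false
}

func main() {
	fmt.Println(validSignature([]byte("WEBVTT - This is totally legit\n")))
	fmt.Println(validSignature([]byte("WEBVTT-I should have a space before this comment\n")))
	fmt.Println(validSignature([]byte("garbage vtt file")))
}
```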
6 changes: 6 additions & 0 deletions vtt/testdata/style-invalid-css.vtt
@@ -0,0 +1,6 @@
WEBVTT
STYLE
::cue {
background-image linear-gradient(to bottom, dimgray, lightgray);
colorpapayawhip
7 changes: 7 additions & 0 deletions vtt/testdata/style.vtt
@@ -0,0 +1,7 @@
WEBVTT
STYLE
::cue {
background-image: linear-gradient(to bottom, dimgray, lightgray);
color: papayawhip;
}
4 changes: 4 additions & 0 deletions vtt/testdata/with-bom.vtt
@@ -0,0 +1,4 @@
WEBVTT
00:00.000 --> 00:02.000
text
25 changes: 25 additions & 0 deletions vtt/testdata/with-header.vtt
@@ -0,0 +1,25 @@
WEBVTT
Kind: captions
Language: en-US
Channel: CC1
Station: Online ABC
ProgramID: SH010855880000
ProgramType: TV series
ProgramName: Castle
Title: Law & Murder
Season: 3
Episode: 19
PublishDate: 2011-03-28
ContentAdvisory: TV-14
00:11.000 --> 00:13.000
<v Roger Bingham>We are in New York City

00:13.000 --> 00:16.000
<v Roger Bingham>We’re actually at the Lucern Hotel, just down the street

00:16.000 --> 00:18.000
<v Roger Bingham>from the American Museum of Natural History

00:18.000 --> 00:20.000
<v Roger Bingham>And with me is Neil deGrasse Tyson