Issue#13 transcribe #209
…tle taken from iota.subject
…ranscribe event , got rid of transcribe in create partcipant
app/models/transcribe.js
class Transcribe extends MongoModels {
  static create(obj) {
    return new Prommise(async, (ok, ko) => {
JS Promises should be spelled with one m. async followed by a comma seems to be a syntax error. ko and ok are parameters but are vague.
Is it possible you are looking at an older commit? This file was deleted 2 weeks ago.
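For reference, the review points above could be addressed along these lines. This is only a sketch: FakeModel is a hypothetical stand-in for the MongoModels base class so the example runs without a database, and it assumes insertOne resolves to an array of inserted documents, as the reviewed code implies.

```javascript
// Hypothetical stand-in for MongoModels so the sketch runs without MongoDB.
// ASSUMPTION: insertOne resolves to an array of the inserted documents.
class FakeModel {
  constructor(attrs) {
    Object.assign(this, attrs)
  }
  static async insertOne(doc) {
    return [{ _id: 'fake-id', ...doc }]
  }
}

class Transcribe extends FakeModel {
  // An async method already returns a Promise, so no explicit
  // `new Promise(...)` wrapper is needed; throwing rejects the Promise.
  static async create(obj) {
    const doc = new Transcribe(obj)
    const results = await this.insertOne(doc)
    if (!results || results.length !== 1) {
      const count = results ? results.length : 0
      throw new Error(`unexpected number of results received ${count}`)
    }
    return results[0]
  }
}
```

With this shape the ok/ko (or resolve/reject) naming question goes away, because the method body never touches the Promise callbacks directly.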
app/models/transcribe.js
      const result = await this.insertOne(doc)
      if (result && result.length === 1) ok(result[0])
      else {
        const msg = ` unexpected number of results receivec ${results.length}`
received is misspelled.
see above
(ok, ko) came from me, and I learned it from someone else. I like it
because it's way shorter than resolve and reject, the two are the
reverse of each other, and if someone throws a punch (error) you get
knocked out (ko'd).
I have Code Spell Checker installed in VS Code to help with spelling, but it
has its disadvantages too, like it doesn't recognize variable names.
It would give you a little squiggly under Prommise and under receivec.
…On 6/15/2020 11:53 AM, MrNanosh wrote:
***@***.**** commented on this pull request.
------------------------------------------------------------------------
In app/models/transcribe.js
<#209 (comment)>:
> +
+const Joi = require('joi')
+const MongoModels = require('mongo-models')
+
+const schema = Joi.object({
+ _id: Joi.object(),
+ path: Joi.string(),
+ subject: Joi.string().required(),
+ description: Joi.string().required(),
+ component: Joi.object(),
+ userId: Joi.string(),
+})
+
+class Transcribe extends MongoModels {
+ static create(obj) {
+ return new Prommise(async, (ok, ko) => {
JS Promises should be spelled with one m. async followed by a comma
seems to be a syntax error. ko and ok are parameters but are vague.
------------------------------------------------------------------------
In app/models/transcribe.js
<#209 (comment)>:
> + path: Joi.string(),
+ subject: Joi.string().required(),
+ description: Joi.string().required(),
+ component: Joi.object(),
+ userId: Joi.string(),
+})
+
+class Transcribe extends MongoModels {
+ static create(obj) {
+ return new Prommise(async, (ok, ko) => {
+ try {
+ const doc = new Transcribe(obj)
+ const result = await this.insertOne(doc)
+ if (result && result.length === 1) ok(result[0])
+ else {
+ const msg = ` unexpected number of results receivec ${results.length}`
received is misspelled.
) // .some to stop after finding the first one
if (transcribe) {
  if (!parentIota.webComponent) parentIota.webComponent = {}
  if (!parentIota.webComponent.metaTags) parentIota.webComponent.metaTags = []
The metaTags here are for sharing information with Facebook about this page. The transcription information should be put into webComponent.participants[the participant]. It's going to be a little challenging to figure out which participant entry is for which transcription, but I think you can use the userId field to make the association.
I admit the description of the structure of the parentIota here is very weak. The best description is in https://github.com/EnCiv/undebate/blob/master/app/components/data-components/merge-participants.js if you go down to the part where it says "what we are trying to create".
Also, I see that, at least as documented, the userId is not there. We may need to find some other way to associate the transcription with the participant, or we may just need to have the mergeParticipants component add the userId.
The only thing I can come up with is that we can match by URL, for example: "https://res.cloudinary.com/hf6mryjpf/video/upload/v1566510654/5d5b73c01e3b194174cd9b92-1-speaking.webm"
I do see this being highly inefficient. I think the best method will be, as you suggested, a userId.
I have been thinking about Hartford and how to have a second and third round of undebates, meaning the first round is BP's questions and the second round is Hartford's questions. I think that userId will not be sufficient. I am thinking that when participants are merged, mergeParticipant into parent needs to get the participant Iota's _id. Then, when merging a transcription, you can look for the _id corresponding to the one you transcribed. This also solves the problem of a participant re-recording their answers, which creates a new participant record that doesn't have a transcription yet. If you update the last forEach loop of merge-participants-into-parent.js you'll get both participantId and userId.
limitedLatestParticipants.forEach(participantDoc => {
  parentIota.webComponent.participants[audience + nextIndex++] = {
    participantId: participantDoc._id.toString(),
    userId: participantDoc.userId,
    ...participantDoc.component.participant,
  }
})
This code is not tested, though.
Note that a weirdism of Mongo is that _id is an object. In this project I have made up a convention that _id is the only place where it's an object, so parentId, userId, and now participantId are all strings. I've just had too much trouble with inconsistency about this in past projects. If you know of other conventions or anything, I'd be excited to talk about it.
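Once participants carry a participantId, the merge step for transcriptions could look roughly like the sketch below. It is untested against the real data: the function name mergeTranscriptions, the component.participantId and component.transcription fields on the transcription iota, and the transcription field written onto the participant are all assumptions made for illustration.

```javascript
// Attach each transcription to the participant it was made from.
// ASSUMPTION: each transcription iota stores the source participant's id as a
// string in component.participantId and its text in component.transcription.
function mergeTranscriptions(parentIota, transcriptionIotas) {
  const participants = (parentIota.webComponent && parentIota.webComponent.participants) || {}
  // Index participants by participantId so each lookup is a single Map access.
  const byParticipantId = new Map(Object.values(participants).map(p => [p.participantId, p]))
  for (const iota of transcriptionIotas) {
    const participant = iota.component && byParticipantId.get(iota.component.participantId)
    if (participant) participant.transcription = iota.component.transcription
  }
  return parentIota
}

// Sample data: the same user recorded twice, only the second was transcribed.
const parentIota = {
  webComponent: {
    participants: {
      human1: { participantId: 'a1', userId: 'u1' },
      human2: { participantId: 'b2', userId: 'u1' },
    },
  },
}
mergeTranscriptions(parentIota, [
  { component: { component: 'Transcription', participantId: 'b2', transcription: 'hello' } },
])
```

Matching on participantId rather than userId keeps a re-recorded answer from picking up the transcription of the earlier recording.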
Would the path field ("path": "/schoolboard-demo") be a better way to match?
// the list is sorted by date, find the first / youngest child with a socialpreview
let transcribe
childIotas.some(iota =>
  iota.component && iota.component.component === 'Transcription' ? (transcribe = iota) : false
If there are multiple candidates in an undebate, there will be multiple transcription records, one for each user. This is different from smpreview, where there was only one preview for the whole page. I think you are going to have to do something like:
let transcribes = childIotas.reduce((transcribes, iota) => (iota.component && iota.component.component === 'Transcription' ? (transcribes.push(iota), transcribes) : transcribes), [])
transcribes will be a list of the iotas that are transcriptions.
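The same collection step can also be written with filter, which avoids the accumulator bookkeeping. A small sketch with made-up sample iotas (the _id values and the 'SocialPreview' component name are illustrative only):

```javascript
// True only for iotas whose component is a Transcription.
const isTranscription = iota => Boolean(iota.component && iota.component.component === 'Transcription')

const childIotas = [
  { _id: '1', component: { component: 'Transcription' } },
  { _id: '2', component: { component: 'SocialPreview' } },
  { _id: '3' }, // no component at all
  { _id: '4', component: { component: 'Transcription' } },
]

// filter keeps the original (date-sorted) order of the matching iotas.
const transcribes = childIotas.filter(isTranscription)
console.log(transcribes.map(iota => iota._id)) // [ '1', '4' ]
```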
app/server/events/transcribe.js
let convertedFile = speakingFile.replace('.mp4', '.wav')
let chunkedFile = fs.createWriteStream('chunkedFile.wav', 'base64')
let request = https.get(convertedFile, function(resp) {
  logger.info('Status code is:' + Object.getOwnPropertyNames(resp))
Here is a challenge: I think that after you do https.get, resp.body will be the wav file data and all you need to do is audioString = resp.body.toString('base64'). But I'm not sure, and there may be another layer of structure within resp.body that you need to drill into. Let's try to save the time of writing to a file and reading it back out. There may be format issues to resolve, but I bet we can figure it out by looking into what comes back in resp.body and maybe comparing that to what we see in audioString if you run it now.
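One note on the idea above: with Node's built-in https module, the resp passed to the https.get callback is a readable stream rather than an object with a body property, so the data has to be collected from 'data' events. A minimal sketch of skipping the temporary file follows; the helper names streamToBase64 and fetchAsBase64 are made up for this example.

```javascript
const https = require('https')

// Collect every chunk of a readable stream and return it as one base64 string.
function streamToBase64(stream) {
  return new Promise((resolve, reject) => {
    const chunks = []
    stream.on('data', chunk => chunks.push(chunk))
    stream.on('end', () => resolve(Buffer.concat(chunks).toString('base64')))
    stream.on('error', reject)
  })
}

// Download a URL and resolve with its body encoded as base64.
function fetchAsBase64(url) {
  return new Promise((resolve, reject) => {
    https
      .get(url, resp => {
        if (resp.statusCode !== 200) {
          resp.resume() // discard the body so the socket is released
          reject(new Error(`unexpected status code ${resp.statusCode}`))
          return
        }
        streamToBase64(resp).then(resolve, reject)
      })
      .on('error', reject)
  })
}
```

Usage would be along the lines of fetchAsBase64(convertedFile).then(audioString => main(audioString)), with the file write and read removed.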
enciv-transcribe.json
@@ -0,0 +1,12 @@
{
Danger! You should not check private_key and similar things into GitHub. I see you added it to .gitignore, but that was probably after there was a git add .
You will need to move the file out of the directory, do another git add ., and then a git commit -m "removed enciv-transcribe.json" in order to get rid of it.
I'm not sure how this code is getting the keys into the API call. Are they set in env variables somewhere? We'll need to document it somehow so people (like me) can set it up.
Also, you are going to need to get new keys, since these are now public.
I was thinking of adding installation instructions to the README after we do the merge to master. To get a key you need to enter your credit card info; I hope this isn't a barrier for new developers.
app/server/events/transcribe.js
    main(audioString).catch(console.error)
  })
})
async function main(audioByte) {
I suggest that variable names of lists/arrays end in s like audioBytes
app/server/events/transcribe.js
  languageCode: 'en-US',
  enableWordTimeOffsets: true,
}
const request = {
Since main is called from inside the https.get callback, and the result of that https.get call is already assigned to a variable called request, it is better not to name a const request here. It is confusing and might also cause errors.
enciv-transcribe.json
"type": "service_account",
"project_id": "enciv-1583191497701",
"private_key_id": "[REDACTED]",
"private_key": "-----BEGIN PRIVATE KEY-----\n[REDACTED]\n-----END PRIVATE KEY-----\n",
Private keys should go in some sort of env file that is untracked. This should be scrubbed from the git history.
In your .bashrc file create:
export GOOGLE_TRANSCRIPTION_PROJECT_ID="..."
export GOOGLE_TRANSCRIPTION_KEY_ID="..."
export GOOGLE_TRANSCRIPTION_KEY="..."
Then you can run:
source ~/.bashrc
heroku config:set GOOGLE_TRANSCRIPTION_PROJECT_ID="$GOOGLE_TRANSCRIPTION_PROJECT_ID"
heroku config:set GOOGLE_TRANSCRIPTION_KEY_ID="$GOOGLE_TRANSCRIPTION_KEY_ID"
heroku config:set GOOGLE_TRANSCRIPTION_KEY="$GOOGLE_TRANSCRIPTION_KEY"
If you have trouble: on Heroku I had to go in through the web user interface, go to my app, go to Settings, and Reveal Config Vars, and then edit GOOGLE_TRANSCRIPTION_KEY to add a newline at the end (meaning go to the end of the line and hit return so it looks like two lines). I have this same sort of key configuration for Gmail from the server and I had to figure that out.
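On the Node side, the three variables could be read back into a credentials object roughly as sketched below. The function name credentialsFromEnv is made up, and the handling of literal "\n" sequences is an assumption about how the key may end up stored; how the object is then handed to the transcription client is not shown, and other fields from the service account file may be needed as well.

```javascript
// Build a credentials object from the environment variables named above.
// ASSUMPTION: the private key may be stored with literal "\n" sequences
// instead of real newlines, so they are converted back here.
function credentialsFromEnv(env = process.env) {
  const names = ['GOOGLE_TRANSCRIPTION_PROJECT_ID', 'GOOGLE_TRANSCRIPTION_KEY_ID', 'GOOGLE_TRANSCRIPTION_KEY']
  const missing = names.filter(name => !env[name])
  // Fail early with a clear message so setup problems are easy to diagnose.
  if (missing.length) throw new Error(`missing environment variables: ${missing.join(', ')}`)
  return {
    type: 'service_account',
    project_id: env.GOOGLE_TRANSCRIPTION_PROJECT_ID,
    private_key_id: env.GOOGLE_TRANSCRIPTION_KEY_ID,
    private_key: env.GOOGLE_TRANSCRIPTION_KEY.replace(/\\n/g, '\n'),
  }
}
```

Passing the environment in as a parameter keeps the function easy to exercise without touching the real process.env.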
…able are not defined
… into issue#13-transcribe
Cloudinary process: We use the Cloudinary API to upload the video file to Cloudinary and have it passed to the Google speech-to-text API. We then take the video URL and replace the .mp4 extension with .transcript. Once we do that, we can request that URL and extract the contents of the .transcript file.
Just reached out to Cloudinary; this is what they said: "The transcription gets queued in an async process, so you'll need to wait for that process to finish. I did notice that the documentation doesn't include those details, so I will have them add it but first let me test it." I will try to get an estimate of the wait time. If the wait time is too long, we can just do Google transcribe streaming.
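Because the transcript is produced asynchronously, a first request for the .transcript URL may fail simply because the file does not exist yet. A rough sketch of the URL swap described above plus a retry loop; the names transcriptUrlFor and pollForTranscript, the injected fetchText function, and the default attempt count and delay are all assumptions for illustration.

```javascript
// Derive the transcript URL as described: swap the .mp4 extension for .transcript.
function transcriptUrlFor(videoUrl) {
  return videoUrl.replace(/\.mp4$/, '.transcript')
}

// Retry fetchText(url) until it resolves, waiting delayMs between attempts.
// fetchText is injected (hypothetical) so it can wrap https.get or a test stub.
async function pollForTranscript(videoUrl, fetchText, { attempts = 10, delayMs = 30000 } = {}) {
  const url = transcriptUrlFor(videoUrl)
  let lastError
  for (let i = 0; i < attempts; i++) {
    try {
      return await fetchText(url)
    } catch (err) {
      lastError = err
      // Only wait if another attempt is coming.
      if (i < attempts - 1) await new Promise(resolve => setTimeout(resolve, delayMs))
    }
  }
  const reason = lastError ? lastError.message : 'no attempts made'
  throw new Error(`transcript not ready after ${attempts} attempts: ${reason}`)
}
```

Polling is only a fallback; a notification callback, if one is configured, avoids the repeated requests.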
In
https://cloudinary.com/documentation/google_ai_video_transcription_addon#:~:text=With%20the%20Google%20AI%20Video,best%20possible%20speech%20recognition%20results.
it says:
The google_speech parameter value activates a call to Google's Cloud
Speech API, which is performed _asynchronously after your original
method call is completed_. Thus your original method call response
displays a pending status:
...
"info": {
"raw_convert": {
"google_speech": {
"status": "pending"
}
}
}
...
When the google_speech request is complete (may take several seconds
or minutes depending on the length of the video), a new raw file is
created in your account with the same public ID as your video or
audio file and with the .transcript file extension.
If you also provided a notification_url in your method call, the
specified URL then receives a notification when the process completes:
Here is the documentation on notifications:
https://cloudinary.com/documentation/notifications
We are going to have to create a this.app.post(...) handler in
server.js. We shouldn't keep putting feature-specific code in there,
but let's get it working first and then clean it up. I've got time to
talk this through on Friday if anyone's available.
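A rough sketch of what such a handler could look like, kept separate from server.js so it can be moved later. The payload field names info_kind, info_status and public_id are assumptions that should be checked against the Cloudinary notifications documentation, the function names are made up, and the handler assumes JSON body parsing is already configured so that req.body is an object.

```javascript
// Return the public ID when a notification says a transcript is complete, else null.
// ASSUMPTION: info_kind, info_status and public_id are the relevant payload fields.
function transcriptReadyPublicId(body) {
  if (!body || body.info_kind !== 'google_speech') return null
  if (body.info_status !== 'complete') return null
  return body.public_id || null
}

// Express-style handler factory; could be mounted with this.app.post(path, handler).
// onTranscriptReady is a hypothetical callback that fetches and stores the transcript.
function makeTranscriptNotificationHandler(onTranscriptReady) {
  return (req, res) => {
    const publicId = transcriptReadyPublicId(req.body)
    if (publicId) onTranscriptReady(publicId)
    // Always acknowledge so unrelated notifications are not treated as failures.
    res.sendStatus(200)
  }
}
```

Keeping the payload check in its own function makes it testable without starting the server.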
closes issue#13