Skip to content

Add speech recognition phrase list to the Web Speech API #145

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 14, 2025

Conversation

yrw-google
Copy link
Contributor

@yrw-google yrw-google commented Feb 24, 2025

Introduce a new speech recognition phrases feature for contextual biasing


Preview | Diff

@yrw-google
Copy link
Contributor Author

yrw-google commented Feb 27, 2025

Hi @padenot, can you review this one too and also take a look at the explainer again when you get a chance? Feel free to assign another reviewer too if needed

index.bs Outdated
[Exposed=Window]
interface SpeechRecognitionPhraseList {
readonly attribute unsigned long length;
getter SpeechRecognitionPhrase item(unsigned long index);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is invalid WebIDL.

It would be either:

SpeechRecognition item(unsigned long index);

or

getter SpeechRecognition(unsigned long index)

What is the intent here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed this to SpeechRecognitionPhrase item(unsigned long index), but using getter is actually how SpeechRecognitionResultList is doing it, as well as some list objects I've seen in other specs, e.g. https://html.spec.whatwg.org/multipage/common-dom-interfaces.html#the-domstringlist-interface. I thought it's a standard thing to always define a getter for a list, but I don't see that getter is required in our use case, so I can either keep it or remove it.

@yrw-google yrw-google force-pushed the main branch 5 times, most recently from 5be471c to f1cd16f Compare March 4, 2025 23:21
@yrw-google
Copy link
Contributor Author

Hi @padenot, I've updated the specs as well as the explainer according to your comments. Please take a look again when you get a chance. Thanks!

@padenot
Copy link
Member

padenot commented Mar 5, 2025

Can we please focus on either the explainer or the spec patch? If we have a spec patch, an explainer shouldn't be necessary. If we aren't comfortable writing the spec patch right now because we want to iterate, it doesn't seem useful to update the spec patch.

Let me know which one I should look at first please?

@yrw-google
Copy link
Contributor Author

Hi @padenot, you can focus on the spec patch right now. I'm keeping the spec patch and the explainer in sync and the spec patch has many more details than the explainer, so if we can reach consensus on the spec, we can also reach consensus on the explainer easily.

I think the explainer is still necessary when we want to launch this feature since we will be asked for a link to an explainer in many places. The explainer is also a good place to show why we want to add contextual biasing, and provides brief introduction on the changes for people who don't want to learn about every detail in specs.

@padenot
Copy link
Member

padenot commented Mar 10, 2025

I think the explainer is still necessary when we want to launch this feature since we will be asked for a link to an explainer in many places. The explainer is also a good place to show why we want to add contextual biasing, and provides brief introduction on the changes for people who don't want to learn about every detail in specs.

It's not necessary. Details can and should be in informative notes or MDN.

@yrw-google
Copy link
Contributor Author

I think the explainer is still necessary when we want to launch this feature since we will be asked for a link to an explainer in many places. The explainer is also a good place to show why we want to add contextual biasing, and provides brief introduction on the changes for people who don't want to learn about every detail in specs.

It's not necessary. Details can and should be in informative notes or MDN.

I've closed the PR for explainer. Can you now review the spec changes?

@yrw-google yrw-google requested a review from padenot March 10, 2025 22:17
Copy link
Member

@padenot padenot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting there! lmk if I have been unclear in certain parts of the review.

@yrw-google
Copy link
Contributor Author

Hi @padenot, I decided to remove SpeechRecognitionContext and updateContext() since they seem redundant at this point and can cause confusion, so instead we will add SpeechRecognitionPhraseList phrases to SpeechRecognition directly and always update this attribute. I've also addressed your other comments so please take a look again.

@yrw-google yrw-google requested a review from padenot March 13, 2025 23:49
Copy link
Member

@padenot padenot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking a lot better, some questions still, but it feels we're almost there.

@yrw-google
Copy link
Contributor Author

Hi @padenot, a gentle ping for reviewing this

Copy link
Member

@padenot padenot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there!

index.bs Outdated
The setter steps are:
1. If the {{SpeechRecognitionPhraseList/length}} of {{SpeechRecognition/phrases}} is greater than 0
and the system using the given value for {{SpeechRecognition/[[mode]]}} does not support contextual biasing,
throw a {{SpeechRecognitionErrorEvent}} with the {{SpeechRecognitionErrorCode/not-allowed}} error code and abort these steps.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not-supported for consistency? Or we want to clearly distinguish the two? What do we lose by using "normal" errors here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that when user is trying to set the mode here but get a phrases-not-supported error, it seems weird that the error is not related to mode directly, but we can also add error message to explain this better, so I'll switch to phrases-not-supported for consistency.

yrw-google and others added 3 commits March 31, 2025 15:24
Introduce a new speech recognition context feature for contextual biasing
Remove SpeechRecognitionContext and add SpeechRecognitionPhraseList to SpeechRecognition directly

Remove updateContext and always update phrases instead

Rename context-not-supported error code to phrases-not-supported

Add removeItem to SpeechRecognitionPhraseList
@yrw-google yrw-google changed the title Add speech recognition context to the Web Speech API Add speech recognition phrase list to the Web Speech API Apr 10, 2025
@yrw-google
Copy link
Contributor Author

Hi @padenot, another ping on reviewing this.

@yrw-google
Copy link
Contributor Author

Hi @padenot, thanks for approving but I didn't find how I can merge this pull request. Since it shows "1 workflow awaiting approval" and "This workflow requires approval from a maintainer", does it mean you need to run something on your side to start the merge?

@evanbliu evanbliu merged commit bc21038 into WebAudio:main Apr 14, 2025
1 check passed
github-actions bot added a commit that referenced this pull request Apr 14, 2025
SHA: bc21038
Reason: push, by evanbliu

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@padenot
Copy link
Member

padenot commented Apr 15, 2025

Thanks @evanbliu for sorting it out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants