
Commit a36a57a

Add speech recognition context to the Web Speech API

Introduce a new speech recognition context feature for contextual biasing.

1 parent c0694cb

File tree

2 files changed: +110 −2 lines

.gitignore (+2)

@@ -1 +1,3 @@
 index.html
+.DS_Store
+.idea/

index.bs (+108 −2)

@@ -162,12 +162,14 @@ interface SpeechRecognition : EventTarget {
     attribute boolean interimResults;
     attribute unsigned long maxAlternatives;
     attribute SpeechRecognitionMode mode;
+    attribute SpeechRecognitionContext context;

     // methods to drive the speech interaction
     undefined start();
     undefined start(MediaStreamTrack audioTrack);
     undefined stop();
     undefined abort();
+    undefined updateContext(SpeechRecognitionContext context);
     static Promise<boolean> availableOnDevice(DOMString lang);
     static Promise<boolean> installOnDevice(DOMString lang);

@@ -192,7 +194,8 @@ enum SpeechRecognitionErrorCode {
     "network",
     "not-allowed",
     "service-not-allowed",
-    "language-not-supported"
+    "language-not-supported",
+    "context-not-supported"
 };

 enum SpeechRecognitionMode {
@@ -247,6 +250,30 @@ dictionary SpeechRecognitionEventInit : EventInit {
     unsigned long resultIndex = 0;
     required SpeechRecognitionResultList results;
 };
+
+// The object representing a phrase for contextual biasing.
+[Exposed=Window]
+interface SpeechRecognitionPhrase {
+    constructor(DOMString phrase, optional float boost = 1.0);
+    readonly attribute DOMString phrase;
+    readonly attribute float boost;
+};
+
+// The object representing a list of biasing phrases.
+[Exposed=Window]
+interface SpeechRecognitionPhraseList {
+    constructor();
+    readonly attribute unsigned long length;
+    SpeechRecognitionPhrase item(unsigned long index);
+    undefined addItem(SpeechRecognitionPhrase item);
+};
+
+// The object representing a recognition context collection.
+[Exposed=Window]
+interface SpeechRecognitionContext {
+    constructor(SpeechRecognitionPhraseList phrases);
+    readonly attribute SpeechRecognitionPhraseList phrases;
+};
 </xmp>

 <h4 id="speechreco-attributes">SpeechRecognition Attributes</h4>
@@ -277,6 +304,9 @@ dictionary SpeechRecognitionEventInit : EventInit {
277304

278305
<dt><dfn attribute for=SpeechRecognition>mode</dfn> attribute</dt>
279306
<dd>An enum to determine where speech recognition takes place. The default value is "ondevice-preferred".</dd>
307+
308+
<dt><dfn attribute for=SpeechRecognition>context</dfn> attribute</dt>
309+
<dd>This attribute will set the speech recognition context for the recognition session to start with.</dd>
280310
</dl>
281311

282312
<p class=issue>The group has discussed whether WebRTC might be used to specify selection of audio sources and remote recognizers.
@@ -322,12 +352,22 @@ See <a href="https://lists.w3.org/Archives/Public/public-speech-api/2012Sep/0072
 The user agent must raise an <a event for=SpeechRecognition>end</a> event once the speech service is no longer connected.
 If the abort method is called on an object which is already stopped or aborting (that is, start was never called on it, the <a event for=SpeechRecognition>end</a> or <a event for=SpeechRecognition>error</a> event has fired on it, or abort was previously called on it), the user agent must ignore the call.</dd>

+<dt><dfn method for=SpeechRecognition>updateContext({{SpeechRecognitionContext}} |context|)</dfn> method</dt>
+<dd>
+The updateContext method updates the speech recognition context after the speech recognition session has started.
+If the session has not started yet, the {{SpeechRecognition/context}} attribute should be set instead of calling this method.
+
+When invoked, run the following steps:
+1. If {{[[started]]}} is <code>false</code>, throw an {{InvalidStateError}} and abort these steps.
+1. If the system does not support speech recognition context, fire a {{SpeechRecognitionErrorEvent}} with the {{context-not-supported}} error code and abort these steps.
+1. Update the system's speech recognition context to |context|.
+</dd>
+
 <dt><dfn method for=SpeechRecognition>availableOnDevice({{DOMString}} lang)</dfn> method</dt>
 <dd>The availableOnDevice method returns a Promise that resolves to a boolean indicating whether on-device speech recognition is available for a given BCP 47 language tag. [[!BCP47]]</dd>

 <dt><dfn method for=SpeechRecognition>installOnDevice({{DOMString}} lang)</dfn> method</dt>
 <dd>The installOnDevice method returns a Promise that resolves to a boolean indicating whether the installation of on-device speech recognition for a given BCP 47 language tag initiated successfully. [[!BCP47]]</dd>
-
 </dl>

 When the <dfn>start session algorithm</dfn> with
When the <dfn>start session algorithm</dfn> with
@@ -344,6 +384,9 @@ following steps:
344384
1. If |requestMicrophonePermission| is `true` and [=request
345385
permission to use=] "`microphone`" is [=permission/"denied"=], abort
346386
these steps.
387+
1. If {{SpeechRecognition/context}} is not null and the system does not support
388+
speech recognition context, throw a {{SpeechRecognitionErrorEvent}} with the
389+
{{context-not-supported}} error code and abort these steps.
347390
1. Once the system is successfully listening to the recognition, queue a task to
348391
[=fire an event=] named <a event for=SpeechRecognition>start</a> at [=this=].
349392

@@ -437,6 +480,9 @@ For example, some implementations may fire <a event for=SpeechRecognition>audioe

 <dt><dfn enum-value for=SpeechRecognitionErrorCode>"language-not-supported"</dfn></dt>
 <dd>The language was not supported.</dd>
+
+<dt><dfn enum-value for=SpeechRecognitionErrorCode>"context-not-supported"</dfn></dt>
+<dd>The speech recognition model does not support speech recognition context.</dd>
 </dl>
 </dd>

@@ -515,6 +561,66 @@ For a non-continuous recognition it will hold only a single value.</p>
 Note that when resultIndex equals results.length, no new results are returned, this may occur when the array length decreases to remove one or more interim results.</dd>
 </dl>

+<h4 id="speechreco-phrase">SpeechRecognitionPhrase</h4>
+
+<p>The SpeechRecognitionPhrase object represents a phrase for contextual biasing.</p>
+
+<dl>
+<dt><dfn constructor for=SpeechRecognitionPhrase>SpeechRecognitionPhrase(|phrase|, |boost|)</dfn> constructor</dt>
+<dd>
+When invoked, run the following steps:
+1. If |phrase| is an empty string, throw a "{{SyntaxError}}" {{DOMException}}.
+1. If |boost| is smaller than 0.0 or greater than 10.0, throw a "{{SyntaxError}}" {{DOMException}}.
+1. Construct a new SpeechRecognitionPhrase object with |phrase| and |boost|.
+1. Return the object.
+</dd>
+
+<dt><dfn attribute for=SpeechRecognitionPhrase>phrase</dfn> attribute</dt>
+<dd>This attribute is the text string to be boosted.</dd>
+
+<dt><dfn attribute for=SpeechRecognitionPhrase>boost</dfn> attribute</dt>
+<dd>This attribute is approximately the natural log of how many times more likely the website believes this phrase is than what the speech recognition model expects.
+A valid boost must be a float value in the range [0.0, 10.0], with a default value of 1.0 if not specified.
+A boost of 0.0 means the phrase is not boosted at all; a higher boost means the phrase is more likely to appear.
+A boost of 10.0 means the phrase is extremely likely to appear and should rarely be set.
+</dd>
+</dl>
+
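The boost scale described above can be made concrete with a small sketch of the constructor steps. `PhraseSketch` and its `likelihoodMultiplier()` helper are illustrative names, not part of the spec: the helper only demonstrates the "natural log of the likelihood ratio" reading of boost.

```javascript
// Sketch of the SpeechRecognitionPhrase constructor steps above.
// likelihoodMultiplier() is an illustration of the natural-log reading
// of boost, not a spec API.
class PhraseSketch {
  constructor(phrase, boost = 1.0) {
    // Step 1: an empty phrase is a SyntaxError.
    if (phrase === "") throw new SyntaxError("phrase must be a non-empty string");
    // Step 2: boost must lie in [0.0, 10.0].
    if (boost < 0.0 || boost > 10.0) throw new SyntaxError("boost must be in [0.0, 10.0]");
    this.phrase = phrase;
    this.boost = boost;
  }

  // boost ≈ ln(how many times more likely the phrase is), so the implied
  // likelihood ratio is e^boost: boost 0.0 → ×1, boost 2.0 → ×e² ≈ ×7.39.
  likelihoodMultiplier() {
    return Math.exp(this.boost);
  }
}

const p = new PhraseSketch("contextual biasing", 2.0);
console.log(p.likelihoodMultiplier().toFixed(2)); // "7.39"
```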
+<h4 id="speechreco-phraselist">SpeechRecognitionPhraseList</h4>
+
+<p>The SpeechRecognitionPhraseList object holds a sequence of phrases for contextual biasing.</p>
+
+<dl>
+<dt><dfn constructor for=SpeechRecognitionPhraseList>SpeechRecognitionPhraseList()</dfn> constructor</dt>
+<dd>This constructor returns an empty list.</dd>
+
+<dt><dfn attribute for=SpeechRecognitionPhraseList>length</dfn> attribute</dt>
+<dd>This attribute indicates how many phrases are in the list. The user agent must ensure it is set to the number of phrases in the list.</dd>
+
+<dt><dfn method for=SpeechRecognitionPhraseList>item(|index|)</dfn> method</dt>
+<dd>
+This method returns the SpeechRecognitionPhrase object at |index| in the list.
+When invoked, run the following steps:
+1. If |index| is smaller than 0, or greater than or equal to {{SpeechRecognitionPhraseList/length}}, return null.
+1. Return the SpeechRecognitionPhrase at |index| in the list.
+</dd>
+
+<dt><dfn method for=SpeechRecognitionPhraseList>addItem(|item|)</dfn> method</dt>
+<dd>This method adds the SpeechRecognitionPhrase object |item| to the end of the list.</dd>
+</dl>
+
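The list behaviour above can be sketched as follows; `PhraseListSketch` is an illustrative stand-in for the browser-provided SpeechRecognitionPhraseList, not an implementation of it.

```javascript
// Sketch of the SpeechRecognitionPhraseList behaviour above: length
// reports the number of stored phrases, item() returns null for an
// out-of-range index, and addItem() appends to the end of the list.
class PhraseListSketch {
  #items = [];

  get length() {
    return this.#items.length;
  }

  item(index) {
    // Per the steps above: out-of-range indices yield null, not an exception.
    if (index < 0 || index >= this.#items.length) return null;
    return this.#items[index];
  }

  addItem(item) {
    this.#items.push(item);
  }
}

const list = new PhraseListSketch();
list.addItem({ phrase: "Bikeshed", boost: 2.0 });
list.addItem({ phrase: "WebIDL", boost: 2.0 });
console.log(list.length);          // 2
console.log(list.item(0).phrase);  // "Bikeshed"
console.log(list.item(5));         // null
```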
+<h4 id="speechreco-context">SpeechRecognitionContext</h4>
+
+<p>The SpeechRecognitionContext object holds contextual information to provide to the speech recognition models.</p>
+
+<dl>
+<dt><dfn constructor for=SpeechRecognitionContext>SpeechRecognitionContext(|phrases|)</dfn> constructor</dt>
+<dd>This constructor returns a new SpeechRecognitionContext object containing the SpeechRecognitionPhraseList object |phrases|.</dd>
+
+<dt><dfn attribute for=SpeechRecognitionContext>phrases</dfn> attribute</dt>
+<dd>This attribute represents the phrases to be boosted.</dd>
+</dl>
+
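Assembling a recognition context from these pieces can be sketched end to end. `PhraseListSketch` and `ContextSketch` are stand-ins for the browser's SpeechRecognitionPhraseList and SpeechRecognitionContext constructors; the commented lines at the end show the intended page-side usage per the IDL above, with availability depending on the implementation.

```javascript
// End-to-end sketch of building a recognition context as described above.
// Both classes are illustrative stand-ins for the browser-provided ones.
class PhraseListSketch {
  #items = [];
  get length() { return this.#items.length; }
  item(i) { return (i < 0 || i >= this.#items.length) ? null : this.#items[i]; }
  addItem(item) { this.#items.push(item); }
}

class ContextSketch {
  constructor(phrases) {
    this.phrases = phrases; // the list of phrases to be boosted
  }
}

const phrases = new PhraseListSketch();
phrases.addItem({ phrase: "speech recognition context", boost: 2.0 });
phrases.addItem({ phrase: "contextual biasing", boost: 2.0 });
const context = new ContextSketch(phrases);
console.log(context.phrases.length); // 2

// In a real page (names per the IDL above):
//   recognition.context = new SpeechRecognitionContext(list);
//   recognition.start();
//   // ...and after the session has started:
//   recognition.updateContext(newContext);
```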

 <h3 id="tts-section">The SpeechSynthesis Interface</h3>

 <p>The SpeechSynthesis interface is the scripted web API for controlling a text-to-speech output.</p>
