Simulators/devices repeating the same words/sentences during the speech #61

RubenMCCarreira · 2024-11-21T19:07:58Z

With the code below, everything works fine on some devices and simulators, but on others the words I say are repeated. For example, if I say "Hello, are you there?", the result is "hello hello are hello are you hello are you there". I want to show the user the current text as they speak, and keep the full text to send to the caller.

export interface DictateProps {
  onConfirm: (text: string) => void;
}

const Dictate = ({ onConfirm }: DictateProps) => {
  const [recognizing, setRecognizing] = useState(false);
  const [current, setCurrent] = useState<string>("");
  const [final, setFinal] = useState<string>("");

  useSpeechRecognitionEvent("start", () => setRecognizing(true));

  useSpeechRecognitionEvent("end", () => setRecognizing(false));

  useSpeechRecognitionEvent("result", (event) => {
    console.log(Platform.Version, "Dictate: result", event.results); // eslint-disable-line no-console

    if (!event.results?.length) return;

    setCurrent(event.results[0].transcript.toLowerCase());

    if (event.results[0].confidence > 0) {
      setCurrent("");
      setFinal((current) =>
        (current + " " + event.results[0].transcript)
          .trim()
          .replace(/\s{2,}/g, " ")
          .toLowerCase()
      );
    }
  });

  useSpeechRecognitionEvent("error", (event) => {
    console.error("Dictate: error\n", event.error, "\n", event.message); // eslint-disable-line no-console
  });

  const deviceLanguage = useMemo(() => {
    let lang = "";

    if (Platform.OS === "ios") {
      lang =
        NativeModules.SettingsManager.settings.AppleLocale || NativeModules.SettingsManager.settings.AppleLanguages[0];
    } else {
      lang = NativeModules.I18nManager.localeIdentifier;
    }

    lang = lang.replace(/_/, "-");

    return DICTATE_POSSIBLE_LANGUAGES.includes(lang) ? lang : "en-US";
  }, []);

  const onStart = async () => {
    const { granted } = await ExpoSpeechRecognitionModule.requestPermissionsAsync();

    if (!granted) {
      console.warn("Dictate: Permissions not granted"); // eslint-disable-line no-console
      return;
    }

    ExpoSpeechRecognitionModule.start({
      lang: deviceLanguage,
      interimResults: true,
      maxAlternatives: 1,
      continuous: true,
    });

    setCurrent("");
    setFinal("");
  };

  const onCancel = () => {
    ExpoSpeechRecognitionModule.stop();
    setCurrent("");
    setFinal("");
  };

  const handleConfirm = () => {
    ExpoSpeechRecognitionModule.stop();

    const valueToUse = final || current;
    onConfirm(valueToUse ? valueToUse + "\n" : "");
    setCurrent("");
    setFinal("");
  };


  return (
    <>
      <View style={{ alignItems: "center", }}>
        <IconButton
          onPress={onStart}
          style={{
            borderRadius: 100,
            alignItems: "center",
            flexDirection: "row",
            position: "relative",
            justifyContent: "center",
          }}

        />
        <Text value={"speak_description"} />
      </View>

      <View
        style={{
          position: "relative",
          borderRadius: 20,
          width: "100%",
          minHeight: "40%",
          maxHeight: "90%",
          gap: theme.spacings[8],
        }}
      >
        <Pressable onPress={onCancel} style={{ alignSelf: "flex-start" }}>
          <Text value={"cancel"} />
        </Pressable>

        <Text value={`${final}${current}`} />

        <Button onPress={handleConfirm} text={"confirm"} />
      </View>
    </>
  );
};

export default Dictate;


jamsch · 2024-11-21T21:07:29Z

Hey @RubenMCCarreira, since you're running continuous + interim results, you'll be receiving a list of results like the following:

["hello"]
["hello how"]
["hello how are you"]
["hello how are you doing"] user pauses
["hello how are you doing", "hello how are you today"] { isFinal: true }
[" I am"]
[" I am doing"] user pauses
[" I am doing fine thanks", " I am doing fine thanks"] { isFinal: true }

So that means you should keep two stateful values: a "transcript" to show the user, and a "tally" that accumulates the finalized text.

The "tally" should contain the transcript up to and including the last final result.
The "transcript" should be the "tally" plus the latest (interim) result.
When a final result comes in, append it to the tally.

Here's an example from the example app:

expo-speech-recognition/example/App.tsx

Lines 87 to 106 in eef1cd0

    
    useSpeechRecognitionEvent("result", (ev) => {
      console.log("[event]: result", {
        isFinal: ev.isFinal,
        transcripts: ev.results.map((result) => result.transcript),
      });
      const transcript = ev.results[0]?.transcript || "";
      setTranscription((current) => {
        // When a final result is received, any following recognized transcripts will omit the previous final result
        const transcriptTally = ev.isFinal
          ? (current?.transcriptTally ?? "") + transcript
          : (current?.transcriptTally ?? "");
        return {
          transcriptTally,
          transcript: ev.isFinal ? transcriptTally : transcriptTally + transcript,
        };
      });
    });
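
To check the tally logic without a device, here is a minimal sketch that pulls the same update rule into a plain function and replays a shortened version of the event sequence listed earlier. The names `ResultEvent`, `Transcription`, and `applyResult` are made up for illustration and are not part of expo-speech-recognition; only the `isFinal` / `results[0].transcript` shape is taken from the example above.

```typescript
// Hypothetical types for illustration; they only mirror the fields used in the example above.
interface ResultEvent {
  isFinal: boolean;
  results: { transcript: string }[];
}

interface Transcription {
  transcriptTally: string; // everything up to and including the last final result
  transcript: string; // what is shown to the user right now
}

const initialTranscription: Transcription = { transcriptTally: "", transcript: "" };

// Same rule as the setTranscription updater: only final results extend the tally,
// interim results are shown on top of the tally but never stored in it.
function applyResult(current: Transcription, ev: ResultEvent): Transcription {
  const latest = ev.results[0]?.transcript ?? "";
  const transcriptTally = ev.isFinal
    ? current.transcriptTally + latest
    : current.transcriptTally;
  return {
    transcriptTally,
    transcript: ev.isFinal ? transcriptTally : transcriptTally + latest,
  };
}

// A shortened replay of the sequence described in the comment above.
const events: ResultEvent[] = [
  { isFinal: false, results: [{ transcript: "hello" }] },
  { isFinal: false, results: [{ transcript: "hello how" }] },
  { isFinal: true, results: [{ transcript: "hello how are you doing" }] },
  { isFinal: false, results: [{ transcript: " I am" }] },
  { isFinal: true, results: [{ transcript: " I am doing fine thanks" }] },
];

// Collect what the user would see after each event.
const displayed: string[] = [];
let state: Transcription = initialTranscription;
for (const ev of events) {
  state = applyResult(state, ev);
  displayed.push(state.transcript);
}

console.log(displayed);
```

Because interim results never touch the tally, each one replaces the previous interim text on screen instead of being appended to it, which is what avoids the "hello hello are hello are you" pattern.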

RubenMCCarreira · 2024-11-24T11:21:17Z

So the issue I'm trying to handle now is that, after a couple of seconds of silence, I receive the result and can't speak anymore. Any recommendation or solution? In your view, does this code follow the library's intended usage?

  useSpeechRecognitionEvent("result", (ev) => {
    if (ev.isFinal) {
      if (statusRef.current === "confirm") {
        onConfirm(ev.results.map((result) => result.transcript) + "");
        statusRef.current = undefined;
        setTranscription(initialTranscription);
      } else if (statusRef.current === "cancel") {
        statusRef.current = undefined;
        setTranscription(initialTranscription);
      }
      return;
    }

    const transcript = ev.results[0]?.transcript || "";

    setTranscription((current) => {
      const transcriptTally = ev.isFinal ? current.transcriptTally + transcript : current.transcriptTally;

      return {
        transcriptTally,
        transcript: ev.isFinal ? transcriptTally : transcriptTally + transcript,
      };
    });
  });

  useSpeechRecognitionEvent("error", (event) => {
    console.error("Dictate: error\n", event.error, "\n", event.message); // eslint-disable-line no-console
  });

  const onStart = async () => {
    const { granted } = await ExpoSpeechRecognitionModule.requestPermissionsAsync();

    if (!granted) {
      console.warn("Dictate: Permissions not granted"); // eslint-disable-line no-console
      return;
    }

    ExpoSpeechRecognitionModule.start({
      lang: deviceLanguage,
      interimResults: true,
      maxAlternatives: 1,
      continuous: true,
    });
  };

  const onCancel = () => {
    statusRef.current = "cancel";
    ExpoSpeechRecognitionModule.stop();
  };

  const handleConfirm = () => {
    statusRef.current = "confirm";
    ExpoSpeechRecognitionModule.stop();
  };

jamsch · 2024-11-24T11:29:46Z

Hey @RubenMCCarreira, I'm not sure what your onConfirm(ev.results.map((result) => result.transcript) + ""); part does here, nor the statusRef & setTranscription(initialTranscription);. There's also an early return which means that the setTranscription(...) logic is only going to be called on interim (non-final) results. What exactly are you trying to achieve here?

Also, just know that you may receive multiple final result events in a speech recognition session and the way you receive them will behave differently depending on a variety of factors (device, speech service, whether it's on-device, etc).

Regarding continuous mode: it isn't currently available on Android 12 and below, so you'll likely need to restart speech recognition yourself when it ends on those devices.
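
One way to sketch that restart decision is a small pure function that the "end" handler could consult. This is an assumption-laden illustration, not the library's recommended pattern: `RestartContext` and `shouldRestartOnEnd` are hypothetical names, and it assumes the Android API level is available (in React Native, `Platform.Version` is the API level on Android) and that Android 13 corresponds to API level 33.

```typescript
// Hypothetical helper for illustration; not part of expo-speech-recognition.
interface RestartContext {
  os: "ios" | "android" | "web";
  androidApiLevel?: number; // e.g. Platform.Version on Android (assumption: caller passes it in)
  stoppedByUser: boolean; // true when cancel/confirm triggered the stop
}

// Android 13 is API level 33; anything lower is "Android 12 and below".
const ANDROID_13_API_LEVEL = 33;

function shouldRestartOnEnd(ctx: RestartContext): boolean {
  // Never restart after the user explicitly cancelled or confirmed.
  if (ctx.stoppedByUser) return false;
  // Only the Android-12-and-below case from the comment above is handled here.
  if (ctx.os !== "android") return false;
  // With an unknown API level, stay conservative and do not restart.
  if (ctx.androidApiLevel === undefined) return false;
  return ctx.androidApiLevel < ANDROID_13_API_LEVEL;
}

console.log(shouldRestartOnEnd({ os: "android", androidApiLevel: 31, stoppedByUser: false }));
```

In the component, the "end" handler would call `ExpoSpeechRecognitionModule.start(...)` again with the same options when this returns true, while keeping the tally so text from the previous session is not lost. A ref set by the cancel and confirm handlers could supply `stoppedByUser`.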

RubenMCCarreira changed the title ~~Some phones, mostly ios are repeating over and over agian the same words/sentences~~ Simulators/devices repeating the same words/sentences during the speech Nov 21, 2024
