Simulators/devices repeating the same words/sentences during the speech #61

Open
RubenMCCarreira opened this issue Nov 21, 2024 · 3 comments

Comments

@RubenMCCarreira

Using the code below, I found that on some devices and simulators everything works fine, but on others the words I say are repeated. For example, if I say "Hello, are you there?", the result is "hello hello are hello are you hello are you there.". I want to show the user the current text while they speak, and keep the full text to send to the caller.

export interface DictateProps {
  onConfirm: (text: string) => void;
}

const Dictate = ({ onConfirm }: DictateProps) => {
  const [recognizing, setRecognizing] = useState(false);
  const [current, setCurrent] = useState<string>("");
  const [final, setFinal] = useState<string>("");

  useSpeechRecognitionEvent("start", () => setRecognizing(true));

  useSpeechRecognitionEvent("end", () => setRecognizing(false));

  useSpeechRecognitionEvent("result", (event) => {
    console.log(Platform.Version, "Dictate: result", event.results); // eslint-disable-line no-console

    if (!event.results?.length) return;

    setCurrent(event.results[0].transcript.toLowerCase());

    if (event.results[0].confidence > 0) {
      setCurrent("");
      setFinal((current) =>
        (current + " " + event.results[0].transcript)
          .trim()
          .replace(/\s{2,}/g, " ")
          .toLowerCase()
      );
    }
  });

  useSpeechRecognitionEvent("error", (event) => {
    console.error("Dictate: error\n", event.error, "\n", event.message); // eslint-disable-line no-console
  });

  const deviceLanguage = useMemo(() => {
    let lang = "";

    if (Platform.OS === "ios") {
      lang =
        NativeModules.SettingsManager.settings.AppleLocale || NativeModules.SettingsManager.settings.AppleLanguages[0];
    } else {
      lang = NativeModules.I18nManager.localeIdentifier;
    }

    lang = lang.replace(/_/, "-");

    return DICTATE_POSSIBLE_LANGUAGES.includes(lang) ? lang : "en-US";
  }, []);

  const onStart = async () => {
    const { granted } = await ExpoSpeechRecognitionModule.requestPermissionsAsync();

    if (!granted) {
      console.warn("Dictate: Permissions not granted"); // eslint-disable-line no-console
      return;
    }

    ExpoSpeechRecognitionModule.start({
      lang: deviceLanguage,
      interimResults: true,
      maxAlternatives: 1,
      continuous: true,
    });

    setCurrent("");
    setFinal("");
  };

  const onCancel = () => {
    ExpoSpeechRecognitionModule.stop();
    setCurrent("");
    setFinal("");
  };

  const handleConfirm = () => {
    ExpoSpeechRecognitionModule.stop();

    const valueToUse = final || current;
    onConfirm(valueToUse ? valueToUse + "\n" : "");
    setCurrent("");
    setFinal("");
  };


  return (
    <>
      <View style={{ alignItems: "center", }}>
        <IconButton
          onPress={onStart}
          style={{
            borderRadius: 100,
            alignItems: "center",
            flexDirection: "row",
            position: "relative",
            justifyContent: "center",
          }}

        />
        <Text value={"speak_description"} />
      </View>

      <View
        style={{
          position: "relative",
          borderRadius: 20,
          width: "100%",
          minHeight: "40%",
          maxHeight: "90%",
          gap: theme.spacings[8],
        }}
      >
        <Pressable onPress={onCancel} style={{ alignSelf: "flex-start" }}>
          <Text value={"cancel"} />
        </Pressable>

        <Text value={`${final}${current}`} />

        <Button onPress={handleConfirm} text={"confirm"} />
      </View>
    </>
  );
};

export default Dictate;

@RubenMCCarreira RubenMCCarreira changed the title Some phones, mostly ios are repeating over and over agian the same words/sentences Simulators/devices repeating the same words/sentences during the speech Nov 21, 2024
@jamsch
Owner

jamsch commented Nov 21, 2024

Hey @RubenMCCarreira, since you're running continuous + interim results, you'll be receiving a list of results like the following:

  • ["hello"]
  • ["hello how"]
  • ["hello how are you"]
  • ["hello how are you doing"] user pauses
  • ["hello how are you doing", "hello how are you today"] { isFinal: true }
  • [" I am"]
  • [" I am doing"] user pauses
  • [" I am doing fine thanks", " I am doing fine thanks"] { isFinal: true }

So that means you should keep two stateful values: a "transcript" to show the user, and a "tally".

  • The "tally" should contain the current transcript up to the last final result
  • The "transcript" should be the "tally" + the latest result
  • When a final result comes in, append it to the tally.

Here's an example from the example app:

useSpeechRecognitionEvent("result", (ev) => {
  console.log("[event]: result", {
    isFinal: ev.isFinal,
    transcripts: ev.results.map((result) => result.transcript),
  });
  const transcript = ev.results[0]?.transcript || "";
  setTranscription((current) => {
    // When a final result is received, any following recognized transcripts will omit the previous final result
    const transcriptTally = ev.isFinal
      ? (current?.transcriptTally ?? "") + transcript
      : (current?.transcriptTally ?? "");
    return {
      transcriptTally,
      transcript: ev.isFinal ? transcriptTally : transcriptTally + transcript,
    };
  });
});
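
To make the tally logic easier to check outside a component, here is a minimal sketch that pulls the same update rule into a pure function and replays the event sequence listed earlier in this comment. The `ResultEvent` and `Transcription` shapes and the `applyResult` name are assumptions made for illustration; the real event object carries more fields.

```typescript
// Assumed minimal shapes for illustration only; not the library's full types.
interface ResultEvent {
  isFinal: boolean;
  results: { transcript: string }[];
}

interface Transcription {
  transcriptTally: string; // everything up to the last final result
  transcript: string; // tally plus the latest result, shown to the user
}

const initialTranscription: Transcription = { transcriptTally: "", transcript: "" };

// Same rule as the handler above: only final results are appended to the tally.
function applyResult(current: Transcription, ev: ResultEvent): Transcription {
  const latest = ev.results[0]?.transcript ?? "";
  const transcriptTally = ev.isFinal
    ? current.transcriptTally + latest
    : current.transcriptTally;
  return {
    transcriptTally,
    transcript: ev.isFinal ? transcriptTally : transcriptTally + latest,
  };
}

// Replay of the example sequence: interim results, then a final one, twice.
const events: ResultEvent[] = [
  { isFinal: false, results: [{ transcript: "hello" }] },
  { isFinal: false, results: [{ transcript: "hello how" }] },
  { isFinal: false, results: [{ transcript: "hello how are you" }] },
  { isFinal: false, results: [{ transcript: "hello how are you doing" }] },
  {
    isFinal: true,
    results: [
      { transcript: "hello how are you doing" },
      { transcript: "hello how are you today" },
    ],
  },
  { isFinal: false, results: [{ transcript: " I am" }] },
  { isFinal: false, results: [{ transcript: " I am doing" }] },
  {
    isFinal: true,
    results: [
      { transcript: " I am doing fine thanks" },
      { transcript: " I am doing fine thanks" },
    ],
  },
];

const finalState = events.reduce(
  (state, ev) => applyResult(state, ev),
  initialTranscription
);
console.log(finalState.transcript);
```

Because interim results replace each other instead of being appended, the repeated "hello hello are hello are you" pattern from the original report does not occur with this rule.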

@RubenMCCarreira
Author

So the issue I am trying to handle now is that, after a couple of seconds of silence, I receive the result and can't speak anymore. Any recommendation or solution? Does this code look like a correct use of the library to you?

  useSpeechRecognitionEvent("result", (ev) => {
    if (ev.isFinal) {
      if (statusRef.current === "confirm") {
        onConfirm(ev.results.map((result) => result.transcript) + "");
        statusRef.current = undefined;
        setTranscription(initialTranscription);
      } else if (statusRef.current === "cancel") {
        statusRef.current = undefined;
        setTranscription(initialTranscription);
      }
      return;
    }

    const transcript = ev.results[0]?.transcript || "";

    setTranscription((current) => {
      const transcriptTally = ev.isFinal ? current.transcriptTally + transcript : current.transcriptTally;

      return {
        transcriptTally,
        transcript: ev.isFinal ? transcriptTally : transcriptTally + transcript,
      };
    });
  });

  useSpeechRecognitionEvent("error", (event) => {
    console.error("Dictate: error\n", event.error, "\n", event.message); // eslint-disable-line no-console
  });

  const onStart = async () => {
    const { granted } = await ExpoSpeechRecognitionModule.requestPermissionsAsync();

    if (!granted) {
      console.warn("Dictate: Permissions not granted"); // eslint-disable-line no-console
      return;
    }

    ExpoSpeechRecognitionModule.start({
      lang: deviceLanguage,
      interimResults: true,
      maxAlternatives: 1,
      continuous: true,
    });
  };

  const onCancel = () => {
    statusRef.current = "cancel";
    ExpoSpeechRecognitionModule.stop();
  };

  const handleConfirm = () => {
    statusRef.current = "confirm";
    ExpoSpeechRecognitionModule.stop();
  };

@jamsch
Owner

jamsch commented Nov 24, 2024

Hey @RubenMCCarreira, I'm not sure what your onConfirm(ev.results.map((result) => result.transcript) + ""); part does here, nor the statusRef & setTranscription(initialTranscription);. There's also an early return which means that the setTranscription(...) logic is only going to be called on interim (non-final) results. What exactly are you trying to achieve here?

Also, just know that you may receive multiple final result events in a speech recognition session and the way you receive them will behave differently depending on a variety of factors (device, speech service, whether it's on-device, etc).

Regarding continuous mode: it isn't currently available on Android 12 and below, so you'll likely need to restart speech recognition in that case.
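
One possible way to structure such a restart, as a rough sketch only: track whether the user still wants to dictate, and start recognition again whenever a session ends on its own. The `Recognizer` interface and `createSession` helper are hypothetical stand-ins, not part of the library; in the component they would wrap `ExpoSpeechRecognitionModule.start(...)` and `.stop()`, with `onEnd` called from the "end" event handler.

```typescript
// Hypothetical stand-in for the speech module; not the library's API.
interface Recognizer {
  start(): void;
  stop(): void;
}

function createSession(recognizer: Recognizer) {
  // True between the user pressing "start" and pressing "confirm"/"cancel".
  let wantsToListen = false;

  return {
    begin(): void {
      wantsToListen = true;
      recognizer.start();
    },
    finish(): void {
      wantsToListen = false;
      recognizer.stop();
    },
    // Call this from the "end" event: restart only if the user has not stopped.
    onEnd(): void {
      if (wantsToListen) recognizer.start();
    },
  };
}

// Fake recognizer that counts calls, to show the behaviour without a device.
let startCount = 0;
let stopCount = 0;
const fakeRecognizer: Recognizer = {
  start: () => {
    startCount += 1;
  },
  stop: () => {
    stopCount += 1;
  },
};

const session = createSession(fakeRecognizer);
session.begin(); // user taps the mic
session.onEnd(); // recognizer ended after silence, so it is restarted
session.finish(); // user taps confirm
session.onEnd(); // recognizer ended because of stop(), so no restart
console.log({ startCount, stopCount });
```

Since each restarted session may deliver its own final result, the tally from earlier final results has to be kept across restarts rather than reset on every start.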
