StyleTTS: Reference Audio ineffective #248

andrewhowdencom · 2025-01-12T21:27:51Z

Holaa! I hope you're doing well.

I am trying to get the StyleTTS model working, but cannot for the life of me get the output to change at all based on the "reference audio". Without getting into too many details, the snippet is here:

        # Check to see if the file in the voices path exists before starting the model. Otherwise, the model falls back
        # to a "default voice" somehow.
        p = os.path.join(self.reference_audio_path, f"{voice}.wav")
        if not os.path.isfile(p):
            raise FileNotFoundError(f"Voice file not found: {p}")

        voice = StyleTTSVoice(
            model_config_path=self.model_config_path,
            model_checkpoint_path=self.model_checkpoint_path,
            ref_audio_path=p
        )

        engine = StyleTTSEngine(
            style_root=self.styletts_checkout,
            diffusion_steps=15,
            voice=voice,
        )

        stream = TextToAudioStream(
            engine=engine,
            tokenizer="stanza",
            muted=True,
        )

        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp_wav:  # Create temp file for result
            stream.feed(_split_paragraph(sys.stdin.read())).play(
                output_wavfile=tmp_wav.name,
            )

        engine.shutdown()

        # Convert the temporary wav into the requested output. The output filename is passed directly to ffmpeg,
        # implicitly allowing the user to choose the format (e.g. mp3) through its extension.
        (
            ffmpeg.input(tmp_wav.name)
                .output(to)
                .overwrite_output()
                .run()
        )

Is there something obvious I am missing? It all seems to work similarly to another StyleTTS version (NeuralVox), but I can't quite connect the dots here.

Is there a working example of this?
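
For what it's worth, the existence check in the snippet only proves that the path is a file, not that it is a readable wav. Below is a minimal sketch of a stricter check using only the Python standard library. The helper name `describe_reference_wav` is hypothetical and not part of any library, and the check says nothing about how the engine actually consumes the reference audio; it only rules out a missing or unparseable file as the cause.

```python
import os
import wave


def describe_reference_wav(path):
    """Return basic header facts for a wav file, raising if it is unusable.

    Hypothetical helper for debugging; not part of StyleTTS or any TTS library.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Voice file not found: {path}")

    # wave.open raises wave.Error for data it cannot parse as a wav file.
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_seconds": frames / rate if rate else 0.0,
        }
```

Calling `describe_reference_wav(p)` before building the voice and logging the result would at least confirm that each reference file is distinct, non-empty, and of a plausible length.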
