Streamline PiperEngine.synthesize to allow use of medium- and high-quality models #244

Open · wants to merge 1 commit into master

Conversation

@InspectorCaracal commented Jan 7, 2025

The current implementation requires writing a temporary .wav file and forces the sample rate to 16000 via validation of that file. However:

  • The piper command, run through subprocess.run, can return the raw audio data the player needs, so the file write isn't necessary.
  • Most models are medium-quality and therefore output at 22050 Hz, so restricting the sample rate to 16000 is extremely limiting.
  • The model configs already indicate the sample rate to be used for playback, which the existing playback system handles correctly with no additional effort.

This PR modifies the synthesize method so that the file-output parameter given to Piper becomes raw output that can be added directly to the queue, and removes the WAV-file validation, since reading a WAV file is no longer necessary. The change is primarily intended to allow using all sizes of piper voices, but should also reduce I/O overhead.
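
For reference, the raw-output approach looks roughly like the sketch below. This is a simplification rather than the exact code in the commit, and the flag name --output_raw as well as the attribute and queue names are assumptions:

    import subprocess

    def synthesize(self, text: str) -> bool:
        # Ask piper for raw 16-bit PCM on stdout instead of writing a temp .wav.
        cmd = [
            self.piper_path,              # path to the piper executable
            "--model", self.model_path,   # hypothetical attribute names
            "--output_raw",               # raw audio to stdout (flag name assumed)
        ]
        result = subprocess.run(cmd, input=text.encode("utf-8"), capture_output=True)
        if result.returncode != 0:
            return False
        # The raw PCM bytes go straight onto the playback queue.
        self.queue.put(result.stdout)
        return True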

@KoljaB (Owner) commented Jan 7, 2025

That's a great simplification of the code, thank you for that!

I wasn't aware that piper does not always synthesize at 16000 Hz. This leaves one issue: every engine reports its exact sample rate by implementing get_stream_info so that the output stream gets initialized properly.

Currently this is still hardcoded to 16000 in the PiperEngine, like this:

    def get_stream_info(self):
        """
        Returns PyAudio stream configuration for Piper.

        Returns:
            tuple: (format, channels, rate)
        """
        return pyaudio.paInt16, 1, 16000

By removing the writing of the wav file, we lose the opportunity to read that sample rate directly from the wav.

I'd love to hear your opinion. How would you suggest we handle this? We could keep writing the wav file, which, as you correctly stated, introduces I/O overhead and is therefore ugly. We could also add a parameter to the PiperEngine constructor allowing the user to customize the sample rate. Also not perfect, since it requires active interaction.

Maybe there's a third option that I can't figure out right now. What do you think? Thanks again for the PR.
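
For illustration, the constructor-parameter option might look something like the following minimal sketch; the default value and attribute names here are assumptions, not the actual engine code:

    import pyaudio

    class PiperEngine:
        def __init__(self, sample_rate: int = 16000):
            # The caller would have to pass the rate matching their voice:
            # 16000 for low-quality models, 22050 for medium/high quality.
            self.sample_rate = sample_rate

        def get_stream_info(self):
            return pyaudio.paInt16, 1, self.sample_rate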

@InspectorCaracal (Author)

Ahh, interesting! I didn't notice that since I only tested that the final audio wasn't being played at the wrong speed.

Since piper requires a configuration file in JSON format and has a documented fallback location for when it isn't specified, the best way is probably reading that file and integrating the needed fields into the PiperVoice class, then referencing it from there. Should be straightforward; I'll try it out in a bit.
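
Something along the lines of this rough sketch; the config path convention (the .onnx.json file shipped next to the model) follows the piper voice downloads, while the attribute names are assumptions:

    import json
    import pyaudio

    def get_stream_info(self):
        """
        Returns PyAudio stream configuration for Piper.

        Returns:
            tuple: (format, channels, rate)
        """
        # Piper voice configs ship alongside the model, e.g.
        # en_US-lessac-medium.onnx.json, and include the playback sample rate.
        with open(f"{self.voice.model_file}.json", "r", encoding="utf-8") as f:
            config = json.load(f)
        sample_rate = config["audio"]["sample_rate"]  # e.g. 22050 for medium models
        return pyaudio.paInt16, 1, sample_rate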
