TTS • Custom Speech Synthesis

Wait… what?!?!

I worked with TTS for a short while, mainly via OpenAI’s TTS APIs.

While browsing through the MBS plugins I ran into the Speech section.

Is it possible to sample any human voice and use that to generate my own custom TTS voice?
I looked at the Apple frameworks, and they vaguely point to some possibilities.
If that all works as they promise, it could open up a lot of opportunities, right?

Has anyone out there worked with this kind of custom TTS functionality?

You can use an AI tool like RVC (Retrieval-based Voice Conversion) to train a voice model. Then take any TTS voice that sounds similar to your target voice and pass its output through RVC, which transforms it into something nearly indistinguishable from the source voice. It won’t work in real time, and it runs fastest on Nvidia hardware, but the quality is the best available at the moment.
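
As a rough sketch of that two-step flow (base TTS first, then voice conversion), here is how the glue code might look in Python. Everything named here is an assumption for illustration: `speak_in_custom_voice`, `synthesize` and `convert_voice` are hypothetical names, and the two stages are passed in as plain functions because the actual call depends on which TTS engine and which RVC front end you end up using.

```python
from pathlib import Path
from typing import Callable

# Hypothetical stage signatures (not part of any real library):
#   synthesize(text, out_wav)        -> writes base TTS audio to out_wav
#   convert_voice(in_wav, out_wav)   -> runs a trained RVC model on in_wav
Synthesize = Callable[[str, Path], None]
ConvertVoice = Callable[[Path, Path], None]


def speak_in_custom_voice(
    text: str,
    workdir: Path,
    synthesize: Synthesize,
    convert_voice: ConvertVoice,
) -> Path:
    """Run text through a base TTS voice, then through voice conversion.

    This is an offline, file-based pipeline: each stage finishes and writes
    its output before the next one starts, so it is not real time.
    """
    workdir.mkdir(parents=True, exist_ok=True)
    base_wav = workdir / "base_tts.wav"
    final_wav = workdir / "converted.wav"

    # Step 1: generate speech with a stock TTS voice close to the target.
    synthesize(text, base_wav)
    if not base_wav.exists():
        raise RuntimeError("TTS stage did not produce an audio file")

    # Step 2: pass that audio through the trained voice model.
    convert_voice(base_wav, final_wav)
    if not final_wav.exists():
        raise RuntimeError("Voice conversion stage did not produce an audio file")

    return final_wav
```

You would plug in your own functions for the two stages, for example one that calls your TTS service and saves the result as a WAV file, and one that invokes whatever RVC tool you trained the model with. Keeping the stages separate also makes it easy to swap the base TTS voice until you find one that sits close to the target.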

