Homie's gonna have a wild night
If anyone has an Nvidia card and a couple of sound samples of Marge's voice, you can fire up Tortoise TTS and render this in her voice. You don't need a ton of VRAM for this like you do for Stable Diffusion; you can do it on an 8GB card.
Is self-hosted TTS voice replication much better than it was about half a year ago? Last time I tried, it didn't sound like the person I had samples of; I had to use ElevenLabs to get close to sounding right.
I mean, that's a subjective question. I think it's decent. Here are some samples:
https://nonint.com/static/tortoise_v2_examples.html
Last time I was running it, Tortoise TTS didn't have a way to directly annotate text with intonation or emotion. The best you can do is pull tricks, like using a feature that lets you add words to a sentence that aren't actually spoken but that affect the emotional delivery of the words that are (e.g. adding sad words so the spoken words come out in a sad voice).
Imagine the difference between someone saying gloatingly "none of you will survive" and someone saying it in an agonized voice.
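To make that trick concrete, here's a rough sketch of how you might build those prompts. As far as I remember, the Tortoise README describes putting the unspoken words in square brackets in front of the text; treat that exact syntax as an assumption and check the README before relying on it. The helper name `with_emotion_cue` and the cue phrases are made up for illustration and aren't part of Tortoise.

```python
def with_emotion_cue(text: str, cue: str) -> str:
    """Prefix `text` with a bracketed cue meant to color the delivery.

    Assumption: the TTS engine treats the bracketed part as context that
    is not spoken aloud. This helper only builds the string; it does not
    call any TTS library.
    """
    cue = cue.strip().rstrip(",")  # avoid a doubled comma inside the brackets
    return f"[{cue},] {text}"


line = "None of you will survive."

# Same spoken words, two different (hypothetical) emotional cues.
gloating = with_emotion_cue(line, "I am gloating")
agonized = with_emotion_cue(line, "I am in agony")

print(gloating)
print(agonized)
```

You'd then pass each resulting string to the TTS engine as the text to render and compare the two outputs by ear.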
I do wonder a bit whether it'd be possible to train it on a corpus that's been automatically annotated by software that does sentiment analysis on text, and then expose keywords that one could use to alter how sentences sound. I don't think this is so much a fundamental limitation of the software as it is a limitation of the training set.
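Just to illustrate what I mean by automatic annotation, here's a toy sketch that tags each transcript line with a sentiment keyword before it would go into a training set. Everything here is hypothetical: the tiny word list stands in for real sentiment-analysis software, and the `<positive>` / `<negative>` / `<neutral>` tag format is something I made up, not anything Tortoise understands today.

```python
import re

# Toy lexicon standing in for a real sentiment-analysis tool (assumption).
LEXICON = {
    "happy": 1, "great": 1, "love": 1,
    "sad": -1, "terrible": -1, "agony": -1,
}


def sentiment_score(text: str) -> int:
    """Sum the lexicon scores of the words in `text` (unknown words score 0)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def emotion_tag(text: str) -> str:
    """Map the score to a coarse keyword that could be used at training time."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def annotate(transcripts: list[str]) -> list[str]:
    """Prefix each transcript with its (hypothetical) sentiment tag."""
    return [f"<{emotion_tag(t)}> {t}" for t in transcripts]


for annotated in annotate(["I love this great day", "This is terrible"]):
    print(annotated)
```

The idea would be that a model trained on text tagged like this might learn to associate the keyword with how the paired audio sounds, so you could pick the keyword yourself at generation time. Whether that works in practice is an open question.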
So hot. How does she do it?
Sailor Marge
The one time when the wrong number of fingers is the right number of fingers.