Voice
CONCEPT
Alpha
Voice cloning has come a long way, but most cloned voices still sound like a robot. Case in point: billion-dollar companies still can't make Alexa or Siri sound realistic. We start with zero-shot voice cloning, using anywhere from 10 seconds to 3 minutes of audio. The results wow most people at first, but in an actual conversation the discrepancies become easy to point out, especially in longer stretches of speech.
Beta
Our goal is to mix in human-like filler words such as "umm", along with the grammatical goof-ups and gaffes that all of us make at any given moment. How many we add depends on how bad of an English student you are, judging by the conversations you provide. We also mix in laughs, burps, coughs, throat clearing, and similar noises for added realism. For that we need about 60 to 90 minutes of audio.
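To make the idea of mixed-in filler words concrete, here is a minimal sketch in Python. It is an illustration only, not our production pipeline: the function name `add_fillers`, the `FILLERS` list, and the `rate` parameter are all hypothetical, and it works on text alone, before any audio is generated.

```python
import random

# Hypothetical filler list; a real system would learn these from the speaker's audio.
FILLERS = ["umm", "uh", "you know"]


def add_fillers(text, rate=0.15, seed=None):
    """Insert a filler before some words, each with probability `rate`.

    `rate` stands in for how error-prone the speaker's own conversations are:
    0.0 leaves the text untouched, 1.0 puts a filler before every word.
    """
    rng = random.Random(seed)  # a seed makes the result repeatable
    out = []
    for word in text.split():
        if rng.random() < rate:
            out.append(rng.choice(FILLERS) + ",")
        out.append(word)
    return " ".join(out)
```

Calling `add_fillers("I will call you tomorrow", rate=0.2, seed=3)` returns the same sentence with the occasional filler dropped in. A higher `rate` would suit a speaker whose sample conversations are full of them.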
Gamma
Once we have realistically imperfect wording, we introduce changes to simple things like pitch, speed, and pauses. The harder part is incorporating emotional and dramatic nuance into the phrases. For this we recommend about 3 hours of audio, which we clean, fine-tune, and eventually bake into a fully cloned voice.
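One common way to describe pitch, speed, and pauses is SSML, the W3C speech markup language. The sketch below is a hedged illustration and an assumption on our part: nothing above says SSML is what drives the cloned voice. The helper name `to_ssml` and its defaults are hypothetical, and how closely a given speech engine follows these tags varies.

```python
from xml.sax.saxutils import escape


def to_ssml(phrases, pitch="+0%", rate="100%", pause_ms=300):
    """Wrap phrases in SSML prosody markup with a fixed pause between them.

    pitch and rate are passed straight into the <prosody> element;
    pause_ms becomes a <break> between consecutive phrases.
    """
    pause = f'<break time="{pause_ms}ms"/>'
    # Escape &, < and > so the phrases cannot break the markup.
    body = pause.join(escape(p) for p in phrases)
    return f'<speak><prosody pitch="{pitch}" rate="{rate}">{body}</prosody></speak>'
```

For example, `to_ssml(["Well", "I guess so"], pitch="+5%", rate="90%", pause_ms=400)` yields a slightly higher, slightly slower reading with a 400 ms pause after "Well". Emotional and dramatic nuance is the part that simple tags like these do not capture.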