‘Instant’ voice cloning is a form of one-shot learning: the system has only a few utterances from the target speaker and must reproduce that person's voice from them. Traditional voice synthesis can require large amounts of recorded data and hours to days of training to build a model, whereas instant voice cloning models need only several seconds of recordings and can reconstruct a speaker in minutes. This is possible because the underlying deep learning models are trained in advance on very large speech datasets, often terabytes of voice recordings, from which they learn nuances such as pitch, tone and speed.
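To make the "few utterances" idea concrete, here is a minimal sketch assuming a system that represents each utterance as a fixed-length numeric vector (a speaker embedding) and averages a handful of them into one speaker profile. The article does not say how any particular product works internally; the function names and the three-dimensional toy vectors below are hypothetical and purely illustrative.

```python
import math


def mean_embedding(utterance_embeddings):
    """Average several per-utterance vectors into a single speaker vector."""
    count = len(utterance_embeddings)
    dim = len(utterance_embeddings[0])
    return [sum(vec[i] for vec in utterance_embeddings) / count for i in range(dim)]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy data (assumption): three short recordings of the target speaker,
# already converted to embeddings by some hypothetical pretrained encoder.
target_utterances = [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1], [0.8, 0.2, 0.0]]
target_speaker = mean_embedding(target_utterances)

cloned_output = [0.9, 0.1, 0.05]   # embedding of synthesized speech (made up)
other_speaker = [0.0, 1.0, 0.2]    # embedding of an unrelated voice (made up)

print("clone vs. target:", cosine_similarity(target_speaker, cloned_output))
print("other vs. target:", cosine_similarity(target_speaker, other_speaker))
```

In this toy setup a good clone scores close to 1.0 against the target profile, while an unrelated voice scores much lower. Real systems use embeddings with hundreds of dimensions produced by a trained network.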
In terms of speed, instant voice cloning cuts the time needed to replicate a new voice, and some systems produce one in under 5 minutes. That fast turnaround has been attractive in gaming, content creation and customer service, where firms want quick results along with customized tools. For example, Lyrebird has stated that it uses the technology to create custom voices for virtual assistants, which it says reduces production costs by 30% compared with traditional processes.
Another important aspect is the accuracy of these models. Voices generated by Respeecher have been reported to have an error rate of just 1-2%, meaning that voices can be cloned almost perfectly, and the company continues to research this type of technology. This level of precision has made voice cloning a story in the entertainment world, where projects are using it to restore voices or even dub actors into other languages.
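The article does not say which metric sits behind the 1-2% figure. As one illustration of how an "error rate" for synthesized speech can be measured, the sketch below computes a word error rate (WER): transcribe the cloned audio, then count the word-level edits needed to turn the transcript into the intended script. Treat the choice of WER as an assumption, not as the vendor's method; the function name and the sample sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two texts, divided by reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # Dynamic-programming edit distance, keeping only the previous row.
    prev_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr_row.append(min(
                prev_row[j] + 1,                      # word missing from hypothesis
                curr_row[j - 1] + 1,                  # extra word in hypothesis
                prev_row[j - 1] + substitution_cost,  # match or substitution
            ))
        prev_row = curr_row
    return prev_row[-1] / len(ref_words)


script = "the quick brown fox"
transcript_of_clone = "the quick brown box"  # hypothetical transcription result
print(f"WER: {word_error_rate(script, transcript_of_clone):.0%}")
```

A WER in the low single digits would mean that roughly one or two words in a hundred are rendered incorrectly. It says nothing about how closely the timbre matches the original speaker, which needs a separate measure.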
On the flip side, instant voice cloning raises deep ethical questions. In one prominent case, a cloned voice was used in an act of deepfake fraud that cost one European company millions. The incident makes clear the need for regulation and authentication to prevent abuse.
Free versions of instant voice cloning software usually come with restrictions, such as fewer customization features and lower-quality audio. Premium offerings are more polished, but can range from $30 to as much as $500 per month depending on which features you need and how often you use the service.
For people who want to explore what instant voice cloning can do, such services are a good first pass that will give you an idea of what sort of magic (or lack thereof) AI can pull off with your speech.