Microsoft’s AI Voice Tool Can Virtually Imitate Anyone Using a Three-Second Sample

Recent advances in artificial intelligence suggest that the technology is finally entering the mainstream, and previously unthinkable uses of AI are becoming common. As frightening as that may seem, there is more to it than the negative side, as we saw when ChatGPT was integrated into multiple services used by millions of people worldwide to make their lives easier and more efficient.

Similarly, we now have information on Microsoft’s approach to an AI voice tool. The company calls it VALL-E (well, we’ve all heard the name somewhere before) and claims that the technology is light years ahead of its competition. Using only a three-second sample, VALL-E can mimic not only a person’s voice (which is really mind-blowing) but also the emotion in it, something that few AI voice tools can do.

Researchers describe the training process as using “discrete codes derived from an off-the-shelf neural audio codec model,” and Microsoft claims that 60,000 hours of English speech data were used. Some samples of the technology can be heard on GitHub. This may sound terrifying if you consider that anyone you talk to online might not be a real person, but the voices aren’t foolproof. If you listen closely, they still have a slightly robotic touch.

The technology is expected to improve over time as more sample data is fed into it, so it’s possible that one day you’ll have a phone conversation with someone who doesn’t exist in the physical world. For now, it is not available to the general public, and its drawbacks currently far outweigh its benefits.

To begin with, tools like this can easily impersonate a person and even make up lines that the target never said. In the wrong hands, the ability to imitate virtually anyone on the planet with a recorded voice could cause widespread confusion and distrust.

There is no immediate danger, but warning signs are appearing, and we could start feeling the effects of this very soon. With that in mind, please share your thoughts on the technology and suggestions for its future application in the comments below.


Muhammad Qasim

Qasim's deep love for technology and gaming drives him to not only stay up-to-date on the latest developments but also to share his informed perspectives with others through his writing. Whether through this or other endeavors, he is committed to sharing his expertise and making a meaningful contribution to the world of tech and gaming.