Baidu’s Deep Voice 2 text-to-speech engine can imitate hundreds of human accents

Baidu, the Beijing-based juggernaut that commands 80 percent of the Chinese internet search market, is investing heavily in artificial intelligence. In 2013, it opened the Institute of Deep Learning, an R&D center focused on machine learning. And in May, it took the wraps off the newest version of Deep Voice, its AI-powered text-to-speech engine.

Deep Voice 2, which follows on the heels of Deep Voice’s public debut earlier this year, can produce real-time speech that’s nearly indistinguishable from a human voice. All the more impressive, it needs just thirty minutes of audio to build a working model, and can imitate the regional accents of hundreds of different speakers.

That’s leaps and bounds better than early versions of Deep Voice, which took multiple hours to learn one voice.

The key is Deep Voice 2’s ability to identify similarities between hundreds of different speakers to build a working model of a human voice. It then autonomously derives unique voices from that model. Unlike voice assistants like Apple’s Siri, which require that a human record thousands of hours of speech that engineers tune by hand, Deep Voice 2 doesn’t require guidance or manual intervention.

“Give it the right data, and it can learn on [its] own what sort of features are important,” Andrew Gibiansky, a research scientist at Baidu’s Silicon Valley AI Lab, told The Verge.

Baidu isn’t the only company investing in high-quality text-to-speech tech. Google’s WaveNet, a product of the company’s DeepMind division, generates voices by sampling real human speech and independently creating its own sounds in a variety of voices. Adobe’s Project VoCo transcribes human speech to editable text in real time. And Lyrebird, a Canadian AI startup, licenses algorithms that can imitate any voice with just a single minute of sample audio, create one thousand sentences in less than half a second, and infuse the speech they create with emotions like anger, sympathy, and stress.

But don’t expect Deep Voice 2 or WaveNet to replace Siri, the Google Assistant, or Amazon’s Alexa anytime soon: AI-powered speech synthesis requires more resources than today’s phones can reasonably supply. Still, Baidu sees potential in applications like text-to-speech apps and voice-based assistants. “The ability to quickly synthesize multiple human voices will have a huge effect on products such as personal assistants and eBook readers in the future. For example, each character of your eBook could have a unique voice when you listen to the eBook.”

Kyle Wiggers
Former Digital Trends Contributor
Kyle Wiggers is a writer, Web designer, and podcaster with an acute interest in all things tech. When not reviewing gadgets…