This humanlike synthesized speech could be the future of audiobooks

Synthesized voices like those used by Siri and Alexa are fine for telling us the day’s weather forecast or how many minutes remain on a cooking timer, but would you really want their flat, monotonous tones reading you audiobooks? Probably not, which is why most of us turn to human-voiced services like Audible to get our audiobook fix. Human voice actors might not get the nod for too much longer, however, due to the pioneering work of a London-based startup called DeepZen.

Using artificial intelligence algorithms, augmented by the technological firepower of IBM’s Power A.I. and Watson technologies, DeepZen has developed text-to-speech tools that not only sound human at first listen, but can also pick up on the emotional cues needed for reading text in a compelling manner. In doing so, the company claims that it could reduce the time and cost to produce audiobooks by up to 90%.

“Our system is truly revolutionary,” Taylan Kamis, CEO and co-founder of DeepZen, told Digital Trends. “It works using deep learning and neural networks to understand how a human talks and reads. We then train the system so it can recognize where to apply the right emotions and intonation when reading a piece of text. The result is humanlike speech very closely resembling the real thing.”

Inevitably, work like this can be cast as yet another example of cutting-edge A.I. tools threatening a human profession. In this case, that profession involves actors who, despite what a few high-profile figures are able to achieve, don’t have the most stable careers as it is. It would be naive to think that software such as this won’t have an impact on the future of voice actors, but, as Kamis points out, there are plenty of scenarios in which tools such as DeepZen’s could be a net positive for humanity.

For example, it could make possible the creation of audiobooks based on works by new and emerging writers, or from publishers who don’t have the luxury of big budgets. It could also be used to help develop superior text-to-speech tools for people who have dyslexia or otherwise have trouble reading.

“As for the future, we are also looking at producing voice-overs for the video production industry, as well as gaming, where there is a need for real-time text-to-speech to enhance the player experience,” Kamis said. “We are also looking at other languages.”

You can check out a sample of the system here.

Luke Dormehl
Former Digital Trends Contributor