In the past decade, we’ve seen a huge shift in smartphone image quality. From once-middling, low-resolution photos and videos, we now have gorgeous 4K and high dynamic range content coming from the tiny computers in our pockets.
Unfortunately, the same can’t be said for phone-recorded audio.
Sure, we have seen modest improvements in mobile audio processing since the dawn of the iPhone, but the reality is that the sound we now capture alongside beautiful video is often woefully low fidelity by comparison. There are currently cell phones that capture gorgeous 4K video alongside the same tired mono audio that your grandma listened to on her first radio. Imagine the reverse, where films had lifelike Dolby Atmos sound, but were still shot in tepid black and white.
Enter, of all companies, Nokia, whose software-based Ozo audio platform aims to take existing phone hardware and up its audio recording quality, allowing users to capture cell phone videos with lifelike 3D audio, no extra gear required.
We know what you’re thinking: Will a mobile company like Nokia really be able to bring immersive cell-phone captured audio to the world? Well, after hearing a bit more about Ozo, we think it might.
Before we get into the weeds, let’s talk a bit about how Ozo audio was born.
Originally developed for use with the company’s virtual reality technology, Ozo audio shifted its focus to mobile when the VR market’s growth wasn’t as quick as the company originally hoped it would be.
“The VR market was developing a bit slower than originally expected, so we took many of those technologies and started to focus on new use cases that could improve user experiences immediately with various devices,” says Paul Melin, Nokia’s vice president of digital media. “[Ozo audio] is both providing an immersive audio experience where you can hear the sound from the right directions as if you were there, but also, more interestingly, using that information to improve the quality of user generated content by focusing more intelligently on the audio direction.”
Much like the modern video and photo software built into smartphones, Ozo is designed to react differently based on what and how you are recording. Just as cameras adjust color balance, focus, and various other aspects of an image on the fly, Ozo uses AI to adjust the way that your phone processes incoming sound.
When taking a selfie, for example, Ozo might focus the attention of the microphones on the forward-facing region of the sound. This gives viewers a much more lifelike playback experience, one that more accurately reflects the context in which the video was made. It also means less background noise and more clarity in self-made content, something that has long been missing in smartphone-crafted audio.
“That’s really helpful regardless of how you are going to consume the content,” says Melin.
Audio nerds will likely be quick to point out that there have been a number of interesting audio technologies ported to cell phones in the past, and that many of them have seen rather limited use. They’re right: 3D audio technology like DTS Headphone:X has existed for cell phones for years, and there has always been an issue with getting enough people to create content using that tech, thereby limiting the eventual listenership.
Ozo solves these issues of adoption by simply putting the tech in the hands of amateur content creators from the moment they pick up their phone or camera. And because listeners can play back audio made with Ozo technology on any pair of stereo speakers, headphones, or even the built-in speakers found on a smartphone, content shot with Ozo-enabled smartphones or cameras is compatible with all major social media platforms and playback devices.
The hardware options are also potentially limitless: Because Ozo is a software-based technology and not reliant on specific microphone configurations or placements, there are very few limits on which cell phone or camera hardware can use it, which means that virtually any manufacturer can license the software from the company for improved audio. Melin says one major company has already licensed Ozo for use with its flagship device, and that he expects more will follow. The software is also already available on Nokia-branded phones like the Nokia 8.
“We are far in negotiations with several other smartphone and camera makers, so we are very much making the technology broadly available,” he says.
One thing that likely makes licensing easier is just how noticeable the difference is between standard monaural audio and the kind of immersive sounds being offered by audio processed with Nokia’s Ozo tech.
The immersive nature of 3D audio has the ability to transform even mundane scenes into vibrant and lifelike scenarios, and is the reason why so many filmmakers and cinemas have been turning towards object-based audio tech like Dolby Atmos in recent years. Manufacturers, creators, and consumers alike are all realizing that what happens around our ears is perhaps as important as what we’re seeing on-screen, and the few demo videos we’ve seen that were shot with Ozo are dramatically better-sounding than those shot with typical cell phone audio.
With such huge leaps in video and picture quality in recent years, Melin thinks it’s about time audio recording technology followed suit.
“We are seeing a lot of our customers really looking to elevate the quality of the audio experience,” he says. “It doesn’t necessarily mean that you need to have 3D, spatial audio as part of the experience, but really improve the clarity and quality of the [sound] in ways that the user doesn’t need to worry about.”
It’s this natural ease of use that may give Nokia a fighting chance in the war for better sound. Because if consumers can get significantly improved audio without having to lift a finger, they might actually take that preference into consideration when reaching for their wallets.