THWOK. Ponk. Thwok. Ponk. THWOK. PONK. Thwok. Cheering intensifies. Ponk. THWOK. The squeak of sneakers on asphalt. PONK. Pause. Enthusiastic applause.
Recognize it? Of course you don’t. It is, in fact, Roger Federer’s “impossible shot” against Andre Agassi in the quarterfinals of the 2005 Dubai Tennis Championships. It’s about as perfect a moment of tennis as one could hope to witness, rendered unintelligible by the fact that it’s reproduced here as an abstract soundscape.
As easy as tennis is to follow visually, as pure sound it’s as tough to follow as an episode of Twin Peaks you’ve switched on halfway through. That’s a challenge engineers from Tennis Australia, digital design and communications agency AKQA, and Monash University in Melbourne, Australia, are working to solve with something called Action Audio. It’s an online audio stream, developed for last month’s Australian Open broadcasts, to assist the 285 million people worldwide living with blindness or a visual impairment, many of whom are tennis fans.
“An easy way to understand it is … are you familiar with audio descriptions?” Tim Devine, AKQA’s executive creative director, told Digital Trends. “A lot of blind and low-vision people use them. This is kind of an abstract version of audio descriptions to give people more information about what’s happening [on the court].”
So commentary, basically?
There is, of course, a kind of audio description that has existed in sport for years: the commentary that lets fans follow a game on the radio, where images aren’t exactly in plentiful supply. But audio commentary from pundits isn’t the same thing as being able to watch a game and come to your own conclusions. The idea of Action Audio is to take the actions on court and turn them into an unobtrusive action cue language, with different sounds to signify forehands, backhands, nail-bitingly close shots, and more.
“[We wanted people to be able to] make their own appraisal of what’s happening on the court,” Devine said. “So they can have their own insights, rather than just having the insights created by a commentator. They have more information. They’re able to say, ‘wow, Federer’s really hammering the backhand tonight.’ Without Action Audio, they couldn’t have that insight because no one’s telling them ‘that’s a backhand’ or ‘that’s a forehand.’ But you can do that very easily with abstract audio signals.”
The technology works by using 3D spatial data from Hawk-Eye, the high-speed camera and computer vision system that tracks ball position in real time for electronic line calling. That information is then transformed into 3D audio: actions become individual sound cues, and the apparent source of each sound shifts so that fans can sense where the ball is on the court at any given moment. “We deconstruct sport to reconstruct it as sound,” Devine said.
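To make that concrete, here’s a minimal Python sketch of what such a mapping could look like: a tracked ball position becomes a stereo pan, and each action type gets its own tone. Everything in it (the names, the coordinate convention, the frequencies) is an assumption for illustration, not the actual Action Audio pipeline.

```python
import math
from dataclasses import dataclass

# A minimal sketch of the core idea: take a tracked ball position (as a
# Hawk-Eye-style system might report it) and derive simple stereo-audio
# parameters from it. The names, coordinate convention, and frequencies
# are illustrative assumptions, not the real Action Audio implementation.

COURT_HALF_WIDTH_M = 5.485  # a doubles court is 10.97 m wide

@dataclass
class BallSample:
    x: float    # metres from the court's centre line (negative = left)
    event: str  # e.g. "forehand", "backhand", "bounce"

def stereo_pan(x: float) -> float:
    """Map lateral ball position to a pan value in [-1.0, 1.0]."""
    return max(-1.0, min(1.0, x / COURT_HALF_WIDTH_M))

def cue_tone(event: str) -> float:
    """Give each action type its own easily distinguishable pitch (Hz)."""
    tones = {"forehand": 660.0, "backhand": 440.0, "bounce": 330.0}
    return tones.get(event, 523.0)  # fall back to a neutral tone

def channel_gains(pan: float) -> tuple[float, float]:
    """Constant-power panning: derive left/right gains from a pan position."""
    angle = (pan + 1.0) * math.pi / 4.0  # map [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)

# A backhand struck well to the left of centre:
sample = BallSample(x=-3.2, event="backhand")
pan = stereo_pan(sample.x)
left, right = channel_gains(pan)
print(f"{sample.event}: {cue_tone(sample.event):.0f} Hz tone, "
      f"pan {pan:+.2f} (L gain {left:.2f} / R gain {right:.2f})")
```

Constant-power panning is a standard audio technique that keeps a cue’s perceived loudness steady as it sweeps between channels, which matters when listeners are relying on position to track the ball.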
He noted that the project was inspired by neuroscientist David Eagleman’s sensory substitution technology. Eagleman, whom Digital Trends interviewed last year, has long explored sensory substitution that can, for instance, capture sound and turn it into vibratory patterns on a wearable vest or wristband, potentially allowing a deaf person to understand what someone is saying.
“If you think about it, every signal that comes into the body, no matter what sense it is, is abstract, right?” Devine said. “We just learn to decode that signal. So there’s no reason why we can’t create — [and learn to understand] — an abstract signal that comes through an audio channel.”
Beyond the Australian Open
For the Australian Open, Action Audio was available as a television audio channel as part of the broadcast. Its cues, which could be listened to in isolation or alongside commentary, included distinct sounds for forehands and backhands, along with others indicating how close the ball landed to the line. Devine said there are opportunities to expand this in the future, in the same way that any spoken language can communicate simple intentions as well as far more detailed observations. It may eventually even be possible for users to customize the system for themselves, either by changing the specific audio cues or by adding or removing detail depending on what they want highlighted.
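To picture what that customization might look like, here’s a small, purely hypothetical sketch in the same spirit: a default cue table that a listener can filter by detail level or remap to preferred tones. None of these cue names, tones, or tiers come from the Action Audio team.

```python
# A hypothetical sketch of the per-user customization described above:
# listeners choose a level of detail and can remap individual cues. The
# cue names, tones, and "detail" tiers are invented for illustration;
# the source doesn't specify how Action Audio would expose this.

DEFAULT_CUES = {
    "forehand":  {"tone_hz": 660.0, "detail": 1},
    "backhand":  {"tone_hz": 440.0, "detail": 1},
    "near_line": {"tone_hz": 880.0, "detail": 2},  # nail-bitingly close
    "net_cord":  {"tone_hz": 220.0, "detail": 3},
}

def build_cue_table(detail_level: int, overrides: dict | None = None) -> dict:
    """Keep only the cues at or below the chosen detail level, then
    apply any per-user tone overrides."""
    table = {name: dict(cue) for name, cue in DEFAULT_CUES.items()
             if cue["detail"] <= detail_level}
    for name, tone_hz in (overrides or {}).items():
        if name in table:
            table[name]["tone_hz"] = tone_hz
    return table

# A listener who wants only the basics, with a lower-pitched backhand:
print(build_cue_table(detail_level=1, overrides={"backhand": 392.0}))
```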
Devine acknowledged that this is a challenging job, not just technologically, but also in ensuring the system improves the viewing experience rather than detracting from it. “Hearing for a blind or low-vision person is a very meaningful and precious sense,” he noted. Elsewhere in the interview, he said the team is “trying to reduce cognitive load” on users.
Having demonstrated the technology, the team now hopes to expand it to coverage of other tennis tournaments such as Wimbledon (which already uses some impressive A.I. tech), the French Open, and the U.S. Open. They also believe it can be extended to other sports, although this will raise new challenges.
“Tennis is actually a really good example of Action Audio, because it’s dynamic,” Devine said. “It has moments of stopping, and moments of full action where everyone is quiet. But every sport is quite different. How could we sonify surfing? How do we give people a sense of what a surfer is doing? What do we identify? How about multiplayer events like basketball? How do we know what information someone who’s blind or low vision would like to hear? That’s where a co-design process will work.”
There’s more work to be done, but this is certainly a promising beginning.