Between malware hiding in seemingly innocent apps and deadly strings of emoji, the battle to keep our smart devices secure is a never-ending one. Every new mode of interaction, be it voice control or a unique identifier like a fingerprint or facial recognition, presents another avenue by which hackers can access and manipulate the technology around us.
Researchers at UC Berkeley and Georgetown University are keenly aware of this, which is why last year they decided to investigate precisely how vulnerable the voice recognition software that powers so many of our computing devices really is. They focused on Google Assistant, which lives system-wide on Android and within the Google app on iOS, and developed a way to garble voice commands just enough that Google Assistant could still understand them while they remained unintelligible to most humans.
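One way to produce this kind of garbling is to keep only the compact acoustic features (MFCCs) that speech recognizers actually consume and resynthesize audio from them alone, throwing away the richer cues humans rely on. Below is a minimal sketch of that feature-only round trip using the librosa library; the filenames and coefficient count are illustrative assumptions, and this is not the researchers' exact tool chain.

```python
# A minimal sketch of feature-only resynthesis, assuming librosa and
# soundfile are installed. This illustrates the general idea, not the
# researchers' exact pipeline; the filenames and coefficient count
# are arbitrary choices.
import librosa
import soundfile as sf

# Load a recording of the target command (hypothetical file).
y, sr = librosa.load("ok_google.wav", sr=16000)

# Extract the compact feature representation recognizers consume.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Invert the features back into a waveform. Everything the 13
# coefficients don't capture (much of what makes speech sound
# human) is discarded.
y_garbled = librosa.feature.inverse.mfcc_to_audio(mfcc, sr=sr)

sf.write("ok_google_garbled.wav", y_garbled, sr)
```

The resulting audio retains roughly what a recognizer needs to transcribe the phrase while sounding, to a person, like distorted noise.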
The researchers tested the recognition of several obfuscated commands, like “OK Google,” and measured the software’s ability to decipher the message against that of humans. They found, particularly in the case of “OK Google,” that the panel of participants was able to identify the scrambled phrase only 22 percent of the time, while the Assistant understood it 95 percent of the time. What’s more, the software was better at decoding the obfuscated version than the normal pronunciation of “OK Google,” which yielded a recognition rate of only 90 percent.
At first glance, many of these distorted commands may come off as little more than static with the vague, sped-up cadence of speech. When we know what a phrase is before we hear it, it becomes far easier to identify; without that information, in many cases, we’re left stumped.
The study notes that some of the jumbled-up commands are easier for us to figure out than others. “Call 911,” for example, yielded a human recognition rate of 94 percent, compared to only 40 percent for Google Assistant, probably because it’s a phrase the vast majority of American English speakers have been conditioned to hear. But a more obscure command, altered just enough that our personal assistants accept it while we’re left scratching our heads, poses an obvious risk, considering that voice controls in most consumer devices lack any form of authentication.
What can we do to protect against voice hacking?
One of the few existing safeguards against this kind of voice-targeted manipulation is that assistants request confirmation before carrying out many sensitive commands. However, as The Atlantic points out in its piece about the study, that’s just a small roadblock to clear with a distorted “yes,” and if everything happens too fast for the user to realize what’s going on, they won’t be able to stop it in time.
The team followed up its discovery by proposing ways services like Google Assistant, Apple’s Siri and Amazon’s Alexa could head off these attacks, and it turns out there are a variety of methods companies might be inclined to implement. Some defenses, like an audio CAPTCHA, could be thrown in as a final confirmation to distinguish human users from machines — though the researchers point out that the algorithms that power audio CAPTCHAs are relatively outdated and have not kept pace with advancements made in speech recognition technology. Not to mention, CAPTCHAs are infuriating to deal with.
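To make the challenge-response idea behind an audio CAPTCHA concrete, here is a toy sketch in Python. The speak() and listen() helpers are hypothetical stand-ins for a device’s text-to-speech and speech recognition, and the word list is invented; in a real audio CAPTCHA, the challenge itself would also be rendered as distorted audio designed to defeat machine transcription.

```python
import random

# Hypothetical challenge words; an attacker can pre-record a distorted
# "yes," but not a response to every possible random challenge.
CHALLENGE_WORDS = ["orange", "window", "basket", "copper", "meadow"]

def speak(text: str) -> None:
    """Stand-in for the device's text-to-speech output."""
    print(f"[device] {text}")

def listen() -> str:
    """Stand-in for the device's speech recognition."""
    return input("[user] ")

def confirm_sensitive_command(command: str) -> bool:
    """Gate a sensitive command behind a random spoken challenge."""
    word = random.choice(CHALLENGE_WORDS)
    speak(f"To confirm '{command}', please say the word: {word}")
    return listen().strip().lower() == word

if confirm_sensitive_command("visit example.com"):
    print("Command executed.")
else:
    print("Command rejected.")
```

The point of randomizing the challenge is that a pre-recorded, distorted “yes” no longer clears the roadblock described above.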
A more complicated solution is tailoring recognition to the owner’s voice, which many services already employ in a limited capacity. However, the report concedes that this approach requires training on the part of the device and poses a problem for gadgets intended to be used by multiple people, like the Amazon Echo. The team determined that one of the most practical and effective defenses would be a filter that slightly degrades the audio quality of incoming commands, rendering most obfuscated phrases unrecognizable to the device while allowing human ones to pass through.
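As a rough illustration of that last defense, the sketch below low-pass filters and lightly quantizes incoming audio with SciPy before it would be handed to the recognizer. The 4 kHz cutoff, quantization step and filenames are illustrative assumptions, not values from the paper; the intuition is that normal speech survives this processing comfortably, while an obfuscated command already at the edge of recognizability tends not to.

```python
# A minimal sketch of an audio-degrading pre-filter, assuming SciPy and
# NumPy are installed. Cutoff frequency and quantization step are
# illustrative, not taken from the study.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, filtfilt

rate, audio = wavfile.read("incoming_command.wav")  # hypothetical input
audio = audio.astype(np.float64)
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # mix down to mono for filtering

# Low-pass filter: intelligible speech content sits largely below 4 kHz,
# while finer spectral detail above it gets smeared away.
b, a = butter(N=4, Wn=4000, btype="low", fs=rate)
degraded = filtfilt(b, a, audio)

# Light quantization adds a second, subtle layer of degradation.
degraded = np.round(degraded / 64) * 64

wavfile.write("degraded_command.wav", rate, degraded.astype(np.int16))
```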
While reports of voice-based attacks of this kind are uncommon, if not nonexistent, it’s always helpful to be aware of where vulnerabilities lie so they can be curbed before problems really start popping up. Thanks to the research done here, we’ll be a little better prepared should a wave of satanic-sounding whispers begin telling our smartphones what to do.