While working the front desk of a New York City Thai restaurant, I picked up the phone, took down a reservation, and hung up. Surreal was the first word that came to mind afterward, not because I was pretending to work at a restaurant, but because I had just spoken with, and taken a reservation from, the artificially intelligent Google Assistant.
I couldn’t shake the thought that the voice on the other end wasn’t a living being, even though it was only a two-minute conversation. That’s not to say the Assistant wasn’t convincing; if it hadn’t told me (and if I hadn’t been in a demo environment), I wouldn’t have thought much about the exchange with what sounded like another human on the other end.
The robot exchange I had was part of a press demo for Google’s Duplex technology, first announced at the company’s I/O developer conference in May. It lets people book a table at a restaurant, schedule a haircut appointment, and find out store hours through Google Assistant. After you make a request from your phone or an Assistant-enabled smart speaker like Google Home, the voice assistant calls the restaurant or store, and minutes later you get a notification confirming your plans are officially booked (or telling you the call couldn’t be completed).
The demo at I/O was jaw-dropping, but it wasn’t without controversy. Critics asked why the AI needed to sound so lifelike, and why it didn’t disclose to the person on the other end of the line that they were talking to a machine, an omission some considered deceptive. And because the call was recorded, some questioned whether the technology would violate certain laws governing phone calls.
Since then, Google has clarified its position, and we now have some answers. Digital Trends spent some time with the team behind Duplex and demoed the technology ourselves. But before we dive into our experience, let’s look at the new details we’ve uncovered on how and where Duplex will work.
What is Duplex, and where will it work?
Duplex is a technology in development that enables Google Assistant to make phone calls on the user’s behalf, and it stems from years of research into artificial intelligence for natural language processing.
“We can now understand natural speech and we can generate natural speech,” Nick Fox, vice president of Product Management at Google, said. “Those technologies are applied with Duplex to have a natural, engaging conversation that adapts to what’s happening within the conversation, ultimately with the goal of getting things done.”
Helping you get things done is Google’s goal for Assistant, and with Duplex the company is starting with three specific tasks: booking a table at a restaurant, finding store hours, and scheduling a hair salon appointment. Duplex can’t do anything beyond this at the moment, so if a query isn’t pertinent (say, a question about the weather or sports scores), Assistant won’t understand it. Similarly, a user can’t ask Assistant to make calls unrelated to these tasks.
For the user, having the Assistant make these calls frees up a little time, but it also benefits businesses that receive these calls. For example, if a caller asks Assistant for the holiday hours of a local store, the Assistant will place the call, get the answer, and the hours will be added to Google Maps and Google Search for all to see, with a verified tag next to it.
Once that information is published, other people who want the same answer won’t trigger another call; Assistant would only need to pull it from Google’s servers. It takes just one person asking Assistant, but it has the potential to save a lot of time for employees who would otherwise be stuck answering calls about store hours.
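The call-once, share-with-everyone idea behaves like a simple cache. Here is a minimal sketch in Python, assuming a shared dictionary stands in for the verified hours Google publishes to Maps and Search; `verified_hours`, `get_holiday_hours`, and the `place_call` callback are hypothetical names, not part of any Google API.

```python
from typing import Callable, Dict

# Hypothetical shared store standing in for verified hours on Maps and Search.
verified_hours: Dict[str, str] = {}

def get_holiday_hours(business: str, place_call: Callable[[str], str]) -> str:
    """Return cached verified hours, placing a call only on the first request."""
    if business not in verified_hours:
        # First request: a (hypothetical) Duplex call fetches the answer once.
        verified_hours[business] = place_call(business)
    # Every later request is served from the store without calling the business.
    return verified_hours[business]
```

In this sketch, the second person to ask about the same store gets the stored answer, and the business’s phone never rings again for that question.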
During Google’s testing phase this summer, Duplex will only work with select businesses and users in the U.S., and it will only be available at restaurants and hair salons that do not have an online booking system. Google Assistant already works with OpenTable, so it doesn’t need to call restaurants that use the booking service.
Google will test calls for business hours first in the next few weeks, and later this summer, the test will expand to calling for reservations and haircut appointments. There will be a lot of testing and tweaking during this period, so the end result of Duplex may look a little different from what we’ve already seen. Businesses will be able to opt out if they do not want to receive calls from the Google Assistant.
“What you’re seeing is a technology that’s very early stages,” Fox said. “We want to talk about it publicly even at this stage to make sure we get it right, but you’re seeing something quite early in the process here.”
The demo
The demo experience, which took place at Thep Thai in New York City’s Upper East Side neighborhood, was heavily controlled. First, Google gave a presentation walking through the overall process of placing and ending a call, but with the technology running live, in real time. An engineer fed Duplex a time and party size we suggested for a booking, and here’s what followed.
The idea is that you’d tell Google Assistant on your phone or Google Home that you want to “book a table at Thep Thai for two at 8 p.m. tomorrow.” Assistant will ask if it’s okay to book any time between 8 p.m. and 9 p.m. in case no table is available at 8 p.m., and after you confirm, it will say it’s going to call the restaurant and get back to you soon.
When the restaurant employee picks up the phone, Assistant will say the following, or something similar to it: “Hi, I’m calling to make a reservation. I’m Google’s automated booking service so I’ll record the call. Can I book a table for Thursday?”
The call is recorded so that human operators at Google can listen back, annotate the conversation, and highlight any mistakes Duplex made, helping the team improve the service.
Assistant goes on to respond to each question asked — such as what time the reservation is for, how many people are in the party, and the name for the reservation — and the conversation politely and promptly ends. If it’s asked for information like a user’s email address, the Assistant will say it does not have permission to provide that information.
The person who requested the reservation through Assistant then gets a notification saying the table has been reserved, and the booking is automatically added to Google Calendar. Ahead of the reservation, the user gets a notification with an opportunity to cancel in case she can no longer make it. Thep Thai’s owner said a lot of people make reservations and then don’t show up. He’s hopeful this system, which offers an easy way to cancel a booking, will mean fewer empty tables.
After the main demo, Google let us try it ourselves. When we took the call, we tried to trip Duplex up with some complications, but Assistant handled them well. We asked it to hold at the beginning of the conversation, and it responded with “mhmm” rather than a verbal “yes.” When we told it the 6 p.m. slot was fully booked, Assistant offered a range between 6 p.m. and 8 p.m. and settled on our suggestion of 7:45 p.m. We then asked for a name and phone number for the reservation, and whether Assistant could spell the name, which it did successfully.
Assistant handled its demos with impressive consistency, though we did encounter one moment when it needed to fall back on a human operator. It happened when someone asked whether the Assistant’s client was okay with receiving emails from the restaurant. The phrasing was a little awkward, and the Assistant responded, “I’m sorry, I think I got confused,” before saying it was putting a supervisor on the line. The human operator swiftly took over, told the person on the line that the email address couldn’t be shared, and finished booking the reservation.
How Duplex works
Early test versions of Duplex, which Google played for us, sounded incredibly robotic. However, the Assistant was still able to understand pauses in the conversation, and even say, “hello?” when a restaurant employee paused for a few seconds. Still, Scott Huffman, vice president of Engineering for Google Assistant, said it was “painful to listen to it.”
Getting Duplex to where it is now started with a lot of manual, human work. Human operators placed calls to restaurants, annotated the conversations, and fed the results into Duplex. The team would link phrases like “how many people” and their variations to “number of people in the party,” allowing Duplex to understand the question.
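As a rough illustration of that annotation step, the sketch below maps a few phrasings of the same question onto one canonical label. It assumes plain keyword matching for simplicity; Google describes feeding annotated calls into machine-learning models, so this is not how Duplex itself works, and the phrase lists, label names, and `classify` function are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical annotations: several ways an employee might ask the same thing,
# each linked to one canonical label the booking agent knows how to answer.
ANNOTATIONS: Dict[str, List[str]] = {
    "party_size": ["how many people", "how many in your party", "for how many", "how many guests"],
    "reservation_time": ["what time", "which time", "when would you like"],
    "reservation_name": ["what name", "name for the reservation", "under what name"],
}

def classify(utterance: str) -> Optional[str]:
    """Return the label whose annotated phrases appear in the utterance, if any."""
    text = utterance.lower()
    for label, phrases in ANNOTATIONS.items():
        if any(phrase in text for phrase in phrases):
            return label
    return None  # no annotated phrase matched: the question wasn't understood
```

For example, “Sure, how many people will be joining?” maps to `party_size`, while an off-topic question like “Is it raining outside?” returns `None`.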
The second stage involved human operators listening in on calls the Assistant made; if things went off track, the operator jumped in to take over and make sure the call succeeded. This allowed the team to identify the service’s rough edges, annotate those conversations, and feed them back into the machine-learning algorithms so Duplex could learn.
The final testing stage is automated mode, where the system places and completes calls on its own. Escape hatches built into the system let the Assistant steer back to its key goal of completing the task, with sentences like, “I’m not sure what you said, but can I book a table for three?” If the system really doesn’t know what to do next, it will gracefully bow out of the conversation, and a human operator will take over.
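That fallback behavior can be sketched as a small decision function. This is a hypothetical illustration, not Google’s implementation: it assumes each employee turn has already been labeled (or left unlabeled when the Assistant didn’t understand), and the `next_action` and `run_call` names, the action strings, and the one-retry limit are all assumptions.

```python
from typing import List, Optional

# Escape-hatch line adapted from the article; everything else is hypothetical.
GOAL_PROMPT = "I'm not sure what you said, but can I book a table for three?"

def next_action(label: Optional[str], misunderstandings: int, max_retries: int = 1) -> str:
    """Decide how a hypothetical booking agent responds to one employee turn."""
    if label is not None:
        return f"answer:{label}"       # understood: answer the question
    if misunderstandings < max_retries:
        return f"say:{GOAL_PROMPT}"    # escape hatch: steer back to the booking
    return "handoff:human_operator"    # still lost: bow out, an operator takes over

def run_call(turns: List[Optional[str]]) -> List[str]:
    """Play labeled turns (None = not understood) through next_action."""
    misunderstandings = 0
    log: List[str] = []
    for label in turns:
        action = next_action(label, misunderstandings)
        log.append(action)
        if action.startswith("handoff"):
            break                      # the human operator handles the rest of the call
        if label is None:
            misunderstandings += 1
    return log
```

In this sketch, the first turn the Assistant can’t understand triggers the escape-hatch sentence, and a second one hands the call to a human operator.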
At the moment, Huffman said, about four out of five calls made by Duplex don’t need help from a human operator. Interestingly, he said human operators won’t be pulled away from the service as Duplex improves, because Google sees them as integral to ensuring Duplex works without a hitch.
The “ums” and “ahs”
Throughout the process of teaching Assistant, Google emphasized making it sound more natural and less like a robot. After the demo at Google I/O, critics asked why Google was mixing in “ums and ahs” to make the Assistant sound more human, especially since it didn’t begin the call with a disclaimer that it wasn’t human. There are disclaimers now, but Huffman said speech disfluencies like “um” or “mhm” were added to keep the conversation flowing.
“We’re not trying to trick or impersonate, but if you go back to that recording of that painful early system, it didn’t sound very natural, it didn’t sound very human,” Huffman said. “But as a result of that, the Assistant was not very successful at completing the tasks. A lot of people would hang up, or get confused about what they were talking to, the conversation would just break down because it didn’t feel natural.”
Citing linguists, Huffman said speech disfluencies are a key part of keeping a conversation between two people going. It’s easy to hear how well this works in conversations with the Assistant, and the results are a far cry from that original recording.
One way speech disfluencies help is with conversational acknowledgement: letting the person who is talking know you’re still engaged and listening, like when Assistant said “mhmm” after we asked it to hold.
Another useful tool is saying “um” when there’s uncertainty, as a polite way of asking for clarification. Assistant added an “um” after it couldn’t hear what the restaurant employee said, then repeated its request.
With these speech disfluencies, Assistant is a stark contrast to the original, robotic version. It’s far less cold, and the conversation moves much more quickly. And rather than simply accepting commands, Assistant is actually interacting with humans in our own language, which is sure to excite some while frightening others.
Convenience
Duplex is all about convenience. It saves you a little time, it can give you more accurate store hours, and it can save businesses time as well. Google also said there’s a big opportunity here to help people who can’t speak or have trouble speaking.
Huffman suggested thinking of Duplex as an evolution of older automated phone systems, like the ones you reached when calling your bank, where you slowly pressed numbers to get to the right department.
“Today if you call those airlines or banks, you’ll get something much, much nicer,” he said. “You’ll hear a much more natural sounding voice, and it might say something like, ‘Just tell me what you need, you can say things like, what time is my flight?’ In Duplex, we’re really just taking that same idea a step further, evolving the conversation and making it more natural so that it’s more successful for users and businesses.”
Google hasn’t shown us how Duplex calls businesses for store hours, nor has it demoed scheduling haircut appointments, so we can’t comment on how well Duplex works in those cases. We’re also unsure whether the human operators will have access to your phone number and full name, which poses a bit of a privacy risk. We also wonder whether Duplex will support multiple languages in the future. We’ve reached out to Google with these unanswered questions, but of course, more are sure to come as the technology progresses.
From what we’ve seen so far, it’s promising technology, but is it something we should embrace, or fear? We’ll be happy if we never have to be put on hold for hours on end again, but it’s important to consider the trajectory here. We’re constantly inching toward a future where we don’t need to talk to anyone, where you can live isolated in an apartment, with food delivered to you, packages dropped off by drones, and thousands of hours of media to consume without ever having to step outside.
While Duplex may start with some of the more mundane phone calls, the AI is going to get better at conversations, making it easy to port to other industries. It will be up to us as a society to decide how much of our talking should be done through AI, and whether it’s worth picking the phone up again.