Artificial intelligence has made extraordinary advances when it comes to understanding words and even being able to translate them into other languages. Google has helped pave the way here with amazing tools like Google Translate and, recently, with its development of Transformer machine learning models. But language is tricky — and there’s still plenty more work to be done to build A.I. that truly understands us.
Language Model for Dialogue Applications
At Tuesday’s Google I/O, the search giant announced a significant advance in this area with a new language model it calls LaMDA. Short for Language Model for Dialogue Applications, it’s a sophisticated A.I. language tool that Google claims is superior when it comes to understanding context in conversation. As Google CEO Sundar Pichai noted, this might mean intelligently parsing an exchange like “What’s the weather today?” “It’s starting to feel like summer. I might eat lunch outside.” That makes perfect sense as a human dialogue, but it would befuddle many A.I. systems looking for more literal answers.
LaMDA has superior knowledge of learned concepts, which it is able to synthesize from its training data. Pichai noted that responses never follow the same path twice, so conversations feel less scripted and more natural.
While it is still in research and development, Google is reportedly using LaMDA internally to explore novel interactions. During Google I/O, the company demonstrated a couple of recent exchanges that were less formal and more conversational than the typical interaction with a chatbot tool such as Google Assistant. Slightly trippily, these were conversations with a bot pretending to be, variously, the dwarf planet Pluto and a paper airplane, answering questions about itself. The demo was meant to show how the model is able to carry out in-depth conversations on any topic.
Eventually, LaMDA should result in Google A.I. tools that are better at following the context of human conversations. Pichai specifically called out Google Assistant and search as domains where this will be useful.
Building multimodal models
Pichai also noted that the technology is being used to create multimodal models that can understand images, text, audio, and video. This could be used, for instance, to ask Google Maps to plan a road trip with beautiful mountain views, combining its knowledge of audio, text, and images. It could also be used for superior video search, such as asking to jump to the part of a video in which a lion roars at sunset.