Do you remember that scene in Walt Disney’s Bambi where the titular fawn learns to stand up and walk under its own power? It’s a charming vignette in the movie, showcasing a skill that plenty of baby animals — from pigs to giraffe to, yes, deer — pick up within minutes of their birth. Over the first few hours of life, these animals rapidly refine their motor skills until they have full control over their own locomotion. Humans, who learn to stand holding onto things at around seven months and who begin walking at 15 months, are hopelessly sluggish by comparison.
Guess what the latest task that robots have beaten us at? In a new study carried out by researchers at Google, engineers have taught a quadruped Minitaur robot to walk by, well, not really having to teach it much at all. Rather, they’ve used a a type of goal-oriented artificial intelligence to make a four-legged robot learn how to walk forward, backward, and turn left and right entirely on its own. It was able to successfully teach itself to do this on three different terrains, including flat ground, a soft mattress, and a doormat with crevices.
“Legged robots can have great mobility because legs are essential to navigate unpaved roads and places designed for humans,” Jie Tan, principle investigator on the project and Google’s head of locomotion efforts, told Digital Trends. “We are interested in enabling legged robots to navigate our diverse and complex real-world environments, but it is difficult to manually engineer robotic controllers that can handle such diversity and complexity. Therefore it is important that robots be able to learn by themselves. This work is exciting because this is an early demonstration that, with our system, a legged robot can successfully learn to walk on its own.”
Positive reinforcement
The technology at the root of this particular project is something called deep reinforcement learning, a specific approach to deep learning that’s inspired by behaviorist psychology and trial and error learning. Told to maximize a certain reward, software agents learn to take actions in an environment that will achieve those results in the most precise, efficient way possible. The power of reinforcement learning was famously demonstrated in 2013 when Google’s DeepMind released a paper showing how it had trained an A.I. to play classic Atari video games. This was achieved with no instruction other than the on-screen score and the approximately 30,000 pixels that made up each frame of the video games it was playing.
Video games, or at least simulations, are frequently used by robotics researchers, too. A simulation makes perfect sense in theory, since it allows roboticists to train their machine in a virtual world before going out into the real one. That saves robots from the inevitable pratfalls and wear-and-tear that it would undergo as it learns to carry out a specific task. As an analogy, imagine if all of your driving lessons were carried out using a driving simulator. The argument could be made that you would learn more quickly because you wouldn’t have to be so cautious about risking your physical safety or damaging your car (or someone else’s). You could also train more rapidly without having to wait for allocated lessons or for a licensed driver to be willing to take you out.
The problem with this is that, as anyone who has ever played a driving video game will know, it’s pretty darn hard to model the real world in a way that feels like, well, the real world. Instead, Google’s researchers began developing improved algorithms that allows their robot to learn more rapidly with fewer trials involved. Building on a previous piece of Google research published in 2018, their robot was able to learn to walk in just a couple of hours in this latest demonstration.
It’s also able to do this while emphasizing a more cautious, safer approach to learning, involving fewer falls. As a result, it minimizes the number of human interventions that need to be made to pick the robot up and dust it off every time it takes a tumble.
Building better robots
Learning to walk in two hours may not be quite deer levels of learning-to-walk efficiency, but it’s a far cry from engineers having to explicitly program how a robot is usually taught to maneuver. (And, as noted, it’s a whole lot better than human infants can manage in that kind of time frame!)
“Although many unsupervised learning or reinforcement learning algorithms have been demonstrated in simulation, applying them on real, legged robots turns out to be incredibly difficult,” Tan explained. “First, reinforcement learning is data-hungry, and collecting robot data is expensive. Our previous work has addressed this challenge. Second, training requires someone to spend a lot of time supervising the robot. If we need a person to monitor the robot and manually reset it every time it stumbles — hundreds or thousands of times — it’s going to take a lot of effort and a very long time to train the robot. The longer it takes, the more difficult it is to scale up the learning to many robots in many different environments.”
One day this research could help create more agile robots that are more rapidly able to adapt to a variety of terrains. “The potential applications are numerous,” Tan said. However, Tan stressed that this is “still early days, and there are many challenges that we still need to overcome.”
In keeping with the reinforcement learning theme, it’s certainly a reward that’s worth maximizing, though!