In tests, the researchers had the AI play games such as Super Mario and VizDoom, a basic 3D shooting game, and in both it displayed a propensity for exploring its environment.
“Recent success in AI, specifically in reinforcement learning (RL), mostly relies on having explicit dense supervision — such as rewards from the environment that can be positive or negative,” Deepak Pathak, a researcher on the project, told Digital Trends. “For example, most RL algorithms need access to the dense score when learning to play computer games. It is easy to construct a dense reward structure in such games, but one cannot assume the availability of an explicit dense reward-based supervision in the real world with similar ease.”
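To make that distinction concrete, here is what dense, explicit supervision looks like in a standard reinforcement-learning loop. The sketch below uses the OpenAI Gym interface; the CartPole-v1 environment and the classic four-value step API are assumptions chosen purely for illustration, not part of Pathak's work. The point is simply that every single step hands the agent a score from the environment, and most RL algorithms lean on exactly that signal.

```python
import gym

# A standard RL interaction loop: the environment itself supplies
# a reward at every step (dense, explicit supervision).
env = gym.make("CartPole-v1")
obs = env.reset()
total_reward = 0.0
for _ in range(200):
    action = env.action_space.sample()          # placeholder policy: act randomly
    obs, reward, done, info = env.step(action)  # 'reward' comes straight from the game
    total_reward += reward
    if done:
        break
print("Score handed to the agent by the environment:", total_reward)
```

In a game, that reward is just the score ticking up on screen. In the real world, no such score exists, and that is the gap curiosity is meant to fill.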
But given that Super Mario is, last time we checked, a game, how does this differ from something like DeepMind's AI that learned to play Atari games? According to Pathak, the answer lies in the agent's approach: rather than simply trying to complete a game, it sets out to find novel things to do.
“The major contribution of this work is showing that curiosity-driven intrinsic motivation allows the agent to learn even when rewards are absent,” he said.
This, he notes, is similar to the way we show curiosity as humans. “Babies entertain themselves by picking up random objects and playing with toys,” Pathak continued. “In doing so, they are driven by their innate curiosity, and not by external rewards or the desire to achieve a goal. Their intrinsic motivation to explore new, interesting spaces and objects not only helps them learn more about their immediate surroundings, but also learn more generalizable skills. Hence, reducing the dependence on dense supervision from the environment with an intrinsic motivation to drive progress is a fundamental problem.”
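In broad strokes, one common way to turn that kind of curiosity into a training signal is to have the agent predict what will happen next and reward it when the prediction fails, since surprise is a sign of something new. The sketch below is a deliberately simplified illustration of that idea, using a tiny linear forward model over raw states and invented names (ForwardModel, curiosity_reward); Pathak's actual system learns its predictions in a neural feature space, so treat this as the intuition rather than the implementation.

```python
import numpy as np

class ForwardModel:
    """Tiny linear model that predicts the next state from (state, action).
    A stand-in for the learned neural network used in practice."""
    def __init__(self, state_dim, action_dim, lr=0.01):
        self.W = np.zeros((state_dim, state_dim + action_dim))
        self.lr = lr

    def predict(self, state, action):
        return self.W @ np.concatenate([state, action])

    def update(self, state, action, next_state):
        x = np.concatenate([state, action])
        error = self.predict(state, action) - next_state
        self.W -= self.lr * np.outer(error, x)  # one gradient step on squared error

def curiosity_reward(model, state, action, next_state):
    """Intrinsic reward: how badly the agent's own model predicted what happened.
    Transitions the agent has not explored yet are poorly predicted, so they pay more."""
    prediction = model.predict(state, action)
    return float(np.sum((prediction - next_state) ** 2))
```

As the model gets better at predicting the familiar parts of the world, those parts stop paying out, which is what keeps pushing the agent toward states it has not seen before, no external reward required.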
Although it’s still relatively early in the project, the team now wants to build on its research by applying the ideas to real robots.
“The curiosity signal would help the robots explore their environment efficiently by visiting novel states, and develop skills that could be transferred to different environments,” Pathak said. “For example, the VizDoom agent learns to navigate hallways and avoid bumping into walls on its own, only by curiosity, and these skills generalize to different maps and textures.”