Maluuba’s approach is notable because it breaks the strategies and maneuvers required to beat the game into their component parts. Separate agents each focus on a single job, while a top-level agent makes the high-level decisions about which actions should be prioritized.
For instance, some agents might be tasked with chasing down pellets, while others focus on avoiding enemies. The decision-making agent then chooses the best option by weighing how strongly each agent prefers a move rather than simply counting votes: if a hundred agents wanted to move left to grab a pellet, but only three wanted to move right to avoid a ghost, it would elect to move right, because colliding with the enemy would end the run.
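The weighted decision described above can be illustrated with a short Python sketch. This is not Maluuba’s code: the function name `choose_action` and the weight values are hypothetical, chosen only so that three ghost-avoiding agents outweigh a hundred pellet-chasing ones.

```python
from collections import defaultdict

def choose_action(preferences):
    """Return the action with the largest total weight.

    `preferences` is a list of (action, weight) pairs, one per agent.
    """
    totals = defaultdict(float)
    for action, weight in preferences:
        totals[action] += weight
    return max(totals, key=totals.get)

# Hypothetical weights: avoiding a ghost matters far more than one pellet.
PELLET_WEIGHT = 1.0
GHOST_WEIGHT = 50.0

preferences = (
    [("left", PELLET_WEIGHT)] * 100   # a hundred agents want a pellet
    + [("right", GHOST_WEIGHT)] * 3   # three agents want to dodge a ghost
)
decision = choose_action(preferences)
print(decision)
```

With these assumed weights the three ghost agents contribute a total of 150 against 100 for the pellet agents, so the top-level agent moves right. If every agent carried the same weight, the majority would win and it would move left.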
Ms. Pac-Man is widely used in AI research because its gameplay is unpredictable, according to a post on the official Microsoft blog. Steve Golson, who is credited as co-creator of the original arcade version of the game, notes that this was intentional, as the game relied on players spending quarter after quarter on extra lives to be a financial success.
To address this unpredictability, Maluuba used reinforcement learning, a process in which an AI receives positive or negative feedback for each attempt it makes at a problem. The hope is that reinforcement learning could foster systems that are better equipped to make decisions on their own than those trained via supervised learning, where the system is simply fed good and bad examples to establish a base of experience.
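To make the idea of learning from positive and negative feedback concrete, here is a minimal sketch of a tabular Q-learning update, a common textbook form of reinforcement learning. It is a generic illustration rather than the method Maluuba used, and the state names, reward values, and the `alpha` and `gamma` settings are all assumptions.

```python
def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Nudge the value estimate for (state, action) toward the feedback.

    `q` maps (state, action) pairs to estimated values; missing pairs
    are treated as 0.0. `alpha` is the learning rate and `gamma` the
    discount applied to future value.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

actions = ["left", "right"]
q = {}

# Positive feedback: moving left from the start tile collects a pellet.
pellet_value = q_update(q, "start", "left", 10.0, "corridor", actions)

# Negative feedback: moving right from the start tile runs into a ghost.
ghost_value = q_update(q, "start", "right", -100.0, "caught", actions)

print(pellet_value, ghost_value)
```

After a single attempt at each move, the estimate for moving left is positive and the estimate for moving right is negative, so an agent that picks the highest-valued action would now prefer left from that tile.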