Google shocked the world in 2016 when AlphaGo, an artificial intelligence program created specifically to play the ancient board game Go, defeated Lee Sedol, one of the game’s top players, in a five-game match. Such a feat wasn’t predicted to occur for at least another decade, leaving tech types and laymen alike wondering just how intelligent AI has become.
A little over one year later, AlphaGo again competed in a high-profile match, this time against the world’s top Go player, a 19-year-old prodigy named Ke Jie. The machine shut the human out, three games to none. With these victories under its belt, Google announced in May that it would retire AlphaGo.
But Google’s AI group, DeepMind, has just unveiled a newer, shinier, smarter version of AlphaGo dubbed AlphaGo Zero, which has pushed beyond the capabilities of its predecessor by mastering the ancient board game without any help from humans. Equipped with just the rules of the game, AlphaGo Zero managed to learn Go from scratch, create its own knowledge along the way, and ultimately defeat its predecessor 100 games to zero.
Both the old and new AlphaGo learned through a process called reinforcement learning, in which the system is rewarded for moves that lead to wins and learns to favor them over moves that lead to losses. However, the way DeepMind trained the two systems differed, and that’s where AlphaGo Zero really shone.
To train the original AlphaGo, DeepMind researchers fed the system thousands of games played by amateur and professional human Go players. These games helped the system develop winning strategies and distinguish good moves from bad ones. AlphaGo Zero, by contrast, played only against itself (albeit millions of times), starting with random moves and gradually discovering effective strategies on its own. The new system had no help from humans beyond being given the rules of the game.
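To get a feel for how a program can improve purely by playing itself, here is a toy sketch of self-play reinforcement learning. It is emphatically not DeepMind’s method (AlphaGo Zero pairs a deep neural network with Monte Carlo tree search); this example uses simple tabular Q-learning on a tiny subtraction game, and all the names in it are illustrative.

```python
import random

# Toy self-play reinforcement learning on "Nim-21": players alternately
# take 1-3 stones from a pile; whoever takes the last stone wins.
# Starting from random play, the agent learns purely from the outcomes
# of games against itself -- the same basic idea behind AlphaGo Zero,
# minus the neural network and tree search.

ACTIONS = [1, 2, 3]

def train(pile=21, episodes=50_000, alpha=0.5, epsilon=0.2):
    q = {}  # q[(stones, action)] -> estimated value for the player to move
    for _ in range(episodes):
        stones = pile
        moves = []  # (state, action) pairs, both "players" share one table
        while stones > 0:
            legal = [a for a in ACTIONS if a <= stones]
            if random.random() < epsilon:          # explore: random move
                action = random.choice(legal)
            else:                                  # exploit: best known move
                action = max(legal, key=lambda a: q.get((stones, a), 0.0))
            moves.append((stones, action))
            stones -= action
        # The player who made the last move won; walk backwards through the
        # game, crediting each position from its mover's point of view.
        reward = 1.0
        for state, action in reversed(moves):
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward - old)
            reward = -reward  # perspective alternates each ply
    return q

def best_move(q, stones):
    legal = [a for a in ACTIONS if a <= stones]
    return max(legal, key=lambda a: q.get((stones, a), 0.0))
```

After training, the agent reliably plays the forced wins near the end of the game (for example, taking all three stones when three remain), knowledge it built entirely from self-play rather than from human examples.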
What’s truly astonishing about AlphaGo Zero’s self-schooling is that it went from chump to champ in just a few days. The system started off as a completely incompetent player. By the third day, after playing only against itself, it was capable of defeating its predecessor. By day 40, according to DeepMind, it had become the strongest Go player in history.
Where the original AlphaGo was little more than an exceptionally talented board game player, the advances made by AlphaGo Zero — specifically its ability to teach itself from scratch — make the system relevant to a wide range of real-world applications. The same principles that let AlphaGo Zero learn from just the rules could be applied to other rule-based tasks.
“For us, AlphaGo wasn’t just about winning the game of Go,” Demis Hassabis, CEO of DeepMind, told The Guardian. “It was also a big step for us towards building these general-purpose algorithms.”
DeepMind published a paper detailing the development of AlphaGo Zero in the journal Nature.