- The previous version of AlphaGo beat the human world champion in 2016.
- The new AlphaGo Zero beat the previous version by 100 games to 0, and learned Go completely on its own.
- Here is the Nature paper explaining the technical details (also available as a PDF): Mastering the Game of Go without Human Knowledge. One of the main reasons for its success was a novel form of reinforcement learning in which AlphaGo Zero learned by playing against itself. The system starts with a neural network that knows nothing about the game of Go.
- It plays millions of games against itself, tuning the neural network to predict the next move and the eventual winner of each game. The updated neural network is then combined with the Monte Carlo Tree Search algorithm to create a new, stronger version of AlphaGo Zero, and the process repeats.
- In each iteration the performance improves by a small amount, but because the system can play millions of games a day, AlphaGo Zero surpassed thousands of years of accumulated human knowledge of Go in just 3 days (from the DeepMind post). This is a hugely significant advance for AI and Machine Learning research.
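The self-play loop described in the bullets above can be sketched in a few lines. Everything here is a hypothetical toy stand-in: the "game" is a trivial pick-a-number game, the "network" is a lookup table, and there is no Monte Carlo Tree Search, so only the structure of the loop (self-play generates data, the data tunes the model, the tuned model plays the next round) reflects the actual system.

```python
# Toy sketch of a self-play reinforcement-learning loop in the spirit
# of AlphaGo Zero. All components are simplified stand-ins.
import random

class ToyNet:
    """Stand-in for the policy/value network: maps a state to
    (move probabilities, predicted outcome)."""
    def __init__(self):
        self.table = {}

    def predict(self, state):
        # Unseen states get a uniform policy and a neutral value.
        return self.table.get(state, ([0.5, 0.5], 0.0))

    def train(self, examples):
        # Pretend training: memorize the policy and observed outcome.
        for state, policy, outcome in examples:
            self.table[state] = (policy, outcome)

def self_play_game(net):
    """Play one fixed-length game against itself, recording
    (state, policy, eventual outcome) training examples."""
    history, state = [], 0
    for _ in range(4):
        policy, _ = net.predict(state)
        move = random.choices([0, 1], weights=policy)[0]
        history.append((state, policy))
        state = state * 2 + move
    outcome = 1.0 if state % 2 == 0 else -1.0  # arbitrary toy winner rule
    return [(s, p, outcome) for s, p in history]

def training_loop(iterations=3, games_per_iter=10):
    net = ToyNet()
    for _ in range(iterations):
        examples = []
        for _ in range(games_per_iter):
            # Step 1: generate data purely by self-play.
            examples.extend(self_play_game(net))
        # Step 2: tune the model on its own games, then repeat.
        net.train(examples)
    return net
```

In the real system each iteration also pits the new network against the old one and keeps the stronger player; that evaluation step is omitted here for brevity.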
We examine what this means for AI.
Why? The key is not that any of the components is extremely innovative (although there is definitely some smart new stuff going on), but rather the formulation of the problem itself. This is not about supervised vs. unsupervised learning. It is not even about the fact that the network learns without human intervention or examples. It is about the fact that AlphaGo Zero learned without any data!
This is a feat that cannot be overstated. We have all heard about the "Unreasonable Effectiveness of Data". We have all heard how data-hungry deep learning approaches are. Well, it turns out that (under some constraints) we don't need data at all! The only thing fed into the model was the basic rules of the game, not even complex strategies or known "tricks".
Can you imagine doing the same thing in other domains? You specify the rules of the system, then let it generate data and learn from itself. You can stretch your mind to think about this in physical-world situations (e.g. biological systems) where you could describe the "rules of the game" and then allow the AI model to generate data and learn on its own. In fact, the whole network can also be seen as a synthetic data generation system. I would be really curious to see how AlphaGo Lee (the previous version) would perform if trained on the data generated by AlphaGo Zero.
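The "rules in, data out" idea can be made concrete with a small sketch. Everything below is a hypothetical illustration: the rules dictionary, the toy game, and the label function are invented for the example, not taken from any real system. The point is that once legal moves, termination, and the winner are specified, labeled training data can be generated with no human examples at all.

```python
# Sketch: given only the "rules of the game" (legal moves, a terminal
# condition, a winner function), generate a labeled dataset by random
# play. All rules here are toy placeholders.
import random

RULES = {
    "legal_moves": lambda state: [0, 1],               # toy move set
    "terminal":    lambda state, depth: depth >= 5,    # fixed game length
    "winner":      lambda state: 1 if state % 3 == 0 else -1,
}

def generate_dataset(rules, n_games):
    """Play n_games to completion under the given rules and return
    (final_state, outcome) pairs usable as training labels."""
    data = []
    for _ in range(n_games):
        state, depth = 0, 0
        while not rules["terminal"](state, depth):
            move = random.choice(rules["legal_moves"](state))
            state = state * 2 + move
            depth += 1
        data.append((state, rules["winner"](state)))
    return data
```

A model trained on such generated games never sees a human example; swapping random play for play guided by the current model turns this generator into the self-improving loop described earlier.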