This week I presented this work to our weekly reading group:
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., … Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489.
To quickly summarize this work…
Basically, they create a policy network, which is a convolutional neural network that predicts the next move a human player would make from a given board state. They also create a value network, another convolutional neural network, that predicts the outcome of the game (win or lose) given the current board state.
Then they set up a tree search that explores different board positions to figure out which future moves lead to wins. The search uses the value network to predict future outcomes and chooses moves that lead to board positions it is likely to win from.
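To make the "look ahead, then score positions with a value estimate" idea concrete, here is a minimal sketch on a toy game (take 1 to 3 stones from a pile; whoever takes the last stone wins). This is not the paper's method: the paper uses Monte Carlo tree search with learned networks, while this is a plain depth-limited lookahead, and `toy_value`, `search`, and `choose_move` are names I made up for illustration. The hand-written `toy_value` only stands in for a value network.

```python
from typing import List


def legal_moves(stones: int) -> List[int]:
    """Toy game: a move removes 1, 2, or 3 stones; taking the last stone wins."""
    return [m for m in (1, 2, 3) if m <= stones]


def toy_value(stones: int) -> float:
    """Stand-in for a value network (hand-written, not learned).

    Returns an estimated outcome in [-1, 1] for the player about to move.
    """
    return -1.0 if stones % 4 == 0 else 1.0


def search(stones: int, depth: int) -> float:
    """Depth-limited lookahead: value of the position for the player to move."""
    if stones == 0:
        # The previous player took the last stone, so the player to move has lost.
        return -1.0
    if depth == 0:
        # Out of search budget: fall back on the value estimate.
        return toy_value(stones)
    # A position is as good for me as the best move is bad for my opponent.
    return max(-search(stones - m, depth - 1) for m in legal_moves(stones))


def choose_move(stones: int, depth: int = 2) -> int:
    """Pick the move whose resulting position looks worst for the opponent."""
    return max(legal_moves(stones), key=lambda m: -search(stones - m, depth - 1))


if __name__ == "__main__":
    print(choose_move(10))
```

The real system spends its search budget much more selectively than this exhaustive lookahead, but the division of labor is similar: the search proposes future positions, and the value estimate says how promising each one is.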
A nice analogy for the policy network is that it predicts good short-term moves, while the value network predicts board positions that lead to a long-term win.
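Here is a rough sketch of how those two signals can be combined when picking which move to explore next. It follows the spirit of the selection rule described in the paper, where each move's value estimate gets a bonus proportional to its policy prior that shrinks as the move is visited more often. The function name `select_move`, the constant `c`, and all the numbers below are my own assumptions for illustration, not values from the paper.

```python
from typing import Dict


def select_move(
    priors: Dict[str, float],
    values: Dict[str, float],
    visits: Dict[str, int],
    c: float = 1.0,
) -> str:
    """Pick the move with the best value estimate plus prior-based bonus.

    priors: move -> probability from a policy estimate (short-term hint)
    values: move -> mean value estimate seen so far (long-term outcome)
    visits: move -> how many times the search has already tried the move
    """
    def score(move: str) -> float:
        # The bonus favors moves the policy likes, and decays with visits.
        bonus = c * priors[move] / (1 + visits[move])
        return values[move] + bonus

    return max(priors, key=score)


if __name__ == "__main__":
    priors = {"A": 0.6, "B": 0.3, "C": 0.1}

    # Nothing explored yet: the policy prior decides.
    fresh = select_move(priors, {"A": 0.0, "B": 0.0, "C": 0.0}, {"A": 0, "B": 0, "C": 0})
    print(fresh)

    # After A has been tried a few times with poor results, the search moves on.
    later = select_move(priors, {"A": -0.2, "B": 0.0, "C": 0.0}, {"A": 5, "B": 0, "C": 0})
    print(later)
```

Early on the prior dominates, so the search starts with moves that look good in the short term; as visits pile up, the value estimates take over.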
This was my first real look at reinforcement learning. If you aren’t familiar with the topic, this post explaining reinforcement learning for the game of Pong was particularly helpful.
Alright! So here are the slides for “Mastering the Game of Go with Deep Neural Networks and Tree Search”, which can be downloaded as PPTX or PDF. Please feel free to download and modify as needed (check back for any updates).