team members: Eric Clinch, Jennifer Lee
MCTS Final Paper (PDF)
We will parallelize the Monte Carlo Tree Search algorithm for playing the game of Go.
Monte Carlo Tree Search (MCTS) is an algorithm used to solve perfect information sequential games. More specifically, MCTS takes as input a game and a state in the game and approximates the best move to make in that state of the game. MCTS has been shown to be exceptionally good at playing the game of Go, so that is the game we will be testing on. MCTS is an anytime algorithm, meaning that it can be ran for as many or as few iterations as desired. The solution that the algorithm converges to improves in quality as more iterations are ran, so if you can run multiple iterations in parallel then this should improve the algorithm?s rate of convergence and would be a source of parallelism.
Additionally, MCTS evaluates a state in the game by performing random rollouts from that state. A random rollout is where each we just simulate each player playing completely randomly until the game is finished and using the result of this playout as a heuristic for the value of the starting state. The idea is that if the starting state of the game is favorable to one player, then that player is more likely to win if both players play randomly from that state, so in expectation this should be a good heuristic. If multiple random rollouts are performed for the given state, then we?ll get an evaluation that has lower variance and more accurately represents who is favored in that state of the game. So running multiple rollouts in parallel is again another source of parallelism.
For our project checkpoint, our goal was to implement a fully-functioning sequential implementation of Monte Carlo Search Trees (MCTS). One of the biggest applications of MCTS is Go. Consequently, our project can be broken into two components.
We have a board class that acts as an interface between the MCT algorithm and users. The Board class written in C++ has a 19x19 board of white and black stones that are denoted by ‘W’ and ‘B’ characters. Each player or strategy also has a count of how many of its opponent’s stones it has captured. The game is over when one strategy is the only strategy that has its stones on the board.
The second part of the project is the MCT itself and the propagation and simulation “unrolled” at a specific node to predict the best action. Given that the algorithm uses trees, we made a UtilityNode class to implement the MCT algorithm. We have also finished implementing the entire MCT algorithm; we’ve also added a new class to simulate n games between two different strategies that will try to beat each other.
So far we are on track to complete all the deliverables we initially set out to do. Implementing MCTS on other games might not be an achievable as implementing Go was harder than expected, but increasing the cache efficiency of our MCTS implementation is still an achievable stretch goal.
At the moment we do not yet have any kind of preliminary results to show as we have not yet added any parallelization to our code.
While it was noted in the background section that multiple iterations of MCTS can be ran in parallel, this isn?t actually trivial to do. MCTS models choosing actions in each state of the game as a Multi-Armed Bandit (MAB) problem. Specifically, MCTS generally uses the UCB1 algorithm, which is a MAB solver, to choose which action to try at every iteration. This means that if you run multiple iterations at the same time, the UCB1 algorithm will be using ?stale? data when choosing which action to attempt, so the solution quality of running N > 1 iterations in parallel would be lower than the solution quality of running these N iterations sequentially. So having the algorithm scale well with this stale data is one challenge of parallelizing MCTS.
Additionally, each of these iterations traverses a branch of the tree that UCB1 deems to be the ?most promising? course of actions, so it is likely that the parallel iterations will look at similar branches of the tree, potentially leading to a lot of false sharing. This could be solved by having each of the iterations search a different branch of the tree, but again this could effect solution quality as then most of the iterations would be traversing the branch most recommended by UCB1. Also because the structure MCTS traverses is a tree, there is very little inherent locality, so trying to improve the cache efficiency of this application is an interesting challenge.
While parallelizing random playouts is a very easy source of parallelism that is easy to implement, all it does is provide more accurate state evaluations. This helps UCB1 more accurately evaluate which actions are most promising, but still doesn?t scale completely linearly. So running 1 iteration of MCTS with N random playouts would give a worse solution quality than running N iterations of MCTS sequentially with 1 random playout each. So finding a good balance between parallelizing the random playouts and parallelizing the MCTS iterations is also an interesting challenge for this problem.
We will not be using any starter code since we’ll probably be starting from scratch. We will also not be needing any book or papers for references. In addition, we will be using GHC and latedays clusters.
Our platform will use language C. We will also use the GHC clusters and later on latedays cluster to test parallelism.
Our current schedule for the following 4 weeks:
Week 1: [Eric] polish the existing code for simulating the game and the sequential MCTS implementation. [Jennifer] Implement random playout parallelization.
Week 2: [Eric] Implement iteration parallelization and [Jennifer] begin experimenting with the most intelligent way to have the iterations differ
Week 3: [Eric] Continue to optimize iteration parallelization and [Jennifer] start experimenting with increasing cache efficiency.
Week 4: [Eric] Continue experimenting with cache efficiency and [Jennifer] create interactive demos of the AI.
Week 1: build a Go implementation and start work on a sequential version of MCTS
Week 2: complete sequential version of MCT
Week 3: implement random playout parallelization and begin work on iteration parallelization
Week 4: complete iteration parallelization and begin experimenting with the most intelligent way to have the iterations differ
Week 5: continue to optimize iteration parallelization and start experimenting with increasing cache efficiency
Week 6: continue experimenting with cache efficiency and create interactive demos of the AI