Wikinews interviews the research team behind 'human-like' Maia chess engine

Monday, March 1, 2021

Portrait of Professor Ashton Anderson, one of the researchers on this study.

Interview with Professor Anderson

Portrait of Reid McIlroy-Young.
Image: Reid McIlroy-Young.

What was the timeline of this study? Why did you choose the name 'Maia'? How many researchers were involved in this study and what were their roles?

((Ashton Anderson)) We started in late 2018, as one of the first projects of Reid's PhD. There were four people involved for most of the time. The programming and data analysis were done by Reid, in close collaboration with the other team members[.]

((WN)) How was Maia trained?

((Ashton Anderson)) We used first-generation NC Azure VMs with NVIDIA Tesla K80 GPUs; the final models were chosen after hyperparameter tuning. The final training time was a couple of days[.]

((WN)) How many games were selected for training the neural network? Did you exclude those games where one of the players had quit/got disconnected from the game? (ie, letting the time run out, instead of resigning?) After dividing the games based on ratings, how many games were used for each Maia version?

((Ashton Anderson)) We used 12 million games for each model. We truncated games at the point where either player had 30 seconds or fewer on the clock, and we didn't filter by termination condition[.]
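The 30-second truncation described above amounts to a simple clock filter. A minimal sketch, assuming a `(move, white_clock, black_clock)` tuple layout; this is illustrative, not the team's actual pipeline code:

```python
def truncate_low_time(moves, cutoff=30):
    """Keep moves only up to the first position where either player's
    remaining clock is at or below `cutoff` seconds.

    moves: list of (move, white_clock_s, black_clock_s) tuples; the field
    layout and the helper itself are assumptions, not the team's code.
    """
    kept = []
    for move, white_clock, black_clock in moves:
        if white_clock <= cutoff or black_clock <= cutoff:
            break  # discard this move and everything after it
        kept.append(move)
    return kept

game = [("e2e4", 175, 180), ("e7e5", 120, 90), ("g1f3", 25, 88)]
kept = truncate_low_time(game)  # drops "g1f3" and anything after it
```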

((WN)) If I understand correctly, different versions of Maia predict moves depending on the Elo rating; how was that achieved? Did the team choose only those games where both players were in the same rating division? (eg, both were in 1500-1599) If not (eg, 1500 v 1700), did two versions of Maia train using that game?

((Ashton Anderson)) Yes, the models were only shown games where both players were within the targeted rating range. So all our models have fully separate training sets[.]
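The separation into fully disjoint training sets amounts to keeping a game only when both ratings fall inside the same bucket. A minimal sketch, where the bucket boundaries and helper name are assumptions for illustration:

```python
def bucket_for(white_rating, black_rating, buckets):
    """Return the rating bucket that contains BOTH players, or None.

    A game contributes to at most one model's training set; mixed-rating
    games (e.g. 1500 v 1700) match no bucket and are discarded.
    """
    for low, high in buckets:
        if low <= white_rating <= high and low <= black_rating <= high:
            return (low, high)
    return None

# Hypothetical 100-point buckets like those targeted by the Maia models.
buckets = [(1100, 1199), (1500, 1599), (1900, 1999)]
```

With these buckets, a 1520 v 1560 game lands in the 1500 bucket, while a 1500 v 1700 game is used by no model at all.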

((WN)) Do we notice a pattern of a higher-rated Maia playing a rather easy game against weaker opponents?

((Ashton Anderson)) Maia doesn't take the opponent into account, so it should play the same against all opponents.

((WN)) Did you try playing Maia v5 (1500 Elo) against v1 or v9? Does the weaker model ever win against the stronger model?

((Ashton Anderson)) maia1 can win against maia9, but playing them raw requires an entropy source, as the models are deterministic and would otherwise play the same game over and over again. I ran a few games a long time ago, so you can search for maia1 vs maia9 on their accounts.

Stockfish showing different options of ideal moves.
Image: Lichess.

((WN)) Some players play aggressively, and some have different styles -- how does Maia take those into consideration to predict the next move? Does it also provide a list of moves, ordered by the likelihood of each being played by a human? (For perspective, Stockfish shows which was the ideal move, and which was the second-best move, per its evaluation)

((Ashton Anderson)) We don't consider play style; the models just average over the players. The models take in a board stack and output probabilities for all 1858 possible chess moves. We then filter that down to just the legal moves, convert it to a probability distribution and select the top move.
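The pipeline described above (policy over all 1858 moves, legal-move filter, renormalized distribution, top move) can be sketched in a few lines. The tiny move list and logit values below are made up for illustration; Maia's real policy head covers all 1858 moves:

```python
import math

def pick_move(policy_logits, all_moves, legal_moves):
    """Mask out illegal moves, renormalize with a softmax, and return the
    most probable legal move plus the full legal-move distribution."""
    masked = {move: score for move, score in zip(all_moves, policy_logits)
              if move in legal_moves}
    peak = max(masked.values())  # subtract the max for numerical stability
    exps = {move: math.exp(score - peak) for move, score in masked.items()}
    total = sum(exps.values())
    probs = {move: value / total for move, value in exps.items()}
    return max(probs, key=probs.get), probs

# Toy stand-in for the 1858-entry policy output.
all_moves = ["e2e4", "d2d4", "g1f3", "e2e3"]
logits = [2.0, 1.5, 1.0, -1.0]
best, probs = pick_move(logits, all_moves, {"e2e4", "g1f3"})
```

Because the distribution is kept, not just the argmax, the same output can also be read as an ordered list of humanly likely moves.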

GM Hikaru Nakamura (black) checkmating the opponent's king in an esoteric fashion.
Image: Lichess.

((WN)) In the endgame, some players try to beat the opponent in a rather creative, or sometimes esoteric, fashion, chasing the king across the board. Does Maia also do that -- something a sapient player does purely for emotions, fun or thrill; or will Maia play the shortest series of winning moves every time?

((Ashton Anderson)) The models tend to play for longer endgames. This is likely because such games are more plentiful in the training data, since players who concede early don't leave samples[.]

The rating distribution for weekly Classical games on Lichess on March 7, 2021, forming a bell curve.
Image: Lichess.

((WN)) Given that the rating distribution of players follows a bell curve, what measures did the research team take to avoid over-fitting to games at the curve's peak, and under-fitting at the extremes?

((Ashton Anderson)) We trained on the same number of games for each rating level (12 million).

Maia's move-matching accuracy.
Image: Ashton Anderson, Reid McIlroy, Siddhartha Sen and Jon Kleinberg.

((WN)) This graph shows a trend; the maximum of the curve lies ahead of the rating it was trained for. (eg, Maia 1100's peak is at 1200, and so on). Could you please explain why?

((Ashton Anderson)) We hypothesize it's because the models are more like committees of players than a single player, so they tend to be a bit stronger than their targets. https://maiachess.com/assets/js/plots.js has the data used for the plots[.]

((WN)) The Microsoft announcement says "Maia could look at your games and tell which blunders were predictable and which were random mistakes." When Maia predicts the blunder, is Maia aware the move is a blunder? If so, how was it achieved?

((Ashton Anderson)) The models assign a probability of winning to each board position, so this can be used to tell whether a blunder occurred[.]
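One way to operationalize that answer is to compare the mover's win probability before and after each move and flag large drops. A hedged sketch; the threshold, data layout and evaluation numbers are assumptions, not the paper's method:

```python
def flag_blunders(evaluated_moves, threshold=0.1):
    """Flag moves where the mover's win probability dropped by more than
    `threshold` (an assumed cutoff) across the move.

    evaluated_moves: list of (move, p_win_before, p_win_after).
    """
    return [move for move, before, after in evaluated_moves
            if before - after > threshold]

# Hypothetical evaluations: the queen sortie costs a quarter of win probability.
game = [("e2e4", 0.52, 0.53), ("d1h5", 0.53, 0.28), ("g1f3", 0.28, 0.30)]
blunders = flag_blunders(game)
```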

((WN)) What are some of the use cases of training a chess engine to play the way a sapient being would, rather than using a minimax tree or Monte Carlo tree search to find a more optimal move?

((Ashton Anderson)) The main use case is developing training tools that can help people improve. If we can tell what common mistakes are, we can help people find them and target them in their training.

((WN)) Can a player run Maia on their computer to train, practice and improve their game? Or does the trained neural network take too much computational power?

((Ashton Anderson)) Yes, we are using the Leela/lc0 client which can run on just about anything, although a GPU would make them faster.

((WN)) Given Lichess uses Glicko-2 rating, rather than using Elo, did the research involve converting the player ratings on Lichess to Elo while training the neural network?

((Ashton Anderson)) We used the Lichess rating system, and sometimes refer to it as Elo as that's the more commonly used term.

((WN)) Time is a factor in how a human player will play a move. Does that affect Maia's move predictions when Maia is playing in blitz, rapid or classical time controls?

((Ashton Anderson)) No, the models don't know anything about move time, but we plan on incorporating this in future versions.

Portrait of GM Hikaru Nakamura
Image: Andreas Kontokanis.

((WN)) The Microsoft announcement mentioned that games like bullet and ultra-bullet were filtered out, since the rate of blunders increases. However, there are players like Hikaru Nakamura who are good at bullet. Are there plans to extend Maia's domain to how skilled players play in such time controls? This could also help humans detect which types of blunders they might make under time pressure. Though I understand this might pollute the neural network and give underwhelming results.

((Ashton Anderson)) We are looking into it, but access to enough training games also quickly becomes an issue.

((WN)) Glicko-2 ratings for new players start at 1500 on Lichess and change rapidly over their first few games. Their ratings look like "1642 (?)", as Glicko-2 is not yet confident of their rating. Did your team filter out those games where either of the players was new?

((Ashton Anderson)) No; they are infrequent in our sample, as we only looked at rated games. Although maia-1500 has been an outlier on some tests (in a tournament with tree search it was the weakest), so we suspect the new players do have an effect on it[.]

((WN)) Since Maia was trained up to a 2500 rating, do we expect it to lose against players who are rated above 2500? Will Maia continually run and train itself while playing against a human opponent?

((Ashton Anderson)) The released models only go up to 1900-1999; we tested them up to 2500. The models are static and don't update or learn from play.

((WN)) Given a position on the board may not appear frequently, how did you get around this in order to train Maia?

((Ashton Anderson)) Most positions don't occur in our training set; the deep-learning-based design means the models can extrapolate to novel positions[.]

((WN)) On Maia's website, it is said that "even when players make horrific blunders, Maia correctly predicts the exact blunder they make around 25% of the time". Does that indicate that humans, in general, tend to make the same types of blunders?

((Ashton Anderson)) That is our speculation too, but we can't generalize our results yet.

((WN)) The Microsoft announcement also said, "some personalized models can predict an individual's moves with accuracies up to 75%". Which player's moves were used to train the model? How many games were analysed? What is the reason for this improvement in prediction? How can it be improved? Is it possible to download the model and train it with games of a specific player on a regular computer?

((Ashton Anderson)) We recently found some issues with the data used in that analysis, so would say we get up to 65% accuracy. The updated paper is on arXiv. We did a variety of analyses so there is no single answer. Once the paper has undergone peer review we will have more information available about code/models.

((WN)) Is there a way we could quantify, visualise or explain how the styles of any two players differ? It might be interesting to see. Moreover, it could be used to track how one's style changes as their rating changes. Ah, this reminds me of the "Play Magnus" app; maybe using Maia, one could make a better, more accurate and free alternative to such applications for various grandmasters; this is indeed brilliant! Is it possible to take the model and train it on, say, Alireza Firouzja's games in PGN?

((Ashton Anderson)) Yes!

((WN)) AI mimicking the actions of a sapient being is in the territory of the Turing test. Do you think a machine can pass a Turing test in a very niche domain? Will this make it harder for anti-cheat tools to detect whether a move was a sapient decision or was assisted by a computer?

((Ashton Anderson)) We plan to test if our systems pass a Turing test, so we're at least optimistic about it.

((WN)) In what way could someone misuse Maia?

((Ashton Anderson)) Cheating with them like any other chess engine.

((WN)) Is Maia ready to be used for detecting cheating on online chess websites like Lichess?

((Ashton Anderson)) No, that's a much harder problem. It could be a valuable input, though.

((WN)) Does the team intend to train Maia on the other chess variants offered by Lichess?

((Ashton Anderson)) Not currently, no.

((WN)) Does the team plan on training Maia with more datasets? Will Maia also be training with live matches happening on Lichess?

((Ashton Anderson)) Yes, we'll release new versions of Maia in the coming months.