Lc0 training - Leela Chess Zero

If you are new to Leela (Lc0) Chess and have begun contributing games, either using Google Cloud or some other online service or your own home computer, you may be wondering where all those games go and how training of Leela happens.

Leela uses a deep convolutional neural network (NN) as a major part of its chess playing. The NN Leela uses is complex and needs extensive training before it plays high quality chess. The LCZero project kicked off with a period of program development and initial test training (you can see some of these preliminary tests in the first few crowded “bumps” at the left in the graph here). The first high quality NNs produced for Leela are called the “test 10” networks. The training games from about 15 million to 67 million are those used to train test 10. As of October 2018 the best networks from test 10 are in use in chess engine competitions and at play.lc0.org.

The quality of test 10 networks neared a plateau in late August 2018, and training of new networks was started from scratch to try to improve on test 10 (starting around training game 67 million), with some changes in the game play parameters and two minor bug fixes. The main network series now being trained is test 20, with an additional experimental test 30 network series training separately (not shown on the above graph). If you contribute games to training Leela, they will normally be directed by the developers to the test network most in need of new training games. You can access more limited data on test 30 by clicking on “Training Runs” and then “Alternative testing” from the main page, somewhat confusingly called ID 2 (same as test 30).

_**How Leela trains**_

What goes into training and how can you visualize progress in training the neural networks? Leela, like the famous AlphaZero, uses self-play (reinforcement learning) to learn chess: it starts only with the rules of chess moves, promotions, etc. The initial neural network is given random “weights” and evaluates chess positions extremely poorly (weights are the strength of neuron connections in the network; try Wikipedia and blogs to understand more about what that means). When this initial network plays games against itself (playing both white and black sides), moves are largely random, but they result in a legal chess game with an endpoint that is either a win, a loss, or a draw.
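
To see what those very first games look like, here is a minimal sketch using the third-party python-chess package (an assumption for illustration, not part of Lc0), with uniform-random move selection standing in for a network whose weights are still random:

```python
# Minimal sketch: self-play with random move choice, standing in for an
# untrained network. Assumes the third-party python-chess package is installed.
import random
import chess

board = chess.Board()
while not board.is_game_over():
    move = random.choice(list(board.legal_moves))  # "random weights" stand-in
    board.push(move)

# Even a random game ends in a legal result: win, loss, or draw.
print(board.result())  # "1-0", "0-1", or "1/2-1/2"
```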
Training works by collecting a large set of such self-play games and evaluating how well the network predicted the best (winning) moves and which side was winning each game (called the policy and the value outputs of the network respectively). A process called backpropagation and gradient descent is then used to update the weights in the current network so that the policy and value outputs are improved for the current set of training games. After these updates, the weights for the new network are made available for download and a rapid test of network quality is reported (self-play Elo on the graphs, about which more below). You can download any of these networks and use them with Lc0 on your own computer by using the --weights=path_to_weights_file option.
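
To make the policy and value idea concrete, here is a toy sketch of an AlphaZero-style loss for a single training position. This is not Lc0's real training code (that is a TensorFlow pipeline maintained by the developers); the example numbers are invented for illustration:

```python
# Toy sketch of an AlphaZero-style training loss for one position.
# Not Lc0's actual training code - just the idea described in the text.
import numpy as np

def training_loss(value_pred, game_result, policy_pred, policy_target):
    """value_pred: network evaluation in [-1, 1]; game_result: -1 loss, 0 draw, +1 win;
    policy_pred / policy_target: probability distributions over the legal moves."""
    value_loss = (game_result - value_pred) ** 2                # squared error on the result
    policy_loss = -np.sum(policy_target * np.log(policy_pred))  # cross-entropy on move choice
    return value_loss + policy_loss

# Invented example: the network preferred the first of three moves and thought it
# was slightly better, but the game was lost and self-play preferred the second move.
print(training_loss(0.2, -1.0,
                    np.array([0.6, 0.3, 0.1]),
                    np.array([0.1, 0.8, 0.1])))
```

Backpropagation works out how each weight should change to reduce this loss averaged over the batch of training games, and gradient descent applies those changes.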

Using the new (hopefully slightly improved) network, the whole process repeats:

  1. generate self-play games using the current network,
  2. gather a batch of such games,
  3. update the weights in the network so that it better predicts results for this batch of games,
  4. save the new network, repeat.
The most compute-intensive part of this cycle is the self-play games and that is what you can contribute (the process is almost fully automated - you don’t have to worry about any of this stuff). One of many choices the Lc0 developers make in this process is how many self-play games go into each batch. Currently test 20 uses 32,000 games in each batch.
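
In very rough outline, the whole cycle looks like the runnable toy below, where the “network” is shrunk to a single number so the sketch stays self-contained. Nothing here is Lc0 code, and the batch size is cut down for the example:

```python
# Runnable toy outline of the cycle above; nothing here is Lc0 code.
# The "network" is reduced to a single value-head bias; real networks are
# deep CNNs with millions of weights and the real batch size is 32,000 games.
import random

GAMES_PER_BATCH = 32  # cut down from 32,000 for the example

def play_self_play_game(network):
    # Stand-in for a full self-play game (the argument is unused here):
    # return only the final result, -1 loss / 0 draw / +1 win.
    return random.choice([-1, 0, 1])

def update_weights(network, results, lr=0.1):
    # One gradient-descent step on a squared-error "value" loss.
    grad = sum(2 * (network - z) for z in results) / len(results)
    return network - lr * grad

network = 0.0  # "random weights"
for generation in range(5):
    batch = [play_self_play_game(network) for _ in range(GAMES_PER_BATCH)]  # steps 1-2
    network = update_weights(network, batch)                                # step 3
    print(f"generation {generation}: value bias = {network:.3f}")           # step 4
```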

_**Understanding the Elo graphs**_
At each round of network updating, many measurements of network quality are made, one of which appears on the main graph at lczero.org. This measure is called Self-play Elo and is a crude but fast estimate of playing strength made by assuming random moves have an Elo of 0 (do not give any credence to short term changes in Self-play Elo, more below). Much more meaningful estimates of playing strength can be seen by clicking “Elo Estimates”, which gives a graphical window of recent networks assessed in various ways. The x-axes on these two plots are different (games since Lc0 started vs network ID) so don’t try too hard to line them up in your head. All of the measurements on the Elo Estimates graph are useful, but the simplest to interpret is the CCRL estimate (currently in dark red), which is obtained by playing a Leela network against other computer chess engines with known Elo. The CCRL estimate is a good approximation of how the network would perform against other engines in a tournament with a specific hardware configuration. Some or all of these Elo estimates are often a little out of date - the price of volunteers contributing their time to make them - and some are made only periodically because they require much more compute time to estimate.
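
For reference, estimates like these ultimately rest on the standard logistic Elo model, which converts a match score (the fraction of points won) into a rating difference. The snippet below only illustrates that relationship; it is not the exact procedure CCRL or the Lc0 testers use:

```python
# Standard logistic Elo model: convert a match score (fraction of points won)
# into a rating difference versus the opponent. Illustration only.
import math

def elo_difference(score):
    """score: points won / games played, strictly between 0 and 1."""
    return -400 * math.log10(1 / score - 1)

print(elo_difference(0.75))  # winning 75% of the points is roughly +191 Elo
```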

Another very useful measure is the “100n vs SF9” estimates (it is the upper aquamarine line on the graph). These are made by running a network against Stockfish 9, but stopping Leela when it has evaluated 100 nodes at each move (a “node” is a board position in the variant lines tested for move choice). The absolute Elo estimate is not correct for normal play conditions (which allow Leela to evaluate many more nodes), but the test is fast and fairly accurate (unlike Self-play Elo) and is done for many more networks than the CCRL estimate. Similar estimates are made at 10 nodes and 1 node for Leela. You can see that the Self-play Elo values often do NOT follow these more accurate estimates (Self-play Elo is the dark blue line, scaled to fit nicely on the same graph). For example, at the time I am writing, there is a long period (from networks 21600 to 21900) when the Self-play Elo curve drops extensively while the CCRL and 100n vs SF9 curves go up or are flat. Much comment is often made when the Self-play Elo curve rises or drops during training - explain to your friends that this is expected: it is not exciting when it goes up and it is not distressing when it goes down!
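
If you want to reproduce the flavour of a node-limited test at home, a sketch with the third-party python-chess package could look like the following (the binary names, the weights path, and the one-second Stockfish limit are assumptions for illustration, not the testers' actual setup):

```python
# Hedged sketch of one node-limited test game, assuming python-chess plus
# local "lc0" and "stockfish" binaries; paths and limits are placeholders.
import chess
import chess.engine

lc0 = chess.engine.SimpleEngine.popen_uci(["lc0", "--weights=/path/to/network"])
stockfish = chess.engine.SimpleEngine.popen_uci("stockfish")

board = chess.Board()
while not board.is_game_over():
    if board.turn == chess.WHITE:
        # Leela is stopped after evaluating only 100 nodes for this move.
        result = lc0.play(board, chess.engine.Limit(nodes=100))
    else:
        result = stockfish.play(board, chess.engine.Limit(time=1.0))
    board.push(result.move)

print(board.result())
lc0.quit()
stockfish.quit()
```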

Two other useful indicators are currently on the Elo estimates graph. One is a dark red line near the top that approximates the best performance of any previous Leela network (exceeding this is a major long term goal). This line seems a bit high to me and I don’t know why - presumably it comes from the same CCRL estimate method. Last but not least are the “LR” dots. These are the points when the “learning rate” (LR) was dropped for network training, and it is theoretically expected that the fastest network improvement will occur just after these drops, as is the case in the current graph window (note: some Elo tests are intermittent and sometimes they are missing data points; don’t misinterpret sudden increases “before” the LR drop - those are artifacts of when the Elo tests were done).
The LR is used to scale the magnitude of the network updates made by backpropagation and gradient descent, and it is varied during the training process. For current Leela training runs, the LR starts high and is occasionally reduced (the LR drop) until it gets close to zero at the end (other methods such as cyclical LR changes are possible). You can think of early stages of training as fast but crude and the later stages as increasingly fine tuned ([read here](https://towardsdatascience.com/understanding-learning-rates-and-how-it-improves-performance-in-deep-learning-d0d4059c1c10) for more).

Though network improvement occurs fastest after an LR drop, the process continues until the next LR drop, and some patience is needed to let each learning rate level squeeze out all the juice before the next LR drop. Dropping the LR too soon is counterproductive, even when network quality isn’t obviously improving when you eyeball these curves. If you have the impression the entire learning process should take hours rather than months, that comes from the AlphaZero paper, which used huge hardware to self-play 44 million games in a few hours.
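
As a concrete picture of what a stepped schedule means, here is a small illustrative sketch; the step counts and rates are invented for the example and are not the actual Lc0 training settings:

```python
# Illustrative stepped learning-rate schedule; boundaries and rates are made up
# for the example and are not the real Lc0 settings.
def learning_rate(step, schedule=((0, 0.1), (100_000, 0.01), (300_000, 0.001))):
    lr = schedule[0][1]
    for boundary, rate in schedule:
        if step >= boundary:
            lr = rate  # each "LR drop" switches training to a smaller step size
    return lr

for step in (0, 50_000, 150_000, 400_000):
    print(step, learning_rate(step))
```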

_**Going deeper**_
Network evaluation over time is much more extensive than the Self-play Elo and Elo estimate graphs. If you want to start to immerse yourself in some of these data, get a free account on the Discord gamer chat site (“LCZero chat” button at lczero.org), and type !sheet, !sheet2, !tensorflow or other similar commands as your message. Be warned, things get very complicated very fast, but there is a wealth of additional information that you can access there. Among other things, you can find more extensive Elo estimates across all of the main test networks plus many other tantalizing mysteries such as “Gradient norm” and “MSE Loss”. These sheets are designed mostly for the developers to track NN progress, so be prepared to spend a lot of time figuring out what it all means; for some of it you will need to learn a lot more about neural networks.

Have fun!

Article by Jhorthos (nickname at Leela Discord).
