disservin edited this page Aug 26, 2023 · 61 revisions

Interpretation of the Stockfish evaluation

The evaluation of a position that results from search has traditionally been measured in pawns or centipawns (1 pawn = 100 centipawns). A value of 1 implied a 1 pawn advantage. However, with engines being so strong, and the NNUE evaluation being much less tied to material value, a new scheme was needed. The new normalized evaluation is linked to the probability of winning, with a 1.0 pawn advantage corresponding to a win probability of 0.5 (that is, 50%). An evaluation of 0.0 means equal chances for a win or a loss, but also a nearly 100% chance of a draw.

Some GUIs will be able to show the win/draw/loss probabilities directly when the UCI_ShowWDL engine option is set to True.

The full plots of win, loss, and draw probability are given below. From these probabilities, one can also obtain the expected match score.

[Figures: Probabilities; Expected match score]

The probability of winning or drawing a game, of course, depends on the opponent and the time control. In bullet games, the draw rate will be lower, and against a weak opponent, even a negative score could result in a win. These graphs have been generated from a model derived from Fishtest data for Stockfish playing against Stockfish (so an equally strong opponent), at 60+0.6s per game. The curves are expected to evolve, i.e. as the engines get stronger, an evaluation of 0.0 will approach the 100% draw limit. These curves are for SF15.1 (Dec 2022).
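As a minimal sketch of how the expected match score follows from the probabilities: a win counts as one point and a draw as half a point, so the expected score is W + D/2. The function name and the sample numbers below are illustrative assumptions and are not read off the Stockfish curves.

```python
def expected_score(win, draw, loss):
    """Expected match score (between 0 and 1) from win/draw/loss weights.

    The weights may be probabilities (summing to 1) or per-mille
    counts (summing to 1000); only their ratios matter.
    """
    total = win + draw + loss
    if total <= 0:
        raise ValueError("win + draw + loss must be positive")
    # A win is worth 1 point, a draw 0.5, a loss 0.
    return (win + 0.5 * draw) / total


# Illustrative numbers only, not taken from the Stockfish model.
print(expected_score(0.5, 0.5, 0.0))  # 0.75
print(expected_score(54, 938, 8))     # 0.523
```

Normalizing by the total lets the same function accept either probabilities or integer per-mille counts.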

Optimal settings

To get the best possible evaluation or the strongest move for a given position, the key is to let Stockfish analyze long enough, using a recent release (or development version), properly selected for the CPU architecture.

The following settings are important as well:

Threads

Set it to the maximum minus 1 or 2 threads.

Set the number of threads to the maximum available, possibly leaving 1 or 2 threads free for other tasks. SMT or Hyper-threading is beneficial, so normally the number of threads available is twice the number of cores available. Consumer hardware typically has at least 4-8 threads, while Stockfish supports hundreds of threads.

More detailed results on the efficiency of threading are available.

Hash

Set it to the maximum minus 1 or 2 GiB of RAM.

Set the hash to nearly the maximum amount of memory (RAM) available, leaving some memory free for other tasks. The Hash can be any value, not just powers of two. The value is specified in MiB, while typical consumer hardware will have several GiB of RAM. For a system with 8 GiB of RAM, one could use 6000 as a reasonable value for the Hash.

More detailed results on the cost of too little hash are available.

MultiPV

Set it to 1.

A value higher than 1 weakens the quality of the best move computed, as resources are used to compute other moves.

More detailed results on the cost of MultiPV are available.
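The three recommendations above can be sketched as a small helper that builds the corresponding UCI `setoption` commands. This is only an illustration: the function name, the default of one spare thread and 2048 MiB of spare RAM, and the floor of 16 MiB are assumptions made for the example, and the total RAM has to be supplied by the caller.

```python
import os


def recommended_options(total_ram_mib, logical_cpus=None,
                        spare_threads=1, spare_ram_mib=2048):
    """Return UCI `setoption` commands for Threads, Hash and MultiPV.

    All defaults here are assumptions for the sketch, not official values.
    """
    if logical_cpus is None:
        # os.cpu_count() reports logical CPUs (it can return None).
        logical_cpus = os.cpu_count() or 1
    threads = max(1, logical_cpus - spare_threads)
    # Hash is given in MiB; keep a small floor if very little RAM is left.
    hash_mib = max(16, total_ram_mib - spare_ram_mib)
    return [
        f"setoption name Threads value {threads}",
        f"setoption name Hash value {hash_mib}",
        "setoption name MultiPV value 1",
    ]


# Example: a machine with 8 GiB of RAM and 8 hardware threads.
for command in recommended_options(8192, logical_cpus=8):
    print(command)
```

For the 8 GiB example this yields a Hash of 6144 MiB, close to the 6000 suggested above.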

The Elo rating of Stockfish

"What is the Elo of Stockfish?": A seemingly simple question, with no easy answer. First, the obvious: it is higher than any human Elo, and when SF 15.1 ranked with more than 4000 Elo on some rating lists, YouTube knew.

To answer the question in more detail, some background info is needed. In its simplest form, the Elo rating system predicts the score of a match between two players, and conversely, a match between two players will give information about the Elo difference between them. The Elo difference will depend on the conditions of the match. For human players, the time control (blitz vs classical TC) or the variant (standard chess vs Fischer random chess) are well-known factors that influence the Elo difference between the two players. Needless to say, one needs sufficiently many games to confidently measure the Elo difference or match score. Finally, given an Elo difference between two players, one needs to know the Elo rating of one of them to know the Elo rating of the other. More generally, one needs an anchor or reference within a group of opponents, and if that reference is different in different groups, the Elo number can not be compared.

The same observations hold for computing the Elo rating of Stockfish, with caveats related to the fact that engines play very high-level chess, and are able to draw the majority of games between engines of similar strength. From the starting position or any other very balanced condition, a draw rate of 100% has essentially been reached between top engines especially at rapid or longer TCs, even more so on powerful hardware. This results in small Elo differences between top engines, e.g. a +19 -2 =79 match score is a convincing win, but a small Elo difference. Carefully constructed books of starting positions that have a clear advantage for one side can reduce that draw rate significantly and increase Elo differences. The book used in the match is thus an important factor in the computed Elo difference. Similarly, the pool of opponents and their ranking has a large impact on the Elo rating, and Elo ratings computed with different pools of opponents can hardly be compared, especially if weaker (but different) engines are part of that pool. Finally, in order to accurately compute Elo differences at this level, a very large number of games (typically tens of thousands of games) are needed, as small samples of games (independent of the time control) will lead to large relative errors.
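To make the +19 -2 =79 example concrete, here is a minimal sketch that converts a match result into an Elo difference using the standard logistic Elo formula. The function name is made up for the example, and this is not necessarily how Fishtest or any rating list computes its numbers; it also ignores the statistical error, which is large for only 100 games.

```python
import math


def elo_difference(wins, losses, draws):
    """Elo difference implied by a match result (logistic Elo model)."""
    games = wins + losses + draws
    if games == 0:
        raise ValueError("at least one game is required")
    score = (wins + 0.5 * draws) / games
    if not 0.0 < score < 1.0:
        raise ValueError("a 0% or 100% score has no finite Elo difference")
    return -400.0 * math.log10(1.0 / score - 1.0)


# The match from the text: 19 wins, 2 losses, 79 draws (58.5% score).
print(round(elo_difference(19, 2, 79)))  # about 60 Elo
```

A one-sided win count therefore still maps to a modest Elo gap when most games are drawn.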

Having introduced all these caveats, accurately measuring Elo differences is central to the development of Stockfish, and our Fishtest framework constantly measures with great precision the Elo difference of Stockfish and its proposed improvements. These performance improvements are accurately tracked over time on the regression testing wiki page. The same page also links to various external websites that rank Stockfish against a wide range of other engines.

Finally, rating Stockfish on a human scale (e.g. FIDE Elo) has become an almost impossible task, as strength differences between engines and humans are now so large that they can hardly be measured. After all, this would require a human to play Stockfish for long enough to have at least a handful of draws and wins.

Stockfish crashed

Stockfish may crash if fed incorrect FENs, or FENs with illegal positions. Full validation code is complex to write, and within the UCI protocol, there is no established mechanism to communicate such an error back to the GUI. Therefore, Stockfish is written with the expectation that the input FEN is correct.

On the other hand, the GUI must carefully check FENs. If you find a GUI through which you can crash Stockfish or any other engine, then by all means report it to that GUI's developers.
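As a rough illustration of the kind of check a GUI could run before sending a position to an engine, the sketch below tests only the basic shape of a FEN. It is an assumption-laden example and not a full validator: it does not check castling rights, the en passant square, or whether the position is legal (for instance, a side giving check while not on move).

```python
def fen_problems(fen):
    """Return a list of problems found in `fen` (empty if none were found)."""
    fields = fen.split()
    if len(fields) != 6:
        return [f"expected 6 space-separated fields, got {len(fields)}"]
    board, side, _castling, _en_passant, halfmove, fullmove = fields

    problems = []
    rows = board.split("/")
    if len(rows) != 8:
        problems.append(f"expected 8 ranks, got {len(rows)}")
    for index, row in enumerate(rows, start=1):
        squares = 0
        for ch in row:
            if ch in "12345678":
                squares += int(ch)      # run of empty squares
            elif ch in "pnbrqkPNBRQK":
                squares += 1            # one piece
            else:
                problems.append(f"invalid character {ch!r} in row {index}")
        if squares != 8:
            problems.append(f"row {index} describes {squares} squares, not 8")
    if board.count("K") != 1 or board.count("k") != 1:
        problems.append("each side needs exactly one king")
    if side not in ("w", "b"):
        problems.append("side to move must be 'w' or 'b'")
    if not (halfmove.isdigit() and fullmove.isdigit()):
        problems.append("move counters must be non-negative integers")
    return problems


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(fen_problems(start))                          # []
print(fen_problems("8/8/8/8/8/8/8/8 w - - 0 1"))    # missing kings
```

Returning a list of messages rather than a boolean lets the GUI tell the user what is wrong with the input.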

Does Stockfish support chess variants?

The official Stockfish engine only supports standard chess and Chess960, also known as Fischer Random Chess (FRC). However, various forks based on Stockfish support variants, most notably the Fairy-Stockfish project.

Can Stockfish use my GPU?

No, Stockfish is a chess engine that uses only the CPU for chess evaluation. Its NNUE evaluation (see this in-depth description) is very effective on CPUs. With extremely short inference times (sub-microsecond), this network cannot be efficiently evaluated on GPUs, in particular with the alpha-beta search that Stockfish employs. However, for training networks, Stockfish employs GPUs with effective code that is part of the NNUE pytorch trainer. Other chess engines require GPUs for effective evaluation, as they are based on large convolutional or transformer networks, and use a search algorithm that allows for batching evaluations. See also the Leela Chess Zero (Lc0) project.

How can I use Stockfish in my own Software Application?

First of all, you should read our Terms of Use and follow them carefully.

Stockfish is a UCI chess engine, but what does that mean? It means that Stockfish follows the UCI protocol, which you find explained here in great detail. This is the usual way of communicating with Stockfish, so you don't need to write any C++!

Your next step is probably going to be researching how to launch an executable from your programming language. You will need to write to its stdin and listen to its stdout, which is where Stockfish's output will end up.
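As a hedged sketch of what this can look like in Python, the snippet below starts an engine with the standard `subprocess` module, sends a few UCI commands to its stdin, and reads its stdout until a `bestmove` line appears. The function names, the one-second `movetime`, and the engine path in the usage comment are assumptions for the example; error handling and timeouts are deliberately minimal.

```python
import subprocess


def parse_bestmove(line):
    """Return the move from a UCI `bestmove` line, or None for other lines."""
    parts = line.split()
    if len(parts) >= 2 and parts[0] == "bestmove":
        return parts[1]  # can be "(none)" if there is no legal move
    return None


def ask_best_move(engine_path, fen, movetime_ms=1000):
    """Start the engine, search `fen` for `movetime_ms`, return the best move."""
    engine = subprocess.Popen(
        [engine_path],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered text pipes
    )

    def send(command):
        engine.stdin.write(command + "\n")
        engine.stdin.flush()

    def read_until(predicate):
        for line in engine.stdout:
            if predicate(line):
                return line
        raise RuntimeError("engine closed its output unexpectedly")

    try:
        send("uci")
        read_until(lambda line: line.strip() == "uciok")
        send("isready")
        read_until(lambda line: line.strip() == "readyok")
        send(f"position fen {fen}")
        send(f"go movetime {movetime_ms}")
        answer = read_until(lambda line: parse_bestmove(line) is not None)
        return parse_bestmove(answer)
    finally:
        if engine.poll() is None:  # engine still running: ask it to exit
            send("quit")
        engine.wait()


# Usage (the path is hypothetical and depends on your installation):
# ask_best_move("./stockfish", "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")
```

Waiting for `uciok` and `readyok` before sending the position follows the handshake described in the UCI protocol.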

Examples

Limitations

I want Stockfish to comment on the move it made, what do I need to do?

That is not possible. You will have to write your own logic to create such a feature.

I want to get an evaluation of the current position.

While Stockfish has an eval command, it only statically evaluates positions without performing any search. A more precise evaluation is available after you use the go command together with a specified limit.
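During a `go` search, UCI engines report their evaluation in `info` lines containing `score cp <centipawns>` or `score mate <moves>`. The following sketch extracts that score; the function name and the sample line are made up for illustration, and the sample numbers are not real engine output.

```python
def parse_score(info_line):
    """Extract the score from a UCI `info` line.

    Returns ("cp", centipawns) or ("mate", moves), or None if the line
    carries no score.
    """
    tokens = info_line.split()
    if len(tokens) < 2 or tokens[0] != "info" or tokens[1] == "string":
        return None  # not a search info line (or free-form text)
    if "score" not in tokens:
        return None
    i = tokens.index("score")
    if i + 2 < len(tokens) and tokens[i + 1] in ("cp", "mate"):
        try:
            return tokens[i + 1], int(tokens[i + 2])
        except ValueError:
            return None
    return None


# Made-up sample in the UCI format, not real engine output.
sample = "info depth 20 seldepth 28 multipv 1 score cp 31 nodes 1000000 pv e2e4 e7e5"
kind, value = parse_score(sample)
print(kind, value / 100)  # cp 0.31  (centipawns converted to pawns)
```

In practice one would keep the score from the last `info` line received before `bestmove`, since later lines come from deeper iterations.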

Executing Stockfish opens a CMD window

Stockfish is a command line program. When you execute it, you might notice that it simply opens a Command Prompt (CMD) window. This behavior is intentional and serves as the interface for interacting with the engine.

User-friendly experience

If you prefer a more user-friendly experience with a chessboard and additional features, you can consider using a graphical user interface (GUI) alongside Stockfish. To set up a GUI, you can visit the Download and Usage page.

Available commands

The CMD window allows you to input various commands and receive corresponding outputs from Stockfish. If you want to explore the available commands and their explanations, you can refer to the Commands page, but this is only recommended for advanced users and developers.

What is depth?

First, we need to understand how minimax search works. We will stick to the vanilla version, because explaining alpha-beta would not add much here.

Minimax

Each player tries to maximize the score in their favor. White wants the evaluation to be as positive as it can be, and Black as negative as it can be - we do this all the time when we play chess. Search works in a similar way - you explore your moves, explore the opponent's replies, assign a value called evaluation to each resulting board position (which is not precise but tries to be), and find the move for White that has the maximum evaluation against the opponent's best reply.

Then you search one ply (half-move) deeper - exploring your replies to the opponent's last replies. This process is called iterative deepening - you explore a position up to depth 1, then to depth 2, then to depth 3, and so on - you deepen your search with each iteration, which is where the name comes from.

So, for now, "depth" is a perfect thing - it means you fully calculated the search tree up to this "depth" and you know everything that can happen within it. For a mate in 5, you will need depth 9 to see it (because depth is written in half-moves). But chess has a lot of moves, 20 from the starting position and usually many more from any middlegame position. Even if you can evaluate millions of positions per second as engines like Stockfish do, you will still hit a wall in what depths you can realistically reach, and it wouldn't be that high - depth 8, maybe 10.
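The idea of full-width minimax with iterative deepening can be sketched on a tiny hand-made tree. Everything here is a toy assumption (the tree, the evaluations, and the function names); it is plain minimax without any of the pruning or extensions Stockfish uses.

```python
def minimax(state, depth, white_to_move, children, evaluate):
    """Plain depth-limited minimax; scores are from White's point of view."""
    successors = children(state)
    if depth == 0 or not successors:
        return evaluate(state)  # static evaluation at the horizon
    scores = [
        minimax(s, depth - 1, not white_to_move, children, evaluate)
        for s in successors
    ]
    # White picks the highest score, Black the lowest.
    return max(scores) if white_to_move else min(scores)


def iterative_deepening(state, max_depth, children, evaluate):
    """Search to depth 1, 2, ..., max_depth; return the score of each iteration."""
    return [
        minimax(state, depth, True, children, evaluate)
        for depth in range(1, max_depth + 1)
    ]


# Toy tree: White to move at "root"; evaluations in centipawns.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
EVAL = {"root": 0, "a": 50, "b": 20, "a1": -300, "a2": 60, "b1": 10, "b2": 30}

scores = iterative_deepening("root", 2, lambda s: TREE.get(s, []), EVAL.get)
print(scores)  # [50, 10]
```

At depth 1 move "a" looks best (+50), but at depth 2 the reply "a1" refutes it, so the search switches to "b" with a score of +10. The same counting explains the mate example above: a mate in 5 consists of 5 moves by the attacker and 4 replies, which is 9 plies.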

Pruning

How to battle this? With a thing called pruning. Pruning comes in quite a lot of different heuristics, but they mostly serve one purpose - to remove branches in search that don't look too desirable, and to re-explore them later when the iterative deepening depth goes higher. So this is where "depth" starts to mean less - because you don't search the entire game tree, and modern engines prune large percentages of branches.

Extensions

But then there is also a thing called "extensions" which is more or less the opposite of pruning. With extensions, you start by searching "important" branches deeper (for example, checks) than what is needed to complete the iterative deepening iteration.

Conclusion

With all of this, instead of a search tree that is strictly cut off at this "depth", you have most of the branches ending really early and a lot of branches searched deeper than the given "depth". Stockfish is the most aggressive engine in both pruning and extensions, so its search tree looks nothing like what you usually see on Wikipedia.

Coming back to how much Stockfish prunes, there is some data here. The branching factor indicates how many moves are calculated on average per depth increase, and is computed as $nodes^{\frac{1}{depth}}$. For Stockfish 15.1 it goes all the way down to about 1.5 at higher depths, so at depth 50 it considers approximately 1.5 moves per ply out of the 20 to 40 moves usually available. This is why it misses some short mates up to high depths, while vanilla minimax would have found them at lower ones: it simply throws away more than 90% of the moves.
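The branching factor formula is easy to try out numerically. The sketch below uses hypothetical function names and compares a branching factor of 1.5 with a full-width search over 30 moves per position, a number picked from the 20 to 40 range purely for illustration.

```python
def effective_branching_factor(nodes, depth):
    """Branching factor computed as nodes ** (1 / depth)."""
    if nodes <= 0 or depth <= 0:
        raise ValueError("nodes and depth must be positive")
    return nodes ** (1.0 / depth)


def nodes_for(branching_factor, depth):
    """Approximate node count for a given branching factor and depth."""
    return branching_factor ** depth


# Depth 50 with a branching factor of 1.5: roughly 6.4e8 nodes.
print(f"{nodes_for(1.5, 50):.1e}")
# Depth 50 searching 30 moves everywhere: roughly 7e73 nodes.
print(f"{nodes_for(30, 50):.1e}")
# Going the other way recovers the branching factor.
print(round(effective_branching_factor(nodes_for(1.5, 50), 50), 3))  # 1.5
```

The gap between these two node counts is what pruning buys, and it is also why a pruned search can overlook a line that a full-width search of the same depth would see.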
