This repo aims to provide auxiliary code, as well as analytical data, to establish ratings for Stockfish variants at various parameter settings.
I think this can be useful for other projects as well. My immediate purpose is to put the evaluation of GPT-based chess emulators on a firm basis. Lately, there has been much non-quantitative discussion of the putative playing strength of those emulators, using vaguely specified Stockfish clients as opponents but without well-defined reference engine ratings. I am trying to remedy this unfortunate state of affairs.
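To make "reference engine rating" concrete, here is a minimal sketch of how a match result maps to a rating difference under the standard logistic Elo model. It is an illustration only, not code from this repo: the function names `match_score` and `elo_diff_from_score` are hypothetical, and the sketch ignores error bars, which any serious rating estimate also needs.

```python
import math


def match_score(wins: int, draws: int, losses: int) -> float:
    """Average score per game, counting a draw as half a point."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("at least one game is required")
    return (wins + 0.5 * draws) / games


def elo_diff_from_score(score: float) -> float:
    """Elo difference implied by an average score under the logistic model.

    Inverts the expected-score formula E = 1 / (1 + 10 ** (-d / 400)).
    The result is undefined for a score of exactly 0 or 1.
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    # Hypothetical result: 60 wins, 25 draws, 15 losses for engine A against engine B.
    s = match_score(60, 25, 15)
    print(f"score = {s:.3f}, implied Elo difference = {elo_diff_from_score(s):+.1f}")
```

An even score maps to a difference of zero, and the difference grows without bound as the score approaches 0 or 1, which is one reason opponents of known and comparable strength are needed for a meaningful estimate.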
For the bulk of the work presented here, I run chess engine tournaments using the Cute Chess CLI. My setup largely follows that used for fishtest. Out