Add a script for finding most important [FT] weights in a net. #150
@Sopel97 Thanks for this! I'll take a closer look later. |
@Sopel97 What's the use of KingBuckets[64]? official-stockfish/Stockfish@d61d385 |
A remnant from a more generic implementation where I was able to assign multiple king squares to one bucket. The king square is ensured to be in e..h files by |
@Sopel97 I'm having this error after trying to run:
(env) (base) C:\Users\User\nnue-pytorch>compile_data_loader.bat
(env) (base) C:\Users\User\nnue-pytorch>cmake . -Bbuild -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_INSTALL_PREFIX="./"
'nmake' '-?' failed with: The system cannot find the file specified
CMake Error: CMAKE_C_COMPILER not set, after EnableLanguage
(env) (base) C:\Users\User\nnue-pytorch>cmake --build ./build --config RelWithDebInfo --target install
(env) (base) C:\Users\User\nnue-pytorch> |
You can try
instead to force mingw makefiles. Looks like it tries to use ninja for some reason and that fails |
I think some tool like this is interesting. It would be interesting to understand more precisely what 'importance' means in this context. The gradient might not quite be the right quantity to look at, probably more something like the second derivatives (Hessian matrix, or at least its diagonal). |
Computing the full Hessian with this many parameters might not be feasible, though PyTorch has an autograd function that could achieve it in principle. Computing the diagonal of the Hessian should be trivial (see https://stackoverflow.com/a/50375367). I'll revise this later with an option to use the nth [configurable] derivative instead. |
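To illustrate why the diagonal of the Hessian can carry information that the gradient misses, here is a minimal, framework-free sketch. It is not what the linked answer does (that one uses autograd); it swaps in central finite differences on a toy function. `hessian_diagonal` and `toy_loss` are hypothetical names made up for this example and are not part of the script in this PR.

```python
def hessian_diagonal(f, w, eps=1e-3):
    """Approximate d^2 f / d w_i^2 for every i using central differences."""
    base = f(w)
    diag = []
    for i in range(len(w)):
        up = list(w)
        up[i] += eps
        down = list(w)
        down[i] -= eps
        # Second-order central difference along coordinate i only.
        diag.append((f(up) - 2.0 * base + f(down)) / (eps * eps))
    return diag


def toy_loss(w):
    # Hypothetical quadratic loss; its exact Hessian diagonal is [6, 1].
    return 3.0 * w[0] ** 2 + w[0] * w[1] + 0.5 * w[1] ** 2


# At w = [0, 0] the gradient of toy_loss is exactly zero for both parameters,
# so a gradient-based score cannot tell them apart, but the curvature can.
diag = hessian_diagonal(toy_loss, [0.0, 0.0])
```

This needs two extra function evaluations per parameter, so it only makes sense as an illustration; for a real net the autograd route mentioned above would be the practical one.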
It appears that the gradients returned by our FT's backward don't have grad_fn defined (and I have no idea what it should be), so we cannot get a second derivative for it. It should however work for later layers, if we want to support them in the future. |
@Sopel97 I downloaded the wrongNNUE binpack and it's fine now. There's just some deprecation warning.
(env) (base) C:\Users\User\nnue-pytorch>python weight_importance.py --data=wrongNNUE_02_d9.binpack --features=HalfKAv2_hm --pos_n=1024 --best_n=32 --output=out.txt nn-13406b1dcbe0.nnue
(env) (base) C:\Users\User\nnue-pytorch> |
@Sopel97 What is |
The first layer is of shape The most important weight is |
@Sopel97 Thanks. Now it's clear to me. |
@SFisGOD On the PSQT note. Have you tried, or considered, training additive [for example] pawn psqt terms, starting from 0? I mean, our usual |
I have not tried something like that. |
This script provides a way to find the most important weights (currently only for the feature transformer) in a given network on a given dataset. Importance is measured as the sum of the absolute values of the gradients over all positions. Because absolute values of per-position gradients cannot be accumulated within a batch (a batched backward pass sums the signed gradients), this has to be done with a batch size of 1, which makes it relatively slow.
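The accumulation can be sketched on a toy model as follows. This is a rough illustration under stated assumptions, not the code from this PR: it uses plain Python instead of PyTorch, a tiny linear layer with binary input features stands in for the feature transformer, and `per_sample_grad`, `weight_importance`, and all the data are hypothetical.

```python
def per_sample_grad(weights, out_weights, active, target):
    """Gradient of the squared error w.r.t. weights[f][o] for one position."""
    num_outputs = len(out_weights)
    # Forward pass: sum the rows of the active features, then a linear output.
    acc = [sum(weights[f][o] for f in active) for o in range(num_outputs)]
    pred = sum(a * v for a, v in zip(acc, out_weights))
    err = 2.0 * (pred - target)
    # Only rows of active features receive a non-zero gradient.
    return {(f, o): err * out_weights[o] for f in active for o in range(num_outputs)}


def weight_importance(weights, out_weights, positions, best_n):
    """Sum |gradient| over positions, one position at a time (batch size 1)."""
    totals = {}
    for active, target in positions:
        grads = per_sample_grad(weights, out_weights, active, target)
        for key, g in grads.items():
            totals[key] = totals.get(key, 0.0) + abs(g)
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:best_n]


weights = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # 3 features x 2 outputs
out_weights = [1.0, 0.5]
# Two positions that share feature 0 and have errors of opposite sign.
positions = [([0, 1], 1.0), ([0, 2], -1.0)]

top = weight_importance(weights, out_weights, positions, best_n=3)

# What a batched backward pass would give for weight (0, 0): the signed sum.
batched = sum(per_sample_grad(weights, out_weights, a, t)[(0, 0)] for a, t in positions)
```

In this toy case the signed sum for weight (0, 0) cancels to zero, while the per-position absolute sum ranks that weight first, which is the reason for processing one position at a time.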
A `pos_n` of several tens or hundreds of thousands should be feasible, however. This tool is supposed to help with choosing weights for SPSA tuning.

The produced output can optionally also be saved to a file by using the `--output` option to provide the path to the file. The output format is `{feature_index}\t{output_index}\t{total_grad}`.

Example from a small HalfKAv2_hm-128x2-8-32-1 net: