Add a script for finding the most important [FT] weights in a net. #150

Open · wants to merge 1 commit into master
Conversation

Sopel97 (Member) commented Sep 3, 2021

This script provides a way to find the most important weights (currently only for the feature transformer) in a given network, for a given dataset. Importance is determined by summing the absolute values of the gradients. Because absolute gradient values cannot be accumulated over a batch, this process has to run with a batch size of 1, which makes it relatively slow; a pos_n of several tens or hundreds of thousands should still be feasible, however. This tool is meant to help with choosing weights for SPSA tuning.
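
For intuition, here is a minimal sketch of the accumulation loop, with a toy model and random data standing in for the real network and dataset (an illustration, not the actual weight_importance.py):

# Toy sketch: accumulate |d loss / d W| one position at a time, since
# gradients summed over a batch would let opposite signs cancel out.
import torch

model = torch.nn.Linear(8, 1)  # stand-in for the real network
data = [(torch.randn(1, 8), torch.randn(1, 1)) for _ in range(4)]

total_abs_grad = torch.zeros_like(model.weight)
for x, y in data:  # batch size 1 by necessity
    model.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    total_abs_grad += model.weight.grad.abs()

# the largest entries of total_abs_grad mark the most important weights
print(total_abs_grad.flatten().topk(3))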

usage: weight_importance.py [-h] [--best_n BEST_N] [--best_pct BEST_PCT]
                            [--pos_n POS_N] [--layer LAYER] [--output OUTPUT]
                            [--data DATA] [--features FEATURES]
                            model

Finds weights with the highest importance. Importance is measured by the
absolute value of the gradient.

positional arguments:
  model                Source model (can be .ckpt, .pt or .nnue)

optional arguments:
  -h, --help           show this help message and exit
  --best_n BEST_N      Get only n most important weights
  --best_pct BEST_PCT  Get only weights up to a given percent [0, 1] of the
                       total importance. Whichever of best_n or best_pct is
                       reached faster.
  --pos_n POS_N        The number of positions to evaluate.
  --layer LAYER        The layer to probe. Currently only 'ft' is supported.
  --output OUTPUT      Optional output file.
  --data DATA          path to a .bin or .binpack dataset
  --features FEATURES  The feature set to use. Can be a union of feature
                       blocks (for example P+HalfKP). "^" denotes a factorized
                       block. Currently available feature blocks are: HalfKP,
                       HalfKP^, HalfKA, HalfKA^, HalfKAv2, HalfKAv2^,
                       HalfKAv2_hm, HalfKAv2_hm^

The produced output can optionally be saved to a file by passing a path to the --output option. The output format is {feature_index}\t{output_index}\t{total_grad}.
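
Reading the saved file back is straightforward; a small hypothetical helper (not part of this PR) could look like:

# Parse the tab-separated --output file into
# (feature_index, output_index, total_grad) tuples.
def read_importance(path):
    rows = []
    with open(path) as f:
        for line in f:
            feature_index, output_index, total_grad = line.split('\t')
            rows.append((int(feature_index), int(output_index), float(total_grad)))
    return rows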

Example from a small HalfKAv2_hm-128x2-8-32-1 net:

C:\dev\nnue-pytorch>python weight_importance.py --data=d10_10000.bin --features=HalfKAv2_hm --pos_n=1024 --best_n=32 --output=out.txt nn.nnue
Done 100 out of 1024 evaluations...
Done 200 out of 1024 evaluations...
Done 300 out of 1024 evaluations...
Done 400 out of 1024 evaluations...
Done 500 out of 1024 evaluations...
Done 600 out of 1024 evaluations...
Done 700 out of 1024 evaluations...
Done 800 out of 1024 evaluations...
Done 900 out of 1024 evaluations...
Done 1000 out of 1024 evaluations...
22468   81      22.95361328125
21062   1       19.015615463256836
21833   81      16.794130325317383
22468   1       16.653770446777344
22468   19      16.062719345092773
22468   52      15.507932662963867
22468   37      15.23114013671875
22468   75      15.099000930786133
22468   103     14.928569793701172
21062   52      14.71535873413086
21062   19      14.61031436920166
22215   81      14.525751113891602
22468   72      14.388435363769531
22468   110     14.307860374450684
22468   70      14.287771224975586
21062   37      14.226285934448242
21062   81      14.158844947814941
22468   119     13.891007423400879
22468   88      13.868194580078125
21062   103     13.766294479370117
22468   31      13.745142936706543
22208   81      13.659659385681152
21832   81      13.514847755432129
21062   110     13.416680335998535
22468   8       13.376480102539062
22468   44      13.148722648620605
21062   31      13.105257987976074
22328   81      13.007036209106445
21062   75      12.977574348449707
22468   3       12.970292091369629
21837   81      12.898788452148438
21062   8       12.8389310836792
21838   81      12.642690658569336

SFisGOD commented Sep 6, 2021

@Sopel97 Thanks for this! I'll take a closer look later.

SFisGOD commented Sep 7, 2021

@Sopel97 What's the use of KingBuckets[64]? official-stockfish/Stockfish@d61d385

Sopel97 (Member, Author) commented Sep 7, 2021

@Sopel97 What's the use of KingBuckets[64]? official-stockfish/Stockfish@d61d385

A remnant from a more generic implementation where I was able to assign multiple king squares to one bucket. The king square is guaranteed to be on files e..h by orient, and this lookup table maps the squares on files e..h to 0..31 (due to a small mistake they are in reverse order, that is, e1 has the highest bucket, but that's not important). This could be simplified to simple arithmetic, but I don't think there's a need for it.
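
For reference, a sketch of what that simple arithmetic could look like, based on my reading of the linked table (e1 maps to 31 and h8 to 0, matching the reverse order described above):

# Assumed arithmetic equivalent of KingBuckets[64]; illustration only.
def king_bucket(square):  # square = rank * 8 + file, a1 = 0
    rank, file = divmod(square, 8)
    assert 4 <= file <= 7  # orient guarantees files e..h
    return (7 - rank) * 4 + (7 - file)

assert king_bucket(4) == 31  # e1 gets the highest bucket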

SFisGOD commented Sep 10, 2021

@Sopel97 I'm getting this error when trying to run compile_data_loader.bat

(env) (base) C:\Users\User\nnue-pytorch>compile_data_loader.bat

(env) (base) C:\Users\User\nnue-pytorch>cmake . -Bbuild -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_INSTALL_PREFIX="./"
CMake Error at CMakeLists.txt:3 (project):
Running

'nmake' '-?'

failed with:

The system cannot find the file specified

CMake Error: CMAKE_C_COMPILER not set, after EnableLanguage
CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage
-- Configuring incomplete, errors occurred!
See also "C:/Users/User/nnue-pytorch/build/CMakeFiles/CMakeOutput.log".

(env) (base) C:\Users\User\nnue-pytorch>cmake --build ./build --config RelWithDebInfo --target install
The system cannot find the file specified
CMake Error: Generator: execution of make failed. Make command was: nmake -f Makefile /nologo install &&

(env) (base) C:\Users\User\nnue-pytorch>

SFisGOD commented Sep 10, 2021

cmake_minimum_required(VERSION 3.0)

project(training_data_loader)

I have the training_data_loader file

[screenshot showing the training_data_loader file in the repository directory]

Sopel97 (Member, Author) commented Sep 11, 2021

You can try

cmake . -Bbuild -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_INSTALL_PREFIX="./" -G "MinGW Makefiles"
cmake --build ./build --config RelWithDebInfo --target install

instead, to force MinGW Makefiles. Judging by the log, CMake tries to use NMake for some reason and that fails.

vondele (Member) commented Sep 13, 2021

I think a tool like this is useful. It would be interesting to understand more precisely what 'importance' means in this context. The gradient might not be quite the right quantity to look at; probably something more like the second derivatives (the Hessian matrix, or at least its diagonal).

Sopel97 (Member, Author) commented Sep 13, 2021

Computing the full Hessian with this many parameters might not be feasible, though PyTorch has an autograd function that could achieve it in principle. Computing the diagonal of the Hessian should be trivial: https://stackoverflow.com/a/50375367. I'll revise this later with an option to use the nth [configurable] derivative instead.
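
A toy sketch of that double-backward trick (a stand-in scalar loss, not our network):

# Diagonal of the Hessian via differentiating the gradient again.
import torch

w = torch.randn(4, requires_grad=True)
loss = (w ** 3).sum()  # toy loss standing in for the net's loss

(grad,) = torch.autograd.grad(loss, w, create_graph=True)
hess_diag = torch.zeros_like(w)
for i in range(w.numel()):
    (g2,) = torch.autograd.grad(grad[i], w, retain_graph=True)
    hess_diag[i] = g2[i]  # d^2 loss / d w_i^2

print(hess_diag)  # equals 6*w for this toy loss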

Sopel97 (Member, Author) commented Sep 13, 2021

It appears that the gradients returned by our FT's backward don't have grad_fn defined (and I have no idea what it should be), so we cannot get a second derivative for it. It should, however, work for later layers, if we want to support them in the future.

SFisGOD commented Sep 13, 2021

@Sopel97 I downloaded the wrongNNUE binpack and it works fine now. There's just a deprecation warning.

(env) (base) C:\Users\User\nnue-pytorch>python weight_importance.py --data=wrongNNUE_02_d9.binpack --features=HalfKAv2_hm --pos_n=1024 --best_n=32 --output=out.txt nn-13406b1dcbe0.nnue
Done 100 out of 1024 evaluations...
Done 200 out of 1024 evaluations...
Done 300 out of 1024 evaluations...
Done 400 out of 1024 evaluations...
Done 500 out of 1024 evaluations...
Done 600 out of 1024 evaluations...
Done 700 out of 1024 evaluations...
Done 800 out of 1024 evaluations...
Done 900 out of 1024 evaluations...
Done 1000 out of 1024 evaluations...
C:\Users\User\nnue-pytorch\env\lib\site-packages\torch\_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ..\aten\src\ATen\native\BinaryOps.cpp:467.)
return torch.floor_divide(self, other)
21062 197 24.06382179260254
18254 197 16.326650619506836
17551 197 14.6910400390625
16149 197 14.099198341369629
18957 197 13.728248596191406
21062 389 12.21109390258789
20430 197 11.612868309020996
19660 197 10.220696449279785
18254 389 10.145904541015625
21765 197 10.132981300354004
21062 897 9.354948043823242
20429 197 9.321372985839844
16852 389 8.915192604064941
18254 897 8.53359317779541
16852 197 8.460387229919434
15446 197 7.808490753173828
20534 197 7.556325435638428
19660 897 7.460641384124756
20527 197 7.434072017669678
14044 389 7.373429298400879
16149 389 7.353665351867676
20526 197 7.340466499328613
11236 609 7.326484203338623
18254 719 7.312647819519043
20438 197 7.240901470184326
16149 897 7.228122711181641
20359 197 7.172454357147217
20533 197 7.096660614013672
20431 197 7.050807476043701
11236 888 6.967772483825684
21765 389 6.918582439422607
20439 197 6.909554958343506
17551 897 6.840373516082764

(env) (base) C:\Users\User\nnue-pytorch>

SFisGOD commented Sep 13, 2021

@Sopel97 What are {feature_index} and {output_index}?
So from the results above, is this the most important weight?
int(Stockfish::Eval::NNUE::featureTransformer->psqtWeights[21062]);
int psqtW[21062] = {36043};

Sopel97 (Member, Author) commented Sep 13, 2021

@Sopel97 What are {feature_index} and {output_index}?
So from the results above, is this the most important weight?
int(Stockfish::Eval::NNUE::featureTransformer->psqtWeights[21062]);
int psqtW[21062] = {36043};

The first layer is of shape (32*64*11, 1024). feature_index is the index in the first dimension, output_index the index in the second dimension. The PSQT weights are excluded, as their gradients are not comparable to the rest of the feature transformer.

The most important weight is featureTransformer->weights[21062 * TransformedFeatureDimensions + 197].
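
In other words, a small illustration of the index arithmetic using the numbers from the run above (TransformedFeatureDimensions is 1024 for this net):

# flat index into featureTransformer->weights for the top entry
TransformedFeatureDimensions = 1024
feature_index, output_index = 21062, 197
flat = feature_index * TransformedFeatureDimensions + output_index
print(flat)  # 21567685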

SFisGOD commented Sep 14, 2021

@Sopel97 Thanks. Now it's clear to me.

Sopel97 (Member, Author) commented Sep 14, 2021

@SFisGOD On the PSQT note: have you tried, or considered, training additive pawn PSQT terms (for example), starting from 0? I mean our usual Score for each square, accumulated for each pawn on the board and added to the NNUE result. It could in principle tell us whether the net can learn good PSQT values from just the training data. I was considering that recently, but I have no idea what SPSA parameters would be suitable when starting from 0.
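
Roughly, a toy sketch of what I mean (entirely hypothetical, just to illustrate the shape of the idea):

# 64 additive pawn PSQT values starting at 0; SPSA would tune these.
pawn_psqt = [0] * 64  # one entry per square, a1 = 0

def adjusted_eval(nnue_eval, white_pawn_squares, black_pawn_squares):
    bonus = sum(pawn_psqt[sq] for sq in white_pawn_squares)
    bonus -= sum(pawn_psqt[sq ^ 56] for sq in black_pawn_squares)  # vertical mirror
    return nnue_eval + bonus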

SFisGOD commented Sep 16, 2021

@SFisGOD On the PSQT note: have you tried, or considered, training additive pawn PSQT terms (for example), starting from 0?

I have not tried something like that.
