-
Notifications
You must be signed in to change notification settings - Fork 585
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AVX512 Support #3776
base: master
Are you sure you want to change the base?
AVX512 Support #3776
Conversation
CMake Changes: Added the AVX512F_SUPPORT flag to CMake. Set by checking if system supports AVX512F. If it doesn't it will not build (and therefore run) any of the AVX512 tests Added the target descriptions. Assumed that PKRU support also implies AVX512 support as I could find no information that said otherwise. Added gdb_avx512 test - sets 3 ZMM registers and reads them back in a gdb session of a replay to verify the contents. We place mask registers K0-K7 after ZMM31H internally, because this simplifies the logic that parses the XSAVE area. The target description we provide, takes this into account (and also places them last) so they come after the ZMM registers in the 'g' packet. GDB doesn't care so long as we do what we say in the target desc Changed test harness util.sh to also export TESTNAME variable as we may (I do at least) want to query from the python script what mode we are running in (64-bit/32-bit). See gdb_avx512.py for further info.
I'll look at this in detail this week. The assumption that AVX-512 implies PKRU is not necessarily correct because these two features can be controlled independently by the kernel. So it's possible to have a system with AVX-512 available but PKRU unavailable even if the silicon supports both. I do wonder if we could start pretending that certain things are available when they're not (e.g. for reads returning default register values and for writes returning errors) to avoid a combinatorial explosion in the number of target description XML files we need to ship. We're already up to 20 files. Any thoughts @rocallahan? |
GDB has a way to signal "could not read", which means, we could potentially do something like this: Have a "one description to rule them all" (ish, or just remove the edge case ones like PKRU) kind of thing, and then just have RR splat 'x' (representing "could not read"-value in the g-packet) in all the right places. Sure, it will mean that when reporting registers an architecture doesn't even have, we are doing the extra work of filling (N*regWidth + ...) bytes of 'x' into the g-packet, so that has to be taken into consideration. At least this could be done for the PKRU (and potentially similar CPU features) feature, as it doesn't involve all that much unnecessary data. (the g-packet is the packet a debugger client requests from a gdb server implementation. So PKRU being an 8 byte register, on an architecture that doesn't have it or it isn't enabled would have |
I guess we can assume the presence of AVX & AVX2 too (in the target descriptions), since these are very common features these days anyhow. |
We could dynamically assemble the top-level XML file. I.e. given the base CPU type (aarch64, i386, amd64) and some set of optional features (a subset of |
Great idea. I think GDB does something similar under the hood. I can add this feature to this PR, or in a later PR, if we're all in agreement. What say you @khuey || @rocallahan ? The feature would basically involve:
|
I think we should do it before this PR. |
Ok, can I wrap it up in the same PR or create a separate PR for it? |
Separate PR I think, thanks |
TargetDescription dynamically generates the "master" target description contents (target.xml), so that we only have to manage the individual concrete CPU features (like foo-sse.xml, for instance). TargetDescription will generate the xml contents that includes the used features. Removed combinatorial target.xml files These are now handled by `TargetDescription` instead. This unblocks the rr-debugger#3776 PR (AVX512 support) from getting pulled in, though that PR will need to take this PR into account.
TargetDescription dynamically generates the "master" target description contents (target.xml), so that we only have to manage the individual concrete CPU features (like foo-sse.xml, for instance). TargetDescription will generate the xml contents that includes the used features. Removed combinatorial target.xml files These are now handled by `TargetDescription` instead. This unblocks the rr-debugger#3776 PR (AVX512 support) from getting pulled in, though that PR will need to take this PR into account.
TargetDescription dynamically generates the "master" target description contents (target.xml), so that we only have to manage the individual concrete CPU features (like foo-sse.xml, for instance). TargetDescription will generate the xml contents that includes the used features. Removed combinatorial target.xml files These are now handled by `TargetDescription` instead. This unblocks the rr-debugger#3776 PR (AVX512 support) from getting pulled in, though that PR will need to take this PR into account.
Fixes #2974
This PR introduces AVX512 support in replays.
CMake Changes:
Added the AVX512F_SUPPORT flag to CMake. Set by checking
if system supports AVX512F. If it doesn't it will not build (and therefore run) any of the AVX512 tests
Added the target descriptions. Assumed that PKRU support also implies AVX512 support as I could find no information that said otherwise.
Added gdb_avx512 test - sets 3 ZMM registers and reads them back in a gdb session of a replay to verify the contents.
We place mask registers K0-K7 after ZMM31H internally, because this simplifies the logic that parses the XSAVE area. The target description we provide, takes this into account (and also places them last) so they come after the ZMM registers in the 'g' packet. GDB doesn't care so long as we do what we say in the target desc
Changed test harness util.sh to also export TESTNAME variable as we may (this one test do at least) want to query from the python script what mode we are running in (64-bit/32-bit). See gdb_avx512.py for further info.
Not tested or developed for: