Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propertly Truncate Paired Single Move Instructions #13059

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Geotale
Copy link
Contributor

@Geotale Geotale commented Sep 8, 2024

Matches hardware in rounding paired singles after move operations!!
Move operations consist of operations which only transfer direct bits (including absolute values and merges)
This also accounts for ps_rsqrte because it has a similar quirk
Specifically in hardware they're rounded in accordance to their slot:

  • PS0 rounds only the mantissa in accordance to the set rounding mode
  • PS1 truncates the mantissa

ps_rsqrte also only truncates for PS0 for some reason ^^;
This has all been tested on hardware, along with a few edge case tests to ensure this implementation works

@Geotale Geotale changed the title Paired move truncation Propertly Truncate Paired Single Move Instructions Sep 8, 2024
@Dentomologist
Copy link
Contributor

Can you add a unit test matching the tests you ran on hardware? That would help show what has been tested and prevent any future regressions.

Also, the second commit should be squashed into the first one.

@Geotale
Copy link
Contributor Author

Geotale commented Sep 8, 2024

I can probably commit to the hwtests repo if I can figure out how to get it set up ^^;
I also will definitely squash the commits but I am not confident that I won't have to change things and I'd rather not have to multiple times if possible :D

@Dentomologist
Copy link
Contributor

While it would definitely be good to have that in the hwtests repo, I was primarily thinking about here which gets run on every PR.

@Geotale
Copy link
Contributor Author

Geotale commented Sep 8, 2024

I'm mildly afraid to try that bc this is interpreter-only and would require an entire thing somehow setting that up :,D

@JosJuice
Copy link
Member

JosJuice commented Sep 9, 2024

I would very much like a hwtest. And if we have that, I don't think there's any need to require a unit test on top. Yes, being able to automatically test stuff is nice, but:

  1. This code doesn't get changed often at all, so running the relevant hwtests manually is a surmountable task, and
  2. The time it would take to set up a bunch of unit tests for the interpreter would probably be better spent getting hwtests to run automatically like we run FifoCI, or setting up some other kind of testing system that can run both on console and on Dolphin. Unit tests are inherently less useful than hwtests because we can't run them on hardware.

@Tilka
Copy link
Member

Tilka commented Sep 14, 2024

I've confirmed with your hwtest (dolphin-emu/hwtests#56) plus isync that the ps_merge*/ps_mr/ps_neg/ps_abs/ps_nabs changes match hardware. If you move the changes for those opcodes (and the shared code) into a separate PR, we can get that merged now and leave ps_sel/ps_rsqrte/ps_sum0/ps_sum1 verification for later.

@Geotale
Copy link
Contributor Author

Geotale commented Sep 14, 2024

Oh right I forgot about sum0 and sum1, ty for reminding me ^^ -- I was planning try to figure out what in the world is failing the 3 specific tests with ps_rsqrte first though :O

@Tilka
Copy link
Member

Tilka commented Sep 15, 2024

Based on some limited testing, I think ps_sel doesn't have the NaN/infinity special case.

@Geotale
Copy link
Contributor Author

Geotale commented Sep 24, 2024

I did get the tests fully working, but the changes might be larger and more complicated than initially expected ^^; I'm going to abstract as much away as possible though

@Geotale
Copy link
Contributor Author

Geotale commented Oct 2, 2024

ARM tests are actually failing which I think is really funny ^^; I would really like more information to see what exactly is failing and can try to fix the JIT (or my own code!!) to resolve the issue need be!!

@AdmiralCurtiss
Copy link
Contributor

You can see it in the build log for the failing build:

[ RUN      ] JitArm64.Fres
3ff0000000000000 -> 3ffffe0000000000 == 3fefff0000000000
/Users/administrator/buildbot-worker/osx_m1/pr-osx-universal/build/Source/UnitTests/Core/PowerPC/JitArm64/Fres.cpp:72: Failure
Expected equality of these values:
  expected
    Which is: 4607181319288389632
  actual
    Which is: 4611683819404132352
bff0000000000000 -> bffffe0000000000 == bfefff0000000000
/Users/administrator/buildbot-worker/osx_m1/pr-osx-universal/build/Source/UnitTests/Core/PowerPC/JitArm64/Fres.cpp:72: Failure
Expected equality of these values:
  expected
    Which is: 13830553356143165440
  actual
    Which is: 13835055856258908160
3800000000000000 -> 47dffe0000000000 == 47dfff0000000000
/Users/administrator/buildbot-worker/osx_m1/pr-osx-universal/build/Source/UnitTests/Core/PowerPC/JitArm64/Fres.cpp:72: Failure
Expected equality of these values:
  expected
    Which is: 5179138471964442624
  actual
    Which is: 5179137372452814848
3810000000000000 -> 47dffe0000000000 == 47cfff0000000000
/Users/administrator/buildbot-worker/osx_m1/pr-osx-universal/build/Source/UnitTests/Core/PowerPC/JitArm64/Fres.cpp:72: Failure
Expected equality of these values:
  expected
    Which is: 5174634872337072128
  actual
    Which is: 5179137372452814848
b800000000000000 -> c7dffe0000000000 == c7dfff0000000000
/Users/administrator/buildbot-worker/osx_m1/pr-osx-universal/build/Source/UnitTests/Core/PowerPC/JitArm64/Fres.cpp:72: Failure
Expected equality of these values:
  expected
    Which is: 14402510508819218432
  actual
    Which is: 14402509409307590656
b810000000000000 -> c7dffe0000000000 == c7cfff0000000000
/Users/administrator/buildbot-worker/osx_m1/pr-osx-universal/build/Source/UnitTests/Core/PowerPC/JitArm64/Fres.cpp:72: Failure
Expected equality of these values:
  expected
    Which is: 14398006909191847936
  actual
    Which is: 14402509409307590656
3800123456789abc -> 57f0212700000000 == 47dfdbd8e0000000
/Users/administrator/buildbot-worker/osx_m1/pr-osx-universal/build/Source/UnitTests/Core/PowerPC/JitArm64/Fres.cpp:72: Failure
Expected equality of these values:
  expected
    Which is: 5179099821016875008
  actual
    Which is: 6336601127097729024
3810123456789abc -> 57e0212700000000 == 47cfdbd8e0000000
/Users/administrator/buildbot-worker/osx_m1/pr-osx-universal/build/Source/UnitTests/Core/PowerPC/JitArm64/Fres.cpp:72: Failure
Expected equality of these values:
  expected
    Which is: 5174596221389504512
  actual
    Which is: 6332097527470358528
b800123456789abc -> d7f0212700000000 == c7dfdbd8e0000000
/Users/administrator/buildbot-worker/osx_m1/pr-osx-universal/build/Source/UnitTests/Core/PowerPC/JitArm64/Fres.cpp:72: Failure
Expected equality of these values:
  expected
    Which is: 14402471857871650816
  actual
    Which is: 15559973163952504832
b810123456789abc -> d7e0212700000000 == c7cfdbd8e0000000
/Users/administrator/buildbot-worker/osx_m1/pr-osx-universal/build/Source/UnitTests/Core/PowerPC/JitArm64/Fres.cpp:72: Failure
Expected equality of these values:
  expected
    Which is: 14397968258244280320
  actual
    Which is: 15555469564325134336
47c0000000000000 -> 381ffe0000000000 == 381fff0000000000
/Users/administrator/buildbot-worker/osx_m1/pr-osx-universal/build/Source/UnitTests/Core/PowerPC/JitArm64/Fres.cpp:72: Failure
Expected equality of these values:
  expected
    Which is: 4044231365867077632
  actual
    Which is: 4044230266355449856
47d0000000000000 -> 0000000000000000 == 380fff0000000000
/Users/administrator/buildbot-worker/osx_m1/pr-osx-universal/build/Source/UnitTests/Core/PowerPC/JitArm64/Fres.cpp:72: Failure
Expected equality of these values:
  expected
    Which is: 4039727766239707136
  actual
    Which is: 0
c7c0000000000000 -> b81ffe0000000000 == b81fff0000000000
/Users/administrator/buildbot-worker/osx_m1/pr-osx-universal/build/Source/UnitTests/Core/PowerPC/JitArm64/Fres.cpp:72: Failure
Expected equality of these values:
  expected
    Which is: 13267603402721853440
  actual
    Which is: 13267602303210225664
c7d0000000000000 -> 8000000000000000 == b80fff0000000000
/Users/administrator/buildbot-worker/osx_m1/pr-osx-universal/build/Source/UnitTests/Core/PowerPC/JitArm64/Fres.cpp:72: Failure
Expected equality of these values:
  expected
    Which is: 13263099803094482944
  actual
    Which is: 9223372036854775808
37f0000000000000 -> 47fffe0000000000 == 47efff0000000000
/Users/administrator/buildbot-worker/osx_m1/pr-osx-universal/build/Source/UnitTests/Core/PowerPC/JitArm64/Fres.cpp:72: Failure
Expected equality of these values:
  expected
    Which is: 5183642071591813120
  actual
    Which is: 5188144571707555840
b7f0000000000000 -> c7fffe0000000000 == c7efff0000000000
/Users/administrator/buildbot-worker/osx_m1/pr-osx-universal/build/Source/UnitTests/Core/PowerPC/JitArm64/Fres.cpp:72: Failure
Expected equality of these values:
  expected
    Which is: 14407014108446588928
  actual
    Which is: 14411516608562331648
3ff8000000000000 -> 3feaaa0000000000 == 3fe5550000000000
/Users/administrator/buildbot-worker/osx_m1/pr-osx-universal/build/Source/UnitTests/Core/PowerPC/JitArm64/Fres.cpp:72: Failure
Expected equality of these values:
  expected
    Which is: 4604179652544561152
  actual
    Which is: 4605680485916475392
408f400000000000 -> 3f51280000000000 == 3f50628000000000
/Users/administrator/buildbot-worker/osx_m1/pr-osx-universal/build/Source/UnitTests/Core/PowerPC/JitArm64/Fres.cpp:72: Failure
Expected equality of these values:
  expected
    Which is: 4562254724421648384
  actual
    Which is: 4562471877968134144
c008000000000000 -> bfdaaa0000000000 == bfd5550000000000
/Users/administrator/buildbot-worker/osx_m1/pr-osx-universal/build/Source/UnitTests/Core/PowerPC/JitArm64/Fres.cpp:72: Failure
Expected equality of these values:
  expected
    Which is: 13823048089771966464
  actual
    Which is: 13824548923143880704
[  FAILED  ] JitArm64.Fres (1 ms)

@Geotale
Copy link
Contributor Author

Geotale commented Oct 2, 2024

I can check on hardware quickly, but at least from the 3ff0000000000000 -> 3fefff0000000000 (new result) it seems like this PR is the correct one?
Although I'll also add that some of these are issues with denormals -- I'll update the tested FPSCR if we should just always assume that the NI flag is set ^^

@Geotale
Copy link
Contributor Author

Geotale commented Oct 2, 2024

Thinking about it, actually I do need to modify the JIT if it ends up calling this function!! x86-64 definitely does this at least...

@JosJuice
Copy link
Member

JosJuice commented Oct 2, 2024

Jit64 (but not JitArm64) calls ApproximateReciprocal and ApproximateReciprocalSquareRoot for edge cases, but what's also worth noting is that both JITs read from fres_expected and BaseAndDec.

Note that while the unit test is set up to check that the JIT gives the same results as the FloatUtils functions, that's just the way this test happens to be written - we have no absolute policy that the behavior of the JIT and interpreter must match. But we do try to make the JIT and interpreter match when it's feasible performance-wise. On the other hand, we do have a requirement that the behavior of Jit64 matches the behavior of JitArm64, so we don't break cross-architecture netplay.

I'd be happy to help out with updating the JITs to match your changes to the interpreter, but if you'd like me to take the lead on it, you may have to wait a bit before I'll have time.

@Geotale
Copy link
Contributor Author

Geotale commented Oct 2, 2024

I'm going to try my best as of now for x86-64 at the very least because the outputs of that function seem to match those failing in ARM64 on my machine :o It's so weird too...

@Geotale
Copy link
Contributor Author

Geotale commented Oct 3, 2024

Although the build should pass, note that this PR can't really be submitted yet until it's decided for JITs in the fres instruction whether:
a. x86-64 should have imperfect behavior and inline more for more complex inputs, or
b. Aarch64 should instead use a function call for complex inputs, giving perfect behavior
Specifically the inputs differ for numbers which would give a subnormal output when NI=0 (but never in any other instances)
Also technically a function call isn't necessary for either of them, because both implement the most complex part, and Aarch64 implements everything other than this subnormal case ^^

Matches hardware in rounding paired singles after move operations!!
Move operations consist of operations which only transfer direct bits
This also accounts for ps_rsqrte, because it has a similar quirk
Specifically in hardware they're rounded in accordance to their slot:
- PS0 rounds *only the mantissa* in accordance to the set rounding mode
- PS1 truncates the mantissa
ps_rsqrte also truncates for PS0 for some reason ^^;
This has all been tested on hardware, along with a few edge case tests

Co-Authored-By: JosJuice <[email protected]>
Also changes >= 0 to > 0.0
This technically leads to fewer branches taken ^^;
More importantly it looks/feels nicer to me

Fixes the approximate reciprocal function
- Currently not optimized
- Considering rewrite for cleanliness
Moves PS rounding to FloatUtils
- Done because it's used in more places now
Changes TruncateMantissa to occur on read
- This is to account for reciprocal cases
Adds PS1 getting function for reciprocals
Fixes ps_sum1 edge case with rounding
TODO: Test what ops can set PS1 edge case
- ps_merge is known to be able to
Assume NI Set For Unit Tests

This does *not* match x86-64, which properly handles any weird values using a function call
This should hopefully pass tests though, which is important before fixing that issue

I had forgotten that the JITs would use the same modified base and pair tables ^^;
Also fixes call for complex inputs in x86
This saves an instruction on both x86-64 and ARM64!!
TODO: Due to fixes with interpreter, ARM64 JIT likely doesn't match x86 JIT which calls a fallback on weird inputs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants