-
Notifications
You must be signed in to change notification settings - Fork 172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tests fail on x86 #2691
Comments
This is very curious. The failures are of the type... test/wallet_tests.cpp(245): error: in "wallet_tests/coin_selection_tests": check nValueRet == 1.0105 * COIN has failed [101049999 != 101050000] Which looks like a floating point rounding error. Note That CI tests 32 bit linux for every PR merge and it is passing. See https://github.com/gridcoin-community/Gridcoin-Research/actions/runs/5292472929/jobs/9579392573 for example. |
Can you provide more specifics of the Linux distribution/kernel version/etc you are running? My guess is that it is a version of openSUSE :). |
This is good to know, as I wasn't sure if 32-bit x86 is actually even something that's meant to be supported.
Yes. This is one openSUSE. Specifically, I ran into this with the 32-bit package for openSUSE Tumbleweed. To be fair, that already started a while ago but I only now had the time to look into it. The kernel in that case is When reproducing via Podman, the kernel was As in both cases the kernels are actually 64 bit ones (you can also see that from the build log on the OBS where it states |
So it seems this might be related to the combination of GCC13 and 32bit x86. Based on our discussion I just retried my reproducing example but explicitly selected GCC12 and in that case the tests pass. |
Hmm... do you think this is a compiler bug? |
Might be a compiler bug. But might also be some edge-case that only this specific compiler exposes. A calculation like |
It very well could be an implicit casting issue. 1.0105 as a literal is normally a float. COIN is CAmount which is really int64_t underneath. Let me put casting statements to ensure the casting is being done the way we want it and we can recheck. |
No dice even with explicit casting. I really think this is a compiler problem. |
See jamescowens@32da816 and the log: output_fix_wallet_tests_gcc13_3.log Notice that the changeover in integer arithmetic in that particular spot cleared the error. There are remaining ones. All of these are conversions to/from floating point, and these are properly coded. I think we really must be looking at a compiler issue. |
Thanks for looking into this. I have to admit, there's two hearts within me regarding this issue. As a passionate software engineer, I'd love to get this down to a minimal example, report it upstream, and have the world improved for everybody. But as a packager with limited (time) resources, "just don't build on GCC 13 when on 32-bit x86" is what I'll be going for. How far you want to carry this as the project: just relying on users not running builds for which the tests fail, adding an explicit warning, fixing it within your codebase, or reporting this upstream, is your call. |
One more thing I noticed, it seems that on Fedora 32-bit x86 the build using GCC 13 works fine. Thus, this seems to be specific to either the GCC 13 shipped by SUSE or some build flags used by SUSE but not by Fedora. |
I don't exactly understand why you think the compiler has a bug in this case. In |
@div72, that is actually not where the failure is. It is on the other side of the test equality and stems from add_coin(0.0005 * COIN); at 239. There are also failures like: this: test/gridcoin/researcher_tests.cpp(237): error: in "MiningProject/it_parses_a_project_xml_string": check project.m_rac == 123.45 has failed. An inspection of this is that MiningProject::Parse(const std::string& xml) is not returning a float of exactly 123.45. This function calls bool ParseDouble(const std::string& str, double *out) You can argue all you want that this is implementation specific behavior, and I will counter-argue that this same code has not ever failed across many platforms and compilers. Something is wrong with GCC 13 in 32 bit x86. |
Ok. I just looked up the IEEE 754 spec. For double precision, a number is represented by (−1)^(sign bit) * (1+fraction) * 2^(exponent - bias), where sign bit is 0 for positive and 1 for negative, the fraction is a binary fraction that can have up to 52 bits, and the exponent up to 11 bits, with a bias of 1023. To convert the string 123.45 to an equivalent binary IEEE 754 double... First the sign is positive, so the sign bit is 0. Second, to get the exponent we do floor(ln(123.45) / ln(2)), so the exponent is 6. Third, to get the fraction, we do fraction = 123.45 / 2^6 - 1 = 0.92890625 (exact in decimal) In binary: to 52 bits we have 1110110111001100110011001100110011001100110011001101 Notice there is a repeating cell 0011 that repeats forever. Given that 52 bits in this instance is an approximation the maximum error due to a change of 1 bit at the 52nd place: (1 + 2^(-52)) * 2^6 - 1 * 2^6 = 2^(-52) * 2^6 = 2^(-52) = 2.22E-16. Since this is smaller than the last decimal in original number (123.45) by far, the error induced by truncating the repeating binary does not result in an error in representation, so effectively 123.45 can be represented exactly. I would say this is compiler error. It isn't following the IEEE 754 standard. |
@theMarix are we still seeing this? |
I'll need to check as I had pinned the package build to GCC 12 to avoid not having up-to-date packages on x86. |
I can still reproduce with the latest development branch on current openSUSE Tumbleweed x86. |
This jamescowens@fb06203 should fix it. Here I make use of the augmented Fraction class/Allocation class that I did for PR #2704. |
I'll give it a look as soon as I can. I currently have to deal with some issues that prevent me from spending a lot of time at my workstation. So this might take until the next weekend. |
@jamescowens , Just tested on the latest development branch and on your commit. Both cases still show the error. |
Can you provide the test log output please.Sent from my iPhoneOn Feb 17, 2024, at 11:24 AM, Matthias Bach ***@***.***> wrote:
@jamescowens , Just tested on the latest development branch and on your commit. Both cases still show the error.
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: ***@***.***>
|
@theMarix I just tried the podman script you gave and am getting several errors. I had to add gawk and got farther, but then am getting errors with the bdb53 compile indicating a mixed arch problem. Did you update the Dockerfile script since the earlier runs above? |
@jamescowens , it's weird that you had to update the file. But it is possible that some updates were needed over time. I probably adjusted those on the fly when updating the spec for the actual packaging so I might have forgotten to update it here. Attached the latest I've been using. Further, the following is the complete build and test log from running |
You had definitely made tweaks. The one above worked. :) Here is a grepped version of the test output on the current dev branch. The file which I made changes is not generating any errors. We have errors coming from the magnitude to float conversion which has not failed anywhere else, and quite frankly should not fail. This is definitely a compiler problem. The compiler is out of spec. I have to think carefully about what to do here, because I do not want to weaken the tests simply to make this special case pass. |
Bug Report
Current behavior
When building on x86 (32 bit, not x86_64), the test suit fails with "117 failures are detected in the test module "Gridcoin Test Suite". This happens both on
master
as well as on the currentdevelopment
branch. Running the tests on x86_64 works as expected.Expected behavior
Expected behaviour would be for the test suite to succeed on x86.
Steps to reproduce:
I run the following on an x86 Linux:
This results in the following (shortened) output:
I have attached a copy of the test-suite.log
You can reproduce the issue on an x86_64 host by using the attached Dockerfile and running the build with the x86 arch explicitly selected:
Gridcoin version
master
@5.4.5.0
development
@d614a0fa
Machine specs
The text was updated successfully, but these errors were encountered: