Tests fail on x86 #2691

theMarix · 2023-06-23T18:45:01Z

Bug Report

Current behavior

When building on x86 (32 bit, not x86_64), the test suit fails with "117 failures are detected in the test module "Gridcoin Test Suite". This happens both on master as well as on the current development branch. Running the tests on x86_64 works as expected.

Expected behavior

Expected behaviour would be for the test suite to succeed on x86.

Steps to reproduce:

I run the following on an x86 Linux:

./autogen.sh
./configure --disable-bench --with-miniupnpc --with-qrencode --with-gui=qt5
make -j8 && VERBOSE=1 make check

This results in the following (shortened) output:

============================================================================
Testsuite summary for Gridcoin 5.4.5
============================================================================
# TOTAL: 2
# PASS:  1
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0
============================================================================
See src/test-suite.log
Please report to https://github.com/gridcoin/Gridcoin-Research/issues
============================================================================

I have attached a copy of the test-suite.log

You can reproduce the issue on an x86_64 host by using the attached Dockerfile and running the build with the x86 arch explicitly selected:

podman build --arch 386  -f Dockerfile .

Gridcoin version

master @ 5.4.5.0
development @ d614a0fa

Machine specs

OS: Linux
CPU: x86

The text was updated successfully, but these errors were encountered:

jamescowens · 2023-06-24T22:22:51Z

This is very curious. The failures are of the type...

test/wallet_tests.cpp(245): error: in "wallet_tests/coin_selection_tests": check nValueRet == 1.0105 * COIN has failed [101049999 != 101050000]

Which looks like a floating point rounding error.

Note That CI tests 32 bit linux for every PR merge and it is passing. See https://github.com/gridcoin-community/Gridcoin-Research/actions/runs/5292472929/jobs/9579392573 for example.

jamescowens · 2023-06-24T22:25:08Z

Can you provide more specifics of the Linux distribution/kernel version/etc you are running? My guess is that it is a version of openSUSE :).

theMarix · 2023-06-26T18:27:35Z

` Note That CI tests 32 bit linux for every PR merge and it is passing. See https://github.com/gridcoin-community/Gridcoin-Research/actions/runs/5292472929/jobs/9579392573 for example.

This is good to know, as I wasn't sure if 32-bit x86 is actually even something that's meant to be supported.

Can you provide more specifics of the Linux distribution/kernel version/etc you are running? My guess is that it is a version of openSUSE :).

Yes. This is one openSUSE. Specifically, I ran into this with the 32-bit package for openSUSE Tumbleweed. To be fair, that already started a while ago but I only now had the time to look into it. The kernel in that case is Linux version: 6.3.9-1-default #1 SMP PREEMPT_DYNAMIC Thu Jun 22 03:53:43 UTC 2023 (0df701d). You can see the build log for those builds at https://build.opensuse.org/public/build/home:theMarix/openSUSE_Tumbleweed/i586/gridcoin/_log.

When reproducing via Podman, the kernel was 5.14.21-150500.53-default #1 SMP PREEMPT_DYNAMIC Wed May 10 07:56:26 UTC 2023 (b630043) x86_64 x86_64 x86_64 GNU/Linux which is quite a bit older, as I was running on Leap 15.5 in that case.

As in both cases the kernels are actually 64 bit ones (you can also see that from the build log on the OBS where it states machine type: x86_64) and the 64 bit builds on the same machines are fine, I suspect this is more related to the gcc version or some build flags than to the kernel.

theMarix · 2023-06-26T19:36:28Z

So it seems this might be related to the combination of GCC13 and 32bit x86. Based on our discussion I just retried my reproducing example but explicitly selected GCC12 and in that case the tests pass.

jamescowens · 2023-06-26T19:52:58Z

Hmm... do you think this is a compiler bug?

theMarix · 2023-06-26T19:55:55Z

Might be a compiler bug. But might also be some edge-case that only this specific compiler exposes. A calculation like 1.0105 * COIN might strongly depend on how and in which order types are cast and I am not 100% sure on what the C and C++ specs say in that case.

jamescowens · 2023-06-26T22:24:35Z

It very well could be an implicit casting issue. 1.0105 as a literal is normally a float. COIN is CAmount which is really int64_t underneath. Let me put casting statements to ensure the casting is being done the way we want it and we can recheck.

jamescowens · 2023-06-26T23:40:09Z

No dice even with explicit casting. I really think this is a compiler problem.

jamescowens · 2023-06-27T00:30:46Z

See jamescowens@32da816 and the log:

output_fix_wallet_tests_gcc13_3.log

Notice that the changeover in integer arithmetic in that particular spot cleared the error. There are remaining ones.

All of these are conversions to/from floating point, and these are properly coded.

I think we really must be looking at a compiler issue.

theMarix · 2023-06-27T06:27:01Z

Thanks for looking into this. I have to admit, there's two hearts within me regarding this issue. As a passionate software engineer, I'd love to get this down to a minimal example, report it upstream, and have the world improved for everybody. But as a packager with limited (time) resources, "just don't build on GCC 13 when on 32-bit x86" is what I'll be going for.

How far you want to carry this as the project: just relying on users not running builds for which the tests fail, adding an explicit warning, fixing it within your codebase, or reporting this upstream, is your call.

theMarix · 2023-06-27T08:45:20Z

One more thing I noticed, it seems that on Fedora 32-bit x86 the build using GCC 13 works fine. Thus, this seems to be specific to either the GCC 13 shipped by SUSE or some build flags used by SUSE but not by Fedora.

div72 · 2023-06-27T22:35:44Z

I don't exactly understand why you think the compiler has a bug in this case. In float literal * COIN, the CAmount would be promoted to a double and a floating point multiplication would be done. The fact that 1.0105 * COIN gets optimized to 101050000 during compile time should be implemention-defined behaviour. I feel like our code is wrong here.

jamescowens · 2023-07-04T23:34:50Z

@div72, that is actually not where the failure is. It is on the other side of the test equality and stems from add_coin(0.0005 * COIN); at 239.

There are also failures like: this: test/gridcoin/researcher_tests.cpp(237): error: in "MiningProject/it_parses_a_project_xml_string": check project.m_rac == 123.45 has failed.

An inspection of this is that MiningProject::Parse(const std::string& xml) is not returning a float of exactly 123.45. This function calls bool ParseDouble(const std::string& str, double *out)
on the extracted RAC, which in this case is the string "123.45". I am no expert in IEEE spec and compiler specs, but 123.45 converted to float should not be anything other than 123.45, not 123.499999999...

You can argue all you want that this is implementation specific behavior, and I will counter-argue that this same code has not ever failed across many platforms and compilers.

Something is wrong with GCC 13 in 32 bit x86.

jamescowens · 2023-07-05T00:44:56Z

Ok. I just looked up the IEEE 754 spec.

For double precision, a number is represented by (−1)^(sign bit) * (1+fraction) * 2^(exponent - bias), where sign bit is 0 for positive and 1 for negative, the fraction is a binary fraction that can have up to 52 bits, and the exponent up to 11 bits, with a bias of 1023.

To convert the string 123.45 to an equivalent binary IEEE 754 double...

First the sign is positive, so the sign bit is 0.

Second, to get the exponent we do

floor(ln(123.45) / ln(2)), so the exponent is 6.
Including the bias of 1023, we have 1023 + 6 = 1029 = 10000000101 (binary) for the exponent

Third, to get the fraction, we do

fraction = 123.45 / 2^6 - 1 = 0.92890625 (exact in decimal)

In binary: to 52 bits we have 1110110111001100110011001100110011001100110011001101

Notice there is a repeating cell 0011 that repeats forever.

Given that 52 bits in this instance is an approximation the maximum error due to a change of 1 bit at the 52nd place:

(1 + 2^(-52)) * 2^6 - 1 * 2^6 = 2^(-52) * 2^6 = 2^(-52) = 2.22E-16. Since this is smaller than the last decimal in original number (123.45) by far, the error induced by truncating the repeating binary does not result in an error in representation, so effectively 123.45 can be represented exactly.

I would say this is compiler error. It isn't following the IEEE 754 standard.

jamescowens · 2023-12-03T15:06:05Z

@theMarix are we still seeing this?

theMarix · 2023-12-06T21:02:23Z

I'll need to check as I had pinned the package build to GCC 12 to avoid not having up-to-date packages on x86.

theMarix · 2023-12-07T08:02:43Z

I can still reproduce with the latest development branch on current openSUSE Tumbleweed x86.

jamescowens · 2024-02-05T07:29:39Z

This jamescowens@fb06203 should fix it. Here I make use of the augmented Fraction class/Allocation class that I did for PR #2704.

jamescowens · 2024-02-09T17:45:50Z

@theMarix can you try #2737 and verify it clears the issue?

theMarix · 2024-02-11T16:24:03Z

I'll give it a look as soon as I can. I currently have to deal with some issues that prevent me from spending a lot of time at my workstation. So this might take until the next weekend.

theMarix · 2024-02-17T16:24:02Z

@jamescowens , Just tested on the latest development branch and on your commit. Both cases still show the error.

jamescowens · 2024-02-17T21:34:34Z

Can you provide the test log output please.Sent from my iPhoneOn Feb 17, 2024, at 11:24 AM, Matthias Bach ***@***.***> wrote: @jamescowens , Just tested on the latest development branch and on your commit. Both cases still show the error. —Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: ***@***.***>

jamescowens · 2024-02-18T16:31:40Z

@theMarix I just tried the podman script you gave and am getting several errors. I had to add gawk and got farther, but then am getting errors with the bdb53 compile indicating a mixed arch problem. Did you update the Dockerfile script since the earlier runs above?

theMarix · 2024-02-19T18:51:50Z

@jamescowens , it's weird that you had to update the file. But it is possible that some updates were needed over time. I probably adjusted those on the fly when updating the spec for the actual packaging so I might have forgotten to update it here. Attached the latest I've been using.
Dockerfile.suse.txt

Further, the following is the complete build and test log from running podman build --arch 386 -f Dockerfile.suse.txt . on commit 871aad2:
suse-x86.log

jamescowens · 2024-02-22T15:04:17Z

You had definitely made tweaks. The one above worked. :)

Here is a grepped version of the test output on the current dev branch. The file which I made changes is not generating any errors. We have errors coming from the magnitude to float conversion which has not failed anywhere else, and quite frankly should not fail. This is definitely a compiler problem. The compiler is out of spec. I have to think carefully about what to do here, because I do not want to weaken the tests simply to make this special case pass.

OpenSUSE i386 test errors.log

theMarix added the bug label Jun 23, 2023

jamescowens added this to the Miss Piggy milestone Jun 24, 2023

jamescowens added the build system label Dec 3, 2023

jamescowens self-assigned this Feb 5, 2024

jamescowens mentioned this issue Feb 8, 2024

refactor: Fix compilation warnings #2737

Merged

jamescowens closed this as completed in #2737 Feb 11, 2024

jamescowens reopened this Feb 18, 2024

jamescowens mentioned this issue Feb 18, 2024

util: Improve allocation class #2740

Merged

jamescowens closed this as completed in #2740 Feb 18, 2024

jamescowens reopened this Feb 18, 2024

jamescowens mentioned this issue Feb 22, 2024

build: Implement comp_double comparison function in certain tests #2741

Merged

jamescowens closed this as completed in #2741 Mar 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tests fail on x86 #2691

Tests fail on x86 #2691

theMarix commented Jun 23, 2023

jamescowens commented Jun 24, 2023

jamescowens commented Jun 24, 2023

theMarix commented Jun 26, 2023

theMarix commented Jun 26, 2023

jamescowens commented Jun 26, 2023

theMarix commented Jun 26, 2023

jamescowens commented Jun 26, 2023

jamescowens commented Jun 26, 2023 •

edited

Loading

jamescowens commented Jun 27, 2023

theMarix commented Jun 27, 2023

theMarix commented Jun 27, 2023

div72 commented Jun 27, 2023

jamescowens commented Jul 4, 2023

jamescowens commented Jul 5, 2023 •

edited

Loading

jamescowens commented Dec 3, 2023

theMarix commented Dec 6, 2023

theMarix commented Dec 7, 2023

jamescowens commented Feb 5, 2024 •

edited

Loading

jamescowens commented Feb 9, 2024

theMarix commented Feb 11, 2024

theMarix commented Feb 17, 2024

jamescowens commented Feb 17, 2024 via email

jamescowens commented Feb 18, 2024

theMarix commented Feb 19, 2024

jamescowens commented Feb 22, 2024 •

edited

Loading

Tests fail on x86 #2691

Tests fail on x86 #2691

Comments

theMarix commented Jun 23, 2023

Bug Report

jamescowens commented Jun 24, 2023

jamescowens commented Jun 24, 2023

theMarix commented Jun 26, 2023

theMarix commented Jun 26, 2023

jamescowens commented Jun 26, 2023

theMarix commented Jun 26, 2023

jamescowens commented Jun 26, 2023

jamescowens commented Jun 26, 2023 • edited Loading

jamescowens commented Jun 27, 2023

theMarix commented Jun 27, 2023

theMarix commented Jun 27, 2023

div72 commented Jun 27, 2023

jamescowens commented Jul 4, 2023

jamescowens commented Jul 5, 2023 • edited Loading

jamescowens commented Dec 3, 2023

theMarix commented Dec 6, 2023

theMarix commented Dec 7, 2023

jamescowens commented Feb 5, 2024 • edited Loading

jamescowens commented Feb 9, 2024

theMarix commented Feb 11, 2024

theMarix commented Feb 17, 2024

jamescowens commented Feb 17, 2024 via email

jamescowens commented Feb 18, 2024

theMarix commented Feb 19, 2024

jamescowens commented Feb 22, 2024 • edited Loading

jamescowens commented Jun 26, 2023 •

edited

Loading

jamescowens commented Jul 5, 2023 •

edited

Loading

jamescowens commented Feb 5, 2024 •

edited

Loading

jamescowens commented Feb 22, 2024 •

edited

Loading