
"Wrong" output for double value #273

Closed
josesimoes opened this issue Aug 29, 2024 · 16 comments

@josesimoes

Hey,

I was testing this library to replace the current implementation of printf for .NET nanoFramework.
Everything looks great and all tests pass, except a couple that print the double 1234.567: using the format specifier %0.19G, it is printed as 1234.5669999998062849045.

I've played around a bit and found that if I use up to %0.9G it outputs 1234.567000000 nicely. Above nine digits, the rounding goes wrong.

Is this something that can be easily fixed?

Thanks!

@charlesnicholson
Owner

Heya @josesimoes - I'm not sure it's actually a bug, though I think it probably depends on what your expectation is :)

1234.567 cannot be perfectly represented by a single- or double-precision float. See here: https://lukaskollmer.de/ieee-754-visualizer/

Or here, through https://www.h-schmidt.net/FloatConverter/IEEE754.html

But, the double-precision representation does come very close, as you've seen. nanoprintf rounds to the final requested precision digit, so when you ask for %0.9G, I think what's happening is that it's computing it to the 9th precision digit and the cascading rounding lands you on 1234.567000000. But, when you ask for 19 precision digits, it's taking you past the run of 9's and rounding up to the value at the end of the stream. That doesn't cause a rounding cascade, so you're seeing how precise the double-precision approximation is to the "true" value.

nanoprintf will give you as much precision as you ask for, and then round to the final digit. That's what it's doing here, and I think it's giving you the expected result.

You can see that gcc and clang also do not print 1234.567 exactly with %0.19G, though they give different results than nanoprintf (I'm not sure why, to be honest). https://gcc.godbolt.org/z/WMKTzsqdz

Though, I'll cc @Okarss to spot-check me, if he's around and has a spare minute! :) (He just rewrote the entire floating-point conversion code recently.)

@josesimoes
Author

josesimoes commented Aug 29, 2024

@charlesnicholson wow! Thank you so much for the prompt and detailed reply. 👍🏻 😄

Yep, I understand that this value can't be perfectly represented as a float. That's why we use it in one of the tests 😉
Looking at the output of gcc, it gives 1234.567000000000007, which is OK.
Because of these rounding errors, we ask for a couple of extra digits of precision beyond the required one, so we can drop them and get a "perfect" result.
This only works, of course, if the rounding error is "carried" to the last digit.
If the rounding error accumulates and propagates instead, this workaround no longer works, and that seems to be the case here.

With the current embedded printf implementation that we've been using for several years now, [es-printf](https://github.com/skirridsystems/es-printf), this is not happening.

@charlesnicholson
Owner

Gotcha :) Yeah, let's see if @Okarss chimes in on this one!

@Okarss
Contributor

Okarss commented Aug 29, 2024

Hello, guys!

Basically you both have understood what's happening correctly, and indeed it is not a bug but a deliberate design compromise. To get the smallest code size, the float implementation in nanoprintf sacrifices absolute mathematical correctness and therefore will not print exactly the same values as full implementations. As the README file explains, with a conversion integer type width of N bits the algorithm retains on average N - log2(5) bits of accuracy, and fraction parts with up to that number of bits after the decimal separator are converted perfectly without losing any bits. Translating that to decimal digits, the formula is log10(2^N / 5), which gives 8.9 and 18.6 digits for 32-bit and 64-bit processing respectively.

Currently you are using (most likely the default) 32-bit processing. One thing you can do is try the 64-bit processing by defining this for nanoprintf:

#define NANOPRINTF_CONVERSION_FLOAT_TYPE uint64_t

For this particular test case the number will then be printed perfectly, but note that this just makes the processing more accurate, not perfect. Also note that it will pull in the runtime code for 64-bit integer arithmetic if your other code isn't using it already.
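As a minimal sketch, the define above would typically sit alongside the other configuration macros in the one translation unit that compiles the nanoprintf implementation (the include path and the NANOPRINTF_IMPLEMENTATION single-header convention follow nanoprintf's documented usage; adjust to your project layout):

```c
// Configuration sketch: opt into 64-bit internal float conversion.
// These macros must be defined before the implementation is compiled.
#include <stdint.h>

#define NANOPRINTF_CONVERSION_FLOAT_TYPE uint64_t  // widen the conversion integer
#define NANOPRINTF_IMPLEMENTATION
#include "nanoprintf.h"  // path assumed; see the nanoprintf README
```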

Apart from the README file, one can find additional fine details about the implementation and floating point math in general in #221 and #263.

@josesimoes
Author

OK, I think that will get us there.
Here's the output using uint64_t: 1234.5670000000000072760, which is pretty much in line with the one from GCC.

And yes, this is running on 32-bit systems (although we have a virtual device running on 64 bits).
And again, yes, we're already pulling in the 64-bit integer arithmetic, so no worries there.

Out of curiosity: a build for one of our targets is 322756 bytes with the default and 323044 bytes after adding the uint64_t define, an increase of 288 bytes. Again, pretty acceptable considering that we need it for this to work.

Thank you, guys, for the awesome support and congrats on this fine library! 👍🏻 😄

@josesimoes josesimoes changed the title Wrong output for double value "Wrong" output for double value Aug 30, 2024
@charlesnicholson
Owner

@josesimoes that's great to hear! I'm curious- do you happen to have any size or performance benchmarks that you could share as a part of your work? I'm always eager to hear how nanoprintf performs "in the wild" :)

Also, thanks to @Okarss for supporting :)

@josesimoes
Author

I'm afraid I have a couple more...

Output of 1234567.1210 with format %0.2f returns 1234567.13 instead of the expected 1234567.12 (confirmed with https://gcc.godbolt.org/).

and when testing the max values:

double max_double = DBL_MAX;
printf("%0.19G", max_double);

I get: 0000000000000000ERR

@josesimoes josesimoes reopened this Aug 30, 2024
@charlesnicholson
Owner

ERR means that nanoprintf doesn't have the available bit width to fully represent the value. I vaguely remember @Okarss adding extrema (min + max representable value) tests though? @Okarss can I tag you back in here please?

@Okarss
Contributor

Okarss commented Aug 30, 2024

For the 1234567.121 my guess is that you are compiling for targets where the double type is 32 bits wide and typically the same as the float type. You can quickly check it with sizeof(double). In the 32-bit float format that number is represented as 1234567.125 and therefore gets rounded to 1234567.13. This page can show the actual values of both precision types at the same time. Though for this particular number some implementations could still print 1234567.12 if they use the round-ties-to-even algorithm. Anyway, even for those implementations you can test 1234567.371, which is represented as 1234567.375 and will be rounded to 1234567.38. For what CPU architecture and compilers does this happen? I know that for AVR, MSP430, and RL78 (configurable) double is 32 bits.

DBL_MAX is a very large number, and nanoprintf prints err or ERR when the internal character buffer used for conversion is too small. Read about the NANOPRINTF_CONVERSION_BUFFER_SIZE define in the README file. By default it is 23 bytes, but one can redefine it even to hundreds or thousands of bytes, as long as it doesn't overflow the stack.

The additional zeroes printed before the ERR are indeed a bug. I managed to reproduce and simplify it down to this:

npf_snprintf(..., ..., "%.5g", 1e18);

It outputs 00err, which is incorrect. It does this with the 'g', 'e', and 'a' format specifiers, but not with 'f'. The case of the format specifier doesn't matter. It seems the bug is in the field width/precision formatting code, not the float conversion code. I will look into it during the weekend... ☺️

@josesimoes
Author

OK... As it is now, I have the conversion type set to uint64_t as you suggested and the buffer set to 48. I tried increasing it to 64, but that didn't improve things.

This is running on several architectures: Cortex-M for STM32, all ESP32 series, Silabs GG11, NXP, and a virtual device built with the latest MSVC++.

I'd appreciate your advice or any fix that's required; otherwise this is a blocker for us...

@Okarss
Contributor

Okarss commented Aug 31, 2024

The NANOPRINTF_CONVERSION_FLOAT_TYPE define changes the integer type used internally by nanoprintf for float conversion. The size of the double type, on the other hand, is defined by the compiler toolchain. So those two are different and almost unrelated aspects.

For a 64-bit double, the DBL_MAX value has 309 decimal digits before the decimal point, so NANOPRINTF_CONVERSION_BUFFER_SIZE would need to hold at least those digits plus the decimal point and the fraction part. Note that nanoprintf doesn't implement the 'e', 'g', and 'a' format specifiers; they fall back to the 'f' specifier, which prints the whole number in decimal format. Again, this is explained in the README file. 😉

@charlesnicholson
Owner

charlesnicholson commented Aug 31, 2024

@josesimoes - One of the core value propositions of nanoprintf is that it's small above all else. @Okarss's recent overhaul of the floating point formatting code was very exciting because it dramatically increased the precision + correctness of formatting and made the code smaller! But, npf is by design not meant to have features that are at times desirable but complex and large: e.g. bit-perfect round trips between float and string, perfect rounding and precision, etc.

There are many cases where developers might not care about these floating-point processing goals, are happy with a "good enough" implementation, and just want a full-featured printf that works in tiny places. This is nanoprintf's "sweet spot".

It would be a bummer if npf didn't work for you, but if you need perfect-precision double-formatting then it's possible that it's not the right fit for your needs :(

@josesimoes
Author

Sure! I get that. I was looking for alternative implementations and nanoprintf looked great, and not only because of the name "nano". 😅 Let's see if we can figure it out; if not, that's OK and I'll still keep it in my bookmarks. 😉

@josesimoes
Author

@Okarss thank you for that explanation. Probably laziness on my part, but I admit I haven't read the README closely enough... 😯

I've just increased the buffer to 364, and the output of double max shows as 17976931348623157E+328
(I've copied the changes you made in #274).

@josesimoes
Author

@charlesnicholson as reported above, I've copied the changes from the draft PR #274.
I still get a wrong output for max double, 17976931348623157E+328, when using uint64_t for the floating-point conversion type.
If I remove that and use the large buffer, it outputs the correct value for max double, but it fails the rounding for 1234.567, which is what started this issue.

I wanted to confirm whether these two configurations are really incompatible.
If that's the case then, unfortunately, I have to drop the attempt to migrate to nanoprintf... 😞

@Okarss
Contributor

Okarss commented Oct 5, 2024

The 1234.567 cannot be represented exactly in float format and the actual stored value in 64-bit double type is 1234.5670000000000072759576141834259033203125. Such a value has more than 32 non-zero bits in the mantissa for the fraction part and therefore cannot be processed without losses with 32-bit internal conversion code. The dynamic scaling algorithm will print 1234.56699999980628490447998046875 or the shortened and properly rounded 1234.5669999998062849045 in your case.

The 17976931348623157E+328 is not a proper DBL_MAX value. The actual one for the 64-bit double type is 1.7976931348623157e+308, or 17976931348623157e+292 if the decimal separator is moved 16 places to the right. I guess that was just a math error, not an actual output value. Anyway, the correct values for the dynamic scaling algorithm, depending on the internal conversion width, can be seen there: 8-bit, 16-bit, 32-bit, 64-bit. I showed all four so that it is easier to see the pattern of impact the conversion width has on the result. An interesting fact: the 8-bit version accumulates so much error for such a large value that the result loses a whole decimal digit, so it is approximately 10x less than the mathematically perfect one.

Again, this is all explained in the "Floating-Point" chapter of the README file. It's just a few sentences, but they explain the expected accuracy and the range of values that will be printed mathematically perfectly.

So in short:

  • There is no conflict between conversion width and buffer size settings and those can be combined in any way.
  • The dynamic scaling algorithm prints values perfectly in the range documented, but for values out of that range an error will be introduced in the internal conversion.
  • 1234.5670000000000072759576141834259033203125 can be printed perfectly only with 64-bit internal conversion.
  • DBL_MAX (64-bit) cannot be printed perfectly in any configuration. By the way, the perfect value is:
17976931348623157081452742373170435679807056752584499659891747680315726078002853
87605895586327668781715404589535143824642343213268894641827684675467035375169860
49910576551282076245490090389328944075868508455133942304583236903222948165808559
332123348274797826204144723168738177180919299881250404026184124858368
