Fixed IntegerFormatStyle for values > Int64.max. #262

wadetregaskis · 2023-09-16T06:10:48Z

If I understand the NumberFormatter implementation in the ICU library, this should handle arbitrary-precision numbers, as long as they have a description which returns their POSIX-formatted representation, although that's not in the unit tests because it requires a working arbitrary-precision type, and I assume it's not acceptable to pull in a BigInt package for this purpose, and my efforts to make a quick-and-dirty one for testing purposes led only to tears and threats against Xcode's life.

If I understand the NumberFormatter implementation in the ICU library, this _should_ handle arbitrary-precision numbers, as long as they have a `description` which returns their POSIX-formatted representation, although that's not in the unit tests because it requires a working arbitrary-precision type, and I assume it's not acceptable to pull in a BigInt package for this purpose, and my efforts to make a quick-and-dirty one for testing purposes led only to tears and threats against Xcode's life.

itingliu · 2023-09-18T23:40:52Z

@swift-ci please test

itingliu · 2023-09-19T16:44:14Z

@swift-ci please test

itingliu · 2023-09-20T17:25:10Z

To be precise, ICU requires the string to be a "numeric string" as defined here: https://speleotrove.com/decimal/daconvs.html#refnumsyn

Ideally we would want to implement our own if we are to follow the spec, but realistically I doubt swift's implementation of description will ever change, so it'd probably work fine. Using description however doesn't feel right... I'm curious to hear what others think. @parkera @glessard ?

parkera · 2023-09-20T17:44:43Z

We've certainly been known to change descriptions for various reasons over the years, so I would vote for using a dedicated internal function to be clear what the requirements are.

wadetregaskis · 2023-09-20T18:21:15Z

Ideally we would want to implement our own if we are to follow the spec, but realistically I doubt swift's implementation of description will ever change, so it'd probably work fine. Using description however doesn't feel right... I'm curious to hear what others think. @parkera @glessard ?

It's not just what the built-in integer types do for description, it's 3rd party types too. My reason for doing this, for example, is because I want to use arbitrary-precision integers (via BigInt) while still having usable display in GUIs (because big numbers especially need things like thousands separators).

There are many 'BigInt' packages for Swift, and while I didn't exhaustively check them, I did look at most and they all do render the desired output from their description (because none of them remotely try to produce localised or "nicely" formatted results, they all do the same basic thing of just splatting out the latin decimal digits).

I did initially intend to write a BinaryInteger-to-Unicode-number-string function for this, but deferred it given:

All my examples already produce the right output.
I don't know what BinaryInteger.description intends to do as-is (e.g. maybe it's already defined to output in the desired format? Sounds like no, though, from @parkera's comment).
I wasn't sure how much effort it'd be to write an efficient implementation.
I couldn't conclusively locate the actual format specification, from ICU. They refer vaguely to some other spec document, which doesn't seem to contain a clear number format specification.

If everyone's generally happy with this change otherwise, then I'm happy to dig into this last aspect and address that too. I just didn't want to do that work pre-emptively without knowing if this fix would be welcome to begin with.

itingliu · 2023-09-20T22:43:45Z

Thinking more about this I think it'd be more reasonable if we implement a string following the spec of on our own

The problem with relying on .description is that there's no API contract for what that returns, so the fact that it works is pure coincidence. On the contrary we have all the internal values for BinaryInteger and BinaryFloatingPoint thanks to the protocol requirement, which would allow us to implement such function reliably.

I wasn't sure how much effort it'd be to write an efficient implementation.

I understand this concern, but I think it's totally fine to start with something that works intuitively, and defer non-obvious optimization later if needed. I'm all for avoiding unnecessary premature optimization

I couldn't conclusively locate the actual format specification, from ICU. They refer vaguely to some other spec document, which doesn't seem to contain a clear number format specification.

https://speleotrove.com/decimal/daconvs.html#refnumsyn this is the spec. I agree that it's unclear, but in my opinion it's not much different from how opaque .description is

I just didn't want to do that work pre-emptively without knowing if this fix would be welcome to begin with.

Yes we do welcome changes!

…tStyle`. This patch introduces an explicit `numericStringRepresentation` property for `BinaryInteger`, that can be relied on to produce exactly the decimal string expected by the ICU library rather than relying on `BinaryInteger`'s default `description` (which happens to produce the same format today but isn't guaranteed to in future, and can be overridden by 3rd party implementations of `BinaryInteger`). This uses a non-trivial algorithm, nominally for performance although it has not been benchmarked. The trivial (relatively-speaking) algorithm is to just iteratively divide `self` by 10, which produces one digit of the result per iteration. This is in fact what is done for all values that fit into a natural machine word (`UInt`), as a fast path optimisation. But for `BinaryInteger`s which are larger than a machine word - and especially for those that have unbounded size, such as arbitrary-precision integers, dividing that entire number by 10 may be quite expensive. So a potential optimisation - as used in this patch - is to leverage the fact that every `BinaryInteger` is defined in terms of one or more `UInt`s, by (essentially) processing a whole `UInt` at a time. There's still a divide-by-ten loop involved per UInt, but integer division on a natural machine word is likely much faster than those same divisions on the full arbitrary-precision number.

wadetregaskis · 2023-09-26T04:30:46Z

Sorry for the delay - I installed Sonoma on my Macs last week and it took pretty much the entire week to repair the damage it did. That's still a work in progress, but I can at least get Xcode to launch again, so I've picked this up.

I do have a working implementation of the additional changes, to not rely on description.

Unfortunately I rebased and I'm now blocked by the fact that top of tree doesn't build anymore (even before I reapply my patches). 1,513 errors, things like "'Predicate' is only available in macOS 9999 or newer" and "CircularReference". I'll wait until those are all fixed before resuming my work.

itingliu · 2023-09-26T17:06:17Z

Unfortunately I rebased and I'm now blocked by the fact that top of tree doesn't build anymore (even before I reapply my patches). 1,513 errors, things like "'Predicate' is only available in macOS 9999 or newer" and "CircularReference". I'll wait until those are all fixed before resuming my work.

Thanks for bringing this up to our attention! We're going to resolve this soon

parkera · 2023-09-26T18:31:34Z

Unfortunately I rebased and I'm now blocked by the fact that top of tree doesn't build anymore (even before I reapply my patches). 1,513 errors, things like "'Predicate' is only available in macOS 9999 or newer" and "CircularReference". I'll wait until those are all fixed before resuming my work.

Thanks for bringing this up to our attention! We're going to resolve this soon

#271

…may be faster. It depends on how the compiler treats the literal - e.g. it might already replace that with a simple reference to the `.zero` constant - but this way removes the possibility of an inefficiency (e.g. actually creating a duplicate instance of the zero constant).

wadetregaskis · 2023-09-29T23:37:41Z

Does this project support 32-bit platforms (specifically, where UInt is 32-bit)?

…h standalone test cases. Technically not an issue when they're nested inside test cases, but still potentially confusing to a human. Plus, this way they can be moved up a level without having to be renamed (if later tests would like to use the same helpers).

…l (from `test`), moved it up a level, and changed its `magnitude` parameter to be `UInt`. This way it can be used from additional test cases in future. The parameter type change is helpful for any future tests that want to use this with types that might not be initialisable from the necessary integer literal (e.g. an arbitrary-precision integer which doesn't support ExpressibleByIntegerLiteral from StaticBigInt and is thus limited to Int, yet its word magnitude will still be UInt as required by the BinaryInteger protocol and its word magnitude value will not fit into an Int).

… rather than simple assignment via subscripting, as the latter mysteriously crashes once the buffer gets large enough. I wasn't able to precisely root-cause the crash, but it does seem like it's something to do with ContiguousArray's assignment operator implementation assuming `self` is valid, which of course it isn't (yet) in this case. I didn't explore how `initializeElement(at:to)` does the (figuratively) same assignment without running afoul of this, just observed that it evidently does.

…ded. The previous algorithm worked pretty amazingly well, considering how baseless it was in mathematical correctness (it was inspired by examples found in _many_ arbitrary-precision numerics libraries, suggesting many people before me also cargo-culted the calculation - but at least I *checked* it). It took me a while to find an example case that would actually break it (using a BigInt library) and I wasn't able to find an example that was less than hundreds of digits. I'm sure there's a much smaller value that would expose the problem with the prior code, I just didn't feel it a valuable exercise to go full maths nerd in order to find it.

wadetregaskis · 2023-09-30T01:50:09Z

Getting there. I believe I've address all the correctness issues for positive numbers, but there's at least one remaining bug for negative.

Also, this surely won't work if UInt is 32-bit, but only because I've hard-coded the assumption of 64-bitness into the unit tests (the non-test code should work fine in any bitness - I was careful not to hard-code any assumptions there). I don't have any way (that I know of) to actually test the project in 32-bit mode.

I also still want to optimise the output generation - it's currently using a little over twice as much memory as it has to, due to the two-phase process of first assembling the 'word' blocks in reverse order, then glueing them together. I think I can combine those two (possibly making the code a little simpler overall).

…ant to that invariant.

…MagnitudePerWord on a real BigInt implementation.

This doesn't make the code shorter per-se but it does make it a bit simpler in most respects. It does make the buffer indexing slightly more complicated. But the benefit, of course, is that it no longer allocates twice as much memory as actually needed, and does less work at runtime to assemble the result. Because the algorithm emits the resulting digits in reverse order (least significant to most), the buffers get filled from end to front. Since the estimate of required buffer size is only an upper bound, not a precise value, that means sometimes there's some space at the front of the buffer that ends up unused. Rather than doing a memmove to shunt the contents of the buffer up to its front - which would be pretty expensive for non-trivially-sized numbers - the `numericStringRepresentation` property now simply returns an `ArraySlice` instead of a `ContiguousArray`. The callers don't care - same interface as far as they're concerned - and they quickly dispose of the `ArraySlice` which in turn disposes off the underlying `ContiguousArray` anyway.

wadetregaskis · 2023-09-30T17:47:22Z

@swift-ci please test

wadetregaskis · 2023-09-30T17:49:28Z

Close to finished now. I still need to do a final review and clean up any rough edges or outdated comments.

I added a dependency on attaswift/BigInt in order to unit test against a real arbitrary-precision integer. I kept that to a single patch so it's easy to exclude or revert - though that dependency is only used for testing purposes, I don't know if it's considered kosher to include it.

The unit tests still only work in 64-bit, too. Let me know if that needs to change.

…ned (it was one too high). There _was_ actually a reason for this, originally, I vaguely recall… I just don't know now why. What it was (before this patch) returning was actually the _maximum_ number of decimal digits for values of the given type, which is _not_ what it's actually needed for.

…tension type - now just "UInt" outright. Of course this is equivalent to the prior extension point of "BinaryInteger where Self == UInt". I think I just wrote it that way because of the particular frame of reference I had at the time. Thankfully testing shows that the compiler doesn't require that more obtuse way of writing "UInt" in order to correctly detect and use the specialisation.

… the result directly using pretty trivial maths. This does require doing a power operation, though, so it requires Darwin or equivalent (e.g. Glibc). And since thats now required I removed the previously hard-coded constant for log10(2) in favour of just calling `log10`. Although I'm not sure if the compiler is clever enough to simplify that to the actual value at compile time, to improve runtime performance. 🤔 It is possible to hard-code the guess values for a finite number of bit widths (of which we really only care about 32 & 64), which would avoid the maths and the depending on Darwin etc, but I couldn't find a way to do that which didn't result in the compiler complaining about unreachable code. It only issues warnings in that case, but still, having pointless warnings forever attached to this code is unpalatable.

@iCharlesHu

…nto the methods themselves. The comments in question pertain to implementation details, that probably aren't of interest to mere users of these methods & properties. At @iCharlesHu's request (swiftlang#262 (comment)).

@iCharlesHu

…cy, at @iCharlesHu's request (swiftlang#262 (comment)).

@iCharlesHu

…ndition, at @iCharlesHu's request (swiftlang#262 (comment)). It should never trip as long as `UInt(exactly:)` and the implementation of the concrete type's words property aren't buggy. If either is, then it's likely that the result of this method will be wrong anyway. So the benefit of the check is probably just a faster and clearer failure. But the downside of the precondition check seems quite small - slight increase in code size and a [hopefully very small] runtime cost.

…m the default of internal). It's not used outside the file and is a fairly specific implementation detail of the `numericStringRepresentation` property, so it would be wise to discourage casual use.

…#262. Arguably they should already have been there, to avoid exactly this kind of spurious noise when adding or removing dependencies, but, 🤷‍♂️.

oscbyspro

Hi, I saw this pull request mentioned on the Swift forums and wanted to try it out. While doing so, I noticed that a debug assertion fails for fixed-width integers wider than UInt (see comment). Without this assertion, the rest works with my double-width integer in Numberick.

Sources/FoundationInternationalization/Formatting/Number/BinaryInteger+FormatStyle.swift

This is attempting to work around `bitWidth` not working properly (IMO) for FixedWidthIntegers (they return the bit width of the _type_, not the _actual value at hand_). This is non-trivial to implement because signed numbers. 🙁 Specifically, the use of two's complement (for `BinaryInteger`'s underlying storage, in `words`) - the edge cases of -(every power of two) which fits into a signed type one bit smaller than +(every power of two). So we need to detect powers of two - easy, right? No. We only have a `Numeric` value (for the `magnitude`), which is next to useless because it's _such_ a high-level protocol with very little API. It _might_ be possible to make this work using just the `words`, by basically reinventing all the necessary arithmetic operations on `words`, but it'll be a bit hairy and duplicative of functionality already in the _actual_ `BinaryInteger` we're operating on (but blocked from us by the type system). I'm checking this in for posterity - in case for some reason this turns out to be the only viable path - but I think all this is unnecessary… what we really want is the number of bits required to represent the _magnitude_, because that's what we're actually going to convert to a numeric string…

@glessard

…on(intoEndOfBuffer:)` with an assert, at @glessard's request (swiftlang#262 (comment))." We'd both overlooked the fact that it _can_ be called on zero because it's used for the individual words of the BinaryInteger, any of which can actually be zero. 72cd8f0

…iven `BinaryInteger`. I resisted just deriving it from `words` at the outset because I thought surely that's naive; surely there's a more elegant if not more efficient way. But there isn't, as far as I can tell. And this actually isn't complicated nor all that expensive anyway, it turns out (at least for `BinaryInteger`s which natively store their values in two's complement form, and thus don't have to specially generate `words` on demand). An important aspect of this is that what we care about is _not_ the two's complement form, which is what all the built-ins (e.g. `bitWidth`) nominally return, but that size of the _unsigned_ value that is the _magnitude_. That's what we're actually working on; converting to its constituent decimal digits. So the whole off-by-one problems of two's complement representation, with negative powers of two, etc, are all irrelevant really.

…, to cover fixed-width 128-bit and 256-bit integers. These are conditionalised on the inclusion of Numberick in Package.swift, which is not the default. Like the existing BigInt tests. This proved useful as it did reveal a bug - one I'd actually discovered previously but apparently the fix got lost during Git patch shuffling or something - in `magnitudeBitWidth`, whereby it was forgetting to check if the current word is actually non-zero before treating it as relevant (meaning not just zero padding). This wasn't caught by the BigInt tests because those BigInt / BigUInt are flexible-width and only use as many words as strictly necessary to contain the value.

wadetregaskis · 2023-10-10T23:11:03Z

I added unit tests using Numberick (https://github.com/oscbyspro/Numberick) in the same vein as the ones using BigInt, which found one additional bug. Mostly the code doesn't care about fixed-width vs flexible-width but there is an underlying difference as the former may zero-pad words to the expected length, whereas the latter doesn't have any reason to (but also may have redundant, zero-padding words, I suppose - but attaswift/BigInt doesn't).

I knew what I meant when I wrote it - I got the actual description correct regarding the rounding mode - but I inverted the two letters. T (for truncating, a la `.rounded(.towardZero)`) vs F (for flooring, a la `.rounded(.down)`).

…ap-around is intentional. The comment might seem excessive, but in reviewing this I had already forgotten that I was using the ~0 + 1 == 0 trick, and I tried to replace '&+' with plain '+', which then crashes (thankfully I'd also had the foresight to make a unit test case which checks this exact situation). So, for future readers who aren't even me, a comment might save them some time (or anguish). 🙂

wadetregaskis · 2023-10-13T22:54:58Z

@iCharlesHu & @glessard, I think this is just waiting on your clarifications regarding your earlier comments. Can you take a quick look, please?

wadetregaskis · 2023-10-31T01:59:40Z

So… should I drop this PR? It seems like it's not going anywhere. 😞

I'm not sure how best to put this; I don't want to offend… this was my first attempt to contribute to an Apple open source project, and I had quite some enthusiasm towards some further contributions, but now I think that would be a bad idea.

This is fundamentally a trivially bug fix, for something that was really obviously broken for a long time. It ended up being many days of work for me - which would have been fine, it was educational at least, iff it actually went through. Doing all that work to no actual avail is disheartening.

itingliu

Sorry @wadetregaskis I was absorbed in my own work and this just slipped my mind. This looks good to merge to me. I'm going to kick off another CI run and then I'll merge it

If @iCharlesHu or @glessard has further comments we can always address them in a follow up PR

itingliu · 2023-10-31T15:01:50Z

@swift-ci please test

parkera · 2023-10-31T15:12:15Z

This is weird, but I doubt it's related to this PR:

[889/956] Compiling FoundationEssentials Predicate.swift
Swift/UnsafeRawPointer.swift:421: Fatal error: load from misaligned raw pointer
Current stack trace:

parkera · 2023-10-31T15:13:47Z

Thank you so much @wadetregaskis! I really appreciate this contribution and the detailed work you put into it.

stephentyrone · 2023-10-31T15:13:53Z

Yeah, I don't see anyway for this change to manifest that error.

- [IntegerFormatStyle] I removed the trapping conversion to `Int`. IntegerFormatStyle can format big integers since (swiftlang#262). - [FloatingPointFormatStyle] I removed the rounding conversion to `Double`. `formatted()` now does whatever `format(_:)` does. - [Decimal.FormatStyle] N/A (there were no conversions here).

* Improved integer and floating point `formatted()` methods. - [IntegerFormatStyle] I removed the trapping conversion to `Int`. IntegerFormatStyle can format big integers since (#262). - [FloatingPointFormatStyle] I removed the rounding conversion to `Double`. `formatted()` now does whatever `format(_:)` does. - [Decimal.FormatStyle] N/A (there were no conversions here). * Reenabled "Decimal Tests" in NumberFormatStyleTests.swift. - IntegerFormatStyle big integer tests succeed. - IntegerFormatStyle.Attributed big integer tests fail (output is clamped to `Int64`). * Fixes IntegerFormatStyle.Attributed. - Added a numeric string representation case to ICUNumberFormatter.Value. - IntegerFormatStyle.Attributed now uses the above instead of `Int64(clamping:)`. * Removed conversions to Decimal in each integer format style (#186). BinaryInteger's numeric string representation supersedes Decimal in the following cases: 1. IntegerFormatStyle. 2. integerFormatStyle.Attributed. 3. IntegerFormatStyle.Currency. 4. IntegerFormatStyle.Percent. * Check whether numeric string is zero using Double. The numeric string format permits redundant zeros (like `+00.00`). * Removed `isZero` and `doubleValue` from `ICUNumberFormatter.Value`. Both `isZero` and `doubleValue` were used in `ByteCountFormatStyle`. These values are now taken from `FormatInput` (`Int64`) instead. Removing them from `ICUNumberFormatter.Value` makes it easier to accommodate non-numeric payloads such as strings, which can be used to format arbitrary precision numbers. * Added `_format(_:doubleValue:)` to `ByteCountFormatStyle`. Here's an internal method simliar to the `_format(_:)` method removed earlier because  wants its method back! I removed the first method to accommodate `ICUNumberFormatter.Value` cases that cannot implement `doubleValue`. The new method parameterizes the conversion instead, so you can call it whenever conversions to `Double` are possible.

wadetregaskis added 2 commits September 25, 2023 21:07

Merge branch 'apple:main' into main

fed358b

wadetregaskis added 3 commits September 26, 2023 17:07

Merge branch 'apple:main' into main

778a6d0

Merge branch 'apple:main' into main

dac9328

wadetregaskis added 4 commits September 29, 2023 16:44

wadetregaskis added 4 commits September 30, 2023 08:47

Handle negative remainders correctly, by ignore the sign bit.

b78c9db

Moved the Word == UInt precondition to a location more directly relev…

9efae3d

…ant to that invariant.

Added unit tests for numericStringRepresentation and decimalDigitsAnd…

854b010

…MagnitudePerWord on a real BigInt implementation.

wadetregaskis added 3 commits October 1, 2023 17:14

wadetregaskis added 6 commits October 5, 2023 16:59

Removed the commented-out additions of attaswift/BigInt as a dependen…

9f90a3f

…cy, at @iCharlesHu's request (swiftlang#262 (comment)).

Made numericStringRepresentation(intoEndOfBuffer:) fileprivate (fro…

31c585d

…m the default of internal). It's not used outside the file and is a fairly specific implementation detail of the `numericStringRepresentation` property, so it would be wise to discourage casual use.

Merge remote-tracking branch 'upstream/main'

9a7854b

Removed trailing commas that were originally absent, before swiftlang…

b848a28

…#262. Arguably they should already have been there, to avoid exactly this kind of spurious noise when adding or removing dependencies, but, 🤷‍♂️.

oscbyspro suggested changes Oct 8, 2023

View reviewed changes

Sources/FoundationInternationalization/Formatting/Number/BinaryInteger+FormatStyle.swift Outdated Show resolved Hide resolved

wadetregaskis added 6 commits October 8, 2023 16:06

Merge remote-tracking branch 'upstream/main'

152c4b8

Merge remote-tracking branch 'upstream/main'

eacacc6

wadetregaskis added 2 commits October 10, 2023 16:30

wadetregaskis mentioned this pull request Oct 13, 2023

A word-based integer encoding approach. wadetregaskis/swift-foundation#1

Closed

itingliu approved these changes Oct 31, 2023

View reviewed changes

parkera merged commit 534a60d into swiftlang:main Oct 31, 2023
1 of 2 checks passed

itingliu mentioned this pull request Nov 7, 2023

Inconsistent formatting behavior when BinaryInteger cannot be converted into Int64 #186

Open

oscbyspro mentioned this pull request Nov 14, 2023

A word-based binary integer encoding approach. #306

Merged

oscbyspro mentioned this pull request Dec 13, 2023

Improved big-number formatting #346

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixed IntegerFormatStyle for values > Int64.max. #262

Fixed IntegerFormatStyle for values > Int64.max. #262

wadetregaskis commented Sep 16, 2023

itingliu commented Sep 18, 2023

itingliu commented Sep 19, 2023

itingliu commented Sep 20, 2023

parkera commented Sep 20, 2023

wadetregaskis commented Sep 20, 2023

itingliu commented Sep 20, 2023 •

edited

Loading

wadetregaskis commented Sep 26, 2023

itingliu commented Sep 26, 2023

parkera commented Sep 26, 2023

wadetregaskis commented Sep 29, 2023

wadetregaskis commented Sep 30, 2023

wadetregaskis commented Sep 30, 2023

wadetregaskis commented Sep 30, 2023

oscbyspro left a comment

wadetregaskis commented Oct 10, 2023

wadetregaskis commented Oct 13, 2023

wadetregaskis commented Oct 31, 2023

itingliu left a comment

itingliu commented Oct 31, 2023

parkera commented Oct 31, 2023

parkera commented Oct 31, 2023

stephentyrone commented Oct 31, 2023

Fixed IntegerFormatStyle for values > Int64.max. #262

Fixed IntegerFormatStyle for values > Int64.max. #262

Conversation

wadetregaskis commented Sep 16, 2023

itingliu commented Sep 18, 2023

itingliu commented Sep 19, 2023

itingliu commented Sep 20, 2023

parkera commented Sep 20, 2023

wadetregaskis commented Sep 20, 2023

itingliu commented Sep 20, 2023 • edited Loading

wadetregaskis commented Sep 26, 2023

itingliu commented Sep 26, 2023

parkera commented Sep 26, 2023

wadetregaskis commented Sep 29, 2023

wadetregaskis commented Sep 30, 2023

wadetregaskis commented Sep 30, 2023

wadetregaskis commented Sep 30, 2023

oscbyspro left a comment

Choose a reason for hiding this comment

wadetregaskis commented Oct 10, 2023

wadetregaskis commented Oct 13, 2023

wadetregaskis commented Oct 31, 2023

itingliu left a comment

Choose a reason for hiding this comment

itingliu commented Oct 31, 2023

parkera commented Oct 31, 2023

parkera commented Oct 31, 2023

stephentyrone commented Oct 31, 2023

itingliu commented Sep 20, 2023 •

edited

Loading