Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed IntegerFormatStyle for values > Int64.max. #262

Merged
merged 52 commits into from
Oct 31, 2023

Commits on Sep 16, 2023

  1. Fixed IntegerFormatStyle for values > Int64.max.

    If I understand the NumberFormatter implementation in the ICU library, this _should_ handle arbitrary-precision numbers, as long as they have a `description` which returns their POSIX-formatted representation, although that's not in the unit tests because it requires a working arbitrary-precision  type, and I assume it's not acceptable to pull in a BigInt package for this purpose, and my efforts to make a quick-and-dirty one for testing purposes led only to tears and threats against Xcode's life.
    wadetregaskis committed Sep 16, 2023
    Configuration menu
    Copy the full SHA
    84b6e5a View commit details
    Browse the repository at this point in the history

Commits on Sep 26, 2023

  1. Removed the reliance on BinaryInteger.description for `IntegerForma…

    …tStyle`.
    
    This patch introduces an explicit `numericStringRepresentation` property for `BinaryInteger`, that can be relied on to produce exactly the decimal string expected by the ICU library rather than relying on `BinaryInteger`'s default `description` (which happens to produce the same format today but isn't guaranteed to in future, and can be overridden by 3rd party implementations of `BinaryInteger`).
    
    This uses a non-trivial algorithm, nominally for performance although it has not been benchmarked.  The trivial (relatively-speaking) algorithm is to just iteratively divide `self` by 10, which produces one digit of the result per iteration.  This is in fact what is done for all values that fit into a natural machine word (`UInt`), as a fast path optimisation.  But for `BinaryInteger`s which are larger than a machine word - and especially for those that have unbounded size, such as arbitrary-precision integers, dividing that entire number by 10 may be quite expensive.  So a potential optimisation - as used in this patch - is to leverage the fact that every `BinaryInteger` is defined in terms of one or more `UInt`s, by (essentially) processing a whole `UInt` at a time.  There's still a divide-by-ten loop involved per UInt, but integer division on a natural machine word is likely much faster than those same divisions on the full arbitrary-precision number.
    wadetregaskis committed Sep 26, 2023
    Configuration menu
    Copy the full SHA
    1ded056 View commit details
    Browse the repository at this point in the history
  2. Configuration menu
    Copy the full SHA
    fed358b View commit details
    Browse the repository at this point in the history

Commits on Sep 27, 2023

  1. Configuration menu
    Copy the full SHA
    778a6d0 View commit details
    Browse the repository at this point in the history

Commits on Sep 29, 2023

  1. Configuration menu
    Copy the full SHA
    dac9328 View commit details
    Browse the repository at this point in the history
  2. Use the .zero property of BinaryInteger than a literal 0 as it …

    …may be faster.
    
    It depends on how the compiler treats the literal - e.g. it might already replace that with a simple reference to the `.zero` constant - but this way removes the possibility of an inefficiency (e.g. actually creating a duplicate instance of the zero constant).
    wadetregaskis committed Sep 29, 2023
    Configuration menu
    Copy the full SHA
    cbb3bf2 View commit details
    Browse the repository at this point in the history
  3. Renamed the helpers from test to check to avoid any confusion wit…

    …h standalone test cases.
    
    Technically not an issue when they're nested inside test cases, but still potentially confusing to a human.  Plus, this way they can be moved up a level without having to be renamed (if later tests would like to use the same helpers).
    wadetregaskis committed Sep 29, 2023
    Configuration menu
    Copy the full SHA
    cb4c237 View commit details
    Browse the repository at this point in the history
  4. Renamed the decimalDigitsAndMagnitudePerWord helper to check as wel…

    …l (from `test`), moved it up a level, and changed its `magnitude` parameter to be `UInt`.
    
    This way it can be used from additional test cases in future.
    
    The parameter type change is helpful for any future tests that want to use this with types that might not be initialisable from the necessary integer literal (e.g. an arbitrary-precision integer which doesn't support ExpressibleByIntegerLiteral from StaticBigInt and is thus limited to Int, yet its word magnitude will still be UInt as required by the BinaryInteger protocol and its word magnitude value will not fit into an Int).
    wadetregaskis committed Sep 29, 2023
    Configuration menu
    Copy the full SHA
    4312195 View commit details
    Browse the repository at this point in the history

Commits on Sep 30, 2023

  1. Switched to using UnsafeMutableBufferPointer.initializeElement(at:to)…

    … rather than simple assignment via subscripting, as the latter mysteriously crashes once the buffer gets large enough.
    
    I wasn't able to precisely root-cause the crash, but it does seem like it's something to do with ContiguousArray's assignment operator implementation assuming `self` is valid, which of course it isn't (yet) in this case.  I didn't explore how `initializeElement(at:to)` does the (figuratively) same assignment without running afoul of this, just observed that it evidently does.
    wadetregaskis committed Sep 30, 2023
    Configuration menu
    Copy the full SHA
    f72a106 View commit details
    Browse the repository at this point in the history
  2. Corrected the calculation for the (maximum) number of wordStrings nee…

    …ded.
    
    The previous algorithm worked pretty amazingly well, considering how baseless it was in mathematical correctness (it was inspired by examples found in _many_ arbitrary-precision numerics libraries, suggesting many people before me also cargo-culted the calculation - but at least I *checked* it).
    
    It took me a while to find an example case that would actually break it (using a BigInt library) and I wasn't able to find an example that was less than hundreds of digits.  I'm sure there's a much smaller value that would expose the problem with the prior code, I just didn't feel it a valuable exercise to go full maths nerd in order to find it.
    wadetregaskis committed Sep 30, 2023
    Configuration menu
    Copy the full SHA
    87c866c View commit details
    Browse the repository at this point in the history
  3. Configuration menu
    Copy the full SHA
    b78c9db View commit details
    Browse the repository at this point in the history
  4. Configuration menu
    Copy the full SHA
    9efae3d View commit details
    Browse the repository at this point in the history
  5. Added unit tests for numericStringRepresentation and decimalDigitsAnd…

    …MagnitudePerWord on a real BigInt implementation.
    wadetregaskis committed Sep 30, 2023
    Configuration menu
    Copy the full SHA
    854b010 View commit details
    Browse the repository at this point in the history
  6. Optimised the implementation to use just one buffer to build the output.

    This doesn't make the code shorter per-se but it does make it a bit simpler in most respects.  It does make the buffer indexing slightly more complicated.
    
    But the benefit, of course, is that it no longer allocates twice as much memory as actually needed, and does less work at runtime to assemble the result.
    
    Because the algorithm emits the resulting digits in reverse order (least significant to most), the buffers get filled from end to front.  Since the estimate of required buffer size is only an upper bound, not a precise value, that means sometimes there's some space at the front of the buffer that ends up unused.  Rather than doing a memmove to shunt the contents of the buffer up to its front - which would be pretty expensive for non-trivially-sized numbers - the `numericStringRepresentation` property now simply returns an `ArraySlice` instead of a `ContiguousArray`.  The callers don't care - same interface as far as they're concerned - and they quickly dispose of the `ArraySlice` which in turn disposes off the underlying `ContiguousArray` anyway.
    wadetregaskis committed Sep 30, 2023
    Configuration menu
    Copy the full SHA
    b979c25 View commit details
    Browse the repository at this point in the history

Commits on Oct 2, 2023

  1. Corrected the digits count returned by maximumDecimalDigitsForUnsig…

    …ned (it was one too high).
    
    There _was_ actually a reason for this, originally, I vaguely recall… I just don't know now why.  What it was (before this patch) returning was actually the _maximum_ number of decimal digits for values of the given type, which is _not_ what it's actually needed for.
    wadetregaskis committed Oct 2, 2023
    Configuration menu
    Copy the full SHA
    069a6eb View commit details
    Browse the repository at this point in the history
  2. Simplified the specialisation of numericStringRepresentation re. ex…

    …tension type - now just "UInt" outright.
    
    Of course this is equivalent to the prior extension point of "BinaryInteger where Self == UInt".  I think I just wrote it that way because of the particular frame of reference I had at the time.  Thankfully testing shows that the compiler doesn't require that more obtuse way of writing "UInt" in order to correctly detect and use the specialisation.
    wadetregaskis committed Oct 2, 2023
    Configuration menu
    Copy the full SHA
    e5ae0e4 View commit details
    Browse the repository at this point in the history
  3. Added a fast path to decimalDigitsAndMagnitudePerWord that computes…

    … the result directly using pretty trivial maths.
    
    This does require doing a power operation, though, so it requires Darwin or equivalent (e.g. Glibc).  And since thats now required I removed the previously hard-coded constant for log10(2) in favour of just calling `log10`.  Although I'm not sure if the compiler is clever enough to simplify that to the actual value at compile time, to improve runtime performance. 🤔
    
    It is possible to hard-code the guess values for a finite number of bit widths (of which we really only care about 32 & 64), which would avoid the maths and the depending on Darwin etc, but I couldn't find a way to do that which didn't result in the compiler complaining about unreachable code.  It only issues warnings in that case, but still, having pointless warnings forever attached to this code is unpalatable.
    wadetregaskis committed Oct 2, 2023
    Configuration menu
    Copy the full SHA
    703a6ee View commit details
    Browse the repository at this point in the history
  4. Configuration menu
    Copy the full SHA
    342513f View commit details
    Browse the repository at this point in the history
  5. Removed outdated precondition check, that Words.Element == UInt.

    If that isn't true the code won't compile anyway, as there's no definition of the required helper methods on any type other than UInt.
    wadetregaskis committed Oct 2, 2023
    Configuration menu
    Copy the full SHA
    71d661b View commit details
    Browse the repository at this point in the history
  6. Configuration menu
    Copy the full SHA
    087deac View commit details
    Browse the repository at this point in the history
  7. Removed unnecessary use of Word.max.bitWidth when Word.bitWidth retur…

    …ns the same thing, but probably more efficiently.
    wadetregaskis committed Oct 2, 2023
    Configuration menu
    Copy the full SHA
    c73c7cc View commit details
    Browse the repository at this point in the history
  8. Replaced UnsafeMutableBufferPoint.Index.advance(by:) with direct arit…

    …hmetic, per the directions in that function's definition.
    
    …which say to only use `advance(by:)` for Indexes which are not integers.  I'm a bit surprised by this advice - it then hard-codes the integer index assumption for types that ever used them, like UnsafeMutableBufferPointer - but it does seem fairly safe in this case as fast manipulation of indices in UnsafeMutableBufferPointers is important, and so likely to remain simple integers.
    wadetregaskis committed Oct 2, 2023
    Configuration menu
    Copy the full SHA
    28bb1f5 View commit details
    Browse the repository at this point in the history
  9. Removed an unnecessary temporary variable ('nextWordInsertPoint').

    And I only just realised I'd typoed its name anyway - it should have been `nextWordInsertionPoint`. 🙃
    wadetregaskis committed Oct 2, 2023
    Configuration menu
    Copy the full SHA
    e6f0b18 View commit details
    Browse the repository at this point in the history
  10. Made the numericStringRepresentation unit tests work when UInt is 3…

    …2-bit… in theory.
    
    I don't have any way to build or execute the tests in 32-bit, so I cannot test this.
    wadetregaskis committed Oct 2, 2023
    Configuration menu
    Copy the full SHA
    71e8244 View commit details
    Browse the repository at this point in the history

Commits on Oct 4, 2023

  1. Commented out the import of attaswift/BigInt and put the tests that d…

    …epend on it behind canImport guards.
    
    @itingliu recommended (swiftlang#262 (comment)) not adding this as a dependency (by default) as it's a 3rd party library that could change at any time, in a way that erroneously breaks these unit tests.
    
    But it's still handy to have the tests available if someone wants to go out of their way to try them, as they did prove quite useful in finding bugs during the initial implementation.
    wadetregaskis committed Oct 4, 2023
    Configuration menu
    Copy the full SHA
    0714bfa View commit details
    Browse the repository at this point in the history
  2. Configuration menu
    Copy the full SHA
    45e44e9 View commit details
    Browse the repository at this point in the history
  3. Revised the explanatory comment on why description cannot be used i…

    …nstead, despite having the same output for many types of BinaryIntegers.
    
    While the prior comment was correct that the output of `description` can change at any time, which is also a problem, it's not the root problem.  `description`'s output is fundamentally not a defined format and there's no guarantee (nor requirement) that any particular BinaryInteger-conforming type use any particular format.  It doesn't matter the the built-in types happen to use the desired format, this code needs to work for _all_ BinaryIntegers.
    wadetregaskis committed Oct 4, 2023
    Configuration menu
    Copy the full SHA
    ac4f8f4 View commit details
    Browse the repository at this point in the history
  4. Corrected comment about the UInt specialisation of `numericStringRepr…

    …esentation`.
    
    Technically it doesn't need to exist now, because the part the generic implementation relies on is actually the `numericStringRepresentation(intoEndOfBuffer:)` method.  But the specialisation is still significantly faster, being much simpler, so it's worth keeping even only for that reason.
    wadetregaskis committed Oct 4, 2023
    Configuration menu
    Copy the full SHA
    dbae775 View commit details
    Browse the repository at this point in the history
  5. Removed errant comma.

    No need to write like James T. Kirk.
    wadetregaskis committed Oct 4, 2023
    Configuration menu
    Copy the full SHA
    9c45109 View commit details
    Browse the repository at this point in the history

Commits on Oct 5, 2023

  1. Added rudimentary detection of broken quotientAndRemainder(dividingBy…

    …:) implementations.
    
    As the included comment explains, it'd be entirely understandable for an implementor to use e.g. T-division instead of F-division, since the documentation for quotientAndRemainder(dividingBy:) doesn't directly specify what is expected (it merely hints, obtusely, that it's F-division because that's the only way to satisfy its requirement that the remainder's sign match the original value's sign).  It's easy to catch this with a precondition, and it simplifies the code by removing the need for a second "isPositive" / "isNegative" variable.
    
    In doing that consolidation I also inverted the variable from "positive" to "negative" because arguably using the term "positive" wasn't correct (it included where the value is zero, which technically doesn't have a sign).
    wadetregaskis committed Oct 5, 2023
    Configuration menu
    Copy the full SHA
    37d9207 View commit details
    Browse the repository at this point in the history
  2. Removed use of signum() in favour of just comparing with the value …

    …directly.
    
    I originally assumed (mistakenly) that the return value was some trivial integer type, e.g. `Int`.  But it's actually `Self`, making it potentially fairly expensive to create the result, depending on how the type in question is implemented (e.g. Int512 with a fixed-size words array underneath).  It's impossible to say in general which approach is faster, but it seems like performance will be more predictable by just comparing with the original value (especially since its comparison operator implementations should by rights be pretty optimised anyway given how likely they are to be frequently used).
    wadetregaskis committed Oct 5, 2023
    Configuration menu
    Copy the full SHA
    37932ac View commit details
    Browse the repository at this point in the history
  3. Configuration menu
    Copy the full SHA
    df74622 View commit details
    Browse the repository at this point in the history
  4. Added some comments in decimalDigitsAndMagnitudePerWord delineating…

    … the fast vs slow paths and explaining why the fast path isn't universally applicable.
    wadetregaskis committed Oct 5, 2023
    Configuration menu
    Copy the full SHA
    371275b View commit details
    Browse the repository at this point in the history
  5. Configuration menu
    Copy the full SHA
    50c16c7 View commit details
    Browse the repository at this point in the history
  6. Configuration menu
    Copy the full SHA
    1b0e1f5 View commit details
    Browse the repository at this point in the history
  7. Added explanatory error messages to the asserts.

    These help explain the issue clearly and with relevant details (e.g. ancestor values of the assertion expression itself).
    wadetregaskis committed Oct 5, 2023
    Configuration menu
    Copy the full SHA
    7ef6563 View commit details
    Browse the repository at this point in the history
  8. Configuration menu
    Copy the full SHA
    d20025a View commit details
    Browse the repository at this point in the history
  9. Replaced special-casing of zero in `numericStringRepresentation(intoE…

    …ndOfBuffer:)` with an assert, at @glessard's request (swiftlang#262 (comment)).
    wadetregaskis committed Oct 5, 2023
    Configuration menu
    Copy the full SHA
    72cd8f0 View commit details
    Browse the repository at this point in the history
  10. Moved some comments from the formal method & property documentation i…

    …nto the methods themselves.
    
    The comments in question pertain to implementation details, that probably aren't of interest to mere users of these methods & properties.
    
    At @iCharlesHu's request (swiftlang#262 (comment)).
    wadetregaskis committed Oct 5, 2023
    Configuration menu
    Copy the full SHA
    3413e63 View commit details
    Browse the repository at this point in the history

Commits on Oct 6, 2023

  1. Configuration menu
    Copy the full SHA
    9f90a3f View commit details
    Browse the repository at this point in the history
  2. Changed the "fast path should handle the zero case" assert to a preco…

    …ndition, at @iCharlesHu's request (swiftlang#262 (comment)).
    
    It should never trip as long as `UInt(exactly:)` and the implementation of the concrete type's words property aren't buggy.  If either is, then it's likely that the result of this method will be wrong anyway.  So the benefit of the check is probably just a faster and clearer failure.
    
    But the downside of the precondition check seems quite small - slight increase in code size and a [hopefully very small] runtime cost.
    wadetregaskis committed Oct 6, 2023
    Configuration menu
    Copy the full SHA
    b2e2931 View commit details
    Browse the repository at this point in the history
  3. Made numericStringRepresentation(intoEndOfBuffer:) fileprivate (fro…

    …m the default of internal).
    
    It's not used outside the file and is a fairly specific implementation detail of the `numericStringRepresentation` property, so it would be wise to discourage casual use.
    wadetregaskis committed Oct 6, 2023
    Configuration menu
    Copy the full SHA
    31c585d View commit details
    Browse the repository at this point in the history
  4. Configuration menu
    Copy the full SHA
    9a7854b View commit details
    Browse the repository at this point in the history
  5. Removed trailing commas that were originally absent, before swiftlang…

    …#262.
    
    Arguably they should already have been there, to avoid exactly this kind of spurious noise when adding or removing dependencies, but, 🤷‍♂️.
    wadetregaskis committed Oct 6, 2023
    Configuration menu
    Copy the full SHA
    b848a28 View commit details
    Browse the repository at this point in the history

Commits on Oct 8, 2023

  1. Configuration menu
    Copy the full SHA
    152c4b8 View commit details
    Browse the repository at this point in the history

Commits on Oct 9, 2023

  1. Added a broken implementation of actualBitWidth.

    This is attempting to work around `bitWidth` not working properly (IMO) for FixedWidthIntegers (they return the bit width of the _type_, not the _actual value at hand_).
    
    This is non-trivial to implement because signed numbers. 🙁
    
    Specifically, the use of two's complement (for `BinaryInteger`'s underlying storage, in `words`) - the edge cases of -(every power of two) which fits into a signed type one bit smaller than +(every power of two).  So we need to detect powers of two - easy, right?  No.  We only have a `Numeric` value (for the `magnitude`), which is next to useless because it's _such_ a high-level protocol with very little API.
    
    It _might_ be possible to make this work using just the `words`, by basically reinventing all the necessary arithmetic operations on `words`, but it'll be a bit hairy and duplicative of functionality already in the _actual_ `BinaryInteger` we're operating on (but blocked from us by the type system).
    
    I'm checking this in for posterity - in case for some reason this turns out to be the only viable path - but I think all this is unnecessary… what we really want is the number of bits required to represent the _magnitude_, because that's what we're actually going to convert to a numeric string…
    wadetregaskis committed Oct 9, 2023
    Configuration menu
    Copy the full SHA
    426e11e View commit details
    Browse the repository at this point in the history
  2. Revert "Replaced special-casing of zero in `numericStringRepresentati…

    …on(intoEndOfBuffer:)` with an assert, at @glessard's request (swiftlang#262 (comment))."
    
    We'd both overlooked the fact that it _can_ be called on zero because it's used for the individual words of the BinaryInteger, any of which can actually be zero.
    
    72cd8f0
    wadetregaskis committed Oct 9, 2023
    Configuration menu
    Copy the full SHA
    4826709 View commit details
    Browse the repository at this point in the history

Commits on Oct 10, 2023

  1. Finally found a solution for determining the minimum bit width of a g…

    …iven `BinaryInteger`.
    
    I resisted just deriving it from `words` at the outset because I thought surely that's naive; surely there's a more elegant if not more efficient way.  But there isn't, as far as I can tell.  And this actually isn't complicated nor all that expensive anyway, it turns out (at least for `BinaryInteger`s which natively store their values in two's complement form, and thus don't have to specially generate `words` on demand).
    
    An important aspect of this is that what we care about is _not_ the two's complement form, which is what all the built-ins (e.g. `bitWidth`) nominally return, but that size of the _unsigned_ value that is the _magnitude_.  That's what we're actually working on; converting to its constituent decimal digits.  So the whole off-by-one problems of two's complement representation, with negative powers of two, etc, are all irrelevant really.
    wadetregaskis committed Oct 10, 2023
    Configuration menu
    Copy the full SHA
    bc2499f View commit details
    Browse the repository at this point in the history
  2. Configuration menu
    Copy the full SHA
    eacacc6 View commit details
    Browse the repository at this point in the history
  3. Added tests using the Numberick 3rd party package (if it's available)…

    …, to cover fixed-width 128-bit and 256-bit integers.
    
    These are conditionalised on the inclusion of Numberick in Package.swift, which is not the default.  Like the existing BigInt tests.
    
    This proved useful as it did reveal a bug - one I'd actually discovered previously but apparently the fix got lost during Git patch shuffling or something - in `magnitudeBitWidth`, whereby it was forgetting to check if the current word is actually non-zero before treating it as relevant (meaning not just zero padding).  This wasn't caught by the BigInt tests because those BigInt / BigUInt are flexible-width and only use as many words as strictly necessary to contain the value.
    wadetregaskis committed Oct 10, 2023
    Configuration menu
    Copy the full SHA
    75bb993 View commit details
    Browse the repository at this point in the history
  4. Corrected precondition failure message and comment: T ⇄ F.

    I knew what I meant when I wrote it - I got the actual description correct regarding the rounding mode - but I inverted the two letters.  T (for truncating, a la `.rounded(.towardZero)`) vs F (for flooring, a la `.rounded(.down)`).
    wadetregaskis committed Oct 10, 2023
    Configuration menu
    Copy the full SHA
    efb3f08 View commit details
    Browse the repository at this point in the history
  5. Added a comment on the two's complement of remainder noting that wr…

    …ap-around is intentional.
    
    The comment might seem excessive, but in reviewing this I had already forgotten that I was using the ~0 + 1 == 0 trick, and I tried to replace '&+' with plain '+', which then crashes (thankfully I'd also had the foresight to make a unit test case which checks this exact situation).  So, for future readers who aren't even me, a comment might save them some time (or anguish). 🙂
    wadetregaskis committed Oct 10, 2023
    Configuration menu
    Copy the full SHA
    5b9a221 View commit details
    Browse the repository at this point in the history