Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed IntegerFormatStyle for values > Int64.max. #262

Merged
merged 52 commits into from
Oct 31, 2023
Merged
Show file tree
Hide file tree
Changes from 36 commits
Commits
Show all changes
52 commits
Select commit Hold shift + click to select a range
84b6e5a
Fixed IntegerFormatStyle for values > Int64.max.
wadetregaskis Sep 16, 2023
1ded056
Removed the reliance on `BinaryInteger.description` for `IntegerForma…
wadetregaskis Sep 26, 2023
fed358b
Merge branch 'apple:main' into main
wadetregaskis Sep 26, 2023
778a6d0
Merge branch 'apple:main' into main
wadetregaskis Sep 27, 2023
dac9328
Merge branch 'apple:main' into main
wadetregaskis Sep 29, 2023
cbb3bf2
Use the `.zero` property of `BinaryInteger` than a literal `0` as it …
wadetregaskis Sep 26, 2023
cb4c237
Renamed the helpers from `test` to `check` to avoid any confusion wit…
wadetregaskis Sep 29, 2023
4312195
Renamed the decimalDigitsAndMagnitudePerWord helper to `check` as wel…
wadetregaskis Sep 29, 2023
f72a106
Switched to using UnsafeMutableBufferPointer.initializeElement(at:to)…
wadetregaskis Sep 30, 2023
87c866c
Corrected the calculation for the (maximum) number of wordStrings nee…
wadetregaskis Sep 30, 2023
b78c9db
Handle negative remainders correctly, by ignore the sign bit.
wadetregaskis Sep 30, 2023
9efae3d
Moved the Word == UInt precondition to a location more directly relev…
wadetregaskis Sep 30, 2023
854b010
Added unit tests for numericStringRepresentation and decimalDigitsAnd…
wadetregaskis Sep 30, 2023
b979c25
Optimised the implementation to use just one buffer to build the output.
wadetregaskis Sep 30, 2023
069a6eb
Corrected the `digits` count returned by maximumDecimalDigitsForUnsig…
wadetregaskis Oct 2, 2023
e5ae0e4
Simplified the specialisation of `numericStringRepresentation` re. ex…
wadetregaskis Oct 2, 2023
703a6ee
Added a fast path to `decimalDigitsAndMagnitudePerWord` that computes…
wadetregaskis Oct 2, 2023
342513f
Corrected & refined the wording of various comments.
wadetregaskis Oct 2, 2023
71d661b
Removed outdated precondition check, that Words.Element == UInt.
wadetregaskis Oct 2, 2023
087deac
Added some additional asserts, for rigour.
wadetregaskis Oct 2, 2023
c73c7cc
Removed unnecessary use of Word.max.bitWidth when Word.bitWidth retur…
wadetregaskis Oct 2, 2023
28bb1f5
Replaced UnsafeMutableBufferPoint.Index.advance(by:) with direct arit…
wadetregaskis Oct 2, 2023
e6f0b18
Removed an unnecessary temporary variable ('nextWordInsertPoint').
wadetregaskis Oct 2, 2023
71e8244
Made the `numericStringRepresentation` unit tests work when UInt is 3…
wadetregaskis Oct 2, 2023
0714bfa
Commented out the import of attaswift/BigInt and put the tests that d…
wadetregaskis Oct 4, 2023
45e44e9
Merge remote-tracking branch 'upstream/main'
wadetregaskis Oct 4, 2023
ac4f8f4
Revised the explanatory comment on why `description` cannot be used i…
wadetregaskis Oct 4, 2023
dbae775
Corrected comment about the UInt specialisation of `numericStringRepr…
wadetregaskis Oct 4, 2023
9c45109
Removed errant comma.
wadetregaskis Oct 4, 2023
37d9207
Added rudimentary detection of broken quotientAndRemainder(dividingBy…
wadetregaskis Oct 5, 2023
37932ac
Removed use of `signum()` in favour of just comparing with the value …
wadetregaskis Oct 5, 2023
df74622
Corrected inaccurate wording in the `decimalDigitsAndMagnitudePerWord…
wadetregaskis Oct 5, 2023
371275b
Added some comments in `decimalDigitsAndMagnitudePerWord` delineating…
wadetregaskis Oct 5, 2023
50c16c7
Simplified a variable name, `remainderAsSelf` → `remainder`.
wadetregaskis Oct 5, 2023
1b0e1f5
Updated copyright headers per @glessard's direction (https://github.c…
wadetregaskis Oct 5, 2023
7ef6563
Added explanatory error messages to the asserts.
wadetregaskis Oct 5, 2023
d20025a
Corrected typo in unit test name.
wadetregaskis Oct 5, 2023
72cd8f0
Replaced special-casing of zero in `numericStringRepresentation(intoE…
wadetregaskis Oct 5, 2023
3413e63
Moved some comments from the formal method & property documentation i…
wadetregaskis Oct 5, 2023
9f90a3f
Removed the commented-out additions of attaswift/BigInt as a dependen…
wadetregaskis Oct 6, 2023
b2e2931
Changed the "fast path should handle the zero case" assert to a preco…
wadetregaskis Oct 6, 2023
31c585d
Made `numericStringRepresentation(intoEndOfBuffer:)` fileprivate (fro…
wadetregaskis Oct 6, 2023
9a7854b
Merge remote-tracking branch 'upstream/main'
wadetregaskis Oct 6, 2023
b848a28
Removed trailing commas that were originally absent, before https://g…
wadetregaskis Oct 6, 2023
152c4b8
Merge remote-tracking branch 'upstream/main'
wadetregaskis Oct 8, 2023
426e11e
Added a broken implementation of `actualBitWidth`.
wadetregaskis Oct 9, 2023
4826709
Revert "Replaced special-casing of zero in `numericStringRepresentati…
wadetregaskis Oct 9, 2023
bc2499f
Finally found a solution for determining the minimum bit width of a g…
wadetregaskis Oct 10, 2023
eacacc6
Merge remote-tracking branch 'upstream/main'
wadetregaskis Oct 10, 2023
75bb993
Added tests using the Numberick 3rd party package (if it's available)…
wadetregaskis Oct 10, 2023
efb3f08
Corrected precondition failure message and comment: T ⇄ F.
wadetregaskis Oct 10, 2023
5b9a221
Added a comment on the two's complement of `remainder` noting that wr…
wadetregaskis Oct 10, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 5 additions & 2 deletions Package.swift
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,9 @@ let package = Package(
revision: "0c1de7149a39a9ff82d4db66234dec587b30a3ad"),
.package(
url: "https://github.com/apple/swift-syntax.git",
from: "509.0.0")
from: "509.0.0"),
// .package(url: "https://github.com/attaswift/BigInt.git",
// .upToNextMajor(from: "5.1.0")),
wadetregaskis marked this conversation as resolved.
Show resolved Hide resolved
],
targets: [
// Foundation (umbrella)
Expand Down Expand Up @@ -113,7 +115,8 @@ let package = Package(
package.targets.append(contentsOf: [
.testTarget(name: "FoundationInternationalizationTests", dependencies: [
"TestSupport",
"FoundationInternationalization"
"FoundationInternationalization",
//"BigInt"
wadetregaskis marked this conversation as resolved.
Show resolved Hide resolved
], swiftSettings: availabilityMacros),
])
#endif
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
//
// This source file is part of the Swift.org open source project
//
// Copyright (c) 2020 Apple Inc. and the Swift project authors
// Copyright (c) 2020 - 2023 Apple Inc. and the Swift project authors
// Licensed under Apache License v2.0 with Runtime Library Exception
//
// See https://swift.org/LICENSE.txt for license information
Expand All @@ -14,6 +14,12 @@
import FoundationEssentials
#endif

#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

@available(macOS 12.0, iOS 15.0, tvOS 15.0, watchOS 8.0, *)
extension BinaryInteger {

Expand All @@ -34,6 +40,199 @@ extension BinaryInteger {

}

// MARK: - BinaryInteger + Numeric string representation

extension BinaryInteger {
/// Formats `self` in "Numeric string" format (https://speleotrove.com/decimal/daconvs.html) which is the required input form for certain ICU functions (e.g. `unum_formatDecimal`).
///
/// This produces output that (at time of writing) looks identical to the `description` for many `BinaryInteger` types, such as the built-in integer types. However, the format of `description` is not specifically defined by `BinaryInteger` (or anywhere else, really), and as such cannot be relied upon. Thus this purpose-built method, instead.
///
/// It might be worth moving this into the Swift standard library one day, so that it can be used as the basis for the default `description` instead of duplicating that conversion process. At least while `description`'s output happens to match this one's.
///
/// This property's type is an ArraySlice rather than a straight ContiguousArray because it's computationally more expensive to size the array exactly right (whether by pre-calculating the required size more precisely, or by dynamically resizing it during execution of the iterative algorithm). Since this is intended to only be a transient value, for the purposes of translating from BinaryInteger to the ICU libraries, this is the right compromise of runtime speed (CPU time) and memory efficiency (it's usually only off by a byte or two, if that).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(super nit): these lines feels more like comments and explanations rather than header doc. We should try to keep the header doc concise because it shows up in Xcode's quick help.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, could you break down the comments into multiple lines? I know we don't have an official style guide yet, but these comments will definite exceed that limit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can. What is the line length limit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a line length limit in this project, preferring to let users wrap it to their own preference in their editors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said - I agree that we should keep header doc and implementation comments split up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think this is resolved, then? I didn't hard-wrap anything, but I moved the couple of paragraphs in question into the function body as regular comments.

internal var numericStringRepresentation: ArraySlice<UInt8> {
// Fast-path for values that fit into a UInt, as the conversion to a UInt should be virtually free if it's possible (it's essentially just self.words[0]) and there's a specialisation of this function for UInt that's faster.
if let fastForm = UInt(exactly: self) {
return fastForm.numericStringRepresentation
}

assert(.zero != self, "Value of zero (for self) should have been handled by fast path, but wasn't.") // Zero isn't handled correctly in the algorithm below (no numbers will actually be emitted) because it's more work to do so, which is unnecessary as the fast path above should handle that case.
wadetregaskis marked this conversation as resolved.
Show resolved Hide resolved

// The algorithm here is conceptually fairly simple. In a nutshell, the value of self is broken down into Word-sized chunks, each of which is divided by ten repeatedly until the value dimishes to zero. The remainder of each division is the next digit of the result (starting with the least significant).
//
// A conceptually simpler approach is to skip the first step of breaking things into Word-sized chunks, and just do the division by 10 on the whole value of `self`. The difference is performance - native integer division (for machine-word-sized integers) is essentially O(1), whereas division of arbitrary-precision integers is essentially O(log2(bitWidth)) since it's composed of _multiple_ machine-word-sized divides proportionate to its binary magnitude.
//
// So we replace some of those expensive O(log2(bitWidth)) divides with simpler O(1) divides, by first dividing by the largest multiple of ten such that the remainder fits in a single machine word (UInt), and then using regular integer division CPU instructions to further divide that simple machine-word-sized remainder down into individual digits.

let (decimalDigitsPerWord, wordMagnitude) = Self.decimalDigitsAndMagnitudePerWord()
let negative = 0 > self
let maximumDigits = (Self.maximumDecimalDigitsForUnsigned(bitWidth: self.bitWidth - (negative ? 1 : 0)) // -1 for negative values because their bit width includes the sign bit, which we don't care about.
+ (negative ? 1 : 0)) // Include room for "-" prefix if necessary.
var actualDigits: Int = Int.min // Actually initialised inside the closure below, but the compiler mistakenly demands a default value anyway.

return ContiguousArray<UInt8>(unsafeUninitializedCapacity: maximumDigits) { buffer, initialisedCount in
var tmp = self
var wordInsertionPoint = buffer.endIndex - 1

while .zero != tmp {
let (quotient, remainder) = tmp.quotientAndRemainder(dividingBy: wordMagnitude)
precondition(.zero == remainder || (negative == (0 > remainder)), "Starting value \(tmp) is \(negative ? "negative" : "positive (or zero)") yet the remainder of division by \(wordMagnitude) is not: \(remainder). quotientAndRemainder(dividingBy:) is not implemented correctly for \(type(of: self)) (it might be using T-division instead of F-division).") // It's an entirely understandable error for an implementor to use T-division for their integer quotient and remainder, but they're supposed to use F-division. i.e. the quotient is supposed to be rounded towards zero rather than down (and that effects the modulus correspondingly, since either way the results must satisfy r = d ⨉ (r idiv i) + (r mod i)). F-division is convenient because its remainder is neatly the value of interest to this algorithm, rather than being offset by the divisor if r is negative. While it would be technically possible to assume T-division if the remainder's sign doesn't match, the incorrect implementation of quotientAndRemainder(dividingBy:) will probably still break other algorithms and so we shouldn't encourage it.

// By definition the remainder has to be a single word (since the divisor, `wordMagnitude`, fits in a single word), so we can avoid working on a BinaryInteger generically and just use the first word directly, which is concretely UInt.
assert(remainder.bitWidth - (negative ? 1 : 0) <= Words.Element.bitWidth,
wadetregaskis marked this conversation as resolved.
Show resolved Hide resolved
"The remainder of dividing \(tmp) by \(wordMagnitude), \(remainder), should fit into a single word, yet it does not (its bit width is \(remainder.bitWidth) - \(negative ? 1 : 0) (for the sign bit) which is greater than the \(Words.Element.bitWidth) bits of \(Words.Element.Type.self)).") // When we're working with negative values the reported `bitWidth` will be one greater than that of the magnitude because it counts the sign bit, but we don't care about that sign bit.
var word = remainder.words.first ?? 0

if negative {
// Luckily for us `words` is defined to be in two's complement form, so we can manually flip the sign. This doesn't normally work because two's complement cannot represent the positive version of its most negative value, but we know we won't have that here because it's the remainder from division by `wordMagnitude`, which is always going to be less than UInt.max because `wordMagnitude` itself has to fit into UInt (and the remainder of division is always at least one smaller than the divisor).
word = ~word &+ 1
}

let digitsAdded = word.numericStringRepresentation(intoEndOfBuffer: &buffer[...wordInsertionPoint])

if .zero != quotient { // Not on the last word, so need to fill in leading zeroes etc.
wordInsertionPoint -= decimalDigitsPerWord

let leadingZeroes = decimalDigitsPerWord - digitsAdded
assert(0 <= leadingZeroes, "Negative leading zeroes \(leadingZeroes)! Expected \(decimalDigitsPerWord) digits per word and added \(digitsAdded).")

if 0 < leadingZeroes {
buffer[(wordInsertionPoint + 1)...(wordInsertionPoint + leadingZeroes)].initialize(repeating: UInt8(ascii: "0"))
}
} else {
wordInsertionPoint -= digitsAdded
}

tmp = quotient
}

if negative {
buffer[wordInsertionPoint] = UInt8(ascii: "-")
wordInsertionPoint -= 1
}

actualDigits = wordInsertionPoint.distance(to: buffer.endIndex) - 1

let unusedDigits = maximumDigits - actualDigits
assert(0 <= unusedDigits, "Negative unused digits \(unusedDigits)! Expected at most \(maximumDigits) digit(s) but emitted \(actualDigits).")

if 0 < unusedDigits {
buffer[0..<unusedDigits].initialize(repeating: 0) // The buffer is permitted to be partially uninitialised, but only at the end. So we have to initialise the unused portion at the start, in principle. Technically this probably doesn't matter given we never subsequently read this part of the buffer, but there's no way to express that such that the compiler can ensure it for us.
}

initialisedCount = maximumDigits
}[(maximumDigits - actualDigits)...]
}

/// - Parameter bitWidth: The bit width of interest. Must be zero or positive.
/// - Returns: The maximum number of decimal digits that an unsigned value of the given bit width may contain.
internal static func maximumDecimalDigitsForUnsigned(bitWidth: Int) -> Int {
guard 0 < bitWidth else { return 0 }
guard 1 != bitWidth else { return 1 } // Algorithm below only works for bit widths of _two_ and above.

return Int((Double(bitWidth) * log10(2)).rounded(.up)) + 1 // https://www.exploringbinary.com/number-of-decimal-digits-in-a-binary-integer
}

/// Determines the magnitude (the largest decimal magnitude that fits in Word, e.g. 100 for UInt8) and "maximum" digits per word (e.g. two for UInt8).
///
/// Note that 'maximum' in this case is context-specific to the `numericStringRepresentation` algorithm. It is not necessarily the maximum digits required for _any_ Word, but rather any value of Word type which is less than the maximum decimal magnitude. Typically this is one digit less.
///
/// This cannot be defined statically because it depends on the types of both Self and Word. The compiler can in principle fold this down to the resulting values at compile time - since it knows the concrete types for any given call site - and then just inline those into the caller.
internal // For unit test accessibility, otherwise would be fileprivate.
static func decimalDigitsAndMagnitudePerWord() -> (digits: Int, magnitude: Self) {
// First, a fast-path that works for any type (for `Self`) which can (essentially) represent a UInt (or larger).
let guessDigits = Int(Double(Words.Element.bitWidth) * log10(2))
let guessMagnitude = pow(10, Double(guessDigits))

if let magnitudeAsSelf = Self(exactly: guessMagnitude) {
return (guessDigits, magnitudeAsSelf)
}

// Alas `Self` is smaller than UInt, so fall back to a truly generic - but slower - algorithm to find the results. This is because BinaryIntegers - unlike e.g. FixedWidthIntegers - don't provide APIs for questions like "what is the maximum bit width of Self?".

var count = 1
var value: Words.Element = 1

while true {
var (nextValue, overflowed) = value.multipliedReportingOverflow(by: 10)

// Words.Element might be wider than the actual type, e.g. if Self is UInt8 (and UInt is not). The magnitude is limited by the smallest of the two.
if !overflowed && nil == Self(exactly: nextValue) {
overflowed = true
}

if !overflowed || .zero == nextValue {
count += 1
}

if overflowed {
return (count - 1, Self(value))
}

value = nextValue
}
}
}

extension UInt {
/// Formats `self` in "Numeric string" format (https://speleotrove.com/decimal/daconvs.html) which is the required input form for certain ICU functions (e.g. `unum_formatDecimal`).
///
/// This is intended to be used only by the `numericStringRepresentation` property (both the specialised form below and the generic one for all BinaryIntegers, above). Prefer the `numericStringRepresentation` property for all other use-cases.
///
/// - Parameter intoEndOfBuffer: The buffer to write into, which _must_ contain enough space for the result. The formatted output is placed into the _end_ of this buffer ("right-aligned", if you will), though the output is numerically still left-to-right. The contents of this buffer do not have to be pre-initialised.
/// - Returns: How many entries (UInt8s) of the buffer were used. Note that zero is a valid return value, as nothing is written to the buffer if `self` is zero (this may be odd but it's acceptable to `numericStringRepresentation` and it simplifies the overall implementation).
func numericStringRepresentation(intoEndOfBuffer buffer: inout Slice<UnsafeMutableBufferPointer<UInt8>>) -> Int {
wadetregaskis marked this conversation as resolved.
Show resolved Hide resolved
guard .zero != self else { return 0 } // Easier to special-case this here than deal with it below (annoying off-by-one potential errors).
wadetregaskis marked this conversation as resolved.
Show resolved Hide resolved

var insertionPoint = buffer.endIndex - 1
var tmp = self

// Keep dividing by ten until the value disappears. Each time we divide, we get one more digit for the output as the remainder of the division. Since with this approach digits "pop off" from least significant to most, the output buffer is filled in reverse.
while .zero != tmp {
let (quotient, remainder) = tmp.quotientAndRemainder(dividingBy: 10)

buffer[insertionPoint] = UInt8(ascii: "0") + UInt8(remainder)

if .zero != quotient {
insertionPoint -= 1
assert(insertionPoint >= buffer.startIndex, "Buffer is too small (\(buffer.count) UInt8s) to contain the result.")
}

tmp = quotient
}

return insertionPoint.distance(to: buffer.endIndex)
}

/// Formats `self` in "Numeric string" format (https://speleotrove.com/decimal/daconvs.html) which is the required input form for certain ICU functions (e.g. `unum_formatDecimal`).
///
/// This specialisation (for UInt) is faster than the generic BinaryIntegers implementation (earlier in this file). It is used as an opportunistic fast-path in the generic implementation, for any values that happen to fit into a UInt.
///
/// This property's type is an ArraySlice rather than a straight ContiguousArray because it's computationally more expensive to size the array exactly right (whether by pre-calculating the required size more precisely, or by dynamically resizing it during execution of the iterative algorithm). Since this is intended to only be a transient value, for the purposes of translating from BinaryInteger to the ICU libraries, this is the right compromise of runtime speed (CPU time) and memory efficiency (usually just one excess byte, if any).
internal var numericStringRepresentation: ArraySlice<UInt8> {
// It's easier to just special-case zero than handle it in the main algorithm.
guard .zero != self else {
return [UInt8(ascii: "0")]
}

let maximumDigits = Self.maximumDecimalDigitsForUnsigned(bitWidth: self.bitWidth)
var actualDigits: Int = Int.min // Actually initialised inside the closure below, but the compiler mistakenly demands a default value anyway.

return ContiguousArray(unsafeUninitializedCapacity: maximumDigits) { buffer, initialisedCount in
actualDigits = numericStringRepresentation(intoEndOfBuffer: &buffer[...])

let unusedDigits = maximumDigits - actualDigits
assert(0 <= unusedDigits, "Negative unused digits \(unusedDigits)! Expected at most \(maximumDigits) digit(s) but emitted \(actualDigits).")

if 0 < unusedDigits {
buffer[0..<unusedDigits].initialize(repeating: 0) // The buffer is permitted to be partially uninitialised, but only at the end. So we have to initialise the unused portion at the start, in principle. Technically this probably doesn't matter given we never subsequently read this part of the buffer, but there's no way to express that such that the compiler can ensure it for us.
}

initialisedCount = maximumDigits
}[(maximumDigits - actualDigits)...]
}
}

// MARK: - BinaryInteger + Parsing

@available(macOS 12.0, iOS 15.0, tvOS 15.0, watchOS 8.0, *)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
//
// This source file is part of the Swift.org open source project
//
// Copyright (c) 2020 Apple Inc. and the Swift project authors
// Copyright (c) 2020 - 2023 Apple Inc. and the Swift project authors
// Licensed under Apache License v2.0 with Runtime Library Exception
//
// See https://swift.org/LICENSE.txt for license information
Expand Down Expand Up @@ -172,6 +172,10 @@ internal class ICUNumberFormatterBase {
try? FormatResult(formatter: uformatter, value: v).string
}

func format(_ v: ArraySlice<UInt8>) -> String? {
try? FormatResult(formatter: uformatter, value: v).string
}

// MARK: -

class FormatResult {
Expand Down Expand Up @@ -209,6 +213,22 @@ internal class ICUNumberFormatterBase {
try status.checkSuccess()
}

init(formatter: OpaquePointer, value: ArraySlice<UInt8>) throws {
var status = U_ZERO_ERROR
result = unumf_openResult(&status)
try status.checkSuccess()

value.withUnsafeBufferPointer {
unumf_formatDecimal(formatter,
$0.baseAddress,
Int32($0.count),
result,
&status)
}

try status.checkSuccess()
}

deinit {
unumf_closeResult(result)
}
Expand Down
Loading