Add weejson-jsoniter-scala #105
base: v1
val cs = new String(in.readRawValAsBytes(), StandardCharsets.US_ASCII)

/**
 * This regex performs rather badly, but gets the tests passing.
 *
 * We're looking for a value we can pass through the Visitor interface--
 * either a primitive Double or a CharSequence representing a *valid*
 * number conforming to https://datatracker.ietf.org/doc/html/rfc7159#page-6.
 *
 * `in.readRawValAsBytes()` does NOT do that validation. It will happily
 * return a String of "------".
 *
 * `in.readBigDecimal(null).toString` is tempting, but will not provide the raw input.
 * Instead, it transforms the input from "0.00000001" to "1.0E-8".
 * This fails roundtrip tests.
 *
 * I tried combining the two approaches, `in.readBigDecimal(null)` for validation,
 * then `in.rollbackToMark()` + `in.readRawValAsBytes()` to capture the raw input,
 * but for a value like "1.0-----", `in.readBigDecimal(null)` will read "1.0",
 * then `in.readRawValAsBytes()` will return the whole string, including the unwanted
 * trailing hyphens.
 *
 */
require(ValidJsonNum.pattern.matcher(cs).matches(), "invalid number")
v.visitFloat64String(cs)
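For readers without the rest of the diff: the definition of `ValidJsonNum` is not shown in this hunk. The following is a minimal, stdlib-only sketch of what such a check could look like, assuming it follows the RFC 7159 number grammar. The names `JsonNumRegexSketch`, `validJsonNum`, and `roundTripsViaBigDecimal` are hypothetical, and `java.math.BigDecimal` stands in for whatever `in.readBigDecimal(null)` returns. It also illustrates why re-serializing a parsed decimal does not preserve the raw input.

```scala
import scala.util.matching.Regex

object JsonNumRegexSketch {
  // Hypothetical stand-in for the PR's ValidJsonNum (its real definition is not shown here).
  // Grammar assumed from RFC 7159: [ minus ] int [ frac ] [ exp ]
  val validJsonNum: Regex =
    """-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?""".r

  /** True only if the whole input is a single well-formed JSON number. */
  def isValid(cs: CharSequence): Boolean =
    validJsonNum.pattern.matcher(cs).matches()

  /** True if parsing to BigDecimal and printing it again reproduces the raw text. */
  def roundTripsViaBigDecimal(raw: String): Boolean =
    new java.math.BigDecimal(raw).toString == raw
}
```

Under these assumptions, `isValid` rejects both `"------"` and `"1.0-----"`, while `roundTripsViaBigDecimal("0.00000001")` is false because `BigDecimal` prints small values in scientific notation.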
@plokhotnyuk, I seem to be stuck here. Do you have any suggestions to achieve better performance than my hacky regex given the failed attempts in the comment above? Specifically, is there anything in `JsonReader` that can help me that I might have overlooked?
@htmldoug You can use counters to count the number of valid chars, as in the following code snippet (BEWARE: it is not tested):
private def parseNumber[J](
  in: JsonReader,
  v: Visitor[_, J]
): J = {
  in.setMark()
  var b = in.nextByte()
  var digits, index = 0
  var decIndex, expIndex = -1
  if (b == '-') {
    b = in.nextByte()
    index += 1
  }
  try {
    digits -= index
    while (b >= '0' && b <= '9') {
      b = in.nextByte()
      index += 1
    }
    digits += index
    if (b == '.') {
      decIndex = index
      b = in.nextByte()
      index += 1
    }
    digits -= index
    while (b >= '0' && b <= '9') {
      b = in.nextByte()
      index += 1
    }
    digits += index
    if ((b | 0x20) == 'e') {
      expIndex = index
      b = in.nextByte()
      index += 1
      if (b == '-' || b == '+') {
        b = in.nextByte()
        index += 1
      }
      while (b >= '0' && b <= '9') {
        b = in.nextByte()
        index += 1
      }
    }
  } catch {
    case _: JsonReaderException => // ignore the end of input error for now
  } finally in.rollbackToMark()
  if ((decIndex & expIndex) == -1) {
    if (digits < 19) v.visitInt64(in.readLong())
    else {
      val x = in.readBigInt(null)
      if (x.bitLength < 64) v.visitInt64(x.longValue)
      else v.visitFloat64StringParts(x.toString, -1, -1)
    }
  } else {
    val cs = new String(in.readRawValAsBytes(), StandardCharsets.US_ASCII)
    require(cs.length - 1 == index, "invalid number")
    v.visitFloat64String(cs)
  }
}
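The counting idea in the snippet above can be tried in isolation, without a `JsonReader`. What follows is only a sketch over a plain byte array, assuming ASCII input and the RFC 7159 number grammar. It is not the code from this PR, and `NumberPrefixSketch`, `validPrefixLength`, and `isValidNumber` are hypothetical names. It walks the longest well-formed number prefix and compares the walked length with the raw length.

```scala
import java.nio.charset.StandardCharsets.US_ASCII

object NumberPrefixSketch {
  /** Length of the longest prefix of `bytes` that is a complete JSON number, or 0 if none. */
  def validPrefixLength(bytes: Array[Byte]): Int = {
    val n = bytes.length
    def isDigit(k: Int): Boolean = k < n && bytes(k) >= '0' && bytes(k) <= '9'
    // Index just past the run of digits starting at k.
    def digitsFrom(k: Int): Int = { var j = k; while (isDigit(j)) j += 1; j }

    val start = if (n > 0 && bytes(0) == '-') 1 else 0
    if (!isDigit(start)) 0
    else {
      // Integer part: a lone "0" or a run of digits not starting with "0".
      val intEnd = if (bytes(start) == '0') start + 1 else digitsFrom(start)
      // Optional fraction: "." must be followed by at least one digit.
      val fracEnd =
        if (intEnd < n && bytes(intEnd) == '.' && isDigit(intEnd + 1)) digitsFrom(intEnd + 1)
        else intEnd
      // Optional exponent: "e"/"E", optional sign, at least one digit.
      if (fracEnd < n && (bytes(fracEnd) | 0x20) == 'e') {
        val hasSign = fracEnd + 1 < n && (bytes(fracEnd + 1) == '+' || bytes(fracEnd + 1) == '-')
        val signEnd = if (hasSign) fracEnd + 2 else fracEnd + 1
        if (isDigit(signEnd)) digitsFrom(signEnd) else fracEnd
      } else fracEnd
    }
  }

  /** Valid only when the walked length covers the entire raw value. */
  def isValidNumber(raw: String): Boolean = {
    val bytes = raw.getBytes(US_ASCII)
    bytes.nonEmpty && validPrefixLength(bytes) == bytes.length
  }
}
```

For `"1.0-----"` the walk stops after `"1.0"`, so the walked length (3) differs from the raw length (8) and the value is rejected, which is the same comparison the `require` in the snippet above relies on.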
Ooh, comparing the walked length vs the `string.length` is smart. I'll run with this implementation tomorrow. Thank you, @plokhotnyuk!
I tried something similar previously, but it wasn't giving me the throughput I expected. I just realized that most of the time is spent in `endOfInputError` > `appendHexDump`. `"3.14 "` parses 3x faster than `"3.14"`! I wonder if something similar could also be happening in `BigDecimalReading`, `BigIntReading`, etc.
@htmldoug You can parse an array of numbers to make your benchmark more realistic. I bet nobody uses JSON parsers to read a single number.
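As a rough illustration of that suggestion, a benchmark input could be built by joining pre-serialized numbers into one JSON array. This is a sketch only. `BenchInputSketch`, `numbersArrayJson`, and the sample values are hypothetical and not taken from this PR's benchmarks.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object BenchInputSketch {
  /** Joins already-serialized JSON numbers into a single JSON array document. */
  def numbersArrayJson(numbers: Seq[String]): Array[Byte] =
    numbers.mkString("[", ",", "]").getBytes(UTF_8)

  // Illustrative values only; a real benchmark would use a larger, representative set.
  val sample: Array[Byte] =
    numbersArrayJson(Seq("3.14", "0.00000001", "-1.5e+10", "42"))
}
```

In such an input every number is followed by `,` or `]`, so the number walk ends on a delimiter rather than at the end of the input.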
Summary
Adds basic integration for the very fast jsoniter-scala parser.
Benchmarks
Optional Jar
Based on performance alone, I'd love to have this parser as weepickle's default UTF-8 implementation, but:
The README suggests as a feature:
We may want to go this route in the future, but for now, I'm adding the integration as an optional jar primarily for use by non-libraries at the absolute root of the dependency tree.
TODO