Jsonal is a highly opinionated JSON parser, printer, and Abstract Syntax Tree (AST).
It is intended to excel at being an intermediate format for data, including numeric data. The design goals for Jsonal are:
- Be scrupulous. Adhere perfectly to the JSON specification.
- Be efficient. JSON I/O should never be the bottleneck.
- Be concise. Place the focus on the logic of what to accomplish.
- Be helpful. Provide powerful, flexible methods for the most common operations. Provide comprehensible error messages if anything goes wrong.
- Be mathematical. Understand numbers and precision and provide ways to work with them.
Jsonal rejects other valuable goals as incompatible with its primary goals:
- Be immutable. Immutability aids consistency, but it is not always compatible with efficiency.
- Be reshapeable. Reshaping an AST is expensive without structural sharing, but structural sharing is not always efficient.
- Be idealistic. If something is RECOMMENDED, that's nice, but we won't assume it.
If you wish to accomplish these goals, which complement each other nicely, convert to another AST.
In addition, Jsonal does not have a streaming mode. If a single file is too big to process in memory, you should use another approach.
No other JSON parser/printer/AST for Scala fulfills the design goals. (Jawn comes close.)
JSON parsers and ASTs often take shortcuts that enable them to represent only JSON that avoids "SHOULD NOT" directives in the specification. These are not, strictly speaking, compliant parsers. Because Jsonal insists upon supporting the entire JSON specification, it must:
- Represent numbers as a non-primitive type. JSON allows arbitrary numbers; Jsonal must also.
- Not embed NaN and +/- Infinity bare in JSON. These are not allowed; they must be quoted as strings, or be `null`.
- Not represent a JSON object as a single-valued unordered map. JSON is a linear format, and does not forbid multiple values with the same key, so the order and number of values for the same key must be preserved.
- Preserve the actual value of a number, not some internal approximation thereof.
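
The two most common shortcuts named above can be seen with nothing but the Scala standard library. The following is a minimal sketch, not Jsonal code: the object name `StrictnessDemo` and the sample values are made up for illustration. It shows a number losing digits when stored as a `Double`, and a duplicate key disappearing when members are stored in a `Map`.

```scala
object StrictnessDemo {
  // The literal text of a JSON number with more digits than a Double can hold.
  val text: String = "0.1234567890123456789012345"

  // Stored as a Double, the value is rounded to the nearest representable one.
  val asDouble: Double = text.toDouble

  // Stored as an arbitrary-precision decimal, every digit survives.
  val asDecimal: BigDecimal = BigDecimal(text)

  // Members of {"a": 1, "a": 2, "b": 3}, kept in document order.
  val members: Vector[(String, Int)] = Vector(("a", 1), ("a", 2), ("b", 3))

  // A Map keeps one value per key (the last one seen) and forgets the order.
  val asMap: Map[String, Int] = members.toMap
}
```

A representation that keeps the original text (or an exact decimal) and an ordered sequence of members avoids both losses.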
Jsonal is not so scrupulous as to:
- Preserve whitespace. It is semantically irrelevant.
- Preserve the encoding of strings. Again, the choice of unicode escape vs. plain unicode is semantically irrelevant.
All parsing and printing routines are written by hand to optimize speed. (`sun.misc.Unsafe` is used, but only for InputStreams.) But there are additional choices that are dictated by efficiency:
- Parsing big numbers can be slow. There is no need to parse them unless they're wanted, so big numbers must be stored as a `String` or similar format.
- Parsing small numbers can be fast and happen in one pass. Small numbers should thus be parsed and stored as a `Double` or similar.
- Parsing arrays of numbers creates a lot of boxes. Numbers should be unboxed into a primitive array when possible.
- Constructing key-value maps is slow. Parsing must be into an array.
- Looking up keys in objects is slow. Lookup must be from a map. (Lazily created based on the array.)
- Extra layers of boxing, or exceptions, for errors are slow. The core data type must represent errors also (but outside of the JSON types).
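
The array-then-lazy-map idea from the list above can be sketched as follows. This is an illustration under stated assumptions, not Jsonal's implementation: the class name `ObjSketch` is hypothetical, values are plain `Double`s to keep the example short, and the rule that the first occurrence of a duplicate key wins on lookup is an arbitrary choice made here.

```scala
// Hypothetical sketch: an object stored as parallel arrays in document order,
// with a key index that is built only if a lookup is ever requested.
final class ObjSketch(val keys: Array[String], val values: Array[Double]) {
  // `lazy val` defers construction until first use, so parsing never pays for it.
  private lazy val index: Map[String, Int] = {
    val b = Map.newBuilder[String, Int]
    var i = keys.length - 1
    // Insert from the back so that the earliest occurrence of a key is added last
    // and therefore is the one that remains in the map.
    while (i >= 0) { b += ((keys(i), i)); i -= 1 }
    b.result()
  }

  // Number of members, counting duplicates.
  def size: Int = keys.length

  // Lookup by key through the lazily built index.
  def get(key: String): Option[Double] = index.get(key).map(i => values(i))
}
```

The arrays still hold every member in order, so printing the object back out reproduces duplicates even though lookup sees only one value per key.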
The JSON specification is for a serialized data format, and it talks in terms of several basic types. Jsonal uses exactly the same types (but not necessarily only those types), and provides basic operators to build and destructure JSON.
- The basic hierarchy of JSON data types should map exactly to the types in the specification, and have the same properties. A JSON value is `Json`. The complete set of subtypes is `Null`, `True`, `False`, `Str`, `Num`, `Arr`, and `Obj`, corresponding exactly to the seven JSON types (in abbreviated form).
- Building a single JSON value is as simple as `Json(x)`, where `x` is an appropriate type.
- Building a composite value is accomplished with builders where you just list the values: `Json ~ x ~ y ~ Json`. Note that the builders are delimited with `Json`.
The JSON specification does not map perfectly onto Scala's data types. Therefore, some adjustment to the JSON type hierarchy and some utility methods are advisable.
- Values that might be JSON or might be in error are of type `Jast` (JSON Abstract Syntax Tree). This encapsulates an error state as well as correct states for lookups that may fail (e.g. looking up a missing key).
- Natural destructuring is provided by `apply` methods for keys and values; on failure, a `Jast` is returned (upon which all destructuring methods are no-ops and just preserve the original error).
- Parsers and printers are provided for most common data types or wrappers thereof: `String`, `ByteBuffer`, `CharBuffer`, `InputStream`. A prettyprinter is provided also.
- Converters from common types are provided (mostly via typeclasses), so you don't have to select which type of JSON value you're building. Usually there is a single obvious choice.
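
The way an error value can sit beside the JSON types and absorb further lookups is sketched below. This is a deliberately tiny model written for this explanation, not Jsonal's source: the wrapper object `JastSketch`, the class `JastError`, the error messages, and the use of `Double` inside `Num` are all assumptions, and `True`/`False` are omitted for brevity.

```scala
object JastSketch {
  // Anything a lookup can return: real JSON or an error.
  sealed trait Jast {
    def apply(key: String): Jast
    def apply(index: Int): Jast
  }

  // The error state. Every lookup is a no-op that preserves the original error.
  final case class JastError(msg: String) extends Jast {
    def apply(key: String): Jast = this
    def apply(index: Int): Jast = this
  }

  // Real JSON values. By default a lookup fails; Arr and Obj override one case each.
  sealed trait Json extends Jast {
    def apply(key: String): Jast = JastError(s"not an object, cannot look up key '$key'")
    def apply(index: Int): Jast = JastError(s"not an array, cannot look up index $index")
  }
  case object Null extends Json
  final case class Str(text: String) extends Json
  final case class Num(value: Double) extends Json
  final case class Arr(values: Vector[Json]) extends Json {
    override def apply(index: Int): Jast =
      if (index >= 0 && index < values.length) values(index)
      else JastError(s"index $index out of bounds")
  }
  final case class Obj(members: Vector[(String, Json)]) extends Json {
    override def apply(key: String): Jast =
      members.collectFirst { case (k, v) if k == key => v }
        .getOrElse(JastError(s"missing key '$key'"))
  }
}
```

With this shape, a chain such as `doc("nope")(1)("deeper")` needs no intermediate checks: the first failed lookup produces the error and the rest of the chain passes it through unchanged, without exceptions or an extra wrapper type.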
The mismatch between JSON's "numbers are decimal numbers of arbitrary size" and computing's heavy reliance upon `Double` provides some challenges for using JSON as a data exchange format for number-heavy data.
- Mathematics is mostly done with Doubles, so I/O of Doubles should be fast and easy.
- Many physical sciences have an idea of precision or significant figures which, if applied, can discard roundoff error and/or allow more compact JSON files. What is known about precision can be supplied.
- Floats are also sometimes used in place of Doubles. But those may have rounding errors. It shouldn't be easy to mistake one for the other.
- Arrays of Doubles should go to-and-from arrays of Doubles. The `Arr` type thus has subtypes `All` (for arbitrary `Json` values) and `Dbl` (for Doubles).
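
Two of the points above, supplying a known precision and keeping arrays of Doubles unboxed, can be sketched with the standard library alone. This is an illustrative sketch rather than Jsonal's API: `NumericSketch`, `withSigFigs`, and `render` are hypothetical names, `Any` stands in for a real `Json` type, and writing non-finite values as `null` is one of the two options the text allows.

```scala
import java.math.{BigDecimal => JBigDecimal, MathContext}

object NumericSketch {
  // Render a finite Double with at most `sigFigs` significant figures,
  // dropping trailing zeros so the output stays compact.
  def withSigFigs(x: Double, sigFigs: Int): String =
    new JBigDecimal(x).round(new MathContext(sigFigs)).stripTrailingZeros().toPlainString()

  sealed trait Arr { def size: Int }
  object Arr {
    // Arbitrary values (stand-in: Any instead of a real Json type).
    final class All(val values: Array[Any]) extends Arr { def size: Int = values.length }
    // Doubles only, held in a primitive array so no element is boxed.
    final class Dbl(val doubles: Array[Double]) extends Arr { def size: Int = doubles.length }
  }

  // Print an array of Doubles as JSON text. NaN and infinities are not legal
  // bare JSON, so they are written as null here.
  def render(a: Arr.Dbl, sigFigs: Int): String =
    a.doubles
      .map(d => if (d.isNaN || d.isInfinite) "null" else withSigFigs(d, sigFigs))
      .mkString("[", ",", "]")
}
```

For example, `NumericSketch.withSigFigs(0.1 + 0.2, 6)` yields `"0.3"` rather than the 17-digit form that carries the roundoff error.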