cpak

cpak is a compressed data format for numeric data, embeddable in code and text documents. It's especially suitable for geometry and geographic data.

This package contains the spec (soon) and some supporting functions. For reading and writing geographic data, use a companion package such as cgeo-cpak.

The specification consists of multiple levels:

  • Level 0: Suitable for any numeric data. This package includes the JavaScript code needed for it, and nothing more.
  • Level 1: Defines an encoding for SQL/MM Spatial data types. Additional convention or configuration is required to represent WKB and WKT data.
  • Level 2 (future): Full replacement for WKB and WKT.

Level 0

Unsigned integers are encoded using the 92 printable non-whitespace ASCII characters that survive unchanged through JSON.stringify. These form 2 groups containing 64 and 28 characters, used as "digits" in base 64 or 28.

Signed integers are zigzag encoded first, in the same way as protocol buffers:

unsigned = (signed << 1) ^ (signed >> 31);
signed = (unsigned >>> 1) ^ -(unsigned & 1);
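
As a quick illustration, the two zigzag formulas above can be wrapped in functions and checked on a few values. The function names are only for this example and are not part of the package API. The extra `>>> 0` is an addition for JavaScript, where bitwise results are signed 32-bit values.

```javascript
// Zigzag maps signed 32-bit integers to unsigned ones so that values
// close to zero, positive or negative, get small codes.
function zigzagEncode(signed) {
  // ">>> 0" reinterprets the 32-bit result as an unsigned number.
  return ((signed << 1) ^ (signed >> 31)) >>> 0;
}

function zigzagDecode(unsigned) {
  return (unsigned >>> 1) ^ -(unsigned & 1);
}

const samples = [0, -1, 1, -2, 2, 2147483647, -2147483648];
const encoded = samples.map(zigzagEncode);
// encoded is [0, 1, 2, 3, 4, 4294967294, 4294967295]
const decoded = encoded.map(zigzagDecode);
// decoded equals samples again
```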

Digits are stored most significant first. The last digit of a number always comes from a different group than the digits preceding it, which marks where the number ends and the next one begins.

Numbers expected to be smaller than 64 use the 64-character group for their last "digit", so they usually fit in a single byte (any preceding digits then come from the 28-character group). Larger numbers use the 28-character group for their last "digit" and the 64-character group for all other digits, so they usually take about the same space as Base64 would.
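
The two layouts described above could be sketched roughly as follows. This is one possible reading of the text, not the package's actual implementation: the helper names `encodeUnsigned` and `decodeUnsigned`, the `small` flag, and the exact mixed-radix weighting (last digit least significant, in its own base) are assumptions made for illustration.

```javascript
// Digit <-> character code conversion, using the formulas from this README.
const toAscii = (digit) => digit + (((digit + 1934) * 9) >> 9);
const toDigit = (ascii) => ascii - (((ascii + 1900) * 9) >> 9);

// Hypothetical helper: encode one unsigned integer.
// small = true:  last digit from the 64-group, earlier digits from the 28-group.
// small = false: last digit from the 28-group, earlier digits from the 64-group.
function encodeUnsigned(value, small) {
  const lastBase = small ? 64 : 28;
  const restBase = small ? 28 : 64;
  const lastOffset = small ? 0 : 64; // digits 64-91 form the 28-group
  const restOffset = small ? 64 : 0;
  let text = String.fromCharCode(toAscii((value % lastBase) + lastOffset));
  value = Math.floor(value / lastBase);
  while (value > 0) {
    // Prepend, so the most significant digit ends up first.
    text = String.fromCharCode(toAscii((value % restBase) + restOffset)) + text;
    value = Math.floor(value / restBase);
  }
  return text;
}

// Hypothetical helper: decode one number starting at text[pos].
// Returns the value and the position where the next number begins.
function decodeUnsigned(text, pos, small) {
  const lastBase = small ? 64 : 28;
  const restBase = small ? 28 : 64;
  let value = 0;
  while (pos < text.length) {
    const digit = toDigit(text.charCodeAt(pos++));
    const inBigGroup = digit < 64;
    if (inBigGroup === small) {
      // A digit from the "last digit" group terminates the number.
      const last = small ? digit : digit - 64;
      return { value: value * lastBase + last, next: pos };
    }
    value = value * restBase + (small ? digit - 64 : digit);
  }
  throw new Error("Truncated number");
}

// Two numbers back to back need no separator between them.
const stream = encodeUnsigned(31, true) + encodeUnsigned(100, true); // "AdF"
const first = decodeUnsigned(stream, 0, true);           // value 31
const second = decodeUnsigned(stream, first.next, true); // value 100
```

Under these assumptions 31 takes one character and 100 takes two. The decoder finds the boundary purely from which group each character belongs to.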

Characters left unused due to incompatibility with JSON are " and \ (ASCII 34 and 92). It turns out the following formulas convert between digits and ASCII characters without requiring lookup tables:

ascii = digit + (((digit + 1934) * 9) >> 9);
digit = ascii - (((ascii + 1900) * 9) >> 9);

The same code works at least in JavaScript, C and C++. All of those languages also accept cpak-encoded data as-is inside string literals.

digit is a number from 0 to 91, where 0-63 form one group and 64-91 the other. ascii is the character code found at index digit in the following string:

!#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~
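
A small check, written only for illustration, that the two formulas agree with the string above for every digit and that the string passes through JSON.stringify without escapes. The names `ALPHABET`, `digitToAscii` and `asciiToDigit` are chosen for this example.

```javascript
const ALPHABET =
  "!#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~";

function digitToAscii(digit) {
  return digit + (((digit + 1934) * 9) >> 9);
}

function asciiToDigit(ascii) {
  return ascii - (((ascii + 1900) * 9) >> 9);
}

// Count disagreements between the formulas and the lookup string.
let mismatches = 0;
for (let digit = 0; digit < 92; ++digit) {
  const ascii = digitToAscii(digit);
  if (ascii !== ALPHABET.charCodeAt(digit)) ++mismatches;
  if (asciiToDigit(ascii) !== digit) ++mismatches;
}

// True when JSON.stringify only adds the surrounding quotes.
const survivesJson = JSON.stringify(ALPHABET) === '"' + ALPHABET + '"';
```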

Level 1

Encoding is similar to WKB (Well-Known Binary) with endianness flags omitted and all integers stored in a variable-length format.

Geometry type IDs are multiplied by 2 compared to WKB. This allows the least significant bit to indicate spec level.

  • 0 means level 1.
  • 1 means level 2 (or higher).
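
As a sketch of how the doubled type IDs might be handled, assuming the standard WKB geometry type codes (Point = 1, LineString = 2, Polygon = 3 and so on). The helper names are hypothetical and not taken from this package.

```javascript
// Standard WKB geometry type codes.
const WKB_TYPE = {
  Point: 1,
  LineString: 2,
  Polygon: 3,
  MultiPoint: 4,
  MultiLineString: 5,
  MultiPolygon: 6,
  GeometryCollection: 7
};

// Hypothetical helper: WKB type code -> cpak type tag.
// levelBit 0 means level 1, levelBit 1 means level 2 (or higher).
function toCpakType(wkbType, levelBit = 0) {
  return wkbType * 2 + levelBit;
}

// Hypothetical helper: split a cpak type tag back into its parts.
function fromCpakType(tag) {
  return { wkbType: tag >>> 1, levelBit: tag & 1 };
}

const polygonTag = toCpakType(WKB_TYPE.Polygon); // 6
const parsed = fromCpakType(polygonTag);         // { wkbType: 3, levelBit: 0 }
```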

Floating point values need to be rounded to integers before encoding. Multiplying them by some factor first allows keeping fractional digits up to a desired precision. Powers of 2 and 10 both have their advantages when choosing a multiplication factor:

  • Base 2 allows easy further lossless conversion to WKB or shapefiles and back.
  • Base 10 allows easy further lossless conversion to WKT or GeoJSON and back.

The same multiplication factor must be used when reading and writing, and level 1 leaves it to the application to guarantee this. Embedding cpak-encoded geometry into JSON is the recommended way to allow storing additional metadata.
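
The scaling step might look like the sketch below. The function names `quantize` and `dequantize`, the sample coordinates and the two factors are illustrative choices, not values prescribed by the spec.

```javascript
// Scale by a factor and round to integers before encoding.
function quantize(values, factor) {
  return values.map((x) => Math.round(x * factor));
}

// Divide by the same factor after decoding.
function dequantize(ints, factor) {
  return ints.map((n) => n / factor);
}

// Power of 10: keeps 6 decimal digits, as typically written in WKT or GeoJSON.
const decimalFactor = 1e6;
const lonLat = [24.938379, 60.169856];
const decimalInts = quantize(lonLat, decimalFactor); // [24938379, 60169856]
const lonLatBack = dequantize(decimalInts, decimalFactor);

// Power of 2: keeps 16 fractional bits, exact for binary fractions.
const binaryFactor = 65536;
const binaryInts = quantize([0.625, -1.5], binaryFactor); // [40960, -98304]
const binaryBack = dequantize(binaryInts, binaryFactor);  // [0.625, -1.5]
```

Reading with a different factor than the one used for writing would silently scale every coordinate, which is why the factor has to be agreed on outside the encoded data, for example in a surrounding JSON object.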

License

Dual-licensed under:

The MIT License

CC0

Copyright (c) 2017 BusFaster Ltd