Pang Wu edited this page Jun 9, 2014 · 6 revisions

Schema

The event table stores events generated from different sensors, per day per user. All the events are stored in one field, and the full event list is overwritten after reprocessing. The rationale behind this design is a schema that avoids read/update race conditions and thus sustains high throughput.

| Attribute Name | Attribute Type | Memo |
| --- | --- | --- |
| account_id | Hash Key | Account ID of the user |
| target_date_of_night | String, Range Key | The date of the targeted sleeping night, in the user's local time. |
| events_data | Binary | Compressed JSON data corresponding to a List data structure. The compression algorithm is specified in the "compression_type" field; currently BZip2 is used due to its high compression ratio. |
| compression_type | Number | The compression type, corresponding to the Compression.CompressionType enum. |
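As a minimal sketch of how an events_data blob could be produced, using Python's standard-library bz2 (the event field names below are illustrative assumptions, not the project's actual JSON schema):

```python
import bz2
import json

# Hypothetical one-night event list: one MOTION event per minute for a
# full day. Field names ("type", "start_minute", "duration") are assumed.
events = [
    {"type": "MOTION", "start_minute": m, "duration": 1}
    for m in range(24 * 60)
]

raw = json.dumps(events).encode("utf-8")
compressed = bz2.compress(raw)  # stored in the "events_data" binary attribute

# The blob must stay well under the attribute size limit discussed below.
print(len(raw), len(compressed))
```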

Discussion of the Selection of Compression Algorithms

DynamoDB limits each attribute to 64KB. To be safe, we want the compressed data to be less than half of that limit, i.e. 32KB. We tried three algorithms with the following setup:

- Raw data: a full-day event list with three types of events (MOTION, NOISE, LIGHT). For each type, the first event begins at the start of the testing date, and each event lasts 1 minute, which is the collection interval of motion data. This gives 3 * 24 * 60 = 4320 events.
- Compression algorithms: Snappy, GZip, BZip2
- Testing rounds: 100 rounds on the same data

| Algorithm | Output Data Size (Bytes) | Compression Time (Milliseconds) | Decompression Time (Milliseconds) |
| --- | --- | --- | --- |
| Raw Data | 489601 | N/A | N/A |
| BZip2 | 11362 | 9613 | 912 |
| GZip | 25051 | 557 | 89 |
| Snappy | 54792 | 62 | 31 |
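The methodology above can be approximated with the standard-library codecs. This is a simplified sketch: Snappy needs the third-party python-snappy package and is omitted, a single round is run instead of 100, and the event field names are assumed rather than taken from the real schema.

```python
import bz2
import gzip
import json
import time

# Rebuild a fixture like the one described above: 3 event types, one
# event per minute for a full day = 4320 events (field names assumed).
events = [
    {"type": t, "start_minute": m, "duration": 1}
    for t in ("MOTION", "NOISE", "LIGHT")
    for m in range(24 * 60)
]
raw = json.dumps(events).encode("utf-8")

for name, compress, decompress in [
    ("BZip2", bz2.compress, bz2.decompress),
    ("GZip", gzip.compress, gzip.decompress),
]:
    t0 = time.perf_counter()
    blob = compress(raw)
    t1 = time.perf_counter()
    restored = decompress(blob)
    t2 = time.perf_counter()
    assert restored == raw  # round-trip sanity check
    print(f"{name}: {len(blob)} bytes, "
          f"compress {(t1 - t0) * 1000:.1f} ms, "
          f"decompress {(t2 - t1) * 1000:.1f} ms")
```

On repetitive JSON like this, BZip2 should produce a noticeably smaller blob than GZip, consistent with the table above, though exact sizes and timings will differ from the original fixture.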

From the results, Snappy is the fastest; however, due to its non-bitwise compression design, it has the lowest compression ratio. GZip is roughly 10 times slower at compression and about 3 times slower at decompression than Snappy, but its output is only half the size. BZip2 has the best compression ratio, almost 43:1, but it is 155 times slower at compression and 29 times slower at decompression than Snappy.

Given the 64KB per-record limit, the possibility of adding more event lists as additional attributes, and the current design in which reprocessing is only triggered by motion data uploads, we decided to go with the CPU-heavy BZip2 compression. However, the "compression_type" field gives us the flexibility to change the algorithm later.
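A reader-side sketch of that flexibility: dispatch on compression_type when decoding a record. The numeric codes below are hypothetical placeholders; the real values come from the Compression.CompressionType enum.

```python
import bz2
import gzip
import json

# Hypothetical codes; actual values live in Compression.CompressionType.
BZIP2, GZIP = 1, 2

DECOMPRESSORS = {
    BZIP2: bz2.decompress,
    GZIP: gzip.decompress,
}

def load_events(item):
    """Decode an events_data blob using the record's compression_type."""
    decompress = DECOMPRESSORS[item["compression_type"]]
    return json.loads(decompress(item["events_data"]).decode("utf-8"))

# Example record shaped like the schema above (values are illustrative).
record = {
    "compression_type": BZIP2,
    "events_data": bz2.compress(json.dumps([{"type": "MOTION"}]).encode("utf-8")),
}
print(load_events(record))  # → [{'type': 'MOTION'}]
```

Switching algorithms later only requires writing new records with a different compression_type value; old records remain readable through the same lookup table.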