Event Table
The event table stores events generated from different sensors, per day per user. All the events are stored in one field, and the full event list is overwritten after each reprocessing. The rationale behind this design is a schema that avoids read/update race conditions and thus sustains high write throughput.
Attribute Name | Attribute Type | Memo |
---|---|---|
account_id | Hash Key | Account ID of the user |
target_date_of_night | String, Range Key | The date of targeted sleeping night, in user's local time. |
events_data | Binary | Compressed JSON data which corresponds to List data structure. The compression algorithm is specified in "compression_type" field, currently BZip2 is used due to its high compression rate. |
compression_type | Number | The compression type corresponds to Compression.CompressionType enum. |
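As a sketch of the write path implied by this schema, the Python below serializes the full event list, BZip2-compresses it, and assembles an item dict. The function name `build_item`, the event field names, and the numeric enum value are illustrative assumptions, not the production code (which, given the `Compression.CompressionType` reference, is presumably Java):

```python
import bz2
import json

# Hypothetical numeric value standing in for Compression.CompressionType.BZIP2.
COMPRESSION_BZIP2 = 1

def build_item(account_id, target_date_of_night, events):
    """Build an item dict matching the schema above.

    The whole event list is serialized to JSON and BZip2-compressed,
    so every reprocessing run simply overwrites events_data wholesale.
    """
    raw = json.dumps(events).encode("utf-8")
    return {
        "account_id": account_id,                      # hash key
        "target_date_of_night": target_date_of_night,  # range key, local date
        "events_data": bz2.compress(raw),              # binary attribute
        "compression_type": COMPRESSION_BZIP2,
    }
```

Writing is then a plain whole-item put that replaces the previous record for that night, which is what sidesteps the read/update race.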
DynamoDB limits each attribute to 64KB. To be safe, we want the compressed data to stay under half of that limit, i.e. 32KB. We tried three algorithms with the following setup:

  * Raw data: a full-day event list with three types of events (MOTION, NOISE, LIGHT). For each type, the first event begins at the start of the testing date, and each event lasts 1 minute, which is the collection interval of motion data. So we have 3 * 24 * 60 = 4320 events.
  * Compression algorithms: Snappy, GZip, BZip2
  * Testing rounds: 100 rounds on the same data
Algorithm | Output Data Size (Bytes) | Compression Time (Milliseconds) | Decompression Time (Milliseconds) |
---|---|---|---|
Raw Data | 489601 | N/A | N/A |
BZip2 | 11362 | 9613 | 912 |
GZip | 25051 | 557 | 89 |
Snappy | 54792 | 62 | 31 |
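A rough version of this experiment can be reproduced with the Python standard library. Snappy is omitted here because it needs the third-party `python-snappy` package, and the event shape below is a guess, so absolute sizes and timings will not match the table above:

```python
import bz2
import gzip
import json
import time

def make_events():
    # Synthetic stand-in for the test data: 3 types * 24h * 60min = 4320
    # one-minute events; the real event fields are not shown in this doc.
    events = []
    for etype in ("MOTION", "NOISE", "LIGHT"):
        for minute in range(24 * 60):
            events.append({"type": etype, "start_minute": minute,
                           "duration_seconds": 60, "value": minute % 100})
    return events

raw = json.dumps(make_events()).encode("utf-8")
for name, compress in (("BZip2", bz2.compress), ("GZip", gzip.compress)):
    start = time.perf_counter()
    out = compress(raw)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{name}: {len(raw)} -> {len(out)} bytes in {elapsed_ms:.1f} ms")
```

On repetitive JSON like this, BZip2's block-sorting (BWT) approach typically produces a noticeably smaller output than GZip's LZ77+Huffman, at a much higher CPU cost, matching the trend in the table.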
From the results, Snappy is the fastest; however, due to its byte-oriented, non-bitwise compression design, it has the lowest compression ratio. Compared to Snappy, GZip is roughly 9 times slower in compression and 3 times slower in decompression, but its output is only about half the size. BZip2 has the best compression ratio, almost 43:1; however, it is about 155 times slower in compression and 29 times slower in decompression than Snappy.
Given the 64KB-per-attribute limit, the possibility of adding more event lists as additional attributes, and the fact that reprocessing is currently only triggered by motion data uploads, we decided to go with the CPU-heavy BZip2 compression. However, the "compression_type" field gives us the flexibility to change the algorithm later.
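The flexibility that "compression_type" buys can be sketched as a small dispatch table on the read path. The numeric codes below are made up for illustration; the real values come from the `Compression.CompressionType` enum:

```python
import bz2
import gzip
import json

# Hypothetical codes standing in for Compression.CompressionType values.
BZIP2 = 1
GZIP = 2

_DECOMPRESSORS = {
    BZIP2: bz2.decompress,
    GZIP: gzip.decompress,
}

def read_events(item):
    """Decode events_data from a fetched item, honoring compression_type."""
    decompress = _DECOMPRESSORS[item["compression_type"]]
    return json.loads(decompress(item["events_data"]).decode("utf-8"))
```

Switching algorithms later then only means writing new records with a new code; old records keep decoding correctly through the same reader.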