Series data type doesn't reflect "best practices" for time-series schemas #18

ixmatus · 2014-10-01T15:27:27Z

I love the aeson-esque interface that you've built, but there's a glaring misstep: ToSeriesData treats the data a lot like tables instead of series.

Let's say I have a datatype that's an instance of ToSeriesData that looks something like this:

data Reading = Reading UTCTimeEpoch UUID DeviceType ReadingType Double
data DeviceType = Plug | Switch
data ReadingType = Watts | Volts | Temp

Let's say the readings all go into a single series named "device_readings". That works great at small scale, but the minute you reach millions of points you hit performance problems, because InfluxDB isn't designed to handle that type of querying (SELECT uuid FROM device_readings WHERE ...) and filtering on, say, the device type column. You'll traverse the entire key space of that series to do it, because underneath Influx is just a dumb key-value store.

If there are 20 million keys in the device_readings series, that's really severe pain and you've just tremendously fucked yourself, because migrating that data to another schema could take quite a bit of time...

This is my major beef with InfluxDB because they wanted to keep a "SQL like" interface to the data but the underlying model definitely will not handle the kind of queries that you CAN run on it. This is also their fault for not urgently writing up a document on "Schema Design".

TempoDB got it right. Your series name should contain the key, category, and attributes you want to "query".

So instead of a series name like device_readings, it would look like device_readings.2c9e4570-9b35-0131-c7ce-48e0eb16f719.Watts.Dimmer. You can then efficiently query the data you want by constructing the key from known categories, IDs, and attributes. The datatype then looks extremely simple:

data Reading = Reading UTCTimeEpoch Double

What I would love to see is a data type that gives us a structured and easy way of building series names from a key, a category, and some attributes! That is what I wish this library was doing, instead of following a more table-like model.

I'm going to throw together my ideas in a fork and see what you think of them. Right now I'm building series names with functions and it's ugly; I would rather do it with specialized data types and instances of a class like ToSeriesName or something similar.
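
To make the idea concrete, here is a minimal sketch of what such a class could look like. Everything in it is an assumption for illustration: ToSeriesName, ReadingKey, and its field names are made up and are not part of the library, and the UUID is kept as a plain String so the sketch only needs base.

```haskell
import Data.List (intercalate)

-- Types from the example above, with Show so constructors can be rendered.
data DeviceType = Plug | Switch deriving (Show, Eq)
data ReadingType = Watts | Volts | Temp deriving (Show, Eq)

-- Hypothetical: the parts that identify one series.
-- The UUID is a plain String here (assumption) to avoid extra dependencies.
data ReadingKey = ReadingKey
  { keyDevice  :: String       -- device UUID, as text
  , keyReading :: ReadingType  -- what is being measured
  , keyType    :: DeviceType   -- kind of device
  }

-- Hypothetical class; not something the influxdb package provides.
class ToSeriesName a where
  toSeriesName :: a -> String

-- Join the fixed prefix and the key parts with dots.
instance ToSeriesName ReadingKey where
  toSeriesName (ReadingKey uuid r d) =
    intercalate "." ["device_readings", uuid, show r, show d]
```

With something like this, toSeriesName (ReadingKey "2c9e4570-9b35-0131-c7ce-48e0eb16f719" Watts Plug) yields a name in the dotted form described above, and the point type itself stays as small as Reading UTCTimeEpoch Double.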


maoe · 2014-10-02T14:05:03Z

Thanks for your comment.

For now I have deliberately left series name construction out of the library because I thought it was an application-specific part. It also seemed to me that the InfluxDB devs and community were eager to do table-like operations, as a column index was (and actually still is) planned.

Anyway, I'm open to suggestions. Please let me know once your ideas come out.

ixmatus · 2014-10-02T15:12:12Z

Ahh, got it. No need if they're going to implement column indexes. Do you know how far out that is?

maoe · 2014-10-02T23:35:39Z

I don't know how the development is going. You can subscribe to influxdata/influxdb#582.
