Series data type doesn't reflect "best practices" for time-series schemas #18

Open
ixmatus opened this issue Oct 1, 2014 · 3 comments

Comments

@ixmatus

ixmatus commented Oct 1, 2014

I love the aeson-esque interface that you've built, but there's a glaring misstep: ToSeriesData treats the data a lot like tables rather than series.

Let's say I have a datatype that's an instance of ToSeriesData that looks something like this:

data Reading = Reading UTCTimeEpoch UUID DeviceType ReadingType Double
data DeviceType = Plug | Switch
data ReadingType = Watts | Volts | Temp

Let's say all of these readings go into a single series named "device_readings". That works great at small scale, but the minute you reach millions of points you hit performance problems, because InfluxDB isn't designed to handle that type of querying (SELECT uuid FROM device_readings WHERE ...) or filtering on, say, the device type column. Such a query traverses the entire key space of that series, because underneath Influx is just a dumb key-value store.

If there are 20 million keys in the device_readings series, that's really severe pain and you've just tremendously fucked yourself, because migrating that data to another schema could take quite a bit of time...

This is my major beef with InfluxDB because they wanted to keep a "SQL like" interface to the data but the underlying model definitely will not handle the kind of queries that you CAN run on it. This is also their fault for not urgently writing up a document on "Schema Design".

TempoDB got it right. Your series name should contain the key, category, and attributes you want to "query".

So instead of a series name like device_readings, it would look like device_readings.2c9e4570-9b35-0131-c7ce-48e0eb16f719.Watts.Dimmer. You can then query the data you want efficiently by constructing the key from known categories, IDs, and attributes. The datatype then looks extremely simple:

data Reading = Reading UTCTimeEpoch Double

What I would love is a data type that gives us a structured and easy way of building series names from a key, a category, and some attributes! That is what I wish this library was doing, instead of following a more table-like model.

I'm going to throw together my ideas in a fork and see what you think of them. Right now I'm building series names with functions and it's ugly; I would rather do it with specialized data types and instances of a class like ToSeriesName or something similar.
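A minimal sketch of what such a class might look like. Everything here is hypothetical and not part of the library: the ToSeriesName class, the ReadingKey record, and its field names are illustrative assumptions, and the UUID is kept as a plain String to avoid extra dependencies. The segment order (base name, device ID, reading type, device type) follows the example series name above.

```haskell
import Data.List (intercalate)

-- Enumerations from the Reading example above.
data DeviceType = Plug | Switch deriving (Show, Eq)
data ReadingType = Watts | Volts | Temp deriving (Show, Eq)

-- Hypothetical class: anything that can be rendered as a series name.
class ToSeriesName a where
  toSeriesName :: a -> String

-- Hypothetical key type holding the parts we want to query by.
data ReadingKey = ReadingKey
  { keyBase       :: String      -- base series name, e.g. "device_readings"
  , keyDeviceId   :: String      -- device UUID, already rendered as text
  , keyReading    :: ReadingType
  , keyDeviceType :: DeviceType
  }

-- Join the segments with dots: base.uuid.readingType.deviceType
instance ToSeriesName ReadingKey where
  toSeriesName (ReadingKey base dev reading devType) =
    intercalate "." [base, dev, show reading, show devType]
```

With this, toSeriesName (ReadingKey "device_readings" "2c9e4570-9b35-0131-c7ce-48e0eb16f719" Watts Plug) yields "device_readings.2c9e4570-9b35-0131-c7ce-48e0eb16f719.Watts.Plug", and the stored point itself only needs the timestamp and the value.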

@maoe
Owner

maoe commented Oct 2, 2014

Thanks for your comment.

For now I have deliberately left series name construction out of the library because I thought it was application-specific. It also seemed to me that the InfluxDB devs and community were eager to do table-like operations, since a column index was (and actually still is) planned.

Anyway, I'm open to suggestions. Please let me know once your ideas come out.

@ixmatus
Author

ixmatus commented Oct 2, 2014

Ahh, got it. No need if they're going to implement column indexes. Do you know how far out that is?

@maoe
Owner

maoe commented Oct 2, 2014

I don't know how the development is going. You can subscribe to influxdata/influxdb#582.
