
How to integrate a timeseries database into the data model? #227

Open
rdmolony opened this issue Jul 6, 2023 · 5 comments
Labels
tool This is a new tool that could be worked on

Comments

rdmolony commented Jul 6, 2023

I mentioned TimescaleDB & discussed its pros/cons from an implementation perspective in #222 (comment) which was a little off-topic so I'm moving it here for further discussion

rdmolony (Author) commented Jul 6, 2023

@stephenholleran mentioned -

> based on the fact that Windographer will very soon be able to import the data model, why try and prepare the data i.e. calibrate it, before loading it into Windographer? In theory, Windographer should be able to apply the calibrations for you as the adjustments required should all be contained in the meta data.

This could be a good option. I've been holding out on this a while now.

A "platform" could ...

  1. Link each station to its raw files of generic readings
  2. Export the data model

End-users would have to open all raw data files in Windographer & then import the data model


On the flip side, calibrating & flagging readings in a "platform" enables it to be more agnostic about tooling. In our case we're interested in tracking data availability, which requires the calibrations & flags to have already been applied.
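For context, by "data availability" I mean the fraction of expected records that are actually present (after flagging) over a period. A minimal Python sketch, with hypothetical names and a 10-minute logging interval assumed:

```python
from datetime import datetime, timedelta

def availability(timestamps, start, end, interval=timedelta(minutes=10)):
    """Fraction of expected records actually present between start and end."""
    expected = int((end - start) / interval)
    present = sum(1 for t in timestamps if start <= t < end)
    return present / expected

start = datetime(2023, 11, 27)
end = start + timedelta(hours=1)  # 6 expected 10-minute records
recorded = [start + timedelta(minutes=10 * i) for i in range(5)]  # one record missing
print(availability(recorded, start, end))  # 5 of 6 expected records present
```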

abohara (Collaborator) commented Jul 7, 2023

Thanks @rdmolony. TimescaleDB sounds like it would be a good option for storing timeseries. Though recommending a db is not part of the core data model, we do have a postgres example like you noted. Would you be open to editing this example to include a TimescaleDB hypertable? Perhaps it can evolve into a blog on how the metadata could be joined with the timeseries.

rdmolony (Author) commented Jul 7, 2023

No problem @abohara, I could do that, or add a separate SQL file to tools/ with a few additional CREATE statements without much hassle. I'm not fussed.

The TimescaleDB team invited me to collaborate on a blog post on this experience to date, so I do intend to document all of this a bit more in the coming weeks/months.

abohara (Collaborator) commented Jul 14, 2023

Thanks @rdmolony . That would be great.

@stephenholleran stephenholleran added the tool This is a new tool that could be worked on label Aug 3, 2023
rdmolony (Author) commented Dec 11, 2023

Hi all,

I finally got around to writing up my experience of integrating a Postgres/TimescaleDB database into the Mainstream Renewable Power (MRP) data flow -

Struggling to sync sensors & databases | rowanmolony.com

My experience

I wanted to gain some experience running it smoothly in production (the switch was made in September) to be sure it actually worked before posting anything.

In the MRP context, Postgres/TimescaleDB was a good fit since it enabled reducing the complexity of the data flow system to make it more manageable by a small team of 1-3 people.

How?

It allowed us to use the web application database (i.e. the sensor metadata database) for storing & processing time series data rather than multiple tools. Two databases became one database¹ & pandas was ditched for processing timeseries². Adding data flow unit tests (code that checks other code) became possible since this system became far easier to test.

TimescaleDB is an extension to the Postgres database that enables working with timeseries readings from within the database. It provides a special table called a Hypertable which speeds up insert & query performance for timeseries tables³.


How does this relate to the IEA Task 43 data model?

I’m hesitant to recommend any next steps here since I have just left my job at Mainstream Renewable Power and don’t intend to adapt this data model for Postgres/TimescaleDB. However, I can make a few comments in case someone finds them interesting.

Postgres/TimescaleDB probably only makes sense as an “extension” of this data model since it couples this project with a specific technology.

How could it extend the data model?

It enables storing readings in a database table just like the metadata.

One could create one table per data source like ...

| timestamp | Wind Speed 10m 180deg | Wind Direction 10m 180deg |
| --- | --- | --- |
| 2023-11-27 00:00:00 | 5 | 60 |
| ... | ... | ... |

| timestamp | WS10.180 | WD10.180 |
| --- | --- | --- |
| 2023-11-27 00:00:00 | 4 | 70 |
| ... | ... | ... |

... or a single “long-format” table like …

| timestamp | mast_id | sensor_id | value |
| --- | --- | --- | --- |
| 2023-11-27 00:00:00 | 1 | 1 | 5 |
| 2023-11-27 00:00:00 | 1 | 2 | 60 |
| 2023-11-27 00:00:00 | 1 | 3 | 4 |
| 2023-11-27 00:00:00 | 1 | 4 | 70 |
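The wide rows above can be unpivoted into the long format mechanically. A minimal Python sketch, with the column-to-`sensor_id` mapping assumed for illustration:

```python
# Assumed mapping from each wide column to the sensor_id it represents.
SENSOR_IDS = {"Wind Speed 10m 180deg": 1, "Wind Direction 10m 180deg": 2}

def to_long(mast_id, wide_row):
    """Unpivot one wide row into (timestamp, mast_id, sensor_id, value) rows."""
    ts = wide_row["timestamp"]
    return [
        (ts, mast_id, SENSOR_IDS[col], value)
        for col, value in wide_row.items()
        if col != "timestamp"
    ]

row = {
    "timestamp": "2023-11-27 00:00:00",
    "Wind Speed 10m 180deg": 5,
    "Wind Direction 10m 180deg": 60,
}
print(to_long(1, row))
# [('2023-11-27 00:00:00', 1, 1, 5), ('2023-11-27 00:00:00', 1, 2, 60)]
```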

I opted for the "long-format" table since it is easily "mergeable" with metadata like …

| mast_id | sensor_id | sensor_name | data_type | height | magnetic_orientation |
| --- | --- | --- | --- | --- | --- |
| 1 | 1 | WS10.90 | Wind Speed | 10 | 90 |
| 1 | 2 | WS20.90 | Wind Speed | 20 | 90 |
| 1 | 3 | WS10.180 | Wind Speed | 10 | 180 |
| 1 | 4 | WS20.180 | Wind Speed | 20 | 180 |

… or flags of erroneous readings like …

| sensor_id | timestamp | flag_type |
| --- | --- | --- |
| 1 | 2023-11-27 00:00:00 | Fault |

… in SQL

     select *
       from sensor_reading_generic as reading
inner join sensor_metadata as meta
           on meta.sensor_id = reading.sensor_id
      where reading.mast_id = <ID>
        and timestamp between '2023-12-01' and now()
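The shape of that join can be exercised end-to-end with SQLite from the Python standard library (table & column names as above; the data rows are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table sensor_metadata (
        mast_id integer, sensor_id integer, sensor_name text,
        data_type text, height real, magnetic_orientation real
    );
    create table sensor_reading_generic (
        timestamp text, mast_id integer, sensor_id integer, generic_reading real
    );
    insert into sensor_metadata values (1, 1, 'WS10.180', 'Wind Speed', 10, 180);
    insert into sensor_reading_generic values ('2023-11-27 00:00:00', 1, 1, 5.0);
""")
rows = conn.execute("""
    select reading.timestamp, meta.sensor_name, reading.generic_reading
      from sensor_reading_generic as reading
inner join sensor_metadata as meta
        on meta.sensor_id = reading.sensor_id
       and meta.mast_id = reading.mast_id
     where reading.mast_id = 1
""").fetchall()
print(rows)  # each reading now carries its sensor_name from the metadata
```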

Once the TimescaleDB extension has been installed in the database, one can create this “readings” table like …

      create table sensor_reading_generic (
          timestamp timestamp not null,
          mast_id integer not null,
          file_id integer not null,
          sensor_id integer not null,
          generic_reading float,
          primary key (timestamp, mast_id, sensor_id)
      );
      select create_hypertable('sensor_reading_generic', 'timestamp');
      alter table sensor_reading_generic set (
          timescaledb.compress,
          timescaledb.compress_orderby = 'timestamp DESC',
          timescaledb.compress_segmentby = 'mast_id, file_id, sensor_id'
      );
      select add_compression_policy('sensor_reading_generic', interval '1 week');

mast_id, file_id & sensor_id are foreign keys to other tables.

I’m not that familiar with this data model.

So I built it locally by …⁴

From this model, I can see that one could use -

  • column_name to link sensor readings to metadata
  • measurement_location to link to mast name

I don’t see -

  • A model for logger files?
  • Or erroneous sensor readings?

To make the above joins easier in the context of the MRP data model, I used views to merge the multiple tables of interest, so that things like mast_name & sensor_name are all accessible from a single "table-of-tables" which I can then merge with my readings.
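Such a view might look like the following (sketched with SQLite for portability; the table and column names here are hypothetical, not from the MRP schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table mast (mast_id integer primary key, mast_name text);
    create table sensor (sensor_id integer primary key, mast_id integer, sensor_name text);
    insert into mast values (1, 'Mast A');
    insert into sensor values (1, 1, 'WS10.180');
    -- the "table-of-tables": one view exposing mast_name & sensor_name together
    create view sensor_with_mast as
        select sensor.sensor_id, sensor.sensor_name, mast.mast_name
          from sensor
    inner join mast on mast.mast_id = sensor.mast_id;
""")
print(conn.execute("select * from sensor_with_mast").fetchall())
```

Readings then only ever need to be joined against the one view rather than against each metadata table in turn.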

Let me know if you have any questions, or if you'd like to discuss in more detail over a virtual coffee.

Footnotes

  1. Previously one for timeseries, and one for everything else

  2. Linking timeseries readings to sensor metadata for re-calibrating & flagging erroneous readings

  3. As of 2023-11-20 TimescaleDB by default chunks readings in compressed hypertables into intervals of one week so Postgres doesn't need to read all rows for queries that only care about a particular period of time, see Hypertables

  4. I used nix -

    nix profile install --impure --expr 'with import <nixpkgs> {}; pkgs.postgresql.withPackages   (p: [ p.timescaledb ])'
    

    This works on Linux/OSX or on Windows via WSL
