How to extend the data model to allow self-describing data files? #222
Thanks @AndyClifton for posing the question. I understand the benefits of a self-describing file, so I am interested to explore this. I suppose I have two thoughts so far:
Cheers,
Re your questions...
Sorry, what I mean when talking about space is not actual storage but transmitting the data, e.g. via a web API. Ideally you'd want it to be compact so it can be transferred quickly.
Formally, I don't believe we have any sort of recommendation on how to combine it into a single file. Prefixing the JSON header onto the CSV seems like one option, but parsing it would require a slightly more complex parser than reading in a CSV or JSON individually. This does seem to be the format many OEMs are using as well. Regarding how the data model (in JSON format) ties into the table of measurements (e.g. in CSV format), the natural way for them to link is through the data columns table (which also specifies the
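To make the "JSON header prefixed onto the CSV" idea concrete, here is a minimal sketch of a parser for such a combined file. The separator line and the function name are my own assumptions, not an agreed convention; the point is just that splitting on a known delimiter keeps both halves readable with the standard `json` and `csv` tooling.

```python
import csv
import io
import json

def parse_self_describing(text, delimiter_line="---"):
    """Split a single self-describing file into its two halves.

    Assumes a hypothetical convention: a JSON metadata header comes
    first, followed by a line containing only `delimiter_line`, then
    the CSV table of measurements.
    """
    header_text, _, body_text = text.partition("\n" + delimiter_line + "\n")
    metadata = json.loads(header_text)           # the data model header
    rows = list(csv.DictReader(io.StringIO(body_text)))  # the measurements
    return metadata, rows
```

A file following this sketch would then need only one extra split step compared with reading a plain CSV, which seems close to what the OEM formats mentioned above do in practice.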
Understood. I think this depends on the use case. We only anticipate bundling data and metadata for specific purposes, the main one being a user requesting an export of their data so they can use it in another application. In that case it would be fine if a call was made and the user informed later that the data was ready for download, so I don't think speed is too much of a challenge. But I guess the problem could be that if the API call is fast enough, it would be tempting to use it more frequently. Because this seems like a UI / UX issue, I would say it's worth setting out how the data model could include data, and worrying about the implementation challenges separately?
@abohara: thanks for this.
Do you have an example of what this could look like?
@AndyClifton I know you're talking about lidars, but the Kintech Orbit 360 logger for met masts is a good example of metadata as JSON in the header and time series in the file's body. Still, such files need to be parsed into the IEA Task 43 format.
Sorry for the delay, I was at WESC last week. From @abohara:
This can work; however, it makes working with the data a little trickier if there was a certain type of config change, for example an 80 m anemometer being moved to 100 m while still connected to Ch1. The Ch1 column in the data now means two different things (a wind speed at 80 m for the first part and then a wind speed at 100 m for the second part). The data model captures the config change, so all the info is there to break up that column and rename it something else. What we do is assemble the data taking these config changes into account and then use the measurement point name as the column heading. To get around the lack of a

So there are two methods by which you can link the metadata to the timeseries data. The difference is just when the assembling of the data is done.

From @AndyClifton:
People are impatient @AndyClifton, they'll want the data immediately 😄 . There are several ways to capture tabular data in JSON, and we should consider data transfer speeds when picking one. Of the possible JSON formats for storing table data, I would probably lean towards one of the formats that pandas outputs (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_json.html). I know we have used one of them in the past, but I can't remember which. This would make it easy for the user to load the JSON back into a pandas dataframe or something similar.
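For reference, here is a small sketch of two of the pandas `to_json` orientations and the round trip back into a dataframe. The column names are made up; the contrast is that `"split"` stores column names once, while `"records"` repeats them for every row, which matters for transfer size on long time series.

```python
import io
import pandas as pd

df = pd.DataFrame({
    "timestamp": ["2023-01-01T00:00:00", "2023-01-01T00:10:00"],
    "WS_80m": [7.1, 6.8],  # hypothetical measurement point name
})

# "split" lists the column names once, then rows of values;
# "records" repeats every column name in every row.
compact = df.to_json(orient="split")
verbose = df.to_json(orient="records")

# Round-trip: the user can load the JSON straight back into pandas.
df2 = pd.read_json(io.StringIO(compact), orient="split")
```

Whichever orientation is chosen, documenting it in the data model (or fixing it by convention) is what makes the round trip reliable for downstream users.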
A filename and checksum would be possible options for keeping the original format while adding metadata. We've done that; it feels comfortable and keeps backwards compatibility. However, if the data is then updated, the link gets broken and the checksum needs to be recalculated.
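A quick sketch of that filename-plus-checksum linking idea (the field names and file name here are hypothetical, not data model fields): the metadata records the data file's name and a digest of its bytes, so any edit to the data file makes the mismatch detectable, which is exactly the recalculation caveat noted above.

```python
import hashlib

def file_fingerprint(data: bytes) -> str:
    """Hex SHA-256 digest used to tie a metadata record to one exact file."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical metadata entry pointing at an external data file.
# If the CSV is edited later, recomputing the fingerprint will no
# longer match the stored checksum, flagging the broken link.
metadata = {
    "filename": "mast_001.csv",
    "checksum": file_fingerprint(b"timestamp,WS_80m\n2023-01-01T00:00:00,7.1\n"),
}
```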
Hey there, if you haven't encountered it yet, it's an extension to
Thanks @rdmolony. Yeah, that could be useful for an extension of the SQL create table statements. Just a note of caution with timeseries databases: they are mostly geared towards retrieving the most recent time-based data, not what wind analysts want, which is the whole dataset in one go. They do shard the tables based on time, keeping the most recent in memory for fast retrieval, which is great if you want the last 5 days for data monitoring but maybe not so fast for 2 years of data. It all depends on what hardware you can throw at it (i.e. cost) and how you configure it, and as you say, TimescaleDB does give you extra benefits by being integrated into Postgres. Thanks also for contributing in yesterday's workshop; it is great to hear that Mainstream are further aligning with the data model.
There's no such thing as a free lunch! As always, it depends on your use case! If you're interested in our specific quirks so far with this tech, here are some notes:
In our case this compressed hypertable ingest constraint is "okay" for ingesting our logger data but not so convenient for 3rd-party data sources. To work around the query slowness on calibrated readings, we run periodic jobs to dump calibrated readings to files.

EDIT 1: I looped in the
EDIT 2: Calibrating generic readings approximately doubles the query time as
Thanks @rdmolony, brilliant info there on how you are solving this. One comment: given that Windographer will very soon be able to import the data model, why try to prepare the data, i.e. calibrate it, before loading it into Windographer? In theory, Windographer should be able to apply the calibrations for you, as the adjustments required should all be contained in the metadata. This could potentially save you the headache of trying to provide calibrated data. Just a thought, and we are digressing from the opening question about how the data model could accommodate the timeseries data. :) Sorry everyone else!

EDIT: This TimescaleDB discussion has now moved to #227.
Also, @rdmolony, if you are interested in joining our Task 43 wind resource data model working group, we would love to have you. Experience from someone who is implementing the data model would be very useful, and as you are a software engineer, your perspective on changes going forward could be valuable. We have a 1-hour call every second Thursday at 4 pm Irish time. There will be no homework unless you want it; it is mostly turn up and contribute within that 1 hour.
Moving to #227 @stephenholleran! |
Sure, add me to the email thread or however these things are organised :) |
Hi folks,
We're looking at using the data model as the basis for a single file that contains both metadata and measurements from a met mast or wind lidar.
The reason for this is to keep measurements together with metadata and so allow continuity when data are transferred from person to person, from process to process, or between organisations. In our experience, any approach that separates metadata from data runs the risk of losing the link. Our goal is to be able to create a single file that contains the data the user/process requires as an export from a file collection, database, web app or whatever.
As we understand it, the data model does not currently have a way to store time series data. (Thanks @stephenholleran for the patience / confirmation.)
🤔 How would you extend / modify the data model to include time series data?
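One possible shape for an answer, purely as a strawman and not an official Task 43 schema: keep the existing data model fields as-is and add a `data` object holding the time series in a compact columns-plus-rows layout (similar to pandas' `"split"` orientation discussed in the thread). All field and column names below are illustrative assumptions.

```python
import json

# Strawman self-describing export: existing data model metadata plus an
# embedded "data" table. "measurement_location" stands in for the real
# data model structure; "WS_80m" is a hypothetical measurement point name.
document = {
    "measurement_location": [{"name": "Mast 001"}],
    "data": {
        "columns": ["timestamp", "WS_80m"],
        "rows": [
            ["2023-01-01T00:00:00", 7.1],
            ["2023-01-01T00:10:00", 6.8],
        ],
    },
}

# The whole bundle serialises to a single JSON file.
text = json.dumps(document)
```

Because the column names in `data.columns` could be required to match measurement point names in the metadata, the link between a time series column and its full description would survive any transfer of the file.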