@stephenholleran @kersting @Dynorat @sdsmdp @heikowestermann Posting the introduction as a discussion for now. Much better support for images (relative to wiki). Look forward to hearing from you.
Task 43 WRA Data Model - An introduction
Amit Bohara ( Altosphere )
The need for a digital data model
A wind resource assessment helps quantify the long-term energy output of a wind farm. It is a core input to wind project planning, development and investment. A bankable-grade wind resource assessment often requires a prolonged campaign, lasting from one to several years, to measure the local wind resource. Such campaigns typically use tall measurement towers (e.g. from 30 m to 130 m) and, increasingly, remote sensing devices such as lidars.
The data from a wind measurement campaign comprises two principal components, each of which can come in varying formats:
Sensor measurements such as wind speed, wind direction and air temperature, logged directly from sensors at fixed intervals (typically every 10 minutes). They are commonly exchanged as raw logger files, often in proprietary formats. Between half a dozen and a dozen file formats are in active use in the industry;
Measurement metadata that describes the details of a measurement campaign such as the height at which the sensors were deployed, tower details, sensor calibration information and other important information related to sensors and how the measurement campaign was conducted overall. These are often exchanged as service forms by the field personnel in Word, Excel or PDF format.
The image below shows a portion of a typical service form used to exchange the details (metadata) of a measurement campaign.
Figure 1 - A portion of a typical field service form for a meteorological mast
These service forms are primarily designed for human readability. However, data in this format has several drawbacks:
It requires an expert or analyst to inspect, assess and manually transcribe.
Manual data entry means not only duplicated work for each team in the analysis chain but also increased risk from data entry errors. For instance, a calibration slope entered as 0.0415 instead of 0.045 will have a non-trivial impact on the energy assessment.
QA/QC and analysis processes are difficult to automate and scale when manual intervention is required.
Given the lack of industry standards, each organization has service forms that vary in nomenclature, formatting and the list of data reported. This makes data exchange between teams and organizations inefficient and may require several rounds of communication to resolve naming differences or missing information.
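To put the slope example above into numbers, here is a minimal sketch. It assumes the common linear transfer function (speed = slope × raw count + offset); the offset and the raw counts are made-up values for illustration, and only the two slopes come from the text.

```python
# Illustrative only: the offset and raw counts below are made-up values.
CORRECT_SLOPE = 0.045   # slope on the calibration sheet (value from the text)
TYPO_SLOPE = 0.0415     # slope as mistyped during manual entry (value from the text)
OFFSET = 0.25           # hypothetical offset in m/s


def wind_speed(raw_count: float, slope: float, offset: float = OFFSET) -> float:
    """Apply a linear transfer function: speed = slope * raw_count + offset."""
    return slope * raw_count + offset


# Relative error in the slope itself.
slope_error = (TYPO_SLOPE - CORRECT_SLOPE) / CORRECT_SLOPE

for raw_count in (100, 150, 200):  # hypothetical logger counts
    correct = wind_speed(raw_count, CORRECT_SLOPE)
    wrong = wind_speed(raw_count, TYPO_SLOPE)
    print(f"count={raw_count}: {correct:.2f} m/s vs {wrong:.2f} m/s "
          f"({(wrong - correct) / correct:+.1%})")

print(f"slope error: {slope_error:+.1%}")
```

The mistyped slope is about 7.8% too low. Because the power available in the wind scales roughly with the cube of wind speed, the resulting error in estimated energy can be noticeably larger than the error in wind speed.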
The issues discussed above can be broadly addressed by using digital data formats (such as JSON) that are friendlier for automated processing and by adopting a consistent reporting nomenclature and structure. The IEA Task 43 data model addresses how metadata related to a measurement campaign should be tracked and shared. Specifically, the Task 43 data model:
Encourages the exchange of data through digitization-friendly data formats such as JSON;
Provides a standard structure (schema) for naming and organizing information related to resource assessment.
A “digitization” friendly data format:
Data tracked in PDF or Word forms can be considered “digital” in the sense that they are direct analogues of paper forms and can be managed digitally. However, these file formats are designed for human readability and carry significant hidden data related to formatting and content presentation. This impedes extraction of the relevant data and makes them poor candidates for automated processing.
In contrast, data formats such as JSON are much friendlier for automated exchange of data between programs while maintaining basic readability. JSON is the de facto format for data exchange between a server and a browser on the web. Sensor metadata represented as a PDF form and as JSON are shown side by side below. The table on the left is clearly optimal for human readability, but processing it would require specialized libraries that can parse .doc or .pdf files, plus additional processing to extract the relevant information from text formatted as a table. The data represented in JSON (below right) provides basic readability but is much friendlier for automated processing and is supported by most programming languages. In fact, the JSON data below can be read and ingested in a few lines of code in Python or R.
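As a rough illustration of the "few lines of code" claim, the sketch below parses a small JSON snippet with Python's standard library. The snippet and its field names are hypothetical stand-ins, not the content of the figure and not the official Task 43 schema.

```python
import json

# Hypothetical sensor metadata; field names are illustrative, not the official schema.
sensor_json = """
{
  "sensor_type": "anemometer",
  "serial_number": "ABC-1234",
  "height_m": 80.0,
  "boom_orientation_deg": 270,
  "calibration": {"slope": 0.045, "offset": 0.25}
}
"""

sensor = json.loads(sensor_json)

# Values arrive as native Python types, ready for further processing.
height = sensor["height_m"]
slope = sensor["calibration"]["slope"]
print(f"{sensor['sensor_type']} at {height} m, slope={slope}")
```

No PDF or Word parsing library is needed, and nested information such as the calibration stays grouped with the sensor it belongs to.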
Need for a data reporting standard:
A format such as JSON provides a friendlier medium for automated data exchange and is the default format recommended by the WRA data model. However, a number of open questions remain on how the data should be reported. The following are a few concerns:
Minimum reporting requirements - The figure above shows a report snippet related to a sensor deployment. However, is there more information that could be included to add further context or avoid confusion? For instance, should it state whether the orientation is relative to ‘true’ or ‘magnetic’ north? Should ‘boom’ dimensions (diameter, length) be included? The lack of an industry standard on minimum reporting requirements has meant that each organization has to discover and construct its own variant.
Standard naming conventions - The resource assessment community has established a loosely defined nomenclature which has worked reasonably well given the absence of a standard naming convention. While the nomenclature may work reasonably well for consumption by experts, it can introduce unnecessary complexity in the development of data management tools. For instance, the ‘slope’ programmed into the logger could be called ‘logger slope’, ‘slope in logger’ or just ‘slope’ (with the other being ‘calibration slope’). A standardized nomenclature would make data storage and exchange between organizations and teams more consistent.
Data structure - The data related to a weather measurement campaign can be deceptively complex. A mast can have many sensor replacements over its lifetime. Occasionally, the sensor configuration may change in the logger (e.g. the programmed slope or offset may change). Sensor height or boom orientation can change during field service. Loggers may be replaced several times, and each carries its own unique settings. Understanding these complexities is one of the primary tasks of an analyst, as it has a direct bearing on the final results. A data structure that is expressive enough to capture the complex history of a measurement campaign can help teams pass on knowledge and reduce the cognitive and time burden of digesting the unique history of each measurement campaign.
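The "data structure" concern can be made concrete with a small sketch: each logger setting is stored together with the period during which it applied, so the setting in effect at any timestamp can be looked up. The class, field and function names here are hypothetical and the history is made up; this illustrates the idea rather than the structure the Task 43 schema actually defines.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class LoggerConfig:
    """One period during which a logger channel used a fixed slope/offset.

    Hypothetical structure for illustration; not the official schema.
    """
    date_from: datetime
    date_to: Optional[datetime]  # None means "still in effect"
    slope: float
    offset: float


def config_at(history: List[LoggerConfig], when: datetime) -> Optional[LoggerConfig]:
    """Return the configuration in effect at `when`, or None if none applies."""
    for cfg in history:
        if cfg.date_from <= when and (cfg.date_to is None or when < cfg.date_to):
            return cfg
    return None


# Made-up history: the programmed slope was changed during a field visit.
history = [
    LoggerConfig(datetime(2020, 1, 1), datetime(2020, 6, 15), slope=0.0415, offset=0.25),
    LoggerConfig(datetime(2020, 6, 15), None, slope=0.045, offset=0.25),
]

early = config_at(history, datetime(2020, 3, 1))
late = config_at(history, datetime(2021, 1, 1))
print(early.slope, late.slope)
```

Keeping the validity period next to each setting means an analysis tool can apply the right slope and offset to each record without someone re-reading the service history by hand.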
Task 43 WRA Data Model
The Task 43 WRA committee represents a number of industry stakeholders (OEMs, developers, consultants) who have worked jointly over an 18-month period to create a digital data reporting standard that addresses the issues described in the previous sections. The work has been released as a GitHub repository. Its main contributions are listed below:
Reporting structure & nomenclature - The WRA data model lists the important data fields that should be tracked as part of a resource assessment measurement campaign. For instance, the documentation includes pages that describe the information to be recorded for the mast, the mast geometry, each sensor, the mounting arrangement of a sensor and many more. For the technically inclined, the schema is documented in a single file as a JSON schema, which is the primary document of the data standard.
A sample document - The GitHub repository provides a sample JSON file that describes the complete metadata history of a mast and is a digitized analog of a service form that would typically be delivered in PDF or Word format. This example shows how individual pieces of information on different topics (e.g. mast geometry, sensors, measurement types, calibration etc.) can be organized into a single document that is friendly for automated processing.
Tools and guides - The GitHub repository includes a SQL script that deploys the WRA data model table structure in a PostgreSQL database. A Python Jupyter notebook demonstrating the ingestion and processing of JSON files that follow the WRA data model is also included.
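To give a feel for how a JSON document maps onto database tables, here is a minimal sketch that is separate from the repository's own script and notebook. It uses Python's built-in SQLite in place of PostgreSQL so that it runs without a server, and the table, column and key names are hypothetical rather than those of the WRA data model.

```python
import json
import sqlite3

# Hypothetical document; keys are illustrative, not the official schema.
doc = json.loads("""
{
  "mast_name": "Example Mast",
  "sensors": [
    {"serial_number": "A-001", "sensor_type": "anemometer", "height_m": 80.0},
    {"serial_number": "V-002", "sensor_type": "wind_vane", "height_m": 78.0}
  ]
}
""")

conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL here
conn.execute(
    "CREATE TABLE sensor ("
    "serial_number TEXT PRIMARY KEY, sensor_type TEXT, height_m REAL)"
)

# Named placeholders map each JSON object straight onto a table row.
conn.executemany(
    "INSERT INTO sensor VALUES (:serial_number, :sensor_type, :height_m)",
    doc["sensors"],
)
conn.commit()

rows = conn.execute(
    "SELECT serial_number, height_m FROM sensor ORDER BY height_m DESC"
).fetchall()
print(rows)
```

Because the JSON keys and the column names match, no manual transcription step sits between the exchanged file and the database.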
How can you use it
The Task 43 data model illustrates how metadata related to resource assessment data can be tracked in a digital-friendly format. There are a number of ways you can leverage the data model for your internal needs while helping accelerate the adoption of a common standard for the industry. The following are some suggestions, ordered by increasing complexity and effort:
Adopt the standardized nomenclature internally within your team. The documentation is easily accessible here.
Consider exporting your internal data as a WRA data model compliant JSON file alongside your “doc” and “pdf” forms. The GitHub repository provides a sample JSON file and a basic UI form to help you get started.
Use the WRA data model schema to create an internal database to track your measurement campaigns. The GitHub repository includes a SQL script that will get you started within an hour.
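For the export suggestion above, a simple completeness check before writing the JSON file can catch gaps such as a missing north reference. The sketch below is only an illustration: the list of required fields and the record are made up, and real compliance would be judged against the published JSON schema rather than this hand-written list.

```python
import json

# Hypothetical minimum fields; the real requirements live in the published JSON schema.
REQUIRED_FIELDS = ("serial_number", "sensor_type", "height_m", "north_reference")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in `record`."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]


# Made-up internal record that lacks the north reference.
record = {"serial_number": "V-002", "sensor_type": "wind_vane", "height_m": 78.0}

gaps = missing_fields(record)
if gaps:
    print("cannot export, missing:", gaps)
else:
    print(json.dumps(record, indent=2))
```

A check like this turns the "minimum reporting requirements" question into something a program can enforce at export time instead of something discovered later by the recipient.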
The Task 43 data model brings together the knowledge of many industry partners, who between them have decades of experience in the resource assessment field. It can be a resource for those looking to build the next generation of analysis tools, an enabler of seamless data exchange between industry partners or a learning resource for those entering the industry.
(Bonus) Our experience
At Altosphere, we are working to build a data platform for the management and processing of resource assessment data for wind and solar plants. We are big believers in open data standards and have contributed to, as well as benefited from, the Task 43 data model. The joint industry co-operation has meant the overall model is significantly better than anything we would have produced internally. This has enabled us to invest in some nifty visualizations that automatically render data in the Task 43 model format into figures that make the information easily digestible. Text and tables carry the essential details, but a picture is hard to beat, and adopting the Task 43 data model has freed us to focus on the user experience.