
This data specification harmonizes contextual data to support H5N1 monitoring with PHA4GE. It provides standardized, ontology-based fields and terms for comprehensive surveillance. Users can implement it via their preferred tools, including DataHarmonizer. Field and reference guides, along with curation and new term request SOPs, support its use.


pha4ge/HPAI_Contextual_Data_Specification

 
 


status

Disclaimer: This repository is in an early draft stage and is actively being developed. Expect frequent changes, updates, and potentially significant revisions as the project progresses. Feedback and contributions are welcome, but please be aware of the evolving nature of the specification at this stage.

The HPAI Contextual Data Specification

About

This draft data specification harmonizes contextual data to support the monitoring of Highly Pathogenic Avian Influenza virus (HPAI). Developed in collaboration with Public Health Alliance for Genomic Epidemiology (PHA4GE), it provides standardized, ontology-based fields and terms aimed at facilitating comprehensive, accurate, and consistent data collection. Currently implemented through the DataHarmonizer tool, the specification is designed to be adaptable for use in other tools. Supporting resources include detailed field and reference guides, as well as SOPs for data curation and new term requests.

Contribution Guidelines

We encourage feedback and contributions to improve the specification. However, please note that this is a work in progress, and the structure and content are likely to change as the project evolves. If you would like to contribute or propose changes, please open an issue or submit a pull request, or alternatively contact Emma Griffiths at [email protected]

What are ontologies and how do they improve data quality?

Labs collect, encode and store information in different ways. They use different fields, terms and formats, they categorize variables differently, and the meanings of words change depending on the focus of the organization. Think of the word “plant”: to someone in agriculture, “plant” could mean an organism that carries out photosynthesis, while a food regulator might understand “plant” to mean a factory where food products are made. This variability makes comparing, integrating and analyzing data generated by different organizations as difficult as comparing apples, oranges and bananas.

Ontologies are collections of controlled vocabulary arranged in a hierarchy, where all the terms are linked using logical relationships. Ontologies are open source and meant to represent “universal truth” as much as possible (so they are not tied to one organization’s vocabulary or use case). Ontologies encode synonyms, which enables mapping between the specific languages used by different organizations, and every term in the ontology is assigned a globally unique and persistent identifier. Using ontology terms to standardize HPAI contextual data not only makes data more interoperable by providing a common language, it also helps to make contextual data FAIR (Findable, Accessible, Interoperable, Reusable).
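To illustrate how synonyms and persistent identifiers support mapping between labs, here is a minimal Python sketch. It is not part of the specification: the `EX:` identifiers, labels and synonym lists are hypothetical placeholders rather than real ontology terms, and `build_lookup` and `standardize` are illustrative helper names.

```python
# Hypothetical mini-vocabulary. The "EX:" IDs, labels and synonyms are
# placeholders for illustration only, not terms from a real ontology.
TERMS = {
    "EX:0000001": {"label": "chicken", "synonyms": ["Gallus gallus", "domestic chicken"]},
    "EX:0000002": {"label": "wastewater", "synonyms": ["sewage", "waste water"]},
}


def build_lookup(terms):
    """Map every label and synonym (case-insensitively) to its term ID."""
    lookup = {}
    for term_id, entry in terms.items():
        for name in [entry["label"], *entry["synonyms"]]:
            lookup[name.casefold()] = term_id
    return lookup


def standardize(value, lookup):
    """Return the term ID for a lab's local wording, or None if unmapped."""
    return lookup.get(value.strip().casefold())


lookup = build_lookup(TERMS)

# Two labs use different words for the same concept; both resolve to one ID.
lab_a = standardize("Domestic chicken", lookup)
lab_b = standardize("gallus gallus ", lookup)
unmapped = standardize("broiler", lookup)
```

Values that return `None` would be candidates for manual curation or a new term request, rather than being silently dropped.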

The HPAI Contextual Data Specification Package

This specification is currently implemented via a DataHarmonizer validation template, with accompanying Field and Term reference guides (which provide definitions and additional specific guidance) and a curation Standard Operating Procedure (SOP). Please note that the specification is not limited to the DataHarmonizer and can be implemented in any data capture tool. Refer to the field and term reference guides for the data types and picklists.

New terms and/or term changes can be requested through GitHub using the issue request forms, with additional guidance on how to do so outlined in the New Term Request (NTR) SOP. These resources are available in the files of this repository and are listed below under Package Contents.

Version Control

Please note that the specification is under active development and will be updated periodically to address user needs. Versions follow the format x.y.z, where:

x = Field level changes
y = Term value / ID level changes
z = Definition, guidance, example, formatting, or other uncategorized changes

Descriptions of changes are provided in release notes for every new version.
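As a rough sketch of how a downstream tool might interpret the x.y.z scheme above, the following Python snippet compares two version strings and reports the highest level of change between them. The helper names and the version numbers used are hypothetical; they do not correspond to actual releases of the specification.

```python
from typing import NamedTuple


class SpecVersion(NamedTuple):
    field: int  # x: field level changes
    term: int   # y: term value / ID level changes
    other: int  # z: definition, guidance, example, formatting, other changes


def parse_version(text: str) -> SpecVersion:
    """Parse an 'x.y.z' string into its three integer components."""
    parts = text.strip().split(".")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"expected a version like 'x.y.z', got {text!r}")
    return SpecVersion(*(int(p) for p in parts))


def change_level(old: str, new: str) -> str:
    """Report the most significant component that differs between versions."""
    a, b = parse_version(old), parse_version(new)
    if a.field != b.field:
        return "field"
    if a.term != b.term:
        return "term"
    if a.other != b.other:
        return "other"
    return "none"


# Hypothetical version numbers, for illustration only.
level = change_level("1.2.0", "1.3.1")
```

A tool holding data collected under an older template could use a check like this to decide whether it needs to re-map fields, re-validate picklist values, or only refresh guidance text. The release notes remain the authoritative description of each change.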

Package Contents

Data Collection Template

Field and Term Reference Guides

The HPAI contextual data specification has been subset into four use-case templates: environmental samples, food samples, wastewater samples and host-specific samples.

XLSX version

Master Field and Term Reference Guide

PDF version

Curation and DataHarmonizer SOP

New Term Request (NTR) SOP

Contacts

For more information and/or assistance, contact Emma Griffiths at [email protected] or submit a repository issue request.

License

Pending / To Be Determined

Acknowledgements

Brought to you by the Centre for Infectious Disease Genomics and One Health (CIDGOH) and the Public Health Alliance for Genomic Epidemiology (PHA4GE).
