last updated 2021-12-17 (BH)
Welcome to the GitHub project for IUPAC Project 2019-031-1-024, Development of a Standard for FAIR Data Management for Spectroscopic Data.
Check out the GitHub Pages site for this project to see what we are talking about.
This is an active project. Our (admittedly rather minimal) progress reports can be found in documents/reports.
Our current working specification can be found at this site under documentation/specifications.
This GitHub project provides public copies of all presentations and publications of the IUPAC Project, a reference implementation of the Standard as a Java library, and a reference Java implementation of an "IUPAC FAIRSpec data and metadata extractor". This GitHub project is currently under intensely active development. It is reasonably well tested (as of 11/2024) and, though public, is meant only for demonstration purposes. Implement these preliminary standards with this in mind, as they change periodically when new issues emerge.
The principal goal of the project is to define standardized metadata associated with complex collections of spectroscopic data in the area of chemistry -- NMR, IR, Raman, MS, etc. The specification is modular and has been worked out primarily in the area of NMR spectroscopy at this time.
The IUPAC FAIRSpec Finding Aid, represented as JSON (in this case) or XML (leaving that for others for now), together with the extracted collection, forms the basis of what we are calling "FAIR Data Management of Spectroscopic Data."
If you just want to get an idea of what the "data extractor" does without installing anything yourself, see the demos at [our GitHub website](https://iupac.github.io/IUPAC-FAIRSpec/index.html).
The five principles that underlie development of the IUPAC FAIRSpec standard are given below.
1. FAIR management of data should be an ongoing concern.
   - A. FAIR management of data must be an explicit part of research culture.
   - B. FAIR management of data should be of intrinsic value.
   - C. Good data management requires distributed curation.
   - D. Experimental work is by nature iterative.
2. Context is important.
   - A. Digital objects are generally part of a collection.
   - B. Chemical properties are related to chemical structure.
   - C. Data relationships are diverse and develop over time.
   - D. FAIR management of data should allow for validation.
3. FAIR management of data requires curation.
   - A. Data reuse relies upon practical findability.
   - B. Data has to be organized to be accessible.
   - C. Data interoperability requires well-designed metadata.
   - D. Value is in the eye of the reuser.
4. Metadata must be registered and standardized.
   - A. Register key metadata.
   - B. Assign a variety of persistent identifiers.
   - C. Enable metadata crosswalks.
   - D. Allow for value-added benefits.
5. FAIR data management standards should be modular, extensible, and flexible.
   - A. Modularity allows specialization.
   - B. Design to adapt to future needs.
   - C. Respect digital diversity.
   - D. All data formats should be valued.
The code here is an Eclipse Java project. If you want to clone it, feel free. Check it out. Run the test. Even suggest changes. Contribute. Since this is still a preliminary project, don't get too frustrated if it doesn't work for you. It probably means I have forgotten to mention some aspect of its implementation. Please contact Bob Hanson ([email protected]) if you want some help. We'd like to hear from you.
The reference implementation consists of two main parts: a Java library of mostly abstract classes that define the basics of the IUPAC FAIRSpec schema, and an implementation of a "data and metadata extractor" that can produce IUPAC FAIRSpec Collections and their associated IUPAC FAIRSpec Finding Aids in JSON format.
The basic working code (https://github.com/IUPAC/IUPAC-FAIRSpec/blob/main/src/main/java/com/integratedgraphics/ifd/ExtractorTest.java) takes a monolithic ZIP file (30-200 MB), provided by authors as supporting information for manuscripts accepted by the Journal of Organic Chemistry and Organic Letters, and extracts Digital Objects from it into a Digital Collection. As it does this, it creates an internal Java data model in the form of an IFSSpecDataFindingAid. When it is done, it serializes this finding aid and writes it to a file.
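To make the idea of "an internal data model that is serialized to a file" concrete, here is a minimal sketch, assuming a finding aid can be reduced to a collection name plus a list of digital objects, each with a relative path and a data type. The class MiniFindingAid, its nested DigitalObject record, and the JSON field names are hypothetical; they are not the IFSSpecDataFindingAid class or the IUPAC FAIRSpec JSON schema. The JSON is written by hand only to keep the sketch free of third-party dependencies.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, minimal model of a finding aid: a named collection plus a list of
 * digital objects. Names and JSON keys are illustrative only, NOT the IUPAC FAIRSpec
 * schema and NOT the IFSSpecDataFindingAid class in this repository.
 */
class MiniFindingAid {

    /** One digital object in the extracted collection (hypothetical record). */
    record DigitalObject(String path, String type) {}

    private final String collectionName;
    private final List<DigitalObject> objects = new ArrayList<>();

    MiniFindingAid(String collectionName) {
        this.collectionName = collectionName;
    }

    /** Records one extracted file by its path relative to the collection root. */
    void add(String path, String type) {
        objects.add(new DigitalObject(path, type));
    }

    /** Quotes a string for JSON, escaping quotes, backslashes, and control characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.append('"').toString();
    }

    /** Serializes the finding aid as compact JSON, objects in insertion order. */
    String toJson() {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"collection\":").append(quote(collectionName)).append(",\"objects\":[");
        for (int i = 0; i < objects.size(); i++) {
            DigitalObject o = objects.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"path\":").append(quote(o.path()))
              .append(",\"type\":").append(quote(o.type())).append('}');
        }
        return sb.append("]}").toString();
    }
}
```

For example, adding one object with path "compound1/nmr/1H.zip" and type "NMR" to a collection named "demo" serializes to {"collection":"demo","objects":[{"path":"compound1/nmr/1H.zip","type":"NMR"}]}. Writing the result to a file is then a single java.nio.file.Files.writeString call.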
The Java test class is https://github.com/IUPAC/IUPAC-FAIRSpec/blob/main/src/main/java/com/integratedgraphics/ifd/ExtractorTest.java. The extractor test reads one or more "extraction scripts" from /extract/ subdirectories and uses those to parse a Figshare zip file that was deposited by the American Chemical Society as part of their FAIR Data initiative.
As it parses the extraction script, it:
- opens one or more Figshare ZIP files
- extracts Digital Objects into an "IFS FAIR Data Collection" in the site/ifs directory (not present here because of .gitignore)
- builds an IFSSpecDataFindingAid internal representation of the collection
- when done, generates a JSON serialization of the IFSSpecDataFindingAid object
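The extraction steps above can be sketched with the standard java.util.zip classes. This is a minimal sketch, assuming a single regular expression is enough to select the files to keep; the actual extractor is driven by extraction scripts, which are far more expressive. The class ZipExtractSketch and its extract method are hypothetical. The returned list of relative paths is the kind of information a finding aid would then record before being serialized.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/**
 * Hypothetical sketch of the extraction loop: walk a ZIP archive, copy the entries
 * whose names match a pattern into a target directory, and return their relative
 * paths. The regex filter stands in for the project's extraction scripts.
 */
class ZipExtractSketch {

    static List<String> extract(InputStream zipStream, Path targetDir, Pattern keep) {
        List<String> extracted = new ArrayList<>();
        Path root = targetDir.toAbsolutePath().normalize();
        // closing the ZipInputStream also closes zipStream
        try (ZipInputStream zis = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                String name = entry.getName();
                if (entry.isDirectory() || !keep.matcher(name).matches()) {
                    continue; // skip directories and files the filter does not select
                }
                Path out = root.resolve(name).normalize();
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + name);
                }
                Files.createDirectories(out.getParent());
                // ZipInputStream reports end-of-entry as EOF, so this copies one entry only
                Files.copy(zis, out, StandardCopyOption.REPLACE_EXISTING);
                extracted.add(name);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return extracted;
    }
}
```

The startsWith check guards against entries whose names contain "../" (the so-called "zip slip" problem), a reasonable precaution when unpacking archives downloaded from an external repository.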
Before you run the test, take a look at the test's main() method and adjust the parameters there if you want. They include:
- first: the first test to run (0 to 12)
- last: the last test to run (0 to 12)
- targetDir: leave this as "../site/ifs"
- sourceDir: you can indicate a local source directory to use instead of Figshare to save download time. If you do that, you need to save the Figshare nnnnnnnn.zip file there.
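One way to read these parameters is as an inclusive range of test indices plus an optional local cache of Figshare ZIP files. The sketch below, with hypothetical names (RunPlanSketch, Plan, plan), makes that reading explicit: for each test from first to last, it uses the ZIP in sourceDir if a copy is there and otherwise marks the test as needing a download. It is an assumption about how the parameters interact, not the logic of ExtractorTest.main(), and it deliberately does not model the download itself.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: plan tests first..last (inclusive), preferring a locally
 * saved ZIP in sourceDir over a download. Not the ExtractorTest API.
 */
class RunPlanSketch {

    /** Where one test's ZIP would come from; localZip is null if it must be downloaded. */
    record Plan(int test, String zipName, Path localZip) {
        boolean needsDownload() {
            return localZip == null;
        }
    }

    /** zipNames.get(i) is the (hypothetical) ZIP file name used by test i. */
    static List<Plan> plan(int first, int last, Path sourceDir, List<String> zipNames) {
        if (first < 0 || last >= zipNames.size() || first > last) {
            throw new IllegalArgumentException("need 0 <= first <= last < " + zipNames.size());
        }
        List<Plan> plans = new ArrayList<>();
        for (int i = first; i <= last; i++) {
            String name = zipNames.get(i);
            Path local = (sourceDir == null) ? null : sourceDir.resolve(name);
            // fall back to a download only when no local copy exists
            boolean haveLocal = local != null && Files.isRegularFile(local);
            plans.add(new Plan(i, name, haveLocal ? local : null));
        }
        return plans;
    }
}
```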
There are several other flags that can be set. The demo is not set up for batch command-line operation, and it is not built as a JAR file. It is simply an Eclipse Java project right now.
After you run the test, the /site/ifs directory will be populated, and the /html/demo.htm file should work. Since this HTML file opens files on your local machine, be sure your browser is set up to allow local file reading.
If this doesn't work, before you get frustrated, talk with us.
Bob Hanson [email protected]