
high-level compliance checker #740

Open
ocehugo opened this issue May 4, 2021 · 0 comments
Labels
Type:code_repo_consistency code-repo wise quirks Type:enhancement General enhancements


ocehugo commented May 4, 2021

The Toolbox has no dedicated function to evaluate whether a dataset (or a netcdf file) complies with a defined standard. Compliance currently relies on coding practices, manual inspection, or individual testing of attributes/variables/dimensions.

At the moment, the toolbox creates netcdf fields from templates (via makeNetCDFCompliant), and some basic validations occur at export time (in exportNetcdf). Most validations are ad hoc and run at export time, and the templates are compliant only to a certain degree since fields still need to be filled in. Thus, protection against invalid inputs is limited and conformance to the conventions is not strictly guaranteed (see #737 as an example).

The new +IMOS package improved on this by controlling the creation of variables/dimensions through argument inspection and cross-checking, but this is not enough, since datasets are modified after creation throughout the codebase.

Ideally, an interface for adding things to a dataset would be the best solution (e.g. CRUD-like). Another option would be to explicitly validate data fields before exporting. This has two avenues: a. evaluate the state of the toolbox dataset struct, or b. evaluate the created netcdf file. The latter, however, is already done by python tools (cc-imos-checker/cf-checker), so a further option is to use them from within matlab.

All four options have pros and cons:

  1. Creating a CRUD-like interface that provides validation and always keeps a dataset valid (conventional) is powerful, but this kind of abstraction incurs a deep redesign. The number of code changes is large and, given the lack of test coverage, quite costly. For example, the IMOS package is still barely used since tests for older parsers/functions are nonexistent.

  2. Implementing a schema validator to verify a dataset before exporting is very close to what cc-imos-checker does and would warn/block the user early, before files are created. However, the tool should be generic enough to evaluate different validation schemas (e.g. imos, cf, or anything else).

  3. Implementing a schema validator to be run after exporting a dataset to netcdf is the same as writing a new cc-imos-checker in matlab. This is obviously more wasteful than option 2 since it involves creating files, reading them back, and using the matlab netcdf API/interface.

  4. Implementing calls to the cf-checker and cc-imos-checker python code at export time would not require any duplicated coding effort but would require distributing that software with the toolbox. This involves managing their versions and installation, interfacing the proper calls, and handling all the cross-language requirements.
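To make the "interfacing the proper calls" part of option 4 more concrete, here is a minimal sketch in Python of a thin wrapper that runs an external checker on an exported file and captures its report. Everything named here (`run_checker`, `CheckResult`, the timeout) is hypothetical. The actual command lines of cf-checker and cc-imos-checker are not stated in this issue, so the command is passed in as a parameter, and the sketch assumes the checker signals failure with a non-zero exit code, which would need to be confirmed for each tool.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CheckResult:
    ok: bool          # True only if the checker ran and exited with code 0
    returncode: int   # checker exit code, or -1 if it could not be run
    output: str       # combined stdout/stderr report


def run_checker(command: Sequence[str], nc_path: str, timeout: float = 300.0) -> CheckResult:
    """Run an external checker command on nc_path and capture its report.

    `command` is the checker executable plus its options (an assumption:
    the file path is accepted as the last positional argument).
    """
    try:
        proc = subprocess.run(
            [*command, nc_path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except FileNotFoundError:
        # The checker is not installed or not on PATH.
        return CheckResult(False, -1, f"checker not installed: {command[0]}")
    except subprocess.TimeoutExpired:
        return CheckResult(False, -1, f"checker timed out after {timeout} s")
    return CheckResult(proc.returncode == 0, proc.returncode, proc.stdout + proc.stderr)


# Stand-in for a real checker: the Python interpreter itself, exiting with 0.
demo = run_checker([sys.executable, "-c", "raise SystemExit(0)"], "example.nc")
print(demo.ok)
```

From the toolbox side, a script like this (or the checker command itself) could presumably be invoked through matlab's `system` function, which returns the exit status and the command output. How that behaves in the binary package is exactly the open question raised for this option.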

I believe 4 should be investigated first, followed by 2. The only requirement for 4 is to investigate the cross-language support and how that affects the different distribution avenues used in the toolbox (mostly the binary package).

For 2, we already have some related functionality (e.g. Util/Schema). The bulk of the work is selecting the right abstraction with matlab objects and rewriting the rules from the cc-imos-checker/cf-checker code in a declarative way.

@ocehugo ocehugo added Type:enhancement General enhancements Type:code_repo_consistency code-repo wise quirks labels May 4, 2021