Commit
Adding some text, inspired by @Tasilee's comment about idiot proofing (and not being able to do it), on weighing conflicts between principles of test design in the principles section of the supplement. Rebuilt generated documents.
chicoreus committed Feb 21, 2025
1 parent 739a9ff commit 7d5f3ee
Showing 2 changed files with 14 additions and 6 deletions.
10 changes: 7 additions & 3 deletions tg2/_review/build/templates/supplement/supplement-header.md
@@ -643,13 +643,13 @@ The bdqffdq ontology provides a description of the specifications for a test, a

Tests should be informative when applied to data found in the wild. Each Test provides information about some aspect of the 'quality' / 'fitness for use' of the data for some specified purpose. While the Tests have been developed to address a wide variety of needs, we accept that some tests may, of course, not be applicable in some domains, and some tests may be irrelevant in some contexts. A suite of tests assembled for some Use Case should all be informative for that Use Case. Tests are widely applicable, address quality needs, and must be tied to at least one identified Use Case. Test development should not start with "here's a term, these are sane values it could hold", but rather with "here is an identified need for quality in the values in this term".

Given that the Tests are based on Darwin Core terms, we 'piggy-back' on the community acceptance and understanding of Darwin Core. Tests evaluate values against the normative definition of each term and against the non-normative guidance provided with it (the Notes/Comments), and may bring in other concepts from the real world (e.g. practical limits on elevation and depth) or vocabularies. In general, tests are only possible for terms where the value is bounded by real-world extents or by an agreed vocabulary.

#### 3.11.2 Principles

Informative. In general, Tests have power when they are likely to result in neither 0% nor 100% of all record hits. That is, a Validation should not in general be expected to return NOT_COMPLIANT for almost all data in the wild, or COMPLIANT for all wild data. Tests may, however, identify non-compliant data in a large portion of cases where we highlight an important point about quality that should be, but is not currently, met by the community, e.g. where dcterms:license is EMPTY; that is, where a large portion of the community is not providing values for a term important for data quality, or is providing values inconsistent with best practices.

Simple. Each Test specification must be as simple as possible. The more complex a Test, the more difficult it will be to implement and to evaluate against real-world data. While the human-readable Description of a test may be open to interpretation, Test Specifications were written to allow for no ambiguity. In some cases this has meant that Specifications are lengthy, but this is required if implementation and interpretation are to be accurate. For example, [VALIDATION_GEOGRAPHY_STANDARD](https://github.com/tdwg/bdq/issues/139) was proposed, but after discussion was abandoned and tagged as DO NOT IMPLEMENT because it involved too much complexity to phrase a clean and simple specification. Conversely, [VALIDATION_DAY_STANDARD](https://rs.tdwg.org/bdqcore/terms/47ff73ba-0028-4f79-9ce1-ee7008d66498) asks one simple question: is the value of dwc:day an integer in the range 1 to 31?
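
To illustrate how directly such a simple specification can be implemented, here is a minimal sketch of VALIDATION_DAY_STANDARD in Python (the function name and the dictionary shape of the response are illustrative; the status and result labels follow the Response vocabulary used by the tests):

```python
def validation_day_standard(day):
    """Sketch of VALIDATION_DAY_STANDARD: is dwc:day an integer in the range 1 to 31?"""
    if day is None or day.strip() == "":
        # No value to evaluate, so the internal prerequisites of the test are not met.
        return {"status": "INTERNAL_PREREQUISITES_NOT_MET", "result": None,
                "comment": "dwc:day is EMPTY"}
    try:
        value = int(day.strip())
    except ValueError:
        # A value is present but is not an integer, e.g. "1st" or "15.5".
        return {"status": "RUN_HAS_RESULT", "result": "NOT_COMPLIANT",
                "comment": "dwc:day is not an integer"}
    result = "COMPLIANT" if 1 <= value <= 31 else "NOT_COMPLIANT"
    return {"status": "RUN_HAS_RESULT", "result": result,
            "comment": "dwc:day evaluated against the range 1 to 31"}
```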

Atomic. Specifications should be as atomic as possible, dealing with only a single evaluation. For example, evaluating dwc:minimumDepthInMeters should involve separate tests to evaluate whether a value is present, whether the value is numeric, whether the value is within a reasonable range, and whether the value of dwc:minimumDepthInMeters is less than or equal to that of dwc:maximumDepthInMeters. These should not be combined into a single test, though tests that rely upon an interpretable value may build on logic that evaluates as INTERNAL_PREREQUISITES_NOT_MET if no value is present, or if the value is not interpretable, before proceeding to the central assertion of the test. Each test should be designed to be independent of other tests. For example, the test [ISSUE_DAYMONTH_SWAPPED](https://github.com/tdwg/bdq/issues/37) was proposed, but abandoned and tagged as DO NOT IMPLEMENT, in part because of a comment identifying potentially entangled causes and a lack of atomicity: "It would seem this test assumes that there is a transposition as opposed to just an error in the month field. I'd say it's risky to assume that, and that this could lead to greater issues." [comment by Paula Zermoglio](https://github.com/tdwg/bdq/issues/37#issuecomment-357280140). Conversely, [VALIDATION_DAY_STANDARD](https://rs.tdwg.org/bdqcore/terms/47ff73ba-0028-4f79-9ce1-ee7008d66498) and [VALIDATION_DAY_INRANGE](https://rs.tdwg.org/bdqcore/terms/8d787cb5-73e2-4c39-9cd1-67c7361dc02e) ask related, but different, questions about the value in dwc:day, each separating out a distinct element of quality.
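
A sketch of how the depth evaluations described above might be factored into atomic tests, each asking a single question (the function names and range bounds are illustrative; a shared interpretation step keeps each test independent while handling uninterpretable values consistently):

```python
def interpret_numeric(value):
    """Shared helper: return a float if the value is interpretable as a number, else None."""
    try:
        return float(value.strip())
    except (AttributeError, ValueError):
        return None

def validation_mindepth_inrange(min_depth, lower=0.0, upper=11000.0):
    """One atomic question: is dwc:minimumDepthInMeters within a reasonable range?"""
    value = interpret_numeric(min_depth)
    if value is None:
        # Absent or uninterpretable value: do not guess, report the prerequisite failure.
        return "INTERNAL_PREREQUISITES_NOT_MET"
    return "COMPLIANT" if lower <= value <= upper else "NOT_COMPLIANT"

def validation_mindepth_lessthan_maxdepth(min_depth, max_depth):
    """A separate atomic question: is the minimum depth less than or equal to the maximum?"""
    low, high = interpret_numeric(min_depth), interpret_numeric(max_depth)
    if low is None or high is None:
        return "INTERNAL_PREREQUISITES_NOT_MET"
    return "COMPLIANT" if low <= high else "NOT_COMPLIANT"
```

Neither function consults the result of the other, so each can run, fail, or be omitted from a suite independently.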

@@ -665,6 +665,10 @@ Compare data with data. When testing consistency between or among terms, values

Avoid validation of verbatim terms. Tests should not attempt to validate the content of literal verbatim terms (e.g., dwc:verbatimLocality), though such terms may be informative for validation or amendment of other terms.

Proven by implementation, validated with data. We repeatedly found that an initial phrasing of the specification of a test did not survive implementation, and that subsequent phrasings did not survive exposure to multiple test values. Test specifications written without implementation and without validation against data from the wild often conceal hidden assumptions, missing elements of logic, and incorrect assumptions about how a test implementation might behave. Actually producing a test implementation, and testing that implementation against input data values and expected response values (with cases provided by more than one person), have proved essential for test specification development.
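
A sketch of the kind of input value / expected response cases used to exercise an implementation, here run against the validation_day_standard sketch above (the cases shown are illustrative, not drawn from the published validation data):

```python
# Each case pairs an input value for dwc:day with the expected response.
cases = [
    ("15",  "RUN_HAS_RESULT", "COMPLIANT"),
    ("31",  "RUN_HAS_RESULT", "COMPLIANT"),
    ("32",  "RUN_HAS_RESULT", "NOT_COMPLIANT"),       # present, integer, out of range
    ("1st", "RUN_HAS_RESULT", "NOT_COMPLIANT"),       # present, not an integer
    ("",    "INTERNAL_PREREQUISITES_NOT_MET", None),  # no value to evaluate
]

for value, expected_status, expected_result in cases:
    response = validation_day_standard(value)
    assert response["status"] == expected_status, (value, response)
    assert response["result"] == expected_result, (value, response)
```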

Conflicts between these principles can be expected. For example, [AMENDMENT_GEODETICDATUM_ASSUMEDDEFAULT](https://rs.tdwg.org/bdqcore/terms/7498ca76-c4d4-42e2-8103-acacccbdffa7) and its related validation proved difficult to develop, with conflicts between simplicity and the power of the test, and between the guidance in the non-normative Darwin Core Notes/Comments for the term and best practices. Requirements for data quality arising from spatial analysis impose an expectation that dwc:geodeticDatum be explicit about what is being said about the associated dwc:decimalLatitude and dwc:decimalLongitude values, and this is best expressed as an EPSG code in the form Authority:Number (e.g. EPSG:4326). Much of the data in the wild, however, is in the form of less explicit text strings (e.g. WGS84), thus these tests are aspirational in seeking to drive the community to best (explicit) practices. A simple validation could ask whether dwc:geodeticDatum matches any EPSG code in the form Authority:Number, but the definition of dwc:geodeticDatum explicitly limits values to those that represent the Coordinate Reference System for the geographic coordinates expressed by dwc:decimalLatitude and dwc:decimalLongitude, or a datum or ellipsoid appropriate for such. This means that only a subset of EPSG codes have quality, adding complexity to the test definition and implementation. The GBIF vocabulary for Geodetic Datum does not help here: it does not assert the authority for its numeric values (both ESRI and EPSG authorities are in use by consumers of spatial data), and it does not provide a means to identify which values are appropriate for dwc:geodeticDatum, given the limitation of the definition to geographic coordinate reference systems. Furthermore, the georeference best practice guide (Chapman and Wieczorek 2020) specifies the use of "not recorded" for unknown values (under specified conditions), while the Notes/Comments on dwc:geodeticDatum recommend the use of "unknown"; here we support the best practice guide, and need to include the explicit value "not recorded" as valid for dwc:geodeticDatum, as well as a specified subset of EPSG codes. Thus each test specification must be "as simple as possible", with tradeoffs against other principles likely increasing the minimum complexity.
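
A sketch of the compromise described above, assuming a curated subset of EPSG codes known to be appropriate for dwc:geodeticDatum (the two codes listed are placeholders for the full curated list, which is not reproduced here):

```python
import re

# Hypothetical curated subset of EPSG codes appropriate for dwc:geodeticDatum
# (geographic coordinate reference systems, datums, and ellipsoids only).
APPROPRIATE_EPSG_CODES = {"EPSG:4326", "EPSG:4267"}  # placeholders, not the full list

def validation_geodeticdatum_standard(datum):
    if datum is None or datum.strip() == "":
        return "INTERNAL_PREREQUISITES_NOT_MET"
    value = datum.strip()
    # "not recorded" is accepted, following the georeferencing best practice guide
    # rather than the "unknown" suggested in the Darwin Core Notes/Comments.
    if value == "not recorded":
        return "COMPLIANT"
    # Values must be explicit Authority:Number codes from the appropriate subset;
    # bare strings such as "WGS84" are not compliant, which is the aspirational part.
    if re.fullmatch(r"EPSG:\d+", value) and value in APPROPRIATE_EPSG_CODES:
        return "COMPLIANT"
    return "NOT_COMPLIANT"
```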

#### 3.11.3 Consistent behaviors

Leading/trailing whitespace or non-printing characters in values shall cause validations against controlled vocabularies to be NOT_COMPLIANT, shall be proposed for removal by amendments, and may be ignored when evaluating numeric values.
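
A sketch of these behaviors, with an illustrative two-term vocabulary and hypothetical function names (the AMENDED/NOT_AMENDED labels follow the response vocabulary for amendments; an amendment proposes a change rather than silently applying it):

```python
VOCABULARY = {"PreservedSpecimen", "HumanObservation"}  # illustrative subset

def validation_in_vocabulary(value):
    # Exact match required: leading/trailing whitespace or non-printing
    # characters make an otherwise valid value NOT_COMPLIANT.
    return "COMPLIANT" if value in VOCABULARY else "NOT_COMPLIANT"

def amendment_trim_to_vocabulary(value):
    # The amendment proposes removal of the surrounding whitespace.
    trimmed = value.strip()
    if trimmed != value and trimmed in VOCABULARY:
        return {"status": "AMENDED", "proposed": trimmed}
    return {"status": "NOT_AMENDED", "proposed": None}

def validation_is_numeric(value):
    # Numeric evaluation may ignore surrounding whitespace.
    try:
        float(value.strip())
        return "COMPLIANT"
    except (AttributeError, ValueError):
        return "NOT_COMPLIANT"
```

For example, validation_in_vocabulary(" PreservedSpecimen ") is NOT_COMPLIANT, amendment_trim_to_vocabulary(" PreservedSpecimen ") proposes "PreservedSpecimen", and validation_is_numeric(" 42 ") is COMPLIANT.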
10 changes: 7 additions & 3 deletions tg2/_review/docs/supplement/index.md
