class: TaxonConcept #1

Closed
nielsklazenga opened this issue Dec 28, 2020 · 63 comments
Labels: class:TaxonConcept (Organized in the TaxonConcept class); class (RDF type of term is 'class')

Comments

@nielsklazenga
Member

nielsklazenga commented Dec 28, 2020

TaxonConcept (class)

Label: Taxon Concept
Definition: The underlying meaning, or referential extension, of a scientific name as stated by a particular author in a particular publication. It represents the author’s full-blown view of how the name reaches out to observed or unobserved objects in nature (beyond statements about type specimens). It is a direct reflection of what has been written, illustrated, and deposited by a taxonomist, regardless of his or her theoretical orientation (Franz & Peet 2009).
Comments:

Mapping

TCS 1: DataSet/TaxonConcepts/TaxonConcept
TDWG Ontology: http://rs.tdwg.org/ontology/voc/TaxonConcept#TaxonConcept
Darwin Core:

@nielsklazenga
Member Author

nielsklazenga commented Jan 2, 2021

I have adopted the text from Franz & Peet (2009) for now, but I am not sure it was intended as a definition, as it is more a description of the relationship between a taxon concept and its name. For me, defining the taxon concept by reference to a taxon name is like putting the world upside-down, so I would prefer something like this:

An hypothesis, assertion or opinion about the delimitation of a taxon.

I had 'a taxonomic group of organisms' first, but I think we should just refer to the definition in Darwin Core to indicate that our usage of the term 'taxon' also includes groups that are not normally considered taxa, such as hybrids and cultivars.

Franz and Peet's (2009) text could still very well serve as a comment.

@nfranz

nfranz commented Jan 4, 2021

It was intended as a definition (just to clarify that). To further clarify, it makes sense to center this definition around the relationship between a label/string and its referential extension as viewed by a certain author/source and at a given time, if the intent is not to use the term "Taxon", which is not mentioned once in Franz & Peet (2009). I understand the contextual requirement for compatibility with DwC, however. In some sense, given that the 2009 article was not really meant to be all that compatible with DwC, there might be a case here for just omitting it, or for saying: for an alternative conception, see...

@jar398

jar398 commented Jan 4, 2021

It is not obvious to me that the author's full blown view of a name is going to be an extension, or a single extension. (Similarly there will be extensions, such as those hypothesized algorithmically or those for "partly blown views", that are not any author's full blown view.) For example, the full blown view might be vague or ambiguous or incomplete or inconsistent, perhaps intentionally so. A view and an extension are very different kinds of things - they have different properties and identity criteria.

I could try to invent a definition that I like better but I'd insist on starting with use cases - those should be able to dictate any missing details of the particular sense that we would like to assign to this class name. Can someone offer high-quality examples where some particular thing (hypothesis, extension, text, etc.) would be a member of class:taxonConcept? Maybe there are some in TCS ...? It is always better to build a class up from examples than to think about it in a vacuum. The latter generally just leads to a lot of unproductive philosophizing and arguing.

I'm not disagreeing with @nfranz , I'm saying that there are easier ways to hone definitions than just talking about them abstractly.

@nielsklazenga
Member Author

@nfranz, what is the rationale behind trying to avoid the term "Taxon"? As that is where I would start and that has got nothing to do with Darwin Core.

@nfranz

nfranz commented Jan 5, 2021

Thanks, @jar398. I may, or may not, be able to rescue "full blown" by reading it narrowly to just mean: all that one (someone else) can justifiably (according to mainstream systematic criteria for acceptable practice at the time) infer from that source about the concept's extension. Agree though, ok to move forward from that.

Examples: https://doi.org/10.1080/14772000.2013.806371

@nfranz

nfranz commented Jan 5, 2021

@nielsklazenga It is one term that has two I think very important flavors or functional domains in biology, a more realist one (referring [however imperfectly] to natural, causally sustained phenomena) and a more constructivist one (modeling human data evolution); and, as often defined and applied in DwC, it can support both or kind of either in context, but as just one term it is not well suited to keep the two flavors apart consistently and explicitly, when and where that is needed. (And this is my vote not to continue this subthread further here; I am merely answering a question I was asked.)

@nielsklazenga
Member Author

@jar398 , not sure what sort of examples you are after, but the use cases @jgerbracht and @camwebb presented at the IG meeting in September might be a good start. I have plenty of examples too, but they are mosses and I think it is better to illustrate with examples of better-known organisms.

At Biodiversity_Next, Olaf Banki had a nice example of the African Elephant (I think that came originally from David Remsen). @jliljeblad spoke at that symposium as well, so may have some insect examples.

And @nfranz just posted an example.

@jar398

jar398 commented Jan 5, 2021

I'm looking not just for pointers to documents but designation of particular entities either in or described in documents. The point is to be able to nail down identity criteria, use vs. mention questions, and other ontological fundamentals (so as to enable interoperability). Are we talking (in an example) about the text in an article, or the meaning of the text (that is different), or the extension of the meaning of some text? Or something else? If there are distinct taxon concepts with the same extension, what is an example of that? Having more examples is not necessarily better since different examples may indicate different answers to the general questions; that is why I think a small number of "best" or "canonical" examples (one might say: "type" examples) is better. They should be relevant to some existing project like eBird, not aspirational. OBO tries to include one or more examples with every class definition, and I think that is a good idea, but again, one has to be a bit careful here or else the text describing the example will be too ambiguous. (In particular, taxonomic names are ambiguous in multiple ways.) A use case may have to not just give data, but talk a bit about how it will be used, because without that there are likely to be ambiguities. - I agree that this shouldn't be difficult, I'm just not sure I'm the right person to be putting such things forward.

You say @nfranz posted an example but even in that careful article I can't quite tell whether the given 'taxon concepts' are meant to be extensions, or entities (perhaps bibliographic or conceptual) with associated extensions. Is it possible to have two distinct taxon concepts with the same extension? That's not answered. They have 'membership compositions' - that is useful information - but do they have anything else? Talking about the inferences that happen inside real applications is going to hone these questions more than looking at prose like Nico's article. So when I say "example" I really mean, ideally, examples of digital data (like a row of a csv file) in its natural habitat (such as Euler/X or iNaturalist). Stuff of the sort that this ontology is targeting. Prose is not so helpful as an example source (unless the entity in question is textual), and if there is no inference (such as deduplication) it is hard to know what is intended. Again, this is not a matter of speculation. We should be able to look at what our applications do and figure it out from there.

@jar398

jar398 commented Jan 5, 2021

In TCS 1 taxon concepts are definitely textual in nature - at least, I don't see how to read it otherwise, although it doesn't come out and say so. Evidence: a TaxonConcept can have only one taxon name. (By comparison, an extension might have many taxon names, if it has many descriptions/publications.) We could say that TaxonConcept is carried over as compatibly as possible from TCS 1 to TCS 2, but I think examples would still be in order, to clarify questions like this.

@nfranz

nfranz commented Jan 5, 2021

Thanks, @jar398. I get that. I am not sure we have that on tap and ready to be deployed. A while ago I started more of a HowTo guide; but that is semi abandoned. Two sensible outs I see. (1) Decide that that is more implementation than TNC document specification and take a pass, either pragmatically or perhaps even more profoundly (see below). (2) Yes, do that work for an existing, sufficiently structured, relevant source. eBird would be great. Avibase might be easier. Both options (two phrases back) may have downsides.

To address your questions, here is the way I have preferred to implement this. I have tried to maximize intensional congruence across concepts in separate treatments, wherever I could imagine sitting in front of an audience of skeptics and holding my ground well enough. My base challenge to myself has been: at what point can I no longer claim with a straight face that intensional congruence can somehow be rescued for any subset of the concepts being aligned? Conversely, I have looked for rather unassailable evidence, textual or contextual or otherwise, of non-congruence, and in the absence of that I have given congruence the benefit of the doubt. So I have asserted RCC-5 articulations for meanings, more so than texts. Extension has been mostly a synonym of meaning. Because there is now a kind of pay-off in saying that two separately published concepts are congruent ("same extension") - that is the supposedly helpful integration product being offered - yes, of course two concepts can be distinct, minimally in the sense of two non-identical name sec. source labels, while having congruent extensions.
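As a rough illustration of the last point (distinct "name sec. source" labels, congruent extensions), here is a minimal Python sketch. It assumes, purely for illustration, that an extension can be written down as a finite set of member identifiers; in practice the articulations are asserted by an expert rather than computed. The class and function names, and the names and sources in the usage note, are hypothetical and do not come from TCS or Euler/X.

```python
from dataclasses import dataclass
from enum import Enum


class RCC5(Enum):
    """The five RCC-5 relations between two regions (here: two extensions)."""
    CONGRUENT = "=="
    INCLUDES = ">"        # a properly includes b
    INCLUDED_IN = "<"     # a is properly included in b
    OVERLAPS = "><"       # partial overlap, neither includes the other
    DISJOINT = "!"


@dataclass(frozen=True)
class TaxonConcept:
    name: str                 # the name string used as part of the label
    source: str               # the "sec." part: author/publication
    members: frozenset        # ASSUMPTION: toy stand-in for the extension

    @property
    def label(self) -> str:
        return f"{self.name} sec. {self.source}"


def articulate(a: TaxonConcept, b: TaxonConcept) -> RCC5:
    """Derive the RCC-5 relation from the two toy member sets."""
    if a.members == b.members:
        return RCC5.CONGRUENT
    if a.members > b.members:
        return RCC5.INCLUDES
    if a.members < b.members:
        return RCC5.INCLUDED_IN
    if a.members & b.members:
        return RCC5.OVERLAPS
    return RCC5.DISJOINT
```

For example, `articulate(TaxonConcept("Aus bus", "Smith 1950", frozenset({"s1", "s2"})), TaxonConcept("Aus cus", "Jones 1990", frozenset({"s1", "s2"})))` returns `RCC5.CONGRUENT`, although the two concepts carry different labels and compare as unequal objects.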

In the Perelleschus paper referenced above, 54 concepts are recognized. There are numerous instances of congruent extensions among these, hence there are far fewer instances of reciprocally non-congruent sets or clusters of concepts (if that makes sense). That is further explored here: https://doi.org/10.1371/journal.pone.0118247 (which has input data files for reasoning). Look for "Alignment 1 — Voss (1954) and Günther (1936)" to dig deeper.

Hard though for anyone, I suppose, not just me, to divorce this effort then from some related political aspirations. I'd rather be vague and allow different more specific implementations of a purposefully under-specified TNC document fight for however local and temporary functional adoption, than overly constrain future application through examples that might limit someone's freedoms to actually use RCC-5 productively. The experienced truth, I think, and way out of this maybe false choice, is to ask ourselves: ok, research communities in biodiversity are often quite shy with this RCC-5 business. How, minimally and through well chosen examples, can the TNC document serve to reduce that reluctance?

@nielsklazenga
Member Author

Yes, I think it is definitely the intention to carry over the TCS 1 TaxonConcept as compatibly as possible and I agree that we need a lot more than TCS 1 has, including examples of taxon concepts (and also examples of what are not taxon concepts).

The definition in the TCS user guide, by the way, is:

A Taxon Concept is a name plus a description of a taxon.

There is some good stuff in sect. 14.1.

@jar398

jar398 commented Jan 6, 2021

@nfranz, I think there is enough current practice that (a) we don't have to be political and (b) there is little need for underspecification. I think that if you are dealing with an existing specification or platform that is underspecified, then that underspecification needs to be preserved. DwC seems to be like this. But that is not the case here. TCS 1 is pretty well specified (if I remember correctly - would need to review it), and the TCS 2 features that are not in TCS 1 are new so we can be totally prescriptive (subject to a desire for utility). - to repeat what I said before, underspecification is a recipe for non-interoperability, chaos, and errors. The political gains of underspecification are short term and rely on debts that always have to be paid off later. There are limits to how sharply anything can or should be specified but that's not what I'm talking about here.

Looks to me like 'Taxon concept' sensu TCS 1 should be preserved, possibly under a different name, and possibly a new, separate class 'extension' added - and conceivably a third one, for 'intension', as you might be suggesting - each with different identity criteria. I'm not sure where our discussion most recently ended on that. I think I had suggested using 'extension' informally in the documentation but not turning it into an ontology term but I'm not going to voice much of a position here. But, again, the deciding factor here should be which things we need for data exchange, and that depends mostly on what kind of reasoning our various platforms and applications are doing (also on what we consider erroneous inputs, misuse, etc. of the platforms), and that's an empirical question. If having two entities (rows, etc) with the same extension is the wrong way to use a given application, then those entities are probably intended to represent extensions, not taxon concepts. If it's right then they're taxon concepts or taxon intensions (which can be distinguished by the same method), etc. We can tell the difference by looking at how the application works and what input constraints have to be observed to get good results from it.
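The test described above can be sketched in a few lines of Python. The rows, labels and member identifiers are made up; the only point is that the same data yield a different number of entities depending on which identity criterion an application applies.

```python
# Three hypothetical rows; two of them share the same (toy) extension.
rows = [
    {"label": "Aus bus sec. Smith 1950", "extension": frozenset({"s1", "s2"})},
    {"label": "Aus bus sec. Jones 1990", "extension": frozenset({"s1", "s2"})},
    {"label": "Aus bus sec. Lee 2005", "extension": frozenset({"s1"})},
]


def distinct_entities(rows, key):
    """Count entities when identity is decided by the given field."""
    return len({row[key] for row in rows})


# Identity by label: every row is its own entity (taxon-concept reading).
as_concepts = distinct_entities(rows, "label")

# Identity by extension: rows with equal extensions collapse into one.
as_extensions = distinct_entities(rows, "extension")

print(as_concepts, as_extensions)
```

If an application treats the first two rows as an input error (a duplicate), it is behaving as in the second count, which suggests its rows stand for extensions rather than taxon concepts.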

You'll probably have a chance to be politically aspirational in the TCS 2 documentation. Or maybe your aspirations are captured well by Euler/X and the ways it's used, and targeting Euler/X as a use case could ensure that they're represented.

I understand you've voiced a nuanced view above and maybe I'm not treating it with sufficient care, let me know if any of this helps

@nfranz

nfranz commented Jan 7, 2021

Thanks, @jar398! I don't have much to add. Yes, for the paradigm case higher-level taxonomic concept alignment, say "the oak genus of the Chinese Flora" vs. "the oak genus of the Mexican Flora", my taxonomic instincts have pointed to: intensionally congruent; extensionally overlapping (some children being widespread). But not all data worthy of alignment offer that duality, that clearly. Sometimes, a single "congruent" is the most sensible, and will do good services that way. That said, I feel your comment is pointing in the right direction.

@nielsklazenga nielsklazenga transferred this issue from tdwg/tnc May 15, 2021
@nielsklazenga nielsklazenga added class RDF type of term is 'class' class:TaxonConcept Organized in the TaxonConcept class labels May 15, 2021
@nielsklazenga nielsklazenga added this to the Taxon Concept terms for 01 Feb 2022 meeting milestone Jan 24, 2022
@Archilegt

I am wondering why "Taxon Concept" (the label) and not "Taxonomic Concept". The latter makes more sense to me, but I am not a native English speaker. In Spanish it is also better "Concepto Taxonómico" than "Concepto de Taxon".

@Archilegt

Archilegt commented Jan 31, 2022

About the "Definition" of Taxon[omic] Concept: "The underlying meaning, or referential extension, of a scientific name as stated by a particular author in a particular publication." That part is overly complex.
Simpler/simplest: "A description or a definition of a taxon denoted by a scientific name, as stated by a particular author in a particular publication."
Additionally: Within the "knowledge of the object" sensu Linnaeus, the narrower taxonomic concepts are formed by characters and character states, but exclude attributes and relacters (sensu Dubois, 2017).

@nielsklazenga
Member Author

nielsklazenga commented Feb 1, 2022

> I am wondering why "Taxon Concept" (the label) and not "Taxonomic Concept". The latter makes more sense to me, but I am not a native English speaker. In Spanish it is also better "Concepto Taxonómico" than "Concepto de Taxon".

The terms are used interchangeably, but it was TaxonConcept in TCS 1 (so that is the short answer). My two-cents' worth is that TaxonConcept is correct and that 'taxon concept' and 'taxonomic concept' are two different things. It is not just a matter of using a noun or an adjective. The noun of which 'taxonomic' is the adjective is 'taxonomy', not 'taxon'. 'Taxonomy' is much broader than 'taxon'. 'Species', 'genus' and 'scientific name' are all taxonomic concepts, and those are just a few examples from biology; unlike 'taxon', 'taxonomy' is also widely used outside biology.

@nielsklazenga
Member Author

> About the "Definition" of Taxon[omic] Concept: "The underlying meaning, or referential extension, of a scientific name as stated by a particular author in a particular publication." That part is overly complex.

We could lose the ', or referential extension' bit, if that helps.

> Simpler/simplest: "A description or a definition of a taxon denoted by a scientific name, as stated by a particular author in a particular publication."

That is sort of what it was in TCS 1. The problem is that this definition excludes a lot of things that we consider taxon concepts, such as checklist entries etc. A Taxon Concept needs neither a description nor a scientific name (so for me the problem with the (current) definition is the word 'scientific'). They need a label and sufficient context to be able to compare them with other Taxon Concepts (a lot of things that we have to deal with as Taxon Concepts do not even have that).

I would have gone from the Taxon rather than the Taxon Name, so something like:

An opinion about the delimitation of a Taxon (sensu Darwin Core) or taxonomic group as stated by a particular author in a particular publication.

...but most people do not like that, and it still does not cover all the things that we want to treat as Taxon Concepts.

@jgerbracht

jgerbracht commented Feb 1, 2022 via email

@nielsklazenga
Member Author

> How about 'The delimitation of a taxon as stated by a particular author in a particular publication' ?

I would be very happy with that. Maybe make it '...stated or implied by...'? I am sure we can pick holes in this definition, as we can in all others, but I think perfection is unattainable here.

@camwebb
Member

camwebb commented Feb 1, 2022

My contribution during the meeting, take it or leave it 🙂: "The delimitation (or boundaries) of a taxon, usually by humans, often through the work of taxonomic circumscription, usually communicated in a publication."

I'd revise it to: "The delimitation of a taxon, often established through the work of taxonomic circumscription, usually communicated in a publication." or even... "The delimitation of a taxon, usually communicated in a publication."

@nielsklazenga
Member Author

nielsklazenga commented Feb 21, 2022

Sorry to keep bringing this up, after we spent almost all of last meeting on it, but I think this is the whole ball of wax and we will not get anywhere unless we can settle this properly, as we (or other people if we do not) will keep revisiting it if we do not. I do not really think that anybody in the meeting thought this would be the end of the discussion, but, after having thought about it for a while (almost three weeks now), I think we cannot just put it to bed and move on.

Needless to say, after this, that I am not happy with the definition as it stands now (#1 (comment)). It entirely lacks the "concept" and is less a definition of a term than a description of the data we want to put in there. I think that, if we define terms based on what the data looks like, we will always be going in circles.

I have been thinking about language a lot in the last two weeks in order to understand my own thinking and to try to understand why I have so much trouble explaining things that I think I have so clearly in my mind to other people (or why other people do not get it really). I think that people's brains are wired (slightly) differently because of the language they grew up with. Also, if English is not your first language, if you want to really understand something, you always fall back on your first language. I have been living in Australia for 22 years and, since I graduated from university, have never written anything significant in any language other than English, and I still do that.

So, I hope the following is helpful.

We have the word 'concept' in Dutch, but we are much more likely to use one of its synonyms (https://www.interglot.com/dictionary/en/nl/translate/concept), 'begrip', which literally translates in English to 'understanding', or 'opvatting', which translates (also literally) to 'opinion'. So, in my mind, all the definition of 'Taxon Concept' needs to (and should) be is:

Understanding of a taxon

This is what I have always understood taxon concepts to be and what my colleagues, who know nothing about biodiversity informatics and have never heard of TCS, understand them to be. Taxon concepts were not invented by TCS or Franz and Peet (2009), they have always been there; maybe not exactly as the combination of words 'taxon concept' (that would be one word in Dutch if we had one), but we certainly always have been talking about someone's concept of a taxon.

Note that the definition above is the same in meaning, if not verbatim, as the definition from Franz & Peet (2009) that we started with. Franz and Peet's definition was for a different audience and they tried to avoid the term 'taxon', because of difficulties with the term for that audience. For our audience, the term is unproblematic, as there is a perfectly adequate definition of Taxon in Darwin Core. On the other hand, I think it is important to avoid 'scientific name' to make clear to an audience of largely non-systematists that names are not the things we are interested in, but are the labels of the things we are interested in.

I also removed the 'as stated by a particular author in a particular publication', as I do not think being published (or having an 'according to') makes something a taxon concept. A notion (which funnily enough also translates to 'begrip' in Dutch) about a taxon in someone's head is just as much a taxonomic concept as a published opinion. Of course we cannot do anything in TCS with taxon concepts that are not communicated in some way, shape or form, and in TCS taxon concepts need to have an 'according to' (and a label), but that has got nothing to do with definition.

I think we have focused way too much on names and publications and possibly have lost track a bit of what we really want to describe, convey, or exchange. That is what happens when you look too closely at the data – or get at it from the data. You stop seeing the forest through the trees (or the domain through the data).

I think less is more here and that removing every reference to names and publications actually makes the definition clearer and makes it easier for people to understand what things are taxon concepts and what things are not. I think it is clear, for example, that there is no difference between the taxon concepts in individual publications ("taxon name usages") and the so-called "deep" taxon concepts in e.g. AviBase (this is absolutely not to take away from AviBase, which I think is great). It is also clear that synonyms, no matter how broadly you take the term, are not taxon concepts. At the data level, I myself, like most of us, have always treated synonyms as taxon concepts (or the same as taxon concepts), and not just because this is the only way you can deal with synonyms in TCS 1 and Darwin Core, but I have never thought they are taxon concepts (they are names) and I do not think this should be accommodated by the standard (and certainly not by the definition). That would just stop people from looking for better ways...and there is a better way.

@deepreef is not the only one who can write long comments.

@nfranz

nfranz commented Feb 21, 2022

A way to provide a functional definition may be this? An identifiable taxonomic position that can be aligned to other such positions via [TCS-compatible] relationships.

I like this because it shifts the work of productive definitional precision (and productive ambiguity) to those agents that are providing the relationships. And it's the production of relationships that we really should try to incentivize (I assume that is a shared view).

If and when these agents (humans, human-specified algorithms) feel justified in producing alignments, well, I suppose then we others are justified in harvesting the information integration benefits, thereby granting in turn that what has been aligned somehow met the functional thresholds of being taxonomic concepts.

@deepreef

I think @nfranz is on the right track here. I'm not sure the word "position" is right (maybe "assertion"? But that's not much better, and may be worse), so there needs to be some wordsmithing, but my gut tells me this is the right direction to go.

Now, for some elaboration:

This general issue has been extensively discussed/debated for decades, and remains unresolved. Ironically, it parallels the "species concept" debate (i.e., no end in sight), even though "concept" is used in a different sense.

I would STRONGLY prefer to avoid the word "concept" -- in part because of the "species concept" confusion, but mostly because of the excessive amount of "baggage" that word carries. By "baggage" I mean that almost everyone in our space (Biodiversity/Taxonomy/Informatics/etc.) has a clear (in their own mind) understanding of the meaning of that word, yet there are dozens (hundreds?) of subtly (and not-so-subtly) different interpretations of its meanings. The problem is that when people see that word, they immediately interpret it in their own sense by default, even if provided with a specific definition. Keeping the word "concept" as part of the term will perpetuate that barrier to effective communication indefinitely.

While certainly not perfect, I think the word "circumscription" suffers far less baggage and associated heterogeneity in meaning within our assorted relevant communities. It immediately invokes the notion of a set of things, and filters out any meaning associated with classification/hierarchy (which some definitions of "concept" include).

Aside from the term, we also wrestle with the "thing" that forms the basis of one of these instances (concept/circumscription). I think we all agree that the "thing" is not a scientific name. The "thing" involves actual physical biological organisms, and the scientific name is just a crude and inconsistently applied text-string label that has historically been used to (roughly) represent the "thing". So I hope we can all agree that the name is not the "thing".

But we still have several candidates for the "thing". I think the two most commonly discussed options are:

  1. The thing is the circumscription of organisms implied or defined within a TNU.
  2. The thing is a well-defined abstract object that represents a stacked set of circumscriptions of organisms implied or defined within multiple TNUs that are deemed to represent congruent circumscriptions.

Option 1 implies that identifiers are minted for TNUs, and we have secondary data structures that track sets of TNU-circumscriptions asserted to be congruent or asserted to have other RCC-5 relationships with other TNUs.

The advantages of this approach are:

  1. The definition of concept/circumscription and the definition of TNU are the same, and we don't need to define another class or mint identifiers.
  2. We can more or less define a TNU objectively, and the ratio of substance to fuzziness is pretty good.
  3. TNUs are the foundation of all nomenclatural and taxonomic assertions, as well as the anchor-point for all biological information, so they play a central role in all of biodiversity informatics (i.e., we're going to need to robustly deal with them anyway, so might as well make them the core object of "taxon concepts" as well).

The disadvantages of this approach are:

  1. We don't (yet) have records for the vast majority of TNUs in existence. Certainly not all are necessary, but even the key ones (Protonyms, major revisions, etc.) still do not exist in structured form with persistent identifiers.
  2. We don't (yet) have a robust set of RCC-5 relationships among the TNUs that we do have, and we don't (really) have a standard way of minting them such that they can be easily shared.
  3. Even if we could solve these two things, the network of RCC-5 relationships necessary to do any sort of reasoning or derive any useful utility about taxon concept mapping is almost intractably large, and would probably necessitate huge amounts of computing power to run even simple queries.
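To give a feel for the size argument in the third disadvantage, the sketch below counts how many pairwise relationships would exist if every pair of TNUs carried one explicit RCC-5 relationship. That is an assumption made for illustration and an upper bound: many relationships could in principle be inferred rather than stored. The function name is hypothetical.

```python
from math import comb


def pairwise_articulations(n_tnus: int) -> int:
    """Number of unordered TNU pairs, i.e. one relationship per pair."""
    return comb(n_tnus, 2)


# Print the pair count for a few illustrative dataset sizes.
for n in (54, 1_000, 1_000_000):
    print(n, pairwise_articulations(n))
```

For the 54 concepts of the Perelleschus example mentioned earlier in this thread, this gives 1431 pairs; the count grows quadratically with the number of TNUs.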

Option 2 implies that we have some mechanism for recognizing/defining a particular abstract circumscription of organisms, and we assign a single identifier to each unique circumscription. One or more TNUs would be linked to each identified/defined circumscription, but would remain as separate "things" (perhaps they could be framed as "instances" of a particular identified/defined abstract circumscription).

The advantages of this approach are:

  1. RCC-5 relationships are applied directly to these abstract defined/identified circumscriptions, so we avoid the problem of an intractable number of RCC-5 relationships among individual TNUs
  2. This approach is probably more intuitive for most biologists and most informaticians
  3. There would be no need to define "congruent" relationships among these things, because by definition two circumscriptions are the same if they are congruent, so there is only one identifier needed.

The disadvantages of this approach are:

  1. We would need a pretty solid definition for these things, such that using that definition it would be unambiguous whether two defined circumscriptions are the same, or different. No one has (yet) proposed such a definition that is practical.
  2. Just because there would logically be no need to define "congruent" relationships between two of these things, doesn't mean they won't get minted by accident. Thus, there needs to be a mechanism for establishing two different identifiers as being duplicates, whenever two minted instances of this thing are deemed to represent congruent circumscriptions.
  3. There would almost certainly need to be a central authority to mint/define these things.
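One possible shape for the duplicate-handling mechanism in the second disadvantage is a small union-find registry that maps every minted identifier to a single surviving one. This is only a sketch under assumptions: the class name, the identifier format and the rule for choosing the survivor (the lexicographically smaller identifier) are all hypothetical, and nothing in this thread prescribes them.

```python
class DuplicateRegistry:
    """Tracks which circumscription identifiers have been declared duplicates."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def canonical(self, identifier: str) -> str:
        """Return the surviving identifier for a (possibly merged) identifier."""
        self._parent.setdefault(identifier, identifier)
        root = identifier
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited identifier straight at the root.
        while self._parent[identifier] != root:
            next_id = self._parent[identifier]
            self._parent[identifier] = root
            identifier = next_id
        return root

    def mark_duplicate(self, a: str, b: str) -> None:
        """Record that a and b were deemed congruent, so they share one identity."""
        root_a, root_b = self.canonical(a), self.canonical(b)
        if root_a != root_b:
            # ASSUMPTION: the lexicographically smaller identifier survives.
            keep, drop = sorted((root_a, root_b))
            self._parent[drop] = keep
```

After `mark_duplicate("circ:2", "circ:7")` and `mark_duplicate("circ:7", "circ:9")`, all three identifiers resolve to `"circ:2"`, while an identifier that was never merged resolves to itself.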

For most of the past couple of decades, I've been a firm supporter of Option 1, on the grounds that it's relatively easy to define a TNU in a way that most people would implement them in the same way, but it's almost impossible to define a "taxon circumscription" independently of any particular TNU in a way that would be used consistently and semi-objectively.

However, over the past year or so, Dave Remsen, Nicolas Bailly and I have been meeting every Thursday to brainstorm this stuff, and we think we're on to an approach for Option 2 that could work pretty well. I originally suggested it at a workshop hosted by Bob Peet to establish FGDC metadata standards for taxonomy back in the late 1990s (I don't remember exactly when, but Stan Blum, Walter Berendsohn and others in this space were there). Basically, I pointed out that taxon concepts/circumscriptions could be defined at different levels of granularity: taxonomic, geographic, population, and individual organism. The last (individual organism) is the most granular, but also the most useless (in that the vast, vast, vast majority of organisms on Earth are never seen or documented or recorded by humans, nor ever will be). Defining concept circumscriptions based on geography or specific populations is fraught with peril at many levels.

That leaves defining taxon concepts based on taxonomy -- which is the least granular, but definitely the most practical. Using the word "taxonomy" in this sense is misleading, because specifically what this approach does is define taxonomic circumscriptions by included vs. excluded name-bearing type specimens. What has changed in recent months through discussions with Dave and Nicolas is the realization that we can devise specific mechanisms for tracking these kinds of circumscriptions based on, for lack of a better term, "Protonym Count".

This post is already WAY too long, and the amount of text and diagrams necessary to adequately communicate our ideas about this would be enormous. But we're chipping away on documentation to explain and illustrate these ideas, and we'll certainly share them with this group as soon as they're ready. But the point is, I see enough promise in this approach that I've shifted my decades-long stance supporting "TNUs as proxies for taxonomic circumscriptions" (Option 1 above) to "sets of implied Protonyms as explicit definitions of taxonomic circumscriptions" (Option 2 above).

I strongly doubt that this post has added any clarity to the discussion, but at the very least I can reclaim my throne as provider of overly long comments...

@nielsklazenga
Member Author

And I am with @ghwhitbread. Greg just beat me to it, so I am going to repeat a lot of what he said.

I do not really see the difference between options 1 and 2, but I think option 1 is supposed to be what TCS, the TDWG Ontology, Franz & Peet 2009, and the OpenBiodiv Ontology – and AviBase for that matter – call Taxon[omic] Concepts. This includes taxonomic treatments, entries in field guides, entries in checklists, entries in databases like Catalogue of Life, and clades in published cladograms. I see some solid differences between these things. [AviBase also calls the things it assigns AviBase IDs to – which I think are supposed to be option 2 – Taxon[omic] Concepts].

But, as Greg also said, "none of this matters". What matters is what we want to do with these things. If we want to classify them, align them and, most importantly, push them around with TCS, they have to be TCS Taxon Concepts. We cannot have two classes doing the same thing. So TCS Taxon Concept has to be a pretty big tent.

tcs:TaxonConcept is not a taxon concept.

Does anybody really think that dwc:Occurrence is really an occurrence?

As Greg already hinted at, 'Taxon Concept' is only a label. I think it is a good label, but, even if I did not, it has so much history and it has been used so often, that it would be crazy to change.

Here we have a thing called tcs:taxonConcept. We are chartered with the task of freeing it from the realm of models and agreeing on a vocabulary of properties that we can apply to those taxonomic things we care about and contribute to an open taxonomic knowledge graph.

Couldn't have said it better myself. That is exactly what we are supposed to be doing.

@deepreef

deepreef commented Mar 3, 2022

But, as Greg also said, "none of this matters". What matters is what we want to do with these things. If we want to classify them, align them and, most importantly, push them around with TCS, they have to be TCS Taxon Concepts.

This is exactly why it DOES matter! If we want to classify them, align them, and share them via well-defined terms for classes and properties, then we need to have a shared understanding of what they actually "are". If the "thing" is a TNU, then don't call it a "Taxon Concept".

There is a fundamental (and important) difference between how people will use the standard depending on Option 1 vs. Option 2.

Option 1: tcs:TaxonConcept = "Treatment" (i.e., accepted-name TNUs plus heterotypic synonyms; = "Potential Taxon" of Berendsohn et al. 1995). In this scenario, TC identifiers are TNU identifiers, and are explicitly tied to a single TNU, which carries with it properties of Rank, Spelling, higher classification, etc., within the context of a particular Reference. All Treatment/TNUs deemed to represent the "same" Taxon Concept are mapped directly to each other via "Congruent" relationships.

Option 2: tcs:TaxonConcept is not the same thing as a Treatment/TNU. Rather, it is an abstract entity representing an implicit set of organisms that are taxonomically homogeneous.

Option 1 encourages conflation of the name, rank, higher classification, and other properties of the TNU as though they are properties of the tcs:TaxonConcept. Moreover, it becomes intractable how many relationship statements are necessary to represent a "stack" of congruent TNUs. For example, when 50 TNUs are all asserted to represent the "same" Taxon Concept, how is this relationship expressed? Is one of the TNUs deemed to be the "master"/hub and all other 49 TNUs are directly linked as congruent to that one? Or are they daisy-chained such that TNU2 links to TNU1 as congruent, and TNU3 links to TNU2 as congruent, and so on down to TNU50 (i.e., 49 separate statements of congruency)? Or should all congruencies be explicitly mapped among all 50 TNUs (which, if I'm not mistaken, is 1,225 statements of congruency if they are taken as reciprocal, and double that if they need to be expressed in both directions separately)?
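
The three layouts described above can be counted directly. The sketch below is only an illustration; the function name and dictionary keys are made up for this example. A hub or a daisy chain over n TNUs needs n - 1 links, while mapping every pair needs n(n - 1)/2 reciprocal statements.

```python
from math import comb


def congruence_statement_counts(n):
    """Count the congruence statements needed to link n TNUs under each layout."""
    return {
        # One TNU acts as the hub; every other TNU links to it.
        "hub": n - 1,
        # Each TNU links to its predecessor in the chain.
        "daisy_chain": n - 1,
        # Every unordered pair of TNUs gets one reciprocal statement.
        "all_pairs_reciprocal": comb(n, 2),
        # Every pair is stated separately in both directions.
        "all_pairs_directed": 2 * comb(n, 2),
    }


counts = congruence_statement_counts(50)
for layout, count in counts.items():
    print(f"{layout}: {count}")
```

The hub and chain layouts are cheap to state but force every consumer to compute the transitive closure; the all-pairs layout avoids that at a quadratic cost.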

Option 2 allows the tcs:TaxonConcept to be completely independent from any particular TNU (and hence, independent of name, rank, higher classification, or Reference), and DRAMATICALLY simplifies the process of representing RCC-5 relationships. The problem with Option 2 is that it needs a clean/solid definition for what this Taxon Concept "thing" is, that facilitates computability and re-use.
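
To make the simplification concrete for the congruence case only, here is a minimal sketch that assumes each TNU carries a link to exactly one concept identifier. That assumption, the identifiers, and the function names are all hypothetical. Under it, n TNUs need n statements, and congruence between any two TNUs is derived rather than asserted.

```python
from collections import defaultdict

# Hypothetical data: each TNU identifier points at one concept identifier.
tnu_to_concept = {
    "tnu:1": "concept:X",
    "tnu:2": "concept:X",
    "tnu:3": "concept:Y",
    "tnu:4": "concept:X",
}


def congruent_groups(mapping):
    """Group TNU identifiers by the concept they are asserted to represent."""
    groups = defaultdict(set)
    for tnu_id, concept_id in mapping.items():
        groups[concept_id].add(tnu_id)
    return dict(groups)


def are_congruent(mapping, tnu_a, tnu_b):
    """Two TNUs are congruent when they point at the same concept identifier."""
    return mapping[tnu_a] == mapping[tnu_b]


groups = congruent_groups(tnu_to_concept)
```

The other RCC-5 relations (inclusion, overlap, exclusion) would then be stated once between concept identifiers instead of between every pair of TNUs.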

I used to favor Option 1 because I saw no pathway to overcome the problem of Option 2. I switched sides because I can now see that pathway.

In any case, my stubbornness/arguments here have very little to do with philosophical arm-waving, and very much to do with practical implications of the standard. A big part of the reason that TCS 1.0 failed to become widely adopted is that it wasn't practical for a lot of providers/consumers to implement.

Does anybody really think that dwc:Occurrence is really an occurrence?

YES!!! It absolutely is an occurrence ("An existence of an Organism at a particular place at a particular time.")! One of the main reasons there is now a dwc:MaterialSample Working Group is because enough people realize the importance of correcting the MISconception that dwc:Occurrence is a "specimen". For TCS, why would we want to intentionally promote the misapplication of a term?

@nielsklazenga
Member Author

@deepreef, you still do not seem to understand what we are doing here. Please read the charter of the TCS 2 Task Group at https://www.tdwg.org/community/tnc/tcs2/. We are NOT changing the TCS Taxon Concept class.

A Taxon[omic] Concept is a real thing, whether you like it or not, or whether you get it or not, at least to people like @nfranz and myself (and the makers of AviBase from the looks of it) and that is the thing we are after. If you do not agree with that, that is fine, but the entire purpose of the TCS 2 Task Group, not to mention the TCS Maintenance Group, and what we have been chartered to do, is to make the TCS Taxon Concept work, not to reinvent the wheel. This is not some book club with free-flowing discussion that can go anywhere and does not get us anywhere.

@deepreef

deepreef commented Mar 4, 2022

@nielsklazenga :

I specifically read the charter before posting, to make sure I was not off base. I just re-read it again now. Maybe you can quote the part of the charter that you believe I am misunderstanding?

A Taxon[omic] Concept is a real thing, whether you like it or not,

You apparently haven't been reading my posts very carefully. The problem is not that I don't recognize a "Taxonomic Concept" as a "real thing". The problem is that I see two very distinct "real things" it could refer to, and I am very explicitly trying to pin down which version of the "real thing" that tcs:TaxonConcept will be defined as. I don't know how to make it any more clear than I already have. I am not trying to reinvent the wheel. I am trying to pin down a key definition that was not effectively pinned down in TCS 1.0.

If you genuinely believe that "Option 2" is off the table because it somehow represents a "significant change to the meaning of terms", then fine -- we'll introduce it in TCS 3.0. In that case, I will (again) suggest a better definition for tcs:TaxonConcept that represents Option 1, and does not cross the threshold for "significant change":

"A set of organisms, explicitly indicated and/or implied to exist, that are asserted by a particular static reference to be taxonomically homogeneous and collectively represent the entirety of a taxon."

@jgerbracht

I also see 2 distinct real things and in my view the TCS 2.0 standard will not be an improvement over TCS 1 unless we have a clearer definition of TaxonConcept (I think we can all agree to that). As I stated at the beginning of this entire TCS 2 process, the lack of a clear definition was the main reason that we didn't adopt TCS 1.0: the definition of TaxonConcept was vague enough that it could be, and often was, interpreted as either of these 2 distinct real things or even less constrained interpretations of a taxon. I recall also from early discussions that we would agree to tackle the 'deep' concept in a later phase, as @deepreef is also floating as an option, and that is still fine with me, if it helps us make progress (but we should clearly state that we are shelving the 'deep' concept for a later effort). Since we do have 2 distinct real things which were 'combined' in TCS 1.0, we need to decide which of those two should be labeled tcs:TaxonConcept and we need a very clear definition of tcs:TaxonConcept. I think we need to split the current tcs:TaxonConcept into these 2 real things and I'm OK with either of these being assigned as the tcs:TaxonConcept, but if we try to have tcs:TaxonConcept represent a TN through time, a TNU and a 'deep' Taxon Concept then I don't think we've done what we really needed to do to improve TCS. If we decide that tcs:TaxonConcept is a TNU, then this definition is a great start.

"A set of organisms, explicitly indicated and/or implied to exist, that are asserted by a particular static reference to be taxonomically homogeneous and collectively represent the entirety of a taxon."

I might remove 'static' as that implies to me that it has been published and we run into cases where we need to exchange observational data (and associated concept data) pertaining to a TaxonConcept (TNU) which has not been published.

@nielsklazenga nielsklazenga removed this from the Taxon Concept terms for 01 Feb 2022 meeting milestone Nov 7, 2022
@mdoering

mdoering commented Nov 8, 2022

I am awfully sorry that I missed this vivid earlier discussion. We all seem to agree there are 2 things that can be defined, so why don't we just define both instead of choosing one over the other? That would also help to have a better definition.

I would also like us to think about how to express Plazi style treatment data in TCS2 with real examples, as this is an important source of actual digital taxonomic data. Plazi uses the terminology article, treatment and treatment citation (name/name usage relations: mostly synonyms, but also references to a basionym, replaced name, etc.) for their main entities.

I worked on the Berlin Model quite a bit. It was a major influence on TCS, as was Prometheus from Jessie Kennedy's (TCS) team. The potential taxa sensu Berendsohn were clearly like the TaxonConcepts in TCS1: "taxa as circumscribed by a reference". They are TNUs with references varying from journal articles to personal communication or just a person's name in a given year. I always found this wide range of reference types a barrier to working with concepts. This is where the taxon concept explosion starts. Limiting them only to treatments, i.e. statically published TNUs in scholarly works, makes it tangible as you can progressively capture immutable data. Usages maintained in databases can be in permanent flux and are much harder to work with. For that reason alone I wouldn't mind having an explicit Treatment class in TCS - but it is also very limiting for sharing database work for current taxonomic activities.

Darwin Core on the other hand never focused on taxonomy. In the earlier days it was expected that at some stage the TDWG standards could be joined into a larger model and different standards had a different focus on what is modelled. TCS1 has placeholders for specimens or literature references for this reason. DwC needed some taxonomic terms in order to exchange occurrence data though, but there was no need to structure it properly. Only later we kind of hijacked DwC to also share standalone taxonomic data without occurrences. This evolution of DwC over time and the desire to keep terms stable and not rename them (there is no strict versioning in dwc) led to inconsistent naming of things. dwc:Taxon existed early on; it makes sense to speak of a taxon in the light of an identification/occurrence as you never actively tie an observation to a synonym. dwc:taxonID therefore was born as the primary key just as there is occurrenceID for Occurrence. Once we wanted to share traditional taxonomic checklists though we needed a way to also share synonyms. And ideally also taxon concepts as per TCS1 or the Berlin Model's potential taxa. By that time we liked to refer to name usages instead of taxa, hence parentNameUsageID, acceptedNameUsageID and originalNameUsageID were born - but referring to a taxonID, not a nameUsageID of a NameUsage class. dwc:Taxon was considered to be treated as NameUsage, i.e. a TNU. taxonConceptID and scientificNameID never got much attention and use, but were originally created with the desire to be able to differ between names (scientificNameID), name usages (taxonID) and also taxon concepts (taxonConceptID) in the sense of Rich's Option 2 or Avibase IDs. For both no explicit class term was created and it was all pooled under the broad dwc:Taxon class. What exactly dwc:taxonConceptID points to was never really clear.
I saw it basically as a shortcut to define RCC5 equals relations between name usages - either by picking an existing taxonID of a name usage as the representative usage for the concept (Option 1) - think of type specimens or reference sequences in OTU clustering - or by creating explicit concept ids for just the purpose of aggregating all congruent name usages as Avibase does (Option 2). As we know the taxon terms in Darwin Core are not its strength, I would not use them as a source for TCS terms. I think it is better to come up with something consistent.
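
As a rough illustration of how these identifiers relate, the sketch below uses the Darwin Core term names mentioned above (plus `scientificName` and `nameAccordingTo`) as dictionary keys. The rows, identifier values, names and references are invented, and the two helper functions are hypothetical; the second one shows the "shortcut" reading of taxonConceptID, where usages sharing an ID are treated as congruent.

```python
from collections import defaultdict

# Invented checklist rows; only the keys are real Darwin Core term names.
rows = [
    {"taxonID": "u1", "scientificName": "Aus bus", "nameAccordingTo": "Smith 1990",
     "acceptedNameUsageID": "u1", "parentNameUsageID": "g1", "taxonConceptID": "c1"},
    {"taxonID": "u2", "scientificName": "Aus cus", "nameAccordingTo": "Smith 1990",
     "acceptedNameUsageID": "u1", "parentNameUsageID": "", "taxonConceptID": ""},
    {"taxonID": "u3", "scientificName": "Aus bus", "nameAccordingTo": "Jones 2005",
     "acceptedNameUsageID": "u3", "parentNameUsageID": "g1", "taxonConceptID": "c1"},
]


def synonyms_of(records, accepted_id):
    """Return usages that point to `accepted_id` as their accepted name usage."""
    return [
        r for r in records
        if r["acceptedNameUsageID"] == accepted_id and r["taxonID"] != accepted_id
    ]


def usages_by_concept(records):
    """Group taxonIDs by taxonConceptID, skipping rows without a concept ID."""
    grouped = defaultdict(list)
    for r in records:
        if r["taxonConceptID"]:
            grouped[r["taxonConceptID"]].append(r["taxonID"])
    return dict(grouped)


synonym_ids = [r["taxonID"] for r in synonyms_of(rows, "u1")]
concept_groups = usages_by_concept(rows)
```

Note that every ID column here resolves to a row of the same flat table, which is the pooling under dwc:Taxon described above.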

So much for the history. Sorry for not having a clear proposal, I probably only added to the confusion.

@nielsklazenga
Member Author

nielsklazenga commented Nov 9, 2022

Sorry I stopped engaging. This was taking up too much of my thoughts and got in the way of my work (not to mention my sleep). Also, I went on leave.

I know there is no hope of cutting this short, but there is also no hope of ever getting TCS ratified as a usable standard if we cannot put this if not behind then beside us, so I am going to try. Nothing is off the table, but not everything is in scope for the TCS 2 Task Group. At the moment, the TCS Maintenance Group cannot action anything, as we haven't got a standard to action it on. Unless of course people want another XML Schema, but I thought that if there is only one thing we all agree on, it is that we want a vocabulary standard that fits in with other TDWG standards like Darwin Core.

We all have slightly different visions for TCS, so if we try to make it into the perfect standard right now we will never get there, because even if we agree there will be part of the community that does not. I think the only way to get anything ratified is to stick as closely as possible to (at least the intent of) what has already been ratified, so that, if people object to something we propose, we can say that they are objecting to something we already have. So, the task of the TCS 2 Task Group is just to turn the XML Schema Definition (XSD) into a vocabulary. Simple maintenance job, right?

I had not realised how big of a problem it is that there are no definitions in TCS (1) and that nothing is explicit. TCS assumes people know what the terms (elements and attributes) mean and frankly so did I. Turns out that people read things differently. So, while when I read in the User Guide that a Taxon Concept is "a name with a description" I think that is not a very good definition of a Taxon Concept, other people read this as meaning that a Taxon Concept is a Treatment. I had not realised that. So, that is one thing we need to get to the bottom of. Since we can apparently not come to an agreement based on what we intuitively think a Taxon Concept is, can we get some help from outside and agree that the TCS Taxon Concept is the same thing as the Taxonomic Concept sensu Franz & Peet, 2009 and sensu Senderov et al., 2018? So, since @deepreef spelt out the different "options", "Option 2", in my opinion, is the TCS Taxon Concept and "Option 1" is the TCS 1 Publication (or part thereof) and is the accordingTo of "Option 2".

I think, or I thought, that there was something else going on in this thread as well, namely that people do understand what a Taxon Concept is, but want a different object. That, of course, we cannot do. But this might have been miscommunication.

Regarding extra objects, so new terms, when I say that something is not in scope, it means that it is not in scope for the Task Group, or this body of work, not that it cannot be in TCS 2 (which is just a working name and the '2' is not necessarily a major version). It is not about me trying to stifle the discussion or control what is in and what is out, but about finding the path of least resistance and making sure that there will be a TCS 2 some day and the Task Group does not turn into an Interest Group.

Also, staying as close as possible to TCS 1 is a means to an end, not a hard rule. Apart from the set of terms that we really need – we cannot decide that the Taxon Concept is too hard, for example – for all other terms the real criterion is whether we can get it ratified easily. @mdoering's suggestion to have both the Taxon Concept and the Treatment is very tempting, as it will settle the issue about the meaning of Taxon Concept we are having right now, but adding Treatment is far from straightforward, as I can think of many questions that need to be answered. That being said, nobody is stopping anybody from opening an issue and making the case for it (that is the important bit) independently of the Task Group. In fact, that is encouraged. Then, if it is ready to go for review when the rest is, it goes with the rest, and otherwise we (the Maintenance Group) will get to it when we get to it.

@ghwhitbread
Collaborator

ghwhitbread commented Nov 9, 2022

It seems that we might have a consensus here. So why don’t we fix it?

As none of the TCS2 properties has a domain or range and none of the TCS1 elements has formal definitions, we could add the Taxon Name Usage (TNU) class and sort the definitions of TNU and TaxonConcept. … And maybe tweak some property names.

@nielsklazenga
Member Author

I like it. Let's just add the issue for now. That is for Treatment, TaxonNameUsage we already have (#51).

@nielsklazenga
Member Author

How about I also create a new issue for TaxonConcept and rename this one?

@nielsklazenga nielsklazenga removed class RDF type of term is 'class' class:TaxonConcept Organized in the TaxonConcept class TCS2.0.0 labels Nov 10, 2022
This was referenced Nov 10, 2022
@nielsklazenga
Member Author

This has now been decomposed into TaxonConcept (#213), TaxonomicTreatment (#214) and TaxonNameUsage (#51).

@deepreef

My apologies for not engaging myself. I'm dealing with a family health situation, and am way behind on many work-related things.

Here is a very brief summary of my view of things:

  1. I agree with Markus that we should not assume that a TNU and a "Taxon Concept" are the same thing.

We have discussed for many years using TNUs as representatives of Taxon Concepts, and perhaps using TNU IDs for Taxon Concepts (this is something I strongly supported for many years). My main problem with specific IDs for Taxon Concepts is that there was nothing "real" to link them to, other than a specific TNU. No one had proposed a clean definition for a Concept (to which an ID would be assigned) that did not involve heavily subjective expert assessments, or in a way that was automatable or scalable, so I believed that it was best to "stack" sets of TNUs representing congruent concepts, and select one of those TNU IDs to serve as a proxy ID for the associated Taxon Concept. My view on this has now changed. We badly need a clean definition of a TNU, and a separate clean definition of a "Taxon Concept".

  2. TNUs are a low-hanging fruit.

We need to come up with a clean, stable definition of a TNU, which maps to PLAZI Treatments (each PLAZI treatment corresponds to a single TNU, but each heterotypic synonym included within a Treatment corresponds to a separate TNU). This is fundamentally important, because TNUs literally underpin ALL nomenclature and ALL taxonomy. They are, by far, the most important informatic units of taxonomy in general, and therefore it's extremely important to get the definition and properties right.

  3. We need to define a "Taxon Concept"

I recently gave a presentation at TDWG explaining how the term "Taxon Concept" often conflates three different things: Classification, Nomenclature, and Circumscription. Based on the work that Dave Remsen, Nicolas Bailly and I have been doing these past 2+ years, it has become clear that all three of these things are best modelled as sets of Protonyms. Classifications are actually an ordered array of Protonyms, starting with the terminal taxon name and stepping up through the ranks all the way up to Kingdom/Domain. Nomenclature is a subset of classification, representing either a single Protonym (for names at the rank of genus or higher) or an ordered array of Protonyms (for names below the rank of genus), and some algorithms to format text strings based on these Protonym Arrays and relevant Code rules. Circumscriptions are sets of Protonyms that are collectively regarded as heterotypic synonyms at the same taxonomic rank. This requires a LOT more text and graphics to explain, which Dave, Nicolas and I are currently working on (we have 14 "Chapters" already, so this will likely end up as a book). We think we've solved all the major informatic questions (yes, @mdoering, in a way that very elegantly handles taxonomic "splits" -- among other edge cases).

The question is: what is a "Taxon Concept"? We tend to favor restricting it to "Circumscription" only, keeping "Classification" and "Nomenclature" separate. However, because all three of them are represented by ordered or unordered arrays/sets of Protonyms, they could all collectively represent the "Taxon Concept". But I think "Circumscription" is the part most people want an informatic solution for.
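
To illustrate the distinction between ordered arrays and unordered sets of Protonyms, here is a minimal sketch. It is not the model under development by the authors: the class names, the `p:` identifiers, and especially the idea of deriving an RCC-5 relation by plain set comparison are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    # Ordered array of Protonym IDs, terminal name first, stepping up the ranks.
    protonyms: tuple


@dataclass(frozen=True)
class Circumscription:
    # Unordered set of Protonym IDs regarded as heterotypic synonyms at one rank.
    protonyms: frozenset


def rcc5(a, b):
    """Assumed rule: compare two circumscriptions by their Protonym sets."""
    if a.protonyms == b.protonyms:
        return "equal"
    if a.protonyms < b.protonyms:
        return "proper part"
    if a.protonyms > b.protonyms:
        return "inverse proper part"
    if a.protonyms & b.protonyms:
        return "partial overlap"
    return "disjoint"


# Hypothetical Protonym identifiers.
lineage = Classification(("p:species", "p:genus", "p:family"))
broad = Circumscription(frozenset({"p:1", "p:2", "p:3"}))
narrow = Circumscription(frozenset({"p:1", "p:2"}))
other = Circumscription(frozenset({"p:3", "p:4"}))
unrelated = Circumscription(frozenset({"p:9"}))

relation = rcc5(narrow, broad)
```

Because the sets are hashable and order-free, two circumscriptions listing the same Protonyms compare as equal regardless of who stated them or in what order, whereas reordering a `Classification` changes its meaning.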

  4. We should come up with a standard that replaces both TCS and the DwC "Taxon" class.

I will not be satisfied with any new taxonomic "standard" unless it will allow us to deprecate both TCS 1.0 and dwc:Taxon terms. We do not need to support multiple concurrent standards for representing taxonomic and nomenclatural information.

OK, that's enough for now. I'm still dealing with the family health situation (I'm in Florida right now, helping my wife's sisters take care of their ailing father, my father-in-law), so I will not be able to engage again on this topic until late next week at the soonest. I need to reconnect with Dave and Nicolas to flesh out our ideas in writing and images, based on feedback we got at TDWG (including some really important discussions with Donat and Guido of PLAZI, as well as Pensoft and some of the IPNI folks). I'm hoping that by the end of this year or early in 2023, Dave, Nicolas and I will have something more complete to share, to express where we're coming from on the "Protonym Sets" approach to modelling taxonomy and nomenclature.

@nielsklazenga
Member Author

nielsklazenga commented Nov 18, 2022

Thanks @deepreef.

  1. Just noting that the TCS TaxonConcept does not conflate these three things, but I agree that a lot of people do. I think it is in fact none of the three things you mention. Also noting that we have not been talking about TNUs for more than two years now, although some of us seem to think we still are. I personally have very little time for them.

  2. I do not think TNUs are low-hanging fruits. We wasted almost two years on them and got nowhere close to a definition. I think this is because they are just syntactic sugar.

  3. I agree, but I was under the impression that that was what I was trying to do when I started this whole discussion. I also think we have got a good definition now.

  4. This is not something the TCS Maintenance Group can decide on its own, but I think we need both the dwc:Taxon and the tcs:TaxonConcept classes in TDWG. I agree that the dwc:Taxon class is better not used for taxonomy and nomenclature, but we need it for collections data etc. where most "taxa" are nominal concepts. I think that, once TCS 2 is ratified, a lot of the properties on the dwc:Taxon are not needed anymore, but that is for the Darwin Core Maintenance Group to decide. Also noting that the Darwin Core RDF Guide already says that the dwc:Taxon cannot be used in RDF (or any graph-like serialization I extrapolate from that) and makes all dwc:Taxon properties that can be used convenience properties of the dwc:Identification. So for most intents and purposes the tcs:TaxonConcept already will replace the dwc:Taxon.

All of this is wildly out of scope for the TCS 2 Task Group of course. Our job is just to get a set of terms ready for review and we are going to do that before mid-2023.

@deepreef

Thanks, @nielsklazenga -- the views I expressed were general thoughts on the informatics needs of the community, without any constraint on whatever may or may not be included in TCS 1.0.

I disagree with your point 2 (at several levels), but unfortunately I don't have time to elaborate further right now. Perhaps next week, if you're interested in continuing the discussion.

No dispute on point 3. My post was not intended as a rebuttal -- it was just an opportunity to make up for my long prior silence by summarizing my current views.

I agree that point 4 is out of scope for this particular task group -- it is really something that should be addressed in the context of DwC.

@nielsklazenga nielsklazenga added this to the TCS 2 initial release milestone Sep 18, 2024
@nielsklazenga nielsklazenga self-assigned this Sep 18, 2024
@nielsklazenga nielsklazenga added class RDF type of term is 'class' class:TaxonConcept Organized in the TaxonConcept class TCS2.0.0 labels Sep 18, 2024
@nielsklazenga nielsklazenga changed the title class:TaxonConcept class: TaxonConcept Sep 19, 2024
@nielsklazenga nielsklazenga removed this from the TCS 2 initial release milestone Sep 19, 2024