Details about SemanticTextAnnotation #640

marco-brandizi · 2023-03-07T15:37:13Z

I'd like some clarifications on the SemanticTextAnnotation profile, which appears to be a new proposed profile here to associate texts with semantic annotations.

Does the specification imply that the new type bioschema:SemanticTextAnnotation (STA in the following) is also being introduced? If not, how can one distinguish an STA from any other schema:CreativeWork? Via dct:conformsTo?
Restricting schema:mainEntity to schema:DefinedTerm seems very limiting to me: I have use cases where the output of a text mining tool can be an ontology term, a recognised entity like a gene (schema:mainEntity would point to its URL, or even to its symbol or accession via a plain string value), or a more complex structure, such as a reified statement, as shown here. In schema.org this property has schema:Thing in its range, and I think it would be useful to keep it that way for STAs too (most schema validators accept plain text by default). Also, note that other annotation models take the same general approach (eg, the Web Annotation Model).
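
To make the three cases concrete, here is a minimal sketch that builds the JSON-LD as Python dictionaries. The helper name `make_annotation`, the example.org URLs, and the choice of gene are assumptions made for illustration only; none of them come from the profile.

```python
import json


def make_annotation(text, main_entity):
    """Build a CreativeWork-style annotation linking a text to an entity."""
    return {
        "@context": "https://schema.org/",
        "@type": "CreativeWork",
        "text": text,
        "mainEntity": main_entity,
    }


sentence = "BRCA1 is involved in DNA repair."

# 1. An ontology term, which is what the profile currently expects.
#    The term URL is a placeholder.
as_term = make_annotation(sentence, {
    "@type": "DefinedTerm",
    "@id": "http://example.org/ontology/DNA_repair",
    "name": "DNA repair",
})

# 2. A recognised entity referenced by its URL (placeholder URL).
as_url = make_annotation(sentence, {"@id": "http://example.org/gene/BRCA1"})

# 3. A plain string value, such as a gene symbol or accession.
as_symbol = make_annotation(sentence, "BRCA1")

print(json.dumps([as_term, as_url, as_symbol], indent=2))
```

Only the first variant fits a range restricted to schema:DefinedTerm; the other two need the wider schema:Thing (or plain text) range.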

Is schema:CreativeWork the best base class for this? I've seen similar derivation examples that use schema:Action or schema:CreateAction. An STA could be modelled as an action, in the sense of the action that created the association between the original text and some entity.
- An advantage of that would be that we could use schema:instrument (eg, to trace the software that produced the annotation).
- Also, schema:result would play the role now played by schema:mainEntity, while schema:object would be used in place of schema:text and schema:subjectOf. An advantage of that would be that it could link both the original document (eg, an article) and some structured information about the specific text fragment the annotation is derived from (eg, the sentence and its position within the article). Currently, a way to do the same is maybe: schema:isBasedOn <sentence as CreativeWork> [schema:isPartOf <article>], but to me it looks less explicit and more confusing.
- As an alternative, you might want to keep things under CreativeWork, but at least add bioschema:SemanticTextAnnotation to the domain of schema:instrument.
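
The action-based modelling described in the list above can be sketched as follows. This is only an illustration under stated assumptions: the function name `annotation_action`, the tool name, and all example.org identifiers are hypothetical, and the text fragment is represented as a CreativeWork that is part of the article.

```python
import json


def annotation_action(fragment_text, article_id, entity, tool_name):
    """Model an annotation as a schema:CreateAction (illustrative sketch)."""
    return {
        "@context": "https://schema.org/",
        "@type": "CreateAction",
        # The software that produced the annotation.
        "instrument": {"@type": "SoftwareApplication", "name": tool_name},
        # The annotated text fragment, linked to the document it comes from.
        "object": {
            "@type": "CreativeWork",
            "text": fragment_text,
            "isPartOf": {"@type": "ScholarlyArticle", "@id": article_id},
        },
        # The entity the text is associated with (the mainEntity role).
        "result": entity,
    }


action = annotation_action(
    fragment_text="BRCA1 is involved in DNA repair.",
    article_id="http://example.org/article/1",      # placeholder
    entity={"@id": "http://example.org/gene/BRCA1"},  # placeholder
    tool_name="example-text-miner",                   # hypothetical tool
)

print(json.dumps(action, indent=2))
```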


gtsueng · 2023-03-16T19:25:15Z

> Does the specification imply that the new type bioschema:SemanticTextAnnotation (STA in the following) is also being introduced? If not, how can one distinguish an STA from any other schema:CreativeWork? Via dct:conformsTo?
--From what I can see, the bioschemas SemanticTextAnnotation profile does not have any original properties that cannot be inherited from the current parent class ("schema:CreativeWork"), so it does not seem necessary to treat it as a new schema.org type or bioschemas type.

> Is schema:CreativeWork the best base class for this? I've seen similar derivation examples (schemaorg/schemaorg#1905) that use schema:Action or schema:CreateAction. An STA could be modelled as an action, in the sense of the action that created the association between the original text and some entity. An advantage of that would be that we could use schema:instrument (eg, to trace the software that produced the annotation). Also, schema:result would play the role now played by schema:mainEntity, while schema:object would be used in place of schema:text and schema:subjectOf. An advantage of that would be that it could link both the original document (eg, an article) and some structured information about the specific text fragment the annotation is derived from (eg, the sentence and its position within the article).
--While some properties from schema:Action could potentially be used in place of those in schema:CreativeWork, many useful properties don't have a good mapping: 'dateModified', 'datePublished', 'inLanguage', 'isPartOf'. ('dateCreated' could potentially map to 'startTime'.)

> Currently, a way to do the same is maybe: schema:isBasedOn <sentence as CreativeWork> [schema:isPartOf <article>], but to me it looks less explicit and more confusing.
--The software could also be recorded via schema:isBasedOn: {'@type': 'schema:SoftwareApplication', 'name': 'name of tool'}. ('schema:SoftwareApplication' is a subclass of 'schema:CreativeWork'.)
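
As a rough sketch of that option, the annotation below stays a CreativeWork and lists both the source sentence and the tool under isBasedOn. The helper `based_on_of_type`, the tool name, and the example.org URLs are assumptions for illustration, not part of the profile.

```python
import json

annotation = {
    "@context": "https://schema.org/",
    "@type": "CreativeWork",
    "mainEntity": {
        "@type": "DefinedTerm",
        "@id": "http://example.org/ontology/DNA_repair",  # placeholder
    },
    "isBasedOn": [
        # The sentence the annotation was derived from, within its article.
        {
            "@type": "CreativeWork",
            "text": "BRCA1 is involved in DNA repair.",
            "isPartOf": {"@id": "http://example.org/article/1"},  # placeholder
        },
        # The software that produced the annotation (hypothetical name).
        {"@type": "SoftwareApplication", "name": "example-text-miner"},
    ],
}


def based_on_of_type(node, type_name):
    """Return the isBasedOn values of a node that have the given @type."""
    values = node.get("isBasedOn", [])
    if isinstance(values, dict):
        values = [values]
    return [v for v in values
            if isinstance(v, dict) and v.get("@type") == type_name]


tools = based_on_of_type(annotation, "SoftwareApplication")
print(json.dumps(tools, indent=2))
```

A consumer has to filter isBasedOn by type to tell the input text apart from the tool, since one property carries both roles here.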

> As an alternative, you might want to keep things under CreativeWork, but at least add bioschema:SemanticTextAnnotation to the domain of schema:instrument.
--Do you mean that bioschemas:SemanticTextAnnotation should be the expected type of some instrument property on some sort of SemanticTextAnnotationAction? Because that seems to be how it's being used on schema.org: X is a subproperty of schema:instrument (eg, schema:diet, schema:deliveryMethod, schema:recipe), has an expected type of Y (generally some subclass of CreativeWork, like schema:Diet, schema:DeliveryMethod, schema:Recipe) and is used on Z (some subclass of Action, eg, schema:ExerciseAction, schema:OrderAction, schema:CookingAction).

marco-brandizi · 2023-03-18T20:20:53Z

Thanks, @gtsueng.

What about schema:mainEntity? Having a defined term as the only possible entity that the annotation can associate with a text seems very limiting to me.

As for the rest:

If a new CreativeWork subclass isn't expected, then I guess I have to use some other qualifier, such as schema:additionalType, to make it clear we're dealing with an STA. Otherwise, I don't see how a tool could know whether a creative work is an STA or not (instantiating specific types is a possible reason for wanting subclasses, not just accommodating new properties). This approach would work, but it wouldn't be very standard.
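
A minimal sketch of the schema:additionalType workaround, assuming a made-up type URI (`STA_TYPE` below is a placeholder, not an identifier published by Bioschemas) and a hypothetical helper `is_sta`:

```python
# Placeholder URI used only for this sketch.
STA_TYPE = "http://example.org/types/SemanticTextAnnotation"


def as_list(value):
    """Normalise a JSON-LD value that may be absent, single, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def is_sta(node):
    """True if the node is a CreativeWork flagged as an STA via additionalType."""
    return ("CreativeWork" in as_list(node.get("@type"))
            and STA_TYPE in as_list(node.get("additionalType")))


tagged = {
    "@context": "https://schema.org/",
    "@type": "CreativeWork",
    "additionalType": STA_TYPE,
    "text": "BRCA1 is involved in DNA repair.",
}
untagged = {"@context": "https://schema.org/", "@type": "CreativeWork"}

print(is_sta(tagged), is_sta(untagged))
```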

CreateAction vs CreativeWork: OK, I see your points. It's also good to see there is a way to record the software that produced an annotation, though I'd say using isBasedOn to track the software rather stretches the intended meaning of this property (which is usually an input/output relation and doesn't extend to the thing that produced the output from the input).

Also, I think the general problem here is that schema.org doesn't have generic entities of mereological type (eg, part-of with any domain/range) or provenance type (eg, derived-from or based-on with any domain/range).

> Do you mean that bioschemas:SemanticTextAnnotation should be the expected type for some instrument property to some sort of SemanticTextAnnotationAction?

Basically, yes. With this other approach, the annotation is an action whose input is a text and whose output is a defined term (or, as I wrote, something else).

gtsueng · 2023-04-17T18:15:45Z

Hi @marco-brandizi--

On the matter of mainEntity, I will defer to @ljgarcia, as she was involved in the development of the SemanticTextAnnotation profile.

While bioschemas:SemanticTextAnnotation has not been defined as a new bioschemas Type that will be pushed to schema.org, it is still a bioschemas profile. From a JSON-schema/JSON-LD expression/definition standpoint, it is still a subclass of CreativeWork, even if it is not a bioschemas type. If you look at the JSON-LD file defining this "class", it's literally a subclass of CreativeWork. Hence, you could point your dct:conformsTo to the bioschemas STA profile.
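
For illustration, a sketch of how a consumer might recognise an STA through dct:conformsTo. The profile IRI `STA_PROFILE` below is a placeholder to be replaced by the actual versioned profile URL, and the helper `conforms_to` is hypothetical.

```python
import json

# Placeholder: substitute the real, versioned Bioschemas profile URL.
STA_PROFILE = "https://example.org/profiles/SemanticTextAnnotation"

annotation = {
    "@context": ["https://schema.org/", {"dct": "http://purl.org/dc/terms/"}],
    "@type": "CreativeWork",
    "dct:conformsTo": {"@id": STA_PROFILE},
    "text": "BRCA1 is involved in DNA repair.",
    "mainEntity": {
        "@type": "DefinedTerm",
        "@id": "http://example.org/ontology/DNA_repair",  # placeholder
    },
}


def conforms_to(node, profile_iri):
    """True if the node declares conformance to the given profile IRI."""
    value = node.get("dct:conformsTo")
    values = value if isinstance(value, list) else [value]
    return any(isinstance(v, dict) and v.get("@id") == profile_iri
               for v in values)


print(json.dumps(annotation, indent=2))
print(conforms_to(annotation, STA_PROFILE))
```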

ljgarcia · 2023-05-26T15:18:31Z

This profile has never been used and has been sitting there since the beginning. The date on the latest version comes from a mass update that added sameAs as a recommended property on all profiles. This profile is a candidate for deprecation.

marco-brandizi · 2023-05-29T10:52:38Z

@ljgarcia thanks for the clarification. I think I'll stick to the action approach for now. But I think there are a number of text mining use cases (and, in general, provenance tracking cases) that require definitions like the ones in this profile.
