`filteringTerms`: Add `excluded` flag #63

mbaudis · 2023-03-27T06:46:20Z

There is currently no way to exclude specific ontologyTerms in filter queries although this would be desirable to expand query options (especially regarding the limitation of not having Booleans) and to stay in line with Phenopackets (which has an excluded flag e.g. in PhenotypicFeature.

(There is a current workaround to use some custom filter design, e.g. pre-pending an ontologyTerm in a request with ! - but this is non-standard & only would work for individual Beacons.)

Proposal:

  OntologyFilter:
    type: object
    description: Filter results to include records that contain a specific ontology
      term.
    required:
      - id
    properties:
      id:
        type: string
        description: >-
          Term ID to be queried, using CURIE syntax where possible.
        example: HP:0002664
      includeDescendantTerms:
        type: boolean
        default: true
        description: >-
          Define if the Beacon should implement the ontology hierarchy,
          thus query the descendant terms of `id`.
      similarity:
        type: string
        enum:
          - exact
          - high
          - medium
          - low
        default: exact
        description: >-
          Allow the Beacon to return results which do not match the filter
          exactly, but do match to a certain degree of similarity. The Beacon defines
          the semantic similarity model implemented and how to apply the thresholds
          of 'high', 'medium' and 'low' similarity.
      scope:
        type: string
        description: The entry type to which the filter applies
        example: biosamples
      excluded:
        description: >-
          Flag to indicate whether the filtering term was observed or not. The default is `false`,
          _i.e._ will be selected for **existence** of the term.
        type: boolean
        default: false

The text was updated successfully, but these errors were encountered:

mbaudis · 2024-06-03T14:52:42Z

Use case examples

All breast adenocarcinoma samples by NCIt (NCIT:C5214), excluding triple-negative (NCIT:C71732)

This is a simple use case where a diagnosis is queried w/ the explicit exclusion of one specific subtype. In principle this could be achieved by an AND query for all wanted codes at the same hierarchy level as NCIT:C71732 but this requires understanding of the alternative codes and branch structures etc. Also more Beacon-like: "answer a simple query as good as possible".

filters:
  - id: "NCIT:C5214"
  - id: "NCIT:C71732"
    excluded: true

GET:

?filters=NCIT:C5214,!NCIT:C71732

COVID19 in a non-diabetic patient with cardiovascular risk factors

This is based on an example provided inn the Phenopackets documentation.

filters:
  - id: "NCIT:C2985"                # Diabetes Mellitus
    excluded: true
  - id: "NCIT:C34830"               # Cardiomyopathy"
  - id: "NCIT:C80389"               # Chronic Kidney Disease, Stage 3"
  - id: "NCIT:C3283"                # Obesity"
  - id: "MONDO:0100096"             # COVID-19"

GET:

?filters=!NCIT:C2985,NCIT:C34830,NCIT:C80389,NCIT:C3283,MONDO:0100096

redmitry · 2024-06-04T07:00:49Z

Hi all,

Just a naïve question. May the exclusion (negation) be included in filtering logic along with AND & OR?.
I mean do we need this special case for negation?

Cheers,

D.

mbaudis · 2024-06-04T07:48:45Z

@redmitry These are actually 2 questions.

May the exclusion (negation) be included in filtering logic along with AND & OR?

If Boolean queries are developed than negated filters would be included in that scenario; just treat them as another filter, with a flag.

do we need this special case for negation

In the Booleans? If allowing Boolean combinations a "normal" query (some filters connected by AND) are just a special case - so yes. It is not so much about dreaming up scenarios where this needs to happen (e.g. retrieving 2 groups of results in a single query); excluded is just a potential flag of any filter (such as similarity).

In any case adding the excluded flag as an option seems straightforward (and is IMO clearer as the procedural similarity flag...¹).

One actually could re-use similarity like "similarity": "NONE" or such but this would introduce a new special option & not be in line w/ e.g. Phenopackets - so please forget I wrote this. ↩

redmitry · 2024-06-04T08:47:20Z

In any case adding the excluded flag as an option seems straightforward (and is IMO clearer as the procedural similarity flag...

I never understood the exact meaning of similarity for practical implementation.

From the usecase I understand that we want all but triple negative breast cancers. Without exclusion we would need repeat too many identifiers one by one.

?filters=NCIT:C5214,!NCIT:C71732 looks good, but for the POST I would use generic logical query...

"query": {
    "filters": [ 
         {"id": "NCIT:C5214"}, {"id": "NCIT:C71732"}
    ],
    "filters_logic": "NCIT:C5214 & !NCIT:C71732"
}

I mean that we may use "filters_logic" (similar to @anuradhawick "queryExpression": "(A|B)&(C&D|E&F)&!G").
I would prefer to repeat identifiers for robustness...

Cheers,

D.

gsfk · 2024-06-05T19:28:16Z

If I understand correctly, the proposal is to be able to make a query and get back results where a particular ontology is absent... is that correct? But this is quite a bit different from the phenopackets excluded field, where a term is present, but marked as "excluded": "true" (ie this condition was specifically ruled out). Obviously terms that are excluded in this sense will not be missing from the data.

Is "missing" or "absent" a less confusing name for this feature? Especially for those of us already using phenopackets with beacon.

anuradhawick · 2024-06-06T06:10:20Z

@redmitry happy with your implementation, but not sure if it would extend to support includeDescendentTerms, Alphanumeric filters and so on. Especially with LLMs in horizon similarity searches will take prominence searching "Melanoma or a related condition" in a vector space. In this case it would be nice to still have filters as a JSON object with other attributes such as descendent terms, similarity LOW/HIGH etc.

But keen to see where it lands :)

@gsfk one area this approach shines is when Cancer X is covered by datasets from HPO and SNOMED ontologies. Logical expressions makes querying this magically easy. Also when there are mutually exclusive conditions like "from Australia" or "from New Zealand" kind of queries are nearly impossible in one go without some logic.

If we can achieve logic, I think we cover all the bases for "missing", etc.

dbujold · 2024-06-06T14:14:33Z

I'll echo @gsfk point, it's not clear here whether we're talking about filtering on properties that are explicitly mentioned as excluded (as Phenopackets does), or if we're talking about adding a flag that checks whether a property is absent from a given record.

mbaudis · 2024-06-06T14:29:39Z

O.k., several convincing arguments although they fall into different camps: 1. the excluded flag would be ambiguous inits use since it doesn't define an explicit test so far (on could make this the default behavior, though - which in case of the diagnosis test would require to see if the excluded one is part of the existing filters etc.) but this requires to many conditions to agree on. The 2. argument is that a full conditional query model already solves such issues.

So excluded is too specific, but an expression of a negating condition (absent, missing, excluded) seems good - unless solving this through the query language (which IMO is still something "emergent").

negated: true? ... which also is a clearer expression of the filter value's intent, less tied to the implementation.

Btw. the comment by @gsfk Gordon actually points to a potential need for a specific excluded: true solution in the PXF sense...

gsfk · 2024-06-18T17:24:00Z

Surprisingly the name "negated" still seems ambiguous to me, in the same way as above (does it mean "negate this particular term in my query" or "this term is negated in the data")? "Negated" is actually the original name of the phenopackets excluded field.

mbaudis · 2024-06-26T07:49:57Z

@gsfk While I started w/ pointing to PXF it really is different when using the term negation here. So probably negated or absent is the best choice: One expresses that the term should not be found w/o the requirement that the subject has been explicitly tested.

All: Comments on absent?!

redmitry · 2024-06-26T08:20:47Z

I implemented a possible prototype and still do not understand the difference between "excluded" and "negation".

"query": {
  "filters": [
    { "id": "NCIT:C9335" }
  ],
  "filtersLogic": "NCIT:C9335 & !NCIT:C5213"

does this query do what we expect with "excluded"?

D.

anuradhawick · 2024-06-26T08:28:09Z

@redmitry i think your solution should imply exclusion. It looks really good and intuitive.

Can we use indices in logic expression? There's a slim chance same ontology term or filter appearing in different scopes. For example "NCIT:123456" filter could be present across multiple scopes. So if it does this ontology code in filter logic becomes ambiguous.

Also complex queries like

A & B | !A & C has to repeat same term multiple times.

What are your thoughts on that?

redmitry · 2024-06-26T08:33:19Z

The scopes are defined in the filters[] array. The "filtersLogic" only defines the logic. I use "ids" (or better should be "labels") in "filtersLogic" because we may reuse the same filter in a complex query.
The question is should "NCIT:C5213" (a subclass of "NCIT:C9335") be in "filters" also? By default descendant terms are enabled.

mbaudis · 2024-06-26T08:36:01Z

@redmitry Some notes:

filters[0].id should be NCIT:C9335, not double-prefixed
not sure how the filter logic would work

Regarding the flag names: As Gordon pointed out, excluded has a bit of a "tested for that explicitly" meaning, stronger than "absent". So e.g. in the case of testing against male sex the !male could be interpreted as
* absent: either female or not indicated / no value etc.
* excluded: only females match
So even if we could say that the flag applies to filter, not the data, it seems sensible to use "this term is absent" instead of "this term has been excluded".

Also, I really would prefer to separate the discussion of the filter flag from "how does this get implemented" ¯\_(ツ)_/¯

redmitry · 2024-06-26T08:45:53Z

@mbaudis

filters[0].id should be NCIT:C9335, not double-prefixed

fixed (typo).

This is a question of interpretation and what actually be generated for execution.

For alphanumeric filters "! Weight > 25" is interpreted as "Weight <= 25"
For ontological filters "!NCIT:C5213" "disease ne NCIT:C5213". We may include "and exists"

Note that in a query disease $in: ["NCIT:C9335", "NCIT:C4671", ...] and $ne NCIT:C5213 checks the disease field existence in the first "and" operand.

mbaudis · 2024-06-26T08:59:51Z

@redmitry My take here is:

We should not use this option for filters with numeric expressions since we already have other ways to do this here
We should stick to the "not seen" way of interpretation (absent), not to the "formally excluded". While there are cases for the latter it represents a specific subset of the former and goes (IMO) a bit too much in trying to be overly precise (and then losing on implementations). This seems to be in line with/ some addtl. emails (@jrambla & Tony).

jrambla · 2024-07-11T17:07:46Z

My 2 cents:

We already have the excluded flag in phenotypicFeature schema, Its meaning is stated in PXF and "inherited" by Beacon.
We don't have a syntax for expressing that excluded in the query. @tb143 we could need one
I understand that we are planning to add the NOT boolean operator together with OR logic. @tb143 please confirm
If we adopt the NOT operator, we could not be able to avoid users adding them to any filter (ontology or alphanum), and we shouldn't. The user should be allowed to say ! Weight > 25" as synonym for Weight <= 25. Whatever is clear to the user should be fine to us.
I guess that, in any case, some sanityzation would be required in the implementations to avoid queries like Age< 250 that will return everything or playing with boolean logic to trick the safety measures, But this should go into the implementation not in the spec.

tb143 · 2024-07-23T17:30:38Z

Following a Filters Scout discussion on this issue, I will try to summarise what we are asking for votes on (👍,👎 or leave +1,-1).

Question: Do you agree that the ontology filter query options should include an excluded flag? Beacon will reuse the Phenopackets definition of excluded, essentially "tested for, but not found/observed".

Some considerations that have emerged from following the discussion:

You are voting on the principle of including an ontology filter excluded flag in the spec. The addition of other (related) flags - or how excluded could get implemented - can be taken up in separate issues/discussions.
This proposal is independent of other parallel proposals and discussions, for example AND/OR filter logic. If complementary proposals are accepted by the Scout, examples will be given of how they could work together.

jrambla · 2024-07-23T18:05:14Z

+1

mbaudis · 2024-10-29T22:19:28Z

+1

jrambla · 2024-10-30T10:52:24Z

I also insist is that "excluded" is not an operator but a different property to take into account.

jrambla assigned tb143 May 5, 2023

mbaudis mentioned this issue Jul 4, 2023

Question: AND/OR combined filter logic #100

Closed

jrambla added the Dev scout label Mar 21, 2024

mbaudis added the Filters scout label Mar 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`filteringTerms`: Add `excluded` flag #63

`filteringTerms`: Add `excluded` flag #63

mbaudis commented Mar 27, 2023 •

edited

Loading

mbaudis commented Jun 3, 2024 •

edited

Loading

redmitry commented Jun 4, 2024

mbaudis commented Jun 4, 2024 •

edited

Loading

redmitry commented Jun 4, 2024 •

edited

Loading

gsfk commented Jun 5, 2024 •

edited

Loading

anuradhawick commented Jun 6, 2024

dbujold commented Jun 6, 2024

mbaudis commented Jun 6, 2024 •

edited

Loading

gsfk commented Jun 18, 2024

mbaudis commented Jun 26, 2024 •

edited

Loading

redmitry commented Jun 26, 2024 •

edited

Loading

anuradhawick commented Jun 26, 2024

redmitry commented Jun 26, 2024 •

edited

Loading

mbaudis commented Jun 26, 2024 •

edited

Loading

redmitry commented Jun 26, 2024 •

edited

Loading

mbaudis commented Jun 26, 2024 •

edited

Loading

jrambla commented Jul 11, 2024

tb143 commented Jul 23, 2024

jrambla commented Jul 23, 2024

mbaudis commented Oct 29, 2024

jrambla commented Oct 30, 2024

filteringTerms: Add excluded flag #63

filteringTerms: Add excluded flag #63

Comments

mbaudis commented Mar 27, 2023 • edited Loading

mbaudis commented Jun 3, 2024 • edited Loading

Use case examples

All breast adenocarcinoma samples by NCIt (NCIT:C5214), excluding triple-negative (NCIT:C71732)

COVID19 in a non-diabetic patient with cardiovascular risk factors

redmitry commented Jun 4, 2024

mbaudis commented Jun 4, 2024 • edited Loading

Footnotes

redmitry commented Jun 4, 2024 • edited Loading

gsfk commented Jun 5, 2024 • edited Loading

anuradhawick commented Jun 6, 2024

dbujold commented Jun 6, 2024

mbaudis commented Jun 6, 2024 • edited Loading

gsfk commented Jun 18, 2024

mbaudis commented Jun 26, 2024 • edited Loading

redmitry commented Jun 26, 2024 • edited Loading

anuradhawick commented Jun 26, 2024

redmitry commented Jun 26, 2024 • edited Loading

mbaudis commented Jun 26, 2024 • edited Loading

redmitry commented Jun 26, 2024 • edited Loading

mbaudis commented Jun 26, 2024 • edited Loading

jrambla commented Jul 11, 2024

tb143 commented Jul 23, 2024

jrambla commented Jul 23, 2024

mbaudis commented Oct 29, 2024

jrambla commented Oct 30, 2024

`filteringTerms`: Add `excluded` flag #63

`filteringTerms`: Add `excluded` flag #63

mbaudis commented Mar 27, 2023 •

edited

Loading

mbaudis commented Jun 3, 2024 •

edited

Loading

mbaudis commented Jun 4, 2024 •

edited

Loading

redmitry commented Jun 4, 2024 •

edited

Loading

gsfk commented Jun 5, 2024 •

edited

Loading

mbaudis commented Jun 6, 2024 •

edited

Loading

mbaudis commented Jun 26, 2024 •

edited

Loading

redmitry commented Jun 26, 2024 •

edited

Loading

redmitry commented Jun 26, 2024 •

edited

Loading

mbaudis commented Jun 26, 2024 •

edited

Loading

redmitry commented Jun 26, 2024 •

edited

Loading

mbaudis commented Jun 26, 2024 •

edited

Loading