Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

physical quantity format #551

Merged
merged 7 commits into from
Feb 28, 2024
Merged

physical quantity format #551

merged 7 commits into from
Feb 28, 2024

Conversation

schristley
Copy link
Member

fix #504
fix #500
fix #498

@schristley schristley added DataRep Ontology Issues discussing vocabularies and ontologies labels Sep 16, 2021
@schristley schristley added this to the AIRR v1.4.0 milestone Sep 16, 2021
@schristley
Copy link
Member Author

@bussec My initial attempt just in the openapi3 schema file

@schristley
Copy link
Member Author

@javh age, that which was deprecated becomes reborn. I hope we don't need to specify a deprecation of deprecation?!

@bussec
Copy link
Member

bussec commented Sep 20, 2021

@schristley Looks good to me, as long as the point/range distinction does not create a major hazzle for search routines.

@bcorrie
Copy link
Contributor

bcorrie commented Sep 20, 2021

My only concern would be the same as @bussec. See #504

@schristley
Copy link
Member Author

The other thing I don't quite like is having the unit twice for a range, what if people start doing silly stuff like make one unit year and another month. Having to do conversions on queries sounds complicated... I'd much prefer a TimeRange object, similar to TimePoint, with a single unit.

@schristley
Copy link
Member Author

@bcorrie time interval it is, how does this look? I went with TimeInterval instead of TimeRange as I think it's more descriptive.

@schristley
Copy link
Member Author

as a side note, TimeInterval is relative, i.e. not fixed to a specific date. If we find we need to express such intervals in the future then we can create a DateInterval where the min and max are datetime

@scharch
Copy link
Contributor

scharch commented Jul 10, 2023

Leaving the referenced issues open under the assumption that this will eventually be merged....

@javh
Copy link
Contributor

javh commented Oct 16, 2023

From the call:

  • Verify that we still want to do this. @schristley, @bcorrie as this seems ADC-ish.
  • If so, resolve conflicts and merge.

@schristley
Copy link
Member Author

@javh I certainly hope this can be done for AIRR 2.0. The current design of having separate, unconnected fields (except through a semantic interpretation of the names), is very TSV-ish. I think we should be more JSON-ish by compositing fields together into an object that semantically belong together.

@schristley schristley marked this pull request as ready for review December 16, 2023 23:05
@schristley
Copy link
Member Author

The check error will be fixed with #739

@@ -2308,37 +2366,30 @@ Sample:
subset: sample
name: Disease state of sample
collection_time_point_relative:
type: number
$ref: '#/PhysicalQuantity'
Copy link
Contributor

@bcorrie bcorrie Dec 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't collection_time_point be a TimePoint, not a PhysicalQuantity

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #689

@bcorrie
Copy link
Contributor

bcorrie commented Dec 17, 2023

I also think:

  • reactivity_value and reactivity_unit in ReceptorReactivity should be a PhysicalQuantity, no?

@bcorrie
Copy link
Contributor

bcorrie commented Dec 17, 2023

I think this would also resolve #689

@bcorrie
Copy link
Contributor

bcorrie commented Dec 17, 2023

@schristley if no objections I will make the Sample.collection_time_point and ReceptorReactivity.reactivity changes.

@schristley
Copy link
Member Author

@schristley if no objections I will make the Sample.collection_time_point and ReceptorReactivity.reactivity changes.

@bcorrie sure, sounds good

@schristley
Copy link
Member Author

Hey @bcorrie , I can see you are trying to clean up this PR, let me know if I can help in any way?

@bcorrie
Copy link
Contributor

bcorrie commented Feb 8, 2024

Yes, I was working through it. Thanks for the test data fixes, I gave up last night and was going to dive back in this morning. You know the test suites better than I do clearly.

Looks like the checks have passed.

@bcorrie
Copy link
Contributor

bcorrie commented Feb 8, 2024

Only outstanding issue on this is for CellReactivity.reactivity_value should be a PhysicalQuantity which is being covered in #705 (comment)

So I think this can be merged. @javh and @bussec were both reviewers for this previously so I have left you on if you want to have a look.

@bcorrie
Copy link
Contributor

bcorrie commented Feb 8, 2024

In #646 we discuss whether collection_time_point_relative should be a TimeInterval.

It also raises the question should we have a QuantityInterval or QuantityRange object. We can define it but not use it??? Or should all of our PhyscialQuantity objects be QuantityRange objects?

Not quite done with this yet. 8-)

@bcorrie bcorrie linked an issue Feb 8, 2024 that may be closed by this pull request
@bcorrie
Copy link
Contributor

bcorrie commented Feb 8, 2024

In #689 we talk about using the label field of TimePoint in collection_time_point_relative so that we no longer need collection_time_point_reference. Is this desirable?

@bcorrie bcorrie linked an issue Feb 8, 2024 that may be closed by this pull request
@schristley
Copy link
Member Author

In #646 we discuss whether collection_time_point_relative should be a TimeInterval.

It also raises the question should we have a QuantityInterval or QuantityRange object. We can define it but not use it??? Or should all of our PhyscialQuantity objects be QuantityRange objects?

Not quite done with this yet. 8-)

Upscale quantity fields to allow either a range or point, like with age...

I'm hesitant to do that until we actually have a use case that requires it. For some quantities, the idea of measuring a range might not be semantic viable. And it has the implication that downstream analysis will always have to handle the fields as ranges in order to do calculations properly, which introduces complexity.

@bcorrie
Copy link
Contributor

bcorrie commented Feb 20, 2024

@bussec @javh I think this is OK to merge. It would be nice to have this merged before any Event stuff gets added, as Event extends/relies on some of these changes. Needs your review (or remove your self from the review list if you would prefer not to 8-)

Copy link
Member

@bussec bussec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general, some minor fixes.

lang/R/inst/extdata/airr-schema.yaml Outdated Show resolved Hide resolved
lang/R/inst/extdata/airr-schema.yaml Outdated Show resolved Hide resolved
lang/R/inst/extdata/airr-schema.yaml Outdated Show resolved Hide resolved
lang/R/tests/data-tests/good_combined_airr.json Outdated Show resolved Hide resolved
lang/R/tests/data-tests/good_combined_airr.json Outdated Show resolved Hide resolved
specs/airr-schema-openapi3.yaml Outdated Show resolved Hide resolved
specs/airr-schema-openapi3.yaml Outdated Show resolved Hide resolved
specs/airr-schema.yaml Outdated Show resolved Hide resolved
specs/airr-schema.yaml Outdated Show resolved Hide resolved
specs/airr-schema.yaml Outdated Show resolved Hide resolved
@bcorrie
Copy link
Contributor

bcorrie commented Feb 23, 2024

@schristley and I both did this in parallel - I think I merged them OK. I added a couple of adc_query_support attributes that were missing.

@bcorrie
Copy link
Contributor

bcorrie commented Feb 23, 2024

@bussec remaining issue from above is disease_length, I believe it should be a PhysicalQuantity as I don't think a TimeInterval makes sense since it is a length not a time span.

@schristley
Copy link
Member Author

@bussec remaining issue from above is disease_length, I believe it should be a PhysicalQuantity as I don't think a TimeInterval makes sense since it is a length not a time span.

I agree it should be PhysicalQuantity because it is a length or duration. If you go to a TimeInterval then what value would you put for min, why would it be anything other than zero? If min is zero, it's the same result as PhysicalQuantity, while if it's non-zero, then what is it relative to?

With patient data, we don't want to use dates, so what we are seeing with the AKC work (and you see this when getting clinical data), is that there is a defined T=0 event, and then everything after that is a relative time point. But we don't have that in AIRR yet.

In the future redesign there will be time points for specific events, so the description, "Time duration between initial diagnosis and current intervention" would suggest an event at "initial diagnosis" and an event at "current intervention" and this length could be computed from those.

@bussec
Copy link
Member

bussec commented Feb 23, 2024

@bussec remaining issue from above is disease_length, I believe it should be a PhysicalQuantity as I don't think a TimeInterval makes sense since it is a length not a time span.

@bcorrie I agree that TimeInterval (or TimePoint) sound odd, but they would enforce the correct units for a measurement of time, which PhysicalQuantity does not. It is probably more a question how we call the Time* objects, but we can worry about that another day as the current solution does not create any limits here. So no objections from my side.

@schristley
Copy link
Member Author

@bussec remaining issue from above is disease_length, I believe it should be a PhysicalQuantity as I don't think a TimeInterval makes sense since it is a length not a time span.

@bcorrie I agree that TimeInterval (or TimePoint) sound odd, but they would enforce the correct units for a measurement of time, which PhysicalQuantity does not. It is probably more a question how we call the Time* objects, but we can worry about that another day as the current solution does not create any limits here. So no objections from my side.

I see. Yes, you are right, I missed that.

So we need to fix this because PhysicalQuantity has the top ontology node as mass unit and thus technically we are not allowed to specify time units with it.

@bcorrie
Copy link
Contributor

bcorrie commented Feb 23, 2024

Do we want to add a TimeQuantity object - to enforce units. Or just have a general Quantity object with a higher root node? Or just change the current root node and leave it as a PhysicalQuantity.

I would be tempted to change the name to Quantity and change the root node to UO:0000000, unit

@bussec @schristley thoughts?

@schristley
Copy link
Member Author

I think we are pretty much required to have a TimeQuantity object to enforce time units. If we change the root to UO:0000000, unit then that will allow all types of incorrect units to be used.

An interesting aside, LinkML has slot_usage which allows for this. That is, we could define a general Quantity but then indicate when used as disease_length that its range is a time unit, while other usages of the slot might have range as mass units, and so on. I would guess how this gets translated into JSON schema ends up being not much different from what we are doing manually.

@bussec
Copy link
Member

bussec commented Feb 26, 2024

+1 for introducing a TimeQuantity object.

@bcorrie
Copy link
Contributor

bcorrie commented Feb 26, 2024

I will work on this...

@lgcowell
Copy link

lgcowell commented Feb 26, 2024 via email

@lgcowell
Copy link

lgcowell commented Feb 26, 2024 via email

@bcorrie
Copy link
Contributor

bcorrie commented Feb 26, 2024

This has been done - ready for merge, unless we can find anything else...

@bussec bussec merged commit d6e8c31 into master Feb 28, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ontology Issues discussing vocabularies and ontologies
Projects
None yet
6 participants