Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collections response fix endpoints #121

Closed
wants to merge 6 commits into from

Conversation

mbaudis
Copy link
Member

@mbaudis mbaudis commented Mar 21, 2024

This addresses the easy part of #116 (collections responses w/ strange endpoints etc.)

@mbaudis mbaudis changed the base branch from schema-urgent-fixes to main March 21, 2024 18:29
@mbaudis mbaudis requested review from redmitry and costero-e March 21, 2024 18:30
costero-e
costero-e previously approved these changes Mar 22, 2024
Copy link
Collaborator

@costero-e costero-e left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything makes sense, as cohorts and datasets endpoints should be a collection response, not a resultSet response.
I'm also seeing updated the way of showing arrays in examples:

        examples:
          - 
            - OMIABIS:0001017

This will need to be updated everywhere to follow this pattern, I guess.

@redmitry
Copy link
Collaborator

I am in doubts...
We are removing the "CollectionsResponse" and modifying "ResultsOKResponse" to be the collection.
What will happen with endpoint like?
/cohorts/{id}/individuals

Before changes, only the "root" endpoint was returning the collection an the rest "usual" ResultsOKResponse (like any other entryType e.g. "biosamples").

Are we changing the contract?

D.

@mbaudis
Copy link
Member Author

mbaudis commented Mar 22, 2024

@redmitry Oh well, you are right w/ this breaking the resultsetResponses if allowing those endpoints (which I guess we should). I guess the naming of ResultsOKResponse makes this look special but is extremely confusing. I'll iterate another version for discussion...

@costero-e costero-e dismissed their stale review March 22, 2024 08:57

Reviewer spotted other breaking fixes that need to be reviewed again.

@mbaudis
Copy link
Member Author

mbaudis commented Mar 22, 2024

Have a look at ea4324b Only for cohorts so far.

My reasoning there is that ResultsOKResponse does not make sense as a name, especially if there are several possible responses.

But generally this all is also a big argument for ditching the concept of having separate Boolean and Count response types and just having a single one for each data response flavour (collections, resultsets, filtering terms ...) where we the data payload (and count) are optional... This is probably against some "how it should be done" rule and I remember that the different response schemas looked nice when drafting them; but now we see a confusion of alternative response types and alternative payload types.

Alas, for now how about this?

@redmitry
Copy link
Collaborator

IMO "ResultsetsResponse" is more appropriate name.
So we'll have "ResultsetsResponse" which is one of three responses (boolean, count, resultsets) and the "CollectionsResponse".

@costero-e
Copy link
Collaborator

But resultSets response gives results splitted by dataset and if I'm not wrong, boolean and count responses don't do that, do they?
On the other hand, I see with your last commits @mbaudis that cohorts have the resultsets response and datasets still have the resultsOkResponse.

@mbaudis
Copy link
Member Author

mbaudis commented Mar 22, 2024

Yes and I really would like to see this (since the previous generic ResultsOKResponse is kind of confusing - at least it confused me; I thought it was a required term ...).

Now, in principle we should then propagate this everywhere we have such a generic to make it clearer what is being executed.

Pro: clarity, won't break anything since references inside of schemas.
Con: change a parameter name just for these reasons, w/o any functional gain

I'm clearly pro here, but for all entry types.

@mbaudis
Copy link
Member Author

mbaudis commented Mar 22, 2024

@costero-e I left the datasets just since I didn't want to do changes I have to reverse ...

ResultSets are actually split by any type of collection:

      setType:
        description: Entry type of resultSet. It SHOULD MATCH an entry type declared
          as collection in the Beacon configuration.
        type: string
        default: dataset

... so a datasets response for individuals could be for the dataset or split for its cohorts, theoretically.

@costero-e
Copy link
Collaborator

I think you tagged another Oriol @mbaudis, jeje. Ok, thank you, it's good you are just giving an example using cohorts.
On the other hand, what I meant is that we don't have resultSets in a typical Count or Boolean response:

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "description": "Payload definition for the \"count\" response.",
    "properties": {
        "exists": {
            "$ref": "../../common/beaconCommonComponents.json#/definitions/Exists"
        },
        "numTotalResults": {
            "$ref": "../../common/beaconCommonComponents.json#/definitions/NumTotalResults",
            "description": "Total number of results."
        }
    },
    "required": [
        "exists",
        "numTotalResults"
    ],
    "type": "object"
}

For me, putting the "countResponse" and "booleanResponse" under a "resultSetsresponse" makes it a bit confusing. That's why I'm more in favor of keeping "resultsOkResponse".

@mbaudis
Copy link
Member Author

mbaudis commented Mar 22, 2024

@costero-e It is confusing in any case. Therefore I would just prefer that we have only a ResultSetsResponse with a Boolean granularity instead of switching responses.

But overall the new way would be slightly less confusing... You need to name the response anyway and ResultsOKResponse + CollectionsResponse is worse compared to ResultsetsResponse + CollectionsResponse.

Anyway, naming is schema internal. And IMO it is correct to have an endpoint with a ResultsetsResponse, which can take the format of one of the options. If not doing it for ResultsetsResponse then you have additional inconsistencies.

But again, this is a reason for having a single response for and the granularity handled inside it.

@mbaudis
Copy link
Member Author

mbaudis commented Mar 22, 2024

... and another inconsistency is the "cohorts have individuals for records retrieval while datasets have all entry types". I understand the argument ("cohorts as a group of individuals") but still you may want to get the samples etc. Not wanting to change right now but good to keep in mind.

@mbaudis
Copy link
Member Author

mbaudis commented Mar 25, 2024

@redmitry @costero-e So WDYT - moving ahead in the current version w/o going for the other "ResultsOKResponse" instances, but doing it for the collections which have these 2 types of responses? I'd like to cloes #116

I guess we should have the "response schemas for entry types should be defined somewhere" in a general "remove dependencies on OpenAPI" discussion/issue.

@redmitry
Copy link
Collaborator

Again we are changing the contract:

was:

"/cohorts/{id}": {
            "get": {
                "responses": {
                    "200": {
                        "$ref": "#/components/responses/ResultsOKResponse"
                    },

now:

        "/cohorts/{id}": {
            "get": {
                "responses": {
                    "200": {
                        "$ref": "#/components/responses/CollectionsResponse"
                    },

Most probably that was the error in the spec coz I remember reporting this to Oriol #98, but since I implemented my java implementation from spec.

Should we fix both the spec. and the implementation?

Cheers,

Dmitry

@mbaudis
Copy link
Member Author

mbaudis commented Mar 25, 2024

@redmitry This just changes the name ResultsOKResponse => CollectionsResponse; the operation is getting a single collection (as it should).

I've fixed now the messy partial changes in the last commit (all the separate ones should be squashed...).

This fixes the wrong response for some of teh collections endpoints and changes the (definitely now) ambiguous ResultsOKResponse to the correct instances of

  • CollectionsResponse
  • ResultsetsResponse

@redmitry
Copy link
Collaborator

This just changes the name ResultsOKResponse => CollectionsResponse; the operation is getting a single collection (as it should).

I do not doubt that it probably (😏) should, but the current spec's "ResultsOKResponse" is
"oneOf": [ beaconBooleanResponse.json, beaconCountResponse.json, beaconResultsetsResponse.json ]

So that my comment: if we want the "/cohorts/{id}" to return "/a single collection/" we have to change implementations also...

Dmitry

mbaudis and others added 3 commits March 25, 2024 15:13
collections response new prototypes cleanup

This fixes the wrong response for some of teh collections endpoints and changes the (definitely now) ambiguous ResultsOKResponse to the correct instances of

* CollectionsResponse
* ResultsetsResponse
This is just a temporary removal of files colliding w/ case insensitive file systems. We'll have to address the doc file re-generation at some point ...
This is just a temporary removal of files colliding w/ case insensitive file systems. We'll have to address the doc file re-generation at some point ...
@mbaudis mbaudis force-pushed the CollectionsResponse-fix-endpoints branch from 3500d55 to b854dc4 Compare March 25, 2024 14:13
@mbaudis
Copy link
Member Author

mbaudis commented Mar 25, 2024

@redmitry That is actually in line with all entry types: biosamples etc. also only know the ResultsOKResponse which is defined as

    ResultsOKResponse:
      description: Successful operation.
      content:
        application/json:
          schema:
            oneOf:
              - $ref: https://raw.githubusercontent.com/ga4gh-beacon/beacon-v2/main/framework/json/responses/beaconBooleanResponse.json
              - $ref: https://raw.githubusercontent.com/ga4gh-beacon/beacon-v2/main/framework/json/responses/beaconCountResponse.json
              - $ref: https://raw.githubusercontent.com/ga4gh-beacon/beacon-v2/main/framework/json/responses/beaconResultsetsResponse.json

... so yes, a biosamples/{id} response will return a beaconResultsetsResponse where the single biosample is wrapped in results (list) in resultSets (list) in the response (object).

One can argue that we should have a 3rd type of response which is an beaconIdrequestResponse where the result could be the single document (and the responseSummary would be as of now).

How do you respond to {id} requests?

@redmitry
Copy link
Collaborator

How do you respond to {id} requests?
Currently all /{entryType}/{} endpoints return ResultsetsResponse(s) even for "collection" entryTypes.

https://beacons.bsc.es/beacon/v2.0.0/cohorts/CINECA_synthetic_cohort_UK1
https://beacon-apis-demo.ega-archive.org/api/cohorts/CINECA_synthetic_cohort_EUROPE_UK1

@mbaudis
Copy link
Member Author

mbaudis commented Mar 25, 2024

@redmitry Ah; that's what you mean; i.e. the wrong response type (BeaconResultsetsResponse instead of BeaconCollectionsResponse). Yes, that should just be a single collection info. Alt least according to the description in the endpoints files:

  /cohorts/{id}:
    parameters:
      - $ref: '#/components/parameters/entryId'
    get:
      parameters:
        - $ref: '#/components/parameters/requestedSchema'
      description: Get details about one cohort, identified by its (unique) 'id'

... and

  /datasets/{id}:
    parameters:
      - $ref: '#/components/parameters/entryId'
    get:
      parameters:
        - $ref: '#/components/parameters/requestedSchema'
      description: Get details about one dataset, identified by its (unique) 'id'
      operationId: getOneDataset
      tags:

... which is IMO the clear intention; in BeaconCollectionsResponse this would result in a response.results list of one collection. (I didn't write this ¯\_(ツ)_/¯).

My point above was already a bit further - a separate BeaconIdResponse ... or such which would move the list -> object, i.e. a single cohort or biosample ... in response.result.

@costero-e
Copy link
Collaborator

Agree @redmitry that this will change the implementations but anyway, I'm changing it almost every day for adding new features that are on demand so no problem. Plus, it makes more sense to have a collectionResponse rather than resultSetsResponse as @mbaudis says. For me, the PR is good to be merged and we can proceed.

@redmitry
Copy link
Collaborator

So that. Once merged, I update java implementation in accordance.

@mbaudis
Copy link
Member Author

mbaudis commented Mar 25, 2024

Great! Note: I removed some schema .md files from the documentation since they always break merges on Mac OS (same name, different case). This doesn't affect anything since the documentation branch is kept separately, and the website is being built from it. IMO the schema -> .md -> web scripts should be redone (and rerun!) but that's for a separate discussion.

@jrambla
Copy link
Contributor

jrambla commented Mar 25, 2024

Without having read the whole thread in detail yet, as I'm trying to understand which issue we are trying to solve here...
The original issue was described by @redmitry as

cohorts/endpoints.json and datasets/endpoints.json have CollectionsResponse response object derfined as a
choice of beaconBooleanResponse, beaconCountResponse and beaconCollectionsResponse which is wrong.

Should be defined as beaconCollectionsResponse precising the type according the endpoint cohorts/defaultSchema.json or cohorts/defaultSchema.json.

The assertion "is wrong" needs to be clarified. What is exactly wrong in that?
A collection response is different from a resultsets response.

@mbaudis
Copy link
Member Author

mbaudis commented Mar 25, 2024

@jrambla Yeah, well, this filtered down to the collections response in cohorts mapping to a resultsets response as the data option in the beaconOKresponse (in contrast to what was written everywhere else). But we need a resultset response, too, for the cohorts/{id}/individuals (and more in datasets) items. Unless we say they are also just a list since the cohort / dataset is already the wrapper ... But this just isn't clear; it was wrong in any case and the current change basically allows to have now resultsets for the records payloads (e.g. a /datasets/{id}/ endpoint could have individuals from mutiple cohorts resultsets).

@mrueda
Copy link
Collaborator

mrueda commented Mar 25, 2024

Great! Note: I removed some schema .md files from the documentation since they always break merges on Mac OS (same name, different case). This doesn't affect anything since the documentation branch is kept separately, and the website is being built from it. IMO the schema -> .md -> web scripts should be redone (and rerun!) but that's for a separate discussion.

@mbaudis I'll have a look and rename the clashing .md files. My idea was to create a Gihtub action to run the scripts...but that was before we had the re-branching...

@jrambla
Copy link
Contributor

jrambla commented Mar 25, 2024

I fear this is incorrect.
Cohorts/{id} and datasets/{id} should return details of THAT collection, NOT the contents of the collection.
In contrast, cohorts/{id}/individuals will return a resultset with the individuals (simplyfying at this level of discussion)

@redmitry
Copy link
Collaborator

redmitry commented Mar 25, 2024

The assertion "is wrong" needs to be clarified. What is exactly wrong in that?
A collection response is different from a resultsets response.

Well.. I may probably messed it up...
Thing you are right is that we have 2 thing here... my initial issue and the response from /cohorts/{id}
I would pause it as Jordi suggest.

@costero-e
Copy link
Collaborator

Yes, @jrambla is right that this is more a change than a fix and can wait. Maybe it needs more discussion on whether the solution proposed here is better than the one we had so I would also stop the PR until it is properly discussed.

@mbaudis
Copy link
Member Author

mbaudis commented Mar 27, 2024

I'm closing this - see fresh start in #123

@mbaudis mbaudis closed this Mar 27, 2024
@mbaudis mbaudis deleted the CollectionsResponse-fix-endpoints branch March 27, 2024 09:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants