dcat:distribution in dcat:DataSets #3995

Closed
arnoweiss opened this issue Mar 12, 2024 Discussed in #3697 · 2 comments · Fixed by #4088

Comments

@arnoweiss

Opening an issue from an unresolved discussion. Since then, the payloads have changed slightly; a more recent example is below [1]. The behavior from the initial discussion, however, still persists:

  1. The dcat:distribution section on a dcat:Dataset still returns all transfer profiles without regard to how the edc:Asset was registered.
  2. dcat:accessService still points to a literal. So even in a dcat:Catalog document, the data isn't linked.
  3. The dct:format property does point to a resource, but the @id field isn't namespaced, so it's hard to dereference it to a concept. I think I remember @jimmarino saying the formats should be maintained in the dspace context, but this isn't imported.

[1]

{
    "@id": "3c81f9fe-1ff9-48fa-b8cd-5cc2ea1758f1",
    "@type": "dcat:Catalog",
    "dcat:dataset": {
        "@id": "86cee14d-1fe7-4227-9a7b-a7b59513127d",
        "@type": "dcat:Dataset",
        "odrl:hasPolicy": {
            "@id": "Y29udHJhY3QtODZjZWUxNGQtMWZlNy00MjI3LTlhN2ItYTdiNTk1MTMxMjdk:ODZjZWUxNGQtMWZlNy00MjI3LTlhN2ItYTdiNTk1MTMxMjdk:ZTM3ZjZjZTktZGNmZi00YjJhLWJiNTMtOWMwNTE5MmZkZWRj",
            "@type": "odrl:Set",
            "odrl:permission": {
                "odrl:target": "86cee14d-1fe7-4227-9a7b-a7b59513127d",
                "odrl:action": {
                    "odrl:type": "http://www.w3.org/ns/odrl/2/use"
                },
                "odrl:constraint": {
                    "odrl:leftOperand": "https://w3id.org/tractusx/v0.0.1/ns/Membership",
                    "odrl:operator": {
                        "@id": "odrl:eq"
                    },
                    "odrl:rightOperand": "active"
                }
            },
            "odrl:prohibition": [],
            "odrl:obligation": [],
            "odrl:target": {
                "@id": "86cee14d-1fe7-4227-9a7b-a7b59513127d"
            }
        },
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dct:format": {
                    "@id": "HttpProxy-PUSH"
                },
                "dcat:accessService": "bb176d2c-4b3d-4ff5-814a-bfbfee6bfab8"
            },
            {
                "@type": "dcat:Distribution",
                "dct:format": {
                    "@id": "AzureStorage-PUSH"
                },
                "dcat:accessService": "bb176d2c-4b3d-4ff5-814a-bfbfee6bfab8"
            },
            {
                "@type": "dcat:Distribution",
                "dct:format": {
                    "@id": "HttpData-PULL"
                },
                "dcat:accessService": "bb176d2c-4b3d-4ff5-814a-bfbfee6bfab8"
            },
            {
                "@type": "dcat:Distribution",
                "dct:format": {
                    "@id": "AmazonS3-PUSH"
                },
                "dcat:accessService": "bb176d2c-4b3d-4ff5-814a-bfbfee6bfab8"
            }
        ],
        "http://myname.space/property": {
            "@id": "https://w3id.org/catenax/taxonomy#MyConcept"
        },
        "id": "86cee14d-1fe7-4227-9a7b-a7b59513127d"
    },
    "dcat:service": {
        "@id": "bb176d2c-4b3d-4ff5-814a-bfbfee6bfab8",
        "@type": "dcat:DataService",
        "dct:terms": "connector",
        "dct:endpointUrl": "https://mycontrol.plane/api/v1/dsp"
    },
    "participantId": "BPNL00000007RG4F",
    "@context": {
        "@vocab": "https://w3id.org/edc/v0.0.1/ns/",
        "edc": "https://w3id.org/edc/v0.0.1/ns/",
        "tx": "https://w3id.org/tractusx/v0.0.1/ns/",
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "https://purl.org/dc/terms/",
        "odrl": "http://www.w3.org/ns/odrl/2/",
        "dspace": "https://w3id.org/dspace/v0.8/"
    }
}
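
Points 2 and 3 can be checked mechanically against a payload like the one above. The following is only a minimal sketch: the helper names (`as_list`, `is_namespaced`, `inspect_catalog`) are hypothetical and not part of EDC, and it assumes a compacted document whose `@context` is a plain prefix map, as in the example.

```python
def as_list(value):
    """Compacted JSON-LD may hold a single object where a list is expected."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def is_namespaced(identifier, context):
    """True for an absolute IRI or a compact IRI whose prefix is declared."""
    if "://" in identifier:
        return True
    prefix, sep, _ = identifier.partition(":")
    return bool(sep) and prefix in context


def inspect_catalog(catalog):
    """Collect (finding, value) pairs for every distribution in the catalog."""
    context = catalog.get("@context", {})
    findings = []
    for dataset in as_list(catalog.get("dcat:dataset")):
        for dist in as_list(dataset.get("dcat:distribution")):
            service = dist.get("dcat:accessService")
            if isinstance(service, str):
                # A plain string is a literal, not a link to the dcat:service node.
                findings.append(("literal-access-service", service))
            format_id = dist.get("dct:format", {}).get("@id", "")
            if not is_namespaced(format_id, context):
                findings.append(("format-not-namespaced", format_id))
    return findings


# Trimmed copy of the payload above (two of the four distributions).
sample = {
    "@type": "dcat:Catalog",
    "dcat:dataset": {
        "@id": "86cee14d-1fe7-4227-9a7b-a7b59513127d",
        "@type": "dcat:Dataset",
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dct:format": {"@id": "HttpProxy-PUSH"},
                "dcat:accessService": "bb176d2c-4b3d-4ff5-814a-bfbfee6bfab8",
            },
            {
                "@type": "dcat:Distribution",
                "dct:format": {"@id": "HttpData-PULL"},
                "dcat:accessService": "bb176d2c-4b3d-4ff5-814a-bfbfee6bfab8",
            },
        ],
    },
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "https://purl.org/dc/terms/",
    },
}

findings = inspect_catalog(sample)
```

For the trimmed sample, every distribution yields both findings: the access service is a bare string and the format id carries neither a scheme nor a declared prefix.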

Discussed in #3697

Originally posted by arnoweiss December 11, 2023
Catalog requests return dcat:Dataset entries structured as follows. Please note the dcat:distribution section.

{
  "@id": "10b1b0f3-5a67-4eee-9404-5a300356a50d",
  "@type": "dcat:Catalog",
  "dcat:dataset": [
    {
      "@id": "<ASSET-ID>",
      "@type": "dcat:Dataset",
      "odrl:hasPolicy": {
        "@id": "Y29udHJhY3QtZ2V0LTE=:anNvbi1nZXQtMQ==:MDEwODg2ZTItZDhmNi00Y2NjLWFhMWYtY2U2Y2JmYjlmMWQz",
        "@type": "odrl:Set",
        "odrl:permission": {
          "odrl:target": "<ASSET-ID>",
          "odrl:action": {
            "odrl:type": "http://www.w3.org/ns/odrl/2/use"
          },
          "odrl:constraint": {
            "odrl:leftOperand": "https://w3id.org/tractusx/v0.0.1/ns/Membership",
            "odrl:operator": {
              "@id": "odrl:eq"
            },
            "odrl:rightOperand": "active"
          }
        },
        "odrl:prohibition": [],
        "odrl:obligation": [],
        "odrl:target": "<ASSET-ID>"
      },
      "dcat:distribution": [
        {
          "@type": "dcat:Distribution",
          "dct:format": {
            "@id": "HttpProxy"
          },
          "dcat:accessService": "b4f2c6b6-d3d1-46e2-a517-6912b7f8a509"
        },
        {
          "@type": "dcat:Distribution",
          "dct:format": {
            "@id": "AmazonS3"
          },
          "dcat:accessService": "b4f2c6b6-d3d1-46e2-a517-6912b7f8a509"
        }
      ],
      "edc:description": "Json Get Asset",
      "edc:id": "<ASSET-ID>",
      "dct:type": {
        "@id": "https://my-namespa.ce/my-asset-type"
      }
    }
  ],
  "dcat:service": {
    "@id": "b4f2c6b6-d3d1-46e2-a517-6912b7f8a509",
    "@type": "dcat:DataService",
    "dct:terms": "connector",
    "dct:endpointUrl": "https://provider-data.plane/api/v1/dsp"
  },
  "edc:participantId": "BPNL000SAP000003",
  "@context": {
    "dct": "https://purl.org/dc/terms/",
    "tx": "https://w3id.org/tractusx/v0.0.1/ns/",
    "edc": "https://w3id.org/edc/v0.0.1/ns/",
    "dcat": "https://www.w3.org/ns/dcat/",
    "odrl": "http://www.w3.org/ns/odrl/2/",
    "dspace": "https://w3id.org/dspace/v0.8/"
  }
}

dcat:distribution currently always returns two entries: one with "AmazonS3" and one with "HttpProxy" as dct:format. As I understand this section in the context of an EDC, it should signify the means a consumer can use to access the data, in this case either the HTTP or the S3 data plane. In its current behavior, both distributions are returned regardless of the edc:type that the data provider set in the DataAddress object of the POST /v3/assets request. So here are a couple of hypotheses to discuss:

  • The catalog should return only those dcat:distributions through which the backend's data is actually available.
  • dcat:accessService should contain a resource, not a literal.
  • The range of edc:type (currently AmazonS3, HttpData) and dct:format (AmazonS3, HttpProxy) should be the same.
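
To make the first two hypotheses concrete, here is a rough sketch of what such a filter could look like. It is not how EDC builds the catalog: `FORMATS_BY_ASSET_TYPE` and `distributions_for` are hypothetical names, and the mapping from edc:type to dct:format is an assumption that is only needed because the two value ranges currently differ (third hypothesis).

```python
# Hypothetical mapping from the DataAddress edc:type to the dct:format ids
# that make sense for it. The pairing HttpData -> HttpProxy is an assumption.
FORMATS_BY_ASSET_TYPE = {
    "HttpData": {"HttpProxy"},
    "AmazonS3": {"AmazonS3"},
}


def distributions_for(asset_type, offered_formats, service_id):
    """Keep only the formats that match how the asset was registered."""
    allowed = FORMATS_BY_ASSET_TYPE.get(asset_type, set())
    return [
        {
            "@type": "dcat:Distribution",
            "dct:format": {"@id": fmt},
            # A node reference instead of a string literal, so the
            # distribution links to the dcat:service entry of the catalog.
            "dcat:accessService": {"@id": service_id},
        }
        for fmt in offered_formats
        if fmt in allowed
    ]


http_asset = distributions_for(
    "HttpData",
    ["HttpProxy", "AmazonS3"],
    "b4f2c6b6-d3d1-46e2-a517-6912b7f8a509",
)
```

With this filter, an asset registered with edc:type HttpData would advertise only the HttpProxy distribution, and its dcat:accessService would be a resource rather than a literal.
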
@github-actions github-actions bot added the triage all new issues awaiting classification label Mar 12, 2024

This issue is stale because it has been open for 14 days with no activity.

@github-actions github-actions bot added the stale Open for x days with no activity label Mar 27, 2024
@ndr-brt ndr-brt removed the stale Open for x days with no activity label Mar 27, 2024
@ndr-brt

ndr-brt commented Apr 4, 2024

  1. The dcat:distribution section on a dcat:Dataset still returns all transfer profiles without regard to how the edc:Asset was registered.

This is correct. How the distribution is built is driven by the DistributionResolver: by default (DefaultDistributionResolver), the distribution types are obtained from the data flow manager. The ways a consumer can get the data depend on the data planes registered on the connector.
E.g., if a data plane with support for "AmazonS3" is registered, every asset can be obtained via "AmazonS3-PUSH", even if the data is located on, e.g., Azure Blob Storage.
For different needs, the DistributionResolver can be overridden.
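
The default behavior described here can be modeled in a few lines. This is a conceptual sketch in Python, not the EDC Java API: `DataPlane` and `default_distributions` are hypothetical stand-ins, and the transfer types are the ones from the example payload in this issue.

```python
from dataclasses import dataclass


@dataclass
class DataPlane:
    """Hypothetical stand-in for a data plane registered on the connector."""
    name: str
    transfer_types: list


def default_distributions(asset_id, data_planes, service_id):
    """Model of the default: asset_id is deliberately not consulted."""
    formats = []
    for plane in data_planes:
        for transfer_type in plane.transfer_types:
            if transfer_type not in formats:
                formats.append(transfer_type)
    return [
        {
            "@type": "dcat:Distribution",
            "dct:format": {"@id": transfer_type},
            "dcat:accessService": service_id,
        }
        for transfer_type in formats
    ]


planes = [
    DataPlane("http-plane", ["HttpProxy-PUSH", "HttpData-PULL"]),
    DataPlane("s3-plane", ["AmazonS3-PUSH"]),
]
service = "bb176d2c-4b3d-4ff5-814a-bfbfee6bfab8"

# Two assets with different storage backends get identical distributions.
azure_asset = default_distributions("asset-on-azure", planes, service)
http_asset = default_distributions("asset-behind-http", planes, service)
```

Because the result depends only on the registered data planes, both assets advertise the same three formats. A custom resolver would be the place to take the asset into account.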

  2. dcat:accessService still points to a literal. So even in a dcat:Catalog document, the data isn't linked.

True, I guess it was misunderstood when it was implemented. At the moment, the EDC by default has a single DataService, which is specified by the dcat:service attribute and can be used instead of the accessService for now.

  3. The dct:format property does point to a resource, however the @id field isn't namespaced. So it's hard to dereference this to a concept. I think, I remember @jimmarino saying they should be maintained in the dspace context but this isn't imported.

True, the format has not been namespaced. BTW, I don't think the formats should be described in the dspace context; in my opinion, they should be dataspace-specific.
I think this is related to the work to be done in #4031.
