Let me compare MW, logP and PSA for known oxidoreductase inhibitors #3

egonw · 2019-08-19T08:32:11Z

Should be solvable with just the EBI platform.

egonw · 2019-08-19T08:39:12Z

From ChEBI, get all oxidoreductase inhibitors:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?subject ?label
FROM <http://rdf.ebi.ac.uk/dataset/chebi>
WHERE {
  ?subject rdfs:subClassOf* <http://purl.obolibrary.org/obo/CHEBI_76725> .  # the inhibitor role and its sub-roles
  ?subject rdfs:label ?label .
}
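A minimal Python sketch of sending the ChEBI query above over the SPARQL 1.1 Protocol and reading the standard SPARQL JSON results format. It assumes that the EBI endpoint URL quoted later in this thread accepts plain GET requests with an `Accept: application/sparql-results+json` header; `build_request`, `parse_bindings` and `run` are hypothetical helper names, and only `run` touches the network.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from later in this thread; assumed to speak the SPARQL 1.1 Protocol.
EBI_SPARQL = "https://www.ebi.ac.uk/rdf/services/sparql"

CHEBI_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?subject ?label
FROM <http://rdf.ebi.ac.uk/dataset/chebi>
WHERE {
  ?subject rdfs:subClassOf* <http://purl.obolibrary.org/obo/CHEBI_76725> .
  ?subject rdfs:label ?label .
}
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a SPARQL Protocol GET request asking for JSON."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(results: dict, *variables: str) -> list[tuple]:
    """Turn a SPARQL JSON result document into tuples; unbound variables become None."""
    return [
        tuple(binding.get(v, {}).get("value") for v in variables)
        for binding in results["results"]["bindings"]
    ]


def run(endpoint: str, query: str, *variables: str) -> list[tuple]:
    """Send the query and parse the answer (needs network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=60) as resp:
        return parse_bindings(json.load(resp), *variables)
```

Very long queries may exceed URL length limits; the protocol also allows sending the query via POST in that case.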

Chris-Evelo · 2019-08-19T08:49:02Z

OK, so what do I need in front of that to be sure I get that ChEBI ontology term ID from the description "oxidoreductase inhibitors"? Ask the OLS RDF? And if that fails, use the OLS API? Can we start at Wikidata?

egonw · 2019-08-19T08:57:31Z

Okay, the EBI RDF platform has ChEBI, but I cannot find the has_role predicate:

SELECT * WHERE {
  {
    <http://purl.obolibrary.org/obo/CHEBI_3962> ?p <http://purl.obolibrary.org/obo/CHEBI_77484>
  }   UNION 
  {
    <http://purl.obolibrary.org/obo/CHEBI_77484> ?p <http://purl.obolibrary.org/obo/CHEBI_3962>
  }
}
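A small Python helper that generalises the both-directions check above to any pair of OBO terms, assuming the usual OBO PURL pattern (`CHEBI:3962` becomes `http://purl.obolibrary.org/obo/CHEBI_3962`). An empty answer only rules out a *direct* triple. In the ChEBI OWL release, has_role (in recent releases the Relation Ontology term `RO_0000087`) is expressed as an `owl:Restriction` on a blank node, so the second, hypothetical query checks that shape instead; whether the EBI RDF copy of ChEBI keeps those restrictions is an assumption worth testing.

```python
OBO_PREFIX = "http://purl.obolibrary.org/obo/"
HAS_ROLE = OBO_PREFIX + "RO_0000087"  # assumed IRI of has_role (Relation Ontology)


def curie_to_iri(curie: str) -> str:
    """CHEBI:3962 -> http://purl.obolibrary.org/obo/CHEBI_3962 (OBO PURL pattern)."""
    prefix, local = curie.split(":", 1)
    return f"{OBO_PREFIX}{prefix}_{local}"


def any_direct_link_query(a: str, b: str) -> str:
    """Ask for every predicate that links two terms directly, in either direction."""
    ia, ib = curie_to_iri(a), curie_to_iri(b)
    return (
        "SELECT * WHERE {\n"
        f"  {{ <{ia}> ?p <{ib}> }} UNION\n"
        f"  {{ <{ib}> ?p <{ia}> }}\n"
        "}"
    )


def has_role_restriction_query(chemical: str, role: str) -> str:
    """ASK whether has_role is present OWL-style, via a blank-node restriction."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        "ASK {\n"
        f"  <{curie_to_iri(chemical)}> rdfs:subClassOf ?r .\n"
        f"  ?r owl:onProperty <{HAS_ROLE}> ;\n"
        f"     owl:someValuesFrom <{curie_to_iri(role)}> .\n"
        "}"
    )
```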

Chris-Evelo · 2019-08-19T09:27:39Z

So, moving down the chain: you would first ask Wikidata about all oxidoreductases (but it doesn't have ChEBI yet), then ask the OLS RDF via the EBI SPARQL endpoint, but there you run into the missing-predicate problem. Next you could use the ChEBI RDF itself, but not the copy in the EBI RDF platform, so you need to fire up your own SPARQL endpoint.

Chris-Evelo · 2019-08-19T09:29:05Z

The discussion we just had about the "expressed in tissue" question led to the conclusion that you would still start the question at Wikidata, simply get 0 hits, and combine that with whatever you find further down the line.
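The chain described in the last two comments (ask Wikidata first, then the EBI RDF platform, then a local ChEBI endpoint, and combine rather than stop at the first empty answer) can be sketched in Python as follows. The source names and lookup callables are hypothetical placeholders for the real queries; the point is that an empty or failing source contributes nothing instead of ending the search, and each hit remembers which sources reported it.

```python
from typing import Callable, Iterable

# A source is a (name, lookup) pair; lookup maps a question to identifiers.
Source = tuple[str, Callable[[str], Iterable[str]]]


def combine_sources(question: str, sources: list[Source]) -> dict[str, list[str]]:
    """Ask every source in order, even after an empty answer, and merge the hits.

    Returns identifier -> names of the sources that reported it, so provenance
    survives the merge.
    """
    hits: dict[str, list[str]] = {}
    for name, lookup in sources:
        try:
            found = list(lookup(question))
        except Exception:  # sketch: an unavailable source simply contributes nothing
            found = []
        for identifier in found:
            hits.setdefault(identifier, []).append(name)
    return hits
```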

egonw · 2019-08-19T09:29:27Z

Working on those inhibitors in Wikidata:

egonw · 2019-08-19T12:32:16Z

Solution (run):

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX hasValue: <http://semanticscience.org/resource/SIO_000300>
PREFIX hasAttribute: <http://semanticscience.org/resource/SIO_000008>
PREFIX psa: <http://semanticscience.org/resource/CHEMINF_000307>
PREFIX logP: <http://semanticscience.org/resource/CHEMINF_000295>
PREFIX mw: <http://semanticscience.org/resource/CHEMINF_000216>

  SELECT DISTINCT ?mol ?molLabel ?InChIKey ?mass ?ChEMBL ?ChEMBLUrl ?mw ?PSA ?logP WITH {
    SELECT DISTINCT ?mol WHERE {
      ?mol wdt:P31/wdt:P279* wd:Q66587127 .
    } LIMIT 500
  } AS %result
  WITH {
    SELECT ?mol ?InChIKey ?mass ?ChEMBL ?ChEMBLUrl WHERE {
      INCLUDE %result
      OPTIONAL { ?mol wdt:P235 ?InChIKey }
      OPTIONAL { ?mol wdt:P2067 ?mass }
      VALUES ?ChEMBLIDdir { wdt:P592 }
      ?mol ?ChEMBLIDdir ?ChEMBL .
      OPTIONAL {
        ?ChEMBLIDpred wikibase:directClaim ?ChEMBLIDdir .
        ?ChEMBLIDpred wdt:P1921 ?ChEMBLformatterurl .
      }
      BIND(IRI(REPLACE(?ChEMBLformatterurl, '\\$1', str(?ChEMBL))) AS ?ChEMBLUrl).
    }
  } AS %nextresult
  WITH {
    SELECT ?mol ?InChIKey ?mass ?ChEMBL ?ChEMBLUrl ?mw ?PSA ?logP WHERE {
      INCLUDE %nextresult
      SERVICE <https://www.ebi.ac.uk/rdf/services/sparql> {
        ?ChEMBLUrl hasAttribute: ?prop1 .
        ?prop1 a ?prop1Type ; hasValue: ?mw .
        ?prop1Type rdfs:subClassOf mw: .
        ?ChEMBLUrl hasAttribute: ?prop2 .
        ?prop2 a ?prop2Type ; hasValue: ?PSA .
        ?prop2Type rdfs:subClassOf psa: .
        ?ChEMBLUrl hasAttribute: ?prop3 .
        ?prop3 a ?prop3Type ; hasValue: ?logP .
        ?prop3Type rdfs:subClassOf logP: .
      }
    }
  } AS %finalresult
  WHERE {
    INCLUDE %finalresult
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  }
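The `BIND(IRI(REPLACE(...)))` line in the query turns a ChEMBL ID into the IRI used by the EBI RDF platform, using the formatter URI stored on the Wikidata property. A Python sketch of the same mapping and its inverse, which can help when joining EBI RDF results back onto Wikidata rows outside the query. The formatter value below is an assumption about what P1921 holds for P592; reading it from Wikidata, as the query does, is safer than hard-coding it.

```python
import re
from typing import Optional

# Assumed value of P1921 (formatter URI for RDF resource) on P592 (ChEMBL ID).
CHEMBL_RDF_FORMATTER = "http://rdf.ebi.ac.uk/resource/chembl/molecule/$1"


def format_iri(formatter: str, identifier: str) -> str:
    # Same effect as REPLACE(?ChEMBLformatterurl, '\\$1', str(?ChEMBL)) in the query:
    # SPARQL REPLACE takes a regex, hence the escaped "$"; str.replace is literal.
    if "$1" not in formatter:
        raise ValueError(f"no $1 placeholder in {formatter!r}")
    return formatter.replace("$1", identifier)


def parse_iri(formatter: str, iri: str) -> Optional[str]:
    # Inverse mapping: recover the identifier, or None if the IRI does not fit.
    prefix, _, suffix = formatter.partition("$1")
    match = re.fullmatch(re.escape(prefix) + "(.+)" + re.escape(suffix), iri)
    return match.group(1) if match else None
```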

Chris-Evelo · 2019-08-19T12:51:11Z

It turns out that in its current form this doesn't have a mapping problem, because Wikidata does the mapping itself (which is in fact a nice way to solve the problem and should be part of a write-up). As soon as we add another layer (like "for this group of chemically related compounds, tell me which ones are oxidoreductase inhibitors and ..."), you do run into mapping problems and even lens problems (what are the active stereoisomers?).

Chris-Evelo · 2019-08-19T12:51:30Z

Also, we could use this as a lead example for the CRS development, when we ask: "which of the compounds that contain these substructures are oxidoreductase inhibitors and ..."

egonw · 2019-08-19T14:29:44Z

So, if we start with EC enzyme numbers (for oxidoreductases) and then go to ChEMBL and use the experimental data there, we have something like the following, but it doesn't really scale:

PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX hasValue: <http://semanticscience.org/resource/SIO_000300>
PREFIX hasAttribute: <http://semanticscience.org/resource/SIO_000008>
PREFIX psa: <http://semanticscience.org/resource/CHEMINF_000307>
PREFIX logP: <http://semanticscience.org/resource/CHEMINF_000295>
PREFIX mw: <http://semanticscience.org/resource/CHEMINF_000216>

SELECT DISTINCT ?molecule ?mw ?PSA ?logP WITH {
  SELECT DISTINCT ?UniProtUrl WHERE {
    ?protein wdt:P591 ?ecnumber ; wdt:P702 [] ; wdt:P352 ?uniprot .
    FILTER (STRSTARTS(?ecnumber, "1."))
    VALUES ?UniProtIDdir { wdt:P352 }
    ?protein ?UniProtIDdir ?uniprot .
    ?UniProtIDpred wikibase:directClaim ?UniProtIDdir ;
                   wdt:P1921 ?UniProtformatterurl .
    BIND(IRI(REPLACE(?UniProtformatterurl, '\\$1', str(?uniprot))) AS ?UniProtUrl).
  } LIMIT 1
} AS %results
WITH {
  SELECT DISTINCT ?molecule WHERE {
    INCLUDE %results
    SERVICE <https://www.ebi.ac.uk/rdf/services/sparql> {
      ?activity a cco:Activity ;
        cco:hasMolecule ?molecule ;
        cco:hasAssay/cco:hasTarget/cco:hasTargetComponent/cco:targetCmptXref ?UniProtUrl ;
        cco:pChembl ?pchembl .
      FILTER (?pchembl > 8)
    }
  } LIMIT 50
} AS %nextresults
WHERE {
  SELECT DISTINCT ?molecule ?mw ?PSA ?logP WHERE {
    INCLUDE %nextresults
    SERVICE <https://www.ebi.ac.uk/rdf/services/sparql> {
      ?molecule hasAttribute: ?prop1 .
      ?prop1 a ?prop1Type ; hasValue: ?mw .
      ?prop1Type rdfs:subClassOf mw: .
      ?molecule hasAttribute: ?prop2 .
      ?prop2 a ?prop2Type ; hasValue: ?PSA .
      ?prop2Type rdfs:subClassOf psa: .
      ?molecule hasAttribute: ?prop3 .
      ?prop3 a ?prop3Type ; hasValue: ?logP .
      ?prop3Type rdfs:subClassOf logP: .
    }
  } LIMIT 50
}
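The two filters in this query carry most of its selection logic: `STRSTARTS(?ecnumber, "1.")` keeps EC class 1 (the oxidoreductases), and `?pchembl > 8` keeps activities whose pChEMBL, the negative base-10 logarithm of the molar activity value, exceeds 8, i.e. measured values below 10 nM. A short Python sketch of both, useful for sanity-checking the threshold; the function names are illustrative.

```python
import math


def is_oxidoreductase(ec_number: str) -> bool:
    """EC class 1 is the oxidoreductases; mirrors STRSTARTS(?ecnumber, "1.")."""
    return ec_number.startswith("1.")


def pchembl_from_nm(activity_nm: float) -> float:
    """pChEMBL = -log10(molar activity); the input here is in nanomolar."""
    if activity_nm <= 0:
        raise ValueError("activity must be positive")
    return -math.log10(activity_nm * 1e-9)


def nm_from_pchembl(pchembl: float) -> float:
    """Inverse: the nanomolar activity corresponding to a pChEMBL value."""
    return 10 ** (9 - pchembl)
```

So `FILTER (?pchembl > 8)` is equivalent to requiring an activity value strictly below `nm_from_pchembl(8)`, which is 10 nM.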

AlasdairGray · 2019-08-21T07:03:27Z

I can now recreate the single complex query as a series of small queries. However, we run into the Wikidata query limits, which only permit one query every 60 s. Even with a 60 s sleep in the for loop I'm hitting the timeout, with 2 (the 4th and 10th) of my 10 requests being rejected. This already took 10 minutes to run.

Will need to investigate if we can give a list of values to use in the query or think of an alternative. @egonw @pgroth any thoughts?
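One way to keep a series of small queries going despite rejected requests is a fixed pause between calls plus retries with exponential backoff. A generic Python sketch: `run_one` stands for whatever sends a single query and is assumed to raise on a rejected request, and the 60 s default simply mirrors the sleep described above rather than any documented Wikidata limit. If the service sends a `Retry-After` header, honouring it would likely work better than a fixed backoff.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_throttled(
    items: Iterable[T],
    run_one: Callable[[T], R],
    pause: float = 60.0,
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> list[R]:
    """Call run_one once per item, pausing between items.

    A failing call is retried up to max_retries times, waiting pause,
    2*pause, 4*pause, ... before each retry; after the last retry the
    exception is re-raised.
    """
    results: list[R] = []
    for index, item in enumerate(items):
        if index:  # no pause before the very first request
            sleep(pause)
        for attempt in range(max_retries + 1):
            try:
                results.append(run_one(item))
                break
            except Exception:  # e.g. an HTTP 429 "Too Many Requests" error
                if attempt == max_retries:
                    raise
                sleep(pause * 2 ** attempt)
    return results
```

Passing `sleep` in as a parameter keeps the function testable without actually waiting.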

egonw · 2019-08-21T07:10:14Z

@AlasdairGray, this page is relevant in this context: https://www.wikidata.org/wiki/Wikidata:WikiProject_Limits_of_Wikidata

AlasdairGray · 2019-08-21T08:17:21Z

I was thinking more of whether we could pass a list in using the VALUES feature. I'll need to investigate whether that is possible, both in terms of SPARQL and grlc.

Any other thoughts, or do we have to make the query block larger, i.e. take the inhibitor type as the parameter but then return the chemical information? We are then likely to run into a similar problem (limits) when pulling the data from the EBI RDF platform.
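Passing many identifiers through one VALUES clause turns N requests into roughly N/size requests, on the Wikidata side as well as the EBI RDF side. A Python sketch that chunks a list of terms into VALUES clauses; the default chunk size of 100 is an arbitrary assumption, since nothing here establishes how large a clause either endpoint accepts.

```python
from typing import Iterable, Iterator, Sequence

# Characters not allowed inside a SPARQL IRIREF (plus control characters and space).
_IRI_FORBIDDEN = set('<>"{}|^`\\')


def as_iris(iris: Iterable[str]) -> list[str]:
    """Wrap full IRIs in angle brackets, refusing characters SPARQL forbids."""
    wrapped = []
    for iri in iris:
        if any(c in _IRI_FORBIDDEN or ord(c) <= 0x20 for c in iri):
            raise ValueError(f"not usable as a SPARQL IRI: {iri!r}")
        wrapped.append(f"<{iri}>")
    return wrapped


def values_blocks(variable: str, terms: Sequence[str], size: int = 100) -> Iterator[str]:
    """Yield VALUES clauses that cover all terms in chunks of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(terms), size):
        chunk = " ".join(terms[start:start + size])
        yield f"VALUES ?{variable} {{ {chunk} }}"
```

Each yielded clause would replace the single-compound binding in one small query, so the loop runs once per chunk instead of once per compound.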

Chris-Evelo · 2019-08-21T09:54:28Z

I would discuss this with Andra Waagmeester. They may have solved problems like this before.

AlasdairGray · 2019-08-22T16:30:15Z

Using the VALUES approach I can include a list of identifiers. I will now need to work out how to do this using grlc for the REST API.

I don't see an obvious way of doing this from the documentation. I'm thinking that I'll need to define a new string parameter that would then contain the values to be passed in. My concern is that these might not get processed correctly. I'll create a new play query and try a few things out.
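On the concern that a string parameter may not be processed correctly: whatever grlc does with it, the text ends up inside the query, so it is worth validating before the VALUES clause is built. A Python sketch, independent of grlc and not using its API, for a hypothetical parameter holding comma- or space-separated Wikidata QIDs; anything that is not a bare QID is rejected, which also keeps stray braces or SPARQL fragments out of the query.

```python
import re

QID = re.compile(r"Q[1-9][0-9]*")


def values_from_param(raw: str, variable: str = "mol") -> str:
    """Turn a string such as "Q1, Q2 Q3" into a VALUES clause of wd: items.

    Duplicates are dropped (first occurrence kept); any token that is not a
    bare QID raises ValueError instead of being pasted into the query.
    """
    qids = [token for token in re.split(r"[\s,]+", raw.strip()) if token]
    if not qids:
        raise ValueError("no identifiers given")
    bad = [q for q in qids if not QID.fullmatch(q)]
    if bad:
        raise ValueError(f"not Wikidata item IDs: {bad}")
    terms = " ".join(f"wd:{q}" for q in dict.fromkeys(qids))
    return f"VALUES ?{variable} {{ {terms} }}"
```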

egonw · 2019-08-22T16:46:36Z

Ah, yes, sure. Please use VALUES to query these properties for as many compounds as possible at the same time. Using VALUES for asking for the three properties is harder, as they would end up as separate rows, though Finn actually has a trick for that in Scholia too.
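When the three properties come back as separate rows, one per molecule and property, they can also be folded into one row per molecule on the client side. A Python sketch assuming rows shaped as `(molecule, property, value)`; it does not reproduce the Scholia trick mentioned above, which works inside SPARQL.

```python
from collections import defaultdict
from typing import Any, Iterable

PROPERTIES = ("mw", "PSA", "logP")


def pivot(
    rows: Iterable[tuple[str, str, Any]],
    properties: tuple[str, ...] = PROPERTIES,
) -> dict[str, dict[str, Any]]:
    """Fold (molecule, property, value) rows into {molecule: {property: value}}.

    Properties missing for a molecule stay None; unknown properties are ignored.
    """
    table: dict[str, dict[str, Any]] = defaultdict(lambda: dict.fromkeys(properties))
    for molecule, prop, value in rows:
        if prop in properties:
            table[molecule][prop] = value
    return dict(table)
```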

egonw self-assigned this Aug 19, 2019

egonw added the question (Further information is requested) label Aug 19, 2019

Chris-Evelo added the ChEMBL (Uses ChEMBL as datasource), CRS (Uses a (to be developed) chemical substructure resolution system to identify compounds) and IRS (Would benefit from an identity resolution service step) labels Aug 19, 2019

egonw added the Wikidata label Aug 19, 2019

egonw removed their assignment Oct 6, 2019
