Let me compare MW, logP and PSA for known oxidoreductase inhibitors #3

Open
egonw opened this issue Aug 19, 2019 · 16 comments
Labels
ChEMBL: Uses ChEMBL as datasource
CRS: Uses a (to be developed) chemical substructure resolution system to identify compounds
IRS: Would benefit from an identity resolution service step
question: Further information is requested
Wikidata

Comments

@egonw
Member

egonw commented Aug 19, 2019

Should be solvable with just the EBI platform.

@egonw egonw self-assigned this Aug 19, 2019
@egonw
Member Author

egonw commented Aug 19, 2019

From ChEBI, get all oxidoreductase inhibitors:

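PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?subject ?label
FROM <http://rdf.ebi.ac.uk/dataset/chebi>
WHERE {
  # the class itself plus all of its transitive subclasses
  ?subject rdfs:subClassOf* <http://purl.obolibrary.org/obo/CHEBI_76725> .
  ?subject rdfs:label ?label .
}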

@egonw egonw added the question Further information is requested label Aug 19, 2019
@Chris-Evelo
Collaborator

OK, so what do I need in front of that to be sure I get that ChEBI ontology term ID from the description "oxidoreductase inhibitors"? Ask the OLS RDF? And if that fails, use the OLS API? Can we start at Wikidata?

@egonw
Member Author

egonw commented Aug 19, 2019

Okay, the EBI RDF platform has ChEBI, but I cannot find the has_role predicate:

SELECT * WHERE {
  {
    <http://purl.obolibrary.org/obo/CHEBI_3962> ?p <http://purl.obolibrary.org/obo/CHEBI_77484>
  }   UNION 
  {
    <http://purl.obolibrary.org/obo/CHEBI_77484> ?p <http://purl.obolibrary.org/obo/CHEBI_3962>
  }
}

@Chris-Evelo
Collaborator

So moving down, you would ask Wikidata first about all oxidoreductases (but it doesn't have ChEBI yet), then ask the OLS RDF via the EBI SPARQL endpoint, but there you run into a missing predicate problem. Then you could use the ChEBI RDF itself, but not the one in EBI RDF, so you need to fire up a SPARQL endpoint.

@Chris-Evelo
Collaborator

Chris-Evelo commented Aug 19, 2019

The discussion we just had about the "expressed in tissue" question led to the conclusion that you would still start the question at Wikidata, simply get 0 hits there, and combine that with whatever you find further down the line.
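
The cascade described in the comments above (ask Wikidata first, accept zero hits, then move on to the next source and combine what comes back) could be sketched roughly as follows. This is only an illustration: the source names and the lookup callables are hypothetical placeholders, not real clients for Wikidata, OLS or ChEBI.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A source is a (name, lookup) pair; lookup returns identifiers for a term.
# Both the names and the lookup functions are hypothetical placeholders.
Source = Tuple[str, Callable[[str], Iterable[str]]]


def cascade(term: str, sources: List[Source]) -> Dict[str, List[str]]:
    """Ask every source in order and keep what each one returns.

    A source with zero hits is recorded with an empty list instead of
    aborting the run, so later sources can still contribute.
    """
    hits: Dict[str, List[str]] = {}
    for name, lookup in sources:
        try:
            hits[name] = sorted(set(lookup(term)))
        except Exception:
            # Treat a failing source (e.g. a missing predicate) as zero hits.
            hits[name] = []
    return hits


def combined(hits: Dict[str, List[str]]) -> List[str]:
    """Union of all identifiers found across sources, de-duplicated."""
    return sorted({identifier for ids in hits.values() for identifier in ids})
```

Keeping the per-source results separate before merging makes it visible which step produced nothing, which is the information needed for a write-up of where the mapping gaps are.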

@egonw
Member Author

egonw commented Aug 19, 2019

Working on those inhibitors in Wikidata:

image

@egonw
Member Author

egonw commented Aug 19, 2019

Solution (run):

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX hasValue: <http://semanticscience.org/resource/SIO_000300>
PREFIX hasAttribute: <http://semanticscience.org/resource/SIO_000008>
PREFIX psa: <http://semanticscience.org/resource/CHEMINF_000307>
PREFIX logP: <http://semanticscience.org/resource/CHEMINF_000295>
PREFIX mw: <http://semanticscience.org/resource/CHEMINF_000216>

  SELECT DISTINCT ?mol ?molLabel ?InChIKey ?mass ?ChEMBL ?ChEMBLUrl ?mw ?PSA ?logP WITH {
    SELECT DISTINCT ?mol WHERE {
      ?mol wdt:P31/wdt:P279* wd:Q66587127 .
    } LIMIT 500
  } AS %result
  WITH {
    SELECT ?mol ?InChIKey ?mass ?ChEMBL ?ChEMBLUrl WHERE {
      INCLUDE %result
      OPTIONAL { ?mol wdt:P235 ?InChIKey }
      OPTIONAL { ?mol wdt:P2067 ?mass }
      VALUES ?ChEMBLIDdir { wdt:P592 }
      ?mol ?ChEMBLIDdir ?ChEMBL .
      OPTIONAL {
        ?ChEMBLIDpred wikibase:directClaim ?ChEMBLIDdir .
        ?ChEMBLIDpred wdt:P1921 ?ChEMBLformatterurl .
      }
      BIND(IRI(REPLACE(?ChEMBLformatterurl, '\\$1', str(?ChEMBL))) AS ?ChEMBLUrl).
    }
  } AS %nextresult
  WITH {
    SELECT ?mol ?InChIKey ?mass ?ChEMBL ?ChEMBLUrl ?mw ?PSA ?logP WHERE {
      INCLUDE %nextresult
      SERVICE <https://www.ebi.ac.uk/rdf/services/sparql> {
        ?ChEMBLUrl hasAttribute: ?prop1 .
        ?prop1 a ?prop1Type ; hasValue: ?mw .
        ?prop1Type rdfs:subClassOf mw: .
        ?ChEMBLUrl hasAttribute: ?prop2 .
        ?prop2 a ?prop2Type ; hasValue: ?PSA .
        ?prop2Type rdfs:subClassOf psa: .
        ?ChEMBLUrl hasAttribute: ?prop3 .
        ?prop3 a ?prop3Type ; hasValue: ?logP .
        ?prop3Type rdfs:subClassOf logP: .
      }
    }
  } AS %finalresult
  WHERE {
    INCLUDE %finalresult
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  }
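
The BIND(IRI(REPLACE(...))) step in the query above turns a ChEMBL identifier into the IRI used on the EBI side by filling the $1 placeholder of the property's formatter URL. A minimal client-side sketch of the same substitution is shown below; the formatter URL and the identifier are assumed example values for illustration, not values taken from the query results.

```python
def expand_formatter(formatter: str, identifier: str) -> str:
    """Fill the $1 placeholder of a formatter URL with an identifier.

    Mirrors what the SPARQL REPLACE call does: every occurrence of the
    literal text $1 is replaced by the identifier.
    """
    if "$1" not in formatter:
        raise ValueError("formatter URL has no $1 placeholder")
    return formatter.replace("$1", identifier)


# Assumed example values, for illustration only.
FORMATTER = "http://rdf.ebi.ac.uk/resource/chembl/molecule/$1"

if __name__ == "__main__":
    print(expand_formatter(FORMATTER, "CHEMBL25"))
```

Doing the substitution in the query itself, as the solution does, has the advantage that the mapping stays inside Wikidata and no client-side lookup table is needed.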

@Chris-Evelo
Collaborator

It turns out that in its current form this doesn't have a mapping problem, because Wikidata does the mapping itself (which is in fact a nice way to solve the problem and should be part of a write-up). As soon as we add another layer (like "for this group of chemically related compounds, tell me which ones are oxidoreductase inhibitors and ..."), you do run into mapping problems and even lens problems (which are the active stereoisomers?).

@Chris-Evelo
Collaborator

Also, we could use this as a lead example for the CRS development when we ask for: "which of the compounds that contain these substructures are oxido-reductase inhibitors and ..."

@Chris-Evelo Chris-Evelo added ChEMBL Uses ChEMBL as datasource CRS Uses a (to be developed) chemical substructure resolution system to identify compounds IRS Would benefit from an identity resolution service step labels Aug 19, 2019
@egonw
Member Author

egonw commented Aug 19, 2019

So, if we start with EC enzyme numbers (for oxidoreductases) and then go to ChEMBL and use the experimental data there, we have something like the following, but it doesn't really scale:

PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX hasValue: <http://semanticscience.org/resource/SIO_000300>
PREFIX hasAttribute: <http://semanticscience.org/resource/SIO_000008>
PREFIX psa: <http://semanticscience.org/resource/CHEMINF_000307>
PREFIX logP: <http://semanticscience.org/resource/CHEMINF_000295>
PREFIX mw: <http://semanticscience.org/resource/CHEMINF_000216>

SELECT DISTINCT ?molecule ?mw ?PSA ?logP WITH {
  SELECT DISTINCT ?UniProtUrl WHERE {
    ?protein wdt:P591 ?ecnumber ; wdt:P702 [] ; wdt:P352 ?uniprot .
    FILTER (STRSTARTS(?ecnumber, "1."))
    VALUES ?UniProtIDdir { wdt:P352 }
    ?protein ?UniProtIDdir ?uniprot .
    ?UniProtIDpred wikibase:directClaim ?UniProtIDdir ;
                   wdt:P1921 ?UniProtformatterurl .
    BIND(IRI(REPLACE(?UniProtformatterurl, '\\$1', str(?uniprot))) AS ?UniProtUrl).
  } LIMIT 1
} AS %results
WITH {
  SELECT DISTINCT ?molecule WHERE {
    INCLUDE %results
    SERVICE <https://www.ebi.ac.uk/rdf/services/sparql> {
      ?activity a cco:Activity ;
        cco:hasMolecule ?molecule ;
        cco:hasAssay/cco:hasTarget/cco:hasTargetComponent/cco:targetCmptXref ?UniProtUrl ;
        cco:pChembl ?pchembl .
      FILTER (?pchembl > 8)
    }
  } LIMIT 50
} AS %nextresults
WHERE {
  SELECT DISTINCT ?molecule ?mw ?PSA ?logP WHERE {
    INCLUDE %nextresults
    SERVICE <https://www.ebi.ac.uk/rdf/services/sparql> {
      ?molecule hasAttribute: ?prop1 .
      ?prop1 a ?prop1Type ; hasValue: ?mw .
      ?prop1Type rdfs:subClassOf mw: .
      ?molecule hasAttribute: ?prop2 .
      ?prop2 a ?prop2Type ; hasValue: ?PSA .
      ?prop2Type rdfs:subClassOf psa: .
      ?molecule hasAttribute: ?prop3 .
      ?prop3 a ?prop3Type ; hasValue: ?logP .
      ?prop3Type rdfs:subClassOf logP: .
    }
  } LIMIT 50
}

@egonw egonw added the Wikidata label Aug 19, 2019
@AlasdairGray
Member

I can now recreate the single complex query as a series of small queries. However, we hit the query limits on Wikidata, which only permit a query every 60 s. Even with a 60 s sleep in the for loop I'm hitting the timeout: 2 of my 10 requests (the 4th and 10th) were rejected, and the run already took 10 minutes.

Will need to investigate if we can give a list of values to use in the query or think of an alternative. @egonw @pgroth any thoughts?
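
For reference, the loop described above (one small query at a time with a pause in between) could be made more tolerant of rejected requests by retrying them rather than dropping them. The sketch below is only an assumption about how such a loop might look; `send` is a hypothetical callable that returns parsed results or None when the endpoint rejects the request, and the retry policy is illustrative rather than anything the endpoints prescribe.

```python
import time
from typing import Callable, Dict, List, Optional


def run_throttled(
    queries: List[str],
    send: Callable[[str], Optional[dict]],
    pause: float = 60.0,
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[int, Optional[dict]]:
    """Send queries one by one, pausing between requests.

    `send` is a hypothetical callable returning parsed results, or None
    when the endpoint rejects the request. Rejected queries are retried
    with a growing pause; a query that never succeeds stays None.
    """
    results: Dict[int, Optional[dict]] = {}
    for index, query in enumerate(queries):
        results[index] = None
        for attempt in range(1, max_attempts + 1):
            response = send(query)
            if response is not None:
                results[index] = response
                break
            if attempt < max_attempts:
                sleep(pause * attempt)  # back off before retrying
        if index < len(queries) - 1:
            sleep(pause)  # keep the regular gap between queries
    return results
```

The `sleep` parameter is injectable so the loop can be exercised without actually waiting. Note that this does not reduce the total run time; it only avoids losing results.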

@egonw
Member Author

egonw commented Aug 21, 2019

@AlasdairGray, this page is relevant in this context: https://www.wikidata.org/wiki/Wikidata:WikiProject_Limits_of_Wikidata

@AlasdairGray
Member

I was thinking more about whether we could pass a list in using the VALUES feature. I'll need to investigate whether that is possible in terms of both SPARQL and grlc.

Any other thoughts, or do we have to make the query block larger, i.e. take the inhibitor type as the parameter but then return the chemical information? We are then likely to run into a similar problem (limits) when pulling the data from the EBI RDF platform.

@Chris-Evelo
Collaborator

I would discuss this with Andra Waagmeester. They may have solved problems like this before.

@AlasdairGray
Member

Using the VALUES approach I can include a list of identifiers. I will now need to work out how to do this using grlc for the REST API.

I don't see an obvious way of doing this from the documentation. I'm thinking that I'll need to define a new string parameter which would then include the values that are to be passed in. My concern is that these don't get processed correctly. I'll create a new play query and try a few things out.
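
One way to address the concern about a free-text parameter not being processed correctly is to validate every identifier before it is pasted into the VALUES clause. The sketch below only builds the clause as a string; how grlc would accept and substitute such a parameter is not shown here and is left open. The identifier pattern is an assumption (ChEMBL-style IDs such as CHEMBL25).

```python
import re
from typing import Iterable

# Assumption: identifiers look like ChEMBL IDs, e.g. CHEMBL25.
CHEMBL_ID = re.compile(r"CHEMBL[0-9]+")


def values_clause(variable: str, identifiers: Iterable[str]) -> str:
    """Build a SPARQL VALUES clause of string literals.

    Each identifier is validated first so that a malformed string cannot
    alter the query when it is passed in as a parameter.
    """
    checked = []
    for identifier in identifiers:
        if not CHEMBL_ID.fullmatch(identifier):
            raise ValueError(f"not a ChEMBL identifier: {identifier!r}")
        checked.append(f'"{identifier}"')
    if not checked:
        raise ValueError("at least one identifier is required")
    return f"VALUES ?{variable} {{ {' '.join(checked)} }}"
```

For example, `values_clause("ChEMBL", ["CHEMBL25", "CHEMBL112"])` yields `VALUES ?ChEMBL { "CHEMBL25" "CHEMBL112" }`, which can replace the per-compound loop with one request per batch.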

@egonw
Member Author

egonw commented Aug 22, 2019

Ah, yes, sure. Please use VALUES to query these properties for as many compounds as possible at the same time. Using VALUES for asking the three properties is harder, as they would end up as separate rows, though Finn actually has a trick for that in Scholia too.
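
If the three properties do come back as separate rows, one simple option is to fold them together on the client after the query returns. The sketch below is a generic pivot and is not the Scholia trick mentioned above; the row layout (molecule, property name, value) is an assumption about how the results would be shaped.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed row layout: (molecule, property name, value).
Row = Tuple[str, str, float]


def pivot(rows: Iterable[Row]) -> Dict[str, Dict[str, float]]:
    """Fold one-row-per-property results into one record per molecule."""
    table: Dict[str, Dict[str, float]] = defaultdict(dict)
    for molecule, prop, value in rows:
        table[molecule][prop] = value
    return dict(table)
```

Molecules that lack one of the properties simply end up with a shorter record, so missing values stay visible instead of silently removing the molecule from the result.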

@egonw egonw removed their assignment Oct 6, 2019