Let me compare MW, logP and PSA for known oxidoreductase inhibitors #3
Comments
From ChEBI, get all oxidoreductase inhibitors:
SELECT ?subject ?label ?altTerm
from <http://rdf.ebi.ac.uk/dataset/chebi>
WHERE {
?subject rdfs:subClassOf* <http://purl.obolibrary.org/obo/CHEBI_76725> .
?subject rdfs:label ?label.
} |
OK, so what do I need in front of that to be sure I get that ChEBI ontology term ID from the description "oxidoreductase inhibitors"? Ask the OLS RDF? And if that fails, use the OLS API? Can we start at Wikidata? |
Okay, the EBI RDF platform has ChEBI, but I cannot find the
SELECT * WHERE {
{
<http://purl.obolibrary.org/obo/CHEBI_3962> ?p <http://purl.obolibrary.org/obo/CHEBI_77484>
} UNION
{
<http://purl.obolibrary.org/obo/CHEBI_77484> ?p <http://purl.obolibrary.org/obo/CHEBI_3962>
}
} |
So, moving down, you would first ask Wikidata about all oxidoreductases (but it doesn't have ChEBI yet), then ask the OLS RDF via the EBI SPARQL endpoint, where you run into a missing-predicate problem. You could then use the ChEBI RDF itself, but that is not in the EBI RDF, so you need to fire up your own SPARQL endpoint. |
The discussion we just had about the "expressed in tissue" question led to the conclusion that you would still start the question at Wikidata, but then simply get 0 hits and combine that with whatever you find further down the line. |
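The fallback order described in the last two comments (start at Wikidata, accept zero hits, then combine with whatever the sources further down return) could be sketched in Python roughly as follows. The source names and lookup callables are hypothetical placeholders, not real client code for Wikidata, the OLS or ChEBI.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Each source is a (name, lookup) pair. The lookup callables are hypothetical
# stand-ins for real queries against Wikidata, the OLS RDF or a ChEBI endpoint.
Source = Tuple[str, Callable[[str], Iterable[str]]]


def collect_hits(term: str, sources: List[Source]) -> Dict[str, List[str]]:
    """Ask every source in order and record its hits, including empty results."""
    hits: Dict[str, List[str]] = {}
    for name, lookup in sources:
        try:
            hits[name] = list(lookup(term))
        except Exception:
            # A failing source (e.g. a missing predicate) counts as zero hits
            hits[name] = []
    return hits


def merge_hits(hits: Dict[str, List[str]]) -> List[str]:
    """Combine hits from all sources, dropping duplicates but keeping order."""
    merged: List[str] = []
    for found in hits.values():
        for item in found:
            if item not in merged:
                merged.append(item)
    return merged
```

Keeping the per-source results separate before merging makes it visible which step returned nothing, which is the information the mapping discussion needs.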
Working on those inhibitors in Wikidata: |
Solution (run):
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX hasValue: <http://semanticscience.org/resource/SIO_000300>
PREFIX hasAttribute: <http://semanticscience.org/resource/SIO_000008>
PREFIX psa: <http://semanticscience.org/resource/CHEMINF_000307>
PREFIX logP: <http://semanticscience.org/resource/CHEMINF_000295>
PREFIX mw: <http://semanticscience.org/resource/CHEMINF_000216>
SELECT DISTINCT ?mol ?molLabel ?InChIKey ?mass ?ChEMBL ?ChEMBLUrl ?mw ?PSA ?logP WITH {
SELECT DISTINCT ?mol WHERE {
?mol wdt:P31/wdt:P279* wd:Q66587127 .
} LIMIT 500
} AS %result
WITH {
SELECT ?mol ?InChIKey ?mass ?ChEMBL ?ChEMBLUrl WHERE {
INCLUDE %result
OPTIONAL { ?mol wdt:P235 ?InChIKey }
OPTIONAL { ?mol wdt:P2067 ?mass }
VALUES ?ChEMBLIDdir { wdt:P592 }
?mol ?ChEMBLIDdir ?ChEMBL .
OPTIONAL {
?ChEMBLIDpred wikibase:directClaim ?ChEMBLIDdir .
?ChEMBLIDpred wdt:P1921 ?ChEMBLformatterurl .
}
BIND(IRI(REPLACE(?ChEMBLformatterurl, '\\$1', str(?ChEMBL))) AS ?ChEMBLUrl).
}
} AS %nextresult
WITH {
SELECT ?mol ?InChIKey ?mass ?ChEMBL ?ChEMBLUrl ?mw ?PSA ?logP WHERE {
INCLUDE %nextresult
SERVICE <https://www.ebi.ac.uk/rdf/services/sparql> {
?ChEMBLUrl hasAttribute: ?prop1 .
?prop1 a ?prop1Type ; hasValue: ?mw .
?prop1Type rdfs:subClassOf mw: .
?ChEMBLUrl hasAttribute: ?prop2 .
?prop2 a ?prop2Type ; hasValue: ?PSA .
?prop2Type rdfs:subClassOf psa: .
?ChEMBLUrl hasAttribute: ?prop3 .
?prop3 a ?prop3Type ; hasValue: ?logP .
?prop3Type rdfs:subClassOf logP: .
}
}
} AS %finalresult
WHERE {
INCLUDE %finalresult
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} |
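The mapping step in the query above is the BIND that substitutes the ChEMBL ID into the property's formatter URL. Outside SPARQL, the same substitution might look like the Python sketch below. The formatter URL is an assumed value used for illustration and should be checked against what P1921 actually returns for P592.

```python
def expand_formatter_url(formatter_url: str, identifier: str) -> str:
    """Substitute an external identifier for the literal "$1" placeholder."""
    if "$1" not in formatter_url:
        raise ValueError("formatter URL has no $1 placeholder")
    # str.replace treats "$1" literally, so no escaping is needed here
    # (the SPARQL REPLACE needs '\\$1' because its pattern is a regex).
    return formatter_url.replace("$1", identifier)


# Assumed formatter URI for the ChEMBL ID property; verify against Wikidata.
CHEMBL_FORMATTER = "http://rdf.ebi.ac.uk/resource/chembl/molecule/$1"

print(expand_formatter_url(CHEMBL_FORMATTER, "CHEMBL25"))
```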
It turns out that in its current form this doesn't have a mapping problem, because Wikidata does the mapping itself (which is in fact a nice way to solve the problem and should be part of a write-up). As soon as we add another layer (like "for this group of chemically related compounds, tell me which ones are oxidoreductase inhibitors and ..."), you do run into mapping and even lenses (what are the active stereoisomers) problems. |
Also, we could use this as a lead example for the CRS development when we ask: "which of the compounds that contain these substructures are oxidoreductase inhibitors and ..." |
So, if we start with EC enzyme numbers (for oxidoreductases) and then go to ChEMBL and use the experimental data there, we have something like the following, but it doesn't really scale:
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX hasValue: <http://semanticscience.org/resource/SIO_000300>
PREFIX hasAttribute: <http://semanticscience.org/resource/SIO_000008>
PREFIX psa: <http://semanticscience.org/resource/CHEMINF_000307>
PREFIX logP: <http://semanticscience.org/resource/CHEMINF_000295>
PREFIX mw: <http://semanticscience.org/resource/CHEMINF_000216>
SELECT DISTINCT ?molecule ?mw ?PSA ?logP WITH {
SELECT DISTINCT ?UniProtUrl WHERE {
?protein wdt:P591 ?ecnumber ; wdt:P702 [] ; wdt:P352 ?uniprot .
FILTER (STRSTARTS(?ecnumber, "1."))
VALUES ?UniProtIDdir { wdt:P352 }
?protein ?UniProtIDdir ?uniprot .
?UniProtIDpred wikibase:directClaim ?UniProtIDdir ;
wdt:P1921 ?UniProtformatterurl .
BIND(IRI(REPLACE(?UniProtformatterurl, '\\$1', str(?uniprot))) AS ?UniProtUrl).
} LIMIT 1
} AS %results
WITH {
SELECT DISTINCT ?molecule WHERE {
INCLUDE %results
SERVICE <https://www.ebi.ac.uk/rdf/services/sparql> {
?activity a cco:Activity ;
cco:hasMolecule ?molecule ;
cco:hasAssay/cco:hasTarget/cco:hasTargetComponent/cco:targetCmptXref ?UniProtUrl ;
cco:pChembl ?pchembl .
FILTER (?pchembl > 8)
}
} LIMIT 50
} AS %nextresults
WHERE {
SELECT DISTINCT ?molecule ?mw ?PSA ?logP WHERE {
INCLUDE %nextresults
SERVICE <https://www.ebi.ac.uk/rdf/services/sparql> {
?molecule hasAttribute: ?prop1 .
?prop1 a ?prop1Type ; hasValue: ?mw .
?prop1Type rdfs:subClassOf mw: .
?molecule hasAttribute: ?prop2 .
?prop2 a ?prop2Type ; hasValue: ?PSA .
?prop2Type rdfs:subClassOf psa: .
?molecule hasAttribute: ?prop3 .
?prop3 a ?prop3Type ; hasValue: ?logP .
?prop3Type rdfs:subClassOf logP: .
}
} LIMIT 50
} |
I can now recreate the single complex query as a series of small queries. However, we run into the query limits on Wikidata, which only permit a query every 60s. Even with a 60s sleep in the for loop I'm hitting the timeout, with 2 (the 4th and 10th) of my 10 requests being rejected. This already took 10 minutes to run. I will need to investigate whether we can give a list of values to use in the query, or think of an alternative. @egonw @pgroth any thoughts? |
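A loop of this kind, with a pause between requests and a retry for rejected ones, might be sketched as below. The 60s delay is the figure from the comment above. `run_query` is a hypothetical callable standing in for whatever actually sends the query, and the retry count and backoff are assumptions.

```python
import time
from typing import Callable, Dict, List, Optional


def run_throttled(
    queries: List[str],
    run_query: Callable[[str], object],
    delay_s: float = 60.0,
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[object]]:
    """Run queries one at a time, pausing between requests and retrying failures."""
    results: Dict[str, Optional[object]] = {}
    for query in queries:
        results[query] = None  # stays None if every attempt is rejected
        for attempt in range(1, max_attempts + 1):
            try:
                results[query] = run_query(query)
                break
            except Exception:
                # Wait longer after each rejected attempt before trying again
                sleep(delay_s * attempt)
        # Pause after every query, successful or not
        sleep(delay_s)
    return results
```

Passing `sleep` in as a parameter lets the loop be exercised without waiting; it does nothing to reduce the total run time, which is why batching the values into fewer queries is the more promising direction.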
@AlasdairGray, this page is relevant in this context: https://www.wikidata.org/wiki/Wikidata:WikiProject_Limits_of_Wikidata |
I was thinking more: could we pass a list in using the Any other thoughts, or do we have to make the query block larger, i.e. include the inhibitor type as the parameter but then return the chemical information? We are then likely to run into a similar problem for pulling the data from the EBI RDF platform (limits). |
I would discuss this with Andra Waagmeester. They may have solved problems like this before. |
Using the I don't see an obvious way of doing this from the documentation. I'm thinking that I'll need to define a new string parameter which would then include the values that are to be passed in. My concern is that these don't get processed correctly. I'll create a new play query and try a few things out. |
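One general way to pass a list into a SPARQL query as a string parameter is a VALUES block; whether the tooling discussed here accepts that is not confirmed in this thread. The Python sketch below builds such a block and validates each ID first, which addresses the concern about values not being processed correctly. The function names and the batch size are hypothetical.

```python
import re
from typing import Iterator, List

# Wikidata item IDs look like Q42; anything else is rejected rather than escaped.
_QID = re.compile(r"Q[1-9][0-9]*")
_VAR = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def values_clause(variable: str, qids: List[str]) -> str:
    """Render a SPARQL VALUES block such as: VALUES ?mol { wd:Q1 wd:Q2 }"""
    if not _VAR.fullmatch(variable):
        raise ValueError(f"bad variable name: {variable!r}")
    for qid in qids:
        if not _QID.fullmatch(qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
    terms = " ".join(f"wd:{qid}" for qid in qids)
    return f"VALUES ?{variable} {{ {terms} }}"


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Split a long ID list into chunks so each query stays small."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Each batch then yields one clause to place inside the WHERE block of the property query, so ten single-item requests could become one or two larger ones.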
Ah, yes, sure. Please use |
Should be solvable with just the EBI platform.