Mapping of Compound IDs #70

Open
joewandy opened this issue Aug 11, 2020 · 4 comments
@joewandy
Member

Sometimes a compound is mapped to one ChEBI ID, but Reactome uses a different ChEBI ID to relate that compound to reactions and pathways. As a result, some compounds are missing from the table. We need to fix this.

For example, people often use this ID to identify histidine:

https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:27570

but Reactome uses this ID, which is the zwitterion form:

https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:57595

joewandy added the bug and priority (This issue should be worked on before others) labels Aug 11, 2020
@RonanDaly
Member

I have some R code to do this, along with a mapping from ChEBI.

@RonanDaly
Member

Here's the R code. It looks like I was using the mapping to construct a graph and then performing a breadth-first search to find linked entities. This might not be a good way to do it!

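library(igraph)  # provides graph_from_data_frame() and bfs()

# For each ChEBI ID, return the IDs reachable from it through
# conjugate acid/base and tautomer relations.
expandChEBIIds = function(ids) {
  # Coerce to character so bfs() treats each ID as a vertex name, not a vertex index
  ids = as.character(ids)
  chebi_relation_file = system.file("extdata", "relation.tsv", package = "polydatar")
  df = read.table(chebi_relation_file, stringsAsFactors = FALSE, sep = '\t', header = TRUE)
  # Keep only the relation types that link different forms of the same compound
  relation_types = c('is_conjugate_acid_of', 'is_conjugate_base_of', 'is_tautomer_of')
  mapping = df[df$TYPE %in% relation_types, c('INIT_ID', 'FINAL_ID')]
  mapping$INIT_ID = as.character(mapping$INIT_ID)
  mapping$FINAL_ID = as.character(mapping$FINAL_ID)

  # Include the query IDs as vertices so IDs without any relation are still valid roots
  vertices = unique(c(mapping$INIT_ID, mapping$FINAL_ID, ids))
  g = graph_from_data_frame(mapping, vertices = vertices)
  map = as.list(ids)
  names(map) = ids

  for ( i in seq_along(map) ) {
    # Visit only vertices reachable from the root; unvisited ones are NA in b$order
    b = bfs(g, root = ids[i], unreachable = FALSE, order = TRUE)
    map[[i]] = names(b$order[!is.na(b$order)])
  }
  return(map)
}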

@RonanDaly
Member

RonanDaly commented Aug 11, 2020

The file relation.tsv was downloaded from ChEBI, ftp://ftp.ebi.ac.uk/pub/databases/chebi/Flat_file_tab_delimited/

@joewandy
Member Author

joewandy commented Apr 1, 2021

The code is being implemented in pyMultiOmics in this issue: glasgowcompbio/pyMultiOmics#5.

Once pyMultiOmics is integrated into this project as part of #80, we can close this issue.
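
For reference, the breadth-first expansion in the R function above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the code in pyMultiOmics: the function names `load_edges` and `expand_chebi_ids` are hypothetical, the relation table is assumed to have the TYPE, INIT_ID and FINAL_ID columns used in the R code, and the sample rows are made up rather than taken from ChEBI.

```python
import csv
import io
from collections import defaultdict, deque

# Relation types followed by the R function in this thread.
RELATION_TYPES = {"is_conjugate_acid_of", "is_conjugate_base_of", "is_tautomer_of"}


def load_edges(handle):
    """Read TYPE / INIT_ID / FINAL_ID rows from a tab-separated relation table."""
    edges = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["TYPE"] in RELATION_TYPES:
            # Directed edge INIT_ID -> FINAL_ID, as in the R version.
            edges[row["INIT_ID"]].add(row["FINAL_ID"])
    return edges


def expand_chebi_ids(ids, edges):
    """Map each ID to every ID reachable from it, in breadth-first order."""
    result = {}
    for start in ids:
        start = str(start)
        seen = [start]          # visit order, starting with the query ID itself
        visited = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            # Sort neighbours so the output order is deterministic.
            for neighbour in sorted(edges.get(node, ())):
                if neighbour not in visited:
                    visited.add(neighbour)
                    seen.append(neighbour)
                    queue.append(neighbour)
        result[start] = seen
    return result


# Made-up rows for illustration only; these are not real ChEBI relations.
SAMPLE_TSV = (
    "TYPE\tINIT_ID\tFINAL_ID\n"
    "is_conjugate_acid_of\t1\t2\n"
    "is_conjugate_base_of\t2\t1\n"
    "is_tautomer_of\t2\t3\n"
    "is_tautomer_of\t3\t2\n"
    "has_part\t3\t4\n"
)

edges = load_edges(io.StringIO(SAMPLE_TSV))
expanded = expand_chebi_ids(["1", "9"], edges)
print(expanded)
```

With the real relation.tsv, the same `load_edges` call could take a file opened with `open(path, newline="")` instead of the in-memory sample. An ID with no matching relations (here "9") maps only to itself, so a lookup against Reactome would then try each ID in the expanded list rather than the original ID alone.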
