Mapping of Compound IDs #70

joewandy · 2020-08-11T10:46:10Z

Sometimes a compound is mapped to one ChEBI ID, but Reactome uses a different ChEBI ID to link that compound to reactions and pathways. As a result, some compounds are missing from the table. We need to fix this.

For example, histidine is often identified by this ID:

https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:27570

but Reactome uses this ID, which is the zwitterion form:

https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:57595

RonanDaly · 2020-08-11T10:49:13Z

I have some R code that does this, along with a mapping from ChEBI.

RonanDaly · 2020-08-11T11:51:05Z

Here's the R. It looks like I was using the mapping to construct a graph and performing a breadth-first search to find linked entities. This might not be a good way to do it!

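library(igraph)

# Expand each ChEBI ID to the IDs reachable from it through conjugate
# acid/base and tautomer relations (the starting ID is included).
expandChEBIIds = function(ids) {
  # Coerce to character so bfs() treats each ID as a vertex name, not an index
  ids = as.character(ids)
  chebi_relation_file = system.file("extdata", "relation.tsv", package = "polydatar")
  df = read.table(chebi_relation_file, stringsAsFactors = FALSE, sep = '\t', header = TRUE)
  linked_types = c('is_conjugate_acid_of', 'is_conjugate_base_of', 'is_tautomer_of')
  mapping = df[df$TYPE %in% linked_types, c('INIT_ID', 'FINAL_ID')]
  mapping$INIT_ID = as.character(mapping$INIT_ID)
  mapping$FINAL_ID = as.character(mapping$FINAL_ID)

  # Include the query IDs as vertices so IDs with no relations still resolve
  vertices = unique(c(mapping$INIT_ID, mapping$FINAL_ID, ids))
  g = graph_from_data_frame(mapping, vertices = vertices)
  map = as.list(ids)
  names(map) = ids

  for ( i in seq_along(map) ) {
    # Breadth-first search from the query ID; unreached vertices are NA in b$order
    b = bfs(g, root = ids[i], unreachable = FALSE, order = TRUE)
    map[[i]] = names(b$order[!is.na(b$order)])
  }
  return(map)
}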

RonanDaly · 2020-08-11T11:52:49Z

The file relation.tsv was downloaded from ChEBI, ftp://ftp.ebi.ac.uk/pub/databases/chebi/Flat_file_tab_delimited/

joewandy · 2021-04-01T15:49:08Z

This code is being ported to pyMultiOmics in glasgowcompbio/pyMultiOmics#5.

Once pyMultiOmics is integrated into this project as part of #80, we can close this issue.
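
For reference, here is a rough Python sketch of the same idea as the R function above: build a graph from the conjugate acid/base and tautomer rows of relation.tsv, then run a breadth-first search from each query ID. It is not the pyMultiOmics implementation. The function names `load_links` and `expand_chebi_ids` are hypothetical, the sample rows are made up and are not real ChEBI records, and the sketch treats relations as undirected, which is an assumption that differs from the directed graph built in the R code.

```python
import csv
import io
from collections import defaultdict, deque

# Relation types treated as "same compound, different form" (from the R code)
LINK_TYPES = {"is_conjugate_acid_of", "is_conjugate_base_of", "is_tautomer_of"}


def load_links(tsv_file):
    """Build an undirected adjacency map from rows whose TYPE is a link type."""
    graph = defaultdict(set)
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        if row["TYPE"] in LINK_TYPES:
            a, b = row["INIT_ID"], row["FINAL_ID"]
            graph[a].add(b)
            graph[b].add(a)
    return graph


def expand_chebi_ids(ids, graph):
    """Map each ID to every ID reachable from it, in breadth-first order."""
    result = {}
    for start in map(str, ids):
        order = [start]
        visited = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            # sorted() only makes the visiting order deterministic
            for neighbour in sorted(graph.get(node, ())):
                if neighbour not in visited:
                    visited.add(neighbour)
                    order.append(neighbour)
                    queue.append(neighbour)
        result[start] = order
    return result


# Made-up rows for illustration only; these are NOT real ChEBI records.
SAMPLE = (
    "TYPE\tINIT_ID\tFINAL_ID\n"
    "is_tautomer_of\t100\t101\n"
    "is_conjugate_acid_of\t101\t102\n"
    "is_a\t102\t200\n"
    "is_conjugate_base_of\t300\t301\n"
)

graph = load_links(io.StringIO(SAMPLE))
print(expand_chebi_ids(["100", "200"], graph))
# {'100': ['100', '101', '102'], '200': ['200']}
```

With the real file, `load_links(open(path, newline=""))` would replace the in-memory sample. An ID with no qualifying relations maps only to itself, which matches how the R version adds the query IDs as vertices.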

joewandy added the bug and priority labels Aug 11, 2020

joewandy assigned joewandy and RonanDaly Aug 11, 2020

joewandy mentioned this issue Feb 4, 2021

Improve chebi mapping glasgowcompbio/pyMultiOmics#5

Closed
