Gene map ensembl ids are transcripts probeToMeanPromoterMethylation #27

Open
FischerJoBio opened this issue Jul 26, 2023 · 4 comments

@FischerJoBio
Collaborator

The Ensembl IDs returned in the probe map of probeToMeanPromoterMethylation are transcript IDs (ENST*), but they should be gene IDs (ENSG*), since probes are mapped to the genome rather than to transcript-level information.
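
A quick way to see which kind of identifier a probe map carries is to look at the ID prefix. The sketch below uses a made-up vector of IDs (it is not output from probeToMeanPromoterMethylation) and relies only on the convention that human Ensembl gene IDs start with ENSG and transcript IDs with ENST.

```r
# Hypothetical IDs for illustration only; not taken from the package output
ids <- c("ENST00000000001", "ENSG00000000002", "ENST00000000003")

# Classify each ID by its Ensembl prefix
id_type <- ifelse(grepl("^ENST", ids), "transcript",
           ifelse(grepl("^ENSG", ids), "gene", "other"))

# Count how many IDs fall into each class
table(id_type)
```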

@katehoffshutta
Collaborator

katehoffshutta commented Jul 27, 2023

This behavior actually comes from the generation of the probe map in mapProbesToGenes. The rationale for mapping at the transcript level is to identify probes that lie within a given range of the TSS of any of a gene's transcripts, as recorded in the transcriptIDs column of the Illumina manifest. For this reason, I think we have to map at the transcript level.

@FischerJoBio
Collaborator Author

I see your point, but this is problematic for most downstream applications (including everything we do), as we work at the gene level (i.e., reference genome annotation) and not at the transcript level. The question then becomes: how do we map from transcripts to the gene level? Also, these methylation counts are a genomic feature, not an mRNA feature. There are arguments for both approaches, but we should pick one and keep it consistent for practicality. If we stick with transcript-level promoter definitions, we should warn users and provide a function to map to the genome.
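
One possible way to move from transcript-level to gene-level values can be sketched in base R. The lookup table `tx2gene`, its column names, and all IDs and values below are hypothetical; in practice the table would come from an annotation resource. Averaging per gene is just one choice among several, not the package's behavior.

```r
# Hypothetical transcript-to-gene lookup (made-up IDs)
tx2gene <- data.frame(
  transcript = c("ENST_A1", "ENST_A2", "ENST_B1"),
  gene       = c("ENSG_A", "ENSG_A", "ENSG_B"),
  stringsAsFactors = FALSE
)

# Hypothetical transcript-level promoter methylation values
tx_meth <- data.frame(
  transcript = c("ENST_A1", "ENST_A2", "ENST_B1"),
  person1    = c(0.2, 0.4, 0.9),
  stringsAsFactors = FALSE
)

# Attach the gene ID to each transcript, then average per gene
merged    <- merge(tx_meth, tx2gene, by = "transcript")
gene_meth <- aggregate(person1 ~ gene, data = merged, FUN = mean)
gene_meth
```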

@katehoffshutta
Collaborator

Based on the manifest use of transcripts vs. genes, I propose this resolution @FischerJoBio:

  • Add a note that promoter definitions are at the transcript level
  • Advise users that they can write their own functions to pre-filter the input probe_gene_map if they are looking to calculate promoter methylation for a single transcript per gene

I don't think we need to provide a function to map to the genome at this point, but we could open it as a separate issue for enhancement if desired.
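
The pre-filtering suggested above could look roughly like the following. This is a sketch under assumptions: the long-format table, its column names, and the per-gene transcript choice are all hypothetical, and the real probe_gene_map may be laid out differently.

```r
# Hypothetical long-format probe map: one row per probe-transcript pair.
# The real probe_gene_map may be structured differently.
probe_map <- data.frame(
  probeID    = c("cg01", "cg01", "cg02", "cg03"),
  gene       = c("GENE1", "GENE1", "GENE1", "GENE2"),
  transcript = c("TX1a", "TX1b", "TX1a", "TX2a"),
  stringsAsFactors = FALSE
)

# User-chosen transcript for each gene (hypothetical choice)
chosen <- c(GENE1 = "TX1a", GENE2 = "TX2a")

# Keep only rows whose transcript is the chosen one for its gene
filtered <- probe_map[probe_map$transcript == chosen[probe_map$gene], ]
filtered
```

After this filter, each probe appears at most once per gene, so it can no longer be counted once per isoform.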

@katehoffshutta
Collaborator

Note that when we do the mapping in probeToMeanPromoterMethylation, the current behavior is to include a probe multiple times if it maps to multiple isoforms. This also needs to be included in the documentation. MWE:

library(NetSciDataCompanion)

# Gene of interest (further candidates are left commented out)
genesInt <- c("NLGN4X") #,"MAP7D2","SH3KBP1"  ,"MAP3K15","ARSD" , "ARSF" , "MXRA5"  ,"NLGN4X"  ,"PUDP")
objNet2 <- CreateNetSciDataCompanionObject()
probeList <- c("cg17811446","cg20303283","cg21875437",
               "cg21142738","cg06811375","cg08881159",
               "cg15790913","cg19743317")

# Build the probe map using the TSS window given by rangeUp and rangeDown
probemap <- objNet2$mapProbesToGenes(probeList, rangeUp=500, rangeDown=0)

# Simulate one beta value per probe
set.seed(1989)
betas <- rbeta(n=length(probeList), shape1=0.5, shape2=0.5)

gene_prom_meth <- objNet2$probeToMeanPromoterMethylation(methylation_betas = data.frame("probeID"=probeList,"person1"=betas),
                                                         genesOfInterest = genesInt,
                                                         probe_gene_map = probemap)
# Promoter methylation as returned by the package
gene_prom_meth
# 0.45119

# Plain mean over the 8 probes, each counted once
mean(betas)
# 0.47121

# Mean over 9 values: the probe that maps to two NLGN4X isoforms is counted twice
1/9*(sum(betas) + betas[which(probemap$geneNames == "NLGN4X;NLGN4X")])
# 0.45119
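
The double counting shown in the MWE can be reproduced with a toy example that does not depend on the package. The probe names, beta values, and mapping below are made up; the point is only that a probe listed once per isoform pulls the mean toward its own value.

```r
# Made-up beta values for three probes
betas_toy <- c(p1 = 0.2, p2 = 0.5, p3 = 0.8)

# Hypothetical mapping: p1 falls in the promoter window of two isoforms,
# so it appears twice in the list of probe hits
probe_hits <- c("p1", "p1", "p2", "p3")

# Mean with p1 counted once per isoform: (0.2 + 0.2 + 0.5 + 0.8) / 4
mean_with_dups <- mean(betas_toy[probe_hits])

# Mean with each probe counted once: (0.2 + 0.5 + 0.8) / 3
mean_unique <- mean(betas_toy[unique(probe_hits)])

c(with_duplicates = mean_with_dups, unique_probes = mean_unique)
```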
