Gene map ensembl ids are transcripts probeToMeanPromoterMethylation #27

FischerJoBio · 2023-07-26T11:24:42Z

The Ensembl IDs in the probe map used by probeToMeanPromoterMethylation are transcript IDs (ENST*), but they should be gene IDs (ENSG*), because probes are mapped to the genome, not to transcript-level information.


katehoffshutta · 2023-07-27T10:05:09Z

This behavior actually comes from how mapProbesToGenes generates the probe map. The rationale for mapping at the transcript level is to identify probes that lie within a given range of the TSS of any of a gene's transcripts, as recorded in the transcriptIDs column of the Illumina manifest. I think we have to map at the transcript level for this reason.
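To make the transcript-level logic concrete, here is a minimal sketch of a strand-aware TSS window check. It is not the package's implementation: the TSS table is made up, and we assume rangeUp/rangeDown mean base pairs upstream/downstream of each transcript's TSS, which this thread does not spell out. In practice an interval library such as Bioconductor's GenomicRanges would be the more robust choice.

```r
# Made-up TSS table, one row per transcript (IDs, genes, coordinates are illustrative)
tss_table <- data.frame(
  transcript = c("T1", "T2", "T3"),
  gene       = c("GENE_A", "GENE_A", "GENE_B"),
  tss        = c(1000, 1200, 20000),
  strand     = c("+", "+", "-"),
  stringsAsFactors = FALSE
)

# TRUE where `pos` lies within rangeUp bp upstream and rangeDown bp downstream
# of each TSS; on the minus strand, upstream means larger coordinates
in_promoter <- function(pos, tss, strand, rangeUp = 500, rangeDown = 0) {
  ifelse(strand == "+",
         pos >= tss - rangeUp & pos <= tss + rangeDown,
         pos <= tss + rangeUp & pos >= tss - rangeDown)
}

# Every transcript whose promoter window contains the probe position
transcripts_for_probe <- function(pos, tss_table, rangeUp = 500, rangeDown = 0) {
  hit <- in_promoter(pos, tss_table$tss, tss_table$strand, rangeUp, rangeDown)
  tss_table[hit, c("transcript", "gene")]
}

# A probe at position 900 falls in the windows of both GENE_A transcripts
transcripts_for_probe(900, tss_table)
```

Because the check runs per transcript, a probe near two TSSs of the same gene (T1 and T2 above) is returned once per transcript, so the same gene appears twice.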

FischerJoBio · 2023-07-27T10:21:28Z

I see your point, but this is problematic for most downstream applications (including everything we do), because we work at the gene level (i.e., reference genome annotation), not the transcript level. The question then becomes: how do we map from transcripts to the gene level? Also, these methylation values are a genomic feature, not an mRNA feature. There are arguments for both approaches, but for practicality we should pick one and keep it consistent; if we stick with transcript-level promoter definitions, we should warn users and provide a function to map to the genome.
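One possible way to get from a transcript-level map to the gene level is sketched below. It assumes a transcript-to-gene lookup table is available (for example, exported from an Ensembl annotation matching the manifest's release); both tables and the helper name to_gene_level are hypothetical, not part of the package.

```r
# Made-up transcript-level probe map: one row per (probe, transcript) pair
probe_tx <- data.frame(
  probeID    = c("cgA", "cgA", "cgB", "cgC"),
  transcript = c("ENST_1", "ENST_2", "ENST_1", "ENST_3"),
  stringsAsFactors = FALSE
)

# Made-up transcript -> gene lookup (placeholder IDs, not real Ensembl IDs)
tx2gene <- data.frame(
  transcript = c("ENST_1", "ENST_2", "ENST_3"),
  gene       = c("ENSG_X", "ENSG_X", "ENSG_Y"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace each transcript by its gene, then drop repeated
# (probe, gene) pairs so a probe contributes at most once to a gene.
# merge() is an inner join, so transcripts missing from tx2gene are dropped.
to_gene_level <- function(probe_tx, tx2gene) {
  m <- merge(probe_tx, tx2gene, by = "transcript")
  out <- unique(m[, c("probeID", "gene")])
  rownames(out) <- NULL
  out
}

gene_map <- to_gene_level(probe_tx, tx2gene)
gene_map
```

Probe cgA maps to two transcripts of ENSG_X but appears only once in gene_map, which is the gene-level behavior described above.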

katehoffshutta · 2024-07-15T17:08:55Z

Based on how the manifest uses transcripts vs. genes, I propose the following resolution, @FischerJoBio:

- Add a note to the documentation that promoter definitions are at the transcript level.
- Advise users that, if they want to calculate promoter methylation from a single transcript per gene, they can write their own functions to pre-filter the input probe_gene_map.
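A user-side pre-filter of that kind might look like the sketch below. It mimics the semicolon-separated style of the geneNames column (e.g. "NLGN4X;NLGN4X"), but the "transcripts" column name, the toy rows, and the helper filter_to_transcripts are all hypothetical; the real probe_gene_map schema may differ.

```r
# Made-up probe map in a semicolon-separated style; "transcripts" is a hypothetical column
probe_map <- data.frame(
  probeID     = c("cg1", "cg2", "cg3"),
  geneNames   = c("GENE_A;GENE_A", "GENE_A", "GENE_B"),
  transcripts = c("ENST_1;ENST_2", "ENST_2", "ENST_3"),
  stringsAsFactors = FALSE
)

# Keep only the (gene, transcript) entries whose transcript is in `keep_tx`
# (e.g. one user-chosen transcript per gene); drop probes left with no entries
filter_to_transcripts <- function(map, keep_tx) {
  genes <- strsplit(map$geneNames, ";", fixed = TRUE)
  txs   <- strsplit(map$transcripts, ";", fixed = TRUE)
  keep  <- lapply(txs, function(t) t %in% keep_tx)
  map$geneNames   <- mapply(function(g, k) paste(g[k], collapse = ";"), genes, keep)
  map$transcripts <- mapply(function(t, k) paste(t[k], collapse = ";"), txs, keep)
  map[map$transcripts != "", ]
}

# Keep ENST_1 for GENE_A and ENST_3 for GENE_B
filtered <- filter_to_transcripts(probe_map, keep_tx = c("ENST_1", "ENST_3"))
filtered
```

After filtering, cg1 is listed under GENE_A once instead of twice, and cg2 is dropped because its only transcript was not selected.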

I don't think we need to provide a function to map to the genome at this point, but we could open a separate enhancement issue for it if desired.

katehoffshutta · 2024-07-21T01:05:38Z

Note that when probeToMeanPromoterMethylation applies the mapping, the current behavior is to include a probe once per isoform it maps to, so a probe that maps to several isoforms of a gene gets extra weight in that gene's mean. This also needs to be documented. MWE:

library(NetSciDataCompanion)

genesInt <- c("NLGN4X")  # other genes tried: "MAP7D2", "SH3KBP1", "MAP3K15", "ARSD", "ARSF", "MXRA5", "PUDP"
objNet2 <- CreateNetSciDataCompanionObject()

# Probes used in this example
probeList <- c("cg17811446", "cg20303283", "cg21875437",
               "cg21142738", "cg06811375", "cg08881159",
               "cg15790913", "cg19743317")

# Transcript-level probe map; rangeUp/rangeDown set the promoter window around each TSS
probemap <- objNet2$mapProbesToGenes(probeList, rangeUp = 500, rangeDown = 0)

# Simulated beta values for a single sample
set.seed(1989)
betas <- rbeta(n = length(probeList), shape1 = 0.5, shape2 = 0.5)

gene_prom_meth <- objNet2$probeToMeanPromoterMethylation(
  methylation_betas = data.frame("probeID" = probeList, "person1" = betas),
  genesOfInterest = genesInt,
  probe_gene_map = probemap)
gene_prom_meth
# 0.45119
mean(betas)
# 0.47121

# The reported value is not the plain mean of the 8 betas: the probe whose
# geneNames entry is "NLGN4X;NLGN4X" (two NLGN4X isoforms) is counted twice,
# giving 9 terms in total
1/9*(sum(betas) + betas[which(probemap$geneNames == "NLGN4X;NLGN4X")])
# 0.45119
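The arithmetic behind the MWE can be reproduced without the package. The sketch below is a toy re-implementation of the weighting the MWE implies (each probe counted once per matching entry in geneNames), not the package's code; the betas, probe names, and the helper promoter_mean are made up. With dedupe = TRUE, repeated gene names per probe are collapsed so every probe counts once.

```r
# Toy data: four probes, one of which maps to two isoforms of the same gene
betas     <- c(cg1 = 0.2, cg2 = 0.4, cg3 = 0.6, cg4 = 0.8)
geneNames <- c(cg1 = "GENE_A", cg2 = "GENE_A;GENE_A", cg3 = "GENE_A", cg4 = "GENE_A")

# Hypothetical helper: weighted mean where each probe's weight is the number of
# times `gene` appears in its geneNames entry (1 per probe if dedupe = TRUE)
promoter_mean <- function(betas, geneNames, gene, dedupe = FALSE) {
  per_probe <- strsplit(geneNames, ";", fixed = TRUE)
  if (dedupe) per_probe <- lapply(per_probe, unique)
  weight <- vapply(per_probe, function(g) sum(g == gene), integer(1))
  sum(weight * betas) / sum(weight)
}

promoter_mean(betas, geneNames, "GENE_A")                 # cg2 counted twice
promoter_mean(betas, geneNames, "GENE_A", dedupe = TRUE)  # equals mean(betas)
```

The first call gives (sum(betas) + betas[["cg2"]]) / 5 = 0.48, the same pattern as the 1/9 check in the MWE; the deduplicated call gives 0.5, the plain mean.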
