-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
how to get single l2fc & padj for the additive effects of two factors of geneA (2 levels) and geneB (3 levels) #10
Comments
Dear @Gloriafu I'm not sure I understood all your solutions, but in your situation I would maybe take a different approach and use the likelihood ratio test (LRT) implemented in DESeq2. For your case: dds <- DESeqDataSetFromMatrix(countData = data_cts, colData = meta, design = ~ sex + GeneA_factor + GeneB_factor )
# test for the effect of both genes
dds <- DESeq(dds, test = "LRT", reduced = ~ sex)
res <- results(dds) In this case, you can mostly ignore the The results from this test will not be specific to any gene, but what I would then do is:
See sections "Identifying candidate genes" and "Using clustering" in these materials for an example of gene clustering using rlog-normalised data. A final note that in your model you did not consider an interaction between the genes - you are assuming all effects are additive. |
Dear @tavareshugo Thank you so much for your timely response and kind suggestions. I probably didn't phrase my question unclearly - My end goal is to plot 3 volcano plots with L2FC (x axis) and padj (y axis). These 3 plots shows genes that are affected by geneA only, geneB only and geneA + geneB (additive effect), respectively. Therefore, for each gene, I'm looking for methods to get the l2fc and padj for the effect of geneA only, effect of geneB only and the additive effect of geneA + geneB. In the reduced model you suggested, could you explain would does the l2fc mean for each gene? Also thank you for reminding me of the interaction term - I did tried to add the interaction in my model, but got the error message 'One or more variables or interaction terms in the design formula are linear’. Therefore, I removed the interaction in my final model. Thank you! |
OK, I think there might be a confusion about what an additive effect is, and perhaps you mean to find an interaction. If you're assuming additive effects (which is what your model assumes at the moment), then there is no test for "geneA + geneB", precisely because they are additive. Let's assume the following genotypes:
Let's also assume that
Then the additive effective of being
Where therefore the additive effect of being HET for both genes would be 2 - 2 = 0 (i.e. the expression of double HET is the same as the double WT). If you expect those effects not to be additive, then you need to model an interaction. It might be easier if you share the metadata to see exactly what combinations of genotypes you have. |
@tavareshugo thank you so much for the example! For additive vs interaction, my understanding is that geneA + geneB models the additive effect; whereas geneAgeneB is more for multiplicative effect. In the beginning I tried to model the two together as dds <- DESeqDataSetFromMatrix(countData = data_cts, colData = meta, design = ~ sex + GeneA_factor + GeneB_factor + GeneA_factorGeneB_factor), however, r gave me an error - 'One or more variables or interaction terms in the design formula are linear’. Therefore, I removed the interaction term and dds <- DESeqDataSetFromMatrix(countData = data_cts, colData = meta, design = ~ sex + GeneA_factor + GeneB_factor) worked fine. I have a question regarding adding L2FC for two factors - we could simply add LFC of many factors for the same gene together, correct? In your example, you added LFC(geneA-WT_vs_geneA-HET) + LFC(geneB-WT_vs_geneB-HET) = 1+2 =3 as geneA-HET; geneB-HET. I'm wondering in real analysis, could I just add L2FC together for genes that are significant at both geneA & geneB to get the L2FC of additive effect? Also, could you please help with comparing the LFC'= LFC(geneA-WT_vs_geneA-HET) + LFC(geneB-WT_vs_geneB-HET) vs LFC' = LFC(geneA-WT-geneB-WT_vs_geneA-HET-geneB-HET/HOM)? Here is my metadata Thank you! |
Maybe let's forget about the syntax for a minute and think about the biological question. If we do expect that a double mutant may be different from additive effects of single mutants, then we should fit a model with an interaction: However, there is an issue in your experimental design: you don't have the combination I also noticed that all of your Finally, going back to the syntax to generate the contrast vectors for several genotype combinations in your data: # meta is the table you pasted above
meta$geneA_factor <- relevel(factor(meta$geneA_factor), "WT")
meta$geneB_factor <- relevel(factor(meta$geneB_factor), "WT")
# rename variables to make it more readable
names(meta) <- c("id", "sex", "genea", "geneb")
# model with genotype interaction
mod_mat <- model.matrix(~ sex + genea + geneb + genea:geneb, data = meta)
# single genotypes
# WT;WT
wt_wt <- colMeans(mod_mat[meta$genea == "WT" & meta$geneb == "WT", ])
# HET;HET
het_het <- colMeans(mod_mat[meta$genea == "HET" & meta$geneb == "HET", ])
# WT;HET
wt_het <- colMeans(mod_mat[meta$genea == "WT" & meta$geneb == "HET", ])
# HET;WT - this is missing from your design!!!
het_wt <- colMeans(mod_mat[meta$genea == "HET" & meta$geneb == "WT", ])
# combined genotypes
# WT;HET or WT;HOM
wt_mut <- colMeans(mod_mat[meta$genea == "WT" & meta$geneb != "WT", ])
# HET;HET or HET;HOM
mut_mut <- colMeans(mod_mat[meta$genea != "WT" & meta$geneb != "WT", ])
# HET;WT or HET;HET
mut_any <- colMeans(mod_mat[meta$genea != "WT", ])
# WT;HET or WT;HOM or HET;HOM
any_mut <- colMeans(mod_mat[meta$geneb != "WT", ]) Based on these vectors, there are many comparisons you can make - although some may or may not make sense biologically. Regarding summing LFC values, I would probably not do this. Instead, what you could do is make a scatter plot of the two LFC against each other to get a sense of whether they are broadly correlating or not. There is another approach that you could take: "paste" the two genotypes as a new single variable, let's call it I still feel like the LRT approach I mentioned earlier, followed by gene clustering on normalised counts could be more adequate in this case, especially as you are missing the HET;WT genotype combination. |
@tavareshugo Thank you so much for your detailed instructions. We do miss the geneA - HET; geneB - WT group for the analysis - this was something we missed in our mouse experiment. the SRR samples are from the same batch as other genotypes - we did batches of a lot mice with different genetic background, but only used SRR samples for the analysis in an early publication, and now we want to analyze all the samples with different gt. As for interaction, yes, we assume there will be interaction between geneA and geneB, and I agree with you that we need to add interaction term. I used your syntax to code my meta data, and had no problem running mod_mat <- model.matrix(~ sex + genea + geneb + genea:geneb, data = meta). However, I still got the error message with dds
Could you please let me know what is the possible reason for this error? Any options I can bypass this error and let dds run? I also tried the LRT methods you mentioned and was able to cluster genes. However, I still need l2fc and padj for each gene in order to make the volcano plots, which were asked for by my PI. I'm wondering, how to get the l2fc and padj for them. One thing I noticed is that the genes I got from LRT & clustering don't always match with genes I got with contrast. I'm wondering which method is more reliable. |
Apologies for the delay. That error is a common issue (see e.g. here), and in your case it's because of the missing combination HET; WT. As I said earlier, I think an easier way to address your questions is to make a new variable with the genotype of both genes. Then you can do all the comparisons of interest. Here is some example code, assuming you already have your # create a new metadata column that pastes the two genotypes together
dds$genotype <- paste(dds$genea, dds$geneb, sep = ".")
# turn it to a factor with WT;WT as the reference level
dds$genotype <- relevel(factor(dds$genotype), "WT.WT")
# fit the model
design(dds) <- ~ sex + genotype
dds <- DESeq(dds)
# look at the coefficient names
resultsNames(dds) The output of that last line of code is something like:
So, you can now make different comparisons of interest. Single comparisons against the WT;WT are relatively easy: # WT;WT vs WT;HET
results(dds, contrast = list("genotype_HET.HET_vs_WT.WT"))
# and same logic for the others Comparisons between genotypes, it depends on what you want, but here is an example: # WT;HET vs HET;HET
results(dds, contrast = list("HET.HET_vs_WT.WT", "WT.HET_vs_WT.WT")) Finally, for your question of the average of genotypes, that one is a bit trickier, but I think a numeric contrast should work. # (HET;HET + HET;HOM)/2 - (WT;HET + WT;HOM)/2
results(dds, contrast = c(0, 0, 0.5, 0.5, 0, 0.5, 0.5)) The logic here is to take the vector from Hope this helps. |
Hi,
I made mutations on two genes - geneA is WT/HET and geneB is WT/HET/HOM. I obtained mice with both sexes carrying these mutations, and want to know what is the individual effect of geneA and geneB separately, and what is the additive effects of geneA & geneB together, while controlling for sex.
dds <- DESeqDataSetFromMatrix(countData = data_cts, colData = meta, design = ~ sex + GeneA_factor + GeneB_factor )
dds <- DESeq(dds)
resultsNames(dds)
I don't have problem with individual effects -
results_geneAhet_vs_geneAwt = lfcShrink(dds, contrast = c('GeneA_factor', 'HET', 'WT'), type = 'ashr')
geneB_WT = colMeans(mod_mat[dds$GeneB_factor == "WT", ])
geneB_nonWT = colMeans(mod_mat[dds$GeneB_factor %in% c("HOM", "HET"),])
results_geneB_nonWT_vs_WT = lfcShrink(dds, contrast = geneB_nonWT - geneB_WT, type = 'ashr')
However, I'm not sure how to find the additive effects of geneA & geneB together (I want single values of l2fc_additive & padj_additive for each affected gene) I'm thinking of two options -
option1 - find genes that are affected by both geneA and geneB individuals (genes in both results_geneAhet_vs_geneAwt & results_geneB_nonWT_vs_WT) and l2fc_additive = l2fc_geneA + l2fc_geneB, padj_additive = min(padj_geneA, padj_geneB).
option2 - use the design to find additive effects -
geneAWT_geneBWT = colMeans(mod_mat[dds$GeneA_factor == "WT" & dds$GeneB_factor=='WT', ])
geneAnonWT_geneBnonWT = colMeans(mod_mat[dds$GeneA_factor =="HET" & dds$GeneB_factor !='WT',])
results_geneAnonWT_geneBnonWT = lfcShrink(dds, contrast = geneAnonWT_geneBnonWT - geneAWT_geneBWT, type = 'ashr')
In this way, I get single value l2fc & padj.
I tried both options, but they gave me different number of affected genes. Even for same genes, the l2fc values were different although the signs/effect directions were the the same.
Could you please help to point the correct way for additive effect of geneA & geneB together? if neither option 1 and option2 is correct, then what's the best way you suggested?
Thank you!
Ming
The text was updated successfully, but these errors were encountered: