Effect of over- and underexpression on itselves #4
Comments
@holgerman thanks for playing with the data and your interest.
To make sure I understand, these plots show the z-score distributions for perturbation target genes (effect_itselves == True) and all non-target genes (effect_itselves == False)? So while there are dysregulated non-target genes that have absolute z-scores > 5, the tails of this distribution are small enough that they aren't visible above in red? We have noticed similar issues before, where many perturbations don't significantly dysregulate their target in the expected direction. In this discussion, we attempt to diagnose the issue. In particular, we found that measured target genes tended to be dysregulated in the expected direction, while imputed target genes did not. Therefore, we conclude the issue is likely due primarily to poor imputation quality in the original LINCS data (even within the BING "best inferred gene set" genes). In the Project Rephetio manuscript, we summarize:
@dhimmel thank you for pointing me in the right direction! And congratulations on your truly collaborative eLife paper!
Yes, that is right: I simply contrasted every non-target pair with the targeted pairs. With vertical faceting, R's ggplot fixes the x-axis scale to the largest range across panels. Overexpression z-values of the (red) non-target pairs ranged from -19.074 to 38.136, and knockdown z-values of non-target pairs ranged from -58.775 to 45.133, with tails not visible in the graph above.
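To make the point about invisible tails concrete, here is a minimal Python sketch that reports the range and the fraction of values beyond |z| > 5 for one group. Only the two extreme values come from the numbers above; the bulk of the toy data and the function name `summarize` are invented for illustration.

```python
def summarize(z_scores, tail=5.0):
    """Return the range of the z-scores and the fraction with |z| > tail."""
    n = len(z_scores)
    in_tail = sum(1 for z in z_scores if abs(z) > tail)
    return {
        "min": min(z_scores),
        "max": max(z_scores),
        "tail_fraction": in_tail / n,
    }


# Invented bulk of 998 values within [-1.9, 1.9] ...
bulk = [(-1) ** i * (i % 20) / 10 for i in range(998)]
# ... plus the two extremes reported for overexpression non-target pairs.
non_target = bulk + [-19.074, 38.136]

summary = summarize(non_target)
print(summary)
```

With a shared x-axis spanning the full range, a tail this small contributes almost no bar height, which is why it does not show up in the histogram.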
Thanks, this insight was very helpful, also the discussion in Himmelstein et al., 2016o! As you discussed that a general stress response might be the reason, I had a closer look at the contrasts used for calculating the z scores in the LINCS data. My motivation for this was that the general stress response must be weaker in controls to get manifested in the z score. From this GitHub issue I understood that your
In this discussion you wrote
Does this mean you used VC controls in your method? As vehicles the pert_info file includes as
Interestingly, the Ma'ayan lab suggested using the PC in this YouTube lecture (at 18' 30''). However, their own independent calculation of Level 5, a third version of z-scores based on the characteristic direction method described in their paper, appears, if I understood correctly, to use VC:
I will ask them about it. Finally, do you think that using a different definition of the level 4 z-scores might help, at least a bit, with the low quality of the imputed L1000 genes? Thanks again for your time!
That sounds right, although I don't remember using the "level 4" terminology. When we were accessing the data through LINCS Cloud, I don't believe the GEO upload existed. Frankly I'm not sure whether the
Perhaps you can get in touch with someone from the LINCS L1000 team and inquire about what control was used for this file. @tnat1031 (Ted Natoli) was very helpful during the online L1000 office hours. Perhaps he will know which control was used. I don't remember there being an option in the past, so it's possible that only vehicle control existed when we did our analyses and population control is newer? Anyway, @holgerman I do think a population control could be preferable. Removing the general stress response would be valuable. I'm not sure how this would relate to the imputation quality issue. Since I believe genes are imputed prior to the control stage, it's possible the quality would not change. If it turns out that more robust controls or differential expression data are now available, a pull request to update this repo would be of interest.
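As a rough illustration of why the choice of control matters, the Python sketch below computes a z-score for one treated well against two different reference sets. It assumes a robust z-score built from the median and the median absolute deviation (MAD) of the reference, that a vehicle control (VC) means vehicle-treated wells, and that a population control (PC) means the other treated wells on the plate. These definitions, the function name `robust_z`, and all expression values are assumptions for illustration, not a description of the actual LINCS pipeline.

```python
from statistics import median


def robust_z(value, reference):
    """Robust z-score of `value` relative to a reference set (median and MAD)."""
    med = median(reference)
    mad = median([abs(x - med) for x in reference])
    if mad == 0:
        raise ValueError("MAD of the reference set is zero")
    # 1.4826 scales the MAD to match the standard deviation of a normal distribution.
    return (value - med) / (1.4826 * mad)


# Invented expression values for a single gene.
vehicle_wells = [10.0, 10.2, 9.8, 10.1, 9.9]
# Other treated wells, all shifted upward by a hypothetical general stress response.
population_wells = [11.0, 11.3, 10.8, 11.1, 10.9, 11.2, 10.7]
treated_value = 11.5

z_vc = robust_z(treated_value, vehicle_wells)
z_pc = robust_z(treated_value, population_wells)
print(round(z_vc, 2), round(z_pc, 2))
```

In this toy setup, a shift shared by all treated wells inflates the score against the vehicle reference, while the population reference absorbs it.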
@dhimmel Yes, the old @holgerman Thanks for your interest in the data. I agree with @dhimmel that when considering the effect of a perturbagen on the specific gene it is designed to target, the directly measured (aka Could you share more about the type of research you're doing and the questions you want to address with this data? Also, you may find our documentation and other resources at clue.io helpful. Thanks a lot,
Dear @tnat1031, thank you very much for providing this information! @dhimmel I also got information from the Ma'ayan lab regarding their controls. Indeed, they did not use population controls but VC (vehicle controls) for their characteristic direction method. In the video they meant that the Broad preferred PC for their z-score method; this was not related to their own work. They have not yet compared their method's performance between VC and PC on this dataset, and they used VC because they regarded it as the typical design for a single microarray experiment.
Hi Daniel,
thank you very much for sharing this work. As a computational biologist, I find this data very interesting for checking hypotheses derived from another dataset against wet-lab data. Great!
I had a look at the datasets you kindly provided in https://github.com/dhimmel/lincs/tree/gh-pages/data/consensi and checked the effect of overexpression/underexpression of a gene as perturbagen on itself:
About a third of the genes showed nominally significant (z-score <= -1.96) underexpression when the gene itself was the target of the repressing perturbagen. For overexpression, about 10 percent of genes showed overexpression when they themselves were the overexpressed perturbagen.
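For reference, the check described above can be sketched in Python as follows. The row layout (perturbed gene, measured gene, z-score), the gene names, the values, and the function name `self_hit_fraction` are hypothetical. The symmetric cutoff of z >= 1.96 for overexpression is also an assumption.

```python
# Toy (perturbed_gene, measured_gene, z_score) rows; all values are invented.
rows = [
    ("GENE_A", "GENE_A", -3.2),
    ("GENE_A", "GENE_B", 0.4),
    ("GENE_B", "GENE_B", -1.1),
    ("GENE_B", "GENE_A", -2.5),
    ("GENE_C", "GENE_C", -2.0),
    ("GENE_C", "GENE_A", 0.9),
]

Z_CUTOFF = 1.96  # nominal two-sided 5% threshold for a standard normal


def self_hit_fraction(rows, direction):
    """Fraction of self-targeting pairs whose z-score passes the cutoff.

    direction=-1 checks knockdown (z <= -1.96);
    direction=+1 checks overexpression (z >= 1.96).
    """
    self_z = [z for pert, gene, z in rows if pert == gene]
    if not self_z:
        raise ValueError("no self-targeting pairs found")
    hits = sum(1 for z in self_z if direction * z >= Z_CUTOFF)
    return hits / len(self_z)


knockdown_fraction = self_hit_fraction(rows, direction=-1)
overexpression_fraction = self_hit_fraction(rows, direction=+1)
print(knockdown_fraction, overexpression_fraction)
```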
My first question is: While this is truly a clear enrichment in the right direction, is this rather low efficiency of a gene as perturbagen on itself expected?
My second question is: Would you suggest filtering, as a quality-control step, for genes that show an effect on themselves when used as the perturbagen?
To illustrate this issue, here is a histogram of z-scores showing the effect of a perturbagen on itself vs. its effect on other genes:
Thanks and best, Holger