JE7: Homework assignment GSEA
- Review GSEA lecture and complete the questions
Started: Mar 20, 2023
Finished: Mar 21, 2023
Time Estimated: 2 hours
Time Spent: 1.5 hours
From the assignment page Practice using GSEA: https://q.utoronto.ca/courses/294979/assignments/990549?module_item_id=4287880
Given the ranked list comparing mesenchymal and immunoreactive ovarian cancer subtypes (mesenchymal genes have positive scores, immunoreactive genes have negative scores), perform a GSEA preranked analysis using the following parameters:
- mesenchymal vs immuno rank file
- genesets from the baderlab geneset collection from March 1, 2021 containing GO biological process, no IEA and pathways.
- maximum geneset size of 200
- minimum geneset size of 15
- gene set permutation
and answer the following questions in your journal:
Explain the reasons for using each of the above parameters. What is the top gene set returned for the Mesenchymal subtype? What is the top gene set returned for the Immunoreactive subtype? For each of the genesets answer the questions below: What are its p-value, ES, NES and FDR? How many genes are in its leading edge? What is the top gene associated with this geneset?
- Downloaded the GSEA Java GUI
- Downloaded MesenvsImmuno_RNASeq_ranks from GitHub
- Downloaded the genesets from http://download.baderlab.org/EM_Genesets/March_01_2021/ using the R code example covered in Lecture 10
- Loaded data and inputted parameters as denoted on the Assignment page
- Ran GSEA Preranked with the inputted parameters
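For reference, the two inputs loaded above are plain tab-delimited text: a .rnk file has one gene and its score per line, and a .gmt file has one geneset per line (name, description, then the member genes). Below is a minimal Python sketch of reading both. The function names and the toy data are made up for illustration; they are not part of GSEA or of the assignment files.

```python
import io

def read_rnk(handle):
    """Read a .rnk file: one 'gene<TAB>score' pair per line."""
    ranks = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not a gene/score pair
        try:
            ranks[fields[0]] = float(fields[1])
        except ValueError:
            continue  # skip a non-numeric header row
    return ranks

def read_gmt(handle):
    """Read a .gmt file: 'name<TAB>description<TAB>gene1<TAB>gene2...'."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # need a name, a description and at least one gene
        gene_sets[fields[0]] = [g for g in fields[2:] if g]
    return gene_sets

# Toy inputs (hypothetical genes and scores)
rnk_text = "GeneName\trank\nGENE_A\t5.2\nGENE_B\t4.8\nGENE_C\t-3.9\n"
gmt_text = "TOY_SET\ttoy description\tGENE_A\tGENE_B\n"

ranks = read_rnk(io.StringIO(rnk_text))
gene_sets = read_gmt(io.StringIO(gmt_text))
print(ranks)      # {'GENE_A': 5.2, 'GENE_B': 4.8, 'GENE_C': -3.9}
print(gene_sets)  # {'TOY_SET': ['GENE_A', 'GENE_B']}
```

With real files, `open(path)` would replace the `io.StringIO` wrappers.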
- Explain the reasons for using each of the above parameters.
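The mesenchymal vs immuno rank file is the ranked gene list we're trying to analyze; GSEA Preranked takes this list directly instead of an expression matrix. The Bader genesets are regularly updated, which is why they're used in this instance, and the "no IEA" version leaves out GO annotations that were only inferred electronically. Gene set permutation is the permutation type: the null distribution is built from randomized gene sets, because a preranked list has no sample phenotype labels to shuffle.
The maximum and minimum geneset sizes filter which genesets are tested. Sets with more than 200 genes tend to be too broad to interpret, and sets with fewer than 15 genes tend to give unstable enrichment scores, so leaving both out keeps the results to genesets that are more likely to be meaningful.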
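The size limits can be sketched in a few lines of Python. GSEA counts only the genes of a set that are present in the ranked list before applying the limits, so the sketch does the same. The function name `filter_by_size` and the toy genesets are hypothetical; only the limits 15 and 200 come from the assignment.

```python
def filter_by_size(gene_sets, ranked_genes, min_size=15, max_size=200):
    """Keep genesets whose overlap with the ranked list is within the size limits."""
    ranked = set(ranked_genes)
    kept = {}
    for name, genes in gene_sets.items():
        # Only genes that actually appear in the ranked list count toward the size
        overlap = [g for g in genes if g in ranked]
        if min_size <= len(overlap) <= max_size:
            kept[name] = overlap
    return kept

# Toy data: 1000 ranked genes and four made-up genesets
ranked_genes = [f"G{i}" for i in range(1000)]
gene_sets = {
    "TOO_SMALL": [f"G{i}" for i in range(10)],
    "JUST_RIGHT": [f"G{i}" for i in range(50)],
    "TOO_BIG": [f"G{i}" for i in range(300)],
    # 45 genes listed, but only 5 of them are in the ranked list
    "MOSTLY_MISSING": [f"G{i}" for i in range(5)] + [f"X{i}" for i in range(40)],
}

kept = filter_by_size(gene_sets, ranked_genes)
print(sorted(kept))  # ['JUST_RIGHT']
```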
- What is the top gene set returned for the Mesenchymal subtype? What is the top gene set returned for the Immunoreactive subtype? For each of the genesets answer the questions below:
Top geneset (Mesenchymal): HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION
What are its p-value, ES, NES and FDR?
Nominal p-value: 0.0, ES: 0.8635254, NES: 2.5625694, FDR: 0.0
How many genes in its leading edge?
145 genes.
What is the top gene associated with this geneset?
The top gene is FBN1.
Top geneset (Immunoreactive): HALLMARK_INTERFERON_ALPHA_RESPONSE
What are its p-value, ES, NES and FDR?
Nominal p-value: 0.0, ES: -0.85694104, NES: -2.8964732, FDR: 0.0
How many genes in its leading edge?
79 genes.
What is the top gene associated with this geneset?
The top gene is PROCR.
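To make the reported statistics more concrete, here is a rough Python sketch of how ES, the leading edge, the nominal p-value and NES relate under gene set permutation. It follows the published weighted running-sum idea, but it is not the GSEA source code: the function names, the toy ranked list and the 200 permutations are assumptions for illustration, and FDR (which needs all genesets) is left out.

```python
import random

def enrichment(ranked, gene_set, p=1.0):
    """Return (ES, leading_edge) for (gene, score) pairs sorted by descending score."""
    in_set = set(gene_set)
    hit_total = sum(abs(s) ** p for g, s in ranked if g in in_set)
    n_miss = sum(1 for g, _ in ranked if g not in in_set)
    if hit_total == 0 or n_miss == 0:
        return 0.0, []
    running, es, peak = 0.0, 0.0, -1
    for i, (g, s) in enumerate(ranked):
        if g in in_set:
            running += abs(s) ** p / hit_total  # step up, weighted by the score
        else:
            running -= 1.0 / n_miss             # step down by a constant
        if abs(running) > abs(es):
            es, peak = running, i               # ES is the largest deviation from zero
    if es >= 0:
        edge = [g for g, _ in ranked[:peak + 1] if g in in_set]  # hits up to the peak
    else:
        edge = [g for g, _ in ranked[peak:] if g in in_set]      # hits after the minimum
    return es, edge

def gene_set_permutation(ranked, gene_set, n_perm=200, seed=0):
    """Estimate NES and nominal p-value from random gene sets of the same size."""
    rng = random.Random(seed)
    genes = [g for g, _ in ranked]
    size = len(set(gene_set) & set(genes))
    es, edge = enrichment(ranked, gene_set)
    null = [enrichment(ranked, rng.sample(genes, size))[0] for _ in range(n_perm)]
    same_sign = [x for x in null if (x >= 0) == (es >= 0)]
    if not same_sign:
        return es, float("nan"), float("nan"), edge
    mean_abs = sum(abs(x) for x in same_sign) / len(same_sign)
    nes = es / mean_abs  # normalize by the mean null ES of the same sign
    p_value = sum(abs(x) >= abs(es) for x in same_sign) / len(same_sign)
    return es, nes, p_value, edge

# Toy ranked list: 200 made-up genes with scores from 100 down to -99
ranked = [(f"G{i}", 100.0 - i) for i in range(200)]
toy_set = [f"G{i}" for i in range(20)]  # all 20 members sit at the very top

es, nes, p_value, edge = gene_set_permutation(ranked, toy_set)
print(round(es, 3), round(nes, 2), p_value, len(edge))
```

In this toy case no random set reaches the observed ES, so the nominal p-value comes out as 0.0. That is also how the 0.0 values above should be read: not literally zero, but smaller than 1 divided by the number of permutations.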
GSEA can be run through a number of different interfaces. Some of them can be downloaded to your computer with a GUI, but these interfaces tend to be kind of clunky and unintuitive. It really comes down to personal preference whether you want to run the tool through R or figure out the GUI.
https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideFrame.html?_Interpreting_GSEA_Results