JE7: Homework assignment GSEA

Objective

Review GSEA lecture and complete the questions

Started: Mar 20, 2023

Finished: Mar 21, 2023

Time Estimated: 2 hours

Time Spent: 1.5 hours

Procedure

From the assignment page Practice using GSEA: https://q.utoronto.ca/courses/294979/assignments/990549?module_item_id=4287880

Given the ranked list comparing mesenchymal and immunoreactive ovarian cancer subtypes (mesenchymal genes have positive scores, immunoreactive genes have negative scores), perform a GSEA preranked analysis using the following parameters:

mesenchymal vs immuno rank file
genesets from the baderlab geneset collection from March 1, 2021 containing GO biological process, no IEA and pathways
maximum geneset size of 200
minimum geneset size of 15
gene set permutation

Answer the following questions in your journal:

Explain the reasons for using each of the above parameters. What is the top gene set returned for the Mesenchymal sub type? What is the top gene set returned for the Immunoreactive subtype? For each of the genesets answer the below questions: What is its pvalue, ES, NES and FDR associated with it. How many genes in its leading edge? What is the top gene associated with this geneset.

Downloaded the GSEA Java GUI
Downloaded MesenvsImmuno_RNASeq_ranks from GitHub
Downloaded the geneset file from http://download.baderlab.org/EM_Genesets/March_01_2021/ using the R code example covered in Lecture 10
Loaded data and inputted parameters as denoted on the Assignment page
Ran GSEA Preranked with the inputted parameters
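
As an illustration of the two inputs loaded in the steps above, here is a minimal Python sketch of how a rank file (.rnk) and a geneset file (.gmt) are laid out. This is not the R code from Lecture 10 and not what the GSEA GUI runs internally. The function names `read_rnk` and `read_gmt`, the gene names, and the scores are all made up for the example.

```python
import io

def read_rnk(handle):
    """Parse a two-column rank file (gene<TAB>score), sorted by descending score."""
    ranks = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        gene, score = line.split("\t")[:2]
        try:
            ranks.append((gene, float(score)))
        except ValueError:
            continue  # skip a header row whose second column is not a number
    return sorted(ranks, key=lambda pair: pair[1], reverse=True)

def read_gmt(handle):
    """Parse a geneset file: name<TAB>description<TAB>gene1<TAB>gene2..."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # a geneset line needs a name, a description and at least one gene
        gene_sets[fields[0]] = {gene for gene in fields[2:] if gene}
    return gene_sets

# Toy in-memory files (hypothetical genes and scores, not the assignment data).
rnk_text = "GeneName\trank\nGENE_C\t0.1\nGENE_A\t5.2\nGENE_B\t-4.8\n"
gmt_text = "TOY_SET_1\ttoy description\tGENE_A\tGENE_C\nTOY_SET_2\ttoy description\tGENE_B\n"

ranks = read_rnk(io.StringIO(rnk_text))
gene_sets = read_gmt(io.StringIO(gmt_text))
print(ranks[0], ranks[-1])
print(sorted(gene_sets))
```

In this toy list, positive scores sit at the top and negative scores at the bottom, the same orientation as the mesenchymal (positive) vs immunoreactive (negative) ranking in the assignment.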

Results

Explain the reasons for using each of the above parameters.

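The mesenchymal vs immuno rank file is the input we're trying to analyze: a list of genes ranked by their differential expression score between the two subtypes. The Bader genesets are regularly updated, which is why they're used in this instance; using the March 1, 2021 release pins the version so the analysis can be reproduced, and the "no IEA" collection leaves out GO annotations that were inferred electronically without curator review. Gene set permutation is the permutation type, not the number of randomizations: the null distribution is built from randomly generated genesets instead of shuffled sample labels. It is the only option for a preranked analysis, because a ranked list has no phenotype labels to shuffle. The number of permutations is a separate setting.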

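The maximum and minimum geneset sizes filter the genesets before the analysis runs, which controls the number of genesets we get back in our results. Only the genes of a set that are present in the ranked list count toward its size. Genesets with more than 200 genes tend to be too broad to interpret, and genesets with fewer than 15 genes give unstable enrichment scores, so without these limits we might get back a bunch of genesets that don't really have any significance.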
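
The size filter can be sketched in a few lines of Python. This is a simplified illustration, not GSEA's own code: `filter_by_size` is a hypothetical helper, the gene and geneset names are invented, and the only values taken from the assignment are the limits of 15 and 200.

```python
def filter_by_size(gene_sets, ranked_genes, min_size=15, max_size=200):
    """Keep genesets whose overlap with the ranked list is within the size limits."""
    universe = set(ranked_genes)
    kept = {}
    for name, genes in gene_sets.items():
        present = genes & universe  # genes missing from the ranked list do not count
        if min_size <= len(present) <= max_size:
            kept[name] = present
    return kept

# Toy ranked list of 1000 genes and four toy genesets.
ranked_genes = [f"G{i}" for i in range(1000)]
toy_sets = {
    "TOO_SMALL": {f"G{i}" for i in range(5)},
    "JUST_RIGHT": {f"G{i}" for i in range(50)},
    "TOO_LARGE": {f"G{i}" for i in range(500)},
    # 30 genes in total, but only 10 of them appear in the ranked list
    "SHRINKS_BELOW_MIN": {f"G{i}" for i in range(10)} | {f"X{i}" for i in range(20)},
}

kept = filter_by_size(toy_sets, ranked_genes)
print(sorted(kept))  # ['JUST_RIGHT']
```

The last toy set shows why the overlap matters: it has 30 genes on paper but is dropped because only 10 of them are in the ranked list.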

What is the top gene set returned for the Mesenchymal sub type? What is the top gene set returned for the Immunoreactive subtype? For each of the genesets answer the below questions:

Mesenchymal

Top geneset: HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION

What is its pvalue, ES, NES and FDR associated with it.

Nominal pvalue: 0.0 ES: 0.8635254 NES: 2.5625694 FDR: 0.0

How many genes in its leading edge?

145 genes.

What is the top gene associated with this geneset.

The top gene is FBN1.

Immunoreactive

Top geneset: HALLMARK_INTERFERON_ALPHA_RESPONSE

What is its pvalue, ES, NES and FDR associated with it.

Nominal pvalue: 0.0 ES: -0.85694104 NES: -2.8964732 FDR: 0.0

How many genes in its leading edge?

79 genes.

What is the top gene associated with this geneset.

The top gene is PROCR.
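
To make the reported statistics more concrete, here is a toy Python sketch of how an ES, a leading edge, an NES and a nominal p-value can be computed with gene set permutation. It follows my understanding of the method described in the GSEA user guide (a weighted running sum, with NES taken as ES divided by the mean of the same-sign null scores). It is not the GSEA implementation, it leaves out the FDR step across genesets, and every name and number in it is invented for the example.

```python
import random

def running_sum(ranked, gene_set, weight=1.0):
    """Running-sum profile: step up at set members (weighted by |score|), down otherwise."""
    hit_total = sum(abs(s) ** weight for g, s in ranked if g in gene_set)
    n_miss = sum(1 for g, _ in ranked if g not in gene_set)
    profile, current = [], 0.0
    for gene, score in ranked:
        if gene in gene_set:
            current += abs(score) ** weight / hit_total
        else:
            current -= 1.0 / n_miss
        profile.append(current)
    return profile

def enrichment_score(ranked, gene_set):
    """ES is the largest deviation of the running sum from zero; also return its index."""
    profile = running_sum(ranked, gene_set)
    peak = max(range(len(profile)), key=lambda i: abs(profile[i]))
    return profile[peak], peak

def leading_edge(ranked, gene_set):
    """Set members before the peak (positive ES) or after it (negative ES)."""
    es, peak = enrichment_score(ranked, gene_set)
    window = ranked[: peak + 1] if es >= 0 else ranked[peak + 1:]
    return [g for g, _ in window if g in gene_set]

def gene_set_permutation(ranked, gene_set, n_perm=200, seed=0):
    """Null distribution from random genesets of the same size (toy version)."""
    rng = random.Random(seed)
    genes = [g for g, _ in ranked]
    size = len(gene_set & set(genes))
    observed, _ = enrichment_score(ranked, gene_set)
    null = [enrichment_score(ranked, set(rng.sample(genes, size)))[0]
            for _ in range(n_perm)]
    same_sign = [x for x in null if (x >= 0) == (observed >= 0)]
    if not same_sign:
        return observed, float("nan"), 0.0
    nes = observed / (sum(abs(x) for x in same_sign) / len(same_sign))
    p_value = sum(1 for x in same_sign if abs(x) >= abs(observed)) / len(same_sign)
    return observed, nes, p_value

# Toy ranked list: 100 genes with scores running from 5.0 down to -4.9.
ranked = [(f"G{i}", 5.0 - 0.1 * i) for i in range(100)]
toy_set = {"G0", "G1", "G2", "G4", "G7", "G60"}  # mostly near the top of the list

es, nes, p_value = gene_set_permutation(ranked, toy_set)
edge = leading_edge(ranked, toy_set)
print(round(es, 3), edge)
```

In the toy run the ES is positive and the leading edge is the five set members that come before the peak; G60 falls after the peak, so it is left out. A nominal p-value reported as 0.0 means that none of the permuted genesets scored at least as high as the observed one, so it is better read as "below 1 / number of permutations" than as exactly zero.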

Conclusion

The same tool can be accessed through a number of interfaces. GSEA can be downloaded to your computer and run through a GUI, but I found that interface kind of clunky and unintuitive. It really becomes a matter of personal preference whether you want to access the tool through R or figure out the GUI.

References

https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideFrame.html?_Interpreting_GSEA_Results
