Lola-W edited this page Mar 21, 2023 · 3 revisions

Practise using GSEA


Objective: perform a GSEA preranked analysis on mesenchymal and immunoreactive ovarian cancer subtypes

Time estimated: 2 h; taken 1.5 h;

Date started: 2023-3-20 ; completed: 2023-3-20


Process

Issue: could not open GSEA through Docker. Solution: downloaded the GSEA v4.3.2 Mac App from the official GSEA website.

  • Download data:
    • mesenchymal vs immuno rank file using github URL
    • newest baderlab geneset: [Human_GOBP_AllPathways_no_GO_iea_March_02_2023_symbol.gmt](http://download.baderlab.org/EM_Genesets/March_02_2023/Human/symbol/Human_GOBP_AllPathways_no_GO_iea_March_02_2023_symbol.gmt) with no IEA
  • Upload using Load data
  • Run GSEAPreranked
    • Collapse: No_Collapse
    • Basic fields: Maxsize 200
    • All other parameters were left at their defaults, since the default permutation method is already gene set permutation.
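The same settings can also be written down as a command line, which makes the run easier to repeat. The sketch below only builds the argument list and does not launch GSEA. It assumes the `gsea-cli.sh` launcher shipped with GSEA 4.x and its `GSEAPreranked` options; the file names, the report label and the helper name `build_preranked_command` are hypothetical placeholders, not taken from this journal entry.

```python
def build_preranked_command(rnk, gmt, out_dir, label="Mesen_vs_Immuno",
                            set_min=15, set_max=200, nperm=1000,
                            gsea_cli="gsea-cli.sh"):
    """Return the argument list for a GSEAPreranked run (nothing is executed)."""
    return [
        gsea_cli, "GSEAPreranked",
        "-rnk", rnk,                 # ranked gene list (gene, score)
        "-gmx", gmt,                 # gene set collection in GMT format
        "-collapse", "No_Collapse",  # the rank file already uses gene symbols
        "-nperm", str(nperm),        # number of gene set permutations
        "-set_min", str(set_min),    # drop gene sets smaller than this
        "-set_max", str(set_max),    # drop gene sets larger than this
        "-rpt_label", label,         # hypothetical report label
        "-out", out_dir,             # output folder
    ]


# Hypothetical file names, for illustration only.
cmd = build_preranked_command(
    "mesen_vs_immuno.rnk",
    "Human_GOBP_AllPathways_no_GO_iea_March_02_2023_symbol.gmt",
    "gsea_output",
)
print(" ".join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would start the analysis, provided `gsea-cli.sh` is on the `PATH`.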

Issue: the maximum gene set size option was not visible. Solution: scroll right to the Basic fields section and click Show.

Conditions:

  • mesenchymal vs immuno rank file
  • gene sets from the Bader Lab gene set collection, containing GO biological process terms (no IEA annotations) and pathways
  • maximum geneset size of 200
    • Excludes very large pathways and increases the specificity of the results
  • minimum geneset size of 15
  • gene set permutation
    • We use a ranked list with gene set permutation instead of phenotype permutation, which is not optimized for RNA-seq datasets.
    • GSEA computes an ES for each gene set S and for each permutation, normalizes these to NES values, and uses the p-value and FDR to judge where the actual ES falls within the permuted distribution.
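The idea behind the enrichment score and gene set permutation can be sketched in a few lines. This is a simplified illustration under stated assumptions, not GSEA's actual implementation: the function names and the toy genes are hypothetical, the weighting exponent is fixed at 1, and the FDR is left out because it needs the null distributions of all gene sets at once.

```python
import random


def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Weighted running-sum ES over a list ranked from most positive to most negative."""
    in_set = [g in gene_set for g in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    hit_total = sum(abs(s) ** weight for s, h in zip(scores, in_set) if h)
    if n_hits == 0 or n_miss == 0 or hit_total == 0:
        return 0.0
    running, best = 0.0, 0.0
    for s, h in zip(scores, in_set):
        # Step up at gene set members (weighted by score), step down otherwise.
        running += (abs(s) ** weight / hit_total) if h else (-1.0 / n_miss)
        if abs(running) > abs(best):
            best = running  # ES is the maximum deviation from zero
    return best


def gene_set_permutation(ranked_genes, scores, gene_set, n_perm=1000, seed=0):
    """Return (ES, NES, nominal p) using random gene sets of the same size as the null."""
    rng = random.Random(seed)
    members = set(gene_set) & set(ranked_genes)
    observed = enrichment_score(ranked_genes, scores, members)
    null = [
        enrichment_score(ranked_genes, scores,
                         set(rng.sample(ranked_genes, len(members))))
        for _ in range(n_perm)
    ]
    # Compare only against null scores with the same sign as the observed ES.
    same_sign = [e for e in null if (e >= 0) == (observed >= 0)]
    if not same_sign:
        return observed, float("nan"), float("nan")
    mean_abs = sum(abs(e) for e in same_sign) / len(same_sign)
    nes = observed / mean_abs if mean_abs else float("nan")
    p_value = sum(abs(e) >= abs(observed) for e in same_sign) / len(same_sign)
    return observed, nes, p_value


# Toy ranked list: 20 hypothetical genes with scores 9.5, 8.5, ..., -9.5.
genes = [f"g{i}" for i in range(20)]
ranks = [9.5 - i for i in range(20)]
es, nes, p = gene_set_permutation(genes, ranks, {"g0", "g1", "g2"})
print(es, nes, p)
```

Because the ranking itself never changes and only gene set membership is randomized, this scheme works with a rank file alone and needs no per-sample expression data.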

Questions:

  1. Explain the reasons for using each of the above parameters.
    • We used the newest baderlab geneset because
      • It is updated monthly, so it contains more pathways than the GSEA default collections and is more up to date.
      • It includes GO biological process terms but excludes IEA (Inferred from Electronic Annotation) annotations, so the pathways rest on more credible evidence than purely electronic annotation.
    • Maximum geneset size is set to 200
      • We want to exclude huge pathways and increase specificity of our result.
    • Minimum geneset size is set to 15
      • We want to exclude extremely small pathways that contain only a few genes. This ensures that the resulting pathways are not too specific.
    • We chose gene set permutation
      • The other option, phenotype permutation, is not optimized for RNA-seq datasets.
      • We supplied a ranked gene list, and gene set permutation keeps that ranking fixed while randomizing gene set membership.
    • What is the top gene set returned for the Mesenchymal sub type?
      • HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION%MSIGDBHALLMARK%HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION
      1. What are its p-value, ES, NES and FDR?
        • pvalue: 0.000
        • ES: 0.87
        • NES: 2.59
        • FDR: 0.000
      2. How many genes in its leading edge?
        • 147 genes in its leading edge.
      3. What is the top gene associated with this geneset?
    • What is the top gene set returned for the Immunoreactive subtype?
      • HALLMARK_INTERFERON_ALPHA_RESPONSE%MSIGDBHALLMARK%HALLMARK_INTERFERON_ALPHA_RESPONSE
      1. What are its p-value, ES, NES and FDR?
        • pvalue: 0.000
        • ES: -0.86
        • NES: -2.90
        • FDR: 0.000
      2. How many genes in its leading edge?
        • 79 genes in its leading edge.
      3. What is the top gene associated with this geneset?
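The leading edge counts above come from the GSEA report. As a rough illustration of what the leading edge means, the sketch below finds the peak of the running sum and keeps the gene set members at or before it for a positive ES, or after it for a negative ES. The function name `leading_edge` and the toy genes are hypothetical, and the code is a simplified reading of the definition rather than GSEA's own implementation.

```python
def leading_edge(ranked_genes, scores, gene_set, weight=1.0):
    """Return (ES, leading-edge genes) for a list ranked from most positive to most negative."""
    in_set = [g in gene_set for g in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    hit_total = sum(abs(s) ** weight for s, h in zip(scores, in_set) if h)
    if n_hits == 0 or n_miss == 0 or hit_total == 0:
        return 0.0, []
    running, best, peak = 0.0, 0.0, 0
    for i, (s, h) in enumerate(zip(scores, in_set)):
        running += (abs(s) ** weight / hit_total) if h else (-1.0 / n_miss)
        if abs(running) > abs(best):
            best, peak = running, i  # remember where the maximum deviation occurs
    if best >= 0:
        # Positive ES: members ranked at or before the peak drive the enrichment.
        edge = [g for g, h in zip(ranked_genes[:peak + 1], in_set[:peak + 1]) if h]
    else:
        # Negative ES: members ranked after the trough drive the enrichment.
        edge = [g for g, h in zip(ranked_genes[peak + 1:], in_set[peak + 1:]) if h]
    return best, edge


# Toy example with six hypothetical genes.
toy_genes = ["A", "B", "C", "D", "E", "F"]
toy_scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
print(leading_edge(toy_genes, toy_scores, {"A", "B", "F"}))
print(leading_edge(toy_genes, toy_scores, {"A", "E", "F"}))
```

In the first call the set is enriched at the top of the list, so the leading edge is taken from the top; in the second call the ES is negative and the leading edge is taken from the bottom of the list.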

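Summary: GSEA accounts for the signal from all genes in the ranked list rather than only the top differentially expressed ones. Positive and negative scores correspond to the two ovarian cancer subtypes (here positive for mesenchymal and negative for immunoreactive), and GSEA reports them automatically as separate phenotypes.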

References:

Mootha, V. K., Lindgren, C. M., Eriksson, K. F., Subramanian, A., Sihag, S., Lehar, J., Puigserver, P., Carlsson, E., Ridderstrale, M., Laurila, E., et al.  (2003). PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 34, 267-273.
