Assignment 3
Complete non-thresholded analysis of datasets and visualise pathways in Cytoscape.
- Estimated Time Required: 12 hrs
- Actual Time Required: 20 hrs
- Start Date: Wed Apr 5th
- Finish Date: Apr 9th
Conduct non-thresholded gene set enrichment analysis using the ranked set of genes from Assignment #2.
- What method did you use? What genesets did you use? Make sure to specify versions and cite your methods.
- Summarize your enrichment results.
- How do these results compare to the results from the thresholded analysis in Assignment #2? Compare qualitatively. Is this a straightforward comparison? Why or why not?
Using your results from your non-thresholded gene set enrichment analysis, visualize your results in Cytoscape.
- Create an enrichment map. How many nodes and how many edges are in the resulting map? What thresholds were used to create this map? Make sure to record all thresholds. Include a screenshot of your network prior to manual layout.
- Annotate your network. What parameters did you use to annotate the network? If you are using the default parameters, make sure to list them as well.
- Make a publication-ready figure. Include this figure with proper legends in your notebook.
- Collapse your network to a theme network. What are the major themes present in this analysis? Do they fit with the model? Are there any novel pathways or themes?
- Present your results with the use of tables and screenshots. All figures should have appropriate figure legends.
- If using figures create a figures directory in your repo and make sure all references to the figures are relative in your Rmarkdown notebook.
- Used some of my code from A2 to generate a *.rnk file.
- Found a pathway gene set on PlantGSEA (http://systemsbiology.cau.edu.cn/PlantGSEAv2/browse.php).
- Not in the right format. <1 hour of browsing the internet ensues>
- Found out that you can download gene sets directly from g:Profiler.
- Downloaded all Biological Process annotations from g:Profiler for Arabidopsis thaliana.
- Realized I didn't apply -log to the p-values in the rank file. Fixing...
- OK! Analysis complete.
  - Here we go!
  - Not quite sure what "publication quality" looks like. Finding some examples.
  - Found https://www.nature.com/articles/s41596-018-0103-9
  - I've manually annotated the data. Looks pretty good! Might need some more support to back up the categories (even though these are fairly obvious), but seems just as rigorous as looking for enriched words.
  - Spent some time faffing around with the automatic annotator. Seems to have a few bugs. Fixed by restarting a few times.
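For reference, the -log fix to the rank file could look something like the sketch below. This is not the code from A2: it assumes each gene comes with a log fold change and a p-value, that the log is base 10, and that the score carries the sign of the fold change (the notes only say "-log" was applied). The function names and the example values are hypothetical.

```python
import csv
import math


def rank_score(log_fc, p_value, min_p=1e-300):
    """Signed rank: -log10(p) carrying the sign of the fold change (assumed scheme)."""
    p = max(p_value, min_p)  # guard against log10(0)
    sign = (log_fc > 0) - (log_fc < 0)
    return sign * -math.log10(p)


def make_ranks(rows):
    """rows: iterable of (gene_id, log_fc, p_value). Returns (gene_id, score), highest first."""
    ranks = [(gene, rank_score(lfc, p)) for gene, lfc, p in rows]
    return sorted(ranks, key=lambda r: r[1], reverse=True)


def write_rnk(path, ranks):
    """Write a two-column, tab-separated rank file: gene ID, then score."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        for gene, score in ranks:
            writer.writerow([gene, f"{score:.6g}"])


# Made-up values, for illustration only.
example = [
    ("AT1G01010", 2.0, 1e-5),
    ("AT1G01020", -1.5, 1e-3),
    ("AT1G01030", 0.2, 0.5),
]
ranks = make_ranks(example)
```

Without the -log step, the most significant genes get the smallest scores and end up in the middle of the ranking rather than at the extremes, which is why forgetting it matters.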
- Ran into a whole bunch of problems.
- Much of the dark matter detected turns out to be annotated when I google it. Not sure why.
- Trying to find OFFICIAL gene names, which I presume are what TAIR uses.
- Annoyingly, none of the TAIR database packages store the official gene name.
- I have concluded that science is anarchic and that there are no official gene names - only vibes.
- Fixed by searching using ALL gene names in addition to the locus ID.
- Still, many are annotated...
- All of the remaining genes that are annotated are annotated via electronic annotations!!
- Not going to include the electronic annotations. They do explain much of the dark matter, but likely at a lower quality.
- Found a massive bug that resolves the above. Using annotations that use locus IDs instead of gene names.
- Fixed this.
- Had to redo much of the assignment. Gene names suck! Always use Ensembl IDs. Use standardized naming systems.
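
The lesson about matching on stable IDs rather than gene names can be sketched as below. This is an illustration only, not the code used in the assignment: it assumes the gene sets are in a GMT-style tab-separated layout (set name, description, then member genes) with members given as locus IDs, and all function names, set names, and example data are hypothetical.

```python
def normalize_locus(locus_id):
    """Upper-case a locus ID and strip an optional splice-variant suffix such as '.1'."""
    return locus_id.strip().upper().split(".")[0]


def parse_gmt_lines(lines):
    """Parse GMT-style lines into {set_name: set of normalized locus IDs}."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip lines without any member genes
        gene_sets[fields[0]] = {normalize_locus(g) for g in fields[2:] if g}
    return gene_sets


def dark_matter(query_ids, gene_sets):
    """Return the query genes that appear in no gene set, compared by locus ID."""
    annotated = set().union(*gene_sets.values())
    return sorted({normalize_locus(g) for g in query_ids} - annotated)


# Made-up gene sets and query list, for illustration only.
gmt_lines = [
    "SET_A\tmade-up set A\tAT1G01010\tAT1G01020",
    "SET_B\tmade-up set B\tat1g01030.1",
]
gene_sets = parse_gmt_lines(gmt_lines)
unannotated = dark_matter(["AT1G01010", "AT1G01040", "At1g01030"], gene_sets)
```

Comparing on normalized locus IDs avoids the failure mode described above, where a gene looks unannotated only because its name or alias does not match the one used in the annotation file.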