3 Assignment 1
Arianne Beauregard edited this page Apr 7, 2023 · 2 revisions
Estimated time: 12h
Actual time: about 12h, spread over several days
- Used the GEOmetadb package to find a dataset
- Tried the getSQLiteFile() function, but it failed with an error
- Manually downloaded the database instead from https://gbnci.cancer.gov/geo/GEOmetadb.sqlite.gz
- Followed the GEOmetadb vignette and the Lecture 3 notes
- Dataset chosen: GSE155955
- Borrego, S. L., Fahrmann, J., Hou, J., Lin, D. W., Tromberg, B. J., Fiehn, O., & Kaiser, P. (2021). Lipid remodeling in response to methionine stress in MDA-MBA-468 triple-negative breast cancer cells. Journal of lipid research, 62, 100056. https://doi.org/10.1016/j.jlr.2021.100056
- Used GEOquery package
- Dataset included gene symbols and Entrez IDs
- The dataset contained a few duplicate symbols; on checking, I found that some symbols came from HUGO and others from different sources (e.g., OMIM)
- Decided to map each gene to both its Entrez ID and its HUGO symbol; where no HUGO symbol is available, the official symbol listed on NCBI is used instead
- Dataset had ERCC RNA controls
- Followed convention from here (in supplementary materials)
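The duplicate-symbol check and the handling of the ERCC controls noted above can be sketched in base R. This is only an illustration on a made-up table, not the code used in the assignment: the column names (`symbol`, `entrez_id`, `sample_1`, `sample_2`), the gene names, and all values are hypothetical, and it assumes the spike-in rows can be recognized by an `ERCC-` prefix and that every gene row has a distinct Entrez ID.

```r
# Toy count table; every name and value here is made up for illustration.
counts <- data.frame(
  symbol    = c("GENE_A", "GENE_B", "GENE_B", "ERCC-00002", "ERCC-00003", "GENE_C"),
  entrez_id = c("1001", "1002", "1003", NA, NA, "1004"),
  sample_1  = c(120, 15, 8, 900, 30, 64),
  sample_2  = c(98, 20, 5, 870, 41, 70),
  stringsAsFactors = FALSE
)

# Assumption: ERCC spike-in rows are identified by an "ERCC-" prefix.
is_ercc <- grepl("^ERCC-", counts$symbol)
ercc_counts <- counts[is_ercc, ]   # spike-in controls, kept separately
gene_counts <- counts[!is_ercc, ]  # endogenous genes only

# Symbols that appear on more than one gene row.
dup_symbols <- unique(gene_counts$symbol[duplicated(gene_counts$symbol)])

# Rows that share a symbol but differ in Entrez ID are distinct records,
# so use the Entrez ID as the row key and keep the symbol as an annotation.
rownames(gene_counts) <- gene_counts$entrez_id

print(dup_symbols)
print(nrow(ercc_counts))
```

Keying rows by Entrez ID keeps both records for a repeated symbol, so the decision about how to resolve each duplicate can be made case by case.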
To check that the notebook compiles in the course Docker image:
docker run --rm -it -v ${PWD}:/home/rstudio/projects --user rstudio risserlin/bcb420-base-image /usr/local/bin/R -e "rmarkdown::render('/home/rstudio/projects/ArianneChristina_Beauregard/Assignment1/Assignment1.nb.html',output_file='/home/rstudio/projects/test.html')" > processing_output_filename
Evans C, Hardin J, Stoebel DM. Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions. Brief Bioinform. 2018 Sep 28;19(5):776-792. doi: 10.1093/bib/bbx008. PMID: 28334202; PMCID: PMC6171491. (for selecting normalization)