Assignment #1
Lola-W edited this page Feb 15, 2023
·
1 revision
Objective: Data exploration on expression dataset
Time estimated: 10 h; taken 12 h;
Date started: 2023-02-10; completed: 2023-02-14
-
Used GEOmetadb to find a dataset:
rs <- dbGetQuery(con,sql)
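The query string itself is not recorded above. A minimal sketch of what such a search could look like, assuming the GEOmetadb SQLite file has already been downloaded (e.g. with getSQLiteFile()). The filters, the file path and the helper name find_datasets are illustrative assumptions, not the ones actually used:

```r
# Sketch only: the filters below (human, high-throughput sequencing,
# submitted after 2015) are hypothetical search criteria.
sql <- paste(
  "SELECT DISTINCT gse.title, gse.gse, gpl.title AS platform,",
  "       gse.submission_date",
  "FROM gse",
  "  JOIN gse_gpl ON gse_gpl.gse = gse.gse",
  "  JOIN gpl ON gse_gpl.gpl = gpl.gpl",
  "WHERE gse.submission_date > '2015-01-01'",
  "  AND gpl.organism LIKE '%Homo sapiens%'",
  "  AND gpl.technology LIKE '%high-throughput sequencing%'",
  "ORDER BY gse.submission_date DESC",
  sep = "\n"
)

# Hypothetical helper: opens the database, runs the query,
# and closes the connection again on exit.
find_datasets <- function(sql, db_path = "GEOmetadb.sqlite") {
  con <- DBI::dbConnect(RSQLite::SQLite(), db_path)
  on.exit(DBI::dbDisconnect(con))
  DBI::dbGetQuery(con, sql)
}

# Usage (not run here, needs the downloaded database file):
# rs <- find_datasets(sql)
# head(rs)
```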
-
Selected:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE104406 Aging Human Hematopoietic Stem Cells Manifest Profound Epigenetic Reprogramming of Enhancers That May Predispose to Leukemia (RNA-Seq of HSCe)
-
Download the data with GEO2R and inspect its metadata
kable(data.frame(head(Meta(gse))), format = "html")
-
Assess data quality for the control and test conditions
Issue:
Error in cpm.default(abs(raw_dat[, 2:21])) : library sizes should be finite and non-negative
Solution: ran summary(raw_dat) instead of table(raw_dat), found a row of NA, and removed it.
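The fix can be illustrated on a toy table. This is a sketch with made-up counts and hypothetical column names (gene, s1, s2), not the real raw_dat. A single NA row makes every column sum NA, which is why the library sizes were reported as not finite:

```r
# Toy stand-in for raw_dat: a gene ID column plus count columns,
# with one all-NA row like the one described above.
raw_dat <- data.frame(
  gene = c("ENSG00000000003", "ENSG00000000005", NA, "ENSG00000000419"),
  s1   = c(120, 0, NA, 87),
  s2   = c(98, 3, NA, 110),
  stringsAsFactors = FALSE
)

# summary() reports an "NA's" count for each numeric column.
print(summary(raw_dat))

# Count missing values per column explicitly.
na_per_col <- colSums(is.na(raw_dat))

# With the NA row present, the column sums (library sizes) are NA.
lib_sizes_raw <- colSums(raw_dat[, c("s1", "s2")])

# Drop every row that contains at least one NA.
clean_dat <- raw_dat[complete.cases(raw_dat), ]

# Library sizes are finite again once the NA row is gone.
lib_sizes <- colSums(clean_dat[, c("s1", "s2")])
```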
- Method?
- Plots: before and after versions, with fig.align used to place them side by side
- Used four types of plots
- Notably, the MDS plot suggests that some of the differences may be driven by gender; this should be taken into account in future analysis.
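For the side-by-side comparison, base graphics offer par(mfrow = ...) as an alternative to chunk options. The sketch below uses simulated counts and assumes the processing step being compared is a low-count filter; the threshold, the data and the helper log2_cpm are hypothetical, and the CPM here is plain library-size scaling, not the normalization used in the assignment:

```r
set.seed(1)
# Toy count matrix: 200 genes x 4 samples (simulated, hypothetical data).
counts <- matrix(rnbinom(800, mu = 50, size = 0.5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("S", 1:4)))

# log2 counts per million, with an offset of 1 to avoid log2(0).
log2_cpm <- function(m) log2(t(t(m) / colSums(m)) * 1e6 + 1)

# Keep genes with more than 10 reads in at least 2 samples (arbitrary threshold).
keep <- rowSums(counts > 10) >= 2
filtered <- counts[keep, , drop = FALSE]

# Two panels in one row: before on the left, after on the right.
op <- par(mfrow = c(1, 2))
boxplot(log2_cpm(counts), main = "Before filtering", ylab = "log2 CPM", las = 2)
boxplot(log2_cpm(filtered), main = "After filtering", ylab = "log2 CPM", las = 2)
par(op)
```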
- Map rows to HUGO gene symbols
-
Search for human dataset starting with ENSG
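ensembl <- useMart("ensembl")
# list the available datasets (needed for the grep below)
datasets <- listDatasets(ensembl)
kable(head(datasets[grep(datasets$dataset, pattern = "sapiens"), ]), format = "html")
# select the human dataset before searching its attributes
ensembl <- useDataset("hsapiens_gene_ensembl", mart = ensembl)
# ENSG and HGNC
kable(searchAttributes(mart = ensembl, 'ensembl|hgnc')[1:12, ], format = "html") %>%
  row_spec(c(1, 11), background = "yellow")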
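Once the attribute names are known, the lookup itself could be done with getBM. A minimal sketch, assuming the attributes ensembl_gene_id and hgnc_symbol are the two highlighted above; the helper name map_ensembl_to_hgnc is hypothetical, and the call needs the biomaRt package and network access:

```r
# Hypothetical helper: look up HGNC symbols for a vector of Ensembl gene IDs.
# Requires the biomaRt package and network access to Ensembl.
map_ensembl_to_hgnc <- function(ensembl_ids) {
  mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  biomaRt::getBM(
    attributes = c("ensembl_gene_id", "hgnc_symbol"),
    filters    = "ensembl_gene_id",
    values     = ensembl_ids,
    mart       = mart
  )
}

# Usage (not run here):
# id_map <- map_ensembl_to_hgnc(c("ENSG00000108264", "ENSG00000000003"))
```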
-
unmapped rows:
1 alignment_not_unique
10 ENSG00000108264
- Two types of missing mapping: either the identifier is not a valid Ensembl ID (i.e. a side note rather than meaningful data), or the Ensembl ID and its corresponding HGNC symbol both exist but are not included in our mapping database
- alignment_not_unique should have been removed earlier, during data cleaning
- No rows that map to more than one symbol
- Multiple rows that map to the same symbol: keep them all, because a one-to-one mapping cannot be expected
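The checks listed above can be sketched on a toy mapping table. The two unmapped identifiers are the examples from this page; the other IDs' symbols (SYMBOL_A, SYMBOL_B) are placeholders chosen so that one symbol is shared, and the variable names are hypothetical:

```r
# Toy expression row IDs: the two unmapped examples plus three mapped ones.
expr_ids <- c("alignment_not_unique", "ENSG00000108264",
              "ENSG00000000003", "ENSG00000000005", "ENSG00000000419")

# Toy mapping table standing in for the lookup result (placeholder symbols).
id_map <- data.frame(
  ensembl_gene_id = c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419"),
  hgnc_symbol     = c("SYMBOL_A", "SYMBOL_B", "SYMBOL_A"),
  stringsAsFactors = FALSE
)

# Left join so that unmapped rows are kept, with NA as their symbol.
annot <- merge(data.frame(ensembl_gene_id = expr_ids, stringsAsFactors = FALSE),
               id_map, by = "ensembl_gene_id", all.x = TRUE)

# Rows with no symbol, split by whether the ID looks like an Ensembl gene ID.
unmapped    <- annot$ensembl_gene_id[is.na(annot$hgnc_symbol)]
is_ensg     <- grepl("^ENSG[0-9]{11}$", unmapped)
not_ensg    <- unmapped[!is_ensg]   # side notes, candidates for removal
ensg_no_hit <- unmapped[is_ensg]    # valid-looking IDs missing from the table

# IDs mapping to more than one symbol, and symbols shared by several IDs.
ids_multi_symbol <- names(which(table(id_map$ensembl_gene_id) > 1))
shared_symbols   <- names(which(table(id_map$hgnc_symbol) > 1))
```

Rows that share a symbol are only reported here, not dropped, in line with the decision to keep them all.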
-
💡 Conclusion and outlook: Selected and cleaned data, normalized and mapped to HUGO symbols