Home
Welcome to the CS838 wiki!
Class Project for Cancer Bioinformatics (Spring 2015) @ UW - Madison
Members: Taylor J., Haixiang L., Erkin O.
TCGA CCLE
CCLE data transformation: http://www.bioconductor.org/packages/release/bioc/vignettes/affy/inst/doc/affy.pdf
Moving to a different cancer type
Taylor to pick data: ?????
http://www.bioconductor.org/help/workflows/arrays/
Normalization:
- Mean centered
- Unit variance
- For each gene
- For the average of genes
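The per-gene variant of the normalization above can be sketched roughly as follows. This is only an illustration, assuming a matrix with genes as rows and samples as columns; the function name `zscore_rows` and the toy values are made up, and mapping constant genes to zeros is one possible convention, not something decided in the notes.

```python
from statistics import mean, pstdev

def zscore_rows(matrix):
    """Mean-center each row (gene) and scale it to unit variance."""
    normalized = []
    for row in matrix:
        mu = mean(row)
        sigma = pstdev(row)  # population standard deviation
        if sigma == 0:
            # Assumption: a constant gene is mapped to all zeros.
            normalized.append([0.0 for _ in row])
        else:
            normalized.append([(x - mu) / sigma for x in row])
    return normalized

# Hypothetical toy matrix: rows are genes, columns are samples.
expression = [
    [2.0, 4.0, 6.0],
    [5.0, 5.0, 5.0],
]
normalized = zscore_rows(expression)
```

After this step every non-constant gene has mean 0 and variance 1 across samples, so no single highly expressed gene dominates a distance-based clustering.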
Analysis:
Data type versus cancer types
could cluster on 2 clusters to see if picking
Plots: columns for each cluster, colors for sample type; cells; weird hub plot
Next week: normalized data sets
TCGA vs. CCLE: Prostate & Ovarian
Clustering based on samples (RNA)
Ex: TCGA http://firebrowse.org/?cohort=PRAD&download_dialog=true
Rows - (ID, Genes...)
Columns - sample
TODO:
- Erkin: parse (Friday)
- Taylor: CCLE
- Meet between Tues-Thurs of spring break
Similar to ovarian cancer (primary tumor and cell line)
Hierarchical clustering
Biological: looking at different types of primary tumor samples
Prostate vs. Ovarian (may be similar)
Taking stuff from TCGA & CCLE
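A minimal sketch of the hierarchical clustering idea, to make the plan concrete. It assumes average linkage and Euclidean distance, neither of which the notes have settled on, and the sample labels and values are invented. A real analysis would more likely use an existing library implementation.

```python
from math import dist

def average_linkage(samples, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    samples: dict mapping a sample label to its expression vector.
    Returns a sorted list of clusters, each a sorted list of labels.
    """
    clusters = [[label] for label in samples]

    def cluster_distance(a, b):
        # Average pairwise Euclidean distance between members of a and b.
        total = sum(dist(samples[x], samples[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)

# Hypothetical toy vectors; labels and values are made up for illustration.
toy = {
    "tumor_A": [1.0, 1.1, 0.9],
    "cell_line_A": [1.2, 1.0, 1.0],
    "tumor_B": [5.0, 5.2, 4.9],
    "cell_line_B": [5.1, 4.8, 5.0],
}
result = average_linkage(toy, n_clusters=2)
```

In this toy case each cell line ends up in the same cluster as its matching tumor, which is the kind of outcome the project wants to test for on the real TCGA and CCLE samples.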
End Result: What clusters together (e.g. prostate, breast, ovarian; both cell lines & tumors) - biological insight into how relevant cell lines are to tumors.
CCLE painted a rosy picture
Cell line vs. liquid tumors - massive PCA on tumor & cell line
No right way to cluster: different pipelines give different answers, and one path may not tell the whole story.
Choice of clustering algorithm
CCLE: joint normalization
Ovarian paper did preprocessing separately
Variation from baseline (e.g. TCGA gives variation from adjacent normal samples - differential expression, or other data sets that show what normal ovarian tissue looks like)
Show common preprocessing steps and how they affect the results. Simple, reasonable steps may have huge ramifications down the line.
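To illustrate how a simple preprocessing choice can change what downstream clustering sees, here is a toy comparison of joint versus separate mean centering. The values are invented (one gene, with the cell-line measurements shifted by a made-up offset of +3), and centering stands in for whatever normalization the real pipelines use.

```python
from statistics import mean

def center(values):
    """Subtract the mean so the values are centered at zero."""
    mu = mean(values)
    return [v - mu for v in values]

# Hypothetical expression of one gene; the cell-line values carry a made-up
# offset of +3 relative to the tumor values.
tumor = [1.0, 2.0, 3.0]
cell_line = [4.0, 5.0, 6.0]

# Pipeline 1: joint normalization. Both data sets are pooled, then centered.
joint = center(tumor + cell_line)
joint_tumor, joint_cell = joint[:3], joint[3:]

# Pipeline 2: separate preprocessing. Each data set is centered on its own.
sep_tumor, sep_cell = center(tumor), center(cell_line)

# Gap between the two group means under each pipeline.
joint_gap = mean(joint_cell) - mean(joint_tumor)
separate_gap = mean(sep_cell) - mean(sep_tumor)
```

Joint centering keeps the full offset between the two groups, while separate centering removes it entirely. Whether that offset is batch variation or real biology is exactly what the choice of pipeline implicitly decides.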
Consistency: conclusive biological answer!
Inconsistency: different pipelines give different answers - the research community needs to rethink its approach.
Practical Issues:
- Smaller number of analyses - to get to full conclusions
- Think about what specifically we will try
- Normalization step (based on normal tissues, subtype-specific corrections, etc.)
Getting Data: TCGA & CCLE expression data
- TCGA: clinical annotations vary from wg to wg (age, follow-up status (survival, remission, etc.)). There are good automated ways to deal with TCGA; if we don't need everything we can just download the files.
- CCLE: has a matrix of genes by cell lines on the supplementary website (gct files, which give annotations)
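As a starting point for the parser, reading a gct file could look roughly like this. The sketch assumes the GCT v1.2 layout (a `#1.2` version line, a line with row and column counts, then a header of `Name`, `Description`, and sample names); the function name `parse_gct` and the example contents are hypothetical, and the actual CCLE files should be checked against this layout.

```python
import csv
import io

def parse_gct(handle):
    """Parse a GCT v1.2 file into (sample_names, {row_name: [values]})."""
    version = handle.readline().strip()
    if version != "#1.2":
        raise ValueError(f"unexpected GCT version line: {version!r}")
    n_rows, n_cols = (int(x) for x in handle.readline().split()[:2])
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[2:]  # first two columns are Name and Description
    genes = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Assumption: row names are unique; duplicates would be overwritten.
        genes[row[0]] = [float(x) for x in row[2:]]
    if len(genes) != n_rows or len(samples) != n_cols:
        raise ValueError("matrix shape does not match the declared dimensions")
    return samples, genes

# Hypothetical two-gene, two-sample file; names and values are made up.
example = io.StringIO(
    "#1.2\n"
    "2\t2\n"
    "Name\tDescription\tLINE_1\tLINE_2\n"
    "GENE_A\tfirst gene\t1.5\t2.5\n"
    "GENE_B\tsecond gene\t0.0\t4.0\n"
)
samples, genes = parse_gct(example)
```

The result keeps the same orientation as the notes describe (rows are genes, columns are samples), so it can feed directly into a normalization or clustering step.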
RNA-Seq vs. Microarray
TCGA (RNA) - CCLE: may want to use old TCGA data to remove batch variation
Prior Art: Reread Ovarian paper & CCLE paper
Older paper that tries to remove batch effects from samples (could be preprocessing steps)
Taylor: Find source data
Haixiang: Database ground work
Erkin: Work on Parser