Erkin Otles edited this page Apr 19, 2015 · 17 revisions

Welcome to the CS838 wiki!

Class Project for Cancer Bioinformatics (Spring 2015) @ UW - Madison
Members: Taylor J., Haixiang L., Erkin O.

Data

TCGA
CCLE

Project Plan

4/19/15

CCLE data transformation: http://www.bioconductor.org/packages/release/bioc/vignettes/affy/inst/doc/affy.pdf

4/14/15

Moving to a different cancer type. Taylor to pick data: ?????
http://www.bioconductor.org/help/workflows/arrays/

Normalization:

  1. Mean centered
  2. Unit variance
    For each gene
    For the average of genes
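
A minimal sketch of the per-gene variant of the two normalization steps above, assuming the expression matrix is held as a list of rows with one row per gene and one column per sample. The function name `zscore_rows` and the toy matrix are hypothetical, and the "average of genes" variant is not covered here.

```python
from statistics import mean, pstdev

def zscore_rows(matrix):
    """Mean-center each row (gene) and scale it to unit variance."""
    normalized = []
    for row in matrix:
        mu = mean(row)
        sigma = pstdev(row)  # population standard deviation of this gene
        if sigma == 0:
            # A gene with constant expression cannot be scaled; map it to zeros.
            normalized.append([0.0 for _ in row])
        else:
            normalized.append([(x - mu) / sigma for x in row])
    return normalized

# Hypothetical toy matrix: 2 genes (rows) x 3 samples (columns).
expr = [[1.0, 2.0, 3.0],
        [10.0, 10.0, 10.0]]
normalized = zscore_rows(expr)
```

After this step every non-constant gene has mean 0 and variance 1 across samples, so no single highly expressed gene dominates a distance-based clustering.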

Analysis:
Data type versus cancer type: could cluster into 2 clusters to see if picking

Plots: columns for each cluster - colors for sample type
cells weird hub plot
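
The table behind the "column per cluster, color per sample type" plot can be sketched as a simple cross-tabulation. This assumes cluster assignments are already available from whichever clustering method is chosen. The function name `cluster_composition` and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict

def cluster_composition(cluster_ids, sample_types):
    """Count how many samples of each type fall into each cluster."""
    if len(cluster_ids) != len(sample_types):
        raise ValueError("need one cluster id and one type per sample")
    table = defaultdict(Counter)
    for cluster, kind in zip(cluster_ids, sample_types):
        table[cluster][kind] += 1
    return {cluster: dict(counts) for cluster, counts in table.items()}

# Hypothetical assignments from a 2-cluster run over six samples.
clusters = [0, 0, 0, 1, 1, 1]
types = ["tumor", "tumor", "cell line", "cell line", "cell line", "tumor"]
composition = cluster_composition(clusters, types)
```

Each key of `composition` is one column of the plot and each inner count is the height of one colored segment. The same function works with cancer type in place of sample type.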

Next week: normalized data sets

3/24/15

TCGA vs. CCLE
Prostate & Ovarian
Clustering based on samples RNA
Ex: TCGA http://firebrowse.org/?cohort=PRAD&download_dialog=true
Rows - (ID, Genes...)
Columns - sample

TODO:
Erkin parse (Friday)
CCLE (Taylor)
Meet between Tues-Thurs of spring break

3/5/15 Meeting with Professor Gitter

Similar to ovarian cancer (primary tumor and cell line)
Hierarchical clustering
Biological: looking at different types of primary tumor samples
Prostate vs. Ovarian (may be similar)
Taking data from TCGA & CCLE

End Result: What clusters together (e.g. prostate, breast, ovarian both cell lines & tumor) - biological insight into the relevancy between cell lines & tumors.

CCLE painted a rosy picture
Cell line vs liquid tumors - massive PCA on tumor & cell line

No right way to cluster: different pipelines give different answers, and one path may not tell the whole story

Choice of clustering algorithm
CCLE joint normalization
Ovarian paper did preprocessing separately
Variation from baseline (e.g. TCGA gives variation from adjacent normal samples - differential expression, or other data sets that show what normal ovarian tissue looks like)
Show common preprocessing steps and how they affect the results. Simple, reasonable steps may have huge ramifications down the line.
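
A toy illustration of why the separate-versus-joint choice matters, assuming plain z-score normalization of a single gene. It is not the ovarian paper's or CCLE's actual procedure. The values and the constant offset between the two sources are made up for the example.

```python
from statistics import mean, pstdev

def zscore(values):
    """Mean-center a list of values and scale it to unit variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical expression of one gene; the cell-line source reads 5 units higher.
tumor = [1.0, 2.0, 3.0]
cell_line = [6.0, 7.0, 8.0]

# Pipeline A: normalize each data set separately, then pool.
separate = zscore(tumor) + zscore(cell_line)

# Pipeline B: pool first, then normalize jointly.
joint = zscore(tumor + cell_line)
```

In `separate` the offset between sources disappears and tumor and cell-line values line up one to one. In `joint` every tumor value stays below every cell-line value, so a 2-cluster split would recover the data source rather than any biology. Neither pipeline is right by construction; the point is only that the same input gives two different pictures.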

Consistency: conclusive biological answer!
Inconsistency: different answers from different pipelines - research community needs to rethink approach.

Practical Issues
Smaller number of analyses - to get to full conclusions
Think about what specifically we will try
Normalization step (based on normal tissues, subtype-specific corrections, etc.)

Getting Data:
TCGA & CCLE expression data (clinical annotations from TCGA - varies from working group to working group: age, follow-up status (survival, remission, etc.))
Good automated ways to deal with TCGA - if we don't need everything we can just download the files
CCLE - has matrix of genes by cell lines - supplementary website (gct files, gives annotations)
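
A minimal sketch of reading a gct file, assuming the standard GCT v1.2 layout: a `#1.2` version line, a line with the row and column counts, a header line starting with `Name` and `Description`, then one tab-separated row per gene. The function name `read_gct` and the in-memory toy file are hypothetical. Whether the CCLE download matches this layout exactly should be checked against the actual file.

```python
import io

def read_gct(handle):
    """Parse a GCT v1.2 stream into (sample_names, {gene_name: [values]})."""
    version = handle.readline().strip()
    if version != "#1.2":
        raise ValueError(f"unexpected GCT version line: {version!r}")
    n_rows, n_cols = (int(x) for x in handle.readline().split()[:2])
    header = handle.readline().rstrip("\r\n").split("\t")
    samples = header[2:]  # first two columns are Name and Description
    if len(samples) != n_cols:
        raise ValueError("sample count does not match the declared column count")
    expression = {}
    for line in handle:
        if not line.strip():
            continue  # tolerate trailing blank lines
        fields = line.rstrip("\r\n").split("\t")
        # Note: a repeated gene name would overwrite the earlier row.
        expression[fields[0]] = [float(x) for x in fields[2:]]
    if len(expression) != n_rows:
        raise ValueError("gene count does not match the declared row count")
    return samples, expression

# Hypothetical two-gene, two-cell-line file held in memory for illustration.
toy = io.StringIO(
    "#1.2\n"
    "2\t2\n"
    "Name\tDescription\tLINE_A\tLINE_B\n"
    "GENE1\tna\t5.1\t6.2\n"
    "GENE2\tna\t7.0\t8.5\n"
)
samples, expression = read_gct(toy)
```

For a real file, pass an open text handle, e.g. `with open(path) as fh: read_gct(fh)`, where `path` points at the downloaded gct file.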

RNA-Seq vs Microarray
TCGA (RNA) - CCLE
May want to use old TCGA data to remove batch variation

Prior Art: Reread Ovarian paper & CCLE paper

Older paper that tries to remove batch effects from samples (could be preprocessing steps)

3/3/15

Taylor: Find source data
Haixiang: Database groundwork
Erkin: Work on Parser
