Multi-sample API #211

tavinathanson · 2017-05-24T18:09:50Z

We haven't necessarily wanted to incorporate multiple samples into one Cohort object, as that complicates every part of the library. For example:

- Which samples does missense_snv_count count?
- How would we refer to the tumor_sample and normal_sample?

For many use cases, different Cohort objects can just be created with different sets of samples.

However, questions pop up like:

- How can we make it easier to iterate through all samples?
- How can we take advantage of Cohorts/Discohorts when we don't have clinical data?

One thought is a separate SampleCollection for specifying a bunch of samples and optionally creating Cohorts from those samples. And perhaps a SampleGroup (where Patient can extend SampleGroup) to link samples together when we don't have the clinical data appropriate for a Patient. Something like:

group = SampleGroup(id="1033")
sample_1 = Sample(group=group, label="pre", bam_path_dna=...)
sample_2 = Sample(group=group, label="post", bam_path_dna=...)
samples = SampleCollection([sample_1, sample_2])
samples.run() # Use Discohorts to run over the samples rather than patients
# TODO: If running Epidisco, how do we know which sample is tumor/normal/RNA?
# Cohort objects give us that, which is how Discohorts currently does it.
cohort = samples.as_cohort() # Works better if `group` is a `Patient`
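
To make the proposal above more concrete, here is a minimal sketch of how the three classes might relate to each other. None of this exists in cohorts today: the class bodies, the `clinical` field, and the `by_group` helper are assumptions for illustration, and `run()` and `as_cohort()` are left out because their behavior is still an open question. The BAM paths are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Optional


@dataclass
class SampleGroup:
    """Hypothetical: links related samples without requiring clinical data."""
    id: str


@dataclass
class Patient(SampleGroup):
    """Hypothetical stand-in: a SampleGroup that also carries clinical data."""
    clinical: Dict[str, object] = field(default_factory=dict)


@dataclass
class Sample:
    """Hypothetical: one sample, tied to its group and tagged with a label."""
    group: SampleGroup
    label: str
    bam_path_dna: Optional[str] = None


class SampleCollection:
    """Hypothetical: a flat, iterable container of samples."""

    def __init__(self, samples: Iterable[Sample]):
        self.samples: List[Sample] = list(samples)

    def __iter__(self) -> Iterator[Sample]:
        return iter(self.samples)

    def __len__(self) -> int:
        return len(self.samples)

    def by_group(self) -> Dict[str, List[Sample]]:
        """Map each group id to its samples, preserving insertion order."""
        grouped: Dict[str, List[Sample]] = {}
        for sample in self.samples:
            grouped.setdefault(sample.group.id, []).append(sample)
        return grouped


group = SampleGroup(id="1033")
samples = SampleCollection([
    Sample(group=group, label="pre", bam_path_dna="pre.bam"),    # placeholder path
    Sample(group=group, label="post", bam_path_dna="post.bam"),  # placeholder path
])
for sample in samples:
    print(sample.group.id, sample.label)
```

Because `Patient` extends `SampleGroup` in this sketch, a collection could mix groups with and without clinical data, and `as_cohort()` would only need to handle the `Patient` case.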


jburos · 2017-05-24T18:14:32Z

I like the idea of a sample-group, though I don't know how well that extends to @julia326's use case(s).

It might be helpful here to document the analysis that one would want to enable when there are multiple samples. This might help to motivate the API for using multi-sample data.

E.g.: some change in status, comparing pre-tx vs post-tx?

Could there be different settings on the samples (e.g. bqsr vs not)?

In each of these cases, I can imagine one sample might be the "default" (pre & with-bqsr) and another might be referenced on-demand.
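
As a rough illustration of the "default vs on-demand" idea, the sketch below resolves a sample by label and falls back to a default label when none is requested. The function name `pick_sample`, the `default_label` parameter, and the label strings are all hypothetical, not part of cohorts.

```python
from typing import Dict, Optional


def pick_sample(samples_by_label: Dict[str, str],
                label: Optional[str] = None,
                default_label: str = "pre") -> str:
    """Return the sample for `label`, or for `default_label` if none is given.

    `samples_by_label` maps a label (e.g. "pre", "post") to whatever
    identifies the sample; here that is just a placeholder string.
    """
    key = default_label if label is None else label
    if key not in samples_by_label:
        raise KeyError("no sample labeled %r" % key)
    return samples_by_label[key]


patient_samples = {"pre": "sample-pre", "post": "sample-post"}  # placeholder values
default_sample = pick_sample(patient_samples)            # falls back to "pre"
post_sample = pick_sample(patient_samples, label="post")  # referenced on demand
```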

tavinathanson · 2017-05-24T18:15:12Z

@jburos definitely! I suppose my motivating "analysis" to start with was running Epidisco/other pipelines for all samples.

tavinathanson · 2017-05-24T18:16:59Z

Will also be useful to talk to @julia326 about what types of analyses we could enable here.

jburos · 2017-07-17T17:57:01Z

@tavinathanson great, I can understand wanting to run the pipelines, but then... do what with the results?

I am bringing this up again b/c we're approaching this problem from the other end, so to speak, for a different project. For this cohort, we have a subset of patients with pre/post Tx RNA samples & already have epidisco pipeline results for these samples. I am now thinking about how to extend cohorts in order to process them.

In my use case (granted, parts of this aren't yet supported by cohorts, but putting it here for the record), I'd like to be able to:

Run a command using "pre-tx" samples, "post-tx" samples, whichever is available, or the difference between the two timepoints.
- for something like a differential expression analysis, the interpretation of the above should be clear
- for expressed mutation count, I might want to restrict to pre-tx samples, restrict to post-tx samples, or look at the number of mutations that went from expressed to non-expressed or vice-versa.

Seems to me that a lot of the above could be facilitated with a sample label or keyword. Again, not thinking here about discohorts, just cohorts.
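
One way to picture the label/keyword idea for the expressed-mutation case is sketched below. It is only an illustration: the function name `select_expressed`, the mode strings, and the use of plain sets of mutation IDs are assumptions, not anything cohorts provides.

```python
from typing import Dict, Optional, Set, Union

Expressed = Optional[Set[str]]  # None means the sample is missing


def select_expressed(pre: Expressed, post: Expressed,
                     mode: str) -> Union[Expressed, Dict[str, Set[str]]]:
    """Pick expressed mutations according to a (hypothetical) sample keyword.

    "pre" / "post"  restrict to that timepoint (None if it is missing)
    "either"        whichever is available, preferring pre-tx
    "difference"    mutations that lost or gained expression between timepoints
    """
    if mode == "pre":
        return pre
    if mode == "post":
        return post
    if mode == "either":
        return pre if pre is not None else post
    if mode == "difference":
        if pre is None or post is None:
            return None  # a difference needs both timepoints
        return {"lost": pre - post, "gained": post - pre}
    raise ValueError("unknown mode: %r" % mode)


pre_tx = {"mut_a", "mut_b"}   # made-up mutation IDs
post_tx = {"mut_b", "mut_c"}
diff = select_expressed(pre_tx, post_tx, mode="difference")
print(len(diff["lost"]), len(diff["gained"]))
```

Returning `None` for a missing timepoint keeps patients with only one sample in the cohort while making it explicit that the requested quantity is undefined for them.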

armish mentioned this issue Jul 9, 2017

Make cohorts more workflow-engine and cloud storage friendly #226
