Multi-sample API #211

Open
tavinathanson opened this issue May 24, 2017 · 4 comments

Comments

@tavinathanson
Member

tavinathanson commented May 24, 2017

We haven't necessarily wanted to incorporate multiple samples into one Cohort object, as that complicates every part of the library. For example:

  • Which samples does missense_snv_count count?
  • How would we refer to the tumor_sample and normal_sample?

For many use cases, different Cohort objects can just be created with different sets of samples.

However, questions pop up like:

  • How can we make it easier to iterate through all samples?
  • How can we take advantage of Cohorts/Discohorts when we don't have clinical data?

One thought is a separate SampleCollection for specifying a bunch of samples and optionally creating Cohorts from those samples. And perhaps a SampleGroup (where Patient can extend SampleGroup) to link samples together when we don't have the clinical data appropriate for a Patient. Something like:

group = SampleGroup(id="1033")
sample_1 = Sample(group=group, label="pre", bam_path_dna=...)
sample_2 = Sample(group=group, label="post", bam_path_dna=...)
samples = SampleCollection([sample_1, sample_2])
samples.run() # Use Discohorts to run over the samples rather than patients
# TODO: If running Epidisco, how do we know which is tumor/normal/RNA?
# Cohort objects give us that, which is how Discohorts currently does it.
cohort = samples.as_cohort() # Works better if `group` is a `Patient`
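
To make the proposal above concrete, here is a minimal, self-contained sketch of how SampleGroup, Sample, and SampleCollection might fit together. None of this is existing Cohorts API: the classes are hypothetical plain dataclasses, the BAM paths are placeholders, the `by_group` helper is an assumption added for illustration, and `run()` / `as_cohort()` are left out because their behavior is still an open question in this issue.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass(frozen=True)
class SampleGroup:
    """Hypothetical: links related samples without requiring clinical data."""
    id: str


@dataclass(frozen=True)
class Sample:
    """Hypothetical: one sample, tagged with a free-form label such as "pre"."""
    group: SampleGroup
    label: str
    bam_path_dna: Optional[str] = None  # placeholder path; not validated here


class SampleCollection:
    """Hypothetical: a flat, iterable set of samples that can be regrouped."""

    def __init__(self, samples: List[Sample]):
        self.samples = list(samples)

    def __iter__(self) -> Iterator[Sample]:
        return iter(self.samples)

    def __len__(self) -> int:
        return len(self.samples)

    def by_group(self) -> Dict[str, List[Sample]]:
        """Map each group id to its samples, preserving insertion order."""
        grouped: Dict[str, List[Sample]] = {}
        for sample in self.samples:
            grouped.setdefault(sample.group.id, []).append(sample)
        return grouped


group = SampleGroup(id="1033")
samples = SampleCollection([
    Sample(group=group, label="pre", bam_path_dna="pre.bam"),    # placeholder
    Sample(group=group, label="post", bam_path_dna="post.bam"),  # placeholder
])

# Iterating over all samples is then just a for-loop over the collection.
labels = [sample.label for sample in samples]
```

Keeping the collection flat and deriving groups on demand would cover the "iterate through all samples" question without forcing every sample into a Patient.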
@jburos
Member

jburos commented May 24, 2017

I like the idea of a sample group, though I don't know how well that extends to @julia326's use case(s).

It might be helpful here to document the analysis that one would want to enable when there are multiple samples. This might help to motivate the API for using multi-sample data.

E.g., some change in status, comparing pre-tx vs. post-tx?

Could there be different settings on the samples (e.g. bqsr vs not)?

In each of these cases, I can imagine one sample might be the "default" (pre & with-bqsr) and another might be referenced on-demand.
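
As a rough illustration of the "default" vs. on-demand idea, assuming samples for one patient are keyed by a label string: the function name `pick_sample`, the `default_label` parameter, and the sample values are all hypothetical, not part of Cohorts.

```python
from typing import Dict, Optional


def pick_sample(samples_by_label: Dict[str, str],
                label: Optional[str] = None,
                default_label: str = "pre") -> str:
    """Return the sample for `label`, or the default sample if no label is given."""
    wanted = default_label if label is None else label
    if wanted not in samples_by_label:
        raise KeyError(f"no sample labelled {wanted!r}")
    return samples_by_label[wanted]


# Toy data: sample values are stand-ins for real Sample objects.
patient_samples = {"pre": "sample_pre", "post": "sample_post"}

default_choice = pick_sample(patient_samples)                  # uses the default
on_demand_choice = pick_sample(patient_samples, label="post")  # explicit request
```

With this shape, existing single-sample calls would keep working unchanged, and only multi-sample analyses would need to pass a label.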

@tavinathanson
Member Author

tavinathanson commented May 24, 2017

@jburos definitely! I suppose my motivating "analysis" to start with was running Epidisco/other pipelines for all samples.

@tavinathanson
Member Author

Will also be useful to talk to @julia326 about what types of analyses we could enable here.

@jburos
Member

jburos commented Jul 17, 2017

@tavinathanson great, I can understand wanting to run the pipelines, but then do what with the results?

I am bringing this up again b/c we're approaching this problem from the other end, so to speak, for a different project. For this cohort, we have a subset of patients with pre/post Tx RNA samples & already have epidisco pipeline results for these samples. I am now thinking about how to extend cohorts in order to process them.

In my use case (granted, parts of this aren't yet supported by cohorts, but putting it here for the record), I'd like to be able to:

  1. run a command using "pre-tx" samples, "post-tx" samples, whichever is available, or the difference between the two timepoints.
    • for something like a differential expression analysis, then the interpretation of the above should be clear
    • for expressed mutation count, I might want to restrict to pre-tx samples, restrict to post-tx samples, or look at number of mutations that went from expressed to non-expressed or vice-versa.

Seems to me that a lot of the above could be facilitated with a sample label or keyword. Again, not thinking here about discohorts, just cohorts.
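
A sketch of how such a label keyword could drive the expressed-mutation case above, under some loud assumptions: the function name, the `mode` values, and the mutation identifiers are all made up for illustration; "whichever is available" is interpreted here as "prefer pre-tx, fall back to post-tx"; and the "difference" is taken as the set of mutations expressed at exactly one timepoint.

```python
from typing import Dict, Optional, Set


def expressed_mutations(by_label: Dict[str, Set[str]],
                        mode: str) -> Optional[Set[str]]:
    """Select expressed mutations for one patient according to `mode`.

    Returns None when the requested timepoint(s) are not available.
    """
    pre = by_label.get("pre-tx")
    post = by_label.get("post-tx")
    if mode == "pre-tx":
        return pre
    if mode == "post-tx":
        return post
    if mode == "any":
        # Assumption: prefer the pre-tx sample when both exist.
        return pre if pre is not None else post
    if mode == "changed":
        if pre is None or post is None:
            return None
        # Expressed at one timepoint but not the other (either direction).
        return pre ^ post
    raise ValueError(f"unknown mode: {mode!r}")


# Toy data: mutation names are placeholders, not real variants.
patient = {"pre-tx": {"mut_a", "mut_b"}, "post-tx": {"mut_b", "mut_c"}}

pre_count = len(expressed_mutations(patient, "pre-tx"))
changed = expressed_mutations(patient, "changed")
```

Splitting "changed" into `pre - post` (lost expression) and `post - pre` (gained expression) would give the two directions separately if that distinction matters for the analysis.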
