Decouple directories and module names #31

Open
safisher opened this issue Oct 27, 2014 · 1 comment
Comments

@safisher
Owner

Each directory should include a module file containing the name of the module that created that directory. This information could be added to the SAMPLE_ID.versions file. STATS can then use the file containing the module information to determine which module is used to generate the stats. In that case, STATS would be given a list of directories rather than a list of modules.

This will decouple the directory name from the module name and make it easier to run a module more than once. For example, STAR could be run twice on two different genome versions, or HTSEQ could be run repeatedly on different transcriptomes. It will also allow for meta-modules and finer granularity in modules overall. For example, we could run HTSEQ on exons, then on introns, and use another (meta-)module to combine the exon and intron counts.
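As a rough illustration of the idea, here is a minimal Python sketch in which each output directory carries a marker file naming the module that produced it, and a STATS-like step resolves modules from a list of directories. The marker file name `MODULE`, the directory names, and all function names are hypothetical; none of them come from the pipeline itself.

```python
import tempfile
from pathlib import Path

# Hypothetical marker file name; the issue does not specify one.
MODULE_FILE = "MODULE"


def record_module(directory: Path, module_name: str) -> None:
    """Write the name of the module that produced `directory` into a marker file."""
    directory.mkdir(parents=True, exist_ok=True)
    (directory / MODULE_FILE).write_text(module_name + "\n")


def module_for(directory: Path) -> str:
    """Return the module name recorded in `directory`."""
    return (directory / MODULE_FILE).read_text().strip()


def modules_for_stats(directories):
    """Map each directory name to the module that generated it, as STATS might."""
    return {d.name: module_for(d) for d in directories}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "SAMPLE_ID"
        # Hypothetical directory names: the same module can own several directories.
        runs = {
            sample / "star.genomeA": "STAR",
            sample / "star.genomeB": "STAR",
            sample / "htseq.exons": "HTSEQ",
        }
        for directory, module in runs.items():
            record_module(directory, module)
        print(modules_for_stats(list(runs)))
```

Because the module name lives inside the directory, two STAR runs can sit side by side under different directory names and still be recognized as STAR output.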

@safisher
Owner Author

safisher commented Nov 6, 2014

To implement this, every module should take an inputDirectory and an outputDirectory. These should not have default values; rather, PIPELINE should explicitly state the directories.

It should be fine for modules to require/assume file names. For example, STAR should expect to find the file "unaligned_1.fq" in the inputDirectory.
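A minimal Python sketch of that contract, assuming a module is modeled as a function: both directories are required arguments with no defaults, and the module assumes a fixed file name inside the input directory. The function name `star_inputs` and the error handling are illustrative assumptions, not the pipeline's actual interface.

```python
from pathlib import Path

# File name taken from the comment above; STAR is assumed to look for it.
STAR_INPUT_FILE = "unaligned_1.fq"


def star_inputs(input_directory, output_directory):
    """Resolve the paths a STAR-like module would use (hypothetical helper).

    Both directories are required arguments with no defaults, so the
    caller (PIPELINE) has to state them explicitly.
    """
    in_dir = Path(input_directory)
    out_dir = Path(output_directory)
    reads = in_dir / STAR_INPUT_FILE
    if not reads.is_file():
        # The module assumes the file name, so a missing file is an error.
        raise FileNotFoundError(f"expected {STAR_INPUT_FILE} in {in_dir}")
    out_dir.mkdir(parents=True, exist_ok=True)
    return reads, out_dir
```

Calling `star_inputs()` with no arguments raises a `TypeError`, which is the point: the directory choice stays with the caller rather than being baked into the module.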
