I've been thinking that defining the following structure in biodata.yml would solve the issue:
So as to solve the following situation:
b97pla@f5f59a5
That re-downloads the same data into different directories and re-runs all the alignments.
As @b97pla discussed:
(...) what is presented to the user does not need to match what goes into the pipeline (i.e. the user could be presented with e.g. "Arabidopsis thaliana (tair9)" but your application could still write "araTha_tair9" to the csv, which would guarantee no disruption to the pipeline), or am I missing the point?
The problem is that the code pointed to by Valentine is not the only place where this is used, e.g. bcbio/pipeline/alignment.py also uses this value but without having any alias hash. There are of course many places where we can handle this: in the samplesheet generator, in the csv2yaml conversion, with aliases when fetching the reference file or with multiple entries in the reference mapping .loc-file. Implementing an alias hash is probably the most flexible and future-proof solution. I'll log this as an issue.
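To make the alias-hash idea concrete, here is a minimal sketch of a single lookup that every consumer (samplesheet generator, csv2yaml conversion, reference fetching) could share. The names `GENOME_ALIASES` and `resolve_genome_build` are hypothetical and not part of bcbio, and the short `tair9` alias is an assumption added for illustration; only the display name and `araTha_tair9` come from the discussion above.

```python
# Hypothetical alias table: each key is a name a user or samplesheet might
# supply, each value is the canonical build name the pipeline expects.
GENOME_ALIASES = {
    "Arabidopsis thaliana (tair9)": "araTha_tair9",
    "tair9": "araTha_tair9",  # assumed short alias, for illustration only
}


def resolve_genome_build(name, aliases=GENOME_ALIASES):
    """Return the canonical build name for `name`.

    Names without a registered alias are passed through unchanged, so
    canonical names written directly to the csv keep working.
    """
    key = name.strip()
    return aliases.get(key, key)
```

Resolving in one place means the csv can carry either the user-facing label or the canonical name without each pipeline module keeping its own mapping.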
Once that structure is defined, symlink accordingly on the filesystem instead of re-downloading the same genomes.
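The symlinking step could look roughly like the sketch below. It assumes that all genome builds live as sibling directories under one base directory; that layout, and the helper name `link_genome_alias`, are assumptions for illustration rather than anything defined by the pipeline.

```python
import os


def link_genome_alias(genomes_dir, canonical, alias):
    """Expose the `canonical` genome directory under an additional `alias` name.

    Assumes (hypothetically) that builds are sibling directories inside
    `genomes_dir`. Returns the path of the alias entry.
    """
    target = os.path.join(genomes_dir, canonical)
    link = os.path.join(genomes_dir, alias)
    if not os.path.isdir(target):
        raise FileNotFoundError(target)
    # Leave any existing file, directory, or link with that name untouched.
    if os.path.islink(link) or os.path.exists(link):
        return link
    # A relative link keeps working if genomes_dir is moved or remounted.
    os.symlink(canonical, link)
    return link
```

Calling it a second time with the same arguments is a no-op, so it can run on every installation without duplicating data or triggering new alignments.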