
Picard interval list files with .list extension can't be used with -L #3555

Closed
cmnbroad opened this issue Sep 7, 2017 · 7 comments

@cmnbroad
Collaborator

cmnbroad commented Sep 7, 2017

This Barclay feature automatically expands the contents of any file ending in ".list" whenever the target argument is a collection. That precludes using Picard interval list files ending in ".list" with -L in GATK, because they contain a SAM header. The raw SAM header lines are added as interval strings, which then fail to parse:

A USER ERROR has occurred: Badly formed genome unclippedLoc: Failed to parse Genome Location string: @HD VN:1.5: Problem parsing start/end value in interval string. Value was: 1.5
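
For illustration, here is a minimal Python sketch of the failure mode described above. It is not Barclay or GATK code: `expand_collection_arg` and `parse_interval` are hypothetical stand-ins, the interval grammar is simplified to `contig:start-end`, and the file contents are made up.

```python
import os
import tempfile


def expand_collection_arg(value, expansion_ext=".list"):
    """Hypothetical stand-in for auto-expansion of a collection argument.

    If the value names an existing file ending in expansion_ext, each
    non-blank line of the file becomes one argument value.
    """
    if value.endswith(expansion_ext) and os.path.isfile(value):
        with open(value) as handle:
            return [line.rstrip("\n") for line in handle if line.strip()]
    return [value]


def parse_interval(text):
    """Simplified interval parser: accepts only 'contig:start-end'."""
    contig, sep, span = text.rpartition(":")
    if not sep or not contig:
        raise ValueError(f"Failed to parse interval string: {text}")
    start, dash, end = span.partition("-")
    if not dash or not (start.isdigit() and end.isdigit()):
        raise ValueError(
            f"Problem parsing start/end value in interval string. Value was: {span}"
        )
    return contig, int(start), int(end)


def demo():
    """Write a Picard-style interval list named *.list and feed it to -L."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "targets.list")
        with open(path, "w") as handle:
            handle.write("@HD\tVN:1.5\n")
            handle.write("@SQ\tSN:chr1\tLN:1000\n")
            handle.write("chr1\t100\t200\t+\ttarget_1\n")
        # The header lines come back as if they were interval strings.
        values = expand_collection_arg(path)
        try:
            parse_interval(values[0])
        except ValueError as err:
            return values, str(err)
    return values, None
```

In this sketch, the first expanded value is the `@HD` header line, and the text after its last colon (`1.5`) is what the parser rejects as a start/end value.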

A short-term GATK workaround is to use a file ending in one of the other known Picard interval list extensions (.interval_list, .intervals, or .picard) instead, but we should find a better fix, since .list seems to be commonly used.
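
In practice the workaround means giving the file a name that does not end in ".list". A small Python sketch of that, assuming a copy next to the original is acceptable; the helper name `copy_with_picard_extension` and the sample file are hypothetical, and the extension list is the one given above.

```python
import shutil
import tempfile
from pathlib import Path

# Extensions named in the workaround above.
PICARD_INTERVAL_EXTENSIONS = (".interval_list", ".intervals", ".picard")


def copy_with_picard_extension(path, new_ext=".interval_list"):
    """Copy a '.list' interval file to a sibling file with another extension."""
    if new_ext not in PICARD_INTERVAL_EXTENSIONS:
        raise ValueError(f"unexpected extension: {new_ext}")
    src = Path(path)
    dst = src.with_suffix(new_ext)
    shutil.copyfile(src, dst)  # contents, including the SAM header, are unchanged
    return dst


def demo():
    """Create a sample .list file and copy it under the new extension."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "targets.list"
        src.write_text("@HD\tVN:1.5\n@SQ\tSN:chr1\tLN:1000\nchr1\t100\t200\t+\ttarget_1\n")
        dst = copy_with_picard_extension(src)
        return dst.name, dst.read_text() == src.read_text()
```

The copy would then be passed to -L in place of the original file.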

Tools such as GetHetCoverage, which take an interval list in an argument typed as a File (--snpIntervals), are able to consume the interval file because the target argument is not a collection, so the auto-expansion is not triggered.

I expect this issue could cause more problems in Picard as well once Barclay is the default parser there.

@cmnbroad cmnbroad self-assigned this Sep 7, 2017
@cmnbroad
Collaborator Author

cmnbroad commented Sep 7, 2017

@sooheelee This is what I mentioned in slack.

@cmnbroad cmnbroad added this to the Engine-4.0 milestone Sep 7, 2017
@magicDGS
Contributor

magicDGS commented Sep 7, 2017

I already suggested in broadinstitute/barclay#28 (comment), when the feature was implemented, that this could be an issue in downstream projects. Maybe it is not too late to change the extension to ".arg_list", to be sure that expansion is what the user requested.

@cmnbroad
Collaborator Author

After some discussion, we're planning to change the built-in extension used by Barclay for automatic expansion from ".list" to ".args".

@sooheelee
Contributor

sooheelee commented Sep 22, 2017 via email

@magicDGS
Contributor

I added a PR to Barclay (broadinstitute/barclay#95) to both change the extension and allow the behaviour to be switched off if needed. That may prevent other errors in the future too, if other downstream projects want to use the extension for other purposes.
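
As a rough illustration of the two changes discussed in this thread (a different built-in extension, plus a way to switch expansion off), here is a Python sketch. It does not reflect Barclay's actual API: `expand_collection_arg` and its `expansion_ext` parameter are hypothetical, with ".args" taken from the plan stated earlier in the thread and `None` standing in for "disabled".

```python
import os
import tempfile


def expand_collection_arg(value, expansion_ext=".args"):
    """Hypothetical expansion with a configurable extension.

    expansion_ext=None disables expansion entirely.
    """
    if expansion_ext is None:
        return [value]
    if value.endswith(expansion_ext) and os.path.isfile(value):
        with open(value) as handle:
            return [line.rstrip("\n") for line in handle if line.strip()]
    return [value]


def demo():
    """Compare a Picard-style .list file with a genuine argument file."""
    with tempfile.TemporaryDirectory() as tmp:
        intervals = os.path.join(tmp, "targets.list")
        with open(intervals, "w") as handle:
            handle.write("@HD\tVN:1.5\nchr1\t100\t200\t+\ttarget_1\n")
        arg_file = os.path.join(tmp, "regions.args")
        with open(arg_file, "w") as handle:
            handle.write("chr1:100-200\nchr2:5-50\n")
        # The .list file is passed through untouched, so a downstream
        # reader can treat it as a Picard interval list.
        untouched = expand_collection_arg(intervals) == [intervals]
        # The .args file is still expanded line by line.
        expanded = expand_collection_arg(arg_file)
        # With expansion switched off, even the .args file is left alone.
        disabled = expand_collection_arg(arg_file, expansion_ext=None) == [arg_file]
        return untouched, expanded, disabled
```

With the default changed, only files that the user deliberately named as argument files are expanded, and a project that needs the extension for something else can opt out.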

@cmnbroad
Collaborator Author

cmnbroad commented Nov 14, 2017

This is fixed by the Barclay upgrade in #3804, but we really should add an explicit test that uses a Picard interval list with a .list extension and a header. src/test/resources/small_unmerged_picard_intervals.list is such a file, but it is currently only used in a unit test.

@droazen
Contributor

droazen commented Jan 16, 2018

We fixed this.

@droazen droazen closed this as completed Jan 16, 2018