FILTER line is malformed #1

Open
dridk opened this issue Apr 26, 2022 · 1 comment

dridk commented Apr 26, 2022

Issue from jamescasbon#337

Background:
In FILTER, multiple filters should be separated by semicolons. The widely used, but not actively maintained, VarScan2 genomic variant caller uses commas instead. Moreover, VarScan2 does not add ##FILTER metadata for most of its filters. Picard FixVcfHeader can be used to fix missing FILTER metadata. A "fixed" metadata row will look like:
##FILTER=<ID="RefAvgRL,VarAvgRL",Description="Missing description: this FILTER line was added by Picard's FixVCFHeader">

Error:
PyVCF fails with:
`
Traceback (most recent call last):
  File "/mnt/hdd/dnanexus/scripts_local/compare_vcfs.py", line 236, in <module>
    main()
  File "/mnt/hdd/dnanexus/scripts_local/compare_vcfs.py", line 232, in main
    run(parser.parse_args())
  File "/mnt/hdd/dnanexus/scripts_local/compare_vcfs.py", line 166, in run
    df_1 = vcf_to_dataframe(args.vcf_1)
  File "/mnt/hdd/dnanexus/scripts_local/compare_vcfs.py", line 74, in vcf_to_dataframe
    vcf_reader = vcf.Reader(open(vcf_file, "r"))
  File "/home/myourshaw/.venv/dnanexus/lib/python3.10/site-packages/vcf/parser.py", line 300, in __init__
    self._parse_metainfo()
  File "/home/myourshaw/.venv/dnanexus/lib/python3.10/site-packages/vcf/parser.py", line 326, in _parse_metainfo
    key, val = parser.read_filter(line)
  File "/home/myourshaw/.venv/dnanexus/lib/python3.10/site-packages/vcf/parser.py", line 142, in read_filter
    raise SyntaxError(
SyntaxError: One of the FILTER lines is malformed: ##FILTER=<ID="RefAvgRL,VarAvgRL",Description="Missing description: this FILTER line was added by Picard's FixVCFHeader">
`

Issue:
It might be more robust for PyVCF to treat a filter with commas as just one big filter name, as does Picard FixVcfHeader.
Instead of raising an exception, accept metadata with a filter ID inside double quotes and containing commas, e.g., ID="RefAvgRL,VarAvgRL".
Similarly, in the data, treat a FILTER value like RefAvgRL,VarAvgRL as a single entity. I think this solution is consistent with the VCF 4.2 spec for a filter name: String, no whitespace or semicolons permitted.
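For the data lines, the idea above could be sketched roughly as follows. This is only an illustration: split_filter_field is a hypothetical helper, not PyVCF's actual code, and the handling of PASS and "." is an assumption about what a caller would want.

```python
def split_filter_field(value):
    """Split a VCF FILTER column value on semicolons only.

    Hypothetical helper for illustration; commas are kept as part of the
    filter name, so "RefAvgRL,VarAvgRL" stays a single entry.
    """
    if value == ".":
        return None  # assumption: missing value maps to None
    if value == "PASS":
        return []    # assumption: PASS maps to "no filters failed"
    return value.split(";")


print(split_filter_field("RefAvgRL,VarAvgRL"))  # one filter name
print(split_filter_field("q10;s50"))            # two filter names
```

With this approach a VarScan2-style value is never broken apart, while spec-conforming semicolon-separated values behave as before.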

Possible pull request:
This hack (changing [^,]+ to .+) worked to get me through an urgent analysis, but it may not be the best solution. At parser.py line 142:
self.filter_pattern = re.compile(r'''\#\#FILTER=< ID=(?P<id>.+),\s* Description="(?P<desc>[^"]*)" >''', re.VERBOSE)
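The effect of the change can be checked in isolation. The sketch below compares a strict pattern with the relaxed one against the header line from the traceback. The group names id and desc and the escaped \#\# prefix are my reading of PyVCF's parser.py and should be treated as assumptions; STRICT and RELAXED are names made up for this example.

```python
import re

# Strict pattern: the ID may not contain a comma (assumed to mirror PyVCF).
STRICT = re.compile(r'''\#\#FILTER=<
    ID=(?P<id>[^,]+),\s*
    Description="(?P<desc>[^"]*)"
    >''', re.VERBOSE)

# Relaxed pattern from the hack: the ID may contain anything, commas included.
RELAXED = re.compile(r'''\#\#FILTER=<
    ID=(?P<id>.+),\s*
    Description="(?P<desc>[^"]*)"
    >''', re.VERBOSE)

line = ('##FILTER=<ID="RefAvgRL,VarAvgRL",Description="Missing description: '
        'this FILTER line was added by Picard\'s FixVCFHeader">')

print(STRICT.match(line))    # no match, which is what triggers the SyntaxError
m = RELAXED.match(line)
print(m.group('id'))         # the ID, still wrapped in its double quotes
print(m.group('desc'))
```

Note that the relaxed pattern keeps the surrounding double quotes in the captured ID, so a real fix would probably also need to strip them before comparing against FILTER values in the data lines.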

=======

I get the same problem. Any update on this issue?

I hoped switching to PyVCF3 (cf. jamescasbon#335) would solve the issue, but apparently not.

My bad, in my case the problem originated from a Source tag in a FILTER header line:

##FILTER=<ID=xxx,Description="yyy",Source="zzz">

According to https://samtools.github.io/hts-specs/, Source is an INFO header tag, not a FILTER header tag.
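To spot this kind of header before handing the file to the parser, a small pre-check could look like the sketch below. It assumes that only ID and Description are expected in a ##FILTER line, as the comment above suggests; unexpected_filter_keys, ALLOWED_FILTER_KEYS and PAIR are hypothetical names for this example and are not part of PyVCF.

```python
import re

# Assumption: only these keys are expected in a ##FILTER header line.
ALLOWED_FILTER_KEYS = {"ID", "Description"}

# key=value pairs, where the value is either double-quoted or runs to the next comma
PAIR = re.compile(r'(\w+)=("[^"]*"|[^,>]+)')


def unexpected_filter_keys(line):
    """Return the keys of a ##FILTER header line that are not in ALLOWED_FILTER_KEYS."""
    prefix = "##FILTER=<"
    if not line.startswith(prefix):
        return []
    body = line[len(prefix):].rstrip().rstrip(">")
    return [key for key, _ in PAIR.findall(body) if key not in ALLOWED_FILTER_KEYS]


print(unexpected_filter_keys('##FILTER=<ID=xxx,Description="yyy",Source="zzz">'))
print(unexpected_filter_keys('##FILTER=<ID=q10,Description="Quality below 10">'))
```

Running this over the header lines of a VCF would list the offending tags, which makes it easier to tell a comma-in-ID problem apart from an extra-tag problem.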

dridk commented Apr 26, 2022

Could you give me a VCF example to test?
