summary file: missing results #24

Open
Gilles179 opened this issue Jun 19, 2020 · 2 comments
Labels
bug (Something isn't working), enhancement (New feature or request)

Comments

@Gilles179

When setting up a new scheme, it is convenient to test a database on multiple assemblies and to combine the --summary option with the -o option to get a list of assignments.
However, when analysing a few hundred assemblies, the summary file contains only 53 to 57 lines instead of the expected number, although the full list is printed to the screen.

@CarolineOhrman added the bug and enhancement labels on Jun 25, 2020
@CarolineOhrman
Contributor

I have also found that the summary file does not include all assemblies. Out of my 283 genomes, only 117 ended up in the summary file.

As it is now, the summary function only includes assemblies for which a final SNP call was made.

I would instead like to have all genomes in the summary file. If I run 283 genomes, all of them should be in the summary, even if the final call is NA. I propose that the summary have three columns: ID, final_snp and snp_path. The results are then easy to cut and paste alongside other data; see the example below. It also makes it easy to spot the final SNPs that ended up as NA and to investigate what the issue is.

ID            final_snp  snp_path
NIH_B_38      A.II.2     T/N.1;T.1;A/M.1;A.1;A.II.1;A.II.2
WY_00W4114    A.II.4     T/N.1;T.1;A/M.1;A.1;A.II.1;A.II.2;A.II.6;A.II.3;A.II.4
WY96          A.II.4     T/N.1;T.1;A/M.1;A.1;A.II.1;A.II.2;A.II.6;A.II.3;A.II.4
O_HARA        NA         T/N.1;T.1;B.1;B.16;B.218;B.219;A.II.3;A.II.4

I also propose to change snp_summary.txt to summary_snp.txt to match summary_tree.pdf
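
The proposed layout could be sketched roughly as follows. This is only an illustration of the idea, not the project's actual code: the function name `write_summary` and the shape of the `results` dictionary are hypothetical, and tab-separated output is an assumption. The point is that every genome gets a row, with NA written when no final call was made.

```python
import csv
import io


def write_summary(results, handle):
    """Write one row per genome, using 'NA' when no final call was made.

    `results` (hypothetical structure) maps a genome ID to a tuple of
    (final_snp, snp_path), where final_snp may be None and snp_path is
    a list of SNP names in the order they were called.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["ID", "final_snp", "snp_path"])
    for genome_id, (final_snp, snp_path) in results.items():
        # Keep the genome in the summary even without a final call.
        writer.writerow([genome_id, final_snp or "NA", ";".join(snp_path) or "NA"])


# Two rows taken from the example table above.
results = {
    "NIH_B_38": ("A.II.2", ["T/N.1", "T.1", "A/M.1", "A.1", "A.II.1", "A.II.2"]),
    "O_HARA": (None, ["T/N.1", "T.1", "B.1", "B.16", "B.218", "B.219", "A.II.3", "A.II.4"]),
}

buffer = io.StringIO()
write_summary(results, buffer)
summary_text = buffer.getvalue()
print(summary_text)
```

Writing to a file instead only requires passing an open file handle, for example one opened on summary_snp.txt, in place of the `StringIO` buffer.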

@Gilles179
Author

I fully agree, that would be great.
Regarding what is currently included in the summary file: in a run with 400+ genomes, only 50+ end up in the summary, even though only about 10 genomes are classified as NA.
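
One way to pin down which genomes are dropped is to compare the list of input genome IDs with the IDs that appear in the summary file. The sketch below assumes the summary is tab-separated with the genome ID in the first column and an optional header row starting with ID; the current file format may differ, and the helper name `missing_from_summary` is hypothetical.

```python
def missing_from_summary(expected_ids, summary_lines):
    """Return the expected genome IDs that have no row in the summary."""
    reported = set()
    for line in summary_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank lines and the header row (assumed to start with "ID").
        if fields and fields[0] and fields[0] != "ID":
            reported.add(fields[0])
    return sorted(set(expected_ids) - reported)


# Small made-up example: three genomes were run, one is in the summary.
expected = ["NIH_B_38", "WY96", "O_HARA"]
summary = ["ID\tfinal_snp\tsnp_path\n", "NIH_B_38\tA.II.2\tT/N.1;T.1\n"]

missing = missing_from_summary(expected, summary)
print(missing)
```

Comparing the missing IDs with the genomes reported as NA on the screen would show whether the drop is explained by missing final calls alone.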

2 participants