No reports of IS3, IS4 and IS5 in the .sum file #7

Open
fgaudilliere opened this issue Oct 13, 2022 · 4 comments

Comments

@fgaudilliere

fgaudilliere commented Oct 13, 2022

Hello,

I ran digIS on a large number of bacterial genomes. IS elements belonging to the IS3, IS4, IS5, IS200/605 and ISNCY families are reported in the .csv and .gff files, but they are not listed under these families in the .sum file. Is there a reason for this?

Best,
Flora

@janka2012
Owner

Hi @fgaudilliere, thanks for reporting this. In general, I do not see a reason why this should happen. Could you share with me at least one bacterial genome where you see this happening, together with its .csv/.gff and .sum files?

@fgaudilliere
Author

Here are the files for one of my genomes: digIS_issue.zip

@janka2012
Owner

janka2012 commented Oct 13, 2022

@fgaudilliere I found the issue. IS3, IS4, IS5 and IS200/605 contain multiple subfamilies (see here), and we refer to them with names such as IS3_IS2. If a detected record's name contains a _ character, it is reported in the "other" group of detected IS elements, because it is at the subfamily level rather than the family level. I can fix this, but it would be nice to hear your perspective on what would make the most sense. In the meantime, if this is blocking you in any way, feel free to create your own summary statistics from the .csv/.gff output files. I hope this helps :)

@fgaudilliere
Author

Thanks for the quick answer!
I think what would make the most sense to me would be to group the IS3, IS4, IS5 and IS200/605 copies by family in the .sum report: that way it remains an overview without too many details, but if someone is interested in the subfamily, the information is still available in the .csv and .gff files.
Yes, I'm writing a small script to extract data from the .csv file :)
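
For anyone wanting the same workaround, here is a minimal sketch of such a script, not the actual digIS code or the final fix. It assumes the subfamily labels follow the `IS3_IS2` pattern described above, so the family is taken as everything before the first underscore. The column name `family` and the helper names `family_of` and `count_families` are hypothetical; check the header of your own .csv file and adjust. Any family whose own name contains an underscore would need a special case.

```python
import csv
import io
from collections import Counter


def family_of(label: str) -> str:
    """Collapse a subfamily label such as 'IS3_IS2' to its family, 'IS3'."""
    # Assumption: the family is everything before the first underscore.
    # Labels without an underscore are returned unchanged.
    return label.split("_", 1)[0]


def count_families(csv_file, column="family"):
    """Count IS records per family from an open CSV file object."""
    # Assumption: 'family' is a hypothetical column name, not confirmed
    # against the real digIS .csv header.
    reader = csv.DictReader(csv_file)
    return Counter(family_of(row[column]) for row in reader)


if __name__ == "__main__":
    # Made-up rows for illustration only, not real digIS output.
    sample = io.StringIO(
        "id,family\n1,IS3_IS2\n2,IS3_IS150\n3,IS5_IS903\n4,IS21\n"
    )
    for family, n in sorted(count_families(sample).items()):
        print(family, n)
```

To run it on a real file, replace the in-memory sample with something like `open("genome.csv", newline="")`, where the file name is again only a placeholder.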
