Hi,
I'm testing your pipeline on a few individuals for which I have subsampled data, to see how well the programs perform at different read depths on my species and data. Looking at the summary output, I saw that RelocaTE2 reports exactly the same number of reference TEs across depths (8x, 16x, 24x, and 32x), and also the same number across individuals.
When I checked the unfiltered All.all_ref_insert.gff file, I saw that many of these reported reference TEs have neither junction reads nor left/right supporting reads, so I assume they should be excluded:
However, here they are in the summary nonredundant.bed file:
They also appear in the summary HTML output, where they shouldn't be.
This also happens when the TSD tag reports insufficient_data:
chr_4 RelocaTE2 not_given 4250994 4251609 . - . ID=repeat_chr_4_4250994_4251609;TSD=insufficient_data;Note=insufficient_data;Right_junction_reads:0;Left_junction_reads:5;Right_support_reads:1;Left_support_reads:0;
This entry also appears in the summary and the BED file, where it shouldn't.
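To make the expected filtering concrete, here is a minimal sketch that parses the attribute column of a RelocaTE2 GFF record and decides whether it has read support. Note that the attribute column mixes `key=value` entries (ID, TSD, Note) with `key:value` entries (the read counts). The criterion in `has_read_support` is an assumption based on the expectation above (drop records whose TSD is `insufficient_data` or whose four read counts are all zero); it is not McClintock's actual logic, and both function names are hypothetical. GFF columns are tab-separated, so the example record uses `\t`.

```python
import re

# Read-count keys as they appear in the RelocaTE2 attribute column.
COUNT_KEYS = ("Right_junction_reads", "Left_junction_reads",
              "Right_support_reads", "Left_support_reads")

def parse_relocate2_attributes(attr_field: str) -> dict:
    """Split the attribute column into a dict, accepting '=' or ':' as separator."""
    attrs = {}
    for entry in attr_field.strip().split(";"):
        entry = entry.strip()
        if not entry:
            continue  # the trailing ';' leaves an empty entry
        match = re.match(r"([^=:]+)[=:](.*)", entry)  # split on the first '=' or ':'
        if match:
            attrs[match.group(1)] = match.group(2)
    return attrs

def has_read_support(attrs: dict) -> bool:
    """Hypothetical criterion: TSD must be resolved and at least one count > 0."""
    if attrs.get("TSD") == "insufficient_data":
        return False
    return any(int(attrs.get(key, "0")) > 0 for key in COUNT_KEYS)

line = ("chr_4\tRelocaTE2\tnot_given\t4250994\t4251609\t.\t-\t.\t"
        "ID=repeat_chr_4_4250994_4251609;TSD=insufficient_data;"
        "Note=insufficient_data;Right_junction_reads:0;Left_junction_reads:5;"
        "Right_support_reads:1;Left_support_reads:0;")
fields = line.split("\t")
attrs = parse_relocate2_attributes(fields[8])
print(attrs["Left_junction_reads"], has_read_support(attrs))  # 5 False
```

Under this assumed criterion the record above is dropped because of its TSD tag, even though it has some junction and support reads.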
Interestingly, if a reference TE is not found in the sample at all, it is correctly excluded from the summary outputs. To clarify: as soon as a reference TE is found once (with appropriate support), all other occurrences of it listed in the GFF are also included, even when they have no supporting reads.
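The two behaviours described above can be contrasted with a toy example. This does not reproduce McClintock's code: the record layout, the TE names, and both functions are hypothetical, and only illustrate the difference between filtering per TE name (what the summary appears to do) and filtering per locus (what one would expect).

```python
# Hypothetical records: (te_name, locus_id, has_support). Names are illustrative only.
records = [
    ("TE_A", "repeat_chr_1_100_700", True),
    ("TE_A", "repeat_chr_2_500_1100", False),
    ("TE_A", "repeat_chr_3_900_1500", False),
    ("TE_B", "repeat_chr_4_200_800", False),
]

def keep_by_name(records):
    """Mimics the reported symptom: one supported copy keeps every copy with that name."""
    supported_names = {name for name, _, ok in records if ok}
    return [locus for name, locus, _ in records if name in supported_names]

def keep_by_locus(records):
    """Expected behaviour: each locus is kept only on its own support."""
    return [locus for _, locus, ok in records if ok]

print(len(keep_by_name(records)), len(keep_by_locus(records)))  # 3 1
```

In both versions TE_B is dropped because none of its copies is supported, which matches the observation that TEs absent from the sample are correctly excluded; only the per-name version lets the two unsupported TE_A copies through.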
I'm happy to share the files if needed.
Best,
Andrea
Thanks for reporting this issue. This looks like an area for improvement in how McClintock parses RelocaTE2 output. Test files that replicate this result would be very helpful. Could you email me offline to arrange a data transfer?