We've been running BioTraDIS on a .embl file which contains a chromosome and two plasmids (GCA_000008865.2.txt). On our data the command bacteria_tradis works fine and, as expected, produces three .insert_site_plot.gz files, one for the chromosome and one for each plasmid (SK-1-1-2-5.ENA_AB011548_AB011548.2.insert_site_plot.gz, SK-1-1-2-5.ENA_AB011549_AB011549.2.insert_site_plot.gz, SK-1-1-2-5.ENA_BA000007_BA000007.3.insert_site_plot.gz). It's important to note that the number of lines in each of these files corresponds to the length of its particular contig in bp. However, when we run tradis_gene_insert_sites on each insert_site_plot file, we encounter a couple of issues in the generated tradis_gene_insert_sites.csv.
Issue 1
The first issue is that annotations which do not belong to the chromosome or plasmid being processed still receive values such as read counts and insertion indices. This is most noticeable in our plasmid files (denoted by AB), where read counts are generated for genes that are present on the chromosome and on the other plasmid, which shouldn't happen. My guess is that the annotation for every contig is being overlaid on the insert_site_plot file, creating entries for each contig up to the length of the insert_site_plot file. Our plan is to ignore the annotations for the other contigs and set their values back to 0. Is there any way to prevent this?
Issue 2
Secondly, we've noticed another issue with annotations where the start and end coordinates of a feature span the beginning and end of a DNA sequence. An example can be seen in the gene tagA.
| locus_tag | gene_name | ncrna | start | end | strand | read_count | ins_index | gene_length | ins_count | fcn |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AB011549_1_92527_2502 | tagA | 0 | 92527 | 2502 | 1 | 0 | 0 | -90024 | 0 | ToxR-regulated lipoprotein |
| AB011549_1_2589_3464 | etpC | 0 | 2589 | 3464 | 1 | 2954 | 0.277397 | 876 | 243 | Type II secretion pathway related protein |
| AB011549_1_3675_5432 | etpD | 0 | 3675 | 5432 | 1 | 7430 | 0.261092 | 1758 | 459 | Type II secretion pathway related protein |
Here tagA spans the start of the plasmid sequence and should have a gene length of approximately 2762 bp, but a negative gene length is generated instead. In addition, because of this, no data is recorded for the gene in question. Is there any way to solve this?
Thanks for your help
Mat
Thanks for the detailed report. So, to answer these:
re: 1, I suspect this is because tradis_gene_insert_sites expects an embl file with a single replicon annotation in it. Could you try splitting your embl file into one for each replicon and process these separately with the appropriate plot files to see if this resolves the issue?
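For the splitting step, here is a minimal Python sketch that does not use BioTraDIS itself. It assumes the combined file is a standard EMBL flat file in which every record starts with an `ID` line and ends with a `//` line. The function names `split_embl_records` and `write_records` are hypothetical, chosen for illustration only.

```python
import os


def _accession(record):
    """Return the accession from the record's ID line (hypothetical helper)."""
    for line in record.splitlines():
        if line.startswith("ID"):
            # Take the first token after "ID" and drop a trailing ';' if present.
            return line[2:].split()[0].rstrip(";")
    raise ValueError("record has no ID line")


def split_embl_records(text):
    """Split multi-record EMBL text into {accession: record_text}.

    Assumes each record is terminated by a line containing only '//'.
    Any trailing text without a terminator is ignored.
    """
    records = {}
    current = []
    for line in text.splitlines(keepends=True):
        if not current and not line.strip():
            continue  # skip blank lines between records
        current.append(line)
        if line.rstrip() == "//":
            record = "".join(current)
            records[_accession(record)] = record
            current = []
    return records


def write_records(records, out_dir):
    """Write each record to <out_dir>/<accession>.embl and return the paths."""
    paths = []
    for accession, record in records.items():
        path = os.path.join(out_dir, accession + ".embl")
        with open(path, "w") as handle:
            handle.write(record)
        paths.append(path)
    return paths
```

Each resulting single-replicon file could then be passed to tradis_gene_insert_sites together with the plot file for the same accession.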
re: 2, I think this is a genuine bug, or at least an unimplemented feature -- it's fairly unusual to have a replicon sequence split in the middle of a gene annotation, and it looks like the code just doesn't consider this case in calculating the gene length leading to a nonsensical result. Assuming the above suggestion fixes your problem 1, if you could post an example case with data for one of the plasmids where this happens, I'll try to put in a fix for this. In the meantime, I don't think this should affect the rest of the result table, so as long as the tagA gene isn't your primary interest you can probably just ignore/remove this row and carry on with downstream analysis.
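To illustrate what a fix might look like: the reported length of -90024 for tagA is what `end - start + 1` gives for start 92527 and end 2502, and the ins_index values in the other rows match `ins_count / gene_length`. The Python sketch below is not the BioTraDIS code. It assumes a circular replicon, 1-based inclusive coordinates, that `end < start` always means the feature crosses the origin, and that the plot has already been reduced to one total read count per position. All function names are hypothetical.

```python
def feature_length(start, end, replicon_length):
    """Length of a feature on a circular replicon, allowing origin-spanning features."""
    if not (1 <= start <= replicon_length and 1 <= end <= replicon_length):
        raise ValueError("coordinates must lie within the replicon")
    if end >= start:
        return end - start + 1
    # Feature wraps: from start to the last base, then from base 1 to end.
    return (replicon_length - start + 1) + end


def feature_positions(start, end, replicon_length):
    """1-based positions covered by the feature, in order along the feature."""
    if end >= start:
        return list(range(start, end + 1))
    return list(range(start, replicon_length + 1)) + list(range(1, end + 1))


def summarise_feature(plot_counts, start, end):
    """Return (read_count, ins_count, gene_length, ins_index) for one feature.

    plot_counts[i] is assumed to hold the total reads at position i + 1,
    so len(plot_counts) is taken as the replicon length.
    """
    replicon_length = len(plot_counts)
    length = feature_length(start, end, replicon_length)
    counts = [plot_counts[p - 1] for p in feature_positions(start, end, replicon_length)]
    read_count = sum(counts)
    ins_count = sum(1 for c in counts if c > 0)
    return read_count, ins_count, length, ins_count / length
```

With a toy replicon of 10 positions, a feature from 9 to 2 covers positions 9, 10, 1 and 2, so it gets a positive length of 4 rather than -6.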
Examples of the generated tradis_gene_insert_sites files are here (trimmed_1-1-2-5.fq.ENA_AB011549_AB011549.2.tradis_gene_insert_sites.csv, trimmed_1-1-2-5.fq.ENA_BA000007_BA000007.3.tradis_gene_insert_sites.csv, trimmed_1-1-2-5.fq.ENA_AB011548_AB011548.2.tradis_gene_insert_sites.csv), where BA000007 is the chromosome and AB011548 and AB011549 are the plasmids.