Issues on running BioTraDIS on multiple contigs #130

Madhuskey1993 · 2022-11-30T15:52:44Z

Hi,

We've been running BioTraDIS on a .embl file which contains a chromosome and two plasmids (GCA_000008865.2.txt). On our data the command bacteria_tradis works fine and, as would be expected, produces three .insert_site_plot.gz files, one for each of the chromosome and the two plasmids (SK-1-1-2-5.ENA_AB011548_AB011548.2.insert_site_plot.gz, SK-1-1-2-5.ENA_AB011549_AB011549.2.insert_site_plot.gz, SK-1-1-2-5.ENA_BA000007_BA000007.3.insert_site_plot.gz). It's important to note that the number of lines in each of these files matches the length of the corresponding contig in bp. However, when we run tradis_gene_insert_sites on each insert_site_plot file, we encounter a couple of issues in the tradis_gene_insert_sites.csv files generated.

Examples of the files generated by tradis_gene_insert_sites are here (trimmed_1-1-2-5.fq.ENA_AB011549_AB011549.2.tradis_gene_insert_sites.csv, trimmed_1-1-2-5.fq.ENA_BA000007_BA000007.3.tradis_gene_insert_sites.csv, trimmed_1-1-2-5.fq.ENA_AB011548_AB011548.2.tradis_gene_insert_sites.csv), where BA000007 is the chromosome and AB011548 + AB011549 are the plasmids.

Issue 1
The first issue is that read counts and insertion indices are being generated for some annotations that do not belong to the chromosome or plasmid being run. This is most noticeable in our plasmid files (denoted by AB), where we see read counts for genes that are present on the chromosome and on the other plasmid, which shouldn't be happening. My guess is that the annotations for every contig are being overlaid on the insert_site_plot file, creating entries for each contig up to the length of the insert_site_plot file. Our plan is to ignore the annotations for the other contigs and set these back to 0. Is there any way to prevent this?

Issue 2

Secondly, we've noticed another issue with annotations where a feature spans the end and the beginning of a DNA sequence, so that its genomic start is greater than its genomic end. An example of this is the gene tagA.

locus_tag	gene_name	start	end	strand	read_count	ins_index	gene_length	ins_count	fcn
AB011549_1_92527_2502	tagA	92527	2502	1	0	0	-90024	0	ToxR-regulated lipoprotein
AB011549_1_2589_3464	etpC	2589	3464	1	2954	0.277397	876	243	Type II secretion pathway related protein
AB011549_1_3675_5432	etpD	3675	5432	1	7430	0.261092	1758	459	Type II secretion pathway related protein

Here tagA spans the start of the plasmid sequence and really should have a gene length of approximately 2762bp; however, a negative gene length is generated. In addition, because of this, no data is entered for the gene in question. Is there any way to solve this?

Thanks for your help

Mat

lbarquist · 2022-12-01T11:37:06Z

Hi,

Thanks for the detailed report. So, to answer these:

re: 1, I suspect this is because tradis_gene_insert_sites expects an embl file with a single replicon annotation in it. Could you try splitting your embl file into one file per replicon and processing each separately with the matching plot file, to see if this resolves the issue?
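
One possible way to do the split is sketched below in Python, relying only on the EMBL flat-file convention that each record starts with an `ID` line and ends with a `//` line. The function name `split_embl_records` is made up for this illustration and is not part of BioTraDIS; treat it as a starting point rather than a tested tool.

```python
def split_embl_records(text: str) -> dict[str, str]:
    """Split multi-record EMBL text into {record_id: record_text}.

    Hypothetical helper: assumes every record is terminated by a line
    containing only '//' and takes the record name from the second
    whitespace-separated token on the first 'ID' line.
    """
    records: dict[str, str] = {}
    current: list[str] = []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.rstrip() != "//":
            continue
        # End of one record: find its identifier, then store the text.
        record_id = f"record_{len(records) + 1}"  # fallback name
        for rec_line in current:
            tokens = rec_line.split()
            if len(tokens) > 1 and tokens[0] == "ID":
                record_id = tokens[1].rstrip(";")
                break
        records[record_id] = "".join(current)
        current = []
    # Anything after the final '//' (e.g. blank lines) is ignored.
    return records
```

Each value in the returned dictionary can then be written to its own .embl file and passed to tradis_gene_insert_sites together with the plot file for the same replicon.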

re: 2, I think this is a genuine bug, or at least an unimplemented feature -- it's fairly unusual to have a replicon sequence split in the middle of a gene annotation, and it looks like the code just doesn't consider this case in calculating the gene length leading to a nonsensical result. Assuming the above suggestion fixes your problem 1, if you could post an example case with data for one of the plasmids where this happens, I'll try to put in a fix for this. In the meantime, I don't think this should affect the rest of the result table, so as long as the tagA gene isn't your primary interest you can probably just ignore/remove this row and carry on with downstream analysis.
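
For reference, the negative value in the table is what `end - start + 1` gives when start > end (2502 - 92527 + 1 = -90024). Below is a minimal Python sketch of how a feature that wraps around the origin of a circular replicon could be handled instead. It is not BioTraDIS code: the function names are hypothetical, the per-base read counts are assumed to be already loaded into a list with one entry per position, and the insertion index is taken as ins_count / gene_length, which matches the etpC and etpD rows above (243 / 876 and 459 / 1758).

```python
def wrapped_ranges(start: int, end: int, replicon_length: int) -> list[range]:
    """Return 0-based index ranges covered by a 1-based inclusive feature.

    If start > end the feature is assumed to wrap around the origin of a
    circular replicon, so it is split into a tail part and a head part.
    """
    if not (1 <= start <= replicon_length and 1 <= end <= replicon_length):
        raise ValueError("feature coordinates lie outside the replicon")
    if start <= end:
        return [range(start - 1, end)]
    return [range(start - 1, replicon_length), range(0, end)]


def summarise_feature(per_base_reads: list[int], start: int, end: int) -> dict:
    """Compute gene_length, read_count, ins_count and ins_index for a feature.

    per_base_reads is assumed to hold the total reads at each position,
    one entry per base of the replicon.
    """
    replicon_length = len(per_base_reads)
    positions = [
        i for r in wrapped_ranges(start, end, replicon_length) for i in r
    ]
    read_count = sum(per_base_reads[i] for i in positions)
    ins_count = sum(1 for i in positions if per_base_reads[i] > 0)
    return {
        "gene_length": len(positions),
        "read_count": read_count,
        "ins_count": ins_count,
        "ins_index": ins_count / len(positions),
    }
```

With this approach the wrapped length is (replicon_length - start + 1) + end, so the result depends on the true length of the plasmid sequence, which has to come from the plot file or the EMBL record.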
