Intro

The original XMFA output from the MAUVE multiple genome alignment software will not work in ClonalFrame by Xavier Didelot. This page describes how to get an XMFA file that does.

MAUVE is maintained by Aaron Darling ("koadman"). It is great software for aligning MANY bacterial (or smaller) genomes, and it can handle large genomes as well. The MAUVE output includes a non-standard XMFA file that is difficult to use for downstream genomic recombination analyses. ClonalFrame and ClonalOrigin, maintained by Xavier Didelot, are a powerful suite of tools for identifying microevolution.

Getting the XMFA file you need.

First: Keep all your files in one directory. It's just easier that way.

Second: Get ClonalOrigin and all its associated programs as described HERE.

START:

Align genomes with Progressive Mauve
Using the Progressive Mauve outputs, run stripSubsetLCBs as described here on the MAUVE output .xmfa and .bbcols files. It should look something like:

stripSubsetLCBs full_alignment.xmfa full_alignment.xmfa.bbcols core_alignment.xmfa 500

The new XMFA file written by stripSubsetLCBs (not the original from MAUVE) now contains only the CORE alignment regions, and all its lines and line lengths are correct.
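One quick sanity check on the core alignment is to count its blocks and confirm that every block contains the same number of sequences. The sketch below is only an illustration: it builds a tiny made-up file (the genome names and sequences are hypothetical) and relies on the XMFA convention that each alignment block ends with a line starting with `=`.

```shell
# Hypothetical two-genome sample standing in for core_alignment.xmfa
workdir=$(mktemp -d)
cat > "$workdir/core_alignment.xmfa" <<'EOF'
>1:1-8 + genomeA.fas
ACGTACGT
>2:5-12 + genomeB.fas
ACGTACGA
=
>1:20-27 + genomeA.fas
TTGACCAA
>2:30-37 - genomeB.fas
TTGACCAT
=
EOF

# Each alignment block is terminated by a line starting with "="
blocks=$(grep -c '^=' "$workdir/core_alignment.xmfa")

# Count header lines per block, then collapse to the distinct counts
per_block=$(awk '/^>/ {n++} /^=/ {print n; n=0}' "$workdir/core_alignment.xmfa" | sort -u)

echo "blocks: $blocks, sequences per block: $per_block"
```

On a real core alignment, more than one distinct value in the per-block count would suggest that some blocks do not include every genome.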

However, the header lines often have additional information such as genome position numbers that must be removed, leaving only the organism name (e.g. >S. aureus) for downstream analyses.
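To see what the headers in your own file look like before changing anything, you can list them with grep. This is a minimal sketch on a made-up sample (the first genome name is hypothetical); the pattern assumes a "position, strand, file name" layout and may need adjusting for other layouts.

```shell
# Hypothetical sample standing in for the stripSubsetLCBs output
workdir=$(mktemp -d)
cat > "$workdir/core_alignment.xmfa" <<'EOF'
>1:1021-3210 + s.aureusA.fas
ACGTACGT
>2:3289121-3291310 + e.anophelisNUHP1.fas
ACGTACGA
=
EOF

# Show the first few header lines
grep '^>' "$workdir/core_alignment.xmfa" | head

# Count how many headers follow the "N:start-end strand name" layout
matching=$(grep -c -E '^>[0-9]+:[0-9]+-[0-9]+ [+-] [^ ]+$' "$workdir/core_alignment.xmfa")
total=$(grep -c '^>' "$workdir/core_alignment.xmfa")
echo "$matching of $total headers match the expected layout"
```

If the two counts differ, some headers have a different shape and the one-liners on this page may not do what you want.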

If, and ONLY if, your headers look like mine, for example:

>2:3289121-3291310 + e.anophelisNUHP1.fas

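As shown below, sed can be used in a short script on the stripSubsetLCBs output XMFA file to remove everything in the header up to the organism name, including the genome position numbers. The command prints to standard output, so redirect it to a new file to save the result: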

sed -r 's/^>.* />/' your_xmfa_file.xmfa

The resulting header from the example above would now be:

>e.anophelisNUHP1.fas
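As a small worked example, the sketch below runs the same substitution on a made-up three-line file and redirects the result to a new file so the original stays untouched. The file names are hypothetical, and `-E` is used here in place of `-r` as the more portable spelling of the same option; the pattern itself does not need extended regular expressions.

```shell
# Hypothetical sample file containing the example header from above
workdir=$(mktemp -d)
cat > "$workdir/core_alignment.xmfa" <<'EOF'
>2:3289121-3291310 + e.anophelisNUHP1.fas
ACGT-ACGT
=
EOF

# Strip everything from ">" up to the last space on header lines only;
# sequence lines and "=" separators do not start with ">" and are left alone
sed -E 's/^>.* />/' "$workdir/core_alignment.xmfa" > "$workdir/core_alignment.renamed.xmfa"

cat "$workdir/core_alignment.renamed.xmfa"
```

Because `.*` is greedy, the match runs to the last space in the header, so this only works when the organism name itself contains no spaces.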

I'm not a great coder so please feel free to improve on this one-liner (thanks Dana). It seems to work for me.

Xavier Didelot suggested the following Perl script based on my initial scripting efforts:

perl -i.bk -wpe's/^>..* -*/>/' your_xmfa_file.xmfa

It is very likely that

perl -i.bk -wpe's/^>.* />/' your_xmfa_file.xmfa

will work just as well. The next step is to infer the clonal genealogy, which will take several days of compute time and leave you with a consensus tree.

If you find some of your warg jobs are running slowly, see my additional thoughts in Thoughts_about_slow_warg_jobs in Clonal Frame analyses.
