Preprocessing Script Tutorial

Marcus Fedarko edited this page Jul 16, 2018 · 4 revisions

NOTE: This tutorial isn't finished yet, sorry. Please check back soon -- in the meantime, feel free to contact us with any questions about MetagenomeScope.

This tutorial walks you through converting an assembly graph file into a SQLite3 database file that can be visualized in MetagenomeScope's viewer interface. To do this, you'll use MetagenomeScope's preprocessing script, a command-line tool.

Downloading and installing the script

You can download MetagenomeScope's source code using the "Clone or download" button on the main page of its GitHub repository. Once you've downloaded it, make sure the relevant system requirements are installed on your machine.

If you want to use the SPQR tree functionality of MetagenomeScope, you'll need to ensure some extra system requirements are installed, and you'll need to manually build the "SPQR script". However, if you don't want to use this functionality right now, then you can skip this step.

Running the script

You can generate a .db file using the preprocessing script. A list of the available options is located here; only the -i and -o options are required.

Sample data

The preprocessing script should work with any assembly graph in GFA, GML, or LastGraph format. If you don't have any assembly graph data readily available, we've provided a sample test file (in GML format) for reference. The same assembly graph is also used in the viewer interface tutorial.
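If you are unsure whether a file looks like one of the supported formats, a quick filename check can help before running the script. The following is only a sketch: the suffix-to-format mapping and the `guess_graph_format` helper are assumptions made for illustration and are not taken from the preprocessing script, which may identify formats differently.

```python
from pathlib import Path

# ASSUMPTION: these filename endings are a plausible convention for the three
# formats named in this tutorial. They are illustrative, not taken from the
# preprocessing script itself.
ASSUMED_SUFFIXES = {
    ".gfa": "GFA",
    ".gml": "GML",
    "lastgraph": "LastGraph",
}


def guess_graph_format(filename):
    """Return a guessed format name for filename, or None if unrecognized."""
    # Compare case-insensitively against the final path component only.
    name = Path(filename).name.lower()
    for suffix, fmt in ASSUMED_SUFFIXES.items():
        if name.endswith(suffix):
            return fmt
    return None
```

For example, `guess_graph_format("sample.gml")` returns `"GML"`, while an unrecognized name returns `None`.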

Commands

Our objective is to convert the given assembly graph file into a database file ready for visualization in the viewer interface. The simplest way to do this is to run the following command:

python graph_collator/collate.py -i sample.gml -o sample

This will create a database file sample.db in the current working directory.
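Since the output is a SQLite3 database, you can confirm that it was written by opening it with Python's standard `sqlite3` module and listing its tables. This is a minimal sketch: `list_tables` is a hypothetical helper, and it makes no assumptions about which tables the preprocessing script creates.

```python
import os
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite database at db_path."""
    # sqlite3.connect() silently creates a new empty file if the path does
    # not exist, so check for the file first.
    if not os.path.isfile(db_path):
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling `list_tables("sample.db")` from the directory where you ran the script should return a non-empty list if the database was populated.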

The preprocessing script accepts a number of other arguments that change its behavior. See the Preprocessing Script Settings page for more information on these settings.
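If you need to preprocess several graphs, it may be convenient to wrap the command above in a small Python 3 helper. The sketch below is an assumption-laden illustration: `build_collate_command` and `run_collate` are hypothetical names, the script path and the `python` interpreter name are copied from the command shown earlier, and any `extra_args` are passed through unchanged, so they must be valid options from the settings page.

```python
import subprocess


def build_collate_command(input_path, output_prefix, extra_args=(),
                          script="graph_collator/collate.py",
                          python="python"):
    """Build the argument list for one run of the preprocessing script.

    Only -i and -o are set here, matching the required options described
    in this tutorial. extra_args is appended verbatim.
    """
    return [python, script, "-i", input_path, "-o", output_prefix, *extra_args]


def run_collate(input_path, output_prefix, extra_args=()):
    """Run the preprocessing script, raising CalledProcessError on failure."""
    cmd = build_collate_command(input_path, output_prefix, extra_args)
    subprocess.run(cmd, check=True)
```

With the defaults, `build_collate_command("sample.gml", "sample")` produces the same invocation as the command shown above. Run the helper from the root of the MetagenomeScope source directory so that the relative script path resolves.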