
Development

Tony Boyles edited this page Sep 18, 2018 · 2 revisions

Please note that this page is for developers and likely does not contain any information relevant to users of MicrobeTrace.

Architecture

MicrobeTrace is a static web application. Accordingly, everything is written in HTML, CSS, and Javascript. There is a nodejs-based express server designed for development, testing, and deploying to Heroku, but this is purely a convenience script and not required to deploy or use MicrobeTrace.

Requirements

To develop, you're going to need a computer with:

Getting Started

To jump right into development, fork the repository. Once GitHub finishes, open your terminal emulator, cd to your favorite projects directory, and clone your fork of the repository by running:

git clone https://github.com/<your-github-username>/MicrobeTRACE.git

This will download this repo. Next, enter the repository:

cd MicrobeTRACE

... and install the dependencies:

yarn

(Alternately, if you didn't follow my advice and install yarn, run this:)

npm install

This will download all of the development dependencies for this project, which total several hundred megabytes. Luckily, the production dependencies are much, much smaller than the development dependencies, so the distribution output should only take up a few tens of megabytes.

Next, run the following command:

yarn start #or npm start

This will launch the server, but not your browser. Open your preferred browser and go to http://localhost:5000/ to see MicrobeTrace.

Design

The basic design of MicrobeTrace works like this: first, the browser loads the index.html file, as it would for any static site served from the root of a directory. That file contains the core DOM for the app, as well as link and script tags for all dependencies used by more than one widget in the app. Most importantly, it contains an inline script which sets up this MicrobeTrace session. It also sets up the DOM components which must be procedurally constructed, rather than statically interpreted. Finally, it loads the contents of components/files.html.

Each file in the components/ directory mirrors the structure of index.html. At the top are the links to the stylesheets required by the view, followed by the DOM content, followed by an inline script to set up the view's interactive features.

Note that the files in the components/ directory are not complete HTML documents by themselves. They aren't intended to be loaded in independent windows, but to be loaded into GoldenLayout panes within the same window.

Data

Under the hood, MicrobeTrace tracks everything it needs in two global objects: app and session. app contains all the mechanical functions which are required globally, but are not provided by other libraries (e.g. underscore.js). session contains all the information required to reconstruct that session of MicrobeTrace. In particular, session.data contains all the data used to construct the network.

app, in detail

app is almost completely created by the scripts/common.js file, which was used in earlier versions of MicrobeTrace to make code portable across windows. Now it is simply included in index.html (along with all the requisite libraries).

session, in detail

session contains all the information that is unique to that session of MicrobeTrace. Specifically:

session.data - all the data, as Javascript objects
session.data.nodes - an array of distinct nodes (i.e. each with a unique "id") as Javascript objects
session.data.links - an array of distinct, non-directed links. While we may be able to paint arrows on links, link objects are deduplicated by checking for (a.source === b.source && a.target === b.target) || (a.source === b.target && a.target === b.source), meaning we cannot represent two directed links between two nodes.
session.style - everything needed to color, draw and style your session.
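
The link deduplication rule above can be sketched as follows. This is a minimal illustration, not MicrobeTrace's actual code: the function names sameLink and addLink and the sample session object are hypothetical, and only the comparison itself is taken from the description above.

```javascript
// Hypothetical sketch of the session shape described above.
const session = {
  data: { nodes: [], links: [] },
  style: {}
};

// Two links are duplicates if they join the same pair of nodes,
// regardless of direction (the comparison quoted above).
function sameLink(a, b) {
  return (a.source === b.source && a.target === b.target) ||
         (a.source === b.target && a.target === b.source);
}

// Add a link only if no equivalent link already exists.
// Returns true if the link was added, false if it was a duplicate.
function addLink(links, link) {
  if (links.some(existing => sameLink(existing, link))) return false;
  links.push(link);
  return true;
}

addLink(session.data.links, { source: 'A', target: 'B' }); // added
addLink(session.data.links, { source: 'B', target: 'A' }); // rejected: same pair, reversed
addLink(session.data.links, { source: 'A', target: 'C' }); // added
```

Because the reversed link is rejected, a pair of nodes can only ever carry one link, which is why two opposite directed links cannot coexist.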

Getting your feet wet

  1. Testing - Our testing regime leaves something to be desired. We need to develop a battery of unit tests, E2E tests, and adversarial tests, along with an automated testing infrastructure to verify application consistency before deployment.
  2. Check out what's going on in the issues. Try picking a feature request and seeing if you can implement it.

Adding a component

To add a component, simply duplicate the most similar component.

Diving in headfirst

If you're interested in paying down some technical debt, here are some good places to start.

  1. The DOM Architecture - This is a biggie. Right now, this project is mostly powered by jQuery spaghetti. It works if you (the developer) catch all the edge cases, but there are a lot of things that can go wrong. The exception to this is the table view, which is powered by Vue.js. This could just as easily be React, Angular 2, Moon, backbonejs, <whatever's popular now>, etc... Basically, we should probably transition views away from jQuery towards more MV*. Since we already have one example using Vue, that should probably be the framework we adopt, but we don't have enough committed to it to fight too hard if there's a highly compelling reason to adopt one of the others.
  2. The UI - I'm no UX designer. I like to make things that look clean and friendly, but it's easy to take pot-shots at UI design without having any substantive criticism, so a lot of people do. Accordingly, I don't have a strong set of beliefs about whether the UI is good or not. My suspicion is not, but changing that is hard, when this was the best I could do to begin with. If you can take Bootstrap 4 and do amazing things with it, be my guest.
  3. The Algorithm - We're computing a distance matrix, which is an O(kN^2) operation, where N is the number of sequences and k is the length of the sequences. However, all we actually need to know is whether any two nodes are within some threshold of each other, not the exact distance. Accordingly, we may be able to compute a consensus sequence and the distance from each node to the consensus. Then, in lieu of computing each pairwise distance, we can compute the difference and sum of their consensus distances, which bound the pairwise distance from below and above (assuming the distance obeys the triangle inequality). Only if that range covers the default threshold value do we need to compute the actual genetic distance. (Huge hat tip to @Sergey-Knyazev for designing this scheme.) Now we just need to implement it.
  4. The Output - The network is rendered to the browser using D3, generating SVG. However, SVG is extremely costly for large networks (>1e4 elements). We should transition to a blend of SVG (for selection identifiers) and Canvas (for everything else) to keep the DOM from dragging.
  5. The data architecture - We've come close to writing a fully featured network library in Javascript. Its primary components can be found in the app global (from scripts/common.js), but we've toyed with the idea of spinning it out for others to use.
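
The consensus-distance screening described under "The Algorithm" might look roughly like the sketch below. It rests on several assumptions that are not from this page: Hamming distance stands in for the genetic distance (it is a true metric, whereas TN93 is not guaranteed to obey the triangle inequality), the consensus is a simple per-position majority vote, and the names hamming, consensus, and linksWithinThreshold are hypothetical.

```javascript
// Stand-in metric (assumption): Hamming distance between equal-length sequences.
function hamming(a, b) {
  let d = 0;
  for (let i = 0; i < a.length; i++) if (a[i] !== b[i]) d++;
  return d;
}

// Per-position majority vote over equal-length sequences.
// Ties are kept by whichever character reached the top count first.
function consensus(seqs) {
  let out = '';
  for (let i = 0; i < seqs[0].length; i++) {
    const counts = {};
    let best = seqs[0][i];
    for (const s of seqs) {
      counts[s[i]] = (counts[s[i]] || 0) + 1;
      if (counts[s[i]] > counts[best]) best = s[i];
    }
    out += best;
  }
  return out;
}

// Find all pairs within `threshold`, computing the exact pairwise distance
// only when the triangle-inequality bounds cannot decide the question.
function linksWithinThreshold(seqs, threshold) {
  const c = consensus(seqs);
  const toConsensus = seqs.map(s => hamming(s, c)); // N distance computations
  const links = [];
  let exactComputations = 0;
  for (let i = 0; i < seqs.length; i++) {
    for (let j = i + 1; j < seqs.length; j++) {
      const lower = Math.abs(toConsensus[i] - toConsensus[j]);
      const upper = toConsensus[i] + toConsensus[j];
      if (lower > threshold) continue;   // certainly too far apart
      if (upper <= threshold) {          // certainly close enough
        links.push([i, j]);
        continue;
      }
      exactComputations++;               // bounds straddle the threshold
      if (hamming(seqs[i], seqs[j]) <= threshold) links.push([i, j]);
    }
  }
  return { links, exactComputations };
}
```

The returned exactComputations count shows how many of the N(N-1)/2 pairs still needed a full comparison. How much this saves in practice depends on the data and the threshold, so it would need to be measured on real sequence sets.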

Other stuff

In addition to MicrobeTrace, several related Open Source projects have been developed or improved-upon in support of MicrobeTrace. These are:

tn93.js is a Javascript port I (Tony Boyles) wrote of libtn93, itself a C port of the TN93 implementation used in the original HIVTrace. An earlier version of this project used hivtrace along with NetworkX to compute a bounded minimum spanning tree of tn93 distances. Unfortunately, this architecture proved much too cumbersome to ship with a viable product. The Javascript alternative has its downsides (most conspicuously, it's slower than the C version). However, the Javascript solution is comparatively quick (given that it's performing an O(n^2) operation and there's literally nothing we can do about that).

Anyway, I've maintained tn93.js in a separate repository. It works well, but it's much slower than the version that was transpiled from C to WebAssembly. I found the C interface to be failure-prone, but we might reconsider using it at some point in the future.

This is the library that MicrobeTrace uses for gapped alignments. I didn't code it originally, but I added the scaffolding to make it work in Node and in Browser, and published it on NPM. It's generally fine, but you should know what it is (just in case).

This is the library that MicrobeTrace uses for ungapped alignments. It's blisteringly fast because it uses binary encoding and bitwise comparisons for nucleotides. Sadly, it doesn't do much else. We should really extend it to subsume the task of the above two libraries. I believe we could gain a substantial speed advantage.
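
To illustrate why binary encoding and bitwise comparison are fast, here is a minimal sketch of the general technique. It is not the library's actual code or API: the bit layout, the handling of ambiguity codes, and the names MASK, encode, and mismatches are all assumptions made for illustration.

```javascript
// Assumed encoding: one bit per base, so ambiguity codes are bitwise ORs.
const MASK = {
  A: 1, C: 2, G: 4, T: 8,
  R: 1 | 4,  // A or G
  Y: 2 | 8,  // C or T
  N: 15      // any base
};

// Pack a sequence into a Uint8Array of bit masks.
// Characters outside the table are treated as N (an assumption).
function encode(seq) {
  const out = new Uint8Array(seq.length);
  for (let i = 0; i < seq.length; i++) {
    out[i] = MASK[seq[i].toUpperCase()] || MASK.N;
  }
  return out;
}

// Count positions where the two encoded sequences share no possible base.
// A single bitwise AND per position replaces character-by-character logic.
function mismatches(a, b) {
  const n = Math.min(a.length, b.length);
  let count = 0;
  for (let i = 0; i < n; i++) {
    if ((a[i] & b[i]) === 0) count++;
  }
  return count;
}
```

Encoding each sequence once and then comparing typed arrays keeps the inner loop free of string operations, which is the main source of the speedup in this kind of scheme.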

In past iterations, MicrobeTrace used MSAViewer to show sequences. However, MSAViewer's esoteric handling of its dependencies made it a bad candidate for inclusion in a larger application like MicrobeTrace. Since there weren't any other viable alternatives, we wrote one.
