In addition to what is described here, this document by Jeff Forcier and this talk from Carl Meyer provide wonderful footings for developing on/in open source projects.
1) Ensure all development in performed within a virtualenv. A good way too bootstrap this is via virtualenv-burrito.
Execute the installation using:
$ curl -sL https://raw.githubusercontent.com/brainsik/virtualenv-burrito/master/virtualenv-burrito.sh | $SHELL
2) Make a virtualenv called BanzaiDB:
$ mkvirtualenv BanzaiDB
3) Install autoenv:
$ git clone git://github.com/kennethreitz/autoenv.git ~/.autoenv $ echo 'source ~/.autoenv/activate.sh' >> ~/.bashrc
Something like this:
$ cd $PATH_WHERE_I_KEEP_MY_REPOS $ git clone https://github.com/mscook/BanzaiDB.git
Something like this:
$ cd BanzaiDB $ # Assuming you installed autoenv - $ # You'll want to say 'y' as this will activate the virtualenv each time you enter the code directory $ # Otherwise - $ # workon BanzaiDB $ pip install -r requirements.txt $ pip install -r requirements-dev.txt
The BanzaiDB/BanzaiDB.py is the core module. It handles database insertion, deletion and updating.
For example:
$ ~/BanzaiDB/BanzaiDB$ python BanzaiDB.py -h usage: BanzaiDB.py [-h] [-v] {init,populate,update,query} ... BanzaiDB v 0.3.0 - Database for Banzai NGS pipeline tool (http://github.com/mscook/BanzaiDB) positional arguments: {init,populate,update,query} Available commands: init Initialise a DB populate Populates a database with results of an experiment update Updates a database with results from a new experiment query List available or provide database query functions optional arguments: -h, --help show this help message and exit -v, --verbose verbose output Licence: ECL 2.0 by Mitchell Stanton-Cook <[email protected]>
Listing help on populate:
$ python BanzaiDB.py populate -h usage: BanzaiDB.py populate [-h] {qc,mapping,assembly,ordering,annotation} run_path positional arguments: {qc,mapping,assembly,ordering,annotation} Populate the database with data from the given pipeline step run_path Full path to a directory containing finished experiments from a pipeline run optional arguments: -h, --help show this help message and exit
The fabfile (Fabric file) in fabfile directory contains query pre-written functions.
You can list them like this:
$ ~/BanzaiDB$ fab -l Available commands: variants.get_variants_by_keyword Return variants with a match in the "Product" with the regular_expression variants.get_variants_in_range Return all the variants in given [start:end] range (inclusive of) variants.plot_variant_positions Generate a PDF of SNP positions for given strains using GenomeDiagram variants.strain_variant_stats Print the number of variants and variant classes for all strains variants.variant_hotspots Return the (default = 100) prevalent variant positions variants.variant_positions_within_atleast Return positions that have at least this many variants variants.what_differentiates_strains Provide variant positions that differentiate two given sets of strains
Note: python BanzaiDB.py query simply calls the fabfile discussed above.
Use GitHub. You will have already cloned the BanzaiDB repo (if you followed instructions above). To make things easier, please fork (https://github.com/mscook/BanzaiDB/fork) and update your local copy to point to your fork.
Something like this:
$ # Assuming your fork is like this $ # https://github.com/$YOUR_USERNAME/BanzaiDB/ $ vi .git/config $ # Replace: $ # url = [email protected]:mscook/BanzaiDB.git $ # with: $ # url = [email protected]:$YOUR_USERNAME/BanzaiDB.git
With this setup you will be able to push development changes to your fork and submit Pull Requests to the core BanzaiDB repo when you're happy.
Important Note: Upstream changes will not be synced to your fork by default. Please, before submitting a pull request please sync your fork with any upstream changes (specifically handle any merge conflicts). Info on syncing a fork can be found here.
We try to make joining and/or modifying the BanzaiDB project simple.
- General:
- As close to PEP8 as possible but I ain't no Saint. Just a long as it's clean and readable,
- Using standard lib UnitTest. There are convenience functions check_coverage.sh & tests/run_tests.sh respectively. We would prefer SMART test vs 100 % coverage.
- In the master GitHub repository we use hooks that call:
- landscape.io (code QC)
- drone.io (continuous integration)
- ReadTheDocs (documentation building)