Open-source framework for simple and fast integration of protein structure data with sequence annotations and genetic variation

ProteoFAV

Protein Features, Annotations and Variants

ProteoFAV is a Python module that addresses the challenge of cross-mapping protein structures and protein sequences, allowing protein structures to be annotated with sequence features and annotations. It implements methods for working with protein structures (via mmCIF, PDB, PDB Validation, DSSP and SIFTS files), sequence features (via UniProt GFF annotations) and genetic variants (via the UniProt/EBI Proteins, Ensembl REST and TCGA Pan-Cancer APIs). Cross-mapping of structure and sequence is performed with the aid of SIFTS.

ProteoFAV relies heavily on the Pandas library to load data quickly into DataFrames for fast data exploration and analysis. Structure and sequence data are parsed or fetched into Pandas DataFrames, which are then merged (collapsed) into a single DataFrame.

Protein structure data (sequence and 3D atomic coordinates) and their annotations (from structural analysis, e.g. interaction interfaces, secondary structure and solvent accessibility), together with protein sequences and their annotations (e.g. genetic variants and other functional information obtained from SIFTS and UniProt), are handled by dedicated classes and methods, so that each modular (component) table can be integrated into a single 'merged table'.

[Figure: proteofav.png]

The methods implemented in proteofav/mergers.py allow the different components to be merged into a single Pandas DataFrame.
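The idea of merging component tables can be illustrated with plain Pandas. The sketch below does not use ProteoFAV's own API: the tables, column names (`pdb_resnum`, `uniprot_resnum`, `variant`, and so on) and values are hypothetical, chosen only to show how a SIFTS-style residue mapping can link a structure table to a sequence-level variants table.

```python
import pandas as pd

# Hypothetical structure table: one row per residue observed in a PDB entry.
structure = pd.DataFrame({
    "pdb_resnum": [10, 11, 12],
    "pdb_resname": ["ALA", "GLY", "SER"],
    "secondary_structure": ["H", "H", "E"],
})

# Hypothetical SIFTS-style mapping between PDB and UniProt residue numbers.
sifts = pd.DataFrame({
    "pdb_resnum": [10, 11, 12],
    "uniprot_resnum": [35, 36, 37],
})

# Hypothetical variants table keyed by UniProt residue number.
variants = pd.DataFrame({
    "uniprot_resnum": [36],
    "variant": ["G36D"],
})

# Left joins keep every structure residue, even those without a variant.
merged = (
    structure
    .merge(sifts, on="pdb_resnum", how="left")
    .merge(variants, on="uniprot_resnum", how="left")
)
print(merged)
```

Using left joins here is a design choice for the illustration: residues with no matching variant remain in the merged table with missing values rather than being dropped.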

Getting Started

Dependencies

ProteoFAV was developed to support Python 3.5+ and Pandas 0.20+. See requirements.txt for the full list of dependencies.
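A quick way to check an environment against these minimum versions is sketched below. The helper `version_tuple` is a hypothetical name introduced for illustration and is not part of ProteoFAV; it assumes Pandas is already installed.

```python
import re
import sys

import pandas as pd


def version_tuple(version, parts=2):
    """Return the leading numeric components of a version string as a tuple."""
    numbers = []
    for piece in version.split(".")[:parts]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers)


# Minimum versions stated in the Dependencies section.
python_ok = sys.version_info >= (3, 5)
pandas_ok = version_tuple(pd.__version__) >= (0, 20)

print("Python OK:", python_ok)
print("Pandas OK:", pandas_ok)
```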

Installation

To install the stable release, run this command in your terminal:

$ pip install proteofav

If you don't have pip installed, this Python installation guide can walk you through the process.

Installing from source in a virtual environment

Getting ProteoFAV:

$ wget https://github.com/bartongroup/ProteoFAV/archive/master.zip -O ProteoFAV.zip
$ unzip ProteoFAV.zip

# alternatively, cloning the git repository
$ git clone https://github.com/bartongroup/ProteoFAV.git

Installing with Virtualenv:

$ virtualenv --python `which python` env
$ source env/bin/activate
$ cd path/to/ProteoFAV
$ pip install -r requirements.txt
$ python setup.py install

Installing with Conda:

$ conda-env create -n proteofav -f path/to/ProteoFAV/requirements.txt
$ source activate proteofav
$ cd path/to/ProteoFAV
$ pip install .

Testing the installation

Test dependencies should be resolved with:

$ python path/to/ProteoFAV/setup.py develop --user

Run the tests with:

$ python path/to/ProteoFAV/setup.py test
# or
$ cd path/to/ProteoFAV/tests
$ python -m unittest discover

ProteoFAV Configuration

ProteoFAV uses a configuration file, config.ini, where the user can specify directory paths as well as URLs for commonly used data sources.

After installing, run:

$ proteofav-setup
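For orientation, INI files of this kind can be read with Python's standard `configparser` module. The sketch below is only an illustration: the section names, option names, path and URL are hypothetical placeholders, and the actual layout of ProteoFAV's config.ini may differ.

```python
import configparser

# Hypothetical config.ini content; real section and option names may differ.
EXAMPLE_CONFIG = """
[Global]
db_root = /tmp/proteofav_data

[Addresses]
api_uniprot = https://example.org/uniprot/
"""

parser = configparser.ConfigParser()
parser.read_string(EXAMPLE_CONFIG)

# Look up a directory path and a data-source URL by section and option.
db_root = parser.get("Global", "db_root")
api_uniprot = parser.get("Addresses", "api_uniprot")

print(db_root)
print(api_uniprot)
```

To read a file on disk instead of an inline string, `parser.read("config.ini")` can be used in place of `read_string`.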

Example Usage

Example usage is currently provided as a Jupyter Notebook, which can be viewed with GitHub's file viewer or with the Jupyter nbviewer.

You can download the Jupyter Notebook from GitHub and test it with your ProteoFAV installation.

Contributing and Bug tracking

Feel free to fork, clone, share and distribute. If you find any bugs or issues, please log them in the issue tracker.

Before you submit a pull request, read the Contributing Guide.

Credits

See the Credits

Changelog

See the Changelog

Licensing

The MIT License (MIT). See license for details.