Protein Features, Annotations and Variants
ProteoFAV is a Python module that addresses the challenge of cross-mapping protein structures and protein sequences, allowing protein structures to be annotated with sequence features and annotations. It implements methods for working with protein structures (via mmCIF, PDB, PDB Validation, DSSP and SIFTS files), sequence features (via UniProt GFF annotations) and genetic variants (via UniProt/EBI Proteins, Ensembl REST and TCGA Pan-cancer APIs). Cross-mapping of structure and sequence is performed with the aid of SIFTS.
ProteoFAV relies heavily on the Pandas library to quickly load data into DataFrames for fast data exploration and analysis. Structure and sequence data are parsed or fetched into Pandas DataFrames that are then merged together (collapsed) into a single DataFrame.
The classes and methods handle protein structures (sequence and 3D atomic coordinates) with their annotations from structural analysis (e.g. interacting interfaces, secondary structure and solvent accessibility), as well as protein sequences with their annotations (e.g. genetic variants and other functional information obtained from SIFTS and UniProt). Each modular component table can then be integrated into a single 'merged table'.
The methods implemented in proteofav/mergers.py allow the different components to be merged into a single Pandas DataFrame.
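The merging idea can be illustrated with plain Pandas. The sketch below does not use the ProteoFAV API; the tables, column names (pdb_chain, pdb_resnum, uniprot_pos, variant) and values are hypothetical and exist only to show how a residue-level mapping table, in the spirit of SIFTS, lets a structure table and a sequence annotation table be collapsed into one DataFrame.

```python
import pandas as pd

# Hypothetical structure table: one row per residue in a PDB chain.
structure = pd.DataFrame({
    "pdb_chain": ["A", "A", "A"],
    "pdb_resnum": [10, 11, 12],
    "pdb_resname": ["MET", "LYS", "THR"],
})

# Hypothetical SIFTS-like mapping: PDB residue number -> UniProt position.
mapping = pd.DataFrame({
    "pdb_chain": ["A", "A", "A"],
    "pdb_resnum": [10, 11, 12],
    "uniprot_pos": [1, 2, 3],
})

# Hypothetical sequence annotation table keyed by UniProt position.
variants = pd.DataFrame({
    "uniprot_pos": [2, 3, 4],
    "variant": ["K2R", "T3A", "G4D"],
})

# Left merges keep every structure residue; residues without an
# annotation end up with NaN in the annotation columns.
merged = (
    structure
    .merge(mapping, on=["pdb_chain", "pdb_resnum"], how="left")
    .merge(variants, on="uniprot_pos", how="left")
)
print(merged)
```

Left joins are used here so that the structure table drives the result; a variant at a position with no resolved residue (position 4 above) is simply dropped from the merged table.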
ProteoFAV was developed to support Python 3.5+ and Pandas 0.20+. Check the requirements for the specific dependencies.
To install the stable release, run this command in your terminal:
$ pip install proteofav
If you don't have pip installed, the Python installation guide can walk you through the process.
Getting ProteoFAV:
$ wget https://github.com/bartongroup/ProteoFAV/archive/master.zip -O ProteoFAV.zip
$ unzip ProteoFAV.zip
# alternatively, cloning the git repository
$ git clone https://github.com/bartongroup/ProteoFAV.git
Installing with Virtualenv:
$ virtualenv --python `which python` env
$ source env/bin/activate
$ pip install -r requirements.txt
$ python path/to/ProteoFAV/setup.py install
Installing With Conda:
$ conda-env create -n proteofav -f path/to/ProteoFAV/requirements.txt
$ source activate proteofav
$ cd path/to/ProteoFAV
$ pip install .
Test dependencies should be resolved with:
$ python path/to/ProteoFAV/setup.py develop --user
Run the Tests with:
$ python path/to/ProteoFAV/setup.py test
# or
$ cd path/to/ProteoFAV/tests
$ python -m unittest discover
ProteoFAV uses a configuration file, config.ini, where the user can specify directory paths as well as URLs for commonly used data sources.
After installing, run:
$ proteofav-setup
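The contents of config.ini are not shown here. As a rough sketch of how an INI file holding directory paths and URLs can be read with Python's standard configparser module, the example below uses hypothetical section names, keys and values; they are assumptions for illustration and not ProteoFAV's actual settings.

```python
import configparser

# Hypothetical INI content; section and key names are illustrative only.
example_ini = """
[Paths]
data_root = /tmp/proteofav_data

[URLs]
sifts_source = https://example.org/sifts/
"""

config = configparser.ConfigParser()
config.read_string(example_ini)

# Look up a directory path and a data-source URL by section and key.
data_root = config.get("Paths", "data_root")
sifts_source = config.get("URLs", "sifts_source")
print(data_root)
print(sifts_source)
```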
Example usage is currently provided as a Jupyter Notebook, which can be viewed with GitHub's file viewer or with the Jupyter nbviewer.
You can download the Jupyter Notebook from GitHub and test it with your ProteoFAV installation.
Feel free to fork, clone, share and distribute. If you find any bugs or issues, please log them in the issue tracker.
Before you submit your pull requests, read the Contributing Guide.
See the Credits
See the Changelog
The MIT License (MIT). See license for details.