QuickGOProteinAnnotation

The database QuickGO provides function annotations for proteins identified by their UniProt ID. Grouping proteins by function rather than by family extends protein associations beyond evolutionary relationships. However, a protein can have several functions (e.g. receptor tyrosine kinases), so it is not necessarily assigned to a single group.

The code provided here extracts annotations from QuickGO, either all of them or a specified subset.

Installation in Conda

If not already installed, install pip and git:

conda install git
conda install pip

Then install via pip:

pip install git+https://github.com/c-feldmann/QuickGOProteinAnnotation

Quickstart

From Terminal

The script annotate_protein_list.py takes an input file (here: demo_data/demo_uniprot_ids.tsv) in which proteins are listed in the column "uniprot_id". The results are saved as a tab-separated file to demo_data/demo_output.tsv.

python annotate_protein_list.py -i demo_data/demo_uniprot_ids.tsv -o demo_data/demo_output.tsv -c "uniprot_id" -s tab

Argument	Explanation
-i	input file
-o	output file
-c	column name
-s	separator

The default separator (-s) is "tab", and the default output file is go_function_annotation.tsv.
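To make the interface above concrete, here is a minimal sketch of how the documented options could map onto Python's argparse, together with reading the ID column from the input file. This is not the script's actual source: the long option names, treating -i and -c as required, interpreting "tab" as a tab character, and the helpers build_parser and read_uniprot_ids are all assumptions for illustration.

```python
import argparse
import csv


def build_parser():
    # Hypothetical re-creation of the documented options; long names are assumed.
    parser = argparse.ArgumentParser(description="Annotate UniProt IDs with QuickGO functions.")
    parser.add_argument("-i", "--input", required=True, help="input file")
    parser.add_argument("-o", "--output", default="go_function_annotation.tsv", help="output file")
    parser.add_argument("-c", "--column", required=True, help="column name holding UniProt IDs")
    parser.add_argument("-s", "--separator", default="tab", help="separator")
    return parser


def read_uniprot_ids(path, column, separator):
    # Assumption: the keyword "tab" stands for a tab character; other values are used literally.
    delimiter = "\t" if separator == "tab" else separator
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        return [row[column] for row in reader]
```

For example, build_parser().parse_args(["-i", "demo_data/demo_uniprot_ids.tsv", "-c", "uniprot_id"]) leaves the output file and separator at their documented defaults.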

In Python

A short example of how to use this package in Python:

from go_protein_annotation import DefaultAnnotation

test_proteins = ["Q16512", "P30085", "P25774"]
default_annotation = DefaultAnnotation()
protein_class_df = default_annotation.annotate_proteins(test_proteins)

protein_class_df

	uniprot_id	protein_function
0	Q16512	Transcription regulator
1	Q16512	Kinase
2	P30085	Kinase
3	P25774	Peptidase
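Because a protein can carry several functions, the result is in long format with one row per (protein, function) pair, so Q16512 appears twice. To collect one entry per protein, the rows can be grouped. A minimal plain-Python sketch, using the rows printed above typed in by hand; functions_per_protein is a hypothetical helper, not part of the package.

```python
from collections import defaultdict

# Rows as printed above for DefaultAnnotation: (uniprot_id, protein_function).
rows = [
    ("Q16512", "Transcription regulator"),
    ("Q16512", "Kinase"),
    ("P30085", "Kinase"),
    ("P25774", "Peptidase"),
]


def functions_per_protein(pairs):
    """Collect all assigned functions per protein, keeping first-seen order."""
    grouped = defaultdict(list)
    for uniprot_id, function in pairs:
        if function not in grouped[uniprot_id]:
            grouped[uniprot_id].append(function)
    return dict(grouped)


print(functions_per_protein(rows))
# {'Q16512': ['Transcription regulator', 'Kinase'], 'P30085': ['Kinase'], 'P25774': ['Peptidase']}
```

With pandas, a similar result can be obtained from the returned DataFrame via protein_class_df.groupby("uniprot_id")["protein_function"].apply(list).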

Details

QuickGO functions are ordered hierarchically. For example, an explicit annotation of peptidase activity also implies hydrolase activity. The code extracts all explicit function annotations and extends them with the implied ones.
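The extension step amounts to collecting every ancestor of each explicit term in the hierarchy. A minimal sketch, assuming the hierarchy is available as a dictionary mapping a term to its parent terms. The PARENTS map below is a simplified, hand-made excerpt (labels only, intermediate terms skipped), not data from QuickGO, and the package may traverse the ontology differently.

```python
# Simplified, hand-made excerpt of the GO "is a" hierarchy (labels only,
# intermediate terms skipped); the real graph comes from QuickGO.
PARENTS = {
    "peptidase activity": ["hydrolase activity"],
    "hydrolase activity": ["catalytic activity"],
    "protein kinase activity": ["kinase activity"],
    "kinase activity": ["catalytic activity"],
    "catalytic activity": [],
}


def extend_with_implicit(explicit_terms, parents):
    """Return the explicit terms plus every ancestor reachable through `parents`."""
    result = set()
    stack = list(explicit_terms)
    while stack:
        term = stack.pop()
        if term in result:
            continue  # already visited; a term can be reached via several paths
        result.add(term)
        stack.extend(parents.get(term, []))
    return result


print(sorted(extend_with_implicit({"peptidase activity"}, PARENTS)))
# ['catalytic activity', 'hydrolase activity', 'peptidase activity']
```

A depth-first walk with a visited set is used because the GO hierarchy is a directed acyclic graph, not a tree, so the same ancestor can be reached more than once.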

All Protein Functions

To obtain all function annotations for a protein, use the class AllFunctionAnnotation.

from go_protein_annotation import AllFunctionAnnotation
all_functions = AllFunctionAnnotation()

# For a single protein
all_functions_q16512 = all_functions.get_protein_functions("Q16512")

# For a list of proteins
protein_functions = all_functions.annotate_proteins(["Q16512", "P30085"])

all_functions_q16512.head(10)

	uniprot_id	go_id	protein_function
0	Q16512	GO:0005515	protein binding
1	Q16512	GO:0035639	purine ribonucleoside triphosphate binding
2	Q16512	GO:0000166	nucleotide binding
3	Q16512	GO:1901363	heterocyclic compound binding
4	Q16512	GO:0050681	androgen receptor binding
5	Q16512	GO:0140110	transcription regulator
6	Q16512	GO:0017076	purine nucleotide binding
7	Q16512	GO:0019901	protein kinase binding
8	Q16512	GO:0042826	histone deacetylase binding
9	Q16512	GO:0035257	nuclear hormone receptor binding

protein_functions.groupby("uniprot_id").nunique()

	go_id	protein_function
uniprot_id
P30085	30	30
Q16512	55	55

A Subset of Protein Functions

It is often useful to extract only a subset of protein functions. This can be done with the class SelectedFunctionAnnotation.

from go_protein_annotation import SelectedFunctionAnnotation

selected_functions = {"GO:0016301",  # Kinase activity
                      "GO:0140110",  # Transcription regulator activity
                      "GO:0008233",  # Peptidase activity
                      }
sel_function_extraction = SelectedFunctionAnnotation(selected_functions)
out = sel_function_extraction.get_protein_functions("Q16512")

out

	uniprot_id	go_id	protein_function
0	Q16512	GO:0140110	transcription regulator
1	Q16512	GO:0016301	kinase
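Conceptually, the selection keeps only those (hierarchy-extended) annotations whose GO ID is in the chosen set. A minimal plain-Python sketch on a few (uniprot_id, go_id, protein_function) rows of Q16512 taken from the outputs above; select_functions is a hypothetical helper, and this is not necessarily how SelectedFunctionAnnotation works internally.

```python
# A few (uniprot_id, go_id, protein_function) rows from the full annotation of Q16512.
all_rows = [
    ("Q16512", "GO:0005515", "protein binding"),
    ("Q16512", "GO:0140110", "transcription regulator"),
    ("Q16512", "GO:0016301", "kinase"),
    ("Q16512", "GO:0000166", "nucleotide binding"),
]

selected_functions = {"GO:0016301", "GO:0140110", "GO:0008233"}


def select_functions(rows, selected):
    """Keep only rows whose GO ID is in the selected set, preserving order."""
    return [row for row in rows if row[1] in selected]


for row in select_functions(all_rows, selected_functions):
    print(row)
```

Peptidase activity (GO:0008233) is selected but not annotated for Q16512, so it contributes no row.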

User defined Protein Annotations

Users can also define their own groups. Each group is specified by three elements:

Required functions: a set of functions a protein must have to be assigned to this group.
Excluded (non-permitted) functions: a set of functions that must not overlap with the protein's functions.
A name: the label assigned to proteins that match the group.

The class implementing these groups, SpecialFunctionAnnotation, is also used to define the class DefaultAnnotation. The individual group definitions can be found in the file go_protein_annotation/default_use.py. A simple example that separates protein kinases from other kinases and from non-kinases:

from go_protein_annotation import SpecialFunctionAnnotation

test_proteins = ["Q16512", "P30085", "P25774"]

# Must have 'GO:0004672' (protein kinase activity)
# No excluded functions
# Name: "Protein kinase"
protein_kinases = ({"GO:0004672"}, set(), "Protein kinase")

# Must have 'GO:0016301' (kinase activity)
# Must not have 'GO:0004672' (protein kinase activity)
# Name: "Other kinase"
other_kinases = ({"GO:0016301"}, {"GO:0004672"}, "Other kinase")

# No required functions (every protein satisfies this condition)
# Must not have 'GO:0016301' (kinase activity)
# Name: "Non-kinase"
non_kinases = (set(), {"GO:0016301"}, "Non-kinase")

example_classification = SpecialFunctionAnnotation([protein_kinases, other_kinases, non_kinases])

test_protein_annotations = example_classification.annotate_proteins(test_proteins)

test_protein_annotations

	uniprot_id	protein_function
0	Q16512	Protein kinase
1	P30085	Other kinase
2	P25774	Non-kinase
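The rule for each (required, excluded, name) group can be expressed as a subset test plus a disjointness test. A minimal sketch, assuming a protein is represented by the set of its hierarchy-extended GO IDs and that every matching group is reported (consistent with Q16512 appearing twice in the DefaultAnnotation output). assign_groups is a hypothetical helper, not part of the package, and the example function sets are made up and heavily abbreviated.

```python
# Groups as (required, excluded, name), mirroring the tuples above.
GROUPS = [
    ({"GO:0004672"}, set(), "Protein kinase"),
    ({"GO:0016301"}, {"GO:0004672"}, "Other kinase"),
    (set(), {"GO:0016301"}, "Non-kinase"),
]


def assign_groups(functions, groups):
    """Return the names of all groups whose rule the protein's GO IDs satisfy.

    `functions` must already include implied (parent) terms, so a protein
    annotated with protein kinase activity also carries kinase activity.
    """
    return [
        name
        for required, excluded, name in groups
        if required <= functions and not (excluded & functions)
    ]


# Hypothetical, hierarchy-extended function sets.
print(assign_groups({"GO:0004672", "GO:0016301"}, GROUPS))  # ['Protein kinase']
print(assign_groups({"GO:0016301"}, GROUPS))                # ['Other kinase']
print(assign_groups({"GO:0008233"}, GROUPS))                # ['Non-kinase']
```

An empty required set is a subset of every function set, which is why the "Non-kinase" group is decided by its excluded functions alone.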

Default Function Definition

The default group definitions are in go_protein_annotation/default_use.py. A detailed explanation will follow.

Miscellaneous

Only QuickGO protein function (molecular function) annotations are used. QuickGO also provides information about involvement in biological processes; these annotations are not considered.
The classes AllFunctionAnnotation and SelectedFunctionAnnotation accept the keyword argument alternative_name_dict:
- Keys: GO IDs
- Values: alternative names
The classes AllFunctionAnnotation and SelectedFunctionAnnotation accept the keyword argument simplify_name:
- True (default): " activity" is removed from each protein function name (e.g. "kinase activity" -> "kinase")
- False: protein functions are named as given by QuickGO
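The two naming options can be illustrated with a small stand-alone function. This is a hypothetical re-creation, not the package's code: display_name is an illustrative helper, and it assumes that an entry in alternative_name_dict takes precedence over the QuickGO name and that simplification is a plain removal of the substring " activity".

```python
def display_name(go_id, quickgo_name, alternative_name_dict=None, simplify_name=True):
    """Hypothetical re-creation of the two naming options described above.

    Assumption: an entry in alternative_name_dict takes precedence over the
    QuickGO name; otherwise " activity" is removed when simplify_name is True.
    """
    if alternative_name_dict and go_id in alternative_name_dict:
        return alternative_name_dict[go_id]
    if simplify_name:
        return quickgo_name.replace(" activity", "")
    return quickgo_name


print(display_name("GO:0016301", "kinase activity"))                            # kinase
print(display_name("GO:0016301", "kinase activity", simplify_name=False))       # kinase activity
print(display_name("GO:0016301", "kinase activity", {"GO:0016301": "Kinase"}))  # Kinase
```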
