An implementation of the Proteomic Ruler algorithm available in Perseus, described in Wiśniewski, J. R., Hein, M. Y., Cox, J. and Mann, M. (2014) A “Proteomic Ruler” for Protein Copy Number and Concentration Estimation without Spike-in Standards. Mol Cell Proteomics 13, 3497–3506.
It is used to estimate protein copy numbers from deep proteome profiling experiments.
Python >= 3.9
pip install proteomicruler
To use the package, load the input data into a pandas.DataFrame object. The following basic parameters are also required:
- accession_id_col: column name that contains protein accession ids
- mw_col: column name that contains the molecular weight of proteins
- ploidy: ploidy number
- total_cellular_protein_concentration: total cellular protein concentration used for calculation of total volume
- intensity_columns: list of column names that contain sample intensities
import pandas as pd
accession_id_col = "Protein IDs"
# used as unique index and to directly fetch mw data from UniProt
mw_col = "Mass"
# molecular weight column name
ploidy = 2
# ploidy number
total_cellular_protein_concentration = 200
# cellular protein concentration used for calculation of total volume
filename = r"example_data\example_data.tsv" # example data from Perseus
df = pd.read_csv(filename, sep="\t")
# select the 16 columns, starting at positional index 57, that contain sample intensities
intensity_columns = df.columns[57:57+16]
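Positional slicing depends on the exact column layout of the input file. As a hedged alternative, the intensity columns can be picked by name. The sketch below assumes the sample columns share a common prefix such as `Intensity ` (a MaxQuant-style naming convention, not something this package requires). The toy table and the helper name `select_intensity_columns` are hypothetical.

```python
import pandas as pd


def select_intensity_columns(df: pd.DataFrame, prefix: str = "Intensity ") -> list[str]:
    """Return the names of all columns that start with the given prefix."""
    return [col for col in df.columns if col.startswith(prefix)]


# Toy table standing in for the real input file (hypothetical values)
toy_df = pd.DataFrame(
    {
        "Protein IDs": ["P1", "P2"],
        "Mass": [50000.0, 20000.0],
        "Intensity": [3.0e9, 1.5e9],  # summed column, no sample name after the prefix
        "Intensity sample_1": [1.0e9, 5.0e8],
        "Intensity sample_2": [2.0e9, 1.0e9],
    }
)

intensity_columns = select_intensity_columns(toy_df)
print(intensity_columns)
```

Because the default prefix ends with a space, the summed `Intensity` column is not selected, only the per-sample columns.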
If the data does not contain molecular weight information, it has to be fetched from UniProt.
from proteomicRuler.ruler import add_mw
df = add_mw(df, accession_id_col)
df = df[pd.notnull(df[mw_col])]
df[mw_col] = df[mw_col].astype(float)
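Molecular weights read from a text file may contain empty or non-numeric entries, in which case `astype(float)` raises an error. A minimal sketch of a more forgiving cleanup, shown on a hypothetical toy table, converts unparseable values to NaN with `pd.to_numeric` before dropping them:

```python
import pandas as pd

mw_col = "Mass"

# Toy table with one missing and one non-numeric molecular weight (hypothetical values)
toy_df = pd.DataFrame(
    {
        "Protein IDs": ["P1", "P2", "P3", "P4"],
        "Mass": ["50000", None, "n/a", "20000.5"],
    }
)

# Unparseable entries become NaN instead of raising an error
toy_df[mw_col] = pd.to_numeric(toy_df[mw_col], errors="coerce")

# Keep only rows with a usable molecular weight
clean_df = toy_df[toy_df[mw_col].notna()].reset_index(drop=True)
print(clean_df)
```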
The Ruler object can be created by passing the DataFrame
object and the required parameters.
from proteomicRuler.ruler import Ruler
ruler = Ruler(df, intensity_columns, mw_col, accession_id_col, ploidy, total_cellular_protein_concentration)
ruler.df.to_csv("output.txt", sep="\t", index=False)
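For orientation, the central idea of the histone-based ruler in the cited paper is that the summed histone signal scales with the amount of DNA per cell. Each protein's mass per cell is then its signal relative to the histone signal, multiplied by the DNA mass per cell. The sketch below illustrates that arithmetic on made-up numbers and is not the package's implementation. The function name `ruler_sketch`, the `is_histone` flag column, the genome size, the average base pair mass, molecular weights in Da, and a concentration in g/l are all assumptions chosen for illustration.

```python
import pandas as pd

AVOGADRO = 6.02214076e23   # molecules per mole
BASE_PAIR_MASS_DA = 615.9  # assumed approximate average mass of one base pair in Da


def ruler_sketch(df, intensity_col, mw_col, histone_col, ploidy,
                 genome_size_bp, total_protein_conc_g_per_l):
    """Estimate copies per cell and total cell volume from one intensity column."""
    # DNA mass per cell in grams: ploidy x haploid genome size x mass of one base pair
    dna_mass_g = ploidy * genome_size_bp * BASE_PAIR_MASS_DA / AVOGADRO
    # The summed histone signal acts as the "ruler" for the DNA mass
    histone_signal = df.loc[df[histone_col], intensity_col].sum()
    # Protein mass per cell in grams, scaled by the histone signal
    mass_g = df[intensity_col] / histone_signal * dna_mass_g
    # Copies per cell: mass (g) / molecular weight (g/mol) x Avogadro's number
    copies = mass_g / df[mw_col] * AVOGADRO
    # Total cell volume in litres from the assumed protein concentration (g/l)
    volume_l = mass_g.sum() / total_protein_conc_g_per_l
    return copies, volume_l


# Toy table: one histone and one other protein (hypothetical values)
toy_df = pd.DataFrame(
    {
        "Mass": [15000.0, 50000.0],
        "is_histone": [True, False],
        "Intensity sample_1": [1.0e9, 2.0e9],
    }
)

copies, volume_l = ruler_sketch(
    toy_df, "Intensity sample_1", "Mass", "is_histone",
    ploidy=2, genome_size_bp=3.2e9, total_protein_conc_g_per_l=200,
)
print(copies)
print(volume_l)
```

No protein standard is needed in this scheme because the DNA content per cell, fixed by ploidy and genome size, provides the absolute scale.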
It is also possible to use the package through the command line interface.
Usage: ruler [OPTIONS]

Options:
  -i, --input FILENAME            Input file containing intensity of samples and
                                  uniprot accession ids
  -o, --output FILENAME           Output file
  -p, --ploidy INTEGER            Ploidy of the organism
  -t, --total-cellular FLOAT      Total cellular protein concentration
  -m, --mw-column TEXT            Molecular weight column name
  -a, --accession-id-col TEXT     Accession id column name
  -c, --intensity-columns TEXT    Intensity columns list delimited by commas
  -g, --get-mw                    Get molecular weight from uniprot
  --help                          Show this message and exit.
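As an illustration, the options above can be combined into a single call. The sketch below only assembles and prints the command line from Python. The file names and intensity column names are placeholders, and the `ruler` executable is assumed to be on the PATH after installation.

```python
import shlex

# Placeholder column names; replace with the intensity columns of the input file
intensity_columns = ["Intensity sample_1", "Intensity sample_2"]

command = [
    "ruler",
    "-i", "example_data.tsv",           # placeholder input file
    "-o", "output.txt",                 # placeholder output file
    "-p", "2",                          # ploidy
    "-t", "200",                        # total cellular protein concentration
    "-m", "Mass",                       # molecular weight column name
    "-a", "Protein IDs",                # accession id column name
    "-c", ",".join(intensity_columns),  # comma-delimited intensity columns
]

# shlex.join quotes arguments containing spaces so the line can be pasted into a POSIX shell
command_line = shlex.join(command)
print(command_line)
```

The same `command` list could also be passed to `subprocess.run(command, check=True)` to run the tool directly without going through a shell.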