Merge pull request #264 from SysBioChalmers/feat/readme
doc: README for GECKO root and databases
edkerk authored Mar 5, 2023
2 parents 9acd37c + 89b8731 commit 983c490
Showing 44 changed files with 245 additions and 478 deletions.
115 changes: 35 additions & 80 deletions README.rst
@@ -1,126 +1,81 @@
.. image:: GECKO.png
:align: center

|Current Version| |Tests passing| |Build Status| |PyPI Version| |Docs Status| |Gitter|
|Current Version| |Tests passing| |Gitter| |Zenodo|

About GECKO
-----------

The **GECKO** toolbox is a Matlab/Python package for enhancing a **G**\ enome-scale model to account for **E**\ nzyme **C**\ onstraints, using **K**\ inetics and **O**\ mics. It is the companion software to `this <http://www.dx.doi.org/10.15252/msb.20167411>`_ publication, and it has two main parts:
The **GECKO** toolbox is able to enhance a **G**\ enome-scale model to account for **E**\ nzyme **C**\ onstraints, using **K**\ inetics and **O**\ mics. The resulting enzyme-constrained model (**ecModel**) can be used to perform simulations where enzyme allocation is either drawn from a total protein pool, or constrained by measured protein levels from proteomics data.

- ``geckomat``: Matlab+Python scripts to fetch online data and build/simulate enzyme-constrained models.
- ``geckopy``: a Python package which can be used with `cobrapy <https://opencobra.github.io/cobrapy/>`_ to obtain a ecYeastGEM model object, optionally adjusted for provided proteomics data.
**Note:** Due to significant refactoring of the code, ecModels generated with GECKO versions 1 or 2 are not compatible with GECKO 3, and *vice versa*. The latest GECKO 2 release is available `here <https://github.com/SysBioChalmers/GECKO/releases/tag/v2.0.3>`_, while the ``gecko2`` branch is retained.

Last update: 2021-02-17
**Citation**

This repository is administered by Benjamin J. Sanchez (`@BenjaSanchez <https://github.com/benjasanchez>`_), Division of Systems and Synthetic Biology, Department of Biology and Biological Engineering, Chalmers University of Technology.
- A GECKO 3 publication is currently under consideration; citation information will appear here in due course.
- For GECKO release 2, please cite `Domenzain et al. (2022) <https://doi.org/10.1038/s41467-022-31421-1>`_.
- For GECKO release 1, please cite `Sánchez et al. (2017) <https://doi.org/10.15252/msb.20167411>`_.

Last update: 2023-03-05

geckomat: Building enzyme-constrained models
--------------------------------------------

Required software - Python module
Required software
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- `Python 2.7 <https://www.python.org/>`_
- `setuptools for python 2.7 <http://www.lfd.uci.edu/~gohlke/pythonlibs/#setuptools>`_
- SOAPpy:

::

easy_install-2.7 SOAPpy
- MATLAB version 2019b or later; no additional MathWorks toolboxes are required.
- `RAVEN <https://github.com/SysBioChalmers/RAVEN>`_ Toolbox version 2.7.12 or later.
- `Gurobi Optimizer <https://www.gurobi.com/solutions/gurobi-optimizer/>`_ is recommended for simulations (free academic license available). Alternatively, the open-source GNU Linear Programming Kit (`GLPK <https://www.gnu.org/software/glpk/>`_, distributed with RAVEN) or SoPlex as part of the `SCIP Optimization Suite <https://scipopt.org/>`_ can be used.
- `Docker <https://www.docker.com/>`_ for running DLKcat.

Required software - Matlab module
Installation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- `MATLAB <http://www.mathworks.com/>`_ 9.1 (R2016b) or higher + Optimization Toolbox.
- The `COBRA toolbox for MATLAB <https://github.com/opencobra/cobratoolbox>`_.
- The `RAVEN toolbox for MATLAB <https://github.com/SysBioChalmers/RAVEN>`_.
- The `libSBML MATLAB API <https://sourceforge.net/projects/sbml/files/libsbml/MATLAB%20Interface>`_ (version 5.17.0 is recommended).
**GECKO toolbox**

Usage
~~~~~
- The preferred way to download GECKO is via git clone::

- **For creating an enzyme constrained model:**
git clone --depth=1 https://github.com/SysBioChalmers/GECKO

- Update the following data files in ``/databases`` with your organism infomation:
- Alternatively, a `ZIP-archive <https://github.com/SysBioChalmers/GECKO/releases>`_ can be directly downloaded from GitHub. The ZIP-archive should be extracted to a disk location where the user has read- and write-access rights.

- ``databases/prot_abundance.txt``: Protein abundance Data from Pax-DB. If data is not available for your organism, then a relative proteomics dataset (in molar fractions) can be used instead. The required format is a tab-separated file, named as ``databases/relative_proteomics.txt`` , with a single header line and 2 columns; the first with gene IDs and the second with the relative abundances for each protein.
- ``databases/uniprot.tab``: Gene-proteins data from uniprot.
- ``databases/chemostatData.tsv``: Chemostat data for estimating GAM (optional, called by ``fitGAM.m``).
- ``databases/manual_data.txt``: Kcat data from eventual manual curations (optional, called by ``manualModifications.m``).
- After git clone or extracting the ZIP-archive, the user should navigate in MATLAB to the GECKO folder. GECKO can then be installed with the command that adds GECKO (sub-)folders to the MATLAB path::

- Adapt the following functions in ``/geckomat`` to your organism:
cd('C:\path\to\GECKO') % Modify to match GECKO folder and OS
GECKOInstaller.install

- ``geckomat/getModelParameters.m``
- ``geckomat/change_model/manualModifications.m``
- ``geckomat/limit_proteins/sumProtein.m``
- ``geckomat/limit_proteins/scaleBioMass.m``
- ``geckomat/kcat_sensitivity_analysis/changeMedia_batch.m`` (optional)
- ``geckomat/change_model/removeIncorrectPathways.m`` (optional, called by ``manualModifications.m``)
- ``geckomat/limit_proteins/sumBioMass.m`` (optional, called by ``sumProtein.m`` & ``scaleBiomass.m``)
- If desired, a removal command is available as::

- Run ``geckomat/get_enzyme_data/updateDatabases.m`` to update ``ProtDatabase.mat``.
- Run ``geckomat/enhanceGEM.m`` with your metabolic model as input.
GECKOInstaller.uninstall

- **For performing simulations with an enzyme-constrained model:** Enzyme-constrained models can be used as any other metabolic model, with toolboxes such as COBRA or RAVEN. For more information on rxn/met naming convention, see the supporting information of `Sanchez et al. (2017) <https://dx.doi.org/10.15252/msb.20167411>`_
**RAVEN Toolbox and Gurobi**

geckopy: Integrating proteomic data to ecYeastGEM
-------------------------------------------------
- The RAVEN Toolbox Wiki contains installation instructions for both `RAVEN Toolbox <https://github.com/SysBioChalmers/RAVEN/wiki/Installation>`_ and `Gurobi <https://github.com/SysBioChalmers/RAVEN/wiki/Installation#solvers>`_.

If all you need is the ecYeastGEM model to use together with cobrapy you can use the ``geckopy`` Python package.
- Briefly, RAVEN is either downloaded via git clone, downloaded as a ZIP-archive from GitHub, or installed as a `MATLAB AddOn <https://se.mathworks.com/matlabcentral/fileexchange/112330-raven-toolbox>`_.

Required software
~~~~~~~~~~~~~~~~~
- After finishing all installation instructions, the user should run installation checks in MATLAB with::

- Python 3.6, 3.7 or 3.8
- cobrapy
checkInstallation

Installation
~~~~~~~~~~~~
**Docker**

::
- Installation instructions are available at https://docs.docker.com/get-docker/.

pip install geckopy

Usage
~~~~~

.. code:: python
Getting started
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

from geckopy import GeckoModel
import pandas
some_measurements = pandas.Series({'P00549': 0.1, 'P31373': 0.1, 'P31382': 0.1})
model = GeckoModel('multi-pool')
model.limit_proteins(some_measurements)
model.optimize()
In the GECKO folder, ``protocol.m`` contains instructions on how to reconstruct and analyze an ecModel for *S. cerevisiae*.

Contributing
------------

Contributions are always welcome! Please read the `contributing guidelines <https://github.com/SysBioChalmers/GECKO/blob/devel/.github/CONTRIBUTING.md>`_ to get started.

Contributors
------------

- Ivan Domenzain (`@IVANDOMENZAIN <https://github.com/IVANDOMENZAIN>`_), Chalmers University of Technology, Gothenburg Sweden
- Eduard Kerkhoven (`@edkerk <https://github.com/edkerk>`_), Chalmers University of Technology, Gothenburg Sweden
- Benjamin J. Sanchez (`@BenjaSanchez <https://github.com/benjasanchez>`_), Chalmers University of Technology, Gothenburg Sweden
- Moritz Emanuel Beber (`@Midnighter <https://github.com/Midnighter>`_), Danish Technical University, Lyngby Denmark
- Henning Redestig (`@hredestig <https://github.com/hredestig>`_), Danish Technical University, Lyngby Denmark
- Cheng Zhang, Science for Life Laboratory, KTH - Royal Institute of Technology, Stockholm Sweden

.. |Current Version| image:: https://badge.fury.io/gh/sysbiochalmers%2Fgecko.svg
:target: https://badge.fury.io/gh/sysbiochalmers%2Fgecko
.. |Tests passing| image:: https://github.com/SysBioChalmers/GECKO/actions/workflows/tests.yml/badge.svg?branch=main
:target: https://github.com/SysBioChalmers/GECKO/actions
.. |Build Status| image:: https://travis-ci.com/SysBioChalmers/GECKO.svg?branch=master
:target: https://travis-ci.com/SysBioChalmers/GECKO
.. |PyPI Version| image:: https://badge.fury.io/py/geckopy.svg
:target: https://badge.fury.io/py/geckopy
.. |Docs Status| image:: https://readthedocs.org/projects/geckotoolbox/badge/?version=latest
:alt: Documentation Status
:target: http://geckotoolbox.readthedocs.io/
.. |Gitter| image:: https://badges.gitter.im/SysBioChalmers/GECKO.svg
:alt: Join the chat at https://gitter.im/SysBioChalmers/GECKO
:target: https://gitter.im/SysBioChalmers/GECKO?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge
.. |Zenodo| image:: https://zenodo.org/badge/DOI/10.5281/zenodo.7699818.svg
:target: https://doi.org/10.5281/zenodo.7699818
6 changes: 6 additions & 0 deletions databases/README.md
@@ -0,0 +1,6 @@
- `DLKcatCurrencyMets.tsv` is a table of metabolites that form pairs of currency metabolites when occurring in a reaction together (one as substrate, the other as product). This is used by `writeDLKcatInput` to filter out currency metabolites. This file is manually curated to reflect common metabolite pairs, but can be extended to include more model-specific metabolite names. This can be done either in this folder (and a pull request to the GitHub repository will make this more widely available to other users), or by keeping a copy of this file in the `data` subfolder of the model adapter folder.
- `DLKcatIgnoreMets.tsv` is a table of small metabolites/ions that `writeDLKcatInput` filters out, as DLKcat does not predict kcat values for such substrates. Modifications can be made either in this folder (and a pull request to the GitHub repository will make this more widely available to other users), or in a copy of this file kept in the `data` subfolder of the model adapter folder.
- `max_KCAT.txt` is a collation of maximum kcat values per organism, reaction and substrate, as gathered from the BRENDA database by `/src/geckopy/brenda_parser`.
- `max_MW.txt` is a collation of maximum molecular weights per organism and reaction (without explicitly referring to a protein identifier), as gathered from the BRENDA database by `/src/geckopy/brenda_parser`.
- `max_SA.txt` is a collation of maximum specific activities per organism, reaction and substrate, as gathered from the BRENDA database by `/src/geckopy/brenda_parser`.
- `PhylDist.mat` is a taxonomic tree of KEGG organisms, as generated by RAVEN Toolbox.
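
The pair-based filtering described for `DLKcatCurrencyMets.tsv` can be illustrated with a minimal Python sketch. It is not the `writeDLKcatInput` implementation (which is MATLAB). The function name `drop_currency_pairs`, the example pairs and the metabolite names are hypothetical, and no assumption is made about the actual column layout of the file.

```python
def drop_currency_pairs(substrates, products, currency_pairs):
    """Remove currency metabolites that occur as a substrate/product pair.

    A pair (a, b) is dropped only when one member is a substrate and the
    other is a product of the same reaction, in either direction.
    """
    subs, prods = set(substrates), set(products)
    drop_subs, drop_prods = set(), set()
    for a, b in currency_pairs:
        if a in subs and b in prods:
            drop_subs.add(a)
            drop_prods.add(b)
        if b in subs and a in prods:
            drop_subs.add(b)
            drop_prods.add(a)
    kept_subs = [m for m in substrates if m not in drop_subs]
    kept_prods = [m for m in products if m not in drop_prods]
    return kept_subs, kept_prods


# Hypothetical pairs, standing in for rows of DLKcatCurrencyMets.tsv
pairs = [("ATP", "ADP"), ("NADH", "NAD")]

# ATP (substrate) and ADP (product) form a pair, so both are filtered out
subs, prods = drop_currency_pairs(
    ["D-glucose", "ATP"], ["D-glucose 6-phosphate", "ADP"], pairs
)

# ATP without its partner on the other side is not treated as a currency metabolite
lone_subs, lone_prods = drop_currency_pairs(["ATP", "water"], ["AMP"], pairs)
```

Tracking substrates and products separately means a metabolite is only removed from the side of the reaction where it takes part in a pair.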
2 changes: 1 addition & 1 deletion protocol.m
@@ -58,7 +58,7 @@
modelY = loadConventionalGEM();
% modelY = importModel(fullfile(modelRoot,'models','yeast-GEM.xml')); %Alternative

% Prepare ec-model
% Prepare ecModel
[ecModel, noUniprot] = makeEcModel(modelY,false,ModelAdapter);
% Read makeEcModel documentation to get a list of all it does: it prepares
% the new model.ec structure and prepares the S-matrix by splitting
2 changes: 1 addition & 1 deletion src/geckomat/change_model/applyComplexData.m
@@ -3,7 +3,7 @@
% Apply stoichiometry for complexes in an ecModel
%
% Input:
% model an ecModel in GECKO 3 version
% model an ecModel in GECKO 3 format (with ecModel.ec structure)
% complexInfo structure as generated by getComplexData. If nothing
% is provided, an attempt will be made to read
% data/ComplexPortal.json from the obj.params.path folder
14 changes: 7 additions & 7 deletions src/geckomat/change_model/applyCustomKcats.m
@@ -1,9 +1,11 @@
function [model, rxnUpdated, notMatch] = applyCustomKcats(model, customKcats, modelAdapter)
% applyCustomKcats
% Apply user defined kcats
% Apply user defined kcats. Reads data/customKcats.tsv in the obj.params.path
% specified in the model adapter. Alternatively, a customKcats structure can
% be provided, as specified below.
%
% Input:
% model an ecModel in GECKO 3 version
% model an ecModel in GECKO 3 format (with ecModel.ec structure)
% customKcats structure with custom kcat information. If nothing
% is provided, an attempt will be made to read
% data/customKcats.tsv from the obj.params.path folder
@@ -20,9 +22,7 @@
% based on GPR rules. Then, they are suggested to be
% curated by the user
%
% A file data/customKcats.tsv will be read from the obj.params.path
% folder specified in the modelAdapter. Alternatively, a customKcats
% structure can be defined with the following fields:
% customKcats structure:
% - proteins protein identifiers, multiple for the same kcat (in case
% of a protein complex) are separated by ' + '
% - genes gene identifiers (optional, not used in matching)
@@ -41,7 +41,7 @@
% that makeEcModel introduces.
%
% Usage:
% [model, rxnUpdated, notMatch] = applyComplexData(model, customKcats, modelAdapter);
% [model, rxnUpdated, notMatch] = applyCustomKcats(model, customKcats, modelAdapter);

if nargin < 3 || isempty(modelAdapter)
modelAdapter = ModelAdapterManager.getDefaultAdapter();
@@ -73,7 +73,7 @@
customKcats.notes = fileContent{6};
customKcats.stoicho = fileContent{7};
elseif ~all(strcmp(fieldnames(customKcats),{'proteins','kcat','notes','stoicho'}))
error('The customKcats file does not have all the required fields in the header.');
error('The provided customKcats structure does not have all the required fields.');
end

rxnToUpdate = false(length(model.ec.rxns),1);
10 changes: 4 additions & 6 deletions src/geckomat/change_model/applyKcatConstraints.m
@@ -1,21 +1,19 @@
function model = applyKcatConstraints(model,updateRxns)
% applyKcatConstraints
% Applies kcat-derived enzyme constraints to an ec-model. Existing enzyme
% Applies kcat-derived enzyme constraints to an ecModel. Existing enzyme
% constraints are first removed (unless updateRxns is provided), and new
% constraints are defined based on the content of model.ec.kcat.
%
% Input:
% model ec-model that was generated by makeEcModel, or loaded from
% an earlier run. Not compatible with ec-models generated by
% earlier GECKO versions (pre 3.0).
% model an ecModel in GECKO 3 format (with ecModel.ec structure)
% updateRxns if not all enzyme constraints should be updated, this can
% be given as either a logical vector of length
% model.ec.rxns, a vector of model.ec.rxns indices, or a
% (cell array of) string(s) with model.ec.rxns identifiers.
% For light models, these reactions should match model.rxns.
%
% Output:
% model ec-model where reactions are constrained by enzyme usage
% model ecModel where reactions are constrained by enzyme usage
% if a kcat value was provided for the reaction-enzyme pair
% in model.ec.kcat
%
@@ -41,7 +39,7 @@

if ~isfield(model,'ec')
error(['No model.ec structure could be found: the provided model is'...
' not a valid GECKO3 ec-model. First run makeEcModel(model).'])
' not a valid GECKO3 ecModel. First run makeEcModel(model).'])
end
if all(model.ec.kcat==0)
warning('No kcat values are provided in model.ec.kcat, model remains unchanged.')
12 changes: 7 additions & 5 deletions src/geckomat/change_model/findMetSmiles.m
@@ -6,15 +6,14 @@
% metSmiles field, then non-empty entries will not be overwritten.
%
% Input:
% model Input model, whose model.metNames field is used to find the
% relevant SMILES
% model a model whose metNames field is used to find the relevant SMILES
% modelAdapter a loaded model adapter (Optional, will otherwise use the
% default model adapter).
% verbose logical whether progress should be reported (Optional,
% default true)
% Output:
% model Output model with model.metSmiles specified.
% noSMILES metabolite names for which no SMILES could be found.
% model model with model.metSmiles specified.
% noSMILES metabolite names for which no SMILES could be found.
%
if nargin < 3 || isempty(verbose)
verbose = true;
@@ -92,7 +91,10 @@
error('Cannot reach PubChem. Check your internet connection and try again.')
end
end
if verbose; fprintf('\b\b\b\b\b\b\b\b\b\b\b\b\bdone.\n'); end
if verbose
fprintf('\b\b\b\b\b\b\b\b\b\b\b\b\bdone.\n');
fprintf('Model-specific SMILES database stored at %s\n',smilesDBfile);
end
end
newSmiles = uniqueSmiles(uniqueIdx);
noSMILES = cellfun(@isempty,uniqueSmiles);
7 changes: 4 additions & 3 deletions src/geckomat/change_model/getComplexData.m
@@ -1,8 +1,8 @@
function complexInfo = getComplexData(organism, modelAdapter)
% getComplexData
% Download curated complex stoichiometries from the EMBL-EBI Complex
% Portal database. Writes data/ComplexPortal.json in the the
% obj.params.path specified in the ModelAdapter.
% Portal database. Writes data/ComplexPortal.json in the obj.params.path
% specified in the model adapter.
%
% Input:
% organism the organism for which complex information should be
@@ -30,7 +30,7 @@
% 2 if complex consists of sub-complexes, whose
% subunit stoichiometries are given
% Usage
% complexInfo = getComplexData('Saccharomyces cerevisiae', modelAdapter);
% complexInfo = getComplexData(organism, modelAdapter);

if nargin < 2 || isempty(modelAdapter)
modelAdapter = ModelAdapterManager.getDefaultAdapter();
@@ -176,4 +176,5 @@
fid = fopen(fullfile(params.path,'data','ComplexPortal.json'), 'w');
fprintf(fid, '%s', jsontxt);
fclose(fid);
fprintf('Model-specific ComplexPortal database stored at %s\n',fullfile(params.path,'data','ComplexPortal.json'));
end
6 changes: 3 additions & 3 deletions src/geckomat/change_model/getKcatAcrossIsoenzymes.m
@@ -6,11 +6,11 @@
% is then used to fill in model.ec.kcat.
%
% Input:
% model an ecModel in GECKO 3 version, not geckoLight
% model an ecModel in full GECKO 3 format (with ecModel.ec structure),
% not GECKO light
%
% Output:
% model an ecModel in GECKO 3 version with kcat values assigned to
% isoenzymes in model.ec.kcat
% model an ecModel with kcat values assigned to isoenzymes in model.ec.kcat
%
% Usage: model = getKcatAcrossIsoenzymes(model);
