From cb5474794462b6adcd8cc3ab612c7ccaf5af5cf7 Mon Sep 17 00:00:00 2001 From: Eduard Kerkhoven Date: Mon, 27 May 2024 11:39:13 +0200 Subject: [PATCH] feat: prot_usage always draws from prot_pool (#379) * fix: calculateFfactor can take protData input * feat: constrainEnzConcs keep prot pool draw addresses #375 * fix: updateProtPool obsolete * doc: update full_ecModel for prot_usage rxns * fix: enzymeUsage correct mention of output units solves #376 * doc: updateGECKOdoc * doc: update README.md with protocol change * fix: README.md link --- README.md | 3 + .../limit_proteins/calculateFfactor.html | 31 ++-- .../limit_proteins/constrainEnzConcs.html | 2 +- .../limit_proteins/updateProtPool.html | 146 +++++++++++------- doc/src/geckomat/utilities/enzymeUsage.html | 4 +- .../limit_proteins/calculateFfactor.m | 17 +- .../limit_proteins/constrainEnzConcs.m | 2 +- src/geckomat/limit_proteins/updateProtPool.m | 30 +++- src/geckomat/utilities/enzymeUsage.m | 2 +- tutorials/full_ecModel/protocol.m | 38 +++-- 10 files changed, 174 insertions(+), 101 deletions(-) diff --git a/README.md b/README.md index 77aab20e5..d7c451e7f 100644 --- a/README.md +++ b/README.md @@ -10,6 +10,9 @@ The **GECKO** toolbox enhances a **G**enome-scale model to account for **E**nzym 💡 In the [`GECKO/tutorials`](https://github.com/SysBioChalmers/GECKO/tree/main/tutorials) folder there are examples of how GECKO can be applied to GEMs, in either of its _full_ or _light_ forms. Each `protocol.m` contains instructions on how to reconstruct and analyze an ecModel, demonstrating how different fuctions in GECKO can be used. These two scripts complement the [protocols paper](#citation). +### Significant changes since protocol publication +- GECKO **3.2.0**: all protein usage reactions draw from the protein pool, even if they are constrained by proteomics data. This affects **Step 58** in the protocol, changing behaviour of `constrainEnzConcs` and making `updateProtPool` obsolete, `tutorials/full_ecModel/protocol.m` is updated to reflect this change. See [#357](https://github.com/SysBioChalmers/GECKO/issues/375) for more details. + _**Note:** Regarding code and model compatibility with earlier GECKO versions, see [Previous versions: GECKO 1 and 2](https://github.com/SysBioChalmers/GECKO/wiki/Previous-versions:-GECKO-1-and-2)_. ## Cite us diff --git a/doc/src/geckomat/limit_proteins/calculateFfactor.html b/doc/src/geckomat/limit_proteins/calculateFfactor.html index 0a0ad575d..855823fed 100644 --- a/doc/src/geckomat/limit_proteins/calculateFfactor.html +++ b/doc/src/geckomat/limit_proteins/calculateFfactor.html @@ -121,23 +121,22 @@

SOURCE CODE ^end -0067 -0068 % Get MW and abundance (unit does not matter, f is fraction) -0069 [~,idx] = ismember(protData.uniprot,uniprotDB.ID); -0070 protData.MW = uniprotDB.MW(idx); -0071 protData.abundance = protData.level .* protData.MW; -0072 -0073 totalProt = sum(protData.abundance); -0074 -0075 % Get enzymes in model -0076 enzymesInModel = ismember(protData.uniprot,enzymes); -0077 totalEnz = sum(protData.abundance(enzymesInModel)); -0078 -0079 f = totalEnz/totalProt; -0080 end +0066 % Get MW and abundance (unit does not matter, f is fraction) +0067 [~,idx] = ismember(protData.uniprotIDs,uniprotDB.ID); +0068 protData.MW = uniprotDB.MW(idx); +0069 protData.abundances = protData.level .* protData.MW; +0070 end +0071 +0072 totalProt = sum(protData.abundances); +0073 +0074 % Get enzymes in model +0075 enzymesInModel = ismember(protData.uniprotIDs,enzymes); +0076 totalEnz = sum(protData.abundances(enzymesInModel)); +0077 +0078 f = totalEnz/totalProt; +0079 end
Generated by m2html © 2005
\ No newline at end of file diff --git a/doc/src/geckomat/limit_proteins/constrainEnzConcs.html b/doc/src/geckomat/limit_proteins/constrainEnzConcs.html index 1ee4e2b56..9c3b7d133 100644 --- a/doc/src/geckomat/limit_proteins/constrainEnzConcs.html +++ b/doc/src/geckomat/limit_proteins/constrainEnzConcs.html @@ -103,7 +103,7 @@

SOURCE CODE ^%If non-NaN in model.ec.concs, then constrain by UB 0045 if any(protCons) -0046 model.S(protPoolIdx, usageRxnsIdx(protCons)) = 0; +0046 % model.S(protPoolIdx, usageRxnsIdx(protCons)) = 0; % Since GECKO 3.2.0 0047 model.lb(usageRxnsIdx(protCons)) = -model.ec.concs(protCons); 0048 end 0049 end diff --git a/doc/src/geckomat/limit_proteins/updateProtPool.html b/doc/src/geckomat/limit_proteins/updateProtPool.html index ce88a1a0b..e757f1f7c 100644 --- a/doc/src/geckomat/limit_proteins/updateProtPool.html +++ b/doc/src/geckomat/limit_proteins/updateProtPool.html @@ -28,8 +28,13 @@

SYNOPSIS ^DESCRIPTION ^

 updateProtPool
-   Updates the protein pool to compensate for measured proteomics data (in
-   model.ec.concs).
+   Obsolete since GECKO 3.2.0, as all (measured and unmeasured) enzymes
+   are drawn from the protein pool. Instead, use setProtPoolSize. See
+   https://github.com/SysBioChalmers/GECKO/issues/375 for explanation.
+
+   Before GECKO 3.2.0: updates the protein pool to compensate for measured
+   proteomics data (in model.ec.concs), as only the unmeasured enzymes
+   draw from the protein pool.
 
  Input:
    ecModel         an ecModel in GECKO 3 format (with ecModel.ec structure)
@@ -39,6 +44,7 @@ 

DESCRIPTION ^CROSS-REFERENCE INFORMATION ^
 <h2><a name=SOURCE CODE ^

0001 function ecModel  = updateProtPool(ecModel, Ptot, modelAdapter)
 0002 % updateProtPool
-0003 %   Updates the protein pool to compensate for measured proteomics data (in
-0004 %   model.ec.concs).
-0005 %
-0006 % Input:
-0007 %   ecModel         an ecModel in GECKO 3 format (with ecModel.ec structure)
-0008 %   Ptot            total protein content in g/gDCW, overwrites the value
-0009 %                   from modelAdapter. For instance, condition-specific
-0010 %                   fluxData.Ptot from loadFluxData can be used. If nothing
-0011 %                   is provided, the modelAdapter value is used.
-0012 %   modelAdapter    a loaded model adapter (Optional, will otherwise use the
-0013 %                   default model adapter).
-0014 % Output:
-0015 %   model           an ecModel where model.ec.concs is populated with
-0016 %                   protein concentrations
-0017 % Usage:
-0018 %   ecModel  = updateProtPool(ecModel, Ptot, modelAdapter)
-0019 
-0020 if nargin < 3 || isempty(modelAdapter)
-0021     modelAdapter = ModelAdapterManager.getDefault();
-0022     if isempty(modelAdapter)
-0023         error('Either send in a modelAdapter or set the default model adapter in the ModelAdapterManager.')
-0024     end
-0025 end
-0026 params = modelAdapter.params;
-0027 
-0028 if nargin < 2 || isempty(Ptot)
-0029     Ptot = params.Ptot;
-0030     disp(['Total protein content used: ' num2str(Ptot) ' [g protein/gDw]'])
-0031 end
-0032 
-0033 % Convert Ptot to mg/gDW if provided in g/gDCW (which is default)
-0034 if Ptot < 1
-0035     Ptot = Ptot * 1000;
-0036 end
-0037 
-0038 Pmeas = sum(ecModel.ec.concs,'omitnan');
-0039 if Pmeas == 0
-0040     error('The model has not yet been constrained with proteomics, as ecModel.ec.concs is empty.')
-0041 end
-0042 
-0043 Pnew = (Ptot - Pmeas) * params.f;
-0044 
-0045 if Pnew > 0
-0046     PoolRxnIdx = strcmp(ecModel.rxns,'prot_pool_exchange');
-0047     ecModel.lb(PoolRxnIdx) = -Pnew*params.sigma;
-0048     sol = solveLP(ecModel);
-0049     if isempty(sol.x)
-0050         error(['Estimating the remaining protein pool by subtracting the ' ...
-0051                'sum of measured enzyme concentrations (in ecModel.ec.concs) ' ...
-0052                'from the total protein pool (Ptot) does not yield a functional ' ...
-0053                'model.'])
-0054     end
-0055 else
-0056     error('The total measured protein mass exceeds the total protein content.')
-0057 end
-0058 end
+0003 % Obsolete since GECKO 3.2.0, as all (measured and unmeasured) enzymes +0004 % are drawn from the protein pool. Instead, use setProtPoolSize. See +0005 % https://github.com/SysBioChalmers/GECKO/issues/375 for explanation. +0006 % +0007 % Before GECKO 3.2.0: updates the protein pool to compensate for measured +0008 % proteomics data (in model.ec.concs), as only the unmeasured enzymes +0009 % draw from the protein pool. +0010 % +0011 % Input: +0012 % ecModel an ecModel in GECKO 3 format (with ecModel.ec structure) +0013 % Ptot total protein content in g/gDCW, overwrites the value +0014 % from modelAdapter. For instance, condition-specific +0015 % fluxData.Ptot from loadFluxData can be used. If nothing +0016 % is provided, the modelAdapter value is used. +0017 % modelAdapter a loaded model adapter (Optional, will otherwise use the +0018 % default model adapter). +0019 % +0020 % Output: +0021 % model an ecModel where model.ec.concs is populated with +0022 % protein concentrations +0023 % Usage: +0024 % ecModel = updateProtPool(ecModel, Ptot, modelAdapter) +0025 +0026 % Do not run from GECKO version 3.2.0 onwards. This can be recognized by +0027 % prot_usage reactions that are constrained by proteomics and still draw +0028 % from the protein pool +0029 protRxns = find(startsWith(ecModel.rxns,'usage_prot_')); +0030 poolMetIdx = find(strcmp(ecModel.mets,'prot_pool')); +0031 % Selected proteins with proteomics constraints +0032 constProtRxns = ~(ecModel.lb(protRxns)==-1000); +0033 % Are any still drawing from prot_pool? This is introduced in GECKO 3.2.0. +0034 if any(full(ecModel.S(poolMetIdx,protRxns(constProtRxns)))) +0035 error(['In the provided ecModel, all protein usage reactions, both with ' ... +0036 'and without concentration constraints, draw from the protein pool. ' ... +0037 'This was introduced with GECKO 3.2.0. Since then, updateProtPool ' ... +0038 'has become obsolete, use setProtPoolSize instead to constrain the ' ... +0039 'total protein pool with condition-specific total protein content. See '... +0040 '<a href="https://github.com/SysBioChalmers/GECKO/issues/375">here</a> ' ... +0041 'for more explanation.']) +0042 end +0043 +0044 if nargin < 3 || isempty(modelAdapter) +0045 modelAdapter = ModelAdapterManager.getDefault(); +0046 if isempty(modelAdapter) +0047 error('Either send in a modelAdapter or set the default model adapter in the ModelAdapterManager.') +0048 end +0049 end +0050 params = modelAdapter.params; +0051 +0052 if nargin < 2 || isempty(Ptot) +0053 Ptot = params.Ptot; +0054 disp(['Total protein content used: ' num2str(Ptot) ' [g protein/gDw]']) +0055 end +0056 +0057 % Convert Ptot to mg/gDW if provided in g/gDCW (which is default) +0058 if Ptot < 1 +0059 Ptot = Ptot * 1000; +0060 end +0061 +0062 Pmeas = sum(ecModel.ec.concs,'omitnan'); +0063 if Pmeas == 0 +0064 error('The model has not yet been populated with proteomics, as ecModel.ec.concs is empty.') +0065 end +0066 +0067 Pnew = (Ptot - Pmeas) * params.f; +0068 +0069 if Pnew > 0 +0070 PoolRxnIdx = strcmp(ecModel.rxns,'prot_pool_exchange'); +0071 ecModel.lb(PoolRxnIdx) = -Pnew*params.sigma; +0072 sol = solveLP(ecModel); +0073 if isempty(sol.x) +0074 error(['Estimating the remaining protein pool by subtracting the ' ... +0075 'sum of measured enzyme concentrations (in ecModel.ec.concs) ' ... +0076 'from the total protein pool (Ptot) does not yield a functional ' ... +0077 'model.']) +0078 end +0079 else +0080 error('The total measured protein mass exceeds the total protein content.') +0081 end +0082 end

Generated by m2html © 2005
\ No newline at end of file diff --git a/doc/src/geckomat/utilities/enzymeUsage.html b/doc/src/geckomat/utilities/enzymeUsage.html index 763f93d6a..012d491c0 100644 --- a/doc/src/geckomat/utilities/enzymeUsage.html +++ b/doc/src/geckomat/utilities/enzymeUsage.html @@ -30,7 +30,7 @@

DESCRIPTION ^
 enzymeUsage
    Gives enzyme usages based on a provided flux distribution, as obtained
    from a full GECKO model. It can give:
-   1)  absolute usage: the specific enzyme usage in ug/gDCW/h, which can
+   1)  absolute usage: the specific enzyme usage in mg/gDCW, which can
        be given for enzymes with- and without concentration information;
    2)  capacity usage: the ratio of available enzyme that is used, calcuted
        by (absUsage/UB) (note that capacity usage is 0 if an enzyme
@@ -76,7 +76,7 @@ 

SOURCE CODE ^% enzymeUsage 0003 % Gives enzyme usages based on a provided flux distribution, as obtained 0004 % from a full GECKO model. It can give: -0005 % 1) absolute usage: the specific enzyme usage in ug/gDCW/h, which can +0005 % 1) absolute usage: the specific enzyme usage in mg/gDCW, which can 0006 % be given for enzymes with- and without concentration information; 0007 % 2) capacity usage: the ratio of available enzyme that is used, calcuted 0008 % by (absUsage/UB) (note that capacity usage is 0 if an enzyme diff --git a/src/geckomat/limit_proteins/calculateFfactor.m b/src/geckomat/limit_proteins/calculateFfactor.m index 790e9b970..81cc87b6e 100644 --- a/src/geckomat/limit_proteins/calculateFfactor.m +++ b/src/geckomat/limit_proteins/calculateFfactor.m @@ -61,20 +61,19 @@ uniprot = uniprotDB.ID(b(a)); level(~a) = []; clear protData - protData.uniprot = uniprot; + protData.uniprotIDs = uniprot; protData.level = level; + % Get MW and abundance (unit does not matter, f is fraction) + [~,idx] = ismember(protData.uniprotIDs,uniprotDB.ID); + protData.MW = uniprotDB.MW(idx); + protData.abundances = protData.level .* protData.MW; end -% Get MW and abundance (unit does not matter, f is fraction) -[~,idx] = ismember(protData.uniprot,uniprotDB.ID); -protData.MW = uniprotDB.MW(idx); -protData.abundance = protData.level .* protData.MW; - -totalProt = sum(protData.abundance); +totalProt = sum(protData.abundances); % Get enzymes in model -enzymesInModel = ismember(protData.uniprot,enzymes); -totalEnz = sum(protData.abundance(enzymesInModel)); +enzymesInModel = ismember(protData.uniprotIDs,enzymes); +totalEnz = sum(protData.abundances(enzymesInModel)); f = totalEnz/totalProt; end diff --git a/src/geckomat/limit_proteins/constrainEnzConcs.m b/src/geckomat/limit_proteins/constrainEnzConcs.m index 736f30c28..66094b53b 100644 --- a/src/geckomat/limit_proteins/constrainEnzConcs.m +++ b/src/geckomat/limit_proteins/constrainEnzConcs.m @@ -43,7 +43,7 @@ %If non-NaN in model.ec.concs, then constrain by UB if any(protCons) - model.S(protPoolIdx, usageRxnsIdx(protCons)) = 0; +% model.S(protPoolIdx, usageRxnsIdx(protCons)) = 0; % Since GECKO 3.2.0 model.lb(usageRxnsIdx(protCons)) = -model.ec.concs(protCons); end end diff --git a/src/geckomat/limit_proteins/updateProtPool.m b/src/geckomat/limit_proteins/updateProtPool.m index 3cc4f54c3..dcbc20f21 100644 --- a/src/geckomat/limit_proteins/updateProtPool.m +++ b/src/geckomat/limit_proteins/updateProtPool.m @@ -1,7 +1,12 @@ function ecModel = updateProtPool(ecModel, Ptot, modelAdapter) % updateProtPool -% Updates the protein pool to compensate for measured proteomics data (in -% model.ec.concs). +% Obsolete since GECKO 3.2.0, as all (measured and unmeasured) enzymes +% are drawn from the protein pool. Instead, use setProtPoolSize. See +% https://github.com/SysBioChalmers/GECKO/issues/375 for explanation. +% +% Before GECKO 3.2.0: updates the protein pool to compensate for measured +% proteomics data (in model.ec.concs), as only the unmeasured enzymes +% draw from the protein pool. % % Input: % ecModel an ecModel in GECKO 3 format (with ecModel.ec structure) @@ -11,12 +16,31 @@ % is provided, the modelAdapter value is used. % modelAdapter a loaded model adapter (Optional, will otherwise use the % default model adapter). +% % Output: % model an ecModel where model.ec.concs is populated with % protein concentrations % Usage: % ecModel = updateProtPool(ecModel, Ptot, modelAdapter) +% Do not run from GECKO version 3.2.0 onwards. This can be recognized by +% prot_usage reactions that are constrained by proteomics and still draw +% from the protein pool +protRxns = find(startsWith(ecModel.rxns,'usage_prot_')); +poolMetIdx = find(strcmp(ecModel.mets,'prot_pool')); +% Selected proteins with proteomics constraints +constProtRxns = ~(ecModel.lb(protRxns)==-1000); +% Are any still drawing from prot_pool? This is introduced in GECKO 3.2.0. +if any(full(ecModel.S(poolMetIdx,protRxns(constProtRxns)))) + error(['In the provided ecModel, all protein usage reactions, both with ' ... + 'and without concentration constraints, draw from the protein pool. ' ... + 'This was introduced with GECKO 3.2.0. Since then, updateProtPool ' ... + 'has become obsolete, use setProtPoolSize instead to constrain the ' ... + 'total protein pool with condition-specific total protein content. See '... + 'here ' ... + 'for more explanation.']) +end + if nargin < 3 || isempty(modelAdapter) modelAdapter = ModelAdapterManager.getDefault(); if isempty(modelAdapter) @@ -37,7 +61,7 @@ Pmeas = sum(ecModel.ec.concs,'omitnan'); if Pmeas == 0 - error('The model has not yet been constrained with proteomics, as ecModel.ec.concs is empty.') + error('The model has not yet been populated with proteomics, as ecModel.ec.concs is empty.') end Pnew = (Ptot - Pmeas) * params.f; diff --git a/src/geckomat/utilities/enzymeUsage.m b/src/geckomat/utilities/enzymeUsage.m index 8f622ccd5..f9a0faca8 100644 --- a/src/geckomat/utilities/enzymeUsage.m +++ b/src/geckomat/utilities/enzymeUsage.m @@ -2,7 +2,7 @@ % enzymeUsage % Gives enzyme usages based on a provided flux distribution, as obtained % from a full GECKO model. It can give: -% 1) absolute usage: the specific enzyme usage in ug/gDCW/h, which can +% 1) absolute usage: the specific enzyme usage in mg/gDCW, which can % be given for enzymes with- and without concentration information; % 2) capacity usage: the ratio of available enzyme that is used, calcuted % by (absUsage/UB) (note that capacity usage is 0 if an enzyme diff --git a/tutorials/full_ecModel/protocol.m b/tutorials/full_ecModel/protocol.m index ae85482dc..8260f7e81 100644 --- a/tutorials/full_ecModel/protocol.m +++ b/tutorials/full_ecModel/protocol.m @@ -250,7 +250,7 @@ % STEP 39-42 Relax protein pool constraint % As a simplistic way to ensure the model to reach the growth rate, the % upper bound of the protein pool exchange reaction can be increased to -% whatever is required. This works, but STEP 17 is preferred. +% whatever is required. This works, but STEP 43 is preferred. ecModel = setParam(ecModel, 'lb', 'r_4041', 0.41); ecModel = setParam(ecModel, 'lb', 'prot_pool_exchange', -1000); ecModel = setParam(ecModel, 'obj', 'prot_pool_exchange', 1); @@ -266,7 +266,7 @@ % STEP 43-44 Sensitivity tuning % First reset the protein pool constraint to a more realistic value, -% reverting STEP 16. +% reverting STEP 42. ecModel = setProtPoolSize(ecModel); [ecModel, tunedKcats] = sensitivityTuning(ecModel); @@ -325,17 +325,35 @@ ecModel = constrainEnzConcs(ecModel); % STEP 58 Update protein pool -% The protein pool reaction will be constraint by the remaining, unmeasured -% enzyme content. This is calculated by subtracting the sum of -% ecModel.ec.concs from the condition-specific total protein content. The -% latter is stored together with the flux data that otherwise will be used -% in Step 21. +% +% ==> Since GECKO 3.2.0: +% All protein usage reactions draw from the protein pool, both for +% proteins with and without concentration constraints. As a +% consequence, updateProtPool has become obsolete. To set the protein +% pool exchange with condition-specific total protein content, use +% setProtPoolSize instead. For more explanation, see +% https://github.com/SysBioChalmers/GECKO/issues/375 +% +% See STEP 32 for considerations about the f-factor. Here, we can +% recalculate the f-factor based on the proteomics dataset. + +f = calculateFfactor(ecModel,protData); fluxData = loadFluxData(); -ecModel = updateProtPool(ecModel,fluxData.Ptot(1)); +ecModel = setProtPoolSize(ecModel,fluxData.Ptot(1),f); + +% ==> Before GECKO 3.2.0: +% The legacy code is still shown here, but should not be run. + % The protein pool reaction will be constraint by the remaining, unmeasured + % enzyme content. This is calculated by subtracting the sum of + % ecModel.ec.concs from the condition-specific total protein content. The + % latter is stored together with the flux data that otherwise will be used + % in STEP 59. + % fluxData = loadFluxData(); + % ecModel = updateProtPool(ecModel,fluxData.Ptot(1)); % STEP 59-63 Load flux data % Matching the proteomics sample(s), condition-specific flux data needs to -% be loaded to constrain the model. This was already loaded in Step 20 for +% be loaded to constrain the model. This was already loaded in STEP 58 for % gathering Ptot, but is repeated here nonetheless. Flux data is read from % /data/fluxData.tsv. fluxData = loadFluxData(); @@ -411,7 +429,7 @@ % It is obvious that no total protein constraint is reached, and Crabtree % effect is not observed. -% Perform the Crabtree simulation on the pre-Step 33 ecModel (where kcat +% Perform the Crabtree simulation on the pre-STEP 33 ecModel (where kcat % sensitivity tuning has not yet been applied). ecModel_preTuning = loadEcModel('ecYeastGEM_stage2.yml'); ecModel_preTuning = setParam(ecModel_preTuning,'lb','r_1714',-1000);