switch solver backend to optlang / solverbasedmodel from cameo #289
current status:

timings are currently copy-pasted together; to reproduce in more verbose form, do:
Am I seeing this correctly: did FVA become 90 times slower? There seem to be some performance issues... Edit: No, it didn't, see below...
I just realised the benchmarks are actually quite unfair: the optlang model is slower at model creation etc., but with the tiny "textbook" model that is what takes most of the time when re-running the FVA the number of times needed to get data. So I am switching these to a larger, more realistic model (but doing so ran into some unexpected issues... story continues). Either way, don't worry, I won't merge anything like this to devel :)
I also saw that your benchmarks call the entire test function along with all the asserts. That also creates noise. Why does model creation take so long with optlang? Is it the sympy parsing of the constraints?
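One way to keep that noise out of the numbers is to move model construction (and the asserts) outside the timed statement. A minimal sketch with stand-in functions; `build_model` and `run_fva` are illustrative placeholders, not the cobrapy API:

```python
import timeit

# Hypothetical stand-ins (not the cobrapy API): build_model() mimics the
# expensive model-construction step, run_fva() the analysis under test.
def build_model():
    return list(range(10000))

def run_fva(model):
    return sum(model)

# Time only the analysis: construct the model once, outside the timed call.
model = build_model()
analysis_time = timeit.timeit(lambda: run_fva(model), number=100)

# Timing the whole test function instead folds model creation (and any
# asserts) into every repetition, which dominates for a tiny model.
total_time = timeit.timeit(lambda: run_fva(build_model()), number=100)
print(run_fva(model))  # → 49995000
```

With a setup like this, a slow constructor no longer masks whether the analysis itself regressed.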
Just an addition: I just ran some benchmarks with
And another comment (sorry): I think we should keep
Just a thought: would it be possible to refactor some of the optlang code in cobra.Model and cobra.Reaction into a separate file? Those classes are getting fairly long (~1000 lines), and the optlang hooks seem like they might be better formulated as functions of a reaction rather than class methods. Idk, you'll obviously know better about when/where they're called; just a quick observation.
updated status (including #307)
good points above, will consider them
adjusting the benchmarks to reflect that the optlang version also updates the solver problem (which cobrapy also eventually has to do), we arrive at these numbers
@phantomas1234
I tried pulling the optlang branch, but got a

In [1]: import cobra
No solvers available.

In [2]: import cobra.test

In [3]: model = cobra.test.create_test_model('ecoli')
...
AttributeError: module 'optlang' has no attribute 'Model'
...

Should I not be installing optlang via pip? Also, it complained about

In terms of refactoring the properties, you can always define functions that operate on reactions
Hi @hredestig, I ran the benchmark comparison but get pretty different results from yours. Could you post how you got that nice formatting with "optlang" and "cobrapy" in different columns, so I can post my results here? The major difference I found now was MOMA, which is about 20 times slower with optlang...
EDIT: Nevermind, figured it out. Here are my results. Factor means optlang/cobrapy:
Many of the benchmarks are not as bad for me with optlang. However, there are some weird performance differences we will need to check out. I will continue profiling the problematic operations a bit.
Hi, sorry, just as a clarification because I think it's confusing: almost none of the benchmark results reported here actually use the optlang solvers. The optlang solvers are still not added to the
Modifying the solutions in-place should theoretically lead to speed improvements in the long run, right? That way the solver doesn't need to be re-initialized for repeated model changes...
That depends on the use case. Basically it should be faster if you have a workflow where you change only a few reactions and solve a lot of times (the first few lines of the tests in my benchmark correspond to that case). However, there is also some overhead due to the optlang API. It is slower in cases where you modify the model a lot and only solve once, since in the worst case you update every reaction several times before solving (see loopless and MOMA, which do that more or less). It really depends on what our default use case is supposed to be. For instance, there are a lot of people using cobrapy only to convert between model formats or to construct models de novo (with only a little FBA), as one can see on the mailing list. Those will experience significant performance reductions. In general the nicer optlang API will cost some performance. I think that's okay given the better usability.
Would it be better to evaluate lazily, then? I wonder if we could use something similar to the context manager approach proposed in #306 to build up a history of changes that need to be made to the solver, and then flush the queue when we call
That would be insanely cool! However, I have no idea how to implement that, since you would also have to know when a change overwrites another one. For instance, calling:

model.lower_bound = 3
...
model.lower_bound = 4

The history should "know" that it can delete the first assignment, since the second one supersedes it. Maybe just have some data structure that logs which bounds, reactions, metabolites and objectives have changed (constant memory usage, yeiii). Then only update those in the solver upon optimize or on getting the solver property. For a reaction, it would check whether the reaction is still there and update bounds and stoichiometries, or just remove it.
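The superseding behaviour described above can be sketched with a pending-changes log keyed by (target, attribute), so a later write simply overwrites the earlier pending one. The names here are hypothetical, not actual cobrapy classes:

```python
# Minimal sketch, assuming a (target, attribute) -> value log: memory
# stays constant no matter how often a bound is rewritten before the
# next solve, and only the final value ever reaches the solver.
class PendingChanges:
    def __init__(self):
        self._changes = {}  # (target, attribute) -> final value

    def record(self, target, attribute, value):
        # Overwrites any earlier pending value for the same slot.
        self._changes[(target, attribute)] = value

    def flush(self, apply):
        # What optimize() (or reading the solver property) would call:
        # push only the surviving value of each slot to the solver.
        for (target, attribute), value in self._changes.items():
            apply(target, attribute, value)
        self._changes.clear()

changes = PendingChanges()
changes.record("PGI", "lower_bound", 3)
changes.record("PGI", "lower_bound", 4)  # supersedes the first write

applied = []
changes.flush(lambda t, a, v: applied.append((t, a, v)))
print(applied)  # → [('PGI', 'lower_bound', 4)]
```

A real version would need richer records for added/removed reactions and metabolites, but the dedup principle is the same.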
You could actually say the same thing about the context managers: no need to re-write a bound many times over. But that's not a required feature for a first-pass implementation; it would still be as efficient as the current methods even with re-writing. If everyone's on board, we should try to merge the history classes into one; I would bet a lot of the code would be pretty similar. Anything that we want to track for the solution would likely also want to get reverted by the context, so we might be able to similarly decorate
Yeah. The only difference is that the solver history is not executed until solving the model, whereas the model history is applied immediately and reset afterwards. No idea how you would do that in the same history, but looking forward to your ideas.
I don't have it totally worked out either. In each case we'd build up a stack of modifications and then trigger their application at a later time... In the model context, that happens on
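One hedged sketch of how the two histories might share code: a stack of (apply, revert) thunks, where the eager (model-context) flavour applies immediately and reverts on exit, while the lazy (solver) flavour defers application until optimize(). All names are illustrative, not cobrapy API:

```python
# Sketch of a shared history class under the assumption that the only
# difference between the two use cases is *when* apply thunks run.
class History:
    def __init__(self, lazy=False):
        self.lazy = lazy
        self._pending = []   # apply thunks not yet executed (lazy mode)
        self._undo = []      # revert thunks, popped LIFO

    def record(self, apply, revert):
        self._undo.append(revert)
        if self.lazy:
            self._pending.append(apply)  # deferred until flush()
        else:
            apply()                      # model context: apply eagerly

    def flush(self):
        # What optimize() would call on a lazy (solver) history.
        for thunk in self._pending:
            thunk()
        self._pending.clear()

    def reset(self):
        # What leaving the model context would call.
        while self._undo:
            self._undo.pop()()

state = {"lb": 0}
h = History(lazy=True)
h.record(lambda: state.update(lb=3), lambda: state.update(lb=0))
# Nothing has touched the state yet...
h.flush()           # triggered by optimize()
print(state["lb"])  # → 3
h.reset()           # triggered by leaving the context
print(state["lb"])  # → 0
```

Combining this with a superseding log (so repeated writes collapse to one thunk) would give both behaviours from one class.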
Using the reversible representation in the solver instead of the irreversible one does give a significant performance improvement. For a large model, reading from SBML gets about 1/3 faster and calling optimize the first time is twice as fast. Also muuuuch less code in the bound setters and _populate_solver... EDIT: Also, model.copy is slow due to the deepcopy of the solution, not the deepcopy of _solver. Resetting the solution of the copied object to a new LazySolution makes model.copy much faster. It is still 5-10 times slower than the old copy, but much better. This conserves the solution basis, so solving the copied model still gains the speed advantage of previous copies.
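For reference, the bound bookkeeping the two representations imply can be sketched as follows; `split_bounds` and `merge_flux` are illustrative helpers, not cobrapy functions:

```python
# Reversible representation: one variable v with bounds (lb, ub), where
# lb may be negative. Irreversible representation: v = v_fwd - v_rev
# with two non-negative variables, so every bound change touches two
# solver variables instead of one (hence the extra code in the bound
# setters and _populate_solver).
def split_bounds(lb, ub):
    """Bounds for (v_fwd, v_rev) in the irreversible representation."""
    fwd = (max(lb, 0), max(ub, 0))
    rev = (max(-ub, 0), max(-lb, 0))
    return fwd, rev

def merge_flux(v_fwd, v_rev):
    """Recover the net flux from the split representation."""
    return v_fwd - v_rev

print(split_bounds(-10, 10))  # → ((0, 10), (0, 10))
print(merge_flux(0.0, 2.5))   # → -2.5
```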
@cdiener that sounds promising. Can you maybe share your code? @hredestig should we maybe open a pull request for the refactored code so we can discuss it more easily?
Sure, I made a branch here: https://github.com/cdiener/cobrapy/tree/optlang_cdiener. Comparison with the old (current optlang branch); factor = time new / time old:

   test                                       time old [ms]  time new [ms]   factor
23 test_copy_benchmark 297.671050 20.608631 0.069233
24 test_deepcopy_benchmark 278.250522 33.817388 0.121536
4 test_single_gene_deletion_moma_benchmark 34998.070239 6889.473639 0.196853
10 test_loopless_benchmark 50.506941 17.797014 0.352368
2 test_pfba_benchmark[cplex] 252.139153 135.429851 0.537123
0 test_pfba_benchmark[gurobi] 177.806495 95.888721 0.539287
26 test_change_objective_benchmark[cglpk] 0.550403 0.326667 0.593505
27 test_change_objective_benchmark[cplex] 0.550066 0.327242 0.594913
25 test_change_objective_benchmark[gurobi] 0.546461 0.327909 0.600059
22 test_add_remove_reaction_benchmark[cplex] 0.626644 0.435079 0.694301
20 test_add_remove_reaction_benchmark[gurobi] 0.610961 0.438270 0.717345
21 test_add_remove_reaction_benchmark[cglpk] 0.611174 0.439949 0.719842
1 test_pfba_benchmark[cglpk] 365.088172 274.528417 0.751951
14 test_add_metabolite_benchmark[gurobi] 8.348050 6.311666 0.756065
15 test_add_metabolite_benchmark[cglpk] 8.404744 6.418215 0.763642
16 test_add_metabolite_benchmark[cplex] 8.259900 6.363597 0.770421
12 test_benchmark_read 21.185443 16.960052 0.800552
3 test_single_gene_deletion_fba_benchmark 338.928193 311.072180 0.917811
7 test_flux_variability_benchmark[gurobi] 2367.999671 2175.041806 0.918514
8 test_flux_variability_benchmark[cglpk] 1216.494740 1122.802473 0.922982
9 test_flux_variability_benchmark[cplex] 1166.438857 1091.038845 0.935359
5 test_single_gene_deletion_benchmark 272.857519 256.449415 0.939866
13 test_benchmark_write 35.595469 33.492812 0.940929
6 test_double_gene_deletion_benchmark 764.976082 722.069430 0.943911
19 test_subtract_metabolite_benchmark[cplex] 0.007282 0.006949 0.954214
11 test_phenotype_phase_plane_benchmark 31.039275 30.079134 0.969067
18 test_subtract_metabolite_benchmark[cglpk] 0.007030 0.006888 0.979790
17 test_subtract_metabolite_benchmark[gurobi] 0.006804 0.007063 1.038079
@cdiener OK, this might be the way to go, but it will break a lot of things in cameo unfortunately. After getting rid of the 2 variables per reaction, what would be your take on implementing pFBA? Because I don't want to go back to things like
The gain is not huge; we can still think about leaving it as two variables, but we might have to optimize a bit more. cProfile is your friend here. For pFBA I would temporarily add new variables to the cameo solver and change the bounds, then remove everything afterwards. As we can see from the benchmarks now, the overhead of that is not huge. Another thing I found regarding speed is to optimize Reaction.objective_coefficient. Getting the objective from the low-level solver, converting it to sympy, and extracting the coefficients, for every single reaction, is not very efficient.
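The coefficient-lookup issue can be illustrated with a toy sketch: extract every objective coefficient in one pass and answer per-reaction queries from a dict, instead of re-parsing the solver objective once per reaction. The plain list of terms here stands in for the sympy expression optlang actually returns; the function name is hypothetical:

```python
# One-pass sketch: convert the objective expression to a coefficient
# mapping once, then every per-reaction lookup is a dict access instead
# of a fresh round-trip through the solver and sympy.
def objective_coefficients(objective_terms, reaction_ids):
    coeffs = dict(objective_terms)  # single pass over the expression
    return {rid: coeffs.get(rid, 0.0) for rid in reaction_ids}

terms = [("Biomass", 1.0), ("ATPM", 0.5)]
print(objective_coefficients(terms, ["Biomass", "ATPM", "PGI"]))
# → {'Biomass': 1.0, 'ATPM': 0.5, 'PGI': 0.0}
```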
Can we implement absolute values lower down in the stack? One of the 'future directions' for optlang lists handling absolute values. I'm thinking that for things like linear MOMA and future methods, I'd rather have the solver interface handle that
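For the record, the standard LP reformulation such a solver-level helper would presumably generate replaces |v - v_ref| by an auxiliary variable t constrained by two linear inequalities; minimizing t then yields t = |v - v_ref| at the optimum. A toy sketch that builds the constraint rows (the data layout and names are illustrative only, not the optlang API):

```python
# Standard absolute-value trick: introduce t >= 0 with
#   t >= v - v_ref   and   t >= v_ref - v,
# then minimize t. Each row below is ({variable: coefficient}, lower_bound).
def abs_deviation_rows(ref):
    return [
        ({"t": 1, "v": -1}, -ref),  # t - v >= -ref  <=>  t >= v - ref
        ({"t": 1, "v": 1}, ref),    # t + v >=  ref  <=>  t >= ref - v
    ]

rows = abs_deviation_rows(2.0)
# Check feasibility by hand for v = 5, t = |5 - 2| = 3:
v, t = 5.0, 3.0
for coeffs, lb in rows:
    assert coeffs["t"] * t + coeffs["v"] * v >= lb
print(t == abs(v - 2.0))  # → True
```

Linear MOMA is then a sum of such t variables, one per reaction, which is exactly the kind of boilerplate a solver-interface helper could hide.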
We tested 2 vs. 1 variable per reaction and we see only a marginal increase in runtime for
Give me ~4 days and I'll work something out that I deem compatible with cameo (after all, I don't see value in merging things from cameo into cobrapy in a way that breaks cameo; adding variables on the fly is not efficient, but I get that pFBA might be the only method that would suffer from that). If for example
Edit:
After fixing
Yeah, for solving it's only about 30% faster. I think the issue we have is more with model creation. For instance, reading from SBML became twice as slow due to _populate_solver. However, I think the right way to go here is to implement the lazy solver as proposed by @pstjohn. This way, model creation for models that are not going to be solved (converting between MATLAB and SBML, for example) wouldn't suffer because of some optimization for a future FBA that is never run. In hindsight I also think leaving the problem in standard form might be better for allowing dualizing methods to be formulated faster. Regarding pFBA, I think all methods have to be ported to the new solver interface. Some operations that were cheap before are slow now, and vice versa. @phantomas1234 I noted that switching the solver interface with optlang is really slow (model.solver = "glpk"). Also, optlang is not deterministic in the order of the solvers it picks. Might be worthwhile having a look at optlang to see whether we can squeeze some speed out of there too...
Sure, I can push some stuff. For the functions I would like to take the moma module. I have some ideas how to implement that without any new variables.
Sounds good. Do you mean quadratic or linear MOMA? Also, when it comes to benchmarks, one should keep in mind the huge timing differences for
Yeah, this would be solved by the lazy solver anyway, since that extra time
With the creation of the devel-2 branch featuring the first version of the solver-based model, this ticket is no longer specific enough, so I am closing it; further refactoring will be discussed in other issues.