
Improve VmInstantiation calibration #825

Merged · 9 commits · merged into stellar:main on Jun 8, 2023

Conversation

@jayz22 (Contributor) commented May 31, 2023

What

Resolves #821:

  • Use synthesized Wasm at different sizes to fit the linear model parameter for VmInstantiation.
  • Rerun calibration and update the model parameters. See full output

The main diff is in the VmInstantiation costs:

// cpu cost
ContractCostType::VmInstantiation => {
-    cpu.const_term = 1_000_000;
-    cpu.linear_term = 0;
+    cpu.const_term = 736049;
+    cpu.linear_term = 684;
}

// mem cost
ContractCostType::VmInstantiation => {
-    mem.const_term = 1100000;
-    mem.linear_term = 0;
+    mem.const_term = 107854;
+    mem.linear_term = 49;
}
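For context, each cost type is a linear model evaluated against a single input (here, the contract size in bytes). A minimal sketch of the evaluation, not the actual soroban-env-host budget code:

```rust
// Sketch: how a (const_term, linear_term) pair turns an input into a
// charge. Saturating arithmetic is an illustrative choice here.
fn charge(const_term: u64, linear_term: u64, input: u64) -> u64 {
    const_term.saturating_add(linear_term.saturating_mul(input))
}

// e.g. the new CPU charge for an 8192-byte contract:
// charge(736049, 684, 8192) == 6_339_377
```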

Why

VmInstantiation cost depends linearly on contract size and has been calibrated as a linear model.
Previously we were using the soroban-test-wasm contracts, which were few; as a result the linear parameter was poorly calibrated (R^2 around 0.7), and we manually overrode the calibrated parameters with constant ones.

This PR uses synthesized Wasm so that the size of the contracts can be varied and controlled. The resulting R^2 score is > 0.99.
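For reference, a minimal sketch of the least-squares fit behind such a calibration (illustrative only; the actual harness is the repo's cost-calibration code). It returns a (const_param, lin_param, r_squared) triple in the same shape as the FPCostModel outputs quoted later in this thread:

```rust
/// Ordinary least-squares fit of y = cnst + lin * x over (x, y) samples,
/// plus the R^2 goodness-of-fit score.
fn fit_linear(samples: &[(f64, f64)]) -> (f64, f64, f64) {
    let n = samples.len() as f64;
    let mx = samples.iter().map(|s| s.0).sum::<f64>() / n;
    let my = samples.iter().map(|s| s.1).sum::<f64>() / n;
    let sxy: f64 = samples.iter().map(|s| (s.0 - mx) * (s.1 - my)).sum();
    let sxx: f64 = samples.iter().map(|s| (s.0 - mx) * (s.0 - mx)).sum();
    let lin = sxy / sxx;
    let cnst = my - lin * mx;
    // R^2 = 1 - SS_res / SS_tot
    let ss_res: f64 = samples
        .iter()
        .map(|s| {
            let e = s.1 - (cnst + lin * s.0);
            e * e
        })
        .sum();
    let ss_tot: f64 = samples.iter().map(|s| (s.1 - my) * (s.1 - my)).sum();
    (cnst, lin, 1.0 - ss_res / ss_tot)
}
```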

Known limitations

During calibration, it became apparent that the precision of metering is compromised by the integer representation of the model params (especially the linear terms, which are close to 0). They should probably be represented as fixed-point integers. Created this follow-up issue: #824
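A hypothetical illustration of the fixed-point idea in #824 (the name and scale factor are made up here, not the eventual implementation): storing the linear term pre-multiplied by a power of two preserves fractional slopes that plain integers would round away.

```rust
// Assumed scale factor, for illustration only.
const SCALE_BITS: u32 = 7;

// lin_term_scaled encodes lin_term * 2^SCALE_BITS, so a fractional
// slope like 0.38 becomes ~49 instead of rounding to 0.
fn charge_scaled(const_term: u64, lin_term_scaled: u64, input: u64) -> u64 {
    const_term + ((lin_term_scaled * input) >> SCALE_BITS)
}
```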

@jayz22 jayz22 requested review from graydon and sisuresh as code owners May 31, 2023 02:01
@jayz22 jayz22 requested a review from anupsdf May 31, 2023 02:01
@anupsdf (Contributor) commented May 31, 2023

Plotting a chart from my local run of VmInstantiation shows that CPU instructions are linear, but memory seems like a constant.
No strong objection though if you feel it should be constant for both.

@graydon (Contributor) commented May 31, 2023

Hm yeah I thought this was fairly .. linear actually! I mean, concretely it definitely does have to do work that's at least linear in contract size (in the sense that it has to parse the wasm). Can you not use the synth-wasm subcrate to get a clearly varying set of samples?

@jayz22 (Contributor, Author) commented May 31, 2023

Thanks for the plot @anupsdf.
@graydon yeah, I can try it with synth-wasm. My guess is that it is only weakly linear in the size of the contract function, since it has to parse all the Wasm sections, validate them, link the host functions, etc. (hence the high floor of instructions). But it's worth exploring.

@jayz22 (Contributor, Author) commented May 31, 2023

Okay, I was definitely wrong about the weak linear dependency.
I did more experiments with Wasm sizes. The cpu_insns w.r.t. Wasm size is definitely strongly linear. I did two sets of experiments, both sweeping across synthesized Wasm contracts of growing sizes (up to around 64 kB, our pre-defined limit). The first experiment set contains a single function with all the instructions in it; the second contains many "empty" internal functions. Both cases show a strong linear CPU dependency (R^2 > 0.99). However, the slopes are quite different. Here are the results:

First experiment (contract with a single function having n instructions)

cpu model params: FPCostModel { const_param: 777313.8510813023, lin_param: 117.59111280561848, r_squared: 0.9974074087372953 }
mem model params: FPCostModel { const_param: 121941.20503468116, lin_param: 32.46245896634867, r_squared: 0.9354351396402935 }

Second experiment (contract with n empty functions)

cpu model params: FPCostModel { const_param: 712899.9195890937, lin_param: 687.9128326191914, r_squared: 0.9996499306519864 }
mem model params: FPCostModel { const_param: 107677.62659833141, lin_param: 48.693943068574725, r_squared: 0.96778598722394 }

I'm leaning towards using the first experiment as the calibration target, as I feel real contracts should have a fairly limited number of internal functions. The resulting linear param in the second experiment may be too large and too penalizing.
@graydon @anupsdf Any thoughts?
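For concreteness, a sketch of how the second experiment's contracts could be synthesized (illustrative only; the actual generator is the synth-wasm subcrate graydon mentioned above). This uses the wat crate to grow a module by n empty internal functions:

```rust
// Build a Wasm module containing `n` empty functions, so the byte size
// grows with `n` while each function stays trivial to parse.
fn wasm_with_n_empty_funcs(n: usize) -> Vec<u8> {
    let mut wat = String::from("(module\n");
    for i in 0..n {
        wat.push_str(&format!("  (func $f{i})\n"));
    }
    wat.push(')');
    wat::parse_str(&wat).expect("valid wat")
}
```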

@jayz22 jayz22 changed the title Change VmInstantiation cost type from linear to constant Improve VmInstantiation calibration May 31, 2023
@jayz22 (Contributor, Author) commented May 31, 2023

Updated the PR and the comments. Ready for review again.

@mootz12 (Contributor) commented May 31, 2023

Hello!

Wanted to chime in as I've just started benchmarking the Blend protocol to start looking for optimizations.

I have some concerns about how much weight the VmInstantiation cost function has on the total cost of both CpuInst and MemBytes. Currently, this is taking up ~70-80% of total CPU instructions and ~99% of memory bytes, leaving the only optimization available as reducing the number of cross-contract calls.

The main culprit is a custom token (tracks collateral or liabilities for an underlying token, protocol calls balance on these a lot) that has a fairly small WASM size (8500 bytes).

From what I can quickly grasp from this PR, it looks like I should expect costs to increase substantially in this department, from 1m CpuInst per cross-contract call to 1.78m (8500 * 119 + 772061). This is not including the significantly increased access cost of large contracts, either (8400 bytes is the breakeven point against the old fixed cost).

Happy to share more information if it would be useful.

@graydon (Contributor) commented May 31, 2023

@jayz22 The second, larger bound should be used -- we're aiming for an upper bound. Users can submit a malicious contract.

@mootz12 Yes, cross-contract calls are for the time being the largest cost centre, by a fairly wide margin, and they're likely to always be so. We're likely to bring down the magnitude of difference in the future by caching a certain amount of material in the VM, making instantiation avoid re-parsing wasms from one call to the next, but we haven't put any time into this yet, and will only be able to do so much. The rest of the system is fairly efficient, so by comparison instantiation of each VM is high-cost.

@jayz22 (Contributor, Author) commented May 31, 2023

@graydon I've updated it to use the second, larger bound.

@mootz12 Thanks for your feedback. It is unfortunate that VM instantiation is as expensive as it is right now, due to the work necessary to set up the VM and parse the Wasm contracts. The metered cost parameters are just a reflection of the calibrated costs (and we have to take the upper bound to prevent malicious contracts). We may have improvements and optimizations for it in the future (created #827 to track it).

@dmkozh (Contributor) commented Jun 1, 2023

Do we have any understanding of what actually drives these numbers up so much, and why we have so much variance? E.g. in Anup's data we have the linear param at around 230, and in Jay's two experiments we have values from 100 to 600. Either way the number is much higher than the cost of interpreting a Wasm instruction, which seems really weird to me. Does the VM do some heavy pre-processing? Is something off about our measurement methodology?

@jayz22 (Contributor, Author) commented Jun 1, 2023

> Do we have any understanding of what actually drives these numbers up so much, and why we have so much variance? E.g. in Anup's data we have the linear param at around 230, and in Jay's two experiments we have values from 100 to 600. Either way the number is much higher than the cost of interpreting a Wasm instruction, which seems really weird to me. Does the VM do some heavy pre-processing? Is something off about our measurement methodology?

Please see the comment above: #825 (comment). These are two different setups.

@dmkozh (Contributor) commented Jun 1, 2023

> Please see the comment above: #825 (comment). These are two different setups.

Sure, I understand that these are two or even three different setups. What I don't understand is why we observe such a drastic difference, and whether we are using the proper value as the input size (it seems to me like we are not).

@jayz22 (Contributor, Author) commented Jun 2, 2023

> Sure, I understand that these are two or even three different setups. What I don't understand is why we observe such a drastic difference, and whether we are using the proper value as the input size (it seems to me like we are not).

Well, in this case (and in most cases) there is no perfect single input that captures all the degrees of variation. Instantiating a VM requires a lot of work, including parsing and validating the Wasm file, initializing linear memory, linking host functions, etc.
Using the Wasm byte size is the best we can do in terms of capturing all the variations in one degree of freedom. But we have to (or try our best to) set it up such that the resulting one-dimensional function represents an upper bound across all other factors.
In this case, one of the simplest setups is one exported contract function containing all the bytes; then the VM just has to parse and link that one function. That was the first experiment (lin_param: 117.59111280561848). On the other end, the bytes can be made up of many small functions, each of which is trivial to parse, but the overall cost of parsing a file of the same length as the first one will be much larger per byte (lin_param: 687.9128326191914). We have to pick the second case for meter charging.
The other stages of VM instantiation (initializing linear memory, linking host functions) should be more or less constant, which explains the high (but similar in both experiments) floor (~700k).
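A rough sketch of those stages as they appear in a typical Wasmi embedding (exact signatures vary across wasmi versions, and the host-function definitions are elided; this is not the host's actual code):

```rust
use wasmi::{Engine, Linker, Module, Store};

fn instantiate(wasm: &[u8]) -> Result<(), wasmi::Error> {
    let engine = Engine::default();
    // Parse + validate every section: the contract-size-dependent
    // (linear) stage.
    let module = Module::new(&engine, wasm)?;
    let mut store = Store::new(&engine, ());
    // Linking host functions and starting the instance: roughly
    // constant work, hence the high (~700k instruction) floor.
    let linker = Linker::<()>::new(&engine);
    let _instance = linker
        .instantiate(&mut store, &module)?
        .start(&mut store)?;
    Ok(())
}
```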

@dmkozh (Contributor) commented Jun 2, 2023

> But we have to (or try our best to) set it up such that the resulting one-dimensional function represents an upper bound across all other factors.

That's not necessarily the only answer - we have other options, such as changing what the input value is or breaking down the operation into multiple linear components dependent on different values.

> On the other end, the bytes can be made up of many small functions, each of which is trivial to parse, but the overall cost of parsing a file of the same length as the first one will be much larger per byte (lin_param: 687.9128326191914). We have to pick the second case for meter charging.

If we believe that the number of functions is what drives the cost, then why don't we use it as the input parameter to metering? This of course creates a 'loop' for the current model (because you need to load the contract to count the functions), but there are ways around that, such as:

  • caching the number of functions as part of the Wasm upload operation
  • breaking the VM instantiation down into two stages (if feasible)

> We have to pick the second case for meter charging.

I'm not sure that's the best solution:

  • on one hand, we have an unknown number of contracts (likely the majority?) overpaying for instantiation at a 3-4x rate
  • on the other hand, since we still don't have a good understanding of what the cost function actually is, we can't quite claim that this is the worst case

One potential way to exercise this would be to create two sets of benchmarks:

  • Contracts with a single function and growing size (e.g. calling into a bunch of host fns)
  • Contracts with a growing number of no-op functions

I think both can be generated with a bit of proc macros.

Another thing that potentially might matter is the number of imported host functions. I'm not sure if we're covering that now - it might or might not have an impact (again, even the 687 coefficient might not necessarily be enough). It would be nice to also cover contracts that import as many host fns as possible.

@jayz22 (Contributor, Author) commented Jun 2, 2023

Multi-dimensional inputs were considered (see #208) but we decided it was not worth the complexity at the time. And we would be facing the same challenges of picking the number of degrees of freedom, as well as deciding which inputs are worthy. Not to mention that calibration across multi-dimensional inputs will be challenging in terms of computation and accuracy. In this case one can always argue that the contract can do other wild things to make it expensive to parse, but we need to make an assumption and draw a line somewhere, and I think the many-simple-local-functions approach is a decent one.

> we can't quite claim that this is the worst case

If we think there is another clear-cut worse case, we would consider using that.

> One potential way to exercise this would be to create two sets of benchmarks

This might help, but again I'm not sure it's worth the extra complexity, and I'm not sure it solves the problem: it doesn't eliminate the heuristic of deciding how many local functions is "too many", i.e. which contracts fall into one bucket vs. the other.

> Another thing that potentially might matter is the number of imported host functions.

All of the host functions are linked into Wasmi for any contract. That is necessary for Wasmi to resolve a particular host function for a contract to call.

As @graydon mentioned above and in the thread, we have other, more systematic ways to make VM instantiation cheaper, e.g. sharing modules and caching parsed contracts, which may reduce the cost fundamentally. But in any case the cost formula will have to exhibit the worst-case estimate in some way.

@graydon (Contributor) commented Jun 2, 2023

There are a lot of degrees of freedom in a wasm contract. We're not going to capture all of them and we've no strong reason to believe that "large number of empty functions" is the worst case use of bytes in terms of incurring wasmi costs. Merely that it's worse than other better cases. If we find even worse cases, we have to either find a way to dynamically limit or prohibit them, or integrate them into the cost model as well, as a user can DoS us with a txset full of instances of contracts that exhibit the worst case.

CC'ing @brson on this; he might have some fun trying to wire up synth-wasm (or some other wasm fuzzer, e.g. wasm-smith) to find "the most expensive thing we can ask wasmi to do during parsing and validation".

@dmkozh (Contributor) commented Jun 2, 2023

> Multi-dimensional inputs were considered (see #208) but we decided it was not worth the complexity at the time.

I'm not necessarily suggesting we use multi-dimensional inputs. We could just have two linear functions (in case the runtime is a linear function of the inputs, of course), e.g. 'vm parse (contract size)' and 'vm instantiate (function count)' or something like that. This may or may not be the right model, but we won't know if we don't try.

> This might help, but again I'm not sure it's worth the extra complexity, and I'm not sure it solves the problem: it doesn't eliminate the heuristic of deciding how many local functions is "too many", i.e. which contracts fall into one bucket vs. the other.

It does have the potential to reduce the up-to-5x discrepancy between the cases.

> If we think there is another clear-cut worse case, we would consider using that.

What I'm proposing is to do more rigorous benchmarking to figure that out.

> All of the host functions are linked into Wasmi for any contract. That is necessary for Wasmi to resolve a particular host function for a contract to call.

That's not exactly true; the contract itself has to declare the host fns it's using. I don't know whether this has an impact on performance, which is why it's just one more idea of what could be measured.

> We're not going to capture all of them and we've no strong reason to believe that "large number of empty functions" is the worst case use of bytes in terms of incurring wasmi costs. Merely that it's worse than other better cases.

Sure, so shouldn't we think more about what the worst case might be? But in any case, if there is a straightforward parameter of the model (like the function count), shouldn't we just use it? If a linear combination of contract size and function count gives us a model that has, say, just a <2x fluctuation between the best and worst cases we find (instead of 3-5x), then I'd say that would be a huge win for the ecosystem.
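A minimal sketch of the two-term model dmkozh is gesturing at (the type and field names are hypothetical, not existing host parameters):

```rust
// Hypothetical two-input linear model: the charge depends on both the
// Wasm byte size and the declared function count.
struct TwoTermModel {
    const_term: u64,
    size_term: u64,     // cost per byte of Wasm
    fn_count_term: u64, // cost per declared function
}

impl TwoTermModel {
    fn charge(&self, wasm_size: u64, fn_count: u64) -> u64 {
        self.const_term
            + self.size_term * wasm_size
            + self.fn_count_term * fn_count
    }
}
```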

@jayz22 (Contributor, Author) commented Jun 7, 2023

I've done a few experiments and did not find a way to further improve the current metering model.
The most contract-size-dependent component in Wasmi instantiation is Module::new, which requires parsing the entire Wasm file, section by section, and all the bytes in it. There is no way around that.
The number of bytes is the best single-input proxy for a linear metering model.
The only way to improve the metering model without significantly altering the current framework and assumptions (i.e. no multi-dimensional inputs) is to do #838, which requires breaking Module::new down into sections and metering them individually.
I also tried component recycling in #827 and didn't find any quick wins.

Given our current constraints on time and resources, we should proceed with merging this PR and adjust the resource limits at the network config level.

@graydon graydon enabled auto-merge (squash) June 8, 2023 01:58
@graydon graydon merged commit 8d3d85e into stellar:main Jun 8, 2023