
Coincidence of Wants query #7

Merged
merged 6 commits into main on Sep 5, 2024

Conversation

@fhenneke (Contributor) commented Aug 31, 2024

This PR adds queries for computing Coincidence of Wants (CoWs) per batch.

The approach to CoWs follows a master's thesis by Vigan Lladrovci. It is based on aggregated amounts transferred into and out of the settlement contract by users and other sources (AMMs, PMMs, ...). This gives a metric for CoWs, and for potential CoWs, per batch. The main idea is to aggregate, per token bought in a batch, the difference between the total USD volume of that token sold and the total volume sent towards AMMs. Aggregated over all tokens in a batch, this quantity can be compared to the total volume.

The original approach had problems taking internalizations into account and excluded such batches from its analysis. Here, the approach is modified to handle those cases.

Most important formulas

Per token, the CoW fraction

naive_cow = (user_in - amm_out - slippage_in) / user_out

is computed, capped at 1 from above and 0 from below. This fraction can be used to compute the CoW volume per batch

naive_cow_volume = sum(token_price * user_out * naive_cow)

and a CoW fraction per batch

naive_cow_volume / total_volume

In other words, the CoW value per batch is the average of the per-token CoW fractions, weighted by the USD volume of user_out over all bought tokens.
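As an illustration, the formulas above can be sketched in Python. This is a hedged sketch, not the PR's implementation (which is Dune SQL); the field names are assumptions mirroring the text.

```python
# Sketch of the per-token CoW fraction and its per-batch aggregation,
# following the formulas above. Illustrative only; the PR implements
# this logic in Dune SQL, and the field names here are assumptions.

def naive_cow(user_in, amm_out, slippage_in, user_out):
    """Per-token CoW fraction, capped at 1 from above and 0 from below.
    Assumes user_out > 0 (the fraction is only defined for bought tokens)."""
    raw = (user_in - amm_out - slippage_in) / user_out
    return min(1.0, max(0.0, raw))

def batch_cow(tokens):
    """tokens: per-token dicts of amounts plus a USD price per atom.
    Returns (cow_volume, cow_fraction) for the batch."""
    cow_volume = 0.0
    total_volume = 0.0
    for t in tokens:
        frac = naive_cow(t["user_in"], t["amm_out"], t["slippage_in"], t["user_out"])
        buy_volume = t["token_price"] * t["user_out"]
        cow_volume += buy_volume * frac
        total_volume += buy_volume
    return cow_volume, (cow_volume / total_volume if total_volume > 0 else 0.0)
```

For example, a token whose buy volume is fully covered by user sells contributes its full USD volume to the batch's CoW volume, while a token covered 20% contributes 20% of its volume.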

Queries

There are six queries to compute CoW fractions per batch, and one query aggregating batch values into a total. (Dashed squares and lines correspond to potential future queries.)

[Diagram: dependency graph of the CoW queries (CoWs.drawio)]

  • balance_changes: This query collects all balance changes by looking at erc20 transfers, native token transfers, and additional changes from e.g. deposit and withdrawal events. A similar query is used for monitoring CoW AMMs. I do not know how to combine those queries, though, as I do not know how to programmatically choose a parameter from within a query. If that were possible, this query could be parametrized by a list of addresses and that list would be chosen to only contain the settlement contract in the CoW use case.
  • imbalances: This query aggregates balance changes into one signed value per token. It should compute the same values as https://github.com/cowprotocol/token-imbalances for raw token imbalances. The edge case of the protocol trading is handled consistently with the current slippage accounting: even though the buffer values are reduced significantly (fees are withdrawn from the contract), the imbalances only account for unexpected changes, by counting transfers from the settlement contract to itself as incoming transfers.
  • classified_balance_changes: This query classifies all balance changes into the categories user_in, user_out, amm_in, and amm_out. Additionally it adds the remaining imbalance computed in the imbalance query as slippage_in or slippage_out depending on sign.
  • cow_per_token: This query computes CoW fractions per token. One should be cautious in interpreting this value. It does not have much meaning on its own (e.g. because the symmetry between buy and sell is broken) but can be turned into a meaningful aggregate value per batch.
  • token_prices: This query computes token prices for tokens bought and sold by users, from Dune data and the exchange rate of trades. Prices are in USD per atom of the token. Instead of joining on Dune data, it uses the cow_protocol_{{blockchain}}.trades table, which contains Dune USD prices if they exist. If the USD price does not exist, a backup price is recovered from the price of the other traded token and the effective exchange rate of the trade. If multiple trades give such a price for a token, the average is used. If there is neither a price nor a backup price, the price is set to zero. This seems to be rare, though (below 1% of tokens).
  • cow_per_batch: This query computes CoW fractions and CoW volumes per batch. It combines CoWs per token with USD volumes, where prices come from token_prices.
  • cow_total: This query just aggregates numbers from cow_per_batch over the full time window.
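The backup-price fallback in token_prices can be illustrated with a minimal sketch. This is an assumption-laden illustration (the function name and values are made up; the actual query works on trade volumes in SQL):

```python
# Sketch of the backup-price fallback described for token_prices:
# when a token has no Dune USD price, imply one from the USD value of
# the other side of the trade and the traded amounts. Prices are USD
# per atom. Names and values are illustrative, not the PR's SQL.

def backup_price_per_atom(other_price_per_atom, other_atoms, this_atoms):
    """USD value of the priced side, divided by this side's atoms."""
    usd_value = other_price_per_atom * other_atoms
    return usd_value / this_atoms

# A trade sells 2e18 atoms of a token priced at 1.5e-15 USD/atom
# (3000 USD) in exchange for 3000e6 atoms of an unpriced token:
implied = backup_price_per_atom(1.5e-15, 2e18, 3000e6)
```

If several trades imply a price for the same token, the query averages them; if none do, the price falls back to zero.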

All queries are parameterized by network (blockchain), start time (start_time, inclusive), and end time (end_time, exclusive). Computing CoW fractions for one month is reasonably fast. I have not checked the query with larger time windows. Running the query for individual transactions is not possible. Instead, a time window containing the transaction should be used and the result filtered on the hash.

Open issues

  • The naming naive_cow could/should be changed to just cow or some more descriptive name. I do not expect there to be one true definition of CoWs. And if there is, it will probably not be easy to compute it on Dune.
  • I am computing a symmetrized version of naive_cow called naive_cow_averaged, in an attempt to translate the CoW metric to individual orders. I no longer think this is possible, so one might as well remove those quantities. (Edit: There might be a way to define per-order CoWs (see edit below), but it does not require a new metric for per-token CoWs.)
  • Translating this metric to individual orders will not be possible in general. There are examples where missing information on which incoming and outgoing transfers to AMMs are connected leads to misclassification. One can still do correlation experiments (e.g. "are larger CoW fractions correlated with use of CoW AMMs?"). (Edit: It is possible using Shapley values, but not something we would want to do on Dune. And it might not really mean what we would want it to mean.)
  • I have not accounted for fees in computing slippage or user amounts. This could bias the final numbers. For example, if large fees are charged, the volume incoming from users might be a lot larger than the volume outgoing to users. If fees are charged in buy tokens, that can increase CoW fractions. Since fees can be of almost the same order of magnitude as CoW volume, fees might have to be accounted for somehow.
  • Some queries might also be useful for slippage accounting. The main things missing would be better prices (and for more tokens) and correct accounting of fees.
  • I have not applied any systematic formatting yet. (Basic sqlfluff formatting is applied now.)

@acanidio-econ commented Sep 2, 2024

I'm wondering if trading with the settlement contract should be considered a CoW. I would say yes: it is a CoW between a user and a solver, and it has all the efficiency gains of any other CoW.

So let's assume for the moment that slippage volume counts as CoW volume, and therefore:
naive_cow = (user_in - amm_out) / user_out
Let's also use the identity:
user_in = user_out + amm_out + slippage_out
Then the CoW volume formula becomes:
naive_cow_volume = sum(token_price * (user_out + slippage_out))

I'm not too sure, but maybe this formula is easier to compute: slippage_out could come from the slippage accounting, and user_out should be easy to check
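The algebra in this comment can be sanity-checked numerically. A quick sketch with illustrative values (slippage_in and amm_in assumed zero, capping ignored, as in the comment):

```python
# Numeric check of the rewriting above: with slippage counted as CoW
# volume and the identity user_in = user_out + amm_out + slippage_out,
# the per-token term token_price * user_out * naive_cow (uncapped)
# equals token_price * (user_out + slippage_out). Values are made up.

user_out, amm_out, slippage_out = 100.0, 40.0, 5.0
user_in = user_out + amm_out + slippage_out  # the assumed identity
token_price = 2.0

naive_cow = (user_in - amm_out) / user_out   # uncapped, slippage as CoW
lhs = token_price * user_out * naive_cow
rhs = token_price * (user_out + slippage_out)
assert abs(lhs - rhs) < 1e-9                 # both sides agree
```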

@fleupold (Contributor) left a comment

I'm wondering if trading with the settlement contract should be considered a CoW. I would say yes: it is a CoW between a user and a solver, and it has all the efficiency gains of any other CoW.

At least historically those trades wouldn't happen at mid; instead they gave the user the same price they would get when trading against the AMM (bid/offer), but allowed them to reduce the fees. While this is not required anymore (solvers can do what they want for internalisations), I do think that in at least some use cases (e.g. deciding whether a solver supports CoW settlements) it's not helpful to consider them as CoWs.

So, if we add this capability to the query, I think it should be made an input argument (count_internalisations_as_cows or the like).

Overall this is a very impressive query (I'd hope we can reuse a large part of it for accounting as well). I do think that adding a proper view on fees is important (we should already have this logic somewhere in Dune by looking at the per trade execution prices vs. the uniform clearing prices).

from erc20_{{blockchain}}.evt_transfer
where
evt_block_time >= cast('{{start_time}}' as timestamp) and evt_block_time < cast('{{end_time}}' as timestamp) -- partition column
and 0x9008D19f58AAbD9eD0D60971565AA8510560ab41 in ("from", to)
Contributor:

I think we can definitely add a list parameter here (and have it 0x9008... for this use case and * for the other one).

Given that this query does a lot of good edge case handling it would be great to combine it.

Contributor Author:

I tried adding a list parameter but there seems to be an issue with using query results to populate it.
E.g. with this adapted query I get an error if I run this one:

Error: Query 4042690 parameter key 'address_list' uses results from another query.

Hard coding a list would work but then one has to keep track of CoW AMM addresses manually.

Contributor:

Yeah I think advanced features (like lists that are based on a query) are not supported.

What you could do is have one base query that takes a string or list with arbitrary options as a base and then have the "nice UX" query specify a list based off a query and call the base query with the selected value.

Contributor Author:

I will tackle that when a new query using that base query is added to the repo.

Comment on lines 37 to 38
buy_price * units_bought / atoms_bought * atoms_bought / atoms_sold as token_price_backup_sell,
sell_price * units_sold / atoms_sold * atoms_sold / atoms_bought as token_price_backup_buy
Contributor:

I believe we are already using this logic for computing the usd_value of each trade, is there maybe a simpler way to express this in terms of that?

Contributor Author:

I rewrote this to use volumes instead.

@harisang (Contributor) commented Sep 2, 2024

I would like to think a bit more about the definition itself. There are three things that concern me for now; two are more aesthetic/abstract and the third is more concrete:

  1. The naive_cow quantity that is computed on a per-token basis is not well-defined if there are no users buying a token. This is a bit annoying, and it is not clear to me why this should be the case. An alternative definition for that particular quantity could be (I assume slippage = 0 here) naive_cow = user_in / (user_out + amm_out). But again, I don't have a good intuitive understanding of the definition yet, so I am not sure why that definition was picked.

  2. The capping also seems to bother me a bit for the naive_cow quantity. Can we have a definition where this is naturally between 0 and 1?

  3. A ring trade where we alternate between user orders and AMMs would give a CoW fraction of zero. One could say that is fine, but instinctively I would like to measure this somehow as some kind of "CoW". The same goes for two users trading in the same direction with a single AMM.

@fhenneke (Contributor Author) commented Sep 3, 2024

  1. The naive_cow quantity that is computed on a per-token basis is not well-defined if there are no users buying a token. This is a bit annoying, and it is not clear to me why this should be the case. An alternative definition for that particular quantity could be (I assume slippage = 0 here) naive_cow = user_in / (user_out + amm_out). But again, I don't have a good intuitive understanding of the definition yet, so I am not sure why that definition was picked.

I also do not like the asymmetry in cow_per_token and that it is not well defined for all traded tokens. Note though, that the quantity cow_per_batch is always well defined.

I played around with a symmetric definition

naive_cow_averaged = ((user_in + user_out) - (amm_in + amm_out) - (slippage_in + slippage_out)) / (user_in + user_out)

and this does make a difference at the moment, mostly because of fees. But (1) the difference is not large, and (2) I do not have a real use for this quantity per token, and for batches the other definition works fine.
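On made-up numbers, the two definitions already differ without fees. A hedged sketch (zero slippage, illustrative amounts; the capping is assumed to apply to both variants):

```python
# Comparison of the capped per-token definition and the symmetrized
# naive_cow_averaged variant, on made-up numbers with zero slippage.

def naive_cow(user_in, amm_out, user_out):
    return min(1.0, max(0.0, (user_in - amm_out) / user_out))

def naive_cow_averaged(user_in, user_out, amm_in, amm_out):
    return min(1.0, max(0.0,
        ((user_in + user_out) - (amm_in + amm_out)) / (user_in + user_out)))

# Users sell 80, users buy 100, an AMM supplies the missing 20:
a = naive_cow(80.0, 0.0, 100.0)                 # 0.8
b = naive_cow_averaged(80.0, 100.0, 20.0, 0.0)  # 160/180, about 0.889
```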

Adding fees in a consistent manner might show what is going on here.

  1. The capping also seems to bother me a bit for the naive_cow quantity. Can we have a definition where this is naturally between 0 and 1?

This is also a bit strange, for sure. Two observations:

  1. Potential CoWs user_in / user_out only need capping from above. That capping comes into effect if the sum of sell amounts for some token is larger than the sum of buy amounts. Then the CoW per token metric gives "100% of funds being bought are covered by sell amounts" and all of buy volume contributes to CoW volume. This seems to correctly identify partial CoW volumes. (But I do not know how general that observation is.)
  2. Realized CoWs (user_in - amm_out) / user_out (let's ignore slippage and fees here) reduce to potential CoWs in the case of "optimal" batching:
  • If user_in > user_out then the remaining tokens should be transferred to amms, amm_out = user_in - user_out, and no amm should send in those tokens, amm_in = 0. Since we have user_in + amm_in = user_out + amm_out, the realized cow fraction becomes (user_in - amm_out) / user_out = (user_out - amm_in) / user_out = 1. This corresponds to capping of potential CoWs.
  • If user_in < user_out then the missing tokens should be transferred from an amm, amm_in = user_out - user_in and amm_out = 0. Thus (user_in - amm_out) / user_out = user_in / user_out. This is the case without capping.
    Without optimal batching, there might be more outflow of a token than inflow from users. For example, the token could be part of an intermediary hop. Then user_in - amm_out could be negative. The implemented approach does not count the buy volume at all for total CoW volume.
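The two optimal-batching cases above can be checked with concrete numbers. A sketch with illustrative amounts (slippage and fees set to zero):

```python
# Numeric check of the two "optimal batching" cases discussed above
# (slippage and fees zero; the amounts are illustrative).

def realized_cow(user_in, amm_out, user_out):
    return (user_in - amm_out) / user_out

# Case 1: user_in > user_out, so the excess goes to AMMs
# (amm_out = user_in - user_out, amm_in = 0) and the fraction is 1.
user_in, user_out = 150.0, 100.0
assert realized_cow(user_in, user_in - user_out, user_out) == 1.0

# Case 2: user_in < user_out, so an AMM supplies the shortfall
# (amm_in = user_out - user_in, amm_out = 0) and the fraction is
# user_in / user_out, matching uncapped potential CoWs.
user_in, user_out = 60.0, 100.0
assert realized_cow(user_in, 0.0, user_out) == 0.6
```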

Nothing super convincing but at least it is somewhat consistent.
One thing I would like to understand is what properties this approach has. For example, is the CoW volume of a combined solution the sum of CoW volumes of its parts (assuming zero slippage)?

  1. A ring trade where we alternate between user orders and AMMs would give a CoW fraction of zero. One could say that is fine, but instinctively I would like to measure this somehow as some kind of "CoW". The same goes for two users trading in the same direction with a single AMM.

I do agree that the definition of CoWs implemented here does not detect all batching capabilities of solvers. It is also possible for market makers to just fake being able to create CoWs (for example in alternating ring trades).
In that sense we could just keep the name naive_cow. Other definitions of CoWs can be added later if we want. They would probably require more heavy machinery and might not play well with Dune.

@harisang (Contributor) commented Sep 3, 2024

Note though, that the quantity cow_per_batch is always well defined.

Hm, how so?

Also, your symmetric definition still suffers from this value not being well-defined for tokens that are not traded by users (i.e., intermediate hops)

@fhenneke (Contributor Author) commented Sep 3, 2024

Note though, that the quantity cow_per_batch is always well defined.

Hm, how so?

Also, your symmetric definition still suffers from this value not being well-defined for tokens that are not traded by users (i.e., intermediate hops)

CoW fractions are only defined on tokens with non-zero buy volume, user_out > 0. Aggregation into batch values also multiplies by user_out. (The averaged definition uses user_in + user_out > 0 instead.)

The definition of the CoW fraction is intuitively "What fraction of buy volume is covered by sell volume (correcting for the amount of sell volume that is actually used differently)?". Multiplying that by the buy volume, we get the buy volume covered by sell volume (correcting for sell volume used differently, i.e. sent to AMMs). Aggregating that gives the CoW volume of a batch.

@fhenneke (Contributor Author) commented Sep 4, 2024

I addressed most comments.

After discussion within the solver team we concluded that we can go with this approach to CoWs for now.

The main missing piece is fees. I will add it at a later time. That will be required since for some solvers, CoW volumes are of the same order of magnitude as fees.

It is relatively easy to base new queries on cow_per_batch. E.g. this one on CoW per solver (observation: Fractal is really going for CoWs). We will also use it for additional rewards for CoWs of CoW AMMs.

This should be good enough to be merged in its current state.

@fhenneke fhenneke merged commit e4d4ce6 into main Sep 5, 2024
1 check passed
@fhenneke fhenneke deleted the coincidence_of_wants branch September 5, 2024 16:06