
Coincidence of Wants query #7

Merged
merged 6 commits into main on Sep 5, 2024

Conversation

@fhenneke (Contributor) commented Aug 31, 2024

This PR adds queries for computing Coincidence of Wants (CoWs) per batch.

The approach to CoWs follows a master's thesis by Vigan Lladrovci. It is based on aggregated amounts transferred into and out of the settlement contract by users and other sources (AMMs, PMMs, ...). This gives a metric for CoWs, and for potential CoWs, per batch. The main idea is to aggregate, per token bought in a batch, the difference between the total USD volume of that token sold and the total volume sent towards AMMs. Aggregated over all tokens in a batch, this quantity can be compared to the total volume.

The original approach had problems taking internalizations into account and excluded such batches from its analysis. Here, the approach is modified to handle those cases.

Most important formulas

Per token, the CoW fraction

naive_cow = (user_in - amm_out - slippage_in) / user_out

is computed, capped at 1 from above and 0 from below. This fraction can be used to compute the CoW volume per batch

naive_cow_volume = sum(token_price * user_out * naive_cow)

and a CoW fraction per batch

naive_cow_volume / total_volume

In other words, the CoW value per batch is the average of the per-token CoW fractions, weighted by the USD volume of user_out over all bought tokens.
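As an illustration, the formulas above can be sketched in Python. This is a hedged sketch, not the PR's implementation (which is Dune SQL); the field names are assumptions mirroring the text.

```python
# Sketch of the per-token CoW fraction and its per-batch aggregation,
# following the formulas above. Illustrative only; the PR implements
# this logic in Dune SQL, and the field names here are assumptions.

def naive_cow(user_in, amm_out, slippage_in, user_out):
    """Per-token CoW fraction, capped at 1 from above and 0 from below.
    Assumes user_out > 0 (the fraction is only defined for bought tokens)."""
    raw = (user_in - amm_out - slippage_in) / user_out
    return min(1.0, max(0.0, raw))

def batch_cow(tokens):
    """tokens: per-token dicts of amounts plus a USD price per atom.
    Returns (cow_volume, cow_fraction) for the batch."""
    cow_volume = 0.0
    total_volume = 0.0
    for t in tokens:
        frac = naive_cow(t["user_in"], t["amm_out"], t["slippage_in"], t["user_out"])
        buy_volume = t["token_price"] * t["user_out"]
        cow_volume += buy_volume * frac
        total_volume += buy_volume
    return cow_volume, (cow_volume / total_volume if total_volume > 0 else 0.0)
```

For example, a token whose buy volume is fully covered by user sells contributes its full USD volume to the batch's CoW volume, while a token covered 20% contributes 20% of its volume.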

Queries

There are six queries to compute CoW fractions per batch, and one query aggregating batch values into a total. (Dashed squares and lines correspond to potential future queries.)

[Diagram: dependency graph of the CoW queries (CoWs.drawio)]

  • balance_changes: This query collects all balance changes by looking at erc20 transfers, native token transfers, and additional changes from e.g. deposit and withdrawal events. A similar query is used for monitoring CoW AMMs. I do not know how to combine those queries, though, as I do not know how to programmatically choose a parameter from within a query. If that were possible, this query could be parametrized by a list of addresses and that list would be chosen to only contain the settlement contract in the CoW use case.
  • imbalances: This query aggregates balance changes into one signed value per token. It should compute the same values as https://github.com/cowprotocol/token-imbalances for raw token imbalances. The edge case of the protocol trading is handled consistently with the current slippage accounting: even though the buffer values are reduced significantly (fees are withdrawn from the contract), the imbalances only account for unexpected changes, by counting transfers from the settlement contract to itself as incoming transfers.
  • classified_balance_changes: This query classifies all balance changes into the categories user_in, user_out, amm_in, and amm_out. Additionally it adds the remaining imbalance computed in the imbalance query as slippage_in or slippage_out depending on sign.
  • cow_per_token: This query computes CoW fractions per token. One should be cautious in interpreting this value. It does not have much meaning on its own (e.g. because the symmetry between buy and sell is broken) but can be turned into a meaningful aggregate value per batch.
  • token_prices: This query computes token prices for tokens bought and sold by users, from Dune data and the exchange rate of trades. Prices are in USD per atom of the token. Instead of joining on Dune data, it uses the cow_protocol_{{blockchain}}.trades table, which contains Dune USD prices if they exist. If the USD price does not exist, a backup price is recovered from the price of the other traded token and the effective exchange rate of the trade. If multiple trades give such a price for a token, the average is used. If there is neither a price nor a backup price, the price is set to zero. This seems to be rare, though (below 1% of tokens).
  • cow_per_batch: This query computes CoW fractions and CoW volumes per batch. It combines CoWs per token with USD volumes, where prices come from token_prices.
  • cow_total: This query just aggregates numbers from cow_per_batch over the full time window.
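The backup-price fallback in token_prices can be illustrated with a minimal sketch. This is an assumption-laden illustration (the function name and values are made up; the actual query works on trade volumes in SQL):

```python
# Sketch of the backup-price fallback described for token_prices:
# when a token has no Dune USD price, imply one from the USD value of
# the other side of the trade and the traded amounts. Prices are USD
# per atom. Names and values are illustrative, not the PR's SQL.

def backup_price_per_atom(other_price_per_atom, other_atoms, this_atoms):
    """USD value of the priced side, divided by this side's atoms."""
    usd_value = other_price_per_atom * other_atoms
    return usd_value / this_atoms

# A trade sells 2e18 atoms of a token priced at 1.5e-15 USD/atom
# (3000 USD) in exchange for 3000e6 atoms of an unpriced token:
implied = backup_price_per_atom(1.5e-15, 2e18, 3000e6)
```

If several trades imply a price for the same token, the query averages them; if none do, the price falls back to zero.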

All queries are parameterized by network (blockchain), start time (start_time, inclusive), and end time (end_time, exclusive). Computing CoW fractions for one month is reasonably fast. I have not checked the query with larger time windows. Running the query for individual transactions is not possible. Instead, a time window containing the transaction should be used and the result filtered on the hash.

Open issues

  • The naming naive_cow could/should be changed to just cow or some more descriptive name. I do not expect there to be one true definition of CoWs. And if there is, it will probably not be easy to compute it on Dune.
  • I am computing a symmetrized version of naive_cow called naive_cow_averaged, in an attempt to translate the CoW metric to individual orders. I no longer think this is possible, so one might as well remove those quantities. (Edit: There might be a way to define per-order CoWs (see edit below), but it does not require a new metric for per-token CoWs.)
  • Translating this metric to individual orders will not be possible in general. There are examples where missing information on which incoming and outgoing transfers to AMMs are connected leads to misclassification. One can still do correlation experiments (e.g. "are larger CoW fractions correlated with use of CoW AMMs?"). (Edit: It is possible using Shapley values, but not something we would want to do on Dune. And it might not really mean what we would want it to mean.)
  • I have not accounted for fees in computing slippage or user amounts. This could bias the final numbers. For example, if large fees are charged, the volume incoming from users might be a lot larger than the volume outgoing to users. If fees are charged in buy tokens, that can increase CoW fractions. Since fees can be of almost the same order of magnitude as CoW volume, fees might have to be accounted for somehow.
  • Some queries might also be useful for slippage accounting. The main things missing would be better prices (and for more tokens) and correct accounting of fees.
  • I have not applied any systematic formatting yet. (Basic sqlfluff formatting is applied now.)

@acanidio-econ commented Sep 2, 2024

I'm wondering if trading with the settlement contract should be considered a CoW. I would say yes: it is a CoW between a user and a solver, and it has all the efficiency gains of any other CoW.

So let's assume for the moment that slippage volume counts as CoW volume, and therefore:
naive_cow = (user_in - amm_out) / user_out
Let's also use the identity:
user_in = user_out + amm_out + slippage_out
Then the CoW volume formula becomes:
naive_cow_volume = sum(token_price * (user_out + slippage_out))

I'm not too sure, but maybe this formula is easier to compute: slippage_out could come from the slippage accounting, and user_out should be easy to check
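The algebra in this comment can be sanity-checked numerically. A quick sketch with illustrative values (slippage_in and amm_in assumed zero, capping ignored, as in the comment):

```python
# Numeric check of the rewriting above: with slippage counted as CoW
# volume and the identity user_in = user_out + amm_out + slippage_out,
# the per-token term token_price * user_out * naive_cow (uncapped)
# equals token_price * (user_out + slippage_out). Values are made up.

user_out, amm_out, slippage_out = 100.0, 40.0, 5.0
user_in = user_out + amm_out + slippage_out  # the assumed identity
token_price = 2.0

naive_cow = (user_in - amm_out) / user_out   # uncapped, slippage as CoW
lhs = token_price * user_out * naive_cow
rhs = token_price * (user_out + slippage_out)
assert abs(lhs - rhs) < 1e-9                 # both sides agree
```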

@fleupold (Contributor) left a comment

I'm wondering if trading with the settlement contract should be considered a CoW. I would say yes: it is a CoW between a user and a solver, and it has all the efficiency gains of any other CoW.

At least historically those trades wouldn't happen at mid; instead they gave the user the same price they would get when trading against the AMM (bid/offer), but allowed them to reduce the fees. While this is not required anymore (solvers can do what they want for internalisations), I do think that in at least some use cases (e.g. deciding whether a solver supports CoW settlements) it's not helpful to consider them as CoWs.

So, if we add this capability to the query, I think it should be made an input argument (count_internalisations_as_cows or the like).

Overall this is a very impressive query (I'd hope we can reuse a large part of it for accounting as well). I do think that adding a proper view on fees is important (we should already have this logic somewhere in Dune by looking at the per trade execution prices vs. the uniform clearing prices).

from erc20_{{blockchain}}.evt_transfer
where
evt_block_time >= cast('{{start_time}}' as timestamp) and evt_block_time < cast('{{end_time}}' as timestamp) -- partition column
and 0x9008D19f58AAbD9eD0D60971565AA8510560ab41 in ("from", to)
Contributor:

I think we can definitely add a list parameter here (and have it 0x9008... for this use case and * for the other one).

Given that this query does a lot of good edge case handling it would be great to combine it.

Contributor Author:

I tried adding a list parameter but there seems to be an issue with using query results to populate it.
E.g. with this adapted query I get an error if I run this one:

Error: Query 4042690 parameter key 'address_list' uses results from another query.

Hard coding a list would work but then one has to keep track of CoW AMM addresses manually.

Contributor:

Yeah I think advanced features (like lists that are based on a query) are not supported.

What you could do is have one base query that takes a string or list with arbitrary options as a base and then have the "nice UX" query specify a list based off a query and call the base query with the selected value.

Contributor Author:

I will tackle that when a new query using that base query is added to the repo.

Comment on lines 37 to 38
buy_price * units_bought / atoms_bought * atoms_bought / atoms_sold as token_price_backup_sell,
sell_price * units_sold / atoms_sold * atoms_sold / atoms_bought as token_price_backup_buy
Contributor:

I believe we are already using this logic for computing the usd_value of each trade, is there maybe a simpler way to express this in terms of that?

Contributor Author:

I rewrote this to use volumes instead.

@harisang (Contributor) commented Sep 2, 2024

I would like to think a bit more about the definition itself. There are three things that concern me for now; two are more aesthetic/abstract and the third is more concrete:

  1. The naive_cow quantity that is computed on a per-token basis is not well-defined if there are no users buying a token. This is a bit annoying, and it is not clear to me why this should be the case. An alternative definition for that particular quantity could be (I assume slippage = 0 here) naive_cow = user_in / (user_out + amm_out). But again, I don't have a good intuitive understanding of the definition yet, so I am not sure why that definition was picked.

  2. The capping also seems to bother me a bit for the naive_cow quantity. Can we have a definition where this is naturally between 0 and 1?

  3. A ring trade where we alternate between user orders and AMMs would give a CoW fraction of zero. One could say that is fine, but instinctively I would like to measure this somehow as some kind of "CoW". The same goes for two users trading in the same direction with a single AMM.

@fhenneke (Contributor Author) commented Sep 3, 2024

  1. The naive_cow quantity that is computed on a per-token basis is not well-defined if there are no users buying a token. This is a bit annoying, and it is not clear to me why this should be the case. An alternative definition for that particular quantity could be (I assume slippage = 0 here) naive_cow = user_in / (user_out + amm_out). But again, I don't have a good intuitive understanding of the definition yet, so I am not sure why that definition was picked.

I also do not like the asymmetry in cow_per_token and that it is not well defined for all traded tokens. Note though, that the quantity cow_per_batch is always well defined.

I played around with a symmetric definition

naive_cow_averaged = ((user_in + user_out) - (amm_in + amm_out) - (slippage_in + slippage_out)) / (user_in + user_out)

and this does make a difference at the moment, mostly because of fees. But (1) the difference is not large, and (2) I do not have a real use for this quantity per token, and for batches the other definition works fine.
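On made-up numbers, the two definitions already differ without fees. A hedged sketch (zero slippage, illustrative amounts; the capping is assumed to apply to both variants):

```python
# Comparison of the capped per-token definition and the symmetrized
# naive_cow_averaged variant, on made-up numbers with zero slippage.

def naive_cow(user_in, amm_out, user_out):
    return min(1.0, max(0.0, (user_in - amm_out) / user_out))

def naive_cow_averaged(user_in, user_out, amm_in, amm_out):
    return min(1.0, max(0.0,
        ((user_in + user_out) - (amm_in + amm_out)) / (user_in + user_out)))

# Users sell 80, users buy 100, an AMM supplies the missing 20:
a = naive_cow(80.0, 0.0, 100.0)                 # 0.8
b = naive_cow_averaged(80.0, 100.0, 20.0, 0.0)  # 160/180, about 0.889
```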

Adding fees in a consistent manner might show what is going on here.

  1. The capping also seems to bother me a bit for the naive_cow quantity. Can we have a definition where this is naturally between 0 and 1?

This is also a bit strange, for sure. Two observations:

  1. Potential CoWs user_in / user_out only need capping from above. That capping comes into effect if the sum of sell amounts for some token is larger than the sum of buy amounts. Then the CoW per token metric gives "100% of funds being bought are covered by sell amounts" and all of buy volume contributes to CoW volume. This seems to correctly identify partial CoW volumes. (But I do not know how general that observation is.)
  2. Realized CoWs (user_in - amm_out) / user_out (let's ignore slippage and fees here) reduce to potential CoWs in the case of "optimal" batching:
  • If user_in > user_out then the remaining tokens should be transferred to amms, amm_out = user_in - user_out, and no amm should send in those tokens, amm_in = 0. Since we have user_in + amm_in = user_out + amm_out, the realized cow fraction becomes (user_in - amm_out) / user_out = (user_out - amm_in) / user_out = 1. This corresponds to capping of potential CoWs.
  • If user_in < user_out then the missing tokens should be transferred from an amm, amm_in = user_out - user_in and amm_out = 0. Thus (user_in - amm_out) / user_out = user_in / user_out. This is the case without capping.
    Without optimal batching, there might be more outflow of a token than inflow from users. For example, the token could be part of an intermediary hop. Then user_in - amm_out could be negative. The implemented approach does not count the buy volume at all for total CoW volume.
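The two optimal-batching cases above can be checked with concrete numbers. A sketch with illustrative amounts (slippage and fees set to zero):

```python
# Numeric check of the two "optimal batching" cases discussed above
# (slippage and fees zero; the amounts are illustrative).

def realized_cow(user_in, amm_out, user_out):
    return (user_in - amm_out) / user_out

# Case 1: user_in > user_out, so the excess goes to AMMs
# (amm_out = user_in - user_out, amm_in = 0) and the fraction is 1.
user_in, user_out = 150.0, 100.0
assert realized_cow(user_in, user_in - user_out, user_out) == 1.0

# Case 2: user_in < user_out, so an AMM supplies the shortfall
# (amm_in = user_out - user_in, amm_out = 0) and the fraction is
# user_in / user_out, matching uncapped potential CoWs.
user_in, user_out = 60.0, 100.0
assert realized_cow(user_in, 0.0, user_out) == 0.6
```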

Nothing super convincing but at least it is somewhat consistent.
One thing I would like to understand is what properties this approach has. For example, is the CoW volume of a combined solution the sum of CoW volumes of its parts (assuming zero slippage)?

  1. A ring trade where we alternate between user orders and AMMs would give a CoW fraction of zero. One could say that is fine, but instinctively I would like to measure this somehow as some kind of "CoW". The same goes for two users trading in the same direction with a single AMM.

I do agree that the definition of CoWs implemented here does not detect all batching capabilities of solvers. It is also possible for market makers to just fake being able to create CoWs (for example in alternating ring trades).
In that sense we could just keep the name naive_cow. Other definitions of CoWs can be added later if we want. They would probably require more heavy machinery and might not play well with Dune.

@harisang (Contributor) commented Sep 3, 2024

Note though, that the quantity cow_per_batch is always well defined.

Hm, how so?

Also, your symmetric definition still suffers from this value not being well-defined for tokens that are not traded by users (i.e., intermediate hops)

@fhenneke (Contributor Author) commented Sep 3, 2024

Note though, that the quantity cow_per_batch is always well defined.

Hm, how so?

Also, your symmetric definition still suffers from this value not being well-defined for tokens that are not traded by users (i.e., intermediate hops)

CoW fractions are only defined on tokens with non-zero buy volume, user_out > 0. Aggregation into batch values also multiplies by user_out. (The averaged definition uses user_in + user_out > 0 instead.)

The definition of the CoW fraction is intuitively "What fraction of buy volume is covered by sell volume (correcting for the amount of sell volume that is actually used differently)?". Multiplying that by the buy volume, we get the buy volume covered by sell volume (correcting for sell volume used differently, i.e. sent to AMMs). Aggregating that gives the CoW volume of a batch.

@fhenneke (Contributor Author) commented Sep 4, 2024

I addressed most comments.

After discussion within the solver team we concluded that we can go with this approach to CoWs for now.

The main missing piece is fees. I will add it at a later time. That will be required since for some solvers, CoW volumes are of the same order of magnitude as fees.

It is relatively easy to base new queries on cow_per_batch. E.g. this one on CoW per solver (observation: Fractal is really going for CoWs). We will also use it for additional rewards for CoWs of CoW AMMs.

This should be good enough to be merged in its current state.

@fhenneke fhenneke merged commit e4d4ce6 into main Sep 5, 2024
1 check passed
@fhenneke fhenneke deleted the coincidence_of_wants branch September 5, 2024 16:06