Restructure payouts file #435
Conversation
This PR changes how missing data for reward targets is identified. It fixes a bug where missing data caused the script to crash. The issue seems to come from a column without valid values being misidentified as a float column, so that the missing value is encoded as `NaN`. This was not identified as missing data, since the check only looked for `None`. With this PR, the pandas function `isna` is used to identify missing data. A similar approach was used in #435.

Since local tests are still running, I created this as a draft PR. I will remove the draft status once the local run is successful.

Co-authored-by: Haris Angelidakis <[email protected]>
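The failure mode can be reproduced in a few lines; the frame and column names here are illustrative, not taken from the codebase:

```python
import numpy as np
import pandas as pd

# A column fetched with no valid values is often inferred as float,
# so the missing entry is encoded as NaN instead of None.
reward_targets = pd.DataFrame({"solver": ["0x01"], "target": [np.nan]})

value = reward_targets["target"].iloc[0]
print(value is None)   # False: NaN is not None, so a None check misses it
print(pd.isna(value))  # True: pd.isna catches both None and NaN
```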
src/fetch/buffer_accounting.py
Outdated
```python
) -> DataFrame:
    """Compute buffer accounting per solver"""

    # validate batch rewards and quote rewards columns
```
The comment seems wrong.
src/fetch/rewards.py
Outdated
```python
def compute_rewards(
    batch_data: DataFrame,
    quote_rewards: DataFrame,
```
I would rename this to `quote_data`, to make it more uniform and also because there are no rewards stored in that dataframe.
Also, we could be more explicit by writing
- `batch_data_per_solver`
- `quote_data_per_solver`
I am torn a bit between using more descriptive names (e.g. `batch_data_per_solver`) and more general names (e.g. `batch_data`).

On the one hand, in the end, the general code should not depend on what data exactly is needed to compute rewards; it just depends on `batch_data`, and we choose what that entails. And the structure of that part of the code was originally intended to end up as something like `rewards = compute_rewards(orderbook, dune, config)`, where all the fetching and computing is abstracted away.

On the other hand, we will not end up with a clean version any time soon. So we might as well be as explicit as possible for now.
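The abstracted end state mentioned in the comment could be sketched as follows; `Orderbook` and `Dune` and their methods are hypothetical stand-ins (the `config` parameter is omitted for brevity), not the project's actual API:

```python
import pandas as pd

class Orderbook:
    def get_batch_data(self) -> pd.DataFrame:  # hypothetical API
        return pd.DataFrame({"solver": ["0x01"], "batch_reward": [10.0]})

class Dune:
    def get_quote_data(self) -> pd.DataFrame:  # hypothetical API
        return pd.DataFrame({"solver": ["0x01"], "num_quotes": [3]})

def compute_rewards(orderbook: Orderbook, dune: Dune) -> pd.DataFrame:
    """All fetching and per-solver aggregation is abstracted away;
    callers only see the final rewards frame."""
    batch_data = orderbook.get_batch_data()
    quote_data = dune.get_quote_data()
    return batch_data.merge(quote_data, on="solver", how="outer")

rewards = compute_rewards(Orderbook(), Dune())
```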
```python
]


def compute_rewards(
```
I was thinking whether we should express rewards in their "native currency", i.e., batch rewards in native token and quote rewards in COW, and then at the final step, when we build the payout, actually do the conversions. This would mean dropping the `primary_reward_cow` column, as well as not passing the `exchange_rate` as a parameter.
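A minimal sketch of that proposal, with made-up numbers and column names: keep batch rewards in native token and quote rewards in COW, and only convert when building the payout.

```python
import pandas as pd

# Rewards kept in their native currencies (illustrative values).
rewards = pd.DataFrame({
    "solver": ["0x01", "0x02"],
    "primary_reward_eth": [1.5, 0.0],  # batch rewards in native token
    "quote_reward_cow": [0.0, 600.0],  # quote rewards in COW
})

# Conversion happens only at payout time, so no exchange_rate parameter
# or primary_reward_cow column is needed earlier in the pipeline.
exchange_rate = 20000.0  # COW per native token, illustrative only
payout_cow = (
    rewards["primary_reward_eth"] * exchange_rate + rewards["quote_reward_cow"]
)
```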
Originally, the intention was to have a data frame `rewards` containing all information for computing reward payments. This also works smoothly with the rest of the payments code, and it completely decouples the reward computation and the creation of payments.

I thought about relaxing this to being able to compute all reward information from `rewards` and `reward_config`. This would allow dropping the strange `reward_token_address` column. But it would still require exchange rates to be part of that data frame.
```python
    for service_fees_flag in solver_info["service_fee"]
]

if not solver_info["solver"].is_unique:
```
Can this happen as part of this function, or is the only way this can happen that the input table itself (`reward_targets`) contains duplicates?
Good point. I'll have another look at that. I do remember needing it at some point, but those queries have changed a bit since then.

Potentially, this can be changed to an assert, and the actual changing of data can happen in the Dune fetching, as for other queries. At some point, all the processing might take place in `compute_solver_info`, but for now we should keep it consistent.
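Turning the check into an assertion, so that duplicates in `reward_targets` surface as a bug in the input rather than being silently fixed here, might look like this (frame contents are illustrative):

```python
import pandas as pd

solver_info = pd.DataFrame(
    {"solver": ["0x01", "0x02"], "service_fee": [True, False]}
)

# With deduplication handled during Dune fetching, a failure here
# indicates bad input data rather than a condition to repair in place.
assert solver_info["solver"].is_unique, "duplicate solvers in reward_targets"
```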
"0x"-prefixed hex representation of address of a solvers bonding pool. | ||
solver_name: str | ||
Name of a solver. | ||
service_fees : DataFrame |
Something is off with indentation here.
I changed the format a bit to make it clearer what is an input and what is a column.

I added some documentation for most functions which were added or changed. I am trying to stick to a numpy/scipy docstring style. Additionally, I added explanations for all expected columns.
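A numpy/scipy-style docstring that documents expected columns, in the spirit described here, might look like the following; the function name, columns, and placeholder logic are illustrative, not the project's actual code:

```python
from pandas import DataFrame

def compute_solver_info(reward_targets: DataFrame) -> DataFrame:
    """Compute solver information for payouts.

    Parameters
    ----------
    reward_targets : DataFrame
        Data frame with one row per solver, containing the columns:

        solver : str
            "0x"-prefixed hex representation of a solver address.
        solver_name : str
            Name of a solver.

    Returns
    -------
    DataFrame
        The input frame with an additional boolean column service_fee.
    """
    result = reward_targets.copy()
    result["service_fee"] = False  # placeholder logic for illustration
    return result
```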
src/fetch/payouts.py
Outdated
```python
solver_payouts["reward_token_address"] = (
    solver_payouts["reward_token_address"]
    .fillna(
        "0x0000000000000000000000000000000000000001"
```
Is this ever needed btw?
We could alternatively add an `assert` here, as otherwise the dataframe doesn't make sense (unless the corresponding row contains all zeros).
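Such an assert could be sketched like so, failing only when an address is missing for a row that actually pays out something; column names and values are illustrative:

```python
import pandas as pd

solver_payouts = pd.DataFrame({
    "reward_token_address": ["0x0000000000000000000000000000000000000abc", None],
    "primary_reward_cow": [10.0, 0.0],
})

# Fail loudly instead of filling in a placeholder address: a missing
# address only makes sense if the row pays out nothing.
missing_address = solver_payouts["reward_token_address"].isna()
nonzero_payout = solver_payouts["primary_reward_cow"] != 0
assert not (missing_address & nonzero_payout).any(), (
    "missing reward token address for a solver with nonzero rewards"
)
```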
This is quite a bit of code smell. I added it because without this default there were errors, but I do not exactly know why.

The reward token address is a required field in the `RewardAndPenaltyDatum` class and will be turned into an `Address`. The address is, however, never used for solvers who do not get a reward. In principle, whenever a solver settled an auction or provided a quote, they should have an entry in `rewards`, and thus filling in the defaults should not be required.

I will have to have another look (or follow up on this in another PR).
I removed this default value and adapted one of the tests accordingly.
Looks good! There is one small comment to be addressed, but it is not a deal breaker.
also adapt corresponding tests
This PR is an attempt at implementing #427. The PR does not change the values of final results (up to rounding of floats). It changes the structure of the code and intermediate representations.
As a first step, it separates the computation of different parts of the accounting into different functions.
Those functions are implemented in separate files.
The results of these steps are converted into data frames for solver payments and for protocol and partner fee payments.
Payout data on transfers and overdrafts is then computed from solver and partner payouts data.
The code can be tested to produce the same transfer files as the old code.
Tests have been adapted and cover essentially what was tested before. There is a bit more strictness in the testing of the separate computations of rewards, protocol fees, partner fees, etc.
There is no end-to-end test for payments yet. This should be added at some point.
Future changes could remove data frames from intermediate results. This would make it easier to have correct types and to detect and handle missing data. Data for the different parts of the accounting can be changed to use intermediate tables generated by `src/data_sync/sync_data.py`.