Restructure payouts file #435
Conversation
This PR changes how missing data for reward targets is identified. It fixes a bug where missing data caused the script to crash. The issue seems to come from a column without valid values being misidentified as a float column, so that the missing value is encoded as `NaN`. This was not identified as missing data, since the check only looked for `None`. With this PR, the pandas function `isna` is used to identify missing data. A similar approach was used in #435.

Since local tests are still running, I created this as a draft PR. I will remove the draft status once the local run is successful.

Co-authored-by: Haris Angelidakis <[email protected]>
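The failure mode can be reproduced in a few lines; the frame and column names here are illustrative, not taken from the codebase:

```python
import numpy as np
import pandas as pd

# A column fetched with no valid values is often inferred as float,
# so the missing entry is encoded as NaN instead of None.
reward_targets = pd.DataFrame({"solver": ["0x01"], "target": [np.nan]})

value = reward_targets["target"].iloc[0]
print(value is None)   # False: NaN is not None, so a None check misses it
print(pd.isna(value))  # True: pd.isna catches both None and NaN
```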
src/fetch/buffer_accounting.py
Outdated
```python
) -> DataFrame:
    """Compute buffer accounting per solver"""

    # validate batch rewards and quote rewards columns
```
The comment seems wrong.
src/fetch/rewards.py
Outdated
```python
def compute_rewards(
    batch_data: DataFrame,
    quote_rewards: DataFrame,
```
I would rename this to `quote_data`, to make it more uniform and also because there are no rewards stored in that dataframe.
Also, we could be more explicit by writing
- `batch_data_per_solver`
- `quote_data_per_solver`
I am torn a bit between using more descriptive names (e.g. `batch_data_per_solver`) and more general names (e.g. `batch_data`).

On the one hand, in the end, the general code should not depend on what data exactly is needed to compute rewards; it just depends on `batch_data`, and we choose what that entails. And the structure of that part of the code was originally intended to end up as something like `rewards = compute_rewards(orderbook, dune, config)`, where all the fetching and computing is abstracted away.

On the other hand, we will not end up with a clean version any time soon. So we might as well be as explicit as possible for now.
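The abstracted end state mentioned in the comment could be sketched as follows; `Orderbook` and `Dune` and their methods are hypothetical stand-ins (the `config` parameter is omitted for brevity), not the project's actual API:

```python
import pandas as pd

class Orderbook:
    def get_batch_data(self) -> pd.DataFrame:  # hypothetical API
        return pd.DataFrame({"solver": ["0x01"], "batch_reward": [10.0]})

class Dune:
    def get_quote_data(self) -> pd.DataFrame:  # hypothetical API
        return pd.DataFrame({"solver": ["0x01"], "num_quotes": [3]})

def compute_rewards(orderbook: Orderbook, dune: Dune) -> pd.DataFrame:
    """All fetching and per-solver aggregation is abstracted away;
    callers only see the final rewards frame."""
    batch_data = orderbook.get_batch_data()
    quote_data = dune.get_quote_data()
    return batch_data.merge(quote_data, on="solver", how="outer")

rewards = compute_rewards(Orderbook(), Dune())
```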
```python
]


def compute_rewards(
```
I was thinking whether we should express rewards in their "native currency", i.e., batch rewards in native token and quote rewards in COW, and then at the final step, when we build the payout, actually do the conversions. This would mean dropping the `primary_reward_cow` column, as well as not passing the `exchange_rate` as a parameter.
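A minimal sketch of that proposal, with made-up numbers and column names: keep batch rewards in native token and quote rewards in COW, and only convert when building the payout.

```python
import pandas as pd

# Rewards kept in their native currencies (illustrative values).
rewards = pd.DataFrame({
    "solver": ["0x01", "0x02"],
    "primary_reward_eth": [1.5, 0.0],  # batch rewards in native token
    "quote_reward_cow": [0.0, 600.0],  # quote rewards in COW
})

# Conversion happens only at payout time, so no exchange_rate parameter
# or primary_reward_cow column is needed earlier in the pipeline.
exchange_rate = 20000.0  # COW per native token, illustrative only
payout_cow = (
    rewards["primary_reward_eth"] * exchange_rate + rewards["quote_reward_cow"]
)
```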
Originally, the intention was to have a data frame `rewards` containing all information for computing reward payments. This also works smoothly with the rest of the payments code, and it completely decouples the reward computation and the creation of payments.

I thought about relaxing this to being able to compute all reward information from `rewards` and `reward_config`. This would allow dropping the strange `reward_token_address` column. But it would still require exchange rates to be part of that data frame.
```python
    for service_fees_flag in solver_info["service_fee"]
]

if not solver_info["solver"].is_unique:
```
Can this happen as part of this function, or is the only way this can happen that the input table itself (`reward_targets`) contains duplicates?
Good point. I'll have another look at that. I do remember needing it at some point, but those queries have changed a bit since then.

Potentially, this can be changed to an assert, and the actual changing of data can happen in the Dune fetching, as for other queries. At some point, all the processing might take place in `compute_solver_info`, but for now we should keep it consistent.
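Turning the check into an assertion, so that duplicates in `reward_targets` surface as a bug in the input rather than being silently fixed here, might look like this (frame contents are illustrative):

```python
import pandas as pd

solver_info = pd.DataFrame(
    {"solver": ["0x01", "0x02"], "service_fee": [True, False]}
)

# With deduplication handled during Dune fetching, a failure here
# indicates bad input data rather than a condition to repair in place.
assert solver_info["solver"].is_unique, "duplicate solvers in reward_targets"
```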
"0x"-prefixed hex representation of address of a solvers bonding pool. | ||
solver_name: str | ||
Name of a solver. | ||
service_fees : DataFrame |
Something is off with indentation here.
I changed the format a bit to make it clearer what is an input and what is a column.

I added some documentation for most functions which were added or changed. I am trying to stick to a numpy/scipy docstring style. Additionally, I added explanations for all expected columns.
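A numpy/scipy-style docstring that documents expected columns, in the spirit described here, might look like the following; the function name, columns, and placeholder logic are illustrative, not the project's actual code:

```python
from pandas import DataFrame

def compute_solver_info(reward_targets: DataFrame) -> DataFrame:
    """Compute solver information for payouts.

    Parameters
    ----------
    reward_targets : DataFrame
        Data frame with one row per solver, containing the columns:

        solver : str
            "0x"-prefixed hex representation of a solver address.
        solver_name : str
            Name of a solver.

    Returns
    -------
    DataFrame
        The input frame with an additional boolean column service_fee.
    """
    result = reward_targets.copy()
    result["service_fee"] = False  # placeholder logic for illustration
    return result
```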
src/fetch/payouts.py
Outdated
```python
solver_payouts["reward_token_address"] = (
    solver_payouts["reward_token_address"]
    .fillna(
        "0x0000000000000000000000000000000000000001"
```
Is this ever needed btw?
We could alternatively add an `assert` here, as otherwise the dataframe doesn't make sense (unless the corresponding row contains all zeros).
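Such an assert could be sketched like so, failing only when an address is missing for a row that actually pays out something; column names and values are illustrative:

```python
import pandas as pd

solver_payouts = pd.DataFrame({
    "reward_token_address": ["0x0000000000000000000000000000000000000abc", None],
    "primary_reward_cow": [10.0, 0.0],
})

# Fail loudly instead of filling in a placeholder address: a missing
# address only makes sense if the row pays out nothing.
missing_address = solver_payouts["reward_token_address"].isna()
nonzero_payout = solver_payouts["primary_reward_cow"] != 0
assert not (missing_address & nonzero_payout).any(), (
    "missing reward token address for a solver with nonzero rewards"
)
```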
This is quite a bit of code smell. I added it because without this default there were errors, but I do not exactly know why.

The reward token address is a required field in the `RewardAndPenaltyDatum` class and will be turned into an `Address`. The address is, however, never used for solvers who do not get a reward. In principle, whenever a solver settled an auction or provided a quote, they should have an entry in `rewards`, and thus filling in the defaults should not be required.

I will have to have another look (or follow up on this in another PR).
I removed this default value and adapted one of the tests accordingly.
Looks good! There is one small comment to be addressed, but it is not a deal breaker.
also adapt corresponding tests
This PR is an attempt at implementing #427. The PR does not change the values of final results (up to rounding of floats). It changes the structure of the code and intermediate representations.
As a first step, it separates the computation of different parts of the accounting into different functions.
Those functions are implemented in separate files.
The results of these steps are converted into data frames for solver payments and for protocol and partner fee payments.
Payout data on transfers and overdrafts is then computed from solver and partner payouts data.
The code can be tested to produce the same transfer files as the old code.
Tests have been adapted and cover essentially what was tested before. There is a bit more strictness in the testing of the separate computations of rewards, protocol fees, partner fees, etc.
There is no end-to-end test for payments yet. This should be added at some point.
Future changes could remove data frames from intermediate results. This would make it easier to have correct types and to detect and handle missing data. Data for the different parts of the accounting can be changed to use intermediate tables generated by `src/data_sync/sync_data.py`.