This repository has been archived by the owner on Dec 20, 2024. It is now read-only.

Introduce new logic for syncing raw batch data #140

Open: harisang wants to merge 26 commits into dev

@harisang harisang commented Nov 21, 2024

This PR introduces an alternative way of syncing raw batch data to Dune: it uses the Dune API to upload the data as CSV files.

The main ideas:

  • It creates one table per month.
  • To ensure all data is uploaded, on the first day of each month it recomputes the previous month's table and re-uploads it.
  • It finally introduces the auction id and environment as entries in the batch_data table!
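
The monthly scheme above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the function names are hypothetical, and the table name pattern is an assumption inferred from the test table queried further down.

```python
from datetime import date


def previous_month(year: int, month: int) -> tuple[int, int]:
    """Return the (year, month) immediately before the given one."""
    return (year - 1, 12) if month == 1 else (year, month - 1)


def months_to_sync(today: date) -> list[tuple[int, int]]:
    """Months whose tables should be (re)computed when run on `today`.

    The current month is always synced. On the first day of a month the
    previous month is recomputed as well, so data that arrived late is
    still uploaded.
    """
    months = [(today.year, today.month)]
    if today.day == 1:
        months.insert(0, previous_month(today.year, today.month))
    return months


def table_name(network: str, year: int, month: int) -> str:
    """Hypothetical naming pattern: one table per network and month."""
    return f"batch_rewards_{network}_{year}_{month:02d}"
```

For example, a run on 2024-12-01 would cover both November and December 2024, while a run on any other day of December would cover December only.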

To test it, one can run

python -m src.main --sync-table batch_data

and check the results by running the following query on Dune:

select * from dune.cowprotocol.dataset_test_batch_rewards_ethereum_2024_11 order by block_deadline DESC

New dependencies detected (Socket for GitHub):

| Package | New capabilities | Transitives | Size | Publisher |
|---|---|---|---|---|
| pypi/[email protected] | environment, eval, filesystem, network, shell; Transitive: unsafe | +2 | 7.11 MB | carver, fselmo, kclowes, ...5 more |

@fhenneke fhenneke left a comment

Changes look reasonable.

What is the strategy with respect to rerunning old months? It looks as if the script is not invoked with a parameter and always just uses the current time to determine the month.

.replace("{{start_block}}", str(block_range.block_from))
.replace("{{end_block}}", str(block_range.block_to))
.replace(
"{{EPSILON_LOWER}}", "10000000000000000"
Contributor

The value of the cap needs to be adapted for Gnosis and Arbitrum One

Contributor Author

Oh very good point!
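
One way to address this is a per-network lookup instead of a hard-coded string. The sketch below is an assumption, not the change made in the PR: the network keys are hypothetical, the value shown in the diff is assumed to be the mainnet one, and the Gnosis and Arbitrum One values are deliberately left unset because they are not stated in this thread.

```python
from typing import Optional

# Lower cap per network. Only the value visible in the diff is filled in
# (assumed here to apply to mainnet); the others must be supplied.
EPSILON_LOWER_BY_NETWORK: dict[str, Optional[str]] = {
    "mainnet": "10000000000000000",  # 10**16, taken from the diff
    "xdai": None,  # hypothetical key for Gnosis, value not given
    "arbitrum-one": None,  # hypothetical key, value not given
}


def epsilon_lower(network: str) -> str:
    """Return the configured cap, failing loudly if it is missing."""
    value = EPSILON_LOWER_BY_NETWORK.get(network)
    if value is None:
        raise ValueError(f"no EPSILON_LOWER configured for network {network!r}")
    return value


def fill_cap(query_template: str, network: str) -> str:
    """Substitute the network-specific cap into the query template."""
    return query_template.replace("{{EPSILON_LOWER}}", epsilon_lower(network))
```

Raising on a missing value avoids silently applying the mainnet cap to another chain.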

@harisang
Contributor Author

> Changes look reasonable.
>
> What is the strategy with respect to rerunning old months? It looks as if the script is not invoked with a parameter and always just uses the current time to determine the month.

Currently it's a very simple strategy: if it is the first day of the month, the script reruns the previous month as well. This assumes that the script always runs at least once every 24h. Ideally this should be made more robust, and I would like to add a runtime argument to specify the starting month as well, in case we want to rerun it for several months.
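
The runtime argument mentioned here could look roughly like the sketch below. The `--start-month` flag and the helper names are hypothetical and not part of this PR; only `--sync-table` appears in the description.

```python
import argparse


def parse_month(value: str) -> tuple[int, int]:
    """Parse 'YYYY-MM' into (year, month); raises ValueError on bad input."""
    year_str, month_str = value.split("-")
    year, month = int(year_str), int(month_str)
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month: {value}")
    return year, month


def month_range(start: tuple[int, int], end: tuple[int, int]) -> list[tuple[int, int]]:
    """List every (year, month) from start to end, inclusive."""
    months = []
    year, month = start
    while (year, month) <= end:
        months.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--sync-table", required=True)
    # Hypothetical flag: first month to (re)sync, given as YYYY-MM.
    parser.add_argument("--start-month", type=parse_month, default=None)
    return parser
```

When the flag is omitted, the script could fall back to the first-day-of-month rule; when it is given, every month from the start month up to the current one would be recomputed.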

+ os.environ.get("NETWORK", "mainnet")
)
else:
db_url = os.environ[f"{db_env}_DB_URL"]
Contributor Author

This PR changes the meaning of the PROD_DB_URL and BARN_DB_URL env variables, and this else statement just ensures that it defaults to the "old" way, so that the tests won't fail. This should be removed once the secrets in the repo are edited.
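
As a rough illustration of the fallback described here: the condition that selects the new behaviour is not visible in the diff excerpt, so the sketch uses an explicit, hypothetical `use_network_suffix` flag, and joining the network name with a slash is likewise an assumption.

```python
import os
from typing import Mapping


def resolve_db_url(
    db_env: str,
    use_network_suffix: bool,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Build the database URL for db_env ("PROD" or "BARN").

    New meaning (assumed): the variable holds a base URL and the network
    name is appended. Old meaning: the variable holds the full URL.
    """
    base = env[f"{db_env}_DB_URL"]
    if use_network_suffix:
        network = env.get("NETWORK", "mainnet")
        return f"{base.rstrip('/')}/{network}"
    # Fallback to the "old" behaviour so existing secrets keep working.
    return base
```

Once the repository secrets are updated, the fallback branch and the flag could be dropped.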

@harisang harisang marked this pull request as ready for review November 23, 2024 02:51
@harisang harisang requested a review from fhenneke November 23, 2024 02:51