[BUG] - query utxos by address performance varies vastly #5810
Comments
For comparison: we saw "query utxos by txins" with 90+ txins taking only around 100 milliseconds.
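(For context on why the two queries differ so much: the in-memory UTxO is keyed by TxIn, so a by-TxIn query is a handful of keyed lookups rather than a scan. A minimal sketch below, using simplified stand-in types rather than the ledger's actual ones:)

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Simplified stand-ins for the ledger's TxIn/TxOut types.
newtype TxIn = TxIn (String, Int) deriving (Eq, Ord, Show)
data TxOut = TxOut { toAddr :: String, toLovelace :: Integer } deriving Show

-- The UTxO set is a map keyed by TxIn.
type UTxO = Map.Map TxIn TxOut

-- Querying by TxIns is just keyed lookups: 90+ TxIns means 90+ O(log n)
-- probes into the map, not a traversal of the whole set.
queryByTxIns :: Set.Set TxIn -> UTxO -> UTxO
queryByTxIns wanted utxo = Map.restrictKeys utxo wanted
```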
There's a change to how UTXOs are stored in the works - UTXO-HD. The main change is that UTXOs won't be kept in memory, so I believe the queries will be slower, but you should get less varying query times. @jasagredo Correct me if I'm wrong here please. Do we have any performance expectations here?
@carbolymer so do I understand this correctly: the queries will be even slower but consistently slower?
The QueryUtxoByAddress query has to traverse the whole UTXO set to find the UTXOs associated with the requested account. I may be wrong, but that query is not expected to be performant, nor is it expected to be used in production. The node does not need a reverse index of UTXOs for operating in the network, thus it falls outside of the responsibilities of the node to maintain such an index, and doing so would impact other places, in particular memory usage. I think the expected setup is for clients to track the UTXO set of their accounts, which (I think) cardano-wallet does. The alternative would be to use some external UTXO indexer like db-sync. When UTXO-HD arrives, this query will still have to traverse the whole UTXO set, but that set will be on disk instead of in memory, so some regression is expected there. Please correct me if I misinterpreted something above @disassembler
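A minimal sketch of the point above, again with simplified stand-in types rather than the ledger's actual ones: without a reverse index, a by-address query can only filter the entire map, so its cost scales with the size of the whole UTXO set, not with the number of outputs at the address.

```haskell
import qualified Data.Map.Strict as Map

-- Simplified stand-ins for the ledger's TxIn/TxOut types.
newtype TxIn = TxIn (String, Int) deriving (Eq, Ord, Show)
data TxOut = TxOut { toAddr :: String, toLovelace :: Integer } deriving Show

type UTxO = Map.Map TxIn TxOut

-- O(size of the whole UTxO set), no matter how few outputs the address
-- actually holds: every entry is inspected.
queryByAddress :: String -> UTxO -> UTxO
queryByAddress addr = Map.filter ((== addr) . toAddr)
```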
In any case, leaving aside that this query might be slower than desired, I don't have an explanation for the fluctuations and I would not have expected them to happen.
If you want to investigate this, perhaps the first step would be to query, on the latest node versions, a chain that is (more or less) at the same tip as 1.35.7 was at that time. My suspicion is that the code that performs the query is not at fault, as I don't think it has changed much, but rather (1) the data in the chain, which perhaps was much smaller back then, and (2) perhaps some thunks being forced by traversing the UTXO set. If the cause is (1) there is not much to investigate here. If the cause is (2) then there is probably some profiling investigation that could be done to try to smooth out the fluctuations, even if the query stays slow.
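A rough sketch of how hypothesis (2) could be checked, assuming access to the in-memory structure being traversed (the helper names here are made up for illustration): time the same traversal twice with a deep force in between; if the first pass dominates, thunk forcing rather than the traversal itself is the cost. Standard GHC profiling (-prof, +RTS -p) on the query path would give a more precise picture.

```haskell
import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)
import Data.Time.Clock (diffUTCTime, getCurrentTime)

-- Time an IO action and print the wall-clock duration.
timed :: String -> IO a -> IO a
timed label act = do
  t0 <- getCurrentTime
  x  <- act
  t1 <- getCurrentTime
  putStrLn (label ++ ": " ++ show (diffUTCTime t1 t0))
  pure x

-- Force the same structure twice: if the first pass is much slower than the
-- second, unevaluated thunks (not the traversal) dominate the query time.
timeForcedTwice :: NFData a => a -> IO ()
timeForcedTwice xs = do
  _ <- timed "first pass (may force thunks)" (evaluate (force xs))
  _ <- timed "second pass (already forced)"  (evaluate (force xs))
  pure ()
```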
Then put some warnings/documentation into haddock to make it VERY clear.
Relying on db-sync is a big risk: it requires a lot of resources, requires additional integration, and is often behind releases (for example, it is/was not compatible with 8.9.1 nodes despite the official 8.9.1 release). But ok, there are other options than db-sync - it seems that the only really feasible option is to rely on Blockfrost.
Right, so this makes the query completely unusable. Please remove it from the API and/or mark it clearly as "don't use in production". Tbh I don't understand such a fundamental refactoring as "UTXO-HD" when the consequence is an actual performance regression... As I said: we saw those fluctuations on the very same address over a time window of a few days - and the UTXO set of this address definitely did not change that much.
The UTXO set is currently stored in memory. Since it is an ever-growing thing, it will at some point have to be moved to disk. Otherwise, only computers with a large memory budget will be able to run a node. However, what hasn't been mentioned yet is that the node will have two modes: one for storing the UTXO set on disk, one for storing it in memory. The consensus code will still be refactored to account for both modes, but the on-disk mode is not mandatory.
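A rough illustration of the "two modes" idea (my own simplification, not the actual UTXO-HD design): the query code is written against a small storage interface, and the backend is a configuration choice. A by-address query still has to traverse every entry in either mode; only the per-entry cost changes.

```haskell
{-# LANGUAGE RankNTypes #-}
import qualified Data.Map.Strict as Map
import Data.IORef (IORef, readIORef)

-- Queries talk to a small storage interface; whether it is backed by an
-- in-memory Map or by disk-resident storage is a node-level choice.
data UTxOStore k v = UTxOStore
  { lookupTxIn :: k -> IO (Maybe v)
    -- keyed lookup: cheap in both modes
  , foldUTxO   :: forall r. (r -> k -> v -> r) -> r -> IO r
    -- full traversal (what a by-address query needs): RAM-speed in one
    -- mode, disk-speed in the other
  }

inMemoryStore :: Ord k => IORef (Map.Map k v) -> UTxOStore k v
inMemoryStore ref = UTxOStore
  { lookupTxIn = \k -> Map.lookup k <$> readIORef ref
  , foldUTxO   = \f z -> Map.foldlWithKey' f z <$> readIORef ref
  }

-- An on-disk mode would expose the same interface backed by persistent
-- storage, paying I/O per entry during a full traversal.
```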
Yes, I can confirm that. People have created a variety of external chain indexers, see Carp vs alternatives for details. I do want to point out that developers who are new to the Cardano ecosystem often expect the node to perform some indexing out of the box, such as an index of UTXOs by address. I generally agree with the strict separation of concerns taken by the Cardano node, but it's also a fact that the Node-to-Client protocol exists and goes beyond the needs of block producers, so there is room to haggle about what precisely it should contain — perhaps in the same executable, perhaps in a separate executable. It's mostly a trade-off involving implementation complexity of clients as well as resource usage.
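A toy sketch of the reverse index such an external chain follower maintains (hypothetical types, not any particular indexer's schema): applying each block keeps an address-to-TxIn map up to date, so a by-address lookup becomes a keyed lookup, at the cost of the extra memory and maintenance mentioned above.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Addr = String
newtype TxIn = TxIn (String, Int) deriving (Eq, Ord, Show)

-- Reverse index: address -> set of unspent TxIns sitting at that address.
type AddrIndex = Map.Map Addr (Set.Set TxIn)

-- For each block, drop the inputs it spends and add the outputs it creates.
applyBlock
  :: [(TxIn, Addr)]  -- inputs spent by the block (with the address they sat at)
  -> [(TxIn, Addr)]  -- outputs created by the block
  -> AddrIndex -> AddrIndex
applyBlock spent created ix0 =
  let ix1 = foldl' (\ix (i, a) -> Map.adjust (Set.delete i) a ix) ix0 spent
  in  foldl' (\ix (i, a) -> Map.insertWith Set.union a (Set.singleton i) ix) ix1 created

-- With the index in place, a by-address lookup is a single keyed lookup.
utxosAt :: Addr -> AddrIndex -> Set.Set TxIn
utxosAt = Map.findWithDefault Set.empty
```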
No, don't remove it. There are a whole bunch of tools and scripts out there relying on the ability of the cli to query utxos by an address. Even if it's slow!
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 120 days.
I'm still using 1.35.7 for this reason - even today it still replies in less than 2 seconds whereas all the newer versions are considerably slower. We all have GIGS of ram these days, it seems so petty to be fretting over an index of utxos taking up precious ram. The fact is - we want people to use this software - we should support all use cases and not tell people they are doing it wrong. If you provided a defective tool, don't complain when people are upset it doesn't work the same from day to day. I agree with db-sync being behind the main version - ALWAYS. We want adoption - people using it - not "You are doing it wrong".
@frog357 |
Ogmios is just a REST wrapper around the cardano-node; there is no reason why it should be faster against your own local node. Besides that, it requires additional integration effort.
ah sorry.. i mean |
Ok we just had three extreme outlier cases where querying the address took 2 minutes and 35 seconds. According to our grafana dashboard, the node had a prolonged spike of CPU usage (25%) at that time, over a period of 30 minutes - it is completely unclear why, as we didn't produce this load. For reference: when I query it on my local machine against a locally synced mainnet running on Node 9.0 it takes 3 seconds, returning 2 UTxOs. Whatever you guys say, 2 minutes and 35 seconds is absolutely ridiculous.
hmm... did a query right now with node 9.0.0 and it took 4 secs. the node with the higher cpu load, was that also node 9.0.0? and it was not in the ledger replay phase after an upgrade? have you taken a look at the system (htop or so) for read or write activity during that period?
@gitmachtl yeah I know, locally it always works very fast, which I verified myself, also arriving at 3-4 seconds.
@gitmachtl just checked the metrics:
@gitmachtl and when looking at the node logs we see quite a few of these
There was also a lot of peer protocol activity around that time. We also saw a drop in connections as a result of these queries, dropping from 60 to 53 and recovering afterwards.
Note that this very problem was the primary motivation for Kupo ( https://github.com/CardanoSolutions/kupo ). I was told more than 2 years ago that queryUTxOByAddress was deprecated and about to be removed due to performance reasons, so an alternative had to be found. I believe Kupo provides a great solution to that nowadays and has even evolved beyond that initial goal thanks to open source contributions. It's also sufficiently lightweight and self-contained (a single unit of deployment) that it doesn't add many constraints to an existing setup.
@KtorZ thanks for your input on this - unfortunately we were never told that queryUTxOByAddress was deprecated and about to be removed due to performance reasons, nor was/is this apparent anywhere in the Haddocks, unless I am mistaken. We might have a look into Kupo then.
I hope that it at least stays in there as a function, because 3rd party tools are relying on it. Even if it is slow.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 120 days.
External
Area: Node querying
Summary
We observed decreasing performance of "query utxos by address" with increasing node versions. On 1.35.7 it was around 4 seconds, while with 8.7.3 and 8.9.1 we saw wild fluctuations in timings, with most queries taking around 7 seconds and regularly observed outliers ranging from 30 to 45 and even 90 (!) seconds. We observed this behaviour querying the same address, on the same cloud hardware, with the same application.
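A sketch of how the fluctuations can be measured reproducibly, assuming a synced node, CARDANO_NODE_SOCKET_PATH exported, and a cardano-cli version where `query utxo --address <addr> --mainnet` is accepted (the address below is a placeholder):

```haskell
import Control.Monad (forM)
import Data.List (sort)
import Data.Time.Clock (diffUTCTime, getCurrentTime)
import System.Process (readProcess)

main :: IO ()
main = do
  let addr = "addr1..."  -- placeholder: the address being monitored
      runs = 20 :: Int
  -- Run the CLI query repeatedly and record wall-clock durations.
  times <- forM [1 .. runs] $ \_ -> do
    t0 <- getCurrentTime
    _  <- readProcess "cardano-cli"
            ["query", "utxo", "--address", addr, "--mainnet"] ""
    t1 <- getCurrentTime
    pure (realToFrac (diffUTCTime t1 t0) :: Double)
  -- Report the spread rather than a single timing.
  let sorted = sort times
  putStrLn $ "min/median/max (s): "
          ++ show (head sorted, sorted !! (runs `div` 2), last sorted)
```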
Expected behavior
Query timings should be within a predictable range - it is clear that some fluctuations are perfectly normal and expected, but outliers of up to 12 times the typical duration throw a wrench into every production environment.