Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

services/horizon: Fixing Claimable Balances Query Limit Issue #5032

Merged
merged 8 commits into from
Sep 20, 2023

Conversation

urvisavla
Copy link
Contributor

@urvisavla urvisavla commented Aug 30, 2023

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with name of package that is most changed in the PR, ex.
    services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md
    files, etc... affected by this change). Take a look in the docs folder for a given service,
    like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

This PR addresses the issue identified in #4907.

Issue:
When querying claimable balances with a combination of asset and claimant and setting a limit of 200, the response displayed the items correctly. However, when reducing the limit to 1 in the query, the response showed zero results.

Analysis:
The query to filter claimable balances by claimants is as follows:

SELECT cb.id, cb.claimants, cb.asset, cb.amount, cb.sponsor, cb.last_modified_ledger, cb.flags 
FROM claimable_balances cb 
WHERE cb.asset = '?' AND cb.id IN 
   (SELECT id FROM claimable_balance_claimants
   WHERE destination = 'GDB3GJCDWLAMVV7ZL6Q7BH3PGVMFWHA5KSRIY2NML3RXX35ESL3MBOXU' 
   ORDER BY last_modified_ledger ASC, id ASC LIMIT 10) 
ORDER BY last_modified_ledger ASC, id ASC LIMIT 10;

In this query, there are no asset or sponsor filters in the subquery because there is no asset or sponsor information in caimable_balance_claimants table. Consequently, when the same limit as the outer query is applied to the subquery and an asset or sponsor filter is provided, the balance ids returned by the subquery may not necessarily match the asset type hence returning incorrect (incomplete) result. The same observation was made for the sponsor filter.

Solution:
The solution is to use a different query that uses a JOIN with claimable_balance_claimants when asset or sponsor filters are specified:

SELECT cb.id, cb.claimants, cb.asset, cb.amount, cb.sponsor, cb.last_modified_ledger, cb.flags 
FROM claimable_balances cb 
JOIN claimable_balance_claimants ON claimable_balance_claimants.id = cb.id 
WHERE claimable_balance_claimants.destination = 'GDB3GJCDWLAMVV7ZL6Q7BH3PGVMFWHA5KSRIY2NML3RXX35ESL3MBOXU' 
AND cb.asset = 'AAAAAUFRVUEAAAAAW5QuU6wzyP0KgMx8GxqF19g4qcQZd6rRizrwV/jjPfA=' 
AND cb.sponsor = 'GDFCYDQOVJ2OEWPLEGIRQVAM3VTOQ6JDNLJTDZP5S5OGTEHM5CIWMYBH' 
ORDER BY cb.last_modified_ledger ASC, cb.id ASC 
LIMIT 10;

During testing, it was found that the JOIN query performs poorly when no filters other than the claimant are supplied. Therefore, in such case, we will continue to use the original subquery to retrieve claimable balances by claimants.

Test coverage:
Implemented a unit test to replicate the scenario described in the bug.

Why

Fixes #4907

Known limitations

N/A

@urvisavla urvisavla force-pushed the 4907/limit-flag-claimable-balance branch from 40a9c16 to 734938a Compare August 30, 2023 18:36
@urvisavla urvisavla marked this pull request as ready for review August 31, 2023 18:54
@Shaptic Shaptic requested review from tamirms, sydneynotthecity and a team August 31, 2023 19:17
@tamirms
Copy link
Contributor

tamirms commented Sep 5, 2023

@urvisavla you can test the performance of this fix by deploying your branch to the staging horizon cluster. Afterwards you can send requests to Horizon which exercise this query. You can also run SQL queries directly on the staging horizon DB

@urvisavla
Copy link
Contributor Author

urvisavla commented Sep 11, 2023

I executed the query against both the staging pubnet database, experimenting with different limit values. The parameters I selected were:

  1. The most frequently occurring claimants from the claimable_balance_claimants table, GDB3GJCDWLAMVV7ZL6Q7BH3PGVMFWHA5KSRIY2NML3RXX35ESL3MBOXU which 1368944 occurrences.

  2. The asset Id with the highest number of occurrences in the claimable_balances table, AAAAAUFRVUEAAAAAW5QuU6wzyP0KgMx8GxqF19g4qcQZd6rRizrwV/jjPfA= with 1397329 occurrences.

Here are the various scenarios I tested:

  1. Limit 10 for both the main and subquery (this is the current default when no limit is specified):
    Execution Time: 2.354 ms

  2. No limit on the subquery with a main query limit of 10 (proposed fix):
    Execution Time: 2572.291 ms (It's clear that some kind of limit is needed)

  3. Arbitrarily set a limit of 1000 on the inner query with an outer query limit of 10:
    Execution Time: 26.086 ms (This limit works for 99.9% of claimants in the current dataset)

  4. No limit on either the main or subquery:
    Execution Time: 5840.723 ms

I think the correct approach to address this issue would involve adding asset information to the claimants' table. However, as others have noted, this will require schema migration and data reingestion for the claimants table. Alternatively, following the suggestion of others, implementing an arbitrary limit should work well in most of cases. While it doesn't fix the bug, it significantly reduces the chances of running into it.
I think we need to make a decision on how critical it is for the query to always work correctly versus mostly working correctly. And if we want to temporarily raise the limit to an arbitrary number as stopgap solution.

@urvisavla urvisavla force-pushed the 4907/limit-flag-claimable-balance branch from 069481d to 9a42fff Compare September 11, 2023 23:04
@Shaptic
Copy link
Contributor

Shaptic commented Sep 11, 2023

Awesome analysis @urvisavla! This gives a really good picture.

Arbitrarily set a limit of 1000 on the inner query with an outer query limit of 10:
Execution Time: 26.086 ms (This limit works for 99.9% of claimants in the current dataset)

While it doesn't fix the bug, it significantly reduces the chances of running into it.

This seems the most reasonable to me. Yes it's not perfect, but it's probably more than good enough to immediately help people affected by this. Is the 0.1% a random chance, or is there a particular type of query that would continue to trigger this bug? And how certain is that %? Or do you just colloquially mean "really unlikely it'll fail"?

@urvisavla
Copy link
Contributor Author

Arbitrarily set a limit of 1000 on the inner query with an outer query limit of 10:
Execution Time: 26.086 ms (This limit works for 99.9% of claimants in the current dataset)
While it doesn't fix the bug, it significantly reduces the chances of running into it.

This seems the most reasonable to me. Yes it's not perfect, but it's probably more than good enough to immediately help people affected by this. Is the 0.1% a random chance, or is there a particular type of query that would continue to trigger this bug? And how certain is that %? Or do you just colloquially mean "really unlikely it'll fail"?

Currently, there are 615539 unique claimants, out of which 90 have more than 1000 entries in the claimants table, accounting for approximately 0.14% of the total claimants. When executing the query with a specific asset filter for these claimants when the limit is set to 1000, there is a possibility that the query may return incomplete result if all (1000) balance Ids returned by the subquery are for a different asset than the one the user has requested.

@tamirms
Copy link
Contributor

tamirms commented Sep 12, 2023

@urvisavla

I think the correct approach to address this issue would involve adding asset information to the claimants' table. However, as others have noted, this will require schema migration and data reingestion for the claimants table.

We will have to reingest all the state tables (including claimable_balance_claimants) because of the soroban release. So I think we should try changing the schema as long as the new query will still be efficient.

If you change the schema, you will need to bump this variable so that the reingestion of state tables occurs automatically:

https://github.com/stellar/go/blob/master/services/horizon/internal/ingest/main.go#L61

@urvisavla urvisavla force-pushed the 4907/limit-flag-claimable-balance branch 2 times, most recently from 920cf28 to fa48973 Compare September 13, 2023 20:59
Copy link
Contributor

@tamirms tamirms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice job, this issue was not easy!

@urvisavla urvisavla force-pushed the 4907/limit-flag-claimable-balance branch from b4b9bc5 to b96d983 Compare September 19, 2023 22:02
@urvisavla urvisavla force-pushed the 4907/limit-flag-claimable-balance branch from b96d983 to fb28b78 Compare September 20, 2023 00:53
@urvisavla urvisavla merged commit 2b876cd into stellar:master Sep 20, 2023
@urvisavla urvisavla deleted the 4907/limit-flag-claimable-balance branch September 20, 2023 05:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

services/horizon:limit flag on claimable_balances behaving oddly
4 participants