
pageserver: compaction with algorithm tiered hits "could not find data for key" #11040

Open
pechstony opened this issue Feb 28, 2025 · 5 comments
Labels
external A PR or Issue is created by an external user

Comments

@pechstony

The error message is:

2025-02-27T11:51:22.200904Z ERROR compaction_loop{tenant_id=72d3f9b661153d0c04688e8a0aad39e9 shard_id=0001}: Compaction failed 1 times, retrying in 2s: Other(read failed

Caused by:
requested key not found: could not find data for key 000000067F0000362000004078000002CBF6 (shard ShardNumber(0)) at LSN 58D/E58AB55A, request LSN 58D/E58AB559, ancestor 0/0)

We found that the key 000000067F0000362000004078000002CBF6 exists in the layer file 000000067F0000362000004078000002A56E-030000000000000000000000000000000002__00000581BACDFE51-0000058D4DDE9889-00000011, which was generated by a previous compaction task. The log is shown below:
2025-02-21T16:53:59.983698Z INFO compaction_loop{tenant_id=72d3f9b661153d0c04688e8a0aad39e9 shard_id=0001}:run:compact_timeline{timeline_id=07793eed5d1d6cbb4e167a8ca2be5ba4}: executing job 1
2025-02-21T16:54:04.251847Z INFO compaction_loop{tenant_id=72d3f9b661153d0c04688e8a0aad39e9 shard_id=0001}:run:compact_timeline{timeline_id=07793eed5d1d6cbb4e167a8ca2be5ba4}: Rebuilt layer map. Did 2577 insertions to process a batch of 195 updates.
2025-02-21T16:54:04.251878Z INFO compaction_loop{tenant_id=72d3f9b661153d0c04688e8a0aad39e9 shard_id=0001}:run:compact_timeline{timeline_id=07793eed5d1d6cbb4e167a8ca2be5ba4}: scheduled layer file upload 000000067F00003620000040770000000000-030000000000000000000000000000000002__00000581BACDFE51-0000058D4DDE9889-00000011 gen=00000011 shard=0001
2025-02-21T16:54:04.251900Z INFO compaction_loop{tenant_id=72d3f9b661153d0c04688e8a0aad39e9 shard_id=0001}:run:compact_timeline{timeline_id=07793eed5d1d6cbb4e167a8ca2be5ba4}: scheduled layer file upload 000000067F0000362000004078000002A56E-030000000000000000000000000000000002__00000581BACDFE51-0000058D4DDE9889-00000011 gen=00000011 shard=0001
2025-02-21T16:54:04.254980Z INFO compaction_loop{tenant_id=72d3f9b661153d0c04688e8a0aad39e9 shard_id=0001}:run:compact_timeline{timeline_id=07793eed5d1d6cbb4e167a8ca2be5ba4}: scheduling metadata upload up to consistent LSN 58D/4DDE9888 with 2653 files (195 changed)

We also found that the previous compaction generated another layer file, 000000067F00003620000040770000000000-030000000000000000000000000000000002__00000581BACDFE51-0000058D4DDE9889-00000011, which has a different start key from 000000067F0000362000004078000002A56E-030000000000000000000000000000000002__00000581BACDFE51-0000058D4DDE9889-00000011.
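For readers unfamiliar with the naming scheme (this breakdown is our reading of the format, inferred from the upload log above, where gen=00000011 matches the trailing -00000011): a delta layer file name encodes <start_key>-<end_key>__<start_LSN>-<end_LSN>-<generation>. Breaking the two files apart shows they share everything except the start key:

000000067F00003620000040770000000000  start key of the first file
000000067F0000362000004078000002A56E  start key of the second file
030000000000000000000000000000000002  end key (identical)
00000581BACDFE51-0000058D4DDE9889     LSN range (identical)
00000011                              generation (identical)

So the second file's key range lies entirely inside the first one's, over the same LSN range.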

We then reviewed the function get_vectored_reconstruct_data, where the error happens, and added some diagnostic logging in get_vectored_reconstruct_data_timeline.
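A minimal sketch of the kind of per-layer tracing meant here (hypothetical; the identifiers are illustrative placeholders, not the actual patch):

// Hypothetical sketch of the added logging, not the actual patch: inside
// the loop that resolves each key range to a layer, record which layer the
// search picked. `range`, `layer`, and `cont_lsn` are illustrative names.
tracing::info!(
    "search key-range{{{}-{}}} resolved to layer {} at cont_lsn {}",
    range.start,
    range.end,
    layer, // assumes the layer handle implements Display
    cont_lsn,
);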

Then we ran the pageserver with the modified version. In the log, we could not find the layer file 000000067F0000362000004078000002A56E-030000000000000000000000000000000002__00000581BACDFE51-0000058D4DDE9889-00000011, but we did find 000000067F00003620000040770000000000-030000000000000000000000000000000002__00000581BACDFE51-0000058D4DDE9889-00000011 with search key-range{000000067F00003620000267D800FFFFFFFF-000000067F00003620000267D80100000000}.

We also ran range_search in a unit test, as shown below:

[screenshot of the range_search unit test]

Its output failed with len = 1, which means range_search only returns the latest delta layer. Is this by design?
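Since the test itself survives only as a screenshot, here is a self-contained toy (our own sketch, not Neon's LayerMap or the real unit test) that reproduces the shape of the problem: two delta layers with the same LSN range whose key ranges overlap, where a point search keeps only one of them:

use std::ops::Range;

// Toy model, not Neon's LayerMap: a layer is a key range plus an LSN range.
#[derive(Debug)]
struct Layer {
    key_range: Range<u128>, // keys shortened to u128 for the toy
    lsn_range: Range<u64>,
}

// Toy point search that, like an interval map assuming non-overlapping
// layers, keeps at most one matching layer per (key, lsn).
fn range_search(layers: &[Layer], key: u128, lsn: u64) -> Vec<&Layer> {
    layers
        .iter()
        .filter(|l| l.key_range.contains(&key) && l.lsn_range.contains(&lsn))
        .max_by_key(|l| l.key_range.start) // the wider layer gets shadowed
        .into_iter()
        .collect()
}

fn main() {
    // Mirrors the two files from the issue: same LSN range and end key,
    // different start keys, so the narrow range sits inside the wide one.
    let wide = Layer { key_range: 0x4077_0000..0xFFFF_FFFF, lsn_range: 100..200 };
    let narrow = Layer { key_range: 0x4078_02A5..0xFFFF_FFFF, lsn_range: 100..200 };
    let layers = [wide, narrow];

    // The error key lies inside BOTH layers, yet only one comes back.
    let found = range_search(&layers, 0x4078_02CB, 150);
    assert_eq!(found.len(), 1); // matches the observed `failed with len = 1`
    println!("found {} layer(s): {:?}", found.len(), found);
}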

If this is by design, the previous compaction should not have output these two layers, which differ only in their start key.

We also reviewed the function create_delta; it can output layers like these two. So there seems to be an inconsistency between how layers are generated and how they are searched. What is the original design, and how should this problem be solved?

@github-actions github-actions bot added the external A PR or Issue is created by an external user label Feb 28, 2025
@skyzh
Member

skyzh commented Feb 28, 2025

Tiered compaction is not supported for now; we haven't tested this code path thoroughly, and it is not used anywhere :(

@pechstony
Author

pechstony commented Mar 3, 2025

Ok, thank you for your reply, @skyzh. Allow me to describe why we use tiered compaction instead of legacy. We have a lot of WAL for one tenant timeline (maybe 64 MB per second), and we had set pitr_interval = 7 days. This exhausted the disk, which has a volume of 5 TB, and most of the layers are delta layers. In fact, the compute node (Postgres) only holds 300 GB according to pg_database_size, so the expansion rate is far too high. We therefore tried the tiered algorithm with pitr_interval = 1 hour, and it cut down the disk usage. We also found that the legacy algorithm has added a flag, EnhancedGcBottomMostCompaction, which performs a full compaction:

[screenshot of the EnhancedGcBottomMostCompaction flag definition]
So disk usage should come down if we set pitr_interval = 1 hour and algorithm = legacy, and change the flags passed to compact in the function compaction_iteration as shown below. Or is there any other method you would recommend to cut down the disk usage?
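Formatted out of the sentence above, the change being proposed is:

// In compaction_iteration, before:
timeline.compact(cancel, EnumSet::empty(), ctx)
// After: force gc-compaction on every compaction pass.
timeline.compact(cancel, EnumSet::only(CompactFlags::EnhancedGcBottomMostCompaction), ctx)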

@skyzh
Member

skyzh commented Mar 3, 2025

Yes, I think gc-compaction should help in your use case. We've recently added a tenant-level option to enable auto gc-compaction:

#[serde(skip_serializing_if = "Option::is_none")]
pub gc_compaction_enabled: Option<bool>,
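For illustration (a sketch, not official docs): since gc_compaction_enabled is a serde field on the per-tenant config, enabling it should amount to setting it in whatever tenant configuration you apply to the pageserver, along the lines of:

{
  "gc_compaction_enabled": true
}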

@skyzh
Member

skyzh commented Mar 3, 2025

...which works like tiered compaction with only two tiers

@pechstony
Author

pechstony commented Mar 3, 2025

Thank you again, @skyzh . I'll test it in our use case. I'll share the results here once I've completed the testing.
