
page_service: getpage batching: refactor & minor fixes #9792

Open

wants to merge 14 commits into base: problame/batching-benchmark
Conversation

Contributor
@problame problame commented Nov 18, 2024

This PR refactors the page_service server-side batching code that was recently added in
#9377.

Changes:

  • Store the carried-over batching state in a local variable inside handle_pagestream, instead of in the PageServerHandler. This is more robust because the long-lived handler can no longer hold stale batching state, which removes a whole class of bugs.
  • Rename next_batch to carry.
  • When starting a new batch read, account for the time already spent in the batch carried over from the previous call.
  • Build the batch inside the &mut Option<Carry> of a local variable (see the sketch below).
    • Before, we had to make sure to move the batch back into self.next_batch whenever we bailed out.
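
In rough terms, the resulting control flow looks like the sketch below. This is a minimal illustration only, not the actual pageserver code: `Carry`, `read_batch`, `execute_batch`, and `poll_next_request` are made-up names for the example, and error handling is omitted.

```rust
use std::time::{Duration, Instant};

// Illustrative stand-ins for the real pageserver types.
struct GetPageRequest;
struct Carry {
    started_at: Instant,
    batch: Vec<GetPageRequest>,
}

// After the refactor, the carried-over batching state lives in a local
// `Option<Carry>` inside handle_pagestream instead of in a field on
// `PageServerHandler`, so an early return cannot leave stale state behind
// on the long-lived handler.
fn handle_pagestream(batch_timeout: Duration) {
    let mut carry: Option<Carry> = None;

    loop {
        // The batch is built directly inside the `&mut Option<Carry>`;
        // nothing has to be moved back into a struct field before bailing.
        if !read_batch(&mut carry, batch_timeout) {
            return;
        }
        if let Some(full) = carry.take() {
            execute_batch(full.batch);
        }
    }
}

// Fill `carry` until the batch window closes. The deadline is computed from
// the time at which the carried-over batch was started, so time already
// spent in the previous call counts against the window.
fn read_batch(carry: &mut Option<Carry>, batch_timeout: Duration) -> bool {
    let started_at = match carry.as_ref() {
        Some(c) => c.started_at,
        None => Instant::now(),
    };
    let deadline = started_at + batch_timeout;

    while Instant::now() < deadline {
        match poll_next_request() {
            Some(req) => carry
                .get_or_insert_with(|| Carry { started_at, batch: Vec::new() })
                .batch
                .push(req),
            None => return false, // connection closed
        }
    }
    true
}

// Stubs so the sketch is self-contained.
fn poll_next_request() -> Option<GetPageRequest> { Some(GetPageRequest) }
fn execute_batch(_batch: Vec<GetPageRequest>) {}
```

Because `carry` is a plain local, any early return out of the loop either keeps or drops it wholesale; there is no struct field that could be left in a half-updated state.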


github-actions bot commented Nov 18, 2024

5535 tests run: 5309 passed, 0 failed, 226 skipped (full report)


Flaky tests (1)

Postgres 15

Code coverage* (full report)

  • functions: 31.4% (7951 of 25322 functions)
  • lines: 49.3% (63084 of 127831 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
058b35f at 2024-11-21T11:30:41.601Z :recycle:

problame added a commit that referenced this pull request Nov 18, 2024
## Problem

We don't take advantage of queue depth generated by the compute
on the pageserver. We can process getpage requests more efficiently
by batching them. 

## Summary of changes

Batch up incoming getpage requests that arrive within a configurable
time window (`server_side_batch_timeout`).
Then process the entire batch via one `get_vectored` timeline operation.
By default, no merging takes place.
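
As a rough illustration of that flow (a sketch only, not the actual implementation: the request, page, and receive helpers here are placeholders, and only the `get_vectored` name comes from the description above):

```rust
use std::time::{Duration, Instant};

// Placeholder types; the real request and page types live in the pageserver crate.
struct GetPageRequest { key: u64 }
struct Page;

// Sketch: take one getpage request, keep collecting further requests that
// arrive within the `server_side_batch_timeout` window, then serve the whole
// batch with a single vectored read instead of one read per request.
fn serve_batched(batch_timeout: Duration) {
    loop {
        // Block for the first request of the batch.
        let first = match recv_request() {
            Some(req) => req,
            None => return, // connection closed
        };
        let mut batch = vec![first];

        // With the default (zero) window the deadline passes immediately,
        // so the batch stays at size 1, i.e. no merging takes place.
        let deadline = Instant::now() + batch_timeout;
        while Instant::now() < deadline {
            if let Some(req) = try_recv_request() {
                batch.push(req);
            }
        }

        // One vectored lookup serves every request in the batch
        // (stand-in for the timeline's get_vectored operation).
        let keys: Vec<u64> = batch.iter().map(|r| r.key).collect();
        let pages = get_vectored(&keys);
        for (req, page) in batch.into_iter().zip(pages) {
            send_response(req, page);
        }
    }
}

// Stubs so the sketch is self-contained.
fn recv_request() -> Option<GetPageRequest> { Some(GetPageRequest { key: 0 }) }
fn try_recv_request() -> Option<GetPageRequest> { None }
fn get_vectored(keys: &[u64]) -> Vec<Page> { keys.iter().map(|_| Page).collect() }
fn send_response(_req: GetPageRequest, _page: Page) {}
```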

## Testing

* **Functional**: #9792
* **Performance**: will be done in staging/pre-prod

## Refs

* #9377
* #9376

Co-authored-by: Christian Schwarz <[email protected]>
Base automatically changed from vlad/pageserver-merge-get-page-requests to main November 18, 2024 20:24
@problame problame changed the title WIP: page_service: add basic testcase for merging WIP: page_service: add basic benchmark Nov 18, 2024
@problame problame changed the title WIP: page_service: add basic benchmark WIP: page_service: add basic test & benchmark Nov 18, 2024
The steps work in neon_local + psql,
but for some reason they don't work in the test.

Asked compute team on Slack for help:
https://neondb.slack.com/archives/C04DGM6SMTM/p1731952688386789
@problame problame changed the title WIP: page_service: add basic test & benchmark WIP: page_service: add basic sequential scan benchmark Nov 18, 2024
@problame problame changed the title WIP: page_service: add basic sequential scan benchmark WIP: batching: bugfixes & benchmark Nov 19, 2024
problame added a commit that referenced this pull request Nov 20, 2024
This PR adds a benchmark to demonstrate the effect of server-side
getpage request batching added in #9321.

Refs:

- Epic: #9376
- Extracted from #9792
@problame problame changed the base branch from main to problame/batching-benchmark November 20, 2024 13:24
@problame problame changed the title WIP: batching: bugfixes & benchmark page_service: refactor & minor fixes for batching code Nov 20, 2024
@problame problame changed the title page_service: refactor & minor fixes for batching code page_service: getpage batching: refactor & minor fixes Nov 20, 2024
@problame problame marked this pull request as ready for review November 21, 2024 10:24
@problame problame requested a review from a team as a code owner November 21, 2024 10:24
Contributor

@VladLazar VladLazar left a comment

Checked it out locally and stepped through it a number of times. Looks correct, but have a look at the comments.

3 resolved review comment threads on pageserver/src/page_service.rs
Successfully merging this pull request may close these issues.

pageserver: batch get page requests and serve them with one vectored get