
impr: propagate VirtualFile metrics via RequestContext #7202

Draft · wants to merge 44 commits into main from problame/virtual-file-metrics-no-hashing
Conversation

@problame (Contributor) commented Mar 21, 2024

Refs

Problem

VirtualFile currently parses the path it is opened with to identify the tenant, shard, and timeline labels to be used for the STORAGE_IO_SIZE metric.

Further, for each read or write call to VirtualFile, it uses with_label_values to retrieve the correct metrics object, which under the hood is a global hashmap guarded by a parking_lot mutex.

We perform tens of thousands of reads and writes per second on every pageserver instance; thus, doing the mutex lock + hashmap lookup is wasteful.
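
For illustration, here is a minimal sketch of the per-call lookup pattern this describes, using the prometheus crate directly with hypothetical metric and label names (not the actual pageserver code):

```rust
use once_cell::sync::Lazy;
use prometheus::{register_int_counter_vec, IntCounterVec};

// Hypothetical stand-in for STORAGE_IO_SIZE: a counter vec keyed by
// tenant/shard/timeline labels.
static STORAGE_IO_SIZE: Lazy<IntCounterVec> = Lazy::new(|| {
    register_int_counter_vec!(
        "storage_io_size_bytes",
        "bytes read/written through VirtualFile",
        &["tenant_id", "shard_id", "timeline_id"]
    )
    .unwrap()
});

fn record_read(tenant: &str, shard: &str, timeline: &str, nbytes: u64) {
    // Each call takes the label-map lock and does a hashmap lookup to find
    // the per-(tenant, shard, timeline) counter -- the cost this PR removes
    // from the read/write hot path.
    STORAGE_IO_SIZE
        .with_label_values(&[tenant, shard, timeline])
        .inc_by(nbytes);
}
```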

Changes

Apply the technique we use for all other timeline-scoped metrics to avoid the repeated with_label_values: add the metric to TimelineMetrics.

Wrap TimelineMetrics into an Arc.

Propagate the Arc<TimelineMetrics> down to VirtualFile, and use Timeline::metrics::storage_io_size.

We avoid contention on the Arc<TimelineMetrics>'s refcount atomics between different connection handlers for the same timeline by adding another layer of Arc indirection.
To avoid frequent allocations, we store that Arc<Arc<TimelineMetrics>> inside the per-connection timeline cache, as sketched below.
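
These items sketch out to roughly the following shape; type and field names here are illustrative, not the actual pageserver definitions:

```rust
use std::sync::Arc;
use prometheus::IntCounter;

pub struct TimelineMetrics {
    // Pre-resolved per-timeline counter: no with_label_values on the hot path.
    pub storage_io_size: IntCounter,
}

pub struct RequestContext {
    // Cloned from the per-connection outer Arc for each request/batch: this
    // bumps only the outer refcount, which is private to the connection, not
    // the refcount of the Arc shared by all handlers of the timeline.
    pub metrics: Option<Arc<Arc<TimelineMetrics>>>,
}

// Per-connection timeline cache entry: the inner Arc is cloned from the
// shared Timeline::metrics exactly once, on cache miss, and wrapped in an
// outer Arc so it can be handed out cheaply afterwards.
pub struct CachedTimelineEntry {
    pub metrics: Arc<Arc<TimelineMetrics>>,
}

pub fn on_cache_miss(shared: &Arc<TimelineMetrics>) -> CachedTimelineEntry {
    // One allocation and one clone of the shared Arc per cache miss,
    // instead of per request.
    CachedTimelineEntry {
        metrics: Arc::new(Arc::clone(shared)),
    }
}

pub fn virtual_file_read(ctx: &RequestContext, nbytes: u64) {
    // VirtualFile no longer parses paths or looks up labels; it just
    // increments the counter carried by the RequestContext.
    if let Some(m) = &ctx.metrics {
        m.storage_io_size.inc_by(nbytes);
    }
}
```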

Preliminary refactorings to enable this change:

Performance

I ran the benchmarks in test_runner/performance/pageserver/pagebench/test_pageserver_max_throughput_getpage_at_latest_lsn.py on an i3en.3xlarge because that's what we currently run them on.

None of the benchmarks shows a meaningful difference in latency or throughput or CPU utilization.

I would have expected some improvement in the many-tenants-one-client-each workload because they all hit that hashmap constantly, and clone the same UintCounter / Arc inside of it.

But apparently that overhead is minuscule compared to the remaining work we do per getpage.

Yet, since the changes are already made, the added complexity is manageable, and the perf overhead of with_label_values is demonstrable in micro-benchmarks, let's have this change anyway.
Also, propagating TimelineMetrics through RequestContext might come in handy down the line.

Micro-benchmark that demonstrates perf impact of with_label_values, along with other pitfalls and mitigation techniques around the metrics/prometheus crate:
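
The linked benchmark isn't reproduced here; a minimal criterion-style sketch of that kind of comparison (hypothetical names, assuming the prometheus and criterion crates) might look like this:

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use prometheus::{register_int_counter_vec, IntCounterVec};

fn bench_counters(c: &mut Criterion) {
    let vec: IntCounterVec = register_int_counter_vec!(
        "bench_io_size_bytes",
        "benchmark counter",
        &["tenant_id", "shard_id", "timeline_id"]
    )
    .unwrap();

    // Label lookup on every increment: lock + hashmap lookup each time.
    c.bench_function("with_label_values per inc", |b| {
        b.iter(|| vec.with_label_values(&["t", "s", "tl"]).inc_by(8192))
    });

    // Resolve the labels once, then increment the cached counter.
    let counter = vec.with_label_values(&["t", "s", "tl"]);
    c.bench_function("pre-resolved counter", |b| b.iter(|| counter.inc_by(8192)));
}

criterion_group!(benches, bench_counters);
criterion_main!(benches);
```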

Alternative Designs

An earlier iteration of this PR stored an Arc<Arc<Timeline>> inside RequestContext.
The problem is that this risks reference cycles if the RequestContext gets stored in an object that is owned directly or indirectly by Timeline.
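
A minimal sketch of the cycle that design risks, with illustrative types only: the Timeline (indirectly) owns an object that stashes a RequestContext, and the RequestContext holds an Arc back to the Timeline, so neither refcount can drop to zero.

```rust
use std::sync::{Arc, Mutex};

struct RequestContext {
    // The rejected design: the context keeps the timeline alive.
    timeline: Option<Arc<Timeline>>,
}

struct BackgroundJob {
    // Some object owned (indirectly) by Timeline that stashes a context.
    ctx: RequestContext,
}

struct Timeline {
    jobs: Mutex<Vec<BackgroundJob>>,
}

fn make_cycle() -> Arc<Timeline> {
    let timeline = Arc::new(Timeline { jobs: Mutex::new(Vec::new()) });
    let ctx = RequestContext { timeline: Some(Arc::clone(&timeline)) };
    // Timeline -> BackgroundJob -> RequestContext -> Arc<Timeline>: a cycle,
    // so this Timeline is never dropped even after all external Arcs go away.
    timeline.jobs.lock().unwrap().push(BackgroundJob { ctx });
    timeline
}
```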

Ideally, we wouldn't be using this mess of Arcs at all and would propagate Rust references instead.
But tokio requires tasks to be 'static, so we wouldn't be able to propagate references across task boundaries, which is incompatible with fan-out type code.
So we'd have to propagate Cow<>s instead.

@problame changed the base branch from jcsp/storcon-secrets-mk2 to main on March 21, 2024, 17:47

github-actions bot commented Mar 21, 2024

7744 tests run: 7366 passed, 0 failed, 378 skipped (full report)


Flaky tests (1)

Postgres 17

Code coverage* (full report)

  • functions: 32.8% (8655 of 26380 functions)
  • lines: 48.6% (73263 of 150671 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
5767947 at 2025-02-27T22:10:58.798Z :recycle:

…t wrong, also detached_child and attached_child don't use ::extend())
@problame force-pushed the problame/virtual-file-metrics-no-hashing branch from c8897e5 to 26da696 on February 24, 2025, 13:55
…n; page_service's shard swapping turns out to be painful / requires `Handle: Clone`, don't want that
… Scope's refs into another Arc that can be propagated cheaply
…ys go the furthest upstack to minimize cloning of Arc<Timeline> in inner loops
For some reason the layer download API never fully got RequestContext-infected.

This PR fixes that as a precursor to
- #6107
@problame changed the title from "WIP: STORAGE_IO_SIZE: avoid with_label_values on each IO" to "impr: propagate STORAGE_IO_SIZE metric via RequestContext" on Feb 25, 2025
@problame changed the title from "impr: propagate STORAGE_IO_SIZE metric via RequestContext" to "impr: propagate VirtualFile metrics via RequestContext" on Feb 25, 2025
@problame (Contributor, Author)

I extracted a piece of this PR into

That one I definitely want to land.

@problame (Contributor, Author)

@arpad-m I was able to remove the blanket Arc<ScopeInner> - we still do the arc_arc for new_timeline(), but no longer for the pagestream case, so we save one allocation per getpage batch

b6cc0c3

…reference cycles

Storing Arc<Timeline> inside RequestContext risks reference cycles if
the RequestContext gets stored in an object that is owned directly or
indirectly by Timeline.

So, we wrap the TimelineMetrics into an Arc and propagate that instead.
This makes it easy for future metrics to be accessed through
RequestContext, like we do for storage_io_size here.

To make the page_service case happy, where we may be dispatching to
a different shard on every successive request and don't want to clone
from the shared Timeline::metrics on each request, we pre-clone as
part of handling the cache miss.
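
A rough sketch of that pre-clone on handle-cache miss, with an illustrative per-shard cache map rather than the actual timeline::handle types:

```rust
use std::collections::HashMap;
use std::sync::Arc;

struct TimelineMetrics; // placeholder

struct CacheEntry {
    // Cloned from the shared Timeline::metrics exactly once, on miss.
    metrics: Arc<TimelineMetrics>,
}

// Per-connection cache: successive pagestream requests may hit different
// shards, but each shard's entry already carries its pre-cloned metrics,
// so the per-request path never touches the shared Timeline::metrics.
struct PerConnectionCache {
    by_shard: HashMap<u8, CacheEntry>, // key type is illustrative
}

impl PerConnectionCache {
    fn get_or_insert(
        &mut self,
        shard: u8,
        shared_metrics: &Arc<TimelineMetrics>,
    ) -> &CacheEntry {
        self.by_shard.entry(shard).or_insert_with(|| CacheEntry {
            metrics: Arc::clone(shared_metrics),
        })
    }
}
```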
…sk of reference cycles): bring back no alloc for pagestream
@problame force-pushed the problame/virtual-file-metrics-no-hashing branch from ebe5493 to ab7271a on February 27, 2025, 20:34
github-merge-queue bot pushed a commit that referenced this pull request Feb 28, 2025
…a special case (#11030)

# Changes

While working on
- #7202

I found myself needing to cache another expensive Arc::clone inside
the timeline::handle::Cache by wrapping it in another Arc.

Before this PR, it seemed like the only expensive thing we were caching
was the connection handler tasks' clone of `Arc<Timeline>`.

In fact, the GateGuard was another such thing, but it was special-cased
in the implementation.

So, this refactoring PR de-special-cases the GateGuard.

# Performance

With this PR we do strictly _fewer_ operations per `Cache::get`.
The reason is that we wrap the entire `Types::Timeline` into one Arc.
Before this PR, there was a separate Arc around the `Arc<Timeline>` and
one around the `Arc<GateGuard>`.

With this PR, we avoid an allocation per cached item, namely,
the separate Arc around the GateGuard.

This PR does not change the amount of shared mutable state.

So, all in all, it should be a net positive, albeit probably not
noticeable with our small non-NUMA instances and generally high CPU
usage per request.
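
A rough before/after sketch of the cached item's shape, derived only from this description (types are placeholders, not the actual handle module definitions):

```rust
use std::sync::Arc;

// Placeholder stand-ins for the real types.
struct Timeline;
struct GateGuard;

// Before: the timeline and the gate guard were cached behind separate Arcs,
// with the GateGuard special-cased next to the timeline.
struct CachedBefore {
    timeline: Arc<Arc<Timeline>>,
    gate_guard: Arc<GateGuard>,
}

// After: the whole `Types::Timeline` value lives behind a single Arc, so a
// cached item costs one allocation instead of two and `Cache::get` touches
// one refcount.
struct TimelineAndGuard {
    timeline: Arc<Timeline>,
    gate_guard: GateGuard,
}

struct CachedAfter {
    cached: Arc<TimelineAndGuard>,
}
```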

# Reviewing

To understand the refactoring logistics, look at the changes to the unit
test types first.
Then read the improved module doc comment.
Then the remaining changes.

In the future, we could rename things to be even more generic.
For example, `Types::TenantMgr` could really be a `Types::Resolver`.
And `Types::Timeline` should, to avoid constant confusion in the doc
comment, be called `Types::Cached` or `Types::Resolved`, because the
`handle` module, after this PR, really doesn't care that we're using it
for storing Arcs and GateGuards.

Then again, specificity is sometimes more useful than being generic,
and writing the module doc comment in a totally generic way would
probably also be more confusing than helpful.
Development

Successfully merging this pull request may close these issues:
  • STORAGE_IO_SIZE metric doesn't use pre-with_label_valued counters
2 participants