Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server,sql,kv: contexts account for 6% of all allocated objects under sysbench oltp_read_only (1.9% cpu on oltp_read_write) #135904

Open
tbg opened this issue Nov 21, 2024 · 1 comment
Assignees
Labels
branch-master Failures and bugs on the master branch. C-performance Perf of queries or internals. Solution not expected to change functional behavior. o-perf-efficiency Related to performance efficiency P-2 Issues/test failures with a fix SLA of 3 months

Comments

@tbg
Copy link
Member

tbg commented Nov 21, 2024

Certainly this can be improved, which could speed up critical code paths and ease pressure on the garbage collector.
Context are nested, and we likely nest then fairly deep, resulting in linked lists that are traversed multiple times during the lifetime of a request on the critical path.

image

Epic: CRDB-42584

Jira issue: CRDB-44784

@tbg tbg added C-performance Perf of queries or internals. Solution not expected to change functional behavior. branch-master Failures and bugs on the master branch. P-2 Issues/test failures with a fix SLA of 3 months o-perf-efficiency Related to performance efficiency labels Nov 21, 2024
@tbg tbg changed the title server,sql,kv: contexts account for 6% of all allocated objects under sysbench oltp_read_only server,sql,kv: contexts account for 6% of all allocated objects under sysbench oltp_read_only (1.9% cpu) Dec 13, 2024
@tbg tbg changed the title server,sql,kv: contexts account for 6% of all allocated objects under sysbench oltp_read_only (1.9% cpu) server,sql,kv: contexts account for 6% of all allocated objects under sysbench oltp_read_only (1.9% cpu on oltp_read_write) Dec 13, 2024
@tbg
Copy link
Member Author

tbg commented Dec 13, 2024

cpu profile from oltp_read_write on e8ee6f5
image

craig bot pushed a commit that referenced this issue Dec 15, 2024
137344: ctxutil: introduce fast values r=RaduBerinde a=RaduBerinde

#### go.mod: update logtag dependency

See cockroachdb/logtags#4

Epic: none
Release note: None

#### serverident: unexport ServerIdentificationContextKey

Epic: none
Release note: None

#### ctxutil: introduce fast values

This commit introduces a custom context type that can be used to more
efficiently associate values to contexts.

Instead of using arbitrary objects as keys, the keys are a small set
of integers. Any key must be globally registered first, and there is a
limited number that can be registered.

The `fastValuesCtx` stores *all* values for these keys, so traversing
the context hierarchy is not necessary. We also provide a way to add
multiple values at the same time, which saves on allocations.

I looked at before and after profiles in sysbench. Before:
 - CPU usage:
  context.value 0.6%
  context.WithValue 0.3%
  total 0.9%
 - Allocs:
  context.WithValue: 3.5%

After:
 - CPU usage:
  context.value + ctxutil.FastValue 0.5%
  context.WithValue 0.1%
  ctxutil.WithFastValue(s) 0.1%
  total 0.7%
 - Allocs:
  context.WithValue: 0.2%
  context.WithFastValue: 0.6%
  ctxutil.FastValuesBuilder.Finish: 1.4%
  total 2.2%

I will investigate improving on this, perhaps with providing a
pre-allocated context in codepaths where this is possible.

Informs: #136581
Informs: #135904
Release note: None

Co-authored-by: Radu Berinde <[email protected]>
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Dec 15, 2024
Reduce allocations related to adding tags to the ambient context for a
non-local flow.

Informs: cockroachdb#135904
Release note: None
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Dec 15, 2024
Reduce allocations related to adding tags to the ambient context for a
non-local flow.

Informs: cockroachdb#135904
Release note: None
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Dec 15, 2024
Reduce allocations related to adding tags to the ambient context for a
non-local flow.

Informs: cockroachdb#135904
Release note: None
craig bot pushed a commit that referenced this issue Dec 16, 2024
137515: distsql: reduce allocations in setupFlow  r=RaduBerinde a=RaduBerinde

#### go.mod: bump cockroachdb/logtags

Incorporate cockroachdb/logtags#5

Epic: none
Release note: None

#### distsql: reduce allocations in setupFlow

Reduce allocations related to adding tags to the ambient context for a
non-local flow.

Informs: #135904
Release note: None

Co-authored-by: Radu Berinde <[email protected]>
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Dec 17, 2024
This commit adds `fastValuesContainer` which preallocates multiple
`fastValueCtx` objects. Contexts allocated from a container retain a
reference to the container, allowing it to be used by new descendant
contexts.

We use this explicitly in `Batch()` where we annotate a context three
times (once at the node level, then at the store level, then at the
replica level).

```
name                                   old time/op    new time/op    delta
Sysbench/SQL/3node/oltp_read_write-10    5.21ms ± 8%    5.29ms ± 6%    ~     (p=0.353 n=10+10)
Sysbench/KV/3node/oltp_read_write-10     2.42ms ± 1%    2.42ms ± 6%    ~     (p=0.447 n=9+10)

name                                   old alloc/op   new alloc/op   delta
Sysbench/SQL/3node/oltp_read_write-10    2.49MB ± 0%    2.50MB ± 0%  +0.38%  (p=0.000 n=8+10)
Sysbench/KV/3node/oltp_read_write-10     1.43MB ± 0%    1.44MB ± 0%  +0.49%  (p=0.000 n=10+10)

name                                   old allocs/op  new allocs/op  delta
Sysbench/KV/3node/oltp_read_write-10      6.21k ± 0%     6.14k ± 0%  -1.11%  (p=0.000 n=10+10)
Sysbench/SQL/3node/oltp_read_write-10     10.5k ± 0%     10.5k ± 1%  -0.57%  (p=0.001 n=10+10)
```

Informs: cockroachdb#135904
Release note: None
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Dec 17, 2024
This commit adds `fastValuesContainer` which preallocates multiple
`fastValueCtx` objects. Contexts allocated from a container retain a
reference to the container, allowing it to be used by new descendant
contexts.

We use this explicitly in `Batch()` where we annotate a context three
times (once at the node level, then at the store level, then at the
replica level).

```
name                                   old time/op    new time/op    delta
Sysbench/KV/3node/oltp_read_write-10     2.41ms ± 1%    2.37ms ± 2%  -1.71%  (p=0.001 n=8+9)
Sysbench/SQL/3node/oltp_read_write-10    5.15ms ± 4%    5.19ms ± 4%    ~     (p=0.631 n=10+10)

name                                   old alloc/op   new alloc/op   delta
Sysbench/KV/3node/oltp_read_write-10     1.43MB ± 0%    1.43MB ± 0%  +0.15%  (p=0.001 n=10+10)
Sysbench/SQL/3node/oltp_read_write-10    2.49MB ± 1%    2.50MB ± 0%  +0.37%  (p=0.024 n=9+9)

name                                   old allocs/op  new allocs/op  delta
Sysbench/KV/3node/oltp_read_write-10      6.21k ± 0%     6.14k ± 0%  -1.19%  (p=0.000 n=9+9)
Sysbench/SQL/3node/oltp_read_write-10     10.5k ± 0%     10.5k ± 0%  -0.48%  (p=0.000 n=10+10)
```

Informs: cockroachdb#135904
Release note: None
craig bot pushed a commit that referenced this issue Dec 23, 2024
137632: ctxutil: bundle up fastValueCtx allocs in Batch r=RaduBerinde a=RaduBerinde

This commit adds `fastValuesContainer` which preallocates multiple
`fastValueCtx` objects. Contexts allocated from a container retain a
reference to the container, allowing it to be used by new descendant
contexts.

We use this explicitly in `Batch()` where we annotate a context three
times (once at the node level, then at the store level, then at the
replica level).

```
name                                   old time/op    new time/op    delta
Sysbench/KV/3node/oltp_read_write-10     2.41ms ± 1%    2.37ms ± 2%  -1.71%  (p=0.001 n=8+9)
Sysbench/SQL/3node/oltp_read_write-10    5.15ms ± 4%    5.19ms ± 4%    ~     (p=0.631 n=10+10)

name                                   old alloc/op   new alloc/op   delta
Sysbench/KV/3node/oltp_read_write-10     1.43MB ± 0%    1.43MB ± 0%  +0.15%  (p=0.001 n=10+10)
Sysbench/SQL/3node/oltp_read_write-10    2.49MB ± 1%    2.50MB ± 0%  +0.37%  (p=0.024 n=9+9)

name                                   old allocs/op  new allocs/op  delta
Sysbench/KV/3node/oltp_read_write-10      6.21k ± 0%     6.14k ± 0%  -1.19%  (p=0.000 n=9+9)
Sysbench/SQL/3node/oltp_read_write-10     10.5k ± 0%     10.5k ± 0%  -0.48%  (p=0.000 n=10+10)
```

Informs: #135904
Release note: None

Co-authored-by: Radu Berinde <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-performance Perf of queries or internals. Solution not expected to change functional behavior. o-perf-efficiency Related to performance efficiency P-2 Issues/test failures with a fix SLA of 3 months
Projects
None yet
Development

No branches or pull requests

2 participants