Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[exporter/clickhouseexporter] Sort attribute maps before insertion #33634 #35725

Conversation

earwin
Copy link
Contributor

@earwin earwin commented Oct 9, 2024

Description

Our attributes are stored as Map(String, String) in CH.
By default the order of keys is undefined and as described in #33634 leads to worse compression and duplicates in group by (unless carefully accounted for).
This PR uses the column.IterableOrderedMap facility from clickhouse-go to ensure fixed attribute key order.

It is a reimplementation of #34598 that uses less allocations and is (arguably) somewhat more straightforward.

I'm opening this as a draft, because this PR (and #34598) are blocked by ClickHouse/clickhouse-go#1365 (fixed in ClickHouse/clickhouse-go#1418)

In addition, I'm trying to add the implementation of column.IterableOrderedMap used to clickhouse-go upstream: ClickHouse/clickhouse-go#1417
If it is accepted, I will amend this PR accordingly.

Link to tracking issue

Fixes #33634

Testing

The IOM implementation was used in production independently.
I'm planning to build otelcollector with this PR and cut over my production to it in the next few of days.

Copy link

linux-foundation-easycla bot commented Oct 9, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: earwin / name: Earwin (a15186c)

@earwin earwin force-pushed the clickhouseexporter-ordered-attributes branch from 0576d04 to 5bf53bd Compare October 9, 2024 22:45
@dmitryax
Copy link
Member

dmitryax commented Oct 9, 2024

Please add a changelog entry

@earwin earwin force-pushed the clickhouseexporter-ordered-attributes branch from 5bf53bd to b093216 Compare October 9, 2024 23:46
@earwin
Copy link
Contributor Author

earwin commented Oct 9, 2024

@dmitryax there you are!

@earwin earwin force-pushed the clickhouseexporter-ordered-attributes branch from b093216 to 4e716f6 Compare October 9, 2024 23:52
@earwin
Copy link
Contributor Author

earwin commented Oct 9, 2024

Added license header

@earwin earwin force-pushed the clickhouseexporter-ordered-attributes branch 2 times, most recently from 142c36d to 1055b0b Compare October 10, 2024 00:17
@earwin
Copy link
Contributor Author

earwin commented Oct 10, 2024

[x] make goporto
[ ] Error: ./generated_component_test.go:37:20: factory.CreateLogsReceiver undefined (type receiver.Factory has no field or method CreateLogsReceiver) (typecheck) doesn't seem to have anything to do with my changes
[x] make gogci touches a lot of unrelated stuff, but did a manual run as suggested by a linter

@earwin
Copy link
Contributor Author

earwin commented Oct 15, 2024

Blocking clickhouse-go PR got merged.
Waiting for the optional one.

@dmitryax What is our policy on dependency versions?
I can go for the first commit after all the PRs are in, or wait until clickhouse-go does a tagged release.

@earwin earwin force-pushed the clickhouseexporter-ordered-attributes branch from 1055b0b to 636bdaf Compare October 16, 2024 13:52
@earwin
Copy link
Contributor Author

earwin commented Oct 16, 2024

The updated patch drops my implementation of IterableOrderedMap since it got accepted into upstream clickhouse-go.
For now it updates clickhouse-go dependency to the latest commit, waiting for a tagged release.

I am planning to make changes to clickhouse-go such as any usual map[K]V will end up being written in sorted order without any extra developer effort. IOMs will remain, but will only be used for cases when you need to override default sort order.
This will make this whole PR obsolete except for bumping the dependency ver.
I suspect the implementation is not going to be trivial/fast, so my suggestion is to take this patch now, and revert it when/if I fix clickhouse-go.

@earwin earwin force-pushed the clickhouseexporter-ordered-attributes branch from 636bdaf to 361f6b5 Compare October 17, 2024 15:36
@earwin earwin marked this pull request as ready for review October 17, 2024 15:38
@earwin earwin requested a review from a team as a code owner October 17, 2024 15:38
@earwin
Copy link
Contributor Author

earwin commented Oct 17, 2024

@dmitryax What is our policy on dependency versions? I can go for the first commit after all the PRs are in, or wait until clickhouse-go does a tagged release.

Ok, nevermind, updated to clickhouse-go v2.30.0, ready to go.

Copy link
Member

@SpencerTorres SpencerTorres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate the contribution, glad to see this implemented. The compression benefits will be great!

@earwin
Copy link
Contributor Author

earwin commented Oct 29, 2024

Hi, folks. Is there anything else expected from me in regards to this PR, or we're waiting for reviewers' timeslot?

@hanjm
Copy link
Member

hanjm commented Oct 29, 2024

@dmitryax could you help run ci? thank you.

@earwin
Copy link
Contributor Author

earwin commented Nov 11, 2024

bump? @dmitryax

@hanjm
Copy link
Member

hanjm commented Nov 15, 2024

@open-telemetry/collector-contrib-approvers hi,anyone could help run ci? thank you.

@earwin earwin force-pushed the clickhouseexporter-ordered-attributes branch from 2e17baf to a15186c Compare November 26, 2024 21:25
@earwin
Copy link
Contributor Author

earwin commented Nov 26, 2024

Rebased to latest 'main'.
Previous CI run failed with check-links / Check that anchor links are lowercase which is mighty confusing, since I'm not adding any changelog links whatsoever.

@earwin
Copy link
Contributor Author

earwin commented Nov 26, 2024

@hanjm I will probably drop this PR after a while, sorry.
With CI run taking a month and popping errors seemingly unrelated to code changes, I don't have the focus to follow it through.

@ChrsMark
Copy link
Member

@Frapschen @dmitryax mind taking a look?

@SpencerTorres
Copy link
Member

@earwin I appreciate your patience, the CI can be a bit difficult with all of the linting scripts. It looks like the CI is passing now though

@crobert-1 I don't have permission to merge, let me know if you can get in touch with the right approvers for this. It's an important change that has been open for a while. Thanks!

@OneCyrus
Copy link

OneCyrus commented Dec 4, 2024

we are waiting for this merge as well.

looks like @crobert-1 is away for some time according to his status.
image

maybe someone else can approve the merge?

@TylerHelmuth TylerHelmuth merged commit 0cb937e into open-telemetry:main Dec 12, 2024
168 checks passed
@github-actions github-actions bot added this to the next release milestone Dec 12, 2024
sbylica-splunk pushed a commit to sbylica-splunk/opentelemetry-collector-contrib that referenced this pull request Dec 17, 2024
…en-telemetry#33634 (open-telemetry#35725)

#### Description
Our attributes are stored as Map(String, String) in CH.
By default the order of keys is undefined and as described in open-telemetry#33634
leads to worse compression and duplicates in `group by` (unless
carefully accounted for).
This PR uses the `column.IterableOrderedMap` facility from clickhouse-go
to ensure fixed attribute key order.

It is a reimplementation of open-telemetry#34598 that uses less allocations and is
(arguably) somewhat more straightforward.

I'm **opening this as a draft**, because this PR (and open-telemetry#34598) are
blocked by ClickHouse/clickhouse-go#1365
(fixed in ClickHouse/clickhouse-go#1418)

In addition, I'm trying to add the implementation of
`column.IterableOrderedMap` used to clickhouse-go upstream:
ClickHouse/clickhouse-go#1417
If it is accepted, I will amend this PR accordingly.

#### Link to tracking issue
Fixes open-telemetry#33634

#### Testing
The IOM implementation was used in production independently.
I'm planning to build otelcollector with this PR and cut over my
production to it in the next few of days.
mterhar pushed a commit to mterhar/opentelemetry-collector-contrib that referenced this pull request Dec 19, 2024
…en-telemetry#33634 (open-telemetry#35725)

#### Description
Our attributes are stored as Map(String, String) in CH.
By default the order of keys is undefined and as described in open-telemetry#33634
leads to worse compression and duplicates in `group by` (unless
carefully accounted for).
This PR uses the `column.IterableOrderedMap` facility from clickhouse-go
to ensure fixed attribute key order.

It is a reimplementation of open-telemetry#34598 that uses less allocations and is
(arguably) somewhat more straightforward.

I'm **opening this as a draft**, because this PR (and open-telemetry#34598) are
blocked by ClickHouse/clickhouse-go#1365
(fixed in ClickHouse/clickhouse-go#1418)

In addition, I'm trying to add the implementation of
`column.IterableOrderedMap` used to clickhouse-go upstream:
ClickHouse/clickhouse-go#1417
If it is accepted, I will amend this PR accordingly.

#### Link to tracking issue
Fixes open-telemetry#33634

#### Testing
The IOM implementation was used in production independently.
I'm planning to build otelcollector with this PR and cut over my
production to it in the next few of days.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ClickHouse exporter produces duplicates and poor compression without sorting attributes
8 participants