Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[chore] Schema Processor Revamp Implementation Parent PR #35248

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

ankitpatel96
Copy link
Contributor

Description:
This is a duplicate of #35213 - I had to reopen this because the Github Actions refused to trigger.

This is the schema processor, as originally written by @MovieStoreGuy in #11547 - refreshed and rebased. I've rewritten some parts and added significant amounts of testing to many parts - resulting in quite a few changes. I will be splitting this PR into many parts.

There are several features that need to be completed before this is ready for production use:

This only supports the Schema File Format 1.0.0 - we need to support 1.1.0
Right now only downgrades work. Upgrades need some work to enable
We should improve internal metrics and observability.

@ankitpatel96 ankitpatel96 requested a review from a team September 17, 2024 16:00
@github-actions github-actions bot added the processor/schema Schema processor label Sep 17, 2024
@ankitpatel96 ankitpatel96 changed the title Schema Processor Revamp implementation [chore] Schema Processor Revamp Implementation Parent PR Sep 17, 2024
@tigrannajaryan
Copy link
Member

I will be splitting this PR into many parts.

@ankitpatel96 thank you for working on this. Are you looking for reviews on this PR or we should wait until you split into smaller ones?

@ankitpatel96
Copy link
Contributor Author

ankitpatel96 commented Sep 18, 2024

I will be splitting this PR into many parts.

@ankitpatel96 thank you for working on this. Are you looking for reviews on this PR or we should wait until you split into smaller ones?

Hi @tigrannajaryan, I am definitely not trying to get a review on this entire PR haha - I am in the process of splitting it into 3-4 PRs.

Part 1 - #35214
Part 2 - #35267

The rest of the code is largely done - I'm just working on splitting it into stacked PRs that makes sense.

I am working up a mini design document so people have an idea of what they are looking at.
Will leave a comment on this PR when that's done!

@jsuereth
Copy link
Contributor

jsuereth commented Sep 19, 2024

Edit: Added link to specification

One major question -> Is this not intended to support resource attribute changes?

I was going through looking for the intricate "fun" code to deal with that scenario, but I didn't see anything, nor do I see tests associated with it.

I think it's fine for to disallow that for an initial implementation/improvement, but you may want to consider the architecture/design for when you need to apply attirbute name changes on resources in addition to signals.

Specifically -> The shuffling of Data points out of a resource and into a new one is... fun. From my SDK implementation this was much simpler, as the SDK only allows one resource. For the Collector -> I think you'll need some robust code, checks and performance tests around that.

See https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/schemas/file_format_v1.1.0.md#all-section - specifically the notion there's a resource attribute rename possibility and that all must apply to that too.

@ankitpatel96
Copy link
Contributor Author

Hi Josh,
I have implemented attribute renames for the all block and the resource block but I made a mistake and didn't realize that I had to make each ResourceLogs struct have a unique Resource within in a plog.Logs object (along with all the other signal types). I cut a ticket for it - #35305 - and I think this is definitely something we can support in the future. Thanks for bringing it up!

Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Oct 15, 2024
@ankitpatel96 ankitpatel96 force-pushed the ankit-schema-processor branch from 1829df1 to 472cedb Compare October 15, 2024 19:26
@github-actions github-actions bot removed the Stale label Oct 16, 2024
@ankitpatel96 ankitpatel96 force-pushed the ankit-schema-processor branch from 472cedb to af3a4fd Compare October 16, 2024 22:01
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Nov 22, 2024
@songy23 songy23 added never stale Issues marked with this label will be never staled and automatically removed and removed Stale labels Nov 22, 2024
mx-psi added a commit that referenced this pull request Dec 3, 2024
…35267)

**Description:** <Describe what has changed.>
This is a slice of changes from
#35248

This PR details how operators are used to build the execution pipeline
for a given schemafile.



Changed files from the [previous
PR](#35214)
are:

processor/schemaprocessor/internal/changelist/changelist.go
processor/schemaprocessor/internal/translation/revision_v1.go
processor/schemaprocessor/internal/translation/revision_v1_test.go
processor/schemaprocessor/go.mod

I'm asking a maintainer if they would be willing to push a copy of the
previous PR's branch to the core repo so I can switch the base of this
PR to the previous PR - thus only the stacked changes would be shown.

Edit: this is apparently not easily supported - so asking reviewers to
just focus on the changed files listed above. Sorry about that!

**Testing:** <Describe what testing was performed and which tests were
added.>
Unit tests

---------

Co-authored-by: Pablo Baeyens <[email protected]>
shivanthzen pushed a commit to shivanthzen/opentelemetry-collector-contrib that referenced this pull request Dec 5, 2024
…pen-telemetry#35267)

**Description:** <Describe what has changed.>
This is a slice of changes from
open-telemetry#35248

This PR details how operators are used to build the execution pipeline
for a given schemafile.



Changed files from the [previous
PR](open-telemetry#35214)
are:

processor/schemaprocessor/internal/changelist/changelist.go
processor/schemaprocessor/internal/translation/revision_v1.go
processor/schemaprocessor/internal/translation/revision_v1_test.go
processor/schemaprocessor/go.mod

I'm asking a maintainer if they would be willing to push a copy of the
previous PR's branch to the core repo so I can switch the base of this
PR to the previous PR - thus only the stacked changes would be shown.

Edit: this is apparently not easily supported - so asking reviewers to
just focus on the changed files listed above. Sorry about that!

**Testing:** <Describe what testing was performed and which tests were
added.>
Unit tests

---------

Co-authored-by: Pablo Baeyens <[email protected]>
ZenoCC-Peng pushed a commit to ZenoCC-Peng/opentelemetry-collector-contrib that referenced this pull request Dec 6, 2024
…pen-telemetry#35267)

**Description:** <Describe what has changed.>
This is a slice of changes from
open-telemetry#35248

This PR details how operators are used to build the execution pipeline
for a given schemafile.



Changed files from the [previous
PR](open-telemetry#35214)
are:

processor/schemaprocessor/internal/changelist/changelist.go
processor/schemaprocessor/internal/translation/revision_v1.go
processor/schemaprocessor/internal/translation/revision_v1_test.go
processor/schemaprocessor/go.mod

I'm asking a maintainer if they would be willing to push a copy of the
previous PR's branch to the core repo so I can switch the base of this
PR to the previous PR - thus only the stacked changes would be shown.

Edit: this is apparently not easily supported - so asking reviewers to
just focus on the changed files listed above. Sorry about that!

**Testing:** <Describe what testing was performed and which tests were
added.>
Unit tests

---------

Co-authored-by: Pablo Baeyens <[email protected]>
sbylica-splunk pushed a commit to sbylica-splunk/opentelemetry-collector-contrib that referenced this pull request Dec 17, 2024
…pen-telemetry#35267)

**Description:** <Describe what has changed.>
This is a slice of changes from
open-telemetry#35248

This PR details how operators are used to build the execution pipeline
for a given schemafile.



Changed files from the [previous
PR](open-telemetry#35214)
are:

processor/schemaprocessor/internal/changelist/changelist.go
processor/schemaprocessor/internal/translation/revision_v1.go
processor/schemaprocessor/internal/translation/revision_v1_test.go
processor/schemaprocessor/go.mod

I'm asking a maintainer if they would be willing to push a copy of the
previous PR's branch to the core repo so I can switch the base of this
PR to the previous PR - thus only the stacked changes would be shown.

Edit: this is apparently not easily supported - so asking reviewers to
just focus on the changed files listed above. Sorry about that!

**Testing:** <Describe what testing was performed and which tests were
added.>
Unit tests

---------

Co-authored-by: Pablo Baeyens <[email protected]>
@ankitpatel96 ankitpatel96 force-pushed the ankit-schema-processor branch from 58256c1 to 64aedc8 Compare December 19, 2024 18:07
@ankitpatel96
Copy link
Contributor Author

Hi, this is ready for review! @tigrannajaryan @MovieStoreGuy please take a look!

Much of this is @MovieStoreGuy's original work - it works to lookup and cache the schema files as well as orchestrating the rest of the code / hooking it up to the collector factory. The vast majority of this PR is tests that I've written that are pretty end-to-end.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
never stale Issues marked with this label will be never staled and automatically removed processor/schema Schema processor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants