Skip to content
This repository has been archived by the owner on Jan 21, 2025. It is now read-only.

Sync from fork #212

Merged
merged 36 commits into from
Dec 22, 2024
Merged

Sync from fork #212

merged 36 commits into from
Dec 22, 2024

Conversation

Oded-B
Copy link
Collaborator

@Oded-B Oded-B commented Dec 22, 2024

Description

Please provide a meaningful description of what this change will do, or is for. Bonus points for including links to related issues, other PRs, or technical references.

Note that by not including a description, you are asking reviewers to do extra work to understand the context of this change, which may lead to your PR taking much longer to review, or result in it not being reviewed at all.

Type of Change

  • Bug Fix
  • New Feature
  • Breaking Change
  • Refactor
  • Documentation
  • Other (please describe)

Checklist

  • I have read the contributing guidelines
  • Existing issues have been referenced (where applicable)
  • I have verified this change is not present in other open pull requests
  • Functionality is documented
  • All code style checks pass
  • New code contribution is covered by automated tests
  • All new and existing tests pass

Oded-B and others added 30 commits June 19, 2024 17:21
Another sync from upstream 🤷‍♂️
* Use  new context (not a child)
* Remove calls to a function that exits the process.
* Add more "context"(info, like for humans) to errors
* Address another err with no context issue
* Respond the webhook only based on payload parsing.
Actual Event processing is move to background thread.

This require some refactoring, created ReciveWebhook and ReciveEventFile
functions to represent the different behavior in Web Server VS CLI
triggering while keeping to the GH stuff in the GH package
* Cancel whole drift work  on context deadline
* Move error function return value to the standard position
Handle cases where GetContents returns nil HTTP response (like in Context
cancellation)
* Document mirrord usage for local testing

Co-authored-by: Arthur Shvarts <[email protected]>

---------

Co-authored-by: Arthur Shvarts <[email protected]>
* Allow setting argoCD revision to PR git branch
* Triggering from checkbox event was written in a generic way for future proofing
* Document new config key


---------

Co-authored-by: Hannes Gustafsson <[email protected]>
* Add a new component-level "disableDiff" configuration key
* Read the configuration file before diff
* Pass the the config the relevant function
* Replace diff content with a relevant message
* Document new configuration key


Co-authored-by: Yazdan Mohammadi <[email protected]>

---------

Co-authored-by: Yazdan Mohammadi <[email protected]>
* Initial commit of "Provide Diff for new apps"
* Delete temp app after diff
* Nest ArgoCD configuration keys under one key.
* Add configuration key to toggle this feature
* Add some feature limitation to docs
* Use prBtanch for temp app object TargetRevision
…21)

* Configure mirrord to 'steal' traffic.

Stealing traffic means the Telefonistka instance running in the
cluster will not receive the webhook, only the local instance will.
This is so that we only see results from our local instance.

* Change format of promotion path comment.

The promotion path used to look like 'source/'->['path1/','path2/'].
This PR changes it to look like:
'source/'->
  'path1/'
  'path2/'

* Remove truncation of sp in promotionPRBody

* Add test for prBody

* Run gofumpt

* Add md extension to golden file

* Addjusted prBody and matching test/goldenfile

* Remove dead code
* Improve application diff

This changes the application diff to render a YAML diff with a number of
context lines instead of relying on the cmp library previously used
which would render a diff of Go objects.

* Allow bare URL in markdown

After discussion it has been deemed unnecessary to enforce non-bare URLs
in markdown.

* Upgrade golangci-lint

The earlier version fails to run on Darwin with Go 1.23.
This PR resolves intermittent test suite execution failures that are caused by
the TestAggregatePromotionPlan test [1].

JIRA Issue [2].

The function signature for generatePromotionPlanTestHelper [3] has been modified to
support a variadic expected promotions parameter.

[1] https://github.com/commercetools/telefonistka/blob/bug/SD-732/internal/pkg/githubapi/promotion_test.go#L174
[2] https://commercetools.atlassian.net/jira/software/c/projects/SD/issues/SD-732?jql=project%20%3D%20%22SD%22%20ORDER%20BY%20created%20DESC
[3] https://github.com/commercetools/telefonistka/blob/bug/SD-732/internal/pkg/githubapi/promotion_test.go#L52
* Keep the application if there is a diff error

* Update the template
* Split too big comment per cluster

If a "regular" aggregate diff can't fit in a GH comment create one
comment per cluster.
There's still a fallback to concise diff (just lists changed objects)
for extreme cases

* Move logging out of executeTemplate to higher up the stack
Create a new "testable"  generateArgoCdDiffComments function to generate
all the comments content

Comment all the comments

* Add test for 3 "levels" of comments

* Apply suggestions from code review

Co-authored-by: Hannes Gustafsson <[email protected]>
… a new application (#27)

* add nil condition to found app to prevent nil references

* add templates path to generate argo cd diff comments test

* add nolint

* separate error and nil conditions for clarity

* add a unit test to verify the nil return value

* update test for clarity
* return err as second value

* Make executeTemplate more general by extracting template path join

* Generate target url with dynamic value (CommitTime)

* Inject /etc/telefonistka-gh-app-config/ for mirrord

This will be used for CUSTOM_COMMIT_STATUS_URL_TEMPLATE_PATH

* Add doc

* Add test

* Fix lint issues

* Make targetURL const

* Change the level to Debug

See: #30 (comment)
* Improve promotion comment.

Promotion comment used to list all the values in the 'targetPaths' key,
ignoring the promotionAllow and promotionBlock lists. This PR limits
the comment to only paths that are being promoted.

* Use testify library for assert

* return nil, use assert

* Add t.Parallel()
* Avoid displaying the "Sync from PR Branch" checkbox for new applications.
* Create a new function to check if all application in a PR are new.
* Write tests for new function
* Rename HasSyncableComponents to DisplaySyncBranchCheckBox to better match its new usage
* Switch from Emoji based annotation to GH markdown "alerts"
* Fix potential panic

The change [1] introduced a potential panic as it is trying to log the
app name when a failure happens, but the app variable is overridden
during such a failure and is thus nil.

This results in the following panic.

    panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x20ddccb]

    goroutine 367 [running]:
    internal/pkg/argocd.generateDiffOfAComponent()
	    internal/pkg/argocd/argocd.go:477
    internal/pkg/argocd.GenerateDiffOfChangedComponents()
	    internal/pkg/argocd/argocd.go:561
    internal/pkg/githubapi.HandlePREvent()
	    internal/pkg/githubapi/github.go:161
    internal/pkg/githubapi.handleEvent()
	    internal/pkg/githubapi/github.go:382
    created by internal/pkg/githubapi.ReciveWebhook
	    internal/pkg/githubapi/github.go:322

By instead using the name from the query constructed prior, the panic
should be avoided.

[1] adbd913
… branch with auto-sync on (#35)

* Introduce .AppSyncedFromPRBranch and return no error

* Disable deploy from branch for apps that we synced from branch
When the event handling takes to long the previous context is canceled
and the request fails and the commit status ends up in a pending state.

Using a separate context will allow the status to always be set,
regardless of the event handling timing out.
Since it is very specific, might as well make it operate directly on
labels. This should make it slightly clearer and easier to read.

This reverts commit fcd2aeffd5b2752c7274514b7c78be7fc1bc60fd.
This change refactors the event handling logic such that a deferred
panic handler can log panics in the downstream handler logic. This
should avoid crashing when such panics occur, and instead it would log
the panic using the logger.

Additionally, the parsing of the event payload to determine which
handling logic to invoke is separated out, and now also indicates
whether a match was found. This is to allow PR status updates to be
applied once, when appropriate, and to enable ensuring that the
success/failure update is always applied.

To achieve this the individual downstream logic is factored out into
separate functions, and errors encountered in them are returned where
prHandleError were previously set.

Getting the default branch and Telefonistka config is duplicated in each
handler as needed.
The message seems to be informative only to developers i.e. better
suited as a debug message.

Moving it to the function that it is actually logging for ensures that
it is always logged, instead of putting this burden on the caller.
Now that the earlier error is returned, the else is not needed and can
be dropped, reducing the indentation of the happy path [1, 2].

[1] https://maelvls.dev/go-happy-line-of-sight/
[2] https://medium.com/@matryer/line-of-sight-in-code-186dd7cdea88
* Change metric type log level to debug

  Logging metric style info is better to handle properly; since this
  is one of very few instances it is changed to debug level for now.
  Future goal is to add tracing and metric instrumentation for such
  information.

* Log event type once at start of handle function

* Drop now duplicate log lines of the event type

* Consistently add event type into PR logger

  Once PR logger includes the event_type. For consistency add the same
  field to the other PR loggers.

* Remove now unused eventType argument
Prior when the configuration was not successfully fetched, the error was
only logged but execution continued. Not fetching the configuration is
an unrecoverable error that should result in upstream failure.

Instead return the error to caller and let them log it.
The test added in [1] uses environment variables in a parallel test
which causes it to be flaky and sometimes fail as is described in [2].

To fix the flaky test, the environment variable is pulled out and read
in the caller of the function under test, so that it can be tested
without reading the global environment variable.

[1] a43d3d1
[2] https://pkg.go.dev/testing#T.Setenv
ashvarts and others added 6 commits October 30, 2024 09:13
Promotion PR description generated by Telefonistka contains all default
promotion targets for a certain promotion path.

This updates the prBody function to only list the promotion targets to which
the promotion is supposed to happen for a certain promotion path.

* Added a filterSkipPaths function when generating a pr comment.

This will take the targetPaths and filter out any skipped paths from
the final promotion pr comment.

* Add getPromotionSkipPaths to find the component with fewest skip paths

---------

Co-authored-by: Hannes Gustafsson <[email protected]>
Calculating diffs on promotion pull requests like [1] fails since the
context is timing out and processing is aborted. This is evident from
logs.

    Error getting manifests for app ..., revision ...:
	rpc error: code = DeadlineExceeded desc = context deadline exceeded"
    Error generating diff for component ...:
	rpc error: code = DeadlineExceeded desc = context deadline exceeded"
    Handling of PR event failed: err=getting diff information:
	rpc error: code = DeadlineExceeded desc = context deadline exceeded\n" ...

Increasing the timeout is a short-term fix to mitigate the issue and
allow users to get the diff while additional investigation is done to
figure out how we can address this better long-term.

[1] commercetools/k8s-gitops#998
* wip

* Concurrent argocd diff


* Move InitArgoClients out of server.go

* Update internal/pkg/githubapi/github.go

Co-authored-by: Hannes Gustafsson <[email protected]>

* Update internal/pkg/argocd/argocd_test.go

Co-authored-by: Hannes Gustafsson <[email protected]>

* Update internal/pkg/argocd/argocd.go

Co-authored-by: Hannes Gustafsson <[email protected]>

* Update internal/pkg/argocd/argocd_test.go

Co-authored-by: Hannes Gustafsson <[email protected]>

* Update internal/pkg/argocd/argocd.go

Co-authored-by: Hannes Gustafsson <[email protected]>

* Update internal/pkg/argocd/argocd.go

Co-authored-by: Hannes Gustafsson <[email protected]>

* Update internal/pkg/argocd/argocd.go

Co-authored-by: Hannes Gustafsson <[email protected]>

* Use dependency injection with argocd client

* Remove dangling err check

---------

Co-authored-by: Hannes Gustafsson <[email protected]>
Instrument  PR handling failures.

We track both explicit failures with pr_handle_failures_total
And cases where telefonistka commit status check  is left in "pending"
state (because Telefonistka exploded while handling that event )


Co-authored-by: Yazdan Mohammadi <[email protected]>
@Oded-B Oded-B merged commit 011d32c into wayfair-incubator:main Dec 22, 2024
3 of 4 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants