Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: background pooler with retry #5

Closed
wants to merge 5 commits into from

Conversation

CermakM
Copy link

@CermakM CermakM commented Feb 26, 2024

This PR shifts the initial implementation towards a backend pooler
with retry functionality.

The assumptions and implementation decisions have been made when implementing the pooler:

  • the background manager's primary purpose is to run (ideally)
    permananent background goroutines, as such, there is no need for
    lifecycle hooks such as OnTaskSucceeded or onTaskFailed, those were
    removed in favor of the retrier.
  • the manager is context-based and all go routines spawned by the
    manager share a single cancellable context (cancellable independantly
    of their parent context)
  • when stopped or cancelled, the conc pooler gracefully shuts down
    dangling goroutines
  • by default, the pooler returns joined errors, this behaviour can be
    modifier with WithFirstError Option.

Example usage:

	ctx := context.Background()
	// Create a new manager.
	// The manager will cancel its context and all its tasks if any of the tasks returns an error.
	// The manager will return the first error encountered by any of its tasks.
	backgroundManager := manager.New(ctx, manager.WithCancelOnError(), manager.WithFirstError())
	//nolint
	backgroundManager.Run(
		func(ctx context.Context) error {
			select {
			case <-ctx.Done():
				return ctx.Err()
			case <-time.After(2 * time.Second):
				// Fail after 2 seconds.
				return context.DeadlineExceeded
			}
		},
	)
	backgroundManager.Run(
		func(ctx context.Context) error {
			select {
			case <-ctx.Done():
				return ctx.Err()
			case <-time.After(5 * time.Second):
				fmt.Printf("This won't be executed\n")
			}
		},
	)
	if err := backgroundManager.Wait(); err != nil {
		//nolint
		fmt.Printf("Error: %v\n", err) // Output: Error: context deadline exceeded
	}

Signed-off-by: Marek Cermak <[email protected]>
This commit shifts the initial implementation towards a backend pooler
with retry functionality.

The assumptions and implementation decisions have been made when implementing the pooler:
- the background manager's primary purpose is to run (ideally)
  permananent background goroutines, as such, there is no need for
  lifecycle hooks such as OnTaskSucceeded or onTaskFailed, those were
  removed in favor of the retrier.
- the manager is context-based and all go routines spawned by the
  manager share a single cancellable context (cancellable independantly
      of their parent context)
- when stopped or cancelled, the conc pooler gracefully shuts down
  dangling goroutines
- by default, the pooler returns joined errors, this behaviour can be
  modifier with WithFirstError Option.

Signed-off-by: Marek Cermak <[email protected]>
Signed-off-by: Marek Cermak <[email protected]>
Copy link

@Fazt01 Fazt01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First impression it doesn't really do much either, but will review more thoroughly tomorrow.
Just found 1 obvious point.


func (m *Manager) withStats(task TaskFunc) TaskFunc {
return func(ctx context.Context) error {
m.stat.RunningTasks++
Copy link

@Fazt01 Fazt01 Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a race condition here (not atomic operation).
simple: don't have stat (no adding and subtracting 1)
less-but-still-simple: add mutex or use atomic operation

Copy link

@Fazt01 Fazt01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even after more thorough review, I still think that this approach doesn't really add value.
Most of the functionality is in options which:

  • forwards 3 of the conc options
  • forwards 2 of the retrier options.

If we consider the use case of long running tasks (e.g. SQS handlers), I don't really want WithCancelOnError or WithFirstError. If I know all the tasks that will need to run at the start, I also don't want WithMaxTasks. Considering the retriers: in SQS handler I'll have a for (while true) loop, that will probably handle any errors and retries in its body - I don't really expect it to ever return an error (maybe apart from context.Cancelled) - maybe panic could happen, which is the only think that the retrier would help with.

My possible idea for a "feature request" would be to handle such outer loop of for (while true): user would provide an inner body of that loop (e.g. read from SQS and handle the message), optionally return and error. This package could then reasonably handle retries (even with infinite attempts - why not) - wait a bit on error, rerun immediately on success (read next message).

Then, a follow-up "feature-request" would be to gracefully complete the iteration on .Stop(): consider .Stop() is called on SIGTERM - the SQS handler should have a possibility to (without cancelling the ctx) complete the iteration (handle all db and external-api stuff) - hopefuly within few seconds. But next iteration would not start - there would be no possible context.Cancelled when doing db or external-api stuff.

However, even with these features, it would most likely still be less ergonomic to include such module, than to handle these in the app itself - not sure without seeing such potential usage.

I prefer this version infinitesimally to the master version. I prefer nothing infinitely to either version.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a reason to add all these, apart from .idea and .vscode.
For example, .env.test is something that we would want to have in repo, right? (if it did exist at all)

"github.com/eapache/go-resiliency/retrier"
)

type Option interface {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be simpler to have just type Option func(m *Manager) - most options could be simplified to inline the function. And there isn't really any other reasonable method into this interface 🤔

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functional options pattern is implemented here for example

robertrossmann added a commit that referenced this pull request Feb 29, 2024
Now the task can be set up to retry on failure. By default, we only make
a single execution attempt.

The retry mechanism is implemented in
https://github.com/kamilsk/retry/v5 . I chose this package over
https://github.com/eapache/go-resiliency because `retry` has a much more
flexible API, provides a ton of strategies and backoff algorithms and
the strategies are nicely defined via an interface, making extensions
and customisations very easy.

Unfortunately, the last update to `retry` was in February 2021 so I
might yet regret this decision but for the time being I feel like this
is the superior choice over `go-resiliency`.

> This partially implements #5. More work is still needed to incorporate
all features from that pull request, namely being able to cancel
long-running tasks.
robertrossmann added a commit that referenced this pull request Mar 8, 2024
> Closes #5 

This is an alternative implementation of what @CermakM proposed in #5
that retains the previous functionality (one-off task executions,
hooks/observer methods, etc.) and adds a periodic/looping tasks on top
of that.

### Some implementation notes:

- the tasks are restarted immediately when the previous function's
iteration returns.
- if the task returns an error, retry policies are applied just like for
one-off tasks. Retries happen within the same loop iteration.
- if the retries are not successful, the observer's `OnTaskFailed()`
method is called and a new loop starts again.
- the `OnTaskStalled()` and `OnTaskSucceeded()` methods are ignored for
loop tasks.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants