Skip to content
This repository has been archived by the owner on Apr 15, 2024. It is now read-only.

feat: better requeue mechanism using a separate nonce channel #643

Merged
merged 6 commits into from
Dec 6, 2023

Conversation

rach-id
Copy link
Member

@rach-id rach-id commented Dec 4, 2023

Overview

This PR offers a better requeue mechanism using a separate nonce channel. The reason for this is for the orchestrator to retry a nonce every hour, instead of looping instantly on it if it's up-to-date.
We could also improve this mechanism by introducing a way to reference how many tries we retired a certain before the way, but I don't think it's worth it at the moment.

Checklist

  • New and updated code has appropriate documentation
  • New and updated code has new and/or updated testing
  • Required CI checks are passing
  • Visual proof for any user facing features like CLI or documentation updates
  • Linked issues closed with keywords

Summary by CodeRabbit

  • New Features

    • Implemented a new queue management system to enhance processing efficiency.
  • Improvements

    • Improved the handling of process failures to ensure better system reliability.
  • Refactor

    • Streamlined the process flow to accommodate new queuing logic.

@rach-id rach-id added enhancement New feature or request orchestrator orchestrator related labels Dec 4, 2023
@rach-id rach-id self-assigned this Dec 4, 2023
@rach-id rach-id requested a review from evan-forbes as a code owner December 4, 2023 23:26
Copy link
Contributor

coderabbitai bot commented Dec 4, 2023

Walkthrough

The Go codebase has been updated to enhance the handling of nonce processing within an orchestrator. A new constant defines the queue size, and the orchestrator's start method now initializes a channel for failed nonces. The processing method has been augmented with a requeue mechanism and periodic checks, improving the system's resilience and efficiency in managing nonce failures.

Changes

File Change Summary
orchestrator/orchestrator.go Added queueSize constant, modified Start and ProcessNonces methods to handle failed nonces with new channels and logic.

Poem

🐇
In the code where nonces flow,
A rabbit set the queues to grow.
With each tick, the checks align,
Ensuring every nonce will shine.
🌟


Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • If you reply to a review comment from CodeRabbit, the bot will automatically respond.
  • To engage with CodeRabbit bot directly around the specific lines of code in the PR, mention @coderabbitai in your review comment
  • Note: Review comments are made on code diffs or files, not on the PR overview.
  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Note: For conversation with the bot, please use the review comments on code diffs or files.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between d3131ed and 12cb6d9.
Files selected for processing (1)
  • orchestrator/orchestrator.go (5 hunks)
Additional comments: 4
orchestrator/orchestrator.go (4)
  • 31-35: The introduction of queueSize aligns with the PR objectives to manage the size of the nonce queues, which is part of the improved requeue mechanism.

  • 72-78: The creation of failedNoncesQueue with a size defined by queueSize is consistent with the PR objectives to handle nonces that need to be retried.

  • 98-104: The addition of failedNoncesQueue to the ProcessNonces method signature and the logic within the method to handle requeuing is in line with the PR objectives.

  • 274-279: The logic to retry processing a nonce and potentially requeue it if it fails is consistent with the PR objectives. However, the MaybeRequeue function should be reviewed to ensure it adheres to the new requeue mechanism.

orchestrator/orchestrator.go Outdated Show resolved Hide resolved
orchestrator/orchestrator.go Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Dec 4, 2023

Codecov Report

Attention: 18 lines in your changes are missing coverage. Please review.

Comparison is base (0dd4726) 24.64% compared to head (09d1a97) 24.55%.

Files Patch % Lines
orchestrator/orchestrator.go 0.00% 18 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #643      +/-   ##
==========================================
- Coverage   24.64%   24.55%   -0.10%     
==========================================
  Files          29       29              
  Lines        3201     3213      +12     
==========================================
  Hits          789      789              
- Misses       2317     2329      +12     
  Partials       95       95              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

if len(requeueQueue) > 0 && len(noncesQueue) < queueSize {
// The use of the go routine is to avoid blocking
go func() {
nonce := <-requeueQueue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just add this as a different case in the select/for loop?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want this to be blocking, that's why I added the condition. Also, we don't want to requeue every time so that we don't keep retry indefinitely the failing nonce, but instead do it once in an hour

Copy link
Member

@evan-forbes evan-forbes Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mind explaining how will it block? won't it only get triggered is something is pushed through the channel to the select statement? If it will block if noncesQueue is full, then imo we should either increase that buffer or empty the channel faster. I could be missing something ofc too

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps the emptying and filling of the channel should not be in the same select statement?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If nonces queue is full and we run:

noncesQueue <- nonce

it will block, and at the same time, it won't be able to process nonces because it will be stuck here, hence the use of the go routine.

Also, we're using the ticker so that the nonsense don't get processed every time they fail directly so that we give them an hour of a delay before they get re-processed.

perhaps the emptying and filling of the channel should not be in the same select statement?

we could and it would be cleaner, but didn't want to add that extra complexity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just put the nonce in the queue when we attempt to retry instead of putting it in a channel, which eventually gets added to the nonce channel

or if we need the extra protection for some reason we can have a single goroutine that is connecting the two channels.

this could just be me being too nit picky, but spinning up a single goroutine to connect two channels on a timer feels like there's room for improvement

@rach-id rach-id enabled auto-merge (squash) December 6, 2023 13:30
@rach-id rach-id merged commit 9226b2e into celestiaorg:main Dec 6, 2023
17 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request orchestrator orchestrator related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants