refactor: isolate faulty channels and retry channel task on faults #1
@prestwich @ltchang2019 I merged the unit tests, so I believe this is ready to be merged.
@arnaud036 @yourbuddyconner let's run this PR in dev before merging.
This repo only builds a container once a PR is merged to main. We could change this behavior, but we should first agree on our git flow strategy for CI/CD. The flow I had in mind was to continuously deploy main to dev whenever a PR gets merged, and to deploy to prod once a tag is created. We could also consider eventually deploying to staging and using that environment as an integration env.
High Level Changes:
Code Changes
- `Agent::run` no longer borrows `&self` and instead takes an agent-specific `<Agent>Channel` struct that defines all data types needed to run one home <> replica channel
- `Agent::run_many` builds an `<Agent>Channel` struct and hands this off to an `Agent::run` task; if the run task errors out, it will log the error and try to start the task again instead of returning the error to the top-level `Agent::run_all`
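The two changes above can be sketched roughly as follows. This is a minimal, synchronous illustration and not the code in this PR: `Channel`, `supervise`, and `max_restarts` are hypothetical names, the real run tasks are presumably async, and the restart cap exists only so the sketch terminates.

```rust
/// Hypothetical stand-in for an agent-specific `<Agent>Channel` struct:
/// it owns everything one home <> replica task needs, so the task does
/// not have to borrow the agent (`&self`).
#[derive(Clone, Debug)]
struct Channel {
    home: String,
    replica: String,
}

/// Hypothetical stand-in for the per-channel supervision in `Agent::run_many`:
/// restart the run task when it errors instead of propagating the error upward.
/// Returns the number of faults seen. `max_restarts` is an assumption of this
/// sketch so that it always terminates.
fn supervise<F>(channel: &Channel, max_restarts: u32, mut run: F) -> u32
where
    F: FnMut(Channel) -> Result<(), String>,
{
    let mut faults = 0;
    loop {
        // Each attempt gets its own copy of the channel data.
        match run(channel.clone()) {
            Ok(()) => return faults,
            Err(err) => {
                faults += 1;
                // Log the fault and keep the failure local to this channel.
                eprintln!(
                    "channel {} <> {} faulted: {}",
                    channel.home, channel.replica, err
                );
                if faults > max_restarts {
                    return faults;
                }
            }
        }
    }
}
```

Because each run task receives an owned `Channel`, a fault in one home <> replica pair can be handled without touching the tasks for the other replicas.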
TODO:
[ ] add unit tests to mock faulty RPC
[x] add exponential backoff for retries
[x] metric to track the number of channel faults
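For reference, the backoff and fault-metric items could look something like the sketch below. It uses only the standard library; `backoff_delay`, `FaultCounter`, and the base/max parameters are hypothetical and are not taken from this PR, which may well use a metrics crate and different delays.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical backoff policy: the delay doubles with every consecutive
/// fault and is capped at `max`.
fn backoff_delay(base: Duration, max: Duration, consecutive_faults: u32) -> Duration {
    // 2^consecutive_faults, saturating when the shift would overflow a u32.
    let factor = 1u32.checked_shl(consecutive_faults).unwrap_or(u32::MAX);
    // If the multiplication overflows, fall back to the cap.
    base.checked_mul(factor).map_or(max, |delay| delay.min(max))
}

/// Hypothetical in-memory fault metric, keyed by channel name.
#[derive(Default)]
struct FaultCounter {
    by_channel: HashMap<String, u64>,
}

impl FaultCounter {
    /// Records one fault for `channel` and returns the running total.
    fn record(&mut self, channel: &str) -> u64 {
        let count = self.by_channel.entry(channel.to_string()).or_insert(0);
        *count += 1;
        *count
    }
}
```

A retry loop would call `record` on each fault and sleep for `backoff_delay(..)` before restarting the channel task.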
Closes #161