-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dust Apps: Improve resilience #4552
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, modulo one understandability issue
factor: 2, | ||
retries: 8, | ||
retries: 3, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why from 8 to 3 here VS 8 to 1 for other providers eg ai21?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Retryable erorrs vs non retryable errors
Retryable => all to factor 2 retries 3
Non Retryable => all to factor 1 retry 1
factor: 2, | ||
retries: 8, | ||
retries: 3, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok I understand, the logic is go from 8 to 3 for errors we find "retryable" (varying defintion across providers) , and from 0 to 1 for errors we find "not retriable"
A bit uncomfortable with actually retrying errors flagged as "non_retryable"
Would probably comment or rename, or explain somewhere that there are multiple definitions of retryable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarifiaction: I'm clear that the point of the PR is to retry, this is just a readability / understandability comment
* Always retry model errors at least once * bump data source retrieval max topk to 1024 * Bump parallelism of model execution in dust apps maps
Description
Fixes https://github.com/dust-tt/tasks/issues/598
Fixes https://github.com/dust-tt/tasks/issues/597
Risk
Could induce some rate limits but I believe low propability
Deploy Plan
core