Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dust Apps: Improve resilience #4552

Merged
merged 3 commits into from
Apr 4, 2024
Merged

Dust Apps: Improve resilience #4552

merged 3 commits into from
Apr 4, 2024

Conversation

spolu
Copy link
Contributor

@spolu spolu commented Apr 3, 2024

Description

Fixes https://github.com/dust-tt/tasks/issues/598
Fixes https://github.com/dust-tt/tasks/issues/597

  • Add at least one retry to all model errors (should reduce agent errors drastically)
  • Bump max topk to 1024 for data sources (allow deeper processing)
  • Bump the parallelism of execution of model blocks to 32 (from 8)

Risk

Could induce some rate limits but I believe low propability

Deploy Plan

  • deploy core

@spolu spolu requested a review from philipperolet April 3, 2024 14:40
Copy link
Contributor

@philipperolet philipperolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, modulo one understandability issue

factor: 2,
retries: 8,
retries: 3,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why from 8 to 3 here VS 8 to 1 for other providers eg ai21?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retryable erorrs vs non retryable errors

Retryable => all to factor 2 retries 3
Non Retryable => all to factor 1 retry 1

factor: 2,
retries: 8,
retries: 3,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I understand, the logic is go from 8 to 3 for errors we find "retryable" (varying defintion across providers) , and from 0 to 1 for errors we find "not retriable"
A bit uncomfortable with actually retrying errors flagged as "non_retryable"
Would probably comment or rename, or explain somewhere that there are multiple definitions of retryable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarifiaction: I'm clear that the point of the PR is to retry, this is just a readability / understandability comment

@spolu spolu merged commit 43e726f into main Apr 4, 2024
5 checks passed
@spolu spolu deleted the spolu-dust_app_resilience branch April 4, 2024 08:26
flvndvd pushed a commit that referenced this pull request May 26, 2024
* Always retry model errors at least once

* bump data source retrieval max topk to 1024

* Bump parallelism of model execution in dust apps maps
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants