-
Notifications
You must be signed in to change notification settings - Fork 15.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
community[patch]: Graceful handling of redis errors in RedisCache and AsyncRedisCache #17171
community[patch]: Graceful handling of redis errors in RedisCache and AsyncRedisCache #17171
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Ignored Deployment
|
generations.append(Generation(text=text)) | ||
# Gotta catch'em all | ||
except Exception as e: | ||
logger.warning(f"Redis lookup failed: {e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this an error
level issue rather than a warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will change the log level to error
# In a previous life we stored the raw text directly | ||
# in the table, so assume it's in that format. | ||
generations.append(Generation(text=text)) | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two other reasonable behaviors.
- Retry once prior to giving up
- Circuit breaker pattern (out of scope)
What are your thoughts on these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Adding retry with some timeout make sense.
- About circuit breaker for long term failure scenario:
- We keep logging errors while redis is down until circuit is open.
- When circuit is open do the error logs continue for redis failure in this duration?
Or - Circuit is open we log error that the circuit is open so redis is not being used?
Ideally it would be better to continue showing error logs since the api costs are incurred without cache, the service user should be aware of the redis failure even during the open circuit state from continued failure.
I can start with retry and close this and start a new pr for circuit breaker considering all the scenarios or do it here. Let me know what do you suggest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think users would appreciate being able to configure the retry behavior since a retry involves some additional latency and isn't guaranteed to pay off.
Would you be able to suggest a parameterization for the initializer that you think would make sense to configure the retry behavior?
For a circuit breaker, I think it probably make sense to log periodically, but not every request. We should probably figure out a separate way to tackle this. I'd appreciate if we could design a circuit breaker as a higher level primitive that accepts a cache as an instance and dresses it up as a circuit breaker
cc @cbornet / @dzmitry-kankalovich tagging you in case this is of interest for you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think even as-is current change set is great, because cache layer IO should not break LLM calls in my opinion.
Taking it further, I do agree that more configuration around it would be nice, like the fact whether it should error out or not (default "not"?); and if retry to be added, then its definitely need to have configuration for that as well (which I'd probably set to no retry by default? you rightfully mentioned with cache retries you need to be sure this is what you want, as it can just defeat the purpose of a cache in the first place).
And yes, a general circuit breaker for cache interface would be great, but this is probably a substantial changeset, which IDK if @snsten would go for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree on all points!
@snsten do you want to proceed with the PR as is and just update the logger.warning -> logger.error? This will probably spam someone's logs at some point, but at least their service will stay up.
And if you want to tackle more configuration separately that will be great (or a circuit breaker).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, will change the error log only in this pr for the two classes mentioned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
I'd also add it to |
f7ea281
to
53b8c86
Compare
…r redis client failure
Added the error logs for |
LGTM 👍 |
# In a previous life we stored the raw text directly | ||
# in the table, so assume it's in that format. | ||
generations.append(Generation(text=text)) | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
… AsyncRedisCache (langchain-ai#17171) - **Description:** The existing `RedisCache` implementation lacks proper handling for redis client failures, such as `ConnectionRefusedError`, leading to subsequent failures in pipeline components like LLM calls. This pull request aims to improve error handling for redis client issues, ensuring a more robust and graceful handling of such errors. - **Issue:** Fixes langchain-ai#16866 - **Dependencies:** No new dependency - **Twitter handle:** N/A Co-authored-by: snsten <> Co-authored-by: Eugene Yurtsev <[email protected]>
Description:
The existing
RedisCache
implementation lacks proper handling for redis client failures, such asConnectionRefusedError
, leading to subsequent failures in pipeline components like LLM calls. This pull request aims to improve error handling for redis client issues, ensuring a more robust and graceful handling of such errors.Issue: Fixes RedisCache does't handle errors from redis. #16866
Dependencies: No new dependency
Twitter handle: N/A