Because of how TiDB implements max_execution_time, some queries can still complete even after exceeding the specified limit. However, after the recent change introduced in #56923, we observed an increased query error rate during TiKV store leader transfers when using a strict, sub-second max_execution_time (for example, 500ms).
1. Minimal reproduce step (Required)
Simulate a scenario in which a TiKV node experiences EBS latency issues, triggering leader transfers away from that node.
When TiDB attempts to read from regions undergoing a leader transfer, it encounters notLeader errors that carry no new-leader information.
Before #56923: max_execution_time was not propagated to the backoff context.
The TiKV client could retry multiple times (ref) until it located the new leader (or tried a follower) and completed the query, even if doing so exceeded max_execution_time.
After #56923: max_execution_time is propagated to the backoff context.
The request is canceled as soon as the backoff (or another function, such as s.client.SendRequest) detects that the context timeout has been reached (ref), which results in more query failures during leader transfers.
2. What did you expect to see? (Required)
See below
3. What did you see instead (Required)
While it's hard to say this is a real "bug," the stricter enforcement of max_execution_time has led to a noticeable increase in query errors during TiKV leader transfers. This behavioral change is significant and worth attention, as it affects query reliability under certain failure scenarios.
4. What is your TiDB version? (Required)
v6.5.4, but this issue very likely applies to later versions as well.