Commit 431c32b

[Fleet] cancel tasks when 3rd retry failed (elastic#147190)
## Summary

Related to elastic#144161

Found that when a bulk update tags task failed, the task didn't stop after 3 retries (which should be over in less than a minute); instead, the retries kept happening for 2 hours. This change removes the retry task (and its check task) once the 3rd retry is reached.

Also testing in a cloud deployment to see if the tags error can still be reproduced with this fix. I could reproduce the reported error locally, and it goes away with this fix.

To verify:
- Add at least 50k agents with the `create_agents` script in the kibana repo.
- Open Kibana, select the 50k agents, and open Actions / Add tags.
- Within a few seconds, add 2 new tags, then remove one of them.
- Wait about 30s; the agents should reflect the changes.
- Check the logs to see that the tasks are removed after the 3rd retry is reached or after the action succeeds.
- Check that there are no more running tasks. Any running task can be found in the Kibana Console by running this query: `GET .kibana_task_manager/_search?q=task.taskType:"fleet:update_agent_tags:retry"`

Locally simulated an error to test that the retry (and check) task is removed:

```
[2022-12-07T15:52:16.415+01:00][ERROR][plugins.fleet] Retry #3 of task fleet:update_agent_tags:retry:848984ab-c11d-4ebe-8d1f-606143dd656b failed: failing task
[2022-12-07T15:52:16.416+01:00][WARN ][plugins.fleet] Stopping after 3rd retry. Error: failing task
[2022-12-07T15:52:16.416+01:00][INFO ][plugins.fleet] Removing task fleet:update_agent_tags:retry:check:848984ab-c11d-4ebe-8d1f-606143dd656b
[2022-12-07T15:52:16.416+01:00][INFO ][plugins.fleet] Removing task fleet:update_agent_tags:retry:848984ab-c11d-4ebe-8d1f-606143dd656b
```
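The retry cap described above can be modeled in isolation. This is a simplified, hypothetical model, not Fleet's code: `runWithRetryCap`, `RetryResult`, and the assumption that the initial run counts as retry 0 are all illustrative. It shows the intended end state: once retry #3 fails, the runner stops and reports the check and retry task IDs that must be removed, instead of leaving a scheduled retry task behind.

```typescript
// Hypothetical model of the retry cap; names are illustrative, not Fleet's API.
// Assumption: the initial run has retryCount 0, and each failure bumps it by 1.

interface RetryResult {
  attempts: number;         // total runs, including the initial one
  retries: number;          // value of retryCount when the loop ended
  succeeded: boolean;
  removedTaskIds: string[]; // tasks to clean up once the cap is hit
}

const MAX_RETRIES = 3; // mirrors `this.retryParams.retryCount === 3`

function runWithRetryCap(
  attempt: () => void, // throws on failure
  retryTaskId: string,
  checkTaskId: string,
  warn: (msg: string) => void = () => {},
): RetryResult {
  let retryCount = 0;
  let attempts = 0;
  while (true) {
    attempts++;
    try {
      attempt();
      return { attempts, retries: retryCount, succeeded: true, removedTaskIds: [] };
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      if (retryCount === MAX_RETRIES) {
        warn('Stopping after 3rd retry. Error: ' + message);
        // Without this cleanup, the scheduled retry task would keep firing.
        return {
          attempts,
          retries: retryCount,
          succeeded: false,
          removedTaskIds: [checkTaskId, retryTaskId],
        };
      }
      retryCount++; // schedule the next retry
    }
  }
}
```

With an attempt that always throws, the model makes 4 runs (the initial run plus retries #1 to #3), which lines up with the `Retry #3 ... failed` / `Stopping after 3rd retry` sequence in the log above.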
1 parent 1f0ae32 commit 431c32b

1 file changed: +7 -0 lines changed

x-pack/plugins/fleet/server/services/agents/action_runner.ts

```diff
@@ -113,6 +113,13 @@ export abstract class ActionRunner {
       if (this.retryParams.retryCount === 3) {
         const errorMessage = 'Stopping after 3rd retry. Error: ' + error.message;
         appContextService.getLogger().warn(errorMessage);
+
+        // clean up tasks after 3rd retry reached
+        await Promise.all([
+          this.bulkActionsResolver!.removeIfExists(this.checkTaskId!),
+          this.bulkActionsResolver!.removeIfExists(this.retryParams.taskId!),
+        ]);
+
         return;
       }
     } else {
```
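The added lines call `removeIfExists`, which, judging by its name, tolerates a task that is already gone. A minimal sketch, assuming an in-memory store in place of Kibana Task Manager: `InMemoryTaskStore` and `cleanUpAfterMaxRetries` are hypothetical names, and only the `Promise.all` pattern mirrors the diff. Both removals start concurrently, and running the cleanup a second time is a harmless no-op.

```typescript
// Hypothetical in-memory stand-in for the bulk actions resolver. Fleet's real
// resolver is backed by Kibana Task Manager and is not reproduced here.
class InMemoryTaskStore {
  private readonly tasks = new Set<string>();

  schedule(taskId: string): void {
    this.tasks.add(taskId);
  }

  has(taskId: string): boolean {
    return this.tasks.has(taskId);
  }

  // Assumed semantics: resolves whether or not the task exists, so cleanup
  // never fails on a task that already finished or was removed earlier.
  async removeIfExists(taskId: string): Promise<void> {
    this.tasks.delete(taskId);
  }

  get size(): number {
    return this.tasks.size;
  }
}

// Mirrors the added lines: remove the check task and the retry task concurrently.
async function cleanUpAfterMaxRetries(
  store: InMemoryTaskStore,
  checkTaskId: string,
  retryTaskId: string,
): Promise<void> {
  await Promise.all([
    store.removeIfExists(checkTaskId),
    store.removeIfExists(retryTaskId),
  ]);
}
```

`Promise.all` lets both removals run in parallel, but the combined promise rejects as soon as either removal rejects, which is one reason an idempotent "remove if exists" operation suits this cleanup step.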
