Handle cluster_block_exception during reindexing the TM index #201297
const FLUSH_MARKER = Symbol('flush');
export const ADJUST_THROUGHPUT_INTERVAL = 10 * 1000;
export const PREFERRED_MAX_POLL_INTERVAL = 60 * 1000;
export const INTERVAL_AFTER_BLOCK_EXCEPTION = 61 * 1000;
I made it 1 second longer than the max limit (PREFERRED_MAX_POLL_INTERVAL), so that I can check the previousPollInterval on error flush and set the interval back to the default.
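The idea in this comment can be sketched roughly as follows. This is a simplified illustration, not the code in the PR: the function name `nextPollInterval`, its parameters, and the two adjustment factors are assumptions made for the example. Only the two constants come from the diff above.

```typescript
// Constants taken from the diff above.
const PREFERRED_MAX_POLL_INTERVAL = 60 * 1000;
const INTERVAL_AFTER_BLOCK_EXCEPTION = 61 * 1000;

// Assumed values, chosen only for this illustration.
const INCREASE_FACTOR = 1.2;
const DECREASE_FACTOR = 0.95;

// Hypothetical helper: computes the next poll interval on each error flush.
function nextPollInterval(
  previousPollInterval: number,
  errorCount: number,
  isBlockException: boolean,
  startingPollInterval: number
): number {
  if (isBlockException) {
    // Jump straight to the sentinel value, 1 second above the normal maximum.
    return INTERVAL_AFTER_BLOCK_EXCEPTION;
  }
  if (errorCount > 0) {
    // Ordinary errors back off gradually and are capped at the normal maximum.
    return Math.min(
      Math.ceil(previousPollInterval * INCREASE_FACTOR),
      Math.max(PREFERRED_MAX_POLL_INTERVAL, startingPollInterval)
    );
  }
  if (previousPollInterval === INTERVAL_AFTER_BLOCK_EXCEPTION) {
    // The sentinel can only have been set by a block exception,
    // so reset directly to the default instead of decaying slowly.
    return startingPollInterval;
  }
  // No errors: drift back down towards the default.
  return Math.max(startingPollInterval, Math.floor(previousPollInterval * DECREASE_FACTOR));
}
```

Because ordinary backoff is capped at 60 seconds in this sketch (for a default interval below that), a previous interval of exactly 61 seconds identifies a block exception without any extra state.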
return event.tag === 'emit';
}
function incementErrorCount(count: number) {
function incrementOrEmitErrorCount(count: number, isBlockException: boolean) {
I want to emit the error event as soon as possible in case of a ClusterBlockException.
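A minimal sketch of what "emit as soon as possible" could look like is shown here. The function name and the `'emit'` tag appear in the diff, but the body, the `'inc'` tag, and the `ErrorScanResult` type are assumptions for illustration and may differ from the actual implementation.

```typescript
// Hypothetical result type: either keep counting, or emit the count downstream.
type ErrorScanResult =
  | { tag: 'inc'; count: number }
  | { tag: 'emit'; count: number; isBlockException: boolean };

// Assumed body: ordinary errors are only counted until the next flush,
// while a block exception is turned into an emit event immediately.
function incrementOrEmitErrorCount(count: number, isBlockException: boolean): ErrorScanResult {
  if (isBlockException) {
    return { tag: 'emit', count: count + 1, isBlockException: true };
  }
  return { tag: 'inc', count: count + 1 };
}

// Type guard matching the `event.tag === 'emit'` check in the diff.
function isEmitEvent(
  event: ErrorScanResult
): event is Extract<ErrorScanResult, { tag: 'emit' }> {
  return event.tag === 'emit';
}
```

With this shape, downstream consumers that filter on `isEmitEvent` see a block exception right away instead of waiting for the next periodic flush.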
Haven't reviewed the code yet, but I did take it for a spin. Notes:
Other than that, seems to work as described. Looks like it's logging the Discovery service message ~1/minute, and then you can see errors updating task claims, etc, as expected. When the block is removed, everything comes back to normal.
Yes, I also think that it is ok, because there should not be a write-block during plugin start. Upgrade assistant can be used in an already running Kibana.
I don't think that there will be users opting in for |
Pinging @elastic/response-ops (Team:ResponseOps)
x-pack/plugins/task_manager/server/lib/create_managed_configuration.ts
2cde300 to c2da309
);
} else {
this.logger.error(
`Kibana Discovery Service couldn't update this node's last_seen timestamp. id: ${this.currentNode}, last_seen: ${lastSeen}, error:${e.message}`
);
}
if (isClusterBlockException(e)) {
I think this check needs to move up; otherwise the log always says the retryInterval is 10000 ms, even if it's actually 60000 ms.
Yeah, good point, I didn't check that. Fixed it, thanks.
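The ordering issue raised in this thread can be illustrated with a small sketch: decide the retry interval first, then build the log message from it. The helper `planRetry`, the constant names, the message wording, and the body of `isClusterBlockException` are all assumptions for the example. Only the 10000 ms and 60000 ms values are taken from the review comment.

```typescript
// Values mentioned in the review comment; the constant names are hypothetical.
const DEFAULT_RETRY_INTERVAL_MS = 10 * 1000;
const BLOCK_EXCEPTION_RETRY_INTERVAL_MS = 60 * 1000;

// Assumed detection logic: the real helper may inspect the error differently.
function isClusterBlockException(e: unknown): boolean {
  const message = e instanceof Error ? e.message : String(e);
  return message.includes('cluster_block_exception');
}

// Hypothetical helper: picks the interval BEFORE formatting the log line,
// so the logged value always matches the interval actually used.
function planRetry(e: unknown): { retryInterval: number; logMessage: string } {
  const retryInterval = isClusterBlockException(e)
    ? BLOCK_EXCEPTION_RETRY_INTERVAL_MS
    : DEFAULT_RETRY_INTERVAL_MS;
  const reason = e instanceof Error ? e.message : String(e);
  return {
    retryInterval,
    logMessage: `Update failed, retrying in ${retryInterval} ms, error: ${reason}`,
  };
}
```

If the check ran after the log call instead, the message would be built from the stale default and report 10000 ms even when the next attempt is scheduled 60000 ms later.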
I'm seeing the poll interval flip flop between 500 and 61,000 (I changed the debug log to an info)
and then 1 minute later
Should we also look at the task manager capacity calculation log? It's calculating |
Actually, it is not flapping. It sets the interval to 61000 and schedules the tasks with it.
I set the capacity to previousCapacity in case of cluster_block_exception, but I'm not sure if this is correct. WDYT?
I see, so the poll interval does get reset back to 500 but since it is already set to |
I think I managed to hide that message. I have just pushed the change.
LGTM. Verified the poll interval increases when cluster block exception seen and reverts when it is no longer seen.
💚 Build Succeeded
Starting backport for target branches: 8.x
…c#201297)
Resolves: elastic/response-ops-team#249
This PR increases the task claiming interval in case of `cluster_block_exception` to avoid generating too many errors during TM index reindexing.
## To verify:
- Run your local Kibana.
- Create a user with the `kibana_system` and `kibana_admin` roles.
- Log out and log in with your new user.
- Use the request below to put a write block on the TM index: `PUT /.kibana_task_manager_9.0.0_001/_block/write`
- Observe the error messages and the interval at which they occur in your terminal.
- Use the request below in the Kibana console to remove the write block.
```
PUT /.kibana_task_manager_9.0.0_001/_settings
{
  "index": {
    "blocks.write": false
  }
}
```
(cherry picked from commit 7aa80ce)
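For scripting the verification steps, the two console requests can be described as plain data, as in the sketch below. The `ConsoleRequest` type and the helper names are hypothetical. Only the index name, paths, and settings body come from the commit message, and sending the requests is left to whatever HTTP client is in use.

```typescript
// Hypothetical description of a Kibana console / Elasticsearch REST request.
interface ConsoleRequest {
  method: 'PUT';
  path: string;
  body?: Record<string, unknown>;
}

// Index name used in the verification steps.
const TM_INDEX = '.kibana_task_manager_9.0.0_001';

// Builds the request that puts a write block on the given index.
function addWriteBlock(index: string): ConsoleRequest {
  return { method: 'PUT', path: `/${index}/_block/write` };
}

// Builds the request that removes the write block again.
function removeWriteBlock(index: string): ConsoleRequest {
  return {
    method: 'PUT',
    path: `/${index}/_settings`,
    body: { index: { 'blocks.write': false } },
  };
}

const blockRequest = addWriteBlock(TM_INDEX);
const unblockRequest = removeWriteBlock(TM_INDEX);
```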
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions? Please refer to the Backport tool documentation.
…201297) (#203609)
# Backport
This will backport the following commits from `main` to `8.x`:
- [Handle cluster_block_exception during reindexing the TM index (#201297)](#201297)
<!--- Backport version: 9.4.3 -->
### Questions?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)
Source commit: 7aa80ce53027df7ac0e5fc01d206ef38ac3f9575, committed 2024-12-10T15:17:27Z and merged to `main` in https://github.com/elastic/kibana/pull/201297 (labels: `release_note:skip`, `Team:ResponseOps`, `v9.0.0`, `backport:prev-minor`).
Co-authored-by: Ersin Erdal
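Resolves: https://github.com/elastic/response-ops-team/issues/249
This PR increases the task claiming interval in case of `cluster_block_exception` to avoid generating too many errors during TM index reindexing.
## To verify:
- Run your local Kibana.
- Create a user with the `kibana_system` and `kibana_admin` roles.
- Log out and log in with your new user.
- Use the request below to put a write block on the TM index: `PUT /.kibana_task_manager_9.0.0_001/_block/write`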