[Fleet] Allow to reset agents log level to honor agent policy log level #180778
Comments
Pinging @elastic/fleet (Team:Fleet) |
Hello 👋 @criamico
|
@lucabelluccini this particular issue will not be changing the default behavior. To address your last point (there's no direct way):
hope this makes sense. |
Level of effort is unclear without some investigation on how passing a null log level through the agent action is handled by Fleet Server today. We could probably move this into the current sprint 28 to do that investigation, but if it turns out we need to make changes to Fleet Server to support this kind of reset I doubt we'll be able to finish the implementation in this sprint. |
Besides what @kpollich said, we also need to define the UI for this change. I think that we have two possible ways:
|
We could also do the reset on a per-agent basis, so there'd be a new option in the individual agent log level dropdown to "reset". Since the only way an agent's log level can diverge from its parent policy's is by selecting an option in that dropdown, I think it'd be okay to expose the reset function as a single-agent operation, basically the reverse of the initial change. The common workflow here as I understand it:
We'll also want to make sure this option is obvious when the agent-level setting is divergent from its policy, so maybe this should live outside of the dropdown and be a button instead. It'd be easy to set the agent back to |
I tried using
Then, the action worked, but fleet-server gave an error (agent went to unhealthy). So, fleet-server has to be updated to accept resetting the log level. The error comes from here. The change looks simple, but I'm not sure if there is any complexity to reset the log level to the one that comes from the agent policy. @nchaulet or @michel-laterman could probably estimate better.
One more thing that has to be fixed on the UI: |
Thanks, @juliaElastic for taking a look. I think we have what we need to get started on an implementation here next sprint. |
The change in Fleet Server does not look too complicated, but I guess we will have to make a similar change in elastic-agent so that the null setting is handled there too: https://github.com/elastic/elastic-agent/blob/fd7984b1d70dc968ba67fb8f4221905e508d6a06/internal/pkg/agent/application/actions/handlers/handler_action_settings.go#L41 |
When the "reset" is set on the agent, I guess the log level will revert only when the policy payload is received from Fleet. Or is the agent storing the log level value that was received in the last policy payload? |
There's a dedicated action for changing the log level of an agent, so "resetting" it will happen in real time without a policy update. |
Does this resetting cause a policy update to occur? |
No it does not. Changing the log level from the existing dropdown sets the log level value on the agent document in So, the action associated with the dropdown is just coupled to this field on |
@kpollich, @nimarezainia I am currently implementing the elastic-agent side of this change. As part of this PR, I am allowing a reset of the log level through the settings action: https://github.com/elastic/elastic-agent/pull/3090/files#diff-2bd1a4e0a667e4ae3065d0fba47659d4157dd6dc41528aed1eee3aa41aeaa163R19 The idea is that the settings action will now allow an empty log level value and interpret it as a reset of the log level for that specific agent. When the agent receives the action to clear its specific log level, it will fall back to the policy log level (the current implementation keeps track of it), or to the hardcoded default log level if neither an agent-level nor a policy-level log level is specified. I am currently struggling with the fact that the agent-specific log level is populated at startup, even before any settings action is received, but hopefully we can remove that value without too much damage. |
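The fallback order described in the comment above can be sketched roughly as follows. This is only an illustration in Python, not the elastic-agent implementation (which is written in Go); the function name `effective_log_level` and the constant `DEFAULT_LOG_LEVEL` are hypothetical, and the `"info"` default is taken from the discussion in this thread.

```python
from typing import Optional

# Assumed hardcoded default; the thread mentions "info" as the current default.
DEFAULT_LOG_LEVEL = "info"


def effective_log_level(agent_level: Optional[str], policy_level: Optional[str]) -> str:
    """Resolve the level an agent should log at (hypothetical helper).

    An empty or missing agent-level value is treated as "reset", so the
    policy level applies; if that is also unset, the default applies.
    """
    if agent_level:   # per-agent override set through the settings action
        return agent_level
    if policy_level:  # level carried by the agent policy
        return policy_level
    return DEFAULT_LOG_LEVEL
```

With this shape, clearing the agent-level value is enough to make the policy value visible again; no policy update is needed for the reset itself.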
So this should be at whatever the system default is (INFO) at startup. Aren't there logs sent before the policy is downloaded that may be important to us during that startup phase? |
There are indeed some logs that the agent writes before downloading a policy... I may need to check again but I think I saw elastic-agent using a Do we have a preference on which log level we want to set at startup of a managed agent? |
I reckon it should be a default, whatever that is set to (in the future we may change the default from info to something else). Couldn't this be written into the elastic-agent.yml file, which is read at startup and then overwritten by the policy when it's received (which happens for all the configuration in that file)? Again, I don't know how important the logs sent during this window are and, more importantly, whether they are needed during troubleshooting. |
Edit: I tested the changes and the metadata field is still set at enrollment. I think we'll always want to keep this metadata, so unsetting it is not an option. But I could not find a way to determine the source of the log level. @pchila is there a way to expose the source of the log level? Something like this would be really nice:
Btw, I tried enrolling a new agent into a policy that already had log level set to |
On the good news side, I was able to send
This required no Fleet Server changes. The fleet-server LOC that Julia mentioned in #180778 (comment) contains a similar error but actually the original error originates from elastic-agent. @juliaElastic @nchaulet (and @michel-laterman?) Even though it works without Fleet Server changes for a normal agent, maybe the func here: needs to be updated to match this one? (my poor understanding of Go leads me to think the first file is for handling fleet server log level?) |
…#183203)

## Summary

Spinning out a few unrelated tweaks from the work for #180778. This PR:

- In addition to strings, allows React components to be used for configured settings `description`
- Fixes small i18n issues
- Fixes some setting names to have sentence casing and adjust some description copies
- Moves deprecated unenrollment timeout setting to bottom of list

Screenshot with all changes:

![image](https://github.com/elastic/kibana/assets/1965714/0c7b692b-e47f-4013-a95c-6ec49f5976e4)

---------

Co-authored-by: Julia Bardi <[email protected]>
@jen-huang How did you test with the agent changes? The latest agent 8.15-SNAPSHOT is 1 week old and doesn't include the log level changes yet. EDIT: I managed to test by building elastic-agent from source, and I am seeing the same behaviour: the reset action seems to work, though we might want to improve the log message to say that the log level is reset to default. |
@jen-huang in the current agent implementation we don't keep track of where a specific configuration value comes from but we merge everything in a single config and use that to run agent. This kinda rule out the The local_metadata that you are looking at comes from the log level set for the specific agent as the fleet policy log level is not persisted as agent metadata but is kept in memory at runtime (if we overwrite the log level settings for the specific agent we will no longer know if that is a policy value or a settings value).
Since the policy log level value is not written on disk you are still seeing the default value of the agent log level (in case it's not set we output "info" anyways), to check the actual log level you can use Edit: as for the bug, the local metadata is reflecting what is written on disk, not sure if that should change if the log level from policy is taken into account... Maybe @michalpristas @cmacknz can weigh in here... |
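To illustrate the point about merged configuration, here is a small Python sketch. It is not how elastic-agent merges its configuration; `merge_config` and `merge_with_source` are hypothetical helpers that only show why a flat merge cannot answer "where did this log level come from?", and what tracking the source might look like.

```python
from typing import Any, Dict, List, Tuple


def merge_config(*layers: Dict[str, Any]) -> Dict[str, Any]:
    """Flat merge: later layers overwrite earlier ones, provenance is lost."""
    merged: Dict[str, Any] = {}
    for layer in layers:
        merged.update(layer)
    return merged


def merge_with_source(layers: List[Tuple[str, Dict[str, Any]]]) -> Dict[str, Dict[str, Any]]:
    """Same merge order, but each key remembers the layer that set it last."""
    merged: Dict[str, Dict[str, Any]] = {}
    for name, layer in layers:
        for key, value in layer.items():
            merged[key] = {"value": value, "source": name}
    return merged


# Hypothetical layers, lowest priority first.
flat = merge_config({"log_level": "info"}, {"log_level": "debug"})
tracked = merge_with_source([
    ("default", {"log_level": "info"}),
    ("policy", {"log_level": "debug"}),
])
```

In the flat result only the winning value survives, which matches the limitation described above; the second variant keeps a `source` alongside each value, at the cost of a different config shape.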
With Consider the scenario where policy is set to The |
+1, the I really like the idea of having a |
I created two followup issues:
@ycombinator @pchila The first bug should be addressed for 8.15; I consider this a blocker for the UI work. However, both should be addressed for the ideal and intended UX. Edit: one more small bug: |
I confirmed that reset works for a fleet-server agent as well (with a build from source) and it remains in healthy:
Agreed, opened another small bug for this: elastic/elastic-agent#4749 |
‼️ Should be reverted if elastic/elastic-agent#4747 does not make 8.15.0.

## Summary

Resolves #180778

This PR allows agent log level to be reset back to the level set on its policy (or if not set, simply the default agent level, see elastic/elastic-agent#3090). To achieve this, this PR:

- Allows `null` to be passed for the log level settings action, i.e.:
  ```
  POST kbn:/api/fleet/agents/<AGENT_ID>/actions
  {"action":{"type":"SETTINGS","data":{"log_level": null}}}
  ```
- Enables the agent policy log level setting implemented in #180607
- Always show `Apply changes` on the agent details > Logs tab
- For agents >= 8.15.0, always show `Reset to policy` on the agent details > Logs tab
- Ensures both buttons are disabled if user does not have access to write to agents

<img width="1254" alt="image" src="https://github.com/elastic/kibana/assets/1965714/bcdf763e-2053-4071-9aa8-8bcb57b8fee1">
<img width="1267" alt="image" src="https://github.com/elastic/kibana/assets/1965714/182ac54d-d5ad-435f-9376-70bb24f288f9">

### Caveats

1. The reported agent log level is not accurate if agent is using the level from its policy and does not have a log level set on its own level (elastic/elastic-agent#4747), so the initial selection on the agent log level could be wrong
2. We have no way to tell where the log level came from (elastic/elastic-agent#4748), so that's why `Apply changes` and `Reset to policy` are always shown

### Testing

Use the latest `8.15.0-SNAPSHOT` for agents or fleet server to test this change

### Checklist

Delete any items that are not applicable to this PR.

- [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
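For anyone scripting the reset outside of Dev Tools, the request from the PR description can be assembled as in the sketch below. It only builds the path and JSON body shown above and sends nothing; `build_log_level_action` is a hypothetical helper name, and authentication, the Kibana base URL, and any headers Kibana requires are left out on purpose.

```python
import json
from typing import Optional, Tuple


def build_log_level_action(agent_id: str, log_level: Optional[str]) -> Tuple[str, str]:
    """Return (path, JSON body) for a SETTINGS action (hypothetical helper).

    Passing None serializes to JSON null, which the PR above describes
    as the way to reset the agent to its policy log level.
    """
    path = f"/api/fleet/agents/{agent_id}/actions"
    body = {"action": {"type": "SETTINGS", "data": {"log_level": log_level}}}
    return path, json.dumps(body)


# Example with a placeholder agent ID.
reset_path, reset_body = build_log_level_action("<AGENT_ID>", None)
```

Keeping `log_level` as an explicit `None` rather than omitting the key mirrors the request in the PR description, where `null` is sent as the value.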
Follow up of #158861
With #158861 we added a select in Agent policy settings that allows the user to set the log level per policy. However, if the user has also changed the log level at the agent level, this overrides the policy setting and the agent policy no longer has control of the log level for that agent (see this comment).
true
. It is currently set to false.