-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Log when container-paths.yml
is loaded
#5462
base: main
Are you sure you want to change the base?
Conversation
Pinging @elastic/elastic-agent (Team:Elastic-Agent) |
This pull request does not have a backport label. Could you fix it @belimawr? 🙏
NOTE: |
@@ -903,6 +905,8 @@ func tryContainerLoadPaths() error { | |||
if err != nil { | |||
return fmt.Errorf("failed to unpack %s: %w", pathFile, err) | |||
} | |||
log.Printf("state Path: '%s', config path: '%s', logs path: '%s', socket path: '%s'", paths.StatePath, paths.ConfigPath, paths.LogsPath, paths.SocketPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't agent be logging this regardless of if it is in a container? Also isn't this in diagnostics, why do we need to log it (from the perspective that every log statement we add adds to the size and cost of the remote monitoring indices)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR was triggered by my work on fixing a flaky test (#5159), I don't recall if the diagnostics were being successfully collected, but I believe they were not because the failing tests do not install the Elastic-Agent, they enrol and run the agent, hence the need to have it in the logs.
This is actually logged in any run of the agent where tryContainerLoadPaths
sets the paths. On a correct execution it should be only inside a container.
Thinking more broadly, I agree it makes sense to log it every time the agent runs. Should I update the PR for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I diagnostics we get the following in variables.yaml which omits the state path which is container specific, so let's just leave the scope of this one to here.
path:
config: C:\Program Files\Elastic\Agent
data: C:\Program Files\Elastic\Agent\data
home: C:\Program Files\Elastic\Agent\data\elastic-agent-8.14.1-1348b9
logs: C:\Program Files\Elastic\Agent
I'm fine with this log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it seems like instead of both log.Printf("container path file '%s' found", pathFile)
and log.Printf("state Path: '%s', config path: '%s', logs path: '%s', socket path: '%s'", paths.StatePath, paths.ConfigPath, paths.LogsPath, paths.SocketPath)
we should have a single log line just unconditionally logging the state path we are using.
Reading #5361 it also seems like the state path is the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should log the state path right after this block above:
statePath := envWithDefault("", "STATE_PATH")
if statePath == "" {
statePath = defaultStateDirectory
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The STATE_PATH
mentioned on #5361 was what made that test leave state behind, which made a new execution/installation of the Elastic-Agent not to behave as it should have, even the --force
flag does not overwrite the container-paths.yml
. That's is, in my opinion, one of the issues, hence this PR to be explicit about which paths the Elastic-Agent is using.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the code and thinking more about it, the data in container-paths.yml
can overwrite the state path set by STATE_PATH
, so I believe we still need both logs, one indicating which file we read and another indicating the final paths we're using after reading it. I'd keep both log lines.
|
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
|
231b841
to
1a30c6b
Compare
I cannot reproduce those CI failures, I'll just re-run/update the branch |
buildkite test this |
The tests are failing due to issues creating a 9.0.0 stack. |
1a30c6b
to
c97567f
Compare
6e963b0
to
77afa82
Compare
e8493ac
to
974e961
Compare
1a8b0b5
to
28bd404
Compare
Quality Gate passedIssues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't love logging this unconditionally, instead of putting this diagnostics, or having it respect a log level, but it's not a hill I'm going to die on if we think it's useful. I also think the true root cause that caused us to need this is our attempts to run the |
I am talking about
|
That is true. The reason for this PR was really to help debugging tests or other weird situations we might fall into. The tests didn't have diagnostics because the agent wasn't installed, logs are the primary source of information on those cases. I'm not opposed to closing this PR, I haven't had time to come back to it in a while :/. Honestly, I love that Beats logs their paths right at start up and I miss it on Elastic-Agent. |
I agree, now that we have the infrastructure in place, that's definitely a better option. |
What does this PR do?
See title
Why is it important?
It adds logs when the Elastic-Agent loads
container-paths.yml
Checklist
[ ] I have commented my code, particularly in hard-to-understand areas[ ] I have made corresponding changes to the documentation[ ] I have made corresponding change to the default configuration files[ ] I have added tests that prove my fix is effective or that my feature works[ ] I have added an entry in./changelog/fragments
using the changelog tool[ ] I have added an integration test or an E2E testDisruptive User Impact
None
How to test this PR locally
Export the following environmet variables:
FLEET_ENROLL
FLEET_URL
FLEET_ENROLLMENT_TOKEN
Run the container command as root
Stop the Elastic-Agent and run the status command:
You should see the logs:
## Related issuesQuestions to ask yourself