Log when container-paths.yml is loaded #5462

Open
wants to merge 1 commit into
base: main

Contributor

@belimawr belimawr commented Sep 6, 2024

What does this PR do?

See title

Why is it important?

It adds logs when the Elastic-Agent loads container-paths.yml

Checklist

  • [x] My code follows the style guidelines of this project
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in ./changelog/fragments using the changelog tool
  • [ ] I have added an integration test or an E2E test

Disruptive User Impact

None

How to test this PR locally

Export the following environment variables:

  • FLEET_ENROLL
  • FLEET_URL
  • FLEET_ENROLLMENT_TOKEN

Run the container command as root:

./elastic-agent container

Stop the Elastic-Agent and run the status command:

./elastic-agent status

You should see the logs:

2024/09/06 16:54:33 container path file '/usr/share/elastic-agent/state/container-paths.yml' found
2024/09/06 16:54:33 state Path: '/usr/share/elastic-agent/state', config path: '/usr/share/elastic-agent/state', logs path: '', socket path: 'unix:///usr/share/elastic-agent/state/data/Td8I7R-Zby36_zF_IOd9QVNlFblNEro3.sock'

Related issues

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

@belimawr belimawr added Team:Elastic-Agent Label for the Agent team skip-changelog labels Sep 6, 2024
@belimawr belimawr requested a review from a team as a code owner September 6, 2024 16:58
@elasticmachine
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

Contributor

mergify bot commented Sep 6, 2024

This pull request does not have a backport label. Could you fix it @belimawr? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@@ -903,6 +905,8 @@ func tryContainerLoadPaths() error {
	if err != nil {
		return fmt.Errorf("failed to unpack %s: %w", pathFile, err)
	}
	log.Printf("state Path: '%s', config path: '%s', logs path: '%s', socket path: '%s'", paths.StatePath, paths.ConfigPath, paths.LogsPath, paths.SocketPath)
Member

Shouldn't the agent log this regardless of whether it is in a container? Also, isn't this already in diagnostics? Why do we need to log it, given that every log statement we add increases the size and cost of the remote monitoring indices?

Contributor Author

This PR was triggered by my work on fixing a flaky test (#5159). I don't recall whether the diagnostics were being collected successfully, but I believe they were not, because the failing tests do not install the Elastic-Agent; they enroll and run the agent. Hence the need to have this in the logs.

This is actually logged in any run of the agent where tryContainerLoadPaths sets the paths. In a correct execution, that should only happen inside a container.

Thinking more broadly, I agree it makes sense to log it every time the agent runs. Should I update the PR for it?

Member

In diagnostics we get the following in variables.yaml, which omits the container-specific state path, so let's just leave the scope of this one as is.

      path:
        config: C:\Program Files\Elastic\Agent
        data: C:\Program Files\Elastic\Agent\data
        home: C:\Program Files\Elastic\Agent\data\elastic-agent-8.14.1-1348b9
        logs: C:\Program Files\Elastic\Agent

I'm fine with this log.

Member

Actually, it seems like instead of both log.Printf("container path file '%s' found", pathFile) and log.Printf("state Path: '%s', config path: '%s', logs path: '%s', socket path: '%s'", paths.StatePath, paths.ConfigPath, paths.LogsPath, paths.SocketPath) we should have a single log line just unconditionally logging the state path we are using.

Reading #5361 it also seems like the state path is the problem.

Member

We should log the state path right after this block above:

	statePath := envWithDefault("", "STATE_PATH")
	if statePath == "" {
		statePath = defaultStateDirectory
	}
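
A minimal, self-contained sketch of that suggestion follows. It is not the agent's actual code: `envWithDefault` here is a stand-in written for illustration, the value of `defaultStateDirectory` is assumed from the sample log output in this PR, and `statePathOrDefault` and the log message wording are hypothetical.

```go
package main

import (
	"log"
	"os"
)

// defaultStateDirectory is an assumed value, taken from the sample log
// output in this PR; the real constant may differ.
const defaultStateDirectory = "/usr/share/elastic-agent/state"

// envWithDefault is an illustrative stand-in for the helper of the same
// name: it returns the value of the first key that is set, otherwise def.
func envWithDefault(def string, keys ...string) string {
	for _, key := range keys {
		if val, ok := os.LookupEnv(key); ok {
			return val
		}
	}
	return def
}

// statePathOrDefault (hypothetical name) falls back to the default
// directory when no state path was provided.
func statePathOrDefault(fromEnv string) string {
	if fromEnv == "" {
		return defaultStateDirectory
	}
	return fromEnv
}

func main() {
	statePath := statePathOrDefault(envWithDefault("", "STATE_PATH"))
	// Logged unconditionally, whether or not a container path file exists.
	log.Printf("using state path: '%s'", statePath)
}
```

Running it with and without `STATE_PATH` exported shows which value ends up in use before any path file is consulted.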

Contributor Author

The STATE_PATH mentioned in #5361 is what made that test leave state behind, which in turn made a new execution/installation of the Elastic-Agent misbehave; even the --force flag does not overwrite container-paths.yml. That is, in my opinion, one of the issues, hence this PR to be explicit about which paths the Elastic-Agent is using.

Contributor Author

Reading the code and thinking more about it, the data in container-paths.yml can overwrite the state path set by STATE_PATH, so I believe we still need both logs, one indicating which file we read and another indicating the final paths we're using after reading it. I'd keep both log lines.
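
To make that precedence concrete, here is a rough sketch under stated assumptions. The `containerPaths` struct and `effectiveStatePath` function are hypothetical simplifications, not the agent's real types, and the rule that a non-empty state path from the file wins over STATE_PATH is an assumption based on the comment above. The two log lines use the message formats from this PR.

```go
package main

import "log"

// containerPaths is a simplified, hypothetical mirror of the values stored
// in container-paths.yml; field names follow the log statement in this PR.
type containerPaths struct {
	StatePath  string
	ConfigPath string
	LogsPath   string
	SocketPath string
}

// effectiveStatePath returns the state path that ends up in use.
// saved is nil when no container-paths.yml was found.
// Assumption: a non-empty state path from the file wins over STATE_PATH.
func effectiveStatePath(fromEnv, pathFile string, saved *containerPaths) string {
	if saved == nil {
		return fromEnv
	}
	// First log: which file was read.
	log.Printf("container path file '%s' found", pathFile)
	// Second log: the values read from that file.
	log.Printf("state Path: '%s', config path: '%s', logs path: '%s', socket path: '%s'",
		saved.StatePath, saved.ConfigPath, saved.LogsPath, saved.SocketPath)
	if saved.StatePath != "" {
		return saved.StatePath
	}
	return fromEnv
}

func main() {
	// Example values are made up for illustration.
	saved := &containerPaths{StatePath: "/old/state", ConfigPath: "/old/state"}
	got := effectiveStatePath("/new/state", "/old/state/container-paths.yml", saved)
	log.Printf("effective state path: '%s'", got)
}
```

With only a single line for the value taken from STATE_PATH, the override by a leftover file would be invisible in the logs; the pair of lines shows both the source and the result.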

Contributor

mergify bot commented Sep 10, 2024

backport-v8.x has been added to help with the transition to the new branch 8.x.

@ycombinator ycombinator added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Sep 10, 2024
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

Contributor

mergify bot commented Sep 11, 2024

backport-v8.x has been added to help with the transition to the new branch 8.x.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Sep 11, 2024
@v1v v1v removed the backport-v8.x label Sep 11, 2024
@belimawr
Contributor Author

I cannot reproduce those CI failures, so I'll just re-run/update the branch.

@belimawr
Contributor Author

buildkite test this

@belimawr
Contributor Author

The tests are failing due to issues creating a 9.0.0 stack.

@ycombinator ycombinator force-pushed the log-go-be-container branch 3 times, most recently from e8493ac to 974e961 Compare October 2, 2024 18:21
Contributor

@andrzej-stencel andrzej-stencel left a comment

Ideally we should wait for @cmacknz to reply in the thread before merging, although I guess we shouldn't be waiting forever either.

@cmacknz
Member

cmacknz commented Oct 17, 2024

I still don't love logging this unconditionally, instead of putting this in diagnostics or having it respect a log level, but it's not a hill I'm going to die on if we think it's useful.
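
For reference, "respecting a log level" could look roughly like the sketch below. It uses the standard library's `log/slog` purely as a stand-in; the agent's own logger is not shown in this thread, and `pathLogOutput`, the message text, and the `state_path` key are all hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
)

// pathLogOutput (hypothetical helper) returns whatever is written when the
// state path is logged at debug level by a logger whose minimum level is
// either debug or info.
func pathLogOutput(debugEnabled bool, statePath string) string {
	level := slog.LevelInfo
	if debugEnabled {
		level = slog.LevelDebug
	}
	var buf bytes.Buffer
	logger := slog.New(slog.NewTextHandler(&buf, &slog.HandlerOptions{Level: level}))
	// Emitted only when the handler's minimum level is debug or lower.
	logger.Debug("container paths loaded", "state_path", statePath)
	return buf.String()
}

func main() {
	const statePath = "/usr/share/elastic-agent/state"
	fmt.Printf("info level:  %q\n", pathLogOutput(false, statePath))
	fmt.Printf("debug level: %q\n", pathLogOutput(true, statePath))
}
```

The trade-off is the one discussed in this thread: a debug-level line costs nothing in the monitoring indices by default, but it is also absent from the logs of a test run unless debug logging was enabled beforehand.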

I also think the true root cause that caused us to need this is our attempts to run the elastic-agent container command outside of an actual container, which is something we no longer need to do now that we have a way to run integration tests on k8s directly.

@cmacknz
Member

cmacknz commented Oct 17, 2024

I am talking about

cmd, agentOutput := prepareAgentCMD(t, ctx, agentFixture, []string{"container"}, env)
specifically that should probably be rewritten to actually run in a container, for reference.

@belimawr
Contributor Author

belimawr commented Oct 18, 2024

> I also think the true root cause that caused us to need this is our attempts to run the elastic-agent container command outside of an actual container, which is something we no longer need to do now that we have a way to run integration tests on k8s directly.

That is true. The reason for this PR was really to help debug tests or other weird situations we might fall into. The tests didn't have diagnostics because the agent wasn't installed, so logs are the primary source of information in those cases.

I'm not opposed to closing this PR, I haven't had time to come back to it in a while :/.

Honestly, I love that Beats log their paths right at startup, and I miss that in Elastic-Agent.

@belimawr
Contributor Author

> I am talking about
>
> cmd, agentOutput := prepareAgentCMD(t, ctx, agentFixture, []string{"container"}, env)
>
> specifically that should probably be rewritten to actually run in a container, for reference.

I agree, now that we have the infrastructure in place, that's definitely a better option.

Labels
backport-8.x Automated backport to the 8.x branch with mergify backport-skip skip-changelog Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team