[Windows] TestProxyURL fails with access denied error on fleet.enc
#4913
Comments
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
Pinging @elastic/elastic-agent (Team:Elastic-Agent)
Did some additional runs on a Windows VM... It seems that the issue is intermittent, which suggests some kind of race condition that makes the test fail sometimes on Windows unprivileged. I managed to reproduce it only once over 3 runs of the specific test on a Win10 VM.
That this error has only ever been seen in #4770, and that PR also changes the policy change logic, makes me suspect it could be related to something in #4770. It is also possible you've added test coverage to something that was always flaky, and now we can observe it.
Yeah, I agree that it's possible this is coming from this PR. I just looked at main to try and determine how I can reproduce this, and I see that this test is skipped: https://github.com/elastic/elastic-agent/blob/main/testing/integration/proxy_url_test.go#L35. Happy to work with you on this error and on why you think it's specifically related to unprivileged mode and permissions on Windows.
After further development done in #4770 the proxy URL tests are enabled again (they were disabled due to #4861). In #4770 I changed a log level in the agent to see what error may cause an action dispatch failure https://github.com/elastic/elastic-agent/pull/4770/files#diff-b1931bdf74ee7a85f4291c623d9a21868c438c9ef9829e8e939f2b43906b3535R159 (before, it was a simple debug log which was not printed most of the time), so that may have revealed an error that we already had before the changes... The code changes for mTLS did not add any extra reading/writing of I will keep investigating to figure out what triggers the access denied error on fleet.enc
I am going to disable the ProxyURL tests on Windows until the access denied error is resolved.
After a few tries we got a CI run that contains useful information about the issue: https://buildkite.com/elastic/elastic-agent-extended-testing/builds/810#01903606-b9b9-4977-bc0d-b2327f19b1f1
Grepping
➜ TestProxyURL-EnrollWithProxy-PolicyProxyTakesPrecedence-diagnostics-2024-06-20T14-59-38Z grep -ir fleet.enc ./* | sort
<omitted...>
{"file.name":"storage/encrypted_disk_store.go","file.line":129},"message":"Save of C:\\Program Files\\Elastic\\Agent\\fleet.enc started at 2024-06-20 14:54:17.257021 +0000 GMT m=+1.117881501\n","ecs.version":"1.6.0"}
./logs/elastic-agent-8.15.0-SNAPSHOT-fbeed0/elastic-agent-20240620-2.ndjson:{"log.level":"info","@timestamp":"2024-06-20T14:54:17.282Z","log.logger":"encrypted-disk-store-debug","log.origin":{"file.name":"storage/encrypted_disk_storage_debug_windows.go","file.line":41},"message":"C:\\Program Files\\Elastic\\Agent\\fleet.enc security descriptor: O:S-1-5-21-310561555-502049046-1354876720-1009G:S-1-5-21-310561555-502049046-1354876720-513","ecs.version":"1.6.0"}
./logs/elastic-agent-8.15.0-SNAPSHOT-fbeed0/elastic-agent-20240620-2.ndjson:{"log.level":"info","@timestamp":"2024-06-20T14:54:17.282Z","log.logger":"encrypted-disk-store-debug","log.origin":{"file.name":"storage/encrypted_disk_storage_debug_windows.go","file.line":61},"message":"C:\\Program Files\\Elastic\\Agent\\fleet.enc stat:\n{\n \"FileAttributes\": 32,\n \"CreationTime\": {\n \"LowDateTime\": 3101609211,\n \"HighDateTime\": 31114017\n },\n \"LastAccessTime\": {\n \"LowDateTime\": 3101784772,\n \"HighDateTime\": 31114017\n },\n \"LastWriteTime\": {\n \"LowDateTime\": 3101784772,\n \"HighDateTime\": 31114017\n },\n \"FileSizeHigh\": 0,\n \"FileSizeLow\": 274,\n \"ReparseTag\": 0\n}\n","ecs.version":"1.6.0"}
./logs/elastic-agent-8.15.0-SNAPSHOT-fbeed0/elastic-agent-20240620-2.ndjson:{"log.level":"info","@timestamp":"2024-06-20T14:54:17.288Z","log.logger":"encrypted-disk-store-debug","log.origin":{"file.name":"storage/encrypted_disk_storage_debug_windows.go","file.line":53},"message":"owner for \"C:\\\\Program Files\\\\Elastic\\\\Agent\\\\fleet.enc\": OGC-WINDOWS-AMD\\elastic-agent-user account type 1","ecs.version":"1.6.0"}
./logs/elastic-agent-8.15.0-SNAPSHOT-fbeed0/elastic-agent-20240620-2.ndjson:{"log.level":"info","@timestamp":"2024-06-20T14:54:17.288Z","log.logger":"encrypted-disk-store-debug","log.origin":{"file.name":"storage/encrypted_disk_store.go","file.line":131},"message":"Save of C:\\Program Files\\Elastic\\Agent\\fleet.enc finished at 2024-06-20 14:54:17.2883767 +0000 GMT m=+1.149237201\n","ecs.version":"1.6.0"}
./logs/elastic-agent-8.15.0-SNAPSHOT-fbeed0/elastic-agent-20240620-3.ndjson:{"log.level":"error","@timestamp":"2024-06-20T14:54:35.407Z","log.origin":{"file.name":"dispatcher/dispatcher.go","file.line":161},"message":"Failed to dispatch action id \"actionID-TestValidProxyInThePolicy\" of type \"POLICY_CHANGE\", error: saving config: fail to persist new Fleet Server API client hosts: could not replace target file C:\\Program Files\\Elastic\\Agent\\fleet.enc: rename C:\\Program Files\\Elastic\\Agent\\fleet.enc.tmp C:\\Program Files\\Elastic\\Agent\\fleet.enc: Access is denied.","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
<omitted...>
Along with the logs from the test report
[ ===] Waiting For Enroll... [4s] {"log.level":"warn","@timestamp":"2024-06-20T14:54:16.230Z","log.logger":"tls","log.origin":{"file.name":"tlscommon/tls_config.go","file.line":107},"message":"SSL/TLS verifications disabled.","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-06-20T14:54:16.783Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":518},"message":"Starting enrollment to URL: http://fleet.elastic.co/","ecs.version":"1.6.0"}
Load of C:\Program Files\Elastic\Agent\fleet.enc started at 2024-06-20 14:54:16.9998057 +0000 GMT m=+0.860666201
Load of C:\Program Files\Elastic\Agent\fleet.enc finished at 2024-06-20 14:54:17.0038126 +0000 GMT m=+0.864673101
{"log.level":"info","@timestamp":"2024-06-20T14:54:17.674Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":481},"message":"Restarting agent daemon, attempt 0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-06-20T14:54:17.683Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":299},"message":"Successfully triggered restart on running Elastic Agent.","ecs.version":"1.6.0"}
Successfully enrolled the Elastic Agent.
If we look at the timestamp when the enroll command is restarting agent after writing fleet.enc
This appears in the tests for 2 reasons:
On Windows, having such a short window in which 2 processes try to modify the same file seems to cause the access denied error. Maybe this can be mitigated by making the mock fleet server wait a few milliseconds before sending out the action, or by having it resend the action on the next check-in if it has not been acknowledged.
oh joy. In both cases I think we are using the |
We discussed this issue in the weekly meeting and I wanted to document it here. I believe the following change will remove the race condition and overall will provide a better experience for installation.
While validating PR #4770 the TestProxyURL test fails with
CI run -> https://buildkite.com/elastic/elastic-agent-extended-testing/builds/538#01900846-16ad-428c-8a60-5a48324d55f5
As a workaround the test now installs elastic-agent as privileged on Windows (with this the test runs correctly), but it seems that there are some issues with fleet.enc permissions in unprivileged mode that are not directly related to the developed enhancement.
This issue is there to track the problem and collect the results of further investigation.