Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix double stop components #3482
Fix double stop components #3482
Changes from all commits
e5eeb1f
9d76c08
a6d5921
4d4ae1e
2606c30
21564f6
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do it here? Why not do it inside of
existing.stop(
? Seems safer, that way if the code is updated some where else to callexisting.stop(
the same logic would apply. Seems weird that it would be checked outside of the function that sets it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are referring to the log, it's there because that's where I landed when investigating the bug initially.
If you are referring to the stop() itself, the initial fix was only here in the manager without any modification to the service runtime object.
Would you prefer to have only for a service runtime object ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pchila No I am referring to this line function https://github.com/elastic/elastic-agent/blob/main/pkg/component/runtime/runtime.go#L189 that is being called from https://github.com/elastic/elastic-agent/blob/main/pkg/component/runtime/manager.go#L708 (which is what your changing here).
This runtime wraps the actual managers. So this change will apply to all managers.
I thinking the function could easily be changed to:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in 2606c30
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can
shuttingDown
be set to false between thisLoad
and the followingStore
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from a component point of view it should be guarded, but from agent architecture standpoint it should not be concurrent and should be safe to have it as it is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That this is never set to false doesn't seem right. What would happen if we stopped the service and then tried to start it again? This would be uninstalling the endpoint integration and then reinstalling it quickly, or perhaps reassigning the agent to a different policy.
The
ignoreCheckins
on this path is set to false again in thecase actionStart:
block.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When discussing the fix with @blakerouse , I asked about recycling a service runtime object and we came to the conclusion that a stopped component is ultimately discarded (removed from the maps of running components in the runtime manager) after it eventually stops.
If this assumption does not hold, then we need to keep track of the installation status and if the object gets recycled we need to wait for the ongoing uninstall to finish, then reinstall it (it is done as part of the start()) and set up the component back properly.
@blakerouse any thoughts ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is code in this file that would only be executed if the object is not discarded, so at minimum the behavior needs to be confirmed and then documented. It may be that your and Blake's analysis is correct, but it isn't obvious to myself and others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is correct. Once a manager is stopped it is never restarted again. This follows the same pattern - https://github.com/elastic/elastic-agent/blob/main/pkg/component/runtime/runtime.go#L190; once set it is not unset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is the case we should add a comment explaining this.