The division of responsibility between agent versions during an upgrade is incorrect #3639
Labels: Team:Elastic-Agent, Team:Elastic-Agent-Control-Plane
Problem
Today the majority of the work in an agent upgrade is performed by the current version, the one the user is upgrading from. This includes:
data/elastic-agent-$hash, where $hash is the Git commit hash truncated to exactly 6 characters.
The elastic-agent watch command is executed using the path to the next version of the agent, but the path to this version is assumed to match the current version.
These steps are currently implemented here, and once the agent artifact is downloaded and extracted, every one of them requires the current version of the agent to make assumptions about the structure of the next agent version.
Impact
This strategy has two major flaws:
Solution
The solution to this problem is to move the majority of the logic needed to perform an upgrade into the next version of the agent. The current version of the agent should be responsible for as little as possible. The minimum set of steps for the current version of the agent to perform is:
Implementing this solution will require significant changes to the upgrade process, and will also require changing the path to the executable invoked to start the upgrade. Changing the path to the agent executable is the same work already started in #2579, which has a preliminary backwards-compatible solution.
Note that all changes must be backwards compatible since we allow upgrading from any version of the agent to any later version of the agent, across both 7.17.x and 8.x.x. Making a breaking change to support these changes is currently not an option.
Next Steps
Create an RFC describing the changes we must make to the upgrade process to implement the solution above (or an alternate solution that accomplishes the same goals). Describe how the upgrade process will be tested and how we will achieve backwards compatibility. The changes made must unblock #2579 and any other future changes to the directory structure of the agent.
The proposed implementation should be broken into phases so that we can make progress incrementally and reduce the risk of the implementation. Ideally, the changes to the directory structure needed to address #2579 and implement a well-known upgrade entrypoint can be made separately from the migration of most upgrade functionality into that entrypoint.