copyright years: 2022 - 2023
lastupdated: 2023-02-05
title: Node management status
description: Automatic agent upgrade status using policy based node management
parent: Agent (anax)
nav_order: 14

{:new_window: target="blank"} {:shortdesc: .shortdesc} {:screen: .screen} {:codeblock: .codeblock} {:pre: .pre} {:child: .link .ulchildlink} {:childlinks: .ullinks}

Node management status

{: nmp-status}

Overview

This page describes the {{site.data.keyword.edge_notm}} policy-based, autonomous node management capability.

The node management worker generates a Node Management Policy (NMP) status object during each node management job to track the state of that job.

Definition

{: nmp-status-def}

The JSON representation of an NMP status contains the following fields:

  • agentUpgradePolicyStatus: a JSON structure to define the status of an automatic agent upgrade job.
    • scheduledTime: an RFC 3339 formatted timestamp for when the NMP should start execution.
    • startTime: an RFC 3339 formatted timestamp for when the NMP actually started execution.
    • endTime: an RFC 3339 formatted timestamp for when the NMP finished executing successfully. This field is not populated if the NMP fails.
    • upgradedVersions: a JSON structure to define the versions being upgraded or downgraded.
      • softwareVersion: the version of the agent software packages to be installed.
      • certVersion: the version of the certificate to be installed.
      • configVersion: the version of the configuration to be installed.
    • status: the state of the upgrade job. See the section Status values below for more information.
    • errorMessage: a short message that describes why an agent upgrade job has failed.
    • workingDirectory: the directory that the upgrade job will be reading and writing files to.
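The field list above can be mirrored in a small data model. The following is a minimal Python sketch, not part of the agent: the class names, the snake_case attribute names, and the parse_rfc3339 helper are hypothetical, and only the JSON key names come from the definition above. It assumes timestamps use the common form with whole seconds and a trailing Z or a numeric offset.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


def parse_rfc3339(value: Optional[str]) -> Optional[datetime]:
    """Parse a timestamp such as 2022-05-24T12:00:00Z; missing or empty values yield None."""
    if not value:
        return None
    # Older Python versions reject a trailing "Z", so normalize it to an explicit UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class UpgradedVersions:
    # Hypothetical attribute names; the JSON keys are softwareVersion, certVersion, configVersion.
    software_version: str = ""
    cert_version: str = ""
    config_version: str = ""


@dataclass
class AgentUpgradePolicyStatus:
    status: str = ""
    scheduled_time: Optional[datetime] = None
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None  # stays None when the job did not finish successfully
    upgraded_versions: UpgradedVersions = field(default_factory=UpgradedVersions)
    error_message: str = ""
    working_directory: str = ""

    @classmethod
    def from_json(cls, data: dict) -> "AgentUpgradePolicyStatus":
        """Build the model from one agentUpgradePolicyStatus JSON object."""
        versions = data.get("upgradedVersions") or {}
        return cls(
            status=data.get("status", ""),
            scheduled_time=parse_rfc3339(data.get("scheduledTime")),
            start_time=parse_rfc3339(data.get("startTime")),
            end_time=parse_rfc3339(data.get("endTime")),
            upgraded_versions=UpgradedVersions(
                software_version=versions.get("softwareVersion", ""),
                cert_version=versions.get("certVersion", ""),
                config_version=versions.get("configVersion", ""),
            ),
            error_message=data.get("errorMessage", ""),
            working_directory=data.get("workingDirectory", ""),
        )
```

Every field is optional in this sketch because, as described above, fields such as endTime and errorMessage may be absent depending on how the job ended.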

Status values

{: nmp-status-vals}

  • Agent Auto Upgrade status values
    • "waiting": the node management worker has matched the NMP to this node and created the status object in the local database.
    • "download started": the download worker has begun downloading all necessary packages from the Management Hub.
    • "downloaded": the download worker has finished downloading all necessary packages from the Management Hub.
    • "initiated": the installation of the downloaded packages has started and is being performed by the AgentAutoUpgrade cron job script.
    • "successful": the node management worker has successfully performed the upgrade job specified in the NMP.
    • "no action required": the node management worker has determined that no actions need to be taken to upgrade or downgrade the agent. This typically means that all the files specified within the NMP's manifest are already installed, or that they are at a lower version than what is currently installed and the NMP's allowDowngrade field is set to false.
    • "precheck failed": there was a problem during the pre-check in the AgentAutoUpgrade cron job script, so the job was cancelled before the installation.
    • "download failed": the download worker was unable to download all necessary packages from the Management Hub.
    • "failed": there was a problem during the installation of the downloaded packages either in the node management worker or in the AgentAutoUpgrade cron job script.
    • "reset": a temporary state following an NMP status reset.
    • "rollback started": if the status was set to "failed", the next time the AgentAutoUpgrade cron job wakes up, it attempts to roll back to the previous version and sets the status to this value.
    • "rollback failed": there was a problem with the rollback to the previous version. The agent is most likely in an inoperable state and will need manual intervention.
    • "rollback successful": the agent was successfully rolled back to the previous version.
    • "unknown": the NMP job is in an unrecognized state.
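When scripting against these values, it can help to keep them in one place. The following Python sketch is illustrative only: the constant and function names are hypothetical, and the "failure" grouping is an assumption based on the descriptions above (the values whose description reports a problem), not a classification defined by the agent.

```python
# All status values listed above, in the order they are documented.
KNOWN_STATUSES = (
    "waiting",
    "download started",
    "downloaded",
    "initiated",
    "successful",
    "no action required",
    "precheck failed",
    "download failed",
    "failed",
    "reset",
    "rollback started",
    "rollback failed",
    "rollback successful",
    "unknown",
)

# Assumption: the values whose description above mentions a problem.
FAILURE_STATUSES = frozenset(
    {"precheck failed", "download failed", "failed", "rollback failed"}
)


def is_known_status(status: str) -> bool:
    """Return True if the string is one of the documented status values."""
    return status in KNOWN_STATUSES


def reports_failure(status: str) -> bool:
    """Return True for statuses that describe a failed step (illustrative grouping)."""
    return status in FAILURE_STATUSES


def likely_needs_manual_fix(status: str) -> bool:
    """Per the description above, a failed rollback most likely needs manual intervention."""
    return status == "rollback failed"
```

For example, reports_failure("download failed") is true, while reports_failure("rollback successful") is false in this grouping, even though a rollback implies that an earlier upgrade attempt failed.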

Examples

The following is an example of NMP status JSON output. The status objects are nested within the node and the NMP they apply to, because this is how they are stored in the Exchange: multiple NMPs can run on a single node, and multiple nodes can run the same NMP.

In this case, there is one upgrade job type: agent auto upgrade. The status for this job is stored in the agentUpgradePolicyStatus field. This job completed successfully, so the status is set to "successful" and all of the timestamps are included. Note that the errorMessage field is omitted because the job was successful and there were no errors.

hzn exchange nmp status org/sample-nmp
{
  "org/sample-node": "successful"
}

{: codeblock}

or

hzn exchange nmp status org/sample-nmp -l
{
  "org/sample-node": {
    "org/sample-nmp": {
      "agentUpgradePolicyStatus": {
        "scheduledTime": "2022-05-24T12:00:00Z",
        "startTime": "2022-05-24T12:01:00Z",
        "endTime": "2022-05-24T12:02:00Z",
        "upgradedVersions": {
          "softwareVersion": "2.30.0",
          "certVersion": "1.0.0",
          "configVersion": "1.0.0"
        },
        "status": "successful"
      }
    }
  }
}

{: codeblock}
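The nesting of node, then NMP, then status can be walked with two loops. The following Python sketch uses a trimmed copy of the example output above; the function names iter_statuses and run_seconds are hypothetical helpers and not part of the hzn CLI.

```python
import json
from datetime import datetime
from typing import Iterator, Optional, Tuple

# Trimmed copy of the --long example output shown above.
LONG_OUTPUT = """
{
  "org/sample-node": {
    "org/sample-nmp": {
      "agentUpgradePolicyStatus": {
        "scheduledTime": "2022-05-24T12:00:00Z",
        "startTime": "2022-05-24T12:01:00Z",
        "endTime": "2022-05-24T12:02:00Z",
        "status": "successful"
      }
    }
  }
}
"""


def iter_statuses(doc: dict) -> Iterator[Tuple[str, str, dict]]:
    """Yield (node id, NMP id, agentUpgradePolicyStatus object) for every nested entry."""
    for node_id, policies in doc.items():
        for nmp_id, body in policies.items():
            yield node_id, nmp_id, body.get("agentUpgradePolicyStatus", {})


def run_seconds(status: dict) -> Optional[float]:
    """Return endTime minus startTime in seconds, or None if either timestamp is missing."""
    start, end = status.get("startTime"), status.get("endTime")
    if not start or not end:
        return None

    def to_datetime(value: str) -> datetime:
        # Normalize the trailing "Z" so older Python versions can parse it.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))

    return (to_datetime(end) - to_datetime(start)).total_seconds()


for node_id, nmp_id, status in iter_statuses(json.loads(LONG_OUTPUT)):
    # prints: org/sample-node org/sample-nmp successful 60.0
    print(node_id, nmp_id, status.get("status"), run_seconds(status))
```

Because endTime is only populated for successful jobs, run_seconds returns None for a failed job rather than raising an error.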

Listing the Exchange status of NMP currently stored in the Exchange

{: nmp-status-list}

To list the Exchange status of an NMP currently stored in the Exchange, use the following command:

hzn exchange nmp status <nmp-name>

{: codeblock}

Optional flags

  • --long, -l: Display the entire contents of each node management policy status object.
  • --node: Filter output to include just this one node. Use with the --long flag to display the entire content of a single node management policy status object.

To list all of the NMP statuses for a specific node, use the following command:

hzn exchange node management status <node-name>

{: codeblock}

Optional flags

  • --long, -l: Display the entire contents of each node management policy status object.
  • --policy, -p: Filter output to include just this one node management policy. Use with the --long flag to display the entire content of a single node management policy status object.
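The two commands and their flags can be assembled programmatically. The following Python sketch only builds the argument lists described above; the function names are hypothetical. It assumes that --node and --policy each take their value as the next argument, and run_json additionally assumes that hzn is on the PATH, is configured with Exchange credentials, and prints only JSON to standard output.

```python
import json
import subprocess
from typing import List, Optional


def nmp_status_argv(nmp_name: str, long_format: bool = False,
                    node: Optional[str] = None) -> List[str]:
    """Build the argument list for: hzn exchange nmp status <nmp-name>."""
    argv = ["hzn", "exchange", "nmp", "status", nmp_name]
    if long_format:
        argv.append("--long")
    if node is not None:
        argv += ["--node", node]  # assumption: the value follows the flag
    return argv


def node_management_status_argv(node_name: str, long_format: bool = False,
                                policy: Optional[str] = None) -> List[str]:
    """Build the argument list for: hzn exchange node management status <node-name>."""
    argv = ["hzn", "exchange", "node", "management", "status", node_name]
    if long_format:
        argv.append("--long")
    if policy is not None:
        argv += ["--policy", policy]  # assumption: the value follows the flag
    return argv


def run_json(argv: List[str]) -> dict:
    """Run the command and parse its standard output as JSON (see assumptions above)."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)
```

For example, nmp_status_argv("org/sample-nmp", long_format=True, node="org/sample-node") produces the arguments for displaying the full status object of a single node.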

Note: The two commands in this section list the status of an NMP as that status exists in the Exchange. To see the status that is stored on the node itself, see the section Listing the local status of NMP currently stored in the Exchange below.

Listing the local status of NMP currently stored in the Exchange

{: #nmp-status-list-hub}

To list the local status of an NMP currently stored in the Exchange, use the following command:

hzn nmstatus list <nmp-name>

{: codeblock}

The nmp-name argument is optional and limits the output to a single NMP. This is useful in combination with the --long flag.

Optional flags

  • --long, -l: Display the entire contents of each node management policy status object.

Note: The command in this section lists the status of an NMP as that status exists in the node's local database. To see the status that is stored in the Exchange, see the section Listing the Exchange status of NMP currently stored in the Exchange above.

Resetting a previously completed NMP

{: #nmp-reset}

Only an org admin or root can reset an NMP stored in the Exchange.

This command allows the admin to reset an NMP status to the "waiting" status so that the NMP can be re-evaluated and possibly executed again. This is useful when the same NMP needs to be run again because of a failure or a change to the manifest.

To reset an NMP that exists in the Exchange, use the following command:

hzn nmstatus reset <nmp_name>

{: codeblock}