
CloudWatch Agent Config Doesn't Support append_dimensions for nvidia_gpu Metrics #489

Open
simonrouse9461 opened this issue Jun 14, 2022 · 4 comments
Labels
enhancement New feature or request


@simonrouse9461

Is your feature request related to a problem? Please describe.
I'm trying to set up the CloudWatch agent to monitor GPU usage on a group of ECS clusters, and I want to append an ECS cluster name dimension to the GPU metrics. According to the documentation, this is supported for all other metrics. However, GPU metrics seem to be the only ones I can't append custom dimensions to. Is there a specific reason why GPU metrics are special?

Describe the solution you'd like
I want to use append_dimensions in the nvidia_gpu section.

Describe alternatives you've considered
I haven't come up with an alternative.

Additional context
N/A

@khanhntd
Contributor

Hey @simonrouse9461,
There is no reason GPU metrics are special, and no reason they cannot get dimensions from append_dimensions. Have you tried the following config?

{
  "metrics": {
    "metrics_collected": {
      "nvidia_gpu": {
        "measurement": [
          "utilization_gpu",
          "power_draw",
          "temperature_gpu"
        ],
        "metrics_collection_interval": 60
      }
    },
    "append_dimensions": {
      "ImageId": "${aws:ImageId}",
      "InstanceId": "${aws:InstanceId}",
      "InstanceType": "${aws:InstanceType}",
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}"
    },
    "aggregation_dimensions": [["ImageId"], ["InstanceId", "InstanceType"], []],
    "force_flush_interval": 60
  }
}

@simonrouse9461
Author

Thanks @khanhntd
This is basically what I'm doing right now. The problem with the shared append_dimensions is that only ImageId, InstanceId, InstanceType, and AutoScalingGroupName are supported; according to the documentation, everything else is ignored. I also want to include the ECS cluster name. For other metrics, a custom dimension like this can usually be added inside the metrics_collected section for each individual metric, but according to the documentation nvidia_gpu is the only one that has no append_dimensions option.
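For reference, here is a minimal sketch of the per-plugin pattern described above, assuming the documented `append_dimensions` field inside the `cpu` section. The dimension name `ClusterName` and the value `my-ecs-cluster` are hypothetical placeholders, not values from this thread. At the time of this issue, the `nvidia_gpu` section accepted no equivalent key, so its GPU metrics would not carry the extra dimension.

```json
{
  "metrics": {
    "metrics_collected": {
      "cpu": {
        "measurement": ["cpu_usage_idle"],
        "metrics_collection_interval": 60,
        "append_dimensions": {
          "ClusterName": "my-ecs-cluster"
        }
      },
      "nvidia_gpu": {
        "measurement": ["utilization_gpu"],
        "metrics_collection_interval": 60
      }
    }
  }
}
```

The feature request amounts to allowing the same `append_dimensions` block under `nvidia_gpu`.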

@khanhntd
Contributor

Hey @simonrouse9461,
It's not only nvidia_gpu; ethtool and some other plugins also lack a per-plugin append_dimensions option. Still, this is a good call-out for us to discuss the support path for the nvidia_gpu plugin. Thanks for reporting the issue; I will talk with my team about it.

@khanhntd khanhntd added the enhancement New feature or request label Jun 17, 2022
@simonrouse9461
Author

@khanhntd Thanks!
