feat: update static models to use 0.7.11 models #1728

Merged: 6 commits, Aug 29, 2024

@sthaha sthaha (Collaborator) commented Aug 23, 2024

NOTE: intel_rapl_DynPower has not been updated, because the SGDRegressor model for Dynamic Power, Node Type 0, is missing from https://github.com/sustainable-computing-io/kepler-model-db/tree/main/models/v0.7/ec2-0.7.11/rapl-sysfs/DynPower/BPFOnly

After updating the models to the latest, here are the changes on my machine


| Metric | Value |
| -- | -- |
| kepler_node_info{components_power_source="rapl-sysfs", cpu_architecture="Skylake", instance="kepler-latest:8888", job="latest", platform_power_source="none", source="os"} | 1 |
| kepler_node_info{components_power_source="estimator", cpu_architecture="Skylake", instance="kepler-dev:8888", job="dev", platform_power_source="none", source="os"} | |

NOTE: kepler-latest reports its power sources incorrectly (fixed in this PR; see "dev").

Platform / Idle

  • seems like an improvement
image

Platform / Dynamic

image

Package / Idle

  • Idle power is much higher than with the old models
image

Package / Dynamic

  • Higher, but closer to real usage
image

@sthaha sthaha requested a review from sunya-ch August 23, 2024 02:52
github-actions bot (Contributor) commented Aug 23, 2024

🤖 SeineSailor

Here is a concise summary of the pull request changes:

Summary: This pull request updates static models to use version 0.7.11, but it's still in draft stage due to issues with model loading. The changes primarily affect the power estimation functionality, with key modifications including:

  1. Updated power model initialization: The createNodeComponentPowerModelConfig function now accepts three string slices as arguments, and the CreateNodeComponentPoweEstimatorModel function initializes the nodeComponentPowerModel variable using the createPowerModelEstimator function.
  2. Error handling and logging: The CreateNodePlatformPoweEstimatorModel function now checks for errors when creating the power model estimator and logs detailed error messages. Logging has been improved to provide more informative messages.
  3. New ModelOutputType and SourceURL function: A new type for ModelOutputType has been added, and a new SourceURL function is added to the ModelConfig struct, which returns the URL or filepath of the init model based on whether InitModelURL or InitModelFilepath is set.
  4. Simplified model weight copying: The prepare_dev_env.sh script now copies all .json files from the ../data/model_weight directory to /var/lib/kepler/data/model_weight, simplifying the process and ensuring inclusion of all necessary models.
  5. Config changes: The GetModelConfigMap() function now trims leading and trailing whitespace from string values, and a new GetDefaultPowerModel function returns the trainer and the path to the embedded power model based on modelOutputType and energySource.
  6. Expose estimated idle power metrics: The kepler.config files now set EXPOSE_ESTIMATED_IDLE_POWER_METRICS to true.
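
The SourceURL behaviour and the whitespace trimming described in items 3 and 5 can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the struct is pared down to the two fields named in the summary, the trimValues helper and the INIT_URL key are hypothetical, and preferring the URL when both fields are set is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// ModelConfig is a hypothetical, pared-down stand-in for the real struct;
// only the two fields named in the summary are modeled.
type ModelConfig struct {
	InitModelURL      string
	InitModelFilepath string
}

// SourceURL returns the init model's URL if one is set and otherwise its
// file path. Preferring the URL when both are set is an assumption.
func (c ModelConfig) SourceURL() string {
	if c.InitModelURL != "" {
		return c.InitModelURL
	}
	return c.InitModelFilepath
}

// trimValues is a hypothetical helper illustrating the whitespace trimming
// attributed to GetModelConfigMap(): it returns a copy of the map with
// leading and trailing whitespace removed from every value.
func trimValues(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		out[k] = strings.TrimSpace(v)
	}
	return out
}

func main() {
	// "INIT_URL" and the example.com address are placeholders, not real keys.
	raw := map[string]string{"INIT_URL": "  https://example.com/model.json \n"}
	cfg := trimValues(raw)

	fromURL := ModelConfig{InitModelURL: cfg["INIT_URL"]}
	fromFile := ModelConfig{InitModelFilepath: "/tmp/model.json"}

	fmt.Println(fromURL.SourceURL())
	fmt.Println(fromFile.SourceURL())
}
```

Trimming before the value reaches the config struct means a stray newline in a config file cannot end up inside a URL or path.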

Observations and suggestions:

  • The changes seem to focus on updating the power estimation functionality, but the pull request is still in draft stage due to issues with model loading. It's essential to resolve these issues before merging the changes.
  • The addition of the GetDefaultPowerModel function and the SourceURL function to the ModelConfig struct provides more flexibility and configurability to the power estimation functionality.
  • The changes to the prepare_dev_env.sh script simplify the model weight copying process, but it's crucial to ensure that all necessary models are included and correctly loaded.
  • The logging improvements will provide more informative messages, which can aid in debugging and troubleshooting.
  • It's recommended to thoroughly test the updated power estimation functionality to ensure it works correctly and doesn't introduce any regressions.

@sthaha sthaha force-pushed the update-models-0.7.11 branch from 9b6945a to 1620112 on August 26, 2024 22:38
@sthaha sthaha force-pushed the update-models-0.7.11 branch 2 times, most recently from af67668 to 6f95209, on August 28, 2024 04:02
@sthaha sthaha force-pushed the update-models-0.7.11 branch from 54366e9 to ecd5f54 on August 29, 2024 01:21
Previously, when DISABLE_POWER_METER was set, kepler would still probe
the system for power meters, causing kepler_node_info to report
incorrect values for components_power_source and platform_power_source.
E.g.
kepler_node_info{
  components_power_source="rapl-sysfs",
  cpu_architecture="Skylake",
  instance="kepler-latest:8888",
  job="latest",
  platform_power_source="acpi",
  source="os"
}

This commit fixes that by using the fake power meters, so that
kepler_node_info now shows
```
kepler_node_info{components_power_source="estimator",
  cpu_architecture="Skylake",
  instance="kepler-dev:8888",
  job="dev",
  platform_power_source="none",
  source="os"
}
```

Signed-off-by: Sunil Thaha <[email protected]>
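
As an illustration of the fix described in the commit message above, the selection logic might look roughly like the sketch below. The PowerSources type, the selectPowerSources helper, and the probe callback are all hypothetical names invented for this example; only the label values ("estimator", "none", "rapl-sysfs", "acpi") come from the commit message.

```go
package main

import "fmt"

// PowerSources mirrors the two kepler_node_info labels discussed above.
type PowerSources struct {
	Components string
	Platform   string
}

// selectPowerSources is a hypothetical helper. When the power meter is
// disabled it skips probing entirely and reports the estimator-backed values
// shown in the commit message; otherwise it defers to the supplied probe.
func selectPowerSources(disablePowerMeter bool, probe func() PowerSources) PowerSources {
	if disablePowerMeter {
		return PowerSources{Components: "estimator", Platform: "none"}
	}
	return probe()
}

func main() {
	// A fake probe standing in for real hardware detection.
	probe := func() PowerSources {
		return PowerSources{Components: "rapl-sysfs", Platform: "acpi"}
	}
	fmt.Printf("%+v\n", selectPowerSources(true, probe))
	fmt.Printf("%+v\n", selectPowerSources(false, probe))
}
```

The point of the sketch is the early return: with the flag set, the probe is never called, so the reported labels cannot reflect hardware that is not actually in use.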
@sthaha sthaha marked this pull request as ready for review August 29, 2024 03:09
@sthaha sthaha changed the title chore: update static models to use 0.7.11 models feat: update static models to use 0.7.11 models Aug 29, 2024
@sunya-ch sunya-ch (Collaborator) left a comment

/lgtm Thank you so much!

@sthaha sthaha merged commit 9425830 into sustainable-computing-io:main Aug 29, 2024
23 checks passed