-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rfc: add platform support RFC #1065
base: main
Are you sure you want to change the base?
Conversation
This adds an RFC for platform support and representation in Contrast. This is in very much of a bare-bones stage now, so reviews and critique are welcome.
7459452
to
1650f8f
Compare
This would also enable easy validation of user error, as the individual types can be checked for | ||
validity at parsing time, making invalid states unrepresentable [^1]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that it changes much about the proposal at hand, but "make invalid states unrepresentable" is not achieved if I can do
myFancyPlatform := Platform{
KubernetesDistribution: RKE2,
Hypervisor: CloudHypervisor,
HardwarePlatform: Unknown,
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under the assumption of only validating constructors being used in the program, of course. I do acknowledge that my example is counter-intuitive for that, but it's a simplification, after all.
So `AKS-CLH-SNP` could become something like this: | ||
|
||
```go | ||
type Platform struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional fields that come to mind: Snapshotter
, ImageVariant
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on ImageVariant
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with GPU support or without, debug or prod, ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think those should be additional fields, ultimately. That's the key point I want to convey in this proposal. It seems I need to make that more clear?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might help
|
||
## Considerations | ||
|
||
### Node-Installer Images |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do Platform
s relate to runtimes? A runtime could be used with more than one image (not easily, but possible), but not with more than one hypervisor or snapshotter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Runtimes in the sense of Kata runtimes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kubernetes runtimes.
for all platforms. This could also be a foundation for heterogeneous deployments that utilize multiple | ||
platforms (for example GPU- and Non-GPU machines). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The node-installer could figure out a lot of configuration without user interaction - TEE flavour, containerd paths, maybe even hypervisor. On the other hand, outside the runtime the config is only used to decide which genpolicy to use and to populate reference values. I wonder whether we actually need to expose the entire runtime config to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think large parts of it have to be exposed. It might be possible that you have an instance that has GPUs available but don't want to run GPU-enabled Contrast on it, for example.
Also, reference value generation needs almost all values, if I'm not mistaken?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You picked the one dimension that I did not list as candidate for automation ;)
It's still an interesting question whether / how we want to deal with mixed images. As a user, how do I tell Contrast to use a GPU image for pod A, but a plain image for pod B? I imagine a config file is not granular enough here. If there is only one image per runtime, how do I mix two runtimes with one coordinator? (I think that's a scenario we might want to support eventually, though)
The reference values need the image measurements, the genpolicy choice and the hardware reference values, where the last two are only a question of AKS-CLH vs. rest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reference values need the image measurements, the genpolicy choice and the hardware reference values, where the last two are only a question of AKS-CLH vs. rest.
For hardware reference values, I think it's also a matter of TDX vs. SNP (Genoa) vs. SNP (Milan) and so on, right?
About mixed images, how the customer decides to do that is TBD still, I think. I don't have the solution at hand for this.
About "one image per runtime": I think we could ship all variants in the node-installer if we can make them reasonably small?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't fill hardware values for bare metal, so it's just image measurements I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a user, how do I tell Contrast to use a GPU image for pod A, but a plain image for pod B?
Via the runtime class. We must install different runtimeclasses for such variants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that using a diffferent runtimeclass is sufficient, but why is it necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General ideas LGTM, but I lack knowledge for some of the details.
RTMRs | ||
precalculator | ||
initrd | ||
deduplicated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RTMRs | |
precalculator | |
initrd | |
deduplicated | |
precalculator | |
initrd | |
deduplicate |
nit: you only need to add the basic form of a word
@@ -137,3 +137,6 @@ RTMR | |||
RTMRs | |||
precalculator | |||
initrd | |||
deduplicated | |||
unrepresentable | |||
serializable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please keep-sorted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and please move the keep-sorted comment to the end of the list again.
|
||
Implement support for diverse platforms in Contrast, ensuring that: | ||
|
||
1. Users can mix-and-match features throughout all platforms supported by Contrast |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like how this rfc mixes a user facing feature with a refactoring. For an rfc for a feature, I'd expect far more arguing from the users perspective and UX considerations.
|
||
```go | ||
type Platform struct { | ||
KubernetesDistribution KubernetesDistribution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We must think about what we really need to know. Like, is the distribution relevant, or do we really only need the paths and the systemd unit to restart? Otherwise we are just pushing the problem down a layer.
foo := Platform{ | ||
KubernetesDistribution: AKS, | ||
Hypervisor: CloudHypervisor, | ||
HardwarePlatform: SNPMilan, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HardwarePlatform: SNPMilan, | |
HardwarePlatform: SNP, |
Not sure if this is an intended but undocumented change or just a typo. We don't have split platforms for different product lines right now.
combinations, it might be better to ship a singular one-size-fits-all node installer with support | ||
for all platforms. This could also be a foundation for heterogeneous deployments that utilize multiple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As as user, I don't want to download tens of gigs of stuff that I don't really need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to qualify that: it's not the user who needs to download it, but the datacenter - a couple gigs might not make a big difference there.
Anyway, I feel like the the unified node installer contradicts the objectives (1) and (2). Maybe we should focus on the tooling to make new node installer variants instead?
for all platforms. This could also be a foundation for heterogeneous deployments that utilize multiple | ||
platforms (for example GPU- and Non-GPU machines). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a user, how do I tell Contrast to use a GPU image for pod A, but a plain image for pod B?
Via the runtime class. We must install different runtimeclasses for such variants.
combinations, it might be better to ship a singular one-size-fits-all node installer with support | ||
for all platforms. This could also be a foundation for heterogeneous deployments that utilize multiple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion on how to solve this: Remove all the stuff from the node installer image, give it a list of url/hash pairs. Send the config to the node-installer (via cm or what) and let it download the required components one by one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes installation in an air-gapped / proxied facility less convenient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can fetch from a file path as fallback and ship a separate image for air gapped, then we cover both with two images.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this, but we also have to consider that it brings quite some overhead to development, as we lose the benefit of a self-contained image, and have to build some kind of uploading pipeline again for the components to S3 or so.
@@ -137,3 +137,6 @@ RTMR | |||
RTMRs | |||
precalculator | |||
initrd | |||
deduplicated | |||
unrepresentable | |||
serializable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and please move the keep-sorted comment to the end of the list again.
This adds an RFC for platform support and representation in Contrast. This is in very much of a bare-bones stage now, so reviews and critique are welcome.