add peerpods support for the node-installer #959

Merged: 11 commits, Nov 4, 2024

Contributor

@Freax13 Freax13 commented Oct 28, 2024

This PR adds support for peerpods on SEV-SNP AKS to the node-installer and adjusts our justfile to use the node-installer instead of the coco-operator-based runtime steps.

@Freax13 Freax13 requested review from burgerdev, 3u13r and msanft October 28, 2024 14:48
@Freax13 Freax13 requested a review from katexochen as a code owner October 28, 2024 14:48
@Freax13 Freax13 added the no changelog PRs not listed in the release notes label Oct 28, 2024
internal/platforms/platforms.go (outdated, resolved)
internal/kuberesource/parts.go (outdated, resolved)
set -euo pipefail
kubectl apply -f ./{{ workspace_dir }}/runtime
if [[ {{ platform }} == "AKS-PEER-SNP" ]]; then
    kubectl apply -f ./infra/azure-peerpods/peer-pods-config.yaml --namespace {{ target }}${namespace_suffix-}
Member

We are still unable to deploy the runtime without knowing the target namespace, right? Please create a ticket with a detailed issue description.

Contributor Author

No, this has been fixed. Instead of using federated credentials, we now use an application password credential and pass a client ID and secret directly to the cloud-api-adaptor. Conveniently, we already had an application password credential set up:
https://github.com/edgelesssys/contrast/blob/0362d8ccc7892311a799dfca717ac84cc595cea9/infra/azure-peerpods/main.tf#L80C11-L82

@Freax13 Freax13 force-pushed the tom/peerpod-node-installer branch from 5453ee0 to d18466f Compare October 29, 2024 07:46
@Freax13 Freax13 requested a review from katexochen October 29, 2024 08:24
filename = "id_rsa.pub"
}

resource "local_file" "peer-pods-config" {
Contributor

Nothing that needs to be done now, but if we still want to have a "working peerpods cluster" through a single terraform apply, using the Kubernetes Terraform provider might be cleaner here.

Volume().
	WithName("ssh").
	WithSecret(applycorev1.SecretVolumeSource().
		WithDefaultMode(384).
Contributor

I think using an octal literal 0o600 would be more readable here

@Freax13 Freax13 force-pushed the tom/peerpod-node-installer branch from d18466f to f649399 Compare October 29, 2024 09:20
@Freax13 Freax13 requested a review from msanft October 29, 2024 09:20
set -euo pipefail
kubectl apply -f ./{{ workspace_dir }}/runtime
if [[ {{ platform }} == "AKS-PEER-SNP" ]]; then
    kubectl apply -f ./infra/azure-peerpods/peer-pods-config.yaml --namespace {{ target }}${namespace_suffix-}
Contributor

Who is supposed to place peer-pods-config.yaml in that directory?

Contributor Author

Terraform:

resource "local_file" "peer-pods-config" {
  filename        = "./peer-pods-config.yaml"
  file_permission = "0777"
  content         = <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: peer-pods-cm
data:
  AZURE_CLIENT_ID: ${azuread_application.app.client_id}
  AZURE_TENANT_ID: ${data.azurerm_subscription.current.tenant_id}
  AZURE_AUTHORITY_HOST: https://login.microsoftonline.com/
  AZURE_IMAGE_ID: ${var.image_id}
  AZURE_INSTANCE_SIZE: Standard_DC2as_v5
  AZURE_REGION: ${data.azurerm_resource_group.rg.location}
  AZURE_RESOURCE_GROUP: ${data.azurerm_resource_group.rg.name}
  AZURE_SUBNET_ID: ${one(azurerm_virtual_network.main.subnet.*.id)}
  AZURE_SUBSCRIPTION_ID: ${data.azurerm_subscription.current.subscription_id}
  CLOUD_PROVIDER: azure
  DISABLECVM: "false"
---
apiVersion: v1
data:
  AZURE_CLIENT_SECRET: ${base64encode(azuread_application_password.cred.value)}
kind: Secret
metadata:
  name: azure-client-secret
type: Opaque
---
apiVersion: v1
data:
  id_rsa.pub: ${data.local_file.id_rsa.content_base64}
kind: Secret
metadata:
  name: ssh-key-secret
type: Opaque
EOF
}

Member

@3u13r 3u13r left a comment

The nginx deployment still fails with the same error:

Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  6m8s                   default-scheduler  0/1 nodes are available: 1 Insufficient memory. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
  Normal   Scheduled         5m59s                  default-scheduler  Successfully assigned default/nginx-56bf5648cd-8dbfc to aks-default-34657868-vmss000000
  Normal   Pulled            4m34s                  kubelet            Successfully pulled image "nginx" in 4.268s (4.268s including waiting)
  Normal   Pulled            2m31s                  kubelet            Successfully pulled image "nginx" in 746ms (747ms including waiting)
  Warning  Failed            2m21s                  kubelet            Error: failed to create containerd task: failed to create shim task: ttrpc: closed: unknown
  Warning  BackOff           2m20s (x2 over 2m21s)  kubelet            Back-off restarting failed container nginx in pod nginx-56bf5648cd-8dbfc_default(9c8eca10-1358-437c-929b-ca5c6ba4adaf)
  Normal   Pulling           2m5s (x3 over 4m38s)   kubelet            Pulling image "nginx"
  Normal   Pulled            2m4s                   kubelet            Successfully pulled image "nginx" in 797ms (797ms including waiting)
  Normal   Created           2m2s (x3 over 4m28s)   kubelet            Created container nginx
  Warning  Failed            15s (x2 over 3m51s)    kubelet            Error: failed to create containerd task: failed to create shim task: Could not send CopyFile request: Dead agent: unknown
  Normal   SandboxChanged    15s (x2 over 3m50s)    kubelet            Pod sandbox changed, it will be killed and re-created.

but the openssl deployments are in the expected state (policy rejected).

@@ -18,6 +18,8 @@ import (
var (
	//go:embed testdata/expected-aks-clh-snp.toml
	expectedConfAKSCLHSNP []byte
	//go:embed testdata/expected-aks-peer-snp.toml
	expectedConfAKSPEERSNP []byte
Member

Suggested change
- expectedConfAKSPEERSNP []byte
+ expectedConfAKSPeerSNP []byte

@Freax13 Freax13 force-pushed the tom/peerpod-node-installer branch from f649399 to 915d90a Compare October 30, 2024 14:22
@Freax13 Freax13 requested a review from katexochen November 4, 2024 07:16
@Freax13 Freax13 merged commit 56fc63d into main Nov 4, 2024
10 checks passed
@Freax13 Freax13 deleted the tom/peerpod-node-installer branch November 4, 2024 10:32
Labels
no changelog PRs not listed in the release notes
4 participants