
SELinux "denied" errors #697

Closed
honarkhah opened this issue Apr 3, 2023 · 66 comments
Closed

SELinux "denied" errors #697

honarkhah opened this issue Apr 3, 2023 · 66 comments

Comments

@honarkhah
Contributor

Description

Hi, I am trying to integrate the vault-csi-provider into my cluster, but I am getting an error:

2023-04-03T12:01:38.605Z [INFO]  Creating new gRPC server
2023-04-03T12:01:38.605Z [INFO]  Opening unix socket: endpoint=/provider/vault.sock
2023-04-03T12:01:38.606Z [ERROR] Error running provider: err="failed to listen on unix socket at /provider/vault.sock: listen unix /provider/vault.sock: bind: permission denied"
Stream closed EOF for vault/vault-csi-provider-csi-provider-mwtd6 (vault-csi-provider)

I have checked on the nodes and realized it is not able to create that file on the host OS. I disabled policy enforcement to test whether that was the cause, and the Vault CSI provider was able to run on the node where setenforce was disabled, as you can see below.

Screenshot 2023-04-03 at 14 03 58

Is there any way that I can attach a customized policy based on my needs?

Kube.tf file

module "kube-hetzner" {

  providers = {
    hcloud = hcloud
  }
  
  hcloud_token    = var.HCLOUD_TOKEN
  source          = "kube-hetzner/kube-hetzner/hcloud"
  version         = "2.0.2"
  ssh_public_key  = tls_private_key.kube_hetzner.public_key_openssh
  ssh_private_key = tls_private_key.kube_hetzner.private_key_openssh
  network_region  = "eu-central"

  control_plane_nodepools = [
    {
      name        = "control-plane",
      server_type = "cpx11",
      location    = "fsn1",
      labels      = [],
      taints      = [],
      count       = 3
    }
  ]
  
  agent_nodepools = [
    {
      name        = "worker",
      server_type = "cx51",
      location    = "fsn1",
      labels      = [],
      taints      = [],
      floating_ip = false,
      count       = 3
    }
  ]

  load_balancer_type     = "lb11"
  load_balancer_location = "fsn1"

  ingress_controller        = "none"
  cluster_name              = "test"
  cni_plugin                = "cilium"
  enable_cert_manager       = true
  use_control_plane_lb      = true
  create_kubeconfig         = false
  create_kustomization      = false
  restrict_outbound_traffic = false
}

Screenshots

No response

Platform

Linux

@honarkhah honarkhah added the bug Something isn't working label Apr 3, 2023
@mysticaltech
Collaborator

@honarkhah Good catch, please SSH into your node and execute this command: sudo grep "avc: denied" /var/log/audit/audit.log | audit2allow -M my_custom_policy. Then copy-paste here the content of your generated my_custom_policy.te file.

Ideally, before that, apply the generated .pp to double-check that it works. If not, re-run the command and apply the .pp again, until you get Vault working, and then either open a PR with the content of the .te added to the policy section in locals.tf, or just copy it here so that I add it myself.
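
For reference, the whole loop looks roughly like this (a sketch of the steps described here; my_custom_policy is just an example module name):

sudo grep "avc: denied" /var/log/audit/audit.log | audit2allow -M my_custom_policy
# inspect the generated rules before loading them
cat my_custom_policy.te
# load the compiled module, then re-test the workload
sudo semodule -i my_custom_policy.pp
# if new denials show up, repeat until the workload runs cleanly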

@mysticaltech mysticaltech changed the title [Bug]: Selinux policy conflict with csi provider [Bug]: Selinux policy conflict with vault csi provider Apr 3, 2023
@mysticaltech mysticaltech removed the bug Something isn't working label Apr 3, 2023
@mysticaltech
Collaborator

@honarkhah FYI, you can apply the .pp with semodule -i my_custom_policy.pp.

@mysticaltech
Collaborator

@honarkhah Any update on this? I await the content of your .te to add it to our kube-hetzner policy so that others can run Vault and similar software too.

@mysticaltech mysticaltech changed the title [Bug]: Selinux policy conflict with vault csi provider SELinux policy not allowing Vault CSI provider Apr 3, 2023
@honarkhah
Contributor Author

honarkhah commented Apr 4, 2023

grep "avc: denied" /var/log/audit/audit.log | audit2allow -M my_custom_policy

I have tried it a couple of times, but I still get permission denied!

$:~ # grep "avc:  denied" /var/log/audit/audit.log
type=AVC msg=audit(1680592959.117:208937): avc:  denied  { write } for  pid=12053 comm="vault-csi-provi" name="secrets-store-csi-providers" dev="overlay" ino=97 scontext=system_u:system_r:container_t:s0:c735,c822 tcontext=system_u:object_r:etc_t:s0 tclass=dir permissive=0

policy.te file:

module my_custom_policy 1.0;

require {
        type container_t;
        type etc_t;
        class dir { add_name remove_name write };
        class sock_file create;
}

#============= container_t ==============

#!!!! This avc is allowed in the current policy
allow container_t etc_t:dir { add_name remove_name write };

#!!!! This avc is allowed in the current policy
allow container_t etc_t:sock_file create;

But as I see it, the error in the container changed to:

[ERROR] Error running provider: err="failed to check for existence of unix socket: stat /provider/vault.sock: permission denied"

@honarkhah
Contributor Author

honarkhah commented Apr 4, 2023

Ah, finally got it working. The issue was that the socket had been created before; after deleting it and letting it be recreated under the new policy, it worked (see the sketch after the policy below). I will create a PR.

module my_custom_policy 1.0;

require {
        type etc_t;
        type container_t;
        class dir { add_name remove_name write };
        class sock_file { create unlink };
}

#============= container_t ==============

#!!!! This avc is allowed in the current policy
allow container_t etc_t:dir { add_name remove_name write };

#!!!! This avc is allowed in the current policy
allow container_t etc_t:sock_file create;
allow container_t etc_t:sock_file unlink;
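
For anyone hitting the same "stat /provider/vault.sock: permission denied" after loading the policy, the fix above amounts to removing the stale socket on the host and letting the pod recreate it under the new policy. A rough sketch (the host path and pod label are assumptions; the AVC record above only names the secrets-store-csi-providers directory):

# on the node: remove the stale socket created before the policy was loaded (path is an assumption)
rm -f /etc/kubernetes/secrets-store-csi-providers/vault.sock
# recreate the provider pod so it re-binds the socket under the new policy (label is an assumption)
kubectl -n vault delete pod -l app.kubernetes.io/name=vault-csi-provider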

@michailkorenev

michailkorenev commented Apr 4, 2023

I have a similar problem with Filebeat.
Error from the container:

failed to create Beat meta file: open /usr/share/filebeat/data/meta.json.new: permission denied

Pod's manifest:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    beat.k8s.elastic.co/config-hash: '2123631025'
  creationTimestamp: '2023-04-04T08:33:30Z'
  generateName: filebeat-elk-beat-filebeat-
  labels:
    beat.k8s.elastic.co/name: filebeat-elk
    beat.k8s.elastic.co/version: 8.3.0
    common.k8s.elastic.co/type: beat
    controller-revision-hash: 697549fdf6
    pod-template-generation: '1'
  managedFields:
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        :metadata:
      manager: k3s
      operation: Update
      subresource: status
      time: '2023-04-04T08:36:44Z'
  name: filebeat-elk-beat-filebeat-ctgx5
  namespace: logging
  ownerReferences:
    - apiVersion: apps/v1
      blockOwnerDeletion: true
      controller: true
      kind: DaemonSet
      name: filebeat-elk-beat-filebeat
      uid: aaacb244-8de3-48f8-9c5e-7372a2cc28db
  resourceVersion: '702767'
  uid: 50035b3e-c551-4fc7-8a93-f6cb52862a44
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchFields:
              - key: metadata.name
                operator: In
                values:
                  - k3s-agent-hel1-dnp
  automountServiceAccountToken: true
  containers:
    - args:
        - '-e'
        - '-c'
        - /etc/beat.yml
      env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
      image: docker.elastic.co/beats/filebeat:8.3.0
      imagePullPolicy: IfNotPresent
      name: filebeat
      resources:
        limits:
          cpu: 250m
          memory: 500Mi
        requests:
          cpu: 100m
          memory: 200Mi
      securityContext:
        runAsUser: 0
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
        - mountPath: /usr/share/filebeat/data
          name: beat-data
        - mountPath: /etc/beat.yml
          name: config
          readOnly: true
          subPath: beat.yml
        - mountPath: /var/lib/docker/containers
          name: varlibdockercontainers
        - mountPath: /var/log/containers
          name: varlogcontainers
        - mountPath: /var/log/pods
          name: varlogpods
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: kube-api-access-s4rcs
          readOnly: true
  dnsPolicy: ClusterFirstWithHostNet
  enableServiceLinks: true
  hostNetwork: true
  nodeName: k3s-agent-hel1-dnp
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: filebeat-elk
  serviceAccountName: filebeat-elk
  terminationGracePeriodSeconds: 30
  tolerations:
    - effect: NoSchedule
      key: dedicated
      operator: Exists
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
    - effect: NoSchedule
      key: node.kubernetes.io/disk-pressure
      operator: Exists
    - effect: NoSchedule
      key: node.kubernetes.io/memory-pressure
      operator: Exists
    - effect: NoSchedule
      key: node.kubernetes.io/pid-pressure
      operator: Exists
    - effect: NoSchedule
      key: node.kubernetes.io/unschedulable
      operator: Exists
    - effect: NoSchedule
      key: node.kubernetes.io/network-unavailable
      operator: Exists
  volumes:
    - hostPath:
        path: /var/lib/logging/filebeat-elk/filebeat-data
        type: DirectoryOrCreate
      name: beat-data
    - name: config
      secret:
        defaultMode: 292
        optional: false
        secretName: filebeat-elk-beat-filebeat-config
    - hostPath:
        path: /var/lib/docker/containers
        type: ''
      name: varlibdockercontainers
    - hostPath:
        path: /var/log/containers
        type: ''
      name: varlogcontainers
    - hostPath:
        path: /var/log/pods
        type: ''
      name: varlogpods
    - name: kube-api-access-s4rcs
      projected:
        defaultMode: 420
        sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              items:
                - key: ca.crt
                  path: ca.crt
              name: kube-root-ca.crt
          - downwardAPI:
              items:
                - fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
                  path: namespace
status:
  conditions:
    - lastProbeTime: null
      lastTransitionTime: '2023-04-04T08:33:30Z'
      status: 'True'
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: '2023-04-04T08:36:44Z'
      message: 'containers with unready status: [filebeat]'
      reason: ContainersNotReady
      status: 'False'
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: '2023-04-04T08:36:44Z'
      message: 'containers with unready status: [filebeat]'
      reason: ContainersNotReady
      status: 'False'
      type: ContainersReady
    - lastProbeTime: null
      lastTransitionTime: '2023-04-04T08:33:30Z'
      status: 'True'
      type: PodScheduled
  containerStatuses:
    - containerID: >-
        containerd://85f2c69219d0a831dab2dba4032c0f61e07917ddb7b8a4919552263bf49f0933
      image: docker.elastic.co/beats/filebeat:8.3.0
      imageID: >-
        docker.elastic.co/beats/filebeat@sha256:2972cf06e669fc62d319bd2135ab7bebb9c476f26ec82934061ba5932a5db5b1
      lastState:
        terminated:
          containerID: >-
            containerd://85f2c69219d0a831dab2dba4032c0f61e07917ddb7b8a4919552263bf49f0933
          exitCode: 1
          finishedAt: '2023-04-04T08:36:43Z'
          reason: Error
          startedAt: '2023-04-04T08:36:42Z'
      name: filebeat
      ready: false
      restartCount: 5
      started: false
      state:
        waiting:
          message: >-
            back-off 2m40s restarting failed container=filebeat
            pod=filebeat-elk-beat-filebeat-ctgx5_logging(50035b3e-c551-4fc7-8a93-f6cb52862a44)
          reason: CrashLoopBackOff
  hostIP: 10.2.0.101
  phase: Running
  podIP: 10.2.0.101
  podIPs:
    - ip: 10.2.0.101
  qosClass: Burstable
  startTime: '2023-04-04T08:33:30Z'

my_custom_policy.te:

module my_custom_policy 1.0;

require {
        type container_var_lib_t;
        type container_t;
        class file { create lock open read rename write };
}

#============= container_t ==============
allow container_t container_var_lib_t:file create;
allow container_t container_var_lib_t:file { open read write rename lock };

@maaft

maaft commented Apr 4, 2023

Hi! I'm also getting these avc errors when trying to write on volumes provisioned using local-path-provisioner.

type=AVC msg=audit(1680610782.192:1681): avc:  denied  { write } for  pid=9622 comm="barman-cloud-re" name="pvc-1e25420c-f5f8-4929-b1f5-bd1eefc92905_dug-staging_hasura-1" dev="sda3" ino=258 scontext=system_u:system_r:container_t:s0:c715,c838 tcontext=system_u:object_r:usr_t:s0 tclass=dir permissive=0

Any workarounds?

Edit: I think it would be great if local storage were supported by kube-hetzner from the get-go! It's an important feature for minimal file latency, e.g. for databases.

@honarkhah
Contributor Author

@honarkhah Good catch, please SSH into your node and execute this command: sudo grep "avc: denied" /var/log/audit/audit.log | audit2allow -M my_custom_policy. Then copy-paste here the content of your generated my_custom_policy.te file.

Ideally, before that, apply the generated .pp to double-check that it works. If not, re-run the command and apply the .pp again, until you get Vault working, and then either open a PR with the content of the .te added to the policy section in locals.tf, or just copy it here so that I add it myself.

Have you guys followed the instructions?
I have followed the steps and, after a couple of rounds of trial and error, got the final policy in place and created a PR.

@maaft

maaft commented Apr 4, 2023

ah, sorry - I must have skipped that part. will try

@phinx110

phinx110 commented Apr 4, 2023

I use this project to set up cheaper dev/test/staging Kubernetes clusters. After reinstalling an older cluster I got the following problems:

  • fluent-bit (needs access to local files on the node):

[error] [sqldb] cannot open database /var/log/flb_kube.db
[error] [input:tail:tail.0] could not open/create database
[error] failed initialize input tail.0
[error] [engine] input initialization failed
[error] [lib] backend failed

  • openvpn (needs to load the "tun" and "conntrack" kernel modules):

Error: Could not process rule: No such file or directory
add rule ip filter INPUT iifname tun0 ct state related,established accept
^^^^^^^^

ERROR: Cannot open TUN/TAP dev /dev/net/tun: No such device (errno=19) (((Even though I did mknod /dev/net/tun c 10 200 )))

  • lbnodeagent (don't know what it needs):

{"caller":"panic.go:884","msg":"done","op":"shutdown"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1358ad9]
goroutine 1 [running]:
purelb.io/internal/local.(*announcer).announceRemote(0xc000200420, 0xc0007b8d60, 0xc0006a4ad0, 0xc00042b420?, {0xc00042b420, 0x10, 0x10})
/usr/src/internal/local/announcer_local.go:273 +0x519
purelb.io/internal/local.(*announcer).SetBalancer(0xc000200420, 0xc0007b8d60, 0x16c9ff1?)
/usr/src/internal/local/announcer_local.go:193 +0x5cf
main.(*controller).ServiceChanged(0xc000204190, 0xc0007b8d60, 0xc000492501?)
/usr/src/cmd/lbnodeagent/controller.go:102 +0x9e3
purelb.io/internal/k8s.(*Client).sync(0xc0000bc9a0, {0x14312e0?, 0xc0004925e0?})
/usr/src/internal/k8s/k8s.go:403 +0x3bd
purelb.io/internal/k8s.(*Client).Run(0xc0000bc9a0, 0xc0001140c0)
/usr/src/internal/k8s/k8s.go:280 +0x2be
main.main()
/usr/src/cmd/lbnodeagent/main.go:119 +0xf2b
Stream closed EOF for purelb/lbnodeagent-dlr82 (lbnodeagent)

These issues have been solved by disabling SELinux. Probably not the smartest idea, but I don't mind doing it on throwaway clusters.
This is how I did it:

ssh [email protected]
setenforce Permissive #This switches to permissive mode until the next restart
vi /etc/sysconfig/selinux-policy #set SELINUX=permissive and save; this persists the setting across node reboots.
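
For what it's worth, the current mode can be verified before and after with the standard tools:

getenforce   # prints Enforcing, Permissive, or Disabled
sestatus     # more detail, including the loaded policy and the mode from the config file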

From this use case of mine I can conclude that it is probably best to have the SELinux settings exposed in kube.tf.example, similar to how extra_firewall_rules are, preferably with documentation on how to configure it, like the Examples section in the readme.md. This is necessary because different users have different needs and it's not possible to cover everything in locals.tf.

@maaft

maaft commented Apr 4, 2023

okay, for local-storage this seems to do the trick for me:

nevermind, some permissions are missing. back to the drawing board


module my_custom_policy 1.0;

require {
	type usr_t;
	type container_t;
	class dir { add_name create remove_name setattr write };
	class file { append create rename setattr unlink write };
}

#============= container_t ==============

allow container_t usr_t:dir { add_name create remove_name setattr write };
allow container_t usr_t:file { append create rename setattr unlink write };

I suspect, though, that this is highly workload-dependent!

@mysticaltech Basically, we need to allow every kind of file/folder access to support any workloads that make use of local storage. So, any suggestions on what's missing in the above .te file?

@mysticaltech
Collaborator

@honarkhah Good job, will merge later on tonight.

@maaft Test it by applying the generated .pp file; if it works, great, if not, re-run the command so it picks up the latest errors. Normally it will cover all use cases, so if the above is your final .te that is proven to work, I will integrate it.

@michailkorenev Perfect, .te noted, will integrate; also make sure to test with the .pp as described above.

@phinx110 Please grab your .te, test your .pp, and if all is good post your .te. Don't worry, no need to disable SELinux; I need your help to make this project compatible with SELinux while staying as secure as possible, so just generate and share your .te file please, will merge later tonight.


@maaft

maaft commented Apr 4, 2023

@mysticaltech it's a very time-consuming process:

  • my postgres db does init from backup (takes time)
  • after that, new errors occur
  • generate and re-apply .pp, throw away old DB, restart restore procedure, wait ...
  • I have multiple agents where the pods need to run due to the HA Postgres cluster (different /var/log/audit/audit.log content, therefore different .pp patches; how can I aggregate them easily? see the sketch below)
  • when the cluster finally runs - how can I be sure that everything is included? maybe at the next backup there is still some permission missing? or at any other random time when the cluster does something specific which it hasn't done in my tests?

What I want to say is, wouldn't it be easier to allow just everything regarding files and folders? If yes, where to find such a list?
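
On the aggregation point: audit2allow reads AVC records from stdin, so denials gathered from several nodes can be concatenated and turned into a single module. A rough sketch, assuming root SSH access to each agent (host names are placeholders, and the SSH port may differ on kube-hetzner nodes):

for node in agent-1 agent-2 agent-3; do
  ssh root@"$node" 'grep "denied" /var/log/audit/audit.log'
done > all_denials.log

audit2allow -M my_custom_policy < all_denials.log

for node in agent-1 agent-2 agent-3; do
  scp my_custom_policy.pp root@"$node":/root/
  ssh root@"$node" 'semodule -i /root/my_custom_policy.pp'
done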

@mysticaltech
Collaborator

mysticaltech commented Apr 4, 2023

@maaft No worries then, I will include a quick flag "enable_selinux" to turn off SELinux at the k3s level, as it was before. It has always been active at the OS level, but not the k3s level.

@mysticaltech
Collaborator

Now, for best security, you would leave it active, and after you run your cluster, issue the above command to pick up the denied stuff, submit a PR, and voila.

@mysticaltech
Collaborator

@maaft My theory is that applications actually are not that different. Probably the newly updated policy with the values above will fix most issues, and after a few iterations we will have covered the whole playing field.

So my advice, to both @maaft and @phinx110: see the command above, generate your .te and .pp files, test the .pp, and if everything works, submit your final .te content here (it is just text). That way we will really map all possible use cases; there aren't so many.

@honarkhah
Contributor Author

honarkhah commented Apr 4, 2023

@mysticaltech I can help if there is WIP or a list of features that need to be developed!

@mysticaltech
Collaborator

Thanks @honarkhah, really appreciate it!

@mysticaltech
Collaborator

mysticaltech commented Apr 4, 2023

@maaft @phinx110 @michailkorenev Along with the recently merged SELinux rule additions by @honarkhah, I have merged your respective rules provided above. Please give v2.0.4 a shot. You do not need to recreate the snapshot, but you do need to update the module and at least create new nodepools (while bringing the count of the old nodepools to 0 after draining the old nodes), or, if you prefer, completely recreate the cluster.

If that works, great! If not, let's iterate a little bit: either add your rules to locals.tf directly via a PR, or create a discussion about additional SELinux rules and add them there, tagging me explicitly so that I add them ASAP.

I believe we are near to mapping all the possible use cases; I also proactively added permissions based on GPT-4 recommendations of common use cases.

If that approach works, great! If, however, we see the need for custom rules in the future, we will provide a variable for you to add them, and if necessary, we will provide the ability to turn off SELinux at the k3s level (not the OS level). Hopefully that will not be necessary, so that clusters deployed with this tool stay as secure as possible.

Again to create your needed policy if you see denied errors:

  • SSH in the affected node.
  • Run: sudo grep "denied" /var/log/audit/audit.log | audit2allow -M my_custom_policy
  • Apply with semodule -i my_custom_policy.pp
  • Test, and if it fails again, re-run the above two commands to catch the remaining denied errors.
  • When successful, submit the content of your my_custom_policy.te in a discussion, tagging me.
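
If you end up hand-editing the .te rather than regenerating it with audit2allow, it can be compiled and loaded with checkmodule and semodule_package, the same sequence the module's cloud-init uses for kube_hetzner_selinux.te (a small sketch; the file names are just examples):

checkmodule -M -m -o my_custom_policy.mod my_custom_policy.te
semodule_package -o my_custom_policy.pp -m my_custom_policy.mod
semodule -i my_custom_policy.pp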

@mysticaltech mysticaltech changed the title SELinux policy not allowing Vault CSI provider SELinux "denied" errors Apr 4, 2023
@mysticaltech mysticaltech pinned this issue Apr 4, 2023
@phinx110

phinx110 commented Apr 5, 2023

@mysticaltech
When testing lbnodeagent I got the following but I'm not sure if this is actually doing anything because it was the first one I tested. It's possible that something else is causing this and not lbnodeagent so I wouldn't include this in the locals.tf just yet:

module purelb-lbnodeagent 1.0;

require {
	type init_t;
	type container_t;
	class file { open read };
}

#============= container_t ==============
allow container_t init_t:file { open read };

I need this to load the "tun" (openvpn) and "conntrack" (nftables) kernel modules:

module kernel_module_request 1.0;

require {
	type container_t;
	type kernel_t;
	class system module_request;
}

#============= container_t ==============

#!!!! This avc can be allowed using the boolean 'domain_kernel_load_modules'
allow container_t kernel_t:system module_request;

Regarding the fluentbit-cluster-monitoring it was a mess. I got the following 3 configs:

module fluentbit-cluster-monitoring 1.0;

require {
	type container_log_t;
	type container_t;
	type var_log_t;
	class file { create lock open read setattr watch write };
	class dir { add_name read write };
	class lnk_file read;
}

#============= container_t ==============
allow container_t container_log_t:dir read;
allow container_t container_log_t:file { open read watch };
allow container_t container_log_t:lnk_file read;
allow container_t var_log_t:dir { add_name write };
allow container_t var_log_t:file { create lock open read setattr write };

module fluentbit-cluster-monitoring2 1.0;

require {
	type var_log_t;
	type container_t;
	class dir remove_name;
	class file unlink;
}

#============= container_t ==============
allow container_t var_log_t:dir remove_name;
allow container_t var_log_t:file unlink;

module filesystem_associate 1.0;

require {
	type proc_t;
	type container_t;
	class filesystem associate;
}

#============= container_t ==============
allow container_t proc_t:filesystem associate;

After loading all these modules with semodule -i, Fluent Bit still keeps crashing due to permission problems. If I disable SELinux, it works again; however, grep "denied" /var/log/audit/audit.log | audit2allow does not return anything new, so now I'm stuck.

I understand the desire to have a fully locked-down k3s installation. However, even with good instructions provided, it remains a tedious iterative process, especially if you have to do this for multiple individually failing components separately (in order to keep the semodules separated as well, like I did), and in the end it still did not work for me. With the older version of this project I had a working setup, but after a reinstall I suddenly had to spend quite some time to get it working the correct way. In the end I decided to just give up on this and disable SELinux altogether to get on with it because of time pressure. I simply cannot afford to keep working on this any longer than I already have, and I assume there are quite a few other developers who end up in the same position as I am.

The main reason to have "SELinux at the k3s level" is if you run your production on Hetzner. In my case we run our production on AWS and use Hetzner as a cheaper environment to run different versions/copies of our software for different departments (dev/test/staging). I would strongly consider the ability to disable "SELinux at the k3s level" as a valid "feature" that I would like to request.

@mysticaltech
Collaborator

Okay, after some searching, going by https://stackoverflow.com/a/71152408/2199687, what could work, I guess, is manually updating /var/lib/cloud/instance/cloud-config.txt via a file provisioner and then running cloud-init single --name write-files --frequency always and cloud-init single --name scripts-user --frequency always (possibly more).

@jhass Please try it if you can, and PR most welcome!

Otherwise, just to get SELinux going, try a shell script like this one: https://chat.openai.com/share/c2ace355-82ce-4df4-a5b4-590e81443cea. But make sure to replace the policy with the latest version from locals.tf.
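
Putting those pieces together, the re-run would look roughly like this (a sketch based on the Stack Overflow answer above; whether these two modules are enough is untested):

# after overwriting /var/lib/cloud/instance/cloud-config.txt with the new user data:
cloud-init single --name write-files --frequency always
cloud-init single --name scripts-user --frequency always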

@maaft

maaft commented Apr 22, 2024

hi @mysticaltech

Here's what I noticed:

  • running an arm64 cluster for 2-3 weeks without any issues
  • without touching the cluster, suddenly some nodes lose their network connection and the cluster becomes completely broken

For broken nodes:

  • no ping working
  • can only access broken nodes through other working nodes (via private network ssh)
  • from within broken nodes, I cannot even ping google.de
  • audit yields a huge list of rules

do you have any idea what could have gone wrong here?

Edit
On one of the other nodes, where the network connection through public networking is completely broken, grep "avc: denied" /var/log/audit/audit.log yields no results. What is going on? :-O

not sure if something really weird is going on, since my .te file is huge. But I'll post it anyway.

my_custom_policy.te

module kube_hetzner_selinux 1.0;

require {
type kernel_t, bin_t, kernel_generic_helper_t, iscsid_t, iscsid_exec_t, var_run_t,
init_t, unlabeled_t, systemd_logind_t, systemd_hostnamed_t, container_t,
cert_t, container_var_lib_t, etc_t, usr_t, container_file_t, container_log_t,
container_share_t, container_runtime_exec_t, container_runtime_t, var_log_t, proc_t, io_uring_t, fuse_device_t, http_port_t,
container_var_run_t;
class key { read view };
class file { open read execute execute_no_trans create link lock rename write append setattr unlink getattr watch };
class sock_file { watch write create unlink };
class unix_dgram_socket create;
class unix_stream_socket { connectto read write };
class dir { add_name create getattr link lock read rename remove_name reparent rmdir setattr unlink search write watch };
class lnk_file { read create };
class system module_request;
class filesystem associate;
class bpf map_create;
class io_uring sqpoll;
class anon_inode { create map read write };
class tcp_socket name_connect;
class chr_file { open read write };
}

#============= kernel_generic_helper_t ==============
allow kernel_generic_helper_t bin_t:file execute_no_trans;
allow kernel_generic_helper_t kernel_t:key { read view };
allow kernel_generic_helper_t self:unix_dgram_socket create;

#============= iscsid_t ==============
allow iscsid_t iscsid_exec_t:file execute;
allow iscsid_t var_run_t:sock_file write;
allow iscsid_t var_run_t:unix_stream_socket connectto;

#============= init_t ==============
allow init_t unlabeled_t:dir { add_name remove_name rmdir };
allow init_t unlabeled_t:lnk_file create;
allow init_t container_t:file { open read };
allow init_t container_file_t:file { execute execute_no_trans };
allow init_t fuse_device_t:chr_file { open read write };
allow init_t http_port_t:tcp_socket name_connect;

#============= systemd_logind_t ==============
allow systemd_logind_t unlabeled_t:dir search;

#============= systemd_hostnamed_t ==============
allow systemd_hostnamed_t unlabeled_t:dir search;

#============= container_t ==============

# Basic file and directory operations for specific types

allow container_t cert_t:dir read;
allow container_t cert_t:lnk_file read;
allow container_t cert_t:file { read open };
allow container_t container_var_lib_t:file { create open read write rename lock };
allow container_t etc_t:dir { add_name remove_name write create setattr watch };
allow container_t etc_t:file { create setattr unlink write };
allow container_t etc_t:sock_file { create unlink };
allow container_t usr_t:dir { add_name create getattr link lock read rename remove_name reparent rmdir setattr unlink search write };
allow container_t usr_t:file { append create execute getattr link lock read rename setattr unlink write };

# Additional rules for container_t

allow container_t container_file_t:file { open read write append getattr setattr };
allow container_t container_file_t:sock_file watch;
allow container_t container_log_t:file { open read write append getattr setattr };
allow container_t container_log_t:dir read;
allow container_t container_share_t:dir { read write add_name remove_name };
allow container_t container_share_t:file { read write create unlink };
allow container_t container_runtime_exec_t:file { read execute execute_no_trans open };
allow container_t container_runtime_t:unix_stream_socket { connectto read write };
allow container_t kernel_t:system module_request;
allow container_t container_log_t:dir { read watch };
allow container_t container_log_t:file { open read watch };
allow container_t container_log_t:lnk_file read;
allow container_t var_log_t:dir { add_name write };
allow container_t var_log_t:file { create lock open read setattr write };
allow container_t var_log_t:dir remove_name;
allow container_t var_log_t:file unlink;
allow container_t proc_t:filesystem associate;
allow container_t self:bpf map_create;
allow container_t self:io_uring sqpoll;
allow container_t io_uring_t:anon_inode { create map read write };
allow container_t container_var_run_t:dir { add_name remove_name write };
allow container_t container_var_run_t:file { create open read rename unlink write };

@mysticaltech
Collaborator

@maaft Apparently the automatic node upgrade sometimes breaks the networking changes; we are currently investigating. Please try to see if you can reapply cloud-init manually, maybe it's saved on the machine. Or look at the code to see what it does and do that manually. If it works for you, we will be able to create a boot script that does that automatically.

@maaft

maaft commented Apr 23, 2024

@maaft Apparently the automatic node upgrade sometimes breaks the networking changes; we are currently investigating. Please try to see if you can reapply cloud-init manually, maybe it's saved on the machine. Or look at the code to see what it does and do that manually. If it works for you, we will be able to create a boot script that does that automatically.

okay, thanks - will try it later.

What are the current workarounds? Disable automatic upgrades?

Edit: Also, this must be something new. My 1.5-year-old cluster that I created with your repo back in the day continues to do node upgrades successfully.

Edit2:

As suggested, I executed:

sudo cloud-init init
sudo cloud-init modules --mode config
sudo cloud-init modules --mode final

on the failing node. Still no network connection afterwards.

I noticed that the commands issued by cloud-init actually do nothing!

I executed the commands from cat /var/log/cloud-init.log manually, and suddenly my network is working again.

This is highly unstable though, and I'm very much interested in a more stable solution if you have any.
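
For the record, the network-related part of that runcmd list (see the pasted cloud-config below) boils down to roughly the following; the gateway address comes from this particular node's config and will differ per cluster:

# re-apply the interface rename and default route that cloud-init set up at first boot
bash /etc/cloud/rename_interface.sh
ip route add default via 172.31.1.1 dev eth0
systemctl restart NetworkManager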

Logs

Cloud-init v. 23.3-8.1 running 'init' at Tue, 23 Apr 2024 10:12:22 +0000. Up 215.38 seconds.
ci-info: ++++++++++++++++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++++++++++++++
ci-info: +--------+------+------------------------------+-----------------+--------+-------------------+
ci-info: | Device | Up | Address | Mask | Scope | Hw-Address |
ci-info: +--------+------+------------------------------+-----------------+--------+-------------------+
ci-info: | enp7s0 | True | 10.255.0.101 | 255.255.255.255 | global | 86:00:00:7e:2d:fc |
ci-info: | enp7s0 | True | fe80::f88a:c1f2:363d:31b5/64 | . | link | 86:00:00:7e:2d:fc |
ci-info: | eth0 | True | 49.13.59.226 | 255.255.255.255 | global | 96:00:03:1b:cd:01 |
ci-info: | eth0 | True | fe80::8f1f:5511:4709:5e88/64 | . | link | 96:00:03:1b:cd:01 |
ci-info: | lo | True | 127.0.0.1 | 255.0.0.0 | host | . |
ci-info: | lo | True | ::1/128 | . | host | . |
ci-info: +--------+------+------------------------------+-----------------+--------+-------------------+
ci-info: +++++++++++++++++++++++++++++Route IPv4 info++++++++++++++++++++++++++++++
ci-info: +-------+-------------+------------+-----------------+-----------+-------+
ci-info: | Route | Destination | Gateway | Genmask | Interface | Flags |
ci-info: +-------+-------------+------------+-----------------+-----------+-------+
ci-info: | 0 | 0.0.0.0 | 10.0.0.1 | 0.0.0.0 | enp7s0 | UG |
ci-info: | 1 | 0.0.0.0 | 172.31.1.1 | 0.0.0.0 | eth0 | UG |
ci-info: | 2 | 10.0.0.0 | 10.0.0.1 | 255.0.0.0 | enp7s0 | UG |
ci-info: | 3 | 10.0.0.1 | 0.0.0.0 | 255.255.255.255 | enp7s0 | UH |
ci-info: | 4 | 172.31.1.1 | 0.0.0.0 | 255.255.255.255 | eth0 | UH |
ci-info: | 5 | 172.31.1.1 | 0.0.0.0 | 255.255.255.255 | eth0 | UH |
ci-info: +-------+-------------+------------+-----------------+-----------+-------+
ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
ci-info: +-------+-------------+---------+-----------+-------+
ci-info: | Route | Destination | Gateway | Interface | Flags |
ci-info: +-------+-------------+---------+-----------+-------+
ci-info: | 0 | fe80::/64 | :: | enp7s0 | U |
ci-info: | 1 | fe80::/64 | :: | eth0 | U |
ci-info: | 3 | local | :: | eth0 | U |
ci-info: | 4 | local | :: | enp7s0 | U |
ci-info: | 5 | multicast | :: | enp7s0 | U |
ci-info: | 6 | multicast | :: | eth0 | U |
ci-info: +-------+-------------+---------+-----------+-------+

Cloud-init v. 23.3-8.1 running 'modules:config' at Tue, 23 Apr 2024 10:12:28 +0000. Up 221.53 seconds.

Cloud-init v. 23.3-8.1 running 'modules:final' at Tue, 23 Apr 2024 10:12:31 +0000. Up 224.38 seconds.
Cloud-init v. 23.3-8.1 finished at Tue, 23 Apr 2024 10:12:31 +0000. Datasource DataSourceHetzner. Up 224.45 seconds

#cloud-config

from 1 files

init.cfg


debug: true
growpart:
  devices:
  - /var
hostname: production-control-plane-fsn1-wjz
preserve_hostname: true
runcmd:
- [btrfs, filesystem, resize, max, /var]
- [semanage, port, -a, -t, ssh_port_t, -p, tcp, '21259']
- [checkmodule, -M, -m, -o, /root/kube_hetzner_selinux.mod, /root/kube_hetzner_selinux.te]
- [semodule_package, -o, /root/kube_hetzner_selinux.pp, -m, /root/kube_hetzner_selinux.mod]
- [semodule, -i, /root/kube_hetzner_selinux.pp]
- [setsebool, -P, virt_use_samba, '1']
- [setsebool, -P, domain_kernel_load_modules, '1']
- [systemctl, disable, --now, rebootmgr.service]
- [systemctl, reload, NetworkManager]
- [sed, -i, s/#SystemMaxUse=/SystemMaxUse=3G/g, /etc/systemd/journald.conf]
- [sed, -i, s/#MaxRetentionSec=/MaxRetentionSec=1week/g, /etc/systemd/journald.conf]
- [sed, -i, s/NUMBER_LIMIT="2-10"/NUMBER_LIMIT="4"/g, /etc/snapper/configs/root]
- [sed, -i, s/NUMBER_LIMIT_IMPORTANT="4-10"/NUMBER_LIMIT_IMPORTANT="3"/g, /etc/snapper/configs/root]
- [chmod, +x, /etc/cloud/rename_interface.sh]
- [systemctl, restart, sshd]
- [systemctl, restart, NetworkManager]
- [systemctl, status, NetworkManager]
- [ip, route, add, default, via, 172.31.1.1, dev, eth0]
- [truncate, -s, '0', /var/log/audit/audit.log]
ssh_authorized_keys:
- ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIE52ueclUWvaQrZRY5z9zkEZP/gGBhM+hFK0oQx7byTC [email protected]
    write_files:

  • content: "#!/bin/bash\nset -euo pipefail\n\nsleep 11\n\nINTERFACE=$(ip link show
    \ | awk '/^3:/ print $2 ' | sed 's/://g')\nMAC=$(cat /sys/class/net/$INTERFACE/address)\n
    \ncat < /etc/udev/rules.d/70-persistent-net.rules\nSUBSYSTEM=="net"
    , ACTION=="add", DRIVERS=="?*", ATTR address=="$MAC", NAME="eth1"
    \nEOF\n\nip link set $INTERFACE down\nip link set $INTERFACE name eth1\nip
    \ link set eth1 up\n\neth0_connection=$(nmcli -g GENERAL.CONNECTION device
    \ show eth0)\nnmcli connection modify "$eth0_connection" \\n con-name
    \ eth0 \\n connection.interface-name eth0\n\neth1_connection=$(nmcli -g
    \ GENERAL.CONNECTION device show eth1)\nnmcli connection modify "$eth1_connection"
    \ \\n con-name eth1 \\n connection.interface-name eth1\n\nsystemctl restart
    \ NetworkManager\n"
    path: /etc/cloud/rename_interface.sh
    permissions: '0744'

  • content: 'Port 21259

    PasswordAuthentication no
    
    X11Forwarding no
    
    MaxAuthTries 4
    
    AllowTcpForwarding no
    
    AllowAgentForwarding no
    
    AuthorizedKeysFile .ssh/authorized_keys
    
    '
    

    path: /etc/ssh/sshd_config.d/kube-hetzner.conf

  • content: 'REBOOT_METHOD=kured

    '
    

    path: /etc/transactional-update.conf

  • content: '[rancher-k3s-common-stable]

    name=Rancher K3s Common (stable)
    
    baseurl=https://rpm.rancher.io/k3s/stable/common/microos/noarch
    
    enabled=1
    
    gpgcheck=1
    
    repo_gpgcheck=0
    
    gpgkey=https://rpm.rancher.io/public.key
    
    '
    

    path: /etc/zypp/repos.d/rancher-k3s-common.repo

  • content: "module kube_hetzner_selinux 1.0;\n\nrequire {\n type kernel_t, bin_t,
    \ kernel_generic_helper_t, iscsid_t, iscsid_exec_t, var_run_t,\n init_t,
    \ unlabeled_t, systemd_logind_t, systemd_hostnamed_t, container_t,\n cert_t,
    \ container_var_lib_t, etc_t, usr_t, container_file_t, container_log_t,\n
    \ container_share_t, container_runtime_exec_t, container_runtime_t, var_log_t,
    \ proc_t, io_uring_t, fuse_device_t, http_port_t,\n container_var_run_t;\n
    \ class key { read view };\n class file { open read execute execute_no_trans
    \ create link lock rename write append setattr unlink getattr watch };\n
    \ class sock_file { watch write create unlink };\n class unix_dgram_socket
    \ create;\n class unix_stream_socket { connectto read write };\n class dir
    \ { add_name create getattr link lock read rename remove_name reparent rmdir
    \ setattr unlink search write watch };\n class lnk_file { read create };\n
    \ class system module_request;\n class filesystem associate;\n class bpf
    \ map_create;\n class io_uring sqpoll;\n class anon_inode { create map read
    \ write };\n class tcp_socket name_connect;\n class chr_file { open read
    \ write };\n}\n\n#============= kernel_generic_helper_t ==============\nallow
    \ kernel_generic_helper_t bin_t:file execute_no_trans;\nallow kernel_generic_helper_t
    \ kernel_t:key { read view };\nallow kernel_generic_helper_t self:unix_dgram_socket
    \ create;\n\n#============= iscsid_t ==============\nallow iscsid_t iscsid_exec_t:file
    \ execute;\nallow iscsid_t var_run_t:sock_file write;\nallow iscsid_t var_run_t:unix_stream_socket
    \ connectto;\n\n#============= init_t ==============\nallow init_t unlabeled_t:dir
    \ { add_name remove_name rmdir };\nallow init_t unlabeled_t:lnk_file create;\n
    allow init_t container_t:file { open read };\nallow init_t container_file_t:file
    \ { execute execute_no_trans };\nallow init_t fuse_device_t:chr_file { open
    \ read write };\nallow init_t http_port_t:tcp_socket name_connect;\n\n#=============
    \ systemd_logind_t ==============\nallow systemd_logind_t unlabeled_t:dir
    \ search;\n\n#============= systemd_hostnamed_t ==============\nallow systemd_hostnamed_t
    \ unlabeled_t:dir search;\n\n#============= container_t ==============\n#
    \ Basic file and directory operations for specific types\nallow container_t
    \ cert_t:dir read;\nallow container_t cert_t:lnk_file read;\nallow container_t
    \ cert_t:file { read open };\nallow container_t container_var_lib_t:file {
    \ create open read write rename lock };\nallow container_t etc_t:dir { add_name
    \ remove_name write create setattr watch };\nallow container_t etc_t:file
    \ { create setattr unlink write };\nallow container_t etc_t:sock_file { create
    \ unlink };\nallow container_t usr_t:dir { add_name create getattr link lock
    \ read rename remove_name reparent rmdir setattr unlink search write };\n
    allow container_t usr_t:file { append create execute getattr link lock read
    \ rename setattr unlink write };\n\n# Additional rules for container_t\nallow
    \ container_t container_file_t:file { open read write append getattr setattr
    \ };\nallow container_t container_file_t:sock_file watch;\nallow container_t
    \ container_log_t:file { open read write append getattr setattr };\nallow
    \ container_t container_log_t:dir read;\nallow container_t container_share_t:dir
    \ { read write add_name remove_name };\nallow container_t container_share_t:file
    \ { read write create unlink };\nallow container_t container_runtime_exec_t:file
    \ { read execute execute_no_trans open };\nallow container_t container_runtime_t:unix_stream_socket
    \ { connectto read write };\nallow container_t kernel_t:system module_request;\n
    allow container_t container_log_t:dir { read watch };\nallow container_t container_log_t:file
    \ { open read watch };\nallow container_t container_log_t:lnk_file read;\n
    allow container_t var_log_t:dir { add_name write };\nallow container_t var_log_t:file
    \ { create lock open read setattr write };\nallow container_t var_log_t:dir
    \ remove_name;\nallow container_t var_log_t:file unlink;\nallow container_t
    \ proc_t:filesystem associate;\nallow container_t self:bpf map_create;\nallow
    \ container_t self:io_uring sqpoll;\nallow container_t io_uring_t:anon_inode
    \ { create map read write };\nallow container_t container_var_run_t:dir {
    \ add_name remove_name write };\nallow container_t container_var_run_t:file
    \ { create open read rename unlink write };\n"
    path: /root/kube_hetzner_selinux.te

  • content: IA==
    encoding: base64
    path: /etc/rancher/k3s/registries.yaml

  • content: '[main]

    dns=none
    
    '
    

    path: /etc/NetworkManager/conf.d/dns.conf

  • content: 'nameserver 1.1.1.1

    nameserver 8.8.8.8
    
    nameserver 2606:4700:4700::1111
    
    '
    

    path: /etc/resolv.conf
    permissions: '0644'
    ...

@mysticaltech
Collaborator

@maaft Thanks for confirming and trying it out. With that I can work on a permanent fix. Will do ASAP 🙏

@kube-hetzner/maintainers FYI

@mysticaltech
Collaborator

And yes, turning off upgrades would solve that, but it's not necessary; I will create a fix. Probably a systemd unit running a shell script after each reboot to make sure everything is kosher.
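
Something along these lines could serve as that boot-time hook (just a sketch of the idea, not the actual fix; the unit name and the choice of cloud-init modules are made up):

# hypothetical oneshot unit that re-runs the relevant cloud-init modules after every boot
cat <<'EOF' > /etc/systemd/system/reapply-cloud-init.service
[Unit]
Description=Re-apply cloud-init user data after boot (workaround sketch)
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/bin/cloud-init single --name write-files --frequency always
ExecStart=/usr/bin/cloud-init single --name scripts-user --frequency always

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable reapply-cloud-init.service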

@jhass

jhass commented Apr 23, 2024

Forcing a re-run of cloud-init requires commands like cloud-init single --frequency always --name <> with the name value set to write_files, runcmd etc.

However that seems rather brittle because 1. cloud-init files are never updated after node creation (changing user_data has no effect after node creation) and 2. if they were to be updated manually with a provisioner one might forget to add or remove a module from the boot command as the config is changed.

I guess one approach to workaround this somewhat is to reduce the cloud init config to the bare minimum of just invoking a script that does the setup and have that script reprovisioned. That should also help with updating the SELinux rules I guess.

Nonetheless the automatic "atomic" updates somehow rolling back the system prior the first cloud-init run (without rerunning cloud-init since the state for that is in a stateful part of the system) seems like a deeper issue that shouldn't be just hacked around :/ I've had to completely remove and re-add nodes due to this because even re-initializing the node through a combination of forced cloud-init reruns and manually tainting terraform state didn't help since I couldn't recover the kubelet's node key in that state sometimes.

@maaft

maaft commented Apr 23, 2024

@jhass totally agree, the underlying issue has to be fixed

@maaft

maaft commented Apr 29, 2024

@mysticaltech Hey friend! Did you have time to investigate this issue? Anything I can do to help?

@mysticaltech
Collaborator

@maaft Today I will put my head into this, it will be fixed. Will keep you posted 🙏

@mysticaltech
Collaborator

Forcing a re-run of cloud-init requires commands like cloud-init single --frequency always --name <> with the name value set to write_files, runcmd etc.

However that seems rather brittle because 1. cloud-init files are never updated after node creation (changing user_data has no effect after node creation) and 2. if they were to be updated manually with a provisioner one might forget to add or remove a module from the boot command as the config is changed.

I guess one approach to workaround this somewhat is to reduce the cloud init config to the bare minimum of just invoking a script that does the setup and have that script reprovisioned. That should also help with updating the SELinux rules I guess.

Nonetheless the automatic "atomic" updates somehow rolling back the system prior the first cloud-init run (without rerunning cloud-init since the state for that is in a stateful part of the system) seems like a deeper issue that shouldn't be just hacked around :/ I've had to completely remove and re-add nodes due to this because even re-initializing the node through a combination of forced cloud-init reruns and manually tainting terraform state didn't help since I couldn't recover the kubelet's node key in that state sometimes.

@jhass Thanks for sharing this. The nodes are pretty stateless and unchanging apart from the updates. There is indeed a deeper bug in MicroOS, but it will probably get fixed (someone was asked to report it and I tagged the project lead too). But anyway, even if not, running cloud-init after each reboot is a quick "backup" workaround; even though it does not pull the new user data, that is actually not needed, as it pretty much does not change.

@maaft

maaft commented May 1, 2024

there is indeed a deeper bug in MicroOS

Are you sure it is MicroOS? My old clusters (provisioned with a version of this repo ~1.5 years ago) continue to run and upgrade their OS flawlessly.

If the reason were really a bug in MicroOS, all my clusters would be non-functional by now.

.. which would be an absolute disaster, since we're running prod on this. So the possibility that it might be a MicroOS bug gives me some serious shivers, and I'll probably not sleep very well until this is resolved :3

@mysticaltech
Collaborator

@maaft Just turn off automatic upgrades for production until we get to the bottom of this. See the readme. My guess is that there was one upgrade that went wrong, or a particular upgrade edge case that is being hit. So in prod, to have peace of mind, best to turn it off for now. I will update the docs.

@mysticaltech
Collaborator

@maaft If your system is HA, adding --force-reboot to kured_options will protect you from this. As identified by @andi0b, the potential issues are caused by systems that fail to drain and reboot for more than 20 days, after which the old snapshot is gone.

@michailkorenev

michailkorenev commented May 15, 2024

@mysticaltech
I experienced a similar problem on one of the nodes. After the OS upgrade, all the pods on this node lost their Internet connection (except Cilium). The node's network was configured to use a floating IP, but this IP was assigned to a different node. After removing the floating IP config and rebooting the node, network connectivity was restored.

Kube-Hetzner version used: 2.0.2
Network plugin: cilium, strict replacement of kube-proxy
Commands for setting up floating IP:

      CONN_UUID="$(nmcli connection show | grep eth0 | awk -F '  ' '{print $1}')"
      nmcli connection modify "$CONN_UUID" +ipv4.address <FLOATING_IP_ADDRESS>/32
      nmcli connection modify "$CONN_UUID" connection.autoconnect yes
      touch /var/run/reboot-required

Commands for removing floating IP config:

      CONN_UUID="$(nmcli connection show | grep eth0 | awk -F '  ' '{print $1}')"
      nmcli connection modify "$CONN_UUID" -ipv4.address <FLOATING_IP_ADDRESS>/32

@maggie44
Contributor

@mysticaltech, another one here, this time from Prometheus node exporter.

module my_custom_policy 1.0;

require {
        type container_log_t;
        type container_t;
        type container_file_t;
        class dir read;
        class file lock;
}

#============= container_t ==============

#!!!! This avc is a constraint violation.  You would need to modify the attributes of either the source or target types to allow this access.
#Constraint rule: 
#       mlsconstrain file { ioctl read lock execute execute_no_trans } ((h1 dom h2 -Fail-)  or (t1 != mcs_constrained_type -Fail-) ); Constraint DENIED
mlsconstrain file { write setattr append unlink link rename } ((h1 dom h2 -Fail-)  or (t1 != mcs_constrained_type -Fail-) ); Constraint DENIED
mlsconstrain file { create relabelto } ((h1 dom h2 -Fail-)  and (l2 eq h2)  or (t1 != mcs_constrained_type -Fail-) ); Constraint DENIED
mlsconstrain file { relabelfrom } ((h1 dom h2 -Fail-)  or (t1 != mcs_constrained_type -Fail-) ); Constraint DENIED

#       Possible cause is the source level (s0:c573,c577) and target level (s0:c80,c276) are different.
allow container_t container_file_t:file lock;

#!!!! This avc is allowed in the current policy
allow container_t container_log_t:dir read;

@maaft

maaft commented Jul 9, 2024

@mysticaltech Found some more:

module my_custom_policy 1.0;

require {
	type shell_exec_t;
	type systemd_generic_generator_t;
	type snapperd_data_t;
	type bin_t;
	type systemd_fstab_generator_t;
	type udev_var_run_t;
	type passwd_file_t;
	class file { execute execute_no_trans getattr open read };
	class dir { getattr search };
}

#============= systemd_fstab_generator_t ==============
allow systemd_fstab_generator_t snapperd_data_t:dir getattr;

#============= systemd_generic_generator_t ==============
allow systemd_generic_generator_t bin_t:file { execute execute_no_trans };
allow systemd_generic_generator_t passwd_file_t:file { getattr open read };
allow systemd_generic_generator_t shell_exec_t:file execute;
allow systemd_generic_generator_t udev_var_run_t:dir { getattr search };

How can I persist these rules so that new nodepools (and auto-scaled nodes) also get the new rules?

@aleksasiriski
Member

aleksasiriski commented Jul 9, 2024

How can I persist these rules so that new nodepools (and auto-scaled nodes) also get the new rules?

Currently they are persisted when added to the module, but you can do that as well locally until it's merged in the repo.

@aleksasiriski
Member

New ones, required by SigNoz to mount hostfs (/) to read the host metrics:

module my_custom_policy 1.0;

require {
        type fixed_disk_device_t;
        type container_t;
        type removable_device_t;
        class blk_file getattr;
}

#============= container_t ==============

#!!!! This avc can be allowed using the boolean 'container_use_devices'
allow container_t fixed_disk_device_t:blk_file getattr;

#!!!! This avc can be allowed using the boolean 'container_use_devices'
allow container_t removable_device_t:blk_file getattr;

@aleksasiriski
Member

aleksasiriski commented Sep 18, 2024

I believe we should consider disabling SELinux by default, at least for agent nodes. I believe the whole point is to make it secure for one's own cluster, not to include a lot of default rules, since that defeats the purpose. This could be a topic of discussion for the v3.

I plan to open a discussion soon on this repo so we can talk about defaults and behaviors since the rewrite is going very well and is nearing completion.

@mysticaltech
Collaborator

@aleksasiriski You have a point, will run it through o1, see what it says ☺️

@aleksasiriski
Member

aleksasiriski commented Sep 25, 2024

New ones for juicefs:

module my_custom_policy 1.0;

require {
        type container_t;
        type container_var_lib_t;
        class sock_file write;
}

#============= container_t ==============
allow container_t container_var_lib_t:sock_file write;

@maggie44
Contributor

Another one here for node-exporter:

module my_custom_policy 1.0;

require {
	type container_file_t;
	type spc_t;
	type container_t;
	type container_var_run_t;
	class file lock;
	class sock_file write;
	class memprotect mmap_zero;
}

#============= container_t ==============

#!!!! This avc is a constraint violation.  You would need to modify the attributes of either the source or target types to allow this access.
#Constraint rule: 
#	mlsconstrain file { ioctl read lock execute execute_no_trans } ((h1 dom h2 -Fail-)  or (t1 != mcs_constrained_type -Fail-) ); Constraint DENIED
mlsconstrain file { write setattr append unlink link rename } ((h1 dom h2 -Fail-)  or (t1 != mcs_constrained_type -Fail-) ); Constraint DENIED
mlsconstrain file { create relabelto } ((h1 dom h2 -Fail-)  and (l2 eq h2)  or (t1 != mcs_constrained_type -Fail-) ); Constraint DENIED
mlsconstrain file { relabelfrom } ((h1 dom h2 -Fail-)  or (t1 != mcs_constrained_type -Fail-) ); Constraint DENIED

#	Possible cause is the source level (s0:c182,c512) and target level (s0:c555,c658) are different.
allow container_t container_file_t:file lock;
allow container_t container_var_run_t:sock_file write;

#============= spc_t ==============

#!!!! This avc can be allowed using the boolean 'mmap_low_allowed'
allow spc_t self:memprotect mmap_zero;

@maggie44
Contributor

And another one:

module my_custom_policy 1.0;

require {
	type container_var_run_t;
	type container_t;
	class dir write;
}

#============= container_t ==============
allow container_t container_var_run_t:dir write;

This time for the Istio CNI to work.

@aleksasiriski
Member

And another one:

module my_custom_policy 1.0;

require {
	type container_var_run_t;
	type container_t;
	class dir write;
}

#============= container_t ==============
allow container_t container_var_run_t:dir write;

This time for the Istio CNI to work.

Have you disabled all three of the provided CNIs to have Istio or is it running alongside one of the supported ones?

@maggie44
Contributor

And another one:

module my_custom_policy 1.0;

require {
	type container_var_run_t;
	type container_t;
	class dir write;
}

#============= container_t ==============
allow container_t container_var_run_t:dir write;

This time for the Istio CNI to work.

Have you disabled all three of the provided CNIs to have Istio or is it running alongside one of the supported ones?

With Calico

@mysticaltech mysticaltech pinned this issue Nov 7, 2024