CRIU Fails to Restore Ubuntu Container in Minikube #2534

Open
binaryBard97 opened this issue Nov 29, 2024 · 8 comments

Comments

@binaryBard97

binaryBard97 commented Nov 29, 2024

Description
I'm encountering an issue while attempting to restore a container running in a local minikube setup. I am following this tutorial, but I am not interested in migration, only in checkpoint and restore on the same host.

Steps to reproduce the issue:

  1. minikube ssh
  2. mkdir cat-rootfs
  3. cd cat-rootfs
  4. debootstrap --arch amd64 focal /containers/cat-rootfs http://archive.ubuntu.com/ubuntu/
  5. cd /home/cat
  6. generate config.json using oci-runtime-tool
  7. jq '.linux.namespaces |= map(select(.type != "network"))' config.json > temp.json && mv temp.json config.json
  8. jq '.root.path = "/containers/cat-rootfs"' /home/cat/config.json > /home/cat/config.json.tmp && mv /home/cat/config.json.tmp /home/cat/config.json
  9. runc checkpoint <>
  10. kill the container
  11. runc restore <>
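
Steps 7 and 8 above amount to two edits of the OCI `config.json`: dropping the network namespace and pointing `root.path` at the debootstrap rootfs. A minimal Python sketch of the same transformation is shown here for readers without `jq`; the function name `prepare_config` and the sample dictionary are illustrative and not part of the original steps.

```python
import copy


def prepare_config(config, rootfs):
    """Return a copy of an OCI runtime config with the network
    namespace removed and root.path set to `rootfs`."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    linux = updated.setdefault("linux", {})
    # Same effect as: jq '.linux.namespaces |= map(select(.type != "network"))'
    linux["namespaces"] = [
        ns for ns in linux.get("namespaces", []) if ns.get("type") != "network"
    ]
    # Same effect as: jq '.root.path = "<rootfs>"'
    updated.setdefault("root", {})["path"] = rootfs
    return updated


# Illustrative input, not the reporter's actual config.json.
sample = {
    "root": {"path": "rootfs"},
    "linux": {"namespaces": [{"type": "pid"}, {"type": "network"}, {"type": "mount"}]},
}
prepared = prepare_config(sample, "/containers/cat-rootfs")
```

Loading the real file with `json.load` and writing the result back with `json.dump` would complete the round trip.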

Describe the results you received:

root@minikube:/home/cat# runc restore cat
ERRO[0000] criu failed: type NOTIFY errno 0
log file: restore.log 

Describe the results you expected:
Restored container

Additional information you deem important (e.g. issue happens only occasionally):
1.

root@minikube:/home/cat# criu restore
Error (criu/protobuf.c:72): Unexpected EOF on (empty-image)
root@minikube:/home/cat# runc --version
runc version 1.1.12-0ubuntu2~22.04.1
spec: 1.0.2-dev
go: go1.21.1
libseccomp: 2.5.3
root@minikube:/home/cat# 

root@minikube:/home/cat/checkpoint# cat /proc/self/cgroup
0::/user.slice/user-0.slice/session-c3.scope
root@minikube:/home/cat/checkpoint# ls 
cgroup.img   descriptors.json  fdinfo-3.img  fs-15.img   inventory.img     mm-15.img           pagemap-1.img   pages-2.img  seccomp.img   tmpfs-dev-155.tar.gz.img  utsns-12.img
core-1.img   dump.log          files.img     ids-1.img   ipcns-var-11.img  mountpoints-13.img  pagemap-15.img  pstree.img   stats-dump    tmpfs-dev-159.tar.gz.img
core-15.img  fdinfo-2.img      fs-1.img      ids-15.img  mm-1.img          netns-10.img        pages-1.img     restore.log  timens-0.img  tmpfs-dev-177.tar.gz.img
root@minikube:/etc/criu# cat runc.conf 
tcp-close
skip-in-flight 

CRIU logs and information:

CRIU full dump/restore logs:

root@minikube:/home/cat/checkpoint# cat restore.log 
(00.000000) Parsing config file /etc/criu/runc.conf
(00.000373) Version: 4.0 (gitid 0)
(00.000388) Running on minikube Linux 6.10.4-linuxkit #1 SMP PREEMPT_DYNAMIC Mon Aug 12 08:48:58 UTC 2024 x86_64
(00.000390) Would overwrite RPC settings with values from /etc/criu/runc.conf
(00.000896) Loaded kdat cache from /run/criu.kdat
(00.001079) Hugetlb size 2 Mb is supported but cannot get dev's number
(00.001872) cpu: x86_family 6 x86_vendor_id GenuineIntel x86_model_id Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz
(00.001907) cpu: fpu: xfeatures_mask 0x5 xsave_size 832 xsave_size_max 832 xsaves_size 832
(00.001917) cpu: fpu: x87 floating point registers     xstate_offsets      0 / 0      xstate_sizes    160 / 160   
(00.001921) cpu: fpu: AVX registers                    xstate_offsets    576 / 576    xstate_sizes    256 / 256   
(00.002201) Parsing config file /etc/criu/runc.conf
(00.002246) Will skip in-flight TCP connections
(00.002261) Will drop all TCP connections on restore
(00.002331) rlimit: RLIMIT_NOFILE unlimited for self
(00.003214) cpu: fpu:1 fxsr:1 xsave:1 xsaveopt:1 xsavec:1 xgetbv1:1 xsaves:0
(00.003310) kernel pid_max=4194304
(00.003321) Reading image tree
(00.004304) Add mnt ns 13 pid 1
(00.004314) Add net ns 10 pid 1
(00.004329) Add pid ns 9 pid 1
(00.004752) pstree pid_max=15
(00.004760) Will restore in 6c020000 namespaces
(00.004781) NS mask to use 6c020000
(00.004862) Collecting 51/56 (flags 3)
(00.004955) No memfd.img image
(00.004960)  `- ... done
(00.004975) Collecting 40/54 (flags 2)
(00.005582) Collected [usr/bin/dash] ID 0x1
(00.005597) Collected [usr/lib/x86_64-linux-gnu/libc.so.6] ID 0x2
(00.005601) Collected [usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2] ID 0x3
(00.005603) Collected [dev/null] ID 0x4
(00.005614) Collected pipe entry ID 0x5 PIPE ID 0x56e73
(00.005649) Found id pipe:[355955] (fd 2) in inherit fd list
(00.005666) Collected pipe entry ID 0x6 PIPE ID 0x56e74
(00.005673) Found id pipe:[355956] (fd 9) in inherit fd list
(00.005676) Collected [.] ID 0x7
(00.005678) Collected [.] ID 0x8
(00.005681) Collected [usr/bin/sleep] ID 0x9
(00.005683) Collected [.] ID 0xa
(00.005684) Collected [.] ID 0xb
(00.005692)  `- ... done
(00.005702) Collecting 46/68 (flags 0)
(00.005730) No remap-fpath.img image
(00.005734)  `- ... done
(00.005782) No apparmor.img image
(00.006555) cg: Preparing cgroups yard (cgroups restore mode 0x4)
(00.007198) cg: Opening .criu.cgyard.dqKsqL as cg yard
(00.007243) cg:         Making controller dir .criu.cgyard.dqKsqL/unified (), type cgroup2
(00.007309) cg: Determined cgroup dir unified/crio/crio-1118ea5d8ac7d93168e2ffbc08f7570ec979724c42d6dfa6456fe86dbfae565b already exist
(00.007312) cg: Skip restoring properties on cgroup dir unified/crio/crio-1118ea5d8ac7d93168e2ffbc08f7570ec979724c42d6dfa6456fe86dbfae565b
(00.007586) Running pre-restore scripts
(00.007601)     RPC
(00.007760) cg: cgroud: Daemon started
(00.008136) net: Saved netns fd for links restore
(00.010524) mnt: Reading mountpoint images (id 13 pid 1)
(00.010593) mnt:                Will mount 597 from /
(00.010610) mnt:                Will mount 597 @ /tmp/.criu.mntns.TUqdaZ/mnt-0000000597 /sys/firmware
(00.010613) mnt:        Read 597 mp @ /sys/firmware
(00.010618) mnt:                Will mount 596 from /dev/null (E)
(00.010620) mnt:                Will mount 596 @ /tmp/.criu.mntns.TUqdaZ/mnt-0000000596 /proc/timer_list
(00.010621) mnt:        Read 596 mp @ /proc/timer_list
(00.010624) mnt:                Will mount 595 from /dev/null (E)
(00.010625) mnt:                Will mount 595 @ /tmp/.criu.mntns.TUqdaZ/mnt-0000000595 /proc/keys
(00.010626) mnt:        Read 595 mp @ /proc/keys
(00.010633) mnt:                Will mount 594 from /dev/null (E)
(00.010634) mnt:                Will mount 594 @ /tmp/.criu.mntns.TUqdaZ/mnt-0000000594 /proc/kcore
(00.010636) mnt:        Read 594 mp @ /proc/kcore
(00.010638) mnt:                Will mount 593 from /
(00.010639) mnt:                Will mount 593 @ /tmp/.criu.mntns.TUqdaZ/mnt-0000000593 /proc/acpi
(00.010641) mnt:        Read 593 mp @ /proc/acpi
(00.010643) mnt:                Will mount 592 from /sysrq-trigger
(00.010644) mnt:                Will mount 592 @ /tmp/.criu.mntns.TUqdaZ/mnt-0000000592 /proc/sysrq-trigger
(00.010646) mnt:        Read 592 mp @ /proc/sysrq-trigger
(00.010674) mnt:                Will mount 591 from /sys
(00.010678) mnt:                Will mount 591 @ /tmp/.criu.mntns.TUqdaZ/mnt-0000000591 /proc/sys
(00.010679) mnt:        Read 591 mp @ /proc/sys
(00.010681) mnt:                Will mount 590 from /irq
(00.010683) mnt:                Will mount 590 @ /tmp/.criu.mntns.TUqdaZ/mnt-0000000590 /proc/irq
(00.010684) mnt:        Read 590 mp @ /proc/irq
(00.010700) mnt:                Will mount 589 from /fs
(00.010704) mnt:                Will mount 589 @ /tmp/.criu.mntns.TUqdaZ/mnt-0000000589 /proc/fs
(00.010705) mnt:        Read 589 mp @ /proc/fs
(00.010707) mnt:                Will mount 588 from /bus
(00.010709) mnt:                Will mount 588 @ /tmp/.criu.mntns.TUqdaZ/mnt-0000000588 /proc/bus
(00.010710) mnt:        Read 588 mp @ /proc/bus
(00.010713) Error (criu/mount.c:3130): mnt: No mapping for 822:(null) mountpoint
(00.014951) Error (criu/cgroup.c:1998): cg: cgroupd: recv req error: No such file or directory
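
In a log of this length the failing step is easier to spot by filtering for `Error` lines. The following is a small sketch, assuming the line layout seen above (`(timestamp) Error (source:line): message`); the helper name `criu_errors` is hypothetical and not part of CRIU.

```python
import re

# Matches lines such as:
# (00.010713) Error (criu/mount.c:3130): mnt: No mapping for 822:(null) mountpoint
ERROR_LINE = re.compile(
    r"^\((?P<ts>\d+\.\d+)\) Error \((?P<src>[^)]+)\): (?P<msg>.*)$"
)


def criu_errors(log_text):
    """Return (timestamp, source location, message) for each Error line."""
    errors = []
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match:
            errors.append((float(match["ts"]), match["src"], match["msg"]))
    return errors


# Three lines copied from the restore.log above.
sample_log = """\
(00.010710) mnt:        Read 588 mp @ /proc/bus
(00.010713) Error (criu/mount.c:3130): mnt: No mapping for 822:(null) mountpoint
(00.014951) Error (criu/cgroup.c:1998): cg: cgroupd: recv req error: No such file or directory
"""
found = criu_errors(sample_log)
```

The first entry is usually the one to investigate; later errors, such as the cgroupd one here, may only be fallout from the earlier failure.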

Output of `criu --version`:

root@minikube:/home/cat# criu --version
Version: 4.0

Output of `criu check --all`:

root@minikube:/home/cat/checkpoint# criu check --all
Warn  (criu/cr-check.c:824): Dirty tracking is OFF. Memory snapshot will not work.
Error (criu/cr-check.c:1225): UFFD is not supported
Error (criu/cr-check.c:1225): UFFD is not supported
Warn  (criu/cr-check.c:1348): Nftables based locking requires libnftables and set concatenations support
Looks good but some kernel features are missing
which, depending on your process tree, may cause
dump or restore failure.

Additional environment details:
1. minikube start --driver=docker --network=host --container-runtime=cri-o --extra-config=kubelet.feature-gates=ContainerCheckpoint=true

root@minikube:/home#  stat -fc %T /sys/fs/cgroup/
cgroup2fs
root@minikube:/home/cat/checkpoint# grep -i "596" dump.log
(00.111412)     type tmpfs source tmpfs mnt_id 596 s_dev 0x9b /null @ ./proc/timer_list flags 0x1000002 options size=65536k,mode=755
(00.111455) mnt:                Working on 596->810
(00.111516) mnt:        Resorting children of 596 in mount order
(00.111565) mnt:   [./proc/timer_list](596->810)
(00.111596) mnt:        The mount 595 is bind for 596 (@./proc/keys -> @./proc/timer_list)
(00.111599) mnt:        The mount 594 is bind for 596 (@./proc/kcore -> @./proc/timer_list)
(00.111600) mnt:        The mount 811 is bind for 596 (@./dev -> @./proc/timer_list)
(00.111654) mnt: Inspecting sharing on 596 shared_id 0 master_id 0 (@./proc/timer_list)
(00.236983) mnt:        596: 9b:/null @ ./proc/timer_list
@adrianreber
Member

Why do you grep for 596? I am not sure what you are trying to say with the last step.

The tutorial you are following was written by myself but that was a couple of years ago.

To find out which mountpoint is causing the problem, I think the following command could help: run crit show mountpoints-13.img in the checkpoint/ directory and list the information related to ID 822.

What exactly does minikube do? Are you running the runc container in a Kubernetes container?

@binaryBard97
Author

Thank you so much for the reply.

  1. I ran grep for 596 to check if there were any errors related to 596 during the checkpointing process. I originally intended to check for ID 822 but accidentally referenced an old restore.log.

Latest restore log:

(00.428744) Error (criu/mount.c:3130): mnt: No mapping for 762:(null) mountpoint
(00.429596) Error (criu/cgroup.c:1970): cg: cgroupd: recv req error: No such file or directory

I then ran crit show mountpoints-13.img and found the following info related to ID 762:

{
         "fstype": 5,
         "mnt_id": 762,
         "root_dev": 149,
         "parent_mnt_id": 749,
         "flags": 2097153,
         "root": "/",
         "mountpoint": "/var/run/secrets/kubernetes.io/serviceaccount",
         "source": "tmpfs",
         "options": "size=4008620k,noswap",
         "shared_id": 0,
         "master_id": 0,
         "sb_flags": 0,
         "ext_key": "/var/run/secrets/kubernetes.io/serviceaccount"
     },
  2. Minikube is my local setup for running containers. My main goal is to implement checkpoint/restore in Kubernetes. Since my container runtime is CRI-O (and underneath that is runc), I figured I would be able to follow what you did in the tutorial, and that’s why I’ve been using the runc restore command. 😅
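
The lookup for ID 762 above can also be scripted. This is a minimal sketch, assuming the `crit show` output is JSON with a top-level `entries` list (an assumption about the output layout, which is not shown in full in this thread); the helper names are hypothetical. It also lists entries that carry an `ext_key`, which are likely the mounts CRIU treated as external at dump time and for which it expects a mapping at restore time.

```python
import json


def load_entries(crit_json_text):
    """Parse `crit show` output and return its list of mount entries."""
    return json.loads(crit_json_text).get("entries", [])


def find_mount(entries, mnt_id):
    """Return the entry with the given mnt_id, or None if it is absent."""
    for entry in entries:
        if entry.get("mnt_id") == mnt_id:
            return entry
    return None


def external_mounts(entries):
    """Return entries with a non-empty ext_key (assumed to be external mounts)."""
    return [entry for entry in entries if entry.get("ext_key")]


# The first entry is trimmed from the output above; the second is hypothetical.
sample = json.dumps({
    "entries": [
        {
            "mnt_id": 762,
            "parent_mnt_id": 749,
            "mountpoint": "/var/run/secrets/kubernetes.io/serviceaccount",
            "ext_key": "/var/run/secrets/kubernetes.io/serviceaccount",
        },
        {"mnt_id": 749, "parent_mnt_id": 700, "mountpoint": "/"},
    ]
})
entries = load_entries(sample)
failing = find_mount(entries, 762)
needs_mapping = [entry["mountpoint"] for entry in external_mounts(entries)]
```

With a real checkpoint, the input would come from something like `crit show mountpoints-13.img > mountpoints.json`.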

@adrianreber
Member

> My main goal is to implement checkpoint/restore in Kubernetes.

Try to use the Kubernetes API endpoints for it. The tutorial you found is only about runc, not in combination with Kubernetes.

I am still very confused about what you are trying to do. Have you seen this: https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/ ? It should describe all the steps you need to checkpoint and restore a container with Kubernetes.
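
For reference, the linked blog post triggers a checkpoint with a POST to the kubelet at `/checkpoint/<namespace>/<pod>/<container>`. The sketch below only builds that request and does not send it. The helper name is hypothetical, the default port 10250 is the usual kubelet port, and the namespace, pod, and container names are placeholder values. A real call also needs the client certificate and key that the blog post passes to `curl`.

```python
from urllib.parse import quote
from urllib.request import Request


def checkpoint_request(namespace, pod, container, host="localhost", port=10250):
    """Build (but do not send) a kubelet checkpoint request."""
    # Percent-encode each path segment so unusual names cannot break the URL.
    path = "/".join(quote(part, safe="") for part in (namespace, pod, container))
    url = f"https://{host}:{port}/checkpoint/{path}"
    return Request(url, method="POST")


# Placeholder names; substitute the values from your own pod manifest.
req = checkpoint_request("counter", "test", "test-container")
```

Sending the request would additionally require an `ssl.SSLContext` loaded with the cluster's CA and client credentials, which is left out here on purpose.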

@binaryBard97
Author

Yes, I did see that article, and I was able to restore the container using buildah, but the issue I am facing stems from how the container's state is being restored. I too am running a counter app like the one you demoed at some conferences. With buildah ... I am not seeing the expected behavior, i.e. the counter picking up from where it left off.

thank you.

@adrianreber
Member

I am using the following image: quay.io/adrianreber/counter. Try that one; it is known to work.

Which one are you using? I can try it with your image to see if it would work for me.

@binaryBard97
Author

  1. I will try that.
  2. I am using a simple Ubuntu image and have a pod manifest like:
apiVersion: v1
kind: Pod
metadata:
  name: test
  namespace: counter
spec:
  containers:
    - name: test-container
      image: docker.io/ubuntu:22
      command:
        [
          "bin/bash",
          "-c",
          "i=1; while true; do echo Hello World $i; i=$((i+1)); sleep 1; done"
        ]

thanks

@adrianreber
Member

Ah, maybe we never tested a container with a command specified in the yaml file. Maybe the console handling also does not work.

@rst0git
Member

rst0git commented Dec 3, 2024

> maybe we never tested a container with a command specified in the yaml file

It works, we did similar experiments with @viktoriaas.

> Maybe the console handling also does not work.

The restore part was recently fixed in CRI-O with cri-o/cri-o#8290
