Native arm64 builds: loop device woes #522

Open
jakob-tsd opened this issue Nov 19, 2024 · 3 comments · May be fixed by #524

Comments

@jakob-tsd
Contributor

We run debos on arm64 with --disable-fakemachine on our RK3588-based SBC ( https://embedded.cherry.de/product/jaguar-sbc-rk3588/ ). If relevant, recipes and build script are here: https://git.embedded.cherry.de/debos-recipes.git/tree/

We run 4 builds concurrently and it works quite well, thank you!

However, roughly once every 20 builds, we get a loop-device-related failure. Examples:

2024/11/06 14:55:04 apt | Failed to stat /dev/loop0: No such file or directory
2024/11/06 14:55:04 Action `recipe` failed at stage Run, error: exit status 1

2024/10/18 12:00:28 ==== image-partition ====
2024/10/18 12:00:28 parted | Error: Partition(s) on /dev/loop0 are being used.
2024/10/18 12:00:29 Action `image-partition` failed at stage Run, error: exit status 1
2024/10/18 12:00:29 Warning: Failed to get unmount /: device or resource busy
2024/10/18 12:00:29 Unmount failure can cause images being incomplete!

Have you seen something like this already?

@jakob-tsd
Contributor Author

Oh. I think we need this: freddierice/go-losetup@d9566aa

And also, if Attach() fails, we should retry like losetup does:
https://github.com/util-linux/util-linux/blob/4c4b248c68149089c8be2f830214bb2be693307e/sys-utils/losetup.c#L662

jakob-tsd pushed a commit to jakob-tsd/debos that referenced this issue Nov 19, 2024
We were stuck in 2017 (v1.0.0-20170407175016-fc9adea44124).

Related: go-debos#522
@obbardc
Member

obbardc commented Nov 19, 2024

> Oh. I think we need this: freddierice/go-losetup@d9566aa

That should be solved by #523, right?

> And also, if Attach() fails, we should retry like losetup does: https://github.com/util-linux/util-linux/blob/4c4b248c68149089c8be2f830214bb2be693307e/sys-utils/losetup.c#L662

We do something similar when closing the loop device; perhaps you could use that as inspiration?
https://github.com/go-debos/debos/blob/main/actions/image_partition_action.go#L668

I am happy to take fixes around this if it helps your use case, even though --disable-fakemachine really isn't a use case that debos recommends.

@jakob-tsd
Contributor Author

Yes, #523 will pull in the go-losetup fix.

The reason I am using --disable-fakemachine is that I don't have KVM support on the builder right now.

jakob-tsd pushed a commit to jakob-tsd/debos that referenced this issue Dec 2, 2024
losetup.Attach() can fail due to concurrent attaches in other processes
as seen in go-debos#522 .

The problem is a race condition between finding a free loop device
and attaching the image.

Now that we have go-losetup v2, which does report the error, we can do
what util-linux does
( https://github.com/util-linux/util-linux/blob/4c4b248c68149089c8be2f830214bb2be693307e/sys-utils/losetup.c#L662 )
and retry on failure.

I only sleep for 200 ms as opposed to 1 second as in
https://github.com/go-debos/debos/blob/78aad24dc068ec2aac0355c165f760b953379b8f/actions/image_partition_action.go#L668
because the race condition should immediately resolve without waiting
at all.

I still sleep for 200 ms as this is what util-linux does to
prevent spinning ( util-linux/util-linux@3ff6fb8 ).

Fixes: go-debos#522
jakob-tsd linked a pull request Dec 2, 2024 that will close this issue