Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(multi-device) bcachefs mount fails on boot since bcachefs-tools 1.9.0 #308

Open
Flowdalic opened this issue Jun 27, 2024 · 19 comments
Open

Comments

@Flowdalic
Copy link

Flowdalic commented Jun 27, 2024

Starting with bcachefs-tools 1.9.0 my bcachefs mount fails on boot (i.e., with systemd mount unit), but suceeeds if I manually mount it.

I have the following fstab entry

UUID=acfd231c-8554-42dd-a69f-01a648984d01	/data-scratch   bcachefs	defaults	0 2

which works fine with bcachefs-tools 1.7.0. However, as soon s I upgrade to 1.9.0 or 1.9.1, the mount fails on boot. Unfortunately, there seems to be no error message in the journal:

Jun 27 07:41:58 neo-pc.sch systemd[1]: Mounting /data-scratch...
Jun 27 07:41:58 neo-pc.sch (udev-worker)[1072]: md126: Process 'bcache-register /dev/md126' failed with exit code 1.
Jun 27 07:41:58 neo-pc.sch systemd-networkd[1061]: eno1: Link UP
Jun 27 07:41:58 neo-pc.sch systemd[1]: data\x2dscratch.mount: Mount process exited, code=exited, status=1/FAILURE
Jun 27 07:41:58 neo-pc.sch systemd[1]: data\x2dscratch.mount: Failed with result 'exit-code'.
Jun 27 07:41:58 neo-pc.sch systemd[1]: Failed to mount /data-scratch.
Jun 27 07:41:58 neo-pc.sch systemd[1]: Dependency failed for Local File Systems.

is all I got. Note that this is a system with a bcache block device and a bcachefs filesystem, both are separate (of course). Not sure if this matters.

Can reliable reproduce with 1.9.0 and 1.9.1, problem disappears always once I downgrade back to 1.7.0.

Downstream issue report: https://bugs.gentoo.org/938596

@tmuehlbacher
Copy link

Do you have the means to add -v to bcachefs mount in your setup? Otherwise a bisect would be useful but it looks like that would not be possible to automate in terms of testing, so probably a bit of work.

@Flowdalic
Copy link
Author

Do you have the means to add -v to bcachefs mount in your setup?

Unfortunately I see no obvious way to do so with systemd's mount unit. :/

That said, I was surprised that bcachefs-tools seems to exit with a non-zero exit status without emitting an error message. Is this expected behavior?

Any other ideas on how to approach the issue? I ran out of ideas…

@tmuehlbacher
Copy link

It looks like it already fails with the bcache-register, not even bcachefs.

Can you try if it's better with #309 if you set BCACHEFS_LOG=trace for the environment of that service? The env variable is introduced with that PR.

@Flowdalic
Copy link
Author

FWIW, this also happens with 1.11.0

Can you try if it's better with #309 if you set BCACHEFS_LOG=trace for the environment of that service? The env variable is introduced with that PR.

You mean data\x2dscratch.mount (data-scratch.mount)? It's a bit tricky to set environment variables to .mount units, but I'll give it a try.

@Flowdalic
Copy link
Author

It looks like it already fails with the bcache-register, not even bcachefs.

It sure does, but how does a bcache command have an impact on a bcachefs mount? It would match my suspicion that this is related to my dual bcache / bcachefs setup (and the feeling that not many users are affected by this, since they probably have an exclusive bcachefs setup).

In any case, here is the trace output (in reverse order, due to journalctl -r being used):

Unsuccessful mount using bcachefs-tools 1.11.0:

Aug 27 13:09:06 neo-pc.sch systemd[1]: Failed to mount /data-scratch.
Aug 27 13:09:06 neo-pc.sch systemd[1]: data\x2dscratch.mount: Failed with result 'exit-code'.
Aug 27 13:09:06 neo-pc.sch systemd[1]: data\x2dscratch.mount: Mount process exited, code=exited, status=1/FAILURE
Aug 27 13:09:06 neo-pc.sch systemd[1]: Mounting /data-scratch...
Aug 27 13:09:06 neo-pc.sch systemd[1]: data\x2dscratch.mount: Directory /data-scratch to mount over is not empty, mounting anyway.
Aug 27 13:09:08 neo-pc.sch mount[1303]: Error: Invalid argument
Aug 27 13:09:08 neo-pc.sch mount[1303]: [INFO  src/commands/mount.rs:43] mounting filesystem
Aug 27 13:09:08 neo-pc.sch mount[1303]: [DEBUG src/commands/mount.rs:74] parsing mount options: rw
Aug 27 13:09:08 neo-pc.sch mount[1303]: [INFO  src/commands/mount.rs:303] mounting with params: device: /dev/sdb5:/dev/nvme0n1p2, target: /data-scratch, options: rw
Aug 27 13:09:08 neo-pc.sch mount[1303]: [DEBUG src/commands/mount.rs:216] enumerating devices with UUID acfd231c-8554-42dd-a69f-01a648984d01
Aug 27 13:09:08 neo-pc.sch mount[1303]: [DEBUG src/commands/mount.rs:143] Walking udev db!

Successful mount using bcachefs-tools 1.7.0:

Aug 27 13:11:35 neo-pc.sch systemd[1]: Mounted /data-scratch.
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/md127): error reading superblock: Not a bcachefs superblock layout
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/md127): error reading default superblock: Not a bcachefs superblock (got magic cf84833c-411e-3c24-0000-000000000000)
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/md126): error reading superblock: Not a bcachefs superblock layout
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/md126): error reading default superblock: Unsupported superblock version 0.1: (unknown version) (min 0.9: (unknown version), max 1.7: mi_btree_bitmap)
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/dm-0): error reading superblock: Not a bcachefs superblock layout
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/dm-0): error reading default superblock: Not a bcachefs superblock (got magic f9638b04-0800-0000-003e-000069000000)
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/bcache0): error reading superblock: Not a bcachefs superblock layout
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/bcache0): error reading default superblock: Not a bcachefs superblock (got magic 9ed60388-0300-82fc-0000-000000000000)
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/nvme0n1p4): error reading superblock: Not a bcachefs superblock layout
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/nvme0n1p4): error reading default superblock: Not a bcachefs superblock (got magic 00100000-0000-0000-00f0-0f0000000000)
Aug 27 13:11:33 neo-pc.sch systemd[1]: Mounting /data-scratch...
Aug 27 13:11:33 neo-pc.sch systemd[1]: data\x2dscratch.mount: Directory /data-scratch to mount over is not empty, mounting anyway.
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/nvme0n1p3): error reading superblock: Not a bcachefs superblock layout
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/nvme0n1p3): error reading default superblock: Unsupported superblock version 0.3: (unknown version) (min 0.9: (unknown version), max 1.7: mi_btree_bitmap)
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/nvme0n1p1): error reading superblock: Not a bcachefs superblock layout
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/nvme0n1p1): error reading default superblock: Not a bcachefs superblock (got magic e016f974-9d7f-0000-7700-000005000000)
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/nvme0n1): error reading superblock: Not a bcachefs superblock layout
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/nvme0n1): error reading default superblock: Not a bcachefs superblock (got magic 00000000-0000-0000-0000-000000000000)
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sdd2): error reading superblock: Not a bcachefs superblock layout
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sdd2): error reading default superblock: Not a bcachefs superblock (got magic 00100000-0000-0000-00f0-0f0000000000)
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sdd1): error reading superblock: Not a bcachefs superblock layout
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sdd1): error reading default superblock: Not a bcachefs superblock (got magic 00000000-0000-0000-0000-000000000000)
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sdd): error reading superblock: Not a bcachefs superblock layout
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sdd): error reading default superblock: Not a bcachefs superblock (got magic 00000000-0000-0000-0000-000000000000)
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sdc2): error reading superblock: Not a bcachefs superblock layout
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sdc2): error reading default superblock: Not a bcachefs superblock (got magic 00100000-0000-0000-00f0-0f0000000000)
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sdc1): error reading superblock: Not a bcachefs superblock layout
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sdc1): error reading default superblock: Not a bcachefs superblock (got magic 00000000-0000-0000-0000-000000000000)
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sdc): error reading superblock: Not a bcachefs superblock layout
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sdc): error reading default superblock: Not a bcachefs superblock (got magic 00000000-0000-0000-0000-000000000000)
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sdb4): error reading superblock: Not a bcachefs superblock layout
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sdb4): error reading default superblock: Not a bcachefs superblock (got magic 00100000-0000-0000-00f0-0f0000000000)
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sdb3): error reading superblock: Not a bcachefs superblock layout
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sdb3): error reading default superblock: Not a bcachefs superblock (got magic 7af5a101-9321-b56d-6c6f-63616c686f73)
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sdb2): error reading superblock: Not a bcachefs superblock layout
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sdb2): error reading default superblock: Not a bcachefs superblock (got magic 2bbd3475-f730-28f7-6c6f-63616c686f73)
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sdb1): error reading superblock: Not a bcachefs superblock layout
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sdb1): error reading default superblock: Not a bcachefs superblock (got magic 00000000-0000-0000-0000-000000000000)
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sdb): error reading superblock: Not a bcachefs superblock layout
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sdb): error reading default superblock: Not a bcachefs superblock (got magic 00000000-0000-0000-0000-000000000000)
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sda4): error reading superblock: Not a bcachefs superblock layout
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sda4): error reading default superblock: Not a bcachefs superblock (got magic 00100000-0000-0000-00f0-0f0000000000)
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sda3): error reading superblock: Not a bcachefs superblock layout
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sda3): error reading default superblock: Not a bcachefs superblock (got magic 7af5a101-9321-b56d-6c6f-63616c686f73)
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sda2): error reading superblock: Not a bcachefs superblock layout
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sda2): error reading default superblock: Not a bcachefs superblock (got magic 2bbd3475-f730-28f7-6c6f-63616c686f73)
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sda1): error reading superblock: Not a bcachefs superblock layout
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sda1): error reading default superblock: Not a bcachefs superblock (got magic 00000000-0000-0000-0000-000000000000)
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sda): error reading superblock: Not a bcachefs superblock layout
Aug 27 13:11:35 neo-pc.sch mount[1281]: bcachefs (/dev/sda): error reading default superblock: Not a bcachefs superblock (got magic 00000000-0000-0000-0000-000000000000)

csfore added a commit to csfore/gentoo that referenced this issue Aug 28, 2024
- 1.11.0 builds with sys-libs/musl-1.2.5 so 1.4.0 can be dropped
- 1.9.4 is stable so <1.9.4, sans 1.7.0, can be dropped as well

See-also: koverstreet/bcachefs-tools#308
Signed-off-by: Christopher Fore <[email protected]>
@Flowdalic
Copy link
Author

FWIW, I have the issue on a second system which does not use bcache, too. Therefore, the issue is unrelated to the existence of a bcache block device on the system. It know occurred to me that it could be the form of the used fstab entry, which is

UUID=acfd231c-8554-42dd-a69f-01a648984d01	/data-scratch   bcachefs	defaults	0 2

Is the the idiomatic format for a multi-device fstab entry with systemd?

@koverstreet
Copy link
Owner

No, mount tries to resolve the UUID= line to a specific block device, which is not how we do things.

Since we now have scanning in userspace we might be able to make that work now, but it'd need to be looked at.

Try OLD_BCACHEFS_UUID=; that was the workaround we added awhile back.

@Flowdalic
Copy link
Author

Thanks for the reply.

Try OLD_BCACHEFS_UUID=; that was the workaround we added awhile back.

I replaced UUID with OLD_BCACHEFS_UUID in the fstab entry and upgrade bcachefs-tools from 1.7.0 to 1.11.0. However, the result was the same: the data\x2dscratch.mount unit fails and systemd presents me with a prompt to either enter the root password or to press Ctrl-D to continue.

I usually enter the root password, afterwards I am presented with a shell, where I enter mount /data-scratch, which succeeds. I can then continue the boot without issues via systemctl isolate graphical.target and downgrade bcachefs-tools to 1.7.0 again.

However, this time, I somehow simply hit Ctrl-D. To my surprise, the system booted normally and the bcachefs volume was successfully mounted.

Here is the relevant part of the system's journa with BCACHEFS_LOG=trace enabled for the mount unit:

Oct 06 17:44:54 neo-pc.sch systemd[1]: Mounting /data-scratch...
Oct 06 17:44:55 neo-pc.sch systemd[1]: data\x2dscratch.mount: Mount process exited, code=exited, status=1/FAILURE
Oct 06 17:44:55 neo-pc.sch systemd[1]: data\x2dscratch.mount: Failed with result 'exit-code'.
Oct 06 17:44:55 neo-pc.sch systemd[1]: Failed to mount /data-scratch.
Oct 06 17:44:55 neo-pc.sch systemd[1]: Dependency failed for Local File Systems.
Oct 06 17:44:55 neo-pc.sch systemd[1]: local-fs.target: Job local-fs.target/start failed with result 'dependency'.
Oct 06 17:44:55 neo-pc.sch systemd[1]: local-fs.target: Triggering OnFailure= dependencies.

<Ctrl-D>

Oct 06 17:45:32 neo-pc.sch systemd[1]: Mounting /data-scratch...
Oct 06 17:45:32 neo-pc.sch mount[1509]: [DEBUG src/commands/mount.rs:143] Walking udev db!
Oct 06 17:45:32 neo-pc.sch mount[1509]: [DEBUG src/commands/mount.rs:216] enumerating devices with UUID acfd231c-8554-42dd-a69f-01a648984d01
Oct 06 17:45:33 neo-pc.sch mount[1509]: [INFO  src/commands/mount.rs:303] mounting with params: device: /dev/sda5:/dev/sdb5:/dev/nvme0n1p2, target: /data-scratch, options: rw
Oct 06 17:45:33 neo-pc.sch mount[1509]: [DEBUG src/commands/mount.rs:74] parsing mount options: rw
Oct 06 17:45:33 neo-pc.sch mount[1509]: [INFO  src/commands/mount.rs:43] mounting filesystem
Oct 06 17:45:33 neo-pc.sch kernel: bcachefs (acfd231c-8554-42dd-a69f-01a648984d01): mounting version 1.7: mi_btree_bitmap opts=foreground_target=nvme,background_target=hdd,promote_target=nvme
Oct 06 17:45:33 neo-pc.sch kernel: bcachefs (acfd231c-8554-42dd-a69f-01a648984d01): recovering from clean shutdown, journal seq 25673455
Oct 06 17:45:33 neo-pc.sch kernel: bcachefs (acfd231c-8554-42dd-a69f-01a648984d01): alloc_read... done
Oct 06 17:45:33 neo-pc.sch kernel: bcachefs (acfd231c-8554-42dd-a69f-01a648984d01): stripes_read... done
Oct 06 17:45:33 neo-pc.sch kernel: bcachefs (acfd231c-8554-42dd-a69f-01a648984d01): snapshots_read... done
Oct 06 17:45:33 neo-pc.sch kernel: bcachefs (acfd231c-8554-42dd-a69f-01a648984d01): journal_replay... done
Oct 06 17:45:33 neo-pc.sch kernel: bcachefs (acfd231c-8554-42dd-a69f-01a648984d01): resume_logged_ops... done
Oct 06 17:45:33 neo-pc.sch kernel: bcachefs (acfd231c-8554-42dd-a69f-01a648984d01): going read-write
Oct 06 17:45:33 neo-pc.sch systemd[1]: Mounted /data-scratch.

Interestingly, we only find bcachefs-tools trace logs in the second mount attempt, but not in the first. It appears as the systemd mount unit doesn't invoke bcachefs-tools when initially executing the mount unit, but somehow on the second time. I am not sure what exactly is going on, and the initially mount failing without any further error messages doesn't help much.

@Flowdalic
Copy link
Author

There is a similar report as mine at systemd/systemd#8234 (comment)

However,, this issue report is about a regression after bcachefs-tools 1.7.0. The system boots fine with bcachefs-tools 1.7.0, regardless of whether UUID or OLD_BLK_UUID is used in the fstab entry. And it fails to boot as soon as any version later than 1.7.0 is installed (again, regardless of UUID or OLD_BLK_UUID).

Try OLD_BCACHEFS_UUID=; that was the workaround we added awhile back.

Just to verify: you meant OLD_BLK_UUID here, right? Does it really matter if UUID or OLD_BLK_UUID is used, given that bcachefs-tools treat them equal at

if let Some(("UUID" | "OLD_BLKID_UUID", uuid)) = cli.dev.split_once('=') {
devs_str_sbs_from_uuid(&udev_info, uuid)?

?

@Flowdalic
Copy link
Author

Flowdalic commented Oct 25, 2024

I managed to debug this issue further by hooking the mount.bcachefs invocation with a script that prints the used arguments and invokes the actual mount.bcachefs with strace. In short, setting BCACHEFS_BLOCK_SCAN for the mount unit fixes this for me. However, I am not sure if this is working as intended.

Remember that since bcachefs-tools 1.9, the system fails to mount the bcachefs volume on start. However, after dropping into the recovery shell and just issuing systemctl isolate graphical, the the mount succeeds immediately and the system boots without issues.

The following is an excerpt of the log of the first mount failing using bcachefs-tools 1.13.0, sda5 is a part of a multi-device bcachefs volume:

Oct 25 09:13:55 myhost mount[1248]: /sbin/mount.bcachefs OLD_BLKID_UUID=acfd231c-8554-42dd-a69f-01a648984d01 /data-scratch -o rw
...
Oct 25 09:13:56 myhost mount[1263]: [pid  1270] openat(AT_FDCWD, "/sys/devices/pci0000:00/0000:00:17.0/ata2/host1/target1:0:0/1:0:0:0/block/sda/sda5/uevent", O_RDONLY|O_NOCTTY|O_CLOEXEC) = 5
Oct 25 09:13:56 myhost mount[1263]: [pid  1270] fstat(5, {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
Oct 25 09:13:56 myhost mount[1263]: [pid  1270] read(5, "MAJOR=8\nMINOR=5\nDEVNAME=sda5\nDEV"..., 4104) = 65
Oct 25 09:13:56 myhost mount[1263]: [pid  1270] close(5)                    = 0
Oct 25 09:13:56 myhost mount[1263]: [pid  1270] getrandom("\xab\x68\xbd\x3a\x3f\x18\xd9\x6c\x5f\xe5\x73\x2f\x9a\xa3\xe6\xa5", 16, GRND_INSECURE) = 16
Oct 25 09:13:56 myhost mount[1263]: [pid  1270] openat(AT_FDCWD, "/run/udev/data/b8:5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
Oct 25 09:13:56 myhost mount[1263]: [pid  1270] openat(AT_FDCWD, "/run/udev/data/b8:5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
Oct 25 09:13:56 myhost mount[1263]: [pid  1270] openat(AT_FDCWD, "/run/udev/data/b8:5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
Oct 25 09:13:56 myhost mount[1263]: [pid  1270] openat(AT_FDCWD, "/run/udev/data/b8:5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
Oct 25 09:13:56 myhost mount[1263]: [pid  1270] openat(AT_FDCWD, "/run/udev/data/b8:5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
Oct 25 09:13:56 myhost mount[1263]: [pid  1270] openat(AT_FDCWD, "/run/udev/data/b8:5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
Oct 25 09:13:56 myhost mount[1263]: [pid  1270] openat(AT_FDCWD, "/run/udev/data/b8:5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
Oct 25 09:13:56 myhost mount[1263]: [pid  1270] openat(AT_FDCWD, "/run/udev/data/b8:5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
Oct 25 09:13:56 myhost mount[1263]: [pid  1270] openat(AT_FDCWD, "/run/udev/data/b8:5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
Oct 25 09:13:56 myhost mount[1263]: [pid  1270] openat(AT_FDCWD, "/", O_RDONLY|O_CLOEXEC|O_PATH|O_DIRECTORY) = 5
Oct 25 09:13:56 myhost mount[1263]: [pid  1270] openat(5, "..", O_RDONLY|O_CLOEXEC|O_PATH|O_DIRECTORY) = 6

The same part, now succeeding on the second mount.

Oct 25 09:15:24 myhost mount[1494]: /sbin/mount.bcachefs OLD_BLKID_UUID=acfd231c-8554-42dd-a69f-01a648984d01 /data-scratch -o rw
...
Oct 25 09:15:24 myhost mount[1497]: [pid  1500] readlinkat(AT_FDCWD, "/sys/devices/pci0000:00/0000:00:17.0/ata2/host1/target1:0:0/1:0:0:0/block/sda/sda5/subsystem", "../../../../../../../../../../cl"..., 4096) = 41
Oct 25 09:15:24 myhost mount[1497]: [pid  1500] openat(AT_FDCWD, "/sys/devices/pci0000:00/0000:00:17.0/ata2/host1/target1:0:0/1:0:0:0/block/sda/sda5/uevent", O_RDONLY|O_NOCTTY|O_CLOEXEC) = 5
Oct 25 09:15:24 myhost mount[1497]: [pid  1500] fstat(5, {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
Oct 25 09:15:24 myhost mount[1497]: [pid  1500] read(5, "MAJOR=8\nMINOR=5\nDEVNAME=sda5\nDEV"..., 4104) = 65
Oct 25 09:15:24 myhost mount[1497]: [pid  1500] close(5)                    = 0
Oct 25 09:15:24 myhost mount[1497]: [pid  1500] getrandom("\x0a\xd0\x49\xd1\xa8\x4f\x7d\x3d\xde\xa2\x41\x50\x75\xc2\x62\xf3", 16, GRND_INSECURE) = 16
Oct 25 09:15:24 myhost mount[1497]: [pid  1500] openat(AT_FDCWD, "/run/udev/data/b8:5", O_RDONLY|O_CLOEXEC) = 5
Oct 25 09:15:24 myhost mount[1497]: [pid  1500] fstat(5, {st_mode=S_IFREG|0644, st_size=2620, ...}) = 0
Oct 25 09:15:24 myhost mount[1497]: [pid  1500] fstat(5, {st_mode=S_IFREG|0644, st_size=2620, ...}) = 0
Oct 25 09:15:24 myhost mount[1497]: [pid  1500] read(5, "S:disk/by-label/neo-data-scratch"..., 4096) = 2620
Oct 25 09:15:24 myhost mount[1497]: [pid  1500] read(5, "", 4096)           = 0
Oct 25 09:15:24 myhost mount[1497]: [pid  1500] close(5)                    = 0
Oct 25 09:15:24 myhost mount[1497]: [pid  1500] getrandom("\x51\xdb\x98\x75\x72\xb2\x79\x62\x5d\xd0\x76\xb7\xd0\xd6\xa7\xf5", 16, GRND_INSECURE) = 16
Oct 25 09:15:24 myhost mount[1497]: [pid  1500] getrandom("\x31\xa6\xb0\x38\xd2\x63\xef\x89\x7c\x57\xf9\x9f\x68\x83\x33\xbc", 16, GRND_INSECURE) = 16
Oct 25 09:15:24 myhost mount[1497]: [pid  1500] getrandom("\xa7\x5c\x68\x44\x3c\x2e\xbe\x6f\x2c\x64\xc8\x82\x8a\x6d\xb8\x87", 16, GRND_INSECURE) = 16
Oct 25 09:15:24 myhost mount[1497]: [pid  1500] getrandom("\x8d\x0b\xd3\x81\x30\x38\x57\xfb\x3f\xee\x68\x6b\x30\x2f\x53\xc9", 16, GRND_INSECURE) = 16
Oct 25 09:15:24 myhost mount[1497]: [pid  1500] getrandom("\x5b\x7e\x2c\x06\x1e\x28\x44\x19\xc9\xb4\xaa\xb1\x68\x0a\x93\x7a", 16, GRND_INSECURE) = 16
Oct 25 09:15:24 myhost mount[1497]: [pid  1500] getrandom("\xf6\x99\xf8\xd3\x2b\x66\x17\xdf\x4a\xb4\x15\x29\xad\xc5\x16\x3b", 16, GRND_INSECURE) = 16
Oct 25 09:15:24 myhost mount[1497]: [pid  1500] getrandom("\xc5\xe7\xab\x16\x11\x43\x54\xf3\xa6\x90\x6b\xac\x2f\x3a\x75\x0e", 16, GRND_INSECURE) = 16
Oct 25 09:15:24 myhost mount[1497]: [pid  1500] getrandom("\x29\x9d\x8f\xe5\xb8\xf9\x0a\x38\xd7\xd7\x1b\x49\x7e\xaf\xe2\x83", 16, GRND_INSECURE) = 16
Oct 25 09:15:24 myhost mount[1497]: [pid  1500] getrandom("\xde\x86\x74\xaa\x2e\x4d\x7b\x2a\x6f\xba\x1f\x85\x44\x94\x0c\xc2", 16, GRND_INSECURE) = 16
Oct 25 09:15:24 myhost mount[1497]: [pid  1500] openat(AT_FDCWD, "/", O_RDONLY|O_CLOEXEC|O_PATH|O_DIRECTORY) = 5
Oct 25 09:15:24 myhost mount[1497]: [pid  1500] openat(5, "..", O_RDONLY|O_CLOEXEC|O_PATH|O_DIRECTORY) = 6

It appears that /run/udev/data is not populated at the time mount.bcachefs is invoked the first time, but later is.

But why does it work flawlessly with bcachefs-tools 1.7? Again, here is the excerpt of the related log using 1.7:

Oct 25 09:47:23 myhost mount[1289]: [pid  1292] openat(AT_FDCWD, "/sys/devices/pci0000:00/0000:00:17.0/ata2/host1/target1:0:0/1:0:0:0/block/sda/sda5/uevent", O_RDONLY|O_NOCTTY|O_CLOEXEC) = 3
Oct 25 09:47:23 myhost mount[1289]: [pid  1292] fstat(3, {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
Oct 25 09:47:23 myhost mount[1289]: [pid  1292] read(3, "MAJOR=8\nMINOR=5\nDEVNAME=sda5\nDEV"..., 4104) = 65
Oct 25 09:47:23 myhost mount[1289]: [pid  1292] close(3)                    = 0
Oct 25 09:47:23 myhost mount[1289]: [pid  1292] getrandom("\x17\xf6\xec\x0e\xf1\xd1\xbb\x2c\x9e\x2a\xff\xa7\xec\x30\xd4\x35", 16, GRND_INSECURE) = 16
Oct 25 09:47:23 myhost mount[1289]: [pid  1292] openat(AT_FDCWD, "/dev/sda5", O_RDWR|O_DIRECT) = 3
Oct 25 09:47:23 myhost mount[1289]: [pid  1292] fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x5), ...}) = 0
Oct 25 09:47:23 myhost mount[1289]: [pid  1292] fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x5), ...}) = 0
Oct 25 09:47:23 myhost mount[1289]: [pid  1292] ioctl(3, BLKPBSZGET, [4096]) = 0
Oct 25 09:47:23 myhost mount[1289]: [pid  1292] io_submit(0x7f2ff9068000, 1, [{aio_data=0x55fd63f31680, aio_lio_opcode=IOCB_CMD_PREADV, aio_fildes=3, aio_buf=[{iov_base=0x55fd63f3a000, iov_len=4096}], aio_offset=4096}]) = 1
Oct 25 09:47:23 myhost mount[1289]: [pid  1292] futex(0x55fd63f2f2bc, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
Oct 25 09:47:23 myhost mount[1289]: [pid  1294] <... io_getevents resumed>[{data=0x55fd63f31680, obj=0x7ffe8145c420, res=4096, res2=0}], NULL) = 1
Oct 25 09:47:23 myhost mount[1289]: [pid  1294] futex(0x55fd63f2f2bc, FUTEX_WAKE_PRIVATE, 2147483647 <unfinished ...>
Oct 25 09:47:23 myhost mount[1289]: [pid  1292] <... futex resumed>)        = 0
Oct 25 09:47:23 myhost mount[1289]: [pid  1294] <... futex resumed>)        = 1
Oct 25 09:47:23 myhost mount[1289]: [pid  1292] futex(0x7ffe8145c548, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
Oct 25 09:47:23 myhost mount[1289]: [pid  1294] futex(0x7ffe8145c548, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
Oct 25 09:47:23 myhost mount[1289]: [pid  1292] <... futex resumed>)        = -1 EAGAIN (Resource temporarily unavailable)
Oct 25 09:47:23 myhost mount[1289]: [pid  1294] <... futex resumed>)        = 0
Oct 25 09:47:23 myhost mount[1289]: [pid  1292] futex(0x7ffe8145c548, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
Oct 25 09:47:23 myhost mount[1289]: [pid  1294] io_getevents(0x7f2ff9068000, 1, 8,  <unfinished ...>
Oct 25 09:47:23 myhost mount[1289]: [pid  1292] <... futex resumed>)        = 0
Oct 25 09:47:23 myhost mount[1289]: [pid  1292] fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x5), ...}) = 0
Oct 25 09:47:23 myhost mount[1289]: [pid  1292] ioctl(3, BLKPBSZGET, [4096]) = 0
Oct 25 09:47:23 myhost mount[1289]: [pid  1292] io_submit(0x7f2ff9068000, 1, [{aio_data=0x55fd63f37a80, aio_lio_opcode=IOCB_CMD_PREADV, aio_fildes=3, aio_buf=[{iov_base=0x55fd63f3c000, iov_len=4096}, {iov_base=0x55fd63f3d000, iov_len=4096}], aio_offset=4096}]) = 1
Oct 25 09:47:23 myhost mount[1289]: [pid  1292] futex(0x55fd63f2f2bc, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
Oct 25 09:47:23 myhost mount[1289]: [pid  1294] <... io_getevents resumed>[{data=0x55fd63f37a80, obj=0x7ffe8145c410, res=8192, res2=0}], NULL) = 1
Oct 25 09:47:23 myhost mount[1289]: [pid  1294] futex(0x55fd63f2f2bc, FUTEX_WAKE_PRIVATE, 2147483647) = 1
Oct 25 09:47:23 myhost mount[1289]: [pid  1294] io_getevents(0x7f2ff9068000, 1, 8,  <unfinished ...>
Oct 25 09:47:23 myhost mount[1289]: [pid  1292] <... futex resumed>)        = 0
Oct 25 09:47:23 myhost mount[1289]: [pid  1292] fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x5), ...}) = 0
Oct 25 09:47:23 myhost mount[1289]: [pid  1292] ioctl(3, BLKPBSZGET, [4096]) = 0
Oct 25 09:47:23 myhost mount[1289]: [pid  1292] openat(AT_FDCWD, "/", O_RDONLY|O_CLOEXEC|O_PATH|O_DIRECTORY) = 4
Oct 25 09:47:23 myhost mount[1289]: [pid  1292] openat(4, "..", O_RDONLY|O_CLOEXEC|O_PATH|O_DIRECTORY) = 5

That made me dig once again into the bachefs-tools source and its history. I found 761ef99 ("Add env. variable BCACHEFS_BLOCK_SCAN") and 1fb1898 ("mount: Leverage udev db"), which where commited between 1.7 and 1.9.

Setting BCACHEFS_BLOCK_SCAN for the bcachefs mount unit, via /etc/systemd/system/data\\x2dscratch.mount.d/override.conf, made the system boot without issues with bcachefs-tools 1.13.0.

I suspect the issue got introduced with 1fb1898 ("mount: Leverage udev db"), even though the commit message states that there is a fallback to traversing block devices. I did not yet find the time to look into it. Maybe @tasleson can comment on this.

Flowdalic added a commit to Flowdalic/bcachefs-tools that referenced this issue Oct 25, 2024
Just returning a failure exit status makes it hard to tell what the
actual issue was. Instead, the mount command now also provides some
textual output.

See koverstreet#308 for an
example where mount does not provide any helpful output, even with
BCACHEFS_LOG=trace enabled.

Signed-off-by: Florian Schmaus <[email protected]>
@tasleson
Copy link

  • What version of util-linux are you running?
  • Can you supply output of bcachefs show-super?

My initial thoughts are you're missing the lastest fixes to util-linux which includes bcachefs corrections for probing. The reason the old code works and new code works with the env. variable BCACHEFS_BLOCK_SCAN is bcachefs mount is simply opening every block device and checking for a bcachefs superblock signature.

However, this comment

However, this time, I somehow simply hit Ctrl-D. To my surprise, the system booted normally and the bcachefs volume was successfully mounted.

Seems to indicate you have a race condition in that your block device isn't ready yet by the time the fstab gets processed. This is the exact behavior I would see during testing where I purposely prevented a device from showing up at boot.

I'm also wondering if you have different blkid running in your initramfs, vs. normal runtime.

I would suggest enabling udev debug output, see man 5 udev.conf

@tasleson
Copy link

tasleson commented Oct 25, 2024

@Flowdalic

even though the commit message states that there is a fallback to traversing block devices.

fn get_devices_by_uuid(
udev_bcachefs: &HashMap<String, Vec<String>>,
uuid: Uuid,
) -> anyhow::Result<Vec<(PathBuf, bch_sb_handle)>> {
let devices = {
if !udev_bcachefs.is_empty() {
let uuid_string = uuid.hyphenated().to_string();
if let Some(devices) = udev_bcachefs.get(&uuid_string) {
devices.clone()
} else {
Vec::new()
}
} else {
get_all_block_devnodes()?
}
};
Ok(get_super_blocks(uuid, &devices))
}

If we find no bcachefs devices in the udev db, the code does revert to walking all the block devices. However, it is using the udev db to get the list of block devices. If your block device isn't in the udev db, it wouldn't be probed. However, the old bcachefs mount was using the udev db to get block devices too.

Do you have more than 1 device that has a bcachefs superblock signature, one that isn't getting mounted?

sda5 is a part of a multi-device bcachefs volume:

You do have more than one device with a bcachefs signature, thus all we need it one of them to be in the udev db for the fallback to not run.

@koverstreet
Copy link
Owner

There's a 'udev settle' command we use in some of our tests to wait for block devices to show up, do we need to be doing the equivalent of that here as well?

@tasleson
Copy link

There's a 'udev settle' command we use in some of our tests to wait for block devices to show up, do we need to be doing the equivalent of that here as well?

I think there probably needs to be better support for bcachefs in systemd. When the fstab is being processed, systemd waits for the FS UUID to show up before it issues the mount. When you have a multi-disk setup as soon as one disk gets probed and is found to contain the FS UUID of interest, the mount gets issued. Systemd needs to wait until all the block devices for the FS are present before issuing the mount. A FS UUID which is stacked on md or lvm etc. doesn't become available until the block layer has all the devices and brings up the logical volume etc.

I suppose we could modify bcachefs mount to wait when at least 1 disk of a mult-disk setup is found, but that might break other things.

@Flowdalic
Copy link
Author

  • What version of util-linux are you running?

2.39.4

  • Can you supply output of bcachefs show-super?
Device:                                     (unknown device)
External UUID:                             acfd231c-8554-42dd-a69f-01a648984d01
Internal UUID:                             2173a606-e66f-4266-8eea-10dd03e7f7d1
Magic number:                              c68573f6-66ce-90a9-d96a-60cf803df7ef
Device index:                              0
Label:                                     neo-data-scratch
Version:                                   1.7: mi_btree_bitmap
Version upgrade complete:                  1.7: mi_btree_bitmap
Oldest version on disk:                    1.4: member_seq
Created:                                   Fri Jun 14 20:37:16 2024
Sequence number:                           420
Time of last write:                        Sun Oct 27 10:38:49 2024
Superblock size:                           6.13 KiB/1.00 MiB
Clean:                                     0
Devices:                                   3
Sections:                                  members_v1,replicas_v0,disk_groups,clean,journal_seq_blacklist,journal_v2,counters,members_v2,errors,ext,downgrade
Features:                                  lz4,zstd,journal_seq_blacklist_v3,reflink,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,reflink_inline_data,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:                           alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done

Options:
  block_size:                              4.00 KiB
  btree_node_size:                         256 KiB
  errors:                                  continue [fix_safe] panic ro 
  metadata_replicas:                       1
  data_replicas:                           1
  metadata_replicas_required:              1
  data_replicas_required:                  1
  encoded_extent_max:                      64.0 KiB
  metadata_checksum:                       none [crc32c] crc64 xxhash 
  data_checksum:                           none [crc32c] crc64 xxhash 
  compression:                             none
  background_compression:                  none
  str_hash:                                crc32c crc64 [siphash] 
  metadata_target:                         none
  foreground_target:                       nvme
  background_target:                       hdd
  promote_target:                          nvme
  erasure_code:                            0
  inodes_32bit:                            1
  shard_inode_numbers:                     1
  inodes_use_key_cache:                    1
  gc_reserve_percent:                      8
  gc_reserve_bytes:                        0 B
  root_reserve_percent:                    0
  wide_macs:                               0
  promote_whole_extents:                   1
  acl:                                     1
  usrquota:                                0
  grpquota:                                0
  prjquota:                                0
  journal_flush_delay:                     1000
  journal_flush_disabled:                  0
  journal_reclaim_delay:                   100
  journal_transaction_names:               1
  allocator_stuck_timeout:                 30
  version_upgrade:                         [compatible] incompatible none 
  nocow:                                   0

members_v2 (size 448):
Device:                                    0
  Label:                                   1 (1)
  UUID:                                    27b33f8e-fd93-4427-a8a4-0f59c64dd692
  Size:                                    95.0 GiB
  read errors:                             0
  write errors:                            0
  checksum errors:                         0
  seqread iops:                            0
  seqwrite iops:                           0
  randread iops:                           0
  randwrite iops:                          0
  Bucket size:                             256 KiB
  First bucket:                            0
  Buckets:                                 389120
  Last mount:                              Sun Oct 27 10:38:49 2024
  Last superblock write:                   420
  State:                                   rw
  Data allowed:                            journal,btree,user
  Has data:                                journal,btree,user,cached
  Btree allocated bitmap blocksize:        4.00 MiB
  Btree allocated bitmap:                  0000000000000000111111111111111111111111111111111111111111111111
  Durability:                              1
  Discard:                                 1
  Freespace initialized:                   1
Device:                                    1
  Label:                                   1 (3)
  UUID:                                    d978982c-e5f0-4d03-a69b-a9664602459a
  Size:                                    1.03 TiB
  read errors:                             0
  write errors:                            0
  checksum errors:                         0
  seqread iops:                            0
  seqwrite iops:                           0
  randread iops:                           0
  randwrite iops:                          0
  Bucket size:                             256 KiB
  First bucket:                            0
  Buckets:                                 4317248
  Last mount:                              Sun Oct 27 10:38:49 2024
  Last superblock write:                   420
  State:                                   rw
  Data allowed:                            journal,btree,user
  Has data:                                journal,user,cached
  Btree allocated bitmap blocksize:        1.00 B
  Btree allocated bitmap:                  0000000000000000000000000000000000000000000000000000000000000000
  Durability:                              1
  Discard:                                 0
  Freespace initialized:                   1
Device:                                    2
  Label:                                   2 (4)
  UUID:                                    859ebb7f-64bb-49cd-a60c-1165bc3f21b5
  Size:                                    1.03 TiB
  read errors:                             0
  write errors:                            0
  checksum errors:                         0
  seqread iops:                            0
  seqwrite iops:                           0
  randread iops:                           0
  randwrite iops:                          0
  Bucket size:                             256 KiB
  First bucket:                            0
  Buckets:                                 4317248
  Last mount:                              Sun Oct 27 10:38:49 2024
  Last superblock write:                   420
  State:                                   rw
  Data allowed:                            journal,btree,user
  Has data:                                journal,user,cached
  Btree allocated bitmap blocksize:        1.00 B
  Btree allocated bitmap:                  0000000000000000000000000000000000000000000000000000000000000000
  Durability:                              1
  Discard:                                 0
  Freespace initialized:                   1

errors (size 8):

I'm also wondering if you have different blkid running in your initramfs, vs. normal runtime.

I would not assume that to be the case, then again, I am not sure how to verify that. Does blkid here refer to the filesystem's UUID? Why would that change between initramfs and normal runtime?

My bcachefs setup just involves one NVMe and two HDDs. No mdraid nor lvm in between, on a Gentoo mostly-stable system (systemd is at 256.7).

You do have more than one device with a bcachefs signature, thus all we need it one of them to be in the udev db for the fallback to not run.

How about running the fallback in every case? Just until there is better systemd support.

I think we have enough indication to assume that the issue is caused by the bcachefs mount being issued before all block devices are in the udev db.

@Flowdalic
Copy link
Author

BTW, BCACHEFS_BLOCK_SCAN does not seem to be documented, should it?

@Flowdalic Flowdalic changed the title bcachefs mount fails on boot since bcachefs-tools 1.9.0 (multi-device) bcachefs mount fails on boot since bcachefs-tools 1.9.0 Oct 27, 2024
@tasleson
Copy link

tasleson commented Oct 28, 2024

I would not assume that to be the case, then again, I am not sure how to verify that. Does blkid here refer to the filesystem's UUID? Why would that change between initramfs and normal runtime?

blkid is a utility that retrieves the information about the block device including the FS UUID. A copy would exist in the initramfs as well as on your filesystem. It could be different if user space is newer than the last time the initramfs was generated. You could examine the contents of your iniitramfs to verify, but its not needed. Your copy on your FS doesn't have all the fixes for bcachefs, but I don't think those missing fixes are contributing to the issue you're seeing.

My bcachefs setup just involves one NVMe and two HDDs. No mdraid nor lvm in between, on a Gentoo mostly-stable system (systemd is at 256.7).

NVMe is likely available immediately, disks needs to spin up, thus the reason the mount fails.

You do have more than one device with a bcachefs signature, thus all we need it one of them to be in the udev db for the fallback to not run.

How about running the fallback in every case? Just until there is better systemd support.

That is an option, the env. variable was added to give users this as a workaround too. The env. variable should be documented.

Please note that none of this addresses the root problem. You could fail to mount at boot even when doing the signature scan of every block device at boot as the udev db is used to locate all the block devices to walk.

I'm assuming if you do a warm reboot/reset the mount works fine without BCACHEFS_BLOCK_SCAN?

@Flowdalic
Copy link
Author

Currently I work around the issue by using x-systemd.requires in the fstab entry, which seems to take care that all required block devices are there before systemd issues the mount. That is, my fstab entry looks like this

OLD_BLKID_UUID=acfd231c-8554-42dd-a69f-01a648984d01	/data-scratch   bcachefs	defaults,x-systemd.requires=/dev/disk/by-id/nvme-SAMSUNG_MZVLB256HAHQ-00000_S444NB0K572323_1-part2,x-systemd.requires=/dev/disk/by-id/wwn-0x5000039fd1c10920-part5,x-systemd.requires=/dev/disk/by-id/wwn-0x5000c500b205e301-part5

Note that I use OLD_BLK_UUID (not sure if this is required) and I still have BCACHEFS_BLOCK_SCAN=1 set for the systemd mount (not sure if this is still required).

@tasleson
Copy link

Note that I use OLD_BLK_UUID (not sure if this is required) and I still have BCACHEFS_BLOCK_SCAN=1 set for the systemd mount (not sure if this is still required).

If systemd is ensuring that the devices are present before issuing the mount, then you shouldn't need BCACHEFS_BLOCK_SCAN=1. I would need to look at the code again before commenting on OLD_BLK_UUID.

Thanks for following up with this workaround.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants