Cache Device Incomplete and Core Devices Inactive after reboot #1215

Open
nfronz opened this issue May 26, 2022 · 30 comments
Labels
question Further information is requested

Comments

@nfronz

nfronz commented May 26, 2022

Question

Why don't the cache and cores activate properly on reboot?

Motivation

I broke my Ceph Cluster as I have Open CAS running on all OSDs

Your Environment

  • OpenCAS version (commit hash or tag): 22.03.0.0685.master
  • Operating System: Debian 11/Proxmox
  • Kernel version: 5.13.19-6-pve
  • Cache device type (NAND/Optane/other): NAND
  • Core device type (HDD/SSD/other): HDD
  • Cache configuration:

[caches]
1 /dev/disk/by-id/nvme-Samsung_SSD_980_PRO_1TB_S5P2NG0R508005Y wb cache_line_size=4,ioclass_file=/etc/opencas/ansible/default.csv,cleaning_policy=alru,promotion_policy=always

[cores]
1 1 /dev/disk/by-id/wwn-0x5000cca255c01f9d-part1
1 2 /dev/disk/by-id/wwn-0x5000c500c93f187a-part1
1 3 /dev/disk/by-id/wwn-0x5000c50085e9d2eb-part1
1 4 /dev/disk/by-id/wwn-0x5000c50085b40c5b-part1
1 5 /dev/disk/by-id/wwn-0x5000c50085e47697-part1
1 6 /dev/disk/by-id/wwn-0x5000c50085e577ff-part1
1 7 /dev/disk/by-id/wwn-0x5000c50085dd1293-part1

    • Cache mode (default: wt): WB
    • Cache line size (default: 4): 4
    • Promotion policy (default: always): ALWAYS
    • Cleaning policy (default: alru): ALRU
    • Sequential cutoff policy (default: full): FULL
  • Other (e.g. lsblk, casadm -P, casadm -L)
@nfronz nfronz added the question Further information is requested label May 26, 2022
@mmichal10
Contributor

Hi @nfronz

Thank you for posting the question. Did Open CAS print any information to dmesg during booting to the OS? What is the state of the cache instance after the reboot? Is it not running at all? Or is it in an incomplete state? Could you share casadm -L output?

@brokoli18

@mmichal10 I have the same issue as the asker above. What I did was deploy opencas on several ceph nodes in a test lab using the instructions in https://open-cas.github.io/getting_started_open_cas_linux.html.

Casadm version: 22.12.0.0844.master
Operating system: Ubuntu 18.04
Kernel version: 5.4.0-139-generic
Cache device type: SSD
Core device type: HDD

While changing one of the config options the casadm command hung, and as I couldn't kill it I rebooted the node. I assume that because I had no opencas.conf configuration the opencas devices didn't come back, but after running casadm -S -d /dev/disk/by-id/ata-SSDSC2K... --load I am in this state:

# casadm -L
type    id   disk       status       write policy   device
cache   1    /dev/sdh   Incomplete   wb             -
+core   1    /dev/sda   Inactive     -              -
+core   2    /dev/sdb   Inactive     -              -
+core   3    /dev/sdc   Inactive     -              -
+core   4    /dev/sdd   Inactive     -              -

Looking at the actual block devices it seems that ceph is now trying to run on the raw disks instead of the cas devices as well:

# lsblk -o name
NAME
sda
└─sda1
  └─ceph--418e8367--4d48--42a0--89e9--0bed53fc705b-osd--block--418e8367--4d48--42a0--89e9--0bed53fc705b
sdb
└─sdb1
  └─ceph--4169945f--e1f1--44d5--8ba0--ddf9ebe42738-osd--block--4169945f--e1f1--44d5--8ba0--ddf9ebe42738
sdc
└─sdc1
  └─ceph--c818d7ca--9467--443c--941b--bf7df5ac8376-osd--block--c818d7ca--9467--443c--941b--bf7df5ac8376
sdd
└─sdd1
  └─ceph--5e92a9e5--9176--460e--950d--e0858ddef29d-osd--block--5e92a9e5--9176--460e--950d--e0858ddef29d

Is there anything that can be done to fix this situation?

P.S. Here are some (what I assume) relevant dmesg logs:

[   19.315794] cache1: Loading cache state...
[   19.379479] Thread cas_cl_cache1 started
[   19.379484] cache1: Policy 'always' initialized successfully
[   19.379509] cache1: Cannot open core 1. Cache is busy
[   19.379511] cache1.core1: Seqential cutoff init
[   19.379549] cache1: Cannot find core 1 in pool, core added as inactive
[   19.379555] cache1: Cannot open core 2. Cache is busy
[   19.379556] cache1.core2: Seqential cutoff init
[   19.379590] cache1: Cannot find core 2 in pool, core added as inactive
[   19.379595] cache1: Cannot open core 3. Cache is busy
[   19.379596] cache1.core3: Seqential cutoff init
[   19.379630] cache1: Cannot find core 3 in pool, core added as inactive
[   19.379635] cache1: Cannot open core 4. Cache is busy
[   19.379644] cache1.core4: Seqential cutoff init
[   19.379678] cache1: Cannot find core 4 in pool, core added as inactive
[   31.707718] cache1: Done loading cache state
[   46.325490] cache1: Done saving cache state!
[   46.362173] cache1: Cache attached
[   46.362176] cache1: Successfully loaded
[   46.362177] cache1: Cache mode : wb
[   46.362178] cache1: Cleaning policy : acp
[   46.362179] cache1: Promotion policy : always
[   46.362181] cache1.core1: Failed to initialize
[   46.362182] cache1.core2: Failed to initialize
[   46.362183] cache1.core3: Failed to initialize
[   46.362184] cache1.core4: Failed to initialize
[   46.362223] [Open-CAS] Adding device /dev/disk/by-id/ata-SSDSC2KB480G8R_PHYF8446032P480BGN as cache cache1
[   46.362235] [Open-CAS] [Classifier] Initialized IO classifier
[   46.362260] [Open-CAS] Adding device /dev/disk/by-id/wwn-0x5000039562708b05 as core core1 to cache cache1
[   46.362262] [Open-CAS] Adding device /dev/disk/by-id/wwn-0x5000039961a8734e as core core2 to cache cache1
[   46.362263] [Open-CAS] Adding device /dev/disk/by-id/wwn-0x5000039881608fe5 as core core3 to cache cache1
[   46.362264] [Open-CAS] Adding device /dev/disk/by-id/wwn-0x5000039673688f92 as core core4 to cache cache1
[   46.367193] cache1.core3: Inserting core
[   46.367214] cache1: Adding core core3 failed
[   46.371824] cache1.core4: Inserting core
[   46.371832] cache1: Adding core core4 failed

@mmichal10
Contributor

mmichal10 commented Jul 10, 2023

Hello @brokoli18,

The cache couldn't be loaded because CAS couldn't open the core devices exclusively. Would it be possible to stop the cache, detach the disks from ceph, load the cache again and attach the cas exported objects (/dev/cas1-X) to ceph?
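
For reference, roughly the sequence I have in mind; the OSD id, VG name and cache device path below are placeholders for your setup:

# detach the disks from ceph: stop the OSDs and deactivate the LVM volumes
# that currently sit directly on the raw disks
systemctl stop ceph-osd@0          # repeat for each affected OSD id
vgchange -an <ceph-vg-name>        # repeat for each ceph VG on a raw disk

# stop the incomplete cache instance and load it again
casadm --stop-cache --cache-id 1
casadm --start-cache --cache-device /dev/disk/by-id/<cache-ssd> --load

# re-attach the OSDs on top of the exported objects (/dev/cas1-X)
ceph-volume lvm activate --all
systemctl start ceph-osd@0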

@brokoli18

Thank you for your response. What exactly do you mean by "detach the disks from ceph"? Do you mean:

  1. Stop ceph/lvm from trying to autostart the disks on boot
  2. Remove the disks from ceph/lvm configuration completely and readd them

For your reference, I have tried option 1 by masking the ceph-osd@ service at startup, and I am still in much the same situation. Although I could try option 2 here since it is a lab environment, it would not be a good outcome, as I can't start wiping disks in our prod environment when this sort of situation occurs.

@fpeg26

fpeg26 commented Sep 4, 2024

Hi,

Just letting you know that we're hitting the same issue in our setup.

The only "fix" that I found was to add this line in the open-cas.service file:
ExecStartPre=/bin/sleep 30
But this is not a proper solution.
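
In case it helps anyone, the same stopgap can be applied as a drop-in override instead of editing the packaged unit file (the 30 seconds is an arbitrary value we picked):

# /etc/systemd/system/open-cas.service.d/override.conf (created with: systemctl edit open-cas.service)
[Service]
ExecStartPre=/bin/sleep 30

followed by systemctl daemon-reload.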

I've also been experimenting on the startup order of the open-cas, lvm and ceph services with no luck.

Any help on this would be greatly appreciated.
Regards

@fpeg26

fpeg26 commented Sep 6, 2024

It also seems that if you zap the ceph volumes attached to the /dev/sd{a..d}1 devices like this:
for i in /dev/sd{a..d}1; do ceph-volume lvm zap --destroy $i;done
(you will need to allow those devices in /etc/lvm/lvm.conf first, and you might also need to stop the corresponding osd services)
and then reboot your server, the cas devices will be initialized properly and the ceph osds will use them as you would expect.
Subsequent reboots also work properly, so you should only need to run the above command once.
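
Roughly, the full sequence on one node looks like this (OSD ids are examples, and keep in mind that zap --destroy wipes the OSD contents, so ceph has to rebuild them from replicas afterwards):

# stop the OSD daemons that sit on the raw /dev/sd{a..d}1 partitions
systemctl stop ceph-osd@0 ceph-osd@1 ceph-osd@2 ceph-osd@3

# after temporarily allowing those devices in /etc/lvm/lvm.conf,
# wipe the LVM/ceph metadata so they no longer race the cas devices at boot
for i in /dev/sd{a..d}1; do ceph-volume lvm zap --destroy $i; done

reboot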

@mmichal10
Contributor

Hello @fpeg26,
we're trying to reproduce the problem on our setup. Could you please share which version of ceph you are using?

@fpeg26

fpeg26 commented Oct 17, 2024

Hi @mmichal10,

Thanks a lot for taking the time to have a look at this issue.
We're running ceph 18.2.2 on proxmox:
ceph-base/stable,now 18.2.2-pve1

If that helps, we're running this version of OpenCAS:
╔═════════════════════════╤═════════════════════╗
║ Name                    │ Version             ║
╠═════════════════════════╪═════════════════════╣
║ CAS Cache Kernel Module │ 22.12.0.0852.master ║
║ CAS CLI Utility         │ 22.12.0.0852.master ║
╚═════════════════════════╧═════════════════════╝

Let me know if I can provide more details

Regards

@jfckm

jfckm commented Oct 18, 2024

Hi @fpeg26,
I tried to reproduce that with a simple cluster, but it started up fine multiple times. Can you provide journalctl, dmesg and lvm.conf from your failing config? Maybe there will be some clues that show us what might've gone wrong.

@fpeg26

fpeg26 commented Oct 18, 2024

Hi,

Thank you for taking the time to give this a try.

Here is the content of the lvm.conf file we're using (with drive wwn ids masked).
It's the same file on every node, with the corresponding wwns updated accordingly:

devices {
     global_filter=["r|/dev/zd.*|","r|/dev/rbd.*|","r|/dev/disk/by-id/wwn-xxxxxx|","r|/dev/sda|","r|/dev/disk/by-id/wwn-xxxxxx|","r|/dev/sdb|","r|/dev/disk/by-id/wwn-xxxxxx|","r|/dev/sdc|","r|/dev/disk/by-id/wwn-xxxxxx|","r|/dev/sdd|","r|/dev/disk/by-id/wwn-xxxxxx|","r|/dev/sde|","r|/dev/disk/by-id/wwn-xxxxxx|","r|/dev/sdf|","r|/dev/disk/by-id/wwn-xxxxxx|","r|/dev/sdg|","r|/dev/disk/by-id/wwn-xxxxxx|","r|/dev/sdh|",]
     types=["cas",16]
}

Unfortunately, I won't be able to upload the other 2 requested files in public.
Let me know if there is a way I can send them to you in private.

Our OpenCAS setup looks like this:

type    id   disk       status    write policy   device
cache   1    /dev/sdi   Running   wb             -
├core   1    /dev/sda   Active    -              /dev/cas1-1
├core   2    /dev/sdb   Active    -              /dev/cas1-2
├core   3    /dev/sdc   Active    -              /dev/cas1-3
└core   4    /dev/sdd   Active    -              /dev/cas1-4
cache   2    /dev/sdj   Running   wb             -
├core   1    /dev/sde   Active    -              /dev/cas2-1
├core   2    /dev/sdf   Active    -              /dev/cas2-2
├core   3    /dev/sdg   Active    -              /dev/cas2-3
└core   4    /dev/sdh   Active    -              /dev/cas2-4

When a node fails to attach the core devices to the cache on boot, we see those lines in journalctl (one for each core device):

Sep 17 09:31:11 xyz lvm[1748]: /dev/sda excluded: device is rejected by filter config.
Sep 17 09:31:12 xyz open-cas-loader.py[2684]: Unable to attach core /dev/disk/by-id/wwn-xxx from cache 1. Reason: Error while adding core device to cache instance 1
Sep 17 09:31:12 xyz (udev-worker)[1685]: sda: Process '/lib/opencas/open-cas-loader.py /dev/sda' failed with exit code 1.

Ceph then uses /dev/sdX as lvms before OpenCAS was able to attach the core devices.

Just out of curiosity, when you restart a node, do you do anything special like flushing the cache or outing the osds on ceph, or do you just issue a simple reboot command?

Shutting down a node that's going to fail on the next reboot results in these lines in journalctl:

Sep 17 09:27:48 xyz casctl[2328286]: Unable to detach core /dev/disk/by-id/wwn-xxxxxx. Reason:
Sep 17 09:27:48 xyz casctl[2328286]: Error while detaching core device 1 from cache instance 1
Sep 17 09:27:48 xyz  casctl[2328286]: Device opens or mount are pending to this cache

Also (but that's a separate issue), we're using OpenCAS for a workstation workload and we noticed that the cache never flushes with the ALRU policy because the cluster never idles, even with the Activity Threshold set to 1 ms. For now we're flushing it manually.
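
(In case it helps anyone, the manual flush is a single casadm call; cache id 1 is just an example:)

# flush all dirty cache lines of cache instance 1 to its core devices
casadm --flush-cache --cache-id 1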

Hopefully this will help you understand our issue a bit better.

Regards

@robertbaldyga
Member

Hi @fpeg26 ,

I did a little experiment and managed to get a reproduction of the problem.
I looked at journalctl and I found the following message:

Oct 19 17:42:30 localhost lvm[752]:   Please remove the lvm.conf filter, it is ignored with the devices file.

It turns out that lvm by default is configured to use a new mechanism (the devices file) instead of filters.
Disabling the devices file enables the filters. So I added this line to my /etc/lvm/lvm.conf (in the devices section):

use_devicesfile = 0
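
For context, the devices section then ends up looking something like this (the filter shown here is only an illustration, keep whatever global_filter you already use):

devices {
     # disable the devices file so that the filters below are honoured
     use_devicesfile = 0
     # example filter only: accept cas devices, reject the backend disks
     global_filter=["a|/dev/cas.*|", "r|/dev/sd.*|"]
     types=["cas",16]
}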

After the reboot the cores were added correctly and the lvms detected on top of Open CAS devices:

type    id   disk       status    write policy   device
cache   1    /dev/vdb   Running   wt             -
├core   1    /dev/sda   Active    -              /dev/cas1-1
└core   2    /dev/sdb   Active    -              /dev/cas1-2
NAME                              MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda                                 8:0    0    5G  0 disk 
└─cas1-1                          251:0    0    5G  0 disk 
  └─ceph--276b3665--98c5--486d--87b3--f2ca8807cb3e-osd--block--30a9f46b--5b78--4fe2--b2ec--de7d2102e69f
                                  253:3    0    5G  0 lvm  
sdb                                 8:16   0    5G  0 disk 
└─cas1-2                          251:256  0    5G  0 disk 
  └─ceph--acc15d17--b9d9--4402--bc77--bde81240d449-osd--block--bd617c4c--00e3--41d6--a201--5836ec1423a7
                                  253:4    0    5G  0 lvm  

Let me know if that solves the problem on your configuration.

@robertbaldyga
Member

I also managed to get the proper behavior with use_devicesfile = 1, but to make it work Open CAS needs to provide a serial for its block devices, so that lvm can distinguish them properly from the backend devices. I prepared a patch that does this: #1574.

@robertbaldyga
Member

Just out of curiosity, when you restart a node, do you do anything special like flushing the cache or outing the osds on ceph, or do you just issue a simple reboot command?

Shuting down a node that's gonna fail on next reboot results in those lines in journalctl:

Sep 17 09:27:48 xyz casctl[2328286]: Unable to detach core /dev/disk/by-id/wwn-xxxxxx. Reason:
Sep 17 09:27:48 xyz casctl[2328286]: Error while detaching core device 1 from cache instance 1
Sep 17 09:27:48 xyz  casctl[2328286]: Device opens or mount are pending to this cache

We have two redundant shutdown methods. One is the service that you just noticed failing; the other is a systemd-shutdown script that is executed later in the shutdown process. In my tests with Ceph I also see the first one fail, but the second one correctly stops the cache (you can see it in the kernel log). Not super elegant, but good enough.

@fpeg26

fpeg26 commented Oct 21, 2024

Hi @robertbaldyga,

Thank you for taking the time to look into this further

Unfortunately, we don't see the LVM filter message on any of the hosts in our cluster.
However, if I add the use_devicesfile = 0 line to lvm.conf and run lvmconfig --test, there are no complaints, which suggests that this feature might already be enabled.
We're also missing the lvmdevices command, which is typically used to manage the devices file (unless it's done manually).
Could it be that this new mechanism was only fully implemented in a newer LVM release? It seems possible that the error message we're expecting is not present in our version and was introduced later.
I will check the LVM Git repo and try to find more details.

This is our setup to compare with yours:

# Kernel
Linux localhost 6.5.13-6-pve

# LVM packages
libllvm15/stable,now 1:15.0.6-4+b1 amd64 [installed,automatic]
liblvm2cmd2.03/stable,now 2.03.16-2 amd64 [installed,automatic]
lvm2/stable,now 2.03.16-2 amd64 [installed]

I will apply the configuration change on each host to observe how it behaves with this setting in place.

Thanks as well for clarifying the shutdown methods. I just wanted to confirm that shutting down a node without flushing the cache is considered a good practice in a Ceph environment.

EDIT:
I checked the LVM config on each node before manually adding the use_devicesfile = 0 line, and they are all already set up like this by default:

lvmconfig --type full | grep use_devices
	use_devicesfile=0

So it feels like we're on the wrong track 😞

@robertbaldyga
Member

@fpeg26 Are you sure there is no other reference to /dev/sd* than /dev/disk/by-id/wwn-*? Maybe your filter does not cover those paths, and that's why LVM is able to start on the backend device. I even checked whether LVM on the Proxmox platform behaves differently, but it doesn't seem so.

@fpeg26

fpeg26 commented Oct 23, 2024

The lvm.conf I posted earlier is the exact one we have on each host. Note that I filtered out both /dev/disk/by-id/wwn-* and /dev/sd* in an attempt to fix this issue. Prior to that, I was only filtering /dev/disk/by-id/wwn-*.

Unless there is another file that got populated automatically that I'm not aware of, lvm.conf is the only place where I filtered them manually.

Also, I can tell that it's working, at least partially, because if I comment out the global_filter line, I get this warning for each filtered-out device when running lvm commands:
WARNING: Not using device /dev/sda for PV 1aSner-oik2-ad2c-fljc-Ynzg-pBkl-fjvieML.

Now, I noticed some strange behaviors while writing this comment:

  • on one node, commenting out the global_filter line doesn't display the above warnings at all
  • on another node, the warning only appears for 6/8 devices used by OpenCAS/Ceph. On another it's 4/8...
  • one node doesn't see the lvm at all. ceph-volume lvm list reports nothing even though the OSDs are in and up and ceph is "healthy"
  • only one node out of 9 in the cluster is showing the correct information: 8 warnings and pvs, lvs, vgs, ceph in the correct state

So it's really inconsistent across the cluster even though they all share the exact same configuration.
All nodes displayed the same problem after their first reboot: lvm using sdX instead of casX-X. They were all fixed the same way, using the method described earlier (#1215 (comment)), and matched what was expected (pvs, lvs, vgs, ceph in the correct state), but they got worse over time after a few reboots here and there.

@robertbaldyga
Member

My guess is that the gradual degradation is a result of using Write-Back. When the LVM metadata is stored as dirty data on the cache, LVM will not recognize the backend device. Once it gets flushed to the backend, after the next reboot LVM starts on the backend device, because now it can see its metadata there.

Can you verify that ls -l /dev/disk/* | grep /sd* shows only wwn-*?

@fpeg26

fpeg26 commented Oct 23, 2024

That would make a lot of sense.

No, ls -l /dev/disk/* | grep /sd* returns a whole bunch of different things like this (one example of each kind of result):

lrwxrwxrwx 1 root root  9 Oct 15 08:06 9 -> ../../sda
lrwxrwxrwx 1 root root  9 Oct 15 08:06 ata-DELLBOSS_VD_xyz -> ../../sdk
lrwxrwxrwx 1 root root  9 Oct 15 08:06 lvm-pv-uuid-xyz -> ../../sda
lrwxrwxrwx 1 root root  9 Oct 15 08:06 scsi-xyz -> ../../sda
lrwxrwxrwx 1 root root  9 Oct 15 08:06 wwn-xyz -> ../../sda
lrwxrwxrwx 1 root root 10 Oct 15 08:06 EFI\x20System\x20Partition -> ../../sdk1
lrwxrwxrwx 1 root root 10 Oct 15 08:06 6a803660-0b54-43bb-bd8a-2e43321459411c -> ../../sdk1
lrwxrwxrwx 1 root root  9 Oct 15 08:06 pci-0000:01:00.0-sas-phy0-lun-0 -> ../../sda
lrwxrwxrwx 1 root root  9 Oct 15 08:06 pci-0000:a1:00.0-ata-1 -> ../../sdk
lrwxrwxrwx 1 root root 10 Oct 15 08:06 pci-0000:a1:00.0-ata-1.0-part1 -> ../../sdk1
lrwxrwxrwx 1 root root 10 Oct 15 08:06 W4FI-F189 -> ../../sdk1

@robertbaldyga
Member

robertbaldyga commented Oct 25, 2024

Ok, so you'd need to filter out all of those links, or add one rule for the entire /dev/disk directory like "r|/dev/disk/.*|", so that LVM cannot match the backend device by any of those paths.

Alternatively you can try the current Open CAS master (https://github.com/Open-CAS/open-cas-linux/tree/588b7756a957417430d6eca17ccb66beae051365) with use_devicesfile = 1 and the backend devices removed from /etc/lvm/devices/system.devices. If that method suits your setup better, we can release the needed changes in a dot release within a few weeks.
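
If you go the devices-file route and your LVM build ships the lvmdevices tool, the entries can be managed roughly like this (device names are examples matching the setup discussed here):

# add the Open CAS exported objects to /etc/lvm/devices/system.devices
lvmdevices --adddev /dev/cas1-1
lvmdevices --adddev /dev/cas1-2

# remove the backend disks so LVM stops scanning them directly
lvmdevices --deldev /dev/sda
lvmdevices --deldev /dev/sdb

# print the resulting devices file entries
lvmdevices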

@fpeg26

fpeg26 commented Oct 25, 2024

Thanks @robertbaldyga it's really appreciated.
I will test both solutions and report back asap.

I have a few questions though:

  • If I filter out everything from /dev/disk, what will happen with the OS disk (sdk) that has LVs on it? (sda-sdh = cores, sdi-j = caches, sdk = OS disk)
  • What will happen if a cache disk fails and needs to be replaced? I guess I will have to un-filter the raw devices assigned to this drive so ceph can still function?

@robertbaldyga
Member

robertbaldyga commented Oct 25, 2024

Thanks @robertbaldyga it's really appreciated. I will test both solutions and report back asap.

I have a few questions though:

  • If I filter out everything from /dev/disk, what will happen with the OS disk (sdk) that has LVs on it? (sda-sdh = cores, sdi-j = caches, sdk = OS disk)

The base path /dev/sdk is not filtered out, so LVM should be able to recognize it. You can also add "a|/dev/sdk|" at the beginning of the filter list, to make sure it will not be affected by any other filter rules. That way you can even simplify your filter to something like this:

global_filter=["a|/dev/disk/by-id/wwn-of-OS-disk|", "r|/dev/zd.*|", "r|/dev/rbd.*|", "r|/dev/sd.*|", "r|/dev/disk/.*|"]

or even:

global_filter=["a|/dev/disk/by-id/wwn-of-OS-disk|", "a|/dev/cas.*|", "r|.*|"]
  • What will happen if a cache disk fails and needs to be replaced? I guess I will have to un-filter the raw devices assigned to this drive so ceph can still function?

Yes, if you want to move back to the backend devices, you need to allow them in the filter.
The major consideration when using Write-Back mode is that most of the time the data is not fully flushed to the backend devices, so you need to make sure to flush the cache before switching back to the backend devices. If that's impossible (like after a cache disk failure), then you would most likely have to reinitialize the OSDs and Ceph would recreate the content from the replicas. That may generate additional load on the cluster. If you want to avoid such a situation, you can set up the cache on two SSDs in a RAID1 configuration.
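
A minimal sketch of that RAID1 cache variant, assuming mdadm and two spare SSDs (device names, cache id and cache mode are examples):

# mirror the two SSDs and start the cache on top of the md device
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
casadm --start-cache --cache-device /dev/md0 --cache-id 1 --cache-mode wb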

@fpeg26

fpeg26 commented Oct 30, 2024

So I've tested the new global_filter you suggested and unfortunately, it didn't help.

devices {
     global_filter=["a|/dev/disk/by-id/ata-DELLBOSS_VD_ec320321312as2a9c1232130|", "a|/dev/cas.*|", "r|.*|"]
     types=["cas",16]
}

I rebooted the node and it came back with all OSDs down and empty folders under /var/lib/ceph/osd.
Both of my caches were in an incomplete state:

type    id   disk       status       write policy   device
cache   2    /dev/sdj   Incomplete   wb             -
├core   1    /dev/sde   Active       -              /dev/cas2-1
├core   2    /dev/sdf   Inactive     -              -
├core   3    /dev/sdg   Active       -              /dev/cas2-3
└core   4    /dev/sdh   Active       -              /dev/cas2-4
cache   1    /dev/sdi   Incomplete   wb             -
├core   1    /dev/sda   Inactive     -              -
├core   2    /dev/sdb   Inactive     -              -
├core   3    /dev/sdc   Active       -              /dev/cas1-3
└core   4    /dev/sdd   Active       -              /dev/cas1-4

Now, I can see that lvm is excluding the 3 inactive devices, and I am seeing entries like this in journalctl (only for those 3):

Oct 30 04:28:10 localhost lvm[1723]: /dev/sdf excluded: device is rejected by filter config.
Oct 30 04:28:10 localhost lvm[1722]: /dev/sdb excluded: device is rejected by filter config.
Oct 30 04:28:10 localhost lvm[1722]: /dev/sda excluded: device is rejected by filter config.

But Ceph is still using them as targets, which I don't understand:

NAME                                                                                      MAJ:MIN  RM   SIZE RO TYPE MOUNTPOINTS
sda                                                                                         8:0     0   2.2T  0 disk 
└─ceph--319da2ec--a844--44c5--be92--1a02e313cf08-osd--block--a328f07c--4ff7--4578--a914--aac07c344f9d
                                                                                          252:3     0   2.2T  0 lvm  
sdb                                                                                         8:16    0   2.2T  0 disk 
└─ceph--4ad2c2b7--b7fb--4b25--ba7b--029e400da73b-osd--block--b3d00f67--bd6d--45a2--9d44--2dc960d685fb
                                                                                          252:4     0   2.2T  0 lvm  
sdc                                                                                         8:32    0   2.2T  0 disk 
└─cas1-3                                                                                  251:0     0   2.2T  0 disk 
sdd                                                                                         8:48    0   2.2T  0 disk 
└─cas1-4                                                                                  251:256   0   2.2T  0 disk 
sde                                                                                         8:64    0   2.2T  0 disk 
└─cas2-1                                                                                  251:512   0   2.2T  0 disk 
sdf                                                                                         8:80    0   2.2T  0 disk 
└─ceph--1d9c8b4a--627a--463a--b357--49d61a9dfc84-osd--block--457c2716--7904--4c18--b57f--52c65b235322
                                                                                          252:2     0   2.2T  0 lvm  
sdg                                                                                         8:96    0   2.2T  0 disk 
└─cas2-3                                                                                  251:768   0   2.2T  0 disk 
sdh                                                                                         8:112   0   2.2T  0 disk 
└─cas2-4                                                                                  251:1024  0   2.2T  0 disk 

And Ceph is actually only using those 3; the cas devices were not selected to mount any OSDs.
The end result is the same for all OSDs as said earlier: all down, with empty folders and failing services.

I will fix this node, let Ceph recover it, test the use_devicesfile = 1 solution and report back.

@fpeg26

fpeg26 commented Oct 30, 2024

Also, on an unrelated note, I was trying to compile OpenCAS master and found a line that is missing a ";" at the end:
https://github.com/Open-CAS/open-cas-linux/blob/master/casadm/cas_lib.c#L1875

@fpeg26

fpeg26 commented Oct 30, 2024

Quick update about use_devicesfile = 1: my version of lvm doesn't have the lvmdevices command, so I'm not sure what system.devices should look like. Is it just a list of devices with one /dev/disk/by-id/xyz per line? I was not able to find much info about this online, but I will keep digging.

In the meantime, I updated Ceph (18.2.4-pve3) and OpenCAS (24.09.0.0909.master) and the behavior is the same as previously; Ceph might use the /dev/sdX devices after a reboot even if the devices are excluded by lvm filters:

  PV          VG                                        Fmt  Attr PSize   PFree
  /dev/cas1-4 ceph-b1c39c2a-508d-4c01-9d52-22a326ec9489 lvm2 a--    2.18t    0 
  /dev/cas2-4 ceph-6096de43-3a34-46aa-a023-2643c2b64707 lvm2 a--    2.18t    0 
  /dev/sda    ceph-79eb3207-f751-439e-81db-0fac4a0cbce4 lvm2 a--    2.18t    0 
  /dev/sdb    ceph-69b5a73a-211f-4023-a288-9caee1447fdb lvm2 a--    2.18t    0 
...

@fpeg26

fpeg26 commented Oct 31, 2024

I was able to "test" use_devicesfile = 1. I added a bunch of devices like this to /etc/lvm/devices/system.devices:

devices = [
    "/dev/sdk",
    "/dev/cas1-1",
    "/dev/cas1-2",
    "/dev/cas1-3",
    "/dev/cas1-4",
    "/dev/cas2-1",
    "/dev/cas2-2",
    "/dev/cas2-3",
    "/dev/cas2-4",
]

It appears that this is not the right syntax, but what I was able to demonstrate here is that even when forcing lvm to use a devices file, it ignores it and still attaches the Ceph vgs to random pvs after a reboot, like below:

  PV          VG                                        Fmt  Attr PSize   PFree
  /dev/cas1-2 ceph-ce0ca297-0916-4942-a859-b478c09d6106 lvm2 a--    2.18t    0 
  /dev/sda    ceph-0f4c7589-18ff-4502-9e77-d6e435f233fb lvm2 a--    2.18t    0 
  /dev/sdc    ceph-d9ba47f1-9fda-409b-b3cf-aef3df2a53d1 lvm2 a--    2.18t    0 
  /dev/sdd    ceph-d8c99cfa-3e49-4937-a12e-1e26652ac997 lvm2 a--    2.18t    0 
  /dev/sde    ceph-50c1eccf-0e46-47fe-b9a2-4044bfbba663 lvm2 a--    2.18t    0 
  /dev/sdf    ceph-9bcad917-7c6b-4421-bde6-3365edff92bd lvm2 a--    2.18t    0 
  /dev/sdg    ceph-f97c1d1e-6765-407c-b96a-0b82d1a3ca7b lvm2 a--    2.18t    0 
  /dev/sdh    ceph-b8850d2b-a7ac-4436-9b66-d20a6486db8b lvm2 a--    2.18t    0 

See here that it decided to use one CAS device and took raw devices for the rest of the OSDs.
On top of that, the pvs, lvs and vgs commands returned nothing until I commented out the use_devicesfile = 1 line in lvm.conf. So lvm knows that it should ignore those devices, but decides not to follow the directive on boot.

@fpeg26

fpeg26 commented Nov 1, 2024

It seems that adding open-cas.service to the After= and Requires= settings of the ceph-volume@.service unit fixes the issue:

/usr/lib/systemd/system# cat ceph-volume@.service
[Unit]
Description=Ceph Volume activation: %i
After=local-fs.target open-cas.service
Requires=open-cas.service
...
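
The same ordering can also be applied as a drop-in, so a ceph package update doesn't overwrite it, for example:

# /etc/systemd/system/ceph-volume@.service.d/override.conf (created with: systemctl edit ceph-volume@.service)
[Unit]
After=open-cas.service
Requires=open-cas.service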

I will do more investigation with this, but it's promising so far.
edit: the issue is back on one node, so I guess it was false hope. I was able to reboot a few times with no problem; the issue came back after the 3rd or 4th reboot.
edit2: got 6/6 working consecutive reboots by replacing the After/Wants settings with this:

ExecStartPre=/bin/sh -c 'until [ -e /dev/cas2-4 ]; do sleep 1; done'

edit3: tried again this morning (2 days after posting edit2) and the issue is back

Also, what's the best practice when it comes to updating OpenCAS? I've been doing it on a couple of hosts, but I had to re-create the caches manually and rebuild the OSDs after the update, which can take a while because of the rebalance process.

@fpeg26

fpeg26 commented Nov 6, 2024

One more update: I updated all nodes to Ceph 19.2.0 and I see the same behavior on restart.

@robertbaldyga
Member

robertbaldyga commented Nov 7, 2024

@fpeg26 I did some experiments on Proxmox and reproduced the same behavior. It seems that Proxmox initializes all the LVM volumes in the initramfs. After setting the filter in /etc/lvm/lvm.conf I called update-initramfs -u and it worked. Interestingly, I did not observe this problem on other distros, even when the rootfs was on LVM. Let me know if that resolves the issue on your setup.

@fpeg26

fpeg26 commented Nov 8, 2024

I've been running some tests all day and update-initramfs -u definitely seems to be helping.

One of my nodes was not happy about it though and was showing the same reboot behavior, but I think I narrowed it down to the lvm filters.

Filtering by device id is totally ignored on boot; I had to use /dev/sdX paths instead.

Using this:
global_filter=["a|/dev/disk/by-id/wwn-of-OS-disk|", "a|/dev/cas.*|", "r|.*|"]
or this:
global_filter=["a|/dev/disk/by-id/wwn-of-OS-disk.*|", "a|/dev/cas.*|", "r|.*|"]
resulted in the host booting into initramfs mode, which is an improvement in the sense that I now know the filters are actually being used.
But I had to set the filters like this:
global_filter=["a|/dev/sdk.*|", "a|/dev/cas.*|", "r|.*|"]
to get a proper boot and it looks like Ceph is now using the proper cas devices.

I will report back next week after I've done more tests but it looks promising.
Thanks a lot.

@fpeg26

fpeg26 commented Nov 20, 2024

Hi,
I have been testing extensively for the past 2 weeks and have not been able to reproduce the problem since!

Still have to use devices by "name" like this:
global_filter=["a|/dev/sdk.*|", "a|/dev/cas.*|", "r|.*|"]
And I'm not entirely sure why...

The only thing I noticed is that after a reboot the lettering of the devices sdA-sdJ can be shuffled, but it doesn't impact the cluster in any way because OpenCAS is set up to use the devices by id and Ceph uses the CAS devices.
A reboot usually fixes the order.

Quick TLDR if somebody passes by looking for a fix (a sketch follows the list):

  • set up the lvm filters properly using device names, not devices by id.
  • run update-initramfs -u
  • this only seems to be required in a Proxmox environment
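
Concretely, on this cluster (where /dev/sdk is the OS disk), the fix boiled down to:

# /etc/lvm/lvm.conf, devices section
global_filter=["a|/dev/sdk.*|", "a|/dev/cas.*|", "r|.*|"]
types=["cas",16]

# rebuild the initramfs so the filter is also honoured during early boot, then reboot
update-initramfs -u
reboot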

Thanks a lot for your help!
