Cache Device Incomplete and Core Devices Inactive after reboot #1215
Hi @nfronz thank you for posting the question. Did Open CAS print any information to dmesg during booting to the OS? What is the state of the cache instance after the reboot? Is it not running at all? Or is it in an incomplete state? Could you share
@mmichal10 I have the same issue as the asker above. What I did was deploy Open CAS on several Ceph nodes in a test lab using the instructions in https://open-cas.github.io/getting_started_open_cas_linux.html. Casadm version: 22.12.0.0844.master While changing one of the config options the casadm command hung, and as I couldn't kill it I rebooted the node. I assume that because I had no
Looking at the actual block devices it seems that ceph is now trying to run on the raw disks instead of the cas devices as well:
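For reference, commands along these lines show which block devices the Ceph LVs actually ended up on (illustrative only, not the original output):

```sh
lsblk -o NAME,TYPE,MOUNTPOINT   # LV stack: are the OSD LVs on sdX or on cas1-X?
pvs -o pv_name,vg_name          # LVM's view of which device backs each VG
ceph-volume lvm list            # which device each OSD was created on
```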
Is there anything that can be done to fix this situation? P.S. Here are some (what I assume) relevant dmesg logs:
Hello @brokoli18, The cache couldn't be loaded because CAS couldn't open the core devices exclusively. Would it be possible to stop the cache, detach the disks from Ceph, load the cache again, and attach the CAS exported objects (/dev/cas1-X) to Ceph?
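A sketch of that sequence with casadm follows; the cache id and the NVMe path are taken from the opencas.conf quoted at the end of this thread, and the Ceph detach step depends on the deployment:

```sh
# Stop the incomplete cache instance (cache id 1 assumed)
casadm --stop-cache --cache-id 1

# ...detach the raw /dev/sdX disks from Ceph here (stop the OSDs holding them)...

# Load the cache back from its on-disk metadata; cores reattach automatically
casadm --start-cache --load \
    --cache-device /dev/disk/by-id/nvme-Samsung_SSD_980_PRO_1TB_S5P2NG0R508005Y

# Verify the exported objects (/dev/cas1-X) are active again
casadm --list-caches
```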
Thank you for your response. What exactly do you mean by detaching the disks from Ceph? Do you mean:
For your reference, I have tried option 1 by masking the
Hi, Just letting you know that we're having the same issue in our setup. The only "fix" that I found was to add this line in the open-cas.service file:
I've also been experimenting with the startup order of the open-cas, lvm and ceph services, with no luck. Any help on this would be greatly appreciated.
It also seems that if you zap the Ceph volumes attached to the /dev/sd{a..d}1 devices like this:
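A typical zap invocation looks like the following (a hypothetical reconstruction of the stripped command; note it destroys the OSD data on those partitions):

```sh
# DESTRUCTIVE: wipes the LVM/Ceph metadata on the given partitions
for dev in /dev/sd{a..d}1; do
    ceph-volume lvm zap "$dev" --destroy
done
```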
Hello @fpeg26,
Hi @mmichal10, Thanks a lot for taking the time to have a look at this issue. If that helps, we're running this version of Open CAS:
Let me know if I can provide more details. Regards
Hi @fpeg26,
Hi, Thank you for taking the time to give this a try. Here is the content of the lvm.conf file we're using (with drive WWN IDs masked).
Unfortunately, I won't be able to upload the other 2 requested files in public. Our OpenCAS setup looks like this:
When a node fails to attach the core devices to the cache on boot, we see those lines in journalctl (one for each core device):
Ceph then uses /dev/sdX for its LVMs before Open CAS was able to attach the core devices. Just out of curiosity, when you restart a node, do you do anything special like flushing the cache or outing the OSDs on Ceph, or do you just issue a simple reboot command? Shutting down a node that's going to fail on the next reboot results in these lines in journalctl:
Also (but that's a separate issue), we're using Open CAS for a workstation workload and we noticed that the cache never flushes under the ALRU policy because the cluster never sleeps, even with an Activity Threshold set to 1 ms. For now we're doing it manually. Hopefully this will help you understand our issue a bit better. Regards
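For the manual flush mentioned above, a minimal sketch (cache id 1 assumed, matching the config quoted at the end of this thread):

```sh
casadm --flush-cache --cache-id 1             # write all dirty cache lines back to the cores
casadm --stats --cache-id 1 | grep -i dirty   # confirm the dirty count dropped to zero
```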
Hi @fpeg26, I did a little experiment and managed to reproduce the problem.
It turns out that LVM by default is configured to use a newer mechanism (the devices file) instead of filters.
After the reboot the cores were added correctly and the LVMs were detected on top of the Open CAS devices:
Let me know if that solves the problem on your configuration.
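A minimal lvm.conf fragment along the lines of that experiment might look like this (a sketch; the reject pattern is a placeholder for the actual backend disks):

```
# /etc/lvm/lvm.conf
devices {
    # Fall back from the devices file to filter-based scanning
    use_devicesfile = 0
    # First matching pattern wins: reject the raw backend disks, accept the rest
    global_filter = [ "r|^/dev/disk/by-id/wwn-.*|", "a|.*|" ]
}
```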
I also managed to get the proper behavior with
We have two redundant shutdown methods. One is this service that you just noticed failing; the other is the systemd-shutdown script, which is executed later in the shutdown process. In my tests with Ceph I also see the first one failing, but the second one correctly stops the cache (you can see it in the kernel log). Not super elegant, but good enough.
Hi @robertbaldyga, Thank you for taking the time to look into this further. Unfortunately, we don't see the LVM filter message on any of the hosts in our cluster. This is our setup to compare with yours:
I will apply the configuration change on each host to observe how it behaves with this setting in place. Thanks as well for clarifying the shutdown methods. I just wanted to confirm that shutting down a node without flushing the cache is considered a good practice in a Ceph environment. EDIT:
So it feels like we're on the wrong track 😞
@fpeg26 Are you sure there is no other reference to
The lvm.conf I posted earlier is the exact one we have on each host. Note that I filtered out both
Unless there is another file that got populated automatically that I'm not aware of, lvm.conf is the only place where I filtered them out manually. Also, I can tell that it's working, at least partially, because if I comment out the global_filter line, I get these warnings for each filtered-out device when running LVM commands:
Now, I noticed some strange behaviors while writing this comment:
So it's really inconsistent across the cluster even though they all share the exact same configuration.
My guess is that the gradual degradation is a result of using Write-Back. When LVM metadata is stored as dirty data on the cache, LVM will not recognize the backend device. Once it gets flushed to the backend, after the next reboot LVM starts on the backend device because now it can see its metadata there. Can you verify that
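A quick way to check both halves of that theory (a sketch; cache id 1 assumed):

```sh
casadm --stats --cache-id 1 | grep -i dirty   # are there dirty cache lines pending?
pvs -o pv_name,vg_name                        # which device does LVM currently treat as the PV?
```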
That would make a lot of sense. No,
Ok, so you'd need to filter out all of those links, or add one rule for the entire
Alternatively, you can try the current Open CAS master (https://github.com/Open-CAS/open-cas-linux/tree/588b7756a957417430d6eca17ccb66beae051365) with
Thanks @robertbaldyga, it's really appreciated. I have a few questions though:
The base path
or even:
Yes, if you want to move back to the backend devices, you need to allow them in the filter.
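A single rule covering the whole by-id directory might look like this (an assumption about the intended pattern; the first matching rule wins, so the accept for the CAS devices comes first):

```
devices {
    global_filter = [ "a|^/dev/cas.*|", "r|^/dev/disk/by-id/.*|", "a|.*|" ]
}
```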
So I've tested the new global_filter you suggested and unfortunately, it didn't help.
I rebooted the node; it came back with all OSDs down and empty folders under
Now I can see that LVM is excluding the 3 inactive devices, and I'm seeing entries like this in journalctl (only for those 3):
But Ceph is still using them as targets, which I don't understand:
And Ceph is actually using only those 3; the CAS devices were not selected to mount any OSDs on them. I will fix this node, let Ceph recover it, test the
Also, on an unrelated note, I was trying to compile Open CAS master and found this line that is missing a ";" at the end:
Quick update about
In the meantime, I updated Ceph (18.2.4-pve3) and Open CAS (24.09.0.0909.master) and the behavior is the same as previously; Ceph might use the /dev/sdX devices after a reboot even if those devices are excluded by the LVM filters:
I was able to "test"
It appears that this is not the right syntax, but what I was able to demonstrate here is that even when forcing LVM to use a devices file, it ignores it and still attaches the Ceph VGs to random PVs like below after a reboot:
See here that it decided to use one CAS device and took raw devices for the rest of the OSDs.
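For completeness, the devices file is normally managed with the lvmdevices tool (lvm2 >= 2.03.12); a sketch using the CAS names from this thread:

```sh
lvmdevices --adddev /dev/cas1-1   # register a CAS exported object in system.devices
lvmdevices                        # list the devices LVM will restrict itself to
```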
It seems that adding the open-cas.service to the
I will do more investigation on this, but it's promising so far.
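One way to express that kind of ordering as a systemd drop-in (the unit name is an assumption for a packaged Ceph install; run systemctl daemon-reload afterwards):

```ini
# /etc/systemd/system/ceph-osd@.service.d/after-open-cas.conf
[Unit]
After=open-cas.service
Wants=open-cas.service
```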
edit3: tried again this morning (2 days after posting edit2) and the issue is back. Also, what's the best practice when it comes to updating Open CAS? I've been doing it on a couple of hosts, but I had to re-create the caches manually and rebuild the OSDs after the update, which can take a while because of the rebalance process.
One more update: I updated all nodes to Ceph 19.2.0 and I can see the same behavior on restart.
@fpeg26 I did some experiments on Proxmox and I reproduced the same behavior. It seems that it initializes all the LVM volumes in the initramfs. After setting the filter in
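On Debian-based systems (Proxmox included) that roughly comes down to the following (a sketch):

```sh
# 1. Add the global_filter to /etc/lvm/lvm.conf (as discussed above)
# 2. Rebuild the initramfs so the early-boot LVM scan applies the same filter
update-initramfs -u -k all
reboot
```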
I've been running some tests all day and
One of my nodes was not happy about it though, showing the same reboot behavior, but I think I narrowed it down to the LVM filters. Using devices by id is totally ignored on boot, so I had to use /dev/sdX instead. Using this:
I will report back next week after I've done more tests, but it looks promising.
Hi, Still have to use the devices by "name" like this:
The only thing I noticed is that after a reboot, the lettering of the sda-sdj devices can get shuffled, but it doesn't impact the cluster in any way, because Open CAS is set up to use the devices by id and Ceph uses the CAS devices.
Quick TL;DR if somebody passes by looking for a fix:
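A recap of the combination that ended up working in this thread (a sketch; adapt the device names to your hardware):

```sh
# /etc/lvm/lvm.conf: set use_devicesfile = 0 and a global_filter that rejects
# the backend disks by their /dev/sdX names (by-id patterns were ignored at
# boot on this cluster) and accepts the /dev/cas* devices.
# Then rebuild the initramfs so the filter also applies in early boot:
update-initramfs -u -k all
# Keep /etc/opencas/opencas.conf on stable /dev/disk/by-id paths so CAS itself
# is unaffected by sdX letters reshuffling between reboots.
```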
Thanks a lot for your help!
Question
Why don't the cache and core devices activate properly on reboot?
Motivation
I broke my Ceph cluster, as I have Open CAS running on all OSDs.
Your Environment
[caches]
1   /dev/disk/by-id/nvme-Samsung_SSD_980_PRO_1TB_S5P2NG0R508005Y   wb   cache_line_size=4,ioclass_file=/etc/opencas/ansible/default.csv,cleaning_policy=alru,promotion_policy=always
[cores]
1   1   /dev/disk/by-id/wwn-0x5000cca255c01f9d-part1
1   2   /dev/disk/by-id/wwn-0x5000c500c93f187a-part1
1   3   /dev/disk/by-id/wwn-0x5000c50085e9d2eb-part1
1   4   /dev/disk/by-id/wwn-0x5000c50085b40c5b-part1
1   5   /dev/disk/by-id/wwn-0x5000c50085e47697-part1
1   6   /dev/disk/by-id/wwn-0x5000c50085e577ff-part1
1   7   /dev/disk/by-id/wwn-0x5000c50085dd1293-part1