Replies: 6 comments 2 replies
-
My configuration is on an iSCSI volume (backed by https://github.com/ressu/synology-csi) and aside from CSI issues, the configuration is stable and works as expected. But I do recall some mentions (https://www.reddit.com/r/PleX/comments/ff4a59/plex_hangs_with_library_and_database_on_nfs/) of issues when the database is on NFS. There are some mentions of locking issues in the discussion there, but if your issues are resolved by simply recreating the PVC (effectively remounting the volume), then locking is an unlikely cause; locking issues would show up as data corruption instead. Have you tried rebooting the node? While it is disruptive to the other pods running on the node, it would confirm whether this is a mount issue or an issue somewhere else.
-
It isn't fixed by remounting, only by deleting the PVC, recreating it, and mounting the new PVC. It definitely appears to be corruption. The issues occur within an hour or two of the initial discovery and loading of the available media files into the libraries. I've started over several times with different mount options, with no luck.
My main share is on a 2019 Windows server that I mount on a Linux system, which re-exports it over NFS. I then use that NFS share for dynamic provisioning in the cluster. It seems to work for everything so far, except for Plex and Nextcloud. I am concerned that everything else I have recently set up will become corrupted as well. Looks like I need a better solution. Looking for ideas, though I wasn't expecting a sudden expense...
-
Oh, so you need to delete the data to get everything back running. Got it. I'm not sure how locking with Windows 2019 works, so you could try setting local_lock=all in the NFS mount options. On the upside, the way my fork of kube-plex is built avoids mounting the configuration on anything but the main Plex pod. I did this because multi-mounting an ext4 filesystem isn't supported and I was too lazy to seek out other alternatives 😆
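For reference, `local_lock=all` (which keeps flock/POSIX locks client-local rather than sending them to the server, so it is only safe when a single client touches the files) can be tested with a manual mount before wiring it into the cluster. Server name and paths below are placeholders:

```
# Manual mount for testing (nfs-server and paths are placeholders):
mount -t nfs -o hard,nfsvers=4.2,local_lock=all nfs-server:/export/plex-config /mnt/plex-config

# Equivalent /etc/fstab entry:
nfs-server:/export/plex-config  /mnt/plex-config  nfs  hard,nfsvers=4.2,local_lock=all  0  0
```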
-
Actually, I got some details wrong. My data folder is a CIFS mount, but my config mount is an NFS share hosted on a Linux system backed by a local folder; the local folder is on a locally mounted drive (a virtual disk). So the NFS share is just a disk on a Linux system, which at least removes one layer. I've learned that if I wait a while, the Plex server comes back up. It's very odd; the logs don't give any clues as far as I can tell. Still looking into local_lock=all. I can log into the pod, install telnet, connect to 32400, and do a GET /web/index.html, and it will just hang, most of the time eventually working after several minutes. Though once it took so long I just restarted the pod. Continuing to investigate...
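As an aside, when the server hangs for minutes like this, a liveness probe can at least restart the pod automatically instead of requiring a manual restart. A minimal sketch, assuming Plex's default port 32400 and its lightweight /identity status endpoint; the timing values are illustrative, not tuned:

```yaml
# Illustrative liveness probe for the Plex container spec.
# /identity is a small XML status endpoint served by Plex Media Server.
livenessProbe:
  httpGet:
    path: /identity
    port: 32400
  initialDelaySeconds: 60   # give Plex time to start
  periodSeconds: 15
  timeoutSeconds: 10        # treat long hangs as failures
  failureThreshold: 4       # ~1 minute of failures before restart
```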
-
Started working after I did two things:
I also switched to Longhorn (iSCSI), but I think my NFS setup would also have worked with RWO. Without RWX I can't play around with transcoding running in a separate pod, so I'm unable to take advantage of resources on other nodes. Well, one thing at a time; at least I can retire my Docker VM now and stay in Kubernetes. Thank you for your help.
-
Discussed this with Plex and ended up completely abandoning NFS in my Kubernetes clusters, and I recommend anyone using NFS do the same. Don't believe anyone when they tell you NFS or CIFS has fully functional file locking and can be used as a Kubernetes storage solution. Even if they work for a while, data corruption is inevitable. I ended up going with Longhorn, which uses a local disk on each worker node to allocate PVCs. It uses iSCSI as needed without you having to be involved in the iSCSI setup at all; it does everything other than installing iSCSI itself. Longhorn and other similar products also create multiple replicas of each PVC, so if you were to lose a worker node completely, the data would not be lost. I've had zero issues since switching to this solution and can hardly believe how fast Plex is. Also, I should be able to re-enable kube-plex now and enjoy this project again with its separate transcoding pod. My media folders are still mounted via CIFS, but this is OK as they are only used in a read-only manner (except for occasionally deleting something via the GUI). See the discussion here:
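For anyone following the same route, here is a minimal Longhorn StorageClass sketch. The class name and values are illustrative; the parameter names follow Longhorn's driver.longhorn.io CSI driver, but check the Longhorn docs for the full list:

```yaml
# Illustrative Longhorn StorageClass; values here are examples only.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-replicated
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
parameters:
  numberOfReplicas: "3"        # replicas spread across nodes, so losing a worker doesn't lose data
  staleReplicaTimeout: "2880"  # minutes before a stale replica is cleaned up
```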
-
I've learned the hard way that Plex has issues with using an NFS share as its config folder. The plexinc/pms-docker image notes that the issue is around file locking not being enabled by default with NFS.
In my StorageClass NFS mount options I'm using 'hard' and 'nfsvers=4.2', yet I still watch in amazement as my Plex server stops working after a while and seems to only come back to a working condition by deleting the deployment, erasing the config PVC, recreating the PVC, and restarting.
My NFS server and all my cluster nodes are running the latest CentOS 8 Stream.
If you are using NFS for your config mount, what mount options are you using (on both the server and client side)? If you are using something else, what has worked for you as an on-prem storage solution to keep /config on as a network share?
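For concreteness, mount options like these are typically expressed on the StorageClass itself; the provisioner name below is a placeholder for whatever dynamic NFS provisioner is in use:

```yaml
# Illustrative StorageClass with NFS mount options; the provisioner
# name is a placeholder, not a real component from this thread.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-config
provisioner: example.com/nfs   # placeholder for your NFS provisioner
mountOptions:
  - hard
  - nfsvers=4.2
```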