OSD permission denied when using OSD check plugins with Nagios 4.3.4 / NRPE 3.2.1 #36

Open
Kbirkland opened this issue Jan 10, 2018 · 6 comments


@Kbirkland

When attempting to check OSD with Nagios using NRPE, I am getting the following error:
OSD ERROR: 2018-01-10 14:18:26.252441 7f67360c7700 -1 asok(0x7f6730001680) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-client.nagios.1784273.140081163671472.asok': (13) Permission denied
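The message shows the Ceph client library failing to create its admin socket under /var/run/ceph when the plugin starts; error 13 (EACCES) from bind() means the calling user may not create files in that directory. A minimal Python sketch to reproduce just that step as a given user, assuming /var/run/ceph is the directory in question (taken from the error above). The helper `try_bind_unix_socket` and the probe file name are hypothetical and not part of the plugin:

```python
import os
import socket
import sys
from typing import Tuple


def try_bind_unix_socket(directory: str) -> Tuple[bool, str]:
    """Try to bind a throwaway UNIX domain socket in `directory`.

    Hypothetical diagnostic helper. Returns (True, path) on success or
    (False, "(errno) message") on failure, mirroring the bind() that fails
    with "(13) Permission denied" in the plugin output.
    """
    # Hypothetical probe name; the Ceph client uses its own naming pattern.
    path = os.path.join(directory, f"probe.{os.getpid()}.asok")
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
    except OSError as exc:
        return False, f"({exc.errno}) {os.strerror(exc.errno)}"
    finally:
        sock.close()
    # bind() created a socket file; remove it again.
    os.unlink(path)
    return True, path


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/var/run/ceph"
    ok, detail = try_bind_unix_socket(target)
    print("OK" if ok else "FAILED", detail)
```

Run as nagios, it should fail with (13) if the directory permissions are the problem; run as root it will normally succeed, since root bypasses the directory permission check.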

I followed the documentation to create the keyring file for the nagios user:
ceph auth get-or-create client.nagios mon 'allow r' > /etc/ceph/client.nagios.keyring
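The keyring written by that command is a small INI-style file with a [client.nagios] section holding a `key = ...` line. As a hedged sketch, assuming that layout, Python's configparser can confirm that the file is readable by the current user and contains the section matching `--id nagios`. The function name `keyring_has_client` is hypothetical:

```python
import configparser


def keyring_has_client(path: str, client_id: str) -> bool:
    """Return True if the keyring at `path` has a [client.<id>] section with a key.

    Hypothetical helper. Opening the file raises PermissionError if the
    current user cannot read it, which is the first thing to rule out when
    a check works as root but not as nagios.
    """
    # interpolation=None: base64 keys must be taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    with open(path, encoding="utf-8") as handle:
        parser.read_file(handle)
    # has_option() returns False (no exception) when the section is missing.
    return parser.has_option(f"client.{client_id}", "key")
```

For example, `keyring_has_client("/etc/ceph/client.nagios.keyring", "nagios")` run as the nagios user should return True once permissions are correct.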

Thank you,

Karl Birkland

@valerytschopp
Contributor

valerytschopp commented Jan 11, 2018

What command are you using? (e.g. ./check_ceph_osd -H ceph-storage-0.example.com -k client.nagios.keyring -i nagios)

@Kbirkland
Author

[root@example ~ ]# /usr/local/nagios/libexec/check_ceph_osd --id nagios --keyring /etc/ceph/client.nagios.keyring -H 10.x.y.z -I 28
OSD OK
Up OSDs: osd.28
Down+In OSDs:
Down+Out OSDs:
| 'osd_up'=1 'osd_down_in'=0;;2 'osd_down_out'=0;;2
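The text after the `|` is Nagios performance data: space-separated `'label'=value;warn;crit;min;max` entries, so `'osd_down_in'=0;;2` means value 0, no warning threshold, critical threshold 2. A small parser sketch shows how those fields split; `parse_perfdata` is a hypothetical helper, not part of the plugin:

```python
import re
from typing import Dict, Optional

# One perfdata entry: a single-quoted or bare label, '=', then the
# ';'-separated fields value[UOM];warn;crit;min;max.
_ENTRY = re.compile(r"('(?:[^']|'')+'|[^'=\s]+)=(\S+)")
_FIELDS = ("value", "warn", "crit", "min", "max")


def parse_perfdata(output: str) -> Dict[str, Dict[str, Optional[str]]]:
    """Parse the perfdata part of a plugin's output (everything after '|')."""
    perfdata = output.split("|", 1)[-1]
    result: Dict[str, Dict[str, Optional[str]]] = {}
    for label, fields in _ENTRY.findall(perfdata):
        if label.startswith("'"):
            # Strip the quotes and undo '' escaping inside quoted labels.
            label = label[1:-1].replace("''", "'")
        parts = fields.split(";") + [""] * len(_FIELDS)
        # Empty fields (e.g. the missing warn in "0;;2") become None.
        result[label] = {name: (parts[i] or None) for i, name in enumerate(_FIELDS)}
    return result
```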

[root@example libexec]# su - nagios

[nagios@example~]$ /usr/local/nagios/libexec/check_ceph_osd --id nagios --keyring /etc/ceph/client.nagios.keyring -H 10.x.y.z -I 28
OSD ERROR: 2018-01-11 07:14:43.368155 7fe62f823700 -1 asok(0x7fe628001680) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-client.nagios.698493.140626490300336.asok': (13) Permission denied

[root@example] ceph auth list
client.nagios
key: key-removed-for-security reasons==
caps: [mon] allow r

For security reasons, I would prefer NOT to use the sudo option to run this command.
We use NRPE to execute the command, but as you can see, even when run locally as the nagios user it does not work.

@valerytschopp
Contributor

It is probably a file permission problem. Here is what I have for the user nagios:

$ id
uid=6001(nagios) gid=792(nagios) groups=792(nagios),4(adm)

$ ls -l /etc/ceph/ceph.conf 
-rw-r--r-- 1 root root 578 Dec 21 13:50 /etc/ceph/ceph.conf

$ ls -l /etc/ceph/ceph.client.nagios.keyring 
-rw-r----- 1 root nagios 65 Jan 11 10:59 /etc/ceph/ceph.client.nagios.keyring

$ ceph --id nagios health
HEALTH_OK

$ ceph --id nagios osd dump
epoch 192705
fsid ...
...
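These listings work for nagios because the keyring is mode 0640 with group nagios (group read) and ceph.conf is 0644 (world read). The decision can be sketched in Python as a simplified model of the standard Unix owner/group/other check; it ignores root's override, POSIX ACLs and SELinux, and `can_read` / `mode_from_ls` are hypothetical helpers:

```python
import stat
from typing import Iterable

_LS_FLAGS = (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR,
             stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP,
             stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH)


def mode_from_ls(perm: str) -> int:
    """Convert an `ls -l` string like '-rw-r-----' to permission bits (simplified)."""
    bits = 0
    for char, flag in zip(perm[1:10], _LS_FLAGS):
        if char != "-":
            bits |= flag
    return bits


def can_read(mode: int, owner_uid: int, owner_gid: int,
             uid: int, gids: Iterable[int]) -> bool:
    """Simplified read check: owner bits, else group bits, else other bits."""
    if uid == owner_uid:
        return bool(mode & stat.S_IRUSR)
    if owner_gid in set(gids):
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)
```

With the ids above (nagios uid 6001, groups 792 and 4, root uid/gid 0), both files come out readable, which matches the working `ceph --id nagios` calls.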

@Kbirkland
Author

Adjusting the permissions on the keyring file helped, but only when checking Ceph locally. When run through NRPE, it still fails with the same Permission denied error.

xinetd is configured to run NRPE as the user 'nagios'.

Running locally:

[root@ceph1 libexec]# ./check_ceph_osd -H 10.x.y.z -i nagios -k /etc/ceph/client.nagios.keyring -I 0
OSD OK
Up OSDs: osd.0
Down+In OSDs:
Down+Out OSDs:
| 'osd_up'=1 'osd_down_in'=0;;2 'osd_down_out'=0;;2

Running through NRPE:

[root@nagios]# /usr/local/nagios/libexec/check_nrpe -H 10.x.y.z -c check_ceph_osd -a 10.x.y.z 0
OSD ERROR: 2018-01-12 07:59:29.591544 7f7f5902b700 -1 asok(0x7f7f54001680) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-client.nagios.2213551.140184846866352.asok': (13) Permission denied

NRPE config on the Ceph server for the OSD check:
command[check_ceph_osd]=/usr/local/nagios/libexec/check_ceph_osd --id nagios --keyring /etc/ceph/client.nagios.keyring -H $ARG1$ -I $ARG2$
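With this definition, the values passed after `check_nrpe -a` are substituted in order into $ARG1$, $ARG2$, ... on the remote side (NRPE only accepts such arguments when `dont_blame_nrpe=1` is set in nrpe.cfg). A hedged Python sketch of that substitution, just to show which value ends up in -H and which in -I; `expand_nrpe_command` is hypothetical and does not reproduce NRPE's own checks for illegal metacharacters:

```python
import re
from typing import Sequence


def expand_nrpe_command(template: str, args: Sequence[str]) -> str:
    """Replace $ARG1$..$ARGn$ in an NRPE command template with `args`.

    Hypothetical sketch: in this version, macros without a matching
    argument simply become empty strings.
    """
    def substitute(match: "re.Match[str]") -> str:
        index = int(match.group(1)) - 1  # $ARG1$ is the first argument
        return args[index] if 0 <= index < len(args) else ""

    return re.sub(r"\$ARG(\d+)\$", substitute, template)
```

For `check_nrpe ... -a 10.x.y.z 0`, the expanded command ends in `-H 10.x.y.z -I 0`.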

@valerytschopp
Contributor

So if I understand this NRPE line /usr/local/nagios/libexec/check_nrpe -H 10.x.y.z -c check_ceph_osd -a 10.x.y.z 0 correctly, it means: run the plugin check_ceph_osd on remote host 10.x.y.z with the parameters ARG1=10.x.y.z and ARG2=0.

But there is a misunderstanding here. The plugin check_ceph_osd should not run directly on the OSD host; it should run on the same host where the local run works (e.g. ceph1). So the -H IP_ADDRESS parameter should be the IP address of your ceph1 host, and in the -a ARG1 ARG2 parameters, ARG1 should be the IP address of the OSD storage node...

Sorry, we don't use NRPE here, but MRPE. The plugin runs on a simple node configured as a Ceph client: not a MON or OSD host, just a node with the ceph-common package, the correct /etc/ceph/ceph.conf file, and the keyrings.

@Kbirkland
Author

Yes, the check_nrpe command runs from the Nagios server, making a remote call to the CEPHX node, where it runs check_ceph_osd -H "CEPHX-IP Address" -I 0.
I can run check_ceph_osd on the CEPHX host with the same command-line arguments and get valid output. Only when it is run through NRPE do I get the permission problem. We use xinetd to handle NRPE calls, and it is configured to run them as the user nagios.
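Since the same command works interactively as nagios but fails under xinetd/NRPE, one way to narrow it down is to compare the process identity and environment in both cases; for instance, xinetd's `groups` attribute decides whether the child gets supplementary groups at all. A hedged diagnostic sketch (hypothetical, not part of the plugin) that could be run once from a `su - nagios` shell and once through a temporary NRPE command, then diffed:

```python
import grp
import json
import os
import pwd


def _user_name(uid: int) -> str:
    # Fall back to the numeric id if there is no passwd entry.
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def _group_name(gid: int) -> str:
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def describe_identity() -> dict:
    """Collect identity and environment details of the current process."""
    uid, gid = os.getuid(), os.getgid()
    return {
        "uid": uid,
        "euid": os.geteuid(),
        "user": _user_name(uid),
        "gid": gid,
        "group": _group_name(gid),
        # Supplementary groups may differ between a login shell and xinetd.
        "groups": sorted(os.getgroups()),
        "cwd": os.getcwd(),
        # CEPH_ARGS is read by Ceph command-line tools as extra arguments.
        "env": {k: os.environ.get(k) for k in ("HOME", "PATH", "CEPH_ARGS")},
    }


if __name__ == "__main__":
    print(json.dumps(describe_identity(), indent=2, sort_keys=True))
```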
