Recover CouchDB monitoring post Alma9 migration #12230

amaltaro · 2025-01-16T19:32:43Z

Impact of the new feature
Central CouchDB instances

Is your feature request related to a problem? Please describe.
This issue is a result of the migration of CouchDB services to Alma9, where we started adopting a CouchDB image (provided with CMSKubernetes/pull/1502) instead of the old RPM-based deployment on the OpenStack VMs.

With that migration, it looks like the vm.args file was not configured correctly: the CouchDB instances no longer have a specific node name, and the logs now show them as nonode@nohost. The vm.args file needs more attention than just that, though, because database content is actually associated with the node name, so extra care is needed once we get to this.
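For context, `nonode@nohost` is the name an Erlang VM reports when neither `-name` nor `-sname` is set, which is why it points at vm.args. Below is a minimal sketch of a check that flags this, assuming the node lists its own Erlang name in the `all_nodes`/`cluster_nodes` fields of CouchDB's `/_membership` response and that admin credentials are available. The script, its command-line arguments, and the helper names are hypothetical, not part of any existing WMCore tooling.

```python
import base64
import json
import sys
import urllib.request

# Erlang's default node name when vm.args sets neither -name nor -sname.
DEFAULT_NODE_NAME = "nonode@nohost"


def find_unnamed_nodes(membership):
    """Return the nodes in a /_membership response that still use the default Erlang name."""
    nodes = set(membership.get("all_nodes", [])) | set(membership.get("cluster_nodes", []))
    return sorted(n for n in nodes if n == DEFAULT_NODE_NAME)


def fetch_membership(base_url, user, password):
    """GET <base_url>/_membership with HTTP basic auth and decode the JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        base_url.rstrip("/") + "/_membership",
        headers={"Authorization": "Basic " + token},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical usage: python check_node_name.py http://localhost:5984 admin secret
    base_url, user, password = sys.argv[1:4]
    bad = find_unnamed_nodes(fetch_membership(base_url, user, password))
    if bad:
        print("CouchDB is running with the default node name:", ", ".join(bad))
        sys.exit(1)
    print("CouchDB node name appears to be configured")
```

Keeping the JSON inspection in a pure function makes it easy to test without a live instance. Note that fixing the name itself (a `-name` line in vm.args) is the delicate part mentioned above, since existing database content is tied to the node name.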

In addition, there is no longer any process scraping CouchDB metrics and pushing them to monitoring. IIRC, the last discussions with Aroosha suggested that we would run a second container on those VMs solely for scraping and pushing metrics upstream. This still needs to be confirmed, though.

Describe the solution you'd like
We need to recover the CouchDB monitoring dashboard for central CouchDBs.

In addition to having data available again in MonIT, properly separated by CouchDB instance, we need the proper exporters (the CouchDB exporter for Prometheus?) running alongside each CouchDB instance.
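To illustrate what "separated by each CouchDB instance" means for the exported data, here is a minimal sketch that flattens CouchDB's per-node statistics JSON (as returned by `/_node/_local/_stats`, whose leaves carry `value`, `type`, and `desc` fields) into Prometheus text-format samples with an `instance` label. This is only an illustration of per-instance labeling under those assumptions; the function names and the `couchdb_` prefix are hypothetical, and a maintained exporter, as suggested above, would be the more likely production choice.

```python
import re


def _metric_name(path):
    """Join a stats path into a Prometheus-safe metric name."""
    name = re.sub(r"[^a-zA-Z0-9_]", "_", "_".join(path))
    # Ensure a valid leading character and a common prefix for all samples.
    return name if name.startswith("couchdb") else "couchdb_" + name


def _escape_label(value):
    """Escape backslashes and double quotes for a Prometheus label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def stats_to_prometheus(stats, instance):
    """Flatten a CouchDB _stats JSON tree into Prometheus text lines.

    Only leaves whose "value" is a plain number (counters, gauges) are emitted;
    histogram leaves, whose "value" is a nested object, are skipped in this sketch.
    Every sample carries an instance label so data from different hosts stays apart.
    """
    label = _escape_label(instance)
    lines = []

    def walk(node, path):
        if isinstance(node, dict) and "value" in node and "type" in node:
            value = node["value"]
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                lines.append(f'{_metric_name(path)}{{instance="{label}"}} {value}')
            return
        if isinstance(node, dict):
            for key in sorted(node):
                walk(node[key], path + [key])

    walk(stats, [])
    return lines
```

For example, a stats tree with `couchdb.httpd.requests` equal to 10, exported for instance `vm1`, yields the single sample `couchdb_httpd_requests{instance="vm1"} 10`.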

Describe alternatives you've considered
This issue needs to be addressed with the HTTP team (Aroosha).

For extra context, a somewhat recent configuration change we made was provided with dmwm/deployment#1345.

Additional context
None

amaltaro added New Feature Monitoring CouchDB labels Jan 16, 2025

amaltaro added this to WMCore quarterly developments Jan 16, 2025

Comments

amaltaro commented Jan 16, 2025