New metricbeat module: pgbouncer #41174

manuelsaks · 2024-10-08T13:57:20Z

WHAT:
The module follows the standard design patterns of Metricbeat modules, ensuring consistency with other database modules like PostgreSQL. It implements the MetricSet interface to collect metrics from PgBouncer.
The module interacts with PgBouncer using SQL queries to gather pool statistics and server performance data. The data is parsed and processed through Metricbeat's internal processing pipeline. It communicates with PgBouncer using the native Postgres protocol over TCP. It sends queries to fetch relevant metrics and processes the responses, converting them into Metricbeat-compatible events.
WHY:
PgBouncer is widely used as a connection pooler for PostgreSQL, and monitoring its performance is crucial for maintaining the health of database-backed systems. With this module, Metricbeat gives users out-of-the-box monitoring for PgBouncer, so they can track metrics such as connection pool utilization, server performance, and query rates. This helps users maintain database performance, reduce latency, and prevent resource exhaustion in their systems.

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have made corresponding changes to the default configuration files
I have added tests that prove my fix is effective or that my feature works
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

It shouldn't impact users.

Author's Checklist

Code follows the established coding guidelines and style.
Proper error handling is in place.
Tests pass and coverage is above 80%.
Code builds as expected.
Data collection is efficient, and no unnecessary queries are run.
Documentation is updated to reflect the new module, including how to configure and use it.

How to test this PR locally

Run the docker-compose stack and wait until Kibana is reachable on localhost:5601:

docker-compose -f ./metricbeat/module/pgbouncer/docker-compose.yml up -d

Run the metricbeat agent:

go run ./metricbeat/main.go -c ./metricbeat/module/pgbouncer/_meta/config_local.yml

Open Kibana, create a Data View with the test pattern, and check the documents in Discover.

cla-checker-service · 2024-10-08T13:57:26Z

💚 CLA has been signed

mergify · 2024-10-08T13:58:40Z

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b module_pgbouncer upstream/module_pgbouncer
git merge upstream/main
git push upstream module_pgbouncer

mergify · 2024-10-08T13:58:41Z

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @manuelsaks? 🙏.
For such, you'll need to label your PR with:

The upcoming major version of the Elastic Stack
The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fix up this pull request, you need to add the backport labels for the needed
branches, such as:

backport-8.\d is the label to automatically backport to the 8.\d branch. \d is the digit

mergify · 2024-10-08T13:58:42Z

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

elasticmachine · 2024-10-08T14:34:19Z

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

pierrehilbert · 2024-10-08T14:35:10Z

Thanks @manuelsaks for your contribution.
As a first step before we review your work, could you please sign our CLA?

pierrehilbert · 2024-10-08T14:37:57Z

@lalit-satapathy as this is related to PostgreSQL, would you mind assigning someone from your team to review it?

pierrehilbert · 2024-10-11T15:20:58Z

/test

pierrehilbert · 2024-10-13T12:46:08Z

/test

pierrehilbert · 2024-10-14T07:11:22Z

We are moving in the right direction, one last error:

__________________________________________________________________ Test.test_index_management __________________________________________________________________

self = <test_xpack_base.Test testMethod=test_index_management>

    @unittest.skipUnless(INTEGRATION_TESTS, "integration test")
    def test_index_management(self):
        """
        Test that the template can be loaded with `setup --index-management`
        """
        es = Elasticsearch([self.get_elasticsearch_url()])
        self.render_config_template(
            modules=[{
                "name": "apache",
                "metricsets": ["status"],
                "hosts": ["localhost"],
            }],
            elasticsearch={"host": self.get_elasticsearch_url()},
        )
        exit_code = self.run_beat(extra_args=["setup", "--index-management", "-E", "setup.template.overwrite=true"])

>       assert exit_code == 0
E       assert 1 == 0

../../metricbeat/tests/system/test_base.py:57: AssertionError

pierrehilbert · 2024-10-14T10:06:04Z

/test

manuelsaks · 2024-10-14T10:16:02Z

The problem is:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "composable template [metricbeat-9.0.0] template after composition is invalid"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "composable template [metricbeat-9.0.0] template after composition is invalid",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "invalid composite mappings for [metricbeat-9.0.0]",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Limit of total fields [10000] has been exceeded"
      }
    }
  },
  "status": 400
}

This occurs when running the test_index_management function in beats/metricbeat/tests/system/test_base.py:

def test_index_management(self):
    """
    Test that the template can be loaded with `setup --index-management`
    """
    es = Elasticsearch([self.get_elasticsearch_url()])
    self.render_config_template(
        modules=[{
            "name": "apache",
            "metricsets": ["status"],
            "hosts": ["localhost"],
        }],
        elasticsearch={"host": self.get_elasticsearch_url()},
    )
    exit_code = self.run_beat(extra_args=["setup", "--index-management", "-E", "setup.template.overwrite=true"])

    assert exit_code == 0
    assert self.log_contains('Loaded index template')
    assert len(es.cat.templates(name='metricbeat-*', h='name')) > 0

If I raise the total field limit above its default of 10000 by changing this call:

exit_code = self.run_beat(extra_args=["setup", "--index-management", "-E", "setup.template.overwrite=true"])

so that it sets the limit to 10090:

exit_code = self.run_beat(extra_args=["setup", "--index-management", "-E", "setup.template.overwrite=true", "-E", "setup.template.settings.index.mapping.total_fields.limit=10090"])

the test passes.

Please advise on what I should do next.

pierrehilbert · 2024-10-21T13:54:10Z

/test

pierrehilbert · 2024-10-22T13:42:12Z

/test

pierrehilbert · 2024-11-06T16:37:49Z

/test

pierrehilbert · 2024-11-13T16:23:40Z

/test

pierrehilbert · 2024-11-18T11:46:52Z

/test

pierrehilbert · 2024-11-18T13:11:29Z

run docs-build

pierrehilbert · 2024-11-18T13:12:42Z

@manuelsaks Sorry for the extra delay, we found the root cause of the issue you were facing and it has been fixed.
@lalit-satapathy could we please have someone from your team to review here?

lalit-satapathy · 2024-11-26T11:21:13Z

@lalit-satapathy could we please have someone from your team to review here?

@kush-elastic can you please review this new beats module?

kush-elastic · 2024-12-09T05:43:33Z

metricbeat/module/pgbouncer/lists/_meta/fields.yml

+    - name: free_clients
+      type: long
+      description: >
+        Count of free clients. These are clients that are disconnected, but PgBouncer keeps the memory around that was allocated for them so it can be reused for future clients to avoid allocations.
+    - name: used_clients
+      type: long
+      description: >
+        Count of used clients.
+    - name: login_clients
+      type: long
+      description: >
+        Count of clients in login state.


What do you think of normalizing these fields into nested JSON objects?

{ "clients": { "free":0, "used":0, "login":0 } }

The same goes for the servers fields. It will make the document easier for users to understand.
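
To make the suggestion concrete, here is a minimal sketch of the regrouping in Go. The helper name nestBySuffix and the free_servers key are assumptions made for illustration; they are not taken from the module's code.

```go
package main

import (
	"fmt"
	"strings"
)

// nestBySuffix is a hypothetical helper, not part of the module.
// It moves keys such as "free_clients" under a shared parent object,
// producing {"clients": {"free": ...}}. Keys whose suffix is not
// listed in groups are copied through unchanged.
func nestBySuffix(flat map[string]int64, groups []string) map[string]interface{} {
	out := make(map[string]interface{}, len(flat))
	for key, value := range flat {
		nested := false
		for _, group := range groups {
			suffix := "_" + group
			if !strings.HasSuffix(key, suffix) || len(key) == len(suffix) {
				continue
			}
			child, ok := out[group].(map[string]int64)
			if !ok {
				child = make(map[string]int64)
				out[group] = child
			}
			child[strings.TrimSuffix(key, suffix)] = value
			nested = true
			break
		}
		if !nested {
			out[key] = value
		}
	}
	return out
}

func main() {
	// free_servers is an assumed example of a servers field.
	flat := map[string]int64{
		"free_clients":  0,
		"used_clients":  0,
		"login_clients": 0,
		"free_servers":  0,
	}
	fmt.Println(nestBySuffix(flat, []string{"clients", "servers"}))
}
```

In a real metricset the same grouping would more likely be expressed in the schema and in fields.yml than computed at runtime; the sketch only shows the target shape.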

79260f5
0c5ed2d

kush-elastic

Added comments

kush-elastic · 2024-12-09T05:49:33Z

metricbeat/module/pgbouncer/lists/lists.go

+func (m *MetricSet) Fetch(reporter mb.ReporterV2) error {
+	ctx := context.Background()
+	results, err := m.QueryStats(ctx, "SHOW LISTS;")


If you need a context here, you should use the ReportingMetricSetV2WithContext interface, which gives you Fetch(ctx context.Context, r ReporterV2) error.
Avoid creating a Background context inside Fetch.

aba1559
2046288

kush-elastic · 2024-12-09T08:48:47Z

metricbeat/module/pgbouncer/lists/lists.go

+	for _, s := range results {
+		listValue, ok := s["list"].(string)
+		if !ok {
+			return fmt.Errorf("expected string type for 'list' but got something else")


Are we supposed to stop collecting the remaining results and just return an error from here?

Yeah, it might be better to log a warning instead of returning an error here, so that we can still process the remaining valid results. What do you think?
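
A minimal sketch of the skip-and-warn approach discussed above, assuming rows arrive as maps. The helper name collectLists and the sample list names are hypothetical, and the standard library logger is used only to keep the example self-contained.

```go
package main

import (
	"fmt"
	"log"
)

// collectLists is a hypothetical helper, not the module's code.
// It keeps every row whose "list" value is a string and reports the
// others through warn instead of aborting the whole fetch.
func collectLists(rows []map[string]interface{}, warn func(format string, args ...interface{})) []string {
	lists := make([]string, 0, len(rows))
	for i, row := range rows {
		name, ok := row["list"].(string)
		if !ok {
			warn("skipping row %d: expected string for 'list', got %T", i, row["list"])
			continue
		}
		lists = append(lists, name)
	}
	return lists
}

func main() {
	// Sample rows are made up; the second one has the wrong type.
	rows := []map[string]interface{}{
		{"list": "databases"},
		{"list": 42},
		{"list": "users"},
	}
	fmt.Println(collectLists(rows, log.Printf)) // prints [databases users]
}
```

Taking the warning function as a parameter lets the metricset pass in its own logger rather than the global one.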

kush-elastic · 2024-12-09T08:50:19Z

metricbeat/module/pgbouncer/lists/lists_integration_test.go

+	if len(errs) > 0 {
+		t.Fatalf("Expected 0 error, had %d. %v\n", len(errs), errs)
+	}
+	assert.NotEmpty(t, events)


Suggested change

-	if len(errs) > 0 {
-		t.Fatalf("Expected 0 error, had %d. %v\n", len(errs), errs)
-	}
-	assert.NotEmpty(t, events)
+	require.Empty(t, errs, "Expected no errors during fetch")
+	require.NotEmpty(t, events, "Expected to receive at least one event")

kush-elastic · 2024-12-09T08:55:23Z

metricbeat/module/pgbouncer/mem/_meta/data.json

+            "credentials_cache": {
+              "size": 616,
+              "used": 1,
+              "free": 49,
+              "memtotal": 30800
+            },
+            "peer_pool_cache": {
+              "size": 616,
+              "used": 1,
+              "free": 49,
+              "memtotal": 30800
+            },


Not sure if this is applicable:
How about nesting these under a common cache object?
If you add more fields to the metricset in the future, the data will be easier to manage.

Suggested change

-            "credentials_cache": {
-              "size": 616,
-              "used": 1,
-              "free": 49,
-              "memtotal": 30800
-            },
-            "peer_pool_cache": {
-              "size": 616,
-              "used": 1,
-              "free": 49,
-              "memtotal": 30800
-            },
+{
+  "cache": {
+    "credentials": {
+      "size": 616,
+      "used": 1,
+      "free": 49,
+      "memtotal": 30800
+    },
+    "peer_pool": {
+      "size": 616,
+      "used": 1,
+      "free": 49,
+      "memtotal": 30800
+    }
+  }
+}

These are the only cache metrics, so I'm not sure if extracting them is necessary.

kush-elastic · 2024-12-09T08:56:38Z

metricbeat/module/pgbouncer/mem/mem.go

+func (m *MetricSet) Fetch(reporter mb.ReporterV2) error {
+	// Create a new context for this operation.
+	ctx := context.Background()


The same goes here: you can use ReportingMetricSetV2WithContext.

aba1559
2046288

kush-elastic · 2024-12-09T09:01:14Z

metricbeat/module/pgbouncer/mem/mem.go

+		tmpData, err := schema.Apply(result)
+		if err != nil {
+			// Log the error and skip this iteration if schema application fails.
+			log.Printf("Error applying schema: %v", err)


BaseMetricSet should already have a logger. Please initialize it and use it across the module.

kush-elastic · 2024-12-09T09:01:55Z

metricbeat/module/pgbouncer/mem/mem_integration_test.go

+	if len(errs) > 0 {
+		t.Fatalf("Expected 0 error, had %d. %v\n", len(errs), errs)
+	}
+	assert.NotEmpty(t, events)


Please refer to the previous comment and update accordingly.

Co-authored-by: Kush Rana <[email protected]>

manuelsaks requested review from a team as code owners October 8, 2024 13:57

manuelsaks requested review from faec and leehinman October 8, 2024 13:57

botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Oct 8, 2024

mergify bot assigned manuelsaks Oct 8, 2024

mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Oct 8, 2024

manuelsaks force-pushed the module_pgbouncer branch from 3f19266 to 22fe659 Compare October 8, 2024 14:26

pierrehilbert added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Oct 8, 2024

botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Oct 8, 2024

adds pgbouncer module

458f523

manuelsaks added 12 commits October 8, 2024 16:38

adds pgbouncer module

c366ea0

add metricsets

9754c9d

fix list metricset

54bbcc7

fix

0d688c2

add mem metricset

21a248c

add mem metricset

4731009

add mem metricset

ab504d1

simplify metricsets

5cfe4f4

simplify metricsets

182af1d

add meta

ad0a653

add tests

a7d301a

cleanup

a929b0b

Merge branch 'main' into module_pgbouncer

044a889

pierrehilbert requested a review from lalit-satapathy October 14, 2024 13:03

Merge branch 'main' into module_pgbouncer

ceb5f55

Merge branch 'main' into module_pgbouncer

6667c4d

shmsr mentioned this pull request Nov 5, 2024

x-pack/metricbeat/module/openai: Add new module #41516

Merged

8 tasks

Merge branch 'main' into module_pgbouncer

6e7d90d

Merge branch 'main' into module_pgbouncer

75b3dde

Merge branch 'main' into module_pgbouncer

0a23250

kush-elastic reviewed Dec 9, 2024

manuelsaks and others added 6 commits December 10, 2024 20:03

Refactor: Inline resultMap assignment

2222740

Co-authored-by: Kush Rana <[email protected]>

Normalize fields to json

79260f5

Refactor: Use context in Fetch method

aba1559

Refactor: Use context in Fetch method

7973937

update normalized data

0c5ed2d

Refactor: Use context in Fetch method

2046288
