Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New metricbeat module: pgbouncer #41174

Open
wants to merge 52 commits into
base: main
Choose a base branch
from

Conversation

manuelsaks
Copy link

@manuelsaks manuelsaks commented Oct 8, 2024

  • WHAT:
    The module follows the standard design patterns of Metricbeat modules, ensuring consistency with other database modules like PostgreSQL. It implements the MetricSet interface to collect metrics from PgBouncer.
    The module interacts with PgBouncer using SQL queries to gather pool statistics and server performance data. The data is parsed and processed through Metricbeat's internal processing pipeline. It communicates with PgBouncer using the native Postgres protocol over TCP. It sends queries to fetch relevant metrics and processes the responses, converting them into Metricbeat-compatible events.
  • WHY:
    PGBouncer is widely used as a connection pooler for PostgreSQL, and monitoring its performance is crucial for maintaining the health of database-backed systems. By adding this module, Metricbeat provides users with out-of-the-box monitoring capabilities for PGBouncer, enabling them to track metrics such as connection pool utilization, server performance, and query rates. This integration helps users maintain optimal database performance, reduce latency, and prevent resource exhaustion in their systems.
    -->

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

It shouldn't impact users.

Author's Checklist

  • Code follows the established coding guidelines and style.
  • Proper error handling is in place.
  • Tests passing and the coverage > 80%.
  • Code builds as expected.
  • Data collection is efficient, and no unnecessary queries are run.
  • Documentation is updated to reflect the new module, including how to configure and use it.

How to test this PR locally

  1. Run the docker-compose stack and wait until you will be able to reach kibana on localhost:5601
docker-compose -f ./metricbeat/module/pgbouncer/docker-compose.yml up -d
  1. Run the metricbeat agent:
go run ./metricbeat/main.go -c ./metricbeat/module/pgbouncer/_meta/config_local.yml
  1. Go to kibana panel, create Data View with the test pattern and check logs in discover.

@manuelsaks manuelsaks requested review from a team as code owners October 8, 2024 13:57
@manuelsaks manuelsaks requested review from faec and leehinman October 8, 2024 13:57
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Oct 8, 2024
Copy link

cla-checker-service bot commented Oct 8, 2024

💚 CLA has been signed

Copy link
Contributor

mergify bot commented Oct 8, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b module_pgbouncer upstream/module_pgbouncer
git merge upstream/main
git push upstream module_pgbouncer

Copy link
Contributor

mergify bot commented Oct 8, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @manuelsaks? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit

Copy link
Contributor

mergify bot commented Oct 8, 2024

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Oct 8, 2024
@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Oct 8, 2024
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Oct 8, 2024
@pierrehilbert
Copy link
Collaborator

Thanks @manuelsaks for your contribution.
As a first step before we are reviewing your work, could you please sign our CLA?

@pierrehilbert
Copy link
Collaborator

@lalit-satapathy as this is related to PostgreSQL, would you mind assigning someone from your team for the reviewing part?

@pierrehilbert
Copy link
Collaborator

/test

1 similar comment
@pierrehilbert
Copy link
Collaborator

/test

@pierrehilbert
Copy link
Collaborator

We are moving in the right direction, one last error:

__________________________________________________________________ Test.test_index_management __________________________________________________________________
--
  |  
  | self = <test_xpack_base.Test testMethod=test_index_management>
  |  
  | @unittest.skipUnless(INTEGRATION_TESTS, "integration test")
  | def test_index_management(self):
  | """
  | Test that the template can be loaded with `setup --index-management`
  | """
  | es = Elasticsearch([self.get_elasticsearch_url()])
  | self.render_config_template(
  | modules=[{
  | "name": "apache",
  | "metricsets": ["status"],
  | "hosts": ["localhost"],
  | }],
  | elasticsearch={"host": self.get_elasticsearch_url()},
  | )
  | exit_code = self.run_beat(extra_args=["setup", "--index-management", "-E", "setup.template.overwrite=true"])
  |  
  | >       assert exit_code == 0
  | E       assert 1 == 0
  |  
  | ../../metricbeat/tests/system/test_base.py:57: AssertionError

@pierrehilbert
Copy link
Collaborator

/test

@manuelsaks
Copy link
Author

manuelsaks commented Oct 14, 2024

The problem is:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "composable template [metricbeat-9.0.0] template after composition is invalid"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "composable template [metricbeat-9.0.0] template after composition is invalid",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "invalid composite mappings for [metricbeat-9.0.0]",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Limit of total fields [10000] has been exceeded"
      }
    }
  },
  "status": 400
}

This occurs when running the test_index_management function in beats/metricbeat/tests/system/test_base.py:

def test_index_management(self):
    """
    Test that the template can be loaded with `setup --index-management`
    """
    es = Elasticsearch([self.get_elasticsearch_url()])
    self.render_config_template(
        modules=[{
            "name": "apache",
            "metricsets": ["status"],
            "hosts": ["localhost"],
        }],
        elasticsearch={"host": self.get_elasticsearch_url()},
    )
    exit_code = self.run_beat(extra_args=["setup", "--index-management", "-E", "setup.template.overwrite=true"])

    assert exit_code == 0
    assert self.log_contains('Loaded index template')
    assert len(es.cat.templates(name='metricbeat-*', h='name')) > 0

If I increase the field limit from the default: 10000:

exit_code = self.run_beat(extra_args=["setup", "--index-management", "-E", "setup.template.overwrite=true"])

to 10090:

exit_code = self.run_beat(extra_args=["setup", "--index-management", "-E", "setup.template.overwrite=true", "-E", "setup.template.settings.index.mapping.total_fields.limit=10090"])

the test passes.

Please advise what should I do now.

@pierrehilbert
Copy link
Collaborator

/test

@pierrehilbert
Copy link
Collaborator

/test

@pierrehilbert
Copy link
Collaborator

/test

@pierrehilbert
Copy link
Collaborator

/test

@pierrehilbert
Copy link
Collaborator

/test

@pierrehilbert
Copy link
Collaborator

run docs-build

@pierrehilbert
Copy link
Collaborator

@manuelsaks Sorry for the extra delay, we found the root cause of the issue you were facing and it has been fixed.
@lalit-satapathy could we please have someone from your team to review here?

@lalit-satapathy
Copy link
Contributor

@lalit-satapathy could we please have someone from your team to review here?

@kush-elastic can you please review this new beats module?

Comment on lines 19 to 30
- name: free_clients
type: long
description: >
Count of free clients. These are clients that are disconnected, but PgBouncer keeps the memory around that was allocated for them so it can be reused for future clients to avoid allocations.
- name: used_clients
type: long
description: >
Count of used clients.
- name: login_clients
type: long
description: >
Count of clients in login state.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think of normalizing these fields in Sub JSON fields?

{
    "clients": {
        "free":0,
        "used":0,
        "login":0
    }
}

Same goes for servers as well. It will be easier for user to understand from document.

Copy link
Author

@manuelsaks manuelsaks Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

@kush-elastic kush-elastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments

Comment on lines 56 to 58
func (m *MetricSet) Fetch(reporter mb.ReporterV2) error {
ctx := context.Background()
results, err := m.QueryStats(ctx, "SHOW LISTS;")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you need context here, you should probably use interface ReportingMetricSetV2WithContext which will give you Fetch(ctx context.Context, r ReporterV2) error.
Avoid using Background context here.

Copy link
Author

@manuelsaks manuelsaks Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metricbeat/module/pgbouncer/lists/lists.go Outdated Show resolved Hide resolved
for _, s := range results {
listValue, ok := s["list"].(string)
if !ok {
return fmt.Errorf("expected string type for 'list' but got something else")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we suppose to not collect remaining results and just return error from here?

Copy link
Author

@manuelsaks manuelsaks Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it might be better to log a warning instead of returning an error here, so that we can still process the remaining valid results. What do you think?

Comment on lines 38 to 41
if len(errs) > 0 {
t.Fatalf("Expected 0 error, had %d. %v\n", len(errs), errs)
}
assert.NotEmpty(t, events)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if len(errs) > 0 {
t.Fatalf("Expected 0 error, had %d. %v\n", len(errs), errs)
}
assert.NotEmpty(t, events)
require.Empty(t, errs, "Expected no errors during fetch")
require.NotEmpty(t, events, "Expected to receive at least one event")

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +14 to +25
"credentials_cache": {
"size": 616,
"used": 1,
"free": 49,
"memtotal": 30800
},
"peer_pool_cache": {
"size": 616,
"used": 1,
"free": 49,
"memtotal": 30800
},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOT SURE IF APPLICABLE:
how about normalizing this from cache?
if in future you add something else in the metricset it would be easier to manage those data as well.

Suggested change
"credentials_cache": {
"size": 616,
"used": 1,
"free": 49,
"memtotal": 30800
},
"peer_pool_cache": {
"size": 616,
"used": 1,
"free": 49,
"memtotal": 30800
},
{
"cache": {
"credentials": {
"size": 616,
"used": 1,
"free": 49,
"memtotal": 30800
},
"peer_pool": {
"size": 616,
"used": 1,
"free": 49,
"memtotal": 30800
}
}
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the only cache metrics, so I'm not sure if extracting them is necessary.

Comment on lines 57 to 59
func (m *MetricSet) Fetch(reporter mb.ReporterV2) error {
// Create a new context for this operation.
ctx := context.Background()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same goes here. you can use ReportingMetricSetV2WithContext.

Copy link
Author

@manuelsaks manuelsaks Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tmpData, err := schema.Apply(result)
if err != nil {
// Log the error and skip this iteration if schema application fails.
log.Printf("Error applying schema: %v", err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BaseMetricSet should already have logger, please initialize that and use it across module.

Comment on lines 38 to 41
if len(errs) > 0 {
t.Fatalf("Expected 0 error, had %d. %v\n", len(errs), errs)
}
assert.NotEmpty(t, events)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please refer previous comment and update accordingly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.x Automated backport to the 8.x branch with mergify Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants