Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metricbeat/module/postgresql: Add PostgreSQL Replication metricset #35562

Closed
wants to merge 27 commits into from
Closed

metricbeat/module/postgresql: Add PostgreSQL Replication metricset #35562

wants to merge 27 commits into from

Conversation

zxcSora
Copy link

@zxcSora zxcSora commented May 24, 2023

  • Enhancement

What does this PR do?

Add metricset for replication to Metricbeat PostgreSQL module.

Why is it important?

It is important monitoring PostgreSQL replication

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • [ ]

How to test this PR locally

  1. You need 2 PostgreSQL servers, Primary<->Replica.
  2. Build metricbeat from this PR
  3. Run metricbeat with:
- module: postgresql
  metricsets:
    - replication
  hosts: ["postgres://user:password@localhost:5432/postgres"]

Related issues

Use cases

Screenshots

Screenshots from real primary production server with 2 replicas.
image
image

Logs

@zxcSora zxcSora requested review from a team as code owners May 24, 2023 14:21
@zxcSora zxcSora requested review from faec and leehinman May 24, 2023 14:21
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label May 24, 2023
@mergify
Copy link
Contributor

mergify bot commented May 24, 2023

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @zxcSora? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@elasticmachine
Copy link
Collaborator

elasticmachine commented May 24, 2023

❕ Build Aborted

The PR is not allowed to run in the CI yet

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Start Time: 2023-09-19T15:05:24.770+0000

  • Duration: 5 min 50 sec

Steps errors 2

Expand to view the steps failures

Load a resource file from a library
  • Took 0 min 0 sec . View more details here
  • Description: approval-list/elastic/beats.yml
Error signal
  • Took 0 min 0 sec . View more details here
  • Description: githubApiCall: The REST API call https://api.github.com/orgs/elastic/members/zxcSora return the message : java.lang.Exception: httpRequest: Failure connecting to the service https://api.github.com/orgs/elastic/members/zxcSora : httpRequest: Failure connecting to the service https://api.github.com/orgs/elastic/members/zxcSora : Code: 404Error: {"message":"User does not exist or is not a member of the organization","documentation_url":"https://docs.github.com/rest/orgs/members#check-organization-membership-for-a-user"}

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@ishleenk17
Copy link
Contributor

Thanks for the contribution @zxcSora .
We will be sharing our reviews on this soon.

@shmsr shmsr changed the title Add PostgresSQL Replication metricset Add PostgreSQL Replication metricset May 25, 2023
@shmsr shmsr self-requested a review May 30, 2023 05:45
Copy link
Member

@shmsr shmsr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution.

Left some comments that can easily be fixed. I'll next do a thorough technical review as well, please give me more time to review the same.

I left some comments for parts which are even autogenerated. I think we should file a separate issue to fix the template used for autogenerating these.

Here's one: metricset.go.tmpl where some doc comments for some methods (and functions) can be improved. For example: "Fetch method implements" instead of "Fetch methods implements"

metricbeat/docs/fields.asciidoc Show resolved Hide resolved
metricbeat/docs/fields.asciidoc Outdated Show resolved Hide resolved
metricbeat/docs/fields.asciidoc Outdated Show resolved Hide resolved
metricbeat/docs/fields.asciidoc Outdated Show resolved Hide resolved
metricbeat/docs/fields.asciidoc Outdated Show resolved Hide resolved
metricbeat/module/postgresql/replication/replication.go Outdated Show resolved Hide resolved
metricbeat/module/postgresql/replication/replication.go Outdated Show resolved Hide resolved
metricbeat/module/postgresql/replication/replication.go Outdated Show resolved Hide resolved
metricbeat/docs/fields.asciidoc Outdated Show resolved Hide resolved
Copy link
Member

@shmsr shmsr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution.

Left some comments that can easily be fixed. I'll next do a thorough technical review as well, please give me more time to review the same.

I left some comments for parts which are even autogenerated. I think we should file a separate issue to fix the template used for autogenerating these.

Here's one: metricset.go.tmpl where some doc comments for some methods (and functions) can be improved. For example: "Fetch method implements" instead of "Fetch methods implements"

@zxcSora
Copy link
Author

zxcSora commented May 31, 2023

Thank you for your contribution.

Left some comments that can easily be fixed. I'll next do a thorough technical review as well, please give me more time to review the same.

I left some comments for parts which are even autogenerated. I think we should file a separate issue to fix the template used for autogenerating these.

Here's one: metricset.go.tmpl where some doc comments for some methods (and functions) can be improved. For example: "Fetch method implements" instead of "Fetch methods implements"

Thanks for your comments. I commited changes. Waiting for technical review

@zxcSora zxcSora requested a review from shmsr May 31, 2023 04:49
"write_lsn": c.Str("write_lsn"),
"flush_lsn": c.Str("flush_lsn"),
"replay_lsn": c.Str("replay_lsn"),
"write_lag": c.Str("write_lag"),
Copy link
Contributor

@ishleenk17 ishleenk17 May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the lag related fields have postgresql datatype of "Interval", which requires 16 bytes storage.
Here we are storing it as string. Doesn't look to be appropriate?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you suggest storing?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shmsr Maybe you can suggest?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either we have to use a custom data type or maybe string only. Depends.

Example of a custom type: https://github.com/sanyokbig/pqinterval/blob/v1.1.2/interval.go#L11

https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT

I'd suggest doing some more research before finalizing what to use. @zxcSora please let us know what are you going to use and why.

Copy link
Author

@zxcSora zxcSora Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in my opinion, sent_lsn, write_lsn, flush_lsn, replay_lsn shoud be a string, because value of this parameters it's like:

sent_lsn         | 3D0/F7574390
write_lsn        | 3D0/F7574390
flush_lsn        | 3D0/F7574390
replay_lsn       | 3D0/F7574390

Copy link
Author

@zxcSora zxcSora Jun 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shmsr I think that making a custom parser does not make sense, so I slightly changed the query to the table so that I get an integer(uint). Means delay in seconds (ignoring nanoseconds). Check last commit

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For lsn fields, string datatype looks fine.

metricbeat/module/postgresql/replication/data.go Outdated Show resolved Hide resolved
metricbeat/module/postgresql/replication/data.go Outdated Show resolved Hide resolved
metricbeat/module/postgresql/replication/replication.go Outdated Show resolved Hide resolved
metricbeat/module/postgresql/replication/data.go Outdated Show resolved Hide resolved
metricbeat/module/postgresql/replication/_meta/fields.yml Outdated Show resolved Hide resolved
metricbeat/module/postgresql/replication/_meta/fields.yml Outdated Show resolved Hide resolved
metricbeat/docs/fields.asciidoc Outdated Show resolved Hide resolved
@shmsr
Copy link
Member

shmsr commented Jun 6, 2023

Also, please don't mark comments as resolved before the reviewers validate if your changes really fixed them especially if the change request is fixing something complex (not some nitpick). I had to open multiple comments just because the changes didn't actually fix them, and I again asked for a change request. If the changes actually fix the issue mentioned, we'll mark them as resolved. Thank you!

@zxcSora zxcSora requested a review from a team as a code owner June 6, 2023 09:55
Co-authored-by: subham sarkar <[email protected]>
@zxcSora zxcSora requested a review from ishleenk17 July 11, 2023 08:21
Copy link
Contributor

@ishleenk17 ishleenk17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing multiple review comments.
Looks good!

@zxcSora
Copy link
Author

zxcSora commented Jul 14, 2023

Thanks for addressing multiple review comments. Looks good!

How i can request review from another codeowners ? or it will happen automatically ?

@ishleenk17
Copy link
Contributor

@elastic/elastic-agent-data-plane : Could you please review this change.

@mergify
Copy link
Contributor

mergify bot commented Jul 27, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b main upstream/main
git merge upstream/main
git push upstream main

@pierrehilbert pierrehilbert added the Team:Elastic-Agent Label for the Agent team label Jul 28, 2023
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jul 28, 2023
func getConfig(host string) map[string]interface{} {
return map[string]interface{}{
"module": "postgresql",
"metricsets": []string{"database"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this meant to be "metricsets": []string{"replication"}, instead? In case that was a copy-paste error, I ran it with the change, still got an error from TestFetch

Copy link
Author

@zxcSora zxcSora Aug 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fearful-symmetry
i think we can't do tests like this because creating cluster from docker-compose its bad idea (too much bad code)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, okay. Should we just remove this test, then? Find some smaller unit tests we can add here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have no idea what i can find in this situation, do you ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea solution here would be to fix the tests or fix the code.

if len(errs) > 0 {
t.Fatalf("Expected 0 error, had %d: %v", len(errs), errs)
}
assert.NotEmpty(t, events)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be require.NotEmpty(), so we don't just panic on the next line if this is empty.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

@pierrehilbert
Copy link
Collaborator

@zxcSora Did you get time to review @fearful-symmetry's comments?

@zxcSora
Copy link
Author

zxcSora commented Sep 19, 2023

@zxcSora Did you get time to review @fearful-symmetry's comments?

Have no idea...

@fearful-symmetry
Copy link
Contributor

fearful-symmetry commented Oct 4, 2023

So, it looks like the integration test still isn't passing:

=== RUN   TestFetch
--no-ansi option is deprecated and will be removed in future versions. Use `--ansi never` instead.
--no-ansi option is deprecated and will be removed in future versions. Use `--ansi never` instead.
--no-ansi option is deprecated and will be removed in future versions. Use `--ansi never` instead.
    replication_integration_test.go:43: 
                Error Trace:    /home/alexk/go/src/github.com/elastic/beats/metricbeat/module/postgresql/replication/replication_integration_test.go:43
                Error:          Should NOT be empty, but was []
                Test:           TestFetch
--- FAIL: TestFetch (1.38s)
=== RUN   TestData
--no-ansi option is deprecated and will be removed in future versions. Use `--ansi never` instead.
--no-ansi option is deprecated and will be removed in future versions. Use `--ansi never` instead.
--no-ansi option is deprecated and will be removed in future versions. Use `--ansi never` instead.
    data_generator.go:72: skip data generation tests
--- SKIP: TestData (1.35s)
FAIL
exit status 1
FAIL    github.com/elastic/beats/v7/metricbeat/module/postgresql/replication    2.743s

Not a postgres expert, but I assume it's because a newly-created db instance doesn't have any stats to report? I wonder if we can somehow just inject some statistics, or if we can't, we might just have to skip the integration tests entirely.

@pierrehilbert
Copy link
Collaborator

@zxcSora any news?

@pierrehilbert
Copy link
Collaborator

I'm closing for now as we don't have any news.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Elastic-Agent Label for the Agent team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add metricset for replication to Metricbeat Postgresql module
8 participants