metricbeat/module/postgresql: Add PostgreSQL Replication metricset #35562
Conversation
This pull request does not have a backport label. To fix this, add the backport labels for the needed branches.
❕ Build Aborted
Thanks for the contribution @zxcSora.
Thank you for your contribution.
I left some comments that can easily be fixed. I'll do a thorough technical review next as well; please give me more time for that.
Some of my comments are on parts that are autogenerated. I think we should file a separate issue to fix the template used for generating these.
Here's one: metricset.go.tmpl, where some doc comments for methods (and functions) can be improved. For example: "Fetch method implements" instead of "Fetch methods implements".
Thanks for your comments. I committed the changes. Waiting for the technical review.
"write_lsn": c.Str("write_lsn"),
"flush_lsn": c.Str("flush_lsn"),
"replay_lsn": c.Str("replay_lsn"),
"write_lag": c.Str("write_lag"),
All the lag-related fields have the PostgreSQL data type "interval", which requires 16 bytes of storage. Here we are storing them as strings. That doesn't look appropriate, does it?
How do you suggest storing them?
@shmsr Maybe you can suggest?
Either we have to use a custom data type, or maybe string only. It depends.
Example of a custom type: https://github.com/sanyokbig/pqinterval/blob/v1.1.2/interval.go#L11
https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT
I'd suggest doing some more research before finalizing what to use. @zxcSora, please let us know what you are going to use and why.
In my opinion, sent_lsn, write_lsn, flush_lsn, and replay_lsn should be strings, because the values of these parameters look like:
sent_lsn | 3D0/F7574390
write_lsn | 3D0/F7574390
flush_lsn | 3D0/F7574390
replay_lsn | 3D0/F7574390
@shmsr I think that writing a custom parser does not make sense, so I slightly changed the query so that I get an integer (uint), meaning the delay in seconds (ignoring sub-second precision). Check the last commit.
For the lsn fields, the string data type looks fine.
Also, please don't mark comments as resolved before the reviewers validate that your changes really fixed them, especially if the change request is about something complex (not some nitpick). I had to reopen multiple comments just because the changes didn't actually fix them, and I had to ask for changes again. If the changes actually fix the issue mentioned, we'll mark them as resolved. Thank you!
Co-authored-by: subham sarkar <[email protected]>
Thanks for addressing multiple review comments.
Looks good!
How can I request a review from the other code owners, or will that happen automatically?
@elastic/elastic-agent-data-plane: Could you please review this change?
This pull request now has conflicts. Could you fix it? 🙏
func getConfig(host string) map[string]interface{} {
	return map[string]interface{}{
		"module": "postgresql",
		"metricsets": []string{"database"},
Was this meant to be "metricsets": []string{"replication"} instead? In case that was a copy-paste error, I ran it with that change and still got an error from TestFetch.
@fearful-symmetry
I don't think we can test it like this, because creating a replication cluster from docker-compose is a bad idea (it would take too much bad code).
Hmm, okay. Should we just remove this test, then? Find some smaller unit tests we can add here?
I have no idea what we could test in this situation. Do you?
The ideal solution here would be to fix the tests or fix the code.
if len(errs) > 0 {
	t.Fatalf("Expected 0 error, had %d: %v", len(errs), errs)
}
assert.NotEmpty(t, events)
This should be require.NotEmpty(), so we don't just panic on the next line if events is empty.
Changed.
@zxcSora Did you get time to review @fearful-symmetry's comments?
Have no idea...
So, it looks like the integration test still isn't passing.
I'm not a postgres expert, but I assume it's because a newly created db instance doesn't have any stats to report? I wonder if we can somehow inject some statistics, or if we can't, we might just have to skip the integration tests entirely.
@zxcSora any news?
I'm closing for now as we don't have any news.
What does this PR do?
Add metricset for replication to Metricbeat PostgreSQL module.
Why is it important?
It is important to monitor PostgreSQL replication.
Checklist
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
Author's Checklist
How to test this PR locally
Related issues
Use cases
Screenshots
Screenshots from a real primary production server with 2 replicas.
Logs