Principal=>subordinate coordination for logrotation (grafana-agent causing a lot of unreleased files and huge syslog) #153
Comments
@simskij do you have any good ideas on how to proceed here? |
@taurus-forever Do you think this would be solved by not getting all the files from |
@lucabello AFAIK, no. The grafana-agent charm / promtail binary reads (or at least used to read) some log files from disk to send them to Loki. The PostgreSQL charm rotates logs, but doesn't send any signal to COS to close the current descriptor and reopen the file (the old one having been moved to the archive folder). IMHO, we have two options:
|
I'd appreciate your input on the above @taurus-forever. |
Patroni should be using an extended version of Python's RotatingFileHandler. Python's docs indicate “rename and create”. We can only configure the size and the number of files to keep. PostgreSQL is configured to keep a week's worth of per-minute logs that are truncated each minute. This behaviour is configured by us, but it is spec behaviour, so discussions would be necessary to change it. Both Patroni and PostgreSQL try to keep about a week's worth of per-minute logs, so that should be about 10k files for each.
I don't know how the agent detects log changes, but for Patroni only the last few logs should be relevant; the deeper backlog was likely already synced and shouldn't be changing. For PostgreSQL things are trickier, since the files are the same ( |
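To illustrate why "rename and create" rotation matters for a reader that keeps files open, here is a minimal sketch of the generic pattern. It is not Patroni's or grafana-agent's actual code; the file names and the temporary directory are hypothetical, and `stat -c` assumes GNU coreutils.

```shell
#!/bin/bash
# Generic "rename and create" rotation, seen from a reader that keeps
# its descriptor open (as a tailing agent would).
workdir=$(mktemp -d)
log="$workdir/app.log"                        # hypothetical log path

echo "line written before rotation" > "$log"
exec 3< "$log"                                # reader opens the log

mv "$log" "$log.1"                            # rotation step 1: rename
echo "line written after rotation" > "$log"   # rotation step 2: create a fresh file

read -r seen_by_reader <&3                    # fd 3 still refers to the renamed file
new_inode=$(stat -c %i "$log")
old_inode=$(stat -c %i "$log.1")
exec 3<&-                                     # only closing releases the old file

echo "reader saw: $seen_by_reader"
echo "old inode: $old_inode, new inode: $new_inode"
rm -rf "$workdir"
```

Unless the reader is told (or notices) that the path now points at a different inode, it keeps holding the old file, which is the situation described in this issue.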
I'm hoping this is still on people's radar - this interaction between postgresql and grafana-agent is creating truly excessive logging in some environments. I'm seeing roughly 8 gigs of log lines a day just for events related to the patroni log files. This is quickly filling the disk of the hosts in one of the clouds I am working on. (10,080 log lines * roughly 300 characters per line * roughly 2,800 times a day (about twice a minute) = 8,467,200,000 bytes as a rough estimate) |
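The estimate above can be reproduced with shell arithmetic. This is only a sketch of the calculation; the function name is made up for illustration and the inputs are the rough figures quoted in the comment.

```shell
# Rough daily volume: log lines per pass * bytes per line * passes per day.
estimate_daily_bytes() {
    echo $(( $1 * $2 * $3 ))
}

files=$((7 * 24 * 60))                                # one file per minute for a week
estimate_daily_bytes "$files" 300 2800                # figures quoted in the comment
estimate_daily_bytes "$files" 300 $((2 * 60 * 24))    # exactly twice a minute
```

Either way the result is on the order of 8 to 9 GB per day.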
Adding additional context, per discussion with the obs team: here is the related job in /etc/grafana-agent.yaml:
|
Summary of our call:
Per comment by @dragomirp this is by design:
We could see how ergonomic it would be to make the grafana-agent-operator/src/charm.py Lines 648 to 654 in 5ea3a57
That being said, the patroni log files seem to have little value:
I wonder if:
Another option: see if we can have grafana agent not log CREATE events. |
Hi, My current workaround is to route
EDIT: looks like attaching the file failed, reproduced below:
#!/bin/bash
# $1 is the juju ssh target; bail out if it is missing.
[[ -z "$1" ]] && exit 1
juju ssh "$1" << 'EOF1'
sudo su
# Rotate the dedicated grafana-agent log daily and keep 7 rotations.
cat > /etc/logrotate.d/grafana-agent << EOF
/var/log/grafana-agent.log {
    daily
    rotate 7
    create 0644 syslog root
    missingok
    notifempty
    compress
    delaycompress
    dateext
    postrotate
        [ ! -x /usr/lib/rsyslog/rsyslog-rotate ] || /usr/lib/rsyslog/rsyslog-rotate
    endscript
}
EOF
# Send grafana-agent messages to their own file and keep them out of syslog.
cat > /etc/rsyslog.d/22-grafana-agent.conf << 'EOF'
if $programname == 'grafana-agent.grafana-agent' then {
    action(type="omfile" file="/var/log/grafana-agent.log" fileOwner="root" fileGroup="syslog")
    stop
}
EOF
systemctl restart rsyslog
EOF1
|
Another option is to tell grafana-agent to exclude old patroni files. |
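To get a feel for how many files such an exclusion would cover, a small sketch like the following lists log files that have not been modified for a given number of minutes. The function name, the `*.log*` pattern and the directory argument are assumptions for illustration; this only inspects the files on disk and does not configure grafana-agent.

```shell
# List log files in a directory last modified more than N minutes ago.
# Usage: list_stale_logs <directory> <minutes>
list_stale_logs() {
    dir="$1"
    minutes="$2"
    find "$dir" -maxdepth 1 -type f -name '*.log*' -mmin +"$minutes" | sort
}
```

For example, `list_stale_logs <patroni log directory> 60 | wc -l` would count the files untouched for over an hour.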
Hi, my 2c: I don't know the details of the grafana-agent operator implementation, but purely from a user POV it would be nice if the default did not fill the disk even when the log level is verbose, as in this case. One option is to drop in an rsyslog and logrotate file like in my workaround, or to achieve the same control of file size/rotation frequency some other way. Cheers |
Hey @verterok, Grafana agent logs
You could raise log_level to warn:
|
Hi, There seems to be no such config in the latest/stable charm (latest/stable, 332)
FWIW, my daily rotation config/setup was not enough and I ended up with a dead PostgreSQL cluster (luckily not a production one) |
From what I see, that option ( |
Unfortunately, it seems that this is not possible, as grafana-agent doesn't expose any way to change the list of events it logs. |
@sed-i Regarding that config option ( |
Bug Description
Hi,
Please check the complete issue description in PostgreSQL repo:
canonical/postgresql-operator#524
TL;DR: The PostgreSQL charm rotates logs but doesn't send any signal to the subordinate grafana-agent, which leaves a lot of unreleased files and a huge syslog => downtime.
It is a cross-team ticket to build a solution here.
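One way to check for unreleased files on an affected unit is to count the descriptors of a process that still point at deleted files. The sketch below assumes a Linux `/proc` filesystem; the function name is made up for illustration, and reading another process's descriptors normally requires root.

```shell
# Count file descriptors of a process that refer to deleted files.
# Usage: count_deleted_fds <pid>
count_deleted_fds() {
    pid="$1"
    count=0
    for fd in /proc/"$pid"/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *' (deleted)') count=$((count + 1)) ;;
        esac
    done
    echo "$count"
}
```

Running it against the grafana-agent PID before and after a log rotation would show whether the number of held files keeps growing.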
To Reproduce
See steps to reproduce in canonical/postgresql-operator#524
Environment
See Versions in canonical/postgresql-operator#524
Relevant log output
Additional context
Proposals:
Better ideas are welcome!