
Optimizing for streaming atop data file from remote location or an atopDB perhaps? #318

Open
gleventhal opened this issue Nov 12, 2024 · 4 comments

Comments

@gleventhal
Contributor

I love atop, it's the best tool of its kind for wide use, IMHO. I have many thousands of computers and would like to be able to deal with atop logs in a centralized way without requiring that the log be stored on local disk. I also want to retain at least several weeks of logs for each host.

Is there any recommended procedure or plans to support a centralized datastore or at least any optimizations for running atop with the data file location being a DFS (Ceph, NFS, S3, etc)?

@jbd
Contributor

jbd commented Nov 12, 2024

Hello,

At work, we simply use rsync every minute from each client (hundreds at the moment, not thousands), with a random sleep to prevent hammering the rsync server. See #140 for a small discussion about it. We are running diskless machines and keep only one hour of atop logs locally. Retention on the rsync destination is handled separately.
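The per-minute schedule can be driven by a plain cron entry; the script path below is an illustrative assumption, and the random sleep lives inside the script itself:

```shell
# /etc/cron.d/atop-rsync (script path is hypothetical)
# Run every minute; the script sleeps a random 0-29s before syncing.
* * * * * root /usr/local/bin/atop_rsync.sh
```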

Some precautions about the atop filename to prevent truncation of the previous file in case of a restart or reboot:

# /etc/default/atop
# [...]
# Note: CURDAY is not configurable (see https://github.com/Atoptool/atop/issues/140)
#
# Add hour (and minute) because atop will be restarted every hour
# Add boot time (in epoch format) because node could be rebooted
#CURDAY=`date +%Y%m%d`
CURDAY=$(date +%Y%m%d)_$(date +%H%M)_$(date --date="$(uptime -s)" +%Y%m%d%H%M%S)
# [...]
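With that override, each restart or reboot writes to a fresh file. A quick sketch of the resulting name, with fixed timestamps standing in for the `date` and `uptime -s` output (the values here are illustrative):

```shell
# Illustrative expansion of the CURDAY pattern above.
day=20241112; hourmin=0900; boot=20241111083000
CURDAY="${day}_${hourmin}_${boot}"
echo "/var/log/atop/atop_${CURDAY}"
# -> /var/log/atop/atop_20241112_0900_20241111083000
```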

This is what our rsync script looks like (the destination directory is implicitly created, which is nice):

#!/bin/bash

# Ansible managed, please don't edit this file directly

### PREVENT concurrent execution
# This is useful boilerplate code for shell scripts. Put it at the top of the shell script you want to lock and it'll automatically lock itself on the first run.
# If the env var $FLOCKER is not set to the shell script that is being run, then execute flock and grab an exclusive non-blocking lock (using the script itself as the lock file)
# before re-execing itself with the right arguments.  It also sets the FLOCKER env var to the right value so it doesn't run again.
[ "${FLOCKER}" != "$0" ] && exec env FLOCKER="$0" flock --close -en "$0" "$0" "$@" || :

SHORTHOSTNAME=$(hostname -s)

RSYNC_DEST=atop@rsync-server::atop/${SHORTHOSTNAME?}
RSYNC_SRC=/var/log/atop/

RSYNC_SLEEP=$((RANDOM % 30))
TIMEOUT=15

sleep ${RSYNC_SLEEP?}

timeout ${TIMEOUT?} /usr/bin/rsync --password-file /etc/atop-rsync.secret -a ${RSYNC_SRC?} ${RSYNC_DEST?}
EXIT_VALUE=$?

if [ "${EXIT_VALUE?}" -ne 0 ]; then
   logger -t atop-rsync "atop_rsync.sh from ${RSYNC_SRC?} to ${RSYNC_DEST?} failed (exit value was: ${EXIT_VALUE?})"
   exit 1
fi

It works well enough for us at the moment. See the previously mentioned issue #140.
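Since the retention on the destination is handled separately, one simple way to do it is a periodic cleanup job; the directory and 28-day cutoff below are illustrative assumptions, not what we actually run:

```shell
# Hypothetical retention job on the rsync destination: delete atop
# logs older than 28 days. LOGDIR is an illustrative stand-in for
# the real destination directory.
LOGDIR=${LOGDIR:-/tmp/atop-retention-demo}
mkdir -p "$LOGDIR"
find "$LOGDIR" -type f -name 'atop_*' -mtime +28 -delete
```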

@natoscott
Contributor

@gleventhal not sure if it helps, but there is a modified version of atop - pcp-atop(1) - in the Performance Co-Pilot (pcp.io) toolkit which supports distributed operation: either communicating directly with a remote host, or recording centrally from remote hosts as you seek.

@famz

famz commented Nov 14, 2024

Actually, I am wondering whether Prometheus metrics would be a good way of exporting the data; we could then delegate storage and querying to Prometheus.

A very incomplete PoC here:

43e1124

What I have in mind is either running a local Prometheus to scrape atop in real time, or aggregating the data centrally.
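For the local-scrape variant, the Prometheus side would just be an ordinary scrape job; the job name and exporter port below are illustrative assumptions (the PoC commit above defines the actual endpoint):

```shell
# Hypothetical minimal Prometheus scrape config for a local atop
# exporter; job name and port are illustrative assumptions.
cat > /tmp/atop-scrape.yml <<'EOF'
scrape_configs:
  - job_name: atop
    scrape_interval: 10s
    static_configs:
      - targets: ['localhost:9100']
EOF
```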

Anyone interested in collaborating? I'm happy to continue my prototyping regardless, for our data center use cases.

@pizhenwei
Contributor

Hi @gleventhal,
I wrote atophttpd together with a colleague. It reads atop log files and serves them over HTTP/HTTPS. Hope this helps.

The atop log file is stored in a raw binary format, and different atop versions use different log formats. Storing lots of atop logs (possibly from several versions) in a centralized datastore may be difficult to manage. Instead, atophttpd provides JSON data, which is friendlier.
