
Optimizing for streaming atop data file from remote location or an atopDB perhaps? #318

Open
gleventhal opened this issue Nov 12, 2024 · 4 comments

Comments

@gleventhal
Contributor

I love atop, it's the best tool of its kind for wide use, IMHO. I have many thousands of computers and would like to be able to deal with atop logs in a centralized way without requiring that the log be stored on local disk. I also want to retain at least several weeks of logs for each host.

Is there any recommended procedure or plans to support a centralized datastore or at least any optimizations for running atop with the data file location being a DFS (Ceph, NFS, S3, etc)?

@jbd
Contributor

jbd commented Nov 12, 2024

Hello,

At work, we simply use rsync every minute from each client (hundreds at the moment, not thousands), with a random sleep to prevent hammering the rsync server. See #140 for a small discussion about it. We are running diskless machines and keep only one hour of atop logs locally. Retention on the rsync destination is handled separately.
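The per-minute schedule can be driven by a plain cron entry; the script path below is an illustrative assumption, and the random sleep lives inside the script itself:

```shell
# /etc/cron.d/atop-rsync (script path is hypothetical)
# Run every minute; the script sleeps a random 0-29s before syncing.
* * * * * root /usr/local/bin/atop_rsync.sh
```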

Some precautions about the atop filename to prevent truncation of the previous file in case of a restart or reboot:

# /etc/default/atop
# [...]
# Note: CURDAY is not configurable (see https://github.com/Atoptool/atop/issues/140)
#
# Add hour (and minute) because atop will be restarted every hour
# Add boot time (in epoch format) because node could be rebooted
#CURDAY=`date +%Y%m%d`
CURDAY=$(date +%Y%m%d)_$(date +%H%M)_$(date --date="$(uptime -s)" +%Y%m%d%H%M%S)
# [...]
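With that override, each restart or reboot writes to a fresh file. A quick sketch of the resulting name, with fixed timestamps standing in for the `date` and `uptime -s` output (the values here are illustrative):

```shell
# Illustrative expansion of the CURDAY pattern above.
day=20241112; hourmin=0900; boot=20241111083000
CURDAY="${day}_${hourmin}_${boot}"
echo "/var/log/atop/atop_${CURDAY}"
# -> /var/log/atop/atop_20241112_0900_20241111083000
```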

This is what our rsync script looks like (the destination directory is implicitly created, which is nice):

#!/bin/bash

# Ansible managed, please don't edit this file directly

### PREVENT concurrent execution
# This is useful boilerplate code for shell scripts. Put it at the top of the shell script you want to lock and it'll automatically lock itself on the first run.
# If the env var $FLOCKER is not set to the shell script that is being run, then execute flock and grab an exclusive non-blocking lock (using the script itself as the lock file)
# before re-execing itself with the right arguments.  It also sets the FLOCKER env var to the right value so it doesn't run again.
[ "${FLOCKER}" != "$0" ] && exec env FLOCKER="$0" flock --close -en "$0" "$0" "$@" || :

SHORTHOSTNAME=$(hostname -s)

RSYNC_DEST=atop@rsync-server::atop/${SHORTHOSTNAME?}
RSYNC_SRC=/var/log/atop/

RSYNC_SLEEP=$((RANDOM % 30))
TIMEOUT=15

sleep ${RSYNC_SLEEP?}

timeout ${TIMEOUT?} /usr/bin/rsync --password-file /etc/atop-rsync.secret -a ${RSYNC_SRC?} ${RSYNC_DEST?}
EXIT_VALUE=$?

if [ "${EXIT_VALUE?}" -ne 0 ]; then
   logger -t atop-rsync "atop_rsync.sh from ${RSYNC_SRC?} to ${RSYNC_DEST?} failed (exit value was: ${EXIT_VALUE?})"
   exit 1
fi

It works well enough for us at the moment. See the previously mentioned issue #140.
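Since the retention on the destination is handled separately, one simple way to do it is a periodic cleanup job; the directory and 28-day cutoff below are illustrative assumptions, not what we actually run:

```shell
# Hypothetical retention job on the rsync destination: delete atop
# logs older than 28 days. LOGDIR is an illustrative stand-in for
# the real destination directory.
LOGDIR=${LOGDIR:-/tmp/atop-retention-demo}
mkdir -p "$LOGDIR"
find "$LOGDIR" -type f -name 'atop_*' -mtime +28 -delete
```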

@natoscott
Contributor

@gleventhal not sure if it helps, but there is a modified version of atop - pcp-atop(1) - in the Performance Co-Pilot (pcp.io) toolkit which supports distributed operation: either communicating directly with a remote host, or recording centrally from remote hosts as you seek.

@famz

famz commented Nov 14, 2024

Actually, I am wondering whether Prometheus metrics would be a good way of exporting the data; we could then delegate storage and querying to Prometheus.

A very incomplete PoC here:

43e1124

What I have in mind is either running a local Prometheus to scrape atop in real time, or aggregating the data centrally.
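For the local-scrape variant, the Prometheus side would just be an ordinary scrape job; the job name and exporter port below are illustrative assumptions (the PoC commit above defines the actual endpoint):

```shell
# Hypothetical minimal Prometheus scrape config for a local atop
# exporter; job name and port are illustrative assumptions.
cat > /tmp/atop-scrape.yml <<'EOF'
scrape_configs:
  - job_name: atop
    scrape_interval: 10s
    static_configs:
      - targets: ['localhost:9100']
EOF
```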

Anyone interested in collaborating? I'm happy to continue my prototyping regardless, for our data center use cases.

@pizhenwei
Contributor

Hi @gleventhal,
I wrote atophttpd together with a colleague. It reads atop log files and serves them over HTTP/HTTPS. Hope this helps.

The atop log file is stored in a raw binary format, and different atop versions use different log formats. Storing lots of atop logs (possibly from several versions) in a centralized datastore may be difficult to manage. Instead, atophttpd provides JSON data, which is friendlier.
