Add support to monitor bpf programs.
BPF is an important component of modern Linux systems and continues to gain
features and adoption. This commit enables atop to monitor BPF programs.

The output looks like:

ATOP - kerneltest002       2020/06/16  17:01:12       --------------       10s elapsed
PRC |  sys    2.72s |  user   4.85s |  #proc    761  | #zombie    0  | #exit    250  |
CPU |  sys      29% |  user     50% |  irq       0%  | idle   7915%  | wait      8%  |
CPL |  avg1    1.68 |  avg5    1.05 |  avg15   0.72  | csw   160979  | intr   66341  |
[...]
BPF_PROG_ID                       NAME  TOTAL_TIME_NS      RUN_CNT     CPU AVG_TIME_NS
        894            tracepoint__sch          83882           11      0%     7625.64
        893            tracepoint__sch          43231            5      0%     8646.20
        892            tracepoint__tas          34818            4      0%     8704.50
    PID SYSCPU USRCPU  VGROW  RGROW  RDDSK  WRDSK EXC  THR S CPUNR  CPU CMD      1/113
2669644  0.45s  1.08s 603.1M 23100K     0K     0K   -   10 S    59  15% squashfuse_ll
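
The AVG_TIME_NS column in the sample above is consistent with TOTAL_TIME_NS divided by RUN_CNT (for program 894, 83882 / 11 is roughly 7625.64). A minimal sketch of that arithmetic follows. The struct and function names are hypothetical and not taken from atop, and the CPU formula (run time as a share of elapsed wall-clock time) is an assumption about how the CPU column might be derived, not something the commit states.

```c
/* Hypothetical per-program sample; field names are illustrative only. */
struct bpf_prog_sample {
    unsigned int       id;
    unsigned long long total_time_ns;  /* run time accumulated in the interval */
    unsigned long long run_cnt;        /* invocations during the interval */
};

/* Average run time per invocation; 0 when the program never ran. */
static inline double avg_time_ns(const struct bpf_prog_sample *s)
{
    return s->run_cnt ? (double)s->total_time_ns / (double)s->run_cnt : 0.0;
}

/* ASSUMPTION: CPU% = program run time / elapsed wall-clock time. */
static inline double cpu_percent(const struct bpf_prog_sample *s,
                                 unsigned int elapsed_sec)
{
    if (elapsed_sec == 0)
        return 0.0;
    return 100.0 * (double)s->total_time_ns / ((double)elapsed_sec * 1e9);
}
```

With the first row of the sample (83882 ns over 11 runs in a 10 s interval), `avg_time_ns` gives about 7625.64 and `cpu_percent` gives well under 1%, which matches the displayed 0%.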

To build atop with BPF monitoring, pass the following option to make:

    ATOP_BPF_SUPPORT=1 make -j

Atop periodically enables monitoring of BPF programs by calling:
    bpf_enable_stats(BPF_STATS_RUN_TIME);

Since monitoring adds non-trivial overhead to the BPF programs being
measured, the following options are added so that BPF programs are
monitored less often:

    bpfsamplerate, default 1 (atop intervals)
    bpfsampleinterval, default 1 (seconds)

BPF stats are enabled for bpfsampleinterval seconds once every bpfsamplerate
atop intervals. bpfsampleinterval must be smaller than the atop interval.
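
The sampling rule can be sketched as two small predicates. The first mirrors the condition used in engine() in this commit (`bpfsamplerate && sampcnt % bpfsamplerate == 0`), where a rate of 0 disables sampling. The second is a hypothetical validation helper that is not part of the commit; it only restates the constraint that the stats window must be shorter than the atop interval.

```c
#include <stdbool.h>

/* Sample on every bpfsamplerate-th atop interval; 0 disables sampling. */
static inline bool bpf_should_sample(unsigned int bpfsamplerate,
                                     unsigned long sampcnt)
{
    return bpfsamplerate != 0 && sampcnt % bpfsamplerate == 0;
}

/* Hypothetical check (not in the commit): the window during which BPF
 * stats are enabled must be shorter than the atop interval. */
static inline bool bpf_window_fits(unsigned int bpfsampleinterval,
                                   unsigned int interval)
{
    return bpfsampleinterval < interval;
}
```

With the defaults (rate 1, window 1 second), every atop interval is sampled, and any atop interval of 2 seconds or more satisfies the constraint.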

Changes v1 => v2:
1. Instead of using the unsafe sysctl, use a new, safe API to enable BPF
   runtime stats.
2. Change output columns: remove "TYPE", add "CPU" for CPU %.
liu-song-6 committed Jul 9, 2020
1 parent dd0fb8a commit abe5f48
Showing 12 changed files with 662 additions and 156 deletions.
12 changes: 9 additions & 3 deletions Makefile
@@ -28,12 +28,18 @@ OBJMOD3 = showgeneric.o showlinux.o showsys.o showprocs.o
OBJMOD4 = atopsar.o netatopif.o gpucom.o
ALLMODS = $(OBJMOD0) $(OBJMOD1) $(OBJMOD2) $(OBJMOD3) $(OBJMOD4)

ifneq ($(ATOP_BPF_SUPPORT),)
ALLMODS += photobpf.o
ATOP_BPF_LDFLAGS = -lbpf
CFLAGS += -DATOP_BPF_SUPPORT
endif

VERS = $(shell ./atop -V 2>/dev/null| sed -e 's/^[^ ]* //' -e 's/ .*//')

all: atop atopsar atopacctd atopconvert atopcat

atop: atop.o $(ALLMODS) Makefile
$(CC) atop.o $(ALLMODS) -o atop -lncursesw -lz -lm -lrt $(LDFLAGS)
$(CC) atop.o $(ALLMODS) -o atop -lncursesw -lz -lm -lrt $(ATOP_BPF_LDFLAGS) $(LDFLAGS)

atopsar: atop
ln -sf atop atopsar
@@ -187,7 +193,7 @@ versdate.h:
./mkdate

atop.o: atop.h photoproc.h photosyst.h acctproc.h showgeneric.h
atopsar.o: atop.h photoproc.h photosyst.h
atopsar.o: atop.h photoproc.h photosyst.h
rawlog.o: atop.h photoproc.h photosyst.h rawlog.h showgeneric.h
various.o: atop.h acctproc.h
ifprop.o: atop.h photosyst.h ifprop.h
@@ -200,7 +206,7 @@ photoproc.o: atop.h photoproc.h
photosyst.o: atop.h photosyst.h
showgeneric.o: atop.h photoproc.h photosyst.h showgeneric.h showlinux.h
showlinux.o: atop.h photoproc.h photosyst.h showgeneric.h showlinux.h
showsys.o: atop.h photoproc.h photosyst.h showgeneric.h
showsys.o: atop.h photoproc.h photosyst.h showgeneric.h
showprocs.o: atop.h photoproc.h photosyst.h showgeneric.h showlinux.h
version.o: version.c version.h versdate.h
gpucom.o: atop.h photoproc.h photosyst.h
94 changes: 68 additions & 26 deletions atop.c
@@ -1,11 +1,11 @@
/*
** ATOP - System & Process Monitor
**
** The program 'atop' offers the possibility to view the activity of
** The program 'atop' offers the possibility to view the activity of
** the system on system-level as well as process-level.
**
** This source-file contains the main-function, which verifies the
** calling-parameters and takes care of initialization.
** calling-parameters and takes care of initialization.
** The engine-function drives the main sample-loop in which after the
** indicated interval-time a snapshot is taken of the system-level and
** process-level counters and the deviations are calculated and
@@ -35,7 +35,7 @@
** --------------------------------------------------------------------------
**
** After initialization, the main-function calls the ENGINE.
** For every cycle (so after another interval) the ENGINE calls various
** For every cycle (so after another interval) the ENGINE calls various
** functions as shown below:
**
** +---------------------------------------------------------------------+
@@ -48,15 +48,15 @@
** | | ^ | ^ | ^ | ^ | | |
** +---|-----|--------|-----|--------|----|--------|----|--------|----|--+
** | | | | | | | | | |
** +--V-----|--+ +--V-----|--+ +--V----|--+ +--V----|--+ +--V----|-+
** +--V-----|--+ +--V-----|--+ +--V----|--+ +--V----|--+ +--V----|-+
** | | | | | | | | | |
** | photosyst | | photoproc | | acct | | deviate | | print |
** | | | | |photoproc | | ...syst | | |
** | | | | | | | ...proc | | |
** +-----------+ +-----------+ +----------+ +----------+ +---------+
** +-----------+ +-----------+ +----------+ +----------+ +---------+
** ^ ^ ^ ^ |
** | | | | |
** | | | V V
** | | | V V
** ______ _________ __________ ________ _________
** / \ / \ / \ / \ / \
** /proc /proc accounting task screen or
@@ -84,8 +84,8 @@
** When all counters have been gathered, functions are called to calculate
** the difference between the current counter-values and the counter-values
** of the previous cycle. These functions operate on the system-level
** as well as on the task-level counters.
** These differences are stored in a new structure(-table).
** as well as on the task-level counters.
** These differences are stored in a new structure(-table).
**
** - deviatsyst()
** Calculates the differences between the current system-level
@@ -98,7 +98,7 @@
** task-database; this "database" is implemented as a linked list
** of taskinfo structures in memory (so no disk-accesses needed).
** Within this linked list hash-buckets are maintained for fast searches.
** The entire task-database is handled via a set of well-defined
** The entire task-database is handled via a set of well-defined
** functions from which the name starts with "pdb_..." (see the
** source-file procdbase.c).
** The processes which have been finished during the last cycle
@@ -112,7 +112,7 @@
** these addresses can be modified in the main-function depending on particular
** flags. In this way various representation-layers (ASCII, graphical, ...)
** can be linked with 'atop'; the one to use can eventually be chosen
** at runtime.
** at runtime.
**
** $Log: atop.c,v $
** Revision 1.49 2010/10/23 14:01:00 gerlof
@@ -296,6 +296,7 @@
#include "showgeneric.h"
#include "parseable.h"
#include "gpucom.h"
#include "photobpf.h"

#define allflags "ab:cde:fghijklmnopqrstuvwxyz1ABCDEFGHIJKL:MNOP:QRSTUVWXYZ"
#define MAXFL 64 /* maximum number of command-line flags */
@@ -322,6 +323,16 @@ char threadview = 0; /* boolean: show individual threads */
char calcpss = 0; /* boolean: read/calculate process PSS */
char getwchan = 0; /* boolean: obtain wchan string */

/*
** arguments for bpf stats sampling
** We enable bpf stats for bpfsampleinterval seconds every bpfsamplerate
** atop intervals. bpfsampleinterval must be smaller than atop interval.
**
** If bpfsamplerate == 0, disable sampling of bpf stats.
*/
unsigned int bpfsamplerate = 1;
unsigned int bpfsampleinterval = 1;

unsigned short hertz;
unsigned int pagesize;
unsigned int nrgpus;
@@ -391,6 +402,9 @@ void do_almostcrit(char *, char *);
void do_atopsarflags(char *, char *);
void do_pacctdir(char *, char *);
void do_perfevents(char *, char *);
void do_bpflines(char *, char *);
void do_bpfsamplerate(char *, char *);
void do_bpfsampleinterval(char *, char *);

static struct {
char *tag;
@@ -440,6 +454,9 @@ static struct {
{ "atopsarflags", do_atopsarflags, 0, },
{ "perfevents", do_perfevents, 0, },
{ "pacctdir", do_pacctdir, 1, },
{ "bpflines", do_bpflines, 0, },
{ "bpfsamplerate", do_bpfsamplerate, 0, },
{ "bpfsampleinterval", do_bpfsampleinterval, 0, },
};

/*
@@ -466,6 +483,8 @@ main(int argc, char *argv[])
exit(42);
}

photo_bpf_check();

/*
** preserve command arguments to allow restart of other version
*/
@@ -497,12 +516,12 @@ main(int argc, char *argv[])
if ( memcmp(p, "atopsar", 7) == 0)
return atopsar(argc, argv);

/*
** interpret command-line arguments & flags
/*
** interpret command-line arguments & flags
*/
if (argc > 1)
{
/*
/*
** gather all flags for visualization-functions
**
** generic flags will be handled here;
@@ -600,17 +619,17 @@ main(int argc, char *argv[])
}

/*
** get optional interval-value and optional number of samples
** get optional interval-value and optional number of samples
*/
if (optind < argc && optind < MAXFL)
{
if (!numeric(argv[optind]))
prusage(argv[0]);

interval = atoi(argv[optind]);

optind++;

if (optind < argc)
{
if (!numeric(argv[optind]) )
@@ -766,6 +785,7 @@ engine(void)
gpupending=0; /* boolean: request sent */

struct gpupidstat *gp = NULL;
struct bstats *bstats = NULL;

/*
** initialization: allocate required memory dynamically
@@ -817,6 +837,8 @@ engine(void)
if (nrgpus)
supportflags |= GPUSTAT;

if (system_support_bpf())
supportflags |= BPFSTAT;
/*
** MAIN-LOOP:
** - Wait for the requested number of seconds or for other trigger
@@ -838,11 +860,15 @@ engine(void)
/*
** if the limit-flag is specified:
** check if the next sample is expected before midnight;
** if not, stop atop now
** if not, stop atop now
*/
if (midnightflag && (curtime+interval) > timelimit)
break;

if ((supportflags & BPFSTAT) &&
bpfsamplerate && sampcnt % bpfsamplerate == 0)
bstats = get_devbstats();

/*
** wait for alarm-signal to arrive (except first sample)
** or wait for SIGUSR1/SIGUSR2
@@ -859,13 +885,13 @@ engine(void)
curtime = time(0); /* seconds since 1-1-1970 */

/*
** send request for statistics to atopgpud
** send request for statistics to atopgpud
*/
if (nrgpus)
gpupending = gpud_statrequest();

/*
** take a snapshot of the current system-level statistics
** take a snapshot of the current system-level statistics
** and calculate the deviations (i.e. calculate the activity
** during the last sample)
*/
@@ -918,7 +944,7 @@ engine(void)
curtime-pretime > 0 ? curtime-pretime : 1);

/*
** take a snapshot of the current task-level statistics
** take a snapshot of the current task-level statistics
** and calculate the deviations (i.e. calculate the activity
** during the last sample)
**
@@ -1013,10 +1039,14 @@ engine(void)
** the deviations
*/
lastcmd = (vis.show_samp)( curtime,
curtime-pretime > 0 ? curtime-pretime : 1,
&devtstat, devsstat,
nprocexit, noverflow, sampcnt==0);
curtime-pretime > 0 ? curtime-pretime : 1,
&devtstat, devsstat, bstats,
nprocexit, noverflow, sampcnt==0);

if (bstats) {
free(bstats->bpfall);
bstats = NULL;
}
/*
** release dynamically allocated memory
*/
@@ -1065,7 +1095,7 @@ prusage(char *myname)
printf("\t -%c show version information\n", MVERSION);
printf("\t -%c show or log all processes (i.s.o. active processes "
"only)\n", MALLPROC);
printf("\t -%c calculate proportional set size (PSS) per process\n",
printf("\t -%c calculate proportional set size (PSS) per process\n",
MCALCPSS);
printf("\t -%c determine WCHAN (string) per thread\n", MGETWCHAN);
printf("\t -P generate parseable output for specified label(s)\n");
@@ -1146,6 +1176,18 @@ do_linelength(char *name, char *val)
linelen = get_posval(name, val);
}

void
do_bpfsamplerate(char *name, char *val)
{
bpfsamplerate = get_posval(name, val);
}

void
do_bpfsampleinterval(char *name, char *val)
{
bpfsampleinterval = get_posval(name, val);
}
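
The two handlers above are reached through the tag/handler table extended earlier in this diff. A rough sketch of that table-driven dispatch is shown here under stated assumptions: `rc_entry`, `rc_table`, `rc_dispatch`, `set_rate`, `set_interval`, and `parse_value` are hypothetical names, and `parse_value` is only a stand-in for atop's `get_posval()`, whose body is not part of this diff.

```c
#include <stdlib.h>
#include <string.h>

/* Settings named in the commit; defaults match the diff. */
static unsigned int bpfsamplerate     = 1;
static unsigned int bpfsampleinterval = 1;

/* Stand-in for get_posval(): parse a decimal value, 0 on malformed input. */
static unsigned int parse_value(const char *val)
{
    char *end;
    unsigned long n = strtoul(val, &end, 10);
    return (*val != '\0' && *end == '\0') ? (unsigned int)n : 0;
}

static void set_rate(const char *name, const char *val)
{
    (void)name;
    bpfsamplerate = parse_value(val);
}

static void set_interval(const char *name, const char *val)
{
    (void)name;
    bpfsampleinterval = parse_value(val);
}

/* One row per rc-file keyword: the tag and the function that applies it. */
struct rc_entry {
    const char *tag;
    void (*func)(const char *, const char *);
};

static const struct rc_entry rc_table[] = {
    { "bpfsamplerate",     set_rate },
    { "bpfsampleinterval", set_interval },
};

/* Returns 1 when a handler matched the tag, 0 otherwise. */
static int rc_dispatch(const char *tag, const char *val)
{
    size_t i;

    for (i = 0; i < sizeof rc_table / sizeof rc_table[0]; i++) {
        if (strcmp(rc_table[i].tag, tag) == 0) {
            rc_table[i].func(tag, val);
            return 1;
        }
    }
    return 0;
}
```

Keeping the keyword-to-handler mapping in a table means a new setting needs only one new row and one small function, which is the shape the commit follows when it adds the three bpf entries.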

/*
** read RC-file and modify defaults accordingly
*/
@@ -1196,7 +1238,7 @@ readrc(char *path, int syslevel)
default:
if (tagname[0] == '#')
continue;

if (tagvalue[0] != '#')
break;

15 changes: 10 additions & 5 deletions atop.h
@@ -42,8 +42,9 @@ struct tstat;
struct devtstat;
struct sstat;
struct netpertask;
struct bstats;

/*
/*
** miscellaneous flags
*/
#define RRBOOT 0x0001
@@ -57,7 +58,7 @@ struct netpertask;

struct visualize {
char (*show_samp) (time_t, int,
struct devtstat *, struct sstat *,
struct devtstat *, struct sstat *, struct bstats *,
int, unsigned int, char);
void (*show_error) (const char *, ...);
void (*show_end) (void);
@@ -105,6 +106,9 @@ extern int netbadness;
extern int pagbadness;
extern int almostcrit;

extern int bpflines;
extern unsigned int bpfsampleinterval;

/*
** bit-values for supportflags
*/
@@ -114,9 +118,10 @@ extern int almostcrit;
#define NETATOPD 0x00000020
#define DOCKSTAT 0x00000040
#define GPUSTAT 0x00000080
#define BPFSTAT 0x00000100
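
BPFSTAT takes the next free bit after GPUSTAT, so it can be set and tested independently of the other capability bits. A small sketch of that use follows. The two constants are copied from this hunk, while the helper names and the `int` type for the flag word are assumptions made for illustration; `bpf_available` stands in for the result of `system_support_bpf()`.

```c
#include <stdbool.h>

/* Values copied from the atop.h hunk. */
#define GPUSTAT 0x00000080
#define BPFSTAT 0x00000100

/* Set the BPF capability bit when the probe succeeded. */
static inline int mark_bpf_support(int supportflags, bool bpf_available)
{
    if (bpf_available)
        supportflags |= BPFSTAT;
    return supportflags;
}

/* True when the BPF capability bit is set. */
static inline bool has_bpf_support(int supportflags)
{
    return (supportflags & BPFSTAT) != 0;
}
```

engine() applies the same test, `supportflags & BPFSTAT`, before it collects BPF stats for a sample.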

/*
** in rawlog file, the four least significant bits
** in rawlog file, the four least significant bits
** are moved to the per-sample flags and therefor dummy
** in the support flags of the general header
*/
@@ -126,7 +131,7 @@ extern int almostcrit;
** structure containing the start-addresses of functions for visualization
*/
char generic_samp (time_t, int,
struct devtstat *, struct sstat *,
struct devtstat *, struct sstat *, struct bstats *,
int, unsigned int, char);
void generic_error(const char *, ...);
void generic_end (void);
@@ -168,7 +173,7 @@ int contcompar(const void *, const void *);
count_t subcount(count_t, count_t);
int rawread(void);
char rawwrite (time_t, int,
struct devtstat *, struct sstat *,
struct devtstat *, struct sstat *, struct bstats *,
int, unsigned int, char);

int numeric(char *);