
Use Rob's cool vmstat/gnuplot tooling to plot hardware utilization #302

Open
mikemccand opened this issue Oct 2, 2024 · 11 comments

@mikemccand (Owner)
@rmuir has some cool tooling that measures how effectively processes are using CPU/RAM/IO. Let's turn this on at least during indexing phases of nightly benchy, when we are trying to fully saturate the available hardware resources...

@rmuir (Collaborator) commented Oct 2, 2024

@mikemccand to start with "text mode", just do this:

start of benchy:

$ nohup vmstat --active --wide --timestamp --unit M 1 > vmstat.log 2>/dev/null &

end of benchy:

$ pkill vmstat

it should produce a human-readable log that looks like this:

router:~$  vmstat --active --wide --timestamp --unit M 1
--procs-- -----------------------memory---------------------- ---swap-- -----io---- -system-- ----------cpu---------- -----timestamp-----
   r    b         swpd         free        inact       active   si   so    bi    bo   in   cs  us  sy  id  wa  st  gu                 EDT
   1    0            0         3177           89          140    0    0     2     6  277    0   0   0 100   0   0   0 2024-10-02 08:12:00
   0    0            0         3177           89          140    0    0     0     0  323  161   0   0 100   0   0   0 2024-10-02 08:12:01
   1    0            0         3177           89          140    0    0     0   956  373  222   0   0 100   0   0   0 2024-10-02 08:12:02
   0    0            0         3177           89          140    0    0     0     0   93   84   0   0 100   0   0   0 2024-10-02 08:12:03
   0    0            0         3177           89          140    0    0     0     8  170  134   0   0 100   0   0   0 2024-10-02 08:12:04

We can then add fancy graph plotting to the "end of benchy" step, but text mode itself is a good way to start?
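The text-mode log is also easy to post-process later. As a hypothetical sketch (Python; the column layout is assumed from the sample output above, and vmstat's columns can vary across procps versions):

```python
# Parse vmstat --active --wide --timestamp output into (timestamp, idle%) pairs.
# Assumed layout from the sample above: "id" (idle) is the 15th field, and each
# data row ends with "YYYY-MM-DD HH:MM:SS".

def parse_vmstat(lines):
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 20 or not fields[0].isdigit():
            continue  # skip the two header lines
        timestamp = fields[-2] + " " + fields[-1]
        idle = int(fields[14])
        rows.append((timestamp, idle))
    return rows

sample = [
    "   r    b         swpd         free        inact       active  ...",
    "   1    0 0 3177 89 140 0 0 2 6 277 0 0 0 100 0 0 0 2024-10-02 08:12:00",
    "   0    0 0 3177 89 140 0 0 0 0 323 161 0 0 100 0 0 0 2024-10-02 08:12:01",
]
print(parse_vmstat(sample))
# → [('2024-10-02 08:12:00', 100), ('2024-10-02 08:12:01', 100)]
```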

@rmuir (Collaborator) commented Oct 2, 2024

also, if you don't want output in your local time, set the TZ variable:

$ TZ=UTC vmstat ...

if you don't want the inactive/active breakdown, remove the --active flag. I prefer this way of breaking down the memory, but maybe for Lucene you'd prefer to just monitor the OS cache instead:

$ TZ=UTC vmstat --wide --timestamp --unit M 1
--procs-- -----------------------memory---------------------- ---swap-- -----io---- -system-- ----------cpu---------- -----timestamp-----
   r    b         swpd         free         buff        cache   si   so    bi    bo   in   cs  us  sy  id  wa  st  gu                 UTC
   1    0            0        14480          178         1890    0    0    17    84  628    3   3   1  95   0   0   0 2024-10-02 12:49:15
   1    0            0        14485          178         1880    0    0     0     0  391  733   0   2  98   0   0   0 2024-10-02 12:49:16
   0    0            0        14494          178         1875    0    0     0     0  433  784   1   1  98   0   0   0 2024-10-02 12:49:17
   0    0            0        14506          178         1873    0    0     0     0  374 1083   3   1  96   0   0   0 2024-10-02 12:49:18

@mikemccand (Owner, Author)

OK! I did the first step, which is to run vmstat during each indexing run and save away the resulting log. Yesterday's run produced these exciting vmstat logs:

beast3:tmp$ wc -l *vmstat*
    120 fastIndexBigDocs.vmstat.log
    148 fastIndexMediumDocs.vmstat.log
   3903 fastIndexMediumDocsWithVectorsQuantized.vmstat.log
   6416 fastIndexMediumDocsWithVectors.vmstat.log
  20995 fixedIndex.vmstat.log
    240 nrtIndexMediumDocs.vmstat.log
  31822 total

Only 120 lines (120 seconds) for fastIndexBigDocs.vmstat.log!

Next we need to somehow gnuplot these buggers?
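Before plotting, one cheap sanity check is a per-log summary. A hypothetical sketch (Python, same column assumptions as the sample vmstat output earlier in the thread): average CPU busy percentage, i.e. 100 minus the "id" column.

```python
# Average CPU busy % (100 - idle) across a vmstat --wide --timestamp log.
# Assumes the layout shown earlier: "id" (idle) is the 15th field of a data row.

def avg_cpu_busy(lines):
    busy = [
        100 - int(f[14])
        for f in (line.split() for line in lines)
        if len(f) >= 20 and f[0].isdigit()  # data rows only, not headers
    ]
    return sum(busy) / len(busy) if busy else 0.0

log = [
    "   1 0 0 3177 89 140 0 0 2 6 277 0 0 0 100 0 0 0 2024-10-02 08:12:00",
    "   0 0 0 3177 89 140 0 0 0 0 323 161 10 5 85 0 0 0 2024-10-02 08:12:01",
]
print(avg_cpu_busy(log))  # → 7.5
```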

@rmuir (Collaborator) commented Oct 9, 2024

thanks @mikemccand! can you hook me up with these sample data files? it will be faster to make it look decent.

@mikemccand (Owner, Author)

Does this work? vmstat-logs.zip

These are all the vmstat logs, each from building one of the N indices that nightly benchy builds ...

@rmuir (Collaborator) commented Oct 9, 2024

thank you @mikemccand, that works. crazy how idle your CPU is while doing this.

@rmuir (Collaborator) commented Oct 10, 2024

@mikemccand see https://rmuir.org/vmstat-report/

You can grab the 3 files there to reproduce it:

  • vmstat.gpi: gnuplot file to make several .svg files
  • index.html.template: simple html file to lay them out
  • generate.sh: script that makes directory with these files for each log file

Seems the hardware is not fully utilized. There is a direct correlation between CPU usage and the number of runnable processes (= threads here). I guess you have some insane number of cores?

[Screenshot: Screen Shot 2024-10-10 at 00:47:43]

@mikemccand (Owner, Author)

Whoa, thanks @rmuir! Nice charts! Yeah, this is beast3 with 128 cores, and one of those indexing runs (the fixedIndex one) uses only one core (mostly) so that it generates a precisely deterministic index each night, keeping the search results "comparable". But for the other runs I'd expect them to be saturating CPU (ish?) ... maybe I'm not using enough concurrency for them!

@rmuir (Collaborator) commented Oct 10, 2024

This makes sense. These are just a start, we can tweak/beautify later once things are working.

I struggled a bit because the SVG files were enormous out of the box: gnuplot is written by scientists, so the SVGs were totally lossless, accurately representing all 20,000+ datapoints in your fixedIndex log, which made for huge file sizes.

I had not done interpolation before, so I had to figure it out; I tried to add comments so you can tweak any of this if you want.
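For what it's worth, an alternative to interpolation for taming file size is to thin the data before it ever reaches gnuplot. A hypothetical sketch (simple stride-based downsampling in Python; this is not what vmstat.gpi does, just an illustration of the idea):

```python
def downsample(points, max_points=500):
    """Keep at most max_points samples by taking every Nth point."""
    if len(points) <= max_points:
        return list(points)
    stride = -(-len(points) // max_points)  # ceiling division
    return points[::stride]

data = list(range(20000))  # stand-in for the 20,000+ samples in fixedIndex
thinned = downsample(data)
print(len(thinned))  # → 500
```

This trades accuracy for size the same way interpolation does, but keeps real samples instead of synthesizing new ones.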

@mikemccand (Owner, Author)

Last night's benchy made the nice gnuplot vmstat charts!

It's linked off the nightly details page, to here. Each sub-directory has the charts for building that one index, e.g. fastIndexMediumDocs. It is quite depressing how idle the CPUs actually are, especially at the end of the run. We are not waiting for merges, so I suspect this is just Lucene moving the (big) in-memory segments to disk ... I'll see if I can better tune these runs to saturate the hardware.

@rmuir (Collaborator) commented Oct 11, 2024

We are not waiting for merges, so I suspect this is just Lucene moving the (big) in-memory segments to disk ... I'll see if I can better tune these runs to saturate the hardware.

@mikemccand This could be what's happening. Normally I'd expect to see some iowait here for something like that (maybe; iowait accounting is tricky like that).

But there could be a challenge: you have 128 CPUs, so if e.g. a single CPU is in iowait, we may not see it represented in the vmstat output, because it is not even 1%?
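The back-of-the-envelope arithmetic supports that worry: with integer percentage columns, one busy core in 128 can vanish entirely (whether vmstat truncates or rounds here is a procps detail I haven't checked):

```python
cpus = 128
one_core = 100 / cpus   # one core fully in iowait, as a % of total CPU
print(one_core)         # → 0.78125
# vmstat prints integer percentages; if the value is truncated,
# a single waiting core shows up as 0 in the "wa" column:
print(int(one_core))    # → 0
```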

Labels: None yet
Projects: None yet
Development: No branches or pull requests

2 participants