system with OpenGrok docker containers maxed out memory #3089


Closed
Ymoise opened this issue Mar 24, 2020 · 15 comments

Comments

@Ymoise

Ymoise commented Mar 24, 2020

Like I mentioned in #3071, I have separate containers set up for several branches my team is working on.

I currently have 5 containers set up and... I'm running out of memory.

The server has 64 GB of RAM plus 64 GB of swap; I'm currently completely out of RAM and about 54 GB into the swap, with only about 10 GB left.

That works out to ~19 GB per container, but when I run docker stats I see two containers using ~10-16 GB each, and the rest using amounts in the MBs, not GBs.

The only things running are the OpenGrok containers, and killing them frees memory in a big way.

Nearly all of the containers hold a single project. One container has 124 projects... but surprisingly enough, it's barely taking any memory.

I can't find anything that accounts for this.

@vladak
Member

vladak commented Mar 24, 2020

The MEM USAGE column in docker stats output seems to correspond to the rss counter in /sys/fs/cgroup/memory/docker/<long-container-ID> (as documented on https://docs.docker.com/config/containers/runmetrics/). If the numbers in docker stats don't add up w.r.t. the observed total memory usage, then either something else is running on the host outside of the containers and occupying the memory (ps --sort=rss -o pid,cgroup,rss,args -e), or the memory is taken by the kernel (buffer cache etc.; slabtop -s c will show the breakdown). I'd also suggest taking a look at /proc/meminfo.

Inside an OpenGrok Docker container the memory is by and large taken by the Tomcat process. The indexer can also take a sizable chunk of memory - that is only temporary, however it might cause the kernel buffer space to inflate significantly. That said, this falls into memory that can be reclaimed.
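
A rough way to run those checks together on the host (a sketch only; the cgroup path assumes cgroup v1, as in the Docker documentation linked above):

# largest processes by RSS, with their cgroup, to spot anything outside the containers
ps --sort=-rss -o pid,cgroup,rss,args -e | head -20

# kernel-side memory: slab caches and the buffer/page cache counters
slabtop -s c -o | head -20
grep -E 'MemFree|Buffers|Cached|Slab' /proc/meminfo

# per-container rss as seen by the memory control group
for id in $(docker ps --no-trunc -q); do
    echo "$id: $(grep '^rss ' /sys/fs/cgroup/memory/docker/$id/memory.stat)"
done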

@Ymoise
Author

Ymoise commented Mar 24, 2020

Cache doesn't seem to have taken it.

Tasks: 270 total,   1 running, 268 sleeping,   1 stopped,   0 zombie
%Cpu(s): 18.5 us,  1.0 sy,  0.0 ni, 79.6 id,  0.8 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 65810144 total, 10110104 free, 54057100 used,  1642940 buff/cache
KiB Swap: 67043324 total,  5162832 free, 61880492 used. 11300812 avail Mem

ps --sort=rss -o pid,rss,args -e isn't showing anything major beyond the containers themselves.

I've checked /proc/meminfo earlier and while I'd be the first to admit I might be reading this wrong, I'm not seeing anything much here, either:

MemTotal:       65810144 kB
MemFree:        10106488 kB
MemAvailable:   11297284 kB
Buffers:            1240 kB
Cached:           393004 kB
SwapCached:     10820496 kB
Active:         18796056 kB
Inactive:       21471708 kB
Active(anon):   18635176 kB
Inactive(anon): 21247732 kB
Active(file):     160880 kB
Inactive(file):   223976 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      67043324 kB
SwapFree:        5168160 kB
Dirty:                48 kB
Writeback:             0 kB
AnonPages:      29053516 kB
Mapped:            65420 kB
Shmem:              9384 kB
Slab:            1248816 kB
SReclaimable:     927584 kB
SUnreclaim:       321232 kB
KernelStack:       14032 kB
PageTables:       222528 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    99948396 kB
Committed_AS:   95919868 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      285444 kB
VmallocChunk:   34359431224 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      280512 kB
DirectMap2M:    66828288 kB

The only things running on this server are the containers. It was rebooted a little while ago and the only things I started after that were the containers.

@vladak
Member

vladak commented Mar 24, 2020

Also, it might be worth checking what processes are actually swapped out (https://stackoverflow.com/a/7180078/11582827).
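
For reference, a minimal sketch of the kind of loop the linked answer uses - it sums the VmSwap field from /proc/<pid>/status for every process (VmSwap and comm are standard Linux proc fields; the output format here is just illustrative):

for dir in /proc/[0-9]*; do
    pid=${dir#/proc/}
    swap=$(awk '/^VmSwap:/ { print $2 }' "$dir/status" 2>/dev/null)
    [ -n "$swap" ] && [ "$swap" -gt 0 ] && \
        echo "PID=$pid swapped $swap KB ($(cat "$dir/comm" 2>/dev/null))"
done | sort -k3 -n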

@vladak
Member

vladak commented Mar 24, 2020

What does the output from:

 ps --sort=rss -o pid,rss,size,cgroup -e | grep 'docker\/' | \
     awk '{ s+=$2 } END { print(s); }'

say ?

It should match the output from:

sudo docker ps --no-trunc -q | \
    xargs -I{} grep '^rss ' /sys/fs/cgroup/memory/docker/{}/memory.stat | \
    cut -d" " -f2 | awk '{ s+=$1 } END { print(s); }'

@vladak changed the title from "Memory leak?" to "system with OpenGrok docker containers maxed out memory" on Mar 24, 2020
@Ymoise
Author

Ymoise commented Mar 24, 2020

What does the output from:

 ps --sort=rss -o pid,rss,size,cgroup -e | grep 'docker\/' | \
     awk '{ s+=$2 } END { print(s); }'

say ?

24439736

It should match the output from:

sudo docker ps --no-trunc -q | \
    xargs -I{} grep '^rss ' /sys/fs/cgroup/memory/docker/{}/memory.stat | \
    cut -d" " -f2 | awk '{ s+=$1 } END { print(s); }'

It doesn't.

24982634496

@Ymoise
Author

Ymoise commented Mar 24, 2020

Also, it might be worth checking what processes are actually swapped out (https://stackoverflow.com/a/7180078/11582827).

Thank you!

PID=31163 swapped 153488 KB (java)
PID=28420 swapped 4453216 KB (java)
PID=20806 swapped 13592004 KB (java)
PID=26057 swapped 14910148 KB (java)
PID=1224 swapped 16281256 KB (java)
PID=20569 swapped 16735540 KB (java)

~59 GB of the swap is going to the containers, and another ~26 GB of RAM is, too. I'm not sure where another ~30 GB went (since there are currently 10 GB free), but at least it no longer looks like the containers are running on air and good wishes.

So, ~85 GB across 6 containers, which is ~14 GB per container.

The top swapper there, at the bottom, is a one-project container (which is weird, because I read the issue someone opened about multi-project setups being memory guzzlers), but the one just above it holds ~120 projects.

Each is around 16 GB.

Is that normal? Is there a formula for gauging normal here?

@Ymoise
Author

Ymoise commented Mar 25, 2020

I guess what I'm really asking is: Should I expect the containers to take up this much memory, in which case I need to ask for a server upgrade (the existing machine won't carry more than it does now, and I need it to), or is this not normal (e.g. "containers usually take up... 8gb"), in which case, I need to fix something?

@vladak
Member

vladak commented Mar 25, 2020

24982634496

The values just have different magnitudes: the rss value reported by ps(1) is in kilobytes (KiB) while the control group stats are in bytes.

24982634496 B / 1024 = 24397104 KiB
24397104 KiB / 1024 ≈ 23825 MiB
23825 MiB / 1024 ≈ 23 GiB

so some 23 GiB, which roughly matches the 24439736 KB reported by ps.
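
The same conversion in one step (integer division, so it truncates to whole GiB):

echo $(( 24982634496 / 1024 / 1024 / 1024 ))    # prints 23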

@vladak
Member

vladak commented Mar 25, 2020

Is that normal? Is there a formula for gauging normal here?

The webapp can be memory hungry. https://github.com/oracle/opengrok/wiki/Tuning-for-large-code-bases#web-application has some tips on how to size the JVM heap for the web app. When the suggester was introduced I had to scale up the default heap size quite a bit for our internal deployment (with hundreds of mid-sized projects).

Now, in the Tomcat logs you might see some memory leaks being detected, however these are not under our control (see #2899); AFAIK all OpenGrok leaks in the web app have been plugged.
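
If heap sizing turns out to be the issue, something along these lines caps the Tomcat JVM heap per container. This is only a sketch: the JAVA_OPTS variable, the mount path, the container name and the 4g value are assumptions for illustration - check the Docker image README and the wiki page above for the exact knobs:

# assumes the container's Tomcat picks up JVM flags from JAVA_OPTS; values are illustrative
docker run -d --name opengrok-branch1 \
    -p 8080:8080 \
    -v /opt/src/branch1:/opengrok/src \
    -e JAVA_OPTS="-Xmx4g" \
    opengrok/docker:latest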

@vladak
Member

vladak commented Mar 25, 2020

For reference, an almost empty OpenGrok container (2 small projects indexed - OpenGrok and sudo) is reported to consume ~1.35 GiB in docker stats.

@Ymoise
Author

Ymoise commented Mar 25, 2020

For reference, an almost empty OpenGrok container (2 small projects indexed - OpenGrok and sudo) is reported to consume ~1.35 GiB in docker stats.

Define "small"? Because my smallest container is 4 projects that add up to roughly 7gb, and looking at swap and stats, it seems to take up around 5gb of memory.

@vladak
Member

vladak commented Mar 25, 2020

For reference, an almost empty OpenGrok container (2 small projects indexed - OpenGrok and sudo) is reported to consume ~1.35 GiB in docker stats.

Define "small"? Because my smallest container is 4 projects that add up to roughly 7gb, and looking at swap and stats, it seems to take up around 5gb of memory.

$ du -sh /var/opengrok/src
377M	/var/opengrok/src
$ find /var/opengrok/src -type f  |wc -l
4995

These are 2 clones of the OpenGrok source repository and 3 clones of the sudo source repo.

Looking at the stats of the container now, it has grown to 2.79 GiB; I will keep an eye on it.

@vladak
Member

vladak commented Mar 26, 2020

Actually, there seems to be something fishy going on with the Tomcat process, or rather its JVM. Getting a JVM heap dump (gotcha: jmap stores the dump file within the container) from a process with some ~3 GiB RSS and looking into it in MAT (Eclipse) reveals that while the heap size is some 27 MiB, the unreachable objects occupy 2.38 GiB. It looks like the GC got stuck somehow. This looks similar to https://stackoverflow.com/questions/14370738/java-heap-overwhelmed-by-unreachable-objects, however it is not clear to me how to resolve it.
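
For anyone wanting to reproduce this, a sketch of the dump workflow (assuming the JDK tools jps/jmap are available inside the image; the container name and paths are illustrative):

# find the Tomcat JVM PID inside the container
docker exec opengrok jps -l
# dump the live heap; the file is written inside the container's filesystem
docker exec opengrok jmap -dump:live,format=b,file=/tmp/tomcat.hprof <pid>
# copy the dump out to the host for analysis in MAT
docker cp opengrok:/tmp/tomcat.hprof .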

The growth of RSS does not seem to be linear. When the container was started it began under ~1 GiB, then quite quickly shot into the ~2 GiB range and eventually ended up around ~3 GiB. I then let the container run overnight and it remained around ~3 GiB with very little growth.

@vladak
Member

vladak commented Mar 26, 2020

Instead of tuning the GC or trying a different GC implementation on Java 8, I tried switching the Tomcat layer to JDK 11. I created 2 containers - one based on the opengrok/docker:latest image and another using tomcat:9-jdk11. Both containers were started at the same time and the initial indexing was already done (on the same data) by the time they were started. Running them for over an hour reveals that JDK 11 does not suffer from this problem:

[chart: docker-opengrok-jdk8_vs_jdk11 - memory usage of the JDK 8 vs JDK 11 based containers over time]

The reindex happens every 10 minutes by default, so the very first one changes the memory landscape a lot for the JDK 8 based container.
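
A rough sketch of how such a side-by-side comparison can be run; the image tags are the ones mentioned above, but the my-opengrok-jdk11 image name, the mount path and the sampling loop are purely illustrative (the actual fix rebuilds the OpenGrok image on the JDK 11 Tomcat base):

# run the stock (JDK 8 based) image and a locally built JDK 11 variant against the same sources
docker run -d --name og-jdk8  -v /opt/src:/opengrok/src opengrok/docker:latest
docker run -d --name og-jdk11 -v /opt/src:/opengrok/src my-opengrok-jdk11
# sample memory usage once a minute
while true; do docker stats --no-stream og-jdk8 og-jdk11; sleep 60; done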

vladak pushed a commit to vladak/OpenGrok that referenced this issue Mar 26, 2020
@vladak closed this as completed in 841e93e on Mar 26, 2020
@vladak
Member

vladak commented Mar 27, 2020

Fixed in the image with OpenGrok 1.3.11.
