MALT is a memory tool that finds where your program allocates memory. It also provides statistics about memory usage and helps find memory leaks.
MALT depends on the presence of :
- binutils (nm and addr2line) to extract symbols. Tested version is 2.24 - 2.38.
It optionally depends on :
- nodejs (http://nodejs.org/) to run the webview GUI. Tested version is 0.10.30 - 12.22.9.
- libelf (http://www.mr511.de/software/english.html) to extract global variable list from executables and libs. Tested version is 0.128 - 0.183.
- libunwind (http://www.nongnu.org/libunwind/) as an alternative implementation of glibc backtrace method. Tested version is 1.1 - 1.3.2.
Supported systems (known):
- Linux (Gentoo / Debian / Ubuntu / Centos / RedHat)
MALT uses CMake as its build system but provides a simple configure wrapper for users familiar with autotools packaging, so you can install it with the following procedure :
mkdir build
cd build
../configure --prefix={YOUR_PREFIX}
make
make test
make install
For more advanced usage, you need to call cmake yourself, so you can install it with the following procedure :
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX={YOUR_PREFIX}
make
make test
make install
If you are a user of spack you can also install it easily by using :
spack install malt
Gentoo
I provide an overlay containing both MALT & NUMAPROF, you can use it by calling :
# MALT using gentoo overlay memtt :
sudo eselect repository add memtt git https://github.com/memtt/gentoo-memtt-overlay.git
sudo eselect repository enable memtt
sudo emerge -a malt numaprof
Debian / Ubuntu / Centos / RedHat / Fedora / Arch
Have a look at packaging/README.md if you want to build packages for those distributions yourself with the embedded scripts.
The MALT build supports several options, defined with the -D option of CMake :
-DENABLE_CODE_TIMING={yes|no}
: Enable quick and dirty function to measure MALT internal performances.
-DENABLE_TESTS={yes|no}
: Enable build of unit tests.
-DJUNIT_OUTPUT={yes|no}
: Enable generation of junit files for jenkins integration.
-DENABLE_VALGRIND={yes|no}
: Run unit tests inside valgrind memcheck and generate XML report.
-DPORTABILITY_OS={UNIX}
: Set portability build options to fix OS specific calls.
-DPORTABILITY_MUTEX={PTHREAD}
: Set portability build option to select mutex implementation.
MALT is written in C++, so you might encounter issues if you build it with GCC and profile applications built with the Intel compiler. In most cases it should work out of the box.
However, I once received an error report about this. In that case, try compiling MALT with the Intel compiler instead of GCC to match the application :
../configure CC=icc CXX=icpc
make
MALT currently provides a dynamic library you need to preload in your application to wrap the default memory allocator. It provides two basic instrumentation modes.
By default MALT uses backtrace to reconstruct your stack on malloc/free/... calls :
{YOUR_PREFIX}/bin/malt {YOUR_PROGRAM} [OPTIONS]
You can get better performance but less detailed stacks by using the -finstrument-functions option or a similar one for your compiler. Then, you need to tell MALT to use the "enter-exit" stack mode :
{YOUR_PREFIX}/bin/malt -s=enter-exit {YOUR_PROGRAM} [OPTIONS]
The malt script is only a wrapper that automatically preloads a dynamic library into the executable. You can also do it by hand in case of issues :
LD_PRELOAD={YOUR_PREFIX}/lib/libmalt.so {YOUR_PROGRAM} [OPTIONS]
MALT works out of the box with your program, but you need to compile your program with debug options (-g) to get access to the source code attached to each call site. It might also be better to use -O0 or -fno-inline to disable inlining, which might give you more accurate call stacks.
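As a first target to try these flags on, here is a minimal sketch of a C workload with one balanced call site and one deliberately leaking call site. The function names (balanced_work, leaky_work, run_workload) are hypothetical and only serve this illustration; they are not part of MALT.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocates and frees n bytes: this call site should not appear as a leak. */
static size_t balanced_work(size_t n)
{
    assert(n > 0);
    char *buf = malloc(n);
    if (buf == NULL)
        return 0;
    memset(buf, 0, n);
    free(buf);
    return n;
}

/* Deliberately never freed: this call site should be reported as leaking. */
static void *leaky_work(size_t n)
{
    return malloc(n);
}

/* Runs both call sites 10 times and returns the number of bytes leaked on purpose. */
size_t run_workload(void)
{
    size_t leaked = 0;
    for (int i = 0; i < 10; i++) {
        balanced_work(1024);
        if (leaky_work(64) != NULL)
            leaked += 64;
    }
    return leaked;
}
```

Call run_workload() from your main, build with something like gcc -g -O0, and run the result through the malt wrapper. With inlining disabled, the two helpers should stay visible as separate call sites.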
MALT also provides a lightweight support of MPI to generate profile files named with MPI rank ID instead of process ID. In order to support this you first need to compile the MPI interface on top of your MPI. It will generate a small library in your home directory.
{YOUR_PREFIX}/bin/malt --prep-mpi [mpicxx]
Caution: this links MALT to the MPI version you are currently using. If you switch to another one, you need to rerun the previous command.
Then, to profile your MPI application, proceed like this :
mpirun -np X {YOUR_PREFIX}/bin/malt --mpi {YOUR_PROGRAM} [OPTIONS]
You can use the webview by calling the malt-webview command as :
malt-webview [-p PORT] [--no-auth] -i malt-YOUR_PROGRAM-1234.json
It will open a server listening locally on port 8080 so you can open your web browser to connect to the web interface via http://localhost:8080.
On first use, malt-webview creates the password file $HOME/.malt/passwd and asks you for a protection password for HTTP authentication. You can change it at any time with
malt-passwd {USER}
If you are running the view remotely through SSH you can redirect the ports by using :
ssh -L 8080:localhost:8080 user@ssh-server
To use the webview you need to install the nodeJS package on your system : http://nodejs.org/.
You can provide a config file to MALT to setup some features. This file uses the INI format. With the malt script :
{YOUR_PREFIX}/bin/malt -c=config.ini {YOUR_PROGRAM} [OPTIONS]
By hand :
MALT_CONFIG="config.ini" LD_PRELOAD=libmalt.so {YOUR_PROGRAM} [OPTIONS]
Example of config file :
[time]
enabled=true ; enable time profiles
points=1000 ; keep 1000 points
linar-index=false ; use action ID instead of time
[stack]
enabled=true ; enable stack profiles
mode=backtrace ; select stack tracing mode (backtrace|enter-exit)
resolve=true ; Automatically resolve symbols with addr2line at exit.
libunwind=false ; Enable or disable usage of libunwind to backtrace.
skip=4 ; Number of stack frames to skip in order to cut at malloc level
[output]
name=malt-%1-%2.%3 ; base name for output, %1 = exe, %2 = PID, %3 = extension
lua=true ; enable LUA output
json=true ; enable json output
callgrind=true ; enable callgrind output
indent=false ; indent the output profile files
config=true ; dump current config
verbosity=default ; malt verbosity level (silent, default, verbose)
stack-tree=false ; store the call tree as a tree (smaller file, but need conversion)
loop-suppress=false ; Simplify recursive loop calls to get smaller profile file if too big
[max-stack]
enabled=true ; enable or disable stack size tracking (requires -finstrument-functions)
[distr]
alloc-size=true ; generate distribution of allocation size
realloc-jump=true ; generate distribution of realloc jumps
[trace]
enable=false ; enable dumping allocation event tracing (not yet used by GUI)
[info]
hidden=false ; try to hide possibly sensitive names from profile (exe, hostname...)
[filter]
exe= ; Only apply malt on given exe (empty for all)
childs=true ; Instrument child processes or not
enabled=true ; Enable or disable MALT when threads start
[dump]
on-signal= ; Dump on signal. Can be comma separated list from SIGINT, SIGUSR1,
; SIGUSR2... help, avail (limited to only one dump)
after-seconds=0 ; Dump after X seconds (limited to only one time)
Option values can be overridden on the fly with command :
{YOUR_PREFIX}/bin/malt -o "stack:enabled=true;output:indent=true;" {YOUR_PROGRAM} [OPTIONS]
If you do not use the malt wrapper and use LD_PRELOAD directly, you can use the environment variables :
MALT_OPTIONS="stack:enabled=true;output:indent=true;"
MALT_CONFIG="config.ini"
MALT_STACK="libunwind"
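If you start the profiled program from your own launcher rather than from a shell, the override string can be assembled in code. The sketch below only reproduces the section:key=value; format shown above. The struct malt_opt type and the format_malt_options() helper are hypothetical names introduced for this example, not MALT API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One override entry, e.g. {"stack", "enabled", "true"} (hypothetical type). */
struct malt_opt {
    const char *section;
    const char *key;
    const char *value;
};

/* Writes "section:key=value;" for each entry into out.
 * Returns the string length, or -1 if the buffer is too small. */
int format_malt_options(char *out, size_t cap,
                        const struct malt_opt *opts, size_t count)
{
    size_t used = 0;

    assert(out != NULL && opts != NULL);
    if (cap == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        int n = snprintf(out + used, cap - used, "%s:%s=%s;",
                         opts[i].section, opts[i].key, opts[i].value);
        if (n < 0 || (size_t)n >= cap - used)
            return -1; /* output would have been truncated */
        used += (size_t)n;
    }
    return (int)used;
}
```

With the two entries stack/enabled/true and output/indent/true, the buffer holds stack:enabled=true;output:indent=true; which you can then export, for example with setenv("MALT_OPTIONS", buf, 1), before calling exec on the program to profile.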
If you run a really big program doing millions of allocations you might get a big overhead, and maybe you are only interested in a sub-part of the program. In that case, you can include malt/malt.h in your files and use maltEnable() and maltDisable() to control MALT on each thread. It is also a nice way to detect leaks in sub-parts of your code.
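#include <stdlib.h>
#include <malt/controler.h>

int main()
{
    //tracking is turned off for this thread
    maltDisable();
    //ignored
    malloc(16);
    //tracking is turned back on
    maltEnable();
    //tracked
    malloc(16);
    return 0;
}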
You will need to link the libmalt-controler.so to get the default fake symbols when not using MALT. You can also just provide the two empty functions in your own dynamic library (not static).
If you have some allocations not under your control before your first call, you can disable MALT by default on threads using the filter:enabled option, then enable it by hand.
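If you prefer to provide the two empty functions yourself, a minimal sketch could look like the following. The void/void signatures are an assumption (this document only gives the function names), so check them against the MALT header before relying on this.

```c
/* Fallback stubs used when the program runs without MALT.
 * Assumed signatures: adjust them to match the MALT header. */

/* No-op: MALT is expected to supply the real implementation when preloaded. */
void maltEnable(void)
{
}

/* No-op counterpart of maltEnable(). */
void maltDisable(void)
{
}
```

As stated above, these must live in a dynamic library and not in a static one, for example a small shared object built with gcc -shared -fPIC (the library name is up to you).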
MALT has two ways to rebuild stacks. The default one relies on glibc backtrace, but we observed several segfaults with some Intel tools such as Intel OpenMP and Intel MPI, so we also provide a more robust approach based on libunwind if it is present on your system at build time. You can provide it with :
../configure --with-libunwind=PREFIX
or with cmake :
cmake -DLIBUNWIND_PREFIX=PREFIX ..
You can now use it with malt by using :
malt -s libunwind {PROGRAM}
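To give an idea of what the default mode builds on, here is a minimal sketch using the glibc backtrace() and backtrace_symbols_fd() functions from execinfo.h. It is not MALT's implementation. The helper names capture_stack() and dump_stack() are hypothetical.

```c
#include <assert.h>
#include <execinfo.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Stores up to max_frames return addresses of the current call stack.
 * Returns the number of frames actually captured. */
int capture_stack(void **frames, int max_frames)
{
    assert(frames != NULL && max_frames > 0);
    return backtrace(frames, max_frames);
}

/* Prints the current call stack to stderr, one frame per line. */
void dump_stack(void)
{
    void *frames[MAX_FRAMES];
    int count = capture_stack(frames, MAX_FRAMES);

    /* Writes directly to the file descriptor, without calling malloc. */
    backtrace_symbols_fd(frames, count, STDERR_FILENO);
}
```

The captured values are raw addresses. Turning them into function names and source lines is a separate step, which is where tools such as addr2line and the -g build flag come in.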
The alternative relies on function instrumentation, adding probes at the start and end of each function. It can be done by using -finstrument-functions with your compiler, as described in the "How to use" section, or by using binary instrumentation tools, as explained at the end of this document.
If you want to use the source instrumentation approach, you need to recompile your program and the libraries of interest with :
gcc -finstrument-functions
Then running malt with :
{YOUR_PREFIX}/bin/malt -s enter-exit {YOUR_PROGRAM}
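For readers unfamiliar with this flag: with -finstrument-functions, GCC inserts calls to two hooks, __cyg_profile_func_enter and __cyg_profile_func_exit, around every function. The sketch below tracks the call depth to illustrate the mechanism only. It is not how MALT implements its hooks, it is single-threaded, and the other names (observed_max_depth, leaf, middle) are hypothetical.

```c
#include <stddef.h>

static int depth = 0;     /* current call depth seen by the hooks */
static int max_depth = 0; /* deepest call depth seen so far */

/* Called by instrumented code on every function entry. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    if (++depth > max_depth)
        max_depth = depth;
}

/* Called by instrumented code on every function exit. */
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    --depth;
}

/* Returns the deepest call depth recorded (0 if built without the flag). */
__attribute__((no_instrument_function))
int observed_max_depth(void)
{
    return max_depth;
}

/* Two small functions to instrument. */
int leaf(int x)   { return x + 1; }
int middle(int x) { return leaf(x) * 2; }
```

Do not keep such hooks in a program you profile with MALT in enter-exit mode, since MALT needs to supply its own. The no_instrument_function attribute keeps the hooks from instrumenting themselves.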
MALT can also track the memory used by stacks over time, but this requires enabling a compiler flag :
gcc -finstrument-functions {YOUR FILES}
If your application uses a custom allocator with a different namespace than the default malloc, free... you can use the --wrap or --wrap-prefix options.
You can select the functions in detail by doing:
malt --wrap malloc:je_malloc ./prgm
malt --wrap malloc:je_malloc,free:je_free,calloc:je_calloc,malloc:another_custom_malloc ./prgm
You can also simply use a common prefix for all of them (typically useful if you embed jemalloc with a custom symbol prefix):
malt --wrap-prefix je_ ./prgm
malt --wrap-prefix je_,another_custom_ ./prgm
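To make "a custom allocator with a different namespace" concrete, here is a toy bump allocator exporting je_malloc and je_free. It is a hypothetical stand-in written for this example, not jemalloc. It only shows the kind of symbols the --wrap and --wrap-prefix options are meant to target.

```c
#include <stddef.h>

#define ARENA_SIZE 4096

/* Fixed arena: the allocator never calls the libc malloc. */
static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static size_t arena_used = 0;

/* Returns size bytes from the arena, or NULL when size is 0 or the arena is full. */
void *je_malloc(size_t size)
{
    size_t align = _Alignof(max_align_t);
    size_t rounded = (size + align - 1) / align * align;

    /* rounded < size catches overflow in the rounding above. */
    if (size == 0 || rounded < size || rounded > ARENA_SIZE - arena_used)
        return NULL;

    void *ptr = arena + arena_used;
    arena_used += rounded;
    return ptr;
}

/* A bump allocator never reuses memory, so freeing is a no-op. */
void je_free(void *ptr)
{
    (void)ptr;
}
```

A program calling je_malloc() and je_free() instead of malloc() and free() would go unnoticed by the default wrappers, which is the situation the two options above address.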
MALT can also use binary instrumentation mode through pintool (http://software.intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool)
Please check the usage notes in the src/pintool directory.
MALT can also use binary instrumentation with MAQAO (http://maqao.org/).
Please check the usage notes in the src/maqao directory.
In some cases you might get really big files. I got up to 600 MB on one code. The issue is that you cannot load this kind of file into nodejs due to limits on the strings used to read the file in the JSON parser functions.
The first alternative is to generate a more compact file by enabling the stackTree output option, which stores the stacks as a tree in the file. It is more efficient in terms of space (in the 600 MB case it lowers the file to 200 MB) but needs an on-the-fly conversion by the server to get back the supported format.
malt -o "output:stackTree=true" ./PROGRAM
Currently you can still find cases where you cannot load the file into nodejs; I'm working on a workaround. Please send me your files if it happens. Compressed with gzip, they should be less than 30-40 MB.
As of 25/07/2024, the JSON files are read and processed using streams, which bypasses the internal hard limit of NodeJS requiring strings to be < 512 MB. However, keep in mind that such big files make the web interface a bit less responsive. This was tested with files up to 1 GB.
Due to another limitation, you may encounter the following error: FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory. In that case, raise the NodeJS heap size limit with NODE_OPTIONS="--max-old-space-size=<SIZE>", with SIZE in megabytes.
You can find packaging instructions inside packaging/README.md. For quicker use you can use the dev/packagin.sh script, which does the steps automatically.
If you install MALT in a directory other than /usr and /usr/local, e.g. in your home, you might want to set some environment variables to integrate it into your shell :
export PATH=${PREFIX}/bin:$PATH
export MANPATH=${PREFIX}/share/man:$MANPATH
LD_LIBRARY_PATH is not required, as the malt command uses the full path to access the internal .so file.
If you search for similar tools on the web, you might find:
- Heaptrack: A Heap Memory Profiler for Linux: KDE/heaptrack: https://github.com/KDE/heaptrack
- Memoro: A detailed Heap Profiler : https://epfl-vlsc.github.io/memoro/
- Memtrail: https://github.com/jrfonseca/memtrail
- MTuner: https://milostosic.github.io/MTuner/
- Profiler provided with google allocator: Google Heap Profiler
- Valgrind memcheck
- Valgrind massif: Valgrind massif with Massif visualizer
- Dr. Memory
- Commercial tool, Parasoft Insure++
- Commercial tool, Unicom PurifyPlus (previously IBM)
- Tau is more a communication profiling tool for HPC apps, but it offers a memory module
- Similar approach to MALT for the backend: IgProf
- A debug malloc library: Dmalloc
- Profiling and leak detection: MemProf
- Malloc count
- mpatrol
- Tracing tool for parallel programs: EZTrace
- Find Obsolete Memory: FOM Tools
- Memray: A memory profiler supporting C & Python. https://bloomberg.github.io/memray/
If I missed newer ones, you can also look at this repository, whose author keeps an up-to-date list: https://github.com/MattPD/cpplinks/blob/master/performance.tools.md
If you are looking for parallel memory allocators, you can find these on the net:
- Jemalloc (facebook, firefox)
- TCMalloc (google)
- Hoard
- Lockless allocator
- MPC memory allocator (look into mpcframework/MPC_Allocator)
MALT is distributed under CeCILL-C license (LGPL compatible).
You can join the google group to exchange ideas and ask questions : https://groups.google.com/forum/#!forum/memtt-malt.