Caplin Platform Diagnostics is a collection of Bash scripts that collect diagnostics on a running or crashed Caplin Platform component.
The scripts automate a series of common Linux diagnostic commands that Caplin Support ask customers to run when raising a support request (see Send diagnostic information to Caplin Support on the Caplin website).
Caplin Platform Diagnostics is made available under an MIT licence.
The Caplin Platform Diagnostics scripts have the following requirements:
- CentOS/RHEL 6 or 7
- GNU Debugger:
$ sudo yum install gdb
- Red Hat OpenJDK 8 (full JDK, not just JRE):
$ sudo yum install java-1.8.0-openjdk-devel
Copy (or symlink) the two diagnostic scripts to a directory on your executable path, for example ~/bin/ or /usr/local/bin/.
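A minimal sketch of the symlink approach. The function name and the example paths are assumptions for illustration only; substitute the directory where you saved the scripts and a directory that is on your PATH.

```shell
#!/usr/bin/env bash
# Sketch: symlink the two diagnostic scripts into a directory on the PATH.
# src_dir should be an absolute path so the links resolve from anywhere.
link_diagnostic_scripts() {
  local src_dir=$1 dest_dir=$2 script
  mkdir -p "$dest_dir" || return 1
  for script in caplin-process-diagnostics.sh caplin-corefile-diagnostics.sh; do
    if [ ! -f "$src_dir/$script" ]; then
      echo "missing: $src_dir/$script" >&2
      return 1
    fi
    chmod +x "$src_dir/$script"                      # make sure it is executable
    ln -sf "$src_dir/$script" "$dest_dir/$script"    # replace any stale link
  done
}

# Example (hypothetical paths):
# link_diagnostic_scripts "$HOME/caplin-diagnostics" "$HOME/bin"
```

Symlinking rather than copying means an updated download of the scripts takes effect without a second install step.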
To run diagnostics on a process, follow the steps below:
- Install dependencies, if not already installed:
  $ sudo yum install gdb java-1.8.0-openjdk-devel
- Run the script below under the same user as the target process (run time 20 seconds):
  $ caplin-process-diagnostics.sh <pid>
  For full details and options, see Running diagnostics on a process.
- Upload the generated tar file and any log files requested by Caplin Support to Caplin's File Upload Facility.
To run diagnostics on a core-file, follow the steps below:
- Install dependencies, if not already installed:
  $ sudo yum install gdb
- Run the script below:
  $ caplin-corefile-diagnostics.sh <corefile>
  For full details and options, see Running diagnostics on a core file.
- Upload the generated tar file and any log files requested by Caplin Support to Caplin's File Upload Facility.
The caplin-corefile-diagnostics.sh script collates diagnostics for a core file dumped by a crashed Caplin Platform component.
The diagnostics collated include all the files Caplin Support require to analyse the core file: the component binary, the core file, and all shared libraries referenced in the core file. For the full list of information collated, see Information collated, below.
Run this script on the crashed component's host, or, if this is not possible, on an identically configured host (same operating system and Java versions).
After running the script, log in to Caplin's secure File Upload Facility and upload the following files:
- Tar archive generated by the caplin-corefile-diagnostics.sh script
- Java virtual machine log and error files (if available):
  - HotSpot JVM error file (hs_err_pid<process-id>.log)
  - Heap dump file (var/java_pid<process-id>.hprof)
  - Garbage collection log (var/gc.log)
- Caplin log files for the period of the incident
- Caplin configuration files
This script has the following requirements:
- CentOS/RHEL 6 or 7
- GNU Debugger (gdb RPM package)
- Write permission to the current directory
- Run on the crashed component's host or, if this is not possible, on an identically configured host (same operating system and Java versions)
Syntax: caplin-corefile-diagnostics.sh core [binary]
- core: path to the core file dumped by the crashed process.
- binary: path to the crashed process's binary. Defaults to the path of the binary recorded in the core file.
Run as: any user
Runtime: < 1 minute
Output: diagnostics-<hostname>-<binary>-<core-file>-<timestamp>.tar.gz
This script collates the following information:
Diagnostic | Dependencies | User |
---|---|---|
/etc/os-release | - | - |
/etc/redhat-release | - | - |
/etc/security/limits.conf | - | - |
/etc/security/limits.d/* | - | - |
ulimit -aS output | - | - |
ulimit -aH output | - | - |
uname -a output | - | - |
df output for binary's 'var' directory | - | - |
Caplin dfw versions output | Binary is in a DFW | - |
Core file | - | - |
Core file backtrace | gdb RPM package | - |
Core file libraries | gdb RPM package | - |
Component binary | - | - |
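For orientation, the two gdb-dependent rows can be approximated by hand with gdb in batch mode. This is a sketch under the assumption that the script does something similar; the function names are hypothetical and the script's actual commands may differ.

```shell
#!/usr/bin/env bash
# Sketch: extract thread backtraces and the shared-library list from a core
# file. Not taken from caplin-corefile-diagnostics.sh; an approximation only.

# Run a single gdb command against a binary and its core file.
gdb_on_core() {
  local gdb_cmd=$1 binary=$2 core=$3
  [ -r "$core" ] || { echo "cannot read core file: $core" >&2; return 1; }
  command -v gdb >/dev/null 2>&1 || { echo "gdb not installed" >&2; return 127; }
  gdb --batch -ex "$gdb_cmd" "$binary" "$core"
}

# Backtrace of every thread recorded in the core file.
core_backtrace() {
  gdb_on_core "thread apply all bt" "$1" "$2"
}

# Shared libraries that were loaded when the process crashed.
core_libraries() {
  gdb_on_core "info sharedlibrary" "$1" "$2"
}

# Example, using the paths from the example run in this document:
# core_backtrace ~/dfw1/servers/Liberator/bin/rttpd ~/dfw1/servers/Liberator/core.4972
```

The library list matters because a backtrace can only be symbolised against the exact library builds the process had loaded, which is why the script copies them into the archive.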
The example below collates diagnostics for a core file, core.4972, dumped by a Liberator binary, rttpd:
$ ./caplin-corefile-diagnostics.sh ~/dfw1/servers/Liberator/core.4972
Caplin Core-file Diagnostics
============================
Host: server1
Core: /home/caplin/dfw1/servers/Liberator/core.4972
Binary: /home/caplin/dfw1/servers/Liberator/bin/rttpd
GDB installed: 1
Script temp dir: diagnostics-server1-rttpd-core.4972-20190916104354
Recording /etc/os-release
Recording /etc/redhat-release
Recording 'uname -a' output
Recording 'df' output for /home/caplin/dfw1/servers/Liberator/var
Recording 'dfw versions' output
Getting thread backtraces from core.4972
Getting list of libraries referenced by core.4972
Copying libraries referenced by core.4972
DONE
Files collected:
core.4972
core.4972.backtrace.out
core.4972.libs.tar
dfw-versions.out
diagnostics.log
libs-list.out
libs-list.txt
os-release
redhat-release
rttpd
uname.out
Archiving files to diagnostics-server1-rttpd-core.4972-20190916104354.tar.gz
Please login to https://www.caplin.com/account/uploads
and upload the archive to Caplin Support.
The caplin-process-diagnostics.sh script collates diagnostics for a process without terminating the process.
The script's run time is 20 seconds for the default set of diagnostics. Optional diagnostics take longer, and their run time varies. For example, the run time for the optional GDB core dump (--gcore) depends on the size of the target process in memory and on the host's disk I/O and CPU performance.
For the full list of information collated, see Information collated, below.
After running the script, log in to Caplin's secure File Upload Facility and upload the following files:
- Tar archive generated by the caplin-process-diagnostics.sh script
- Java virtual machine log files (if available):
  - Garbage collection log (var/gc.log)
- Caplin log files for the period of the incident
- Caplin configuration files
The main dependency is the GNU Debugger (gdb package). This is required for generating stack traces and a core dump.
If any requirements are missing when you run the script, the script lists the missing dependencies and asks if you wish to continue. If you choose to continue, the script skips any diagnostics with missing dependencies.
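The check-then-confirm behaviour described above can be sketched as follows. The function names are hypothetical, and this is an illustration of the pattern rather than the script's own code.

```shell
#!/usr/bin/env bash
# Sketch: report commands that are not on the PATH, then ask whether to go on.

# Print each named command that cannot be found, one per line.
missing_commands() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Return 0 if nothing is missing or the user answers y/Y; non-zero otherwise.
confirm_if_missing() {
  local missing answer
  missing=$(missing_commands "$@")
  if [ -z "$missing" ]; then
    return 0
  fi
  printf 'Missing dependencies:\n%s\n' "$missing" >&2
  read -r -p "Continue without them? [y/N] " answer
  [ "$answer" = "y" ] || [ "$answer" = "Y" ]
}

# Example:
# confirm_if_missing gdb jcmd jstat || exit 1
```

Running `missing_commands gdb jcmd jstat strace` before starting a support session is a quick way to see which diagnostics would be skipped.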
All diagnostics:
GDB core dump and backtrace diagnostics:
- gdb RPM package
- Free disk space greater than the process's virtual memory
- CentOS/RHEL 7: SELinux boolean deny_ptrace set to off (if SELinux is enabled and enforcing)
- CentOS/RHEL 7: Yama kernel module sysctl setting kernel.yama.ptrace_scope set to 0, 1, or 2
JVM diagnostics:
- java-1.8.0-openjdk-devel RPM package. This package installs the full JDK, which includes the jcmd diagnostic tool.
Optional strace diagnostic:
- strace RPM package. Only required if requested by Caplin Support.
Syntax: caplin-process-diagnostics.sh [options] pid
- pid: process identifier of the running component
- Options:
  - --gcore: include the optional GDB core file dump diagnostic. Only include this diagnostic when requested by Caplin Support.
  - --jvm-heap: include the optional JVM heap dump diagnostic. Halts the JVM temporarily for the duration of the diagnostic. Only include this diagnostic when requested by Caplin Support.
  - --jvm-class-histogram: include the optional JVM class histogram diagnostic. Halts the JVM temporarily for the duration of the diagnostic. Only include this diagnostic when requested by Caplin Support.
  - --strace: include the optional strace diagnostic. Only include this diagnostic when requested by Caplin Support.
  - --help: display help and exit
  - --version: display version and exit
Run as:
- CentOS 6: the process's user
- CentOS 7:
  - kernel.yama.ptrace_scope=0: the process's user
  - kernel.yama.ptrace_scope=1: root (required for core dump, thread backtraces, and strace)
  - kernel.yama.ptrace_scope=2: root (required for core dump, thread backtraces, and strace)
  - kernel.yama.ptrace_scope=3: the process's user (core dump, thread backtraces, and strace prohibited for all users)
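The mapping above can be checked on a host before running the script. The sketch below uses hypothetical function names and assumes that a missing Yama sysctl file (for example on CentOS 6) is equivalent to scope 0.

```shell
#!/usr/bin/env bash
# Sketch: decide which user to run the diagnostics as, from the Yama setting.

# Map a kernel.yama.ptrace_scope value to the user listed in this document.
run_as_for_ptrace_scope() {
  case $1 in
    0)   echo "process user" ;;
    1|2) echo "root" ;;
    3)   echo "process user (ptrace diagnostics unavailable)" ;;
    *)   echo "unknown ptrace_scope: $1" >&2; return 1 ;;
  esac
}

# Read the live value. Assumption: if the file is absent, Yama is not loaded
# and the scope is treated as 0.
current_ptrace_scope() {
  cat /proc/sys/kernel/yama/ptrace_scope 2>/dev/null || echo 0
}

# Example:
# run_as_for_ptrace_scope "$(current_ptrace_scope)"
```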
Runtime: 20s for the default set of diagnostics
Output: diagnostics-<hostname>-<binary>-<pid>-<timestamp>.tar.gz
Default diagnostics:
Diagnostic | Dependencies | User |
---|---|---|
/etc/os-release | - | - |
/etc/redhat-release | - | - |
uname -a output | - | - |
/proc/sys/kernel/core_pattern | - | - |
/proc/sys/kernel/core_uses_pid | - | - |
/proc/<pid>/limits | - | - |
/etc/security/limits.conf | - | - |
/etc/security/limits.d/* | - | - |
top output for the system (5 seconds) | - | - |
top output for the process (5 seconds) | - | - |
df output for the process's <working-dir>/var directory | - | - |
free output | - | - |
vmstat output (5 seconds) | - | - |
Caplin dfw info output | Process binary is in a DFW | - |
Caplin dfw status output | Process binary is in a DFW | - |
Caplin dfw versions output | Process binary is in a DFW | - |
JVM jcmd <pid> Thread.print output | jcmd JDK command | Note 1 |
JVM jcmd <pid> GC.heap_info output | jcmd JDK command | Note 1 |
JVM jcmd <pid> VM.system_properties output | jcmd JDK command | Note 1 |
JVM jcmd <pid> VM.flags output | jcmd JDK command | Note 1 |
JVM jcmd <pid> PerfCounter.print output | jcmd JDK command | Note 1 |
JVM jstat -gc <pid> output | jcmd JDK command | Note 1 |
JVM jstat -gcutil <pid> output | jcmd JDK command | Note 1 |
GDB thread backtrace | gdb RPM package | Note 2 |
Process binary | - | - |
Optional diagnostics (only enable if requested by Caplin Support):
Diagnostic | Dependencies | User |
---|---|---|
GDB core-file dump, backtrace, and libraries | gdb RPM package | Note 2 |
JVM jcmd <pid> GC.heap_dump output | jcmd JDK command | Note 1 |
JVM jcmd <pid> GC.class_histogram output | jcmd JDK command | Note 1 |
strace output (system-call logging) | strace RPM package | Note 2 |
Note 1: JVM diagnostics must be run as the process's user. If you run the script as root, then the script uses sudo to run the JVM diagnostics as the process's user.
Note 2: GDB thread backtraces, GDB core dump, and strace can be run as the process's user, unless prohibited by the Yama kernel module (introduced in CentOS/RHEL 7). The script will advise you if root privileges are required to run these diagnostics.
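The behaviour in Note 1 can be reproduced when running a JVM diagnostic by hand. The sketch below is an assumption about how such a wrapper might look, not the script's implementation; `run_as_owner` is a hypothetical name, and the owner lookup relies on the Linux /proc filesystem.

```shell
#!/usr/bin/env bash
# Sketch: run a command as the owner of a target process.
# Usage: run_as_owner <pid> <command> [args...]
run_as_owner() {
  local pid=$1 owner_uid
  shift
  # The owner of /proc/<pid> is the user the process runs as.
  owner_uid=$(stat -c %u "/proc/$pid" 2>/dev/null) || {
    echo "no such process: $pid" >&2
    return 1
  }
  if [ "$(id -u)" -eq "$owner_uid" ]; then
    "$@"                            # already the right user
  else
    sudo -u "#$owner_uid" "$@"      # sudo accepts a numeric uid as #<uid>
  fi
}

# Example: thread dump of process 4972 (requires the JDK's jcmd)
# run_as_owner 4972 jcmd 4972 Thread.print
```

Using the numeric uid avoids problems with user names that the host cannot resolve.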
The default set of diagnostics includes only one diagnostic that directly impacts the performance of the target process:
- GDB thread backtrace: the target process is halted temporarily for less than 1 second for each backtrace.
The optional diagnostics have a potentially greater performance impact and should only be enabled when requested by Caplin Support:
- GDB core dump: the target process is halted temporarily for the time it takes the gcore command to write the process's virtual memory to a core file. The execution time is determined by the size of the process's virtual memory (ps -o vsz= -q <process-id>) and the host's CPU and I/O performance.
- strace: slows performance of the target process for the duration of the diagnostic (40 seconds).
- JVM heap dump: halts the JVM temporarily for the duration of the diagnostic.
- JVM class histogram: halts the JVM temporarily for the duration of the diagnostic.
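Before agreeing to a GDB core dump, it can help to compare the process's virtual memory size with the free space in the directory where the archive will be written. The sketch below uses hypothetical function names and `ps -p` (the portable form of process selection) rather than `-q`.

```shell
#!/usr/bin/env bash
# Sketch: check that there is room for a core dump of a running process.

# Virtual memory size of a process in KiB, as reported by ps.
process_vsz_kb() {
  ps -o vsz= -p "$1" | tr -d ' '
}

# Space available to the current user in a directory, in KiB.
free_kb() {
  # -P forces one line per filesystem; column 4 is "Available".
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeeds only if free space is strictly greater than the process size.
has_room_for_core() {
  local vsz_kb=$1 avail_kb=$2
  [ "$avail_kb" -gt "$vsz_kb" ]
}

# Example for process 4972, writing to the current directory:
# has_room_for_core "$(process_vsz_kb 4972)" "$(free_kb .)" || echo "not enough disk space"
```

The comparison is conservative: a core file is usually smaller than the full virtual size, but the requirement stated in this document is free space greater than the process's virtual memory.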
The example below collates diagnostics for a Liberator running as process 4972:
$ ./caplin-process-diagnostics.sh 4972
Caplin Process Diagnostics
==========================
Process ID: 4972
Process binary: /home/caplin/dfw1/kits/Liberator/Liberator-7.1.9-313149/bin/rttpd
Script user: same user as process 4972
Script temp dir: ./diagnostics-server1-rttpd-4972-20190916102608
Recording /etc/redhat-release
Recording 'uname -a' output
Recording /proc/sys/kernel/core_pattern
Recording /proc/sys/kernel/core_uses_pid
Recording /proc/4972/limits
Recording 'top' output (5 seconds)
Recording 'top' output for process 4972 (5 seconds)
Recording process 4972 limits (/proc/4972/limits)
Recording 'df' output for /home/caplin/dfw1/servers/Liberator/var
Recording 'free' output
Recording 'vmstat' output (5 seconds)
Recording 'dfw info' output
Recording 'dfw status' output
Recording 'dfw versions' output
1/3: Dumping GDB thread backtraces for process 4972
Sleeping for 1 second...
2/3: Dumping GDB thread backtraces for process 4972
Sleeping for 1 second...
3/3: Dumping GDB thread backtraces for process 4972
1/3: Dumping JVM stack trace for process 4972
Sleeping for 1 second...
2/3: Dumping JVM stack trace for process 4972
Sleeping for 1 second...
3/3: Dumping JVM stack trace for process 4972
Recording JVM heap info
Recording JVM properties
Recording JVM flags
Recording JVM performance counters
Recording JVM jstat GC output
DONE
Files collected:
df.out
dfw-info.out
dfw-status.out
dfw-versions.out
diagnostics.log
free.out
jvm-flags
jvm-heapinfo
jvm-jstat-gc
jvm-jstat-gcutil
jvm-perfcounter
jvm-props
jvm-stacktrace-20190916102810.out
jvm-stacktrace-20190916102811.out
jvm-stacktrace-20190916102812.out
proc-4972-limits
proc-sys-kernel-core_pattern
proc-sys-kernel-core_uses_pid
redhat-release
rttpd-backtrace-20190916102806.out
rttpd-backtrace-20190916102808.out
rttpd-backtrace-20190916102809.out
top-4972.out
top.out
uname.out
vmstat.out
Archiving files to diagnostics-server1-rttpd-4972-20190916102608.tar.gz
Please login to https://www.caplin.com/account/uploads
and upload the archive to Caplin Support.
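Before uploading, you may want to confirm that an expected file made it into the archive. This is a minimal sketch with a hypothetical function name; it matches on the file's base name because the directory layout inside the archive is not specified here.

```shell
#!/usr/bin/env bash
# Sketch: succeed if a gzip-compressed tar archive contains a file with the
# given base name, in any directory.
archive_contains() {
  local archive=$1 name=$2
  tar -tzf "$archive" 2>/dev/null |
    awk -F/ -v n="$name" '$NF == n { found = 1 } END { exit !found }'
}

# Example, using the archive name from the example run above:
# archive_contains diagnostics-server1-rttpd-4972-20190916102608.tar.gz vmstat.out \
#   && echo "vmstat output present"
```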