Demonstrations of exitsnoop.

This Linux tool traces all process terminations and the reason for each. It
 - is implemented using BPF, which requires CAP_SYS_ADMIN and
   should therefore be invoked with sudo
 - traces the sched_process_exit tracepoint in kernel/exit.c
 - includes processes by root and all users
 - includes processes in containers
 - includes processes that become zombies

The following example shows the termination of the 'sleep' and 'bash' commands
when run in a loop that is interrupted with Ctrl-C from the terminal:

# ./exitsnoop.py > exitlog &
[1] 18997
# for((i=65;i<100;i+=5)); do bash -c "sleep 1.$i;exit $i"; done
^C
# fg
./exitsnoop.py > exitlog
^C
# cat exitlog
PCOMM            PID    PPID   TID    AGE(s)  EXIT_CODE
sleep            19004  19003  19004  1.65    0
bash             19003  17656  19003  1.65    code 65
sleep            19007  19006  19007  1.70    0
bash             19006  17656  19006  1.70    code 70
sleep            19010  19009  19010  1.75    0
bash             19009  17656  19009  1.75    code 75
sleep            19014  19013  19014  0.23    signal 2 (INT)
bash             19013  17656  19013  0.23    signal 2 (INT)

#

The output shows the process/command name (PCOMM), the PID,
the process that will be notified (PPID), the thread (TID), the AGE
of the process with hundredth-of-a-second resolution, and the reason for
the process exit (EXIT_CODE).

A -t option can be used to include a timestamp column; it shows local time
by default. The --utc option shows the time in UTC. The --label
option adds a column indicating the tool that generated the output,
'exit' by default. If other tools follow this format, their outputs
can be merged into a single trace with a simple lexical sort,
increasing in time order, with each line labeled to indicate the event,
e.g. 'exec', 'open', 'exit', etc. Time is displayed with millisecond
resolution. The -x option will show only non-zero exits and fatal
signals, which excludes processes that exit with code 0:

# ./exitsnoop.py -t --utc -x --label= > exitlog &
[1] 18289
# for((i=65;i<100;i+=5)); do bash -c "sleep 1.$i;exit $i"; done
^C
# fg
./exitsnoop.py -t --utc -x --label= > exitlog
^C
# cat exitlog
TIME-UTC     LABEL PCOMM            PID    PPID   TID    AGE(s)  EXIT_CODE
13:20:22.997 exit  bash             18300  17656  18300  1.65    code 65
13:20:24.701 exit  bash             18303  17656  18303  1.70    code 70
13:20:26.456 exit  bash             18306  17656  18306  1.75    code 75
13:20:28.260 exit  bash             18310  17656  18310  1.80    code 80
13:20:30.113 exit  bash             18313  17656  18313  1.85    code 85
13:20:31.495 exit  sleep            18318  18317  18318  1.38    signal 2 (INT)
13:20:31.495 exit  bash             18317  17656  18317  1.38    signal 2 (INT)
#

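The merge-by-sort idea described above can be sketched in Python. This is
only an illustration and not part of exitsnoop: merge_traces is a
hypothetical helper, and the 'exec' lines are made-up sample data standing in
for a second tool that follows the same TIME/LABEL layout. The sketch assumes
header rows are dropped and that all timestamps fall within the same day,
since a time-of-day prefix does not sort correctly across midnight.

```python
def merge_traces(*traces):
    """Merge timestamped, labeled trace lines with a plain lexical sort."""
    lines = []
    for trace in traces:
        for line in trace.splitlines():
            # Skip blank lines and the header row (it starts with "TIME").
            if line.strip() and not line.startswith("TIME"):
                lines.append(line)
    # The fixed-width HH:MM:SS.mmm prefix makes lexical order equal time order.
    return sorted(lines)


# Two lines taken from the exitsnoop output shown above.
exit_trace = """\
TIME-UTC     LABEL PCOMM  PID    PPID   TID    AGE(s)  EXIT_CODE
13:20:22.997 exit  bash   18300  17656  18300  1.65    code 65
13:20:24.701 exit  bash   18303  17656  18303  1.70    code 70
"""

# Hypothetical output of another tool; these lines are invented for the example.
exec_trace = """\
13:20:21.347 exec  bash   18300
13:20:23.001 exec  bash   18303
"""

merged = merge_traces(exit_trace, exec_trace)
for line in merged:
    print(line)
```

The same result could be had from the shell with sort(1) on the concatenated
logs; the Python form just makes the header handling explicit.
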
USAGE message:

# ./exitsnoop.py -h
usage: exitsnoop.py [-h] [-t] [--utc] [-p PID] [--label LABEL] [-x] [--per-thread]

Trace all process termination (exit, fatal signal)

optional arguments:
  -h, --help         show this help message and exit
  -t, --timestamp    include timestamp (local time default)
  --utc              include timestamp in UTC (-t implied)
  -p PID, --pid PID  trace this PID only
  --label LABEL      label each line
  -x, --failed       trace only fails, exclude exit(0)
  --per-thread       trace per thread termination

examples:
    exitsnoop                # trace all process termination
    exitsnoop -x             # trace only fails, exclude exit(0)
    exitsnoop -t             # include timestamps (local time)
    exitsnoop --utc          # include timestamps (UTC)
    exitsnoop -p 181         # only trace PID 181
    exitsnoop --label=exit   # label each output line with 'exit'
    exitsnoop --per-thread   # trace per thread termination

Exit status:

    0 EX_OK        Success
    2              argparse error
   70 EX_SOFTWARE  syntax error detected by compiler, or
                   verifier error from kernel
   77 EX_NOPERM    Need sudo (CAP_SYS_ADMIN) for BPF() system call

About process termination in Linux
----------------------------------

A program/process on Linux terminates normally
 - by explicitly invoking the exit( int ) system call
 - in C/C++ by returning an int from main(),
   ...which is then used as the value for exit()
 - by reaching the end of main() without a return
   ...which is equivalent to return 0 (C99 and C++)
 Notes:
 - Linux keeps only the least significant eight bits of the exit value
 - an exit value of 0 means success
 - an exit value of 1-255 means an error

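The eight-bit truncation noted above can be observed from Python with a
short experiment. This is a sketch, assuming a Linux host; exit_status_of is
a hypothetical helper name, and the child is simply another Python
interpreter asked to exit with the given value.

```python
import subprocess
import sys


def exit_status_of(value):
    """Run a child that exits with `value`; return the status the parent sees."""
    child = subprocess.run(
        [sys.executable, "-c", "import sys; sys.exit(%d)" % value]
    )
    return child.returncode


if __name__ == "__main__":
    # 256 and 321 do not fit in eight bits, so only value & 0xFF survives.
    for value in (0, 65, 256, 321):
        print(value, "->", exit_status_of(value))
```

A value such as 256 is therefore reported as 0, which a parent would read as
success.
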
A process terminates abnormally if it
 - receives a signal which is not ignored or blocked and has no handler
   ... the default action is to terminate with optional core dump
 - is selected by the kernel's "Out of Memory Killer",
   equivalent to being sent SIGKILL (9), which cannot be ignored or blocked
 Notes:
 - any signal can be sent asynchronously via the kill() system call
 - synchronous signals are the result of the CPU detecting
   a fault or trap during execution of the program; a kernel handler
   is dispatched which determines the cause and the corresponding
   signal. Examples are
   - attempting to fetch data or instructions at invalid or
     privileged addresses,
   - attempting to divide by zero, unmasked floating point exceptions
   - hitting a breakpoint

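Abnormal termination by an asynchronously sent signal can be reproduced in a
few lines of Python. This is a sketch, assuming a Linux host; kill_child is a
hypothetical helper name. Python's subprocess module reports a child killed
by signal N as a return code of -N.

```python
import signal
import subprocess
import sys


def kill_child(sig=signal.SIGKILL):
    """Start a sleeping child, send it `sig`, and return its return code."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    # Uses the kill() system call; SIGKILL cannot be ignored or blocked.
    child.send_signal(sig)
    return child.wait()


if __name__ == "__main__":
    print("return code:", kill_child())
```

SIGKILL is used because its effect does not depend on any signal disposition
the child may have inherited from its parent.
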
Linux keeps process termination information in 'exit_code', an int
within struct 'task_struct' defined in <linux/sched.h>
 - if the process terminated normally:
   - the exit value is in bits 15:8
   - the least significant 8 bits of exit_code are zero (bits 7:0)
 - if the process terminated abnormally:
   - the signal number (>= 1) is in bits 6:0
   - bit 7 indicates a 'core dump' action; whether a core dump was
     actually done depends on ulimit.

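The bit layout above can be decoded with a few mask and shift operations.
The following Python sketch is not the code from exitsnoop.py;
decode_exit_code is a hypothetical helper, and the ", core dumped" suffix is
an assumed wording. The other strings mirror the EXIT_CODE column shown in
the examples earlier.

```python
import signal


def decode_exit_code(exit_code):
    """Turn a task_struct-style exit_code into a readable reason."""
    sig = exit_code & 0x7F         # bits 6:0, signal number (0 if none)
    core = bool(exit_code & 0x80)  # bit 7, core dump action
    if sig:
        try:
            name = signal.Signals(sig).name[3:]  # "SIGINT" -> "INT"
        except ValueError:
            name = "?"  # number with no symbolic name on this platform
        text = "signal %d (%s)" % (sig, name)
        # The suffix below is an assumed wording, not taken from the tool.
        return text + ", core dumped" if core else text
    value = (exit_code >> 8) & 0xFF  # bits 15:8, exit value
    return "code %d" % value if value else "0"


if __name__ == "__main__":
    for raw in (0, 65 << 8, 2, 11 | 0x80):
        print("0x%04x -> %s" % (raw, decode_exit_code(raw)))
```

The checks are ordered signal first because a normal exit always leaves
bits 7:0 zero, so a non-zero low byte is enough to identify a fatal signal.
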
Success is indicated with an exit value of zero.
The meaning of a non-zero exit value depends on the program.
Some programs document their exit values and their meaning.
This script uses exit values as defined in <include/sysexits.h>.

References:

  https://github.com/torvalds/linux/blob/master/kernel/exit.c
  https://github.com/torvalds/linux/blob/master/arch/x86/include/uapi/asm/signal.h
  https://code.woboq.org/userspace/glibc/misc/sysexits.h.html
