blob: f1e953d3267659a6627e59cd3c4a5571c4abc514 [file] [log] [blame]
Sasha Goldshtein38847f02016-02-22 02:19:24 -08001Demonstrations of trace.
2
3
4trace probes functions you specify and displays trace messages if a particular
5condition is met. You can control the message format to display function
6arguments and return values.
7
8For example, suppose you want to trace all commands being exec'd across the
9system:
10
Sasha Goldshtein8acd0152016-02-22 02:25:03 -080011# trace 'sys_execve "%s", arg1'
Sasha Goldshtein38847f02016-02-22 02:19:24 -080012TIME PID COMM FUNC -
1305:11:51 4402 bash sys_execve /usr/bin/man
1405:11:51 4411 man sys_execve /usr/local/bin/less
1505:11:51 4411 man sys_execve /usr/bin/less
1605:11:51 4410 man sys_execve /usr/local/bin/nroff
1705:11:51 4410 man sys_execve /usr/bin/nroff
1805:11:51 4409 man sys_execve /usr/local/bin/tbl
1905:11:51 4409 man sys_execve /usr/bin/tbl
2005:11:51 4408 man sys_execve /usr/local/bin/preconv
2105:11:51 4408 man sys_execve /usr/bin/preconv
2205:11:51 4415 nroff sys_execve /usr/bin/locale
2305:11:51 4416 nroff sys_execve /usr/bin/groff
2405:11:51 4418 groff sys_execve /usr/bin/grotty
2505:11:51 4417 groff sys_execve /usr/bin/troff
26^C
27
28The ::sys_execve syntax specifies that you want an entry probe (which is the
29default), in a kernel function (which is the default) called sys_execve. Next,
30the format string to print is simply "%s", which prints a string. Finally, the
31value to print is the first argument to the sys_execve function, which happens
32to be the command that is exec'd. The above trace was generated by executing
33"man ls" in a separate shell. As you see, man executes a number of additional
34programs to finally display the man page.
35
36Next, suppose you are looking for large reads across the system. Let's trace
37the read system call and inspect the third argument, which is the number of
38bytes to be read:
39
Sasha Goldshtein8acd0152016-02-22 02:25:03 -080040# trace 'sys_read (arg3 > 20000) "read %d bytes", arg3'
Sasha Goldshtein38847f02016-02-22 02:19:24 -080041TIME PID COMM FUNC -
4205:18:23 4490 dd sys_read read 1048576 bytes
4305:18:23 4490 dd sys_read read 1048576 bytes
4405:18:23 4490 dd sys_read read 1048576 bytes
4505:18:23 4490 dd sys_read read 1048576 bytes
46^C
47
48During the trace, I executed "dd if=/dev/zero of=/dev/null bs=1M count=4".
49The individual reads are visible, with the custom format message printed for
50each read. The parenthesized expression "(arg3 > 20000)" is a filter that is
51evaluated for each invocation of the probe before printing anything.
52
53You can also trace user functions. For example, let's simulate the bashreadline
54script, which attaches to the readline function in bash and prints its return
55value, effectively snooping all bash shell input across the system:
56
57# trace 'r:bash:readline "%s", retval'
58TIME PID COMM FUNC -
5905:24:50 2740 bash readline echo hi!
6005:24:53 2740 bash readline man ls
61^C
62
63The special retval keywords stands for the function's return value, and can
64be used only in a retprobe, specified by the 'r' prefix. The next component
65of the probe is the library that contains the desired function. It's OK to
66specify executables too, as long as they can be found in the PATH. Or, you
67can specify the full path to the executable (e.g. "/usr/bin/bash").
68
69Multiple probes can be combined on the same command line. For example, let's
70trace failed read and write calls on the libc level:
71
72# trace 'r:c:read ((int)retval < 0) "read failed: %d", retval' \
73 'r:c:write ((int)retval < 0) "write failed: %d", retval'
74TIME PID COMM FUNC -
7505:31:57 3388 bash write write failed: -1
7605:32:00 3388 bash write write failed: -1
77^C
78
79Note that the retval variable must be cast to int before comparing to zero.
80The reason is that the default type for argN and retval is an unsigned 64-bit
81integer, which can never be smaller than 0.
82
Sasha Goldshteinfd60d552016-03-01 12:15:34 -080083trace has also some basic support for kernel tracepoints. For example, let's
84trace the block:block_rq_complete tracepoint and print out the number of sectors
85transferred:
86
Sasha Goldshtein376ae5c2016-10-04 19:49:57 +030087# trace 't:block:block_rq_complete "sectors=%d", args->nr_sector'
Sasha Goldshteinfd60d552016-03-01 12:15:34 -080088TIME PID COMM FUNC -
8901:23:51 0 swapper/0 block_rq_complete sectors=8
9001:23:55 10017 kworker/u64: block_rq_complete sectors=1
9101:23:55 0 swapper/0 block_rq_complete sectors=8
92^C
93
Sasha Goldshtein376ae5c2016-10-04 19:49:57 +030094To discover the tracepoint structure format (which you can refer to as the "args"
95pointer variable), use the tplist tool. For example:
Sasha Goldshteinfd60d552016-03-01 12:15:34 -080096
97# tplist -v block:block_rq_complete
98block:block_rq_complete
99 dev_t dev;
100 sector_t sector;
101 unsigned int nr_sector;
102 int errors;
103 char rwbs[8];
104
Sasha Goldshtein376ae5c2016-10-04 19:49:57 +0300105This output tells you that you can use "args->dev", "args->sector", etc. in your
Sasha Goldshteinfd60d552016-03-01 12:15:34 -0800106predicate and trace arguments.
107
Sasha Goldshteinaccd4cf2016-10-11 07:56:13 -0700108
109More and more high-level libraries are instrumented with USDT probe support.
110These probes can be traced by trace just like kernel tracepoints. For example,
111trace new threads being created and their function name:
112
113# trace 'u:pthread:pthread_create "%U", arg3'
114TIME PID COMM FUNC -
11502:07:29 4051 contentions pthread_create primes_thread+0x0
11602:07:29 4051 contentions pthread_create primes_thread+0x0
11702:07:29 4051 contentions pthread_create primes_thread+0x0
11802:07:29 4051 contentions pthread_create primes_thread+0x0
119^C
120
121The "%U" format specifier tells trace to resolve arg3 as a user-space symbol,
122if possible. Similarly, use "%K" for kernel symbols.
123
124Ruby, Node, and OpenJDK are also instrumented with USDT. For example, let's
125trace Ruby methods being called (this requires a version of Ruby built with
126the --enable-dtrace configure flag):
127
128# trace 'u:ruby:method__entry "%s.%s", arg1, arg2' -p $(pidof irb)
129TIME PID COMM FUNC -
13012:08:43 18420 irb method__entry IRB::Context.verbose?
13112:08:43 18420 irb method__entry RubyLex.ungetc
13212:08:43 18420 irb method__entry RuxyLex.debug?
133^C
134
135In the previous invocation, arg1 and arg2 are the class name and method name
136for the Ruby method being invoked.
137
138
Sasha Goldshteinf4797b02016-10-17 01:44:56 -0700139Occasionally, it can be useful to filter specific strings. For example, you
140might be interested in open() calls that open a specific file:
141
142# trace 'p:c:open (STRCMP("test.txt", arg1)) "opening %s", arg1'
143TIME PID COMM FUNC -
14401:43:15 10938 cat open opening test.txt
14501:43:20 10939 cat open opening test.txt
146^C
147
148
Sasha Goldshtein38847f02016-02-22 02:19:24 -0800149As a final example, let's trace open syscalls for a specific process. By
150default, tracing is system-wide, but the -p switch overrides this:
151
Sasha Goldshtein8acd0152016-02-22 02:25:03 -0800152# trace -p 2740 'do_sys_open "%s", arg2'
Sasha Goldshtein38847f02016-02-22 02:19:24 -0800153TIME PID COMM FUNC -
15405:36:16 15872 ls do_sys_open /etc/ld.so.cache
15505:36:16 15872 ls do_sys_open /lib64/libselinux.so.1
15605:36:16 15872 ls do_sys_open /lib64/libcap.so.2
15705:36:16 15872 ls do_sys_open /lib64/libacl.so.1
15805:36:16 15872 ls do_sys_open /lib64/libc.so.6
15905:36:16 15872 ls do_sys_open /lib64/libpcre.so.1
16005:36:16 15872 ls do_sys_open /lib64/libdl.so.2
16105:36:16 15872 ls do_sys_open /lib64/libattr.so.1
16205:36:16 15872 ls do_sys_open /lib64/libpthread.so.0
16305:36:16 15872 ls do_sys_open /usr/lib/locale/locale-archive
16405:36:16 15872 ls do_sys_open /home/vagrant
165^C
166
167In this example, we traced the "ls ~" command as it was opening its shared
168libraries and then accessing the /home/vagrant directory listing.
169
170
171USAGE message:
172
173# trace -h
174usage: trace.py [-h] [-p PID] [-v] [-Z STRING_SIZE] [-S] [-M MAX_EVENTS] [-o]
175 probe [probe ...]
176
177Attach to functions and print trace messages.
178
179positional arguments:
180 probe probe specifier (see examples)
181
182optional arguments:
183 -h, --help show this help message and exit
184 -p PID, --pid PID id of the process to trace (optional)
185 -v, --verbose print resulting BPF program code before executing
186 -Z STRING_SIZE, --string-size STRING_SIZE
187 maximum size to read from strings
188 -S, --include-self do not filter trace's own pid from the trace
189 -M MAX_EVENTS, --max-events MAX_EVENTS
190 number of events to print before quitting
191 -o, --offset use relative time from first traced message
Sasha Goldshtein4725a722016-10-18 20:54:47 +0300192 -K, --kernel-stack output kernel stack trace
193 -U, --user-stack output user stack trace
194 -I header, --include header
195 additional header files to include in the BPF program
Sasha Goldshtein38847f02016-02-22 02:19:24 -0800196
197EXAMPLES:
198
Sasha Goldshtein8acd0152016-02-22 02:25:03 -0800199trace do_sys_open
Sasha Goldshtein38847f02016-02-22 02:19:24 -0800200 Trace the open syscall and print a default trace message when entered
Sasha Goldshtein8acd0152016-02-22 02:25:03 -0800201trace 'do_sys_open "%s", arg2'
Sasha Goldshtein38847f02016-02-22 02:19:24 -0800202 Trace the open syscall and print the filename being opened
Sasha Goldshtein8acd0152016-02-22 02:25:03 -0800203trace 'sys_read (arg3 > 20000) "read %d bytes", arg3'
Sasha Goldshtein38847f02016-02-22 02:19:24 -0800204 Trace the read syscall and print a message for reads >20000 bytes
205trace r::do_sys_return
206 Trace the return from the open syscall
Sasha Goldshtein8acd0152016-02-22 02:25:03 -0800207trace 'c:open (arg2 == 42) "%s %d", arg1, arg2'
Sasha Goldshtein38847f02016-02-22 02:19:24 -0800208 Trace the open() call from libc only if the flags (arg2) argument is 42
Sasha Goldshtein8acd0152016-02-22 02:25:03 -0800209trace 'c:malloc "size = %d", arg1'
Sasha Goldshtein38847f02016-02-22 02:19:24 -0800210 Trace malloc calls and print the size being allocated
Sasha Goldshtein8acd0152016-02-22 02:25:03 -0800211trace 'p:c:write (arg1 == 1) "writing %d bytes to STDOUT", arg3'
212 Trace the write() call from libc to monitor writes to STDOUT
213trace 'r::__kmalloc (retval == 0) "kmalloc failed!"
214 Trace returns from __kmalloc which returned a null pointer
Sasha Goldshteinf4797b02016-10-17 01:44:56 -0700215trace 'r:c:malloc (retval) "allocated = %x", retval
Sasha Goldshtein38847f02016-02-22 02:19:24 -0800216 Trace returns from malloc and print non-NULL allocated buffers
Sasha Goldshtein376ae5c2016-10-04 19:49:57 +0300217trace 't:block:block_rq_complete "sectors=%d", args->nr_sector'
Sasha Goldshteinfd60d552016-03-01 12:15:34 -0800218 Trace the block_rq_complete kernel tracepoint and print # of tx sectors
Sasha Goldshtein3e39a082016-03-24 08:39:47 -0700219trace 'u:pthread:pthread_create (arg4 != 0)'
220 Trace the USDT probe pthread_create when its 4th argument is non-zero
Sasha Goldshtein38847f02016-02-22 02:19:24 -0800221