blob: 0b41d7a59a5375bbe951eff95d336f94250404ee [file] [log] [blame]
Sasha Goldshtein38847f02016-02-22 02:19:24 -08001Demonstrations of trace.
2
3
4trace probes functions you specify and displays trace messages if a particular
ShelbyFrances69abacc2017-02-08 05:56:53 +03005condition is met. You can control the message format to display function
6arguments and return values.
Sasha Goldshtein38847f02016-02-22 02:19:24 -08007
8For example, suppose you want to trace all commands being exec'd across the
9system:
10
Sasha Goldshtein8acd0152016-02-22 02:25:03 -080011# trace 'sys_execve "%s", arg1'
Sasha Goldshtein49d50ba2016-12-19 10:17:38 +000012PID COMM FUNC -
134402 bash sys_execve /usr/bin/man
144411 man sys_execve /usr/local/bin/less
154411 man sys_execve /usr/bin/less
164410 man sys_execve /usr/local/bin/nroff
174410 man sys_execve /usr/bin/nroff
184409 man sys_execve /usr/local/bin/tbl
194409 man sys_execve /usr/bin/tbl
204408 man sys_execve /usr/local/bin/preconv
214408 man sys_execve /usr/bin/preconv
224415 nroff sys_execve /usr/bin/locale
234416 nroff sys_execve /usr/bin/groff
244418 groff sys_execve /usr/bin/grotty
254417 groff sys_execve /usr/bin/troff
Sasha Goldshtein38847f02016-02-22 02:19:24 -080026^C
27
28The ::sys_execve syntax specifies that you want an entry probe (which is the
29default), in a kernel function (which is the default) called sys_execve. Next,
30the format string to print is simply "%s", which prints a string. Finally, the
31value to print is the first argument to the sys_execve function, which happens
32to be the command that is exec'd. The above trace was generated by executing
33"man ls" in a separate shell. As you see, man executes a number of additional
34programs to finally display the man page.
35
36Next, suppose you are looking for large reads across the system. Let's trace
37the read system call and inspect the third argument, which is the number of
38bytes to be read:
39
Sasha Goldshtein8acd0152016-02-22 02:25:03 -080040# trace 'sys_read (arg3 > 20000) "read %d bytes", arg3'
Sasha Goldshtein49d50ba2016-12-19 10:17:38 +000041PID COMM FUNC -
424490 dd sys_read read 1048576 bytes
434490 dd sys_read read 1048576 bytes
444490 dd sys_read read 1048576 bytes
454490 dd sys_read read 1048576 bytes
Sasha Goldshtein38847f02016-02-22 02:19:24 -080046^C
47
48During the trace, I executed "dd if=/dev/zero of=/dev/null bs=1M count=4".
49The individual reads are visible, with the custom format message printed for
50each read. The parenthesized expression "(arg3 > 20000)" is a filter that is
51evaluated for each invocation of the probe before printing anything.
52
53You can also trace user functions. For example, let's simulate the bashreadline
54script, which attaches to the readline function in bash and prints its return
55value, effectively snooping all bash shell input across the system:
56
57# trace 'r:bash:readline "%s", retval'
Sasha Goldshtein49d50ba2016-12-19 10:17:38 +000058PID COMM FUNC -
592740 bash readline echo hi!
602740 bash readline man ls
Sasha Goldshtein38847f02016-02-22 02:19:24 -080061^C
62
Mirek Klimose5382282018-01-26 14:52:50 -080063The special retval keyword stands for the function's return value, and can
Sasha Goldshtein38847f02016-02-22 02:19:24 -080064be used only in a retprobe, specified by the 'r' prefix. The next component
65of the probe is the library that contains the desired function. It's OK to
66specify executables too, as long as they can be found in the PATH. Or, you
67can specify the full path to the executable (e.g. "/usr/bin/bash").
68
Mirek Klimose5382282018-01-26 14:52:50 -080069Sometimes it can be useful to see where in code the events happen. There are
70flags to print the kernel stack (-K), the user stack (-U) and optionally
71include the virtual address in the stacks as well (-a):
72
73# trace.py -U -a 'r::sys_futex "%d", retval'
74PID TID COMM FUNC -
75793922 793951 poller sys_futex 0
76 7f6c72b6497a __lll_unlock_wake+0x1a [libpthread-2.23.so]
77 627fef folly::FunctionScheduler::run()+0x46f [router]
78 7f6c7345f171 execute_native_thread_routine+0x21 [libstdc++.so.6.0.21]
79 7f6c72b5b7a9 start_thread+0xd9 [libpthread-2.23.so]
80 7f6c7223fa7d clone+0x6d [libc-2.23.so]
81
Sasha Goldshtein38847f02016-02-22 02:19:24 -080082Multiple probes can be combined on the same command line. For example, let's
Sasha Goldshtein49d50ba2016-12-19 10:17:38 +000083trace failed read and write calls on the libc level, and include a time column:
Sasha Goldshtein38847f02016-02-22 02:19:24 -080084
85# trace 'r:c:read ((int)retval < 0) "read failed: %d", retval' \
Sasha Goldshtein49d50ba2016-12-19 10:17:38 +000086 'r:c:write ((int)retval < 0) "write failed: %d", retval' -T
Sasha Goldshtein38847f02016-02-22 02:19:24 -080087TIME PID COMM FUNC -
8805:31:57 3388 bash write write failed: -1
8905:32:00 3388 bash write write failed: -1
90^C
91
92Note that the retval variable must be cast to int before comparing to zero.
93The reason is that the default type for argN and retval is an unsigned 64-bit
94integer, which can never be smaller than 0.
95
Sasha Goldshteinfd60d552016-03-01 12:15:34 -080096trace has also some basic support for kernel tracepoints. For example, let's
97trace the block:block_rq_complete tracepoint and print out the number of sectors
98transferred:
99
Sasha Goldshtein49d50ba2016-12-19 10:17:38 +0000100# trace 't:block:block_rq_complete "sectors=%d", args->nr_sector' -T
Sasha Goldshteinfd60d552016-03-01 12:15:34 -0800101TIME PID COMM FUNC -
10201:23:51 0 swapper/0 block_rq_complete sectors=8
10301:23:55 10017 kworker/u64: block_rq_complete sectors=1
10401:23:55 0 swapper/0 block_rq_complete sectors=8
105^C
106
Sasha Goldshtein376ae5c2016-10-04 19:49:57 +0300107To discover the tracepoint structure format (which you can refer to as the "args"
108pointer variable), use the tplist tool. For example:
Sasha Goldshteinfd60d552016-03-01 12:15:34 -0800109
110# tplist -v block:block_rq_complete
111block:block_rq_complete
112 dev_t dev;
113 sector_t sector;
114 unsigned int nr_sector;
115 int errors;
116 char rwbs[8];
117
Sasha Goldshtein376ae5c2016-10-04 19:49:57 +0300118This output tells you that you can use "args->dev", "args->sector", etc. in your
Sasha Goldshteinfd60d552016-03-01 12:15:34 -0800119predicate and trace arguments.
120
Sasha Goldshteinaccd4cf2016-10-11 07:56:13 -0700121
122More and more high-level libraries are instrumented with USDT probe support.
123These probes can be traced by trace just like kernel tracepoints. For example,
Teng Qinc200b6c2017-12-16 00:15:55 -0800124trace new threads being created and their function name, include time column
125and on which CPU it happened:
Sasha Goldshteinaccd4cf2016-10-11 07:56:13 -0700126
Teng Qinc200b6c2017-12-16 00:15:55 -0800127# trace 'u:pthread:pthread_create "%U", arg3' -T -C
128TIME CPU PID TID COMM FUNC -
12913:22:01 25 2627 2629 automount pthread_create expire_proc_indirect+0x0 [automount]
13013:22:01 5 21360 21414 osqueryd pthread_create [unknown] [osqueryd]
13113:22:03 25 2627 2629 automount pthread_create expire_proc_indirect+0x0 [automount]
13213:22:04 15 21360 21414 osqueryd pthread_create [unknown] [osqueryd]
13313:22:07 25 2627 2629 automount pthread_create expire_proc_indirect+0x0 [automount]
13413:22:07 4 21360 21414 osqueryd pthread_create [unknown] [osqueryd]
Sasha Goldshteinaccd4cf2016-10-11 07:56:13 -0700135^C
136
137The "%U" format specifier tells trace to resolve arg3 as a user-space symbol,
138if possible. Similarly, use "%K" for kernel symbols.
139
140Ruby, Node, and OpenJDK are also instrumented with USDT. For example, let's
Nikita V. Shirokov3953c702018-07-27 16:13:47 -0700141trace Ruby methods being called (this requires a version of Ruby built with
Sasha Goldshteinaccd4cf2016-10-11 07:56:13 -0700142the --enable-dtrace configure flag):
143
Sasha Goldshtein49d50ba2016-12-19 10:17:38 +0000144# trace 'u:ruby:method__entry "%s.%s", arg1, arg2' -p $(pidof irb) -T
Sasha Goldshteinaccd4cf2016-10-11 07:56:13 -0700145TIME PID COMM FUNC -
14612:08:43 18420 irb method__entry IRB::Context.verbose?
14712:08:43 18420 irb method__entry RubyLex.ungetc
14812:08:43 18420 irb method__entry RuxyLex.debug?
149^C
150
151In the previous invocation, arg1 and arg2 are the class name and method name
152for the Ruby method being invoked.
153
ShelbyFrances86cd5352017-02-08 05:56:37 +0300154You can also trace exported functions from shared libraries, or an imported
155function on the actual executable:
156
157# sudo ./trace.py 'r:/usr/lib64/libtinfo.so:curses_version "Version=%s", retval'
158# tput -V
159
160PID TID COMM FUNC -
16121720 21720 tput curses_version Version=ncurses 6.0.20160709
162^C
163
Sasha Goldshteinaccd4cf2016-10-11 07:56:13 -0700164
Sasha Goldshteinf4797b02016-10-17 01:44:56 -0700165Occasionally, it can be useful to filter specific strings. For example, you
166might be interested in open() calls that open a specific file:
167
Sasha Goldshtein49d50ba2016-12-19 10:17:38 +0000168# trace 'p:c:open (STRCMP("test.txt", arg1)) "opening %s", arg1' -T
Sasha Goldshteinf4797b02016-10-17 01:44:56 -0700169TIME PID COMM FUNC -
17001:43:15 10938 cat open opening test.txt
17101:43:20 10939 cat open opening test.txt
172^C
173
174
Sasha Goldshtein23e72b82017-01-17 08:49:36 +0000175In the preceding example, as well as in many others, readability may be
176improved by providing the function's signature, which names the arguments and
177lets you access structure sub-fields, which is hard with the "arg1", "arg2"
178convention. For example:
179
180# trace 'p:c:open(char *filename) "opening %s", filename'
181PID TID COMM FUNC -
18217507 17507 cat open opening FAQ.txt
183^C
184
185# trace 'p::SyS_nanosleep(struct timespec *ts) "sleep for %lld ns", ts->tv_nsec'
186PID TID COMM FUNC -
187777 785 automount SyS_nanosleep sleep for 500000000 ns
188777 785 automount SyS_nanosleep sleep for 500000000 ns
189777 785 automount SyS_nanosleep sleep for 500000000 ns
190777 785 automount SyS_nanosleep sleep for 500000000 ns
191^C
192
ShelbyFrances69abacc2017-02-08 05:56:53 +0300193Remember to use the -I argument include the appropriate header file. We didn't
194need to do that here because `struct timespec` is used internally by the tool,
195so it always includes this header file.
Sasha Goldshtein23e72b82017-01-17 08:49:36 +0000196
ShelbyFrances69abacc2017-02-08 05:56:53 +0300197
198As a final example, let's trace open syscalls for a specific process. By
Sasha Goldshtein38847f02016-02-22 02:19:24 -0800199default, tracing is system-wide, but the -p switch overrides this:
200
Sasha Goldshtein49d50ba2016-12-19 10:17:38 +0000201# trace -p 2740 'do_sys_open "%s", arg2' -T
Sasha Goldshtein38847f02016-02-22 02:19:24 -0800202TIME PID COMM FUNC -
20305:36:16 15872 ls do_sys_open /etc/ld.so.cache
20405:36:16 15872 ls do_sys_open /lib64/libselinux.so.1
20505:36:16 15872 ls do_sys_open /lib64/libcap.so.2
20605:36:16 15872 ls do_sys_open /lib64/libacl.so.1
20705:36:16 15872 ls do_sys_open /lib64/libc.so.6
20805:36:16 15872 ls do_sys_open /lib64/libpcre.so.1
20905:36:16 15872 ls do_sys_open /lib64/libdl.so.2
21005:36:16 15872 ls do_sys_open /lib64/libattr.so.1
21105:36:16 15872 ls do_sys_open /lib64/libpthread.so.0
21205:36:16 15872 ls do_sys_open /usr/lib/locale/locale-archive
21305:36:16 15872 ls do_sys_open /home/vagrant
214^C
215
216In this example, we traced the "ls ~" command as it was opening its shared
217libraries and then accessing the /home/vagrant directory listing.
218
219
Mark Drayton5f5687e2017-02-20 18:13:03 +0000220Lastly, if a high-frequency event is traced you may overflow the perf ring
221buffer. This shows as "Lost N samples":
222
223# trace sys_open
2245087 5087 pgrep sys_open
2255087 5087 pgrep sys_open
2265087 5087 pgrep sys_open
2275087 5087 pgrep sys_open
2285087 5087 pgrep sys_open
229Lost 764896 samples
230Lost 764896 samples
231Lost 764896 samples
232
233The perf ring buffer size can be changed with -b. The unit is size per-CPU buffer
234size and is measured in pages. The value must be a power of two and defaults to
23564 pages.
236
Nikita V. Shirokov3953c702018-07-27 16:13:47 -0700237# trace.py 'sys_setsockopt(int fd, int level, int optname, char* optval, int optlen)(level==0 && optname == 1 && STRCMP("{0x6C, 0x00, 0x00, 0x00}", optval))' -U -M 1 --bin_cmp
238PID TID COMM FUNC -
2391855611 1863183 worker sys_setsockopt found
240
241In this example we are catching setsockopt syscall to change IPv4 IP_TOS
242value only for the cases where new TOS value is equal to 108. we are using
243STRCMP helper in binary mode (--bin_cmp flag) to compare optval array
244against int value of 108 (parametr of setsockopt call) in hex representation
245(little endian format)
246
247
248
Mark Drayton5f5687e2017-02-20 18:13:03 +0000249
Sasha Goldshtein38847f02016-02-22 02:19:24 -0800250USAGE message:
251
Mark Drayton5f5687e2017-02-20 18:13:03 +0000252usage: trace [-h] [-b BUFFER_PAGES] [-p PID] [-L TID] [-v] [-Z STRING_SIZE]
Mirek Klimose5382282018-01-26 14:52:50 -0800253 [-S] [-M MAX_EVENTS] [-t] [-T] [-K] [-U] [-a] [-I header]
Mark Draytonaa6c9162016-11-03 15:36:29 +0000254 probe [probe ...]
Sasha Goldshtein38847f02016-02-22 02:19:24 -0800255
256Attach to functions and print trace messages.
257
258positional arguments:
259 probe probe specifier (see examples)
260
261optional arguments:
262 -h, --help show this help message and exit
Mark Drayton5f5687e2017-02-20 18:13:03 +0000263 -b BUFFER_PAGES, --buffer-pages BUFFER_PAGES
264 number of pages to use for perf_events ring buffer
265 (default: 64)
Sasha Goldshtein38847f02016-02-22 02:19:24 -0800266 -p PID, --pid PID id of the process to trace (optional)
Sasha Goldshtein49d50ba2016-12-19 10:17:38 +0000267 -L TID, --tid TID id of the thread to trace (optional)
Sasha Goldshtein38847f02016-02-22 02:19:24 -0800268 -v, --verbose print resulting BPF program code before executing
269 -Z STRING_SIZE, --string-size STRING_SIZE
270 maximum size to read from strings
271 -S, --include-self do not filter trace's own pid from the trace
272 -M MAX_EVENTS, --max-events MAX_EVENTS
273 number of events to print before quitting
Sasha Goldshtein49d50ba2016-12-19 10:17:38 +0000274 -t, --timestamp print timestamp column (offset from trace start)
Mark Drayton5f5687e2017-02-20 18:13:03 +0000275 -T, --time print time column
Nikita V. Shirokov3953c702018-07-27 16:13:47 -0700276 -C, --print_cpu print CPU id
277 -B, --bin_cmp allow to use STRCMP with binary values
Sasha Goldshtein4725a722016-10-18 20:54:47 +0300278 -K, --kernel-stack output kernel stack trace
279 -U, --user-stack output user stack trace
Mirek Klimose5382282018-01-26 14:52:50 -0800280 -a, --address print virtual address in stacks
Sasha Goldshtein4725a722016-10-18 20:54:47 +0300281 -I header, --include header
282 additional header files to include in the BPF program
Yonghong Songf4470dc2017-12-13 14:12:13 -0800283 as either full path, or relative to current working directory,
284 or relative to default kernel header search path
Sasha Goldshtein38847f02016-02-22 02:19:24 -0800285
286EXAMPLES:
287
Sasha Goldshtein8acd0152016-02-22 02:25:03 -0800288trace do_sys_open
Sasha Goldshtein38847f02016-02-22 02:19:24 -0800289 Trace the open syscall and print a default trace message when entered
Sasha Goldshtein8acd0152016-02-22 02:25:03 -0800290trace 'do_sys_open "%s", arg2'
Sasha Goldshtein38847f02016-02-22 02:19:24 -0800291 Trace the open syscall and print the filename being opened
Sasha Goldshtein8acd0152016-02-22 02:25:03 -0800292trace 'sys_read (arg3 > 20000) "read %d bytes", arg3'
Sasha Goldshtein38847f02016-02-22 02:19:24 -0800293 Trace the read syscall and print a message for reads >20000 bytes
Sasha Goldshtein23e72b82017-01-17 08:49:36 +0000294trace 'r::do_sys_open "%llx", retval'
Mark Draytonaa6c9162016-11-03 15:36:29 +0000295 Trace the return from the open syscall and print the return value
Sasha Goldshtein8acd0152016-02-22 02:25:03 -0800296trace 'c:open (arg2 == 42) "%s %d", arg1, arg2'
Sasha Goldshtein38847f02016-02-22 02:19:24 -0800297 Trace the open() call from libc only if the flags (arg2) argument is 42
Sasha Goldshtein8acd0152016-02-22 02:25:03 -0800298trace 'c:malloc "size = %d", arg1'
Sasha Goldshtein38847f02016-02-22 02:19:24 -0800299 Trace malloc calls and print the size being allocated
Sasha Goldshtein8acd0152016-02-22 02:25:03 -0800300trace 'p:c:write (arg1 == 1) "writing %d bytes to STDOUT", arg3'
301 Trace the write() call from libc to monitor writes to STDOUT
Mark Drayton5f5687e2017-02-20 18:13:03 +0000302trace 'r::__kmalloc (retval == 0) "kmalloc failed!"'
Sasha Goldshtein8acd0152016-02-22 02:25:03 -0800303 Trace returns from __kmalloc which returned a null pointer
Mark Drayton5f5687e2017-02-20 18:13:03 +0000304trace 'r:c:malloc (retval) "allocated = %x", retval'
Sasha Goldshtein38847f02016-02-22 02:19:24 -0800305 Trace returns from malloc and print non-NULL allocated buffers
Sasha Goldshtein376ae5c2016-10-04 19:49:57 +0300306trace 't:block:block_rq_complete "sectors=%d", args->nr_sector'
Sasha Goldshteinfd60d552016-03-01 12:15:34 -0800307 Trace the block_rq_complete kernel tracepoint and print # of tx sectors
Sasha Goldshtein3e39a082016-03-24 08:39:47 -0700308trace 'u:pthread:pthread_create (arg4 != 0)'
309 Trace the USDT probe pthread_create when its 4th argument is non-zero
Sasha Goldshtein23e72b82017-01-17 08:49:36 +0000310trace 'p::SyS_nanosleep(struct timespec *ts) "sleep for %lld ns", ts->tv_nsec'
311 Trace the nanosleep syscall and print the sleep duration in ns
Yonghong Songf4470dc2017-12-13 14:12:13 -0800312trace -I 'linux/fs.h' \
313 'p::uprobe_register(struct inode *inode) "a_ops = %llx", inode->i_mapping->a_ops'
314 Trace the uprobe_register inode mapping ops, and the symbol can be found
315 in /proc/kallsyms
316trace -I 'kernel/sched/sched.h' \
317 'p::__account_cfs_rq_runtime(struct cfs_rq *cfs_rq) "%d", cfs_rq->runtime_remaining'
318 Trace the cfs scheduling runqueue remaining runtime. The struct cfs_rq is defined
319 in kernel/sched/sched.h which is in kernel source tree and not in kernel-devel
320 package. So this command needs to run at the kernel source tree root directory
321 so that the added header file can be found by the compiler.
tehnerd86293f02018-01-23 21:21:58 -0800322trace -I 'net/sock.h' \\
323 'udpv6_sendmsg(struct sock *sk) (sk->sk_dport == 13568)'
324 Trace udpv6 sendmsg calls only if socket's destination port is equal
325 to 53 (DNS; 13568 in big endian order)
Yonghong Songf92fef22018-01-24 20:51:46 -0800326trace -I 'linux/fs_struct.h' 'mntns_install "users = %d", $task->fs->users'
327 Trace the number of users accessing the file system of the current task
tehnerd86293f02018-01-23 21:21:58 -0800328"