Sasha Goldshtein | 38847f0 | 2016-02-22 02:19:24 -0800 | [diff] [blame] | 1 | Demonstrations of trace. |
| 2 | |
| 3 | |
| 4 | trace probes functions you specify and displays trace messages if a particular |
| 5 | condition is met. You can control the message format to display function |
| 6 | arguments and return values. |
| 7 | |
| 8 | For example, suppose you want to trace all commands being exec'd across the |
| 9 | system: |
| 10 | |
Sasha Goldshtein | 8acd015 | 2016-02-22 02:25:03 -0800 | [diff] [blame] | 11 | # trace 'sys_execve "%s", arg1' |
Sasha Goldshtein | 38847f0 | 2016-02-22 02:19:24 -0800 | [diff] [blame] | 12 | TIME PID COMM FUNC - |
| 13 | 05:11:51 4402 bash sys_execve /usr/bin/man |
| 14 | 05:11:51 4411 man sys_execve /usr/local/bin/less |
| 15 | 05:11:51 4411 man sys_execve /usr/bin/less |
| 16 | 05:11:51 4410 man sys_execve /usr/local/bin/nroff |
| 17 | 05:11:51 4410 man sys_execve /usr/bin/nroff |
| 18 | 05:11:51 4409 man sys_execve /usr/local/bin/tbl |
| 19 | 05:11:51 4409 man sys_execve /usr/bin/tbl |
| 20 | 05:11:51 4408 man sys_execve /usr/local/bin/preconv |
| 21 | 05:11:51 4408 man sys_execve /usr/bin/preconv |
| 22 | 05:11:51 4415 nroff sys_execve /usr/bin/locale |
| 23 | 05:11:51 4416 nroff sys_execve /usr/bin/groff |
| 24 | 05:11:51 4418 groff sys_execve /usr/bin/grotty |
| 25 | 05:11:51 4417 groff sys_execve /usr/bin/troff |
| 26 | ^C |
| 27 | |
| 28 | The ::sys_execve syntax specifies that you want an entry probe (which is the |
| 29 | default), in a kernel function (which is the default) called sys_execve. Next, |
| 30 | the format string to print is simply "%s", which prints a string. Finally, the |
| 31 | value to print is the first argument to the sys_execve function, which happens |
| 32 | to be the command that is exec'd. The above trace was generated by executing |
| 33 | "man ls" in a separate shell. As you see, man executes a number of additional |
| 34 | programs to finally display the man page. |
| 35 | |
| 36 | Next, suppose you are looking for large reads across the system. Let's trace |
| 37 | the read system call and inspect the third argument, which is the number of |
| 38 | bytes to be read: |
| 39 | |
Sasha Goldshtein | 8acd015 | 2016-02-22 02:25:03 -0800 | [diff] [blame] | 40 | # trace 'sys_read (arg3 > 20000) "read %d bytes", arg3' |
Sasha Goldshtein | 38847f0 | 2016-02-22 02:19:24 -0800 | [diff] [blame] | 41 | TIME PID COMM FUNC - |
| 42 | 05:18:23 4490 dd sys_read read 1048576 bytes |
| 43 | 05:18:23 4490 dd sys_read read 1048576 bytes |
| 44 | 05:18:23 4490 dd sys_read read 1048576 bytes |
| 45 | 05:18:23 4490 dd sys_read read 1048576 bytes |
| 46 | ^C |
| 47 | |
| 48 | During the trace, I executed "dd if=/dev/zero of=/dev/null bs=1M count=4". |
| 49 | The individual reads are visible, with the custom format message printed for |
| 50 | each read. The parenthesized expression "(arg3 > 20000)" is a filter that is |
| 51 | evaluated for each invocation of the probe before printing anything. |
| 52 | |
| 53 | You can also trace user functions. For example, let's simulate the bashreadline |
| 54 | script, which attaches to the readline function in bash and prints its return |
| 55 | value, effectively snooping all bash shell input across the system: |
| 56 | |
| 57 | # trace 'r:bash:readline "%s", retval' |
| 58 | TIME PID COMM FUNC - |
| 59 | 05:24:50 2740 bash readline echo hi! |
| 60 | 05:24:53 2740 bash readline man ls |
| 61 | ^C |
| 62 | |
| 63 | The special retval keywords stands for the function's return value, and can |
| 64 | be used only in a retprobe, specified by the 'r' prefix. The next component |
| 65 | of the probe is the library that contains the desired function. It's OK to |
| 66 | specify executables too, as long as they can be found in the PATH. Or, you |
| 67 | can specify the full path to the executable (e.g. "/usr/bin/bash"). |
| 68 | |
| 69 | Multiple probes can be combined on the same command line. For example, let's |
| 70 | trace failed read and write calls on the libc level: |
| 71 | |
| 72 | # trace 'r:c:read ((int)retval < 0) "read failed: %d", retval' \ |
| 73 | 'r:c:write ((int)retval < 0) "write failed: %d", retval' |
| 74 | TIME PID COMM FUNC - |
| 75 | 05:31:57 3388 bash write write failed: -1 |
| 76 | 05:32:00 3388 bash write write failed: -1 |
| 77 | ^C |
| 78 | |
| 79 | Note that the retval variable must be cast to int before comparing to zero. |
| 80 | The reason is that the default type for argN and retval is an unsigned 64-bit |
| 81 | integer, which can never be smaller than 0. |
| 82 | |
Sasha Goldshtein | fd60d55 | 2016-03-01 12:15:34 -0800 | [diff] [blame] | 83 | trace has also some basic support for kernel tracepoints. For example, let's |
| 84 | trace the block:block_rq_complete tracepoint and print out the number of sectors |
| 85 | transferred: |
| 86 | |
Sasha Goldshtein | 376ae5c | 2016-10-04 19:49:57 +0300 | [diff] [blame] | 87 | # trace 't:block:block_rq_complete "sectors=%d", args->nr_sector' |
Sasha Goldshtein | fd60d55 | 2016-03-01 12:15:34 -0800 | [diff] [blame] | 88 | TIME PID COMM FUNC - |
| 89 | 01:23:51 0 swapper/0 block_rq_complete sectors=8 |
| 90 | 01:23:55 10017 kworker/u64: block_rq_complete sectors=1 |
| 91 | 01:23:55 0 swapper/0 block_rq_complete sectors=8 |
| 92 | ^C |
| 93 | |
Sasha Goldshtein | 376ae5c | 2016-10-04 19:49:57 +0300 | [diff] [blame] | 94 | To discover the tracepoint structure format (which you can refer to as the "args" |
| 95 | pointer variable), use the tplist tool. For example: |
Sasha Goldshtein | fd60d55 | 2016-03-01 12:15:34 -0800 | [diff] [blame] | 96 | |
| 97 | # tplist -v block:block_rq_complete |
| 98 | block:block_rq_complete |
| 99 | dev_t dev; |
| 100 | sector_t sector; |
| 101 | unsigned int nr_sector; |
| 102 | int errors; |
| 103 | char rwbs[8]; |
| 104 | |
Sasha Goldshtein | 376ae5c | 2016-10-04 19:49:57 +0300 | [diff] [blame] | 105 | This output tells you that you can use "args->dev", "args->sector", etc. in your |
Sasha Goldshtein | fd60d55 | 2016-03-01 12:15:34 -0800 | [diff] [blame] | 106 | predicate and trace arguments. |
| 107 | |
Sasha Goldshtein | accd4cf | 2016-10-11 07:56:13 -0700 | [diff] [blame] | 108 | |
| 109 | More and more high-level libraries are instrumented with USDT probe support. |
| 110 | These probes can be traced by trace just like kernel tracepoints. For example, |
| 111 | trace new threads being created and their function name: |
| 112 | |
| 113 | # trace 'u:pthread:pthread_create "%U", arg3' |
| 114 | TIME PID COMM FUNC - |
| 115 | 02:07:29 4051 contentions pthread_create primes_thread+0x0 |
| 116 | 02:07:29 4051 contentions pthread_create primes_thread+0x0 |
| 117 | 02:07:29 4051 contentions pthread_create primes_thread+0x0 |
| 118 | 02:07:29 4051 contentions pthread_create primes_thread+0x0 |
| 119 | ^C |
| 120 | |
| 121 | The "%U" format specifier tells trace to resolve arg3 as a user-space symbol, |
| 122 | if possible. Similarly, use "%K" for kernel symbols. |
| 123 | |
| 124 | Ruby, Node, and OpenJDK are also instrumented with USDT. For example, let's |
| 125 | trace Ruby methods being called (this requires a version of Ruby built with |
| 126 | the --enable-dtrace configure flag): |
| 127 | |
| 128 | # trace 'u:ruby:method__entry "%s.%s", arg1, arg2' -p $(pidof irb) |
| 129 | TIME PID COMM FUNC - |
| 130 | 12:08:43 18420 irb method__entry IRB::Context.verbose? |
| 131 | 12:08:43 18420 irb method__entry RubyLex.ungetc |
| 132 | 12:08:43 18420 irb method__entry RuxyLex.debug? |
| 133 | ^C |
| 134 | |
| 135 | In the previous invocation, arg1 and arg2 are the class name and method name |
| 136 | for the Ruby method being invoked. |
| 137 | |
| 138 | |
Sasha Goldshtein | f4797b0 | 2016-10-17 01:44:56 -0700 | [diff] [blame] | 139 | Occasionally, it can be useful to filter specific strings. For example, you |
| 140 | might be interested in open() calls that open a specific file: |
| 141 | |
| 142 | # trace 'p:c:open (STRCMP("test.txt", arg1)) "opening %s", arg1' |
| 143 | TIME PID COMM FUNC - |
| 144 | 01:43:15 10938 cat open opening test.txt |
| 145 | 01:43:20 10939 cat open opening test.txt |
| 146 | ^C |
| 147 | |
| 148 | |
Sasha Goldshtein | 38847f0 | 2016-02-22 02:19:24 -0800 | [diff] [blame] | 149 | As a final example, let's trace open syscalls for a specific process. By |
| 150 | default, tracing is system-wide, but the -p switch overrides this: |
| 151 | |
Sasha Goldshtein | 8acd015 | 2016-02-22 02:25:03 -0800 | [diff] [blame] | 152 | # trace -p 2740 'do_sys_open "%s", arg2' |
Sasha Goldshtein | 38847f0 | 2016-02-22 02:19:24 -0800 | [diff] [blame] | 153 | TIME PID COMM FUNC - |
| 154 | 05:36:16 15872 ls do_sys_open /etc/ld.so.cache |
| 155 | 05:36:16 15872 ls do_sys_open /lib64/libselinux.so.1 |
| 156 | 05:36:16 15872 ls do_sys_open /lib64/libcap.so.2 |
| 157 | 05:36:16 15872 ls do_sys_open /lib64/libacl.so.1 |
| 158 | 05:36:16 15872 ls do_sys_open /lib64/libc.so.6 |
| 159 | 05:36:16 15872 ls do_sys_open /lib64/libpcre.so.1 |
| 160 | 05:36:16 15872 ls do_sys_open /lib64/libdl.so.2 |
| 161 | 05:36:16 15872 ls do_sys_open /lib64/libattr.so.1 |
| 162 | 05:36:16 15872 ls do_sys_open /lib64/libpthread.so.0 |
| 163 | 05:36:16 15872 ls do_sys_open /usr/lib/locale/locale-archive |
| 164 | 05:36:16 15872 ls do_sys_open /home/vagrant |
| 165 | ^C |
| 166 | |
| 167 | In this example, we traced the "ls ~" command as it was opening its shared |
| 168 | libraries and then accessing the /home/vagrant directory listing. |
| 169 | |
| 170 | |
| 171 | USAGE message: |
| 172 | |
| 173 | # trace -h |
| 174 | usage: trace.py [-h] [-p PID] [-v] [-Z STRING_SIZE] [-S] [-M MAX_EVENTS] [-o] |
| 175 | probe [probe ...] |
| 176 | |
| 177 | Attach to functions and print trace messages. |
| 178 | |
| 179 | positional arguments: |
| 180 | probe probe specifier (see examples) |
| 181 | |
| 182 | optional arguments: |
| 183 | -h, --help show this help message and exit |
| 184 | -p PID, --pid PID id of the process to trace (optional) |
| 185 | -v, --verbose print resulting BPF program code before executing |
| 186 | -Z STRING_SIZE, --string-size STRING_SIZE |
| 187 | maximum size to read from strings |
| 188 | -S, --include-self do not filter trace's own pid from the trace |
| 189 | -M MAX_EVENTS, --max-events MAX_EVENTS |
| 190 | number of events to print before quitting |
| 191 | -o, --offset use relative time from first traced message |
Sasha Goldshtein | 4725a72 | 2016-10-18 20:54:47 +0300 | [diff] [blame] | 192 | -K, --kernel-stack output kernel stack trace |
| 193 | -U, --user-stack output user stack trace |
| 194 | -I header, --include header |
| 195 | additional header files to include in the BPF program |
Sasha Goldshtein | 38847f0 | 2016-02-22 02:19:24 -0800 | [diff] [blame] | 196 | |
| 197 | EXAMPLES: |
| 198 | |
Sasha Goldshtein | 8acd015 | 2016-02-22 02:25:03 -0800 | [diff] [blame] | 199 | trace do_sys_open |
Sasha Goldshtein | 38847f0 | 2016-02-22 02:19:24 -0800 | [diff] [blame] | 200 | Trace the open syscall and print a default trace message when entered |
Sasha Goldshtein | 8acd015 | 2016-02-22 02:25:03 -0800 | [diff] [blame] | 201 | trace 'do_sys_open "%s", arg2' |
Sasha Goldshtein | 38847f0 | 2016-02-22 02:19:24 -0800 | [diff] [blame] | 202 | Trace the open syscall and print the filename being opened |
Sasha Goldshtein | 8acd015 | 2016-02-22 02:25:03 -0800 | [diff] [blame] | 203 | trace 'sys_read (arg3 > 20000) "read %d bytes", arg3' |
Sasha Goldshtein | 38847f0 | 2016-02-22 02:19:24 -0800 | [diff] [blame] | 204 | Trace the read syscall and print a message for reads >20000 bytes |
| 205 | trace r::do_sys_return |
| 206 | Trace the return from the open syscall |
Sasha Goldshtein | 8acd015 | 2016-02-22 02:25:03 -0800 | [diff] [blame] | 207 | trace 'c:open (arg2 == 42) "%s %d", arg1, arg2' |
Sasha Goldshtein | 38847f0 | 2016-02-22 02:19:24 -0800 | [diff] [blame] | 208 | Trace the open() call from libc only if the flags (arg2) argument is 42 |
Sasha Goldshtein | 8acd015 | 2016-02-22 02:25:03 -0800 | [diff] [blame] | 209 | trace 'c:malloc "size = %d", arg1' |
Sasha Goldshtein | 38847f0 | 2016-02-22 02:19:24 -0800 | [diff] [blame] | 210 | Trace malloc calls and print the size being allocated |
Sasha Goldshtein | 8acd015 | 2016-02-22 02:25:03 -0800 | [diff] [blame] | 211 | trace 'p:c:write (arg1 == 1) "writing %d bytes to STDOUT", arg3' |
| 212 | Trace the write() call from libc to monitor writes to STDOUT |
| 213 | trace 'r::__kmalloc (retval == 0) "kmalloc failed!" |
| 214 | Trace returns from __kmalloc which returned a null pointer |
Sasha Goldshtein | f4797b0 | 2016-10-17 01:44:56 -0700 | [diff] [blame] | 215 | trace 'r:c:malloc (retval) "allocated = %x", retval |
Sasha Goldshtein | 38847f0 | 2016-02-22 02:19:24 -0800 | [diff] [blame] | 216 | Trace returns from malloc and print non-NULL allocated buffers |
Sasha Goldshtein | 376ae5c | 2016-10-04 19:49:57 +0300 | [diff] [blame] | 217 | trace 't:block:block_rq_complete "sectors=%d", args->nr_sector' |
Sasha Goldshtein | fd60d55 | 2016-03-01 12:15:34 -0800 | [diff] [blame] | 218 | Trace the block_rq_complete kernel tracepoint and print # of tx sectors |
Sasha Goldshtein | 3e39a08 | 2016-03-24 08:39:47 -0700 | [diff] [blame] | 219 | trace 'u:pthread:pthread_create (arg4 != 0)' |
| 220 | Trace the USDT probe pthread_create when its 4th argument is non-zero |
Sasha Goldshtein | 38847f0 | 2016-02-22 02:19:24 -0800 | [diff] [blame] | 221 | |