# BPF Compiler Collection (BCC)

BCC is a toolkit for creating efficient kernel tracing and manipulation
programs, and includes several useful tools and examples. It makes use of eBPF
(Extended Berkeley Packet Filters), a new feature that was first added to
Linux 3.15. Much of what BCC uses requires Linux 4.1 and above.

eBPF was [described by](https://lkml.org/lkml/2015/4/14/232) Ingo Molnár as:

> One of the more interesting features in this cycle is the ability to attach eBPF programs (user-defined, sandboxed bytecode executed by the kernel) to kprobes. This allows user-defined instrumentation on a live kernel image that can never crash, hang or interfere with the kernel negatively.

BCC makes eBPF programs easier to write, with kernel instrumentation in C
and a front-end in Python. It is suited for many tasks, including performance
analysis and network traffic control.

## Screenshot

This example traces a disk I/O kernel function, and populates an in-kernel
power-of-2 histogram of the I/O size. For efficiency, only the histogram
summary is returned to user-level.

```Shell
# ./bitehist.py
Tracing... Hit Ctrl-C to end.
^C
     kbytes          : count     distribution
       0 -> 1        : 3        |                                      |
       2 -> 3        : 0        |                                      |
       4 -> 7        : 211      |**********                            |
       8 -> 15       : 0        |                                      |
      16 -> 31       : 0        |                                      |
      32 -> 63       : 0        |                                      |
      64 -> 127      : 1        |                                      |
     128 -> 255      : 800      |**************************************|
```

The above output shows a bimodal distribution, where the largest mode of
800 I/Os was between 128 and 255 Kbytes in size.

See the source: [bitehist.c](examples/tracing/bitehist.c) and
[bitehist.py](examples/tracing/bitehist.py). What this traces, what this stores, and how
the data is presented can be entirely customized. This example shows only a few
of the many possible capabilities.
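
To make the bucketing concrete, here is a minimal pure-Python sketch of a power-of-2 histogram and its text rendering. It is an illustration under stated assumptions, not bitehist's code: the function names are hypothetical, the sample sizes are made up to reproduce the counts in the screenshot, and in bitehist itself the counting happens in the kernel so that only the per-bucket totals reach user level.

```python
"""Sketch of a power-of-2 (log2) histogram like the one bitehist prints.

Assumption: this pure-Python model only illustrates the bucketing and the
text layout; it is not BCC's implementation.
"""


def log2_bucket(value):
    # Bucket i (i >= 1) holds values in [2**(i-1), 2**i - 1]; as a
    # simplification, 0 is folded into bucket 1, which prints as "0 -> 1".
    return max(value, 1).bit_length()


def bucket_range(index):
    # Inclusive (low, high) bounds for a bucket index >= 1.
    low, high = (1 << index) >> 1, (1 << index) - 1
    if low == high:  # bucket 1 is labelled "0 -> 1"
        low -= 1
    return low, high


def build_histogram(values):
    # Map bucket index -> number of values that fell into it.
    counts = {}
    for v in values:
        b = log2_bucket(v)
        counts[b] = counts.get(b, 0) + 1
    return counts


def render(counts, width=38):
    # One row per bucket from 1 up to the highest non-empty bucket; the bar is
    # scaled so the largest count fills `width` characters.
    if not counts:
        return []
    peak = max(counts.values())
    rows = []
    for i in range(1, max(counts) + 1):
        low, high = bucket_range(i)
        n = counts.get(i, 0)
        bar = "*" * int(n / peak * width)
        rows.append("%10d -> %-10d : %-8d |%-*s|" % (low, high, n, width, bar))
    return rows


if __name__ == "__main__":
    # Hypothetical sizes in Kbytes, chosen to match the counts shown above.
    sizes = [0] * 3 + [4] * 211 + [100] + [128] * 800
    print("     kbytes              : count     distribution")
    for row in render(build_histogram(sizes)):
        print(row)
```

Because bucket *i* covers sizes from 2^(i-1) to 2^i - 1, the number of rows grows with the logarithm of the largest size rather than with the number of distinct sizes, which is what keeps an in-kernel summary small.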

## Installing

See [INSTALL.md](INSTALL.md) for installation steps on your platform.

## Contents

Some of these are single files that contain both C and Python, others have a
pair of .c and .py files, and some are directories of files.

### Tracing

Examples:

- examples/tracing/[bitehist.py](examples/tracing/bitehist.py) examples/tracing/[bitehist.c](examples/tracing/bitehist.c): Block I/O size histogram. [Examples](examples/tracing/bitehist_example.txt).
- examples/tracing/[disksnoop.py](examples/tracing/disksnoop.py) examples/tracing/[disksnoop.c](examples/tracing/disksnoop.c): Trace block device I/O latency. [Examples](examples/tracing/disksnoop_example.txt).
- examples/[hello_world.py](examples/hello_world.py): Prints "Hello, World!" for new processes.
- examples/tracing/[tcpv4connect](examples/tracing/tcpv4connect): Trace TCP IPv4 active connections. [Examples](examples/tracing/tcpv4connect_example.txt).
- examples/tracing/[trace_fields.py](examples/tracing/trace_fields.py): Simple example of printing fields from traced events.
- examples/tracing/[vfsreadlat.py](examples/tracing/vfsreadlat.py) examples/tracing/[vfsreadlat.c](examples/tracing/vfsreadlat.c): VFS read latency distribution. [Examples](examples/tracing/vfsreadlat_example.txt).

Tools:

- tools/[argdist](tools/argdist.py): Display function parameter values as a histogram or frequency count. [Examples](tools/argdist_example.txt).
- tools/[bashreadline](tools/bashreadline.py): Print entered bash commands system wide. [Examples](tools/bashreadline_example.txt).
- tools/[biolatency](tools/biolatency.py): Summarize block device I/O latency as a histogram. [Examples](tools/biolatency_example.txt).
- tools/[biotop](tools/biotop.py): Top for disks: Summarize block device I/O by process. [Examples](tools/biotop_example.txt).
- tools/[biosnoop](tools/biosnoop.py): Trace block device I/O with PID and latency. [Examples](tools/biosnoop_example.txt).
- tools/[bitesize](tools/bitesize.py): Show per process I/O size histogram. [Examples](tools/bitesize_example.txt).
- tools/[cachestat](tools/cachestat.py): Trace page cache hit/miss ratio. [Examples](tools/cachestat_example.txt).
- tools/[execsnoop](tools/execsnoop.py): Trace new processes via exec() syscalls. [Examples](tools/execsnoop_example.txt).
- tools/[dcsnoop](tools/dcsnoop.py): Trace directory entry cache (dcache) lookups. [Examples](tools/dcsnoop_example.txt).
- tools/[dcstat](tools/dcstat.py): Directory entry cache (dcache) stats. [Examples](tools/dcstat_example.txt).
- tools/[ext4dist](tools/ext4dist.py): Summarize ext4 operation latency. [Examples](tools/ext4dist_example.txt).
- tools/[ext4slower](tools/ext4slower.py): Trace slow ext4 operations. [Examples](tools/ext4slower_example.txt).
- tools/[filelife](tools/filelife.py): Trace the lifespan of short-lived files. [Examples](tools/filelife_example.txt).
- tools/[fileslower](tools/fileslower.py): Trace slow synchronous file reads and writes. [Examples](tools/fileslower_example.txt).
- tools/[filetop](tools/filetop.py): File reads and writes by filename and process. Top for files. [Examples](tools/filetop_example.txt).
- tools/[funccount](tools/funccount.py): Count kernel function calls. [Examples](tools/funccount_example.txt).
- tools/[funclatency](tools/funclatency.py): Time kernel functions and show their latency distribution. [Examples](tools/funclatency_example.txt).
- tools/[gethostlatency](tools/gethostlatency.py): Show latency for getaddrinfo/gethostbyname[2] calls. [Examples](tools/gethostlatency_example.txt).
- tools/[hardirqs](tools/hardirqs.py): Measure hard IRQ (hard interrupt) event time. [Examples](tools/hardirqs_example.txt).
- tools/[killsnoop](tools/killsnoop.py): Trace signals issued by the kill() syscall. [Examples](tools/killsnoop_example.txt).
- tools/[mdflush](tools/mdflush.py): Trace md flush events. [Examples](tools/mdflush_example.txt).
- tools/[memleak](tools/memleak.py): Display outstanding memory allocations to find memory leaks. [Examples](tools/memleak_example.txt).
- tools/[offcputime](tools/offcputime.py): Summarize off-CPU time by kernel stack trace. [Examples](tools/offcputime_example.txt).
- tools/[offwaketime](tools/offwaketime.py): Summarize blocked time by kernel off-CPU stack and waker stack. [Examples](tools/offwaketime_example.txt).
- tools/[oomkill](tools/oomkill.py): Trace the out-of-memory (OOM) killer. [Examples](tools/oomkill_example.txt).
- tools/[opensnoop](tools/opensnoop.py): Trace open() syscalls. [Examples](tools/opensnoop_example.txt).
- tools/[pidpersec](tools/pidpersec.py): Count new processes (via fork). [Examples](tools/pidpersec_example.txt).
- tools/[runqlat](tools/runqlat.py): Run queue (scheduler) latency as a histogram. [Examples](tools/runqlat_example.txt).
- tools/[softirqs](tools/softirqs.py): Measure soft IRQ (soft interrupt) event time. [Examples](tools/softirqs_example.txt).
- tools/[stackcount](tools/stackcount.py): Count kernel function calls and their stack traces. [Examples](tools/stackcount_example.txt).
- tools/[stacksnoop](tools/stacksnoop.py): Trace a kernel function and print all kernel stack traces. [Examples](tools/stacksnoop_example.txt).
- tools/[statsnoop](tools/statsnoop.py): Trace stat() syscalls. [Examples](tools/statsnoop_example.txt).
- tools/[syncsnoop](tools/syncsnoop.py): Trace sync() syscall. [Examples](tools/syncsnoop_example.txt).
- tools/[tcpaccept](tools/tcpaccept.py): Trace TCP passive connections (accept()). [Examples](tools/tcpaccept_example.txt).
- tools/[tcpconnect](tools/tcpconnect.py): Trace TCP active connections (connect()). [Examples](tools/tcpconnect_example.txt).
- tools/[tcpretrans](tools/tcpretrans.py): Trace TCP retransmits and TLPs. [Examples](tools/tcpretrans_example.txt).
- tools/[vfscount](tools/vfscount.py) tools/[vfscount.c](tools/vfscount.c): Count VFS calls. [Examples](tools/vfscount_example.txt).
- tools/[vfsstat](tools/vfsstat.py) tools/[vfsstat.c](tools/vfsstat.c): Count some VFS calls, with column output. [Examples](tools/vfsstat_example.txt).
- tools/[wakeuptime](tools/wakeuptime.py): Summarize sleep to wakeup time by waker kernel stack. [Examples](tools/wakeuptime_example.txt).
- tools/[xfsdist](tools/xfsdist.py): Summarize XFS operation latency. [Examples](tools/xfsdist_example.txt).
- tools/[xfsslower](tools/xfsslower.py): Trace slow XFS operations. [Examples](tools/xfsslower_example.txt).

### Networking

Examples:

- examples/networking/[distributed_bridge/](examples/networking/distributed_bridge): Distributed bridge example.
- examples/networking/[simple_tc.py](examples/networking/simple_tc.py): Simple traffic control example.
- examples/networking/[simulation.py](examples/networking/simulation.py): Simulation helper.
- examples/networking/neighbor_sharing/[tc_neighbor_sharing.py](examples/networking/neighbor_sharing/tc_neighbor_sharing.py) examples/networking/neighbor_sharing/[tc_neighbor_sharing.c](examples/networking/neighbor_sharing/tc_neighbor_sharing.c): Per-IP classification and rate limiting.
- examples/networking/[tunnel_monitor/](examples/networking/tunnel_monitor): Efficiently monitor traffic flows. [Example video](https://www.youtube.com/watch?v=yYy3Cwce02k).
- examples/networking/vlan_learning/[vlan_learning.py](examples/networking/vlan_learning/vlan_learning.py) examples/networking/vlan_learning/[vlan_learning.c](examples/networking/vlan_learning/vlan_learning.c): Demux Ethernet traffic into worker veth+namespaces.

## Motivation

BPF guarantees that the programs loaded into the kernel cannot crash and
cannot run forever, yet BPF is general purpose enough to perform many kinds
of arbitrary computation. Currently, it is possible to write a program in
C that will compile into a valid BPF program, yet it is vastly easier to
write a C program that will compile into invalid BPF (C is like that). The user
won't know until trying to run the program whether it was valid or not.

With a BPF-specific frontend, one should be able to write in a language and
receive feedback from the compiler on the validity as it pertains to a BPF
backend. This toolkit aims to provide a frontend that can only create valid BPF
programs while still harnessing its full flexibility.

Furthermore, current integrations with BPF have a kludgy workflow, sometimes
involving compiling directly in a Linux kernel source tree. This toolchain aims
to minimize the time that a developer spends getting BPF compiled, so that they
can focus instead on the applications that can be written and the problems that
can be solved with BPF.

The features of this toolkit include:
* End-to-end BPF workflow in a shared library
* A modified C language for BPF backends
* Integration with llvm-bpf backend for JIT
* Dynamic (un)loading of JITed programs
* Support for BPF kernel hooks: socket filters, tc classifiers,
  tc actions, and kprobes
* Bindings for Python
* Examples for socket filters, tc classifiers, and kprobes
* Self-contained tools for tracing a running system

In the future, more bindings besides Python will likely be supported. Feel free
to add support for the language of your choice and send a pull request!

## Tutorial

The BCC toolchain is currently composed of two parts: a C wrapper around LLVM,
and a Python API to interact with the running program. Later, we will go into
more detail on how this all works.

### Hello, World

First, import the BPF class from the bcc module:
```python
from bcc import BPF
```

Since the C code is so short, we will embed it inside the Python script.

The BPF program always takes at least one argument, which is a pointer to the
context for this type of program. Different program types have different calling
conventions, but for this one we don't care, so `void *` is fine.
```python
BPF(text='int kprobe__sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); return 0; }').trace_print()
```

For this example, the program runs every time `fork()` is called by a
userspace process. Under the hood, fork translates to the `clone` syscall.
BCC recognizes the `kprobe__` prefix and automatically attaches the program to
the kernel symbol `sys_clone`.
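
The `kprobe__` convention amounts to a name mapping from a C function to a kernel symbol. The sketch below is hypothetical: `auto_attach_targets` is an illustrative name, and it only scans C source text for the prefix with a regular expression, which is not how BCC itself discovers these functions. The commented `attach_kprobe()` call is the explicit alternative used in the Tracing example of this README.

```python
import re

# Hypothetical model of the naming convention only: find C functions whose
# names start with "kprobe__" and derive the kernel symbol to attach to.
KPROBE_PREFIX = "kprobe__"
FUNC_RE = re.compile(r"\b" + KPROBE_PREFIX + r"(\w+)\s*\(")


def auto_attach_targets(c_text):
    """Return (bpf_function_name, kernel_symbol) pairs implied by the prefix."""
    return [(KPROBE_PREFIX + sym, sym) for sym in FUNC_RE.findall(c_text)]


if __name__ == "__main__":
    prog = 'int kprobe__sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); return 0; }'
    for fn_name, event in auto_attach_targets(prog):
        # With BCC, the explicit equivalent would be:
        #   b.attach_kprobe(event=event, fn_name=fn_name)
        print(fn_name, "->", event)
```

Note that this text scan would also match a `kprobe__` name at a call site, so it is only a rough picture of the convention, not a parser.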

The Python process then prints the contents of the trace_printk circular buffer
until Ctrl-C is pressed. The BPF program is removed from the kernel when the
userspace process that loaded it closes the fd (or exits).

Output:
```
bcc/examples$ sudo python hello_world.py
python-7282 [002] d... 3757.488508: : Hello, World!
```

For an explanation of the meaning of the printed fields, see the trace_pipe
section of the [kernel ftrace doc](https://www.kernel.org/doc/Documentation/trace/ftrace.txt).

[Source code listing](examples/hello_world.py)
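
If you want to post-process these lines yourself instead of using `trace_print()`, the sketch below splits one line into task, PID, CPU, flags, timestamp, and message. The regular expression and the `parse_trace_line` name are assumptions based only on the sample output above; ftrace output options can add or remove columns, so treat it as a starting point rather than a complete parser.

```python
"""Sketch: split one trace_pipe line into its fields.

Assumption: the layout is "<task>-<pid> [<cpu>] <flags> <timestamp>: <msg>",
as in the sample output; other kernel configurations may differ.
"""
import re

LINE_RE = re.compile(
    r"^\s*(?P<task>.+?)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+(?P<flags>\S+)\s+"
    r"(?P<ts>\d+\.\d+):\s*(?P<msg>.*)$"
)


def parse_trace_line(line):
    """Return (task, pid, cpu, flags, timestamp, message), or None if no match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    msg = m.group("msg")
    # In the sample output the message is preceded by an extra ": "
    # ("3757.488508: : Hello, World!"); drop it if present.
    if msg.startswith(": "):
        msg = msg[2:]
    return (m.group("task"), int(m.group("pid")), int(m.group("cpu")),
            m.group("flags"), float(m.group("ts")), msg)


if __name__ == "__main__":
    print(parse_trace_line("python-7282 [002] d... 3757.488508: : Hello, World!"))
```

The lazy `task` group lets task names that themselves contain hyphens (for example `gnome-shell-1234`) still split at the last `-<digits>` before the CPU field.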

### Networking

At Red Hat Summit 2015, BCC was presented as part of a [session on BPF](http://www.devnation.org/#7784f1f7513e8542e4db519e79ff5eec).
A multi-host VXLAN environment is simulated, and a BPF program is used to
monitor one of the physical interfaces. The BPF program keeps statistics on the
inner and outer IP addresses traversing the interface, and the userspace
component turns those statistics into a graph showing the traffic distribution
at multiple granularities. See the code [here](examples/networking/tunnel_monitor).

[Example video](https://youtu.be/yYy3Cwce02k)

### Tracing

Here is a slightly more complex tracing example than Hello World. This program
is invoked for every task switch in the kernel and records the old and new PIDs
in a BPF map.

The C program below introduces two new concepts.

The first is the macro `BPF_TABLE`. This defines a table (type="hash"), with key
type `key_t` and leaf type `u64` (a single counter). The table name is `stats`,
containing 1024 entries maximum. One can `lookup`, `lookup_or_init`, `update`,
and `delete` entries from the table.

The second concept is the `prev` argument. This argument is treated specially by
the BCC frontend, such that accesses to this variable are read from the saved
context that is passed by the kprobe infrastructure. The arguments starting from
position 1 (after `ctx`) should match the prototype of the kernel function being
kprobed. If they do, the program has seamless access to the function parameters.
```c
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>

struct key_t {
    u32 prev_pid;
    u32 curr_pid;
};
// map_type, key_type, leaf_type, table_name, num_entry
BPF_TABLE("hash", struct key_t, u64, stats, 1024);
// attach to finish_task_switch in kernel/sched/core.c, which has the following
// prototype:
//   struct rq *finish_task_switch(struct task_struct *prev)
int count_sched(struct pt_regs *ctx, struct task_struct *prev) {
    struct key_t key = {};
    u64 zero = 0, *val;

    // bpf_get_current_pid_tgid() returns (tgid << 32 | pid); storing it in a
    // u32 keeps the lower 32 bits, i.e. the pid of the current task
    key.curr_pid = bpf_get_current_pid_tgid();
    key.prev_pid = prev->pid;

    val = stats.lookup_or_init(&key, &zero);
    (*val)++;
    return 0;
}
```
[Source code listing](examples/tracing/task_switch.c)

The userspace component loads the file shown above and attaches it to the
`finish_task_switch` kernel function.
The `[]` operator of the BPF object gives access to each BPF_TABLE in the
program, allowing pass-through access to the values residing in the kernel. Use
the object as you would any other Python dict object: reads, updates, and
deletes are all allowed.
```python
from bcc import BPF
from time import sleep

b = BPF(src_file="task_switch.c")
b.attach_kprobe(event="finish_task_switch", fn_name="count_sched")

# generate many schedule events
for i in range(0, 100): sleep(0.01)

for k, v in b["stats"].items():
    print("task_switch[%5d->%5d]=%u" % (k.prev_pid, k.curr_pid, v.value))
```
[Source code listing](examples/tracing/task_switch.py)
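
The table semantics described in this section can be modelled in plain Python to see what `count_sched()` does on each switch. This is a hypothetical user-space stand-in (`HashTableModel` and this `count_sched` are illustrative names, not BCC APIs), assuming that a full table refuses new keys while existing keys can still be updated. It also shows that storing the u64 from `bpf_get_current_pid_tgid()` in a u32 keeps only the lower 32 bits.

```python
"""Hypothetical user-space model of the `stats` table and count_sched().

Assumption: this imitates the semantics described above (a fixed-capacity
hash table with lookup/lookup_or_init/update/delete); it is not BCC code.
"""
from collections import namedtuple

# Mirrors struct key_t { u32 prev_pid; u32 curr_pid; }
Key = namedtuple("Key", ["prev_pid", "curr_pid"])


class HashTableModel:
    """Stand-in for BPF_TABLE("hash", struct key_t, u64, stats, 1024)."""

    def __init__(self, max_entries=1024):
        self.max_entries = max_entries
        self.entries = {}

    def lookup(self, key):
        return self.entries.get(key)

    def update(self, key, value):
        # Assumed behaviour: once full, new keys are refused; existing keys still update.
        if key not in self.entries and len(self.entries) >= self.max_entries:
            return False
        self.entries[key] = value
        return True

    def delete(self, key):
        return self.entries.pop(key, None) is not None

    def lookup_or_init(self, key, init):
        # Insert `init` if the key is missing, then return the stored value
        # (None if the table was full and the key could not be added).
        if key not in self.entries:
            self.update(key, init)
        return self.entries.get(key)


def count_sched(table, pid_tgid, prev_pid):
    # Assigning the u64 from bpf_get_current_pid_tgid() to a u32 field keeps
    # the lower 32 bits, which hold the pid of the current task.
    key = Key(prev_pid=prev_pid, curr_pid=pid_tgid & 0xFFFFFFFF)
    val = table.lookup_or_init(key, 0)
    if val is None:
        return 0
    table.update(key, val + 1)  # the C code increments in place via the pointer
    return 0


if __name__ == "__main__":
    table = HashTableModel()
    # Made-up events: (pid_tgid of the incoming task, pid of the outgoing task)
    events = [((100 << 32) | 101, 7), ((100 << 32) | 101, 7), ((200 << 32) | 200, 101)]
    for pid_tgid, prev in events:
        count_sched(table, pid_tgid, prev)
    for k, v in sorted(table.entries.items()):
        print("task_switch[%5d->%5d]=%u" % (k.prev_pid, k.curr_pid, v))
```

Running the model prints two lines in the same format as task_switch.py: the `7->101` switch with a count of 2 and the `101->200` switch with a count of 1. In the real program the increment happens in kernel memory through the pointer returned by `lookup_or_init`, so no copy back to the table is needed.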

## Getting started

See [INSTALL.md](INSTALL.md) for installation steps on your platform.

## Contributing

Already pumped up to commit some code? Here are some resources to join the
discussions in the [IOVisor](https://www.iovisor.org/) community and see
what you want to work on.

* _Mailing List:_ http://lists.iovisor.org/mailman/listinfo/iovisor-dev
* _IRC:_ #iovisor at irc.oftc.net
* _IRC Logs:_ https://scrollback.io/iovisor/all
* _BCC Issue Tracker:_ [Github Issues](https://github.com/iovisor/bcc/issues)
* _A guide for contributing scripts:_ [CONTRIBUTING-SCRIPTS.md](CONTRIBUTING-SCRIPTS.md)