# BPF Compiler Collection (BCC)

This directory contains source code for BCC, a toolkit for creating small
programs that can be dynamically loaded into a Linux kernel.

The compiler relies upon eBPF (Extended Berkeley Packet Filters), which is a
feature in Linux kernels starting from 3.15. Currently, this compiler leverages
features which are mostly available in Linux 4.1 and above.

## Installing

See [INSTALL.md](INSTALL.md) for installation steps on your platform.

## Motivation

BPF guarantees that the programs loaded into the kernel cannot crash and
cannot run forever, yet BPF is general-purpose enough to perform many
kinds of computation. Currently, it is possible to write a program in
C that will compile into a valid BPF program, but it is vastly easier to
write a C program that will compile into invalid BPF (C is like that). The user
won't know whether the program is valid until trying to load it into the kernel.

With a BPF-specific frontend, one should be able to write in a language and
receive feedback from the compiler on its validity as it pertains to a BPF
backend. This toolkit aims to provide a frontend that can only create valid BPF
programs while still harnessing BPF's full flexibility.

Furthermore, current integrations with BPF have a kludgy workflow, sometimes
involving compiling directly in a Linux kernel source tree. This toolchain aims
to minimize the time that a developer spends getting BPF compiled, so that they
can focus instead on the applications that can be written and the problems that
can be solved with BPF.

The features of this toolkit include:
* End-to-end BPF workflow in a shared library
* A modified C language for BPF backends
* Integration with the llvm-bpf backend for JIT
* Dynamic (un)loading of JITed programs
* Support for BPF kernel hooks: socket filters, tc classifiers,
  tc actions, and kprobes
* Bindings for Python
* Examples for socket filters, tc classifiers, and kprobes

In the future, bindings for languages besides Python will likely be supported.
Feel free to add support for the language of your choice and send a pull request!

## Examples

This toolchain is currently composed of two parts: a C wrapper around LLVM, and
a Python API to interact with the running program. Later, we will go into more
detail about how this all works.

### Hello, World

First, we import the `BPF` class from the `bpf` module:
```python
from bpf import BPF
```

Since the C code is so short, we will embed it inside the Python script.

The BPF program always takes at least one argument, which is a pointer to the
context for this type of program. Different program types have different calling
conventions, but for this one we don't care, so `void *` is fine.
```python
# the doubled backslash leaves a literal \n escape in the C source
prog = """
int hello(void *ctx) {
    bpf_trace_printk("Hello, World!\\n");
    return 0;
}
"""
b = BPF(text=prog)
```

For this example, we will call the program every time `fork()` is called by a
userspace process. Under the hood, `fork()` translates to the `clone` syscall,
so we will attach our program to the kernel symbol `sys_clone`.
```python
# load hello() as a kprobe program and attach it to sys_clone
b.attach_kprobe(event="sys_clone", fn_name="hello")
```

The Python process will then print the kernel trace buffer, where
`bpf_trace_printk` output lands, by reading `trace_pipe` until Ctrl-C is
pressed. The BPF program is removed from the kernel when the userspace process
that loaded it closes the fd (or exits).
```python
from subprocess import call

try:
    # stream the trace buffer to stdout; trace_pipe lives under debugfs
    call(["cat", "/sys/kernel/debug/tracing/trace_pipe"])
except KeyboardInterrupt:
    pass
```

Output:
```
bcc/examples$ sudo python hello_world.py
python-7282 [002] d... 3757.488508: : Hello, World!
```

[Source code listing](examples/hello_world.py)
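
Rather than shelling out to `cat`, the script could read `trace_pipe` itself and
split each line into fields. The following is a minimal sketch, not part of the
`bpf` module: `TraceEvent`, `parse_trace_line`, and `follow_trace_pipe` are
hypothetical helpers, and the regular expression assumes the line layout seen in
the output above (task-PID, CPU, flags, timestamp, an empty symbol field, then
the message). That layout can vary with the kernel's trace options.

```python
import re
from typing import Iterator, NamedTuple, Optional

# Assumed layout of one trace_pipe line, based on the sample output:
#   <task>-<pid> [<cpu>] <flags> <timestamp>: <symbol>: <message>
# In the sample the symbol field is empty, hence the ": :" sequence.
TRACE_LINE = re.compile(
    r"^\s*(?P<task>.+)-(?P<pid>\d+)\s+"
    r"\[(?P<cpu>\d+)\]\s+"
    r"(?P<flags>\S+)\s+"
    r"(?P<ts>\d+\.\d+):\s*"
    r"(?P<sym>[^:]*):\s?"
    r"(?P<msg>.*)$"
)


class TraceEvent(NamedTuple):
    task: str
    pid: int
    cpu: int
    flags: str
    timestamp: float
    message: str


def parse_trace_line(line: str) -> Optional[TraceEvent]:
    """Split one trace_pipe line into fields; return None if it does not match."""
    m = TRACE_LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    return TraceEvent(
        task=m.group("task").strip(),
        pid=int(m.group("pid")),
        cpu=int(m.group("cpu")),
        flags=m.group("flags"),
        timestamp=float(m.group("ts")),
        message=m.group("msg"),
    )


def follow_trace_pipe(
    path: str = "/sys/kernel/debug/tracing/trace_pipe",
) -> Iterator[TraceEvent]:
    """Yield parsed events as they arrive (reading blocks; needs root)."""
    with open(path) as pipe:
        for line in pipe:
            event = parse_trace_line(line)
            if event is not None:
                yield event
```

For example, `parse_trace_line("python-7282 [002] d... 3757.488508: : Hello, World!")`
returns an event whose `pid` is 7282 and whose `message` is `"Hello, World!"`.
Because the task pattern is greedy, task names that themselves contain hyphens
still split at the last `-<digits>` before the CPU column.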

### Networking

At Red Hat Summit 2015, BCC was presented as part of a [session on BPF](http://www.devnation.org/#7784f1f7513e8542e4db519e79ff5eec).
A multi-host VXLAN environment is simulated, and a BPF program is used to
monitor one of the physical interfaces. The BPF program keeps statistics on the
inner and outer IP addresses traversing the interface, and the userspace
component turns those statistics into a graph showing the traffic distribution
at multiple granularities. See the code [here](examples/tunnel_monitor).

[Video](https://youtu.be/yYy3Cwce02k)

### Tracing

Here is a slightly more complex tracing example than Hello World. This program
will be invoked for every task switch in the kernel, and will record the old and
new pids in a BPF map.

The C program below introduces two new concepts.

The first is the macro `BPF_TABLE`. This defines a table (type `"hash"`) with
key type `struct key_t` and leaf type `u64` (a single counter). The table name
is `stats`, and it holds at most 1024 entries. One can `lookup`,
`lookup_or_init`, `update`, and `delete` entries in the table.

The second concept is the `prev` argument. This argument is treated specially by
the BCC frontend, such that accesses to this variable are read from the saved
context that is passed by the kprobe infrastructure. The prototype of the
arguments starting from position 1 (the first argument after `ctx`) should match
the prototype of the kernel function being kprobed. If it does, the program will
have seamless access to the function parameters.
```c
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>

struct key_t {
    u32 prev_pid;
    u32 curr_pid;
};
// map_type, key_type, leaf_type, table_name, num_entry
BPF_TABLE("hash", struct key_t, u64, stats, 1024);
int count_sched(struct pt_regs *ctx, struct task_struct *prev) {
    struct key_t key = {};
    u64 zero = 0, *val;

    key.curr_pid = bpf_get_current_pid_tgid();
    key.prev_pid = prev->pid;

    val = stats.lookup_or_init(&key, &zero);
    (*val)++;
    return 0;
}
```
[Source code listing](examples/task_switch.c)
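
To make the table semantics concrete, here is a pure-Python model of what
`count_sched` does on each call. It is a hypothetical sketch, not BCC code:
`HashTableModel`, `low_u32`, and the Python `count_sched` are illustrative
names, a tuple stands in for `struct key_t`, and a `ctypes.c_uint64` plays the
role of the `u64` leaf. It assumes that inserting a new key into a full hash
table fails; how the C version handles that case is left to the BCC frontend
and is not modeled here. It also shows why `key.curr_pid =
bpf_get_current_pid_tgid();` yields a thread ID: the helper returns the
thread-group ID in the upper 32 bits and the thread ID in the lower 32 bits, and
assigning the result to a `u32` keeps only the lower half.

```python
import ctypes
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for struct key_t: (prev_pid, curr_pid).
Key = Tuple[int, int]


class HashTableModel:
    """Illustrative model of a BPF "hash" table with a fixed entry limit."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self._entries: Dict[Key, ctypes.c_uint64] = {}

    def lookup(self, key: Key) -> Optional[ctypes.c_uint64]:
        """Return the stored leaf, or None if the key is absent."""
        return self._entries.get(key)

    def lookup_or_init(
        self, key: Key, init: ctypes.c_uint64
    ) -> Optional[ctypes.c_uint64]:
        """Return the stored leaf, inserting a copy of `init` if the key is new.

        Assumption: inserting a new key into a full table fails, in which
        case None is returned and the caller skips the update.
        """
        leaf = self._entries.get(key)
        if leaf is None:
            if len(self._entries) >= self.max_entries:
                return None
            leaf = ctypes.c_uint64(init.value)  # the table keeps its own copy
            self._entries[key] = leaf
        return leaf

    def items(self) -> List[Tuple[Key, ctypes.c_uint64]]:
        return list(self._entries.items())


def low_u32(pid_tgid: int) -> int:
    """Model assigning the u64 helper result to a u32 field: keep the low 32 bits."""
    return pid_tgid & 0xFFFFFFFF


def count_sched(stats: HashTableModel, pid_tgid: int, prev_pid: int) -> int:
    """Python transliteration of count_sched() from task_switch.c."""
    key = (prev_pid, low_u32(pid_tgid))
    val = stats.lookup_or_init(key, ctypes.c_uint64(0))
    if val is not None:
        val.value += 1  # the (*val)++ step in the C version
    return 0
```

For instance, with `stats = HashTableModel(1024)`, calling
`count_sched(stats, (100 << 32) | 105, 7)` twice leaves
`stats.lookup((7, 105)).value` equal to 2: the thread-group ID 100 in the upper
half never reaches the key.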

The userspace component loads the file shown above and attaches it to the
`finish_task_switch` kernel function (which takes one `struct task_struct *`
argument). The `get_table` API, also reachable by indexing the `BPF` object as
in `b["stats"]`, returns an object that gives dict-style access to the `stats`
BPF map. The Python program could use that handle to modify the kernel table
as well.
```python
from bpf import BPF
from time import sleep

b = BPF(src_file="task_switch.c")
b.attach_kprobe(event="finish_task_switch", fn_name="count_sched")

# generate many schedule events
for _ in range(100):
    sleep(0.01)

for k, v in b["stats"].items():
    print("task_switch[%5d->%5d]=%u" % (k.prev_pid, k.curr_pid, v.value))
```
[Source code listing](examples/task_switch.py)
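
The loop above prints entries in whatever order the map returns them. A small,
hypothetical extension sorts them by count in userspace before printing. It
assumes, as the loop implies, that keys expose `prev_pid` and `curr_pid` and
that leaves expose `.value`; `KeyT` and `top_switches` are illustrative names,
with `KeyT` standing in for the key objects the `stats` handle yields.

```python
import ctypes
from typing import Iterable, List, Tuple


class KeyT(ctypes.Structure):
    """Stand-in for struct key_t from task_switch.c (two u32 fields)."""
    _fields_ = [("prev_pid", ctypes.c_uint32), ("curr_pid", ctypes.c_uint32)]


def top_switches(
    items: Iterable[Tuple[KeyT, ctypes.c_uint64]], n: int = 10
) -> List[str]:
    """Format the n most frequent prev->curr switches, busiest first."""
    ranked = sorted(items, key=lambda kv: kv[1].value, reverse=True)
    return [
        "task_switch[%5d->%5d]=%u" % (key.prev_pid, key.curr_pid, leaf.value)
        for key, leaf in ranked[:n]
    ]
```

In the script above this would be used as
`for line in top_switches(b["stats"].items(), n=5): print(line)`. Sorting in
Python keeps the BPF program itself minimal: the kernel side only counts.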

## Requirements

To get started using this toolchain in binary format, one needs:
* Linux kernel 4.1 or newer, with these flags enabled:
  * `CONFIG_BPF=y`
  * `CONFIG_BPF_SYSCALL=y`
  * `CONFIG_NET_CLS_BPF=m` [optional, for tc filters]
  * `CONFIG_NET_ACT_BPF=m` [optional, for tc actions]
  * `CONFIG_BPF_JIT=y`
  * `CONFIG_HAVE_BPF_JIT=y`
  * `CONFIG_BPF_EVENTS=y` [optional, for kprobes]
* Headers for the above kernel
* gcc, make, python
* python-pyroute2 (for some networking features only)
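
A quick way to compare a kernel against the flag list above is to scan its build
configuration. The script below is a hypothetical helper, not part of BCC; it
assumes the config is available as `/proc/config.gz` or
`/boot/config-<release>` (locations differ between distributions) and treats a
flag listed as `=m` as also satisfied when built in (`=y`).

```python
import gzip
import os
import platform
import re
from typing import Dict, List, Tuple

# Flags from the list above: name -> (value listed in this README, required?)
FLAGS = {
    "CONFIG_BPF": ("y", True),
    "CONFIG_BPF_SYSCALL": ("y", True),
    "CONFIG_NET_CLS_BPF": ("m", False),
    "CONFIG_NET_ACT_BPF": ("m", False),
    "CONFIG_BPF_JIT": ("y", True),
    "CONFIG_HAVE_BPF_JIT": ("y", True),
    "CONFIG_BPF_EVENTS": ("y", False),
}


def parse_config(text: str) -> Dict[str, str]:
    """Map CONFIG_* names to values; "# CONFIG_X is not set" lines are skipped."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        values[name] = value
    return values


def missing_flags(text: str) -> Tuple[List[str], List[str]]:
    """Return (missing required flags, missing optional flags).

    Assumption: a flag listed as =m is also satisfied when built in (=y).
    """
    values = parse_config(text)
    required: List[str] = []
    optional: List[str] = []
    for name, (listed, is_required) in FLAGS.items():
        accepted = {"y", "m"} if listed == "m" else {"y"}
        if values.get(name) not in accepted:
            (required if is_required else optional).append(name)
    return required, optional


def kernel_at_least(release: str, minimum: Tuple[int, int] = (4, 1)) -> bool:
    """Compare the leading major.minor of a release string such as "4.2.0-rc3"."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) >= minimum


def read_running_config() -> str:
    """Read the running kernel's config; these locations vary by distribution."""
    if os.path.exists("/proc/config.gz"):
        with gzip.open("/proc/config.gz", "rt") as f:
            return f.read()
    with open("/boot/config-" + platform.release()) as f:
        return f.read()
```

On a target machine, `missing_flags(read_running_config())` and
`kernel_at_least(platform.release())` give a quick pre-flight check before
installing.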

## Getting started

As of this writing, binary packages for the above requirements are available
only as unstable (pre-release) builds. Both Ubuntu and Fedora have 4.2-rcX
kernel builds with the above flags enabled by default. LLVM provides 3.7
packages for Ubuntu (but not yet for Fedora).

See [INSTALL.md](INSTALL.md) for installation steps on your platform.