# BPF Compiler Collection (BCC)

This directory contains source code for BCC, a toolkit for creating small
programs that can be dynamically loaded into a Linux kernel.

The compiler relies upon eBPF (Extended Berkeley Packet Filters), which is a
feature in Linux kernels starting from 3.15. Currently, this compiler leverages
features which are mostly available in Linux 4.1 and above.

## Installing

See [INSTALL.md](INSTALL.md) for installation steps on your platform.

## Motivation

BPF guarantees that the programs loaded into the kernel cannot crash and
cannot run forever, yet BPF is general-purpose enough to perform many
arbitrary types of computation. Currently, it is possible to write a program in
C that will compile into a valid BPF program, yet it is vastly easier to
write a C program that will compile into invalid BPF (C is like that). The user
won't know whether the program is valid until trying to run it.

With a BPF-specific frontend, one should be able to write in a language and
receive feedback from the compiler on the validity as it pertains to a BPF
backend. This toolkit aims to provide a frontend that can only create valid BPF
programs while still harnessing its full flexibility.

Furthermore, current integrations with BPF have a kludgy workflow, sometimes
involving compiling directly in a Linux kernel source tree. This toolchain aims
to minimize the time that a developer spends getting BPF compiled, so that the
focus can instead be on the applications that can be written and the problems
that can be solved with BPF.

The features of this toolkit include:
* End-to-end BPF workflow in a shared library
* A modified C language for BPF backends
* Integration with llvm-bpf backend for JIT
* Dynamic (un)loading of JITed programs
* Support for BPF kernel hooks: socket filters, tc classifiers,
  tc actions, and kprobes
* Bindings for Python
* Examples for socket filters, tc classifiers, and kprobes

In the future, more bindings besides Python will likely be supported. Feel free
to add support for the language of your choice and send a pull request!

## Examples

This toolchain is currently composed of two parts: a C wrapper around LLVM, and
a Python API to interact with the running program. Later, we will go into more
detail of how this all works.

### Hello, World

First, we should include the BPF class from the bpf module:
```python
from bpf import BPF
```

Since the C code is so short, we will embed it inside the Python script.

The BPF program always takes at least one argument, which is a pointer to the
context for this type of program. Different program types have different calling
conventions, but for this one we don't care, so `void *` is fine.
```python
prog = """
int hello(void *ctx) {
  bpf_trace_printk("Hello, World!\\n");
  return 0;
};
"""
b = BPF(text=prog)
```

For this example, we will call the program every time `fork()` is called by a
userspace process. Under the hood, `fork()` translates to the `clone` syscall,
so we will attach our program to the kernel symbol `sys_clone`.
```python
fn = b.load_func("hello", BPF.KPROBE)
BPF.attach_kprobe(fn, "sys_clone")
```

The Python process will then print the trace printk circular buffer until Ctrl-C
is pressed. The BPF program is removed from the kernel when the userspace
process that loaded it closes the fd (or exits).
```python
from subprocess import call
try:
    call(["cat", "/sys/kernel/debug/tracing/trace_pipe"])
except KeyboardInterrupt:
    pass
```

Output:
```
bcc/examples$ sudo python hello_world.py
          python-7282  [002] d...  3757.488508: : Hello, World!
```

[Source code listing](examples/hello_world.py)
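
Each line of the output above carries a task name and pid, a CPU number, a flags
field, a timestamp, and the message. As a rough sketch (not part of BCC), such a
line could be split into fields with a regular expression. The name
`parse_trace_line` and the field names are made up for illustration, and the
pattern is fitted only to the sample line shown above, so other kernels may need
a different pattern.

```python
import re

# Pattern fitted to the sample output above; the trace_pipe layout on other
# kernels may differ, so treat this as an assumption.
TRACE_LINE = re.compile(
    r"^\s*(?P<task>.+)-(?P<pid>\d+)\s+"
    r"\[(?P<cpu>\d+)\]\s+"
    r"(?P<flags>\S+)\s+"
    r"(?P<timestamp>\d+\.\d+):\s+"
    r"(?P<function>[^:]*):\s*"
    r"(?P<message>.*)$"
)


def parse_trace_line(line):
    """Split one trace_pipe line into a dict of fields, or return None."""
    match = TRACE_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    fields["cpu"] = int(fields["cpu"])
    fields["timestamp"] = float(fields["timestamp"])
    fields["function"] = fields["function"].strip()
    return fields


sample = "python-7282 [002] d... 3757.488508: : Hello, World!"
print(parse_trace_line(sample))
```

A script could read `trace_pipe` line by line and pass each line through a
helper like this instead of shelling out to `cat`.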

### Networking

At Red Hat Summit 2015, BCC was presented as part of a [session on BPF](http://www.devnation.org/#7784f1f7513e8542e4db519e79ff5eec).
A multi-host vxlan environment is simulated and a BPF program used to monitor
one of the physical interfaces. The BPF program keeps statistics on the inner
and outer IP addresses traversing the interface, and the userspace component
turns those statistics into a graph showing the traffic distribution at
multiple granularities. See the code [here](examples/tunnel_monitor).

[Video](https://youtu.be/yYy3Cwce02k)

### Tracing

Here is a slightly more complex tracing example than Hello World. This program
will be invoked for every task switch in the kernel, and will record the new and
old pids in a BPF map.

The C program below introduces two new concepts.
The first is the macro `BPF_TABLE`. This defines a table (type="hash"), with key
type `struct key_t` and leaf type `u64` (a single counter). The table name is
`stats`, and it holds at most 1024 entries. One can `lookup`, `lookup_or_init`,
`update`, and `delete` entries from the table.
The second concept is the `prev` argument. This argument is treated specially by
the BCC frontend, such that accesses to this variable are read from the saved
context that is passed by the kprobe infrastructure. The prototype of the args
starting from position 1 should match the prototype of the kernel function being
kprobed. When it does, the program has seamless access to the function
parameters.
```c
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>

struct key_t {
  u32 prev_pid;
  u32 curr_pid;
};
// map_type, key_type, leaf_type, table_name, num_entry
BPF_TABLE("hash", struct key_t, u64, stats, 1024);
int count_sched(struct pt_regs *ctx, struct task_struct *prev) {
  struct key_t key = {};
  u64 zero = 0, *val;

  // the u64 return value is truncated to its lower 32 bits on assignment
  key.curr_pid = bpf_get_current_pid_tgid();
  key.prev_pid = prev->pid;

  // fetch the counter for this (prev, curr) pair, creating it at 0 if absent
  val = stats.lookup_or_init(&key, &zero);
  (*val)++;
  return 0;
}
```
[Source code listing](examples/task_switch.c)

The userspace component loads the file shown above, and attaches it to the
`finish_task_switch` kernel function (which takes one `struct task_struct *`
argument). The `get_table` API returns an object that gives dict-style access
to the `stats` BPF map. The Python program could use that handle to modify the
kernel table as well.
```python
from bpf import BPF
from time import sleep

b = BPF(src_file="task_switch.c")
fn = b.load_func("count_sched", BPF.KPROBE)
stats = b.get_table("stats")
BPF.attach_kprobe(fn, "finish_task_switch")

# generate many schedule events
for i in range(0, 100):
    sleep(0.01)

# each key mirrors struct key_t; each leaf wraps the u64 counter
for k, v in stats.items():
    print("task_switch[%5d->%5d]=%u" % (k.prev_pid, k.curr_pid, v.value))
```
[Source code listing](examples/task_switch.py)
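
Once the counters are in userspace, ordinary Python can post-process them. The
sketch below ranks (key, leaf) pairs by count. The `Key` structure and the
sample values are stand-ins built with `ctypes` so that the snippet runs without
a kernel, and `top_switches` is a hypothetical helper rather than part of the
BCC API. It assumes only what the loop above relies on: keys with `prev_pid` and
`curr_pid` attributes, and leaves with a `.value`.

```python
import ctypes


class Key(ctypes.Structure):
    # Stand-in mirroring struct key_t from task_switch.c.
    _fields_ = [("prev_pid", ctypes.c_uint32),
                ("curr_pid", ctypes.c_uint32)]


def top_switches(items, n=3):
    """Return the n most frequent (prev_pid, curr_pid, count) tuples."""
    rows = [(k.prev_pid, k.curr_pid, v.value) for k, v in items]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:n]


# Made-up sample data standing in for stats.items().
sample = [
    (Key(0, 1200), ctypes.c_uint64(41)),
    (Key(1200, 0), ctypes.c_uint64(40)),
    (Key(7, 0), ctypes.c_uint64(3)),
    (Key(0, 7), ctypes.c_uint64(5)),
]

for prev_pid, curr_pid, count in top_switches(sample, n=2):
    print("task_switch[%5d->%5d]=%u" % (prev_pid, curr_pid, count))
```

With a real table, the intended call would be `top_switches(stats.items())`,
under the same assumptions about the key and leaf objects.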

## Requirements

To get started using this toolchain in binary format, one needs:
* Linux kernel 4.1 or newer, with these flags enabled:
  * `CONFIG_BPF=y`
  * `CONFIG_BPF_SYSCALL=y`
  * `CONFIG_NET_CLS_BPF=m` [optional, for tc filters]
  * `CONFIG_NET_ACT_BPF=m` [optional, for tc actions]
  * `CONFIG_BPF_JIT=y`
  * `CONFIG_HAVE_BPF_JIT=y`
  * `CONFIG_BPF_EVENTS=y` [optional, for kprobes]
* Headers for the above kernel
* gcc, make, python
* python-pyroute2 (for some networking features only)
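
One way to check the kernel flags listed above is to scan the kernel config
text. The sketch below is an illustration rather than a BCC tool:
`missing_flags` is a hypothetical helper, treating `y` or `m` as acceptable for
the optional flags is an assumption, and the location of the config file varies
by distribution, so the example parses an inline sample string instead of
reading a real file.

```python
# Flags taken from the list above. Accepting "m" as well as "y" for the
# optional flags is an assumption of this sketch.
REQUIRED = ["CONFIG_BPF", "CONFIG_BPF_SYSCALL", "CONFIG_BPF_JIT",
            "CONFIG_HAVE_BPF_JIT"]
OPTIONAL = ["CONFIG_NET_CLS_BPF", "CONFIG_NET_ACT_BPF", "CONFIG_BPF_EVENTS"]


def parse_config(text):
    """Map each CONFIG_* name to its value, skipping comments and blanks."""
    flags = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            flags[name] = value
    return flags


def missing_flags(text):
    """Return (missing_required, missing_optional) lists of flag names."""
    flags = parse_config(text)
    required = [f for f in REQUIRED if flags.get(f) != "y"]
    optional = [f for f in OPTIONAL if flags.get(f) not in ("y", "m")]
    return required, optional


# Made-up config excerpt used in place of a real kernel config file.
sample_config = """
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
# CONFIG_BPF_JIT is not set
CONFIG_HAVE_BPF_JIT=y
CONFIG_NET_CLS_BPF=m
CONFIG_BPF_EVENTS=y
"""

required, optional = missing_flags(sample_config)
print("missing required:", required)
print("missing optional:", optional)
```

On a real system, the text passed to `missing_flags` would come from wherever
the distribution ships the running kernel's config.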

## Getting started

As of this writing, binary packages for the above requirements are available
in unstable formats. Both Ubuntu and Fedora have 4.2-rcX builds with the above
flags defaulted to on. LLVM provides 3.7 Ubuntu packages (but not Fedora yet).

See [INSTALL.md](INSTALL.md) for installation steps on your platform.