blob: 0541fe1de70482e74fa2f10c409130b1ac21ef4b [file] [log] [blame]
Jim Kenistond27a4dd2005-08-04 12:53:35 -07001Title : Kernel Probes (Kprobes)
2Authors : Jim Keniston <jkenisto@us.ibm.com>
3 : Prasanna S Panchamukhi <prasanna@in.ibm.com>
4
5CONTENTS
6
71. Concepts: Kprobes, Jprobes, Return Probes
82. Architectures Supported
93. Configuring Kprobes
104. API Reference
115. Kprobes Features and Limitations
126. Probe Overhead
137. TODO
148. Kprobes Example
159. Jprobes Example
1610. Kretprobes Example
17
181. Concepts: Kprobes, Jprobes, Return Probes
19
20Kprobes enables you to dynamically break into any kernel routine and
21collect debugging and performance information non-disruptively. You
22can trap at almost any kernel code address, specifying a handler
23routine to be invoked when the breakpoint is hit.
24
25There are currently three types of probes: kprobes, jprobes, and
26kretprobes (also called return probes). A kprobe can be inserted
27on virtually any instruction in the kernel. A jprobe is inserted at
28the entry to a kernel function, and provides convenient access to the
29function's arguments. A return probe fires when a specified function
30returns.
31
32In the typical case, Kprobes-based instrumentation is packaged as
33a kernel module. The module's init function installs ("registers")
34one or more probes, and the exit function unregisters them. A
35registration function such as register_kprobe() specifies where
36the probe is to be inserted and what handler is to be called when
37the probe is hit.
38
39The next three subsections explain how the different types of
40probes work. They explain certain things that you'll need to
41know in order to make the best use of Kprobes -- e.g., the
42difference between a pre_handler and a post_handler, and how
43to use the maxactive and nmissed fields of a kretprobe. But
44if you're in a hurry to start using Kprobes, you can skip ahead
45to section 2.
46
471.1 How Does a Kprobe Work?
48
49When a kprobe is registered, Kprobes makes a copy of the probed
50instruction and replaces the first byte(s) of the probed instruction
51with a breakpoint instruction (e.g., int3 on i386 and x86_64).
52
53When a CPU hits the breakpoint instruction, a trap occurs, the CPU's
54registers are saved, and control passes to Kprobes via the
55notifier_call_chain mechanism. Kprobes executes the "pre_handler"
56associated with the kprobe, passing the handler the addresses of the
57kprobe struct and the saved registers.
58
59Next, Kprobes single-steps its copy of the probed instruction.
60(It would be simpler to single-step the actual instruction in place,
61but then Kprobes would have to temporarily remove the breakpoint
62instruction. This would open a small time window when another CPU
63could sail right past the probepoint.)
64
65After the instruction is single-stepped, Kprobes executes the
66"post_handler," if any, that is associated with the kprobe.
67Execution then continues with the instruction following the probepoint.
68
691.2 How Does a Jprobe Work?
70
71A jprobe is implemented using a kprobe that is placed on a function's
72entry point. It employs a simple mirroring principle to allow
73seamless access to the probed function's arguments. The jprobe
74handler routine should have the same signature (arg list and return
75type) as the function being probed, and must always end by calling
76the Kprobes function jprobe_return().
77
78Here's how it works. When the probe is hit, Kprobes makes a copy of
79the saved registers and a generous portion of the stack (see below).
80Kprobes then points the saved instruction pointer at the jprobe's
81handler routine, and returns from the trap. As a result, control
82passes to the handler, which is presented with the same register and
83stack contents as the probed function. When it is done, the handler
84calls jprobe_return(), which traps again to restore the original stack
85contents and processor state and switch to the probed function.
86
87By convention, the callee owns its arguments, so gcc may produce code
88that unexpectedly modifies that portion of the stack. This is why
89Kprobes saves a copy of the stack and restores it after the jprobe
90handler has run. Up to MAX_STACK_SIZE bytes are copied -- e.g.,
9164 bytes on i386.
92
93Note that the probed function's args may be passed on the stack
94or in registers (e.g., for x86_64 or for an i386 fastcall function).
95The jprobe will work in either case, so long as the handler's
96prototype matches that of the probed function.
97
981.3 How Does a Return Probe Work?
99
100When you call register_kretprobe(), Kprobes establishes a kprobe at
101the entry to the function. When the probed function is called and this
102probe is hit, Kprobes saves a copy of the return address, and replaces
103the return address with the address of a "trampoline." The trampoline
104is an arbitrary piece of code -- typically just a nop instruction.
105At boot time, Kprobes registers a kprobe at the trampoline.
106
107When the probed function executes its return instruction, control
108passes to the trampoline and that probe is hit. Kprobes' trampoline
109handler calls the user-specified handler associated with the kretprobe,
110then sets the saved instruction pointer to the saved return address,
111and that's where execution resumes upon return from the trap.
112
113While the probed function is executing, its return address is
114stored in an object of type kretprobe_instance. Before calling
115register_kretprobe(), the user sets the maxactive field of the
116kretprobe struct to specify how many instances of the specified
117function can be probed simultaneously. register_kretprobe()
118pre-allocates the indicated number of kretprobe_instance objects.
119
120For example, if the function is non-recursive and is called with a
121spinlock held, maxactive = 1 should be enough. If the function is
122non-recursive and can never relinquish the CPU (e.g., via a semaphore
123or preemption), NR_CPUS should be enough. If maxactive <= 0, it is
124set to a default value. If CONFIG_PREEMPT is enabled, the default
125is max(10, 2*NR_CPUS). Otherwise, the default is NR_CPUS.
126
127It's not a disaster if you set maxactive too low; you'll just miss
128some probes. In the kretprobe struct, the nmissed field is set to
129zero when the return probe is registered, and is incremented every
130time the probed function is entered but there is no kretprobe_instance
131object available for establishing the return probe.
132
1332. Architectures Supported
134
135Kprobes, jprobes, and return probes are implemented on the following
136architectures:
137
138- i386
139- x86_64 (AMD-64, E64MT)
140- ppc64
141- ia64 (Support for probes on certain instruction types is still in progress.)
142- sparc64 (Return probes not yet implemented.)
143
1443. Configuring Kprobes
145
146When configuring the kernel using make menuconfig/xconfig/oldconfig,
147ensure that CONFIG_KPROBES is set to "y". Under "Kernel hacking",
148look for "Kprobes". You may have to enable "Kernel debugging"
149(CONFIG_DEBUG_KERNEL) before you can enable Kprobes.
150
151You may also want to ensure that CONFIG_KALLSYMS and perhaps even
152CONFIG_KALLSYMS_ALL are set to "y", since kallsyms_lookup_name()
153is a handy, version-independent way to find a function's address.
154
155If you need to insert a probe in the middle of a function, you may find
156it useful to "Compile the kernel with debug info" (CONFIG_DEBUG_INFO),
157so you can use "objdump -d -l vmlinux" to see the source-to-object
158code mapping.
159
1604. API Reference
161
162The Kprobes API includes a "register" function and an "unregister"
163function for each type of probe. Here are terse, mini-man-page
164specifications for these functions and the associated probe handlers
165that you'll write. See the latter half of this document for examples.
166
1674.1 register_kprobe
168
169#include <linux/kprobes.h>
170int register_kprobe(struct kprobe *kp);
171
172Sets a breakpoint at the address kp->addr. When the breakpoint is
173hit, Kprobes calls kp->pre_handler. After the probed instruction
174is single-stepped, Kprobe calls kp->post_handler. If a fault
175occurs during execution of kp->pre_handler or kp->post_handler,
176or during single-stepping of the probed instruction, Kprobes calls
177kp->fault_handler. Any or all handlers can be NULL.
178
179register_kprobe() returns 0 on success, or a negative errno otherwise.
180
181User's pre-handler (kp->pre_handler):
182#include <linux/kprobes.h>
183#include <linux/ptrace.h>
184int pre_handler(struct kprobe *p, struct pt_regs *regs);
185
186Called with p pointing to the kprobe associated with the breakpoint,
187and regs pointing to the struct containing the registers saved when
188the breakpoint was hit. Return 0 here unless you're a Kprobes geek.
189
190User's post-handler (kp->post_handler):
191#include <linux/kprobes.h>
192#include <linux/ptrace.h>
193void post_handler(struct kprobe *p, struct pt_regs *regs,
194 unsigned long flags);
195
196p and regs are as described for the pre_handler. flags always seems
197to be zero.
198
199User's fault-handler (kp->fault_handler):
200#include <linux/kprobes.h>
201#include <linux/ptrace.h>
202int fault_handler(struct kprobe *p, struct pt_regs *regs, int trapnr);
203
204p and regs are as described for the pre_handler. trapnr is the
205architecture-specific trap number associated with the fault (e.g.,
206on i386, 13 for a general protection fault or 14 for a page fault).
207Returns 1 if it successfully handled the exception.
208
2094.2 register_jprobe
210
211#include <linux/kprobes.h>
212int register_jprobe(struct jprobe *jp)
213
214Sets a breakpoint at the address jp->kp.addr, which must be the address
215of the first instruction of a function. When the breakpoint is hit,
216Kprobes runs the handler whose address is jp->entry.
217
218The handler should have the same arg list and return type as the probed
219function; and just before it returns, it must call jprobe_return().
220(The handler never actually returns, since jprobe_return() returns
221control to Kprobes.) If the probed function is declared asmlinkage,
222fastcall, or anything else that affects how args are passed, the
223handler's declaration must match.
224
225register_jprobe() returns 0 on success, or a negative errno otherwise.
226
2274.3 register_kretprobe
228
229#include <linux/kprobes.h>
230int register_kretprobe(struct kretprobe *rp);
231
232Establishes a return probe for the function whose address is
233rp->kp.addr. When that function returns, Kprobes calls rp->handler.
234You must set rp->maxactive appropriately before you call
235register_kretprobe(); see "How Does a Return Probe Work?" for details.
236
237register_kretprobe() returns 0 on success, or a negative errno
238otherwise.
239
240User's return-probe handler (rp->handler):
241#include <linux/kprobes.h>
242#include <linux/ptrace.h>
243int kretprobe_handler(struct kretprobe_instance *ri, struct pt_regs *regs);
244
245regs is as described for kprobe.pre_handler. ri points to the
246kretprobe_instance object, of which the following fields may be
247of interest:
248- ret_addr: the return address
249- rp: points to the corresponding kretprobe object
250- task: points to the corresponding task struct
251The handler's return value is currently ignored.
252
2534.4 unregister_*probe
254
255#include <linux/kprobes.h>
256void unregister_kprobe(struct kprobe *kp);
257void unregister_jprobe(struct jprobe *jp);
258void unregister_kretprobe(struct kretprobe *rp);
259
260Removes the specified probe. The unregister function can be called
261at any time after the probe has been registered.
262
2635. Kprobes Features and Limitations
264
265As of Linux v2.6.12, Kprobes allows multiple probes at the same
266address. Currently, however, there cannot be multiple jprobes on
267the same function at the same time.
268
269In general, you can install a probe anywhere in the kernel.
270In particular, you can probe interrupt handlers. Known exceptions
271are discussed in this section.
272
273For obvious reasons, it's a bad idea to install a probe in
274the code that implements Kprobes (mostly kernel/kprobes.c and
275arch/*/kernel/kprobes.c). A patch in the v2.6.13 timeframe instructs
276Kprobes to reject such requests.
277
278If you install a probe in an inline-able function, Kprobes makes
279no attempt to chase down all inline instances of the function and
280install probes there. gcc may inline a function without being asked,
281so keep this in mind if you're not seeing the probe hits you expect.
282
283A probe handler can modify the environment of the probed function
284-- e.g., by modifying kernel data structures, or by modifying the
285contents of the pt_regs struct (which are restored to the registers
286upon return from the breakpoint). So Kprobes can be used, for example,
287to install a bug fix or to inject faults for testing. Kprobes, of
288course, has no way to distinguish the deliberately injected faults
289from the accidental ones. Don't drink and probe.
290
291Kprobes makes no attempt to prevent probe handlers from stepping on
292each other -- e.g., probing printk() and then calling printk() from a
293probe handler. As of Linux v2.6.12, if a probe handler hits a probe,
294that second probe's handlers won't be run in that instance.
295
296In Linux v2.6.12 and previous versions, Kprobes' data structures are
297protected by a single lock that is held during probe registration and
298unregistration and while handlers are run. Thus, no two handlers
299can run simultaneously. To improve scalability on SMP systems,
300this restriction will probably be removed soon, in which case
301multiple handlers (or multiple instances of the same handler) may
302run concurrently on different CPUs. Code your handlers accordingly.
303
304Kprobes does not use semaphores or allocate memory except during
305registration and unregistration.
306
307Probe handlers are run with preemption disabled. Depending on the
308architecture, handlers may also run with interrupts disabled. In any
309case, your handler should not yield the CPU (e.g., by attempting to
310acquire a semaphore).
311
312Since a return probe is implemented by replacing the return
313address with the trampoline's address, stack backtraces and calls
314to __builtin_return_address() will typically yield the trampoline's
315address instead of the real return address for kretprobed functions.
316(As far as we can tell, __builtin_return_address() is used only
317for instrumentation and error reporting.)
318
319If the number of times a function is called does not match the
320number of times it returns, registering a return probe on that
321function may produce undesirable results. We have the do_exit()
322and do_execve() cases covered. do_fork() is not an issue. We're
323unaware of other specific cases where this could be a problem.
324
3256. Probe Overhead
326
327On a typical CPU in use in 2005, a kprobe hit takes 0.5 to 1.0
328microseconds to process. Specifically, a benchmark that hits the same
329probepoint repeatedly, firing a simple handler each time, reports 1-2
330million hits per second, depending on the architecture. A jprobe or
331return-probe hit typically takes 50-75% longer than a kprobe hit.
332When you have a return probe set on a function, adding a kprobe at
333the entry to that function adds essentially no overhead.
334
335Here are sample overhead figures (in usec) for different architectures.
336k = kprobe; j = jprobe; r = return probe; kr = kprobe + return probe
337on same function; jr = jprobe + return probe on same function
338
339i386: Intel Pentium M, 1495 MHz, 2957.31 bogomips
340k = 0.57 usec; j = 1.00; r = 0.92; kr = 0.99; jr = 1.40
341
342x86_64: AMD Opteron 246, 1994 MHz, 3971.48 bogomips
343k = 0.49 usec; j = 0.76; r = 0.80; kr = 0.82; jr = 1.07
344
345ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU)
346k = 0.77 usec; j = 1.31; r = 1.26; kr = 1.45; jr = 1.99
347
3487. TODO
349
350a. SystemTap (http://sourceware.org/systemtap): Work in progress
351to provide a simplified programming interface for probe-based
352instrumentation.
353b. Improved SMP scalability: Currently, work is in progress to handle
354multiple kprobes in parallel.
355c. Kernel return probes for sparc64.
356d. Support for other architectures.
357e. User-space probes.
358
3598. Kprobes Example
360
361Here's a sample kernel module showing the use of kprobes to dump a
362stack trace and selected i386 registers when do_fork() is called.
363----- cut here -----
364/*kprobe_example.c*/
365#include <linux/kernel.h>
366#include <linux/module.h>
367#include <linux/kprobes.h>
368#include <linux/kallsyms.h>
369#include <linux/sched.h>
370
371/*For each probe you need to allocate a kprobe structure*/
372static struct kprobe kp;
373
374/*kprobe pre_handler: called just before the probed instruction is executed*/
375int handler_pre(struct kprobe *p, struct pt_regs *regs)
376{
377 printk("pre_handler: p->addr=0x%p, eip=%lx, eflags=0x%lx\n",
378 p->addr, regs->eip, regs->eflags);
379 dump_stack();
380 return 0;
381}
382
383/*kprobe post_handler: called after the probed instruction is executed*/
384void handler_post(struct kprobe *p, struct pt_regs *regs, unsigned long flags)
385{
386 printk("post_handler: p->addr=0x%p, eflags=0x%lx\n",
387 p->addr, regs->eflags);
388}
389
390/* fault_handler: this is called if an exception is generated for any
391 * instruction within the pre- or post-handler, or when Kprobes
392 * single-steps the probed instruction.
393 */
394int handler_fault(struct kprobe *p, struct pt_regs *regs, int trapnr)
395{
396 printk("fault_handler: p->addr=0x%p, trap #%dn",
397 p->addr, trapnr);
398 /* Return 0 because we don't handle the fault. */
399 return 0;
400}
401
402int init_module(void)
403{
404 int ret;
405 kp.pre_handler = handler_pre;
406 kp.post_handler = handler_post;
407 kp.fault_handler = handler_fault;
408 kp.addr = (kprobe_opcode_t*) kallsyms_lookup_name("do_fork");
409 /* register the kprobe now */
410 if (!kp.addr) {
411 printk("Couldn't find %s to plant kprobe\n", "do_fork");
412 return -1;
413 }
414 if ((ret = register_kprobe(&kp) < 0)) {
415 printk("register_kprobe failed, returned %d\n", ret);
416 return -1;
417 }
418 printk("kprobe registered\n");
419 return 0;
420}
421
422void cleanup_module(void)
423{
424 unregister_kprobe(&kp);
425 printk("kprobe unregistered\n");
426}
427
428MODULE_LICENSE("GPL");
429----- cut here -----
430
431You can build the kernel module, kprobe-example.ko, using the following
432Makefile:
433----- cut here -----
434obj-m := kprobe-example.o
435KDIR := /lib/modules/$(shell uname -r)/build
436PWD := $(shell pwd)
437default:
438 $(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
439clean:
440 rm -f *.mod.c *.ko *.o
441----- cut here -----
442
443$ make
444$ su -
445...
446# insmod kprobe-example.ko
447
448You will see the trace data in /var/log/messages and on the console
449whenever do_fork() is invoked to create a new process.
450
4519. Jprobes Example
452
453Here's a sample kernel module showing the use of jprobes to dump
454the arguments of do_fork().
455----- cut here -----
456/*jprobe-example.c */
457#include <linux/kernel.h>
458#include <linux/module.h>
459#include <linux/fs.h>
460#include <linux/uio.h>
461#include <linux/kprobes.h>
462#include <linux/kallsyms.h>
463
464/*
465 * Jumper probe for do_fork.
466 * Mirror principle enables access to arguments of the probed routine
467 * from the probe handler.
468 */
469
470/* Proxy routine having the same arguments as actual do_fork() routine */
471long jdo_fork(unsigned long clone_flags, unsigned long stack_start,
472 struct pt_regs *regs, unsigned long stack_size,
473 int __user * parent_tidptr, int __user * child_tidptr)
474{
475 printk("jprobe: clone_flags=0x%lx, stack_size=0x%lx, regs=0x%p\n",
476 clone_flags, stack_size, regs);
477 /* Always end with a call to jprobe_return(). */
478 jprobe_return();
479 /*NOTREACHED*/
480 return 0;
481}
482
483static struct jprobe my_jprobe = {
484 .entry = (kprobe_opcode_t *) jdo_fork
485};
486
487int init_module(void)
488{
489 int ret;
490 my_jprobe.kp.addr = (kprobe_opcode_t *) kallsyms_lookup_name("do_fork");
491 if (!my_jprobe.kp.addr) {
492 printk("Couldn't find %s to plant jprobe\n", "do_fork");
493 return -1;
494 }
495
496 if ((ret = register_jprobe(&my_jprobe)) <0) {
497 printk("register_jprobe failed, returned %d\n", ret);
498 return -1;
499 }
500 printk("Planted jprobe at %p, handler addr %p\n",
501 my_jprobe.kp.addr, my_jprobe.entry);
502 return 0;
503}
504
505void cleanup_module(void)
506{
507 unregister_jprobe(&my_jprobe);
508 printk("jprobe unregistered\n");
509}
510
511MODULE_LICENSE("GPL");
512----- cut here -----
513
514Build and insert the kernel module as shown in the above kprobe
515example. You will see the trace data in /var/log/messages and on
516the console whenever do_fork() is invoked to create a new process.
517(Some messages may be suppressed if syslogd is configured to
518eliminate duplicate messages.)
519
52010. Kretprobes Example
521
522Here's a sample kernel module showing the use of return probes to
523report failed calls to sys_open().
524----- cut here -----
525/*kretprobe-example.c*/
526#include <linux/kernel.h>
527#include <linux/module.h>
528#include <linux/kprobes.h>
529#include <linux/kallsyms.h>
530
531static const char *probed_func = "sys_open";
532
533/* Return-probe handler: If the probed function fails, log the return value. */
534static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
535{
536 // Substitute the appropriate register name for your architecture --
537 // e.g., regs->rax for x86_64, regs->gpr[3] for ppc64.
538 int retval = (int) regs->eax;
539 if (retval < 0) {
540 printk("%s returns %d\n", probed_func, retval);
541 }
542 return 0;
543}
544
545static struct kretprobe my_kretprobe = {
546 .handler = ret_handler,
547 /* Probe up to 20 instances concurrently. */
548 .maxactive = 20
549};
550
551int init_module(void)
552{
553 int ret;
554 my_kretprobe.kp.addr =
555 (kprobe_opcode_t *) kallsyms_lookup_name(probed_func);
556 if (!my_kretprobe.kp.addr) {
557 printk("Couldn't find %s to plant return probe\n", probed_func);
558 return -1;
559 }
560 if ((ret = register_kretprobe(&my_kretprobe)) < 0) {
561 printk("register_kretprobe failed, returned %d\n", ret);
562 return -1;
563 }
564 printk("Planted return probe at %p\n", my_kretprobe.kp.addr);
565 return 0;
566}
567
568void cleanup_module(void)
569{
570 unregister_kretprobe(&my_kretprobe);
571 printk("kretprobe unregistered\n");
572 /* nmissed > 0 suggests that maxactive was set too low. */
573 printk("Missed probing %d instances of %s\n",
574 my_kretprobe.nmissed, probed_func);
575}
576
577MODULE_LICENSE("GPL");
578----- cut here -----
579
580Build and insert the kernel module as shown in the above kprobe
581example. You will see the trace data in /var/log/messages and on the
582console whenever sys_open() returns a negative value. (Some messages
583may be suppressed if syslogd is configured to eliminate duplicate
584messages.)
585
586For additional information on Kprobes, refer to the following URLs:
587http://www-106.ibm.com/developerworks/library/l-kprobes.html?ca=dgr-lnxw42Kprobe
588http://www.redhat.com/magazine/005mar05/features/kprobes/