Title	: Kernel Probes (Kprobes)
Authors	: Jim Keniston <jkenisto@us.ibm.com>
	: Prasanna S Panchamukhi <prasanna@in.ibm.com>

CONTENTS

1. Concepts: Kprobes, Jprobes, Return Probes
2. Architectures Supported
3. Configuring Kprobes
4. API Reference
5. Kprobes Features and Limitations
6. Probe Overhead
7. TODO
8. Kprobes Example
9. Jprobes Example
10. Kretprobes Example

1. Concepts: Kprobes, Jprobes, Return Probes

Kprobes enables you to dynamically break into any kernel routine and
collect debugging and performance information non-disruptively. You
can trap at almost any kernel code address, specifying a handler
routine to be invoked when the breakpoint is hit.

There are currently three types of probes: kprobes, jprobes, and
kretprobes (also called return probes). A kprobe can be inserted
on virtually any instruction in the kernel. A jprobe is inserted at
the entry to a kernel function, and provides convenient access to the
function's arguments. A return probe fires when a specified function
returns.

In the typical case, Kprobes-based instrumentation is packaged as
a kernel module. The module's init function installs ("registers")
one or more probes, and the exit function unregisters them. A
registration function such as register_kprobe() specifies where
the probe is to be inserted and what handler is to be called when
the probe is hit.

The next three subsections explain how the different types of
probes work. They explain certain things that you'll need to
know in order to make the best use of Kprobes -- e.g., the
difference between a pre_handler and a post_handler, and how
to use the maxactive and nmissed fields of a kretprobe. But
if you're in a hurry to start using Kprobes, you can skip ahead
to section 2.

1.1 How Does a Kprobe Work?

When a kprobe is registered, Kprobes makes a copy of the probed
instruction and replaces the first byte(s) of the probed instruction
with a breakpoint instruction (e.g., int3 on i386 and x86_64).

When a CPU hits the breakpoint instruction, a trap occurs, the CPU's
registers are saved, and control passes to Kprobes via the
notifier_call_chain mechanism. Kprobes executes the "pre_handler"
associated with the kprobe, passing the handler the addresses of the
kprobe struct and the saved registers.

Next, Kprobes single-steps its copy of the probed instruction.
(It would be simpler to single-step the actual instruction in place,
but then Kprobes would have to temporarily remove the breakpoint
instruction. This would open a small time window when another CPU
could sail right past the probepoint.)

After the instruction is single-stepped, Kprobes executes the
"post_handler," if any, that is associated with the kprobe.
Execution then continues with the instruction following the probepoint.
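
The order of events on a probe hit can be pictured with a small
user-space model. This is only a sketch, not kernel code: the
model_kprobe and model_regs types and every function below are
hypothetical stand-ins, the real post_handler also takes a flags
argument, and the "single-step" is modelled as a plain function call.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's pt_regs and kprobe structs. */
struct model_regs {
	unsigned long ip;	/* saved instruction pointer */
};

struct model_kprobe {
	int (*pre_handler)(struct model_kprobe *p, struct model_regs *regs);
	void (*post_handler)(struct model_kprobe *p, struct model_regs *regs);
	void (*insn_copy)(struct model_regs *regs);	/* models the copied instruction */
	int pre_hits;
	int post_hits;
};

/* Models one breakpoint hit: pre_handler, single-step of the copy, post_handler. */
static void model_hit(struct model_kprobe *p, struct model_regs *regs)
{
	if (p->pre_handler)
		p->pre_handler(p, regs);
	p->insn_copy(regs);		/* "single-step" the copied instruction */
	if (p->post_handler)
		p->post_handler(p, regs);
}

/* Example pre_handler: just counts hits. */
static int count_pre(struct model_kprobe *p, struct model_regs *regs)
{
	(void)regs;
	p->pre_hits++;
	return 0;
}

/* Example post_handler: just counts hits. */
static void count_post(struct model_kprobe *p, struct model_regs *regs)
{
	(void)regs;
	p->post_hits++;
}

/* The modelled instruction is one byte long, so it advances ip by 1. */
static void step_one_byte(struct model_regs *regs)
{
	regs->ip += 1;
}
```

Calling model_hit() once with count_pre, count_post and step_one_byte
installed runs each handler exactly once and leaves ip pointing just
past the modelled instruction.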

1.2 How Does a Jprobe Work?

A jprobe is implemented using a kprobe that is placed on a function's
entry point. It employs a simple mirroring principle to allow
seamless access to the probed function's arguments. The jprobe
handler routine should have the same signature (arg list and return
type) as the function being probed, and must always end by calling
the Kprobes function jprobe_return().

Here's how it works. When the probe is hit, Kprobes makes a copy of
the saved registers and a generous portion of the stack (see below).
Kprobes then points the saved instruction pointer at the jprobe's
handler routine, and returns from the trap. As a result, control
passes to the handler, which is presented with the same register and
stack contents as the probed function. When it is done, the handler
calls jprobe_return(), which traps again to restore the original stack
contents and processor state and switch to the probed function.

By convention, the callee owns its arguments, so gcc may produce code
that unexpectedly modifies that portion of the stack. This is why
Kprobes saves a copy of the stack and restores it after the jprobe
handler has run. Up to MAX_STACK_SIZE bytes are copied -- e.g.,
64 bytes on i386.
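
The save-and-restore step can be sketched in user space as two
memcpy() calls around the handler. This is an illustrative model
only: the names are hypothetical, the "stack" is an ordinary byte
array, and the model always copies the full 64 bytes rather than
"up to" that many.

```c
#include <string.h>

#define MODEL_MAX_STACK_SIZE 64	/* the i386 value quoted above */

/* Hypothetical per-probe state holding the saved stack bytes. */
struct model_jprobe_state {
	unsigned char saved_stack[MODEL_MAX_STACK_SIZE];
};

/* On probe hit: keep a copy of the top of the stack before the handler runs. */
static void model_save_stack(struct model_jprobe_state *st,
			     const unsigned char *sp)
{
	memcpy(st->saved_stack, sp, MODEL_MAX_STACK_SIZE);
}

/* On jprobe_return(): put the original bytes back for the probed function. */
static void model_restore_stack(const struct model_jprobe_state *st,
				unsigned char *sp)
{
	memcpy(sp, st->saved_stack, MODEL_MAX_STACK_SIZE);
}
```

Whatever the handler writes into that region between the two calls is
discarded, so the probed function sees the arguments it was called with.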

Note that the probed function's args may be passed on the stack
or in registers (e.g., for x86_64 or for an i386 fastcall function).
The jprobe will work in either case, so long as the handler's
prototype matches that of the probed function.

1.3 How Does a Return Probe Work?

When you call register_kretprobe(), Kprobes establishes a kprobe at
the entry to the function. When the probed function is called and this
probe is hit, Kprobes saves a copy of the return address, and replaces
the return address with the address of a "trampoline." The trampoline
is an arbitrary piece of code -- typically just a nop instruction.
At boot time, Kprobes registers a kprobe at the trampoline.

When the probed function executes its return instruction, control
passes to the trampoline and that probe is hit. Kprobes' trampoline
handler calls the user-specified handler associated with the kretprobe,
then sets the saved instruction pointer to the saved return address,
and that's where execution resumes upon return from the trap.

While the probed function is executing, its return address is
stored in an object of type kretprobe_instance. Before calling
register_kretprobe(), the user sets the maxactive field of the
kretprobe struct to specify how many instances of the specified
function can be probed simultaneously. register_kretprobe()
pre-allocates the indicated number of kretprobe_instance objects.

For example, if the function is non-recursive and is called with a
spinlock held, maxactive = 1 should be enough. If the function is
non-recursive and can never relinquish the CPU (e.g., via a semaphore
or preemption), NR_CPUS should be enough. If maxactive <= 0, it is
set to a default value. If CONFIG_PREEMPT is enabled, the default
is max(10, 2*NR_CPUS). Otherwise, the default is NR_CPUS.
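
The defaulting rule can be written out as a small function. This is
a user-space sketch of the rule as stated above, not the kernel's
code; nr_cpus and preempt_enabled are hypothetical parameters
standing in for NR_CPUS and CONFIG_PREEMPT.

```c
/*
 * Models the default described above. nr_cpus and preempt_enabled are
 * hypothetical parameters standing in for NR_CPUS and CONFIG_PREEMPT.
 */
static int model_effective_maxactive(int maxactive, int nr_cpus,
				     int preempt_enabled)
{
	if (maxactive > 0)
		return maxactive;	/* caller's choice is kept */
	if (preempt_enabled)
		return (2 * nr_cpus > 10) ? 2 * nr_cpus : 10;	/* max(10, 2*NR_CPUS) */
	return nr_cpus;
}
```

With four CPUs and preemption enabled, leaving maxactive at 0 yields 10;
with eight CPUs it yields 16; without preemption it is just the CPU count.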

It's not a disaster if you set maxactive too low; you'll just miss
some probes. In the kretprobe struct, the nmissed field is set to
zero when the return probe is registered, and is incremented every
time the probed function is entered but there is no kretprobe_instance
object available for establishing the return probe.
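
The interplay of maxactive and nmissed can be modelled with a simple
counter. The sketch below is a user-space illustration under that
simplification: the struct and function names are hypothetical, and
the pool of kretprobe_instance objects is reduced to an in_use count.

```c
/* Hypothetical model of a kretprobe's instance accounting. */
struct model_kretprobe {
	int maxactive;	/* number of pre-allocated instances */
	int in_use;	/* instances currently holding a return address */
	int nmissed;	/* entries that found no free instance */
};

static void model_register(struct model_kretprobe *rp, int maxactive)
{
	rp->maxactive = maxactive;
	rp->in_use = 0;
	rp->nmissed = 0;	/* reset at registration */
}

/* Function entry: returns 1 if the return will be probed, 0 if missed. */
static int model_entry(struct model_kretprobe *rp)
{
	if (rp->in_use < rp->maxactive) {
		rp->in_use++;
		return 1;
	}
	rp->nmissed++;
	return 0;
}

/* Function return for a probed call: the instance goes back to the pool. */
static void model_return(struct model_kretprobe *rp)
{
	if (rp->in_use > 0)
		rp->in_use--;
}
```

With maxactive = 2, a third concurrent entry is missed and counted in
nmissed; once one of the probed calls returns, the next entry is probed
again.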

2. Architectures Supported

Kprobes, jprobes, and return probes are implemented on the following
architectures:

- i386
- x86_64 (AMD-64, EM64T)
- ppc64
- ia64 (Does not support probes on instruction slot1.)
- sparc64 (Return probes not yet implemented.)

3. Configuring Kprobes

When configuring the kernel using make menuconfig/xconfig/oldconfig,
ensure that CONFIG_KPROBES is set to "y". Under "Instrumentation
Support", look for "Kprobes".

So that you can load and unload Kprobes-based instrumentation modules,
make sure "Loadable module support" (CONFIG_MODULES) and "Module
unloading" (CONFIG_MODULE_UNLOAD) are set to "y".

You may also want to ensure that CONFIG_KALLSYMS and perhaps even
CONFIG_KALLSYMS_ALL are set to "y", since kallsyms_lookup_name()
is a handy, version-independent way to find a function's address.

If you need to insert a probe in the middle of a function, you may find
it useful to "Compile the kernel with debug info" (CONFIG_DEBUG_INFO),
so you can use "objdump -d -l vmlinux" to see the source-to-object
code mapping.

4. API Reference

The Kprobes API includes a "register" function and an "unregister"
function for each type of probe. Here are terse, mini-man-page
specifications for these functions and the associated probe handlers
that you'll write. See the latter half of this document for examples.

4.1 register_kprobe

#include <linux/kprobes.h>
int register_kprobe(struct kprobe *kp);

Sets a breakpoint at the address kp->addr. When the breakpoint is
hit, Kprobes calls kp->pre_handler. After the probed instruction
is single-stepped, Kprobes calls kp->post_handler. If a fault
occurs during execution of kp->pre_handler or kp->post_handler,
or during single-stepping of the probed instruction, Kprobes calls
kp->fault_handler. Any or all handlers can be NULL.

register_kprobe() returns 0 on success, or a negative errno otherwise.

User's pre-handler (kp->pre_handler):
#include <linux/kprobes.h>
#include <linux/ptrace.h>
int pre_handler(struct kprobe *p, struct pt_regs *regs);

Called with p pointing to the kprobe associated with the breakpoint,
and regs pointing to the struct containing the registers saved when
the breakpoint was hit. Return 0 here unless you're a Kprobes geek.

User's post-handler (kp->post_handler):
#include <linux/kprobes.h>
#include <linux/ptrace.h>
void post_handler(struct kprobe *p, struct pt_regs *regs,
	unsigned long flags);

p and regs are as described for the pre_handler. flags always seems
to be zero.

User's fault-handler (kp->fault_handler):
#include <linux/kprobes.h>
#include <linux/ptrace.h>
int fault_handler(struct kprobe *p, struct pt_regs *regs, int trapnr);

p and regs are as described for the pre_handler. trapnr is the
architecture-specific trap number associated with the fault (e.g.,
on i386, 13 for a general protection fault or 14 for a page fault).
Returns 1 if it successfully handled the exception.

4.2 register_jprobe

#include <linux/kprobes.h>
int register_jprobe(struct jprobe *jp);

Sets a breakpoint at the address jp->kp.addr, which must be the address
of the first instruction of a function. When the breakpoint is hit,
Kprobes runs the handler whose address is jp->entry.

The handler should have the same arg list and return type as the probed
function; and just before it returns, it must call jprobe_return().
(The handler never actually returns, since jprobe_return() returns
control to Kprobes.) If the probed function is declared asmlinkage,
fastcall, or anything else that affects how args are passed, the
handler's declaration must match.

register_jprobe() returns 0 on success, or a negative errno otherwise.

4.3 register_kretprobe

#include <linux/kprobes.h>
int register_kretprobe(struct kretprobe *rp);

Establishes a return probe for the function whose address is
rp->kp.addr. When that function returns, Kprobes calls rp->handler.
You must set rp->maxactive appropriately before you call
register_kretprobe(); see "How Does a Return Probe Work?" for details.

register_kretprobe() returns 0 on success, or a negative errno
otherwise.

User's return-probe handler (rp->handler):
#include <linux/kprobes.h>
#include <linux/ptrace.h>
int kretprobe_handler(struct kretprobe_instance *ri, struct pt_regs *regs);

regs is as described for kprobe.pre_handler. ri points to the
kretprobe_instance object, of which the following fields may be
of interest:
- ret_addr: the return address
- rp: points to the corresponding kretprobe object
- task: points to the corresponding task struct
The handler's return value is currently ignored.

4.4 unregister_*probe

#include <linux/kprobes.h>
void unregister_kprobe(struct kprobe *kp);
void unregister_jprobe(struct jprobe *jp);
void unregister_kretprobe(struct kretprobe *rp);

Removes the specified probe. The unregister function can be called
at any time after the probe has been registered.

5. Kprobes Features and Limitations

Kprobes allows multiple probes at the same address. Currently,
however, there cannot be multiple jprobes on the same function at
the same time.

In general, you can install a probe anywhere in the kernel.
In particular, you can probe interrupt handlers. Known exceptions
are discussed in this section.

The register_*probe functions will return -EINVAL if you attempt
to install a probe in the code that implements Kprobes (mostly
kernel/kprobes.c and arch/*/kernel/kprobes.c, but also functions such
as do_page_fault and notifier_call_chain).

If you install a probe in an inline-able function, Kprobes makes
no attempt to chase down all inline instances of the function and
install probes there. gcc may inline a function without being asked,
so keep this in mind if you're not seeing the probe hits you expect.

A probe handler can modify the environment of the probed function
-- e.g., by modifying kernel data structures, or by modifying the
contents of the pt_regs struct (which are restored to the registers
upon return from the breakpoint). So Kprobes can be used, for example,
to install a bug fix or to inject faults for testing. Kprobes, of
course, has no way to distinguish the deliberately injected faults
from the accidental ones. Don't drink and probe.

Kprobes makes no attempt to prevent probe handlers from stepping on
each other -- e.g., probing printk() and then calling printk() from a
probe handler. If a probe handler hits a probe, that second probe's
handlers won't be run in that instance, and the kprobe.nmissed member
of the second probe will be incremented.

As of Linux v2.6.15-rc1, multiple handlers (or multiple instances of
the same handler) may run concurrently on different CPUs.

Kprobes does not use mutexes or allocate memory except during
registration and unregistration.

Probe handlers are run with preemption disabled. Depending on the
architecture, handlers may also run with interrupts disabled. In any
case, your handler should not yield the CPU (e.g., by attempting to
acquire a semaphore).

Since a return probe is implemented by replacing the return
address with the trampoline's address, stack backtraces and calls
to __builtin_return_address() will typically yield the trampoline's
address instead of the real return address for kretprobed functions.
(As far as we can tell, __builtin_return_address() is used only
for instrumentation and error reporting.)

If the number of times a function is called does not match the number
of times it returns, registering a return probe on that function may
produce undesirable results. We have the do_exit() case covered.
do_execve() and do_fork() are not an issue. We're unaware of other
specific cases where this could be a problem.

If, upon entry to or exit from a function, the CPU is running on
a stack other than that of the current task, registering a return
probe on that function may produce undesirable results. For this
reason, Kprobes doesn't support return probes (or kprobes or jprobes)
on the x86_64 version of __switch_to(); the registration functions
return -EINVAL.

6. Probe Overhead

On a typical CPU in use in 2005, a kprobe hit takes 0.5 to 1.0
microseconds to process. Specifically, a benchmark that hits the same
probepoint repeatedly, firing a simple handler each time, reports 1-2
million hits per second, depending on the architecture. A jprobe or
return-probe hit typically takes 50-75% longer than a kprobe hit.
When you have a return probe set on a function, adding a kprobe at
the entry to that function adds essentially no overhead.

Here are sample overhead figures (in usec) for different architectures.
k = kprobe; j = jprobe; r = return probe; kr = kprobe + return probe
on same function; jr = jprobe + return probe on same function

i386: Intel Pentium M, 1495 MHz, 2957.31 bogomips
k = 0.57 usec; j = 1.00; r = 0.92; kr = 0.99; jr = 1.40

x86_64: AMD Opteron 246, 1994 MHz, 3971.48 bogomips
k = 0.49 usec; j = 0.76; r = 0.80; kr = 0.82; jr = 1.07

ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU)
k = 0.77 usec; j = 1.31; r = 1.26; kr = 1.45; jr = 1.99
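
The relationship between these per-hit costs, the hits-per-second
range, and the "50-75% longer" rule of thumb is plain arithmetic.
The two helpers below are a user-space sketch for checking it; their
names are made up for this illustration.

```c
/* Converts a per-hit cost in microseconds to hits per second. */
static double hits_per_second(double usec_per_hit)
{
	return 1e6 / usec_per_hit;
}

/* Extra cost of one probe type over another, as a percentage. */
static double percent_longer(double usec, double baseline_usec)
{
	return (usec / baseline_usec - 1.0) * 100.0;
}
```

A cost of 0.5 usec per hit corresponds to 2 million hits per second,
and 1.0 usec to 1 million. For the i386 figures, a jprobe at 1.00 usec
against a kprobe at 0.57 usec works out to roughly 75% longer.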

7. TODO

a. SystemTap (http://sourceware.org/systemtap): Provides a simplified
programming interface for probe-based instrumentation. Try it out.
b. Kernel return probes for sparc64.
c. Support for other architectures.
d. User-space probes.
e. Watchpoint probes (which fire on data references).

8. Kprobes Example

Here's a sample kernel module showing the use of kprobes to dump a
stack trace and selected i386 registers when do_fork() is called.
----- cut here -----
/*kprobe-example.c*/
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/kallsyms.h>
#include <linux/sched.h>

/*For each probe you need to allocate a kprobe structure*/
static struct kprobe kp;

/*kprobe pre_handler: called just before the probed instruction is executed*/
int handler_pre(struct kprobe *p, struct pt_regs *regs)
{
	printk("pre_handler: p->addr=0x%p, eip=%lx, eflags=0x%lx\n",
		p->addr, regs->eip, regs->eflags);
	dump_stack();
	return 0;
}

/*kprobe post_handler: called after the probed instruction is executed*/
void handler_post(struct kprobe *p, struct pt_regs *regs, unsigned long flags)
{
	printk("post_handler: p->addr=0x%p, eflags=0x%lx\n",
		p->addr, regs->eflags);
}

/* fault_handler: this is called if an exception is generated for any
 * instruction within the pre- or post-handler, or when Kprobes
 * single-steps the probed instruction.
 */
int handler_fault(struct kprobe *p, struct pt_regs *regs, int trapnr)
{
	printk("fault_handler: p->addr=0x%p, trap #%d\n",
		p->addr, trapnr);
	/* Return 0 because we don't handle the fault. */
	return 0;
}

int init_module(void)
{
	int ret;
	kp.pre_handler = handler_pre;
	kp.post_handler = handler_post;
	kp.fault_handler = handler_fault;
	kp.addr = (kprobe_opcode_t*) kallsyms_lookup_name("do_fork");
	/* register the kprobe now */
	if (!kp.addr) {
		printk("Couldn't find %s to plant kprobe\n", "do_fork");
		return -1;
	}
	if ((ret = register_kprobe(&kp)) < 0) {
		printk("register_kprobe failed, returned %d\n", ret);
		return -1;
	}
	printk("kprobe registered\n");
	return 0;
}

void cleanup_module(void)
{
	unregister_kprobe(&kp);
	printk("kprobe unregistered\n");
}

MODULE_LICENSE("GPL");
----- cut here -----

You can build the kernel module, kprobe-example.ko, using the following
Makefile:
----- cut here -----
obj-m := kprobe-example.o
KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)
default:
	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
clean:
	rm -f *.mod.c *.ko *.o
----- cut here -----

$ make
$ su -
...
# insmod kprobe-example.ko

You will see the trace data in /var/log/messages and on the console
whenever do_fork() is invoked to create a new process.

9. Jprobes Example

Here's a sample kernel module showing the use of jprobes to dump
the arguments of do_fork().
----- cut here -----
/*jprobe-example.c */
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/uio.h>
#include <linux/kprobes.h>
#include <linux/kallsyms.h>

/*
 * Jumper probe for do_fork.
 * Mirror principle enables access to arguments of the probed routine
 * from the probe handler.
 */

/* Proxy routine having the same arguments as actual do_fork() routine */
long jdo_fork(unsigned long clone_flags, unsigned long stack_start,
	      struct pt_regs *regs, unsigned long stack_size,
	      int __user * parent_tidptr, int __user * child_tidptr)
{
	printk("jprobe: clone_flags=0x%lx, stack_size=0x%lx, regs=0x%p\n",
	       clone_flags, stack_size, regs);
	/* Always end with a call to jprobe_return(). */
	jprobe_return();
	/*NOTREACHED*/
	return 0;
}

static struct jprobe my_jprobe = {
	.entry = (kprobe_opcode_t *) jdo_fork
};

int init_module(void)
{
	int ret;
	my_jprobe.kp.addr = (kprobe_opcode_t *) kallsyms_lookup_name("do_fork");
	if (!my_jprobe.kp.addr) {
		printk("Couldn't find %s to plant jprobe\n", "do_fork");
		return -1;
	}

	if ((ret = register_jprobe(&my_jprobe)) < 0) {
		printk("register_jprobe failed, returned %d\n", ret);
		return -1;
	}
	printk("Planted jprobe at %p, handler addr %p\n",
	       my_jprobe.kp.addr, my_jprobe.entry);
	return 0;
}

void cleanup_module(void)
{
	unregister_jprobe(&my_jprobe);
	printk("jprobe unregistered\n");
}

MODULE_LICENSE("GPL");
----- cut here -----

Build and insert the kernel module as shown in the above kprobe
example. You will see the trace data in /var/log/messages and on
the console whenever do_fork() is invoked to create a new process.
(Some messages may be suppressed if syslogd is configured to
eliminate duplicate messages.)

10. Kretprobes Example

Here's a sample kernel module showing the use of return probes to
report failed calls to sys_open().
----- cut here -----
/*kretprobe-example.c*/
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/kallsyms.h>

static const char *probed_func = "sys_open";

/* Return-probe handler: If the probed function fails, log the return value. */
static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
	// Substitute the appropriate register name for your architecture --
	// e.g., regs->rax for x86_64, regs->gpr[3] for ppc64.
	int retval = (int) regs->eax;
	if (retval < 0) {
		printk("%s returns %d\n", probed_func, retval);
	}
	return 0;
}

static struct kretprobe my_kretprobe = {
	.handler = ret_handler,
	/* Probe up to 20 instances concurrently. */
	.maxactive = 20
};

int init_module(void)
{
	int ret;
	my_kretprobe.kp.addr =
		(kprobe_opcode_t *) kallsyms_lookup_name(probed_func);
	if (!my_kretprobe.kp.addr) {
		printk("Couldn't find %s to plant return probe\n", probed_func);
		return -1;
	}
	if ((ret = register_kretprobe(&my_kretprobe)) < 0) {
		printk("register_kretprobe failed, returned %d\n", ret);
		return -1;
	}
	printk("Planted return probe at %p\n", my_kretprobe.kp.addr);
	return 0;
}

void cleanup_module(void)
{
	unregister_kretprobe(&my_kretprobe);
	printk("kretprobe unregistered\n");
	/* nmissed > 0 suggests that maxactive was set too low. */
	printk("Missed probing %d instances of %s\n",
		my_kretprobe.nmissed, probed_func);
}

MODULE_LICENSE("GPL");
----- cut here -----

Build and insert the kernel module as shown in the above kprobe
example. You will see the trace data in /var/log/messages and on the
console whenever sys_open() returns a negative value. (Some messages
may be suppressed if syslogd is configured to eliminate duplicate
messages.)

For additional information on Kprobes, refer to the following URLs:
http://www-106.ibm.com/developerworks/library/l-kprobes.html?ca=dgr-lnxw42Kprobe
http://www.redhat.com/magazine/005mar05/features/kprobes/