Merge branches 'tracing/branch-tracer', 'tracing/fastboot', 'tracing/ftrace', 'tracing/function-return-tracer', 'tracing/power-tracer', 'tracing/powerpc', 'tracing/ring-buffer', 'tracing/stack-tracer' and 'tracing/urgent' into tracing/core
diff --git a/Documentation/ftrace.txt b/Documentation/ftrace.txt
index 9cc4d68..35a78bc 100644
--- a/Documentation/ftrace.txt
+++ b/Documentation/ftrace.txt
@@ -82,7 +82,7 @@
tracer is not adding more data, they will display
the same information every time they are read.
- iter_ctrl: This file lets the user control the amount of data
+ trace_options: This file lets the user control the amount of data
that is displayed in one of the above output
files.
@@ -94,10 +94,10 @@
only be recorded if the latency is greater than
the value in this file. (in microseconds)
- trace_entries: This sets or displays the number of bytes each CPU
+ buffer_size_kb: This sets or displays the number of kilobytes each CPU
buffer can hold. The tracer buffers are the same size
for each CPU. The displayed number is the size of the
- CPU buffer and not total size of all buffers. The
+ CPU buffer and not total size of all buffers. The
trace buffers are allocated in pages (blocks of memory
that the kernel uses for allocation, usually 4 KB in size).
If the last page allocated has room for more bytes
@@ -316,23 +316,23 @@
The rest is the same as the 'trace' file.
-iter_ctrl
----------
+trace_options
+-------------
-The iter_ctrl file is used to control what gets printed in the trace
+The trace_options file is used to control what gets printed in the trace
output. To see what is available, simply cat the file:
- cat /debug/tracing/iter_ctrl
+ cat /debug/tracing/trace_options
print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \
- noblock nostacktrace nosched-tree
+ noblock nostacktrace nosched-tree nouserstacktrace nosym-userobj
To disable one of the options, echo in the option prepended with "no".
- echo noprint-parent > /debug/tracing/iter_ctrl
+ echo noprint-parent > /debug/tracing/trace_options
To enable an option, leave off the "no".
- echo sym-offset > /debug/tracing/iter_ctrl
+ echo sym-offset > /debug/tracing/trace_options
Here are the available options:
@@ -378,6 +378,20 @@
When a trace is recorded, so is the stack of functions.
This allows for back traces of trace sites.
+ userstacktrace - This option changes the trace.
+ It records a stacktrace of the current userspace thread.
+
+ sym-userobj - when user stacktrace are enabled, look up which object the
+ address belongs to, and print a relative address
+ This is especially useful when ASLR is on, otherwise you don't
+ get a chance to resolve the address to object/file/line after the app is no
+ longer running
+
+ The lookup is performed when you read trace,trace_pipe,latency_trace. Example:
+
+ a.out-1623 [000] 40874.465068: /root/a.out[+0x480] <-/root/a.out[+0
+x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
+
sched-tree - TBD (any users??)
@@ -1299,41 +1313,29 @@
-------------
Having too much or not enough data can be troublesome in diagnosing
-an issue in the kernel. The file trace_entries is used to modify
+an issue in the kernel. The file buffer_size_kb is used to modify
the size of the internal trace buffers. The number listed
is the number of entries that can be recorded per CPU. To know
the full size, multiply the number of possible CPUS with the
number of entries.
- # cat /debug/tracing/trace_entries
-65620
+ # cat /debug/tracing/buffer_size_kb
+1408 (units kilobytes)
Note, to modify this, you must have tracing completely disabled. To do that,
echo "nop" into the current_tracer. If the current_tracer is not set
to "nop", an EINVAL error will be returned.
# echo nop > /debug/tracing/current_tracer
- # echo 100000 > /debug/tracing/trace_entries
- # cat /debug/tracing/trace_entries
-100045
-
-
-Notice that we echoed in 100,000 but the size is 100,045. The entries
-are held in individual pages. It allocates the number of pages it takes
-to fulfill the request. If more entries may fit on the last page
-then they will be added.
-
- # echo 1 > /debug/tracing/trace_entries
- # cat /debug/tracing/trace_entries
-85
-
-This shows us that 85 entries can fit in a single page.
+ # echo 10000 > /debug/tracing/buffer_size_kb
+ # cat /debug/tracing/buffer_size_kb
+10000 (units kilobytes)
The number of pages which will be allocated is limited to a percentage
of available memory. Allocating too much will produce an error.
- # echo 1000000000000 > /debug/tracing/trace_entries
+ # echo 1000000000000 > /debug/tracing/buffer_size_kb
-bash: echo: write error: Cannot allocate memory
- # cat /debug/tracing/trace_entries
+ # cat /debug/tracing/buffer_size_kb
85
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index e0f346d..2919a2e 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -750,6 +750,14 @@
parameter will force ia64_sal_cache_flush to call
ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.
+ ftrace=[tracer]
+ [ftrace] will set and start the specified tracer
+ as early as possible in order to facilitate early
+ boot debugging.
+
+ ftrace_dump_on_oops
+ [ftrace] will dump the trace buffers on oops.
+
gamecon.map[2|3]=
[HW,JOY] Multisystem joystick and NES/SNES/PSX pad
support via parallel port (up to 5 devices per port)
diff --git a/Documentation/markers.txt b/Documentation/markers.txt
index 089f613..6d275e4 100644
--- a/Documentation/markers.txt
+++ b/Documentation/markers.txt
@@ -70,6 +70,20 @@
"Format mismatch for probe probe_name (format), marker (format)"
+Another way to use markers is to simply define the marker without generating any
+function call to actually call into the marker. This is useful in combination
+with tracepoint probes in a scheme like this :
+
+void probe_tracepoint_name(unsigned int arg1, struct task_struct *tsk);
+
+DEFINE_MARKER_TP(marker_eventname, tracepoint_name, probe_tracepoint_name,
+ "arg1 %u pid %d");
+
+notrace void probe_tracepoint_name(unsigned int arg1, struct task_struct *tsk)
+{
+ struct marker *marker = &GET_MARKER(kernel_irq_entry);
+ /* write data to trace buffers ... */
+}
* Probe / marker example
diff --git a/Documentation/tracepoints.txt b/Documentation/tracepoints.txt
index 5d354e1..2d42241 100644
--- a/Documentation/tracepoints.txt
+++ b/Documentation/tracepoints.txt
@@ -3,28 +3,30 @@
Mathieu Desnoyers
-This document introduces Linux Kernel Tracepoints and their use. It provides
-examples of how to insert tracepoints in the kernel and connect probe functions
-to them and provides some examples of probe functions.
+This document introduces Linux Kernel Tracepoints and their use. It
+provides examples of how to insert tracepoints in the kernel and
+connect probe functions to them and provides some examples of probe
+functions.
* Purpose of tracepoints
-A tracepoint placed in code provides a hook to call a function (probe) that you
-can provide at runtime. A tracepoint can be "on" (a probe is connected to it) or
-"off" (no probe is attached). When a tracepoint is "off" it has no effect,
-except for adding a tiny time penalty (checking a condition for a branch) and
-space penalty (adding a few bytes for the function call at the end of the
-instrumented function and adds a data structure in a separate section). When a
-tracepoint is "on", the function you provide is called each time the tracepoint
-is executed, in the execution context of the caller. When the function provided
-ends its execution, it returns to the caller (continuing from the tracepoint
-site).
+A tracepoint placed in code provides a hook to call a function (probe)
+that you can provide at runtime. A tracepoint can be "on" (a probe is
+connected to it) or "off" (no probe is attached). When a tracepoint is
+"off" it has no effect, except for adding a tiny time penalty
+(checking a condition for a branch) and space penalty (adding a few
+bytes for the function call at the end of the instrumented function
+and adds a data structure in a separate section). When a tracepoint
+is "on", the function you provide is called each time the tracepoint
+is executed, in the execution context of the caller. When the function
+provided ends its execution, it returns to the caller (continuing from
+the tracepoint site).
You can put tracepoints at important locations in the code. They are
lightweight hooks that can pass an arbitrary number of parameters,
-which prototypes are described in a tracepoint declaration placed in a header
-file.
+which prototypes are described in a tracepoint declaration placed in a
+header file.
They can be used for tracing and performance accounting.
@@ -42,7 +44,7 @@
#include <linux/tracepoint.h>
-DEFINE_TRACE(subsys_eventname,
+DECLARE_TRACE(subsys_eventname,
TPPTOTO(int firstarg, struct task_struct *p),
TPARGS(firstarg, p));
@@ -50,6 +52,8 @@
#include <trace/subsys.h>
+DEFINE_TRACE(subsys_eventname);
+
void somefct(void)
{
...
@@ -61,31 +65,41 @@
- subsys_eventname is an identifier unique to your event
- subsys is the name of your subsystem.
- eventname is the name of the event to trace.
-- TPPTOTO(int firstarg, struct task_struct *p) is the prototype of the function
- called by this tracepoint.
-- TPARGS(firstarg, p) are the parameters names, same as found in the prototype.
-Connecting a function (probe) to a tracepoint is done by providing a probe
-(function to call) for the specific tracepoint through
+- TPPTOTO(int firstarg, struct task_struct *p) is the prototype of the
+ function called by this tracepoint.
+
+- TPARGS(firstarg, p) are the parameters names, same as found in the
+ prototype.
+
+Connecting a function (probe) to a tracepoint is done by providing a
+probe (function to call) for the specific tracepoint through
register_trace_subsys_eventname(). Removing a probe is done through
-unregister_trace_subsys_eventname(); it will remove the probe sure there is no
-caller left using the probe when it returns. Probe removal is preempt-safe
-because preemption is disabled around the probe call. See the "Probe example"
-section below for a sample probe module.
+unregister_trace_subsys_eventname(); it will remove the probe.
-The tracepoint mechanism supports inserting multiple instances of the same
-tracepoint, but a single definition must be made of a given tracepoint name over
-all the kernel to make sure no type conflict will occur. Name mangling of the
-tracepoints is done using the prototypes to make sure typing is correct.
-Verification of probe type correctness is done at the registration site by the
-compiler. Tracepoints can be put in inline functions, inlined static functions,
-and unrolled loops as well as regular functions.
+tracepoint_synchronize_unregister() must be called before the end of
+the module exit function to make sure there is no caller left using
+the probe. This, and the fact that preemption is disabled around the
+probe call, make sure that probe removal and module unload are safe.
+See the "Probe example" section below for a sample probe module.
-The naming scheme "subsys_event" is suggested here as a convention intended
-to limit collisions. Tracepoint names are global to the kernel: they are
-considered as being the same whether they are in the core kernel image or in
-modules.
+The tracepoint mechanism supports inserting multiple instances of the
+same tracepoint, but a single definition must be made of a given
+tracepoint name over all the kernel to make sure no type conflict will
+occur. Name mangling of the tracepoints is done using the prototypes
+to make sure typing is correct. Verification of probe type correctness
+is done at the registration site by the compiler. Tracepoints can be
+put in inline functions, inlined static functions, and unrolled loops
+as well as regular functions.
+The naming scheme "subsys_event" is suggested here as a convention
+intended to limit collisions. Tracepoint names are global to the
+kernel: they are considered as being the same whether they are in the
+core kernel image or in modules.
+
+If the tracepoint has to be used in kernel modules, an
+EXPORT_TRACEPOINT_SYMBOL_GPL() or EXPORT_TRACEPOINT_SYMBOL() can be
+used to export the defined tracepoints.
* Probe / tracepoint example
diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index b298f7a..e5f2ae8 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -7,7 +7,19 @@
#ifndef __ASSEMBLY__
extern void _mcount(void);
-#endif
+
+#ifdef CONFIG_DYNAMIC_FTRACE
+static inline unsigned long ftrace_call_adjust(unsigned long addr)
+{
+ /* reloction of mcount call site is the same as the address */
+ return addr;
+}
+
+struct dyn_arch_ftrace {
+ struct module *mod;
+};
+#endif /* CONFIG_DYNAMIC_FTRACE */
+#endif /* __ASSEMBLY__ */
#endif
diff --git a/arch/powerpc/include/asm/module.h b/arch/powerpc/include/asm/module.h
index e5f14b1..0845488 100644
--- a/arch/powerpc/include/asm/module.h
+++ b/arch/powerpc/include/asm/module.h
@@ -34,11 +34,19 @@
#ifdef __powerpc64__
unsigned int stubs_section; /* Index of stubs section in module */
unsigned int toc_section; /* What section is the TOC? */
-#else
+#ifdef CONFIG_DYNAMIC_FTRACE
+ unsigned long toc;
+ unsigned long tramp;
+#endif
+
+#else /* powerpc64 */
/* Indices of PLT sections within module. */
unsigned int core_plt_section;
unsigned int init_plt_section;
+#ifdef CONFIG_DYNAMIC_FTRACE
+ unsigned long tramp;
#endif
+#endif /* powerpc64 */
/* List of BUG addresses, source line numbers and filenames */
struct list_head bug_list;
@@ -68,6 +76,12 @@
# endif /* MODULE */
#endif
+#ifdef CONFIG_DYNAMIC_FTRACE
+# ifdef MODULE
+ asm(".section .ftrace.tramp,\"ax\",@nobits; .align 3; .previous");
+# endif /* MODULE */
+#endif
+
struct exception_table_entry;
void sort_ex_table(struct exception_table_entry *start,
diff --git a/arch/powerpc/kernel/ftrace.c b/arch/powerpc/kernel/ftrace.c
index f4b006e..3271cd6 100644
--- a/arch/powerpc/kernel/ftrace.c
+++ b/arch/powerpc/kernel/ftrace.c
@@ -9,22 +9,30 @@
#include <linux/spinlock.h>
#include <linux/hardirq.h>
+#include <linux/uaccess.h>
+#include <linux/module.h>
#include <linux/ftrace.h>
#include <linux/percpu.h>
#include <linux/init.h>
#include <linux/list.h>
#include <asm/cacheflush.h>
+#include <asm/code-patching.h>
#include <asm/ftrace.h>
+#if 0
+#define DEBUGP printk
+#else
+#define DEBUGP(fmt , ...) do { } while (0)
+#endif
-static unsigned int ftrace_nop = 0x60000000;
+static unsigned int ftrace_nop = PPC_NOP_INSTR;
#ifdef CONFIG_PPC32
# define GET_ADDR(addr) addr
#else
/* PowerPC64's functions are data that points to the functions */
-# define GET_ADDR(addr) *(unsigned long *)addr
+# define GET_ADDR(addr) (*(unsigned long *)addr)
#endif
@@ -33,12 +41,12 @@
return (int)(addr - ip);
}
-unsigned char *ftrace_nop_replace(void)
+static unsigned char *ftrace_nop_replace(void)
{
return (char *)&ftrace_nop;
}
-unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr)
+static unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr)
{
static unsigned int op;
@@ -68,49 +76,434 @@
# define _ASM_PTR " .long "
#endif
-int
+static int
ftrace_modify_code(unsigned long ip, unsigned char *old_code,
unsigned char *new_code)
{
- unsigned replaced;
- unsigned old = *(unsigned *)old_code;
- unsigned new = *(unsigned *)new_code;
- int faulted = 0;
+ unsigned char replaced[MCOUNT_INSN_SIZE];
/*
* Note: Due to modules and __init, code can
* disappear and change, we need to protect against faulting
- * as well as code changing.
+ * as well as code changing. We do this by using the
+ * probe_kernel_* functions.
*
* No real locking needed, this code is run through
- * kstop_machine.
+ * kstop_machine, or before SMP starts.
*/
- asm volatile (
- "1: lwz %1, 0(%2)\n"
- " cmpw %1, %5\n"
- " bne 2f\n"
- " stwu %3, 0(%2)\n"
- "2:\n"
- ".section .fixup, \"ax\"\n"
- "3: li %0, 1\n"
- " b 2b\n"
- ".previous\n"
- ".section __ex_table,\"a\"\n"
- _ASM_ALIGN "\n"
- _ASM_PTR "1b, 3b\n"
- ".previous"
- : "=r"(faulted), "=r"(replaced)
- : "r"(ip), "r"(new),
- "0"(faulted), "r"(old)
- : "memory");
- if (replaced != old && replaced != new)
- faulted = 2;
+ /* read the text we want to modify */
+ if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE))
+ return -EFAULT;
- if (!faulted)
- flush_icache_range(ip, ip + 8);
+ /* Make sure it is what we expect it to be */
+ if (memcmp(replaced, old_code, MCOUNT_INSN_SIZE) != 0)
+ return -EINVAL;
- return faulted;
+ /* replace the text with the new text */
+ if (probe_kernel_write((void *)ip, new_code, MCOUNT_INSN_SIZE))
+ return -EPERM;
+
+ flush_icache_range(ip, ip + 8);
+
+ return 0;
+}
+
+/*
+ * Helper functions that are the same for both PPC64 and PPC32.
+ */
+static int test_24bit_addr(unsigned long ip, unsigned long addr)
+{
+ long diff;
+
+ /*
+ * Can we get to addr from ip in 24 bits?
+ * (26 really, since we mulitply by 4 for 4 byte alignment)
+ */
+ diff = addr - ip;
+
+ /*
+ * Return true if diff is less than 1 << 25
+ * and greater than -1 << 26.
+ */
+ return (diff < (1 << 25)) && (diff > (-1 << 26));
+}
+
+static int is_bl_op(unsigned int op)
+{
+ return (op & 0xfc000003) == 0x48000001;
+}
+
+static int test_offset(unsigned long offset)
+{
+ return (offset + 0x2000000 > 0x3ffffff) || ((offset & 3) != 0);
+}
+
+static unsigned long find_bl_target(unsigned long ip, unsigned int op)
+{
+ static int offset;
+
+ offset = (op & 0x03fffffc);
+ /* make it signed */
+ if (offset & 0x02000000)
+ offset |= 0xfe000000;
+
+ return ip + (long)offset;
+}
+
+static unsigned int branch_offset(unsigned long offset)
+{
+ /* return "bl ip+offset" */
+ return 0x48000001 | (offset & 0x03fffffc);
+}
+
+#ifdef CONFIG_PPC64
+static int
+__ftrace_make_nop(struct module *mod,
+ struct dyn_ftrace *rec, unsigned long addr)
+{
+ unsigned char replaced[MCOUNT_INSN_SIZE * 2];
+ unsigned int *op = (unsigned *)&replaced;
+ unsigned char jmp[8];
+ unsigned long *ptr = (unsigned long *)&jmp;
+ unsigned long ip = rec->ip;
+ unsigned long tramp;
+ int offset;
+
+ /* read where this goes */
+ if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE))
+ return -EFAULT;
+
+ /* Make sure that that this is still a 24bit jump */
+ if (!is_bl_op(*op)) {
+ printk(KERN_ERR "Not expected bl: opcode is %x\n", *op);
+ return -EINVAL;
+ }
+
+ /* lets find where the pointer goes */
+ tramp = find_bl_target(ip, *op);
+
+ /*
+ * On PPC64 the trampoline looks like:
+ * 0x3d, 0x82, 0x00, 0x00, addis r12,r2, <high>
+ * 0x39, 0x8c, 0x00, 0x00, addi r12,r12, <low>
+ * Where the bytes 2,3,6 and 7 make up the 32bit offset
+ * to the TOC that holds the pointer.
+ * to jump to.
+ * 0xf8, 0x41, 0x00, 0x28, std r2,40(r1)
+ * 0xe9, 0x6c, 0x00, 0x20, ld r11,32(r12)
+ * The actually address is 32 bytes from the offset
+ * into the TOC.
+ * 0xe8, 0x4c, 0x00, 0x28, ld r2,40(r12)
+ */
+
+ DEBUGP("ip:%lx jumps to %lx r2: %lx", ip, tramp, mod->arch.toc);
+
+ /* Find where the trampoline jumps to */
+ if (probe_kernel_read(jmp, (void *)tramp, 8)) {
+ printk(KERN_ERR "Failed to read %lx\n", tramp);
+ return -EFAULT;
+ }
+
+ DEBUGP(" %08x %08x",
+ (unsigned)(*ptr >> 32),
+ (unsigned)*ptr);
+
+ offset = (unsigned)jmp[2] << 24 |
+ (unsigned)jmp[3] << 16 |
+ (unsigned)jmp[6] << 8 |
+ (unsigned)jmp[7];
+
+ DEBUGP(" %x ", offset);
+
+ /* get the address this jumps too */
+ tramp = mod->arch.toc + offset + 32;
+ DEBUGP("toc: %lx", tramp);
+
+ if (probe_kernel_read(jmp, (void *)tramp, 8)) {
+ printk(KERN_ERR "Failed to read %lx\n", tramp);
+ return -EFAULT;
+ }
+
+ DEBUGP(" %08x %08x\n",
+ (unsigned)(*ptr >> 32),
+ (unsigned)*ptr);
+
+ /* This should match what was called */
+ if (*ptr != GET_ADDR(addr)) {
+ printk(KERN_ERR "addr does not match %lx\n", *ptr);
+ return -EINVAL;
+ }
+
+ /*
+ * We want to nop the line, but the next line is
+ * 0xe8, 0x41, 0x00, 0x28 ld r2,40(r1)
+ * This needs to be turned to a nop too.
+ */
+ if (probe_kernel_read(replaced, (void *)(ip+4), MCOUNT_INSN_SIZE))
+ return -EFAULT;
+
+ if (*op != 0xe8410028) {
+ printk(KERN_ERR "Next line is not ld! (%08x)\n", *op);
+ return -EINVAL;
+ }
+
+ /*
+ * Milton Miller pointed out that we can not blindly do nops.
+ * If a task was preempted when calling a trace function,
+ * the nops will remove the way to restore the TOC in r2
+ * and the r2 TOC will get corrupted.
+ */
+
+ /*
+ * Replace:
+ * bl <tramp> <==== will be replaced with "b 1f"
+ * ld r2,40(r1)
+ * 1:
+ */
+ op[0] = 0x48000008; /* b +8 */
+
+ if (probe_kernel_write((void *)ip, replaced, MCOUNT_INSN_SIZE))
+ return -EPERM;
+
+ return 0;
+}
+
+#else /* !PPC64 */
+static int
+__ftrace_make_nop(struct module *mod,
+ struct dyn_ftrace *rec, unsigned long addr)
+{
+ unsigned char replaced[MCOUNT_INSN_SIZE];
+ unsigned int *op = (unsigned *)&replaced;
+ unsigned char jmp[8];
+ unsigned int *ptr = (unsigned int *)&jmp;
+ unsigned long ip = rec->ip;
+ unsigned long tramp;
+ int offset;
+
+ if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE))
+ return -EFAULT;
+
+ /* Make sure that that this is still a 24bit jump */
+ if (!is_bl_op(*op)) {
+ printk(KERN_ERR "Not expected bl: opcode is %x\n", *op);
+ return -EINVAL;
+ }
+
+ /* lets find where the pointer goes */
+ tramp = find_bl_target(ip, *op);
+
+ /*
+ * On PPC32 the trampoline looks like:
+ * lis r11,sym@ha
+ * addi r11,r11,sym@l
+ * mtctr r11
+ * bctr
+ */
+
+ DEBUGP("ip:%lx jumps to %lx", ip, tramp);
+
+ /* Find where the trampoline jumps to */
+ if (probe_kernel_read(jmp, (void *)tramp, 8)) {
+ printk(KERN_ERR "Failed to read %lx\n", tramp);
+ return -EFAULT;
+ }
+
+ DEBUGP(" %08x %08x ", ptr[0], ptr[1]);
+
+ tramp = (ptr[1] & 0xffff) |
+ ((ptr[0] & 0xffff) << 16);
+ if (tramp & 0x8000)
+ tramp -= 0x10000;
+
+ DEBUGP(" %x ", tramp);
+
+ if (tramp != addr) {
+ printk(KERN_ERR
+ "Trampoline location %08lx does not match addr\n",
+ tramp);
+ return -EINVAL;
+ }
+
+ op[0] = PPC_NOP_INSTR;
+
+ if (probe_kernel_write((void *)ip, replaced, MCOUNT_INSN_SIZE))
+ return -EPERM;
+
+ return 0;
+}
+#endif /* PPC64 */
+
+int ftrace_make_nop(struct module *mod,
+ struct dyn_ftrace *rec, unsigned long addr)
+{
+ unsigned char *old, *new;
+ unsigned long ip = rec->ip;
+
+ /*
+ * If the calling address is more that 24 bits away,
+ * then we had to use a trampoline to make the call.
+ * Otherwise just update the call site.
+ */
+ if (test_24bit_addr(ip, addr)) {
+ /* within range */
+ old = ftrace_call_replace(ip, addr);
+ new = ftrace_nop_replace();
+ return ftrace_modify_code(ip, old, new);
+ }
+
+ /*
+ * Out of range jumps are called from modules.
+ * We should either already have a pointer to the module
+ * or it has been passed in.
+ */
+ if (!rec->arch.mod) {
+ if (!mod) {
+ printk(KERN_ERR "No module loaded addr=%lx\n",
+ addr);
+ return -EFAULT;
+ }
+ rec->arch.mod = mod;
+ } else if (mod) {
+ if (mod != rec->arch.mod) {
+ printk(KERN_ERR
+ "Record mod %p not equal to passed in mod %p\n",
+ rec->arch.mod, mod);
+ return -EINVAL;
+ }
+ /* nothing to do if mod == rec->arch.mod */
+ } else
+ mod = rec->arch.mod;
+
+ return __ftrace_make_nop(mod, rec, addr);
+
+}
+
+#ifdef CONFIG_PPC64
+static int
+__ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
+{
+ unsigned char replaced[MCOUNT_INSN_SIZE * 2];
+ unsigned int *op = (unsigned *)&replaced;
+ unsigned long ip = rec->ip;
+ unsigned long offset;
+
+ /* read where this goes */
+ if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE * 2))
+ return -EFAULT;
+
+ /*
+ * It should be pointing to two nops or
+ * b +8; ld r2,40(r1)
+ */
+ if (((op[0] != 0x48000008) || (op[1] != 0xe8410028)) &&
+ ((op[0] != PPC_NOP_INSTR) || (op[1] != PPC_NOP_INSTR))) {
+ printk(KERN_ERR "Expected NOPs but have %x %x\n", op[0], op[1]);
+ return -EINVAL;
+ }
+
+ /* If we never set up a trampoline to ftrace_caller, then bail */
+ if (!rec->arch.mod->arch.tramp) {
+ printk(KERN_ERR "No ftrace trampoline\n");
+ return -EINVAL;
+ }
+
+ /* now calculate a jump to the ftrace caller trampoline */
+ offset = rec->arch.mod->arch.tramp - ip;
+
+ if (test_offset(offset)) {
+ printk(KERN_ERR "REL24 %li out of range!\n",
+ (long int)offset);
+ return -EINVAL;
+ }
+
+ /* Set to "bl addr" */
+ op[0] = branch_offset(offset);
+ /* ld r2,40(r1) */
+ op[1] = 0xe8410028;
+
+ DEBUGP("write to %lx\n", rec->ip);
+
+ if (probe_kernel_write((void *)ip, replaced, MCOUNT_INSN_SIZE * 2))
+ return -EPERM;
+
+ return 0;
+}
+#else
+static int
+__ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
+{
+ unsigned char replaced[MCOUNT_INSN_SIZE];
+ unsigned int *op = (unsigned *)&replaced;
+ unsigned long ip = rec->ip;
+ unsigned long offset;
+
+ /* read where this goes */
+ if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE))
+ return -EFAULT;
+
+ /* It should be pointing to a nop */
+ if (op[0] != PPC_NOP_INSTR) {
+ printk(KERN_ERR "Expected NOP but have %x\n", op[0]);
+ return -EINVAL;
+ }
+
+ /* If we never set up a trampoline to ftrace_caller, then bail */
+ if (!rec->arch.mod->arch.tramp) {
+ printk(KERN_ERR "No ftrace trampoline\n");
+ return -EINVAL;
+ }
+
+ /* now calculate a jump to the ftrace caller trampoline */
+ offset = rec->arch.mod->arch.tramp - ip;
+
+ if (test_offset(offset)) {
+ printk(KERN_ERR "REL24 %li out of range!\n",
+ (long int)offset);
+ return -EINVAL;
+ }
+
+ /* Set to "bl addr" */
+ op[0] = branch_offset(offset);
+
+ DEBUGP("write to %lx\n", rec->ip);
+
+ if (probe_kernel_write((void *)ip, replaced, MCOUNT_INSN_SIZE))
+ return -EPERM;
+
+ return 0;
+}
+#endif /* CONFIG_PPC64 */
+
+int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
+{
+ unsigned char *old, *new;
+ unsigned long ip = rec->ip;
+
+ /*
+ * If the calling address is more that 24 bits away,
+ * then we had to use a trampoline to make the call.
+ * Otherwise just update the call site.
+ */
+ if (test_24bit_addr(ip, addr)) {
+ /* within range */
+ old = ftrace_nop_replace();
+ new = ftrace_call_replace(ip, addr);
+ return ftrace_modify_code(ip, old, new);
+ }
+
+ /*
+ * Out of range jumps are called from modules.
+ * Being that we are converting from nop, it had better
+ * already have a module defined.
+ */
+ if (!rec->arch.mod) {
+ printk(KERN_ERR "No module loaded\n");
+ return -EINVAL;
+ }
+
+ return __ftrace_make_call(rec, addr);
}
int ftrace_update_ftrace_func(ftrace_func_t func)
@@ -128,10 +521,10 @@
int __init ftrace_dyn_arch_init(void *data)
{
- /* This is running in kstop_machine */
+ /* caller expects data to be zero */
+ unsigned long *p = data;
- ftrace_mcount_set(data);
+ *p = 0;
return 0;
}
-
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index 31982d0..88d9c1d 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -69,10 +69,15 @@
smp_mb();
local_irq_disable();
+ /* Don't trace irqs off for idle */
+ stop_critical_timings();
+
/* check again after disabling irqs */
if (!need_resched() && !cpu_should_die())
ppc_md.power_save();
+ start_critical_timings();
+
local_irq_enable();
set_thread_flag(TIF_POLLING_NRFLAG);
diff --git a/arch/powerpc/kernel/module_32.c b/arch/powerpc/kernel/module_32.c
index 2df91a0..f832773 100644
--- a/arch/powerpc/kernel/module_32.c
+++ b/arch/powerpc/kernel/module_32.c
@@ -22,6 +22,7 @@
#include <linux/fs.h>
#include <linux/string.h>
#include <linux/kernel.h>
+#include <linux/ftrace.h>
#include <linux/cache.h>
#include <linux/bug.h>
#include <linux/sort.h>
@@ -53,6 +54,9 @@
r_addend = rela[i].r_addend;
}
+#ifdef CONFIG_DYNAMIC_FTRACE
+ _count_relocs++; /* add one for ftrace_caller */
+#endif
return _count_relocs;
}
@@ -306,5 +310,11 @@
return -ENOEXEC;
}
}
+#ifdef CONFIG_DYNAMIC_FTRACE
+ module->arch.tramp =
+ do_plt_call(module->module_core,
+ (unsigned long)ftrace_caller,
+ sechdrs, module);
+#endif
return 0;
}
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 1af2377..8992b03 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -20,6 +20,7 @@
#include <linux/moduleloader.h>
#include <linux/err.h>
#include <linux/vmalloc.h>
+#include <linux/ftrace.h>
#include <linux/bug.h>
#include <asm/module.h>
#include <asm/firmware.h>
@@ -163,6 +164,11 @@
}
}
+#ifdef CONFIG_DYNAMIC_FTRACE
+ /* make the trampoline to the ftrace_caller */
+ relocs++;
+#endif
+
DEBUGP("Looks like a total of %lu stubs, max\n", relocs);
return relocs * sizeof(struct ppc64_stub_entry);
}
@@ -441,5 +447,12 @@
}
}
+#ifdef CONFIG_DYNAMIC_FTRACE
+ me->arch.toc = my_r2(sechdrs, me);
+ me->arch.tramp = stub_for_addr(sechdrs,
+ (unsigned long)ftrace_caller,
+ me);
+#endif
+
return 0;
}
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ac22bb7..e49a4fd 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -29,11 +29,14 @@
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_DYNAMIC_FTRACE
select HAVE_FUNCTION_TRACER
+ select HAVE_FUNCTION_RET_TRACER if X86_32
+ select HAVE_FUNCTION_TRACE_MCOUNT_TEST
select HAVE_KVM if ((X86_32 && !X86_VOYAGER && !X86_VISWS && !X86_NUMAQ) || X86_64)
select HAVE_ARCH_KGDB if !X86_VOYAGER
select HAVE_ARCH_TRACEHOOK
select HAVE_GENERIC_DMA_COHERENT if X86_32
select HAVE_EFFICIENT_UNALIGNED_ACCESS
+ select USER_STACKTRACE_SUPPORT
config ARCH_DEFCONFIG
string
diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 2a3dfbd..fa013f5 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -186,14 +186,10 @@
Add a simple leak tracer to the IOMMU code. This is useful when you
are debugging a buggy device driver that leaks IOMMU mappings.
-config MMIOTRACE_HOOKS
- bool
-
config MMIOTRACE
bool "Memory mapped IO tracing"
depends on DEBUG_KERNEL && PCI
select TRACING
- select MMIOTRACE_HOOKS
help
Mmiotrace traces Memory Mapped I/O access and is meant for
debugging and reverse engineering. It is called from the ioremap
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 9e8bc29..754a3e0 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -17,8 +17,40 @@
*/
return addr - 1;
}
-#endif
+#ifdef CONFIG_DYNAMIC_FTRACE
+
+struct dyn_arch_ftrace {
+ /* No extra data needed for x86 */
+};
+
+#endif /* CONFIG_DYNAMIC_FTRACE */
+#endif /* __ASSEMBLY__ */
#endif /* CONFIG_FUNCTION_TRACER */
+#ifdef CONFIG_FUNCTION_RET_TRACER
+
+#ifndef __ASSEMBLY__
+
+/*
+ * Stack of return addresses for functions
+ * of a thread.
+ * Used in struct thread_info
+ */
+struct ftrace_ret_stack {
+ unsigned long ret;
+ unsigned long func;
+ unsigned long long calltime;
+};
+
+/*
+ * Primary handler of a function return.
+ * It relays on ftrace_return_to_handler.
+ * Defined in entry32.S
+ */
+extern void return_to_handler(void);
+
+#endif /* __ASSEMBLY__ */
+#endif /* CONFIG_FUNCTION_RET_TRACER */
+
#endif /* _ASM_X86_FTRACE_H */
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index e44d379..0921b40 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -20,6 +20,8 @@
struct task_struct;
struct exec_domain;
#include <asm/processor.h>
+#include <asm/ftrace.h>
+#include <asm/atomic.h>
struct thread_info {
struct task_struct *task; /* main task structure */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index e489ff9..1d8ed95 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -14,6 +14,11 @@
CFLAGS_REMOVE_ftrace.o = -pg
endif
+ifdef CONFIG_FUNCTION_RET_TRACER
+# Don't trace __switch_to() but let it for function tracer
+CFLAGS_REMOVE_process_32.o = -pg
+endif
+
#
# vsyscalls (which work on the user stack) should have
# no stack-protector checks:
@@ -65,6 +70,7 @@
obj-$(CONFIG_X86_IO_APIC) += io_apic.o
obj-$(CONFIG_X86_REBOOTFIXUPS) += reboot_fixups_32.o
obj-$(CONFIG_DYNAMIC_FTRACE) += ftrace.o
+obj-$(CONFIG_FUNCTION_RET_TRACER) += ftrace.o
obj-$(CONFIG_KEXEC) += machine_kexec_$(BITS).o
obj-$(CONFIG_KEXEC) += relocate_kernel_$(BITS).o crash.o
obj-$(CONFIG_CRASH_DUMP) += crash_dump_$(BITS).o
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 28b597e..74defe2 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -1157,6 +1157,9 @@
END(mcount)
ENTRY(ftrace_caller)
+ cmpl $0, function_trace_stop
+ jne ftrace_stub
+
pushl %eax
pushl %ecx
pushl %edx
@@ -1180,8 +1183,15 @@
#else /* ! CONFIG_DYNAMIC_FTRACE */
ENTRY(mcount)
+ cmpl $0, function_trace_stop
+ jne ftrace_stub
+
cmpl $ftrace_stub, ftrace_trace_function
jnz trace
+#ifdef CONFIG_FUNCTION_RET_TRACER
+ cmpl $ftrace_stub, ftrace_function_return
+ jnz ftrace_return_caller
+#endif
.globl ftrace_stub
ftrace_stub:
ret
@@ -1200,12 +1210,42 @@
popl %edx
popl %ecx
popl %eax
-
jmp ftrace_stub
END(mcount)
#endif /* CONFIG_DYNAMIC_FTRACE */
#endif /* CONFIG_FUNCTION_TRACER */
+#ifdef CONFIG_FUNCTION_RET_TRACER
+ENTRY(ftrace_return_caller)
+ cmpl $0, function_trace_stop
+ jne ftrace_stub
+
+ pushl %eax
+ pushl %ecx
+ pushl %edx
+ movl 0xc(%esp), %edx
+ lea 0x4(%ebp), %eax
+ call prepare_ftrace_return
+ popl %edx
+ popl %ecx
+ popl %eax
+ ret
+END(ftrace_return_caller)
+
+.globl return_to_handler
+return_to_handler:
+ pushl $0
+ pushl %eax
+ pushl %ecx
+ pushl %edx
+ call ftrace_return_to_handler
+ movl %eax, 0xc(%esp)
+ popl %edx
+ popl %ecx
+ popl %eax
+ ret
+#endif
+
.section .rodata,"a"
#include "syscall_table_32.S"
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index b86f332..08aa6b1 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -68,6 +68,8 @@
END(mcount)
ENTRY(ftrace_caller)
+ cmpl $0, function_trace_stop
+ jne ftrace_stub
/* taken from glibc */
subq $0x38, %rsp
@@ -103,6 +105,9 @@
#else /* ! CONFIG_DYNAMIC_FTRACE */
ENTRY(mcount)
+ cmpl $0, function_trace_stop
+ jne ftrace_stub
+
cmpq $ftrace_stub, ftrace_trace_function
jnz trace
.globl ftrace_stub
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 50ea0ac..bb137f7 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -14,14 +14,17 @@
#include <linux/uaccess.h>
#include <linux/ftrace.h>
#include <linux/percpu.h>
+#include <linux/sched.h>
#include <linux/init.h>
#include <linux/list.h>
#include <asm/ftrace.h>
+#include <linux/ftrace.h>
#include <asm/nops.h>
+#include <asm/nmi.h>
-static unsigned char ftrace_nop[MCOUNT_INSN_SIZE];
+#ifdef CONFIG_DYNAMIC_FTRACE
union ftrace_code_union {
char code[MCOUNT_INSN_SIZE];
@@ -31,18 +34,12 @@
} __attribute__((packed));
};
-
static int ftrace_calc_offset(long ip, long addr)
{
return (int)(addr - ip);
}
-unsigned char *ftrace_nop_replace(void)
-{
- return ftrace_nop;
-}
-
-unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr)
+static unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr)
{
static union ftrace_code_union calc;
@@ -56,7 +53,143 @@
return calc.code;
}
-int
+/*
+ * Modifying code must take extra care. On an SMP machine, if
+ * the code being modified is also being executed on another CPU
+ * that CPU will have undefined results and possibly take a GPF.
+ * We use kstop_machine to stop other CPUS from exectuing code.
+ * But this does not stop NMIs from happening. We still need
+ * to protect against that. We separate out the modification of
+ * the code to take care of this.
+ *
+ * Two buffers are added: An IP buffer and a "code" buffer.
+ *
+ * 1) Put the instruction pointer into the IP buffer
+ * and the new code into the "code" buffer.
+ * 2) Set a flag that says we are modifying code
+ * 3) Wait for any running NMIs to finish.
+ * 4) Write the code
+ * 5) clear the flag.
+ * 6) Wait for any running NMIs to finish.
+ *
+ * If an NMI is executed, the first thing it does is to call
+ * "ftrace_nmi_enter". This will check if the flag is set to write
+ * and if it is, it will write what is in the IP and "code" buffers.
+ *
+ * The trick is, it does not matter if everyone is writing the same
+ * content to the code location. Also, if a CPU is executing code
+ * it is OK to write to that code location if the contents being written
+ * are the same as what exists.
+ */
+
+static atomic_t in_nmi = ATOMIC_INIT(0);
+static int mod_code_status; /* holds return value of text write */
+static int mod_code_write; /* set when NMI should do the write */
+static void *mod_code_ip; /* holds the IP to write to */
+static void *mod_code_newcode; /* holds the text to write to the IP */
+
+static unsigned nmi_wait_count;
+static atomic_t nmi_update_count = ATOMIC_INIT(0);
+
+int ftrace_arch_read_dyn_info(char *buf, int size)
+{
+ int r;
+
+ r = snprintf(buf, size, "%u %u",
+ nmi_wait_count,
+ atomic_read(&nmi_update_count));
+ return r;
+}
+
+static void ftrace_mod_code(void)
+{
+ /*
+ * Yes, more than one CPU process can be writing to mod_code_status.
+ * (and the code itself)
+ * But if one were to fail, then they all should, and if one were
+ * to succeed, then they all should.
+ */
+ mod_code_status = probe_kernel_write(mod_code_ip, mod_code_newcode,
+ MCOUNT_INSN_SIZE);
+
+}
+
+void ftrace_nmi_enter(void)
+{
+ atomic_inc(&in_nmi);
+ /* Must have in_nmi seen before reading write flag */
+ smp_mb();
+ if (mod_code_write) {
+ ftrace_mod_code();
+ atomic_inc(&nmi_update_count);
+ }
+}
+
+void ftrace_nmi_exit(void)
+{
+ /* Finish all executions before clearing in_nmi */
+ smp_wmb();
+ atomic_dec(&in_nmi);
+}
+
+static void wait_for_nmi(void)
+{
+ int waited = 0;
+
+ while (atomic_read(&in_nmi)) {
+ waited = 1;
+ cpu_relax();
+ }
+
+ if (waited)
+ nmi_wait_count++;
+}
+
+static int
+do_ftrace_mod_code(unsigned long ip, void *new_code)
+{
+ mod_code_ip = (void *)ip;
+ mod_code_newcode = new_code;
+
+ /* The buffers need to be visible before we let NMIs write them */
+ smp_wmb();
+
+ mod_code_write = 1;
+
+ /* Make sure write bit is visible before we wait on NMIs */
+ smp_mb();
+
+ wait_for_nmi();
+
+ /* Make sure all running NMIs have finished before we write the code */
+ smp_mb();
+
+ ftrace_mod_code();
+
+ /* Make sure the write happens before clearing the bit */
+ smp_wmb();
+
+ mod_code_write = 0;
+
+ /* make sure NMIs see the cleared bit */
+ smp_mb();
+
+ wait_for_nmi();
+
+ return mod_code_status;
+}
+
+
+
+
+static unsigned char ftrace_nop[MCOUNT_INSN_SIZE];
+
+static unsigned char *ftrace_nop_replace(void)
+{
+ return ftrace_nop;
+}
+
+static int
ftrace_modify_code(unsigned long ip, unsigned char *old_code,
unsigned char *new_code)
{
@@ -81,7 +214,7 @@
return -EINVAL;
/* replace the text with the new text */
- if (probe_kernel_write((void *)ip, new_code, MCOUNT_INSN_SIZE))
+ if (do_ftrace_mod_code(ip, new_code))
return -EPERM;
sync_core();
@@ -89,6 +222,29 @@
return 0;
}
+int ftrace_make_nop(struct module *mod,
+ struct dyn_ftrace *rec, unsigned long addr)
+{
+ unsigned char *new, *old;
+ unsigned long ip = rec->ip;
+
+ old = ftrace_call_replace(ip, addr);
+ new = ftrace_nop_replace();
+
+ return ftrace_modify_code(rec->ip, old, new);
+}
+
+int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
+{
+ unsigned char *new, *old;
+ unsigned long ip = rec->ip;
+
+ old = ftrace_nop_replace();
+ new = ftrace_call_replace(ip, addr);
+
+ return ftrace_modify_code(rec->ip, old, new);
+}
+
int ftrace_update_ftrace_func(ftrace_func_t func)
{
unsigned long ip = (unsigned long)(&ftrace_call);
@@ -165,3 +321,139 @@
return 0;
}
+#endif
+
+#ifdef CONFIG_FUNCTION_RET_TRACER
+
+#ifndef CONFIG_DYNAMIC_FTRACE
+
+/*
+ * These functions are picked from those used on
+ * this page for dynamic ftrace. They have been
+ * simplified to ignore all traces in NMI context.
+ */
+static atomic_t in_nmi;
+
+void ftrace_nmi_enter(void)
+{
+ atomic_inc(&in_nmi);
+}
+
+void ftrace_nmi_exit(void)
+{
+ atomic_dec(&in_nmi);
+}
+#endif /* !CONFIG_DYNAMIC_FTRACE */
+
+/* Add a function return address to the trace stack on thread info.*/
+static int push_return_trace(unsigned long ret, unsigned long long time,
+ unsigned long func)
+{
+ int index;
+
+ if (!current->ret_stack)
+ return -EBUSY;
+
+ /* The return trace stack is full */
+ if (current->curr_ret_stack == FTRACE_RETFUNC_DEPTH - 1) {
+ atomic_inc(¤t->trace_overrun);
+ return -EBUSY;
+ }
+
+ index = ++current->curr_ret_stack;
+ barrier();
+ current->ret_stack[index].ret = ret;
+ current->ret_stack[index].func = func;
+ current->ret_stack[index].calltime = time;
+
+ return 0;
+}
+
+/* Retrieve a function return address to the trace stack on thread info.*/
+static void pop_return_trace(unsigned long *ret, unsigned long long *time,
+ unsigned long *func, unsigned long *overrun)
+{
+ int index;
+
+ index = current->curr_ret_stack;
+ *ret = current->ret_stack[index].ret;
+ *func = current->ret_stack[index].func;
+ *time = current->ret_stack[index].calltime;
+ *overrun = atomic_read(¤t->trace_overrun);
+ current->curr_ret_stack--;
+}
+
+/*
+ * Send the trace to the ring-buffer.
+ * @return the original return address.
+ */
+unsigned long ftrace_return_to_handler(void)
+{
+ struct ftrace_retfunc trace;
+ pop_return_trace(&trace.ret, &trace.calltime, &trace.func,
+ &trace.overrun);
+ trace.rettime = cpu_clock(raw_smp_processor_id());
+ ftrace_function_return(&trace);
+
+ return trace.ret;
+}
+
+/*
+ * Hook the return address and push it in the stack of return addrs
+ * in current thread info.
+ */
+void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr)
+{
+ unsigned long old;
+ unsigned long long calltime;
+ int faulted;
+ unsigned long return_hooker = (unsigned long)
+ &return_to_handler;
+
+ /* Nmi's are currently unsupported */
+ if (atomic_read(&in_nmi))
+ return;
+
+ /*
+ * Protect against fault, even if it shouldn't
+ * happen. This tool is too much intrusive to
+ * ignore such a protection.
+ */
+ asm volatile(
+ "1: movl (%[parent_old]), %[old]\n"
+ "2: movl %[return_hooker], (%[parent_replaced])\n"
+ " movl $0, %[faulted]\n"
+
+ ".section .fixup, \"ax\"\n"
+ "3: movl $1, %[faulted]\n"
+ ".previous\n"
+
+ ".section __ex_table, \"a\"\n"
+ " .long 1b, 3b\n"
+ " .long 2b, 3b\n"
+ ".previous\n"
+
+ : [parent_replaced] "=r" (parent), [old] "=r" (old),
+ [faulted] "=r" (faulted)
+ : [parent_old] "0" (parent), [return_hooker] "r" (return_hooker)
+ : "memory"
+ );
+
+ if (WARN_ON(faulted)) {
+ unregister_ftrace_return();
+ return;
+ }
+
+ if (WARN_ON(!__kernel_text_address(old))) {
+ unregister_ftrace_return();
+ *parent = old;
+ return;
+ }
+
+ calltime = cpu_clock(raw_smp_processor_id());
+
+ if (push_return_trace(old, calltime, self_addr) == -EBUSY)
+ *parent = old;
+}
+
+#endif /* CONFIG_FUNCTION_RET_TRACER */
diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index a03e7f6..10786af 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -6,6 +6,7 @@
#include <linux/sched.h>
#include <linux/stacktrace.h>
#include <linux/module.h>
+#include <linux/uaccess.h>
#include <asm/stacktrace.h>
static void save_stack_warning(void *data, char *msg)
@@ -83,3 +84,66 @@
trace->entries[trace->nr_entries++] = ULONG_MAX;
}
EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
+
+/* Userspace stacktrace - based on kernel/trace/trace_sysprof.c */
+
+struct stack_frame {
+ const void __user *next_fp;
+ unsigned long ret_addr;
+};
+
+static int copy_stack_frame(const void __user *fp, struct stack_frame *frame)
+{
+ int ret;
+
+ if (!access_ok(VERIFY_READ, fp, sizeof(*frame)))
+ return 0;
+
+ ret = 1;
+ pagefault_disable();
+ if (__copy_from_user_inatomic(frame, fp, sizeof(*frame)))
+ ret = 0;
+ pagefault_enable();
+
+ return ret;
+}
+
+static inline void __save_stack_trace_user(struct stack_trace *trace)
+{
+ const struct pt_regs *regs = task_pt_regs(current);
+ const void __user *fp = (const void __user *)regs->bp;
+
+ if (trace->nr_entries < trace->max_entries)
+ trace->entries[trace->nr_entries++] = regs->ip;
+
+ while (trace->nr_entries < trace->max_entries) {
+ struct stack_frame frame;
+
+ frame.next_fp = NULL;
+ frame.ret_addr = 0;
+ if (!copy_stack_frame(fp, &frame))
+ break;
+ if ((unsigned long)fp < regs->sp)
+ break;
+ if (frame.ret_addr) {
+ trace->entries[trace->nr_entries++] =
+ frame.ret_addr;
+ }
+ if (fp == frame.next_fp)
+ break;
+ fp = frame.next_fp;
+ }
+}
+
+void save_stack_trace_user(struct stack_trace *trace)
+{
+ /*
+ * Trace user stack if we are not a kernel thread
+ */
+ if (current->mm) {
+ __save_stack_trace_user(trace);
+ }
+ if (trace->nr_entries < trace->max_entries)
+ trace->entries[trace->nr_entries++] = ULONG_MAX;
+}
+
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 0b8b6690..6f3d3d4 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -17,6 +17,9 @@
* want per guest time just set the kernel.vsyscall64 sysctl to 0.
*/
+/* Disable profiling for userspace code: */
+#define DISABLE_BRANCH_PROFILING
+
#include <linux/time.h>
#include <linux/init.h>
#include <linux/kernel.h>
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index fea4565..d8cc96a2 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -8,9 +8,8 @@
obj-$(CONFIG_HIGHMEM) += highmem_32.o
-obj-$(CONFIG_MMIOTRACE_HOOKS) += kmmio.o
obj-$(CONFIG_MMIOTRACE) += mmiotrace.o
-mmiotrace-y := pf_in.o mmio-mod.o
+mmiotrace-y := kmmio.o pf_in.o mmio-mod.o
obj-$(CONFIG_MMIOTRACE_TEST) += testmmiotrace.o
obj-$(CONFIG_NUMA) += numa_$(BITS).o
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 31e8730..4152d3c3 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -53,7 +53,7 @@
static inline int kmmio_fault(struct pt_regs *regs, unsigned long addr)
{
-#ifdef CONFIG_MMIOTRACE_HOOKS
+#ifdef CONFIG_MMIOTRACE
if (unlikely(is_kmmio_active()))
if (kmmio_handler(regs, addr) == 1)
return -1;
diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index 1ef0f90..d9d3582 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -9,6 +9,9 @@
* Also alternative() doesn't work.
*/
+/* Disable profiling for userspace code: */
+#define DISABLE_BRANCH_PROFILING
+
#include <linux/kernel.h>
#include <linux/posix-timers.h>
#include <linux/time.h>
diff --git a/drivers/char/sysrq.c b/drivers/char/sysrq.c
index ce0d9da..94966ed 100644
--- a/drivers/char/sysrq.c
+++ b/drivers/char/sysrq.c
@@ -274,6 +274,22 @@
.enable_mask = SYSRQ_ENABLE_DUMP,
};
+#ifdef CONFIG_TRACING
+#include <linux/ftrace.h>
+
+static void sysrq_ftrace_dump(int key, struct tty_struct *tty)
+{
+ ftrace_dump();
+}
+static struct sysrq_key_op sysrq_ftrace_dump_op = {
+ .handler = sysrq_ftrace_dump,
+ .help_msg = "dumpZ-ftrace-buffer",
+ .action_msg = "Dump ftrace buffer",
+ .enable_mask = SYSRQ_ENABLE_DUMP,
+};
+#else
+#define sysrq_ftrace_dump_op (*(struct sysrq_key_op *)0)
+#endif
static void sysrq_handle_showmem(int key, struct tty_struct *tty)
{
@@ -406,7 +422,7 @@
NULL, /* x */
/* y: May be registered on sparc64 for global register dump */
NULL, /* y */
- NULL /* z */
+ &sysrq_ftrace_dump_op, /* z */
};
/* key2index calculation, -1 on invalid index */
diff --git a/fs/seq_file.c b/fs/seq_file.c
index eba2eab..f03220d7 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -357,7 +357,18 @@
}
EXPORT_SYMBOL(seq_printf);
-static char *mangle_path(char *s, char *p, char *esc)
+/**
+ * mangle_path - mangle and copy path to buffer beginning
+ * @s: buffer start
+ * @p: beginning of path in above buffer
+ * @esc: set of characters that need escaping
+ *
+ * Copy the path from @p to @s, replacing each occurrence of character from
+ * @esc with usual octal escape.
+ * Returns pointer past last written character in @s, or NULL in case of
+ * failure.
+ */
+char *mangle_path(char *s, char *p, char *esc)
{
while (s <= p) {
char c = *p++;
@@ -376,6 +387,7 @@
}
return NULL;
}
+EXPORT_SYMBOL_GPL(mangle_path);
/*
* return the absolute path of 'dentry' residing in mount 'mnt'.
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 80744606..eba835a 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -45,6 +45,22 @@
#define MCOUNT_REC()
#endif
+#ifdef CONFIG_TRACE_BRANCH_PROFILING
+#define LIKELY_PROFILE() VMLINUX_SYMBOL(__start_annotated_branch_profile) = .; \
+ *(_ftrace_annotated_branch) \
+ VMLINUX_SYMBOL(__stop_annotated_branch_profile) = .;
+#else
+#define LIKELY_PROFILE()
+#endif
+
+#ifdef CONFIG_PROFILE_ALL_BRANCHES
+#define BRANCH_PROFILE() VMLINUX_SYMBOL(__start_branch_profile) = .; \
+ *(_ftrace_branch) \
+ VMLINUX_SYMBOL(__stop_branch_profile) = .;
+#else
+#define BRANCH_PROFILE()
+#endif
+
/* .data section */
#define DATA_DATA \
*(.data) \
@@ -60,9 +76,12 @@
VMLINUX_SYMBOL(__start___markers) = .; \
*(__markers) \
VMLINUX_SYMBOL(__stop___markers) = .; \
+ . = ALIGN(32); \
VMLINUX_SYMBOL(__start___tracepoints) = .; \
*(__tracepoints) \
- VMLINUX_SYMBOL(__stop___tracepoints) = .;
+ VMLINUX_SYMBOL(__stop___tracepoints) = .; \
+ LIKELY_PROFILE() \
+ BRANCH_PROFILE()
#define RO_DATA(align) \
. = ALIGN((align)); \
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 98115d9..ea7c6be 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -59,8 +59,88 @@
* specific implementations come from the above header files
*/
-#define likely(x) __builtin_expect(!!(x), 1)
-#define unlikely(x) __builtin_expect(!!(x), 0)
+struct ftrace_branch_data {
+ const char *func;
+ const char *file;
+ unsigned line;
+ union {
+ struct {
+ unsigned long correct;
+ unsigned long incorrect;
+ };
+ struct {
+ unsigned long miss;
+ unsigned long hit;
+ };
+ };
+};
+
+/*
+ * Note: DISABLE_BRANCH_PROFILING can be used by special lowlevel code
+ * to disable branch tracing on a per file basis.
+ */
+#if defined(CONFIG_TRACE_BRANCH_PROFILING) && !defined(DISABLE_BRANCH_PROFILING)
+void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
+
+#define likely_notrace(x) __builtin_expect(!!(x), 1)
+#define unlikely_notrace(x) __builtin_expect(!!(x), 0)
+
+#define __branch_check__(x, expect) ({ \
+ int ______r; \
+ static struct ftrace_branch_data \
+ __attribute__((__aligned__(4))) \
+ __attribute__((section("_ftrace_annotated_branch"))) \
+ ______f = { \
+ .func = __func__, \
+ .file = __FILE__, \
+ .line = __LINE__, \
+ }; \
+ ______r = likely_notrace(x); \
+ ftrace_likely_update(&______f, ______r, expect); \
+ ______r; \
+ })
+
+/*
+ * Using __builtin_constant_p(x) to ignore cases where the return
+ * value is always the same. This idea is taken from a similar patch
+ * written by Daniel Walker.
+ */
+# ifndef likely
+# define likely(x) (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 1))
+# endif
+# ifndef unlikely
+# define unlikely(x) (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))
+# endif
+
+#ifdef CONFIG_PROFILE_ALL_BRANCHES
+/*
+ * "Define 'is'", Bill Clinton
+ * "Define 'if'", Steven Rostedt
+ */
+#define if(cond) if (__builtin_constant_p((cond)) ? !!(cond) : \
+ ({ \
+ int ______r; \
+ static struct ftrace_branch_data \
+ __attribute__((__aligned__(4))) \
+ __attribute__((section("_ftrace_branch"))) \
+ ______f = { \
+ .func = __func__, \
+ .file = __FILE__, \
+ .line = __LINE__, \
+ }; \
+ ______r = !!(cond); \
+ if (______r) \
+ ______f.hit++; \
+ else \
+ ______f.miss++; \
+ ______r; \
+ }))
+#endif /* CONFIG_PROFILE_ALL_BRANCHES */
+
+#else
+# define likely(x) __builtin_expect(!!(x), 1)
+# define unlikely(x) __builtin_expect(!!(x), 0)
+#endif
/* Optimization barrier */
#ifndef barrier
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 703eb53..7854d87 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -23,6 +23,45 @@
struct ftrace_ops *next;
};
+extern int function_trace_stop;
+
+/*
+ * Type of the current tracing.
+ */
+enum ftrace_tracing_type_t {
+ FTRACE_TYPE_ENTER = 0, /* Hook the call of the function */
+ FTRACE_TYPE_RETURN, /* Hook the return of the function */
+};
+
+/* Current tracing type, default is FTRACE_TYPE_ENTER */
+extern enum ftrace_tracing_type_t ftrace_tracing_type;
+
+/**
+ * ftrace_stop - stop function tracer.
+ *
+ * A quick way to stop the function tracer. Note this an on off switch,
+ * it is not something that is recursive like preempt_disable.
+ * This does not disable the calling of mcount, it only stops the
+ * calling of functions from mcount.
+ */
+static inline void ftrace_stop(void)
+{
+ function_trace_stop = 1;
+}
+
+/**
+ * ftrace_start - start the function tracer.
+ *
+ * This function is the inverse of ftrace_stop. This does not enable
+ * the function tracing if the function tracer is disabled. This only
+ * sets the function tracer flag to continue calling the functions
+ * from mcount.
+ */
+static inline void ftrace_start(void)
+{
+ function_trace_stop = 0;
+}
+
/*
* The ftrace_ops must be a static and should also
* be read_mostly. These functions do modify read_mostly variables
@@ -41,9 +80,13 @@
# define unregister_ftrace_function(ops) do { } while (0)
# define clear_ftrace_function(ops) do { } while (0)
static inline void ftrace_kill(void) { }
+static inline void ftrace_stop(void) { }
+static inline void ftrace_start(void) { }
#endif /* CONFIG_FUNCTION_TRACER */
#ifdef CONFIG_DYNAMIC_FTRACE
+/* asm/ftrace.h must be defined for archs supporting dynamic ftrace */
+#include <asm/ftrace.h>
enum {
FTRACE_FL_FREE = (1 << 0),
@@ -59,6 +102,7 @@
struct list_head list;
unsigned long ip; /* address of mcount call-site */
unsigned long flags;
+ struct dyn_arch_ftrace arch;
};
int ftrace_force_update(void);
@@ -66,19 +110,20 @@
/* defined in arch */
extern int ftrace_ip_converted(unsigned long ip);
-extern unsigned char *ftrace_nop_replace(void);
-extern unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr);
extern int ftrace_dyn_arch_init(void *data);
extern int ftrace_update_ftrace_func(ftrace_func_t func);
extern void ftrace_caller(void);
extern void ftrace_call(void);
extern void mcount_call(void);
+#ifdef CONFIG_FUNCTION_RET_TRACER
+extern void ftrace_return_caller(void);
+#endif
/**
- * ftrace_modify_code - modify code segment
- * @ip: the address of the code segment
- * @old_code: the contents of what is expected to be there
- * @new_code: the code to patch in
+ * ftrace_make_nop - convert code into top
+ * @mod: module structure if called by module load initialization
+ * @rec: the mcount call site record
+ * @addr: the address that the call site should be calling
*
* This is a very sensitive operation and great care needs
* to be taken by the arch. The operation should carefully
@@ -86,6 +131,8 @@
* what we expect it to be, and then on success of the compare,
* it should write to the location.
*
+ * The code segment at @rec->ip should be a caller to @addr
+ *
* Return must be:
* 0 on success
* -EFAULT on error reading the location
@@ -93,8 +140,34 @@
* -EPERM on error writing to the location
* Any other value will be considered a failure.
*/
-extern int ftrace_modify_code(unsigned long ip, unsigned char *old_code,
- unsigned char *new_code);
+extern int ftrace_make_nop(struct module *mod,
+ struct dyn_ftrace *rec, unsigned long addr);
+
+/**
+ * ftrace_make_call - convert a nop call site into a call to addr
+ * @rec: the mcount call site record
+ * @addr: the address that the call site should call
+ *
+ * This is a very sensitive operation and great care needs
+ * to be taken by the arch. The operation should carefully
+ * read the location, check to see if what is read is indeed
+ * what we expect it to be, and then on success of the compare,
+ * it should write to the location.
+ *
+ * The code segment at @rec->ip should be a nop
+ *
+ * Return must be:
+ * 0 on success
+ * -EFAULT on error reading the location
+ * -EINVAL on a failed compare of the contents
+ * -EPERM on error writing to the location
+ * Any other value will be considered a failure.
+ */
+extern int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr);
+
+
+/* May be defined in arch */
+extern int ftrace_arch_read_dyn_info(char *buf, int size);
extern int skip_trace(unsigned long ip);
@@ -102,7 +175,6 @@
extern void ftrace_disable_daemon(void);
extern void ftrace_enable_daemon(void);
-
#else
# define skip_trace(ip) ({ 0; })
# define ftrace_force_update() ({ 0; })
@@ -181,6 +253,12 @@
#endif
#ifdef CONFIG_TRACING
+extern int ftrace_dump_on_oops;
+
+extern void tracing_start(void);
+extern void tracing_stop(void);
+extern void ftrace_off_permanent(void);
+
extern void
ftrace_special(unsigned long arg1, unsigned long arg2, unsigned long arg3);
@@ -211,6 +289,9 @@
static inline int
ftrace_printk(const char *fmt, ...) __attribute__ ((format (printf, 1, 0)));
+static inline void tracing_start(void) { }
+static inline void tracing_stop(void) { }
+static inline void ftrace_off_permanent(void) { }
static inline int
ftrace_printk(const char *fmt, ...)
{
@@ -221,33 +302,44 @@
#ifdef CONFIG_FTRACE_MCOUNT_RECORD
extern void ftrace_init(void);
-extern void ftrace_init_module(unsigned long *start, unsigned long *end);
+extern void ftrace_init_module(struct module *mod,
+ unsigned long *start, unsigned long *end);
#else
static inline void ftrace_init(void) { }
static inline void
-ftrace_init_module(unsigned long *start, unsigned long *end) { }
+ftrace_init_module(struct module *mod,
+ unsigned long *start, unsigned long *end) { }
#endif
-struct boot_trace {
- pid_t caller;
- char func[KSYM_NAME_LEN];
- int result;
- unsigned long long duration; /* usecs */
- ktime_t calltime;
- ktime_t rettime;
+/*
+ * Structure that defines a return function trace.
+ */
+struct ftrace_retfunc {
+ unsigned long ret; /* Return address */
+ unsigned long func; /* Current function */
+ unsigned long long calltime;
+ unsigned long long rettime;
+ /* Number of functions that overran the depth limit for current task */
+ unsigned long overrun;
};
-#ifdef CONFIG_BOOT_TRACER
-extern void trace_boot(struct boot_trace *it, initcall_t fn);
-extern void start_boot_trace(void);
-extern void stop_boot_trace(void);
+#ifdef CONFIG_FUNCTION_RET_TRACER
+#define FTRACE_RETFUNC_DEPTH 50
+#define FTRACE_RETSTACK_ALLOC_SIZE 32
+/* Type of a callback handler of tracing return function */
+typedef void (*trace_function_return_t)(struct ftrace_retfunc *);
+
+extern int register_ftrace_return(trace_function_return_t func);
+/* The current handler in use */
+extern trace_function_return_t ftrace_function_return;
+extern void unregister_ftrace_return(void);
+
+extern void ftrace_retfunc_init_task(struct task_struct *t);
+extern void ftrace_retfunc_exit_task(struct task_struct *t);
#else
-static inline void trace_boot(struct boot_trace *it, initcall_t fn) { }
-static inline void start_boot_trace(void) { }
-static inline void stop_boot_trace(void) { }
+static inline void ftrace_retfunc_init_task(struct task_struct *t) { }
+static inline void ftrace_retfunc_exit_task(struct task_struct *t) { }
#endif
-
-
#endif /* _LINUX_FTRACE_H */
diff --git a/include/linux/ftrace_irq.h b/include/linux/ftrace_irq.h
new file mode 100644
index 0000000..0b4df55
--- /dev/null
+++ b/include/linux/ftrace_irq.h
@@ -0,0 +1,13 @@
+#ifndef _LINUX_FTRACE_IRQ_H
+#define _LINUX_FTRACE_IRQ_H
+
+
+#if defined(CONFIG_DYNAMIC_FTRACE) || defined(CONFIG_FUNCTION_RET_TRACER)
+extern void ftrace_nmi_enter(void);
+extern void ftrace_nmi_exit(void);
+#else
+static inline void ftrace_nmi_enter(void) { }
+static inline void ftrace_nmi_exit(void) { }
+#endif
+
+#endif /* _LINUX_FTRACE_IRQ_H */
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 181006c..89a56d7 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -4,6 +4,7 @@
#include <linux/preempt.h>
#include <linux/smp_lock.h>
#include <linux/lockdep.h>
+#include <linux/ftrace_irq.h>
#include <asm/hardirq.h>
#include <asm/system.h>
@@ -161,7 +162,17 @@
*/
extern void irq_exit(void);
-#define nmi_enter() do { lockdep_off(); __irq_enter(); } while (0)
-#define nmi_exit() do { __irq_exit(); lockdep_on(); } while (0)
+#define nmi_enter() \
+ do { \
+ ftrace_nmi_enter(); \
+ lockdep_off(); \
+ __irq_enter(); \
+ } while (0)
+#define nmi_exit() \
+ do { \
+ __irq_exit(); \
+ lockdep_on(); \
+ ftrace_nmi_exit(); \
+ } while (0)
#endif /* LINUX_HARDIRQ_H */
diff --git a/include/linux/marker.h b/include/linux/marker.h
index 889196c..34c14bc 100644
--- a/include/linux/marker.h
+++ b/include/linux/marker.h
@@ -12,6 +12,7 @@
* See the file COPYING for more details.
*/
+#include <stdarg.h>
#include <linux/types.h>
struct module;
@@ -48,10 +49,28 @@
void (*call)(const struct marker *mdata, void *call_private, ...);
struct marker_probe_closure single;
struct marker_probe_closure *multi;
+ const char *tp_name; /* Optional tracepoint name */
+ void *tp_cb; /* Optional tracepoint callback */
} __attribute__((aligned(8)));
#ifdef CONFIG_MARKERS
+#define _DEFINE_MARKER(name, tp_name_str, tp_cb, format) \
+ static const char __mstrtab_##name[] \
+ __attribute__((section("__markers_strings"))) \
+ = #name "\0" format; \
+ static struct marker __mark_##name \
+ __attribute__((section("__markers"), aligned(8))) = \
+ { __mstrtab_##name, &__mstrtab_##name[sizeof(#name)], \
+ 0, 0, marker_probe_cb, { __mark_empty_function, NULL},\
+ NULL, tp_name_str, tp_cb }
+
+#define DEFINE_MARKER(name, format) \
+ _DEFINE_MARKER(name, NULL, NULL, format)
+
+#define DEFINE_MARKER_TP(name, tp_name, tp_cb, format) \
+ _DEFINE_MARKER(name, #tp_name, tp_cb, format)
+
/*
* Note : the empty asm volatile with read constraint is used here instead of a
* "used" attribute to fix a gcc 4.1.x bug.
@@ -65,14 +84,7 @@
*/
#define __trace_mark(generic, name, call_private, format, args...) \
do { \
- static const char __mstrtab_##name[] \
- __attribute__((section("__markers_strings"))) \
- = #name "\0" format; \
- static struct marker __mark_##name \
- __attribute__((section("__markers"), aligned(8))) = \
- { __mstrtab_##name, &__mstrtab_##name[sizeof(#name)], \
- 0, 0, marker_probe_cb, \
- { __mark_empty_function, NULL}, NULL }; \
+ DEFINE_MARKER(name, format); \
__mark_check_format(format, ## args); \
if (unlikely(__mark_##name.state)) { \
(*__mark_##name.call) \
@@ -80,14 +92,39 @@
} \
} while (0)
+#define __trace_mark_tp(name, call_private, tp_name, tp_cb, format, args...) \
+ do { \
+ void __check_tp_type(void) \
+ { \
+ register_trace_##tp_name(tp_cb); \
+ } \
+ DEFINE_MARKER_TP(name, tp_name, tp_cb, format); \
+ __mark_check_format(format, ## args); \
+ (*__mark_##name.call)(&__mark_##name, call_private, \
+ ## args); \
+ } while (0)
+
extern void marker_update_probe_range(struct marker *begin,
struct marker *end);
+
+#define GET_MARKER(name) (__mark_##name)
+
#else /* !CONFIG_MARKERS */
+#define DEFINE_MARKER(name, tp_name, tp_cb, format)
#define __trace_mark(generic, name, call_private, format, args...) \
__mark_check_format(format, ## args)
+#define __trace_mark_tp(name, call_private, tp_name, tp_cb, format, args...) \
+ do { \
+ void __check_tp_type(void) \
+ { \
+ register_trace_##tp_name(tp_cb); \
+ } \
+ __mark_check_format(format, ## args); \
+ } while (0)
static inline void marker_update_probe_range(struct marker *begin,
struct marker *end)
{ }
+#define GET_MARKER(name)
#endif /* CONFIG_MARKERS */
/**
@@ -117,6 +154,20 @@
__trace_mark(1, name, NULL, format, ## args)
/**
+ * trace_mark_tp - Marker in a tracepoint callback
+ * @name: marker name, not quoted.
+ * @tp_name: tracepoint name, not quoted.
+ * @tp_cb: tracepoint callback. Should have an associated global symbol so it
+ * is not optimized away by the compiler (should not be static).
+ * @format: format string
+ * @args...: variable argument list
+ *
+ * Places a marker in a tracepoint callback.
+ */
+#define trace_mark_tp(name, tp_name, tp_cb, format, args...) \
+ __trace_mark_tp(name, NULL, tp_name, tp_cb, format, ## args)
+
+/**
* MARK_NOARGS - Format string for a marker with no argument.
*/
#define MARK_NOARGS " "
@@ -136,8 +187,6 @@
extern void marker_probe_cb(const struct marker *mdata,
void *call_private, ...);
-extern void marker_probe_cb_noarg(const struct marker *mdata,
- void *call_private, ...);
/*
* Connect a probe to a marker.
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 86f1f5e..895dc9c 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -142,6 +142,7 @@
* on the write-side to insure proper synchronization.
*/
#define rcu_read_lock_sched() preempt_disable()
+#define rcu_read_lock_sched_notrace() preempt_disable_notrace()
/*
* rcu_read_unlock_sched - marks the end of a RCU-classic critical section
@@ -149,6 +150,7 @@
* See rcu_read_lock_sched for more information.
*/
#define rcu_read_unlock_sched() preempt_enable()
+#define rcu_read_unlock_sched_notrace() preempt_enable_notrace()
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index e097c2e..3bb87a7 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -122,6 +122,7 @@
void tracing_on(void);
void tracing_off(void);
+void tracing_off_permanent(void);
enum ring_buffer_flags {
RB_FL_OVERWRITE = 1 << 0,
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 644ffbd..bee1e93 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1352,6 +1352,17 @@
unsigned long default_timer_slack_ns;
struct list_head *scm_work_list;
+#ifdef CONFIG_FUNCTION_RET_TRACER
+ /* Index of current stored adress in ret_stack */
+ int curr_ret_stack;
+ /* Stack of return addresses for return function tracing */
+ struct ftrace_ret_stack *ret_stack;
+ /*
+ * Number of functions that haven't been traced
+ * because of depth overrun.
+ */
+ atomic_t trace_overrun;
+#endif
};
/*
diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h
index dc50bcc..b3dfa72 100644
--- a/include/linux/seq_file.h
+++ b/include/linux/seq_file.h
@@ -34,6 +34,7 @@
#define SEQ_SKIP 1
+char *mangle_path(char *s, char *p, char *esc);
int seq_open(struct file *, const struct seq_operations *);
ssize_t seq_read(struct file *, char __user *, size_t, loff_t *);
loff_t seq_lseek(struct file *, loff_t, int);
diff --git a/include/linux/stacktrace.h b/include/linux/stacktrace.h
index b106fd8..1a8cecc 100644
--- a/include/linux/stacktrace.h
+++ b/include/linux/stacktrace.h
@@ -15,9 +15,17 @@
struct stack_trace *trace);
extern void print_stack_trace(struct stack_trace *trace, int spaces);
+
+#ifdef CONFIG_USER_STACKTRACE_SUPPORT
+extern void save_stack_trace_user(struct stack_trace *trace);
+#else
+# define save_stack_trace_user(trace) do { } while (0)
+#endif
+
#else
# define save_stack_trace(trace) do { } while (0)
# define save_stack_trace_tsk(tsk, trace) do { } while (0)
+# define save_stack_trace_user(trace) do { } while (0)
# define print_stack_trace(trace, spaces) do { } while (0)
#endif
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index c5bb39c..7570054 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -24,8 +24,12 @@
const char *name; /* Tracepoint name */
int state; /* State. */
void **funcs;
-} __attribute__((aligned(8)));
-
+} __attribute__((aligned(32))); /*
+ * Aligned on 32 bytes because it is
+ * globally visible and gcc happily
+ * align these on the structure size.
+ * Keep in sync with vmlinux.lds.h.
+ */
#define TPPROTO(args...) args
#define TPARGS(args...) args
@@ -40,14 +44,14 @@
do { \
void **it_func; \
\
- rcu_read_lock_sched(); \
+ rcu_read_lock_sched_notrace(); \
it_func = rcu_dereference((tp)->funcs); \
if (it_func) { \
do { \
((void(*)(proto))(*it_func))(args); \
} while (*(++it_func)); \
} \
- rcu_read_unlock_sched(); \
+ rcu_read_unlock_sched_notrace(); \
} while (0)
/*
@@ -55,35 +59,40 @@
* not add unwanted padding between the beginning of the section and the
* structure. Force alignment to the same alignment as the section start.
*/
-#define DEFINE_TRACE(name, proto, args) \
+#define DECLARE_TRACE(name, proto, args) \
+ extern struct tracepoint __tracepoint_##name; \
static inline void trace_##name(proto) \
{ \
- static const char __tpstrtab_##name[] \
- __attribute__((section("__tracepoints_strings"))) \
- = #name ":" #proto; \
- static struct tracepoint __tracepoint_##name \
- __attribute__((section("__tracepoints"), aligned(8))) = \
- { __tpstrtab_##name, 0, NULL }; \
if (unlikely(__tracepoint_##name.state)) \
__DO_TRACE(&__tracepoint_##name, \
TPPROTO(proto), TPARGS(args)); \
} \
static inline int register_trace_##name(void (*probe)(proto)) \
{ \
- return tracepoint_probe_register(#name ":" #proto, \
- (void *)probe); \
+ return tracepoint_probe_register(#name, (void *)probe); \
} \
- static inline void unregister_trace_##name(void (*probe)(proto))\
+ static inline int unregister_trace_##name(void (*probe)(proto)) \
{ \
- tracepoint_probe_unregister(#name ":" #proto, \
- (void *)probe); \
+ return tracepoint_probe_unregister(#name, (void *)probe);\
}
+#define DEFINE_TRACE(name) \
+ static const char __tpstrtab_##name[] \
+ __attribute__((section("__tracepoints_strings"))) = #name; \
+ struct tracepoint __tracepoint_##name \
+ __attribute__((section("__tracepoints"), aligned(32))) = \
+ { __tpstrtab_##name, 0, NULL }
+
+#define EXPORT_TRACEPOINT_SYMBOL_GPL(name) \
+ EXPORT_SYMBOL_GPL(__tracepoint_##name)
+#define EXPORT_TRACEPOINT_SYMBOL(name) \
+ EXPORT_SYMBOL(__tracepoint_##name)
+
extern void tracepoint_update_probe_range(struct tracepoint *begin,
struct tracepoint *end);
#else /* !CONFIG_TRACEPOINTS */
-#define DEFINE_TRACE(name, proto, args) \
+#define DECLARE_TRACE(name, proto, args) \
static inline void _do_trace_##name(struct tracepoint *tp, proto) \
{ } \
static inline void trace_##name(proto) \
@@ -92,8 +101,14 @@
{ \
return -ENOSYS; \
} \
- static inline void unregister_trace_##name(void (*probe)(proto))\
- { }
+ static inline int unregister_trace_##name(void (*probe)(proto)) \
+ { \
+ return -ENOSYS; \
+ }
+
+#define DEFINE_TRACE(name)
+#define EXPORT_TRACEPOINT_SYMBOL_GPL(name)
+#define EXPORT_TRACEPOINT_SYMBOL(name)
static inline void tracepoint_update_probe_range(struct tracepoint *begin,
struct tracepoint *end)
@@ -112,6 +127,10 @@
*/
extern int tracepoint_probe_unregister(const char *name, void *probe);
+extern int tracepoint_probe_register_noupdate(const char *name, void *probe);
+extern int tracepoint_probe_unregister_noupdate(const char *name, void *probe);
+extern void tracepoint_probe_update_all(void);
+
struct tracepoint_iter {
struct module *module;
struct tracepoint *tracepoint;
diff --git a/include/trace/boot.h b/include/trace/boot.h
new file mode 100644
index 0000000..6b54537
--- /dev/null
+++ b/include/trace/boot.h
@@ -0,0 +1,56 @@
+#ifndef _LINUX_TRACE_BOOT_H
+#define _LINUX_TRACE_BOOT_H
+
+/*
+ * Structure which defines the trace of an initcall
+ * while it is called.
+ * You don't have to fill the func field since it is
+ * only used internally by the tracer.
+ */
+struct boot_trace_call {
+ pid_t caller;
+ char func[KSYM_NAME_LEN];
+};
+
+/*
+ * Structure which defines the trace of an initcall
+ * while it returns.
+ */
+struct boot_trace_ret {
+ char func[KSYM_NAME_LEN];
+ int result;
+ unsigned long long duration; /* nsecs */
+};
+
+#ifdef CONFIG_BOOT_TRACER
+/* Append the traces on the ring-buffer */
+extern void trace_boot_call(struct boot_trace_call *bt, initcall_t fn);
+extern void trace_boot_ret(struct boot_trace_ret *bt, initcall_t fn);
+
+/* Tells the tracer that smp_pre_initcall is finished.
+ * So we can start the tracing
+ */
+extern void start_boot_trace(void);
+
+/* Resume the tracing of other necessary events
+ * such as sched switches
+ */
+extern void enable_boot_trace(void);
+
+/* Suspend this tracing. Actually, only sched_switches tracing have
+ * to be suspended. Initcalls doesn't need it.)
+ */
+extern void disable_boot_trace(void);
+#else
+static inline
+void trace_boot_call(struct boot_trace_call *bt, initcall_t fn) { }
+
+static inline
+void trace_boot_ret(struct boot_trace_ret *bt, initcall_t fn) { }
+
+static inline void start_boot_trace(void) { }
+static inline void enable_boot_trace(void) { }
+static inline void disable_boot_trace(void) { }
+#endif /* CONFIG_BOOT_TRACER */
+
+#endif /* __LINUX_TRACE_BOOT_H */
diff --git a/include/trace/sched.h b/include/trace/sched.h
index ad47369..9b2854a 100644
--- a/include/trace/sched.h
+++ b/include/trace/sched.h
@@ -4,52 +4,52 @@
#include <linux/sched.h>
#include <linux/tracepoint.h>
-DEFINE_TRACE(sched_kthread_stop,
+DECLARE_TRACE(sched_kthread_stop,
TPPROTO(struct task_struct *t),
TPARGS(t));
-DEFINE_TRACE(sched_kthread_stop_ret,
+DECLARE_TRACE(sched_kthread_stop_ret,
TPPROTO(int ret),
TPARGS(ret));
-DEFINE_TRACE(sched_wait_task,
+DECLARE_TRACE(sched_wait_task,
TPPROTO(struct rq *rq, struct task_struct *p),
TPARGS(rq, p));
-DEFINE_TRACE(sched_wakeup,
+DECLARE_TRACE(sched_wakeup,
TPPROTO(struct rq *rq, struct task_struct *p),
TPARGS(rq, p));
-DEFINE_TRACE(sched_wakeup_new,
+DECLARE_TRACE(sched_wakeup_new,
TPPROTO(struct rq *rq, struct task_struct *p),
TPARGS(rq, p));
-DEFINE_TRACE(sched_switch,
+DECLARE_TRACE(sched_switch,
TPPROTO(struct rq *rq, struct task_struct *prev,
struct task_struct *next),
TPARGS(rq, prev, next));
-DEFINE_TRACE(sched_migrate_task,
+DECLARE_TRACE(sched_migrate_task,
TPPROTO(struct rq *rq, struct task_struct *p, int dest_cpu),
TPARGS(rq, p, dest_cpu));
-DEFINE_TRACE(sched_process_free,
+DECLARE_TRACE(sched_process_free,
TPPROTO(struct task_struct *p),
TPARGS(p));
-DEFINE_TRACE(sched_process_exit,
+DECLARE_TRACE(sched_process_exit,
TPPROTO(struct task_struct *p),
TPARGS(p));
-DEFINE_TRACE(sched_process_wait,
+DECLARE_TRACE(sched_process_wait,
TPPROTO(struct pid *pid),
TPARGS(pid));
-DEFINE_TRACE(sched_process_fork,
+DECLARE_TRACE(sched_process_fork,
TPPROTO(struct task_struct *parent, struct task_struct *child),
TPARGS(parent, child));
-DEFINE_TRACE(sched_signal_send,
+DECLARE_TRACE(sched_signal_send,
TPPROTO(int sig, struct task_struct *p),
TPARGS(sig, p));
diff --git a/init/Kconfig b/init/Kconfig
index f763762..f291f08 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -808,6 +808,7 @@
config MARKERS
bool "Activate markers"
+ depends on TRACEPOINTS
help
Place an empty function call at each marker site. Can be
dynamically changed for a probe function.
diff --git a/init/main.c b/init/main.c
index 7e117a2..79213c0 100644
--- a/init/main.c
+++ b/init/main.c
@@ -63,6 +63,7 @@
#include <linux/signal.h>
#include <linux/idr.h>
#include <linux/ftrace.h>
+#include <trace/boot.h>
#include <asm/io.h>
#include <asm/bugs.h>
@@ -703,31 +704,35 @@
int do_one_initcall(initcall_t fn)
{
int count = preempt_count();
- ktime_t delta;
+ ktime_t calltime, delta, rettime;
char msgbuf[64];
- struct boot_trace it;
+ struct boot_trace_call call;
+ struct boot_trace_ret ret;
if (initcall_debug) {
- it.caller = task_pid_nr(current);
- printk("calling %pF @ %i\n", fn, it.caller);
- it.calltime = ktime_get();
+ call.caller = task_pid_nr(current);
+ printk("calling %pF @ %i\n", fn, call.caller);
+ calltime = ktime_get();
+ trace_boot_call(&call, fn);
+ enable_boot_trace();
}
- it.result = fn();
+ ret.result = fn();
if (initcall_debug) {
- it.rettime = ktime_get();
- delta = ktime_sub(it.rettime, it.calltime);
- it.duration = (unsigned long long) delta.tv64 >> 10;
+ disable_boot_trace();
+ rettime = ktime_get();
+ delta = ktime_sub(rettime, calltime);
+ ret.duration = (unsigned long long) ktime_to_ns(delta) >> 10;
+ trace_boot_ret(&ret, fn);
printk("initcall %pF returned %d after %Ld usecs\n", fn,
- it.result, it.duration);
- trace_boot(&it, fn);
+ ret.result, ret.duration);
}
msgbuf[0] = 0;
- if (it.result && it.result != -ENODEV && initcall_debug)
- sprintf(msgbuf, "error code %d ", it.result);
+ if (ret.result && ret.result != -ENODEV && initcall_debug)
+ sprintf(msgbuf, "error code %d ", ret.result);
if (preempt_count() != count) {
strlcat(msgbuf, "preemption imbalance ", sizeof(msgbuf));
@@ -741,7 +746,7 @@
printk("initcall %pF returned with %s\n", fn, msgbuf);
}
- return it.result;
+ return ret.result;
}
@@ -882,7 +887,7 @@
* we're essentially up and running. Get rid of the
* initmem segments and start the user-mode stuff..
*/
- stop_boot_trace();
+
init_post();
return 0;
}
diff --git a/kernel/Makefile b/kernel/Makefile
index 19fad00..03a45e7 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -21,6 +21,10 @@
CFLAGS_REMOVE_sched_clock.o = -pg
CFLAGS_REMOVE_sched.o = -pg
endif
+ifdef CONFIG_FUNCTION_RET_TRACER
+CFLAGS_REMOVE_extable.o = -pg # For __kernel_text_address()
+CFLAGS_REMOVE_module.o = -pg # For __module_text_address()
+endif
obj-$(CONFIG_FREEZER) += freezer.o
obj-$(CONFIG_PROFILING) += profile.o
diff --git a/kernel/exit.c b/kernel/exit.c
index 2d8be7e..e5ae36e 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -53,6 +53,10 @@
#include <asm/pgtable.h>
#include <asm/mmu_context.h>
+DEFINE_TRACE(sched_process_free);
+DEFINE_TRACE(sched_process_exit);
+DEFINE_TRACE(sched_process_wait);
+
static void exit_mm(struct task_struct * tsk);
static inline int task_detached(struct task_struct *p)
@@ -1123,7 +1127,6 @@
preempt_disable();
/* causes final put_task_struct in finish_task_switch(). */
tsk->state = TASK_DEAD;
-
schedule();
BUG();
/* Avoid "noreturn function does return". */
diff --git a/kernel/fork.c b/kernel/fork.c
index 2a372a0..d6e1a32 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -47,6 +47,7 @@
#include <linux/mount.h>
#include <linux/audit.h>
#include <linux/memcontrol.h>
+#include <linux/ftrace.h>
#include <linux/profile.h>
#include <linux/rmap.h>
#include <linux/acct.h>
@@ -80,6 +81,8 @@
__cacheline_aligned DEFINE_RWLOCK(tasklist_lock); /* outer */
+DEFINE_TRACE(sched_process_fork);
+
int nr_processes(void)
{
int cpu;
@@ -137,6 +140,7 @@
prop_local_destroy_single(&tsk->dirties);
free_thread_info(tsk->stack);
rt_mutex_debug_task_free(tsk);
+ ftrace_retfunc_exit_task(tsk);
free_task_struct(tsk);
}
EXPORT_SYMBOL(free_task);
@@ -1267,6 +1271,7 @@
total_forks++;
spin_unlock(¤t->sighand->siglock);
write_unlock_irq(&tasklist_lock);
+ ftrace_retfunc_init_task(p);
proc_fork_connector(p);
cgroup_post_fork(p);
return p;
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 8e7a7ce..4fbc456 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -21,6 +21,9 @@
static LIST_HEAD(kthread_create_list);
struct task_struct *kthreadd_task;
+DEFINE_TRACE(sched_kthread_stop);
+DEFINE_TRACE(sched_kthread_stop_ret);
+
struct kthread_create_info
{
/* Information passed to kthread() from kthreadd. */
diff --git a/kernel/marker.c b/kernel/marker.c
index e9c6b2b..ea54f26 100644
--- a/kernel/marker.c
+++ b/kernel/marker.c
@@ -43,6 +43,7 @@
*/
#define MARKER_HASH_BITS 6
#define MARKER_TABLE_SIZE (1 << MARKER_HASH_BITS)
+static struct hlist_head marker_table[MARKER_TABLE_SIZE];
/*
* Note about RCU :
@@ -64,11 +65,10 @@
void *oldptr;
int rcu_pending;
unsigned char ptype:1;
+ unsigned char format_allocated:1;
char name[0]; /* Contains name'\0'format'\0' */
};
-static struct hlist_head marker_table[MARKER_TABLE_SIZE];
-
/**
* __mark_empty_function - Empty probe callback
* @probe_private: probe private data
@@ -81,7 +81,7 @@
* though the function pointer change and the marker enabling are two distinct
* operations that modifies the execution flow of preemptible code.
*/
-void __mark_empty_function(void *probe_private, void *call_private,
+notrace void __mark_empty_function(void *probe_private, void *call_private,
const char *fmt, va_list *args)
{
}
@@ -97,7 +97,8 @@
* need to put a full smp_rmb() in this branch. This is why we do not use
* rcu_dereference() for the pointer read.
*/
-void marker_probe_cb(const struct marker *mdata, void *call_private, ...)
+notrace void marker_probe_cb(const struct marker *mdata,
+ void *call_private, ...)
{
va_list args;
char ptype;
@@ -107,7 +108,7 @@
* sure the teardown of the callbacks can be done correctly when they
* are in modules and they insure RCU read coherency.
*/
- rcu_read_lock_sched();
+ rcu_read_lock_sched_notrace();
ptype = mdata->ptype;
if (likely(!ptype)) {
marker_probe_func *func;
@@ -145,7 +146,7 @@
va_end(args);
}
}
- rcu_read_unlock_sched();
+ rcu_read_unlock_sched_notrace();
}
EXPORT_SYMBOL_GPL(marker_probe_cb);
@@ -157,12 +158,13 @@
*
* Should be connected to markers "MARK_NOARGS".
*/
-void marker_probe_cb_noarg(const struct marker *mdata, void *call_private, ...)
+static notrace void marker_probe_cb_noarg(const struct marker *mdata,
+ void *call_private, ...)
{
va_list args; /* not initialized */
char ptype;
- rcu_read_lock_sched();
+ rcu_read_lock_sched_notrace();
ptype = mdata->ptype;
if (likely(!ptype)) {
marker_probe_func *func;
@@ -195,9 +197,8 @@
multi[i].func(multi[i].probe_private, call_private,
mdata->format, &args);
}
- rcu_read_unlock_sched();
+ rcu_read_unlock_sched_notrace();
}
-EXPORT_SYMBOL_GPL(marker_probe_cb_noarg);
static void free_old_closure(struct rcu_head *head)
{
@@ -416,6 +417,7 @@
e->single.probe_private = NULL;
e->multi = NULL;
e->ptype = 0;
+ e->format_allocated = 0;
e->refcount = 0;
e->rcu_pending = 0;
hlist_add_head(&e->hlist, head);
@@ -447,6 +449,8 @@
if (e->single.func != __mark_empty_function)
return -EBUSY;
hlist_del(&e->hlist);
+ if (e->format_allocated)
+ kfree(e->format);
/* Make sure the call_rcu has been executed */
if (e->rcu_pending)
rcu_barrier_sched();
@@ -457,57 +461,34 @@
/*
* Set the mark_entry format to the format found in the element.
*/
-static int marker_set_format(struct marker_entry **entry, const char *format)
+static int marker_set_format(struct marker_entry *entry, const char *format)
{
- struct marker_entry *e;
- size_t name_len = strlen((*entry)->name) + 1;
- size_t format_len = strlen(format) + 1;
-
-
- e = kmalloc(sizeof(struct marker_entry) + name_len + format_len,
- GFP_KERNEL);
- if (!e)
+ entry->format = kstrdup(format, GFP_KERNEL);
+ if (!entry->format)
return -ENOMEM;
- memcpy(&e->name[0], (*entry)->name, name_len);
- e->format = &e->name[name_len];
- memcpy(e->format, format, format_len);
- if (strcmp(e->format, MARK_NOARGS) == 0)
- e->call = marker_probe_cb_noarg;
- else
- e->call = marker_probe_cb;
- e->single = (*entry)->single;
- e->multi = (*entry)->multi;
- e->ptype = (*entry)->ptype;
- e->refcount = (*entry)->refcount;
- e->rcu_pending = 0;
- hlist_add_before(&e->hlist, &(*entry)->hlist);
- hlist_del(&(*entry)->hlist);
- /* Make sure the call_rcu has been executed */
- if ((*entry)->rcu_pending)
- rcu_barrier_sched();
- kfree(*entry);
- *entry = e;
+ entry->format_allocated = 1;
+
trace_mark(core_marker_format, "name %s format %s",
- e->name, e->format);
+ entry->name, entry->format);
return 0;
}
/*
* Sets the probe callback corresponding to one marker.
*/
-static int set_marker(struct marker_entry **entry, struct marker *elem,
+static int set_marker(struct marker_entry *entry, struct marker *elem,
int active)
{
- int ret;
- WARN_ON(strcmp((*entry)->name, elem->name) != 0);
+ int ret = 0;
+ WARN_ON(strcmp(entry->name, elem->name) != 0);
- if ((*entry)->format) {
- if (strcmp((*entry)->format, elem->format) != 0) {
+ if (entry->format) {
+ if (strcmp(entry->format, elem->format) != 0) {
printk(KERN_NOTICE
"Format mismatch for probe %s "
"(%s), marker (%s)\n",
- (*entry)->name,
- (*entry)->format,
+ entry->name,
+ entry->format,
elem->format);
return -EPERM;
}
@@ -523,37 +504,67 @@
* pass from a "safe" callback (with argument) to an "unsafe"
* callback (does not set arguments).
*/
- elem->call = (*entry)->call;
+ elem->call = entry->call;
/*
* Sanity check :
* We only update the single probe private data when the ptr is
* set to a _non_ single probe! (0 -> 1 and N -> 1, N != 1)
*/
WARN_ON(elem->single.func != __mark_empty_function
- && elem->single.probe_private
- != (*entry)->single.probe_private &&
- !elem->ptype);
- elem->single.probe_private = (*entry)->single.probe_private;
+ && elem->single.probe_private != entry->single.probe_private
+ && !elem->ptype);
+ elem->single.probe_private = entry->single.probe_private;
/*
* Make sure the private data is valid when we update the
* single probe ptr.
*/
smp_wmb();
- elem->single.func = (*entry)->single.func;
+ elem->single.func = entry->single.func;
/*
* We also make sure that the new probe callbacks array is consistent
* before setting a pointer to it.
*/
- rcu_assign_pointer(elem->multi, (*entry)->multi);
+ rcu_assign_pointer(elem->multi, entry->multi);
/*
* Update the function or multi probe array pointer before setting the
* ptype.
*/
smp_wmb();
- elem->ptype = (*entry)->ptype;
+ elem->ptype = entry->ptype;
+
+ if (elem->tp_name && (active ^ elem->state)) {
+ WARN_ON(!elem->tp_cb);
+ /*
+ * It is ok to directly call the probe registration because type
+ * checking has been done in the __trace_mark_tp() macro.
+ */
+
+ if (active) {
+ /*
+ * try_module_get should always succeed because we hold
+ * lock_module() to get the tp_cb address.
+ */
+ ret = try_module_get(__module_text_address(
+ (unsigned long)elem->tp_cb));
+ BUG_ON(!ret);
+ ret = tracepoint_probe_register_noupdate(
+ elem->tp_name,
+ elem->tp_cb);
+ } else {
+ ret = tracepoint_probe_unregister_noupdate(
+ elem->tp_name,
+ elem->tp_cb);
+ /*
+ * tracepoint_probe_update_all() must be called
+ * before the module containing tp_cb is unloaded.
+ */
+ module_put(__module_text_address(
+ (unsigned long)elem->tp_cb));
+ }
+ }
elem->state = active;
- return 0;
+ return ret;
}
/*
@@ -564,7 +575,24 @@
*/
static void disable_marker(struct marker *elem)
{
+ int ret;
+
/* leave "call" as is. It is known statically. */
+ if (elem->tp_name && elem->state) {
+ WARN_ON(!elem->tp_cb);
+ /*
+ * It is ok to directly call the probe registration because type
+ * checking has been done in the __trace_mark_tp() macro.
+ */
+ ret = tracepoint_probe_unregister_noupdate(elem->tp_name,
+ elem->tp_cb);
+ WARN_ON(ret);
+ /*
+ * tracepoint_probe_update_all() must be called
+ * before the module containing tp_cb is unloaded.
+ */
+ module_put(__module_text_address((unsigned long)elem->tp_cb));
+ }
elem->state = 0;
elem->single.func = __mark_empty_function;
/* Update the function before setting the ptype */
@@ -594,8 +622,7 @@
for (iter = begin; iter < end; iter++) {
mark_entry = get_marker(iter->name);
if (mark_entry) {
- set_marker(&mark_entry, iter,
- !!mark_entry->refcount);
+ set_marker(mark_entry, iter, !!mark_entry->refcount);
/*
* ignore error, continue
*/
@@ -629,6 +656,7 @@
marker_update_probe_range(__start___markers, __stop___markers);
/* Markers in modules. */
module_update_markers();
+ tracepoint_probe_update_all();
}
/**
@@ -657,7 +685,7 @@
ret = PTR_ERR(entry);
} else if (format) {
if (!entry->format)
- ret = marker_set_format(&entry, format);
+ ret = marker_set_format(entry, format);
else if (strcmp(entry->format, format))
ret = -EPERM;
}
@@ -676,10 +704,11 @@
goto end;
}
mutex_unlock(&markers_mutex);
- marker_update_probes(); /* may update entry */
+ marker_update_probes();
mutex_lock(&markers_mutex);
entry = get_marker(name);
- WARN_ON(!entry);
+ if (!entry)
+ goto end;
if (entry->rcu_pending)
rcu_barrier_sched();
entry->oldptr = old;
@@ -720,7 +749,7 @@
rcu_barrier_sched();
old = marker_entry_remove_probe(entry, probe, probe_private);
mutex_unlock(&markers_mutex);
- marker_update_probes(); /* may update entry */
+ marker_update_probes();
mutex_lock(&markers_mutex);
entry = get_marker(name);
if (!entry)
@@ -801,10 +830,11 @@
rcu_barrier_sched();
old = marker_entry_remove_probe(entry, NULL, probe_private);
mutex_unlock(&markers_mutex);
- marker_update_probes(); /* may update entry */
+ marker_update_probes();
mutex_lock(&markers_mutex);
entry = get_marker_from_private_data(probe, probe_private);
- WARN_ON(!entry);
+ if (!entry)
+ goto end;
if (entry->rcu_pending)
rcu_barrier_sched();
entry->oldptr = old;
@@ -848,8 +878,6 @@
if (!e->ptype) {
if (num == 0 && e->single.func == probe)
return e->single.probe_private;
- else
- break;
} else {
struct marker_probe_closure *closure;
int match = 0;
@@ -861,8 +889,42 @@
return closure[i].probe_private;
}
}
+ break;
}
}
return ERR_PTR(-ENOENT);
}
EXPORT_SYMBOL_GPL(marker_get_private_data);
+
+#ifdef CONFIG_MODULES
+
+int marker_module_notify(struct notifier_block *self,
+ unsigned long val, void *data)
+{
+ struct module *mod = data;
+
+ switch (val) {
+ case MODULE_STATE_COMING:
+ marker_update_probe_range(mod->markers,
+ mod->markers + mod->num_markers);
+ break;
+ case MODULE_STATE_GOING:
+ marker_update_probe_range(mod->markers,
+ mod->markers + mod->num_markers);
+ break;
+ }
+ return 0;
+}
+
+struct notifier_block marker_module_nb = {
+ .notifier_call = marker_module_notify,
+ .priority = 0,
+};
+
+static int init_markers(void)
+{
+ return register_module_notifier(&marker_module_nb);
+}
+__initcall(init_markers);
+
+#endif /* CONFIG_MODULES */
diff --git a/kernel/module.c b/kernel/module.c
index 1f4cc00..89bcf7c 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2184,24 +2184,15 @@
struct mod_debug *debug;
unsigned int num_debug;
-#ifdef CONFIG_MARKERS
- marker_update_probe_range(mod->markers,
- mod->markers + mod->num_markers);
-#endif
debug = section_objs(hdr, sechdrs, secstrings, "__verbose",
sizeof(*debug), &num_debug);
dynamic_printk_setup(debug, num_debug);
-
-#ifdef CONFIG_TRACEPOINTS
- tracepoint_update_probe_range(mod->tracepoints,
- mod->tracepoints + mod->num_tracepoints);
-#endif
}
/* sechdrs[0].sh_size is always zero */
mseg = section_objs(hdr, sechdrs, secstrings, "__mcount_loc",
sizeof(*mseg), &num_mcount);
- ftrace_init_module(mseg, mseg + num_mcount);
+ ftrace_init_module(mod, mseg, mseg + num_mcount);
err = module_finalize(hdr, sechdrs, mod);
if (err < 0)
diff --git a/kernel/power/disk.c b/kernel/power/disk.c
index c9d7408..f77d381 100644
--- a/kernel/power/disk.c
+++ b/kernel/power/disk.c
@@ -22,7 +22,6 @@
#include <linux/console.h>
#include <linux/cpu.h>
#include <linux/freezer.h>
-#include <linux/ftrace.h>
#include "power.h"
@@ -257,7 +256,7 @@
int hibernation_snapshot(int platform_mode)
{
- int error, ftrace_save;
+ int error;
/* Free memory before shutting down devices. */
error = swsusp_shrink_memory();
@@ -269,7 +268,6 @@
goto Close;
suspend_console();
- ftrace_save = __ftrace_enabled_save();
error = device_suspend(PMSG_FREEZE);
if (error)
goto Recover_platform;
@@ -299,7 +297,6 @@
Resume_devices:
device_resume(in_suspend ?
(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
- __ftrace_enabled_restore(ftrace_save);
resume_console();
Close:
platform_end(platform_mode);
@@ -370,11 +367,10 @@
int hibernation_restore(int platform_mode)
{
- int error, ftrace_save;
+ int error;
pm_prepare_console();
suspend_console();
- ftrace_save = __ftrace_enabled_save();
error = device_suspend(PMSG_QUIESCE);
if (error)
goto Finish;
@@ -389,7 +385,6 @@
platform_restore_cleanup(platform_mode);
device_resume(PMSG_RECOVER);
Finish:
- __ftrace_enabled_restore(ftrace_save);
resume_console();
pm_restore_console();
return error;
@@ -402,7 +397,7 @@
int hibernation_platform_enter(void)
{
- int error, ftrace_save;
+ int error;
if (!hibernation_ops)
return -ENOSYS;
@@ -417,7 +412,6 @@
goto Close;
suspend_console();
- ftrace_save = __ftrace_enabled_save();
error = device_suspend(PMSG_HIBERNATE);
if (error) {
if (hibernation_ops->recover)
@@ -452,7 +446,6 @@
hibernation_ops->finish();
Resume_devices:
device_resume(PMSG_RESTORE);
- __ftrace_enabled_restore(ftrace_save);
resume_console();
Close:
hibernation_ops->end();
diff --git a/kernel/power/main.c b/kernel/power/main.c
index b8f7ce9..613f169 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -22,7 +22,6 @@
#include <linux/freezer.h>
#include <linux/vmstat.h>
#include <linux/syscalls.h>
-#include <linux/ftrace.h>
#include "power.h"
@@ -317,7 +316,7 @@
*/
int suspend_devices_and_enter(suspend_state_t state)
{
- int error, ftrace_save;
+ int error;
if (!suspend_ops)
return -ENOSYS;
@@ -328,7 +327,6 @@
goto Close;
}
suspend_console();
- ftrace_save = __ftrace_enabled_save();
suspend_test_start();
error = device_suspend(PMSG_SUSPEND);
if (error) {
@@ -360,7 +358,6 @@
suspend_test_start();
device_resume(PMSG_RESUME);
suspend_test_finish("resume devices");
- __ftrace_enabled_restore(ftrace_save);
resume_console();
Close:
if (suspend_ops->end)
diff --git a/kernel/profile.c b/kernel/profile.c
index 5b7d1ac..7f93a50 100644
--- a/kernel/profile.c
+++ b/kernel/profile.c
@@ -544,7 +544,7 @@
};
#ifdef CONFIG_SMP
-static inline void profile_nop(void *unused)
+static void profile_nop(void *unused)
{
}
diff --git a/kernel/sched.c b/kernel/sched.c
index 9b1e793..388d9db 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -118,6 +118,12 @@
*/
#define RUNTIME_INF ((u64)~0ULL)
+DEFINE_TRACE(sched_wait_task);
+DEFINE_TRACE(sched_wakeup);
+DEFINE_TRACE(sched_wakeup_new);
+DEFINE_TRACE(sched_switch);
+DEFINE_TRACE(sched_migrate_task);
+
#ifdef CONFIG_SMP
/*
* Divide a load by a sched group cpu_power : (load / sg->__cpu_power)
@@ -5895,6 +5901,7 @@
* The idle tasks have their own, simple scheduling class:
*/
idle->sched_class = &idle_sched_class;
+ ftrace_retfunc_init_task(idle);
}
/*
diff --git a/kernel/signal.c b/kernel/signal.c
index 4530fc6..e9afe63 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -41,6 +41,8 @@
static struct kmem_cache *sigqueue_cachep;
+DEFINE_TRACE(sched_signal_send);
+
static void __user *sig_handler(struct task_struct *t, int sig)
{
return t->sighand->action[sig - 1].sa.sa_handler;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 9d048fa..65d4a9b 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -484,6 +484,16 @@
.proc_handler = &ftrace_enable_sysctl,
},
#endif
+#ifdef CONFIG_TRACING
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "ftrace_dump_on_oops",
+ .data = &ftrace_dump_on_oops,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
+#endif
#ifdef CONFIG_MODULES
{
.ctl_name = KERN_MODPROBE,
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 33dbefd..9cbf776 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -3,12 +3,25 @@
# select HAVE_FUNCTION_TRACER:
#
+config USER_STACKTRACE_SUPPORT
+ bool
+
config NOP_TRACER
bool
config HAVE_FUNCTION_TRACER
bool
+config HAVE_FUNCTION_RET_TRACER
+ bool
+
+config HAVE_FUNCTION_TRACE_MCOUNT_TEST
+ bool
+ help
+ This gets selected when the arch tests the function_trace_stop
+ variable at the mcount call site. Otherwise, this variable
+ is tested by the called function.
+
config HAVE_DYNAMIC_FTRACE
bool
@@ -47,6 +60,16 @@
(the bootup default), then the overhead of the instructions is very
small and not measurable even in micro-benchmarks.
+config FUNCTION_RET_TRACER
+ bool "Kernel Function return Tracer"
+ depends on HAVE_FUNCTION_RET_TRACER
+ depends on FUNCTION_TRACER
+ help
+ Enable the kernel to trace a function at its return.
+ It's first purpose is to trace the duration of functions.
+ This is done by setting the current return address on the thread
+ info structure of the current task.
+
config IRQSOFF_TRACER
bool "Interrupts-off Latency Tracer"
default n
@@ -138,6 +161,59 @@
selected, because the self-tests are an initcall as well and that
would invalidate the boot trace. )
+config TRACE_BRANCH_PROFILING
+ bool "Trace likely/unlikely profiler"
+ depends on DEBUG_KERNEL
+ select TRACING
+ help
+ This tracer profiles all the the likely and unlikely macros
+ in the kernel. It will display the results in:
+
+ /debugfs/tracing/profile_annotated_branch
+
+ Note: this will add a significant overhead, only turn this
+ on if you need to profile the system's use of these macros.
+
+ Say N if unsure.
+
+config PROFILE_ALL_BRANCHES
+ bool "Profile all if conditionals"
+ depends on TRACE_BRANCH_PROFILING
+ help
+ This tracer profiles all branch conditions. Every if ()
+ taken in the kernel is recorded whether it hit or miss.
+ The results will be displayed in:
+
+ /debugfs/tracing/profile_branch
+
+ This configuration, when enabled, will impose a great overhead
+ on the system. This should only be enabled when the system
+ is to be analyzed
+
+ Say N if unsure.
+
+config TRACING_BRANCHES
+ bool
+ help
+ Selected by tracers that will trace the likely and unlikely
+ conditions. This prevents the tracers themselves from being
+ profiled. Profiling the tracing infrastructure can only happen
+ when the likelys and unlikelys are not being traced.
+
+config BRANCH_TRACER
+ bool "Trace likely/unlikely instances"
+ depends on TRACE_BRANCH_PROFILING
+ select TRACING_BRANCHES
+ help
+ This traces the events of likely and unlikely condition
+ calls in the kernel. The difference between this and the
+ "Trace likely/unlikely profiler" is that this is not a
+ histogram of the callers, but actually places the calling
+ events into a running trace buffer to see when and where the
+ events happened, as well as their results.
+
+ Say N if unsure.
+
config STACK_TRACER
bool "Trace max stack"
depends on HAVE_FUNCTION_TRACER
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index c8228b1..1a8c925 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -10,6 +10,11 @@
obj-y += trace_selftest_dynamic.o
endif
+# If unlikely tracing is enabled, do not trace these files
+ifdef CONFIG_TRACING_BRANCHES
+KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING
+endif
+
obj-$(CONFIG_FUNCTION_TRACER) += libftrace.o
obj-$(CONFIG_RING_BUFFER) += ring_buffer.o
@@ -24,5 +29,7 @@
obj-$(CONFIG_STACK_TRACER) += trace_stack.o
obj-$(CONFIG_MMIOTRACE) += trace_mmiotrace.o
obj-$(CONFIG_BOOT_TRACER) += trace_boot.o
+obj-$(CONFIG_FUNCTION_RET_TRACER) += trace_functions_return.o
+obj-$(CONFIG_TRACE_BRANCH_PROFILING) += trace_branch.o
libftrace-y := ftrace.o
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 78db083..53042f1 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -47,6 +47,12 @@
int ftrace_enabled __read_mostly;
static int last_ftrace_enabled;
+/* Quick disabling of function tracer. */
+int function_trace_stop;
+
+/* By default, current tracing type is normal tracing. */
+enum ftrace_tracing_type_t ftrace_tracing_type = FTRACE_TYPE_ENTER;
+
/*
* ftrace_disabled is set when an anomaly is discovered.
* ftrace_disabled is much stronger than ftrace_enabled.
@@ -63,6 +69,7 @@
static struct ftrace_ops *ftrace_list __read_mostly = &ftrace_list_end;
ftrace_func_t ftrace_trace_function __read_mostly = ftrace_stub;
+ftrace_func_t __ftrace_trace_function __read_mostly = ftrace_stub;
static void ftrace_list_func(unsigned long ip, unsigned long parent_ip)
{
@@ -88,8 +95,23 @@
void clear_ftrace_function(void)
{
ftrace_trace_function = ftrace_stub;
+ __ftrace_trace_function = ftrace_stub;
}
+#ifndef CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST
+/*
+ * For those archs that do not test ftrace_trace_stop in their
+ * mcount call site, we need to do it from C.
+ */
+static void ftrace_test_stop_func(unsigned long ip, unsigned long parent_ip)
+{
+ if (function_trace_stop)
+ return;
+
+ __ftrace_trace_function(ip, parent_ip);
+}
+#endif
+
static int __register_ftrace_function(struct ftrace_ops *ops)
{
/* should not be called from interrupt context */
@@ -110,10 +132,18 @@
* For one func, simply call it directly.
* For more than one func, call the chain.
*/
+#ifdef CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST
if (ops->next == &ftrace_list_end)
ftrace_trace_function = ops->func;
else
ftrace_trace_function = ftrace_list_func;
+#else
+ if (ops->next == &ftrace_list_end)
+ __ftrace_trace_function = ops->func;
+ else
+ __ftrace_trace_function = ftrace_list_func;
+ ftrace_trace_function = ftrace_test_stop_func;
+#endif
}
spin_unlock(&ftrace_lock);
@@ -152,8 +182,7 @@
if (ftrace_enabled) {
/* If we only have one func left, then call that directly */
- if (ftrace_list == &ftrace_list_end ||
- ftrace_list->next == &ftrace_list_end)
+ if (ftrace_list->next == &ftrace_list_end)
ftrace_trace_function = ftrace_list->func;
}
@@ -308,7 +337,7 @@
{
struct dyn_ftrace *rec;
- if (!ftrace_enabled || ftrace_disabled)
+ if (ftrace_disabled)
return NULL;
rec = ftrace_alloc_dyn_node(ip);
@@ -322,14 +351,58 @@
return rec;
}
-#define FTRACE_ADDR ((long)(ftrace_caller))
+static void print_ip_ins(const char *fmt, unsigned char *p)
+{
+ int i;
+
+ printk(KERN_CONT "%s", fmt);
+
+ for (i = 0; i < MCOUNT_INSN_SIZE; i++)
+ printk(KERN_CONT "%s%02x", i ? ":" : "", p[i]);
+}
+
+static void ftrace_bug(int failed, unsigned long ip)
+{
+ switch (failed) {
+ case -EFAULT:
+ FTRACE_WARN_ON_ONCE(1);
+ pr_info("ftrace faulted on modifying ");
+ print_ip_sym(ip);
+ break;
+ case -EINVAL:
+ FTRACE_WARN_ON_ONCE(1);
+ pr_info("ftrace failed to modify ");
+ print_ip_sym(ip);
+ print_ip_ins(" actual: ", (unsigned char *)ip);
+ printk(KERN_CONT "\n");
+ break;
+ case -EPERM:
+ FTRACE_WARN_ON_ONCE(1);
+ pr_info("ftrace faulted on writing ");
+ print_ip_sym(ip);
+ break;
+ default:
+ FTRACE_WARN_ON_ONCE(1);
+ pr_info("ftrace faulted on unknown error ");
+ print_ip_sym(ip);
+ }
+}
+
static int
-__ftrace_replace_code(struct dyn_ftrace *rec,
- unsigned char *nop, int enable)
+__ftrace_replace_code(struct dyn_ftrace *rec, int enable)
{
unsigned long ip, fl;
- unsigned char *call, *old, *new;
+ unsigned long ftrace_addr;
+
+#ifdef CONFIG_FUNCTION_RET_TRACER
+ if (ftrace_tracing_type == FTRACE_TYPE_ENTER)
+ ftrace_addr = (unsigned long)ftrace_caller;
+ else
+ ftrace_addr = (unsigned long)ftrace_return_caller;
+#else
+ ftrace_addr = (unsigned long)ftrace_caller;
+#endif
ip = rec->ip;
@@ -388,34 +461,28 @@
}
}
- call = ftrace_call_replace(ip, FTRACE_ADDR);
-
- if (rec->flags & FTRACE_FL_ENABLED) {
- old = nop;
- new = call;
- } else {
- old = call;
- new = nop;
- }
-
- return ftrace_modify_code(ip, old, new);
+ if (rec->flags & FTRACE_FL_ENABLED)
+ return ftrace_make_call(rec, ftrace_addr);
+ else
+ return ftrace_make_nop(NULL, rec, ftrace_addr);
}
static void ftrace_replace_code(int enable)
{
int i, failed;
- unsigned char *nop = NULL;
struct dyn_ftrace *rec;
struct ftrace_page *pg;
- nop = ftrace_nop_replace();
-
for (pg = ftrace_pages_start; pg; pg = pg->next) {
for (i = 0; i < pg->index; i++) {
rec = &pg->records[i];
- /* don't modify code that has already faulted */
- if (rec->flags & FTRACE_FL_FAILED)
+ /*
+ * Skip over free records and records that have
+ * failed.
+ */
+ if (rec->flags & FTRACE_FL_FREE ||
+ rec->flags & FTRACE_FL_FAILED)
continue;
/* ignore updates to this record's mcount site */
@@ -426,68 +493,30 @@
unfreeze_record(rec);
}
- failed = __ftrace_replace_code(rec, nop, enable);
+ failed = __ftrace_replace_code(rec, enable);
if (failed && (rec->flags & FTRACE_FL_CONVERTED)) {
rec->flags |= FTRACE_FL_FAILED;
if ((system_state == SYSTEM_BOOTING) ||
!core_kernel_text(rec->ip)) {
ftrace_free_rec(rec);
- }
+ } else
+ ftrace_bug(failed, rec->ip);
}
}
}
}
-static void print_ip_ins(const char *fmt, unsigned char *p)
-{
- int i;
-
- printk(KERN_CONT "%s", fmt);
-
- for (i = 0; i < MCOUNT_INSN_SIZE; i++)
- printk(KERN_CONT "%s%02x", i ? ":" : "", p[i]);
-}
-
static int
-ftrace_code_disable(struct dyn_ftrace *rec)
+ftrace_code_disable(struct module *mod, struct dyn_ftrace *rec)
{
unsigned long ip;
- unsigned char *nop, *call;
int ret;
ip = rec->ip;
- nop = ftrace_nop_replace();
- call = ftrace_call_replace(ip, mcount_addr);
-
- ret = ftrace_modify_code(ip, call, nop);
+ ret = ftrace_make_nop(mod, rec, mcount_addr);
if (ret) {
- switch (ret) {
- case -EFAULT:
- FTRACE_WARN_ON_ONCE(1);
- pr_info("ftrace faulted on modifying ");
- print_ip_sym(ip);
- break;
- case -EINVAL:
- FTRACE_WARN_ON_ONCE(1);
- pr_info("ftrace failed to modify ");
- print_ip_sym(ip);
- print_ip_ins(" expected: ", call);
- print_ip_ins(" actual: ", (unsigned char *)ip);
- print_ip_ins(" replace: ", nop);
- printk(KERN_CONT "\n");
- break;
- case -EPERM:
- FTRACE_WARN_ON_ONCE(1);
- pr_info("ftrace faulted on writing ");
- print_ip_sym(ip);
- break;
- default:
- FTRACE_WARN_ON_ONCE(1);
- pr_info("ftrace faulted on unknown error ");
- print_ip_sym(ip);
- }
-
+ ftrace_bug(ret, ip);
rec->flags |= FTRACE_FL_FAILED;
return 0;
}
@@ -515,7 +544,7 @@
}
static ftrace_func_t saved_ftrace_func;
-static int ftrace_start;
+static int ftrace_start_up;
static DEFINE_MUTEX(ftrace_start_lock);
static void ftrace_startup(void)
@@ -526,7 +555,7 @@
return;
mutex_lock(&ftrace_start_lock);
- ftrace_start++;
+ ftrace_start_up++;
command |= FTRACE_ENABLE_CALLS;
if (saved_ftrace_func != ftrace_trace_function) {
@@ -550,8 +579,8 @@
return;
mutex_lock(&ftrace_start_lock);
- ftrace_start--;
- if (!ftrace_start)
+ ftrace_start_up--;
+ if (!ftrace_start_up)
command |= FTRACE_DISABLE_CALLS;
if (saved_ftrace_func != ftrace_trace_function) {
@@ -577,8 +606,8 @@
mutex_lock(&ftrace_start_lock);
/* Force update next time */
saved_ftrace_func = NULL;
- /* ftrace_start is true if we want ftrace running */
- if (ftrace_start)
+ /* ftrace_start_up is true if we want ftrace running */
+ if (ftrace_start_up)
command |= FTRACE_ENABLE_CALLS;
ftrace_run_update_code(command);
@@ -593,8 +622,8 @@
return;
mutex_lock(&ftrace_start_lock);
- /* ftrace_start is true if ftrace is running */
- if (ftrace_start)
+ /* ftrace_start_up is true if ftrace is running */
+ if (ftrace_start_up)
command |= FTRACE_DISABLE_CALLS;
ftrace_run_update_code(command);
@@ -605,7 +634,7 @@
static unsigned long ftrace_update_cnt;
unsigned long ftrace_update_tot_cnt;
-static int ftrace_update_code(void)
+static int ftrace_update_code(struct module *mod)
{
struct dyn_ftrace *p, *t;
cycle_t start, stop;
@@ -622,7 +651,7 @@
list_del_init(&p->list);
/* convert record (i.e, patch mcount-call with NOP) */
- if (ftrace_code_disable(p)) {
+ if (ftrace_code_disable(mod, p)) {
p->flags |= FTRACE_FL_CONVERTED;
ftrace_update_cnt++;
} else
@@ -1181,7 +1210,7 @@
mutex_lock(&ftrace_sysctl_lock);
mutex_lock(&ftrace_start_lock);
- if (ftrace_start && ftrace_enabled)
+ if (ftrace_start_up && ftrace_enabled)
ftrace_run_update_code(FTRACE_ENABLE_CALLS);
mutex_unlock(&ftrace_start_lock);
mutex_unlock(&ftrace_sysctl_lock);
@@ -1268,7 +1297,8 @@
fs_initcall(ftrace_init_debugfs);
-static int ftrace_convert_nops(unsigned long *start,
+static int ftrace_convert_nops(struct module *mod,
+ unsigned long *start,
unsigned long *end)
{
unsigned long *p;
@@ -1279,23 +1309,32 @@
p = start;
while (p < end) {
addr = ftrace_call_adjust(*p++);
+ /*
+ * Some architecture linkers will pad between
+ * the different mcount_loc sections of different
+ * object files to satisfy alignments.
+ * Skip any NULL pointers.
+ */
+ if (!addr)
+ continue;
ftrace_record_ip(addr);
}
/* disable interrupts to prevent kstop machine */
local_irq_save(flags);
- ftrace_update_code();
+ ftrace_update_code(mod);
local_irq_restore(flags);
mutex_unlock(&ftrace_start_lock);
return 0;
}
-void ftrace_init_module(unsigned long *start, unsigned long *end)
+void ftrace_init_module(struct module *mod,
+ unsigned long *start, unsigned long *end)
{
if (ftrace_disabled || start == end)
return;
- ftrace_convert_nops(start, end);
+ ftrace_convert_nops(mod, start, end);
}
extern unsigned long __start_mcount_loc[];
@@ -1325,7 +1364,8 @@
last_ftrace_enabled = ftrace_enabled = 1;
- ret = ftrace_convert_nops(__start_mcount_loc,
+ ret = ftrace_convert_nops(NULL,
+ __start_mcount_loc,
__stop_mcount_loc);
return;
@@ -1381,10 +1421,17 @@
return -1;
mutex_lock(&ftrace_sysctl_lock);
+
+ if (ftrace_tracing_type == FTRACE_TYPE_RETURN) {
+ ret = -EBUSY;
+ goto out;
+ }
+
ret = __register_ftrace_function(ops);
ftrace_startup();
- mutex_unlock(&ftrace_sysctl_lock);
+out:
+ mutex_unlock(&ftrace_sysctl_lock);
return ret;
}
@@ -1449,3 +1496,147 @@
return ret;
}
+#ifdef CONFIG_FUNCTION_RET_TRACER
+
+static atomic_t ftrace_retfunc_active;
+
+/* The callback that hooks the return of a function */
+trace_function_return_t ftrace_function_return =
+ (trace_function_return_t)ftrace_stub;
+
+
+/* Try to assign a return stack array on FTRACE_RETSTACK_ALLOC_SIZE tasks. */
+static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
+{
+ int i;
+ int ret = 0;
+ unsigned long flags;
+ int start = 0, end = FTRACE_RETSTACK_ALLOC_SIZE;
+ struct task_struct *g, *t;
+
+ for (i = 0; i < FTRACE_RETSTACK_ALLOC_SIZE; i++) {
+ ret_stack_list[i] = kmalloc(FTRACE_RETFUNC_DEPTH
+ * sizeof(struct ftrace_ret_stack),
+ GFP_KERNEL);
+ if (!ret_stack_list[i]) {
+ start = 0;
+ end = i;
+ ret = -ENOMEM;
+ goto free;
+ }
+ }
+
+ read_lock_irqsave(&tasklist_lock, flags);
+ do_each_thread(g, t) {
+ if (start == end) {
+ ret = -EAGAIN;
+ goto unlock;
+ }
+
+ if (t->ret_stack == NULL) {
+ t->ret_stack = ret_stack_list[start++];
+ t->curr_ret_stack = -1;
+ atomic_set(&t->trace_overrun, 0);
+ }
+ } while_each_thread(g, t);
+
+unlock:
+ read_unlock_irqrestore(&tasklist_lock, flags);
+free:
+ for (i = start; i < end; i++)
+ kfree(ret_stack_list[i]);
+ return ret;
+}
+
+/* Allocate a return stack for each task */
+static int start_return_tracing(void)
+{
+ struct ftrace_ret_stack **ret_stack_list;
+ int ret;
+
+ ret_stack_list = kmalloc(FTRACE_RETSTACK_ALLOC_SIZE *
+ sizeof(struct ftrace_ret_stack *),
+ GFP_KERNEL);
+
+ if (!ret_stack_list)
+ return -ENOMEM;
+
+ do {
+ ret = alloc_retstack_tasklist(ret_stack_list);
+ } while (ret == -EAGAIN);
+
+ kfree(ret_stack_list);
+ return ret;
+}
+
+int register_ftrace_return(trace_function_return_t func)
+{
+ int ret = 0;
+
+ mutex_lock(&ftrace_sysctl_lock);
+
+ /*
+ * Don't launch return tracing if normal function
+ * tracing is already running.
+ */
+ if (ftrace_trace_function != ftrace_stub) {
+ ret = -EBUSY;
+ goto out;
+ }
+ atomic_inc(&ftrace_retfunc_active);
+ ret = start_return_tracing();
+ if (ret) {
+ atomic_dec(&ftrace_retfunc_active);
+ goto out;
+ }
+ ftrace_tracing_type = FTRACE_TYPE_RETURN;
+ ftrace_function_return = func;
+ ftrace_startup();
+
+out:
+ mutex_unlock(&ftrace_sysctl_lock);
+ return ret;
+}
+
+void unregister_ftrace_return(void)
+{
+ mutex_lock(&ftrace_sysctl_lock);
+
+ atomic_dec(&ftrace_retfunc_active);
+ ftrace_function_return = (trace_function_return_t)ftrace_stub;
+ ftrace_shutdown();
+ /* Restore normal tracing type */
+ ftrace_tracing_type = FTRACE_TYPE_ENTER;
+
+ mutex_unlock(&ftrace_sysctl_lock);
+}
+
+/* Allocate a return stack for newly created task */
+void ftrace_retfunc_init_task(struct task_struct *t)
+{
+ if (atomic_read(&ftrace_retfunc_active)) {
+ t->ret_stack = kmalloc(FTRACE_RETFUNC_DEPTH
+ * sizeof(struct ftrace_ret_stack),
+ GFP_KERNEL);
+ if (!t->ret_stack)
+ return;
+ t->curr_ret_stack = -1;
+ atomic_set(&t->trace_overrun, 0);
+ } else
+ t->ret_stack = NULL;
+}
+
+void ftrace_retfunc_exit_task(struct task_struct *t)
+{
+ struct ftrace_ret_stack *ret_stack = t->ret_stack;
+
+ t->ret_stack = NULL;
+ /* NULL must become visible to IRQs before we free it: */
+ barrier();
+
+ kfree(ret_stack);
+}
+#endif
+
+
+
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index f780e95..e206951 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -18,8 +18,46 @@
#include "trace.h"
-/* Global flag to disable all recording to ring buffers */
-static int ring_buffers_off __read_mostly;
+/*
+ * A fast way to enable or disable all ring buffers is to
+ * call tracing_on or tracing_off. Turning off the ring buffers
+ * prevents all ring buffers from being recorded to.
+ * Turning this switch on, makes it OK to write to the
+ * ring buffer, if the ring buffer is enabled itself.
+ *
+ * There's three layers that must be on in order to write
+ * to the ring buffer.
+ *
+ * 1) This global flag must be set.
+ * 2) The ring buffer must be enabled for recording.
+ * 3) The per cpu buffer must be enabled for recording.
+ *
+ * In case of an anomaly, this global flag has a bit set that
+ * will permantly disable all ring buffers.
+ */
+
+/*
+ * Global flag to disable all recording to ring buffers
+ * This has two bits: ON, DISABLED
+ *
+ * ON DISABLED
+ * ---- ----------
+ * 0 0 : ring buffers are off
+ * 1 0 : ring buffers are on
+ * X 1 : ring buffers are permanently disabled
+ */
+
+enum {
+ RB_BUFFERS_ON_BIT = 0,
+ RB_BUFFERS_DISABLED_BIT = 1,
+};
+
+enum {
+ RB_BUFFERS_ON = 1 << RB_BUFFERS_ON_BIT,
+ RB_BUFFERS_DISABLED = 1 << RB_BUFFERS_DISABLED_BIT,
+};
+
+static long ring_buffer_flags __read_mostly = RB_BUFFERS_ON;
/**
* tracing_on - enable all tracing buffers
@@ -29,7 +67,7 @@
*/
void tracing_on(void)
{
- ring_buffers_off = 0;
+ set_bit(RB_BUFFERS_ON_BIT, &ring_buffer_flags);
}
/**
@@ -42,9 +80,22 @@
*/
void tracing_off(void)
{
- ring_buffers_off = 1;
+ clear_bit(RB_BUFFERS_ON_BIT, &ring_buffer_flags);
}
+/**
+ * tracing_off_permanent - permanently disable ring buffers
+ *
+ * This function, once called, will disable all ring buffers
+ * permanenty.
+ */
+void tracing_off_permanent(void)
+{
+ set_bit(RB_BUFFERS_DISABLED_BIT, &ring_buffer_flags);
+}
+
+#include "trace.h"
+
/* Up this if you want to test the TIME_EXTENTS and normalization */
#define DEBUG_SHIFT 0
@@ -187,7 +238,8 @@
struct ring_buffer_per_cpu {
int cpu;
struct ring_buffer *buffer;
- spinlock_t lock;
+ spinlock_t reader_lock; /* serialize readers */
+ raw_spinlock_t lock;
struct lock_class_key lock_key;
struct list_head pages;
struct buffer_page *head_page; /* read from head */
@@ -221,32 +273,16 @@
u64 read_stamp;
};
+/* buffer may be either ring_buffer or ring_buffer_per_cpu */
#define RB_WARN_ON(buffer, cond) \
- do { \
- if (unlikely(cond)) { \
+ ({ \
+ int _____ret = unlikely(cond); \
+ if (_____ret) { \
atomic_inc(&buffer->record_disabled); \
WARN_ON(1); \
} \
- } while (0)
-
-#define RB_WARN_ON_RET(buffer, cond) \
- do { \
- if (unlikely(cond)) { \
- atomic_inc(&buffer->record_disabled); \
- WARN_ON(1); \
- return -1; \
- } \
- } while (0)
-
-#define RB_WARN_ON_ONCE(buffer, cond) \
- do { \
- static int once; \
- if (unlikely(cond) && !once) { \
- once++; \
- atomic_inc(&buffer->record_disabled); \
- WARN_ON(1); \
- } \
- } while (0)
+ _____ret; \
+ })
/**
* check_pages - integrity check of buffer pages
@@ -260,14 +296,18 @@
struct list_head *head = &cpu_buffer->pages;
struct buffer_page *page, *tmp;
- RB_WARN_ON_RET(cpu_buffer, head->next->prev != head);
- RB_WARN_ON_RET(cpu_buffer, head->prev->next != head);
+ if (RB_WARN_ON(cpu_buffer, head->next->prev != head))
+ return -1;
+ if (RB_WARN_ON(cpu_buffer, head->prev->next != head))
+ return -1;
list_for_each_entry_safe(page, tmp, head, list) {
- RB_WARN_ON_RET(cpu_buffer,
- page->list.next->prev != &page->list);
- RB_WARN_ON_RET(cpu_buffer,
- page->list.prev->next != &page->list);
+ if (RB_WARN_ON(cpu_buffer,
+ page->list.next->prev != &page->list))
+ return -1;
+ if (RB_WARN_ON(cpu_buffer,
+ page->list.prev->next != &page->list))
+ return -1;
}
return 0;
@@ -324,7 +364,8 @@
cpu_buffer->cpu = cpu;
cpu_buffer->buffer = buffer;
- spin_lock_init(&cpu_buffer->lock);
+ spin_lock_init(&cpu_buffer->reader_lock);
+ cpu_buffer->lock = (raw_spinlock_t)__RAW_SPIN_LOCK_UNLOCKED;
INIT_LIST_HEAD(&cpu_buffer->pages);
page = kzalloc_node(ALIGN(sizeof(*page), cache_line_size()),
@@ -473,13 +514,15 @@
synchronize_sched();
for (i = 0; i < nr_pages; i++) {
- BUG_ON(list_empty(&cpu_buffer->pages));
+ if (RB_WARN_ON(cpu_buffer, list_empty(&cpu_buffer->pages)))
+ return;
p = cpu_buffer->pages.next;
page = list_entry(p, struct buffer_page, list);
list_del_init(&page->list);
free_buffer_page(page);
}
- BUG_ON(list_empty(&cpu_buffer->pages));
+ if (RB_WARN_ON(cpu_buffer, list_empty(&cpu_buffer->pages)))
+ return;
rb_reset_cpu(cpu_buffer);
@@ -501,7 +544,8 @@
synchronize_sched();
for (i = 0; i < nr_pages; i++) {
- BUG_ON(list_empty(pages));
+ if (RB_WARN_ON(cpu_buffer, list_empty(pages)))
+ return;
p = pages->next;
page = list_entry(p, struct buffer_page, list);
list_del_init(&page->list);
@@ -562,7 +606,10 @@
if (size < buffer_size) {
/* easy case, just free pages */
- BUG_ON(nr_pages >= buffer->pages);
+ if (RB_WARN_ON(buffer, nr_pages >= buffer->pages)) {
+ mutex_unlock(&buffer->mutex);
+ return -1;
+ }
rm_pages = buffer->pages - nr_pages;
@@ -581,7 +628,11 @@
* add these pages to the cpu_buffers. Otherwise we just free
* them all and return -ENOMEM;
*/
- BUG_ON(nr_pages <= buffer->pages);
+ if (RB_WARN_ON(buffer, nr_pages <= buffer->pages)) {
+ mutex_unlock(&buffer->mutex);
+ return -1;
+ }
+
new_pages = nr_pages - buffer->pages;
for_each_buffer_cpu(buffer, cpu) {
@@ -604,7 +655,10 @@
rb_insert_pages(cpu_buffer, &pages, new_pages);
}
- BUG_ON(!list_empty(&pages));
+ if (RB_WARN_ON(buffer, !list_empty(&pages))) {
+ mutex_unlock(&buffer->mutex);
+ return -1;
+ }
out:
buffer->pages = nr_pages;
@@ -693,7 +747,8 @@
head += rb_event_length(event)) {
event = __rb_page_index(cpu_buffer->head_page, head);
- BUG_ON(rb_null_event(event));
+ if (RB_WARN_ON(cpu_buffer, rb_null_event(event)))
+ return;
/* Only count data entries */
if (event->type != RINGBUF_TYPE_DATA)
continue;
@@ -746,8 +801,9 @@
addr &= PAGE_MASK;
while (cpu_buffer->commit_page->page != (void *)addr) {
- RB_WARN_ON(cpu_buffer,
- cpu_buffer->commit_page == cpu_buffer->tail_page);
+ if (RB_WARN_ON(cpu_buffer,
+ cpu_buffer->commit_page == cpu_buffer->tail_page))
+ return;
cpu_buffer->commit_page->commit =
cpu_buffer->commit_page->write;
rb_inc_page(cpu_buffer, &cpu_buffer->commit_page);
@@ -894,7 +950,8 @@
if (write > BUF_PAGE_SIZE) {
struct buffer_page *next_page = tail_page;
- spin_lock_irqsave(&cpu_buffer->lock, flags);
+ local_irq_save(flags);
+ __raw_spin_lock(&cpu_buffer->lock);
rb_inc_page(cpu_buffer, &next_page);
@@ -902,7 +959,8 @@
reader_page = cpu_buffer->reader_page;
/* we grabbed the lock before incrementing */
- RB_WARN_ON(cpu_buffer, next_page == reader_page);
+ if (RB_WARN_ON(cpu_buffer, next_page == reader_page))
+ goto out_unlock;
/*
* If for some reason, we had an interrupt storm that made
@@ -970,7 +1028,8 @@
rb_set_commit_to_write(cpu_buffer);
}
- spin_unlock_irqrestore(&cpu_buffer->lock, flags);
+ __raw_spin_unlock(&cpu_buffer->lock);
+ local_irq_restore(flags);
/* fail and let the caller try again */
return ERR_PTR(-EAGAIN);
@@ -978,7 +1037,8 @@
/* We reserved something on the buffer */
- BUG_ON(write > BUF_PAGE_SIZE);
+ if (RB_WARN_ON(cpu_buffer, write > BUF_PAGE_SIZE))
+ return NULL;
event = __rb_page_index(tail_page, tail);
rb_update_event(event, type, length);
@@ -993,7 +1053,8 @@
return event;
out_unlock:
- spin_unlock_irqrestore(&cpu_buffer->lock, flags);
+ __raw_spin_unlock(&cpu_buffer->lock);
+ local_irq_restore(flags);
return NULL;
}
@@ -1076,10 +1137,8 @@
* storm or we have something buggy.
* Bail!
*/
- if (unlikely(++nr_loops > 1000)) {
- RB_WARN_ON(cpu_buffer, 1);
+ if (RB_WARN_ON(cpu_buffer, ++nr_loops > 1000))
return NULL;
- }
ts = ring_buffer_time_stamp(cpu_buffer->cpu);
@@ -1175,15 +1234,14 @@
struct ring_buffer_event *event;
int cpu, resched;
- if (ring_buffers_off)
+ if (ring_buffer_flags != RB_BUFFERS_ON)
return NULL;
if (atomic_read(&buffer->record_disabled))
return NULL;
/* If we are tracing schedule, we don't want to recurse */
- resched = need_resched();
- preempt_disable_notrace();
+ resched = ftrace_preempt_disable();
cpu = raw_smp_processor_id();
@@ -1214,10 +1272,7 @@
return event;
out:
- if (resched)
- preempt_enable_notrace();
- else
- preempt_enable_notrace();
+ ftrace_preempt_enable(resched);
return NULL;
}
@@ -1259,12 +1314,9 @@
/*
* Only the last preempt count needs to restore preemption.
*/
- if (preempt_count() == 1) {
- if (per_cpu(rb_need_resched, cpu))
- preempt_enable_no_resched_notrace();
- else
- preempt_enable_notrace();
- } else
+ if (preempt_count() == 1)
+ ftrace_preempt_enable(per_cpu(rb_need_resched, cpu));
+ else
preempt_enable_no_resched_notrace();
return 0;
@@ -1294,14 +1346,13 @@
int ret = -EBUSY;
int cpu, resched;
- if (ring_buffers_off)
+ if (ring_buffer_flags != RB_BUFFERS_ON)
return -EBUSY;
if (atomic_read(&buffer->record_disabled))
return -EBUSY;
- resched = need_resched();
- preempt_disable_notrace();
+ resched = ftrace_preempt_disable();
cpu = raw_smp_processor_id();
@@ -1327,10 +1378,7 @@
ret = 0;
out:
- if (resched)
- preempt_enable_no_resched_notrace();
- else
- preempt_enable_notrace();
+ ftrace_preempt_enable(resched);
return ret;
}
@@ -1489,14 +1537,7 @@
return overruns;
}
-/**
- * ring_buffer_iter_reset - reset an iterator
- * @iter: The iterator to reset
- *
- * Resets the iterator, so that it will start from the beginning
- * again.
- */
-void ring_buffer_iter_reset(struct ring_buffer_iter *iter)
+static void rb_iter_reset(struct ring_buffer_iter *iter)
{
struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
@@ -1515,6 +1556,23 @@
}
/**
+ * ring_buffer_iter_reset - reset an iterator
+ * @iter: The iterator to reset
+ *
+ * Resets the iterator, so that it will start from the beginning
+ * again.
+ */
+void ring_buffer_iter_reset(struct ring_buffer_iter *iter)
+{
+ struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
+ unsigned long flags;
+
+ spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+ rb_iter_reset(iter);
+ spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+}
+
+/**
* ring_buffer_iter_empty - check if an iterator has no more to read
* @iter: The iterator to check
*/
@@ -1597,7 +1655,8 @@
unsigned long flags;
int nr_loops = 0;
- spin_lock_irqsave(&cpu_buffer->lock, flags);
+ local_irq_save(flags);
+ __raw_spin_lock(&cpu_buffer->lock);
again:
/*
@@ -1606,8 +1665,7 @@
* a case where we will loop three times. There should be no
* reason to loop four times (that I know of).
*/
- if (unlikely(++nr_loops > 3)) {
- RB_WARN_ON(cpu_buffer, 1);
+ if (RB_WARN_ON(cpu_buffer, ++nr_loops > 3)) {
reader = NULL;
goto out;
}
@@ -1619,8 +1677,9 @@
goto out;
/* Never should we have an index greater than the size */
- RB_WARN_ON(cpu_buffer,
- cpu_buffer->reader_page->read > rb_page_size(reader));
+ if (RB_WARN_ON(cpu_buffer,
+ cpu_buffer->reader_page->read > rb_page_size(reader)))
+ goto out;
/* check if we caught up to the tail */
reader = NULL;
@@ -1659,7 +1718,8 @@
goto again;
out:
- spin_unlock_irqrestore(&cpu_buffer->lock, flags);
+ __raw_spin_unlock(&cpu_buffer->lock);
+ local_irq_restore(flags);
return reader;
}
@@ -1673,7 +1733,8 @@
reader = rb_get_reader_page(cpu_buffer);
/* This function should not be called when buffer is empty */
- BUG_ON(!reader);
+ if (RB_WARN_ON(cpu_buffer, !reader))
+ return;
event = rb_reader_event(cpu_buffer);
@@ -1700,7 +1761,9 @@
* Check if we are at the end of the buffer.
*/
if (iter->head >= rb_page_size(iter->head_page)) {
- BUG_ON(iter->head_page == cpu_buffer->commit_page);
+ if (RB_WARN_ON(buffer,
+ iter->head_page == cpu_buffer->commit_page))
+ return;
rb_inc_iter(iter);
return;
}
@@ -1713,8 +1776,10 @@
* This should not be called to advance the header if we are
* at the tail of the buffer.
*/
- BUG_ON((iter->head_page == cpu_buffer->commit_page) &&
- (iter->head + length > rb_commit_index(cpu_buffer)));
+ if (RB_WARN_ON(cpu_buffer,
+ (iter->head_page == cpu_buffer->commit_page) &&
+ (iter->head + length > rb_commit_index(cpu_buffer))))
+ return;
rb_update_iter_read_stamp(iter, event);
@@ -1726,17 +1791,8 @@
rb_advance_iter(iter);
}
-/**
- * ring_buffer_peek - peek at the next event to be read
- * @buffer: The ring buffer to read
- * @cpu: The cpu to peak at
- * @ts: The timestamp counter of this event.
- *
- * This will return the event that will be read next, but does
- * not consume the data.
- */
-struct ring_buffer_event *
-ring_buffer_peek(struct ring_buffer *buffer, int cpu, u64 *ts)
+static struct ring_buffer_event *
+rb_buffer_peek(struct ring_buffer *buffer, int cpu, u64 *ts)
{
struct ring_buffer_per_cpu *cpu_buffer;
struct ring_buffer_event *event;
@@ -1757,10 +1813,8 @@
* can have. Nesting 10 deep of interrupts is clearly
* an anomaly.
*/
- if (unlikely(++nr_loops > 10)) {
- RB_WARN_ON(cpu_buffer, 1);
+ if (RB_WARN_ON(cpu_buffer, ++nr_loops > 10))
return NULL;
- }
reader = rb_get_reader_page(cpu_buffer);
if (!reader)
@@ -1798,16 +1852,8 @@
return NULL;
}
-/**
- * ring_buffer_iter_peek - peek at the next event to be read
- * @iter: The ring buffer iterator
- * @ts: The timestamp counter of this event.
- *
- * This will return the event that will be read next, but does
- * not increment the iterator.
- */
-struct ring_buffer_event *
-ring_buffer_iter_peek(struct ring_buffer_iter *iter, u64 *ts)
+static struct ring_buffer_event *
+rb_iter_peek(struct ring_buffer_iter *iter, u64 *ts)
{
struct ring_buffer *buffer;
struct ring_buffer_per_cpu *cpu_buffer;
@@ -1829,10 +1875,8 @@
* can have. Nesting 10 deep of interrupts is clearly
* an anomaly.
*/
- if (unlikely(++nr_loops > 10)) {
- RB_WARN_ON(cpu_buffer, 1);
+ if (RB_WARN_ON(cpu_buffer, ++nr_loops > 10))
return NULL;
- }
if (rb_per_cpu_empty(cpu_buffer))
return NULL;
@@ -1869,6 +1913,51 @@
}
/**
+ * ring_buffer_peek - peek at the next event to be read
+ * @buffer: The ring buffer to read
+ * @cpu: The cpu to peak at
+ * @ts: The timestamp counter of this event.
+ *
+ * This will return the event that will be read next, but does
+ * not consume the data.
+ */
+struct ring_buffer_event *
+ring_buffer_peek(struct ring_buffer *buffer, int cpu, u64 *ts)
+{
+ struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
+ struct ring_buffer_event *event;
+ unsigned long flags;
+
+ spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+ event = rb_buffer_peek(buffer, cpu, ts);
+ spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+
+ return event;
+}
+
+/**
+ * ring_buffer_iter_peek - peek at the next event to be read
+ * @iter: The ring buffer iterator
+ * @ts: The timestamp counter of this event.
+ *
+ * This will return the event that will be read next, but does
+ * not increment the iterator.
+ */
+struct ring_buffer_event *
+ring_buffer_iter_peek(struct ring_buffer_iter *iter, u64 *ts)
+{
+ struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
+ struct ring_buffer_event *event;
+ unsigned long flags;
+
+ spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+ event = rb_iter_peek(iter, ts);
+ spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+
+ return event;
+}
+
+/**
* ring_buffer_consume - return an event and consume it
* @buffer: The ring buffer to get the next event from
*
@@ -1879,19 +1968,24 @@
struct ring_buffer_event *
ring_buffer_consume(struct ring_buffer *buffer, int cpu, u64 *ts)
{
- struct ring_buffer_per_cpu *cpu_buffer;
+ struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
struct ring_buffer_event *event;
+ unsigned long flags;
if (!cpu_isset(cpu, buffer->cpumask))
return NULL;
- event = ring_buffer_peek(buffer, cpu, ts);
- if (!event)
- return NULL;
+ spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
- cpu_buffer = buffer->buffers[cpu];
+ event = rb_buffer_peek(buffer, cpu, ts);
+ if (!event)
+ goto out;
+
rb_advance_reader(cpu_buffer);
+ out:
+ spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+
return event;
}
@@ -1928,9 +2022,11 @@
atomic_inc(&cpu_buffer->record_disabled);
synchronize_sched();
- spin_lock_irqsave(&cpu_buffer->lock, flags);
- ring_buffer_iter_reset(iter);
- spin_unlock_irqrestore(&cpu_buffer->lock, flags);
+ spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+ __raw_spin_lock(&cpu_buffer->lock);
+ rb_iter_reset(iter);
+ __raw_spin_unlock(&cpu_buffer->lock);
+ spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
return iter;
}
@@ -1962,12 +2058,17 @@
ring_buffer_read(struct ring_buffer_iter *iter, u64 *ts)
{
struct ring_buffer_event *event;
+ struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
+ unsigned long flags;
- event = ring_buffer_iter_peek(iter, ts);
+ spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+ event = rb_iter_peek(iter, ts);
if (!event)
- return NULL;
+ goto out;
rb_advance_iter(iter);
+ out:
+ spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
return event;
}
@@ -2016,11 +2117,15 @@
if (!cpu_isset(cpu, buffer->cpumask))
return;
- spin_lock_irqsave(&cpu_buffer->lock, flags);
+ spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+
+ __raw_spin_lock(&cpu_buffer->lock);
rb_reset_cpu(cpu_buffer);
- spin_unlock_irqrestore(&cpu_buffer->lock, flags);
+ __raw_spin_unlock(&cpu_buffer->lock);
+
+ spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
}
/**
@@ -2122,12 +2227,14 @@
rb_simple_read(struct file *filp, char __user *ubuf,
size_t cnt, loff_t *ppos)
{
- int *p = filp->private_data;
+ long *p = filp->private_data;
char buf[64];
int r;
- /* !ring_buffers_off == tracing_on */
- r = sprintf(buf, "%d\n", !*p);
+ if (test_bit(RB_BUFFERS_DISABLED_BIT, p))
+ r = sprintf(buf, "permanently disabled\n");
+ else
+ r = sprintf(buf, "%d\n", test_bit(RB_BUFFERS_ON_BIT, p));
return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
}
@@ -2136,7 +2243,7 @@
rb_simple_write(struct file *filp, const char __user *ubuf,
size_t cnt, loff_t *ppos)
{
- int *p = filp->private_data;
+ long *p = filp->private_data;
char buf[64];
long val;
int ret;
@@ -2153,8 +2260,10 @@
if (ret < 0)
return ret;
- /* !ring_buffers_off == tracing_on */
- *p = !val;
+ if (val)
+ set_bit(RB_BUFFERS_ON_BIT, p);
+ else
+ clear_bit(RB_BUFFERS_ON_BIT, p);
(*ppos)++;
@@ -2176,7 +2285,7 @@
d_tracer = tracing_init_dentry();
entry = debugfs_create_file("tracing_on", 0644, d_tracer,
- &ring_buffers_off, &rb_simple_fops);
+ &ring_buffer_flags, &rb_simple_fops);
if (!entry)
pr_warning("Could not create debugfs 'tracing_on' entry\n");
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index d86e325..a45b59e 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -30,6 +30,7 @@
#include <linux/gfp.h>
#include <linux/fs.h>
#include <linux/kprobes.h>
+#include <linux/seq_file.h>
#include <linux/writeback.h>
#include <linux/stacktrace.h>
@@ -43,6 +44,29 @@
unsigned long __read_mostly tracing_max_latency = (cycle_t)ULONG_MAX;
unsigned long __read_mostly tracing_thresh;
+/* For tracers that don't implement custom flags */
+static struct tracer_opt dummy_tracer_opt[] = {
+ { }
+};
+
+static struct tracer_flags dummy_tracer_flags = {
+ .val = 0,
+ .opts = dummy_tracer_opt
+};
+
+static int dummy_set_flag(u32 old_flags, u32 bit, int set)
+{
+ return 0;
+}
+
+/*
+ * Kill all tracing for good (never come back).
+ * It is initialized to 1 but will turn to zero if the initialization
+ * of the tracer is successful. But that is the only place that sets
+ * this back to zero.
+ */
+int tracing_disabled = 1;
+
static DEFINE_PER_CPU(local_t, ftrace_cpu_disabled);
static inline void ftrace_disable_cpu(void)
@@ -62,7 +86,36 @@
#define for_each_tracing_cpu(cpu) \
for_each_cpu_mask(cpu, tracing_buffer_mask)
-static int tracing_disabled = 1;
+/*
+ * ftrace_dump_on_oops - variable to dump ftrace buffer on oops
+ *
+ * If there is an oops (or kernel panic) and the ftrace_dump_on_oops
+ * is set, then ftrace_dump is called. This will output the contents
+ * of the ftrace buffers to the console. This is very useful for
+ * capturing traces that lead to crashes and outputing it to a
+ * serial console.
+ *
+ * It is default off, but you can enable it with either specifying
+ * "ftrace_dump_on_oops" in the kernel command line, or setting
+ * /proc/sys/kernel/ftrace_dump_on_oops to true.
+ */
+int ftrace_dump_on_oops;
+
+static int tracing_set_tracer(char *buf);
+
+static int __init set_ftrace(char *str)
+{
+ tracing_set_tracer(str);
+ return 1;
+}
+__setup("ftrace", set_ftrace);
+
+static int __init set_ftrace_dump_on_oops(char *str)
+{
+ ftrace_dump_on_oops = 1;
+ return 1;
+}
+__setup("ftrace_dump_on_oops", set_ftrace_dump_on_oops);
long
ns2usecs(cycle_t nsec)
@@ -112,6 +165,19 @@
/* tracer_enabled is used to toggle activation of a tracer */
static int tracer_enabled = 1;
+/**
+ * tracing_is_enabled - return tracer_enabled status
+ *
+ * This function is used by other tracers to know the status
+ * of the tracer_enabled flag. Tracers may use this function
+ * to know if it should enable their features when starting
+ * up. See irqsoff tracer for an example (start_irqsoff_tracer).
+ */
+int tracing_is_enabled(void)
+{
+ return tracer_enabled;
+}
+
/* function tracing enabled */
int ftrace_function_enabled;
@@ -153,8 +219,9 @@
/* trace_wait is a waitqueue for tasks blocked on trace_poll */
static DECLARE_WAIT_QUEUE_HEAD(trace_wait);
-/* trace_flags holds iter_ctrl options */
-unsigned long trace_flags = TRACE_ITER_PRINT_PARENT;
+/* trace_flags holds trace_options default values */
+unsigned long trace_flags = TRACE_ITER_PRINT_PARENT | TRACE_ITER_PRINTK |
+ TRACE_ITER_ANNOTATE;
/**
* trace_wake_up - wake up tasks waiting for trace input
@@ -193,13 +260,6 @@
return nsecs / 1000;
}
-/*
- * TRACE_ITER_SYM_MASK masks the options in trace_flags that
- * control the output of kernel symbols.
- */
-#define TRACE_ITER_SYM_MASK \
- (TRACE_ITER_PRINT_PARENT|TRACE_ITER_SYM_OFFSET|TRACE_ITER_SYM_ADDR)
-
/* These must match the bit postions in trace_iterator_flags */
static const char *trace_options[] = {
"print-parent",
@@ -213,6 +273,11 @@
"stacktrace",
"sched-tree",
"ftrace_printk",
+ "ftrace_preempt",
+ "branch",
+ "annotate",
+ "userstacktrace",
+ "sym-userobj",
NULL
};
@@ -359,6 +424,28 @@
return trace_seq_putmem(s, hex, j);
}
+static int
+trace_seq_path(struct trace_seq *s, struct path *path)
+{
+ unsigned char *p;
+
+ if (s->len >= (PAGE_SIZE - 1))
+ return 0;
+ p = d_path(path, s->buffer + s->len, PAGE_SIZE - s->len);
+ if (!IS_ERR(p)) {
+ p = mangle_path(s->buffer + s->len, p, "\n");
+ if (p) {
+ s->len = p - s->buffer;
+ return 1;
+ }
+ } else {
+ s->buffer[s->len++] = '?';
+ return 1;
+ }
+
+ return 0;
+}
+
static void
trace_seq_reset(struct trace_seq *s)
{
@@ -470,7 +557,15 @@
return -1;
}
+ /*
+ * When this gets called we hold the BKL which means that
+ * preemption is disabled. Various trace selftests however
+ * need to disable and enable preemption for successful tests.
+ * So we drop the BKL here and grab it after the tests again.
+ */
+ unlock_kernel();
mutex_lock(&trace_types_lock);
+
for (t = trace_types; t; t = t->next) {
if (strcmp(type->name, t->name) == 0) {
/* already found */
@@ -481,11 +576,18 @@
}
}
+ if (!type->set_flag)
+ type->set_flag = &dummy_set_flag;
+ if (!type->flags)
+ type->flags = &dummy_tracer_flags;
+ else
+ if (!type->flags->opts)
+ type->flags->opts = dummy_tracer_opt;
+
#ifdef CONFIG_FTRACE_STARTUP_TEST
if (type->selftest) {
struct tracer *saved_tracer = current_trace;
struct trace_array *tr = &global_trace;
- int saved_ctrl = tr->ctrl;
int i;
/*
* Run a selftest on this tracer.
@@ -494,25 +596,23 @@
* internal tracing to verify that everything is in order.
* If we fail, we do not register this tracer.
*/
- for_each_tracing_cpu(i) {
+ for_each_tracing_cpu(i)
tracing_reset(tr, i);
- }
+
current_trace = type;
- tr->ctrl = 0;
/* the test is responsible for initializing and enabling */
pr_info("Testing tracer %s: ", type->name);
ret = type->selftest(type, tr);
/* the test is responsible for resetting too */
current_trace = saved_tracer;
- tr->ctrl = saved_ctrl;
if (ret) {
printk(KERN_CONT "FAILED!\n");
goto out;
}
/* Only reset on passing, to avoid touching corrupted buffers */
- for_each_tracing_cpu(i) {
+ for_each_tracing_cpu(i)
tracing_reset(tr, i);
- }
+
printk(KERN_CONT "PASSED\n");
}
#endif
@@ -525,6 +625,7 @@
out:
mutex_unlock(&trace_types_lock);
+ lock_kernel();
return ret;
}
@@ -581,6 +682,91 @@
cmdline_idx = 0;
}
+static int trace_stop_count;
+static DEFINE_SPINLOCK(tracing_start_lock);
+
+/**
+ * ftrace_off_permanent - disable all ftrace code permanently
+ *
+ * This should only be called when a serious anomally has
+ * been detected. This will turn off the function tracing,
+ * ring buffers, and other tracing utilites. It takes no
+ * locks and can be called from any context.
+ */
+void ftrace_off_permanent(void)
+{
+ tracing_disabled = 1;
+ ftrace_stop();
+ tracing_off_permanent();
+}
+
+/**
+ * tracing_start - quick start of the tracer
+ *
+ * If tracing is enabled but was stopped by tracing_stop,
+ * this will start the tracer back up.
+ */
+void tracing_start(void)
+{
+ struct ring_buffer *buffer;
+ unsigned long flags;
+
+ if (tracing_disabled)
+ return;
+
+ spin_lock_irqsave(&tracing_start_lock, flags);
+ if (--trace_stop_count)
+ goto out;
+
+ if (trace_stop_count < 0) {
+ /* Someone screwed up their debugging */
+ WARN_ON_ONCE(1);
+ trace_stop_count = 0;
+ goto out;
+ }
+
+
+ buffer = global_trace.buffer;
+ if (buffer)
+ ring_buffer_record_enable(buffer);
+
+ buffer = max_tr.buffer;
+ if (buffer)
+ ring_buffer_record_enable(buffer);
+
+ ftrace_start();
+ out:
+ spin_unlock_irqrestore(&tracing_start_lock, flags);
+}
+
+/**
+ * tracing_stop - quick stop of the tracer
+ *
+ * Light weight way to stop tracing. Use in conjunction with
+ * tracing_start.
+ */
+void tracing_stop(void)
+{
+ struct ring_buffer *buffer;
+ unsigned long flags;
+
+ ftrace_stop();
+ spin_lock_irqsave(&tracing_start_lock, flags);
+ if (trace_stop_count++)
+ goto out;
+
+ buffer = global_trace.buffer;
+ if (buffer)
+ ring_buffer_record_disable(buffer);
+
+ buffer = max_tr.buffer;
+ if (buffer)
+ ring_buffer_record_disable(buffer);
+
+ out:
+ spin_unlock_irqrestore(&tracing_start_lock, flags);
+}
+
void trace_stop_cmdline_recording(void);
static void trace_save_cmdline(struct task_struct *tsk)
@@ -655,6 +841,7 @@
entry->preempt_count = pc & 0xff;
entry->pid = (tsk) ? tsk->pid : 0;
+ entry->tgid = (tsk) ? tsk->tgid : 0;
entry->flags =
#ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT
(irqs_disabled_flags(flags) ? TRACE_FLAG_IRQS_OFF : 0) |
@@ -691,6 +878,36 @@
ring_buffer_unlock_commit(tr->buffer, event, irq_flags);
}
+#ifdef CONFIG_FUNCTION_RET_TRACER
+static void __trace_function_return(struct trace_array *tr,
+ struct trace_array_cpu *data,
+ struct ftrace_retfunc *trace,
+ unsigned long flags,
+ int pc)
+{
+ struct ring_buffer_event *event;
+ struct ftrace_ret_entry *entry;
+ unsigned long irq_flags;
+
+ if (unlikely(local_read(&__get_cpu_var(ftrace_cpu_disabled))))
+ return;
+
+ event = ring_buffer_lock_reserve(global_trace.buffer, sizeof(*entry),
+ &irq_flags);
+ if (!event)
+ return;
+ entry = ring_buffer_event_data(event);
+ tracing_generic_entry_update(&entry->ent, flags, pc);
+ entry->ent.type = TRACE_FN_RET;
+ entry->ip = trace->func;
+ entry->parent_ip = trace->ret;
+ entry->rettime = trace->rettime;
+ entry->calltime = trace->calltime;
+ entry->overrun = trace->overrun;
+ ring_buffer_unlock_commit(global_trace.buffer, event, irq_flags);
+}
+#endif
+
void
ftrace(struct trace_array *tr, struct trace_array_cpu *data,
unsigned long ip, unsigned long parent_ip, unsigned long flags,
@@ -742,6 +959,44 @@
ftrace_trace_stack(tr, data, flags, skip, preempt_count());
}
+static void ftrace_trace_userstack(struct trace_array *tr,
+ struct trace_array_cpu *data,
+ unsigned long flags, int pc)
+{
+ struct ring_buffer_event *event;
+ struct userstack_entry *entry;
+ struct stack_trace trace;
+ unsigned long irq_flags;
+
+ if (!(trace_flags & TRACE_ITER_USERSTACKTRACE))
+ return;
+
+ event = ring_buffer_lock_reserve(tr->buffer, sizeof(*entry),
+ &irq_flags);
+ if (!event)
+ return;
+ entry = ring_buffer_event_data(event);
+ tracing_generic_entry_update(&entry->ent, flags, pc);
+ entry->ent.type = TRACE_USER_STACK;
+
+ memset(&entry->caller, 0, sizeof(entry->caller));
+
+ trace.nr_entries = 0;
+ trace.max_entries = FTRACE_STACK_ENTRIES;
+ trace.skip = 0;
+ trace.entries = entry->caller;
+
+ save_stack_trace_user(&trace);
+ ring_buffer_unlock_commit(tr->buffer, event, irq_flags);
+}
+
+void __trace_userstack(struct trace_array *tr,
+ struct trace_array_cpu *data,
+ unsigned long flags)
+{
+ ftrace_trace_userstack(tr, data, flags, preempt_count());
+}
+
static void
ftrace_trace_special(void *__tr, void *__data,
unsigned long arg1, unsigned long arg2, unsigned long arg3,
@@ -765,6 +1020,7 @@
entry->arg3 = arg3;
ring_buffer_unlock_commit(tr->buffer, event, irq_flags);
ftrace_trace_stack(tr, data, irq_flags, 4, pc);
+ ftrace_trace_userstack(tr, data, irq_flags, pc);
trace_wake_up();
}
@@ -803,6 +1059,7 @@
entry->next_cpu = task_cpu(next);
ring_buffer_unlock_commit(tr->buffer, event, irq_flags);
ftrace_trace_stack(tr, data, flags, 5, pc);
+ ftrace_trace_userstack(tr, data, flags, pc);
}
void
@@ -832,6 +1089,7 @@
entry->next_cpu = task_cpu(wakee);
ring_buffer_unlock_commit(tr->buffer, event, irq_flags);
ftrace_trace_stack(tr, data, flags, 6, pc);
+ ftrace_trace_userstack(tr, data, flags, pc);
trace_wake_up();
}
@@ -841,26 +1099,28 @@
{
struct trace_array *tr = &global_trace;
struct trace_array_cpu *data;
+ unsigned long flags;
int cpu;
int pc;
- if (tracing_disabled || !tr->ctrl)
+ if (tracing_disabled)
return;
pc = preempt_count();
- preempt_disable_notrace();
+ local_irq_save(flags);
cpu = raw_smp_processor_id();
data = tr->data[cpu];
- if (likely(!atomic_read(&data->disabled)))
+ if (likely(atomic_inc_return(&data->disabled) == 1))
ftrace_trace_special(tr, data, arg1, arg2, arg3, pc);
- preempt_enable_notrace();
+ atomic_dec(&data->disabled);
+ local_irq_restore(flags);
}
#ifdef CONFIG_FUNCTION_TRACER
static void
-function_trace_call(unsigned long ip, unsigned long parent_ip)
+function_trace_call_preempt_only(unsigned long ip, unsigned long parent_ip)
{
struct trace_array *tr = &global_trace;
struct trace_array_cpu *data;
@@ -873,8 +1133,7 @@
return;
pc = preempt_count();
- resched = need_resched();
- preempt_disable_notrace();
+ resched = ftrace_preempt_disable();
local_save_flags(flags);
cpu = raw_smp_processor_id();
data = tr->data[cpu];
@@ -884,12 +1143,63 @@
trace_function(tr, data, ip, parent_ip, flags, pc);
atomic_dec(&data->disabled);
- if (resched)
- preempt_enable_no_resched_notrace();
- else
- preempt_enable_notrace();
+ ftrace_preempt_enable(resched);
}
+static void
+function_trace_call(unsigned long ip, unsigned long parent_ip)
+{
+ struct trace_array *tr = &global_trace;
+ struct trace_array_cpu *data;
+ unsigned long flags;
+ long disabled;
+ int cpu;
+ int pc;
+
+ if (unlikely(!ftrace_function_enabled))
+ return;
+
+ /*
+ * Need to use raw, since this must be called before the
+ * recursive protection is performed.
+ */
+ local_irq_save(flags);
+ cpu = raw_smp_processor_id();
+ data = tr->data[cpu];
+ disabled = atomic_inc_return(&data->disabled);
+
+ if (likely(disabled == 1)) {
+ pc = preempt_count();
+ trace_function(tr, data, ip, parent_ip, flags, pc);
+ }
+
+ atomic_dec(&data->disabled);
+ local_irq_restore(flags);
+}
+
+#ifdef CONFIG_FUNCTION_RET_TRACER
+void trace_function_return(struct ftrace_retfunc *trace)
+{
+ struct trace_array *tr = &global_trace;
+ struct trace_array_cpu *data;
+ unsigned long flags;
+ long disabled;
+ int cpu;
+ int pc;
+
+ raw_local_irq_save(flags);
+ cpu = raw_smp_processor_id();
+ data = tr->data[cpu];
+ disabled = atomic_inc_return(&data->disabled);
+ if (likely(disabled == 1)) {
+ pc = preempt_count();
+ __trace_function_return(tr, data, trace, flags, pc);
+ }
+ atomic_dec(&data->disabled);
+ raw_local_irq_restore(flags);
+}
+#endif /* CONFIG_FUNCTION_RET_TRACER */
+
static struct ftrace_ops trace_ops __read_mostly =
{
.func = function_trace_call,
@@ -898,9 +1208,14 @@
void tracing_start_function_trace(void)
{
ftrace_function_enabled = 0;
+
+ if (trace_flags & TRACE_ITER_PREEMPTONLY)
+ trace_ops.func = function_trace_call_preempt_only;
+ else
+ trace_ops.func = function_trace_call;
+
register_ftrace_function(&trace_ops);
- if (tracer_enabled)
- ftrace_function_enabled = 1;
+ ftrace_function_enabled = 1;
}
void tracing_stop_function_trace(void)
@@ -912,6 +1227,7 @@
enum trace_file_type {
TRACE_FILE_LAT_FMT = 1,
+ TRACE_FILE_ANNOTATE = 2,
};
static void trace_iterator_increment(struct trace_iterator *iter, int cpu)
@@ -1047,10 +1363,6 @@
atomic_inc(&trace_record_cmdline_disabled);
- /* let the tracer grab locks here if needed */
- if (current_trace->start)
- current_trace->start(iter);
-
if (*pos != iter->pos) {
iter->ent = NULL;
iter->cpu = 0;
@@ -1077,14 +1389,7 @@
static void s_stop(struct seq_file *m, void *p)
{
- struct trace_iterator *iter = m->private;
-
atomic_dec(&trace_record_cmdline_disabled);
-
- /* let the tracer release locks here if needed */
- if (current_trace && current_trace == iter->trace && iter->trace->stop)
- iter->trace->stop(iter);
-
mutex_unlock(&trace_types_lock);
}
@@ -1143,7 +1448,7 @@
# define IP_FMT "%016lx"
#endif
-static int
+int
seq_print_ip_sym(struct trace_seq *s, unsigned long ip, unsigned long sym_flags)
{
int ret;
@@ -1164,6 +1469,78 @@
return ret;
}
+static inline int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
+ unsigned long ip, unsigned long sym_flags)
+{
+ struct file *file = NULL;
+ unsigned long vmstart = 0;
+ int ret = 1;
+
+ if (mm) {
+ const struct vm_area_struct *vma;
+
+ down_read(&mm->mmap_sem);
+ vma = find_vma(mm, ip);
+ if (vma) {
+ file = vma->vm_file;
+ vmstart = vma->vm_start;
+ }
+ if (file) {
+ ret = trace_seq_path(s, &file->f_path);
+ if (ret)
+ ret = trace_seq_printf(s, "[+0x%lx]", ip - vmstart);
+ }
+ up_read(&mm->mmap_sem);
+ }
+ if (ret && ((sym_flags & TRACE_ITER_SYM_ADDR) || !file))
+ ret = trace_seq_printf(s, " <" IP_FMT ">", ip);
+ return ret;
+}
+
+static int
+seq_print_userip_objs(const struct userstack_entry *entry, struct trace_seq *s,
+ unsigned long sym_flags)
+{
+ struct mm_struct *mm = NULL;
+ int ret = 1;
+ unsigned int i;
+
+ if (trace_flags & TRACE_ITER_SYM_USEROBJ) {
+ struct task_struct *task;
+ /*
+ * we do the lookup on the thread group leader,
+ * since individual threads might have already quit!
+ */
+ rcu_read_lock();
+ task = find_task_by_vpid(entry->ent.tgid);
+ if (task)
+ mm = get_task_mm(task);
+ rcu_read_unlock();
+ }
+
+ for (i = 0; i < FTRACE_STACK_ENTRIES; i++) {
+ unsigned long ip = entry->caller[i];
+
+ if (ip == ULONG_MAX || !ret)
+ break;
+ if (i && ret)
+ ret = trace_seq_puts(s, " <- ");
+ if (!ip) {
+ if (ret)
+ ret = trace_seq_puts(s, "??");
+ continue;
+ }
+ if (!ret)
+ break;
+ if (ret)
+ ret = seq_print_user_ip(s, mm, ip, sym_flags);
+ }
+
+ if (mm)
+ mmput(mm);
+ return ret;
+}
+
static void print_lat_help_header(struct seq_file *m)
{
seq_puts(m, "# _------=> CPU# \n");
@@ -1338,6 +1715,23 @@
trace_seq_putc(s, '\n');
}
+static void test_cpu_buff_start(struct trace_iterator *iter)
+{
+ struct trace_seq *s = &iter->seq;
+
+ if (!(trace_flags & TRACE_ITER_ANNOTATE))
+ return;
+
+ if (!(iter->iter_flags & TRACE_FILE_ANNOTATE))
+ return;
+
+ if (cpu_isset(iter->cpu, iter->started))
+ return;
+
+ cpu_set(iter->cpu, iter->started);
+ trace_seq_printf(s, "##### CPU %u buffer started ####\n", iter->cpu);
+}
+
static enum print_line_t
print_lat_fmt(struct trace_iterator *iter, unsigned int trace_idx, int cpu)
{
@@ -1357,6 +1751,8 @@
if (entry->type == TRACE_CONT)
return TRACE_TYPE_HANDLED;
+ test_cpu_buff_start(iter);
+
next_entry = find_next_entry(iter, NULL, &next_ts);
if (!next_entry)
next_ts = iter->ts;
@@ -1448,6 +1844,27 @@
trace_seq_print_cont(s, iter);
break;
}
+ case TRACE_BRANCH: {
+ struct trace_branch *field;
+
+ trace_assign_type(field, entry);
+
+ trace_seq_printf(s, "[%s] %s:%s:%d\n",
+ field->correct ? " ok " : " MISS ",
+ field->func,
+ field->file,
+ field->line);
+ break;
+ }
+ case TRACE_USER_STACK: {
+ struct userstack_entry *field;
+
+ trace_assign_type(field, entry);
+
+ seq_print_userip_objs(field, s, sym_flags);
+ trace_seq_putc(s, '\n');
+ break;
+ }
default:
trace_seq_printf(s, "Unknown type %d\n", entry->type);
}
@@ -1472,6 +1889,8 @@
if (entry->type == TRACE_CONT)
return TRACE_TYPE_HANDLED;
+ test_cpu_buff_start(iter);
+
comm = trace_find_cmdline(iter->ent->pid);
t = ns2usecs(iter->ts);
@@ -1581,6 +2000,35 @@
trace_seq_print_cont(s, iter);
break;
}
+ case TRACE_FN_RET: {
+ return print_return_function(iter);
+ break;
+ }
+ case TRACE_BRANCH: {
+ struct trace_branch *field;
+
+ trace_assign_type(field, entry);
+
+ trace_seq_printf(s, "[%s] %s:%s:%d\n",
+ field->correct ? " ok " : " MISS ",
+ field->func,
+ field->file,
+ field->line);
+ break;
+ }
+ case TRACE_USER_STACK: {
+ struct userstack_entry *field;
+
+ trace_assign_type(field, entry);
+
+ ret = seq_print_userip_objs(field, s, sym_flags);
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+ ret = trace_seq_putc(s, '\n');
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+ break;
+ }
}
return TRACE_TYPE_HANDLED;
}
@@ -1640,6 +2088,7 @@
break;
}
case TRACE_SPECIAL:
+ case TRACE_USER_STACK:
case TRACE_STACK: {
struct special_entry *field;
@@ -1728,6 +2177,7 @@
break;
}
case TRACE_SPECIAL:
+ case TRACE_USER_STACK:
case TRACE_STACK: {
struct special_entry *field;
@@ -1782,6 +2232,7 @@
break;
}
case TRACE_SPECIAL:
+ case TRACE_USER_STACK:
case TRACE_STACK: {
struct special_entry *field;
@@ -1899,6 +2350,11 @@
iter->trace = current_trace;
iter->pos = -1;
+ /* Annotate start of buffers if we had overruns */
+ if (ring_buffer_overruns(iter->tr->buffer))
+ iter->iter_flags |= TRACE_FILE_ANNOTATE;
+
+
for_each_tracing_cpu(cpu) {
iter->buffer_iter[cpu] =
@@ -1917,10 +2373,7 @@
m->private = iter;
/* stop the trace while dumping */
- if (iter->tr->ctrl) {
- tracer_enabled = 0;
- ftrace_function_enabled = 0;
- }
+ tracing_stop();
if (iter->trace && iter->trace->open)
iter->trace->open(iter);
@@ -1966,14 +2419,7 @@
iter->trace->close(iter);
/* reenable tracing if it was previously enabled */
- if (iter->tr->ctrl) {
- tracer_enabled = 1;
- /*
- * It is safe to enable function tracing even if it
- * isn't used
- */
- ftrace_function_enabled = 1;
- }
+ tracing_start();
mutex_unlock(&trace_types_lock);
seq_release(inode, file);
@@ -2189,13 +2635,16 @@
};
static ssize_t
-tracing_iter_ctrl_read(struct file *filp, char __user *ubuf,
+tracing_trace_options_read(struct file *filp, char __user *ubuf,
size_t cnt, loff_t *ppos)
{
+ int i;
char *buf;
int r = 0;
int len = 0;
- int i;
+ u32 tracer_flags = current_trace->flags->val;
+ struct tracer_opt *trace_opts = current_trace->flags->opts;
+
/* calulate max size */
for (i = 0; trace_options[i]; i++) {
@@ -2203,6 +2652,15 @@
len += 3; /* "no" and space */
}
+ /*
+ * Increase the size with names of options specific
+ * of the current tracer.
+ */
+ for (i = 0; trace_opts[i].name; i++) {
+ len += strlen(trace_opts[i].name);
+ len += 3; /* "no" and space */
+ }
+
/* +2 for \n and \0 */
buf = kmalloc(len + 2, GFP_KERNEL);
if (!buf)
@@ -2215,6 +2673,15 @@
r += sprintf(buf + r, "no%s ", trace_options[i]);
}
+ for (i = 0; trace_opts[i].name; i++) {
+ if (tracer_flags & trace_opts[i].bit)
+ r += sprintf(buf + r, "%s ",
+ trace_opts[i].name);
+ else
+ r += sprintf(buf + r, "no%s ",
+ trace_opts[i].name);
+ }
+
r += sprintf(buf + r, "\n");
WARN_ON(r >= len + 2);
@@ -2225,13 +2692,48 @@
return r;
}
+/* Try to assign a tracer specific option */
+static int set_tracer_option(struct tracer *trace, char *cmp, int neg)
+{
+ struct tracer_flags *trace_flags = trace->flags;
+ struct tracer_opt *opts = NULL;
+ int ret = 0, i = 0;
+ int len;
+
+ for (i = 0; trace_flags->opts[i].name; i++) {
+ opts = &trace_flags->opts[i];
+ len = strlen(opts->name);
+
+ if (strncmp(cmp, opts->name, len) == 0) {
+ ret = trace->set_flag(trace_flags->val,
+ opts->bit, !neg);
+ break;
+ }
+ }
+ /* Not found */
+ if (!trace_flags->opts[i].name)
+ return -EINVAL;
+
+ /* Refused to handle */
+ if (ret)
+ return ret;
+
+ if (neg)
+ trace_flags->val &= ~opts->bit;
+ else
+ trace_flags->val |= opts->bit;
+
+ return 0;
+}
+
static ssize_t
-tracing_iter_ctrl_write(struct file *filp, const char __user *ubuf,
+tracing_trace_options_write(struct file *filp, const char __user *ubuf,
size_t cnt, loff_t *ppos)
{
char buf[64];
char *cmp = buf;
int neg = 0;
+ int ret;
int i;
if (cnt >= sizeof(buf))
@@ -2258,11 +2760,13 @@
break;
}
}
- /*
- * If no option could be set, return an error:
- */
- if (!trace_options[i])
- return -EINVAL;
+
+ /* If no option could be set, test the specific tracer options */
+ if (!trace_options[i]) {
+ ret = set_tracer_option(current_trace, cmp, neg);
+ if (ret)
+ return ret;
+ }
filp->f_pos += cnt;
@@ -2271,8 +2775,8 @@
static struct file_operations tracing_iter_fops = {
.open = tracing_open_generic,
- .read = tracing_iter_ctrl_read,
- .write = tracing_iter_ctrl_write,
+ .read = tracing_trace_options_read,
+ .write = tracing_trace_options_write,
};
static const char readme_msg[] =
@@ -2286,9 +2790,9 @@
"# echo sched_switch > /debug/tracing/current_tracer\n"
"# cat /debug/tracing/current_tracer\n"
"sched_switch\n"
- "# cat /debug/tracing/iter_ctrl\n"
+ "# cat /debug/tracing/trace_options\n"
"noprint-parent nosym-offset nosym-addr noverbose\n"
- "# echo print-parent > /debug/tracing/iter_ctrl\n"
+ "# echo print-parent > /debug/tracing/trace_options\n"
"# echo 1 > /debug/tracing/tracing_enabled\n"
"# cat /debug/tracing/trace > /tmp/trace.txt\n"
"echo 0 > /debug/tracing/tracing_enabled\n"
@@ -2311,11 +2815,10 @@
tracing_ctrl_read(struct file *filp, char __user *ubuf,
size_t cnt, loff_t *ppos)
{
- struct trace_array *tr = filp->private_data;
char buf[64];
int r;
- r = sprintf(buf, "%ld\n", tr->ctrl);
+ r = sprintf(buf, "%u\n", tracer_enabled);
return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
}
@@ -2343,16 +2846,18 @@
val = !!val;
mutex_lock(&trace_types_lock);
- if (tr->ctrl ^ val) {
- if (val)
+ if (tracer_enabled ^ val) {
+ if (val) {
tracer_enabled = 1;
- else
+ if (current_trace->start)
+ current_trace->start(tr);
+ tracing_start();
+ } else {
tracer_enabled = 0;
-
- tr->ctrl = val;
-
- if (current_trace && current_trace->ctrl_update)
- current_trace->ctrl_update(tr);
+ tracing_stop();
+ if (current_trace->stop)
+ current_trace->stop(tr);
+ }
}
mutex_unlock(&trace_types_lock);
@@ -2378,15 +2883,50 @@
return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
}
+static int tracing_set_tracer(char *buf)
+{
+ struct trace_array *tr = &global_trace;
+ struct tracer *t;
+ int ret = 0;
+
+ mutex_lock(&trace_types_lock);
+ for (t = trace_types; t; t = t->next) {
+ if (strcmp(t->name, buf) == 0)
+ break;
+ }
+ if (!t) {
+ ret = -EINVAL;
+ goto out;
+ }
+ if (t == current_trace)
+ goto out;
+
+ trace_branch_disable();
+ if (current_trace && current_trace->reset)
+ current_trace->reset(tr);
+
+ current_trace = t;
+ if (t->init) {
+ ret = t->init(tr);
+ if (ret)
+ goto out;
+ }
+
+ trace_branch_enable(tr);
+ out:
+ mutex_unlock(&trace_types_lock);
+
+ return ret;
+}
+
static ssize_t
tracing_set_trace_write(struct file *filp, const char __user *ubuf,
size_t cnt, loff_t *ppos)
{
- struct trace_array *tr = &global_trace;
- struct tracer *t;
char buf[max_tracer_type_len+1];
int i;
size_t ret;
+ int err;
ret = cnt;
@@ -2402,30 +2942,11 @@
for (i = cnt - 1; i > 0 && isspace(buf[i]); i--)
buf[i] = 0;
- mutex_lock(&trace_types_lock);
- for (t = trace_types; t; t = t->next) {
- if (strcmp(t->name, buf) == 0)
- break;
- }
- if (!t) {
- ret = -EINVAL;
- goto out;
- }
- if (t == current_trace)
- goto out;
+ err = tracing_set_tracer(buf);
+ if (err)
+ return err;
- if (current_trace && current_trace->reset)
- current_trace->reset(tr);
-
- current_trace = t;
- if (t->init)
- t->init(tr);
-
- out:
- mutex_unlock(&trace_types_lock);
-
- if (ret > 0)
- filp->f_pos += ret;
+ filp->f_pos += ret;
return ret;
}
@@ -2492,6 +3013,10 @@
return -ENOMEM;
mutex_lock(&trace_types_lock);
+
+ /* trace pipe does not show start of buffer */
+ cpus_setall(iter->started);
+
iter->tr = &global_trace;
iter->trace = current_trace;
filp->private_data = iter;
@@ -2667,7 +3192,7 @@
char buf[64];
int r;
- r = sprintf(buf, "%lu\n", tr->entries);
+ r = sprintf(buf, "%lu\n", tr->entries >> 10);
return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
}
@@ -2678,7 +3203,6 @@
unsigned long val;
char buf[64];
int ret, cpu;
- struct trace_array *tr = filp->private_data;
if (cnt >= sizeof(buf))
return -EINVAL;
@@ -2698,12 +3222,7 @@
mutex_lock(&trace_types_lock);
- if (tr->ctrl) {
- cnt = -EBUSY;
- pr_info("ftrace: please disable tracing"
- " before modifying buffer size\n");
- goto out;
- }
+ tracing_stop();
/* disable all cpu buffers */
for_each_tracing_cpu(cpu) {
@@ -2713,6 +3232,9 @@
atomic_inc(&max_tr.data[cpu]->disabled);
}
+ /* value is in KB */
+ val <<= 10;
+
if (val != global_trace.entries) {
ret = ring_buffer_resize(global_trace.buffer, val);
if (ret < 0) {
@@ -2751,6 +3273,7 @@
atomic_dec(&max_tr.data[cpu]->disabled);
}
+ tracing_start();
max_tr.entries = global_trace.entries;
mutex_unlock(&trace_types_lock);
@@ -2773,9 +3296,8 @@
{
char *buf;
char *end;
- struct trace_array *tr = &global_trace;
- if (!tr->ctrl || tracing_disabled)
+ if (tracing_disabled)
return -EINVAL;
if (cnt > TRACE_BUF_SIZE)
@@ -2841,22 +3363,38 @@
#ifdef CONFIG_DYNAMIC_FTRACE
-static ssize_t
-tracing_read_long(struct file *filp, char __user *ubuf,
- size_t cnt, loff_t *ppos)
+int __weak ftrace_arch_read_dyn_info(char *buf, int size)
{
- unsigned long *p = filp->private_data;
- char buf[64];
- int r;
-
- r = sprintf(buf, "%ld\n", *p);
-
- return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+ return 0;
}
-static struct file_operations tracing_read_long_fops = {
+static ssize_t
+tracing_read_dyn_info(struct file *filp, char __user *ubuf,
+ size_t cnt, loff_t *ppos)
+{
+ static char ftrace_dyn_info_buffer[1024];
+ static DEFINE_MUTEX(dyn_info_mutex);
+ unsigned long *p = filp->private_data;
+ char *buf = ftrace_dyn_info_buffer;
+ int size = ARRAY_SIZE(ftrace_dyn_info_buffer);
+ int r;
+
+ mutex_lock(&dyn_info_mutex);
+ r = sprintf(buf, "%ld ", *p);
+
+ r += ftrace_arch_read_dyn_info(buf+r, (size-1)-r);
+ buf[r++] = '\n';
+
+ r = simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+
+ mutex_unlock(&dyn_info_mutex);
+
+ return r;
+}
+
+static struct file_operations tracing_dyn_info_fops = {
.open = tracing_open_generic,
- .read = tracing_read_long,
+ .read = tracing_read_dyn_info,
};
#endif
@@ -2897,10 +3435,10 @@
if (!entry)
pr_warning("Could not create debugfs 'tracing_enabled' entry\n");
- entry = debugfs_create_file("iter_ctrl", 0644, d_tracer,
+ entry = debugfs_create_file("trace_options", 0644, d_tracer,
NULL, &tracing_iter_fops);
if (!entry)
- pr_warning("Could not create debugfs 'iter_ctrl' entry\n");
+ pr_warning("Could not create debugfs 'trace_options' entry\n");
entry = debugfs_create_file("tracing_cpumask", 0644, d_tracer,
NULL, &tracing_cpumask_fops);
@@ -2950,11 +3488,11 @@
pr_warning("Could not create debugfs "
"'trace_pipe' entry\n");
- entry = debugfs_create_file("trace_entries", 0644, d_tracer,
+ entry = debugfs_create_file("buffer_size_kb", 0644, d_tracer,
&global_trace, &tracing_entries_fops);
if (!entry)
pr_warning("Could not create debugfs "
- "'trace_entries' entry\n");
+ "'buffer_size_kb' entry\n");
entry = debugfs_create_file("trace_marker", 0220, d_tracer,
NULL, &tracing_mark_fops);
@@ -2965,7 +3503,7 @@
#ifdef CONFIG_DYNAMIC_FTRACE
entry = debugfs_create_file("dyn_ftrace_total_info", 0444, d_tracer,
&ftrace_update_tot_cnt,
- &tracing_read_long_fops);
+ &tracing_dyn_info_fops);
if (!entry)
pr_warning("Could not create debugfs "
"'dyn_ftrace_total_info' entry\n");
@@ -2988,7 +3526,7 @@
unsigned long flags, irq_flags;
int cpu, len = 0, size, pc;
- if (!tr->ctrl || tracing_disabled)
+ if (tracing_disabled)
return 0;
pc = preempt_count();
@@ -3046,7 +3584,8 @@
static int trace_panic_handler(struct notifier_block *this,
unsigned long event, void *unused)
{
- ftrace_dump();
+ if (ftrace_dump_on_oops)
+ ftrace_dump();
return NOTIFY_OK;
}
@@ -3062,7 +3601,8 @@
{
switch (val) {
case DIE_OOPS:
- ftrace_dump();
+ if (ftrace_dump_on_oops)
+ ftrace_dump();
break;
default:
break;
@@ -3103,7 +3643,6 @@
trace_seq_reset(s);
}
-
void ftrace_dump(void)
{
static DEFINE_SPINLOCK(ftrace_dump_lock);
@@ -3128,6 +3667,9 @@
atomic_inc(&global_trace.data[cpu]->disabled);
}
+ /* don't look at user memory in panic mode */
+ trace_flags &= ~TRACE_ITER_SYM_USEROBJ;
+
printk(KERN_TRACE "Dumping ftrace buffer:\n");
iter.tr = &global_trace;
@@ -3221,7 +3763,6 @@
#endif
/* All seems OK, enable tracing */
- global_trace.ctrl = tracer_enabled;
tracing_disabled = 0;
atomic_notifier_chain_register(&panic_notifier_list,
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 8465ad0..28c15c2 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -8,6 +8,7 @@
#include <linux/ring_buffer.h>
#include <linux/mmiotrace.h>
#include <linux/ftrace.h>
+#include <trace/boot.h>
enum trace_type {
__TRACE_FIRST_TYPE = 0,
@@ -21,7 +22,11 @@
TRACE_SPECIAL,
TRACE_MMIO_RW,
TRACE_MMIO_MAP,
- TRACE_BOOT,
+ TRACE_BRANCH,
+ TRACE_BOOT_CALL,
+ TRACE_BOOT_RET,
+ TRACE_FN_RET,
+ TRACE_USER_STACK,
__TRACE_LAST_TYPE
};
@@ -38,6 +43,7 @@
unsigned char flags;
unsigned char preempt_count;
int pid;
+ int tgid;
};
/*
@@ -48,6 +54,16 @@
unsigned long ip;
unsigned long parent_ip;
};
+
+/* Function return entry */
+struct ftrace_ret_entry {
+ struct trace_entry ent;
+ unsigned long ip;
+ unsigned long parent_ip;
+ unsigned long long calltime;
+ unsigned long long rettime;
+ unsigned long overrun;
+};
extern struct tracer boot_tracer;
/*
@@ -85,6 +101,11 @@
unsigned long caller[FTRACE_STACK_ENTRIES];
};
+struct userstack_entry {
+ struct trace_entry ent;
+ unsigned long caller[FTRACE_STACK_ENTRIES];
+};
+
/*
* ftrace_printk entry:
*/
@@ -112,9 +133,24 @@
struct mmiotrace_map map;
};
-struct trace_boot {
+struct trace_boot_call {
struct trace_entry ent;
- struct boot_trace initcall;
+ struct boot_trace_call boot_call;
+};
+
+struct trace_boot_ret {
+ struct trace_entry ent;
+ struct boot_trace_ret boot_ret;
+};
+
+#define TRACE_FUNC_SIZE 30
+#define TRACE_FILE_SIZE 20
+struct trace_branch {
+ struct trace_entry ent;
+ unsigned line;
+ char func[TRACE_FUNC_SIZE+1];
+ char file[TRACE_FILE_SIZE+1];
+ char correct;
};
/*
@@ -172,7 +208,6 @@
struct trace_array {
struct ring_buffer *buffer;
unsigned long entries;
- long ctrl;
int cpu;
cycle_t time_start;
struct task_struct *waiter;
@@ -212,13 +247,17 @@
IF_ASSIGN(var, ent, struct ctx_switch_entry, 0); \
IF_ASSIGN(var, ent, struct trace_field_cont, TRACE_CONT); \
IF_ASSIGN(var, ent, struct stack_entry, TRACE_STACK); \
+ IF_ASSIGN(var, ent, struct userstack_entry, TRACE_USER_STACK);\
IF_ASSIGN(var, ent, struct print_entry, TRACE_PRINT); \
IF_ASSIGN(var, ent, struct special_entry, 0); \
IF_ASSIGN(var, ent, struct trace_mmiotrace_rw, \
TRACE_MMIO_RW); \
IF_ASSIGN(var, ent, struct trace_mmiotrace_map, \
TRACE_MMIO_MAP); \
- IF_ASSIGN(var, ent, struct trace_boot, TRACE_BOOT); \
+ IF_ASSIGN(var, ent, struct trace_boot_call, TRACE_BOOT_CALL);\
+ IF_ASSIGN(var, ent, struct trace_boot_ret, TRACE_BOOT_RET);\
+ IF_ASSIGN(var, ent, struct trace_branch, TRACE_BRANCH); \
+ IF_ASSIGN(var, ent, struct ftrace_ret_entry, TRACE_FN_RET);\
__ftrace_bad_type(); \
} while (0)
@@ -229,29 +268,55 @@
TRACE_TYPE_UNHANDLED = 2 /* Relay to other output functions */
};
+
+/*
+ * An option specific to a tracer. This is a boolean value.
+ * The bit is the bit index that sets its value on the
+ * flags value in struct tracer_flags.
+ */
+struct tracer_opt {
+ const char *name; /* Will appear on the trace_options file */
+ u32 bit; /* Mask assigned in val field in tracer_flags */
+};
+
+/*
+ * The set of specific options for a tracer. Your tracer
+ * have to set the initial value of the flags val.
+ */
+struct tracer_flags {
+ u32 val;
+ struct tracer_opt *opts;
+};
+
+/* Makes more easy to define a tracer opt */
+#define TRACER_OPT(s, b) .name = #s, .bit = b
+
/*
* A specific tracer, represented by methods that operate on a trace array:
*/
struct tracer {
const char *name;
- void (*init)(struct trace_array *tr);
+ /* Your tracer should raise a warning if init fails */
+ int (*init)(struct trace_array *tr);
void (*reset)(struct trace_array *tr);
+ void (*start)(struct trace_array *tr);
+ void (*stop)(struct trace_array *tr);
void (*open)(struct trace_iterator *iter);
void (*pipe_open)(struct trace_iterator *iter);
void (*close)(struct trace_iterator *iter);
- void (*start)(struct trace_iterator *iter);
- void (*stop)(struct trace_iterator *iter);
ssize_t (*read)(struct trace_iterator *iter,
struct file *filp, char __user *ubuf,
size_t cnt, loff_t *ppos);
- void (*ctrl_update)(struct trace_array *tr);
#ifdef CONFIG_FTRACE_STARTUP_TEST
int (*selftest)(struct tracer *trace,
struct trace_array *tr);
#endif
enum print_line_t (*print_line)(struct trace_iterator *iter);
+ /* If you handled the flag setting, return 0 */
+ int (*set_flag)(u32 old_flags, u32 bit, int set);
struct tracer *next;
int print_max;
+ struct tracer_flags *flags;
};
struct trace_seq {
@@ -279,8 +344,11 @@
unsigned long iter_flags;
loff_t pos;
long idx;
+
+ cpumask_t started;
};
+int tracing_is_enabled(void);
void trace_wake_up(void);
void tracing_reset(struct trace_array *tr, int cpu);
int tracing_open_generic(struct inode *inode, struct file *filp);
@@ -320,9 +388,14 @@
unsigned long ip,
unsigned long parent_ip,
unsigned long flags, int pc);
+void
+trace_function_return(struct ftrace_retfunc *trace);
void tracing_start_cmdline_record(void);
void tracing_stop_cmdline_record(void);
+void tracing_sched_switch_assign_trace(struct trace_array *tr);
+void tracing_stop_sched_switch_record(void);
+void tracing_start_sched_switch_record(void);
int register_tracer(struct tracer *type);
void unregister_tracer(struct tracer *type);
@@ -383,12 +456,18 @@
struct trace_array *tr);
extern int trace_selftest_startup_sysprof(struct tracer *trace,
struct trace_array *tr);
+extern int trace_selftest_startup_branch(struct tracer *trace,
+ struct trace_array *tr);
#endif /* CONFIG_FTRACE_STARTUP_TEST */
extern void *head_page(struct trace_array_cpu *data);
extern int trace_seq_printf(struct trace_seq *s, const char *fmt, ...);
extern void trace_seq_print_cont(struct trace_seq *s,
struct trace_iterator *iter);
+
+extern int
+seq_print_ip_sym(struct trace_seq *s, unsigned long ip,
+ unsigned long sym_flags);
extern ssize_t trace_seq_to_user(struct trace_seq *s, char __user *ubuf,
size_t cnt);
extern long ns2usecs(cycle_t nsec);
@@ -396,6 +475,17 @@
extern unsigned long trace_flags;
+/* Standard output formatting function used for function return traces */
+#ifdef CONFIG_FUNCTION_RET_TRACER
+extern enum print_line_t print_return_function(struct trace_iterator *iter);
+#else
+static inline enum print_line_t
+print_return_function(struct trace_iterator *iter)
+{
+ return TRACE_TYPE_UNHANDLED;
+}
+#endif
+
/*
* trace_iterator_flags is an enumeration that defines bit
* positions into trace_flags that controls the output.
@@ -415,8 +505,92 @@
TRACE_ITER_STACKTRACE = 0x100,
TRACE_ITER_SCHED_TREE = 0x200,
TRACE_ITER_PRINTK = 0x400,
+ TRACE_ITER_PREEMPTONLY = 0x800,
+ TRACE_ITER_BRANCH = 0x1000,
+ TRACE_ITER_ANNOTATE = 0x2000,
+ TRACE_ITER_USERSTACKTRACE = 0x4000,
+ TRACE_ITER_SYM_USEROBJ = 0x8000
};
+/*
+ * TRACE_ITER_SYM_MASK masks the options in trace_flags that
+ * control the output of kernel symbols.
+ */
+#define TRACE_ITER_SYM_MASK \
+ (TRACE_ITER_PRINT_PARENT|TRACE_ITER_SYM_OFFSET|TRACE_ITER_SYM_ADDR)
+
extern struct tracer nop_trace;
+/**
+ * ftrace_preempt_disable - disable preemption scheduler safe
+ *
+ * When tracing can happen inside the scheduler, there exists
+ * cases that the tracing might happen before the need_resched
+ * flag is checked. If this happens and the tracer calls
+ * preempt_enable (after a disable), a schedule might take place
+ * causing an infinite recursion.
+ *
+ * To prevent this, we read the need_recshed flag before
+ * disabling preemption. When we want to enable preemption we
+ * check the flag, if it is set, then we call preempt_enable_no_resched.
+ * Otherwise, we call preempt_enable.
+ *
+ * The rational for doing the above is that if need resched is set
+ * and we have yet to reschedule, we are either in an atomic location
+ * (where we do not need to check for scheduling) or we are inside
+ * the scheduler and do not want to resched.
+ */
+static inline int ftrace_preempt_disable(void)
+{
+ int resched;
+
+ resched = need_resched();
+ preempt_disable_notrace();
+
+ return resched;
+}
+
+/**
+ * ftrace_preempt_enable - enable preemption scheduler safe
+ * @resched: the return value from ftrace_preempt_disable
+ *
+ * This is a scheduler safe way to enable preemption and not miss
+ * any preemption checks. The disabled saved the state of preemption.
+ * If resched is set, then we were either inside an atomic or
+ * are inside the scheduler (we would have already scheduled
+ * otherwise). In this case, we do not want to call normal
+ * preempt_enable, but preempt_enable_no_resched instead.
+ */
+static inline void ftrace_preempt_enable(int resched)
+{
+ if (resched)
+ preempt_enable_no_resched_notrace();
+ else
+ preempt_enable_notrace();
+}
+
+#ifdef CONFIG_BRANCH_TRACER
+extern int enable_branch_tracing(struct trace_array *tr);
+extern void disable_branch_tracing(void);
+static inline int trace_branch_enable(struct trace_array *tr)
+{
+ if (trace_flags & TRACE_ITER_BRANCH)
+ return enable_branch_tracing(tr);
+ return 0;
+}
+static inline void trace_branch_disable(void)
+{
+ /* due to races, always disable */
+ disable_branch_tracing();
+}
+#else
+static inline int trace_branch_enable(struct trace_array *tr)
+{
+ return 0;
+}
+static inline void trace_branch_disable(void)
+{
+}
+#endif /* CONFIG_BRANCH_TRACER */
+
#endif /* _LINUX_KERNEL_TRACE_H */
diff --git a/kernel/trace/trace_boot.c b/kernel/trace/trace_boot.c
index d0a5e50..a4fa2c5 100644
--- a/kernel/trace/trace_boot.c
+++ b/kernel/trace/trace_boot.c
@@ -13,73 +13,117 @@
#include "trace.h"
static struct trace_array *boot_trace;
-static int trace_boot_enabled;
+static bool pre_initcalls_finished;
-
-/* Should be started after do_pre_smp_initcalls() in init/main.c */
+/* Tells the boot tracer that the pre_smp_initcalls are finished.
+ * So we are ready .
+ * It doesn't enable sched events tracing however.
+ * You have to call enable_boot_trace to do so.
+ */
void start_boot_trace(void)
{
- trace_boot_enabled = 1;
+ pre_initcalls_finished = true;
}
-void stop_boot_trace(void)
+void enable_boot_trace(void)
{
- trace_boot_enabled = 0;
+ if (pre_initcalls_finished)
+ tracing_start_sched_switch_record();
}
-void reset_boot_trace(struct trace_array *tr)
+void disable_boot_trace(void)
{
- stop_boot_trace();
+ if (pre_initcalls_finished)
+ tracing_stop_sched_switch_record();
}
-static void boot_trace_init(struct trace_array *tr)
+static void reset_boot_trace(struct trace_array *tr)
+{
+ int cpu;
+
+ tr->time_start = ftrace_now(tr->cpu);
+
+ for_each_online_cpu(cpu)
+ tracing_reset(tr, cpu);
+}
+
+static int boot_trace_init(struct trace_array *tr)
{
int cpu;
boot_trace = tr;
- trace_boot_enabled = 0;
-
for_each_cpu_mask(cpu, cpu_possible_map)
tracing_reset(tr, cpu);
+
+ tracing_sched_switch_assign_trace(tr);
+ return 0;
}
-static void boot_trace_ctrl_update(struct trace_array *tr)
+static enum print_line_t
+initcall_call_print_line(struct trace_iterator *iter)
{
- if (tr->ctrl)
- start_boot_trace();
+ struct trace_entry *entry = iter->ent;
+ struct trace_seq *s = &iter->seq;
+ struct trace_boot_call *field;
+ struct boot_trace_call *call;
+ u64 ts;
+ unsigned long nsec_rem;
+ int ret;
+
+ trace_assign_type(field, entry);
+ call = &field->boot_call;
+ ts = iter->ts;
+ nsec_rem = do_div(ts, 1000000000);
+
+ ret = trace_seq_printf(s, "[%5ld.%09ld] calling %s @ %i\n",
+ (unsigned long)ts, nsec_rem, call->func, call->caller);
+
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
else
- stop_boot_trace();
+ return TRACE_TYPE_HANDLED;
+}
+
+static enum print_line_t
+initcall_ret_print_line(struct trace_iterator *iter)
+{
+ struct trace_entry *entry = iter->ent;
+ struct trace_seq *s = &iter->seq;
+ struct trace_boot_ret *field;
+ struct boot_trace_ret *init_ret;
+ u64 ts;
+ unsigned long nsec_rem;
+ int ret;
+
+ trace_assign_type(field, entry);
+ init_ret = &field->boot_ret;
+ ts = iter->ts;
+ nsec_rem = do_div(ts, 1000000000);
+
+ ret = trace_seq_printf(s, "[%5ld.%09ld] initcall %s "
+ "returned %d after %llu msecs\n",
+ (unsigned long) ts,
+ nsec_rem,
+ init_ret->func, init_ret->result, init_ret->duration);
+
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+ else
+ return TRACE_TYPE_HANDLED;
}
static enum print_line_t initcall_print_line(struct trace_iterator *iter)
{
- int ret;
struct trace_entry *entry = iter->ent;
- struct trace_boot *field = (struct trace_boot *)entry;
- struct boot_trace *it = &field->initcall;
- struct trace_seq *s = &iter->seq;
- struct timespec calltime = ktime_to_timespec(it->calltime);
- struct timespec rettime = ktime_to_timespec(it->rettime);
- if (entry->type == TRACE_BOOT) {
- ret = trace_seq_printf(s, "[%5ld.%09ld] calling %s @ %i\n",
- calltime.tv_sec,
- calltime.tv_nsec,
- it->func, it->caller);
- if (!ret)
- return TRACE_TYPE_PARTIAL_LINE;
-
- ret = trace_seq_printf(s, "[%5ld.%09ld] initcall %s "
- "returned %d after %lld msecs\n",
- rettime.tv_sec,
- rettime.tv_nsec,
- it->func, it->result, it->duration);
-
- if (!ret)
- return TRACE_TYPE_PARTIAL_LINE;
- return TRACE_TYPE_HANDLED;
+ switch (entry->type) {
+ case TRACE_BOOT_CALL:
+ return initcall_call_print_line(iter);
+ case TRACE_BOOT_RET:
+ return initcall_ret_print_line(iter);
+ default:
+ return TRACE_TYPE_UNHANDLED;
}
- return TRACE_TYPE_UNHANDLED;
}
struct tracer boot_tracer __read_mostly =
@@ -87,27 +131,24 @@
.name = "initcall",
.init = boot_trace_init,
.reset = reset_boot_trace,
- .ctrl_update = boot_trace_ctrl_update,
.print_line = initcall_print_line,
};
-void trace_boot(struct boot_trace *it, initcall_t fn)
+void trace_boot_call(struct boot_trace_call *bt, initcall_t fn)
{
struct ring_buffer_event *event;
- struct trace_boot *entry;
- struct trace_array_cpu *data;
+ struct trace_boot_call *entry;
unsigned long irq_flags;
struct trace_array *tr = boot_trace;
- if (!trace_boot_enabled)
+ if (!pre_initcalls_finished)
return;
/* Get its name now since this function could
* disappear because it is in the .init section.
*/
- sprint_symbol(it->func, (unsigned long)fn);
+ sprint_symbol(bt->func, (unsigned long)fn);
preempt_disable();
- data = tr->data[smp_processor_id()];
event = ring_buffer_lock_reserve(tr->buffer, sizeof(*entry),
&irq_flags);
@@ -115,8 +156,37 @@
goto out;
entry = ring_buffer_event_data(event);
tracing_generic_entry_update(&entry->ent, 0, 0);
- entry->ent.type = TRACE_BOOT;
- entry->initcall = *it;
+ entry->ent.type = TRACE_BOOT_CALL;
+ entry->boot_call = *bt;
+ ring_buffer_unlock_commit(tr->buffer, event, irq_flags);
+
+ trace_wake_up();
+
+ out:
+ preempt_enable();
+}
+
+void trace_boot_ret(struct boot_trace_ret *bt, initcall_t fn)
+{
+ struct ring_buffer_event *event;
+ struct trace_boot_ret *entry;
+ unsigned long irq_flags;
+ struct trace_array *tr = boot_trace;
+
+ if (!pre_initcalls_finished)
+ return;
+
+ sprint_symbol(bt->func, (unsigned long)fn);
+ preempt_disable();
+
+ event = ring_buffer_lock_reserve(tr->buffer, sizeof(*entry),
+ &irq_flags);
+ if (!event)
+ goto out;
+ entry = ring_buffer_event_data(event);
+ tracing_generic_entry_update(&entry->ent, 0, 0);
+ entry->ent.type = TRACE_BOOT_RET;
+ entry->boot_ret = *bt;
ring_buffer_unlock_commit(tr->buffer, event, irq_flags);
trace_wake_up();
diff --git a/kernel/trace/trace_branch.c b/kernel/trace/trace_branch.c
new file mode 100644
index 0000000..877ee88
--- /dev/null
+++ b/kernel/trace/trace_branch.c
@@ -0,0 +1,341 @@
+/*
+ * unlikely profiler
+ *
+ * Copyright (C) 2008 Steven Rostedt <srostedt@redhat.com>
+ */
+#include <linux/kallsyms.h>
+#include <linux/seq_file.h>
+#include <linux/spinlock.h>
+#include <linux/debugfs.h>
+#include <linux/uaccess.h>
+#include <linux/module.h>
+#include <linux/ftrace.h>
+#include <linux/hash.h>
+#include <linux/fs.h>
+#include <asm/local.h>
+#include "trace.h"
+
+#ifdef CONFIG_BRANCH_TRACER
+
+static int branch_tracing_enabled __read_mostly;
+static DEFINE_MUTEX(branch_tracing_mutex);
+static struct trace_array *branch_tracer;
+
+static void
+probe_likely_condition(struct ftrace_branch_data *f, int val, int expect)
+{
+ struct trace_array *tr = branch_tracer;
+ struct ring_buffer_event *event;
+ struct trace_branch *entry;
+ unsigned long flags, irq_flags;
+ int cpu, pc;
+ const char *p;
+
+ /*
+ * I would love to save just the ftrace_likely_data pointer, but
+ * this code can also be used by modules. Ugly things can happen
+ * if the module is unloaded, and then we go and read the
+ * pointer. This is slower, but much safer.
+ */
+
+ if (unlikely(!tr))
+ return;
+
+ raw_local_irq_save(flags);
+ cpu = raw_smp_processor_id();
+ if (atomic_inc_return(&tr->data[cpu]->disabled) != 1)
+ goto out;
+
+ event = ring_buffer_lock_reserve(tr->buffer, sizeof(*entry),
+ &irq_flags);
+ if (!event)
+ goto out;
+
+ pc = preempt_count();
+ entry = ring_buffer_event_data(event);
+ tracing_generic_entry_update(&entry->ent, flags, pc);
+ entry->ent.type = TRACE_BRANCH;
+
+ /* Strip off the path, only save the file */
+ p = f->file + strlen(f->file);
+ while (p >= f->file && *p != '/')
+ p--;
+ p++;
+
+ strncpy(entry->func, f->func, TRACE_FUNC_SIZE);
+ strncpy(entry->file, p, TRACE_FILE_SIZE);
+ entry->func[TRACE_FUNC_SIZE] = 0;
+ entry->file[TRACE_FILE_SIZE] = 0;
+ entry->line = f->line;
+ entry->correct = val == expect;
+
+ ring_buffer_unlock_commit(tr->buffer, event, irq_flags);
+
+ out:
+ atomic_dec(&tr->data[cpu]->disabled);
+ raw_local_irq_restore(flags);
+}
+
+static inline
+void trace_likely_condition(struct ftrace_branch_data *f, int val, int expect)
+{
+ if (!branch_tracing_enabled)
+ return;
+
+ probe_likely_condition(f, val, expect);
+}
+
+int enable_branch_tracing(struct trace_array *tr)
+{
+ int ret = 0;
+
+ mutex_lock(&branch_tracing_mutex);
+ branch_tracer = tr;
+ /*
+ * Must be seen before enabling. The reader is a condition
+ * where we do not need a matching rmb()
+ */
+ smp_wmb();
+ branch_tracing_enabled++;
+ mutex_unlock(&branch_tracing_mutex);
+
+ return ret;
+}
+
+void disable_branch_tracing(void)
+{
+ mutex_lock(&branch_tracing_mutex);
+
+ if (!branch_tracing_enabled)
+ goto out_unlock;
+
+ branch_tracing_enabled--;
+
+ out_unlock:
+ mutex_unlock(&branch_tracing_mutex);
+}
+
+static void start_branch_trace(struct trace_array *tr)
+{
+ enable_branch_tracing(tr);
+}
+
+static void stop_branch_trace(struct trace_array *tr)
+{
+ disable_branch_tracing();
+}
+
+static int branch_trace_init(struct trace_array *tr)
+{
+ int cpu;
+
+ for_each_online_cpu(cpu)
+ tracing_reset(tr, cpu);
+
+ start_branch_trace(tr);
+ return 0;
+}
+
+static void branch_trace_reset(struct trace_array *tr)
+{
+ stop_branch_trace(tr);
+}
+
+struct tracer branch_trace __read_mostly =
+{
+ .name = "branch",
+ .init = branch_trace_init,
+ .reset = branch_trace_reset,
+#ifdef CONFIG_FTRACE_SELFTEST
+ .selftest = trace_selftest_startup_branch,
+#endif
+};
+
+__init static int init_branch_trace(void)
+{
+ return register_tracer(&branch_trace);
+}
+
+device_initcall(init_branch_trace);
+#else
+static inline
+void trace_likely_condition(struct ftrace_branch_data *f, int val, int expect)
+{
+}
+#endif /* CONFIG_BRANCH_TRACER */
+
+void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect)
+{
+ /*
+ * I would love to have a trace point here instead, but the
+ * trace point code is so inundated with unlikely and likely
+ * conditions that the recursive nightmare that exists is too
+ * much to try to get working. At least for now.
+ */
+ trace_likely_condition(f, val, expect);
+
+ /* FIXME: Make this atomic! */
+ if (val == expect)
+ f->correct++;
+ else
+ f->incorrect++;
+}
+EXPORT_SYMBOL(ftrace_likely_update);
+
+struct ftrace_pointer {
+ void *start;
+ void *stop;
+ int hit;
+};
+
+static void *
+t_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ const struct ftrace_pointer *f = m->private;
+ struct ftrace_branch_data *p = v;
+
+ (*pos)++;
+
+ if (v == (void *)1)
+ return f->start;
+
+ ++p;
+
+ if ((void *)p >= (void *)f->stop)
+ return NULL;
+
+ return p;
+}
+
+static void *t_start(struct seq_file *m, loff_t *pos)
+{
+ void *t = (void *)1;
+ loff_t l = 0;
+
+ for (; t && l < *pos; t = t_next(m, t, &l))
+ ;
+
+ return t;
+}
+
+static void t_stop(struct seq_file *m, void *p)
+{
+}
+
+static int t_show(struct seq_file *m, void *v)
+{
+ const struct ftrace_pointer *fp = m->private;
+ struct ftrace_branch_data *p = v;
+ const char *f;
+ long percent;
+
+ if (v == (void *)1) {
+ if (fp->hit)
+ seq_printf(m, " miss hit %% ");
+ else
+ seq_printf(m, " correct incorrect %% ");
+ seq_printf(m, " Function "
+ " File Line\n"
+ " ------- --------- - "
+ " -------- "
+ " ---- ----\n");
+ return 0;
+ }
+
+ /* Only print the file, not the path */
+ f = p->file + strlen(p->file);
+ while (f >= p->file && *f != '/')
+ f--;
+ f++;
+
+ /*
+ * The miss is overlayed on correct, and hit on incorrect.
+ */
+ if (p->correct) {
+ percent = p->incorrect * 100;
+ percent /= p->correct + p->incorrect;
+ } else
+ percent = p->incorrect ? 100 : -1;
+
+ seq_printf(m, "%8lu %8lu ", p->correct, p->incorrect);
+ if (percent < 0)
+ seq_printf(m, " X ");
+ else
+ seq_printf(m, "%3ld ", percent);
+ seq_printf(m, "%-30.30s %-20.20s %d\n", p->func, f, p->line);
+ return 0;
+}
+
+static struct seq_operations tracing_likely_seq_ops = {
+ .start = t_start,
+ .next = t_next,
+ .stop = t_stop,
+ .show = t_show,
+};
+
+static int tracing_branch_open(struct inode *inode, struct file *file)
+{
+ int ret;
+
+ ret = seq_open(file, &tracing_likely_seq_ops);
+ if (!ret) {
+ struct seq_file *m = file->private_data;
+ m->private = (void *)inode->i_private;
+ }
+
+ return ret;
+}
+
+static const struct file_operations tracing_branch_fops = {
+ .open = tracing_branch_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+};
+
+#ifdef CONFIG_PROFILE_ALL_BRANCHES
+extern unsigned long __start_branch_profile[];
+extern unsigned long __stop_branch_profile[];
+
+static const struct ftrace_pointer ftrace_branch_pos = {
+ .start = __start_branch_profile,
+ .stop = __stop_branch_profile,
+ .hit = 1,
+};
+
+#endif /* CONFIG_PROFILE_ALL_BRANCHES */
+
+extern unsigned long __start_annotated_branch_profile[];
+extern unsigned long __stop_annotated_branch_profile[];
+
+static const struct ftrace_pointer ftrace_annotated_branch_pos = {
+ .start = __start_annotated_branch_profile,
+ .stop = __stop_annotated_branch_profile,
+};
+
+static __init int ftrace_branch_init(void)
+{
+ struct dentry *d_tracer;
+ struct dentry *entry;
+
+ d_tracer = tracing_init_dentry();
+
+ entry = debugfs_create_file("profile_annotated_branch", 0444, d_tracer,
+ (void *)&ftrace_annotated_branch_pos,
+ &tracing_branch_fops);
+ if (!entry)
+ pr_warning("Could not create debugfs "
+ "'profile_annotatet_branch' entry\n");
+
+#ifdef CONFIG_PROFILE_ALL_BRANCHES
+ entry = debugfs_create_file("profile_branch", 0444, d_tracer,
+ (void *)&ftrace_branch_pos,
+ &tracing_branch_fops);
+ if (!entry)
+ pr_warning("Could not create debugfs"
+ " 'profile_branch' entry\n");
+#endif
+
+ return 0;
+}
+
+device_initcall(ftrace_branch_init);
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index 0f85a64..e74f6d0 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -42,24 +42,20 @@
tracing_stop_cmdline_record();
}
-static void function_trace_init(struct trace_array *tr)
+static int function_trace_init(struct trace_array *tr)
{
- if (tr->ctrl)
- start_function_trace(tr);
+ start_function_trace(tr);
+ return 0;
}
static void function_trace_reset(struct trace_array *tr)
{
- if (tr->ctrl)
- stop_function_trace(tr);
+ stop_function_trace(tr);
}
-static void function_trace_ctrl_update(struct trace_array *tr)
+static void function_trace_start(struct trace_array *tr)
{
- if (tr->ctrl)
- start_function_trace(tr);
- else
- stop_function_trace(tr);
+ function_reset(tr);
}
static struct tracer function_trace __read_mostly =
@@ -67,7 +63,7 @@
.name = "function",
.init = function_trace_init,
.reset = function_trace_reset,
- .ctrl_update = function_trace_ctrl_update,
+ .start = function_trace_start,
#ifdef CONFIG_FTRACE_SELFTEST
.selftest = trace_selftest_startup_function,
#endif
diff --git a/kernel/trace/trace_functions_return.c b/kernel/trace/trace_functions_return.c
new file mode 100644
index 0000000..e00d645
--- /dev/null
+++ b/kernel/trace/trace_functions_return.c
@@ -0,0 +1,98 @@
+/*
+ *
+ * Function return tracer.
+ * Copyright (c) 2008 Frederic Weisbecker <fweisbec@gmail.com>
+ * Mostly borrowed from function tracer which
+ * is Copyright (c) Steven Rostedt <srostedt@redhat.com>
+ *
+ */
+#include <linux/debugfs.h>
+#include <linux/uaccess.h>
+#include <linux/ftrace.h>
+#include <linux/fs.h>
+
+#include "trace.h"
+
+
+#define TRACE_RETURN_PRINT_OVERRUN 0x1
+static struct tracer_opt trace_opts[] = {
+ /* Display overruns or not */
+ { TRACER_OPT(overrun, TRACE_RETURN_PRINT_OVERRUN) },
+ { } /* Empty entry */
+};
+
+static struct tracer_flags tracer_flags = {
+ .val = 0, /* Don't display overruns by default */
+ .opts = trace_opts
+};
+
+
+static int return_trace_init(struct trace_array *tr)
+{
+ int cpu;
+ for_each_online_cpu(cpu)
+ tracing_reset(tr, cpu);
+
+ return register_ftrace_return(&trace_function_return);
+}
+
+static void return_trace_reset(struct trace_array *tr)
+{
+ unregister_ftrace_return();
+}
+
+
+enum print_line_t
+print_return_function(struct trace_iterator *iter)
+{
+ struct trace_seq *s = &iter->seq;
+ struct trace_entry *entry = iter->ent;
+ struct ftrace_ret_entry *field;
+ int ret;
+
+ if (entry->type == TRACE_FN_RET) {
+ trace_assign_type(field, entry);
+ ret = trace_seq_printf(s, "%pF -> ", (void *)field->parent_ip);
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ ret = seq_print_ip_sym(s, field->ip,
+ trace_flags & TRACE_ITER_SYM_MASK);
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ ret = trace_seq_printf(s, " (%llu ns)",
+ field->rettime - field->calltime);
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ if (tracer_flags.val & TRACE_RETURN_PRINT_OVERRUN) {
+ ret = trace_seq_printf(s, " (Overruns: %lu)",
+ field->overrun);
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+ }
+
+ ret = trace_seq_printf(s, "\n");
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ return TRACE_TYPE_HANDLED;
+ }
+ return TRACE_TYPE_UNHANDLED;
+}
+
+static struct tracer return_trace __read_mostly = {
+ .name = "return",
+ .init = return_trace_init,
+ .reset = return_trace_reset,
+ .print_line = print_return_function,
+ .flags = &tracer_flags,
+};
+
+static __init int init_return_trace(void)
+{
+ return register_tracer(&return_trace);
+}
+
+device_initcall(init_return_trace);
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index 9c74071..7c2e326 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -353,15 +353,28 @@
}
#endif /* CONFIG_PREEMPT_TRACER */
+/*
+ * save_tracer_enabled is used to save the state of the tracer_enabled
+ * variable when we disable it when we open a trace output file.
+ */
+static int save_tracer_enabled;
+
static void start_irqsoff_tracer(struct trace_array *tr)
{
register_ftrace_function(&trace_ops);
- tracer_enabled = 1;
+ if (tracing_is_enabled()) {
+ tracer_enabled = 1;
+ save_tracer_enabled = 1;
+ } else {
+ tracer_enabled = 0;
+ save_tracer_enabled = 0;
+ }
}
static void stop_irqsoff_tracer(struct trace_array *tr)
{
tracer_enabled = 0;
+ save_tracer_enabled = 0;
unregister_ftrace_function(&trace_ops);
}
@@ -370,53 +383,55 @@
irqsoff_trace = tr;
/* make sure that the tracer is visible */
smp_wmb();
-
- if (tr->ctrl)
- start_irqsoff_tracer(tr);
+ start_irqsoff_tracer(tr);
}
static void irqsoff_tracer_reset(struct trace_array *tr)
{
- if (tr->ctrl)
- stop_irqsoff_tracer(tr);
+ stop_irqsoff_tracer(tr);
}
-static void irqsoff_tracer_ctrl_update(struct trace_array *tr)
+static void irqsoff_tracer_start(struct trace_array *tr)
{
- if (tr->ctrl)
- start_irqsoff_tracer(tr);
- else
- stop_irqsoff_tracer(tr);
+ tracer_enabled = 1;
+ save_tracer_enabled = 1;
+}
+
+static void irqsoff_tracer_stop(struct trace_array *tr)
+{
+ tracer_enabled = 0;
+ save_tracer_enabled = 0;
}
static void irqsoff_tracer_open(struct trace_iterator *iter)
{
/* stop the trace while dumping */
- if (iter->tr->ctrl)
- stop_irqsoff_tracer(iter->tr);
+ tracer_enabled = 0;
}
static void irqsoff_tracer_close(struct trace_iterator *iter)
{
- if (iter->tr->ctrl)
- start_irqsoff_tracer(iter->tr);
+ /* restart tracing */
+ tracer_enabled = save_tracer_enabled;
}
#ifdef CONFIG_IRQSOFF_TRACER
-static void irqsoff_tracer_init(struct trace_array *tr)
+static int irqsoff_tracer_init(struct trace_array *tr)
{
trace_type = TRACER_IRQS_OFF;
__irqsoff_tracer_init(tr);
+ return 0;
}
static struct tracer irqsoff_tracer __read_mostly =
{
.name = "irqsoff",
.init = irqsoff_tracer_init,
.reset = irqsoff_tracer_reset,
+ .start = irqsoff_tracer_start,
+ .stop = irqsoff_tracer_stop,
.open = irqsoff_tracer_open,
.close = irqsoff_tracer_close,
- .ctrl_update = irqsoff_tracer_ctrl_update,
.print_max = 1,
#ifdef CONFIG_FTRACE_SELFTEST
.selftest = trace_selftest_startup_irqsoff,
@@ -428,11 +443,12 @@
#endif
#ifdef CONFIG_PREEMPT_TRACER
-static void preemptoff_tracer_init(struct trace_array *tr)
+static int preemptoff_tracer_init(struct trace_array *tr)
{
trace_type = TRACER_PREEMPT_OFF;
__irqsoff_tracer_init(tr);
+ return 0;
}
static struct tracer preemptoff_tracer __read_mostly =
@@ -440,9 +456,10 @@
.name = "preemptoff",
.init = preemptoff_tracer_init,
.reset = irqsoff_tracer_reset,
+ .start = irqsoff_tracer_start,
+ .stop = irqsoff_tracer_stop,
.open = irqsoff_tracer_open,
.close = irqsoff_tracer_close,
- .ctrl_update = irqsoff_tracer_ctrl_update,
.print_max = 1,
#ifdef CONFIG_FTRACE_SELFTEST
.selftest = trace_selftest_startup_preemptoff,
@@ -456,11 +473,12 @@
#if defined(CONFIG_IRQSOFF_TRACER) && \
defined(CONFIG_PREEMPT_TRACER)
-static void preemptirqsoff_tracer_init(struct trace_array *tr)
+static int preemptirqsoff_tracer_init(struct trace_array *tr)
{
trace_type = TRACER_IRQS_OFF | TRACER_PREEMPT_OFF;
__irqsoff_tracer_init(tr);
+ return 0;
}
static struct tracer preemptirqsoff_tracer __read_mostly =
@@ -468,9 +486,10 @@
.name = "preemptirqsoff",
.init = preemptirqsoff_tracer_init,
.reset = irqsoff_tracer_reset,
+ .start = irqsoff_tracer_start,
+ .stop = irqsoff_tracer_stop,
.open = irqsoff_tracer_open,
.close = irqsoff_tracer_close,
- .ctrl_update = irqsoff_tracer_ctrl_update,
.print_max = 1,
#ifdef CONFIG_FTRACE_SELFTEST
.selftest = trace_selftest_startup_preemptirqsoff,
diff --git a/kernel/trace/trace_mmiotrace.c b/kernel/trace/trace_mmiotrace.c
index e62cbf7..2a98a20 100644
--- a/kernel/trace/trace_mmiotrace.c
+++ b/kernel/trace/trace_mmiotrace.c
@@ -32,34 +32,29 @@
tracing_reset(tr, cpu);
}
-static void mmio_trace_init(struct trace_array *tr)
+static int mmio_trace_init(struct trace_array *tr)
{
pr_debug("in %s\n", __func__);
mmio_trace_array = tr;
- if (tr->ctrl) {
- mmio_reset_data(tr);
- enable_mmiotrace();
- }
+
+ mmio_reset_data(tr);
+ enable_mmiotrace();
+ return 0;
}
static void mmio_trace_reset(struct trace_array *tr)
{
pr_debug("in %s\n", __func__);
- if (tr->ctrl)
- disable_mmiotrace();
+
+ disable_mmiotrace();
mmio_reset_data(tr);
mmio_trace_array = NULL;
}
-static void mmio_trace_ctrl_update(struct trace_array *tr)
+static void mmio_trace_start(struct trace_array *tr)
{
pr_debug("in %s\n", __func__);
- if (tr->ctrl) {
- mmio_reset_data(tr);
- enable_mmiotrace();
- } else {
- disable_mmiotrace();
- }
+ mmio_reset_data(tr);
}
static int mmio_print_pcidev(struct trace_seq *s, const struct pci_dev *dev)
@@ -296,10 +291,10 @@
.name = "mmiotrace",
.init = mmio_trace_init,
.reset = mmio_trace_reset,
+ .start = mmio_trace_start,
.pipe_open = mmio_pipe_open,
.close = mmio_close,
.read = mmio_read,
- .ctrl_update = mmio_trace_ctrl_update,
.print_line = mmio_print_line,
};
diff --git a/kernel/trace/trace_nop.c b/kernel/trace/trace_nop.c
index 4592b48..b9767ac 100644
--- a/kernel/trace/trace_nop.c
+++ b/kernel/trace/trace_nop.c
@@ -12,6 +12,27 @@
#include "trace.h"
+/* Our two options */
+enum {
+ TRACE_NOP_OPT_ACCEPT = 0x1,
+ TRACE_NOP_OPT_REFUSE = 0x2
+};
+
+/* Options for the tracer (see trace_options file) */
+static struct tracer_opt nop_opts[] = {
+ /* Option that will be accepted by set_flag callback */
+ { TRACER_OPT(test_nop_accept, TRACE_NOP_OPT_ACCEPT) },
+ /* Option that will be refused by set_flag callback */
+ { TRACER_OPT(test_nop_refuse, TRACE_NOP_OPT_REFUSE) },
+ { } /* Always set a last empty entry */
+};
+
+static struct tracer_flags nop_flags = {
+ /* You can check your flags value here when you want. */
+ .val = 0, /* By default: all flags disabled */
+ .opts = nop_opts
+};
+
static struct trace_array *ctx_trace;
static void start_nop_trace(struct trace_array *tr)
@@ -24,7 +45,7 @@
/* Nothing to do! */
}
-static void nop_trace_init(struct trace_array *tr)
+static int nop_trace_init(struct trace_array *tr)
{
int cpu;
ctx_trace = tr;
@@ -32,33 +53,53 @@
for_each_online_cpu(cpu)
tracing_reset(tr, cpu);
- if (tr->ctrl)
- start_nop_trace(tr);
+ start_nop_trace(tr);
+ return 0;
}
static void nop_trace_reset(struct trace_array *tr)
{
- if (tr->ctrl)
- stop_nop_trace(tr);
+ stop_nop_trace(tr);
}
-static void nop_trace_ctrl_update(struct trace_array *tr)
+/* It only serves as a signal handler and a callback to
+ * accept or refuse tthe setting of a flag.
+ * If you don't implement it, then the flag setting will be
+ * automatically accepted.
+ */
+static int nop_set_flag(u32 old_flags, u32 bit, int set)
{
- /* When starting a new trace, reset the buffers */
- if (tr->ctrl)
- start_nop_trace(tr);
- else
- stop_nop_trace(tr);
+ /*
+ * Note that you don't need to update nop_flags.val yourself.
+ * The tracing Api will do it automatically if you return 0
+ */
+ if (bit == TRACE_NOP_OPT_ACCEPT) {
+ printk(KERN_DEBUG "nop_test_accept flag set to %d: we accept."
+ " Now cat trace_options to see the result\n",
+ set);
+ return 0;
+ }
+
+ if (bit == TRACE_NOP_OPT_REFUSE) {
+ printk(KERN_DEBUG "nop_test_refuse flag set to %d: we refuse."
+ "Now cat trace_options to see the result\n",
+ set);
+ return -EINVAL;
+ }
+
+ return 0;
}
+
struct tracer nop_trace __read_mostly =
{
.name = "nop",
.init = nop_trace_init,
.reset = nop_trace_reset,
- .ctrl_update = nop_trace_ctrl_update,
#ifdef CONFIG_FTRACE_SELFTEST
.selftest = trace_selftest_startup_nop,
#endif
+ .flags = &nop_flags,
+ .set_flag = nop_set_flag
};
diff --git a/kernel/trace/trace_sched_switch.c b/kernel/trace/trace_sched_switch.c
index b8f56be..8633905 100644
--- a/kernel/trace/trace_sched_switch.c
+++ b/kernel/trace/trace_sched_switch.c
@@ -16,7 +16,8 @@
static struct trace_array *ctx_trace;
static int __read_mostly tracer_enabled;
-static atomic_t sched_ref;
+static int sched_ref;
+static DEFINE_MUTEX(sched_register_mutex);
static void
probe_sched_switch(struct rq *__rq, struct task_struct *prev,
@@ -27,7 +28,7 @@
int cpu;
int pc;
- if (!atomic_read(&sched_ref))
+ if (!sched_ref)
return;
tracing_record_cmdline(prev);
@@ -123,20 +124,18 @@
static void tracing_start_sched_switch(void)
{
- long ref;
-
- ref = atomic_inc_return(&sched_ref);
- if (ref == 1)
+ mutex_lock(&sched_register_mutex);
+ if (!(sched_ref++))
tracing_sched_register();
+ mutex_unlock(&sched_register_mutex);
}
static void tracing_stop_sched_switch(void)
{
- long ref;
-
- ref = atomic_dec_and_test(&sched_ref);
- if (ref)
+ mutex_lock(&sched_register_mutex);
+ if (!(--sched_ref))
tracing_sched_unregister();
+ mutex_unlock(&sched_register_mutex);
}
void tracing_start_cmdline_record(void)
@@ -149,40 +148,86 @@
tracing_stop_sched_switch();
}
+/**
+ * tracing_start_sched_switch_record - start tracing context switches
+ *
+ * Turns on context switch tracing for a tracer.
+ */
+void tracing_start_sched_switch_record(void)
+{
+ if (unlikely(!ctx_trace)) {
+ WARN_ON(1);
+ return;
+ }
+
+ tracing_start_sched_switch();
+
+ mutex_lock(&sched_register_mutex);
+ tracer_enabled++;
+ mutex_unlock(&sched_register_mutex);
+}
+
+/**
+ * tracing_stop_sched_switch_record - start tracing context switches
+ *
+ * Turns off context switch tracing for a tracer.
+ */
+void tracing_stop_sched_switch_record(void)
+{
+ mutex_lock(&sched_register_mutex);
+ tracer_enabled--;
+ WARN_ON(tracer_enabled < 0);
+ mutex_unlock(&sched_register_mutex);
+
+ tracing_stop_sched_switch();
+}
+
+/**
+ * tracing_sched_switch_assign_trace - assign a trace array for ctx switch
+ * @tr: trace array pointer to assign
+ *
+ * Some tracers might want to record the context switches in their
+ * trace. This function lets those tracers assign the trace array
+ * to use.
+ */
+void tracing_sched_switch_assign_trace(struct trace_array *tr)
+{
+ ctx_trace = tr;
+}
+
static void start_sched_trace(struct trace_array *tr)
{
sched_switch_reset(tr);
- tracing_start_cmdline_record();
- tracer_enabled = 1;
+ tracing_start_sched_switch_record();
}
static void stop_sched_trace(struct trace_array *tr)
{
- tracer_enabled = 0;
- tracing_stop_cmdline_record();
+ tracing_stop_sched_switch_record();
}
-static void sched_switch_trace_init(struct trace_array *tr)
+static int sched_switch_trace_init(struct trace_array *tr)
{
ctx_trace = tr;
-
- if (tr->ctrl)
- start_sched_trace(tr);
+ start_sched_trace(tr);
+ return 0;
}
static void sched_switch_trace_reset(struct trace_array *tr)
{
- if (tr->ctrl)
+ if (sched_ref)
stop_sched_trace(tr);
}
-static void sched_switch_trace_ctrl_update(struct trace_array *tr)
+static void sched_switch_trace_start(struct trace_array *tr)
{
- /* When starting a new trace, reset the buffers */
- if (tr->ctrl)
- start_sched_trace(tr);
- else
- stop_sched_trace(tr);
+ sched_switch_reset(tr);
+ tracing_start_sched_switch();
+}
+
+static void sched_switch_trace_stop(struct trace_array *tr)
+{
+ tracing_stop_sched_switch();
}
static struct tracer sched_switch_trace __read_mostly =
@@ -190,7 +235,8 @@
.name = "sched_switch",
.init = sched_switch_trace_init,
.reset = sched_switch_trace_reset,
- .ctrl_update = sched_switch_trace_ctrl_update,
+ .start = sched_switch_trace_start,
+ .stop = sched_switch_trace_stop,
#ifdef CONFIG_FTRACE_SELFTEST
.selftest = trace_selftest_startup_sched_switch,
#endif
@@ -198,14 +244,6 @@
__init static int init_sched_switch_trace(void)
{
- int ret = 0;
-
- if (atomic_read(&sched_ref))
- ret = tracing_sched_register();
- if (ret) {
- pr_info("error registering scheduler trace\n");
- return ret;
- }
return register_tracer(&sched_switch_trace);
}
device_initcall(init_sched_switch_trace);
diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index 3ae93f1..0067b49 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -50,8 +50,7 @@
return;
pc = preempt_count();
- resched = need_resched();
- preempt_disable_notrace();
+ resched = ftrace_preempt_disable();
cpu = raw_smp_processor_id();
data = tr->data[cpu];
@@ -81,15 +80,7 @@
out:
atomic_dec(&data->disabled);
- /*
- * To prevent recursion from the scheduler, if the
- * resched flag was set before we entered, then
- * don't reschedule.
- */
- if (resched)
- preempt_enable_no_resched_notrace();
- else
- preempt_enable_notrace();
+ ftrace_preempt_enable(resched);
}
static struct ftrace_ops trace_ops __read_mostly =
@@ -271,6 +262,12 @@
atomic_dec(&wakeup_trace->data[cpu]->disabled);
}
+/*
+ * save_tracer_enabled is used to save the state of the tracer_enabled
+ * variable when we disable it when we open a trace output file.
+ */
+static int save_tracer_enabled;
+
static void start_wakeup_tracer(struct trace_array *tr)
{
int ret;
@@ -309,7 +306,13 @@
register_ftrace_function(&trace_ops);
- tracer_enabled = 1;
+ if (tracing_is_enabled()) {
+ tracer_enabled = 1;
+ save_tracer_enabled = 1;
+ } else {
+ tracer_enabled = 0;
+ save_tracer_enabled = 0;
+ }
return;
fail_deprobe_wake_new:
@@ -321,49 +324,53 @@
static void stop_wakeup_tracer(struct trace_array *tr)
{
tracer_enabled = 0;
+ save_tracer_enabled = 0;
unregister_ftrace_function(&trace_ops);
unregister_trace_sched_switch(probe_wakeup_sched_switch);
unregister_trace_sched_wakeup_new(probe_wakeup);
unregister_trace_sched_wakeup(probe_wakeup);
}
-static void wakeup_tracer_init(struct trace_array *tr)
+static int wakeup_tracer_init(struct trace_array *tr)
{
wakeup_trace = tr;
-
- if (tr->ctrl)
- start_wakeup_tracer(tr);
+ start_wakeup_tracer(tr);
+ return 0;
}
static void wakeup_tracer_reset(struct trace_array *tr)
{
- if (tr->ctrl) {
- stop_wakeup_tracer(tr);
- /* make sure we put back any tasks we are tracing */
- wakeup_reset(tr);
- }
+ stop_wakeup_tracer(tr);
+ /* make sure we put back any tasks we are tracing */
+ wakeup_reset(tr);
}
-static void wakeup_tracer_ctrl_update(struct trace_array *tr)
+static void wakeup_tracer_start(struct trace_array *tr)
{
- if (tr->ctrl)
- start_wakeup_tracer(tr);
- else
- stop_wakeup_tracer(tr);
+ wakeup_reset(tr);
+ tracer_enabled = 1;
+ save_tracer_enabled = 1;
+}
+
+static void wakeup_tracer_stop(struct trace_array *tr)
+{
+ tracer_enabled = 0;
+ save_tracer_enabled = 0;
}
static void wakeup_tracer_open(struct trace_iterator *iter)
{
/* stop the trace while dumping */
- if (iter->tr->ctrl)
- stop_wakeup_tracer(iter->tr);
+ tracer_enabled = 0;
}
static void wakeup_tracer_close(struct trace_iterator *iter)
{
/* forget about any processes we were recording */
- if (iter->tr->ctrl)
- start_wakeup_tracer(iter->tr);
+ if (save_tracer_enabled) {
+ wakeup_reset(iter->tr);
+ tracer_enabled = 1;
+ }
}
static struct tracer wakeup_tracer __read_mostly =
@@ -371,9 +378,10 @@
.name = "wakeup",
.init = wakeup_tracer_init,
.reset = wakeup_tracer_reset,
+ .start = wakeup_tracer_start,
+ .stop = wakeup_tracer_stop,
.open = wakeup_tracer_open,
.close = wakeup_tracer_close,
- .ctrl_update = wakeup_tracer_ctrl_update,
.print_max = 1,
#ifdef CONFIG_FTRACE_SELFTEST
.selftest = trace_selftest_startup_wakeup,
diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index 90bc752..88c8eb7 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -13,6 +13,7 @@
case TRACE_STACK:
case TRACE_PRINT:
case TRACE_SPECIAL:
+ case TRACE_BRANCH:
return 1;
}
return 0;
@@ -51,7 +52,7 @@
int cpu, ret = 0;
/* Don't allow flipping of max traces now */
- raw_local_irq_save(flags);
+ local_irq_save(flags);
__raw_spin_lock(&ftrace_max_lock);
cnt = ring_buffer_entries(tr->buffer);
@@ -62,7 +63,7 @@
break;
}
__raw_spin_unlock(&ftrace_max_lock);
- raw_local_irq_restore(flags);
+ local_irq_restore(flags);
if (count)
*count = cnt;
@@ -70,6 +71,11 @@
return ret;
}
+static inline void warn_failed_init_tracer(struct tracer *trace, int init_ret)
+{
+ printk(KERN_WARNING "Failed to init %s tracer, init returned %d\n",
+ trace->name, init_ret);
+}
#ifdef CONFIG_FUNCTION_TRACER
#ifdef CONFIG_DYNAMIC_FTRACE
@@ -110,8 +116,11 @@
ftrace_set_filter(func_name, strlen(func_name), 1);
/* enable tracing */
- tr->ctrl = 1;
- trace->init(tr);
+ ret = trace->init(tr);
+ if (ret) {
+ warn_failed_init_tracer(trace, ret);
+ goto out;
+ }
/* Sleep for a 1/10 of a second */
msleep(100);
@@ -134,13 +143,13 @@
msleep(100);
/* stop the tracing. */
- tr->ctrl = 0;
- trace->ctrl_update(tr);
+ tracing_stop();
ftrace_enabled = 0;
/* check the trace buffer */
ret = trace_test_buffer(tr, &count);
trace->reset(tr);
+ tracing_start();
/* we should only have one item */
if (!ret && count != 1) {
@@ -148,6 +157,7 @@
ret = -1;
goto out;
}
+
out:
ftrace_enabled = save_ftrace_enabled;
tracer_enabled = save_tracer_enabled;
@@ -180,18 +190,22 @@
ftrace_enabled = 1;
tracer_enabled = 1;
- tr->ctrl = 1;
- trace->init(tr);
+ ret = trace->init(tr);
+ if (ret) {
+ warn_failed_init_tracer(trace, ret);
+ goto out;
+ }
+
/* Sleep for a 1/10 of a second */
msleep(100);
/* stop the tracing. */
- tr->ctrl = 0;
- trace->ctrl_update(tr);
+ tracing_stop();
ftrace_enabled = 0;
/* check the trace buffer */
ret = trace_test_buffer(tr, &count);
trace->reset(tr);
+ tracing_start();
if (!ret && !count) {
printk(KERN_CONT ".. no entries found ..");
@@ -223,8 +237,12 @@
int ret;
/* start the tracing */
- tr->ctrl = 1;
- trace->init(tr);
+ ret = trace->init(tr);
+ if (ret) {
+ warn_failed_init_tracer(trace, ret);
+ return ret;
+ }
+
/* reset the max latency */
tracing_max_latency = 0;
/* disable interrupts for a bit */
@@ -232,13 +250,13 @@
udelay(100);
local_irq_enable();
/* stop the tracing. */
- tr->ctrl = 0;
- trace->ctrl_update(tr);
+ tracing_stop();
/* check both trace buffers */
ret = trace_test_buffer(tr, NULL);
if (!ret)
ret = trace_test_buffer(&max_tr, &count);
trace->reset(tr);
+ tracing_start();
if (!ret && !count) {
printk(KERN_CONT ".. no entries found ..");
@@ -259,9 +277,26 @@
unsigned long count;
int ret;
+ /*
+ * Now that the big kernel lock is no longer preemptable,
+ * and this is called with the BKL held, it will always
+ * fail. If preemption is already disabled, simply
+ * pass the test. When the BKL is removed, or becomes
+ * preemptible again, we will once again test this,
+ * so keep it in.
+ */
+ if (preempt_count()) {
+ printk(KERN_CONT "can not test ... force ");
+ return 0;
+ }
+
/* start the tracing */
- tr->ctrl = 1;
- trace->init(tr);
+ ret = trace->init(tr);
+ if (ret) {
+ warn_failed_init_tracer(trace, ret);
+ return ret;
+ }
+
/* reset the max latency */
tracing_max_latency = 0;
/* disable preemption for a bit */
@@ -269,13 +304,13 @@
udelay(100);
preempt_enable();
/* stop the tracing. */
- tr->ctrl = 0;
- trace->ctrl_update(tr);
+ tracing_stop();
/* check both trace buffers */
ret = trace_test_buffer(tr, NULL);
if (!ret)
ret = trace_test_buffer(&max_tr, &count);
trace->reset(tr);
+ tracing_start();
if (!ret && !count) {
printk(KERN_CONT ".. no entries found ..");
@@ -296,9 +331,25 @@
unsigned long count;
int ret;
+ /*
+ * Now that the big kernel lock is no longer preemptable,
+ * and this is called with the BKL held, it will always
+ * fail. If preemption is already disabled, simply
+ * pass the test. When the BKL is removed, or becomes
+ * preemptible again, we will once again test this,
+ * so keep it in.
+ */
+ if (preempt_count()) {
+ printk(KERN_CONT "can not test ... force ");
+ return 0;
+ }
+
/* start the tracing */
- tr->ctrl = 1;
- trace->init(tr);
+ ret = trace->init(tr);
+ if (ret) {
+ warn_failed_init_tracer(trace, ret);
+ goto out;
+ }
/* reset the max latency */
tracing_max_latency = 0;
@@ -312,27 +363,30 @@
local_irq_enable();
/* stop the tracing. */
- tr->ctrl = 0;
- trace->ctrl_update(tr);
+ tracing_stop();
/* check both trace buffers */
ret = trace_test_buffer(tr, NULL);
- if (ret)
+ if (ret) {
+ tracing_start();
goto out;
+ }
ret = trace_test_buffer(&max_tr, &count);
- if (ret)
+ if (ret) {
+ tracing_start();
goto out;
+ }
if (!ret && !count) {
printk(KERN_CONT ".. no entries found ..");
ret = -1;
+ tracing_start();
goto out;
}
/* do the test by disabling interrupts first this time */
tracing_max_latency = 0;
- tr->ctrl = 1;
- trace->ctrl_update(tr);
+ tracing_start();
preempt_disable();
local_irq_disable();
udelay(100);
@@ -341,8 +395,7 @@
local_irq_enable();
/* stop the tracing. */
- tr->ctrl = 0;
- trace->ctrl_update(tr);
+ tracing_stop();
/* check both trace buffers */
ret = trace_test_buffer(tr, NULL);
if (ret)
@@ -358,6 +411,7 @@
out:
trace->reset(tr);
+ tracing_start();
tracing_max_latency = save_max;
return ret;
@@ -423,8 +477,12 @@
wait_for_completion(&isrt);
/* start the tracing */
- tr->ctrl = 1;
- trace->init(tr);
+ ret = trace->init(tr);
+ if (ret) {
+ warn_failed_init_tracer(trace, ret);
+ return ret;
+ }
+
/* reset the max latency */
tracing_max_latency = 0;
@@ -448,8 +506,7 @@
msleep(100);
/* stop the tracing. */
- tr->ctrl = 0;
- trace->ctrl_update(tr);
+ tracing_stop();
/* check both trace buffers */
ret = trace_test_buffer(tr, NULL);
if (!ret)
@@ -457,6 +514,7 @@
trace->reset(tr);
+ tracing_start();
tracing_max_latency = save_max;
@@ -480,16 +538,20 @@
int ret;
/* start the tracing */
- tr->ctrl = 1;
- trace->init(tr);
+ ret = trace->init(tr);
+ if (ret) {
+ warn_failed_init_tracer(trace, ret);
+ return ret;
+ }
+
/* Sleep for a 1/10 of a second */
msleep(100);
/* stop the tracing. */
- tr->ctrl = 0;
- trace->ctrl_update(tr);
+ tracing_stop();
/* check the trace buffer */
ret = trace_test_buffer(tr, &count);
trace->reset(tr);
+ tracing_start();
if (!ret && !count) {
printk(KERN_CONT ".. no entries found ..");
@@ -508,17 +570,48 @@
int ret;
/* start the tracing */
- tr->ctrl = 1;
- trace->init(tr);
+ ret = trace->init(tr);
+ if (ret) {
+ warn_failed_init_tracer(trace, ret);
+ return 0;
+ }
+
/* Sleep for a 1/10 of a second */
msleep(100);
/* stop the tracing. */
- tr->ctrl = 0;
- trace->ctrl_update(tr);
+ tracing_stop();
/* check the trace buffer */
ret = trace_test_buffer(tr, &count);
trace->reset(tr);
+ tracing_start();
return ret;
}
#endif /* CONFIG_SYSPROF_TRACER */
+
+#ifdef CONFIG_BRANCH_TRACER
+int
+trace_selftest_startup_branch(struct tracer *trace, struct trace_array *tr)
+{
+ unsigned long count;
+ int ret;
+
+ /* start the tracing */
+ ret = trace->init(tr);
+ if (ret) {
+ warn_failed_init_tracer(trace, ret);
+ return ret;
+ }
+
+ /* Sleep for a 1/10 of a second */
+ msleep(100);
+ /* stop the tracing. */
+ tracing_stop();
+ /* check the trace buffer */
+ ret = trace_test_buffer(tr, &count);
+ trace->reset(tr);
+ tracing_start();
+
+ return ret;
+}
+#endif /* CONFIG_BRANCH_TRACER */
diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index 3bdb44b..fde3be1 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -107,8 +107,7 @@
if (unlikely(!ftrace_enabled || stack_trace_disabled))
return;
- resched = need_resched();
- preempt_disable_notrace();
+ resched = ftrace_preempt_disable();
cpu = raw_smp_processor_id();
/* no atomic needed, we only modify this variable by this cpu */
@@ -120,10 +119,7 @@
out:
per_cpu(trace_active, cpu)--;
/* prevent recursion in schedule */
- if (resched)
- preempt_enable_no_resched_notrace();
- else
- preempt_enable_notrace();
+ ftrace_preempt_enable(resched);
}
static struct ftrace_ops trace_ops __read_mostly =
diff --git a/kernel/trace/trace_sysprof.c b/kernel/trace/trace_sysprof.c
index 9587d3b..54960ed 100644
--- a/kernel/trace/trace_sysprof.c
+++ b/kernel/trace/trace_sysprof.c
@@ -261,27 +261,17 @@
mutex_unlock(&sample_timer_lock);
}
-static void stack_trace_init(struct trace_array *tr)
+static int stack_trace_init(struct trace_array *tr)
{
sysprof_trace = tr;
- if (tr->ctrl)
- start_stack_trace(tr);
+ start_stack_trace(tr);
+ return 0;
}
static void stack_trace_reset(struct trace_array *tr)
{
- if (tr->ctrl)
- stop_stack_trace(tr);
-}
-
-static void stack_trace_ctrl_update(struct trace_array *tr)
-{
- /* When starting a new trace, reset the buffers */
- if (tr->ctrl)
- start_stack_trace(tr);
- else
- stop_stack_trace(tr);
+ stop_stack_trace(tr);
}
static struct tracer stack_trace __read_mostly =
@@ -289,7 +279,6 @@
.name = "sysprof",
.init = stack_trace_init,
.reset = stack_trace_reset,
- .ctrl_update = stack_trace_ctrl_update,
#ifdef CONFIG_FTRACE_SELFTEST
.selftest = trace_selftest_startup_sysprof,
#endif
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index af8c856..7960274 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -43,6 +43,7 @@
*/
#define TRACEPOINT_HASH_BITS 6
#define TRACEPOINT_TABLE_SIZE (1 << TRACEPOINT_HASH_BITS)
+static struct hlist_head tracepoint_table[TRACEPOINT_TABLE_SIZE];
/*
* Note about RCU :
@@ -54,40 +55,43 @@
struct hlist_node hlist;
void **funcs;
int refcount; /* Number of times armed. 0 if disarmed. */
- struct rcu_head rcu;
- void *oldptr;
- unsigned char rcu_pending:1;
char name[0];
};
-static struct hlist_head tracepoint_table[TRACEPOINT_TABLE_SIZE];
+struct tp_probes {
+ union {
+ struct rcu_head rcu;
+ struct list_head list;
+ } u;
+ void *probes[0];
+};
-static void free_old_closure(struct rcu_head *head)
+static inline void *allocate_probes(int count)
{
- struct tracepoint_entry *entry = container_of(head,
- struct tracepoint_entry, rcu);
- kfree(entry->oldptr);
- /* Make sure we free the data before setting the pending flag to 0 */
- smp_wmb();
- entry->rcu_pending = 0;
+ struct tp_probes *p = kmalloc(count * sizeof(void *)
+ + sizeof(struct tp_probes), GFP_KERNEL);
+ return p == NULL ? NULL : p->probes;
}
-static void tracepoint_entry_free_old(struct tracepoint_entry *entry, void *old)
+static void rcu_free_old_probes(struct rcu_head *head)
{
- if (!old)
- return;
- entry->oldptr = old;
- entry->rcu_pending = 1;
- /* write rcu_pending before calling the RCU callback */
- smp_wmb();
- call_rcu_sched(&entry->rcu, free_old_closure);
+ kfree(container_of(head, struct tp_probes, u.rcu));
+}
+
+static inline void release_probes(void *old)
+{
+ if (old) {
+ struct tp_probes *tp_probes = container_of(old,
+ struct tp_probes, probes[0]);
+ call_rcu_sched(&tp_probes->u.rcu, rcu_free_old_probes);
+ }
}
static void debug_print_probes(struct tracepoint_entry *entry)
{
int i;
- if (!tracepoint_debug)
+ if (!tracepoint_debug || !entry->funcs)
return;
for (i = 0; entry->funcs[i]; i++)
@@ -111,12 +115,13 @@
return ERR_PTR(-EEXIST);
}
/* + 2 : one for new probe, one for NULL func */
- new = kzalloc((nr_probes + 2) * sizeof(void *), GFP_KERNEL);
+ new = allocate_probes(nr_probes + 2);
if (new == NULL)
return ERR_PTR(-ENOMEM);
if (old)
memcpy(new, old, nr_probes * sizeof(void *));
new[nr_probes] = probe;
+ new[nr_probes + 1] = NULL;
entry->refcount = nr_probes + 1;
entry->funcs = new;
debug_print_probes(entry);
@@ -132,7 +137,7 @@
old = entry->funcs;
if (!old)
- return NULL;
+ return ERR_PTR(-ENOENT);
debug_print_probes(entry);
/* (N -> M), (N > 1, M >= 0) probes */
@@ -151,13 +156,13 @@
int j = 0;
/* N -> M, (N > 1, M > 0) */
/* + 1 for NULL */
- new = kzalloc((nr_probes - nr_del + 1)
- * sizeof(void *), GFP_KERNEL);
+ new = allocate_probes(nr_probes - nr_del + 1);
if (new == NULL)
return ERR_PTR(-ENOMEM);
for (i = 0; old[i]; i++)
if ((probe && old[i] != probe))
new[j++] = old[i];
+ new[nr_probes - nr_del] = NULL;
entry->refcount = nr_probes - nr_del;
entry->funcs = new;
}
@@ -215,7 +220,6 @@
memcpy(&e->name[0], name, name_len);
e->funcs = NULL;
e->refcount = 0;
- e->rcu_pending = 0;
hlist_add_head(&e->hlist, head);
return e;
}
@@ -224,32 +228,10 @@
* Remove the tracepoint from the tracepoint hash table. Must be called with
* mutex_lock held.
*/
-static int remove_tracepoint(const char *name)
+static inline void remove_tracepoint(struct tracepoint_entry *e)
{
- struct hlist_head *head;
- struct hlist_node *node;
- struct tracepoint_entry *e;
- int found = 0;
- size_t len = strlen(name) + 1;
- u32 hash = jhash(name, len-1, 0);
-
- head = &tracepoint_table[hash & (TRACEPOINT_TABLE_SIZE - 1)];
- hlist_for_each_entry(e, node, head, hlist) {
- if (!strcmp(name, e->name)) {
- found = 1;
- break;
- }
- }
- if (!found)
- return -ENOENT;
- if (e->refcount)
- return -EBUSY;
hlist_del(&e->hlist);
- /* Make sure the call_rcu_sched has been executed */
- if (e->rcu_pending)
- rcu_barrier_sched();
kfree(e);
- return 0;
}
/*
@@ -280,6 +262,7 @@
static void disable_tracepoint(struct tracepoint *elem)
{
elem->state = 0;
+ rcu_assign_pointer(elem->funcs, NULL);
}
/**
@@ -320,6 +303,23 @@
module_update_tracepoints();
}
+static void *tracepoint_add_probe(const char *name, void *probe)
+{
+ struct tracepoint_entry *entry;
+ void *old;
+
+ entry = get_tracepoint(name);
+ if (!entry) {
+ entry = add_tracepoint(name);
+ if (IS_ERR(entry))
+ return entry;
+ }
+ old = tracepoint_entry_add_probe(entry, probe);
+ if (IS_ERR(old) && !entry->refcount)
+ remove_tracepoint(entry);
+ return old;
+}
+
/**
* tracepoint_probe_register - Connect a probe to a tracepoint
* @name: tracepoint name
@@ -330,44 +330,36 @@
*/
int tracepoint_probe_register(const char *name, void *probe)
{
- struct tracepoint_entry *entry;
- int ret = 0;
void *old;
mutex_lock(&tracepoints_mutex);
- entry = get_tracepoint(name);
- if (!entry) {
- entry = add_tracepoint(name);
- if (IS_ERR(entry)) {
- ret = PTR_ERR(entry);
- goto end;
- }
- }
- /*
- * If we detect that a call_rcu_sched is pending for this tracepoint,
- * make sure it's executed now.
- */
- if (entry->rcu_pending)
- rcu_barrier_sched();
- old = tracepoint_entry_add_probe(entry, probe);
- if (IS_ERR(old)) {
- ret = PTR_ERR(old);
- goto end;
- }
+ old = tracepoint_add_probe(name, probe);
mutex_unlock(&tracepoints_mutex);
+ if (IS_ERR(old))
+ return PTR_ERR(old);
+
tracepoint_update_probes(); /* may update entry */
- mutex_lock(&tracepoints_mutex);
- entry = get_tracepoint(name);
- WARN_ON(!entry);
- if (entry->rcu_pending)
- rcu_barrier_sched();
- tracepoint_entry_free_old(entry, old);
-end:
- mutex_unlock(&tracepoints_mutex);
- return ret;
+ release_probes(old);
+ return 0;
}
EXPORT_SYMBOL_GPL(tracepoint_probe_register);
+static void *tracepoint_remove_probe(const char *name, void *probe)
+{
+ struct tracepoint_entry *entry;
+ void *old;
+
+ entry = get_tracepoint(name);
+ if (!entry)
+ return ERR_PTR(-ENOENT);
+ old = tracepoint_entry_remove_probe(entry, probe);
+ if (IS_ERR(old))
+ return old;
+ if (!entry->refcount)
+ remove_tracepoint(entry);
+ return old;
+}
+
/**
* tracepoint_probe_unregister - Disconnect a probe from a tracepoint
* @name: tracepoint name
@@ -380,39 +372,105 @@
*/
int tracepoint_probe_unregister(const char *name, void *probe)
{
- struct tracepoint_entry *entry;
void *old;
- int ret = -ENOENT;
mutex_lock(&tracepoints_mutex);
- entry = get_tracepoint(name);
- if (!entry)
- goto end;
- if (entry->rcu_pending)
- rcu_barrier_sched();
- old = tracepoint_entry_remove_probe(entry, probe);
- if (!old) {
- printk(KERN_WARNING "Warning: Trying to unregister a probe"
- "that doesn't exist\n");
- goto end;
- }
+ old = tracepoint_remove_probe(name, probe);
mutex_unlock(&tracepoints_mutex);
+ if (IS_ERR(old))
+ return PTR_ERR(old);
+
tracepoint_update_probes(); /* may update entry */
- mutex_lock(&tracepoints_mutex);
- entry = get_tracepoint(name);
- if (!entry)
- goto end;
- if (entry->rcu_pending)
- rcu_barrier_sched();
- tracepoint_entry_free_old(entry, old);
- remove_tracepoint(name); /* Ignore busy error message */
- ret = 0;
-end:
- mutex_unlock(&tracepoints_mutex);
- return ret;
+ release_probes(old);
+ return 0;
}
EXPORT_SYMBOL_GPL(tracepoint_probe_unregister);
+static LIST_HEAD(old_probes);
+static int need_update;
+
+static void tracepoint_add_old_probes(void *old)
+{
+ need_update = 1;
+ if (old) {
+ struct tp_probes *tp_probes = container_of(old,
+ struct tp_probes, probes[0]);
+ list_add(&tp_probes->u.list, &old_probes);
+ }
+}
+
+/**
+ * tracepoint_probe_register_noupdate - register a probe but not connect
+ * @name: tracepoint name
+ * @probe: probe handler
+ *
+ * caller must call tracepoint_probe_update_all()
+ */
+int tracepoint_probe_register_noupdate(const char *name, void *probe)
+{
+ void *old;
+
+ mutex_lock(&tracepoints_mutex);
+ old = tracepoint_add_probe(name, probe);
+ if (IS_ERR(old)) {
+ mutex_unlock(&tracepoints_mutex);
+ return PTR_ERR(old);
+ }
+ tracepoint_add_old_probes(old);
+ mutex_unlock(&tracepoints_mutex);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(tracepoint_probe_register_noupdate);
+
+/**
+ * tracepoint_probe_unregister_noupdate - remove a probe but not disconnect
+ * @name: tracepoint name
+ * @probe: probe function pointer
+ *
+ * caller must call tracepoint_probe_update_all()
+ */
+int tracepoint_probe_unregister_noupdate(const char *name, void *probe)
+{
+ void *old;
+
+ mutex_lock(&tracepoints_mutex);
+ old = tracepoint_remove_probe(name, probe);
+ if (IS_ERR(old)) {
+ mutex_unlock(&tracepoints_mutex);
+ return PTR_ERR(old);
+ }
+ tracepoint_add_old_probes(old);
+ mutex_unlock(&tracepoints_mutex);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(tracepoint_probe_unregister_noupdate);
+
+/**
+ * tracepoint_probe_update_all - update tracepoints
+ */
+void tracepoint_probe_update_all(void)
+{
+ LIST_HEAD(release_probes);
+ struct tp_probes *pos, *next;
+
+ mutex_lock(&tracepoints_mutex);
+ if (!need_update) {
+ mutex_unlock(&tracepoints_mutex);
+ return;
+ }
+ if (!list_empty(&old_probes))
+ list_replace_init(&old_probes, &release_probes);
+ need_update = 0;
+ mutex_unlock(&tracepoints_mutex);
+
+ tracepoint_update_probes();
+ list_for_each_entry_safe(pos, next, &release_probes, u.list) {
+ list_del(&pos->u.list);
+ call_rcu_sched(&pos->u.rcu, rcu_free_old_probes);
+ }
+}
+EXPORT_SYMBOL_GPL(tracepoint_probe_update_all);
+
/**
* tracepoint_get_iter_range - Get a next tracepoint iterator given a range.
* @tracepoint: current tracepoints (in), next tracepoint (out)
@@ -483,3 +541,36 @@
iter->tracepoint = NULL;
}
EXPORT_SYMBOL_GPL(tracepoint_iter_reset);
+
+#ifdef CONFIG_MODULES
+
+int tracepoint_module_notify(struct notifier_block *self,
+ unsigned long val, void *data)
+{
+ struct module *mod = data;
+
+ switch (val) {
+ case MODULE_STATE_COMING:
+ tracepoint_update_probe_range(mod->tracepoints,
+ mod->tracepoints + mod->num_tracepoints);
+ break;
+ case MODULE_STATE_GOING:
+ tracepoint_update_probe_range(mod->tracepoints,
+ mod->tracepoints + mod->num_tracepoints);
+ break;
+ }
+ return 0;
+}
+
+struct notifier_block tracepoint_module_nb = {
+ .notifier_call = tracepoint_module_notify,
+ .priority = 0,
+};
+
+static int init_tracepoints(void)
+{
+ return register_module_notifier(&tracepoint_module_nb);
+}
+__initcall(init_tracepoints);
+
+#endif /* CONFIG_MODULES */
diff --git a/samples/tracepoints/tp-samples-trace.h b/samples/tracepoints/tp-samples-trace.h
index 0216b55..01724e0 100644
--- a/samples/tracepoints/tp-samples-trace.h
+++ b/samples/tracepoints/tp-samples-trace.h
@@ -4,10 +4,10 @@
#include <linux/proc_fs.h> /* for struct inode and struct file */
#include <linux/tracepoint.h>
-DEFINE_TRACE(subsys_event,
+DECLARE_TRACE(subsys_event,
TPPROTO(struct inode *inode, struct file *file),
TPARGS(inode, file));
-DEFINE_TRACE(subsys_eventb,
+DECLARE_TRACE(subsys_eventb,
TPPROTO(void),
TPARGS());
#endif
diff --git a/samples/tracepoints/tracepoint-probe-sample.c b/samples/tracepoints/tracepoint-probe-sample.c
index 55abfdd..e3a9648 100644
--- a/samples/tracepoints/tracepoint-probe-sample.c
+++ b/samples/tracepoints/tracepoint-probe-sample.c
@@ -46,6 +46,7 @@
{
unregister_trace_subsys_eventb(probe_subsys_eventb);
unregister_trace_subsys_event(probe_subsys_event);
+ tracepoint_synchronize_unregister();
}
module_exit(tp_sample_trace_exit);
diff --git a/samples/tracepoints/tracepoint-probe-sample2.c b/samples/tracepoints/tracepoint-probe-sample2.c
index 5e9fcf4..685a5ac 100644
--- a/samples/tracepoints/tracepoint-probe-sample2.c
+++ b/samples/tracepoints/tracepoint-probe-sample2.c
@@ -33,6 +33,7 @@
void __exit tp_sample_trace_exit(void)
{
unregister_trace_subsys_event(probe_subsys_event);
+ tracepoint_synchronize_unregister();
}
module_exit(tp_sample_trace_exit);
diff --git a/samples/tracepoints/tracepoint-sample.c b/samples/tracepoints/tracepoint-sample.c
index 4ae4b7f..00d1697 100644
--- a/samples/tracepoints/tracepoint-sample.c
+++ b/samples/tracepoints/tracepoint-sample.c
@@ -13,6 +13,9 @@
#include <linux/proc_fs.h>
#include "tp-samples-trace.h"
+DEFINE_TRACE(subsys_event);
+DEFINE_TRACE(subsys_eventb);
+
struct proc_dir_entry *pentry_example;
static int my_open(struct inode *inode, struct file *file)
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 468fbc9..7a17677 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -198,16 +198,10 @@
fi;
endif
-ifdef CONFIG_64BIT
-arch_bits = 64
-else
-arch_bits = 32
-endif
-
ifdef CONFIG_FTRACE_MCOUNT_RECORD
-cmd_record_mcount = perl $(srctree)/scripts/recordmcount.pl \
- "$(ARCH)" "$(arch_bits)" "$(OBJDUMP)" "$(OBJCOPY)" "$(CC)" "$(LD)" \
- "$(NM)" "$(RM)" "$(MV)" "$(@)";
+cmd_record_mcount = perl $(srctree)/scripts/recordmcount.pl "$(ARCH)" \
+ "$(if $(CONFIG_64BIT),64,32)" \
+ "$(OBJDUMP)" "$(OBJCOPY)" "$(CC)" "$(LD)" "$(NM)" "$(RM)" "$(MV)" "$(@)";
endif
define rule_cc_o_c
diff --git a/scripts/bootgraph.pl b/scripts/bootgraph.pl
index d2c61ef..f0af9aa 100644
--- a/scripts/bootgraph.pl
+++ b/scripts/bootgraph.pl
@@ -78,11 +78,13 @@
}
if ($count == 0) {
- print "No data found in the dmesg. Make sure that 'printk.time=1' and\n";
- print "'initcall_debug' are passed on the kernel command line.\n\n";
- print "Usage: \n";
- print " dmesg | perl scripts/bootgraph.pl > output.svg\n\n";
- exit;
+ print STDERR <<END;
+No data found in the dmesg. Make sure that 'printk.time=1' and
+'initcall_debug' are passed on the kernel command line.
+Usage:
+ dmesg | perl scripts/bootgraph.pl > output.svg
+END
+ exit 1;
}
print "<?xml version=\"1.0\" standalone=\"no\"?> \n";
@@ -109,8 +111,8 @@
my %rows;
my $rowscount = 1;
my @initcalls = sort { $start{$a} <=> $start{$b} } keys(%start);
-my $key;
-foreach $key (@initcalls) {
+
+foreach my $key (@initcalls) {
my $duration = $end{$key} - $start{$key};
if ($duration >= $threshold) {
diff --git a/scripts/recordmcount.pl b/scripts/recordmcount.pl
index 6b9fe3e..0197e2f 100755
--- a/scripts/recordmcount.pl
+++ b/scripts/recordmcount.pl
@@ -130,10 +130,13 @@
my %convert; # List of local functions used that needs conversion
my $type;
+my $nm_regex; # Find the local functions (return function)
my $section_regex; # Find the start of a section
my $function_regex; # Find the name of a function
# (return offset and func name)
my $mcount_regex; # Find the call site to mcount (return offset)
+my $alignment; # The .align value to use for $mcount_section
+my $section_type; # Section header plus possible alignment command
if ($arch eq "x86") {
if ($bits == 64) {
@@ -143,11 +146,21 @@
}
}
+#
+# We base the defaults off of i386, the other archs may
+# feel free to change them in the below if statements.
+#
+$nm_regex = "^[0-9a-fA-F]+\\s+t\\s+(\\S+)";
+$section_regex = "Disassembly of section\\s+(\\S+):";
+$function_regex = "^([0-9a-fA-F]+)\\s+<(.*?)>:";
+$mcount_regex = "^\\s*([0-9a-fA-F]+):.*\\smcount\$";
+$section_type = '@progbits';
+$type = ".long";
+
if ($arch eq "x86_64") {
- $section_regex = "Disassembly of section\\s+(\\S+):";
- $function_regex = "^([0-9a-fA-F]+)\\s+<(.*?)>:";
$mcount_regex = "^\\s*([0-9a-fA-F]+):.*\\smcount([+-]0x[0-9a-zA-Z]+)?\$";
$type = ".quad";
+ $alignment = 8;
# force flags for this arch
$ld .= " -m elf_x86_64";
@@ -156,10 +169,7 @@
$cc .= " -m64";
} elsif ($arch eq "i386") {
- $section_regex = "Disassembly of section\\s+(\\S+):";
- $function_regex = "^([0-9a-fA-F]+)\\s+<(.*?)>:";
- $mcount_regex = "^\\s*([0-9a-fA-F]+):.*\\smcount\$";
- $type = ".long";
+ $alignment = 4;
# force flags for this arch
$ld .= " -m elf_i386";
@@ -167,6 +177,27 @@
$objcopy .= " -O elf32-i386";
$cc .= " -m32";
+} elsif ($arch eq "sh") {
+ $alignment = 2;
+
+ # force flags for this arch
+ $ld .= " -m shlelf_linux";
+ $objcopy .= " -O elf32-sh-linux";
+ $cc .= " -m32";
+
+} elsif ($arch eq "powerpc") {
+ $nm_regex = "^[0-9a-fA-F]+\\s+t\\s+(\\.?\\S+)";
+ $function_regex = "^([0-9a-fA-F]+)\\s+<(\\.?.*?)>:";
+ $mcount_regex = "^\\s*([0-9a-fA-F]+):.*\\s\\.?_mcount\$";
+
+ if ($bits == 64) {
+ $type = ".quad";
+ }
+
+} elsif ($arch eq "arm") {
+ $alignment = 2;
+ $section_type = '%progbits';
+
} else {
die "Arch $arch is not supported with CONFIG_FTRACE_MCOUNT_RECORD";
}
@@ -236,7 +267,7 @@
#
open (IN, "$nm $inputfile|") || die "error running $nm";
while (<IN>) {
- if (/^[0-9a-fA-F]+\s+t\s+(\S+)/) {
+ if (/$nm_regex/) {
$locals{$1} = 1;
} elsif (/^[0-9a-fA-F]+\s+([wW])\s+(\S+)/) {
$weak{$2} = $1;
@@ -287,7 +318,8 @@
if (!$opened) {
open(FILE, ">$mcount_s") || die "can't create $mcount_s\n";
$opened = 1;
- print FILE "\t.section $mcount_section,\"a\",\@progbits\n";
+ print FILE "\t.section $mcount_section,\"a\",$section_type\n";
+ print FILE "\t.align $alignment\n" if (defined($alignment));
}
printf FILE "\t%s %s + %d\n", $type, $ref_func, $offsets[$i] - $offset;
}
diff --git a/scripts/tracing/draw_functrace.py b/scripts/tracing/draw_functrace.py
new file mode 100644
index 0000000..902f9a9
--- /dev/null
+++ b/scripts/tracing/draw_functrace.py
@@ -0,0 +1,130 @@
+#!/usr/bin/python
+
+"""
+Copyright 2008 (c) Frederic Weisbecker <fweisbec@gmail.com>
+Licensed under the terms of the GNU GPL License version 2
+
+This script parses a trace provided by the function tracer in
+kernel/trace/trace_functions.c
+The resulted trace is processed into a tree to produce a more human
+view of the call stack by drawing textual but hierarchical tree of
+calls. Only the functions's names and the the call time are provided.
+
+Usage:
+ Be sure that you have CONFIG_FUNCTION_TRACER
+ # mkdir /debugfs
+ # mount -t debug debug /debug
+ # echo function > /debug/tracing/current_tracer
+ $ cat /debug/tracing/trace_pipe > ~/raw_trace_func
+ Wait some times but not too much, the script is a bit slow.
+ Break the pipe (Ctrl + Z)
+ $ scripts/draw_functrace.py < raw_trace_func > draw_functrace
+ Then you have your drawn trace in draw_functrace
+"""
+
+
+import sys, re
+
+class CallTree:
+ """ This class provides a tree representation of the functions
+ call stack. If a function has no parent in the kernel (interrupt,
+ syscall, kernel thread...) then it is attached to a virtual parent
+ called ROOT.
+ """
+ ROOT = None
+
+ def __init__(self, func, time = None, parent = None):
+ self._func = func
+ self._time = time
+ if parent is None:
+ self._parent = CallTree.ROOT
+ else:
+ self._parent = parent
+ self._children = []
+
+ def calls(self, func, calltime):
+ """ If a function calls another one, call this method to insert it
+ into the tree at the appropriate place.
+ @return: A reference to the newly created child node.
+ """
+ child = CallTree(func, calltime, self)
+ self._children.append(child)
+ return child
+
+ def getParent(self, func):
+ """ Retrieve the last parent of the current node that
+ has the name given by func. If this function is not
+ on a parent, then create it as new child of root
+ @return: A reference to the parent.
+ """
+ tree = self
+ while tree != CallTree.ROOT and tree._func != func:
+ tree = tree._parent
+ if tree == CallTree.ROOT:
+ child = CallTree.ROOT.calls(func, None)
+ return child
+ return tree
+
+ def __repr__(self):
+ return self.__toString("", True)
+
+ def __toString(self, branch, lastChild):
+ if self._time is not None:
+ s = "%s----%s (%s)\n" % (branch, self._func, self._time)
+ else:
+ s = "%s----%s\n" % (branch, self._func)
+
+ i = 0
+ if lastChild:
+ branch = branch[:-1] + " "
+ while i < len(self._children):
+ if i != len(self._children) - 1:
+ s += "%s" % self._children[i].__toString(branch +\
+ " |", False)
+ else:
+ s += "%s" % self._children[i].__toString(branch +\
+ " |", True)
+ i += 1
+ return s
+
+class BrokenLineException(Exception):
+ """If the last line is not complete because of the pipe breakage,
+ we want to stop the processing and ignore this line.
+ """
+ pass
+
+class CommentLineException(Exception):
+ """ If the line is a comment (as in the beginning of the trace file),
+ just ignore it.
+ """
+ pass
+
+
+def parseLine(line):
+ line = line.strip()
+ if line.startswith("#"):
+ raise CommentLineException
+ m = re.match("[^]]+?\\] +([0-9.]+): (\\w+) <-(\\w+)", line)
+ if m is None:
+ raise BrokenLineException
+ return (m.group(1), m.group(2), m.group(3))
+
+
+def main():
+ CallTree.ROOT = CallTree("Root (Nowhere)", None, None)
+ tree = CallTree.ROOT
+
+ for line in sys.stdin:
+ try:
+ calltime, callee, caller = parseLine(line)
+ except BrokenLineException:
+ break
+ except CommentLineException:
+ continue
+ tree = tree.getParent(caller)
+ tree = tree.calls(callee, calltime)
+
+ print CallTree.ROOT
+
+if __name__ == "__main__":
+ main()