Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 1 | Event Tracing |
| 2 | |
| 3 | Documentation written by Theodore Ts'o |
Tom Zanussi | 95b6960 | 2009-09-10 23:13:51 -0500 | [diff] [blame] | 4 | Updated by Li Zefan and Tom Zanussi |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 5 | |
Li Zefan | 143c145 | 2009-05-19 14:43:15 +0800 | [diff] [blame] | 6 | 1. Introduction |
| 7 | =============== |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 8 | |
| 9 | Tracepoints (see Documentation/trace/tracepoints.txt) can be used |
| 10 | without creating custom kernel modules to register probe functions |
| 11 | using the event tracing infrastructure. |
| 12 | |
| 13 | Not all tracepoints can be traced using the event tracing system; |
| 14 | the kernel developer must provide code snippets which define how the |
| 15 | tracing information is saved into the tracing buffer, and how the |
Li Zefan | 143c145 | 2009-05-19 14:43:15 +0800 | [diff] [blame] | 16 | tracing information should be printed. |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 17 | |
Li Zefan | 143c145 | 2009-05-19 14:43:15 +0800 | [diff] [blame] | 18 | 2. Using Event Tracing |
| 19 | ====================== |
| 20 | |
| 21 | 2.1 Via the 'set_event' interface |
| 22 | --------------------------------- |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 23 | |
| 24 | The events which are available for tracing can be found in the file |
GeunSik Lim | 52ad51e | 2009-09-07 21:37:17 +0900 | [diff] [blame] | 25 | /sys/kernel/debug/tracing/available_events. |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 26 | |
| 27 | To enable a particular event, such as 'sched_wakeup', simply echo it |
GeunSik Lim | 52ad51e | 2009-09-07 21:37:17 +0900 | [diff] [blame] | 28 | to /sys/kernel/debug/tracing/set_event. For example: |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 29 | |
GeunSik Lim | 52ad51e | 2009-09-07 21:37:17 +0900 | [diff] [blame] | 30 | # echo sched_wakeup >> /sys/kernel/debug/tracing/set_event |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 31 | |
Li Zefan | 143c145 | 2009-05-19 14:43:15 +0800 | [diff] [blame] | 32 | [ Note: '>>' is necessary, otherwise it will firstly disable |
| 33 | all the events. ] |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 34 | |
| 35 | To disable an event, echo the event name to the set_event file prefixed |
| 36 | with an exclamation point: |
| 37 | |
GeunSik Lim | 52ad51e | 2009-09-07 21:37:17 +0900 | [diff] [blame] | 38 | # echo '!sched_wakeup' >> /sys/kernel/debug/tracing/set_event |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 39 | |
Li Zefan | 143c145 | 2009-05-19 14:43:15 +0800 | [diff] [blame] | 40 | To disable all events, echo an empty line to the set_event file: |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 41 | |
GeunSik Lim | 52ad51e | 2009-09-07 21:37:17 +0900 | [diff] [blame] | 42 | # echo > /sys/kernel/debug/tracing/set_event |
Li Zefan | 143c145 | 2009-05-19 14:43:15 +0800 | [diff] [blame] | 43 | |
| 44 | To enable all events, echo '*:*' or '*:' to the set_event file: |
| 45 | |
GeunSik Lim | 52ad51e | 2009-09-07 21:37:17 +0900 | [diff] [blame] | 46 | # echo *:* > /sys/kernel/debug/tracing/set_event |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 47 | |
| 48 | The events are organized into subsystems, such as ext4, irq, sched, |
| 49 | etc., and a full event name looks like this: <subsystem>:<event>. The |
| 50 | subsystem name is optional, but it is displayed in the available_events |
| 51 | file. All of the events in a subsystem can be specified via the syntax |
| 52 | "<subsystem>:*"; for example, to enable all irq events, you can use the |
| 53 | command: |
| 54 | |
GeunSik Lim | 52ad51e | 2009-09-07 21:37:17 +0900 | [diff] [blame] | 55 | # echo 'irq:*' > /sys/kernel/debug/tracing/set_event |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 56 | |
Li Zefan | 143c145 | 2009-05-19 14:43:15 +0800 | [diff] [blame] | 57 | 2.2 Via the 'enable' toggle |
| 58 | --------------------------- |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 59 | |
GeunSik Lim | 52ad51e | 2009-09-07 21:37:17 +0900 | [diff] [blame] | 60 | The events available are also listed in /sys/kernel/debug/tracing/events/ hierarchy |
Li Zefan | 143c145 | 2009-05-19 14:43:15 +0800 | [diff] [blame] | 61 | of directories. |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 62 | |
Li Zefan | 143c145 | 2009-05-19 14:43:15 +0800 | [diff] [blame] | 63 | To enable event 'sched_wakeup': |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 64 | |
GeunSik Lim | 52ad51e | 2009-09-07 21:37:17 +0900 | [diff] [blame] | 65 | # echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 66 | |
Li Zefan | 143c145 | 2009-05-19 14:43:15 +0800 | [diff] [blame] | 67 | To disable it: |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 68 | |
GeunSik Lim | 52ad51e | 2009-09-07 21:37:17 +0900 | [diff] [blame] | 69 | # echo 0 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 70 | |
Li Zefan | 143c145 | 2009-05-19 14:43:15 +0800 | [diff] [blame] | 71 | To enable all events in sched subsystem: |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 72 | |
GeunSik Lim | 52ad51e | 2009-09-07 21:37:17 +0900 | [diff] [blame] | 73 | # echo 1 > /sys/kernel/debug/tracing/events/sched/enable |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 74 | |
Michal Sojka | 6afb1c6 | 2009-09-10 08:02:21 +0200 | [diff] [blame] | 75 | To enable all events: |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 76 | |
GeunSik Lim | 52ad51e | 2009-09-07 21:37:17 +0900 | [diff] [blame] | 77 | # echo 1 > /sys/kernel/debug/tracing/events/enable |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 78 | |
Li Zefan | 143c145 | 2009-05-19 14:43:15 +0800 | [diff] [blame] | 79 | When reading one of these enable files, there are four results: |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 80 | |
Li Zefan | 143c145 | 2009-05-19 14:43:15 +0800 | [diff] [blame] | 81 | 0 - all events this file affects are disabled |
| 82 | 1 - all events this file affects are enabled |
| 83 | X - there is a mixture of events enabled and disabled |
| 84 | ? - this file does not affect any event |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 85 | |
Li Zefan | 020e5f8 | 2009-07-01 10:47:05 +0800 | [diff] [blame] | 86 | 2.3 Boot option |
| 87 | --------------- |
| 88 | |
| 89 | In order to facilitate early boot debugging, use boot option: |
| 90 | |
| 91 | trace_event=[event-list] |
| 92 | |
Li Zefan | 03d646e | 2009-12-21 14:27:24 +0800 | [diff] [blame] | 93 | event-list is a comma separated list of events. See section 2.1 for event |
| 94 | format. |
Li Zefan | 020e5f8 | 2009-07-01 10:47:05 +0800 | [diff] [blame] | 95 | |
Li Zefan | 143c145 | 2009-05-19 14:43:15 +0800 | [diff] [blame] | 96 | 3. Defining an event-enabled tracepoint |
| 97 | ======================================= |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 98 | |
Li Zefan | 143c145 | 2009-05-19 14:43:15 +0800 | [diff] [blame] | 99 | See The example provided in samples/trace_events |
Theodore Ts'o | abd4144 | 2009-04-11 15:51:18 -0400 | [diff] [blame] | 100 | |
Tom Zanussi | 95b6960 | 2009-09-10 23:13:51 -0500 | [diff] [blame] | 101 | 4. Event formats |
| 102 | ================ |
| 103 | |
| 104 | Each trace event has a 'format' file associated with it that contains |
| 105 | a description of each field in a logged event. This information can |
| 106 | be used to parse the binary trace stream, and is also the place to |
| 107 | find the field names that can be used in event filters (see section 5). |
| 108 | |
| 109 | It also displays the format string that will be used to print the |
| 110 | event in text mode, along with the event name and ID used for |
| 111 | profiling. |
| 112 | |
| 113 | Every event has a set of 'common' fields associated with it; these are |
| 114 | the fields prefixed with 'common_'. The other fields vary between |
| 115 | events and correspond to the fields defined in the TRACE_EVENT |
| 116 | definition for that event. |
| 117 | |
| 118 | Each field in the format has the form: |
| 119 | |
| 120 | field:field-type field-name; offset:N; size:N; |
| 121 | |
| 122 | where offset is the offset of the field in the trace record and size |
| 123 | is the size of the data item, in bytes. |
| 124 | |
| 125 | For example, here's the information displayed for the 'sched_wakeup' |
| 126 | event: |
| 127 | |
| 128 | # cat /debug/tracing/events/sched/sched_wakeup/format |
| 129 | |
| 130 | name: sched_wakeup |
| 131 | ID: 60 |
| 132 | format: |
| 133 | field:unsigned short common_type; offset:0; size:2; |
| 134 | field:unsigned char common_flags; offset:2; size:1; |
| 135 | field:unsigned char common_preempt_count; offset:3; size:1; |
| 136 | field:int common_pid; offset:4; size:4; |
| 137 | field:int common_tgid; offset:8; size:4; |
| 138 | |
| 139 | field:char comm[TASK_COMM_LEN]; offset:12; size:16; |
| 140 | field:pid_t pid; offset:28; size:4; |
| 141 | field:int prio; offset:32; size:4; |
| 142 | field:int success; offset:36; size:4; |
| 143 | field:int cpu; offset:40; size:4; |
| 144 | |
| 145 | print fmt: "task %s:%d [%d] success=%d [%03d]", REC->comm, REC->pid, |
| 146 | REC->prio, REC->success, REC->cpu |
| 147 | |
| 148 | This event contains 10 fields, the first 5 common and the remaining 5 |
| 149 | event-specific. All the fields for this event are numeric, except for |
| 150 | 'comm' which is a string, a distinction important for event filtering. |
| 151 | |
| 152 | 5. Event filtering |
| 153 | ================== |
| 154 | |
| 155 | Trace events can be filtered in the kernel by associating boolean |
| 156 | 'filter expressions' with them. As soon as an event is logged into |
| 157 | the trace buffer, its fields are checked against the filter expression |
| 158 | associated with that event type. An event with field values that |
| 159 | 'match' the filter will appear in the trace output, and an event whose |
| 160 | values don't match will be discarded. An event with no filter |
| 161 | associated with it matches everything, and is the default when no |
| 162 | filter has been set for an event. |
| 163 | |
| 164 | 5.1 Expression syntax |
| 165 | --------------------- |
| 166 | |
| 167 | A filter expression consists of one or more 'predicates' that can be |
| 168 | combined using the logical operators '&&' and '||'. A predicate is |
| 169 | simply a clause that compares the value of a field contained within a |
| 170 | logged event with a constant value and returns either 0 or 1 depending |
| 171 | on whether the field value matched (1) or didn't match (0): |
| 172 | |
| 173 | field-name relational-operator value |
| 174 | |
| 175 | Parentheses can be used to provide arbitrary logical groupings and |
| 176 | double-quotes can be used to prevent the shell from interpreting |
| 177 | operators as shell metacharacters. |
| 178 | |
| 179 | The field-names available for use in filters can be found in the |
| 180 | 'format' files for trace events (see section 4). |
| 181 | |
| 182 | The relational-operators depend on the type of the field being tested: |
| 183 | |
| 184 | The operators available for numeric fields are: |
| 185 | |
| 186 | ==, !=, <, <=, >, >= |
| 187 | |
| 188 | And for string fields they are: |
| 189 | |
| 190 | ==, != |
| 191 | |
| 192 | Currently, only exact string matches are supported. |
| 193 | |
| 194 | Currently, the maximum number of predicates in a filter is 16. |
| 195 | |
| 196 | 5.2 Setting filters |
| 197 | ------------------- |
| 198 | |
| 199 | A filter for an individual event is set by writing a filter expression |
| 200 | to the 'filter' file for the given event. |
| 201 | |
| 202 | For example: |
| 203 | |
| 204 | # cd /debug/tracing/events/sched/sched_wakeup |
| 205 | # echo "common_preempt_count > 4" > filter |
| 206 | |
| 207 | A slightly more involved example: |
| 208 | |
| 209 | # cd /debug/tracing/events/sched/sched_signal_send |
| 210 | # echo "((sig >= 10 && sig < 15) || sig == 17) && comm != bash" > filter |
| 211 | |
| 212 | If there is an error in the expression, you'll get an 'Invalid |
| 213 | argument' error when setting it, and the erroneous string along with |
| 214 | an error message can be seen by looking at the filter e.g.: |
| 215 | |
| 216 | # cd /debug/tracing/events/sched/sched_signal_send |
| 217 | # echo "((sig >= 10 && sig < 15) || dsig == 17) && comm != bash" > filter |
| 218 | -bash: echo: write error: Invalid argument |
| 219 | # cat filter |
| 220 | ((sig >= 10 && sig < 15) || dsig == 17) && comm != bash |
| 221 | ^ |
| 222 | parse_error: Field not found |
| 223 | |
| 224 | Currently the caret ('^') for an error always appears at the beginning of |
| 225 | the filter string; the error message should still be useful though |
| 226 | even without more accurate position info. |
| 227 | |
| 228 | 5.3 Clearing filters |
| 229 | -------------------- |
| 230 | |
| 231 | To clear the filter for an event, write a '0' to the event's filter |
| 232 | file. |
| 233 | |
| 234 | To clear the filters for all events in a subsystem, write a '0' to the |
| 235 | subsystem's filter file. |
| 236 | |
| 237 | 5.3 Subsystem filters |
| 238 | --------------------- |
| 239 | |
| 240 | For convenience, filters for every event in a subsystem can be set or |
| 241 | cleared as a group by writing a filter expression into the filter file |
Thomas Weber | 8839316 | 2010-03-16 11:47:56 +0100 | [diff] [blame] | 242 | at the root of the subsystem. Note however, that if a filter for any |
Tom Zanussi | 95b6960 | 2009-09-10 23:13:51 -0500 | [diff] [blame] | 243 | event within the subsystem lacks a field specified in the subsystem |
| 244 | filter, or if the filter can't be applied for any other reason, the |
| 245 | filter for that event will retain its previous setting. This can |
| 246 | result in an unintended mixture of filters which could lead to |
| 247 | confusing (to the user who might think different filters are in |
| 248 | effect) trace output. Only filters that reference just the common |
| 249 | fields can be guaranteed to propagate successfully to all events. |
| 250 | |
| 251 | Here are a few subsystem filter examples that also illustrate the |
| 252 | above points: |
| 253 | |
Thomas Weber | 8839316 | 2010-03-16 11:47:56 +0100 | [diff] [blame] | 254 | Clear the filters on all events in the sched subsystem: |
Tom Zanussi | 95b6960 | 2009-09-10 23:13:51 -0500 | [diff] [blame] | 255 | |
| 256 | # cd /sys/kernel/debug/tracing/events/sched |
| 257 | # echo 0 > filter |
| 258 | # cat sched_switch/filter |
| 259 | none |
| 260 | # cat sched_wakeup/filter |
| 261 | none |
| 262 | |
| 263 | Set a filter using only common fields for all events in the sched |
Thomas Weber | 8839316 | 2010-03-16 11:47:56 +0100 | [diff] [blame] | 264 | subsystem (all events end up with the same filter): |
Tom Zanussi | 95b6960 | 2009-09-10 23:13:51 -0500 | [diff] [blame] | 265 | |
| 266 | # cd /sys/kernel/debug/tracing/events/sched |
| 267 | # echo common_pid == 0 > filter |
| 268 | # cat sched_switch/filter |
| 269 | common_pid == 0 |
| 270 | # cat sched_wakeup/filter |
| 271 | common_pid == 0 |
| 272 | |
| 273 | Attempt to set a filter using a non-common field for all events in the |
Thomas Weber | 8839316 | 2010-03-16 11:47:56 +0100 | [diff] [blame] | 274 | sched subsystem (all events but those that have a prev_pid field retain |
Tom Zanussi | 95b6960 | 2009-09-10 23:13:51 -0500 | [diff] [blame] | 275 | their old filters): |
| 276 | |
| 277 | # cd /sys/kernel/debug/tracing/events/sched |
| 278 | # echo prev_pid == 0 > filter |
| 279 | # cat sched_switch/filter |
| 280 | prev_pid == 0 |
| 281 | # cat sched_wakeup/filter |
| 282 | common_pid == 0 |