perf-stat(1)
============

NAME
----
perf-stat - Run a command and gather performance counter statistics

SYNOPSIS
--------
[verse]
'perf stat' [-e <EVENT> | --event=EVENT] [-a] <command>
'perf stat' [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>]
'perf stat' [-e <EVENT> | --event=EVENT] [-a] record [-o file] -- <command> [<options>]
'perf stat' report [-i file]

DESCRIPTION
-----------
This command runs a command and gathers performance counter statistics
from it.


OPTIONS
-------
<command>...::
	Any command you can specify in a shell.

record::
	See STAT RECORD.

report::
	See STAT REPORT.

-e::
--event=::
	Select the PMU event. Selection can be:

	- a symbolic event name (use 'perf list' to list all events)

	- a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a
	  hexadecimal event descriptor.

	- a symbolically formed event like 'pmu/param1=0x3,param2/' where
	  param1 and param2 are defined as formats for the PMU in
	  /sys/bus/event_source/devices/<pmu>/format/*

	- a symbolically formed event like 'pmu/config=M,config1=N,config2=K/'
	  where M, N, K are numbers (in decimal, hex, octal format).
	  Acceptable values for each of 'config', 'config1' and 'config2'
	  parameters are defined by corresponding entries in
	  /sys/bus/event_source/devices/<pmu>/format/*

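The raw and symbolically formed event strings accepted by -e can be assembled
programmatically. The following is a minimal Python sketch, not part of perf.
The helper names `raw_event` and `pmu_event` are hypothetical, the PMU name
"cpu" is only an illustration, and the bit layout (event select in bits 0-7,
umask in bits 8-15) is an assumption that matches common x86 PMUs; other
architectures may encode raw events differently.

```python
def raw_event(eventsel, umask):
    """Format an rNNN raw event string.

    Assumption: x86-style layout, event select in bits 0-7 and
    unit mask in bits 8-15 of the hexadecimal descriptor.
    """
    if not (0 <= eventsel <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("eventsel and umask must each fit in one byte")
    return "r{:x}".format((umask << 8) | eventsel)


def pmu_event(pmu, **params):
    """Format a 'pmu/key=value,.../' event string with hexadecimal values."""
    body = ",".join("{}={:#x}".format(key, value) for key, value in params.items())
    return "{}/{}/".format(pmu, body)


# Hypothetical values, chosen only to show the resulting syntax.
print(raw_event(0xC0, 0x01))
print(pmu_event("cpu", config=0x3C, config1=0x1))
```

The resulting strings are meant to be passed to -e as-is. Which parameter
names a PMU actually accepts is defined by its format directory in sysfs.
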
-i::
--no-inherit::
	child tasks do not inherit counters

-p::
--pid=<pid>::
	stat events on existing process id (comma separated list)

-t::
--tid=<tid>::
	stat events on existing thread id (comma separated list)

-a::
--all-cpus::
	system-wide collection from all CPUs (default if no target is specified)

-c::
--scale::
	scale/normalize counter values

-d::
--detailed::
	print more detailed statistics, can be specified up to 3 times

	   -d:          detailed events, L1 and LLC data cache
	   -d -d:       more detailed events, dTLB and iTLB events
	   -d -d -d:    very detailed events, adding prefetch events

-r::
--repeat=<n>::
	repeat command and print average + stddev (max: 100). 0 means forever.

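To illustrate what "average + stddev" means for repeated runs, the following
Python sketch summarizes a list of counter values collected over several runs.
It is only an illustration: the function name `summarize_runs` and the sample
values are hypothetical, and the use of the sample standard deviation relative
to the mean is an assumption, since this page does not state the exact formula
perf uses.

```python
import statistics


def summarize_runs(values):
    """Return (mean, relative standard deviation in percent) for repeated runs."""
    mean = statistics.mean(values)
    if len(values) < 2 or mean == 0:
        # A deviation is undefined for a single run or a zero mean.
        return mean, 0.0
    stddev = statistics.stdev(values)  # sample standard deviation (assumption)
    return mean, 100.0 * stddev / mean


# Hypothetical counter values from three repeated runs.
mean, rel = summarize_runs([100.0, 102.0, 98.0])
print("{:.1f} +- {:.2f}%".format(mean, rel))
```
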
-B::
--big-num::
	print large numbers with thousands' separators according to locale

-C::
--cpu=::
	Count only on the list of CPUs provided. Multiple CPUs can be provided as a
	comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2.
	In per-thread mode, this option is ignored. The -a option is still necessary
	to activate system-wide monitoring. Default is to count on all CPUs.

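The CPU list syntax described for -C (comma-separated entries with no space,
ranges written with -) can be expanded as in the following Python sketch. The
function name `parse_cpu_list` is hypothetical and the code is not taken from
perf. It only mirrors the syntax described above.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as '0,1' or '0-2' into a sorted list of CPU numbers."""
    if any(ch.isspace() for ch in spec):
        raise ValueError("CPU list must not contain spaces")
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-", 1)
            low, high = int(first), int(last)
            if low > high:
                raise ValueError("invalid CPU range: " + part)
            cpus.update(range(low, high + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


print(parse_cpu_list("0,1"))
print(parse_cpu_list("0-2"))
```
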
-A::
--no-aggr::
	Do not aggregate counts across all monitored CPUs.

-n::
--null::
	null run - don't start any counters

-v::
--verbose::
	be more verbose (show counter open errors, etc)

-x SEP::
--field-separator SEP::
	print counts using a CSV-style output to make it easy to import directly into
	spreadsheets. Columns are separated by the string specified in SEP.

-G name::
--cgroup name::
	monitor only in the container (cgroup) called "name". This option is available only
	in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to
	container "name" are monitored when they run on the monitored CPUs. Multiple cgroups
	can be provided. Each cgroup is applied to the corresponding event, i.e., the first
	cgroup to the first event, the second cgroup to the second event, and so on. It is
	possible to provide an empty cgroup (monitor all the time) using, e.g., -G foo,,bar.
	Cgroups must have corresponding events, i.e., they always refer to events defined
	earlier on the command line.

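The positional pairing between events and cgroups can be pictured with the
following Python sketch. It is an illustration of the rule described above,
not perf's implementation. The function name `pair_cgroups` is hypothetical,
and leaving trailing events without a cgroup when fewer cgroups than events
are given is an assumption of this sketch.

```python
def pair_cgroups(events, cgroup_spec):
    """Pair each event with the cgroup at the same position in a -G style list.

    An empty entry (as in 'foo,,bar') means the event is not restricted
    to a cgroup and is represented here as None.
    """
    cgroups = cgroup_spec.split(",")
    if len(cgroups) > len(events):
        raise ValueError("every cgroup must refer to an event defined earlier")
    # Assumption: events without a matching cgroup entry are unrestricted.
    cgroups += [""] * (len(events) - len(cgroups))
    return [(event, cgroup or None) for event, cgroup in zip(events, cgroups)]


# Hypothetical event and cgroup names.
for event, cgroup in pair_cgroups(["cycles", "instructions", "cache-misses"], "foo,,bar"):
    print(event, "->", cgroup)
```
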
-o file::
--output file::
	Print the output into the designated file.

--append::
	Append to the output file designated with the -o option. Ignored if -o is not specified.

--log-fd::

Log output to fd, instead of stderr. Complementary to --output, and mutually exclusive
with it. --append may be used here. Examples:
	3>results  perf stat --log-fd 3          -- $cmd
	3>>results perf stat --log-fd 3 --append -- $cmd

--pre::
--post::
	Pre and post measurement hooks, e.g.:

	perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' -- make -s -j64 O=defconfig-build/ bzImage

-I msecs::
--interval-print msecs::
	Print count deltas every N milliseconds (minimum: 10ms).
	The overhead percentage could be high in some cases, for instance with small, sub-100ms intervals. Use with caution.
	example: 'perf stat -I 1000 -e cycles -a sleep 5'

--metric-only::
	Only print computed metrics. Print them in a single line.
	Don't show any raw values. Not supported with --per-thread.

--per-socket::
	Aggregate counts per processor socket for system-wide mode measurements. This
	is a useful mode to detect imbalance between sockets. To enable this mode,
	use --per-socket in addition to -a (system-wide). The output includes the
	socket number and the number of online processors on that socket. This is
	useful to gauge the amount of aggregation.

--per-core::
	Aggregate counts per physical processor for system-wide mode measurements. This
	is a useful mode to detect imbalance between physical cores. To enable this mode,
	use --per-core in addition to -a (system-wide). The output includes the
	core number and the number of online logical processors on that physical processor.

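The aggregation performed by --per-socket and --per-core can be pictured as
summing per-CPU counts by a topology key while also counting how many CPUs
contributed to each sum. The following Python sketch shows the idea for
sockets. The function name `aggregate_per_socket`, the CPU-to-socket map and
the counts are all hypothetical, and perf obtains the real topology itself.

```python
from collections import defaultdict


def aggregate_per_socket(per_cpu_counts, cpu_to_socket):
    """Sum per-CPU counts by socket.

    Returns {socket: (number of CPUs aggregated, total count)}.
    """
    totals = defaultdict(int)
    cpus = defaultdict(int)
    for cpu, count in per_cpu_counts.items():
        socket = cpu_to_socket[cpu]
        totals[socket] += count
        cpus[socket] += 1
    return {socket: (cpus[socket], totals[socket]) for socket in sorted(totals)}


# Hypothetical two-socket machine with two CPUs per socket.
counts = {0: 100, 1: 150, 2: 400, 3: 350}
topology = {0: 0, 1: 0, 2: 1, 3: 1}
for socket, (ncpus, total) in aggregate_per_socket(counts, topology).items():
    print("S{} {} {}".format(socket, ncpus, total))
```

In this made-up data set, socket 1 carries three times the count of socket 0,
which is the kind of imbalance these modes are meant to expose.
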
--per-thread::
	Aggregate counts per monitored threads, when monitoring threads (-t option)
	or processes (-p option).

-D msecs::
--delay msecs::
	After starting the program, wait msecs before measuring. This is useful to
	filter out the startup phase of the program, which is often very different.

-T::
--transaction::

Print statistics of transactional execution if supported.

STAT RECORD
-----------
Stores stat data into perf data file.

-o file::
--output file::
	Output file name.

STAT REPORT
-----------
Reads and reports stat data from perf data file.

-i file::
--input file::
	Input file name.

--per-socket::
	Aggregate counts per processor socket for system-wide mode measurements.

--per-core::
	Aggregate counts per physical processor for system-wide mode measurements.

-A::
--no-aggr::
	Do not aggregate counts across all monitored CPUs.

--topdown::
Print top down level 1 metrics if supported by the CPU. This helps to
determine bottlenecks in the CPU pipeline for CPU bound workloads,
by breaking the cycles consumed down into frontend bound, backend bound,
bad speculation and retiring.

Frontend bound means that the CPU cannot fetch and decode instructions fast
enough. Backend bound means that computation or memory access is the
bottleneck. Bad Speculation means that the CPU wasted cycles due to branch
mispredictions and similar issues. Retiring means that the CPU computed without
an apparent bottleneck. The bottleneck is only the real bottleneck
if the workload is actually bound by the CPU and not by something else.

For best results it is usually a good idea to use it with interval
mode like -I 1000, as the bottleneck of workloads can change often.

The top down metrics are collected per core instead of per
CPU thread. Per core mode is automatically enabled
and -a (global monitoring) is needed, requiring root rights or
kernel.perf_event_paranoid=-1.

Topdown uses the full Performance Monitoring Unit, and needs
disabling of the NMI watchdog (as root):
	echo 0 > /proc/sys/kernel/nmi_watchdog
for best results. Otherwise the bottlenecks may be inconsistent
on workloads with changing phases.

This enables --metric-only, unless overridden with --no-metric-only.

To interpret the results it is usually necessary to know which
CPUs the workload runs on. If needed the CPUs can be forced using
taskset.

--no-merge::
	Do not merge results from same PMUs.

--smi-cost::
Measure SMI cost if msr/aperf/ and msr/smi/ events are supported.

During the measurement, /sys/devices/cpu/freeze_on_smi will be set to
freeze core counters on SMI.
The aperf counter will not be affected by the setting.
The cost of SMI can be measured by (aperf - unhalted core cycles).

In practice, the percentage of SMI cycles is very useful for performance
oriented analysis. --metric-only will be applied by default.
The output is SMI cycles%, which equals (aperf - unhalted core cycles) / aperf.

Users who want to get the actual value can apply --no-metric-only.

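The SMI cycles% formula above can be written out as a short Python sketch.
The function name `smi_cycles_percent` and the sample counter values are
hypothetical. Only the formula (aperf - unhalted core cycles) / aperf comes
from the description above.

```python
def smi_cycles_percent(aperf, unhalted_core_cycles):
    """Return the share of cycles spent in SMI, in percent.

    With core counters frozen on SMI, the unhalted core cycle count misses
    the cycles spent in SMI, while aperf keeps counting them.
    """
    if aperf <= 0:
        raise ValueError("aperf must be positive")
    return 100.0 * (aperf - unhalted_core_cycles) / aperf


# Hypothetical counter readings.
print("{:.1f}%".format(smi_cycles_percent(1000, 950)))
```
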
EXAMPLES
--------

$ perf stat -- make -j

 Performance counter stats for 'make -j':

    8117.370256  task clock ticks     #      11.281 CPU utilization factor
            678  context switches     #       0.000 M/sec
            133  CPU migrations       #       0.000 M/sec
         235724  pagefaults           #       0.029 M/sec
    24821162526  CPU cycles           #    3057.784 M/sec
    18687303457  instructions         #    2302.138 M/sec
      172158895  cache references     #      21.209 M/sec
       27075259  cache misses         #       3.335 M/sec

 Wall-clock time elapsed:   719.554352 msecs

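The values after the # in the example output can be reproduced from the raw
counts. The following Python sketch does so under the assumption that the
"task clock ticks" value is expressed in milliseconds, which is consistent
with the numbers shown but is not stated explicitly. The function names are
hypothetical.

```python
def cpu_utilization(task_clock_msecs, wall_clock_msecs):
    """CPU utilization factor: CPU time consumed divided by elapsed wall-clock time."""
    return task_clock_msecs / wall_clock_msecs


def rate_m_per_sec(count, task_clock_msecs):
    """Event rate in millions of events per second of task clock time."""
    return count / (task_clock_msecs / 1000.0) / 1e6


# Values copied from the example output above.
task_clock = 8117.370256   # assumed to be milliseconds
wall_clock = 719.554352    # msecs
print("{:.3f} CPU utilization factor".format(cpu_utilization(task_clock, wall_clock)))
print("{:.3f} M/sec cycles".format(rate_m_per_sec(24821162526, task_clock)))
print("{:.3f} M/sec instructions".format(rate_m_per_sec(18687303457, task_clock)))
```

A utilization factor above 1 means that the workload kept more than one CPU
busy on average, as expected for 'make -j'.
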
CSV FORMAT
----------

With -x, perf stat is able to output a not-quite-CSV format.
Commas in the output are not put into "". To make it easy to parse,
it is recommended to use a different character like -x \;

The fields are in this order:

	- optional usec time stamp in fractions of second (with -I xxx)
	- optional CPU, core, or socket identifier
	- optional number of logical CPUs aggregated
	- counter value
	- unit of the counter value or empty
	- event name
	- run time of counter
	- percentage of measurement time the counter was running
	- optional variance if multiple values are collected with -r
	- optional metric value
	- optional unit of metric

Additional metrics may be printed with all earlier fields being empty.

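Because several leading fields are optional, a parser has to know which
options produced the output. The following Python sketch handles only the
simplest case: output produced with -x \; and without -I, without per-CPU,
per-core or per-socket identifiers, and without -r. The function name
`parse_stat_csv_line` and the sample line are hypothetical, so treat this as
a starting point under those assumptions rather than a complete parser.

```python
def parse_stat_csv_line(line, sep=";"):
    """Parse one line: value, unit, event, run time, percentage[, metric, metric unit]."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) < 5:
        raise ValueError("expected at least 5 fields, got {}".format(len(fields)))
    value, unit, event, run_time, percent = fields[:5]
    try:
        parsed_value = float(value)
    except ValueError:
        parsed_value = None  # non-numeric counter values are kept as None
    record = {
        "value": parsed_value,
        "unit": unit or None,
        "event": event,
        "run_time": int(run_time),
        "percent_running": float(percent),
        "metric_value": None,
        "metric_unit": None,
    }
    # The metric value and its unit are optional trailing fields.
    if len(fields) >= 7 and fields[5]:
        record["metric_value"] = float(fields[5])
        record["metric_unit"] = fields[6]
    return record


# Hypothetical sample line in the field order listed above.
sample = "24821162526;;cycles;8117370256;100.00;3057.784;M/sec"
print(parse_stat_csv_line(sample))
```
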
SEE ALSO
--------
linkperf:perf-top[1], linkperf:perf-list[1]