llvm-exegesis - LLVM Machine Instruction Benchmark
==================================================

SYNOPSIS
--------

:program:`llvm-exegesis` [*options*]

DESCRIPTION
-----------

:program:`llvm-exegesis` is a benchmarking tool that uses information available
in LLVM to measure host machine instruction characteristics like latency,
throughput, or port decomposition.

Given an LLVM opcode name and a benchmarking mode, :program:`llvm-exegesis`
generates a code snippet that makes execution as serial (resp. as parallel) as
possible so that we can measure the latency (resp. inverse throughput/uop
decomposition) of the instruction.
The code snippet is jitted and executed on the host subtarget. The time taken
(resp. resource usage) is measured using hardware performance counters. The
result is printed out as YAML to the standard output.

The main goal of this tool is to automatically (in)validate LLVM's TableGen
scheduling models. To that end, we also provide analysis of the results.

:program:`llvm-exegesis` can also benchmark arbitrary user-provided code
snippets.

EXAMPLE 1: benchmarking instructions
------------------------------------

Assume you have an X86-64 machine. To measure the latency of a single
instruction, run:

.. code-block:: bash

  $ llvm-exegesis -mode=latency -opcode-name=ADD64rr

Measuring the uop decomposition or inverse throughput of an instruction works
similarly:

.. code-block:: bash

  $ llvm-exegesis -mode=uops -opcode-name=ADD64rr
  $ llvm-exegesis -mode=inverse_throughput -opcode-name=ADD64rr

The output is a YAML document (the default is to write to stdout, but you can
redirect the output to a file using `-benchmarks-file`):

.. code-block:: none

  ---
  key:
    opcode_name: ADD64rr
    mode: latency
    config: ''
  cpu_name: haswell
  llvm_triple: x86_64-unknown-linux-gnu
  num_repetitions: 10000
  measurements:
    - { key: latency, value: 1.0058, debug_string: '' }
  error: ''
  info: 'explicit self cycles, selecting one aliasing configuration.
  Snippet:
  ADD64rr R8, R8, R10
  '
  ...
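
The one-line `measurements` entries are easy to extract with standard text
tools. The function below is a minimal sketch and is not part of
:program:`llvm-exegesis`. `get_measurement` is a hypothetical helper. It
assumes each measurement is written on a single line in the flow style shown
above.

.. code-block:: bash

  # Hypothetical helper (not shipped with llvm-exegesis): print the value of
  # the measurement named $1 from a YAML report read on stdin. Assumes the
  # one-line form "- { key: K, value: V, ... }" shown above.
  get_measurement() {
    sed -n "s/^ *- { key: $1, value: \([^,]*\),.*/\1/p"
  }

Applied to the report above, `get_measurement latency` prints `1.0058`. Piping
`llvm-exegesis -mode=latency -opcode-name=ADD64rr` into it therefore yields
just the latency.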

To measure the latency of all instructions for the host architecture (X86 in
this example; the script assumes it is run from the directory that contains the
`build` tree), run:

.. code-block:: bash

  #!/bin/bash
  # Index of the last X86 opcode: INSTRUCTION_LIST_END - 1, read from the
  # TableGen-generated instruction enum.
  readonly INSTRUCTIONS=$(($(grep INSTRUCTION_LIST_END build/lib/Target/X86/X86GenInstrInfo.inc | cut -f2 -d=) - 1))
  for INSTRUCTION in $(seq 1 "${INSTRUCTIONS}");
  do
    # Keep only the YAML document, dropping any output printed before "---".
    ./build/bin/llvm-exegesis -mode=latency -opcode-index="${INSTRUCTION}" | sed -n '/---/,$p'
  done

FIXME: Provide an :program:`llvm-exegesis` option to test all instructions.
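
Until such an option exists, note that `-opcode-name` accepts a
comma-separated list (see OPTIONS), so several opcodes can be measured in one
invocation. The sketch below builds that list from separate names.
`join_opcodes` is a hypothetical helper and is not part of the tool.

.. code-block:: bash

  # Hypothetical helper: join the opcode names given as arguments with
  # commas, the list format accepted by -opcode-name.
  join_opcodes() {
    local IFS=,
    printf '%s\n' "$*"
  }

For example, `join_opcodes ADD64rr SUB64rr` prints `ADD64rr,SUB64rr`, which
can be passed as `-opcode-name="$(join_opcodes ADD64rr SUB64rr)"`.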


EXAMPLE 2: benchmarking a custom code snippet
---------------------------------------------

To measure the latency/uops of a custom piece of code, you can specify the
`-snippets-file` option (`-` reads from standard input).

.. code-block:: bash

  $ echo "vzeroupper" | llvm-exegesis -mode=uops -snippets-file=-

Real-life code snippets typically depend on registers or memory.
:program:`llvm-exegesis` checks the liveness of registers (i.e. any register
use has a corresponding def or is a "live in"). If your code depends on the
value of some registers, you have two options:

- Mark the register as requiring a definition. :program:`llvm-exegesis` will
  automatically assign a value to the register. This can be done using the
  directive `LLVM-EXEGESIS-DEFREG <reg_name> <hex_value>`, where `<hex_value>`
  is a bit pattern used to fill `<reg_name>`. If `<hex_value>` is smaller than
  the register width, it will be sign-extended.
- Mark the register as a "live in". :program:`llvm-exegesis` will benchmark
  using whatever value was in this register on entry. This can be done using
  the directive `LLVM-EXEGESIS-LIVEIN <reg_name>`.
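
The sign-extension rule can be made concrete with a small calculation. The
function below is an illustrative sketch and is not the tool's code. `sext_hex`
is a hypothetical helper. It assumes the width of `<hex_value>` is 4 bits per
hex digit written, and Bash arithmetic limits it to registers of at most 64
bits.

.. code-block:: bash

  # Hypothetical sketch of the DEFREG fill rule: sign-extend the hex value $1
  # to a register of $2 bits (at most 64), assuming the value is 4 bits wide
  # per hex digit written. Prints the resulting bit pattern in hex.
  sext_hex() {
    local hex=$1 width=$2
    local bits=$(( ${#hex} * 4 ))
    if (( bits > width || width > 64 )); then
      echo "sext_hex: unsupported widths" >&2
      return 1
    fi
    local val=$(( 16#$hex ))
    # Top bit of the value, i.e. its sign bit.
    local top=$(( (val >> (bits - 1)) & 1 ))
    if (( bits < width && top )); then
      # Reinterpret as negative so all higher register bits become ones.
      val=$(( val - (1 << bits) ))
    fi
    # Keep only the low $width bits (Bash integers are 64 bits wide).
    if (( width < 64 )); then
      val=$(( val & ((1 << width) - 1) ))
    fi
    printf '%0*x\n' $(( width / 4 )) "$val"
  }

Under these assumptions, `sext_hex ff 32` prints `ffffffff` and
`sext_hex 42 32` prints `00000042`. The `42` in the example below is likewise
read as a hex bit pattern.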

For example, the following code snippet depends on the values of XMM1 (which
will be set by the tool) and the memory buffer passed in RDI (live in).

.. code-block:: none

  # LLVM-EXEGESIS-LIVEIN RDI
  # LLVM-EXEGESIS-DEFREG XMM1 42
  vmulps (%rdi), %xmm1, %xmm2
  vhaddps %xmm2, %xmm2, %xmm3
  addq $0x10, %rdi
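
When the same snippet is benchmarked with several register values, the
directives can be generated instead of written by hand. Here is a minimal
sketch. `with_defregs` is a hypothetical helper that prepends one
`LLVM-EXEGESIS-DEFREG` comment per `REG=HEX` argument to the snippet read on
standard input.

.. code-block:: bash

  # Hypothetical helper: emit an LLVM-EXEGESIS-DEFREG directive for each
  # REG=HEX argument, then copy the snippet from stdin unchanged.
  with_defregs() {
    local pair
    for pair in "$@"; do
      printf '# LLVM-EXEGESIS-DEFREG %s %s\n' "${pair%%=*}" "${pair#*=}"
    done
    cat
  }

Its output can be passed to the tool through standard input, e.g.
`echo "vhaddps %xmm1, %xmm1, %xmm2" | with_defregs XMM1=42 | llvm-exegesis -mode=uops -snippets-file=-`.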

EXAMPLE 3: analysis
-------------------

Assuming you have a set of benchmarked instructions (either latency or uops) as
YAML in the file `/tmp/benchmarks.yaml`, you can analyze the results using the
following command:

.. code-block:: bash

  $ llvm-exegesis -mode=analysis \
      -benchmarks-file=/tmp/benchmarks.yaml \
      -analysis-clusters-output-file=/tmp/clusters.csv \
      -analysis-inconsistencies-output-file=/tmp/inconsistencies.html

This will group the instructions into clusters with the same performance
characteristics. The clusters will be written out to `/tmp/clusters.csv` in the
following format (the last column holds the measured value; in this example,
the latency):

.. code-block:: none

  cluster_id,opcode_name,config,sched_class
  ...
  2,ADD32ri8_DB,,WriteALU,1.00
  2,ADD32ri_DB,,WriteALU,1.01
  2,ADD32rr,,WriteALU,1.01
  2,ADD32rr_DB,,WriteALU,1.00
  2,ADD32rr_REV,,WriteALU,1.00
  2,ADD64i32,,WriteALU,1.01
  2,ADD64ri32,,WriteALU,1.01
  2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
  2,VPADDQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.02
  2,VPSUBQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.01
  2,ADD64ri8,,WriteALU,1.00
  2,SETBr,,WriteSETCC,1.01
  ...
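
Because the clusters file is plain CSV, it can be post-processed with standard
tools. The sketch below lists the opcodes of a single cluster.
`cluster_members` is a hypothetical helper, and it assumes no field contains an
escaped comma.

.. code-block:: bash

  # Hypothetical helper: read a clusters CSV on stdin and print the
  # opcode_name column of every row whose cluster_id equals $1. The header
  # line never matches and is skipped.
  cluster_members() {
    awk -F, -v id="$1" '$1 == id { print $2 }'
  }

For instance, `cluster_members 2 < /tmp/clusters.csv` would list `ADD32ri8_DB`,
`ADD32ri_DB` and the other members of cluster 2 shown above.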

:program:`llvm-exegesis` will also analyze the clusters to point out
inconsistencies in the scheduling information. The output is an HTML file. For
example, `/tmp/inconsistencies.html` will contain messages like the following:

.. image:: llvm-exegesis-analysis.png
  :align: center

Note that the scheduling class names will be resolved only when
:program:`llvm-exegesis` is compiled in debug mode; otherwise only the class id
is shown. This does not invalidate any of the analysis results though.


OPTIONS
-------

.. option:: -help

  Print a summary of command line options.

.. option:: -opcode-index=<LLVM opcode index>

  Specify the opcode to measure, by index. See example 1 for details.
  Either `opcode-index`, `opcode-name` or `snippets-file` must be set.
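
  The index of an opcode can be looked up in the TableGen-generated enum used
  in example 1. As a hedged sketch, `opcode_index` (a hypothetical helper)
  assumes enum entries of the form `NAME = INDEX,` and prints the first exact
  match.

  .. code-block:: bash

    # Hypothetical helper: print the enum value of opcode $1 from the
    # generated instruction-info file $2 (first exact match only).
    opcode_index() {
      awk -v name="$1" '{
        line = $0
        gsub(/[ \t,]/, "", line)   # "  NAME = N," becomes "NAME=N"
        n = split(line, kv, "=")
        if (n == 2 && kv[1] == name && kv[2] ~ /^[0-9]+$/) { print kv[2]; exit }
      }' "$2"
    }

  For example, `opcode_index ADD64rr build/lib/Target/X86/X86GenInstrInfo.inc`
  prints a value suitable for `-opcode-index`.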

.. option:: -opcode-name=<opcode name 1>,<opcode name 2>,...

  Specify the opcode to measure, by name. Several opcodes can be specified as
  a comma-separated list. See example 1 for details.
  Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -snippets-file=<filename>

  Specify the custom code snippet to measure. See example 2 for details.
  Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -mode=[latency|uops|inverse_throughput|analysis]

  Specify the run mode. Note that if you pick `analysis` mode, you also need
  to specify at least one of `-analysis-clusters-output-file=` and
  `-analysis-inconsistencies-output-file=`.

.. option:: -num-repetitions=<number of repetitions>

  Specify the number of repetitions of the asm snippet.
  Higher values lead to more accurate measurements but lengthen the benchmark.

.. option:: -benchmarks-file=</path/to/file>

  File to read (`analysis` mode) or write (`latency`/`uops`/`inverse_throughput`
  modes) benchmark results. `-` uses stdin/stdout.

.. option:: -analysis-clusters-output-file=</path/to/file>

  If provided, write the analysis clusters as CSV to this file. `-` prints to
  stdout. By default, this analysis is not run.

.. option:: -analysis-inconsistencies-output-file=</path/to/file>

  If non-empty, write inconsistencies found during analysis to this file. `-`
  prints to stdout. By default, this analysis is not run.

.. option:: -analysis-clustering=[dbscan|naive]

  Specify the clustering algorithm to use. By default DBSCAN will be used.
  The naive clustering algorithm is better suited for further work on the
  `-analysis-inconsistencies-output-file=` output: it creates one cluster per
  opcode and checks that the cluster is stable (all points are neighbours).
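
  The stability check can be illustrated with a simplified, one-dimensional
  sketch. It is not the tool's implementation. `all_neighbours` is a
  hypothetical helper. Treating two points as neighbours when their values
  differ by at most epsilon is an assumption made for illustration.

  .. code-block:: bash

    # Hypothetical sketch: succeed if every pair of the measured values
    # ($2, $3, ...) differs by at most epsilon ($1).
    all_neighbours() {
      local eps=$1
      shift
      awk -v eps="$eps" 'BEGIN {
        for (i = 1; i < ARGC; i++)
          for (j = i + 1; j < ARGC; j++) {
            d = ARGV[i] - ARGV[j]
            if (d * d > eps * eps) exit 1
          }
        exit 0
      }' "$@"
    }

  With an epsilon of 0.1, `all_neighbours 0.1 1.00 1.01 1.02` succeeds.
  `all_neighbours 0.1 1.00 1.50` fails, which corresponds to the unstable case
  described above.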

.. option:: -analysis-numpoints=<dbscan numPoints parameter>

  Specify the numPoints parameter to be used for DBSCAN clustering
  (`analysis` mode, DBSCAN only).

.. option:: -analysis-clustering-epsilon=<dbscan epsilon parameter>

  Specify the epsilon parameter used for clustering of benchmark points
  (`analysis` mode).

.. option:: -analysis-inconsistency-epsilon=<epsilon>

  Specify the epsilon parameter used for detection of when the cluster
  is different from the LLVM schedule profile values (`analysis` mode).

.. option:: -analysis-display-unstable-clusters

  If there is more than one benchmark for an opcode, said benchmarks may end up
  not being clustered into the same cluster if the measured performance
  characteristics are different. By default all such opcodes are filtered out.
  This flag will instead show only such unstable opcodes.

.. option:: -ignore-invalid-sched-class=false

  If set, ignore instructions that do not have a sched class (class idx = 0).

.. option:: -mcpu=<cpu name>

  If set, measure the cpu characteristics using the counters for this CPU. This
  is useful when creating new sched models (i.e. when the host CPU is not yet
  known to LLVM).

EXIT STATUS
-----------

:program:`llvm-exegesis` returns 0 on success. Otherwise, an error message is
printed to standard error, and the tool returns a non-zero value.
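
In scripts, the exit status can be used to detect and skip failed runs. Here is
a minimal sketch. `run_benchmark` is a hypothetical wrapper and is not part of
:program:`llvm-exegesis`.

.. code-block:: bash

  # Hypothetical wrapper: run the given command (e.g. an llvm-exegesis
  # invocation); if it fails, note the exit status on standard error and
  # propagate it to the caller.
  run_benchmark() {
    local status=0
    "$@" || status=$?
    if (( status != 0 )); then
      echo "run_benchmark: '$*' exited with status $status" >&2
    fi
    return "$status"
  }

For example,
`run_benchmark llvm-exegesis -mode=latency -opcode-name=ADD64rr > add64.yaml || echo skipped`
keeps a batch script going when a benchmark cannot be run.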