llvm-exegesis - LLVM Machine Instruction Benchmark
==================================================

.. program:: llvm-exegesis

SYNOPSIS
--------

:program:`llvm-exegesis` [*options*]

DESCRIPTION
-----------

:program:`llvm-exegesis` is a benchmarking tool that uses information available
in LLVM to measure host machine instruction characteristics like latency,
throughput, or port decomposition.

Given an LLVM opcode name and a benchmarking mode, :program:`llvm-exegesis`
generates a code snippet that makes execution as serial (resp. as parallel) as
possible so that we can measure the latency (resp. inverse throughput/uop decomposition)
of the instruction.
The code snippet is jitted and executed on the host subtarget. The time taken
(resp. resource usage) is measured using hardware performance counters. The
result is printed out as YAML to the standard output.

The main goal of this tool is to automatically (in)validate LLVM's TableGen
scheduling models. To that end, we also provide analysis of the results.

:program:`llvm-exegesis` can also benchmark arbitrary user-provided code
snippets.

EXAMPLE 1: benchmarking instructions
------------------------------------

Assume you have an X86-64 machine. To measure the latency of a single
instruction, run:

.. code-block:: bash

    $ llvm-exegesis -mode=latency -opcode-name=ADD64rr

Measuring the uop decomposition or inverse throughput of an instruction works similarly:

.. code-block:: bash

    $ llvm-exegesis -mode=uops -opcode-name=ADD64rr
    $ llvm-exegesis -mode=inverse_throughput -opcode-name=ADD64rr


The output is a YAML document (the default is to write to stdout, but you can
redirect the output to a file using `-benchmarks-file`):

.. code-block:: none

    ---
    key:
      opcode_name:     ADD64rr
      mode:            latency
      config:          ''
    cpu_name:        haswell
    llvm_triple:     x86_64-unknown-linux-gnu
    num_repetitions: 10000
    measurements:
      - { key: latency, value: 1.0058, debug_string: '' }
    error:           ''
    info:            'explicit self cycles, selecting one aliasing configuration.
    Snippet:
    ADD64rr R8, R8, R10
    '
    ...

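The measured value can be pulled out of this document with standard text
tools. The following is a minimal sketch, not part of :program:`llvm-exegesis`:
the function name `extract_latency` is hypothetical, and the pattern assumes
the single-line `measurements` entry shown above. A real YAML parser would be
more robust.

.. code-block:: bash

    # Print "<opcode_name> <value>" for each latency measurement read from stdin.
    # Usage: llvm-exegesis -mode=latency -opcode-name=ADD64rr | extract_latency
    extract_latency() {
      awk '
        /opcode_name:/ { opcode = $2 }
        /key: latency/ {
          # Isolate the number that follows "value:".
          if (match($0, /value: *[0-9.]+/)) {
            value = substr($0, RSTART, RLENGTH)
            sub(/value: */, "", value)
            print opcode, value
          }
        }
      '
    }
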
To measure the latency of all instructions for the host architecture, run:

.. code-block:: bash

    #!/bin/bash
    # The generated X86GenInstrInfo.inc defines INSTRUCTION_LIST_END as the number of opcodes.
    readonly INSTRUCTIONS=$(($(grep INSTRUCTION_LIST_END build/lib/Target/X86/X86GenInstrInfo.inc | cut -f2 -d=) - 1))
    for INSTRUCTION in $(seq 1 ${INSTRUCTIONS});
    do
      # Keep only the YAML document, i.e. everything from the first '---' line onwards.
      ./build/bin/llvm-exegesis -mode=latency -opcode-index=${INSTRUCTION} | sed -n '/---/,$p'
    done

FIXME: Provide an :program:`llvm-exegesis` option to test all instructions.


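When only a handful of opcodes are of interest, `-opcode-name` accepts a
comma-separated list, so a single invocation is enough. The sketch below
assumes a hypothetical helper, `join_opcodes`, that builds such a list from
its arguments; it is not part of :program:`llvm-exegesis`, and the opcode
names and output path are only examples.

.. code-block:: bash

    # Join the arguments with commas: "ADD64rr ADD32rr" -> "ADD64rr,ADD32rr".
    join_opcodes() {
      local IFS=,
      echo "$*"
    }

    # Usage:
    #   llvm-exegesis -mode=latency \
    #     -opcode-name="$(join_opcodes ADD64rr ADD32rr)" \
    #     -benchmarks-file=/tmp/benchmarks.yaml
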
EXAMPLE 2: benchmarking a custom code snippet
---------------------------------------------

To measure the latency/uops of a custom piece of code, you can specify the
`snippets-file` option (`-` reads from standard input).

.. code-block:: bash

    $ echo "vzeroupper" | llvm-exegesis -mode=uops -snippets-file=-

Real-life code snippets typically depend on registers or memory.
:program:`llvm-exegesis` checks the liveness of registers (i.e. any register
use has a corresponding def or is a "live in"). If your code depends on the
value of some registers, you have two options:

- Mark the register as requiring a definition. :program:`llvm-exegesis` will
  automatically assign a value to the register. This can be done using the
  directive `LLVM-EXEGESIS-DEFREG <reg name> <hex_value>`, where `<hex_value>`
  is a bit pattern used to fill `<reg name>`. If `<hex_value>` is smaller than
  the register width, it will be sign-extended.
- Mark the register as a "live in". :program:`llvm-exegesis` will benchmark
  using whatever value was in this register on entry. This can be done using
  the directive `LLVM-EXEGESIS-LIVEIN <reg name>`.

For example, the following code snippet depends on the values of XMM1 (which
will be set by the tool) and the memory buffer passed in RDI (live in).

.. code-block:: none

    # LLVM-EXEGESIS-LIVEIN RDI
    # LLVM-EXEGESIS-DEFREG XMM1 42
    vmulps (%rdi), %xmm1, %xmm2
    vhaddps %xmm2, %xmm2, %xmm3
    addq $0x10, %rdi


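For anything longer than a one-liner it is convenient to keep the snippet in a
file and pass its path to `-snippets-file`. The following is a minimal sketch:
the helper name `write_snippet` and the path `/tmp/snippet.s` are hypothetical
choices made for this example.

.. code-block:: bash

    # Write the example snippet above to the file named by $1.
    write_snippet() {
      printf '%s\n' \
        '# LLVM-EXEGESIS-LIVEIN RDI' \
        '# LLVM-EXEGESIS-DEFREG XMM1 42' \
        'vmulps (%rdi), %xmm1, %xmm2' \
        'vhaddps %xmm2, %xmm2, %xmm3' \
        'addq $0x10, %rdi' \
        > "$1"
    }

    # Usage:
    #   write_snippet /tmp/snippet.s
    #   llvm-exegesis -mode=latency -snippets-file=/tmp/snippet.s
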
EXAMPLE 3: analysis
-------------------

Assuming you have a set of benchmarked instructions (either latency or uops) as
YAML in file `/tmp/benchmarks.yaml`, you can analyze the results using the
following command:

.. code-block:: bash

    $ llvm-exegesis -mode=analysis \
      -benchmarks-file=/tmp/benchmarks.yaml \
      -analysis-clusters-output-file=/tmp/clusters.csv \
      -analysis-inconsistencies-output-file=/tmp/inconsistencies.html

This will group the instructions into clusters with the same performance
characteristics. The clusters will be written out to `/tmp/clusters.csv` in the
following format:

.. code-block:: none

    cluster_id,opcode_name,config,sched_class
    ...
    2,ADD32ri8_DB,,WriteALU,1.00
    2,ADD32ri_DB,,WriteALU,1.01
    2,ADD32rr,,WriteALU,1.01
    2,ADD32rr_DB,,WriteALU,1.00
    2,ADD32rr_REV,,WriteALU,1.00
    2,ADD64i32,,WriteALU,1.01
    2,ADD64ri32,,WriteALU,1.01
    2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
    2,VPADDQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.02
    2,VPSUBQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.01
    2,ADD64ri8,,WriteALU,1.00
    2,SETBr,,WriteSETCC,1.01
    ...

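Because the clusters file is plain CSV, it can be queried with standard text
tools. As a sketch (the helper name `opcodes_in_cluster` is hypothetical, and
it assumes that opcode names contain no commas), the following lists the
opcodes assigned to one cluster:

.. code-block:: bash

    # Print the opcode names that belong to cluster $1 in the CSV file $2.
    opcodes_in_cluster() {
      # Skip the header row, then match on the first column (cluster_id).
      awk -F, -v id="$1" 'NR > 1 && $1 == id { print $2 }' "$2"
    }

    # Usage:
    #   opcodes_in_cluster 2 /tmp/clusters.csv
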
:program:`llvm-exegesis` will also analyze the clusters to point out
inconsistencies in the scheduling information. The output is an HTML file. For
example, `/tmp/inconsistencies.html` will contain messages like the following:

.. image:: llvm-exegesis-analysis.png
  :align: center

Note that the scheduling class names will be resolved only when
:program:`llvm-exegesis` is compiled in debug mode; otherwise only the class id
will be shown. This does not invalidate any of the analysis results.

OPTIONS
-------

.. option:: -help

  Print a summary of command line options.

.. option:: -opcode-index=<LLVM opcode index>

  Specify the opcode to measure, by index. See example 1 for details.
  Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -opcode-name=<opcode name 1>,<opcode name 2>,...

  Specify the opcode to measure, by name. Several opcodes can be specified as
  a comma-separated list. See example 1 for details.
  Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -snippets-file=<filename>

  Specify the custom code snippet to measure. See example 2 for details.
  Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -mode=[latency|uops|inverse_throughput|analysis]

  Specify the run mode. Note that if you pick `analysis` mode, you also need
  to specify at least one of `-analysis-clusters-output-file=` and
  `-analysis-inconsistencies-output-file=`.

.. option:: -num-repetitions=<Number of repetitions>

  Specify the number of repetitions of the asm snippet.
  Higher values lead to more accurate measurements but lengthen the benchmark.

.. option:: -max-configs-per-opcode=<value>

  Specify the maximum number of configurations that can be generated for each
  opcode. By default this is `1`, meaning that we assume that a single
  measurement is enough to characterize an opcode. This might not be true of
  all instructions: for example, the performance characteristics of the LEA
  instruction on X86 depend on the value of assigned registers and immediates.
  Setting `-max-configs-per-opcode` to a value larger than `1` allows
  `llvm-exegesis` to explore more configurations to discover if some register
  or immediate assignments lead to different performance characteristics.


.. option:: -benchmarks-file=</path/to/file>

  File to read (`analysis` mode) or write (`latency`/`uops`/`inverse_throughput`
  modes) benchmark results. `-` uses stdin/stdout.

.. option:: -analysis-clusters-output-file=</path/to/file>

  If provided, write the analysis clusters as CSV to this file. `-` prints to
  stdout. By default, this analysis is not run.

.. option:: -analysis-inconsistencies-output-file=</path/to/file>

  If non-empty, write inconsistencies found during analysis to this file. `-`
  prints to stdout. By default, this analysis is not run.

.. option:: -analysis-clustering=[dbscan,naive]

  Specify the clustering algorithm to use. By default, DBSCAN is used.
  The naive clustering algorithm is better suited to further work on the
  `-analysis-inconsistencies-output-file=` output: it creates one cluster
  per opcode and checks that the cluster is stable (all points are neighbours).

.. option:: -analysis-numpoints=<dbscan numPoints parameter>

  Specify the numPoints parameter to be used for DBSCAN clustering
  (`analysis` mode, DBSCAN only).

.. option:: -analysis-clustering-epsilon=<dbscan epsilon parameter>

  Specify the epsilon parameter used for clustering of benchmark points
  (`analysis` mode).

.. option:: -analysis-inconsistency-epsilon=<epsilon>

  Specify the epsilon parameter used to detect when a cluster differs from
  the LLVM schedule profile values (`analysis` mode).

.. option:: -analysis-display-unstable-clusters

  If there is more than one benchmark for an opcode, said benchmarks may end up
  not being clustered into the same cluster if the measured performance
  characteristics are different. By default all such opcodes are filtered out.
  This flag will instead show only such unstable opcodes.

.. option:: -ignore-invalid-sched-class=false

  If set, ignore instructions that do not have a sched class (class idx = 0).

.. option:: -mcpu=<cpu name>

  If set, measure the CPU characteristics using the counters for this CPU. This
  is useful when creating new sched models (the host CPU is unknown to LLVM).

.. option:: --dump-object-to-disk=true

  By default, llvm-exegesis will dump the generated code to a temporary file to
  enable code inspection. You may disable it to speed up the execution and save
  disk space.

EXIT STATUS
-----------

:program:`llvm-exegesis` returns 0 on success. Otherwise, an error message is
printed to standard error, and the tool returns a non-zero value.