llvm-exegesis - LLVM Machine Instruction Benchmark
==================================================

SYNOPSIS
--------

:program:`llvm-exegesis` [*options*]

DESCRIPTION
-----------

:program:`llvm-exegesis` is a benchmarking tool that uses information available
in LLVM to measure host machine instruction characteristics like latency or port
decomposition.

Given an LLVM opcode name and a benchmarking mode, :program:`llvm-exegesis`
generates a code snippet that makes execution as serial (resp. as parallel) as
possible so that we can measure the latency (resp. uop decomposition) of the
instruction.
The code snippet is jitted and executed on the host subtarget. The time taken
(resp. resource usage) is measured using hardware performance counters. The
result is printed out as YAML to the standard output.

The main goal of this tool is to automatically (in)validate LLVM's TableGen
scheduling models. To that end, we also provide analysis of the results.

:program:`llvm-exegesis` can also benchmark arbitrary user-provided code
snippets.

EXAMPLE 1: benchmarking instructions
------------------------------------

Assume you have an X86-64 machine. To measure the latency of a single
instruction, run:

.. code-block:: bash

    $ llvm-exegesis -mode=latency -opcode-name=ADD64rr

Measuring the uop decomposition of an instruction works similarly:

.. code-block:: bash

    $ llvm-exegesis -mode=uops -opcode-name=ADD64rr

The output is a YAML document (the default is to write to stdout, but you can
redirect the output to a file using `-benchmarks-file`):

.. code-block:: none

    ---
    key:
      opcode_name: ADD64rr
      mode: latency
      config: ''
    cpu_name: haswell
    llvm_triple: x86_64-unknown-linux-gnu
    num_repetitions: 10000
    measurements:
      - { key: latency, value: 1.0058, debug_string: '' }
    error: ''
    info: 'explicit self cycles, selecting one aliasing configuration.
      Snippet:
      ADD64rr R8, R8, R10
      '
    ...

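
Because each result is plain YAML text, simple text tools are often enough to
pull out a single value. The following is a minimal sketch, assuming the
`measurements` entry is printed on one line as in the output above. The helper
name `extract_latency` and the sample path are illustrative only and are not
part of :program:`llvm-exegesis`.

.. code-block:: bash

    # Hypothetical helper: print the latency value(s) found in a benchmark file.
    extract_latency() {
      sed -n 's/.*key: latency, value: \([0-9.]*\).*/\1/p' "$1"
    }

    # Sample input abridged from the output shown above.
    cat > /tmp/sample-benchmark.yaml <<'EOF'
    ---
    key:
      opcode_name: ADD64rr
      mode: latency
    measurements:
      - { key: latency, value: 1.0058, debug_string: '' }
    ...
    EOF

    extract_latency /tmp/sample-benchmark.yaml   # prints 1.0058

For anything beyond a quick look, a real YAML parser is likely a more robust
choice than `sed`.
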
To measure the latency of all instructions for the host architecture, run:

.. code-block:: bash

    #!/bin/bash
    readonly INSTRUCTIONS=$(($(grep INSTRUCTION_LIST_END build/lib/Target/X86/X86GenInstrInfo.inc | cut -f2 -d=) - 1))
    for INSTRUCTION in $(seq 1 ${INSTRUCTIONS});
    do
      ./build/bin/llvm-exegesis -mode=latency -opcode-index=${INSTRUCTION} | sed -n '/---/,$p'
    done

FIXME: Provide an :program:`llvm-exegesis` option to test all instructions.

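
The loop above prints every result to stdout. One possible variant, sketched
here under assumptions, appends each result to a single file so that the whole
set can be inspected later. The names `EXEGESIS`, `OUT` and `benchmark_range`
are illustrative and not part of the tool, and the sketch assumes that a file
holding several YAML documents back to back is acceptable for your purposes.

.. code-block:: bash

    #!/bin/bash
    # Illustrative defaults; adjust them to your build tree.
    EXEGESIS="${EXEGESIS:-./build/bin/llvm-exegesis}"
    OUT="${OUT:-/tmp/benchmarks.yaml}"

    # Benchmark opcode indices FIRST..LAST and append each YAML document to OUT.
    benchmark_range() {
      local first=$1 last=$2 index
      : > "${OUT}"   # start from an empty file
      for index in $(seq "${first}" "${last}"); do
        "${EXEGESIS}" -mode=latency -opcode-index="${index}" \
          | sed -n '/---/,$p' >> "${OUT}"
      done
    }

    # Example: benchmark_range 1 "${INSTRUCTIONS}"

Here `INSTRUCTIONS` would be computed as in the script above. Keeping the range
as arguments makes it easy to try a handful of opcodes before a full run.
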

EXAMPLE 2: benchmarking a custom code snippet
---------------------------------------------

To measure the latency/uops of a custom piece of code, you can specify the
`snippets-file` option (`-` reads from standard input).

.. code-block:: bash

    $ echo "vzeroupper" | llvm-exegesis -mode=uops -snippets-file=-

Real-life code snippets typically depend on registers or memory.
:program:`llvm-exegesis` checks the liveness of registers (i.e. any register
use has a corresponding def or is a "live in"). If your code depends on the
value of some registers, you have two options:

- Mark the register as requiring a definition. :program:`llvm-exegesis` will
  automatically assign a value to the register. This can be done using the
  directive `LLVM-EXEGESIS-DEFREG <reg name> <hex_value>`, where `<hex_value>`
  is a bit pattern used to fill `<reg name>`. If `<hex_value>` is smaller than
  the register width, it will be sign-extended.
- Mark the register as a "live in". :program:`llvm-exegesis` will benchmark
  using whatever value was in this register on entry. This can be done using
  the directive `LLVM-EXEGESIS-LIVEIN <reg name>`.

For example, the following code snippet depends on the values of XMM1 (which
will be set by the tool) and the memory buffer passed in RDI (live in).

.. code-block:: none

    # LLVM-EXEGESIS-LIVEIN RDI
    # LLVM-EXEGESIS-DEFREG XMM1 42
    vmulps (%rdi), %xmm1, %xmm2
    vhaddps %xmm2, %xmm2, %xmm3
    addq $0x10, %rdi

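
To try such a snippet, it can be saved to a file and passed with
`-snippets-file`. The sketch below assumes :program:`llvm-exegesis` is on your
`PATH`; the path `/tmp/snippet.s` is an arbitrary choice.

.. code-block:: bash

    # Write the annotated snippet to a file (the path is arbitrary).
    cat > /tmp/snippet.s <<'EOF'
    # LLVM-EXEGESIS-LIVEIN RDI
    # LLVM-EXEGESIS-DEFREG XMM1 42
    vmulps (%rdi), %xmm1, %xmm2
    vhaddps %xmm2, %xmm2, %xmm3
    addq $0x10, %rdi
    EOF

    # Benchmark it only if the tool is available on this machine.
    if command -v llvm-exegesis > /dev/null; then
      llvm-exegesis -mode=latency -snippets-file=/tmp/snippet.s
    fi
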

EXAMPLE 3: analysis
-------------------

Assuming you have a set of benchmarked instructions (either latency or uops) as
YAML in file `/tmp/benchmarks.yaml`, you can analyze the results using the
following command:

.. code-block:: bash

    $ llvm-exegesis -mode=analysis \
        -benchmarks-file=/tmp/benchmarks.yaml \
        -analysis-clusters-output-file=/tmp/clusters.csv \
        -analysis-inconsistencies-output-file=/tmp/inconsistencies.html

This will group the instructions into clusters with the same performance
characteristics. The clusters will be written out to `/tmp/clusters.csv` in the
following format:

.. code-block:: none

    cluster_id,opcode_name,config,sched_class
    ...
    2,ADD32ri8_DB,,WriteALU,1.00
    2,ADD32ri_DB,,WriteALU,1.01
    2,ADD32rr,,WriteALU,1.01
    2,ADD32rr_DB,,WriteALU,1.00
    2,ADD32rr_REV,,WriteALU,1.00
    2,ADD64i32,,WriteALU,1.01
    2,ADD64ri32,,WriteALU,1.01
    2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
    2,VPADDQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.02
    2,VPSUBQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.01
    2,ADD64ri8,,WriteALU,1.00
    2,SETBr,,WriteSETCC,1.01
    ...

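
Since the clusters file is plain CSV, standard tools can summarize it. As a
rough sketch, assuming the column order shown above (cluster id first,
scheduling class fourth), the following counts how many opcodes of each
scheduling class landed in each cluster. The helper name `count_classes` and
the sample file are made up for illustration.

.. code-block:: bash

    # Hypothetical helper: count opcodes per (cluster_id, sched_class) pair.
    # The header line and any line with fewer than four fields are skipped.
    count_classes() {
      awk -F, 'NR > 1 && NF >= 4 { n[$1 "," $4]++ }
               END { for (k in n) print k "," n[k] }' "$1" | sort
    }

    # Small sample taken from the rows shown above.
    cat > /tmp/clusters-sample.csv <<'EOF'
    cluster_id,opcode_name,config,sched_class
    2,ADD32rr,,WriteALU,1.01
    2,ADD64ri8,,WriteALU,1.00
    2,SETBr,,WriteSETCC,1.01
    EOF

    count_classes /tmp/clusters-sample.csv

A cluster that mixes several scheduling classes, as cluster 2 does here, is the
kind of situation the inconsistency analysis is meant to help examine.
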
:program:`llvm-exegesis` will also analyze the clusters to point out
inconsistencies in the scheduling information. The output is an HTML file. For
example, `/tmp/inconsistencies.html` will contain messages like the following:

.. image:: llvm-exegesis-analysis.png
  :align: center

Note that the scheduling class names will be resolved only when
:program:`llvm-exegesis` is compiled in debug mode; otherwise only the class id
will be shown. This does not invalidate any of the analysis results though.


OPTIONS
-------

.. option:: -help

 Print a summary of command line options.

.. option:: -opcode-index=<LLVM opcode index>

 Specify the opcode to measure, by index. See example 1 for details.
 Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -opcode-name=<LLVM opcode name>

 Specify the opcode to measure, by name. See example 1 for details.
 Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -snippets-file=<filename>

 Specify the custom code snippet to measure. See example 2 for details.
 Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -mode=[latency|uops|analysis]

 Specify the run mode.

.. option:: -num-repetitions=<Number of repetitions>

 Specify the number of repetitions of the asm snippet.
 Higher values lead to more accurate measurements but lengthen the benchmark.

.. option:: -benchmarks-file=</path/to/file>

 File to read (`analysis` mode) or write (`latency`/`uops` modes) benchmark
 results. `-` uses stdin/stdout.

.. option:: -analysis-clusters-output-file=</path/to/file>

 If provided, write the analysis clusters as CSV to this file. `-` prints to
 stdout.

.. option:: -analysis-inconsistencies-output-file=</path/to/file>

 If non-empty, write inconsistencies found during analysis to this file. `-`
 prints to stdout.

.. option:: -analysis-numpoints=<dbscan numPoints parameter>

 Specify the numPoints parameter to be used for DBSCAN clustering
 (`analysis` mode).

.. option:: -analysis-epsilon=<dbscan epsilon parameter>

 Specify the epsilon parameter to be used for DBSCAN clustering
 (`analysis` mode).

.. option:: -ignore-invalid-sched-class=false

 If set, ignore instructions that do not have a sched class (class idx = 0).


EXIT STATUS
-----------

:program:`llvm-exegesis` returns 0 on success. Otherwise, an error message is
printed to standard error, and the tool returns a non-zero value.

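
In scripts, the exit status can be used to tell failed runs apart from
successful ones. A minimal sketch follows; the helper name `run_checked` is
hypothetical and not part of the tool.

.. code-block:: bash

    # Hypothetical helper: run a command and report a non-zero exit status.
    run_checked() {
      "$@"
      local rc=$?
      if [ "${rc}" -ne 0 ]; then
        echo "command failed with status ${rc}: $*" >&2
      fi
      return "${rc}"
    }

    # Example: run_checked llvm-exegesis -mode=latency -opcode-name=ADD64rr
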