llvm-mca - LLVM Machine Code Analyzer
=====================================

SYNOPSIS
--------

:program:`llvm-mca` [*options*] [input]

DESCRIPTION
-----------

:program:`llvm-mca` is a performance analysis tool that uses information
available in LLVM (e.g. scheduling models) to statically measure the performance
of machine code on a specific CPU.

Performance is measured in terms of throughput as well as processor resource
consumption. The tool currently works for processors with an out-of-order
backend, for which there is a scheduling model available in LLVM.

The main goal of this tool is not just to predict the performance of the code
when run on the target, but also to help diagnose potential performance
issues.

Given an assembly code sequence, :program:`llvm-mca` estimates the IPC
(Instructions Per Cycle), as well as hardware resource pressure. The analysis
and reporting style were inspired by the IACA tool from Intel.

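As a rough illustration of the metric, IPC is the number of instructions
executed divided by the number of cycles they took. The sketch below is not
part of :program:`llvm-mca`; the function name ``ipc`` and the sample figures
are made up for illustration only.

.. code-block:: python

  def ipc(instructions: int, cycles: int) -> float:
      """Return instructions per cycle for a simulated run."""
      if cycles <= 0:
          raise ValueError("cycles must be positive")
      return instructions / cycles

  # Hypothetical figures: a 3-instruction block simulated for 100 iterations
  # that completes in 150 cycles.
  print(ipc(3 * 100, 150))  # 2.0
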
:program:`llvm-mca` allows the usage of special code comments to mark regions of
the assembly code to be analyzed. A comment starting with substring
``LLVM-MCA-BEGIN`` marks the beginning of a code region. A comment starting with
substring ``LLVM-MCA-END`` marks the end of a code region. For example:

.. code-block:: none

  # LLVM-MCA-BEGIN My Code Region
    ...
  # LLVM-MCA-END

Multiple regions can be specified provided that they do not overlap. A code
region can have an optional description. If no user-defined region is specified,
then :program:`llvm-mca` assumes a default region which contains every
instruction in the input file. Every region is analyzed in isolation, and the
final performance report is the union of all the reports generated for every
code region.

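The region rules can be sketched in a few lines of Python. This is only an
illustration of the rules as described here, not the parser used by
:program:`llvm-mca`. The name ``split_regions`` is hypothetical, ``#`` is
assumed to be the comment prefix, every non-blank, non-comment line is treated
as an instruction, and an unterminated region is treated as an error (an
assumption of this sketch).

.. code-block:: python

  BEGIN = "LLVM-MCA-BEGIN"
  END = "LLVM-MCA-END"

  def split_regions(asm, comment="#"):
      """Return a list of (description, instructions) pairs, one per region."""
      regions = []
      everything = []     # all instructions, used for the default region
      current = None      # (description, instructions) of the open region
      saw_marker = False
      for raw in asm.splitlines():
          line = raw.strip()
          if not line:
              continue
          if line.startswith(comment):
              text = line[len(comment):].strip()
              if text.startswith(BEGIN):
                  if current is not None:
                      raise ValueError("regions must not overlap")
                  saw_marker = True
                  # Whatever follows the marker is the optional description.
                  current = (text[len(BEGIN):].strip(), [])
              elif text.startswith(END):
                  if current is None:
                      raise ValueError("LLVM-MCA-END without LLVM-MCA-BEGIN")
                  regions.append(current)
                  current = None
              continue    # comments are never instructions
          everything.append(line)
          if current is not None:
              current[1].append(line)
      if current is not None:
          raise ValueError("unterminated region")  # assumption of this sketch
      # With no markers at all, fall back to one default region.
      return regions if saw_marker else [("", everything)]

  asm = """\
  # LLVM-MCA-BEGIN foo
  addl $42, %edi
  # LLVM-MCA-END
  imull %esi, %edi
  """
  print(split_regions(asm))  # [('foo', ['addl $42, %edi'])]

Instructions outside any marked region are ignored once at least one region is
present, which mirrors the default-region rule above.
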
Inline assembly directives may be used from source code to annotate the
assembly text:

.. code-block:: c++

  int foo(int a, int b) {
    __asm volatile("# LLVM-MCA-BEGIN foo");
    a += 42;
    __asm volatile("# LLVM-MCA-END");
    a *= b;
    return a;
  }

For example, you can compile code with clang, output assembly, and pipe it
directly into :program:`llvm-mca` for analysis:

.. code-block:: bash

  $ clang foo.c -O2 -target x86_64-unknown-unknown -S -o - | llvm-mca -mcpu=btver2

Or for Intel syntax:

.. code-block:: bash

  $ clang foo.c -O2 -target x86_64-unknown-unknown -mllvm -x86-asm-syntax=intel -S -o - | llvm-mca -mcpu=btver2

OPTIONS
-------

If ``input`` is "``-``" or omitted, :program:`llvm-mca` reads from standard
input. Otherwise, it will read from the specified filename.

If the :option:`-o` option is omitted, then :program:`llvm-mca` will send its output
to standard output if the input is from standard input. If the :option:`-o`
option specifies "``-``", then the output will also be sent to standard output.

.. option:: -help

  Print a summary of command line options.

.. option:: -mtriple=<target triple>

  Specify a target triple string.

.. option:: -march=<arch>

  Specify the architecture for which to analyze the code. It defaults to the
  host default target.

.. option:: -mcpu=<cpuname>

  Specify the processor for which to analyze the code. By default, the cpu name
  is autodetected from the host.

.. option:: -output-asm-variant=<variant id>

  Specify the output assembly variant for the report generated by the tool.
  On x86, possible values are [0, 1]. A value of 0 selects the AT&T assembly
  format, and a value of 1 selects the Intel assembly format, for the code
  printed out by the tool in the analysis report.

.. option:: -dispatch=<width>

  Specify a different dispatch width for the processor. The dispatch width
  defaults to field 'IssueWidth' in the processor scheduling model. If width is
  zero, then the default dispatch width is used.

.. option:: -register-file-size=<size>

  Specify the size of the register file. When specified, this flag limits how
  many temporary registers are available for register renaming purposes. A value
  of zero for this flag means "unlimited number of temporary registers".

.. option:: -iterations=<number of iterations>

  Specify the number of iterations to run. If this flag is set to 0, then the
  tool sets the number of iterations to a default value (i.e. 100).

.. option:: -noalias=<bool>

  If set, the tool assumes that loads and stores don't alias. This is the
  default behavior.

.. option:: -lqueue=<load queue size>

  Specify the size of the load queue in the load/store unit emulated by the tool.
  By default, the tool assumes an unbounded number of entries in the load queue.
  A value of zero for this flag is ignored, and the default load queue size is
  used instead.

.. option:: -squeue=<store queue size>

  Specify the size of the store queue in the load/store unit emulated by the
  tool. By default, the tool assumes an unbounded number of entries in the store
  queue. A value of zero for this flag is ignored, and the default store queue
  size is used instead.

.. option:: -timeline

  Enable the timeline view.

.. option:: -timeline-max-iterations=<iterations>

  Limit the number of iterations to print in the timeline view. By default, the
  timeline view prints information for up to 10 iterations.

.. option:: -timeline-max-cycles=<cycles>

  Limit the number of cycles in the timeline view. By default, the number of
  cycles is set to 80.

.. option:: -resource-pressure

  Enable the resource pressure view. This is enabled by default.

.. option:: -register-file-stats

  Enable register file usage statistics.

.. option:: -dispatch-stats

  Enable extra dispatch statistics. This view collects and analyzes instruction
  dispatch events, as well as static/dynamic dispatch stall events. This view
  is disabled by default.

.. option:: -scheduler-stats

  Enable extra scheduler statistics. This view collects and analyzes instruction
  issue events. This view is disabled by default.

.. option:: -retire-stats

  Enable extra retire control unit statistics. This view is disabled by default.

.. option:: -instruction-info

  Enable the instruction info view. This is enabled by default.

.. option:: -all-stats

  Print all hardware statistics. This enables extra statistics related to the
  dispatch logic, the hardware schedulers, the register file(s), and the retire
  control unit. This option is disabled by default.

.. option:: -all-views

  Enable all the views.

.. option:: -instruction-tables

  Print resource pressure information based on the static information
  available from the processor model. This differs from the resource pressure
  view because it doesn't require that the code is simulated. It instead prints
  the theoretical uniform distribution of resource pressure for every
  instruction in sequence.

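When the tool is driven from a script, the options above can be assembled into
an argument list. The following is a minimal sketch: the helper name
``build_command`` and the file name ``foo.s`` are hypothetical, and only flags
documented in this section are used.

.. code-block:: python

  def build_command(input_path="-", mcpu=None, iterations=None,
                    timeline=False, all_stats=False):
      """Build an llvm-mca argument list from a few documented options."""
      cmd = ["llvm-mca"]
      if mcpu is not None:
          cmd.append(f"-mcpu={mcpu}")
      if iterations is not None:
          cmd.append(f"-iterations={iterations}")
      if timeline:
          cmd.append("-timeline")
      if all_stats:
          cmd.append("-all-stats")
      # "-" asks the tool to read the assembly from standard input.
      cmd.append(input_path)
      return cmd

  print(build_command("foo.s", mcpu="btver2", iterations=300, timeline=True))
  # ['llvm-mca', '-mcpu=btver2', '-iterations=300', '-timeline', 'foo.s']

Keeping the command as a list, rather than a single string, avoids shell
quoting issues when it is later passed to a process launcher.
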
EXIT STATUS
-----------

:program:`llvm-mca` returns 0 on success. Otherwise, an error message is printed
to standard error, and the tool returns 1.

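A calling script can rely on this convention to detect failures. The sketch
below assumes Python 3.7 or later; the helper name ``run_analysis`` is
hypothetical, and the command to run is passed in by the caller rather than
hard-coded.

.. code-block:: python

  import subprocess

  def run_analysis(cmd, asm_text):
      """Run cmd with asm_text on standard input and return its report.

      Raises RuntimeError, carrying the text printed to standard error,
      when the command exits with a non-zero status.
      """
      result = subprocess.run(cmd, input=asm_text,
                              capture_output=True, text=True)
      if result.returncode != 0:
          raise RuntimeError(
              f"analysis failed ({result.returncode}): {result.stderr.strip()}")
      return result.stdout

  # Example call, assuming llvm-mca is installed and on the PATH:
  #   report = run_analysis(["llvm-mca", "-mcpu=btver2"], asm_text)
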