llvm-mca - LLVM Machine Code Analyzer
=====================================

SYNOPSIS
--------

:program:`llvm-mca` [*options*] [input]

DESCRIPTION
-----------

:program:`llvm-mca` is a performance analysis tool that uses information
available in LLVM (e.g. scheduling models) to statically measure the performance
of machine code on a specific CPU.

Performance is measured in terms of throughput as well as processor resource
consumption. The tool currently works for processors with an out-of-order
backend, for which there is a scheduling model available in LLVM.

The main goal of this tool is not just to predict the performance of the code
when run on the target, but also to help with diagnosing potential performance
issues.

Given an assembly code sequence, :program:`llvm-mca` estimates the IPC
(Instructions Per Cycle), as well as hardware resource pressure. The analysis
and reporting style were inspired by the IACA tool from Intel.
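
As a rough illustration of the headline metric, IPC is the number of
instructions executed divided by the number of cycles taken. The sketch below
computes it for a block simulated over several iterations. The function name
and the sample numbers are hypothetical and are not output from the tool.

.. code-block:: python

  def instructions_per_cycle(instructions_per_iteration, iterations, total_cycles):
      """Return IPC: instructions executed divided by cycles taken."""
      if total_cycles <= 0:
          raise ValueError("total_cycles must be positive")
      total_instructions = instructions_per_iteration * iterations
      return total_instructions / total_cycles

  # Hypothetical numbers: 3 instructions per iteration, 100 iterations, 150 cycles.
  print(instructions_per_cycle(3, 100, 150))  # prints 2.0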

:program:`llvm-mca` allows the usage of special code comments to mark regions of
the assembly code to be analyzed. A comment starting with the substring
``LLVM-MCA-BEGIN`` marks the beginning of a code region. A comment starting with
the substring ``LLVM-MCA-END`` marks the end of a code region. For example:

.. code-block:: none

  # LLVM-MCA-BEGIN My Code Region
  ...
  # LLVM-MCA-END

Multiple regions can be specified provided that they do not overlap. A code
region can have an optional description. If no user-defined region is specified,
then :program:`llvm-mca` assumes a default region which contains every
instruction in the input file. Every region is analyzed in isolation, and the
final performance report is the union of all the reports generated for every
code region.
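
The region rules above can be sketched in a few lines of Python. This is an
illustration only, not the parser used by the tool: ``split_regions`` is a
hypothetical helper, it assumes ``#`` as the comment prefix, it treats every
non-blank, non-comment line as an instruction, and it chooses to reject a region
that is never closed (the text above does not say how that case is handled).

.. code-block:: python

  def split_regions(asm_text, comment_prefix="#"):
      """Split assembly text into (description, instructions) regions."""
      regions = []
      current = None            # the open region, or None outside any region
      all_instructions = []     # used for the default region
      for raw in asm_text.splitlines():
          line = raw.strip()
          if not line:
              continue
          if line.startswith(comment_prefix):
              comment = line[len(comment_prefix):].strip()
              if comment.startswith("LLVM-MCA-BEGIN"):
                  if current is not None:
                      raise ValueError("regions must not overlap")
                  # Any text after the marker is the optional description.
                  description = comment[len("LLVM-MCA-BEGIN"):].strip()
                  current = (description, [])
              elif comment.startswith("LLVM-MCA-END"):
                  if current is None:
                      raise ValueError("LLVM-MCA-END without LLVM-MCA-BEGIN")
                  regions.append(current)
                  current = None
              continue          # comments are never instructions
          all_instructions.append(line)
          if current is not None:
              current[1].append(line)
      if current is not None:
          # Assumption of this sketch: an unterminated region is an error.
          raise ValueError("unterminated region")
      if not regions:
          # No user-defined region: one default region with every instruction.
          regions.append(("", all_instructions))
      return regions

With one marked region, only the instructions between the markers are returned.
With no markers at all, the result is a single region with an empty description.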

OPTIONS
-------

If ``input`` is "``-``" or omitted, :program:`llvm-mca` reads from standard
input. Otherwise, it will read from the specified filename.

If the :option:`-o` option is omitted, then :program:`llvm-mca` will send its output
to standard output if the input is from standard input. If the :option:`-o`
option specifies "``-``", then the output will also be sent to standard output.


.. option:: -help

  Print a summary of command line options.

.. option:: -mtriple=<target triple>

  Specify a target triple string.

.. option:: -march=<arch>

  Specify the architecture for which to analyze the code. It defaults to the
  host default target.

.. option:: -mcpu=<cpuname>

  Specify the processor for which to run the analysis. The default is a
  "generic" processor. The processor is not autodetected for the current
  architecture.

.. option:: -output-asm-variant=<variant id>

  Specify the output assembly variant for the report generated by the tool.
  On x86, possible values are [0, 1]. A value of 0 selects the AT&T assembly
  format, and a value of 1 selects the Intel assembly format, for the code
  printed out by the tool in the analysis report.

.. option:: -dispatch=<width>

  Specify a different dispatch width for the processor. The dispatch width
  defaults to the 'IssueWidth' field in the processor scheduling model. If
  width is zero, then the default dispatch width is used.

.. option:: -register-file-size=<size>

  Specify the size of the register file. When specified, this flag limits how
  many temporary registers are available for register renaming purposes. A value
  of zero for this flag means "unlimited number of temporary registers".

.. option:: -iterations=<number of iterations>

  Specify the number of iterations to run. If this flag is set to 0, then the
  tool sets the number of iterations to a default value (i.e. 100).

.. option:: -noalias=<bool>

  If set, the tool assumes that loads and stores don't alias. This is the
  default behavior.

.. option:: -lqueue=<load queue size>

  Specify the size of the load queue in the load/store unit emulated by the tool.
  By default, the tool assumes an unbounded number of entries in the load queue.
  A value of zero for this flag is ignored, and the default load queue size is
  used instead.

.. option:: -squeue=<store queue size>

  Specify the size of the store queue in the load/store unit emulated by the
  tool. By default, the tool assumes an unbounded number of entries in the store
  queue. A value of zero for this flag is ignored, and the default store queue
  size is used instead.

.. option:: -verbose

  Enable verbose output. In particular, this flag enables a number of extra
  statistics and performance counters for the dispatch logic, the reorder
  buffer, the retire control unit and the register file.

.. option:: -timeline

  Enable the timeline view.

.. option:: -timeline-max-iterations=<iterations>

  Limit the number of iterations to print in the timeline view. By default, the
  timeline view prints information for up to 10 iterations.

.. option:: -timeline-max-cycles=<cycles>

  Limit the number of cycles in the timeline view. By default, the number of
  cycles is set to 80.

.. option:: -resource-pressure

  Enable the resource pressure view. This is enabled by default.

.. option:: -register-file-stats

  Enable register file usage statistics.

.. option:: -dispatch-stats

  Enable extra dispatch statistics. This view collects and analyzes instruction
  dispatch events, as well as static/dynamic dispatch stall events. This view
  is disabled by default.

.. option:: -instruction-info

  Enable the instruction info view. This is enabled by default.

.. option:: -instruction-tables

  Prints resource pressure information based on the static information
  available from the processor model. This differs from the resource pressure
  view because it doesn't require that the code is simulated. It instead prints
  the theoretical uniform distribution of resource pressure for every
  instruction in sequence.
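
To show how these options combine on a command line, here is a minimal sketch
of a script that assembles an argument list. ``build_mca_command`` is a
hypothetical helper, and ``foo.s`` and ``btver2`` are example values only. It
uses only flags documented in this section and passes the output file as a
separate argument after ``-o``.

.. code-block:: python

  def build_mca_command(input_path="-", mcpu=None, mtriple=None,
                        iterations=None, timeline=False, output_path=None):
      """Build an argv list for llvm-mca from a few documented options."""
      argv = ["llvm-mca"]
      if mtriple is not None:
          argv.append(f"-mtriple={mtriple}")
      if mcpu is not None:
          argv.append(f"-mcpu={mcpu}")
      if iterations is not None:
          argv.append(f"-iterations={iterations}")
      if timeline:
          argv.append("-timeline")
      if output_path is not None:
          argv.extend(["-o", output_path])
      # "-" (the default here) means "read from standard input".
      argv.append(input_path)
      return argv

  print(build_mca_command("foo.s", mcpu="btver2", iterations=300, timeline=True))
  # prints ['llvm-mca', '-mcpu=btver2', '-iterations=300', '-timeline', 'foo.s']

Building the list in one place keeps flag spelling consistent across scripts,
and the list can be handed directly to a process launcher without shell quoting.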


EXIT STATUS
-----------

:program:`llvm-mca` returns 0 on success. Otherwise, an error message is printed
to standard error, and the tool returns 1.
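
A calling script can rely on this convention to detect failures. The following
is a minimal sketch, assuming Python's standard ``subprocess`` module.
``run_and_check`` is a hypothetical helper name and is not part of LLVM.

.. code-block:: python

  import subprocess

  def run_and_check(argv):
      """Run a command and return (succeeded, stderr_text).

      An exit status of 0 is treated as success. On failure, the text that
      the command wrote to standard error is returned for reporting.
      """
      result = subprocess.run(argv, capture_output=True, text=True)
      return result.returncode == 0, result.stderr

For example, ``run_and_check(["llvm-mca", "-mcpu=btver2", "foo.s"])`` would
return ``(True, ...)`` on success. The file name and CPU name are placeholders,
and the call raises ``FileNotFoundError`` if the tool is not installed.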