llvm-mca - LLVM Machine Code Analyzer
=====================================

SYNOPSIS
--------

:program:`llvm-mca` [*options*] [input]

DESCRIPTION
-----------

:program:`llvm-mca` is a performance analysis tool that uses information
available in LLVM (e.g. scheduling models) to statically measure the performance
of machine code on a specific CPU.

Performance is measured in terms of throughput as well as processor resource
consumption. The tool currently works for processors with an out-of-order
backend for which a scheduling model is available in LLVM.

The main goal of this tool is not just to predict the performance of the code
when run on the target, but also to help with diagnosing potential performance
issues.

Given an assembly code sequence, :program:`llvm-mca` estimates the IPC
(Instructions Per Cycle), as well as hardware resource pressure. The analysis
and reporting style were inspired by the IACA tool from Intel.
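
As a rough illustration of the IPC figure mentioned above, the following Python
sketch divides a total instruction count by a total cycle count. The function
name and the sample numbers are hypothetical; they are not taken from
:program:`llvm-mca` itself.

```python
def instructions_per_cycle(total_instructions: int, total_cycles: int) -> float:
    """Return IPC: instructions executed divided by the cycles they took."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return total_instructions / total_cycles


# Hypothetical numbers: a 3-instruction block simulated for 100 iterations
# that completes in 150 cycles.
ipc = instructions_per_cycle(3 * 100, 150)
print(ipc)
```

A higher IPC means more instructions are completed per cycle for the same code
sequence.
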

OPTIONS
-------

If ``input`` is "``-``" or omitted, :program:`llvm-mca` reads from standard
input. Otherwise, it will read from the specified filename.

If the :option:`-o` option is omitted, then :program:`llvm-mca` will send its output
to standard output if the input is from standard input. If the :option:`-o`
option specifies "``-``", then the output will also be sent to standard output.
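
The input handling described above can be sketched from a Python driver script.
This is a minimal sketch, assuming :program:`llvm-mca` is available on ``PATH``;
the helper names ``build_command`` and ``analyze`` are hypothetical, and
``btver2`` is only an example CPU name. Only options documented on this page
are used.

```python
import subprocess
from typing import List, Optional


def build_command(input_path: Optional[str] = None,
                  mcpu: Optional[str] = None,
                  iterations: Optional[int] = None,
                  timeline: bool = False) -> List[str]:
    """Assemble an llvm-mca argument list from options documented here."""
    cmd = ["llvm-mca"]  # assumption: the tool is on PATH
    if mcpu is not None:
        cmd.append(f"-mcpu={mcpu}")
    if iterations is not None:
        cmd.append(f"-iterations={iterations}")
    if timeline:
        cmd.append("-timeline")
    # "-" (or no input argument at all) means "read from standard input".
    cmd.append(input_path if input_path is not None else "-")
    return cmd


def analyze(assembly: str, mcpu: Optional[str] = None) -> str:
    """Pipe assembly text to llvm-mca on stdin and return the report text."""
    result = subprocess.run(build_command(mcpu=mcpu), input=assembly,
                            capture_output=True, text=True, check=True)
    return result.stdout


# Building the command does not require the tool to be installed.
print(build_command(mcpu="btver2", iterations=10))
```

Calling ``analyze`` actually runs the tool; because the input comes from
standard input and :option:`-o` is not given, the report arrives on standard
output.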


.. option:: -help

 Print a summary of command line options.

.. option:: -mtriple=<target triple>

 Specify a target triple string.

.. option:: -march=<arch>

 Specify the architecture for which to analyze the code. It defaults to the
 host default target.

.. option:: -mcpu=<cpuname>

 Specify the processor for which to run the analysis. This defaults to a
 "generic" processor; it is not autodetected from the current architecture.

.. option:: -output-asm-variant=<variant id>

 Specify the output assembly variant for the report generated by the tool.
 On x86, possible values are [0, 1]. A value of 0 enables the AT&T assembly
 format, and a value of 1 enables the Intel assembly format, for the code
 printed out by the tool in the analysis report.

.. option:: -dispatch=<width>

 Specify a different dispatch width for the processor. The dispatch width
 defaults to the 'IssueWidth' field of the processor scheduling model. If width
 is zero, then the default dispatch width is used.

.. option:: -register-file-size=<size>

 Specify the size of the register file. When specified, this flag limits how
 many temporary registers are available for register renaming purposes. A value
 of zero for this flag means "unlimited number of temporary registers".

.. option:: -iterations=<number of iterations>

 Specify the number of iterations to run. If this flag is set to 0, then the
 tool sets the number of iterations to a default value (i.e. 70).

.. option:: -noalias=<bool>

 If set, the tool assumes that loads and stores don't alias. This is the
 default behavior.

.. option:: -lqueue=<load queue size>

 Specify the size of the load queue in the load/store unit emulated by the tool.
 By default, the tool assumes an unbounded number of entries in the load queue.
 A value of zero for this flag is ignored, and the default load queue size is
 used instead.

.. option:: -squeue=<store queue size>

 Specify the size of the store queue in the load/store unit emulated by the
 tool. By default, the tool assumes an unbounded number of entries in the store
 queue. A value of zero for this flag is ignored, and the default store queue
 size is used instead.
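
Several of the flags above treat zero as "use the default". The following
Python sketch restates that rule as described on this page; it is not the
tool's implementation. The function names are hypothetical, and ``None`` is
used here as a stand-in for an unbounded queue.

```python
from typing import Optional

# Default stated in the -iterations description above.
DEFAULT_ITERATIONS = 70


def effective_iterations(flag_value: int) -> int:
    """-iterations=0 selects the default iteration count."""
    return DEFAULT_ITERATIONS if flag_value == 0 else flag_value


def effective_dispatch_width(flag_value: int, issue_width: int) -> int:
    """-dispatch=0 falls back to the scheduling model's IssueWidth."""
    return issue_width if flag_value == 0 else flag_value


def effective_queue_size(flag_value: int) -> Optional[int]:
    """-lqueue=0 / -squeue=0 keep the default, unbounded queue (None here)."""
    return None if flag_value == 0 else flag_value


print(effective_iterations(0),
      effective_dispatch_width(0, issue_width=4),
      effective_queue_size(0))
```

The ``issue_width=4`` argument is an arbitrary example value, not a property of
any particular processor model.
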

.. option:: -verbose

 Enable verbose output. In particular, this flag enables a number of extra
 statistics and performance counters for the dispatch logic, the reorder
 buffer, the retire control unit and the register file.

.. option:: -timeline

 Enable the timeline view.

.. option:: -timeline-max-iterations=<iterations>

 Limit the number of iterations to print in the timeline view. By default, the
 timeline view prints information for up to 10 iterations.

.. option:: -timeline-max-cycles=<cycles>

 Limit the number of cycles in the timeline view. By default, the number of
 cycles is set to 80.

.. option:: -resource-pressure

 Enable the resource pressure view. This is enabled by default.

.. option:: -register-file-stats

 Enable register file usage statistics.

.. option:: -instruction-info

 Enable the instruction info view. This is enabled by default.

.. option:: -instruction-tables

 Prints resource pressure information based on the static information
 available from the processor model. This differs from the resource pressure
 view because it doesn't require that the code is simulated. It instead prints
 the theoretical uniform distribution of resource pressure for every
 instruction in sequence.
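
One way to read "theoretical uniform distribution" is that, when an instruction
can execute on any of several equivalent units, its resource cycles are split
evenly among them instead of being assigned by a simulation. The sketch below
illustrates that reading only; the function name and the unit names ``ALU0``
and ``ALU1`` are hypothetical.

```python
from typing import Dict, List


def uniform_pressure(cycles: float, units: List[str]) -> Dict[str, float]:
    """Split an instruction's resource cycles evenly across candidate units."""
    if not units:
        raise ValueError("at least one unit is required")
    share = cycles / len(units)
    return {unit: share for unit in units}


# An instruction that consumes 1 cycle on either of two equivalent units.
print(uniform_pressure(1.0, ["ALU0", "ALU1"]))
```
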


EXIT STATUS
-----------

:program:`llvm-mca` returns 0 on success. Otherwise, an error message is printed
to standard error, and the tool returns 1.
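
A calling script can rely on this convention to detect failures. The sketch
below is a minimal example with a hypothetical helper, ``run_and_check``; a
small Python one-liner stands in for a failing :program:`llvm-mca` invocation
so that the example runs without the tool installed.

```python
import subprocess
import sys
from typing import List, Tuple


def run_and_check(cmd: List[str]) -> Tuple[bool, str]:
    """Run cmd and return (succeeded, text written to standard error)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0, result.stderr


# Stand-in for a failing tool: writes a message to stderr and exits with 1.
# Replace with an llvm-mca command line in real use.
failing = [sys.executable, "-c",
           "import sys; sys.stderr.write('error: bad input'); sys.exit(1)"]
ok, message = run_and_check(failing)
print(ok, message)
```
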