llvm-mca - LLVM Machine Code Analyzer
=====================================

SYNOPSIS
--------

:program:`llvm-mca` [options] [input]

DESCRIPTION
-----------

:program:`llvm-mca` is a performance analysis tool that uses information
available in LLVM (e.g. scheduling models) to statically measure the performance
of machine code on a specific CPU.

Performance is measured in terms of throughput as well as processor resource
consumption. The tool currently works for processors with an out-of-order
backend, for which there is a scheduling model available in LLVM.

The main goal of this tool is not just to predict the performance of the code
when run on the target, but also to help with diagnosing potential performance
issues.

Given an assembly code sequence, :program:`llvm-mca` estimates the IPC
(Instructions Per Cycle), as well as hardware resource pressure. The analysis
and reporting style were inspired by the IACA tool from Intel.
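
As a rough illustration of the metric, IPC is the number of simulated
instructions divided by the number of simulated cycles. The sketch below uses
hypothetical numbers: the 3-instruction block and the 150-cycle total are
assumptions chosen for the arithmetic, not values taken from a real report.
Only the iteration count of 100 matches the default described under OPTIONS.

.. code-block:: bash

  # Hypothetical inputs, chosen only to illustrate the arithmetic.
  instructions_per_iteration=3
  iterations=100      # documented default iteration count
  total_cycles=150    # assumed value, not measured

  total_instructions=$((instructions_per_iteration * iterations))
  ipc=$(awk -v n="$total_instructions" -v c="$total_cycles" \
        'BEGIN { printf "%.2f", n / c }')
  echo "IPC = $ipc"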

:program:`llvm-mca` allows the usage of special code comments to mark regions of
the assembly code to be analyzed. A comment starting with the substring
``LLVM-MCA-BEGIN`` marks the beginning of a code region. A comment starting with
the substring ``LLVM-MCA-END`` marks the end of a code region. For example:

.. code-block:: none

  # LLVM-MCA-BEGIN My Code Region
    ...
  # LLVM-MCA-END

Multiple regions can be specified provided that they do not overlap. A code
region can have an optional description. If no user-defined region is specified,
then :program:`llvm-mca` assumes a default region which contains every
instruction in the input file. Every region is analyzed in isolation, and the
final performance report is the union of all the reports generated for every
code region.
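
A minimal sketch of an input with two non-overlapping regions follows. The file
name ``regions.s``, the region descriptions, and the x86 instructions are
illustrative assumptions. The analysis step runs only if :program:`llvm-mca`
is found on ``PATH``.

.. code-block:: bash

  # Write a small x86 (AT&T syntax) input with two separate regions.
  printf '%s\n' \
    '# LLVM-MCA-BEGIN first' \
    'addl %esi, %edi' \
    '# LLVM-MCA-END' \
    '# LLVM-MCA-BEGIN second' \
    'imull %esi, %edi' \
    '# LLVM-MCA-END' > regions.s

  # Count the markers as a quick sanity check before running the analysis.
  begin_count=$(grep -c 'LLVM-MCA-BEGIN' regions.s)
  end_count=$(grep -c 'LLVM-MCA-END' regions.s)
  echo "regions: $begin_count begin, $end_count end"

  # Analyze the file; each region is analyzed in isolation.
  if command -v llvm-mca >/dev/null 2>&1; then
    llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 regions.s \
      || echo "llvm-mca reported an error" >&2
  fi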

Inline assembly directives may be used from source code to annotate the
assembly text:

.. code-block:: c++

  int foo(int a, int b) {
    __asm volatile("# LLVM-MCA-BEGIN foo");
    a += 42;
    __asm volatile("# LLVM-MCA-END");
    a *= b;
    return a;
  }

For example, you can compile code with clang, output assembly, and pipe it
directly into :program:`llvm-mca` for analysis:

.. code-block:: bash

  $ clang foo.c -O2 -target x86_64-unknown-unknown -S -o - | llvm-mca -mcpu=btver2

OPTIONS
-------

If ``input`` is "``-``" or omitted, :program:`llvm-mca` reads from standard
input. Otherwise, it will read from the specified filename.

If the :option:`-o` option is omitted, then :program:`llvm-mca` will send its output
to standard output if the input is from standard input. If the :option:`-o`
option specifies "``-``", then the output will also be sent to standard output.


.. option:: -help

 Print a summary of command line options.

.. option:: -mtriple=<target triple>

 Specify a target triple string.

.. option:: -march=<arch>

 Specify the architecture for which to analyze the code. It defaults to the
 host default target.

.. option:: -mcpu=<cpuname>

 Specify the processor for which to analyze the code. By default, the CPU name
 is autodetected from the host.

.. option:: -output-asm-variant=<variant id>

 Specify the output assembly variant for the report generated by the tool.
 On x86, possible values are [0, 1]. A value of 0 selects the AT&T assembly
 format, and a value of 1 selects the Intel assembly format, for the code
 printed out by the tool in the analysis report.

.. option:: -dispatch=<width>

 Specify a different dispatch width for the processor. The dispatch width
 defaults to the ``IssueWidth`` field of the processor scheduling model. If
 width is zero, then the default dispatch width is used.

.. option:: -register-file-size=<size>

 Specify the size of the register file. When specified, this flag limits how
 many temporary registers are available for register renaming purposes. A value
 of zero for this flag means "unlimited number of temporary registers".

.. option:: -iterations=<number of iterations>

 Specify the number of iterations to run. If this flag is set to 0, then the
 tool sets the number of iterations to a default value (i.e. 100).

.. option:: -noalias=<bool>

 If set, the tool assumes that loads and stores don't alias. This is the
 default behavior.

.. option:: -lqueue=<load queue size>

 Specify the size of the load queue in the load/store unit emulated by the tool.
 By default, the tool assumes an unbounded number of entries in the load queue.
 A value of zero for this flag is ignored, and the default load queue size is
 used instead.

.. option:: -squeue=<store queue size>

 Specify the size of the store queue in the load/store unit emulated by the
 tool. By default, the tool assumes an unbounded number of entries in the store
 queue. A value of zero for this flag is ignored, and the default store queue
 size is used instead.
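
The flags above can be combined to explore "what if" scenarios. The sketch
below narrows the dispatch width and bounds the register file and the
load/store queues. Every numeric value and the file name ``kernel.s`` are
arbitrary assumptions for illustration, not recommended settings.

.. code-block:: bash

  # A tiny load / add / store sequence (x86-64, AT&T syntax).
  printf '%s\n' \
    'movl (%rdi), %eax' \
    'addl %esi, %eax' \
    'movl %eax, (%rdi)' > kernel.s

  # Arbitrary illustrative limits.
  mca_flags="-dispatch=2 -register-file-size=64 -lqueue=16 -squeue=8 -iterations=50"

  if command -v llvm-mca >/dev/null 2>&1; then
    # $mca_flags is intentionally unquoted so that it splits into separate flags.
    llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 $mca_flags kernel.s \
      || echo "llvm-mca reported an error" >&2
  fi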

.. option:: -timeline

 Enable the timeline view.

.. option:: -timeline-max-iterations=<iterations>

 Limit the number of iterations to print in the timeline view. By default, the
 timeline view prints information for up to 10 iterations.

.. option:: -timeline-max-cycles=<cycles>

 Limit the number of cycles in the timeline view. By default, the number of
 cycles is set to 80.
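
For instance, a short timeline can be requested as in the sketch below. The
file name ``loop.s`` and the chosen limits are illustrative assumptions.

.. code-block:: bash

  printf '%s\n' 'addl %esi, %edi' 'imull %esi, %edi' > loop.s

  if command -v llvm-mca >/dev/null 2>&1; then
    # Simulate 10 iterations, but limit the timeline to 2 iterations and 40 cycles.
    llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 \
      -iterations=10 -timeline \
      -timeline-max-iterations=2 -timeline-max-cycles=40 loop.s \
      || echo "llvm-mca reported an error" >&2
  fi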

.. option:: -resource-pressure

 Enable the resource pressure view. This is enabled by default.

.. option:: -register-file-stats

 Enable register file usage statistics.

.. option:: -dispatch-stats

 Enable extra dispatch statistics. This view collects and analyzes instruction
 dispatch events, as well as static/dynamic dispatch stall events. This view
 is disabled by default.

.. option:: -scheduler-stats

 Enable extra scheduler statistics. This view collects and analyzes instruction
 issue events. This view is disabled by default.

.. option:: -retire-stats

 Enable extra retire control unit statistics. This view is disabled by default.

.. option:: -instruction-info

 Enable the instruction info view. This is enabled by default.

.. option:: -all-stats

 Print all hardware statistics. This enables extra statistics related to the
 dispatch logic, the hardware schedulers, the register file(s), and the retire
 control unit. This option is disabled by default.

.. option:: -all-views

 Enable all the views.

.. option:: -instruction-tables

 Print resource pressure information based on the static information
 available from the processor model. This differs from the resource pressure
 view because it doesn't require that the code is simulated. It instead prints
 the theoretical uniform distribution of resource pressure for every
 instruction in sequence.


EXIT STATUS
-----------

:program:`llvm-mca` returns 0 on success. Otherwise, an error message is printed
to standard error, and the tool returns 1.
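
A script can branch on this exit status, as in the minimal sketch below. The
file names ``input.s`` and ``report.txt`` and the ``status`` variable are
illustrative assumptions.

.. code-block:: bash

  printf '%s\n' 'addl %esi, %edi' > input.s

  if ! command -v llvm-mca >/dev/null 2>&1; then
    status="skipped"   # llvm-mca is not installed
  elif llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -o report.txt input.s; then
    status="ok"        # exit status 0
  else
    status="failed"    # non-zero exit status; details are on standard error
  fi
  echo "llvm-mca analysis: $status"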