llvm-mca - LLVM Machine Code Analyzer
=====================================

SYNOPSIS
--------

:program:`llvm-mca` [options] [input]

DESCRIPTION
-----------

:program:`llvm-mca` is a performance analysis tool that uses information
available in LLVM (e.g. scheduling models) to statically measure the performance
of machine code on a specific CPU.

Performance is measured in terms of throughput as well as processor resource
consumption. The tool currently works for processors with an out-of-order
backend, for which there is a scheduling model available in LLVM.

The main goal of this tool is not just to predict the performance of the code
when run on the target, but also to help with diagnosing potential performance
issues.

Given an assembly code sequence, llvm-mca estimates the IPC (Instructions Per
Cycle), as well as hardware resource pressure. The analysis and reporting style
were inspired by the IACA tool from Intel.
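
As a rough illustration of the IPC metric, the sketch below computes it as the
ratio of executed instructions to elapsed cycles. The function name and the
sample numbers are hypothetical and are not taken from the tool's output.

.. code-block:: python

  def instructions_per_cycle(instructions: int, cycles: int) -> float:
      """Return IPC, assuming the usual definition: instructions / cycles."""
      if cycles <= 0:
          raise ValueError("cycles must be positive")
      return instructions / cycles

  # Hypothetical figures: 300 instructions completing in 150 cycles.
  ipc = instructions_per_cycle(300, 150)
  print(f"IPC: {ipc:.2f}")

A higher IPC means more instructions complete per cycle on the modeled
processor.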

OPTIONS
-------

If ``input`` is "``-``" or omitted, :program:`llvm-mca` reads from standard
input. Otherwise, it will read from the specified filename.

If the :option:`-o` option is omitted, then :program:`llvm-mca` will send its output
to standard output if the input is from standard input. If the :option:`-o`
option specifies "``-``", then the output will also be sent to standard output.


.. option:: -help

Print a summary of command line options.

.. option:: -mtriple=<target triple>

Specify a target triple string.

.. option:: -march=<arch>

Specify the architecture for which to analyze the code. It defaults to the
host default target.

.. option:: -mcpu=<cpuname>

Specify the processor for which to run the analysis. The default is a
"generic" processor; the CPU is not autodetected from the host.
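
Because the processor is not autodetected, scripts that drive the tool usually
need to pass the target explicitly. The following is a minimal sketch of a
helper that assembles such a command line. The helper name
``build_mca_command`` is hypothetical, and the triple, CPU name and file name
in the usage line are example values only.

.. code-block:: python

  from typing import List, Optional

  def build_mca_command(mtriple: Optional[str] = None,
                        mcpu: Optional[str] = None,
                        input_path: str = "-") -> List[str]:
      """Build an llvm-mca argument list (hypothetical helper).

      An input_path of "-" asks the tool to read from standard input.
      """
      argv = ["llvm-mca"]
      if mtriple is not None:
          argv.append(f"-mtriple={mtriple}")
      if mcpu is not None:
          argv.append(f"-mcpu={mcpu}")
      argv.append(input_path)
      return argv

  # Example values; substitute the triple, CPU and file you actually target.
  print(build_mca_command("x86_64-unknown-unknown", "btver2", "kernel.s"))

Omitting ``mcpu`` in this sketch leaves the choice to the tool, which then
falls back to the "generic" processor described above.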

.. option:: -output-asm-variant=<variant id>

Specify the output assembly variant for the report generated by the tool.
On x86, possible values are [0, 1]. A value of 0 selects the AT&T assembly
format for the code printed in the analysis report; a value of 1 selects the
Intel format.

.. option:: -dispatch=<width>

Specify a different dispatch width for the processor. The dispatch width
defaults to the ``IssueWidth`` field in the processor scheduling model. If width
is zero, then the default dispatch width is used.

.. option:: -register-file-size=<size>

Specify the size of the register file. When specified, this flag limits how
many temporary registers are available for register renaming purposes. A value
of zero for this flag means "unlimited number of temporary registers".

.. option:: -iterations=<number of iterations>

Specify the number of iterations to run. If this flag is set to 0, then the
tool sets the number of iterations to a default value (i.e. 70).
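
For flags such as ``-iterations`` and ``-dispatch``, a value of zero means
"use the default". The sketch below models that rule. It illustrates the
documented behavior only and is not the tool's implementation; the function
name is hypothetical. Note that ``-register-file-size`` differs: there, zero
means an unlimited number of temporary registers.

.. code-block:: python

  DEFAULT_ITERATIONS = 70  # default value stated for -iterations

  def effective_value(requested: int, default: int) -> int:
      """Resolve a flag value where zero selects the default (illustrative)."""
      if requested < 0:
          raise ValueError("flag value must be non-negative")
      return default if requested == 0 else requested

  print(effective_value(0, DEFAULT_ITERATIONS))    # zero falls back to the default
  print(effective_value(300, DEFAULT_ITERATIONS))  # explicit value is kept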

.. option:: -noalias=<bool>

If set, the tool assumes that loads and stores don't alias. This is the
default behavior.

.. option:: -lqueue=<load queue size>

Specify the size of the load queue in the load/store unit emulated by the tool.
By default, the tool assumes an unbounded number of entries in the load queue.
A value of zero for this flag is ignored, and the default load queue size is
used instead.

.. option:: -squeue=<store queue size>

Specify the size of the store queue in the load/store unit emulated by the
tool. By default, the tool assumes an unbounded number of entries in the store
queue. A value of zero for this flag is ignored, and the default store queue
size is used instead.

.. option:: -verbose

Enable verbose output. In particular, this flag enables a number of extra
statistics and performance counters for the dispatch logic, the reorder
buffer, the retire control unit and the register file.

.. option:: -timeline

Enable the timeline view.

.. option:: -timeline-max-iterations=<iterations>

Limit the number of iterations to print in the timeline view. By default, the
timeline view prints information for up to 10 iterations.

.. option:: -timeline-max-cycles=<cycles>

Limit the number of cycles in the timeline view. By default, the number of
cycles is set to 80.

.. option:: -resource-pressure

Enable the resource pressure view. This is enabled by default.

.. option:: -register-file-stats

Enable register file usage statistics.

.. option:: -instruction-info

Enable the instruction info view. This is enabled by default.

.. option:: -instruction-tables

Print resource pressure information based on the static information
available from the processor model. This differs from the resource pressure
view because it does not require the code to be simulated. Instead, it prints
the theoretical uniform distribution of resource pressure for every
instruction in sequence.


EXIT STATUS
-----------

:program:`llvm-mca` returns 0 on success. Otherwise, an error message is printed
to standard error, and the tool returns 1.
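
A calling script can rely on this convention to separate a usable report from
a failure. The following is a minimal sketch, assuming Python 3.7 or later;
``run_tool`` is a hypothetical helper, and the commented usage line uses an
example CPU name.

.. code-block:: python

  import subprocess
  from typing import List, Tuple

  def run_tool(argv: List[str], stdin_text: str = "") -> Tuple[bool, str, str]:
      """Run a command, feeding stdin_text to it (hypothetical helper).

      Returns (succeeded, stdout, stderr); succeeded is True only when the
      exit status is 0.
      """
      proc = subprocess.run(argv, input=stdin_text,
                            capture_output=True, text=True)
      return proc.returncode == 0, proc.stdout, proc.stderr

  # Usage, assuming llvm-mca is on PATH and asm_text holds assembly source:
  #   ok, report, errors = run_tool(["llvm-mca", "-mcpu=btver2", "-"], asm_text)
  #   if not ok:
  #       print(errors)

On failure the report text should be ignored and the captured standard error
shown to the user instead.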