llvm-exegesis - LLVM Machine Instruction Benchmark
==================================================

.. program:: llvm-exegesis

SYNOPSIS
--------

:program:`llvm-exegesis` [options]

DESCRIPTION
-----------

:program:`llvm-exegesis` is a benchmarking tool that uses information available
in LLVM to measure host machine instruction characteristics like latency,
throughput, or port decomposition.

Given an LLVM opcode name and a benchmarking mode, :program:`llvm-exegesis`
generates a code snippet that makes execution as serial (resp. as parallel) as
possible so that we can measure the latency (resp. inverse throughput/uop
decomposition) of the instruction.
The code snippet is jitted and executed on the host subtarget. The time taken
(resp. resource usage) is measured using hardware performance counters. The
result is printed out as YAML to the standard output.

The main goal of this tool is to automatically (in)validate LLVM's TableGen
scheduling models. To that end, we also provide analysis of the results.

:program:`llvm-exegesis` can also benchmark arbitrary user-provided code
snippets.

EXAMPLE 1: benchmarking instructions
------------------------------------

Assume you have an X86-64 machine. To measure the latency of a single
instruction, run:

.. code-block:: bash

  $ llvm-exegesis -mode=latency -opcode-name=ADD64rr

Measuring the uop decomposition or inverse throughput of an instruction works
similarly:

.. code-block:: bash

  $ llvm-exegesis -mode=uops -opcode-name=ADD64rr
  $ llvm-exegesis -mode=inverse_throughput -opcode-name=ADD64rr

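The three measurements above can be collected in one go with a small shell
loop. This is a minimal sketch rather than a feature of the tool: the
`bench_all_modes` function and the `EXEGESIS` variable are hypothetical names,
and only the `-mode`, `-opcode-name` and `-benchmarks-file` options described
on this page are used. Each mode writes its results to a separate YAML file.

.. code-block:: bash

  #!/bin/bash
  # Hypothetical helper: run every measurement mode for one opcode and keep
  # each YAML result in its own file, e.g. OUTDIR/ADD64rr-latency.yaml.
  # EXEGESIS selects the binary to run (assumed default: llvm-exegesis on PATH).
  EXEGESIS=${EXEGESIS:-llvm-exegesis}

  bench_all_modes() {
    local opcode=$1 outdir=${2:-.}
    local mode
    mkdir -p "$outdir" || return
    for mode in latency uops inverse_throughput; do
      "$EXEGESIS" -mode="$mode" -opcode-name="$opcode" \
        -benchmarks-file="$outdir/$opcode-$mode.yaml" || return
    done
  }

For example, `bench_all_modes ADD64rr /tmp/add64rr` would write
`ADD64rr-latency.yaml`, `ADD64rr-uops.yaml` and
`ADD64rr-inverse_throughput.yaml` under `/tmp/add64rr`, stopping at the first
failing run.
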
The output is a YAML document (the default is to write to stdout, but you can
redirect the output to a file using `-benchmarks-file`):

.. code-block:: none

  ---
  key:
    opcode_name: ADD64rr
    mode: latency
    config: ''
  cpu_name: haswell
  llvm_triple: x86_64-unknown-linux-gnu
  num_repetitions: 10000
  measurements:
    - { key: latency, value: 1.0058, debug_string: '' }
  error: ''
  info: 'explicit self cycles, selecting one aliasing configuration.
  Snippet:
  ADD64rr R8, R8, R10
  '
  ...

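Because the result is plain YAML, simple text tools are often enough to pull a
number out of it. The sketch below assumes the single-line flow style used for
`measurements` entries in the output above (`- { key: ..., value: ..., ... }`);
`measurement_value` is a hypothetical helper, and a real YAML parser is the
more robust choice if the layout differs.

.. code-block:: bash

  #!/bin/bash
  # Hypothetical helper: print the value of the measurement named $1 for every
  # benchmark in the llvm-exegesis YAML read from stdin (one line per match).
  # Assumes flow-style entries such as:
  #   - { key: latency, value: 1.0058, debug_string: '' }
  measurement_value() {
    awk -v wanted="$1" '
      /^[ \t]*- [{] *key:/ {
        line = $0
        sub(/.*key: */, "", line)      # line now starts with the key name
        key = line
        sub(/,.*/, "", key)            # key name only
        if (key == wanted) {
          sub(/.*value: */, "", line)  # line now starts with the value
          sub(/[,} ].*/, "", line)     # drop everything after the value
          print line
        }
      }
    '
  }

Usage: `llvm-exegesis -mode=latency -opcode-name=ADD64rr | measurement_value latency`.
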
To measure the latency of all instructions for the host architecture, run:

.. code-block:: bash

  #!/bin/bash
  # Index of the last X86 opcode: INSTRUCTION_LIST_END (read from the
  # TableGen-generated instruction enum in ./build) minus one.
  readonly INSTRUCTIONS=$(($(grep INSTRUCTION_LIST_END build/lib/Target/X86/X86GenInstrInfo.inc | cut -f2 -d=) - 1))
  for INSTRUCTION in $(seq 1 ${INSTRUCTIONS});
  do
    # Keep only the YAML document ("---" onwards) printed for each opcode.
    ./build/bin/llvm-exegesis -mode=latency -opcode-index=${INSTRUCTION} | sed -n '/---/,$p'
  done

FIXME: Provide an :program:`llvm-exegesis` option to test all instructions.

EXAMPLE 2: benchmarking a custom code snippet
---------------------------------------------

To measure the latency/uops of a custom piece of code, you can specify the
`snippets-file` option (`-` reads from standard input).

.. code-block:: bash

  $ echo "vzeroupper" | llvm-exegesis -mode=uops -snippets-file=-

Real-life code snippets typically depend on registers or memory.
:program:`llvm-exegesis` checks the liveness of registers (i.e. any register
use has a corresponding def or is a "live in"). If your code depends on the
value of some registers, you have two options:

- Mark the register as requiring a definition. :program:`llvm-exegesis` will
  automatically assign a value to the register. This can be done using the
  directive `LLVM-EXEGESIS-DEFREG <reg name> <hex_value>`, where `<hex_value>`
  is a bit pattern used to fill `<reg name>`. If `<hex_value>` is smaller than
  the register width, it will be sign-extended.
- Mark the register as a "live in". :program:`llvm-exegesis` will benchmark
  using whatever value was in this register on entry. This can be done using
  the directive `LLVM-EXEGESIS-LIVEIN <reg name>`.

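The sign-extension rule for `LLVM-EXEGESIS-DEFREG` can be illustrated with a
little arithmetic. The sketch below is a hypothetical helper, not part of
:program:`llvm-exegesis`. It assumes that the width of `<hex_value>` is four
bits per hex digit as written, and because bash arithmetic is 64-bit it only
models registers up to 64 bits wide (not the 128-bit XMM registers).

.. code-block:: bash

  #!/bin/bash
  # Hypothetical helper (not part of llvm-exegesis): models the documented
  # DEFREG rule by sign-extending HEX to a register of WIDTH bits.
  # Assumptions: HEX is 4 bits per digit as written, without a 0x prefix;
  # WIDTH is a multiple of 4 and at most 64 (bash arithmetic is 64-bit).
  sign_extend_hex() {
    local hex=$1 width=$2
    [[ $hex =~ ^[0-9a-fA-F]+$ ]] || return 1
    (( width > 0 && width <= 64 && width % 4 == 0 )) || return 1
    local bits=$(( ${#hex} * 4 ))
    (( bits <= 64 )) || return 1
    local val=$(( 16#$hex ))
    # If the pattern's top bit is set, fill every higher bit with ones.
    if (( bits < 64 && ((val >> (bits - 1)) & 1) )); then
      val=$(( val | (-1 << bits) ))
    fi
    # Keep only the low WIDTH bits.
    if (( width < 64 )); then
      val=$(( val & ((1 << width) - 1) ))
    fi
    printf "%0$(( width / 4 ))x\n" "$val"
  }

Under this model, `sign_extend_hex ff 64` prints `ffffffffffffffff`, whereas
`sign_extend_hex 42 64` prints `0000000000000042` because the top bit of the
8-bit pattern `0x42` is clear.
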
For example, the following code snippet depends on the values of XMM1 (which
will be set by the tool) and the memory buffer passed in RDI (live in).

.. code-block:: none

  # LLVM-EXEGESIS-LIVEIN RDI
  # LLVM-EXEGESIS-DEFREG XMM1 42
  vmulps (%rdi), %xmm1, %xmm2
  vhaddps %xmm2, %xmm2, %xmm3
  addq $0x10, %rdi

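Since `-snippets-file=-` reads from standard input, a snippet like the one
above can also be passed inline instead of from a separate file. A minimal
sketch, assuming an X86-64 host with AVX; the `bench_snippet` function and the
`EXEGESIS` variable are hypothetical names.

.. code-block:: bash

  #!/bin/bash
  # Hypothetical wrapper: benchmark the snippet above in the given mode
  # (latency or uops), feeding it to llvm-exegesis through stdin.
  EXEGESIS=${EXEGESIS:-llvm-exegesis}

  bench_snippet() {
    local mode=${1:-latency}
    # Single quotes keep the shell from expanding $0x10 or the % signs.
    printf '%s\n' \
      '# LLVM-EXEGESIS-LIVEIN RDI' \
      '# LLVM-EXEGESIS-DEFREG XMM1 42' \
      'vmulps (%rdi), %xmm1, %xmm2' \
      'vhaddps %xmm2, %xmm2, %xmm3' \
      'addq $0x10, %rdi' |
      "$EXEGESIS" -mode="$mode" -snippets-file=-
  }

Usage: `bench_snippet uops`.
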
EXAMPLE 3: analysis
-------------------

Assuming you have a set of benchmarked instructions (either latency or uops) as
YAML in file `/tmp/benchmarks.yaml`, you can analyze the results using the
following command:

.. code-block:: bash

  $ llvm-exegesis -mode=analysis \
      -benchmarks-file=/tmp/benchmarks.yaml \
      -analysis-clusters-output-file=/tmp/clusters.csv \
      -analysis-inconsistencies-output-file=/tmp/inconsistencies.html

This will group the instructions into clusters with the same performance
characteristics. The clusters will be written out to `/tmp/clusters.csv` in the
following format:

.. code-block:: none

  cluster_id,opcode_name,config,sched_class,latency
  ...
  2,ADD32ri8_DB,,WriteALU,1.00
  2,ADD32ri_DB,,WriteALU,1.01
  2,ADD32rr,,WriteALU,1.01
  2,ADD32rr_DB,,WriteALU,1.00
  2,ADD32rr_REV,,WriteALU,1.00
  2,ADD64i32,,WriteALU,1.01
  2,ADD64ri32,,WriteALU,1.01
  2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
  2,VPADDQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.02
  2,VPSUBQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.01
  2,ADD64ri8,,WriteALU,1.00
  2,SETBr,,WriteSETCC,1.01
  ...

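The CSV is easy to post-process, for instance to see which scheduling classes
ended up with the same measured performance. The sketch below counts the
distinct `sched_class` values in each cluster. `sched_classes_per_cluster` is a
hypothetical helper; it assumes that no field contains an embedded comma, which
holds for the rows above but is not guaranteed for quoted CSV fields, and it
skips lines whose `cluster_id` is not a plain number, such as the `...`
placeholders.

.. code-block:: bash

  #!/bin/bash
  # Hypothetical helper: for each cluster in an llvm-exegesis clusters CSV,
  # print "<cluster_id> <number of distinct sched_class values>", sorted by id.
  # Assumes plain comma-separated fields (no quoting or embedded commas).
  sched_classes_per_cluster() {
    awk -F, '
      # Skip the header and any row whose cluster id is not a plain number.
      FNR > 1 && $1 ~ /^[0-9]+$/ {
        if (!(($1, $4) in seen)) {
          seen[$1, $4] = 1
          count[$1]++
        }
      }
      END { for (c in count) print c, count[c] }
    ' "$@" | sort -n
  }

On the rows shown above, all of which belong to cluster 2, it would print
`2 4`: `WriteALU`, `BSWAP32r_BSWAP64r_MOVSX64rr32`, the shared `VPADD`/`VPSUB`
class and `WriteSETCC`.
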
:program:`llvm-exegesis` will also analyze the clusters to point out
inconsistencies in the scheduling information. The output is an HTML file. For
example, `/tmp/inconsistencies.html` will contain messages like the following:

.. image:: llvm-exegesis-analysis.png
  :align: center

Note that the scheduling class names will be resolved only when
:program:`llvm-exegesis` is compiled in debug mode; otherwise only the class id
will be shown. This does not invalidate any of the analysis results, though.

OPTIONS
-------

.. option:: -help

 Print a summary of command line options.

.. option:: -opcode-index=<LLVM opcode index>

 Specify the opcode to measure, by index. See example 1 for details.
 Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -opcode-name=<opcode name 1>,<opcode name 2>,...

 Specify the opcode to measure, by name. Several opcodes can be specified as
 a comma-separated list. See example 1 for details.
 Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

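 When the opcodes to measure are kept in a file, one name per line, they can be
 joined into the comma-separated form that `-opcode-name` accepts. A minimal
 sketch; the `opcode_list` function and the `opcodes.txt` file name are
 hypothetical, and blank lines and `#` comments are skipped.

 .. code-block:: bash

   #!/bin/bash
   # Hypothetical helper: join opcode names (one per line on stdin) into a
   # comma-separated list, skipping blank lines and '#' comment lines and
   # removing stray spaces or tabs around each name.
   opcode_list() {
     grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' |
       tr -d '[:blank:]' |
       paste -sd, -
   }

 Usage: `llvm-exegesis -mode=latency -opcode-name="$(opcode_list < opcodes.txt)"`.
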
.. option:: -snippets-file=<filename>

 Specify the custom code snippet to measure. See example 2 for details.
 Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -mode=[latency|uops|inverse_throughput|analysis]

 Specify the run mode. Note that if you pick `analysis` mode, you also need
 to specify at least one of the `-analysis-clusters-output-file=` and
 `-analysis-inconsistencies-output-file=` options.

.. option:: -num-repetitions=<Number of repetitions>

 Specify the number of repetitions of the asm snippet.
 Higher values lead to more accurate measurements but lengthen the benchmark.

.. option:: -max-configs-per-opcode=<value>

 Specify the maximum number of configurations that can be generated for each
 opcode. By default this is `1`, meaning that we assume that a single
 measurement is enough to characterize an opcode. This might not be true of all
 instructions: for example, the performance characteristics of the LEA
 instruction on X86 depend on the values of assigned registers and immediates.
 Setting a value of `-max-configs-per-opcode` larger than `1` allows
 :program:`llvm-exegesis` to explore more configurations to discover if some
 register or immediate assignments lead to different performance
 characteristics.

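 To explore the LEA case described above, the sketch below measures several
 configurations of a single opcode and then runs the analysis with
 `-analysis-display-unstable-clusters`, so that only opcodes whose
 configurations did not end up in the same cluster are reported. The
 `bench_configs` helper, the `EXEGESIS` variable and the output file names are
 hypothetical; `LEA64r` and the limit of 8 configurations are example choices.

 .. code-block:: bash

   #!/bin/bash
   # Hypothetical wrapper: benchmark up to MAX configurations of OPCODE, then
   # write an inconsistencies report that shows only unstable opcodes.
   EXEGESIS=${EXEGESIS:-llvm-exegesis}

   bench_configs() {
     local opcode=$1 max=$2 outdir=${3:-/tmp}
     local results="$outdir/$opcode-configs.yaml"
     "$EXEGESIS" -mode=latency -opcode-name="$opcode" \
       -max-configs-per-opcode="$max" -benchmarks-file="$results" || return
     "$EXEGESIS" -mode=analysis -benchmarks-file="$results" \
       -analysis-inconsistencies-output-file="$outdir/$opcode-unstable.html" \
       -analysis-display-unstable-clusters
   }

 Usage: `bench_configs LEA64r 8 /tmp`.
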
.. option:: -benchmarks-file=</path/to/file>

 File to read benchmark results from (`analysis` mode) or to write them to
 (`latency`/`uops`/`inverse_throughput` modes). "-" uses stdin/stdout.

.. option:: -analysis-clusters-output-file=</path/to/file>

 If provided, write the analysis clusters as CSV to this file. "-" prints to
 stdout. By default, this analysis is not run.

.. option:: -analysis-inconsistencies-output-file=</path/to/file>

 If non-empty, write inconsistencies found during analysis to this file. `-`
 prints to stdout. By default, this analysis is not run.

.. option:: -analysis-clustering=[dbscan,naive]

 Specify the clustering algorithm to use. By default DBSCAN will be used.
 The naive clustering algorithm is better suited for doing further work on the
 `-analysis-inconsistencies-output-file=` output: it creates one cluster per
 opcode and checks that the cluster is stable (all points are neighbours).

.. option:: -analysis-numpoints=<dbscan numPoints parameter>

 Specify the numPoints parameter to be used for DBSCAN clustering
 (`analysis` mode, DBSCAN only).

.. option:: -analysis-clustering-epsilon=<dbscan epsilon parameter>

 Specify the epsilon parameter used for clustering of benchmark points
 (`analysis` mode).

.. option:: -analysis-inconsistency-epsilon=<epsilon>

 Specify the epsilon parameter used for detection of when the cluster
 is different from the LLVM schedule profile values (`analysis` mode).

.. option:: -analysis-display-unstable-clusters

 If there is more than one benchmark for an opcode, said benchmarks may end up
 not being clustered into the same cluster if the measured performance
 characteristics are different. By default all such opcodes are filtered out.
 This flag will instead show only such unstable opcodes.

.. option:: -ignore-invalid-sched-class=false

 If set, ignore instructions that do not have a sched class (class idx = 0).

.. option:: -mcpu=<cpu name>

 If set, measure the CPU characteristics using the counters for this CPU. This
 is useful when creating new sched models (the host CPU is unknown to LLVM).

.. option:: --dump-object-to-disk=true

 By default, :program:`llvm-exegesis` will dump the generated code to a
 temporary file to enable code inspection. You may disable it to speed up the
 execution and save disk space.

EXIT STATUS
-----------

:program:`llvm-exegesis` returns 0 on success. Otherwise, an error message is
printed to standard error, and the tool returns a non-zero value.