llvm-exegesis - LLVM Machine Instruction Benchmark
==================================================

.. program:: llvm-exegesis

SYNOPSIS
--------

:program:`llvm-exegesis` [options]

DESCRIPTION
-----------

:program:`llvm-exegesis` is a benchmarking tool that uses information available
in LLVM to measure host machine instruction characteristics like latency,
throughput, or port decomposition.

Given an LLVM opcode name and a benchmarking mode, :program:`llvm-exegesis`
generates a code snippet that makes execution as serial (resp. as parallel) as
possible so that we can measure the latency (resp. inverse throughput/uop
decomposition) of the instruction.
The code snippet is jitted and executed on the host subtarget. The time taken
(resp. resource usage) is measured using hardware performance counters. The
result is printed out as YAML to the standard output.

The main goal of this tool is to automatically (in)validate LLVM's TableGen
scheduling models. To that end, we also provide analysis of the results.

:program:`llvm-exegesis` can also benchmark arbitrary user-provided code
snippets.

EXAMPLE 1: benchmarking instructions
------------------------------------

Assume you have an X86-64 machine. To measure the latency of a single
instruction, run:

.. code-block:: bash

    $ llvm-exegesis -mode=latency -opcode-name=ADD64rr

Measuring the uop decomposition or inverse throughput of an instruction works
similarly:

.. code-block:: bash

    $ llvm-exegesis -mode=uops -opcode-name=ADD64rr
    $ llvm-exegesis -mode=inverse_throughput -opcode-name=ADD64rr


The output is a YAML document (the default is to write to stdout, but you can
redirect the output to a file using `-benchmarks-file`):

.. code-block:: none

    ---
    key:
      opcode_name: ADD64rr
      mode: latency
      config: ''
    cpu_name: haswell
    llvm_triple: x86_64-unknown-linux-gnu
    num_repetitions: 10000
    measurements:
      - { key: latency, value: 1.0058, debug_string: '' }
    error: ''
    info: 'explicit self cycles, selecting one aliasing configuration.
    Snippet:
    ADD64rr R8, R8, R10
    '
    ...
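
The measurement lines can be pulled out of this YAML with standard text tools.
The following is a minimal sketch, assuming the flow-style ``measurements``
layout shown above; ``extract_measurements`` is a hypothetical helper name, not
part of :program:`llvm-exegesis`, and the sample line is copied from the output
above rather than produced by a real run.

.. code-block:: bash

    #!/bin/bash
    # Print "<key> <value>" for every measurement line read from standard input.
    # Assumes one flow-style mapping per line, as in the example output.
    extract_measurements() {
      sed -n 's/^ *- { key: \([^,]*\), value: \([^,]*\),.*$/\1 \2/p'
    }

    # Sample input; in practice, pipe in the tool's output or a file written
    # with -benchmarks-file.
    printf '%s\n' \
      'measurements:' \
      "  - { key: latency, value: 1.0058, debug_string: '' }" \
      | extract_measurements

A real YAML parser is the more robust choice if the layout of the output
differs from the one assumed here.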

To measure the latency of all instructions for the host architecture, run:

.. code-block:: bash

    #!/bin/bash
    # Loop over opcode indices 1 .. INSTRUCTION_LIST_END - 1.
    readonly INSTRUCTIONS=$(($(grep INSTRUCTION_LIST_END build/lib/Target/X86/X86GenInstrInfo.inc | cut -f2 -d=) - 1))
    for INSTRUCTION in $(seq 1 ${INSTRUCTIONS});
    do
      # Print only from the first '---' line onward (the YAML document).
      ./build/bin/llvm-exegesis -mode=latency -opcode-index=${INSTRUCTION} | sed -n '/---/,$p'
    done

FIXME: Provide an :program:`llvm-exegesis` option to test all instructions.
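
For a handful of instructions, the `-opcode-name` option accepts a
comma-separated list, so several opcodes can be passed in one invocation. The
following is a minimal sketch, assuming a hand-picked list of opcode names;
``join_opcodes`` is a hypothetical helper, and the command is only printed
rather than run so that the sketch works on any machine.

.. code-block:: bash

    #!/bin/bash
    # Join the given opcode names with commas, the form -opcode-name expects.
    join_opcodes() {
      local IFS=,
      printf '%s\n' "$*"
    }

    opcodes=(ADD64rr ADD32rr)
    # Print the command line instead of executing it.
    echo llvm-exegesis -mode=latency "-opcode-name=$(join_opcodes "${opcodes[@]}")"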


EXAMPLE 2: benchmarking a custom code snippet
---------------------------------------------

To measure the latency/uops of a custom piece of code, you can specify the
`snippets-file` option (`-` reads from standard input).

.. code-block:: bash

    $ echo "vzeroupper" | llvm-exegesis -mode=uops -snippets-file=-

Real-life code snippets typically depend on registers or memory.
:program:`llvm-exegesis` checks the liveness of registers (i.e. any register
use has a corresponding def or is a "live in"). If your code depends on the
value of some registers, you have two options:

- Mark the register as requiring a definition. :program:`llvm-exegesis` will
  automatically assign a value to the register. This can be done using the
  directive `LLVM-EXEGESIS-DEFREG <reg name> <hex_value>`, where `<hex_value>`
  is a bit pattern used to fill `<reg name>`. If `<hex_value>` is smaller than
  the register width, it will be sign-extended.
- Mark the register as a "live in". :program:`llvm-exegesis` will benchmark
  using whatever value was in this register on entry. This can be done using
  the directive `LLVM-EXEGESIS-LIVEIN <reg name>`.

For example, the following code snippet depends on the values of XMM1 (which
will be set by the tool) and the memory buffer passed in RDI (live in).

.. code-block:: none

    # LLVM-EXEGESIS-LIVEIN RDI
    # LLVM-EXEGESIS-DEFREG XMM1 42
    vmulps (%rdi), %xmm1, %xmm2
    vhaddps %xmm2, %xmm2, %xmm3
    addq $0x10, %rdi
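
One way to benchmark such a snippet is to save it to a file and pass that file
with `-snippets-file`. The following is a minimal sketch, assuming a temporary
file created with ``mktemp`` (the path is arbitrary) and `latency` mode; the
tool is only invoked if it is found on ``PATH``.

.. code-block:: bash

    #!/bin/bash
    # Write the annotated snippet from the example above to a temporary file.
    snippet_file=$(mktemp)
    printf '%s\n' \
      '# LLVM-EXEGESIS-LIVEIN RDI' \
      '# LLVM-EXEGESIS-DEFREG XMM1 42' \
      'vmulps (%rdi), %xmm1, %xmm2' \
      'vhaddps %xmm2, %xmm2, %xmm3' \
      'addq $0x10, %rdi' > "$snippet_file"

    # Run the benchmark only when llvm-exegesis is installed.
    if command -v llvm-exegesis >/dev/null 2>&1; then
      llvm-exegesis -mode=latency -snippets-file="$snippet_file"
    else
      echo "llvm-exegesis not found; snippet left in $snippet_file" >&2
    fi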


EXAMPLE 3: analysis
-------------------

Assuming you have a set of benchmarked instructions (either latency or uops) as
YAML in file `/tmp/benchmarks.yaml`, you can analyze the results using the
following command:

.. code-block:: bash

    $ llvm-exegesis -mode=analysis \
        -benchmarks-file=/tmp/benchmarks.yaml \
        -analysis-clusters-output-file=/tmp/clusters.csv \
        -analysis-inconsistencies-output-file=/tmp/inconsistencies.html

This will group the instructions into clusters with the same performance
characteristics. The clusters will be written out to `/tmp/clusters.csv` in the
following format:

.. code-block:: none

    cluster_id,opcode_name,config,sched_class
    ...
    2,ADD32ri8_DB,,WriteALU,1.00
    2,ADD32ri_DB,,WriteALU,1.01
    2,ADD32rr,,WriteALU,1.01
    2,ADD32rr_DB,,WriteALU,1.00
    2,ADD32rr_REV,,WriteALU,1.00
    2,ADD64i32,,WriteALU,1.01
    2,ADD64ri32,,WriteALU,1.01
    2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
    2,VPADDQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.02
    2,VPSUBQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.01
    2,ADD64ri8,,WriteALU,1.00
    2,SETBr,,WriteSETCC,1.01
    ...
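
The CSV can be summarized with ordinary text tools. The following is a minimal
sketch, assuming the column order shown above (cluster id first, scheduling
class fourth); ``summarize_clusters`` is a hypothetical helper that counts how
many opcodes of each cluster share a scheduling class, and the sample rows are
copied from the listing above.

.. code-block:: bash

    #!/bin/bash
    # Read cluster CSV rows from standard input and print
    # "<cluster_id>,<sched_class>,<opcode count>", sorted.
    # The header row and "..." elision lines are skipped.
    summarize_clusters() {
      awk -F, '$1 != "cluster_id" && NF >= 4 { count[$1 "," $4]++ }
               END { for (k in count) print k "," count[k] }' | LC_ALL=C sort
    }

    printf '%s\n' \
      'cluster_id,opcode_name,config,sched_class' \
      '2,ADD32rr,,WriteALU,1.01' \
      '2,ADD64ri8,,WriteALU,1.00' \
      '2,SETBr,,WriteSETCC,1.01' \
      | summarize_clusters

To summarize a real run, redirect the file written by
`-analysis-clusters-output-file` into the function instead of the sample rows.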

:program:`llvm-exegesis` will also analyze the clusters to point out
inconsistencies in the scheduling information. The output is an HTML file. For
example, `/tmp/inconsistencies.html` will contain messages like the following:

.. image:: llvm-exegesis-analysis.png
  :align: center

Note that the scheduling class names will be resolved only when
:program:`llvm-exegesis` is compiled in debug mode; otherwise only the class id
will be shown. This does not invalidate any of the analysis results though.


OPTIONS
-------

.. option:: -help

 Print a summary of command line options.

.. option:: -opcode-index=<LLVM opcode index>

 Specify the opcode to measure, by index. See example 1 for details.
 Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -opcode-name=<opcode name 1>,<opcode name 2>,...

 Specify the opcode to measure, by name. Several opcodes can be specified as
 a comma-separated list. See example 1 for details.
 Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -snippets-file=<filename>

 Specify the custom code snippet to measure. See example 2 for details.
 Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -mode=[latency|uops|inverse_throughput|analysis]

 Specify the run mode. Note that if you pick `analysis` mode, you also need
 to specify at least one of `-analysis-clusters-output-file=` and
 `-analysis-inconsistencies-output-file=`.

.. option:: -num-repetitions=<Number of repetitions>

 Specify the number of repetitions of the asm snippet.
 Higher values lead to more accurate measurements but lengthen the benchmark.

.. option:: -benchmarks-file=</path/to/file>

 File to read (`analysis` mode) or write (`latency`/`uops`/`inverse_throughput`
 modes) benchmark results. `-` uses stdin/stdout.

.. option:: -analysis-clusters-output-file=</path/to/file>

 If provided, write the analysis clusters as CSV to this file. `-` prints to
 stdout. By default, this analysis is not run.

.. option:: -analysis-inconsistencies-output-file=</path/to/file>

 If non-empty, write inconsistencies found during analysis to this file. `-`
 prints to stdout. By default, this analysis is not run.

.. option:: -analysis-clustering=[dbscan,naive]

 Specify the clustering algorithm to use. By default DBSCAN will be used.
 The naive clustering algorithm is better for doing further work on the
 `-analysis-inconsistencies-output-file=` output: it creates one cluster
 per opcode and checks that the cluster is stable (all points are neighbours).

.. option:: -analysis-numpoints=<dbscan numPoints parameter>

 Specify the numPoints parameter to be used for DBSCAN clustering
 (`analysis` mode, DBSCAN only).

.. option:: -analysis-clustering-epsilon=<dbscan epsilon parameter>

 Specify the epsilon parameter used for clustering of benchmark points
 (`analysis` mode).

.. option:: -analysis-inconsistency-epsilon=<epsilon>

 Specify the epsilon parameter used to detect when a cluster differs from
 the LLVM schedule profile values (`analysis` mode).

.. option:: -analysis-display-unstable-clusters

 If there is more than one benchmark for an opcode, said benchmarks may end up
 not being clustered into the same cluster if the measured performance
 characteristics are different. By default all such opcodes are filtered out.
 This flag will instead show only such unstable opcodes.

.. option:: -ignore-invalid-sched-class=false

 If set, ignore instructions that do not have a sched class (class idx = 0).

.. option:: -mcpu=<cpu name>

 If set, measure the CPU characteristics using the counters for this CPU. This
 is useful when creating new sched models (the host CPU is unknown to LLVM).

.. option:: --dump-object-to-disk=true

 By default, :program:`llvm-exegesis` will dump the generated code to a
 temporary file to enable code inspection. You may disable it to speed up the
 execution and save disk space.

EXIT STATUS
-----------

:program:`llvm-exegesis` returns 0 on success. Otherwise, an error message is
printed to standard error, and the tool returns a non-zero value.
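
A script that drives the tool can branch on this exit status. The following is
a minimal sketch, assuming a hypothetical wrapper named ``run_checked``; it
works for any command, and the :program:`llvm-exegesis` invocation is only
attempted when the tool is found on ``PATH``.

.. code-block:: bash

    #!/bin/bash
    # Run a command, report a non-zero exit status on standard error, and
    # return the command's own status so that callers can react to it.
    run_checked() {
      "$@"
      local status=$?
      if [ "$status" -ne 0 ]; then
        echo "command failed with status $status: $*" >&2
      fi
      return "$status"
    }

    # Illustrative usage, skipped when llvm-exegesis is not installed.
    if command -v llvm-exegesis >/dev/null 2>&1; then
      run_checked llvm-exegesis -mode=latency -opcode-name=ADD64rr \
        || echo "benchmark did not complete" >&2
    fi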