llvm-exegesis - LLVM Machine Instruction Benchmark
==================================================

SYNOPSIS
--------

:program:`llvm-exegesis` [options]

DESCRIPTION
-----------

:program:`llvm-exegesis` is a benchmarking tool that uses information available
in LLVM to measure host machine instruction characteristics like latency,
throughput, or port decomposition.

Given an LLVM opcode name and a benchmarking mode, :program:`llvm-exegesis`
generates a code snippet that makes execution as serial (resp. as parallel) as
possible so that we can measure the latency (resp. inverse throughput/uop decomposition)
of the instruction.
The code snippet is jitted and executed on the host subtarget. The time taken
(resp. resource usage) is measured using hardware performance counters. The
result is printed out as YAML to the standard output.

The main goal of this tool is to automatically (in)validate LLVM's TableGen
scheduling models. To that end, we also provide analysis of the results.

:program:`llvm-exegesis` can also benchmark arbitrary user-provided code
snippets.

EXAMPLE 1: benchmarking instructions
------------------------------------

Assume you have an X86-64 machine. To measure the latency of a single
instruction, run:

.. code-block:: bash

    $ llvm-exegesis -mode=latency -opcode-name=ADD64rr

Measuring the uop decomposition or inverse throughput of an instruction works similarly:

.. code-block:: bash

    $ llvm-exegesis -mode=uops -opcode-name=ADD64rr
    $ llvm-exegesis -mode=inverse_throughput -opcode-name=ADD64rr

The output is a YAML document (the default is to write to stdout, but you can
redirect the output to a file using `-benchmarks-file`):

.. code-block:: none

    ---
    key:
      opcode_name: ADD64rr
      mode: latency
      config: ''
    cpu_name: haswell
    llvm_triple: x86_64-unknown-linux-gnu
    num_repetitions: 10000
    measurements:
      - { key: latency, value: 1.0058, debug_string: '' }
    error: ''
    info: 'explicit self cycles, selecting one aliasing configuration.
    Snippet:
    ADD64rr R8, R8, R10
    '
    ...
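
When scripting around the tool, it can be enough to pull the measured value out
of the YAML. The following is a minimal sketch, assuming the measurement line
keeps the flow-mapping form shown above. The function name
`extract_measurement` is hypothetical and not part of :program:`llvm-exegesis`;
a real YAML parser would be more robust.

.. code-block:: bash

    # Hypothetical helper: print the value recorded for measurement key $1,
    # reading llvm-exegesis YAML from standard input.
    extract_measurement() {
      sed -n "s/^ *- { key: $1, value: \([^,]*\),.*/\1/p"
    }

    # Possible use:
    #   llvm-exegesis -mode=latency -opcode-name=ADD64rr | extract_measurement latency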

To measure the latency of all instructions for the host architecture, run:

.. code-block:: bash

    #!/bin/bash
    readonly INSTRUCTIONS=$(($(grep INSTRUCTION_LIST_END build/lib/Target/X86/X86GenInstrInfo.inc | cut -f2 -d=) - 1))
    for INSTRUCTION in $(seq 1 ${INSTRUCTIONS});
    do
      ./build/bin/llvm-exegesis -mode=latency -opcode-index=${INSTRUCTION} | sed -n '/---/,$p'
    done

FIXME: Provide an :program:`llvm-exegesis` option to test all instructions.

EXAMPLE 2: benchmarking a custom code snippet
---------------------------------------------

To measure the latency/uops of a custom piece of code, you can specify the
`snippets-file` option (`-` reads from standard input).

.. code-block:: bash

    $ echo "vzeroupper" | llvm-exegesis -mode=uops -snippets-file=-

Real-life code snippets typically depend on registers or memory.
:program:`llvm-exegesis` checks the liveness of registers (i.e. any register
use has a corresponding def or is a "live in"). If your code depends on the
value of some registers, you have two options:

- Mark the register as requiring a definition. :program:`llvm-exegesis` will
  automatically assign a value to the register. This can be done using the
  directive `LLVM-EXEGESIS-DEFREG <reg_name> <hex_value>`, where `<hex_value>`
  is a bit pattern used to fill `<reg_name>`. If `<hex_value>` is smaller than
  the register width, it will be sign-extended.
- Mark the register as a "live in". :program:`llvm-exegesis` will benchmark
  using whatever value was in this register on entry. This can be done using
  the directive `LLVM-EXEGESIS-LIVEIN <reg_name>`.

For example, the following code snippet depends on the values of XMM1 (which
will be set by the tool) and the memory buffer passed in RDI (live in).

.. code-block:: none

    # LLVM-EXEGESIS-LIVEIN RDI
    # LLVM-EXEGESIS-DEFREG XMM1 42
    vmulps (%rdi), %xmm1, %xmm2
    vhaddps %xmm2, %xmm2, %xmm3
    addq $0x10, %rdi
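
To benchmark this snippet from a file rather than from standard input, one
option is to write it out first and pass the path to `-snippets-file`. The
sketch below assumes a scratch path of `/tmp/snippet.s`; the helper name
`write_snippet` is hypothetical.

.. code-block:: bash

    # Hypothetical helper: write the example snippet, directives included,
    # to the file named by $1.
    write_snippet() {
      printf '%s\n' \
        '# LLVM-EXEGESIS-LIVEIN RDI' \
        '# LLVM-EXEGESIS-DEFREG XMM1 42' \
        'vmulps (%rdi), %xmm1, %xmm2' \
        'vhaddps %xmm2, %xmm2, %xmm3' \
        'addq $0x10, %rdi' > "$1"
    }

    write_snippet /tmp/snippet.s
    # Then, on a machine with llvm-exegesis available:
    #   llvm-exegesis -mode=latency -snippets-file=/tmp/snippet.s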

EXAMPLE 3: analysis
-------------------

Assuming you have a set of benchmarked instructions (either latency or uops) as
YAML in file `/tmp/benchmarks.yaml`, you can analyze the results using the
following command:

.. code-block:: bash

    $ llvm-exegesis -mode=analysis \
        -benchmarks-file=/tmp/benchmarks.yaml \
        -analysis-clusters-output-file=/tmp/clusters.csv \
        -analysis-inconsistencies-output-file=/tmp/inconsistencies.html

This will group the instructions into clusters with the same performance
characteristics. The clusters will be written out to `/tmp/clusters.csv` in the
following format:

.. code-block:: none

    cluster_id,opcode_name,config,sched_class
    ...
    2,ADD32ri8_DB,,WriteALU,1.00
    2,ADD32ri_DB,,WriteALU,1.01
    2,ADD32rr,,WriteALU,1.01
    2,ADD32rr_DB,,WriteALU,1.00
    2,ADD32rr_REV,,WriteALU,1.00
    2,ADD64i32,,WriteALU,1.01
    2,ADD64ri32,,WriteALU,1.01
    2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
    2,VPADDQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.02
    2,VPSUBQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.01
    2,ADD64ri8,,WriteALU,1.00
    2,SETBr,,WriteSETCC,1.01
    ...
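
For a quick look at how scheduling classes are spread across clusters, the CSV
can be summarized with standard tools. This is only a sketch, assuming the
column order shown above (cluster id first, sched class fourth); the helper
name `count_by_sched_class` is hypothetical.

.. code-block:: bash

    # Hypothetical helper: count rows per (cluster_id, sched_class) pair in a
    # clusters CSV read from standard input. The header row and any line with
    # fewer than four fields are skipped.
    count_by_sched_class() {
      awk -F, 'NR > 1 && NF >= 4 { n[$1 "," $4]++ }
               END { for (k in n) print k "," n[k] }' | sort
    }

    # Possible use:
    #   count_by_sched_class < /tmp/clusters.csv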

:program:`llvm-exegesis` will also analyze the clusters to point out
inconsistencies in the scheduling information. The output is an HTML file. For
example, `/tmp/inconsistencies.html` will contain messages like the following:

.. image:: llvm-exegesis-analysis.png
  :align: center

Note that the scheduling class names will be resolved only when
:program:`llvm-exegesis` is compiled in debug mode; otherwise only the class id
will be shown. This does not invalidate any of the analysis results though.

OPTIONS
-------

.. option:: -help

 Print a summary of command line options.

.. option:: -opcode-index=<LLVM opcode index>

 Specify the opcode to measure, by index. See example 1 for details.
 Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -opcode-name=<opcode name 1>,<opcode name 2>,...

 Specify the opcode to measure, by name. Several opcodes can be specified as
 a comma-separated list. See example 1 for details.
 Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -snippets-file=<filename>

 Specify the custom code snippet to measure. See example 2 for details.
 Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -mode=[latency|uops|inverse_throughput|analysis]

 Specify the run mode. Note that if you pick `analysis` mode, you also need
 to specify at least one of the `-analysis-clusters-output-file=` and
 `-analysis-inconsistencies-output-file=`.

.. option:: -num-repetitions=<Number of repetitions>

 Specify the number of repetitions of the asm snippet.
 Higher values lead to more accurate measurements but lengthen the benchmark.

.. option:: -benchmarks-file=</path/to/file>

 File to read (`analysis` mode) or write (`latency`/`uops`/`inverse_throughput`
 modes) benchmark results. "-" uses stdin/stdout.

.. option:: -analysis-clusters-output-file=</path/to/file>

 If provided, write the analysis clusters as CSV to this file. "-" prints to
 stdout. By default, this analysis is not run.

.. option:: -analysis-inconsistencies-output-file=</path/to/file>

 If non-empty, write inconsistencies found during analysis to this file. `-`
 prints to stdout. By default, this analysis is not run.

.. option:: -analysis-clustering=[dbscan,naive]

 Specify the clustering algorithm to use. By default DBSCAN will be used.
 The naive clustering algorithm is better suited to further work on the
 `-analysis-inconsistencies-output-file=` output: it creates one cluster
 per opcode and checks that the cluster is stable (all points are neighbours).

.. option:: -analysis-numpoints=<dbscan numPoints parameter>

 Specify the numPoints parameter to be used for DBSCAN clustering
 (`analysis` mode, DBSCAN only).

.. option:: -analysis-clustering-epsilon=<dbscan epsilon parameter>

 Specify the epsilon parameter used for clustering of benchmark points
 (`analysis` mode).

.. option:: -analysis-inconsistency-epsilon=<epsilon>

 Specify the epsilon parameter used for detection of when the cluster
 is different from the LLVM schedule profile values (`analysis` mode).

.. option:: -analysis-display-unstable-clusters

 If there is more than one benchmark for an opcode, said benchmarks may end up
 not being clustered into the same cluster if the measured performance
 characteristics are different. By default all such opcodes are filtered out.
 This flag will instead show only such unstable opcodes.

.. option:: -ignore-invalid-sched-class=false

 If set, ignore instructions that do not have a sched class (class idx = 0).

.. option:: -mcpu=<cpu name>

 If set, measure the cpu characteristics using the counters for this CPU. This
 is useful when creating new sched models (the host CPU is unknown to LLVM).

EXIT STATUS
-----------

:program:`llvm-exegesis` returns 0 on success. Otherwise, an error message is
printed to standard error, and the tool returns a non-zero value.