llvm-exegesis - LLVM Machine Instruction Benchmark
==================================================

SYNOPSIS
--------

:program:`llvm-exegesis` [options]

DESCRIPTION
-----------

:program:`llvm-exegesis` is a benchmarking tool that uses information available
in LLVM to measure host machine instruction characteristics like latency or port
decomposition.

Given an LLVM opcode name and a benchmarking mode, :program:`llvm-exegesis`
generates a code snippet that makes execution as serial (resp. as parallel) as
possible so that we can measure the latency (resp. uop decomposition) of the
instruction.
The code snippet is jitted and executed on the host subtarget. The time taken
(resp. resource usage) is measured using hardware performance counters. The
result is printed out as YAML to the standard output.

The main goal of this tool is to automatically (in)validate LLVM's TableGen
scheduling models. To that end, we also provide analysis of the results.

:program:`llvm-exegesis` can also benchmark arbitrary user-provided code
snippets.

EXAMPLE 1: benchmarking instructions
------------------------------------

Assume you have an X86-64 machine. To measure the latency of a single
instruction, run:

.. code-block:: bash

  $ llvm-exegesis -mode=latency -opcode-name=ADD64rr

Measuring the uop decomposition of an instruction works similarly:

.. code-block:: bash

  $ llvm-exegesis -mode=uops -opcode-name=ADD64rr

The output is a YAML document (the default is to write to stdout, but you can
redirect the output to a file using `-benchmarks-file`):

.. code-block:: none

  ---
  key:
    opcode_name: ADD64rr
    mode: latency
    config: ''
  cpu_name: haswell
  llvm_triple: x86_64-unknown-linux-gnu
  num_repetitions: 10000
  measurements:
    - { key: latency, value: 1.0058, debug_string: '' }
  error: ''
  info: 'explicit self cycles, selecting one aliasing configuration.
  Snippet:
  ADD64rr R8, R8, R10
  '
  ...
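
When many such reports are collected, it can be handy to pull out just the
opcode and the measured value. The following is a minimal sketch, not a YAML
parser: it assumes the report has exactly the layout shown above, and it uses a
temporary file (a stand-in for whatever path you passed to `-benchmarks-file`)
filled with the sample report so that it runs without the tool.

.. code-block:: bash

  # Sample report copied from the example above (stand-in for a real file).
  report=$(mktemp)
  cat > "$report" <<'EOF'
  ---
  key:
    opcode_name: ADD64rr
    mode: latency
    config: ''
  cpu_name: haswell
  llvm_triple: x86_64-unknown-linux-gnu
  num_repetitions: 10000
  measurements:
    - { key: latency, value: 1.0058, debug_string: '' }
  error: ''
  info: 'explicit self cycles, selecting one aliasing configuration.
  Snippet:
  ADD64rr R8, R8, R10
  '
  ...
  EOF

  # Text after "opcode_name:" on the line that carries it.
  opcode=$(sed -n 's/^ *opcode_name: *//p' "$report")
  # Number that follows "value:" in the latency measurement entry.
  latency=$(sed -n 's/^ *- { key: latency, value: \([0-9.]*\),.*/\1/p' "$report")

  echo "$opcode $latency"
  rm -f "$report"

For anything beyond a quick look, a real YAML parser is the safer choice, since
the patterns above break as soon as the layout changes.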

To measure the latency of all instructions for the host architecture, run:

.. code-block:: bash

  #!/bin/bash
  # The value after '=' on the INSTRUCTION_LIST_END line is used as the opcode
  # count; the loop visits opcode indices 1 to count - 1.
  readonly INSTRUCTIONS=$(($(grep INSTRUCTION_LIST_END build/lib/Target/X86/X86GenInstrInfo.inc | cut -f2 -d=) - 1))
  for INSTRUCTION in $(seq 1 ${INSTRUCTIONS});
  do
    # Keep only the YAML document, dropping anything printed before '---'.
    ./build/bin/llvm-exegesis -mode=latency -opcode-index=${INSTRUCTION} | sed -n '/---/,$p'
  done

FIXME: Provide an :program:`llvm-exegesis` option to test all instructions.


EXAMPLE 2: benchmarking a custom code snippet
---------------------------------------------

To measure the latency/uops of a custom piece of code, you can specify the
`snippets-file` option (`-` reads from standard input).

.. code-block:: bash

  $ echo "vzeroupper" | llvm-exegesis -mode=uops -snippets-file=-

Real-life code snippets typically depend on registers or memory.
:program:`llvm-exegesis` checks the liveness of registers (i.e. any register
use has a corresponding def or is a "live in"). If your code depends on the
value of some registers, you have two options:

- Mark the register as requiring a definition. :program:`llvm-exegesis` will
  automatically assign a value to the register. This can be done using the
  directive `LLVM-EXEGESIS-DEFREG <reg name> <hex_value>`, where `<hex_value>`
  is a bit pattern used to fill `<reg name>`. If `<hex_value>` is smaller than
  the register width, it will be sign-extended.
- Mark the register as a "live in". :program:`llvm-exegesis` will benchmark
  using whatever value was in this register on entry. This can be done using
  the directive `LLVM-EXEGESIS-LIVEIN <reg name>`.

For example, the following code snippet depends on the values of XMM1 (which
will be set by the tool) and the memory buffer passed in RDI (live in).

.. code-block:: none

  # LLVM-EXEGESIS-LIVEIN RDI
  # LLVM-EXEGESIS-DEFREG XMM1 42
  vmulps (%rdi), %xmm1, %xmm2
  vhaddps %xmm2, %xmm2, %xmm3
  addq $0x10, %rdi
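
A snippet like this one can also be stored in a file and passed through
`-snippets-file`. The sketch below is only an illustration: the temporary file
name is arbitrary, the choice of `-mode=latency` is an assumption, and the tool
is invoked only if it is found on the `PATH`.

.. code-block:: bash

  # Write the annotated snippet from the example above to a temporary file.
  snippet=$(mktemp)
  cat > "$snippet" <<'EOF'
  # LLVM-EXEGESIS-LIVEIN RDI
  # LLVM-EXEGESIS-DEFREG XMM1 42
  vmulps (%rdi), %xmm1, %xmm2
  vhaddps %xmm2, %xmm2, %xmm3
  addq $0x10, %rdi
  EOF

  # Count the register directives as a quick sanity check before benchmarking.
  directives=$(grep -c '^# LLVM-EXEGESIS-' "$snippet")
  echo "snippet declares $directives register directive(s)"

  # Benchmark the snippet only when llvm-exegesis is available.
  if command -v llvm-exegesis >/dev/null 2>&1; then
    llvm-exegesis -mode=latency -snippets-file="$snippet"
  fi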


EXAMPLE 3: analysis
-------------------

Assuming you have a set of benchmarked instructions (either latency or uops) as
YAML in file `/tmp/benchmarks.yaml`, you can analyze the results using the
following command:

.. code-block:: bash

  $ llvm-exegesis -mode=analysis \
      -benchmarks-file=/tmp/benchmarks.yaml \
      -analysis-clusters-output-file=/tmp/clusters.csv \
      -analysis-inconsistencies-output-file=/tmp/inconsistencies.html

This will group the instructions into clusters with the same performance
characteristics. The clusters will be written out to `/tmp/clusters.csv` in the
following format:

.. code-block:: none

  cluster_id,opcode_name,config,sched_class
  ...
  2,ADD32ri8_DB,,WriteALU,1.00
  2,ADD32ri_DB,,WriteALU,1.01
  2,ADD32rr,,WriteALU,1.01
  2,ADD32rr_DB,,WriteALU,1.00
  2,ADD32rr_REV,,WriteALU,1.00
  2,ADD64i32,,WriteALU,1.01
  2,ADD64ri32,,WriteALU,1.01
  2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
  2,VPADDQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.02
  2,VPSUBQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.01
  2,ADD64ri8,,WriteALU,1.00
  2,SETBr,,WriteSETCC,1.01
  ...
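
The CSV is easy to summarize with standard tools. As a rough sketch (this is
not the tool's own inconsistency analysis), the following counts how many
opcodes of each scheduling class landed in each cluster. It assumes the column
order shown above and runs on a four-row excerpt of the sample written to a
temporary file, rather than on a real `/tmp/clusters.csv`.

.. code-block:: bash

  # Four rows taken from the sample above, plus the header line.
  clusters=$(mktemp)
  cat > "$clusters" <<'EOF'
  cluster_id,opcode_name,config,sched_class
  2,ADD32rr,,WriteALU,1.01
  2,ADD64ri8,,WriteALU,1.00
  2,SETBr,,WriteSETCC,1.01
  2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
  EOF

  # Skip the header, then count rows per (cluster_id, sched_class) pair.
  summary=$(awk -F, 'NR > 1 { count[$1 "," $4]++ }
                     END { for (k in count) print k "," count[k] }' "$clusters" |
            LC_ALL=C sort)

  echo "$summary"
  rm -f "$clusters"

Each output line has the form `cluster_id,sched_class,count`. A scheduling
class whose opcodes are spread over several clusters is a natural candidate for
a closer look in the HTML report.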

:program:`llvm-exegesis` will also analyze the clusters to point out
inconsistencies in the scheduling information. The output is an HTML file. For
example, `/tmp/inconsistencies.html` will contain messages like the following:

.. image:: llvm-exegesis-analysis.png
  :align: center

Note that the scheduling class names will be resolved only when
:program:`llvm-exegesis` is compiled in debug mode; otherwise only the class id
will be shown. This does not invalidate any of the analysis results, though.


OPTIONS
-------

.. option:: -help

 Print a summary of command line options.

.. option:: -opcode-index=<LLVM opcode index>

 Specify the opcode to measure, by index. See example 1 for details.
 One of `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -opcode-name=<LLVM opcode name>

 Specify the opcode to measure, by name. See example 1 for details.
 One of `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -snippets-file=<filename>

 Specify the custom code snippet to measure. See example 2 for details.
 One of `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -mode=[latency|uops|analysis]

 Specify the run mode.

.. option:: -num-repetitions=<Number of repetitions>

 Specify the number of repetitions of the asm snippet.
 Higher values lead to more accurate measurements but lengthen the benchmark.

.. option:: -benchmarks-file=</path/to/file>

 File to read (`analysis` mode) or write (`latency`/`uops` modes) benchmark
 results. `-` uses stdin/stdout.

.. option:: -analysis-clusters-output-file=</path/to/file>

 If provided, write the analysis clusters as CSV to this file. `-` prints to
 stdout.

.. option:: -analysis-inconsistencies-output-file=</path/to/file>

 If non-empty, write inconsistencies found during analysis to this file. `-`
 prints to stdout.

.. option:: -analysis-numpoints=<dbscan numPoints parameter>

 Specify the numPoints parameter to be used for DBSCAN clustering
 (`analysis` mode).

.. option:: -analysis-epsilon=<dbscan epsilon parameter>

 Specify the epsilon parameter to be used for DBSCAN clustering
 (`analysis` mode).

.. option:: -ignore-invalid-sched-class=false

 If set, ignore instructions that do not have a sched class (class idx = 0).


EXIT STATUS
-----------

:program:`llvm-exegesis` returns 0 on success. Otherwise, an error message is
printed to standard error, and the tool returns a non-zero value.
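
In scripts, this convention can be checked in the usual shell way. The helper
below is a hypothetical wrapper (the name `run_or_report` is not part of the
tool) that runs any command, reports a non-zero status on standard error, and
passes that status on to the caller.

.. code-block:: bash

  # Run the given command; report and propagate a non-zero exit status.
  run_or_report() {
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
      echo "command failed with status $status: $*" >&2
    fi
    return "$status"
  }

It could be used as
``run_or_report llvm-exegesis -mode=latency -opcode-name=ADD64rr || exit 1``
to stop a batch run at the first failing benchmark.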