Blame - llvm/docs/CommandGuide/llvm-exegesis.rst - toolchain/llvm-project


Clement Courbet	ac74acd	2018-04-04 11:37:06 +0000	[diff] [blame]	1	llvm-exegesis - LLVM Machine Instruction Benchmark
				2	==================================================
				3
James Henderson	a056684	2019-06-27 13:24:46 +0000	[diff] [blame]	4	.. program:: llvm-exegesis
				5
Clement Courbet	ac74acd	2018-04-04 11:37:06 +0000	[diff] [blame]	6	SYNOPSIS
				7	--------
				8
				9	:program:`llvm-exegesis` [options]
				10
				11	DESCRIPTION
				12	-----------
				13
				14	:program:`llvm-exegesis` is a benchmarking tool that uses information available
Clement Courbet	362653f	2019-01-30 16:02:20 +0000	[diff] [blame]	15	in LLVM to measure host machine instruction characteristics like latency,
				16	throughput, or port decomposition.
Clement Courbet	ac74acd	2018-04-04 11:37:06 +0000	[diff] [blame]	17
				18	Given an LLVM opcode name and a benchmarking mode, :program:`llvm-exegesis`
				19	generates a code snippet that makes execution as serial (resp. as parallel) as
Clement Courbet	362653f	2019-01-30 16:02:20 +0000	[diff] [blame]	20	possible so that we can measure the latency (resp. inverse throughput/uop decomposition)
				21	of the instruction.
Clement Courbet	ac74acd	2018-04-04 11:37:06 +0000	[diff] [blame]	22	The code snippet is jitted and executed on the host subtarget. The time taken
				23	(resp. resource usage) is measured using hardware performance counters. The
				24	result is printed out as YAML to the standard output.
				25
				26	The main goal of this tool is to automatically (in)validate LLVM's TableGen
Clement Courbet	5ec03cd	2018-05-18 12:33:57 +0000	[diff] [blame]	27	scheduling models. To that end, we also provide analysis of the results.
				28
Clement Courbet	78b2e73	2018-09-25 07:31:44 +0000	[diff] [blame]	29	:program:`llvm-exegesis` can also benchmark arbitrary user-provided code
				30	snippets.
				31
				32	EXAMPLE 1: benchmarking instructions
				33	------------------------------------
Clement Courbet	5ec03cd	2018-05-18 12:33:57 +0000	[diff] [blame]	34
				35	Assume you have an X86-64 machine. To measure the latency of a single
				36	instruction, run:
				37
				38	.. code-block:: bash
				39
				40	$ llvm-exegesis -mode=latency -opcode-name=ADD64rr
				41
Clement Courbet	362653f	2019-01-30 16:02:20 +0000	[diff] [blame]	42	Measuring the uop decomposition or inverse throughput of an instruction works similarly:
Clement Courbet	5ec03cd	2018-05-18 12:33:57 +0000	[diff] [blame]	43
				44	.. code-block:: bash
				45
				46	$ llvm-exegesis -mode=uops -opcode-name=ADD64rr
Clement Courbet	362653f	2019-01-30 16:02:20 +0000	[diff] [blame]	47	$ llvm-exegesis -mode=inverse_throughput -opcode-name=ADD64rr
				48
Clement Courbet	5ec03cd	2018-05-18 12:33:57 +0000	[diff] [blame]	49
				50	The output is a YAML document (the default is to write to stdout, but you can
				51	redirect the output to a file using `-benchmarks-file`):
				52
				53	.. code-block:: none
				54
				55	---
				56	key:
				57	  opcode_name: ADD64rr
				58	  mode: latency
				59	  config: ''
				60	cpu_name: haswell
				61	llvm_triple: x86_64-unknown-linux-gnu
				62	num_repetitions: 10000
				63	measurements:
				64	  - { key: latency, value: 1.0058, debug_string: '' }
				65	error: ''
				66	info: 'explicit self cycles, selecting one aliasing configuration.
				67	Snippet:
				68	ADD64rr R8, R8, R10
				69	'
				70	...
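
The measured value can be pulled out of this YAML with standard text tools. The following is a minimal sketch, assuming the output layout shown above (an `opcode_name` field and a one-line `latency` measurement entry). The function name `summarize_latency` is made up for illustration and is not part of :program:`llvm-exegesis`.

```shell
# Print "<opcode_name> <latency>" for each benchmark in a YAML stream on stdin.
# Assumes the layout shown above; other output layouts may need a real YAML parser.
summarize_latency() {
  awk '
    /opcode_name:/ { opcode = $2 }          # remember the current opcode
    /key: latency,/ {
      for (i = 1; i <= NF; i++)
        if ($i == "value:") {
          v = $(i + 1)
          sub(/,$/, "", v)                  # drop the trailing comma
          print opcode, v
        }
    }
  '
}
```

For example, `llvm-exegesis -mode=latency -opcode-name=ADD64rr | summarize_latency` would print one opcode/latency pair per benchmark.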
				71
				72	To measure the latency of all instructions for the host architecture, run:
				73
				74	.. code-block:: bash
				75
				76	#!/bin/bash
Clement Courbet	6eb680a	2018-06-01 14:49:06 +0000	[diff] [blame]	77	readonly INSTRUCTIONS=$(($(grep INSTRUCTION_LIST_END build/lib/Target/X86/X86GenInstrInfo.inc | cut -f2 -d=) - 1))
Clement Courbet	5ec03cd	2018-05-18 12:33:57 +0000	[diff] [blame]	78	for INSTRUCTION in $(seq 1 ${INSTRUCTIONS});
				79	do
				80	  ./build/bin/llvm-exegesis -mode=latency -opcode-index=${INSTRUCTION} | sed -n '/---/,$p'
				81	done
				82
				83	Alternatively, passing `-opcode-index=-1` measures every existing opcode in a single run (see OPTIONS).
				84
Clement Courbet	78b2e73	2018-09-25 07:31:44 +0000	[diff] [blame]	85
				86	EXAMPLE 2: benchmarking a custom code snippet
				87	---------------------------------------------
				88
				89	To measure the latency/uops of a custom piece of code, you can specify the
				90	`snippets-file` option (`-` reads from standard input).
				91
				92	.. code-block:: bash
				93
				94	$ echo "vzeroupper" | llvm-exegesis -mode=uops -snippets-file=-
				95
				96	Real-life code snippets typically depend on registers or memory.
				97	:program:`llvm-exegesis` checks the liveness of registers (i.e. any register
				98	use has a corresponding def or is a "live in"). If your code depends on the
				99	value of some registers, you have two options:
Clement Courbet	86ecf46	2018-09-25 07:48:38 +0000	[diff] [blame]	100
				101	- Mark the register as requiring a definition. :program:`llvm-exegesis` will
				102	automatically assign a value to the register. This can be done using the
				103	directive `LLVM-EXEGESIS-DEFREG <reg name> <hex_value>`, where `<hex_value>`
				104	is a bit pattern used to fill `<reg name>`. If `<hex_value>` is smaller than
				105	the register width, it will be sign-extended.
				106	- Mark the register as a "live in". :program:`llvm-exegesis` will benchmark
				107	using whatever value was in this register on entry. This can be done using
				108	the directive `LLVM-EXEGESIS-LIVEIN <reg name>`.
Clement Courbet	78b2e73	2018-09-25 07:31:44 +0000	[diff] [blame]	109
				110	For example, the following code snippet depends on the values of XMM1 (which
				111	will be set by the tool) and the memory buffer passed in RDI (live in).
				112
				113	.. code-block:: none
				114
				115	# LLVM-EXEGESIS-LIVEIN RDI
				116	# LLVM-EXEGESIS-DEFREG XMM1 42
				117	vmulps (%rdi), %xmm1, %xmm2
				118	vhaddps %xmm2, %xmm2, %xmm3
				119	addq $0x10, %rdi
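
One way to run the snippet above is to save it to a file and pass that file with `-snippets-file`. This is a sketch only: the temporary file path is arbitrary, `-mode=latency` is chosen purely for illustration, and the benchmark is skipped when :program:`llvm-exegesis` is not on `PATH`.

```shell
# Write the annotated snippet to a temporary file (the location is arbitrary).
snippet_file=$(mktemp)
cat > "$snippet_file" <<'EOF'
# LLVM-EXEGESIS-LIVEIN RDI
# LLVM-EXEGESIS-DEFREG XMM1 42
vmulps (%rdi), %xmm1, %xmm2
vhaddps %xmm2, %xmm2, %xmm3
addq $0x10, %rdi
EOF

# Run the benchmark only if the tool is available.
if command -v llvm-exegesis >/dev/null 2>&1; then
  llvm-exegesis -mode=latency -snippets-file="$snippet_file" \
    || echo "benchmark failed" >&2
fi
```

The quoted heredoc delimiter (`'EOF'`) keeps the shell from expanding `$0x10` and `%rdi` inside the snippet.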
				120
				121
				122	EXAMPLE 3: analysis
				123	-------------------
Clement Courbet	5ec03cd	2018-05-18 12:33:57 +0000	[diff] [blame]	124
				125	Assuming you have a set of benchmarked instructions (either latency or uops) as
				126	YAML in file `/tmp/benchmarks.yaml`, you can analyze the results using the
				127	following command:
				128
				129	.. code-block:: bash
				130
				131	$ llvm-exegesis -mode=analysis \
				132	-benchmarks-file=/tmp/benchmarks.yaml \
				133	-analysis-clusters-output-file=/tmp/clusters.csv \
Simon Pilgrim	c4976f6	2018-09-27 13:49:52 +0000	[diff] [blame]	134	-analysis-inconsistencies-output-file=/tmp/inconsistencies.html
Clement Courbet	5ec03cd	2018-05-18 12:33:57 +0000	[diff] [blame]	135
				136	This will group the instructions into clusters with the same performance
				137	characteristics. The clusters will be written out to `/tmp/clusters.csv` in the
				138	following format:
				139
				140	.. code-block:: none
				141
				142	cluster_id,opcode_name,config,sched_class
				143	...
				144	2,ADD32ri8_DB,,WriteALU,1.00
				145	2,ADD32ri_DB,,WriteALU,1.01
				146	2,ADD32rr,,WriteALU,1.01
				147	2,ADD32rr_DB,,WriteALU,1.00
				148	2,ADD32rr_REV,,WriteALU,1.00
				149	2,ADD64i32,,WriteALU,1.01
				150	2,ADD64ri32,,WriteALU,1.01
				151	2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
				152	2,VPADDQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.02
				153	2,VPSUBQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.01
				154	2,ADD64ri8,,WriteALU,1.00
				155	2,SETBr,,WriteSETCC,1.01
				156	...
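
A cluster that mixes several scheduling classes is a natural starting point for inspection. The sketch below counts opcodes per cluster and scheduling class. It assumes the row layout shown above (cluster id in the first column, scheduling class in the fourth), and `count_by_sched_class` is a hypothetical helper name.

```shell
# Count opcodes per (cluster_id, sched_class) pair in a clusters CSV on stdin.
# Output format: cluster_id,sched_class,count
count_by_sched_class() {
  awk -F, '
    NR == 1 { next }              # skip the header row
    NF < 4  { next }              # skip blank or malformed rows
    { count[$1 "," $4]++ }
    END { for (k in count) print k "," count[k] }
  ' | LC_ALL=C sort
}
```

It can be used as `count_by_sched_class < /tmp/clusters.csv`.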
				157
				158	:program:`llvm-exegesis` will also analyze the clusters to point out
Clement Courbet	488ebfb	2018-05-22 13:36:29 +0000	[diff] [blame]	159	inconsistencies in the scheduling information. The output is an HTML file. For
Clement Courbet	2637e5f	2018-05-24 10:47:05 +0000	[diff] [blame]	160	example, `/tmp/inconsistencies.html` will contain messages like the following:
Clement Courbet	5ec03cd	2018-05-18 12:33:57 +0000	[diff] [blame]	161
Clement Courbet	2637e5f	2018-05-24 10:47:05 +0000	[diff] [blame]	162	.. image:: llvm-exegesis-analysis.png
				163	:align: center
Clement Courbet	5ec03cd	2018-05-18 12:33:57 +0000	[diff] [blame]	164
				165	Note that the scheduling class names will be resolved only when
				166	:program:`llvm-exegesis` is compiled in debug mode; otherwise only the class ID will
				167	be shown. This does not invalidate any of the analysis results though.
				168
Clement Courbet	ac74acd	2018-04-04 11:37:06 +0000	[diff] [blame]	169	OPTIONS
				170	-------
				171
				172	.. option:: -help
				173
				174	Print a summary of command line options.
				175
				176	.. option:: -opcode-index=<LLVM opcode index>
				177
Roman Lebedev	cc5549d	2020-02-13 12:45:15 +0300	[diff] [blame]	178	Specify the opcode to measure, by index. Specifying `-1` will result
				179	in measuring every existing opcode. See example 1 for details.
Clement Courbet	78b2e73	2018-09-25 07:31:44 +0000	[diff] [blame]	180	Either `opcode-index`, `opcode-name` or `snippets-file` must be set.
Clement Courbet	ac74acd	2018-04-04 11:37:06 +0000	[diff] [blame]	181
Clement Courbet	f973c2d	2018-10-17 15:04:15 +0000	[diff] [blame]	182	.. option:: -opcode-name=<opcode name 1>,<opcode name 2>,...
Clement Courbet	ac74acd	2018-04-04 11:37:06 +0000	[diff] [blame]	183
Clement Courbet	f973c2d	2018-10-17 15:04:15 +0000	[diff] [blame]	184	Specify the opcode to measure, by name. Several opcodes can be specified as
				185	a comma-separated list. See example 1 for details.
Clement Courbet	78b2e73	2018-09-25 07:31:44 +0000	[diff] [blame]	186	Either `opcode-index`, `opcode-name` or `snippets-file` must be set.
				187
Clement Courbet	89a6647	2020-02-06 12:08:02 +0100	[diff] [blame]	188	.. option:: -snippets-file=<filename>
Clement Courbet	78b2e73	2018-09-25 07:31:44 +0000	[diff] [blame]	189
Clement Courbet	89a6647	2020-02-06 12:08:02 +0100	[diff] [blame]	190	Specify the custom code snippet to measure. See example 2 for details.
				191	Either `opcode-index`, `opcode-name` or `snippets-file` must be set.
Clement Courbet	ac74acd	2018-04-04 11:37:06 +0000	[diff] [blame]	192
Clement Courbet	362653f	2019-01-30 16:02:20 +0000	[diff] [blame]	193	.. option:: -mode=[latency|uops|inverse_throughput|analysis]
Clement Courbet	ac74acd	2018-04-04 11:37:06 +0000	[diff] [blame]	194
Vy Nguyen	ee7caa7	2020-07-27 12:38:05 -0400	[diff] [blame^]	195	Specify the run mode. Note that some modes have additional requirements and options.
Clement Courbet	ac74acd	2018-04-04 11:37:06 +0000	[diff] [blame]	196
Vy Nguyen	ee7caa7	2020-07-27 12:38:05 -0400	[diff] [blame^]	197	`latency` mode can make use of either RDTSC or LBR.
				198	`latency[LBR]` is only available on X86 (at least `Skylake`).
				199	To run in this mode, specify a positive value for `-x86-lbr-sample-period` and pass `-repetition-mode=loop`.
				200
				201	In `analysis` mode, you also need to specify at least one of
				202	`-analysis-clusters-output-file=` and `-analysis-inconsistencies-output-file=`.
				203
				204	.. option:: -x86-lbr-sample-period=<nBranches/sample>
				205
				206	Specify the LBR sampling period - how many branches before we take a sample.
				207	When a positive value is specified for this option and when the mode is `latency`,
				208	we will use LBRs for measuring.
				209	When choosing the sampling period, prefer a small value, but note that throttling
				210	can occur if sampling is too frequent. A prime number should be used to
				211	avoid consistently skipping certain blocks.
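
Since a prime sampling period is recommended, a small helper can round a chosen value up to the next prime. This is an illustrative sketch: `is_prime` and `next_prime` are hypothetical helpers, and the starting value of 500 is arbitrary rather than a recommended period.

```shell
# Return success if $1 is a prime number (simple trial division).
is_prime() {
  n=$1
  [ "$n" -ge 2 ] || return 1
  i=2
  while [ $((i * i)) -le "$n" ]; do
    [ $((n % i)) -ne 0 ] || return 1
    i=$((i + 1))
  done
  return 0
}

# Print the smallest prime greater than or equal to $1.
next_prime() {
  candidate=$1
  until is_prime "$candidate"; do
    candidate=$((candidate + 1))
  done
  echo "$candidate"
}

# Build (but do not run) an LBR latency command line using that period.
period=$(next_prime 500)
echo "llvm-exegesis -mode=latency -opcode-name=ADD64rr -x86-lbr-sample-period=${period} -repetition-mode=loop"
```

The command is only echoed so that it can be reviewed first. Running it requires an X86 host with LBR support, as noted under `-mode`.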
				212
Roman Lebedev	de22d71	2020-04-02 09:28:35 +0300	[diff] [blame]	213	.. option:: -repetition-mode=[duplicate|loop|min]
Clement Courbet	89a6647	2020-02-06 12:08:02 +0100	[diff] [blame]	214
				215	Specify the repetition mode. `duplicate` will create a large, straight line
				216	basic block with `num-repetitions` copies of the snippet. `loop` will wrap
				217	the snippet in a loop which will be run `num-repetitions` times. The `loop`
				218	mode tends to better hide the effects of the CPU frontend on architectures
				219	that cache decoded instructions, but consumes a register for counting
Roman Lebedev	de22d71	2020-04-02 09:28:35 +0300	[diff] [blame]	220	iterations. If performing an analysis over many opcodes, it may be best
				221	to instead use the `min` mode, which runs each of the other modes and reports
				222	the minimal measured result.
Clement Courbet	89a6647	2020-02-06 12:08:02 +0100	[diff] [blame]	223
Clement Courbet	2cd0f28	2019-10-08 14:30:24 +0000	[diff] [blame]	224	.. option:: -num-repetitions=<Number of repetitions>
Clement Courbet	ac74acd	2018-04-04 11:37:06 +0000	[diff] [blame]	225
				226	Specify the number of repetitions of the asm snippet.
				227	Higher values lead to more accurate measurements but lengthen the benchmark.
				228
Clement Courbet	2cd0f28	2019-10-08 14:30:24 +0000	[diff] [blame]	229	.. option:: -max-configs-per-opcode=<value>
				230
				231	Specify the maximum number of configurations that can be generated for each opcode.
				232	By default this is `1`, meaning that we assume that a single measurement is
				233	enough to characterize an opcode. This might not be true of all instructions:
				234	for example, the performance characteristics of the LEA instruction on X86
				235	depend on the values of assigned registers and immediates. Setting a value of
				236	`-max-configs-per-opcode` larger than `1` allows `llvm-exegesis` to explore
				237	more configurations to discover if some register or immediate assignments
				238	lead to different performance characteristics.
				239
				240
Simon Pilgrim	a563843	2018-06-18 20:05:02 +0000	[diff] [blame]	241	.. option:: -benchmarks-file=</path/to/file>
Clement Courbet	5ec03cd	2018-05-18 12:33:57 +0000	[diff] [blame]	242
Clement Courbet	362653f	2019-01-30 16:02:20 +0000	[diff] [blame]	243	File to read (`analysis` mode) or write (`latency`/`uops`/`inverse_throughput`
				244	modes) benchmark results. "-" uses stdin/stdout.
Clement Courbet	5ec03cd	2018-05-18 12:33:57 +0000	[diff] [blame]	245
				246	.. option:: -analysis-clusters-output-file=</path/to/file>
				247
				248	If provided, write the analysis clusters as CSV to this file. "-" prints to
Roman Lebedev	21193f4	2019-02-04 09:12:08 +0000	[diff] [blame]	249	stdout. By default, this analysis is not run.
Clement Courbet	5ec03cd	2018-05-18 12:33:57 +0000	[diff] [blame]	250
				251	.. option:: -analysis-inconsistencies-output-file=</path/to/file>
				252
				253	If non-empty, write inconsistencies found during analysis to this file. `-`
Roman Lebedev	21193f4	2019-02-04 09:12:08 +0000	[diff] [blame]	254	prints to stdout. By default, this analysis is not run.
Clement Courbet	5ec03cd	2018-05-18 12:33:57 +0000	[diff] [blame]	255
Roman Lebedev	c2423fe	2019-03-28 08:55:01 +0000	[diff] [blame]	256	.. option:: -analysis-clustering=[dbscan,naive]
				257
				258	Specify the clustering algorithm to use. By default DBSCAN will be used.
				259	The naive clustering algorithm is better suited to further work on the
				260	`-analysis-inconsistencies-output-file=` output: it creates one cluster
				261	per opcode and checks that the cluster is stable (all points are neighbours).
				262
Clement Courbet	5ec03cd	2018-05-18 12:33:57 +0000	[diff] [blame]	263	.. option:: -analysis-numpoints=<dbscan numPoints parameter>
				264
				265	Specify the numPoints parameter to be used for DBSCAN clustering
Roman Lebedev	c2423fe	2019-03-28 08:55:01 +0000	[diff] [blame]	266	(`analysis` mode, DBSCAN only).
Clement Courbet	5ec03cd	2018-05-18 12:33:57 +0000	[diff] [blame]	267
Roman Lebedev	542e5d7	2019-02-25 09:36:12 +0000	[diff] [blame]	268	.. option:: -analysis-clustering-epsilon=<dbscan epsilon parameter>
Clement Courbet	5ec03cd	2018-05-18 12:33:57 +0000	[diff] [blame]	269
Roman Lebedev	542e5d7	2019-02-25 09:36:12 +0000	[diff] [blame]	270	Specify the epsilon parameter used for clustering of benchmark points
Clement Courbet	5ec03cd	2018-05-18 12:33:57 +0000	[diff] [blame]	271	(`analysis` mode).
				272
Roman Lebedev	542e5d7	2019-02-25 09:36:12 +0000	[diff] [blame]	273	.. option:: -analysis-inconsistency-epsilon=<epsilon>
				274
				275	Specify the epsilon parameter used for detection of when the cluster
				276	is different from the LLVM schedule profile values (`analysis` mode).
				277
Roman Lebedev	6971639	2019-02-20 09:14:04 +0000	[diff] [blame]	278	.. option:: -analysis-display-unstable-clusters
				279
				280	If there is more than one benchmark for an opcode, said benchmarks may end up
				281	not being clustered into the same cluster if the measured performance
				282	characteristics are different. By default all such opcodes are filtered out.
				283	This flag will instead show only such unstable opcodes.
				284
Simon Pilgrim	a563843	2018-06-18 20:05:02 +0000	[diff] [blame]	285	.. option:: -ignore-invalid-sched-class=false
Clement Courbet	e752fd6	2018-06-18 11:27:47 +0000	[diff] [blame]	286
Simon Pilgrim	a563843	2018-06-18 20:05:02 +0000	[diff] [blame]	287	If set, ignore instructions that do not have a sched class (class idx = 0).
Clement Courbet	e752fd6	2018-06-18 11:27:47 +0000	[diff] [blame]	288
Guillaume Chatelet	848df5b	2019-04-05 15:18:59 +0000	[diff] [blame]	289	.. option:: -mcpu=<cpu name>
Clement Courbet	41c8af3	2018-10-25 07:44:01 +0000	[diff] [blame]	290
Guillaume Chatelet	848df5b	2019-04-05 15:18:59 +0000	[diff] [blame]	291	If set, measure the CPU characteristics using the counters for this CPU. This
				292	is useful when creating new sched models (when the host CPU is unknown to LLVM).
				293
				294	.. option:: --dump-object-to-disk=true
				295
				296	By default, llvm-exegesis will dump the generated code to a temporary file to
				297	enable code inspection. You may disable it to speed up the execution and save
				298	disk space.
Clement Courbet	ac74acd	2018-04-04 11:37:06 +0000	[diff] [blame]	299
				300	EXIT STATUS
				301	-----------
				302
				303	:program:`llvm-exegesis` returns 0 on success. Otherwise, an error message is
				304	printed to standard error, and the tool returns a non-zero value.