llvm-exegesis - LLVM Machine Instruction Benchmark
==================================================

SYNOPSIS
--------

:program:`llvm-exegesis` [*options*]

DESCRIPTION
-----------

:program:`llvm-exegesis` is a benchmarking tool that uses information available
in LLVM to measure host machine instruction characteristics like latency or port
decomposition.

Given an LLVM opcode name and a benchmarking mode, :program:`llvm-exegesis`
generates a code snippet that makes execution as serial (resp. as parallel) as
possible so that we can measure the latency (resp. uop decomposition) of the
instruction.
The code snippet is jitted and executed on the host subtarget. The time taken
(resp. resource usage) is measured using hardware performance counters. The
result is printed out as YAML to the standard output.

The main goal of this tool is to automatically (in)validate LLVM's TableGen
scheduling models. To that end, we also provide analysis of the results.

EXAMPLES: benchmarking
----------------------

Assume you have an X86-64 machine. To measure the latency of a single
instruction, run:

.. code-block:: bash

  $ llvm-exegesis -mode=latency -opcode-name=ADD64rr

Measuring the uop decomposition of an instruction works similarly:

.. code-block:: bash

  $ llvm-exegesis -mode=uops -opcode-name=ADD64rr

The output is a YAML document (the default is to write to stdout, but you can
redirect the output to a file using `-benchmarks-file`):

.. code-block:: none

  ---
  key:
    opcode_name: ADD64rr
    mode: latency
    config: ''
  cpu_name: haswell
  llvm_triple: x86_64-unknown-linux-gnu
  num_repetitions: 10000
  measurements:
    - { key: latency, value: 1.0058, debug_string: '' }
  error: ''
  info: 'explicit self cycles, selecting one aliasing configuration.
  Snippet:
  ADD64rr R8, R8, R10
  '
  ...

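Once a report has been saved with `-benchmarks-file`, plain shell tools suffice
to pull out the measured value. Below is a minimal, hypothetical sketch: the
file name `benchmark.yaml` and its embedded report are assumptions that merely
mirror the format shown above.

```shell
# Hypothetical sketch: extract the measured latency from a saved report.
# 'benchmark.yaml' and its contents are assumptions mirroring the YAML above;
# a real report comes from redirecting llvm-exegesis output.
cat > benchmark.yaml <<'EOF'
---
key:
  opcode_name: ADD64rr
  mode: latency
measurements:
  - { key: latency, value: 1.0058, debug_string: '' }
...
EOF
# Pull the numeric 'value' out of the latency measurement line.
latency=$(sed -n 's/.*key: latency, value: \([0-9.]*\).*/\1/p' benchmark.yaml)
echo "measured latency: ${latency}"
```

The same pattern works on a file holding several concatenated reports, since
each report is a separate YAML document.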
To measure the latency of all instructions for the host architecture, run:

.. code-block:: bash

  #!/bin/bash
  readonly INSTRUCTIONS=$(grep INSTRUCTION_LIST_END build/lib/Target/X86/X86GenInstrInfo.inc | cut -f2 -d=)
  for INSTRUCTION in $(seq 1 ${INSTRUCTIONS});
  do
    ./build/bin/llvm-exegesis -mode=latency -opcode-index=${INSTRUCTION} | sed -n '/---/,$p'
  done

FIXME: Provide an :program:`llvm-exegesis` option to test all instructions.


EXAMPLES: analysis
------------------

Assuming you have a set of benchmarked instructions (either latency or uops) as
YAML in file `/tmp/benchmarks.yaml`, you can analyze the results using the
following command:

.. code-block:: bash

  $ llvm-exegesis -mode=analysis \
    -benchmarks-file=/tmp/benchmarks.yaml \
    -analysis-clusters-output-file=/tmp/clusters.csv \
    -analysis-inconsistencies-output-file=/tmp/inconsistencies.html

This will group the instructions into clusters with the same performance
characteristics. The clusters will be written out to `/tmp/clusters.csv` in the
following format:

.. code-block:: none

  cluster_id,opcode_name,config,sched_class,latency
  ...
  2,ADD32ri8_DB,,WriteALU,1.00
  2,ADD32ri_DB,,WriteALU,1.01
  2,ADD32rr,,WriteALU,1.01
  2,ADD32rr_DB,,WriteALU,1.00
  2,ADD32rr_REV,,WriteALU,1.00
  2,ADD64i32,,WriteALU,1.01
  2,ADD64ri32,,WriteALU,1.01
  2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
  2,VPADDQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.02
  2,VPSUBQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.01
  2,ADD64ri8,,WriteALU,1.00
  2,SETBr,,WriteSETCC,1.01
  ...
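Standard text tools are enough for a quick summary of such a clusters file. A
small sketch follows, assuming a trimmed, illustrative copy of the CSV above
(the file name and rows are assumptions), that counts how many opcodes landed
in each scheduling class:

```shell
# Hypothetical sketch: summarize a clusters CSV by scheduling class.
# 'clusters.csv' and its rows are assumptions based on the example above.
cat > clusters.csv <<'EOF'
cluster_id,opcode_name,config,sched_class,latency
2,ADD32rr,,WriteALU,1.01
2,ADD64ri8,,WriteALU,1.00
2,SETBr,,WriteSETCC,1.01
EOF
# Skip the header line, count rows per sched_class (4th field), sort the output.
awk -F, 'NR > 1 { count[$4]++ } END { for (c in count) print c, count[c] }' clusters.csv | sort
```

For the sample rows above this prints one line per scheduling class with its
opcode count, which makes unexpectedly large or lonely classes easy to spot.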
:program:`llvm-exegesis` will also analyze the clusters to point out
inconsistencies in the scheduling information. The output is an HTML file. For
example, `/tmp/inconsistencies.html` will contain messages like the following:

.. image:: llvm-exegesis-analysis.png
  :align: center

Note that the scheduling class names will be resolved only when
:program:`llvm-exegesis` is compiled in debug mode; otherwise only the class id
is shown. This does not invalidate any of the analysis results.


OPTIONS
-------

.. option:: -help

 Print a summary of command line options.

.. option:: -opcode-index=<LLVM opcode index>

 Specify the opcode to measure, by index.
 Either `opcode-index` or `opcode-name` must be set.

.. option:: -opcode-name=<LLVM opcode name>

 Specify the opcode to measure, by name.
 Either `opcode-index` or `opcode-name` must be set.

.. option:: -mode=[latency|uops|analysis]

 Specify the run mode.

.. option:: -num-repetitions=<number of repetitions>

 Specify the number of repetitions of the asm snippet.
 Higher values lead to more accurate measurements but lengthen the benchmark.

.. option:: -benchmarks-file=</path/to/file>

 File to read (`analysis` mode) or write (`latency`/`uops` modes) benchmark
 results. "-" uses stdin/stdout.

.. option:: -analysis-clusters-output-file=</path/to/file>

 If provided, write the analysis clusters as CSV to this file. "-" prints to
 stdout.

.. option:: -analysis-inconsistencies-output-file=</path/to/file>

 If non-empty, write inconsistencies found during analysis to this file. `-`
 prints to stdout.

.. option:: -analysis-numpoints=<dbscan numPoints parameter>

 Specify the numPoints parameter to be used for DBSCAN clustering
 (`analysis` mode).

.. option:: -analysis-epsilon=<dbscan epsilon parameter>

 Specify the epsilon parameter to be used for DBSCAN clustering
 (`analysis` mode).



EXIT STATUS
-----------

:program:`llvm-exegesis` returns 0 on success. Otherwise, an error message is
printed to standard error, and the tool returns a non-zero value.