llvm-exegesis - LLVM Machine Instruction Benchmark
==================================================

SYNOPSIS
--------

:program:`llvm-exegesis` [*options*]

DESCRIPTION
-----------

:program:`llvm-exegesis` is a benchmarking tool that uses information available
in LLVM to measure host machine instruction characteristics like latency or port
decomposition.

Given an LLVM opcode name and a benchmarking mode, :program:`llvm-exegesis`
generates a code snippet that makes execution as serial (resp. as parallel) as
possible so that we can measure the latency (resp. uop decomposition) of the
instruction.
The code snippet is jitted and executed on the host subtarget. The time taken
(resp. resource usage) is measured using hardware performance counters. The
result is printed out as YAML to the standard output.

The main goal of this tool is to automatically (in)validate LLVM's TableGen
scheduling models. To that end, we also provide analysis of the results.

EXAMPLES: benchmarking
----------------------

Assume you have an X86-64 machine. To measure the latency of a single
instruction, run:

.. code-block:: bash

  $ llvm-exegesis -mode=latency -opcode-name=ADD64rr

Measuring the uop decomposition of an instruction works similarly:

.. code-block:: bash

  $ llvm-exegesis -mode=uops -opcode-name=ADD64rr

The output is a YAML document (the default is to write to stdout, but you can
redirect the output to a file using `-benchmarks-file`):

.. code-block:: none

  ---
  key:
    opcode_name: ADD64rr
    mode: latency
    config: ''
  cpu_name: haswell
  llvm_triple: x86_64-unknown-linux-gnu
  num_repetitions: 10000
  measurements:
    - { key: latency, value: 1.0058, debug_string: '' }
  error: ''
  info: 'explicit self cycles, selecting one aliasing configuration.
  Snippet:
  ADD64rr R8, R8, R10
  '
  ...

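Once a report has been saved with `-benchmarks-file`, plain shell tools suffice
to pull out the measured value. Below is a minimal, hypothetical sketch: the
file name `benchmark.yaml` and its embedded report are assumptions that merely
mirror the format shown above.

```shell
# Hypothetical sketch: extract the measured latency from a saved report.
# 'benchmark.yaml' and its contents are assumptions mirroring the YAML above;
# a real report comes from redirecting llvm-exegesis output.
cat > benchmark.yaml <<'EOF'
---
key:
  opcode_name: ADD64rr
  mode: latency
measurements:
  - { key: latency, value: 1.0058, debug_string: '' }
...
EOF
# Pull the numeric 'value' out of the latency measurement line.
latency=$(sed -n 's/.*key: latency, value: \([0-9.]*\).*/\1/p' benchmark.yaml)
echo "measured latency: ${latency}"
```

The same pattern works on a file holding several concatenated reports, since
each report is a separate YAML document.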
To measure the latency of all instructions for the host architecture, run:

.. code-block:: bash

  #!/bin/bash
  readonly INSTRUCTIONS=$(grep INSTRUCTION_LIST_END build/lib/Target/X86/X86GenInstrInfo.inc | cut -f2 -d=)
  for INSTRUCTION in $(seq 1 ${INSTRUCTIONS});
  do
    ./build/bin/llvm-exegesis -mode=latency -opcode-index=${INSTRUCTION} | sed -n '/---/,$p'
  done

FIXME: Provide an :program:`llvm-exegesis` option to test all instructions.


EXAMPLES: analysis
------------------

Assuming you have a set of benchmarked instructions (either latency or uops) as
YAML in file `/tmp/benchmarks.yaml`, you can analyze the results using the
following command:

.. code-block:: bash

  $ llvm-exegesis -mode=analysis \
    -benchmarks-file=/tmp/benchmarks.yaml \
    -analysis-clusters-output-file=/tmp/clusters.csv \
    -analysis-inconsistencies-output-file=/tmp/inconsistencies.html

This will group the instructions into clusters with the same performance
characteristics. The clusters will be written out to `/tmp/clusters.csv` in the
following format:

.. code-block:: none

  cluster_id,opcode_name,config,sched_class,latency
  ...
  2,ADD32ri8_DB,,WriteALU,1.00
  2,ADD32ri_DB,,WriteALU,1.01
  2,ADD32rr,,WriteALU,1.01
  2,ADD32rr_DB,,WriteALU,1.00
  2,ADD32rr_REV,,WriteALU,1.00
  2,ADD64i32,,WriteALU,1.01
  2,ADD64ri32,,WriteALU,1.01
  2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
  2,VPADDQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.02
  2,VPSUBQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.01
  2,ADD64ri8,,WriteALU,1.00
  2,SETBr,,WriteSETCC,1.01
  ...
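Standard text tools are enough for a quick summary of such a clusters file. A
small sketch follows, assuming a trimmed, illustrative copy of the CSV above
(the file name and rows are assumptions), that counts how many opcodes landed
in each scheduling class:

```shell
# Hypothetical sketch: summarize a clusters CSV by scheduling class.
# 'clusters.csv' and its rows are assumptions based on the example above.
cat > clusters.csv <<'EOF'
cluster_id,opcode_name,config,sched_class,latency
2,ADD32rr,,WriteALU,1.01
2,ADD64ri8,,WriteALU,1.00
2,SETBr,,WriteSETCC,1.01
EOF
# Skip the header line, count rows per sched_class (4th field), sort the output.
awk -F, 'NR > 1 { count[$4]++ } END { for (c in count) print c, count[c] }' clusters.csv | sort
```

For the sample rows above this prints one line per scheduling class with its
opcode count, which makes unexpectedly large or lonely classes easy to spot.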
:program:`llvm-exegesis` will also analyze the clusters to point out
inconsistencies in the scheduling information. The output is an HTML file. For
example, `/tmp/inconsistencies.html` will contain messages like the following:

.. image:: llvm-exegesis-analysis.png
  :align: center

Note that the scheduling class names will be resolved only when
:program:`llvm-exegesis` is compiled in debug mode; otherwise only the class id
is shown. This does not invalidate any of the analysis results.


OPTIONS
-------

.. option:: -help

 Print a summary of command line options.

.. option:: -opcode-index=<LLVM opcode index>

 Specify the opcode to measure, by index.
 Either `opcode-index` or `opcode-name` must be set.

.. option:: -opcode-name=<LLVM opcode name>

 Specify the opcode to measure, by name.
 Either `opcode-index` or `opcode-name` must be set.

.. option:: -mode=[latency|uops|analysis]

 Specify the run mode.

.. option:: -num-repetitions=<number of repetitions>

 Specify the number of repetitions of the asm snippet.
 Higher values lead to more accurate measurements but lengthen the benchmark.

.. option:: -benchmarks-file=</path/to/file>

 File to read (`analysis` mode) or write (`latency`/`uops` modes) benchmark
 results. "-" uses stdin/stdout.

.. option:: -analysis-clusters-output-file=</path/to/file>

 If provided, write the analysis clusters as CSV to this file. "-" prints to
 stdout.

.. option:: -analysis-inconsistencies-output-file=</path/to/file>

 If non-empty, write inconsistencies found during analysis to this file. `-`
 prints to stdout.

.. option:: -analysis-numpoints=<dbscan numPoints parameter>

 Specify the numPoints parameter to be used for DBSCAN clustering
 (`analysis` mode).

.. option:: -analysis-epsilon=<dbscan epsilon parameter>

 Specify the epsilon parameter to be used for DBSCAN clustering
 (`analysis` mode).



EXIT STATUS
-----------

:program:`llvm-exegesis` returns 0 on success. Otherwise, an error message is
printed to standard error, and the tool returns a non-zero value.