llvm-mca - LLVM Machine Code Analyzer
=====================================

SYNOPSIS
--------

:program:`llvm-mca` [*options*] [input]

DESCRIPTION
-----------

:program:`llvm-mca` is a performance analysis tool that uses information
available in LLVM (e.g. scheduling models) to statically measure the performance
of machine code on a specific CPU.

Performance is measured in terms of throughput as well as processor resource
consumption. The tool currently works for processors with an out-of-order
backend, for which there is a scheduling model available in LLVM.

The main goal of this tool is not just to predict the performance of the code
when run on the target, but also to help with diagnosing potential performance
issues.

Given an assembly code sequence, llvm-mca estimates the IPC (Instructions Per
Cycle), as well as hardware resource pressure. The analysis and reporting style
were inspired by the IACA tool from Intel.
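
As a rough guide to reading the IPC figure, it is the number of instructions
executed divided by the number of cycles taken. The numbers below are purely
illustrative and are not output from the tool: a block of 3 instructions
simulated for 100 iterations that completes in 150 cycles would give

.. math::

  \mathrm{IPC} = \frac{\text{instructions}}{\text{cycles}}
               = \frac{3 \times 100}{150} = 2.0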

:program:`llvm-mca` allows the usage of special code comments to mark regions of
the assembly code to be analyzed.  A comment starting with the substring
``LLVM-MCA-BEGIN`` marks the beginning of a code region. A comment starting with
the substring ``LLVM-MCA-END`` marks the end of a code region.  For example:

.. code-block:: none

  # LLVM-MCA-BEGIN My Code Region
  ...
  # LLVM-MCA-END

Multiple regions can be specified provided that they do not overlap.  A code
region can have an optional description. If no user-defined region is specified,
then :program:`llvm-mca` assumes a default region which contains every
instruction in the input file.  Every region is analyzed in isolation, and the
final performance report is the union of all the reports generated for every
code region.

Inline assembly directives may be used from source code to annotate the
assembly text:

.. code-block:: c++

  int foo(int a, int b) {
    __asm volatile("# LLVM-MCA-BEGIN foo");
    a += 42;
    __asm volatile("# LLVM-MCA-END");
    a *= b;
    return a;
  }

For example, you can compile code with clang, output assembly, and pipe it
directly into :program:`llvm-mca` for analysis:

.. code-block:: bash

  $ clang foo.c -O2 -target x86_64-unknown-unknown -S -o - | llvm-mca -mcpu=btver2

Or for Intel syntax:

.. code-block:: bash

  $ clang foo.c -O2 -target x86_64-unknown-unknown -mllvm -x86-asm-syntax=intel -S -o - | llvm-mca -mcpu=btver2
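
The rule that multiple regions must not overlap can be illustrated with a
variation of the ``foo`` example. This is only a sketch: the function name
``bar`` and the region descriptions ``add`` and ``mul`` are made up for
illustration, and it assumes an x86 target, where ``#`` starts an assembly
comment. Each region is closed before the next one begins:

.. code-block:: c++

  // Hypothetical example: two independent, non-overlapping code regions.
  // Assumes an x86 target, where '#' introduces an assembly comment.
  int bar(int a, int b) {
    // First region, described as "add": covers only the addition.
    __asm volatile("# LLVM-MCA-BEGIN add");
    a += 42;
    __asm volatile("# LLVM-MCA-END");

    // Second region, described as "mul": opened only after the first
    // region has been closed, so the two regions do not overlap.
    __asm volatile("# LLVM-MCA-BEGIN mul");
    a *= b;
    __asm volatile("# LLVM-MCA-END");
    return a;
  }

Because the optimizer is free to move or combine the arithmetic around the
markers, it is worth inspecting the emitted assembly to confirm that each
region contains the instructions you expect.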

OPTIONS
-------

If ``input`` is "``-``" or omitted, :program:`llvm-mca` reads from standard
input. Otherwise, it will read from the specified filename.

If the :option:`-o` option is omitted, then :program:`llvm-mca` will send its output
to standard output if the input is from standard input.  If the :option:`-o`
option specifies "``-``", then the output will also be sent to standard output.


.. option:: -help

 Print a summary of command line options.

.. option:: -mtriple=<target triple>

 Specify a target triple string.

.. option:: -march=<arch>

 Specify the architecture for which to analyze the code. It defaults to the
 host default target.

.. option:: -mcpu=<cpuname>

 Specify the processor for which to analyze the code.  By default, the CPU name
 is autodetected from the host.

.. option:: -output-asm-variant=<variant id>

 Specify the output assembly variant for the report generated by the tool.
 On x86, possible values are [0, 1]. A value of 0 selects the AT&T assembly
 format, and a value of 1 selects the Intel assembly format, for the code
 printed out by the tool in the analysis report.

.. option:: -dispatch=<width>

 Specify a different dispatch width for the processor. The dispatch width
 defaults to the 'IssueWidth' field in the processor scheduling model.  If width
 is zero, then the default dispatch width is used.

.. option:: -register-file-size=<size>

 Specify the size of the register file. When specified, this flag limits how
 many temporary registers are available for register renaming purposes. A value
 of zero for this flag means "unlimited number of temporary registers".

.. option:: -iterations=<number of iterations>

 Specify the number of iterations to run. If this flag is set to 0, then the
 tool sets the number of iterations to a default value (i.e. 100).

.. option:: -noalias=<bool>

 If set, the tool assumes that loads and stores don't alias. This is the
 default behavior.

.. option:: -lqueue=<load queue size>

 Specify the size of the load queue in the load/store unit emulated by the tool.
 By default, the tool assumes an unbounded number of entries in the load queue.
 A value of zero for this flag is ignored, and the default load queue size is
 used instead.

.. option:: -squeue=<store queue size>

 Specify the size of the store queue in the load/store unit emulated by the
 tool. By default, the tool assumes an unbounded number of entries in the store
 queue. A value of zero for this flag is ignored, and the default store queue
 size is used instead.

.. option:: -timeline

 Enable the timeline view.

.. option:: -timeline-max-iterations=<iterations>

 Limit the number of iterations to print in the timeline view. By default, the
 timeline view prints information for up to 10 iterations.

.. option:: -timeline-max-cycles=<cycles>

 Limit the number of cycles in the timeline view. By default, the number of
 cycles is set to 80.

.. option:: -resource-pressure

 Enable the resource pressure view. This is enabled by default.

.. option:: -register-file-stats

 Enable register file usage statistics.

.. option:: -dispatch-stats

 Enable extra dispatch statistics. This view collects and analyzes instruction
 dispatch events, as well as static/dynamic dispatch stall events. This view
 is disabled by default.

.. option:: -scheduler-stats

 Enable extra scheduler statistics. This view collects and analyzes instruction
 issue events. This view is disabled by default.

.. option:: -retire-stats

 Enable extra retire control unit statistics. This view is disabled by default.

.. option:: -instruction-info

 Enable the instruction info view. This is enabled by default.

.. option:: -all-stats

 Print all hardware statistics. This enables extra statistics related to the
 dispatch logic, the hardware schedulers, the register file(s), and the retire
 control unit. This option is disabled by default.

.. option:: -all-views

 Enable all the views.

.. option:: -instruction-tables

 Print resource pressure information based on the static information
 available from the processor model. This differs from the resource pressure
 view because it does not require the code to be simulated. It instead prints
 the theoretical uniform distribution of resource pressure for every
 instruction in sequence.


EXIT STATUS
-----------

:program:`llvm-mca` returns 0 on success. Otherwise, an error message is printed
to standard error, and the tool returns 1.