===========
Static Keys
===========

.. warning::

   DEPRECATED API:

   Direct use of 'struct static_key' is now DEPRECATED. In addition,
   static_key_{true,false}() is also DEPRECATED. That is, DO NOT use the
   following::

       struct static_key false = STATIC_KEY_INIT_FALSE;
       struct static_key true = STATIC_KEY_INIT_TRUE;
       static_key_true()
       static_key_false()

   The updated API replacements are::

       DEFINE_STATIC_KEY_TRUE(key);
       DEFINE_STATIC_KEY_FALSE(key);
       DEFINE_STATIC_KEY_ARRAY_TRUE(keys, count);
       DEFINE_STATIC_KEY_ARRAY_FALSE(keys, count);
       static_branch_likely()
       static_branch_unlikely()

Abstract
========

Static keys allow the inclusion of seldom-used features in
performance-sensitive fast-path kernel code, via a GCC feature and a code
patching technique. A quick example::

    DEFINE_STATIC_KEY_FALSE(key);

    ...

    if (static_branch_unlikely(&key))
            do unlikely code
    else
            do likely code

    ...
    static_branch_enable(&key);
    ...
    static_branch_disable(&key);
    ...

The static_branch_unlikely() branch will be generated into the code with as
little impact to the likely code path as possible.


Motivation
==========


Currently, tracepoints are implemented using a conditional branch. The
conditional check requires checking a global variable for each tracepoint.
Although the overhead of this check is small, it increases when the memory
cache comes under pressure (memory cache lines for these global variables may
be shared with other memory accesses). As we increase the number of tracepoints
in the kernel this overhead may become more of an issue. In addition,
tracepoints are often dormant (disabled) and provide no direct kernel
functionality. Thus, it is highly desirable to reduce their impact as much as
possible. Although tracepoints are the original motivation for this work, other
kernel code paths should be able to make use of the static keys facility.


Solution
========


gcc (v4.5) adds a new 'asm goto' statement that allows branching to a label:

  http://gcc.gnu.org/ml/gcc-patches/2009-07/msg01556.html

Using the 'asm goto', we can create branches that are either taken or not taken
by default, without the need to check memory. Then, at run-time, we can patch
the branch site to change the branch direction.
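
A minimal user-space sketch of the control-flow shape that 'asm goto' enables
is shown below. It is an illustration under stated assumptions, not the
kernel's implementation: the assembly template is left empty, so no no-op is
emitted, nothing is recorded for later patching, and the branch can never be
flipped. The name sketch_static_branch() is hypothetical. As written, the
function always returns false, because the empty asm never jumps to l_yes.

```c
#include <stdbool.h>

/*
 * Hypothetical sketch of a jump label site (not kernel code).
 * The empty template stands in for the no-op a real implementation
 * would emit. Because l_yes is listed as a possible jump target, the
 * compiler keeps both paths, yet no memory is loaded or tested.
 */
static bool sketch_static_branch(void)
{
    __asm__ goto("" : : : : l_yes);
    return false;   /* straight-line path: branch not taken */
l_yes:
    return true;    /* out-of-line path: reached only if the asm jumps */
}
```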

For example, if we have a simple branch that is disabled by default::

    if (static_branch_unlikely(&key))
            printk("I am the true branch\n");

Thus, by default the 'printk' will not be executed, and the code generated will
consist of a single atomic 'no-op' instruction (5 bytes on x86) in the
straight-line code path. When the branch is 'flipped', we will patch the
'no-op' in the straight-line code path with a 'jump' instruction to the
out-of-line true branch. Thus, changing branch direction is expensive but
branch selection is basically 'free'. That is the basic tradeoff of this
optimization.
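
To make the patching step concrete, the sketch below computes the bytes of a
5-byte x86 'jmp rel32' instruction, the kind of jump that would replace the
no-op. It only builds the encoding in a buffer; the helper name
encode_jmp_rel32() is hypothetical, and safely writing the bytes into live
kernel text is outside its scope. Using addresses from the x86_64 listing
later in this document, a jump from the site at 0xffffffff81044294 to the
out-of-line block at 0xffffffff810442c0 encodes as e9 27 00 00 00, while a
jump to the very next instruction encodes as e9 00 00 00 00, the 'jmp 0'
form seen in that listing.

```c
#include <stdint.h>

#define JMP_REL32_OPCODE 0xe9u
#define JMP_REL32_SIZE   5u

/*
 * Encode an x86 "jmp rel32" located at address 'site' that jumps to
 * 'target'. The displacement is measured from the end of the 5-byte
 * instruction and stored little-endian. Illustrative helper only:
 * it fills 'out' and does not modify any code.
 */
static void encode_jmp_rel32(uint64_t site, uint64_t target,
                             uint8_t out[JMP_REL32_SIZE])
{
    uint32_t rel = (uint32_t)(target - (site + JMP_REL32_SIZE));

    out[0] = JMP_REL32_OPCODE;
    out[1] = (uint8_t)(rel & 0xff);
    out[2] = (uint8_t)((rel >> 8) & 0xff);
    out[3] = (uint8_t)((rel >> 16) & 0xff);
    out[4] = (uint8_t)((rel >> 24) & 0xff);
}
```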

This low-level patching mechanism is called 'jump label patching', and it
provides the basis for the static keys facility.

Static key label API, usage and examples
========================================


In order to make use of this optimization you must first define a key::

    DEFINE_STATIC_KEY_TRUE(key);

or::

    DEFINE_STATIC_KEY_FALSE(key);


The key must be global, that is, it can't be allocated on the stack or dynamically
allocated at run-time.

The key is then used in code as::

    if (static_branch_unlikely(&key))
            do unlikely code
    else
            do likely code

Or::

    if (static_branch_likely(&key))
            do likely code
    else
            do unlikely code

Keys defined via DEFINE_STATIC_KEY_TRUE() or DEFINE_STATIC_KEY_FALSE() may
be used in either static_branch_likely() or static_branch_unlikely()
statements.

Branch(es) can be set true via::

    static_branch_enable(&key);

or false via::

    static_branch_disable(&key);

The branch(es) can then be switched via reference counts::

    static_branch_inc(&key);
    ...
    static_branch_dec(&key);

Thus, 'static_branch_inc()' means 'make the branch true', and
'static_branch_dec()' means 'make the branch false', with appropriate
reference counting. For example, if the key is initialized true, a
static_branch_dec() will switch the branch to false, and a subsequent
static_branch_inc() will change the branch back to true. Likewise, if the
key is initialized false, a static_branch_inc() will change the branch to
true, and then a static_branch_dec() will again make the branch false.
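
The reference-counting behaviour described above can be modelled in user
space as follows. This is a rough sketch under stated assumptions, not the
kernel's code: the branch is taken to be true while the count is above zero,
a key initialized true starts with a count of one, and there is no locking
or code patching. All names (struct model_key, model_inc() and so on) are
hypothetical. The check in model_branch_unlikely() is an ordinary load, test
and jump, which is what the facility costs without patching.

```c
#include <stdbool.h>

/* Hypothetical user-space model of a key: true while count > 0. */
struct model_key {
    int count;
};

#define MODEL_KEY_INIT_TRUE  { .count = 1 }
#define MODEL_KEY_INIT_FALSE { .count = 0 }

/* Take a reference; the first reference makes the branch true. */
static void model_inc(struct model_key *k)
{
    k->count++;
}

/* Drop a reference; dropping the last one makes the branch false. */
static void model_dec(struct model_key *k)
{
    if (k->count > 0)
        k->count--;
}

/* Plain memory check, hinted as unlikely; no code is patched here. */
static bool model_branch_unlikely(const struct model_key *k)
{
    return __builtin_expect(k->count > 0, 0);
}
```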

The state and the reference count can be retrieved with 'static_key_enabled()'
and 'static_key_count()'. In general, if you use these functions, they
should be protected with the same mutex used around the enable/disable
or increment/decrement functions.

Note that switching branches results in some locks being taken,
particularly the CPU hotplug lock (in order to avoid races against
CPUs being brought into the kernel whilst the kernel is getting
patched). Calling the static key API from within a hotplug notifier is
thus a sure deadlock recipe. In order to still allow use of the
functionality, the following functions are provided::

    static_key_enable_cpuslocked()
    static_key_disable_cpuslocked()
    static_branch_enable_cpuslocked()
    static_branch_disable_cpuslocked()

These functions are *not* general purpose, and must only be used when
you really know that you're in the above context, and no other.

Where an array of keys is required, it can be defined as::

    DEFINE_STATIC_KEY_ARRAY_TRUE(keys, count);

or::

    DEFINE_STATIC_KEY_ARRAY_FALSE(keys, count);

Architecture level code patching interface, 'jump labels'
=========================================================


There are a few functions and macros that architectures must implement in order
to take advantage of this optimization. If there is no architecture support, we
simply fall back to a traditional load, test, and jump sequence. Also, the
struct jump_entry table must be at least 4-byte aligned because the
static_key->entry field makes use of the two least significant bits.

* ``select HAVE_ARCH_JUMP_LABEL``,
    see: arch/x86/Kconfig

* ``#define JUMP_LABEL_NOP_SIZE``,
    see: arch/x86/include/asm/jump_label.h

* ``__always_inline bool arch_static_branch(struct static_key *key, bool branch)``,
    see: arch/x86/include/asm/jump_label.h

* ``__always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)``,
    see: arch/x86/include/asm/jump_label.h

* ``void arch_jump_label_transform(struct jump_entry *entry, enum jump_label_type type)``,
    see: arch/x86/kernel/jump_label.c

* ``__init_or_module void arch_jump_label_transform_static(struct jump_entry *entry, enum jump_label_type type)``,
    see: arch/x86/kernel/jump_label.c

* ``struct jump_entry``,
    see: arch/x86/include/asm/jump_label.h
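
The 4-byte alignment requirement stated before the list is what makes the two
least significant bits of a table address available for other uses. The
user-space sketch below shows the general pointer-tagging idea only. The
struct layout, the table, and every sketch_* name are hypothetical, and which
flags the kernel actually keeps in those bits is not specified here.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TAG_MASK ((uintptr_t)3)  /* the two least significant bits */

/* Hypothetical stand-in for a jump table entry; fields are illustrative. */
struct sketch_entry {
    int32_t code;
    int32_t target;
    int32_t key;
};

/* 4-byte alignment guarantees that the low two address bits are zero. */
_Static_assert(_Alignof(struct sketch_entry) >= 4,
               "entries must be at least 4-byte aligned");

static struct sketch_entry sketch_table[2];

/* Store a 2-bit tag in the unused low bits of an entry address. */
static uintptr_t sketch_pack(const struct sketch_entry *e, unsigned int tag)
{
    assert(((uintptr_t)e & SKETCH_TAG_MASK) == 0);
    return (uintptr_t)e | ((uintptr_t)tag & SKETCH_TAG_MASK);
}

/* Recover the original pointer by masking the tag bits off. */
static struct sketch_entry *sketch_ptr(uintptr_t v)
{
    return (struct sketch_entry *)(v & ~SKETCH_TAG_MASK);
}

/* Recover the 2-bit tag. */
static unsigned int sketch_tag(uintptr_t v)
{
    return (unsigned int)(v & SKETCH_TAG_MASK);
}
```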


Static keys / jump label analysis, results (x86_64)
===================================================


As an example, let's add the following branch to 'getppid()', such that the
system call now looks like::

  SYSCALL_DEFINE0(getppid)
  {
        int pid;

  +     if (static_branch_unlikely(&key))
  +             printk("I am the true branch\n");

        rcu_read_lock();
        pid = task_tgid_vnr(rcu_dereference(current->real_parent));
        rcu_read_unlock();

        return pid;
  }

The resulting instructions with jump labels generated by GCC are::

  ffffffff81044290 <sys_getppid>:
  ffffffff81044290:  55                    push   %rbp
  ffffffff81044291:  48 89 e5              mov    %rsp,%rbp
  ffffffff81044294:  e9 00 00 00 00        jmpq   ffffffff81044299 <sys_getppid+0x9>
  ffffffff81044299:  65 48 8b 04 25 c0 b6  mov    %gs:0xb6c0,%rax
  ffffffff810442a0:  00 00
  ffffffff810442a2:  48 8b 80 80 02 00 00  mov    0x280(%rax),%rax
  ffffffff810442a9:  48 8b 80 b0 02 00 00  mov    0x2b0(%rax),%rax
  ffffffff810442b0:  48 8b b8 e8 02 00 00  mov    0x2e8(%rax),%rdi
  ffffffff810442b7:  e8 f4 d9 00 00        callq  ffffffff81051cb0 <pid_vnr>
  ffffffff810442bc:  5d                    pop    %rbp
  ffffffff810442bd:  48 98                 cltq
  ffffffff810442bf:  c3                    retq
  ffffffff810442c0:  48 c7 c7 e3 54 98 81  mov    $0xffffffff819854e3,%rdi
  ffffffff810442c7:  31 c0                 xor    %eax,%eax
  ffffffff810442c9:  e8 71 13 6d 00        callq  ffffffff8171563f <printk>
  ffffffff810442ce:  eb c9                 jmp    ffffffff81044299 <sys_getppid+0x9>

Without the jump label optimization it looks like::

  ffffffff810441f0 <sys_getppid>:
  ffffffff810441f0:  8b 05 8a 52 d8 00     mov    0xd8528a(%rip),%eax        # ffffffff81dc9480 <key>
  ffffffff810441f6:  55                    push   %rbp
  ffffffff810441f7:  48 89 e5              mov    %rsp,%rbp
  ffffffff810441fa:  85 c0                 test   %eax,%eax
  ffffffff810441fc:  75 27                 jne    ffffffff81044225 <sys_getppid+0x35>
  ffffffff810441fe:  65 48 8b 04 25 c0 b6  mov    %gs:0xb6c0,%rax
  ffffffff81044205:  00 00
  ffffffff81044207:  48 8b 80 80 02 00 00  mov    0x280(%rax),%rax
  ffffffff8104420e:  48 8b 80 b0 02 00 00  mov    0x2b0(%rax),%rax
  ffffffff81044215:  48 8b b8 e8 02 00 00  mov    0x2e8(%rax),%rdi
  ffffffff8104421c:  e8 2f da 00 00        callq  ffffffff81051c50 <pid_vnr>
  ffffffff81044221:  5d                    pop    %rbp
  ffffffff81044222:  48 98                 cltq
  ffffffff81044224:  c3                    retq
  ffffffff81044225:  48 c7 c7 13 53 98 81  mov    $0xffffffff81985313,%rdi
  ffffffff8104422c:  31 c0                 xor    %eax,%eax
  ffffffff8104422e:  e8 60 0f 6d 00        callq  ffffffff81715193 <printk>
  ffffffff81044233:  eb c9                 jmp    ffffffff810441fe <sys_getppid+0xe>
  ffffffff81044235:  66 66 2e 0f 1f 84 00  data32 nopw %cs:0x0(%rax,%rax,1)
  ffffffff8104423c:  00 00 00 00

Thus, the disabled jump label case adds a 'mov', 'test' and 'jne' instruction,
whereas the jump label case just has a 'no-op' or 'jmp 0'. (The 'jmp 0' is
patched to a 5-byte atomic no-op instruction at boot-time.) Thus, the disabled
jump label case adds::

  6 (mov) + 2 (test) + 2 (jne) = 10 - 5 (5 byte jump 0) = 5 additional bytes.

If we then include the padding bytes, the jump label code saves 16 total bytes
of instruction memory for this small function. In this case the non-jump label
function is 80 bytes long. Thus, we have saved 20% of the instruction
footprint. We can in fact improve this even further, since the 5-byte no-op
really can be a 2-byte no-op since we can reach the branch with a 2-byte jmp.
However, we have not yet implemented optimal no-op sizes (they are currently
hard-coded).
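
The size figures above can be re-derived from the addresses in the two
listings. The sketch below is only a worked check of that arithmetic; the
macro and function names are hypothetical, and each end address is the
address of the last listed line plus the length of its bytes. With these
inputs the jump label function spans 64 bytes and the other one 80 bytes,
which gives the 16 bytes and 20% quoted above.

```c
#include <stdint.h>

/* Addresses read off the two x86_64 listings above. */
#define JL_START   0xffffffff81044290ULL
#define JL_END     0xffffffff810442d0ULL  /* 0x...2ce + 2-byte jmp */
#define NOJL_START 0xffffffff810441f0ULL
#define NOJL_END   0xffffffff81044240ULL  /* 0x...235 + 11 padding bytes */

/* Size in bytes of a function occupying [start, end). */
static unsigned int func_bytes(uint64_t start, uint64_t end)
{
    return (unsigned int)(end - start);
}

/* Extra fast-path bytes: mov + test + jne versus one 5-byte jmp/no-op. */
static unsigned int inline_extra_bytes(void)
{
    return 6 + 2 + 2 - 5;
}

/* Footprint saved by the jump label version, as a whole percentage. */
static unsigned int saved_percent(void)
{
    unsigned int jl = func_bytes(JL_START, JL_END);
    unsigned int nojl = func_bytes(NOJL_START, NOJL_END);

    return 100 * (nojl - jl) / nojl;
}
```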

Since there are a number of static key API uses in the scheduler paths,
'pipe-test' (also known as 'perf bench sched pipe') can be used to show the
performance improvement. Testing done on 3.3.0-rc2:

jump label disabled::

   Performance counter stats for 'bash -c /tmp/pipe-test' (50 runs):

          855.700314 task-clock                #    0.534 CPUs utilized            ( +-  0.11% )
             200,003 context-switches          #    0.234 M/sec                    ( +-  0.00% )
                   0 CPU-migrations            #    0.000 M/sec                    ( +- 39.58% )
                 487 page-faults               #    0.001 M/sec                    ( +-  0.02% )
       1,474,374,262 cycles                    #    1.723 GHz                      ( +-  0.17% )
     <not supported> stalled-cycles-frontend
     <not supported> stalled-cycles-backend
       1,178,049,567 instructions              #    0.80  insns per cycle          ( +-  0.06% )
         208,368,926 branches                  #  243.507 M/sec                    ( +-  0.06% )
           5,569,188 branch-misses             #    2.67% of all branches          ( +-  0.54% )

         1.601607384 seconds time elapsed                                          ( +-  0.07% )

jump label enabled::

   Performance counter stats for 'bash -c /tmp/pipe-test' (50 runs):

          841.043185 task-clock                #    0.533 CPUs utilized            ( +-  0.12% )
             200,004 context-switches          #    0.238 M/sec                    ( +-  0.00% )
                   0 CPU-migrations            #    0.000 M/sec                    ( +- 40.87% )
                 487 page-faults               #    0.001 M/sec                    ( +-  0.05% )
       1,432,559,428 cycles                    #    1.703 GHz                      ( +-  0.18% )
     <not supported> stalled-cycles-frontend
     <not supported> stalled-cycles-backend
       1,175,363,994 instructions              #    0.82  insns per cycle          ( +-  0.04% )
         206,859,359 branches                  #  245.956 M/sec                    ( +-  0.04% )
           4,884,119 branch-misses             #    2.36% of all branches          ( +-  0.85% )

         1.579384366 seconds time elapsed

The percentage of saved branches is 0.7%, and we've saved 12% on
'branch-misses'. This is where we would expect to get the most savings, since
this optimization is about reducing the number of branches. In addition, we've
saved 0.2% on instructions, 2.8% on cycles, and 1.4% on elapsed time.
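
The percentages quoted above follow directly from the two sets of counters.
As a worked check, the sketch below recomputes each saving as
(disabled - enabled) / disabled. The struct, table and function names are
hypothetical; the counter values are copied from the perf output above.

```c
#include <stdio.h>

/* One counter from the two perf runs above. */
struct metric {
    const char *name;
    double disabled;    /* value with jump labels disabled */
    double enabled;     /* value with jump labels enabled */
};

static const struct metric metrics[] = {
    { "branches",      208368926.0,  206859359.0 },
    { "branch-misses", 5569188.0,    4884119.0 },
    { "instructions",  1178049567.0, 1175363994.0 },
    { "cycles",        1474374262.0, 1432559428.0 },
    { "elapsed",       1.601607384,  1.579384366 },
};

#define N_METRICS (sizeof(metrics) / sizeof(metrics[0]))

/* Relative saving, in percent, of the enabled run over the disabled run. */
static double saved_pct(const struct metric *m)
{
    return 100.0 * (m->disabled - m->enabled) / m->disabled;
}

/* Print one line per counter with its saving rounded to one decimal. */
static void print_savings(void)
{
    for (size_t i = 0; i < N_METRICS; i++)
        printf("%-14s %5.1f%%\n", metrics[i].name, saved_pct(&metrics[i]));
}
```

Calling print_savings() prints one line per counter; the results agree with
the rounded figures quoted above.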