===============
ShadowCallStack
===============

.. contents::
   :local:

Introduction
============

ShadowCallStack is an **experimental** instrumentation pass, currently
implemented only for x86_64 and aarch64, that protects programs against return
address overwrites (e.g. stack buffer overflows). It works by saving a
function's return address to a separately allocated 'shadow call stack'
in the function prolog. In the function epilog, the return address on the
stack is checked against the shadow call stack (x86_64), or the return address
is loaded from the shadow call stack (aarch64).

Comparison
----------

To optimize for memory consumption and cache locality, the shadow call stack
stores an index followed by an array of return addresses. This is in contrast
to other schemes, like :doc:`SafeStack`, that mirror the entire stack and
trade off higher memory consumption for shorter function prologs and epilogs
with fewer memory accesses. Similarly, `Return Flow Guard`_ consumes more
memory and has shorter function prologs and epilogs than ShadowCallStack, but
suffers from the same race conditions (see `Security`_). Intel
`Control-flow Enforcement Technology`_ (CET) is a proposed hardware extension
that would add native support for using a shadow stack to store and check
return addresses at call and return time. It would not suffer from race
conditions at calls and returns and would not incur the overhead of function
instrumentation, but it does require operating system support.

.. _`Return Flow Guard`: https://xlab.tencent.com/en/2016/11/02/return-flow-guard/
.. _`Control-flow Enforcement Technology`: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

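
The "index followed by an array of return addresses" layout can be modeled in
plain C. The following is a minimal sketch, not the actual instrumentation or
a runtime: the names ``shadow_stack_model``, ``scs_model_push`` and
``scs_model_pop_and_check`` and the fixed capacity ``SCS_SLOTS`` are
hypothetical, and the bounds checks exist only in this model. It assumes the
index is kept as a byte offset in the first slot, as the x86_64 listing in
`Example`_ suggests.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical capacity; a real runtime chooses its own size. */
    enum { SCS_SLOTS = 64 };

    /* slot[0] holds the index (a byte offset); slot[1..] hold return addresses. */
    typedef struct {
        uint64_t slot[SCS_SLOTS];
    } shadow_stack_model;

    /* Models the prolog: advance the index, then store the return address there. */
    static bool scs_model_push(shadow_stack_model *s, uint64_t ret_addr) {
        assert(s != NULL);
        uint64_t next = s->slot[0] + sizeof(uint64_t);
        if (next / sizeof(uint64_t) >= SCS_SLOTS)
            return false;                      /* model-only overflow check */
        s->slot[0] = next;
        s->slot[next / sizeof(uint64_t)] = ret_addr;
        return true;
    }

    /* Models the epilog: read the saved address, retreat the index, compare. */
    static bool scs_model_pop_and_check(shadow_stack_model *s,
                                        uint64_t ret_addr_on_stack) {
        assert(s != NULL);
        uint64_t top = s->slot[0];
        if (top == 0)
            return false;                      /* model-only underflow check */
        uint64_t saved = s->slot[top / sizeof(uint64_t)];
        s->slot[0] = top - sizeof(uint64_t);
        return saved == ret_addr_on_stack;     /* a mismatch would trap */
    }

In this model each active call costs one 8-byte slot, which is the source of
the memory and cache-locality advantage over schemes that mirror the entire
stack.
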
Compatibility
-------------

ShadowCallStack currently only supports x86_64 and aarch64. A runtime is not
currently provided in compiler-rt, so one must be provided by the compiled
application.

On aarch64, the instrumentation makes use of the platform register ``x18``.
On some platforms, ``x18`` is reserved, and on others, it is designated as
a scratch register. This generally means that any code that may run on the
same thread as code compiled with ShadowCallStack must either target one
of the platforms whose ABI reserves ``x18`` (currently Darwin, Fuchsia and
Windows) or be compiled with the flag ``-ffixed-x18``.

Security
========

ShadowCallStack is intended to be a stronger alternative to
``-fstack-protector``. It protects against non-linear overflows and arbitrary
memory writes to the return address slot; however, like
``-fstack-protector``, this protection suffers from race conditions because of
the call-return semantics on x86_64. There is a short race between the call
instruction and the first instruction in the function that reads the return
address, during which an attacker could overwrite the return address and bypass
ShadowCallStack. Similarly, there is a time-of-check-to-time-of-use race in the
function epilog, where an attacker could overwrite the return address after it
has been checked and before it has been returned to. Modifying the call-return
semantics to fix this on x86_64 would incur an unacceptable performance overhead
due to return branch prediction.

The instrumentation makes use of the ``gs`` segment register on x86_64,
or the ``x18`` register on aarch64, to reference the shadow call stack,
meaning that references to the shadow call stack do not have to be stored in
memory. This makes it possible to implement a runtime that avoids exposing
the address of the shadow call stack to attackers that can read arbitrary
memory. However, attackers could still try to exploit side channels exposed
by the operating system `[1]`_ `[2]`_ or processor `[3]`_ to discover the
address of the shadow call stack.

.. _`[1]`: https://eyalitkin.wordpress.com/2017/09/01/cartography-lighting-up-the-shadows/
.. _`[2]`: https://www.blackhat.com/docs/eu-16/materials/eu-16-Goktas-Bypassing-Clangs-SafeStack.pdf
.. _`[3]`: https://www.vusec.net/projects/anc/

On x86_64, leaf functions are optimized to store the return address in a
free register and avoid writing to the shadow call stack if a register is
available. Very short leaf functions are uninstrumented if their execution
is judged to be shorter than the race condition window intrinsic to the
instrumentation.

On aarch64, the architecture's call and return instructions (``bl`` and
``ret``) operate on a register rather than the stack, which means that
leaf functions are generally protected from return address overwrites even
without ShadowCallStack. It also means that ShadowCallStack on aarch64 is not
vulnerable to the same types of time-of-check-to-time-of-use races as x86_64.

Usage
=====

To enable ShadowCallStack, pass the ``-fsanitize=shadow-call-stack`` flag on
both the compile and link command lines. On aarch64, you also need to pass
``-ffixed-x18`` unless your target already reserves ``x18``.

Low-level API
-------------

``__has_feature(shadow_call_stack)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In some cases one may need to execute different code depending on whether
ShadowCallStack is enabled. The macro ``__has_feature(shadow_call_stack)`` can
be used for this purpose.

.. code-block:: c

    #if defined(__has_feature)
    #  if __has_feature(shadow_call_stack)
    // code that builds only under ShadowCallStack
    #  endif
    #endif

``__attribute__((no_sanitize("shadow-call-stack")))``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use ``__attribute__((no_sanitize("shadow-call-stack")))`` on a function
declaration to specify that the shadow call stack instrumentation should not be
applied to that function, even if enabled globally.

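
The two facilities above can be combined behind project-local macros. The
following is a minimal sketch: ``NO_SCS``, ``SCS_ENABLED``, ``add_one`` and
``scs_enabled`` are hypothetical names chosen for illustration, and the
``__clang__`` guard is an assumption made so that the same source also builds
with compilers that do not know this sanitizer name.

.. code-block:: c

    /* Treat every feature as absent on compilers without __has_feature. */
    #ifndef __has_feature
    #  define __has_feature(x) 0
    #endif

    /* SCS_ENABLED (hypothetical) is 1 only when the instrumentation is active. */
    #if __has_feature(shadow_call_stack)
    #  define SCS_ENABLED 1
    #else
    #  define SCS_ENABLED 0
    #endif

    /* NO_SCS (hypothetical) opts a single function out of the instrumentation. */
    #if defined(__clang__)
    #  define NO_SCS __attribute__((no_sanitize("shadow-call-stack")))
    #else
    #  define NO_SCS
    #endif

    /* This function is never instrumented, even if the flag is passed globally. */
    NO_SCS int add_one(int x) { return x + 1; }

    /* Lets callers query at run time how this translation unit was built. */
    int scs_enabled(void) { return SCS_ENABLED; }

The attribute only changes how the function is instrumented; its observable
behavior is the same with or without ``-fsanitize=shadow-call-stack``.
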
Example
=======

The following example code:

.. code-block:: c++

    int foo() {
      return bar() + 1;
    }

generates the following x86_64 assembly when compiled with ``-O2``:

.. code-block:: gas

    push   %rax
    callq  bar
    add    $0x1,%eax
    pop    %rcx
    retq

or the following aarch64 assembly:

.. code-block:: none

    stp     x29, x30, [sp, #-16]!
    mov     x29, sp
    bl      bar
    add     w0, w0, #1
    ldp     x29, x30, [sp], #16
    ret

Adding ``-fsanitize=shadow-call-stack`` produces the following x86_64
assembly:

.. code-block:: gas

    mov    (%rsp),%r10
    xor    %r11,%r11
    addq   $0x8,%gs:(%r11)
    mov    %gs:(%r11),%r11
    mov    %r10,%gs:(%r11)
    push   %rax
    callq  bar
    add    $0x1,%eax
    pop    %rcx
    xor    %r11,%r11
    mov    %gs:(%r11),%r10
    mov    %gs:(%r10),%r10
    subq   $0x8,%gs:(%r11)
    cmp    %r10,(%rsp)
    jne    trap
    retq

    trap:
    ud2

or the following aarch64 assembly:

.. code-block:: none

    str     x30, [x18], #8
    stp     x29, x30, [sp, #-16]!
    mov     x29, sp
    bl      bar
    add     w0, w0, #1
    ldp     x29, x30, [sp], #16
    ldr     x30, [x18, #-8]!
    ret
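
To make the aarch64 sequence easier to follow, here is a small C model of it.
This is an illustrative sketch only: ``aarch64_model``, ``model_prolog``,
``model_epilog`` and ``model_call_with_corruption`` are hypothetical names,
the registers are ordinary struct fields, and the "attack" is a plain
assignment standing in for a stack buffer overflow.

.. code-block:: c

    #include <stdint.h>

    typedef struct {
        uint64_t *x18;  /* models the shadow call stack pointer register */
        uint64_t x30;   /* models the link register holding the return address */
    } aarch64_model;

    /* Models "str x30, [x18], #8": store, then post-increment the pointer. */
    static void model_prolog(aarch64_model *cpu) {
        *cpu->x18 = cpu->x30;
        cpu->x18 += 1;
    }

    /* Models "ldr x30, [x18, #-8]!": pre-decrement the pointer, then load. */
    static void model_epilog(aarch64_model *cpu) {
        cpu->x18 -= 1;
        cpu->x30 = *cpu->x18;
    }

    /* Returns the address "ret" would branch to after the frame is corrupted. */
    static uint64_t model_call_with_corruption(uint64_t return_address,
                                               uint64_t attacker_value) {
        uint64_t shadow[8] = {0};
        aarch64_model cpu = { shadow, return_address };

        model_prolog(&cpu);
        uint64_t frame_slot = cpu.x30;  /* stp x29, x30, [sp, #-16]! */
        frame_slot = attacker_value;    /* simulated overwrite of the stack slot */
        cpu.x30 = frame_slot;           /* ldp x29, x30, [sp], #16 */
        model_epilog(&cpu);             /* x30 is replaced from the shadow stack */
        return cpu.x30;
    }

In the model, the value reloaded from the regular stack is discarded because
the final load from the shadow call stack overwrites ``x30``, so no compare
and trap is needed, unlike in the x86_64 sequence.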