===============
ShadowCallStack
===============

.. contents::
   :local:

Introduction
============

ShadowCallStack is an instrumentation pass, currently only implemented for
aarch64, that protects programs against return address overwrites
(e.g. stack buffer overflows). It works by saving a function's return address
to a separately allocated 'shadow call stack' in the function prolog in
non-leaf functions and loading the return address from the shadow call stack
in the function epilog. The return address is also stored on the regular stack
for compatibility with unwinders, but is otherwise unused.

The aarch64 implementation is considered production ready, and
an `implementation of the runtime`_ has been added to Android's libc
(bionic). An x86_64 implementation was evaluated using Chromium and was found
to have critical performance and security deficiencies, so it was removed in
LLVM 9.0. Details on the x86_64 implementation can be found in the
`Clang 7.0.1 documentation`_.

.. _`implementation of the runtime`: https://android.googlesource.com/platform/bionic/+/808d176e7e0dd727c7f929622ec017f6e065c582/libc/bionic/pthread_create.cpp#128
.. _`Clang 7.0.1 documentation`: https://releases.llvm.org/7.0.1/tools/clang/docs/ShadowCallStack.html

Comparison
----------

To optimize for memory consumption and cache locality, the shadow call
stack stores only an array of return addresses. This is in contrast to other
schemes, like :doc:`SafeStack`, that mirror the entire stack and trade off
consuming more memory for shorter function prologs and epilogs with fewer
memory accesses.

`Return Flow Guard`_ is a pure software implementation of shadow call stacks
on x86_64. Like the previous implementation of ShadowCallStack on x86_64, it is
inherently racy due to the architecture's use of the stack for calls and
returns.

Intel `Control-flow Enforcement Technology`_ (CET) is a proposed hardware
extension that would add native support to use a shadow stack to store/check
return addresses at call/return time. Being a hardware implementation, it
would not suffer from race conditions and would not incur the overhead of
function instrumentation, but it does require operating system support.

.. _`Return Flow Guard`: https://xlab.tencent.com/en/2016/11/02/return-flow-guard/
.. _`Control-flow Enforcement Technology`: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

Compatibility
-------------

A runtime is not provided in compiler-rt, so one must be provided by the
compiled application or the operating system. Integrating the runtime into
the operating system should be preferred, since otherwise all thread creation
and destruction would need to be intercepted by the application.

The instrumentation makes use of the platform register ``x18``. On some
platforms, ``x18`` is reserved, and on others, it is designated as a scratch
register. This generally means that any code that may run on the same thread
as code compiled with ShadowCallStack must either target one of the platforms
whose ABI reserves ``x18`` (currently Android, Darwin, Fuchsia and Windows)
or be compiled with the flag ``-ffixed-x18``. If absolutely necessary, code
compiled without ``-ffixed-x18`` may be run on the same thread as code that
uses ShadowCallStack by saving the register value temporarily on the stack
(`example in Android`_), but this should be done with care since it risks
leaking the shadow call stack address.

.. _`example in Android`: https://android-review.googlesource.com/c/platform/frameworks/base/+/803717

Because of the use of register ``x18``, the ShadowCallStack feature is
incompatible with any other feature that may use ``x18``. However, there
is no inherent reason why ShadowCallStack needs to use register ``x18``
specifically; in principle, a platform could choose to reserve and use another
register for ShadowCallStack, but this would be incompatible with the AAPCS64.

Special unwind information is required on functions that are compiled
with ShadowCallStack and that may be unwound, i.e. functions compiled with
``-fexceptions`` (which is the default in C++). Some unwinders (such as the
libgcc 4.9 unwinder) do not understand this unwind info and will segfault
when encountering it. LLVM libunwind processes this unwind info correctly,
however. This means that if exceptions are used together with ShadowCallStack,
the program must use a compatible unwinder.

Security
========

ShadowCallStack is intended to be a stronger alternative to
``-fstack-protector``. It protects against non-linear overflows and arbitrary
memory writes to the return address slot.

The instrumentation makes use of the ``x18`` register to reference the shadow
call stack, meaning that references to the shadow call stack do not have
to be stored in memory. This makes it possible to implement a runtime that
avoids exposing the address of the shadow call stack to attackers that can
read arbitrary memory. However, attackers could still try to exploit side
channels exposed by the operating system `[1]`_ `[2]`_ or processor `[3]`_
to discover the address of the shadow call stack.

.. _`[1]`: https://eyalitkin.wordpress.com/2017/09/01/cartography-lighting-up-the-shadows/
.. _`[2]`: https://www.blackhat.com/docs/eu-16/materials/eu-16-Goktas-Bypassing-Clangs-SafeStack.pdf
.. _`[3]`: https://www.vusec.net/projects/anc/

Unless care is taken when allocating the shadow call stack, it may be
possible for an attacker to guess its address using the addresses of
other allocations. Therefore, the address should be chosen to make this
difficult. One way to do this is to allocate a large guard region without
read/write permissions, randomly select a small region within it to be
used as the address of the shadow call stack and mark only that region as
read/write. This also mitigates somewhat against processor side channels.
The intent is that the Android runtime `will do this`_, but the platform will
first need to be `changed`_ to avoid using ``setrlimit(RLIMIT_AS)`` to limit
memory allocations in certain processes, as this also limits the number of
guard regions that can be allocated.

.. _`will do this`: https://android-review.googlesource.com/c/platform/bionic/+/891622
.. _`changed`: https://android-review.googlesource.com/c/platform/frameworks/av/+/837745

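As a rough illustration of the guard region scheme described above, the
following is a minimal sketch for a POSIX system, not the code of any
particular runtime. The names ``scs_alloc``, ``SCS_SIZE`` and ``GUARD_SIZE``
and the chosen sizes are hypothetical. The sketch assumes that ``SCS_SIZE`` is
a power of two and a multiple of the page size, and that the caller supplies
``random`` from a cryptographically secure source.

.. code-block:: c

    #define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on some libcs */
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/mman.h>

    /* Hypothetical sizes, chosen for illustration only. */
    #define SCS_SIZE   ((size_t)64 * 1024)        /* shadow call stack */
    #define GUARD_SIZE ((size_t)16 * 1024 * 1024) /* inaccessible guard region */

    /*
     * Reserve GUARD_SIZE bytes with no access permissions, then make one
     * randomly chosen, SCS_SIZE-aligned slot inside it readable and writable.
     * Returns the base of that slot, or NULL on failure. *guard_out receives
     * the start of the guard region so that it can be unmapped later.
     */
    static void *scs_alloc(uint64_t random, void **guard_out) {
      void *guard = mmap(NULL, GUARD_SIZE, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (guard == MAP_FAILED)
        return NULL;

      /* First SCS_SIZE-aligned address inside the guard region. */
      uintptr_t first =
          ((uintptr_t)guard + SCS_SIZE - 1) & ~(uintptr_t)(SCS_SIZE - 1);
      /* Number of whole aligned slots between `first` and the region end. */
      size_t slots = (GUARD_SIZE - (first - (uintptr_t)guard)) / SCS_SIZE;
      void *scs = (void *)(first + (uintptr_t)(random % slots) * SCS_SIZE);

      /* Only the selected slot becomes accessible. */
      if (mprotect(scs, SCS_SIZE, PROT_READ | PROT_WRITE) != 0) {
        munmap(guard, GUARD_SIZE);
        return NULL;
      }
      *guard_out = guard;
      return scs;
    }
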
The runtime will need the address of the shadow call stack in order to
deallocate it when destroying the thread. If the entire program is compiled
with ``-ffixed-x18``, this is trivial: the address can be derived from the
value stored in ``x18`` (e.g. by masking out the lower bits). If a guard
region is used, the address of the start of the guard region could then be
stored at the start of the shadow call stack itself. But if it is possible
for code compiled without ``-ffixed-x18`` to run on a thread managed by the
runtime, which is the case on Android for example, the address must be stored
somewhere else instead. On Android we store the address of the start of the
guard region in TLS and deallocate the entire guard region including the
shadow call stack at thread exit. This is considered acceptable given that
the address of the start of the guard region is already somewhat guessable.

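The masking step for the all ``-ffixed-x18`` case can be sketched as follows.
This is an illustration under stated assumptions: the function names and
``SCS_SIZE`` are hypothetical, the shadow call stack is assumed to be aligned
to its power-of-two size, and ``x18`` is assumed to point inside the stack
rather than one past its end. The second helper further assumes a runtime that
chose to store the guard region start in the first slot of the stack.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical size; must be a power of two and equal to the alignment. */
    #define SCS_SIZE ((uintptr_t)64 * 1024)

    /* Base of the shadow call stack that the given x18 value points into. */
    static uintptr_t scs_base_from_x18(uintptr_t x18) {
      return x18 & ~(SCS_SIZE - 1);
    }

    /* Guard region start, if the runtime stored it in the first slot. */
    static void *scs_guard_from_x18(uintptr_t x18) {
      return *(void **)scs_base_from_x18(x18);
    }
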
One way in which the address of the shadow call stack could leak is in the
``jmp_buf`` data structure used by ``setjmp`` and ``longjmp``. The Android
runtime `avoids this`_ by only storing the low bits of ``x18`` in the
``jmp_buf``, which requires the address of the shadow call stack to be
aligned to its size.

.. _`avoids this`: https://android.googlesource.com/platform/bionic/+/808d176e7e0dd727c7f929622ec017f6e065c582/libc/arch-arm64/bionic/setjmp.S#49

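The idea of keeping only the low bits can be modelled in plain C as below.
This is a sketch of the arithmetic only, not the Android assembly; the names
``scs_save``, ``scs_restore``, ``SCS_SIZE`` and ``SCS_MASK`` are hypothetical,
and the stack is assumed to be aligned to its power-of-two size. The saved
value is just an offset within the stack, so reading it from the ``jmp_buf``
does not reveal where the stack lives.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical size; must be a power of two and equal to the alignment. */
    #define SCS_SIZE ((uintptr_t)64 * 1024)
    #define SCS_MASK (SCS_SIZE - 1)

    /* What a setjmp could record: only the offset within the stack. */
    static uintptr_t scs_save(uintptr_t x18) {
      return x18 & SCS_MASK;
    }

    /*
     * What a longjmp could restore: the high bits come from the live x18,
     * the low bits come from the value recorded by scs_save.
     */
    static uintptr_t scs_restore(uintptr_t live_x18, uintptr_t saved) {
      return (live_x18 & ~SCS_MASK) | saved;
    }
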
The architecture's call and return instructions (``bl`` and ``ret``) operate on
a register rather than the stack, which means that leaf functions are generally
protected from return address overwrites even without ShadowCallStack.

Usage
=====

To enable ShadowCallStack, pass the ``-fsanitize=shadow-call-stack`` flag on
both the compile and link command lines. On aarch64, you also need to pass
``-ffixed-x18`` unless your target already reserves ``x18``.

Low-level API
-------------

``__has_feature(shadow_call_stack)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In some cases one may need to execute different code depending on whether
ShadowCallStack is enabled. The macro ``__has_feature(shadow_call_stack)`` can
be used for this purpose.

.. code-block:: c

    #if defined(__has_feature)
    #  if __has_feature(shadow_call_stack)
    // code that builds only under ShadowCallStack
    #  endif
    #endif

``__attribute__((no_sanitize("shadow-call-stack")))``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use ``__attribute__((no_sanitize("shadow-call-stack")))`` on a function
declaration to specify that the shadow call stack instrumentation should not be
applied to that function, even if enabled globally.

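A short sketch of how the attribute might be applied is shown below. The
``NO_SCS`` macro and the ``early_init`` function are hypothetical names used
only for illustration. Wrapping the attribute in a macro is one possible way to
keep the source building with compilers that do not know this sanitizer name;
it is not required by Clang.

.. code-block:: c

    /* NO_SCS is a hypothetical helper macro, not something Clang defines. */
    #if defined(__has_feature)
    #  if __has_feature(shadow_call_stack)
    #    define NO_SCS __attribute__((no_sanitize("shadow-call-stack")))
    #  endif
    #endif
    #ifndef NO_SCS
    #  define NO_SCS /* expands to nothing when ShadowCallStack is off */
    #endif

    /* Hypothetical function that is left uninstrumented. */
    NO_SCS int early_init(int x) {
      return x + 1;
    }
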
Example
=======

The following example code:

.. code-block:: c++

    int bar();

    int foo() {
      return bar() + 1;
    }

generates the following aarch64 assembly when compiled with ``-O2``:

.. code-block:: none

    stp     x29, x30, [sp, #-16]!
    mov     x29, sp
    bl      bar
    add     w0, w0, #1
    ldp     x29, x30, [sp], #16
    ret

Adding ``-fsanitize=shadow-call-stack`` would output the following assembly:

.. code-block:: none

    str     x30, [x18], #8
    stp     x29, x30, [sp, #-16]!
    mov     x29, sp
    bl      bar
    add     w0, w0, #1
    ldp     x29, x30, [sp], #16
    ldr     x30, [x18, #-8]!
    ret
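
The effect of the two added instructions can be modelled in portable C. The
sketch below is a simulation only, since real instrumentation is emitted by the
compiler and cannot be written this way. ``scs_top`` stands in for ``x18``, and
all names here are hypothetical. It shows that a corrupted copy of the return
address on the regular stack is discarded because the epilog reloads the value
from the shadow call stack.

.. code-block:: c

    #include <stdint.h>

    /* Stand-in for x18: points at the next free shadow call stack slot. */
    static uintptr_t *scs_top;

    /* Models `str x30, [x18], #8`: store, then post-increment. */
    static void scs_push(uintptr_t return_address) {
      *scs_top++ = return_address;
    }

    /* Models `ldr x30, [x18, #-8]!`: pre-decrement, then load. */
    static uintptr_t scs_pop(void) {
      return *--scs_top;
    }

    /* Simulates one instrumented call whose stack frame gets overwritten. */
    static uintptr_t simulate_call(uintptr_t return_address,
                                   uintptr_t attacker_value) {
      uintptr_t x30;
      uintptr_t frame_slot;

      scs_push(return_address);    /* prolog: copy to the shadow call stack */
      frame_slot = return_address; /* prolog: copy to the regular stack */

      frame_slot = attacker_value; /* simulated overwrite of the frame */

      x30 = frame_slot;            /* epilog: ldp reloads the corrupted value */
      x30 = scs_pop();             /* epilog: ldr replaces it with the saved one */
      return x30;                  /* address that `ret` would branch to */
    }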