===============
ShadowCallStack
===============

.. contents::
   :local:

Introduction
============

ShadowCallStack is an instrumentation pass, currently only implemented for
aarch64 and x86_64, that protects programs against return address overwrites
(e.g. stack buffer overflows). In non-leaf functions, it saves the function's
return address to a separately allocated 'shadow call stack' in the function
prolog and loads the return address back from the shadow call stack in the
function epilog. The return address is also stored on the regular stack
for compatibility with unwinders, but is otherwise unused.
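As a rough mental model, the prolog and epilog behaviour described above can
be sketched in C. This is not the actual instrumentation, which is emitted as
machine code and keeps the shadow stack pointer in a register. The names
``model_thread``, ``model_prolog`` and ``model_epilog`` and the capacity
constant are hypothetical and exist only for this illustration.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical capacity, chosen only for this model. */
    #define MODEL_SCS_SLOTS 64

    struct model_thread {
      uintptr_t shadow[MODEL_SCS_SLOTS]; /* stands in for the shadow call stack */
      size_t top;                        /* stands in for the shadow stack pointer */
    };

    /* Prolog of a non-leaf function: push the return address. */
    static void model_prolog(struct model_thread *t, uintptr_t return_address) {
      assert(t->top < MODEL_SCS_SLOTS);
      t->shadow[t->top++] = return_address;
    }

    /* Epilog: pop the return address from the shadow call stack. The copy kept
       on the regular stack is deliberately ignored, so corrupting it has no
       effect on where the function returns. */
    static uintptr_t model_epilog(struct model_thread *t,
                                  uintptr_t regular_stack_copy) {
      (void)regular_stack_copy;
      assert(t->top > 0);
      return t->shadow[--t->top];
    }

In this model, ``model_prolog(&t, 0x1000)`` followed by
``model_epilog(&t, 0xdeadbeef)`` returns ``0x1000``: the overwritten copy on
the regular stack never influences the result.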

The aarch64 implementation is considered production ready, and
an `implementation of the runtime`_ has been added to Android's libc
(bionic). The x86_64 implementation was evaluated using Chromium and was
found to have critical performance and security deficiencies, and may be
removed in a future release of the compiler. This document only describes
the aarch64 implementation; details on the x86_64 implementation are found
in the `Clang 7.0.1 documentation`_.

.. _`implementation of the runtime`: https://android.googlesource.com/platform/bionic/+/808d176e7e0dd727c7f929622ec017f6e065c582/libc/bionic/pthread_create.cpp#128
.. _`Clang 7.0.1 documentation`: https://releases.llvm.org/7.0.1/tools/clang/docs/ShadowCallStack.html

Comparison
----------

To optimize for memory consumption and cache locality, the shadow call
stack stores only an array of return addresses. This is in contrast to other
schemes, like :doc:`SafeStack`, that mirror the entire stack and accept
higher memory consumption in exchange for shorter function prologs and
epilogs with fewer memory accesses.

`Return Flow Guard`_ is a pure software implementation of shadow call stacks
on x86_64. It is similar to the ShadowCallStack x86_64 implementation but
accepts higher memory usage in exchange for a shorter prolog and epilog. Like
x86_64 ShadowCallStack, it is inherently racy due to the architecture's use
of the stack for calls and returns.

Intel `Control-flow Enforcement Technology`_ (CET) is a proposed hardware
extension that would add native support for using a shadow stack to store and
check return addresses at call and return time. Being a hardware
implementation, it would not suffer from race conditions and would not incur
the overhead of function instrumentation, but it does require operating
system support.

.. _`Return Flow Guard`: https://xlab.tencent.com/en/2016/11/02/return-flow-guard/
.. _`Control-flow Enforcement Technology`: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

Compatibility
-------------

A runtime is not provided in compiler-rt, so one must be provided by the
compiled application or the operating system. Integrating the runtime into
the operating system should be preferred since otherwise all thread creation
and destruction would need to be intercepted by the application.

The instrumentation makes use of the platform register ``x18``. On some
platforms, ``x18`` is reserved, and on others, it is designated as a scratch
register. This generally means that any code that may run on the same thread
as code compiled with ShadowCallStack must either target one of the platforms
whose ABI reserves ``x18`` (currently Android, Darwin, Fuchsia and Windows)
or be compiled with the flag ``-ffixed-x18``. If absolutely necessary, code
compiled without ``-ffixed-x18`` may be run on the same thread as code that
uses ShadowCallStack by saving the register value temporarily on the stack
(`example in Android`_), but this should be done with care since it risks
leaking the shadow call stack address.

.. _`example in Android`: https://android-review.googlesource.com/c/platform/frameworks/base/+/803717

Because of the use of register ``x18``, the ShadowCallStack feature is
incompatible with any other feature that may use ``x18``. However, there
is no inherent reason why ShadowCallStack needs to use register ``x18``
specifically; in principle, a platform could choose to reserve and use another
register for ShadowCallStack, but this would be incompatible with the AAPCS64.

Special unwind information is required on functions that are compiled
with ShadowCallStack and that may be unwound, i.e. functions compiled with
``-fexceptions`` (which is the default in C++). Some unwinders (such as the
libgcc 4.9 unwinder) do not understand this unwind info and will segfault
when encountering it. LLVM libunwind processes this unwind info correctly,
however. This means that if exceptions are used together with ShadowCallStack,
the program must use a compatible unwinder.

Security
========

ShadowCallStack is intended to be a stronger alternative to
``-fstack-protector``. It protects against non-linear overflows and arbitrary
memory writes to the return address slot.

The instrumentation makes use of the ``x18`` register to reference the shadow
call stack, meaning that references to the shadow call stack do not have
to be stored in memory. This makes it possible to implement a runtime that
avoids exposing the address of the shadow call stack to attackers that can
read arbitrary memory. However, attackers could still try to exploit side
channels exposed by the operating system `[1]`_ `[2]`_ or processor `[3]`_
to discover the address of the shadow call stack.

.. _`[1]`: https://eyalitkin.wordpress.com/2017/09/01/cartography-lighting-up-the-shadows/
.. _`[2]`: https://www.blackhat.com/docs/eu-16/materials/eu-16-Goktas-Bypassing-Clangs-SafeStack.pdf
.. _`[3]`: https://www.vusec.net/projects/anc/

Unless care is taken when allocating the shadow call stack, it may be
possible for an attacker to guess its address using the addresses of
other allocations. Therefore, the address should be chosen to make this
difficult. One way to do this is to allocate a large guard region without
read/write permissions, randomly select a small region within it to be
used as the shadow call stack, and mark only that region as read/write.
This also provides some mitigation against processor side channels.
The intent is that the Android runtime `will do this`_, but the platform will
first need to be `changed`_ to avoid using ``setrlimit(RLIMIT_AS)`` to limit
memory allocations in certain processes, as this also limits the number of
guard regions that can be allocated.

.. _`will do this`: https://android-review.googlesource.com/c/platform/bionic/+/891622
.. _`changed`: https://android-review.googlesource.com/c/platform/frameworks/av/+/837745
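The guard region scheme described above could look roughly like the following
on a POSIX system. This is a minimal sketch, not the Android implementation:
the names ``scs_region``, ``scs_region_create`` and ``scs_region_destroy``,
the two size constants, and the use of ``/dev/urandom`` as the randomness
source are all assumptions made for illustration.

.. code-block:: c

    #define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Hypothetical sizes chosen for illustration only. */
    #define SCS_SIZE   ((size_t)64 * 1024)
    #define GUARD_SIZE ((size_t)16 * 1024 * 1024)

    struct scs_region {
      void *guard_base; /* start of the inaccessible guard region */
      void *scs_base;   /* start of the read/write shadow call stack */
    };

    /* Stores a random index in [0, count) in *out. Returns 0 on success.
       Assumes /dev/urandom is available. */
    static int random_index(size_t count, size_t *out) {
      uint64_t value;
      FILE *f = fopen("/dev/urandom", "rb");
      if (f == NULL)
        return -1;
      size_t got = fread(&value, sizeof(value), 1, f);
      fclose(f);
      if (got != 1)
        return -1;
      *out = (size_t)(value % count);
      return 0;
    }

    static int scs_region_create(struct scs_region *out) {
      long page = sysconf(_SC_PAGESIZE);
      assert(page > 0 && SCS_SIZE % (size_t)page == 0);

      /* Reserve the whole guard region with no access permissions. */
      void *guard = mmap(NULL, GUARD_SIZE, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (guard == MAP_FAILED)
        return -1;

      /* Randomly pick one SCS_SIZE-sized slot inside the guard region. */
      size_t slot;
      if (random_index(GUARD_SIZE / SCS_SIZE, &slot) != 0) {
        munmap(guard, GUARD_SIZE);
        return -1;
      }
      char *scs = (char *)guard + slot * SCS_SIZE;

      /* Make only the selected slot readable and writable. */
      if (mprotect(scs, SCS_SIZE, PROT_READ | PROT_WRITE) != 0) {
        munmap(guard, GUARD_SIZE);
        return -1;
      }
      out->guard_base = guard;
      out->scs_base = scs;
      return 0;
    }

    /* Unmaps the guard region, including the shadow call stack inside it. */
    static int scs_region_destroy(const struct scs_region *region) {
      return munmap(region->guard_base, GUARD_SIZE);
    }

A runtime following this sketch would call ``scs_region_create`` when creating
a thread, point ``x18`` at ``scs_base``, and call ``scs_region_destroy`` at
thread exit.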

The runtime will need the address of the shadow call stack in order to
deallocate it when destroying the thread. If the entire program is compiled
with ``-ffixed-x18``, this is trivial: the address can be derived from the
value stored in ``x18`` (e.g. by masking out the lower bits). If a guard
region is used, the address of the start of the guard region could then be
stored at the start of the shadow call stack itself. But if it is possible
for code compiled without ``-ffixed-x18`` to run on a thread managed by the
runtime, which is the case on Android for example, the address must be stored
somewhere else instead. On Android we store the address of the start of the
guard region in TLS and deallocate the entire guard region including the
shadow call stack at thread exit. This is considered acceptable given that
the address of the start of the guard region is already somewhat guessable.
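The ``-ffixed-x18`` case can be illustrated with a small sketch. It assumes
that the shadow call stack is aligned to its (power of two) size, that the
runtime wrote the guard region's start address into the first slot, and that
``x18`` currently points inside the shadow call stack. ``SCS_SIZE`` and both
function names are hypothetical.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical size; must be a power of two for the masking to work. */
    #define SCS_SIZE ((uintptr_t)8 * 1024)

    /* Recovers the start of the shadow call stack by masking out the lower
       bits of the current shadow stack pointer. */
    static uintptr_t scs_base_from_x18(uintptr_t x18) {
      assert((SCS_SIZE & (SCS_SIZE - 1)) == 0);
      return x18 & ~(SCS_SIZE - 1);
    }

    /* Reads the guard region start that the runtime is assumed to have
       stored in the first slot of the shadow call stack. */
    static void *guard_base_from_x18(uintptr_t x18) {
      const uintptr_t *slot0 = (const uintptr_t *)scs_base_from_x18(x18);
      return (void *)*slot0;
    }

With an 8 KiB shadow call stack, for example, a pointer value of
``0x12345678`` masks down to a base of ``0x12344000``.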

One way in which the address of the shadow call stack could leak is in the
``jmp_buf`` data structure used by ``setjmp`` and ``longjmp``. The Android
runtime `avoids this`_ by only storing the low bits of ``x18`` in the
``jmp_buf``, which requires the address of the shadow call stack to be
aligned to its size.

.. _`avoids this`: https://android.googlesource.com/platform/bionic/+/808d176e7e0dd727c7f929622ec017f6e065c582/libc/arch-arm64/bionic/setjmp.S#49
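The arithmetic behind storing only the low bits can be sketched as follows.
This illustrates the idea and is not a copy of the bionic assembly;
``SCS_SIZE``, ``SCS_MASK`` and the two function names are hypothetical, and
the sketch assumes the shadow call stack is aligned to ``SCS_SIZE``.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical size; must be a power of two. */
    #define SCS_SIZE ((uintptr_t)8 * 1024)
    #define SCS_MASK (SCS_SIZE - 1)

    /* At setjmp time: keep only the offset within the shadow call stack, so
       the jmp_buf reveals nothing about where the stack is located. */
    static uintptr_t jmp_buf_encode_x18(uintptr_t x18) {
      return x18 & SCS_MASK;
    }

    /* At longjmp time: combine the high bits of the live x18 (which still
       points into the same shadow call stack) with the saved offset. */
    static uintptr_t jmp_buf_decode_x18(uintptr_t current_x18,
                                        uintptr_t saved_low_bits) {
      assert((saved_low_bits & ~SCS_MASK) == 0);
      return (current_x18 & ~SCS_MASK) | saved_low_bits;
    }

The alignment requirement follows directly: if the shadow call stack were not
aligned to its size, the high bits of ``x18`` could differ between the
``setjmp`` and ``longjmp`` calls and the reconstruction would be wrong.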

The architecture's call and return instructions (``bl`` and ``ret``) operate on
a register rather than the stack, which means that leaf functions are generally
protected from return address overwrites even without ShadowCallStack.

Usage
=====

To enable ShadowCallStack, pass the ``-fsanitize=shadow-call-stack``
flag on both the compile and link command lines. On aarch64, you also need to
pass ``-ffixed-x18`` unless your target already reserves ``x18``.

Low-level API
-------------

``__has_feature(shadow_call_stack)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In some cases one may need to execute different code depending on whether
ShadowCallStack is enabled. The macro ``__has_feature(shadow_call_stack)`` can
be used for this purpose.

.. code-block:: c

    #if defined(__has_feature)
    #  if __has_feature(shadow_call_stack)
    // code that builds only under ShadowCallStack
    #  endif
    #endif

``__attribute__((no_sanitize("shadow-call-stack")))``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use ``__attribute__((no_sanitize("shadow-call-stack")))`` on a function
declaration to specify that the shadow call stack instrumentation should not be
applied to that function, even if enabled globally.
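One possible way to use the attribute is to wrap it in a macro that expands to
nothing when ShadowCallStack is disabled. ``NO_SCS`` and ``add_one`` are
hypothetical names used only in this sketch; Clang does not provide them.

.. code-block:: c

    #include <assert.h>
    #include <limits.h>

    /* NO_SCS is a hypothetical helper macro, not something Clang provides. */
    #if defined(__has_feature)
    #  if __has_feature(shadow_call_stack)
    #    define NO_SCS __attribute__((no_sanitize("shadow-call-stack")))
    #  endif
    #endif
    #ifndef NO_SCS
    #  define NO_SCS /* ShadowCallStack is not enabled; expands to nothing */
    #endif

    /* The attribute is placed on the declaration, so this function is left
       uninstrumented even when the rest of the program is instrumented. */
    NO_SCS int add_one(int value);

    int add_one(int value) {
      assert(value < INT_MAX); /* avoid signed overflow */
      return value + 1;
    }

Keeping the attribute behind a macro lets the same source build with compilers
or configurations that do not enable ShadowCallStack.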

Example
=======

The following example code:

.. code-block:: c++

    int bar();  // defined in another translation unit

    int foo() {
      return bar() + 1;
    }

generates the following aarch64 assembly when compiled with ``-O2``:

.. code-block:: none

    stp     x29, x30, [sp, #-16]!
    mov     x29, sp
    bl      bar
    add     w0, w0, #1
    ldp     x29, x30, [sp], #16
    ret

Adding ``-fsanitize=shadow-call-stack`` produces the following assembly
instead:

.. code-block:: none

    str     x30, [x18], #8
    stp     x29, x30, [sp, #-16]!
    mov     x29, sp
    bl      bar
    add     w0, w0, #1
    ldp     x29, x30, [sp], #16
    ldr     x30, [x18, #-8]!
    ret