===============
ShadowCallStack
===============

.. contents::
   :local:

Introduction
============

ShadowCallStack is an instrumentation pass, currently only implemented for
aarch64 and x86_64, that protects programs against return address overwrites
(e.g. stack buffer overflows). It works by saving a function's return address
to a separately allocated 'shadow call stack' in the function prologue of
non-leaf functions and loading the return address from the shadow call stack
in the function epilogue. The return address is also stored on the regular
stack for compatibility with unwinders, but is otherwise unused.

The aarch64 implementation is considered production ready, and
an `implementation of the runtime`_ has been added to Android's libc
(bionic). The x86_64 implementation was evaluated using Chromium and was
found to have critical performance and security deficiencies; it may be
removed in a future release of the compiler. This document only describes
the aarch64 implementation; details on the x86_64 implementation can be found
in the `Clang 7.0.1 documentation`_.

.. _`implementation of the runtime`: https://android.googlesource.com/platform/bionic/+/808d176e7e0dd727c7f929622ec017f6e065c582/libc/bionic/pthread_create.cpp#128
.. _`Clang 7.0.1 documentation`: https://releases.llvm.org/7.0.1/tools/clang/docs/ShadowCallStack.html

Comparison
----------

To optimize for memory consumption and cache locality, the shadow call
stack stores only an array of return addresses. This is in contrast to other
schemes, like :doc:`SafeStack`, that mirror the entire stack and trade off
consuming more memory for shorter function prologues and epilogues with fewer
memory accesses.

`Return Flow Guard`_ is a pure software implementation of shadow call stacks
on x86_64. It is similar to the ShadowCallStack x86_64 implementation but
trades off higher memory usage for a shorter prologue and epilogue. Like
x86_64 ShadowCallStack, it is inherently racy due to the architecture's use
of the stack for calls and returns.

Intel `Control-flow Enforcement Technology`_ (CET) is a proposed hardware
extension that would add native support for using a shadow stack to store and
check return addresses at call and return time. Being a hardware
implementation, it would not suffer from race conditions and would not incur
the overhead of function instrumentation, but it does require operating system
support.

.. _`Return Flow Guard`: https://xlab.tencent.com/en/2016/11/02/return-flow-guard/
.. _`Control-flow Enforcement Technology`: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

Compatibility
-------------

A runtime is not provided in compiler-rt, so one must be provided by the
compiled application or the operating system. Integrating the runtime into
the operating system should be preferred, since otherwise all thread creation
and destruction would need to be intercepted by the application.

The instrumentation makes use of the platform register ``x18``. On some
platforms, ``x18`` is reserved, and on others, it is designated as a scratch
register. This generally means that any code that may run on the same thread
as code compiled with ShadowCallStack must either target one of the platforms
whose ABI reserves ``x18`` (currently Android, Darwin, Fuchsia and Windows)
or be compiled with the flag ``-ffixed-x18``. If absolutely necessary, code
compiled without ``-ffixed-x18`` may be run on the same thread as code that
uses ShadowCallStack by saving the register value temporarily on the stack
(`example in Android`_), but this should be done with care since it risks
leaking the shadow call stack address.

.. _`example in Android`: https://android-review.googlesource.com/c/platform/frameworks/base/+/803717

Because of the use of register ``x18``, the ShadowCallStack feature is
incompatible with any other feature that may use ``x18``. However, there
is no inherent reason why ShadowCallStack needs to use register ``x18``
specifically; in principle, a platform could choose to reserve and use another
register for ShadowCallStack, but this would be incompatible with the AAPCS64.

Special unwind information is required on functions that are compiled
with ShadowCallStack and that may be unwound, i.e. functions compiled with
``-fexceptions`` (which is the default in C++). Some unwinders (such as the
libgcc 4.9 unwinder) do not understand this unwind info and will segfault
when encountering it. LLVM libunwind, however, processes this unwind info
correctly. This means that if exceptions are used together with
ShadowCallStack, the program must use a compatible unwinder.

Security
========

ShadowCallStack is intended to be a stronger alternative to
``-fstack-protector``. Unlike stack canaries, it also protects against
non-linear overflows and arbitrary memory writes to the return address slot.

The instrumentation uses the ``x18`` register to reference the shadow call
stack, meaning that references to the shadow call stack do not have to be
stored in memory. This makes it possible to implement a runtime that avoids
exposing the address of the shadow call stack to attackers that can read
arbitrary memory. However, attackers could still try to exploit side channels
exposed by the operating system `[1]`_ `[2]`_ or processor `[3]`_ to discover
the address of the shadow call stack.

.. _`[1]`: https://eyalitkin.wordpress.com/2017/09/01/cartography-lighting-up-the-shadows/
.. _`[2]`: https://www.blackhat.com/docs/eu-16/materials/eu-16-Goktas-Bypassing-Clangs-SafeStack.pdf
.. _`[3]`: https://www.vusec.net/projects/anc/

Unless care is taken when allocating the shadow call stack, it may be
possible for an attacker to guess its address using the addresses of
other allocations. Therefore, the address should be chosen to make this
difficult. One way to do this is to allocate a large guard region without
read/write permissions, randomly select a small region within it to be
used as the shadow call stack, and mark only that region as read/write.
This also somewhat mitigates processor side channels. The intent is that
the Android runtime `will do this`_, but the platform will first need to be
`changed`_ to avoid using ``setrlimit(RLIMIT_AS)`` to limit memory
allocations in certain processes, as this also limits the number of guard
regions that can be allocated.

.. _`will do this`: https://android-review.googlesource.com/c/platform/bionic/+/891622
.. _`changed`: https://android-review.googlesource.com/c/platform/frameworks/av/+/837745
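
The guard-region scheme described above can be sketched in C for Linux. This is a minimal sketch under stated assumptions, not the Android implementation. ``SCS_SIZE``, ``SCS_GUARD_REGION_SIZE``, ``scs_allocate`` and ``scs_free`` are hypothetical names with illustrative values. Randomness comes from Linux's ``getrandom``. The chosen region is aligned to its own size so that its base can later be recovered by masking.

.. code-block:: c

  #define _GNU_SOURCE
  #include <stddef.h>
  #include <stdint.h>
  #include <sys/mman.h>
  #include <sys/random.h>
  #include <sys/types.h>
  #include <unistd.h>

  /* Hypothetical sizes, for illustration only. */
  #define SCS_SIZE ((size_t)8 * 1024)
  #define SCS_GUARD_REGION_SIZE ((size_t)16 * 1024 * 1024)

  /* Hypothetical helper: reserves an inaccessible guard region, makes one
     randomly chosen, SCS_SIZE-aligned slot inside it readable and writable,
     and returns that slot. The start of the guard region is stored in
     *guard_out so the whole region can be released later. Returns NULL on
     failure. */
  static void *scs_allocate(void **guard_out) {
    /* mprotect() needs page-aligned addresses, so SCS_SIZE must be a
       multiple of the page size. */
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || SCS_SIZE % (size_t)page != 0) return NULL;

    void *guard = mmap(NULL, SCS_GUARD_REGION_SIZE, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (guard == MAP_FAILED) return NULL;

    /* Count the SCS_SIZE-aligned slots that fit entirely in the region. */
    uintptr_t start = (uintptr_t)guard;
    uintptr_t first = (start + SCS_SIZE - 1) & ~(uintptr_t)(SCS_SIZE - 1);
    size_t nslots = (start + SCS_GUARD_REGION_SIZE - first) / SCS_SIZE;

    uint64_t r = 0;
    void *scs = NULL;
    if (getrandom(&r, sizeof r, 0) == (ssize_t)sizeof r) {
      scs = (void *)(first + (uintptr_t)(r % nslots) * SCS_SIZE);
      /* Only the chosen slot becomes accessible; the rest stays PROT_NONE. */
      if (mprotect(scs, SCS_SIZE, PROT_READ | PROT_WRITE) != 0) scs = NULL;
    }
    if (scs == NULL) {
      munmap(guard, SCS_GUARD_REGION_SIZE);
      return NULL;
    }
    *guard_out = guard;
    return scs;
  }

  /* Hypothetical helper: releases the guard region, including the shadow
     call stack inside it. */
  static void scs_free(void *guard) { munmap(guard, SCS_GUARD_REGION_SIZE); }

A thread-creation hook could call ``scs_allocate`` and load the returned address into ``x18``, which needs a small assembly stub that is not shown. It would then keep the guard pointer so that ``scs_free`` can release the whole region when the thread exits. ``MAP_NORESERVE`` keeps the large reservation cheap. Even so, the reservation still counts against ``RLIMIT_AS``.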

The runtime will need the address of the shadow call stack in order to
deallocate it when destroying the thread. If the entire program is compiled
with ``-ffixed-x18``, this is trivial: the address can be derived from the
value stored in ``x18`` (e.g. by masking out the lower bits). If a guard
region is used, the address of the start of the guard region could then be
stored at the start of the shadow call stack itself. But if code compiled
without ``-ffixed-x18`` can run on a thread managed by the runtime, as is the
case on Android, ``x18`` may have been clobbered, so the address must be
stored somewhere else instead. On Android we store the address of the start
of the guard region in TLS and deallocate the entire guard region, including
the shadow call stack, at thread exit. This is considered acceptable given
that the address of the start of the guard region is already somewhat
guessable.
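
This arrangement can be approximated portably with POSIX thread-specific data. The start of the guard region lives in a per-thread slot, and a key destructor releases the region when the thread exits without ever reading ``x18``. This is a hedged sketch: it swaps in ``pthread_key_create`` for whatever TLS slot the Android runtime actually uses. ``scs_thread_init``, ``scs_key`` and ``GUARD_REGION_SIZE`` are hypothetical names, and the step that loads ``x18`` is omitted.

.. code-block:: c

  #define _GNU_SOURCE
  #include <pthread.h>
  #include <stddef.h>
  #include <sys/mman.h>

  /* Hypothetical size of the per-thread guard region. */
  #define GUARD_REGION_SIZE ((size_t)16 * 1024 * 1024)

  static pthread_key_t scs_key;
  static pthread_once_t scs_key_once = PTHREAD_ONCE_INIT;

  /* Key destructor: runs at exit of every thread whose slot is non-NULL and
     unmaps the whole guard region, shadow call stack included. */
  static void scs_release(void *guard) { munmap(guard, GUARD_REGION_SIZE); }

  static void scs_make_key(void) {
    (void)pthread_key_create(&scs_key, scs_release);
  }

  /* Hypothetical per-thread setup hook, called at the start of each thread.
     Returns 0 on success, -1 on failure. */
  static int scs_thread_init(void) {
    if (pthread_once(&scs_key_once, scs_make_key) != 0) return -1;
    void *guard = mmap(NULL, GUARD_REGION_SIZE, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (guard == MAP_FAILED) return -1;
    /* A real runtime would now open a read/write shadow call stack inside
       the region and load its address into x18 (not shown). */
    if (pthread_setspecific(scs_key, guard) != 0) {
      munmap(guard, GUARD_REGION_SIZE);
      return -1;
    }
    return 0;
  }

The destructor reads the guard address from the thread-specific slot rather than from ``x18``. It therefore still works if uninstrumented code on the thread has clobbered ``x18``.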

One way in which the address of the shadow call stack could leak is in the
``jmp_buf`` data structure used by ``setjmp`` and ``longjmp``. The Android
runtime `avoids this`_ by storing only the low bits of ``x18`` in the
``jmp_buf``, which requires the address of the shadow call stack to be
aligned to its size.

.. _`avoids this`: https://android.googlesource.com/platform/bionic/+/808d176e7e0dd727c7f929622ec017f6e065c582/libc/arch-arm64/bionic/setjmp.S#49
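
The following sketch covers the arithmetic behind two operations: deriving the shadow call stack's base from ``x18``, and keeping only its low bits in a ``jmp_buf``. It assumes a power-of-two ``SCS_SIZE`` and a shadow call stack aligned to that size. The helper names are hypothetical. Restoring works by recombining the saved low bits with the high bits of the current ``x18``. This reflects one reading of the scheme, not a transcription of bionic's ``setjmp.S``.

.. code-block:: c

  #include <stdint.h>

  /* Hypothetical size; must be a power of two, and the shadow call stack is
     assumed to be aligned to it. */
  #define SCS_SIZE ((uintptr_t)8 * 1024)
  #define SCS_MASK (SCS_SIZE - 1)

  /* Hypothetical helpers. Base of the shadow call stack, derived from the
     current x18 value; assumes x18 stays strictly below base + SCS_SIZE. */
  static uintptr_t scs_base(uintptr_t x18) { return x18 & ~SCS_MASK; }

  /* What a setjmp-like routine would save: the offset within the shadow
     call stack, which does not reveal where the stack lives. */
  static uintptr_t scs_jmpbuf_bits(uintptr_t x18) { return x18 & SCS_MASK; }

  /* What a longjmp-like routine would restore: the saved offset combined
     with the high bits of the x18 value currently in use on the thread. */
  static uintptr_t scs_from_jmpbuf(uintptr_t current_x18, uintptr_t saved) {
    return scs_base(current_x18) | (saved & SCS_MASK);
  }

This only works because of the alignment requirement. When the stack is aligned to its size, the high bits identify the stack and the low bits identify the position within it.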

The architecture's call and return instructions (``bl`` and ``ret``) operate
on a register (the link register ``x30``) rather than the stack. This means
that leaf functions, which typically do not spill ``x30`` to memory, are
generally protected from return address overwrites even without
ShadowCallStack.

Usage
=====

To enable ShadowCallStack, pass the ``-fsanitize=shadow-call-stack`` flag to
both the compile and link command lines. On aarch64, you also need to pass
``-ffixed-x18`` unless your target already reserves ``x18``.

Low-level API
-------------

``__has_feature(shadow_call_stack)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In some cases one may need to execute different code depending on whether
ShadowCallStack is enabled. The macro ``__has_feature(shadow_call_stack)`` can
be used for this purpose.

.. code-block:: c

  #if defined(__has_feature)
  #  if __has_feature(shadow_call_stack)
  // code that builds only under ShadowCallStack
  #  endif
  #endif

``__attribute__((no_sanitize("shadow-call-stack")))``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use ``__attribute__((no_sanitize("shadow-call-stack")))`` on a function
declaration to specify that the shadow call stack instrumentation should not be
applied to that function, even if enabled globally.
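
The two low-level facilities can be combined into portable macros. The same source then builds with compilers that lack ``__has_feature`` or the ``no_sanitize`` attribute. ``SCS_ENABLED``, ``NO_SHADOW_CALL_STACK`` and ``early_init_step`` are hypothetical names. Applying the attribute only under Clang is an assumption made to avoid warnings from other compilers.

.. code-block:: c

  #include <stdbool.h>

  /* Hypothetical macro: 1 when compiled with -fsanitize=shadow-call-stack,
     0 otherwise. __has_feature may not exist, so the test is nested. */
  #if defined(__has_feature)
  #  if __has_feature(shadow_call_stack)
  #    define SCS_ENABLED 1
  #  endif
  #endif
  #ifndef SCS_ENABLED
  #  define SCS_ENABLED 0
  #endif

  /* Hypothetical macro: opts a function out of the instrumentation where
     the attribute is known to be understood. */
  #if defined(__clang__)
  #  define NO_SHADOW_CALL_STACK __attribute__((no_sanitize("shadow-call-stack")))
  #else
  #  define NO_SHADOW_CALL_STACK
  #endif

  /* Hypothetical helper that might run before the runtime has set up x18,
     so it must not push to or pop from the shadow call stack. */
  NO_SHADOW_CALL_STACK static int early_init_step(int x) { return x + 1; }

  static bool shadow_call_stack_enabled(void) { return SCS_ENABLED != 0; }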

Example
=======

The following example code:

.. code-block:: c++

  int bar();

  int foo() {
    return bar() + 1;
  }

generates the following aarch64 assembly when compiled with ``-O2``:

.. code-block:: none

  stp x29, x30, [sp, #-16]!
  mov x29, sp
  bl bar
  add w0, w0, #1
  ldp x29, x30, [sp], #16
  ret

Adding ``-fsanitize=shadow-call-stack`` would output the following assembly:

.. code-block:: none

  str x30, [x18], #8
  stp x29, x30, [sp, #-16]!
  mov x29, sp
  bl bar
  add w0, w0, #1
  ldp x29, x30, [sp], #16
  ldr x30, [x18, #-8]!
  ret
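
The effect of the two added instructions can be modeled in C. This is a hypothetical simulation: ``cpu_model``, ``scs_push``, ``scs_pop``, ``model_foo`` and the constants are illustrative, not part of any API, and ``x29`` is ignored. In the model, ``str x30, [x18], #8`` becomes a post-increment store and ``ldr x30, [x18, #-8]!`` a pre-decrement load. The shadow copy is loaded after ``ldp`` reloads ``x30`` from the regular stack, so a corrupted stack slot does not change where ``ret`` goes.

.. code-block:: c

  #include <stdint.h>

  /* Illustrative stand-ins for the two registers involved. */
  typedef struct {
    uint64_t *x18; /* shadow call stack pointer; grows upward */
    uint64_t x30;  /* link register: holds the return address */
  } cpu_model;

  /* str x30, [x18], #8 : store x30 at x18, then advance x18 by 8 bytes. */
  static void scs_push(cpu_model *cpu) { *cpu->x18++ = cpu->x30; }

  /* ldr x30, [x18, #-8]! : move x18 back by 8 bytes, then load x30. */
  static void scs_pop(cpu_model *cpu) { cpu->x30 = *--cpu->x18; }

  /* Models the instrumented foo(). A nonzero `overwrite` simulates an
     attacker replacing the return address saved on the regular stack. */
  static uint64_t model_foo(cpu_model *cpu, uint64_t overwrite) {
    scs_push(cpu);                  /* str x30, [x18], #8         */
    uint64_t stack_slot = cpu->x30; /* stp x29, x30, [sp, #-16]!  */
    cpu->x30 = 0x1234;              /* bl bar overwrites x30      */
    if (overwrite != 0)
      stack_slot = overwrite;       /* e.g. a stack buffer overflow */
    cpu->x30 = stack_slot;          /* ldp x29, x30, [sp], #16    */
    scs_pop(cpu);                   /* ldr x30, [x18, #-8]!       */
    return cpu->x30;                /* ret branches to x30        */
  }

Nested calls behave the same way. Each non-leaf function's prologue pushes one entry and its epilogue pops it, so ``x18`` is back at its original value once the outermost call has returned.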