=======================================================
Hardware-assisted AddressSanitizer Design Documentation
=======================================================

This page is a design document for
**hardware-assisted AddressSanitizer** (or **HWASAN**),
a tool similar to :doc:`AddressSanitizer`,
but based on partial hardware assistance.


Introduction
============

:doc:`AddressSanitizer`
tags every 8 bytes of the application memory with a 1-byte tag (using *shadow memory*),
uses *redzones* to find buffer overflows and
*quarantine* to find use-after-free.
The redzones, the quarantine, and, to a lesser extent, the shadow, are the
sources of AddressSanitizer's memory overhead.
See the `AddressSanitizer paper`_ for details.

AArch64 has `Address Tagging`_ (also known as top-byte-ignore, or TBI), a hardware feature that allows
software to use the 8 most significant bits of a 64-bit pointer as
a tag. HWASAN uses `Address Tagging`_
to implement a memory safety tool, similar to :doc:`AddressSanitizer`,
but with smaller memory overhead and slightly different (mostly better)
accuracy guarantees.

Algorithm
=========

* Every heap/stack/global memory object is forcibly aligned to `TG` bytes
  (`TG` is e.g. 16 or 64). We call `TG` the **tagging granularity**.
* For every such object a random `TS`-bit tag `T` is chosen (`TS`, or tag size, is e.g. 4 or 8).
* The pointer to the object is tagged with `T`.
* The memory for the object is also tagged with `T` (using a `TG=>1` shadow memory).
* Every load and store is instrumented to read the memory tag and compare it
  with the pointer tag; an exception is raised on tag mismatch.

For a more detailed discussion of this approach, see https://arxiv.org/pdf/1802.09517.pdf
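
The steps above can be modelled in a few lines of C. This is a toy sketch, not the HWASAN runtime: it assumes `TG = 16` and `TS = 8` with the tag kept in the top byte, and a simulated 1 KiB address space so that tagged "pointers" are only ever integers and are never dereferenced. `tag_object`, `access_ok` and the other names are hypothetical.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Toy model of the algorithm (assumptions, not the real runtime):
    * TG = 16, TS = 8 with the tag in bits 56..63 of a 64-bit "pointer",
    * and a simulated 1 KiB address space starting at address 0. */
   enum { TG = 16, TAG_SHIFT = 56, SPACE = 1024 };

   static uint8_t shadow[SPACE / TG]; /* TG=>1 shadow: one tag per granule */

   static uint64_t untag(uint64_t ptr) { return ptr & ~(0xFFULL << TAG_SHIFT); }

   static uint8_t pointer_tag(uint64_t ptr) { return (uint8_t)(ptr >> TAG_SHIFT); }

   /* Tag every granule of the TG-aligned object [addr, addr + size) with
    * `tag` and return a pointer to the object carrying the same tag. */
   static uint64_t tag_object(uint64_t addr, size_t size, uint8_t tag) {
     assert(addr % TG == 0 && addr + size <= SPACE);
     for (uint64_t a = addr; a < addr + size; a += TG)
       shadow[a / TG] = tag;
     return addr | ((uint64_t)tag << TAG_SHIFT);
   }

   /* The check performed before every load and store. */
   static bool access_ok(uint64_t ptr) {
     assert(untag(ptr) < SPACE);
     return shadow[untag(ptr) / TG] == pointer_tag(ptr);
   }

For example, after `p = tag_object(0, 32, 0x2a)` and `q = tag_object(32, 16, 0x17)`, `access_ok(p + 16)` holds but `access_ok(p + 32)` does not: the overflow lands in `q`'s granule, whose tag differs. Had the two random tags been equal, the overflow would go unnoticed, which is why detection is probabilistic.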

Instrumentation
===============

Memory Accesses
---------------

All memory accesses are prefixed with an inline instruction sequence that
verifies the tags. Currently, the following sequence is used:

.. code-block:: none

   // int foo(int *a) { return *a; }
   // clang -O2 --target=aarch64-linux -fsanitize=hwaddress -c load.c
   foo:
      0:  08 00 00 90   adrp  x8, 0 <__hwasan_shadow>
      4:  08 01 40 f9   ldr   x8, [x8]         // shadow base (to be resolved by the loader)
      8:  09 dc 44 d3   ubfx  x9, x0, #4, #52  // shadow offset
      c:  28 69 68 38   ldrb  w8, [x9, x8]     // load shadow tag
     10:  09 fc 78 d3   lsr   x9, x0, #56      // extract address tag
     14:  3f 01 08 6b   cmp   w9, w8           // compare tags
     18:  61 00 00 54   b.ne  24               // jump on mismatch
     1c:  00 00 40 b9   ldr   w0, [x0]         // original load
     20:  c0 03 5f d6   ret
     24:  40 20 21 d4   brk   #0x902           // trap
Kostya Serebryany48fff9982017-12-18 21:40:07 +000065
66Alternatively, memory accesses are prefixed with a function call.
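
For readers less familiar with AArch64, the inline sequence above (presumably also what an out-of-line check function computes) can be restated in C. This is an illustrative sketch: `hwasan_check` is a hypothetical name, `shadow_base` stands for the value loaded via `__hwasan_shadow`, and a mismatch is reported by returning `false` rather than by trapping.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   /* C restatement of the inline check (sketch). `shadow_base` stands for
    * the value loaded via __hwasan_shadow; the shift by 4 implies TG = 16.
    * Returns true when the access may proceed; the real sequence executes
    * `brk #0x902` on a mismatch instead of returning false. */
   static bool hwasan_check(const uint8_t *shadow_base, uint64_t addr) {
     assert(shadow_base != NULL);
     /* ubfx x9, x0, #4, #52: extract bits 4..55, i.e. the untagged
      * address divided by TG = 16 */
     uint64_t shadow_offset = (addr >> 4) & ((1ULL << 52) - 1);
     /* ldrb w8, [x9, x8]: load the memory tag from the shadow */
     uint8_t mem_tag = shadow_base[shadow_offset];
     /* lsr x9, x0, #56: extract the pointer (address) tag */
     uint8_t ptr_tag = (uint8_t)(addr >> 56);
     /* cmp w9, w8; b.ne: compare the two tags */
     return ptr_tag == mem_tag;
   }

Note that the shift by 4 hard-codes `TG = 16`, and the 52-bit field width (56 - 4 = 52) drops the 8 tag bits, so the pointer tag never affects the shadow offset.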

Heap
----

Heap memory and pointers are tagged by `malloc`.
This can be based on any malloc that forces all objects to be `TG`-aligned.
`free` retags the memory with a different tag, so that dangling pointers
to the freed object no longer match the memory tag.
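
A tagging `malloc`/`free` pair might look roughly like the following sketch. It is deliberately simplified and all names are hypothetical: a bump allocator over a 4 KiB simulated arena stands in for a real `TG`-aligned malloc, memory is never reused, `hw_free` takes the size from the caller instead of from allocator metadata, and tags come from a toy xorshift generator.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Toy tagging heap (sketch; names and parameters are hypothetical).
    * TG = 16, TS = 8 with the tag in bits 56..63; "pointers" are offsets
    * into a simulated 4 KiB arena and are never dereferenced. */
   enum { TG = 16, ARENA = 4096 };
   #define ADDR_MASK ((1ULL << 56) - 1)

   static uint8_t heap_shadow[ARENA / TG];
   static uint64_t next_free; /* bump pointer: memory is never reused */
   static uint32_t rng = 1;   /* xorshift32 state used to pick tags */

   static uint8_t random_tag(void) {
     rng ^= rng << 13;
     rng ^= rng >> 17;
     rng ^= rng << 5;
     return (uint8_t)rng;
   }

   static size_t round_up(size_t size) { return (size + TG - 1) / TG * TG; }

   static void set_tags(uint64_t addr, size_t size, uint8_t tag) {
     for (uint64_t a = addr; a < addr + size; a += TG)
       heap_shadow[a / TG] = tag;
   }

   /* malloc: TG-align the object, then tag its memory and its pointer. */
   static uint64_t hw_malloc(size_t size) {
     size_t rounded = round_up(size);
     assert(next_free + rounded <= ARENA);
     uint64_t addr = next_free;
     next_free += rounded;
     uint8_t tag = random_tag();
     set_tags(addr, rounded, tag);
     return addr | ((uint64_t)tag << 56);
   }

   /* free: retag the memory with a tag that differs from the old one. */
   static void hw_free(uint64_t ptr, size_t size) {
     uint8_t old_tag = (uint8_t)(ptr >> 56);
     uint8_t new_tag = random_tag();
     if (new_tag == old_tag)
       new_tag ^= 0xFF; /* any value other than old_tag would do */
     set_tags(ptr & ADDR_MASK, round_up(size), new_tag);
   }

   /* The same comparison the instrumentation performs before an access. */
   static int tag_matches(uint64_t ptr) {
     return heap_shadow[(ptr & ADDR_MASK) / TG] == (uint8_t)(ptr >> 56);
   }

After `p = hw_malloc(24)`, both granules of the object match `p`'s tag; after `hw_free(p, 24)` they carry a different tag, so any later access through `p` fails the check and is reported as a use-after-free.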

Stack
-----

Stack frames are instrumented by aligning all non-promotable allocas
to `TG` and tagging stack memory in the function prologue and epilogue.

Tags for different allocas in one function are **not** generated
independently; doing that in a function with `M` allocas would require
maintaining `M` live stack pointers, significantly increasing register
pressure. Instead, we generate a single base tag value in the prologue
and build the tag for alloca number `M` as `ReTag(BaseTag, M)`, where
`ReTag` can be as simple as exclusive-or with the constant `M`.
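
The tag derivation can be sketched as below, assuming `TS = 8` and the exclusive-or variant of `ReTag` described above; `retag` is a hypothetical helper name. Because exclusive-or with distinct constants yields distinct results, allocas in the same frame always receive distinct tags (as long as there are at most `2**TS` of them), and only `BaseTag` has to stay live.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Sketch of ReTag as exclusive-or (TS = 8 assumed; `retag` is a
    * hypothetical helper). The prologue picks one random BaseTag; the tag
    * of alloca number m is then recomputed from it whenever it is needed. */
   enum { TS = 8 };

   static uint8_t retag(uint8_t base_tag, unsigned m) {
     assert(m < (1u << TS)); /* distinct m below 2**TS give distinct tags */
     return (uint8_t)((base_tag ^ m) & ((1u << TS) - 1));
   }

For example, with `BaseTag = 0x5a`, allocas 0, 1 and 3 get the tags `0x5a`, `0x5b` and `0x59`.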

Stack instrumentation is expected to be a major source of overhead,
but could be optional.

Globals
-------

TODO: details.

Error reporting
---------------

Errors are generated by a trap instruction (`BRK` in the AArch64 sequence above)
and are handled by a signal handler.

Attribute
---------

HWASAN uses its own LLVM IR attribute `sanitize_hwaddress` and a matching
C function attribute. An alternative would be to re-use ASAN's attribute
`sanitize_address`. The reasons to use a separate attribute are:

  * Users may need to disable ASAN but not HWASAN, or vice versa,
    because the tools have different trade-offs and compatibility issues.
  * LLVM (ideally) does not use flags to decide which pass to apply;
    whether ASAN or HWASAN is applied is determined by the function attributes.

This does mean that users of HWASAN may need to add the new attribute
to the code that already uses the old attribute.
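
One way the first point shows up at the C level: Clang lets a function opt out of HWASAN alone via `__attribute__((no_sanitize("hwaddress")))`, and out of ASAN alone via `no_sanitize("address")`. The sketch below wraps the former in a macro guarded by `__has_attribute` so that it still compiles on compilers without the attribute; `NO_SANITIZE_HWADDRESS` and `read_first` are hypothetical names.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>

   /* Opt a single function out of HWASAN (sketch). Clang spells this
    * no_sanitize("hwaddress"); the guard makes the macro expand to nothing
    * on compilers without the no_sanitize attribute. */
   #if defined(__has_attribute)
   #  if __has_attribute(no_sanitize)
   #    define NO_SANITIZE_HWADDRESS __attribute__((no_sanitize("hwaddress")))
   #  endif
   #endif
   #ifndef NO_SANITIZE_HWADDRESS
   #  define NO_SANITIZE_HWADDRESS
   #endif

   /* Not instrumented under -fsanitize=hwaddress, but still instrumented
    * under -fsanitize=address, since only the HWASAN opt-out is given. */
   NO_SANITIZE_HWADDRESS
   static int read_first(const int *p) {
     assert(p != NULL);
     return p[0];
   }

Keeping the two opt-outs separate lets a project exclude a function known to conflict with pointer tagging without losing ASAN coverage for it.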


Comparison with AddressSanitizer
================================

HWASAN:
  * Is less portable than :doc:`AddressSanitizer`,
    as it relies on hardware `Address Tagging`_ (AArch64).
    Address Tagging can be emulated with compiler instrumentation,
    but it will require the instrumentation to remove the tags before
    any load or store, which is infeasible in any realistic environment
    that contains non-instrumented code.
  * May have compatibility problems if the target code uses higher
    pointer bits for other purposes.
  * May require changes in the OS kernels (e.g. Linux seems to dislike
    tagged pointers passed from user space:
    https://www.kernel.org/doc/Documentation/arm64/tagged-pointers.txt).
  * **Does not require redzones to detect buffer overflows**,
    but the buffer overflow detection is probabilistic, with roughly
    `(2**TS-1)/(2**TS)` probability of catching a bug.
  * **Does not require quarantine to detect heap-use-after-free,
    or stack-use-after-return**.
    The detection is similarly probabilistic.
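
The detection probability quoted above follows from a short argument, sketched below under the assumption that tags are uniformly random and independent: a bad access slips through only if the memory tag it lands on happens to equal the pointer tag, which happens with probability `1/2**TS`. `detection_probability` is a hypothetical helper.

.. code-block:: c

   #include <assert.h>

   /* Probability that a bad access is caught, assuming uniformly random,
    * independent TS-bit tags: it is missed only when the memory tag
    * happens to equal the pointer tag (probability 1/2**TS). */
   static double detection_probability(unsigned ts) {
     assert(ts > 0 && ts < 32);
     double tags = (double)(1u << ts); /* 2**TS possible tag values */
     return (tags - 1.0) / tags;       /* (2**TS-1)/(2**TS) */
   }

For the tag sizes mentioned earlier this gives 15/16 = 93.75% for `TS = 4` and 255/256, about 99.6%, for `TS = 8`.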

The memory overhead of HWASAN is expected to be much smaller
than that of AddressSanitizer:
`1/TG` extra memory for the shadow
and some overhead due to `TG`-aligning all objects.
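
The two components of this estimate can be made concrete with a small calculation. The sketch assumes one shadow byte per `TG` bytes of tagged memory and that every object is padded up to a multiple of `TG`; it ignores allocator metadata and any other costs, and `hwasan_overhead` is a hypothetical name.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>

   /* Extra bytes for a set of live objects (sketch): alignment padding
    * from rounding each object up to TG, plus one shadow byte per TG
    * bytes of padded memory. */
   static size_t hwasan_overhead(const size_t *sizes, size_t n, size_t tg) {
     assert(tg > 0);
     size_t padded_total = 0, padding = 0;
     for (size_t i = 0; i < n; i++) {
       size_t padded = (sizes[i] + tg - 1) / tg * tg; /* TG-aligned size */
       padding += padded - sizes[i];
       padded_total += padded;
     }
     return padding + padded_total / tg; /* padding + shadow bytes */
   }

For objects of 10, 16 and 40 bytes it reports 19 extra bytes with `TG = 16` (14 bytes of padding, 5 shadow bytes) and 129 with `TG = 64` (126 bytes of padding, 3 shadow bytes): a larger `TG` shrinks the shadow but increases the alignment overhead.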

Supported architectures
=======================

HWASAN relies on `Address Tagging`_, which is only available on AArch64.
For other 64-bit architectures it is possible to remove the address tags
before every load and store by compiler instrumentation, but this variant
will have limited deployability since not all of the code is
typically instrumented.

HWASAN's approach is not applicable to 32-bit architectures.


Related Work
============

* `SPARC ADI`_ implements a similar tool mostly in hardware.
* `Effective and Efficient Memory Protection Using Dynamic Tainting`_ discusses
  similar approaches ("lock & key").
* `Watchdog`_ discusses a heavier, but still somewhat similar
  "lock & key" approach.
* *TODO: add more "related work" links. Suggestions are welcome.*


.. _Watchdog: http://www.cis.upenn.edu/acg/papers/isca12_watchdog.pdf
.. _Effective and Efficient Memory Protection Using Dynamic Tainting: https://www.cc.gatech.edu/~orso/papers/clause.doudalis.orso.prvulovic.pdf
.. _SPARC ADI: https://lazytyped.blogspot.com/2017/09/getting-started-with-adi.html
.. _AddressSanitizer paper: https://www.usenix.org/system/files/conference/atc12/atc12-final39.pdf
.. _Address Tagging: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/ch12s05s01.html