Blame - clang/docs/ControlFlowIntegrityDesign.rst - toolchain/llvm-project

blob: b4aacd365670510f27b020321b9c8f26bd4bb05c [file] [log] [blame]

Peter Collingbourne	a4ccff3	2015-02-20 20:30:56 +0000	[diff] [blame]	1	===========================================
				2	Control Flow Integrity Design Documentation
				3	===========================================
				4
				5	This page documents the design of the :doc:`ControlFlowIntegrity` schemes
				6	supported by Clang.
				7
				8	Forward-Edge CFI for Virtual Calls
Peter Collingbourne	03054d4	2015-02-25 03:35:03 +0000	[diff] [blame]	9	==================================
Peter Collingbourne	a4ccff3	2015-02-20 20:30:56 +0000	[diff] [blame]	10
				11	This scheme works by allocating, for each static type used to make a virtual
				12	call, a region of read-only storage in the object file holding a bit vector
				13	that maps onto to the region of storage used for those virtual tables. Each
				14	set bit in the bit vector corresponds to the `address point`_ for a virtual
				15	table compatible with the static type for which the bit vector is being built.
				16
				17	For example, consider the following three C++ classes:
				18
				19	.. code-block:: c++
				20
				21	struct A {
Peter Collingbourne	0100e6c	2015-02-23 20:22:17 +0000	[diff] [blame]	22	virtual void f1();
				23	virtual void f2();
				24	virtual void f3();
Peter Collingbourne	a4ccff3	2015-02-20 20:30:56 +0000	[diff] [blame]	25	};
				26
				27	struct B : A {
Peter Collingbourne	0100e6c	2015-02-23 20:22:17 +0000	[diff] [blame]	28	virtual void f1();
				29	virtual void f2();
				30	virtual void f3();
Peter Collingbourne	a4ccff3	2015-02-20 20:30:56 +0000	[diff] [blame]	31	};
				32
				33	struct C : A {
Peter Collingbourne	0100e6c	2015-02-23 20:22:17 +0000	[diff] [blame]	34	virtual void f1();
				35	virtual void f2();
				36	virtual void f3();
Peter Collingbourne	a4ccff3	2015-02-20 20:30:56 +0000	[diff] [blame]	37	};
				38
				39	The scheme will cause the virtual tables for A, B and C to be laid out
				40	consecutively:
				41
				42	.. csv-table:: Virtual Table Layout for A, B, C
Peter Collingbourne	0100e6c	2015-02-23 20:22:17 +0000	[diff] [blame]	43	:header: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
Peter Collingbourne	a4ccff3	2015-02-20 20:30:56 +0000	[diff] [blame]	44
Peter Collingbourne	0100e6c	2015-02-23 20:22:17 +0000	[diff] [blame]	45	A::offset-to-top, &A::rtti, &A::f1, &A::f2, &A::f3, B::offset-to-top, &B::rtti, &B::f1, &B::f2, &B::f3, C::offset-to-top, &C::rtti, &C::f1, &C::f2, &C::f3
Peter Collingbourne	a4ccff3	2015-02-20 20:30:56 +0000	[diff] [blame]	46
				47	The bit vector for static types A, B and C will look like this:
				48
				49	.. csv-table:: Bit Vectors for A, B, C
Peter Collingbourne	0100e6c	2015-02-23 20:22:17 +0000	[diff] [blame]	50	:header: Class, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
Peter Collingbourne	a4ccff3	2015-02-20 20:30:56 +0000	[diff] [blame]	51
Peter Collingbourne	0100e6c	2015-02-23 20:22:17 +0000	[diff] [blame]	52	A, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0
				53	B, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0
				54	C, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0
Peter Collingbourne	a4ccff3	2015-02-20 20:30:56 +0000	[diff] [blame]	55
Peter Collingbourne	6e7908d	2015-03-12 00:30:41 +0000	[diff] [blame]	56	Bit vectors are represented in the object file as byte arrays. By loading
				57	from indexed offsets into the byte array and applying a mask, a program can
				58	test bits from the bit set with a relatively short instruction sequence. Bit
				59	vectors may overlap so long as they use different bits. For the full details,
				60	see the `ByteArrayBuilder`_ class.
				61
				62	In this case, assuming A is laid out at offset 0 in bit 0, B at offset 0 in
				63	bit 1 and C at offset 0 in bit 2, the byte array would look like this:
				64
				65	.. code-block:: c++
				66
				67	char bits[] = { 0, 0, 1, 0, 0, 0, 3, 0, 0, 0, 0, 5, 0, 0 };
				68
Peter Collingbourne	a4ccff3	2015-02-20 20:30:56 +0000	[diff] [blame]	69	To emit a virtual call, the compiler will assemble code that checks that
				70	the object's virtual table pointer is in-bounds and aligned and that the
				71	relevant bit is set in the bit vector.
				72
Peter Collingbourne	03054d4	2015-02-25 03:35:03 +0000	[diff] [blame]	73	For example on x86 a typical virtual call may look like this:
Peter Collingbourne	0100e6c	2015-02-23 20:22:17 +0000	[diff] [blame]	74
				75	.. code-block:: none
				76
Peter Collingbourne	6e7908d	2015-03-12 00:30:41 +0000	[diff] [blame]	77	ca7fbb: 48 8b 0f mov (%rdi),%rcx
				78	ca7fbe: 48 8d 15 c3 42 fb 07 lea 0x7fb42c3(%rip),%rdx
				79	ca7fc5: 48 89 c8 mov %rcx,%rax
				80	ca7fc8: 48 29 d0 sub %rdx,%rax
				81	ca7fcb: 48 c1 c0 3d rol $0x3d,%rax
				82	ca7fcf: 48 3d 7f 01 00 00 cmp $0x17f,%rax
				83	ca7fd5: 0f 87 36 05 00 00 ja ca8511
				84	ca7fdb: 48 8d 15 c0 0b f7 06 lea 0x6f70bc0(%rip),%rdx
				85	ca7fe2: f6 04 10 10 testb $0x10,(%rax,%rdx,1)
				86	ca7fe6: 0f 84 25 05 00 00 je ca8511
				87	ca7fec: ff 91 98 00 00 00 callq *0x98(%rcx)
Peter Collingbourne	0100e6c	2015-02-23 20:22:17 +0000	[diff] [blame]	88	[...]
Peter Collingbourne	6e7908d	2015-03-12 00:30:41 +0000	[diff] [blame]	89	ca8511: 0f 0b ud2
Peter Collingbourne	0100e6c	2015-02-23 20:22:17 +0000	[diff] [blame]	90
Peter Collingbourne	03054d4	2015-02-25 03:35:03 +0000	[diff] [blame]	91	The compiler relies on co-operation from the linker in order to assemble
				92	the bit vectors for the whole program. It currently does this using LLVM's
				93	`bit sets`_ mechanism together with link-time optimization.
				94
				95	.. _address point: https://mentorembedded.github.io/cxx-abi/abi.html#vtable-general
				96	.. _bit sets: http://llvm.org/docs/BitSets.html
Peter Collingbourne	6e7908d	2015-03-12 00:30:41 +0000	[diff] [blame]	97	.. _ByteArrayBuilder: http://llvm.org/docs/doxygen/html/structllvm_1_1ByteArrayBuilder.html
Peter Collingbourne	03054d4	2015-02-25 03:35:03 +0000	[diff] [blame]	98
				99	Optimizations
				100	-------------
				101
				102	The scheme as described above is the fully general variant of the scheme.
				103	Most of the time we are able to apply one or more of the following
				104	optimizations to improve binary size or performance.
				105
Peter Collingbourne	4b0924d	2015-02-26 00:18:04 +0000	[diff] [blame]	106	In fact, if you try the above example with the current version of the
				107	compiler, you will probably find that it will not use the described virtual
				108	table layout or machine instructions. Some of the optimizations we are about
				109	to introduce cause the compiler to use a different layout or a different
				110	sequence of machine instructions.
				111
Peter Collingbourne	03054d4	2015-02-25 03:35:03 +0000	[diff] [blame]	112	Stripping Leading/Trailing Zeros in Bit Vectors
				113	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
				114
				115	If a bit vector contains leading or trailing zeros, we can strip them from
				116	the vector. The compiler will emit code to check if the pointer is in range
				117	of the region covered by ones, and perform the bit vector check using a
				118	truncated version of the bit vector. For example, the bit vectors for our
				119	example class hierarchy will be emitted like this:
				120
				121	.. csv-table:: Bit Vectors for A, B, C
				122	:header: Class, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
				123
				124	A, , , 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, ,
				125	B, , , , , , , , 1, , , , , , ,
				126	C, , , , , , , , , , , , , 1, ,
				127
				128	Short Inline Bit Vectors
				129	~~~~~~~~~~~~~~~~~~~~~~~~
				130
				131	If the vector is sufficiently short, we can represent it as an inline constant
				132	on x86. This saves us a few instructions when reading the correct element
				133	of the bit vector.
				134
				135	If the bit vector fits in 32 bits, the code looks like this:
Peter Collingbourne	0100e6c	2015-02-23 20:22:17 +0000	[diff] [blame]	136
				137	.. code-block:: none
				138
				139	dc2: 48 8b 03 mov (%rbx),%rax
				140	dc5: 48 8d 15 14 1e 00 00 lea 0x1e14(%rip),%rdx
				141	dcc: 48 89 c1 mov %rax,%rcx
				142	dcf: 48 29 d1 sub %rdx,%rcx
				143	dd2: 48 c1 c1 3d rol $0x3d,%rcx
				144	dd6: 48 83 f9 03 cmp $0x3,%rcx
				145	dda: 77 2f ja e0b <main+0x9b>
				146	ddc: ba 09 00 00 00 mov $0x9,%edx
				147	de1: 0f a3 ca bt %ecx,%edx
				148	de4: 73 25 jae e0b <main+0x9b>
				149	de6: 48 89 df mov %rbx,%rdi
				150	de9: ff 10 callq *(%rax)
				151	[...]
				152	e0b: 0f 0b ud2
				153
				154	Or if the bit vector fits in 64 bits:
				155
				156	.. code-block:: none
				157
				158	11a6: 48 8b 03 mov (%rbx),%rax
				159	11a9: 48 8d 15 d0 28 00 00 lea 0x28d0(%rip),%rdx
				160	11b0: 48 89 c1 mov %rax,%rcx
				161	11b3: 48 29 d1 sub %rdx,%rcx
				162	11b6: 48 c1 c1 3d rol $0x3d,%rcx
				163	11ba: 48 83 f9 2a cmp $0x2a,%rcx
				164	11be: 77 35 ja 11f5 <main+0xb5>
				165	11c0: 48 ba 09 00 00 00 00 movabs $0x40000000009,%rdx
				166	11c7: 04 00 00
				167	11ca: 48 0f a3 ca bt %rcx,%rdx
				168	11ce: 73 25 jae 11f5 <main+0xb5>
				169	11d0: 48 89 df mov %rbx,%rdi
				170	11d3: ff 10 callq *(%rax)
				171	[...]
				172	11f5: 0f 0b ud2
				173
Peter Collingbourne	03054d4	2015-02-25 03:35:03 +0000	[diff] [blame]	174	If the bit vector consists of a single bit, there is only one possible
				175	virtual table, and the check can consist of a single equality comparison:
Peter Collingbourne	a4ccff3	2015-02-20 20:30:56 +0000	[diff] [blame]	176
Peter Collingbourne	03054d4	2015-02-25 03:35:03 +0000	[diff] [blame]	177	.. code-block:: none
				178
				179	9a2: 48 8b 03 mov (%rbx),%rax
				180	9a5: 48 8d 0d a4 13 00 00 lea 0x13a4(%rip),%rcx
				181	9ac: 48 39 c8 cmp %rcx,%rax
				182	9af: 75 25 jne 9d6 <main+0x86>
				183	9b1: 48 89 df mov %rbx,%rdi
				184	9b4: ff 10 callq *(%rax)
				185	[...]
				186	9d6: 0f 0b ud2
				187
				188	Virtual Table Layout
				189	~~~~~~~~~~~~~~~~~~~~
				190
				191	The compiler lays out classes of disjoint hierarchies in separate regions
				192	of the object file. At worst, bit vectors in disjoint hierarchies only
				193	need to cover their disjoint hierarchy. But the closer that classes in
				194	sub-hierarchies are laid out to each other, the smaller the bit vectors for
				195	those sub-hierarchies need to be (see "Stripping Leading/Trailing Zeros in Bit
				196	Vectors" above). The `GlobalLayoutBuilder`_ class is responsible for laying
				197	out the globals efficiently to minimize the sizes of the underlying bitsets.
				198
Peter Collingbourne	4b0924d	2015-02-26 00:18:04 +0000	[diff] [blame]	199	.. _GlobalLayoutBuilder: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Transforms/IPO/LowerBitSets.h?view=markup
Peter Collingbourne	03054d4	2015-02-25 03:35:03 +0000	[diff] [blame]	200
				201	Alignment
				202	~~~~~~~~~
				203
				204	If all gaps between address points in a particular bit vector are multiples
				205	of powers of 2, the compiler can compress the bit vector by strengthening
				206	the alignment requirements of the virtual table pointer. For example, given
				207	this class hierarchy:
				208
				209	.. code-block:: c++
				210
				211	struct A {
				212	virtual void f1();
				213	virtual void f2();
				214	};
				215
				216	struct B : A {
				217	virtual void f1();
				218	virtual void f2();
				219	virtual void f3();
				220	virtual void f4();
				221	virtual void f5();
				222	virtual void f6();
				223	};
				224
				225	struct C : A {
				226	virtual void f1();
				227	virtual void f2();
				228	};
				229
				230	The virtual tables will be laid out like this:
				231
				232	.. csv-table:: Virtual Table Layout for A, B, C
				233	:header: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
				234
				235	A::offset-to-top, &A::rtti, &A::f1, &A::f2, B::offset-to-top, &B::rtti, &B::f1, &B::f2, &B::f3, &B::f4, &B::f5, &B::f6, C::offset-to-top, &C::rtti, &C::f1, &C::f2
				236
				237	Notice that each address point for A is separated by 4 words. This lets us
				238	emit a compressed bit vector for A that looks like this:
				239
				240	.. csv-table::
				241	:header: 2, 6, 10, 14
				242
				243	1, 1, 0, 1
				244
				245	At call sites, the compiler will strengthen the alignment requirements by
				246	using a different rotate count. For example, on a 64-bit machine where the
				247	address points are 4-word aligned (as in A from our example), the ``rol``
				248	instruction may look like this:
				249
				250	.. code-block:: none
				251
				252	dd2: 48 c1 c1 3b rol $0x3b,%rcx
Peter Collingbourne	4b0924d	2015-02-26 00:18:04 +0000	[diff] [blame]	253
				254	Padding to Powers of 2
				255	~~~~~~~~~~~~~~~~~~~~~~
				256
				257	Of course, this alignment scheme works best if the address points are
				258	in fact aligned correctly. To make this more likely to happen, we insert
				259	padding between virtual tables that in many cases aligns address points to
				260	a power of 2. Specifically, our padding aligns virtual tables to the next
				261	highest power of 2 bytes; because address points for specific base classes
				262	normally appear at fixed offsets within the virtual table, this normally
				263	has the effect of aligning the address points as well.
				264
				265	This scheme introduces tradeoffs between decreased space overhead for
				266	instructions and bit vectors and increased overhead in the form of padding. We
				267	therefore limit the amount of padding so that we align to no more than 128
				268	bytes. This number was found experimentally to provide a good tradeoff.
				269
				270	Eliminating Bit Vector Checks for All-Ones Bit Vectors
				271	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
				272
				273	If the bit vector is all ones, the bit vector check is redundant; we simply
				274	need to check that the address is in range and well aligned. This is more
				275	likely to occur if the virtual tables are padded.
Peter Collingbourne	2c7f7e3	2015-09-10 02:17:40 +0000	[diff] [blame]	276
				277	Forward-Edge CFI for Indirect Function Calls
				278	============================================
				279
Peter Collingbourne	9de8fc5	2015-12-01 21:58:54 +0000	[diff] [blame]	280	Under forward-edge CFI for indirect function calls, each unique function
				281	type has its own bit vector, and at each call site we need to check that the
				282	function pointer is a member of the function type's bit vector. This scheme
				283	works in a similar way to forward-edge CFI for virtual calls, the distinction
				284	being that we need to build bit vectors of function entry points rather than
				285	of virtual tables.
Peter Collingbourne	2c7f7e3	2015-09-10 02:17:40 +0000	[diff] [blame]	286
Peter Collingbourne	9de8fc5	2015-12-01 21:58:54 +0000	[diff] [blame]	287	Unlike when re-arranging global variables, we cannot re-arrange functions
				288	in a particular order and base our calculations on the layout of the
				289	functions' entry points, as we have no idea how large a particular function
				290	will end up being (the function sizes could even depend on how we arrange
				291	the functions). Instead, we build a jump table, which is a block of code
				292	consisting of one branch instruction for each of the functions in the bit
				293	set that branches to the target function, and redirect any taken function
				294	addresses to the corresponding jump table entry. In this way, the distance
				295	between function entry points is predictable and controllable. In the object
				296	file's symbol table, the symbols for the target functions also refer to the
				297	jump table entries, so that addresses taken outside the module will pass
				298	any verification done inside the module.
				299
				300	In more concrete terms, suppose we have three functions ``f``, ``g``, ``h``
				301	which are members of a single bitset, and a function foo that returns their
				302	addresses:
				303
				304	.. code-block:: none
				305
				306	f:
				307	mov 0, %eax
				308	ret
				309
				310	g:
				311	mov 1, %eax
				312	ret
				313
				314	h:
				315	mov 2, %eax
				316	ret
				317
				318	foo:
				319	mov f, %eax
				320	mov g, %edx
				321	mov h, %ecx
				322	ret
				323
				324	Our jump table will (conceptually) look like this:
				325
				326	.. code-block:: none
				327
				328	f:
				329	jmp .Ltmp0 ; 5 bytes
				330	int3 ; 1 byte
				331	int3 ; 1 byte
				332	int3 ; 1 byte
				333
				334	g:
				335	jmp .Ltmp1 ; 5 bytes
				336	int3 ; 1 byte
				337	int3 ; 1 byte
				338	int3 ; 1 byte
				339
				340	h:
				341	jmp .Ltmp2 ; 5 bytes
				342	int3 ; 1 byte
				343	int3 ; 1 byte
				344	int3 ; 1 byte
				345
				346	.Ltmp0:
				347	mov 0, %eax
				348	ret
				349
				350	.Ltmp1:
				351	mov 1, %eax
				352	ret
				353
				354	.Ltmp2:
				355	mov 2, %eax
				356	ret
				357
				358	foo:
				359	mov f, %eax
				360	mov g, %edx
				361	mov h, %ecx
				362	ret
				363
				364	Because the addresses of ``f``, ``g``, ``h`` are evenly spaced at a power of
				365	2, and function types do not overlap (unlike class types with base classes),
				366	we can normally apply the `Alignment`_ and `Eliminating Bit Vector Checks
				367	for All-Ones Bit Vectors`_ optimizations thus simplifying the check at each
				368	call site to a range and alignment check.
Evgeniy Stepanov	fd6f92d	2015-12-15 23:00:20 +0000	[diff] [blame]	369
				370	Shared library support
				371	======================
				372
				373	EXPERIMENTAL
				374
				375	The basic CFI mode described above assumes that the application is a
				376	monolithic binary; at least that all possible virtual/indirect call
				377	targets and the entire class hierarchy are known at link time. The
				378	cross-DSO mode, enabled with -f[no-]sanitize-cfi-cross-dso relaxes
				379	this requirement by allowing virtual and indirect calls to cross the
				380	DSO boundary.
				381
				382	Assuming the following setup: the binary consists of several
				383	instrumented and several uninstrumented DSOs. Some of them may be
				384	dlopen-ed/dlclose-d periodically, even frequently.
				385
				386	- Calls made from uninstrumented DSOs are not checked and just work.
				387	- Calls inside any instrumented DSO are fully protected.
				388	- Calls between different instrumented DSOs are also protected, with
				389	a performance penalty (in addition to the monolithic CFI
				390	overhead).
				391	- Calls from an instrumented DSO to an uninstrumented one are
				392	unchecked and just work, with performance penalty.
				393	- Calls from an instrumented DSO outside of any known DSO are
				394	detected as CFI violations.
				395
				396	In the monolithic scheme a call site is instrumented as
				397
				398	.. code-block:: none
				399
				400	if (!InlinedFastCheck(f))
				401	abort();
				402	call *f
				403
				404	In the cross-DSO scheme it becomes
				405
				406	.. code-block:: none
				407
				408	if (!InlinedFastCheck(f))
				409	__cfi_slowpath(CallSiteTypeId, f);
				410	call *f
				411
				412	CallSiteTypeId
				413	--------------
				414
				415	``CallSiteTypeId`` is a stable process-wide identifier of the
				416	call-site type. For a virtual call site, the type in question is the class
				417	type; for an indirect function call it is the function signature. The
				418	mapping from a type to an identifier is an ABI detail. In the current,
				419	experimental, implementation the identifier of type T is calculated as
				420	follows:
				421
				422	- Obtain the mangled name for "typeinfo name for T".
				423	- Calculate MD5 hash of the name as a string.
				424	- Reinterpret the first 8 bytes of the hash as a little-endian
				425	64-bit integer.
				426
				427	It is possible, but unlikely, that collisions in the
				428	``CallSiteTypeId`` hashing will result in weaker CFI checks that would
				429	still be conservatively correct.
				430
				431	CFI_Check
				432	---------
				433
				434	In the general case, only the target DSO knows whether the call to
				435	function ``f`` with type ``CallSiteTypeId`` is valid or not. To
				436	export this information, every DSO implements
				437
				438	.. code-block:: none
				439
				440	void __cfi_check(uint64 CallSiteTypeId, void *TargetAddr)
				441
				442	This function provides external modules with access to CFI checks for
				443	the targets inside this DSO. For each known ``CallSiteTypeId``, this
				444	functions performs an ``llvm.bitset.test`` with the corresponding bit
				445	set. It aborts if the type is unknown, or if the check fails.
				446
				447	The basic implementation is a large switch statement over all values
				448	of CallSiteTypeId supported by this DSO, and each case is similar to
				449	the InlinedFastCheck() in the basic CFI mode.
				450
				451	CFI Shadow
				452	----------
				453
				454	To route CFI checks to the target DSO's __cfi_check function, a
				455	mapping from possible virtual / indirect call targets to
				456	the corresponding __cfi_check functions is maintained. This mapping is
				457	implemented as a sparse array of 2 bytes for every possible page (4096
				458	bytes) of memory. The table is kept readonly (FIXME: not yet) most of
				459	the time.
				460
				461	There are 3 types of shadow values:
				462
				463	- Address in a CFI-instrumented DSO.
				464	- Unchecked address (a “trusted” non-instrumented DSO). Encoded as
				465	value 0xFFFF.
				466	- Invalid address (everything else). Encoded as value 0.
				467
				468	For a CFI-instrumented DSO, a shadow value encodes the address of the
				469	__cfi_check function for all call targets in the corresponding memory
				470	page. If Addr is the target address, and V is the shadow value, then
				471	the address of __cfi_check is calculated as
				472
				473	.. code-block:: none
				474
				475	__cfi_check = AlignUpTo(Addr, 4096) - (V + 1) * 4096
				476
				477	This works as long as __cfi_check is aligned by 4096 bytes and located
				478	below any call targets in its DSO, but not more than 256MB apart from
				479	them.
				480
				481	CFI_SlowPath
				482	------------
				483
				484	The slow path check is implemented in compiler-rt library as
				485
				486	.. code-block:: none
				487
				488	void __cfi_slowpath(uint64 CallSiteTypeId, void *TargetAddr)
				489
				490	This functions loads a shadow value for ``TargetAddr``, finds the
				491	address of __cfi_check as described above and calls that.
				492
				493	Position-independent executable requirement
				494	-------------------------------------------
				495
				496	Cross-DSO CFI mode requires that the main executable is built as PIE.
				497	In non-PIE executables the address of an external function (taken from
				498	the main executable) is the address of that function’s PLT record in
				499	the main executable. This would break the CFI checks.