Microarchitectural Data Sampling (MDS) mitigation
=================================================

.. _mds:

Overview
--------

Microarchitectural Data Sampling (MDS) is a family of side channel attacks
on internal buffers in Intel CPUs. The variants are:

 - Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
 - Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130)
 - Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127)
 - Microarchitectural Data Sampling Uncacheable Memory (MDSUM) (CVE-2019-11091)

MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a
dependent load (store-to-load forwarding) as an optimization. The forward
can also happen to a faulting or assisting load operation for a different
memory address, which can be exploited under certain conditions. Store
buffers are partitioned between Hyper-Threads so cross thread forwarding is
not possible. But if a thread enters or exits a sleep state the store
buffer is repartitioned which can expose data from one thread to the other.

MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage
L1 miss situations and to hold data which is returned or sent in response
to a memory or I/O operation. Fill buffers can forward data to a load
operation and also write data to the cache. When the fill buffer is
deallocated it can retain the stale data of the preceding operations which
can then be forwarded to a faulting or assisting load operation, which can
be exploited under certain conditions. Fill buffers are shared between
Hyper-Threads so cross thread leakage is possible.

MLPDS leaks Load Port Data. Load ports are used to perform load operations
from memory or I/O. The received data is then forwarded to the register
file or a subsequent operation. In some implementations the Load Port can
contain stale data from a previous operation which can be forwarded to
faulting or assisting loads under certain conditions, which again can be
exploited eventually. Load ports are shared between Hyper-Threads so cross
thread leakage is possible.

MDSUM is a special case of MSBDS, MFBDS and MLPDS. An uncacheable load from
memory that takes a fault or assist can leave data in a microarchitectural
structure that may later be observed using one of the same methods used by
MSBDS, MFBDS or MLPDS.

Exposure assumptions
--------------------

It is assumed that attack code resides in user space or in a guest with one
exception. The rationale behind this assumption is that the code construct
needed for exploiting MDS requires:

 - to control the load to trigger a fault or assist

 - to have a disclosure gadget which exposes the speculatively accessed
   data for consumption through a side channel.

 - to control the pointer through which the disclosure gadget exposes the
   data

The existence of such a construct in the kernel cannot be excluded with
100% certainty, but the complexity involved makes it extremely unlikely.

There is one exception, which is untrusted BPF. The functionality of
untrusted BPF is limited, but it needs to be thoroughly investigated
whether it can be used to create such a construct.


Mitigation strategy
-------------------

All variants have the same mitigation strategy at least for the single CPU
thread case (SMT off): Force the CPU to clear the affected buffers.

This is achieved by using the otherwise unused and obsolete VERW
instruction in combination with a microcode update. The microcode clears
the affected CPU buffers when the VERW instruction is executed.

For virtualization there are two ways to achieve CPU buffer clearing:
either via the modified VERW instruction or via the L1D Flush command. The
latter is issued when L1TF mitigation is enabled, so the extra VERW can be
avoided. If the CPU is not affected by L1TF then VERW needs to be issued.

If the VERW instruction with the supplied segment selector argument is
executed on a CPU without the microcode update there is no side effect
other than a small number of pointlessly wasted CPU cycles.

This does not protect against cross Hyper-Thread attacks except for MSBDS
which is only exploitable cross Hyper-thread when one of the Hyper-Threads
enters a C-state.

The kernel provides a function to invoke the buffer clearing:

  mds_clear_cpu_buffers()

The mitigation is invoked on kernel/userspace, hypervisor/guest and C-state
(idle) transitions.
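
For illustration, a minimal sketch of that helper, closely following the
kernel's inline implementation (the memory-operand form of VERW is used
because only that form guarantees the buffer clearing; any valid writable
data segment selector works)::

  static inline void mds_clear_cpu_buffers(void)
  {
          static const u16 ds = __KERNEL_DS;

          /* "cc" is clobbered because VERW modifies ZF */
          asm volatile("verw %[ds]" : : [ds] "m" (ds) : "cc");
  }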

As a special quirk to address virtualization scenarios where the host has
the microcode updated, but the hypervisor does not (yet) expose the
MD_CLEAR CPUID bit to guests, the kernel issues the VERW instruction in the
hope that it might actually clear the buffers. The state is reflected
accordingly.

According to current knowledge additional mitigations inside the kernel
itself are not required because the necessary gadgets to expose the leaked
data cannot be controlled in a way which allows exploitation from malicious
user space or VM guests.

Kernel internal mitigation modes
--------------------------------

 ======= ============================================================
 off     Mitigation is disabled. Either the CPU is not affected or
         mds=off is supplied on the kernel command line.

 full    Mitigation is enabled. CPU is affected and MD_CLEAR is
         advertised in CPUID.

 vmwerv  Mitigation is enabled. CPU is affected and MD_CLEAR is not
         advertised in CPUID. That is mainly for virtualization
         scenarios where the host has the updated microcode but the
         hypervisor does not expose MD_CLEAR in CPUID. It's a best
         effort approach without guarantee.
 ======= ============================================================

If the CPU is affected and mds=off is not supplied on the kernel command
line then the kernel selects the appropriate mitigation mode depending on
the availability of the MD_CLEAR CPUID bit.
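
Internally these modes map to an enum in the mitigation selection code; a
sketch matching the table above (the actual definition lives in
arch/x86/kernel/cpu/bugs.c)::

  enum mds_mitigations {
          MDS_MITIGATION_OFF,
          MDS_MITIGATION_FULL,
          MDS_MITIGATION_VMWERV,
  };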

Mitigation points
-----------------

1. Return to user space
^^^^^^^^^^^^^^^^^^^^^^^

When transitioning from kernel to user space the CPU buffers are flushed
on affected CPUs when the mitigation is not disabled on the kernel
command line. The mitigation is enabled through the static key
mds_user_clear.

The mitigation is invoked in prepare_exit_to_usermode() which covers
all but one of the kernel to user space transitions. The exception
is when we return from a Non Maskable Interrupt (NMI), which is
handled directly in do_nmi().

(The reason that NMI is special is that prepare_exit_to_usermode() can
enable IRQs. In NMI context, NMIs are blocked, and we don't want to
enable IRQs with NMIs blocked.)
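
The gate itself is a simple static-key check; a sketch mirroring the
kernel's inline helper::

  static inline void mds_user_clear_cpu_buffers(void)
  {
          if (static_branch_likely(&mds_user_clear))
                  mds_clear_cpu_buffers();
  }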


2. C-State transition
^^^^^^^^^^^^^^^^^^^^^

When a CPU goes idle and enters a C-State the CPU buffers need to be
cleared on affected CPUs when SMT is active. This addresses the
repartitioning of the store buffer when one of the Hyper-Threads enters
a C-State.

When SMT is inactive, i.e. either the CPU does not support it or all
sibling threads are offline, CPU buffer clearing is not required.

The idle clearing is enabled on CPUs which are only affected by MSBDS
and not by any other MDS variant. The other MDS variants cannot be
protected against cross Hyper-Thread attacks because the Fill Buffer and
the Load Ports are shared. So on CPUs affected by other variants, the
idle clearing would be a window dressing exercise and is therefore not
activated.

The invocation is controlled by the static key mds_idle_clear which is
switched depending on the chosen mitigation mode and the SMT state of
the system.
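
A sketch of that switching logic, modeled on the kernel's
update_mds_branch_idle() which runs whenever the SMT state changes::

  static void update_mds_branch_idle(void)
  {
          /* Idle clearing only matters for CPUs affected solely by MSBDS */
          if (!boot_cpu_has_bug(X86_BUG_MSBDS_ONLY))
                  return;

          if (sched_smt_active())
                  static_branch_enable(&mds_idle_clear);
          else
                  static_branch_disable(&mds_idle_clear);
  }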

The buffer clearing is only invoked before entering a C-State, to prevent
stale data from the idling CPU from spilling to the Hyper-Thread sibling
after the store buffer has been repartitioned and all entries have become
available to the non-idle sibling.

When coming out of idle the store buffer is partitioned again so each
sibling has half of it available. The CPU coming back from idle could then
be speculatively exposed to contents of the sibling's store buffer. The
buffers are flushed either on exit to user space or on VMENTER so malicious
code in user space or the guest cannot speculatively access them.

The mitigation is hooked into all variants of halt()/mwait(), but does
not cover the legacy ACPI IO-Port mechanism because the ACPI idle driver
was superseded around 2010 by the intel_idle driver, which is preferred
on all affected CPUs expected to gain the MD_CLEAR functionality in
microcode. Aside from that, the IO-Port mechanism is a legacy interface
which is only used on older systems which are either not affected or do
not receive microcode updates anymore.
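
For illustration, the halt() hook amounts to clearing the buffers right
before the HLT instruction; a sketch following the kernel's
native_safe_halt() of that era::

  static inline void native_safe_halt(void)
  {
          mds_idle_clear_cpu_buffers();
          asm volatile("sti; hlt" : : : "memory");
  }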