L1TF - L1 Terminal Fault
========================

L1 Terminal Fault is a hardware vulnerability which allows unprivileged
speculative access to data which is available in the Level 1 Data Cache
when the page table entry controlling the virtual address, which is used
for the access, has the Present bit cleared or other reserved bits set.

Affected processors
-------------------

This vulnerability affects a wide range of Intel processors. The
vulnerability is not present on:

- Processors from AMD, Centaur and other non-Intel vendors

- Older processor models, where the CPU family is < 6

- A range of Intel ATOM processors (Cedarview, Cloverview, Lincroft,
  Penwell, Pineview, Silvermont, Airmont, Merrifield)

- The Intel XEON PHI family

- Intel processors which have the ARCH_CAP_RDCL_NO bit set in the
  IA32_ARCH_CAPABILITIES MSR. If the bit is set the CPU is not affected
  by the Meltdown vulnerability either. These CPUs should become
  available by end of 2018.

Whether a processor is affected or not can be read out from the L1TF
vulnerability file in sysfs. See :ref:`l1tf_sys_info`.

Related CVEs
------------

The following CVE entries are related to the L1TF vulnerability:

=============  =================  ==============================
CVE-2018-3615  L1 Terminal Fault  SGX related aspects
CVE-2018-3620  L1 Terminal Fault  OS, SMM related aspects
CVE-2018-3646  L1 Terminal Fault  Virtualization related aspects
=============  =================  ==============================

Problem
-------

If an instruction accesses a virtual address for which the relevant page
table entry (PTE) has the Present bit cleared or other reserved bits set,
then speculative execution ignores the invalid PTE and loads the referenced
data if it is present in the Level 1 Data Cache, as if the page referenced
by the address bits in the PTE was still present and accessible.

While this is a purely speculative mechanism and the instruction will raise
a page fault when it is retired eventually, the pure act of loading the
data and making it available to other speculative instructions opens up the
opportunity for side channel attacks by unprivileged malicious code,
similar to the Meltdown attack.

While Meltdown breaks the user space to kernel space protection, L1TF
allows attacking any physical memory address in the system and the attack
works across all protection domains. It allows an attack on SGX and also
works from inside virtual machines because the speculation bypasses the
extended page table (EPT) protection mechanism.

Attack scenarios
----------------

1. Malicious user space
^^^^^^^^^^^^^^^^^^^^^^^

Operating Systems store arbitrary information in the address bits of a
PTE which is marked non present. This allows a malicious user space
application to attack the physical memory to which these PTEs resolve.
In some cases user space can maliciously influence the information
encoded in the address bits of the PTE, thus making attacks more
deterministic and more practical.

The Linux kernel contains a mitigation for this attack vector, PTE
inversion, which is permanently enabled and has no performance
impact. The kernel ensures that the address bits of PTEs which are not
marked present never point to cacheable physical memory space.

A system with an up to date kernel is protected against attacks from
malicious user space applications.

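The effect of PTE inversion can be illustrated with a small sketch. This is
not the kernel's implementation: the bit layout (Present bit at bit 0, 4 KiB
pages, a 46-bit address field) and all function names are assumptions chosen
for illustration. The point is only that flipping the address bits of a
non-present entry turns a low physical address into a very high one, and that
the operation is its own inverse.

```python
PAGE_SHIFT = 12                      # assumed 4 KiB pages
PHYS_ADDR_BITS = 46                  # assumed width of the address field
PRESENT_BIT = 1 << 0                 # assumed position of the Present bit
ADDR_MASK = ((1 << PHYS_ADDR_BITS) - 1) & ~((1 << PAGE_SHIFT) - 1)


def invert_non_present(pte: int) -> int:
    """Flip the address bits of a PTE whose Present bit is clear."""
    if pte & PRESENT_BIT:
        return pte                   # present entries are left untouched
    return pte ^ ADDR_MASK


def address_of(pte: int) -> int:
    """Return the address bits of a PTE value."""
    return pte & ADDR_MASK


if __name__ == "__main__":
    swap_like_pte = 0x12345000       # hypothetical non-present entry
    stored = invert_non_present(swap_like_pte)
    print(hex(address_of(swap_like_pte)), "->", hex(address_of(stored)))
```

Applying ``invert_non_present`` a second time restores the original value, so
the information stored in the entry is not lost by the inversion.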
2. Malicious guest in a virtual machine
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The fact that L1TF breaks all domain protections allows malicious guest
OSes, which can control the PTEs directly, and malicious guest user
space applications, which run on an unprotected guest kernel lacking the
PTE inversion mitigation for L1TF, to attack physical host memory.

A special aspect of L1TF in the context of virtualization is symmetric
multi threading (SMT). The Intel implementation of SMT is called
HyperThreading. The fact that Hyperthreads on the affected processors
share the L1 Data Cache (L1D) is important for this. As the flaw only
allows attacking data which is present in L1D, a malicious guest running
on one Hyperthread can attack the data which is brought into the L1D by
the context which runs on the sibling Hyperthread of the same physical
core. This context can be host OS, host user space or a different guest.

If the processor does not support Extended Page Tables, the attack is
only possible when the hypervisor does not sanitize the content of the
effective (shadow) page tables.

While solutions exist to mitigate these attack vectors fully, these
mitigations are not enabled by default in the Linux kernel because they
can affect performance significantly. The kernel provides several
mechanisms which can be utilized to address the problem depending on the
deployment scenario. The mitigations, their protection scope and impact
are described in the next sections.

The default mitigations and the rationale for choosing them are explained
at the end of this document. See :ref:`default_mitigations`.

.. _l1tf_sys_info:

L1TF system information
-----------------------

The Linux kernel provides a sysfs interface to enumerate the current L1TF
status of the system: whether the system is vulnerable, and which
mitigations are active. The relevant sysfs file is:

  /sys/devices/system/cpu/vulnerabilities/l1tf

The possible values in this file are:

===========================  ===============================
'Not affected'               The processor is not vulnerable
'Mitigation: PTE Inversion'  The host protection is active
===========================  ===============================

If KVM/VMX is enabled and the processor is vulnerable then the following
information is appended to the 'Mitigation: PTE Inversion' part:

- SMT status:

  =====================  ================
  'VMX: SMT vulnerable'  SMT is enabled
  'VMX: SMT disabled'    SMT is disabled
  =====================  ================

- L1D Flush mode:

  ================================  ====================================
  'L1D vulnerable'                  L1D flushing is disabled

  'L1D conditional cache flushes'   L1D flush is conditionally enabled

  'L1D cache flushes'               L1D flush is unconditionally enabled
  ================================  ====================================

The resulting grade of protection is discussed in the following sections.

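A monitoring script could classify the content of this file by searching for
the phrases listed above. The following is a minimal sketch; the function
names and the result dictionary are hypothetical, and because the exact way
the kernel joins the parts into one line is not specified here, the code
matches substrings instead of relying on separators.

```python
from pathlib import Path

L1TF_PATH = Path("/sys/devices/system/cpu/vulnerabilities/l1tf")


def summarize_l1tf(text: str) -> dict:
    """Classify an l1tf status string by looking for the documented phrases."""
    text = text.strip()
    summary = {"affected": None, "pte_inversion": False,
               "smt": None, "l1d_flush": None}
    if text == "Not affected":
        summary["affected"] = False
        return summary
    if "PTE Inversion" in text:
        summary["affected"] = True
        summary["pte_inversion"] = True
    if "SMT vulnerable" in text:
        summary["smt"] = "enabled"
    elif "SMT disabled" in text:
        summary["smt"] = "disabled"
    # Check the longer phrase first because it contains the shorter one.
    if "conditional cache flushes" in text:
        summary["l1d_flush"] = "conditional"
    elif "cache flushes" in text:
        summary["l1d_flush"] = "unconditional"
    elif "L1D vulnerable" in text:
        summary["l1d_flush"] = "disabled"
    return summary


def read_l1tf(path: Path = L1TF_PATH) -> dict:
    """Read and classify the sysfs file (path as documented above)."""
    return summarize_l1tf(path.read_text())
```

Fields that cannot be determined from the string stay ``None``, which lets a
caller distinguish "not reported" from "disabled".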
Host mitigation mechanism
-------------------------

The kernel is unconditionally protected against L1TF attacks from malicious
user space running on the host.

Guest mitigation mechanisms
---------------------------

.. _l1d_flush:

1. L1D flush on VMENTER
^^^^^^^^^^^^^^^^^^^^^^^

To make sure that a guest cannot attack data which is present in the L1D
the hypervisor flushes the L1D before entering the guest.

Flushing the L1D evicts not only the data which should not be accessed
by a potentially malicious guest, it also flushes the guest
data. Flushing the L1D has a performance impact as the processor has to
bring the flushed guest data back into the L1D. Depending on the
frequency of VMEXIT/VMENTER and the type of computations in the guest,
performance degradation in the range of 1% to 50% has been observed. For
scenarios where guest VMEXIT/VMENTER are rare the performance impact is
minimal. Virtio and mechanisms like posted interrupts are designed to
confine the VMEXITs to a bare minimum, but specific configurations and
application scenarios might still suffer from a high VMEXIT rate.

The kernel provides two L1D flush modes:

- conditional ('cond')
- unconditional ('always')

The conditional mode avoids L1D flushing after VMEXITs which execute
only audited code paths before the corresponding VMENTER. These code
paths have been verified not to expose secrets or other interesting
data to an attacker, but they can leak information about the address
space layout of the hypervisor.

Unconditional mode flushes L1D on all VMENTER invocations and provides
maximum protection. It has a higher overhead than the conditional
mode. The overhead cannot be quantified in general as it depends on the
workload scenario and the resulting number of VMEXITs.

The general recommendation is to enable L1D flush on VMENTER. The kernel
defaults to conditional mode on affected processors.

Note that L1D flush does not prevent the SMT problem because the
sibling thread will also bring back its data into the L1D which makes it
attackable again.

L1D flush can be controlled by the administrator via the kernel command
line and sysfs control files. See :ref:`mitigation_control_command_line`
and :ref:`mitigation_control_kvm`.

.. _guest_confinement:

2. Guest VCPU confinement to dedicated physical cores
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To address the SMT problem, it is possible to make a guest or a group of
guests affine to one or more physical cores. The proper mechanism for
that is to utilize exclusive cpusets to ensure that no other guest or
host tasks can run on these cores.

If only a single guest or related guests run on sibling SMT threads on
the same physical core then they can only attack their own memory and
restricted parts of the host memory.

Host memory is attackable when one of the sibling SMT threads runs in
host OS (hypervisor) context and the other in guest context. The amount
of valuable information from the host OS context depends on the context
in which the host OS executes, i.e. interrupts, soft interrupts and kernel
threads. The amount of valuable data from these contexts cannot be
declared as non-interesting for an attacker without deep inspection of
the code.

Note that assigning guests to a fixed set of physical cores affects
the ability of the scheduler to do load balancing and might have
negative effects on CPU utilization depending on the hosting
scenario. Disabling SMT might be a viable alternative for particular
scenarios.

For further information about confining guests to a single or to a group
of cores consult the cpusets documentation:

https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.txt

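Confining a guest to whole physical cores requires knowing which logical CPUs
are SMT siblings. The sketch below groups logical CPUs by core from per-CPU
sibling lists. It assumes the ``topology/thread_siblings_list`` sysfs file,
which is not described in this document, and all function names are
hypothetical. Creating the exclusive cpuset itself is left out.

```python
from __future__ import annotations


def parse_cpu_list(text: str) -> list[int]:
    """Parse a kernel CPU list such as "0,4" or "0-1" into CPU numbers."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def group_by_core(siblings: dict[int, str]) -> list[tuple[int, ...]]:
    """Group logical CPUs into physical cores from per-CPU sibling lists."""
    cores = {tuple(sorted(parse_cpu_list(text))) for text in siblings.values()}
    return sorted(cores)


def read_siblings(cpu: int) -> str:
    """Read the sibling list of one logical CPU (assumed sysfs location)."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    with open(path) as handle:
        return handle.read()
```

A guest would then be assigned all logical CPUs of one or more of the returned
tuples, so that no sibling thread of a guest core runs unrelated tasks.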
.. _interrupt_isolation:

3. Interrupt affinity
^^^^^^^^^^^^^^^^^^^^^

Interrupts can be made affine to logical CPUs. This is not universally
true because there are types of interrupts which are truly per CPU
interrupts, e.g. the local timer interrupt. Aside from that, multi queue
devices affine their interrupts to single CPUs or groups of CPUs per
queue without allowing the administrator to control the affinities.

Moving the interrupts which can be affinity controlled away from CPUs
which run untrusted guests reduces the attack vector space.

Whether the interrupts which are affine to CPUs that run untrusted
guests provide interesting data for an attacker depends on the system
configuration and the scenarios which run on the system. While for some
of the interrupts it can be assumed that they won't expose interesting
information beyond exposing hints about the host OS memory layout, there
is no way to make general assumptions.

Interrupt affinity can be controlled by the administrator via the
/proc/irq/$NR/smp_affinity[_list] files. Limited documentation is
available at:

https://www.kernel.org/doc/Documentation/IRQ-affinity.txt

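As an illustration, the value to write into a ``smp_affinity_list`` file can
be computed by removing the guest CPUs from the set of all CPUs. This is a
sketch under assumptions: the helper names are hypothetical, the CPU numbers
in the usage note are made up, and the kernel may still reject the write for
interrupts whose affinity cannot be changed.

```python
def format_cpu_list(cpus) -> str:
    """Format CPU numbers as a range list, e.g. {0, 1, 4, 5, 6} -> "0-1,4-6"."""
    ranges = []
    for cpu in sorted(set(cpus)):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu          # extend the current range
        else:
            ranges.append([cpu, cpu])    # start a new range
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def host_affinity(all_cpus, guest_cpus) -> str:
    """CPU list for interrupts that should stay off the guest CPUs."""
    remaining = set(all_cpus) - set(guest_cpus)
    if not remaining:
        raise ValueError("no CPU left for host interrupts")
    return format_cpu_list(remaining)


def set_irq_affinity(irq: int, cpu_list: str) -> None:
    """Write the list to /proc/irq/<irq>/smp_affinity_list (requires root)."""
    with open(f"/proc/irq/{irq}/smp_affinity_list", "w") as handle:
        handle.write(cpu_list)
```

For example, on a hypothetical 8 CPU host with guests confined to CPUs 2 and
3, ``host_affinity(range(8), {2, 3})`` yields ``0-1,4-7``.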
.. _smt_control:

4. SMT control
^^^^^^^^^^^^^^

To prevent the SMT issues of L1TF it might be necessary to disable SMT
completely. Disabling SMT can have a significant performance impact, but
the impact depends on the hosting scenario and the type of workloads.
The impact of disabling SMT also needs to be weighed against the impact
of other mitigation solutions like confining guests to dedicated cores.

The kernel provides a sysfs interface to retrieve the status of SMT and
to control it. It also provides a kernel command line interface to
control SMT.

The kernel command line interface consists of the following options:

===========  ============================================================
nosmt        Affects the bring up of the secondary CPUs during boot. The
             kernel tries to bring all present CPUs online during the
             boot process. "nosmt" makes sure that from each physical
             core only one, the so called primary (hyper) thread, is
             activated. Due to a design flaw of Intel processors related
             to Machine Check Exceptions the non primary siblings have
             to be brought up at least partially and are then shut down
             again. "nosmt" can be undone via the sysfs interface.

nosmt=force  Has the same effect as "nosmt" but it does not allow
             undoing the SMT disable via the sysfs interface.
===========  ============================================================

The sysfs interface provides two files:

- /sys/devices/system/cpu/smt/control
- /sys/devices/system/cpu/smt/active

/sys/devices/system/cpu/smt/control:

  This file allows reading out the SMT control state and provides the
  ability to disable or (re)enable SMT. The possible states are:

  ==============  ===================================================
  on              SMT is supported by the CPU and enabled. All
                  logical CPUs can be onlined and offlined without
                  restrictions.

  off             SMT is supported by the CPU and disabled. Only
                  the so called primary SMT threads can be onlined
                  and offlined without restrictions. An attempt to
                  online a non-primary sibling is rejected.

  forceoff        Same as 'off' but the state cannot be controlled.
                  Attempts to write to the control file are rejected.

  notsupported    The processor does not support SMT. It's therefore
                  not affected by the SMT implications of L1TF.
                  Attempts to write to the control file are rejected.
  ==============  ===================================================

  The possible states which can be written into this file to control SMT
  state are:

  - on
  - off
  - forceoff

/sys/devices/system/cpu/smt/active:

  This file reports whether SMT is enabled and active, i.e. if on any
  physical core two or more sibling threads are online.

SMT control is also possible at boot time via the l1tf kernel command
line parameter in combination with L1D flush control. See
:ref:`mitigation_control_command_line`.

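The rules for the control file can be mirrored in a small helper that checks a
requested change before attempting the write. This is a sketch with
hypothetical function names; it also assumes that the ``active`` file reports
``1`` when SMT is active, which this document does not state.

```python
from pathlib import Path

SMT_DIR = Path("/sys/devices/system/cpu/smt")
WRITABLE_VALUES = {"on", "off", "forceoff"}
LOCKED_STATES = {"forceoff", "notsupported"}


def check_smt_write(current: str, requested: str) -> None:
    """Raise ValueError if the documented rules say the write is rejected."""
    if requested not in WRITABLE_VALUES:
        raise ValueError(f"{requested!r} cannot be written to the control file")
    if current in LOCKED_STATES:
        raise ValueError(f"control state {current!r} rejects all writes")


def set_smt(requested: str, smt_dir: Path = SMT_DIR) -> None:
    """Validate against the current state, then write the new state."""
    control = smt_dir / "control"
    check_smt_write(control.read_text().strip(), requested)
    control.write_text(requested)


def smt_active(smt_dir: Path = SMT_DIR) -> bool:
    """Assumption: the 'active' file contains "1" when SMT is active."""
    return (smt_dir / "active").read_text().strip() == "1"
```

Passing a different ``smt_dir`` makes it possible to exercise the logic against
ordinary files without touching the running system.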
5. Disabling EPT
^^^^^^^^^^^^^^^^

Disabling EPT for virtual machines provides full mitigation for L1TF even
with SMT enabled, because the effective page tables for guests are
managed and sanitized by the hypervisor. However, disabling EPT has a
significant performance impact, especially when the Meltdown mitigation
KPTI is enabled.

EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.

There is ongoing research and development for new mitigation mechanisms to
address the performance impact of disabling SMT or EPT.

.. _mitigation_control_command_line:

Mitigation control on the kernel command line
---------------------------------------------

The kernel command line allows controlling the L1TF mitigations at boot
time with the option "l1tf=". The valid arguments for this option are:

============  =============================================================
full          Provides all available mitigations for the L1TF
              vulnerability. Disables SMT and enables all mitigations in
              the hypervisors, i.e. unconditional L1D flushing.

              SMT control and L1D flush control via the sysfs interface
              is still possible after boot. Hypervisors will issue a
              warning when the first VM is started in a potentially
              insecure configuration, i.e. SMT enabled or L1D flush
              disabled.

full,force    Same as 'full', but disables SMT and L1D flush runtime
              control. Implies the 'nosmt=force' command line option.
              (i.e. sysfs control of SMT is disabled.)

flush         Leaves SMT enabled and enables the default hypervisor
              mitigation, i.e. conditional L1D flushing.

              SMT control and L1D flush control via the sysfs interface
              is still possible after boot. Hypervisors will issue a
              warning when the first VM is started in a potentially
              insecure configuration, i.e. SMT enabled or L1D flush
              disabled.

flush,nosmt   Disables SMT and enables the default hypervisor mitigation,
              i.e. conditional L1D flushing.

              SMT control and L1D flush control via the sysfs interface
              is still possible after boot. Hypervisors will issue a
              warning when the first VM is started in a potentially
              insecure configuration, i.e. SMT enabled or L1D flush
              disabled.

flush,nowarn  Same as 'flush', but hypervisors will not warn when a VM is
              started in a potentially insecure configuration.

off           Disables hypervisor mitigations and doesn't emit any
              warnings.
              It also drops the swap size and available RAM limit
              restrictions on both hypervisor and bare metal.
============  =============================================================

The default is 'flush'. For details about L1D flushing see :ref:`l1d_flush`.

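The table above can be condensed into a lookup keyed by the option value. The
sketch below extracts the ``l1tf=`` value from a command line string and
returns the documented effect. The type and function names are hypothetical.
Treating the last occurrence as the effective one, raising an error for
unknown values, and describing 'off' with the flush mode name 'never' are
assumptions of this sketch rather than statements about kernel behaviour.

```python
from typing import NamedTuple


class L1tfPolicy(NamedTuple):
    disables_smt: bool
    l1d_flush: str      # 'always', 'cond' or 'never' (see lead-in)
    warns: bool


POLICIES = {
    "full":         L1tfPolicy(True,  "always", True),
    "full,force":   L1tfPolicy(True,  "always", True),
    "flush":        L1tfPolicy(False, "cond",   True),
    "flush,nosmt":  L1tfPolicy(True,  "cond",   True),
    "flush,nowarn": L1tfPolicy(False, "cond",   False),
    "off":          L1tfPolicy(False, "never",  False),
}
DEFAULT_OPTION = "flush"
NO_RUNTIME_CONTROL = {"full,force"}   # sysfs control disabled after boot


def l1tf_option(cmdline: str) -> str:
    """Return the l1tf= value from a command line, or the default."""
    value = DEFAULT_OPTION
    for word in cmdline.split():
        if word.startswith("l1tf="):
            value = word[len("l1tf="):]   # assumption: last occurrence wins
    return value


def policy_for(cmdline: str) -> L1tfPolicy:
    """Look up the documented effect of the l1tf= option in a command line."""
    option = l1tf_option(cmdline)
    if option not in POLICIES:
        raise ValueError(f"unknown l1tf option: {option!r}")
    return POLICIES[option]
```

A typical input would be the content of ``/proc/cmdline``.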
.. _mitigation_control_kvm:

Mitigation control for KVM - module parameter
-------------------------------------------------------------

The KVM hypervisor mitigation mechanism, flushing the L1D cache when
entering a guest, can be controlled with a module parameter.

The option/parameter is "kvm-intel.vmentry_l1d_flush=". It takes the
following arguments:

============  ==============================================================
always        L1D cache flush on every VMENTER.

cond          Flush L1D on VMENTER only when the code between VMEXIT and
              VMENTER can leak host memory which is considered
              interesting for an attacker. This still can leak host memory
              which allows e.g. determining the host's address space layout.

never         Disables the mitigation.
============  ==============================================================

The parameter can be provided on the kernel command line, as a module
parameter when loading the modules, and modified at runtime via the sysfs
file:

  /sys/module/kvm_intel/parameters/vmentry_l1d_flush

The default is 'cond'. If 'l1tf=full,force' is given on the kernel command
line, then 'always' is enforced, the kvm-intel.vmentry_l1d_flush
module parameter is ignored and writes to the sysfs file are rejected.

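A runtime change through the sysfs file could look like the following sketch.
The function names are hypothetical. The write is validated against the three
documented arguments only, and the read helper returns whatever the file
contains without interpreting it, because the values reported on a given
kernel are not specified here.

```python
from pathlib import Path

PARAM = Path("/sys/module/kvm_intel/parameters/vmentry_l1d_flush")
MODES = ("always", "cond", "never")


def read_flush_mode(path: Path = PARAM) -> str:
    """Return the current parameter value as reported by the file."""
    return path.read_text().strip()


def write_flush_mode(mode: str, path: Path = PARAM) -> None:
    """Write one of the documented modes; requires root on a real system."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    # An OSError here would indicate that the kernel rejected the write,
    # e.g. when 'l1tf=full,force' was given on the command line.
    path.write_text(mode)
```

Both helpers accept an alternative path so that they can be tried on a regular
file first.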
.. _mitigation_selection:

Mitigation selection guide
--------------------------

1. No virtualization in use
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The system is protected by the kernel unconditionally and no further
action is required.

2. Virtualization with trusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the guest comes from a trusted source and the guest OS kernel is
guaranteed to have the L1TF mitigations in place the system is fully
protected against L1TF and no further action is required.

To avoid the overhead of the default L1D flushing on VMENTER the
administrator can disable the flushing via the kernel command line and
sysfs control files. See :ref:`mitigation_control_command_line` and
:ref:`mitigation_control_kvm`.

3. Virtualization with untrusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

3.1. SMT not supported or disabled
""""""""""""""""""""""""""""""""""

If SMT is not supported by the processor or disabled in the BIOS or by
the kernel, it's only required to enforce L1D flushing on VMENTER.

Conditional L1D flushing is the default behaviour and can be tuned. See
:ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.

3.2. EPT not supported or disabled
""""""""""""""""""""""""""""""""""

If EPT is not supported by the processor or disabled in the hypervisor,
the system is fully protected. SMT can stay enabled and L1D flushing on
VMENTER is not required.

EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.

3.3. SMT and EPT supported and active
"""""""""""""""""""""""""""""""""""""

If SMT and EPT are supported and active then various degrees of
mitigations can be employed:

- L1D flushing on VMENTER:

  L1D flushing on VMENTER is the minimal protection requirement, but it
  is only potent in combination with other mitigation methods.

  Conditional L1D flushing is the default behaviour and can be tuned. See
  :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.

- Guest confinement:

  Confinement of guests to a single or a group of physical cores which
  are not running any other processes can reduce the attack surface
  significantly, but interrupts, soft interrupts and kernel threads can
  still expose valuable data to a potential attacker. See
  :ref:`guest_confinement`.

- Interrupt isolation:

  Isolating the guest CPUs from interrupts can reduce the attack surface
  further, but still allows a malicious guest to explore a limited amount
  of host physical memory. This can at least be used to gain knowledge
  about the host address space layout. The interrupts which have a fixed
  affinity to the CPUs which run the untrusted guests can, depending on
  the scenario, still trigger soft interrupts and schedule kernel threads
  which might expose valuable information. See
  :ref:`interrupt_isolation`.

The above three mitigation methods combined can provide protection to a
certain degree, but the risk of the remaining attack surface has to be
carefully analyzed. For full protection the following methods are
available:

- Disabling SMT:

  Disabling SMT and enforcing the L1D flushing provides the maximum
  amount of protection. This mitigation does not depend on any of the
  above mitigation methods.

  SMT control and L1D flushing can be tuned by the command line
  parameters 'nosmt', 'l1tf', 'kvm-intel.vmentry_l1d_flush' and at run
  time with the matching sysfs control files. See :ref:`smt_control`,
  :ref:`mitigation_control_command_line` and
  :ref:`mitigation_control_kvm`.

- Disabling EPT:

  Disabling EPT provides the maximum amount of protection as well. It
  does not depend on any of the above mitigation methods. SMT can stay
  enabled and L1D flushing is not required, but the performance impact is
  significant.

  EPT can be disabled in the hypervisor via the 'kvm-intel.ept'
  parameter.

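The decision steps of sections 1 to 3.3 can be summarized as a small function.
This is only a reading aid with hypothetical names, not a replacement for the
risk analysis described above. ``guests_trusted`` stands for the combined
condition of section 2: a trusted source and a guest kernel which has the L1TF
mitigations in place.

```python
from __future__ import annotations


def recommend(virtualization: bool, guests_trusted: bool,
              smt_active: bool, ept_active: bool) -> list[str]:
    """Return the guide's recommendations for one host configuration."""
    if not virtualization:
        return ["no further action required"]
    if guests_trusted:
        return ["no further action required",
                "L1D flushing on VMENTER may be disabled to avoid overhead"]
    if not ept_active:
        return ["fully protected: SMT can stay enabled, "
                "L1D flushing on VMENTER is not required"]
    if not smt_active:
        return ["enforce L1D flushing on VMENTER"]
    # SMT and EPT both active with untrusted guests (section 3.3).
    return ["L1D flushing on VMENTER (minimal requirement)",
            "guest confinement to dedicated physical cores",
            "interrupt isolation",
            "for full protection: disable SMT and enforce L1D flushing, "
            "or disable EPT"]
```

The order of the checks follows the guide: a disabled EPT already gives full
protection, so it is tested before the SMT state.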
3.4. Nested virtual machines
""""""""""""""""""""""""""""

When nested virtualization is in use, three operating systems are involved:
the bare metal hypervisor, the nested hypervisor and the nested virtual
machine. VMENTER operations from the nested hypervisor into the nested
guest will always be processed by the bare metal hypervisor. If KVM is the
bare metal hypervisor it will:

- Flush the L1D cache on every switch from the nested hypervisor to the
  nested virtual machine, so that the nested hypervisor's secrets are not
  exposed to the nested virtual machine;

- Flush the L1D cache on every switch from the nested virtual machine to
  the nested hypervisor; this is a complex operation, and flushing the L1D
  cache prevents the bare metal hypervisor's secrets from being exposed to
  the nested virtual machine;

- Instruct the nested hypervisor to not perform any L1D cache flush. This
  is an optimization to avoid double L1D flushing.

.. _default_mitigations:

Default mitigations
-------------------

The kernel default mitigations for vulnerable processors are:

- PTE inversion to protect against malicious user space. This is done
  unconditionally and cannot be controlled. The swap storage is limited
  to ~16TB.

- L1D conditional flushing on VMENTER when EPT is enabled for
  a guest.

The kernel does not by default enforce the disabling of SMT, which leaves
SMT systems vulnerable when running untrusted guests with EPT enabled.

The rationale for this choice is:

- Force disabling SMT can break existing setups, especially with
  unattended updates.

- If regular users run untrusted guests on their machine, then L1TF is
  just an add on to other malware which might be embedded in an untrusted
  guest, e.g. spam-bots or attacks on the local network.

  There is no technical way to prevent a user from running untrusted code
  on their machines blindly.

- It's technically extremely unlikely and from today's knowledge even
  impossible that L1TF can be exploited via the most popular attack
  mechanisms like JavaScript because these mechanisms have no way to
  control PTEs. If this were possible and no other mitigation were
  available, then the default might be different.

- The administrators of cloud and hosting setups have to carefully
  analyze the risk for their scenarios and make the appropriate
  mitigation choices, which might even vary across their deployed
  machines and also result in other changes of their overall setup.
  There is no way for the kernel to provide a sensible default for
  these kinds of scenarios.