Microarchitectural Data Sampling (MDS) mitigation
=================================================

.. _mds:

Overview
--------

Microarchitectural Data Sampling (MDS) is a family of side channel attacks
on internal buffers in Intel CPUs. The variants are:

 - Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
 - Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130)
 - Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127)
 - Microarchitectural Data Sampling Uncacheable Memory (MDSUM) (CVE-2019-11091)

MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a
dependent load (store-to-load forwarding) as an optimization. The forward
can also happen to a faulting or assisting load operation for a different
memory address, which can be exploited under certain conditions. Store
buffers are partitioned between Hyper-Threads so cross thread forwarding is
not possible. But if a thread enters or exits a sleep state the store
buffer is repartitioned which can expose data from one thread to the other.

MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage
L1 miss situations and to hold data which is returned or sent in response
to a memory or I/O operation. Fill buffers can forward data to a load
operation and also write data to the cache. When the fill buffer is
deallocated it can retain the stale data of the preceding operations which
can then be forwarded to a faulting or assisting load operation, which can
be exploited under certain conditions. Fill buffers are shared between
Hyper-Threads so cross thread leakage is possible.

MLPDS leaks Load Port Data. Load ports are used to perform load operations
from memory or I/O. The received data is then forwarded to the register
file or a subsequent operation. In some implementations the Load Port can
contain stale data from a previous operation which can be forwarded to
faulting or assisting loads under certain conditions, which again can be
exploited eventually. Load ports are shared between Hyper-Threads so cross
thread leakage is possible.

MDSUM is a special case of MSBDS, MFBDS and MLPDS. An uncacheable load from
memory that takes a fault or assist can leave data in a microarchitectural
structure that may later be observed using one of the same methods used by
MSBDS, MFBDS or MLPDS.

Exposure assumptions
--------------------

It is assumed that attack code resides in user space or in a guest with one
exception. The rationale behind this assumption is that the code construct
needed for exploiting MDS requires:

 - to control the load to trigger a fault or assist

 - to have a disclosure gadget which exposes the speculatively accessed
   data for consumption through a side channel.

 - to control the pointer through which the disclosure gadget exposes the
   data

The existence of such a construct in the kernel cannot be excluded with
100% certainty, but the complexity involved makes it extremely unlikely.

There is one exception, which is untrusted BPF. The functionality of
untrusted BPF is limited, but it needs to be thoroughly investigated
whether it can be used to create such a construct.


Mitigation strategy
-------------------

All variants have the same mitigation strategy at least for the single CPU
thread case (SMT off): Force the CPU to clear the affected buffers.

This is achieved by using the otherwise unused and obsolete VERW
instruction in combination with a microcode update. The microcode clears
the affected CPU buffers when the VERW instruction is executed.

For virtualization there are two ways to achieve CPU buffer clearing:
either via the modified VERW instruction or via the L1D Flush command. The
latter is issued when L1TF mitigation is enabled, so the extra VERW can be
avoided. If the CPU is not affected by L1TF then VERW needs to be issued.

If the VERW instruction with the supplied segment selector argument is
executed on a CPU without the microcode update there is no side effect
other than a small number of pointlessly wasted CPU cycles.

This does not protect against cross Hyper-Thread attacks except for MSBDS
which is only exploitable cross Hyper-thread when one of the Hyper-Threads
enters a C-state.

The kernel provides a function to invoke the buffer clearing:

  mds_clear_cpu_buffers()

The mitigation is invoked on kernel/userspace, hypervisor/guest and C-state
(idle) transitions.
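
For illustration, a minimal sketch of that helper, closely following the
kernel's inline implementation (the memory-operand form of VERW is used
because only that form guarantees the buffer clearing; any valid writable
data segment selector works)::

  static inline void mds_clear_cpu_buffers(void)
  {
          static const u16 ds = __KERNEL_DS;

          /* "cc" is clobbered because VERW modifies ZF */
          asm volatile("verw %[ds]" : : [ds] "m" (ds) : "cc");
  }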

As a special quirk to address virtualization scenarios where the host has
the microcode updated, but the hypervisor does not (yet) expose the
MD_CLEAR CPUID bit to guests, the kernel issues the VERW instruction in the
hope that it might actually clear the buffers. The state is reflected
accordingly.

According to current knowledge additional mitigations inside the kernel
itself are not required because the necessary gadgets to expose the leaked
data cannot be controlled in a way which allows exploitation from malicious
user space or VM guests.

Kernel internal mitigation modes
--------------------------------

 ======= ============================================================
 off     Mitigation is disabled. Either the CPU is not affected or
         mds=off is supplied on the kernel command line.

 full    Mitigation is enabled. CPU is affected and MD_CLEAR is
         advertised in CPUID.

 vmwerv  Mitigation is enabled. CPU is affected and MD_CLEAR is not
         advertised in CPUID. That is mainly for virtualization
         scenarios where the host has the updated microcode but the
         hypervisor does not expose MD_CLEAR in CPUID. It's a best
         effort approach without guarantee.
 ======= ============================================================

If the CPU is affected and mds=off is not supplied on the kernel command
line then the kernel selects the appropriate mitigation mode depending on
the availability of the MD_CLEAR CPUID bit.
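
Internally these modes map to an enum in the mitigation selection code; a
sketch matching the table above (the actual definition lives in
arch/x86/kernel/cpu/bugs.c)::

  enum mds_mitigations {
          MDS_MITIGATION_OFF,
          MDS_MITIGATION_FULL,
          MDS_MITIGATION_VMWERV,
  };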

Mitigation points
-----------------

1. Return to user space
^^^^^^^^^^^^^^^^^^^^^^^

When transitioning from kernel to user space the CPU buffers are flushed
on affected CPUs when the mitigation is not disabled on the kernel
command line. The mitigation is enabled through the static key
mds_user_clear.

The mitigation is invoked in prepare_exit_to_usermode() which covers
all but one of the kernel to user space transitions. The exception
is when we return from a Non Maskable Interrupt (NMI), which is
handled directly in do_nmi().

(The reason that NMI is special is that prepare_exit_to_usermode() can
enable IRQs. In NMI context, NMIs are blocked, and we don't want to
enable IRQs with NMIs blocked.)
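
The gate itself is a simple static-key check; a sketch mirroring the
kernel's inline helper::

  static inline void mds_user_clear_cpu_buffers(void)
  {
          if (static_branch_likely(&mds_user_clear))
                  mds_clear_cpu_buffers();
  }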


2. C-State transition
^^^^^^^^^^^^^^^^^^^^^

When a CPU goes idle and enters a C-State the CPU buffers need to be
cleared on affected CPUs when SMT is active. This addresses the
repartitioning of the store buffer when one of the Hyper-Threads enters
a C-State.

When SMT is inactive, i.e. either the CPU does not support it or all
sibling threads are offline, CPU buffer clearing is not required.

The idle clearing is enabled on CPUs which are only affected by MSBDS
and not by any other MDS variant. The other MDS variants cannot be
protected against cross Hyper-Thread attacks because the Fill Buffer and
the Load Ports are shared. So on CPUs affected by other variants, the
idle clearing would be a window dressing exercise and is therefore not
activated.

The invocation is controlled by the static key mds_idle_clear which is
switched depending on the chosen mitigation mode and the SMT state of
the system.
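
A sketch of that switching logic, modeled on the kernel's
update_mds_branch_idle() which runs whenever the SMT state changes::

  static void update_mds_branch_idle(void)
  {
          /* Idle clearing only matters for CPUs affected solely by MSBDS */
          if (!boot_cpu_has_bug(X86_BUG_MSBDS_ONLY))
                  return;

          if (sched_smt_active())
                  static_branch_enable(&mds_idle_clear);
          else
                  static_branch_disable(&mds_idle_clear);
  }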

The buffer clearing is only invoked before entering a C-State, to prevent
stale data from the idling CPU from spilling to the Hyper-Thread sibling
after the store buffer has been repartitioned and all entries have become
available to the non-idle sibling.

When coming out of idle the store buffer is partitioned again so each
sibling has half of it available. The CPU coming back from idle could then
be speculatively exposed to contents of the sibling's store buffer. The
buffers are flushed either on exit to user space or on VMENTER so malicious
code in user space or the guest cannot speculatively access them.

The mitigation is hooked into all variants of halt()/mwait(), but does
not cover the legacy ACPI IO-Port mechanism because the ACPI idle driver
was superseded around 2010 by the intel_idle driver, which is preferred
on all affected CPUs expected to gain the MD_CLEAR functionality in
microcode. Aside from that, the IO-Port mechanism is a legacy interface
which is only used on older systems which are either not affected or do
not receive microcode updates anymore.
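
For illustration, the halt() hook amounts to clearing the buffers right
before the HLT instruction; a sketch following the kernel's
native_safe_halt() of that era::

  static inline void native_safe_halt(void)
  {
          mds_idle_clear_cpu_buffers();
          asm volatile("sti; hlt" : : : "memory");
  }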