KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index 8eb49e0..27a3318 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -26,6 +26,7 @@
#include <linux/mm.h>
#include <linux/highmem.h>
#include <linux/sched.h>
+#include <linux/moduleparam.h>
#include <asm/io.h>
#include <asm/desc.h>
@@ -33,6 +34,9 @@
MODULE_AUTHOR("Qumranet");
MODULE_LICENSE("GPL");
+static int bypass_guest_pf = 1;
+module_param(bypass_guest_pf, bool, 0);
+
struct vmcs {
u32 revision_id;
u32 abort;
@@ -1535,8 +1539,8 @@
}
vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, exec_control);
- vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, 0);
- vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, 0);
+ vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, !!bypass_guest_pf);
+ vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, !!bypass_guest_pf);
vmcs_write32(CR3_TARGET_COUNT, 0); /* 22.2.1 */
vmcs_writel(HOST_CR0, read_cr0()); /* 22.2.3 */
@@ -2582,6 +2586,9 @@
if (r)
goto out1;
+ if (bypass_guest_pf)
+ kvm_mmu_set_nonpresent_ptes(~0xffeull, 0ull);
+
return 0;
out1: