mm: provide speculative fault infrastructure
Provide infrastructure to do a speculative fault (not holding
mmap_sem).
The not holding of mmap_sem means we can race against VMA
change/removal and page-table destruction. We use the SRCU VMA freeing
to keep the VMA around. We use the VMA seqcount to detect change
(including umapping / page-table deletion) and we use gup_fast() style
page-table walking to deal with page-table races.
Once we've obtained the page and are ready to update the PTE, we
validate if the state we started the fault with is still valid, if
not, we'll fail the fault with VM_FAULT_RETRY, otherwise we update the
PTE and we're done.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[Manage the newly introduced pte_spinlock() for speculative page
fault to fail if the VMA is touched in our back]
[Rename vma_is_dead() to vma_has_changed() and declare it here]
[Fetch p4d and pud]
[Set vmd.sequence in __handle_mm_fault()]
[Abort speculative path when handle_userfault() has to be called]
[Add additional VMA's flags checks in handle_speculative_fault()]
[Clear FAULT_FLAG_ALLOW_RETRY in handle_speculative_fault()]
[Don't set vmf->pte and vmf->ptl if pte_map_lock() failed]
[Remove warning comment about waiting for !seq&1 since we don't want
to wait]
[Remove warning about no huge page support, mention it explictly]
[Don't call do_fault() in the speculative path as __do_fault() calls
vma->vm_ops->fault() which may want to release mmap_sem]
[Only vm_fault pointer argument for vma_has_changed()]
[Fix check against huge page, calling pmd_trans_huge()]
[Use READ_ONCE() when reading VMA's fields in the speculative path]
[Explicitly check for __HAVE_ARCH_PTE_SPECIAL as we can't support for
processing done in vm_normal_page()]
[Check that vma->anon_vma is already set when starting the speculative
path]
[Check for memory policy as we can't support MPOL_INTERLEAVE case due to
the processing done in mpol_misplaced()]
[Don't support VMA growing up or down]
[Move check on vm_sequence just before calling handle_pte_fault()]
[Don't build SPF services if !CONFIG_SPECULATIVE_PAGE_FAULT]
[Add mem cgroup oom check]
[Use READ_ONCE to access p*d entries]
[Replace deprecated ACCESS_ONCE() by READ_ONCE() in vma_has_changed()]
[Don't fetch pte again in handle_pte_fault() when running the speculative
path]
[Check PMD against concurrent collapsing operation]
[Try spin lock the pte during the speculative path to avoid deadlock with
other CPU's invalidating the TLB and requiring this CPU to catch the
inter processor's interrupt]
[Move define of FAULT_FLAG_SPECULATIVE here]
[Introduce __handle_speculative_fault() and add a check against
mm->mm_users in handle_speculative_fault() defined in mm.h]
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Change-Id: I45cbe79a96c7aa234cace421a36550e7fb0b9368
Patch-mainline: linux-mm @ Tue, 17 Apr 2018 16:33:24
[vinmenon@codeaurora.org: fix !CONFIG_NUMA build,
4.9 porting fixes + checkpatch fixes + changes in the speculative
page fault handler to get rid of p4d page table entries which is not
supported on 4.9 + changes in handle_mm_fault to support fe->orig_pte
that is originally not supported in 4.9 + use uninitialized_var in
handle_pte_fault to fix a spurious warning]
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
5 files changed