IB/core: Add support for on demand paging regions
* Extend the umem struct to keep the ODP related data.
* Allocate and initialize the ODP related information in the umem
(page_list, dma_list) and freeing as needed in the end of the run.
* Store a reference to the process PID struct in the ucontext. Used to
safely obtain the task_struct and the mm during fault handling,
without preventing the task destruction if needed.
* Add 2 helper functions: ib_umem_odp_map_dma_pages and
ib_umem_odp_unmap_dma_pages. These functions get the DMA addresses
of specific pages of the umem (and, currently, pin them).
* Support for page faults only - IB core will keep the reference on
the pages used and call put_page when freeing an ODP umem
area. Invalidations support will be added in a later patch.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index c328e46..5baceb7 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -39,6 +39,7 @@
#include <linux/hugetlb.h>
#include <linux/dma-attrs.h>
#include <linux/slab.h>
+#include <rdma/ib_umem_odp.h>
#include "uverbs.h"
@@ -69,6 +70,10 @@
/**
* ib_umem_get - Pin and DMA map userspace memory.
+ *
+ * If access flags indicate ODP memory, avoid pinning. Instead, stores
+ * the mm for future page fault handling.
+ *
* @context: userspace context to pin memory for
* @addr: userspace virtual address to start at
* @size: length of region to pin
@@ -117,6 +122,17 @@
(IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE |
IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
+ if (access & IB_ACCESS_ON_DEMAND) {
+ ret = ib_umem_odp_get(context, umem);
+ if (ret) {
+ kfree(umem);
+ return ERR_PTR(ret);
+ }
+ return umem;
+ }
+
+ umem->odp_data = NULL;
+
/* We assume the memory is from hugetlb until proved otherwise */
umem->hugetlb = 1;
@@ -237,6 +253,11 @@
struct task_struct *task;
unsigned long diff;
+ if (umem->odp_data) {
+ ib_umem_odp_release(umem);
+ return;
+ }
+
__ib_umem_release(umem->context->device, umem, 1);
task = get_pid_task(umem->pid, PIDTYPE_PID);
@@ -285,6 +306,9 @@
int n;
struct scatterlist *sg;
+ if (umem->odp_data)
+ return ib_umem_num_pages(umem);
+
shift = ilog2(umem->page_size);
n = 0;