IB/rdmavt: Handle local operations in post send

Some work requests are local operations, such as IB_WR_REG_MR and
IB_WR_LOCAL_INV. They differ from non-local operations in that:

(1) Local operations can be processed immediately without being posted
to the send queue if neither fencing nor completion generation is needed.
However, to ensure correct ordering, once a local operation is posted to
the work queue due to fencing or completion requiement, all subsequent
local operations must also be posted to the work queue until all the
local operations on the work queue have completed.

(2) Local operations don't send packets over the wire and thus don't
need (and shouldn't update) the packet sequence numbers.

Define a new a flag bit for the post send table to identify local
operations.

Add a new field to the QP structure to track the number of local
operations on the send queue to determine if direct processing of new
local operations should be enabled/disabled.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
diff --git a/include/rdma/rdmavt_qp.h b/include/rdma/rdmavt_qp.h
index a90d1e9..b0ab12b 100644
--- a/include/rdma/rdmavt_qp.h
+++ b/include/rdma/rdmavt_qp.h
@@ -231,6 +231,7 @@
 #define RVT_OPERATION_PRIV        0x00000001
 #define RVT_OPERATION_ATOMIC      0x00000002
 #define RVT_OPERATION_ATOMIC_SGE  0x00000004
+#define RVT_OPERATION_LOCAL       0x00000008
 
 #define RVT_OPERATION_MAX (IB_WR_RESERVED10 + 1)
 
@@ -363,6 +364,8 @@
 	struct rvt_sge_state s_ack_rdma_sge;
 	struct timer_list s_timer;
 
+	atomic_t local_ops_pending; /* number of fast_reg/local_inv reqs */
+
 	/*
 	 * This sge list MUST be last. Do not add anything below here.
 	 */