staging/rdma/hfi: fix CQ completion order issue

The current implementation of the sdma_wait variable
has a timing hole that can cause a completion Q entry
to be returned from a pio send prior to an older
sdma packets completion queue entry.

The sdma_wait variable used to be decremented prior to
calling the packet complete routine.  The hole is between decrement
and the verbs completion where send engine using pio could return
a out of order completion in that window.

This patch closes the hole by allowing an API option to
specify an sdma_drained callback.   The atomic dec
is positioned after the complete callback to avoid the
window as long as the pio path doesn't execute when
there is a non-zero sdma count.

Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 files changed