staging/lustre/clio: replace semaphore with mutex

According https://www.kernel.org/doc/Documentation/mutex-design.txt:
- the mutex subsystem is slightly faster and has better scalability
  for contended workloads. In terms of 'ops per CPU cycle', the
  semaphore kernel performed 551 ops/sec per 1% of CPU time used,
  while the mutex kernel performed 3825 ops/sec per 1% of CPU time
  used - it was 6.9 times more efficient.
- there are no fastpath tradeoffs, the mutex fastpath is just as
  tight as the semaphore fastpath. On x86, the locking fastpath is
  2 instructions.
- 'struct mutex' semantics are well-defined and are enforced if
  CONFIG_DEBUG_MUTEXES is turned on. Semaphores on the other hand
  have virtually no debugging code or instrumentation.

One more benefit of mutex is optimistic spinning. It try to spin for
acquisition when there are no pending waiters and the lock owner is
currently running on a (different) CPU. The rationale is that if the
lock owner is running, it is likely to release the lock soon.

This significantly reduce amount of context switches when locked
region is small and we have high contention.
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-on: http://review.whamcloud.com/9095
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4257
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 files changed