kvm: optimize out smp_mb after srcu_read_unlock

I noticed that srcu_read_lock/unlock both have a memory barrier,
so just by moving srcu_read_unlock earlier we can get rid of
one call to smp_mb() using smp_mb__after_srcu_read_unlock instead.

Unsurprisingly, the gain is small but measureable using the unit test
microbenchmark:
before
        vmcall in the ballpark of 1410 cycles
after
        vmcall in the ballpark of 1360 cycles

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
1 file changed