Make SkMutex semaphore-based.

This implementation cuts the cost of an SkMutex acquire/release pair from 42 ns to 13 ns.

SkSharedMutex and SkSpinlock have the same performance.

It also removes the specialized Windows and Linux/Mac code.
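
For readers unfamiliar with the technique, here is a minimal sketch of a semaphore-based mutex. It is not the code in this change: SketchSemaphore, SketchMutex, and contended_count are hypothetical names, and the semaphore is a portable stand-in built on std::condition_variable where a real implementation would presumably use an OS semaphore. The idea is that an atomic counter handles the uncontended case with a single atomic operation, and the semaphore is only touched when threads actually contend.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an OS semaphore (assumption: the real one is
// provided by the platform; this one exists only to keep the sketch portable).
class SketchSemaphore {
public:
    void signal() {
        {
            std::lock_guard<std::mutex> guard(fLock);
            ++fPermits;
        }
        fCond.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> guard(fLock);
        fCond.wait(guard, [this] { return fPermits > 0; });
        --fPermits;
    }
private:
    std::mutex fLock;
    std::condition_variable fCond;
    int fPermits = 0;
};

// Hypothetical semaphore-based mutex. fCount is 1 when unlocked, 0 when
// locked with no waiters, and negative when threads are waiting.
class SketchMutex {
public:
    void acquire() {
        // Fast path: previous value 1 means the mutex was free.
        if (fCount.fetch_sub(1, std::memory_order_acquire) <= 0) {
            fSemaphore.wait();  // Slow path: block until a release hands over.
        }
    }
    void release() {
        // A negative previous value means at least one thread is waiting.
        if (fCount.fetch_add(1, std::memory_order_release) < 0) {
            fSemaphore.signal();
        }
    }
private:
    std::atomic<int> fCount{1};
    SketchSemaphore fSemaphore;
};

// Usage example: several threads increment a plain int under the mutex.
int contended_count(int threads, int iterations) {
    SketchMutex mutex;
    int counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                mutex.acquire();
                ++counter;
                mutex.release();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

In this sketch an uncontended acquire/release pair costs two atomic read-modify-write operations and never reaches the semaphore, which is the kind of fast path that could plausibly account for a speedup like the one reported above.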

BUG=skia:

Review URL: https://codereview.chromium.org/1359733002
9 files changed