Improve memory coherence management in screenshot code

The existing code worked in practice, but wasn't quite correct in
theory and relied on implementation details of other code. It's still
somewhat unusual and subtle, but it is now correct in theory (I believe)
and a little better documented.
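A minimal sketch of the ordering contract the new Barrier comments describe. It assumes a portable stand-in: std::mutex and std::condition_variable replace Android's Mutex/Condition, and the names PortableBarrier, g_pixels and sumAfterBarrier are hypothetical, not part of SurfaceFlinger. The mutex unlock in open() synchronizes-with the lock in wait() that observes OPENED. That is what lets plain writes made before open() be read safely after wait() returns.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical portable stand-in for SurfaceFlinger's Barrier, built on the
// C++ standard library instead of Android's Mutex/Condition.
class PortableBarrier {
public:
    // Release waiters. Writes made by this thread before open() happen-before
    // the return of any wait() that observes OPENED, because this unlock
    // synchronizes-with that later lock of the same mutex.
    void open() {
        std::lock_guard<std::mutex> lock(mutex_);
        state_ = OPENED;
        cv_.notify_all();  // mirrors cv.broadcast() under the lock
    }

    // Reset so that wait() blocks until the next open().
    void close() {
        std::lock_guard<std::mutex> lock(mutex_);
        state_ = CLOSED;
    }

    // Block until OPENED. Loads and stores after wait() returns are ordered
    // after everything that preceded the releasing open().
    void wait() const {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return state_ == OPENED; });
    }

private:
    enum State { CLOSED, OPENED };
    mutable std::mutex mutex_;
    mutable std::condition_variable cv_;
    State state_ = CLOSED;
};

// Hypothetical stand-in for a screenshot buffer: plain, non-atomic data that
// is only safe to read because of the barrier's ordering guarantees.
int g_pixels[4] = {0, 0, 0, 0};
PortableBarrier g_barrier;

int sumAfterBarrier() {
    std::thread producer([] {
        for (int i = 0; i < 4; ++i) g_pixels[i] = i + 1;  // plain stores
        g_barrier.open();                                  // release
    });
    g_barrier.wait();                                      // acquire
    int sum = 0;
    for (int v : g_pixels) sum += v;  // sees every store made before open()
    producer.join();
    return sum;
}
```

Without the barrier (or with only a relaxed flag), the reader could observe the pixel writes partially or not at all; with it, the read is race-free.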

Bug: 16044767
Change-Id: I22b01d6640f0b7beca7cbfc74981795a3218b064
diff --git a/services/surfaceflinger/Barrier.h b/services/surfaceflinger/Barrier.h
index 6f8507e..3e9d443 100644
--- a/services/surfaceflinger/Barrier.h
+++ b/services/surfaceflinger/Barrier.h
@@ -28,15 +28,25 @@
 public:
     inline Barrier() : state(CLOSED) { }
     inline ~Barrier() { }
+
+    // Release any threads waiting at the Barrier.
+    // Provides release semantics: preceding loads and stores will be visible
+    // to other threads before they wake up.
     void open() {
         Mutex::Autolock _l(lock);
         state = OPENED;
         cv.broadcast();
     }
+
+    // Reset the Barrier, so wait() will block until open() has been called.
     void close() {
         Mutex::Autolock _l(lock);
         state = CLOSED;
     }
+
+    // Wait until the Barrier is OPENED.
+    // Provides acquire semantics: no subsequent loads or stores will occur
+    // until wait() returns.
     void wait() const {
         Mutex::Autolock _l(lock);
         while (state == CLOSED) {