[PATCH] Another couple of alterations to the memory barrier doc

Make another couple of alterations to the memory barrier document following
suggestions by Alan Stern and in co-operation with Paul McKenney:

 (*) Rework the point of introduction of memory barriers and the description
     of what they are to reiterate why they're needed.

 (*) Modify a statement about the use of data dependency barriers to note that
     other barriers can be used instead (as they imply DD-barriers).

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-By: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 4710845d..cc53f47 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -262,9 +262,14 @@
 CPU to restrict the order.
 
 Memory barriers are such interventions.  They impose a perceived partial
-ordering between the memory operations specified on either side of the barrier.
-They request that the sequence of memory events generated appears to other
-parts of the system as if the barrier is effective on that CPU.
+ordering over the memory operations on either side of the barrier.
+
+Such enforcement is important because the CPUs and other devices in a system
+can use a variety of tricks to improve performance - including reordering,
+deferral and combination of memory operations; speculative loads; speculative
+branch prediction and various types of caching.  Memory barriers are used to
+override or suppress these tricks, allowing the code to sanely control the
+interaction of multiple CPUs and/or devices.
 
 
 VARIETIES OF MEMORY BARRIER
@@ -461,8 +466,8 @@
 isn't, and this behaviour can be observed on certain real CPUs (such as the DEC
 Alpha).
 
-To deal with this, a data dependency barrier must be inserted between the
-address load and the data load:
+To deal with this, a data dependency barrier or better must be inserted
+between the address load and the data load:
 
 	CPU 1		CPU 2
 	===============	===============