doc: De-emphasize smp_read_barrier_depends

Now that READ_ONCE() subsumes smp_read_barrier_depends(), this commit
removes most mentions of the latter from the documentation, keeping only
the historical and low-level discussion.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Adjusted to allow for David Howells's feedback on prior commit. ]
diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html
index 62e847b..571c3d7 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.html
+++ b/Documentation/RCU/Design/Requirements/Requirements.html
@@ -581,7 +581,8 @@
 DYNIX/ptx used an explicit memory barrier for publication, but had nothing
 resembling <tt>rcu_dereference()</tt> for subscription, nor did it
 have anything resembling the <tt>smp_read_barrier_depends()</tt>
-that was later subsumed into <tt>rcu_dereference()</tt>.
+that was later subsumed into <tt>rcu_dereference()</tt> and later
+still into <tt>READ_ONCE()</tt>.
 The need for these operations made itself known quite suddenly at a
 late-1990s meeting with the DEC Alpha architects, back in the days when
 DEC was still a free-standing company.
diff --git a/Documentation/RCU/rcu_dereference.txt b/Documentation/RCU/rcu_dereference.txt
index 1acb26b..ab96227 100644
--- a/Documentation/RCU/rcu_dereference.txt
+++ b/Documentation/RCU/rcu_dereference.txt
@@ -122,11 +122,7 @@
 		Note that if checks for being within an RCU read-side
 		critical section are not required and the pointer is never
 		dereferenced, rcu_access_pointer() should be used in place
-		of rcu_dereference(). The rcu_access_pointer() primitive
-		does not require an enclosing read-side critical section,
-		and also omits the smp_read_barrier_depends() included in
-		rcu_dereference(), which in turn should provide a small
-		performance gain in some CPUs (e.g., the DEC Alpha).
+		of rcu_dereference().
 
 	o	The comparison is against a pointer that references memory
 		that was initialized "a long time ago."  The reason
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index df62466..a27fbfb 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -600,8 +600,7 @@
 
 	#define rcu_dereference(p) \
 	({ \
-		typeof(p) _________p1 = p; \
-		smp_read_barrier_depends(); \
+		typeof(p) _________p1 = READ_ONCE(p); \
 		(_________p1); \
 	})
 
diff --git a/Documentation/circular-buffers.txt b/Documentation/circular-buffers.txt
index d462817..53e51ca 100644
--- a/Documentation/circular-buffers.txt
+++ b/Documentation/circular-buffers.txt
@@ -220,8 +220,7 @@
 
 Note the use of READ_ONCE() and smp_load_acquire() to read the
 opposition index.  This prevents the compiler from discarding and
-reloading its cached value - which some compilers will do across
-smp_read_barrier_depends().  This isn't strictly needed if you can
+reloading its cached value.  This isn't strictly needed if you can
 be sure that the opposition index will _only_ be used the once.
 The smp_load_acquire() additionally forces the CPU to order against
 subsequent memory references.  Similarly, smp_store_release() is used
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 13fd35b..a863009 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1818,7 +1818,7 @@
 	GENERAL		mb()			smp_mb()
 	WRITE		wmb()			smp_wmb()
 	READ		rmb()			smp_rmb()
-	DATA DEPENDENCY	read_barrier_depends()	smp_read_barrier_depends()
+	DATA DEPENDENCY				READ_ONCE()
 
 
 All memory barriers except the data dependency barriers imply a compiler
@@ -2867,7 +2867,10 @@
 
 Other CPUs may also have split caches, but must coordinate between the various
 cachelets for normal memory accesses.  The semantics of the Alpha removes the
-need for coordination in the absence of memory barriers.
+need for hardware coordination in the absence of memory barriers, which
+permitted Alpha to sport higher CPU clock rates back in the day.  However,
+please note that smp_read_barrier_depends() should not be used except in
+Alpha arch-specific code and within the READ_ONCE() macro.
 
 
 CACHE COHERENCY VS DMA