Make the RMSPropOptimizer docstring more explicit about sparse vs. dense

PiperOrigin-RevId: 165335237
diff --git a/tensorflow/python/training/rmsprop.py b/tensorflow/python/training/rmsprop.py
index eb814a7..d046456 100644
--- a/tensorflow/python/training/rmsprop.py
+++ b/tensorflow/python/training/rmsprop.py
@@ -63,9 +63,17 @@
                name="RMSProp"):
     """Construct a new RMSProp optimizer.
 
-    Note that in dense implement of this algorithm, m_t and v_t will
-    update even if g is zero, but in sparse implement, m_t and v_t
-    will not update in iterations g is zero.
+    Note that in the dense implementation of this algorithm, variables and their
+    corresponding accumulators (momentum, gradient moving average, square
+    gradient moving average) will be updated even if the gradient is zero
+    (i.e. accumulators will decay, momentum will be applied). The sparse
+    implementation (used when the gradient is an `IndexedSlices` object,
+    typically because of `tf.gather` or an embedding lookup in the forward pass)
+    will not update variable slices or their accumulators unless those slices
+    were used in the forward pass (nor is there an "eventual" correction to
+    account for these omitted updates). This leads to more efficient updates for
+    large embedding lookup tables (where most of the slices are not accessed in
+    a particular graph execution), but differs from the published algorithm.
 
     Args:
       learning_rate: A Tensor or a floating point value.  The learning rate.
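
For reference (not part of the patch itself), here is a minimal sketch of when the sparse path described in the new docstring is taken, using the TF1-era `tf.train.RMSPropOptimizer` API that this file implements. The variable shape, gathered ids, and learning-rate/momentum values are illustrative assumptions, not anything from this change.

```python
import tensorflow as tf

# A small "embedding table"; only rows 0 and 2 appear in the forward pass.
var = tf.Variable(tf.ones([4, 2]))
ids = tf.constant([0, 2])
loss = tf.reduce_sum(tf.gather(var, ids))

opt = tf.train.RMSPropOptimizer(learning_rate=0.1, momentum=0.9)
grads_and_vars = opt.compute_gradients(loss, var_list=[var])
train_op = opt.apply_gradients(grads_and_vars)

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  # Because of tf.gather, the gradient arrives as tf.IndexedSlices,
  # so apply_gradients takes the sparse path.
  print(isinstance(grads_and_vars[0][0], tf.IndexedSlices))  # True
  sess.run(train_op)
  # Rows 1 and 3 were never gathered: their values and their momentum /
  # moving-average accumulators are left untouched. Rows 0 and 2 update.
  print(sess.run(var))
```

Under the dense path (if the gradient were an ordinary Tensor that merely happened to be zero in some rows), every row's accumulators would still decay and momentum would still be applied on each step, which is exactly the distinction the reworded docstring spells out.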