Make the RMSPropOptimizer docstring more explicit about sparse vs. dense
PiperOrigin-RevId: 165335237
diff --git a/tensorflow/python/training/rmsprop.py b/tensorflow/python/training/rmsprop.py
index eb814a7..d046456 100644
--- a/tensorflow/python/training/rmsprop.py
+++ b/tensorflow/python/training/rmsprop.py
@@ -63,9 +63,17 @@
name="RMSProp"):
"""Construct a new RMSProp optimizer.
- Note that in dense implement of this algorithm, m_t and v_t will
- update even if g is zero, but in sparse implement, m_t and v_t
- will not update in iterations g is zero.
+ Note that in the dense implementation of this algorithm, variables and their
+ corresponding accumulators (momentum, gradient moving average, square
+ gradient moving average) will be updated even if the gradient is zero
+ (i.e. accumulators will decay, momentum will be applied). The sparse
+ implementation (used when the gradient is an `IndexedSlices` object,
+ typically because of `tf.gather` or an embedding lookup in the forward pass)
+ will not update variable slices or their accumulators unless those slices
+ were used in the forward pass (nor is there an "eventual" correction to
+ account for these omitted updates). This leads to more efficient updates for
+ large embedding lookup tables (where most of the slices are not accessed in
+ a particular graph execution), but differs from the published algorithm.

Args:
learning_rate: A Tensor or a floating point value. The learning rate.
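
For illustration only (not part of this change), a minimal sketch of when the sparse path applies, assuming the TF 1.x graph-mode API of this era (`tf.train.RMSPropOptimizer`, `tf.gather`); the toy table and loss are hypothetical:

    import tensorflow as tf

    # Hypothetical toy embedding table, used purely for illustration.
    params = tf.Variable(tf.random_normal([100, 8]))
    ids = tf.constant([3, 7])

    # tf.gather in the forward pass makes the gradient of `loss` with
    # respect to `params` an IndexedSlices object, so RMSProp takes the
    # sparse path described above: only rows 3 and 7 of `params` (and of
    # its accumulators) are updated; the other 98 rows and their
    # accumulators are left untouched.
    loss = tf.reduce_sum(tf.square(tf.gather(params, ids)))
    opt = tf.train.RMSPropOptimizer(learning_rate=0.01, momentum=0.9)
    train_op = opt.minimize(loss)

    # IndexedSlices here confirms the sparse update path is taken.
    print(type(tf.gradients(loss, [params])[0]))

    with tf.Session() as sess:
      sess.run(tf.global_variables_initializer())
      sess.run(train_op)

A dense gradient (e.g. from a plain matmul over the whole variable) would instead decay every accumulator row and apply momentum on every step, matching the dense behavior described in the docstring.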