bpo-36324:  Apply review comments from Allen Downey (GH-15693)

diff --git a/Doc/library/statistics.rst b/Doc/library/statistics.rst
index 04b731d..62d856b 100644
--- a/Doc/library/statistics.rst
+++ b/Doc/library/statistics.rst
@@ -26,10 +26,10 @@
    Unless explicitly noted otherwise, these functions support :class:`int`,
    :class:`float`, :class:`decimal.Decimal` and :class:`fractions.Fraction`.
    Behaviour with other types (whether in the numeric tower or not) is
-   currently unsupported.  Mixed types are also undefined and
-   implementation-dependent.  If your input data consists of mixed types,
-   you may be able to use :func:`map` to ensure a consistent result, e.g.
-   ``map(float, input_data)``.
+   currently unsupported.  Collections with a mix of types are also undefined
+   and implementation-dependent.  If your input data consists of mixed types,
+   you may be able to use :func:`map` to ensure a consistent result, for
+   example: ``map(float, input_data)``.
 
 Averages and measures of central location
 -----------------------------------------
@@ -102,11 +102,9 @@
    .. note::
 
       The mean is strongly affected by outliers and is not a robust estimator
-      for central location: the mean is not necessarily a typical example of the
-      data points.  For more robust, although less efficient, measures of
-      central location, see :func:`median` and :func:`mode`.  (In this case,
-      "efficient" refers to statistical efficiency rather than computational
-      efficiency.)
+      for central location: the mean is not necessarily a typical example of
+      the data points.  For more robust measures of central location, see
+      :func:`median` and :func:`mode`.
 
       The sample mean gives an unbiased estimate of the true population mean,
       which means that, taken on average over all the possible samples,
@@ -120,9 +118,8 @@
    Convert *data* to floats and compute the arithmetic mean.
 
    This runs faster than the :func:`mean` function and it always returns a
-   :class:`float`.  The result is highly accurate but not as perfect as
-   :func:`mean`.  If the input dataset is empty, raises a
-   :exc:`StatisticsError`.
+   :class:`float`.  The *data* may be a sequence or iterator.  If the input
+   dataset is empty, raises a :exc:`StatisticsError`.
 
    .. doctest::
 
@@ -136,15 +133,20 @@
 
    Convert *data* to floats and compute the geometric mean.
 
+   The geometric mean indicates the central tendency or typical value of the
+   *data* using the product of the values (as opposed to the arithmetic mean
+   which uses their sum).
+
    Raises a :exc:`StatisticsError` if the input dataset is empty,
    if it contains a zero, or if it contains a negative value.
+   The *data* may be a sequence or iterator.
 
    No special efforts are made to achieve exact results.
    (However, this may change in the future.)
 
    .. doctest::
 
-      >>> round(geometric_mean([54, 24, 36]), 9)
+      >>> round(geometric_mean([54, 24, 36]), 1)
       36.0
 
    .. versionadded:: 3.8
@@ -174,7 +176,7 @@
       3.6
 
    Using the arithmetic mean would give an average of about 5.167, which
-   is too high.
+   is well over the aggregate P/E ratio.
 
    :exc:`StatisticsError` is raised if *data* is empty, or any element
    is less than zero.
@@ -312,10 +314,10 @@
    The mode (when it exists) is the most typical value and serves as a
    measure of central location.
 
-   If there are multiple modes, returns the first one encountered in the *data*.
-   If the smallest or largest of multiple modes is desired instead, use
-   ``min(multimode(data))`` or ``max(multimode(data))``.  If the input *data* is
-   empty, :exc:`StatisticsError` is raised.
+   If there are multiple modes with the same frequency, returns the first one
+   encountered in the *data*.  If the smallest or largest of those is
+   desired instead, use ``min(multimode(data))`` or ``max(multimode(data))``.
+   If the input *data* is empty, :exc:`StatisticsError` is raised.
 
    ``mode`` assumes discrete data, and returns a single value. This is the
    standard treatment of the mode as commonly taught in schools:
@@ -325,8 +327,8 @@
       >>> mode([1, 1, 2, 3, 3, 3, 3, 4])
       3
 
-   The mode is unique in that it is the only statistic which also applies
-   to nominal (non-numeric) data:
+   The mode is unique in that it is the only statistic in this package that
+   also applies to nominal (non-numeric) data:
 
    .. doctest::
 
@@ -368,15 +370,16 @@
 
 .. function:: pvariance(data, mu=None)
 
-   Return the population variance of *data*, a non-empty iterable of real-valued
-   numbers.  Variance, or second moment about the mean, is a measure of the
-   variability (spread or dispersion) of data.  A large variance indicates that
-   the data is spread out; a small variance indicates it is clustered closely
-   around the mean.
+   Return the population variance of *data*, a non-empty sequence or iterator
+   of real-valued numbers.  Variance, or second moment about the mean, is a
+   measure of the variability (spread or dispersion) of data.  A large
+   variance indicates that the data is spread out; a small variance indicates
+   it is clustered closely around the mean.
 
-   If the optional second argument *mu* is given, it should be the mean of
-   *data*.  If it is missing or ``None`` (the default), the mean is
-   automatically calculated.
+   If the optional second argument *mu* is given, it is typically the mean of
+   the *data*.  It can also be used to compute the second moment around a
+   point that is not the mean.  If it is missing or ``None`` (the default),
+   the arithmetic mean is automatically calculated.
 
    Use this function to calculate the variance from the entire population.  To
    estimate the variance from a sample, the :func:`variance` function is usually
@@ -401,10 +404,6 @@
       >>> pvariance(data, mu)
       1.25
 
-   This function does not attempt to verify that you have passed the actual mean
-   as *mu*.  Using arbitrary values for *mu* may lead to invalid or impossible
-   results.
-
    Decimals and Fractions are supported:
 
    .. doctest::
@@ -423,11 +422,11 @@
       σ².  When called on a sample instead, this is the biased sample variance
       s², also known as variance with N degrees of freedom.
 
-      If you somehow know the true population mean μ, you may use this function
-      to calculate the variance of a sample, giving the known population mean as
-      the second argument.  Provided the data points are representative
-      (e.g. independent and identically distributed), the result will be an
-      unbiased estimate of the population variance.
+      If you somehow know the true population mean μ, you may use this
+      function to calculate the variance of a sample, giving the known
+      population mean as the second argument.  Provided the data points are a
+      random sample of the population, the result will be an unbiased estimate
+      of the population variance.
 
 
 .. function:: stdev(data, xbar=None)
@@ -502,19 +501,19 @@
       :func:`pvariance` function as the *mu* parameter to get the variance of a
       sample.
 
-.. function:: quantiles(dist, *, n=4, method='exclusive')
+.. function:: quantiles(data, *, n=4, method='exclusive')
 
-   Divide *dist* into *n* continuous intervals with equal probability.
+   Divide *data* into *n* continuous intervals with equal probability.
    Returns a list of ``n - 1`` cut points separating the intervals.
 
    Set *n* to 4 for quartiles (the default).  Set *n* to 10 for deciles.  Set
    *n* to 100 for percentiles which gives the 99 cuts points that separate
-   *dist* in to 100 equal sized groups.  Raises :exc:`StatisticsError` if *n*
+   *data* in to 100 equal sized groups.  Raises :exc:`StatisticsError` if *n*
    is not least 1.
 
-   The *dist* can be any iterable containing sample data or it can be an
+   The *data* can be any iterable containing sample data or it can be an
    instance of a class that defines an :meth:`~inv_cdf` method.  For meaningful
-   results, the number of data points in *dist* should be larger than *n*.
+   results, the number of data points in *data* should be larger than *n*.
    Raises :exc:`StatisticsError` if there are not at least two data points.
 
    For sample data, the cut points are linearly interpolated from the
@@ -523,7 +522,7 @@
    cut-point will evaluate to ``104``.
 
    The *method* for computing quantiles can be varied depending on
-   whether the data in *dist* includes or excludes the lowest and
+   whether the data in *data* includes or excludes the lowest and
    highest possible values from the population.
 
    The default *method* is "exclusive" and is used for data sampled from
@@ -535,14 +534,14 @@
 
    Setting the *method* to "inclusive" is used for describing population
    data or for samples that are known to include the most extreme values
-   from the population.  The minimum value in *dist* is treated as the 0th
+   from the population.  The minimum value in *data* is treated as the 0th
    percentile and the maximum value is treated as the 100th percentile.
    The portion of the population falling below the *i-th* of *m* sorted
    data points is computed as ``(i - 1) / (m - 1)``.  Given 11 sample
    values, the method sorts them and assigns the following percentiles:
    0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%.
 
-   If *dist* is an instance of a class that defines an
+   If *data* is an instance of a class that defines an
    :meth:`~inv_cdf` method, setting *method* has no effect.
 
    .. doctest::
@@ -580,7 +579,7 @@
 :class:`NormalDist` is a tool for creating and manipulating normal
 distributions of a `random variable
 <http://www.stat.yale.edu/Courses/1997-98/101/ranvar.htm>`_.  It is a
-composite class that treats the mean and standard deviation of data
+class that treats the mean and standard deviation of data
 measurements as a single entity.
 
 Normal distributions arise from the `Central Limit Theorem
@@ -616,13 +615,14 @@
 
     .. classmethod:: NormalDist.from_samples(data)
 
-       Makes a normal distribution instance computed from sample data.  The
-       *data* can be any :term:`iterable` and should consist of values that
-       can be converted to type :class:`float`.
+       Makes a normal distribution instance with *mu* and *sigma* parameters
+       estimated from the *data* using :func:`fmean` and :func:`stdev`.
 
-       If *data* does not contain at least two elements, raises
-       :exc:`StatisticsError` because it takes at least one point to estimate
-       a central value and at least two points to estimate dispersion.
+       The *data* can be any :term:`iterable` and should consist of values
+       that can be converted to type :class:`float`.  If *data* does not
+       contain at least two elements, raises :exc:`StatisticsError` because it
+       takes at least one point to estimate a central value and at least two
+       points to estimate dispersion.
 
     .. method:: NormalDist.samples(n, *, seed=None)
 
@@ -636,10 +636,10 @@
     .. method:: NormalDist.pdf(x)
 
        Using a `probability density function (pdf)
-       <https://en.wikipedia.org/wiki/Probability_density_function>`_,
-       compute the relative likelihood that a random variable *X* will be near
-       the given value *x*.  Mathematically, it is the ratio ``P(x <= X <
-       x+dx) / dx``.
+       <https://en.wikipedia.org/wiki/Probability_density_function>`_, compute
+       the relative likelihood that a random variable *X* will be near the
+       given value *x*.  Mathematically, it is the limit of the ratio ``P(x <=
+       X < x+dx) / dx`` as *dx* approaches zero.
 
        The relative likelihood is computed as the probability of a sample
        occurring in a narrow range divided by the width of the range (hence
@@ -667,8 +667,10 @@
 
     .. method:: NormalDist.overlap(other)
 
-       Returns a value between 0.0 and 1.0 giving the overlapping area for
-       the two probability density functions.
+       Measures the agreement between two normal probability distributions.
+       Returns a value between 0.0 and 1.0 giving `the overlapping area for
+       the two probability density functions
+       <https://www.rasch.org/rmt/rmt101r.htm>`_.
 
     Instances of :class:`NormalDist` support addition, subtraction,
     multiplication and division by a constant.  These operations
@@ -740,12 +742,11 @@
     ...     return (3*x + 7*x*y - 5*y) / (11 * z)
     ...
     >>> n = 100_000
-    >>> seed = 86753099035768
-    >>> X = NormalDist(10, 2.5).samples(n, seed=seed)
-    >>> Y = NormalDist(15, 1.75).samples(n, seed=seed)
-    >>> Z = NormalDist(50, 1.25).samples(n, seed=seed)
-    >>> NormalDist.from_samples(map(model, X, Y, Z))     # doctest: +SKIP
-    NormalDist(mu=1.8661894803304777, sigma=0.65238717376862)
+    >>> X = NormalDist(10, 2.5).samples(n, seed=3652260728)
+    >>> Y = NormalDist(15, 1.75).samples(n, seed=4582495471)
+    >>> Z = NormalDist(50, 1.25).samples(n, seed=6582483453)
+    >>> quantiles(map(model, X, Y, Z))       # doctest: +SKIP
+    [1.4591308524824727, 1.8035946855390597, 2.175091447274739]
 
 Normal distributions commonly arise in machine learning problems.