bpo-38490: statistics: Add covariance, Pearson's correlation, and simple linear regression (#16813)
Co-authored-by: Tymoteusz Wołodźko <twolodzko+gitkraken@gmail.com>
diff --git a/Doc/library/statistics.rst b/Doc/library/statistics.rst
index 695fb49..117d2b6 100644
--- a/Doc/library/statistics.rst
+++ b/Doc/library/statistics.rst
@@ -68,6 +68,17 @@
:func:`variance` Sample variance of data.
======================= =============================================
+Statistics for relations between two inputs
+-------------------------------------------
+
+These functions calculate statistics regarding relations between two inputs.
+
+========================= =====================================================
+:func:`covariance` Sample covariance for two variables.
+:func:`correlation` Pearson's correlation coefficient for two variables.
+:func:`linear_regression` Intercept and slope for simple linear regression.
+========================= =====================================================
+
Function details
----------------
@@ -566,6 +577,98 @@
.. versionadded:: 3.8
+.. function:: covariance(x, y, /)
+
+ Return the sample covariance of two inputs *x* and *y*. Covariance
+ is a measure of the joint variability of two inputs.
+
+ Both inputs must be of the same length (no less than two), otherwise
+ :exc:`StatisticsError` is raised.
+
+ Examples:
+
+ .. doctest::
+
+ >>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
+ >>> y = [1, 2, 3, 1, 2, 3, 1, 2, 3]
+ >>> covariance(x, y)
+ 0.75
+ >>> z = [9, 8, 7, 6, 5, 4, 3, 2, 1]
+ >>> covariance(x, z)
+ -7.5
+ >>> covariance(z, x)
+ -7.5
+
+ .. versionadded:: 3.10
+
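As an illustration of what "sample covariance" means here, the following is a minimal pure-Python sketch, assuming the usual definition with an ``n - 1`` denominator. The helper name ``sample_covariance`` is hypothetical and this is not necessarily how the module implements :func:`covariance`.

```python
from statistics import fmean


def sample_covariance(x, y):
    """Hypothetical helper: sample covariance with an n - 1 denominator."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two data points are required")
    xbar = fmean(x)
    ybar = fmean(y)
    # Sum of products of deviations from the means, divided by n - 1.
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / (n - 1)


x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 2, 3, 1, 2, 3, 1, 2, 3]
z = [9, 8, 7, 6, 5, 4, 3, 2, 1]
cov_xy = sample_covariance(x, y)
cov_xz = sample_covariance(x, z)
cov_zx = sample_covariance(z, x)
```

The sketch is symmetric in its arguments, which matches the doctest above where ``covariance(x, z)`` and ``covariance(z, x)`` agree.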
+.. function:: correlation(x, y, /)
+
+ Return the `Pearson's correlation coefficient
+ <https://en.wikipedia.org/wiki/Pearson_correlation_coefficient>`_
+ for two inputs. Pearson's correlation coefficient *r* takes values
+ between -1 and +1. It measures the strength and direction of the linear
+ relationship, where +1 means a very strong, positive linear relationship,
+ -1 a very strong, negative linear relationship, and 0 no linear relationship.
+
+ Both inputs must be of the same length (no less than two), and neither
+ may be constant; otherwise :exc:`StatisticsError` is raised.
+
+ Examples:
+
+ .. doctest::
+
+ >>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
+ >>> y = [9, 8, 7, 6, 5, 4, 3, 2, 1]
+ >>> correlation(x, x)
+ 1.0
+ >>> correlation(x, y)
+ -1.0
+
+ .. versionadded:: 3.10
+
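To show why constant inputs have to be rejected, here is a minimal sketch of Pearson's *r* computed from sums of deviations, assuming the textbook formula. The helper name ``pearson_r`` is hypothetical and the module's own implementation may differ.

```python
from math import sqrt
from statistics import fmean


def pearson_r(x, y):
    """Hypothetical helper: Pearson's r from sums of deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length, at least two")
    xbar = fmean(x)
    ybar = fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        # A constant input has zero spread, so r would be 0 / 0.
        raise ValueError("at least one of the inputs is constant")
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [9, 8, 7, 6, 5, 4, 3, 2, 1]
r_xx = pearson_r(x, x)
r_xy = pearson_r(x, y)
```

With a constant input the denominator is zero, so the coefficient is undefined; the sketch raises ``ValueError`` in that case, where the documented function raises :exc:`StatisticsError`.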
+.. function:: linear_regression(regressor, dependent_variable)
+
+ Return the intercept and slope of a `simple linear regression
+ <https://en.wikipedia.org/wiki/Simple_linear_regression>`_
+ estimated using ordinary least squares. Simple linear regression
+ describes the relationship between *regressor* and
+ *dependent_variable* in terms of a linear function:
+
+ *dependent_variable = intercept + slope \* regressor + noise*
+
+ where ``intercept`` and ``slope`` are the regression parameters that are
+ estimated, and the noise term is an unobserved random variable that accounts
+ for the variability of the data not explained by the linear regression
+ (it is equal to the difference between the predicted and the actual values
+ of the dependent variable).
+
+ Both inputs must be of the same length (no less than two), and *regressor*
+ must not be constant; otherwise :exc:`StatisticsError` is raised.
+
+ For example, we can take the `release dates of the Monty
+ Python films <https://en.wikipedia.org/wiki/Monty_Python#Films>`_ and use
+ them to model the cumulative number of Monty Python films produced. We can
+ then predict how many films they would have made by the year 2019,
+ assuming that they had kept the same pace.
+
+ .. doctest::
+
+ >>> year = [1971, 1975, 1979, 1982, 1983]
+ >>> films_total = [1, 2, 3, 4, 5]
+ >>> intercept, slope = linear_regression(year, films_total)
+ >>> round(intercept + slope * 2019)
+ 16
+
+ We could also use it to "predict" how many Monty Python films existed when
+ Brian Cohen was born.
+
+ .. doctest::
+
+ >>> round(intercept + slope * 1)
+ -610
+
+ .. versionadded:: 3.10
+
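The ordinary least squares estimates described above can be written in closed form: the slope is the ratio of the summed cross-deviations to the summed squared deviations of the regressor, and the intercept follows from the means. The sketch below assumes that textbook formula; the helper name ``ols_fit`` is hypothetical and is not claimed to be the module's implementation.

```python
from statistics import fmean


def ols_fit(regressor, dependent_variable):
    """Hypothetical helper: closed-form simple OLS, returns (intercept, slope)."""
    if len(regressor) != len(dependent_variable) or len(regressor) < 2:
        raise ValueError("inputs must have the same length, at least two")
    xbar = fmean(regressor)
    ybar = fmean(dependent_variable)
    sxy = sum((x - xbar) * (y - ybar)
              for x, y in zip(regressor, dependent_variable))
    sxx = sum((x - xbar) ** 2 for x in regressor)
    if sxx == 0:
        # A constant regressor gives no information about the slope.
        raise ValueError("regressor is constant")
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope


year = [1971, 1975, 1979, 1982, 1983]
films_total = [1, 2, 3, 4, 5]
intercept, slope = ols_fit(year, films_total)
predicted_2019 = round(intercept + slope * 2019)
```

On the Monty Python data this gives a slope of about 0.31 films per year, which is consistent with the rounded predictions in the doctests above.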
Exceptions
----------