Blame - Doc/library/statistics.rst - platform/external/python/cpython3

Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	1	:mod:`statistics` --- Mathematical statistics functions
				2	=======================================================
				3
				4	.. module:: statistics
				5	:synopsis: mathematical statistics functions
Terry Jan Reedy	fa089b9	2016-06-11 15:02:54 -0400	[diff] [blame]	6
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	7	.. moduleauthor:: Steven D'Aprano <steve+python@pearwood.info>
				8	.. sectionauthor:: Steven D'Aprano <steve+python@pearwood.info>
				9
				10	.. versionadded:: 3.4
				11
Terry Jan Reedy	fa089b9	2016-06-11 15:02:54 -0400	[diff] [blame]	12	Source code: :source:`Lib/statistics.py`
				13
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	14	.. testsetup:: *
				15
				16	from statistics import *
				17	__name__ = '<doctest>'
				18
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	19	--------------
				20
				21	This module provides functions for calculating mathematical statistics of
				22	numeric (:class:`Real`-valued) data.
				23
Nick Coghlan	73afe2a	2014-02-08 19:58:04 +1000	[diff] [blame]	24	.. note::
				25
				26	Unless explicitly noted otherwise, these functions support :class:`int`,
				27	:class:`float`, :class:`decimal.Decimal` and :class:`fractions.Fraction`.
				28	Behaviour with other types (whether in the numeric tower or not) is
				29	currently unsupported. Collections with a mix of types are also undefined and
				30	implementation-dependent. If your input data consists of mixed types,
				31	you may be able to use :func:`map` to ensure a consistent result, e.g.
				32	``map(float, input_data)``.
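As a minimal sketch of the coercion suggested above, the following converts a hypothetical mixed-type input (the name ``input_data`` and its values are assumptions made for illustration) to floats before calling :func:`mean`:

```python
from fractions import Fraction
from statistics import mean

# Hypothetical input mixing int, float and Fraction values.
input_data = [1, 2.5, Fraction(3, 2)]

# Convert every value to float first, so that the function only
# ever sees a single numeric type.
result = mean(map(float, input_data))
print(result)
```

Converting to float trades exactness for consistency; if exact results matter, converting everything to :class:`fractions.Fraction` may be a better fit.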
				33
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	34	Averages and measures of central location
				35	-----------------------------------------
				36
				37	These functions calculate an average or typical value from a population
				38	or sample.
				39
				40	=======================  =============================================
				41	:func:`mean`             Arithmetic mean ("average") of data.
Steven D'Aprano	2287318	2016-08-24 02:34:25 +1000	[diff] [blame]	42	:func:`harmonic_mean`    Harmonic mean of data.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	43	:func:`median`           Median (middle value) of data.
				44	:func:`median_low`       Low median of data.
				45	:func:`median_high`      High median of data.
				46	:func:`median_grouped`   Median, or 50th percentile, of grouped data.
				47	:func:`mode`             Mode (most common value) of discrete data.
				48	=======================  =============================================
				49
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	50	Measures of spread
				51	------------------
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	52
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	53	These functions calculate a measure of how much the population or sample
				54	tends to deviate from the typical or average values.
				55
				56	=======================  =============================================
				57	:func:`pstdev`           Population standard deviation of data.
				58	:func:`pvariance`        Population variance of data.
				59	:func:`stdev`            Sample standard deviation of data.
				60	:func:`variance`         Sample variance of data.
				61	=======================  =============================================
				62
				63
				64	Function details
				65	----------------
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	66
Georg Brandl	e051b55	2013-11-04 07:30:50 +0100	[diff] [blame]	67	Note: The functions do not require the data given to them to be sorted.
				68	However, for reading convenience, most of the examples show sorted sequences.
				69
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	70	.. function:: mean(data)
				71
Raymond Hettinger	6da9078	2016-11-21 16:31:02 -0800	[diff] [blame]	72	Return the sample arithmetic mean of data, which can be a sequence or iterator.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	73
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	74	The arithmetic mean is the sum of the data divided by the number of data
				75	points. It is commonly called "the average", although it is only one of many
				76	different mathematical averages. It is a measure of the central location of
				77	the data.
				78
				79	If data is empty, :exc:`StatisticsError` will be raised.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	80
				81	Some examples of use:
				82
				83	.. doctest::
				84
				85	>>> mean([1, 2, 3, 4, 4])
				86	2.8
				87	>>> mean([-1.0, 2.5, 3.25, 5.75])
				88	2.625
				89
				90	>>> from fractions import Fraction as F
				91	>>> mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)])
				92	Fraction(13, 21)
				93
				94	>>> from decimal import Decimal as D
				95	>>> mean([D("0.5"), D("0.75"), D("0.625"), D("0.375")])
				96	Decimal('0.5625')
				97
				98	.. note::
				99
Georg Brandl	a3fdcaa	2013-10-21 09:08:39 +0200	[diff] [blame]	100	The mean is strongly affected by outliers and is not a robust estimator
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	101	for central location: the mean is not necessarily a typical example of the
				102	data points. For more robust, although less efficient, measures of
				103	central location, see :func:`median` and :func:`mode`. (In this case,
				104	"efficient" refers to statistical efficiency rather than computational
				105	efficiency.)
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	106
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	107	The sample mean gives an unbiased estimate of the true population mean,
				108	which means that, taken on average over all the possible samples,
				109	``mean(sample)`` converges on the true mean of the entire population. If
				110	data represents the entire population rather than a sample, then
				111	``mean(data)`` is equivalent to calculating the true population mean μ.
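To illustrate the remark about outliers, here is a small sketch with made-up data (the lists ``typical`` and ``with_outlier`` are assumptions, not taken from the module) comparing how :func:`mean` and :func:`median` react to one extreme value:

```python
from statistics import mean, median

# Two hypothetical data sets that differ only in their last value.
typical = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]

# A single extreme value drags the mean upwards...
print(mean(typical), mean(with_outlier))

# ...while the median stays at the middle data point.
print(median(typical), median(with_outlier))
```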
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	112
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	113
Steven D'Aprano	2287318	2016-08-24 02:34:25 +1000	[diff] [blame]	114	.. function:: harmonic_mean(data)
				115
				116	Return the harmonic mean of data, a sequence or iterator of
				117	real-valued numbers.
				118
				119	The harmonic mean, sometimes called the subcontrary mean, is the
Zachary Ware	c019bd3	2016-08-23 13:23:31 -0500	[diff] [blame]	120	reciprocal of the arithmetic :func:`mean` of the reciprocals of the
Steven D'Aprano	2287318	2016-08-24 02:34:25 +1000	[diff] [blame]	121	data. For example, the harmonic mean of three values a, b and c
				122	will be equivalent to ``3/(1/a + 1/b + 1/c)``.
				123
				124	The harmonic mean is a type of average, a measure of the central
				125	location of the data. It is often appropriate when averaging quantities
				126	which are rates or ratios, such as speeds. For example:
				127
				128	Suppose an investor purchases an equal value of shares in each of
				129	three companies, with P/E (price/earnings) ratios of 2.5, 3 and 10.
				130	What is the average P/E ratio for the investor's portfolio?
				131
				132	.. doctest::
				133
				134	>>> harmonic_mean([2.5, 3, 10]) # For an equal investment portfolio.
				135	3.6
				136
				137	Using the arithmetic mean would give an average of about 5.167, which
				138	is too high.
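One way to check this result is to compute the portfolio's P/E ratio directly. The sketch below assumes one unit of money is invested in each company (an assumption made purely for illustration) and divides the total price paid by the total earnings bought:

```python
from statistics import harmonic_mean, mean

pe_ratios = [2.5, 3, 10]

# Assumption: one unit of money is invested in each company.
# The earnings bought by that unit are price / (P/E) = 1 / ratio.
invested = [1.0] * len(pe_ratios)
earnings = [amount / ratio for amount, ratio in zip(invested, pe_ratios)]

# Portfolio P/E is the total price paid divided by the total earnings.
portfolio_pe = sum(invested) / sum(earnings)

print(portfolio_pe)              # agrees with the harmonic mean
print(harmonic_mean(pe_ratios))
print(mean(pe_ratios))           # the arithmetic mean is noticeably larger
```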
				139
Zachary Ware	c019bd3	2016-08-23 13:23:31 -0500	[diff] [blame]	140	:exc:`StatisticsError` is raised if data is empty, or any element
Steven D'Aprano	2287318	2016-08-24 02:34:25 +1000	[diff] [blame]	141	is less than zero.
				142
Zachary Ware	c019bd3	2016-08-23 13:23:31 -0500	[diff] [blame]	143	.. versionadded:: 3.6
				144
Steven D'Aprano	2287318	2016-08-24 02:34:25 +1000	[diff] [blame]	145
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	146	.. function:: median(data)
				147
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	148	Return the median (middle value) of numeric data, using the common "mean of
				149	middle two" method. If data is empty, :exc:`StatisticsError` is raised.
Raymond Hettinger	6da9078	2016-11-21 16:31:02 -0800	[diff] [blame]	150	data can be a sequence or iterator.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	151
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	152	The median is a robust measure of central location, and is less affected by
				153	the presence of outliers in your data. When the number of data points is
				154	odd, the middle data point is returned:
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	155
				156	.. doctest::
				157
				158	>>> median([1, 3, 5])
				159	3
				160
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	161	When the number of data points is even, the median is interpolated by taking
				162	the average of the two middle values:
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	163
				164	.. doctest::
				165
				166	>>> median([1, 3, 5, 7])
				167	4.0
				168
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	169	This is suited for when your data is discrete, and you don't mind that the
				170	median may not be an actual data point.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	171
Tal Einat	fdd6e0b	2018-06-25 14:04:01 +0300	[diff] [blame]	172	If your data is ordinal (supports order operations) but not numeric (doesn't
				173	support addition), you should use :func:`median_low` or :func:`median_high`
				174	instead.
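As a short sketch of the ordinal case, the following uses hypothetical letter grades (strings, which can be ordered but not averaged) with :func:`median_low` and :func:`median_high`:

```python
from statistics import median_high, median_low

# Hypothetical ordinal data: letter grades, compared alphabetically.
grades = ["A", "B", "B", "C", "D", "F"]

# With an even number of points, these return the lower and the
# upper of the two middle values, so no arithmetic is needed.
print(median_low(grades))
print(median_high(grades))
```

Calling :func:`median` on the same list would need to average the two middle strings, which strings do not support.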
				175
Berker Peksag	9c1dba2	2014-09-28 00:00:58 +0300	[diff] [blame]	176	.. seealso:: :func:`median_low`, :func:`median_high`, :func:`median_grouped`
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	177
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	178
				179	.. function:: median_low(data)
				180
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	181	Return the low median of numeric data. If data is empty,
Raymond Hettinger	6da9078	2016-11-21 16:31:02 -0800	[diff] [blame]	182	:exc:`StatisticsError` is raised. data can be a sequence or iterator.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	183
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	184	The low median is always a member of the data set. When the number of data
				185	points is odd, the middle value is returned. When it is even, the smaller of
				186	the two middle values is returned.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	187
				188	.. doctest::
				189
				190	>>> median_low([1, 3, 5])
				191	3
				192	>>> median_low([1, 3, 5, 7])
				193	3
				194
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	195	Use the low median when your data are discrete and you prefer the median to
				196	be an actual data point rather than interpolated.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	197
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	198
				199	.. function:: median_high(data)
				200
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	201	Return the high median of data. If data is empty, :exc:`StatisticsError`
Raymond Hettinger	6da9078	2016-11-21 16:31:02 -0800	[diff] [blame]	202	is raised. data can be a sequence or iterator.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	203
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	204	The high median is always a member of the data set. When the number of data
				205	points is odd, the middle value is returned. When it is even, the larger of
				206	the two middle values is returned.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	207
				208	.. doctest::
				209
				210	>>> median_high([1, 3, 5])
				211	3
				212	>>> median_high([1, 3, 5, 7])
				213	5
				214
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	215	Use the high median when your data are discrete and you prefer the median to
				216	be an actual data point rather than interpolated.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	217
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	218
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	219	.. function:: median_grouped(data, interval=1)
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	220
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	221	Return the median of grouped continuous data, calculated as the 50th
				222	percentile, using interpolation. If data is empty, :exc:`StatisticsError`
Raymond Hettinger	6da9078	2016-11-21 16:31:02 -0800	[diff] [blame]	223	is raised. data can be a sequence or iterator.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	224
				225	.. doctest::
				226
				227	>>> median_grouped([52, 52, 53, 54])
				228	52.5
				229
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	230	In the following example, the data are rounded, so that each value represents
Serhiy Storchaka	c7b1a0b	2016-11-26 13:43:28 +0200	[diff] [blame]	231	the midpoint of data classes, e.g. 1 is the midpoint of the class 0.5--1.5, 2
				232	is the midpoint of 1.5--2.5, 3 is the midpoint of 2.5--3.5, etc. With the data
				233	given, the middle value falls somewhere in the class 3.5--4.5, and
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	234	interpolation is used to estimate it:
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	235
				236	.. doctest::
				237
				238	>>> median_grouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5])
				239	3.7
				240
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	241	Optional argument interval represents the class interval, and defaults
				242	to 1. Changing the class interval naturally will change the interpolation:
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	243
				244	.. doctest::
				245
				246	>>> median_grouped([1, 3, 3, 5, 7], interval=1)
				247	3.25
				248	>>> median_grouped([1, 3, 3, 5, 7], interval=2)
				249	3.5
				250
				251	This function does not check whether the data points are at least
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	252	interval apart.
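The interpolation can be sketched by hand as follows. This is an illustrative reimplementation under the usual grouped-median formula, not the module's actual source; the helper name ``grouped_median_sketch`` is hypothetical, and it does no validation (for instance, it does not handle empty data):

```python
def grouped_median_sketch(data, interval=1):
    """Interpolated median of grouped data (illustrative only)."""
    values = sorted(data)
    n = len(values)
    x = values[n // 2]          # midpoint of the class containing the median
    lower = x - interval / 2    # lower limit of that class
    below = values.index(x)     # number of values in lower classes
    freq = values.count(x)      # number of values in the median class
    # Move into the class in proportion to how many of its values
    # are needed to reach the halfway point n / 2.
    return lower + interval * (n / 2 - below) / freq

print(grouped_median_sketch([1, 2, 2, 3, 4, 4, 4, 4, 4, 5]))
print(grouped_median_sketch([1, 3, 3, 5, 7], interval=2))
```

For the ten-value example, the median class is 3.5--4.5, four values lie below it and five fall inside it, giving ``3.5 + 1 * (5 - 4) / 5``.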
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	253
				254	.. impl-detail::
				255
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	256	Under some circumstances, :func:`median_grouped` may coerce data points to
				257	floats. This behaviour is likely to change in the future.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	258
				259	.. seealso::
				260
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	261	* "Statistics for the Behavioral Sciences", Frederick J Gravetter and
				262	Larry B Wallnau (8th Edition).
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	263
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	264	* The `SSMEDIAN
Georg Brandl	525d355	2014-10-29 10:26:56 +0100	[diff] [blame]	265	<https://help.gnome.org/users/gnumeric/stable/gnumeric.html#gnumeric-function-SSMEDIAN>`_
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	266	function in the Gnome Gnumeric spreadsheet, including `this discussion
				267	<https://mail.gnome.org/archives/gnumeric-list/2011-April/msg00018.html>`_.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	268
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	269
				270	.. function:: mode(data)
				271
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	272	Return the most common data point from discrete or nominal data. The mode
				273	(when it exists) is the most typical value, and is a robust measure of
				274	central location.
				275
				276	If data is empty, or if there is not exactly one most common value,
				277	:exc:`StatisticsError` is raised.
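When it is unclear whether the data has a single most common value, the counts can be inspected first. The helper below, ``most_common_values``, is a hypothetical sketch built on :class:`collections.Counter` and is not part of this module:

```python
from collections import Counter

def most_common_values(data):
    """Return every value tied for the highest count (hypothetical helper)."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

print(most_common_values([1, 1, 2, 3, 3, 3, 3, 4]))  # a single mode
print(most_common_values([1, 1, 2, 2, 3]))           # two values tie
```

A result with more than one element indicates that there is not exactly one most common value.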
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	278
				279	``mode`` assumes discrete data, and returns a single value. This is the
				280	standard treatment of the mode as commonly taught in schools:
				281
				282	.. doctest::
				283
				284	>>> mode([1, 1, 2, 3, 3, 3, 3, 4])
				285	3
				286
				287	The mode is unique in that it is the only statistic in this module which also applies
				288	to nominal (non-numeric) data:
				289
				290	.. doctest::
				291
				292	>>> mode(["red", "blue", "blue", "red", "green", "red", "red"])
				293	'red'
				294
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	295
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	296	.. function:: pstdev(data, mu=None)
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	297
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	298	Return the population standard deviation (the square root of the population
				299	variance). See :func:`pvariance` for arguments and other details.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	300
				301	.. doctest::
				302
				303	>>> pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])
				304	0.986893273527251
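The relationship to :func:`pvariance` can be checked with a short sketch; the comparison uses :func:`math.isclose` because the two results may differ by floating-point rounding:

```python
import math
from statistics import pstdev, pvariance

data = [1.5, 2.5, 2.5, 2.75, 3.25, 4.75]

# The population standard deviation is the square root of the
# population variance of the same data.
print(math.isclose(pstdev(data), math.sqrt(pvariance(data))))
```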
				305
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	306
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	307	.. function:: pvariance(data, mu=None)
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	308
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	309	Return the population variance of data, a non-empty iterable of real-valued
				310	numbers. Variance, or second moment about the mean, is a measure of the
				311	variability (spread or dispersion) of data. A large variance indicates that
				312	the data is spread out; a small variance indicates it is clustered closely
				313	around the mean.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	314
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	315	If the optional second argument mu is given, it should be the mean of
				316	data. If it is missing or ``None`` (the default), the mean is
Ned Deily	3586673	2013-10-19 12:10:01 -0700	[diff] [blame]	317	automatically calculated.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	318
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	319	Use this function to calculate the variance from the entire population. To
				320	estimate the variance from a sample, the :func:`variance` function is usually
				321	a better choice.
				322
				323	Raises :exc:`StatisticsError` if data is empty.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	324
				325	Examples:
				326
				327	.. doctest::
				328
				329	>>> data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]
				330	>>> pvariance(data)
				331	1.25
				332
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	333	If you have already calculated the mean of your data, you can pass it as the
				334	optional second argument mu to avoid recalculation:
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	335
				336	.. doctest::
				337
				338	>>> mu = mean(data)
				339	>>> pvariance(data, mu)
				340	1.25
				341
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	342	This function does not attempt to verify that you have passed the actual mean
				343	as mu. Using arbitrary values for mu may lead to invalid or impossible
				344	results.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	345
				346	Decimals and Fractions are supported:
				347
				348	.. doctest::
				349
				350	>>> from decimal import Decimal as D
				351	>>> pvariance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")])
				352	Decimal('24.815')
				353
				354	>>> from fractions import Fraction as F
				355	>>> pvariance([F(1, 4), F(5, 4), F(1, 2)])
				356	Fraction(13, 72)
				357
				358	.. note::
				359
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	360	When called with the entire population, this gives the population variance
				361	σ². When called on a sample instead, this is the biased sample variance
				362	s², also known as variance with N degrees of freedom.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	363
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	364	If you somehow know the true population mean μ, you may use this function
				365	to calculate the variance of a sample, giving the known population mean as
				366	the second argument. Provided the data points are representative
				367	(e.g. independent and identically distributed), the result will be an
				368	unbiased estimate of the population variance.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	369
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	370
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	371	.. function:: stdev(data, xbar=None)
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	372
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	373	Return the sample standard deviation (the square root of the sample
				374	variance). See :func:`variance` for arguments and other details.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	375
				376	.. doctest::
				377
				378	>>> stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])
				379	1.0810874155219827
				380
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	381
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	382	.. function:: variance(data, xbar=None)
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	383
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	384	Return the sample variance of data, an iterable of at least two real-valued
				385	numbers. Variance, or second moment about the mean, is a measure of the
				386	variability (spread or dispersion) of data. A large variance indicates that
				387	the data is spread out; a small variance indicates it is clustered closely
				388	around the mean.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	389
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	390	If the optional second argument xbar is given, it should be the mean of
				391	data. If it is missing or ``None`` (the default), the mean is
Ned Deily	3586673	2013-10-19 12:10:01 -0700	[diff] [blame]	392	automatically calculated.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	393
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	394	Use this function when your data is a sample from a population. To calculate
				395	the variance from the entire population, see :func:`pvariance`.
				396
				397	Raises :exc:`StatisticsError` if data has fewer than two values.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	398
				399	Examples:
				400
				401	.. doctest::
				402
				403	>>> data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
				404	>>> variance(data)
				405	1.3720238095238095
				406
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	407	If you have already calculated the mean of your data, you can pass it as the
				408	optional second argument xbar to avoid recalculation:
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	409
				410	.. doctest::
				411
				412	>>> m = mean(data)
				413	>>> variance(data, m)
				414	1.3720238095238095
				415
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	416	This function does not attempt to verify that you have passed the actual mean
				417	as xbar. Using arbitrary values for xbar can lead to invalid or
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	418	impossible results.
				419
				420	Decimal and Fraction values are supported:
				421
				422	.. doctest::
				423
				424	>>> from decimal import Decimal as D
				425	>>> variance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")])
				426	Decimal('31.01875')
				427
				428	>>> from fractions import Fraction as F
				429	>>> variance([F(1, 6), F(1, 2), F(5, 3)])
				430	Fraction(67, 108)
				431
				432	.. note::
				433
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	434	This is the sample variance s² with Bessel's correction, also known as
				435	variance with N-1 degrees of freedom. Provided that the data points are
				436	representative (e.g. independent and identically distributed), the result
				437	should be an unbiased estimate of the true population variance.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	438
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	439	If you somehow know the actual population mean μ, you should pass it to the
				440	:func:`pvariance` function as the mu parameter to get the variance of a
				441	sample.
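Bessel's correction can be seen by comparing the two variance functions on the same data. As a sketch (the variable names are chosen for illustration), rescaling the result of :func:`pvariance` by ``n / (n - 1)`` should reproduce :func:`variance` up to floating-point rounding:

```python
from statistics import pvariance, variance

data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
n = len(data)

# variance() divides the sum of squared deviations by n - 1,
# whereas pvariance() divides the same sum by n.
sample_var = variance(data)
rescaled = pvariance(data) * n / (n - 1)

print(sample_var)
print(rescaled)
```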
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	442
				443	Exceptions
				444	----------
				445
				446	A single exception is defined:
				447
Benjamin Peterson	4ea16e5	2013-10-20 17:52:54 -0400	[diff] [blame]	448	.. exception:: StatisticsError
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	449
Benjamin Peterson	44c3065	2013-10-20 17:52:09 -0400	[diff] [blame]	450	Subclass of :exc:`ValueError` for statistics-related exceptions.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	451
				452	..
				453	# This modeline must appear within the last ten lines of the file.
				454	kate: indent-width 3; remove-trailing-space on; replace-tabs on; encoding utf-8;