Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	1	:mod:`statistics` --- Mathematical statistics functions
				2	=======================================================
				3
				4	.. module:: statistics
				5	:synopsis: mathematical statistics functions
Terry Jan Reedy	fa089b9	2016-06-11 15:02:54 -0400	[diff] [blame]	6
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	7	.. moduleauthor:: Steven D'Aprano <steve+python@pearwood.info>
				8	.. sectionauthor:: Steven D'Aprano <steve+python@pearwood.info>
				9
				10	.. versionadded:: 3.4
				11
Terry Jan Reedy	fa089b9	2016-06-11 15:02:54 -0400	[diff] [blame]	12	Source code: :source:`Lib/statistics.py`
				13
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	14	.. testsetup:: *
				15
				16	from statistics import *
				17	__name__ = '<doctest>'
				18
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	19	--------------
				20
				21	This module provides functions for calculating mathematical statistics of
				22	numeric (:class:`Real`-valued) data.
				23
Nick Coghlan	73afe2a	2014-02-08 19:58:04 +1000	[diff] [blame]	24	.. note::
				25
				26	Unless explicitly noted otherwise, these functions support :class:`int`,
				27	:class:`float`, :class:`decimal.Decimal` and :class:`fractions.Fraction`.
Behaviour with other types (whether in the numeric tower or not) is
currently unsupported. Behaviour with mixed types is also undefined and
implementation-dependent. If your input data consists of mixed types,
				31	you may be able to use :func:`map` to ensure a consistent result, e.g.
				32	``map(float, input_data)``.
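
As a minimal sketch of the ``map`` suggestion above (the sample values are
made up for illustration), mixed inputs can be coerced to a single type
before they are passed to a statistics function:

```python
from fractions import Fraction
from statistics import mean

# Hypothetical mixed-type input: int, Fraction and float together.
input_data = [1, Fraction(1, 2), 2.5]

# Coerce every value to float so the result type is predictable.
result = mean(map(float, input_data))
print(result)
```

Coercing to ``float`` trades exactness for consistency; coercing everything
to :class:`fractions.Fraction` instead would be one way to keep exact
arithmetic.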
				33
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	34	Averages and measures of central location
				35	-----------------------------------------
				36
				37	These functions calculate an average or typical value from a population
				38	or sample.
				39
=======================  =============================================
:func:`mean`             Arithmetic mean ("average") of data.
:func:`geometric_mean`   Geometric mean of data.
:func:`harmonic_mean`    Harmonic mean of data.
:func:`median`           Median (middle value) of data.
:func:`median_low`       Low median of data.
:func:`median_high`      High median of data.
:func:`median_grouped`   Median, or 50th percentile, of grouped data.
:func:`mode`             Mode (most common value) of discrete data.
=======================  =============================================
				50
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	51	Measures of spread
				52	------------------
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	53
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	54	These functions calculate a measure of how much the population or sample
				55	tends to deviate from the typical or average values.
				56
=======================  =============================================
:func:`pstdev`           Population standard deviation of data.
:func:`pvariance`        Population variance of data.
:func:`stdev`            Sample standard deviation of data.
:func:`variance`         Sample variance of data.
=======================  =============================================
				63
				64
				65	Function details
				66	----------------
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	67
Georg Brandl	e051b55	2013-11-04 07:30:50 +0100	[diff] [blame]	68	Note: The functions do not require the data given to them to be sorted.
				69	However, for reading convenience, most of the examples show sorted sequences.
				70
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	71	.. function:: mean(data)
				72
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	73	Return the sample arithmetic mean of data, a sequence or iterator of
				74	real-valued numbers.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	75
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	76	The arithmetic mean is the sum of the data divided by the number of data
				77	points. It is commonly called "the average", although it is only one of many
				78	different mathematical averages. It is a measure of the central location of
				79	the data.
				80
				81	If data is empty, :exc:`StatisticsError` will be raised.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	82
				83	Some examples of use:
				84
				85	.. doctest::
				86
				87	>>> mean([1, 2, 3, 4, 4])
				88	2.8
				89	>>> mean([-1.0, 2.5, 3.25, 5.75])
				90	2.625
				91
				92	>>> from fractions import Fraction as F
				93	>>> mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)])
				94	Fraction(13, 21)
				95
				96	>>> from decimal import Decimal as D
				97	>>> mean([D("0.5"), D("0.75"), D("0.625"), D("0.375")])
				98	Decimal('0.5625')
				99
				100	.. note::
				101
Georg Brandl	a3fdcaa	2013-10-21 09:08:39 +0200	[diff] [blame]	102	The mean is strongly affected by outliers and is not a robust estimator
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	103	for central location: the mean is not necessarily a typical example of the
				104	data points. For more robust, although less efficient, measures of
				105	central location, see :func:`median` and :func:`mode`. (In this case,
				106	"efficient" refers to statistical efficiency rather than computational
				107	efficiency.)
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	108
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	109	The sample mean gives an unbiased estimate of the true population mean,
				110	which means that, taken on average over all the possible samples,
				111	``mean(sample)`` converges on the true mean of the entire population. If
				112	data represents the entire population rather than a sample, then
				113	``mean(data)`` is equivalent to calculating the true population mean μ.
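
To illustrate the sensitivity to outliers described in the note above, the
following sketch compares :func:`mean` and :func:`median` on a small data
set. The salary figures are hypothetical and chosen only to make the effect
visible:

```python
from statistics import mean, median

# Hypothetical salaries (in thousands); the last value is an outlier.
salaries = [30, 32, 35, 38, 400]

outlier_mean = mean(salaries)      # pulled far above most of the data points
outlier_median = median(salaries)  # stays at the middle value

print(outlier_mean, outlier_median)
```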
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	114
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	115
Steven D'Aprano	2287318	2016-08-24 02:34:25 +1000	[diff] [blame]	116	.. function:: geometric_mean(data)
				117
				118	Return the geometric mean of data, a sequence or iterator of
				119	real-valued numbers.
				120
				121	The geometric mean is the n-th root of the product of n data points.
				122	It is a type of average, a measure of the central location of the data.
				123
The geometric mean is appropriate when averaging quantities which
are multiplied together rather than added, for example growth rates.
Suppose an investment grows by 10% in the first year, falls by 5% in
the second, then grows by 12% in the third. What is the average rate
of growth over the three years?
				129
				130	.. doctest::
				131
				132	>>> geometric_mean([1.10, 0.95, 1.12])
				133	1.0538483123382172
				134
This gives an average growth of about 5.385% per year. Using the
arithmetic mean would give approximately 5.667%, which is too high.
				137
Zachary Ware	c019bd3	2016-08-23 13:23:31 -0500	[diff] [blame]	138	:exc:`StatisticsError` is raised if data is empty, or any
Steven D'Aprano	2287318	2016-08-24 02:34:25 +1000	[diff] [blame]	139	element is less than zero.
				140
Zachary Ware	c019bd3	2016-08-23 13:23:31 -0500	[diff] [blame]	141	.. versionadded:: 3.6
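
The definition above (the n-th root of the product of n data points) can be
sketched with logarithms, which avoids forming a very large or very small
intermediate product. The helper name ``geometric_mean_sketch`` is
hypothetical and this is not the module's implementation. It raises a plain
:exc:`ValueError`, and it also rejects zero, which may differ from the
behaviour of :func:`geometric_mean`:

```python
import math


def geometric_mean_sketch(data):
    """Illustrative only: n-th root of the product, computed via logarithms."""
    values = list(data)
    if not values:
        raise ValueError("geometric mean requires at least one data point")
    if any(x <= 0 for x in values):
        raise ValueError("this sketch only handles positive values")
    # exp(mean(log(x))) equals the n-th root of the product of the x values.
    return math.exp(math.fsum(math.log(x) for x in values) / len(values))


growth = geometric_mean_sketch([1.10, 0.95, 1.12])
print(round(growth, 4))
```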
				142
Steven D'Aprano	2287318	2016-08-24 02:34:25 +1000	[diff] [blame]	143
				144	.. function:: harmonic_mean(data)
				145
				146	Return the harmonic mean of data, a sequence or iterator of
				147	real-valued numbers.
				148
				149	The harmonic mean, sometimes called the subcontrary mean, is the
Zachary Ware	c019bd3	2016-08-23 13:23:31 -0500	[diff] [blame]	150	reciprocal of the arithmetic :func:`mean` of the reciprocals of the
Steven D'Aprano	2287318	2016-08-24 02:34:25 +1000	[diff] [blame]	151	data. For example, the harmonic mean of three values a, b and c
				152	will be equivalent to ``3/(1/a + 1/b + 1/c)``.
				153
The harmonic mean is a type of average, a measure of the central
location of the data. It is often appropriate when averaging quantities
which are rates or ratios, such as speeds. For example:
				157
				158	Suppose an investor purchases an equal value of shares in each of
				159	three companies, with P/E (price/earning) ratios of 2.5, 3 and 10.
				160	What is the average P/E ratio for the investor's portfolio?
				161
				162	.. doctest::
				163
				164	>>> harmonic_mean([2.5, 3, 10]) # For an equal investment portfolio.
				165	3.6
				166
				167	Using the arithmetic mean would give an average of about 5.167, which
				168	is too high.
				169
Zachary Ware	c019bd3	2016-08-23 13:23:31 -0500	[diff] [blame]	170	:exc:`StatisticsError` is raised if data is empty, or any element
Steven D'Aprano	2287318	2016-08-24 02:34:25 +1000	[diff] [blame]	171	is less than zero.
				172
Zachary Ware	c019bd3	2016-08-23 13:23:31 -0500	[diff] [blame]	173	.. versionadded:: 3.6
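
The speeds case mentioned above can be sketched as follows. The trip is
hypothetical: the same distance is driven once at 40 km/h and once at
60 km/h. The sketch checks :func:`harmonic_mean` against the reciprocal
formula given earlier:

```python
from statistics import harmonic_mean

# Hypothetical trip: equal distance out at 40 km/h and back at 60 km/h.
speeds = [40, 60]

# Reciprocal of the arithmetic mean of the reciprocals, written out by hand.
by_formula = len(speeds) / sum(1 / s for s in speeds)

average_speed = harmonic_mean(speeds)
print(average_speed)
```

The arithmetic mean of the two speeds is 50 km/h, which overstates the true
average because more time is spent travelling at the lower speed.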
				174
Steven D'Aprano	2287318	2016-08-24 02:34:25 +1000	[diff] [blame]	175
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	176	.. function:: median(data)
				177
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	178	Return the median (middle value) of numeric data, using the common "mean of
				179	middle two" method. If data is empty, :exc:`StatisticsError` is raised.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	180
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	181	The median is a robust measure of central location, and is less affected by
				182	the presence of outliers in your data. When the number of data points is
				183	odd, the middle data point is returned:
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	184
				185	.. doctest::
				186
				187	>>> median([1, 3, 5])
				188	3
				189
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	190	When the number of data points is even, the median is interpolated by taking
				191	the average of the two middle values:
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	192
				193	.. doctest::
				194
				195	>>> median([1, 3, 5, 7])
				196	4.0
				197
This is suitable when your data is discrete and you don't mind that the
median may not be an actual data point.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	200
Berker Peksag	9c1dba2	2014-09-28 00:00:58 +0300	[diff] [blame]	201	.. seealso:: :func:`median_low`, :func:`median_high`, :func:`median_grouped`
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	202
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	203
				204	.. function:: median_low(data)
				205
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	206	Return the low median of numeric data. If data is empty,
				207	:exc:`StatisticsError` is raised.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	208
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	209	The low median is always a member of the data set. When the number of data
				210	points is odd, the middle value is returned. When it is even, the smaller of
				211	the two middle values is returned.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	212
				213	.. doctest::
				214
				215	>>> median_low([1, 3, 5])
				216	3
				217	>>> median_low([1, 3, 5, 7])
				218	3
				219
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	220	Use the low median when your data are discrete and you prefer the median to
				221	be an actual data point rather than interpolated.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	222
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	223
				224	.. function:: median_high(data)
				225
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	226	Return the high median of data. If data is empty, :exc:`StatisticsError`
				227	is raised.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	228
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	229	The high median is always a member of the data set. When the number of data
				230	points is odd, the middle value is returned. When it is even, the larger of
				231	the two middle values is returned.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	232
				233	.. doctest::
				234
				235	>>> median_high([1, 3, 5])
				236	3
				237	>>> median_high([1, 3, 5, 7])
				238	5
				239
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	240	Use the high median when your data are discrete and you prefer the median to
				241	be an actual data point rather than interpolated.
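
As a brief side-by-side sketch (the input values are arbitrary), the three
median functions differ only when the number of data points is even. The
data need not be sorted:

```python
from statistics import median, median_high, median_low

readings = [7, 1, 5, 3]  # unsorted, with an even number of points

interpolated = median(readings)  # mean of the two middle values, 3 and 5
low = median_low(readings)       # smaller of the two middle values
high = median_high(readings)     # larger of the two middle values

print(interpolated, low, high)
```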
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	242
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	243
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	244	.. function:: median_grouped(data, interval=1)
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	245
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	246	Return the median of grouped continuous data, calculated as the 50th
				247	percentile, using interpolation. If data is empty, :exc:`StatisticsError`
				248	is raised.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	249
				250	.. doctest::
				251
				252	>>> median_grouped([52, 52, 53, 54])
				253	52.5
				254
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	255	In the following example, the data are rounded, so that each value represents
				256	the midpoint of data classes, e.g. 1 is the midpoint of the class 0.5-1.5, 2
				257	is the midpoint of 1.5-2.5, 3 is the midpoint of 2.5-3.5, etc. With the data
				258	given, the middle value falls somewhere in the class 3.5-4.5, and
				259	interpolation is used to estimate it:
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	260
				261	.. doctest::
				262
				263	>>> median_grouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5])
				264	3.7
				265
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	266	Optional argument interval represents the class interval, and defaults
				267	to 1. Changing the class interval naturally will change the interpolation:
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	268
				269	.. doctest::
				270
				271	>>> median_grouped([1, 3, 3, 5, 7], interval=1)
				272	3.25
				273	>>> median_grouped([1, 3, 3, 5, 7], interval=2)
				274	3.5
				275
				276	This function does not check whether the data points are at least
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	277	interval apart.
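
The interpolation can be sketched with the usual textbook formula for grouped
data, ``L + interval * (n/2 - cf) / f``. Here ``L`` is the lower limit of the
class containing the median, ``cf`` is the number of values below that class,
and ``f`` is the number of values in it. The helper name
``median_grouped_sketch`` is hypothetical. This is an illustration that
reproduces the examples above, not necessarily how the module computes the
result:

```python
from bisect import bisect_left, bisect_right


def median_grouped_sketch(data, interval=1):
    """Illustrative only: interpolated median of grouped data."""
    values = sorted(data)
    n = len(values)
    if n == 0:
        raise ValueError("no median for empty data")
    x = values[n // 2]              # midpoint of the class holding the median
    lower = x - interval / 2        # lower limit L of that class
    cf = bisect_left(values, x)     # values below the median class
    f = bisect_right(values, x) - cf  # values inside the median class
    return lower + interval * (n / 2 - cf) / f


print(median_grouped_sketch([1, 2, 2, 3, 4, 4, 4, 4, 4, 5]))
print(median_grouped_sketch([1, 3, 3, 5, 7], interval=2))
```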
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	278
				279	.. impl-detail::
				280
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	281	Under some circumstances, :func:`median_grouped` may coerce data points to
				282	floats. This behaviour is likely to change in the future.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	283
				284	.. seealso::
				285
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	286	* "Statistics for the Behavioral Sciences", Frederick J Gravetter and
				287	Larry B Wallnau (8th Edition).
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	288
Serhiy Storchaka	6dff020	2016-05-07 10:49:07 +0300	[diff] [blame]	289	* Calculating the `median <https://www.ualberta.ca/~opscan/median.html>`_.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	290
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	291	* The `SSMEDIAN
Georg Brandl	525d355	2014-10-29 10:26:56 +0100	[diff] [blame]	292	<https://help.gnome.org/users/gnumeric/stable/gnumeric.html#gnumeric-function-SSMEDIAN>`_
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	293	function in the Gnome Gnumeric spreadsheet, including `this discussion
				294	<https://mail.gnome.org/archives/gnumeric-list/2011-April/msg00018.html>`_.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	295
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	296
				297	.. function:: mode(data)
				298
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	299	Return the most common data point from discrete or nominal data. The mode
				300	(when it exists) is the most typical value, and is a robust measure of
				301	central location.
				302
				303	If data is empty, or if there is not exactly one most common value,
				304	:exc:`StatisticsError` is raised.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	305
				306	``mode`` assumes discrete data, and returns a single value. This is the
				307	standard treatment of the mode as commonly taught in schools:
				308
				309	.. doctest::
				310
				311	>>> mode([1, 1, 2, 3, 3, 3, 3, 4])
				312	3
				313
The mode is unique in that it is the only statistic in this module which
also applies to nominal (non-numeric) data:
				316
				317	.. doctest::
				318
				319	>>> mode(["red", "blue", "blue", "red", "green", "red", "red"])
				320	'red'
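
When the data may have several equally common values, one possible approach
is to count occurrences directly and collect every tied value. The helper
``all_modes`` below is hypothetical and is not part of this module:

```python
from collections import Counter


def all_modes(data):
    """Illustrative only: return every value that shares the highest count."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


tied = all_modes([1, 1, 2, 3, 3])
print(sorted(tied))
```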
				321
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	322
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	323	.. function:: pstdev(data, mu=None)
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	324
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	325	Return the population standard deviation (the square root of the population
				326	variance). See :func:`pvariance` for arguments and other details.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	327
				328	.. doctest::
				329
				330	>>> pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])
				331	0.986893273527251
				332
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	333
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	334	.. function:: pvariance(data, mu=None)
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	335
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	336	Return the population variance of data, a non-empty iterable of real-valued
				337	numbers. Variance, or second moment about the mean, is a measure of the
				338	variability (spread or dispersion) of data. A large variance indicates that
				339	the data is spread out; a small variance indicates it is clustered closely
				340	around the mean.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	341
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	342	If the optional second argument mu is given, it should be the mean of
				343	data. If it is missing or ``None`` (the default), the mean is
Ned Deily	3586673	2013-10-19 12:10:01 -0700	[diff] [blame]	344	automatically calculated.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	345
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	346	Use this function to calculate the variance from the entire population. To
				347	estimate the variance from a sample, the :func:`variance` function is usually
				348	a better choice.
				349
				350	Raises :exc:`StatisticsError` if data is empty.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	351
				352	Examples:
				353
				354	.. doctest::
				355
				356	>>> data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]
				357	>>> pvariance(data)
				358	1.25
				359
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	360	If you have already calculated the mean of your data, you can pass it as the
				361	optional second argument mu to avoid recalculation:
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	362
				363	.. doctest::
				364
				365	>>> mu = mean(data)
				366	>>> pvariance(data, mu)
				367	1.25
				368
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	369	This function does not attempt to verify that you have passed the actual mean
				370	as mu. Using arbitrary values for mu may lead to invalid or impossible
				371	results.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	372
				373	Decimals and Fractions are supported:
				374
				375	.. doctest::
				376
				377	>>> from decimal import Decimal as D
				378	>>> pvariance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")])
				379	Decimal('24.815')
				380
				381	>>> from fractions import Fraction as F
				382	>>> pvariance([F(1, 4), F(5, 4), F(1, 2)])
				383	Fraction(13, 72)
				384
				385	.. note::
				386
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	387	When called with the entire population, this gives the population variance
				388	σ². When called on a sample instead, this is the biased sample variance
				389	s², also known as variance with N degrees of freedom.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	390
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	391	If you somehow know the true population mean μ, you may use this function
				392	to calculate the variance of a sample, giving the known population mean as
				393	the second argument. Provided the data points are representative
				394	(e.g. independent and identically distributed), the result will be an
				395	unbiased estimate of the population variance.
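
As a check on the definition, the population variance is the mean of the
squared deviations from the mean, with N as the divisor. The sketch below
recomputes the earlier example by hand. The variable name ``by_hand`` is
illustrative only:

```python
from statistics import mean, pvariance

data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]
mu = mean(data)

# Mean of squared deviations from mu, dividing by N (the population size).
by_hand = sum((x - mu) ** 2 for x in data) / len(data)

print(by_hand, pvariance(data, mu))
```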
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	396
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	397
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	398	.. function:: stdev(data, xbar=None)
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	399
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	400	Return the sample standard deviation (the square root of the sample
				401	variance). See :func:`variance` for arguments and other details.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	402
				403	.. doctest::
				404
				405	>>> stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])
				406	1.0810874155219827
				407
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	408
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	409	.. function:: variance(data, xbar=None)
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	410
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	411	Return the sample variance of data, an iterable of at least two real-valued
				412	numbers. Variance, or second moment about the mean, is a measure of the
				413	variability (spread or dispersion) of data. A large variance indicates that
				414	the data is spread out; a small variance indicates it is clustered closely
				415	around the mean.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	416
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	417	If the optional second argument xbar is given, it should be the mean of
				418	data. If it is missing or ``None`` (the default), the mean is
Ned Deily	3586673	2013-10-19 12:10:01 -0700	[diff] [blame]	419	automatically calculated.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	420
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	421	Use this function when your data is a sample from a population. To calculate
				422	the variance from the entire population, see :func:`pvariance`.
				423
				424	Raises :exc:`StatisticsError` if data has fewer than two values.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	425
				426	Examples:
				427
				428	.. doctest::
				429
				430	>>> data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
				431	>>> variance(data)
				432	1.3720238095238095
				433
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	434	If you have already calculated the mean of your data, you can pass it as the
				435	optional second argument xbar to avoid recalculation:
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	436
				437	.. doctest::
				438
				439	>>> m = mean(data)
				440	>>> variance(data, m)
				441	1.3720238095238095
				442
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	443	This function does not attempt to verify that you have passed the actual mean
				444	as xbar. Using arbitrary values for xbar can lead to invalid or
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	445	impossible results.
				446
				447	Decimal and Fraction values are supported:
				448
				449	.. doctest::
				450
				451	>>> from decimal import Decimal as D
				452	>>> variance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")])
				453	Decimal('31.01875')
				454
				455	>>> from fractions import Fraction as F
				456	>>> variance([F(1, 6), F(1, 2), F(5, 3)])
				457	Fraction(67, 108)
				458
				459	.. note::
				460
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	461	This is the sample variance s² with Bessel's correction, also known as
				462	variance with N-1 degrees of freedom. Provided that the data points are
				463	representative (e.g. independent and identically distributed), the result
				464	should be an unbiased estimate of the true population variance.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	465
Georg Brandl	eb2aeec	2013-10-21 08:57:26 +0200	[diff] [blame]	466	If you somehow know the actual population mean μ you should pass it to the
				467	:func:`pvariance` function as the mu parameter to get the variance of a
				468	sample.
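
Bessel's correction amounts to dividing by N-1 instead of N. As a small
sketch using the example data from above, rescaling the population variance
by ``n / (n - 1)`` should reproduce the sample variance, and the square root
of the sample variance should match :func:`stdev`, up to floating-point
rounding:

```python
import math
from statistics import pvariance, stdev, variance

data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
n = len(data)

# Divisor N (population) rescaled to divisor N-1 (sample).
rescaled = pvariance(data) * n / (n - 1)
sample_sd = math.sqrt(variance(data))

print(math.isclose(rescaled, variance(data)))
print(math.isclose(sample_sd, stdev(data)))
```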
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	469
				470	Exceptions
				471	----------
				472
				473	A single exception is defined:
				474
Benjamin Peterson	4ea16e5	2013-10-20 17:52:54 -0400	[diff] [blame]	475	.. exception:: StatisticsError
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	476
Benjamin Peterson	44c3065	2013-10-20 17:52:09 -0400	[diff] [blame]	477	Subclass of :exc:`ValueError` for statistics-related exceptions.
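
Because :exc:`StatisticsError` is a subclass of :exc:`ValueError`, it can be
caught under either name. A minimal sketch follows. The helper ``safe_mean``
and its ``default`` parameter are hypothetical:

```python
from statistics import StatisticsError, mean


def safe_mean(data, default=None):
    """Illustrative only: return a default instead of raising on empty data."""
    try:
        return mean(data)
    except StatisticsError:
        return default


print(safe_mean([]))         # falls back to the default
print(safe_mean([1, 2, 3]))
```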
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	478
				479	..
   # These modelines must appear within the last ten lines of the file.
				481	kate: indent-width 3; remove-trailing-space on; replace-tabs on; encoding utf-8;