Blame - Doc/library/statistics.rst - platform/external/python/cpython3

Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	1	:mod:`statistics` --- Mathematical statistics functions
				2	=======================================================
				3
				4	.. module:: statistics
				5	:synopsis: mathematical statistics functions
				6	.. moduleauthor:: Steven D'Aprano <steve+python@pearwood.info>
				7	.. sectionauthor:: Steven D'Aprano <steve+python@pearwood.info>
				8
				9	.. versionadded:: 3.4
				10
				11	.. testsetup:: *
				12
				13	from statistics import *
				14	__name__ = '<doctest>'
				15
				16	Source code: :source:`Lib/statistics.py`
				17
				18	--------------
				19
				20	This module provides functions for calculating mathematical statistics of
				21	numeric (:class:`Real`-valued) data.
				22
				23	Averages and measures of central location
				24	-----------------------------------------
				25
				26	These functions calculate an average or typical value from a population
				27	or sample.
				28
				29	======================= =============================================
				30	:func:`mean`            Arithmetic mean ("average") of data.
				31	:func:`median`          Median (middle value) of data.
				32	:func:`median_low`      Low median of data.
				33	:func:`median_high`     High median of data.
				34	:func:`median_grouped`  Median, or 50th percentile, of grouped data.
				35	:func:`mode`            Mode (most common value) of discrete data.
				36	======================= =============================================
				37
				38	:func:`mean`
				39	~~~~~~~~~~~~
				40
				41	The :func:`mean` function calculates the arithmetic mean, commonly known
				42	as the average, of its iterable argument:
				43
				44	.. function:: mean(data)
				45
				46	Return the sample arithmetic mean of data, a sequence or iterator
				47	of real-valued numbers.
				48
				49	The arithmetic mean is the sum of the data divided by the number of
				50	data points. It is commonly called "the average", although it is only
				51	one of many different mathematical averages. It is a measure of the
				52	central location of the data.
				53
				54	Some examples of use:
				55
				56	.. doctest::
				57
				58	>>> mean([1, 2, 3, 4, 4])
				59	2.8
				60	>>> mean([-1.0, 2.5, 3.25, 5.75])
				61	2.625
				62
				63	>>> from fractions import Fraction as F
				64	>>> mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)])
				65	Fraction(13, 21)
				66
				67	>>> from decimal import Decimal as D
				68	>>> mean([D("0.5"), D("0.75"), D("0.625"), D("0.375")])
				69	Decimal('0.5625')
				70
				71	.. note::
				72
				73	The mean is strongly affected by outliers and is not a robust
				74	estimator for central location: the mean is not necessarily a
				75	typical example of the data points. For more robust, although less
				76	efficient, measures of central location, see :func:`median` and
				77	:func:`mode`. (In this case, "efficient" refers to statistical
				78	efficiency rather than computational efficiency.)
				79
				80	The sample mean gives an unbiased estimate of the true population
				81	mean, which means that, taken on average over all the possible
				82	samples, ``mean(sample)`` converges on the true mean of the entire
				83	population. If data represents the entire population rather than
				84	a sample, then ``mean(data)`` is equivalent to calculating the true
				85	population mean μ.
				86
				87	If ``data`` is empty, :exc:`StatisticsError` will be raised.
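As a rough illustration of the definition above (the sum of the data divided by the number of data points), the following sketch defines a hypothetical helper, ``naive_mean``, which is not part of this module and makes no attempt to match the care the real :func:`mean` takes with mixed numeric types. It also shows how a single outlier shifts the mean while leaving the median where it was:

```python
from statistics import median

def naive_mean(data):
    """Hypothetical helper: sum of the data divided by the number of points."""
    values = list(data)  # accept iterators as well as sequences
    if not values:
        raise ValueError("naive_mean requires at least one data point")
    return sum(values) / len(values)

typical = [1, 2, 3, 4, 4]
with_outlier = [1, 2, 3, 4, 400]  # same data, one extreme value

mean_typical = naive_mean(typical)
mean_outlier = naive_mean(with_outlier)      # pulled far from the bulk of the data
median_typical = median(typical)
median_outlier = median(with_outlier)        # unchanged by the outlier
```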
				88
				89	:func:`median`
				90	~~~~~~~~~~~~~~
				91
				92	The :func:`median` function calculates the median, or middle, data point,
				93	using the common "mean of middle two" method.
				94
				95	.. seealso::
				96
				97	:func:`median_low`
				98
				99	:func:`median_high`
				100
				101	:func:`median_grouped`
				102
				103	.. function:: median(data)
				104
				105	Return the median (middle value) of numeric data.
				106
				107	The median is a robust measure of central location, and is less affected
				108	by the presence of outliers in your data. When the number of data points
				109	is odd, the middle data point is returned:
				110
				111	.. doctest::
				112
				113	>>> median([1, 3, 5])
				114	3
				115
				116	When the number of data points is even, the median is interpolated by
				117	taking the average of the two middle values:
				118
				119	.. doctest::
				120
				121	>>> median([1, 3, 5, 7])
				122	4.0
				123
				124	This is best suited to discrete data, when you don't mind that
				125	the median may not be an actual data point.
				126
				127	If data is empty, :exc:`StatisticsError` is raised.
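The "mean of middle two" method described above can be sketched as follows. ``sketch_median`` is a hypothetical name used only for illustration; it is an assumption about how such a function could be written, not the module's actual source:

```python
def sketch_median(data):
    """Hypothetical sketch of the 'mean of middle two' median."""
    values = sorted(data)  # the median is defined on ordered data
    n = len(values)
    if n == 0:
        raise ValueError("no median for empty data")
    mid = n // 2
    if n % 2 == 1:
        # Odd number of points: the middle value itself.
        return values[mid]
    # Even number of points: average the two middle values.
    return (values[mid - 1] + values[mid]) / 2
```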
				128
				129	:func:`median_low`
				130	~~~~~~~~~~~~~~~~~~
				131
				132	The :func:`median_low` function calculates the low median without
				133	interpolation.
				134
				135	.. function:: median_low(data)
				136
				137	Return the low median of numeric data.
				138
				139	The low median is always a member of the data set. When the number
				140	of data points is odd, the middle value is returned. When it is
				141	even, the smaller of the two middle values is returned.
				142
				143	.. doctest::
				144
				145	>>> median_low([1, 3, 5])
				146	3
				147	>>> median_low([1, 3, 5, 7])
				148	3
				149
				150	Use the low median when your data are discrete and you prefer the median
				151	to be an actual data point rather than interpolated.
				152
				153	If data is empty, :exc:`StatisticsError` is raised.
				154
				155	:func:`median_high`
				156	~~~~~~~~~~~~~~~~~~~
				157
				158	The :func:`median_high` function calculates the high median without
				159	interpolation.
				160
				161	.. function:: median_high(data)
				162
				163	Return the high median of data.
				164
				165	The high median is always a member of the data set. When the number of
				166	data points is odd, the middle value is returned. When it is even, the
				167	larger of the two middle values is returned.
				168
				169	.. doctest::
				170
				171	>>> median_high([1, 3, 5])
				172	3
				173	>>> median_high([1, 3, 5, 7])
				174	5
				175
				176	Use the high median when your data are discrete and you prefer the median
				177	to be an actual data point rather than interpolated.
				178
				179	If data is empty, :exc:`StatisticsError` is raised.
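To make the difference between the three median functions concrete, this short comparison runs them on the same even-length data set. The variable names are illustrative only:

```python
from statistics import median, median_high, median_low

data = [1, 3, 5, 7]  # even length, so the two middle values are 3 and 5

results = {
    "low": median_low(data),        # smaller of the two middle values
    "interpolated": median(data),   # average of the two middle values
    "high": median_high(data),      # larger of the two middle values
}
```

Only the interpolated median can fall between data points; the low and high medians are always members of the data set.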
				180
				181	:func:`median_grouped`
				182	~~~~~~~~~~~~~~~~~~~~~~
				183
				184	The :func:`median_grouped` function calculates the median of grouped data
				185	as the 50th percentile, using interpolation.
				186
				187	.. function:: median_grouped(data [, interval])
				188
				189	Return the median of grouped continuous data, calculated as the
				190	50th percentile.
				191
				192	.. doctest::
				193
				194	>>> median_grouped([52, 52, 53, 54])
				195	52.5
				196
				197	In the following example, the data are rounded, so that each value
				198	represents the midpoint of data classes, e.g. 1 is the midpoint of the
				199	class 0.5-1.5, 2 is the midpoint of 1.5-2.5, 3 is the midpoint of
				200	2.5-3.5, etc. With the data given, the middle value falls somewhere in
				201	the class 3.5-4.5, and interpolation is used to estimate it:
				202
				203	.. doctest::
				204
				205	>>> median_grouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5])
				206	3.7
				207
				208	Optional argument ``interval`` represents the class interval, and
				209	defaults to 1. Changing the class interval naturally will change the
				210	interpolation:
				211
				212	.. doctest::
				213
				214	>>> median_grouped([1, 3, 3, 5, 7], interval=1)
				215	3.25
				216	>>> median_grouped([1, 3, 3, 5, 7], interval=2)
				217	3.5
				218
				219	This function does not check whether the data points are at least
				220	``interval`` apart.
				221
				222	.. impl-detail::
				223
				224	Under some circumstances, :func:`median_grouped` may coerce data
				225	points to floats. This behaviour is likely to change in the future.
				226
				227	.. seealso::
				228
				229	* "Statistics for the Behavioral Sciences", Frederick J Gravetter
				230	and Larry B Wallnau (8th Edition).
				231
				232	* Calculating the `median <http://www.ualberta.ca/~opscan/median.html>`_.
				233
				234	* The `SSMEDIAN <https://projects.gnome.org/gnumeric/doc/gnumeric-function-SSMEDIAN.shtml>`_
				235	function in the Gnome Gnumeric spreadsheet, including
				236	`this discussion <https://mail.gnome.org/archives/gnumeric-list/2011-April/msg00018.html>`_.
				237
				238	If data is empty, :exc:`StatisticsError` is raised.
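The interpolation can be sketched with the usual grouped-median formula, ``L + interval * (n/2 - cf) / f``, where ``L`` is the lower limit of the class containing the median, ``cf`` is the number of data points below that class, and ``f`` is the number of points in it. The function below is a hypothetical reconstruction under that assumption, written for illustration rather than copied from the module:

```python
def sketch_median_grouped(data, interval=1):
    """Hypothetical sketch of grouped-median interpolation."""
    values = sorted(data)
    n = len(values)
    if n == 0:
        raise ValueError("no median for empty data")
    x = values[n // 2]                       # midpoint of the median class
    lower = x - interval / 2                 # lower limit of that class
    below = sum(1 for v in values if v < x)  # cumulative frequency below the class
    within = values.count(x)                 # frequency of the median class
    return lower + interval * (n / 2 - below) / within
```

For ``[1, 2, 2, 3, 4, 4, 4, 4, 4, 5]`` the median class is 3.5-4.5, four points lie below it and five inside it, so the estimate is ``3.5 + (5 - 4) / 5``.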
				239
				240	:func:`mode`
				241	~~~~~~~~~~~~
				242
				243	The :func:`mode` function calculates the mode, or most common element, of
				244	discrete or nominal data. The mode (when it exists) is the most typical
				245	value, and is a robust measure of central location.
				246
				247	.. function:: mode(data)
				248
				249	Return the most common data point from discrete or nominal data.
				250
				251	``mode`` assumes discrete data, and returns a single value. This is the
				252	standard treatment of the mode as commonly taught in schools:
				253
				254	.. doctest::
				255
				256	>>> mode([1, 1, 2, 3, 3, 3, 3, 4])
				257	3
				258
				259	The mode is unique in that it is the only statistic in this module that
				260	also applies to nominal (non-numeric) data:
				261
				262	.. doctest::
				263
				264	>>> mode(["red", "blue", "blue", "red", "green", "red", "red"])
				265	'red'
				266
				267	If data is empty, or if there is not exactly one most common value,
				268	:exc:`StatisticsError` is raised.
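The behaviour described above, a single most common value or an error, can be sketched with :class:`collections.Counter`. ``sketch_mode`` is a hypothetical helper; it raises a plain :exc:`ValueError` so that the example does not depend on how any particular version of the module treats ties:

```python
from collections import Counter

def sketch_mode(data):
    """Hypothetical sketch: return the unique most common value."""
    counts = Counter(data).most_common()  # (value, count) pairs, highest count first
    if not counts:
        raise ValueError("no mode for empty data")
    top_value, top_count = counts[0]
    # A second value with the same count means the mode is not unique.
    if len(counts) > 1 and counts[1][1] == top_count:
        raise ValueError("no unique mode")
    return top_value
```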
				269
				270	Measures of spread
				271	------------------
				272
				273	These functions calculate a measure of how much the population or sample
				274	tends to deviate from the typical or average values.
				275
				276	======================= =============================================
				277	:func:`pstdev`          Population standard deviation of data.
				278	:func:`pvariance`       Population variance of data.
				279	:func:`stdev`           Sample standard deviation of data.
				280	:func:`variance`        Sample variance of data.
				281	======================= =============================================
				282
				283	:func:`pstdev`
				284	~~~~~~~~~~~~~~
				285
				286	The :func:`pstdev` function calculates the standard deviation of a
				287	population. The standard deviation is equivalent to the square root of
				288	the variance.
				289
				290	.. function:: pstdev(data [, mu])
				291
				292	Return the square root of the population variance. See :func:`pvariance`
				293	for arguments and other details.
				294
				295	.. doctest::
				296
				297	>>> pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])
				298	0.986893273527251
				299
				300	:func:`pvariance`
				301	~~~~~~~~~~~~~~~~~
				302
				303	The :func:`pvariance` function calculates the variance of a population.
				304	Variance, or second moment about the mean, is a measure of the variability
				305	(spread or dispersion) of data. A large variance indicates that the data is
				306	spread out; a small variance indicates it is clustered closely around the
				307	mean.
				308
				309	.. function:: pvariance(data [, mu])
				310
				311	Return the population variance of data, a non-empty iterable of
				312	real-valued numbers.
				313
				314	If the optional second argument mu is given, it should be the mean
				315	of data. If it is missing or None (the default), the mean is
Ned Deily	3586673	2013-10-19 12:10:01 -0700	[diff] [blame]	316	automatically calculated.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	317
				318	Use this function to calculate the variance from the entire population.
				319	To estimate the variance from a sample, the :func:`variance` function is
				320	usually a better choice.
				321
				322	Examples:
				323
				324	.. doctest::
				325
				326	>>> data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]
				327	>>> pvariance(data)
				328	1.25
				329
				330	If you have already calculated the mean of your data, you can pass
				331	it as the optional second argument mu to avoid recalculation:
				332
				333	.. doctest::
				334
				335	>>> mu = mean(data)
				336	>>> pvariance(data, mu)
				337	1.25
				338
				339	This function does not attempt to verify that you have passed the actual
				340	mean as mu. Using arbitrary values for mu may lead to invalid or
				341	impossible results.
				342
				343	Decimals and Fractions are supported:
				344
				345	.. doctest::
				346
				347	>>> from decimal import Decimal as D
				348	>>> pvariance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")])
				349	Decimal('24.815')
				350
				351	>>> from fractions import Fraction as F
				352	>>> pvariance([F(1, 4), F(5, 4), F(1, 2)])
				353	Fraction(13, 72)
				354
				355	.. note::
				356
				357	When called with the entire population, this gives the population
				358	variance σ². When called on a sample instead, this is the biased
				359	sample variance s², also known as variance with N degrees of freedom.
				360
				361	If you somehow know the true population mean μ, you may use this
				362	function to calculate the variance of a sample, giving the known
				363	population mean as the second argument. Provided the data points are
				364	representative (e.g. independent and identically distributed), the
				365	result will be an unbiased estimate of the population variance.
				366
				367	Raises :exc:`StatisticsError` if data is empty.
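As a minimal sketch of the calculation, the population variance is the mean of the squared deviations from ``mu``. The helper ``sketch_pvariance`` below is hypothetical and ignores the numerical safeguards a real implementation may apply:

```python
from fractions import Fraction
from statistics import mean

def sketch_pvariance(data, mu=None):
    """Hypothetical sketch: mean squared deviation from mu (divides by n)."""
    values = list(data)
    if not values:
        raise ValueError("pvariance requires at least one data point")
    if mu is None:
        mu = mean(values)  # compute the mean only when it was not supplied
    return sum((x - mu) ** 2 for x in values) / len(values)

floats = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]
fractions_ = [Fraction(1, 4), Fraction(5, 4), Fraction(1, 2)]

float_result = sketch_pvariance(floats)
fraction_result = sketch_pvariance(fractions_)  # stays an exact Fraction
```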
				368
				369	:func:`stdev`
				370	~~~~~~~~~~~~~~
				371
				372	The :func:`stdev` function calculates the standard deviation of a sample.
				373	The standard deviation is equivalent to the square root of the variance.
				374
				375	.. function:: stdev(data [, xbar])
				376
				377	Return the square root of the sample variance. See :func:`variance` for
				378	arguments and other details.
				379
				380	.. doctest::
				381
				382	>>> stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])
				383	1.0810874155219827
				384
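The relationship between the standard deviation and the variance, for both the population and the sample versions, can be checked directly. This is only an illustrative comparison, and the variable names are arbitrary:

```python
import math
from statistics import pstdev, pvariance, stdev, variance

data = [1.5, 2.5, 2.5, 2.75, 3.25, 4.75]

population_sd = math.sqrt(pvariance(data))  # variance divides by n
sample_sd = math.sqrt(variance(data))       # variance divides by n - 1
```

Because the sample variance divides by a smaller number, the sample standard deviation is the larger of the two for the same data.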
				385	:func:`variance`
				386	~~~~~~~~~~~~~~~~~
				387
				388	The :func:`variance` function calculates the variance of a sample. Variance,
				389	or second moment about the mean, is a measure of the variability (spread or
				390	dispersion) of data. A large variance indicates that the data is spread out;
				391	a small variance indicates it is clustered closely around the mean.
				392
				393	.. function:: variance(data [, xbar])
				394
				395	Return the sample variance of data, an iterable of at least two
				396	real-valued numbers.
				397
				398	If the optional second argument xbar is given, it should be the mean
				399	of data. If it is missing or None (the default), the mean is
Ned Deily	3586673	2013-10-19 12:10:01 -0700	[diff] [blame]	400	automatically calculated.
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	401
				402	Use this function when your data is a sample from a population. To
				403	calculate the variance from the entire population, see :func:`pvariance`.
				404
				405	Examples:
				406
				407	.. doctest::
				408
				409	>>> data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
				410	>>> variance(data)
				411	1.3720238095238095
				412
				413	If you have already calculated the mean of your data, you can pass
				414	it as the optional second argument xbar to avoid recalculation:
				415
				416	.. doctest::
				417
				418	>>> m = mean(data)
				419	>>> variance(data, m)
				420	1.3720238095238095
				421
				422	This function does not attempt to verify that you have passed the actual
				423	mean as xbar. Using arbitrary values for xbar can lead to invalid or
				424	impossible results.
				425
				426	Decimal and Fraction values are supported:
				427
				428	.. doctest::
				429
				430	>>> from decimal import Decimal as D
				431	>>> variance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")])
				432	Decimal('31.01875')
				433
				434	>>> from fractions import Fraction as F
				435	>>> variance([F(1, 6), F(1, 2), F(5, 3)])
				436	Fraction(67, 108)
				437
				438	.. note::
				439
				440	This is the sample variance s² with Bessel's correction, also known
				441	as variance with N-1 degrees of freedom. Provided that the data
				442	points are representative (e.g. independent and identically
				443	distributed), the result should be an unbiased estimate of the true
				444	population variance.
				445
				446	If you somehow know the actual population mean μ you should pass it
				447	to the :func:`pvariance` function as the mu parameter to get
				448	the variance of a sample.
				449
				450	Raises :exc:`StatisticsError` if data has fewer than two values.
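Bessel's correction amounts to dividing the sum of squared deviations by ``n - 1`` instead of ``n``. The following sketch, using the hypothetical name ``sketch_variance``, illustrates that under the assumption that ``xbar`` is the sample mean:

```python
from fractions import Fraction
from statistics import mean

def sketch_variance(data, xbar=None):
    """Hypothetical sketch: sample variance with Bessel's correction."""
    values = list(data)
    n = len(values)
    if n < 2:
        raise ValueError("variance requires at least two data points")
    if xbar is None:
        xbar = mean(values)
    # Divide by n - 1 rather than n to correct the bias of the estimate.
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

sample = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
exact = sketch_variance([Fraction(1, 6), Fraction(1, 2), Fraction(5, 3)])
approx = sketch_variance(sample)
```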
				451
				452	Exceptions
				453	----------
				454
				455	A single exception is defined:
				456
Benjamin Peterson	44c3065	2013-10-20 17:52:09 -0400	[diff] [blame^]	457	.. exception:: StatisticsError
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	458
Benjamin Peterson	44c3065	2013-10-20 17:52:09 -0400	[diff] [blame^]	459	Subclass of :exc:`ValueError` for statistics-related exceptions.
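Because :exc:`StatisticsError` is a subclass of :exc:`ValueError`, callers can catch either one. The wrapper below, ``safe_mean``, is a hypothetical convenience function shown only to illustrate handling empty data:

```python
from statistics import StatisticsError, mean

def safe_mean(data, default=None):
    """Hypothetical wrapper: return a default instead of raising on empty data."""
    try:
        return mean(data)
    except StatisticsError:
        # Raised by mean() when data is empty.
        return default

empty_result = safe_mean([])
normal_result = safe_mean([1, 2, 3])
```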
Larry Hastings	f5e987b	2013-10-19 11:50:09 -0700	[diff] [blame]	460
				461	..
				462	# This modeline must appear within the last ten lines of the file.
				463	kate: indent-width 3; remove-trailing-space on; replace-tabs on; encoding utf-8;