:mod:`statistics` --- Mathematical statistics functions
=======================================================

.. module:: statistics
   :synopsis: mathematical statistics functions

.. moduleauthor:: Steven D'Aprano <steve+python@pearwood.info>
.. sectionauthor:: Steven D'Aprano <steve+python@pearwood.info>

.. versionadded:: 3.4

**Source code:** :source:`Lib/statistics.py`

.. testsetup:: *

   from statistics import *
   __name__ = '<doctest>'

--------------

This module provides functions for calculating mathematical statistics of
numeric (:class:`~numbers.Real`-valued) data.

The module is not intended to be a competitor to third-party libraries such
as `NumPy <https://numpy.org>`_, `SciPy <https://www.scipy.org/>`_, or
proprietary full-featured statistics packages aimed at professional
statisticians such as Minitab, SAS and Matlab. It is aimed at the level of
graphing and scientific calculators.

Unless explicitly noted, these functions support :class:`int`,
:class:`float`, :class:`~decimal.Decimal` and :class:`~fractions.Fraction`.
Behaviour with other types (whether in the numeric tower or not) is
currently unsupported. Collections with a mix of types are also undefined
and implementation-dependent. If your input data consists of mixed types,
you may be able to use :func:`map` to ensure a consistent result, for
example: ``map(float, input_data)``.
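
The following is a minimal sketch of that conversion, assuming the input
happens to mix :class:`int`, :class:`float` and :class:`~fractions.Fraction`
values. The names ``mixed`` and ``uniform`` are illustrative only:

.. code-block:: python

   from fractions import Fraction
   from statistics import mean

   # Hypothetical input mixing three numeric types.
   mixed = [1, 2.5, Fraction(7, 2)]

   # Convert every value to float first so the result type is predictable.
   uniform = list(map(float, mixed))
   result = mean(uniform)

Converting to :class:`float` trades exactness for consistency; converting
everything to :class:`~fractions.Fraction` instead would be another option.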

Averages and measures of central location
-----------------------------------------

These functions calculate an average or typical value from a population
or sample.

=======================  ===============================================================
:func:`mean`             Arithmetic mean ("average") of data.
:func:`fmean`            Fast, floating point arithmetic mean.
:func:`geometric_mean`   Geometric mean of data.
:func:`harmonic_mean`    Harmonic mean of data.
:func:`median`           Median (middle value) of data.
:func:`median_low`       Low median of data.
:func:`median_high`      High median of data.
:func:`median_grouped`   Median, or 50th percentile, of grouped data.
:func:`mode`             Single mode (most common value) of discrete or nominal data.
:func:`multimode`        List of modes (most common values) of discrete or nominal data.
:func:`quantiles`        Divide data into intervals with equal probability.
=======================  ===============================================================

Measures of spread
------------------

These functions calculate a measure of how much the population or sample
tends to deviate from the typical or average values.

=======================  =============================================
:func:`pstdev`           Population standard deviation of data.
:func:`pvariance`        Population variance of data.
:func:`stdev`            Sample standard deviation of data.
:func:`variance`         Sample variance of data.
=======================  =============================================


Function details
----------------

Note: The functions do not require the data given to them to be sorted.
However, for reading convenience, most of the examples show sorted sequences.

.. function:: mean(data)

   Return the sample arithmetic mean of *data* which can be a sequence or iterable.

   The arithmetic mean is the sum of the data divided by the number of data
   points. It is commonly called "the average", although it is only one of many
   different mathematical averages. It is a measure of the central location of
   the data.

   If *data* is empty, :exc:`StatisticsError` will be raised.

   Some examples of use:

   .. doctest::

      >>> mean([1, 2, 3, 4, 4])
      2.8
      >>> mean([-1.0, 2.5, 3.25, 5.75])
      2.625

      >>> from fractions import Fraction as F
      >>> mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)])
      Fraction(13, 21)

      >>> from decimal import Decimal as D
      >>> mean([D("0.5"), D("0.75"), D("0.625"), D("0.375")])
      Decimal('0.5625')

   .. note::

      The mean is strongly affected by outliers and is not a robust estimator
      for central location: the mean is not necessarily a typical example of
      the data points. For more robust measures of central location, see
      :func:`median` and :func:`mode`.

      The sample mean gives an unbiased estimate of the true population mean,
      so that when taken on average over all the possible samples,
      ``mean(sample)`` converges on the true mean of the entire population. If
      *data* represents the entire population rather than a sample, then
      ``mean(data)`` is equivalent to calculating the true population mean μ.
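
   As a rough illustration of the sensitivity to outliers noted above (the
   figures are made up for this sketch), a single extreme value moves the
   mean substantially while leaving the median unchanged:

   .. code-block:: python

      from statistics import mean, median

      # Illustrative data: five typical values, then the same data with
      # the largest value replaced by an outlier.
      typical = [30, 32, 35, 36, 37]
      with_outlier = [30, 32, 35, 36, 370]

      mean_shift = mean(with_outlier) - mean(typical)        # large
      median_shift = median(with_outlier) - median(typical)  # zero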


.. function:: fmean(data)

   Convert *data* to floats and compute the arithmetic mean.

   This runs faster than the :func:`mean` function and it always returns a
   :class:`float`. The *data* may be a sequence or iterable. If the input
   dataset is empty, raises a :exc:`StatisticsError`.

   .. doctest::

      >>> fmean([3.5, 4.0, 5.25])
      4.25
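
   The difference in return type shows up with exact inputs. This sketch
   assumes :class:`~fractions.Fraction` data: :func:`mean` keeps the exact
   type, while :func:`fmean` converts to :class:`float`:

   .. code-block:: python

      from fractions import Fraction
      from statistics import fmean, mean

      data = [Fraction(1, 3), Fraction(2, 3)]

      exact = mean(data)     # stays a Fraction
      approx = fmean(data)   # always a float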

   .. versionadded:: 3.8


.. function:: geometric_mean(data)

   Convert *data* to floats and compute the geometric mean.

   The geometric mean indicates the central tendency or typical value of the
   *data* using the product of the values (as opposed to the arithmetic mean
   which uses their sum).

   Raises a :exc:`StatisticsError` if the input dataset is empty,
   if it contains a zero, or if it contains a negative value.
   The *data* may be a sequence or iterable.

   No special efforts are made to achieve exact results.
   (However, this may change in the future.)

   .. doctest::

      >>> round(geometric_mean([54, 24, 36]), 1)
      36.0
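
   One common use is averaging growth factors. In this sketch (the yearly
   factors are invented for illustration), the geometric mean is the constant
   factor that would produce the same overall growth:

   .. code-block:: python

      import math
      from statistics import geometric_mean

      # Hypothetical yearly growth factors: +10%, +50%, -20%.
      factors = [1.10, 1.50, 0.80]

      average_factor = geometric_mean(factors)

      # Applying the average factor once per year reproduces the total growth,
      # up to floating point rounding.
      total_growth = math.prod(factors)
      same_growth = math.isclose(average_factor ** len(factors), total_growth)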

   .. versionadded:: 3.8


.. function:: harmonic_mean(data)

   Return the harmonic mean of *data*, a sequence or iterable of
   real-valued numbers.

   The harmonic mean, sometimes called the subcontrary mean, is the
   reciprocal of the arithmetic :func:`mean` of the reciprocals of the
   data. For example, the harmonic mean of three values *a*, *b* and *c*
   will be equivalent to ``3/(1/a + 1/b + 1/c)``. If one of the values
   is zero, the result will be zero.

   The harmonic mean is a type of average, a measure of the central
   location of the data. It is often appropriate when averaging
   rates or ratios, for example speeds.

   Suppose a car travels 10 km at 40 km/hr, then another 10 km at 60 km/hr.
   What is the average speed?

   .. doctest::

      >>> harmonic_mean([40, 60])
      48.0

   Suppose an investor purchases an equal value of shares in each of
   three companies, with P/E (price/earning) ratios of 2.5, 3 and 10.
   What is the average P/E ratio for the investor's portfolio?

   .. doctest::

      >>> harmonic_mean([2.5, 3, 10])  # For an equal investment portfolio.
      3.6
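
   The car example above can be checked from first principles, since average
   speed is total distance divided by total time. A small sketch (variable
   names are illustrative):

   .. code-block:: python

      import math
      from statistics import harmonic_mean

      distance_km = 10      # length of each leg
      speeds = [40, 60]     # km/hr on each leg

      total_distance = distance_km * len(speeds)
      total_time = sum(distance_km / s for s in speeds)   # hours
      average_speed = total_distance / total_time

      matches = math.isclose(average_speed, harmonic_mean(speeds))

   The arithmetic mean of the two speeds, 50 km/hr, would overstate the
   result because the car spends more time on the slower leg.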

   :exc:`StatisticsError` is raised if *data* is empty, or any element
   is less than zero.

   The current algorithm has an early-out when it encounters a zero
   in the input. This means that the subsequent inputs are not tested
   for validity. (This behavior may change in the future.)

   .. versionadded:: 3.6


.. function:: median(data)

   Return the median (middle value) of numeric data, using the common "mean of
   middle two" method. If *data* is empty, :exc:`StatisticsError` is raised.
   *data* can be a sequence or iterable.

   The median is a robust measure of central location and is less affected by
   the presence of outliers. When the number of data points is odd, the
   middle data point is returned:

   .. doctest::

      >>> median([1, 3, 5])
      3

   When the number of data points is even, the median is interpolated by taking
   the average of the two middle values:

   .. doctest::

      >>> median([1, 3, 5, 7])
      4.0

   This is suited for when your data is discrete, and you don't mind that the
   median may not be an actual data point.

   If the data is ordinal (supports order operations) but not numeric (doesn't
   support addition), consider using :func:`median_low` or :func:`median_high`
   instead.
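
   As a sketch of the ordinal case, assume letter grades stored as strings
   (hypothetical data). They can be sorted but not averaged, so only the low
   and high medians are meaningful:

   .. code-block:: python

      from statistics import median_high, median_low

      # Grades compare alphabetically but cannot be added together.
      grades = ["A", "B", "B", "C", "D", "D"]

      low = median_low(grades)    # 'B'
      high = median_high(grades)  # 'C'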

.. function:: median_low(data)

   Return the low median of numeric data. If *data* is empty,
   :exc:`StatisticsError` is raised. *data* can be a sequence or iterable.

   The low median is always a member of the data set. When the number of data
   points is odd, the middle value is returned. When it is even, the smaller of
   the two middle values is returned.

   .. doctest::

      >>> median_low([1, 3, 5])
      3
      >>> median_low([1, 3, 5, 7])
      3

   Use the low median when your data are discrete and you prefer the median to
   be an actual data point rather than interpolated.


.. function:: median_high(data)

   Return the high median of data. If *data* is empty, :exc:`StatisticsError`
   is raised. *data* can be a sequence or iterable.

   The high median is always a member of the data set. When the number of data
   points is odd, the middle value is returned. When it is even, the larger of
   the two middle values is returned.

   .. doctest::

      >>> median_high([1, 3, 5])
      3
      >>> median_high([1, 3, 5, 7])
      5

   Use the high median when your data are discrete and you prefer the median to
   be an actual data point rather than interpolated.


.. function:: median_grouped(data, interval=1)

   Return the median of grouped continuous data, calculated as the 50th
   percentile, using interpolation. If *data* is empty, :exc:`StatisticsError`
   is raised. *data* can be a sequence or iterable.

   .. doctest::

      >>> median_grouped([52, 52, 53, 54])
      52.5

   In the following example, the data are rounded, so that each value represents
   the midpoint of data classes, e.g. 1 is the midpoint of the class 0.5--1.5, 2
   is the midpoint of 1.5--2.5, 3 is the midpoint of 2.5--3.5, etc. With the data
   given, the middle value falls somewhere in the class 3.5--4.5, and
   interpolation is used to estimate it:

   .. doctest::

      >>> median_grouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5])
      3.7

   Optional argument *interval* represents the class interval, and defaults
   to 1. Changing the class interval naturally will change the interpolation:

   .. doctest::

      >>> median_grouped([1, 3, 3, 5, 7], interval=1)
      3.25
      >>> median_grouped([1, 3, 3, 5, 7], interval=2)
      3.5

   This function does not check whether the data points are at least
   *interval* apart.
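
   The interpolation can be sketched with the usual textbook formula
   ``L + interval * (n/2 - cf) / f``, where ``L`` is the lower limit of the
   median class, ``cf`` the number of values below that class and ``f`` the
   number of values in it. The helper ``grouped_median_sketch`` below is
   hypothetical and written only for illustration; the library's actual
   implementation may differ in its details:

   .. code-block:: python

      import math
      from statistics import median_grouped

      def grouped_median_sketch(data, interval=1):
          """Illustrative re-derivation, not the library implementation."""
          data = sorted(data)
          n = len(data)
          x = data[n // 2]                      # midpoint of the median class
          lower = x - interval / 2              # lower limit of that class
          cf = sum(1 for v in data if v < x)    # values below the class
          f = sum(1 for v in data if v == x)    # values inside the class
          return lower + interval * (n / 2 - cf) / f

      data = [1, 2, 2, 3, 4, 4, 4, 4, 4, 5]
      agrees = math.isclose(grouped_median_sketch(data), median_grouped(data))

   For the data shown, the median class is 3.5--4.5 with four values below
   it and five values inside it, giving ``3.5 + (5 - 4) / 5``.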
Larry Hastingsf5e987b2013-10-19 11:50:09 -0700305
306 .. impl-detail::
307
Georg Brandleb2aeec2013-10-21 08:57:26 +0200308 Under some circumstances, :func:`median_grouped` may coerce data points to
309 floats. This behaviour is likely to change in the future.
Larry Hastingsf5e987b2013-10-19 11:50:09 -0700310
311 .. seealso::
312
Georg Brandleb2aeec2013-10-21 08:57:26 +0200313 * "Statistics for the Behavioral Sciences", Frederick J Gravetter and
314 Larry B Wallnau (8th Edition).
Larry Hastingsf5e987b2013-10-19 11:50:09 -0700315
Georg Brandleb2aeec2013-10-21 08:57:26 +0200316 * The `SSMEDIAN
Georg Brandl525d3552014-10-29 10:26:56 +0100317 <https://help.gnome.org/users/gnumeric/stable/gnumeric.html#gnumeric-function-SSMEDIAN>`_
Georg Brandleb2aeec2013-10-21 08:57:26 +0200318 function in the Gnome Gnumeric spreadsheet, including `this discussion
319 <https://mail.gnome.org/archives/gnumeric-list/2011-April/msg00018.html>`_.
Larry Hastingsf5e987b2013-10-19 11:50:09 -0700320
Larry Hastingsf5e987b2013-10-19 11:50:09 -0700321
322.. function:: mode(data)
323
Raymond Hettingerfc06a192019-03-12 00:43:27 -0700324 Return the single most common data point from discrete or nominal *data*.
325 The mode (when it exists) is the most typical value and serves as a
326 measure of central location.
Georg Brandleb2aeec2013-10-21 08:57:26 +0200327
Raymond Hettingere4810b22019-09-05 00:18:47 -0700328 If there are multiple modes with the same frequency, returns the first one
329 encountered in the *data*. If the smallest or largest of those is
330 desired instead, use ``min(multimode(data))`` or ``max(multimode(data))``.
331 If the input *data* is empty, :exc:`StatisticsError` is raised.
Larry Hastingsf5e987b2013-10-19 11:50:09 -0700332
Raymond Hettingerd8c93aa2019-09-05 23:02:27 -0700333 ``mode`` assumes discrete data and returns a single value. This is the
Larry Hastingsf5e987b2013-10-19 11:50:09 -0700334 standard treatment of the mode as commonly taught in schools:
335
336 .. doctest::
337
338 >>> mode([1, 1, 2, 3, 3, 3, 3, 4])
339 3
340
Raymond Hettingere4810b22019-09-05 00:18:47 -0700341 The mode is unique in that it is the only statistic in this package that
342 also applies to nominal (non-numeric) data:
Larry Hastingsf5e987b2013-10-19 11:50:09 -0700343
344 .. doctest::
345
346 >>> mode(["red", "blue", "blue", "red", "green", "red", "red"])
347 'red'
348
Raymond Hettingerfc06a192019-03-12 00:43:27 -0700349 .. versionchanged:: 3.8
350 Now handles multimodal datasets by returning the first mode encountered.
351 Formerly, it raised :exc:`StatisticsError` when more than one mode was
352 found.


.. function:: multimode(data)

   Return a list of the most frequently occurring values in the order they
   were first encountered in the *data*. Will return more than one result if
   there are multiple modes or an empty list if the *data* is empty:

   .. doctest::

      >>> multimode('aabbbbccddddeeffffgg')
      ['b', 'd', 'f']
      >>> multimode('')
      []
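
   The behaviour described here can be approximated with
   :class:`collections.Counter`. The helper ``multimode_sketch`` is a
   hypothetical illustration and not necessarily how the library implements
   it:

   .. code-block:: python

      from collections import Counter
      from statistics import mode, multimode

      def multimode_sketch(data):
          """Illustrative only: most frequent values in first-seen order."""
          counts = Counter(data)
          if not counts:
              return []
          top = max(counts.values())
          # Counter keeps keys in the order they were first encountered.
          return [value for value, count in counts.items() if count == top]

      data = [1, 1, 2, 2, 3]
      modes = multimode_sketch(data)   # [1, 2]
      first = mode(data)               # 1, the first mode encountered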

   .. versionadded:: 3.8


.. function:: pstdev(data, mu=None)

   Return the population standard deviation (the square root of the population
   variance). See :func:`pvariance` for arguments and other details.

   .. doctest::

      >>> pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])
      0.986893273527251


.. function:: pvariance(data, mu=None)

   Return the population variance of *data*, a non-empty sequence or iterable
   of real-valued numbers. Variance, or second moment about the mean, is a
   measure of the variability (spread or dispersion) of data. A large
   variance indicates that the data is spread out; a small variance indicates
   it is clustered closely around the mean.

   If the optional second argument *mu* is given, it is typically the mean of
   the *data*. It can also be used to compute the second moment around a
   point that is not the mean. If it is missing or ``None`` (the default),
   the arithmetic mean is automatically calculated.

   Use this function to calculate the variance from the entire population. To
   estimate the variance from a sample, the :func:`variance` function is usually
   a better choice.

   Raises :exc:`StatisticsError` if *data* is empty.

   Examples:

   .. doctest::

      >>> data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]
      >>> pvariance(data)
      1.25

   If you have already calculated the mean of your data, you can pass it as the
   optional second argument *mu* to avoid recalculation:

   .. doctest::

      >>> mu = mean(data)
      >>> pvariance(data, mu)
      1.25

   Decimals and Fractions are supported:

   .. doctest::

      >>> from decimal import Decimal as D
      >>> pvariance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")])
      Decimal('24.815')

      >>> from fractions import Fraction as F
      >>> pvariance([F(1, 4), F(5, 4), F(1, 2)])
      Fraction(13, 72)

   .. note::

      When called with the entire population, this gives the population variance
      σ². When called on a sample instead, this is the biased sample variance
      s², also known as variance with N degrees of freedom.

      If you somehow know the true population mean μ, you may use this
      function to calculate the variance of a sample, giving the known
      population mean as the second argument. Provided the data points are a
      random sample of the population, the result will be an unbiased estimate
      of the population variance.
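
   Written out, the population variance is the mean of the squared deviations
   from *mu*. A short sketch reusing the data from the first example (the
   name ``by_hand`` is illustrative):

   .. code-block:: python

      import math
      from statistics import mean, pvariance

      data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]
      mu = mean(data)

      # Sum of squared deviations divided by N, the number of data points.
      by_hand = sum((x - mu) ** 2 for x in data) / len(data)

      agrees = math.isclose(by_hand, pvariance(data, mu))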


.. function:: stdev(data, xbar=None)

   Return the sample standard deviation (the square root of the sample
   variance). See :func:`variance` for arguments and other details.

   .. doctest::

      >>> stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])
      1.0810874155219827


.. function:: variance(data, xbar=None)

   Return the sample variance of *data*, an iterable of at least two real-valued
   numbers. Variance, or second moment about the mean, is a measure of the
   variability (spread or dispersion) of data. A large variance indicates that
   the data is spread out; a small variance indicates it is clustered closely
   around the mean.

   If the optional second argument *xbar* is given, it should be the mean of
   *data*. If it is missing or ``None`` (the default), the mean is
   automatically calculated.

   Use this function when your data is a sample from a population. To calculate
   the variance from the entire population, see :func:`pvariance`.

   Raises :exc:`StatisticsError` if *data* has fewer than two values.

   Examples:

   .. doctest::

      >>> data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
      >>> variance(data)
      1.3720238095238095

   If you have already calculated the mean of your data, you can pass it as the
   optional second argument *xbar* to avoid recalculation:

   .. doctest::

      >>> m = mean(data)
      >>> variance(data, m)
      1.3720238095238095

   This function does not attempt to verify that you have passed the actual mean
   as *xbar*. Using arbitrary values for *xbar* can lead to invalid or
   impossible results.

   Decimal and Fraction values are supported:

   .. doctest::

      >>> from decimal import Decimal as D
      >>> variance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")])
      Decimal('31.01875')

      >>> from fractions import Fraction as F
      >>> variance([F(1, 6), F(1, 2), F(5, 3)])
      Fraction(67, 108)

   .. note::

      This is the sample variance s² with Bessel's correction, also known as
      variance with N-1 degrees of freedom. Provided that the data points are
      representative (e.g. independent and identically distributed), the result
      should be an unbiased estimate of the true population variance.

      If you somehow know the actual population mean μ you should pass it to the
      :func:`pvariance` function as the *mu* parameter to get the variance of a
      sample.
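
   Bessel's correction amounts to dividing by N-1 rather than N, so the two
   variance functions differ by a factor of ``n / (n - 1)`` on the same data.
   A brief sketch using the data from the first example:

   .. code-block:: python

      import math
      from statistics import pvariance, variance

      data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
      n = len(data)

      # Rescale the N-denominator result to an (N-1)-denominator result.
      rescaled = pvariance(data) * n / (n - 1)

      agrees = math.isclose(rescaled, variance(data))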

.. function:: quantiles(data, *, n=4, method='exclusive')

   Divide *data* into *n* continuous intervals with equal probability.
   Returns a list of ``n - 1`` cut points separating the intervals.

   Set *n* to 4 for quartiles (the default). Set *n* to 10 for deciles. Set
   *n* to 100 for percentiles, which gives the 99 cut points that separate
   *data* into 100 equal-sized groups. Raises :exc:`StatisticsError` if *n*
   is not at least 1.

   The *data* can be any iterable containing sample data. For meaningful
   results, the number of data points in *data* should be larger than *n*.
   Raises :exc:`StatisticsError` if there are not at least two data points.

   The cut points are linearly interpolated from the
   two nearest data points. For example, if a cut point falls one-third
   of the distance between two sample values, ``100`` and ``112``, the
   cut point will evaluate to ``104``.

   The *method* for computing quantiles can be varied depending on
   whether the *data* includes or excludes the lowest and
   highest possible values from the population.

   The default *method* is "exclusive" and is used for data sampled from
   a population that can have more extreme values than found in the
   samples. The portion of the population falling below the *i-th* of
   *m* sorted data points is computed as ``i / (m + 1)``. Given nine
   sample values, the method sorts them and assigns the following
   percentiles: 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%.

   Setting the *method* to "inclusive" is appropriate for describing
   population data or for samples that are known to include the most extreme
   values from the population. The minimum value in *data* is treated as the
   0th percentile and the maximum value is treated as the 100th percentile.
   The portion of the population falling below the *i-th* of *m* sorted
   data points is computed as ``(i - 1) / (m - 1)``. Given 11 sample
   values, the method sorts them and assigns the following percentiles:
   0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%.

   .. doctest::

      # Decile cut points for empirically sampled data
      >>> data = [105, 129, 87, 86, 111, 111, 89, 81, 108, 92, 110,
      ...         100, 75, 105, 103, 109, 76, 119, 99, 91, 103, 129,
      ...         106, 101, 84, 111, 74, 87, 86, 103, 103, 106, 86,
      ...         111, 75, 87, 102, 121, 111, 88, 89, 101, 106, 95,
      ...         103, 107, 101, 81, 109, 104]
      >>> [round(q, 1) for q in quantiles(data, n=10)]
      [81.0, 86.2, 89.0, 99.4, 102.5, 103.6, 106.0, 109.8, 111.0]
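
   The two percentile assignments described above can be seen directly with
   small inputs. In this sketch (the lists are chosen purely for
   illustration), nine values under the "exclusive" method and eleven values
   under the "inclusive" method both place data points exactly on the decile
   positions, so no interpolation between neighbours is needed:

   .. code-block:: python

      from statistics import quantiles

      nine = list(range(1, 10))      # 1, 2, ..., 9
      eleven = list(range(0, 11))    # 0, 1, ..., 10

      # Exclusive: the i-th of 9 points sits at i / 10, i.e. 10% ... 90%.
      exclusive_deciles = quantiles(nine, n=10)

      # Inclusive: the i-th of 11 points sits at (i - 1) / 10, i.e. 0% ... 100%.
      inclusive_deciles = quantiles(eleven, n=10, method='inclusive')

   In both cases the nine cut points equal the values 1 through 9.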

   .. versionadded:: 3.8


Exceptions
----------

A single exception is defined:

.. exception:: StatisticsError

   Subclass of :exc:`ValueError` for statistics-related exceptions.


:class:`NormalDist` objects
---------------------------

:class:`NormalDist` is a tool for creating and manipulating normal
distributions of a `random variable
<http://www.stat.yale.edu/Courses/1997-98/101/ranvar.htm>`_. It is a
class that treats the mean and standard deviation of data
measurements as a single entity.

Normal distributions arise from the `Central Limit Theorem
<https://en.wikipedia.org/wiki/Central_limit_theorem>`_ and have a wide range
of applications in statistics.

.. class:: NormalDist(mu=0.0, sigma=1.0)

   Returns a new *NormalDist* object where *mu* represents the `arithmetic
   mean <https://en.wikipedia.org/wiki/Arithmetic_mean>`_ and *sigma*
   represents the `standard deviation
   <https://en.wikipedia.org/wiki/Standard_deviation>`_.

   If *sigma* is negative, raises :exc:`StatisticsError`.

   .. attribute:: mean

      A read-only property for the `arithmetic mean
      <https://en.wikipedia.org/wiki/Arithmetic_mean>`_ of a normal
      distribution.

   .. attribute:: median

      A read-only property for the `median
      <https://en.wikipedia.org/wiki/Median>`_ of a normal
      distribution.

   .. attribute:: mode

      A read-only property for the `mode
      <https://en.wikipedia.org/wiki/Mode_(statistics)>`_ of a normal
      distribution.

   .. attribute:: stdev

      A read-only property for the `standard deviation
      <https://en.wikipedia.org/wiki/Standard_deviation>`_ of a normal
      distribution.

   .. attribute:: variance

      A read-only property for the `variance
      <https://en.wikipedia.org/wiki/Variance>`_ of a normal
      distribution. Equal to the square of the standard deviation.

   .. classmethod:: NormalDist.from_samples(data)

      Makes a normal distribution instance with *mu* and *sigma* parameters
      estimated from the *data* using :func:`fmean` and :func:`stdev`.

      The *data* can be any :term:`iterable` and should consist of values
      that can be converted to type :class:`float`. If *data* does not
      contain at least two elements, raises :exc:`StatisticsError` because it
      takes at least one point to estimate a central value and at least two
      points to estimate dispersion.
Raymond Hettinger11c79532019-02-23 14:44:07 -0800640
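      As a rough sketch (the sample values below are made up for
      illustration), the estimated parameters can be compared against
      :func:`fmean` and :func:`stdev`, and a single data point is rejected:

      .. code-block:: python

         from math import isclose
         from statistics import NormalDist, StatisticsError, fmean, stdev

         data = [2.5, 3.1, 2.1, 2.4, 2.7, 3.5]   # hypothetical measurements
         dist = NormalDist.from_samples(data)

         # The estimated parameters agree with fmean() and stdev().
         assert isclose(dist.mean, fmean(data))
         assert isclose(dist.stdev, stdev(data))

         # One point cannot estimate dispersion, so this raises.
         try:
             NormalDist.from_samples([2.5])
         except StatisticsError:
             too_few_points = True
         else:
             too_few_points = False
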
   .. method:: NormalDist.samples(n, *, seed=None)

      Generates *n* random samples for a given mean and standard deviation.
      Returns a :class:`list` of :class:`float` values.

      If *seed* is given, creates a new instance of the underlying random
      number generator. This is useful for creating reproducible results,
      even in a multi-threading context.

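      A minimal sketch of the reproducibility this gives; the seed value
      ``42`` and the distribution parameters are arbitrary choices for
      illustration:

      .. code-block:: python

         from statistics import NormalDist

         dist = NormalDist(mu=10, sigma=2.5)

         # The same seed yields the same samples on every call.
         first = dist.samples(5, seed=42)
         second = dist.samples(5, seed=42)
         assert first == second

         # Without a seed the result is not reproducible from run to run.
         unseeded = dist.samples(5)
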
   .. method:: NormalDist.pdf(x)

      Using a `probability density function (pdf)
      <https://en.wikipedia.org/wiki/Probability_density_function>`_, compute
      the relative likelihood that a random variable *X* will be near the
      given value *x*. Mathematically, it is the limit of the ratio ``P(x <=
      X < x+dx) / dx`` as *dx* approaches zero.

      The relative likelihood is computed as the probability of a sample
      occurring in a narrow range divided by the width of the range (hence
      the word "density"). Since the likelihood is relative to other points,
      its value can be greater than ``1.0``.

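      The following sketch illustrates both points for a narrow distribution.
      The parameters and the step size ``dx`` are assumptions picked for the
      example, and the finite difference only approximates the limit:

      .. code-block:: python

         from math import isclose
         from statistics import NormalDist

         narrow = NormalDist(mu=0.0, sigma=0.1)

         # A density is not a probability, so it may exceed 1.0.
         peak = narrow.pdf(0.0)
         assert peak > 1.0

         # Approximate the limit of P(x <= X < x+dx) / dx with a small dx.
         dx = 1e-6
         approx = (narrow.cdf(0.0 + dx) - narrow.cdf(0.0)) / dx
         assert isclose(peak, approx, rel_tol=1e-4)
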
   .. method:: NormalDist.cdf(x)

      Using a `cumulative distribution function (cdf)
      <https://en.wikipedia.org/wiki/Cumulative_distribution_function>`_,
      compute the probability that a random variable *X* will be less than or
      equal to *x*. Mathematically, it is written ``P(X <= x)``.

   .. method:: NormalDist.inv_cdf(p)

      Compute the inverse cumulative distribution function, also known as the
      `quantile function <https://en.wikipedia.org/wiki/Quantile_function>`_
      or the `percent-point
      <https://www.statisticshowto.datasciencecentral.com/inverse-distribution-function/>`_
      function. Mathematically, it is written ``x : P(X <= x) = p``.

      Finds the value *x* of the random variable *X* such that the
      probability of the variable being less than or equal to that value
      equals the given probability *p*.

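      As an illustration, :meth:`cdf` and :meth:`inv_cdf` undo each other up
      to floating-point error. The probabilities used here are arbitrary, and
      the sketch assumes *p* lies strictly between 0 and 1:

      .. code-block:: python

         from math import isclose
         from statistics import NormalDist

         z = NormalDist()          # standard normal: mu=0.0, sigma=1.0

         # The point below which 97.5% of the probability lies.
         x = z.inv_cdf(0.975)
         assert isclose(z.cdf(x), 0.975)

         # Going the other way round gives back the starting value.
         assert isclose(z.inv_cdf(z.cdf(1.5)), 1.5)
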
   .. method:: NormalDist.overlap(other)

      Measures the agreement between two normal probability distributions.
      Returns a value between 0.0 and 1.0 giving `the overlapping area for
      the two probability density functions
      <https://www.rasch.org/rmt/rmt101r.htm>`_.

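      A short sketch of how the result behaves; the distribution parameters
      are chosen only for illustration:

      .. code-block:: python

         from math import isclose
         from statistics import NormalDist

         n1 = NormalDist(2.4, 1.6)
         n2 = NormalDist(3.2, 2.0)

         # Partially overlapping curves give a value strictly inside (0, 1),
         # and the order of the operands does not matter.
         agreement = n1.overlap(n2)
         assert 0.0 < agreement < 1.0
         assert isclose(agreement, n2.overlap(n1))

         # A distribution agrees completely with itself.
         assert isclose(n1.overlap(n1), 1.0)

         # Curves that are far apart share almost no area.
         far_apart = NormalDist(0, 1).overlap(NormalDist(100, 1))
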
   .. method:: NormalDist.quantiles(n=4)

      Divide the normal distribution into *n* continuous intervals with
      equal probability. Returns a list of (n - 1) cut points separating
      the intervals.

      Set *n* to 4 for quartiles (the default). Set *n* to 10 for deciles.
      Set *n* to 100 for percentiles, which gives the 99 cut points that
      separate the normal distribution into 100 equal-sized groups.

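      One way to read the cut points is as values of :meth:`inv_cdf` at
      evenly spaced probabilities. The sketch below checks that reading for
      quartiles; the distribution parameters are an arbitrary example:

      .. code-block:: python

         from math import isclose
         from statistics import NormalDist

         dist = NormalDist(mu=100, sigma=15)
         n = 4
         cuts = dist.quantiles(n)

         # n intervals are separated by n - 1 cut points.
         assert len(cuts) == n - 1

         # Each cut point sits at probability i/n.
         expected = [dist.inv_cdf(i / n) for i in range(1, n)]
         assert all(isclose(c, e) for c, e in zip(cuts, expected))

         # For a symmetric distribution the middle cut is the mean.
         assert isclose(cuts[1], dist.mean)
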
   Instances of :class:`NormalDist` support addition, subtraction,
   multiplication and division by a constant. These operations
   are used for translation and scaling. For example:

   .. doctest::

      >>> temperature_february = NormalDist(5, 2.5)    # Celsius
      >>> temperature_february * (9/5) + 32            # Fahrenheit
      NormalDist(mu=41.0, sigma=4.5)

   Dividing a constant by an instance of :class:`NormalDist` is not supported
   because the result wouldn't be normally distributed.

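   As a sketch of both statements, the conversion can be reversed with
   subtraction and division by a constant, while putting the constant on the
   left of the division fails. The :exc:`TypeError` shown is what current
   CPython raises for the unsupported operand order:

   .. code-block:: python

      from math import isclose
      from statistics import NormalDist

      fahrenheit = NormalDist(41.0, 4.5)

      # Translate, then scale, to get back to Celsius.
      celsius = (fahrenheit - 32) / (9 / 5)
      assert isclose(celsius.mean, 5.0)
      assert isclose(celsius.stdev, 2.5)

      # constant / NormalDist is not defined.
      try:
          1 / fahrenheit
      except TypeError:
          reverse_division_supported = False
      else:
          reverse_division_supported = True
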
   Since normal distributions arise from additive effects of independent
   variables, it is possible to `add and subtract two independent normally
   distributed random variables
   <https://en.wikipedia.org/wiki/Sum_of_normally_distributed_random_variables>`_
   represented as instances of :class:`NormalDist`. For example:

   .. doctest::

      >>> birth_weights = NormalDist.from_samples([2.5, 3.1, 2.1, 2.4, 2.7, 3.5])
      >>> drug_effects = NormalDist(0.4, 0.15)
      >>> combined = birth_weights + drug_effects
      >>> round(combined.mean, 1)
      3.1
      >>> round(combined.stdev, 1)
      0.5

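   For independent variables the means add or subtract, while the variances
   add in both cases. A small check of that rule, reusing the figures from
   the example above:

   .. code-block:: python

      from math import isclose
      from statistics import NormalDist

      birth_weights = NormalDist.from_samples([2.5, 3.1, 2.1, 2.4, 2.7, 3.5])
      drug_effects = NormalDist(0.4, 0.15)

      total = birth_weights + drug_effects
      difference = birth_weights - drug_effects

      # Variances add for both the sum and the difference.
      expected_variance = birth_weights.variance + drug_effects.variance
      assert isclose(total.variance, expected_variance)
      assert isclose(difference.variance, expected_variance)

      # Means add for the sum and subtract for the difference.
      assert isclose(total.mean, birth_weights.mean + drug_effects.mean)
      assert isclose(difference.mean, birth_weights.mean - drug_effects.mean)
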
   .. versionadded:: 3.8


:class:`NormalDist` Examples and Recipes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`NormalDist` readily solves classic probability problems.

For example, given `historical data for SAT exams
<https://blog.prepscholar.com/sat-standard-deviation>`_ showing that scores
are normally distributed with a mean of 1060 and a standard deviation of 195,
determine the percentage of students with test scores between 1100 and
1200, after rounding to the nearest whole number:

.. doctest::

   >>> sat = NormalDist(1060, 195)
   >>> fraction = sat.cdf(1200 + 0.5) - sat.cdf(1100 - 0.5)
   >>> round(fraction * 100.0, 1)
   18.4

Find the `quartiles <https://en.wikipedia.org/wiki/Quartile>`_ and `deciles
<https://en.wikipedia.org/wiki/Decile>`_ for the SAT scores:

.. doctest::

   >>> list(map(round, sat.quantiles()))
   [928, 1060, 1192]
   >>> list(map(round, sat.quantiles(n=10)))
   [810, 896, 958, 1011, 1060, 1109, 1162, 1224, 1310]

To estimate the distribution for a model that isn't easy to solve
analytically, :class:`NormalDist` can generate input samples for a `Monte
Carlo simulation <https://en.wikipedia.org/wiki/Monte_Carlo_method>`_:

.. doctest::

   >>> def model(x, y, z):
   ...     return (3*x + 7*x*y - 5*y) / (11 * z)
   ...
   >>> n = 100_000
   >>> X = NormalDist(10, 2.5).samples(n, seed=3652260728)
   >>> Y = NormalDist(15, 1.75).samples(n, seed=4582495471)
   >>> Z = NormalDist(50, 1.25).samples(n, seed=6582483453)
   >>> quantiles(map(model, X, Y, Z))       # doctest: +SKIP
   [1.4591308524824727, 1.8035946855390597, 2.175091447274739]

Normal distributions commonly arise in machine learning problems.

Wikipedia has a `nice example of a Naive Bayesian Classifier
<https://en.wikipedia.org/wiki/Naive_Bayes_classifier#Sex_classification>`_.
The challenge is to predict a person's gender from measurements of normally
distributed features including height, weight, and foot size.

We're given a training dataset with measurements for eight people. The
measurements are assumed to be normally distributed, so we summarize the data
with :class:`NormalDist`:

.. doctest::

   >>> height_male = NormalDist.from_samples([6, 5.92, 5.58, 5.92])
   >>> height_female = NormalDist.from_samples([5, 5.5, 5.42, 5.75])
   >>> weight_male = NormalDist.from_samples([180, 190, 170, 165])
   >>> weight_female = NormalDist.from_samples([100, 150, 130, 150])
   >>> foot_size_male = NormalDist.from_samples([12, 11, 12, 10])
   >>> foot_size_female = NormalDist.from_samples([6, 8, 7, 9])

Next, we encounter a new person whose feature measurements are known but whose
gender is unknown:

.. doctest::

   >>> ht = 6.0        # height
   >>> wt = 130        # weight
   >>> fs = 8          # foot size

Starting with a 50% `prior probability
<https://en.wikipedia.org/wiki/Prior_probability>`_ of being male or female,
we compute the posterior as the prior times the product of likelihoods for the
feature measurements given the gender:

.. doctest::

   >>> prior_male = 0.5
   >>> prior_female = 0.5
   >>> posterior_male = (prior_male * height_male.pdf(ht) *
   ...                   weight_male.pdf(wt) * foot_size_male.pdf(fs))

   >>> posterior_female = (prior_female * height_female.pdf(ht) *
   ...                     weight_female.pdf(wt) * foot_size_female.pdf(fs))

The final prediction goes to the largest posterior. This is known as the
`maximum a posteriori
<https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation>`_ or MAP:

.. doctest::

   >>> 'male' if posterior_male > posterior_female else 'female'
   'female'


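The posteriors computed above are unnormalized, which is enough for picking
the larger one. If probabilities are wanted, one option is to divide each by
their sum. The sketch below repeats the setup so that it runs on its own; the
helper ``unnormalized_posterior`` is a hypothetical name introduced only for
this illustration:

.. code-block:: python

   from math import isclose
   from statistics import NormalDist

   # Per-class feature distributions: height, weight, foot size.
   male = [NormalDist.from_samples([6, 5.92, 5.58, 5.92]),
           NormalDist.from_samples([180, 190, 170, 165]),
           NormalDist.from_samples([12, 11, 12, 10])]
   female = [NormalDist.from_samples([5, 5.5, 5.42, 5.75]),
             NormalDist.from_samples([100, 150, 130, 150]),
             NormalDist.from_samples([6, 8, 7, 9])]
   person = [6.0, 130, 8]

   def unnormalized_posterior(prior, features, measurements):
       """Prior times the product of per-feature likelihoods."""
       result = prior
       for dist, value in zip(features, measurements):
           result *= dist.pdf(value)
       return result

   posterior_male = unnormalized_posterior(0.5, male, person)
   posterior_female = unnormalized_posterior(0.5, female, person)

   # Dividing by the sum turns the two scores into probabilities.
   evidence = posterior_male + posterior_female
   p_male = posterior_male / evidence
   p_female = posterior_female / evidence
   assert isclose(p_male + p_female, 1.0)
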
..
   # This modeline must appear within the last ten lines of the file.
   kate: indent-width 3; remove-trailing-space on; replace-tabs on; encoding utf-8;