blob: 90b86248e6e1c3654886836998f2675fc03f571d [file] [log] [blame]
Georg Brandl116aa622007-08-15 14:28:22 +00001:mod:`random` --- Generate pseudo-random numbers
2================================================
3
4.. module:: random
5 :synopsis: Generate pseudo-random numbers with various common distributions.
6
Raymond Hettinger10480942011-01-10 03:26:08 +00007**Source code:** :source:`Lib/random.py`
Georg Brandl116aa622007-08-15 14:28:22 +00008
Raymond Hettinger4f707fd2011-01-10 19:54:11 +00009--------------
10
Georg Brandl116aa622007-08-15 14:28:22 +000011This module implements pseudo-random number generators for various
12distributions.
13
Raymond Hettingerb21dac12010-09-07 05:32:49 +000014For integers, there is uniform selection from a range. For sequences, there is
15uniform selection of a random element, a function to generate a random
16permutation of a list in-place, and a function for random sampling without
17replacement.
Georg Brandl116aa622007-08-15 14:28:22 +000018
19On the real line, there are functions to compute uniform, normal (Gaussian),
20lognormal, negative exponential, gamma, and beta distributions. For generating
21distributions of angles, the von Mises distribution is available.
22
Georg Brandl92849d12016-02-19 08:57:38 +010023Almost all module functions depend on the basic function :func:`.random`, which
Georg Brandl116aa622007-08-15 14:28:22 +000024generates a random float uniformly in the semi-open range [0.0, 1.0). Python
25uses the Mersenne Twister as the core generator. It produces 53-bit precision
26floats and has a period of 2\*\*19937-1. The underlying implementation in C is
27both fast and threadsafe. The Mersenne Twister is one of the most extensively
28tested random number generators in existence. However, being completely
29deterministic, it is not suitable for all purposes, and is completely unsuitable
30for cryptographic purposes.
31
32The functions supplied by this module are actually bound methods of a hidden
33instance of the :class:`random.Random` class. You can instantiate your own
Raymond Hettinger28de64f2008-01-13 23:40:30 +000034instances of :class:`Random` to get generators that don't share state.
Georg Brandl116aa622007-08-15 14:28:22 +000035
36Class :class:`Random` can also be subclassed if you want to use a different
Georg Brandl92849d12016-02-19 08:57:38 +010037basic generator of your own devising: in that case, override the :meth:`~Random.random`,
38:meth:`~Random.seed`, :meth:`~Random.getstate`, and :meth:`~Random.setstate` methods.
39Optionally, a new generator can supply a :meth:`~Random.getrandbits` method --- this
Georg Brandl116aa622007-08-15 14:28:22 +000040allows :meth:`randrange` to produce selections over an arbitrarily large range.
41
Benjamin Peterson21896a32010-03-21 22:03:03 +000042The :mod:`random` module also provides the :class:`SystemRandom` class which
43uses the system function :func:`os.urandom` to generate random numbers
44from sources provided by the operating system.
Georg Brandl116aa622007-08-15 14:28:22 +000045
Raymond Hettingerc89a4512014-05-11 02:26:23 -070046.. warning::
47
48 The pseudo-random generators of this module should not be used for
Steven D'Apranob2871fa2016-04-17 01:42:33 +100049 security purposes. For security or cryptographic uses, see the
50 :mod:`secrets` module.
Raymond Hettingerc89a4512014-05-11 02:26:23 -070051
Raymond Hettingere1329102016-11-21 12:33:50 -080052.. seealso::
Georg Brandl116aa622007-08-15 14:28:22 +000053
Raymond Hettingere1329102016-11-21 12:33:50 -080054 M. Matsumoto and T. Nishimura, "Mersenne Twister: A 623-dimensionally
55 equidistributed uniform pseudorandom number generator", ACM Transactions on
Serhiy Storchaka0264e462016-11-26 13:49:59 +020056 Modeling and Computer Simulation Vol. 8, No. 1, January pp.3--30 1998.
Raymond Hettingere1329102016-11-21 12:33:50 -080057
58
59 `Complementary-Multiply-with-Carry recipe
60 <https://code.activestate.com/recipes/576707/>`_ for a compatible alternative
61 random number generator with a long period and comparatively simple update
62 operations.
63
64
65Bookkeeping functions
66---------------------
Georg Brandl116aa622007-08-15 14:28:22 +000067
Ezio Melottie0add762012-09-14 06:32:35 +030068.. function:: seed(a=None, version=2)
Georg Brandl116aa622007-08-15 14:28:22 +000069
Raymond Hettingerf763a722010-09-07 00:38:15 +000070 Initialize the random number generator.
Georg Brandl116aa622007-08-15 14:28:22 +000071
Ezio Melottie0add762012-09-14 06:32:35 +030072 If *a* is omitted or ``None``, the current system time is used. If
Raymond Hettingerf763a722010-09-07 00:38:15 +000073 randomness sources are provided by the operating system, they are used
74 instead of the system time (see the :func:`os.urandom` function for details
75 on availability).
Georg Brandl116aa622007-08-15 14:28:22 +000076
Ezio Melottie0add762012-09-14 06:32:35 +030077 If *a* is an int, it is used directly.
Raymond Hettingerf763a722010-09-07 00:38:15 +000078
79 With version 2 (the default), a :class:`str`, :class:`bytes`, or :class:`bytearray`
Raymond Hettinger16eb8272016-09-04 11:17:28 -070080 object gets converted to an :class:`int` and all of its bits are used.
81
82 With version 1 (provided for reproducing random sequences from older versions
83 of Python), the algorithm for :class:`str` and :class:`bytes` generates a
84 narrower range of seeds.
Raymond Hettingerf763a722010-09-07 00:38:15 +000085
86 .. versionchanged:: 3.2
87 Moved to the version 2 scheme which uses all of the bits in a string seed.
Georg Brandl116aa622007-08-15 14:28:22 +000088
89.. function:: getstate()
90
91 Return an object capturing the current internal state of the generator. This
92 object can be passed to :func:`setstate` to restore the state.
93
Georg Brandl116aa622007-08-15 14:28:22 +000094
95.. function:: setstate(state)
96
97 *state* should have been obtained from a previous call to :func:`getstate`, and
98 :func:`setstate` restores the internal state of the generator to what it was at
Sandro Tosi985104a2012-08-12 15:12:15 +020099 the time :func:`getstate` was called.
Georg Brandl116aa622007-08-15 14:28:22 +0000100
Georg Brandl116aa622007-08-15 14:28:22 +0000101
Georg Brandl116aa622007-08-15 14:28:22 +0000102.. function:: getrandbits(k)
103
Ezio Melotti0639d5a2009-12-19 23:26:38 +0000104 Returns a Python integer with *k* random bits. This method is supplied with
Georg Brandl5c106642007-11-29 17:41:05 +0000105 the MersenneTwister generator and some other generators may also provide it
Georg Brandl116aa622007-08-15 14:28:22 +0000106 as an optional part of the API. When available, :meth:`getrandbits` enables
107 :meth:`randrange` to handle arbitrarily large ranges.
108
Georg Brandl116aa622007-08-15 14:28:22 +0000109
Raymond Hettingere1329102016-11-21 12:33:50 -0800110Functions for integers
111----------------------
Georg Brandl116aa622007-08-15 14:28:22 +0000112
Ezio Melottie0add762012-09-14 06:32:35 +0300113.. function:: randrange(stop)
114 randrange(start, stop[, step])
Georg Brandl116aa622007-08-15 14:28:22 +0000115
116 Return a randomly selected element from ``range(start, stop, step)``. This is
117 equivalent to ``choice(range(start, stop, step))``, but doesn't actually build a
118 range object.
119
Raymond Hettinger05156612010-09-07 04:44:52 +0000120 The positional argument pattern matches that of :func:`range`. Keyword arguments
121 should not be used because the function may use them in unexpected ways.
122
123 .. versionchanged:: 3.2
124 :meth:`randrange` is more sophisticated about producing equally distributed
125 values. Formerly it used a style like ``int(random()*n)`` which could produce
126 slightly uneven distributions.
Georg Brandl116aa622007-08-15 14:28:22 +0000127
128.. function:: randint(a, b)
129
Raymond Hettingerafd30452009-02-24 10:57:02 +0000130 Return a random integer *N* such that ``a <= N <= b``. Alias for
131 ``randrange(a, b+1)``.
Georg Brandl116aa622007-08-15 14:28:22 +0000132
Georg Brandl116aa622007-08-15 14:28:22 +0000133
Raymond Hettingere1329102016-11-21 12:33:50 -0800134Functions for sequences
135-----------------------
Georg Brandl116aa622007-08-15 14:28:22 +0000136
137.. function:: choice(seq)
138
139 Return a random element from the non-empty sequence *seq*. If *seq* is empty,
140 raises :exc:`IndexError`.
141
Raymond Hettinger9016f282016-09-26 21:45:57 -0700142.. function:: choices(population, weights=None, *, cum_weights=None, k=1)
Raymond Hettingere8f1e002016-09-06 17:15:29 -0700143
144 Return a *k* sized list of elements chosen from the *population* with replacement.
145 If the *population* is empty, raises :exc:`IndexError`.
146
147 If a *weights* sequence is specified, selections are made according to the
148 relative weights. Alternatively, if a *cum_weights* sequence is given, the
Raymond Hettinger1c3a1212016-10-12 01:42:10 -0400149 selections are made according to the cumulative weights (perhaps computed
150 using :func:`itertools.accumulate`). For example, the relative weights
151 ``[10, 5, 30, 5]`` are equivalent to the cumulative weights
152 ``[10, 15, 45, 50]``. Internally, the relative weights are converted to
153 cumulative weights before making selections, so supplying the cumulative
154 weights saves work.
Raymond Hettingere8f1e002016-09-06 17:15:29 -0700155
156 If neither *weights* nor *cum_weights* are specified, selections are made
157 with equal probability. If a weights sequence is supplied, it must be
158 the same length as the *population* sequence. It is a :exc:`TypeError`
159 to specify both *weights* and *cum_weights*.
160
161 The *weights* or *cum_weights* can use any numeric type that interoperates
162 with the :class:`float` values returned by :func:`random` (that includes
Raymond Hettinger8dbe5632019-07-19 01:56:42 -0700163 integers, floats, and fractions but excludes decimals). Weights are
164 assumed to be non-negative.
Georg Brandl116aa622007-08-15 14:28:22 +0000165
Raymond Hettinger40ebe942019-01-30 13:30:20 -0800166 For a given seed, the :func:`choices` function with equal weighting
167 typically produces a different sequence than repeated calls to
168 :func:`choice`. The algorithm used by :func:`choices` uses floating
169 point arithmetic for internal consistency and speed. The algorithm used
170 by :func:`choice` defaults to integer arithmetic with repeated selections
171 to avoid small biases from round-off error.
172
Raymond Hettinger1c3a1212016-10-12 01:42:10 -0400173 .. versionadded:: 3.6
174
175
Georg Brandl116aa622007-08-15 14:28:22 +0000176.. function:: shuffle(x[, random])
177
Raymond Hettingera3950e42016-11-17 01:49:54 -0800178 Shuffle the sequence *x* in place.
Georg Brandl116aa622007-08-15 14:28:22 +0000179
Raymond Hettingera3950e42016-11-17 01:49:54 -0800180 The optional argument *random* is a 0-argument function returning a random
181 float in [0.0, 1.0); by default, this is the function :func:`.random`.
182
183 To shuffle an immutable sequence and return a new shuffled list, use
184 ``sample(x, k=len(x))`` instead.
185
186 Note that even for small ``len(x)``, the total number of permutations of *x*
187 can quickly grow larger than the period of most random number generators.
188 This implies that most permutations of a long sequence can never be
189 generated. For example, a sequence of length 2080 is the largest that
190 can fit within the period of the Mersenne Twister random number generator.
Georg Brandl116aa622007-08-15 14:28:22 +0000191
192
193.. function:: sample(population, k)
194
Raymond Hettinger1acde192008-01-14 01:00:53 +0000195 Return a *k* length list of unique elements chosen from the population sequence
196 or set. Used for random sampling without replacement.
Georg Brandl116aa622007-08-15 14:28:22 +0000197
Georg Brandl116aa622007-08-15 14:28:22 +0000198 Returns a new list containing elements from the population while leaving the
199 original population unchanged. The resulting list is in selection order so that
200 all sub-slices will also be valid random samples. This allows raffle winners
201 (the sample) to be partitioned into grand prize and second place winners (the
202 subslices).
203
Guido van Rossum2cc30da2007-11-02 23:46:40 +0000204 Members of the population need not be :term:`hashable` or unique. If the population
Georg Brandl116aa622007-08-15 14:28:22 +0000205 contains repeats, then each occurrence is a possible selection in the sample.
206
Raymond Hettingera3950e42016-11-17 01:49:54 -0800207 To choose a sample from a range of integers, use a :func:`range` object as an
Georg Brandl116aa622007-08-15 14:28:22 +0000208 argument. This is especially fast and space efficient for sampling from a large
Raymond Hettingera3950e42016-11-17 01:49:54 -0800209 population: ``sample(range(10000000), k=60)``.
Georg Brandl116aa622007-08-15 14:28:22 +0000210
Raymond Hettingerf07d9492012-07-09 12:43:57 -0700211 If the sample size is larger than the population size, a :exc:`ValueError`
Raymond Hettinger86a20f82012-07-08 16:01:53 -0700212 is raised.
213
Raymond Hettingere1329102016-11-21 12:33:50 -0800214Real-valued distributions
215-------------------------
216
Georg Brandl116aa622007-08-15 14:28:22 +0000217The following functions generate specific real-valued distributions. Function
218parameters are named after the corresponding variables in the distribution's
219equation, as used in common mathematical practice; most of these equations can
220be found in any statistics text.
221
222
223.. function:: random()
224
225 Return the next random floating point number in the range [0.0, 1.0).
226
227
228.. function:: uniform(a, b)
229
Benjamin Petersonb58dda72009-01-18 22:27:04 +0000230 Return a random floating point number *N* such that ``a <= N <= b`` for
231 ``a <= b`` and ``b <= N <= a`` for ``b < a``.
Georg Brandl116aa622007-08-15 14:28:22 +0000232
Raymond Hettingerbe40db02009-06-11 23:12:14 +0000233 The end-point value ``b`` may or may not be included in the range
234 depending on floating-point rounding in the equation ``a + (b-a) * random()``.
Benjamin Peterson35e8c462008-04-24 02:34:53 +0000235
Georg Brandl73dd7c72011-09-17 20:36:28 +0200236
Christian Heimesfe337bf2008-03-23 21:54:12 +0000237.. function:: triangular(low, high, mode)
238
Benjamin Petersonb58dda72009-01-18 22:27:04 +0000239 Return a random floating point number *N* such that ``low <= N <= high`` and
Christian Heimescc47b052008-03-25 14:56:36 +0000240 with the specified *mode* between those bounds. The *low* and *high* bounds
241 default to zero and one. The *mode* argument defaults to the midpoint
242 between the bounds, giving a symmetric distribution.
Christian Heimesfe337bf2008-03-23 21:54:12 +0000243
Georg Brandl116aa622007-08-15 14:28:22 +0000244
245.. function:: betavariate(alpha, beta)
246
Benjamin Petersonb58dda72009-01-18 22:27:04 +0000247 Beta distribution. Conditions on the parameters are ``alpha > 0`` and
248 ``beta > 0``. Returned values range between 0 and 1.
Georg Brandl116aa622007-08-15 14:28:22 +0000249
250
251.. function:: expovariate(lambd)
252
Mark Dickinson2f947362009-01-07 17:54:07 +0000253 Exponential distribution. *lambd* is 1.0 divided by the desired
254 mean. It should be nonzero. (The parameter would be called
255 "lambda", but that is a reserved word in Python.) Returned values
256 range from 0 to positive infinity if *lambd* is positive, and from
257 negative infinity to 0 if *lambd* is negative.
Georg Brandl116aa622007-08-15 14:28:22 +0000258
259
260.. function:: gammavariate(alpha, beta)
261
Benjamin Petersonb58dda72009-01-18 22:27:04 +0000262 Gamma distribution. (*Not* the gamma function!) Conditions on the
263 parameters are ``alpha > 0`` and ``beta > 0``.
Georg Brandl116aa622007-08-15 14:28:22 +0000264
Georg Brandl73dd7c72011-09-17 20:36:28 +0200265 The probability distribution function is::
266
267 x ** (alpha - 1) * math.exp(-x / beta)
268 pdf(x) = --------------------------------------
269 math.gamma(alpha) * beta ** alpha
270
Georg Brandl116aa622007-08-15 14:28:22 +0000271
272.. function:: gauss(mu, sigma)
273
Benjamin Petersonb58dda72009-01-18 22:27:04 +0000274 Gaussian distribution. *mu* is the mean, and *sigma* is the standard
275 deviation. This is slightly faster than the :func:`normalvariate` function
276 defined below.
Georg Brandl116aa622007-08-15 14:28:22 +0000277
278
279.. function:: lognormvariate(mu, sigma)
280
281 Log normal distribution. If you take the natural logarithm of this
282 distribution, you'll get a normal distribution with mean *mu* and standard
283 deviation *sigma*. *mu* can have any value, and *sigma* must be greater than
284 zero.
285
286
287.. function:: normalvariate(mu, sigma)
288
289 Normal distribution. *mu* is the mean, and *sigma* is the standard deviation.
290
291
292.. function:: vonmisesvariate(mu, kappa)
293
294 *mu* is the mean angle, expressed in radians between 0 and 2\*\ *pi*, and *kappa*
295 is the concentration parameter, which must be greater than or equal to zero. If
296 *kappa* is equal to zero, this distribution reduces to a uniform random angle
297 over the range 0 to 2\*\ *pi*.
298
299
300.. function:: paretovariate(alpha)
301
302 Pareto distribution. *alpha* is the shape parameter.
303
304
305.. function:: weibullvariate(alpha, beta)
306
307 Weibull distribution. *alpha* is the scale parameter and *beta* is the shape
308 parameter.
309
310
Raymond Hettingere1329102016-11-21 12:33:50 -0800311Alternative Generator
312---------------------
Georg Brandl116aa622007-08-15 14:28:22 +0000313
Matthias Bussonnier31e8d692019-04-16 09:47:11 -0700314.. class:: Random([seed])
315
316 Class that implements the default pseudo-random number generator used by the
317 :mod:`random` module.
318
Georg Brandl116aa622007-08-15 14:28:22 +0000319.. class:: SystemRandom([seed])
320
321 Class that uses the :func:`os.urandom` function for generating random numbers
322 from sources provided by the operating system. Not available on all systems.
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000323 Does not rely on software state, and sequences are not reproducible. Accordingly,
Raymond Hettingerafd30452009-02-24 10:57:02 +0000324 the :meth:`seed` method has no effect and is ignored.
Georg Brandl116aa622007-08-15 14:28:22 +0000325 The :meth:`getstate` and :meth:`setstate` methods raise
326 :exc:`NotImplementedError` if called.
327
Georg Brandl116aa622007-08-15 14:28:22 +0000328
Raymond Hettinger435cb0f2010-09-06 23:36:31 +0000329Notes on Reproducibility
Antoine Pitroue72b5862010-12-12 20:13:31 +0000330------------------------
Raymond Hettinger435cb0f2010-09-06 23:36:31 +0000331
332Sometimes it is useful to be able to reproduce the sequences given by a pseudo
333random number generator. By re-using a seed value, the same sequence should be
334reproducible from run to run as long as multiple threads are not running.
335
336Most of the random module's algorithms and seeding functions are subject to
337change across Python versions, but two aspects are guaranteed not to change:
338
339* If a new seeding method is added, then a backward compatible seeder will be
340 offered.
341
Georg Brandl92849d12016-02-19 08:57:38 +0100342* The generator's :meth:`~Random.random` method will continue to produce the same
Raymond Hettinger435cb0f2010-09-06 23:36:31 +0000343 sequence when the compatible seeder is given the same seed.
Raymond Hettinger2fdc7b12010-12-02 02:41:33 +0000344
Raymond Hettinger6e353942010-12-04 23:42:12 +0000345.. _random-examples:
Raymond Hettinger2fdc7b12010-12-02 02:41:33 +0000346
Raymond Hettinger2fdc7b12010-12-02 02:41:33 +0000347Examples and Recipes
Antoine Pitroue72b5862010-12-12 20:13:31 +0000348--------------------
Raymond Hettinger2fdc7b12010-12-02 02:41:33 +0000349
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800350Basic examples::
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000351
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800352 >>> random() # Random float: 0.0 <= x < 1.0
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000353 0.37444887175646646
354
Raymond Hettingere1329102016-11-21 12:33:50 -0800355 >>> uniform(2.5, 10.0) # Random float: 2.5 <= x < 10.0
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800356 3.1800146073117523
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000357
Raymond Hettingere1329102016-11-21 12:33:50 -0800358 >>> expovariate(1 / 5) # Interval between arrivals averaging 5 seconds
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800359 5.148957571865031
360
Raymond Hettingere1329102016-11-21 12:33:50 -0800361 >>> randrange(10) # Integer from 0 to 9 inclusive
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000362 7
363
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800364 >>> randrange(0, 101, 2) # Even integer from 0 to 100 inclusive
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000365 26
366
Raymond Hettinger6befb642016-11-21 01:59:39 -0800367 >>> choice(['win', 'lose', 'draw']) # Single random element from a sequence
368 'draw'
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000369
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800370 >>> deck = 'ace two three four'.split()
371 >>> shuffle(deck) # Shuffle a list
Raymond Hettinger1c3a1212016-10-12 01:42:10 -0400372 >>> deck
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800373 ['four', 'two', 'ace', 'three']
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000374
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800375 >>> sample([10, 20, 30, 40, 50], k=4) # Four samples without replacement
376 [40, 10, 50, 30]
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000377
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800378Simulations::
379
Raymond Hettinger71c62e12016-12-04 11:00:34 -0800380 >>> # Six roulette wheel spins (weighted sampling with replacement)
Raymond Hettinger1c3a1212016-10-12 01:42:10 -0400381 >>> choices(['red', 'black', 'green'], [18, 18, 2], k=6)
382 ['red', 'green', 'black', 'black', 'red', 'black']
Raymond Hettinger2fdc7b12010-12-02 02:41:33 +0000383
Raymond Hettinger71c62e12016-12-04 11:00:34 -0800384 >>> # Deal 20 cards without replacement from a deck of 52 playing cards
385 >>> # and determine the proportion of cards with a ten-value
386 >>> # (a ten, jack, queen, or king).
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800387 >>> deck = collections.Counter(tens=16, low_cards=36)
388 >>> seen = sample(list(deck.elements()), k=20)
Raymond Hettinger71c62e12016-12-04 11:00:34 -0800389 >>> seen.count('tens') / 20
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800390 0.15
391
Raymond Hettinger71c62e12016-12-04 11:00:34 -0800392 >>> # Estimate the probability of getting 5 or more heads from 7 spins
393 >>> # of a biased coin that settles on heads 60% of the time.
Raymond Hettinger9abb7252019-02-15 12:40:18 -0800394 >>> def trial():
395 ... return choices('HT', cum_weights=(0.60, 1.00), k=7).count('H') >= 5
396 ...
Raymond Hettinger71c62e12016-12-04 11:00:34 -0800397 >>> sum(trial() for i in range(10000)) / 10000
Raymond Hettinger16ef5d42016-10-31 22:53:52 -0700398 0.4169
399
Raymond Hettinger71c62e12016-12-04 11:00:34 -0800400 >>> # Probability of the median of 5 samples being in middle two quartiles
Raymond Hettinger9abb7252019-02-15 12:40:18 -0800401 >>> def trial():
402 ... return 2500 <= sorted(choices(range(10000), k=5))[2] < 7500
403 ...
Raymond Hettinger71c62e12016-12-04 11:00:34 -0800404 >>> sum(trial() for i in range(10000)) / 10000
405 0.7958
406
Raymond Hettinger1c3a1212016-10-12 01:42:10 -0400407Example of `statistical bootstrapping
408<https://en.wikipedia.org/wiki/Bootstrapping_(statistics)>`_ using resampling
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800409with replacement to estimate a confidence interval for the mean of a sample of
410size five::
Raymond Hettinger2fdc7b12010-12-02 02:41:33 +0000411
Raymond Hettinger1c3a1212016-10-12 01:42:10 -0400412 # http://statistics.about.com/od/Applications/a/Example-Of-Bootstrapping.htm
Raymond Hettinger47d99872019-02-21 15:06:29 -0800413 from statistics import fmean as mean
Raymond Hettinger1c3a1212016-10-12 01:42:10 -0400414 from random import choices
Raymond Hettingerf5b7c7b2016-09-05 13:15:02 -0700415
Raymond Hettinger1c3a1212016-10-12 01:42:10 -0400416 data = 1, 2, 4, 4, 10
417 means = sorted(mean(choices(data, k=5)) for i in range(20))
Raymond Hettinger2589ee32016-11-16 21:34:17 -0800418 print(f'The sample mean of {mean(data):.1f} has a 90% confidence '
419 f'interval from {means[1]:.1f} to {means[-2]:.1f}')
420
Raymond Hettinger00305ad2016-11-16 22:56:11 -0800421Example of a `resampling permutation test
422<https://en.wikipedia.org/wiki/Resampling_(statistics)#Permutation_tests>`_
423to determine the statistical significance or `p-value
424<https://en.wikipedia.org/wiki/P-value>`_ of an observed difference
425between the effects of a drug versus a placebo::
426
427 # Example from "Statistics is Easy" by Dennis Shasha and Manda Wilson
Raymond Hettinger47d99872019-02-21 15:06:29 -0800428 from statistics import fmean as mean
Raymond Hettinger00305ad2016-11-16 22:56:11 -0800429 from random import shuffle
430
431 drug = [54, 73, 53, 70, 73, 68, 52, 65, 65]
432 placebo = [54, 51, 58, 44, 55, 52, 42, 47, 58, 46]
433 observed_diff = mean(drug) - mean(placebo)
434
435 n = 10000
436 count = 0
437 combined = drug + placebo
438 for i in range(n):
439 shuffle(combined)
440 new_diff = mean(combined[:len(drug)]) - mean(combined[len(drug):])
441 count += (new_diff >= observed_diff)
442
443 print(f'{n} label reshufflings produced only {count} instances with a difference')
444 print(f'at least as extreme as the observed difference of {observed_diff:.1f}.')
445 print(f'The one-sided p-value of {count / n:.4f} leads us to reject the null')
Raymond Hettinger6befb642016-11-21 01:59:39 -0800446 print(f'hypothesis that there is no difference between the drug and the placebo.')
447
448Simulation of arrival times and service deliveries in a single server queue::
449
Raymond Hettinger1149d932016-11-21 14:13:07 -0800450 from random import expovariate, gauss
451 from statistics import mean, median, stdev
Raymond Hettinger6befb642016-11-21 01:59:39 -0800452
453 average_arrival_interval = 5.6
454 average_service_time = 5.0
455 stdev_service_time = 0.5
456
457 num_waiting = 0
Raymond Hettinger1149d932016-11-21 14:13:07 -0800458 arrivals = []
459 starts = []
Raymond Hettinger6befb642016-11-21 01:59:39 -0800460 arrival = service_end = 0.0
Raymond Hettinger8ab12582016-11-21 10:16:01 -0800461 for i in range(20000):
462 if arrival <= service_end:
463 num_waiting += 1
464 arrival += expovariate(1.0 / average_arrival_interval)
Raymond Hettinger1149d932016-11-21 14:13:07 -0800465 arrivals.append(arrival)
Raymond Hettinger8ab12582016-11-21 10:16:01 -0800466 else:
Raymond Hettinger6befb642016-11-21 01:59:39 -0800467 num_waiting -= 1
468 service_start = service_end if num_waiting else arrival
469 service_time = gauss(average_service_time, stdev_service_time)
470 service_end = service_start + service_time
Raymond Hettinger1149d932016-11-21 14:13:07 -0800471 starts.append(service_start)
472
473 waits = [start - arrival for arrival, start in zip(arrivals, starts)]
474 print(f'Mean wait: {mean(waits):.1f}. Stdev wait: {stdev(waits):.1f}.')
475 print(f'Median wait: {median(waits):.1f}. Max wait: {max(waits):.1f}.')
Raymond Hettinger6befb642016-11-21 01:59:39 -0800476
Raymond Hettinger05374052016-11-21 10:52:04 -0800477.. seealso::
478
479 `Statistics for Hackers <https://www.youtube.com/watch?v=Iq9DzN6mvYA>`_
480 a video tutorial by
481 `Jake Vanderplas <https://us.pycon.org/2016/speaker/profile/295/>`_
482 on statistical analysis using just a few fundamental concepts
483 including simulation, sampling, shuffling, and cross-validation.
484
485 `Economics Simulation
486 <http://nbviewer.jupyter.org/url/norvig.com/ipython/Economics.ipynb>`_
487 a simulation of a marketplace by
488 `Peter Norvig <http://norvig.com/bio.html>`_ that shows effective
Raymond Hettinger7f946192016-11-21 15:13:18 -0800489 use of many of the tools and distributions provided by this module
Raymond Hettinger05374052016-11-21 10:52:04 -0800490 (gauss, uniform, sample, betavariate, choice, triangular, and randrange).
491
492 `A Concrete Introduction to Probability (using Python)
493 <http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability.ipynb>`_
494 a tutorial by `Peter Norvig <http://norvig.com/bio.html>`_ covering
495 the basics of probability theory, how to write simulations, and
Raymond Hettinger7f946192016-11-21 15:13:18 -0800496 how to perform data analysis using Python.