blob: 43a9902f6c11f4e1609a1e628331cc9446d2a1d4 [file] [log] [blame]
Georg Brandl116aa622007-08-15 14:28:22 +00001:mod:`random` --- Generate pseudo-random numbers
2================================================
3
4.. module:: random
5 :synopsis: Generate pseudo-random numbers with various common distributions.
6
Raymond Hettinger10480942011-01-10 03:26:08 +00007**Source code:** :source:`Lib/random.py`
Georg Brandl116aa622007-08-15 14:28:22 +00008
Raymond Hettinger4f707fd2011-01-10 19:54:11 +00009--------------
10
Georg Brandl116aa622007-08-15 14:28:22 +000011This module implements pseudo-random number generators for various
12distributions.
13
Raymond Hettingerb21dac12010-09-07 05:32:49 +000014For integers, there is uniform selection from a range. For sequences, there is
15uniform selection of a random element, a function to generate a random
16permutation of a list in-place, and a function for random sampling without
17replacement.
Georg Brandl116aa622007-08-15 14:28:22 +000018
19On the real line, there are functions to compute uniform, normal (Gaussian),
20lognormal, negative exponential, gamma, and beta distributions. For generating
21distributions of angles, the von Mises distribution is available.
22
Georg Brandl92849d12016-02-19 08:57:38 +010023Almost all module functions depend on the basic function :func:`.random`, which
Georg Brandl116aa622007-08-15 14:28:22 +000024generates a random float uniformly in the semi-open range [0.0, 1.0). Python
25uses the Mersenne Twister as the core generator. It produces 53-bit precision
26floats and has a period of 2\*\*19937-1. The underlying implementation in C is
27both fast and threadsafe. The Mersenne Twister is one of the most extensively
28tested random number generators in existence. However, being completely
29deterministic, it is not suitable for all purposes, and is completely unsuitable
30for cryptographic purposes.
31
32The functions supplied by this module are actually bound methods of a hidden
33instance of the :class:`random.Random` class. You can instantiate your own
Raymond Hettinger28de64f2008-01-13 23:40:30 +000034instances of :class:`Random` to get generators that don't share state.
Georg Brandl116aa622007-08-15 14:28:22 +000035
36Class :class:`Random` can also be subclassed if you want to use a different
Georg Brandl92849d12016-02-19 08:57:38 +010037basic generator of your own devising: in that case, override the :meth:`~Random.random`,
38:meth:`~Random.seed`, :meth:`~Random.getstate`, and :meth:`~Random.setstate` methods.
39Optionally, a new generator can supply a :meth:`~Random.getrandbits` method --- this
Georg Brandl116aa622007-08-15 14:28:22 +000040allows :meth:`randrange` to produce selections over an arbitrarily large range.
41
Benjamin Peterson21896a32010-03-21 22:03:03 +000042The :mod:`random` module also provides the :class:`SystemRandom` class which
43uses the system function :func:`os.urandom` to generate random numbers
44from sources provided by the operating system.
Georg Brandl116aa622007-08-15 14:28:22 +000045
Raymond Hettingerc89a4512014-05-11 02:26:23 -070046.. warning::
47
48 The pseudo-random generators of this module should not be used for
Steven D'Apranob2871fa2016-04-17 01:42:33 +100049 security purposes. For security or cryptographic uses, see the
50 :mod:`secrets` module.
Raymond Hettingerc89a4512014-05-11 02:26:23 -070051
Raymond Hettingere1329102016-11-21 12:33:50 -080052.. seealso::
Georg Brandl116aa622007-08-15 14:28:22 +000053
Raymond Hettingere1329102016-11-21 12:33:50 -080054 M. Matsumoto and T. Nishimura, "Mersenne Twister: A 623-dimensionally
55 equidistributed uniform pseudorandom number generator", ACM Transactions on
Serhiy Storchaka0264e462016-11-26 13:49:59 +020056 Modeling and Computer Simulation Vol. 8, No. 1, January pp.3--30 1998.
Raymond Hettingere1329102016-11-21 12:33:50 -080057
58
59 `Complementary-Multiply-with-Carry recipe
60 <https://code.activestate.com/recipes/576707/>`_ for a compatible alternative
61 random number generator with a long period and comparatively simple update
62 operations.
63
64
65Bookkeeping functions
66---------------------
Georg Brandl116aa622007-08-15 14:28:22 +000067
Ezio Melottie0add762012-09-14 06:32:35 +030068.. function:: seed(a=None, version=2)
Georg Brandl116aa622007-08-15 14:28:22 +000069
Raymond Hettingerf763a722010-09-07 00:38:15 +000070 Initialize the random number generator.
Georg Brandl116aa622007-08-15 14:28:22 +000071
Ezio Melottie0add762012-09-14 06:32:35 +030072 If *a* is omitted or ``None``, the current system time is used. If
Raymond Hettingerf763a722010-09-07 00:38:15 +000073 randomness sources are provided by the operating system, they are used
74 instead of the system time (see the :func:`os.urandom` function for details
75 on availability).
Georg Brandl116aa622007-08-15 14:28:22 +000076
Ezio Melottie0add762012-09-14 06:32:35 +030077 If *a* is an int, it is used directly.
Raymond Hettingerf763a722010-09-07 00:38:15 +000078
79 With version 2 (the default), a :class:`str`, :class:`bytes`, or :class:`bytearray`
Raymond Hettinger16eb8272016-09-04 11:17:28 -070080 object gets converted to an :class:`int` and all of its bits are used.
81
82 With version 1 (provided for reproducing random sequences from older versions
83 of Python), the algorithm for :class:`str` and :class:`bytes` generates a
84 narrower range of seeds.
Raymond Hettingerf763a722010-09-07 00:38:15 +000085
86 .. versionchanged:: 3.2
87 Moved to the version 2 scheme which uses all of the bits in a string seed.
Georg Brandl116aa622007-08-15 14:28:22 +000088
Raymond Hettingerd0cdeaa2019-08-22 09:19:36 -070089 .. deprecated:: 3.9
90 In the future, the *seed* must be one of the following types:
91 *NoneType*, :class:`int`, :class:`float`, :class:`str`,
92 :class:`bytes`, or :class:`bytearray`.
93
Georg Brandl116aa622007-08-15 14:28:22 +000094.. function:: getstate()
95
96 Return an object capturing the current internal state of the generator. This
97 object can be passed to :func:`setstate` to restore the state.
98
Georg Brandl116aa622007-08-15 14:28:22 +000099
100.. function:: setstate(state)
101
102 *state* should have been obtained from a previous call to :func:`getstate`, and
103 :func:`setstate` restores the internal state of the generator to what it was at
Sandro Tosi985104a2012-08-12 15:12:15 +0200104 the time :func:`getstate` was called.
Georg Brandl116aa622007-08-15 14:28:22 +0000105
Georg Brandl116aa622007-08-15 14:28:22 +0000106
Georg Brandl116aa622007-08-15 14:28:22 +0000107.. function:: getrandbits(k)
108
Ezio Melotti0639d5a2009-12-19 23:26:38 +0000109 Returns a Python integer with *k* random bits. This method is supplied with
Georg Brandl5c106642007-11-29 17:41:05 +0000110 the MersenneTwister generator and some other generators may also provide it
Georg Brandl116aa622007-08-15 14:28:22 +0000111 as an optional part of the API. When available, :meth:`getrandbits` enables
112 :meth:`randrange` to handle arbitrarily large ranges.
113
Antoine Pitrou75a33782020-04-17 19:32:14 +0200114 .. versionchanged:: 3.9
115 This method now accepts zero for *k*.
116
Georg Brandl116aa622007-08-15 14:28:22 +0000117
Victor Stinner9f5fe792020-04-17 19:05:35 +0200118.. function:: randbytes(n)
119
120 Generate *n* random bytes.
121
122 .. versionadded:: 3.9
123
124
Raymond Hettingere1329102016-11-21 12:33:50 -0800125Functions for integers
126----------------------
Georg Brandl116aa622007-08-15 14:28:22 +0000127
Ezio Melottie0add762012-09-14 06:32:35 +0300128.. function:: randrange(stop)
129 randrange(start, stop[, step])
Georg Brandl116aa622007-08-15 14:28:22 +0000130
131 Return a randomly selected element from ``range(start, stop, step)``. This is
132 equivalent to ``choice(range(start, stop, step))``, but doesn't actually build a
133 range object.
134
Raymond Hettinger05156612010-09-07 04:44:52 +0000135 The positional argument pattern matches that of :func:`range`. Keyword arguments
136 should not be used because the function may use them in unexpected ways.
137
138 .. versionchanged:: 3.2
139 :meth:`randrange` is more sophisticated about producing equally distributed
140 values. Formerly it used a style like ``int(random()*n)`` which could produce
141 slightly uneven distributions.
Georg Brandl116aa622007-08-15 14:28:22 +0000142
143.. function:: randint(a, b)
144
Raymond Hettingerafd30452009-02-24 10:57:02 +0000145 Return a random integer *N* such that ``a <= N <= b``. Alias for
146 ``randrange(a, b+1)``.
Georg Brandl116aa622007-08-15 14:28:22 +0000147
Georg Brandl116aa622007-08-15 14:28:22 +0000148
Raymond Hettingere1329102016-11-21 12:33:50 -0800149Functions for sequences
150-----------------------
Georg Brandl116aa622007-08-15 14:28:22 +0000151
152.. function:: choice(seq)
153
154 Return a random element from the non-empty sequence *seq*. If *seq* is empty,
155 raises :exc:`IndexError`.
156
Raymond Hettinger9016f282016-09-26 21:45:57 -0700157.. function:: choices(population, weights=None, *, cum_weights=None, k=1)
Raymond Hettingere8f1e002016-09-06 17:15:29 -0700158
159 Return a *k* sized list of elements chosen from the *population* with replacement.
160 If the *population* is empty, raises :exc:`IndexError`.
161
162 If a *weights* sequence is specified, selections are made according to the
163 relative weights. Alternatively, if a *cum_weights* sequence is given, the
Raymond Hettinger1c3a1212016-10-12 01:42:10 -0400164 selections are made according to the cumulative weights (perhaps computed
165 using :func:`itertools.accumulate`). For example, the relative weights
166 ``[10, 5, 30, 5]`` are equivalent to the cumulative weights
167 ``[10, 15, 45, 50]``. Internally, the relative weights are converted to
168 cumulative weights before making selections, so supplying the cumulative
169 weights saves work.
Raymond Hettingere8f1e002016-09-06 17:15:29 -0700170
171 If neither *weights* nor *cum_weights* are specified, selections are made
172 with equal probability. If a weights sequence is supplied, it must be
173 the same length as the *population* sequence. It is a :exc:`TypeError`
174 to specify both *weights* and *cum_weights*.
175
176 The *weights* or *cum_weights* can use any numeric type that interoperates
177 with the :class:`float` values returned by :func:`random` (that includes
Raymond Hettinger041d8b42019-11-23 02:22:13 -0800178 integers, floats, and fractions but excludes decimals). Behavior is
179 undefined if any weight is negative. A :exc:`ValueError` is raised if all
180 weights are zero.
Georg Brandl116aa622007-08-15 14:28:22 +0000181
Raymond Hettinger40ebe942019-01-30 13:30:20 -0800182 For a given seed, the :func:`choices` function with equal weighting
183 typically produces a different sequence than repeated calls to
184 :func:`choice`. The algorithm used by :func:`choices` uses floating
185 point arithmetic for internal consistency and speed. The algorithm used
186 by :func:`choice` defaults to integer arithmetic with repeated selections
187 to avoid small biases from round-off error.
188
Raymond Hettinger1c3a1212016-10-12 01:42:10 -0400189 .. versionadded:: 3.6
190
Raymond Hettinger041d8b42019-11-23 02:22:13 -0800191 .. versionchanged:: 3.9
192 Raises a :exc:`ValueError` if all weights are zero.
193
Raymond Hettinger1c3a1212016-10-12 01:42:10 -0400194
Georg Brandl116aa622007-08-15 14:28:22 +0000195.. function:: shuffle(x[, random])
196
Raymond Hettingera3950e42016-11-17 01:49:54 -0800197 Shuffle the sequence *x* in place.
Georg Brandl116aa622007-08-15 14:28:22 +0000198
Raymond Hettingera3950e42016-11-17 01:49:54 -0800199 The optional argument *random* is a 0-argument function returning a random
200 float in [0.0, 1.0); by default, this is the function :func:`.random`.
201
202 To shuffle an immutable sequence and return a new shuffled list, use
203 ``sample(x, k=len(x))`` instead.
204
205 Note that even for small ``len(x)``, the total number of permutations of *x*
206 can quickly grow larger than the period of most random number generators.
207 This implies that most permutations of a long sequence can never be
208 generated. For example, a sequence of length 2080 is the largest that
209 can fit within the period of the Mersenne Twister random number generator.
Georg Brandl116aa622007-08-15 14:28:22 +0000210
Raymond Hettinger190fac92020-05-02 16:45:32 -0700211 .. deprecated-removed:: 3.9 3.11
212 The optional parameter *random*.
213
Georg Brandl116aa622007-08-15 14:28:22 +0000214
215.. function:: sample(population, k)
216
Raymond Hettinger1acde192008-01-14 01:00:53 +0000217 Return a *k* length list of unique elements chosen from the population sequence
218 or set. Used for random sampling without replacement.
Georg Brandl116aa622007-08-15 14:28:22 +0000219
Georg Brandl116aa622007-08-15 14:28:22 +0000220 Returns a new list containing elements from the population while leaving the
221 original population unchanged. The resulting list is in selection order so that
222 all sub-slices will also be valid random samples. This allows raffle winners
223 (the sample) to be partitioned into grand prize and second place winners (the
224 subslices).
225
Guido van Rossum2cc30da2007-11-02 23:46:40 +0000226 Members of the population need not be :term:`hashable` or unique. If the population
Georg Brandl116aa622007-08-15 14:28:22 +0000227 contains repeats, then each occurrence is a possible selection in the sample.
228
Raymond Hettingera3950e42016-11-17 01:49:54 -0800229 To choose a sample from a range of integers, use a :func:`range` object as an
Georg Brandl116aa622007-08-15 14:28:22 +0000230 argument. This is especially fast and space efficient for sampling from a large
Raymond Hettingera3950e42016-11-17 01:49:54 -0800231 population: ``sample(range(10000000), k=60)``.
Georg Brandl116aa622007-08-15 14:28:22 +0000232
Raymond Hettingerf07d9492012-07-09 12:43:57 -0700233 If the sample size is larger than the population size, a :exc:`ValueError`
Raymond Hettinger86a20f82012-07-08 16:01:53 -0700234 is raised.
235
Raymond Hettinger4fe00202020-04-19 00:36:42 -0700236 .. deprecated:: 3.9
237 In the future, the *population* must be a sequence. Instances of
238 :class:`set` are no longer supported. The set must first be converted
239 to a :class:`list` or :class:`tuple`, preferably in a deterministic
240 order so that the sample is reproducible.
241
242
Raymond Hettingere1329102016-11-21 12:33:50 -0800243Real-valued distributions
244-------------------------
245
Georg Brandl116aa622007-08-15 14:28:22 +0000246The following functions generate specific real-valued distributions. Function
247parameters are named after the corresponding variables in the distribution's
248equation, as used in common mathematical practice; most of these equations can
249be found in any statistics text.
250
251
252.. function:: random()
253
254 Return the next random floating point number in the range [0.0, 1.0).
255
256
257.. function:: uniform(a, b)
258
Benjamin Petersonb58dda72009-01-18 22:27:04 +0000259 Return a random floating point number *N* such that ``a <= N <= b`` for
260 ``a <= b`` and ``b <= N <= a`` for ``b < a``.
Georg Brandl116aa622007-08-15 14:28:22 +0000261
Raymond Hettingerbe40db02009-06-11 23:12:14 +0000262 The end-point value ``b`` may or may not be included in the range
263 depending on floating-point rounding in the equation ``a + (b-a) * random()``.
Benjamin Peterson35e8c462008-04-24 02:34:53 +0000264
Georg Brandl73dd7c72011-09-17 20:36:28 +0200265
Christian Heimesfe337bf2008-03-23 21:54:12 +0000266.. function:: triangular(low, high, mode)
267
Benjamin Petersonb58dda72009-01-18 22:27:04 +0000268 Return a random floating point number *N* such that ``low <= N <= high`` and
Christian Heimescc47b052008-03-25 14:56:36 +0000269 with the specified *mode* between those bounds. The *low* and *high* bounds
270 default to zero and one. The *mode* argument defaults to the midpoint
271 between the bounds, giving a symmetric distribution.
Christian Heimesfe337bf2008-03-23 21:54:12 +0000272
Georg Brandl116aa622007-08-15 14:28:22 +0000273
274.. function:: betavariate(alpha, beta)
275
Benjamin Petersonb58dda72009-01-18 22:27:04 +0000276 Beta distribution. Conditions on the parameters are ``alpha > 0`` and
277 ``beta > 0``. Returned values range between 0 and 1.
Georg Brandl116aa622007-08-15 14:28:22 +0000278
279
280.. function:: expovariate(lambd)
281
Mark Dickinson2f947362009-01-07 17:54:07 +0000282 Exponential distribution. *lambd* is 1.0 divided by the desired
283 mean. It should be nonzero. (The parameter would be called
284 "lambda", but that is a reserved word in Python.) Returned values
285 range from 0 to positive infinity if *lambd* is positive, and from
286 negative infinity to 0 if *lambd* is negative.
Georg Brandl116aa622007-08-15 14:28:22 +0000287
288
289.. function:: gammavariate(alpha, beta)
290
Benjamin Petersonb58dda72009-01-18 22:27:04 +0000291 Gamma distribution. (*Not* the gamma function!) Conditions on the
292 parameters are ``alpha > 0`` and ``beta > 0``.
Georg Brandl116aa622007-08-15 14:28:22 +0000293
Georg Brandl73dd7c72011-09-17 20:36:28 +0200294 The probability distribution function is::
295
296 x ** (alpha - 1) * math.exp(-x / beta)
297 pdf(x) = --------------------------------------
298 math.gamma(alpha) * beta ** alpha
299
Georg Brandl116aa622007-08-15 14:28:22 +0000300
301.. function:: gauss(mu, sigma)
302
Benjamin Petersonb58dda72009-01-18 22:27:04 +0000303 Gaussian distribution. *mu* is the mean, and *sigma* is the standard
304 deviation. This is slightly faster than the :func:`normalvariate` function
305 defined below.
Georg Brandl116aa622007-08-15 14:28:22 +0000306
307
308.. function:: lognormvariate(mu, sigma)
309
310 Log normal distribution. If you take the natural logarithm of this
311 distribution, you'll get a normal distribution with mean *mu* and standard
312 deviation *sigma*. *mu* can have any value, and *sigma* must be greater than
313 zero.
314
315
316.. function:: normalvariate(mu, sigma)
317
318 Normal distribution. *mu* is the mean, and *sigma* is the standard deviation.
319
320
321.. function:: vonmisesvariate(mu, kappa)
322
323 *mu* is the mean angle, expressed in radians between 0 and 2\*\ *pi*, and *kappa*
324 is the concentration parameter, which must be greater than or equal to zero. If
325 *kappa* is equal to zero, this distribution reduces to a uniform random angle
326 over the range 0 to 2\*\ *pi*.
327
328
329.. function:: paretovariate(alpha)
330
331 Pareto distribution. *alpha* is the shape parameter.
332
333
334.. function:: weibullvariate(alpha, beta)
335
336 Weibull distribution. *alpha* is the scale parameter and *beta* is the shape
337 parameter.
338
339
Raymond Hettingere1329102016-11-21 12:33:50 -0800340Alternative Generator
341---------------------
Georg Brandl116aa622007-08-15 14:28:22 +0000342
Matthias Bussonnier31e8d692019-04-16 09:47:11 -0700343.. class:: Random([seed])
344
345 Class that implements the default pseudo-random number generator used by the
346 :mod:`random` module.
347
Raymond Hettingerd0cdeaa2019-08-22 09:19:36 -0700348 .. deprecated:: 3.9
349 In the future, the *seed* must be one of the following types:
350 :class:`NoneType`, :class:`int`, :class:`float`, :class:`str`,
351 :class:`bytes`, or :class:`bytearray`.
352
Georg Brandl116aa622007-08-15 14:28:22 +0000353.. class:: SystemRandom([seed])
354
355 Class that uses the :func:`os.urandom` function for generating random numbers
356 from sources provided by the operating system. Not available on all systems.
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000357 Does not rely on software state, and sequences are not reproducible. Accordingly,
Raymond Hettingerafd30452009-02-24 10:57:02 +0000358 the :meth:`seed` method has no effect and is ignored.
Georg Brandl116aa622007-08-15 14:28:22 +0000359 The :meth:`getstate` and :meth:`setstate` methods raise
360 :exc:`NotImplementedError` if called.
361
Georg Brandl116aa622007-08-15 14:28:22 +0000362
Raymond Hettinger435cb0f2010-09-06 23:36:31 +0000363Notes on Reproducibility
Antoine Pitroue72b5862010-12-12 20:13:31 +0000364------------------------
Raymond Hettinger435cb0f2010-09-06 23:36:31 +0000365
Julien Palard58a40542020-01-31 10:50:14 +0100366Sometimes it is useful to be able to reproduce the sequences given by a
367pseudo-random number generator. By re-using a seed value, the same sequence should be
Raymond Hettinger435cb0f2010-09-06 23:36:31 +0000368reproducible from run to run as long as multiple threads are not running.
369
370Most of the random module's algorithms and seeding functions are subject to
371change across Python versions, but two aspects are guaranteed not to change:
372
373* If a new seeding method is added, then a backward compatible seeder will be
374 offered.
375
Georg Brandl92849d12016-02-19 08:57:38 +0100376* The generator's :meth:`~Random.random` method will continue to produce the same
Raymond Hettinger435cb0f2010-09-06 23:36:31 +0000377 sequence when the compatible seeder is given the same seed.
Raymond Hettinger2fdc7b12010-12-02 02:41:33 +0000378
Raymond Hettinger6e353942010-12-04 23:42:12 +0000379.. _random-examples:
Raymond Hettinger2fdc7b12010-12-02 02:41:33 +0000380
Raymond Hettinger2fdc7b12010-12-02 02:41:33 +0000381Examples and Recipes
Antoine Pitroue72b5862010-12-12 20:13:31 +0000382--------------------
Raymond Hettinger2fdc7b12010-12-02 02:41:33 +0000383
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800384Basic examples::
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000385
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800386 >>> random() # Random float: 0.0 <= x < 1.0
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000387 0.37444887175646646
388
Raymond Hettingere1329102016-11-21 12:33:50 -0800389 >>> uniform(2.5, 10.0) # Random float: 2.5 <= x < 10.0
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800390 3.1800146073117523
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000391
Raymond Hettingere1329102016-11-21 12:33:50 -0800392 >>> expovariate(1 / 5) # Interval between arrivals averaging 5 seconds
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800393 5.148957571865031
394
Raymond Hettingere1329102016-11-21 12:33:50 -0800395 >>> randrange(10) # Integer from 0 to 9 inclusive
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000396 7
397
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800398 >>> randrange(0, 101, 2) # Even integer from 0 to 100 inclusive
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000399 26
400
Raymond Hettinger6befb642016-11-21 01:59:39 -0800401 >>> choice(['win', 'lose', 'draw']) # Single random element from a sequence
402 'draw'
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000403
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800404 >>> deck = 'ace two three four'.split()
405 >>> shuffle(deck) # Shuffle a list
Raymond Hettinger1c3a1212016-10-12 01:42:10 -0400406 >>> deck
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800407 ['four', 'two', 'ace', 'three']
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000408
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800409 >>> sample([10, 20, 30, 40, 50], k=4) # Four samples without replacement
410 [40, 10, 50, 30]
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000411
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800412Simulations::
413
Raymond Hettinger71c62e12016-12-04 11:00:34 -0800414 >>> # Six roulette wheel spins (weighted sampling with replacement)
Raymond Hettinger1c3a1212016-10-12 01:42:10 -0400415 >>> choices(['red', 'black', 'green'], [18, 18, 2], k=6)
416 ['red', 'green', 'black', 'black', 'red', 'black']
Raymond Hettinger2fdc7b12010-12-02 02:41:33 +0000417
Raymond Hettinger71c62e12016-12-04 11:00:34 -0800418 >>> # Deal 20 cards without replacement from a deck of 52 playing cards
419 >>> # and determine the proportion of cards with a ten-value
420 >>> # (a ten, jack, queen, or king).
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800421 >>> deck = collections.Counter(tens=16, low_cards=36)
422 >>> seen = sample(list(deck.elements()), k=20)
Raymond Hettinger71c62e12016-12-04 11:00:34 -0800423 >>> seen.count('tens') / 20
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800424 0.15
425
Raymond Hettinger71c62e12016-12-04 11:00:34 -0800426 >>> # Estimate the probability of getting 5 or more heads from 7 spins
427 >>> # of a biased coin that settles on heads 60% of the time.
Raymond Hettinger9abb7252019-02-15 12:40:18 -0800428 >>> def trial():
429 ... return choices('HT', cum_weights=(0.60, 1.00), k=7).count('H') >= 5
430 ...
Raymond Hettingerd3a8d612020-04-21 16:11:00 -0700431 >>> sum(trial() for i in range(10_000)) / 10_000
Raymond Hettinger16ef5d42016-10-31 22:53:52 -0700432 0.4169
433
Raymond Hettinger71c62e12016-12-04 11:00:34 -0800434 >>> # Probability of the median of 5 samples being in middle two quartiles
Raymond Hettinger9abb7252019-02-15 12:40:18 -0800435 >>> def trial():
Raymond Hettingerd3a8d612020-04-21 16:11:00 -0700436 ... return 2_500 <= sorted(choices(range(10_000), k=5))[2] < 7_500
Raymond Hettinger9abb7252019-02-15 12:40:18 -0800437 ...
Raymond Hettingerd3a8d612020-04-21 16:11:00 -0700438 >>> sum(trial() for i in range(10_000)) / 10_000
Raymond Hettinger71c62e12016-12-04 11:00:34 -0800439 0.7958
440
Raymond Hettinger1c3a1212016-10-12 01:42:10 -0400441Example of `statistical bootstrapping
442<https://en.wikipedia.org/wiki/Bootstrapping_(statistics)>`_ using resampling
Raymond Hettingerd3a8d612020-04-21 16:11:00 -0700443with replacement to estimate a confidence interval for the mean of a sample::
Raymond Hettinger2fdc7b12010-12-02 02:41:33 +0000444
Raymond Hettinger1c3a1212016-10-12 01:42:10 -0400445 # http://statistics.about.com/od/Applications/a/Example-Of-Bootstrapping.htm
Raymond Hettinger47d99872019-02-21 15:06:29 -0800446 from statistics import fmean as mean
Raymond Hettinger1c3a1212016-10-12 01:42:10 -0400447 from random import choices
Raymond Hettingerf5b7c7b2016-09-05 13:15:02 -0700448
Raymond Hettingerd3a8d612020-04-21 16:11:00 -0700449 data = [41, 50, 29, 37, 81, 30, 73, 63, 20, 35, 68, 22, 60, 31, 95]
450 means = sorted(mean(choices(data, k=len(data))) for i in range(100))
Raymond Hettinger2589ee32016-11-16 21:34:17 -0800451 print(f'The sample mean of {mean(data):.1f} has a 90% confidence '
Raymond Hettingerd3a8d612020-04-21 16:11:00 -0700452 f'interval from {means[5]:.1f} to {means[94]:.1f}')
Raymond Hettinger2589ee32016-11-16 21:34:17 -0800453
Raymond Hettinger00305ad2016-11-16 22:56:11 -0800454Example of a `resampling permutation test
455<https://en.wikipedia.org/wiki/Resampling_(statistics)#Permutation_tests>`_
456to determine the statistical significance or `p-value
457<https://en.wikipedia.org/wiki/P-value>`_ of an observed difference
458between the effects of a drug versus a placebo::
459
460 # Example from "Statistics is Easy" by Dennis Shasha and Manda Wilson
Raymond Hettinger47d99872019-02-21 15:06:29 -0800461 from statistics import fmean as mean
Raymond Hettinger00305ad2016-11-16 22:56:11 -0800462 from random import shuffle
463
464 drug = [54, 73, 53, 70, 73, 68, 52, 65, 65]
465 placebo = [54, 51, 58, 44, 55, 52, 42, 47, 58, 46]
466 observed_diff = mean(drug) - mean(placebo)
467
Raymond Hettingerd3a8d612020-04-21 16:11:00 -0700468 n = 10_000
Raymond Hettinger00305ad2016-11-16 22:56:11 -0800469 count = 0
470 combined = drug + placebo
471 for i in range(n):
472 shuffle(combined)
473 new_diff = mean(combined[:len(drug)]) - mean(combined[len(drug):])
474 count += (new_diff >= observed_diff)
475
476 print(f'{n} label reshufflings produced only {count} instances with a difference')
477 print(f'at least as extreme as the observed difference of {observed_diff:.1f}.')
478 print(f'The one-sided p-value of {count / n:.4f} leads us to reject the null')
Raymond Hettinger6befb642016-11-21 01:59:39 -0800479 print(f'hypothesis that there is no difference between the drug and the placebo.')
480
Raymond Hettingerd3a8d612020-04-21 16:11:00 -0700481Simulation of arrival times and service deliveries for a multiserver queue::
Raymond Hettinger6befb642016-11-21 01:59:39 -0800482
Raymond Hettingerd3a8d612020-04-21 16:11:00 -0700483 from heapq import heappush, heappop
Raymond Hettinger1149d932016-11-21 14:13:07 -0800484 from random import expovariate, gauss
485 from statistics import mean, median, stdev
Raymond Hettinger6befb642016-11-21 01:59:39 -0800486
487 average_arrival_interval = 5.6
Raymond Hettingerd3a8d612020-04-21 16:11:00 -0700488 average_service_time = 15.0
489 stdev_service_time = 3.5
490 num_servers = 3
Raymond Hettinger6befb642016-11-21 01:59:39 -0800491
Raymond Hettingerd3a8d612020-04-21 16:11:00 -0700492 waits = []
493 arrival_time = 0.0
494 servers = [0.0] * num_servers # time when each server becomes available
495 for i in range(100_000):
496 arrival_time += expovariate(1.0 / average_arrival_interval)
497 next_server_available = heappop(servers)
498 wait = max(0.0, next_server_available - arrival_time)
499 waits.append(wait)
500 service_duration = gauss(average_service_time, stdev_service_time)
501 service_completed = arrival_time + wait + service_duration
502 heappush(servers, service_completed)
Raymond Hettinger1149d932016-11-21 14:13:07 -0800503
Raymond Hettinger1149d932016-11-21 14:13:07 -0800504 print(f'Mean wait: {mean(waits):.1f}. Stdev wait: {stdev(waits):.1f}.')
505 print(f'Median wait: {median(waits):.1f}. Max wait: {max(waits):.1f}.')
Raymond Hettinger6befb642016-11-21 01:59:39 -0800506
Raymond Hettinger05374052016-11-21 10:52:04 -0800507.. seealso::
508
509 `Statistics for Hackers <https://www.youtube.com/watch?v=Iq9DzN6mvYA>`_
510 a video tutorial by
511 `Jake Vanderplas <https://us.pycon.org/2016/speaker/profile/295/>`_
512 on statistical analysis using just a few fundamental concepts
513 including simulation, sampling, shuffling, and cross-validation.
514
515 `Economics Simulation
516 <http://nbviewer.jupyter.org/url/norvig.com/ipython/Economics.ipynb>`_
517 a simulation of a marketplace by
518 `Peter Norvig <http://norvig.com/bio.html>`_ that shows effective
Raymond Hettinger7f946192016-11-21 15:13:18 -0800519 use of many of the tools and distributions provided by this module
Raymond Hettinger05374052016-11-21 10:52:04 -0800520 (gauss, uniform, sample, betavariate, choice, triangular, and randrange).
521
522 `A Concrete Introduction to Probability (using Python)
523 <http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability.ipynb>`_
524 a tutorial by `Peter Norvig <http://norvig.com/bio.html>`_ covering
525 the basics of probability theory, how to write simulations, and
Raymond Hettinger7f946192016-11-21 15:13:18 -0800526 how to perform data analysis using Python.