blob: 9d85c2b9958eb2c34698d7afed9d28eac230764d [file] [log] [blame]
Georg Brandl116aa622007-08-15 14:28:22 +00001:mod:`random` --- Generate pseudo-random numbers
2================================================
3
4.. module:: random
5 :synopsis: Generate pseudo-random numbers with various common distributions.
6
Raymond Hettinger10480942011-01-10 03:26:08 +00007**Source code:** :source:`Lib/random.py`
Georg Brandl116aa622007-08-15 14:28:22 +00008
Raymond Hettinger4f707fd2011-01-10 19:54:11 +00009--------------
10
Georg Brandl116aa622007-08-15 14:28:22 +000011This module implements pseudo-random number generators for various
12distributions.
13
Raymond Hettingerb21dac12010-09-07 05:32:49 +000014For integers, there is uniform selection from a range. For sequences, there is
15uniform selection of a random element, a function to generate a random
16permutation of a list in-place, and a function for random sampling without
17replacement.
Georg Brandl116aa622007-08-15 14:28:22 +000018
19On the real line, there are functions to compute uniform, normal (Gaussian),
20lognormal, negative exponential, gamma, and beta distributions. For generating
21distributions of angles, the von Mises distribution is available.
22
Georg Brandl92849d12016-02-19 08:57:38 +010023Almost all module functions depend on the basic function :func:`.random`, which
Georg Brandl116aa622007-08-15 14:28:22 +000024generates a random float uniformly in the semi-open range [0.0, 1.0). Python
25uses the Mersenne Twister as the core generator. It produces 53-bit precision
26floats and has a period of 2\*\*19937-1. The underlying implementation in C is
27both fast and threadsafe. The Mersenne Twister is one of the most extensively
28tested random number generators in existence. However, being completely
29deterministic, it is not suitable for all purposes, and is completely unsuitable
30for cryptographic purposes.
31
32The functions supplied by this module are actually bound methods of a hidden
33instance of the :class:`random.Random` class. You can instantiate your own
Raymond Hettinger28de64f2008-01-13 23:40:30 +000034instances of :class:`Random` to get generators that don't share state.
Georg Brandl116aa622007-08-15 14:28:22 +000035
36Class :class:`Random` can also be subclassed if you want to use a different
Georg Brandl92849d12016-02-19 08:57:38 +010037basic generator of your own devising: in that case, override the :meth:`~Random.random`,
38:meth:`~Random.seed`, :meth:`~Random.getstate`, and :meth:`~Random.setstate` methods.
39Optionally, a new generator can supply a :meth:`~Random.getrandbits` method --- this
Georg Brandl116aa622007-08-15 14:28:22 +000040allows :meth:`randrange` to produce selections over an arbitrarily large range.
41
Benjamin Peterson21896a32010-03-21 22:03:03 +000042The :mod:`random` module also provides the :class:`SystemRandom` class which
43uses the system function :func:`os.urandom` to generate random numbers
44from sources provided by the operating system.
Georg Brandl116aa622007-08-15 14:28:22 +000045
Raymond Hettingerc89a4512014-05-11 02:26:23 -070046.. warning::
47
48 The pseudo-random generators of this module should not be used for
Steven D'Apranob2871fa2016-04-17 01:42:33 +100049 security purposes. For security or cryptographic uses, see the
50 :mod:`secrets` module.
Raymond Hettingerc89a4512014-05-11 02:26:23 -070051
Raymond Hettingere1329102016-11-21 12:33:50 -080052.. seealso::
Georg Brandl116aa622007-08-15 14:28:22 +000053
Raymond Hettingere1329102016-11-21 12:33:50 -080054 M. Matsumoto and T. Nishimura, "Mersenne Twister: A 623-dimensionally
55 equidistributed uniform pseudorandom number generator", ACM Transactions on
Serhiy Storchaka0264e462016-11-26 13:49:59 +020056 Modeling and Computer Simulation Vol. 8, No. 1, January pp.3--30 1998.
Raymond Hettingere1329102016-11-21 12:33:50 -080057
58
59 `Complementary-Multiply-with-Carry recipe
Andre Delfinoe8a20762020-09-26 21:47:25 -030060 <https://code.activestate.com/recipes/576707/>`_ for a compatible alternative
Raymond Hettingere1329102016-11-21 12:33:50 -080061 random number generator with a long period and comparatively simple update
62 operations.
63
64
65Bookkeeping functions
66---------------------
Georg Brandl116aa622007-08-15 14:28:22 +000067
Ezio Melottie0add762012-09-14 06:32:35 +030068.. function:: seed(a=None, version=2)
Georg Brandl116aa622007-08-15 14:28:22 +000069
Raymond Hettingerf763a722010-09-07 00:38:15 +000070 Initialize the random number generator.
Georg Brandl116aa622007-08-15 14:28:22 +000071
Ezio Melottie0add762012-09-14 06:32:35 +030072 If *a* is omitted or ``None``, the current system time is used. If
Raymond Hettingerf763a722010-09-07 00:38:15 +000073 randomness sources are provided by the operating system, they are used
74 instead of the system time (see the :func:`os.urandom` function for details
75 on availability).
Georg Brandl116aa622007-08-15 14:28:22 +000076
Ezio Melottie0add762012-09-14 06:32:35 +030077 If *a* is an int, it is used directly.
Raymond Hettingerf763a722010-09-07 00:38:15 +000078
79 With version 2 (the default), a :class:`str`, :class:`bytes`, or :class:`bytearray`
Raymond Hettinger16eb8272016-09-04 11:17:28 -070080 object gets converted to an :class:`int` and all of its bits are used.
81
82 With version 1 (provided for reproducing random sequences from older versions
83 of Python), the algorithm for :class:`str` and :class:`bytes` generates a
84 narrower range of seeds.
Raymond Hettingerf763a722010-09-07 00:38:15 +000085
86 .. versionchanged:: 3.2
87 Moved to the version 2 scheme which uses all of the bits in a string seed.
Georg Brandl116aa622007-08-15 14:28:22 +000088
Raymond Hettingerd0cdeaa2019-08-22 09:19:36 -070089 .. deprecated:: 3.9
90 In the future, the *seed* must be one of the following types:
91 *NoneType*, :class:`int`, :class:`float`, :class:`str`,
92 :class:`bytes`, or :class:`bytearray`.
93
Georg Brandl116aa622007-08-15 14:28:22 +000094.. function:: getstate()
95
96 Return an object capturing the current internal state of the generator. This
97 object can be passed to :func:`setstate` to restore the state.
98
Georg Brandl116aa622007-08-15 14:28:22 +000099
100.. function:: setstate(state)
101
102 *state* should have been obtained from a previous call to :func:`getstate`, and
103 :func:`setstate` restores the internal state of the generator to what it was at
Sandro Tosi985104a2012-08-12 15:12:15 +0200104 the time :func:`getstate` was called.
Georg Brandl116aa622007-08-15 14:28:22 +0000105
Georg Brandl116aa622007-08-15 14:28:22 +0000106
Raymond Hettingerf01d1be2020-05-04 22:52:13 -0700107Functions for bytes
108-------------------
Georg Brandl116aa622007-08-15 14:28:22 +0000109
Victor Stinner9f5fe792020-04-17 19:05:35 +0200110.. function:: randbytes(n)
111
112 Generate *n* random bytes.
113
Raymond Hettingerf01d1be2020-05-04 22:52:13 -0700114 This method should not be used for generating security tokens.
115 Use :func:`secrets.token_bytes` instead.
116
Victor Stinner9f5fe792020-04-17 19:05:35 +0200117 .. versionadded:: 3.9
118
119
Raymond Hettingere1329102016-11-21 12:33:50 -0800120Functions for integers
121----------------------
Georg Brandl116aa622007-08-15 14:28:22 +0000122
Ezio Melottie0add762012-09-14 06:32:35 +0300123.. function:: randrange(stop)
124 randrange(start, stop[, step])
Georg Brandl116aa622007-08-15 14:28:22 +0000125
126 Return a randomly selected element from ``range(start, stop, step)``. This is
127 equivalent to ``choice(range(start, stop, step))``, but doesn't actually build a
128 range object.
129
Raymond Hettinger05156612010-09-07 04:44:52 +0000130 The positional argument pattern matches that of :func:`range`. Keyword arguments
131 should not be used because the function may use them in unexpected ways.
132
133 .. versionchanged:: 3.2
134 :meth:`randrange` is more sophisticated about producing equally distributed
135 values. Formerly it used a style like ``int(random()*n)`` which could produce
136 slightly uneven distributions.
Georg Brandl116aa622007-08-15 14:28:22 +0000137
Raymond Hettingera9621bb2020-12-28 11:10:34 -0800138 .. deprecated:: 3.10
139 The automatic conversion of non-integer types to equivalent integers is
140 deprecated. Currently ``randrange(10.0)`` is losslessly converted to
141 ``randrange(10)``. In the future, this will raise a :exc:`TypeError`.
142
143 .. deprecated:: 3.10
Serhiy Storchakaf066bd92021-01-25 23:02:04 +0200144 The exception raised for non-integral values such as ``randrange(10.5)``
145 or ``randrange('10')`` will be changed from :exc:`ValueError` to
146 :exc:`TypeError`.
Raymond Hettingera9621bb2020-12-28 11:10:34 -0800147
Georg Brandl116aa622007-08-15 14:28:22 +0000148.. function:: randint(a, b)
149
Raymond Hettingerafd30452009-02-24 10:57:02 +0000150 Return a random integer *N* such that ``a <= N <= b``. Alias for
151 ``randrange(a, b+1)``.
Georg Brandl116aa622007-08-15 14:28:22 +0000152
Raymond Hettingerf01d1be2020-05-04 22:52:13 -0700153.. function:: getrandbits(k)
154
Raymond Hettinger56464142020-12-18 17:03:10 -0800155 Returns a non-negative Python integer with *k* random bits. This method
156 is supplied with the MersenneTwister generator and some other generators
157 may also provide it as an optional part of the API. When available,
158 :meth:`getrandbits` enables :meth:`randrange` to handle arbitrarily large
159 ranges.
Raymond Hettingerf01d1be2020-05-04 22:52:13 -0700160
161 .. versionchanged:: 3.9
162 This method now accepts zero for *k*.
163
Georg Brandl116aa622007-08-15 14:28:22 +0000164
Raymond Hettingere1329102016-11-21 12:33:50 -0800165Functions for sequences
166-----------------------
Georg Brandl116aa622007-08-15 14:28:22 +0000167
168.. function:: choice(seq)
169
170 Return a random element from the non-empty sequence *seq*. If *seq* is empty,
171 raises :exc:`IndexError`.
172
Raymond Hettinger9016f282016-09-26 21:45:57 -0700173.. function:: choices(population, weights=None, *, cum_weights=None, k=1)
Raymond Hettingere8f1e002016-09-06 17:15:29 -0700174
175 Return a *k* sized list of elements chosen from the *population* with replacement.
176 If the *population* is empty, raises :exc:`IndexError`.
177
178 If a *weights* sequence is specified, selections are made according to the
179 relative weights. Alternatively, if a *cum_weights* sequence is given, the
Raymond Hettinger1c3a1212016-10-12 01:42:10 -0400180 selections are made according to the cumulative weights (perhaps computed
181 using :func:`itertools.accumulate`). For example, the relative weights
182 ``[10, 5, 30, 5]`` are equivalent to the cumulative weights
183 ``[10, 15, 45, 50]``. Internally, the relative weights are converted to
184 cumulative weights before making selections, so supplying the cumulative
185 weights saves work.
Raymond Hettingere8f1e002016-09-06 17:15:29 -0700186
187 If neither *weights* nor *cum_weights* are specified, selections are made
188 with equal probability. If a weights sequence is supplied, it must be
189 the same length as the *population* sequence. It is a :exc:`TypeError`
190 to specify both *weights* and *cum_weights*.
191
192 The *weights* or *cum_weights* can use any numeric type that interoperates
193 with the :class:`float` values returned by :func:`random` (that includes
Ram Rachumb0dfc752020-09-29 04:32:10 +0300194 integers, floats, and fractions but excludes decimals). Weights are assumed
195 to be non-negative and finite. A :exc:`ValueError` is raised if all
Raymond Hettinger041d8b42019-11-23 02:22:13 -0800196 weights are zero.
Georg Brandl116aa622007-08-15 14:28:22 +0000197
Raymond Hettinger40ebe942019-01-30 13:30:20 -0800198 For a given seed, the :func:`choices` function with equal weighting
199 typically produces a different sequence than repeated calls to
200 :func:`choice`. The algorithm used by :func:`choices` uses floating
201 point arithmetic for internal consistency and speed. The algorithm used
202 by :func:`choice` defaults to integer arithmetic with repeated selections
203 to avoid small biases from round-off error.
204
Raymond Hettinger1c3a1212016-10-12 01:42:10 -0400205 .. versionadded:: 3.6
206
Raymond Hettinger041d8b42019-11-23 02:22:13 -0800207 .. versionchanged:: 3.9
208 Raises a :exc:`ValueError` if all weights are zero.
209
Raymond Hettinger1c3a1212016-10-12 01:42:10 -0400210
Georg Brandl116aa622007-08-15 14:28:22 +0000211.. function:: shuffle(x[, random])
212
Raymond Hettingera3950e42016-11-17 01:49:54 -0800213 Shuffle the sequence *x* in place.
Georg Brandl116aa622007-08-15 14:28:22 +0000214
Raymond Hettingera3950e42016-11-17 01:49:54 -0800215 The optional argument *random* is a 0-argument function returning a random
216 float in [0.0, 1.0); by default, this is the function :func:`.random`.
217
218 To shuffle an immutable sequence and return a new shuffled list, use
219 ``sample(x, k=len(x))`` instead.
220
221 Note that even for small ``len(x)``, the total number of permutations of *x*
222 can quickly grow larger than the period of most random number generators.
223 This implies that most permutations of a long sequence can never be
224 generated. For example, a sequence of length 2080 is the largest that
225 can fit within the period of the Mersenne Twister random number generator.
Georg Brandl116aa622007-08-15 14:28:22 +0000226
Raymond Hettinger190fac92020-05-02 16:45:32 -0700227 .. deprecated-removed:: 3.9 3.11
228 The optional parameter *random*.
229
Georg Brandl116aa622007-08-15 14:28:22 +0000230
Raymond Hettinger81a5fc32020-05-08 07:53:15 -0700231.. function:: sample(population, k, *, counts=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000232
Raymond Hettinger1acde192008-01-14 01:00:53 +0000233 Return a *k* length list of unique elements chosen from the population sequence
234 or set. Used for random sampling without replacement.
Georg Brandl116aa622007-08-15 14:28:22 +0000235
Georg Brandl116aa622007-08-15 14:28:22 +0000236 Returns a new list containing elements from the population while leaving the
237 original population unchanged. The resulting list is in selection order so that
238 all sub-slices will also be valid random samples. This allows raffle winners
239 (the sample) to be partitioned into grand prize and second place winners (the
240 subslices).
241
Guido van Rossum2cc30da2007-11-02 23:46:40 +0000242 Members of the population need not be :term:`hashable` or unique. If the population
Georg Brandl116aa622007-08-15 14:28:22 +0000243 contains repeats, then each occurrence is a possible selection in the sample.
244
Raymond Hettinger81a5fc32020-05-08 07:53:15 -0700245 Repeated elements can be specified one at a time or with the optional
246 keyword-only *counts* parameter. For example, ``sample(['red', 'blue'],
247 counts=[4, 2], k=5)`` is equivalent to ``sample(['red', 'red', 'red', 'red',
248 'blue', 'blue'], k=5)``.
249
Raymond Hettingera3950e42016-11-17 01:49:54 -0800250 To choose a sample from a range of integers, use a :func:`range` object as an
Georg Brandl116aa622007-08-15 14:28:22 +0000251 argument. This is especially fast and space efficient for sampling from a large
Raymond Hettingera3950e42016-11-17 01:49:54 -0800252 population: ``sample(range(10000000), k=60)``.
Georg Brandl116aa622007-08-15 14:28:22 +0000253
Raymond Hettingerf07d9492012-07-09 12:43:57 -0700254 If the sample size is larger than the population size, a :exc:`ValueError`
Raymond Hettinger86a20f82012-07-08 16:01:53 -0700255 is raised.
256
Raymond Hettinger81a5fc32020-05-08 07:53:15 -0700257 .. versionchanged:: 3.9
258 Added the *counts* parameter.
259
Raymond Hettinger4fe00202020-04-19 00:36:42 -0700260 .. deprecated:: 3.9
261 In the future, the *population* must be a sequence. Instances of
262 :class:`set` are no longer supported. The set must first be converted
263 to a :class:`list` or :class:`tuple`, preferably in a deterministic
264 order so that the sample is reproducible.
265
266
Raymond Hettingerf2bd04f2020-10-13 16:41:26 -0700267.. _real-valued-distributions:
268
Raymond Hettingere1329102016-11-21 12:33:50 -0800269Real-valued distributions
270-------------------------
271
Georg Brandl116aa622007-08-15 14:28:22 +0000272The following functions generate specific real-valued distributions. Function
273parameters are named after the corresponding variables in the distribution's
274equation, as used in common mathematical practice; most of these equations can
275be found in any statistics text.
276
277
278.. function:: random()
279
280 Return the next random floating point number in the range [0.0, 1.0).
281
282
283.. function:: uniform(a, b)
284
Benjamin Petersonb58dda72009-01-18 22:27:04 +0000285 Return a random floating point number *N* such that ``a <= N <= b`` for
286 ``a <= b`` and ``b <= N <= a`` for ``b < a``.
Georg Brandl116aa622007-08-15 14:28:22 +0000287
Raymond Hettingerbe40db02009-06-11 23:12:14 +0000288 The end-point value ``b`` may or may not be included in the range
289 depending on floating-point rounding in the equation ``a + (b-a) * random()``.
Benjamin Peterson35e8c462008-04-24 02:34:53 +0000290
Georg Brandl73dd7c72011-09-17 20:36:28 +0200291
Christian Heimesfe337bf2008-03-23 21:54:12 +0000292.. function:: triangular(low, high, mode)
293
Benjamin Petersonb58dda72009-01-18 22:27:04 +0000294 Return a random floating point number *N* such that ``low <= N <= high`` and
Christian Heimescc47b052008-03-25 14:56:36 +0000295 with the specified *mode* between those bounds. The *low* and *high* bounds
296 default to zero and one. The *mode* argument defaults to the midpoint
297 between the bounds, giving a symmetric distribution.
Christian Heimesfe337bf2008-03-23 21:54:12 +0000298
Georg Brandl116aa622007-08-15 14:28:22 +0000299
300.. function:: betavariate(alpha, beta)
301
Benjamin Petersonb58dda72009-01-18 22:27:04 +0000302 Beta distribution. Conditions on the parameters are ``alpha > 0`` and
303 ``beta > 0``. Returned values range between 0 and 1.
Georg Brandl116aa622007-08-15 14:28:22 +0000304
305
306.. function:: expovariate(lambd)
307
Mark Dickinson2f947362009-01-07 17:54:07 +0000308 Exponential distribution. *lambd* is 1.0 divided by the desired
309 mean. It should be nonzero. (The parameter would be called
310 "lambda", but that is a reserved word in Python.) Returned values
311 range from 0 to positive infinity if *lambd* is positive, and from
312 negative infinity to 0 if *lambd* is negative.
Georg Brandl116aa622007-08-15 14:28:22 +0000313
314
315.. function:: gammavariate(alpha, beta)
316
Benjamin Petersonb58dda72009-01-18 22:27:04 +0000317 Gamma distribution. (*Not* the gamma function!) Conditions on the
318 parameters are ``alpha > 0`` and ``beta > 0``.
Georg Brandl116aa622007-08-15 14:28:22 +0000319
Georg Brandl73dd7c72011-09-17 20:36:28 +0200320 The probability distribution function is::
321
322 x ** (alpha - 1) * math.exp(-x / beta)
323 pdf(x) = --------------------------------------
324 math.gamma(alpha) * beta ** alpha
325
Georg Brandl116aa622007-08-15 14:28:22 +0000326
327.. function:: gauss(mu, sigma)
328
Benjamin Petersonb58dda72009-01-18 22:27:04 +0000329 Gaussian distribution. *mu* is the mean, and *sigma* is the standard
330 deviation. This is slightly faster than the :func:`normalvariate` function
331 defined below.
Georg Brandl116aa622007-08-15 14:28:22 +0000332
Raymond Hettinger3cde3782020-10-25 07:59:01 -0700333 Multithreading note: When two threads call this function
334 simultaneously, it is possible that they will receive the
335 same return value. This can be avoided in three ways.
336 1) Have each thread use a different instance of the random
337 number generator. 2) Put locks around all calls. 3) Use the
338 slower, but thread-safe :func:`normalvariate` function instead.
339
Georg Brandl116aa622007-08-15 14:28:22 +0000340
341.. function:: lognormvariate(mu, sigma)
342
343 Log normal distribution. If you take the natural logarithm of this
344 distribution, you'll get a normal distribution with mean *mu* and standard
345 deviation *sigma*. *mu* can have any value, and *sigma* must be greater than
346 zero.
347
348
349.. function:: normalvariate(mu, sigma)
350
351 Normal distribution. *mu* is the mean, and *sigma* is the standard deviation.
352
353
354.. function:: vonmisesvariate(mu, kappa)
355
356 *mu* is the mean angle, expressed in radians between 0 and 2\*\ *pi*, and *kappa*
357 is the concentration parameter, which must be greater than or equal to zero. If
358 *kappa* is equal to zero, this distribution reduces to a uniform random angle
359 over the range 0 to 2\*\ *pi*.
360
361
362.. function:: paretovariate(alpha)
363
364 Pareto distribution. *alpha* is the shape parameter.
365
366
367.. function:: weibullvariate(alpha, beta)
368
369 Weibull distribution. *alpha* is the scale parameter and *beta* is the shape
370 parameter.
371
372
Raymond Hettingere1329102016-11-21 12:33:50 -0800373Alternative Generator
374---------------------
Georg Brandl116aa622007-08-15 14:28:22 +0000375
Matthias Bussonnier31e8d692019-04-16 09:47:11 -0700376.. class:: Random([seed])
377
378 Class that implements the default pseudo-random number generator used by the
379 :mod:`random` module.
380
Raymond Hettingerd0cdeaa2019-08-22 09:19:36 -0700381 .. deprecated:: 3.9
382 In the future, the *seed* must be one of the following types:
383 :class:`NoneType`, :class:`int`, :class:`float`, :class:`str`,
384 :class:`bytes`, or :class:`bytearray`.
385
Georg Brandl116aa622007-08-15 14:28:22 +0000386.. class:: SystemRandom([seed])
387
388 Class that uses the :func:`os.urandom` function for generating random numbers
389 from sources provided by the operating system. Not available on all systems.
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000390 Does not rely on software state, and sequences are not reproducible. Accordingly,
Raymond Hettingerafd30452009-02-24 10:57:02 +0000391 the :meth:`seed` method has no effect and is ignored.
Georg Brandl116aa622007-08-15 14:28:22 +0000392 The :meth:`getstate` and :meth:`setstate` methods raise
393 :exc:`NotImplementedError` if called.
394
Georg Brandl116aa622007-08-15 14:28:22 +0000395
Raymond Hettinger435cb0f2010-09-06 23:36:31 +0000396Notes on Reproducibility
Antoine Pitroue72b5862010-12-12 20:13:31 +0000397------------------------
Raymond Hettinger435cb0f2010-09-06 23:36:31 +0000398
Julien Palard58a40542020-01-31 10:50:14 +0100399Sometimes it is useful to be able to reproduce the sequences given by a
400pseudo-random number generator. By re-using a seed value, the same sequence should be
Raymond Hettinger435cb0f2010-09-06 23:36:31 +0000401reproducible from run to run as long as multiple threads are not running.
402
403Most of the random module's algorithms and seeding functions are subject to
404change across Python versions, but two aspects are guaranteed not to change:
405
406* If a new seeding method is added, then a backward compatible seeder will be
407 offered.
408
Georg Brandl92849d12016-02-19 08:57:38 +0100409* The generator's :meth:`~Random.random` method will continue to produce the same
Raymond Hettinger435cb0f2010-09-06 23:36:31 +0000410 sequence when the compatible seeder is given the same seed.
Raymond Hettinger2fdc7b12010-12-02 02:41:33 +0000411
Raymond Hettinger6e353942010-12-04 23:42:12 +0000412.. _random-examples:
Raymond Hettinger2fdc7b12010-12-02 02:41:33 +0000413
Raymond Hettinger8b2ff4c2020-10-13 11:54:21 -0700414Examples
415--------
Raymond Hettinger2fdc7b12010-12-02 02:41:33 +0000416
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800417Basic examples::
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000418
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800419 >>> random() # Random float: 0.0 <= x < 1.0
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000420 0.37444887175646646
421
Raymond Hettingere1329102016-11-21 12:33:50 -0800422 >>> uniform(2.5, 10.0) # Random float: 2.5 <= x < 10.0
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800423 3.1800146073117523
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000424
Raymond Hettingere1329102016-11-21 12:33:50 -0800425 >>> expovariate(1 / 5) # Interval between arrivals averaging 5 seconds
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800426 5.148957571865031
427
Raymond Hettingere1329102016-11-21 12:33:50 -0800428 >>> randrange(10) # Integer from 0 to 9 inclusive
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000429 7
430
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800431 >>> randrange(0, 101, 2) # Even integer from 0 to 100 inclusive
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000432 26
433
Raymond Hettinger6befb642016-11-21 01:59:39 -0800434 >>> choice(['win', 'lose', 'draw']) # Single random element from a sequence
435 'draw'
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000436
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800437 >>> deck = 'ace two three four'.split()
438 >>> shuffle(deck) # Shuffle a list
Raymond Hettinger1c3a1212016-10-12 01:42:10 -0400439 >>> deck
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800440 ['four', 'two', 'ace', 'three']
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000441
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800442 >>> sample([10, 20, 30, 40, 50], k=4) # Four samples without replacement
443 [40, 10, 50, 30]
Raymond Hettinger3cdf8712010-12-02 05:35:35 +0000444
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800445Simulations::
446
Raymond Hettinger71c62e12016-12-04 11:00:34 -0800447 >>> # Six roulette wheel spins (weighted sampling with replacement)
Raymond Hettinger1c3a1212016-10-12 01:42:10 -0400448 >>> choices(['red', 'black', 'green'], [18, 18, 2], k=6)
449 ['red', 'green', 'black', 'black', 'red', 'black']
Raymond Hettinger2fdc7b12010-12-02 02:41:33 +0000450
Raymond Hettinger81a5fc32020-05-08 07:53:15 -0700451 >>> # Deal 20 cards without replacement from a deck
452 >>> # of 52 playing cards, and determine the proportion of cards
453 >>> # with a ten-value: ten, jack, queen, or king.
454 >>> dealt = sample(['tens', 'low cards'], counts=[16, 36], k=20)
455 >>> dealt.count('tens') / 20
Raymond Hettinger0a1a9092016-11-17 00:45:35 -0800456 0.15
457
Raymond Hettinger71c62e12016-12-04 11:00:34 -0800458 >>> # Estimate the probability of getting 5 or more heads from 7 spins
459 >>> # of a biased coin that settles on heads 60% of the time.
Raymond Hettinger9abb7252019-02-15 12:40:18 -0800460 >>> def trial():
461 ... return choices('HT', cum_weights=(0.60, 1.00), k=7).count('H') >= 5
462 ...
Raymond Hettingerd3a8d612020-04-21 16:11:00 -0700463 >>> sum(trial() for i in range(10_000)) / 10_000
Raymond Hettinger16ef5d42016-10-31 22:53:52 -0700464 0.4169
465
Raymond Hettinger71c62e12016-12-04 11:00:34 -0800466 >>> # Probability of the median of 5 samples being in middle two quartiles
Raymond Hettinger9abb7252019-02-15 12:40:18 -0800467 >>> def trial():
Raymond Hettingerd3a8d612020-04-21 16:11:00 -0700468 ... return 2_500 <= sorted(choices(range(10_000), k=5))[2] < 7_500
Raymond Hettinger9abb7252019-02-15 12:40:18 -0800469 ...
Raymond Hettingerd3a8d612020-04-21 16:11:00 -0700470 >>> sum(trial() for i in range(10_000)) / 10_000
Raymond Hettinger71c62e12016-12-04 11:00:34 -0800471 0.7958
472
Raymond Hettinger1c3a1212016-10-12 01:42:10 -0400473Example of `statistical bootstrapping
474<https://en.wikipedia.org/wiki/Bootstrapping_(statistics)>`_ using resampling
Raymond Hettingerd3a8d612020-04-21 16:11:00 -0700475with replacement to estimate a confidence interval for the mean of a sample::
Raymond Hettinger2fdc7b12010-12-02 02:41:33 +0000476
Raymond Hettinger1c3a1212016-10-12 01:42:10 -0400477 # http://statistics.about.com/od/Applications/a/Example-Of-Bootstrapping.htm
Raymond Hettinger47d99872019-02-21 15:06:29 -0800478 from statistics import fmean as mean
Raymond Hettinger1c3a1212016-10-12 01:42:10 -0400479 from random import choices
Raymond Hettingerf5b7c7b2016-09-05 13:15:02 -0700480
Raymond Hettingerd3a8d612020-04-21 16:11:00 -0700481 data = [41, 50, 29, 37, 81, 30, 73, 63, 20, 35, 68, 22, 60, 31, 95]
482 means = sorted(mean(choices(data, k=len(data))) for i in range(100))
Raymond Hettinger2589ee32016-11-16 21:34:17 -0800483 print(f'The sample mean of {mean(data):.1f} has a 90% confidence '
Raymond Hettingerd3a8d612020-04-21 16:11:00 -0700484 f'interval from {means[5]:.1f} to {means[94]:.1f}')
Raymond Hettinger2589ee32016-11-16 21:34:17 -0800485
Raymond Hettinger00305ad2016-11-16 22:56:11 -0800486Example of a `resampling permutation test
487<https://en.wikipedia.org/wiki/Resampling_(statistics)#Permutation_tests>`_
488to determine the statistical significance or `p-value
489<https://en.wikipedia.org/wiki/P-value>`_ of an observed difference
490between the effects of a drug versus a placebo::
491
492 # Example from "Statistics is Easy" by Dennis Shasha and Manda Wilson
Raymond Hettinger47d99872019-02-21 15:06:29 -0800493 from statistics import fmean as mean
Raymond Hettinger00305ad2016-11-16 22:56:11 -0800494 from random import shuffle
495
496 drug = [54, 73, 53, 70, 73, 68, 52, 65, 65]
497 placebo = [54, 51, 58, 44, 55, 52, 42, 47, 58, 46]
498 observed_diff = mean(drug) - mean(placebo)
499
Raymond Hettingerd3a8d612020-04-21 16:11:00 -0700500 n = 10_000
Raymond Hettinger00305ad2016-11-16 22:56:11 -0800501 count = 0
502 combined = drug + placebo
503 for i in range(n):
504 shuffle(combined)
505 new_diff = mean(combined[:len(drug)]) - mean(combined[len(drug):])
506 count += (new_diff >= observed_diff)
507
508 print(f'{n} label reshufflings produced only {count} instances with a difference')
509 print(f'at least as extreme as the observed difference of {observed_diff:.1f}.')
510 print(f'The one-sided p-value of {count / n:.4f} leads us to reject the null')
Raymond Hettinger6befb642016-11-21 01:59:39 -0800511 print(f'hypothesis that there is no difference between the drug and the placebo.')
512
Raymond Hettingerd3a8d612020-04-21 16:11:00 -0700513Simulation of arrival times and service deliveries for a multiserver queue::
Raymond Hettinger6befb642016-11-21 01:59:39 -0800514
Raymond Hettingerd3a8d612020-04-21 16:11:00 -0700515 from heapq import heappush, heappop
Raymond Hettinger1149d932016-11-21 14:13:07 -0800516 from random import expovariate, gauss
Raymond Hettingere16d2f72020-05-21 01:37:38 -0700517 from statistics import mean, quantiles
Raymond Hettinger6befb642016-11-21 01:59:39 -0800518
519 average_arrival_interval = 5.6
Raymond Hettingerd3a8d612020-04-21 16:11:00 -0700520 average_service_time = 15.0
521 stdev_service_time = 3.5
522 num_servers = 3
Raymond Hettinger6befb642016-11-21 01:59:39 -0800523
Raymond Hettingerd3a8d612020-04-21 16:11:00 -0700524 waits = []
525 arrival_time = 0.0
526 servers = [0.0] * num_servers # time when each server becomes available
527 for i in range(100_000):
528 arrival_time += expovariate(1.0 / average_arrival_interval)
529 next_server_available = heappop(servers)
530 wait = max(0.0, next_server_available - arrival_time)
531 waits.append(wait)
532 service_duration = gauss(average_service_time, stdev_service_time)
533 service_completed = arrival_time + wait + service_duration
534 heappush(servers, service_completed)
Raymond Hettinger1149d932016-11-21 14:13:07 -0800535
Raymond Hettingere16d2f72020-05-21 01:37:38 -0700536 print(f'Mean wait: {mean(waits):.1f} Max wait: {max(waits):.1f}')
537 print('Quartiles:', [round(q, 1) for q in quantiles(waits)])
Raymond Hettinger6befb642016-11-21 01:59:39 -0800538
Raymond Hettinger05374052016-11-21 10:52:04 -0800539.. seealso::
540
541 `Statistics for Hackers <https://www.youtube.com/watch?v=Iq9DzN6mvYA>`_
542 a video tutorial by
543 `Jake Vanderplas <https://us.pycon.org/2016/speaker/profile/295/>`_
544 on statistical analysis using just a few fundamental concepts
545 including simulation, sampling, shuffling, and cross-validation.
546
547 `Economics Simulation
548 <http://nbviewer.jupyter.org/url/norvig.com/ipython/Economics.ipynb>`_
549 a simulation of a marketplace by
550 `Peter Norvig <http://norvig.com/bio.html>`_ that shows effective
Raymond Hettinger7f946192016-11-21 15:13:18 -0800551 use of many of the tools and distributions provided by this module
Raymond Hettinger05374052016-11-21 10:52:04 -0800552 (gauss, uniform, sample, betavariate, choice, triangular, and randrange).
553
554 `A Concrete Introduction to Probability (using Python)
555 <http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability.ipynb>`_
556 a tutorial by `Peter Norvig <http://norvig.com/bio.html>`_ covering
557 the basics of probability theory, how to write simulations, and
Raymond Hettinger7f946192016-11-21 15:13:18 -0800558 how to perform data analysis using Python.
Raymond Hettinger8b2ff4c2020-10-13 11:54:21 -0700559
Raymond Hettingerf2bd04f2020-10-13 16:41:26 -0700560
561Recipes
562-------
563
564The default :func:`.random` returns multiples of 2⁻⁵³ in the range
565*0.0 ≤ x < 1.0*. All such numbers are evenly spaced and are exactly
Raymond Hettingerb67cbbd2020-10-14 23:41:55 -0700566representable as Python floats. However, many other representable
567floats in that interval are not possible selections. For example,
568``0.05954861408025609`` isn't an integer multiple of 2⁻⁵³.
Raymond Hettingerf2bd04f2020-10-13 16:41:26 -0700569
570The following recipe takes a different approach. All floats in the
571interval are possible selections. The mantissa comes from a uniform
572distribution of integers in the range *2⁵² ≤ mantissa < 2⁵³*. The
573exponent comes from a geometric distribution where exponents smaller
574than *-53* occur half as often as the next larger exponent.
575
576::
577
578 from random import Random
579 from math import ldexp
580
581 class FullRandom(Random):
582
583 def random(self):
584 mantissa = 0x10_0000_0000_0000 | self.getrandbits(52)
585 exponent = -53
586 x = 0
587 while not x:
588 x = self.getrandbits(32)
589 exponent += x.bit_length() - 32
590 return ldexp(mantissa, exponent)
591
592All :ref:`real valued distributions <real-valued-distributions>`
593in the class will use the new method::
594
595 >>> fr = FullRandom()
596 >>> fr.random()
597 0.05954861408025609
598 >>> fr.expovariate(0.25)
599 8.87925541791544
600
601The recipe is conceptually equivalent to an algorithm that chooses from
602all the multiples of 2⁻¹⁰⁷⁴ in the range *0.0 ≤ x < 1.0*. All such
603numbers are evenly spaced, but most have to be rounded down to the
604nearest representable Python float. (The value 2⁻¹⁰⁷⁴ is the smallest
605positive unnormalized float and is equal to ``math.ulp(0.0)``.)
606
607
608.. seealso::
609
Raymond Hettinger8b2ff4c2020-10-13 11:54:21 -0700610 `Generating Pseudo-random Floating-Point Values
611 <https://allendowney.com/research/rand/downey07randfloat.pdf>`_ a
612 paper by Allen B. Downey describing ways to generate more
613 fine-grained floats than normally generated by :func:`.random`.