Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 1 | :mod:`bisect` --- Array bisection algorithm |
| 2 | =========================================== |
| 3 | |
| 4 | .. module:: bisect |
| 5 | :synopsis: Array bisection algorithms for binary searching. |
| 6 | .. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org> |
Raymond Hettinger | 20933e0 | 2010-09-01 06:58:25 +0000 | [diff] [blame] | 7 | .. sectionauthor:: Raymond Hettinger <python at rcn.com> |
Christian Heimes | 5b5e81c | 2007-12-31 16:14:33 +0000 | [diff] [blame] | 8 | .. example based on the PyModules FAQ entry by Aaron Watters <arw@pythonpros.com> |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 9 | |
Raymond Hettinger | 1048094 | 2011-01-10 03:26:08 +0000 | [diff] [blame] | 10 | **Source code:** :source:`Lib/bisect.py` |
| 11 | |
Raymond Hettinger | 4f707fd | 2011-01-10 19:54:11 +0000 | [diff] [blame] | 12 | -------------- |
| 13 | |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 14 | This module provides support for maintaining a list in sorted order without |
| 15 | having to sort the list after each insertion. For long lists of items with |
| 16 | expensive comparison operations, this can be an improvement over the more common |
| 17 | approach. The module is called :mod:`bisect` because it uses a basic bisection |
| 18 | algorithm to do its work. The source code may be most useful as a working |
| 19 | example of the algorithm (the boundary conditions are already right!). |
| 20 | |
| 21 | The following functions are provided: |
| 22 | |
| 23 | |
Raymond Hettinger | 871934d | 2020-10-19 22:04:01 -0700 | [diff] [blame] | 24 | .. function:: bisect_left(a, x, lo=0, hi=len(a), *, key=None) |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 25 | |
Raymond Hettinger | 20933e0 | 2010-09-01 06:58:25 +0000 | [diff] [blame] | 26 | Locate the insertion point for *x* in *a* to maintain sorted order. |
Georg Brandl | 0d8f073 | 2009-04-05 22:20:44 +0000 | [diff] [blame] | 27 | The parameters *lo* and *hi* may be used to specify a subset of the list |
| 28 | which should be considered; by default the entire list is used. If *x* is |
| 29 | already present in *a*, the insertion point will be before (to the left of) |
| 30 | any existing entries. The return value is suitable for use as the first |
Raymond Hettinger | 20933e0 | 2010-09-01 06:58:25 +0000 | [diff] [blame] | 31 | parameter to ``list.insert()`` assuming that *a* is already sorted. |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 32 | |
Raymond Hettinger | 20933e0 | 2010-09-01 06:58:25 +0000 | [diff] [blame] | 33 | The returned insertion point *i* partitions the array *a* into two halves so |
Raymond Hettinger | 871934d | 2020-10-19 22:04:01 -0700 | [diff] [blame] | 34 | that ``all(val < x for val in a[lo : i])`` for the left side and |
| 35 | ``all(val >= x for val in a[i : hi])`` for the right side. |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 36 | |
Raymond Hettinger | 871934d | 2020-10-19 22:04:01 -0700 | [diff] [blame] | 37 | *key* specifies a :term:`key function` of one argument that is used to |
| 38 | extract a comparison key from each input element. The default value is |
| 39 | ``None`` (compare the elements directly). |
| 40 | |
| 41 | .. versionchanged:: 3.10 |
| 42 | Added the *key* parameter. |
| 43 | |
| 44 | |
| 45 | .. function:: bisect_right(a, x, lo=0, hi=len(a), *, key=None) |
Georg Brandl | 0d8f073 | 2009-04-05 22:20:44 +0000 | [diff] [blame] | 46 | bisect(a, x, lo=0, hi=len(a)) |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 47 | |
Georg Brandl | 0d8f073 | 2009-04-05 22:20:44 +0000 | [diff] [blame] | 48 | Similar to :func:`bisect_left`, but returns an insertion point which comes |
| 49 | after (to the right of) any existing entries of *x* in *a*. |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 50 | |
Raymond Hettinger | 20933e0 | 2010-09-01 06:58:25 +0000 | [diff] [blame] | 51 | The returned insertion point *i* partitions the array *a* into two halves so |
Raymond Hettinger | 871934d | 2020-10-19 22:04:01 -0700 | [diff] [blame] | 52 | that ``all(val <= x for val in a[lo : i])`` for the left side and |
| 53 | ``all(val > x for val in a[i : hi])`` for the right side. |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 54 | |
Raymond Hettinger | 871934d | 2020-10-19 22:04:01 -0700 | [diff] [blame] | 55 | *key* specifies a :term:`key function` of one argument that is used to |
| 56 | extract a comparison key from each input element. The default value is |
| 57 | ``None`` (compare the elements directly). |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 58 | |
Raymond Hettinger | 871934d | 2020-10-19 22:04:01 -0700 | [diff] [blame] | 59 | .. versionchanged:: 3.10 |
| 60 | Added the *key* parameter. |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 61 | |
Raymond Hettinger | 871934d | 2020-10-19 22:04:01 -0700 | [diff] [blame] | 62 | |
| 63 | .. function:: insort_left(a, x, lo=0, hi=len(a), *, key=None) |
| 64 | |
| 65 | Insert *x* in *a* in sorted order. |
| 66 | |
| 67 | *key* specifies a :term:`key function` of one argument that is used to |
| 68 | extract a comparison key from each input element. The default value is |
| 69 | ``None`` (compare the elements directly). |
| 70 | |
| 71 | This function first runs :func:`bisect_left` to locate an insertion point. |
| 72 | Next, it runs the :meth:`insert` method on *a* to insert *x* at the |
| 73 | appropriate position to maintain sort order. |
| 74 | |
| 75 | Keep in mind that the ``O(log n)`` search is dominated by the slow O(n) |
| 76 | insertion step. |
| 77 | |
| 78 | .. versionchanged:: 3.10 |
| 79 | Added the *key* parameter. |
| 80 | |
| 81 | |
| 82 | .. function:: insort_right(a, x, lo=0, hi=len(a), *, key=None) |
Georg Brandl | 0d8f073 | 2009-04-05 22:20:44 +0000 | [diff] [blame] | 83 | insort(a, x, lo=0, hi=len(a)) |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 84 | |
Georg Brandl | 0d8f073 | 2009-04-05 22:20:44 +0000 | [diff] [blame] | 85 | Similar to :func:`insort_left`, but inserting *x* in *a* after any existing |
| 86 | entries of *x*. |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 87 | |
Raymond Hettinger | 871934d | 2020-10-19 22:04:01 -0700 | [diff] [blame] | 88 | *key* specifies a :term:`key function` of one argument that is used to |
| 89 | extract a comparison key from each input element. The default value is |
| 90 | ``None`` (compare the elements directly). |
| 91 | |
| 92 | This function first runs :func:`bisect_right` to locate an insertion point. |
| 93 | Next, it runs the :meth:`insert` method on *a* to insert *x* at the |
| 94 | appropriate position to maintain sort order. |
| 95 | |
| 96 | Keep in mind that the ``O(log n)`` search is dominated by the slow O(n) |
| 97 | insertion step. |
| 98 | |
| 99 | .. versionchanged:: 3.10 |
| 100 | Added the *key* parameter. |
| 101 | |
| 102 | |
| 103 | Performance Notes |
| 104 | ----------------- |
| 105 | |
| 106 | When writing time sensitive code using *bisect()* and *insort()*, keep these |
| 107 | thoughts in mind: |
| 108 | |
| 109 | * Bisection is effective for searching ranges of values. |
| 110 | For locating specific values, dictionaries are more performant. |
| 111 | |
| 112 | * The *insort()* functions are ``O(n)`` because the logarithmic search step |
| 113 | is dominated by the linear time insertion step. |
| 114 | |
| 115 | * The search functions are stateless and discard key function results after |
| 116 | they are used. Consequently, if the search functions are used in a loop, |
| 117 | the key function may be called again and again on the same array elements. |
| 118 | If the key function isn't fast, consider wrapping it with |
| 119 | :func:`functools.cache` to avoid duplicate computations. Alternatively, |
| 120 | consider searching an array of precomputed keys to locate the insertion |
| 121 | point (as shown in the examples section below). |
| 122 | |
Raymond Hettinger | 20933e0 | 2010-09-01 06:58:25 +0000 | [diff] [blame] | 123 | .. seealso:: |
| 124 | |
Raymond Hettinger | 871934d | 2020-10-19 22:04:01 -0700 | [diff] [blame] | 125 | * `Sorted Collections |
| 126 | <http://www.grantjenks.com/docs/sortedcollections/>`_ is a high performance |
| 127 | module that uses *bisect* to managed sorted collections of data. |
| 128 | |
| 129 | * The `SortedCollection recipe |
| 130 | <https://code.activestate.com/recipes/577197-sortedcollection/>`_ uses |
| 131 | bisect to build a full-featured collection class with straight-forward search |
| 132 | methods and support for a key-function. The keys are precomputed to save |
| 133 | unnecessary calls to the key function during searches. |
Raymond Hettinger | 20933e0 | 2010-09-01 06:58:25 +0000 | [diff] [blame] | 134 | |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 135 | |
Raymond Hettinger | 87c9d6c | 2010-08-07 07:36:55 +0000 | [diff] [blame] | 136 | Searching Sorted Lists |
| 137 | ---------------------- |
| 138 | |
Raymond Hettinger | 20933e0 | 2010-09-01 06:58:25 +0000 | [diff] [blame] | 139 | The above :func:`bisect` functions are useful for finding insertion points but |
| 140 | can be tricky or awkward to use for common searching tasks. The following five |
Raymond Hettinger | 87c9d6c | 2010-08-07 07:36:55 +0000 | [diff] [blame] | 141 | functions show how to transform them into the standard lookups for sorted |
| 142 | lists:: |
| 143 | |
Raymond Hettinger | 20933e0 | 2010-09-01 06:58:25 +0000 | [diff] [blame] | 144 | def index(a, x): |
| 145 | 'Locate the leftmost value exactly equal to x' |
| 146 | i = bisect_left(a, x) |
| 147 | if i != len(a) and a[i] == x: |
| 148 | return i |
| 149 | raise ValueError |
Raymond Hettinger | 87c9d6c | 2010-08-07 07:36:55 +0000 | [diff] [blame] | 150 | |
Raymond Hettinger | 20933e0 | 2010-09-01 06:58:25 +0000 | [diff] [blame] | 151 | def find_lt(a, x): |
| 152 | 'Find rightmost value less than x' |
| 153 | i = bisect_left(a, x) |
| 154 | if i: |
| 155 | return a[i-1] |
| 156 | raise ValueError |
| 157 | |
| 158 | def find_le(a, x): |
| 159 | 'Find rightmost value less than or equal to x' |
| 160 | i = bisect_right(a, x) |
| 161 | if i: |
| 162 | return a[i-1] |
| 163 | raise ValueError |
| 164 | |
| 165 | def find_gt(a, x): |
| 166 | 'Find leftmost value greater than x' |
| 167 | i = bisect_right(a, x) |
| 168 | if i != len(a): |
Raymond Hettinger | 87c9d6c | 2010-08-07 07:36:55 +0000 | [diff] [blame] | 169 | return a[i] |
Raymond Hettinger | 20933e0 | 2010-09-01 06:58:25 +0000 | [diff] [blame] | 170 | raise ValueError |
Raymond Hettinger | 87c9d6c | 2010-08-07 07:36:55 +0000 | [diff] [blame] | 171 | |
Raymond Hettinger | 20933e0 | 2010-09-01 06:58:25 +0000 | [diff] [blame] | 172 | def find_ge(a, x): |
| 173 | 'Find leftmost item greater than or equal to x' |
| 174 | i = bisect_left(a, x) |
| 175 | if i != len(a): |
Raymond Hettinger | 87c9d6c | 2010-08-07 07:36:55 +0000 | [diff] [blame] | 176 | return a[i] |
Raymond Hettinger | 20933e0 | 2010-09-01 06:58:25 +0000 | [diff] [blame] | 177 | raise ValueError |
Raymond Hettinger | 87c9d6c | 2010-08-07 07:36:55 +0000 | [diff] [blame] | 178 | |
Raymond Hettinger | 87c9d6c | 2010-08-07 07:36:55 +0000 | [diff] [blame] | 179 | |
Raymond Hettinger | 871934d | 2020-10-19 22:04:01 -0700 | [diff] [blame] | 180 | Examples |
| 181 | -------- |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 182 | |
| 183 | .. _bisect-example: |
| 184 | |
Raymond Hettinger | 20933e0 | 2010-09-01 06:58:25 +0000 | [diff] [blame] | 185 | The :func:`bisect` function can be useful for numeric table lookups. This |
| 186 | example uses :func:`bisect` to look up a letter grade for an exam score (say) |
| 187 | based on a set of ordered numeric breakpoints: 90 and up is an 'A', 80 to 89 is |
| 188 | a 'B', and so on:: |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 189 | |
Raymond Hettinger | 20933e0 | 2010-09-01 06:58:25 +0000 | [diff] [blame] | 190 | >>> def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'): |
| 191 | ... i = bisect(breakpoints, score) |
| 192 | ... return grades[i] |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 193 | ... |
Raymond Hettinger | 20933e0 | 2010-09-01 06:58:25 +0000 | [diff] [blame] | 194 | >>> [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]] |
| 195 | ['F', 'A', 'C', 'C', 'B', 'A', 'A'] |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 196 | |
Raymond Hettinger | 871934d | 2020-10-19 22:04:01 -0700 | [diff] [blame] | 197 | One technique to avoid repeated calls to a key function is to search a list of |
| 198 | precomputed keys to find the index of a record:: |
Raymond Hettinger | e046d2a | 2009-06-11 22:01:24 +0000 | [diff] [blame] | 199 | |
| 200 | >>> data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)] |
Raymond Hettinger | 871934d | 2020-10-19 22:04:01 -0700 | [diff] [blame] | 201 | >>> data.sort(key=lambda r: r[1]) # Or use operator.itemgetter(1). |
| 202 | >>> keys = [r[1] for r in data] # Precompute a list of keys. |
Raymond Hettinger | e046d2a | 2009-06-11 22:01:24 +0000 | [diff] [blame] | 203 | >>> data[bisect_left(keys, 0)] |
| 204 | ('black', 0) |
| 205 | >>> data[bisect_left(keys, 1)] |
| 206 | ('blue', 1) |
| 207 | >>> data[bisect_left(keys, 5)] |
| 208 | ('red', 5) |
| 209 | >>> data[bisect_left(keys, 8)] |
| 210 | ('yellow', 8) |
Raymond Hettinger | 87c9d6c | 2010-08-07 07:36:55 +0000 | [diff] [blame] | 211 | |