:mod:`bisect` --- Array bisection algorithm
===========================================

.. module:: bisect
   :synopsis: Array bisection algorithms for binary searching.
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
.. sectionauthor:: Raymond Hettinger <python at rcn.com>
.. example based on the PyModules FAQ entry by Aaron Watters <arw@pythonpros.com>

.. versionadded:: 2.1

**Source code:** :source:`Lib/bisect.py`

--------------

This module provides support for maintaining a list in sorted order without
having to sort the list after each insertion.  For long lists of items with
expensive comparison operations, this can be an improvement over the more
common approach of re-sorting the list after every insertion.  The module is
called :mod:`bisect` because it uses a basic bisection algorithm to do its
work.  The source code may be most useful as a working example of the
algorithm (the boundary conditions are already right!).

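As a rough illustration of the idea (a minimal sketch; the sample values and
variable names are made up for this example), a list can be kept sorted as
items arrive by inserting each one with :func:`insort` instead of appending
and re-sorting::

    from bisect import insort

    incoming = [5, 1, 8, 3, 8, 2]   # hypothetical unsorted input

    ordered = []
    for item in incoming:
        # Each call finds the insertion point by bisection and inserts
        # there, so the list is sorted again after every step.
        insort(ordered, item)

    print(ordered)   # prints [1, 2, 3, 5, 8, 8]
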
The following functions are provided:


.. function:: bisect_left(a, x, lo=0, hi=len(a))

   Locate the insertion point for *x* in *a* to maintain sorted order.
   The parameters *lo* and *hi* may be used to specify a subset of the list
   which should be considered; by default the entire list is used.  If *x* is
   already present in *a*, the insertion point will be before (to the left of)
   any existing entries.  The return value is suitable for use as the first
   parameter to ``list.insert()`` assuming that *a* is already sorted.

   The returned insertion point *i* partitions the array *a* into two slices so
   that ``all(val < x for val in a[lo:i])`` for the left side and
   ``all(val >= x for val in a[i:hi])`` for the right side.

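   The behaviour described above can be checked with a short sketch (the
   sample list is invented for illustration)::

      from bisect import bisect_left

      a = [10, 20, 20, 20, 30, 40]   # must already be sorted
      x = 20

      i = bisect_left(a, x)          # leftmost position where x could go
      print(i)                       # prints 1

      # Both conditions hold for the default range lo=0, hi=len(a).
      assert all(val < x for val in a[:i])
      assert all(val >= x for val in a[i:])

      # Restricting the search to a[3:6] only considers 20, 30, 40.
      j = bisect_left(a, x, 3, 6)
      print(j)                       # prints 3
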
.. function:: bisect_right(a, x, lo=0, hi=len(a))
              bisect(a, x, lo=0, hi=len(a))

   Similar to :func:`bisect_left`, but returns an insertion point which comes
   after (to the right of) any existing entries of *x* in *a*.

   The returned insertion point *i* partitions the array *a* into two slices so
   that ``all(val <= x for val in a[lo:i])`` for the left side and
   ``all(val > x for val in a[i:hi])`` for the right side.

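   A small sketch of how the two variants differ (sample data invented for
   illustration): they disagree only when *x* is already present, and
   :func:`bisect` gives the same answer as :func:`bisect_right`::

      from bisect import bisect, bisect_left, bisect_right

      a = [10, 20, 20, 20, 30, 40]

      left = bisect_left(a, 20)      # 1: before the run of 20s
      right = bisect_right(a, 20)    # 4: after the run of 20s

      # For a value that is not in the list, both return the same index.
      assert bisect_left(a, 25) == bisect_right(a, 25) == 4

      # bisect() returns the same insertion point as bisect_right().
      assert bisect(a, 20) == right
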
.. function:: insort_left(a, x, lo=0, hi=len(a))

   Insert *x* in *a* in sorted order.  This is equivalent to
   ``a.insert(bisect.bisect_left(a, x, lo, hi), x)`` assuming that *a* is
   already sorted.  Keep in mind that the O(log n) search is dominated by
   the slow O(n) insertion step.

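   The stated equivalence can be sketched as follows (the lists here are
   illustrative only)::

      from bisect import bisect_left, insort_left

      a = [1, 3, 5, 7]
      b = list(a)                      # independent copy for comparison

      insort_left(a, 4)                # one call
      b.insert(bisect_left(b, 4), 4)   # the equivalent two-step form

      assert a == b == [1, 3, 4, 5, 7]
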
.. function:: insort_right(a, x, lo=0, hi=len(a))
              insort(a, x, lo=0, hi=len(a))

   Similar to :func:`insort_left`, but inserting *x* in *a* after any existing
   entries of *x*.

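   The difference is only visible when equal items can be told apart.  As a
   sketch, inserting the float ``1.0`` into a list that already holds the
   integer ``1`` (the two compare equal) shows where each variant places the
   new item::

      from bisect import insort_left, insort_right

      left = [0, 1, 2]
      right = [0, 1, 2]

      insort_left(left, 1.0)     # goes before the existing 1
      insort_right(right, 1.0)   # goes after the existing 1

      print(left)    # prints [0, 1.0, 1, 2]
      print(right)   # prints [0, 1, 1.0, 2]
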
.. seealso::

   `SortedCollection recipe
   <http://code.activestate.com/recipes/577197-sortedcollection/>`_ that uses
   bisect to build a full-featured collection class with straightforward search
   methods and support for a key function.  The keys are precomputed to save
   unnecessary calls to the key function during searches.


Searching Sorted Lists
----------------------

The above :func:`bisect` functions are useful for finding insertion points but
can be tricky or awkward to use for common searching tasks.  The following five
functions show how to transform them into the standard lookups for sorted
lists::

    from bisect import bisect_left, bisect_right

    def index(a, x):
        'Locate the leftmost value exactly equal to x'
        i = bisect_left(a, x)
        if i != len(a) and a[i] == x:
            return i
        raise ValueError

    def find_lt(a, x):
        'Find rightmost value less than x'
        i = bisect_left(a, x)
        if i:
            return a[i-1]
        raise ValueError

    def find_le(a, x):
        'Find rightmost value less than or equal to x'
        i = bisect_right(a, x)
        if i:
            return a[i-1]
        raise ValueError

    def find_gt(a, x):
        'Find leftmost value greater than x'
        i = bisect_right(a, x)
        if i != len(a):
            return a[i]
        raise ValueError

    def find_ge(a, x):
        'Find leftmost item greater than or equal to x'
        i = bisect_left(a, x)
        if i != len(a):
            return a[i]
        raise ValueError


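The same pattern extends to other queries.  As a sketch only, the two helpers
below (``contains`` and ``count`` are hypothetical names chosen for this
example, not part of the module) test for membership and count occurrences in
a sorted list::

    from bisect import bisect_left, bisect_right

    def contains(a, x):
        'Return True if x is present in the sorted list a'
        i = bisect_left(a, x)
        return i != len(a) and a[i] == x

    def count(a, x):
        'Count the occurrences of x in the sorted list a'
        # Equal items occupy exactly the slice between the two
        # insertion points.
        return bisect_right(a, x) - bisect_left(a, x)

    a = [1, 2, 4, 4, 4, 7]
    contains(a, 4)   # True
    contains(a, 5)   # False
    count(a, 4)      # 3
    count(a, 5)      # 0
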
Other Examples
--------------

.. _bisect-example:

The :func:`bisect` function can be useful for numeric table lookups.  This
example uses :func:`bisect` to look up a letter grade for an exam score (say)
based on a set of ordered numeric breakpoints: 90 and up is an 'A', 80 to 89 is
a 'B', and so on::

    >>> from bisect import bisect
    >>> def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
    ...     i = bisect(breakpoints, score)
    ...     return grades[i]
    ...
    >>> [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
    ['F', 'A', 'C', 'C', 'B', 'A', 'A']

Unlike the :func:`sorted` function, it does not make sense for the :func:`bisect`
functions to have *key* or *reverse* arguments because that would lead to an
inefficient design (successive calls to bisect functions would not "remember"
all of the previous key lookups).

Instead, it is better to search a list of precomputed keys to find the index
of the record in question::

    >>> from bisect import bisect_left
    >>> data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)]
    >>> data.sort(key=lambda r: r[1])
    >>> keys = [r[1] for r in data]         # precomputed list of keys
    >>> data[bisect_left(keys, 0)]
    ('black', 0)
    >>> data[bisect_left(keys, 1)]
    ('blue', 1)
    >>> data[bisect_left(keys, 5)]
    ('red', 5)
    >>> data[bisect_left(keys, 8)]
    ('yellow', 8)

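When new records arrive, the key list and the record list have to be updated
together so that their indexes stay aligned.  One possible way to do that is
sketched below; ``insert_record`` is a hypothetical helper written for this
example, and the data mirrors the sorted records above::

    from bisect import bisect_left

    data = [('black', 0), ('blue', 1), ('red', 5), ('yellow', 8)]
    keys = [r[1] for r in data]          # precomputed list of keys

    def insert_record(record):
        'Insert record into data, keeping keys and data aligned and sorted'
        k = record[1]
        i = bisect_left(keys, k)         # search only the precomputed keys
        keys.insert(i, k)
        data.insert(i, record)           # same index keeps the lists parallel

    insert_record(('green', 3))
    print(data[bisect_left(keys, 3)])    # prints ('green', 3)
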