Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 1 | |
| 2 | :mod:`collections` --- High-performance container datatypes |
| 3 | =========================================================== |
| 4 | |
| 5 | .. module:: collections |
| 6 | :synopsis: High-performance datatypes |
| 7 | .. moduleauthor:: Raymond Hettinger <python@rcn.com> |
| 8 | .. sectionauthor:: Raymond Hettinger <python@rcn.com> |
| 9 | |
| 10 | |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 11 | This module implements high-performance container datatypes. Currently, |
| 12 | there are two datatypes, :class:`deque` and :class:`defaultdict`, and |
| 13 | one datatype factory function, :func:`NamedTuple`. Python already |
| 14 | includes built-in containers, :class:`dict`, :class:`list`, |
| 15 | :class:`set`, and :class:`tuple`. In addition, the optional :mod:`bsddb` |
| 16 | module has a :meth:`bsddb.btopen` method that can be used to create in-memory |
| 17 | or file based ordered dictionaries with string keys. |
| 18 | |
| 19 | Future editions of the standard library may include balanced trees and |
| 20 | ordered dictionaries. |
| 21 | |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 22 | |
| 23 | .. _deque-objects: |
| 24 | |
| 25 | :class:`deque` objects |
| 26 | ---------------------- |
| 27 | |
| 28 | |
| 29 | .. class:: deque([iterable]) |
| 30 | |
| 31 | Returns a new deque object initialized left-to-right (using :meth:`append`) with |
| 32 | data from *iterable*. If *iterable* is not specified, the new deque is empty. |
| 33 | |
| 34 | Deques are a generalization of stacks and queues (the name is pronounced "deck" |
| 35 | and is short for "double-ended queue"). Deques support thread-safe, memory |
| 36 | efficient appends and pops from either side of the deque with approximately the |
| 37 | same O(1) performance in either direction. |
| 38 | |
| 39 | Though :class:`list` objects support similar operations, they are optimized for |
| 40 | fast fixed-length operations and incur O(n) memory movement costs for |
| 41 | ``pop(0)`` and ``insert(0, v)`` operations which change both the size and |
| 42 | position of the underlying data representation. |
| 43 | |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 44 | |
| 45 | Deque objects support the following methods: |
| 46 | |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 47 | .. method:: deque.append(x) |
| 48 | |
| 49 | Add *x* to the right side of the deque. |
| 50 | |
| 51 | |
| 52 | .. method:: deque.appendleft(x) |
| 53 | |
| 54 | Add *x* to the left side of the deque. |
| 55 | |
| 56 | |
| 57 | .. method:: deque.clear() |
| 58 | |
| 59 | Remove all elements from the deque leaving it with length 0. |
| 60 | |
| 61 | |
| 62 | .. method:: deque.extend(iterable) |
| 63 | |
| 64 | Extend the right side of the deque by appending elements from the iterable |
| 65 | argument. |
| 66 | |
| 67 | |
| 68 | .. method:: deque.extendleft(iterable) |
| 69 | |
| 70 | Extend the left side of the deque by appending elements from *iterable*. Note, |
| 71 | the series of left appends results in reversing the order of elements in the |
| 72 | iterable argument. |
| 73 | |
| 74 | |
| 75 | .. method:: deque.pop() |
| 76 | |
| 77 | Remove and return an element from the right side of the deque. If no elements |
| 78 | are present, raises an :exc:`IndexError`. |
| 79 | |
| 80 | |
| 81 | .. method:: deque.popleft() |
| 82 | |
| 83 | Remove and return an element from the left side of the deque. If no elements are |
| 84 | present, raises an :exc:`IndexError`. |
| 85 | |
| 86 | |
| 87 | .. method:: deque.remove(value) |
| 88 | |
| 89 | Removed the first occurrence of *value*. If not found, raises a |
| 90 | :exc:`ValueError`. |
| 91 | |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 92 | |
| 93 | .. method:: deque.rotate(n) |
| 94 | |
| 95 | Rotate the deque *n* steps to the right. If *n* is negative, rotate to the |
| 96 | left. Rotating one step to the right is equivalent to: |
| 97 | ``d.appendleft(d.pop())``. |
| 98 | |
| 99 | In addition to the above, deques support iteration, pickling, ``len(d)``, |
| 100 | ``reversed(d)``, ``copy.copy(d)``, ``copy.deepcopy(d)``, membership testing with |
| 101 | the :keyword:`in` operator, and subscript references such as ``d[-1]``. |
| 102 | |
| 103 | Example:: |
| 104 | |
| 105 | >>> from collections import deque |
| 106 | >>> d = deque('ghi') # make a new deque with three items |
| 107 | >>> for elem in d: # iterate over the deque's elements |
Georg Brandl | 6911e3c | 2007-09-04 07:15:32 +0000 | [diff] [blame] | 108 | ... print(elem.upper()) |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 109 | G |
| 110 | H |
| 111 | I |
| 112 | |
| 113 | >>> d.append('j') # add a new entry to the right side |
| 114 | >>> d.appendleft('f') # add a new entry to the left side |
| 115 | >>> d # show the representation of the deque |
| 116 | deque(['f', 'g', 'h', 'i', 'j']) |
| 117 | |
| 118 | >>> d.pop() # return and remove the rightmost item |
| 119 | 'j' |
| 120 | >>> d.popleft() # return and remove the leftmost item |
| 121 | 'f' |
| 122 | >>> list(d) # list the contents of the deque |
| 123 | ['g', 'h', 'i'] |
| 124 | >>> d[0] # peek at leftmost item |
| 125 | 'g' |
| 126 | >>> d[-1] # peek at rightmost item |
| 127 | 'i' |
| 128 | |
| 129 | >>> list(reversed(d)) # list the contents of a deque in reverse |
| 130 | ['i', 'h', 'g'] |
| 131 | >>> 'h' in d # search the deque |
| 132 | True |
| 133 | >>> d.extend('jkl') # add multiple elements at once |
| 134 | >>> d |
| 135 | deque(['g', 'h', 'i', 'j', 'k', 'l']) |
| 136 | >>> d.rotate(1) # right rotation |
| 137 | >>> d |
| 138 | deque(['l', 'g', 'h', 'i', 'j', 'k']) |
| 139 | >>> d.rotate(-1) # left rotation |
| 140 | >>> d |
| 141 | deque(['g', 'h', 'i', 'j', 'k', 'l']) |
| 142 | |
| 143 | >>> deque(reversed(d)) # make a new deque in reverse order |
| 144 | deque(['l', 'k', 'j', 'i', 'h', 'g']) |
| 145 | >>> d.clear() # empty the deque |
| 146 | >>> d.pop() # cannot pop from an empty deque |
| 147 | Traceback (most recent call last): |
| 148 | File "<pyshell#6>", line 1, in -toplevel- |
| 149 | d.pop() |
| 150 | IndexError: pop from an empty deque |
| 151 | |
| 152 | >>> d.extendleft('abc') # extendleft() reverses the input order |
| 153 | >>> d |
| 154 | deque(['c', 'b', 'a']) |
| 155 | |
| 156 | |
| 157 | .. _deque-recipes: |
| 158 | |
| 159 | Recipes |
| 160 | ^^^^^^^ |
| 161 | |
| 162 | This section shows various approaches to working with deques. |
| 163 | |
| 164 | The :meth:`rotate` method provides a way to implement :class:`deque` slicing and |
| 165 | deletion. For example, a pure python implementation of ``del d[n]`` relies on |
| 166 | the :meth:`rotate` method to position elements to be popped:: |
| 167 | |
| 168 | def delete_nth(d, n): |
| 169 | d.rotate(-n) |
| 170 | d.popleft() |
| 171 | d.rotate(n) |
| 172 | |
| 173 | To implement :class:`deque` slicing, use a similar approach applying |
| 174 | :meth:`rotate` to bring a target element to the left side of the deque. Remove |
| 175 | old entries with :meth:`popleft`, add new entries with :meth:`extend`, and then |
| 176 | reverse the rotation. |
| 177 | |
| 178 | With minor variations on that approach, it is easy to implement Forth style |
| 179 | stack manipulations such as ``dup``, ``drop``, ``swap``, ``over``, ``pick``, |
| 180 | ``rot``, and ``roll``. |
| 181 | |
| 182 | A roundrobin task server can be built from a :class:`deque` using |
| 183 | :meth:`popleft` to select the current task and :meth:`append` to add it back to |
| 184 | the tasklist if the input stream is not exhausted:: |
| 185 | |
| 186 | >>> def roundrobin(*iterables): |
| 187 | ... pending = deque(iter(i) for i in iterables) |
| 188 | ... while pending: |
| 189 | ... task = pending.popleft() |
| 190 | ... try: |
| 191 | ... yield next(task) |
| 192 | ... except StopIteration: |
| 193 | ... continue |
| 194 | ... pending.append(task) |
| 195 | ... |
| 196 | >>> for value in roundrobin('abc', 'd', 'efgh'): |
Georg Brandl | 6911e3c | 2007-09-04 07:15:32 +0000 | [diff] [blame] | 197 | ... print(value) |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 198 | |
| 199 | a |
| 200 | d |
| 201 | e |
| 202 | b |
| 203 | f |
| 204 | c |
| 205 | g |
| 206 | h |
| 207 | |
| 208 | |
| 209 | Multi-pass data reduction algorithms can be succinctly expressed and efficiently |
| 210 | coded by extracting elements with multiple calls to :meth:`popleft`, applying |
| 211 | the reduction function, and calling :meth:`append` to add the result back to the |
| 212 | queue. |
| 213 | |
| 214 | For example, building a balanced binary tree of nested lists entails reducing |
| 215 | two adjacent nodes into one by grouping them in a list:: |
| 216 | |
| 217 | >>> def maketree(iterable): |
| 218 | ... d = deque(iterable) |
| 219 | ... while len(d) > 1: |
| 220 | ... pair = [d.popleft(), d.popleft()] |
| 221 | ... d.append(pair) |
| 222 | ... return list(d) |
| 223 | ... |
Georg Brandl | 6911e3c | 2007-09-04 07:15:32 +0000 | [diff] [blame] | 224 | >>> print(maketree('abcdefgh')) |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 225 | [[[['a', 'b'], ['c', 'd']], [['e', 'f'], ['g', 'h']]]] |
| 226 | |
| 227 | |
| 228 | |
| 229 | .. _defaultdict-objects: |
| 230 | |
| 231 | :class:`defaultdict` objects |
| 232 | ---------------------------- |
| 233 | |
| 234 | |
| 235 | .. class:: defaultdict([default_factory[, ...]]) |
| 236 | |
| 237 | Returns a new dictionary-like object. :class:`defaultdict` is a subclass of the |
| 238 | builtin :class:`dict` class. It overrides one method and adds one writable |
| 239 | instance variable. The remaining functionality is the same as for the |
| 240 | :class:`dict` class and is not documented here. |
| 241 | |
| 242 | The first argument provides the initial value for the :attr:`default_factory` |
| 243 | attribute; it defaults to ``None``. All remaining arguments are treated the same |
| 244 | as if they were passed to the :class:`dict` constructor, including keyword |
| 245 | arguments. |
| 246 | |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 247 | |
| 248 | :class:`defaultdict` objects support the following method in addition to the |
| 249 | standard :class:`dict` operations: |
| 250 | |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 251 | .. method:: defaultdict.__missing__(key) |
| 252 | |
| 253 | If the :attr:`default_factory` attribute is ``None``, this raises an |
| 254 | :exc:`KeyError` exception with the *key* as argument. |
| 255 | |
| 256 | If :attr:`default_factory` is not ``None``, it is called without arguments to |
| 257 | provide a default value for the given *key*, this value is inserted in the |
| 258 | dictionary for the *key*, and returned. |
| 259 | |
| 260 | If calling :attr:`default_factory` raises an exception this exception is |
| 261 | propagated unchanged. |
| 262 | |
| 263 | This method is called by the :meth:`__getitem__` method of the :class:`dict` |
| 264 | class when the requested key is not found; whatever it returns or raises is then |
| 265 | returned or raised by :meth:`__getitem__`. |
| 266 | |
| 267 | :class:`defaultdict` objects support the following instance variable: |
| 268 | |
| 269 | |
| 270 | .. attribute:: defaultdict.default_factory |
| 271 | |
| 272 | This attribute is used by the :meth:`__missing__` method; it is initialized from |
| 273 | the first argument to the constructor, if present, or to ``None``, if absent. |
| 274 | |
| 275 | |
| 276 | .. _defaultdict-examples: |
| 277 | |
| 278 | :class:`defaultdict` Examples |
| 279 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 280 | |
| 281 | Using :class:`list` as the :attr:`default_factory`, it is easy to group a |
| 282 | sequence of key-value pairs into a dictionary of lists:: |
| 283 | |
| 284 | >>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)] |
| 285 | >>> d = defaultdict(list) |
| 286 | >>> for k, v in s: |
| 287 | ... d[k].append(v) |
| 288 | ... |
| 289 | >>> d.items() |
| 290 | [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])] |
| 291 | |
| 292 | When each key is encountered for the first time, it is not already in the |
| 293 | mapping; so an entry is automatically created using the :attr:`default_factory` |
| 294 | function which returns an empty :class:`list`. The :meth:`list.append` |
| 295 | operation then attaches the value to the new list. When keys are encountered |
| 296 | again, the look-up proceeds normally (returning the list for that key) and the |
| 297 | :meth:`list.append` operation adds another value to the list. This technique is |
| 298 | simpler and faster than an equivalent technique using :meth:`dict.setdefault`:: |
| 299 | |
| 300 | >>> d = {} |
| 301 | >>> for k, v in s: |
| 302 | ... d.setdefault(k, []).append(v) |
| 303 | ... |
| 304 | >>> d.items() |
| 305 | [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])] |
| 306 | |
| 307 | Setting the :attr:`default_factory` to :class:`int` makes the |
| 308 | :class:`defaultdict` useful for counting (like a bag or multiset in other |
| 309 | languages):: |
| 310 | |
| 311 | >>> s = 'mississippi' |
| 312 | >>> d = defaultdict(int) |
| 313 | >>> for k in s: |
| 314 | ... d[k] += 1 |
| 315 | ... |
| 316 | >>> d.items() |
| 317 | [('i', 4), ('p', 2), ('s', 4), ('m', 1)] |
| 318 | |
| 319 | When a letter is first encountered, it is missing from the mapping, so the |
| 320 | :attr:`default_factory` function calls :func:`int` to supply a default count of |
| 321 | zero. The increment operation then builds up the count for each letter. |
| 322 | |
| 323 | The function :func:`int` which always returns zero is just a special case of |
| 324 | constant functions. A faster and more flexible way to create constant functions |
| 325 | is to use a lambda function which can supply any constant value (not just |
| 326 | zero):: |
| 327 | |
| 328 | >>> def constant_factory(value): |
| 329 | ... return lambda: value |
| 330 | >>> d = defaultdict(constant_factory('<missing>')) |
| 331 | >>> d.update(name='John', action='ran') |
| 332 | >>> '%(name)s %(action)s to %(object)s' % d |
| 333 | 'John ran to <missing>' |
| 334 | |
| 335 | Setting the :attr:`default_factory` to :class:`set` makes the |
| 336 | :class:`defaultdict` useful for building a dictionary of sets:: |
| 337 | |
| 338 | >>> s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)] |
| 339 | >>> d = defaultdict(set) |
| 340 | >>> for k, v in s: |
| 341 | ... d[k].add(v) |
| 342 | ... |
| 343 | >>> d.items() |
| 344 | [('blue', set([2, 4])), ('red', set([1, 3]))] |
| 345 | |
| 346 | |
| 347 | .. _named-tuple-factory: |
| 348 | |
| 349 | :func:`NamedTuple` datatype factory function |
| 350 | -------------------------------------------- |
| 351 | |
| 352 | |
| 353 | .. function:: NamedTuple(typename, fieldnames) |
| 354 | |
| 355 | Returns a new tuple subclass named *typename*. The new subclass is used to |
| 356 | create tuple-like objects that have fields accessable by attribute lookup as |
| 357 | well as being indexable and iterable. Instances of the subclass also have a |
| 358 | helpful docstring (with typename and fieldnames) and a helpful :meth:`__repr__` |
| 359 | method which lists the tuple contents in a ``name=value`` format. |
| 360 | |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 361 | The *fieldnames* are specified in a single string and are separated by spaces. |
| 362 | Any valid Python identifier may be used for a field name. |
| 363 | |
| 364 | Example:: |
| 365 | |
| 366 | >>> Point = NamedTuple('Point', 'x y') |
| 367 | >>> Point.__doc__ # docstring for the new datatype |
| 368 | 'Point(x, y)' |
| 369 | >>> p = Point(11, y=22) # instantiate with positional or keyword arguments |
| 370 | >>> p[0] + p[1] # works just like the tuple (11, 22) |
| 371 | 33 |
| 372 | >>> x, y = p # unpacks just like a tuple |
| 373 | >>> x, y |
| 374 | (11, 22) |
| 375 | >>> p.x + p.y # fields also accessable by name |
| 376 | 33 |
| 377 | >>> p # readable __repr__ with name=value style |
| 378 | Point(x=11, y=22) |
| 379 | |
| 380 | The use cases are the same as those for tuples. The named factories assign |
| 381 | meaning to each tuple position and allow for more readable, self-documenting |
| 382 | code. Named tuples can also be used to assign field names to tuples returned |
| 383 | by the :mod:`csv` or :mod:`sqlite3` modules. For example:: |
| 384 | |
| 385 | from itertools import starmap |
| 386 | import csv |
| 387 | EmployeeRecord = NamedTuple('EmployeeRecord', 'name age title department paygrade') |
| 388 | for record in starmap(EmployeeRecord, csv.reader(open("employees.csv", "rb"))): |
Georg Brandl | 6911e3c | 2007-09-04 07:15:32 +0000 | [diff] [blame] | 389 | print(record) |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 390 | |
| 391 | To cast an individual record stored as :class:`list`, :class:`tuple`, or some |
Thomas Wouters | 47b49bf | 2007-08-30 22:15:33 +0000 | [diff] [blame] | 392 | other iterable type, use the star-operator [#]_ to unpack the values:: |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 393 | |
| 394 | >>> Color = NamedTuple('Color', 'name code') |
| 395 | >>> m = dict(red=1, green=2, blue=3) |
Georg Brandl | 6911e3c | 2007-09-04 07:15:32 +0000 | [diff] [blame] | 396 | >>> print(Color(*m.popitem())) |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 397 | Color(name='blue', code=3) |
| 398 | |
Thomas Wouters | 47b49bf | 2007-08-30 22:15:33 +0000 | [diff] [blame] | 399 | .. rubric:: Footnotes |
| 400 | |
| 401 | .. [#] For information on the star-operator see |
| 402 | :ref:`tut-unpacking-arguments` and :ref:`calls`. |