:mod:`collections` --- High-performance container datatypes
===========================================================

.. module:: collections
   :synopsis: High-performance datatypes
.. moduleauthor:: Raymond Hettinger <python@rcn.com>
.. sectionauthor:: Raymond Hettinger <python@rcn.com>


This module implements high-performance container datatypes. Currently,
there are two datatypes, :class:`deque` and :class:`defaultdict`, and
one datatype factory function, :func:`named_tuple`. Python already
includes built-in containers, :class:`dict`, :class:`list`,
:class:`set`, and :class:`tuple`. In addition, the optional :mod:`bsddb`
module has a :meth:`bsddb.btopen` method that can be used to create in-memory
or file-based ordered dictionaries with string keys.

Future editions of the standard library may include balanced trees and
ordered dictionaries.

In addition to containers, the collections module provides some ABCs
(abstract base classes) that can be used to test whether
a class provides a particular interface, for example, whether it is hashable or
a mapping. The ABCs provided include those in the following table:

===================================== ========================================
ABC                                   Notes
===================================== ========================================
:class:`collections.Container`        Defines ``__contains__()``
:class:`collections.Hashable`         Defines ``__hash__()``
:class:`collections.Iterable`         Defines ``__iter__()``
:class:`collections.Iterator`         Derived from :class:`Iterable` and in
                                      addition defines ``__next__()``
:class:`collections.Mapping`          Derived from :class:`Container`,
                                      :class:`Iterable`,
                                      and :class:`Sized`, and in addition
                                      defines ``__getitem__()``, ``get()``,
                                      ``__contains__()``, ``__len__()``,
                                      ``__iter__()``, ``keys()``,
                                      ``items()``, and ``values()``
:class:`collections.MutableMapping`   Derived from :class:`Mapping`
:class:`collections.MutableSequence`  Derived from :class:`Sequence`
:class:`collections.MutableSet`       Derived from :class:`Set` and in
                                      addition defines ``add()``,
                                      ``clear()``, ``discard()``, ``pop()``,
                                      and ``toggle()``
:class:`collections.Sequence`         Derived from :class:`Container`,
                                      :class:`Iterable`, and :class:`Sized`,
                                      and in addition defines
                                      ``__getitem__()``
:class:`collections.Set`              Derived from :class:`Container`, :class:`Iterable`, and :class:`Sized`
:class:`collections.Sized`            Defines ``__len__()``
===================================== ========================================

.. XXX Have not included them all and the notes are incomplete
.. Deliberately did one row wide to get a neater output

These ABCs allow us to ask classes or instances if they provide
particular functionality, for example::

   from collections import Sized

   size = None
   if isinstance(myvar, Sized):
       size = len(myvar)

(For more about ABCs, see the :mod:`abc` module and :pep:`3119`.)
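
As a further sketch of the same idea, a class does not have to inherit from an
ABC to pass the check: for :class:`Sized`, :class:`Iterable`, and
:class:`Container`, defining the corresponding special method is enough. The
``Playlist`` class below is hypothetical, and the import assumes a recent
Python 3 release, where these ABCs are exposed in :mod:`collections.abc` rather
than directly in :mod:`collections`::

   # Assumption: recent Python 3, where the ABCs live in collections.abc.
   from collections.abc import Container, Iterable, Sized

   class Playlist:
       """Hypothetical container that defines only three special methods."""

       def __init__(self, tracks):
           self._tracks = list(tracks)

       def __len__(self):
           return len(self._tracks)

       def __iter__(self):
           return iter(self._tracks)

       def __contains__(self, track):
           return track in self._tracks

   p = Playlist(['intro', 'theme'])
   is_sized = isinstance(p, Sized)           # True: __len__ is defined
   is_iterable = isinstance(p, Iterable)     # True: __iter__ is defined
   is_container = isinstance(p, Container)   # True: __contains__ is defined
   size = len(p) if is_sized else None       # 2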


.. _deque-objects:

:class:`deque` objects
----------------------


.. class:: deque([iterable[, maxlen]])

   Returns a new deque object initialized left-to-right (using :meth:`append`) with
   data from *iterable*. If *iterable* is not specified, the new deque is empty.

   Deques are a generalization of stacks and queues (the name is pronounced "deck"
   and is short for "double-ended queue"). Deques support thread-safe,
   memory-efficient appends and pops from either side of the deque with
   approximately the same O(1) performance in either direction.

   Though :class:`list` objects support similar operations, they are optimized for
   fast fixed-length operations and incur O(n) memory movement costs for
   ``pop(0)`` and ``insert(0, v)`` operations which change both the size and
   position of the underlying data representation.

   If *maxlen* is not specified or is ``None``, deques may grow to an
   arbitrary length. Otherwise, the deque is bounded to the specified maximum
   length. Once a bounded length deque is full, when new items are added, a
   corresponding number of items are discarded from the opposite end. Bounded
   length deques provide functionality similar to the ``tail`` filter in
   Unix. They are also useful for tracking transactions and other pools of data
   where only the most recent activity is of interest.

   .. versionchanged:: 2.6
      Added *maxlen*
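
A minimal sketch of the bounded behaviour described above; the event names and
variable names are purely illustrative::

   from collections import deque

   recent = deque(maxlen=3)         # keep only the three most recent items
   for event in ['login', 'view', 'edit', 'save', 'logout']:
       recent.append(event)         # once full, the leftmost item is discarded

   last_three = list(recent)        # ['edit', 'save', 'logout']

   recent.appendleft('start')       # adding on the left discards from the right
   after_left = list(recent)        # ['start', 'edit', 'save']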

Deque objects support the following methods:

.. method:: deque.append(x)

   Add *x* to the right side of the deque.


.. method:: deque.appendleft(x)

   Add *x* to the left side of the deque.


.. method:: deque.clear()

   Remove all elements from the deque leaving it with length 0.


.. method:: deque.extend(iterable)

   Extend the right side of the deque by appending elements from the iterable
   argument.


.. method:: deque.extendleft(iterable)

   Extend the left side of the deque by appending elements from *iterable*. Note
   that the series of left appends results in reversing the order of elements in
   the iterable argument.


.. method:: deque.pop()

   Remove and return an element from the right side of the deque. If no elements
   are present, raises an :exc:`IndexError`.


.. method:: deque.popleft()

   Remove and return an element from the left side of the deque. If no elements are
   present, raises an :exc:`IndexError`.


.. method:: deque.remove(value)

   Remove the first occurrence of *value*. If not found, raises a
   :exc:`ValueError`.


.. method:: deque.rotate(n)

   Rotate the deque *n* steps to the right. If *n* is negative, rotate to the
   left. Rotating one step to the right is equivalent to:
   ``d.appendleft(d.pop())``.

In addition to the above, deques support iteration, pickling, ``len(d)``,
``reversed(d)``, ``copy.copy(d)``, ``copy.deepcopy(d)``, membership testing with
the :keyword:`in` operator, and subscript references such as ``d[-1]``.

Example::

   >>> from collections import deque
   >>> d = deque('ghi')                 # make a new deque with three items
   >>> for elem in d:                   # iterate over the deque's elements
   ...     print(elem.upper())
   G
   H
   I

   >>> d.append('j')                    # add a new entry to the right side
   >>> d.appendleft('f')                # add a new entry to the left side
   >>> d                                # show the representation of the deque
   deque(['f', 'g', 'h', 'i', 'j'])

   >>> d.pop()                          # return and remove the rightmost item
   'j'
   >>> d.popleft()                      # return and remove the leftmost item
   'f'
   >>> list(d)                          # list the contents of the deque
   ['g', 'h', 'i']
   >>> d[0]                             # peek at leftmost item
   'g'
   >>> d[-1]                            # peek at rightmost item
   'i'

   >>> list(reversed(d))                # list the contents of a deque in reverse
   ['i', 'h', 'g']
   >>> 'h' in d                         # search the deque
   True
   >>> d.extend('jkl')                  # add multiple elements at once
   >>> d
   deque(['g', 'h', 'i', 'j', 'k', 'l'])
   >>> d.rotate(1)                      # right rotation
   >>> d
   deque(['l', 'g', 'h', 'i', 'j', 'k'])
   >>> d.rotate(-1)                     # left rotation
   >>> d
   deque(['g', 'h', 'i', 'j', 'k', 'l'])

   >>> deque(reversed(d))               # make a new deque in reverse order
   deque(['l', 'k', 'j', 'i', 'h', 'g'])
   >>> d.clear()                        # empty the deque
   >>> d.pop()                          # cannot pop from an empty deque
   Traceback (most recent call last):
     File "<pyshell#6>", line 1, in -toplevel-
       d.pop()
   IndexError: pop from an empty deque

   >>> d.extendleft('abc')              # extendleft() reverses the input order
   >>> d
   deque(['c', 'b', 'a'])


.. _deque-recipes:

:class:`deque` Recipes
^^^^^^^^^^^^^^^^^^^^^^

This section shows various approaches to working with deques.

The :meth:`rotate` method provides a way to implement :class:`deque` slicing and
deletion. For example, a pure Python implementation of ``del d[n]`` relies on
the :meth:`rotate` method to position elements to be popped::

   def delete_nth(d, n):
       d.rotate(-n)
       d.popleft()
       d.rotate(n)

To implement :class:`deque` slicing, use a similar approach applying
:meth:`rotate` to bring a target element to the left side of the deque. Remove
old entries with :meth:`popleft`, add new entries with :meth:`extend`, and then
reverse the rotation.
With minor variations on that approach, it is easy to implement Forth style
stack manipulations such as ``dup``, ``drop``, ``swap``, ``over``, ``pick``,
``rot``, and ``roll``.
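
The slicing steps and one of the Forth style operations might be sketched as
follows. The helper names ``replace_slice`` and ``roll`` are hypothetical, the
slice bounds are assumed to satisfy ``0 <= start <= stop <= len(d)``, and the
right end of the deque is treated as the top of the stack; this is one possible
reading of the approach, not a reference implementation::

   from collections import deque

   def replace_slice(d, start, stop, new_items=()):
       """Replace d[start:stop] in place with new_items (hypothetical helper)."""
       new_items = list(new_items)
       d.rotate(-start)                      # bring d[start] to the left end
       for _ in range(stop - start):
           d.popleft()                       # remove the old entries
       d.extend(new_items)                   # add the new entries on the right
       d.rotate(start + len(new_items))      # reverse the rotation

   def roll(d, n):
       """Move the item n places below the top of the stack to the top."""
       d.rotate(n)                           # target item is now rightmost
       item = d.pop()
       d.rotate(-n)                          # restore the remaining order
       d.append(item)

   d = deque('abcdefg')
   replace_slice(d, 2, 4, 'XYZ')
   sliced = ''.join(d)                       # 'abXYZefg'

   stack = deque([1, 2, 3, 4])
   roll(stack, 1)                            # same effect as Forth ``swap``
   after_swap = list(stack)                  # [1, 2, 4, 3]
   roll(stack, 2)                            # same effect as Forth ``rot``
   after_rot = list(stack)                   # [1, 4, 3, 2]

Because :meth:`extend` adds the new entries at the right end, the final
rotation has to cover both the untouched head of the deque and the newly added
items, hence ``start + len(new_items)``.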

Multi-pass data reduction algorithms can be succinctly expressed and efficiently
coded by extracting elements with multiple calls to :meth:`popleft`, applying
a reduction function, and calling :meth:`append` to add the result back to the
deque.

For example, building a balanced binary tree of nested lists entails reducing
two adjacent nodes into one by grouping them in a list::

   >>> def maketree(iterable):
   ...     d = deque(iterable)
   ...     while len(d) > 1:
   ...         pair = [d.popleft(), d.popleft()]
   ...         d.append(pair)
   ...     return list(d)
   ...
   >>> print(maketree('abcdefgh'))
   [[[['a', 'b'], ['c', 'd']], [['e', 'f'], ['g', 'h']]]]

Bounded length deques provide functionality similar to the ``tail`` filter
in Unix::

   def tail(filename, n=10):
       'Return the last n lines of a file'
       return deque(open(filename), n)

.. _defaultdict-objects:

:class:`defaultdict` objects
----------------------------


.. class:: defaultdict([default_factory[, ...]])

   Returns a new dictionary-like object. :class:`defaultdict` is a subclass of the
   built-in :class:`dict` class. It overrides one method and adds one writable
   instance variable. The remaining functionality is the same as for the
   :class:`dict` class and is not documented here.

   The first argument provides the initial value for the :attr:`default_factory`
   attribute; it defaults to ``None``. All remaining arguments are treated the same
   as if they were passed to the :class:`dict` constructor, including keyword
   arguments.


:class:`defaultdict` objects support the following method in addition to the
standard :class:`dict` operations:

.. method:: defaultdict.__missing__(key)

   If the :attr:`default_factory` attribute is ``None``, this raises a
   :exc:`KeyError` exception with the *key* as argument.

   If :attr:`default_factory` is not ``None``, it is called without arguments to
   provide a default value for the given *key*; this value is inserted in the
   dictionary for the *key* and returned.

   If calling :attr:`default_factory` raises an exception, this exception is
   propagated unchanged.

   This method is called by the :meth:`__getitem__` method of the :class:`dict`
   class when the requested key is not found; whatever it returns or raises is then
   returned or raised by :meth:`__getitem__`.
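
A short sketch of these rules in action. Since :meth:`__missing__` is reached
only through :meth:`__getitem__`, a lookup with :meth:`dict.get` is shown for
contrast; the keys used here are arbitrary::

   from collections import defaultdict

   d = defaultdict(list)
   value = d['a']                   # missing key: the factory result [] is stored and returned
   inserted = 'a' in d              # True, the default now lives under 'a'

   fallback = d.get('b')            # get() does not go through __missing__; returns None
   not_inserted = 'b' not in d      # True, nothing was stored for 'b'

   plain = defaultdict()            # default_factory is None
   try:
       plain['x']
       raised = False
   except KeyError:
       raised = True                # behaves like a regular dict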

:class:`defaultdict` objects support the following instance variable:


.. attribute:: defaultdict.default_factory

   This attribute is used by the :meth:`__missing__` method; it is initialized from
   the first argument to the constructor, if present, or to ``None``, if absent.
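
Because the attribute is writable, it can be changed after construction. One
possible use, sketched here with made-up data, is to stop new entries from
being created once a mapping has been built::

   from collections import defaultdict

   counts = defaultdict(int)
   for word in ['spam', 'eggs', 'spam']:
       counts[word] += 1

   counts.default_factory = None    # missing keys no longer create entries
   try:
       counts['ham']
       created = True
   except KeyError:
       created = False              # lookup now fails like a regular dict

   snapshot = dict(counts)          # {'spam': 2, 'eggs': 1}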


.. _defaultdict-examples:

:class:`defaultdict` Examples
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Using :class:`list` as the :attr:`default_factory`, it is easy to group a
sequence of key-value pairs into a dictionary of lists::

   >>> from collections import defaultdict
   >>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
   >>> d = defaultdict(list)
   >>> for k, v in s:
   ...     d[k].append(v)
   ...
   >>> sorted(d.items())
   [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

When each key is encountered for the first time, it is not already in the
mapping; so an entry is automatically created using the :attr:`default_factory`
function which returns an empty :class:`list`. The :meth:`list.append`
operation then attaches the value to the new list. When keys are encountered
again, the look-up proceeds normally (returning the list for that key) and the
:meth:`list.append` operation adds another value to the list. This technique is
simpler and faster than an equivalent technique using :meth:`dict.setdefault`::

   >>> d = {}
   >>> for k, v in s:
   ...     d.setdefault(k, []).append(v)
   ...
   >>> sorted(d.items())
   [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

Setting the :attr:`default_factory` to :class:`int` makes the
:class:`defaultdict` useful for counting (like a bag or multiset in other
languages)::

   >>> s = 'mississippi'
   >>> d = defaultdict(int)
   >>> for k in s:
   ...     d[k] += 1
   ...
   >>> sorted(d.items())
   [('i', 4), ('m', 1), ('p', 2), ('s', 4)]

When a letter is first encountered, it is missing from the mapping, so the
:attr:`default_factory` function calls :func:`int` to supply a default count of
zero. The increment operation then builds up the count for each letter.

The function :func:`int`, which always returns zero, is just a special case of
constant functions. A faster and more flexible way to create constant functions
is to use a lambda function which can supply any constant value (not just
zero)::

   >>> def constant_factory(value):
   ...     return lambda: value
   >>> d = defaultdict(constant_factory('<missing>'))
   >>> d.update(name='John', action='ran')
   >>> '%(name)s %(action)s to %(object)s' % d
   'John ran to <missing>'

Setting the :attr:`default_factory` to :class:`set` makes the
:class:`defaultdict` useful for building a dictionary of sets::

   >>> s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)]
   >>> d = defaultdict(set)
   >>> for k, v in s:
   ...     d[k].add(v)
   ...
   >>> sorted(d.items())
   [('blue', {2, 4}), ('red', {1, 3})]


.. _named-tuple-factory:

:func:`named_tuple` Factory Function for Tuples with Named Fields
-----------------------------------------------------------------

Named tuples assign meaning to each position in a tuple and allow for more readable,
self-documenting code. They can be used wherever regular tuples are used, and
they add the ability to access fields by name instead of position index.

.. function:: named_tuple(typename, fieldnames, [verbose])

   Returns a new tuple subclass named *typename*. The new subclass is used to
   create tuple-like objects that have fields accessible by attribute lookup as
   well as being indexable and iterable. Instances of the subclass also have a
   helpful docstring (with typename and fieldnames) and a helpful :meth:`__repr__`
   method which lists the tuple contents in a ``name=value`` format.

   The *fieldnames* are a single string with each fieldname separated by whitespace
   and/or commas (for example ``'x y'`` or ``'x, y'``). Alternatively, the *fieldnames*
   can be specified as a list of strings (such as ``['x', 'y']``).

   Any valid Python identifier may be used for a fieldname except for names
   starting and ending with double underscores. Valid identifiers consist of
   letters, digits, and underscores but do not start with a digit and cannot be
   a :mod:`keyword` such as *class*, *for*, *return*, *global*, *pass*, *print*,
   or *raise*.

   If *verbose* is true, the class definition is printed.

   Named tuple instances do not have per-instance dictionaries, so they are
   lightweight and require no more memory than regular tuples.

Example::

   >>> Point = named_tuple('Point', 'x y', verbose=True)
   class Point(tuple):
       'Point(x, y)'
       __slots__ = ()
       __fields__ = ('x', 'y')
       def __new__(cls, x, y):
           return tuple.__new__(cls, (x, y))
       def __repr__(self):
           return 'Point(x=%r, y=%r)' % self
       def __asdict__(self):
           'Return a new dict mapping field names to their values'
           return dict(zip(('x', 'y'), self))
       def __replace__(self, field, value):
           'Return a new Point object replacing one field with a new value'
           return Point(**dict(zip(('x', 'y'), self) + [(field, value)]))
       x = property(itemgetter(0))
       y = property(itemgetter(1))

   >>> p = Point(11, y=22)     # instantiate with positional or keyword arguments
   >>> p[0] + p[1]             # indexable like the regular tuple (11, 22)
   33
   >>> x, y = p                # unpack like a regular tuple
   >>> x, y
   (11, 22)
   >>> p.x + p.y               # fields also accessible by name
   33
   >>> p                       # readable __repr__ with a name=value style
   Point(x=11, y=22)
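
The accepted *fieldnames* formats and two of the field name restrictions
described above can be probed with a small sketch. It assumes a released Python
version, where the factory is importable as ``collections.namedtuple`` (the
spelling differs from the :func:`named_tuple` name used in this document), and
``is_accepted`` is a hypothetical helper written only for this illustration::

   # Assumption: released Python, where the factory is spelled namedtuple.
   from collections import namedtuple

   def is_accepted(fieldnames):
       """Return True if the factory accepts fieldnames, False otherwise."""
       try:
           namedtuple('Record', fieldnames)
       except ValueError:
           return False
       return True

   ok_plain = is_accepted('x y')            # True: whitespace separated
   ok_commas = is_accepted('x, y')          # True: comma separated
   ok_list = is_accepted(['x', 'y'])        # True: list of strings
   bad_keyword = is_accepted('x class')     # False: 'class' is a keyword
   bad_digit = is_accepted('x 1y')          # False: starts with a digit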

Named tuples are especially useful for assigning field names to result tuples returned
by the :mod:`csv` or :mod:`sqlite3` modules::

   EmployeeRecord = named_tuple('EmployeeRecord', 'name, age, title, department, paygrade')

   from itertools import starmap
   import csv
   for emp in starmap(EmployeeRecord, csv.reader(open("employees.csv", "rb"))):
       print(emp.name, emp.title)

   import sqlite3
   conn = sqlite3.connect('/companydata')
   cursor = conn.cursor()
   cursor.execute('SELECT name, age, title, department, paygrade FROM employees')
   for emp in starmap(EmployeeRecord, cursor.fetchall()):
       print(emp.name, emp.title)

When casting a single record to a named tuple, use the star-operator [#]_ to unpack
the values::

   >>> t = [11, 22]
   >>> Point(*t)               # the star-operator unpacks any iterable object
   Point(x=11, y=22)

When casting a dictionary to a named tuple, use the double-star-operator::

   >>> d = {'x': 11, 'y': 22}
   >>> Point(**d)
   Point(x=11, y=22)

In addition to the methods inherited from tuples, named tuples support
two additional methods and a read-only attribute.

.. method:: somenamedtuple.__asdict__()

   Return a new dict which maps field names to their corresponding values:

::

   >>> p.__asdict__()
   {'x': 11, 'y': 22}

.. method:: somenamedtuple.__replace__(field, value)

   Return a new instance of the named tuple replacing the named *field* with a new *value*:

::

   >>> p = Point(x=11, y=22)
   >>> p.__replace__('x', 33)
   Point(x=33, y=22)

   >>> for recordnum, record in inventory:
   ...     inventory[recordnum] = record.__replace__('total', record.price * record.quantity)

.. attribute:: somenamedtuple.__fields__

   A tuple of strings listing the field names. This is useful for introspection
   and for creating new named tuple types from existing named tuples.

::

   >>> p.__fields__            # view the field names
   ('x', 'y')

   >>> Color = named_tuple('Color', 'red green blue')
   >>> Pixel = named_tuple('Pixel', Point.__fields__ + Color.__fields__)
   >>> Pixel(11, 22, 128, 255, 0)
   Pixel(x=11, y=22, red=128, green=255, blue=0)

.. rubric:: Footnotes

.. [#] For information on the star-operator see
   :ref:`tut-unpacking-arguments` and :ref:`calls`.