:mod:`collections` --- High-performance container datatypes
===========================================================

.. module:: collections
   :synopsis: High-performance datatypes
.. moduleauthor:: Raymond Hettinger <python@rcn.com>
.. sectionauthor:: Raymond Hettinger <python@rcn.com>


.. versionadded:: 2.4

This module implements high-performance container datatypes. Currently,
there are two datatypes, :class:`deque` and :class:`defaultdict`, and
one datatype factory function, :func:`NamedTuple`. Python already
includes built-in containers, :class:`dict`, :class:`list`,
:class:`set`, and :class:`tuple`. In addition, the optional :mod:`bsddb`
module has a :func:`bsddb.btopen` function that can be used to create in-memory
or file-based ordered dictionaries with string keys.

Future editions of the standard library may include balanced trees and
ordered dictionaries.

.. versionchanged:: 2.5
   Added :class:`defaultdict`.

.. versionchanged:: 2.6
   Added :func:`NamedTuple`.


.. _deque-objects:

:class:`deque` objects
----------------------


.. class:: deque([iterable])

   Returns a new deque object initialized left-to-right (using :meth:`append`) with
   data from *iterable*. If *iterable* is not specified, the new deque is empty.

   Deques are a generalization of stacks and queues (the name is pronounced "deck"
   and is short for "double-ended queue"). Deques support thread-safe,
   memory-efficient appends and pops from either side of the deque with
   approximately the same O(1) performance in either direction.

   Though :class:`list` objects support similar operations, they are optimized for
   fast fixed-length operations and incur O(n) memory movement costs for
   ``pop(0)`` and ``insert(0, v)`` operations, which change both the size and
   position of the underlying data representation.

   .. versionadded:: 2.4

Deque objects support the following methods:


.. method:: deque.append(x)

   Add *x* to the right side of the deque.


.. method:: deque.appendleft(x)

   Add *x* to the left side of the deque.


.. method:: deque.clear()

   Remove all elements from the deque, leaving it with length 0.


.. method:: deque.extend(iterable)

   Extend the right side of the deque by appending elements from the iterable
   argument.


.. method:: deque.extendleft(iterable)

   Extend the left side of the deque by appending elements from *iterable*. Note
   that the series of left appends reverses the order of the elements in the
   iterable argument.


.. method:: deque.pop()

   Remove and return an element from the right side of the deque. If no elements
   are present, raises an :exc:`IndexError`.


.. method:: deque.popleft()

   Remove and return an element from the left side of the deque. If no elements are
   present, raises an :exc:`IndexError`.


.. method:: deque.remove(value)

   Remove the first occurrence of *value*. If not found, raises a
   :exc:`ValueError`.

   .. versionadded:: 2.5


.. method:: deque.rotate(n)

   Rotate the deque *n* steps to the right. If *n* is negative, rotate to the
   left. Rotating one step to the right is equivalent to
   ``d.appendleft(d.pop())``.

In addition to the above, deques support iteration, pickling, ``len(d)``,
``reversed(d)``, ``copy.copy(d)``, ``copy.deepcopy(d)``, membership testing with
the :keyword:`in` operator, and subscript references such as ``d[-1]``.

Example::

   >>> from collections import deque
   >>> d = deque('ghi')                 # make a new deque with three items
   >>> for elem in d:                   # iterate over the deque's elements
   ...     print elem.upper()
   G
   H
   I

   >>> d.append('j')                    # add a new entry to the right side
   >>> d.appendleft('f')                # add a new entry to the left side
   >>> d                                # show the representation of the deque
   deque(['f', 'g', 'h', 'i', 'j'])

   >>> d.pop()                          # return and remove the rightmost item
   'j'
   >>> d.popleft()                      # return and remove the leftmost item
   'f'
   >>> list(d)                          # list the contents of the deque
   ['g', 'h', 'i']
   >>> d[0]                             # peek at leftmost item
   'g'
   >>> d[-1]                            # peek at rightmost item
   'i'

   >>> list(reversed(d))                # list the contents of a deque in reverse
   ['i', 'h', 'g']
   >>> 'h' in d                         # search the deque
   True
   >>> d.extend('jkl')                  # add multiple elements at once
   >>> d
   deque(['g', 'h', 'i', 'j', 'k', 'l'])
   >>> d.rotate(1)                      # right rotation
   >>> d
   deque(['l', 'g', 'h', 'i', 'j', 'k'])
   >>> d.rotate(-1)                     # left rotation
   >>> d
   deque(['g', 'h', 'i', 'j', 'k', 'l'])

   >>> deque(reversed(d))               # make a new deque in reverse order
   deque(['l', 'k', 'j', 'i', 'h', 'g'])
   >>> d.clear()                        # empty the deque
   >>> d.pop()                          # cannot pop from an empty deque
   Traceback (most recent call last):
     File "<pyshell#6>", line 1, in -toplevel-
       d.pop()
   IndexError: pop from an empty deque

   >>> d.extendleft('abc')              # extendleft() reverses the input order
   >>> d
   deque(['c', 'b', 'a'])


.. _deque-recipes:

Recipes
^^^^^^^

This section shows various approaches to working with deques.

The :meth:`rotate` method provides a way to implement :class:`deque` slicing and
deletion. For example, a pure Python implementation of ``del d[n]`` relies on
the :meth:`rotate` method to position elements to be popped::

   def delete_nth(d, n):
       d.rotate(-n)
       d.popleft()
       d.rotate(n)

To implement :class:`deque` slicing, use a similar approach applying
:meth:`rotate` to bring a target element to the left side of the deque. Remove
old entries with :meth:`popleft`, add new entries with :meth:`extend`, and then
reverse the rotation.

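The slicing approach just described might be sketched as follows. The helper
name ``replace_slice`` and its arguments are hypothetical (they are not part of
the module), and the sketch assumes ``0 <= start <= stop <= len(d)``. Note that
the final rotation has to account for the newly added entries:

```python
from collections import deque

def replace_slice(d, start, stop, new_items):
    """Emulate ``d[start:stop] = new_items`` (hypothetical helper)."""
    new_items = list(new_items)
    d.rotate(-start)                    # bring d[start] to the left side
    for _ in range(stop - start):       # remove the old entries
        d.popleft()
    d.extend(new_items)                 # these land right after the old d[:start]
    d.rotate(start + len(new_items))    # reverse the rotation, plus the new entries

d = deque('abcdef')
replace_slice(d, 1, 3, 'XYZ')           # d is now deque(['a', 'X', 'Y', 'Z', 'd', 'e', 'f'])
```

Passing an empty *new_items* turns the same helper into a slice deletion.
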
With minor variations on that approach, it is easy to implement Forth-style
stack manipulations such as ``dup``, ``drop``, ``swap``, ``over``, ``pick``,
``rot``, and ``roll``.

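As an illustration, a few of these stack words could be written along the
following lines. This is only a sketch: the function names mirror the Forth
words but are not provided by the module, and it assumes the top of the stack
is the right side of the deque, with item 0 being the top:

```python
from collections import deque

def dup(s):
    """( a -- a a ) duplicate the top item."""
    s.append(s[-1])

def drop(s):
    """( a -- ) discard the top item."""
    s.pop()

def over(s):
    """( a b -- a b a ) copy the second item to the top."""
    s.append(s[-2])

def pick(s, n):
    """Copy the n-th item (0 is the top) to the top."""
    s.append(s[-1 - n])

def roll(s, n):
    """Move the n-th item (0 is the top) to the top."""
    s.rotate(n)         # bring the target item to the right side
    item = s.pop()
    s.rotate(-n)        # restore the order of the remaining items
    s.append(item)

def swap(s):
    """( a b -- b a )"""
    roll(s, 1)

def rot(s):
    """( a b c -- b c a )"""
    roll(s, 2)

stack = deque([1, 2, 3])
rot(stack)              # deque([2, 3, 1])
swap(stack)             # deque([2, 1, 3])
```
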
A round-robin task server can be built from a :class:`deque` using
:meth:`popleft` to select the current task and :meth:`append` to add it back to
the task list if the input stream is not exhausted::

   >>> def roundrobin(*iterables):
   ...     pending = deque(iter(i) for i in iterables)
   ...     while pending:
   ...         task = pending.popleft()
   ...         try:
   ...             yield task.next()
   ...         except StopIteration:
   ...             continue
   ...         pending.append(task)
   ...
   >>> for value in roundrobin('abc', 'd', 'efgh'):
   ...     print value
   ...
   a
   d
   e
   b
   f
   c
   g
   h


Multi-pass data reduction algorithms can be succinctly expressed and efficiently
coded by extracting elements with multiple calls to :meth:`popleft`, applying
the reduction function, and calling :meth:`append` to add the result back to the
queue.

For example, building a balanced binary tree of nested lists entails reducing
two adjacent nodes into one by grouping them in a list::

   >>> def maketree(iterable):
   ...     d = deque(iterable)
   ...     while len(d) > 1:
   ...         pair = [d.popleft(), d.popleft()]
   ...         d.append(pair)
   ...     return list(d)
   ...
   >>> print maketree('abcdefgh')
   [[[['a', 'b'], ['c', 'd']], [['e', 'f'], ['g', 'h']]]]



.. _defaultdict-objects:

:class:`defaultdict` objects
----------------------------


.. class:: defaultdict([default_factory[, ...]])

   Returns a new dictionary-like object. :class:`defaultdict` is a subclass of the
   built-in :class:`dict` class. It overrides one method and adds one writable
   instance variable. The remaining functionality is the same as for the
   :class:`dict` class and is not documented here.

   The first argument provides the initial value for the :attr:`default_factory`
   attribute; it defaults to ``None``. All remaining arguments are treated the same
   as if they were passed to the :class:`dict` constructor, including keyword
   arguments.

   .. versionadded:: 2.5

:class:`defaultdict` objects support the following method in addition to the
standard :class:`dict` operations:


.. method:: defaultdict.__missing__(key)

   If the :attr:`default_factory` attribute is ``None``, this raises a
   :exc:`KeyError` exception with the *key* as argument.

   If :attr:`default_factory` is not ``None``, it is called without arguments to
   provide a default value for the given *key*; this value is inserted in the
   dictionary for the *key* and returned.

   If calling :attr:`default_factory` raises an exception, this exception is
   propagated unchanged.

   This method is called by the :meth:`__getitem__` method of the :class:`dict`
   class when the requested key is not found; whatever it returns or raises is then
   returned or raised by :meth:`__getitem__`.

:class:`defaultdict` objects support the following instance variable:


.. attribute:: defaultdict.default_factory

   This attribute is used by the :meth:`__missing__` method; it is initialized from
   the first argument to the constructor, if present, or to ``None``, if absent.

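The interplay between the constructor arguments, :meth:`__missing__`, and
:attr:`default_factory` can be seen in a short session such as the one below.
It is a minimal sketch; the variable names are arbitrary and chosen only for
illustration:

```python
from collections import defaultdict

d = defaultdict(list, a=[1])    # arguments after the factory are passed to dict()
d['b'].append(2)                # missing key: list() is called and its result stored
snapshot = dict(d)              # {'a': [1], 'b': [2]}

# __missing__ is only reached through __getitem__; get() and the
# ``in`` operator behave as they do for a plain dict and insert nothing.
looked_up = d.get('c')          # None
present = 'c' in d              # False

# The attribute is writable; once it is None, missing keys raise KeyError again.
d.default_factory = None
try:
    d['c']
except KeyError as exc:
    missing_key = exc.args[0]   # the key is passed as the exception argument
```
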

.. _defaultdict-examples:

:class:`defaultdict` Examples
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Using :class:`list` as the :attr:`default_factory`, it is easy to group a
sequence of key-value pairs into a dictionary of lists::

   >>> from collections import defaultdict
   >>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
   >>> d = defaultdict(list)
   >>> for k, v in s:
   ...     d[k].append(v)
   ...
   >>> d.items()
   [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

When each key is encountered for the first time, it is not already in the
mapping, so an entry is automatically created using the :attr:`default_factory`
function, which returns an empty :class:`list`. The :meth:`list.append`
operation then attaches the value to the new list. When keys are encountered
again, the look-up proceeds normally (returning the list for that key) and the
:meth:`list.append` operation adds another value to the list. This technique is
simpler and faster than an equivalent technique using :meth:`dict.setdefault`::

   >>> d = {}
   >>> for k, v in s:
   ...     d.setdefault(k, []).append(v)
   ...
   >>> d.items()
   [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

Setting the :attr:`default_factory` to :class:`int` makes the
:class:`defaultdict` useful for counting (like a bag or multiset in other
languages)::

   >>> s = 'mississippi'
   >>> d = defaultdict(int)
   >>> for k in s:
   ...     d[k] += 1
   ...
   >>> d.items()
   [('i', 4), ('p', 2), ('s', 4), ('m', 1)]

When a letter is first encountered, it is missing from the mapping, so the
:attr:`default_factory` function calls :func:`int` to supply a default count of
zero. The increment operation then builds up the count for each letter.

The function :func:`int`, which returns zero when called without arguments, is
just a special case of a constant function. A faster and more flexible way to
create constant functions is to use :func:`itertools.repeat`, which can supply
any constant value (not just zero)::

   >>> import itertools
   >>> def constant_factory(value):
   ...     return itertools.repeat(value).next
   >>> d = defaultdict(constant_factory('<missing>'))
   >>> d.update(name='John', action='ran')
   >>> '%(name)s %(action)s to %(object)s' % d
   'John ran to <missing>'

Setting the :attr:`default_factory` to :class:`set` makes the
:class:`defaultdict` useful for building a dictionary of sets::

   >>> s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)]
   >>> d = defaultdict(set)
   >>> for k, v in s:
   ...     d[k].add(v)
   ...
   >>> d.items()
   [('blue', set([2, 4])), ('red', set([1, 3]))]


.. _named-tuple-factory:

:func:`NamedTuple` Factory Function for Tuples with Named Fields
----------------------------------------------------------------

Named tuples assign meaning to each position in a tuple and allow for more readable,
self-documenting code. They can be used wherever regular tuples are used, and
they add the ability to access fields by name instead of position index.

.. function:: NamedTuple(typename, fieldnames, [verbose])

   Returns a new tuple subclass named *typename*. The new subclass is used to
   create tuple-like objects that have fields accessible by attribute lookup as
   well as being indexable and iterable. Instances of the subclass also have a
   helpful docstring (with typename and fieldnames) and a helpful :meth:`__repr__`
   method which lists the tuple contents in a ``name=value`` format.

   The *fieldnames* are specified in a single string with each fieldname separated by
   a space and/or comma. Any valid Python identifier may be used for a fieldname.

   If *verbose* is true, the class definition is printed.

   *NamedTuple* instances do not have per-instance dictionaries, so they are
   lightweight and require no more memory than regular tuples.

   .. versionadded:: 2.6

Example::

   >>> Point = NamedTuple('Point', 'x y', True)
   class Point(tuple):
       'Point(x, y)'
       __slots__ = ()
       __fields__ = ('x', 'y')
       def __new__(cls, x, y):
           return tuple.__new__(cls, (x, y))
       def __repr__(self):
           return 'Point(x=%r, y=%r)' % self
       def __replace__(self, field, value):
           'Return a new Point object replacing one field with a new value'
           return Point(**dict(zip(('x', 'y'), self) + [(field, value)]))
       x = property(itemgetter(0))
       y = property(itemgetter(1))

   >>> p = Point(11, y=22)     # instantiate with positional or keyword arguments
   >>> p[0] + p[1]             # indexable like the regular tuple (11, 22)
   33
   >>> x, y = p                # unpack like a regular tuple
   >>> x, y
   (11, 22)
   >>> p.x + p.y               # fields also accessible by name
   33
   >>> p                       # readable __repr__ with a name=value style
   Point(x=11, y=22)

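The claim that instances carry no per-instance dictionary follows from the
empty ``__slots__`` in the class definition printed above. The sketch below
writes out a reduced version of that class by hand (omitting ``__repr__``,
``__fields__`` and ``__replace__``) purely to illustrate the effect; it is not
the code generated by the factory function:

```python
from operator import itemgetter

class Point(tuple):
    'Point(x, y)'
    __slots__ = ()                      # suppresses the per-instance __dict__
    def __new__(cls, x, y):
        return tuple.__new__(cls, (x, y))
    x = property(itemgetter(0))         # read-only access to position 0 by name
    y = property(itemgetter(1))         # read-only access to position 1 by name

p = Point(11, 22)
has_dict = hasattr(p, '__dict__')       # False: nothing beyond the tuple is stored
try:
    p.z = 33                            # arbitrary attributes cannot be attached
except AttributeError:
    can_add_attribute = False
else:
    can_add_attribute = True
```
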
Named tuples are especially useful for assigning field names to result tuples returned
by the :mod:`csv` or :mod:`sqlite3` modules::

   from collections import NamedTuple
   from itertools import starmap
   import csv
   EmployeeRecord = NamedTuple('EmployeeRecord', 'name age title department paygrade')
   for emp in starmap(EmployeeRecord, csv.reader(open("employees.csv", "rb"))):
       print emp.name, emp.title

When casting a single record to a *NamedTuple*, use the star-operator [#]_ to unpack
the values::

   >>> t = [11, 22]
   >>> Point(*t)               # the star-operator unpacks any iterable object
   Point(x=11, y=22)

In addition to the methods inherited from tuples, named tuples support
an additional method and an informational read-only attribute.

.. method:: somenamedtuple.__replace__(field, value)

   Return a new instance of the named tuple replacing the named *field* with a new *value*::

      >>> p = Point(x=11, y=22)
      >>> p.__replace__('x', 33)
      Point(x=33, y=22)

      >>> for recordnum, record in inventory:
      ...     inventory[recordnum] = record.__replace__('total', record.price * record.quantity)

.. attribute:: somenamedtuple.__fields__

   A tuple of strings listing the field names. This is useful for introspection,
   for converting a named tuple instance to a dictionary, and for combining named tuple
   types to create new named tuple types::

      >>> p.__fields__                         # view the field names
      ('x', 'y')
      >>> dict(zip(p.__fields__, p))           # convert to a dictionary
      {'y': 22, 'x': 11}

      >>> Color = NamedTuple('Color', 'red green blue')
      >>> pixel_fields = ' '.join(Point.__fields__ + Color.__fields__)     # combine fields
      >>> Pixel = NamedTuple('Pixel', pixel_fields)
      >>> Pixel(11, 22, 128, 255, 0)
      Pixel(x=11, y=22, red=128, green=255, blue=0)

.. rubric:: Footnotes

.. [#] For information on the star-operator see
   :ref:`tut-unpacking-arguments` and :ref:`calls`.