blob: 50ddc0f742d2763a1a6cc69f35bfb5179b7c6677 [file] [log] [blame]
Georg Brandl116aa622007-08-15 14:28:22 +00001
2:mod:`collections` --- High-performance container datatypes
3===========================================================
4
5.. module:: collections
6 :synopsis: High-performance datatypes
7.. moduleauthor:: Raymond Hettinger <python@rcn.com>
8.. sectionauthor:: Raymond Hettinger <python@rcn.com>
9
10
Georg Brandl116aa622007-08-15 14:28:22 +000011This module implements high-performance container datatypes. Currently,
12there are two datatypes, :class:`deque` and :class:`defaultdict`, and
13one datatype factory function, :func:`NamedTuple`. Python already
14includes built-in containers, :class:`dict`, :class:`list`,
15:class:`set`, and :class:`tuple`. In addition, the optional :mod:`bsddb`
16module has a :meth:`bsddb.btopen` method that can be used to create in-memory
17or file based ordered dictionaries with string keys.
18
19Future editions of the standard library may include balanced trees and
20ordered dictionaries.
21
Georg Brandl116aa622007-08-15 14:28:22 +000022
23.. _deque-objects:
24
25:class:`deque` objects
26----------------------
27
28
29.. class:: deque([iterable])
30
31 Returns a new deque object initialized left-to-right (using :meth:`append`) with
32 data from *iterable*. If *iterable* is not specified, the new deque is empty.
33
34 Deques are a generalization of stacks and queues (the name is pronounced "deck"
35 and is short for "double-ended queue"). Deques support thread-safe, memory
36 efficient appends and pops from either side of the deque with approximately the
37 same O(1) performance in either direction.
38
39 Though :class:`list` objects support similar operations, they are optimized for
40 fast fixed-length operations and incur O(n) memory movement costs for
41 ``pop(0)`` and ``insert(0, v)`` operations which change both the size and
42 position of the underlying data representation.
43
Georg Brandl116aa622007-08-15 14:28:22 +000044
45Deque objects support the following methods:
46
Georg Brandl116aa622007-08-15 14:28:22 +000047.. method:: deque.append(x)
48
49 Add *x* to the right side of the deque.
50
51
52.. method:: deque.appendleft(x)
53
54 Add *x* to the left side of the deque.
55
56
57.. method:: deque.clear()
58
59 Remove all elements from the deque leaving it with length 0.
60
61
62.. method:: deque.extend(iterable)
63
64 Extend the right side of the deque by appending elements from the iterable
65 argument.
66
67
68.. method:: deque.extendleft(iterable)
69
70 Extend the left side of the deque by appending elements from *iterable*. Note,
71 the series of left appends results in reversing the order of elements in the
72 iterable argument.
73
74
75.. method:: deque.pop()
76
77 Remove and return an element from the right side of the deque. If no elements
78 are present, raises an :exc:`IndexError`.
79
80
81.. method:: deque.popleft()
82
83 Remove and return an element from the left side of the deque. If no elements are
84 present, raises an :exc:`IndexError`.
85
86
87.. method:: deque.remove(value)
88
89 Removed the first occurrence of *value*. If not found, raises a
90 :exc:`ValueError`.
91
Georg Brandl116aa622007-08-15 14:28:22 +000092
93.. method:: deque.rotate(n)
94
95 Rotate the deque *n* steps to the right. If *n* is negative, rotate to the
96 left. Rotating one step to the right is equivalent to:
97 ``d.appendleft(d.pop())``.
98
99In addition to the above, deques support iteration, pickling, ``len(d)``,
100``reversed(d)``, ``copy.copy(d)``, ``copy.deepcopy(d)``, membership testing with
101the :keyword:`in` operator, and subscript references such as ``d[-1]``.
102
103Example::
104
105 >>> from collections import deque
106 >>> d = deque('ghi') # make a new deque with three items
107 >>> for elem in d: # iterate over the deque's elements
Georg Brandl6911e3c2007-09-04 07:15:32 +0000108 ... print(elem.upper())
Georg Brandl116aa622007-08-15 14:28:22 +0000109 G
110 H
111 I
112
113 >>> d.append('j') # add a new entry to the right side
114 >>> d.appendleft('f') # add a new entry to the left side
115 >>> d # show the representation of the deque
116 deque(['f', 'g', 'h', 'i', 'j'])
117
118 >>> d.pop() # return and remove the rightmost item
119 'j'
120 >>> d.popleft() # return and remove the leftmost item
121 'f'
122 >>> list(d) # list the contents of the deque
123 ['g', 'h', 'i']
124 >>> d[0] # peek at leftmost item
125 'g'
126 >>> d[-1] # peek at rightmost item
127 'i'
128
129 >>> list(reversed(d)) # list the contents of a deque in reverse
130 ['i', 'h', 'g']
131 >>> 'h' in d # search the deque
132 True
133 >>> d.extend('jkl') # add multiple elements at once
134 >>> d
135 deque(['g', 'h', 'i', 'j', 'k', 'l'])
136 >>> d.rotate(1) # right rotation
137 >>> d
138 deque(['l', 'g', 'h', 'i', 'j', 'k'])
139 >>> d.rotate(-1) # left rotation
140 >>> d
141 deque(['g', 'h', 'i', 'j', 'k', 'l'])
142
143 >>> deque(reversed(d)) # make a new deque in reverse order
144 deque(['l', 'k', 'j', 'i', 'h', 'g'])
145 >>> d.clear() # empty the deque
146 >>> d.pop() # cannot pop from an empty deque
147 Traceback (most recent call last):
148 File "<pyshell#6>", line 1, in -toplevel-
149 d.pop()
150 IndexError: pop from an empty deque
151
152 >>> d.extendleft('abc') # extendleft() reverses the input order
153 >>> d
154 deque(['c', 'b', 'a'])
155
156
157.. _deque-recipes:
158
159Recipes
160^^^^^^^
161
162This section shows various approaches to working with deques.
163
164The :meth:`rotate` method provides a way to implement :class:`deque` slicing and
165deletion. For example, a pure python implementation of ``del d[n]`` relies on
166the :meth:`rotate` method to position elements to be popped::
167
168 def delete_nth(d, n):
169 d.rotate(-n)
170 d.popleft()
171 d.rotate(n)
172
173To implement :class:`deque` slicing, use a similar approach applying
174:meth:`rotate` to bring a target element to the left side of the deque. Remove
175old entries with :meth:`popleft`, add new entries with :meth:`extend`, and then
176reverse the rotation.
177
178With minor variations on that approach, it is easy to implement Forth style
179stack manipulations such as ``dup``, ``drop``, ``swap``, ``over``, ``pick``,
180``rot``, and ``roll``.
181
182A roundrobin task server can be built from a :class:`deque` using
183:meth:`popleft` to select the current task and :meth:`append` to add it back to
184the tasklist if the input stream is not exhausted::
185
186 >>> def roundrobin(*iterables):
187 ... pending = deque(iter(i) for i in iterables)
188 ... while pending:
189 ... task = pending.popleft()
190 ... try:
191 ... yield next(task)
192 ... except StopIteration:
193 ... continue
194 ... pending.append(task)
195 ...
196 >>> for value in roundrobin('abc', 'd', 'efgh'):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000197 ... print(value)
Georg Brandl116aa622007-08-15 14:28:22 +0000198
199 a
200 d
201 e
202 b
203 f
204 c
205 g
206 h
207
208
209Multi-pass data reduction algorithms can be succinctly expressed and efficiently
210coded by extracting elements with multiple calls to :meth:`popleft`, applying
211the reduction function, and calling :meth:`append` to add the result back to the
212queue.
213
214For example, building a balanced binary tree of nested lists entails reducing
215two adjacent nodes into one by grouping them in a list::
216
217 >>> def maketree(iterable):
218 ... d = deque(iterable)
219 ... while len(d) > 1:
220 ... pair = [d.popleft(), d.popleft()]
221 ... d.append(pair)
222 ... return list(d)
223 ...
Georg Brandl6911e3c2007-09-04 07:15:32 +0000224 >>> print(maketree('abcdefgh'))
Georg Brandl116aa622007-08-15 14:28:22 +0000225 [[[['a', 'b'], ['c', 'd']], [['e', 'f'], ['g', 'h']]]]
226
227
228
229.. _defaultdict-objects:
230
231:class:`defaultdict` objects
232----------------------------
233
234
235.. class:: defaultdict([default_factory[, ...]])
236
237 Returns a new dictionary-like object. :class:`defaultdict` is a subclass of the
238 builtin :class:`dict` class. It overrides one method and adds one writable
239 instance variable. The remaining functionality is the same as for the
240 :class:`dict` class and is not documented here.
241
242 The first argument provides the initial value for the :attr:`default_factory`
243 attribute; it defaults to ``None``. All remaining arguments are treated the same
244 as if they were passed to the :class:`dict` constructor, including keyword
245 arguments.
246
Georg Brandl116aa622007-08-15 14:28:22 +0000247
248:class:`defaultdict` objects support the following method in addition to the
249standard :class:`dict` operations:
250
Georg Brandl116aa622007-08-15 14:28:22 +0000251.. method:: defaultdict.__missing__(key)
252
253 If the :attr:`default_factory` attribute is ``None``, this raises an
254 :exc:`KeyError` exception with the *key* as argument.
255
256 If :attr:`default_factory` is not ``None``, it is called without arguments to
257 provide a default value for the given *key*, this value is inserted in the
258 dictionary for the *key*, and returned.
259
260 If calling :attr:`default_factory` raises an exception this exception is
261 propagated unchanged.
262
263 This method is called by the :meth:`__getitem__` method of the :class:`dict`
264 class when the requested key is not found; whatever it returns or raises is then
265 returned or raised by :meth:`__getitem__`.
266
267:class:`defaultdict` objects support the following instance variable:
268
269
270.. attribute:: defaultdict.default_factory
271
272 This attribute is used by the :meth:`__missing__` method; it is initialized from
273 the first argument to the constructor, if present, or to ``None``, if absent.
274
275
276.. _defaultdict-examples:
277
278:class:`defaultdict` Examples
279^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
280
281Using :class:`list` as the :attr:`default_factory`, it is easy to group a
282sequence of key-value pairs into a dictionary of lists::
283
284 >>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
285 >>> d = defaultdict(list)
286 >>> for k, v in s:
287 ... d[k].append(v)
288 ...
289 >>> d.items()
290 [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
291
292When each key is encountered for the first time, it is not already in the
293mapping; so an entry is automatically created using the :attr:`default_factory`
294function which returns an empty :class:`list`. The :meth:`list.append`
295operation then attaches the value to the new list. When keys are encountered
296again, the look-up proceeds normally (returning the list for that key) and the
297:meth:`list.append` operation adds another value to the list. This technique is
298simpler and faster than an equivalent technique using :meth:`dict.setdefault`::
299
300 >>> d = {}
301 >>> for k, v in s:
302 ... d.setdefault(k, []).append(v)
303 ...
304 >>> d.items()
305 [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
306
307Setting the :attr:`default_factory` to :class:`int` makes the
308:class:`defaultdict` useful for counting (like a bag or multiset in other
309languages)::
310
311 >>> s = 'mississippi'
312 >>> d = defaultdict(int)
313 >>> for k in s:
314 ... d[k] += 1
315 ...
316 >>> d.items()
317 [('i', 4), ('p', 2), ('s', 4), ('m', 1)]
318
319When a letter is first encountered, it is missing from the mapping, so the
320:attr:`default_factory` function calls :func:`int` to supply a default count of
321zero. The increment operation then builds up the count for each letter.
322
323The function :func:`int` which always returns zero is just a special case of
324constant functions. A faster and more flexible way to create constant functions
325is to use a lambda function which can supply any constant value (not just
326zero)::
327
328 >>> def constant_factory(value):
329 ... return lambda: value
330 >>> d = defaultdict(constant_factory('<missing>'))
331 >>> d.update(name='John', action='ran')
332 >>> '%(name)s %(action)s to %(object)s' % d
333 'John ran to <missing>'
334
335Setting the :attr:`default_factory` to :class:`set` makes the
336:class:`defaultdict` useful for building a dictionary of sets::
337
338 >>> s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)]
339 >>> d = defaultdict(set)
340 >>> for k, v in s:
341 ... d[k].add(v)
342 ...
343 >>> d.items()
344 [('blue', set([2, 4])), ('red', set([1, 3]))]
345
346
347.. _named-tuple-factory:
348
349:func:`NamedTuple` datatype factory function
350--------------------------------------------
351
352
353.. function:: NamedTuple(typename, fieldnames)
354
355 Returns a new tuple subclass named *typename*. The new subclass is used to
356 create tuple-like objects that have fields accessable by attribute lookup as
357 well as being indexable and iterable. Instances of the subclass also have a
358 helpful docstring (with typename and fieldnames) and a helpful :meth:`__repr__`
359 method which lists the tuple contents in a ``name=value`` format.
360
Georg Brandl116aa622007-08-15 14:28:22 +0000361 The *fieldnames* are specified in a single string and are separated by spaces.
362 Any valid Python identifier may be used for a field name.
363
364 Example::
365
366 >>> Point = NamedTuple('Point', 'x y')
367 >>> Point.__doc__ # docstring for the new datatype
368 'Point(x, y)'
369 >>> p = Point(11, y=22) # instantiate with positional or keyword arguments
370 >>> p[0] + p[1] # works just like the tuple (11, 22)
371 33
372 >>> x, y = p # unpacks just like a tuple
373 >>> x, y
374 (11, 22)
375 >>> p.x + p.y # fields also accessable by name
376 33
377 >>> p # readable __repr__ with name=value style
378 Point(x=11, y=22)
379
380 The use cases are the same as those for tuples. The named factories assign
381 meaning to each tuple position and allow for more readable, self-documenting
382 code. Named tuples can also be used to assign field names to tuples returned
383 by the :mod:`csv` or :mod:`sqlite3` modules. For example::
384
385 from itertools import starmap
386 import csv
387 EmployeeRecord = NamedTuple('EmployeeRecord', 'name age title department paygrade')
388 for record in starmap(EmployeeRecord, csv.reader(open("employees.csv", "rb"))):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000389 print(record)
Georg Brandl116aa622007-08-15 14:28:22 +0000390
391 To cast an individual record stored as :class:`list`, :class:`tuple`, or some
Thomas Wouters47b49bf2007-08-30 22:15:33 +0000392 other iterable type, use the star-operator [#]_ to unpack the values::
Georg Brandl116aa622007-08-15 14:28:22 +0000393
394 >>> Color = NamedTuple('Color', 'name code')
395 >>> m = dict(red=1, green=2, blue=3)
Georg Brandl6911e3c2007-09-04 07:15:32 +0000396 >>> print(Color(*m.popitem()))
Georg Brandl116aa622007-08-15 14:28:22 +0000397 Color(name='blue', code=3)
398
Thomas Wouters47b49bf2007-08-30 22:15:33 +0000399.. rubric:: Footnotes
400
401.. [#] For information on the star-operator see
402 :ref:`tut-unpacking-arguments` and :ref:`calls`.