:mod:`collections` --- High-performance container datatypes
===========================================================

.. module:: collections
   :synopsis: High-performance datatypes
.. moduleauthor:: Raymond Hettinger <python@rcn.com>
.. sectionauthor:: Raymond Hettinger <python@rcn.com>


This module implements high-performance container datatypes. Currently,
there are two datatypes, :class:`deque` and :class:`defaultdict`, and
one datatype factory function, :func:`NamedTuple`. Python already
includes built-in containers, :class:`dict`, :class:`list`,
:class:`set`, and :class:`tuple`. In addition, the optional :mod:`bsddb`
module has a :meth:`bsddb.btopen` method that can be used to create in-memory
or file-based ordered dictionaries with string keys.

Future editions of the standard library may include balanced trees and
ordered dictionaries.

In addition to containers, the collections module provides some ABCs
(abstract base classes) that can be used to test whether a class provides
a particular interface, for example, whether it is hashable or whether it
is a mapping. The ABCs provided include those in the following table:

===================================== ========================================
ABC                                   Notes
===================================== ========================================
:class:`collections.Container`        Defines ``__contains__()``
:class:`collections.Hashable`         Defines ``__hash__()``
:class:`collections.Iterable`         Defines ``__iter__()``
:class:`collections.Iterator`         Derived from :class:`Iterable` and in
                                      addition defines ``__next__()``
:class:`collections.Mapping`          Derived from :class:`Container`,
                                      :class:`Iterable`,
                                      and :class:`Sized`, and in addition
                                      defines ``__getitem__()``, ``get()``,
                                      ``__contains__()``, ``__len__()``,
                                      ``__iter__()``, ``keys()``,
                                      ``items()``, and ``values()``
:class:`collections.MutableMapping`   Derived from :class:`Mapping`
:class:`collections.MutableSequence`  Derived from :class:`Sequence`
:class:`collections.MutableSet`       Derived from :class:`Set` and in
                                      addition defines ``add()``,
                                      ``clear()``, ``discard()``, ``pop()``,
                                      and ``toggle()``
:class:`collections.Sequence`         Derived from :class:`Container`,
                                      :class:`Iterable`, and :class:`Sized`,
                                      and in addition defines
                                      ``__getitem__()``
:class:`collections.Set`              Derived from :class:`Container`, :class:`Iterable`, and :class:`Sized`
:class:`collections.Sized`            Defines ``__len__()``
===================================== ========================================

.. XXX Have not included them all and the notes are incomplete
.. Deliberately did one row wide to get a neater output

These ABCs allow us to ask classes or instances if they provide
particular functionality, for example::

   from collections import Sized

   size = None
   if isinstance(myvar, Sized):
       size = len(myvar)

(For more about ABCs, see the :mod:`abc` module and :pep:`3119`.)
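
The same kind of check works for the other ABCs in the table. The following
is only a sketch: the helper name ``describe`` is made up for illustration,
and it assumes a current Python 3, where the ABCs are imported from
``collections.abc`` rather than from ``collections`` itself.

```python
# Assumption: a current Python 3, where the ABCs live in collections.abc.
from collections.abc import Hashable, Mapping, Sized


def describe(obj):
    """Return the names of the sampled ABCs that obj satisfies (hypothetical helper)."""
    checks = (("Hashable", Hashable), ("Mapping", Mapping), ("Sized", Sized))
    return [name for name, abc in checks if isinstance(obj, abc)]


print(describe({"a": 1}))   # ['Mapping', 'Sized']   (dicts are not hashable)
print(describe((1, 2)))     # ['Hashable', 'Sized']
print(describe(42))         # ['Hashable']
```

Note that ``isinstance`` only inspects the type: a tuple counts as
``Hashable`` even if one of its elements turns out not to be.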



.. _deque-objects:

:class:`deque` objects
----------------------


.. class:: deque([iterable])

   Returns a new deque object initialized left-to-right (using :meth:`append`) with
   data from *iterable*. If *iterable* is not specified, the new deque is empty.

   Deques are a generalization of stacks and queues (the name is pronounced "deck"
   and is short for "double-ended queue"). Deques support thread-safe, memory
   efficient appends and pops from either side of the deque with approximately the
   same O(1) performance in either direction.

   Though :class:`list` objects support similar operations, they are optimized for
   fast fixed-length operations and incur O(n) memory movement costs for
   ``pop(0)`` and ``insert(0, v)`` operations which change both the size and
   position of the underlying data representation.


Deque objects support the following methods:

.. method:: deque.append(x)

   Add *x* to the right side of the deque.


.. method:: deque.appendleft(x)

   Add *x* to the left side of the deque.


.. method:: deque.clear()

   Remove all elements from the deque leaving it with length 0.


.. method:: deque.extend(iterable)

   Extend the right side of the deque by appending elements from the iterable
   argument.


.. method:: deque.extendleft(iterable)

   Extend the left side of the deque by appending elements from *iterable*. Note,
   the series of left appends results in reversing the order of elements in the
   iterable argument.


.. method:: deque.pop()

   Remove and return an element from the right side of the deque. If no elements
   are present, raises an :exc:`IndexError`.


.. method:: deque.popleft()

   Remove and return an element from the left side of the deque. If no elements are
   present, raises an :exc:`IndexError`.


.. method:: deque.remove(value)

   Remove the first occurrence of *value*. If not found, raises a
   :exc:`ValueError`.


.. method:: deque.rotate(n)

   Rotate the deque *n* steps to the right. If *n* is negative, rotate to the
   left. Rotating one step to the right is equivalent to:
   ``d.appendleft(d.pop())``.

In addition to the above, deques support iteration, pickling, ``len(d)``,
``reversed(d)``, ``copy.copy(d)``, ``copy.deepcopy(d)``, membership testing with
the :keyword:`in` operator, and subscript references such as ``d[-1]``.

Example::

   >>> from collections import deque
   >>> d = deque('ghi')                 # make a new deque with three items
   >>> for elem in d:                   # iterate over the deque's elements
   ...     print(elem.upper())
   G
   H
   I

   >>> d.append('j')                    # add a new entry to the right side
   >>> d.appendleft('f')                # add a new entry to the left side
   >>> d                                # show the representation of the deque
   deque(['f', 'g', 'h', 'i', 'j'])

   >>> d.pop()                          # return and remove the rightmost item
   'j'
   >>> d.popleft()                      # return and remove the leftmost item
   'f'
   >>> list(d)                          # list the contents of the deque
   ['g', 'h', 'i']
   >>> d[0]                             # peek at leftmost item
   'g'
   >>> d[-1]                            # peek at rightmost item
   'i'

   >>> list(reversed(d))                # list the contents of a deque in reverse
   ['i', 'h', 'g']
   >>> 'h' in d                         # search the deque
   True
   >>> d.extend('jkl')                  # add multiple elements at once
   >>> d
   deque(['g', 'h', 'i', 'j', 'k', 'l'])
   >>> d.rotate(1)                      # right rotation
   >>> d
   deque(['l', 'g', 'h', 'i', 'j', 'k'])
   >>> d.rotate(-1)                     # left rotation
   >>> d
   deque(['g', 'h', 'i', 'j', 'k', 'l'])

   >>> deque(reversed(d))               # make a new deque in reverse order
   deque(['l', 'k', 'j', 'i', 'h', 'g'])
   >>> d.clear()                        # empty the deque
   >>> d.pop()                          # cannot pop from an empty deque
   Traceback (most recent call last):
     File "<pyshell#6>", line 1, in -toplevel-
       d.pop()
   IndexError: pop from an empty deque

   >>> d.extendleft('abc')              # extendleft() reverses the input order
   >>> d
   deque(['c', 'b', 'a'])


.. _deque-recipes:

Recipes
^^^^^^^

This section shows various approaches to working with deques.

The :meth:`rotate` method provides a way to implement :class:`deque` slicing and
deletion. For example, a pure Python implementation of ``del d[n]`` relies on
the :meth:`rotate` method to position elements to be popped::

   def delete_nth(d, n):
       d.rotate(-n)
       d.popleft()
       d.rotate(n)

To implement :class:`deque` slicing, use a similar approach applying
:meth:`rotate` to bring a target element to the left side of the deque. Remove
old entries with :meth:`popleft`, add new entries with :meth:`extend`, and then
reverse the rotation.
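
The slicing approach just described might be sketched as follows. The helper
name ``replace_slice`` and its signature are illustrative assumptions, not
part of the module, and the sketch only handles
``0 <= start <= stop <= len(d)``.

```python
from collections import deque


def replace_slice(d, start, stop, new_items):
    """Emulate ``d[start:stop] = new_items`` on a deque, in place.

    Hypothetical helper; assumes 0 <= start <= stop <= len(d).
    """
    new_items = list(new_items)
    d.rotate(-start)                  # bring d[start] to the left side
    for _ in range(stop - start):
        d.popleft()                   # remove the old entries
    d.extend(new_items)               # add the new entries on the right
    d.rotate(start + len(new_items))  # undo the rotation, allowing for the new entries


d = deque('abcdef')
replace_slice(d, 1, 3, 'XYZ')
print(d)    # deque(['a', 'X', 'Y', 'Z', 'd', 'e', 'f'])
```

The final rotation is larger than the first by ``len(new_items)`` because the
new entries were appended on the right and have to travel back with the
elements that originally preceded the slice.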

With minor variations on that approach, it is easy to implement Forth-style
stack manipulations such as ``dup``, ``drop``, ``swap``, ``over``, ``pick``,
``rot``, and ``roll``.
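
A few of these operations might look like the sketch below. It assumes the
right side of the deque is the top of the stack and that the stack holds
enough items for each call; the function names simply mirror the Forth words
and are not provided by the module.

```python
from collections import deque


# Assumption of this sketch: the right side of the deque is the top of the stack.
def dup(d):
    d.append(d[-1])      # ( a -- a a )


def drop(d):
    d.pop()              # ( a -- )


def over(d):
    d.append(d[-2])      # ( a b -- a b a )


def roll(d, n):
    """Move the n-th item from the top (0 is the top itself) to the top."""
    d.rotate(n)          # rotate the n items above the target out of the way
    item = d.pop()       # the target is now on the right side
    d.rotate(-n)         # bring back the items that were above it
    d.append(item)


def swap(d):
    roll(d, 1)           # ( a b -- b a )


def rot(d):
    roll(d, 2)           # ( a b c -- b c a )


stack = deque([1, 2, 3])
swap(stack)
print(list(stack))       # [1, 3, 2]
rot(stack)
print(list(stack))       # [3, 2, 1]
over(stack)
dup(stack)
drop(stack)
print(list(stack))       # [3, 2, 1, 2]
```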

A round-robin task server can be built from a :class:`deque` using
:meth:`popleft` to select the current task and :meth:`append` to add it back to
the task list if the input stream is not exhausted::

   >>> def roundrobin(*iterables):
   ...     pending = deque(iter(i) for i in iterables)
   ...     while pending:
   ...         task = pending.popleft()
   ...         try:
   ...             yield next(task)
   ...         except StopIteration:
   ...             continue
   ...         pending.append(task)
   ...
   >>> for value in roundrobin('abc', 'd', 'efgh'):
   ...     print(value)

   a
   d
   e
   b
   f
   c
   g
   h


Multi-pass data reduction algorithms can be succinctly expressed and efficiently
coded by extracting elements with multiple calls to :meth:`popleft`, applying
the reduction function, and calling :meth:`append` to add the result back to the
queue.

For example, building a balanced binary tree of nested lists entails reducing
two adjacent nodes into one by grouping them in a list::

   >>> def maketree(iterable):
   ...     d = deque(iterable)
   ...     while len(d) > 1:
   ...         pair = [d.popleft(), d.popleft()]
   ...         d.append(pair)
   ...     return list(d)
   ...
   >>> print(maketree('abcdefgh'))
   [[[['a', 'b'], ['c', 'd']], [['e', 'f'], ['g', 'h']]]]



.. _defaultdict-objects:

:class:`defaultdict` objects
----------------------------


.. class:: defaultdict([default_factory[, ...]])

   Returns a new dictionary-like object. :class:`defaultdict` is a subclass of the
   builtin :class:`dict` class. It overrides one method and adds one writable
   instance variable. The remaining functionality is the same as for the
   :class:`dict` class and is not documented here.

   The first argument provides the initial value for the :attr:`default_factory`
   attribute; it defaults to ``None``. All remaining arguments are treated the same
   as if they were passed to the :class:`dict` constructor, including keyword
   arguments.


:class:`defaultdict` objects support the following method in addition to the
standard :class:`dict` operations:

.. method:: defaultdict.__missing__(key)

   If the :attr:`default_factory` attribute is ``None``, this raises a
   :exc:`KeyError` exception with the *key* as argument.

   If :attr:`default_factory` is not ``None``, it is called without arguments to
   provide a default value for the given *key*; this value is inserted in the
   dictionary for the *key* and returned.

   If calling :attr:`default_factory` raises an exception, that exception is
   propagated unchanged.

   This method is called by the :meth:`__getitem__` method of the :class:`dict`
   class when the requested key is not found; whatever it returns or raises is then
   returned or raised by :meth:`__getitem__`.

:class:`defaultdict` objects support the following instance variable:


.. attribute:: defaultdict.default_factory

   This attribute is used by the :meth:`__missing__` method; it is initialized from
   the first argument to the constructor, if present, or to ``None``, if absent.
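
A short sketch of how :meth:`__missing__` and :attr:`default_factory`
interact; the keys ``'x'`` and ``'y'`` are arbitrary. Only subscript lookup
goes through :meth:`__getitem__`, so ``get()`` and the ``in`` operator do not
create entries.

```python
from collections import defaultdict

d = defaultdict(list)

# get() and the in operator bypass __getitem__, so no entry is created.
print(d.get('x'))        # None
print('x' in d)          # False

# Subscripting a missing key goes through __missing__, which stores and
# returns the value produced by default_factory.
print(d['x'])            # []
print('x' in d)          # True

# default_factory is writable; setting it to None restores plain dict lookups.
d.default_factory = None
try:
    d['y']
except KeyError as exc:
    print('KeyError:', exc)   # KeyError: 'y'
```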


.. _defaultdict-examples:

:class:`defaultdict` Examples
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Using :class:`list` as the :attr:`default_factory`, it is easy to group a
sequence of key-value pairs into a dictionary of lists::

   >>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
   >>> d = defaultdict(list)
   >>> for k, v in s:
   ...     d[k].append(v)
   ...
   >>> sorted(d.items())
   [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

When each key is encountered for the first time, it is not already in the
mapping; so an entry is automatically created using the :attr:`default_factory`
function which returns an empty :class:`list`. The :meth:`list.append`
operation then attaches the value to the new list. When keys are encountered
again, the look-up proceeds normally (returning the list for that key) and the
:meth:`list.append` operation adds another value to the list. This technique is
simpler and faster than an equivalent technique using :meth:`dict.setdefault`::

   >>> d = {}
   >>> for k, v in s:
   ...     d.setdefault(k, []).append(v)
   ...
   >>> sorted(d.items())
   [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

Setting the :attr:`default_factory` to :class:`int` makes the
:class:`defaultdict` useful for counting (like a bag or multiset in other
languages)::

   >>> s = 'mississippi'
   >>> d = defaultdict(int)
   >>> for k in s:
   ...     d[k] += 1
   ...
   >>> sorted(d.items())
   [('i', 4), ('m', 1), ('p', 2), ('s', 4)]

When a letter is first encountered, it is missing from the mapping, so the
:attr:`default_factory` function calls :func:`int` to supply a default count of
zero. The increment operation then builds up the count for each letter.

The function :func:`int`, which always returns zero when called without
arguments, is just a special case of constant functions. A faster and more
flexible way to create constant functions is to use a lambda function which can
supply any constant value (not just zero)::

   >>> def constant_factory(value):
   ...     return lambda: value
   >>> d = defaultdict(constant_factory('<missing>'))
   >>> d.update(name='John', action='ran')
   >>> '%(name)s %(action)s to %(object)s' % d
   'John ran to <missing>'

Setting the :attr:`default_factory` to :class:`set` makes the
:class:`defaultdict` useful for building a dictionary of sets::

   >>> s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)]
   >>> d = defaultdict(set)
   >>> for k, v in s:
   ...     d[k].add(v)
   ...
   >>> sorted(d.items())
   [('blue', {2, 4}), ('red', {1, 3})]


.. _named-tuple-factory:

:func:`NamedTuple` datatype factory function
--------------------------------------------


.. function:: NamedTuple(typename, fieldnames)

   Returns a new tuple subclass named *typename*. The new subclass is used to
   create tuple-like objects that have fields accessible by attribute lookup as
   well as being indexable and iterable. Instances of the subclass also have a
   helpful docstring (with typename and fieldnames) and a helpful :meth:`__repr__`
   method which lists the tuple contents in a ``name=value`` format.

   The *fieldnames* are specified in a single string and are separated by spaces.
   Any valid Python identifier may be used for a field name.

   Example::

      >>> Point = NamedTuple('Point', 'x y')
      >>> Point.__doc__           # docstring for the new datatype
      'Point(x, y)'
      >>> p = Point(11, y=22)     # instantiate with positional or keyword arguments
      >>> p[0] + p[1]             # works just like the tuple (11, 22)
      33
      >>> x, y = p                # unpacks just like a tuple
      >>> x, y
      (11, 22)
      >>> p.x + p.y               # fields also accessible by name
      33
      >>> p                       # readable __repr__ with name=value style
      Point(x=11, y=22)

   The use cases are the same as those for tuples. The named factories assign
   meaning to each tuple position and allow for more readable, self-documenting
   code. Named tuples can also be used to assign field names to tuples returned
   by the :mod:`csv` or :mod:`sqlite3` modules. For example::

      from itertools import starmap
      import csv
      EmployeeRecord = NamedTuple('EmployeeRecord', 'name age title department paygrade')
      # csv.reader expects a text-mode file opened with newline=""
      for record in starmap(EmployeeRecord, csv.reader(open("employees.csv", newline=""))):
          print(record)

   To cast an individual record stored as :class:`list`, :class:`tuple`, or some
   other iterable type, use the star-operator [#]_ to unpack the values::

      >>> Color = NamedTuple('Color', 'name code')
      >>> m = dict(red=1, green=2, blue=3)
      >>> print(Color(*m.popitem()))
      Color(name='blue', code=3)

.. rubric:: Footnotes

.. [#] For information on the star-operator see
   :ref:`tut-unpacking-arguments` and :ref:`calls`.