\section{\module{collections} ---
         High-performance container datatypes}

\declaremodule{standard}{collections}
\modulesynopsis{High-performance datatypes}
\moduleauthor{Raymond Hettinger}{python@rcn.com}
\sectionauthor{Raymond Hettinger}{python@rcn.com}
\versionadded{2.4}


This module implements high-performance container datatypes.  Currently,
there are two datatypes, \class{deque} and \class{defaultdict}, and one
datatype factory function, \function{NamedTuple}.
Future additions may include balanced trees and ordered dictionaries.
\versionchanged[Added defaultdict]{2.5}
\versionchanged[Added NamedTuple]{2.6}

\subsection{\class{deque} objects \label{deque-objects}}

\begin{classdesc}{deque}{\optional{iterable}}
  Returns a new deque object initialized left-to-right (using
  \method{append()}) with data from \var{iterable}.  If \var{iterable}
  is not specified, the new deque is empty.

  Deques are a generalization of stacks and queues (the name is pronounced
  ``deck'' and is short for ``double-ended queue'').  Deques support
  thread-safe, memory-efficient appends and pops from either side of the
  deque with approximately the same \code{O(1)} performance in either
  direction.

  Though \class{list} objects support similar operations, they are optimized
  for fast fixed-length operations and incur \code{O(n)} memory movement costs
  for \samp{pop(0)} and \samp{insert(0, v)} operations, which change both the
  size and position of the underlying data representation.
  \versionadded{2.4}
\end{classdesc}
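As a small illustration of the stack and queue roles described above, the
following sketch uses one \class{deque} both ways.  It assumes a current
Python interpreter, and the job names are made up for the example.

```python
from collections import deque

jobs = deque()
for name in ('first', 'second', 'third'):
    jobs.append(name)                  # add on the right side

# Queue behaviour: take items from the left, oldest first.
fifo_order = [jobs.popleft() for _ in range(len(jobs))]

jobs.extend(('first', 'second', 'third'))

# Stack behaviour: take items from the right, newest first.
lifo_order = [jobs.pop() for _ in range(len(jobs))]
```

Both loops empty the deque; only the side from which items are taken differs.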

Deque objects support the following methods:

\begin{methoddesc}{append}{x}
  Add \var{x} to the right side of the deque.
\end{methoddesc}

\begin{methoddesc}{appendleft}{x}
  Add \var{x} to the left side of the deque.
\end{methoddesc}

\begin{methoddesc}{clear}{}
  Remove all elements from the deque, leaving it with length 0.
\end{methoddesc}

\begin{methoddesc}{extend}{iterable}
  Extend the right side of the deque by appending elements from
  the iterable argument.
\end{methoddesc}

\begin{methoddesc}{extendleft}{iterable}
  Extend the left side of the deque by appending elements from
  \var{iterable}.  Note that the series of left appends results in
  reversing the order of elements in the iterable argument.
\end{methoddesc}

\begin{methoddesc}{pop}{}
  Remove and return an element from the right side of the deque.
  If no elements are present, raises an \exception{IndexError}.
\end{methoddesc}

\begin{methoddesc}{popleft}{}
  Remove and return an element from the left side of the deque.
  If no elements are present, raises an \exception{IndexError}.
\end{methoddesc}

\begin{methoddesc}{remove}{value}
  Remove the first occurrence of \var{value}.  If not found,
  raises a \exception{ValueError}.
  \versionadded{2.5}
\end{methoddesc}

\begin{methoddesc}{rotate}{n}
  Rotate the deque \var{n} steps to the right.  If \var{n} is
  negative, rotate to the left.  Rotating one step to the right
  is equivalent to:  \samp{d.appendleft(d.pop())}.
\end{methoddesc}
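The equivalence stated for \method{rotate()} can be checked with a short
sketch.  It assumes a current Python interpreter; the variable names are
illustrative only.

```python
from collections import deque

d = deque('abcde')
e = deque('abcde')

d.rotate(1)                  # one step to the right
e.appendleft(e.pop())        # the equivalent spelled out by hand

left = deque('abcde')
left.rotate(-2)              # a negative count rotates to the left
```

After these calls \code{d} and \code{e} hold the same elements in the same
order, and \code{left} starts with the element that was originally third.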

In addition to the above, deques support iteration, pickling, \samp{len(d)},
\samp{reversed(d)}, \samp{copy.copy(d)}, \samp{copy.deepcopy(d)},
membership testing with the \keyword{in} operator, and subscript references
such as \samp{d[-1]}.

Example:

\begin{verbatim}
>>> from collections import deque
>>> d = deque('ghi')                 # make a new deque with three items
>>> for elem in d:                   # iterate over the deque's elements
...     print elem.upper()
G
H
I

>>> d.append('j')                    # add a new entry to the right side
>>> d.appendleft('f')                # add a new entry to the left side
>>> d                                # show the representation of the deque
deque(['f', 'g', 'h', 'i', 'j'])

>>> d.pop()                          # return and remove the rightmost item
'j'
>>> d.popleft()                      # return and remove the leftmost item
'f'
>>> list(d)                          # list the contents of the deque
['g', 'h', 'i']
>>> d[0]                             # peek at leftmost item
'g'
>>> d[-1]                            # peek at rightmost item
'i'

>>> list(reversed(d))                # list the contents of a deque in reverse
['i', 'h', 'g']
>>> 'h' in d                         # search the deque
True
>>> d.extend('jkl')                  # add multiple elements at once
>>> d
deque(['g', 'h', 'i', 'j', 'k', 'l'])
>>> d.rotate(1)                      # right rotation
>>> d
deque(['l', 'g', 'h', 'i', 'j', 'k'])
>>> d.rotate(-1)                     # left rotation
>>> d
deque(['g', 'h', 'i', 'j', 'k', 'l'])

>>> deque(reversed(d))               # make a new deque in reverse order
deque(['l', 'k', 'j', 'i', 'h', 'g'])
>>> d.clear()                        # empty the deque
>>> d.pop()                          # cannot pop from an empty deque
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in -toplevel-
    d.pop()
IndexError: pop from an empty deque

>>> d.extendleft('abc')              # extendleft() reverses the input order
>>> d
deque(['c', 'b', 'a'])
\end{verbatim}
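The session above does not exercise the copying and pickling support listed
earlier.  The following sketch, which assumes a current Python interpreter
and uses made-up data, shows how the three forms of duplication differ when
the deque holds mutable items.

```python
import copy
import pickle
from collections import deque

d = deque([[1, 2], [3, 4]])

shallow = copy.copy(d)                     # new deque, same inner lists
deep = copy.deepcopy(d)                    # new deque, new inner lists
restored = pickle.loads(pickle.dumps(d))   # round trip through pickle

d[0].append(99)                            # mutate an inner list in place
```

Only \code{shallow} sees the appended value, because it shares the inner
lists with \code{d}; the deep copy and the unpickled deque are independent.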

\subsubsection{Recipes \label{deque-recipes}}

This section shows various approaches to working with deques.

The \method{rotate()} method provides a way to implement \class{deque}
slicing and deletion.  For example, a pure Python implementation of
\code{del d[n]} relies on the \method{rotate()} method to position
elements to be popped:

\begin{verbatim}
def delete_nth(d, n):
    d.rotate(-n)
    d.popleft()
    d.rotate(n)
\end{verbatim}

To implement \class{deque} slicing, use a similar approach applying
\method{rotate()} to bring a target element to the left side of the deque.
Remove old entries with \method{popleft()}, add new entries with
\method{extend()}, and then reverse the rotation.
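One possible reading of these steps is sketched below.  The helper name
\code{replace_slice} and its argument checks are assumptions made for
illustration, not part of the module; the sketch also assumes
\code{0 <= start <= stop <= len(d)} and a current Python interpreter.

```python
from collections import deque

def replace_slice(d, start, stop, new_items):
    """Replace d[start:stop] in place (hypothetical helper)."""
    new_items = list(new_items)         # the length is needed for the last step
    d.rotate(-start)                    # bring d[start] to the left side
    for _ in range(stop - start):
        d.popleft()                     # remove the old entries
    d.extend(new_items)                 # add the new entries on the right
    d.rotate(start + len(new_items))    # reverse the rotation, allowing
                                        # for the entries just added

d = deque('abcdefg')
replace_slice(d, 2, 4, 'XYZ')           # like d[2:4] = 'XYZ' on a list
```

The final rotation is larger than the first one by the number of new entries,
so that the untouched head of the deque returns to the left side.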

With minor variations on that approach, it is easy to implement Forth-style
stack manipulations such as \code{dup}, \code{drop}, \code{swap}, \code{over},
\code{pick}, \code{rot}, and \code{roll}.
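A minimal sketch of these stack words follows.  It treats the right side of
the deque as the top of the stack and counts \code{pick} and \code{roll}
depths from zero at the top; both conventions, and the function names, are
assumptions of this example rather than anything defined by the module.

```python
from collections import deque

def dup(s):                 # ( a -- a a )
    s.append(s[-1])

def drop(s):                # ( a -- )
    s.pop()

def pick(s, n):             # copy the item n places below the top
    s.append(s[-1 - n])

def over(s):                # ( a b -- a b a )
    pick(s, 1)

def roll(s, n):             # move the item n places below the top to the top
    s.rotate(n)             # the wanted item is now rightmost
    item = s.pop()
    s.rotate(-n)            # restore the order of the remaining items
    s.append(item)

def swap(s):                # ( a b -- b a )
    roll(s, 1)

def rot(s):                 # ( a b c -- b c a )
    roll(s, 2)

stack = deque([1, 2, 3])
rot(stack)                  # 2 3 1
swap(stack)                 # 2 1 3
over(stack)                 # 2 1 3 1
dup(stack)                  # 2 1 3 1 1
drop(stack)                 # 2 1 3 1
```

Expressing \code{swap} and \code{rot} through \code{roll} keeps the
\method{rotate()} logic in one place.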

A round-robin task server can be built from a \class{deque} using
\method{popleft()} to select the current task and \method{append()}
to add it back to the task list if the input stream is not exhausted:

\begin{verbatim}
def roundrobin(*iterables):
    pending = deque(iter(i) for i in iterables)
    while pending:
        task = pending.popleft()
        try:
            yield next(task)
        except StopIteration:
            continue
        pending.append(task)

>>> for value in roundrobin('abc', 'd', 'efgh'):
...     print value

a
d
e
b
f
c
g
h

\end{verbatim}


Multi-pass data reduction algorithms can be succinctly expressed and
efficiently coded by extracting elements with multiple calls to
\method{popleft()}, applying the reduction function, and calling
\method{append()} to add the result back to the queue.

For example, building a balanced binary tree of nested lists entails
reducing two adjacent nodes into one by grouping them in a list:

\begin{verbatim}
def maketree(iterable):
    d = deque(iterable)
    while len(d) > 1:
        pair = [d.popleft(), d.popleft()]
        d.append(pair)
    return list(d)

>>> print maketree('abcdefgh')
[[[['a', 'b'], ['c', 'd']], [['e', 'f'], ['g', 'h']]]]

\end{verbatim}



\subsection{\class{defaultdict} objects \label{defaultdict-objects}}

\begin{classdesc}{defaultdict}{\optional{default_factory\optional{, ...}}}
  Returns a new dictionary-like object.  \class{defaultdict} is a subclass
  of the builtin \class{dict} class.  It overrides one method and adds one
  writable instance variable.  The remaining functionality is the same as
  for the \class{dict} class and is not documented here.

  The first argument provides the initial value for the
  \member{default_factory} attribute; it defaults to \code{None}.
  All remaining arguments are treated the same as if they were
  passed to the \class{dict} constructor, including keyword arguments.

 \versionadded{2.5}
\end{classdesc}

\class{defaultdict} objects support the following method in addition to
the standard \class{dict} operations:

\begin{methoddesc}{__missing__}{key}
  If the \member{default_factory} attribute is \code{None}, this raises
  a \exception{KeyError} exception with the \var{key} as argument.

  If \member{default_factory} is not \code{None}, it is called without
  arguments to provide a default value for the given \var{key}; this
  value is inserted in the dictionary for the \var{key} and returned.

  If calling \member{default_factory} raises an exception, this exception
  is propagated unchanged.

  This method is called by the \method{__getitem__} method of the
  \class{dict} class when the requested key is not found; whatever it
  returns or raises is then returned or raised by \method{__getitem__}.
\end{methoddesc}
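The cases described for \method{__missing__} can be seen side by side in the
following sketch.  It assumes a current Python interpreter; the factory
\code{failing_factory} and the keys are invented for the example.  Because
only \method{__getitem__} calls \method{__missing__}, a lookup through
\method{get()} leaves the dictionary unchanged.

```python
from collections import defaultdict

d = defaultdict(list)
d['a'].append(1)             # missing key: list() is called and its result stored

looked_up = d.get('b')       # get() bypasses __getitem__, so nothing is inserted
has_b = 'b' in d

plain = defaultdict()        # default_factory is None
try:
    plain['x']
except KeyError as exc:
    missing_key = exc.args[0]     # the key is passed as the exception argument

def failing_factory():
    # Hypothetical factory that always fails.
    raise RuntimeError('factory failed')

broken = defaultdict(failing_factory)
try:
    broken['k']
except RuntimeError as exc:
    propagated = str(exc)         # the factory's exception arrives unchanged
```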

\class{defaultdict} objects support the following instance variable:

\begin{memberdesc}{default_factory}
  This attribute is used by the \method{__missing__} method; it is initialized
  from the first argument to the constructor, if present, or to \code{None},
  if absent.
\end{memberdesc}
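Since the attribute is writable, it can be changed after construction.  The
sketch below, which assumes a current Python interpreter and uses made-up
data, builds a grouping and then sets \member{default_factory} to
\code{None} so that later lookups of unknown keys raise
\exception{KeyError} instead of creating entries.

```python
from collections import defaultdict

groups = defaultdict(list)
for word in ['apple', 'avocado', 'banana']:
    groups[word[0]].append(word)      # group words by first letter

groups.default_factory = None         # switch off automatic entry creation

try:
    groups['c']
except KeyError:
    created = False                   # unknown key is now an error
else:
    created = True
```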


\subsubsection{\class{defaultdict} Examples \label{defaultdict-examples}}

Using \class{list} as the \member{default_factory}, it is easy to group
a sequence of key-value pairs into a dictionary of lists:

\begin{verbatim}
>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list)
>>> for k, v in s:
...     d[k].append(v)

>>> d.items()
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
\end{verbatim}

When each key is encountered for the first time, it is not already in the
mapping; so an entry is automatically created using the
\member{default_factory} function which returns an empty \class{list}.  The
\method{list.append()} operation then attaches the value to the new list.  When
keys are encountered again, the look-up proceeds normally (returning the list
for that key) and the \method{list.append()} operation adds another value to
the list.  This technique is simpler and faster than an equivalent technique
using \method{dict.setdefault()}:

\begin{verbatim}
>>> d = {}
>>> for k, v in s:
...     d.setdefault(k, []).append(v)

>>> d.items()
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
\end{verbatim}

Setting the \member{default_factory} to \class{int} makes the
\class{defaultdict} useful for counting (like a bag or multiset in other
languages):

\begin{verbatim}
>>> s = 'mississippi'
>>> d = defaultdict(int)
>>> for k in s:
...     d[k] += 1

>>> d.items()
[('i', 4), ('p', 2), ('s', 4), ('m', 1)]
\end{verbatim}

When a letter is first encountered, it is missing from the mapping, so the
\member{default_factory} function, \function{int()}, is called to supply a
default count of zero.  The increment operation then builds up the count for
each letter.

The function \function{int()}, which always returns zero, is just a special
case of a constant function.  A faster and more flexible way to create
constant functions is to use a lambda function which can supply
any constant value (not just zero):

\begin{verbatim}
>>> def constant_factory(value):
...     return lambda: value
>>> d = defaultdict(constant_factory('<missing>'))
>>> d.update(name='John', action='ran')
>>> '%(name)s %(action)s to %(object)s' % d
'John ran to <missing>'
\end{verbatim}

Setting the \member{default_factory} to \class{set} makes the
\class{defaultdict} useful for building a dictionary of sets:

\begin{verbatim}
>>> s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)]
>>> d = defaultdict(set)
>>> for k, v in s:
...     d[k].add(v)

>>> d.items()
[('blue', set([2, 4])), ('red', set([1, 3]))]
\end{verbatim}



\subsection{\function{NamedTuple} datatype factory function \label{named-tuple-factory}}

\begin{funcdesc}{NamedTuple}{typename, fieldnames}
  Returns a new tuple subclass named \var{typename}.  The new subclass is used
  to create tuple-like objects that have fields accessible by attribute
  lookup as well as being indexable and iterable.  Instances of the subclass
  also have a helpful docstring (with typename and fieldnames) and a helpful
  \method{__repr__()} method which lists the tuple contents in a \code{name=value}
  format.
  \versionadded{2.6}

  The \var{fieldnames} are specified in a single string and are separated by spaces.
  Any valid Python identifier may be used for a field name.

  Example:
  \begin{verbatim}
>>> Point = NamedTuple('Point', 'x y')
>>> Point.__doc__           # docstring for the new datatype
'Point(x, y)'
>>> p = Point(11, y=22)     # instantiate with positional or keyword arguments
>>> p[0] + p[1]             # works just like the tuple (11, 22)
33
>>> x, y = p                # unpacks just like a tuple
>>> x, y
(11, 22)
>>> p.x + p.y               # fields also accessible by name
33
>>> p                       # readable __repr__ with name=value style
Point(x=11, y=22)
\end{verbatim}

  The use cases are the same as those for tuples.  The named factories
  assign meaning to each tuple position and allow for more readable,
  self-documenting code.  Named tuples can also be used to assign field names
  to tuples returned by the \module{csv} or \module{sqlite3} modules.
  For example:

  \begin{verbatim}
from itertools import starmap
import csv
EmployeeRecord = NamedTuple('EmployeeRecord', 'name age title department paygrade')
for record in starmap(EmployeeRecord, csv.reader(open("employees.csv", "rb"))):
    print record
\end{verbatim}

  To cast an individual record stored as \class{list}, \class{tuple}, or some other
  iterable type, use the star-operator to unpack the values:

  \begin{verbatim}
>>> Color = NamedTuple('Color', 'name code')
>>> m = dict(red=1, green=2, blue=3)
>>> print Color(*m.popitem())
Color(name='blue', code=3)
\end{verbatim}

\end{funcdesc}
402\end{funcdesc}