blob: 69701392b50b9a4333388cdfc66adbb0cf068316 [file] [log] [blame]
Raymond Hettinger584cb192002-08-23 15:18:38 +00001\section{\module{sets} ---
2 Unordered collections of unique elements}
3
4\declaremodule{standard}{sets}
5\modulesynopsis{Implementation of sets of unique elements.}
6\moduleauthor{Greg V. Wilson}{gvwilson@nevex.com}
7\moduleauthor{Alex Martelli}{aleax@aleax.it}
8\moduleauthor{Guido van Rossum}{guido@python.org}
9\sectionauthor{Raymond D. Hettinger}{python@rcn.com}
10
11\versionadded{2.3}
Georg Brandl7ded34d2007-01-13 12:31:51 +000012\deprecated{2.6}{The built-in \code{set}/\code{frozenset} types replace this
Brett Cannon093b6702007-01-13 00:29:49 +000013module.}
Raymond Hettinger584cb192002-08-23 15:18:38 +000014
15The \module{sets} module provides classes for constructing and manipulating
16unordered collections of unique elements. Common uses include membership
17testing, removing duplicates from a sequence, and computing standard math
18operations on sets such as intersection, union, difference, and symmetric
19difference.
20
Fred Draked10c6c92002-08-23 17:22:36 +000021Like other collections, sets support \code{\var{x} in \var{set}},
22\code{len(\var{set})}, and \code{for \var{x} in \var{set}}. Being an
23unordered collection, sets do not record element position or order of
24insertion. Accordingly, sets do not support indexing, slicing, or
25other sequence-like behavior.
Raymond Hettinger584cb192002-08-23 15:18:38 +000026
27Most set applications use the \class{Set} class which provides every set
28method except for \method{__hash__()}. For advanced applications requiring
29a hash method, the \class{ImmutableSet} class adds a \method{__hash__()}
30method but omits methods which alter the contents of the set. Both
31\class{Set} and \class{ImmutableSet} derive from \class{BaseSet}, an
32abstract class useful for determining whether something is a set:
Fred Draked10c6c92002-08-23 17:22:36 +000033\code{isinstance(\var{obj}, BaseSet)}.
Raymond Hettinger584cb192002-08-23 15:18:38 +000034
Raymond Hettingere4905022005-04-10 17:32:35 +000035The set classes are implemented using dictionaries. Accordingly, the
36requirements for set elements are the same as those for dictionary keys;
37namely, that the element defines both \method{__eq__} and \method{__hash__}.
38As a result, sets
Fred Draked10c6c92002-08-23 17:22:36 +000039cannot contain mutable elements such as lists or dictionaries.
40However, they can contain immutable collections such as tuples or
Raymond Hettingerfa8dd5f2002-08-23 18:10:54 +000041instances of \class{ImmutableSet}. For convenience in implementing
Fred Draked10c6c92002-08-23 17:22:36 +000042sets of sets, inner sets are automatically converted to immutable
43form, for example, \code{Set([Set(['dog'])])} is transformed to
Raymond Hettinger584cb192002-08-23 15:18:38 +000044\code{Set([ImmutableSet(['dog'])])}.
45
46\begin{classdesc}{Set}{\optional{iterable}}
47Constructs a new empty \class{Set} object. If the optional \var{iterable}
48parameter is supplied, updates the set with elements obtained from iteration.
49All of the elements in \var{iterable} should be immutable or be transformable
Fred Draked10c6c92002-08-23 17:22:36 +000050to an immutable using the protocol described in
51section~\ref{immutable-transforms}.
Raymond Hettinger584cb192002-08-23 15:18:38 +000052\end{classdesc}
53
54\begin{classdesc}{ImmutableSet}{\optional{iterable}}
55Constructs a new empty \class{ImmutableSet} object. If the optional
56\var{iterable} parameter is supplied, updates the set with elements obtained
57from iteration. All of the elements in \var{iterable} should be immutable or
Fred Draked10c6c92002-08-23 17:22:36 +000058be transformable to an immutable using the protocol described in
59section~\ref{immutable-transforms}.
Raymond Hettinger584cb192002-08-23 15:18:38 +000060
61Because \class{ImmutableSet} objects provide a \method{__hash__()} method,
62they can be used as set elements or as dictionary keys. \class{ImmutableSet}
63objects do not have methods for adding or removing elements, so all of the
64elements must be known when the constructor is called.
65\end{classdesc}
66
67
Fred Drake2e3ae212003-01-06 15:50:32 +000068\subsection{Set Objects \label{set-objects}}
Raymond Hettinger584cb192002-08-23 15:18:38 +000069
70Instances of \class{Set} and \class{ImmutableSet} both provide
71the following operations:
72
Raymond Hettinger7ceb29e2003-08-16 00:56:40 +000073\begin{tableiii}{c|c|l}{code}{Operation}{Equivalent}{Result}
74 \lineiii{len(\var{s})}{}{cardinality of set \var{s}}
Raymond Hettinger584cb192002-08-23 15:18:38 +000075
76 \hline
Raymond Hettinger7ceb29e2003-08-16 00:56:40 +000077 \lineiii{\var{x} in \var{s}}{}
Fred Draked10c6c92002-08-23 17:22:36 +000078 {test \var{x} for membership in \var{s}}
Raymond Hettinger7ceb29e2003-08-16 00:56:40 +000079 \lineiii{\var{x} not in \var{s}}{}
Fred Draked10c6c92002-08-23 17:22:36 +000080 {test \var{x} for non-membership in \var{s}}
Raymond Hettinger7ceb29e2003-08-16 00:56:40 +000081 \lineiii{\var{s}.issubset(\var{t})}{\code{\var{s} <= \var{t}}}
82 {test whether every element in \var{s} is in \var{t}}
83 \lineiii{\var{s}.issuperset(\var{t})}{\code{\var{s} >= \var{t}}}
84 {test whether every element in \var{t} is in \var{s}}
Raymond Hettinger584cb192002-08-23 15:18:38 +000085
86 \hline
Fred Drake447083e2005-01-19 07:24:34 +000087 \lineiii{\var{s}.union(\var{t})}{\var{s} \textbar{} \var{t}}
Fred Draked10c6c92002-08-23 17:22:36 +000088 {new set with elements from both \var{s} and \var{t}}
Raymond Hettinger7ceb29e2003-08-16 00:56:40 +000089 \lineiii{\var{s}.intersection(\var{t})}{\var{s} \&\ \var{t}}
Fred Draked10c6c92002-08-23 17:22:36 +000090 {new set with elements common to \var{s} and \var{t}}
Raymond Hettinger7ceb29e2003-08-16 00:56:40 +000091 \lineiii{\var{s}.difference(\var{t})}{\var{s} - \var{t}}
Fred Draked10c6c92002-08-23 17:22:36 +000092 {new set with elements in \var{s} but not in \var{t}}
Raymond Hettinger7ceb29e2003-08-16 00:56:40 +000093 \lineiii{\var{s}.symmetric_difference(\var{t})}{\var{s} \^\ \var{t}}
Fred Draked10c6c92002-08-23 17:22:36 +000094 {new set with elements in either \var{s} or \var{t} but not both}
Raymond Hettinger7ceb29e2003-08-16 00:56:40 +000095 \lineiii{\var{s}.copy()}{}
Fred Draked10c6c92002-08-23 17:22:36 +000096 {new set with a shallow copy of \var{s}}
Raymond Hettinger7ceb29e2003-08-16 00:56:40 +000097\end{tableiii}
Raymond Hettinger584cb192002-08-23 15:18:38 +000098
Raymond Hettingerd4462302003-11-26 17:52:45 +000099Note, the non-operator versions of \method{union()},
Raymond Hettinger6a180122003-08-17 08:34:09 +0000100\method{intersection()}, \method{difference()}, and
101\method{symmetric_difference()} will accept any iterable as an argument.
102In contrast, their operator based counterparts require their arguments to
103be sets. This precludes error-prone constructions like
104\code{Set('abc') \&\ 'cbs'} in favor of the more readable
105\code{Set('abc').intersection('cbs')}.
106\versionchanged[Formerly all arguments were required to be sets]{2.3.1}
107
Tim Petersea76c982002-08-25 18:43:10 +0000108In addition, both \class{Set} and \class{ImmutableSet}
109support set to set comparisons. Two sets are equal if and only if
110every element of each set is contained in the other (each is a subset
111of the other).
112A set is less than another set if and only if the first set is a proper
113subset of the second set (is a subset, but is not equal).
114A set is greater than another set if and only if the first set is a proper
115superset of the second set (is a superset, but is not equal).
Raymond Hettinger584cb192002-08-23 15:18:38 +0000116
Raymond Hettinger3801ec72003-01-15 15:46:05 +0000117The subset and equality comparisons do not generalize to a complete
118ordering function. For example, any two disjoint sets are not equal and
Raymond Hettinger7ceb29e2003-08-16 00:56:40 +0000119are not subsets of each other, so \emph{all} of the following return
120\code{False}: \code{\var{a}<\var{b}}, \code{\var{a}==\var{b}}, or
121\code{\var{a}>\var{b}}.
Raymond Hettinger3801ec72003-01-15 15:46:05 +0000122Accordingly, sets do not implement the \method{__cmp__} method.
123
124Since sets only define partial ordering (subset relationships), the output
125of the \method{list.sort()} method is undefined for lists of sets.
126
Raymond Hettinger584cb192002-08-23 15:18:38 +0000127The following table lists operations available in \class{ImmutableSet}
128but not found in \class{Set}:
129
Raymond Hettinger7ceb29e2003-08-16 00:56:40 +0000130\begin{tableii}{c|l}{code}{Operation}{Result}
Fred Draked10c6c92002-08-23 17:22:36 +0000131 \lineii{hash(\var{s})}{returns a hash value for \var{s}}
132\end{tableii}
Raymond Hettinger584cb192002-08-23 15:18:38 +0000133
134The following table lists operations available in \class{Set}
135but not found in \class{ImmutableSet}:
136
Raymond Hettinger7ceb29e2003-08-16 00:56:40 +0000137\begin{tableiii}{c|c|l}{code}{Operation}{Equivalent}{Result}
Raymond Hettinger16ffbc32005-07-01 23:00:13 +0000138 \lineiii{\var{s}.update(\var{t})}
Fred Drake447083e2005-01-19 07:24:34 +0000139 {\var{s} \textbar= \var{t}}
Fred Draked10c6c92002-08-23 17:22:36 +0000140 {return set \var{s} with elements added from \var{t}}
Raymond Hettinger7ceb29e2003-08-16 00:56:40 +0000141 \lineiii{\var{s}.intersection_update(\var{t})}
142 {\var{s} \&= \var{t}}
Fred Draked10c6c92002-08-23 17:22:36 +0000143 {return set \var{s} keeping only elements also found in \var{t}}
Raymond Hettinger7ceb29e2003-08-16 00:56:40 +0000144 \lineiii{\var{s}.difference_update(\var{t})}
145 {\var{s} -= \var{t}}
Fred Draked10c6c92002-08-23 17:22:36 +0000146 {return set \var{s} after removing elements found in \var{t}}
Raymond Hettinger7ceb29e2003-08-16 00:56:40 +0000147 \lineiii{\var{s}.symmetric_difference_update(\var{t})}
148 {\var{s} \textasciicircum= \var{t}}
Fred Draked10c6c92002-08-23 17:22:36 +0000149 {return set \var{s} with elements from \var{s} or \var{t}
150 but not both}
Raymond Hettinger584cb192002-08-23 15:18:38 +0000151
152 \hline
Raymond Hettinger7ceb29e2003-08-16 00:56:40 +0000153 \lineiii{\var{s}.add(\var{x})}{}
Fred Drake2e3ae212003-01-06 15:50:32 +0000154 {add element \var{x} to set \var{s}}
Raymond Hettinger7ceb29e2003-08-16 00:56:40 +0000155 \lineiii{\var{s}.remove(\var{x})}{}
Georg Brandldb815ab2006-03-17 16:26:31 +0000156 {remove \var{x} from set \var{s}; raises \exception{KeyError}
157 if not present}
Raymond Hettinger7ceb29e2003-08-16 00:56:40 +0000158 \lineiii{\var{s}.discard(\var{x})}{}
Fred Drake2e3ae212003-01-06 15:50:32 +0000159 {removes \var{x} from set \var{s} if present}
Raymond Hettinger7ceb29e2003-08-16 00:56:40 +0000160 \lineiii{\var{s}.pop()}{}
161 {remove and return an arbitrary element from \var{s}; raises
Georg Brandldb815ab2006-03-17 16:26:31 +0000162 \exception{KeyError} if empty}
Raymond Hettinger7ceb29e2003-08-16 00:56:40 +0000163 \lineiii{\var{s}.clear()}{}
Fred Drake2e3ae212003-01-06 15:50:32 +0000164 {remove all elements from set \var{s}}
Raymond Hettinger7ceb29e2003-08-16 00:56:40 +0000165\end{tableiii}
Raymond Hettinger584cb192002-08-23 15:18:38 +0000166
Raymond Hettinger16ffbc32005-07-01 23:00:13 +0000167Note, the non-operator versions of \method{update()},
Raymond Hettinger6a180122003-08-17 08:34:09 +0000168\method{intersection_update()}, \method{difference_update()}, and
169\method{symmetric_difference_update()} will accept any iterable as
170an argument.
171\versionchanged[Formerly all arguments were required to be sets]{2.3.1}
172
Raymond Hettinger16ffbc32005-07-01 23:00:13 +0000173Also note, the module also includes a \method{union_update()} method
174which is an alias for \method{update()}. The method is included for
Georg Brandl08c02db2005-07-22 18:39:19 +0000175backwards compatibility. Programmers should prefer the
Neal Norwitz26f4c232005-11-03 04:39:09 +0000176\method{update()} method because it is supported by the builtin
Raymond Hettinger16ffbc32005-07-01 23:00:13 +0000177\class{set()} and \class{frozenset()} types.
Raymond Hettinger584cb192002-08-23 15:18:38 +0000178
Fred Drake2e3ae212003-01-06 15:50:32 +0000179\subsection{Example \label{set-example}}
Raymond Hettinger584cb192002-08-23 15:18:38 +0000180
181\begin{verbatim}
182>>> from sets import Set
183>>> engineers = Set(['John', 'Jane', 'Jack', 'Janice'])
184>>> programmers = Set(['Jack', 'Sam', 'Susan', 'Janice'])
Raymond Hettinger7ceb29e2003-08-16 00:56:40 +0000185>>> managers = Set(['Jane', 'Jack', 'Susan', 'Zack'])
186>>> employees = engineers | programmers | managers # union
187>>> engineering_management = engineers & managers # intersection
188>>> fulltime_management = managers - engineers - programmers # difference
189>>> engineers.add('Marvin') # add element
Raymond Hettinger584cb192002-08-23 15:18:38 +0000190>>> print engineers
191Set(['Jane', 'Marvin', 'Janice', 'John', 'Jack'])
192>>> employees.issuperset(engineers) # superset test
193False
Raymond Hettinger6a180122003-08-17 08:34:09 +0000194>>> employees.union_update(engineers) # update from another set
Raymond Hettinger584cb192002-08-23 15:18:38 +0000195>>> employees.issuperset(engineers)
196True
Raymond Hettingerd4568492003-11-16 13:44:19 +0000197>>> for group in [engineers, programmers, managers, employees]:
Raymond Hettinger3801ec72003-01-15 15:46:05 +0000198... group.discard('Susan') # unconditionally remove element
199... print group
200...
Raymond Hettinger584cb192002-08-23 15:18:38 +0000201Set(['Jane', 'Marvin', 'Janice', 'John', 'Jack'])
202Set(['Janice', 'Jack', 'Sam'])
203Set(['Jane', 'Zack', 'Jack'])
204Set(['Jack', 'Sam', 'Jane', 'Marvin', 'Janice', 'John', 'Zack'])
205\end{verbatim}
206
207
208\subsection{Protocol for automatic conversion to immutable
209 \label{immutable-transforms}}
210
211Sets can only contain immutable elements. For convenience, mutable
212\class{Set} objects are automatically copied to an \class{ImmutableSet}
213before being added as a set element.
214
Fred Draked10c6c92002-08-23 17:22:36 +0000215The mechanism is to always add a hashable element, or if it is not
216hashable, the element is checked to see if it has an
Raymond Hettinger2835e372003-02-14 03:42:11 +0000217\method{__as_immutable__()} method which returns an immutable equivalent.
Raymond Hettinger584cb192002-08-23 15:18:38 +0000218
Raymond Hettinger2835e372003-02-14 03:42:11 +0000219Since \class{Set} objects have a \method{__as_immutable__()} method
Raymond Hettinger584cb192002-08-23 15:18:38 +0000220returning an instance of \class{ImmutableSet}, it is possible to
221construct sets of sets.
222
223A similar mechanism is needed by the \method{__contains__()} and
224\method{remove()} methods which need to hash an element to check
225for membership in a set. Those methods check an element for hashability
Raymond Hettinger2835e372003-02-14 03:42:11 +0000226and, if not, check for a \method{__as_temporarily_immutable__()} method
Raymond Hettinger584cb192002-08-23 15:18:38 +0000227which returns the element wrapped by a class that provides temporary
228methods for \method{__hash__()}, \method{__eq__()}, and \method{__ne__()}.
229
230The alternate mechanism spares the need to build a separate copy of
231the original mutable object.
232
Raymond Hettinger2835e372003-02-14 03:42:11 +0000233\class{Set} objects implement the \method{__as_temporarily_immutable__()}
Fred Draked10c6c92002-08-23 17:22:36 +0000234method which returns the \class{Set} object wrapped by a new class
Raymond Hettinger584cb192002-08-23 15:18:38 +0000235\class{_TemporarilyImmutableSet}.
236
237The two mechanisms for adding hashability are normally invisible to the
238user; however, a conflict can arise in a multi-threaded environment
Raymond Hettingerfa8dd5f2002-08-23 18:10:54 +0000239where one thread is updating a set while another has temporarily wrapped it
Raymond Hettinger584cb192002-08-23 15:18:38 +0000240in \class{_TemporarilyImmutableSet}. In other words, sets of mutable sets
241are not thread-safe.
Raymond Hettinger16ffbc32005-07-01 23:00:13 +0000242
243
244\subsection{Comparison to the built-in \class{set} types
245 \label{comparison-to-builtin-set}}
246
247The built-in \class{set} and \class{frozenset} types were designed based
248on lessons learned from the \module{sets} module. The key differences are:
249
250\begin{itemize}
251\item \class{Set} and \class{ImmutableSet} were renamed to \class{set} and
252 \class{frozenset}.
253\item There is no equivalent to \class{BaseSet}. Instead, use
254 \code{isinstance(x, (set, frozenset))}.
255\item The hash algorithm for the built-ins performs significantly better
256 (fewer collisions) for most datasets.
257\item The built-in versions have more space efficient pickles.
258\item The built-in versions do not have a \method{union_update()} method.
259 Instead, use the \method{update()} method which is equivalent.
Georg Brandl08c02db2005-07-22 18:39:19 +0000260\item The built-in versions do not have a \method{_repr(sorted=True)} method.
Raymond Hettinger16ffbc32005-07-01 23:00:13 +0000261 Instead, use the built-in \function{repr()} and \function{sorted()}
262 functions: \code{repr(sorted(s))}.
263\item The built-in version does not have a protocol for automatic conversion
264 to immutable. Many found this feature to be confusing and no one
265 in the community reported having found real uses for it.
266\end{itemize}