\section{\module{sets} ---
         Unordered collections of unique elements}

\declaremodule{standard}{sets}
\modulesynopsis{Implementation of sets of unique elements.}
\moduleauthor{Greg V. Wilson}{gvwilson@nevex.com}
\moduleauthor{Alex Martelli}{aleax@aleax.it}
\moduleauthor{Guido van Rossum}{guido@python.org}
\sectionauthor{Raymond D. Hettinger}{python@rcn.com}

\versionadded{2.3}

The \module{sets} module provides classes for constructing and manipulating
unordered collections of unique elements.  Common uses include membership
testing, removing duplicates from a sequence, and computing standard math
operations on sets such as intersection, union, difference, and symmetric
difference.

Like other collections, sets support \code{\var{x} in \var{set}},
\code{len(\var{set})}, and \code{for \var{x} in \var{set}}.  Being
unordered collections, sets do not record element position or order of
insertion.  Accordingly, sets do not support indexing, slicing, or
other sequence-like behavior.
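
As a brief illustration of these operations, the sketch below removes
duplicates from a list and then tests membership, size, and iteration.  The
names \code{words} and \code{unique} are made up for the example, and the
fallback to the built-in \class{set} type is an assumption added only so
that the snippet also runs on interpreters that do not ship the
\module{sets} module.

\begin{verbatim}
try:
    from sets import Set            # the module described here
except ImportError:
    Set = set                       # assumption: built-in stand-in

words = ['spam', 'eggs', 'spam', 'ham', 'eggs']
unique = Set(words)                 # duplicates collapse to one element each

has_spam = 'spam' in unique         # membership test
count = len(unique)                 # number of distinct elements

ordered = list(unique)              # iteration works, but the order is arbitrary
ordered.sort()                      # so sort the copy before relying on order
\end{verbatim}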

Most set applications use the \class{Set} class, which provides every set
method except for \method{__hash__()}.  For advanced applications requiring
a hash method, the \class{ImmutableSet} class adds a \method{__hash__()}
method but omits methods which alter the contents of the set.  Both
\class{Set} and \class{ImmutableSet} derive from \class{BaseSet}, an
abstract class useful for determining whether something is a set:
\code{isinstance(\var{obj}, BaseSet)}.

The set classes are implemented using dictionaries.  As a result, sets
cannot contain mutable elements such as lists or dictionaries.
However, they can contain immutable collections such as tuples or
instances of \class{ImmutableSet}.  For convenience in implementing
sets of sets, inner sets are automatically converted to immutable
form; for example, \code{Set([Set(['dog'])])} is transformed to
\code{Set([ImmutableSet(['dog'])])}.

\begin{classdesc}{Set}{\optional{iterable}}
Constructs a new empty \class{Set} object.  If the optional \var{iterable}
parameter is supplied, updates the set with elements obtained from iteration.
All of the elements in \var{iterable} should be immutable or be transformable
to an immutable using the protocol described in
section~\ref{immutable-transforms}.
\end{classdesc}

\begin{classdesc}{ImmutableSet}{\optional{iterable}}
Constructs a new empty \class{ImmutableSet} object.  If the optional
\var{iterable} parameter is supplied, updates the set with elements obtained
from iteration.  All of the elements in \var{iterable} should be immutable or
be transformable to an immutable using the protocol described in
section~\ref{immutable-transforms}.

Because \class{ImmutableSet} objects provide a \method{__hash__()} method,
they can be used as set elements or as dictionary keys.  \class{ImmutableSet}
objects do not have methods for adding or removing elements, so all of the
elements must be known when the constructor is called.
\end{classdesc}
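
The use of \class{ImmutableSet} objects as dictionary keys can be sketched
as follows.  The \code{weights} table and its values are invented for
illustration, and the fallback to the built-in \class{frozenset} type is an
assumption that only keeps the snippet runnable where the \module{sets}
module is unavailable.

\begin{verbatim}
try:
    from sets import ImmutableSet
except ImportError:
    ImmutableSet = frozenset        # assumption: built-in stand-in

# Hypothetical edge weights of an undirected graph, keyed by node pairs.
weights = {
    ImmutableSet(['a', 'b']): 3,
    ImmutableSet(['b', 'c']): 7,
}

# Equal sets hash alike, so the order used to rebuild the key is irrelevant.
w = weights[ImmutableSet(['b', 'a'])]
\end{verbatim}

Because the key is a set rather than a tuple, both orientations of an edge
map to the same entry.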


\subsection{Set Objects \label{set-objects}}

Instances of \class{Set} and \class{ImmutableSet} both provide
the following operations:

\begin{tableii}{c|l}{code}{Operation}{Result}
  \lineii{len(\var{s})}{cardinality of set \var{s}}

  \hline
  \lineii{\var{x} in \var{s}}
         {test \var{x} for membership in \var{s}}
  \lineii{\var{x} not in \var{s}}
         {test \var{x} for non-membership in \var{s}}
  \lineii{\var{s}.issubset(\var{t})}
         {test whether every element in \var{s} is in \var{t};
          \code{\var{s} <= \var{t}} is equivalent}
  \lineii{\var{s}.issuperset(\var{t})}
         {test whether every element in \var{t} is in \var{s};
          \code{\var{s} >= \var{t}} is equivalent}

  \hline
  \lineii{\var{s} | \var{t}}
         {new set with elements from both \var{s} and \var{t}}
  \lineii{\var{s}.union(\var{t})}
         {new set with elements from both \var{s} and \var{t}}
  \lineii{\var{s} \&\ \var{t}}
         {new set with elements common to \var{s} and \var{t}}
  \lineii{\var{s}.intersection(\var{t})}
         {new set with elements common to \var{s} and \var{t}}
  \lineii{\var{s} - \var{t}}
         {new set with elements in \var{s} but not in \var{t}}
  \lineii{\var{s}.difference(\var{t})}
         {new set with elements in \var{s} but not in \var{t}}
  \lineii{\var{s} \textasciicircum\ \var{t}}
         {new set with elements in either \var{s} or \var{t} but not both}
  \lineii{\var{s}.symmetric_difference(\var{t})}
         {new set with elements in either \var{s} or \var{t} but not both}
  \lineii{\var{s}.copy()}
         {new set with a shallow copy of \var{s}}
\end{tableii}

In addition, both \class{Set} and \class{ImmutableSet}
support set-to-set comparisons.  Two sets are equal if and only if
every element of each set is contained in the other (each is a subset
of the other).
A set is less than another set if and only if the first set is a proper
subset of the second set (is a subset, but is not equal).
A set is greater than another set if and only if the first set is a proper
superset of the second set (is a superset, but is not equal).
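
The comparison rules can be checked with a small sketch.  The variable
names are made up for the example, and the fallback to the built-in
\class{set} type is an assumption that only keeps the snippet runnable
where the \module{sets} module is unavailable.

\begin{verbatim}
try:
    from sets import Set
except ImportError:
    Set = set                       # assumption: built-in stand-in

small = Set([1, 2])
large = Set([1, 2, 3])
same = Set([2, 1])

equal = (small == same)             # True: each is a subset of the other
proper_subset = (small < large)     # True: a subset, but not equal
not_proper = (small < same)         # False: equal sets are not proper subsets
subset_or_equal = (small <= same)   # True: <= also accepts equality
proper_superset = (large > small)   # True: a superset, but not equal
\end{verbatim}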
Raymond Hettinger | 584cb19 | 2002-08-23 15:18:38 +0000 | [diff] [blame] | 112 | |
Raymond Hettinger | 3801ec7 | 2003-01-15 15:46:05 +0000 | [diff] [blame] | 113 | The subset and equality comparisons do not generalize to a complete |
| 114 | ordering function. For example, any two disjoint sets are not equal and |
| 115 | are not subsets of each other, so \emph{none} of the following are true: |
| 116 | \code{\var{a}<\var{b}}, \code{\var{a}==\var{b}}, or \code{\var{a}>\var{b}}. |
| 117 | Accordingly, sets do not implement the \method{__cmp__} method. |
| 118 | |
| 119 | Since sets only define partial ordering (subset relationships), the output |
| 120 | of the \method{list.sort()} method is undefined for lists of sets. |
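
A minimal sketch of the disjoint case, using two made-up one-element sets
(the fallback to the built-in \class{set} type is an assumption that only
keeps the snippet runnable where the \module{sets} module is unavailable):

\begin{verbatim}
try:
    from sets import Set
except ImportError:
    Set = set                       # assumption: built-in stand-in

a = Set(['x'])
b = Set(['y'])                      # shares no element with a

# None of the three comparisons holds for this pair.
results = [a < b, a == b, a > b]
\end{verbatim}

Since no pairwise comparison can place \code{a} before or after \code{b},
a sort over such sets has no well-defined result.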

The following table lists operations available in \class{ImmutableSet}
but not found in \class{Set}:

\begin{tableii}{c|l}{code}{Operation}{Result}
  \lineii{hash(\var{s})}{returns a hash value for \var{s}}
\end{tableii}

The following table lists operations available in \class{Set}
but not found in \class{ImmutableSet}:

\begin{tableii}{c|l}{code}{Operation}{Result}
  \lineii{\var{s} |= \var{t}}
         {return set \var{s} with elements added from \var{t}}
  \lineii{\var{s}.union_update(\var{t})}
         {return set \var{s} with elements added from \var{t}}
  \lineii{\var{s} \&= \var{t}}
         {return set \var{s} keeping only elements also found in \var{t}}
  \lineii{\var{s}.intersection_update(\var{t})}
         {return set \var{s} keeping only elements also found in \var{t}}
  \lineii{\var{s} -= \var{t}}
         {return set \var{s} after removing elements found in \var{t}}
  \lineii{\var{s}.difference_update(\var{t})}
         {return set \var{s} after removing elements found in \var{t}}
  \lineii{\var{s} \textasciicircum= \var{t}}
         {return set \var{s} with elements from \var{s} or \var{t}
          but not both}
  \lineii{\var{s}.symmetric_difference_update(\var{t})}
         {return set \var{s} with elements from \var{s} or \var{t}
          but not both}

  \hline
  \lineii{\var{s}.add(\var{x})}
         {add element \var{x} to set \var{s}}
  \lineii{\var{s}.remove(\var{x})}
         {remove \var{x} from set \var{s}; raises \exception{KeyError}
          if \var{x} is not present}
  \lineii{\var{s}.discard(\var{x})}
         {remove \var{x} from set \var{s} if present}
  \lineii{\var{s}.pop()}
         {remove and return an arbitrary element from \var{s}; raises
          \exception{KeyError} if \var{s} is empty}
  \lineii{\var{s}.update(\var{t})}
         {add elements from \var{t} to set \var{s}}
  \lineii{\var{s}.clear()}
         {remove all elements from set \var{s}}
\end{tableii}
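
The difference between \method{remove()} and \method{discard()} shows up
only when the element is absent: \method{remove()} raises
\exception{KeyError}, while \method{discard()} does nothing.  The sketch
below uses made-up names, and the fallback to the built-in \class{set} type
is an assumption that only keeps the snippet runnable where the
\module{sets} module is unavailable.

\begin{verbatim}
try:
    from sets import Set
except ImportError:
    Set = set                       # assumption: built-in stand-in

team = Set(['Jack', 'Sam'])
team.discard('Zoe')                 # absent element: silently ignored

try:
    team.remove('Zoe')              # absent element: raises KeyError
    removed = True
except KeyError:
    removed = False

team.update(Set(['Ann', 'Sam']))    # 'Sam' is already present, so only 'Ann' is new
\end{verbatim}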


\subsection{Example \label{set-example}}

\begin{verbatim}
>>> from sets import Set
>>> engineers = Set(['John', 'Jane', 'Jack', 'Janice'])
>>> programmers = Set(['Jack', 'Sam', 'Susan', 'Janice'])
>>> management = Set(['Jane', 'Jack', 'Susan', 'Zack'])
>>> employees = engineers | programmers | management           # union
>>> engineering_management = engineers & management            # intersection
>>> fulltime_management = management - engineers - programmers # difference
>>> engineers.add('Marvin')                                    # add element
>>> print engineers
Set(['Jane', 'Marvin', 'Janice', 'John', 'Jack'])
>>> employees.issuperset(engineers)           # superset test
False
>>> employees.update(engineers)               # update from another set
>>> employees.issuperset(engineers)
True
>>> for group in [engineers, programmers, management, employees]:
...     group.discard('Susan')                # remove element if present
...     print group
...
Set(['Jane', 'Marvin', 'Janice', 'John', 'Jack'])
Set(['Janice', 'Jack', 'Sam'])
Set(['Jane', 'Zack', 'Jack'])
Set(['Jack', 'Sam', 'Jane', 'Marvin', 'Janice', 'John', 'Zack'])
\end{verbatim}


\subsection{Protocol for automatic conversion to immutable
            \label{immutable-transforms}}

Sets can only contain immutable elements.  For convenience, mutable
\class{Set} objects are automatically copied to an \class{ImmutableSet}
before being added as a set element.

The mechanism is to add a hashable element directly; if the element is
not hashable, it is checked for an \method{_as_immutable()} method,
which returns an immutable equivalent that is added in its place.

Since \class{Set} objects have a \method{_as_immutable()} method
returning an instance of \class{ImmutableSet}, it is possible to
construct sets of sets.
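
The conversion step described above can be sketched independently of the
module.  This is an illustration under stated assumptions, not the module's
source: the \code{Bag} class and the \code{add_element()} helper are
hypothetical, and a plain dictionary stands in for the set's internal
storage.

\begin{verbatim}
class Bag(list):
    """Hypothetical mutable container; as a list subclass it is unhashable."""

    def _as_immutable(self):
        # Return a hashable equivalent of the current contents.
        return tuple(self)


def add_element(data, element):
    """Record element as a key of the dictionary data."""
    try:
        data[element] = True            # hashable elements are stored directly
    except TypeError:
        # Unhashable: fall back to the element's immutable equivalent.
        transform = getattr(element, '_as_immutable', None)
        if transform is None:
            raise                       # no conversion available
        data[transform()] = True


data = {}
add_element(data, 'dog')                # already hashable
add_element(data, Bag([1, 2]))          # stored as the tuple (1, 2)
\end{verbatim}

An unhashable element without an \method{_as_immutable()} method, such as
a plain list, still raises \exception{TypeError} in this sketch.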

A similar mechanism is needed by the \method{__contains__()} and
\method{remove()} methods, which need to hash an element to check
for membership in a set.  Those methods check an element for hashability
and, if it is not hashable, check for a \method{_as_temporarily_immutable()}
method which returns the element wrapped by a class that provides temporary
methods for \method{__hash__()}, \method{__eq__()}, and \method{__ne__()}.

This alternate mechanism avoids the need to build a separate copy of
the original mutable object.

\class{Set} objects implement the \method{_as_temporarily_immutable()}
method, which returns the \class{Set} object wrapped by a new class,
\class{_TemporarilyImmutableSet}.
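
The temporary-wrapper idea can be sketched in the same spirit.  Everything
named here (\code{TemporaryWrapper}, \code{Bag}, \code{contains()}) is
hypothetical and only mirrors the description above; it is not the
module's \class{_TemporarilyImmutableSet} implementation.

\begin{verbatim}
class TemporaryWrapper(object):
    """Hypothetical wrapper lending hash and equality to a mutable list."""

    def __init__(self, target):
        self.target = target            # wraps the object; no copy is stored

    def __hash__(self):
        return hash(tuple(self.target))

    def __eq__(self, other):
        return tuple(self.target) == other

    def __ne__(self, other):
        return not self.__eq__(other)


class Bag(list):
    """Hypothetical unhashable container supporting the lookup protocol."""

    def _as_temporarily_immutable(self):
        return TemporaryWrapper(self)


def contains(data, element):
    """Membership test against the keys of the dictionary data."""
    try:
        return element in data
    except TypeError:
        # Unhashable: retry the lookup with a temporary hashable wrapper.
        transform = getattr(element, '_as_temporarily_immutable', None)
        if transform is None:
            raise
        return transform() in data


data = {(1, 2): True}
found = contains(data, Bag([1, 2]))
missing = contains(data, Bag([9]))
\end{verbatim}

The wrapper holds a reference to the original object rather than a
snapshot, so its hash changes if the wrapped object is mutated while a
lookup is in progress.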

The two mechanisms for adding hashability are normally invisible to the
user; however, a conflict can arise in a multi-threaded environment
where one thread is updating a set while another has temporarily wrapped it
in \class{_TemporarilyImmutableSet}.  In other words, sets of mutable sets
are not thread-safe.