blob: 40ea03197692b98d3fb3865df804c52d8e1e86de [file] [log] [blame]
\section{\module{sets} ---
Unordered collections of unique elements}
\declaremodule{standard}{sets}
\modulesynopsis{Implementation of sets of unique elements.}
\moduleauthor{Greg V. Wilson}{gvwilson@nevex.com}
\moduleauthor{Alex Martelli}{aleax@aleax.it}
\moduleauthor{Guido van Rossum}{guido@python.org}
\sectionauthor{Raymond D. Hettinger}{python@rcn.com}
\versionadded{2.3}
The \module{sets} module provides classes for constructing and manipulating
unordered collections of unique elements. Common uses include membership
testing, removing duplicates from a sequence, and computing standard math
operations on sets such as intersection, union, difference, and symmetric
difference.
Like other collections, sets support \code{\var{x} in \var{set}},
\code{len(\var{set})}, and \code{for \var{x} in \var{set}}. Being an
unordered collection, sets do not record element position or order of
insertion. Accordingly, sets do not support indexing, slicing, or
other sequence-like behavior.
Most set applications use the \class{Set} class which provides every set
method except for \method{__hash__()}. For advanced applications requiring
a hash method, the \class{ImmutableSet} class adds a \method{__hash__()}
method but omits methods which alter the contents of the set. Both
\class{Set} and \class{ImmutableSet} derive from \class{BaseSet}, an
abstract class useful for determining whether something is a set:
\code{isinstance(\var{obj}, BaseSet)}.
The set classes are implemented using dictionaries. Accordingly, the
requirements for set elements are the same as those for dictionary keys;
namely, that the element defines both \method{__eq__} and \method{__hash__}.
As a result, sets
cannot contain mutable elements such as lists or dictionaries.
However, they can contain immutable collections such as tuples or
instances of \class{ImmutableSet}. For convenience in implementing
sets of sets, inner sets are automatically converted to immutable
form, for example, \code{Set([Set(['dog'])])} is transformed to
\code{Set([ImmutableSet(['dog'])])}.
\begin{classdesc}{Set}{\optional{iterable}}
Constructs a new empty \class{Set} object. If the optional \var{iterable}
parameter is supplied, updates the set with elements obtained from iteration.
All of the elements in \var{iterable} should be immutable or be transformable
to an immutable using the protocol described in
section~\ref{immutable-transforms}.
\end{classdesc}
\begin{classdesc}{ImmutableSet}{\optional{iterable}}
Constructs a new empty \class{ImmutableSet} object. If the optional
\var{iterable} parameter is supplied, updates the set with elements obtained
from iteration. All of the elements in \var{iterable} should be immutable or
be transformable to an immutable using the protocol described in
section~\ref{immutable-transforms}.
Because \class{ImmutableSet} objects provide a \method{__hash__()} method,
they can be used as set elements or as dictionary keys. \class{ImmutableSet}
objects do not have methods for adding or removing elements, so all of the
elements must be known when the constructor is called.
\end{classdesc}
\subsection{Set Objects \label{set-objects}}
Instances of \class{Set} and \class{ImmutableSet} both provide
the following operations:
\begin{tableiii}{c|c|l}{code}{Operation}{Equivalent}{Result}
\lineiii{len(\var{s})}{}{cardinality of set \var{s}}
\hline
\lineiii{\var{x} in \var{s}}{}
{test \var{x} for membership in \var{s}}
\lineiii{\var{x} not in \var{s}}{}
{test \var{x} for non-membership in \var{s}}
\lineiii{\var{s}.issubset(\var{t})}{\code{\var{s} <= \var{t}}}
{test whether every element in \var{s} is in \var{t}}
\lineiii{\var{s}.issuperset(\var{t})}{\code{\var{s} >= \var{t}}}
{test whether every element in \var{t} is in \var{s}}
\hline
\lineiii{\var{s}.union(\var{t})}{\var{s} \textbar{} \var{t}}
{new set with elements from both \var{s} and \var{t}}
\lineiii{\var{s}.intersection(\var{t})}{\var{s} \&\ \var{t}}
{new set with elements common to \var{s} and \var{t}}
\lineiii{\var{s}.difference(\var{t})}{\var{s} - \var{t}}
{new set with elements in \var{s} but not in \var{t}}
\lineiii{\var{s}.symmetric_difference(\var{t})}{\var{s} \^\ \var{t}}
{new set with elements in either \var{s} or \var{t} but not both}
\lineiii{\var{s}.copy()}{}
{new set with a shallow copy of \var{s}}
\end{tableiii}
Note, the non-operator versions of \method{union()},
\method{intersection()}, \method{difference()}, and
\method{symmetric_difference()} will accept any iterable as an argument.
In contrast, their operator based counterparts require their arguments to
be sets. This precludes error-prone constructions like
\code{Set('abc') \&\ 'cbs'} in favor of the more readable
\code{Set('abc').intersection('cbs')}.
\versionchanged[Formerly all arguments were required to be sets]{2.3.1}
In addition, both \class{Set} and \class{ImmutableSet}
support set to set comparisons. Two sets are equal if and only if
every element of each set is contained in the other (each is a subset
of the other).
A set is less than another set if and only if the first set is a proper
subset of the second set (is a subset, but is not equal).
A set is greater than another set if and only if the first set is a proper
superset of the second set (is a superset, but is not equal).
The subset and equality comparisons do not generalize to a complete
ordering function. For example, any two disjoint sets are not equal and
are not subsets of each other, so \emph{all} of the following return
\code{False}: \code{\var{a}<\var{b}}, \code{\var{a}==\var{b}}, or
\code{\var{a}>\var{b}}.
Accordingly, sets do not implement the \method{__cmp__} method.
Since sets only define partial ordering (subset relationships), the output
of the \method{list.sort()} method is undefined for lists of sets.
The following table lists operations available in \class{ImmutableSet}
but not found in \class{Set}:
\begin{tableii}{c|l}{code}{Operation}{Result}
\lineii{hash(\var{s})}{returns a hash value for \var{s}}
\end{tableii}
The following table lists operations available in \class{Set}
but not found in \class{ImmutableSet}:
\begin{tableiii}{c|c|l}{code}{Operation}{Equivalent}{Result}
\lineiii{\var{s}.update(\var{t})}
{\var{s} \textbar= \var{t}}
{return set \var{s} with elements added from \var{t}}
\lineiii{\var{s}.intersection_update(\var{t})}
{\var{s} \&= \var{t}}
{return set \var{s} keeping only elements also found in \var{t}}
\lineiii{\var{s}.difference_update(\var{t})}
{\var{s} -= \var{t}}
{return set \var{s} after removing elements found in \var{t}}
\lineiii{\var{s}.symmetric_difference_update(\var{t})}
{\var{s} \textasciicircum= \var{t}}
{return set \var{s} with elements from \var{s} or \var{t}
but not both}
\hline
\lineiii{\var{s}.add(\var{x})}{}
{add element \var{x} to set \var{s}}
\lineiii{\var{s}.remove(\var{x})}{}
{remove \var{x} from set \var{s}; raises KeyError if not present}
\lineiii{\var{s}.discard(\var{x})}{}
{removes \var{x} from set \var{s} if present}
\lineiii{\var{s}.pop()}{}
{remove and return an arbitrary element from \var{s}; raises
KeyError if empty}
\lineiii{\var{s}.clear()}{}
{remove all elements from set \var{s}}
\end{tableiii}
Note, the non-operator versions of \method{update()},
\method{intersection_update()}, \method{difference_update()}, and
\method{symmetric_difference_update()} will accept any iterable as
an argument.
\versionchanged[Formerly all arguments were required to be sets]{2.3.1}
Also note, the module also includes a \method{union_update()} method
which is an alias for \method{update()}. The method is included for
backwards compatibility. Programmers should prefer the
\method{update()} method because it is the one supported by the builtin
\class{set()} and \class{frozenset()} types.
\subsection{Example \label{set-example}}
\begin{verbatim}
>>> from sets import Set
>>> engineers = Set(['John', 'Jane', 'Jack', 'Janice'])
>>> programmers = Set(['Jack', 'Sam', 'Susan', 'Janice'])
>>> managers = Set(['Jane', 'Jack', 'Susan', 'Zack'])
>>> employees = engineers | programmers | managers # union
>>> engineering_management = engineers & managers # intersection
>>> fulltime_management = managers - engineers - programmers # difference
>>> engineers.add('Marvin') # add element
>>> print engineers
Set(['Jane', 'Marvin', 'Janice', 'John', 'Jack'])
>>> employees.issuperset(engineers) # superset test
False
>>> employees.union_update(engineers) # update from another set
>>> employees.issuperset(engineers)
True
>>> for group in [engineers, programmers, managers, employees]:
... group.discard('Susan') # unconditionally remove element
... print group
...
Set(['Jane', 'Marvin', 'Janice', 'John', 'Jack'])
Set(['Janice', 'Jack', 'Sam'])
Set(['Jane', 'Zack', 'Jack'])
Set(['Jack', 'Sam', 'Jane', 'Marvin', 'Janice', 'John', 'Zack'])
\end{verbatim}
\subsection{Protocol for automatic conversion to immutable
\label{immutable-transforms}}
Sets can only contain immutable elements. For convenience, mutable
\class{Set} objects are automatically copied to an \class{ImmutableSet}
before being added as a set element.
The mechanism is to always add a hashable element, or if it is not
hashable, the element is checked to see if it has an
\method{__as_immutable__()} method which returns an immutable equivalent.
Since \class{Set} objects have a \method{__as_immutable__()} method
returning an instance of \class{ImmutableSet}, it is possible to
construct sets of sets.
A similar mechanism is needed by the \method{__contains__()} and
\method{remove()} methods which need to hash an element to check
for membership in a set. Those methods check an element for hashability
and, if not, check for a \method{__as_temporarily_immutable__()} method
which returns the element wrapped by a class that provides temporary
methods for \method{__hash__()}, \method{__eq__()}, and \method{__ne__()}.
The alternate mechanism spares the need to build a separate copy of
the original mutable object.
\class{Set} objects implement the \method{__as_temporarily_immutable__()}
method which returns the \class{Set} object wrapped by a new class
\class{_TemporarilyImmutableSet}.
The two mechanisms for adding hashability are normally invisible to the
user; however, a conflict can arise in a multi-threaded environment
where one thread is updating a set while another has temporarily wrapped it
in \class{_TemporarilyImmutableSet}. In other words, sets of mutable sets
are not thread-safe.
\subsection{Comparison to the built-in \class{set} types
\label{comparison-to-builtin-set}}
The built-in \class{set} and \class{frozenset} types were designed based
on lessons learned from the \module{sets} module. The key differences are:
\begin{itemize}
\item \class{Set} and \class{ImmutableSet} were renamed to \class{set} and
\class{frozenset}.
\item There is no equivalent to \class{BaseSet}. Instead, use
\code{isinstance(x, (set, frozenset))}.
\item The hash algorithm for the built-ins performs significantly better
(fewer collisions) for most datasets.
\item The built-in versions have more space efficient pickles.
\item The built-in versions do not have a \method{union_update()} method.
Instead, use the \method{update()} method which is equivalent.
\item The built-in versions do not have a \method{_repr(sorted=True)} method.
Instead, use the built-in \function{repr()} and \function{sorted()}
functions: \code{repr(sorted(s))}.
\item The built-in version does not have a protocol for automatic conversion
to immutable. Many found this feature to be confusing and no one
in the community reported having found real uses for it.
\end{itemize}