\section{\module{pickle} --- Python object serialization}

\declaremodule{standard}{pickle}
\modulesynopsis{Convert Python objects to streams of bytes and back.}
% Substantial improvements by Jim Kerr <jbkerr@sr.hp.com>.
% Rewritten by Barry Warsaw <barry@zope.com>

\index{persistence}
\indexii{persistent}{objects}
\indexii{serializing}{objects}
\indexii{marshalling}{objects}
\indexii{flattening}{objects}
\indexii{pickling}{objects}

The \module{pickle} module implements a fundamental, but powerful
algorithm for serializing and de-serializing a Python object
structure. ``Pickling'' is the process whereby a Python object
hierarchy is converted into a byte stream, and ``unpickling'' is the
inverse operation, whereby a byte stream is converted back into an
object hierarchy. Pickling (and unpickling) is alternatively known as
``serialization'', ``marshalling''\footnote{Don't confuse this with
the \refmodule{marshal} module.}, or ``flattening''; however, the
preferred terms used here are ``pickling'' and ``unpickling'', to
avoid confusion.

This documentation describes both the \module{pickle} module and the
\refmodule{cPickle} module.

\subsection{Relationship to other Python modules}

The \module{pickle} module has an optimized cousin called the
\module{cPickle} module. As its name implies, \module{cPickle} is
written in C, so it can be up to 1000 times faster than
\module{pickle}. However, it does not support subclassing of the
\function{Pickler()} and \function{Unpickler()} classes, because in
\module{cPickle} these are functions, not classes. Most applications
have no need for this functionality, and can benefit from the improved
performance of \module{cPickle}. Other than that, the interfaces of
the two modules are nearly identical; the common interface is
described in this manual and differences are pointed out where
necessary. In the following discussions, we use the term ``pickle''
to collectively describe the \module{pickle} and
\module{cPickle} modules.

The data streams the two modules produce are guaranteed to be
interchangeable.

Python has a more primitive serialization module called
\refmodule{marshal}, but \module{pickle} should generally be the
preferred way to serialize Python
objects. \module{marshal} exists primarily to support Python's
\file{.pyc} files.

The \module{pickle} module differs from \refmodule{marshal} in several
significant ways:

\begin{itemize}

\item The \module{pickle} module keeps track of the objects it has
      already serialized, so that later references to the same object
      won't be serialized again. \module{marshal} doesn't do this.

      This has implications both for recursive objects and object
      sharing. Recursive objects are objects that contain references
      to themselves. These are not handled by \module{marshal}, and in
      fact, attempting to marshal recursive objects will crash your
      Python interpreter. Object sharing happens when there are multiple
      references to the same object in different places in the object
      hierarchy being serialized. \module{pickle} stores such objects
      only once, and ensures that all other references point to the
      master copy. Shared objects remain shared, which can be very
      important for mutable objects.

\item \module{marshal} cannot be used to serialize user-defined
      classes and their instances. \module{pickle} can save and
      restore class instances transparently; however, the class
      definition must be importable and live in the same module as
      when the object was stored.

\item The \module{marshal} serialization format is not guaranteed to
      be portable across Python versions. Because its primary job in
      life is to support \file{.pyc} files, the Python implementers
      reserve the right to change the serialization format in
      non-backwards compatible ways should the need arise. The
      \module{pickle} serialization format is guaranteed to be
      backwards compatible across Python releases.

\item The \module{pickle} module doesn't handle code objects, which
      the \module{marshal} module does. This avoids the possibility
      of smuggling Trojan horses into a program through the
      \module{pickle} module\footnote{This doesn't necessarily imply
      that \module{pickle} is inherently secure. See
      section~\ref{pickle-sec} for a more detailed discussion on
      \module{pickle} module security. Besides, it's possible that
      \module{pickle} will eventually support serializing code
      objects.}.

\end{itemize}
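
The first point in the list, object sharing and recursion, can be seen
in a small round trip. The following is only a sketch, written for a
current Python interpreter rather than the release described here; the
names \code{shared}, \code{container} and \code{loop} are illustrative
and not part of any API.

```python
import pickle

# One list referenced from two places, plus a list that contains itself.
shared = [1, 2, 3]
container = {'a': shared, 'b': shared}
loop = []
loop.append(loop)

container_copy = pickle.loads(pickle.dumps(container))
loop_copy = pickle.loads(pickle.dumps(loop))

# Sharing survives the round trip: a change made through one key is
# visible through the other, because both refer to one list object.
container_copy['a'].append(4)
print(container_copy['b'])

# The self-reference is restored as well.
print(loop_copy[0] is loop_copy)
```

The unpickled structure is a copy: appending to \code{container_copy}
leaves the original \code{shared} list untouched.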

Note that serialization is a more primitive notion than persistence;
although \module{pickle} reads and writes file objects, it does not
handle the issue of naming persistent objects, nor the (even more
complicated) issue of concurrent access to persistent objects. The
\module{pickle} module can transform a complex object into a byte
stream and it can transform the byte stream into an object with the
same internal structure. Perhaps the most obvious thing to do with
these byte streams is to write them onto a file, but it is also
conceivable to send them across a network or store them in a
database. The module \refmodule{shelve} provides a simple interface
to pickle and unpickle objects on DBM-style database files.

\subsection{Data stream format}

The data format used by \module{pickle} is Python-specific. This has
the advantage that there are no restrictions imposed by external
standards such as XDR\index{XDR}\index{External Data Representation}
(which can't represent pointer sharing); however, it means that
non-Python programs may not be able to reconstruct pickled Python
objects.

By default, the \module{pickle} data format uses a printable \ASCII{}
representation. This is slightly more voluminous than a binary
representation. The big advantage of using printable \ASCII{} (and of
some other characteristics of \module{pickle}'s representation) is that
for debugging or recovery purposes it is possible for a human to read
the pickled file with a standard text editor.

A binary format, which is slightly more efficient, can be chosen by
specifying a true value for the \var{bin} argument to the
\class{Pickler} constructor or the \function{dump()} and \function{dumps()}
functions.
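
The size difference between the two formats can be checked directly.
This is a rough sketch that passes the format selector positionally:
\code{0} asks for the printable format and \code{1} for the binary one.
It assumes an interpreter on which that second argument is either the
\var{bin} flag described here or, as on later releases, a protocol
number for which 0 and 1 select the same two formats. The sample data
is arbitrary.

```python
import pickle

data = list(range(100))

# 0 selects the printable text format, 1 the binary format.
text_stream = pickle.dumps(data, 0)
binary_stream = pickle.dumps(data, 1)

# Both streams describe the same object; only their size differs.
print(len(text_stream), len(binary_stream))
print(pickle.loads(text_stream) == pickle.loads(binary_stream))
```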

\subsection{Usage}

To serialize an object hierarchy, you first create a pickler, then you
call the pickler's \method{dump()} method. To de-serialize a data
stream, you first create an unpickler, then you call the unpickler's
\method{load()} method. The \module{pickle} module provides the
following functions to make this process more convenient:

\begin{funcdesc}{dump}{object, file\optional{, bin}}
Write a pickled representation of \var{object} to the open file object
\var{file}. This is equivalent to
\code{Pickler(\var{file}, \var{bin}).dump(\var{object})}.
If the optional \var{bin} argument is true, the binary pickle format
is used; otherwise the (less efficient) text pickle format is used
(for backwards compatibility, this is the default).

\var{file} must have a \method{write()} method that accepts a single
string argument. It can thus be a file object opened for writing, a
\refmodule{StringIO} object, or any other custom
object that meets this interface.
\end{funcdesc}

\begin{funcdesc}{load}{file}
Read a string from the open file object \var{file} and interpret it as
a pickle data stream, reconstructing and returning the original object
hierarchy. This is equivalent to \code{Unpickler(\var{file}).load()}.

\var{file} must have two methods, a \method{read()} method that takes
an integer argument, and a \method{readline()} method that requires no
arguments. Both methods should return a string. Thus \var{file} can
be a file object opened for reading, a
\module{StringIO} object, or any other custom
object that meets this interface.

This function automatically determines whether the data stream was
written in binary mode or not.
\end{funcdesc}

\begin{funcdesc}{dumps}{object\optional{, bin}}
Return the pickled representation of the object as a string, instead
of writing it to a file. If the optional \var{bin} argument is
true, the binary pickle format is used; otherwise the (less efficient)
text pickle format is used (this is the default).
\end{funcdesc}

\begin{funcdesc}{loads}{string}
Read a pickled object hierarchy from a string. Characters in the
string past the pickled object's representation are ignored.
\end{funcdesc}
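
The four convenience functions can be exercised together as follows.
This is a minimal sketch for a current Python interpreter, where
\code{io.BytesIO} is assumed as a stand-in for the \module{StringIO}
object mentioned above; the \code{record} dictionary is made-up sample
data.

```python
import io
import pickle

record = {'name': 'spam', 'sizes': [1, 2, 3], 'ratio': 0.5}

# Any object with a suitable write() method can receive the stream.
buffer = io.BytesIO()
pickle.dump(record, buffer)

# Rewind, then let load() pull the stream back in through read()
# and readline().
buffer.seek(0)
from_file = pickle.load(buffer)

# dumps() and loads() do the same without a file object.
from_string = pickle.loads(pickle.dumps(record))

print(from_file == record, from_string == record)
```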

The \module{pickle} module also defines three exceptions:

\begin{excdesc}{PickleError}
A common base class for the other exceptions defined below. This
inherits from \exception{Exception}.
\end{excdesc}

\begin{excdesc}{PicklingError}
This exception is raised when an unpicklable object is passed to
the \method{dump()} method.
\end{excdesc}

\begin{excdesc}{UnpicklingError}
This exception is raised when there is a problem unpickling an object,
such as a security violation. Note that other exceptions may also be
raised during unpickling, including (but not necessarily limited to)
\exception{AttributeError}, \exception{EOFError},
\exception{ImportError}, and \exception{IndexError}.
\end{excdesc}
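
Because unpickling can fail with more than one exception type, code
that reads pickles of uncertain quality may want to catch the whole
group named above. The helper below is a hypothetical sketch
(\code{try_loads} and \code{UNPICKLE_ERRORS} are names invented for
this example), and since the list of possible exceptions is explicitly
open-ended, it should not be read as exhaustive.

```python
import pickle

# The exceptions the text lists as possible during unpickling.
UNPICKLE_ERRORS = (pickle.UnpicklingError, AttributeError, EOFError,
                   ImportError, IndexError)

def try_loads(data, default=None):
    """Return the unpickled object, or `default` if unpickling fails."""
    try:
        return pickle.loads(data)
    except UNPICKLE_ERRORS:
        return default

good = pickle.dumps([1, 2, 3])
print(try_loads(good))

# An empty stream ends before any object has been read.
print(try_loads(good[:0], default='unreadable'))
```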

The \module{pickle} module also exports two callables\footnote{In the
\module{pickle} module these callables are classes, which you could
subclass to customize the behavior. However, in the \module{cPickle}
module these callables are factory functions and so cannot be
subclassed. One of the common reasons to subclass is to control what
objects can actually be unpickled. See section~\ref{pickle-sec} for
more details on security concerns.}, \class{Pickler} and
\class{Unpickler}:

\begin{classdesc}{Pickler}{file\optional{, bin}}
This takes a file-like object to which it will write a pickle data
stream. The optional \var{bin} argument, if true, tells the pickler to
use the more efficient binary pickle format; otherwise the \ASCII{}
format is used (this is the default).

\var{file} must have a \method{write()} method that accepts a single
string argument. It can thus be an open file object, a
\module{StringIO} object, or any other custom
object that meets this interface.
\end{classdesc}

\class{Pickler} objects define one (or two) public methods:

\begin{methoddesc}[Pickler]{dump}{object}
Write a pickled representation of \var{object} to the open file object
given in the constructor. Either the binary or \ASCII{} format will
be used, depending on the value of the \var{bin} flag passed to the
constructor.
\end{methoddesc}

\begin{methoddesc}[Pickler]{clear_memo}{}
Clears the pickler's ``memo''. The memo is the data structure that
remembers which objects the pickler has already seen, so that shared
or recursive objects are pickled by reference and not by value. This
method is useful when re-using picklers.

\strong{Note:} \method{clear_memo()} is only available on the picklers
created by \module{cPickle}. In the \module{pickle} module, picklers
have an instance variable called \member{memo} which is a Python
dictionary. So to clear the memo for a \module{pickle} module
pickler, you could do the following:

\begin{verbatim}
mypickler.memo.clear()
\end{verbatim}
\end{methoddesc}

It is possible to make multiple calls to the \method{dump()} method of
the same \class{Pickler} instance. These must then be matched to the
same number of calls to the \method{load()} method of the
corresponding \class{Unpickler} instance. If the same object is
pickled by multiple \method{dump()} calls, the \method{load()} calls
will all yield references to the same object\footnote{\emph{Warning}: this
is intended for pickling multiple objects without intervening
modifications to the objects or their parts. If you modify an object
and then pickle it again using the same \class{Pickler} instance, the
object is not pickled again --- a reference to it is pickled and the
\class{Unpickler} will return the old value, not the modified one.
There are two problems here: (1) detecting changes, and (2)
marshalling a minimal set of changes. Garbage collection may also
become a problem here.}.
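
The behaviour described in the paragraph above and in its footnote can
be reproduced with one pickler and one unpickler sharing a stream. This
is a sketch for a current Python interpreter, with \code{io.BytesIO}
assumed as the file-like object; the variable names are illustrative.

```python
import io
import pickle

buffer = io.BytesIO()
pickler = pickle.Pickler(buffer)

items = [1, 2, 3]
pickler.dump(items)
items.append(4)        # modified between the two dump() calls
pickler.dump(items)    # the memo already knows this list, so only a
                       # reference to the first pickle is written

buffer.seek(0)
unpickler = pickle.Unpickler(buffer)
first = unpickler.load()
second = unpickler.load()   # one load() call per dump() call

print(first is second)
print(second)
```

The second \method{load()} returns the list as it was when first
pickled; the appended element is not in the stream.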

\class{Unpickler} objects are defined as:

\begin{classdesc}{Unpickler}{file}
This takes a file-like object from which it will read a pickle data
stream. This class automatically determines whether the data stream
was written in binary mode or not, so it does not need a flag as in
the \class{Pickler} factory.

\var{file} must have two methods, a \method{read()} method that takes
an integer argument, and a \method{readline()} method that requires no
arguments. Both methods should return a string. Thus \var{file} can
be a file object opened for reading, a
\module{StringIO} object, or any other custom
object that meets this interface.
\end{classdesc}

\class{Unpickler} objects have one (or two) public methods:

\begin{methoddesc}[Unpickler]{load}{}
Read a pickled object representation from the open file object given
in the constructor, and return the reconstituted object hierarchy
specified therein.
\end{methoddesc}

\begin{methoddesc}[Unpickler]{noload}{}
This is just like \method{load()} except that it doesn't actually
create any objects. This is useful primarily for finding what are
called ``persistent ids'' that may be referenced in a pickle data
stream. See section~\ref{pickle-protocol} below for more details.

\strong{Note:} the \method{noload()} method is currently only
available on \class{Unpickler} objects created with the
\module{cPickle} module. \module{pickle} module \class{Unpickler}s do
not have the \method{noload()} method.
\end{methoddesc}

\subsection{What can be pickled and unpickled?}

The following types can be pickled:

\begin{itemize}

\item \code{None}

\item integers, long integers, floating point numbers, complex numbers

\item normal and Unicode strings

\item tuples, lists, and dictionaries containing only picklable objects

\item functions defined at the top level of a module

\item built-in functions defined at the top level of a module

\item classes that are defined at the top level of a module

\item instances of such classes whose \member{__dict__} or
\method{__setstate__()} is picklable (see
section~\ref{pickle-protocol} for details)

\end{itemize}

Attempts to pickle unpicklable objects will raise the
\exception{PicklingError} exception; when this happens, an unspecified
number of bytes may have already been written to the underlying file.

Note that functions (built-in and user-defined) are pickled by ``fully
qualified'' name reference, not by value. This means that only the
function name is pickled, along with the name of the module the
function is defined in. Neither the function's code, nor any of its
function attributes are pickled. Thus the defining module must be
importable in the unpickling environment, and the module must contain
the named object, otherwise an exception will be raised\footnote{The
exception raised will likely be an \exception{ImportError} or an
\exception{AttributeError} but it could be something else.}.
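
Pickling by name reference is easiest to observe with a built-in
function, since it is importable everywhere. The following sketch
assumes a current Python interpreter and uses \function{len()} only as
a convenient example.

```python
import pickle

# Pickling a function stores its module and name, not its code.
stream = pickle.dumps(len, 0)
print(repr(stream))

# Unpickling looks the name up again, so the very same function
# object comes back rather than a copy.
restored = pickle.loads(stream)
print(restored is len)
```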

Similarly, classes are pickled by named reference, so the same
restrictions in the unpickling environment apply. Note that none of
the class's code or data is pickled, so in the following example the
class attribute \code{attr} is not restored in the unpickling
environment:

\begin{verbatim}
import pickle

class Foo:
    attr = 'a class attr'

picklestring = pickle.dumps(Foo)
\end{verbatim}

These restrictions are why picklable functions and classes must be
defined at the top level of a module.

Similarly, when class instances are pickled, their class's code and
data are not pickled along with them. Only the instance data are
pickled. This is done on purpose, so you can fix bugs in a class or
add methods to the class and still load objects that were created with
an earlier version of the class. If you plan to have long-lived
objects that will see many versions of a class, it may be worthwhile
to put a version number in the objects so that suitable conversions
can be made by the class's \method{__setstate__()} method.
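
One way such a version number might be used is sketched below. The
\class{Person} class, its \code{VERSION} attribute and the ``version 1''
layout are all hypothetical, invented for this example; the code is
written for a current Python interpreter.

```python
import pickle

class Person:
    VERSION = 2

    def __init__(self, first, last):
        self.first = first
        self.last = last
        self.version = self.VERSION

    def __setstate__(self, state):
        # Hypothetical history: version 1 instances kept a single
        # 'name' attribute and carried no version number at all.
        if state.get('version', 1) < 2:
            first, last = state.pop('name').split(' ', 1)
            state['first'] = first
            state['last'] = last
            state['version'] = 2
        self.__dict__.update(state)

# Build an instance that looks the way version 1 would have stored it.
old = Person.__new__(Person)
old.__dict__['name'] = 'Ada Lovelace'

upgraded = pickle.loads(pickle.dumps(old))
print(upgraded.first, upgraded.last, upgraded.version)
```

Instances that already carry the current version number pass through
\method{__setstate__()} unchanged.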

\subsection{The pickle protocol
\label{pickle-protocol}}\setindexsubitem{(pickle protocol)}

This section describes the ``pickling protocol'' that defines the
interface between the pickler/unpickler and the objects that are being
serialized. This protocol provides a standard way for you to define,
customize, and control how your objects are serialized and
de-serialized. The description in this section doesn't cover specific
customizations that you can employ to make the unpickling environment
safer from untrusted pickle data streams; see section~\ref{pickle-sec}
for more details.

\subsubsection{Pickling and unpickling normal class
    instances\label{pickle-inst}}

When a pickled class instance is unpickled, its \method{__init__()}
method is normally \emph{not} invoked. If it is desirable that the
\method{__init__()} method be called on unpickling, a class can define
a method \method{__getinitargs__()}, which should return a
\emph{tuple} containing the arguments to be passed to the class
constructor (i.e. \method{__init__()}). The
\method{__getinitargs__()} method is called at
pickle time; the tuple it returns is incorporated in the pickle for
the instance.
\withsubitem{(copy protocol)}{\ttindex{__getinitargs__()}}
\withsubitem{(instance constructor)}{\ttindex{__init__()}}

\withsubitem{(copy protocol)}{
  \ttindex{__getstate__()}\ttindex{__setstate__()}}
\withsubitem{(instance attribute)}{
  \ttindex{__dict__}}

Classes can further influence how their instances are pickled; if the
class defines the method \method{__getstate__()}, it is called and the
returned state is pickled as the contents for the instance, instead of
the contents of the instance's dictionary. If there is no
\method{__getstate__()} method, the instance's \member{__dict__} is
pickled.

Upon unpickling, if the class also defines the method
\method{__setstate__()}, it is called with the unpickled
state\footnote{These methods can also be used to implement copying
class instances.}. If there is no \method{__setstate__()} method, the
pickled state must be a dictionary and its items are assigned to the
new instance's dictionary. If a class defines both
\method{__getstate__()} and \method{__setstate__()}, the state object
needn't be a dictionary and these methods can do what they
want\footnote{This protocol is also used by the shallow and deep
copying operations defined in the
\refmodule{copy} module.}.
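
A class that defines both methods might look like the sketch below.
The \class{Reading} class and its attributes are hypothetical; the
point is only that the state handed from \method{__getstate__()} to
\method{__setstate__()} is a tuple rather than a dictionary, and that
an attribute left out of the state is rebuilt on unpickling. The code
assumes a current Python interpreter.

```python
import pickle

class Reading:
    def __init__(self, sensor, value):
        self.sensor = sensor
        self.value = value
        self.cache = {}          # scratch data, not worth saving

    def __getstate__(self):
        # The state need not be a dictionary; a tuple works too.
        return (self.sensor, self.value)

    def __setstate__(self, state):
        self.sensor, self.value = state
        self.cache = {}          # rebuilt instead of restored

original = Reading('probe-1', 21.5)
original.cache['formatted'] = '21.5 C'

restored = pickle.loads(pickle.dumps(original))
print(restored.sensor, restored.value, restored.cache)
```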

\subsubsection{Pickling and unpickling extension types}

When the \class{Pickler} encounters an object of a type it knows
nothing about --- such as an extension type --- it looks in two places
for a hint of how to pickle it. One alternative is for the object to
implement a \method{__reduce__()} method. If provided, at pickling
time \method{__reduce__()} will be called with no arguments, and it
must return either a string or a tuple.

If a string is returned, it names a global variable whose contents are
pickled as normal. When a tuple is returned, it must be of length two
or three, and its elements have the following semantics, in order:

\begin{itemize}

\item A callable object, which in the unpickling environment must be
      either a class, a callable registered as a ``safe constructor''
      (see below), or it must have an attribute
      \member{__safe_for_unpickling__} with a true value. Otherwise,
      an \exception{UnpicklingError} will be raised in the unpickling
      environment. Note that as usual, the callable itself is pickled
      by name.

\item A tuple of arguments for the callable object, or \code{None}.

\item Optionally, the object's state, which will be passed to
      the object's \method{__setstate__()} method as described in
      section~\ref{pickle-inst}. If the object has no
      \method{__setstate__()} method, then, as above, the value must
      be a dictionary and it will be added to the object's
      \member{__dict__}.

\end{itemize}

Upon unpickling, the callable will be called (provided that it meets
the above criteria), passing in the tuple of arguments; it should
return the unpickled object. If the second item was \code{None}, then
instead of calling the callable directly, its \method{__basicnew__()}
method is called without arguments. It should also return the
unpickled object.
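
The three-element form of the return value can be sketched with an
ordinary class standing in for an extension type. The \class{Interval}
class is hypothetical, and the code assumes a current Python
interpreter, where a class is always acceptable as the callable.

```python
import pickle

class Interval:
    def __init__(self, low, high):
        if low > high:
            raise ValueError('low must not exceed high')
        self.low = low
        self.high = high
        self.label = None

    def __reduce__(self):
        # (callable, argument tuple, optional state)
        return (Interval, (self.low, self.high), {'label': self.label})

original = Interval(2, 5)
original.label = 'working hours'

restored = pickle.loads(pickle.dumps(original))
print(restored.low, restored.high, restored.label)
```

Because the callable is invoked with the argument tuple, the check in
\method{__init__()} runs again at unpickling time; the state dictionary
is then merged into the new object's \member{__dict__}, as there is no
\method{__setstate__()} method.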

An alternative to implementing a \method{__reduce__()} method on the
object to be pickled is to register the callable with the
\refmodule[copyreg]{copy_reg} module. This module provides a way
for programs to register ``reduction functions'' and constructors for
user-defined types. Reduction functions have the same semantics and
interface as the \method{__reduce__()} method described above, except
that they are called with a single argument, the object to be pickled.

The registered constructor is deemed a ``safe constructor'' for purposes
of unpickling as described above.
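
Registration might be sketched as follows. The \class{Temperature}
class and the two helper functions are hypothetical. The code assumes a
current Python interpreter, where the module is named \module{copyreg}
and accepts ordinary classes; the import falls back to the
\module{copy_reg} name used in this section.

```python
import pickle

try:
    import copyreg                 # name used by newer releases
except ImportError:
    import copy_reg as copyreg     # name used in the text

class Temperature:
    def __init__(self, kelvin):
        self.kelvin = kelvin

def make_temperature(kelvin):
    """Constructor called at unpickling time."""
    return Temperature(kelvin)

def reduce_temperature(obj):
    """Reduction function: takes the object, returns (callable, args)."""
    return make_temperature, (obj.kelvin,)

# Register the reduction function and its constructor for the type.
copyreg.pickle(Temperature, reduce_temperature, make_temperature)

restored = pickle.loads(pickle.dumps(Temperature(300)))
print(restored.kelvin)
```

The class itself needs no pickling support here; everything is supplied
from outside, which is what makes this route usable for types whose
source cannot be changed.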

\subsubsection{Pickling and unpickling external objects}

For the benefit of object persistence, the \module{pickle} module
supports the notion of a reference to an object outside the pickled
data stream. Such objects are referenced by a ``persistent id'',
which is just an arbitrary string of printable \ASCII{} characters.
The resolution of such names is not defined by the \module{pickle}
module; it will delegate this resolution to user-defined functions on
the pickler and unpickler\footnote{The actual mechanism for
associating these user-defined functions is slightly different for
\module{pickle} and \module{cPickle}. The description given here
works the same for both implementations. Users of the \module{pickle}
module could also use subclassing to effect the same results,
overriding the \method{persistent_id()} and \method{persistent_load()}
methods in the derived classes.}.

To define external persistent id resolution, you need to set the
\member{persistent_id} attribute of the pickler object and the
\member{persistent_load} attribute of the unpickler object.

To pickle objects that have an external persistent id, the pickler
must have a custom \function{persistent_id()} method that takes an
object as an argument and returns either \code{None} or the persistent
id for that object. When \code{None} is returned, the pickler simply
pickles the object as normal. When a persistent id string is
returned, the pickler will pickle that string, along with a marker
so that the unpickler will recognize the string as a persistent id.

To unpickle external objects, the unpickler must have a custom
\function{persistent_load()} function that takes a persistent id
string and returns the referenced object.

Here's a silly example that \emph{might} shed more light:

\begin{verbatim}
import pickle
from cStringIO import StringIO

src = StringIO()
p = pickle.Pickler(src)

# Objects with an 'x' attribute are stored by persistent id only.
def persistent_id(obj):
    if hasattr(obj, 'x'):
        return 'the value %d' % obj.x
    else:
        return None

p.persistent_id = persistent_id

class Integer:
    def __init__(self, x):
        self.x = x
    def __str__(self):
        return 'My name is integer %d' % self.x

i = Integer(7)
print i
p.dump(i)

datastream = src.getvalue()
print repr(datastream)
dst = StringIO(datastream)

up = pickle.Unpickler(dst)

class FancyInteger(Integer):
    def __str__(self):
        return 'I am the integer %d' % self.x

# Resolve a persistent id back into an object of our choosing.
def persistent_load(persid):
    if persid.startswith('the value '):
        value = int(persid.split()[2])
        return FancyInteger(value)
    else:
        raise pickle.UnpicklingError('Invalid persistent id')

up.persistent_load = persistent_load

j = up.load()
print j
\end{verbatim}

In the \module{cPickle} module, the unpickler's
\member{persistent_load} attribute can also be set to a Python
list, in which case, when the unpickler reaches a persistent id, the
persistent id string will simply be appended to this list. This
functionality exists so that a pickle data stream can be ``sniffed''
for object references without actually instantiating all the objects
in a pickle\footnote{We'll leave you with the image of Guido and Jim
sitting around sniffing pickles in their living rooms.}. Setting
\member{persistent_load} to a list is usually used in conjunction with
the \method{noload()} method on the Unpickler.

% BAW: Both pickle and cPickle support something called
% inst_persistent_id() which appears to give unknown types a second
% shot at producing a persistent id. Since Jim Fulton can't remember
% why it was added or what it's for, I'm leaving it undocumented.
567
\subsection{Security \label{pickle-sec}}

Most of the security issues surrounding the \module{pickle} and
\module{cPickle} modules involve unpickling. There are no known
security vulnerabilities
related to pickling because you (the programmer) control the objects
that \module{pickle} will interact with, and all it produces is a
string.

However, for unpickling, it is \strong{never} a good idea to unpickle
an untrusted string whose origins are dubious, for example, strings
read from a socket. This is because unpickling can create unexpected
objects and even potentially run methods of those objects, such as
their class constructor or destructor\footnote{A special note of
caution is worth raising about the \refmodule{Cookie}
module. By default, the \class{Cookie.Cookie} class is an alias for
the \class{Cookie.SmartCookie} class, which ``helpfully'' attempts to
unpickle any cookie data string it is passed. This is a huge security
hole because cookie data typically comes from an untrusted source.
You should either explicitly use the \class{Cookie.SimpleCookie} class
--- which doesn't attempt to unpickle its string --- or you should
implement the defensive programming steps described later on in this
section.}.

You can defend against this by customizing your unpickler so that you
can control exactly what gets unpickled and what gets called.
Unfortunately, exactly how you do this is different depending on
whether you're using \module{pickle} or \module{cPickle}.

One common feature that both modules implement is the
\member{__safe_for_unpickling__} attribute. Before calling a callable
which is not a class, the unpickler will check to make sure that the
callable has either been registered as a safe callable via the
\refmodule[copyreg]{copy_reg} module, or that it has an
attribute \member{__safe_for_unpickling__} with a true value. This
prevents the unpickling environment from being tricked into doing
evil things like calling \code{os.unlink()} with an arbitrary file name.
See section~\ref{pickle-protocol} for more details.

For safely unpickling class instances, you need to control exactly
which classes will get created. Be aware that a class's constructor
could be called (if the pickler found a \method{__getinitargs__()}
method) and the class's destructor (i.e. its \method{__del__()} method)
might get called when the object is garbage collected. Depending on
the class, it isn't very hard to trick either method into doing bad
things, such as removing a file. The way to
control the classes that are safe to instantiate differs in
\module{pickle} and \module{cPickle}\footnote{A word of caution: the
mechanisms described here use internal attributes and methods, which
are subject to change in future versions of Python. We intend to
someday provide a common interface for controlling this behavior,
which will work in either \module{pickle} or \module{cPickle}.}.

In the \module{pickle} module, you need to derive a subclass from
\class{Unpickler}, overriding the \method{load_global()}
method. \method{load_global()} should read two lines from the pickle
data stream, where the first line will be the name of the module
containing the class and the second line will be the name of the
instance's class. It then looks up the class, possibly importing the
module and digging out the attribute, and appends what it finds to
the unpickler's stack. Later on, this class will be assigned to the
\member{__class__} attribute of an empty class, as a way of magically
creating an instance without calling its class's \method{__init__()}.
Your job (should you choose to accept it) would be to have
\method{load_global()} push onto the unpickler's stack a known safe
version of any class you deem safe to unpickle. It is up to you to
produce such a class. Or you could raise an error if you want to
disallow all unpickling of instances. If this sounds like a hack,
you're right. UTSL.

Things are a little cleaner with \module{cPickle}, but not by much.
To control what gets unpickled, you can set the unpickler's
\member{find_global} attribute to a function or \code{None}. If it is
\code{None} then any attempts to unpickle instances will raise an
\exception{UnpicklingError}. If it is a function,
then it should accept a module name and a class name, and return the
corresponding class object. It is responsible for looking up the
class, again performing any necessary imports, and it may raise an
error to prevent instances of the class from being unpickled.
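
The same ``module name and class name in, class object out'' contract
can be sketched with an allow-list. This is a minimal sketch, not the
mechanism described above: it assumes an interpreter whose
\class{pickle.Unpickler} exposes a \method{find_class()} hook taking a
module name and a global name (as current Python releases do), in place
of \method{load_global()} or \member{find_global}. The names
\code{ALLOWED}, \class{RestrictedUnpickler} and
\function{restricted_loads()} are hypothetical.

```python
import datetime
import io
import pickle

# Hypothetical allow-list of (module name, global name) pairs.
ALLOWED = {("datetime", "date")}

class RestrictedUnpickler(pickle.Unpickler):
    """Sketch: refuse every global that is not explicitly allowed."""
    def find_class(self, module, name):
        if (module, name) not in ALLOWED:
            raise pickle.UnpicklingError(
                "global %s.%s is forbidden" % (module, name))
        # Delegate the actual import and attribute lookup to the base class.
        return pickle.Unpickler.find_class(self, module, name)

def restricted_loads(data):
    """Unpickle a byte string, allowing only the globals in ALLOWED."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

# An allowed class round-trips normally.
day = restricted_loads(pickle.dumps(datetime.date(2001, 11, 15)))

# A stream that names any other callable is rejected before it is called.
try:
    restricted_loads(b"cos\ngetcwd\n(tR.")
    rejected = False
except pickle.UnpicklingError:
    rejected = True
```

The check happens when the global is looked up, so a forbidden callable
is never imported or invoked. An allow-list is used rather than a
deny-list because the set of dangerous callables is open-ended.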

The moral of the story is that you should be really careful about the
source of the strings your application unpickles.

\subsection{Example \label{pickle-example}}

Here's a simple example of how to modify pickling behavior for a
class. The \class{TextReader} class opens a text file, and returns
the line number and line contents each time its \method{readline()}
method is called. If a \class{TextReader} instance is pickled, all
attributes \emph{except} the file object member are saved. When the
instance is unpickled, the file is reopened, and reading resumes from
the last location. The \method{__setstate__()} and
\method{__getstate__()} methods are used to implement this behavior.

\begin{verbatim}
class TextReader:
    """Print and number lines in a text file."""
    def __init__(self, file):
        self.file = file
        self.fh = open(file)
        self.lineno = 0

    def readline(self):
        self.lineno = self.lineno + 1
        line = self.fh.readline()
        if not line:
            return None
        if line.endswith("\n"):
            line = line[:-1]
        return "%d: %s" % (self.lineno, line)

    def __getstate__(self):
        odict = self.__dict__.copy() # copy the dict since we change it
        del odict['fh']              # remove filehandle entry
        return odict

    def __setstate__(self, dict):
        fh = open(dict['file'])      # reopen file
        count = dict['lineno']       # read from file...
        while count:                 # until line count is restored
            fh.readline()
            count = count - 1
        self.__dict__.update(dict)   # update attributes
        self.fh = fh                 # save the file object
\end{verbatim}

A sample usage might be something like this:

\begin{verbatim}
>>> import TextReader
>>> obj = TextReader.TextReader("TextReader.py")
>>> obj.readline()
'1: #!/usr/local/bin/python'
>>> # (more invocations of obj.readline() here)
... obj.readline()
'7: class TextReader:'
>>> import pickle
>>> pickle.dump(obj,open('save.p','w'))
\end{verbatim}

If you want to see that \refmodule{pickle} works across Python
processes, start another Python session before continuing. What
follows can happen from either the same process or a new process.

\begin{verbatim}
>>> import pickle
>>> reader = pickle.load(open('save.p'))
>>> reader.readline()
'8: "Print and number lines in a text file."'
\end{verbatim}


\begin{seealso}
  \seemodule[copyreg]{copy_reg}{Pickle interface constructor
                                registration for extension types.}

  \seemodule{shelve}{Indexed databases of objects; uses \module{pickle}.}

  \seemodule{copy}{Shallow and deep object copying.}

  \seemodule{marshal}{High-performance serialization of built-in types.}
\end{seealso}


\section{\module{cPickle} --- A faster \module{pickle}}

\declaremodule{builtin}{cPickle}
\modulesynopsis{Faster version of \refmodule{pickle}, but not subclassable.}
\moduleauthor{Jim Fulton}{jfulton@digicool.com}
\sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}

The \module{cPickle} module supports serialization and
de-serialization of Python objects, providing an interface and
functionality nearly identical to the
\refmodule{pickle}\refstmodindex{pickle} module. There are several
differences, the most important being performance and subclassability.

First, \module{cPickle} can be up to 1000 times faster than
\module{pickle} because the former is implemented in C. Second, in
the \module{cPickle} module the callables \function{Pickler()} and
\function{Unpickler()} are functions, not classes. This means that
you cannot use them to derive custom pickling and unpickling
subclasses. Most applications have no need for this functionality and
should benefit from the greatly improved performance of the
\module{cPickle} module.

The pickle data streams produced by \module{pickle} and
\module{cPickle} are compatible, so it is possible to use
\module{pickle} and \module{cPickle} interchangeably with existing
pickles\footnote{Since the pickle data format is actually a tiny
stack-oriented programming language, and some freedom is taken in the
encodings of certain objects, it is possible that the two modules
produce different data streams for the same input objects. However it
is guaranteed that they will always be able to read each other's
data streams.}.
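
Because either module can read the other's data streams, code is often
written to prefer \module{cPickle} and fall back to \module{pickle}
when the C version is unavailable. The following is a minimal sketch
of that idiom; it is a common convention, not something this section
prescribes, and the sample data is made up for illustration.

```python
try:
    import cPickle as pickle   # faster C implementation, where it exists
except ImportError:
    import pickle              # pure-Python module with the same interface

# Hypothetical sample data; any picklable object works the same way.
data = {"name": "spam", "values": [1, 2, 3]}

stream = pickle.dumps(data)     # serialize to a byte string
restored = pickle.loads(stream) # rebuild an equal object from the stream
```

The rest of the program refers only to the name \code{pickle}, so it
does not need to know which implementation was imported.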

There are additional minor differences in API between \module{cPickle}
and \module{pickle}; however, for most applications they are
interchangeable. More documentation is provided in the
\module{pickle} module documentation, which
includes a list of the documented differences.

