blob: 92a79897cebf91d7a720b4ffe406454ca683c386 [file] [log] [blame]
Barry Warsawf595fd92001-11-15 23:39:07 +00001\section{\module{pickle} --- Python object serialization}
Fred Drakeb91e9341998-07-23 17:59:49 +00002
Fred Drakeffbe6871999-04-22 21:23:22 +00003\declaremodule{standard}{pickle}
Fred Drakeb91e9341998-07-23 17:59:49 +00004\modulesynopsis{Convert Python objects to streams of bytes and back.}
Fred Drake38e5d272000-04-03 20:13:55 +00005% Substantial improvements by Jim Kerr <jbkerr@sr.hp.com>.
Barry Warsawf595fd92001-11-15 23:39:07 +00006% Rewritten by Barry Warsaw <barry@zope.com>
Fred Drakeb91e9341998-07-23 17:59:49 +00007
Thomas Woutersf8316632000-07-16 19:01:10 +00008\index{persistence}
Guido van Rossumd1883581995-02-15 15:53:08 +00009\indexii{persistent}{objects}
10\indexii{serializing}{objects}
11\indexii{marshalling}{objects}
12\indexii{flattening}{objects}
13\indexii{pickling}{objects}
14
Barry Warsawf595fd92001-11-15 23:39:07 +000015The \module{pickle} module implements a fundamental, but powerful
16algorithm for serializing and de-serializing a Python object
17structure. ``Pickling'' is the process whereby a Python object
18hierarchy is converted into a byte stream, and ``unpickling'' is the
19inverse operation, whereby a byte stream is converted back into an
20object hierarchy. Pickling (and unpickling) is alternatively known as
Fred Drake2744f432001-11-26 21:30:36 +000021``serialization'', ``marshalling,''\footnote{Don't confuse this with
22the \refmodule{marshal} module} or ``flattening'',
Barry Warsawf595fd92001-11-15 23:39:07 +000023however the preferred term used here is ``pickling'' and
24``unpickling'' to avoid confusing.
Guido van Rossum470be141995-03-17 16:07:09 +000025
Barry Warsawf595fd92001-11-15 23:39:07 +000026This documentation describes both the \module{pickle} module and the
Fred Drake2744f432001-11-26 21:30:36 +000027\refmodule{cPickle} module.
Fred Drakeffbe6871999-04-22 21:23:22 +000028
Barry Warsawf595fd92001-11-15 23:39:07 +000029\subsection{Relationship to other Python modules}
Guido van Rossumd1883581995-02-15 15:53:08 +000030
Barry Warsawf595fd92001-11-15 23:39:07 +000031The \module{pickle} module has an optimized cousin called the
32\module{cPickle} module. As its name implies, \module{cPickle} is
33written in C, so it can be up to 1000 times faster than
34\module{pickle}. However it does not support subclassing of the
35\function{Pickler()} and \function{Unpickler()} classes, because in
36\module{cPickle} these are functions, not classes. Most applications
37have no need for this functionality, and can benefit from the improved
38performance of \module{cPickle}. Other than that, the interfaces of
39the two modules are nearly identical; the common interface is
40described in this manual and differences are pointed out where
41necessary. In the following discussions, we use the term ``pickle''
42to collectively describe the \module{pickle} and
43\module{cPickle} modules.
Guido van Rossum736fe5e1997-12-09 20:45:08 +000044
Barry Warsawf595fd92001-11-15 23:39:07 +000045The data streams the two modules produce are guaranteed to be
46interchangeable.
47
48Python has a more primitive serialization module called
Fred Drake2744f432001-11-26 21:30:36 +000049\refmodule{marshal}, but in general
Barry Warsawf595fd92001-11-15 23:39:07 +000050\module{pickle} should always be the preferred way to serialize Python
51objects. \module{marshal} exists primarily to support Python's
52\file{.pyc} files.
53
54The \module{pickle} module differs from \refmodule{marshal} several
55significant ways:
Guido van Rossumd1883581995-02-15 15:53:08 +000056
57\begin{itemize}
58
Barry Warsawf595fd92001-11-15 23:39:07 +000059\item The \module{pickle} module keeps track of the objects it has
60 already serialized, so that later references to the same object
61 won't be serialized again. \module{marshal} doesn't do this.
Guido van Rossumd1883581995-02-15 15:53:08 +000062
Barry Warsawf595fd92001-11-15 23:39:07 +000063 This has implications both for recursive objects and object
64 sharing. Recursive objects are objects that contain references
65 to themselves. These are not handled by marshal, and in fact,
66 attempting to marshal recursive objects will crash your Python
67 interpreter. Object sharing happens when there are multiple
68 references to the same object in different places in the object
69 hierarchy being serialized. \module{pickle} stores such objects
70 only once, and ensures that all other references point to the
71 master copy. Shared objects remain shared, which can be very
72 important for mutable objects.
Guido van Rossumd1883581995-02-15 15:53:08 +000073
Barry Warsawf595fd92001-11-15 23:39:07 +000074\item \module{marshal} cannot be used to serialize user-defined
75 classes and their instances. \module{pickle} can save and
76 restore class instances transparently, however the class
77 definition must be importable and live in the same module as
78 when the object was stored.
79
80\item The \module{marshal} serialization format is not guaranteed to
81 be portable across Python versions. Because its primary job in
82 life is to support \file{.pyc} files, the Python implementers
83 reserve the right to change the serialization format in
84 non-backwards compatible ways should the need arise. The
85 \module{pickle} serialization format is guaranteed to be
86 backwards compatible across Python releases.
87
88\item The \module{pickle} module doesn't handle code objects, which
89 the \module{marshal} module does. This avoids the possibility
90 of smuggling Trojan horses into a program through the
91 \module{pickle} module\footnote{This doesn't necessarily imply
92 that \module{pickle} is inherently secure. See
93 section~\ref{pickle-sec} for a more detailed discussion on
94 \module{pickle} module security. Besides, it's possible that
95 \module{pickle} will eventually support serializing code
96 objects.}.
Guido van Rossumd1883581995-02-15 15:53:08 +000097
98\end{itemize}
99
Barry Warsawf595fd92001-11-15 23:39:07 +0000100Note that serialization is a more primitive notion than persistence;
101although
102\module{pickle} reads and writes file objects, it does not handle the
103issue of naming persistent objects, nor the (even more complicated)
104issue of concurrent access to persistent objects. The \module{pickle}
105module can transform a complex object into a byte stream and it can
106transform the byte stream into an object with the same internal
107structure. Perhaps the most obvious thing to do with these byte
108streams is to write them onto a file, but it is also conceivable to
109send them across a network or store them in a database. The module
110\refmodule{shelve} provides a simple interface
111to pickle and unpickle objects on DBM-style database files.
112
113\subsection{Data stream format}
114
Fred Drake9b28fe21998-04-04 06:20:28 +0000115The data format used by \module{pickle} is Python-specific. This has
Guido van Rossumd1883581995-02-15 15:53:08 +0000116the advantage that there are no restrictions imposed by external
Barry Warsawf595fd92001-11-15 23:39:07 +0000117standards such as XDR\index{XDR}\index{External Data Representation}
118(which can't represent pointer sharing); however it means that
119non-Python programs may not be able to reconstruct pickled Python
120objects.
Guido van Rossumd1883581995-02-15 15:53:08 +0000121
Fred Drake9b28fe21998-04-04 06:20:28 +0000122By default, the \module{pickle} data format uses a printable \ASCII{}
Guido van Rossum736fe5e1997-12-09 20:45:08 +0000123representation. This is slightly more voluminous than a binary
124representation. The big advantage of using printable \ASCII{} (and of
Fred Drake9b28fe21998-04-04 06:20:28 +0000125some other characteristics of \module{pickle}'s representation) is that
Guido van Rossum736fe5e1997-12-09 20:45:08 +0000126for debugging or recovery purposes it is possible for a human to read
127the pickled file with a standard text editor.
128
Neal Norwitz12d31e22003-02-13 03:12:48 +0000129There are currently 3 different protocols which can be used for pickling.
130
131\begin{itemize}
132
133\item Protocol version 0 is the original ASCII protocol and is backwards
134compatible with earlier versions of Python.
135
136\item Protocol version 1 is the old binary format which is also compatible
137with earlier versions of Python.
138
139\item Protocol version 2 was introduced in Python 2.3. It provides
140much more efficient pickling of new-style classes.
141
142\end{itemize}
143
144Refer to PEP 307 for more information.
145
146If a \var{protocol} is not specified, protocol 0 is used.
Neal Norwitzd08baa92003-02-21 00:26:33 +0000147If \var{protocol} is specified as a negative value
148or \constant{HIGHEST_PROTOCOL},
149the highest protocol version available will be used.
Neal Norwitz12d31e22003-02-13 03:12:48 +0000150
151\versionchanged[The \var{bin} parameter is deprecated and only provided
152for backwards compatibility. You should use the \var{protocol}
153parameter instead]{2.3}
154
Guido van Rossum736fe5e1997-12-09 20:45:08 +0000155A binary format, which is slightly more efficient, can be chosen by
Barry Warsawf595fd92001-11-15 23:39:07 +0000156specifying a true value for the \var{bin} argument to the
Fred Drake9b28fe21998-04-04 06:20:28 +0000157\class{Pickler} constructor or the \function{dump()} and \function{dumps()}
Neal Norwitz12d31e22003-02-13 03:12:48 +0000158functions. A \var{protocol} version >= 1 implies use of a binary format.
Guido van Rossumd1883581995-02-15 15:53:08 +0000159
Barry Warsawf595fd92001-11-15 23:39:07 +0000160\subsection{Usage}
Guido van Rossumd1883581995-02-15 15:53:08 +0000161
Barry Warsawf595fd92001-11-15 23:39:07 +0000162To serialize an object hierarchy, you first create a pickler, then you
163call the pickler's \method{dump()} method. To de-serialize a data
164stream, you first create an unpickler, then you call the unpickler's
165\method{load()} method. The \module{pickle} module provides the
Neal Norwitzd08baa92003-02-21 00:26:33 +0000166following constant:
167
168\begin{datadesc}{HIGHEST_PROTOCOL}
169The highest protocol version available. This value can be passed
170as a \var{protocol} value.
171\end{datadesc}
172
173The \module{pickle} module provides the
Barry Warsawf595fd92001-11-15 23:39:07 +0000174following functions to make this process more convenient:
Guido van Rossumd1883581995-02-15 15:53:08 +0000175
Neal Norwitz12d31e22003-02-13 03:12:48 +0000176\begin{funcdesc}{dump}{object, file\optional{, protocol\optional{, bin}}}
Barry Warsawf595fd92001-11-15 23:39:07 +0000177Write a pickled representation of \var{object} to the open file object
178\var{file}. This is equivalent to
Neal Norwitz12d31e22003-02-13 03:12:48 +0000179\code{Pickler(\var{file}, \var{protocol}, \var{bin}).dump(\var{object})}.
180
181If the \var{protocol} parameter is ommitted, protocol 0 is used.
Neal Norwitzd08baa92003-02-21 00:26:33 +0000182If \var{protocol} is specified as a negative value
183or \constant{HIGHEST_PROTOCOL},
Neal Norwitz12d31e22003-02-13 03:12:48 +0000184the highest protocol version will be used.
185
186\versionchanged[The \var{protocol} parameter was added.
187The \var{bin} parameter is deprecated and only provided
188for backwards compatibility. You should use the \var{protocol}
189parameter instead]{2.3}
190
Barry Warsawf595fd92001-11-15 23:39:07 +0000191If the optional \var{bin} argument is true, the binary pickle format
192is used; otherwise the (less efficient) text pickle format is used
193(for backwards compatibility, this is the default).
Guido van Rossumd1883581995-02-15 15:53:08 +0000194
Barry Warsawf595fd92001-11-15 23:39:07 +0000195\var{file} must have a \method{write()} method that accepts a single
196string argument. It can thus be a file object opened for writing, a
197\refmodule{StringIO} object, or any other custom
198object that meets this interface.
199\end{funcdesc}
Guido van Rossumd1883581995-02-15 15:53:08 +0000200
Barry Warsawf595fd92001-11-15 23:39:07 +0000201\begin{funcdesc}{load}{file}
202Read a string from the open file object \var{file} and interpret it as
203a pickle data stream, reconstructing and returning the original object
204hierarchy. This is equivalent to \code{Unpickler(\var{file}).load()}.
Guido van Rossum470be141995-03-17 16:07:09 +0000205
Barry Warsawf595fd92001-11-15 23:39:07 +0000206\var{file} must have two methods, a \method{read()} method that takes
207an integer argument, and a \method{readline()} method that requires no
208arguments. Both methods should return a string. Thus \var{file} can
209be a file object opened for reading, a
210\module{StringIO} object, or any other custom
211object that meets this interface.
Guido van Rossum736fe5e1997-12-09 20:45:08 +0000212
Barry Warsawf595fd92001-11-15 23:39:07 +0000213This function automatically determines whether the data stream was
214written in binary mode or not.
215\end{funcdesc}
Guido van Rossumd1883581995-02-15 15:53:08 +0000216
Neal Norwitz12d31e22003-02-13 03:12:48 +0000217\begin{funcdesc}{dumps}{object\optional{, protocol\optional{, bin}}}
Barry Warsawf595fd92001-11-15 23:39:07 +0000218Return the pickled representation of the object as a string, instead
Neal Norwitz12d31e22003-02-13 03:12:48 +0000219of writing it to a file.
220
221If the \var{protocol} parameter is ommitted, protocol 0 is used.
Neal Norwitzd08baa92003-02-21 00:26:33 +0000222If \var{protocol} is specified as a negative value
223or \constant{HIGHEST_PROTOCOL},
Neal Norwitz12d31e22003-02-13 03:12:48 +0000224the highest protocol version will be used.
225
226\versionchanged[The \var{protocol} parameter was added.
227The \var{bin} parameter is deprecated and only provided
228for backwards compatibility. You should use the \var{protocol}
229parameter instead]{2.3}
230
231If the optional \var{bin} argument is
Barry Warsawf595fd92001-11-15 23:39:07 +0000232true, the binary pickle format is used; otherwise the (less efficient)
233text pickle format is used (this is the default).
234\end{funcdesc}
Guido van Rossumd1883581995-02-15 15:53:08 +0000235
Barry Warsawf595fd92001-11-15 23:39:07 +0000236\begin{funcdesc}{loads}{string}
237Read a pickled object hierarchy from a string. Characters in the
238string past the pickled object's representation are ignored.
239\end{funcdesc}
Guido van Rossumd1883581995-02-15 15:53:08 +0000240
Barry Warsawf595fd92001-11-15 23:39:07 +0000241The \module{pickle} module also defines three exceptions:
Guido van Rossum470be141995-03-17 16:07:09 +0000242
Barry Warsawf595fd92001-11-15 23:39:07 +0000243\begin{excdesc}{PickleError}
244A common base class for the other exceptions defined below. This
245inherits from \exception{Exception}.
246\end{excdesc}
Guido van Rossum470be141995-03-17 16:07:09 +0000247
Barry Warsawf595fd92001-11-15 23:39:07 +0000248\begin{excdesc}{PicklingError}
249This exception is raised when an unpicklable object is passed to
250the \method{dump()} method.
251\end{excdesc}
Guido van Rossumd1883581995-02-15 15:53:08 +0000252
Barry Warsawf595fd92001-11-15 23:39:07 +0000253\begin{excdesc}{UnpicklingError}
254This exception is raised when there is a problem unpickling an object,
255such as a security violation. Note that other exceptions may also be
256raised during unpickling, including (but not necessarily limited to)
Neil Schemenauer79f18132002-03-22 22:16:03 +0000257\exception{AttributeError}, \exception{EOFError},
258\exception{ImportError}, and \exception{IndexError}.
Barry Warsawf595fd92001-11-15 23:39:07 +0000259\end{excdesc}
260
261The \module{pickle} module also exports two callables\footnote{In the
262\module{pickle} module these callables are classes, which you could
263subclass to customize the behavior. However, in the \module{cPickle}
264modules these callables are factory functions and so cannot be
265subclassed. One of the common reasons to subclass is to control what
266objects can actually be unpickled. See section~\ref{pickle-sec} for
267more details on security concerns.}, \class{Pickler} and
268\class{Unpickler}:
269
Neal Norwitz12d31e22003-02-13 03:12:48 +0000270\begin{classdesc}{Pickler}{file\optional{, protocol\optional{, bin}}}
Barry Warsawf595fd92001-11-15 23:39:07 +0000271This takes a file-like object to which it will write a pickle data
Neal Norwitz12d31e22003-02-13 03:12:48 +0000272stream.
273
274If the \var{protocol} parameter is ommitted, protocol 0 is used.
275If \var{protocol} is specified as a negative value,
276the highest protocol version will be used.
277
278\versionchanged[The \var{bin} parameter is deprecated and only provided
279for backwards compatibility. You should use the \var{protocol}
280parameter instead]{2.3}
281
282Optional \var{bin} if true, tells the pickler to use the more
Barry Warsawf595fd92001-11-15 23:39:07 +0000283efficient binary pickle format, otherwise the \ASCII{} format is used
284(this is the default).
285
286\var{file} must have a \method{write()} method that accepts a single
287string argument. It can thus be an open file object, a
288\module{StringIO} object, or any other custom
289object that meets this interface.
290\end{classdesc}
291
292\class{Pickler} objects define one (or two) public methods:
293
294\begin{methoddesc}[Pickler]{dump}{object}
295Write a pickled representation of \var{object} to the open file object
296given in the constructor. Either the binary or \ASCII{} format will
297be used, depending on the value of the \var{bin} flag passed to the
298constructor.
299\end{methoddesc}
300
301\begin{methoddesc}[Pickler]{clear_memo}{}
302Clears the pickler's ``memo''. The memo is the data structure that
303remembers which objects the pickler has already seen, so that shared
304or recursive objects pickled by reference and not by value. This
305method is useful when re-using picklers.
306
Fred Drake7f781c92002-05-01 20:33:53 +0000307\begin{notice}
308Prior to Python 2.3, \method{clear_memo()} was only available on the
309picklers created by \refmodule{cPickle}. In the \module{pickle} module,
310picklers have an instance variable called \member{memo} which is a
311Python dictionary. So to clear the memo for a \module{pickle} module
Barry Warsawf595fd92001-11-15 23:39:07 +0000312pickler, you could do the following:
Guido van Rossumd1883581995-02-15 15:53:08 +0000313
Fred Drake19479911998-02-13 06:58:54 +0000314\begin{verbatim}
Barry Warsawf595fd92001-11-15 23:39:07 +0000315mypickler.memo.clear()
Fred Drake19479911998-02-13 06:58:54 +0000316\end{verbatim}
Fred Drake7f781c92002-05-01 20:33:53 +0000317
318Code that does not need to support older versions of Python should
319simply use \method{clear_memo()}.
320\end{notice}
Barry Warsawf595fd92001-11-15 23:39:07 +0000321\end{methoddesc}
Fred Drake9b28fe21998-04-04 06:20:28 +0000322
Barry Warsawf595fd92001-11-15 23:39:07 +0000323It is possible to make multiple calls to the \method{dump()} method of
324the same \class{Pickler} instance. These must then be matched to the
325same number of calls to the \method{load()} method of the
326corresponding \class{Unpickler} instance. If the same object is
327pickled by multiple \method{dump()} calls, the \method{load()} will
328all yield references to the same object\footnote{\emph{Warning}: this
329is intended for pickling multiple objects without intervening
330modifications to the objects or their parts. If you modify an object
331and then pickle it again using the same \class{Pickler} instance, the
332object is not pickled again --- a reference to it is pickled and the
333\class{Unpickler} will return the old value, not the modified one.
334There are two problems here: (1) detecting changes, and (2)
335marshalling a minimal set of changes. Garbage Collection may also
336become a problem here.}.
Guido van Rossum470be141995-03-17 16:07:09 +0000337
Barry Warsawf595fd92001-11-15 23:39:07 +0000338\class{Unpickler} objects are defined as:
Fred Drake9b28fe21998-04-04 06:20:28 +0000339
Barry Warsawf595fd92001-11-15 23:39:07 +0000340\begin{classdesc}{Unpickler}{file}
341This takes a file-like object from which it will read a pickle data
342stream. This class automatically determines whether the data stream
343was written in binary mode or not, so it does not need a flag as in
344the \class{Pickler} factory.
Guido van Rossumd1883581995-02-15 15:53:08 +0000345
Barry Warsawf595fd92001-11-15 23:39:07 +0000346\var{file} must have two methods, a \method{read()} method that takes
347an integer argument, and a \method{readline()} method that requires no
348arguments. Both methods should return a string. Thus \var{file} can
349be a file object opened for reading, a
350\module{StringIO} object, or any other custom
351object that meets this interface.
352\end{classdesc}
Fred Drake9b28fe21998-04-04 06:20:28 +0000353
Barry Warsawf595fd92001-11-15 23:39:07 +0000354\class{Unpickler} objects have one (or two) public methods:
Guido van Rossum470be141995-03-17 16:07:09 +0000355
Barry Warsawf595fd92001-11-15 23:39:07 +0000356\begin{methoddesc}[Unpickler]{load}{}
357Read a pickled object representation from the open file object given
358in the constructor, and return the reconstituted object hierarchy
359specified therein.
360\end{methoddesc}
Fred Drake9b28fe21998-04-04 06:20:28 +0000361
Barry Warsawf595fd92001-11-15 23:39:07 +0000362\begin{methoddesc}[Unpickler]{noload}{}
363This is just like \method{load()} except that it doesn't actually
364create any objects. This is useful primarily for finding what's
365called ``persistent ids'' that may be referenced in a pickle data
366stream. See section~\ref{pickle-protocol} below for more details.
Guido van Rossumd1883581995-02-15 15:53:08 +0000367
Barry Warsawf595fd92001-11-15 23:39:07 +0000368\strong{Note:} the \method{noload()} method is currently only
369available on \class{Unpickler} objects created with the
370\module{cPickle} module. \module{pickle} module \class{Unpickler}s do
371not have the \method{noload()} method.
372\end{methoddesc}
373
374\subsection{What can be pickled and unpickled?}
Guido van Rossum736fe5e1997-12-09 20:45:08 +0000375
Guido van Rossumd1883581995-02-15 15:53:08 +0000376The following types can be pickled:
Fred Drake41796911999-07-02 14:25:37 +0000377
Guido van Rossumd1883581995-02-15 15:53:08 +0000378\begin{itemize}
379
Raymond Hettingeracb45d72002-08-05 03:55:36 +0000380\item \code{None}, \code{True}, and \code{False}
Guido van Rossumd1883581995-02-15 15:53:08 +0000381
Barry Warsawf595fd92001-11-15 23:39:07 +0000382\item integers, long integers, floating point numbers, complex numbers
Guido van Rossumd1883581995-02-15 15:53:08 +0000383
Fred Drake56ced2a2000-04-06 15:04:30 +0000384\item normal and Unicode strings
Guido van Rossumd1883581995-02-15 15:53:08 +0000385
Barry Warsawf595fd92001-11-15 23:39:07 +0000386\item tuples, lists, and dictionaries containing only picklable objects
Guido van Rossumd1883581995-02-15 15:53:08 +0000387
Barry Warsawf595fd92001-11-15 23:39:07 +0000388\item functions defined at the top level of a module
Fred Drake38e5d272000-04-03 20:13:55 +0000389
Barry Warsawf595fd92001-11-15 23:39:07 +0000390\item built-in functions defined at the top level of a module
Fred Drake38e5d272000-04-03 20:13:55 +0000391
Barry Warsawf595fd92001-11-15 23:39:07 +0000392\item classes that are defined at the top level of a module
Guido van Rossum470be141995-03-17 16:07:09 +0000393
Fred Drake9b28fe21998-04-04 06:20:28 +0000394\item instances of such classes whose \member{__dict__} or
Barry Warsawf595fd92001-11-15 23:39:07 +0000395\method{__setstate__()} is picklable (see
396section~\ref{pickle-protocol} for details)
Guido van Rossumd1883581995-02-15 15:53:08 +0000397
398\end{itemize}
399
Guido van Rossum470be141995-03-17 16:07:09 +0000400Attempts to pickle unpicklable objects will raise the
Fred Drake9b28fe21998-04-04 06:20:28 +0000401\exception{PicklingError} exception; when this happens, an unspecified
Barry Warsawf595fd92001-11-15 23:39:07 +0000402number of bytes may have already been written to the underlying file.
Guido van Rossumd1883581995-02-15 15:53:08 +0000403
Barry Warsawf595fd92001-11-15 23:39:07 +0000404Note that functions (built-in and user-defined) are pickled by ``fully
405qualified'' name reference, not by value. This means that only the
406function name is pickled, along with the name of module the function
407is defined in. Neither the function's code, nor any of its function
408attributes are pickled. Thus the defining module must be importable
409in the unpickling environment, and the module must contain the named
410object, otherwise an exception will be raised\footnote{The exception
411raised will likely be an \exception{ImportError} or an
412\exception{AttributeError} but it could be something else.}.
Guido van Rossum470be141995-03-17 16:07:09 +0000413
Barry Warsawf595fd92001-11-15 23:39:07 +0000414Similarly, classes are pickled by named reference, so the same
415restrictions in the unpickling environment apply. Note that none of
416the class's code or data is pickled, so in the following example the
417class attribute \code{attr} is not restored in the unpickling
418environment:
Guido van Rossum470be141995-03-17 16:07:09 +0000419
Barry Warsawf595fd92001-11-15 23:39:07 +0000420\begin{verbatim}
421class Foo:
422 attr = 'a class attr'
Guido van Rossum470be141995-03-17 16:07:09 +0000423
Barry Warsawf595fd92001-11-15 23:39:07 +0000424picklestring = pickle.dumps(Foo)
425\end{verbatim}
Guido van Rossum470be141995-03-17 16:07:09 +0000426
Barry Warsawf595fd92001-11-15 23:39:07 +0000427These restrictions are why picklable functions and classes must be
428defined in the top level of a module.
Guido van Rossum470be141995-03-17 16:07:09 +0000429
Barry Warsawf595fd92001-11-15 23:39:07 +0000430Similarly, when class instances are pickled, their class's code and
431data are not pickled along with them. Only the instance data are
432pickled. This is done on purpose, so you can fix bugs in a class or
433add methods to the class and still load objects that were created with
434an earlier version of the class. If you plan to have long-lived
435objects that will see many versions of a class, it may be worthwhile
436to put a version number in the objects so that suitable conversions
437can be made by the class's \method{__setstate__()} method.
Guido van Rossum470be141995-03-17 16:07:09 +0000438
Barry Warsawf595fd92001-11-15 23:39:07 +0000439\subsection{The pickle protocol
440\label{pickle-protocol}}\setindexsubitem{(pickle protocol)}
Fred Drake40748961998-03-06 21:27:14 +0000441
Barry Warsawf595fd92001-11-15 23:39:07 +0000442This section describes the ``pickling protocol'' that defines the
443interface between the pickler/unpickler and the objects that are being
444serialized. This protocol provides a standard way for you to define,
445customize, and control how your objects are serialized and
446de-serialized. The description in this section doesn't cover specific
447customizations that you can employ to make the unpickling environment
448safer from untrusted pickle data streams; see section~\ref{pickle-sec}
449for more details.
Fred Drake40748961998-03-06 21:27:14 +0000450
Barry Warsawf595fd92001-11-15 23:39:07 +0000451\subsubsection{Pickling and unpickling normal class
452 instances\label{pickle-inst}}
Fred Drake9b28fe21998-04-04 06:20:28 +0000453
Barry Warsawf595fd92001-11-15 23:39:07 +0000454When a pickled class instance is unpickled, its \method{__init__()}
455method is normally \emph{not} invoked. If it is desirable that the
456\method{__init__()} method be called on unpickling, a class can define
457a method \method{__getinitargs__()}, which should return a
458\emph{tuple} containing the arguments to be passed to the class
459constructor (i.e. \method{__init__()}). The
460\method{__getinitargs__()} method is called at
461pickle time; the tuple it returns is incorporated in the pickle for
462the instance.
463\withsubitem{(copy protocol)}{\ttindex{__getinitargs__()}}
464\withsubitem{(instance constructor)}{\ttindex{__init__()}}
Fred Drake17e56401998-04-11 20:43:51 +0000465
Barry Warsawf595fd92001-11-15 23:39:07 +0000466\withsubitem{(copy protocol)}{
467 \ttindex{__getstate__()}\ttindex{__setstate__()}}
468\withsubitem{(instance attribute)}{
469 \ttindex{__dict__}}
Fred Drake17e56401998-04-11 20:43:51 +0000470
Barry Warsawf595fd92001-11-15 23:39:07 +0000471Classes can further influence how their instances are pickled; if the
472class defines the method \method{__getstate__()}, it is called and the
473return state is pickled as the contents for the instance, instead of
474the contents of the instance's dictionary. If there is no
475\method{__getstate__()} method, the instance's \member{__dict__} is
476pickled.
Fred Drake9463de21998-04-11 20:05:43 +0000477
Barry Warsawf595fd92001-11-15 23:39:07 +0000478Upon unpickling, if the class also defines the method
479\method{__setstate__()}, it is called with the unpickled
480state\footnote{These methods can also be used to implement copying
481class instances.}. If there is no \method{__setstate__()} method, the
Fred Drakee9cfcef2002-11-27 05:26:46 +0000482pickled state must be a dictionary and its items are assigned to the
Barry Warsawf595fd92001-11-15 23:39:07 +0000483new instance's dictionary. If a class defines both
484\method{__getstate__()} and \method{__setstate__()}, the state object
485needn't be a dictionary and these methods can do what they
Fred Drakee9cfcef2002-11-27 05:26:46 +0000486want.\footnote{This protocol is also used by the shallow and deep
Barry Warsawf595fd92001-11-15 23:39:07 +0000487copying operations defined in the
Fred Drakee9cfcef2002-11-27 05:26:46 +0000488\refmodule{copy} module.}
489
490\begin{notice}[warning]
491 For new-style classes, if \method{__getstate__()} returns a false
492 value, the \method{__setstate__()} method will not be called.
493\end{notice}
494
Barry Warsawf595fd92001-11-15 23:39:07 +0000495
496\subsubsection{Pickling and unpickling extension types}
497
498When the \class{Pickler} encounters an object of a type it knows
499nothing about --- such as an extension type --- it looks in two places
500for a hint of how to pickle it. One alternative is for the object to
501implement a \method{__reduce__()} method. If provided, at pickling
502time \method{__reduce__()} will be called with no arguments, and it
503must return either a string or a tuple.
504
505If a string is returned, it names a global variable whose contents are
506pickled as normal. When a tuple is returned, it must be of length two
507or three, with the following semantics:
508
509\begin{itemize}
510
511\item A callable object, which in the unpickling environment must be
512 either a class, a callable registered as a ``safe constructor''
513 (see below), or it must have an attribute
514 \member{__safe_for_unpickling__} with a true value. Otherwise,
515 an \exception{UnpicklingError} will be raised in the unpickling
516 environment. Note that as usual, the callable itself is pickled
517 by name.
518
519\item A tuple of arguments for the callable object, or \code{None}.
Raymond Hettinger97394bc2002-05-21 17:22:02 +0000520\deprecated{2.3}{Use the tuple of arguments instead}
Barry Warsawf595fd92001-11-15 23:39:07 +0000521
522\item Optionally, the object's state, which will be passed to
523 the object's \method{__setstate__()} method as described in
524 section~\ref{pickle-inst}. If the object has no
525 \method{__setstate__()} method, then, as above, the value must
526 be a dictionary and it will be added to the object's
527 \member{__dict__}.
528
529\end{itemize}
530
531Upon unpickling, the callable will be called (provided that it meets
532the above criteria), passing in the tuple of arguments; it should
Raymond Hettinger97394bc2002-05-21 17:22:02 +0000533return the unpickled object.
534
535If the second item was \code{None}, then instead of calling the
536callable directly, its \method{__basicnew__()} method is called
537without arguments. It should also return the unpickled object.
538
539\deprecated{2.3}{Use the tuple of arguments instead}
Barry Warsawf595fd92001-11-15 23:39:07 +0000540
541An alternative to implementing a \method{__reduce__()} method on the
542object to be pickled, is to register the callable with the
Fred Drake2744f432001-11-26 21:30:36 +0000543\refmodule[copyreg]{copy_reg} module. This module provides a way
Barry Warsawf595fd92001-11-15 23:39:07 +0000544for programs to register ``reduction functions'' and constructors for
545user-defined types. Reduction functions have the same semantics and
546interface as the \method{__reduce__()} method described above, except
547that they are called with a single argument, the object to be pickled.
548
549The registered constructor is deemed a ``safe constructor'' for purposes
550of unpickling as described above.
551
552\subsubsection{Pickling and unpickling external objects}
553
554For the benefit of object persistence, the \module{pickle} module
555supports the notion of a reference to an object outside the pickled
556data stream. Such objects are referenced by a ``persistent id'',
557which is just an arbitrary string of printable \ASCII{} characters.
558The resolution of such names is not defined by the \module{pickle}
559module; it will delegate this resolution to user defined functions on
560the pickler and unpickler\footnote{The actual mechanism for
561associating these user defined functions is slightly different for
562\module{pickle} and \module{cPickle}. The description given here
563works the same for both implementations. Users of the \module{pickle}
564module could also use subclassing to effect the same results,
565overriding the \method{persistent_id()} and \method{persistent_load()}
566methods in the derived classes.}.
567
568To define external persistent id resolution, you need to set the
569\member{persistent_id} attribute of the pickler object and the
570\member{persistent_load} attribute of the unpickler object.
571
572To pickle objects that have an external persistent id, the pickler
573must have a custom \function{persistent_id()} method that takes an
574object as an argument and returns either \code{None} or the persistent
575id for that object. When \code{None} is returned, the pickler simply
576pickles the object as normal. When a persistent id string is
577returned, the pickler will pickle that string, along with a marker
578so that the unpickler will recognize the string as a persistent id.
579
580To unpickle external objects, the unpickler must have a custom
581\function{persistent_load()} function that takes a persistent id
582string and returns the referenced object.
583
584Here's a silly example that \emph{might} shed more light:
585
586\begin{verbatim}
587import pickle
588from cStringIO import StringIO
589
590src = StringIO()
591p = pickle.Pickler(src)
592
593def persistent_id(obj):
594 if hasattr(obj, 'x'):
595 return 'the value %d' % obj.x
596 else:
597 return None
598
599p.persistent_id = persistent_id
600
601class Integer:
602 def __init__(self, x):
603 self.x = x
604 def __str__(self):
605 return 'My name is integer %d' % self.x
606
607i = Integer(7)
608print i
609p.dump(i)
610
611datastream = src.getvalue()
612print repr(datastream)
613dst = StringIO(datastream)
614
615up = pickle.Unpickler(dst)
616
617class FancyInteger(Integer):
618 def __str__(self):
619 return 'I am the integer %d' % self.x
620
621def persistent_load(persid):
622 if persid.startswith('the value '):
623 value = int(persid.split()[2])
624 return FancyInteger(value)
625 else:
626 raise pickle.UnpicklingError, 'Invalid persistent id'
627
628up.persistent_load = persistent_load
629
630j = up.load()
631print j
632\end{verbatim}
633
634In the \module{cPickle} module, the unpickler's
635\member{persistent_load} attribute can also be set to a Python
636list, in which case, when the unpickler reaches a persistent id, the
637persistent id string will simply be appended to this list. This
638functionality exists so that a pickle data stream can be ``sniffed''
639for object references without actually instantiating all the objects
640in a pickle\footnote{We'll leave you with the image of Guido and Jim
641sitting around sniffing pickles in their living rooms.}. Setting
642\member{persistent_load} to a list is usually used in conjunction with
643the \method{noload()} method on the Unpickler.
644
645% BAW: Both pickle and cPickle support something called
646% inst_persistent_id() which appears to give unknown types a second
647% shot at producing a persistent id. Since Jim Fulton can't remember
648% why it was added or what it's for, I'm leaving it undocumented.
649
650\subsection{Security \label{pickle-sec}}
651
652Most of the security issues surrounding the \module{pickle} and
653\module{cPickle} module involve unpickling. There are no known
654security vulnerabilities
655related to pickling because you (the programmer) control the objects
656that \module{pickle} will interact with, and all it produces is a
657string.
658
659However, for unpickling, it is \strong{never} a good idea to unpickle
660an untrusted string whose origins are dubious, for example, strings
661read from a socket. This is because unpickling can create unexpected
662objects and even potentially run methods of those objects, such as
663their class constructor or destructor\footnote{A special note of
664caution is worth raising about the \refmodule{Cookie}
665module. By default, the \class{Cookie.Cookie} class is an alias for
666the \class{Cookie.SmartCookie} class, which ``helpfully'' attempts to
667unpickle any cookie data string it is passed. This is a huge security
668hole because cookie data typically comes from an untrusted source.
669You should either explicitly use the \class{Cookie.SimpleCookie} class
670--- which doesn't attempt to unpickle its string --- or you should
671implement the defensive programming steps described later on in this
672section.}.
673
674You can defend against this by customizing your unpickler so that you
675can control exactly what gets unpickled and what gets called.
676Unfortunately, exactly how you do this is different depending on
677whether you're using \module{pickle} or \module{cPickle}.
678
679One common feature that both modules implement is the
680\member{__safe_for_unpickling__} attribute. Before calling a callable
681which is not a class, the unpickler will check to make sure that the
682callable has either been registered as a safe callable via the
Fred Drake2744f432001-11-26 21:30:36 +0000683\refmodule[copyreg]{copy_reg} module, or that it has an
Barry Warsawf595fd92001-11-15 23:39:07 +0000684attribute \member{__safe_for_unpickling__} with a true value. This
685prevents the unpickling environment from being tricked into doing
686evil things like call \code{os.unlink()} with an arbitrary file name.
687See section~\ref{pickle-protocol} for more details.
688
689For safely unpickling class instances, you need to control exactly
Barry Warsaw69ab5832001-11-18 16:24:01 +0000690which classes will get created. Be aware that a class's constructor
691could be called (if the pickler found a \method{__getinitargs__()}
692method) and the the class's destructor (i.e. its \method{__del__()} method)
693might get called when the object is garbage collected. Depending on
694the class, it isn't very heard to trick either method into doing bad
695things, such as removing a file. The way to
Barry Warsawf595fd92001-11-15 23:39:07 +0000696control the classes that are safe to instantiate differs in
697\module{pickle} and \module{cPickle}\footnote{A word of caution: the
698mechanisms described here use internal attributes and methods, which
699are subject to change in future versions of Python. We intend to
700someday provide a common interface for controlling this behavior,
701which will work in either \module{pickle} or \module{cPickle}.}.
702
703In the \module{pickle} module, you need to derive a subclass from
704\class{Unpickler}, overriding the \method{load_global()}
705method. \method{load_global()} should read two lines from the pickle
706data stream where the first line will the the name of the module
707containing the class and the second line will be the name of the
708instance's class. It then look up the class, possibly importing the
709module and digging out the attribute, then it appends what it finds to
710the unpickler's stack. Later on, this class will be assigned to the
711\member{__class__} attribute of an empty class, as a way of magically
712creating an instance without calling its class's \method{__init__()}.
713You job (should you choose to accept it), would be to have
714\method{load_global()} push onto the unpickler's stack, a known safe
715version of any class you deem safe to unpickle. It is up to you to
716produce such a class. Or you could raise an error if you want to
717disallow all unpickling of instances. If this sounds like a hack,
718you're right. UTSL.
719
720Things are a little cleaner with \module{cPickle}, but not by much.
721To control what gets unpickled, you can set the unpickler's
722\member{find_global} attribute to a function or \code{None}. If it is
723\code{None} then any attempts to unpickle instances will raise an
724\exception{UnpicklingError}. If it is a function,
725then it should accept a module name and a class name, and return the
726corresponding class object. It is responsible for looking up the
727class, again performing any necessary imports, and it may raise an
728error to prevent instances of the class from being unpickled.
729
730The moral of the story is that you should be really careful about the
731source of the strings your application unpickles.
Fred Drake9463de21998-04-11 20:05:43 +0000732
Fred Drake38e5d272000-04-03 20:13:55 +0000733\subsection{Example \label{pickle-example}}
734
735Here's a simple example of how to modify pickling behavior for a
736class. The \class{TextReader} class opens a text file, and returns
737the line number and line contents each time its \method{readline()}
738method is called. If a \class{TextReader} instance is pickled, all
739attributes \emph{except} the file object member are saved. When the
740instance is unpickled, the file is reopened, and reading resumes from
741the last location. The \method{__setstate__()} and
742\method{__getstate__()} methods are used to implement this behavior.
743
744\begin{verbatim}
Fred Drake38e5d272000-04-03 20:13:55 +0000745class TextReader:
Fred Drakec8252802001-09-25 16:29:17 +0000746 """Print and number lines in a text file."""
747 def __init__(self, file):
Fred Drake38e5d272000-04-03 20:13:55 +0000748 self.file = file
Fred Drakec8252802001-09-25 16:29:17 +0000749 self.fh = open(file)
Fred Drake38e5d272000-04-03 20:13:55 +0000750 self.lineno = 0
751
752 def readline(self):
753 self.lineno = self.lineno + 1
754 line = self.fh.readline()
755 if not line:
756 return None
Fred Drakec8252802001-09-25 16:29:17 +0000757 if line.endswith("\n"):
758 line = line[:-1]
759 return "%d: %s" % (self.lineno, line)
Fred Drake38e5d272000-04-03 20:13:55 +0000760
Fred Drake38e5d272000-04-03 20:13:55 +0000761 def __getstate__(self):
Fred Drakec8252802001-09-25 16:29:17 +0000762 odict = self.__dict__.copy() # copy the dict since we change it
763 del odict['fh'] # remove filehandle entry
Fred Drake38e5d272000-04-03 20:13:55 +0000764 return odict
765
Fred Drake38e5d272000-04-03 20:13:55 +0000766 def __setstate__(self,dict):
Fred Drakec8252802001-09-25 16:29:17 +0000767 fh = open(dict['file']) # reopen file
768 count = dict['lineno'] # read from file...
769 while count: # until line count is restored
Fred Drake38e5d272000-04-03 20:13:55 +0000770 fh.readline()
771 count = count - 1
Fred Drakec8252802001-09-25 16:29:17 +0000772 self.__dict__.update(dict) # update attributes
773 self.fh = fh # save the file object
Fred Drake38e5d272000-04-03 20:13:55 +0000774\end{verbatim}
775
776A sample usage might be something like this:
777
778\begin{verbatim}
779>>> import TextReader
780>>> obj = TextReader.TextReader("TextReader.py")
781>>> obj.readline()
782'1: #!/usr/local/bin/python'
783>>> # (more invocations of obj.readline() here)
784... obj.readline()
785'7: class TextReader:'
786>>> import pickle
787>>> pickle.dump(obj,open('save.p','w'))
Fred Drakec8252802001-09-25 16:29:17 +0000788\end{verbatim}
Fred Drake38e5d272000-04-03 20:13:55 +0000789
Fred Drakec8252802001-09-25 16:29:17 +0000790If you want to see that \refmodule{pickle} works across Python
791processes, start another Python session, before continuing. What
792follows can happen from either the same process or a new process.
Fred Drake38e5d272000-04-03 20:13:55 +0000793
Fred Drakec8252802001-09-25 16:29:17 +0000794\begin{verbatim}
Fred Drake38e5d272000-04-03 20:13:55 +0000795>>> import pickle
796>>> reader = pickle.load(open('save.p'))
797>>> reader.readline()
798'8: "Print and number lines in a text file."'
799\end{verbatim}
800
801
Barry Warsawf595fd92001-11-15 23:39:07 +0000802\begin{seealso}
803 \seemodule[copyreg]{copy_reg}{Pickle interface constructor
804 registration for extension types.}
805
806 \seemodule{shelve}{Indexed databases of objects; uses \module{pickle}.}
807
808 \seemodule{copy}{Shallow and deep object copying.}
809
810 \seemodule{marshal}{High-performance serialization of built-in types.}
811\end{seealso}
812
813
814\section{\module{cPickle} --- A faster \module{pickle}}
Fred Drakeffbe6871999-04-22 21:23:22 +0000815
Fred Drakeb91e9341998-07-23 17:59:49 +0000816\declaremodule{builtin}{cPickle}
Fred Drake38e5d272000-04-03 20:13:55 +0000817\modulesynopsis{Faster version of \refmodule{pickle}, but not subclassable.}
Fred Drakeffbe6871999-04-22 21:23:22 +0000818\moduleauthor{Jim Fulton}{jfulton@digicool.com}
819\sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}
Fred Drakeb91e9341998-07-23 17:59:49 +0000820
Barry Warsawf595fd92001-11-15 23:39:07 +0000821The \module{cPickle} module supports serialization and
822de-serialization of Python objects, providing an interface and
823functionality nearly identical to the
824\refmodule{pickle}\refstmodindex{pickle} module. There are several
825differences, the most important being performance and subclassability.
Fred Drake9463de21998-04-11 20:05:43 +0000826
Barry Warsawf595fd92001-11-15 23:39:07 +0000827First, \module{cPickle} can be up to 1000 times faster than
828\module{pickle} because the former is implemented in C. Second, in
829the \module{cPickle} module the callables \function{Pickler()} and
830\function{Unpickler()} are functions, not classes. This means that
831you cannot use them to derive custom pickling and unpickling
832subclasses. Most applications have no need for this functionality and
833should benefit from the greatly improved performance of the
834\module{cPickle} module.
Fred Drake9463de21998-04-11 20:05:43 +0000835
Barry Warsawf595fd92001-11-15 23:39:07 +0000836The pickle data stream produced by \module{pickle} and
837\module{cPickle} are identical, so it is possible to use
838\module{pickle} and \module{cPickle} interchangeably with existing
839pickles\footnote{Since the pickle data format is actually a tiny
840stack-oriented programming language, and some freedom is taken in the
841encodings of certain objects, it is possible that the two modules
842produce different data streams for the same input objects. However it
843is guaranteed that they will always be able to read each other's
844data streams.}.
Guido van Rossumcf3ce921999-01-06 23:34:39 +0000845
Barry Warsawf595fd92001-11-15 23:39:07 +0000846There are additional minor differences in API between \module{cPickle}
847and \module{pickle}, however for most applications, they are
848interchangable. More documentation is provided in the
849\module{pickle} module documentation, which
850includes a list of the documented differences.
851
852