\section{\module{pickle} --- Python object serialization}

\declaremodule{standard}{pickle}
\modulesynopsis{Convert Python objects to streams of bytes and back.}
% Substantial improvements by Jim Kerr <jbkerr@sr.hp.com>.
% Rewritten by Barry Warsaw <barry@zope.com>

\index{persistence}
\indexii{persistent}{objects}
\indexii{serializing}{objects}
\indexii{marshalling}{objects}
\indexii{flattening}{objects}
\indexii{pickling}{objects}

The \module{pickle} module implements a fundamental, but powerful
algorithm for serializing and de-serializing a Python object
structure. ``Pickling'' is the process whereby a Python object
hierarchy is converted into a byte stream, and ``unpickling'' is the
inverse operation, whereby a byte stream is converted back into an
object hierarchy. Pickling (and unpickling) is alternatively known as
``serialization'', ``marshalling,''\footnote{Don't confuse this with
the \refmodule{marshal} module.} or ``flattening''; however, to avoid
confusion, the terms used here are ``pickling'' and ``unpickling''.

This documentation describes both the \module{pickle} module and the
\refmodule{cPickle} module.

\subsection{Relationship to other Python modules}

The \module{pickle} module has an optimized cousin called the
\module{cPickle} module. As its name implies, \module{cPickle} is
written in C, so it can be up to 1000 times faster than
\module{pickle}. However, it does not support subclassing of the
\function{Pickler()} and \function{Unpickler()} classes, because in
\module{cPickle} these are functions, not classes. Most applications
have no need for this functionality, and can benefit from the improved
performance of \module{cPickle}. Other than that, the interfaces of
the two modules are nearly identical; the common interface is
described in this manual and differences are pointed out where
necessary. In the following discussions, we use the term ``pickle''
to collectively describe the \module{pickle} and
\module{cPickle} modules.

The data streams the two modules produce are guaranteed to be
interchangeable.

Python has a more primitive serialization module called
\refmodule{marshal}, but in general
\module{pickle} should always be the preferred way to serialize Python
objects. \module{marshal} exists primarily to support Python's
\file{.pyc} files.

The \module{pickle} module differs from \refmodule{marshal} in several
significant ways:

\begin{itemize}

\item The \module{pickle} module keeps track of the objects it has
  already serialized, so that later references to the same object
  won't be serialized again. \module{marshal} doesn't do this.

  This has implications both for recursive objects and object
  sharing. Recursive objects are objects that contain references
  to themselves. These are not handled by \module{marshal}, and in
  fact, attempting to marshal recursive objects will crash your Python
  interpreter. Object sharing happens when there are multiple
  references to the same object in different places in the object
  hierarchy being serialized. \module{pickle} stores such objects
  only once, and ensures that all other references point to the
  master copy. Shared objects remain shared, which can be very
  important for mutable objects.

\item \module{marshal} cannot be used to serialize user-defined
  classes and their instances. \module{pickle} can save and
  restore class instances transparently; however, the class
  definition must be importable and live in the same module as
  when the object was stored.

\item The \module{marshal} serialization format is not guaranteed to
  be portable across Python versions. Because its primary job in
  life is to support \file{.pyc} files, the Python implementers
  reserve the right to change the serialization format in
  non-backwards compatible ways should the need arise. The
  \module{pickle} serialization format is guaranteed to be
  backwards compatible across Python releases.

\end{itemize}
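
The sharing and recursion behavior listed above can be illustrated with
a minimal sketch. The variable names are invented for the example, and
it assumes a current Python 3 interpreter, where the pickled data is a
bytes object rather than a string.

```python
import pickle

# One mutable list referenced from two places in the same hierarchy.
shared = ['shared']
hierarchy = {'first': shared, 'second': shared}

# A recursive object: a list that contains itself.
recursive = []
recursive.append(recursive)

restored = pickle.loads(pickle.dumps(hierarchy, 2))
loop = pickle.loads(pickle.dumps(recursive, 2))

# Sharing survives the round trip: both keys refer to one list object,
# so a change made through one reference is visible through the other.
restored['first'].append('changed')
print(restored['second'])

# The recursive list still contains itself after unpickling.
print(loop[0] is loop)
```

Because the shared list is stored only once, the two dictionary values
remain a single object after unpickling instead of becoming two copies.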

\begin{notice}[warning]
The \module{pickle} module is not intended to be secure against
erroneous or maliciously constructed data. Never unpickle data
received from an untrusted or unauthenticated source.
\end{notice}

Note that serialization is a more primitive notion than persistence;
although
\module{pickle} reads and writes file objects, it does not handle the
issue of naming persistent objects, nor the (even more complicated)
issue of concurrent access to persistent objects. The \module{pickle}
module can transform a complex object into a byte stream and it can
transform the byte stream into an object with the same internal
structure. Perhaps the most obvious thing to do with these byte
streams is to write them onto a file, but it is also conceivable to
send them across a network or store them in a database. The module
\refmodule{shelve} provides a simple interface
to pickle and unpickle objects on DBM-style database files.

\subsection{Data stream format}

The data format used by \module{pickle} is Python-specific. This has
the advantage that there are no restrictions imposed by external
standards such as XDR\index{XDR}\index{External Data Representation}
(which can't represent pointer sharing); however, it means that
non-Python programs may not be able to reconstruct pickled Python
objects.

By default, the \module{pickle} data format uses a printable \ASCII{}
representation. This is slightly more voluminous than a binary
representation. The big advantage of using printable \ASCII{} (and of
some other characteristics of \module{pickle}'s representation) is that
for debugging or recovery purposes it is possible for a human to read
the pickled file with a standard text editor.

There are currently 3 different protocols which can be used for pickling.

\begin{itemize}

\item Protocol version 0 is the original ASCII protocol and is backwards
compatible with earlier versions of Python.

\item Protocol version 1 is the old binary format which is also compatible
with earlier versions of Python.

\item Protocol version 2 was introduced in Python 2.3. It provides
much more efficient pickling of new-style classes.

\end{itemize}

Refer to PEP 307 for more information.

If a \var{protocol} is not specified, protocol 0 is used.
If \var{protocol} is specified as a negative value
or \constant{HIGHEST_PROTOCOL},
the highest protocol version available will be used.

\versionchanged[The \var{bin} parameter is deprecated and only provided
for backwards compatibility. You should use the \var{protocol}
parameter instead]{2.3}

A binary format, which is slightly more efficient, can be chosen by
specifying a true value for the \var{bin} argument to the
\class{Pickler} constructor or the \function{dump()} and \function{dumps()}
functions. A \var{protocol} version >= 1 implies use of a binary format.
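
As a rough sketch of how the \var{protocol} argument selects a format,
the example below pickles the same data under several protocols. It
assumes a current Python 3 interpreter; the protocol is always passed
explicitly, since the default differs between Python versions, and the
deprecated \var{bin} argument is not used.

```python
import pickle

data = {'name': 'spam', 'values': [1, 2, 3]}

text_pickle = pickle.dumps(data, 0)      # protocol 0: printable ASCII
binary_pickle = pickle.dumps(data, 2)    # protocol 2: binary format
newest = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)
also_newest = pickle.dumps(data, -1)     # negative: highest available

# A protocol 0 stream can be decoded and inspected as plain text.
print(text_pickle.decode('ascii'))

# The unpickler works out the protocol by itself.
for stream in (text_pickle, binary_pickle, newest, also_newest):
    assert pickle.loads(stream) == data
```

No protocol needs to be given when loading, because each stream carries
enough information for the unpickler to recognize its format.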

\subsection{Usage}

To serialize an object hierarchy, you first create a pickler, then you
call the pickler's \method{dump()} method. To de-serialize a data
stream, you first create an unpickler, then you call the unpickler's
\method{load()} method. The \module{pickle} module provides the
following constant:

\begin{datadesc}{HIGHEST_PROTOCOL}
The highest protocol version available. This value can be passed
as a \var{protocol} value.
\versionadded{2.3}
\end{datadesc}

The \module{pickle} module also provides the
following functions to make this process more convenient:

\begin{funcdesc}{dump}{object, file\optional{, protocol\optional{, bin}}}
Write a pickled representation of \var{object} to the open file object
\var{file}. This is equivalent to
\code{Pickler(\var{file}, \var{protocol}, \var{bin}).dump(\var{object})}.

If the \var{protocol} parameter is omitted, protocol 0 is used.
If \var{protocol} is specified as a negative value
or \constant{HIGHEST_PROTOCOL},
the highest protocol version will be used.

\versionchanged[The \var{protocol} parameter was added.
The \var{bin} parameter is deprecated and only provided
for backwards compatibility. You should use the \var{protocol}
parameter instead]{2.3}

If the optional \var{bin} argument is true, the binary pickle format
is used; otherwise the (less efficient) text pickle format is used
(for backwards compatibility, this is the default).

\var{file} must have a \method{write()} method that accepts a single
string argument. It can thus be a file object opened for writing, a
\refmodule{StringIO} object, or any other custom
object that meets this interface.
\end{funcdesc}

\begin{funcdesc}{load}{file}
Read a string from the open file object \var{file} and interpret it as
a pickle data stream, reconstructing and returning the original object
hierarchy. This is equivalent to \code{Unpickler(\var{file}).load()}.

\var{file} must have two methods, a \method{read()} method that takes
an integer argument, and a \method{readline()} method that requires no
arguments. Both methods should return a string. Thus \var{file} can
be a file object opened for reading, a
\module{StringIO} object, or any other custom
object that meets this interface.

This function automatically determines whether the data stream was
written in binary mode or not.
\end{funcdesc}

\begin{funcdesc}{dumps}{object\optional{, protocol\optional{, bin}}}
Return the pickled representation of the object as a string, instead
of writing it to a file.

If the \var{protocol} parameter is omitted, protocol 0 is used.
If \var{protocol} is specified as a negative value
or \constant{HIGHEST_PROTOCOL},
the highest protocol version will be used.

\versionchanged[The \var{protocol} parameter was added.
The \var{bin} parameter is deprecated and only provided
for backwards compatibility. You should use the \var{protocol}
parameter instead]{2.3}

If the optional \var{bin} argument is
true, the binary pickle format is used; otherwise the (less efficient)
text pickle format is used (this is the default).
\end{funcdesc}

\begin{funcdesc}{loads}{string}
Read a pickled object hierarchy from a string. Characters in the
string past the pickled object's representation are ignored.
\end{funcdesc}
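
The four convenience functions can be exercised with a short round
trip. This is only a sketch: it assumes a current Python 3 interpreter,
where \code{io.BytesIO} plays the role that a \module{StringIO} object
plays in the description above, and the \code{record} value is made up
for the example.

```python
import io
import pickle

record = {'id': 7, 'tags': ('a', 'b')}

# dump() accepts any object with a suitable write() method.
buffer = io.BytesIO()
pickle.dump(record, buffer, 0)

# load() needs read() and readline(); rewind before reading back.
buffer.seek(0)
from_file = pickle.load(buffer)

# dumps() and loads() do the same without a file object.
from_string = pickle.loads(pickle.dumps(record, 0))

print(from_file == record, from_string == record)
```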

The \module{pickle} module also defines three exceptions:

\begin{excdesc}{PickleError}
A common base class for the other exceptions defined below. This
inherits from \exception{Exception}.
\end{excdesc}

\begin{excdesc}{PicklingError}
This exception is raised when an unpicklable object is passed to
the \method{dump()} method.
\end{excdesc}

\begin{excdesc}{UnpicklingError}
This exception is raised when there is a problem unpickling an object.
Note that other exceptions may also be raised during unpickling,
including (but not necessarily limited to) \exception{AttributeError},
\exception{EOFError}, \exception{ImportError}, and \exception{IndexError}.
\end{excdesc}
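
One possible way to handle these exceptions is sketched below.
\code{safe_loads()} and \code{UNPICKLE_ERRORS} are hypothetical names
introduced for this example, not part of the module, and the sketch
assumes a current Python 3 interpreter. Catching these exceptions does
not make it safe to unpickle data from an untrusted source.

```python
import pickle

# Both specific exceptions derive from the common base class.
assert issubclass(pickle.PicklingError, pickle.PickleError)
assert issubclass(pickle.UnpicklingError, pickle.PickleError)

# The exceptions this section lists as possible during unpickling.
UNPICKLE_ERRORS = (pickle.UnpicklingError, AttributeError, EOFError,
                   ImportError, IndexError)

def safe_loads(data, default=None):
    """Hypothetical helper: return default if the stream cannot be loaded."""
    try:
        return pickle.loads(data)
    except UNPICKLE_ERRORS:
        return default

good = pickle.dumps([1, 2, 3], 0)
truncated = good[:-1]    # drop the final STOP opcode

print(safe_loads(good))
print(safe_loads(truncated))
```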

The \module{pickle} module also exports two callables\footnote{In the
\module{pickle} module these callables are classes, which you could
subclass to customize the behavior. However, in the \refmodule{cPickle}
module these callables are factory functions and so cannot be
subclassed. One common reason to subclass is to control what
objects can actually be unpickled. See section~\ref{pickle-sub} for
more details.}, \class{Pickler} and \class{Unpickler}:

\begin{classdesc}{Pickler}{file\optional{, protocol\optional{, bin}}}
This takes a file-like object to which it will write a pickle data
stream.

If the \var{protocol} parameter is omitted, protocol 0 is used.
If \var{protocol} is specified as a negative value,
the highest protocol version will be used.

\versionchanged[The \var{bin} parameter is deprecated and only provided
for backwards compatibility. You should use the \var{protocol}
parameter instead]{2.3}

If the optional \var{bin} argument is true, it tells the pickler to use
the more efficient binary pickle format; otherwise the \ASCII{} format
is used (this is the default).

\var{file} must have a \method{write()} method that accepts a single
string argument. It can thus be an open file object, a
\module{StringIO} object, or any other custom
object that meets this interface.
\end{classdesc}

\class{Pickler} objects define one (or two) public methods:

\begin{methoddesc}[Pickler]{dump}{object}
Write a pickled representation of \var{object} to the open file object
given in the constructor. Either the binary or \ASCII{} format will
be used, depending on the value of the \var{bin} flag passed to the
constructor.
\end{methoddesc}

\begin{methoddesc}[Pickler]{clear_memo}{}
Clears the pickler's ``memo''. The memo is the data structure that
remembers which objects the pickler has already seen, so that shared
or recursive objects are pickled by reference and not by value. This
method is useful when re-using picklers.

\begin{notice}
Prior to Python 2.3, \method{clear_memo()} was only available on the
picklers created by \refmodule{cPickle}. In the \module{pickle} module,
picklers have an instance variable called \member{memo} which is a
Python dictionary. So to clear the memo for a \module{pickle} module
pickler, you could do the following:

\begin{verbatim}
mypickler.memo.clear()
\end{verbatim}

Code that does not need to support older versions of Python should
simply use \method{clear_memo()}.
\end{notice}
\end{methoddesc}

It is possible to make multiple calls to the \method{dump()} method of
the same \class{Pickler} instance. These must then be matched to the
same number of calls to the \method{load()} method of the
corresponding \class{Unpickler} instance. If the same object is
pickled by multiple \method{dump()} calls, the \method{load()} calls
will all yield references to the same object.\footnote{\emph{Warning}: this
is intended for pickling multiple objects without intervening
modifications to the objects or their parts. If you modify an object
and then pickle it again using the same \class{Pickler} instance, the
object is not pickled again --- a reference to it is pickled and the
\class{Unpickler} will return the old value, not the modified one.
There are two problems here: (1) detecting changes, and (2)
marshalling a minimal set of changes. Garbage Collection may also
become a problem here.}
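
The effect of the memo on repeated \method{dump()} calls, including the
caveat about modified objects, can be seen in the following sketch. It
assumes a current Python 3 interpreter and uses \code{io.BytesIO} as the
file-like object; the variable names are illustrative only.

```python
import io
import pickle

buffer = io.BytesIO()
pickler = pickle.Pickler(buffer, 2)

data = [1]
pickler.dump(data)      # pickled by value, and recorded in the memo
data.append(2)          # modified between the two dumps
pickler.dump(data)      # only a reference to the memo entry is written
pickler.clear_memo()
pickler.dump(data)      # pickled by value again, with its current contents

# Match the three dump() calls with three load() calls.
buffer.seek(0)
unpickler = pickle.Unpickler(buffer)
first = unpickler.load()
second = unpickler.load()
third = unpickler.load()

print(first, second is first, third)
```

The second \method{load()} returns the very object produced by the
first one, so the modification made between the dumps is lost; only
after \method{clear_memo()} is the current value written out.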

\class{Unpickler} objects are defined as:

\begin{classdesc}{Unpickler}{file}
This takes a file-like object from which it will read a pickle data
stream. This class automatically determines whether the data stream
was written in binary mode or not, so it does not need a flag as in
the \class{Pickler} factory.

\var{file} must have two methods, a \method{read()} method that takes
an integer argument, and a \method{readline()} method that requires no
arguments. Both methods should return a string. Thus \var{file} can
be a file object opened for reading, a
\module{StringIO} object, or any other custom
object that meets this interface.
\end{classdesc}

\class{Unpickler} objects have one (or two) public methods:

\begin{methoddesc}[Unpickler]{load}{}
Read a pickled object representation from the open file object given
in the constructor, and return the reconstituted object hierarchy
specified therein.
\end{methoddesc}

\begin{methoddesc}[Unpickler]{noload}{}
This is just like \method{load()} except that it doesn't actually
create any objects. This is useful primarily for finding what are
called ``persistent ids'' that may be referenced in a pickle data
stream. See section~\ref{pickle-protocol} below for more details.

\strong{Note:} the \method{noload()} method is currently only
available on \class{Unpickler} objects created with the
\module{cPickle} module. \module{pickle} module \class{Unpickler}s do
not have the \method{noload()} method.
\end{methoddesc}

\subsection{What can be pickled and unpickled?}

The following types can be pickled:

\begin{itemize}

\item \code{None}, \code{True}, and \code{False}

\item integers, long integers, floating point numbers, complex numbers

\item normal and Unicode strings

\item tuples, lists, sets, and dictionaries containing only picklable objects

\item functions defined at the top level of a module

\item built-in functions defined at the top level of a module

\item classes that are defined at the top level of a module

\item instances of such classes whose \member{__dict__} or the result
of calling \method{__getstate__()} is picklable (see
section~\ref{pickle-protocol} for details)

\end{itemize}

Attempts to pickle unpicklable objects will raise the
\exception{PicklingError} exception; when this happens, an unspecified
number of bytes may have already been written to the underlying file.

Note that functions (built-in and user-defined) are pickled by ``fully
qualified'' name reference, not by value. This means that only the
function name is pickled, along with the name of the module the function
is defined in. Neither the function's code, nor any of its function
attributes are pickled. Thus the defining module must be importable
in the unpickling environment, and the module must contain the named
object, otherwise an exception will be raised.\footnote{The exception
raised will likely be an \exception{ImportError} or an
\exception{AttributeError} but it could be something else.}

Similarly, classes are pickled by named reference, so the same
restrictions in the unpickling environment apply. Note that none of
the class's code or data is pickled, so in the following example the
class attribute \code{attr} is not restored in the unpickling
environment:

\begin{verbatim}
import pickle

class Foo:
    attr = 'a class attr'

picklestring = pickle.dumps(Foo)
\end{verbatim}

These restrictions are why picklable functions and classes must be
defined in the top level of a module.

Similarly, when class instances are pickled, their class's code and
data are not pickled along with them. Only the instance data are
pickled. This is done on purpose, so you can fix bugs in a class or
add methods to the class and still load objects that were created with
an earlier version of the class. If you plan to have long-lived
objects that will see many versions of a class, it may be worthwhile
to put a version number in the objects so that suitable conversions
can be made by the class's \method{__setstate__()} method.
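
The version-number idea might look roughly like the sketch below. The
\class{Account} class, its attributes and the conversion rule are all
hypothetical, chosen only to show a \method{__setstate__()} method that
upgrades state saved by an older layout; the example assumes a current
Python 3 interpreter.

```python
import pickle

class Account:
    VERSION = 2

    def __init__(self, owner, balance_cents=0):
        self.version = self.VERSION
        self.owner = owner
        self.balance_cents = balance_cents

    def __setstate__(self, state):
        # Hypothetical conversion: version 1 stored a float 'balance'.
        if state.get('version', 1) < 2:
            balance = state.pop('balance', 0)
            state['balance_cents'] = int(round(balance * 100))
            state['version'] = 2
        self.__dict__.update(state)

# Simulate an instance that was saved with the version 1 layout.
old = Account.__new__(Account)
old.__dict__.update({'version': 1, 'owner': 'guido', 'balance': 12.5})

restored = pickle.loads(pickle.dumps(old, 2))
print(restored.version, restored.owner, restored.balance_cents)
```

Because only the instance data is stored, the conversion runs with
whatever version of the class is importable at unpickling time.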

\subsection{The pickle protocol
\label{pickle-protocol}}\setindexsubitem{(pickle protocol)}

This section describes the ``pickling protocol'' that defines the
interface between the pickler/unpickler and the objects that are being
serialized. This protocol provides a standard way for you to define,
customize, and control how your objects are serialized and
de-serialized. The description in this section doesn't cover specific
customizations that you can employ to make the unpickling environment
slightly safer from untrusted pickle data streams; see section~\ref{pickle-sub}
for more details.

\subsubsection{Pickling and unpickling normal class
               instances\label{pickle-inst}}

When a pickled class instance is unpickled, its \method{__init__()}
method is normally \emph{not} invoked. If it is desirable that the
\method{__init__()} method be called on unpickling, an old-style class
can define a method \method{__getinitargs__()}, which should return a
\emph{tuple} containing the arguments to be passed to the class
constructor (i.e. \method{__init__()}). The
\method{__getinitargs__()} method is called at
pickle time; the tuple it returns is incorporated in the pickle for
the instance.
\withsubitem{(copy protocol)}{\ttindex{__getinitargs__()}}
\withsubitem{(instance constructor)}{\ttindex{__init__()}}

\withsubitem{(copy protocol)}{\ttindex{__getnewargs__()}}

New-style types can provide a \method{__getnewargs__()} method that is
used for protocol 2. Implementing this method is needed if the type
establishes some internal invariants when the instance is created, or
if the memory allocation is affected by the values passed to the
\method{__new__()} method for the type (as it is for tuples and
strings). Instances of a new-style type \class{C} are created using

\begin{alltt}
obj = C.__new__(C, *\var{args})
\end{alltt}

where \var{args} is the result of calling \method{__getnewargs__()} on
the original object; if there is no \method{__getnewargs__()}, an
empty tuple is assumed.

\withsubitem{(copy protocol)}{
  \ttindex{__getstate__()}\ttindex{__setstate__()}}
\withsubitem{(instance attribute)}{
  \ttindex{__dict__}}

Classes can further influence how their instances are pickled; if the
class defines the method \method{__getstate__()}, it is called and the
returned state is pickled as the contents for the instance, instead of
the contents of the instance's dictionary. If there is no
\method{__getstate__()} method, the instance's \member{__dict__} is
pickled.

Upon unpickling, if the class also defines the method
\method{__setstate__()}, it is called with the unpickled
state.\footnote{These methods can also be used to implement copying
class instances.} If there is no \method{__setstate__()} method, the
pickled state must be a dictionary and its items are assigned to the
new instance's dictionary. If a class defines both
\method{__getstate__()} and \method{__setstate__()}, the state object
needn't be a dictionary and these methods can do what they
want.\footnote{This protocol is also used by the shallow and deep
copying operations defined in the
\refmodule{copy} module.}
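
A class that defines both methods could look like the following sketch,
in which the state is a tuple rather than a dictionary. The
\class{Counter} class and its \code{cache} attribute are hypothetical,
and the example assumes a current Python 3 interpreter.

```python
import pickle

class Counter:
    """Hypothetical class with a cache that should not be pickled."""

    def __init__(self, start):
        self.count = start
        self.cache = {}          # transient data, rebuilt on demand

    def __getstate__(self):
        # Pickle a plain tuple instead of the instance dictionary.
        return (self.count,)

    def __setstate__(self, state):
        # Called with the unpickled tuple; __init__() is not invoked.
        (self.count,) = state
        self.cache = {}

original = Counter(3)
original.cache['expensive'] = 'result'

clone = pickle.loads(pickle.dumps(original, 2))
print(clone.count, clone.cache)
```

Leaving the cache out of the state keeps the pickle small and avoids
storing data that can be recomputed after loading.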

\begin{notice}[warning]
  For new-style classes, if \method{__getstate__()} returns a false
  value, the \method{__setstate__()} method will not be called.
\end{notice}


\subsubsection{Pickling and unpickling extension types}

When the \class{Pickler} encounters an object of a type it knows
nothing about --- such as an extension type --- it looks in two places
for a hint of how to pickle it. One alternative is for the object to
implement a \method{__reduce__()} method. If provided, at pickling
time \method{__reduce__()} will be called with no arguments, and it
must return either a string or a tuple.

If a string is returned, it names a global variable whose contents are
pickled as normal. When a tuple is returned, it must be of length two
or three, with the following semantics:

\begin{itemize}

\item A callable object, which in the unpickling environment must be
  either a class, a callable registered as a ``safe constructor''
  (see below), or it must have an attribute
  \member{__safe_for_unpickling__} with a true value. Otherwise,
  an \exception{UnpicklingError} will be raised in the unpickling
  environment. Note that as usual, the callable itself is pickled
  by name.

\item A tuple of arguments for the callable object, or \code{None}.
\deprecated{2.3}{Use the tuple of arguments instead}

\item Optionally, the object's state, which will be passed to
  the object's \method{__setstate__()} method as described in
  section~\ref{pickle-inst}. If the object has no
  \method{__setstate__()} method, then, as above, the value must
  be a dictionary and it will be added to the object's
  \member{__dict__}.

\end{itemize}

Upon unpickling, the callable will be called (provided that it meets
the above criteria), passing in the tuple of arguments; it should
return the unpickled object.
546
547If the second item was \code{None}, then instead of calling the
548callable directly, its \method{__basicnew__()} method is called
549without arguments. It should also return the unpickled object.
550
551\deprecated{2.3}{Use the tuple of arguments instead}
Barry Warsawf595fd92001-11-15 23:39:07 +0000552
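The three-item form of the \method{__reduce__()} return value can be
sketched as follows.  The \class{Interval} class and its attributes are
hypothetical and chosen only for illustration; the callable is the class
itself, the argument tuple rebuilds the instance, and the optional
dictionary is added to the new instance's \member{__dict__} because the
class defines no \method{__setstate__()} method.

```python
import pickle

class Interval(object):
    """Hypothetical example class; names are illustrative only."""
    def __init__(self, low, high):
        self.low = low
        self.high = high
        self.label = None        # normally set after construction

    def __reduce__(self):
        # (callable, tuple of arguments, optional state dictionary)
        return (Interval, (self.low, self.high), {'label': self.label})

original = Interval(1, 5)
original.label = 'working hours'
restored = pickle.loads(pickle.dumps(original))
```

Here the callable is invoked as \code{Interval(1, 5)} at unpickling
time, and the state dictionary then restores \member{label}.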
An alternative to implementing a \method{__reduce__()} method on the
object to be pickled is to register the callable with the
\refmodule[copyreg]{copy_reg} module.  This module provides a way
for programs to register ``reduction functions'' and constructors for
user-defined types.  Reduction functions have the same semantics and
interface as the \method{__reduce__()} method described above, except
that they are called with a single argument, the object to be pickled.

The registered constructor is deemed a ``safe constructor'' for purposes
of unpickling as described above.

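A rough sketch of registering a reduction function is shown below.  The
\class{Celsius} class and the \function{reduce_celsius()} function are
hypothetical names used only for illustration.  The import fallback is
an assumption made so that the sketch also runs on later Python
versions, where the module is named \module{copyreg}.

```python
import pickle
try:
    import copy_reg                  # the module name used in this document
except ImportError:
    import copyreg as copy_reg       # assumption: later name of the same module

class Celsius(object):
    """Hypothetical example class; names are illustrative only."""
    def __init__(self, degrees):
        self.degrees = degrees

def reduce_celsius(obj):
    # Called with the object to be pickled; returns the same kind of
    # tuple that a __reduce__() method would return.
    return (Celsius, (obj.degrees,))

# Register the reduction function for the type.
copy_reg.pickle(Celsius, reduce_celsius)

restored = pickle.loads(pickle.dumps(Celsius(21)))
```

This keeps the pickling logic outside the class, which is useful when
the type itself cannot be modified.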
\subsubsection{Pickling and unpickling external objects}

For the benefit of object persistence, the \module{pickle} module
supports the notion of a reference to an object outside the pickled
data stream.  Such objects are referenced by a ``persistent id'',
which is just an arbitrary string of printable \ASCII{} characters.
The resolution of such names is not defined by the \module{pickle}
module; it will delegate this resolution to user-defined functions on
the pickler and unpickler.\footnote{The actual mechanism for
associating these user-defined functions is slightly different for
\module{pickle} and \module{cPickle}.  The description given here
works the same for both implementations.  Users of the \module{pickle}
module could also use subclassing to effect the same results,
overriding the \method{persistent_id()} and \method{persistent_load()}
methods in the derived classes.}

To define external persistent id resolution, you need to set the
\member{persistent_id} attribute of the pickler object and the
\member{persistent_load} attribute of the unpickler object.

To pickle objects that have an external persistent id, the pickler
must have a custom \function{persistent_id()} method that takes an
object as an argument and returns either \code{None} or the persistent
id for that object.  When \code{None} is returned, the pickler simply
pickles the object as normal.  When a persistent id string is
returned, the pickler will pickle that string, along with a marker
so that the unpickler will recognize the string as a persistent id.

To unpickle external objects, the unpickler must have a custom
\function{persistent_load()} function that takes a persistent id
string and returns the referenced object.

Here's a silly example that \emph{might} shed more light:

\begin{verbatim}
import pickle
from cStringIO import StringIO

src = StringIO()
p = pickle.Pickler(src)

def persistent_id(obj):
    if hasattr(obj, 'x'):
        return 'the value %d' % obj.x
    else:
        return None

p.persistent_id = persistent_id

class Integer:
    def __init__(self, x):
        self.x = x
    def __str__(self):
        return 'My name is integer %d' % self.x

i = Integer(7)
print i
p.dump(i)

datastream = src.getvalue()
print repr(datastream)
dst = StringIO(datastream)

up = pickle.Unpickler(dst)

class FancyInteger(Integer):
    def __str__(self):
        return 'I am the integer %d' % self.x

def persistent_load(persid):
    if persid.startswith('the value '):
        value = int(persid.split()[2])
        return FancyInteger(value)
    else:
        raise pickle.UnpicklingError, 'Invalid persistent id'

up.persistent_load = persistent_load

j = up.load()
print j
\end{verbatim}

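The subclassing approach mentioned in the footnote can be sketched as
follows.  This is only an illustration under stated assumptions: the
\class{Account} class, the \code{ACCOUNTS} registry, and both subclass
names are hypothetical, and an in-memory byte buffer from the
\module{io} module stands in for a real file.

```python
import io
import pickle

class Account(object):
    """Hypothetical object that lives outside the pickle."""
    def __init__(self, number):
        self.number = number

ACCOUNTS = {}    # hypothetical external store, keyed by account number

class AccountPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return a persistent id for external objects, None otherwise.
        if isinstance(obj, Account):
            ACCOUNTS[obj.number] = obj
            return 'account %d' % obj.number
        return None

class AccountUnpickler(pickle.Unpickler):
    def persistent_load(self, persid):
        # Resolve a persistent id back to the external object.
        if persid.startswith('account '):
            return ACCOUNTS[int(persid.split()[1])]
        raise pickle.UnpicklingError('Invalid persistent id')

acct = Account(42)
stream = io.BytesIO()
AccountPickler(stream).dump(['balance report', acct])
stream.seek(0)
report = AccountUnpickler(stream).load()
```

Only the persistent id string travels in the data stream, so the
unpickled list refers to the very same \class{Account} object held in
the registry rather than to a copy.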
In the \module{cPickle} module, the unpickler's
\member{persistent_load} attribute can also be set to a Python
list, in which case, when the unpickler reaches a persistent id, the
persistent id string will simply be appended to this list.  This
functionality exists so that a pickle data stream can be ``sniffed''
for object references without actually instantiating all the objects
in a pickle.\footnote{We'll leave you with the image of Guido and Jim
sitting around sniffing pickles in their living rooms.}  Setting
\member{persistent_load} to a list is usually used in conjunction with
the \method{noload()} method on the Unpickler.

% BAW: Both pickle and cPickle support something called
% inst_persistent_id() which appears to give unknown types a second
% shot at producing a persistent id.  Since Jim Fulton can't remember
% why it was added or what it's for, I'm leaving it undocumented.

\subsection{Subclassing Unpicklers \label{pickle-sub}}

By default, unpickling will import any class that it finds in the
pickle data.  You can control exactly what gets unpickled and what
gets called by customizing your unpickler.  Unfortunately, exactly how
you do this is different depending on whether you're using
\module{pickle} or \module{cPickle}.\footnote{A word of caution: the
mechanisms described here use internal attributes and methods, which
are subject to change in future versions of Python.  We intend to
someday provide a common interface for controlling this behavior,
which will work in either \module{pickle} or \module{cPickle}.}

In the \module{pickle} module, you need to derive a subclass from
\class{Unpickler}, overriding the \method{load_global()}
method.  \method{load_global()} should read two lines from the pickle
data stream where the first line will be the name of the module
containing the class and the second line will be the name of the
instance's class.  It then looks up the class, possibly importing the
module and digging out the attribute, then it appends what it finds to
the unpickler's stack.  Later on, this class will be assigned to the
\member{__class__} attribute of an empty class, as a way of magically
creating an instance without calling its class's \method{__init__()}.
Your job (should you choose to accept it) would be to have
\method{load_global()} push onto the unpickler's stack a known safe
version of any class you deem safe to unpickle.  It is up to you to
produce such a class.  Or you could raise an error if you want to
disallow all unpickling of instances.  If this sounds like a hack,
you're right.  Refer to the source code to make this work.

Things are a little cleaner with \module{cPickle}, but not by much.
To control what gets unpickled, you can set the unpickler's
\member{find_global} attribute to a function or \code{None}.  If it is
\code{None} then any attempts to unpickle instances will raise an
\exception{UnpicklingError}.  If it is a function,
then it should accept a module name and a class name, and return the
corresponding class object.  It is responsible for looking up the
class and performing any necessary imports, and it may raise an
error to prevent instances of the class from being unpickled.

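A function suitable for the \member{find_global} hook might look like
the sketch below.  The function name, the \code{ALLOWED_GLOBALS} list,
and the choice of \class{decimal.Decimal} as the only permitted class
are all assumptions made for illustration.  Attaching the function to an
unpickler is shown only as a comment, since that step applies to
\module{cPickle} as described above.

```python
import sys
import pickle

# Hypothetical whitelist of (module name, class name) pairs.
ALLOWED_GLOBALS = [('decimal', 'Decimal')]

def restricted_find_global(module_name, class_name):
    # Refuse anything that is not explicitly listed.
    if (module_name, class_name) not in ALLOWED_GLOBALS:
        raise pickle.UnpicklingError(
            'global %s.%s is not allowed' % (module_name, class_name))
    # Perform the import and dig out the attribute ourselves.
    __import__(module_name)
    module = sys.modules[module_name]
    return getattr(module, class_name)

# With cPickle, the function would be attached to an unpickler instance:
#     unpickler.find_global = restricted_find_global
```

Listing permitted classes explicitly, rather than listing forbidden
ones, means that anything unexpected in the data stream is rejected by
default.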
The moral of the story is that you should be really careful about the
source of the strings your application unpickles.

\subsection{Example \label{pickle-example}}

Here's a simple example of how to modify pickling behavior for a
class.  The \class{TextReader} class opens a text file, and returns
the line number and line contents each time its \method{readline()}
method is called.  If a \class{TextReader} instance is pickled, all
attributes \emph{except} the file object member are saved.  When the
instance is unpickled, the file is reopened, and reading resumes from
the last location.  The \method{__setstate__()} and
\method{__getstate__()} methods are used to implement this behavior.

\begin{verbatim}
class TextReader:
    """Print and number lines in a text file."""
    def __init__(self, file):
        self.file = file
        self.fh = open(file)
        self.lineno = 0

    def readline(self):
        self.lineno = self.lineno + 1
        line = self.fh.readline()
        if not line:
            return None
        if line.endswith("\n"):
            line = line[:-1]
        return "%d: %s" % (self.lineno, line)

    def __getstate__(self):
        odict = self.__dict__.copy() # copy the dict since we change it
        del odict['fh']              # remove filehandle entry
        return odict

    def __setstate__(self,dict):
        fh = open(dict['file'])      # reopen file
        count = dict['lineno']       # read from file...
        while count:                 # until line count is restored
            fh.readline()
            count = count - 1
        self.__dict__.update(dict)   # update attributes
        self.fh = fh                 # save the file object
\end{verbatim}

A sample usage might be something like this:

\begin{verbatim}
>>> import TextReader
>>> obj = TextReader.TextReader("TextReader.py")
>>> obj.readline()
'1: #!/usr/local/bin/python'
>>> # (more invocations of obj.readline() here)
... obj.readline()
'7: class TextReader:'
>>> import pickle
>>> pickle.dump(obj,open('save.p','w'))
\end{verbatim}

If you want to see that \refmodule{pickle} works across Python
processes, start another Python session before continuing.  What
follows can happen from either the same process or a new process.

\begin{verbatim}
>>> import pickle
>>> reader = pickle.load(open('save.p'))
>>> reader.readline()
'8: "Print and number lines in a text file."'
\end{verbatim}


\begin{seealso}
  \seemodule[copyreg]{copy_reg}{Pickle interface constructor
                                registration for extension types.}

  \seemodule{shelve}{Indexed databases of objects; uses \module{pickle}.}

  \seemodule{copy}{Shallow and deep object copying.}

  \seemodule{marshal}{High-performance serialization of built-in types.}
\end{seealso}


\section{\module{cPickle} --- A faster \module{pickle}}

\declaremodule{builtin}{cPickle}
\modulesynopsis{Faster version of \refmodule{pickle}, but not subclassable.}
\moduleauthor{Jim Fulton}{jim@zope.com}
\sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}

The \module{cPickle} module supports serialization and
de-serialization of Python objects, providing an interface and
functionality nearly identical to the
\refmodule{pickle}\refstmodindex{pickle} module.  There are several
differences, the most important being performance and subclassability.

First, \module{cPickle} can be up to 1000 times faster than
\module{pickle} because the former is implemented in C.  Second, in
the \module{cPickle} module the callables \function{Pickler()} and
\function{Unpickler()} are functions, not classes.  This means that
you cannot use them to derive custom pickling and unpickling
subclasses.  Most applications have no need for this functionality and
should benefit from the greatly improved performance of the
\module{cPickle} module.

The pickle data streams produced by \module{pickle} and
\module{cPickle} use the same format, so it is possible to use
\module{pickle} and \module{cPickle} interchangeably with existing
pickles.\footnote{Since the pickle data format is actually a tiny
stack-oriented programming language, and some freedom is taken in the
encodings of certain objects, it is possible that the two modules
produce different data streams for the same input objects.  However, it
is guaranteed that they will always be able to read each other's
data streams.}

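Because the two modules read each other's data streams and expose
nearly the same functions, an application that does not need
subclassing can prefer \module{cPickle} and fall back to
\module{pickle} when the C module is unavailable.  The following is a
minimal sketch of that idiom; the sample data is purely illustrative.

```python
try:
    import cPickle as pickle     # C implementation, when it is available
except ImportError:
    import pickle                # pure-Python module with the same functions

data = {'name': 'spam', 'counts': [1, 2, 3]}    # illustrative sample data
restored = pickle.loads(pickle.dumps(data))
```

The rest of the program refers only to the name \code{pickle}, so it
does not need to know which implementation was imported.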
There are additional minor differences in API between \module{cPickle}
and \module{pickle}; however, for most applications they are
interchangeable.  More documentation is provided in the
\module{pickle} module documentation, which
includes a list of the documented differences.

