blob: ef8cb2334e5df6c85ea9da694326ccef813c717a [file] [log] [blame]
Barry Warsawf595fd92001-11-15 23:39:07 +00001\section{\module{pickle} --- Python object serialization}
Fred Drakeb91e9341998-07-23 17:59:49 +00002
Fred Drakeffbe6871999-04-22 21:23:22 +00003\declaremodule{standard}{pickle}
Fred Drakeb91e9341998-07-23 17:59:49 +00004\modulesynopsis{Convert Python objects to streams of bytes and back.}
Fred Drake38e5d272000-04-03 20:13:55 +00005% Substantial improvements by Jim Kerr <jbkerr@sr.hp.com>.
Barry Warsawf595fd92001-11-15 23:39:07 +00006% Rewritten by Barry Warsaw <barry@zope.com>
Fred Drakeb91e9341998-07-23 17:59:49 +00007
Thomas Woutersf8316632000-07-16 19:01:10 +00008\index{persistence}
Guido van Rossumd1883581995-02-15 15:53:08 +00009\indexii{persistent}{objects}
10\indexii{serializing}{objects}
11\indexii{marshalling}{objects}
12\indexii{flattening}{objects}
13\indexii{pickling}{objects}
14
Barry Warsawf595fd92001-11-15 23:39:07 +000015The \module{pickle} module implements a fundamental, but powerful
16algorithm for serializing and de-serializing a Python object
17structure. ``Pickling'' is the process whereby a Python object
18hierarchy is converted into a byte stream, and ``unpickling'' is the
19inverse operation, whereby a byte stream is converted back into an
20object hierarchy. Pickling (and unpickling) is alternatively known as
Fred Drake2744f432001-11-26 21:30:36 +000021``serialization'', ``marshalling,''\footnote{Don't confuse this with
22the \refmodule{marshal} module} or ``flattening'',
Raymond Hettinger35fd9262003-06-25 15:07:45 +000023however, to avoid confusion, the terms used here are ``pickling'' and
24``unpickling''.
Guido van Rossum470be141995-03-17 16:07:09 +000025
Barry Warsawf595fd92001-11-15 23:39:07 +000026This documentation describes both the \module{pickle} module and the
Fred Drake2744f432001-11-26 21:30:36 +000027\refmodule{cPickle} module.
Fred Drakeffbe6871999-04-22 21:23:22 +000028
Barry Warsawf595fd92001-11-15 23:39:07 +000029\subsection{Relationship to other Python modules}
Guido van Rossumd1883581995-02-15 15:53:08 +000030
Barry Warsawf595fd92001-11-15 23:39:07 +000031The \module{pickle} module has an optimized cousin called the
32\module{cPickle} module. As its name implies, \module{cPickle} is
33written in C, so it can be up to 1000 times faster than
34\module{pickle}. However it does not support subclassing of the
35\function{Pickler()} and \function{Unpickler()} classes, because in
36\module{cPickle} these are functions, not classes. Most applications
37have no need for this functionality, and can benefit from the improved
38performance of \module{cPickle}. Other than that, the interfaces of
39the two modules are nearly identical; the common interface is
40described in this manual and differences are pointed out where
41necessary. In the following discussions, we use the term ``pickle''
42to collectively describe the \module{pickle} and
43\module{cPickle} modules.
Guido van Rossum736fe5e1997-12-09 20:45:08 +000044
Barry Warsawf595fd92001-11-15 23:39:07 +000045The data streams the two modules produce are guaranteed to be
46interchangeable.
47
48Python has a more primitive serialization module called
Fred Drake2744f432001-11-26 21:30:36 +000049\refmodule{marshal}, but in general
Barry Warsawf595fd92001-11-15 23:39:07 +000050\module{pickle} should always be the preferred way to serialize Python
51objects. \module{marshal} exists primarily to support Python's
52\file{.pyc} files.
53
54The \module{pickle} module differs from \refmodule{marshal} several
55significant ways:
Guido van Rossumd1883581995-02-15 15:53:08 +000056
57\begin{itemize}
58
Barry Warsawf595fd92001-11-15 23:39:07 +000059\item The \module{pickle} module keeps track of the objects it has
60 already serialized, so that later references to the same object
61 won't be serialized again. \module{marshal} doesn't do this.
Guido van Rossumd1883581995-02-15 15:53:08 +000062
Barry Warsawf595fd92001-11-15 23:39:07 +000063 This has implications both for recursive objects and object
64 sharing. Recursive objects are objects that contain references
65 to themselves. These are not handled by marshal, and in fact,
66 attempting to marshal recursive objects will crash your Python
67 interpreter. Object sharing happens when there are multiple
68 references to the same object in different places in the object
69 hierarchy being serialized. \module{pickle} stores such objects
70 only once, and ensures that all other references point to the
71 master copy. Shared objects remain shared, which can be very
72 important for mutable objects.
Guido van Rossumd1883581995-02-15 15:53:08 +000073
Barry Warsawf595fd92001-11-15 23:39:07 +000074\item \module{marshal} cannot be used to serialize user-defined
75 classes and their instances. \module{pickle} can save and
76 restore class instances transparently, however the class
77 definition must be importable and live in the same module as
78 when the object was stored.
79
80\item The \module{marshal} serialization format is not guaranteed to
81 be portable across Python versions. Because its primary job in
82 life is to support \file{.pyc} files, the Python implementers
83 reserve the right to change the serialization format in
84 non-backwards compatible ways should the need arise. The
85 \module{pickle} serialization format is guaranteed to be
86 backwards compatible across Python releases.
87
Guido van Rossumd1883581995-02-15 15:53:08 +000088\end{itemize}
89
Andrew M. Kuchling76963442003-05-14 16:51:46 +000090\begin{notice}[warning]
91The \module{pickle} module is not intended to be secure against
92erroneous or maliciously constructed data. Never unpickle data
93received from an untrusted or unauthenticated source.
94\end{notice}
95
Barry Warsawf595fd92001-11-15 23:39:07 +000096Note that serialization is a more primitive notion than persistence;
97although
98\module{pickle} reads and writes file objects, it does not handle the
99issue of naming persistent objects, nor the (even more complicated)
100issue of concurrent access to persistent objects. The \module{pickle}
101module can transform a complex object into a byte stream and it can
102transform the byte stream into an object with the same internal
103structure. Perhaps the most obvious thing to do with these byte
104streams is to write them onto a file, but it is also conceivable to
105send them across a network or store them in a database. The module
106\refmodule{shelve} provides a simple interface
107to pickle and unpickle objects on DBM-style database files.
108
109\subsection{Data stream format}
110
Fred Drake9b28fe21998-04-04 06:20:28 +0000111The data format used by \module{pickle} is Python-specific. This has
Guido van Rossumd1883581995-02-15 15:53:08 +0000112the advantage that there are no restrictions imposed by external
Barry Warsawf595fd92001-11-15 23:39:07 +0000113standards such as XDR\index{XDR}\index{External Data Representation}
114(which can't represent pointer sharing); however it means that
115non-Python programs may not be able to reconstruct pickled Python
116objects.
Guido van Rossumd1883581995-02-15 15:53:08 +0000117
Fred Drake9b28fe21998-04-04 06:20:28 +0000118By default, the \module{pickle} data format uses a printable \ASCII{}
Guido van Rossum736fe5e1997-12-09 20:45:08 +0000119representation. This is slightly more voluminous than a binary
120representation. The big advantage of using printable \ASCII{} (and of
Fred Drake9b28fe21998-04-04 06:20:28 +0000121some other characteristics of \module{pickle}'s representation) is that
Guido van Rossum736fe5e1997-12-09 20:45:08 +0000122for debugging or recovery purposes it is possible for a human to read
123the pickled file with a standard text editor.
124
Neal Norwitz12d31e22003-02-13 03:12:48 +0000125There are currently 3 different protocols which can be used for pickling.
126
127\begin{itemize}
128
129\item Protocol version 0 is the original ASCII protocol and is backwards
130compatible with earlier versions of Python.
131
132\item Protocol version 1 is the old binary format which is also compatible
133with earlier versions of Python.
134
135\item Protocol version 2 was introduced in Python 2.3. It provides
136much more efficient pickling of new-style classes.
137
138\end{itemize}
139
140Refer to PEP 307 for more information.
141
142If a \var{protocol} is not specified, protocol 0 is used.
Neal Norwitzd08baa92003-02-21 00:26:33 +0000143If \var{protocol} is specified as a negative value
144or \constant{HIGHEST_PROTOCOL},
145the highest protocol version available will be used.
Neal Norwitz12d31e22003-02-13 03:12:48 +0000146
147\versionchanged[The \var{bin} parameter is deprecated and only provided
148for backwards compatibility. You should use the \var{protocol}
149parameter instead]{2.3}
150
Guido van Rossum736fe5e1997-12-09 20:45:08 +0000151A binary format, which is slightly more efficient, can be chosen by
Barry Warsawf595fd92001-11-15 23:39:07 +0000152specifying a true value for the \var{bin} argument to the
Fred Drake9b28fe21998-04-04 06:20:28 +0000153\class{Pickler} constructor or the \function{dump()} and \function{dumps()}
Neal Norwitz12d31e22003-02-13 03:12:48 +0000154functions. A \var{protocol} version >= 1 implies use of a binary format.
Guido van Rossumd1883581995-02-15 15:53:08 +0000155
Barry Warsawf595fd92001-11-15 23:39:07 +0000156\subsection{Usage}
Guido van Rossumd1883581995-02-15 15:53:08 +0000157
Barry Warsawf595fd92001-11-15 23:39:07 +0000158To serialize an object hierarchy, you first create a pickler, then you
159call the pickler's \method{dump()} method. To de-serialize a data
160stream, you first create an unpickler, then you call the unpickler's
161\method{load()} method. The \module{pickle} module provides the
Neal Norwitzd08baa92003-02-21 00:26:33 +0000162following constant:
163
164\begin{datadesc}{HIGHEST_PROTOCOL}
165The highest protocol version available. This value can be passed
166as a \var{protocol} value.
Fred Drake7c4d8f32003-09-10 20:47:43 +0000167\versionadded{2.3}
Neal Norwitzd08baa92003-02-21 00:26:33 +0000168\end{datadesc}
169
170The \module{pickle} module provides the
Barry Warsawf595fd92001-11-15 23:39:07 +0000171following functions to make this process more convenient:
Guido van Rossumd1883581995-02-15 15:53:08 +0000172
Neal Norwitz12d31e22003-02-13 03:12:48 +0000173\begin{funcdesc}{dump}{object, file\optional{, protocol\optional{, bin}}}
Barry Warsawf595fd92001-11-15 23:39:07 +0000174Write a pickled representation of \var{object} to the open file object
175\var{file}. This is equivalent to
Neal Norwitz12d31e22003-02-13 03:12:48 +0000176\code{Pickler(\var{file}, \var{protocol}, \var{bin}).dump(\var{object})}.
177
178If the \var{protocol} parameter is ommitted, protocol 0 is used.
Neal Norwitzd08baa92003-02-21 00:26:33 +0000179If \var{protocol} is specified as a negative value
180or \constant{HIGHEST_PROTOCOL},
Neal Norwitz12d31e22003-02-13 03:12:48 +0000181the highest protocol version will be used.
182
183\versionchanged[The \var{protocol} parameter was added.
184The \var{bin} parameter is deprecated and only provided
185for backwards compatibility. You should use the \var{protocol}
186parameter instead]{2.3}
187
Barry Warsawf595fd92001-11-15 23:39:07 +0000188If the optional \var{bin} argument is true, the binary pickle format
189is used; otherwise the (less efficient) text pickle format is used
190(for backwards compatibility, this is the default).
Guido van Rossumd1883581995-02-15 15:53:08 +0000191
Barry Warsawf595fd92001-11-15 23:39:07 +0000192\var{file} must have a \method{write()} method that accepts a single
193string argument. It can thus be a file object opened for writing, a
194\refmodule{StringIO} object, or any other custom
195object that meets this interface.
196\end{funcdesc}
Guido van Rossumd1883581995-02-15 15:53:08 +0000197
Barry Warsawf595fd92001-11-15 23:39:07 +0000198\begin{funcdesc}{load}{file}
199Read a string from the open file object \var{file} and interpret it as
200a pickle data stream, reconstructing and returning the original object
201hierarchy. This is equivalent to \code{Unpickler(\var{file}).load()}.
Guido van Rossum470be141995-03-17 16:07:09 +0000202
Barry Warsawf595fd92001-11-15 23:39:07 +0000203\var{file} must have two methods, a \method{read()} method that takes
204an integer argument, and a \method{readline()} method that requires no
205arguments. Both methods should return a string. Thus \var{file} can
206be a file object opened for reading, a
207\module{StringIO} object, or any other custom
208object that meets this interface.
Guido van Rossum736fe5e1997-12-09 20:45:08 +0000209
Barry Warsawf595fd92001-11-15 23:39:07 +0000210This function automatically determines whether the data stream was
211written in binary mode or not.
212\end{funcdesc}
Guido van Rossumd1883581995-02-15 15:53:08 +0000213
Neal Norwitz12d31e22003-02-13 03:12:48 +0000214\begin{funcdesc}{dumps}{object\optional{, protocol\optional{, bin}}}
Barry Warsawf595fd92001-11-15 23:39:07 +0000215Return the pickled representation of the object as a string, instead
Neal Norwitz12d31e22003-02-13 03:12:48 +0000216of writing it to a file.
217
218If the \var{protocol} parameter is ommitted, protocol 0 is used.
Neal Norwitzd08baa92003-02-21 00:26:33 +0000219If \var{protocol} is specified as a negative value
220or \constant{HIGHEST_PROTOCOL},
Neal Norwitz12d31e22003-02-13 03:12:48 +0000221the highest protocol version will be used.
222
223\versionchanged[The \var{protocol} parameter was added.
224The \var{bin} parameter is deprecated and only provided
225for backwards compatibility. You should use the \var{protocol}
226parameter instead]{2.3}
227
228If the optional \var{bin} argument is
Barry Warsawf595fd92001-11-15 23:39:07 +0000229true, the binary pickle format is used; otherwise the (less efficient)
230text pickle format is used (this is the default).
231\end{funcdesc}
Guido van Rossumd1883581995-02-15 15:53:08 +0000232
Barry Warsawf595fd92001-11-15 23:39:07 +0000233\begin{funcdesc}{loads}{string}
234Read a pickled object hierarchy from a string. Characters in the
235string past the pickled object's representation are ignored.
236\end{funcdesc}
Guido van Rossumd1883581995-02-15 15:53:08 +0000237
Barry Warsawf595fd92001-11-15 23:39:07 +0000238The \module{pickle} module also defines three exceptions:
Guido van Rossum470be141995-03-17 16:07:09 +0000239
Barry Warsawf595fd92001-11-15 23:39:07 +0000240\begin{excdesc}{PickleError}
241A common base class for the other exceptions defined below. This
242inherits from \exception{Exception}.
243\end{excdesc}
Guido van Rossum470be141995-03-17 16:07:09 +0000244
Barry Warsawf595fd92001-11-15 23:39:07 +0000245\begin{excdesc}{PicklingError}
246This exception is raised when an unpicklable object is passed to
247the \method{dump()} method.
248\end{excdesc}
Guido van Rossumd1883581995-02-15 15:53:08 +0000249
Barry Warsawf595fd92001-11-15 23:39:07 +0000250\begin{excdesc}{UnpicklingError}
Andrew M. Kuchling76963442003-05-14 16:51:46 +0000251This exception is raised when there is a problem unpickling an object.
252Note that other exceptions may also be raised during unpickling,
253including (but not necessarily limited to) \exception{AttributeError},
254\exception{EOFError}, \exception{ImportError}, and \exception{IndexError}.
Barry Warsawf595fd92001-11-15 23:39:07 +0000255\end{excdesc}
256
257The \module{pickle} module also exports two callables\footnote{In the
258\module{pickle} module these callables are classes, which you could
Fred Drake7c4d8f32003-09-10 20:47:43 +0000259subclass to customize the behavior. However, in the \refmodule{cPickle}
260module these callables are factory functions and so cannot be
261subclassed. One common reason to subclass is to control what
Andrew M. Kuchling76963442003-05-14 16:51:46 +0000262objects can actually be unpickled. See section~\ref{pickle-sub} for
Fred Drake7c4d8f32003-09-10 20:47:43 +0000263more details.}, \class{Pickler} and \class{Unpickler}:
Barry Warsawf595fd92001-11-15 23:39:07 +0000264
Neal Norwitz12d31e22003-02-13 03:12:48 +0000265\begin{classdesc}{Pickler}{file\optional{, protocol\optional{, bin}}}
Barry Warsawf595fd92001-11-15 23:39:07 +0000266This takes a file-like object to which it will write a pickle data
Neal Norwitz12d31e22003-02-13 03:12:48 +0000267stream.
268
269If the \var{protocol} parameter is ommitted, protocol 0 is used.
270If \var{protocol} is specified as a negative value,
271the highest protocol version will be used.
272
273\versionchanged[The \var{bin} parameter is deprecated and only provided
274for backwards compatibility. You should use the \var{protocol}
275parameter instead]{2.3}
276
277Optional \var{bin} if true, tells the pickler to use the more
Barry Warsawf595fd92001-11-15 23:39:07 +0000278efficient binary pickle format, otherwise the \ASCII{} format is used
279(this is the default).
280
281\var{file} must have a \method{write()} method that accepts a single
282string argument. It can thus be an open file object, a
283\module{StringIO} object, or any other custom
284object that meets this interface.
285\end{classdesc}
286
287\class{Pickler} objects define one (or two) public methods:
288
289\begin{methoddesc}[Pickler]{dump}{object}
290Write a pickled representation of \var{object} to the open file object
291given in the constructor. Either the binary or \ASCII{} format will
292be used, depending on the value of the \var{bin} flag passed to the
293constructor.
294\end{methoddesc}
295
296\begin{methoddesc}[Pickler]{clear_memo}{}
297Clears the pickler's ``memo''. The memo is the data structure that
298remembers which objects the pickler has already seen, so that shared
299or recursive objects pickled by reference and not by value. This
300method is useful when re-using picklers.
301
Fred Drake7f781c92002-05-01 20:33:53 +0000302\begin{notice}
303Prior to Python 2.3, \method{clear_memo()} was only available on the
304picklers created by \refmodule{cPickle}. In the \module{pickle} module,
305picklers have an instance variable called \member{memo} which is a
306Python dictionary. So to clear the memo for a \module{pickle} module
Barry Warsawf595fd92001-11-15 23:39:07 +0000307pickler, you could do the following:
Guido van Rossumd1883581995-02-15 15:53:08 +0000308
Fred Drake19479911998-02-13 06:58:54 +0000309\begin{verbatim}
Barry Warsawf595fd92001-11-15 23:39:07 +0000310mypickler.memo.clear()
Fred Drake19479911998-02-13 06:58:54 +0000311\end{verbatim}
Fred Drake7f781c92002-05-01 20:33:53 +0000312
313Code that does not need to support older versions of Python should
314simply use \method{clear_memo()}.
315\end{notice}
Barry Warsawf595fd92001-11-15 23:39:07 +0000316\end{methoddesc}
Fred Drake9b28fe21998-04-04 06:20:28 +0000317
Barry Warsawf595fd92001-11-15 23:39:07 +0000318It is possible to make multiple calls to the \method{dump()} method of
319the same \class{Pickler} instance. These must then be matched to the
320same number of calls to the \method{load()} method of the
321corresponding \class{Unpickler} instance. If the same object is
322pickled by multiple \method{dump()} calls, the \method{load()} will
Fred Drakef5f0c172003-09-09 19:49:18 +0000323all yield references to the same object.\footnote{\emph{Warning}: this
Barry Warsawf595fd92001-11-15 23:39:07 +0000324is intended for pickling multiple objects without intervening
325modifications to the objects or their parts. If you modify an object
326and then pickle it again using the same \class{Pickler} instance, the
327object is not pickled again --- a reference to it is pickled and the
328\class{Unpickler} will return the old value, not the modified one.
329There are two problems here: (1) detecting changes, and (2)
330marshalling a minimal set of changes. Garbage Collection may also
Fred Drakef5f0c172003-09-09 19:49:18 +0000331become a problem here.}
Guido van Rossum470be141995-03-17 16:07:09 +0000332
Barry Warsawf595fd92001-11-15 23:39:07 +0000333\class{Unpickler} objects are defined as:
Fred Drake9b28fe21998-04-04 06:20:28 +0000334
Barry Warsawf595fd92001-11-15 23:39:07 +0000335\begin{classdesc}{Unpickler}{file}
336This takes a file-like object from which it will read a pickle data
337stream. This class automatically determines whether the data stream
338was written in binary mode or not, so it does not need a flag as in
339the \class{Pickler} factory.
Guido van Rossumd1883581995-02-15 15:53:08 +0000340
Barry Warsawf595fd92001-11-15 23:39:07 +0000341\var{file} must have two methods, a \method{read()} method that takes
342an integer argument, and a \method{readline()} method that requires no
343arguments. Both methods should return a string. Thus \var{file} can
344be a file object opened for reading, a
345\module{StringIO} object, or any other custom
346object that meets this interface.
347\end{classdesc}
Fred Drake9b28fe21998-04-04 06:20:28 +0000348
Barry Warsawf595fd92001-11-15 23:39:07 +0000349\class{Unpickler} objects have one (or two) public methods:
Guido van Rossum470be141995-03-17 16:07:09 +0000350
Barry Warsawf595fd92001-11-15 23:39:07 +0000351\begin{methoddesc}[Unpickler]{load}{}
352Read a pickled object representation from the open file object given
353in the constructor, and return the reconstituted object hierarchy
354specified therein.
355\end{methoddesc}
Fred Drake9b28fe21998-04-04 06:20:28 +0000356
Barry Warsawf595fd92001-11-15 23:39:07 +0000357\begin{methoddesc}[Unpickler]{noload}{}
358This is just like \method{load()} except that it doesn't actually
359create any objects. This is useful primarily for finding what's
360called ``persistent ids'' that may be referenced in a pickle data
361stream. See section~\ref{pickle-protocol} below for more details.
Guido van Rossumd1883581995-02-15 15:53:08 +0000362
Barry Warsawf595fd92001-11-15 23:39:07 +0000363\strong{Note:} the \method{noload()} method is currently only
364available on \class{Unpickler} objects created with the
365\module{cPickle} module. \module{pickle} module \class{Unpickler}s do
366not have the \method{noload()} method.
367\end{methoddesc}
368
369\subsection{What can be pickled and unpickled?}
Guido van Rossum736fe5e1997-12-09 20:45:08 +0000370
Guido van Rossumd1883581995-02-15 15:53:08 +0000371The following types can be pickled:
Fred Drake41796911999-07-02 14:25:37 +0000372
Guido van Rossumd1883581995-02-15 15:53:08 +0000373\begin{itemize}
374
Raymond Hettingeracb45d72002-08-05 03:55:36 +0000375\item \code{None}, \code{True}, and \code{False}
Guido van Rossumd1883581995-02-15 15:53:08 +0000376
Barry Warsawf595fd92001-11-15 23:39:07 +0000377\item integers, long integers, floating point numbers, complex numbers
Guido van Rossumd1883581995-02-15 15:53:08 +0000378
Fred Drake56ced2a2000-04-06 15:04:30 +0000379\item normal and Unicode strings
Guido van Rossumd1883581995-02-15 15:53:08 +0000380
Barry Warsawf595fd92001-11-15 23:39:07 +0000381\item tuples, lists, and dictionaries containing only picklable objects
Guido van Rossumd1883581995-02-15 15:53:08 +0000382
Barry Warsawf595fd92001-11-15 23:39:07 +0000383\item functions defined at the top level of a module
Fred Drake38e5d272000-04-03 20:13:55 +0000384
Barry Warsawf595fd92001-11-15 23:39:07 +0000385\item built-in functions defined at the top level of a module
Fred Drake38e5d272000-04-03 20:13:55 +0000386
Barry Warsawf595fd92001-11-15 23:39:07 +0000387\item classes that are defined at the top level of a module
Guido van Rossum470be141995-03-17 16:07:09 +0000388
Fred Drake9b28fe21998-04-04 06:20:28 +0000389\item instances of such classes whose \member{__dict__} or
Barry Warsawf595fd92001-11-15 23:39:07 +0000390\method{__setstate__()} is picklable (see
391section~\ref{pickle-protocol} for details)
Guido van Rossumd1883581995-02-15 15:53:08 +0000392
393\end{itemize}
394
Guido van Rossum470be141995-03-17 16:07:09 +0000395Attempts to pickle unpicklable objects will raise the
Fred Drake9b28fe21998-04-04 06:20:28 +0000396\exception{PicklingError} exception; when this happens, an unspecified
Barry Warsawf595fd92001-11-15 23:39:07 +0000397number of bytes may have already been written to the underlying file.
Guido van Rossumd1883581995-02-15 15:53:08 +0000398
Barry Warsawf595fd92001-11-15 23:39:07 +0000399Note that functions (built-in and user-defined) are pickled by ``fully
400qualified'' name reference, not by value. This means that only the
401function name is pickled, along with the name of module the function
402is defined in. Neither the function's code, nor any of its function
403attributes are pickled. Thus the defining module must be importable
404in the unpickling environment, and the module must contain the named
Fred Drakef5f0c172003-09-09 19:49:18 +0000405object, otherwise an exception will be raised.\footnote{The exception
Barry Warsawf595fd92001-11-15 23:39:07 +0000406raised will likely be an \exception{ImportError} or an
Fred Drakef5f0c172003-09-09 19:49:18 +0000407\exception{AttributeError} but it could be something else.}
Guido van Rossum470be141995-03-17 16:07:09 +0000408
Barry Warsawf595fd92001-11-15 23:39:07 +0000409Similarly, classes are pickled by named reference, so the same
410restrictions in the unpickling environment apply. Note that none of
411the class's code or data is pickled, so in the following example the
412class attribute \code{attr} is not restored in the unpickling
413environment:
Guido van Rossum470be141995-03-17 16:07:09 +0000414
Barry Warsawf595fd92001-11-15 23:39:07 +0000415\begin{verbatim}
416class Foo:
417 attr = 'a class attr'
Guido van Rossum470be141995-03-17 16:07:09 +0000418
Barry Warsawf595fd92001-11-15 23:39:07 +0000419picklestring = pickle.dumps(Foo)
420\end{verbatim}
Guido van Rossum470be141995-03-17 16:07:09 +0000421
Barry Warsawf595fd92001-11-15 23:39:07 +0000422These restrictions are why picklable functions and classes must be
423defined in the top level of a module.
Guido van Rossum470be141995-03-17 16:07:09 +0000424
Barry Warsawf595fd92001-11-15 23:39:07 +0000425Similarly, when class instances are pickled, their class's code and
426data are not pickled along with them. Only the instance data are
427pickled. This is done on purpose, so you can fix bugs in a class or
428add methods to the class and still load objects that were created with
429an earlier version of the class. If you plan to have long-lived
430objects that will see many versions of a class, it may be worthwhile
431to put a version number in the objects so that suitable conversions
432can be made by the class's \method{__setstate__()} method.
Guido van Rossum470be141995-03-17 16:07:09 +0000433
Barry Warsawf595fd92001-11-15 23:39:07 +0000434\subsection{The pickle protocol
435\label{pickle-protocol}}\setindexsubitem{(pickle protocol)}
Fred Drake40748961998-03-06 21:27:14 +0000436
Barry Warsawf595fd92001-11-15 23:39:07 +0000437This section describes the ``pickling protocol'' that defines the
438interface between the pickler/unpickler and the objects that are being
439serialized. This protocol provides a standard way for you to define,
440customize, and control how your objects are serialized and
441de-serialized. The description in this section doesn't cover specific
442customizations that you can employ to make the unpickling environment
Andrew M. Kuchling76963442003-05-14 16:51:46 +0000443slightly safer from untrusted pickle data streams; see section~\ref{pickle-sub}
Barry Warsawf595fd92001-11-15 23:39:07 +0000444for more details.
Fred Drake40748961998-03-06 21:27:14 +0000445
Barry Warsawf595fd92001-11-15 23:39:07 +0000446\subsubsection{Pickling and unpickling normal class
447 instances\label{pickle-inst}}
Fred Drake9b28fe21998-04-04 06:20:28 +0000448
Barry Warsawf595fd92001-11-15 23:39:07 +0000449When a pickled class instance is unpickled, its \method{__init__()}
450method is normally \emph{not} invoked. If it is desirable that the
451\method{__init__()} method be called on unpickling, a class can define
452a method \method{__getinitargs__()}, which should return a
453\emph{tuple} containing the arguments to be passed to the class
454constructor (i.e. \method{__init__()}). The
455\method{__getinitargs__()} method is called at
456pickle time; the tuple it returns is incorporated in the pickle for
457the instance.
458\withsubitem{(copy protocol)}{\ttindex{__getinitargs__()}}
459\withsubitem{(instance constructor)}{\ttindex{__init__()}}
Fred Drake17e56401998-04-11 20:43:51 +0000460
Barry Warsawf595fd92001-11-15 23:39:07 +0000461\withsubitem{(copy protocol)}{
462 \ttindex{__getstate__()}\ttindex{__setstate__()}}
463\withsubitem{(instance attribute)}{
464 \ttindex{__dict__}}
Fred Drake17e56401998-04-11 20:43:51 +0000465
Barry Warsawf595fd92001-11-15 23:39:07 +0000466Classes can further influence how their instances are pickled; if the
467class defines the method \method{__getstate__()}, it is called and the
468return state is pickled as the contents for the instance, instead of
469the contents of the instance's dictionary. If there is no
470\method{__getstate__()} method, the instance's \member{__dict__} is
471pickled.
Fred Drake9463de21998-04-11 20:05:43 +0000472
Barry Warsawf595fd92001-11-15 23:39:07 +0000473Upon unpickling, if the class also defines the method
474\method{__setstate__()}, it is called with the unpickled
Fred Drakef5f0c172003-09-09 19:49:18 +0000475state.\footnote{These methods can also be used to implement copying
476class instances.} If there is no \method{__setstate__()} method, the
Fred Drakee9cfcef2002-11-27 05:26:46 +0000477pickled state must be a dictionary and its items are assigned to the
Barry Warsawf595fd92001-11-15 23:39:07 +0000478new instance's dictionary. If a class defines both
479\method{__getstate__()} and \method{__setstate__()}, the state object
480needn't be a dictionary and these methods can do what they
Fred Drakee9cfcef2002-11-27 05:26:46 +0000481want.\footnote{This protocol is also used by the shallow and deep
Barry Warsawf595fd92001-11-15 23:39:07 +0000482copying operations defined in the
Fred Drakee9cfcef2002-11-27 05:26:46 +0000483\refmodule{copy} module.}
484
485\begin{notice}[warning]
486 For new-style classes, if \method{__getstate__()} returns a false
487 value, the \method{__setstate__()} method will not be called.
488\end{notice}
489
Barry Warsawf595fd92001-11-15 23:39:07 +0000490
491\subsubsection{Pickling and unpickling extension types}
492
493When the \class{Pickler} encounters an object of a type it knows
494nothing about --- such as an extension type --- it looks in two places
495for a hint of how to pickle it. One alternative is for the object to
496implement a \method{__reduce__()} method. If provided, at pickling
497time \method{__reduce__()} will be called with no arguments, and it
498must return either a string or a tuple.
499
500If a string is returned, it names a global variable whose contents are
501pickled as normal. When a tuple is returned, it must be of length two
502or three, with the following semantics:
503
504\begin{itemize}
505
506\item A callable object, which in the unpickling environment must be
507 either a class, a callable registered as a ``safe constructor''
508 (see below), or it must have an attribute
509 \member{__safe_for_unpickling__} with a true value. Otherwise,
510 an \exception{UnpicklingError} will be raised in the unpickling
511 environment. Note that as usual, the callable itself is pickled
512 by name.
513
514\item A tuple of arguments for the callable object, or \code{None}.
Raymond Hettinger97394bc2002-05-21 17:22:02 +0000515\deprecated{2.3}{Use the tuple of arguments instead}
Barry Warsawf595fd92001-11-15 23:39:07 +0000516
517\item Optionally, the object's state, which will be passed to
518 the object's \method{__setstate__()} method as described in
519 section~\ref{pickle-inst}. If the object has no
520 \method{__setstate__()} method, then, as above, the value must
521 be a dictionary and it will be added to the object's
522 \member{__dict__}.
523
524\end{itemize}
525
526Upon unpickling, the callable will be called (provided that it meets
527the above criteria), passing in the tuple of arguments; it should
Raymond Hettinger97394bc2002-05-21 17:22:02 +0000528return the unpickled object.
529
530If the second item was \code{None}, then instead of calling the
531callable directly, its \method{__basicnew__()} method is called
532without arguments. It should also return the unpickled object.
533
534\deprecated{2.3}{Use the tuple of arguments instead}
Barry Warsawf595fd92001-11-15 23:39:07 +0000535
536An alternative to implementing a \method{__reduce__()} method on the
537object to be pickled, is to register the callable with the
Fred Drake2744f432001-11-26 21:30:36 +0000538\refmodule[copyreg]{copy_reg} module. This module provides a way
Barry Warsawf595fd92001-11-15 23:39:07 +0000539for programs to register ``reduction functions'' and constructors for
540user-defined types. Reduction functions have the same semantics and
541interface as the \method{__reduce__()} method described above, except
542that they are called with a single argument, the object to be pickled.
543
544The registered constructor is deemed a ``safe constructor'' for purposes
545of unpickling as described above.
546
547\subsubsection{Pickling and unpickling external objects}
548
549For the benefit of object persistence, the \module{pickle} module
550supports the notion of a reference to an object outside the pickled
551data stream. Such objects are referenced by a ``persistent id'',
552which is just an arbitrary string of printable \ASCII{} characters.
553The resolution of such names is not defined by the \module{pickle}
554module; it will delegate this resolution to user defined functions on
Fred Drakef5f0c172003-09-09 19:49:18 +0000555the pickler and unpickler.\footnote{The actual mechanism for
Barry Warsawf595fd92001-11-15 23:39:07 +0000556associating these user defined functions is slightly different for
557\module{pickle} and \module{cPickle}. The description given here
558works the same for both implementations. Users of the \module{pickle}
559module could also use subclassing to effect the same results,
560overriding the \method{persistent_id()} and \method{persistent_load()}
Fred Drakef5f0c172003-09-09 19:49:18 +0000561methods in the derived classes.}
Barry Warsawf595fd92001-11-15 23:39:07 +0000562
563To define external persistent id resolution, you need to set the
564\member{persistent_id} attribute of the pickler object and the
565\member{persistent_load} attribute of the unpickler object.
566
567To pickle objects that have an external persistent id, the pickler
568must have a custom \function{persistent_id()} method that takes an
569object as an argument and returns either \code{None} or the persistent
570id for that object. When \code{None} is returned, the pickler simply
571pickles the object as normal. When a persistent id string is
572returned, the pickler will pickle that string, along with a marker
573so that the unpickler will recognize the string as a persistent id.
574
575To unpickle external objects, the unpickler must have a custom
576\function{persistent_load()} function that takes a persistent id
577string and returns the referenced object.
578
579Here's a silly example that \emph{might} shed more light:
580
581\begin{verbatim}
582import pickle
583from cStringIO import StringIO
584
585src = StringIO()
586p = pickle.Pickler(src)
587
588def persistent_id(obj):
589 if hasattr(obj, 'x'):
590 return 'the value %d' % obj.x
591 else:
592 return None
593
594p.persistent_id = persistent_id
595
596class Integer:
597 def __init__(self, x):
598 self.x = x
599 def __str__(self):
600 return 'My name is integer %d' % self.x
601
602i = Integer(7)
603print i
604p.dump(i)
605
606datastream = src.getvalue()
607print repr(datastream)
608dst = StringIO(datastream)
609
610up = pickle.Unpickler(dst)
611
612class FancyInteger(Integer):
613 def __str__(self):
614 return 'I am the integer %d' % self.x
615
616def persistent_load(persid):
617 if persid.startswith('the value '):
618 value = int(persid.split()[2])
619 return FancyInteger(value)
620 else:
621 raise pickle.UnpicklingError, 'Invalid persistent id'
622
623up.persistent_load = persistent_load
624
625j = up.load()
626print j
627\end{verbatim}
628
629In the \module{cPickle} module, the unpickler's
630\member{persistent_load} attribute can also be set to a Python
631list, in which case, when the unpickler reaches a persistent id, the
632persistent id string will simply be appended to this list. This
633functionality exists so that a pickle data stream can be ``sniffed''
634for object references without actually instantiating all the objects
Fred Drakef5f0c172003-09-09 19:49:18 +0000635in a pickle.\footnote{We'll leave you with the image of Guido and Jim
636sitting around sniffing pickles in their living rooms.} Setting
Barry Warsawf595fd92001-11-15 23:39:07 +0000637\member{persistent_load} to a list is usually used in conjunction with
638the \method{noload()} method on the Unpickler.
639
640% BAW: Both pickle and cPickle support something called
641% inst_persistent_id() which appears to give unknown types a second
642% shot at producing a persistent id. Since Jim Fulton can't remember
643% why it was added or what it's for, I'm leaving it undocumented.
644
Andrew M. Kuchling76963442003-05-14 16:51:46 +0000645\subsection{Subclassing Unpicklers \label{pickle-sub}}
Barry Warsawf595fd92001-11-15 23:39:07 +0000646
Andrew M. Kuchling76963442003-05-14 16:51:46 +0000647By default, unpickling will import any class that it finds in the
648pickle data. You can control exactly what gets unpickled and what
649gets called by customizing your unpickler. Unfortunately, exactly how
650you do this is different depending on whether you're using
651\module{pickle} or \module{cPickle}.\footnote{A word of caution: the
Barry Warsawf595fd92001-11-15 23:39:07 +0000652mechanisms described here use internal attributes and methods, which
653are subject to change in future versions of Python. We intend to
654someday provide a common interface for controlling this behavior,
Fred Drakef5f0c172003-09-09 19:49:18 +0000655which will work in either \module{pickle} or \module{cPickle}.}
Barry Warsawf595fd92001-11-15 23:39:07 +0000656
657In the \module{pickle} module, you need to derive a subclass from
658\class{Unpickler}, overriding the \method{load_global()}
659method. \method{load_global()} should read two lines from the pickle
Raymond Hettingerf17d65d2003-08-12 00:01:16 +0000660data stream where the first line will the name of the module
Barry Warsawf595fd92001-11-15 23:39:07 +0000661containing the class and the second line will be the name of the
Andrew M. Kuchling76963442003-05-14 16:51:46 +0000662instance's class. It then looks up the class, possibly importing the
Barry Warsawf595fd92001-11-15 23:39:07 +0000663module and digging out the attribute, then it appends what it finds to
664the unpickler's stack. Later on, this class will be assigned to the
665\member{__class__} attribute of an empty class, as a way of magically
666creating an instance without calling its class's \method{__init__()}.
Andrew M. Kuchling76963442003-05-14 16:51:46 +0000667Your job (should you choose to accept it), would be to have
Barry Warsawf595fd92001-11-15 23:39:07 +0000668\method{load_global()} push onto the unpickler's stack, a known safe
669version of any class you deem safe to unpickle. It is up to you to
670produce such a class. Or you could raise an error if you want to
671disallow all unpickling of instances. If this sounds like a hack,
Andrew M. Kuchling76963442003-05-14 16:51:46 +0000672you're right. Refer to the source code to make this work.
Barry Warsawf595fd92001-11-15 23:39:07 +0000673
674Things are a little cleaner with \module{cPickle}, but not by much.
675To control what gets unpickled, you can set the unpickler's
676\member{find_global} attribute to a function or \code{None}. If it is
677\code{None} then any attempts to unpickle instances will raise an
678\exception{UnpicklingError}. If it is a function,
679then it should accept a module name and a class name, and return the
680corresponding class object. It is responsible for looking up the
Andrew M. Kuchling76963442003-05-14 16:51:46 +0000681class and performing any necessary imports, and it may raise an
Barry Warsawf595fd92001-11-15 23:39:07 +0000682error to prevent instances of the class from being unpickled.
683
684The moral of the story is that you should be really careful about the
685source of the strings your application unpickles.
Fred Drake9463de21998-04-11 20:05:43 +0000686
Fred Drake38e5d272000-04-03 20:13:55 +0000687\subsection{Example \label{pickle-example}}
688
689Here's a simple example of how to modify pickling behavior for a
690class. The \class{TextReader} class opens a text file, and returns
691the line number and line contents each time its \method{readline()}
692method is called. If a \class{TextReader} instance is pickled, all
693attributes \emph{except} the file object member are saved. When the
694instance is unpickled, the file is reopened, and reading resumes from
695the last location. The \method{__setstate__()} and
696\method{__getstate__()} methods are used to implement this behavior.
697
698\begin{verbatim}
Fred Drake38e5d272000-04-03 20:13:55 +0000699class TextReader:
Fred Drakec8252802001-09-25 16:29:17 +0000700 """Print and number lines in a text file."""
701 def __init__(self, file):
Fred Drake38e5d272000-04-03 20:13:55 +0000702 self.file = file
Fred Drakec8252802001-09-25 16:29:17 +0000703 self.fh = open(file)
Fred Drake38e5d272000-04-03 20:13:55 +0000704 self.lineno = 0
705
706 def readline(self):
707 self.lineno = self.lineno + 1
708 line = self.fh.readline()
709 if not line:
710 return None
Fred Drakec8252802001-09-25 16:29:17 +0000711 if line.endswith("\n"):
712 line = line[:-1]
713 return "%d: %s" % (self.lineno, line)
Fred Drake38e5d272000-04-03 20:13:55 +0000714
Fred Drake38e5d272000-04-03 20:13:55 +0000715 def __getstate__(self):
Fred Drakec8252802001-09-25 16:29:17 +0000716 odict = self.__dict__.copy() # copy the dict since we change it
717 del odict['fh'] # remove filehandle entry
Fred Drake38e5d272000-04-03 20:13:55 +0000718 return odict
719
Fred Drake38e5d272000-04-03 20:13:55 +0000720 def __setstate__(self,dict):
Fred Drakec8252802001-09-25 16:29:17 +0000721 fh = open(dict['file']) # reopen file
722 count = dict['lineno'] # read from file...
723 while count: # until line count is restored
Fred Drake38e5d272000-04-03 20:13:55 +0000724 fh.readline()
725 count = count - 1
Fred Drakec8252802001-09-25 16:29:17 +0000726 self.__dict__.update(dict) # update attributes
727 self.fh = fh # save the file object
Fred Drake38e5d272000-04-03 20:13:55 +0000728\end{verbatim}
729
730A sample usage might be something like this:
731
732\begin{verbatim}
733>>> import TextReader
734>>> obj = TextReader.TextReader("TextReader.py")
735>>> obj.readline()
736'1: #!/usr/local/bin/python'
737>>> # (more invocations of obj.readline() here)
738... obj.readline()
739'7: class TextReader:'
740>>> import pickle
741>>> pickle.dump(obj,open('save.p','w'))
Fred Drakec8252802001-09-25 16:29:17 +0000742\end{verbatim}
Fred Drake38e5d272000-04-03 20:13:55 +0000743
Fred Drakec8252802001-09-25 16:29:17 +0000744If you want to see that \refmodule{pickle} works across Python
745processes, start another Python session, before continuing. What
746follows can happen from either the same process or a new process.
Fred Drake38e5d272000-04-03 20:13:55 +0000747
Fred Drakec8252802001-09-25 16:29:17 +0000748\begin{verbatim}
Fred Drake38e5d272000-04-03 20:13:55 +0000749>>> import pickle
750>>> reader = pickle.load(open('save.p'))
751>>> reader.readline()
752'8: "Print and number lines in a text file."'
753\end{verbatim}
754
755
Barry Warsawf595fd92001-11-15 23:39:07 +0000756\begin{seealso}
757 \seemodule[copyreg]{copy_reg}{Pickle interface constructor
758 registration for extension types.}
759
760 \seemodule{shelve}{Indexed databases of objects; uses \module{pickle}.}
761
762 \seemodule{copy}{Shallow and deep object copying.}
763
764 \seemodule{marshal}{High-performance serialization of built-in types.}
765\end{seealso}
766
767
768\section{\module{cPickle} --- A faster \module{pickle}}
Fred Drakeffbe6871999-04-22 21:23:22 +0000769
Fred Drakeb91e9341998-07-23 17:59:49 +0000770\declaremodule{builtin}{cPickle}
Fred Drake38e5d272000-04-03 20:13:55 +0000771\modulesynopsis{Faster version of \refmodule{pickle}, but not subclassable.}
Fred Drakeffbe6871999-04-22 21:23:22 +0000772\moduleauthor{Jim Fulton}{jfulton@digicool.com}
773\sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}
Fred Drakeb91e9341998-07-23 17:59:49 +0000774
Barry Warsawf595fd92001-11-15 23:39:07 +0000775The \module{cPickle} module supports serialization and
776de-serialization of Python objects, providing an interface and
777functionality nearly identical to the
778\refmodule{pickle}\refstmodindex{pickle} module. There are several
779differences, the most important being performance and subclassability.
Fred Drake9463de21998-04-11 20:05:43 +0000780
Barry Warsawf595fd92001-11-15 23:39:07 +0000781First, \module{cPickle} can be up to 1000 times faster than
782\module{pickle} because the former is implemented in C. Second, in
783the \module{cPickle} module the callables \function{Pickler()} and
784\function{Unpickler()} are functions, not classes. This means that
785you cannot use them to derive custom pickling and unpickling
786subclasses. Most applications have no need for this functionality and
787should benefit from the greatly improved performance of the
788\module{cPickle} module.
Fred Drake9463de21998-04-11 20:05:43 +0000789
Barry Warsawf595fd92001-11-15 23:39:07 +0000790The pickle data stream produced by \module{pickle} and
791\module{cPickle} are identical, so it is possible to use
792\module{pickle} and \module{cPickle} interchangeably with existing
Fred Drakef5f0c172003-09-09 19:49:18 +0000793pickles.\footnote{Since the pickle data format is actually a tiny
Barry Warsawf595fd92001-11-15 23:39:07 +0000794stack-oriented programming language, and some freedom is taken in the
795encodings of certain objects, it is possible that the two modules
796produce different data streams for the same input objects. However it
797is guaranteed that they will always be able to read each other's
Fred Drakef5f0c172003-09-09 19:49:18 +0000798data streams.}
Guido van Rossumcf3ce921999-01-06 23:34:39 +0000799
Barry Warsawf595fd92001-11-15 23:39:07 +0000800There are additional minor differences in API between \module{cPickle}
801and \module{pickle}, however for most applications, they are
802interchangable. More documentation is provided in the
803\module{pickle} module documentation, which
804includes a list of the documented differences.
805
806