\section{Standard Module \sectcode{pickle}}
\label{module-pickle}
\stmodindex{pickle}
\index{persistency}
\indexii{persistent}{objects}
\indexii{serializing}{objects}
\indexii{marshalling}{objects}
\indexii{flattening}{objects}
\indexii{pickling}{objects}

\renewcommand{\indexsubitem}{(in module pickle)}

The \code{pickle} module implements a basic but powerful algorithm for
``pickling'' (a.k.a.\ serializing, marshalling or flattening) nearly
arbitrary Python objects. This is the act of converting objects to a
stream of bytes (and back: ``unpickling'').
This is a more primitive notion than
persistency --- although \code{pickle} reads and writes file objects,
it does not handle the issue of naming persistent objects, nor the
(even more complicated) area of concurrent access to persistent
objects. The \code{pickle} module can transform a complex object into
a byte stream and it can transform the byte stream into an object with
the same internal structure. The most obvious thing to do with these
byte streams is to write them onto a file, but it is also conceivable
to send them across a network or store them in a database. The module
\code{shelve} provides a simple interface to pickle and unpickle
objects on ``dbm''-style database files.
\refstmodindex{shelve}

\strong{Note:} The \code{pickle} module is rather slow. A
reimplementation of the same algorithm in C, which is up to 1000 times
faster, is available as the \code{cPickle} module. This has the same
interface except that \code{Pickler} and \code{Unpickler} are factory
functions, not classes (so they cannot be used as a base class for
inheritance).

Unlike the built-in module \code{marshal}, \code{pickle} handles the
following correctly:
\refbimodindex{marshal}

\begin{itemize}

\item recursive objects (objects containing references to themselves)

\item object sharing (references to the same object in different places)

\item user-defined classes and their instances

\end{itemize}
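
The first two points can be illustrated with a short sketch. It
assumes a present-day Python 3 interpreter, where \code{dumps()}
returns a byte string; the names \code{loop} and \code{pair} are made
up for the illustration.

```python
import pickle

# A list that contains itself (a recursive object).
loop = [1, 2]
loop.append(loop)

# Two slots of one container referring to the same list (object sharing).
shared = ["x"]
pair = [shared, shared]

loop_copy = pickle.loads(pickle.dumps(loop))
pair_copy = pickle.loads(pickle.dumps(pair))

# The copies are new objects, but their internal structure is preserved.
print(loop_copy[2] is loop_copy)     # the self-reference survives
print(pair_copy[0] is pair_copy[1])  # the sharing survives
```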

The data format used by \code{pickle} is Python-specific. This has
the advantage that there are no restrictions imposed by external
standards such as CORBA (which probably can't represent pointer
sharing or recursive objects); however, it means that non-Python
programs may not be able to reconstruct pickled Python objects.

By default, the \code{pickle} data format uses a printable \ASCII{}
representation. This is slightly more voluminous than a binary
representation. The big advantage of using printable \ASCII{} (and of
some other characteristics of \code{pickle}'s representation) is that
for debugging or recovery purposes it is possible for a human to read
the pickled file with a standard text editor.

A binary format, which is slightly more efficient, can be chosen by
specifying a nonzero (true) value for the \var{bin} argument to the
\code{Pickler} constructor or the \code{dump()} and \code{dumps()}
functions. The binary format is not the default because of backwards
compatibility with the Python 1.4 pickle module. In a future version,
the default may change to binary.

The \code{pickle} module doesn't handle code objects, which the
\code{marshal} module does. I suppose \code{pickle} could, and maybe
it should, but there's probably no great need for it right now (as
long as \code{marshal} continues to be used for reading and writing
code objects), and at least this avoids the possibility of smuggling
Trojan horses into a program.
\refbimodindex{marshal}

For the benefit of persistency modules written using \code{pickle}, it
supports the notion of a reference to an object outside the pickled
data stream. Such objects are referenced by a name, which is an
arbitrary string of printable \ASCII{} characters. The resolution of
such names is not defined by the \code{pickle} module --- the
persistent object module will have to implement a method
\code{persistent_load}. To write references to persistent objects,
the persistent module must define a method \code{persistent_id} which
returns either \code{None} or the persistent ID of the object.
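
A minimal sketch of such a persistency layer follows. It assumes a
present-day Python 3 interpreter, where one way to supply the two
methods is to subclass \code{Pickler} and \code{Unpickler}; the
\code{EXTERNAL} dictionary and the class names \code{RefPickler} and
\code{RefUnpickler} are hypothetical and stand in for a real object
store.

```python
import io
import pickle

# Hypothetical external store: objects that live outside the pickle stream.
EXTERNAL = {"config": {"debug": True}}

class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return a name for externally stored objects, None for all others.
        for name, stored in EXTERNAL.items():
            if obj is stored:
                return name
        return None

class RefUnpickler(pickle.Unpickler):
    def persistent_load(self, name):
        # Resolve the name back to the externally stored object.
        return EXTERNAL[name]

buf = io.BytesIO()
RefPickler(buf).dump(["payload", EXTERNAL["config"]])

buf.seek(0)
restored = RefUnpickler(buf).load()
print(restored[1] is EXTERNAL["config"])  # the reference was resolved, not copied
```

Only the name \code{"config"} is written to the stream; the contents
of the referenced dictionary are not.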

There are some restrictions on the pickling of class instances.

First of all, the class must be defined at the top level in a module.
Furthermore, all its instance variables must be picklable.

\renewcommand{\indexsubitem}{(pickle protocol)}

When a pickled class instance is unpickled, its \code{__init__} method
is normally \emph{not} invoked. \strong{Note:} This is a deviation
from previous versions of this module; the change was introduced in
Python 1.5b2. The reason for the change is that in many cases it is
desirable to have a constructor that requires arguments; it is a
(minor) nuisance to have to provide a \code{__getinitargs__} method.
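
The effect can be observed with a small sketch, assuming a present-day
Python 3 interpreter; the class \code{Counter} and its
\code{init_calls} tally are invented for the illustration.

```python
import pickle

class Counter:
    init_calls = 0  # class-level tally of __init__ invocations

    def __init__(self, start):
        Counter.init_calls += 1
        self.value = start

original = Counter(10)  # __init__ runs once here
clone = pickle.loads(pickle.dumps(original))

# The instance data came back, but __init__ was not called a second time.
print(clone.value, Counter.init_calls)
```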

If it is desirable that the \code{__init__} method be called on
unpickling, a class can define a method \code{__getinitargs__()},
which should return a {\em tuple} containing the arguments to be
passed to the class constructor (\code{__init__()}). This method is
called at pickle time; the tuple it returns is incorporated in the
pickle for the instance.
\ttindex{__getinitargs__}
\ttindex{__init__}

Classes can further influence how their instances are pickled --- if the class
defines the method \code{__getstate__()}, it is called and the return
state is pickled as the contents for the instance, and if the class
defines the method \code{__setstate__()}, it is called with the
unpickled state. (Note that these methods can also be used to
implement copying class instances.) If there is no
\code{__getstate__()} method, the instance's \code{__dict__} is
pickled. If there is no \code{__setstate__()} method, the pickled
object must be a dictionary and its items are assigned to the new
instance's dictionary. (If a class defines both \code{__getstate__()}
and \code{__setstate__()}, the state object needn't be a dictionary
--- these methods can do what they want.) This protocol is also used
by the shallow and deep copying operations defined in the \code{copy}
module.
\ttindex{__getstate__}
\ttindex{__setstate__}
\ttindex{__dict__}
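
As an illustration, the following sketch defines both methods and uses
a tuple rather than a dictionary as the state. It assumes a
present-day Python 3 interpreter; the class \code{Reading} and its
attributes are hypothetical.

```python
import copy
import pickle

class Reading:
    def __init__(self, celsius):
        self.celsius = celsius
        self.cache = {}  # scratch data that should not end up in the pickle

    def __getstate__(self):
        # The state need not be a dictionary, since __setstate__ is defined.
        return (self.celsius,)

    def __setstate__(self, state):
        (self.celsius,) = state
        self.cache = {}  # rebuilt rather than restored

r = Reading(21.5)
r.cache["fahrenheit"] = 70.7

unpickled = pickle.loads(pickle.dumps(r))
copied = copy.copy(r)  # the copy module goes through the same two methods

print(unpickled.celsius, unpickled.cache, copied.cache)
```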

Note that when class instances are pickled, their class's code and
data are not pickled along with them. Only the instance data are
pickled. This is done on purpose, so you can fix bugs in a class or
add methods and still load objects that were created with an earlier
version of the class. If you plan to have long-lived objects that
will see many versions of a class, it may be worthwhile to put a version
number in the objects so that suitable conversions can be made by the
class's \code{__setstate__()} method.
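
One possible shape for such a conversion is sketched below, assuming a
present-day Python 3 interpreter. The class \code{Account}, the
\code{"version"} key and the layout of the ``old'' state are all
hypothetical; they only show where the conversion would live.

```python
import pickle

class Account:
    VERSION = 2

    def __init__(self, owner, balance_cents):
        self.owner = owner
        self.balance_cents = balance_cents

    def __getstate__(self):
        # Store the instance dictionary plus a version number.
        state = dict(self.__dict__)
        state["version"] = self.VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop("version", 1)  # assume unversioned state is version 1
        if version < 2:
            # Hypothetical version 1 stored a float "balance" in whole units.
            state["balance_cents"] = int(round(state.pop("balance") * 100))
        self.__dict__.update(state)

# Simulate state written by the hypothetical older version of the class.
old = Account.__new__(Account)
old.__setstate__({"owner": "ann", "balance": 12.5})

# A round trip with the current version needs no conversion.
new = pickle.loads(pickle.dumps(Account("bob", 300)))

print(old.balance_cents, new.balance_cents)
```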

When a class itself is pickled, only its name is pickled --- the class
definition is not pickled, but re-imported by the unpickling process.
Therefore, the restriction that the class must be defined at the top
level in a module applies to pickled classes as well.
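
A quick way to see this, assuming a present-day Python 3 interpreter
and a made-up class \code{Point} defined at the top level of the
running module:

```python
import pickle

class Point:
    pass

data = pickle.dumps(Point)
restored = pickle.loads(data)

# The stream carries the class name; unpickling looks the class up again
# and therefore yields the very same class object.
print(b"Point" in data, restored is Point)
```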

\renewcommand{\indexsubitem}{(in module pickle)}

The interface can be summarized as follows.

To pickle an object \code{x} onto a file \code{f}, open for writing:

\bcode\begin{verbatim}
p = pickle.Pickler(f)
p.dump(x)
\end{verbatim}\ecode
%
A shorthand for this is:

\bcode\begin{verbatim}
pickle.dump(x, f)
\end{verbatim}\ecode
%
To unpickle an object \code{x} from a file \code{f}, open for reading:

\bcode\begin{verbatim}
u = pickle.Unpickler(f)
x = u.load()
\end{verbatim}\ecode
%
A shorthand is:

\bcode\begin{verbatim}
x = pickle.load(f)
\end{verbatim}\ecode
%
The \code{Pickler} class only calls the method \code{f.write} with a
string argument. The \code{Unpickler} calls the methods \code{f.read}
(with an integer argument) and \code{f.readline} (without argument),
both returning a string. It is explicitly allowed to pass non-file
objects here, as long as they have the right methods.
\ttindex{Unpickler}
\ttindex{Pickler}
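
The sketch below passes such non-file objects. It assumes a
present-day Python 3 interpreter, where the data handed to
\code{write} and returned by \code{read} and \code{readline} are byte
strings; the classes \code{ChunkRecorder} and \code{MinimalSource} are
invented for the illustration.

```python
import io
import pickle

class ChunkRecorder:
    """Write-only sink: the pickler only needs a write() method."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)

class MinimalSource:
    """Read-only source exposing just read(n) and readline()."""

    def __init__(self, data):
        self._stream = io.BytesIO(data)

    def read(self, n):
        return self._stream.read(n)

    def readline(self):
        return self._stream.readline()

sink = ChunkRecorder()
pickle.Pickler(sink).dump({"answer": 42})

source = MinimalSource(b"".join(sink.chunks))
value = pickle.Unpickler(source).load()
print(value)
```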

The constructor for the \code{Pickler} class has an optional second
argument, \var{bin}. If this is present and nonzero, the binary
pickle format is used; if it is zero or absent, the (less efficient,
but backwards compatible) text pickle format is used. The
\code{Unpickler} class does not have an argument to distinguish
between binary and text pickle formats; it accepts either format.
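
The difference between the two formats can be sketched as follows.
Note the assumption: present-day Python 3 interpreters no longer
accept a \var{bin} flag and select the format through a
\code{protocol} argument instead, so the sketch uses
\code{protocol=0} for the printable format and \code{protocol=1} for
the original binary format.

```python
import pickle

data = list(range(100))

text = pickle.dumps(data, protocol=0)    # printable ASCII format
binary = pickle.dumps(data, protocol=1)  # original binary format

# The text format decodes cleanly as ASCII and is the longer of the two.
text.decode("ascii")
print(len(text) > len(binary))

# No format argument is needed when reading: either stream is accepted.
from_text = pickle.loads(text)
from_binary = pickle.loads(binary)
```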

The following types can be pickled:
\begin{itemize}

\item \code{None}

\item integers, long integers, floating point numbers

\item strings

\item tuples, lists and dictionaries containing only picklable objects

\item classes that are defined at the top level in a module

\item instances of such classes whose \code{__dict__} or the state
returned by \code{__getstate__()} is picklable

\end{itemize}

Attempts to pickle unpicklable objects will raise the
\code{PicklingError} exception; when this happens, an unspecified
number of bytes may have been written to the file.

It is possible to make multiple calls to the \code{dump()} method of
the same \code{Pickler} instance. These must then be matched to the
same number of calls to the \code{load()} method of the
corresponding \code{Unpickler} instance. If the same object is
pickled by multiple \code{dump()} calls, the \code{load()} calls will all
yield references to the same object. {\em Warning}: this is intended
for pickling multiple objects without intervening modifications to the
objects or their parts. If you modify an object and then pickle it
again using the same \code{Pickler} instance, the object is not
pickled again --- a reference to it is pickled and the
\code{Unpickler} will return the old value, not the modified one.
(There are two problems here: (a) detecting changes, and (b)
marshalling a minimal set of changes. I have no answers. Garbage
Collection may also become a problem here.)
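
The warning can be reproduced with a short sketch, assuming a
present-day Python 3 interpreter and an in-memory
\code{io.BytesIO} buffer standing in for the file:

```python
import io
import pickle

buf = io.BytesIO()
p = pickle.Pickler(buf)

items = [1, 2]
p.dump(items)
items.append(3)  # modified between the two dump() calls
p.dump(items)    # only a reference to the earlier pickle is written

buf.seek(0)
u = pickle.Unpickler(buf)
first = u.load()
second = u.load()

# Both loads yield the same object, and it holds the old value.
print(first is second, second)
```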

Apart from the \code{Pickler} and \code{Unpickler} classes, the
module defines the following functions, and an exception:

\begin{funcdesc}{dump}{object\, file\optional{, bin}}
Write a pickled representation of \var{object} to the open file object
\var{file}. This is equivalent to
\code{Pickler(\var{file}, \var{bin}).dump(\var{object})}.
If the optional \var{bin} argument is present and nonzero, the binary
pickle format is used; if it is zero or absent, the (less efficient)
text pickle format is used.
\end{funcdesc}

\begin{funcdesc}{load}{file}
Read a pickled object from the open file object \var{file}. This is
equivalent to \code{Unpickler(\var{file}).load()}.
\end{funcdesc}

\begin{funcdesc}{dumps}{object\optional{, bin}}
Return the pickled representation of the object as a string, instead
of writing it to a file. If the optional \var{bin} argument is
present and nonzero, the binary pickle format is used; if it is zero
or absent, the (less efficient) text pickle format is used.
\end{funcdesc}

\begin{funcdesc}{loads}{string}
Read a pickled object from a string instead of a file. Characters in
the string past the pickled object's representation are ignored.
\end{funcdesc}
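
A round trip through \code{dumps()} and \code{loads()} might look like
this. The sketch assumes a present-day Python 3 interpreter, where the
pickled representation is a byte string rather than a character
string; the \code{record} value is made up.

```python
import pickle

record = {"name": "spam", "tags": ("a", "b"), "count": 3}

blob = pickle.dumps(record)
restored = pickle.loads(blob)

# Data following the pickled representation is ignored.
with_trailer = pickle.loads(blob + b"trailing bytes")

print(restored == record, with_trailer == record)
```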

\begin{excdesc}{PicklingError}
This exception is raised when an unpicklable object is passed to
\code{Pickler.dump()}.
\end{excdesc}
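
A defensive caller might handle the failure as sketched below. This
assumes a present-day Python 3 interpreter, where some unpicklable
objects raise \code{TypeError} or \code{AttributeError} rather than
\code{PicklingError}, which is why the sketch catches all three; the
helper \code{try_pickle} is hypothetical.

```python
import pickle

def try_pickle(obj):
    """Return None on success, or the exception raised while pickling."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError) as exc:
        return exc
    return None

# An anonymous function is not among the picklable types.
anonymous = lambda: 0
error = try_pickle(anonymous)
ok = try_pickle([1, 2, 3])

print(error is not None, ok is None)
```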