\section{Standard Module \sectcode{pickle}}
\label{module-pickle}
\stmodindex{pickle}
\index{persistency}
\indexii{persistent}{objects}
\indexii{serializing}{objects}
\indexii{marshalling}{objects}
\indexii{flattening}{objects}
\indexii{pickling}{objects}

\renewcommand{\indexsubitem}{(in module pickle)}

The \code{pickle} module implements a basic but powerful algorithm for
``pickling'' (a.k.a.\ serializing, marshalling or flattening) nearly
arbitrary Python objects. This is the act of converting objects to a
stream of bytes (and back: ``unpickling'').
This is a more primitive notion than
persistency --- although \code{pickle} reads and writes file objects,
it does not handle the issue of naming persistent objects, nor the
(even more complicated) area of concurrent access to persistent
objects. The \code{pickle} module can transform a complex object into
a byte stream and it can transform the byte stream into an object with
the same internal structure. The most obvious thing to do with these
byte streams is to write them onto a file, but it is also conceivable
to send them across a network or store them in a database. The module
\code{shelve} provides a simple interface to pickle and unpickle
objects on ``dbm''-style database files.
\refstmodindex{shelve}

\strong{Note:} The \code{pickle} module is rather slow. A
reimplementation of the same algorithm in C, which is up to 1000 times
faster, is available as the \code{cPickle} module. This has the same
interface except that \code{Pickler} and \code{Unpickler} are factory
functions, not classes (so they cannot be used as a base class for
inheritance).

Unlike the built-in module \code{marshal}, \code{pickle} handles the
following correctly:
\refbimodindex{marshal}

\begin{itemize}

\item recursive objects (objects containing references to themselves)

\item object sharing (references to the same object in different places)

\item user-defined classes and their instances

\end{itemize}
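
As a minimal sketch of the first two points, assuming a current Python
interpreter (where pickled data are bytes rather than strings), the
following builds a list that contains itself and a dictionary that
refers to one list twice, then checks that both structures survive a
round trip. All variable names are illustrative only.

```python
import pickle

# A recursive object: the list contains a reference to itself.
recursive = [1, 2]
recursive.append(recursive)

# Object sharing: two dictionary values refer to the same list.
shared = ["shared"]
container = {"a": shared, "b": shared}

recursive_copy = pickle.loads(pickle.dumps(recursive))
container_copy = pickle.loads(pickle.dumps(container))

# The copy is recursive in the same way as the original.
print(recursive_copy[2] is recursive_copy)
# Sharing is preserved: both keys refer to one list, not two equal ones.
print(container_copy["a"] is container_copy["b"])
```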

The data format used by \code{pickle} is Python-specific. This has
the advantage that there are no restrictions imposed by external
standards such as CORBA (which probably can't represent pointer
sharing or recursive objects); however it means that non-Python
programs may not be able to reconstruct pickled Python objects.

By default, the \code{pickle} data format uses a printable \ASCII{}
representation. This is slightly more voluminous than a binary
representation. The big advantage of using printable \ASCII{} (and of
some other characteristics of \code{pickle}'s representation) is that
for debugging or recovery purposes it is possible for a human to read
the pickled file with a standard text editor.

A binary format, which is slightly more efficient, can be chosen by
specifying a nonzero (true) value for the \var{bin} argument to the
\code{Pickler} constructor or the \code{dump()} and \code{dumps()}
functions. The binary format is not the default because of backwards
compatibility with the Python 1.4 pickle module. In a future version,
the default may change to binary.
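
The size difference between the two formats can be sketched as
follows. This example makes an assumption that goes beyond the
interface documented here: it targets current Python releases, in
which the \var{bin} flag has been superseded by a \var{protocol}
argument, with protocol 0 being the printable \ASCII{} format and
higher protocols being binary. The sample data are arbitrary.

```python
import pickle

data = list(range(100))

# Assumption: a current Python, where `protocol` replaces the `bin` flag.
# Protocol 0 is the printable ASCII format; protocol 2 is a binary format.
text_pickle = pickle.dumps(data, protocol=0)
binary_pickle = pickle.dumps(data, protocol=2)

# The text pickle decodes cleanly as ASCII, so it can be read in an editor.
readable = text_pickle.decode("ascii")

# Compare the sizes of the two representations.
print(len(text_pickle), len(binary_pickle))

# Loading needs no format flag; either representation is accepted.
from_text = pickle.loads(text_pickle)
from_binary = pickle.loads(binary_pickle)
```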

The \code{pickle} module doesn't handle code objects, which the
\code{marshal} module does. I suppose \code{pickle} could, and maybe
it should, but there's probably no great need for it right now (as
long as \code{marshal} continues to be used for reading and writing
code objects), and at least this avoids the possibility of smuggling
Trojan horses into a program.
\refbimodindex{marshal}

For the benefit of persistency modules written using \code{pickle}, it
supports the notion of a reference to an object outside the pickled
data stream. Such objects are referenced by a name, which is an
arbitrary string of printable \ASCII{} characters. The resolution of
such names is not defined by the \code{pickle} module --- the
persistent object module will have to implement a method
\code{persistent_load}. To write references to persistent objects,
the persistent module must define a method \code{persistent_id} which
returns either \code{None} or the persistent ID of the object.
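
One way to picture the two hooks is the sketch below. It assumes a
current Python interpreter, where \code{persistent_id} and
\code{persistent_load} are supplied as methods of \code{Pickler} and
\code{Unpickler} subclasses. The \code{Account} class, the
\code{REGISTRY} dictionary and the ID \code{"acct-1"} are hypothetical
stand-ins for whatever store a persistency module would manage.

```python
import io
import pickle


class Account:
    """Hypothetical object kept in an external store, not in the pickle."""

    def __init__(self, name):
        self.name = name


# Hypothetical external store mapping persistent IDs to live objects.
REGISTRY = {"acct-1": Account("alice")}


class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return the persistent ID for stored objects, None for all others
        # (None tells the pickler to pickle the object as usual).
        for pid, candidate in REGISTRY.items():
            if candidate is obj:
                return pid
        return None


class RefUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Resolve the name; how this is done is entirely up to the caller.
        return REGISTRY[pid]


buf = io.BytesIO()
RefPickler(buf).dump(["payload", REGISTRY["acct-1"]])
buf.seek(0)
restored = RefUnpickler(buf).load()

# The account was stored by reference, so the very same object comes back.
print(restored[1] is REGISTRY["acct-1"])
```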

There are some restrictions on the pickling of class instances.

First of all, the class must be defined at the top level in a module.
Furthermore, all its instance variables must be picklable.

\renewcommand{\indexsubitem}{(pickle protocol)}

When a pickled class instance is unpickled, its \code{__init__} method
is normally \emph{not} invoked. \strong{Note:} This is a deviation
from previous versions of this module; the change was introduced in
Python 1.5b2. The reason for the change is that in many cases it is
desirable to have a constructor that requires arguments; it is a
(minor) nuisance to have to provide a \code{__getinitargs__} method.
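
The effect can be observed with a small sketch, assuming a current
Python interpreter. The \code{Counter} class and its
\code{init_calls} attribute are hypothetical and exist only to count
constructor calls.

```python
import pickle


class Counter:
    # Hypothetical class-level tally of how often __init__ has run.
    init_calls = 0

    def __init__(self, start):
        Counter.init_calls += 1
        self.value = start


original = Counter(5)
clone = pickle.loads(pickle.dumps(original))

# The clone carries the instance data, yet __init__ ran only once,
# for the original object.
print(clone.value, Counter.init_calls)
```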

If it is desirable that the \code{__init__} method be called on
unpickling, a class can define a method \code{__getinitargs__()},
which should return a {\em tuple} containing the arguments to be
passed to the class constructor (\code{__init__()}).
\ttindex{__getinitargs__}
\ttindex{__init__}

Classes can further influence how their instances are pickled --- if the class
defines the method \code{__getstate__()}, it is called and the returned
state is pickled as the contents for the instance, and if the class
defines the method \code{__setstate__()}, it is called with the
unpickled state. (Note that these methods can also be used to
implement copying class instances.) If there is no
\code{__getstate__()} method, the instance's \code{__dict__} is
pickled. If there is no \code{__setstate__()} method, the pickled
object must be a dictionary and its items are assigned to the new
instance's dictionary. (If a class defines both \code{__getstate__()}
and \code{__setstate__()}, the state object needn't be a dictionary
--- these methods can do what they want.) This protocol is also used
by the shallow and deep copying operations defined in the \code{copy}
module.
\ttindex{__getstate__}
\ttindex{__setstate__}
\ttindex{__dict__}
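
A minimal sketch of the two methods, assuming a current Python
interpreter: the hypothetical \code{Document} class below leaves a
derived attribute out of the pickled state and rebuilds it on
unpickling. The class and attribute names are illustrative only.

```python
import pickle


class Document:
    """Hypothetical class with one stored and one derived attribute."""

    def __init__(self, text):
        self.text = text
        self.word_cache = text.split()  # derived data, cheap to rebuild

    def __getstate__(self):
        # Pickle a copy of the instance dictionary without the cache.
        state = self.__dict__.copy()
        del state["word_cache"]
        return state

    def __setstate__(self, state):
        # Restore the stored attributes, then rebuild the derived one.
        self.__dict__.update(state)
        self.word_cache = self.text.split()


doc = Document("hello pickled world")
payload = pickle.dumps(doc)
restored = pickle.loads(payload)

print(restored.text)
print(restored.word_cache)
```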

Note that when class instances are pickled, their class's code and
data are not pickled along with them. Only the instance data are
pickled. This is done on purpose, so you can fix bugs in a class or
add methods and still load objects that were created with an earlier
version of the class. If you plan to have long-lived objects that
will see many versions of a class, it may be worthwhile to put a version
number in the objects so that suitable conversions can be made by the
class's \code{__setstate__()} method.
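
The version-number idea might look like the sketch below, assuming a
current Python interpreter. Everything specific in it is hypothetical:
the \code{Point} class, the \code{version} key, and the premise that
version 1 of the class stored its coordinates as a single \code{pos}
tuple. The old state is simulated by calling \code{__setstate__()}
directly rather than by loading a real old pickle.

```python
import pickle


class Point:
    VERSION = 2  # hypothetical current layout: separate x and y

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getstate__(self):
        # Record the layout version alongside the instance data.
        state = self.__dict__.copy()
        state["version"] = self.VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        # Assumption: version 1 states carried no version key at all.
        version = state.pop("version", 1)
        if version < 2:
            # Convert the hypothetical version 1 layout ("pos" tuple).
            state["x"], state["y"] = state.pop("pos")
        self.__dict__.update(state)


# A state written by the current version round-trips unchanged.
current = pickle.loads(pickle.dumps(Point(1, 2)))

# Simulate restoring a state saved by the hypothetical version 1.
legacy = Point.__new__(Point)
legacy.__setstate__({"pos": (3, 4)})

print(current.x, current.y, legacy.x, legacy.y)
```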

When a class itself is pickled, only its name is pickled --- the class
definition is not pickled, but re-imported by the unpickling process.
Therefore, the restriction that the class must be defined at the top
level in a module applies to pickled classes as well.

\renewcommand{\indexsubitem}{(in module pickle)}

The interface can be summarized as follows.

To pickle an object \code{x} onto a file \code{f}, open for writing:

\bcode\begin{verbatim}
p = pickle.Pickler(f)
p.dump(x)
\end{verbatim}\ecode
%
A shorthand for this is:

\bcode\begin{verbatim}
pickle.dump(x, f)
\end{verbatim}\ecode
%
To unpickle an object \code{x} from a file \code{f}, open for reading:

\bcode\begin{verbatim}
u = pickle.Unpickler(f)
x = u.load()
\end{verbatim}\ecode
%
A shorthand is:

\bcode\begin{verbatim}
x = pickle.load(f)
\end{verbatim}\ecode
%
The \code{Pickler} class only calls the method \code{f.write} with a
string argument. The \code{Unpickler} calls the methods \code{f.read}
(with an integer argument) and \code{f.readline} (without argument),
both returning a string. It is explicitly allowed to pass non-file
objects here, as long as they have the right methods.
\ttindex{Unpickler}
\ttindex{Pickler}
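
To illustrate the last point, here is a sketch of a non-file object
that offers only the three methods named above. It assumes a current
Python interpreter, where the data passed to \code{write} and returned
by \code{read} and \code{readline} are bytes rather than strings. The
\code{MemoryChannel} class is hypothetical.

```python
import pickle


class MemoryChannel:
    """Hypothetical non-file object with only write, read and readline."""

    def __init__(self):
        self._data = bytearray()
        self._pos = 0  # read position; writes always append

    def write(self, chunk):
        self._data.extend(chunk)
        return len(chunk)

    def read(self, size=-1):
        end = len(self._data) if size < 0 else self._pos + size
        chunk = bytes(self._data[self._pos:end])
        self._pos += len(chunk)
        return chunk

    def readline(self):
        # Return data up to and including the next newline, if any.
        idx = self._data.find(b"\n", self._pos)
        end = len(self._data) if idx < 0 else idx + 1
        chunk = bytes(self._data[self._pos:end])
        self._pos += len(chunk)
        return chunk


channel = MemoryChannel()
pickle.Pickler(channel).dump({"key": [1, 2, 3]})
restored = pickle.Unpickler(channel).load()
print(restored)
```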

The constructor for the \code{Pickler} class has an optional second
argument, \var{bin}. If this is present and nonzero, the binary
pickle format is used; if it is zero or absent, the (less efficient,
but backwards compatible) text pickle format is used. The
\code{Unpickler} class does not have an argument to distinguish
between binary and text pickle formats; it accepts either format.

The following types can be pickled:
\begin{itemize}

\item \code{None}

\item integers, long integers, floating point numbers

\item strings

\item tuples, lists and dictionaries containing only picklable objects

\item classes that are defined at the top level in a module

\item instances of such classes whose \code{__dict__} or the state
returned by \code{__getstate__()} is picklable

\end{itemize}

Attempts to pickle unpicklable objects will raise the
\code{PicklingError} exception; when this happens, an unspecified
number of bytes may have been written to the file.

It is possible to make multiple calls to the \code{dump()} method of
the same \code{Pickler} instance. These must then be matched to the
same number of calls to the \code{load()} method of the
corresponding \code{Unpickler} instance. If the same object is
pickled by multiple \code{dump()} calls, the \code{load()} calls will all
yield references to the same object. {\em Warning}: this is intended
for pickling multiple objects without intervening modifications to the
objects or their parts. If you modify an object and then pickle it
again using the same \code{Pickler} instance, the object is not
pickled again --- a reference to it is pickled and the
\code{Unpickler} will return the old value, not the modified one.
(There are two problems here: (a) detecting changes, and (b)
marshalling a minimal set of changes. I have no answers. Garbage
collection may also become a problem here.)
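
The warning can be reproduced with a short sketch, assuming a current
Python interpreter and an in-memory \code{io.BytesIO} buffer standing
in for a file. The variable names are illustrative only.

```python
import io
import pickle

buf = io.BytesIO()
p = pickle.Pickler(buf)

data = [1, 2, 3]
p.dump(data)      # the list is pickled in full
data.append(4)    # intervening modification
p.dump(data)      # only a reference to the earlier pickle is written

buf.seek(0)
u = pickle.Unpickler(buf)
first = u.load()
second = u.load()

# Both loads yield the same object, holding the old value.
print(first, second is first)
```

Using a fresh \code{Pickler} for the second \code{dump()} call is one
way to get the modified value pickled in full.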

Apart from the \code{Pickler} and \code{Unpickler} classes, the
module defines the following functions, and an exception:

\begin{funcdesc}{dump}{object\, file\optional{, bin}}
Write a pickled representation of \var{object} to the open file object
\var{file}. This is equivalent to
\code{Pickler(\var{file}, \var{bin}).dump(\var{object})}.
If the optional \var{bin} argument is present and nonzero, the binary
pickle format is used; if it is zero or absent, the (less efficient)
text pickle format is used.
\end{funcdesc}

\begin{funcdesc}{load}{file}
Read a pickled object from the open file object \var{file}. This is
equivalent to \code{Unpickler(\var{file}).load()}.
\end{funcdesc}

\begin{funcdesc}{dumps}{object\optional{, bin}}
Return the pickled representation of \var{object} as a string, instead
of writing it to a file. If the optional \var{bin} argument is
present and nonzero, the binary pickle format is used; if it is zero
or absent, the (less efficient) text pickle format is used.
\end{funcdesc}

\begin{funcdesc}{loads}{string}
Read a pickled object from a string instead of a file. Characters in
the string past the pickled object's representation are ignored.
\end{funcdesc}
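
A brief sketch of the string-based pair, assuming a current Python
interpreter (where the ``string'' is a bytes object). The sample
dictionary and the trailing bytes are arbitrary.

```python
import pickle

record = {"name": "spam", "count": 3}

# dumps() returns the pickled representation instead of writing a file.
payload = pickle.dumps(record)

# loads() reads one object and ignores whatever follows it.
restored = pickle.loads(payload + b"trailing bytes")

print(restored == record)
```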

\begin{excdesc}{PicklingError}
This exception is raised when an unpicklable object is passed to
\code{Pickler.dump()}.
\end{excdesc}