\section{Standard Module \sectcode{pickle}}
\label{module-pickle}
\stmodindex{pickle}
\index{persistency}
\indexii{persistent}{objects}
\indexii{serializing}{objects}
\indexii{marshalling}{objects}
\indexii{flattening}{objects}
\indexii{pickling}{objects}

\renewcommand{\indexsubitem}{(in module pickle)}

The \code{pickle} module implements a basic but powerful algorithm for
``pickling'' (a.k.a.\ serializing, marshalling or flattening) nearly
arbitrary Python objects. This is the act of converting objects to a
stream of bytes (and back: ``unpickling'').
This is a more primitive notion than
persistency: although \code{pickle} reads and writes file objects,
it does not handle the issue of naming persistent objects, nor the
(even more complicated) area of concurrent access to persistent
objects. The \code{pickle} module can transform a complex object into
a byte stream and it can transform the byte stream into an object with
the same internal structure. The most obvious thing to do with these
byte streams is to write them to a file, but it is also conceivable
to send them across a network or store them in a database. The module
\code{shelve} provides a simple interface to pickle and unpickle
objects on ``dbm''-style database files.
\refstmodindex{shelve}

\strong{Note:} The \code{pickle} module is rather slow. A
reimplementation of the same algorithm in C, which is up to 1000 times
faster, is available as the \code{cPickle}\refbimodindex{cPickle}
module. This has the same interface except that \code{Pickler} and
\code{Unpickler} are factory functions, not classes (so they cannot be
used as base classes for inheritance).

Unlike the built-in module \code{marshal}, \code{pickle} handles the
following correctly:
\refbimodindex{marshal}

\begin{itemize}

\item recursive objects (objects containing references to themselves)

\item object sharing (references to the same object in different places)

\item user-defined classes and their instances

\end{itemize}
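
The first two items can be illustrated with a short round trip. This
is only a sketch, assuming a modern Python 3 interpreter, where
\code{dumps()} returns a byte string; the variable names are chosen
for illustration.

```python
import pickle

# A recursive object: a list that contains a reference to itself.
recursive = [1, 2]
recursive.append(recursive)

# Object sharing: the same list is referenced from two places.
shared = ["shared"]
container = {"first": shared, "second": shared}

recursive_copy = pickle.loads(pickle.dumps(recursive))
container_copy = pickle.loads(pickle.dumps(container))

# The copy refers to itself, not to the original list.
print(recursive_copy[2] is recursive_copy)
# Sharing survives the round trip: both keys name one list object.
print(container_copy["first"] is container_copy["second"])
```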

The data format used by \code{pickle} is Python-specific. This has
the advantage that there are no restrictions imposed by external
standards such as XDR%
\index{XDR}
\index{External Data Representation}
(which can't represent pointer sharing); however
it means that non-Python programs may not be able to reconstruct
pickled Python objects.

By default, the \code{pickle} data format uses a printable \ASCII{}
representation. This is slightly more voluminous than a binary
representation. The big advantage of using printable \ASCII{} (and of
some other characteristics of \code{pickle}'s representation) is that
for debugging or recovery purposes it is possible for a human to read
the pickled file with a standard text editor.

A binary format, which is slightly more efficient, can be chosen by
specifying a nonzero (true) value for the \var{bin} argument to the
\code{Pickler} constructor or the \code{dump()} and \code{dumps()}
functions. The binary format is not the default because of backwards
compatibility with the Python 1.4 pickle module. In a future version,
the default may change to binary.

The \code{pickle} module doesn't handle code objects, which the
\code{marshal} module does. I suppose \code{pickle} could, and maybe
it should, but there's probably no great need for it right now (as
long as \code{marshal} continues to be used for reading and writing
code objects), and at least this avoids the possibility of smuggling
Trojan horses into a program.
\refbimodindex{marshal}

For the benefit of persistency modules written using \code{pickle}, it
supports the notion of a reference to an object outside the pickled
data stream. Such objects are referenced by a name, which is an
arbitrary string of printable \ASCII{} characters. The resolution of
such names is not defined by the \code{pickle} module; the
persistent object module will have to implement a method
\code{persistent_load()}. To write references to persistent objects,
the persistent module must define a method \code{persistent_id()} which
returns either \code{None} or the persistent ID of the object.
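
One way this might look is sketched below, assuming a modern Python 3
interpreter in which \code{Pickler} and \code{Unpickler} can be
subclassed. The store \code{EXTERNAL_OBJECTS}, the name
\code{config-1} and both subclass names are hypothetical.

```python
import io
import pickle

# Hypothetical external store: objects that live outside the pickle.
EXTERNAL_OBJECTS = {"config-1": {"debug": True}}


class StorePickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return a name for objects held in the external store,
        # or None to have the object pickled in the usual way.
        for name, stored in EXTERNAL_OBJECTS.items():
            if obj is stored:
                return name
        return None


class StoreUnpickler(pickle.Unpickler):
    def persistent_load(self, name):
        # Resolve the name that persistent_id() wrote into the stream.
        return EXTERNAL_OBJECTS[name]


buffer = io.BytesIO()
StorePickler(buffer).dump(["local data", EXTERNAL_OBJECTS["config-1"]])

buffer.seek(0)
result = StoreUnpickler(buffer).load()

# The second element is the stored object itself, not a copy of it.
print(result[1] is EXTERNAL_OBJECTS["config-1"])
```

Only the name travels in the byte stream, so the unpickling side must
have access to the same store.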

There are some restrictions on the pickling of class instances.

First of all, the class must be defined at the top level in a module.
Furthermore, all its instance variables must be picklable.

\renewcommand{\indexsubitem}{(pickle protocol)}

When a pickled class instance is unpickled, its \code{__init__()} method
is normally \emph{not} invoked. \strong{Note:} This is a deviation
from previous versions of this module; the change was introduced in
Python 1.5b2. The reason for the change is that in many cases it is
desirable to have a constructor that requires arguments; it is a
(minor) nuisance to have to provide a \code{__getinitargs__()} method.
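
That \code{__init__()} is skipped on unpickling can be observed with a
counter. This is a small sketch, assuming a modern Python 3
interpreter; the class \code{Point} and its \code{init_calls}
attribute are made up for the illustration.

```python
import pickle


class Point:
    init_calls = 0  # counts how often __init__() runs

    def __init__(self, x, y):
        Point.init_calls += 1
        self.x = x
        self.y = y


original = Point(1, 2)
restored = pickle.loads(pickle.dumps(original))

# Only the explicit Point(1, 2) call ran the constructor.
print(Point.init_calls)
# The instance data was restored without calling __init__().
print((restored.x, restored.y))
```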

If it is desirable that the \code{__init__()} method be called on
unpickling, a class can define a method \code{__getinitargs__()},
which should return a \emph{tuple} containing the arguments to be
passed to the class constructor (\code{__init__()}). This method is
called at pickle time; the tuple it returns is incorporated in the
pickle for the instance.
\ttindex{__getinitargs__}
\ttindex{__init__}

Classes can further influence how their instances are pickled. If the class
defines the method \code{__getstate__()}, it is called and the returned
state is pickled as the contents for the instance, and if the class
defines the method \code{__setstate__()}, it is called with the
unpickled state. (Note that these methods can also be used to
implement copying class instances.) If there is no
\code{__getstate__()} method, the instance's \code{__dict__} is
pickled. If there is no \code{__setstate__()} method, the pickled
object must be a dictionary and its items are assigned to the new
instance's dictionary. (If a class defines both \code{__getstate__()}
and \code{__setstate__()}, the state object needn't be a dictionary;
these methods can do what they want.) This protocol is also used
by the shallow and deep copying operations defined in the \code{copy}
module.
\ttindex{__getstate__}
\ttindex{__setstate__}
\ttindex{__dict__}
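
The following sketch shows a class that defines both methods and uses
a tuple rather than a dictionary as its state. It assumes a modern
Python 3 interpreter; the class \code{Counter} and its attributes are
hypothetical.

```python
import copy
import pickle


class Counter:
    def __init__(self, start):
        self.value = start
        self.cache = {}  # derived data we choose not to pickle

    def __getstate__(self):
        # The state need not be a dictionary; here it is a tuple.
        return (self.value,)

    def __setstate__(self, state):
        (self.value,) = state
        self.cache = {}  # rebuilt rather than restored


counter = Counter(5)
counter.cache["expensive"] = 42

restored = pickle.loads(pickle.dumps(counter))
print(restored.value, restored.cache)

# The copy module goes through the same pair of methods.
clone = copy.copy(counter)
print(clone.value, clone.cache)
```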

Note that when class instances are pickled, their class's code and
data are not pickled along with them. Only the instance data are
pickled. This is done on purpose, so you can fix bugs in a class or
add methods and still load objects that were created with an earlier
version of the class. If you plan to have long-lived objects that
will see many versions of a class, it may be worthwhile to put a version
number in the objects so that suitable conversions can be made by the
class's \code{__setstate__()} method.
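
A version number of this kind might be handled as sketched below. The
class \code{Account}, its fields and the two layouts are invented for
the example, and a modern Python 3 interpreter is assumed.

```python
import pickle


class Account:
    VERSION = 2  # bump whenever the instance layout changes

    def __init__(self, owner, balance_cents):
        self.owner = owner
        self.balance_cents = balance_cents

    def __getstate__(self):
        state = dict(self.__dict__)
        state["version"] = self.VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop("version", 1)
        if version < 2:
            # Hypothetical version 1 stored a float "balance" in whole units.
            state["balance_cents"] = int(round(state.pop("balance") * 100))
        self.__dict__.update(state)


# Simulate state that was written by the older version of the class.
old_state = {"owner": "alice", "balance": 12.5}
migrated = Account.__new__(Account)
migrated.__setstate__(old_state)
print(migrated.balance_cents)

# Instances of the current version round-trip unchanged.
current = pickle.loads(pickle.dumps(Account("bob", 300)))
print(current.balance_cents)
```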

When a class itself is pickled, only its name is pickled; the class
definition is not pickled, but re-imported by the unpickling process.
Therefore, the restriction that the class must be defined at the top
level in a module applies to pickled classes as well.
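
Both points can be checked with a short experiment. The sketch below
assumes a modern Python 3 interpreter and uses a standard library
class; \code{make_local_class()} is a hypothetical helper, and the
exception raised for the local class differs between Python versions.

```python
import collections
import pickle

# Pickling a class stores a reference to it by name, not its definition.
data = pickle.dumps(collections.OrderedDict)
restored_class = pickle.loads(data)
print(restored_class is collections.OrderedDict)
print(b"OrderedDict" in data)


def make_local_class():
    class Local:  # not defined at the top level of a module
        pass
    return Local


try:
    pickle.dumps(make_local_class())
    local_class_pickled = True
except (pickle.PicklingError, AttributeError):
    # The exact exception type varies between Python versions.
    local_class_pickled = False
print(local_class_pickled)
```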

\renewcommand{\indexsubitem}{(in module pickle)}

The interface can be summarized as follows.

To pickle an object \code{x} onto a file \code{f}, open for writing:

\bcode\begin{verbatim}
p = pickle.Pickler(f)
p.dump(x)
\end{verbatim}\ecode
%
A shorthand for this is:

\bcode\begin{verbatim}
pickle.dump(x, f)
\end{verbatim}\ecode
%
To unpickle an object \code{x} from a file \code{f}, open for reading:

\bcode\begin{verbatim}
u = pickle.Unpickler(f)
x = u.load()
\end{verbatim}\ecode
%
A shorthand is:

\bcode\begin{verbatim}
x = pickle.load(f)
\end{verbatim}\ecode
%
The \code{Pickler} class only calls the method \code{f.write()} with a
string argument. The \code{Unpickler} calls the methods \code{f.read()}
(with an integer argument) and \code{f.readline()} (without argument),
both returning a string. It is explicitly allowed to pass non-file
objects here, as long as they have the right methods.
\ttindex{Unpickler}
\ttindex{Pickler}
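
As a rough illustration of passing non-file objects, the sketch below
uses two hypothetical classes, \code{ChunkWriter} and
\code{ChunkReader}, that provide only the methods named above. It
assumes a modern Python 3 interpreter, where the data exchanged are
byte strings rather than ordinary strings.

```python
import pickle


class ChunkWriter:
    """Minimal writable object: only write() is provided."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)


class ChunkReader:
    """Minimal readable object: only read() and readline() are provided."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def read(self, size=-1):
        end = len(self.data) if size < 0 else self.pos + size
        chunk = self.data[self.pos:end]
        self.pos += len(chunk)
        return chunk

    def readline(self):
        newline = self.data.find(b"\n", self.pos)
        end = len(self.data) if newline < 0 else newline + 1
        chunk = self.data[self.pos:end]
        self.pos = end
        return chunk


writer = ChunkWriter()
pickle.Pickler(writer).dump({"answer": 42})

reader = ChunkReader(b"".join(writer.chunks))
value = pickle.Unpickler(reader).load()
print(value)
```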

The constructor for the \code{Pickler} class has an optional second
argument, \var{bin}. If this is present and nonzero, the binary
pickle format is used; if it is zero or absent, the (less efficient,
but backwards compatible) text pickle format is used. The
\code{Unpickler} class does not have an argument to distinguish
between binary and text pickle formats; it accepts either format.
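
The sketch below shows both formats being read back without any format
argument. It assumes a modern Python 3 interpreter, where the
\var{bin} argument described here has been replaced by a
\var{protocol} argument: protocol 0 is the printable text format and
protocol 1 is the original binary format.

```python
import pickle

record = {"name": "spam", "count": 3}

# Protocol 0 is the printable text format.
text_pickle = pickle.dumps(record, protocol=0)
# Protocol 1 is the original binary format.
binary_pickle = pickle.dumps(record, protocol=1)

# The text format can be decoded and read as ASCII.
print(text_pickle.decode("ascii"))

# No format argument is needed when loading.
from_text = pickle.loads(text_pickle)
from_binary = pickle.loads(binary_pickle)
print(from_text == from_binary == record)
```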

The following types can be pickled:
\begin{itemize}

\item \code{None}

\item integers, long integers, floating point numbers

\item strings

\item tuples, lists and dictionaries containing only picklable objects

\item classes that are defined at the top level in a module

\item instances of such classes whose \code{__dict__} or
\code{__getstate__()} result is picklable

\end{itemize}

Attempts to pickle unpicklable objects will raise the
\code{PicklingError} exception; when this happens, an unspecified
number of bytes may have been written to the file.
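
A failing attempt might be handled as in the sketch below, which
assumes a modern Python 3 interpreter. There, some unpicklable objects
raise \code{TypeError} or \code{AttributeError} instead of
\code{PicklingError}, so the example catches all three.

```python
import pickle

# A lambda has no importable name, so it cannot be pickled.
unpicklable = lambda: None

try:
    pickle.dumps(unpicklable)
    failed = False
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    # Modern interpreters may raise related exception types as well.
    failed = True
    print(type(exc).__name__)
```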

It is possible to make multiple calls to the \code{dump()} method of
the same \code{Pickler} instance. These must then be matched to the
same number of calls to the \code{load()} method of the
corresponding \code{Unpickler} instance. If the same object is
pickled by multiple \code{dump()} calls, the \code{load()} calls will all
yield references to the same object. \emph{Warning}: this is intended
for pickling multiple objects without intervening modifications to the
objects or their parts. If you modify an object and then pickle it
again using the same \code{Pickler} instance, the object is not
pickled again; a reference to it is pickled and the
\code{Unpickler} will return the old value, not the modified one.
(There are two problems here: (a) detecting changes, and (b)
marshalling a minimal set of changes. I have no answers. Garbage
collection may also become a problem here.)
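
The behaviour described in the warning can be reproduced in a few
lines. This is a sketch, assuming a modern Python 3 interpreter and an
in-memory \code{io.BytesIO} buffer standing in for a file.

```python
import io
import pickle

buffer = io.BytesIO()
pickler = pickle.Pickler(buffer)

items = [1, 2, 3]
pickler.dump(items)
items.append(4)      # modification between the two dump() calls
pickler.dump(items)  # only a reference to the first pickle is written

buffer.seek(0)
unpickler = pickle.Unpickler(buffer)
first = unpickler.load()
second = unpickler.load()

# Both load() calls yield the same object...
print(first is second)
# ...and it holds the old value, without the appended 4.
print(second)
```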

Apart from the \code{Pickler} and \code{Unpickler} classes, the
module defines the following functions, and an exception:

\begin{funcdesc}{dump}{object\, file\optional{, bin}}
Write a pickled representation of \var{object} to the open file object
\var{file}. This is equivalent to
\code{Pickler(\var{file}, \var{bin}).dump(\var{object})}.
If the optional \var{bin} argument is present and nonzero, the binary
pickle format is used; if it is zero or absent, the (less efficient)
text pickle format is used.
\end{funcdesc}

\begin{funcdesc}{load}{file}
Read a pickled object from the open file object \var{file}. This is
equivalent to \code{Unpickler(\var{file}).load()}.
\end{funcdesc}
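
A complete round trip through a real file with \code{dump()} and
\code{load()} might look like the sketch below. It assumes a modern
Python 3 interpreter, where the file must be opened in binary mode;
the file name \code{data.pickle} is arbitrary.

```python
import os
import pickle
import tempfile

data = {"numbers": [1, 2, 3], "label": "example"}

# A temporary directory is used so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "data.pickle")

    with open(path, "wb") as f:  # binary mode for writing
        pickle.dump(data, f)

    with open(path, "rb") as f:  # binary mode for reading
        loaded = pickle.load(f)

print(loaded == data)
```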

\begin{funcdesc}{dumps}{object\optional{, bin}}
Return the pickled representation of the object as a string, instead
of writing it to a file. If the optional \var{bin} argument is
present and nonzero, the binary pickle format is used; if it is zero
or absent, the (less efficient) text pickle format is used.
\end{funcdesc}

\begin{funcdesc}{loads}{string}
Read a pickled object from a string instead of a file. Characters in
the string past the pickled object's representation are ignored.
\end{funcdesc}
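
The string-based pair can be tried without any file at all. The sketch
below assumes a modern Python 3 interpreter, where the pickled
representation is a byte string.

```python
import pickle

payload = pickle.dumps(("spam", 42))

# Bytes after the end of the pickle are ignored by loads().
value = pickle.loads(payload + b"trailing bytes")
print(value)
```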

\begin{excdesc}{PicklingError}
This exception is raised when an unpicklable object is passed to
\code{Pickler.dump()}.
\end{excdesc}