\section{Standard Module \sectcode{pickle}}
\stmodindex{pickle}
\index{persistency}
\indexii{persistent}{objects}
\indexii{serializing}{objects}
\indexii{marshalling}{objects}
\indexii{flattening}{objects}
\indexii{pickling}{objects}

\renewcommand{\indexsubitem}{(in module pickle)}

The \code{pickle} module implements a basic but powerful algorithm for
``pickling'' (a.k.a.\ serializing, marshalling or flattening) nearly
arbitrary Python objects. This is the act of converting objects to a
stream of bytes (and back: ``unpickling'').
This is a more primitive notion than
persistency --- although \code{pickle} reads and writes file objects,
it does not handle the issue of naming persistent objects, nor the
(even more complicated) area of concurrent access to persistent
objects. The \code{pickle} module can transform a complex object into
a byte stream and it can transform the byte stream into an object with
the same internal structure. The most obvious thing to do with these
byte streams is to write them onto a file, but it is also conceivable
to send them across a network or store them in a database. The module
\code{shelve} provides a simple interface to pickle and unpickle
objects on ``dbm''-style database files.
\stmodindex{shelve}

Unlike the built-in module \code{marshal}, \code{pickle} handles the
following correctly:
\stmodindex{marshal}

\begin{itemize}

\item recursive objects (objects containing references to themselves)

\item object sharing (references to the same object in different places)

\item user-defined classes and their instances

\end{itemize}
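
The first two points can be illustrated with a minimal sketch. It
assumes a current Python 3 interpreter and uses the \code{dumps()} and
\code{loads()} functions of the module; the variable names are purely
illustrative.

```python
import pickle

# A recursive object: a list that contains a reference to itself.
loop = []
loop.append(loop)

# Object sharing: two dictionary values refer to the same list.
shared = [1, 2, 3]
pair = {"a": shared, "b": shared}

loop_copy = pickle.loads(pickle.dumps(loop))
pair_copy = pickle.loads(pickle.dumps(pair))

# The recursion and the sharing are both preserved in the copies.
print(loop_copy[0] is loop_copy)
print(pair_copy["a"] is pair_copy["b"])
```

The copies are new objects, but their internal structure matches that
of the originals.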

The data format used by \code{pickle} is Python-specific. This has
the advantage that there are no restrictions imposed by external
standards such as CORBA (which probably can't represent pointer
sharing or recursive objects); however, it means that non-Python
programs may not be able to reconstruct pickled Python objects.

The \code{pickle} data format uses a printable \ASCII{} representation.
This is slightly more voluminous than a binary representation.
However, small integers actually take {\em less} space when
represented as minimal-size decimal strings than when represented as
32-bit binary numbers, and strings are only much longer if they
contain many control characters or 8-bit characters. The big
advantage of using printable \ASCII{} (and of some other characteristics
of \code{pickle}'s representation) is that for debugging or recovery
purposes it is possible for a human to read the pickled file with a
standard text editor. (I could have gone a step further and used a
notation like S-expressions, but the parser
(currently written in Python) would have been
considerably more complicated and slower, and the files would probably
have become much larger.)

The \code{pickle} module doesn't handle code objects, which the
\code{marshal} module does. I suppose \code{pickle} could, and maybe
it should, but there's probably no great need for it right now (as
long as \code{marshal} continues to be used for reading and writing
code objects), and at least this avoids the possibility of smuggling
Trojan horses into a program.
\stmodindex{marshal}

For the benefit of persistency modules written using \code{pickle}, it
supports the notion of a reference to an object outside the pickled
data stream. Such objects are referenced by a name, which is an
arbitrary string of printable \ASCII{} characters. The resolution of
such names is not defined by the \code{pickle} module --- the
persistent object module will have to implement a method
\code{persistent_load}. To write references to persistent objects,
the persistent module must define a method \code{persistent_id} which
returns either \code{None} or the persistent ID of the object.
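
A rough sketch of how such a persistency layer might look follows. It
assumes the subclassing style accepted by current Python 3 versions,
where \code{persistent_id} and \code{persistent_load} are methods of
\code{Pickler} and \code{Unpickler} subclasses. The \code{Account}
class, the \code{REGISTRY} dictionary and the names
\code{RefPickler} and \code{RefUnpickler} are hypothetical and only
serve the illustration.

```python
import io
import pickle


class Account:
    """Hypothetical object that lives outside the pickled data stream."""

    def __init__(self, name):
        self.name = name


# Hypothetical external store mapping names to persistent objects.
REGISTRY = {"acct-1": Account("alice")}


class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return the name of an externally stored object, or None to
        # have the object pickled in the usual way.
        for key, value in REGISTRY.items():
            if value is obj:
                return key
        return None


class RefUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Resolve a name back to the externally stored object.
        try:
            return REGISTRY[pid]
        except KeyError:
            raise pickle.UnpicklingError("unknown persistent id: %r" % (pid,))


buf = io.BytesIO()
RefPickler(buf).dump({"owner": REGISTRY["acct-1"], "balance": 10})
buf.seek(0)
restored = RefUnpickler(buf).load()
print(restored["owner"] is REGISTRY["acct-1"])
```

Only the name \code{"acct-1"} travels in the data stream; the
\code{Account} instance itself is never pickled.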

There are some restrictions on the pickling of class instances.

First of all, the class must be defined at the top level in a module.

\renewcommand{\indexsubitem}{(pickle protocol)}

Next, it must normally be possible to create class instances by
calling the class without arguments. Usually, this is best
accomplished by providing default values for all arguments to its
\code{__init__} method (if it has one). If this is undesirable, the
class can define a method \code{__getinitargs__()}, which should
return a {\em tuple} containing the arguments to be passed to the
class constructor (\code{__init__()}).
\ttindex{__getinitargs__}
\ttindex{__init__}

Classes can further influence how their instances are pickled --- if the class
defines the method \code{__getstate__()}, it is called and the returned
state is pickled as the contents for the instance, and if the class
defines the method \code{__setstate__()}, it is called with the
unpickled state. (Note that these methods can also be used to
implement copying class instances.) If there is no
\code{__getstate__()} method, the instance's \code{__dict__} is
pickled. If there is no \code{__setstate__()} method, the pickled
object must be a dictionary and its items are assigned to the new
instance's dictionary. (If a class defines both \code{__getstate__()}
and \code{__setstate__()}, the state object needn't be a dictionary
--- these methods can do what they want.) This protocol is also used
by the shallow and deep copying operations defined in the \code{copy}
module.
\ttindex{__getstate__}
\ttindex{__setstate__}
\ttindex{__dict__}
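
The state protocol described above might be used as in the following
sketch, which assumes a current Python 3 interpreter. The
\code{Counter} class and its attributes are hypothetical; the point is
only that the pickled state is a tuple rather than the instance's
\code{__dict__}.

```python
import pickle


class Counter:
    """Hypothetical class whose log is not worth saving."""

    def __init__(self, start=0):
        self.count = start
        self.log = []          # transient data, rebuilt on unpickling

    def __getstate__(self):
        # The state need not be a dictionary: pickle a one-element tuple.
        return (self.count,)

    def __setstate__(self, state):
        # Called with whatever __getstate__() returned.
        (self.count,) = state
        self.log = []


original = Counter(5)
original.log.append("incremented")

clone = pickle.loads(pickle.dumps(original))
print(clone.count, clone.log)
```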

Note that when class instances are pickled, their class's code and
data are not pickled along with them. Only the instance data are
pickled. This is done on purpose, so you can fix bugs in a class or
add methods and still load objects that were created with an earlier
version of the class. If you plan to have long-lived objects that
will see many versions of a class, it may be worthwhile to put a version
number in the objects so that suitable conversions can be made by the
class's \code{__setstate__()} method.
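
One possible way to carry such a version number is sketched below,
assuming a current Python 3 interpreter. The \code{Point} class, the
\code{"version"} key and the old \code{"pos"} layout are all invented
for the illustration.

```python
import pickle


class Point:
    """Hypothetical class; version 1 stored a single 'pos' tuple."""

    VERSION = 2

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __getstate__(self):
        # Store the instance data together with the layout version.
        state = dict(self.__dict__)
        state["version"] = self.VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop("version", 1)   # no version key: assume version 1
        if version < 2:
            # Convert the old layout to the current one.
            state["x"], state["y"] = state.pop("pos")
        self.__dict__.update(state)


# Round trip of an object written by the current version of the class.
current = pickle.loads(pickle.dumps(Point(1, 2)))

# Simulate state written by the hypothetical version 1 of the class.
old = Point.__new__(Point)
old.__setstate__({"pos": (3, 4)})

print(current.x, current.y, old.x, old.y)
```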

When a class itself is pickled, only its name is pickled --- the class
definition is not pickled, but re-imported by the unpickling process.
Therefore, the restriction that the class must be defined at the top
level in a module applies to pickled classes as well.

\renewcommand{\indexsubitem}{(in module pickle)}

The interface can be summarized as follows.

To pickle an object \code{x} onto a file \code{f}, open for writing:

\begin{verbatim}
p = pickle.Pickler(f)
p.dump(x)
\end{verbatim}

A shorthand for this is:

\begin{verbatim}
pickle.dump(x, f)
\end{verbatim}

To unpickle an object \code{x} from a file \code{f}, open for reading:

\begin{verbatim}
u = pickle.Unpickler(f)
x = u.load()
\end{verbatim}

A shorthand is:

\begin{verbatim}
x = pickle.load(f)
\end{verbatim}

The \code{Pickler} class only calls the method \code{f.write} with a
string argument. The \code{Unpickler} calls the methods \code{f.read}
(with an integer argument) and \code{f.readline} (without argument),
both returning a string. It is explicitly allowed to pass non-file
objects here, as long as they have the right methods.
\ttindex{Unpickler}
\ttindex{Pickler}
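
As a sketch of what passing non-file objects could look like, the
following uses two hypothetical classes, \code{ChunkSink} and
\code{ChunkSource}, that provide only the methods named above. It
assumes a current Python 3 interpreter, where the data passed to
\code{write} and returned by \code{read} and \code{readline} are byte
strings.

```python
import pickle


class ChunkSink:
    """Hypothetical write-only object: collects what the pickler writes."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)


class ChunkSource:
    """Hypothetical read-only object serving bytes held in memory."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def read(self, n):
        piece = self.data[self.pos:self.pos + n]
        self.pos += len(piece)
        return piece

    def readline(self):
        end = self.data.find(b"\n", self.pos)
        end = len(self.data) if end < 0 else end + 1
        piece = self.data[self.pos:end]
        self.pos = end
        return piece


sink = ChunkSink()
pickle.Pickler(sink).dump({"k": [1, 2, 3]})

source = ChunkSource(b"".join(sink.chunks))
value = pickle.Unpickler(source).load()
print(value)
```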

The following types can be pickled:
\begin{itemize}

\item \code{None}

\item integers, long integers, floating point numbers

\item strings

\item tuples, lists and dictionaries containing only picklable objects

\item classes that are defined at the top level in a module

\item instances of such classes whose \code{__dict__} or the result of
calling \code{__getstate__()} is picklable

\end{itemize}

Attempts to pickle unpicklable objects will raise the
\code{PicklingError} exception; when this happens, an unspecified
number of bytes may have been written to the file.

It is possible to make multiple calls to the \code{dump()} method of
the same \code{Pickler} instance. These must then be matched to the
same number of calls to the \code{load()} method of the
corresponding \code{Unpickler} instance. If the same object is
pickled by multiple \code{dump()} calls, the \code{load()} calls will all
yield references to the same object. {\em Warning}: this is intended
for pickling multiple objects without intervening modifications to the
objects or their parts. If you modify an object and then pickle it
again using the same \code{Pickler} instance, the object is not
pickled again --- a reference to it is pickled and the
\code{Unpickler} will return the old value, not the modified one.
(There are two problems here: (a) detecting changes, and (b)
marshalling a minimal set of changes. I have no answers. Garbage
collection may also become a problem here.)
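
The warning can be reproduced with a short sketch, assuming a current
Python 3 interpreter and an in-memory \code{io.BytesIO} object in place
of a file; the variable names are illustrative.

```python
import io
import pickle

buf = io.BytesIO()
p = pickle.Pickler(buf)

data = [1, 2]
p.dump(data)        # the list contents are written here
data.append(3)      # modification between the two dump() calls
p.dump(data)        # only a reference to the earlier pickle is written

buf.seek(0)
u = pickle.Unpickler(buf)
first = u.load()
second = u.load()

# Both loads yield the same object, holding the old value.
print(first, second is first)
```

The appended element is lost because the second \code{dump()} call
does not look at the list's contents again.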

Apart from the \code{Pickler} and \code{Unpickler} classes, the
module defines the following functions, and an exception:

\begin{funcdesc}{dump}{object\, file}
Write a pickled representation of \var{object} to the open file object
\var{file}. This is equivalent to \code{Pickler(file).dump(object)}.
\end{funcdesc}

\begin{funcdesc}{load}{file}
Read a pickled object from the open file object \var{file}. This is
equivalent to \code{Unpickler(file).load()}.
\end{funcdesc}

\begin{funcdesc}{dumps}{object}
Return the pickled representation of the object as a string, instead
of writing it to a file.
\end{funcdesc}

\begin{funcdesc}{loads}{string}
Read a pickled object from a string instead of a file. Characters in
the string past the pickled object's representation are ignored.
\end{funcdesc}
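
A brief sketch of the string-based functions, assuming a current
Python 3 interpreter (where the pickled representation is a byte
string); the sample record is invented for the illustration.

```python
import pickle

record = {"name": "spam", "sizes": (1, 2, 3)}

# Pickle to a string instead of a file, and back again.
text = pickle.dumps(record)
copy = pickle.loads(text)

# Data past the end of the pickled representation is ignored.
padded = pickle.loads(text + b"trailing data")

print(copy == record, padded == record)
```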

\begin{excdesc}{PicklingError}
This exception is raised when an unpicklable object is passed to
\code{Pickler.dump()}.
\end{excdesc}