\section{Standard Module \sectcode{pickle}}
\stmodindex{pickle}
\index{persistency}
\indexii{persistent}{objects}
\indexii{serializing}{objects}
\indexii{marshalling}{objects}
\indexii{flattening}{objects}
\indexii{pickling}{objects}

\renewcommand{\indexsubitem}{(in module pickle)}

The \code{pickle} module implements a basic but powerful algorithm for
``pickling'' (a.k.a.\ serializing, marshalling or flattening) nearly
arbitrary Python objects. This is a more primitive notion than
persistency --- although \code{pickle} reads and writes file objects,
it does not handle the issue of naming persistent objects, nor the
(even more complicated) area of concurrent access to persistent
objects. The \code{pickle} module can transform a complex object into
a byte stream and it can transform the byte stream into an object with
the same internal structure. The most obvious thing to do with these
byte streams is to write them onto a file, but it is also conceivable
to send them across a network or store them in a database. The module
\code{shelve} provides a simple interface to pickle and unpickle
objects on ``dbm''-style database files.
\stmodindex{shelve}

Unlike the built-in module \code{marshal}, \code{pickle} handles the
following correctly:
\stmodindex{marshal}

\begin{itemize}

\item recursive objects (objects containing references to themselves)

\item object sharing (references to the same object in different places)

\item user-defined classes and their instances

\end{itemize}

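The first two items can be checked with a short experiment. The
following is a minimal sketch, assuming a current Python 3 interpreter
and using the \code{dumps()} and \code{loads()} functions described at
the end of this section; the variable names are purely illustrative.

```python
import pickle

# A recursive object: the list contains a reference to itself.
recursive = [1, 2]
recursive.append(recursive)

# Object sharing: the same inner list appears twice in the outer list.
shared = ["spam"]
container = [shared, shared]

recursive_copy = pickle.loads(pickle.dumps(recursive))
container_copy = pickle.loads(pickle.dumps(container))

# The copy refers to itself, not to the original list.
print(recursive_copy[2] is recursive_copy)
# Both slots still refer to one and the same list object.
print(container_copy[0] is container_copy[1])
```

Both tests print \code{True}: the unpickled structures have the same
internal shape as the originals, but share no objects with them.
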
The data format used by \code{pickle} is Python-specific. This has
the advantage that there are no restrictions imposed by external
standards such as CORBA (which probably can't represent pointer
sharing or recursive objects); however it means that non-Python
programs may not be able to reconstruct pickled Python objects.

The \code{pickle} data format uses a printable \ASCII{} representation.
This is slightly more voluminous than a binary representation.
However, small integers actually take {\em less} space when
represented as minimal-size decimal strings than when represented as
32-bit binary numbers, and strings are only much longer if they
contain many control characters or 8-bit characters. The big
advantage of using printable \ASCII{} (and of some other characteristics
of \code{pickle}'s representation) is that for debugging or recovery
purposes it is possible for a human to read the pickled file with a
standard text editor. (I could have gone a step further and used a
notation like S-expressions, but the parser would have been
considerably more complicated and slower, and the files would probably
have become much larger.)

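To see the printable representation, the sketch below pickles a small
dictionary and prints the result as text. It assumes a current
Python 3 interpreter, where the printable format described here has to
be requested explicitly as protocol 0 (the newer default protocols are
binary); the sample value is made up.

```python
import pickle

value = {"answer": 42, "name": "spam"}

# Assumption: in current Python the printable ASCII format is protocol 0.
data = pickle.dumps(value, protocol=0)

# The pickle can be decoded and read like any other text.
print(data.decode("ascii"))

# Every byte is in the 7-bit ASCII range, and the value survives the trip.
print(all(byte < 128 for byte in data))
print(pickle.loads(data) == value)
```
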
The \code{pickle} module doesn't handle code objects, which the
\code{marshal} module does. I suppose \code{pickle} could, and maybe
it should, but there's probably no great need for it right now (as
long as \code{marshal} continues to be used for reading and writing
code objects), and at least this avoids the possibility of smuggling
Trojan horses into a program.
\stmodindex{marshal}

For the benefit of persistency modules written using \code{pickle}, the
module supports the notion of a reference to an object outside the
pickled data stream. Such objects are referenced by a name, which is an
arbitrary string of printable \ASCII{} characters. The resolution of
such names is not defined by the \code{pickle} module --- the
persistent object module will have to implement a method
\code{persistent_load}. To write references to persistent objects,
the persistent module must define a method \code{persistent_id} which
returns either \code{None} or the persistent ID of the object.

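The two hooks might be used as in the following sketch. It assumes a
current Python 3 interpreter, where the hooks are supplied by
subclassing \code{Pickler} and \code{Unpickler}; the
\code{EXTERNAL_OBJECTS} table and the two subclass names are
hypothetical stand-ins for a real persistent object store.

```python
import io
import pickle

# Hypothetical external object store, keyed by persistent name.
EXTERNAL_OBJECTS = {"config": {"host": "localhost", "port": 8080}}


class StorePickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return a name for objects that live outside the stream,
        # or None to have the object pickled in the usual way.
        for name, external in EXTERNAL_OBJECTS.items():
            if obj is external:
                return name
        return None


class StoreUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Resolve the name back to the external object.
        try:
            return EXTERNAL_OBJECTS[pid]
        except KeyError:
            raise pickle.UnpicklingError("unknown persistent id: %r" % (pid,))


config = EXTERNAL_OBJECTS["config"]
buffer = io.BytesIO()
StorePickler(buffer).dump(["job-1", config])

buffer.seek(0)
restored = StoreUnpickler(buffer).load()

# The reference was resolved to the stored object, not to a copy of it.
print(restored[1] is config)
```

Only the name \code{"config"} travels in the pickle; how that name is
mapped back to an object is entirely up to \code{persistent_load}.
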
There are some restrictions on the pickling of class instances.

First of all, the class must be defined at the top level in a module.

\renewcommand{\indexsubitem}{(pickle protocol)}

Next, it must normally be possible to create class instances by
calling the class without arguments. If this is undesirable, the
class can define a method \code{__getinitargs__()}, which should
return a {\em tuple} containing the arguments to be passed to the
class constructor (\code{__init__()}).
\ttindex{__getinitargs__}
\ttindex{__init__}

Classes can further influence how their instances are pickled --- if the class
defines the method \code{__getstate__()}, it is called and the returned
state is pickled as the contents for the instance, and if the class
defines the method \code{__setstate__()}, it is called with the
unpickled state. (Note that these methods can also be used to
implement copying class instances.) If there is no
\code{__getstate__()} method, the instance's \code{__dict__} is
pickled. If there is no \code{__setstate__()} method, the pickled
object must be a dictionary and its items are assigned to the new
instance's dictionary. (If a class defines both \code{__getstate__()}
and \code{__setstate__()}, the state object needn't be a dictionary
--- these methods can do what they want.) This protocol is also used
by the shallow and deep copying operations defined in the \code{copy}
module.
\ttindex{__getstate__}
\ttindex{__setstate__}
\ttindex{__dict__}

Note that when class instances are pickled, their class's code and
data are not pickled along with them. Only the instance data are
pickled. This is done on purpose, so you can fix bugs in a class or
add methods and still load objects that were created with an earlier
version of the class. If you plan to have long-lived objects that
will see many versions of a class, it may be worthwhile to put a version
number in the objects so that suitable conversions can be made by the
class's \code{__setstate__()} method.

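One way to carry such a version number is sketched below, assuming a
current Python 3 interpreter. The \code{Account} class, its attributes
and the layout of the ``version 1'' state are all hypothetical; they
only illustrate how \code{__getstate__()} can tag the state and how
\code{__setstate__()} can convert an older layout.

```python
import copy
import pickle


class Account:
    """Hypothetical class whose pickled state carries a version number."""

    STATE_VERSION = 2

    def __init__(self, owner, balance_cents=0):
        self.owner = owner
        self.balance_cents = balance_cents

    def __getstate__(self):
        # Pickle a copy of the instance dictionary plus a version tag.
        state = dict(self.__dict__)
        state["version"] = self.STATE_VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop("version", 1)
        if version < 2:
            # Assumed version 1 layout: a float "balance" in whole units.
            state["balance_cents"] = int(round(state.pop("balance") * 100))
        self.__dict__.update(state)


if __name__ == "__main__":
    # Pickling looks the class up by name, so this part runs as a script.
    account = pickle.loads(pickle.dumps(Account("guido", 1250)))
    print(account.owner, account.balance_cents)

    # The copy module goes through the same two methods.
    clone = copy.copy(account)
    print(clone.balance_cents)
```

An instance pickled by the older class, whose state lacks a
\code{version} entry, is converted on loading instead of being
rejected.
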
When a class itself is pickled, only its name is pickled --- the class
definition is not pickled, but re-imported by the unpickling process.
Therefore, the restriction that the class must be defined at the top
level in a module applies to pickled classes as well.

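The effect is easy to observe with a class from the standard library.
This sketch assumes a current Python 3 interpreter and picks
\code{fractions.Fraction} purely as a convenient example of a class
defined at the top level of a module.

```python
import pickle
from fractions import Fraction

# Protocol 0 keeps the stream readable: the class is stored as its
# module name and class name only.
data = pickle.dumps(Fraction, protocol=0)
print(data)

# Unpickling re-imports the module and looks the class up by name,
# so the result is the very same class object.
print(pickle.loads(data) is Fraction)
```
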
\renewcommand{\indexsubitem}{(in module pickle)}

The interface can be summarized as follows.

To pickle an object \code{x} onto a file \code{f}, open for writing:

\begin{verbatim}
p = pickle.Pickler(f)
p.dump(x)
\end{verbatim}

A shorthand for this is:

\begin{verbatim}
pickle.dump(x, f)
\end{verbatim}

To unpickle an object \code{x} from a file \code{f}, open for reading:

\begin{verbatim}
u = pickle.Unpickler(f)
x = u.load()
\end{verbatim}

A shorthand is:

\begin{verbatim}
x = pickle.load(f)
\end{verbatim}

The \code{Pickler} class only calls the method \code{f.write} with a
string argument. The \code{Unpickler} calls the methods \code{f.read}
(with an integer argument) and \code{f.readline} (without argument),
both returning a string. It is explicitly allowed to pass non-file
objects here, as long as they have the right methods.
\ttindex{Unpickler}
\ttindex{Pickler}

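As an illustration of passing non-file objects, the sketch below uses
two small hand-written classes in place of files. It assumes a current
Python 3 interpreter, where the data passed to \code{write} and
returned by \code{read} and \code{readline} are bytes objects rather
than strings; \code{ChunkSink} and \code{ChunkSource} are hypothetical
names.

```python
import pickle


class ChunkSink:
    """Minimal write-only object: collects whatever the Pickler writes."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)


class ChunkSource:
    """Minimal read-only object offering read(n) and readline()."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def read(self, size=-1):
        if size is None or size < 0:
            size = len(self.data) - self.pos
        chunk = self.data[self.pos:self.pos + size]
        self.pos += len(chunk)
        return chunk

    def readline(self):
        end = self.data.find(b"\n", self.pos)
        end = len(self.data) if end < 0 else end + 1
        chunk = self.data[self.pos:end]
        self.pos = end
        return chunk


original = {"spam": [1, 2, 3]}

sink = ChunkSink()
pickle.Pickler(sink).dump(original)

source = ChunkSource(b"".join(sink.chunks))
restored = pickle.Unpickler(source).load()
print(restored)
```
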
The following types can be pickled:
\begin{itemize}

\item \code{None}

\item integers, long integers, floating point numbers

\item strings

\item tuples, lists and dictionaries containing only picklable objects

\item classes that are defined at the top level in a module

\item instances of such classes whose \code{__dict__} or
\code{__getstate__()} return value is picklable

\end{itemize}

Attempts to pickle unpicklable objects will raise the
\code{PicklingError} exception; when this happens, an unspecified
number of bytes may have been written to the file.

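A program that pickles data it does not fully control may want to
catch the exception, as in this sketch. It assumes a current Python 3
interpreter and uses a lambda expression as a convenient example of an
object that cannot be pickled; note that current interpreters may
raise other exceptions (such as \code{TypeError}) for some other
unpicklable objects.

```python
import io
import pickle

buffer = io.BytesIO()
try:
    # A lambda has no importable name, so it cannot be pickled.
    pickle.dump([1, 2, lambda: None], buffer)
except pickle.PicklingError as exc:
    failed = True
    print("cannot pickle:", exc)
else:
    failed = False

# The buffer may now hold a partial pickle of unspecified length;
# discard it rather than trying to load it.
print(failed, len(buffer.getvalue()))
```
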
It is possible to make multiple calls to the \code{dump()} method of
the same \code{Pickler} instance. These must then be matched to the
same number of calls to the \code{load()} method of the
corresponding \code{Unpickler} instance. If the same object is
pickled by multiple \code{dump()} calls, the corresponding
\code{load()} calls will all yield references to the same object.
{\em Warning}: this is intended
for pickling multiple objects without intervening modifications to the
objects or their parts. If you modify an object and then pickle it
again using the same \code{Pickler} instance, the object is not
pickled again --- a reference to it is pickled and the
\code{Unpickler} will return the old value, not the modified one.
(There are two problems here: (a) detecting changes, and (b)
marshalling a minimal set of changes. I have no answers. Garbage
Collection may also become a problem here.)

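The warning can be reproduced in a few lines. This is a minimal
sketch, assuming a current Python 3 interpreter and an in-memory
\code{io.BytesIO} buffer standing in for the file.

```python
import io
import pickle

buffer = io.BytesIO()
pickler = pickle.Pickler(buffer)

data = [1, 2, 3]
pickler.dump(data)        # the full contents are written
data.append(4)
pickler.dump(data)        # only a reference to the first pickle is written

buffer.seek(0)
unpickler = pickle.Unpickler(buffer)
first = unpickler.load()
second = unpickler.load()

print(first)              # the value as it was at the first dump()
print(second is first)    # the second load() yields the same object
```

The appended element never reaches the stream, so both \code{load()}
calls return the three-element list.
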
Apart from the \code{Pickler} and \code{Unpickler} classes, the
module defines the following functions, and an exception:

\begin{funcdesc}{dump}{object\, file}
Write a pickled representation of \var{object} to the open file object
\var{file}. This is equivalent to \code{Pickler(file).dump(object)}.
\end{funcdesc}

\begin{funcdesc}{load}{file}
Read a pickled object from the open file object \var{file}. This is
equivalent to \code{Unpickler(file).load()}.
\end{funcdesc}

\begin{funcdesc}{dumps}{object}
Return the pickled representation of the object as a string, instead
of writing it to a file.
\end{funcdesc}

\begin{funcdesc}{loads}{string}
Read a pickled object from a string instead of a file. Characters in
the string past the pickled object's representation are ignored.
\end{funcdesc}

\begin{excdesc}{PicklingError}
This exception is raised when an unpicklable object is passed to
\code{Pickler.dump()}.
\end{excdesc}
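
The string-based functions \code{dumps()} and \code{loads()} can be
tried without any file at all. The sketch below assumes a current
Python 3 interpreter, where the pickled representation is a bytes
object rather than a string; the sample record is made up.

```python
import pickle

record = {"name": "spam", "sizes": (1, 2.5), "tags": ["a", "b"], "none": None}

# Round trip through an in-memory pickle.
data = pickle.dumps(record)
restored = pickle.loads(data)
print(restored == record)

# Data past the end of the pickled representation is ignored by loads().
padded = pickle.loads(data + b"trailing junk")
print(padded == record)
```
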