\section{Standard Module \sectcode{pickle}}
\stmodindex{pickle}
\index{persistency}
\indexii{persistent}{objects}
\indexii{serializing}{objects}
\indexii{marshalling}{objects}
\indexii{flattening}{objects}
\indexii{pickling}{objects}

\renewcommand{\indexsubitem}{(in module pickle)}

The \code{pickle} module implements a basic but powerful algorithm for
``pickling'' (a.k.a.\ serializing, marshalling or flattening) nearly
arbitrary Python objects. This is the act of converting objects to a
stream of bytes (and back: ``unpickling'').
This is a more primitive notion than
persistency --- although \code{pickle} reads and writes file objects,
it does not handle the issue of naming persistent objects, nor the
(even more complicated) area of concurrent access to persistent
objects. The \code{pickle} module can transform a complex object into
a byte stream and it can transform the byte stream into an object with
the same internal structure. The most obvious thing to do with these
byte streams is to write them onto a file, but it is also conceivable
to send them across a network or store them in a database. The module
\code{shelve} provides a simple interface to pickle and unpickle
objects on ``dbm''-style database files.
\stmodindex{shelve}

Unlike the built-in module \code{marshal}, \code{pickle} handles the
following correctly:
\stmodindex{marshal}

\begin{itemize}

\item recursive objects (objects containing references to themselves)

\item object sharing (references to the same object in different places)

\item user-defined classes and their instances

\end{itemize}
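
The three cases listed above can be illustrated with a small sketch.
It assumes a current Python 3 interpreter, where \code{dumps()} and
\code{loads()} work on bytes objects; the class \code{Node} and all
variable names are hypothetical and only serve the illustration.

```python
import pickle


class Node:
    """Hypothetical example class, defined at module top level."""

    def __init__(self, name):
        self.name = name
        self.peer = None


# A recursive object: the list contains a reference to itself.
loop = [1, 2]
loop.append(loop)

# Object sharing: the same list is referenced from two places.
shared = ["x"]
pair = {"first": shared, "second": shared}

# Class instances that refer to each other.
a, b = Node("a"), Node("b")
a.peer, b.peer = b, a

# Pickle everything in one go and rebuild it from the byte stream.
loop2, pair2, a2 = pickle.loads(pickle.dumps((loop, pair, a)))

print(loop2[2] is loop2)                  # is the cycle preserved?
print(pair2["first"] is pair2["second"])  # is the sharing preserved?
print(a2.peer.peer is a2)                 # is the instance cycle preserved?
```

Each check compares object identity rather than equality, since
identity is what distinguishes preserved structure from mere copies.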

The data format used by \code{pickle} is Python-specific. This has
the advantage that there are no restrictions imposed by external
standards such as CORBA (which probably can't represent pointer
sharing or recursive objects); however, it means that non-Python
programs may not be able to reconstruct pickled Python objects.

The \code{pickle} data format uses a printable \ASCII{} representation.
This is slightly more voluminous than a binary representation.
However, small integers actually take {\em less} space when
represented as minimal-size decimal strings than when represented as
32-bit binary numbers, and strings are only much longer if they
contain many control characters or 8-bit characters. The big
advantage of using printable \ASCII{} (and of some other characteristics
of \code{pickle}'s representation) is that for debugging or recovery
purposes it is possible for a human to read the pickled file with a
standard text editor. (I could have gone a step further and used a
notation like S-expressions, but the parser
(currently written in Python) would have been
considerably more complicated and slower, and the files would probably
have become much larger.)
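
To look at the printable representation, something like the following
may be used. This is only a sketch: it assumes a current Python 3
interpreter, where the printable format has to be requested explicitly
as protocol 0 and the result is a bytes object. The sample value is
made up.

```python
import pickle

value = {"n": 42, "items": [1, 2, 3], "label": "spam"}

# Assumption: protocol 0 selects the printable text format described above.
data = pickle.dumps(value, protocol=0)

# For this sample every byte is printable ASCII or a newline,
# so decoding as ASCII works and the result can be read by a human.
text = data.decode("ascii")
print(text)

# The text form still unpickles to an equal object.
restored = pickle.loads(data)
print(restored == value)
```

Small integers show up as short decimal strings in the output, which
is the space trade-off mentioned above.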

The \code{pickle} module doesn't handle code objects, which the
\code{marshal} module does. I suppose \code{pickle} could, and maybe
it should, but there's probably no great need for it right now (as
long as \code{marshal} continues to be used for reading and writing
code objects), and at least this avoids the possibility of smuggling
Trojan horses into a program.
\stmodindex{marshal}

For the benefit of persistency modules written using \code{pickle}, it
supports the notion of a reference to an object outside the pickled
data stream. Such objects are referenced by a name, which is an
arbitrary string of printable \ASCII{} characters. The resolution of
such names is not defined by the \code{pickle} module --- the
persistent object module will have to implement a method
\code{persistent_load}. To write references to persistent objects,
the persistent module must define a method \code{persistent_id} which
returns either \code{None} or the persistent ID of the object.
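
A minimal sketch of such a persistence layer is given below. It
assumes a current Python 3 interpreter, where the two methods are
supplied by subclassing \code{Pickler} and \code{Unpickler}. The
names \code{Account}, \code{STORE}, \code{StorePickler} and
\code{StoreUnpickler} are hypothetical.

```python
import io
import pickle


class Account:
    """Hypothetical object that lives in an external store, not in the pickle."""

    def __init__(self, key):
        self.key = key


# Hypothetical external store, keyed by printable ASCII names.
STORE = {"acct-1": Account("acct-1")}


class StorePickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return a name for externally stored objects, None for everything else.
        if isinstance(obj, Account):
            return obj.key
        return None


class StoreUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Resolving the name is the job of the persistence layer, not of pickle.
        try:
            return STORE[pid]
        except KeyError:
            raise pickle.UnpicklingError("unknown persistent id: %r" % (pid,))


buf = io.BytesIO()
StorePickler(buf).dump({"owner": STORE["acct-1"], "balance": 10})

buf.seek(0)
restored = StoreUnpickler(buf).load()
print(restored["owner"] is STORE["acct-1"])
```

Only the name \code{"acct-1"} travels in the byte stream; the
\code{Account} object itself is looked up again when unpickling.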

There are some restrictions on the pickling of class instances.

First of all, the class must be defined at the top level in a module.

\renewcommand{\indexsubitem}{(pickle protocol)}

Next, it must normally be possible to create class instances by
calling the class without arguments. If this is undesirable, the
class can define a method \code{__getinitargs__()}, which should
return a {\em tuple} containing the arguments to be passed to the
class constructor (\code{__init__()}).
\ttindex{__getinitargs__}
\ttindex{__init__}

Classes can further influence how their instances are pickled --- if the class
defines the method \code{__getstate__()}, it is called and the returned
state is pickled as the contents for the instance, and if the class
defines the method \code{__setstate__()}, it is called with the
unpickled state. (Note that these methods can also be used to
implement copying class instances.) If there is no
\code{__getstate__()} method, the instance's \code{__dict__} is
pickled. If there is no \code{__setstate__()} method, the pickled
object must be a dictionary and its items are assigned to the new
instance's dictionary. (If a class defines both \code{__getstate__()}
and \code{__setstate__()}, the state object needn't be a dictionary
--- these methods can do what they want.) This protocol is also used
by the shallow and deep copying operations defined in the \code{copy}
module.
\ttindex{__getstate__}
\ttindex{__setstate__}
\ttindex{__dict__}
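
As an illustration of \code{__getstate__()} and \code{__setstate__()},
the sketch below pickles only part of an instance and uses a tuple
rather than a dictionary as the state. It assumes a current Python 3
interpreter; the class \code{Counter} and its attributes are
hypothetical.

```python
import copy
import pickle


class Counter:
    """Hypothetical class that keeps a cache which should not be pickled."""

    def __init__(self, start=0):
        self.count = start
        self.cache = {}  # derived data, cheap to rebuild

    def __getstate__(self):
        # The state need not be a dictionary because __setstate__ is defined too.
        return (self.count,)

    def __setstate__(self, state):
        (self.count,) = state
        self.cache = {}  # start with an empty cache after unpickling


c = Counter(5)
c.cache["expensive"] = 123

c2 = pickle.loads(pickle.dumps(c))
print(c2.count, c2.cache)

# The copy module goes through the same pair of methods.
c3 = copy.copy(c)
print(c3.count, c3.cache)
```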

Note that when class instances are pickled, their class's code and
data are not pickled along with them. Only the instance data are
pickled. This is done on purpose, so you can fix bugs in a class or
add methods and still load objects that were created with an earlier
version of the class. If you plan to have long-lived objects that
will see many versions of a class, it may be worthwhile to put a version
number in the objects so that suitable conversions can be made by the
class's \code{__setstate__()} method.
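
One possible way to carry such a version number is sketched here,
assuming a current Python 3 interpreter. The class \code{Point}, the
\code{"version"} key and the old \code{"coords"} layout are all
invented for the example.

```python
import pickle


class Point:
    """Hypothetical class whose instance layout changed between versions."""

    VERSION = 2

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getstate__(self):
        # Store the instance data together with the current version number.
        state = dict(self.__dict__)
        state["version"] = self.VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop("version", 1)
        if version < 2:
            # Assumed old layout: version 1 stored a single "coords" tuple.
            state["x"], state["y"] = state.pop("coords")
        self.__dict__.update(state)


# Simulate state written by the assumed version 1 layout.
old = Point.__new__(Point)
old.__setstate__({"coords": (3, 4)})
print(old.x, old.y)

# State written by the current version round-trips unchanged.
new = pickle.loads(pickle.dumps(Point(1, 2)))
print(new.x, new.y)
```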

When a class itself is pickled, only its name is pickled --- the class
definition is not pickled, but re-imported by the unpickling process.
Therefore, the restriction that the class must be defined at the top
level in a module applies to pickled classes as well.
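
That only the name is stored can be checked with a short experiment.
The sketch assumes a current Python 3 interpreter; the class
\code{Spam} and its attribute are hypothetical.

```python
import pickle


class Spam:
    """Hypothetical top-level class."""

    flavour = "plain"


data = pickle.dumps(Spam)

# The class name is in the byte stream, the class body is not.
print(b"Spam" in data, b"plain" in data)

# Unpickling looks the class up again by name,
# so the result is the very same class object.
print(pickle.loads(data) is Spam)
```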

\renewcommand{\indexsubitem}{(in module pickle)}

The interface can be summarized as follows.

To pickle an object \code{x} onto a file \code{f}, open for writing:

\begin{verbatim}
p = pickle.Pickler(f)
p.dump(x)
\end{verbatim}

A shorthand for this is:

\begin{verbatim}
pickle.dump(x, f)
\end{verbatim}

To unpickle an object \code{x} from a file \code{f}, open for reading:

\begin{verbatim}
u = pickle.Unpickler(f)
x = u.load()
\end{verbatim}

A shorthand is:

\begin{verbatim}
x = pickle.load(f)
\end{verbatim}
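
Put together, a complete round trip through a file might look like the
following sketch. It assumes a current Python 3 interpreter, where
the file has to be opened in binary mode; the file name
\code{data.pickle} is hypothetical.

```python
import os
import pickle
import tempfile

x = {"name": "example", "values": [1, 2, 3]}

# Hypothetical file name inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pickle")

    # Assumption: binary mode, which current Python versions require.
    with open(path, "wb") as f:
        pickle.dump(x, f)

    with open(path, "rb") as f:
        y = pickle.load(f)

# The result is an equal object with the same structure, not the original.
print(y == x, y is x)
```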

The \code{Pickler} class only calls the method \code{f.write} with a
string argument. The \code{Unpickler} calls the methods \code{f.read}
(with an integer argument) and \code{f.readline} (without argument),
both returning a string. It is explicitly allowed to pass non-file
objects here, as long as they have the right methods.
\ttindex{Unpickler}
\ttindex{Pickler}
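
The sketch below passes two non-file objects that provide nothing but
the methods named above. It assumes a current Python 3 interpreter,
where the data exchanged are bytes objects rather than strings. The
classes \code{ChunkWriter} and \code{ChunkReader} are hypothetical.

```python
import pickle


class ChunkWriter:
    """Hypothetical write-only sink: it offers nothing but write()."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)


class ChunkReader:
    """Hypothetical source offering only read(n) and readline()."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def read(self, n):
        # Return at most n bytes starting at the current position.
        chunk = self.data[self.pos:self.pos + n]
        self.pos += len(chunk)
        return chunk

    def readline(self):
        # Return everything up to and including the next newline.
        end = self.data.find(b"\n", self.pos)
        end = len(self.data) if end < 0 else end + 1
        chunk = self.data[self.pos:end]
        self.pos = end
        return chunk


sink = ChunkWriter()
pickle.Pickler(sink).dump(["spam", 42])

source = ChunkReader(b"".join(sink.chunks))
result = pickle.Unpickler(source).load()
print(result)
```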
Guido van Rossumd1883581995-02-15 15:53:08 +0000165
166The following types can be pickled:
167\begin{itemize}
168
169\item \code{None}
170
171\item integers, long integers, floating point numbers
172
173\item strings
174
175\item tuples, lists and dictionaries containing only picklable objects
176
Guido van Rossum470be141995-03-17 16:07:09 +0000177\item classes that are defined at the top level in a module
178
179\item instances of such classes whose \code{__dict__} or
180\code{__setstate__()} is picklable
Guido van Rossumd1883581995-02-15 15:53:08 +0000181
182\end{itemize}
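
A quick way to try the list above is to round-trip one sample of each
kind. The sketch assumes a current Python 3 interpreter, where
integers and long integers are a single type; the class \code{Ham} is
hypothetical.

```python
import pickle


class Ham:
    """Hypothetical top-level class."""

    def __init__(self):
        self.weight = 3


samples = [
    None,
    7,                      # integer
    2 ** 100,               # long integer
    1.5,                    # floating point number
    "a string",
    (1, "two", 3.0),        # tuple
    [1, [2, [3]]],          # list
    {"key": ("value", 1)},  # dictionary
    Ham,                    # a class, pickled by name
]

for obj in samples:
    restored = pickle.loads(pickle.dumps(obj))
    print(type(obj).__name__, restored == obj)

# An instance is pickled through its __dict__.
ham = pickle.loads(pickle.dumps(Ham()))
print(ham.weight)
```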

Attempts to pickle unpicklable objects will raise the
\code{PicklingError} exception; when this happens, an unspecified
number of bytes may have been written to the file.
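
A defensive caller might wrap the call as in the sketch below. It
assumes a current Python 3 interpreter, which for some objects raises
\code{AttributeError} or \code{TypeError} instead of
\code{PicklingError}, so all three are caught. The helper
\code{try_dump} is hypothetical.

```python
import io
import pickle


def try_dump(obj):
    """Hypothetical helper: return (ok, bytes_written) for one pickling attempt."""
    buf = io.BytesIO()
    try:
        pickle.Pickler(buf).dump(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # Assumption: current interpreters may raise AttributeError or
        # TypeError rather than PicklingError, depending on the object.
        return False, len(buf.getvalue())
    return True, len(buf.getvalue())


good = try_dump([1, 2, 3])
bad = try_dump([1, lambda: 0])  # a lambda cannot be pickled by name

print(good)
print(bad)
```

The byte count reported after a failure should not be relied upon,
which is why partially written output is best discarded.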
Guido van Rossumd1883581995-02-15 15:53:08 +0000187
Guido van Rossum470be141995-03-17 16:07:09 +0000188It is possible to make multiple calls to the \code{dump()} method of
189the same \code{Pickler} instance. These must then be matched to the
190same number of calls to the \code{load()} instance of the
191corresponding \code{Unpickler} instance. If the same object is
192pickled by multiple \code{dump()} calls, the \code{load()} will all
193yield references to the same object. {\em Warning}: this is intended
194for pickling multiple objects without intervening modifications to the
195objects or their parts. If you modify an object and then pickle it
196again using the same \code{Pickler} instance, the object is not
197pickled again --- a reference to it is pickled and the
198\code{Unpickler} will return the old value, not the modified one.
199(There are two problems here: (a) detecting changes, and (b)
200marshalling a minimal set of changes. I have no answers. Garbage
201Collection may also become a problem here.)
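
The pitfall described in the warning can be reproduced with a few
lines. The sketch assumes a current Python 3 interpreter and uses an
in-memory \code{io.BytesIO} buffer in place of a file.

```python
import io
import pickle

buf = io.BytesIO()
p = pickle.Pickler(buf)

data = [1, 2]
p.dump(data)    # the first call pickles the contents of the list
data.append(3)
p.dump(data)    # the second call only pickles a reference to it

buf.seek(0)
u = pickle.Unpickler(buf)
first = u.load()
second = u.load()

# Both loads yield the same object, holding the value from the first dump.
print(first, second, second is first)
```

The appended element is lost because the \code{Pickler} remembers
that it has already written the list. Using a fresh \code{Pickler}
for the second \code{dump()} call is one way to avoid this.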

Apart from the \code{Pickler} and \code{Unpickler} classes, the
module defines the following functions, and an exception:

\begin{funcdesc}{dump}{object\, file}
Write a pickled representation of \var{object} to the open file object
\var{file}. This is equivalent to \code{Pickler(file).dump(object)}.
\end{funcdesc}

\begin{funcdesc}{load}{file}
Read a pickled object from the open file object \var{file}. This is
equivalent to \code{Unpickler(file).load()}.
\end{funcdesc}

\begin{funcdesc}{dumps}{object}
Return the pickled representation of the object as a string, instead
of writing it to a file.
\end{funcdesc}

\begin{funcdesc}{loads}{string}
Read a pickled object from a string instead of a file. Characters in
the string past the pickled object's representation are ignored.
\end{funcdesc}
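
The in-memory functions can be tried as follows. This is a sketch
that assumes a current Python 3 interpreter, where the ``string'' is a
bytes object; the sample value and the trailing junk are made up.

```python
import pickle

obj = {"answer": 42}

# Assumption: in current Python the pickled representation is a bytes object.
data = pickle.dumps(obj)
print(type(data).__name__)

restored = pickle.loads(data)
print(restored == obj)

# Anything after the end of the pickled representation is ignored.
padded = pickle.loads(data + b"trailing junk")
print(padded == obj)
```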

\begin{excdesc}{PicklingError}
This exception is raised when an unpicklable object is passed to
\code{Pickler.dump()}.
\end{excdesc}