\section{Standard Module \sectcode{pickle}}
\label{module-pickle}
\stmodindex{pickle}
\index{persistency}
\indexii{persistent}{objects}
\indexii{serializing}{objects}
\indexii{marshalling}{objects}
\indexii{flattening}{objects}
\indexii{pickling}{objects}

\renewcommand{\indexsubitem}{(in module pickle)}

The \code{pickle} module implements a basic but powerful algorithm for
``pickling'' (a.k.a.\ serializing, marshalling or flattening) nearly
arbitrary Python objects. This is the act of converting objects to a
stream of bytes (and back: ``unpickling'').
This is a more primitive notion than
persistency --- although \code{pickle} reads and writes file objects,
it does not handle the issue of naming persistent objects, nor the
(even more complicated) area of concurrent access to persistent
objects. The \code{pickle} module can transform a complex object into
a byte stream and it can transform the byte stream into an object with
the same internal structure. The most obvious thing to do with these
byte streams is to write them onto a file, but it is also conceivable
to send them across a network or store them in a database. The module
\code{shelve} provides a simple interface to pickle and unpickle
objects on ``dbm''-style database files.
\stmodindex{shelve}
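
A minimal sketch of the \code{shelve} interface mentioned above,
assuming a current Python 3 interpreter (where a shelf can be used in a
\code{with} statement). The file name \code{demo_shelf} and the stored
value are made up for illustration.

```python
import os
import shelve
import tempfile

# Keep the database file(s) inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo_shelf")  # hypothetical file name

    # Values are pickled on assignment and unpickled on lookup.
    with shelve.open(path) as db:
        db["config"] = {"retries": 3, "hosts": ["a", "b"]}

    # Reopen the shelf to show that the value was stored on disk.
    with shelve.open(path) as db:
        restored = db["config"]

print(restored)
```

The shelf behaves like a dictionary whose values are pickled behind the
scenes, so anything \code{pickle} can handle can be stored in it.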

Unlike the built-in module \code{marshal}, \code{pickle} handles the
following correctly:
\stmodindex{marshal}

\begin{itemize}

\item recursive objects (objects containing references to themselves)

\item object sharing (references to the same object in different places)

\item user-defined classes and their instances

\end{itemize}
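
The first two items of the list above can be checked with a short
sketch. It assumes a current Python 3 interpreter, and the variable
names are chosen only for illustration.

```python
import pickle

# A recursive object: the list contains a reference to itself.
loop = [1, 2]
loop.append(loop)

# Object sharing: the same list is referenced from two places.
shared = ["x"]
pair = {"first": shared, "second": shared}

loop_copy = pickle.loads(pickle.dumps(loop))
pair_copy = pickle.loads(pickle.dumps(pair))

# The copy refers to itself, not to the original list.
print(loop_copy[2] is loop_copy)
# Both dictionary entries still refer to one and the same list.
print(pair_copy["first"] is pair_copy["second"])
```

Identity, not just equality, is preserved inside the unpickled
structure, while the copies stay distinct from the original objects.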

The data format used by \code{pickle} is Python-specific. This has
the advantage that there are no restrictions imposed by external
standards such as CORBA (which probably can't represent pointer
sharing or recursive objects); however, it means that non-Python
programs may not be able to reconstruct pickled Python objects.

The \code{pickle} data format uses a printable \ASCII{} representation.
This is slightly more voluminous than a binary representation.
However, small integers actually take {\em less} space when
represented as minimal-size decimal strings than when represented as
32-bit binary numbers, and strings are only much longer if they
contain many control characters or 8-bit characters. The big
advantage of using printable \ASCII{} (and of some other characteristics
of \code{pickle}'s representation) is that for debugging or recovery
purposes it is possible for a human to read the pickled file with a
standard text editor. (I could have gone a step further and used a
notation like S-expressions, but the parser
(currently written in Python) would have been
considerably more complicated and slower, and the files would probably
have become much larger.)
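
To see the printable representation, the following sketch may help. It
assumes a current Python 3 interpreter, where the text format described
here is available as protocol 0 and must be requested explicitly; the
sample values are arbitrary.

```python
import pickle

# Protocol 0 is the printable-ASCII format; newer Python versions
# default to a binary protocol, so it is requested explicitly here.
small_int = pickle.dumps(42, protocol=0)
print(small_int)

record = pickle.dumps(["spam", 7, None], protocol=0)
# Every byte is plain ASCII, so the data decodes as text.
text = record.decode("ascii")
print(text)

# The readable form still unpickles to an equal object.
restored = pickle.loads(record)
```

The small integer is written as its decimal digits, which is the
space saving the paragraph above refers to.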

The \code{pickle} module doesn't handle code objects, which the
\code{marshal} module does. I suppose \code{pickle} could, and maybe
it should, but there's probably no great need for it right now (as
long as \code{marshal} continues to be used for reading and writing
code objects), and at least this avoids the possibility of smuggling
Trojan horses into a program.
\stmodindex{marshal}
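
The difference can be sketched as follows, assuming a current Python 3
interpreter. Which exception \code{pickle} raises for a code object is
treated as an assumption here, so the sketch catches more than one type.

```python
import marshal
import pickle

# compile() returns a code object.
code = compile("6 * 7", "<example>", "eval")

# marshal can write the code object and read it back.
restored = marshal.loads(marshal.dumps(code))
result = eval(restored)

# pickle declines to handle it; the exact exception type is left open.
try:
    pickle.dumps(code)
    pickled_ok = True
except (pickle.PicklingError, TypeError):
    pickled_ok = False

print(result, pickled_ok)
```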

For the benefit of persistency modules written using \code{pickle}, it
supports the notion of a reference to an object outside the pickled
data stream. Such objects are referenced by a name, which is an
arbitrary string of printable \ASCII{} characters. The resolution of
such names is not defined by the \code{pickle} module --- the
persistent object module will have to implement a method
\code{persistent_load}. To write references to persistent objects,
the persistent module must define a method \code{persistent_id} which
returns either \code{None} or the persistent ID of the object.
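
A minimal sketch of the two hooks, assuming a current Python 3
interpreter, where they are supplied by subclassing \code{Pickler} and
\code{Unpickler}. The names \code{registry}, \code{NamedPickler} and
\code{NamedUnpickler} are hypothetical and not part of the module.

```python
import io
import pickle

# Hypothetical store of objects that live outside the pickle stream.
registry = {"settings": {"debug": True}}
names_by_id = {id(obj): name for name, obj in registry.items()}


class NamedPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return a name for external objects, None for everything else.
        return names_by_id.get(id(obj))


class NamedUnpickler(pickle.Unpickler):
    def persistent_load(self, name):
        # Resolve the name; pickle itself attaches no meaning to it.
        return registry[name]


buffer = io.BytesIO()
NamedPickler(buffer).dump(["job-1", registry["settings"]])

buffer.seek(0)
job = NamedUnpickler(buffer).load()

# The second item is the registered object itself, not a copy.
print(job[1] is registry["settings"])
```

Only the name \code{"settings"} travels in the stream; the object it
stands for is looked up again when the data is read back.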

There are some restrictions on the pickling of class instances.

First of all, the class must be defined at the top level in a module.

\renewcommand{\indexsubitem}{(pickle protocol)}

Next, it must normally be possible to create class instances by
calling the class without arguments. Usually, this is best
accomplished by providing default values for all arguments to its
\code{__init__} method (if it has one). If this is undesirable, the
class can define a method \code{__getinitargs__()}, which should
return a {\em tuple} containing the arguments to be passed to the
class constructor (\code{__init__()}).
\ttindex{__getinitargs__}
\ttindex{__init__}

Classes can further influence how their instances are pickled --- if the class
defines the method \code{__getstate__()}, it is called and the returned
state is pickled as the contents for the instance, and if the class
defines the method \code{__setstate__()}, it is called with the
unpickled state. (Note that these methods can also be used to
implement copying class instances.) If there is no
\code{__getstate__()} method, the instance's \code{__dict__} is
pickled. If there is no \code{__setstate__()} method, the pickled
object must be a dictionary and its items are assigned to the new
instance's dictionary. (If a class defines both \code{__getstate__()}
and \code{__setstate__()}, the state object needn't be a dictionary
--- these methods can do what they want.) This protocol is also used
by the shallow and deep copying operations defined in the \code{copy}
module.
\ttindex{__getstate__}
\ttindex{__setstate__}
\ttindex{__dict__}
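
The \code{__getstate__()} and \code{__setstate__()} hooks can be
sketched as follows, assuming a current Python 3 interpreter. The class
\code{Counter} and its attributes are hypothetical; the state is a
tuple to show that it need not be a dictionary.

```python
import copy
import pickle


class Counter:
    """Hypothetical class whose cache should not be pickled."""

    def __init__(self, start=0):
        self.count = start
        self.cache = {}  # derived data, rebuilt on demand

    def __getstate__(self):
        # The state need not be a dictionary; a one-item tuple is used.
        return (self.count,)

    def __setstate__(self, state):
        # Receives whatever __getstate__() returned.
        (self.count,) = state
        self.cache = {}


original = Counter(5)
original.cache["squares"] = [0, 1, 4]

clone = pickle.loads(pickle.dumps(original))
shallow = copy.copy(original)  # the copy module uses the same hooks
print(clone.count, clone.cache, shallow.cache)
```

Both the unpickled clone and the shallow copy carry the counter value
but start with an empty cache, because only the state returned by
\code{__getstate__()} is transferred.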

Note that when class instances are pickled, their class's code and
data are not pickled along with them. Only the instance data are
pickled. This is done on purpose, so you can fix bugs in a class or
add methods and still load objects that were created with an earlier
version of the class. If you plan to have long-lived objects that
will see many versions of a class, it may be worthwhile to put a version
number in the objects so that suitable conversions can be made by the
class's \code{__setstate__()} method.
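
One way to carry such a version number is sketched below, assuming a
current Python 3 interpreter. The class \code{Account}, the key
\code{_version} and the renamed attribute are all hypothetical; other
conversion schemes are equally possible.

```python
import pickle


class Account:
    """Hypothetical class; version 2 renamed 'name' to 'owner'."""

    VERSION = 2

    def __init__(self, owner=""):
        self.owner = owner

    def __getstate__(self):
        # Store a version number next to the instance data.
        state = dict(self.__dict__)
        state["_version"] = self.VERSION
        return state

    def __setstate__(self, state):
        version = state.pop("_version", 1)
        if version < 2:
            # Convert data written by the older class definition.
            state["owner"] = state.pop("name")
        self.__dict__.update(state)


# Simulate state saved by version 1 of the class, then a real round trip.
old = Account.__new__(Account)
old.__setstate__({"name": "guido"})
new = pickle.loads(pickle.dumps(Account("barry")))
print(old.owner, new.owner)
```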

When a class itself is pickled, only its name is pickled --- the class
definition is not pickled, but re-imported by the unpickling process.
Therefore, the restriction that the class must be defined at the top
level in a module applies to pickled classes as well.
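
This can be observed with a class from the standard library. The
sketch assumes a current Python 3 interpreter and uses
\code{collections.OrderedDict} only as a convenient top-level class.

```python
import collections
import pickle

# Pickle the class object itself, not an instance of it.
data = pickle.dumps(collections.OrderedDict)

# The stream holds the class name, not the class body.
print(data)

# Unpickling imports the module and looks the name up again.
restored = pickle.loads(data)
print(restored is collections.OrderedDict)
```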

\renewcommand{\indexsubitem}{(in module pickle)}

The interface can be summarized as follows.

To pickle an object \code{x} onto a file \code{f}, open for writing:

\bcode\begin{verbatim}
p = pickle.Pickler(f)
p.dump(x)
\end{verbatim}\ecode
%
A shorthand for this is:

\bcode\begin{verbatim}
pickle.dump(x, f)
\end{verbatim}\ecode
%
To unpickle an object \code{x} from a file \code{f}, open for reading:

\bcode\begin{verbatim}
u = pickle.Unpickler(f)
x = u.load()
\end{verbatim}\ecode
%
A shorthand is:

\bcode\begin{verbatim}
x = pickle.load(f)
\end{verbatim}\ecode
%
The \code{Pickler} class only calls the method \code{f.write} with a
string argument. The \code{Unpickler} calls the methods \code{f.read}
(with an integer argument) and \code{f.readline} (without argument),
both returning a string. It is explicitly allowed to pass non-file
objects here, as long as they have the right methods.
\ttindex{Unpickler}
\ttindex{Pickler}
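
The use of non-file objects can be sketched as follows, assuming a
current Python 3 interpreter, where the data passed to \code{write} and
returned by \code{read} and \code{readline} are bytes objects rather
than strings. The classes \code{ChunkSink} and \code{ChunkSource} are
hypothetical.

```python
import io
import pickle


class ChunkSink:
    """Hypothetical writer: only write() is provided."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        # Copy the chunk in case the pickler passes a buffer view.
        self.chunks.append(bytes(data))


class ChunkSource:
    """Hypothetical reader: only read() and readline() are provided."""

    def __init__(self, data):
        self._buffer = io.BytesIO(data)

    def read(self, size):
        return self._buffer.read(size)

    def readline(self):
        return self._buffer.readline()


sink = ChunkSink()
pickle.Pickler(sink).dump({"answer": 42})

source = ChunkSource(b"".join(sink.chunks))
value = pickle.Unpickler(source).load()
print(value)
```

The same approach works for sockets, compressed streams or any other
object that offers these methods.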

The following types can be pickled:
\begin{itemize}

\item \code{None}

\item integers, long integers, floating point numbers

\item strings

\item tuples, lists and dictionaries containing only picklable objects

\item classes that are defined at the top level in a module

\item instances of such classes whose \code{__dict__} or the result
of \code{__getstate__()} is picklable

\end{itemize}

Attempts to pickle unpicklable objects will raise the
\code{PicklingError} exception; when this happens, an unspecified
number of bytes may have been written to the file.
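
A failed attempt can be sketched as follows, assuming a current
Python 3 interpreter. There, some unpicklable objects raise exceptions
other than \code{PicklingError} (for example \code{TypeError}), so the
sketch catches several types instead of relying on one.

```python
import pickle

# A lambda has no importable name, so it cannot be pickled by reference.
unpicklable = lambda: None

try:
    pickle.dumps(unpicklable)
    failed = False
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    failed = True
    print(type(exc).__name__)

# Containers of picklable objects are fine.
ok = pickle.loads(pickle.dumps({"n": 1, "items": (1.5, "text", None)}))
```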

It is possible to make multiple calls to the \code{dump()} method of
the same \code{Pickler} instance. These must then be matched to the
same number of calls to the \code{load()} method of the
corresponding \code{Unpickler} instance. If the same object is
pickled by multiple \code{dump()} calls, the \code{load()} calls will all
yield references to the same object. {\em Warning}: this is intended
for pickling multiple objects without intervening modifications to the
objects or their parts. If you modify an object and then pickle it
again using the same \code{Pickler} instance, the object is not
pickled again --- a reference to it is pickled and the
\code{Unpickler} will return the old value, not the modified one.
(There are two problems here: (a) detecting changes, and (b)
marshalling a minimal set of changes. I have no answers. Garbage
collection may also become a problem here.)
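
The warning can be reproduced with a short sketch, assuming a current
Python 3 interpreter and an in-memory \code{io.BytesIO} buffer in place
of a real file.

```python
import io
import pickle

buffer = io.BytesIO()
pickler = pickle.Pickler(buffer)

items = [1, 2, 3]
pickler.dump(items)
items.append(4)       # modified between the two dump() calls
pickler.dump(items)   # only a reference to the first pickle is written

buffer.seek(0)
unpickler = pickle.Unpickler(buffer)
first = unpickler.load()
second = unpickler.load()

# Both loads yield the same object, holding the old value.
print(first is second, second)
```

The appended element is lost because the second \code{dump()} call
finds the list in the pickler's memory of already written objects.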

Apart from the \code{Pickler} and \code{Unpickler} classes, the
module defines the following functions, and an exception:

\begin{funcdesc}{dump}{object\, file}
Write a pickled representation of \var{object} to the open file object
\var{file}. This is equivalent to \code{Pickler(file).dump(object)}.
\end{funcdesc}

\begin{funcdesc}{load}{file}
Read a pickled object from the open file object \var{file}. This is
equivalent to \code{Unpickler(file).load()}.
\end{funcdesc}

\begin{funcdesc}{dumps}{object}
Return the pickled representation of the object as a string, instead
of writing it to a file.
\end{funcdesc}

\begin{funcdesc}{loads}{string}
Read a pickled object from a string instead of a file. Characters in
the string past the pickled object's representation are ignored.
\end{funcdesc}
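
A brief sketch of \code{dumps()} and \code{loads()}, assuming a current
Python 3 interpreter, where the pickled representation is a bytes
object rather than a string. The trailing bytes are arbitrary filler.

```python
import pickle

data = pickle.dumps(("spam", 3.5))

# dumps() returns the pickle directly; here it is a bytes object.
print(type(data).__name__)

# Anything after the end of the pickle is ignored by loads().
value = pickle.loads(data + b"trailing junk")
print(value)
```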

\begin{excdesc}{PicklingError}
This exception is raised when an unpicklable object is passed to
\code{Pickler.dump()}.
\end{excdesc}
232\end{excdesc}