blob: a9d5fa4eb5ae64ffdce214b273b36a40db149e79 [file] [log] [blame]
Guido van Rossumd1883581995-02-15 15:53:08 +00001\section{Built-in module \sectcode{pickle}}
2\stmodindex{pickle}
3\index{persistency}
4\indexii{persistent}{objects}
5\indexii{serializing}{objects}
6\indexii{marshalling}{objects}
7\indexii{flattening}{objects}
8\indexii{pickling}{objects}
9
10The \code{pickle} module implements a basic but powerful algorithm for
11``pickling'' (a.k.a. serializing, marshalling or flattening) nearly
12arbitrary Python objects. This is a more primitive notion than
13persistency --- although \code{pickle} reads and writes file objects,
14it does not handle the issue of naming persistent objects, nor the
15(even more complicated) area of concurrent access to persistent
16objects. The \code{pickle} module can transform a complex object into
17a byte stream and it can transform the byte stream into an object with
18the same internal structure. The most obvious thing to do with these
19byte streams is to write them onto a file, but it is also conceivable
20to send them across a network or store them in a database. The module
21\code{shelve} provides a simple interface to pickle and unpickle
22objects on ``dbm''-style database files.
23\stmodindex{shelve}
24
25Unlike the built-in module \code{marshal}, \code{pickle} handles the
26following correctly:
27\stmodindex{marshal}
28
29\begin{itemize}
30
31\item recursive objects
32
33\item pointer sharing
34
35\item instances uf user-defined classes
36
37\end{itemize}
38
39The data format used by \code{pickle} is Python-specific. This has
40the advantage that there are no restrictions imposed by external
41standards such as CORBA (which probably can't represent pointer
42sharing or recursive objects); however it means that non-Python
43programs may not be able to reconstruct pickled Python objects.
44
45The \code{pickle} data format uses a printable ASCII representation.
46This is slightly more voluminous than a binary representation.
47However, small integers actually take {\em less} space when
48represented as minimal-size decimal strings than when represented as
4932-bit binary numbers, and strings are only much longer if they
50contain many control characters or 8-bit characters. The big
51advantage of using printable ASCII (and of some other characteristics
52of \code{pickle}'s representation) is that for debugging or recovery
53purposes it is possible for a human to read the pickled file with a
54standard text editor. (I could have gone a step further and used a
55notation like S-expressions, but the parser would have been
56considerably more complicated and slower, and the files would probably
57have become much larger.)
58
59The \code{pickle} module doesn't handle code objects, which the
60\code{marshal} module does. I suppose \code{pickle} could, and maybe
61it should, but there's probably no great need for it right now (as
62long as \code{marshal} continues to be used for reading and writing
63code objects), and at least this avoids the possibility of smuggling
64Trojan horses into a program.
65\stmodindex{marshal}
66
67For the benefit of persistency modules written using \code{pickle}, it
68supports the notion of a reference to an object outside the pickled
69data stream. Such objects are referenced by a name, which is an
70arbitrary string of printable ASCII characters. The resolution of
71such names is not defined by the \code{pickle} module --- the
72persistent object module will have to implement a method
73\code{persistent_load}. To write references to persistent objects,
74the persistent module must define a method \code{persistent_id} which
75returns either \code{None} or the persistent ID of the object.
76
77There are some restrictions on the pickling of class instances.
78
79First of all, the class must be defined at the top level in a module.
80
81Next, it must normally be possible to create class instances by
82calling the class without arguments. If this is undesirable, the
83class can define a method \code{__getinitargs__()}, which should
84return a {\em tuple} containing the arguments to be passed to the
85class constructor (\code{__init__()}).
86\ttindex{__getinitargs__}
87\ttindex{__init__}
88
89Classes can further influence how they are pickled --- if the class
90defines the method \code{__getstate__()}, it is called and the return
91state is pickled as the contents for the instance, and if the class
92defines the method \code{__setstate__()}, it is called with the
93unpickled state. (Note that these methods can also be used to
94implement copying class instances.) If there is no
95\code{__getstate__()} method, the instance's \code{__dict__} is
96pickled. If there is no \code{__setstate__()} method, the pickled
97object must be a dictionary and its items are assigned to the new
98instance's dictionary. (If a class defines both \code{__getstate__()}
99and \code{__setstate__()}, the state object needn't be a dictionary
100--- these methods can do what they want.) This protocol is also used
101by the shallow and deep copying operations defined in the \code{copy}
102module.
103\ttindex{__getstate__}
104\ttindex{__setstate__}
105\ttindex{__dict__}
106
107Note that when class instances are pickled, their class's code and
108data is not pickled along with them. Only the instance data is
109pickled. This is done on purpose, so you can fix bugs in a class or
110add methods and still load objects that were created with an earlier
111version of the class. If you plan to have long-lived objects that
112will see many versions of a class, it may be worth to put a version
113number in the objects so that suitable conversions can be made by the
114class's \code{__setstate__()} method.
115
116The interface can be summarized as follows.
117
118To pickle an object \code{x} onto a file \code{f}, open for writing:
119
120\begin{verbatim}
121p = pickle.Pickler(f)
122p.dump(x)
123\end{verbatim}
124
125To unpickle an object \code{x} from a file \code{f}, open for reading:
126
127\begin{verbatim}
128u = pickle.Unpickler(f)
129x = u.load(x)
130\end{verbatim}
131
132The \code{Pickler} class only calls the method \code{f.write} with a
133string argument. The \code{Unpickler} calls the methods \code{f.read}
134(with an integer argument) and \code{f.readline} (without argument),
135both returning a string. It is explicitly allowed to pass non-file
136objects here, as long as they have the right methods.
137
138The following types can be pickled:
139\begin{itemize}
140
141\item \code{None}
142
143\item integers, long integers, floating point numbers
144
145\item strings
146
147\item tuples, lists and dictionaries containing only picklable objects
148
149\item class instances whose \code{__dict__} or \code{__setstate__()}
150is picklable
151
152\end{itemize}
153
154Attempts to pickle unpicklable objects will raise an exception; when
155this happens, an unspecified number of bytes may have been written to
156the file argument.
157
158It is possible to make multiple calls to \code{Pickler.dump()} or to
159\code{Unpickler.load()}, as long as there is a one-to-one
160correspondence between pickler and \code{Unpickler} objects and
161between \code{dump} and \code{load} calls for any pair of
162corresponding \code{Pickler} and \code{Unpicklers}. {\em Warning}:
163this is intended for pickling multiple objects without intervening
164modifications to the objects or their parts. If you modify an object
165and then pickle it again using the same \code{Pickler} instance, the
166object is not pickled again --- a reference to it is pickled and the
167\code{Unpickler} will return the old value, not the modified one. (There
168are two problems here: (a) detecting changes, and (b) marshalling a
169minimal set of changes. I have no answers. Garbage Collection may
170also become a problem here.)