:mod:`pickle` --- Python object serialization
=============================================

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.
.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>.
.. sectionauthor:: Barry Warsaw <barry@zope.com>

The :mod:`pickle` module implements a fundamental but powerful algorithm for
serializing and de-serializing a Python object structure. "Pickling" is the
process whereby a Python object hierarchy is converted into a byte stream, and
"unpickling" is the inverse operation, whereby a byte stream is converted back
into an object hierarchy. Pickling (and unpickling) is alternatively known as
"serialization", "marshalling," [#]_ or "flattening"; however, to avoid
confusion, the terms used here are "pickling" and "unpickling".

This documentation describes both the :mod:`pickle` module and the
:mod:`cPickle` module.

.. warning::

   The :mod:`pickle` module is not intended to be secure against erroneous or
   maliciously constructed data. Never unpickle data received from an untrusted
   or unauthenticated source.


Relationship to other Python modules
------------------------------------

The :mod:`pickle` module has an optimized cousin called the :mod:`cPickle`
module. As its name implies, :mod:`cPickle` is written in C, so it can be up to
1000 times faster than :mod:`pickle`. However, it does not support subclassing
of the :func:`Pickler` and :func:`Unpickler` classes, because in :mod:`cPickle`
these are functions, not classes. Most applications have no need for this
functionality, and can benefit from the improved performance of :mod:`cPickle`.
Other than that, the interfaces of the two modules are nearly identical; the
common interface is described in this manual and differences are pointed out
where necessary. In the following discussions, we use the term "pickle" to
collectively describe the :mod:`pickle` and :mod:`cPickle` modules.

The data streams the two modules produce are guaranteed to be interchangeable.

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should always be the preferred way to serialize Python
objects. :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing. Recursive
  objects are objects that contain references to themselves. These are not
  handled by marshal, and in fact, attempting to marshal recursive objects will
  crash your Python interpreter. Object sharing happens when there are multiple
  references to the same object in different places in the object hierarchy being
  serialized. :mod:`pickle` stores such objects only once, and ensures that all
  other references point to the master copy. Shared objects remain shared, which
  can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances. :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module
  as when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions. Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases.

Note that serialization is a more primitive notion than persistence; although
:mod:`pickle` reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent
access to persistent objects. The :mod:`pickle` module can transform a complex
object into a byte stream and it can transform the byte stream into an object
with the same internal structure. Perhaps the most obvious thing to do with
these byte streams is to write them onto a file, but it is also conceivable to
send them across a network or store them in a database. The module
:mod:`shelve` provides a simple interface to pickle and unpickle objects on
DBM-style database files.

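As a rough illustration of the object sharing and recursion handling described
in this section, the following sketch round-trips a list that holds the same
child twice and a list that contains itself. The variable names are
illustrative only and are not part of the module.

```python
import pickle

# A shared child: both slots of `pair` refer to the same list object.
shared = [1, 2, 3]
pair = [shared, shared]

# A recursive object: the list contains a reference to itself.
loop = []
loop.append(loop)

restored_pair = pickle.loads(pickle.dumps(pair))
restored_loop = pickle.loads(pickle.dumps(loop))

# Sharing survives the round trip, so a change made through one slot
# is visible through the other.
restored_pair[0].append(4)
print(restored_pair[1])                       # [1, 2, 3, 4]
print(restored_pair[0] is restored_pair[1])   # True

# The self-reference is rebuilt as well.
print(restored_loop[0] is restored_loop)      # True
```

The restored objects are copies: the original ``shared`` list is not affected
by the ``append`` call above.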

Data stream format
------------------

.. index::
   single: XDR
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific. This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however, it means that non-Python
programs may not be able to reconstruct pickled Python objects.

By default, the :mod:`pickle` data format uses a printable ASCII representation.
This is slightly more voluminous than a binary representation. The big
advantage of using printable ASCII (and of some other characteristics of
:mod:`pickle`'s representation) is that for debugging or recovery purposes it is
possible for a human to read the pickled file with a standard text editor.

There are currently three different protocols which can be used for pickling.

* Protocol version 0 is the original ASCII protocol and is backwards compatible
  with earlier versions of Python.

* Protocol version 1 is the old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3. It provides much more
  efficient pickling of :term:`new-style class`\es.

Refer to :pep:`307` for more information.

If a *protocol* is not specified, protocol 0 is used. If *protocol* is specified
as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol version
available will be used.

.. versionchanged:: 2.3
   Introduced the *protocol* parameter.

A binary format, which is slightly more efficient, can be chosen by specifying a
*protocol* version >= 1.

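The protocol choices above can be sketched as follows. This is a minimal
example rather than a recommendation; ``record`` is a made-up value, and the
protocol is passed positionally so that the call matches the signatures
documented here.

```python
import pickle

record = {"name": "spam", "values": [1, 2, 3]}

# Protocol 0 is the printable ASCII format.
text_form = pickle.dumps(record, 0)

# Protocol 2 is a binary format.
binary_form = pickle.dumps(record, 2)

# A negative value selects the highest protocol available, which is the
# same choice as passing pickle.HIGHEST_PROTOCOL.
highest_form = pickle.dumps(record, -1)

# The reader is not told which protocol was used; it works this out itself.
restored = [pickle.loads(data) for data in (text_form, binary_form, highest_form)]
```

Each entry of ``restored`` compares equal to ``record``, whichever protocol
produced the data.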

Usage
-----

To serialize an object hierarchy, you first create a pickler, then you call the
pickler's :meth:`dump` method. To de-serialize a data stream, you first create
an unpickler, then you call the unpickler's :meth:`load` method. The
:mod:`pickle` module provides the following constant:


.. data:: HIGHEST_PROTOCOL

   The highest protocol version available. This value can be passed as a
   *protocol* value.

   .. versionadded:: 2.3

.. note::

   Be sure to always open pickle files created with protocols >= 1 in binary mode.
   For the old ASCII-based pickle protocol 0 you can use either text mode or binary
   mode as long as you stay consistent.

   A pickle file written with protocol 0 in binary mode will contain lone linefeeds
   as line terminators and therefore will look "funny" when viewed in Notepad or
   other editors which do not support this format.

The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:


.. function:: dump(obj, file[, protocol])

   Write a pickled representation of *obj* to the open file object *file*. This is
   equivalent to ``Pickler(file, protocol).dump(obj)``.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
   version will be used.

   .. versionchanged:: 2.3
      Introduced the *protocol* parameter.

   *file* must have a :meth:`write` method that accepts a single string argument.
   It can thus be a file object opened for writing, a :mod:`StringIO` object, or
   any other custom object that meets this interface.


.. function:: load(file)

   Read a string from the open file object *file* and interpret it as a pickle data
   stream, reconstructing and returning the original object hierarchy. This is
   equivalent to ``Unpickler(file).load()``.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments. Both
   methods should return a string. Thus *file* can be a file object opened for
   reading, a :mod:`StringIO` object, or any other custom object that meets this
   interface.

   This function automatically determines whether the data stream was written in
   binary mode or not.

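A minimal round trip through :func:`dump` and :func:`load` might look like the
sketch below. It assumes an in-memory ``io.BytesIO`` buffer standing in for a
file opened in binary mode (chosen here only because it provides the required
``write``, ``read`` and ``readline`` methods); ``data`` is an arbitrary example
value.

```python
import io
import pickle

data = {"a": [1, 2, 3], "b": ("x", None, True)}

# Any object with a suitable write() method will do as the target.
buf = io.BytesIO()
pickle.dump(data, buf, 2)

# Rewind, then read the object back. load() uses read() and readline().
buf.seek(0)
restored = pickle.load(buf)

print(restored == data)    # True
```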

.. function:: dumps(obj[, protocol])

   Return the pickled representation of the object as a string, instead of writing
   it to a file.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
   version will be used.

   .. versionchanged:: 2.3
      The *protocol* parameter was added.


.. function:: loads(string)

   Read a pickled object hierarchy from a string. Characters in the string past
   the pickled object's representation are ignored.

The :mod:`pickle` module also defines three exceptions:


.. exception:: PickleError

   A common base class for the other exceptions defined below. This inherits from
   :exc:`Exception`.


.. exception:: PicklingError

   This exception is raised when an unpicklable object is passed to the
   :meth:`dump` method.


.. exception:: UnpicklingError

   This exception is raised when there is a problem unpickling an object. Note that
   other exceptions may also be raised during unpickling, including (but not
   necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
   :exc:`ImportError`, and :exc:`IndexError`.

The :mod:`pickle` module also exports two callables [#]_, :class:`Pickler` and
:class:`Unpickler`:


.. class:: Pickler(file[, protocol])

   This takes a file-like object to which it will write a pickle data stream.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
   protocol version will be used.

   .. versionchanged:: 2.3
      Introduced the *protocol* parameter.

   *file* must have a :meth:`write` method that accepts a single string argument.
   It can thus be an open file object, a :mod:`StringIO` object, or any other
   custom object that meets this interface.

   :class:`Pickler` objects define one (or two) public methods:


   .. method:: dump(obj)

      Write a pickled representation of *obj* to the open file object given in the
      constructor. Either the binary or ASCII format will be used, depending on the
      value of the *protocol* argument passed to the constructor.


   .. method:: clear_memo()

      Clears the pickler's "memo". The memo is the data structure that remembers
      which objects the pickler has already seen, so that shared or recursive
      objects are pickled by reference and not by value. This method is useful
      when re-using picklers.

      .. note::

         Prior to Python 2.3, :meth:`clear_memo` was only available on the picklers
         created by :mod:`cPickle`. In the :mod:`pickle` module, picklers have an
         instance variable called :attr:`memo` which is a Python dictionary. So to clear
         the memo for a :mod:`pickle` module pickler, you could do the following::

            mypickler.memo.clear()

         Code that does not need to support older versions of Python should simply use
         :meth:`clear_memo`.

It is possible to make multiple calls to the :meth:`dump` method of the same
:class:`Pickler` instance. These must then be matched to the same number of
calls to the :meth:`load` method of the corresponding :class:`Unpickler`
instance. If the same object is pickled by multiple :meth:`dump` calls, the
:meth:`load` calls will all yield references to the same object. [#]_

:class:`Unpickler` objects are defined as:


.. class:: Unpickler(file)

   This takes a file-like object from which it will read a pickle data stream.
   This class automatically determines whether the data stream was written in
   binary mode or not, so it does not need a flag as in the :class:`Pickler`
   factory.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments. Both
   methods should return a string. Thus *file* can be a file object opened for
   reading, a :mod:`StringIO` object, or any other custom object that meets this
   interface.

   :class:`Unpickler` objects have one (or two) public methods:


   .. method:: load()

      Read a pickled object representation from the open file object given in
      the constructor, and return the reconstituted object hierarchy specified
      therein.

      This method automatically determines whether the data stream was written
      in binary mode or not.


   .. method:: noload()

      This is just like :meth:`load` except that it doesn't actually create any
      objects. This is useful primarily for finding what's called "persistent
      ids" that may be referenced in a pickle data stream. See section
      :ref:`pickle-protocol` below for more details.

      **Note:** the :meth:`noload` method is currently only available on
      :class:`Unpickler` objects created with the :mod:`cPickle` module.
      :mod:`pickle` module :class:`Unpickler`\ s do not have the :meth:`noload`
      method.

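The pairing of :meth:`dump` and :meth:`load` calls on a reused
:class:`Pickler` and :class:`Unpickler`, and the effect of
:meth:`clear_memo`, can be sketched like this. The example assumes an
``io.BytesIO`` buffer as the file-like object and a throwaway list named
``item``; it is an illustration, not a prescribed usage pattern.

```python
import io
import pickle

buf = io.BytesIO()
pickler = pickle.Pickler(buf, 2)

item = ["shared"]
pickler.dump(item)      # first call writes the object in full
pickler.dump(item)      # second call writes only a reference to the memo entry
pickler.clear_memo()    # forget every object seen so far
pickler.dump(item)      # written in full again

# Three dump() calls are matched by three load() calls on one unpickler.
buf.seek(0)
unpickler = pickle.Unpickler(buf)
first = unpickler.load()
second = unpickler.load()
third = unpickler.load()

print(first is second)   # True: both loads refer to the same object
print(third is first)    # False: the memo was cleared before the third dump
```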

What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, long integers, floating point numbers, complex numbers

* normal and Unicode strings

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`__dict__` or the result of calling
  :meth:`__getstate__` is picklable (see section :ref:`pickle-protocol` for
  details).

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file. Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RuntimeError` will be
raised in this case. You can carefully raise this limit with
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value. This means that only the function name is
pickled, along with the name of the module the function is defined in. Neither
the function's code, nor any of its function attributes are pickled. Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [#]_

Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply. Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   class Foo:
       attr = 'a class attr'

   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined in
the top level of a module.

Similarly, when class instances are pickled, their class's code and data are not
pickled along with them. Only the instance data are pickled. This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class. If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.

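To make the "pickled by name reference" rule concrete, the sketch below
pickles the built-in ``len`` and then tries to pickle a lambda, which has no
importable name. The names ``anonymous`` and ``lambda_pickled`` are
illustrative, and catching :exc:`AttributeError` alongside
:exc:`PicklingError` is a defensive assumption, since the exact exception can
vary between implementations.

```python
import pickle

# A built-in function is stored as a module name plus a function name, so
# unpickling looks the very same object up again.
restored_len = pickle.loads(pickle.dumps(len))
print(restored_len is len)    # True

# A lambda has no name that could be looked up in its module.
anonymous = lambda x: x + 1
try:
    pickle.dumps(anonymous)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False

print(lambda_pickled)         # False
```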

.. _pickle-protocol:

The pickle protocol
-------------------

.. currentmodule:: None

This section describes the "pickling protocol" that defines the interface
between the pickler/unpickler and the objects that are being serialized. This
protocol provides a standard way for you to define, customize, and control how
your objects are serialized and de-serialized. The description in this section
doesn't cover specific customizations that you can employ to make the unpickling
environment slightly safer from untrusted pickle data streams; see section
:ref:`pickle-sub` for more details.


.. _pickle-inst:

Pickling and unpickling normal class instances
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. method:: object.__getinitargs__()

   When a pickled class instance is unpickled, its :meth:`__init__` method is
   normally *not* invoked. If it is desirable that the :meth:`__init__` method
   be called on unpickling, an old-style class can define a method
   :meth:`__getinitargs__`, which should return a *tuple* containing the
   arguments to be passed to the class constructor (:meth:`__init__` for
   example). The :meth:`__getinitargs__` method is called at pickle time; the
   tuple it returns is incorporated in the pickle for the instance.

.. method:: object.__getnewargs__()

   New-style types can provide a :meth:`__getnewargs__` method that is used for
   protocol 2. Implementing this method is needed if the type establishes some
   internal invariants when the instance is created, or if the memory allocation
   is affected by the values passed to the :meth:`__new__` method for the type
   (as it is for tuples and strings). Instances of a :term:`new-style class`
   ``C`` are created using ::

      obj = C.__new__(C, *args)

   where *args* is the result of calling :meth:`__getnewargs__` on the original
   object; if there is no :meth:`__getnewargs__`, an empty tuple is assumed.

.. method:: object.__getstate__()

   Classes can further influence how their instances are pickled; if the class
   defines the method :meth:`__getstate__`, it is called and the return state is
   pickled as the contents for the instance, instead of the contents of the
   instance's dictionary. If there is no :meth:`__getstate__` method, the
   instance's :attr:`__dict__` is pickled.

.. method:: object.__setstate__(state)

   Upon unpickling, if the class also defines the method :meth:`__setstate__`,
   it is called with the unpickled state. [#]_ If there is no
   :meth:`__setstate__` method, the pickled state must be a dictionary and its
   items are assigned to the new instance's dictionary. If a class defines both
   :meth:`__getstate__` and :meth:`__setstate__`, the state object needn't be a
   dictionary and these methods can do what they want. [#]_

   .. note::

      For :term:`new-style class`\es, if :meth:`__getstate__` returns a false
      value, the :meth:`__setstate__` method will not be called.

.. note::

   At unpickling time, some methods like :meth:`__getattr__`,
   :meth:`__getattribute__`, or :meth:`__setattr__` may be called upon the
   instance. In case those methods rely on some internal invariant being
   true, the type should implement either :meth:`__getinitargs__` or
   :meth:`__getnewargs__` to establish such an invariant; otherwise, neither
   :meth:`__new__` nor :meth:`__init__` will be called.

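The interplay of :meth:`__getstate__` and :meth:`__setstate__` described in
this section can be sketched with a small class. ``Counter``, its attributes
and the ``version`` key are hypothetical names invented for this example; the
version number merely follows the suggestion made earlier about long-lived
objects.

```python
import pickle


class Counter(object):
    """Hypothetical class whose cache should not be stored in the pickle."""

    def __init__(self, start):
        self.count = start
        self.cache = {"doubled": start * 2}   # derived data, cheap to rebuild

    def __getstate__(self):
        # Copy the instance dictionary, drop what should not be saved and
        # record a version number for later conversions.
        state = self.__dict__.copy()
        del state["cache"]
        state["version"] = 1
        return state

    def __setstate__(self, state):
        # __init__ is not called on unpickling, so the cache is rebuilt here.
        state = dict(state)
        state.pop("version", None)
        self.__dict__.update(state)
        self.cache = {"doubled": self.count * 2}


original = Counter(21)
clone = pickle.loads(pickle.dumps(original))

print(clone.count)    # 21
print(clone.cache)    # {'doubled': 42}
```

Because the class is pickled by name, ``Counter`` has to be defined at the top
level of an importable module (or of the running script) for this to work.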

Pickling and unpickling extension types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. method:: object.__reduce__()

   When the :class:`Pickler` encounters an object of a type it knows nothing
   about --- such as an extension type --- it looks in two places for a hint of
   how to pickle it. One alternative is for the object to implement a
   :meth:`__reduce__` method. If provided, at pickling time :meth:`__reduce__`
   will be called with no arguments, and it must return either a string or a
   tuple.

   If a string is returned, it names a global variable whose contents are
   pickled as normal. The string returned by :meth:`__reduce__` should be the
   object's local name relative to its module; the pickle module searches the
   module namespace to determine the object's module.

   When a tuple is returned, it must be between two and five elements long.
   Optional elements can either be omitted, or ``None`` can be provided as their
   value. The contents of this tuple are pickled as normal and used to
   reconstruct the object at unpickling time. The semantics of each element
   are:

   * A callable object that will be called to create the initial version of the
     object. The next element of the tuple will provide arguments for this
     callable, and later elements provide additional state information that will
     subsequently be used to fully reconstruct the pickled data.

     In the unpickling environment this object must be either a class, a
     callable registered as a "safe constructor" (see below), or it must have an
     attribute :attr:`__safe_for_unpickling__` with a true value. Otherwise, an
     :exc:`UnpicklingError` will be raised in the unpickling environment. Note
     that as usual, the callable itself is pickled by name.

   * A tuple of arguments for the callable object.

     .. versionchanged:: 2.5
        Formerly, this argument could also be ``None``.

   * Optionally, the object's state, which will be passed to the object's
     :meth:`__setstate__` method as described in section :ref:`pickle-inst`. If
     the object has no :meth:`__setstate__` method, then, as above, the value
     must be a dictionary and it will be added to the object's :attr:`__dict__`.

   * Optionally, an iterator (and not a sequence) yielding successive list
     items. These list items will be pickled, and appended to the object using
     either ``obj.append(item)`` or ``obj.extend(list_of_items)``. This is
     primarily used for list subclasses, but may be used by other classes as
     long as they have :meth:`append` and :meth:`extend` methods with the
     appropriate signature. (Whether :meth:`append` or :meth:`extend` is used
     depends on which pickle protocol version is used as well as the number of
     items to append, so both must be supported.)

   * Optionally, an iterator (not a sequence) yielding successive dictionary
     items, which should be tuples of the form ``(key, value)``. These items
     will be pickled and stored to the object using ``obj[key] = value``. This
     is primarily used for dictionary subclasses, but may be used by other
     classes as long as they implement :meth:`__setitem__`.

.. method:: object.__reduce_ex__(protocol)

   It is sometimes useful to know the protocol version when implementing
   :meth:`__reduce__`. This can be done by implementing a method named
   :meth:`__reduce_ex__` instead of :meth:`__reduce__`. :meth:`__reduce_ex__`,
   when it exists, is called in preference over :meth:`__reduce__` (you may
   still provide :meth:`__reduce__` for backwards compatibility). The
   :meth:`__reduce_ex__` method will be called with a single integer argument,
   the protocol version.

   The :class:`object` class implements both :meth:`__reduce__` and
   :meth:`__reduce_ex__`; however, if a subclass overrides :meth:`__reduce__`
   but not :meth:`__reduce_ex__`, the :meth:`__reduce_ex__` implementation
   detects this and calls :meth:`__reduce__`.

An alternative to implementing a :meth:`__reduce__` method on the object to be
pickled is to register the callable with the :mod:`copy_reg` module. This
module provides a way for programs to register "reduction functions" and
constructors for user-defined types. Reduction functions have the same
semantics and interface as the :meth:`__reduce__` method described above, except
that they are called with a single argument, the object to be pickled.

The registered constructor is deemed a "safe constructor" for purposes of
unpickling as described above.

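A two-element :meth:`__reduce__` tuple, the simplest form described in this
section, might be used as in the following sketch. ``Point`` and its
attributes are hypothetical; the class itself serves as the callable, so its
:meth:`__init__` runs again when the object is rebuilt.

```python
import pickle


class Point(object):
    """Hypothetical type that is fully described by its constructor arguments."""

    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.norm_squared = x * x + y * y   # derived in __init__

    def __reduce__(self):
        # (callable, args): the unpickler calls Point(x, y), which recomputes
        # the derived attribute instead of restoring it from the pickle.
        return (Point, (self.x, self.y))


p = pickle.loads(pickle.dumps(Point(3, 4)))

print(p.x)               # 3
print(p.y)               # 4
print(p.norm_squared)    # 25
```

A reduction function registered through :mod:`copy_reg` would return the same
kind of tuple, but would receive the object as its single argument.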
556
557Pickling and unpickling external objects
558^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
559
Andrew M. Kuchling8887e542008-02-23 16:39:43 +0000560.. index::
561 single: persistent_id (pickle protocol)
562 single: persistent_load (pickle protocol)
563
Georg Brandl8ec7f652007-08-15 14:28:01 +0000564For the benefit of object persistence, the :mod:`pickle` module supports the
565notion of a reference to an object outside the pickled data stream. Such
566objects are referenced by a "persistent id", which is just an arbitrary string
567of printable ASCII characters. The resolution of such names is not defined by
568the :mod:`pickle` module; it will delegate this resolution to user defined
569functions on the pickler and unpickler. [#]_
570
571To define external persistent id resolution, you need to set the
572:attr:`persistent_id` attribute of the pickler object and the
573:attr:`persistent_load` attribute of the unpickler object.
574
575To pickle objects that have an external persistent id, the pickler must have a
576custom :func:`persistent_id` method that takes an object as an argument and
577returns either ``None`` or the persistent id for that object. When ``None`` is
578returned, the pickler simply pickles the object as normal. When a persistent id
579string is returned, the pickler will pickle that string, along with a marker so
580that the unpickler will recognize the string as a persistent id.
581
582To unpickle external objects, the unpickler must have a custom
583:func:`persistent_load` function that takes a persistent id string and returns
584the referenced object.
585
586Here's a silly example that *might* shed more light::
587
   import pickle
   from cStringIO import StringIO

   src = StringIO()
   p = pickle.Pickler(src)

   def persistent_id(obj):
       if hasattr(obj, 'x'):
           return 'the value %d' % obj.x
       else:
           return None

   p.persistent_id = persistent_id

   class Integer:
       def __init__(self, x):
           self.x = x
       def __str__(self):
           return 'My name is integer %d' % self.x

   i = Integer(7)
   print i
   p.dump(i)

   datastream = src.getvalue()
   print repr(datastream)
   dst = StringIO(datastream)

   up = pickle.Unpickler(dst)

   class FancyInteger(Integer):
       def __str__(self):
           return 'I am the integer %d' % self.x

   def persistent_load(persid):
       if persid.startswith('the value '):
           value = int(persid.split()[2])
           return FancyInteger(value)
       else:
           raise pickle.UnpicklingError, 'Invalid persistent id'

   up.persistent_load = persistent_load

   j = up.load()
   print j

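The same mechanism can also be expressed by subclassing the :mod:`pickle`
classes and overriding :meth:`persistent_id` and :meth:`persistent_load`.  The
following is only a sketch under stated assumptions: ``Account`` and
``REGISTRY`` are hypothetical names standing in for objects kept in some
external store, and ``io.BytesIO`` (assumed available) serves as the stream.

```python
import io
import pickle


class Account(object):
    """Hypothetical object that lives outside the pickle (e.g. in a database)."""
    def __init__(self, key):
        self.key = key


# Hypothetical stand-in for external storage, keyed by persistent id string.
REGISTRY = {"account:alice": Account("alice"), "account:bob": Account("bob")}


class RegistryPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return an id string for external objects, None for everything else.
        if isinstance(obj, Account):
            return "account:%s" % obj.key
        return None


class RegistryUnpickler(pickle.Unpickler):
    def persistent_load(self, persid):
        # Resolve the id string back to the live object.
        try:
            return REGISTRY[persid]
        except KeyError:
            raise pickle.UnpicklingError("unknown persistent id %r" % (persid,))


buf = io.BytesIO()
RegistryPickler(buf).dump(["payment", REGISTRY["account:alice"], 25])

buf.seek(0)
restored = RegistryUnpickler(buf).load()
```

In this sketch only the id string is written to the stream, so ``restored[1]``
is the very object held in ``REGISTRY`` rather than a copy.
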
In the :mod:`cPickle` module, the unpickler's :attr:`persistent_load` attribute
can also be set to a Python list, in which case, when the unpickler reaches a
persistent id, the persistent id string will simply be appended to this list.
This functionality exists so that a pickle data stream can be "sniffed" for
object references without actually instantiating all the objects in a pickle.
[#]_  Setting :attr:`persistent_load` to a list is usually used in conjunction
with the :meth:`noload` method on the Unpickler.

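The list form is specific to :mod:`cPickle`.  A rough approximation with the
subclassable :mod:`pickle` classes is a :meth:`persistent_load` method that
records each id and returns a placeholder.  This is a sketch, not an
equivalent of :meth:`noload`: ordinary objects in the stream are still built.
``Ref``, ``RefPickler`` and ``SniffingUnpickler`` are hypothetical names.

```python
import io
import pickle


class Ref(object):
    """Hypothetical stand-in for an externally stored object."""
    def __init__(self, name):
        self.name = name


class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Only Ref instances get a persistent id.
        return "ref:%s" % obj.name if isinstance(obj, Ref) else None


class SniffingUnpickler(pickle.Unpickler):
    """Record persistent ids instead of resolving them."""
    def __init__(self, file):
        pickle.Unpickler.__init__(self, file)
        self.seen = []

    def persistent_load(self, persid):
        self.seen.append(persid)
        return None   # placeholder; the referenced object is never created


buf = io.BytesIO()
RefPickler(buf).dump([Ref("a"), 1, Ref("b")])

buf.seek(0)
sniffer = SniffingUnpickler(buf)
sniffer.load()        # the result is discarded; only sniffer.seen matters
```
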
.. BAW: Both pickle and cPickle support something called inst_persistent_id()
   which appears to give unknown types a second shot at producing a persistent
   id.  Since Jim Fulton can't remember why it was added or what it's for, I'm
   leaving it undocumented.


.. _pickle-sub:

Subclassing Unpicklers
----------------------

.. index::
   single: load_global() (pickle protocol)
   single: find_global() (pickle protocol)

By default, unpickling will import any class that it finds in the pickle data.
You can control exactly what gets unpickled and what gets called by customizing
your unpickler.  Unfortunately, exactly how you do this is different depending
on whether you're using :mod:`pickle` or :mod:`cPickle`. [#]_

In the :mod:`pickle` module, you need to derive a subclass from
:class:`Unpickler`, overriding the :meth:`load_global` method.
:meth:`load_global` should read two lines from the pickle data stream, where
the first line will be the name of the module containing the class and the
second line will be the name of the instance's class.  It then looks up the
class, possibly importing the module and digging out the attribute, and appends
what it finds to the unpickler's stack.  Later on, this class will be assigned
to the :attr:`__class__` attribute of an empty class, as a way of magically
creating an instance without calling its class's :meth:`__init__`.  Your job
(should you choose to accept it) would be to have :meth:`load_global` push onto
the unpickler's stack a known safe version of any class you deem safe to
unpickle.  It is up to you to produce such a class.  Or you could raise an
error if you want to disallow all unpickling of instances.  If this sounds like
a hack, you're right.  Refer to the source code to make this work.

Things are a little cleaner with :mod:`cPickle`, but not by much.  To control
what gets unpickled, you can set the unpickler's :attr:`find_global` attribute
to a function or ``None``.  If it is ``None`` then any attempts to unpickle
instances will raise an :exc:`UnpicklingError`.  If it is a function, then it
should accept a module name and a class name, and return the corresponding class
object.  It is responsible for looking up the class and performing any necessary
imports, and it may raise an error to prevent instances of the class from being
unpickled.

The moral of the story is that you should be really careful about the source of
the strings your application unpickles.

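One way to sketch the :mod:`pickle` side is to override the lookup helper
rather than :meth:`load_global` itself.  This assumes that the pure-Python
:class:`Unpickler` routes class lookups through a :meth:`find_class` method,
which is an internal detail you should confirm in the source for your version.
``ALLOWED``, ``RestrictedUnpickler`` and ``restricted_loads`` are hypothetical
names.

```python
import collections
import io
import pickle

# Hypothetical allow-list of (module, name) pairs that may be unpickled.
ALLOWED = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Refuse anything that is not explicitly allowed.
        if (module, name) not in ALLOWED:
            raise pickle.UnpicklingError(
                "global %s.%s is forbidden" % (module, name))
        return pickle.Unpickler.find_class(self, module, name)


def restricted_loads(data):
    """Unpickle a byte string, permitting only the allowed classes."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# An allowed class round-trips normally.
ok = restricted_loads(pickle.dumps(collections.OrderedDict(a=1)))

# A class outside the allow-list is rejected.
try:
    restricted_loads(pickle.dumps(collections.Counter("abc")))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

An allow-list keeps the decision explicit: anything not named is refused, which
is usually easier to audit than a list of known-bad classes.
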

.. _pickle-example:

Example
-------

For the simplest code, use the :func:`dump` and :func:`load` functions.  Note
that a self-referencing list is pickled and restored correctly. ::

   import pickle

   data1 = {'a': [1, 2.0, 3, 4+6j],
            'b': ('string', u'Unicode string'),
            'c': None}

   selfref_list = [1, 2, 3]
   selfref_list.append(selfref_list)

   output = open('data.pkl', 'wb')

   # Pickle dictionary using protocol 0.
   pickle.dump(data1, output)

   # Pickle the list using the highest protocol available.
   pickle.dump(selfref_list, output, -1)

   output.close()

The following example reads the resulting pickled data.  When reading a
pickle-containing file, you should open the file in binary mode because you
can't be sure if the ASCII or binary format was used. ::

   import pprint, pickle

   pkl_file = open('data.pkl', 'rb')

   data1 = pickle.load(pkl_file)
   pprint.pprint(data1)

   data2 = pickle.load(pkl_file)
   pprint.pprint(data2)

   pkl_file.close()

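To check the self-referencing claim without touching the file system, the same
round trip can be done in memory.  This is a small sketch that uses the
:func:`dumps` and :func:`loads` functions in place of :func:`dump` and
:func:`load`.

```python
import pickle

selfref_list = [1, 2, 3]
selfref_list.append(selfref_list)

# Serialize to a byte string with the highest protocol, then restore it.
payload = pickle.dumps(selfref_list, -1)
restored = pickle.loads(payload)

# The restored list is a new object whose last element is the list itself.
is_self_referencing = restored[3] is restored
```
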
Here's a larger example that shows how to modify pickling behavior for a class.
The :class:`TextReader` class opens a text file, and returns the line number and
line contents each time its :meth:`readline` method is called.  If a
:class:`TextReader` instance is pickled, all attributes *except* the file object
member are saved.  When the instance is unpickled, the file is reopened, and
reading resumes from the last location.  The :meth:`__setstate__` and
:meth:`__getstate__` methods are used to implement this behavior. ::

   #!/usr/local/bin/python

   class TextReader:
       """Print and number lines in a text file."""
       def __init__(self, file):
           self.file = file
           self.fh = open(file)
           self.lineno = 0

       def readline(self):
           self.lineno = self.lineno + 1
           line = self.fh.readline()
           if not line:
               return None
           if line.endswith("\n"):
               line = line[:-1]
           return "%d: %s" % (self.lineno, line)

       def __getstate__(self):
           odict = self.__dict__.copy() # copy the dict since we change it
           del odict['fh']              # remove filehandle entry
           return odict

       def __setstate__(self, dict):
           fh = open(dict['file'])      # reopen file
           count = dict['lineno']       # read from file...
           while count:                 # until line count is restored
               fh.readline()
               count = count - 1
           self.__dict__.update(dict)   # update attributes
           self.fh = fh                 # save the file object

A sample usage might be something like this::

   >>> import TextReader
   >>> obj = TextReader.TextReader("TextReader.py")
   >>> obj.readline()
   '1: #!/usr/local/bin/python'
   >>> obj.readline()
   '2: '
   >>> obj.readline()
   '3: class TextReader:'
   >>> import pickle
   >>> pickle.dump(obj, open('save.p', 'wb'))

If you want to see that :mod:`pickle` works across Python processes, start
another Python session before continuing.  What follows can happen from either
the same process or a new process. ::

   >>> import pickle
   >>> reader = pickle.load(open('save.p', 'rb'))
   >>> reader.readline()
   '4:     """Print and number lines in a text file."""'


.. seealso::

   Module :mod:`copy_reg`
      Pickle interface constructor registration for extension types.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.


:mod:`cPickle` --- A faster :mod:`pickle`
=========================================

.. module:: cPickle
   :synopsis: Faster version of pickle, but not subclassable.
.. moduleauthor:: Jim Fulton <jim@zope.com>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


.. index:: module: pickle

The :mod:`cPickle` module supports serialization and de-serialization of Python
objects, providing an interface and functionality nearly identical to the
:mod:`pickle` module.  There are several differences, the most important being
performance and subclassability.

First, :mod:`cPickle` can be up to 1000 times faster than :mod:`pickle` because
the former is implemented in C.  Second, in the :mod:`cPickle` module the
callables :func:`Pickler` and :func:`Unpickler` are functions, not classes.
This means that you cannot use them to derive custom pickling and unpickling
subclasses.  Most applications have no need for this functionality and should
benefit from the greatly improved performance of the :mod:`cPickle` module.

The pickle data streams produced by :mod:`pickle` and :mod:`cPickle` are
compatible, so it is possible to use :mod:`pickle` and :mod:`cPickle`
interchangeably with existing pickles. [#]_

There are additional minor differences in API between :mod:`cPickle` and
:mod:`pickle`; however, for most applications they are interchangeable.  More
documentation is provided in the :mod:`pickle` module documentation, which
includes a list of the documented differences.

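Because the two modules share an interface for the common case, a widely used
pattern is to import the C implementation when it is present and fall back to
:mod:`pickle` otherwise.  The following is a minimal sketch of that pattern;
the sample data is arbitrary.

```python
try:
    import cPickle as pickle   # C implementation, when available
except ImportError:
    import pickle              # pure-Python fallback

data = {'a': [1, 2.0, 3, 4+6j], 'c': None}

# Either module can write the stream and either can read it back.
restored = pickle.loads(pickle.dumps(data, pickle.HIGHEST_PROTOCOL))
```

Code that needs to subclass :class:`Pickler` or :class:`Unpickler` should import
:mod:`pickle` directly instead, since the :mod:`cPickle` callables cannot be
subclassed.
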
.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module.

.. [#] In the :mod:`pickle` module these callables are classes, which you could
   subclass to customize the behavior.  However, in the :mod:`cPickle` module these
   callables are factory functions and so cannot be subclassed.  One common reason
   to subclass is to control what objects can actually be unpickled.  See section
   :ref:`pickle-sub` for more details.

.. [#] *Warning*: this is intended for pickling multiple objects without intervening
   modifications to the objects or their parts.  If you modify an object and then
   pickle it again using the same :class:`Pickler` instance, the object is not
   pickled again; instead, a reference to it is pickled and the
   :class:`Unpickler` will return the old value, not the modified one.  There are
   two problems here: (1) detecting changes, and (2) marshalling a minimal set of
   changes.  Garbage collection may also become a problem here.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError` but it could be something else.

.. [#] These methods can also be used to implement copying class instances.

.. [#] This protocol is also used by the shallow and deep copying operations defined in
   the :mod:`copy` module.

.. [#] The actual mechanism for associating these user-defined functions is slightly
   different for :mod:`pickle` and :mod:`cPickle`.  The description given here
   works the same for both implementations.  Users of the :mod:`pickle` module
   could also use subclassing to effect the same results, overriding the
   :meth:`persistent_id` and :meth:`persistent_load` methods in the derived
   classes.

.. [#] We'll leave you with the image of Guido and Jim sitting around sniffing pickles
   in their living rooms.

.. [#] A word of caution: the mechanisms described here use internal attributes and
   methods, which are subject to change in future versions of Python.  We intend to
   someday provide a common interface for controlling this behavior, which will
   work in either :mod:`pickle` or :mod:`cPickle`.

.. [#] Since the pickle data format is actually a tiny stack-oriented programming
   language, and some freedom is taken in the encodings of certain objects, it is
   possible that the two modules produce different data streams for the same input
   objects.  However it is guaranteed that they will always be able to read each
   other's data streams.
