:mod:`pickle` --- Python object serialization
=============================================

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.
.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>
.. sectionauthor:: Barry Warsaw <barry@zope.com>

The :mod:`pickle` module implements a fundamental, but powerful algorithm for
serializing and de-serializing a Python object structure. "Pickling" is the
process whereby a Python object hierarchy is converted into a byte stream, and
"unpickling" is the inverse operation, whereby a byte stream is converted back
into an object hierarchy. Pickling (and unpickling) is alternatively known as
"serialization", "marshalling" [#]_ or "flattening"; however, to avoid
confusion, the terms used here are "pickling" and "unpickling".

This documentation describes both the :mod:`pickle` module and the
:mod:`cPickle` module.


Relationship to other Python modules
------------------------------------

The :mod:`pickle` module has an optimized cousin called the :mod:`cPickle`
module. As its name implies, :mod:`cPickle` is written in C, so it can be up to
1000 times faster than :mod:`pickle`. However, it does not support subclassing
of the :class:`Pickler` and :class:`Unpickler` classes, because in
:mod:`cPickle` these are functions, not classes. Most applications have no need
for this functionality, and can benefit from the improved performance of
:mod:`cPickle`. Other than that, the interfaces of the two modules are nearly
identical; the common interface is described in this manual and differences are
pointed out where necessary. In the following discussions, we use the term
"pickle" to collectively describe the :mod:`pickle` and :mod:`cPickle` modules.

The data streams the two modules produce are guaranteed to be interchangeable.

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should be the preferred way to serialize Python objects.
:mod:`marshal` exists primarily to support Python's :file:`.pyc` files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing. Recursive
  objects are objects that contain references to themselves. These are not
  handled by marshal, and in fact, attempting to marshal recursive objects will
  crash your Python interpreter. Object sharing happens when there are multiple
  references to the same object in different places in the object hierarchy
  being serialized. :mod:`pickle` stores such objects only once, and ensures that
  all other references point to the master copy. Shared objects remain shared,
  which can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances. :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module
  as when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions. Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards
  compatible across Python releases.
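
Both effects of the memo can be observed with a short round trip. The sketch
below assumes a Python 3 interpreter; ``shared``, ``container`` and ``loop`` are
illustrative names only::

   import pickle

   # One mutable list referenced from two places in the hierarchy.
   shared = [1, 2, 3]
   container = {"first": shared, "second": shared}

   # A recursive object: a list that contains itself.
   loop = []
   loop.append(loop)

   restored = pickle.loads(pickle.dumps(container))
   restored_loop = pickle.loads(pickle.dumps(loop))

   # Sharing survives the round trip: both keys still name a single list.
   restored["first"].append(4)
   print(restored["second"])                 # [1, 2, 3, 4]

   # The self-reference is rebuilt instead of being expanded forever.
   print(restored_loop[0] is restored_loop)  # True

The original ``shared`` list is untouched; only the unpickled copy, which is
itself shared between ``"first"`` and ``"second"``, is modified.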

.. warning::

   The :mod:`pickle` module is not intended to be secure against erroneous or
   maliciously constructed data. Never unpickle data received from an untrusted
   or unauthenticated source.

Note that serialization is a more primitive notion than persistence; although
:mod:`pickle` reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent
access to persistent objects. The :mod:`pickle` module can transform a complex
object into a byte stream and it can transform the byte stream into an object
with the same internal structure. Perhaps the most obvious thing to do with
these byte streams is to write them onto a file, but it is also conceivable to
send them across a network or store them in a database. The module
:mod:`shelve` provides a simple interface to pickle and unpickle objects on
DBM-style database files.


Data stream format
------------------

.. index::
   single: XDR
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific. This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however, it means that non-Python
programs may not be able to reconstruct pickled Python objects.

The original pickle data format (protocol 0) uses a printable ASCII
representation. This is slightly more voluminous than a binary representation.
The big advantage of using printable ASCII (and of some other characteristics of
:mod:`pickle`'s representation) is that for debugging or recovery purposes it is
possible for a human to read the pickled file with a standard text editor.
112
Georg Brandl42f2ae02008-04-06 08:39:37 +0000113There are currently 4 different protocols which can be used for pickling.
Georg Brandl116aa622007-08-15 14:28:22 +0000114
115* Protocol version 0 is the original ASCII protocol and is backwards compatible
116 with earlier versions of Python.
117
118* Protocol version 1 is the old binary format which is also compatible with
119 earlier versions of Python.
120
121* Protocol version 2 was introduced in Python 2.3. It provides much more
Georg Brandl9afde1c2007-11-01 20:32:30 +0000122 efficient pickling of :term:`new-style class`\es.
Georg Brandl116aa622007-08-15 14:28:22 +0000123
Georg Brandl42f2ae02008-04-06 08:39:37 +0000124* Protocol version 3 was added in Python 3.0. It has explicit support for
125 bytes and cannot be unpickled by Python 2.x pickle modules.
126
Georg Brandl116aa622007-08-15 14:28:22 +0000127Refer to :pep:`307` for more information.
128
Georg Brandl42f2ae02008-04-06 08:39:37 +0000129If a *protocol* is not specified, protocol 3 is used. If *protocol* is
130specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
131protocol version available will be used.
Georg Brandl116aa622007-08-15 14:28:22 +0000132
Georg Brandl116aa622007-08-15 14:28:22 +0000133A binary format, which is slightly more efficient, can be chosen by specifying a
134*protocol* version >= 1.
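
The difference between the ASCII and binary protocols can be checked directly.
This sketch assumes a current Python 3 interpreter, whose
:const:`HIGHEST_PROTOCOL` may be larger than the protocol 3 described above;
the ``obj`` value is arbitrary::

   import pickle

   obj = {"name": "example", "values": [1, 2, 3]}

   # Protocol 0 produces printable ASCII.
   text_pickle = pickle.dumps(obj, protocol=0)
   print(text_pickle.isascii())   # True

   # Protocols 2 and above start with the PROTO opcode (0x80) and the version.
   binary_pickle = pickle.dumps(obj, protocol=2)
   print(binary_pickle[:2])       # b'\x80\x02'

   # Every protocol this interpreter supports round-trips the object.
   for proto in range(pickle.HIGHEST_PROTOCOL + 1):
       assert pickle.loads(pickle.dumps(obj, protocol=proto)) == obj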


Usage
-----

To serialize an object hierarchy, you first create a pickler, then you call the
pickler's :meth:`dump` method. To de-serialize a data stream, you first create
an unpickler, then you call the unpickler's :meth:`load` method. The
:mod:`pickle` module provides the following constant:


.. data:: HIGHEST_PROTOCOL

   The highest protocol version available. This value can be passed as a
   *protocol* value.

.. note::

   Always open pickle files in binary mode: :mod:`pickle` reads and writes
   :class:`bytes` whatever protocol is used, so a file opened in text mode
   cannot be used.

   A pickle file written with protocol 0 will contain lone linefeeds as line
   terminators and therefore will look "funny" when viewed in Notepad or other
   editors which do not support this format.

The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:


.. function:: dump(obj, file[, protocol])

   Write a pickled representation of *obj* to the open file object *file*. This
   is equivalent to ``Pickler(file, protocol).dump(obj)``.

   If the *protocol* parameter is omitted, protocol 3 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
   protocol version will be used.

   *file* must have a :meth:`write` method that accepts a single :class:`bytes`
   argument. It can thus be a file object opened for binary writing, an
   :class:`io.BytesIO` object, or any other custom object that meets this
   interface.


.. function:: load(file)

   Read bytes from the open file object *file* and interpret them as a pickle
   data stream, reconstructing and returning the original object hierarchy.
   This is equivalent to ``Unpickler(file).load()``.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments. Both
   methods should return :class:`bytes`. Thus *file* can be a file object opened
   for binary reading, an :class:`io.BytesIO` object, or any other custom object
   that meets this interface.

   This function automatically determines whether the data stream was written
   in binary mode or not.
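
A minimal sketch of :func:`dump` and :func:`load` with a real file, assuming
Python 3; the temporary directory and the file name ``data.pickle`` are
placeholders chosen for the example::

   import os
   import pickle
   import tempfile

   data = {"id": 42, "tags": ["a", "b"]}

   with tempfile.TemporaryDirectory() as tmp:
       path = os.path.join(tmp, "data.pickle")  # hypothetical file name
       # Pickle data is bytes, so the file is opened in binary mode.
       with open(path, "wb") as f:
           pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)
       with open(path, "rb") as f:
           loaded = pickle.load(f)

   print(loaded == data)  # True

The loaded value is an equal but distinct object; nothing links it back to the
original ``data`` dictionary.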


.. function:: dumps(obj[, protocol])

   Return the pickled representation of the object as a :class:`bytes`
   object, instead of writing it to a file.

   If the *protocol* parameter is omitted, protocol 3 is used. If *protocol*
   is specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
   protocol version will be used.


.. function:: loads(bytes_object)

   Read a pickled object hierarchy from a :class:`bytes` object.
   Bytes past the pickled object's representation are ignored.
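
The in-memory variants can be sketched as follows (Python 3 assumed). The
second call shows that :func:`loads` stops at the end of the first pickle and
ignores anything after it::

   import pickle

   payload = pickle.dumps([1, 2, 3])
   print(type(payload).__name__)                    # bytes

   # The extra bytes after the pickle are never read.
   print(pickle.loads(payload + b"trailing data"))  # [1, 2, 3]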
Georg Brandl116aa622007-08-15 14:28:22 +0000209
210The :mod:`pickle` module also defines three exceptions:
211
212
213.. exception:: PickleError
214
215 A common base class for the other exceptions defined below. This inherits from
216 :exc:`Exception`.
217
218
219.. exception:: PicklingError
220
221 This exception is raised when an unpicklable object is passed to the
222 :meth:`dump` method.
223
224
225.. exception:: UnpicklingError
226
227 This exception is raised when there is a problem unpickling an object. Note that
228 other exceptions may also be raised during unpickling, including (but not
229 necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
230 :exc:`ImportError`, and :exc:`IndexError`.
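
A short sketch of both failure directions, assuming a current Python 3
interpreter. Because truncated input may surface as more than one exception
type, the example catches both :exc:`UnpicklingError` and :exc:`EOFError`::

   import pickle

   # A lambda has no importable name, so it cannot be pickled by reference.
   try:
       pickle.dumps(lambda x: x)
       pickling_failed = False
   except pickle.PicklingError:
       pickling_failed = True

   # Data cut short fails on load; the exact exception may vary.
   truncated = pickle.dumps([1, 2, 3])[:-1]
   try:
       pickle.loads(truncated)
       unpickling_failed = False
   except (pickle.UnpicklingError, EOFError):
       unpickling_failed = True

   print(pickling_failed, unpickling_failed)                    # True True
   print(issubclass(pickle.PicklingError, pickle.PickleError))  # True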

The :mod:`pickle` module also exports two callables [#]_, :class:`Pickler` and
:class:`Unpickler`:


.. class:: Pickler(file[, protocol])

   This takes a file-like object to which it will write a pickle data stream.

   If the *protocol* parameter is omitted, protocol 3 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
   protocol version will be used.

   *file* must have a :meth:`write` method that accepts a single :class:`bytes`
   argument. It can thus be a file object opened for binary writing, an
   :class:`io.BytesIO` object, or any other custom object that meets this
   interface.

   :class:`Pickler` objects define one (or two) public methods:


   .. method:: dump(obj)

      Write a pickled representation of *obj* to the open file object given in
      the constructor. Either the binary or ASCII format will be used, depending
      on the value of the *protocol* argument passed to the constructor.


   .. method:: clear_memo()

      Clears the pickler's "memo". The memo is the data structure that
      remembers which objects the pickler has already seen, so that shared or
      recursive objects are pickled by reference and not by value. This method
      is useful when re-using picklers.

      .. note::

         Prior to Python 2.3, :meth:`clear_memo` was only available on the
         picklers created by :mod:`cPickle`. In the :mod:`pickle` module,
         picklers have an instance variable called :attr:`memo` which is a
         Python dictionary. So to clear the memo for a :mod:`pickle` module
         pickler, you could do the following::

            mypickler.memo.clear()

         Code that does not need to support older versions of Python should
         simply use :meth:`clear_memo`.

It is possible to make multiple calls to the :meth:`dump` method of the same
:class:`Pickler` instance. These must then be matched to the same number of
calls to the :meth:`load` method of the corresponding :class:`Unpickler`
instance. If the same object is pickled by multiple :meth:`dump` calls, the
corresponding :meth:`load` calls will all yield references to the same object.
[#]_
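
The following sketch illustrates that matching with an in-memory stream
(:class:`io.BytesIO`, Python 3 assumed); ``shared`` is an illustrative name::

   import io
   import pickle

   shared = {"config": True}
   buf = io.BytesIO()

   pickler = pickle.Pickler(buf)
   pickler.dump(shared)            # first record
   pickler.dump([shared, shared])  # second record refers back to the memo

   buf.seek(0)
   unpickler = pickle.Unpickler(buf)
   first = unpickler.load()        # one load() per dump()
   second = unpickler.load()

   print(second[0] is first)       # True
   print(second[1] is first)       # True

The identity is preserved because one pickler and one unpickler are used
throughout, so both memos carry over from the first record to the second.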

:class:`Unpickler` objects are defined as:


.. class:: Unpickler(file)

   This takes a file-like object from which it will read a pickle data stream.
   This class automatically determines whether the data stream was written in
   binary mode or not, so unlike :class:`Pickler` it does not need a *protocol*
   argument.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments. Both
   methods should return :class:`bytes`. Thus *file* can be a file object opened
   for binary reading, an :class:`io.BytesIO` object, or any other custom object
   that meets this interface.

   :class:`Unpickler` objects have one (or two) public methods:


   .. method:: load()

      Read a pickled object representation from the open file object given in
      the constructor, and return the reconstituted object hierarchy specified
      therein.

      This method automatically determines whether the data stream was written
      in binary mode or not.


   .. method:: noload()

      This is just like :meth:`load` except that it doesn't actually create any
      objects. This is useful primarily for finding what's called "persistent
      ids" that may be referenced in a pickle data stream. See section
      :ref:`pickle-protocol` below for more details.

      **Note:** the :meth:`noload` method is currently only available on
      :class:`Unpickler` objects created with the :mod:`cPickle` module.
      :mod:`pickle` module :class:`Unpickler`\ s do not have the :meth:`noload`
      method.


What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, floating point numbers, complex numbers

* strings, bytes, bytearrays

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`__dict__` or the result of calling
  :meth:`__getstate__` is picklable (see section :ref:`pickle-protocol` for
  details)
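
A quick way to check the built-in types in this list is a round trip through
:func:`dumps` and :func:`loads`. The sample values below are arbitrary, and a
Python 3 interpreter is assumed::

   import pickle

   samples = [
       None, True, False,
       42, 3.5, 2 + 3j,
       "text", b"raw bytes", bytearray(b"mutable"),
       (1, 2), [3, 4], {5, 6}, {"key": "value"},
   ]

   for value in samples:
       clone = pickle.loads(pickle.dumps(value))
       # Each value comes back equal and with exactly the same type.
       assert clone == value and type(clone) is type(value)

   print("all", len(samples), "samples round-tripped")  # all 13 samples round-tripped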

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file. Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RuntimeError` will be
raised in this case. You can carefully raise this limit with
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value. This means that only the function name is
pickled, along with the name of the module the function is defined in. Neither
the function's code, nor any of its function attributes are pickled. Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [#]_

Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply. Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   import pickle

   class Foo:
       attr = 'a class attr'

   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined in
the top level of a module.
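
Pickling by reference can be seen with objects from the standard library. In
this Python 3 sketch, :func:`len` and :class:`collections.Counter` stand in for
any top-level function or class::

   import collections
   import pickle

   # Only "module + name" is stored for a built-in function...
   data = pickle.dumps(len)
   print(b"builtins" in data and b"len" in data)  # True
   print(pickle.loads(data) is len)               # True

   # ...and for a class defined at the top level of a module.
   restored_cls = pickle.loads(pickle.dumps(collections.Counter))
   print(restored_cls is collections.Counter)     # True

Unpickling returns the very same objects rather than copies, because the loader
simply imports the named module and looks the name up again.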

Similarly, when class instances are pickled, their class's code and data are not
pickled along with them. Only the instance data are pickled. This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class. If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.
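
One way to apply that advice is sketched below, assuming Python 3. The
``Account`` class, its ``_VERSION`` attribute and the old ``amount`` key are all
hypothetical. :meth:`__getstate__` stamps each pickle with the current layout
version, and :meth:`__setstate__` upgrades older layouts::

   import pickle

   class Account:
       # Hypothetical class; _VERSION describes the layout of the pickled state.
       _VERSION = 2

       def __init__(self, owner, balance=0):
           self.owner = owner
           self.balance = balance

       def __getstate__(self):
           state = self.__dict__.copy()
           state["_version"] = self._VERSION
           return state

       def __setstate__(self, state):
           version = state.pop("_version", 1)
           if version < 2:
               # Hypothetical version 1 stored the balance as "amount".
               state["balance"] = state.pop("amount", 0)
           self.__dict__.update(state)

   # A round trip of a current object.
   acct = pickle.loads(pickle.dumps(Account("bob", 10)))
   print(acct.owner, acct.balance)      # bob 10

   # What unpickling a version-1 pickle would pass to __setstate__.
   legacy = Account.__new__(Account)
   legacy.__setstate__({"owner": "ann", "amount": 5})
   print(legacy.owner, legacy.balance)  # ann 5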


.. _pickle-protocol:

The pickle protocol
-------------------

This section describes the "pickling protocol" that defines the interface
between the pickler/unpickler and the objects that are being serialized. This
protocol provides a standard way for you to define, customize, and control how
your objects are serialized and de-serialized. The description in this section
doesn't cover specific customizations that you can employ to make the unpickling
environment slightly safer from untrusted pickle data streams; see section
:ref:`pickle-sub` for more details.


.. _pickle-inst:

Pickling and unpickling normal class instances
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: __getinitargs__() (copy protocol)
   single: __init__() (instance constructor)

.. XXX is __getinitargs__ only used with old-style classes?
.. XXX update w.r.t Py3k's classes

When a pickled class instance is unpickled, its :meth:`__init__` method is
normally *not* invoked. If it is desirable that the :meth:`__init__` method be
called on unpickling, an old-style class can define a method
:meth:`__getinitargs__`, which should return a *tuple* containing the arguments
to be passed to the class constructor (:meth:`__init__` for example). The
:meth:`__getinitargs__` method is called at pickle time; the tuple it returns is
incorporated in the pickle for the instance.

.. index:: single: __getnewargs__() (copy protocol)

New-style types can provide a :meth:`__getnewargs__` method that is used for
protocol 2 and higher. Implementing this method is needed if the type
establishes some internal invariants when the instance is created, or if the
memory allocation is affected by the values passed to the :meth:`__new__` method
for the type (as it is for tuples and strings). Instances of a
:term:`new-style class` :class:`C` are created using ::

   obj = C.__new__(C, *args)


where *args* is the result of calling :meth:`__getnewargs__` on the original
object; if there is no :meth:`__getnewargs__`, an empty tuple is assumed.
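
Tuple subclasses are a case where this matters. In Python 3 (assumed here),
:class:`tuple` itself defines :meth:`__getnewargs__` and passes the whole tuple
as a single argument. A subclass whose :meth:`__new__` takes separate arguments
therefore has to override it. ``Pair`` is a hypothetical class::

   import pickle

   class Pair(tuple):
       """Hypothetical tuple subclass whose __new__ takes two arguments."""

       def __new__(cls, first, second):
           return super().__new__(cls, (first, second))

       def __getnewargs__(self):
           # Unpickling will call Pair.__new__(Pair, first, second).
           return (self[0], self[1])

   p = pickle.loads(pickle.dumps(Pair(1, 2), protocol=2))
   print(type(p).__name__, p)  # Pair (1, 2)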

.. index::
   single: __getstate__() (copy protocol)
   single: __setstate__() (copy protocol)
   single: __dict__ (instance attribute)

Classes can further influence how their instances are pickled; if the class
defines the method :meth:`__getstate__`, it is called and the returned state is
pickled as the contents for the instance, instead of the contents of the
instance's dictionary. If there is no :meth:`__getstate__` method, the
instance's :attr:`__dict__` is pickled.

Upon unpickling, if the class also defines the method :meth:`__setstate__`, it
is called with the unpickled state. [#]_ If there is no :meth:`__setstate__`
method, the pickled state must be a dictionary and its items are assigned to the
new instance's dictionary. If a class defines both :meth:`__getstate__` and
:meth:`__setstate__`, the state object needn't be a dictionary and these methods
can do what they want. [#]_

.. warning::

   If :meth:`__getstate__` returns a false value, the :meth:`__setstate__`
   method will not be called.
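
A common use of the pair is to leave out an attribute that cannot be pickled
and rebuild it on load. The sketch below assumes Python 3. ``SafeCounter`` is a
hypothetical class, and its :class:`threading.Lock` stands in for any
unpicklable resource::

   import pickle
   import threading

   class SafeCounter:
       """Hypothetical class holding a lock, which cannot be pickled."""

       def __init__(self):
           self.count = 0
           self._lock = threading.Lock()

       def increment(self):
           with self._lock:
               self.count += 1

       def __getstate__(self):
           # Copy the instance dict and drop the unpicklable lock.
           state = self.__dict__.copy()
           del state["_lock"]
           return state

       def __setstate__(self, state):
           # Restore the saved attributes and create a fresh lock.
           self.__dict__.update(state)
           self._lock = threading.Lock()

   c = SafeCounter()
   c.increment()
   restored = pickle.loads(pickle.dumps(c))
   restored.increment()
   print(restored.count)  # 2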


Pickling and unpickling extension types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: __reduce__() (pickle protocol)
   single: __reduce_ex__() (pickle protocol)
   single: __safe_for_unpickling__ (pickle protocol)

When the :class:`Pickler` encounters an object of a type it knows nothing about
(such as an extension type), it looks in two places for a hint of how to
pickle it. One alternative is for the object to implement a :meth:`__reduce__`
method. If provided, at pickling time :meth:`__reduce__` will be called with no
arguments, and it must return either a string or a tuple.

If a string is returned, it names a global variable whose contents are pickled
as normal. The string returned by :meth:`__reduce__` should be the object's
local name relative to its module; the pickle module searches the module
namespace to determine the object's module.

When a tuple is returned, it must be between two and five elements long.
Optional elements can either be omitted, or ``None`` can be provided as their
value. The contents of this tuple are pickled as normal and used to
reconstruct the object at unpickling time. The semantics of each element are:

* A callable object that will be called to create the initial version of the
  object. The next element of the tuple will provide arguments for this
  callable, and later elements provide additional state information that will
  subsequently be used to fully reconstruct the pickled data.

  In the unpickling environment this object must be either a class, a callable
  registered as a "safe constructor" (see below), or it must have an attribute
  :attr:`__safe_for_unpickling__` with a true value. Otherwise, an
  :exc:`UnpicklingError` will be raised in the unpickling environment. Note that
  as usual, the callable itself is pickled by name.

* A tuple of arguments for the callable object, not ``None``.

* Optionally, the object's state, which will be passed to the object's
  :meth:`__setstate__` method as described in section :ref:`pickle-inst`. If the
  object has no :meth:`__setstate__` method, then, as above, the value must be a
  dictionary and it will be added to the object's :attr:`__dict__`.

* Optionally, an iterator (and not a sequence) yielding successive list items.
  These list items will be pickled, and appended to the object using either
  ``obj.append(item)`` or ``obj.extend(list_of_items)``. This is primarily used
  for list subclasses, but may be used by other classes as long as they have
  :meth:`append` and :meth:`extend` methods with the appropriate signature.
  (Whether :meth:`append` or :meth:`extend` is used depends on which pickle
  protocol version is used as well as the number of items to append, so both
  must be supported.)

* Optionally, an iterator (not a sequence) yielding successive dictionary items,
  which should be tuples of the form ``(key, value)``. These items will be
  pickled and stored to the object using ``obj[key] = value``. This is primarily
  used for dictionary subclasses, but may be used by other classes as long as
  they implement :meth:`__setitem__`.
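
The two-element form (a callable plus its arguments) is the most common. In
the sketch below, which assumes Python 3, the hypothetical ``Temperature`` class
is rebuilt by calling its own constructor, so derived data is not carried
along::

   import pickle

   class Temperature:
       """Hypothetical class rebuilt from its constructor arguments."""

       def __init__(self, kelvin):
           self.kelvin = kelvin
           self.cache = {}  # derived data that need not be pickled

       def __reduce__(self):
           # (callable, args): unpickling calls Temperature(self.kelvin).
           return (Temperature, (self.kelvin,))

   t = Temperature(300.0)
   t.cache["celsius"] = 26.85
   clone = pickle.loads(pickle.dumps(t))
   print(clone.kelvin, clone.cache)  # 300.0 {}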

It is sometimes useful to know the protocol version when implementing
:meth:`__reduce__`. This can be done by implementing a method named
:meth:`__reduce_ex__` instead of :meth:`__reduce__`. :meth:`__reduce_ex__`, when
it exists, is called in preference over :meth:`__reduce__` (you may still
provide :meth:`__reduce__` for backwards compatibility). The
:meth:`__reduce_ex__` method will be called with a single integer argument, the
protocol version.

The :class:`object` class implements both :meth:`__reduce__` and
:meth:`__reduce_ex__`; however, if a subclass overrides :meth:`__reduce__` but
not :meth:`__reduce_ex__`, the :meth:`__reduce_ex__` implementation detects this
and calls :meth:`__reduce__`.

An alternative to implementing a :meth:`__reduce__` method on the object to be
pickled is to register the callable with the :mod:`copyreg` module. This
module provides a way for programs to register "reduction functions" and
constructors for user-defined types. Reduction functions have the same
semantics and interface as the :meth:`__reduce__` method described above, except
that they are called with a single argument, the object to be pickled.

The registered constructor is deemed a "safe constructor" for purposes of
unpickling as described above.
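
A minimal sketch of registration with :func:`copyreg.pickle`, assuming
Python 3. ``Point``, ``reduce_point`` and the ``calls`` list are hypothetical
and exist only to show that the registered function, rather than a method on
the class, produced the reduction::

   import copyreg
   import pickle

   class Point:
       """Hypothetical class; its reduction function lives outside the class."""

       def __init__(self, x, y):
           self.x = x
           self.y = y

   calls = []

   def reduce_point(point):
       # Same contract as __reduce__, but receives the object as an argument.
       calls.append(point)
       return (Point, (point.x, point.y))

   copyreg.pickle(Point, reduce_point)

   p = pickle.loads(pickle.dumps(Point(1, 2)))
   print(p.x, p.y, len(calls))  # 1 2 1

Keeping the reduction function outside the class is useful when the class
cannot be edited, for example because it comes from another library.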


Pickling and unpickling external objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: persistent_id (pickle protocol)
   single: persistent_load (pickle protocol)

For the benefit of object persistence, the :mod:`pickle` module supports the
notion of a reference to an object outside the pickled data stream. Such
objects are referenced by a "persistent id", which is just an arbitrary string
of printable ASCII characters. The resolution of such names is not defined by
the :mod:`pickle` module; it will delegate this resolution to user-defined
functions on the pickler and unpickler. [#]_

To define external persistent id resolution, you need to set the
:attr:`persistent_id` attribute of the pickler object and the
:attr:`persistent_load` attribute of the unpickler object.

To pickle objects that have an external persistent id, the pickler must have a
custom :func:`persistent_id` method that takes an object as an argument and
returns either ``None`` or the persistent id for that object. When ``None`` is
returned, the pickler simply pickles the object as normal. When a persistent id
string is returned, the pickler will pickle that string, along with a marker so
that the unpickler will recognize the string as a persistent id.

To unpickle external objects, the unpickler must have a custom
:func:`persistent_load` function that takes a persistent id string and returns
the referenced object.

Here's a silly example that *might* shed more light::

   import pickle
   from io import BytesIO  # pickle data is bytes, so use a binary stream

   src = BytesIO()
   p = pickle.Pickler(src)

   def persistent_id(obj):
       # Objects with an 'x' attribute are stored as a persistent id string.
       if hasattr(obj, 'x'):
           return 'the value %d' % obj.x
       else:
           return None

   p.persistent_id = persistent_id

   class Integer:
       def __init__(self, x):
           self.x = x
       def __str__(self):
           return 'My name is integer %d' % self.x

   i = Integer(7)
   print(i)
   p.dump(i)

   datastream = src.getvalue()
   print(repr(datastream))
   dst = BytesIO(datastream)

   up = pickle.Unpickler(dst)

   class FancyInteger(Integer):
       def __str__(self):
           return 'I am the integer %d' % self.x

   def persistent_load(persid):
       # Turn the persistent id string back into an object.
       if persid.startswith('the value '):
           value = int(persid.split()[2])
           return FancyInteger(value)
       else:
           raise pickle.UnpicklingError('Invalid persistent id')

   up.persistent_load = persistent_load

   j = up.load()
   print(j)

614In the :mod:`cPickle` module, the unpickler's :attr:`persistent_load` attribute
615can also be set to a Python list, in which case, when the unpickler reaches a
616persistent id, the persistent id string will simply be appended to this list.
617This functionality exists so that a pickle data stream can be "sniffed" for
618object references without actually instantiating all the objects in a pickle.
619[#]_ Setting :attr:`persistent_load` to a list is usually used in conjunction
620with the :meth:`noload` method on the Unpickler.
621
.. BAW: Both pickle and cPickle support something called inst_persistent_id()
   which appears to give unknown types a second shot at producing a persistent
   id. Since Jim Fulton can't remember why it was added or what it's for, I'm
   leaving it undocumented.


.. _pickle-sub:

Subclassing Unpicklers
----------------------

.. index::
   single: load_global() (pickle protocol)
   single: find_global() (pickle protocol)

By default, unpickling will import any class that it finds in the pickle data.
You can control exactly what gets unpickled and what gets called by customizing
your unpickler. Unfortunately, exactly how you do this is different depending
on whether you're using :mod:`pickle` or :mod:`cPickle`. [#]_

In the :mod:`pickle` module, you need to derive a subclass from
:class:`Unpickler`, overriding the :meth:`load_global` method.
:meth:`load_global` should read two lines from the pickle data stream, where the
first line will be the name of the module containing the class and the second
line will be the name of the instance's class. It then looks up the class,
possibly importing the module and digging out the attribute, and appends what
it finds to the unpickler's stack. Later on, this class will be assigned to the
:attr:`__class__` attribute of an empty class, as a way of magically creating an
instance without calling its class's :meth:`__init__`. Your job (should you
choose to accept it) would be to have :meth:`load_global` push onto the
unpickler's stack a known safe version of any class you deem safe to unpickle.
It is up to you to produce such a class. Alternatively, you could raise an
error to disallow all unpickling of instances. If this sounds like a hack,
you're right. Refer to the source code to make this work.

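In Python 3 the cleaner hook is :meth:`Unpickler.find_class`: both the C and the pure-Python unpicklers call ``find_class(module, name)`` whenever the pickle data refers to a global such as a class or function, so a subclass that overrides it can allow or refuse each one. A minimal sketch, assuming Python 3; the allow-list ``SAFE_BUILTINS`` and the helper ``restricted_loads()`` are hypothetical names chosen for illustration. ::

   import builtins
   import io
   import pickle

   # Hypothetical allow-list: only these builtins may be loaded by name.
   SAFE_BUILTINS = {'range', 'complex', 'set', 'frozenset', 'slice'}

   class RestrictedUnpickler(pickle.Unpickler):
       def find_class(self, module, name):
           # Called for every global (class or function) named in the pickle.
           if module == 'builtins' and name in SAFE_BUILTINS:
               return getattr(builtins, name)
           raise pickle.UnpicklingError(
               'global %s.%s is forbidden' % (module, name))

   def restricted_loads(data):
       """Like pickle.loads(), but only resolves allow-listed globals."""
       return RestrictedUnpickler(io.BytesIO(data)).load()

   print(restricted_loads(pickle.dumps([1, 2, range(3)])))

Any global outside the allow-list, including functions such as :func:`print` or :func:`os.system`, is rejected with :exc:`UnpicklingError` before anything can call it.
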
Things are a little cleaner with :mod:`cPickle`, but not by much. To control
what gets unpickled, you can set the unpickler's :attr:`find_global` attribute
to a function or ``None``. If it is ``None`` then any attempts to unpickle
instances will raise an :exc:`UnpicklingError`. If it is a function, then it
should accept a module name and a class name, and return the corresponding class
object. It is responsible for looking up the class and performing any necessary
imports, and it may raise an error to prevent instances of the class from being
unpickled.

The moral of the story is that you should be very careful about the source of
the data your application unpickles: never unpickle data received from an
untrusted or unauthenticated source.


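To see why, remember that a pickle is not just data: it can name any importable callable along with its arguments, and the unpickler will call it. The sketch below, assuming Python 3, uses a hypothetical :class:`Surprise` class whose :meth:`__reduce__` asks for the harmless built-in :func:`sorted` to be called at load time; a malicious pickle could name a function such as :func:`os.system` in exactly the same way. ::

   import pickle

   class Surprise:
       # Hypothetical class: tells pickle "to rebuild me, call
       # sorted([3, 1, 2])" instead of recreating a Surprise.
       def __reduce__(self):
           return (sorted, ([3, 1, 2],))

   data = pickle.dumps(Surprise())
   result = pickle.loads(data)    # the unpickler calls sorted() here
   print(result)                  # [1, 2, 3]

Note that :class:`Surprise` itself never appears in the pickle; the data names only ``builtins.sorted`` and its argument list.
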
.. _pickle-example:

Example
-------

For the simplest code, use the :func:`dump` and :func:`load` functions. Note
that a self-referencing list is pickled and restored correctly. ::

   import pickle

   data1 = {'a': [1, 2.0, 3, 4+6j],
            'b': ("string", "string using Unicode features \u0394"),
            'c': None}

   selfref_list = [1, 2, 3]
   selfref_list.append(selfref_list)

   output = open('data.pkl', 'wb')

   # Pickle the dictionary using protocol 2.
   pickle.dump(data1, output, 2)

   # Pickle the list using the highest protocol available.
   pickle.dump(selfref_list, output, -1)

   output.close()

The following example reads the resulting pickled data. When reading a
pickle-containing file, open it in binary mode: pickle data is a stream of
bytes, whether it was written with the ASCII protocol 0 or with one of the
binary protocols. ::

   import pprint, pickle

   pkl_file = open('data.pkl', 'rb')

   data1 = pickle.load(pkl_file)
   pprint.pprint(data1)

   data2 = pickle.load(pkl_file)
   pprint.pprint(data2)

   pkl_file.close()

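The same round trip can also be done in memory with :func:`dumps` and :func:`loads`, which makes it easy to check that the self-referencing list comes back as a cycle rather than as a copy. A minimal sketch, assuming Python 3, where ``pickle.HIGHEST_PROTOCOL`` names the same protocol that ``-1`` selects in the writing example. ::

   import pickle

   selfref_list = [1, 2, 3]
   selfref_list.append(selfref_list)

   data = pickle.dumps(selfref_list, pickle.HIGHEST_PROTOCOL)
   restored = pickle.loads(data)

   print(restored[:3])               # [1, 2, 3]
   print(restored[3] is restored)    # True: the cycle is rebuilt
   print(restored is selfref_list)   # False: it is a new list
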
Here's a larger example that shows how to modify pickling behavior for a class.
The :class:`TextReader` class opens a text file and returns the line number and
line contents each time its :meth:`readline` method is called. If a
:class:`TextReader` instance is pickled, all attributes *except* the file object
member are saved. When the instance is unpickled, the file is reopened and
reading resumes from the last location. The :meth:`__getstate__` and
:meth:`__setstate__` methods are used to implement this behavior. ::

   #!/usr/local/bin/python

   class TextReader:
       """Print and number lines in a text file."""
       def __init__(self, file):
           self.file = file
           self.fh = open(file)
           self.lineno = 0

       def readline(self):
           self.lineno = self.lineno + 1
           line = self.fh.readline()
           if not line:
               return None
           if line.endswith("\n"):
               line = line[:-1]
           return "%d: %s" % (self.lineno, line)

       def __getstate__(self):
           odict = self.__dict__.copy() # copy the dict since we change it
           del odict['fh']              # remove filehandle entry
           return odict

       def __setstate__(self, state):
           fh = open(state['file'])     # reopen file
           count = state['lineno']      # read from file...
           while count:                 # until line count is restored
               fh.readline()
               count = count - 1
           self.__dict__.update(state)  # update attributes
           self.fh = fh                 # save the file object

A sample usage might be something like this::

   >>> import TextReader
   >>> obj = TextReader.TextReader("TextReader.py")
   >>> obj.readline()
   '1: #!/usr/local/bin/python'
   >>> obj.readline()
   '2: '
   >>> obj.readline()
   '3: class TextReader:'
   >>> import pickle
   >>> pickle.dump(obj, open('save.p', 'wb'))

If you want to see that :mod:`pickle` works across Python processes, start
another Python session before continuing. What follows works from either the
same process or a new process. ::

   >>> import pickle
   >>> reader = pickle.load(open('save.p', 'rb'))
   >>> reader.readline()
   '4:     """Print and number lines in a text file."""'

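The same :meth:`__getstate__` and :meth:`__setstate__` pattern applies to any member that cannot be pickled, not only open files. A minimal sketch, assuming Python 3; the :class:`LockedCounter` class is hypothetical, and its :class:`threading.Lock` is left out of the pickled state and recreated on unpickling. As with :class:`TextReader`, the class must be defined at module level so that :func:`loads` can find it by name. ::

   import pickle
   import threading

   class LockedCounter:
       """Hypothetical thread-safe counter; its lock cannot be pickled."""
       def __init__(self):
           self.count = 0
           self.lock = threading.Lock()

       def increment(self):
           with self.lock:
               self.count += 1
               return self.count

       def __getstate__(self):
           state = self.__dict__.copy()   # copy so the live object keeps its lock
           del state['lock']              # drop the unpicklable member
           return state

       def __setstate__(self, state):
           self.__dict__.update(state)    # restore count
           self.lock = threading.Lock()   # give the copy a fresh lock

   c = LockedCounter()
   c.increment()
   c.increment()
   c2 = pickle.loads(pickle.dumps(c))
   print(c2.count)          # 2
   print(c2.increment())    # 3

Copying the dictionary in :meth:`__getstate__` matters: deleting ``'lock'`` from ``self.__dict__`` itself would leave the original object without its lock.
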

.. seealso::

   Module :mod:`copyreg`
      Pickle interface constructor registration for extension types.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.


:mod:`cPickle` --- A faster :mod:`pickle`
=========================================

.. module:: cPickle
   :synopsis: Faster version of pickle, but not subclassable.
.. moduleauthor:: Jim Fulton <jim@zope.com>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


.. index:: module: pickle

The :mod:`cPickle` module supports serialization and de-serialization of Python
objects, providing an interface and functionality nearly identical to the
:mod:`pickle` module. There are several differences, the most important being
performance and subclassability.

First, :mod:`cPickle` can be up to 1000 times faster than :mod:`pickle` because
the former is implemented in C. Second, in the :mod:`cPickle` module the
callables :func:`Pickler` and :func:`Unpickler` are functions, not classes.
This means that you cannot use them to derive custom pickling and unpickling
subclasses. Most applications have no need for this functionality and should
benefit from the greatly improved performance of the :mod:`cPickle` module.

The pickle data streams produced by :mod:`pickle` and :mod:`cPickle` are
compatible, so it is possible to use :mod:`pickle` and :mod:`cPickle`
interchangeably with existing pickles. [#]_

There are additional minor differences in API between :mod:`cPickle` and
:mod:`pickle`; however, for most applications they are interchangeable. More
documentation is provided in the :mod:`pickle` module documentation, which
includes a list of the documented differences.

.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module.

.. [#] In the :mod:`pickle` module these callables are classes, which you could
   subclass to customize the behavior. However, in the :mod:`cPickle` module these
   callables are factory functions and so cannot be subclassed. One common reason
   to subclass is to control what objects can actually be unpickled. See section
   :ref:`pickle-sub` for more details.

.. [#] *Warning*: this is intended for pickling multiple objects without intervening
   modifications to the objects or their parts. If you modify an object and then
   pickle it again using the same :class:`Pickler` instance, the object is not
   pickled again --- a reference to it is pickled and the :class:`Unpickler` will
   return the old value, not the modified one. There are two problems here: (1)
   detecting changes, and (2) marshalling a minimal set of changes. Garbage
   collection may also become a problem here.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError`, but it could be something else.

.. [#] These methods can also be used to implement copying class instances.

.. [#] This protocol is also used by the shallow and deep copying operations defined in
   the :mod:`copy` module.

.. [#] The actual mechanism for associating these user-defined functions is slightly
   different for :mod:`pickle` and :mod:`cPickle`. The description given here
   works the same for both implementations. Users of the :mod:`pickle` module
   could also use subclassing to effect the same results, overriding the
   :meth:`persistent_id` and :meth:`persistent_load` methods in the derived
   classes.

.. [#] We'll leave you with the image of Guido and Jim sitting around sniffing pickles
   in their living rooms.

.. [#] A word of caution: the mechanisms described here use internal attributes and
   methods, which are subject to change in future versions of Python. We intend to
   someday provide a common interface for controlling this behavior, which will
   work in either :mod:`pickle` or :mod:`cPickle`.

.. [#] Since the pickle data format is actually a tiny stack-oriented programming
   language, and some freedom is taken in the encodings of certain objects, it is
   possible that the two modules produce different data streams for the same input
   objects. However, it is guaranteed that they will always be able to read each
   other's data streams.
