:mod:`pickle` --- Python object serialization
=============================================

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.
.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>.
.. sectionauthor:: Barry Warsaw <barry@zope.com>

The :mod:`pickle` module implements a fundamental but powerful algorithm for
serializing and de-serializing a Python object structure. "Pickling" is the
process whereby a Python object hierarchy is converted into a byte stream, and
"unpickling" is the inverse operation, whereby a byte stream is converted back
into an object hierarchy. Pickling (and unpickling) is alternatively known as
"serialization", "marshalling," [#]_ or "flattening"; however, to avoid
confusion, the terms used here are "pickling" and "unpickling".

This documentation describes both the :mod:`pickle` module and the
:mod:`cPickle` module.

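As a minimal sketch of the round trip described above, the following snippet
pickles a small object hierarchy to a byte stream and restores it. It assumes a
Python 3 interpreter; the variable names and sample data are illustrative only.

```python
import pickle

# A nested structure of plain built-in types (illustrative data only).
original = {"name": "spam", "values": [1, 2.5, 3 + 4j], "flag": None}

# Pickling: object hierarchy -> byte stream.
stream = pickle.dumps(original)

# Unpickling: byte stream -> a new, equal object hierarchy.
restored = pickle.loads(stream)
```

The restored object compares equal to the original but is a distinct object.
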

Relationship to other Python modules
------------------------------------

The :mod:`pickle` module has an optimized cousin called the :mod:`cPickle`
module. As its name implies, :mod:`cPickle` is written in C, so it can be up to
1000 times faster than :mod:`pickle`. However, it does not support subclassing
of the :func:`Pickler` and :func:`Unpickler` classes, because in :mod:`cPickle`
these are functions, not classes. Most applications have no need for this
functionality, and can benefit from the improved performance of :mod:`cPickle`.
Other than that, the interfaces of the two modules are nearly identical; the
common interface is described in this manual and differences are pointed out
where necessary. In the following discussions, we use the term "pickle" to
collectively describe the :mod:`pickle` and :mod:`cPickle` modules.

The data streams the two modules produce are guaranteed to be interchangeable.

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should always be the preferred way to serialize Python
objects. :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant
ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing. Recursive
  objects are objects that contain references to themselves. These are not
  handled by marshal, and in fact, attempting to marshal recursive objects will
  crash your Python interpreter. Object sharing happens when there are multiple
  references to the same object in different places in the object hierarchy being
  serialized. :mod:`pickle` stores such objects only once, and ensures that all
  other references point to the master copy. Shared objects remain shared, which
  can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances. :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module as
  when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions. Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases.

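The object-tracking behaviour in the first point can be observed directly. The
sketch below assumes Python 3 and uses made-up sample data; it checks that a
shared list stays shared and that a self-referencing list still refers to
itself after a round trip through :mod:`pickle`.

```python
import pickle

shared = [1, 2, 3]
container = {"first": shared, "second": shared}  # two references, one list

recursive = ["self"]
recursive.append(recursive)  # the list now contains itself

container_copy = pickle.loads(pickle.dumps(container))
recursive_copy = pickle.loads(pickle.dumps(recursive))

# The shared list is stored once, so both keys still refer to one object.
sharing_preserved = container_copy["first"] is container_copy["second"]

# The restored list refers to itself, just like the original.
cycle_preserved = recursive_copy[1] is recursive_copy
```

Because sharing is preserved, mutating ``container_copy["first"]`` is also
visible through ``container_copy["second"]``.
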

.. warning::

   The :mod:`pickle` module is not intended to be secure against erroneous or
   maliciously constructed data. Never unpickle data received from an untrusted or
   unauthenticated source.

Note that serialization is a more primitive notion than persistence; although
:mod:`pickle` reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent
access to persistent objects. The :mod:`pickle` module can transform a complex
object into a byte stream and it can transform the byte stream into an object
with the same internal structure. Perhaps the most obvious thing to do with
these byte streams is to write them onto a file, but it is also conceivable to
send them across a network or store them in a database. The module
:mod:`shelve` provides a simple interface to pickle and unpickle objects on
DBM-style database files.


Data stream format
------------------

.. index::
   single: XDR
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific. This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however, it means that non-Python
programs may not be able to reconstruct pickled Python objects.

By default, the :mod:`pickle` data format uses a printable ASCII representation.
This is slightly more voluminous than a binary representation. The big
advantage of using printable ASCII (and of some other characteristics of
:mod:`pickle`'s representation) is that for debugging or recovery purposes it is
possible for a human to read the pickled file with a standard text editor.

There are currently 3 different protocols which can be used for pickling.

* Protocol version 0 is the original ASCII protocol and is backwards compatible
  with earlier versions of Python.

* Protocol version 1 is the old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3. It provides much more
  efficient pickling of :term:`new-style class`\es.

Refer to :pep:`307` for more information.

If a *protocol* is not specified, protocol 0 is used. If *protocol* is specified
as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol version
available will be used.

A binary format, which is slightly more efficient, can be chosen by specifying a
*protocol* version >= 1.

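The protocol choices can be compared side by side. This is only a sketch; it
assumes Python 3, which accepts protocols 0 and 2 as described here but also
defines higher protocol numbers and a different default, so the protocol is
passed explicitly throughout.

```python
import pickle

data = {"numbers": list(range(5)), "label": "ascii only"}

# Protocol 0: the original ASCII format.
as_text = pickle.dumps(data, 0)

# Protocol 2: the binary format introduced in Python 2.3.
as_binary = pickle.dumps(data, 2)

# A negative value selects the same protocol as HIGHEST_PROTOCOL.
newest = pickle.dumps(data, -1)
also_newest = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)

# Unpickling detects the protocol, so no version is passed back in.
results = [pickle.loads(blob)
           for blob in (as_text, as_binary, newest, also_newest)]
```

Printing ``as_text`` shows a stream made of printable characters, while
``as_binary`` contains non-printable bytes.
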

Usage
-----

To serialize an object hierarchy, you first create a pickler, then you call the
pickler's :meth:`dump` method. To de-serialize a data stream, you first create
an unpickler, then you call the unpickler's :meth:`load` method. The
:mod:`pickle` module provides the following constant:


.. data:: HIGHEST_PROTOCOL

   The highest protocol version available. This value can be passed as a
   *protocol* value.

.. note::

   Be sure to always open pickle files created with protocols >= 1 in binary mode.
   For the old ASCII-based pickle protocol 0 you can use either text mode or binary
   mode as long as you stay consistent.

   A pickle file written with protocol 0 in binary mode will contain lone linefeeds
   as line terminators and therefore will look "funny" when viewed in Notepad or
   other editors which do not support this format.

The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:


.. function:: dump(obj, file[, protocol])

   Write a pickled representation of *obj* to the open file object *file*. This is
   equivalent to ``Pickler(file, protocol).dump(obj)``.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
   version will be used.

   *file* must have a :meth:`write` method that accepts a single string argument.
   It can thus be a file object opened for writing, a :mod:`StringIO` object, or
   any other custom object that meets this interface.


.. function:: load(file)

   Read a string from the open file object *file* and interpret it as a pickle data
   stream, reconstructing and returning the original object hierarchy. This is
   equivalent to ``Unpickler(file).load()``.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments. Both
   methods should return a string. Thus *file* can be a file object opened for
   reading, a :mod:`StringIO` object, or any other custom object that meets this
   interface.

   This function automatically determines whether the data stream was written in
   binary mode or not.


.. function:: dumps(obj[, protocol])

   Return the pickled representation of the object as a string, instead of writing
   it to a file.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
   version will be used.


.. function:: loads(string)

   Read a pickled object hierarchy from a string. Characters in the string past
   the pickled object's representation are ignored.

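The four convenience functions can be exercised together as follows. The sketch
assumes Python 3, where pickled data is a ``bytes`` object and ``io.BytesIO``
plays the role that :mod:`StringIO` plays in the descriptions above; the sample
record is made up.

```python
import io
import pickle

record = ("spam", 42, [1.5, None])

# dump() only needs an object with a write() method; io.BytesIO is used here
# as an in-memory stand-in for a file opened in binary mode.
buffer = io.BytesIO()
pickle.dump(record, buffer, 2)

# load() needs read() and readline(); rewind before reading the stream back.
buffer.seek(0)
from_file = pickle.load(buffer)

# dumps()/loads() do the same without a file object.
blob = pickle.dumps(record, 2)
from_bytes = pickle.loads(blob)

# Data after the end of the pickle is ignored by loads().
with_trailer = pickle.loads(blob + b"trailing data")
```

All three results compare equal to ``record``.
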

The :mod:`pickle` module also defines three exceptions:


.. exception:: PickleError

   A common base class for the other exceptions defined below. This inherits from
   :exc:`Exception`.


.. exception:: PicklingError

   This exception is raised when an unpicklable object is passed to the
   :meth:`dump` method.


.. exception:: UnpicklingError

   This exception is raised when there is a problem unpickling an object. Note that
   other exceptions may also be raised during unpickling, including (but not
   necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
   :exc:`ImportError`, and :exc:`IndexError`.

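A defensive caller might handle these exceptions roughly as shown below. This
is a sketch assuming Python 3; because the exact exception type can vary between
versions and implementations, each ``except`` clause deliberately lists several
candidates instead of relying on a single one.

```python
import pickle

errors = []

# A lambda has no importable name, so it cannot be pickled by reference.
try:
    pickle.dumps(lambda x: x)
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    errors.append(("dump", type(exc).__name__))

# A truncated stream fails on load; the exact exception type can vary.
blob = pickle.dumps([1, 2, 3])
try:
    pickle.loads(blob[:-2])
except (pickle.UnpicklingError, AttributeError, EOFError,
        ImportError, IndexError) as exc:
    errors.append(("load", type(exc).__name__))
```

After running this, ``errors`` holds one entry for the failed dump and one for
the failed load, each with the name of the exception that was raised.
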

The :mod:`pickle` module also exports two callables [#]_, :class:`Pickler` and
:class:`Unpickler`:


.. class:: Pickler(file[, protocol])

   This takes a file-like object to which it will write a pickle data stream.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
   protocol version will be used.

   *file* must have a :meth:`write` method that accepts a single string argument.
   It can thus be an open file object, a :mod:`StringIO` object, or any other
   custom object that meets this interface.

:class:`Pickler` objects define one (or two) public methods:


.. method:: Pickler.dump(obj)

   Write a pickled representation of *obj* to the open file object given in the
   constructor. Either the binary or ASCII format will be used, depending on the
   value of the *protocol* argument passed to the constructor.


.. method:: Pickler.clear_memo()

   Clears the pickler's "memo". The memo is the data structure that remembers
   which objects the pickler has already seen, so that shared or recursive objects
   are pickled by reference and not by value. This method is useful when re-using
   picklers.

   .. note::

      Prior to Python 2.3, :meth:`clear_memo` was only available on the picklers
      created by :mod:`cPickle`. In the :mod:`pickle` module, picklers have an
      instance variable called :attr:`memo` which is a Python dictionary. So to clear
      the memo for a :mod:`pickle` module pickler, you could do the following::

         mypickler.memo.clear()

      Code that does not need to support older versions of Python should simply use
      :meth:`clear_memo`.

It is possible to make multiple calls to the :meth:`dump` method of the same
:class:`Pickler` instance. These must then be matched to the same number of
calls to the :meth:`load` method of the corresponding :class:`Unpickler`
instance. If the same object is pickled by multiple :meth:`dump` calls, the
:meth:`load` calls will all yield references to the same object. [#]_

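The pairing of :meth:`dump` and :meth:`load` calls can be sketched as follows,
assuming Python 3 and an in-memory ``io.BytesIO`` stream in place of a real
file. The sample dictionary is illustrative.

```python
import io
import pickle

shared = {"answer": 42}

stream = io.BytesIO()
pickler = pickle.Pickler(stream, 2)
pickler.dump(shared)            # first dump writes the full dictionary
pickler.dump([shared, shared])  # later dumps refer back to it via the memo

stream.seek(0)
unpickler = pickle.Unpickler(stream)
first = unpickler.load()        # one load() per dump(), in the same order
second = unpickler.load()
```

Here ``second[0]``, ``second[1]`` and ``first`` are all the same object, because
the pickler's memo was not cleared between the two dumps.
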

:class:`Unpickler` objects are defined as:


.. class:: Unpickler(file)

   This takes a file-like object from which it will read a pickle data stream.
   This class automatically determines whether the data stream was written in
   binary mode or not, so it does not need a flag as in the :class:`Pickler`
   factory.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments. Both
   methods should return a string. Thus *file* can be a file object opened for
   reading, a :mod:`StringIO` object, or any other custom object that meets this
   interface.

:class:`Unpickler` objects have one (or two) public methods:


.. method:: Unpickler.load()

   Read a pickled object representation from the open file object given in the
   constructor, and return the reconstituted object hierarchy specified therein.

   This method automatically determines whether the data stream was written in
   binary mode or not.


.. method:: Unpickler.noload()

   This is just like :meth:`load` except that it doesn't actually create any
   objects. This is useful primarily for finding what's called "persistent ids"
   that may be referenced in a pickle data stream. See section
   :ref:`pickle-protocol` below for more details.

   **Note:** the :meth:`noload` method is currently only available on
   :class:`Unpickler` objects created with the :mod:`cPickle` module.
   :mod:`pickle` module :class:`Unpickler`\ s do not have the :meth:`noload`
   method.


What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, floating point numbers, complex numbers

* strings, bytes, bytearrays

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`__dict__` or :meth:`__setstate__` is
  picklable (see section :ref:`pickle-protocol` for details)

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file. Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RuntimeError` will be
raised in this case. You can carefully raise this limit with
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value. This means that only the function name is
pickled, along with the name of the module the function is defined in. Neither
the function's code, nor any of its function attributes are pickled. Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [#]_

Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply. Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   class Foo:
       attr = 'a class attr'

   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined in
the top level of a module.

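Pickling by reference can be checked with objects from the standard library.
The sketch assumes Python 3 and picks the built-in function ``len`` and the
class ``collections.OrderedDict`` purely as convenient examples.

```python
import collections
import pickle

# Pickling a function stores only its module and name, not its code.
function_blob = pickle.dumps(len)
name_in_stream = b"len" in function_blob

# Unpickling looks the name up again and returns the very same object.
restored_function = pickle.loads(function_blob)

# Classes behave the same way: the stream names the class, nothing more.
class_blob = pickle.dumps(collections.OrderedDict)
restored_class = pickle.loads(class_blob)
```

Since only names are stored, ``restored_function is len`` and
``restored_class is collections.OrderedDict`` both hold.
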

Similarly, when class instances are pickled, their class's code and data are not
pickled along with them. Only the instance data are pickled. This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class. If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.


.. _pickle-protocol:

The pickle protocol
-------------------

This section describes the "pickling protocol" that defines the interface
between the pickler/unpickler and the objects that are being serialized. This
protocol provides a standard way for you to define, customize, and control how
your objects are serialized and de-serialized. The description in this section
doesn't cover specific customizations that you can employ to make the unpickling
environment slightly safer from untrusted pickle data streams; see section
:ref:`pickle-sub` for more details.


.. _pickle-inst:

Pickling and unpickling normal class instances
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: __getinitargs__() (copy protocol)
   single: __init__() (instance constructor)

.. XXX is __getinitargs__ only used with old-style classes?

When a pickled class instance is unpickled, its :meth:`__init__` method is
normally *not* invoked. If it is desirable that the :meth:`__init__` method be
called on unpickling, an old-style class can define a method
:meth:`__getinitargs__`, which should return a *tuple* containing the arguments
to be passed to the class constructor (:meth:`__init__` for example). The
:meth:`__getinitargs__` method is called at pickle time; the tuple it returns is
incorporated in the pickle for the instance.

.. index:: single: __getnewargs__() (copy protocol)

New-style types can provide a :meth:`__getnewargs__` method that is used for
protocol 2. Implementing this method is needed if the type establishes some
internal invariants when the instance is created, or if the memory allocation is
affected by the values passed to the :meth:`__new__` method for the type (as it
is for tuples and strings). Instances of a :term:`new-style class` :class:`C`
are created using ::

   obj = C.__new__(C, *args)


where *args* is the result of calling :meth:`__getnewargs__` on the original
object; if there is no :meth:`__getnewargs__`, an empty tuple is assumed.

.. index::
   single: __getstate__() (copy protocol)
   single: __setstate__() (copy protocol)
   single: __dict__ (instance attribute)

Classes can further influence how their instances are pickled; if the class
defines the method :meth:`__getstate__`, it is called and the returned state is
pickled as the contents for the instance, instead of the contents of the
instance's dictionary. If there is no :meth:`__getstate__` method, the
instance's :attr:`__dict__` is pickled.

Upon unpickling, if the class also defines the method :meth:`__setstate__`, it
is called with the unpickled state. [#]_ If there is no :meth:`__setstate__`
method, the pickled state must be a dictionary and its items are assigned to the
new instance's dictionary. If a class defines both :meth:`__getstate__` and
:meth:`__setstate__`, the state object needn't be a dictionary and these methods
can do what they want. [#]_

.. warning::

   For :term:`new-style class`\es, if :meth:`__getstate__` returns a false
   value, the :meth:`__setstate__` method will not be called.

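The :meth:`__getstate__` and :meth:`__setstate__` hooks can be combined with a
version number in the state, as suggested earlier for long-lived objects. The
following is a sketch only: the ``Account`` class, its attributes and the
"version 1" layout are hypothetical, and the code assumes Python 3 with the
class defined at the top level of the running script.

```python
import pickle

class Account:
    """Hypothetical class used only to illustrate state versioning."""

    STATE_VERSION = 2

    def __init__(self, owner, balance=0):
        self.owner = owner
        self.balance = balance
        self._cache = {}          # transient data that should not be pickled

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["_cache"]                     # drop the transient part
        state["version"] = self.STATE_VERSION   # tag the state for later
        return state

    def __setstate__(self, state):
        version = state.pop("version", 1)
        if version < 2:
            # Assumed older layout for the sake of the example: no balance.
            state.setdefault("balance", 0)
        self.__dict__.update(state)
        self._cache = {}                        # rebuild transient data

account = Account("alice", 10)
clone = pickle.loads(pickle.dumps(account))
```

The clone carries the pickled attributes and a fresh ``_cache``; the version
tag is consumed by :meth:`__setstate__` and does not end up on the instance.
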

Pickling and unpickling extension types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: __reduce__() (pickle protocol)
   single: __reduce_ex__() (pickle protocol)
   single: __safe_for_unpickling__ (pickle protocol)

When the :class:`Pickler` encounters an object of a type it knows nothing about
(such as an extension type), it looks in two places for a hint of how to
pickle it. One alternative is for the object to implement a :meth:`__reduce__`
method. If provided, at pickling time :meth:`__reduce__` will be called with no
arguments, and it must return either a string or a tuple.

If a string is returned, it names a global variable whose contents are pickled
as normal. The string returned by :meth:`__reduce__` should be the object's
local name relative to its module; the pickle module searches the module
namespace to determine the object's module.

When a tuple is returned, it must be between two and five elements long.
Optional elements can either be omitted, or ``None`` can be provided as their
value. The contents of this tuple are pickled as normal and used to
reconstruct the object at unpickling time. The semantics of each element are:

* A callable object that will be called to create the initial version of the
  object. The next element of the tuple will provide arguments for this callable,
  and later elements provide additional state information that will subsequently
  be used to fully reconstruct the pickled data.

  In the unpickling environment this object must be either a class, a callable
  registered as a "safe constructor" (see below), or it must have an attribute
  :attr:`__safe_for_unpickling__` with a true value. Otherwise, an
  :exc:`UnpicklingError` will be raised in the unpickling environment. Note that
  as usual, the callable itself is pickled by name.

* A tuple of arguments for the callable object, not ``None``.

* Optionally, the object's state, which will be passed to the object's
  :meth:`__setstate__` method as described in section :ref:`pickle-inst`. If the
  object has no :meth:`__setstate__` method, then, as above, the value must be a
  dictionary and it will be added to the object's :attr:`__dict__`.

* Optionally, an iterator (and not a sequence) yielding successive list items.
  These list items will be pickled, and appended to the object using either
  ``obj.append(item)`` or ``obj.extend(list_of_items)``. This is primarily used
  for list subclasses, but may be used by other classes as long as they have
  :meth:`append` and :meth:`extend` methods with the appropriate signature.
  (Whether :meth:`append` or :meth:`extend` is used depends on which pickle
  protocol version is used as well as the number of items to append, so both must
  be supported.)

* Optionally, an iterator (not a sequence) yielding successive dictionary items,
  which should be tuples of the form ``(key, value)``. These items will be
  pickled and stored to the object using ``obj[key] = value``. This is primarily
  used for dictionary subclasses, but may be used by other classes as long as they
  implement :meth:`__setitem__`.

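The first three tuple elements can be illustrated with a small class. This is a
sketch assuming Python 3; ``Temperature`` is a hypothetical class defined at the
top level of the running script, and it has no :meth:`__setstate__`, so the
state dictionary is merged into the instance's :attr:`__dict__`.

```python
import pickle

class Temperature:
    """Hypothetical class whose constructor takes a required argument."""

    def __init__(self, degrees):
        self.degrees = degrees
        self.history = []

    def __reduce__(self):
        # (callable, argument tuple, optional state)
        return (Temperature, (self.degrees,), {"history": self.history})

reading = Temperature(21.5)
reading.history.append(20.0)

copy = pickle.loads(pickle.dumps(reading))
```

On unpickling, ``Temperature(21.5)`` is called first and the ``history`` entry
from the state is then applied to the new instance.
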

It is sometimes useful to know the protocol version when implementing
:meth:`__reduce__`. This can be done by implementing a method named
:meth:`__reduce_ex__` instead of :meth:`__reduce__`. :meth:`__reduce_ex__`, when
it exists, is called in preference over :meth:`__reduce__` (you may still
provide :meth:`__reduce__` for backwards compatibility). The
:meth:`__reduce_ex__` method will be called with a single integer argument, the
protocol version.

The :class:`object` class implements both :meth:`__reduce__` and
:meth:`__reduce_ex__`; however, if a subclass overrides :meth:`__reduce__` but
not :meth:`__reduce_ex__`, the :meth:`__reduce_ex__` implementation detects this
and calls :meth:`__reduce__`.

An alternative to implementing a :meth:`__reduce__` method on the object to be
pickled is to register the callable with the :mod:`copy_reg` module. This
module provides a way for programs to register "reduction functions" and
constructors for user-defined types. Reduction functions have the same
semantics and interface as the :meth:`__reduce__` method described above, except
that they are called with a single argument, the object to be pickled.

The registered constructor is deemed a "safe constructor" for purposes of
unpickling as described above.

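Registering a reduction function might look like the sketch below. It assumes
Python 3, where the :mod:`copy_reg` module is named ``copyreg``; the ``Ratio``
class and the ``reduce_ratio`` function are hypothetical names invented for the
example.

```python
import copyreg
import pickle

calls = []  # records each use of the reduction function

class Ratio:
    """Hypothetical type that we pretend cannot be modified directly."""

    def __init__(self, numerator, denominator):
        self.numerator = numerator
        self.denominator = denominator

def reduce_ratio(ratio):
    # Same return convention as __reduce__, but the object is passed in.
    calls.append(ratio)
    return (Ratio, (ratio.numerator, ratio.denominator))

# Register the reduction function for the type.
copyreg.pickle(Ratio, reduce_ratio)

half = pickle.loads(pickle.dumps(Ratio(1, 2)))
```

This approach is useful when the type itself cannot be given a
:meth:`__reduce__` method, for instance because it comes from another package.
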

Pickling and unpickling external objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: persistent_id (pickle protocol)
   single: persistent_load (pickle protocol)

For the benefit of object persistence, the :mod:`pickle` module supports the
notion of a reference to an object outside the pickled data stream. Such
objects are referenced by a "persistent id", which is just an arbitrary string
of printable ASCII characters. The resolution of such names is not defined by
the :mod:`pickle` module; it will delegate this resolution to user-defined
functions on the pickler and unpickler. [#]_

To define external persistent id resolution, you need to set the
:attr:`persistent_id` attribute of the pickler object and the
:attr:`persistent_load` attribute of the unpickler object.

To pickle objects that have an external persistent id, the pickler must have a
custom :func:`persistent_id` method that takes an object as an argument and
returns either ``None`` or the persistent id for that object. When ``None`` is
returned, the pickler simply pickles the object as normal. When a persistent id
string is returned, the pickler will pickle that string, along with a marker so
that the unpickler will recognize the string as a persistent id.

To unpickle external objects, the unpickler must have a custom
:func:`persistent_load` function that takes a persistent id string and returns
the referenced object.

Here's a silly example that *might* shed more light::

   import pickle
   from cStringIO import StringIO

   src = StringIO()
   p = pickle.Pickler(src)

   def persistent_id(obj):
       if hasattr(obj, 'x'):
           return 'the value %d' % obj.x
       else:
           return None

   p.persistent_id = persistent_id

   class Integer:
       def __init__(self, x):
           self.x = x
       def __str__(self):
           return 'My name is integer %d' % self.x

   i = Integer(7)
   print(i)
   p.dump(i)

   datastream = src.getvalue()
   print(repr(datastream))
   dst = StringIO(datastream)

   up = pickle.Unpickler(dst)

   class FancyInteger(Integer):
       def __str__(self):
           return 'I am the integer %d' % self.x

   def persistent_load(persid):
       if persid.startswith('the value '):
           value = int(persid.split()[2])
           return FancyInteger(value)
       else:
           raise pickle.UnpicklingError('Invalid persistent id')

   up.persistent_load = persistent_load

   j = up.load()
   print(j)

In the :mod:`cPickle` module, the unpickler's :attr:`persistent_load` attribute
can also be set to a Python list, in which case, when the unpickler reaches a
persistent id, the persistent id string will simply be appended to this list.
This functionality exists so that a pickle data stream can be "sniffed" for
object references without actually instantiating all the objects in a pickle.
[#]_ Setting :attr:`persistent_load` to a list is usually used in conjunction
with the :meth:`noload` method on the Unpickler.

.. BAW: Both pickle and cPickle support something called inst_persistent_id()
   which appears to give unknown types a second shot at producing a persistent
   id. Since Jim Fulton can't remember why it was added or what it's for, I'm
   leaving it undocumented.

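For comparison, the same idea can be sketched for Python 3, where the hooks are
usually supplied by subclassing ``Pickler`` and ``Unpickler`` and overriding
``persistent_id`` and ``persistent_load`` as methods. The ``REGISTRY``
dictionary and both class names below are hypothetical stand-ins for an
external object store.

```python
import io
import pickle

# Hypothetical external store: objects live here, not in the pickle stream.
REGISTRY = {"user:1": {"name": "alice"}, "user:2": {"name": "bob"}}

class RegistryPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return an id for objects held in the registry, None for the rest.
        for key, value in REGISTRY.items():
            if obj is value:
                return key
        return None

class RegistryUnpickler(pickle.Unpickler):
    def persistent_load(self, persid):
        try:
            return REGISTRY[persid]
        except KeyError:
            raise pickle.UnpicklingError("Invalid persistent id: %r" % (persid,))

stream = io.BytesIO()
RegistryPickler(stream, 2).dump(["note", REGISTRY["user:2"]])

stream.seek(0)
loaded = RegistryUnpickler(stream).load()
```

Only the id ``"user:2"`` is written to the stream; the referenced dictionary is
looked up again in ``REGISTRY`` when the data is loaded.
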

.. _pickle-sub:

Subclassing Unpicklers
----------------------

.. index::
   single: load_global() (pickle protocol)
   single: find_global() (pickle protocol)

By default, unpickling will import any class that it finds in the pickle data.
You can control exactly what gets unpickled and what gets called by customizing
your unpickler. Unfortunately, exactly how you do this is different depending
on whether you're using :mod:`pickle` or :mod:`cPickle`. [#]_

In the :mod:`pickle` module, you need to derive a subclass from
:class:`Unpickler`, overriding the :meth:`load_global` method.
:meth:`load_global` should read two lines from the pickle data stream where the
first line will be the name of the module containing the class and the second
line will be the name of the instance's class. It then looks up the class,
possibly importing the module and digging out the attribute, then it appends
what it finds to the unpickler's stack. Later on, this class will be assigned to
the :attr:`__class__` attribute of an empty class, as a way of magically
creating an instance without calling its class's :meth:`__init__`. Your job
(should you choose to accept it) would be to have :meth:`load_global` push onto
the unpickler's stack a known safe version of any class you deem safe to
unpickle. It is up to you to produce such a class. Or you could raise an error
if you want to disallow all unpickling of instances. If this sounds like a hack,
you're right. Refer to the source code to make this work.

Things are a little cleaner with :mod:`cPickle`, but not by much. To control
what gets unpickled, you can set the unpickler's :attr:`find_global` attribute
to a function or ``None``. If it is ``None`` then any attempts to unpickle
instances will raise an :exc:`UnpicklingError`. If it is a function, then it
should accept a module name and a class name, and return the corresponding class
object. It is responsible for looking up the class and performing any necessary
imports, and it may raise an error to prevent instances of the class from being
unpickled.

The moral of the story is that you should be really careful about the source of
the strings your application unpickles.

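In Python 3 the corresponding hook is the ``find_class(module, name)`` method
of ``Unpickler``, which replaces both :meth:`load_global` and
:attr:`find_global`. The sketch below assumes that API; the allow-list and the
``RestrictedUnpickler`` and ``restricted_loads`` names are illustrative choices,
not a complete security measure.

```python
import builtins
import io
import os
import pickle

# Names from the builtins module that this sketch chooses to allow.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))

def restricted_loads(data):
    """Like pickle.loads(), but only resolves the allowed globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

allowed = restricted_loads(pickle.dumps([1, 2, range(15)]))

try:
    restricted_loads(pickle.dumps(os.system))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

Plain data and the allowed built-ins load normally, while a stream that refers
to ``os.system`` is rejected before anything is called.
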
664
665.. _pickle-example:
666
667Example
668-------

For the simplest code, use the :func:`dump` and :func:`load` functions. Note
that a self-referencing list is pickled and restored correctly. ::

   import pickle

   data1 = {'a': [1, 2.0, 3, 4+6j],
            'b': ("string", "string using Unicode features \u0394"),
            'c': None}

   selfref_list = [1, 2, 3]
   selfref_list.append(selfref_list)

   output = open('data.pkl', 'wb')

   # Pickle dictionary using the default protocol.
   pickle.dump(data1, output)

   # Pickle the list using the highest protocol available.
   pickle.dump(selfref_list, output, -1)

   output.close()

The following example reads the resulting pickled data. When reading a
pickle-containing file, you should open the file in binary mode because you
can't be sure if the ASCII or binary format was used. ::

   import pprint, pickle

   pkl_file = open('data.pkl', 'rb')

   data1 = pickle.load(pkl_file)
   pprint.pprint(data1)

   data2 = pickle.load(pkl_file)
   pprint.pprint(data2)

   pkl_file.close()
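
The two steps can also be checked in a single run. The sketch below assumes an
in-memory :class:`io.BytesIO` buffer as a stand-in for ``data.pkl``; the names
``buffer``, ``restored_dict`` and ``restored_list`` are illustrative only.

```python
import io
import pickle

data1 = {'a': [1, 2.0, 3, 4+6j],
         'b': ("string", "string using Unicode features \u0394"),
         'c': None}

selfref_list = [1, 2, 3]
selfref_list.append(selfref_list)

buffer = io.BytesIO()            # stands in for the file opened with 'wb'
pickle.dump(data1, buffer)
pickle.dump(selfref_list, buffer, -1)

buffer.seek(0)                   # rewind, as if reopening with 'rb'
restored_dict = pickle.load(buffer)
restored_list = pickle.load(buffer)

# The restored list refers to itself, not to the original list.
points_to_itself = restored_list[3] is restored_list
```

Objects come back in the order in which they were dumped, which is why the
dictionary is loaded before the list.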

Here's a larger example that shows how to modify pickling behavior for a class.
The :class:`TextReader` class opens a text file, and returns the line number and
line contents each time its :meth:`readline` method is called. If a
:class:`TextReader` instance is pickled, all attributes *except* the file object
member are saved. When the instance is unpickled, the file is reopened, and
reading resumes from the last location. The :meth:`__setstate__` and
:meth:`__getstate__` methods are used to implement this behavior. ::

   #!/usr/local/bin/python

   class TextReader:
       """Print and number lines in a text file."""
       def __init__(self, file):
           self.file = file
           self.fh = open(file)
           self.lineno = 0

       def readline(self):
           self.lineno = self.lineno + 1
           line = self.fh.readline()
           if not line:
               return None
           if line.endswith("\n"):
               line = line[:-1]
           return "%d: %s" % (self.lineno, line)

       def __getstate__(self):
           odict = self.__dict__.copy() # copy the dict since we change it
           del odict['fh']              # remove filehandle entry
           return odict

       def __setstate__(self, state):
           fh = open(state['file'])     # reopen file
           count = state['lineno']      # read from file...
           while count:                 # until line count is restored
               fh.readline()
               count = count - 1
           self.__dict__.update(state)  # update attributes
           self.fh = fh                 # save the file object

A sample usage might be something like this::

   >>> import TextReader
   >>> obj = TextReader.TextReader("TextReader.py")
   >>> obj.readline()
   '1: #!/usr/local/bin/python'
   >>> obj.readline()
   '2: '
   >>> obj.readline()
   '3: class TextReader:'
   >>> import pickle
   >>> pickle.dump(obj, open('save.p', 'wb'))

If you want to see that :mod:`pickle` works across Python processes, start
another Python session before continuing. What follows can happen from either
the same process or a new process. ::

   >>> import pickle
   >>> reader = pickle.load(open('save.p', 'rb'))
   >>> reader.readline()
   '4:     """Print and number lines in a text file."""'
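
The same :meth:`__getstate__` and :meth:`__setstate__` pair is also consulted
by the :mod:`copy` module, which gives a way to watch the hooks run without a
pickle file. The sketch below is an assumption-laden variant, not the example
above: ``LineReader`` is a hypothetical, trimmed-down cousin of
:class:`TextReader`, and the three-line temporary file exists only for the
demonstration.

```python
import copy
import os
import tempfile


class LineReader:
    """Hypothetical, trimmed-down cousin of TextReader."""

    def __init__(self, path):
        self.path = path
        self.fh = open(path)
        self.lineno = 0

    def readline(self):
        line = self.fh.readline()
        if not line:
            return None
        self.lineno += 1
        return "%d: %s" % (self.lineno, line.rstrip("\n"))

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['fh']              # leave the open file out of the state
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.fh = open(self.path)    # reopen the file...
        for _ in range(self.lineno): # ...and skip the lines already read
            self.fh.readline()


# Write a small three-line file to read from.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

reader = LineReader(path)
first = reader.readline()
clone = copy.copy(reader)            # runs __getstate__, then __setstate__
second_from_clone = clone.readline()
second_from_original = reader.readline()
separate_handles = clone.fh is not reader.fh

reader.fh.close()
clone.fh.close()
os.remove(path)
```

Because the clone reopens the file and skips ahead, both objects report the
second line independently of each other.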


.. seealso::

   Module :mod:`copy_reg`
      Pickle interface constructor registration for extension types.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.


:mod:`cPickle` --- A faster :mod:`pickle`
=========================================

.. module:: cPickle
   :synopsis: Faster version of pickle, but not subclassable.
.. moduleauthor:: Jim Fulton <jim@zope.com>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


.. index:: module: pickle

The :mod:`cPickle` module supports serialization and de-serialization of Python
objects, providing an interface and functionality nearly identical to the
:mod:`pickle` module. There are several differences, the most important being
performance and subclassability.

First, :mod:`cPickle` can be up to 1000 times faster than :mod:`pickle` because
the former is implemented in C. Second, in the :mod:`cPickle` module the
callables :func:`Pickler` and :func:`Unpickler` are functions, not classes.
This means that you cannot use them to derive custom pickling and unpickling
subclasses. Most applications have no need for this functionality and should
benefit from the greatly improved performance of the :mod:`cPickle` module.

The pickle data streams produced by :mod:`pickle` and :mod:`cPickle` are
compatible, so it is possible to use :mod:`pickle` and :mod:`cPickle`
interchangeably with existing pickles. [#]_
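
On a current CPython 3 interpreter there is no separate :mod:`cPickle` module;
the C accelerator sits behind :mod:`pickle` itself. A loose way to observe the
same interchangeability there, assuming CPython's private pure-Python helpers
``pickle._dumps`` and ``pickle._loads`` are available (they are implementation
details, not documented API), is sketched below.

```python
import pickle

obj = {'a': [1, 2.0, 3, 4+6j], 'b': ("string", None)}

# pickle._dumps and pickle._loads are CPython's private pure-Python
# implementations; relying on them is an assumption about the interpreter.
from_python = pickle._dumps(obj)
from_default = pickle.dumps(obj)

# Each implementation reads the stream written by the other.
read_by_default = pickle.loads(from_python)
read_by_python = pickle._loads(from_default)
```

The check compares the restored objects rather than the raw bytes, since only
mutual readability is promised.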

There are additional minor differences in API between :mod:`cPickle` and
:mod:`pickle`; however, for most applications they are interchangeable. More
documentation is provided in the :mod:`pickle` module documentation, which
includes a list of the documented differences.

.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module.

.. [#] In the :mod:`pickle` module these callables are classes, which you could
   subclass to customize the behavior. However, in the :mod:`cPickle` module these
   callables are factory functions and so cannot be subclassed. One common reason
   to subclass is to control what objects can actually be unpickled. See section
   :ref:`pickle-sub` for more details.

.. [#] *Warning*: this is intended for pickling multiple objects without intervening
   modifications to the objects or their parts. If you modify an object and then
   pickle it again using the same :class:`Pickler` instance, the object is not
   pickled again --- a reference to it is pickled and the :class:`Unpickler` will
   return the old value, not the modified one. There are two problems here: (1)
   detecting changes, and (2) marshalling a minimal set of changes. Garbage
   collection may also become a problem here.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError` but it could be something else.

.. [#] These methods can also be used to implement copying class instances.

.. [#] This protocol is also used by the shallow and deep copying operations defined in
   the :mod:`copy` module.

.. [#] The actual mechanism for associating these user defined functions is slightly
   different for :mod:`pickle` and :mod:`cPickle`. The description given here
   works the same for both implementations. Users of the :mod:`pickle` module
   could also use subclassing to effect the same results, overriding the
   :meth:`persistent_id` and :meth:`persistent_load` methods in the derived
   classes.

.. [#] We'll leave you with the image of Guido and Jim sitting around sniffing pickles
   in their living rooms.

.. [#] A word of caution: the mechanisms described here use internal attributes and
   methods, which are subject to change in future versions of Python. We intend to
   someday provide a common interface for controlling this behavior, which will
   work in either :mod:`pickle` or :mod:`cPickle`.

.. [#] Since the pickle data format is actually a tiny stack-oriented programming
   language, and some freedom is taken in the encodings of certain objects, it is
   possible that the two modules produce different data streams for the same input
   objects. However it is guaranteed that they will always be able to read each
   other's data streams.