:mod:`pickle` --- Python object serialization
=============================================

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.
.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>
.. sectionauthor:: Barry Warsaw <barry@zope.com>

The :mod:`pickle` module implements a fundamental but powerful algorithm for
serializing and de-serializing a Python object structure. "Pickling" is the
process whereby a Python object hierarchy is converted into a byte stream, and
"unpickling" is the inverse operation, whereby a byte stream is converted back
into an object hierarchy. Pickling (and unpickling) is alternatively known as
"serialization", "marshalling", [#]_ or "flattening"; however, to avoid
confusion, the terms used here are "pickling" and "unpickling".


Relationship to other Python modules
------------------------------------

The :mod:`pickle` module has a transparent optimizer (:mod:`_pickle`) written
in C. It is used whenever available. Otherwise the pure Python implementation is
used.

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should always be the preferred way to serialize Python
objects. :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing. Recursive
  objects are objects that contain references to themselves. These are not
  handled by marshal, and in fact, attempting to marshal recursive objects will
  crash your Python interpreter. Object sharing happens when there are multiple
  references to the same object in different places in the object hierarchy being
  serialized. :mod:`pickle` stores such objects only once, and ensures that all
  other references point to the master copy. Shared objects remain shared, which
  can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances. :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module as
  when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions. Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases.
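
The effect of object tracking can be observed directly. The following is a
minimal sketch (all variable names are illustrative only) showing that a shared
list stays shared and that a self-referencing list survives a round trip:

```python
import pickle

# One list referenced from two places in the same object hierarchy.
shared = [1, 2, 3]
container = {"first": shared, "second": shared}

restored = pickle.loads(pickle.dumps(container))

# The shared list was stored once, so both keys still name one object.
still_shared = restored["first"] is restored["second"]

# A recursive object: the list contains a reference to itself.
recursive = [1, 2]
recursive.append(recursive)

copy = pickle.loads(pickle.dumps(recursive))
still_recursive = copy[2] is copy
```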

.. warning::

   The :mod:`pickle` module is not intended to be secure against erroneous or
   maliciously constructed data. Never unpickle data received from an untrusted or
   unauthenticated source.

Note that serialization is a more primitive notion than persistence; although
:mod:`pickle` reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent
access to persistent objects. The :mod:`pickle` module can transform a complex
object into a byte stream and it can transform the byte stream into an object
with the same internal structure. Perhaps the most obvious thing to do with
these byte streams is to write them onto a file, but it is also conceivable to
send them across a network or store them in a database. The module
:mod:`shelve` provides a simple interface to pickle and unpickle objects on
DBM-style database files.


Data stream format
------------------

.. index::
   single: XDR
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific. This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however, it means that non-Python
programs may not be able to reconstruct pickled Python objects.

The original :mod:`pickle` data format, protocol 0, uses a printable ASCII
representation. This is slightly more voluminous than a binary representation.
The big advantage of using printable ASCII (and of some other characteristics of
:mod:`pickle`'s representation) is that for debugging or recovery purposes it is
possible for a human to read the pickled file with a standard text editor.

There are currently 4 different protocols which can be used for pickling.

* Protocol version 0 is the original ASCII protocol and is backwards compatible
  with earlier versions of Python.

* Protocol version 1 is the old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3. It provides much more
  efficient pickling of :term:`new-style class`\es.

* Protocol version 3 was added in Python 3.0. It has explicit support for
  bytes and cannot be unpickled by Python 2.x pickle modules.

Refer to :pep:`307` for more information.

If a *protocol* is not specified, protocol 3 is used. If *protocol* is
specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
protocol version available will be used.

A binary format, which is slightly more efficient, can be chosen by specifying a
*protocol* version >= 1.
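
As a small illustration of protocol selection, the sketch below pickles one
object with protocol 0 and with the highest available protocol. The ``record``
dictionary is made up for the example:

```python
import pickle

record = {"name": "spam", "values": [1, 2, 3]}

# Protocol 0 is the ASCII protocol: for this object the output
# decodes cleanly as ASCII text.
ascii_pickle = pickle.dumps(record, 0)
ascii_text = ascii_pickle.decode("ascii")

# A negative protocol value selects the highest protocol available,
# exactly as HIGHEST_PROTOCOL does.
newest_pickle = pickle.dumps(record, -1)
same_pickle = pickle.dumps(record, pickle.HIGHEST_PROTOCOL)

# Whatever protocol wrote the data, loads() detects it on its own.
results = [pickle.loads(p) for p in (ascii_pickle, newest_pickle)]
```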


Usage
-----

To serialize an object hierarchy, you first create a pickler, then you call the
pickler's :meth:`dump` method. To de-serialize a data stream, you first create
an unpickler, then you call the unpickler's :meth:`load` method. The
:mod:`pickle` module provides the following constant:


.. data:: HIGHEST_PROTOCOL

   The highest protocol version available. This value can be passed as a
   *protocol* value.

.. note::

   Be sure to always open pickle files created with protocols >= 1 in binary mode.
   For the old ASCII-based pickle protocol 0 you can use either text mode or binary
   mode as long as you stay consistent.

   A pickle file written with protocol 0 in binary mode will contain lone linefeeds
   as line terminators and therefore will look "funny" when viewed in Notepad or
   other editors which do not support this format.

The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:


.. function:: dump(obj, file[, protocol])

   Write a pickled representation of *obj* to the open file object *file*. This is
   equivalent to ``Pickler(file, protocol).dump(obj)``.

   If the *protocol* parameter is omitted, protocol 3 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
   protocol version will be used.

   *file* must have a :meth:`write` method that accepts a single bytes argument.
   It can thus be a file object opened for binary writing, an
   :class:`io.BytesIO` object, or any other custom object that meets this
   interface.


.. function:: load(file)

   Read a pickled representation from the open file object *file* and interpret
   it as a pickle data stream, reconstructing and returning the original object
   hierarchy. This is equivalent to ``Unpickler(file).load()``.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments. Both
   methods should return bytes. Thus *file* can be a file object opened for
   binary reading, an :class:`io.BytesIO` object, or any other custom object
   that meets this interface.

   This function automatically determines whether the data stream was written in
   binary mode or not.


.. function:: dumps(obj[, protocol])

   Return the pickled representation of the object as a :class:`bytes`
   object, instead of writing it to a file.

   If the *protocol* parameter is omitted, protocol 3 is used. If *protocol*
   is specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
   protocol version will be used.


.. function:: loads(bytes_object)

   Read a pickled object hierarchy from a :class:`bytes` object.
   Bytes past the pickled object's representation are ignored.
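
The four convenience functions can be tried side by side. This is a minimal
sketch that assumes an in-memory :class:`io.BytesIO` buffer in place of a real
file; the ``data`` dictionary is illustrative only:

```python
import io
import pickle

data = {"a": [1, 2.0, 3], "b": ("text", None)}

# dump() and load() work with a binary file-like object.
buffer = io.BytesIO()
pickle.dump(data, buffer)
buffer.seek(0)
from_file = pickle.load(buffer)

# dumps() and loads() do the same with a bytes object.
payload = pickle.dumps(data)
from_bytes = pickle.loads(payload)

# Bytes after the end of the pickled representation are ignored.
with_trailing = pickle.loads(payload + b"trailing bytes")
```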

The :mod:`pickle` module also defines three exceptions:


.. exception:: PickleError

   A common base class for the other exceptions defined below. This inherits from
   :exc:`Exception`.


.. exception:: PicklingError

   This exception is raised when an unpicklable object is passed to the
   :meth:`dump` method.


.. exception:: UnpicklingError

   This exception is raised when there is a problem unpickling an object. Note that
   other exceptions may also be raised during unpickling, including (but not
   necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
   :exc:`ImportError`, and :exc:`IndexError`.
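
Because several exception types can surface, defensive code usually catches
more than one. The helpers below are a hedged sketch: ``try_dumps`` and
``try_loads`` are hypothetical names, and the exact exception raised for a
given object may differ between Python versions.

```python
import pickle


def try_dumps(obj):
    """Return the pickle for obj, or None if obj cannot be pickled."""
    try:
        return pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # Which of these is raised depends on the object and the
        # Python version, so several are caught here.
        return None


def try_loads(payload):
    """Return the unpickled object, or None if the payload is unusable."""
    try:
        return pickle.loads(payload)
    except (pickle.UnpicklingError, EOFError):
        return None


good = pickle.dumps([1, 2, 3])
```

A lambda is a convenient test case for the pickling side, since it has no
importable name; an empty bytes object exercises the unpickling side.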

The :mod:`pickle` module also exports two callables, :class:`Pickler` and
:class:`Unpickler`:


.. class:: Pickler(file[, protocol])

   This takes a file-like object to which it will write a pickle data stream.

   If the *protocol* parameter is omitted, protocol 3 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
   protocol version will be used.

   *file* must have a :meth:`write` method that accepts a single bytes argument.
   It can thus be a file object opened for binary writing, an
   :class:`io.BytesIO` object, or any other custom object that meets this
   interface.

   :class:`Pickler` objects define the following public methods:


   .. method:: dump(obj)

      Write a pickled representation of *obj* to the open file object given in the
      constructor. Either the binary or ASCII format will be used, depending on the
      value of the *protocol* argument passed to the constructor.


   .. method:: clear_memo()

      Clears the pickler's "memo". The memo is the data structure that remembers
      which objects the pickler has already seen, so that shared or recursive objects
      are pickled by reference and not by value. This method is useful when re-using
      picklers.


It is possible to make multiple calls to the :meth:`dump` method of the same
:class:`Pickler` instance. These must then be matched to the same number of
calls to the :meth:`load` method of the corresponding :class:`Unpickler`
instance. If the same object is pickled by multiple :meth:`dump` calls, the
:meth:`load` calls will all yield references to the same object. [#]_
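
A short sketch of this pairing, assuming an in-memory :class:`io.BytesIO`
stream in place of a file (the variable names are illustrative):

```python
import io
import pickle

stream = io.BytesIO()
pickler = pickle.Pickler(stream)

item = ["shared", "list"]
pickler.dump(item)  # first pickle in the stream
pickler.dump(item)  # second pickle refers back to the pickler's memo

# Two dump() calls are matched by two load() calls on one unpickler.
stream.seek(0)
unpickler = pickle.Unpickler(stream)
first = unpickler.load()
second = unpickler.load()

# Both loads yield a reference to the same reconstructed object.
same_object = first is second
```

Calling ``pickler.clear_memo()`` between the two :meth:`dump` calls would
instead make the second pickle self-contained.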

:class:`Unpickler` objects are defined as:


.. class:: Unpickler(file)

   This takes a file-like object from which it will read a pickle data stream.
   This class automatically determines whether the data stream was written in
   binary mode or not, so it does not need a flag as in the :class:`Pickler`
   factory.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments. Both
   methods should return bytes. Thus *file* can be a file object opened for
   binary reading, an :class:`io.BytesIO` object, or any other custom object
   that meets this interface.

   :class:`Unpickler` objects have the following public methods:


   .. method:: load()

      Read a pickled object representation from the open file object given in
      the constructor, and return the reconstituted object hierarchy specified
      therein.

      This method automatically determines whether the data stream was written
      in binary mode or not.


   .. method:: noload()

      This is just like :meth:`load` except that it doesn't actually create any
      objects. This is useful primarily for finding what's called "persistent
      ids" that may be referenced in a pickle data stream. See section
      :ref:`pickle-protocol` below for more details.


What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, floating point numbers, complex numbers

* strings, bytes, bytearrays

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`__dict__` or :meth:`__setstate__` is
  picklable (see section :ref:`pickle-protocol` for details)

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file. Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RuntimeError` will be
raised in this case. You can carefully raise this limit with
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value. This means that only the function name is
pickled, along with the name of the module the function is defined in. Neither the
function's code, nor any of its function attributes are pickled. Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [#]_
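
Pickling by name can be seen with a built-in function. In this small sketch the
pickle of :func:`len` carries the function's name, and unpickling looks that
name up again rather than rebuilding any code:

```python
import pickle

payload = pickle.dumps(len)

# The pickle holds the function's name, not its implementation.
name_is_stored = b"len" in payload

# Unpickling resolves the name and returns the very same function object.
resolved = pickle.loads(payload)
```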

Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply. Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   class Foo:
       attr = 'a class attr'

   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined in
the top level of a module.

Similarly, when class instances are pickled, their class's code and data are not
pickled along with them. Only the instance data are pickled. This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class. If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.
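
One possible shape for such versioning is sketched below. The ``Account``
class, its ``VERSION`` attribute, and the assumed older layout (a float
``balance`` instead of integer ``balance_cents``) are all hypothetical:

```python
import pickle


class Account:
    """Hypothetical class whose attribute layout changed over time."""

    VERSION = 2

    def __init__(self, owner, balance_cents):
        self.owner = owner
        self.balance_cents = balance_cents

    def __getstate__(self):
        # Store a version number alongside the instance data.
        state = self.__dict__.copy()
        state["version"] = self.VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop("version", 1)
        if version < 2:
            # Assumed older layout: a float "balance" in whole units.
            state["balance_cents"] = int(round(state.pop("balance") * 100))
        self.__dict__.update(state)


# A current object makes an ordinary round trip.
current = pickle.loads(pickle.dumps(Account("guido", 1250)))

# State saved by the assumed older layout is converted on restore.
old_style = Account.__new__(Account)
old_style.__setstate__({"owner": "guido", "balance": 12.5})
```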


.. _pickle-protocol:

The pickle protocol
-------------------

This section describes the "pickling protocol" that defines the interface
between the pickler/unpickler and the objects that are being serialized. This
protocol provides a standard way for you to define, customize, and control how
your objects are serialized and de-serialized. The description in this section
doesn't cover specific customizations that you can employ to make the unpickling
environment slightly safer from untrusted pickle data streams; see section
:ref:`pickle-sub` for more details.


.. _pickle-inst:

Pickling and unpickling normal class instances
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: __getinitargs__() (copy protocol)
   single: __init__() (instance constructor)

.. XXX is __getinitargs__ only used with old-style classes?
.. XXX update w.r.t Py3k's classes

When a pickled class instance is unpickled, its :meth:`__init__` method is
normally *not* invoked. If it is desirable that the :meth:`__init__` method be
called on unpickling, an old-style class can define a method
:meth:`__getinitargs__`, which should return a *tuple* containing the arguments
to be passed to the class constructor (:meth:`__init__` for example). The
:meth:`__getinitargs__` method is called at pickle time; the tuple it returns is
incorporated in the pickle for the instance.

.. index:: single: __getnewargs__() (copy protocol)

New-style types can provide a :meth:`__getnewargs__` method that is used for
protocol 2. Implementing this method is needed if the type establishes some
internal invariants when the instance is created, or if the memory allocation is
affected by the values passed to the :meth:`__new__` method for the type (as it
is for tuples and strings). Instances of a :term:`new-style class` :class:`C`
are created using ::

   obj = C.__new__(C, *args)


where *args* is the result of calling :meth:`__getnewargs__` on the original
object; if there is no :meth:`__getnewargs__`, an empty tuple is assumed.
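
As an illustration, consider a tuple subclass whose values are fixed in
:meth:`__new__`. The ``Point`` class below is a hypothetical example, not part
of the module:

```python
import pickle


class Point(tuple):
    """Hypothetical immutable point; its values are set in __new__."""

    def __new__(cls, x, y):
        return super().__new__(cls, (x, y))

    def __getnewargs__(self):
        # These arguments are passed to Point.__new__ when unpickling.
        return (self[0], self[1])


original = Point(3, 4)
restored = pickle.loads(pickle.dumps(original, 2))
```

Here :meth:`__new__` takes two separate coordinates, so
:meth:`__getnewargs__` has to supply them in that form.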

.. index::
   single: __getstate__() (copy protocol)
   single: __setstate__() (copy protocol)
   single: __dict__ (instance attribute)

Classes can further influence how their instances are pickled; if the class
defines the method :meth:`__getstate__`, it is called and the return state is
pickled as the contents for the instance, instead of the contents of the
instance's dictionary. If there is no :meth:`__getstate__` method, the
instance's :attr:`__dict__` is pickled.

Upon unpickling, if the class also defines the method :meth:`__setstate__`, it
is called with the unpickled state. [#]_ If there is no :meth:`__setstate__`
method, the pickled state must be a dictionary and its items are assigned to the
new instance's dictionary. If a class defines both :meth:`__getstate__` and
:meth:`__setstate__`, the state object needn't be a dictionary and these methods
can do what they want. [#]_

.. warning::

   If :meth:`__getstate__` returns a false value, the :meth:`__setstate__`
   method will not be called.
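
A common use of this pair of methods is to leave out an attribute that should
not be stored. The following sketch uses a hypothetical ``CachedSquare`` class
that drops its cache when pickled and starts with an empty one when restored:

```python
import pickle


class CachedSquare:
    """Hypothetical class with a cache that should not be pickled."""

    def __init__(self, base):
        self.base = base
        self._cache = {}

    def square(self, n):
        # Compute (base + n) squared, remembering earlier results.
        if n not in self._cache:
            self._cache[n] = (self.base + n) ** 2
        return self._cache[n]

    def __getstate__(self):
        # Pickle every attribute except the cache.
        state = self.__dict__.copy()
        del state["_cache"]
        return state

    def __setstate__(self, state):
        # Restore the saved attributes and start with an empty cache.
        self.__dict__.update(state)
        self._cache = {}


obj = CachedSquare(10)
obj.square(2)
clone = pickle.loads(pickle.dumps(obj))
```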


Pickling and unpickling extension types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: __reduce__() (pickle protocol)
   single: __reduce_ex__() (pickle protocol)
   single: __safe_for_unpickling__ (pickle protocol)

When the :class:`Pickler` encounters an object of a type it knows nothing about
--- such as an extension type --- it looks in two places for a hint of how to
pickle it. One alternative is for the object to implement a :meth:`__reduce__`
method. If provided, at pickling time :meth:`__reduce__` will be called with no
arguments, and it must return either a string or a tuple.

If a string is returned, it names a global variable whose contents are pickled
as normal. The string returned by :meth:`__reduce__` should be the object's
local name relative to its module; the pickle module searches the module
namespace to determine the object's module.

When a tuple is returned, it must be between two and five elements long.
Optional elements can either be omitted, or ``None`` can be provided as their
value. The contents of this tuple are pickled as normal and used to
reconstruct the object at unpickling time. The semantics of each element are:

* A callable object that will be called to create the initial version of the
  object. The next element of the tuple will provide arguments for this callable,
  and later elements provide additional state information that will subsequently
  be used to fully reconstruct the pickled data.

  In the unpickling environment this object must be either a class, a callable
  registered as a "safe constructor" (see below), or it must have an attribute
  :attr:`__safe_for_unpickling__` with a true value. Otherwise, an
  :exc:`UnpicklingError` will be raised in the unpickling environment. Note that
  as usual, the callable itself is pickled by name.

* A tuple of arguments for the callable object, not ``None``.

* Optionally, the object's state, which will be passed to the object's
  :meth:`__setstate__` method as described in section :ref:`pickle-inst`. If the
  object has no :meth:`__setstate__` method, then, as above, the value must be a
  dictionary and it will be added to the object's :attr:`__dict__`.

* Optionally, an iterator (and not a sequence) yielding successive list items.
  These list items will be pickled, and appended to the object using either
  ``obj.append(item)`` or ``obj.extend(list_of_items)``. This is primarily used
  for list subclasses, but may be used by other classes as long as they have
  :meth:`append` and :meth:`extend` methods with the appropriate signature.
  (Whether :meth:`append` or :meth:`extend` is used depends on which pickle
  protocol version is used as well as the number of items to append, so both must
  be supported.)

* Optionally, an iterator (not a sequence) yielding successive dictionary items,
  which should be tuples of the form ``(key, value)``. These items will be
  pickled and stored to the object using ``obj[key] = value``. This is primarily
  used for dictionary subclasses, but may be used by other classes as long as they
  implement :meth:`__setitem__`.

It is sometimes useful to know the protocol version when implementing
:meth:`__reduce__`. This can be done by implementing a method named
:meth:`__reduce_ex__` instead of :meth:`__reduce__`. :meth:`__reduce_ex__`, when
it exists, is called in preference over :meth:`__reduce__` (you may still
provide :meth:`__reduce__` for backwards compatibility). The
:meth:`__reduce_ex__` method will be called with a single integer argument, the
protocol version.

The :class:`object` class implements both :meth:`__reduce__` and
:meth:`__reduce_ex__`; however, if a subclass overrides :meth:`__reduce__` but
not :meth:`__reduce_ex__`, the :meth:`__reduce_ex__` implementation detects this
and calls :meth:`__reduce__`.
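
The simplest tuple form has just two elements, a callable and its arguments.
The sketch below uses a hypothetical ``Temperature`` class that is rebuilt by
calling its own constructor again:

```python
import pickle


class Temperature:
    """Hypothetical class rebuilt by calling its constructor again."""

    def __init__(self, degrees, unit="C"):
        self.degrees = degrees
        self.unit = unit

    def __reduce__(self):
        # (callable, arguments): unpickling calls Temperature(degrees, unit).
        return (Temperature, (self.degrees, self.unit))


reading = Temperature(21.5, "C")
copy = pickle.loads(pickle.dumps(reading))
```

Because the callable is pickled by name, ``Temperature`` must be importable
wherever the data is unpickled.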

An alternative to implementing a :meth:`__reduce__` method on the object to be
pickled is to register the callable with the :mod:`copyreg` module. This
module provides a way for programs to register "reduction functions" and
constructors for user-defined types. Reduction functions have the same
semantics and interface as the :meth:`__reduce__` method described above, except
that they are called with a single argument, the object to be pickled.

The registered constructor is deemed a "safe constructor" for purposes of
unpickling as described above.
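
A minimal sketch of this registration, using :func:`copyreg.pickle`. The
``Version`` class, the ``reduce_version`` function, and the ``calls`` list
used to observe it are hypothetical:

```python
import copyreg
import pickle


class Version:
    """Hypothetical class that defines no pickling methods of its own."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor


calls = []


def reduce_version(version):
    # Same return value as __reduce__, but the object is passed in.
    calls.append(version)
    return (Version, (version.major, version.minor))


# Register the reduction function for the Version type.
copyreg.pickle(Version, reduce_version)

restored = pickle.loads(pickle.dumps(Version(3, 0)))
```

This keeps the pickling logic outside the class, which helps when the class
itself cannot be modified.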


Pickling and unpickling external objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: persistent_id (pickle protocol)
   single: persistent_load (pickle protocol)

For the benefit of object persistence, the :mod:`pickle` module supports the
notion of a reference to an object outside the pickled data stream. Such
objects are referenced by a "persistent id", which is just an arbitrary string
of printable ASCII characters. The resolution of such names is not defined by
the :mod:`pickle` module; it will delegate this resolution to user-defined
functions on the pickler and unpickler.

To define external persistent id resolution, you need to set the
:attr:`persistent_id` attribute of the pickler object and the
:attr:`persistent_load` attribute of the unpickler object.

To pickle objects that have an external persistent id, the pickler must have a
custom :func:`persistent_id` method that takes an object as an argument and
returns either ``None`` or the persistent id for that object. When ``None`` is
returned, the pickler simply pickles the object as normal. When a persistent id
string is returned, the pickler will pickle that string, along with a marker so
that the unpickler will recognize the string as a persistent id.

To unpickle external objects, the unpickler must have a custom
:func:`persistent_load` function that takes a persistent id string and returns
the referenced object.

Here's a silly example that *might* shed more light::

   import pickle
   from io import BytesIO

   # Pickle data is bytes, so an in-memory binary stream is used.
   src = BytesIO()
   p = pickle.Pickler(src)

   def persistent_id(obj):
       if hasattr(obj, 'x'):
           return 'the value %d' % obj.x
       else:
           return None

   p.persistent_id = persistent_id

   class Integer:
       def __init__(self, x):
           self.x = x
       def __str__(self):
           return 'My name is integer %d' % self.x

   i = Integer(7)
   print(i)
   p.dump(i)

   datastream = src.getvalue()
   print(repr(datastream))
   dst = BytesIO(datastream)

   up = pickle.Unpickler(dst)

   class FancyInteger(Integer):
       def __str__(self):
           return 'I am the integer %d' % self.x

   def persistent_load(persid):
       if persid.startswith('the value '):
           value = int(persid.split()[2])
           return FancyInteger(value)
       else:
           raise pickle.UnpicklingError('Invalid persistent id')

   up.persistent_load = persistent_load

   j = up.load()
   print(j)


.. BAW: pickle supports something called inst_persistent_id()
   which appears to give unknown types a second shot at producing a persistent
   id. Since Jim Fulton can't remember why it was added or what it's for, I'm
   leaving it undocumented.


.. _pickle-sub:

Subclassing Unpicklers
----------------------

.. index::
   single: load_global() (pickle protocol)
   single: find_global() (pickle protocol)

By default, unpickling will import any class that it finds in the pickle data.
You can control exactly what gets unpickled and what gets called by customizing
your unpickler.

You need to derive a subclass from :class:`Unpickler`, overriding the
:meth:`load_global` method. :meth:`load_global` should read two lines from the
pickle data stream where the first line will be the name of the module containing
the class and the second line will be the name of the instance's class. It then
looks up the class, possibly importing the module and digging out the attribute,
then it appends what it finds to the unpickler's stack. Later on, this class
will be assigned to the :attr:`__class__` attribute of an empty class, as a way
of magically creating an instance without calling its class's
:meth:`__init__`. Your job (should you choose to accept it) would be to have
:meth:`load_global` push onto the unpickler's stack a known safe version of any
class you deem safe to unpickle. It is up to you to produce such a class. Or
you could raise an error if you want to disallow all unpickling of instances.
If this sounds like a hack, you're right. Refer to the source code to make this
work.

The moral of the story is that you should be really careful about the source of
the strings your application unpickles.
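
In current Python 3 releases, the documented hook for this kind of control is
the :meth:`find_class` method of :class:`Unpickler`. The following sketch
assumes that API rather than :meth:`load_global`. ``SAFE_BUILTINS``,
``RestrictedUnpickler`` and ``restricted_loads`` are hypothetical names, and
the allow-list is only an example:

```python
import builtins
import io
import pickle

# Hypothetical allow-list: the only globals this unpickler will resolve.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global (class or function) named in the pickle.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))


def restricted_loads(data):
    """Like pickle.loads(), but only allow-listed globals are resolved."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


allowed = restricted_loads(pickle.dumps([1, 2, range(3)]))
```

Plain containers and numbers need no global lookup, so they load as usual,
while a pickle that names any other global is rejected with
:exc:`UnpicklingError`. Such a filter narrows what a pickle can reach; it does
not make untrusted data safe.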


.. _pickle-example:

Example
-------

For the simplest code, use the :func:`dump` and :func:`load` functions. Note
that a self-referencing list is pickled and restored correctly. ::

   import pickle

   data1 = {'a': [1, 2.0, 3, 4+6j],
            'b': ("string", "string using Unicode features \u0394"),
            'c': None}

   selfref_list = [1, 2, 3]
   selfref_list.append(selfref_list)

   with open('data.pkl', 'wb') as output:
       # Pickle the dictionary using protocol 2.
       pickle.dump(data1, output, 2)

       # Pickle the list using the highest protocol available.
       pickle.dump(selfref_list, output, pickle.HIGHEST_PROTOCOL)

The following example reads the resulting pickled data. When reading a
pickle-containing file, you must open the file in binary mode, because pickle
data is a stream of bytes regardless of the protocol used. ::

   import pickle
   import pprint

   with open('data.pkl', 'rb') as pkl_file:
       # Objects come back in the order in which they were dumped.
       data1 = pickle.load(pkl_file)
       pprint.pprint(data1)

       data2 = pickle.load(pkl_file)
       pprint.pprint(data2)
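
The claim about self-referencing lists can be checked without touching the
filesystem. The following is a minimal sketch that uses the module-level
:func:`dumps` and :func:`loads` functions instead of a file; the variable names
are illustrative only.

```python
import pickle

selfref_list = [1, 2, 3]
selfref_list.append(selfref_list)

# Serialize to a bytes object and back, without using a file.
payload = pickle.dumps(selfref_list, pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(payload)

# The last element of the restored list is the restored list itself,
# so the cycle survives the round trip.
is_cycle_preserved = restored[3] is restored

# The restored list is a new object, not the original one.
is_new_object = restored is not selfref_list
```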

Here's a larger example that shows how to modify pickling behavior for a class.
The :class:`TextReader` class opens a text file, and returns the line number and
line contents each time its :meth:`readline` method is called. If a
:class:`TextReader` instance is pickled, all attributes *except* the file object
member are saved. When the instance is unpickled, the file is reopened, and
reading resumes from the last location. The :meth:`__setstate__` and
:meth:`__getstate__` methods are used to implement this behavior. ::

   #!/usr/local/bin/python

   class TextReader:
       """Print and number lines in a text file."""
       def __init__(self, file):
           self.file = file
           self.fh = open(file)
           self.lineno = 0

       def readline(self):
           self.lineno = self.lineno + 1
           line = self.fh.readline()
           if not line:
               return None
           if line.endswith("\n"):
               line = line[:-1]
           return "%d: %s" % (self.lineno, line)

       def __getstate__(self):
           odict = self.__dict__.copy() # copy the dict since we change it
           del odict['fh']              # remove filehandle entry
           return odict

       def __setstate__(self, state):
           fh = open(state['file'])     # reopen file
           count = state['lineno']      # read from file...
           while count:                 # until line count is restored
               fh.readline()
               count = count - 1
           self.__dict__.update(state)  # update attributes
           self.fh = fh                 # save the file object

A sample usage might be something like this::

   >>> import TextReader
   >>> obj = TextReader.TextReader("TextReader.py")
   >>> obj.readline()
   '1: #!/usr/local/bin/python'
   >>> obj.readline()
   '2: '
   >>> obj.readline()
   '3: class TextReader:'
   >>> import pickle
   >>> with open('save.p', 'wb') as f:
   ...     pickle.dump(obj, f)

If you want to see that :mod:`pickle` works across Python processes, start
another Python session before continuing. What follows can happen from either
the same process or a new process. ::

   >>> import pickle
   >>> with open('save.p', 'rb') as f:
   ...     reader = pickle.load(f)
   >>> reader.readline()
   '4: """Print and number lines in a text file."""'
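
The same :meth:`__getstate__` and :meth:`__setstate__` hooks are also consulted
by the :mod:`copy` module, which gives a quick way to exercise them in a single
process. The sketch below is not the :class:`TextReader` class above but a
hypothetical, cut-down ``LineCounter`` class that reads from a temporary file
created only for the demonstration.

```python
import copy
import os
import tempfile


class LineCounter:
    """Hypothetical reader that remembers how many lines it has consumed."""

    def __init__(self, path):
        self.path = path
        self.fh = open(path)
        self.lineno = 0

    def readline(self):
        self.lineno += 1
        return self.fh.readline().rstrip("\n")

    def __getstate__(self):
        state = self.__dict__.copy()    # copy, since we modify it
        del state["fh"]                 # leave the open file object out
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.fh = open(self.path)       # reopen the file
        for _ in range(self.lineno):    # skip the lines already consumed
            self.fh.readline()


# Create a small throwaway file to read from.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")

reader = LineCounter(tmp.name)
first = reader.readline()

# copy.copy() calls __getstate__ on the original and __setstate__ on the copy.
duplicate = copy.copy(reader)
second_from_duplicate = duplicate.readline()
second_from_original = reader.readline()

for obj in (reader, duplicate):
    obj.fh.close()
os.remove(tmp.name)
```

Because ``__setstate__`` reopens the file, the copy has its own file object and
resumes at the same line as the original without disturbing it.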


.. seealso::

   Module :mod:`copyreg`
      Pickle interface constructor registration for extension types.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.


.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module.

.. [#] *Warning*: this is intended for pickling multiple objects without intervening
   modifications to the objects or their parts. If you modify an object and then
   pickle it again using the same :class:`Pickler` instance, the object is not
   pickled again; only a reference to it is pickled, and the :class:`Unpickler`
   will return the old value, not the modified one. There are two problems here:
   (1) detecting changes, and (2) marshalling a minimal set of changes. Garbage
   collection may also become a problem here.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError`, but it could be something else.

.. [#] These methods can also be used to implement copying class instances.

.. [#] This protocol is also used by the shallow and deep copying operations defined in
   the :mod:`copy` module.