:mod:`pickle` --- Python object serialization
=============================================

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.
.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>.
.. sectionauthor:: Barry Warsaw <barry@zope.com>


The :mod:`pickle` module implements a fundamental but powerful algorithm for
serializing and de-serializing a Python object structure. "Pickling" is the
process whereby a Python object hierarchy is converted into a byte stream, and
"unpickling" is the inverse operation, whereby a byte stream is converted back
into an object hierarchy. Pickling (and unpickling) is alternatively known as
"serialization", "marshalling", [#]_ or "flattening"; however, to avoid
confusion, the terms used here are "pickling" and "unpickling".


Relationship to other Python modules
------------------------------------

The :mod:`pickle` module has a transparent optimizer (:mod:`_pickle`) written
in C. It is used whenever available; otherwise, the pure Python implementation
is used.

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should always be the preferred way to serialize Python
objects. :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing. Recursive
  objects are objects that contain references to themselves. These are not
  handled by marshal, and in fact, attempting to marshal recursive objects will
  crash your Python interpreter. Object sharing happens when there are multiple
  references to the same object in different places in the object hierarchy being
  serialized. :mod:`pickle` stores such objects only once, and ensures that all
  other references point to the master copy. Shared objects remain shared, which
  can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances. :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module
  as when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions. Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases.

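As a small illustration of the first point, the following sketch pickles a
structure that contains both a shared object and a reference to itself. The
variable names are made up for this example::

   import pickle

   shared = ["spam"]
   data = {"a": shared, "b": shared}
   data["self"] = data              # the dictionary now refers to itself

   restored = pickle.loads(pickle.dumps(data))

   # Both entries still refer to one single list object after unpickling.
   assert restored["a"] is restored["b"]
   # The self-reference is restored as well.
   assert restored["self"] is restored
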
.. warning::

   The :mod:`pickle` module is not intended to be secure against erroneous or
   maliciously constructed data. Never unpickle data received from an untrusted
   or unauthenticated source.

Note that serialization is a more primitive notion than persistence; although
:mod:`pickle` reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent
access to persistent objects. The :mod:`pickle` module can transform a complex
object into a byte stream and it can transform the byte stream into an object
with the same internal structure. Perhaps the most obvious thing to do with
these byte streams is to write them to a file, but it is also conceivable to
send them across a network or store them in a database. The module
:mod:`shelve` provides a simple interface to pickle and unpickle objects on
DBM-style database files.


Data stream format
------------------

.. index::
   single: XDR
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific. This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however, it means that non-Python
programs may not be able to reconstruct pickled Python objects.

By default, the :mod:`pickle` data format uses a compact binary representation.
The module :mod:`pickletools` contains tools for analyzing data streams
generated by :mod:`pickle`.

There are currently 4 different protocols which can be used for pickling.

* Protocol version 0 is the original human-readable protocol and is
  backwards compatible with earlier versions of Python.

* Protocol version 1 is the old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3. It provides much more
  efficient pickling of :term:`new-style class`\es.

* Protocol version 3 was added in Python 3.0. It has explicit support for
  bytes and cannot be unpickled by Python 2.x pickle modules. This is
  the current recommended protocol; use it whenever possible.

Refer to :pep:`307` for information about improvements brought by
protocol 2. See :mod:`pickletools`'s source code for extensive
comments about opcodes used by pickle protocols.


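The difference between the text and binary protocols can be observed directly.
The following is a minimal sketch; the sample dictionary is arbitrary, and the
exact bytes produced may vary between Python releases::

   import pickle
   import pickletools

   record = {"name": "spam", "count": 3}

   text_pickle = pickle.dumps(record, protocol=0)
   binary_pickle = pickle.dumps(record, protocol=2)

   # Protocol 0 output consists of ASCII characters only.
   print(text_pickle)
   # Protocol 2 and newer start with a PROTO opcode (0x80) followed by
   # the protocol number.
   print(binary_pickle[:2])

   # Both streams describe the same object.
   assert pickle.loads(text_pickle) == pickle.loads(binary_pickle) == record

   # pickletools.dis() prints a symbolic disassembly of a pickle.
   pickletools.dis(binary_pickle)
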
Module Interface
----------------

To serialize an object hierarchy, you first create a pickler, then you call the
pickler's :meth:`dump` method. To de-serialize a data stream, you first create
an unpickler, then you call the unpickler's :meth:`load` method. The
:mod:`pickle` module provides the following constants:


.. data:: HIGHEST_PROTOCOL

   The highest protocol version available. This value can be passed as a
   *protocol* value.

.. data:: DEFAULT_PROTOCOL

   The default protocol used for pickling. May be less than
   :data:`HIGHEST_PROTOCOL`. Currently the default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.


The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:

.. function:: dump(obj, file[, protocol, \*, fix_imports=True])

   Write a pickled representation of *obj* to the open file object *file*. This
   is equivalent to ``Pickler(file, protocol).dump(obj)``.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2 and 3. The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

   The *file* argument must have a :meth:`write` method that accepts a single
   bytes argument. It can thus be a file object opened for binary writing, an
   :class:`io.BytesIO` instance, or any other custom object that meets this
   interface.

   If *fix_imports* is ``True`` and *protocol* is less than 3, pickle will try to
   map the new Python 3.x names to the old module names used in Python 2.x,
   so that the pickle data stream is readable with Python 2.x.

.. function:: dumps(obj[, protocol, \*, fix_imports=True])

   Return the pickled representation of the object as a :class:`bytes`
   object, instead of writing it to a file.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2 and 3. The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

   If *fix_imports* is ``True`` and *protocol* is less than 3, pickle will try to
   map the new Python 3.x names to the old module names used in Python 2.x,
   so that the pickle data stream is readable with Python 2.x.

.. function:: load(file, [\*, fix_imports=True, encoding="ASCII", errors="strict"])

   Read a pickled object representation from the open file object *file* and
   return the reconstituted object hierarchy specified therein. This is
   equivalent to ``Unpickler(file).load()``.

   The protocol version of the pickle is detected automatically, so no protocol
   argument is needed. Bytes past the pickled object's representation are
   ignored.

   The argument *file* must have two methods, a :meth:`read` method that takes
   an integer argument, and a :meth:`readline` method that requires no
   arguments. Both methods should return bytes. Thus *file* can be a binary
   file object opened for reading, an :class:`io.BytesIO` object, or any other
   custom object that meets this interface.

   Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
   which are used to control compatibility support for pickle streams generated
   by Python 2.x. If *fix_imports* is ``True``, pickle will try to map the old
   Python 2.x names to the new names used in Python 3.x. The *encoding* and
   *errors* arguments tell pickle how to decode 8-bit string instances pickled
   by Python 2.x; these default to 'ASCII' and 'strict', respectively.

.. function:: loads(bytes_object, [\*, fix_imports=True, encoding="ASCII", errors="strict"])

   Read a pickled object hierarchy from a :class:`bytes` object and return the
   reconstituted object hierarchy specified therein.

   The protocol version of the pickle is detected automatically, so no protocol
   argument is needed. Bytes past the pickled object's representation are
   ignored.

   Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
   which are used to control compatibility support for pickle streams generated
   by Python 2.x. If *fix_imports* is ``True``, pickle will try to map the old
   Python 2.x names to the new names used in Python 3.x. The *encoding* and
   *errors* arguments tell pickle how to decode 8-bit string instances pickled
   by Python 2.x; these default to 'ASCII' and 'strict', respectively.


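The four functions above can be combined as in the following sketch. It uses
an in-memory :class:`io.BytesIO` stream in place of a real file; the sample
data is arbitrary::

   import io
   import pickle

   data = {"numbers": [1, 2, 3], "text": "spam", "blob": b"\x00\x01"}

   # dumps() and loads() work on bytes objects held in memory.
   payload = pickle.dumps(data)
   assert pickle.loads(payload) == data

   # dump() and load() work on any object with the required methods,
   # here an in-memory binary stream.
   stream = io.BytesIO()
   pickle.dump(data, stream)
   pickle.dump("second object", stream)

   stream.seek(0)
   first = pickle.load(stream)      # each call reads one pickled object
   second = pickle.load(stream)
   assert first == data
   assert second == "second object"
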
The :mod:`pickle` module defines three exceptions:

.. exception:: PickleError

   Common base class for the other pickling exceptions. It inherits from
   :exc:`Exception`.

.. exception:: PicklingError

   Error raised when an unpicklable object is encountered by :class:`Pickler`.
   It inherits from :exc:`PickleError`.

   Refer to :ref:`pickle-picklable` to learn what kinds of objects can be
   pickled.

.. exception:: UnpicklingError

   Error raised when there is a problem unpickling an object, such as data
   corruption or a security violation. It inherits from :exc:`PickleError`.

   Note that other exceptions may also be raised during unpickling, including
   (but not necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
   :exc:`ImportError`, and :exc:`IndexError`.


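Because unpickling can fail in several ways, callers that read possibly damaged
data usually catch more than one exception type. The helper below is a
hypothetical sketch, not part of the module; which exception a truncated stream
raises depends on the protocol and the Python release::

   import pickle


   def try_loads(data):
       """Return the unpickled object, or None if *data* cannot be read."""
       try:
           return pickle.loads(data)
       except (pickle.UnpicklingError, AttributeError, EOFError,
               ImportError, IndexError) as exc:
           print("could not unpickle:", type(exc).__name__)
           return None


   good = pickle.dumps([1, 2, 3])
   truncated = good[:-1]            # drop the last byte to damage the stream

   assert try_loads(good) == [1, 2, 3]
   assert try_loads(truncated) is None

Catching these exceptions only guards against damaged input. It does not make
unpickling untrusted data safe.
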
The :mod:`pickle` module exports two classes, :class:`Pickler` and
:class:`Unpickler`:

.. class:: Pickler(file[, protocol, \*, fix_imports=True])

   This takes a binary file for writing a pickle data stream.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2 and 3. The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

   The *file* argument must have a :meth:`write` method that accepts a single
   bytes argument. It can thus be a file object opened for binary writing, an
   :class:`io.BytesIO` instance, or any other custom object that meets this
   interface.

   If *fix_imports* is ``True`` and *protocol* is less than 3, pickle will try to
   map the new Python 3.x names to the old module names used in Python 2.x,
   so that the pickle data stream is readable with Python 2.x.

   .. method:: dump(obj)

      Write a pickled representation of *obj* to the open file object given in
      the constructor.

   .. method:: persistent_id(obj)

      Do nothing by default. This exists so a subclass can override it.

      If :meth:`persistent_id` returns ``None``, *obj* is pickled as usual. Any
      other value causes :class:`Pickler` to emit the returned value as a
      persistent ID for *obj*. The meaning of this persistent ID should be
      defined by :meth:`Unpickler.persistent_load`. Note that the value
      returned by :meth:`persistent_id` cannot itself have a persistent ID.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. attribute:: fast

      Deprecated. Enable fast mode if set to a true value. The fast mode
      disables the usage of memo, therefore speeding the pickling process by not
      generating superfluous PUT opcodes. It should not be used with
      self-referential objects; doing otherwise will cause :class:`Pickler` to
      recurse infinitely.

      Use :func:`pickletools.optimize` if you need more compact pickles.


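The following sketch shows how :meth:`Pickler.persistent_id` might be overridden
together with the matching :meth:`Unpickler.persistent_load`. The ``REGISTRY``
dictionary and both subclass names are invented for this illustration; a real
application would define its own scheme for naming external objects::

   import io
   import pickle

   # Hypothetical "external" objects that live outside the pickle.
   REGISTRY = {"config": {"debug": True}, "cache": [1, 2, 3]}


   class RegistryPickler(pickle.Pickler):
       def persistent_id(self, obj):
           # Objects found in the registry are replaced by their key;
           # returning None pickles everything else as usual.
           for key, value in REGISTRY.items():
               if obj is value:
                   return key
           return None


   class RegistryUnpickler(pickle.Unpickler):
       def persistent_load(self, pid):
           try:
               return REGISTRY[pid]
           except KeyError:
               raise pickle.UnpicklingError(
                   "unknown persistent ID: %r" % (pid,))


   stream = io.BytesIO()
   RegistryPickler(stream).dump({"settings": REGISTRY["config"], "n": 42})

   stream.seek(0)
   restored = RegistryUnpickler(stream).load()

   # The registry object was not copied; the very same object comes back.
   assert restored["settings"] is REGISTRY["config"]
   assert restored["n"] == 42
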
.. class:: Unpickler(file, [\*, fix_imports=True, encoding="ASCII", errors="strict"])

   This takes a binary file for reading a pickle data stream.

   The protocol version of the pickle is detected automatically, so no
   protocol argument is needed.

   The argument *file* must have two methods, a :meth:`read` method that takes
   an integer argument, and a :meth:`readline` method that requires no
   arguments. Both methods should return bytes. Thus *file* can be a binary
   file object opened for reading, an :class:`io.BytesIO` object, or any other
   custom object that meets this interface.

   Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
   which are used to control compatibility support for pickle streams generated
   by Python 2.x. If *fix_imports* is ``True``, pickle will try to map the old
   Python 2.x names to the new names used in Python 3.x. The *encoding* and
   *errors* arguments tell pickle how to decode 8-bit string instances pickled
   by Python 2.x; these default to 'ASCII' and 'strict', respectively.

   .. method:: load()

      Read a pickled object representation from the open file object given in
      the constructor, and return the reconstituted object hierarchy specified
      therein. Bytes past the pickled object's representation are ignored.

   .. method:: persistent_load(pid)

      Raise an :exc:`UnpicklingError` by default.

      If defined, :meth:`persistent_load` should return the object specified by
      the persistent ID *pid*. If an invalid persistent ID is encountered, an
      :exc:`UnpicklingError` should be raised.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. method:: find_class(module, name)

      Import *module* if necessary and return the object called *name* from it,
      where the *module* and *name* arguments are :class:`str` objects. Note
      that, despite its name, :meth:`find_class` is also used for finding
      functions.

      Subclasses may override this to gain control over what types of objects
      can be loaded and how, potentially reducing security risks. Refer to
      :ref:`pickle-restrict` for details.


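As a rough sketch of such a subclass, the following unpickler only resolves a
small set of names from the :mod:`builtins` module. The set of allowed names
and the helper ``restricted_loads`` are choices made for this example. A filter
like this narrows what a pickle can do, but it should not be taken as making
untrusted input safe::

   import builtins
   import io
   import pickle

   SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


   class RestrictedUnpickler(pickle.Unpickler):
       def find_class(self, module, name):
           # Only allow a few harmless names from the builtins module.
           if module == "builtins" and name in SAFE_BUILTINS:
               return getattr(builtins, name)
           # Refuse everything else.
           raise pickle.UnpicklingError(
               "global '%s.%s' is forbidden" % (module, name))


   def restricted_loads(data):
       """Like pickle.loads(), but only permits the names listed above."""
       return RestrictedUnpickler(io.BytesIO(data)).load()


   assert restricted_loads(pickle.dumps([1, 2, range(15)])) == [1, 2, range(15)]

   try:
       restricted_loads(pickle.dumps(print))
   except pickle.UnpicklingError as exc:
       print("refused:", exc)
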
.. _pickle-picklable:

What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, floating point numbers, complex numbers

* strings, bytes, bytearrays

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`__dict__` or the result of calling
  :meth:`__getstate__` is picklable (see section :ref:`pickle-inst` for details)

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file. Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RuntimeError` will be
raised in this case. You can carefully raise this limit with
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value. This means that only the function name is
pickled, along with the name of the module the function is defined in. Neither
the function's code, nor any of its function attributes are pickled. Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [#]_

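Pickling by reference can be seen with a built-in function. In this short
sketch the pickle contains the name ``len`` rather than any code, and
unpickling returns the identical function object::

   import pickle

   payload = pickle.dumps(len)

   # The stream holds the function's name, not its code.
   assert b"len" in payload

   # Unpickling looks the name up again and returns the very same object.
   assert pickle.loads(payload) is len
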
Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply. Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   import pickle

   class Foo:
       attr = 'A class attribute'

   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined at
the top level of a module.

Similarly, when class instances are pickled, their class's code and data are not
pickled along with them. Only the instance data are pickled. This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class. If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.


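One possible way to carry such a version number is sketched below. The class,
the ``_version`` key and the renamed ``amount`` attribute are all hypothetical;
they only illustrate how :meth:`__setstate__` could convert state saved by an
older layout::

   import pickle


   class Account:
       VERSION = 2

       def __init__(self, owner, balance=0):
           self.owner = owner
           self.balance = balance

       def __getstate__(self):
           state = self.__dict__.copy()
           state["_version"] = self.VERSION    # record the layout version
           return state

       def __setstate__(self, state):
           version = state.pop("_version", 1)
           if version < 2:
               # Assumption for this sketch: version 1 stored the
               # balance under the name "amount".
               state["balance"] = state.pop("amount", 0)
           self.__dict__.update(state)


   restored = pickle.loads(pickle.dumps(Account("alice", 10)))
   assert (restored.owner, restored.balance) == ("alice", 10)

   # Simulate state that was saved with the older layout.
   old = Account.__new__(Account)
   old.__setstate__({"owner": "bob", "amount": 5})
   assert old.balance == 5
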
.. _pickle-inst:

Pickling Class Instances
------------------------

In this section, we describe the general mechanisms available to you to define,
customize, and control how class instances are pickled and unpickled.

In most cases, no additional code is needed to make instances picklable. By
default, pickle will retrieve the class and the attributes of an instance via
introspection. When a class instance is unpickled, its :meth:`__init__` method
is usually *not* invoked. The default behaviour first creates an uninitialized
instance and then restores the saved attributes. The following code shows an
implementation of this behaviour::

   def save(obj):
       return (obj.__class__, obj.__dict__)

   def load(cls, attributes):
       obj = cls.__new__(cls)
       obj.__dict__.update(attributes)
       return obj

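That :meth:`__init__` is skipped can be checked with a class that counts its
own initializations. The ``Tally`` class is invented for this sketch::

   import pickle


   class Tally:
       init_calls = 0

       def __init__(self, start):
           Tally.init_calls += 1
           self.value = start


   original = Tally(5)
   restored = pickle.loads(pickle.dumps(original))

   assert restored.value == 5
   # __init__ ran once, for the original object only.
   assert Tally.init_calls == 1
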
.. index:: single: __getnewargs__() (copy protocol)

Classes can alter the default behaviour by providing one or several special
methods. In protocol 2 and newer, classes that implement the
:meth:`__getnewargs__` method can dictate the values passed to the
:meth:`__new__` method upon unpickling. This is often needed for classes
whose :meth:`__new__` method requires arguments.

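A minimal sketch of :meth:`__getnewargs__`, using a made-up :class:`float`
subclass whose :meth:`__new__` requires two arguments::

   import pickle


   class Temperature(float):
       """A float that carries a unit; its __new__ needs two arguments."""

       def __new__(cls, degrees, unit):
           self = super().__new__(cls, degrees)
           self.unit = unit
           return self

       def __getnewargs__(self):
           # These values are passed to __new__ when unpickling.
           return (float(self), self.unit)


   reading = Temperature(21.5, "C")
   restored = pickle.loads(pickle.dumps(reading, protocol=2))

   assert type(restored) is Temperature
   assert restored == 21.5 and restored.unit == "C"
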
.. index:: single: __getstate__() (copy protocol)

Classes can further influence how their instances are pickled; if the class
defines the method :meth:`__getstate__`, it is called and the returned object is
pickled as the contents for the instance, instead of the contents of the
instance's dictionary. If the :meth:`__getstate__` method is absent, the
instance's :attr:`__dict__` is pickled as usual.

.. index:: single: __setstate__() (copy protocol)

Upon unpickling, if the class defines :meth:`__setstate__`, it is called with
the unpickled state. In that case, there is no requirement for the state object
to be a dictionary. Otherwise, the pickled state must be a dictionary and its
items are assigned to the new instance's dictionary.

.. note::

   If :meth:`__getstate__` returns a false value, the :meth:`__setstate__`
   method will not be called.

Refer to the section :ref:`pickle-state` for more information about how to use
the methods :meth:`__getstate__` and :meth:`__setstate__`.

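A common use of this pair of methods is to leave out an attribute that cannot
be pickled and to recreate it afterwards. The ``Job`` class below is a
hypothetical example that holds a lock::

   import pickle
   import threading


   class Job:
       def __init__(self, name):
           self.name = name
           self.lock = threading.Lock()     # locks cannot be pickled

       def __getstate__(self):
           state = self.__dict__.copy()
           del state["lock"]                # leave the lock out of the pickle
           return state

       def __setstate__(self, state):
           self.__dict__.update(state)
           self.lock = threading.Lock()     # create a fresh lock instead


   restored = pickle.loads(pickle.dumps(Job("backup")))
   assert restored.name == "backup"
   assert restored.lock.acquire(False)
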
.. note::

   At unpickling time, some methods like :meth:`__getattr__`,
   :meth:`__getattribute__`, or :meth:`__setattr__` may be called upon the
   instance. In case those methods rely on some internal invariant being
   true, the type should implement either :meth:`__getinitargs__` or
   :meth:`__getnewargs__` to establish such an invariant; otherwise, neither
   :meth:`__new__` nor :meth:`__init__` will be called.

.. index::
   pair: copy; protocol
   single: __reduce__() (copy protocol)

As we shall see, pickle does not directly use the methods described above. In
fact, these methods are part of the copy protocol, which implements the
:meth:`__reduce__` special method. The copy protocol provides a unified
interface for retrieving the data necessary for pickling and copying
objects. [#]_

Although powerful, implementing :meth:`__reduce__` directly in your classes is
error prone. For this reason, class designers should use the high-level
interface (i.e., :meth:`__getnewargs__`, :meth:`__getstate__` and
:meth:`__setstate__`) whenever possible. We will show, however, cases where
using :meth:`__reduce__` is the only option, leads to more efficient pickling,
or both.

The interface is currently defined as follows. The :meth:`__reduce__` method
takes no arguments and must return either a string or, preferably, a tuple (the
returned object is often referred to as the "reduce value").

If a string is returned, it is interpreted as the name of a global variable.
It should be the object's local name relative to its module; the pickle module
searches the module namespace to determine the object's module. This behaviour
is typically useful for singletons.

When a tuple is returned, it must be between two and five items long. Optional
items can either be omitted, or ``None`` can be provided as their value. The
semantics of each item are, in order:

499.. XXX Mention __newobj__ special-case?
Georg Brandl116aa622007-08-15 14:28:22 +0000500
501* A callable object that will be called to create the initial version of the
Alexandre Vassalotti73b90a82008-10-29 23:32:33 +0000502 object.
Georg Brandl116aa622007-08-15 14:28:22 +0000503
Alexandre Vassalotti73b90a82008-10-29 23:32:33 +0000504* A tuple of arguments for the callable object. An empty tuple must be given if
505 the callable does not accept any argument.
Georg Brandl116aa622007-08-15 14:28:22 +0000506
507* Optionally, the object's state, which will be passed to the object's
Alexandre Vassalotti73b90a82008-10-29 23:32:33 +0000508 :meth:`__setstate__` method as previously described. If the object has no
509 such method then, the value must be a dictionary and it will be added to the
510 object's :attr:`__dict__` attribute.
Georg Brandl116aa622007-08-15 14:28:22 +0000511
Alexandre Vassalotti73b90a82008-10-29 23:32:33 +0000512* Optionally, an iterator (and not a sequence) yielding successive items. These
513 items will be appended to the object either using ``obj.append(item)`` or, in
514 batch, using ``obj.extend(list_of_items)``. This is primarily used for list
515 subclasses, but may be used by other classes as long as they have
Georg Brandl116aa622007-08-15 14:28:22 +0000516 :meth:`append` and :meth:`extend` methods with the appropriate signature.
517 (Whether :meth:`append` or :meth:`extend` is used depends on which pickle
Alexandre Vassalotti73b90a82008-10-29 23:32:33 +0000518 protocol version is used as well as the number of items to append, so both
519 must be supported.)
Georg Brandl116aa622007-08-15 14:28:22 +0000520
Alexandre Vassalotti73b90a82008-10-29 23:32:33 +0000521* Optionally, an iterator (not a sequence) yielding successive key-value pairs.
522 These items will be stored to the object using ``obj[key] = value``. This is
523 primarily used for dictionary subclasses, but may be used by other classes as
524 long as they implement :meth:`__setitem__`.
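
The tuple form can be sketched as follows. ``Tally`` is a hypothetical list
subclass invented for this illustration; it uses the first four items of the
reduce value and passes ``None`` for the state it does not need::

   import pickle

   class Tally(list):
       """Hypothetical list subclass that also carries a label."""

       def __init__(self, label):
           super().__init__()
           self.label = label

       def __reduce__(self):
           return (
               Tally,          # 1. callable that creates the initial object
               (self.label,),  # 2. arguments passed to that callable
               None,           # 3. no additional state to restore
               iter(self),     # 4. items re-added with append() or extend()
           )

   votes = Tally("votes")
   votes.extend([3, 1, 4])

   clone = pickle.loads(pickle.dumps(votes))
   assert type(clone) is Tally
   assert clone.label == "votes"
   assert list(clone) == [3, 1, 4]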

.. index:: single: __reduce_ex__() (copy protocol)

Alternatively, a :meth:`__reduce_ex__` method may be defined. The only
difference is that this method takes a single integer argument, the protocol
version. When defined, pickle will prefer it over the :meth:`__reduce__`
method. In addition, :meth:`__reduce__` automatically becomes a synonym for the
extended version. The main use for this method is to provide
backwards-compatible reduce values for older Python releases.
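
A small sketch of a protocol-dependent reduce value follows. The ``Celsius``
class and the choice to send a textual value under protocols 0 and 1 are
assumptions made purely for illustration::

   import pickle

   class Celsius:
       """Hypothetical class whose reduce value depends on the protocol."""

       def __init__(self, degrees):
           # Accept either a number or its textual representation.
           self.degrees = float(degrees)

       def __reduce_ex__(self, protocol):
           if protocol < 2:
               # Illustrative choice: pass the value as text for old protocols.
               return (Celsius, (repr(self.degrees),))
           return (Celsius, (self.degrees,))

   # The object round-trips under every protocol the interpreter supports.
   for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
       clone = pickle.loads(pickle.dumps(Celsius(21.5), protocol))
       assert clone.degrees == 21.5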

.. _pickle-persistent:

Persistence of External Objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: persistent_id (pickle protocol)
   single: persistent_load (pickle protocol)

For the benefit of object persistence, the :mod:`pickle` module supports the
notion of a reference to an object outside the pickled data stream. Such
objects are referenced by a persistent ID, which should be either a string of
alphanumeric characters (for protocol 0) [#]_ or just an arbitrary object (for
any newer protocol).

The resolution of such persistent IDs is not defined by the :mod:`pickle`
module; it delegates this resolution to the user-defined methods on the
pickler and unpickler, :meth:`persistent_id` and :meth:`persistent_load`
respectively.

To pickle objects that have an external persistent ID, the pickler must have a
custom :meth:`persistent_id` method that takes an object as an argument and
returns either ``None`` or the persistent ID for that object. When ``None`` is
returned, the pickler simply pickles the object as normal. When a persistent ID
is returned, the pickler pickles that ID in place of the object, along with a
marker so that the unpickler will recognize it as a persistent ID.

To unpickle external objects, the unpickler must have a custom
:meth:`persistent_load` method that takes a persistent ID object and returns the
referenced object.

Here is a comprehensive example showing how persistent IDs can be used to
pickle external objects by reference.

.. literalinclude:: ../includes/dbpickle.py
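
For a shorter, self-contained illustration of the same mechanism, the sketch
below keeps its "external" objects in a plain dictionary. The ``Account``
class, the ``STORE`` dictionary and the ``("Account", key)`` ID layout are all
hypothetical, and the tuple-valued ID assumes a protocol newer than 0::

   import io
   import pickle

   class Account:
       """Hypothetical object that lives in an external store."""

       def __init__(self, key, owner):
           self.key = key
           self.owner = owner

   # Stand-in for an external database, indexed by account key.
   STORE = {"acct-1": Account("acct-1", "Ada")}

   class StorePickler(pickle.Pickler):

       def persistent_id(self, obj):
           # Accounts are written as a reference; everything else is
           # pickled as usual.
           if isinstance(obj, Account):
               return ("Account", obj.key)
           return None

   class StoreUnpickler(pickle.Unpickler):

       def persistent_load(self, pid):
           kind, key = pid
           if kind != "Account":
               raise pickle.UnpicklingError("unsupported persistent object")
           return STORE[key]

   buffer = io.BytesIO()
   StorePickler(buffer).dump({"payee": STORE["acct-1"], "amount": 10})
   buffer.seek(0)
   loaded = StoreUnpickler(buffer).load()

   # The account was resolved through the store, not copied into the pickle.
   assert loaded["payee"] is STORE["acct-1"]
   assert loaded["amount"] == 10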


.. _pickle-state:

Handling Stateful Objects
^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: __getstate__() (copy protocol)
   single: __setstate__() (copy protocol)

Here's an example that shows how to modify pickling behavior for a class.
The :class:`TextReader` class opens a text file, and returns the line number and
line contents each time its :meth:`readline` method is called. If a
:class:`TextReader` instance is pickled, all attributes *except* the file object
member are saved. When the instance is unpickled, the file is reopened, and
reading resumes from the last location. The :meth:`__setstate__` and
:meth:`__getstate__` methods are used to implement this behavior. ::

   class TextReader:
       """Print and number lines in a text file."""

       def __init__(self, filename):
           self.filename = filename
           self.file = open(filename)
           self.lineno = 0

       def readline(self):
           self.lineno += 1
           line = self.file.readline()
           if not line:
               return None
           if line.endswith('\n'):
               line = line[:-1]
           return "%i: %s" % (self.lineno, line)

       def __getstate__(self):
           # Copy the object's state from self.__dict__ which contains
           # all our instance attributes. Always use the dict.copy()
           # method to avoid modifying the original state.
           state = self.__dict__.copy()
           # Remove the unpicklable entries.
           del state['file']
           return state

       def __setstate__(self, state):
           # Restore instance attributes (i.e., filename and lineno).
           self.__dict__.update(state)
           # Restore the previously opened file's state. To do so, we need to
           # reopen it and read from it until the line count is restored.
           file = open(self.filename)
           for _ in range(self.lineno):
               file.readline()
           # Finally, save the file.
           self.file = file


A sample usage might be something like this::

   >>> import pickle
   >>> reader = TextReader("hello.txt")
   >>> reader.readline()
   '1: Hello world!'
   >>> reader.readline()
   '2: I am line number two.'
   >>> new_reader = pickle.loads(pickle.dumps(reader))
   >>> new_reader.readline()
   '3: Goodbye!'


.. _pickle-restrict:

Restricting Globals
-------------------

.. index::
   single: find_class() (pickle protocol)

By default, unpickling will import any class or function that it finds in the
pickle data. For many applications, this behaviour is unacceptable as it
permits the unpickler to import and invoke arbitrary code. Just consider what
this hand-crafted pickle data stream does when loaded::

   >>> import pickle
   >>> pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
   hello world
   0

In this example, the unpickler imports the :func:`os.system` function and then
applies the string argument "echo hello world". Although this example is
inoffensive, it is not difficult to imagine one that could damage your system.

For this reason, you may want to control what gets unpickled by customizing
:meth:`Unpickler.find_class`. Despite what its name suggests,
:meth:`find_class` is called whenever a global (i.e., a class or a function) is
requested. Thus it is possible to either forbid globals completely or restrict
them to a safe subset.

Here is an example of an unpickler allowing only a few safe classes from the
:mod:`builtins` module to be loaded::

   import builtins
   import io
   import pickle

   safe_builtins = {
       'range',
       'complex',
       'set',
       'frozenset',
       'slice',
   }

   class RestrictedUnpickler(pickle.Unpickler):

       def find_class(self, module, name):
           # Only allow safe classes from builtins.
           if module == "builtins" and name in safe_builtins:
               return getattr(builtins, name)
           # Forbid everything else.
           raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                        (module, name))

   def restricted_loads(s):
       """Helper function analogous to pickle.loads()."""
       return RestrictedUnpickler(io.BytesIO(s)).load()

A sample usage of our unpickler working as intended::

   >>> restricted_loads(pickle.dumps([1, 2, range(15)]))
   [1, 2, range(0, 15)]
   >>> restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
   Traceback (most recent call last):
     ...
   pickle.UnpicklingError: global 'os.system' is forbidden
   >>> restricted_loads(b'cbuiltins\neval\n'
   ...                  b'(S\'getattr(__import__("os"), "system")'
   ...                  b'("echo hello world")\'\ntR.')
   Traceback (most recent call last):
     ...
   pickle.UnpicklingError: global 'builtins.eval' is forbidden


.. XXX Add note about how extension codes could evade our protection
   mechanism (e.g. cached classes do not invoke find_class()).

As our examples show, you have to be careful with what you allow to be
unpickled. Therefore, if security is a concern, you may want to consider
alternatives such as the marshalling API in :mod:`xmlrpc.client` or
third-party solutions.


.. _pickle-example:

Examples
--------

For the simplest code, use the :func:`dump` and :func:`load` functions. ::

   import pickle

   # An arbitrary collection of objects supported by pickle.
   data = {
       'a': [1, 2.0, 3, 4+6j],
       'b': ("character string", b"byte string"),
       'c': set([None, True, False])
   }

   with open('data.pickle', 'wb') as f:
       # Pickle the 'data' dictionary using the highest protocol available.
       pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)


The following example reads the resulting pickled data. ::

   import pickle

   with open('data.pickle', 'rb') as f:
       # The protocol version used is detected automatically, so we do not
       # have to specify it.
       data = pickle.load(f)
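
When the size of a pickle matters, two standard-library tools may help:
:func:`pickletools.optimize` rewrites a pickle without the opcodes it does not
need, and the :mod:`gzip` module compresses the resulting bytes. The following
is only a sketch; how much space is saved, if any, depends on the data being
pickled. ::

   import gzip
   import pickle
   import pickletools

   data = {
       'a': [1, 2.0, 3, 4+6j],
       'b': ("character string", b"byte string"),
   }

   raw = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)

   # Drop opcodes that are not needed to rebuild the object.
   optimized = pickletools.optimize(raw)
   assert pickle.loads(optimized) == data

   # Compress the pickle; it must be decompressed again before loading.
   compressed = gzip.compress(optimized)
   assert pickle.loads(gzip.decompress(compressed)) == data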


.. XXX: Add examples showing how to optimize pickles for size (like using
.. pickletools.optimize() or the gzip module).


.. seealso::

   Module :mod:`copyreg`
      Pickle interface constructor registration for extension types.

   Module :mod:`pickletools`
      Tools for working with and analyzing pickled data.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.


.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError` but it could be something else.

.. [#] The :mod:`copy` module uses this protocol for shallow and deep copying
   operations.

.. [#] The limitation on alphanumeric characters is due to the fact that
   persistent IDs, in protocol 0, are delimited by the newline
   character. Therefore, if any kind of newline character occurs in a
   persistent ID, the resulting pickle will become unreadable.
Alexandre Vassalotti5f3b63a2008-10-18 20:47:58 +0000786 persistent IDs, the resulting pickle will become unreadable.