:mod:`pickle` --- Python object serialization
=============================================

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.
.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>
.. sectionauthor:: Barry Warsaw <barry@zope.com>


The :mod:`pickle` module implements a fundamental but powerful algorithm for
serializing and de-serializing a Python object structure. "Pickling" is the
process whereby a Python object hierarchy is converted into a byte stream, and
"unpickling" is the inverse operation, whereby a byte stream is converted back
into an object hierarchy. Pickling (and unpickling) is alternatively known as
"serialization", "marshalling", [#]_ or "flattening"; however, to avoid
confusion, the terms used here are "pickling" and "unpickling".

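As a minimal sketch of the round trip described above, the following pickles a
small object hierarchy with :func:`dumps` and rebuilds it with :func:`loads`.
The ``settings`` dictionary is made-up sample data.

```python
import pickle

# Hypothetical sample data; any picklable object hierarchy would do.
settings = {"name": "demo", "sizes": [1, 2, 3], "ratio": 0.5}

# Pickling: object hierarchy -> byte stream.
data = pickle.dumps(settings)

# Unpickling: byte stream -> a new, equal object hierarchy.
restored = pickle.loads(data)

print(type(data).__name__)     # the stream is a bytes object
print(restored == settings)    # same contents
print(restored is settings)    # but a distinct object
```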

Relationship to other Python modules
------------------------------------

The :mod:`pickle` module has a transparent optimizer (:mod:`_pickle`) written
in C. It is used whenever available. Otherwise the pure Python implementation is
used.

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should always be the preferred way to serialize Python
objects. :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing. Recursive
  objects are objects that contain references to themselves. These are not
  handled by marshal, and in fact, attempting to marshal recursive objects will
  crash your Python interpreter. Object sharing happens when there are multiple
  references to the same object in different places in the object hierarchy being
  serialized. :mod:`pickle` stores such objects only once, and ensures that all
  other references point to the master copy. Shared objects remain shared, which
  can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances. :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module as
  when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions. Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases.

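The first point in the list above can be illustrated with a short sketch. It
pickles one recursive list and one dictionary that references the same list
twice; the variable names are illustrative only.

```python
import pickle

# A recursive object: the list contains a reference to itself.
recursive = [1, 2]
recursive.append(recursive)

# A shared object: the same mutable list is referenced twice.
shared = ["shared"]
container = {"first": shared, "second": shared}

recursive_copy = pickle.loads(pickle.dumps(recursive))
container_copy = pickle.loads(pickle.dumps(container))

# The self-reference survives the round trip.
print(recursive_copy[2] is recursive_copy)

# Both keys still point at one single list after unpickling.
print(container_copy["first"] is container_copy["second"])

# So a change made through one reference is visible through the other.
container_copy["first"].append("changed")
print(container_copy["second"])
```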

.. warning::

   The :mod:`pickle` module is not intended to be secure against erroneous or
   maliciously constructed data. Never unpickle data received from an untrusted
   or unauthenticated source.

Note that serialization is a more primitive notion than persistence; although
:mod:`pickle` reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent
access to persistent objects. The :mod:`pickle` module can transform a complex
object into a byte stream and it can transform the byte stream into an object
with the same internal structure. Perhaps the most obvious thing to do with
these byte streams is to write them onto a file, but it is also conceivable to
send them across a network or store them in a database. The module
:mod:`shelve` provides a simple interface to pickle and unpickle objects on
DBM-style database files.


Data stream format
------------------

.. index::
   single: XDR
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific. This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however, it means that non-Python
programs may not be able to reconstruct pickled Python objects.

By default, the :mod:`pickle` data format uses a compact binary representation.
The module :mod:`pickletools` contains tools for analyzing data streams
generated by :mod:`pickle`.

There are currently 4 different protocols which can be used for pickling.

* Protocol version 0 is the original human-readable protocol and is
  backwards compatible with earlier versions of Python.

* Protocol version 1 is the old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3. It provides much more
  efficient pickling of :term:`new-style class`\es.

* Protocol version 3 was added in Python 3.0. It has explicit support for
  bytes and cannot be unpickled by Python 2.x pickle modules. This is
  the current recommended protocol; use it whenever possible.

Refer to :pep:`307` for information about improvements brought by
protocol 2. See :mod:`pickletools`'s source code for extensive
comments about opcodes used by pickle protocols.

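To compare the protocols listed above, the following sketch pickles one sample
object with each of them and then disassembles one stream with
:func:`pickletools.dis`. The ``sample`` value is arbitrary, and the stream
sizes it prints may vary between Python versions.

```python
import io
import pickle
import pickletools

sample = {"name": "demo", "values": [1, 2, 3]}

# Pickle the same object with protocols 0 through 3.
streams = {proto: pickle.dumps(sample, protocol=proto) for proto in range(4)}

for proto, stream in streams.items():
    # The protocol is detected automatically when loading.
    assert pickle.loads(stream) == sample
    print(proto, len(stream))

# Protocol 0 is the human-readable one; for this sample it is plain ASCII.
print(streams[0].decode("ascii"))

# pickletools.dis() writes a symbolic listing of the opcodes in a stream.
listing = io.StringIO()
pickletools.dis(streams[2], out=listing)
print(listing.getvalue())
```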

Module Interface
----------------

To serialize an object hierarchy, you first create a pickler, then you call the
pickler's :meth:`dump` method. To de-serialize a data stream, you first create
an unpickler, then you call the unpickler's :meth:`load` method. The
:mod:`pickle` module provides the following constants:


.. data:: HIGHEST_PROTOCOL

   The highest protocol version available. This value can be passed as a
   *protocol* value.

.. data:: DEFAULT_PROTOCOL

   The default protocol used for pickling. May be less than
   :data:`HIGHEST_PROTOCOL`. Currently the default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.


The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:

.. function:: dump(obj, file[, protocol, \*, fix_imports=True])

   Write a pickled representation of *obj* to the open :term:`file object` *file*.
   This is equivalent to ``Pickler(file, protocol).dump(obj)``.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2, 3. The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

   The *file* argument must have a write() method that accepts a single bytes
   argument. It can thus be an on-disk file opened for binary writing, a
   :class:`io.BytesIO` instance, or any other custom object that meets this
   interface.

   If *fix_imports* is True and *protocol* is less than 3, pickle will try to
   map the new Python 3.x names to the old module names used in Python 2.x,
   so that the pickle data stream is readable with Python 2.x.

.. function:: dumps(obj[, protocol, \*, fix_imports=True])

   Return the pickled representation of the object as a :class:`bytes`
   object, instead of writing it to a file.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2, 3. The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

   If *fix_imports* is True and *protocol* is less than 3, pickle will try to
   map the new Python 3.x names to the old module names used in Python 2.x,
   so that the pickle data stream is readable with Python 2.x.

.. function:: load(file, [\*, fix_imports=True, encoding="ASCII", errors="strict"])

   Read a pickled object representation from the open :term:`file object` *file*
   and return the reconstituted object hierarchy specified therein. This is
   equivalent to ``Unpickler(file).load()``.

   The protocol version of the pickle is detected automatically, so no protocol
   argument is needed. Bytes past the pickled object's representation are
   ignored.

   The argument *file* must have two methods, a read() method that takes an
   integer argument, and a readline() method that requires no arguments. Both
   methods should return bytes. Thus *file* can be an on-disk file opened
   for binary reading, a :class:`io.BytesIO` object, or any other custom object
   that meets this interface.

   Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
   which are used to control compatibility support for pickle streams generated
   by Python 2.x. If *fix_imports* is True, pickle will try to map the old
   Python 2.x names to the new names used in Python 3.x. The *encoding* and
   *errors* tell pickle how to decode 8-bit string instances pickled by Python
   2.x; these default to 'ASCII' and 'strict', respectively.

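A short sketch of :func:`dump` and :func:`load` working on a stream follows.
It uses an in-memory :class:`io.BytesIO` buffer in place of an on-disk file, and
the three objects written are arbitrary sample values.

```python
import io
import pickle

buffer = io.BytesIO()          # stands in for a file opened with mode "wb"

# Each dump() call appends one complete pickle to the stream.
pickle.dump({"id": 1}, buffer)
pickle.dump(["a", "b"], buffer, 2)   # explicit protocol 2
pickle.dump("last", buffer, -1)      # negative: highest protocol available

buffer.seek(0)                 # rewind, as if reopening with mode "rb"

# Each load() call returns one object; the protocol of every pickle is
# detected automatically.
first = pickle.load(buffer)
second = pickle.load(buffer)
third = pickle.load(buffer)
print(first, second, third)
```

With a real file, the same calls apply to objects returned by
``open(path, "wb")`` and ``open(path, "rb")``.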

.. function:: loads(bytes_object, [\*, fix_imports=True, encoding="ASCII", errors="strict"])

   Read a pickled object hierarchy from a :class:`bytes` object and return the
   reconstituted object hierarchy specified therein.

   The protocol version of the pickle is detected automatically, so no protocol
   argument is needed. Bytes past the pickled object's representation are
   ignored.

   Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
   which are used to control compatibility support for pickle streams generated
   by Python 2.x. If *fix_imports* is True, pickle will try to map the old
   Python 2.x names to the new names used in Python 3.x. The *encoding* and
   *errors* tell pickle how to decode 8-bit string instances pickled by Python
   2.x; these default to 'ASCII' and 'strict', respectively.

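The two behaviours of :func:`loads` described above can be sketched as follows.
The ``py2_stream`` value is a hand-written protocol 0 stream of the kind Python
2.x would produce for an 8-bit string; it is an assumed sample, not output
captured from a real Python 2.x run.

```python
import pickle

# Bytes after the end of the pickle are ignored by loads().
stream = pickle.dumps([1, 2, 3]) + b"trailing bytes"
print(pickle.loads(stream))

# Hand-written sample: an 8-bit string holding the bytes 'caf\xe9'.
py2_stream = b"S'caf\\xe9'\n."

# The default encoding="ASCII" cannot decode the byte 0xE9 ...
try:
    pickle.loads(py2_stream)
except UnicodeDecodeError as exc:
    print("ASCII failed:", exc.reason)

# ... so the caller names the encoding the data was written in.
text = pickle.loads(py2_stream, encoding="latin-1")
print(ascii(text))
```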

The :mod:`pickle` module defines three exceptions:

.. exception:: PickleError

   Common base class for the other pickling exceptions. It inherits from
   :exc:`Exception`.

.. exception:: PicklingError

   Error raised when an unpicklable object is encountered by :class:`Pickler`.
   It inherits from :exc:`PickleError`.

   Refer to :ref:`pickle-picklable` to learn what kinds of objects can be
   pickled.

.. exception:: UnpicklingError

   Error raised when there is a problem unpickling an object, such as data
   corruption or a security violation. It inherits from :exc:`PickleError`.

   Note that other exceptions may also be raised during unpickling, including
   (but not necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
   :exc:`ImportError`, and :exc:`IndexError`.

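The exception hierarchy above suggests the following pattern for loading data
defensively. This is only a sketch: ``try_load`` is a hypothetical helper, and
which exception a damaged stream raises can differ between Python versions,
which is why several are caught.

```python
import pickle

# Both specific exceptions derive from the common base class.
print(issubclass(pickle.PicklingError, pickle.PickleError))
print(issubclass(pickle.UnpicklingError, pickle.PickleError))


def try_load(data):
    """Return the unpickled object, or None if *data* cannot be loaded.

    Hypothetical helper, for illustration only.
    """
    try:
        return pickle.loads(data)
    except pickle.UnpicklingError:
        return None
    except (AttributeError, EOFError, ImportError, IndexError):
        # Other exceptions that unpickling may raise, as noted above.
        return None


good = pickle.dumps({"answer": 42})
print(try_load(good))
print(try_load(good[:-1]))     # stream with its last byte cut off
```

Catching these exceptions guards against damaged data only; it does not make
unpickling untrusted data safe.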

The :mod:`pickle` module exports two classes, :class:`Pickler` and
:class:`Unpickler`:

.. class:: Pickler(file[, protocol, \*, fix_imports=True])

   This takes a binary file for writing a pickle data stream.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2, 3. The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

   The *file* argument must have a write() method that accepts a single bytes
   argument. It can thus be an on-disk file opened for binary writing, a
   :class:`io.BytesIO` instance, or any other custom object that meets this interface.

   If *fix_imports* is True and *protocol* is less than 3, pickle will try to
   map the new Python 3.x names to the old module names used in Python 2.x,
   so that the pickle data stream is readable with Python 2.x.

   .. method:: dump(obj)

      Write a pickled representation of *obj* to the open file object given in
      the constructor.

   .. method:: persistent_id(obj)

      Do nothing by default. This exists so a subclass can override it.

      If :meth:`persistent_id` returns ``None``, *obj* is pickled as usual. Any
      other value causes :class:`Pickler` to emit the returned value as a
      persistent ID for *obj*. The meaning of this persistent ID should be
      defined by :meth:`Unpickler.persistent_load`. Note that the value
      returned by :meth:`persistent_id` cannot itself have a persistent ID.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. attribute:: fast

      Deprecated. Enable fast mode if set to a true value. The fast mode
      disables the usage of memo, therefore speeding the pickling process by not
      generating superfluous PUT opcodes. It should not be used with
      self-referential objects; doing otherwise will cause :class:`Pickler` to
      recurse infinitely.

      Use :func:`pickletools.optimize` if you need more compact pickles.

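The following is a minimal sketch of how :meth:`Pickler.persistent_id` and its
counterpart :meth:`Unpickler.persistent_load` can cooperate. The ``Record``
class, the ``STORE`` dictionary and the ``("Record", key)`` ID format are all
assumptions made for this illustration; they are not part of the module.

```python
import io
import pickle


class Record:
    """Hypothetical object that lives in an external store."""

    def __init__(self, key, payload):
        self.key = key
        self.payload = payload


# Hypothetical stand-in for a database, keyed by record key.
STORE = {"r1": Record("r1", "first record")}


class StorePickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Records are replaced by a persistent ID; None means "pickle as usual".
        if isinstance(obj, Record):
            return ("Record", obj.key)
        return None


class StoreUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        tag, key = pid
        if tag == "Record" and key in STORE:
            return STORE[key]
        raise pickle.UnpicklingError("unsupported persistent ID: %r" % (pid,))


buffer = io.BytesIO()
StorePickler(buffer).dump(["note", STORE["r1"]])

buffer.seek(0)
restored = StoreUnpickler(buffer).load()
print(restored[0])
print(restored[1] is STORE["r1"])   # looked up in the store, not copied
```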

.. class:: Unpickler(file, [\*, fix_imports=True, encoding="ASCII", errors="strict"])

   This takes a binary file for reading a pickle data stream.

   The protocol version of the pickle is detected automatically, so no
   protocol argument is needed.

   The argument *file* must have two methods, a read() method that takes an
   integer argument, and a readline() method that requires no arguments. Both
   methods should return bytes. Thus *file* can be an on-disk file object opened
   for binary reading, a :class:`io.BytesIO` object, or any other custom object
   that meets this interface.

   Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
   which are used to control compatibility support for pickle streams generated
   by Python 2.x. If *fix_imports* is True, pickle will try to map the old
   Python 2.x names to the new names used in Python 3.x. The *encoding* and
   *errors* tell pickle how to decode 8-bit string instances pickled by Python
   2.x; these default to 'ASCII' and 'strict', respectively.

   .. method:: load()

      Read a pickled object representation from the open file object given in
      the constructor, and return the reconstituted object hierarchy specified
      therein. Bytes past the pickled object's representation are ignored.

   .. method:: persistent_load(pid)

      Raise an :exc:`UnpicklingError` by default.

      If defined, :meth:`persistent_load` should return the object specified by
      the persistent ID *pid*. If an invalid persistent ID is encountered, an
      :exc:`UnpicklingError` should be raised.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. method:: find_class(module, name)

      Import *module* if necessary and return the object called *name* from it,
      where the *module* and *name* arguments are :class:`str` objects. Note
      that, unlike its name suggests, :meth:`find_class` is also used for
      finding functions.

      Subclasses may override this to gain control over what type of objects and
      how they can be loaded, potentially reducing security risks. Refer to
      :ref:`pickle-restrict` for details.

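One way to use :meth:`Unpickler.find_class` for that purpose is sketched below.
The whitelist in ``SAFE_BUILTINS`` and the names ``RestrictedUnpickler`` and
``restricted_loads`` are choices made for this example; a real application
would pick its own policy, and a whitelist like this reduces risk rather than
removing it.

```python
import builtins
import io
import pickle

# Assumed policy for this sketch: only these built-in names may be loaded.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Allow the whitelisted builtins and refuse every other global.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))


def restricted_loads(data):
    """Like pickle.loads(), but with the restricted find_class()."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


print(restricted_loads(pickle.dumps([1, 2, range(15)])))

# A hand-written stream that asks for os.system; it is rejected before
# anything is called.
hostile = b"cos\nsystem\n(S'echo hello world'\ntR."
try:
    restricted_loads(hostile)
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```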

.. _pickle-picklable:

What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, floating point numbers, complex numbers

* strings, bytes, bytearrays

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`__dict__` or the result of calling
  :meth:`__getstate__` is picklable (see section :ref:`pickle-inst` for
  details)

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file. Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RuntimeError` will be
raised in this case. You can carefully raise this limit with
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value. This means that only the function name is
pickled, along with the name of the module the function is defined in. Neither
the function's code, nor any of its function attributes are pickled. Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [#]_

Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply. Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   import pickle

   class Foo:
       attr = 'A class attribute'

   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined in
the top level of a module.

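Pickling by reference can be observed directly. The sketch below uses
:func:`len` and :class:`collections.OrderedDict` simply because they are
importable everywhere; the exact exception raised for the lambda may depend on
the Python version, so more than one is caught.

```python
import collections
import pickle

# A built-in function and a class, both reachable by a module-level name.
function_stream = pickle.dumps(len)
class_stream = pickle.dumps(collections.OrderedDict)

# The streams hold names, not code.
print(b"len" in function_stream)
print(b"OrderedDict" in class_stream)

# Unpickling looks the names up again and returns the very same objects.
print(pickle.loads(function_stream) is len)
print(pickle.loads(class_stream) is collections.OrderedDict)

# A lambda has no importable name, so it cannot be pickled by reference.
try:
    pickle.dumps(lambda x: x)
except (pickle.PicklingError, AttributeError) as exc:
    print("not picklable:", type(exc).__name__)
```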

Similarly, when class instances are pickled, their class's code and data are not
pickled along with them. Only the instance data are pickled. This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class. If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.


.. _pickle-inst:

Pickling Class Instances
------------------------

In this section, we describe the general mechanisms available to you to define,
customize, and control how class instances are pickled and unpickled.

In most cases, no additional code is needed to make instances picklable. By
default, pickle will retrieve the class and the attributes of an instance via
introspection. When a class instance is unpickled, its :meth:`__init__` method
is usually *not* invoked. The default behaviour first creates an uninitialized
instance and then restores the saved attributes. The following code shows an
implementation of this behaviour::

   def save(obj):
       return (obj.__class__, obj.__dict__)

   def load(cls, attributes):
       obj = cls.__new__(cls)
       obj.__dict__.update(attributes)
       return obj

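That :meth:`__init__` is bypassed on unpickling can be checked with a small
sketch. The ``Tracked`` class and its ``init_calls`` counter are hypothetical
names used only here.

```python
import pickle


class Tracked:
    init_calls = 0   # class attribute counting how often __init__ runs

    def __init__(self, start):
        Tracked.init_calls += 1
        self.value = start


original = Tracked(10)
clone = pickle.loads(pickle.dumps(original))

print(clone.value)          # the saved attribute was restored
print(Tracked.init_calls)   # __init__ ran for `original` only
```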

.. index:: single: __getnewargs__() (copy protocol)

Classes can alter the default behaviour by providing one or several special
methods. In protocol 2 and newer, classes that implement the
:meth:`__getnewargs__` method can dictate the values passed to the
:meth:`__new__` method upon unpickling. This is often needed for classes
whose :meth:`__new__` method requires arguments.

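A sketch of :meth:`__getnewargs__` in use follows. ``Temperature`` is a made-up
:class:`float` subclass whose :meth:`__new__` requires two arguments, so the
default reconstruction with no arguments would fail for it.

```python
import pickle


class Temperature(float):
    """Hypothetical float subclass carrying a unit."""

    def __new__(cls, value, unit):
        self = super().__new__(cls, value)
        self.unit = unit
        return self

    def __getnewargs__(self):
        # These values are passed to __new__() when the object is unpickled.
        return (float(self), self.unit)


reading = Temperature(21.5, "C")
restored = pickle.loads(pickle.dumps(reading, protocol=2))

print(restored, restored.unit)
print(type(restored).__name__)
```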

.. index:: single: __getstate__() (copy protocol)

Classes can further influence how their instances are pickled; if the class
defines the method :meth:`__getstate__`, it is called and the returned object is
pickled as the contents for the instance, instead of the contents of the
instance's dictionary. If the :meth:`__getstate__` method is absent, the
instance's :attr:`__dict__` is pickled as usual.

.. index:: single: __setstate__() (copy protocol)

Upon unpickling, if the class defines :meth:`__setstate__`, it is called with
the unpickled state. In that case, there is no requirement for the state object
to be a dictionary. Otherwise, the pickled state must be a dictionary and its
items are assigned to the new instance's dictionary.

.. note::

   If :meth:`__getstate__` returns a false value, the :meth:`__setstate__`
   method will not be called.

Refer to the section :ref:`pickle-state` for more information about how to use
the methods :meth:`__getstate__` and :meth:`__setstate__`.

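The two methods can be combined as in the sketch below, which leaves a cache
out of the pickle and stores a version number in the state. The ``Account``
class, its attributes and the "version 1 stored ``amount``" migration rule are
all invented for this example.

```python
import pickle


class Account:
    VERSION = 2

    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance
        self.cache = {}          # derived data, not worth pickling

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["cache"]                  # leave the cache out
        state["version"] = self.VERSION     # tag the state layout
        return state

    def __setstate__(self, state):
        version = state.pop("version", 1)
        if version < 2:
            # Assumed older layout: the balance was stored as 'amount'.
            state["balance"] = state.pop("amount", 0)
        self.__dict__.update(state)
        self.cache = {}                     # rebuild what was left out


restored = pickle.loads(pickle.dumps(Account("ada", 100)))
print(restored.owner, restored.balance, restored.cache)

# Feeding an old-style state by hand exercises the conversion branch.
legacy = Account.__new__(Account)
legacy.__setstate__({"owner": "bob", "amount": 5})
print(legacy.balance)
```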
Benjamin Petersond23f8222009-04-05 19:13:16 +0000461.. note::
Georg Brandle720c0a2009-04-27 16:20:50 +0000462
Benjamin Petersond23f8222009-04-05 19:13:16 +0000463 At unpickling time, some methods like :meth:`__getattr__`,
464 :meth:`__getattribute__`, or :meth:`__setattr__` may be called upon the
465 instance. In case those methods rely on some internal invariant being
466 true, the type should implement either :meth:`__getinitargs__` or
467 :meth:`__getnewargs__` to establish such an invariant; otherwise, neither
468 :meth:`__new__` nor :meth:`__init__` will be called.
469
Christian Heimes05e8be12008-02-23 18:30:17 +0000470.. index::
Alexandre Vassalotti73b90a82008-10-29 23:32:33 +0000471 pair: copy; protocol
472 single: __reduce__() (copy protocol)
Christian Heimes05e8be12008-02-23 18:30:17 +0000473
Alexandre Vassalotti73b90a82008-10-29 23:32:33 +0000474As we shall see, pickle does not use directly the methods described above. In
475fact, these methods are part of the copy protocol which implements the
476:meth:`__reduce__` special method. The copy protocol provides a unified
477interface for retrieving the data necessary for pickling and copying
Georg Brandl48310cd2009-01-03 21:18:54 +0000478objects. [#]_
Georg Brandl116aa622007-08-15 14:28:22 +0000479
Alexandre Vassalotti73b90a82008-10-29 23:32:33 +0000480Although powerful, implementing :meth:`__reduce__` directly in your classes is
481error prone. For this reason, class designers should use the high-level
482interface (i.e., :meth:`__getnewargs__`, :meth:`__getstate__` and
Georg Brandlae2dbe22009-03-13 19:04:40 +0000483:meth:`__setstate__`) whenever possible. We will show, however, cases where using
Alexandre Vassalotti73b90a82008-10-29 23:32:33 +0000484:meth:`__reduce__` is the only option or leads to more efficient pickling or
485both.
Georg Brandl116aa622007-08-15 14:28:22 +0000486
Georg Brandlae2dbe22009-03-13 19:04:40 +0000487The interface is currently defined as follows. The :meth:`__reduce__` method
Alexandre Vassalotti73b90a82008-10-29 23:32:33 +0000488takes no argument and shall return either a string or preferably a tuple (the
Georg Brandlae2dbe22009-03-13 19:04:40 +0000489returned object is often referred to as the "reduce value").
Alexandre Vassalotti73b90a82008-10-29 23:32:33 +0000490
491If a string is returned, the string should be interpreted as the name of a
492global variable. It should be the object's local name relative to its module;
493the pickle module searches the module namespace to determine the object's
494module. This behaviour is typically useful for singletons.
495
When a tuple is returned, it must be between two and five items long. Optional
items can either be omitted, or ``None`` can be provided as their value. The
semantics of each item are, in order:

.. XXX Mention __newobj__ special-case?

* A callable object that will be called to create the initial version of the
  object.

* A tuple of arguments for the callable object. An empty tuple must be given if
  the callable does not accept any argument.

* Optionally, the object's state, which will be passed to the object's
  :meth:`__setstate__` method as previously described. If the object has no
  such method, then the value must be a dictionary and it will be added to the
  object's :attr:`__dict__` attribute.

* Optionally, an iterator (and not a sequence) yielding successive items. These
  items will be appended to the object either using ``obj.append(item)`` or, in
  batch, using ``obj.extend(list_of_items)``. This is primarily used for list
  subclasses, but may be used by other classes as long as they have
  :meth:`append` and :meth:`extend` methods with the appropriate signature.
  (Whether :meth:`append` or :meth:`extend` is used depends on which pickle
  protocol version is used as well as the number of items to append, so both
  must be supported.)

* Optionally, an iterator (not a sequence) yielding successive key-value pairs.
  These items will be stored to the object using ``obj[key] = value``. This is
  primarily used for dictionary subclasses, but may be used by other classes as
  long as they implement :meth:`__setitem__`.

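The tuple form can be sketched with a small list subclass. ``TaggedList`` is a
hypothetical class invented for this illustration; its constructor needs an
argument, so the reduce value supplies the callable, its arguments, ``None``
for the unused state slot, and an iterator over the items.

```python
import pickle


class TaggedList(list):
    """Hypothetical list subclass whose constructor requires a tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag

    def __reduce__(self):
        # Item 1: the callable that creates the initial object.
        # Item 2: the arguments for that callable.
        # Item 3: no separate state, so None is given as a placeholder.
        # Item 4: an iterator over the items to append after construction.
        return (TaggedList, (self.tag,), None, iter(self))


original = TaggedList("demo")
original.extend([1, 2, 3])

restored = pickle.loads(pickle.dumps(original))
print(type(restored).__name__, restored.tag, list(restored))
```

Passing the tag through the argument tuple keeps the sketch short; a class with
more attributes would more likely return them as the state item instead.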
.. index:: single: __reduce_ex__() (copy protocol)

Alternatively, a :meth:`__reduce_ex__` method may be defined. The only
difference is that this method should take a single integer argument, the
protocol version. When defined, pickle will prefer it over the
:meth:`__reduce__` method. In addition, :meth:`__reduce__` automatically
becomes a synonym for the extended version. The main use for this method is to
provide backwards-compatible reduce values for older Python releases.

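The following sketch shows one way a :meth:`__reduce_ex__` method might vary
its reduce value with the protocol. The ``Version`` class and the
``version_from_text`` helper are hypothetical, and the idea that older readers
prefer a text form is an assumption made only to give the branch a purpose.

```python
import pickle


class Version:
    """Hypothetical class used only for illustration."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __reduce_ex__(self, protocol):
        if protocol >= 2:
            # Newer protocols: rebuild by calling the class directly.
            return (Version, (self.major, self.minor))
        # Protocols 0 and 1: assume, for this sketch, that older readers
        # rebuild a Version from its text form.
        return (version_from_text, ("%d.%d" % (self.major, self.minor),))


def version_from_text(text):
    """Hypothetical helper that rebuilds a Version from 'major.minor'."""
    major, minor = text.split(".")
    return Version(int(major), int(minor))


v = Version(3, 1)
for protocol in (0, pickle.HIGHEST_PROTOCOL):
    clone = pickle.loads(pickle.dumps(v, protocol))
    print(protocol, clone.major, clone.minor)
```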
.. _pickle-persistent:

Persistence of External Objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: persistent_id (pickle protocol)
   single: persistent_load (pickle protocol)

For the benefit of object persistence, the :mod:`pickle` module supports the
notion of a reference to an object outside the pickled data stream. Such
objects are referenced by a persistent ID, which should be either a string of
alphanumeric characters (for protocol 0) [#]_ or just an arbitrary object (for
any newer protocol).

The resolution of such persistent IDs is not defined by the :mod:`pickle`
module; it will delegate this resolution to the user-defined methods on the
pickler and unpickler, :meth:`persistent_id` and :meth:`persistent_load`
respectively.

To pickle objects that have an external persistent ID, the pickler must have a
custom :meth:`persistent_id` method that takes an object as an argument and
returns either ``None`` or the persistent ID for that object. When ``None`` is
returned, the pickler simply pickles the object as normal. When a persistent ID
is returned, the pickler pickles that ID in place of the object, along with a
marker so that the unpickler will recognize it as a persistent ID.

To unpickle external objects, the unpickler must have a custom
:meth:`persistent_load` method that takes a persistent ID object and returns the
referenced object.

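The two hooks can be sketched in a few lines. In this illustration the
``Resource`` class, the ``REGISTRY`` dictionary standing in for an external
store, and the ``("Resource", key)`` tuple used as persistent ID are all
assumptions; the sketch relies on the default protocol, which accepts arbitrary
objects as persistent IDs.

```python
import io
import pickle


class Resource:
    """Hypothetical object that lives in an external registry."""

    def __init__(self, key):
        self.key = key


# Hypothetical external store, standing in for a database.
REGISTRY = {"r1": Resource("r1")}


class RegistryPickler(pickle.Pickler):

    def persistent_id(self, obj):
        # Store registry members by reference; returning None tells the
        # pickler to pickle everything else as usual.
        if isinstance(obj, Resource):
            return ("Resource", obj.key)
        return None


class RegistryUnpickler(pickle.Unpickler):

    def persistent_load(self, pid):
        # Resolve the persistent ID back to the external object.
        kind, key = pid
        if kind == "Resource":
            return REGISTRY[key]
        raise pickle.UnpicklingError("unsupported persistent object")


buffer = io.BytesIO()
RegistryPickler(buffer).dump(["note", REGISTRY["r1"]])
buffer.seek(0)

restored = RegistryUnpickler(buffer).load()
print(restored[0], restored[1] is REGISTRY["r1"])
```

The unpickled list refers to the object already held in the registry, so no
copy of the ``Resource`` is created.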
Here is a comprehensive example presenting how persistent IDs can be used to
pickle external objects by reference.

.. literalinclude:: ../includes/dbpickle.py


.. _pickle-state:

Handling Stateful Objects
^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: __getstate__() (copy protocol)
   single: __setstate__() (copy protocol)

Here's an example that shows how to modify pickling behavior for a class.
The :class:`TextReader` class opens a text file, and returns the line number and
line contents each time its :meth:`readline` method is called. If a
:class:`TextReader` instance is pickled, all attributes *except* the file object
member are saved. When the instance is unpickled, the file is reopened, and
reading resumes from the last location. The :meth:`__setstate__` and
:meth:`__getstate__` methods are used to implement this behavior. ::

   class TextReader:
       """Print and number lines in a text file."""

       def __init__(self, filename):
           self.filename = filename
           self.file = open(filename)
           self.lineno = 0

       def readline(self):
           self.lineno += 1
           line = self.file.readline()
           if not line:
               return None
           if line.endswith('\n'):
               line = line[:-1]
           return "%i: %s" % (self.lineno, line)

       def __getstate__(self):
           # Copy the object's state from self.__dict__ which contains
           # all our instance attributes. Always use the dict.copy()
           # method to avoid modifying the original state.
           state = self.__dict__.copy()
           # Remove the unpicklable entries.
           del state['file']
           return state

       def __setstate__(self, state):
           # Restore instance attributes (i.e., filename and lineno).
           self.__dict__.update(state)
           # Restore the previously opened file's state. To do so, we need to
           # reopen it and read from it until the line count is restored.
           file = open(self.filename)
           for _ in range(self.lineno):
               file.readline()
           # Finally, save the file.
           self.file = file


A sample usage might be something like this::

   >>> import pickle
   >>> reader = TextReader("hello.txt")
   >>> reader.readline()
   '1: Hello world!'
   >>> reader.readline()
   '2: I am line number two.'
   >>> new_reader = pickle.loads(pickle.dumps(reader))
   >>> new_reader.readline()
   '3: Goodbye!'


.. _pickle-restrict:

Restricting Globals
-------------------

.. index::
   single: find_class() (pickle protocol)

By default, unpickling will import any class or function that it finds in the
pickle data. For many applications, this behaviour is unacceptable as it
permits the unpickler to import and invoke arbitrary code. Just consider what
this hand-crafted pickle data stream does when loaded::

   >>> import pickle
   >>> pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
   hello world
   0

In this example, the unpickler imports the :func:`os.system` function and then
applies the string argument "echo hello world". Although this example is
inoffensive, it is not difficult to imagine one that could damage your system.

For this reason, you may want to control what gets unpickled by customizing
:meth:`Unpickler.find_class`. Despite what its name suggests,
:meth:`find_class` is called whenever a global (i.e., a class or a function) is
requested. Thus it is possible to either forbid globals completely or restrict
them to a safe subset.

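As a sketch of the first option, forbidding globals completely, the
hypothetical ``NoGlobalsUnpickler`` below rejects every lookup. The class and
the ``no_globals_loads`` helper are names invented for this illustration. Data
made only of basic types still loads, because such pickles never request a
global.

```python
import io
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):

    def find_class(self, module, name):
        # Reject every global, whatever module it comes from.
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                     (module, name))


def no_globals_loads(data):
    """Hypothetical helper analogous to pickle.loads()."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


# Plain containers, numbers and strings need no globals.
print(no_globals_loads(pickle.dumps({"a": [1, 2.0, "three"]})))

# A range object is pickled as a reference to builtins.range, so it is refused.
try:
    no_globals_loads(pickle.dumps(range(3)))
except pickle.UnpicklingError as exc:
    print(exc)
```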
Here is an example of an unpickler allowing only a few safe classes from the
:mod:`builtins` module to be loaded::

   import builtins
   import io
   import pickle

   safe_builtins = {
       'range',
       'complex',
       'set',
       'frozenset',
       'slice',
   }

   class RestrictedUnpickler(pickle.Unpickler):

       def find_class(self, module, name):
           # Only allow safe classes from builtins.
           if module == "builtins" and name in safe_builtins:
               return getattr(builtins, name)
           # Forbid everything else.
           raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                        (module, name))

   def restricted_loads(s):
       """Helper function analogous to pickle.loads()."""
       return RestrictedUnpickler(io.BytesIO(s)).load()

A sample usage of our unpickler working as intended::

   >>> restricted_loads(pickle.dumps([1, 2, range(15)]))
   [1, 2, range(0, 15)]
   >>> restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
   Traceback (most recent call last):
     ...
   pickle.UnpicklingError: global 'os.system' is forbidden
   >>> restricted_loads(b'cbuiltins\neval\n'
   ...                  b'(S\'getattr(__import__("os"), "system")'
   ...                  b'("echo hello world")\'\ntR.')
   Traceback (most recent call last):
     ...
   pickle.UnpicklingError: global 'builtins.eval' is forbidden


.. XXX Add note about how extension codes could evade our protection
   mechanism (e.g. cached classes do not invoke find_class()).

As our examples show, you have to be careful with what you allow to be
unpickled. Therefore, if security is a concern, you may want to consider
alternatives such as the marshalling API in :mod:`xmlrpc.client` or
third-party solutions.


.. _pickle-example:

Examples
--------

For the simplest code, use the :func:`dump` and :func:`load` functions. ::

   import pickle

   # An arbitrary collection of objects supported by pickle.
   data = {
       'a': [1, 2.0, 3, 4+6j],
       'b': ("character string", b"byte string"),
       'c': set([None, True, False])
   }

   with open('data.pickle', 'wb') as f:
       # Pickle the 'data' dictionary using the highest protocol available.
       pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)


The following example reads the resulting pickled data. ::

   import pickle

   with open('data.pickle', 'rb') as f:
       # The protocol version used is detected automatically, so we do not
       # have to specify it.
       data = pickle.load(f)


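When the pickle is only needed in memory, :func:`dumps` and :func:`loads` can
be used instead of files. The sketch below also shows one possible way to
shrink the result, using :func:`pickletools.optimize` and the :mod:`gzip`
module; this pairing is an illustration rather than a recommendation, and
whether either step actually saves space depends on the data.

```python
import gzip
import pickle
import pickletools

# The same arbitrary collection of objects as in the file-based example.
data = {
    'a': [1, 2.0, 3, 4+6j],
    'b': ("character string", b"byte string"),
    'c': set([None, True, False])
}

raw = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)
# Eliminate memo-storing opcodes that are never read back.
optimized = pickletools.optimize(raw)
# Compress the bytes; this step is independent of pickle itself.
compressed = gzip.compress(optimized)

print(len(raw), len(optimized), len(compressed))

# Reverse the steps: decompress first, then unpickle.
restored = pickle.loads(gzip.decompress(compressed))
print(restored == data)
```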
.. XXX: Add examples showing how to optimize pickles for size (like using
.. pickletools.optimize() or the gzip module).


.. seealso::

   Module :mod:`copyreg`
      Pickle interface constructor registration for extension types.

   Module :mod:`pickletools`
      Tools for working with and analyzing pickled data.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.


.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError` but it could be something else.

.. [#] The :mod:`copy` module uses this protocol for shallow and deep copying
   operations.

.. [#] The limitation on alphanumeric characters is due to the fact that
   persistent IDs, in protocol 0, are delimited by the newline
   character. Therefore, if any kind of newline character occurs in a
   persistent ID, the resulting pickle will become unreadable.