:mod:`pickle` --- Python object serialization
=============================================

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.
.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>
.. sectionauthor:: Barry Warsaw <barry@python.org>


The :mod:`pickle` module implements binary protocols for serializing and
de-serializing a Python object structure. *"Pickling"* is the process
whereby a Python object hierarchy is converted into a byte stream, and
*"unpickling"* is the inverse operation, whereby a byte stream
(from a :term:`binary file` or :term:`bytes-like object`) is converted
back into an object hierarchy. Pickling (and unpickling) is alternatively
known as "serialization", "marshalling," [#]_ or "flattening"; however, to
avoid confusion, the terms used here are "pickling" and "unpickling".

.. warning::

   The :mod:`pickle` module is not intended to be secure against erroneous or
   maliciously constructed data. Never unpickle data received from an untrusted
   or unauthenticated source.


Relationship to other Python modules
------------------------------------

Comparison with ``marshal``
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should always be the preferred way to serialize Python
objects. :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing. Recursive
  objects are objects that contain references to themselves. These are not
  handled by marshal, and in fact, attempting to marshal recursive objects will
  crash your Python interpreter. Object sharing happens when there are multiple
  references to the same object in different places in the object hierarchy being
  serialized. :mod:`pickle` stores such objects only once, and ensures that all
  other references point to the master copy. Shared objects remain shared, which
  can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances. :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module as
  when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions. Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases.

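The sharing and recursion behaviour of :mod:`pickle` described in the first
point can be checked directly. The following is a minimal sketch; the variable
names are illustrative only.

```python
import pickle

# One list referenced from two places in the same container.
shared = [1, 2]
container = {"a": shared, "b": shared}

restored = pickle.loads(pickle.dumps(container))

# The two references still point at a single list after unpickling,
# so a change made through one key is visible through the other.
restored["a"].append(3)
assert restored["a"] is restored["b"]
assert restored["b"] == [1, 2, 3]

# A recursive object: a list that contains itself.
loop = []
loop.append(loop)

restored_loop = pickle.loads(pickle.dumps(loop))
assert restored_loop[0] is restored_loop
```

The restored objects are copies: appending to ``restored["a"]`` leaves the
original ``shared`` list untouched.
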
Comparison with ``json``
^^^^^^^^^^^^^^^^^^^^^^^^

There are fundamental differences between the pickle protocols and
`JSON (JavaScript Object Notation) <http://json.org>`_:

* JSON is a text serialization format (it outputs unicode text, although
  most of the time it is then encoded to ``utf-8``), while pickle is
  a binary serialization format;

* JSON is human-readable, while pickle is not;

* JSON is interoperable and widely used outside of the Python ecosystem,
  while pickle is Python-specific;

* JSON, by default, can only represent a subset of the Python built-in
  types, and no custom classes; pickle can represent an extremely large
  number of Python types (many of them automatically, by clever usage
  of Python's introspection facilities; complex cases can be tackled by
  implementing :ref:`specific object APIs <pickle-inst>`).

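The difference in type coverage is easy to observe. The sketch below uses an
arbitrary sample value; it relies only on the default behaviour of
``json.dumps`` and ``pickle.dumps``.

```python
import json
import pickle

value = {"point": (1, 2), "tags": {"a", "b"}, "z": complex(1, 2)}

# json rejects sets and complex numbers unless a custom encoder is supplied.
try:
    json.dumps(value)
    json_ok = True
except TypeError:
    json_ok = False

# Tuples are accepted by json, but they come back as lists.
json_roundtrip = json.loads(json.dumps({"point": (1, 2)}))

# pickle restores the same types that went in.
pickle_roundtrip = pickle.loads(pickle.dumps(value))

assert json_ok is False
assert json_roundtrip == {"point": [1, 2]}
assert pickle_roundtrip == value
```
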
.. seealso::
   The :mod:`json` module: a standard library module allowing JSON
   serialization and deserialization.


.. _pickle-protocols:

Data stream format
------------------

.. index::
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific. This has the
advantage that there are no restrictions imposed by external standards such as
JSON or XDR (which can't represent pointer sharing); however, it means that
non-Python programs may not be able to reconstruct pickled Python objects.

By default, the :mod:`pickle` data format uses a relatively compact binary
representation. If you need optimal size characteristics, you can efficiently
:doc:`compress <archiving>` pickled data.

The module :mod:`pickletools` contains tools for analyzing data streams
generated by :mod:`pickle`. The :mod:`pickletools` source code has extensive
comments about opcodes used by pickle protocols.

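As an illustration of what :mod:`pickletools` offers, the following sketch
disassembles a small pickle and then shrinks it. The sample dictionary is
arbitrary, and the exact opcodes printed depend on the protocol the running
interpreter uses by default.

```python
import io
import pickle
import pickletools

data = pickle.dumps({"spam": [1, 2, 3]})

# Disassemble the stream into one line per opcode.
listing = io.StringIO()
pickletools.dis(data, out=listing)
print(listing.getvalue())

# genops() yields (opcode, argument, position) triples.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(data)]

# optimize() eliminates unused PUT opcodes; the result unpickles to the
# same value as the original stream.
smaller = pickletools.optimize(data)
assert len(smaller) <= len(data)
assert pickle.loads(smaller) == {"spam": [1, 2, 3]}
```
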
There are currently 5 different protocols which can be used for pickling.

* Protocol version 0 is the original "human-readable" protocol and is
  backwards compatible with earlier versions of Python.

* Protocol version 1 is an old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3. It provides much more
  efficient pickling of :term:`new-style class`\es. Refer to :pep:`307` for
  information about improvements brought by protocol 2.

* Protocol version 3 was added in Python 3.0. It has explicit support for
  :class:`bytes` objects and cannot be unpickled by Python 2.x. This is
  the default protocol, and the recommended protocol when compatibility with
  other Python 3 versions is required.

* Protocol version 4 was added in Python 3.4. It adds support for very large
  objects, pickling more kinds of objects, and some data format
  optimizations. Refer to :pep:`3154` for information about improvements
  brought by protocol 4.

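To see how the protocols differ in practice, the sketch below pickles one
sample value with every protocol the running interpreter supports. It reads
the upper bound from ``pickle.HIGHEST_PROTOCOL`` rather than assuming a fixed
number, since a newer interpreter may offer more protocols than are listed
here; the sizes printed will vary accordingly.

```python
import pickle

data = {"name": "spam", "values": [1, 2.5, None]}

sizes = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    payload = pickle.dumps(data, protocol)
    # loads() detects the protocol by itself.
    assert pickle.loads(payload) == data
    sizes[protocol] = len(payload)

for protocol, size in sizes.items():
    print(f"protocol {protocol}: {size} bytes")

# Protocol 2 and later start with a PROTO opcode (0x80) and the version byte.
assert pickle.dumps(data, 2)[:2] == b"\x80\x02"

# A negative protocol selects the highest one available.
assert pickle.dumps(data, -1) == pickle.dumps(data, pickle.HIGHEST_PROTOCOL)
```
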
.. note::
   Serialization is a more primitive notion than persistence; although
   :mod:`pickle` reads and writes file objects, it does not handle the issue of
   naming persistent objects, nor the (even more complicated) issue of concurrent
   access to persistent objects. The :mod:`pickle` module can transform a complex
   object into a byte stream and it can transform the byte stream into an object
   with the same internal structure. Perhaps the most obvious thing to do with
   these byte streams is to write them onto a file, but it is also conceivable to
   send them across a network or store them in a database. The :mod:`shelve`
   module provides a simple interface to pickle and unpickle objects on
   DBM-style database files.


Module Interface
----------------

To serialize an object hierarchy, you simply call the :func:`dumps` function.
Similarly, to de-serialize a data stream, you call the :func:`loads` function.
However, if you want more control over serialization and de-serialization,
you can create a :class:`Pickler` or an :class:`Unpickler` object, respectively.

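The two styles can be compared side by side. This is a minimal sketch with an
arbitrary sample record; it uses an in-memory ``io.BytesIO`` buffer in place of
a file on disk.

```python
import io
import pickle

record = {"id": 7, "tags": ("a", "b"), "payload": b"\x00\x01"}

# Function interface: object -> bytes -> object.
blob = pickle.dumps(record)
assert isinstance(blob, bytes)
assert pickle.loads(blob) == record

# Class interface: the same work, written to and read from a binary
# file object.
buffer = io.BytesIO()
pickle.Pickler(buffer).dump(record)
buffer.seek(0)
restored = pickle.Unpickler(buffer).load()
assert restored == record
```
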
The :mod:`pickle` module provides the following constants:


.. data:: HIGHEST_PROTOCOL

   An integer, the highest :ref:`protocol version <pickle-protocols>`
   available. This value can be passed as a *protocol* value to functions
   :func:`dump` and :func:`dumps` as well as the :class:`Pickler`
   constructor.

.. data:: DEFAULT_PROTOCOL

   An integer, the default :ref:`protocol version <pickle-protocols>` used
   for pickling. May be less than :data:`HIGHEST_PROTOCOL`. Currently the
   default protocol is 3, a new protocol designed for Python 3.


The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:

.. function:: dump(obj, file, protocol=None, \*, fix_imports=True)

   Write a pickled representation of *obj* to the open :term:`file object` *file*.
   This is equivalent to ``Pickler(file, protocol).dump(obj)``.

   The optional *protocol* argument tells the pickler to use the given
   protocol; supported protocols are 0, 1, 2, 3 and 4. The default protocol
   is 3, a backward-incompatible protocol designed for Python 3.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

   The *file* argument must have a write() method that accepts a single bytes
   argument. It can thus be an on-disk file opened for binary writing, a
   :class:`io.BytesIO` instance, or any other custom object that meets this
   interface.

   If *fix_imports* is true and *protocol* is less than 3, pickle will try to
   map the new Python 3 names to the old module names used in Python 2, so
   that the pickle data stream is readable with Python 2.

.. function:: dumps(obj, protocol=None, \*, fix_imports=True)

   Return the pickled representation of the object as a :class:`bytes` object,
   instead of writing it to a file.

   The optional *protocol* argument tells the pickler to use the given
   protocol; supported protocols are 0, 1, 2, 3 and 4. The default protocol
   is 3, a backward-incompatible protocol designed for Python 3.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

   If *fix_imports* is true and *protocol* is less than 3, pickle will try to
   map the new Python 3 names to the old module names used in Python 2, so
   that the pickle data stream is readable with Python 2.

.. function:: load(file, \*, fix_imports=True, encoding="ASCII", errors="strict")

   Read a pickled object representation from the open :term:`file object`
   *file* and return the reconstituted object hierarchy specified therein.
   This is equivalent to ``Unpickler(file).load()``.

   The protocol version of the pickle is detected automatically, so no
   protocol argument is needed. Bytes past the pickled object's
   representation are ignored.

   The argument *file* must have two methods, a read() method that takes an
   integer argument, and a readline() method that requires no arguments. Both
   methods should return bytes. Thus *file* can be an on-disk file opened for
   binary reading, a :class:`io.BytesIO` object, or any other custom object
   that meets this interface.

   Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
   which are used to control compatibility support for pickle streams generated
   by Python 2. If *fix_imports* is true, pickle will try to map the old
   Python 2 names to the new names used in Python 3. The *encoding* and
   *errors* tell pickle how to decode 8-bit string instances pickled by Python
   2; these default to 'ASCII' and 'strict', respectively. The *encoding* can
   be 'bytes' to read these 8-bit string instances as bytes objects.

.. function:: loads(bytes_object, \*, fix_imports=True, encoding="ASCII", errors="strict")

   Read a pickled object hierarchy from a :class:`bytes` object and return the
   reconstituted object hierarchy specified therein.

   The protocol version of the pickle is detected automatically, so no
   protocol argument is needed. Bytes past the pickled object's
   representation are ignored.

   Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
   which are used to control compatibility support for pickle streams generated
   by Python 2. If *fix_imports* is true, pickle will try to map the old
   Python 2 names to the new names used in Python 3. The *encoding* and
   *errors* tell pickle how to decode 8-bit string instances pickled by Python
   2; these default to 'ASCII' and 'strict', respectively. The *encoding* can
   be 'bytes' to read these 8-bit string instances as bytes objects.

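The following sketch exercises several details of ``dump``, ``load`` and
``loads``: reading consecutive pickles from one stream, trailing bytes being
ignored, and the *encoding* argument. The Python 2 pickle shown is written by
hand for illustration (a protocol 0 ``STRING`` record), not produced by a real
Python 2 interpreter.

```python
import io
import pickle

# dump() appends one pickle per call; load() reads them back one at a time.
stream = io.BytesIO()
pickle.dump({"first": 1}, stream)
pickle.dump(["second", 2], stream)

stream.seek(0)
first = pickle.load(stream)
second = pickle.load(stream)

# Reading past the last pickle raises EOFError.
try:
    pickle.load(stream)
    hit_eof = False
except EOFError:
    hit_eof = True

# loads() stops at the end of the first pickle and ignores what follows.
answer = pickle.loads(pickle.dumps(42) + b"trailing bytes")

# A hand-written protocol 0 pickle of the Python 2 8-bit string 'caf\xe9'.
py2_pickle = b"S'caf\\xe9'\n."
as_text = pickle.loads(py2_pickle, encoding="latin-1")
as_bytes = pickle.loads(py2_pickle, encoding="bytes")

assert (first, second, hit_eof, answer) == ({"first": 1}, ["second", 2], True, 42)
assert as_text == "caf\xe9"
assert as_bytes == b"caf\xe9"
```

With the default ``encoding="ASCII"``, the same hand-written pickle cannot be
decoded, because the byte ``0xe9`` is outside the ASCII range.
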

The :mod:`pickle` module defines three exceptions:

.. exception:: PickleError

   Common base class for the other pickling exceptions. It inherits
   :exc:`Exception`.

.. exception:: PicklingError

   Error raised when an unpicklable object is encountered by :class:`Pickler`.
   It inherits :exc:`PickleError`.

   Refer to :ref:`pickle-picklable` to learn what kinds of objects can be
   pickled.

.. exception:: UnpicklingError

   Error raised when there is a problem unpickling an object, such as data
   corruption or a security violation. It inherits :exc:`PickleError`.

   Note that other exceptions may also be raised during unpickling, including
   (but not necessarily limited to) AttributeError, EOFError, ImportError, and
   IndexError.

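A short sketch of how these exceptions surface in practice. The helper
``classify`` and the name ``unnamed`` are hypothetical and exist only for this
illustration.

```python
import pickle

def classify(action):
    """Run *action* and return the name of the pickle-related error, if any."""
    try:
        action()
    except pickle.PicklingError:
        return "PicklingError"
    except pickle.UnpicklingError:
        return "UnpicklingError"
    except EOFError:
        return "EOFError"
    return "ok"

# A module-level lambda has no importable name, so it cannot be pickled.
unnamed = lambda: None
print(classify(lambda: pickle.dumps(unnamed)))

# Bytes that do not start with a valid opcode.
print(classify(lambda: pickle.loads(b"not a pickle")))

# An empty stream ends before any object has been read.
print(classify(lambda: pickle.loads(b"")))

# Both pickle-specific errors share the PickleError base class.
assert issubclass(pickle.PicklingError, pickle.PickleError)
assert issubclass(pickle.UnpicklingError, pickle.PickleError)
```

Code that must survive arbitrary bad input therefore needs to catch more than
:exc:`PickleError` alone, as the ``EOFError`` case shows.
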

The :mod:`pickle` module exports two classes, :class:`Pickler` and
:class:`Unpickler`:

.. class:: Pickler(file, protocol=None, \*, fix_imports=True)

   This takes a binary file for writing a pickle data stream.

   The optional *protocol* argument tells the pickler to use the given
   protocol; supported protocols are 0, 1, 2, 3 and 4. The default protocol
   is 3, a backward-incompatible protocol designed for Python 3.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

   The *file* argument must have a write() method that accepts a single bytes
   argument. It can thus be an on-disk file opened for binary writing, a
   :class:`io.BytesIO` instance, or any other custom object that meets this
   interface.

   If *fix_imports* is true and *protocol* is less than 3, pickle will try to
   map the new Python 3 names to the old module names used in Python 2, so
   that the pickle data stream is readable with Python 2.

   .. method:: dump(obj)

      Write a pickled representation of *obj* to the open file object given in
      the constructor.

   .. method:: persistent_id(obj)

      Do nothing by default. This exists so a subclass can override it.

      If :meth:`persistent_id` returns ``None``, *obj* is pickled as usual. Any
      other value causes :class:`Pickler` to emit the returned value as a
      persistent ID for *obj*. The meaning of this persistent ID should be
      defined by :meth:`Unpickler.persistent_load`. Note that the value
      returned by :meth:`persistent_id` cannot itself have a persistent ID.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. attribute:: dispatch_table

      A pickler object's dispatch table is a registry of *reduction
      functions* of the kind which can be declared using
      :func:`copyreg.pickle`. It is a mapping whose keys are classes
      and whose values are reduction functions. A reduction function
      takes a single argument of the associated class and should
      conform to the same interface as a :meth:`__reduce__`
      method.

      By default, a pickler object will not have a
      :attr:`dispatch_table` attribute, and it will instead use the
      global dispatch table managed by the :mod:`copyreg` module.
      However, to customize the pickling for a specific pickler object
      one can set the :attr:`dispatch_table` attribute to a dict-like
      object. Alternatively, if a subclass of :class:`Pickler` has a
      :attr:`dispatch_table` attribute then this will be used as the
      default dispatch table for instances of that class.

      See :ref:`pickle-dispatch` for usage examples.

      .. versionadded:: 3.3

   .. attribute:: fast

      Deprecated. Enable fast mode if set to a true value. The fast mode
      disables the usage of memo, therefore speeding the pickling process by not
      generating superfluous PUT opcodes. It should not be used with
      self-referential objects; doing otherwise will cause :class:`Pickler` to
      recurse infinitely.

      Use :func:`pickletools.optimize` if you need more compact pickles.

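As a rough illustration of the persistent ID hook, the sketch below pairs a
:class:`Pickler` subclass that overrides ``persistent_id`` with an
:class:`Unpickler` subclass that overrides ``persistent_load``. The
``REGISTRY`` dictionary and both class names are hypothetical; they stand in
for whatever external store an application actually uses.

```python
import io
import pickle

# Hypothetical registry of objects that live outside the pickle.
REGISTRY = {"config": {"debug": True}, "palette": ["red", "green"]}

class RegistryPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Emit the registry key instead of the object's contents.
        for key, value in REGISTRY.items():
            if obj is value:
                return key
        return None  # pickle everything else as usual

class RegistryUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        try:
            return REGISTRY[pid]
        except KeyError:
            raise pickle.UnpicklingError(f"unknown persistent ID {pid!r}")

document = {"title": "report", "settings": REGISTRY["config"]}

buffer = io.BytesIO()
RegistryPickler(buffer).dump(document)

buffer.seek(0)
restored = RegistryUnpickler(buffer).load()

# The registry object is referenced by key, not copied into the stream.
assert restored["settings"] is REGISTRY["config"]
assert b"debug" not in buffer.getvalue()
```
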

.. class:: Unpickler(file, \*, fix_imports=True, encoding="ASCII", errors="strict")

   This takes a binary file for reading a pickle data stream.

   The protocol version of the pickle is detected automatically, so no
   protocol argument is needed.

   The argument *file* must have two methods, a read() method that takes an
   integer argument, and a readline() method that requires no arguments. Both
   methods should return bytes. Thus *file* can be an on-disk file object
   opened for binary reading, a :class:`io.BytesIO` object, or any other
   custom object that meets this interface.

   Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
   which are used to control compatibility support for pickle streams generated
   by Python 2. If *fix_imports* is true, pickle will try to map the old
   Python 2 names to the new names used in Python 3. The *encoding* and
   *errors* tell pickle how to decode 8-bit string instances pickled by Python
   2; these default to 'ASCII' and 'strict', respectively. The *encoding* can
   be 'bytes' to read these 8-bit string instances as bytes objects.

   .. method:: load()

      Read a pickled object representation from the open file object given in
      the constructor, and return the reconstituted object hierarchy specified
      therein. Bytes past the pickled object's representation are ignored.

   .. method:: persistent_load(pid)

      Raise an :exc:`UnpicklingError` by default.

      If defined, :meth:`persistent_load` should return the object specified by
      the persistent ID *pid*. If an invalid persistent ID is encountered, an
      :exc:`UnpicklingError` should be raised.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. method:: find_class(module, name)

      Import *module* if necessary and return the object called *name* from it,
      where the *module* and *name* arguments are :class:`str` objects. Note,
      unlike its name suggests, :meth:`find_class` is also used for finding
      functions.

      Subclasses may override this to gain control over what type of objects and
      how they can be loaded, potentially reducing security risks. Refer to
      :ref:`pickle-restrict` for details.

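One way to use ``find_class`` is an allow-list. The sketch below is an
illustration under stated assumptions, not a complete security measure: the
set ``SAFE_BUILTINS`` and the names ``RestrictedUnpickler`` and
``restricted_loads`` are chosen for this example, and which names are safe to
allow depends on the application.

```python
import builtins
import io
import pickle

# Assumption for this sketch: only these built-in names are considered safe.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Resolve a global only if it is on the allow-list.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    """Like pickle.loads(), but only resolves names on the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

# range is on the allow-list, so this pickle loads normally.
allowed = restricted_loads(pickle.dumps([1, 2, range(15)]))

# print is a global that is not on the allow-list.
try:
    restricted_loads(pickle.dumps(print))
    blocked = False
except pickle.UnpicklingError:
    blocked = True

assert allowed == [1, 2, range(15)]
assert blocked
```
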

.. _pickle-picklable:

What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, floating point numbers, complex numbers

* strings, bytes, bytearrays

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module (using :keyword:`def`, not
  :keyword:`lambda`)

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`~object.__dict__` or the result of
  calling :meth:`__getstate__` is picklable (see section :ref:`pickle-inst` for
  details).

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file. Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RuntimeError` will be
raised in this case. You can carefully raise this limit with
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value. [#]_ This means that only the function name is
pickled, along with the name of the module the function is defined in. Neither
the function's code, nor any of its function attributes are pickled. Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [#]_

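Pickling by reference can be observed with functions from the standard
library. The sketch below uses ``math.sqrt`` (a built-in function) and
``json.dumps`` (a function defined with ``def`` at the top level of its
module); any other importable top-level function would serve equally well.

```python
import json
import math
import pickle

for func in (math.sqrt, json.dumps):
    payload = pickle.dumps(func)

    # Unpickling imports the module and looks the name up again, so the
    # result is the very same function object.
    assert pickle.loads(payload) is func

    # The stream carries the module and function names, not the code.
    assert func.__module__.encode() in payload
    assert func.__name__.encode() in payload
```
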
Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply. Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   import pickle

   class Foo:
       attr = 'A class attribute'

   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined in
the top level of a module.

Similarly, when class instances are pickled, their class's code and data are not
pickled along with them. Only the instance data are pickled. This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class. If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.

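The version-number idea can be sketched as follows. Everything here is
hypothetical: the ``Account`` class, its attributes and the ``version`` key are
invented for the example. The sketch also assumes it is run as a script, so
that redefining ``Account`` in the same module stands in for shipping a newer
release of the class.

```python
import pickle

class Account:
    """Version 1: stores the balance as a float."""
    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance

old_pickle = pickle.dumps(Account("ann", 12.5))

class Account:
    """Version 2: stores integer cents and stamps a version number."""
    VERSION = 2

    def __init__(self, owner, balance_cents):
        self.owner = owner
        self.balance_cents = balance_cents

    def __getstate__(self):
        state = self.__dict__.copy()
        state["version"] = self.VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        # Pickles written before the stamp existed count as version 1.
        version = state.pop("version", 1)
        if version < 2:
            state["balance_cents"] = round(state.pop("balance") * 100)
        self.__dict__.update(state)

# The old pickle names the class only by reference, so it is rebuilt
# with the current definition and converted by __setstate__().
upgraded = pickle.loads(old_pickle)
assert upgraded.balance_cents == 1250
assert not hasattr(upgraded, "balance")
```
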
473
Georg Brandl116aa622007-08-15 14:28:22 +0000474.. _pickle-inst:
475
Alexandre Vassalotti73b90a82008-10-29 23:32:33 +0000476Pickling Class Instances
477------------------------
Georg Brandl116aa622007-08-15 14:28:22 +0000478
Serhiy Storchaka5bbbc942013-10-14 10:43:46 +0300479.. currentmodule:: None
480
Alexandre Vassalotti73b90a82008-10-29 23:32:33 +0000481In this section, we describe the general mechanisms available to you to define,
482customize, and control how class instances are pickled and unpickled.
Georg Brandl116aa622007-08-15 14:28:22 +0000483
In most cases, no additional code is needed to make instances picklable. By
default, pickle will retrieve the class and the attributes of an instance via
introspection. When a class instance is unpickled, its :meth:`__init__` method
is usually *not* invoked. The default behaviour first creates an uninitialized
instance and then restores the saved attributes. The following code shows an
implementation of this behaviour::

   def save(obj):
       return (obj.__class__, obj.__dict__)

   def load(cls, attributes):
       obj = cls.__new__(cls)
       obj.__dict__.update(attributes)
       return obj

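One way to observe that :meth:`__init__` is skipped on unpickling is to count
its calls. The following is a minimal sketch; the ``Tracker`` class and its
``init_calls`` counter are hypothetical names used only for this
demonstration::

   import pickle

   class Tracker:
       init_calls = 0  # class-level counter of __init__ invocations

       def __init__(self, name):
           Tracker.init_calls += 1
           self.name = name

   original = Tracker('a')
   clone = pickle.loads(pickle.dumps(original))

   # The attributes were restored, but __init__ ran only for ``original``.
   print(clone.name, Tracker.init_calls)
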
Classes can alter the default behaviour by providing one or several special
methods:

.. method:: object.__getnewargs_ex__()

   In protocols 4 and newer, classes that implement the
   :meth:`__getnewargs_ex__` method can dictate the values passed to the
   :meth:`__new__` method upon unpickling. The method must return a pair
   ``(args, kwargs)`` where *args* is a tuple of positional arguments
   and *kwargs* a dictionary of named arguments for constructing the
   object. Those will be passed to the :meth:`__new__` method upon
   unpickling.

   You should implement this method if the :meth:`__new__` method of your
   class requires keyword-only arguments. Otherwise, it is recommended for
   compatibility to implement :meth:`__getnewargs__`.


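   As an illustration, here is a minimal sketch of a class whose
   :meth:`__new__` requires a keyword-only argument. The ``Money`` class and
   its attributes are hypothetical; the example assumes a Python version that
   supports protocol 4::

      import pickle

      class Money:
          def __new__(cls, amount, *, currency):
              self = super().__new__(cls)
              self.amount = amount
              self.currency = currency
              return self

          def __getnewargs_ex__(self):
              # Positional arguments first, then keyword arguments.
              return (self.amount,), {'currency': self.currency}

      restored = pickle.loads(pickle.dumps(Money(5, currency='EUR'), protocol=4))
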
.. method:: object.__getnewargs__()

   This method serves a similar purpose to :meth:`__getnewargs_ex__`, but
   for protocols 2 and newer. It must return a tuple of arguments *args*
   which will be passed to the :meth:`__new__` method upon unpickling.

   In protocols 4 and newer, :meth:`__getnewargs__` will not be called if
   :meth:`__getnewargs_ex__` is defined.


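   A minimal sketch, assuming a hypothetical ``Point`` class derived from
   :class:`tuple` whose :meth:`__new__` takes two positional arguments::

      import pickle

      class Point(tuple):
          def __new__(cls, x, y):
              return super().__new__(cls, (x, y))

          def __getnewargs__(self):
              # These values are passed to __new__ when unpickling.
              return (self[0], self[1])

      restored = pickle.loads(pickle.dumps(Point(1, 2), protocol=2))
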
.. method:: object.__getstate__()

   Classes can further influence how their instances are pickled; if the class
   defines the method :meth:`__getstate__`, it is called and the returned object
   is pickled as the contents for the instance, instead of the contents of the
   instance's dictionary. If the :meth:`__getstate__` method is absent, the
   instance's :attr:`~object.__dict__` is pickled as usual.


.. method:: object.__setstate__(state)

   Upon unpickling, if the class defines :meth:`__setstate__`, it is called with
   the unpickled state. In that case, there is no requirement for the state
   object to be a dictionary. Otherwise, the pickled state must be a dictionary
   and its items are assigned to the new instance's dictionary.

   .. note::

      If :meth:`__getstate__` returns a false value, the :meth:`__setstate__`
      method will not be called upon unpickling.


Refer to the section :ref:`pickle-state` for more information about how to use
the methods :meth:`__getstate__` and :meth:`__setstate__`.

.. note::

   At unpickling time, some methods like :meth:`__getattr__`,
   :meth:`__getattribute__`, or :meth:`__setattr__` may be called upon the
   instance. In case those methods rely on some internal invariant being
   true, the type should implement :meth:`__getnewargs__` or
   :meth:`__getnewargs_ex__` to establish such an invariant; otherwise,
   neither :meth:`__new__` nor :meth:`__init__` will be called.

.. index:: pair: copy; protocol

As we shall see, pickle does not directly use the methods described above. In
fact, these methods are part of the copy protocol which implements the
:meth:`__reduce__` special method. The copy protocol provides a unified
interface for retrieving the data necessary for pickling and copying
objects. [#]_

Although powerful, implementing :meth:`__reduce__` directly in your classes is
error-prone. For this reason, class designers should use the high-level
interface (i.e., :meth:`__getnewargs_ex__`, :meth:`__getstate__` and
:meth:`__setstate__`) whenever possible. We will show, however, cases where
using :meth:`__reduce__` is the only option or leads to more efficient pickling
or both.

.. method:: object.__reduce__()

   The interface is currently defined as follows. The :meth:`__reduce__` method
   takes no argument and shall return either a string or preferably a tuple (the
   returned object is often referred to as the "reduce value").

   If a string is returned, the string should be interpreted as the name of a
   global variable. It should be the object's local name relative to its
   module; the pickle module searches the module namespace to determine the
   object's module. This behaviour is typically useful for singletons.

   When a tuple is returned, it must be between two and five items long.
   Optional items can either be omitted, or ``None`` can be provided as their
   value. The semantics of each item are, in order:

   .. XXX Mention __newobj__ special-case?

   * A callable object that will be called to create the initial version of the
     object.

   * A tuple of arguments for the callable object. An empty tuple must be given
     if the callable does not accept any argument.

   * Optionally, the object's state, which will be passed to the object's
     :meth:`__setstate__` method as previously described. If the object has no
     such method, then the value must be a dictionary and it will be added to
     the object's :attr:`~object.__dict__` attribute.

   * Optionally, an iterator (and not a sequence) yielding successive items.
     These items will be appended to the object either using
     ``obj.append(item)`` or, in batch, using ``obj.extend(list_of_items)``.
     This is primarily used for list subclasses, but may be used by other
     classes as long as they have :meth:`append` and :meth:`extend` methods with
     the appropriate signature. (Whether :meth:`append` or :meth:`extend` is
     used depends on which pickle protocol version is used as well as the number
     of items to append, so both must be supported.)

   * Optionally, an iterator (not a sequence) yielding successive key-value
     pairs. These items will be stored to the object using ``obj[key] =
     value``. This is primarily used for dictionary subclasses, but may be used
     by other classes as long as they implement :meth:`__setitem__`.


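   The tuple form can be sketched with a small container class. The
   ``Playlist`` class below is hypothetical; it returns a four-item reduce
   value that uses the callable, its arguments, no explicit state (``None``),
   and an iterator of items::

      import pickle

      class Playlist:
          def __init__(self, name):
              self.name = name
              self.tracks = []

          # Both methods are provided because either may be used on unpickling.
          def append(self, track):
              self.tracks.append(track)

          def extend(self, tracks):
              self.tracks.extend(tracks)

          def __reduce__(self):
              # (callable, args, state, list items iterator)
              return (Playlist, (self.name,), None, iter(self.tracks))

      mix = Playlist('mix')
      mix.extend(['intro', 'outro'])
      restored = pickle.loads(pickle.dumps(mix))

   Because the callable here is the class itself, ``Playlist.__init__`` runs
   on unpickling, and the tracks are then added back through
   :meth:`append` or :meth:`extend`.
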
.. method:: object.__reduce_ex__(protocol)

   Alternatively, a :meth:`__reduce_ex__` method may be defined. The only
   difference is that this method should take a single integer argument, the
   protocol version. When defined, pickle will prefer it over the
   :meth:`__reduce__` method. In addition, :meth:`__reduce__` automatically
   becomes a synonym for the extended version. The main use for this method is
   to provide backwards-compatible reduce values for older Python releases.

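   A minimal sketch of a protocol-dependent reduce value follows. The
   ``ByteSamples`` class is hypothetical, and the choice of representation per
   protocol is only an assumption made for the example: a compact
   :class:`bytes` object for protocol 3 and newer, and a plain list
   otherwise::

      import pickle

      class ByteSamples:
          def __init__(self, values):
              # Accept any iterable of small integers, including bytes.
              self.values = list(values)

          def __reduce_ex__(self, protocol):
              if protocol >= 3:
                  return (ByteSamples, (bytes(self.values),))
              return (ByteSamples, (self.values,))

      samples = ByteSamples([1, 2, 3])
      restored = {proto: pickle.loads(pickle.dumps(samples, proto)).values
                  for proto in (0, 2, 4)}
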
.. currentmodule:: pickle

.. _pickle-persistent:

Persistence of External Objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: persistent_id (pickle protocol)
   single: persistent_load (pickle protocol)

For the benefit of object persistence, the :mod:`pickle` module supports the
notion of a reference to an object outside the pickled data stream. Such
objects are referenced by a persistent ID, which should be either a string of
alphanumeric characters (for protocol 0) [#]_ or just an arbitrary object (for
any newer protocol).

The resolution of such persistent IDs is not defined by the :mod:`pickle`
module; it will delegate this resolution to the user-defined methods on the
pickler and unpickler, :meth:`~Pickler.persistent_id` and
:meth:`~Unpickler.persistent_load` respectively.

To pickle objects that have an external persistent ID, the pickler must have a
custom :meth:`~Pickler.persistent_id` method that takes an object as an
argument and returns either ``None`` or the persistent ID for that object.
When ``None`` is returned, the pickler simply pickles the object as normal.
When a persistent ID is returned, the pickler will pickle that ID in place of
the object, along with a marker so that the unpickler will recognize it as a
persistent ID.

To unpickle external objects, the unpickler must have a custom
:meth:`~Unpickler.persistent_load` method that takes a persistent ID object and
returns the referenced object.

Here is a comprehensive example presenting how persistent IDs can be used to
pickle external objects by reference.

.. literalinclude:: ../includes/dbpickle.py

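The same mechanism can also be sketched entirely in memory. In the following
illustration, the ``User`` class, the ``REGISTRY`` dictionary standing in for
an external store, and the ``('User', key)`` form of the persistent ID are all
assumptions made for the example::

   import io
   import pickle

   class User:
       def __init__(self, key, name):
           self.key = key
           self.name = name

   # Hypothetical external store: objects here are pickled by reference.
   REGISTRY = {'u1': User('u1', 'Ada')}

   class RegistryPickler(pickle.Pickler):
       def persistent_id(self, obj):
           if isinstance(obj, User):
               return ('User', obj.key)
           # Everything else is pickled as usual.
           return None

   class RegistryUnpickler(pickle.Unpickler):
       def persistent_load(self, pid):
           tag, key = pid
           if tag == 'User':
               return REGISTRY[key]
           raise pickle.UnpicklingError("unsupported persistent object")

   buf = io.BytesIO()
   RegistryPickler(buf, protocol=2).dump({'owner': REGISTRY['u1']})
   buf.seek(0)
   restored = RegistryUnpickler(buf).load()

After loading, ``restored['owner']`` is the very object held in ``REGISTRY``
rather than a copy, since only its persistent ID went into the stream.
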
.. _pickle-dispatch:

Dispatch Tables
^^^^^^^^^^^^^^^

If one wants to customize pickling of some classes without disturbing
any other code which depends on pickling, then one can create a
pickler with a private dispatch table.

The global dispatch table managed by the :mod:`copyreg` module is
available as :data:`copyreg.dispatch_table`. Therefore, one may
choose to use a modified copy of :data:`copyreg.dispatch_table` as a
private dispatch table.

For example ::

   f = io.BytesIO()
   p = pickle.Pickler(f)
   p.dispatch_table = copyreg.dispatch_table.copy()
   p.dispatch_table[SomeClass] = reduce_SomeClass

creates an instance of :class:`pickle.Pickler` with a private dispatch
table which handles the ``SomeClass`` class specially. Alternatively,
the code ::

   class MyPickler(pickle.Pickler):
       dispatch_table = copyreg.dispatch_table.copy()
       dispatch_table[SomeClass] = reduce_SomeClass
   f = io.BytesIO()
   p = MyPickler(f)

does the same, but all instances of ``MyPickler`` will by default
share the same dispatch table. The equivalent code using the
:mod:`copyreg` module is ::

   copyreg.pickle(SomeClass, reduce_SomeClass)
   f = io.BytesIO()
   p = pickle.Pickler(f)

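To make the fragments above concrete, here is a runnable sketch of the first
variant. The ``Celsius`` class and the ``reduce_celsius`` function are
hypothetical stand-ins for ``SomeClass`` and ``reduce_SomeClass``; the
reduction function rounds the value so that its effect is visible::

   import copyreg
   import io
   import pickle

   class Celsius:
       def __init__(self, degrees):
           self.degrees = degrees

   def reduce_celsius(obj):
       # Same return convention as __reduce__(): (callable, args).
       return (Celsius, (round(obj.degrees, 1),))

   buf = io.BytesIO()
   p = pickle.Pickler(buf)
   p.dispatch_table = copyreg.dispatch_table.copy()
   p.dispatch_table[Celsius] = reduce_celsius
   p.dump(Celsius(21.456))

   # Only the pickler with the private table applies the custom reduction.
   via_private_table = pickle.loads(buf.getvalue())
   via_default = pickle.loads(pickle.dumps(Celsius(21.456)))

Code that pickles ``Celsius`` instances with plain :func:`pickle.dumps` is
unaffected, because the global :data:`copyreg.dispatch_table` was only copied,
not modified.
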
.. _pickle-state:

Handling Stateful Objects
^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: __getstate__() (copy protocol)
   single: __setstate__() (copy protocol)

Here's an example that shows how to modify pickling behavior for a class.
The :class:`TextReader` class opens a text file, and returns the line number and
line contents each time its :meth:`!readline` method is called. If a
:class:`TextReader` instance is pickled, all attributes *except* the file object
member are saved. When the instance is unpickled, the file is reopened, and
reading resumes from the last location. The :meth:`__setstate__` and
:meth:`__getstate__` methods are used to implement this behavior. ::

   class TextReader:
       """Print and number lines in a text file."""

       def __init__(self, filename):
           self.filename = filename
           self.file = open(filename)
           self.lineno = 0

       def readline(self):
           self.lineno += 1
           line = self.file.readline()
           if not line:
               return None
           if line.endswith('\n'):
               line = line[:-1]
           return "%i: %s" % (self.lineno, line)

       def __getstate__(self):
           # Copy the object's state from self.__dict__ which contains
           # all our instance attributes. Always use the dict.copy()
           # method to avoid modifying the original state.
           state = self.__dict__.copy()
           # Remove the unpicklable entries.
           del state['file']
           return state

       def __setstate__(self, state):
           # Restore instance attributes (i.e., filename and lineno).
           self.__dict__.update(state)
           # Restore the previously opened file's state. To do so, we need to
           # reopen it and read from it until the line count is restored.
           file = open(self.filename)
           for _ in range(self.lineno):
               file.readline()
           # Finally, save the file.
           self.file = file


A sample usage might be something like this::

   >>> reader = TextReader("hello.txt")
   >>> reader.readline()
   '1: Hello world!'
   >>> reader.readline()
   '2: I am line number two.'
   >>> new_reader = pickle.loads(pickle.dumps(reader))
   >>> new_reader.readline()
   '3: Goodbye!'


.. _pickle-restrict:

Restricting Globals
-------------------

.. index::
   single: find_class() (pickle protocol)

By default, unpickling will import any class or function that it finds in the
pickle data. For many applications, this behaviour is unacceptable as it
permits the unpickler to import and invoke arbitrary code. Just consider what
this hand-crafted pickle data stream does when loaded::

   >>> import pickle
   >>> pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
   hello world
   0

In this example, the unpickler imports the :func:`os.system` function and then
applies it to the string argument "echo hello world". Although this example is
inoffensive, it is not difficult to imagine one that could damage your system.

For this reason, you may want to control what gets unpickled by customizing
:meth:`Unpickler.find_class`. Contrary to what its name suggests,
:meth:`Unpickler.find_class` is called whenever a global (i.e., a class or
a function) is requested. Thus it is possible to either completely forbid
globals or restrict them to a safe subset.

Here is an example of an unpickler allowing only a few safe classes from the
:mod:`builtins` module to be loaded::

   import builtins
   import io
   import pickle

   safe_builtins = {
       'range',
       'complex',
       'set',
       'frozenset',
       'slice',
   }

   class RestrictedUnpickler(pickle.Unpickler):

       def find_class(self, module, name):
           # Only allow safe classes from builtins.
           if module == "builtins" and name in safe_builtins:
               return getattr(builtins, name)
           # Forbid everything else.
           raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                        (module, name))

   def restricted_loads(s):
       """Helper function analogous to pickle.loads()."""
       return RestrictedUnpickler(io.BytesIO(s)).load()

A sample usage of our unpickler working as intended::

   >>> restricted_loads(pickle.dumps([1, 2, range(15)]))
   [1, 2, range(0, 15)]
   >>> restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
   Traceback (most recent call last):
     ...
   pickle.UnpicklingError: global 'os.system' is forbidden
   >>> restricted_loads(b'cbuiltins\neval\n'
   ...                  b'(S\'getattr(__import__("os"), "system")'
   ...                  b'("echo hello world")\'\ntR.')
   Traceback (most recent call last):
     ...
   pickle.UnpicklingError: global 'builtins.eval' is forbidden


.. XXX Add note about how extension codes could evade our protection
   mechanism (e.g. cached classes do not invoke find_class()).

As our examples show, you have to be careful with what you allow to be
unpickled. Therefore if security is a concern, you may want to consider
alternatives such as the marshalling API in :mod:`xmlrpc.client` or
third-party solutions.


Performance
-----------

Recent versions of the pickle protocol (from protocol 2 and upwards) feature
efficient binary encodings for several common features and built-in types.
Also, the :mod:`pickle` module has a transparent optimizer written in C.


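The effect of the binary encodings can be observed by comparing the size of
the same object pickled with each available protocol. This is only a rough
sketch; the sample ``data`` is arbitrary, and the exact sizes depend on the
Python version::

   import pickle

   # Arbitrary sample data: many small integers and one string.
   data = {'values': list(range(1000)), 'label': 'x' * 100}

   # Map each protocol number to the length of the resulting pickle.
   sizes = {proto: len(pickle.dumps(data, proto))
            for proto in range(pickle.HIGHEST_PROTOCOL + 1)}

   for proto, size in sorted(sizes.items()):
       print("protocol %d: %d bytes" % (proto, size))

For data like this, the text-based protocol 0 produces a noticeably larger
pickle than protocol 2 and newer.
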
.. _pickle-example:

Examples
--------

For the simplest code, use the :func:`dump` and :func:`load` functions. ::

   import pickle

   # An arbitrary collection of objects supported by pickle.
   data = {
       'a': [1, 2.0, 3, 4+6j],
       'b': ("character string", b"byte string"),
       'c': set([None, True, False])
   }

   with open('data.pickle', 'wb') as f:
       # Pickle the 'data' dictionary using the highest protocol available.
       pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)


The following example reads the resulting pickled data. ::

   import pickle

   with open('data.pickle', 'rb') as f:
       # The protocol version used is detected automatically, so we do not
       # have to specify it.
       data = pickle.load(f)


.. XXX: Add examples showing how to optimize pickles for size (like using
.. pickletools.optimize() or the gzip module).


.. seealso::

   Module :mod:`copyreg`
      Pickle interface constructor registration for extension types.

   Module :mod:`pickletools`
      Tools for working with and analyzing pickled data.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.


.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module.

.. [#] This is why :keyword:`lambda` functions cannot be pickled: all
   :keyword:`lambda` functions share the same name: ``<lambda>``.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError` but it could be something else.

.. [#] The :mod:`copy` module uses this protocol for shallow and deep copying
   operations.

.. [#] The limitation on alphanumeric characters is due to the fact
   that persistent IDs, in protocol 0, are delimited by the newline
   character. Therefore if any kind of newline character occurs in
   persistent IDs, the resulting pickle will become unreadable.