:mod:`pickle` --- Python object serialization
=============================================

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.
.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>
.. sectionauthor:: Barry Warsaw <barry@python.org>


The :mod:`pickle` module implements binary protocols for serializing and
de-serializing a Python object structure. *"Pickling"* is the process
whereby a Python object hierarchy is converted into a byte stream, and
*"unpickling"* is the inverse operation, whereby a byte stream
(from a :term:`binary file` or :term:`bytes-like object`) is converted
back into an object hierarchy. Pickling (and unpickling) is alternatively
known as "serialization", "marshalling," [#]_ or "flattening"; however, to
avoid confusion, the terms used here are "pickling" and "unpickling".

.. warning::

   The :mod:`pickle` module is not secure against erroneous or maliciously
   constructed data. Never unpickle data received from an untrusted or
   unauthenticated source.


Relationship to other Python modules
------------------------------------

Comparison with ``marshal``
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should be the preferred way to serialize Python
objects. :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing. Recursive
  objects are objects that contain references to themselves. These are not
  handled by :mod:`marshal`, and in fact, attempting to marshal recursive objects
  will crash your Python interpreter. Object sharing happens when there are
  multiple references to the same object in different places in the object
  hierarchy being serialized. :mod:`pickle` stores such objects only once, and
  ensures that all other references point to the master copy. Shared objects
  remain shared, which can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances. :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module
  as when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions. Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases.

Comparison with ``json``
^^^^^^^^^^^^^^^^^^^^^^^^

There are fundamental differences between the pickle protocols and
`JSON (JavaScript Object Notation) <http://json.org>`_:

* JSON is a text serialization format (it outputs unicode text, although
  most of the time it is then encoded to ``utf-8``), while pickle is
  a binary serialization format;

* JSON is human-readable, while pickle is not;

* JSON is interoperable and widely used outside of the Python ecosystem,
  while pickle is Python-specific;

* JSON, by default, can only represent a subset of the Python built-in
  types, and no custom classes; pickle can represent an extremely large
  number of Python types (many of them automatically, by clever usage
  of Python's introspection facilities; complex cases can be tackled by
  implementing :ref:`specific object APIs <pickle-inst>`).

.. seealso::
   The :mod:`json` module: a standard library module allowing JSON
   serialization and deserialization.

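The two comparisons above can be made concrete with a small sketch. It is
only an illustration: the variable names are made up for this example, and it
shows object sharing, a self-referential list, and a type outside JSON's
default coverage.

```python
import json
import pickle

shared = [1, 2, 3]
data = {"a": shared, "b": shared}   # two references to one list

# pickle keeps the sharing: both keys point to the same list afterwards.
restored = pickle.loads(pickle.dumps(data))
pickle_keeps_sharing = restored["a"] is restored["b"]

# JSON writes the list twice, so the copies are independent after loading.
via_json = json.loads(json.dumps(data))
json_keeps_sharing = via_json["a"] is via_json["b"]

# A self-referential list survives a pickle round trip.
loop = []
loop.append(loop)
loop_restored = pickle.loads(pickle.dumps(loop))
loop_is_recursive = loop_restored[0] is loop_restored

# A set is outside JSON's default type coverage, but pickle handles it.
try:
    json.dumps({1, 2})
    json_accepts_set = True
except TypeError:
    json_accepts_set = False
set_restored = pickle.loads(pickle.dumps({1, 2}))
```

Preserved sharing matters most for mutable objects: appending to
``restored["a"]`` is also visible through ``restored["b"]``, which is not the
case for the JSON copy.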

.. _pickle-protocols:

Data stream format
------------------

.. index::
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific. This has the
advantage that there are no restrictions imposed by external standards such as
JSON or XDR (which can't represent pointer sharing); however it means that
non-Python programs may not be able to reconstruct pickled Python objects.

By default, the :mod:`pickle` data format uses a relatively compact binary
representation. If you need optimal size characteristics, you can efficiently
:doc:`compress <archiving>` pickled data.

The module :mod:`pickletools` contains tools for analyzing data streams
generated by :mod:`pickle`. :mod:`pickletools` source code has extensive
comments about opcodes used by pickle protocols.

There are currently 5 different protocols which can be used for pickling.
The higher the protocol used, the more recent the version of Python needed
to read the pickle produced.

* Protocol version 0 is the original "human-readable" protocol and is
  backwards compatible with earlier versions of Python.

* Protocol version 1 is an old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3. It provides much more
  efficient pickling of :term:`new-style class`\es. Refer to :pep:`307` for
  information about improvements brought by protocol 2.

* Protocol version 3 was added in Python 3.0. It has explicit support for
  :class:`bytes` objects and cannot be unpickled by Python 2.x. This is
  the default protocol, and the recommended protocol when compatibility with
  other Python 3 versions is required.

* Protocol version 4 was added in Python 3.4. It adds support for very large
  objects, pickling more kinds of objects, and some data format
  optimizations. Refer to :pep:`3154` for information about improvements
  brought by protocol 4.

.. note::
   Serialization is a more primitive notion than persistence; although
   :mod:`pickle` reads and writes file objects, it does not handle the issue of
   naming persistent objects, nor the (even more complicated) issue of concurrent
   access to persistent objects. The :mod:`pickle` module can transform a complex
   object into a byte stream and it can transform the byte stream into an object
   with the same internal structure. Perhaps the most obvious thing to do with
   these byte streams is to write them onto a file, but it is also conceivable to
   send them across a network or store them in a database. The :mod:`shelve`
   module provides a simple interface to pickle and unpickle objects on
   DBM-style database files.

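As a rough illustration of the protocol list in this section, the following
sketch pickles one object with every protocol the running interpreter
supports. The ``record`` value and the variable names are made up for this
example, and the resulting sizes depend on the Python version in use.

```python
import pickle

record = {"name": "example", "values": list(range(10))}

# Serialize the same object with every protocol this interpreter supports.
sizes = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    payload = pickle.dumps(record, protocol=protocol)
    sizes[protocol] = len(payload)
    # The reader detects the protocol on its own; none is passed to loads().
    assert pickle.loads(payload) == record

# Protocol 0 produces ASCII-only output for this data.
text_payload = pickle.dumps(record, protocol=0)

# Protocols 2 and newer start with a PROTO opcode (0x80) and the version.
binary_payload = pickle.dumps(record, protocol=2)
```

Inspecting ``sizes`` gives a feel for how much more compact the binary
protocols are for a given object.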

Module Interface
----------------

To serialize an object hierarchy, you simply call the :func:`dumps` function.
Similarly, to de-serialize a data stream, you call the :func:`loads` function.
However, if you want more control over serialization and de-serialization,
you can create a :class:`Pickler` or an :class:`Unpickler` object, respectively.

The :mod:`pickle` module provides the following constants:


.. data:: HIGHEST_PROTOCOL

   An integer, the highest :ref:`protocol version <pickle-protocols>`
   available. This value can be passed as a *protocol* value to functions
   :func:`dump` and :func:`dumps` as well as the :class:`Pickler`
   constructor.

.. data:: DEFAULT_PROTOCOL

   An integer, the default :ref:`protocol version <pickle-protocols>` used
   for pickling. May be less than :data:`HIGHEST_PROTOCOL`. Currently the
   default protocol is 3, a new protocol designed for Python 3.


The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:

.. function:: dump(obj, file, protocol=None, \*, fix_imports=True)

   Write a pickled representation of *obj* to the open :term:`file object` *file*.
   This is equivalent to ``Pickler(file, protocol).dump(obj)``.

   The optional *protocol* argument, an integer, tells the pickler to use
   the given protocol; supported protocols are 0 to :data:`HIGHEST_PROTOCOL`.
   If not specified, the default is :data:`DEFAULT_PROTOCOL`. If a negative
   number is specified, :data:`HIGHEST_PROTOCOL` is selected.

   The *file* argument must have a write() method that accepts a single bytes
   argument. It can thus be an on-disk file opened for binary writing, a
   :class:`io.BytesIO` instance, or any other custom object that meets this
   interface.

   If *fix_imports* is true and *protocol* is less than 3, pickle will try to
   map the new Python 3 names to the old module names used in Python 2, so
   that the pickle data stream is readable with Python 2.

.. function:: dumps(obj, protocol=None, \*, fix_imports=True)

   Return the pickled representation of the object as a :class:`bytes` object,
   instead of writing it to a file.

   Arguments *protocol* and *fix_imports* have the same meaning as in
   :func:`dump`.

.. function:: load(file, \*, fix_imports=True, encoding="ASCII", errors="strict")

   Read a pickled object representation from the open :term:`file object`
   *file* and return the reconstituted object hierarchy specified therein.
   This is equivalent to ``Unpickler(file).load()``.

   The protocol version of the pickle is detected automatically, so no
   protocol argument is needed. Bytes past the pickled object's
   representation are ignored.

   The argument *file* must have two methods, a read() method that takes an
   integer argument, and a readline() method that requires no arguments. Both
   methods should return bytes. Thus *file* can be an on-disk file opened for
   binary reading, a :class:`io.BytesIO` object, or any other custom object
   that meets this interface.

   Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
   which are used to control compatibility support for pickle streams generated
   by Python 2. If *fix_imports* is true, pickle will try to map the old
   Python 2 names to the new names used in Python 3. The *encoding* and
   *errors* tell pickle how to decode 8-bit string instances pickled by Python
   2; these default to 'ASCII' and 'strict', respectively. The *encoding* can
   be 'bytes' to read these 8-bit string instances as bytes objects.

.. function:: loads(bytes_object, \*, fix_imports=True, encoding="ASCII", errors="strict")

   Read a pickled object hierarchy from a :class:`bytes` object and return the
   reconstituted object hierarchy specified therein.

   The protocol version of the pickle is detected automatically, so no
   protocol argument is needed. Bytes past the pickled object's
   representation are ignored.

   Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
   which are used to control compatibility support for pickle streams generated
   by Python 2. If *fix_imports* is true, pickle will try to map the old
   Python 2 names to the new names used in Python 3. The *encoding* and
   *errors* tell pickle how to decode 8-bit string instances pickled by Python
   2; these default to 'ASCII' and 'strict', respectively. The *encoding* can
   be 'bytes' to read these 8-bit string instances as bytes objects.

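A minimal sketch of the four functions described above follows. It uses an
:class:`io.BytesIO` object in place of an on-disk file opened in binary mode;
the sample values and variable names are chosen only for illustration.

```python
import io
import pickle

buffer = io.BytesIO()          # stands in for a file opened with "wb"

# Several objects can be written one after another to the same stream.
pickle.dump({"id": 1}, buffer)
pickle.dump(["a", "b"], buffer, protocol=pickle.HIGHEST_PROTOCOL)

buffer.seek(0)                 # rewind before reading
first = pickle.load(buffer)    # each load() call returns one pickled object
second = pickle.load(buffer)

# dumps()/loads() do the same with bytes objects instead of files.
payload = pickle.dumps((1, 2.5, "three"))
round_tripped = pickle.loads(payload)

# Bytes after the end of the pickle are ignored by loads().
with_trailer = pickle.loads(payload + b"trailing bytes")
```

With a real file, the same calls apply to objects returned by
``open(path, "wb")`` and ``open(path, "rb")``.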

The :mod:`pickle` module defines three exceptions:

.. exception:: PickleError

   Common base class for the other pickling exceptions. It inherits from
   :exc:`Exception`.

.. exception:: PicklingError

   Error raised when an unpicklable object is encountered by :class:`Pickler`.
   It inherits from :exc:`PickleError`.

   Refer to :ref:`pickle-picklable` to learn what kinds of objects can be
   pickled.

.. exception:: UnpicklingError

   Error raised when there is a problem unpickling an object, such as data
   corruption or a security violation. It inherits from :exc:`PickleError`.

   Note that other exceptions may also be raised during unpickling, including
   (but not necessarily limited to) AttributeError, EOFError, ImportError, and
   IndexError.

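One possible way to handle these exceptions is sketched below. The helper
names ``try_dumps`` and ``try_loads`` are hypothetical and not part of the
module, and the exact exception raised for a given unpicklable object can
vary between Python versions, which is why several types are caught.

```python
import pickle

def try_dumps(obj):
    """Return the pickle of *obj*, or None if it cannot be pickled."""
    try:
        return pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # Which exception is raised depends on the object and Python version.
        return None

def try_loads(data):
    """Return (object, None) on success or (None, error) on failure."""
    try:
        return pickle.loads(data), None
    except pickle.UnpicklingError as exc:
        return None, exc           # malformed pickle data
    except (AttributeError, EOFError, ImportError, IndexError) as exc:
        return None, exc           # other errors mentioned in the text above

unpicklable = try_dumps(lambda x: x)     # lambdas cannot be pickled
ok, ok_err = try_loads(pickle.dumps([1, 2]))
bad, bad_err = try_loads(b"\x00")        # not a valid pickle opcode
empty, empty_err = try_loads(b"")        # the stream ends immediately
```

Catching these errors only helps with accidental corruption; it does not make
unpickling untrusted data safe.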

The :mod:`pickle` module exports two classes, :class:`Pickler` and
:class:`Unpickler`:

.. class:: Pickler(file, protocol=None, \*, fix_imports=True)

   This takes a binary file for writing a pickle data stream.

   The optional *protocol* argument, an integer, tells the pickler to use
   the given protocol; supported protocols are 0 to :data:`HIGHEST_PROTOCOL`.
   If not specified, the default is :data:`DEFAULT_PROTOCOL`. If a negative
   number is specified, :data:`HIGHEST_PROTOCOL` is selected.

   The *file* argument must have a write() method that accepts a single bytes
   argument. It can thus be an on-disk file opened for binary writing, a
   :class:`io.BytesIO` instance, or any other custom object that meets this
   interface.

   If *fix_imports* is true and *protocol* is less than 3, pickle will try to
   map the new Python 3 names to the old module names used in Python 2, so
   that the pickle data stream is readable with Python 2.

   .. method:: dump(obj)

      Write a pickled representation of *obj* to the open file object given in
      the constructor.

   .. method:: persistent_id(obj)

      Do nothing by default. This exists so a subclass can override it.

      If :meth:`persistent_id` returns ``None``, *obj* is pickled as usual. Any
      other value causes :class:`Pickler` to emit the returned value as a
      persistent ID for *obj*. The meaning of this persistent ID should be
      defined by :meth:`Unpickler.persistent_load`. Note that the value
      returned by :meth:`persistent_id` cannot itself have a persistent ID.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. attribute:: dispatch_table

      A pickler object's dispatch table is a registry of *reduction
      functions* of the kind which can be declared using
      :func:`copyreg.pickle`. It is a mapping whose keys are classes
      and whose values are reduction functions. A reduction function
      takes a single argument of the associated class and should
      conform to the same interface as a :meth:`__reduce__`
      method.

      By default, a pickler object will not have a
      :attr:`dispatch_table` attribute, and it will instead use the
      global dispatch table managed by the :mod:`copyreg` module.
      However, to customize the pickling for a specific pickler object
      one can set the :attr:`dispatch_table` attribute to a dict-like
      object. Alternatively, if a subclass of :class:`Pickler` has a
      :attr:`dispatch_table` attribute then this will be used as the
      default dispatch table for instances of that class.

      See :ref:`pickle-dispatch` for usage examples.

      .. versionadded:: 3.3

   .. attribute:: fast

      Deprecated. Enable fast mode if set to a true value. The fast mode
      disables the usage of memo, therefore speeding the pickling process by not
      generating superfluous PUT opcodes. It should not be used with
      self-referential objects; doing otherwise will cause :class:`Pickler` to
      recurse infinitely.

      Use :func:`pickletools.optimize` if you need more compact pickles.

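The per-pickler dispatch table can be sketched as follows. The ``Point`` class
and the ``reduce_point`` function are hypothetical names invented for this
example; the sketch copies the global :mod:`copyreg` table first so that
existing registrations still apply.

```python
import copyreg
import io
import pickle

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

reduced = []    # records which objects went through reduce_point

def reduce_point(point):
    # Same contract as __reduce__: a callable plus the arguments to call it with.
    reduced.append(point)
    return Point, (point.x, point.y)

buffer = io.BytesIO()
pickler = pickle.Pickler(buffer)
# Start from the global table so existing registrations keep working,
# then add a reduction function that only this pickler will use.
pickler.dispatch_table = copyreg.dispatch_table.copy()
pickler.dispatch_table[Point] = reduce_point
original = Point(3, 4)
pickler.dump(original)

restored = pickle.loads(buffer.getvalue())
```

Other :class:`Pickler` instances, and the module-level :func:`dumps`, are not
affected by this table, which keeps the customization local to one pickler.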

.. class:: Unpickler(file, \*, fix_imports=True, encoding="ASCII", errors="strict")

   This takes a binary file for reading a pickle data stream.

   The protocol version of the pickle is detected automatically, so no
   protocol argument is needed.

   The argument *file* must have two methods, a read() method that takes an
   integer argument, and a readline() method that requires no arguments. Both
   methods should return bytes. Thus *file* can be an on-disk file object
   opened for binary reading, a :class:`io.BytesIO` object, or any other
   custom object that meets this interface.

   Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
   which are used to control compatibility support for pickle streams generated
   by Python 2. If *fix_imports* is true, pickle will try to map the old
   Python 2 names to the new names used in Python 3. The *encoding* and
   *errors* tell pickle how to decode 8-bit string instances pickled by Python
   2; these default to 'ASCII' and 'strict', respectively. The *encoding* can
   be 'bytes' to read these 8-bit string instances as bytes objects.

   .. method:: load()

      Read a pickled object representation from the open file object given in
      the constructor, and return the reconstituted object hierarchy specified
      therein. Bytes past the pickled object's representation are ignored.

   .. method:: persistent_load(pid)

      Raise an :exc:`UnpicklingError` by default.

      If defined, :meth:`persistent_load` should return the object specified by
      the persistent ID *pid*. If an invalid persistent ID is encountered, an
      :exc:`UnpicklingError` should be raised.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. method:: find_class(module, name)

      Import *module* if necessary and return the object called *name* from it,
      where the *module* and *name* arguments are :class:`str` objects. Note,
      unlike its name suggests, :meth:`find_class` is also used for finding
      functions.

      Subclasses may override this to gain control over what type of objects and
      how they can be loaded, potentially reducing security risks. Refer to
      :ref:`pickle-restrict` for details.

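Overriding :meth:`find_class` might look like the sketch below. The class
name ``RestrictedUnpickler``, the helper ``restricted_loads`` and the
``SAFE_BUILTINS`` allow-list are assumptions made for this illustration, and
an allow-list of this kind reduces risk without making untrusted pickles safe.

```python
import builtins
import io
import pickle

SAFE_BUILTINS = {"range", "complex", "set", "frozenset"}  # hypothetical allow-list

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only allow a few harmless names from the builtins module.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Refuse every other global, whether class or function.
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))

def restricted_loads(data):
    """Like pickle.loads(), but globals go through the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

allowed = restricted_loads(pickle.dumps([1, 2, range(15)]))

try:
    restricted_loads(pickle.dumps(print))   # 'builtins.print' is not listed
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

Plain data such as integers and lists never reaches :meth:`find_class`; only
objects stored by reference (classes and functions) are looked up through it.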

.. _pickle-picklable:

What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, floating point numbers, complex numbers

* strings, bytes, bytearrays

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module (using :keyword:`def`, not
  :keyword:`lambda`)

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`~object.__dict__` or the result of
  calling :meth:`__getstate__` is picklable (see section :ref:`pickle-inst` for
  details).

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file. Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RecursionError` will
be raised in this case. You can carefully raise this limit with
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value. [#]_ This means that only the function name is
pickled, along with the name of the module the function is defined in. Neither
the function's code, nor any of its function attributes are pickled. Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [#]_

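Pickling by name reference can be observed with :mod:`pickletools`. The
following is a small sketch that pickles the built-in :func:`len` with
protocol 3 and lists the global references found in the stream; the variable
names are made up for this example.

```python
import pickle
import pickletools

# A built-in function is stored as a module name plus a function name.
payload = pickle.dumps(len, protocol=3)
restored = pickle.loads(payload)

# Collect the arguments of every GLOBAL opcode in the stream.
globals_used = [arg for opcode, arg, _pos in pickletools.genops(payload)
                if opcode.name == "GLOBAL"]
```

Because only the reference is stored, unpickling yields the very same
function object that the unpickling interpreter already has, not a copy of
its code.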
Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply. Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   import pickle

   class Foo:
       attr = 'A class attribute'

   # Only the reference to Foo is stored, not the value of Foo.attr.
   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined in
the top level of a module.

Similarly, when class instances are pickled, their class's code and data are not
pickled along with them. Only the instance data are pickled. This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class. If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.

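The version-number idea from the last paragraph could be sketched as follows.
The ``Account`` class, its ``STATE_VERSION`` attribute, the ``_version`` key
and the old ``balance`` field are all hypothetical, chosen only to show one
way of converting older state in :meth:`__setstate__`.

```python
import pickle

class Account:
    STATE_VERSION = 2     # hypothetical version counter for this sketch

    def __init__(self, owner, balance_cents=0):
        self.owner = owner
        self.balance_cents = balance_cents

    def __getstate__(self):
        # Store the instance attributes plus the layout version.
        state = self.__dict__.copy()
        state["_version"] = self.STATE_VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop("_version", 1)   # old pickles carry no version
        if version < 2:
            # Version 1 stored a float ``balance`` in whole currency units.
            state["balance_cents"] = int(round(state.pop("balance") * 100))
        self.__dict__.update(state)

# A current object round-trips unchanged.
current = pickle.loads(pickle.dumps(Account("ada", 1250)))

# Simulate state that was saved by the older class layout.
legacy = Account.__new__(Account)
legacy.__setstate__({"owner": "bob", "balance": 12.5})
```

Keeping the conversion inside :meth:`__setstate__` means callers never see the
old layout, whichever version of the class produced the pickle.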

.. _pickle-inst:

Pickling Class Instances
------------------------

.. currentmodule:: None

In this section, we describe the general mechanisms available to you to define,
customize, and control how class instances are pickled and unpickled.

In most cases, no additional code is needed to make instances picklable. By
default, pickle will retrieve the class and the attributes of an instance via
introspection. When a class instance is unpickled, its :meth:`__init__` method
is usually *not* invoked. The default behaviour first creates an uninitialized
instance and then restores the saved attributes. The following code shows an
implementation of this behaviour::

   def save(obj):
       return (obj.__class__, obj.__dict__)

   def load(cls, attributes):
       obj = cls.__new__(cls)
       obj.__dict__.update(attributes)
       return obj

Classes can alter the default behaviour by providing one or several special
methods:

.. method:: object.__getnewargs_ex__()

   In protocols 4 and newer, classes that implement the
   :meth:`__getnewargs_ex__` method can dictate the values passed to the
   :meth:`__new__` method upon unpickling. The method must return a pair
   ``(args, kwargs)`` where *args* is a tuple of positional arguments
   and *kwargs* a dictionary of named arguments for constructing the
   object. Those will be passed to the :meth:`__new__` method upon
   unpickling.

   You should implement this method if the :meth:`__new__` method of your
   class requires keyword-only arguments. Otherwise, it is recommended for
   compatibility to implement :meth:`__getnewargs__`.

503
.. method:: object.__getnewargs__()

   This method serves a similar purpose to :meth:`__getnewargs_ex__`, but
   for protocols 2 and newer. It must return a tuple of arguments ``args``
   which will be passed to the :meth:`__new__` method upon unpickling.

   In protocols 4 and newer, :meth:`__getnewargs__` will not be called if
   :meth:`__getnewargs_ex__` is defined.

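   The two methods above can be sketched as follows. The ``Point`` and
   ``Temperature`` classes are hypothetical and exist only for illustration;
   the sketch assumes they are defined at module level so that pickle can
   locate them by name::

      import pickle

      class Point:
          """Class whose __new__ requires positional arguments."""

          def __new__(cls, x, y):
              self = super().__new__(cls)
              self.x = x
              self.y = y
              return self

          def __getnewargs__(self):
              # Passed as Point.__new__(Point, x, y) upon unpickling.
              return (self.x, self.y)

      class Temperature:
          """Class whose __new__ takes a keyword-only argument."""

          def __new__(cls, *, kelvin):
              self = super().__new__(cls)
              self.kelvin = kelvin
              return self

          def __getnewargs_ex__(self):
              # A pair (args, kwargs), pickled here with protocol 4.
              return (), {'kelvin': self.kelvin}

      point = pickle.loads(pickle.dumps(Point(1, 2), protocol=2))
      temperature = pickle.loads(
          pickle.dumps(Temperature(kelvin=273.15), protocol=4))

   Without these methods, unpickling would call :meth:`__new__` with the class
   as its only argument, which fails for both classes because their required
   arguments would be missing.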

.. method:: object.__getstate__()

   Classes can further influence how their instances are pickled; if the class
   defines the method :meth:`__getstate__`, it is called and the returned object
   is pickled as the contents for the instance, instead of the contents of the
   instance's dictionary. If the :meth:`__getstate__` method is absent, the
   instance's :attr:`~object.__dict__` is pickled as usual.


.. method:: object.__setstate__(state)

   Upon unpickling, if the class defines :meth:`__setstate__`, it is called with
   the unpickled state. In that case, there is no requirement for the state
   object to be a dictionary. Otherwise, the pickled state must be a dictionary
   and its items are assigned to the new instance's dictionary.

   .. note::

      If :meth:`__getstate__` returns a false value, the :meth:`__setstate__`
      method will not be called upon unpickling.


Refer to the section :ref:`pickle-state` for more information about how to use
the methods :meth:`__getstate__` and :meth:`__setstate__`.

.. note::

   At unpickling time, some methods like :meth:`__getattr__`,
   :meth:`__getattribute__`, or :meth:`__setattr__` may be called upon the
   instance. In case those methods rely on some internal invariant being
   true, the type should implement :meth:`__getnewargs__` or
   :meth:`__getnewargs_ex__` to establish such an invariant; otherwise,
   neither :meth:`__new__` nor :meth:`__init__` will be called.

.. index:: pair: copy; protocol

As we shall see, pickle does not directly use the methods described above. In
fact, these methods are part of the copy protocol which implements the
:meth:`__reduce__` special method. The copy protocol provides a unified
interface for retrieving the data necessary for pickling and copying
objects. [#]_

Although powerful, implementing :meth:`__reduce__` directly in your classes is
error prone. For this reason, class designers should use the high-level
interface (i.e., :meth:`__getnewargs_ex__`, :meth:`__getstate__` and
:meth:`__setstate__`) whenever possible. We will show, however, cases where
using :meth:`__reduce__` is the only option or leads to more efficient pickling
or both.

.. method:: object.__reduce__()

   The interface is currently defined as follows. The :meth:`__reduce__` method
   takes no argument and shall return either a string or preferably a tuple (the
   returned object is often referred to as the "reduce value").

   If a string is returned, the string is interpreted as the name of a
   global variable. It should be the object's local name relative to its
   module; the pickle module searches the module namespace to determine the
   object's module. This behaviour is typically useful for singletons.

   When a tuple is returned, it must be between two and five items long.
   Optional items can either be omitted, or ``None`` can be provided as their
   value. The semantics of each item are, in order:

   .. XXX Mention __newobj__ special-case?

   * A callable object that will be called to create the initial version of the
     object.

   * A tuple of arguments for the callable object. An empty tuple must be given
     if the callable does not accept any argument.

   * Optionally, the object's state, which will be passed to the object's
     :meth:`__setstate__` method as previously described. If the object has no
     such method, then the value must be a dictionary and it will be added to
     the object's :attr:`~object.__dict__` attribute.

   * Optionally, an iterator (and not a sequence) yielding successive items.
     These items will be appended to the object either using
     ``obj.append(item)`` or, in batch, using ``obj.extend(list_of_items)``.
     This is primarily used for list subclasses, but may be used by other
     classes as long as they have :meth:`append` and :meth:`extend` methods with
     the appropriate signature. (Whether :meth:`append` or :meth:`extend` is
     used depends on which pickle protocol version is used as well as the number
     of items to append, so both must be supported.)

   * Optionally, an iterator (not a sequence) yielding successive key-value
     pairs. These items will be stored to the object using ``obj[key] =
     value``. This is primarily used for dictionary subclasses, but may be used
     by other classes as long as they implement :meth:`__setitem__`.

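   A minimal sketch of a three-item reduce value, using a hypothetical
   ``Grid`` class that is assumed to be defined at module level. The first
   two items recreate the object through its constructor; the third carries
   state that the constructor does not accept::

      import pickle

      class Grid:
          """Hypothetical class with constructor arguments plus extra state."""

          def __init__(self, rows, cols):
              self.rows = rows
              self.cols = cols
              self.cells = [0] * (rows * cols)

          def __reduce__(self):
              # (callable, args, state): Grid(rows, cols) is called first,
              # then the state dictionary is added to the new object's
              # __dict__ because Grid defines no __setstate__ method.
              return (Grid, (self.rows, self.cols), {'cells': self.cells})

      grid = Grid(2, 3)
      grid.cells[4] = 7
      restored = pickle.loads(pickle.dumps(grid))

   Note that, unlike the default mechanism, this reduce value causes
   :meth:`__init__` to run on unpickling, because the callable is the class
   itself.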

.. method:: object.__reduce_ex__(protocol)

   Alternatively, a :meth:`__reduce_ex__` method may be defined. The only
   difference is that this method takes a single integer argument, the protocol
   version. When defined, pickle will prefer it over the :meth:`__reduce__`
   method. In addition, :meth:`__reduce__` automatically becomes a synonym for
   the extended version. The main use for this method is to provide
   backwards-compatible reduce values for older Python releases.

.. currentmodule:: pickle

.. _pickle-persistent:

Persistence of External Objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: persistent_id (pickle protocol)
   single: persistent_load (pickle protocol)

For the benefit of object persistence, the :mod:`pickle` module supports the
notion of a reference to an object outside the pickled data stream. Such
objects are referenced by a persistent ID, which should be either a string of
alphanumeric characters (for protocol 0) [#]_ or just an arbitrary object (for
any newer protocol).

The resolution of such persistent IDs is not defined by the :mod:`pickle`
module; it will delegate this resolution to the user-defined methods on the
pickler and unpickler, :meth:`~Pickler.persistent_id` and
:meth:`~Unpickler.persistent_load` respectively.

To pickle objects that have an external persistent ID, the pickler must have a
custom :meth:`~Pickler.persistent_id` method that takes an object as an
argument and returns either ``None`` or the persistent ID for that object.
When ``None`` is returned, the pickler simply pickles the object as normal.
When a persistent ID is returned, the pickler will pickle that ID in place of
the object, along with a marker so that the unpickler will recognize it as a
persistent ID.

To unpickle external objects, the unpickler must have a custom
:meth:`~Unpickler.persistent_load` method that takes a persistent ID object and
returns the referenced object.

Here is a comprehensive example presenting how persistent IDs can be used to
pickle external objects by reference.

.. literalinclude:: ../includes/dbpickle.py

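For a smaller, self-contained illustration of the same mechanism, the sketch
below keeps the external objects in an in-memory dictionary instead of a
database. The names ``User``, ``REGISTRY``, ``RegistryPickler`` and
``RegistryUnpickler`` are hypothetical and chosen only for this example::

   import io
   import pickle

   class User:
       def __init__(self, key):
           self.key = key

   # Stand-in for an external store that outlives the pickle.
   REGISTRY = {"alice": User("alice"), "bob": User("bob")}

   class RegistryPickler(pickle.Pickler):
       def persistent_id(self, obj):
           # Refer to User objects by key instead of pickling them.
           if isinstance(obj, User):
               return ("User", obj.key)
           # Everything else is pickled as usual.
           return None

   class RegistryUnpickler(pickle.Unpickler):
       def persistent_load(self, pid):
           kind, key = pid
           if kind == "User":
               return REGISTRY[key]
           raise pickle.UnpicklingError(
               "unsupported persistent object: %r" % (pid,))

   buffer = io.BytesIO()
   RegistryPickler(buffer).dump([REGISTRY["alice"], "plain string"])
   buffer.seek(0)
   restored = RegistryUnpickler(buffer).load()

Because only the ID is stored, the unpickled list refers to the very same
``User`` object that lives in ``REGISTRY``, not to a copy of it.
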
.. _pickle-dispatch:

Dispatch Tables
^^^^^^^^^^^^^^^

If one wants to customize pickling of some classes without disturbing
any other code which depends on pickling, then one can create a
pickler with a private dispatch table.

The global dispatch table managed by the :mod:`copyreg` module is
available as :data:`copyreg.dispatch_table`. Therefore, one may
choose to use a modified copy of :data:`copyreg.dispatch_table` as a
private dispatch table.

For example ::

   f = io.BytesIO()
   p = pickle.Pickler(f)
   p.dispatch_table = copyreg.dispatch_table.copy()
   p.dispatch_table[SomeClass] = reduce_SomeClass

creates an instance of :class:`pickle.Pickler` with a private dispatch
table which handles the ``SomeClass`` class specially. Alternatively,
the code ::

   class MyPickler(pickle.Pickler):
       dispatch_table = copyreg.dispatch_table.copy()
       dispatch_table[SomeClass] = reduce_SomeClass
   f = io.BytesIO()
   p = MyPickler(f)

does the same, but all instances of ``MyPickler`` will by default
share the same dispatch table. The equivalent code using the
:mod:`copyreg` module is ::

   copyreg.pickle(SomeClass, reduce_SomeClass)
   f = io.BytesIO()
   p = pickle.Pickler(f)

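The fragments above leave ``SomeClass`` and ``reduce_SomeClass`` undefined.
One possible way to fill them in is sketched below; the class body, the
reduction function and the ``reduced`` list are assumptions made for
illustration only::

   import copyreg
   import io
   import pickle

   class SomeClass:
       def __init__(self, value):
           self.value = value

   reduced = []   # records the objects handled by the private table

   def reduce_SomeClass(obj):
       reduced.append(obj)
       # Reduce value: recreate the object by calling SomeClass(value).
       return SomeClass, (obj.value,)

   f = io.BytesIO()
   p = pickle.Pickler(f)
   p.dispatch_table = copyreg.dispatch_table.copy()
   p.dispatch_table[SomeClass] = reduce_SomeClass
   p.dump(SomeClass(42))

   # The module-level function does not consult the private table, so
   # reduce_SomeClass is not called for this second object.
   pickle.dumps(SomeClass(0))

   restored = pickle.loads(f.getvalue())

Only the pickler that owns the private table uses ``reduce_SomeClass``; other
code that pickles ``SomeClass`` instances is unaffected.
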
.. _pickle-state:

Handling Stateful Objects
^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: __getstate__() (copy protocol)
   single: __setstate__() (copy protocol)

Here's an example that shows how to modify pickling behavior for a class.
The :class:`TextReader` class opens a text file, and returns the line number and
line contents each time its :meth:`!readline` method is called. If a
:class:`TextReader` instance is pickled, all attributes *except* the file object
member are saved. When the instance is unpickled, the file is reopened, and
reading resumes from the last location. The :meth:`__setstate__` and
:meth:`__getstate__` methods are used to implement this behavior. ::

   class TextReader:
       """Print and number lines in a text file."""

       def __init__(self, filename):
           self.filename = filename
           self.file = open(filename)
           self.lineno = 0

       def readline(self):
           self.lineno += 1
           line = self.file.readline()
           if not line:
               return None
           if line.endswith('\n'):
               line = line[:-1]
           return "%i: %s" % (self.lineno, line)

       def __getstate__(self):
           # Copy the object's state from self.__dict__ which contains
           # all our instance attributes. Always use the dict.copy()
           # method to avoid modifying the original state.
           state = self.__dict__.copy()
           # Remove the unpicklable entries.
           del state['file']
           return state

       def __setstate__(self, state):
           # Restore instance attributes (i.e., filename and lineno).
           self.__dict__.update(state)
           # Restore the previously opened file's state. To do so, we need to
           # reopen it and read from it until the line count is restored.
           file = open(self.filename)
           for _ in range(self.lineno):
               file.readline()
           # Finally, save the file.
           self.file = file


A sample usage might be something like this::

   >>> reader = TextReader("hello.txt")
   >>> reader.readline()
   '1: Hello world!'
   >>> reader.readline()
   '2: I am line number two.'
   >>> new_reader = pickle.loads(pickle.dumps(reader))
   >>> new_reader.readline()
   '3: Goodbye!'


.. _pickle-restrict:

Restricting Globals
-------------------

.. index::
   single: find_class() (pickle protocol)

By default, unpickling will import any class or function that it finds in the
pickle data. For many applications, this behaviour is unacceptable as it
permits the unpickler to import and invoke arbitrary code. Just consider what
this hand-crafted pickle data stream does when loaded::

   >>> import pickle
   >>> pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
   hello world
   0

In this example, the unpickler imports the :func:`os.system` function and then
calls it with the string argument "echo hello world". Although this example is
inoffensive, it is not difficult to imagine one that could damage your system.

For this reason, you may want to control what gets unpickled by customizing
:meth:`Unpickler.find_class`. Contrary to what its name suggests,
:meth:`Unpickler.find_class` is called whenever a global (i.e., a class or
a function) is requested. Thus it is possible to either completely forbid
globals or restrict them to a safe subset.

Here is an example of an unpickler allowing only a few safe classes from the
:mod:`builtins` module to be loaded::

   import builtins
   import io
   import pickle

   safe_builtins = {
       'range',
       'complex',
       'set',
       'frozenset',
       'slice',
   }

   class RestrictedUnpickler(pickle.Unpickler):

       def find_class(self, module, name):
           # Only allow safe classes from builtins.
           if module == "builtins" and name in safe_builtins:
               return getattr(builtins, name)
           # Forbid everything else.
           raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                        (module, name))

   def restricted_loads(s):
       """Helper function analogous to pickle.loads()."""
       return RestrictedUnpickler(io.BytesIO(s)).load()

A sample usage of our unpickler working as intended::

   >>> restricted_loads(pickle.dumps([1, 2, range(15)]))
   [1, 2, range(0, 15)]
   >>> restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
   Traceback (most recent call last):
     ...
   pickle.UnpicklingError: global 'os.system' is forbidden
   >>> restricted_loads(b'cbuiltins\neval\n'
   ...                  b'(S\'getattr(__import__("os"), "system")'
   ...                  b'("echo hello world")\'\ntR.')
   Traceback (most recent call last):
     ...
   pickle.UnpicklingError: global 'builtins.eval' is forbidden


.. XXX Add note about how extension codes could evade our protection
   mechanism (e.g. cached classes do not invoke find_class()).

As our examples show, you have to be careful with what you allow to be
unpickled. Therefore if security is a concern, you may want to consider
alternatives such as the marshalling API in :mod:`xmlrpc.client` or
third-party solutions.


Performance
-----------

Recent versions of the pickle protocol (from protocol 2 and upwards) feature
efficient binary encodings for several common features and built-in types.
Also, the :mod:`pickle` module has a transparent optimizer written in C.

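One rough way to observe the effect of the binary encodings is to compare the
size of the same object pickled with each available protocol. The sample data
below is arbitrary, and the exact byte counts depend on the data and on the
Python version, so none are quoted here::

   import pickle

   # Arbitrary sample data: mostly small integers plus some text.
   data = {'numbers': list(range(1000)), 'text': 'spam ' * 200}

   sizes = {protocol: len(pickle.dumps(data, protocol=protocol))
            for protocol in range(pickle.HIGHEST_PROTOCOL + 1)}

   for protocol, size in sorted(sizes.items()):
       print("protocol %d: %d bytes" % (protocol, size))

For data like this, protocol 2 and newer should produce noticeably smaller
output than the text-based protocol 0.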

.. _pickle-example:

Examples
--------

For the simplest code, use the :func:`dump` and :func:`load` functions. ::

   import pickle

   # An arbitrary collection of objects supported by pickle.
   data = {
       'a': [1, 2.0, 3, 4+6j],
       'b': ("character string", b"byte string"),
       'c': {None, True, False}
   }

   with open('data.pickle', 'wb') as f:
       # Pickle the 'data' dictionary using the highest protocol available.
       pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)


The following example reads the resulting pickled data. ::

   import pickle

   with open('data.pickle', 'rb') as f:
       # The protocol version used is detected automatically, so we do not
       # have to specify it.
       data = pickle.load(f)


.. XXX: Add examples showing how to optimize pickles for size (like using
.. pickletools.optimize() or the gzip module).

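As a sketch of how a pickle might be reduced in size, the following code
combines :func:`pickletools.optimize` with :func:`gzip.compress`. The sample
data is arbitrary, and how much space is actually saved depends on the data::

   import gzip
   import pickle
   import pickletools

   # Arbitrary, repetitive sample data.
   data = [{'id': n, 'tags': ['a', 'b', 'c']} for n in range(500)]

   raw = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
   # Remove memo opcodes that are never referenced again.
   optimized = pickletools.optimize(raw)
   # Compress the optimized pickle for storage or transmission.
   compressed = gzip.compress(optimized)

   # Both forms still load back to an equal object.
   from_optimized = pickle.loads(optimized)
   from_compressed = pickle.loads(gzip.decompress(compressed))

The optimized pickle can be loaded directly, whereas the compressed form must
be decompressed before it is passed to :func:`pickle.loads`.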

.. seealso::

   Module :mod:`copyreg`
      Pickle interface constructor registration for extension types.

   Module :mod:`pickletools`
      Tools for working with and analyzing pickled data.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.


.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module.

.. [#] This is why :keyword:`lambda` functions cannot be pickled: all
   :keyword:`lambda` functions share the same name: ``<lambda>``.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError` but it could be something else.

.. [#] The :mod:`copy` module uses this protocol for shallow and deep copying
   operations.

.. [#] The limitation on alphanumeric characters is due to the fact
   that persistent IDs, in protocol 0, are delimited by the newline
   character. Therefore, if any kind of newline character occurs in a
   persistent ID, the resulting pickle will become unreadable.