:mod:`pickle` --- Python object serialization
=============================================

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.
.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>
.. sectionauthor:: Barry Warsaw <barry@zope.com>

The :mod:`pickle` module implements a fundamental but powerful algorithm for
serializing and de-serializing a Python object structure. "Pickling" is the
process whereby a Python object hierarchy is converted into a byte stream, and
"unpickling" is the inverse operation, whereby a byte stream is converted back
into an object hierarchy. Pickling (and unpickling) is alternatively known as
"serialization", "marshalling," [#]_ or "flattening"; however, to avoid
confusion, the terms used here are "pickling" and "unpickling".


Relationship to other Python modules
------------------------------------

The :mod:`pickle` module has a transparent optimizer (:mod:`_pickle`) written
in C. It is used whenever available. Otherwise the pure Python implementation is
used.

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should always be the preferred way to serialize Python
objects. :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing. Recursive
  objects are objects that contain references to themselves. These are not
  handled by :mod:`marshal`, and in fact, attempting to marshal recursive objects
  will crash your Python interpreter. Object sharing happens when there are
  multiple references to the same object in different places in the object
  hierarchy being serialized. :mod:`pickle` stores such objects only once, and
  ensures that all other references point to the master copy. Shared objects
  remain shared, which can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances. :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module as
  when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions. Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases.
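
The object-tracking behaviour described in the first point can be observed
directly. The following is a minimal sketch; the names ``shared``,
``container`` and ``loop`` are illustrative only and not part of the module::

   import pickle

   # Two references to the same mutable list inside one container.
   shared = [1, 2, 3]
   container = {"a": shared, "b": shared}

   # A list that contains a reference to itself.
   loop = []
   loop.append(loop)

   restored = pickle.loads(pickle.dumps(container))
   restored_loop = pickle.loads(pickle.dumps(loop))

   # The sharing and the self-reference both survive the round trip.
   same_object = restored["a"] is restored["b"]
   still_recursive = restored_loop[0] is restored_loop

Because ``restored["a"]`` and ``restored["b"]`` are one object, appending to
one of them is visible through the other, just as with the original lists.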

.. warning::

   The :mod:`pickle` module is not intended to be secure against erroneous or
   maliciously constructed data. Never unpickle data received from an untrusted or
   unauthenticated source.

Note that serialization is a more primitive notion than persistence; although
:mod:`pickle` reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent
access to persistent objects. The :mod:`pickle` module can transform a complex
object into a byte stream and it can transform the byte stream into an object
with the same internal structure. Perhaps the most obvious thing to do with
these byte streams is to write them onto a file, but it is also conceivable to
send them across a network or store them in a database. The module
:mod:`shelve` provides a simple interface to pickle and unpickle objects on
DBM-style database files.


Data stream format
------------------

.. index::
   single: XDR
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific. This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however, it means that non-Python
programs may not be able to reconstruct pickled Python objects.

By default, the :mod:`pickle` data format uses a compact binary representation.
The module :mod:`pickletools` contains tools for analyzing data streams
generated by :mod:`pickle`.

There are currently 4 different protocols which can be used for pickling.

* Protocol version 0 is the original ASCII protocol and is backwards compatible
  with earlier versions of Python.

* Protocol version 1 is the old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3. It provides much more
  efficient pickling of :term:`new-style class`\es.

* Protocol version 3 was added in Python 3.0. It has explicit support for
  bytes and cannot be unpickled by Python 2.x pickle modules. This is
  the currently recommended protocol; use it whenever possible.

Refer to :pep:`307` for information about improvements brought by
protocol 2. See :mod:`pickletools`'s source code for extensive
comments about opcodes used by pickle protocols.
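
As a rough illustration of how the protocols differ, the sketch below pickles
one object with every protocol the running interpreter supports and records
the size of each result. The ``record`` value and the ``sizes`` dictionary are
made up for this example. The loop is bounded by ``pickle.HIGHEST_PROTOCOL``
rather than a hard-coded number, since the interpreter in use may support
protocols beyond those listed above::

   import pickle

   record = {"name": "spam", "values": [1, 2, 3], "raw": b"\x00\x01"}

   sizes = {}
   for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
       data = pickle.dumps(record, protocol)
       # No protocol is given when loading; it is detected from the data.
       assert pickle.loads(data) == record
       sizes[protocol] = len(data)

Printing ``sizes`` gives a quick view of how compact each protocol is for a
given kind of data; the exact numbers depend on the object and the Python
version.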


Module Interface
----------------

To serialize an object hierarchy, you first create a pickler, then you call the
pickler's :meth:`dump` method. To de-serialize a data stream, you first create
an unpickler, then you call the unpickler's :meth:`load` method. The
:mod:`pickle` module provides the following constants:


.. data:: HIGHEST_PROTOCOL

   The highest protocol version available. This value can be passed as a
   *protocol* value.

.. note::

   Be sure to always open pickle files created with protocols >= 1 in binary mode.
   For the old ASCII-based pickle protocol 0 you can use either text mode or binary
   mode as long as you stay consistent.

   A pickle file written with protocol 0 in binary mode will contain lone linefeeds
   as line terminators and therefore will look "funny" when viewed in Notepad or
   other editors which do not support this format.

.. data:: DEFAULT_PROTOCOL

   The default protocol used for pickling. May be less than
   :data:`HIGHEST_PROTOCOL`. Currently the default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.


The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:

.. function:: dump(obj, file[, protocol])

   Write a pickled representation of *obj* to the open file object *file*. This
   is equivalent to ``Pickler(file, protocol).dump(obj)``.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2, 3. The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

   The *file* argument must have a ``write()`` method that accepts a single bytes
   argument. It can thus be a file object opened for binary writing, an
   :class:`io.BytesIO` instance, or any other custom object that meets this
   interface.

.. function:: dumps(obj[, protocol])

   Return the pickled representation of the object as a :class:`bytes`
   object, instead of writing it to a file.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2, 3. The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

.. function:: load(file, [\*, encoding="ASCII", errors="strict"])

   Read a pickled object representation from the open file object *file* and
   return the reconstituted object hierarchy specified therein. This is
   equivalent to ``Unpickler(file).load()``.

   The protocol version of the pickle is detected automatically, so no protocol
   argument is needed. Bytes past the pickled object's representation are
   ignored.

   The argument *file* must have two methods, a ``read()`` method that takes an
   integer argument, and a ``readline()`` method that requires no arguments. Both
   methods should return bytes. Thus *file* can be a binary file object opened
   for reading, an :class:`io.BytesIO` object, or any other custom object that
   meets this interface.

   Optional keyword arguments are *encoding* and *errors*, which are used to
   decode 8-bit string instances pickled by Python 2.x. These default to
   'ASCII' and 'strict', respectively.

.. function:: loads(bytes_object, [\*, encoding="ASCII", errors="strict"])

   Read a pickled object hierarchy from a :class:`bytes` object and return the
   reconstituted object hierarchy specified therein.

   The protocol version of the pickle is detected automatically, so no protocol
   argument is needed. Bytes past the pickled object's representation are
   ignored.

   Optional keyword arguments are *encoding* and *errors*, which are used to
   decode 8-bit string instances pickled by Python 2.x. These default to
   'ASCII' and 'strict', respectively.
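
The four functions above can be exercised together as follows. This is a
sketch rather than a recommended pattern; the ``inventory`` data and the
temporary file are assumptions made only for the example::

   import io
   import os
   import pickle
   import tempfile

   inventory = {"spam": 3, "eggs": [1, 2]}

   # In-memory round trip with dumps() and loads().
   payload = pickle.dumps(inventory)
   from_bytes = pickle.loads(payload)

   # dump() and load() accept any object with the required methods,
   # for example an io.BytesIO instance.
   buffer = io.BytesIO()
   pickle.dump(inventory, buffer, -1)  # negative: highest supported protocol
   buffer.seek(0)
   from_buffer = pickle.load(buffer)

   # Round trip through a real file, opened in binary mode both times.
   fd, path = tempfile.mkstemp(suffix=".pickle")
   os.close(fd)
   try:
       with open(path, "wb") as f:
           pickle.dump(inventory, f)
       with open(path, "rb") as f:
           from_file = pickle.load(f)
   finally:
       os.remove(path)

All three results compare equal to ``inventory``, but each is a new object
rather than the original dictionary.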


The :mod:`pickle` module defines three exceptions:

.. exception:: PickleError

   Common base class for the other pickling exceptions. It inherits
   :exc:`Exception`.

.. exception:: PicklingError

   Error raised when an unpicklable object is encountered by :class:`Pickler`.
   It inherits :exc:`PickleError`.

   Refer to :ref:`pickle-picklable` to learn what kinds of objects can be
   pickled.

.. exception:: UnpicklingError

   Error raised when there is a problem unpickling an object, such as data
   corruption or a security violation. It inherits :exc:`PickleError`.

   Note that other exceptions may also be raised during unpickling, including
   (but not necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
   :exc:`ImportError`, and :exc:`IndexError`.
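
Since unpickling can fail with exceptions outside the :exc:`PickleError`
hierarchy, code that reads possibly damaged data usually has to catch more
than :exc:`UnpicklingError`. The helper ``try_loads`` below is hypothetical
and only sketches that idea::

   import pickle

   def try_loads(data):
       """Return (True, obj) on success or (False, exception) on failure."""
       try:
           return True, pickle.loads(data)
       except pickle.UnpicklingError as exc:
           # Raised by the unpickler itself for malformed pickle data.
           return False, exc
       except (AttributeError, EOFError, ImportError, IndexError) as exc:
           # Other exceptions that the text above lists as possible.
           return False, exc

   ok, value = try_loads(pickle.dumps([1, 2, 3]))
   empty_ok, empty_error = try_loads(b"")  # no data at all

This only deals with damaged or incomplete data. It does nothing to make
pickles from an untrusted source safe to load.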


The :mod:`pickle` module exports two classes, :class:`Pickler` and
:class:`Unpickler`:

.. class:: Pickler(file[, protocol])

   This takes a binary file for writing a pickle data stream.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2, 3. The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

   The *file* argument must have a ``write()`` method that accepts a single bytes
   argument. It can thus be a file object opened for binary writing, an
   :class:`io.BytesIO` instance, or any other custom object that meets this
   interface.

   .. method:: dump(obj)

      Write a pickled representation of *obj* to the open file object given in
      the constructor.

   .. method:: persistent_id(obj)

      Do nothing by default. This exists so a subclass can override it.

      If :meth:`persistent_id` returns ``None``, *obj* is pickled as usual. Any
      other value causes :class:`Pickler` to emit the returned value as a
      persistent ID for *obj*. The meaning of this persistent ID should be
      defined by :meth:`Unpickler.persistent_load`. Note that the value
      returned by :meth:`persistent_id` cannot itself have a persistent ID.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. method:: clear_memo()

      Deprecated. Use the :meth:`clear` method on :attr:`memo`, instead.
      Clear the pickler's memo, useful when reusing picklers.

   .. attribute:: fast

      Deprecated. Enable fast mode if set to a true value. The fast mode
      disables the usage of memo, therefore speeding the pickling process by not
      generating superfluous PUT opcodes. It should not be used with
      self-referential objects; doing otherwise will cause :class:`Pickler` to
      recurse infinitely.

      Use :func:`pickletools.optimize` if you need more compact pickles.

   .. attribute:: memo

      Dictionary holding previously pickled objects to allow shared or
      recursive objects to be pickled by reference as opposed to by value.


.. XXX Move these comments to somewhere more appropriate.

It is possible to make multiple calls to the :meth:`dump` method of the same
:class:`Pickler` instance. These must then be matched to the same number of
calls to the :meth:`load` method of the corresponding :class:`Unpickler`
instance. If the same object is pickled by multiple :meth:`dump` calls, the
:meth:`load` calls will all yield references to the same object.

Please note, this is intended for pickling multiple objects without intervening
modifications to the objects or their parts. If you modify an object and then
pickle it again using the same :class:`Pickler` instance, the object is not
pickled again; instead, a reference to it is pickled and the :class:`Unpickler`
will return the old value, not the modified one.
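
The caveat about intervening modifications is easy to reproduce. The sketch
below uses an in-memory buffer and an illustrative list named ``data``::

   import io
   import pickle

   buffer = io.BytesIO()
   pickler = pickle.Pickler(buffer)

   data = [1, 2]
   pickler.dump(data)   # the list contents are written here
   data.append(3)       # modified between the two dump() calls
   pickler.dump(data)   # only a reference to the memoized list is written

   buffer.seek(0)
   unpickler = pickle.Unpickler(buffer)
   first = unpickler.load()
   second = unpickler.load()

Both ``first`` and ``second`` refer to the same list, and it holds the value
seen by the first :meth:`dump` call, that is ``[1, 2]``. Clearing the
pickler's memo or using a fresh :class:`Pickler` for the second call avoids
this.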


.. class:: Unpickler(file, [\*, encoding="ASCII", errors="strict"])

   This takes a binary file for reading a pickle data stream.

   The protocol version of the pickle is detected automatically, so no
   protocol argument is needed.

   The argument *file* must have two methods, a ``read()`` method that takes an
   integer argument, and a ``readline()`` method that requires no arguments. Both
   methods should return bytes. Thus *file* can be a binary file object opened
   for reading, an :class:`io.BytesIO` object, or any other custom object that
   meets this interface.

   Optional keyword arguments are *encoding* and *errors*, which are used to
   decode 8-bit string instances pickled by Python 2.x. These default to
   'ASCII' and 'strict', respectively.

   .. method:: load()

      Read a pickled object representation from the open file object given in
      the constructor, and return the reconstituted object hierarchy specified
      therein. Bytes past the pickled object's representation are ignored.

   .. method:: persistent_load(pid)

      Raise an :exc:`UnpicklingError` by default.

      If defined, :meth:`persistent_load` should return the object specified by
      the persistent ID *pid*. If an invalid persistent ID is encountered, an
      :exc:`UnpicklingError` should be raised.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. method:: find_class(module, name)

      Import *module* if necessary and return the object called *name* from it,
      where the *module* and *name* arguments are :class:`str` objects. Note
      that, despite its name, :meth:`find_class` is also used for finding
      functions.

      Subclasses may override this to gain control over what type of objects and
      how they can be loaded, potentially reducing security risks. Refer to
      :ref:`pickle-restrict` for details.


.. _pickle-picklable:

What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, floating point numbers, complex numbers

* strings, bytes, bytearrays

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`__dict__` or the result of calling
  :meth:`__getstate__` is picklable (see section :ref:`pickle-inst` for details)

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file. Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RuntimeError` will be
raised in this case. You can carefully raise this limit with
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value. This means that only the function name is
pickled, along with the name of the module the function is defined in. Neither
the function's code, nor any of its function attributes are pickled. Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [#]_

Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply. Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   import pickle

   class Foo:
       attr = 'A class attribute'

   # Only a reference to Foo (module and name) is stored, not the value of attr.
   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined in
the top level of a module.

Similarly, when class instances are pickled, their class's code and data are not
pickled along with them. Only the instance data are pickled. This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class. If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.
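
One possible shape for such a versioning scheme is sketched below. The
``Account`` class, its two layouts and the ``version`` key are invented for
this example; real classes will need their own conversion rules::

   import pickle

   class Account:
       """Hypothetical class whose attribute layout changed once."""

       VERSION = 2

       def __init__(self, owner, cents):
           self.owner = owner
           self.cents = cents  # version 1 stored a float named "balance"

       def __getstate__(self):
           # Store the instance attributes together with a version number.
           state = self.__dict__.copy()
           state["version"] = self.VERSION
           return state

       def __setstate__(self, state):
           state = dict(state)
           version = state.pop("version", 1)
           if version < 2:
               # Convert the old layout to the current one.
               state["cents"] = int(round(state.pop("balance") * 100))
           self.__dict__.update(state)

   current = pickle.loads(pickle.dumps(Account("alice", 1250)))

   # Simulate state that was saved by the older class layout.
   legacy = Account.__new__(Account)
   legacy.__setstate__({"owner": "bob", "balance": 12.5})

Keeping the conversion inside :meth:`__setstate__` means callers never see the
old layout, whichever version of the class produced the pickle.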


.. _pickle-inst:

Pickling Class Instances
------------------------

In this section, we describe the general mechanisms available to you to define,
customize, and control how class instances are pickled and unpickled.

In most cases, no additional code is needed to make instances picklable. By
default, pickle will retrieve the class and the attributes of an instance via
introspection. When a class instance is unpickled, its :meth:`__init__` method
is usually *not* invoked. The default behaviour first creates an uninitialized
instance and then restores the saved attributes. The following code shows an
implementation of this behaviour::

   def save(obj):
       return (obj.__class__, obj.__dict__)

   def load(cls, attributes):
       obj = cls.__new__(cls)
       obj.__dict__.update(attributes)
       return obj

.. index:: single: __getnewargs__() (copy protocol)

Classes can alter the default behaviour by providing one or several special
methods. In protocol 2 and newer, classes that implement the
:meth:`__getnewargs__` method can dictate the values passed to the
:meth:`__new__` method upon unpickling. This is often needed for classes
whose :meth:`__new__` method requires arguments.
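
A minimal sketch of :meth:`__getnewargs__`, assuming a made-up ``Temperature``
class that subclasses :class:`float` and therefore has to receive its value in
:meth:`__new__`::

   import pickle

   class Temperature(float):
       """Hypothetical float subclass whose __new__ needs two arguments."""

       def __new__(cls, degrees, unit):
           self = super().__new__(cls, degrees)
           self.unit = unit
           return self

       def __getnewargs__(self):
           # These values are passed to __new__ when the object is unpickled.
           return (float(self), self.unit)

   boiling = Temperature(100.0, "C")
   restored = pickle.loads(pickle.dumps(boiling, 2))

Without :meth:`__getnewargs__`, unpickling would call ``__new__`` with no
extra arguments and fail, because ``degrees`` and ``unit`` are required.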

.. index:: single: __getstate__() (copy protocol)

Classes can further influence how their instances are pickled; if the class
defines the method :meth:`__getstate__`, it is called and the returned object is
pickled as the contents for the instance, instead of the contents of the
instance's dictionary. If the :meth:`__getstate__` method is absent, the
instance's :attr:`__dict__` is pickled as usual.

.. index:: single: __setstate__() (copy protocol)

Upon unpickling, if the class defines :meth:`__setstate__`, it is called with
the unpickled state. In that case, there is no requirement for the state object
to be a dictionary. Otherwise, the pickled state must be a dictionary and its
items are assigned to the new instance's dictionary.

.. note::

   If :meth:`__getstate__` returns a false value, the :meth:`__setstate__`
   method will not be called.

Refer to the section :ref:`pickle-state` for more information about how to use
the methods :meth:`__getstate__` and :meth:`__setstate__`.

.. index::
   pair: copy; protocol
   single: __reduce__() (copy protocol)

As we shall see, pickle does not directly use the methods described above. In
fact, these methods are part of the copy protocol which implements the
:meth:`__reduce__` special method. The copy protocol provides a unified
interface for retrieving the data necessary for pickling and copying
objects. [#]_

Although powerful, implementing :meth:`__reduce__` directly in your classes is
error prone. For this reason, class designers should use the high-level
interface (i.e., :meth:`__getnewargs__`, :meth:`__getstate__` and
:meth:`__setstate__`) whenever possible. We will show, however, cases where
using :meth:`__reduce__` is the only option or leads to more efficient pickling
or both.

The interface is currently defined as follows. The :meth:`__reduce__` method
takes no argument and shall return either a string or preferably a tuple (the
returned object is often referred to as the "reduce value").

If a string is returned, the string should be interpreted as the name of a
global variable. It should be the object's local name relative to its module;
the pickle module searches the module namespace to determine the object's
module. This behaviour is typically useful for singletons.

When a tuple is returned, it must be between two and five items long. Optional
items can either be omitted, or ``None`` can be provided as their value. The
semantics of each item are, in order:

.. XXX Mention __newobj__ special-case?

* A callable object that will be called to create the initial version of the
  object.

* A tuple of arguments for the callable object. An empty tuple must be given if
  the callable does not accept any argument.

* Optionally, the object's state, which will be passed to the object's
  :meth:`__setstate__` method as previously described. If the object has no
  such method, then the value must be a dictionary and it will be added to the
  object's :attr:`__dict__` attribute.

* Optionally, an iterator (and not a sequence) yielding successive items. These
  items will be appended to the object either using ``obj.append(item)`` or, in
  batch, using ``obj.extend(list_of_items)``. This is primarily used for list
  subclasses, but may be used by other classes as long as they have
  :meth:`append` and :meth:`extend` methods with the appropriate signature.
  (Whether :meth:`append` or :meth:`extend` is used depends on which pickle
  protocol version is used as well as the number of items to append, so both
  must be supported.)

* Optionally, an iterator (not a sequence) yielding successive key-value pairs.
  These items will be stored to the object using ``obj[key] = value``. This is
  primarily used for dictionary subclasses, but may be used by other classes as
  long as they implement :meth:`__setitem__`.
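
The first three items of the reduce value can be sketched as follows. The
``Connection`` class and its attributes are hypothetical and serve only to
show the shape of the tuple::

   import pickle

   class Connection:
       """Hypothetical class whose constructor requires an argument."""

       def __init__(self, host):
           self.host = host
           self.retries = 0

       def __reduce__(self):
           # (callable, arguments, state): the callable rebuilds the object,
           # then the state dictionary is added to the object's __dict__
           # because this class defines no __setstate__ method.
           return (Connection, (self.host,), {"retries": self.retries})

   conn = Connection("example.org")
   conn.retries = 3
   clone = pickle.loads(pickle.dumps(conn))

Note that, unlike the default behaviour, the callable here is the class
itself, so ``__init__`` does run when ``clone`` is created; the state is
applied afterwards.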
Georg Brandl116aa622007-08-15 14:28:22 +0000529
.. index:: single: __reduce_ex__() (copy protocol)

Alternatively, a :meth:`__reduce_ex__` method may be defined. The only
difference is that this method takes a single integer argument, the protocol
version. When defined, pickle will prefer it over the :meth:`__reduce__`
method. In addition, :meth:`__reduce__` automatically becomes a synonym for the
extended version. The main use for this method is to provide
backwards-compatible reduce values for older Python releases.

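The following sketch illustrates how the protocol version reaches
:meth:`__reduce_ex__`. The ``Point`` class and its ``pickled_with`` attribute
are hypothetical names chosen for this illustration; the protocol number is
stored in the state only to make it visible after unpickling::

   import pickle

   class Point:
       """Hypothetical class used only for illustration."""

       def __init__(self, x, y):
           self.x = x
           self.y = y
           self.pickled_with = None

       def __reduce_ex__(self, protocol):
           # Same kind of tuple as __reduce__ would return. The state
           # dictionary is merged into the new object's __dict__ because
           # Point defines no __setstate__ method.
           return (Point, (self.x, self.y), {"pickled_with": protocol})

   restored = pickle.loads(pickle.dumps(Point(1, 2), 2))

A real implementation would more likely branch on ``protocol`` to return a
reduce value that older releases can still load.
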
.. _pickle-persistent:

Persistence of External Objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: persistent_id (pickle protocol)
   single: persistent_load (pickle protocol)

For the benefit of object persistence, the :mod:`pickle` module supports the
notion of a reference to an object outside the pickled data stream. Such
objects are referenced by a persistent ID, which should be either a string of
alphanumeric characters (for protocol 0) [#]_ or just an arbitrary object (for
any newer protocol).

The resolution of such persistent IDs is not defined by the :mod:`pickle`
module; it delegates this resolution to the user-defined methods on the
pickler and unpickler, :meth:`persistent_id` and :meth:`persistent_load`
respectively.

To pickle objects that have an external persistent ID, the pickler must have a
custom :meth:`persistent_id` method that takes an object as an argument and
returns either ``None`` or the persistent ID for that object. When ``None`` is
returned, the pickler simply pickles the object as normal. When a persistent ID
is returned, the pickler pickles that ID instead of the object, along with a
marker so that the unpickler will recognize it as a persistent ID.

To unpickle external objects, the unpickler must have a custom
:meth:`persistent_load` method that takes a persistent ID object and returns the
referenced object.

Here is a comprehensive example showing how persistent IDs can be used to
pickle external objects by reference.

.. XXX Work around for some bug in sphinx/pygments.
.. highlightlang:: python
.. literalinclude:: ../includes/dbpickle.py
.. highlightlang:: python3

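For readers who want something smaller to experiment with, here is a minimal
in-memory sketch of the same mechanism. The ``Resource`` class, the
``REGISTRY`` dictionary and the two subclasses are hypothetical names
introduced for this illustration; only :meth:`persistent_id` and
:meth:`persistent_load` come from the :mod:`pickle` module::

   import io
   import pickle

   class Resource:
       """Hypothetical object that lives outside the pickle."""

       def __init__(self, key):
           self.key = key

   # Hypothetical external store, indexed by key.
   REGISTRY = {"db": Resource("db"), "cache": Resource("cache")}

   class RegistryPickler(pickle.Pickler):

       def persistent_id(self, obj):
           if isinstance(obj, Resource):
               # Store a reference instead of the object itself.
               return ("Resource", obj.key)
           # Everything else is pickled as usual.
           return None

   class RegistryUnpickler(pickle.Unpickler):

       def persistent_load(self, pid):
           kind, key = pid
           if kind != "Resource":
               raise pickle.UnpicklingError(
                   "unsupported persistent ID: %r" % (pid,))
           return REGISTRY[key]

   buffer = io.BytesIO()
   # A tuple is used as the persistent ID, so a protocol newer than 0
   # is required.
   RegistryPickler(buffer, protocol=2).dump({"primary": REGISTRY["db"], "n": 3})
   restored = RegistryUnpickler(io.BytesIO(buffer.getvalue())).load()

After loading, ``restored["primary"]`` is the very object held in
``REGISTRY``, not a copy, because only its persistent ID was written to the
stream.
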
.. _pickle-state:

Handling Stateful Objects
^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: __getstate__() (copy protocol)
   single: __setstate__() (copy protocol)

Here's an example that shows how to modify pickling behavior for a class.
The :class:`TextReader` class opens a text file, and returns the line number and
line contents each time its :meth:`readline` method is called. If a
:class:`TextReader` instance is pickled, all attributes *except* the file object
member are saved. When the instance is unpickled, the file is reopened, and
reading resumes from the last location. The :meth:`__setstate__` and
:meth:`__getstate__` methods are used to implement this behavior. ::

   class TextReader:
       """Print and number lines in a text file."""

       def __init__(self, filename):
           self.filename = filename
           self.file = open(filename)
           self.lineno = 0

       def readline(self):
           self.lineno += 1
           line = self.file.readline()
           if not line:
               return None
           if line.endswith("\n"):
               line = line[:-1]
           return "%i: %s" % (self.lineno, line)

       def __getstate__(self):
           # Copy the object's state from self.__dict__ which contains
           # all our instance attributes. Always use the dict.copy()
           # method to avoid modifying the original state.
           state = self.__dict__.copy()
           # Remove the unpicklable entries.
           del state['file']
           return state

       def __setstate__(self, state):
           # Restore instance attributes (i.e., filename and lineno).
           self.__dict__.update(state)
           # Restore the previously opened file's state. To do so, we need to
           # reopen it and read from it until the line count is restored.
           file = open(self.filename)
           for _ in range(self.lineno):
               file.readline()
           # Finally, save the file.
           self.file = file


A sample usage might be something like this::

   >>> reader = TextReader("hello.txt")
   >>> reader.readline()
   '1: Hello world!'
   >>> reader.readline()
   '2: I am line number two.'
   >>> new_reader = pickle.loads(pickle.dumps(reader))
   >>> new_reader.readline()
   '3: Goodbye!'


.. _pickle-restrict:

Restricting Globals
-------------------

.. index::
   single: find_class() (pickle protocol)

By default, unpickling will import any class or function that it finds in the
pickle data. For many applications, this behaviour is unacceptable as it
permits the unpickler to import and invoke arbitrary code. Just consider what
this hand-crafted pickle data stream does when loaded::

   >>> import pickle
   >>> pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
   hello world
   0

In this example, the unpickler imports the :func:`os.system` function and then
applies it to the string argument "echo hello world". Although this example is
inoffensive, it is not difficult to imagine one that could damage your system.

For this reason, you may want to control what gets unpickled by customizing
:meth:`Unpickler.find_class`. Contrary to what its name suggests,
:meth:`find_class` is called whenever a global (i.e., a class or a function) is
requested. Thus it is possible to either forbid globals completely or restrict
them to a safe subset.

Here is an example of an unpickler allowing only a few safe classes from the
:mod:`builtins` module to be loaded::

   import builtins
   import io
   import pickle

   safe_builtins = {
       'range',
       'complex',
       'set',
       'frozenset',
       'slice',
   }

   class RestrictedUnpickler(pickle.Unpickler):

       def find_class(self, module, name):
           # Only allow safe classes from builtins.
           if module == "builtins" and name in safe_builtins:
               return getattr(builtins, name)
           # Forbid everything else.
           raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                        (module, name))

   def restricted_loads(s):
       """Helper function analogous to pickle.loads()."""
       return RestrictedUnpickler(io.BytesIO(s)).load()

A sample usage of our unpickler working as intended::

   >>> restricted_loads(pickle.dumps([1, 2, range(15)]))
   [1, 2, range(0, 15)]
   >>> restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
   Traceback (most recent call last):
     ...
   pickle.UnpicklingError: global 'os.system' is forbidden
   >>> restricted_loads(b'cbuiltins\neval\n'
   ...                  b'(S\'getattr(__import__("os"), "system")'
   ...                  b'("echo hello world")\'\ntR.')
   Traceback (most recent call last):
     ...
   pickle.UnpicklingError: global 'builtins.eval' is forbidden


.. XXX Add note about how extension codes could evade our protection
   mechanism (e.g. cached classes do not invoke find_class()).

As our examples show, you have to be careful with what you allow to be
unpickled. Therefore, if security is a concern, you may want to consider
alternatives such as the marshalling API in :mod:`xmlrpc.client` or third-party
solutions.


.. _pickle-example:

Example
-------

For the simplest code, use the :func:`dump` and :func:`load` functions. Note
that a self-referencing list is pickled and restored correctly. ::

   import pickle

   data1 = {'a': [1, 2.0, 3, 4+6j],
            'b': ("string", "string using Unicode features \u0394"),
            'c': None}

   selfref_list = [1, 2, 3]
   selfref_list.append(selfref_list)

   output = open('data.pkl', 'wb')

   # Pickle dictionary using protocol 2.
   pickle.dump(data1, output, 2)

   # Pickle the list using the highest protocol available.
   pickle.dump(selfref_list, output, -1)

   output.close()

The following example reads the resulting pickled data. When reading a
pickle-containing file, you should open the file in binary mode because you
can't be sure if the ASCII or binary format was used. ::

   import pprint, pickle

   pkl_file = open('data.pkl', 'rb')

   data1 = pickle.load(pkl_file)
   pprint.pprint(data1)

   data2 = pickle.load(pkl_file)
   pprint.pprint(data2)

   pkl_file.close()

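When the number of pickles in a file is not known in advance, one possible
approach is to keep calling :func:`load` until it raises :exc:`EOFError`. The
sketch below assumes a hypothetical helper named ``load_all`` and writes to a
temporary directory so that it does not depend on the ``data.pkl`` file from
the previous examples::

   import os
   import pickle
   import tempfile

   def load_all(path):
       """Yield every object stored in the pickle file at *path*."""
       with open(path, 'rb') as pkl_file:
           while True:
               try:
                   yield pickle.load(pkl_file)
               except EOFError:
                   # No more pickles in the file.
                   return

   # Write two pickles to a temporary file, then read them back.
   path = os.path.join(tempfile.mkdtemp(), 'data.pkl')
   with open(path, 'wb') as output:
       pickle.dump({'a': [1, 2.0, 3]}, output, 2)
       pickle.dump(('x', None), output, -1)

   restored = list(load_all(path))

The ``with`` statement closes each file even if pickling or unpickling raises
an exception.
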
.. seealso::

   Module :mod:`copyreg`
      Pickle interface constructor registration for extension types.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.


.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError` but it could be something else.

.. [#] The :mod:`copy` module uses this protocol for shallow and deep copying
   operations.

.. [#] The limitation on alphanumeric characters is due to the fact that
   persistent IDs, in protocol 0, are delimited by the newline
   character. Therefore if any kind of newline character occurs in
   persistent IDs, the resulting pickle will become unreadable.
Alexandre Vassalotti5f3b63a2008-10-18 20:47:58 +0000797 persistent IDs, the resulting pickle will become unreadable.