:mod:`pickle` --- Python object serialization
=============================================

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.
.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>
.. sectionauthor:: Barry Warsaw <barry@zope.com>

The :mod:`pickle` module implements a fundamental, but powerful algorithm for
serializing and de-serializing a Python object structure. "Pickling" is the
process whereby a Python object hierarchy is converted into a byte stream, and
"unpickling" is the inverse operation, whereby a byte stream is converted back
into an object hierarchy. Pickling (and unpickling) is alternatively known as
"serialization", "marshalling," [#]_ or "flattening"; however, to avoid
confusion, the terms used here are "pickling" and "unpickling".


Relationship to other Python modules
------------------------------------

The :mod:`pickle` module has a transparent optimizer (:mod:`_pickle`) written
in C. It is used whenever available. Otherwise the pure Python implementation is
used.

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should always be the preferred way to serialize Python
objects. :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing. Recursive
  objects are objects that contain references to themselves. These are not
  handled by marshal, and in fact, attempting to marshal recursive objects will
  crash your Python interpreter. Object sharing happens when there are multiple
  references to the same object in different places in the object hierarchy being
  serialized. :mod:`pickle` stores such objects only once, and ensures that all
  other references point to the master copy. Shared objects remain shared, which
  can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances. :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module as
  when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions. Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases.
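The effect of the object tracking described in the first item above can be
observed directly. The following is a minimal sketch; the variable names are
chosen only for illustration.

```python
import pickle

# A list that is referenced twice inside one container (object sharing).
shared = [1, 2, 3]
container = {"a": shared, "b": shared}

# A list that contains a reference to itself (a recursive object).
recursive = []
recursive.append(recursive)

restored = pickle.loads(pickle.dumps(container))
restored_recursive = pickle.loads(pickle.dumps(recursive))

# Both keys still refer to one list object, so a change made through one
# key is visible through the other.
restored["a"].append(4)
print(restored["b"])                                # [1, 2, 3, 4]
print(restored["a"] is restored["b"])               # True
print(restored_recursive[0] is restored_recursive)  # True
```

Had the two references been serialized independently, the unpickled container
would hold two separate lists and the mutation would affect only one of them.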
64
.. warning::

   The :mod:`pickle` module is not intended to be secure against erroneous or
   maliciously constructed data. Never unpickle data received from an untrusted or
   unauthenticated source.

Note that serialization is a more primitive notion than persistence; although
:mod:`pickle` reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent
access to persistent objects. The :mod:`pickle` module can transform a complex
object into a byte stream and it can transform the byte stream into an object
with the same internal structure. Perhaps the most obvious thing to do with
these byte streams is to write them onto a file, but it is also conceivable to
send them across a network or store them in a database. The module
:mod:`shelve` provides a simple interface to pickle and unpickle objects on
DBM-style database files.
81
82
Data stream format
------------------

.. index::
   single: XDR
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific. This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however, it means that non-Python
programs may not be able to reconstruct pickled Python objects.

By default, the :mod:`pickle` data format uses a compact binary representation.
The module :mod:`pickletools` contains tools for analyzing data streams
generated by :mod:`pickle`.

There are currently four protocols that can be used for pickling:

* Protocol version 0 is the original ASCII protocol and is backwards compatible
  with earlier versions of Python.

* Protocol version 1 is the old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3. It provides much more
  efficient pickling of :term:`new-style class`\es.

* Protocol version 3 was added in Python 3.0. It has explicit support for
  bytes and cannot be unpickled by Python 2.x pickle modules. This is
  the currently recommended protocol; use it whenever possible.

Refer to :pep:`307` for information about improvements brought by
protocol 2. See :mod:`pickletools`'s source code for extensive
comments about opcodes used by pickle protocols.

If a *protocol* is not specified, protocol 3 is used. If *protocol* is
specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
protocol version available will be used.


Module Interface
----------------

To serialize an object hierarchy, you first create a pickler, then you call the
pickler's :meth:`dump` method. To de-serialize a data stream, you first create
an unpickler, then you call the unpickler's :meth:`load` method. The
:mod:`pickle` module provides the following constants:


.. data:: HIGHEST_PROTOCOL

   The highest protocol version available. This value can be passed as a
   *protocol* value.

.. note::

   Be sure to always open pickle files created with protocols >= 1 in binary mode.
   For the old ASCII-based pickle protocol 0 you can use either text mode or binary
   mode as long as you stay consistent.

   A pickle file written with protocol 0 in binary mode will contain lone linefeeds
   as line terminators and therefore will look "funny" when viewed in Notepad or
   other editors which do not support this format.

.. data:: DEFAULT_PROTOCOL

   The default protocol used for pickling. May be less than
   :data:`HIGHEST_PROTOCOL`. Currently the default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.
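As a rough illustration of how the *protocol* value and the two constants
interact, the sketch below pickles one object with every protocol the running
interpreter offers. It deliberately loops up to :data:`HIGHEST_PROTOCOL`
instead of assuming a fixed number of protocols, since the set of available
protocols depends on the Python version in use. The ``record`` object is a
made-up example.

```python
import pickle

record = {"name": "spam", "values": [1, 2, 3]}  # hypothetical sample data

# Serialize the same object with every protocol this interpreter supports.
sizes = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    data = pickle.dumps(record, protocol)
    # No protocol argument is needed when loading; it is detected.
    assert pickle.loads(data) == record
    sizes[protocol] = len(data)

# Omitting the protocol uses DEFAULT_PROTOCOL; a negative value selects
# HIGHEST_PROTOCOL.
default_data = pickle.dumps(record)
highest_data = pickle.dumps(record, -1)

print(sizes)  # maps protocol number to pickle size in bytes
```

The exact sizes vary between interpreter versions, so no particular output is
shown here.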
152
153
The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:

.. function:: dump(obj, file[, protocol])

   Write a pickled representation of *obj* to the open file object *file*. This
   is equivalent to ``Pickler(file, protocol).dump(obj)``.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2, 3. The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

   The *file* argument must have a :meth:`write` method that accepts a single
   bytes argument. It can thus be a file object opened for binary writing, an
   :class:`io.BytesIO` instance, or any other custom object that meets this
   interface.

.. function:: dumps(obj[, protocol])

   Return the pickled representation of the object as a :class:`bytes`
   object, instead of writing it to a file.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2, 3. The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

.. function:: load(file, [\*, encoding="ASCII", errors="strict"])

   Read a pickled object representation from the open file object *file* and
   return the reconstituted object hierarchy specified therein. This is
   equivalent to ``Unpickler(file).load()``.

   The protocol version of the pickle is detected automatically, so no protocol
   argument is needed. Bytes past the pickled object's representation are
   ignored.

   The argument *file* must have two methods, a :meth:`read` method that takes an
   integer argument, and a :meth:`readline` method that requires no arguments.
   Both methods should return bytes. Thus *file* can be a binary file object
   opened for reading, an :class:`io.BytesIO` object, or any other custom object
   that meets this interface.

   Optional keyword arguments are *encoding* and *errors*, which are used to
   decode 8-bit string instances pickled by Python 2.x. These default to
   'ASCII' and 'strict', respectively.

.. function:: loads(bytes_object, [\*, encoding="ASCII", errors="strict"])

   Read a pickled object hierarchy from a :class:`bytes` object and return the
   reconstituted object hierarchy specified therein.

   The protocol version of the pickle is detected automatically, so no protocol
   argument is needed. Bytes past the pickled object's representation are
   ignored.

   Optional keyword arguments are *encoding* and *errors*, which are used to
   decode 8-bit string instances pickled by Python 2.x. These default to
   'ASCII' and 'strict', respectively.
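The four convenience functions can be combined as in the following sketch,
which performs one round trip through a :class:`bytes` object and one through
a file opened in binary mode. The file name and the sample data are
placeholders invented for this example.

```python
import os
import pickle
import tempfile

data = {"key": "value", "numbers": (1, 2, 3)}  # hypothetical sample data

# In-memory round trip: dumps() returns bytes, loads() reads them back.
payload = pickle.dumps(data)
copy_from_bytes = pickle.loads(payload)

# File round trip: the file is opened in binary mode for writing and reading.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.pickle")  # placeholder file name
    with open(path, "wb") as f:
        pickle.dump(data, f)
    with open(path, "rb") as f:
        copy_from_file = pickle.load(f)

print(copy_from_bytes == data, copy_from_file == data)  # True True
```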
Georg Brandl116aa622007-08-15 14:28:22 +0000219
Georg Brandl116aa622007-08-15 14:28:22 +0000220
The :mod:`pickle` module defines three exceptions:

.. exception:: PickleError

   Common base class for the other pickling exceptions. It inherits
   :exc:`Exception`.

.. exception:: PicklingError

   Error raised when an unpicklable object is encountered by :class:`Pickler`.
   It inherits :exc:`PickleError`.

   Refer to :ref:`pickle-picklable` to learn what kinds of objects can be
   pickled.

.. exception:: UnpicklingError

   Error raised when there is a problem unpickling an object, such as data
   corruption or a security violation. It inherits :exc:`PickleError`.

   Note that other exceptions may also be raised during unpickling, including
   (but not necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
   :exc:`ImportError`, and :exc:`IndexError`.
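Because unpickling can fail with exceptions outside the :exc:`PickleError`
hierarchy, callers that want to tolerate damaged data usually catch a small
group of exceptions. The helpers below are a sketch of that idea;
``loads_or_default`` and ``dumps_or_none`` are hypothetical names, and the
exception lists are an assumption based on the note above rather than an
exhaustive set. Catching these errors only handles corrupt input; it does not
make loading untrusted data safe.

```python
import pickle

def loads_or_default(data, default=None):
    """Return the unpickled object, or *default* if *data* cannot be loaded."""
    try:
        return pickle.loads(data)
    except (pickle.UnpicklingError, AttributeError, EOFError,
            ImportError, IndexError):
        return default

def dumps_or_none(obj):
    """Return the pickle of *obj*, or None if *obj* cannot be pickled."""
    try:
        return pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return None

good = pickle.dumps([1, 2, 3])
print(loads_or_default(good))        # [1, 2, 3]
print(loads_or_default(good[:-1]))   # None: the stream is truncated
print(dumps_or_none(lambda: None))   # None: a lambda has no importable name
```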
244
245
The :mod:`pickle` module exports two classes, :class:`Pickler` and
:class:`Unpickler`:

.. class:: Pickler(file[, protocol])

   This takes a binary file for writing a pickle data stream.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2, 3. The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

   The *file* argument must have a :meth:`write` method that accepts a single
   bytes argument. It can thus be a file object opened for binary writing, an
   :class:`io.BytesIO` instance, or any other custom object that meets this
   interface.

   .. method:: dump(obj)

      Write a pickled representation of *obj* to the open file object given in
      the constructor.

   .. method:: persistent_id(obj)

      Do nothing by default. This exists so a subclass can override it.

      If :meth:`persistent_id` returns ``None``, *obj* is pickled as usual. Any
      other value causes :class:`Pickler` to emit the returned value as a
      persistent ID for *obj*. The meaning of this persistent ID should be
      defined by :meth:`Unpickler.persistent_load`. Note that the value
      returned by :meth:`persistent_id` cannot itself have a persistent ID.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. method:: clear_memo()

      Deprecated. Use the :meth:`clear` method on :attr:`memo` instead.
      Clears the pickler's memo, which is useful when reusing picklers.

   .. attribute:: fast

      Enable fast mode if set to a true value. The fast mode disables the usage
      of memo, therefore speeding the pickling process by not generating
      superfluous PUT opcodes. It should not be used with self-referential
      objects; doing otherwise will cause :class:`Pickler` to recurse
      infinitely.

      Use :func:`pickletools.optimize` if you need more compact pickles.

   .. attribute:: memo

      Dictionary holding previously pickled objects to allow shared or
      recursive objects to be pickled by reference as opposed to by value.


It is possible to make multiple calls to the :meth:`dump` method of the same
:class:`Pickler` instance. These must then be matched to the same number of
calls to the :meth:`load` method of the corresponding :class:`Unpickler`
instance. If the same object is pickled by multiple :meth:`dump` calls, the
:meth:`load` calls will all yield references to the same object.

Please note, this is intended for pickling multiple objects without intervening
modifications to the objects or their parts. If you modify an object and then
pickle it again using the same :class:`Pickler` instance, the object is not
pickled again --- a reference to it is pickled and the :class:`Unpickler` will
return the old value, not the modified one.
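The behaviour described in the two paragraphs above can be reproduced with an
in-memory stream. This is a small sketch using :class:`io.BytesIO` in place of
a real file; the list ``items`` is an arbitrary example object.

```python
import io
import pickle

buffer = io.BytesIO()
pickler = pickle.Pickler(buffer)

items = [1, 2]
pickler.dump(items)   # first call writes the list's contents
items.append(3)       # modification between the two dump() calls
pickler.dump(items)   # second call writes only a reference via the memo

buffer.seek(0)
unpickler = pickle.Unpickler(buffer)
first = unpickler.load()
second = unpickler.load()

print(first)            # [1, 2]  (the appended 3 was never written)
print(second is first)  # True
```

If each call needs to capture the object's current state, one option is to
create a fresh :class:`Pickler` per object, or to clear the memo between
calls.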
Georg Brandl116aa622007-08-15 14:28:22 +0000314
315
.. class:: Unpickler(file, [\*, encoding="ASCII", errors="strict"])

   This takes a binary file for reading a pickle data stream.

   The protocol version of the pickle is detected automatically, so no
   protocol argument is needed.

   The argument *file* must have two methods, a :meth:`read` method that takes an
   integer argument, and a :meth:`readline` method that requires no arguments.
   Both methods should return bytes. Thus *file* can be a binary file object
   opened for reading, an :class:`io.BytesIO` object, or any other custom object
   that meets this interface.

   Optional keyword arguments are *encoding* and *errors*, which are used to
   decode 8-bit string instances pickled by Python 2.x. These default to
   'ASCII' and 'strict', respectively.

   .. method:: load()

      Read a pickled object representation from the open file object given in
      the constructor, and return the reconstituted object hierarchy specified
      therein. Bytes past the pickled object's representation are ignored.

   .. method:: persistent_load(pid)

      Raise an :exc:`UnpicklingError` by default.

      If defined, :meth:`persistent_load` should return the object specified by
      the persistent ID *pid*. If an invalid persistent ID is encountered, an
      :exc:`UnpicklingError` should be raised.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. method:: find_class(module, name)

      Import *module* if necessary and return the object called *name* from it,
      where the *module* and *name* arguments are :class:`str` objects. Note
      that, despite its name, :meth:`find_class` is also used for finding
      functions.

      Subclasses may override this to gain control over what type of objects and
      how they can be loaded, potentially reducing security risks. Refer to
      :ref:`pickle-restrict` for details.
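A minimal sketch of overriding :meth:`find_class` follows. The class name
``RestrictedUnpickler``, the helper ``restricted_loads`` and the contents of
the allow-list are assumptions made for this illustration. An allow-list of
this kind narrows what a pickle may reference, but it should not be read as
making untrusted data safe to load.

```python
import builtins
import io
import pickle

# Hypothetical allow-list, chosen only for this sketch.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Permit a handful of harmless built-ins and refuse everything else.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))

def restricted_loads(data):
    """Counterpart of pickle.loads() that uses the restricted unpickler."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

print(restricted_loads(pickle.dumps([1, 2, range(15)])))
try:
    restricted_loads(pickle.dumps(print))  # builtins.print is not allowed
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```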
Alexandre Vassalotti5f3b63a2008-10-18 20:47:58 +0000359
360
.. _pickle-picklable:

What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, floating point numbers, complex numbers

* strings, bytes, bytearrays

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`__dict__` or the result of calling
  :meth:`__getstate__` is picklable (see section :ref:`pickle-protocol` for
  details)

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file. Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RuntimeError` will be
raised in this case. You can carefully raise this limit with
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value. This means that only the function name is
pickled, along with the name of the module the function is defined in. Neither
the function's code, nor any of its function attributes are pickled. Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [#]_

Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply. Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   import pickle

   class Foo:
       attr = 'A class attribute'

   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined in
the top level of a module.

Similarly, when class instances are pickled, their class's code and data are not
pickled along with them. Only the instance data are pickled. This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class. If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.
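One possible way to carry such a version number is sketched below. The
``Account`` class, the ``_version`` key and the assumed history (an older
layout that stored a float ``balance`` instead of integer cents) are all
invented for this illustration.

```python
import pickle

class Account:
    STATE_VERSION = 2  # hypothetical version counter for this sketch

    def __init__(self, owner, balance_cents):
        self.owner = owner
        self.balance_cents = balance_cents

    def __getstate__(self):
        # Store the instance data together with the layout version.
        state = self.__dict__.copy()
        state["_version"] = self.STATE_VERSION
        return state

    def __setstate__(self, state):
        version = state.pop("_version", 1)
        if version < 2:
            # Assumed history: version 1 stored a float "balance" in whole
            # currency units instead of an integer number of cents.
            state["balance_cents"] = round(state.pop("balance") * 100)
        self.__dict__.update(state)

current = pickle.loads(pickle.dumps(Account("alice", 1250)))

# Simulate state written by the assumed older layout.
legacy = Account.__new__(Account)
legacy.__setstate__({"owner": "bob", "balance": 12.5})

print(current.balance_cents, legacy.balance_cents)  # 1250 1250
```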
419
420
.. _pickle-protocol:

The pickle protocol
-------------------

This section describes the "pickling protocol" that defines the interface
between the pickler/unpickler and the objects that are being serialized. This
protocol provides a standard way for you to define, customize, and control how
your objects are serialized and de-serialized. The description in this section
doesn't cover specific customizations that you can employ to make the unpickling
environment slightly safer from untrusted pickle data streams; see section
:ref:`pickle-restrict` for more details.


.. _pickle-inst:

Pickling and unpickling normal class instances
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: __getinitargs__() (copy protocol)
   single: __init__() (instance constructor)

.. XXX is __getinitargs__ only used with old-style classes?
.. XXX update w.r.t Py3k's classes

When a pickled class instance is unpickled, its :meth:`__init__` method is
normally *not* invoked. If it is desirable that the :meth:`__init__` method be
called on unpickling, an old-style class can define a method
:meth:`__getinitargs__`, which should return a *tuple* containing the arguments
to be passed to the class constructor (:meth:`__init__` for example). The
:meth:`__getinitargs__` method is called at pickle time; the tuple it returns is
incorporated in the pickle for the instance.

.. index:: single: __getnewargs__() (copy protocol)

New-style types can provide a :meth:`__getnewargs__` method that is used for
protocol 2. Implementing this method is needed if the type establishes some
internal invariants when the instance is created, or if the memory allocation is
affected by the values passed to the :meth:`__new__` method for the type (as it
is for tuples and strings). Instances of a :term:`new-style class` :class:`C`
are created using ::

   obj = C.__new__(C, *args)


where *args* is the result of calling :meth:`__getnewargs__` on the original
object; if there is no :meth:`__getnewargs__`, an empty tuple is assumed.
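As an illustration, consider an immutable type whose data is fixed inside
:meth:`__new__`. The ``Point`` class below is a hypothetical example; its
:meth:`__getnewargs__` supplies the arguments that are handed back to
:meth:`__new__` when the instance is re-created.

```python
import pickle

class Point(tuple):
    """An immutable 2-D point whose data is fixed in __new__."""

    def __new__(cls, x, y):
        return super().__new__(cls, (x, y))

    def __getnewargs__(self):
        # These values are passed to Point.__new__(Point, x, y) on unpickling.
        return (self[0], self[1])

p = Point(3, 4)
q = pickle.loads(pickle.dumps(p, 2))  # protocol 2 uses __getnewargs__

print(type(q).__name__, q == p)  # Point True
```

The method is needed here because ``Point.__new__`` takes two coordinates,
whereas the arguments inherited from :class:`tuple` would pass a single
sequence.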
469
.. index::
   single: __getstate__() (copy protocol)
   single: __setstate__() (copy protocol)
   single: __dict__ (instance attribute)

Classes can further influence how their instances are pickled; if the class
defines the method :meth:`__getstate__`, it is called and the return state is
pickled as the contents for the instance, instead of the contents of the
instance's dictionary. If there is no :meth:`__getstate__` method, the
instance's :attr:`__dict__` is pickled.

Upon unpickling, if the class also defines the method :meth:`__setstate__`, it
is called with the unpickled state. [#]_ If there is no :meth:`__setstate__`
method, the pickled state must be a dictionary and its items are assigned to the
new instance's dictionary. If a class defines both :meth:`__getstate__` and
:meth:`__setstate__`, the state object needn't be a dictionary and these methods
can do what they want. [#]_

.. warning::

   If :meth:`__getstate__` returns a false value, the :meth:`__setstate__`
   method will not be called.
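A common use of this pair of methods is to leave out an attribute that cannot
be pickled and to rebuild it on loading. The ``Counter`` class below is a
hypothetical sketch of that pattern. Its state dictionary always contains
``value``, so it is never a false value and :meth:`__setstate__` is always
called.

```python
import pickle
import threading

class Counter:
    def __init__(self, start=0):
        self.value = start
        self._lock = threading.Lock()  # lock objects cannot be pickled

    def increment(self):
        with self._lock:
            self.value += 1

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["_lock"]             # leave the lock out of the pickle
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._lock = threading.Lock()  # recreate it; __init__ is not called

c = Counter(5)
c.increment()
d = pickle.loads(pickle.dumps(c))
d.increment()

print(c.value, d.value)  # 6 7
```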
Georg Brandl116aa622007-08-15 14:28:22 +0000492
493
Pickling and unpickling extension types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: __reduce__() (pickle protocol)
   single: __reduce_ex__() (pickle protocol)
   single: __safe_for_unpickling__ (pickle protocol)

When the :class:`Pickler` encounters an object of a type it knows nothing about
--- such as an extension type --- it looks in two places for a hint of how to
pickle it. One alternative is for the object to implement a :meth:`__reduce__`
method. If provided, at pickling time :meth:`__reduce__` will be called with no
arguments, and it must return either a string or a tuple.

If a string is returned, it names a global variable whose contents are pickled
as normal. The string returned by :meth:`__reduce__` should be the object's
local name relative to its module; the pickle module searches the module
namespace to determine the object's module.

When a tuple is returned, it must be between two and five elements long.
Optional elements can either be omitted, or ``None`` can be provided as their
value. The contents of this tuple are pickled as normal and used to
reconstruct the object at unpickling time. The semantics of each element are:

* A callable object that will be called to create the initial version of the
  object. The next element of the tuple will provide arguments for this callable,
  and later elements provide additional state information that will subsequently
  be used to fully reconstruct the pickled data.

  In the unpickling environment this object must be either a class, a callable
  registered as a "safe constructor" (see below), or it must have an attribute
  :attr:`__safe_for_unpickling__` with a true value. Otherwise, an
  :exc:`UnpicklingError` will be raised in the unpickling environment. Note that
  as usual, the callable itself is pickled by name.

* A tuple of arguments for the callable object, not ``None``.

* Optionally, the object's state, which will be passed to the object's
  :meth:`__setstate__` method as described in section :ref:`pickle-inst`. If the
  object has no :meth:`__setstate__` method, then, as above, the value must be a
  dictionary and it will be added to the object's :attr:`__dict__`.

* Optionally, an iterator (and not a sequence) yielding successive list items.
  These list items will be pickled, and appended to the object using either
  ``obj.append(item)`` or ``obj.extend(list_of_items)``. This is primarily used
  for list subclasses, but may be used by other classes as long as they have
  :meth:`append` and :meth:`extend` methods with the appropriate signature.
  (Whether :meth:`append` or :meth:`extend` is used depends on which pickle
  protocol version is used as well as the number of items to append, so both must
  be supported.)

* Optionally, an iterator (not a sequence) yielding successive dictionary items,
  which should be tuples of the form ``(key, value)``. These items will be
  pickled and stored to the object using ``obj[key] = value``. This is primarily
  used for dictionary subclasses, but may be used by other classes as long as they
  implement :meth:`__setitem__`.

It is sometimes useful to know the protocol version when implementing
:meth:`__reduce__`. This can be done by implementing a method named
:meth:`__reduce_ex__` instead of :meth:`__reduce__`. :meth:`__reduce_ex__`, when
it exists, is called in preference over :meth:`__reduce__` (you may still
provide :meth:`__reduce__` for backwards compatibility). The
:meth:`__reduce_ex__` method will be called with a single integer argument, the
protocol version.

The :class:`object` class implements both :meth:`__reduce__` and
:meth:`__reduce_ex__`; however, if a subclass overrides :meth:`__reduce__` but
not :meth:`__reduce_ex__`, the :meth:`__reduce_ex__` implementation detects this
and calls :meth:`__reduce__`.
563
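As a rough sketch of how the protocol argument arrives, the hypothetical
``Point`` class below records the protocol number in the state it returns. The
class and the ``pickled_with`` key are made up for the example; real code would
more likely use the number to choose between reduction strategies.

```python
import pickle


class Point:
    """Hypothetical class used only to show __reduce_ex__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __reduce_ex__(self, protocol):
        # 'protocol' is the integer protocol version chosen by the pickler.
        # Storing it in the state is purely for demonstration.
        return (Point, (self.x, self.y), {"pickled_with": protocol})


p = Point(1, 2)
with_proto_2 = pickle.loads(pickle.dumps(p, protocol=2))
with_highest = pickle.loads(pickle.dumps(p, protocol=pickle.HIGHEST_PROTOCOL))
```

Each restored object should report the protocol that was requested when it was
dumped.
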
An alternative to implementing a :meth:`__reduce__` method on the object to be
pickled is to register a reduction function for its type with the
:mod:`copyreg` module. This
module provides a way for programs to register "reduction functions" and
constructors for user-defined types. Reduction functions have the same
semantics and interface as the :meth:`__reduce__` method described above, except
that they are called with a single argument, the object to be pickled.

The registered constructor is deemed a "safe constructor" for purposes of
unpickling as described above.


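A minimal sketch of the registration approach follows. ``Celsius`` and
``reduce_celsius`` are hypothetical names; the only library call involved is
``copyreg.pickle()``, which associates a reduction function with a type.

```python
import copyreg
import pickle


class Celsius:
    """Hypothetical class that defines no __reduce__ of its own."""

    def __init__(self, degrees):
        self.degrees = degrees


def reduce_celsius(obj):
    # Same return convention as __reduce__, but the object to be pickled
    # is passed in as the single argument.
    return (Celsius, (obj.degrees,))


# Register the reduction function for the type.
copyreg.pickle(Celsius, reduce_celsius)

restored = pickle.loads(pickle.dumps(Celsius(21.5)))
```

This keeps the pickling logic outside the class, which can be convenient for
types whose source cannot be modified.
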
.. _pickle-persistent:

Pickling and unpickling external objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: persistent_id (pickle protocol)
   single: persistent_load (pickle protocol)

For the benefit of object persistence, the :mod:`pickle` module supports the
notion of a reference to an object outside the pickled data stream. Such
objects are referenced by a persistent ID, which should be either a string of
alphanumeric characters (for protocol 0) [#]_ or just an arbitrary object (for
any newer protocol).

The resolution of such persistent IDs is not defined by the :mod:`pickle`
module; it delegates this resolution to user-defined methods on the
pickler and unpickler, :meth:`persistent_id` and :meth:`persistent_load`
respectively.

To pickle objects that have an external persistent ID, the pickler must have a
custom :meth:`persistent_id` method that takes an object as an argument and
returns either ``None`` or the persistent ID for that object. When ``None`` is
returned, the pickler simply pickles the object as normal. When a persistent ID
is returned, the pickler pickles that ID in place of the object, along with a
marker so that the unpickler will recognize it as a persistent ID.

To unpickle external objects, the unpickler must have a custom
:meth:`persistent_load` method that takes a persistent ID object and returns the
referenced object.

Example:

.. XXX Work around for some bug in sphinx/pygments.
.. highlightlang:: python
.. literalinclude:: ../includes/dbpickle.py
.. highlightlang:: python3

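For a smaller, in-memory illustration of the same two hooks, consider the
sketch below. The ``Record`` class, the ``STORE`` dictionary standing in for
external storage, and the ``("Record", key)`` tuple used as the persistent ID
are all assumptions made for the example. A tuple ID relies on a protocol newer
than 0, which is what the default protocol provides.

```python
import io
import pickle


class Record:
    """Hypothetical object that lives in external storage."""

    def __init__(self, key, payload):
        self.key = key
        self.payload = payload


STORE = {}  # hypothetical stand-in for a database


class StorePickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return a persistent ID for external objects, None for the rest.
        if isinstance(obj, Record):
            return ("Record", obj.key)
        return None


class StoreUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Resolve the persistent ID back to the externally stored object.
        kind, key = pid
        if kind == "Record":
            return STORE[key]
        raise pickle.UnpicklingError("unsupported persistent object")


record = Record("k1", "some payload")
STORE[record.key] = record

buffer = io.BytesIO()
StorePickler(buffer).dump(["header", record])
buffer.seek(0)
restored = StoreUnpickler(buffer).load()
```

Only the ID travels through the pickle, so ``restored[1]`` should be the very
object held in ``STORE`` rather than a copy.
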
.. _pickle-restrict:

Restricting Globals
^^^^^^^^^^^^^^^^^^^

.. index::
   single: find_class() (pickle protocol)

By default, unpickling will import any class or function that it finds in the
pickle data. For many applications, this behaviour is unacceptable as it
permits the unpickler to import and invoke arbitrary code. Just consider what
this hand-crafted pickle data stream does when loaded::

   >>> import pickle
   >>> pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
   hello world
   0

In this example, the unpickler imports the :func:`os.system` function and then
applies it to the string argument "echo hello world". Although this example is
inoffensive, it is not difficult to imagine one that could damage your system.

For this reason, you may want to control what gets unpickled by customizing
:meth:`Unpickler.find_class`. Contrary to what its name suggests,
:meth:`find_class` is called whenever a global (i.e., a class or a function) is
requested. Thus it is possible to either forbid globals completely or restrict
them to a safe subset.

Here is an example of an unpickler allowing only a few safe classes from the
:mod:`builtins` module to be loaded::

   import builtins
   import io
   import pickle

   safe_builtins = {
       'range',
       'complex',
       'set',
       'frozenset',
       'slice',
   }

   class RestrictedUnpickler(pickle.Unpickler):
       def find_class(self, module, name):
           # Only allow safe classes from builtins.
           if module == "builtins" and name in safe_builtins:
               return getattr(builtins, name)
           # Forbid everything else.
           raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                        (module, name))

   def restricted_loads(s):
       """Helper function analogous to pickle.loads()."""
       return RestrictedUnpickler(io.BytesIO(s)).load()

A sample usage of our unpickler working as intended::

   >>> restricted_loads(pickle.dumps([1, 2, range(15)]))
   [1, 2, range(0, 15)]
   >>> restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
   Traceback (most recent call last):
     ...
   pickle.UnpicklingError: global 'os.system' is forbidden
   >>> restricted_loads(b'cbuiltins\neval\n'
   ...                  b'(S\'getattr(__import__("os"), "system")'
   ...                  b'("echo hello world")\'\ntR.')
   Traceback (most recent call last):
     ...
   pickle.UnpicklingError: global 'builtins.eval' is forbidden

As our examples show, you have to be careful with what you allow to
be unpickled. Therefore, if security is a concern, you may want to consider
alternatives such as the marshalling API in :mod:`xmlrpc.client` or
third-party solutions.

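The same technique extends to more than one module. The following variation is
only a sketch: it keys the allow-list on ``(module, name)`` pairs, and the
names ``ALLOWED_GLOBALS``, ``AllowListUnpickler`` and ``allow_list_loads`` as
well as the particular entries are illustrative choices. What is safe to allow
depends entirely on the application.

```python
import importlib
import io
import pickle

# Hypothetical allow-list; every entry must be reviewed for safety.
ALLOWED_GLOBALS = {
    ("builtins", "range"),
    ("builtins", "complex"),
    ("collections", "OrderedDict"),
}


class AllowListUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Resolve only globals that are explicitly listed.
        if (module, name) in ALLOWED_GLOBALS:
            return getattr(importlib.import_module(module), name)
        # Forbid everything else.
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                     (module, name))


def allow_list_loads(data):
    """Helper function analogous to pickle.loads()."""
    return AllowListUnpickler(io.BytesIO(data)).load()
```

With this allow-list, a pickled ``collections.OrderedDict`` loads normally,
while any global that is not listed raises :exc:`UnpicklingError`.
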
.. _pickle-example:

Example
-------

For the simplest code, use the :func:`dump` and :func:`load` functions. Note
that a self-referencing list is pickled and restored correctly. ::

   import pickle

   data1 = {'a': [1, 2.0, 3, 4+6j],
            'b': ("string", "string using Unicode features \u0394"),
            'c': None}

   selfref_list = [1, 2, 3]
   selfref_list.append(selfref_list)

   output = open('data.pkl', 'wb')

   # Pickle dictionary using protocol 2.
   pickle.dump(data1, output, 2)

   # Pickle the list using the highest protocol available.
   pickle.dump(selfref_list, output, -1)

   output.close()

The following example reads the resulting pickled data. Always open a
pickle-containing file in binary mode: pickle data is a byte stream regardless
of which protocol produced it, including the ASCII-based protocol 0. ::

   import pprint, pickle

   pkl_file = open('data.pkl', 'rb')

   data1 = pickle.load(pkl_file)
   pprint.pprint(data1)

   data2 = pickle.load(pkl_file)
   pprint.pprint(data2)

   pkl_file.close()

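When the number of objects stored in a file is not known in advance, one
possible approach is to keep calling :func:`load` until it raises
:exc:`EOFError`. The helper name ``load_all`` and the temporary file used below
are assumptions made for this sketch.

```python
import os
import pickle
import tempfile


def load_all(path):
    """Yield every object pickled into the file at path, in order."""
    with open(path, "rb") as pkl_file:
        while True:
            try:
                yield pickle.load(pkl_file)
            except EOFError:
                # No more pickles in the file.
                return


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pkl")
    with open(path, "wb") as output:
        pickle.dump({"a": 1}, output)
        pickle.dump([1, 2, 3], output)
    restored = list(load_all(path))
```

Using ``with`` blocks also ensures that the files are closed even if pickling or
unpickling fails part way through.
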
Here's a larger example that shows how to modify pickling behavior for a class.
The :class:`TextReader` class opens a text file, and returns the line number and
line contents each time its :meth:`readline` method is called. If a
:class:`TextReader` instance is pickled, all attributes *except* the file object
member are saved. When the instance is unpickled, the file is reopened, and
reading resumes from the last location. The :meth:`__setstate__` and
:meth:`__getstate__` methods are used to implement this behavior. ::

   #!/usr/local/bin/python

   class TextReader:
       """Print and number lines in a text file."""
       def __init__(self, file):
           self.file = file
           self.fh = open(file)
           self.lineno = 0

       def readline(self):
           self.lineno = self.lineno + 1
           line = self.fh.readline()
           if not line:
               return None
           if line.endswith("\n"):
               line = line[:-1]
           return "%d: %s" % (self.lineno, line)

       def __getstate__(self):
           odict = self.__dict__.copy() # copy the dict since we change it
           del odict['fh']              # remove filehandle entry
           return odict

       def __setstate__(self, dict):
           fh = open(dict['file'])      # reopen file
           count = dict['lineno']       # read from file...
           while count:                 # until line count is restored
               fh.readline()
               count = count - 1
           self.__dict__.update(dict)   # update attributes
           self.fh = fh                 # save the file object

A sample usage might be something like this::

   >>> import TextReader
   >>> obj = TextReader.TextReader("TextReader.py")
   >>> obj.readline()
   '1: #!/usr/local/bin/python'
   >>> obj.readline()
   '2: '
   >>> obj.readline()
   '3: class TextReader:'
   >>> import pickle
   >>> pickle.dump(obj, open('save.p', 'wb'))

If you want to see that :mod:`pickle` works across Python processes, start
another Python session before continuing. What follows can happen from either
the same process or a new process. ::

   >>> import pickle
   >>> reader = pickle.load(open('save.p', 'rb'))
   >>> reader.readline()
   '4:     """Print and number lines in a text file."""'


.. seealso::

   Module :mod:`copyreg`
      Pickle interface constructor registration for extension types.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.


.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError` but it could be something else.

.. [#] These methods can also be used to implement copying class instances.

.. [#] This protocol is also used by the shallow and deep copying operations
   defined in the :mod:`copy` module.

.. [#] The limitation on alphanumeric characters is due to the fact
   that persistent IDs, in protocol 0, are delimited by the newline
   character. Therefore if any kind of newline character occurs in
   persistent IDs, the resulting pickle will become unreadable.