:mod:`pickle` --- Python object serialization
=============================================

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.
.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>.
.. sectionauthor:: Barry Warsaw <barry@zope.com>

The :mod:`pickle` module implements a fundamental but powerful algorithm for
serializing and de-serializing a Python object structure. "Pickling" is the
process whereby a Python object hierarchy is converted into a byte stream, and
"unpickling" is the inverse operation, whereby a byte stream is converted back
into an object hierarchy. Pickling (and unpickling) is alternatively known as
"serialization", "marshalling," [#]_ or "flattening"; however, to avoid
confusion, the terms used here are "pickling" and "unpickling".
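
As a minimal sketch of the two operations just described, the module-level
:func:`dumps` and :func:`loads` functions (documented under *Module Interface*
below) can round-trip a small object hierarchy. The ``record`` value is an
arbitrary example, not part of the module:

```python
import pickle

# An arbitrary example hierarchy: a dict holding a list, a tuple and a set.
record = {"name": "widget", "sizes": [1, 2, 3], "origin": (0.0, 0.0), "tags": {"a", "b"}}

# Pickling: object hierarchy -> byte stream.
data = pickle.dumps(record)

# Unpickling: byte stream -> an equal, but distinct, object hierarchy.
restored = pickle.loads(data)

print(type(data).__name__)  # bytes
print(restored == record)   # True
print(restored is record)   # False
```

The restored hierarchy compares equal to the original but is a new object:
pickling copies structure, not identity.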


Relationship to other Python modules
------------------------------------

The :mod:`pickle` module has a transparent optimizer (:mod:`_pickle`) written
in C. It is used whenever available; otherwise, the pure Python implementation
is used.

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should always be the preferred way to serialize Python
objects. :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing. Recursive
  objects are objects that contain references to themselves. These are not
  handled by :mod:`marshal`, and in fact, attempting to marshal recursive objects
  will crash your Python interpreter. Object sharing happens when there are
  multiple references to the same object in different places in the object
  hierarchy being serialized. :mod:`pickle` stores such objects only once, and
  ensures that all other references point to the master copy. Shared objects
  remain shared, which can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances. :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module
  as when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions. Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in backward-incompatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases.
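
The sharing and recursion behaviour described in the first point can be
observed directly. This is a small sketch using only :func:`dumps` and
:func:`loads`; the variable names are illustrative, and :mod:`marshal` is
deliberately not exercised here:

```python
import pickle

# Object sharing: one list referenced twice from the outer list.
shared = [1, 2]
outer = [shared, shared]
restored_outer = pickle.loads(pickle.dumps(outer))
print(restored_outer[0] is restored_outer[1])  # True: still one shared list

# Because the list is shared, a change made through one reference shows up in the other.
restored_outer[0].append(3)
print(restored_outer[1])                       # [1, 2, 3]

# Recursive object: a list that contains itself.
loop = []
loop.append(loop)
loop_copy = pickle.loads(pickle.dumps(loop))
print(loop_copy[0] is loop_copy)               # True
```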

.. warning::

   The :mod:`pickle` module is not intended to be secure against erroneous or
   maliciously constructed data. Never unpickle data received from an untrusted or
   unauthenticated source.

Note that serialization is a more primitive notion than persistence; although
:mod:`pickle` reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent
access to persistent objects. The :mod:`pickle` module can transform a complex
object into a byte stream and it can transform the byte stream into an object
with the same internal structure. Perhaps the most obvious thing to do with
these byte streams is to write them to a file, but it is also conceivable to
send them across a network or store them in a database. The module
:mod:`shelve` provides a simple interface to pickle and unpickle objects on
DBM-style database files.


Data stream format
------------------

.. index::
   single: XDR
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific. This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however, it means that non-Python
programs may not be able to reconstruct pickled Python objects.

By default, the :mod:`pickle` data format uses a compact binary representation.
The module :mod:`pickletools` contains tools for analyzing data streams
generated by :mod:`pickle`.
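
For instance, :func:`pickletools.dis` prints a symbolic listing of the opcodes
in a pickle. The sketch below captures that listing through the function's
*out* parameter; protocol 2 is passed explicitly so the listing does not depend
on the interpreter's default protocol (the exact text of the listing may vary
between Python versions):

```python
import io
import pickle
import pickletools

data = pickle.dumps([1, "two"], protocol=2)

# pickletools.dis() writes one line per opcode to `out` (sys.stdout by default);
# a StringIO is used here so the listing can be inspected as a string.
listing = io.StringIO()
pickletools.dis(data, out=listing)
print(listing.getvalue())
```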

There are currently 4 different protocols which can be used for pickling.

* Protocol version 0 is the original human-readable protocol and is
  backwards compatible with earlier versions of Python.

* Protocol version 1 is the old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3. It provides much more
  efficient pickling of :term:`new-style class`\es.

* Protocol version 3 was added in Python 3.0. It has explicit support for
  :class:`bytes` objects and cannot be unpickled by Python 2.x pickle modules.
  This is the currently recommended protocol; use it whenever possible.

Refer to :pep:`307` for information about improvements brought by
protocol 2. See :mod:`pickletools`'s source code for extensive
comments about opcodes used by pickle protocols.
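
The differences between protocols show up in the raw bytes. The following
sketch assumes a Python 3 interpreter; later releases added protocols beyond
the four listed here, so the loop simply runs up to :data:`HIGHEST_PROTOCOL`.
It pickles the same value under every available protocol, confirms that
protocol 0 output is readable ASCII, and shows one detail of the binary
format: protocols 2 and newer start with a ``PROTO`` opcode (byte ``0x80``)
followed by the version number:

```python
import pickle

value = {"answer": 42, "items": [1, 2, 3]}

for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    data = pickle.dumps(value, protocol=proto)
    # Whatever protocol wrote the data, loads() detects it automatically.
    assert pickle.loads(data) == value
    print(proto, len(data), data[:2])

# Protocol 0 output is plain ASCII text...
print(pickle.dumps(value, protocol=0).isascii())            # True
# ...while protocol 2 output starts with PROTO (0x80) and the version byte.
print(pickle.dumps(value, protocol=2)[:2] == b"\x80\x02")  # True
```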


Module Interface
----------------

To serialize an object hierarchy, you first create a pickler, then you call the
pickler's :meth:`dump` method. To de-serialize a data stream, you first create
an unpickler, then you call the unpickler's :meth:`load` method. The
:mod:`pickle` module provides the following constants:


.. data:: HIGHEST_PROTOCOL

   The highest protocol version available. This value can be passed as a
   *protocol* value.

.. data:: DEFAULT_PROTOCOL

   The default protocol used for pickling. May be less than
   :data:`HIGHEST_PROTOCOL`. Currently the default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.
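
A short sketch of how the two constants relate. Their exact values depend on
the Python version (both have been raised in releases later than the one this
text describes), so the code only checks relationships, including the rule,
stated for :func:`dump` and :func:`dumps` below, that a negative *protocol*
selects the highest one:

```python
import pickle

# The default never exceeds the highest available protocol.
print(pickle.DEFAULT_PROTOCOL <= pickle.HIGHEST_PROTOCOL)  # True

obj = ["spam", 3.5, None]
# A negative protocol selects the highest protocol, so both pickles are identical.
same = pickle.dumps(obj, protocol=-1) == pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
print(same)  # True
```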


The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:

.. function:: dump(obj, file[, protocol])

   Write a pickled representation of *obj* to the open file object *file*. This
   is equivalent to ``Pickler(file, protocol).dump(obj)``.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2, 3. The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

   The *file* argument must have a :meth:`write` method that accepts a single
   :class:`bytes` argument. It can thus be a file object opened for binary
   writing, an :class:`io.BytesIO` instance, or any other custom object that
   meets this interface.
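
Because only a :meth:`write` method is required, *file* need not be a real
file. In this sketch ``ChunkCollector`` is a hypothetical class written for
illustration; it records each chunk the pickler hands it, and the joined chunks
are then read back with :func:`load`:

```python
import io
import pickle

class ChunkCollector:
    """Hypothetical sink: anything with a write(bytes) method is accepted by dump()."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        # Called by the pickler with successive pieces of the pickle stream.
        self.chunks.append(bytes(data))
        return len(data)

sink = ChunkCollector()
pickle.dump({"id": 7}, sink)

# The collected chunks form an ordinary pickle that load() can read back.
payload = b"".join(sink.chunks)
print(pickle.load(io.BytesIO(payload)))  # {'id': 7}
```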

.. function:: dumps(obj[, protocol])

   Return the pickled representation of *obj* as a :class:`bytes`
   object, instead of writing it to a file.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2, 3. The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

.. function:: load(file, [\*, encoding="ASCII", errors="strict"])

   Read a pickled object representation from the open file object *file* and
   return the reconstituted object hierarchy specified therein. This is
   equivalent to ``Unpickler(file).load()``.

   The protocol version of the pickle is detected automatically, so no protocol
   argument is needed. Bytes past the pickled object's representation are
   ignored.

   The argument *file* must have two methods, a :meth:`read` method that takes
   an integer argument, and a :meth:`readline` method that requires no
   arguments. Both methods should return bytes. Thus *file* can be a binary
   file object opened for reading, an :class:`io.BytesIO` object, or any other
   custom object that meets this interface.

   Optional keyword arguments are *encoding* and *errors*, which are used to
   decode 8-bit string instances pickled by Python 2.x. These default to
   'ASCII' and 'strict', respectively.
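
Similarly, *file* only needs :meth:`read` and :meth:`readline`. The
hypothetical ``MinimalReader`` below wraps an :class:`io.BytesIO` and exposes
just those two methods; the stream deliberately carries extra bytes after the
pickle to show that they are ignored:

```python
import io
import pickle

class MinimalReader:
    """Hypothetical reader exposing only read(n) and readline(), both returning bytes."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, n=-1):
        return self._buf.read(n)

    def readline(self):
        return self._buf.readline()

# Everything after the pickle's final STOP opcode is left alone by load().
stream = pickle.dumps([1, 2, 3]) + b"trailing garbage"
print(pickle.load(MinimalReader(stream)))  # [1, 2, 3]
```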

.. function:: loads(bytes_object, [\*, encoding="ASCII", errors="strict"])

   Read a pickled object hierarchy from a :class:`bytes` object and return the
   reconstituted object hierarchy specified therein.

   The protocol version of the pickle is detected automatically, so no protocol
   argument is needed. Bytes past the pickled object's representation are
   ignored.

   Optional keyword arguments are *encoding* and *errors*, which are used to
   decode 8-bit string instances pickled by Python 2.x. These default to
   'ASCII' and 'strict', respectively.
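
A sketch of the *encoding* argument at work. It assumes a hand-written
protocol 0 pickle modelled on what Python 2 would emit for the 8-bit string
``'caf\xe9'`` (a ``STRING`` opcode, a ``PUT`` and ``STOP``), so no Python 2
interpreter is needed to run it. The value ``"bytes"`` for *encoding*, which
keeps such strings as :class:`bytes`, is accepted by current Python 3 releases
but may not be available in the version this text describes:

```python
import pickle

# Hand-written protocol 0 pickle modelled on Python 2's output for 'caf\xe9':
# S'...'  pushes an 8-bit string, p0 memoizes it, and . is STOP.
py2_data = b"S'caf\\xe9'\np0\n."

# The default encoding="ASCII" cannot decode the 0xE9 byte.
try:
    pickle.loads(py2_data)
except UnicodeDecodeError as exc:
    print("ASCII failed:", exc.reason)

print(pickle.loads(py2_data, encoding="latin-1"))  # café
print(pickle.loads(py2_data, encoding="bytes"))    # b'caf\xe9'
```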


The :mod:`pickle` module defines three exceptions:

.. exception:: PickleError

   Common base class for the other pickling exceptions. It inherits
   :exc:`Exception`.

.. exception:: PicklingError

   Error raised when an unpicklable object is encountered by :class:`Pickler`.
   It inherits :exc:`PickleError`.

   Refer to :ref:`pickle-picklable` to learn what kinds of objects can be
   pickled.

.. exception:: UnpicklingError

   Error raised when there is a problem unpickling an object, such as data
   corruption or a security violation. It inherits :exc:`PickleError`.

   Note that other exceptions may also be raised during unpickling, including
   (but not necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
   :exc:`ImportError`, and :exc:`IndexError`.
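
The following sketch triggers each case in turn. It relies on CPython's
behaviour: a module-level ``lambda`` cannot be looked up again by its name, the
C accelerator reports the invalid opcode byte ``0xff`` as
:exc:`UnpicklingError`, and empty input surfaces as :exc:`EOFError`, one of the
other exceptions mentioned above:

```python
import pickle

# All three exceptions share PickleError as their base class.
print(issubclass(pickle.PicklingError, pickle.PickleError))    # True
print(issubclass(pickle.UnpicklingError, pickle.PickleError))  # True

caught = []

# A lambda's name is "<lambda>", which cannot be found in its module.
square = lambda x: x * x
try:
    pickle.dumps(square)
except pickle.PicklingError as exc:
    caught.append("PicklingError")
    print("PicklingError:", exc)

# 0xFF is not a valid pickle opcode.
try:
    pickle.loads(b"\xff")
except pickle.UnpicklingError as exc:
    caught.append("UnpicklingError")
    print("UnpicklingError:", exc)

# Empty input is reported as EOFError rather than UnpicklingError.
try:
    pickle.loads(b"")
except EOFError as exc:
    caught.append("EOFError")
    print("EOFError:", exc)
```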


The :mod:`pickle` module exports two classes, :class:`Pickler` and
:class:`Unpickler`:

.. class:: Pickler(file[, protocol])

   This takes a binary file for writing a pickle data stream.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2, 3. The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

   The *file* argument must have a :meth:`write` method that accepts a single
   :class:`bytes` argument. It can thus be a file object opened for binary
   writing, an :class:`io.BytesIO` instance, or any other custom object that
   meets this interface.

   .. method:: dump(obj)

      Write a pickled representation of *obj* to the open file object given in
      the constructor.

   .. method:: persistent_id(obj)

      Do nothing by default. This exists so a subclass can override it.

      If :meth:`persistent_id` returns ``None``, *obj* is pickled as usual. Any
      other value causes :class:`Pickler` to emit the returned value as a
      persistent ID for *obj*. The meaning of this persistent ID should be
      defined by :meth:`Unpickler.persistent_load`. Note that the value
      returned by :meth:`persistent_id` cannot itself have a persistent ID.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. method:: clear_memo()

      Clear the pickler's memo, which is useful when reusing picklers.
      Deprecated: use the :meth:`clear` method on :attr:`memo` instead.

   .. attribute:: fast

      Deprecated. Enable fast mode if set to a true value. The fast mode
      disables the usage of memo, therefore speeding the pickling process by not
      generating superfluous PUT opcodes. It should not be used with
      self-referential objects; doing otherwise will cause :class:`Pickler` to
      recurse infinitely.

      Use :func:`pickletools.optimize` if you need more compact pickles.

   .. attribute:: memo

      Dictionary holding previously pickled objects to allow shared or
      recursive objects to be pickled by reference as opposed to by value.


.. XXX Move these comments to somewhere more appropriate.

It is possible to make multiple calls to the :meth:`dump` method of the same
:class:`Pickler` instance. These must then be matched to the same number of
calls to the :meth:`load` method of the corresponding :class:`Unpickler`
instance. If the same object is pickled by multiple :meth:`dump` calls, the
corresponding :meth:`load` calls will all yield references to the same object.

Please note, this is intended for pickling multiple objects without intervening
modifications to the objects or their parts. If you modify an object and then
pickle it again using the same :class:`Pickler` instance, the object is not
pickled again; a reference to it is pickled, and the :class:`Unpickler` will
return the old value, not the modified one.
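
The memo behaviour just described can be seen with a single :class:`Pickler`
and :class:`Unpickler` sharing an :class:`io.BytesIO` buffer. This is a
minimal sketch; the list and variable names are illustrative:

```python
import io
import pickle

buf = io.BytesIO()
pickler = pickle.Pickler(buf)

data = [1, 2, 3]
pickler.dump(data)  # first dump: the list is pickled by value and memoized
data.append(4)      # modify it between dumps...
pickler.dump(data)  # ...but the second dump only emits a reference into the memo

buf.seek(0)
unpickler = pickle.Unpickler(buf)
first = unpickler.load()
second = unpickler.load()  # one load() per dump(), on the same Unpickler

print(first is second)  # True: both loads yield the same object
print(second)           # [1, 2, 3]: the modification was not recorded
```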


.. class:: Unpickler(file, [\*, encoding="ASCII", errors="strict"])

   This takes a binary file for reading a pickle data stream.

   The protocol version of the pickle is detected automatically, so no
   protocol argument is needed.

   The argument *file* must have two methods, a :meth:`read` method that takes
   an integer argument, and a :meth:`readline` method that requires no
   arguments. Both methods should return bytes. Thus *file* can be a binary
   file object opened for reading, an :class:`io.BytesIO` object, or any other
   custom object that meets this interface.

   Optional keyword arguments are *encoding* and *errors*, which are used to
   decode 8-bit string instances pickled by Python 2.x. These default to
   'ASCII' and 'strict', respectively.

   .. method:: load()

      Read a pickled object representation from the open file object given in
      the constructor, and return the reconstituted object hierarchy specified
      therein. Bytes past the pickled object's representation are ignored.

   .. method:: persistent_load(pid)

      Raise an :exc:`UnpicklingError` by default.

      If defined, :meth:`persistent_load` should return the object specified by
      the persistent ID *pid*. If an invalid persistent ID is encountered, an
      :exc:`UnpicklingError` should be raised.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. method:: find_class(module, name)

      Import *module* if necessary and return the object called *name* from it,
      where the *module* and *name* arguments are :class:`str` objects. Note
      that, despite its name, :meth:`find_class` is also used for finding
      functions.

      Subclasses may override this to gain control over what types of objects
      can be loaded and how, potentially reducing security risks. Refer to
      :ref:`pickle-restrict` for details.
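
As an illustration of overriding :meth:`find_class`, here is a sketch of an
unpickler that only resolves globals from an explicit allow-list.
``SAFE_GLOBALS``, ``RestrictedUnpickler`` and ``restricted_loads`` are
hypothetical names chosen for this example, and which globals are safe to
allow is an application decision; an allow-list reduces risk but is not by
itself a guarantee of safety:

```python
import io
import pickle

# Hypothetical allow-list: only these (module, name) globals may be loaded.
SAFE_GLOBALS = {("builtins", "range"), ("builtins", "complex")}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        # Refuse everything else, so a crafted pickle cannot reach arbitrary callables.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    """Like pickle.loads(), but resolving globals through RestrictedUnpickler."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

print(restricted_loads(pickle.dumps(range(3))))  # range(0, 3)

try:
    restricted_loads(pickle.dumps(print))  # the pickle references builtins.print
except pickle.UnpicklingError as exc:
    print(exc)
```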


.. _pickle-picklable:

What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, floating point numbers, complex numbers

* strings, bytes, bytearrays

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`__dict__` or :meth:`__setstate__` is
  picklable (see section :ref:`pickle-inst` for details)

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file. Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RuntimeError` will be
raised in this case. You can carefully raise this limit with
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value. This means that only the function name is
pickled, along with the name of the module the function is defined in. Neither
the function's code nor any of its function attributes is pickled. Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [#]_

Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply. Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   import pickle

   class Foo:
       attr = 'A class attribute'

   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined at
the top level of a module.
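
A minimal sketch of pickling by reference: the pickled form of a top-level
function or class holds its module and name, so unpickling returns the very
same object and none of the class body travels with it. ``greet`` and ``Foo``
are example definitions, and the check for the name inside the byte stream
relies on how current protocols encode strings:

```python
import pickle

def greet():
    return "hello"

class Foo:
    attr = "A class attribute"

# Only qualified names are stored; unpickling looks them up again.
func_data = pickle.dumps(greet)
class_data = pickle.dumps(Foo)

print(pickle.loads(func_data) is greet)    # True: the very same function object
print(pickle.loads(class_data) is Foo)     # True
print(b"greet" in func_data)               # True: the name is in the stream...
print(b"A class attribute" in class_data)  # False: ...but the class data is not
```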

Similarly, when class instances are pickled, their class's code and data are not
pickled along with them. Only the instance data are pickled. This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class. If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.
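
One way to apply the version-number advice, sketched with a hypothetical
``Point`` class whose second version added a ``z`` attribute. The ``_version``
key and the upgrade rule are illustrative choices, not something :mod:`pickle`
prescribes:

```python
import pickle

class Point:
    """Hypothetical class; version 2 added the ``z`` attribute."""

    VERSION = 2

    def __init__(self, x, y, z=0):
        self.x, self.y, self.z = x, y, z

    def __getstate__(self):
        # Store the class version alongside the instance data.
        state = self.__dict__.copy()
        state["_version"] = self.VERSION
        return state

    def __setstate__(self, state):
        # State saved before versioning existed carries no "_version" key.
        state = dict(state)
        version = state.pop("_version", 1)
        if version < 2:
            state.setdefault("z", 0)  # upgrade: version 1 had no z
        self.__dict__.update(state)

new_point = pickle.loads(pickle.dumps(Point(1, 2, 3)))
print(new_point.x, new_point.y, new_point.z)  # 1 2 3

# Simulate restoring state saved by version 1 of the class.
old_point = Point.__new__(Point)
old_point.__setstate__({"x": 1, "y": 2})
print(old_point.z)  # 0
```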


.. _pickle-inst:

Pickling Class Instances
------------------------

In this section, we describe the general mechanisms available to you to define,
customize, and control how class instances are pickled and unpickled.

In most cases, no additional code is needed to make instances picklable. By
default, pickle will retrieve the class and the attributes of an instance via
introspection. When a class instance is unpickled, its :meth:`__init__` method
is usually *not* invoked. The default behaviour first creates an uninitialized
instance and then restores the saved attributes. The following code shows an
implementation of this behaviour::

   def save(obj):
       return (obj.__class__, obj.__dict__)

   def load(cls, attributes):
       obj = cls.__new__(cls)
       obj.__dict__.update(attributes)
       return obj
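
The claim that :meth:`__init__` is usually not invoked can be checked with a
hypothetical class that counts its own initializations:

```python
import pickle

class Counted:
    # Hypothetical class that counts how often __init__ runs.
    init_calls = 0

    def __init__(self, value):
        Counted.init_calls += 1
        self.value = value

original = Counted(10)
restored = pickle.loads(pickle.dumps(original))

print(restored.value)      # 10: attributes come back from the pickled __dict__
print(Counted.init_calls)  # 1: __init__ ran for `original` only, not on unpickling
```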

.. index:: single: __getnewargs__() (copy protocol)

Classes can alter the default behaviour by providing one or several special
methods. In protocol 2 and newer, classes that implement the
:meth:`__getnewargs__` method can dictate the values passed to the
:meth:`__new__` method upon unpickling. This is often needed for classes
whose :meth:`__new__` method requires arguments.
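
A minimal sketch, assuming a hypothetical ``Token`` class whose
:meth:`__new__` takes a required argument. Without :meth:`__getnewargs__`,
unpickling under protocol 2 or newer would call ``Token.__new__(Token)`` with
no arguments and fail with a :exc:`TypeError`; with it, the saved text is
passed back to :meth:`__new__`:

```python
import pickle

class Token:
    """Hypothetical class whose __new__ requires an argument."""

    def __new__(cls, text):
        obj = super().__new__(cls)
        obj.text = text
        return obj

    def __getnewargs__(self):
        # Tell pickle which arguments to pass to __new__ when recreating the instance.
        return (self.text,)

tok = pickle.loads(pickle.dumps(Token("abc"), protocol=2))
print(type(tok).__name__, tok.text)  # Token abc
```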

.. index:: single: __getstate__() (copy protocol)

Classes can further influence how their instances are pickled; if the class
defines the method :meth:`__getstate__`, it is called and the returned object is
pickled as the contents for the instance, instead of the contents of the
instance's dictionary. If the :meth:`__getstate__` method is absent, the
instance's :attr:`__dict__` is pickled as usual.

.. index:: single: __setstate__() (copy protocol)

Upon unpickling, if the class defines :meth:`__setstate__`, it is called with
the unpickled state. In that case, there is no requirement for the state object
to be a dictionary. Otherwise, the pickled state must be a dictionary and its
items are assigned to the new instance's dictionary.

.. note::

   If :meth:`__getstate__` returns a false value, the :meth:`__setstate__`
   method will not be called.

Refer to the section :ref:`pickle-state` for more information about how to use
the methods :meth:`__getstate__` and :meth:`__setstate__`.
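
A common use of the two methods is to leave out state that cannot be pickled
and rebuild it on load. The hypothetical ``Counter`` class below holds a lock
created by :func:`threading.Lock`, which pickle refuses (current CPython raises
:exc:`TypeError`), so :meth:`__getstate__` drops it and :meth:`__setstate__`
creates a fresh one:

```python
import pickle
import threading

class Counter:
    """Hypothetical class holding state that cannot be pickled (a lock)."""

    def __init__(self):
        self.count = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self.count += 1

    def __getstate__(self):
        # Copy the instance dict and drop the unpicklable lock.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the saved attributes, then recreate a fresh lock.
        self.__dict__.update(state)
        self._lock = threading.Lock()

c = Counter()
c.increment()
restored = pickle.loads(pickle.dumps(c))
restored.increment()
print(restored.count)  # 2
```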

.. index::
   pair: copy; protocol
   single: __reduce__() (copy protocol)

As we shall see, pickle does not use the methods described above directly. In
fact, these methods are part of the copy protocol which implements the
:meth:`__reduce__` special method. The copy protocol provides a unified
interface for retrieving the data necessary for pickling and copying
objects. [#]_

Although powerful, implementing :meth:`__reduce__` directly in your classes is
error prone. For this reason, class designers should use the high-level
interface (i.e., :meth:`__getnewargs__`, :meth:`__getstate__` and
:meth:`__setstate__`) whenever possible. We will show, however, cases where using
:meth:`__reduce__` is the only option or leads to more efficient pickling or
both.

The interface is currently defined as follows. The :meth:`__reduce__` method
takes no argument and shall return either a string or preferably a tuple (the
returned object is often referred to as the "reduce value").

If a string is returned, the string should be interpreted as the name of a
global variable. It should be the object's local name relative to its module;
the pickle module searches the module namespace to determine the object's
module. This behaviour is typically useful for singletons.
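
A sketch of the string form, assuming a hypothetical sentinel ``MISSING``
defined at module level: :meth:`__reduce__` returns the global name, so the
object is pickled by reference and unpickling hands back the existing instance
instead of a copy:

```python
import pickle

class _MissingType:
    """Hypothetical sentinel type with exactly one intended instance."""

    def __reduce__(self):
        # Return the global name under which the single instance lives.
        return "MISSING"

    def __repr__(self):
        return "MISSING"

MISSING = _MissingType()

# Pickled by name, so unpickling yields the existing instance rather than a copy.
print(pickle.loads(pickle.dumps(MISSING)) is MISSING)  # True
```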

When a tuple is returned, it must be between two and five items long. Optional
items can either be omitted, or ``None`` can be provided as their value. The
semantics of each item, in order, are:

.. XXX Mention __newobj__ special-case?

* A callable object that will be called to create the initial version of the
  object.

* A tuple of arguments for the callable object. An empty tuple must be given if
  the callable does not accept any argument.

* Optionally, the object's state, which will be passed to the object's
  :meth:`__setstate__` method as previously described. If the object has no
  such method, then the value must be a dictionary and it will be added to the
  object's :attr:`__dict__` attribute.

* Optionally, an iterator (and not a sequence) yielding successive items. These
  items will be appended to the object either using ``obj.append(item)`` or, in
  batch, using ``obj.extend(list_of_items)``. This is primarily used for list
  subclasses, but may be used by other classes as long as they have
  :meth:`append` and :meth:`extend` methods with the appropriate signature.
  (Whether :meth:`append` or :meth:`extend` is used depends on which pickle
  protocol version is used as well as the number of items to append, so both
  must be supported.)

* Optionally, an iterator (not a sequence) yielding successive key-value pairs.
  These items will be stored to the object using ``obj[key] = value``. This is
  primarily used for dictionary subclasses, but may be used by other classes as
  long as they implement :meth:`__setitem__`.
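
A sketch of the tuple form, using a hypothetical ``Interval`` class. Its
reduce value supplies the callable, its arguments and a state dictionary;
because the class defines no :meth:`__setstate__`, the state is merged into the
new instance's :attr:`__dict__`. Unlike the default behaviour, this calls the
constructor (and its validation) on unpickling:

```python
import pickle

class Interval:
    """Hypothetical class whose constructor validates its arguments."""

    def __init__(self, lo, hi):
        if lo > hi:
            raise ValueError("lo must not exceed hi")
        self.lo, self.hi = lo, hi
        self.label = None

    def __reduce__(self):
        # (callable, args, state): Interval(lo, hi) is called on unpickling,
        # then the state dict is merged into __dict__ (no __setstate__ here).
        return (Interval, (self.lo, self.hi), {"label": self.label})

iv = Interval(1, 5)
iv.label = "work hours"
copy_iv = pickle.loads(pickle.dumps(iv))
print(copy_iv.lo, copy_iv.hi, copy_iv.label)  # 1 5 work hours
```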

.. index:: single: __reduce_ex__() (copy protocol)

Alternatively, a :meth:`__reduce_ex__` method may be defined. The only
difference is that this method should take a single integer argument, the
protocol version. When defined, pickle will prefer it over the
:meth:`__reduce__` method. In addition, :meth:`__reduce__` automatically becomes
a synonym for the extended version. The main use for this method is to provide
backwards-compatible reduce values for older Python releases.
Georg Brandl116aa622007-08-15 14:28:22 +0000528
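A minimal sketch, using a hypothetical ``Version`` class, of a :meth:`__reduce_ex__` method that records the protocol number it receives. The :meth:`__reduce__` method is defined only to show that pickle calls :meth:`__reduce_ex__` instead when both are present::

   import pickle

   class Version:
       """Hypothetical class defining both reduce methods."""

       def __init__(self, text):
           self.text = text

       def __reduce_ex__(self, protocol):
           # pickle passes the protocol in use; a real class could return a
           # different reduce value for older protocols.  Here the number is
           # stored in the state dict so the effect is visible after loading.
           return (Version, (self.text,), {'protocol_seen': protocol})

       def __reduce__(self):
           # Not called by pickle here, because __reduce_ex__ takes priority.
           raise RuntimeError("__reduce__ was not expected to be called")

   v = pickle.loads(pickle.dumps(Version('1.0'), protocol=2))

The loaded object has ``v.text == '1.0'`` and ``v.protocol_seen == 2``.
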
.. _pickle-persistent:

Persistence of External Objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: persistent_id (pickle protocol)
   single: persistent_load (pickle protocol)

For the benefit of object persistence, the :mod:`pickle` module supports the
notion of a reference to an object outside the pickled data stream. Such
objects are referenced by a persistent ID, which should be either a string of
alphanumeric characters (for protocol 0) [#]_ or just an arbitrary object (for
any newer protocol).

The resolution of such persistent IDs is not defined by the :mod:`pickle`
module; it will delegate this resolution to the user-defined methods on the
pickler and unpickler, :meth:`persistent_id` and :meth:`persistent_load`
respectively.

To pickle objects that have an external persistent ID, the pickler must have a
custom :meth:`persistent_id` method that takes an object as an argument and
returns either ``None`` or the persistent ID for that object. When ``None`` is
returned, the pickler simply pickles the object as normal. When a persistent ID
is returned, the pickler pickles that ID instead of the object, along with a
marker so that the unpickler will recognize it as a persistent ID.

To unpickle external objects, the unpickler must have a custom
:meth:`persistent_load` method that takes a persistent ID object and returns the
referenced object.

Here is a comprehensive example presenting how persistent IDs can be used to
pickle external objects by reference.

.. literalinclude:: ../includes/dbpickle.py

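For a quicker look at the two hooks, here is a minimal in-memory sketch. It assumes a hypothetical module-level ``EXTERNAL`` dictionary standing in for a database, and hypothetical ``RefPickler`` and ``RefUnpickler`` subclasses. Objects found in ``EXTERNAL`` are pickled by reference as ``('external', key)`` tuples; because these IDs are not alphanumeric strings, the sketch relies on the default binary protocol rather than protocol 0::

   import io
   import pickle

   # Hypothetical external store: objects in it are referenced by key
   # instead of being copied into the pickle.
   EXTERNAL = {'config': {'debug': True}}

   class RefPickler(pickle.Pickler):

       def persistent_id(self, obj):
           # Return a persistent ID for objects living in the external store;
           # returning None makes the pickler serialize obj as usual.
           for key, value in EXTERNAL.items():
               if value is obj:
                   return ('external', key)
           return None

   class RefUnpickler(pickle.Unpickler):

       def persistent_load(self, pid):
           # Resolve the persistent IDs produced by RefPickler.persistent_id.
           tag, key = pid
           if tag != 'external':
               raise pickle.UnpicklingError('unsupported persistent ID')
           return EXTERNAL[key]

   buf = io.BytesIO()
   RefPickler(buf).dump({'settings': EXTERNAL['config'], 'name': 'job-1'})
   buf.seek(0)
   restored = RefUnpickler(buf).load()

Here ``restored['settings']`` is the very same object as ``EXTERNAL['config']`` rather than a copy, while ``restored['name']`` was pickled normally.
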

.. _pickle-state:

Handling Stateful Objects
^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: __getstate__() (copy protocol)
   single: __setstate__() (copy protocol)

Here's an example that shows how to modify pickling behavior for a class.
The :class:`TextReader` class opens a text file, and returns the line number and
line contents each time its :meth:`readline` method is called. If a
:class:`TextReader` instance is pickled, all attributes *except* the file object
member are saved. When the instance is unpickled, the file is reopened, and
reading resumes from the last location. The :meth:`__setstate__` and
:meth:`__getstate__` methods are used to implement this behavior. ::

   class TextReader:
       """Print and number lines in a text file."""

       def __init__(self, filename):
           self.filename = filename
           self.file = open(filename)
           self.lineno = 0

       def readline(self):
           self.lineno += 1
           line = self.file.readline()
           if not line:
               return None
           if line.endswith("\n"):
               line = line[:-1]
           return "%i: %s" % (self.lineno, line)

       def __getstate__(self):
           # Copy the object's state from self.__dict__ which contains
           # all our instance attributes. Always use the dict.copy()
           # method to avoid modifying the original state.
           state = self.__dict__.copy()
           # Remove the unpicklable entries.
           del state['file']
           return state

       def __setstate__(self, state):
           # Restore instance attributes (i.e., filename and lineno).
           self.__dict__.update(state)
           # Restore the previously opened file's state. To do so, we need to
           # reopen it and read from it until the line count is restored.
           file = open(self.filename)
           for _ in range(self.lineno):
               file.readline()
           # Finally, save the file.
           self.file = file


A sample usage might be something like this::

   >>> reader = TextReader("hello.txt")
   >>> reader.readline()
   '1: Hello world!'
   >>> reader.readline()
   '2: I am line number two.'
   >>> new_reader = pickle.loads(pickle.dumps(reader))
   >>> new_reader.readline()
   '3: Goodbye!'

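The same technique applies to other unpicklable attributes. As a further sketch, the hypothetical ``Counter`` class below holds a :class:`threading.Lock`, which pickle cannot serialize; :meth:`__getstate__` leaves the lock out and :meth:`__setstate__` creates a fresh one on unpickling::

   import pickle
   import threading

   class Counter:
       """Hypothetical thread-safe counter holding an unpicklable lock."""

       def __init__(self):
           self.value = 0
           self.lock = threading.Lock()

       def increment(self):
           with self.lock:
               self.value += 1

       def __getstate__(self):
           # Copy the instance attributes and drop the lock, which pickle
           # cannot serialize.
           state = self.__dict__.copy()
           del state['lock']
           return state

       def __setstate__(self, state):
           # Restore the saved attributes, then give the new instance its
           # own lock.
           self.__dict__.update(state)
           self.lock = threading.Lock()

   c = Counter()
   c.increment()
   c2 = pickle.loads(pickle.dumps(c))

The restored counter keeps ``value == 1`` and owns a new lock of its own, so it can keep being incremented.
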

.. _pickle-restrict:

Restricting Globals
-------------------

.. index::
   single: find_class() (pickle protocol)

By default, unpickling will import any class or function that it finds in the
pickle data. For many applications, this behaviour is unacceptable as it
permits the unpickler to import and invoke arbitrary code. Just consider what
this hand-crafted pickle data stream does when loaded::

   >>> import pickle
   >>> pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
   hello world
   0

In this example, the unpickler imports the :func:`os.system` function and then
applies it to the string argument "echo hello world". Although this example is
inoffensive, it is not difficult to imagine one that could damage your system.

For this reason, you may want to control what gets unpickled by customizing
:meth:`Unpickler.find_class`. Contrary to what its name suggests,
:meth:`find_class` is called whenever a global (i.e., a class or a function) is
requested. Thus it is possible to either forbid globals completely or restrict
them to a safe subset.

Here is an example of an unpickler allowing only a few safe classes from the
:mod:`builtins` module to be loaded::

   import builtins
   import io
   import pickle

   safe_builtins = {
       'range',
       'complex',
       'set',
       'frozenset',
       'slice',
   }

   class RestrictedUnpickler(pickle.Unpickler):

       def find_class(self, module, name):
           # Only allow safe classes from builtins.
           if module == "builtins" and name in safe_builtins:
               return getattr(builtins, name)
           # Forbid everything else.
           raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                        (module, name))

   def restricted_loads(s):
       """Helper function analogous to pickle.loads()."""
       return RestrictedUnpickler(io.BytesIO(s)).load()

A sample usage of our unpickler working as intended::

   >>> restricted_loads(pickle.dumps([1, 2, range(15)]))
   [1, 2, range(0, 15)]
   >>> restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
   Traceback (most recent call last):
     ...
   pickle.UnpicklingError: global 'os.system' is forbidden
   >>> restricted_loads(b'cbuiltins\neval\n'
   ...                  b'(S\'getattr(__import__("os"), "system")'
   ...                  b'("echo hello world")\'\ntR.')
   Traceback (most recent call last):
     ...
   pickle.UnpicklingError: global 'builtins.eval' is forbidden

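Instead of matching on a single module, a variant keeps an allowlist of ``(module, name)`` pairs and delegates to the default :meth:`Unpickler.find_class` for the entries it accepts. This is a sketch only: ``ALLOWED``, ``AllowlistUnpickler`` and ``allowlist_loads`` are hypothetical names, and the single permitted class is just an example::

   import collections
   import datetime
   import io
   import pickle

   # Hypothetical allowlist of (module, name) pairs the application trusts.
   ALLOWED = {
       ('collections', 'OrderedDict'),
   }

   class AllowlistUnpickler(pickle.Unpickler):

       def find_class(self, module, name):
           # Resolve only the globals listed in ALLOWED, using the
           # default lookup for those.
           if (module, name) in ALLOWED:
               return super().find_class(module, name)
           # Forbid everything else.
           raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                        (module, name))

   def allowlist_loads(s):
       """Helper function analogous to pickle.loads()."""
       return AllowlistUnpickler(io.BytesIO(s)).load()

   od = allowlist_loads(pickle.dumps(collections.OrderedDict(a=1)))

   rejected = None
   try:
       allowlist_loads(pickle.dumps(datetime.date(2020, 1, 1)))
   except pickle.UnpicklingError as exc:
       rejected = str(exc)

Here ``od`` comes back as an :class:`collections.OrderedDict`, while the pickled :class:`datetime.date` is refused because ``('datetime', 'date')`` is not in the allowlist.
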

.. XXX Add note about how extension codes could evade our protection
   mechanism (e.g. cached classes do not invoke find_class()).

As our examples show, you have to be careful with what you allow to be
unpickled. Therefore, if security is a concern, you may want to consider
alternatives such as the marshalling API in :mod:`xmlrpc.client` or third-party
solutions.


.. _pickle-example:

Usage Examples
--------------

For the simplest code, use the :func:`dump` and :func:`load` functions.
Recursive structures, such as a list that contains itself, are also pickled and
restored correctly. ::

   import pickle

   # An arbitrary collection of objects supported by pickle.
   data = {
       'a': [1, 2.0, 3, 4+6j],
       'b': ("character string", b"byte string"),
       'c': {None, True, False}
   }

   with open('data.pickle', 'wb') as f:
       # Pickle the 'data' dictionary using the highest protocol available.
       pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)


The following example reads the resulting pickled data. ::

   import pickle

   with open('data.pickle', 'rb') as f:
       # The protocol version used is detected automatically, so we do not
       # have to specify it.
       data = pickle.load(f)

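Pickle also preserves object identity within a single pickle, so cycles and shared references survive a round trip. A minimal sketch using the in-memory :func:`dumps` and :func:`loads` functions (the variable names are illustrative)::

   import pickle

   # A list that contains itself.
   lst = [1, 2]
   lst.append(lst)
   restored = pickle.loads(pickle.dumps(lst))

   # The cycle is rebuilt: the last element of the restored list is the
   # restored list itself, not a copy.
   assert restored[2] is restored

   # Two references to one object remain two references to one object.
   shared = ['x']
   pair = pickle.loads(pickle.dumps([shared, shared]))
   assert pair[0] is pair[1]

Note that comparing ``restored == lst`` is best avoided here, since comparing self-referencing lists recurses without end.
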

.. seealso::

   Module :mod:`copyreg`
      Pickle interface constructor registration for extension types.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.


.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError`, but it could be something else.

.. [#] The :mod:`copy` module uses this protocol for shallow and deep copying
   operations.

.. [#] The limitation on alphanumeric characters is due to the fact that
   persistent IDs in protocol 0 are delimited by the newline character.
   Therefore, if any kind of newline character occurs in a persistent ID, the
   resulting pickle will become unreadable.