:mod:`pickle` --- Python object serialization
=============================================

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.
.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>.
.. sectionauthor:: Barry Warsaw <barry@zope.com>


The :mod:`pickle` module implements a basic but powerful algorithm for
serializing and de-serializing a Python object structure. "Pickling" is the
process whereby a Python object hierarchy is converted into a byte stream, and
"unpickling" is the inverse operation, whereby a byte stream is converted back
into an object hierarchy. Pickling (and unpickling) is alternatively known as
"serialization", "marshalling" [#]_, or "flattening"; to avoid confusion,
however, the terms used here are "pickling" and "unpickling".


Relationship to other Python modules
------------------------------------

The :mod:`pickle` module has a transparent optimizer (:mod:`_pickle`) written
in C. It is used whenever available; otherwise, the pure Python implementation
is used.

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should always be the preferred way to serialize Python
objects. :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing. Recursive
  objects are objects that contain references to themselves. These are not
  handled by marshal, and in fact, attempting to marshal recursive objects will
  crash your Python interpreter. Object sharing happens when there are multiple
  references to the same object in different places in the object hierarchy being
  serialized. :mod:`pickle` stores such objects only once, and ensures that all
  other references point to the master copy. Shared objects remain shared, which
  can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances. :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module as
  when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions. Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases.

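
The effect of object tracking on shared and recursive objects can be checked
directly. The following is a minimal sketch; the names ``shared``,
``container`` and ``recursive`` are illustrative only and do not come from the
module.

```python
import pickle

shared = [1, 2, 3]
container = {"a": shared, "b": shared}   # two references to one list

recursive = []
recursive.append(recursive)              # a list that contains itself

restored = pickle.loads(pickle.dumps(container))
looped = pickle.loads(pickle.dumps(recursive))

# The list was stored once, so both keys still refer to a single object
# and a change made through one key is visible through the other.
restored["a"].append(4)
print(restored["a"] is restored["b"])
print(looped[0] is looped)
```

Note that the restored objects are copies: appending to ``restored["a"]``
leaves the original ``shared`` list untouched.
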
.. warning::

   The :mod:`pickle` module is not intended to be secure against erroneous or
   maliciously constructed data. Never unpickle data received from an untrusted or
   unauthenticated source.

Note that serialization is a more primitive notion than persistence; although
:mod:`pickle` reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent
access to persistent objects. The :mod:`pickle` module can transform a complex
object into a byte stream and it can transform the byte stream into an object
with the same internal structure. Perhaps the most obvious thing to do with
these byte streams is to write them onto a file, but it is also conceivable to
send them across a network or store them in a database. The module
:mod:`shelve` provides a simple interface to pickle and unpickle objects on
DBM-style database files.


Data stream format
------------------

.. index::
   single: XDR
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific. This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however it means that non-Python
programs may not be able to reconstruct pickled Python objects.

By default, the :mod:`pickle` data format uses a compact binary representation.
The module :mod:`pickletools` contains tools for analyzing data streams
generated by :mod:`pickle`.

There are currently four different protocols which can be used for pickling.

* Protocol version 0 is the original human-readable protocol and is
  backwards compatible with earlier versions of Python.

* Protocol version 1 is the old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3. It provides much more
  efficient pickling of :term:`new-style class`\es.

* Protocol version 3 was added in Python 3.0. It has explicit support for
  bytes and cannot be unpickled by Python 2.x pickle modules. This is
  the current recommended protocol; use it whenever possible.

Refer to :pep:`307` for information about improvements brought by
protocol 2. See :mod:`pickletools`'s source code for extensive
comments about opcodes used by pickle protocols.


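
To see how the protocol choice affects the byte stream, the same object can be
pickled with several protocols and loaded back. This is only a sketch; the
sample ``data`` dictionary is an assumption made for illustration.

```python
import pickle

data = {"name": "spam", "values": [1, 2, 3]}

text_pickle = pickle.dumps(data, protocol=0)    # original human-readable protocol
binary_pickle = pickle.dumps(data, protocol=2)  # binary protocol from Python 2.3
default_pickle = pickle.dumps(data)             # uses pickle.DEFAULT_PROTOCOL

# Protocol 2 and newer begin with a PROTO opcode (0x80) followed by the
# protocol number, which is how the unpickler detects the version.
print(binary_pickle[:2])   # b'\x80\x02'

# Whatever protocol wrote the stream, loads() needs no protocol argument.
results = [pickle.loads(blob)
           for blob in (text_pickle, binary_pickle, default_pickle)]
```
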
Module Interface
----------------

To serialize an object hierarchy, you first create a pickler, then you call the
pickler's :meth:`dump` method. To de-serialize a data stream, you first create
an unpickler, then you call the unpickler's :meth:`load` method. The
:mod:`pickle` module provides the following constants:


.. data:: HIGHEST_PROTOCOL

   The highest protocol version available. This value can be passed as a
   *protocol* value.

.. data:: DEFAULT_PROTOCOL

   The default protocol used for pickling. May be less than
   :data:`HIGHEST_PROTOCOL`. Currently the default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.


The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:

.. function:: dump(obj, file[, protocol])

   Write a pickled representation of *obj* to the open file object *file*. This
   is equivalent to ``Pickler(file, protocol).dump(obj)``.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2, 3. The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

   The *file* argument must have a :meth:`write` method that accepts a single
   bytes argument. It can thus be a file object opened for binary writing, an
   :class:`io.BytesIO` instance, or any other custom object that meets this
   interface.

.. function:: dumps(obj[, protocol])

   Return the pickled representation of the object as a :class:`bytes`
   object, instead of writing it to a file.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2, 3. The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

.. function:: load(file, [\*, encoding="ASCII", errors="strict"])

   Read a pickled object representation from the open file object *file* and
   return the reconstituted object hierarchy specified therein. This is
   equivalent to ``Unpickler(file).load()``.

   The protocol version of the pickle is detected automatically, so no protocol
   argument is needed. Bytes past the pickled object's representation are
   ignored.

   The argument *file* must have two methods, a :meth:`read` method that takes
   an integer argument, and a :meth:`readline` method that requires no
   arguments. Both methods should return bytes. Thus *file* can be a binary
   file object opened for reading, an :class:`io.BytesIO` object, or any other
   custom object that meets this interface.

   The optional keyword arguments *encoding* and *errors* are used to decode
   8-bit string instances pickled by Python 2.x. These default to 'ASCII' and
   'strict', respectively.

.. function:: loads(bytes_object, [\*, encoding="ASCII", errors="strict"])

   Read a pickled object hierarchy from a :class:`bytes` object and return the
   reconstituted object hierarchy specified therein.

   The protocol version of the pickle is detected automatically, so no protocol
   argument is needed. Bytes past the pickled object's representation are
   ignored.

   The optional keyword arguments *encoding* and *errors* are used to decode
   8-bit string instances pickled by Python 2.x. These default to 'ASCII' and
   'strict', respectively.


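
The four convenience functions can be exercised together as follows. This is a
minimal sketch that uses an in-memory :class:`io.BytesIO` buffer in place of a
real file; the ``record`` dictionary is made up for the example.

```python
import io
import pickle

record = {"id": 7, "tags": ("a", "b"), "payload": b"\x00\x01"}

# dump()/load() work on any object with the required binary file methods.
buffer = io.BytesIO()
pickle.dump(record, buffer)
buffer.seek(0)                      # rewind before reading the pickle back
from_file = pickle.load(buffer)

# dumps()/loads() work on bytes objects instead of files.
blob = pickle.dumps(record, protocol=-1)   # negative: highest protocol
from_bytes = pickle.loads(blob)

# Bytes after the end of the pickled object are ignored.
with_trailing = pickle.loads(blob + b"ignored")
```
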
The :mod:`pickle` module defines three exceptions:

.. exception:: PickleError

   Common base class for the other pickling exceptions. It inherits from
   :exc:`Exception`.

.. exception:: PicklingError

   Error raised when an unpicklable object is encountered by :class:`Pickler`.
   It inherits from :exc:`PickleError`.

   Refer to :ref:`pickle-picklable` to learn what kinds of objects can be
   pickled.

.. exception:: UnpicklingError

   Error raised when there is a problem unpickling an object, such as data
   corruption or a security violation. It inherits from :exc:`PickleError`.

   Note that other exceptions may also be raised during unpickling, including
   (but not necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
   :exc:`ImportError`, and :exc:`IndexError`.


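
Because several exception types can surface while unpickling, code that reads
possibly damaged data usually catches a group of them. The helper below is a
hypothetical sketch (``try_loads`` and ``EXPECTED_ERRORS`` are not part of the
module). It guards against corrupt input only and offers no protection against
malicious pickles.

```python
import pickle

# The exceptions listed above; the list is not necessarily exhaustive.
EXPECTED_ERRORS = (pickle.UnpicklingError, AttributeError, EOFError,
                   ImportError, IndexError)

def try_loads(data, default=None):
    """Return the unpickled object, or *default* if *data* cannot be loaded."""
    try:
        return pickle.loads(data)
    except EXPECTED_ERRORS:
        return default

good = pickle.dumps([1, 2, 3])
intact = try_loads(good)          # the original list
empty = try_loads(b"")            # empty input raises EOFError internally
cut_off = try_loads(good[:-1])    # stream truncated before its final opcode
```
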
The :mod:`pickle` module exports two classes, :class:`Pickler` and
:class:`Unpickler`:

.. class:: Pickler(file[, protocol])

   This takes a binary file for writing a pickle data stream.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2, 3. The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

   The *file* argument must have a :meth:`write` method that accepts a single
   bytes argument. It can thus be a file object opened for binary writing, an
   :class:`io.BytesIO` instance, or any other custom object that meets this
   interface.

   .. method:: dump(obj)

      Write a pickled representation of *obj* to the open file object given in
      the constructor.

   .. method:: persistent_id(obj)

      Do nothing by default. This exists so a subclass can override it.

      If :meth:`persistent_id` returns ``None``, *obj* is pickled as usual. Any
      other value causes :class:`Pickler` to emit the returned value as a
      persistent ID for *obj*. The meaning of this persistent ID should be
      defined by :meth:`Unpickler.persistent_load`. Note that the value
      returned by :meth:`persistent_id` cannot itself have a persistent ID.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. attribute:: fast

      Deprecated. Enable fast mode if set to a true value. The fast mode
      disables the usage of memo, therefore speeding the pickling process by not
      generating superfluous PUT opcodes. It should not be used with
      self-referential objects; doing otherwise will cause :class:`Pickler` to
      recurse infinitely.

      Use :func:`pickletools.optimize` if you need more compact pickles.


.. class:: Unpickler(file, [\*, encoding="ASCII", errors="strict"])

   This takes a binary file for reading a pickle data stream.

   The protocol version of the pickle is detected automatically, so no
   protocol argument is needed.

   The argument *file* must have two methods, a :meth:`read` method that takes
   an integer argument, and a :meth:`readline` method that requires no
   arguments. Both methods should return bytes. Thus *file* can be a binary
   file object opened for reading, an :class:`io.BytesIO` object, or any other
   custom object that meets this interface.

   The optional keyword arguments *encoding* and *errors* are used to decode
   8-bit string instances pickled by Python 2.x. These default to 'ASCII' and
   'strict', respectively.

   .. method:: load()

      Read a pickled object representation from the open file object given in
      the constructor, and return the reconstituted object hierarchy specified
      therein. Bytes past the pickled object's representation are ignored.

   .. method:: persistent_load(pid)

      Raise an :exc:`UnpicklingError` by default.

      If defined, :meth:`persistent_load` should return the object specified by
      the persistent ID *pid*. If an invalid persistent ID is encountered, an
      :exc:`UnpicklingError` should be raised.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. method:: find_class(module, name)

      Import *module* if necessary and return the object called *name* from it,
      where the *module* and *name* arguments are :class:`str` objects. Note
      that, despite its name, :meth:`find_class` is also used for finding
      functions.

      Subclasses may override this to gain control over what type of objects and
      how they can be loaded, potentially reducing security risks. Refer to
      :ref:`pickle-restrict` for details.


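
As an illustration of overriding :meth:`find_class`, the sketch below only
permits a small set of built-in names. ``RestrictedUnpickler``,
``SAFE_BUILTINS`` and ``restricted_loads`` are names chosen for this example,
and the allowed set is an assumption; a real application would choose its own
list and should still treat untrusted pickles with caution.

```python
import builtins
import io
import pickle

# Hypothetical allow-list of built-in names considered harmless here.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only hand back objects that are explicitly allowed.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Refuse every other global, including functions such as eval().
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))

def restricted_loads(data):
    """Analogous to pickle.loads(), but using the restricted unpickler."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

allowed = restricted_loads(pickle.dumps([1, 2, range(15)]))

rejected = None
try:
    restricted_loads(pickle.dumps(eval))   # pickled by reference to its name
except pickle.UnpicklingError as exc:
    rejected = str(exc)
```
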
.. _pickle-picklable:

What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, floating point numbers, complex numbers

* strings, bytes, bytearrays

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`__dict__` or the result of calling
  :meth:`__getstate__` is picklable (see section :ref:`pickle-inst` for details)

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file. Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RuntimeError` will be
raised in this case. You can carefully raise this limit with
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value. This means that only the function name is
pickled, along with the name of the module the function is defined in. Neither
the function's code, nor any of its function attributes are pickled. Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [#]_

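
Pickling by reference can be observed by inspecting the bytes produced for a
function. The following sketch uses :func:`math.sqrt` purely as a convenient
example of an importable function.

```python
import math
import pickle

blob = pickle.dumps(math.sqrt)

# The stream holds the module name and the function name, not any code.
has_module_name = b"math" in blob
has_function_name = b"sqrt" in blob

# Unpickling imports the module and looks the name up again, so the
# result is the very same function object.
restored = pickle.loads(blob)
print(restored is math.sqrt)
```
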
Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply. Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   import pickle

   class Foo:
       attr = 'A class attribute'

   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined in
the top level of a module.

Similarly, when class instances are pickled, their class's code and data are not
pickled along with them. Only the instance data are pickled. This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class. If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.


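
One possible way to carry a version number in the pickled state is sketched
below. The ``Account`` class, its ``_version`` key, and the assumed change from
a float ``balance`` to an integer ``balance_cents`` are all hypothetical and
serve only to illustrate the idea.

```python
import pickle

class Account:
    VERSION = 2

    def __init__(self, owner, balance_cents):
        self.owner = owner
        self.balance_cents = balance_cents

    def __getstate__(self):
        # Store the layout version alongside the instance attributes.
        state = self.__dict__.copy()
        state["_version"] = self.VERSION
        return state

    def __setstate__(self, state):
        # State written before versioning existed is treated as version 1.
        version = state.pop("_version", 1)
        if version < 2:
            # Assumed old layout: a float "balance" in whole currency units.
            state["balance_cents"] = round(state.pop("balance") * 100)
        self.__dict__.update(state)

# A current object round-trips unchanged.
current = pickle.loads(pickle.dumps(Account("bob", 300)))

# Simulate restoring state that was saved by the older class layout.
legacy = Account.__new__(Account)
legacy.__setstate__({"owner": "ann", "balance": 12.5})
```
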
.. _pickle-inst:

Pickling Class Instances
------------------------

In this section, we describe the general mechanisms available to you to define,
customize, and control how class instances are pickled and unpickled.

In most cases, no additional code is needed to make instances picklable. By
default, pickle will retrieve the class and the attributes of an instance via
introspection. When a class instance is unpickled, its :meth:`__init__` method
is usually *not* invoked. The default behaviour first creates an uninitialized
instance and then restores the saved attributes. The following code shows an
implementation of this behaviour::

   def save(obj):
       return (obj.__class__, obj.__dict__)

   def load(cls, attributes):
       obj = cls.__new__(cls)
       obj.__dict__.update(attributes)
       return obj

.. index:: single: __getnewargs__() (copy protocol)

Classes can alter the default behaviour by providing one or several special
methods. In protocol 2 and newer, classes that implement the
:meth:`__getnewargs__` method can dictate the values passed to the
:meth:`__new__` method upon unpickling. This is often needed for classes
whose :meth:`__new__` method requires arguments.

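
A typical case is an immutable type whose :meth:`__new__` takes required
arguments. The ``Point`` class below is a made-up example, shown as a sketch
of how :meth:`__getnewargs__` supplies those arguments again on unpickling.

```python
import pickle

class Point(tuple):
    """An immutable 2-D point whose __new__ requires two arguments."""

    def __new__(cls, x, y):
        return super().__new__(cls, (x, y))

    def __getnewargs__(self):
        # These values are passed to Point.__new__() when unpickling.
        return (self[0], self[1])

p = Point(3, 4)
q = pickle.loads(pickle.dumps(p, protocol=2))
print(type(q) is Point, q == p)
```

Without :meth:`__getnewargs__`, unpickling would call ``Point.__new__(Point)``
with no coordinates and fail.
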
.. index:: single: __getstate__() (copy protocol)

Classes can further influence how their instances are pickled; if the class
defines the method :meth:`__getstate__`, it is called and the returned object is
pickled as the contents for the instance, instead of the contents of the
instance's dictionary. If the :meth:`__getstate__` method is absent, the
instance's :attr:`__dict__` is pickled as usual.

.. index:: single: __setstate__() (copy protocol)

Upon unpickling, if the class defines :meth:`__setstate__`, it is called with
the unpickled state. In that case, there is no requirement for the state object
to be a dictionary. Otherwise, the pickled state must be a dictionary and its
items are assigned to the new instance's dictionary.

.. note::

   If :meth:`__getstate__` returns a false value, the :meth:`__setstate__`
   method will not be called.

Refer to the section :ref:`pickle-state` for more information about how to use
the methods :meth:`__getstate__` and :meth:`__setstate__`.

.. index::
   pair: copy; protocol
   single: __reduce__() (copy protocol)

As we shall see, pickle does not directly use the methods described above. In
fact, these methods are part of the copy protocol which implements the
:meth:`__reduce__` special method. The copy protocol provides a unified
interface for retrieving the data necessary for pickling and copying
objects. [#]_

Although powerful, implementing :meth:`__reduce__` directly in your classes is
error prone. For this reason, class designers should use the high-level
interface (i.e., :meth:`__getnewargs__`, :meth:`__getstate__` and
:meth:`__setstate__`) whenever possible. We will show, however, cases where
using :meth:`__reduce__` is the only option or leads to more efficient pickling
or both.

The interface is currently defined as follows. The :meth:`__reduce__` method
takes no argument and shall return either a string or preferably a tuple (the
returned object is often referred to as the "reduce value").

If a string is returned, the string should be interpreted as the name of a
global variable. It should be the object's local name relative to its module;
the pickle module searches the module namespace to determine the object's
module. This behaviour is typically useful for singletons.

When a tuple is returned, it must be between two and five items long. Optional
items can either be omitted, or ``None`` can be provided as their value. The
semantics of each item are, in order:

.. XXX Mention __newobj__ special-case?

* A callable object that will be called to create the initial version of the
  object.

* A tuple of arguments for the callable object. An empty tuple must be given if
  the callable does not accept any argument.

* Optionally, the object's state, which will be passed to the object's
  :meth:`__setstate__` method as previously described. If the object has no
  such method, then the value must be a dictionary and it will be added to the
  object's :attr:`__dict__` attribute.

* Optionally, an iterator (and not a sequence) yielding successive items. These
  items will be appended to the object either using ``obj.append(item)`` or, in
  batch, using ``obj.extend(list_of_items)``. This is primarily used for list
  subclasses, but may be used by other classes as long as they have
  :meth:`append` and :meth:`extend` methods with the appropriate signature.
  (Whether :meth:`append` or :meth:`extend` is used depends on which pickle
  protocol version is used as well as the number of items to append, so both
  must be supported.)

* Optionally, an iterator (not a sequence) yielding successive key-value pairs.
  These items will be stored to the object using ``obj[key] = value``. This is
  primarily used for dictionary subclasses, but may be used by other classes as
  long as they implement :meth:`__setitem__`.

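
The first four items of the reduce value can be seen at work in the sketch
below. The ``Bag`` class is hypothetical; it is not a list subclass, but it
provides :meth:`append` and :meth:`extend` so that the item iterator can be
replayed into it.

```python
import pickle

class Bag:
    def __init__(self, label):
        self.label = label
        self.note = None
        self.items = []

    def append(self, item):
        self.items.append(item)

    def extend(self, items):
        self.items.extend(items)

    def __reduce__(self):
        return (
            Bag,                   # callable that creates the initial object
            (self.label,),         # arguments for that callable
            {"note": self.note},   # state, merged into __dict__ (no __setstate__)
            iter(self.items),      # items re-added through append()/extend()
        )

original = Bag("groceries")
original.note = "weekly"
original.append("eggs")
original.append("tea")

copy = pickle.loads(pickle.dumps(original))
```
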
.. index:: single: __reduce_ex__() (copy protocol)

Alternatively, a :meth:`__reduce_ex__` method may be defined. The only
difference is this method should take a single integer argument, the protocol
version. When defined, pickle will prefer it over the :meth:`__reduce__`
method. In addition, :meth:`__reduce__` automatically becomes a synonym for the
extended version. The main use for this method is to provide
backwards-compatible reduce values for older Python releases.

.. _pickle-persistent:

Persistence of External Objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: persistent_id (pickle protocol)
   single: persistent_load (pickle protocol)

For the benefit of object persistence, the :mod:`pickle` module supports the
notion of a reference to an object outside the pickled data stream. Such
objects are referenced by a persistent ID, which should be either a string of
alphanumeric characters (for protocol 0) [#]_ or just an arbitrary object (for
any newer protocol).

The resolution of such persistent IDs is not defined by the :mod:`pickle`
module; it will delegate this resolution to the user defined methods on the
pickler and unpickler, :meth:`persistent_id` and :meth:`persistent_load`
respectively.

To pickle objects that have an external persistent ID, the pickler must have a
custom :meth:`persistent_id` method that takes an object as an argument and
returns either ``None`` or the persistent ID for that object. When ``None`` is
returned, the pickler simply pickles the object as normal. When a persistent ID
is returned, the pickler pickles that ID in place of the object, along with a
marker so that the unpickler will recognize it as a persistent ID.

To unpickle external objects, the unpickler must have a custom
:meth:`persistent_load` method that takes a persistent ID object and returns the
referenced object.

Alexandre Vassalotti73b90a82008-10-29 23:32:33 +0000536Here is a comprehensive example presenting how persistent ID can be used to
537pickle external objects by reference.
Georg Brandl116aa622007-08-15 14:28:22 +0000538
Alexandre Vassalotti5f3b63a2008-10-18 20:47:58 +0000539.. literalinclude:: ../includes/dbpickle.py
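A more compact sketch of the same mechanism follows. It assumes a hypothetical
in-memory ``DATABASE`` dictionary standing in for an external store and a
hypothetical ``Ref`` handle class; neither is part of :mod:`pickle`. Tuples are
used as persistent IDs, which requires a protocol newer than 0.

```python
import io
import pickle

# Hypothetical external store: these objects live outside the pickle stream.
DATABASE = {"user:1": {"name": "Ada"}, "user:2": {"name": "Grace"}}


class Ref:
    """Hypothetical handle naming a record in DATABASE."""

    def __init__(self, key):
        self.key = key


class RefPickler(pickle.Pickler):

    def persistent_id(self, obj):
        # Emit a persistent ID for handles; returning None makes the
        # pickler treat every other object as usual.
        if isinstance(obj, Ref):
            return ("Ref", obj.key)
        return None


class RefUnpickler(pickle.Unpickler):

    def persistent_load(self, pid):
        # Resolve the persistent ID back to the externally stored object.
        tag, key = pid
        if tag == "Ref":
            return DATABASE[key]
        raise pickle.UnpicklingError("unsupported persistent object: %r" % (pid,))


buffer = io.BytesIO()
RefPickler(buffer).dump(["inline", Ref("user:2")])
buffer.seek(0)
restored = RefUnpickler(buffer).load()
```

Only the ID ``("Ref", "user:2")`` is written to the stream, so ``restored[1]``
is the very object held in ``DATABASE`` rather than a copy.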


.. _pickle-state:

Handling Stateful Objects
^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: __getstate__() (copy protocol)
   single: __setstate__() (copy protocol)

Here's an example that shows how to modify pickling behavior for a class.
The :class:`TextReader` class opens a text file, and returns the line number and
line contents each time its :meth:`readline` method is called. If a
:class:`TextReader` instance is pickled, all attributes *except* the file object
member are saved. When the instance is unpickled, the file is reopened, and
reading resumes from the last location. The :meth:`__setstate__` and
:meth:`__getstate__` methods are used to implement this behavior. ::

   class TextReader:
       """Print and number lines in a text file."""

       def __init__(self, filename):
           self.filename = filename
           self.file = open(filename)
           self.lineno = 0

       def readline(self):
           self.lineno += 1
           line = self.file.readline()
           if not line:
               return None
           if line.endswith('\n'):
               line = line[:-1]
           return "%i: %s" % (self.lineno, line)

       def __getstate__(self):
           # Copy the object's state from self.__dict__ which contains
           # all our instance attributes. Always use the dict.copy()
           # method to avoid modifying the original state.
           state = self.__dict__.copy()
           # Remove the unpicklable entries.
           del state['file']
           return state

       def __setstate__(self, state):
           # Restore instance attributes (i.e., filename and lineno).
           self.__dict__.update(state)
           # Restore the previously opened file's state. To do so, we need to
           # reopen it and read from it until the line count is restored.
           file = open(self.filename)
           for _ in range(self.lineno):
               file.readline()
           # Finally, save the file.
           self.file = file


A sample usage might be something like this::

   >>> import pickle
   >>> reader = TextReader("hello.txt")
   >>> reader.readline()
   '1: Hello world!'
   >>> reader.readline()
   '2: I am line number two.'
   >>> new_reader = pickle.loads(pickle.dumps(reader))
   >>> new_reader.readline()
   '3: Goodbye!'


.. _pickle-restrict:

Restricting Globals
-------------------

.. index::
   single: find_class() (pickle protocol)

By default, unpickling will import any class or function that it finds in the
pickle data. For many applications, this behaviour is unacceptable as it
permits the unpickler to import and invoke arbitrary code. Just consider what
this hand-crafted pickle data stream does when loaded::

   >>> import pickle
   >>> pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
   hello world
   0

In this example, the unpickler imports the :func:`os.system` function and then
applies it to the string argument "echo hello world". Although this example is
inoffensive, it is not difficult to imagine one that could damage your system.

For this reason, you may want to control what gets unpickled by customizing
:meth:`Unpickler.find_class`. Contrary to what its name suggests,
:meth:`find_class` is called whenever a global (i.e., a class or a function) is
requested. Thus it is possible to either forbid globals completely or restrict
them to a safe subset.

Here is an example of an unpickler allowing only a few safe classes from the
:mod:`builtins` module to be loaded::

   import builtins
   import io
   import pickle

   safe_builtins = {
       'range',
       'complex',
       'set',
       'frozenset',
       'slice',
   }

   class RestrictedUnpickler(pickle.Unpickler):

       def find_class(self, module, name):
           # Only allow safe classes from builtins.
           if module == "builtins" and name in safe_builtins:
               return getattr(builtins, name)
           # Forbid everything else.
           raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                        (module, name))

   def restricted_loads(s):
       """Helper function analogous to pickle.loads()."""
       return RestrictedUnpickler(io.BytesIO(s)).load()

A sample usage of our unpickler working as intended::

   >>> restricted_loads(pickle.dumps([1, 2, range(15)]))
   [1, 2, range(0, 15)]
   >>> restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
   Traceback (most recent call last):
     ...
   pickle.UnpicklingError: global 'os.system' is forbidden
   >>> restricted_loads(b'cbuiltins\neval\n'
   ...                  b'(S\'getattr(__import__("os"), "system")'
   ...                  b'("echo hello world")\'\ntR.')
   Traceback (most recent call last):
     ...
   pickle.UnpicklingError: global 'builtins.eval' is forbidden
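The other option, refusing every global, can be sketched in the same way. The
names ``NoGlobalsUnpickler`` and ``strict_loads`` are hypothetical helpers
introduced for this illustration. Such an unpickler still loads data made only
of types that pickle encodes without a global lookup, such as numbers,
strings, lists and dictionaries.

```python
import io
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):

    def find_class(self, module, name):
        # Refuse every global, whatever module it comes from.
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                     (module, name))


def strict_loads(data):
    """Hypothetical helper analogous to pickle.loads()."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


# Plain containers and scalars need no global lookup, so they load.
plain = strict_loads(pickle.dumps({"numbers": [1, 2.5], "text": "ok"}))

# A range object is rebuilt by calling builtins.range, so it is rejected.
try:
    strict_loads(pickle.dumps(range(3)))
except pickle.UnpicklingError as exc:
    error = str(exc)
```

This is stricter than the allow-list above, at the cost of rejecting any
object that must be rebuilt by calling a class or function.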


.. XXX Add note about how extension codes could evade our protection
   mechanism (e.g. cached classes do not invoke find_class()).

As our examples show, you have to be careful with what you allow to be
unpickled. Therefore, if security is a concern, you may want to consider
alternatives such as the marshalling API in :mod:`xmlrpc.client` or
third-party solutions.


.. _pickle-example:

Examples
--------

For the simplest code, use the :func:`dump` and :func:`load` functions. ::

   import pickle

   # An arbitrary collection of objects supported by pickle.
   data = {
       'a': [1, 2.0, 3, 4+6j],
       'b': ("character string", b"byte string"),
       'c': {None, True, False}
   }

   with open('data.pickle', 'wb') as f:
       # Pickle the 'data' dictionary using the highest protocol available.
       pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)


The following example reads the resulting pickled data. ::

   import pickle

   with open('data.pickle', 'rb') as f:
       # The protocol version used is detected automatically, so we do not
       # have to specify it.
       data = pickle.load(f)


.. XXX: Add examples showing how to optimize pickles for size (like using
.. pickletools.optimize() or the gzip module).
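One possible way to reduce the size of a pickle is sketched below, using
:func:`pickletools.optimize` followed by :func:`gzip.compress`. The sample
``data`` is made up for the illustration, and the savings depend entirely on
the data being pickled.

```python
import gzip
import pickle
import pickletools

# Made-up sample data with a lot of repetition.
data = {"rows": [("id-%d" % i, i * 0.5) for i in range(1000)]}

raw = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

# Drop memo opcodes that nothing in the stream refers back to.
optimized = pickletools.optimize(raw)

# Compress the optimized byte string for storage or transfer.
compressed = gzip.compress(optimized)

# Reading it back: decompress first, then unpickle as usual.
restored = pickle.loads(gzip.decompress(compressed))
```

The optimized pickle is loaded with the ordinary :func:`loads` function; only
the compression step has to be undone explicitly.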


.. seealso::

   Module :mod:`copyreg`
      Pickle interface constructor registration for extension types.

   Module :mod:`pickletools`
      Tools for working with and analyzing pickled data.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.


.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError` but it could be something else.

.. [#] The :mod:`copy` module uses this protocol for shallow and deep copying
   operations.

.. [#] The limitation on alphanumeric characters is due to the fact that
   persistent IDs, in protocol 0, are delimited by the newline
   character. Therefore, if any kind of newline character occurs in a
   persistent ID, the resulting pickle will become unreadable.
Alexandre Vassalotti5f3b63a2008-10-18 20:47:58 +0000756 persistent IDs, the resulting pickle will become unreadable.