:mod:`pickle` --- Python object serialization
=============================================

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.
.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>
.. sectionauthor:: Barry Warsaw <barry@zope.com>


The :mod:`pickle` module implements a fundamental but powerful algorithm for
serializing and de-serializing a Python object structure. "Pickling" is the
process whereby a Python object hierarchy is converted into a byte stream, and
"unpickling" is the inverse operation, whereby a byte stream is converted back
into an object hierarchy. Pickling (and unpickling) is alternatively known as
"serialization", "marshalling" [#]_, or "flattening"; however, to avoid
confusion, the terms used here are "pickling" and "unpickling".


Relationship to other Python modules
------------------------------------

The :mod:`pickle` module has a transparent optimizer (:mod:`_pickle`) written
in C. It is used whenever available; otherwise, the pure Python implementation
is used.

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should always be the preferred way to serialize Python
objects. :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing. Recursive
  objects are objects that contain references to themselves. These are not
  handled by :mod:`marshal`, and in fact, attempting to marshal recursive
  objects will crash your Python interpreter. Object sharing happens when there
  are multiple references to the same object in different places in the object
  hierarchy being serialized. :mod:`pickle` stores such objects only once, and
  ensures that all other references point to the master copy. Shared objects
  remain shared, which can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances. :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module
  as when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions. Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases.
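
The object-tracking behaviour described in the first item can be observed
directly. The following is a minimal sketch; the variable names are chosen for
illustration only.

```python
import pickle

# One inner list referenced twice, plus a list that contains itself.
shared = [1, 2, 3]
container = {"a": shared, "b": shared}
recursive = []
recursive.append(recursive)

restored = pickle.loads(pickle.dumps(container))
restored_recursive = pickle.loads(pickle.dumps(recursive))

# Both keys still refer to a single list after the round trip, so a
# mutation through one reference is visible through the other.
restored["a"].append(4)
print(restored["b"])

# The self-reference survives as well.
print(restored_recursive[0] is restored_recursive)
```

The restored objects are new copies, but the sharing structure among them
matches the original.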

.. warning::

   The :mod:`pickle` module is not intended to be secure against erroneous or
   maliciously constructed data. Never unpickle data received from an untrusted or
   unauthenticated source.

Note that serialization is a more primitive notion than persistence; although
:mod:`pickle` reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent
access to persistent objects. The :mod:`pickle` module can transform a complex
object into a byte stream and it can transform the byte stream into an object
with the same internal structure. Perhaps the most obvious thing to do with
these byte streams is to write them onto a file, but it is also conceivable to
send them across a network or store them in a database. The module
:mod:`shelve` provides a simple interface to pickle and unpickle objects on
DBM-style database files.


Data stream format
------------------

.. index::
   single: XDR
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific. This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however, it means that non-Python
programs may not be able to reconstruct pickled Python objects.

By default, the :mod:`pickle` data format uses a compact binary representation.
The module :mod:`pickletools` contains tools for analyzing data streams
generated by :mod:`pickle`.

There are currently 4 different protocols which can be used for pickling.

* Protocol version 0 is the original human-readable protocol and is
  backwards compatible with earlier versions of Python.

* Protocol version 1 is the old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3. It provides much more
  efficient pickling of :term:`new-style class`\es.

* Protocol version 3 was added in Python 3.0. It has explicit support for
  bytes and cannot be unpickled by Python 2.x pickle modules. This is
  the current recommended protocol; use it whenever possible.

Refer to :pep:`307` for information about improvements brought by
protocol 2. See :mod:`pickletools`'s source code for extensive
comments about opcodes used by pickle protocols.
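
As a rough illustration of how the protocols differ, the sketch below pickles
one object with every protocol the running interpreter supports and then
disassembles a small pickle with :func:`pickletools.dis`. The sample data is
made up, and a newer interpreter may offer protocols beyond the four listed
above.

```python
import pickle
import pickletools

data = {"name": "spam", "values": [1, 2, 3], "raw": b"\x00\x01"}

# Serialize the same object with every protocol this interpreter supports.
sizes = {}
for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    payload = pickle.dumps(data, protocol=proto)
    # The protocol is detected automatically when loading.
    assert pickle.loads(payload) == data
    sizes[proto] = len(payload)

print(sizes)

# Print the opcodes of a protocol 2 pickle in symbolic form.
small = pickle.dumps([1, 2], protocol=2)
pickletools.dis(small)
```

Pickles written with protocol 2 or newer begin with a ``PROTO`` opcode that
records the protocol number, which is one way the unpickler can tell the
formats apart.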


Module Interface
----------------

To serialize an object hierarchy, you first create a pickler, then you call the
pickler's :meth:`dump` method. To de-serialize a data stream, you first create
an unpickler, then you call the unpickler's :meth:`load` method. The
:mod:`pickle` module provides the following constants:


.. data:: HIGHEST_PROTOCOL

   The highest protocol version available. This value can be passed as a
   *protocol* value.

.. data:: DEFAULT_PROTOCOL

   The default protocol used for pickling. May be less than
   :data:`HIGHEST_PROTOCOL`. Currently the default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.


The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:

.. function:: dump(obj, file[, protocol])

   Write a pickled representation of *obj* to the open file object *file*. This
   is equivalent to ``Pickler(file, protocol).dump(obj)``.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2, 3. The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

   The *file* argument must have a ``write()`` method that accepts a single
   bytes argument. It can thus be a file object opened for binary writing, an
   :class:`io.BytesIO` instance, or any other custom object that meets this
   interface.

.. function:: dumps(obj[, protocol])

   Return the pickled representation of the object as a :class:`bytes`
   object, instead of writing it to a file.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2, 3. The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

.. function:: load(file, [\*, encoding="ASCII", errors="strict"])

   Read a pickled object representation from the open file object *file* and
   return the reconstituted object hierarchy specified therein. This is
   equivalent to ``Unpickler(file).load()``.

   The protocol version of the pickle is detected automatically, so no protocol
   argument is needed. Bytes past the pickled object's representation are
   ignored.

   The argument *file* must have two methods, a ``read()`` method that takes an
   integer argument, and a ``readline()`` method that requires no arguments.
   Both methods should return bytes. Thus *file* can be a binary file object
   opened for reading, a :class:`io.BytesIO` object, or any other custom object
   that meets this interface.

   Optional keyword arguments are *encoding* and *errors*, which are used to
   decode 8-bit string instances pickled by Python 2.x. These default to
   'ASCII' and 'strict', respectively.

.. function:: loads(bytes_object, [\*, encoding="ASCII", errors="strict"])

   Read a pickled object hierarchy from a :class:`bytes` object and return the
   reconstituted object hierarchy specified therein.

   The protocol version of the pickle is detected automatically, so no protocol
   argument is needed. Bytes past the pickled object's representation are
   ignored.

   Optional keyword arguments are *encoding* and *errors*, which are used to
   decode 8-bit string instances pickled by Python 2.x. These default to
   'ASCII' and 'strict', respectively.
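
The four convenience functions can be exercised together as follows. This is a
minimal sketch that uses an in-memory :class:`io.BytesIO` buffer in place of a
real file; the ``record`` dictionary is sample data.

```python
import io
import pickle

record = {"id": 7, "tags": ("a", "b"), "payload": b"\xff\x00"}

# dump() writes to any object with a write() method that accepts bytes.
buffer = io.BytesIO()
pickle.dump(record, buffer, protocol=pickle.HIGHEST_PROTOCOL)

# load() reads from any object with read() and readline() methods.
buffer.seek(0)
from_file = pickle.load(buffer)

# dumps() and loads() do the same work without a file object.
# A negative protocol selects the highest protocol available.
from_bytes = pickle.loads(pickle.dumps(record, -1))

print(from_file == record, from_bytes == record)
```

When writing to a real file, open it in binary mode (``'wb'`` or ``'rb'``) so
that the ``write()`` and ``read()`` methods deal in bytes.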


The :mod:`pickle` module defines three exceptions:

.. exception:: PickleError

   Common base class for the other pickling exceptions. It inherits
   :exc:`Exception`.

.. exception:: PicklingError

   Error raised when an unpicklable object is encountered by :class:`Pickler`.
   It inherits :exc:`PickleError`.

   Refer to :ref:`pickle-picklable` to learn what kinds of objects can be
   pickled.

.. exception:: UnpicklingError

   Error raised when there is a problem unpickling an object, such as data
   corruption or a security violation. It inherits :exc:`PickleError`.

   Note that other exceptions may also be raised during unpickling, including
   (but not necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
   :exc:`ImportError`, and :exc:`IndexError`.


The :mod:`pickle` module exports two classes, :class:`Pickler` and
:class:`Unpickler`:

.. class:: Pickler(file[, protocol])

   This takes a binary file for writing a pickle data stream.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2, 3. The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported. The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

   The *file* argument must have a ``write()`` method that accepts a single
   bytes argument. It can thus be a file object opened for binary writing, an
   :class:`io.BytesIO` instance, or any other custom object that meets this
   interface.

   .. method:: dump(obj)

      Write a pickled representation of *obj* to the open file object given in
      the constructor.

   .. method:: persistent_id(obj)

      Do nothing by default. This exists so a subclass can override it.

      If :meth:`persistent_id` returns ``None``, *obj* is pickled as usual. Any
      other value causes :class:`Pickler` to emit the returned value as a
      persistent ID for *obj*. The meaning of this persistent ID should be
      defined by :meth:`Unpickler.persistent_load`. Note that the value
      returned by :meth:`persistent_id` cannot itself have a persistent ID.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. attribute:: fast

      Deprecated. Enable fast mode if set to a true value. The fast mode
      disables the usage of memo, therefore speeding the pickling process by not
      generating superfluous PUT opcodes. It should not be used with
      self-referential objects; doing otherwise will cause :class:`Pickler` to
      recurse infinitely.

      Use :func:`pickletools.optimize` if you need more compact pickles.


.. class:: Unpickler(file, [\*, encoding="ASCII", errors="strict"])

   This takes a binary file for reading a pickle data stream.

   The protocol version of the pickle is detected automatically, so no
   protocol argument is needed.

   The argument *file* must have two methods, a ``read()`` method that takes an
   integer argument, and a ``readline()`` method that requires no arguments.
   Both methods should return bytes. Thus *file* can be a binary file object
   opened for reading, a :class:`io.BytesIO` object, or any other custom object
   that meets this interface.

   Optional keyword arguments are *encoding* and *errors*, which are used to
   decode 8-bit string instances pickled by Python 2.x. These default to
   'ASCII' and 'strict', respectively.

   .. method:: load()

      Read a pickled object representation from the open file object given in
      the constructor, and return the reconstituted object hierarchy specified
      therein. Bytes past the pickled object's representation are ignored.

   .. method:: persistent_load(pid)

      Raise an :exc:`UnpicklingError` by default.

      If defined, :meth:`persistent_load` should return the object specified by
      the persistent ID *pid*. If an invalid persistent ID is encountered, an
      :exc:`UnpicklingError` should be raised.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. method:: find_class(module, name)

      Import *module* if necessary and return the object called *name* from it,
      where the *module* and *name* arguments are :class:`str` objects. Note
      that, despite its name, :meth:`find_class` is also used for finding
      functions.

      Subclasses may override this to gain control over what type of objects and
      how they can be loaded, potentially reducing security risks. Refer to
      :ref:`pickle-restrict` for details.
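
As a sketch of how :meth:`Unpickler.find_class` might be overridden, the
subclass below resolves only a small allow-list of built-ins and rejects every
other global. The names ``SAFE_BUILTINS``, ``RestrictedUnpickler`` and
``restricted_loads`` are illustrative choices, not part of the module, and an
allow-list like this reduces risk without making untrusted pickles safe.

```python
import builtins
import io
import pickle

# Hypothetical allow-list chosen for this sketch.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Resolve only a handful of harmless built-ins.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Refuse every other global, including functions.
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))

def restricted_loads(data):
    """Like pickle.loads(), but refuses most globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

print(restricted_loads(pickle.dumps([1, 2, range(15)])))

try:
    restricted_loads(pickle.dumps(print))
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

Plain containers and numbers never reach :meth:`find_class`, so they load as
usual; only pickles that refer to a global by name go through the check.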


.. _pickle-picklable:

What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, floating point numbers, complex numbers

* strings, bytes, bytearrays

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`__dict__` or the result of calling
  :meth:`__getstate__` is picklable (see section :ref:`pickle-inst` for details)

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file. Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RuntimeError` will be
raised in this case. You can carefully raise this limit with
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value. This means that only the function name is
pickled, along with the name of the module the function is defined in. Neither
the function's code, nor any of its function attributes are pickled. Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [#]_
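
Pickling by reference can be seen by inspecting the bytes produced for a
built-in function. The sketch below uses :func:`len` as an arbitrary example.

```python
import pickle

payload = pickle.dumps(len)

# Only the module name and the function name are stored, not the code.
print(b"builtins" in payload, b"len" in payload)

# Unpickling looks the name up again and returns the very same object.
print(pickle.loads(payload) is len)
```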

Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply. Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   import pickle

   class Foo:
       attr = 'A class attribute'

   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined in
the top level of a module.

Similarly, when class instances are pickled, their class's code and data are not
pickled along with them. Only the instance data are pickled. This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class. If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.


.. _pickle-inst:

Pickling Class Instances
------------------------

In this section, we describe the general mechanisms available to you to define,
customize, and control how class instances are pickled and unpickled.

In most cases, no additional code is needed to make instances picklable. By
default, pickle will retrieve the class and the attributes of an instance via
introspection. When a class instance is unpickled, its :meth:`__init__` method
is usually *not* invoked. The default behaviour first creates an uninitialized
instance and then restores the saved attributes. The following code shows an
implementation of this behaviour::

   def save(obj):
       return (obj.__class__, obj.__dict__)

   def load(cls, attributes):
       obj = cls.__new__(cls)
       obj.__dict__.update(attributes)
       return obj

.. index:: single: __getnewargs__() (copy protocol)

Classes can alter the default behaviour by providing one or several special
methods. In protocol 2 and newer, classes that implement the
:meth:`__getnewargs__` method can dictate the values passed to the
:meth:`__new__` method upon unpickling. This is often needed for classes
whose :meth:`__new__` method requires arguments.
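
A minimal sketch of :meth:`__getnewargs__` follows. The ``Point`` class is a
hypothetical example of an immutable type whose :meth:`__new__` cannot be
called without arguments.

```python
import pickle

class Point(tuple):
    """An immutable pair whose __new__ requires two arguments."""

    def __new__(cls, x, y):
        return super().__new__(cls, (x, y))

    def __getnewargs__(self):
        # These values are passed to Point.__new__ when unpickling.
        return (self[0], self[1])

original = Point(3, 4)
restored = pickle.loads(pickle.dumps(original, protocol=2))
print(type(restored).__name__, tuple(restored))
```

Without :meth:`__getnewargs__`, the unpickler would call ``Point.__new__`` with
no coordinates and fail.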
Georg Brandl116aa622007-08-15 14:28:22 +0000415
Alexandre Vassalotti73b90a82008-10-29 23:32:33 +0000416.. index:: single: __getstate__() (copy protocol)
Georg Brandl116aa622007-08-15 14:28:22 +0000417
418Classes can further influence how their instances are pickled; if the class
Alexandre Vassalotti73b90a82008-10-29 23:32:33 +0000419defines the method :meth:`__getstate__`, it is called and the returned object is
Georg Brandl116aa622007-08-15 14:28:22 +0000420pickled as the contents for the instance, instead of the contents of the
Alexandre Vassalotti73b90a82008-10-29 23:32:33 +0000421instance's dictionary. If the :meth:`__getstate__` method is absent, the
422instance's :attr:`__dict__` is pickled as usual.
Georg Brandl116aa622007-08-15 14:28:22 +0000423
Alexandre Vassalotti73b90a82008-10-29 23:32:33 +0000424.. index:: single: __setstate__() (copy protocol)
Georg Brandl116aa622007-08-15 14:28:22 +0000425
Alexandre Vassalotti73b90a82008-10-29 23:32:33 +0000426Upon unpickling, if the class defines :meth:`__setstate__`, it is called with
427the unpickled state. In that case, there is no requirement for the state object
428to be a dictionary. Otherwise, the pickled state must be a dictionary and its
429items are assigned to the new instance's dictionary.
430
431.. note::
Georg Brandl116aa622007-08-15 14:28:22 +0000432
Georg Brandl23e8db52008-04-07 19:17:06 +0000433 If :meth:`__getstate__` returns a false value, the :meth:`__setstate__`
434 method will not be called.
Georg Brandl116aa622007-08-15 14:28:22 +0000435
Alexandre Vassalotti73b90a82008-10-29 23:32:33 +0000436Refer to the section :ref:`pickle-state` for more information about how to use
437the methods :meth:`__getstate__` and :meth:`__setstate__`.
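
One common use of this pair of methods is to leave out an attribute that cannot
be pickled and rebuild it afterwards. The ``Counter`` class below is a
hypothetical sketch of that pattern, using a :class:`threading.Lock` as the
unpicklable attribute.

```python
import pickle
import threading

class Counter:
    """A counter guarded by a lock, which cannot itself be pickled."""

    def __init__(self, start=0):
        self.value = start
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self.value += 1

    def __getstate__(self):
        # Copy the instance dictionary and drop the unpicklable lock.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the saved attributes, then create a fresh lock.
        self.__dict__.update(state)
        self._lock = threading.Lock()

counter = Counter(41)
counter.increment()
clone = pickle.loads(pickle.dumps(counter))
clone.increment()
print(counter.value, clone.value)
```

Because :meth:`__init__` is not called during unpickling, the lock has to be
recreated in :meth:`__setstate__`.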

.. note::
   At unpickling time, some methods like :meth:`__getattr__`,
   :meth:`__getattribute__`, or :meth:`__setattr__` may be called upon the
   instance. In case those methods rely on some internal invariant being
   true, the type should implement either :meth:`__getinitargs__` or
   :meth:`__getnewargs__` to establish such an invariant; otherwise, neither
   :meth:`__new__` nor :meth:`__init__` will be called.

.. index::
   pair: copy; protocol
   single: __reduce__() (copy protocol)

As we shall see, pickle does not directly use the methods described above. In
fact, these methods are part of the copy protocol, which implements the
:meth:`__reduce__` special method. The copy protocol provides a unified
interface for retrieving the data necessary for pickling and copying
objects. [#]_

Although powerful, implementing :meth:`__reduce__` directly in your classes is
error prone. For this reason, class designers should use the high-level
interface (i.e., :meth:`__getnewargs__`, :meth:`__getstate__` and
:meth:`__setstate__`) whenever possible. We will show, however, cases where using
:meth:`__reduce__` is the only option or leads to more efficient pickling or
both.

The interface is currently defined as follows. The :meth:`__reduce__` method
takes no argument and shall return either a string or preferably a tuple (the
returned object is often referred to as the "reduce value").

If a string is returned, the string should be interpreted as the name of a
global variable. It should be the object's local name relative to its module;
the pickle module searches the module namespace to determine the object's
module. This behaviour is typically useful for singletons.

When a tuple is returned, it must be between two and five items long. Optional
items can either be omitted, or ``None`` can be provided as their value. The
semantics of each item are, in order:

.. XXX Mention __newobj__ special-case?

* A callable object that will be called to create the initial version of the
  object.

* A tuple of arguments for the callable object. An empty tuple must be given if
  the callable does not accept any argument.

* Optionally, the object's state, which will be passed to the object's
  :meth:`__setstate__` method as previously described. If the object has no
  such method, then the value must be a dictionary and it will be added to the
  object's :attr:`__dict__` attribute.

* Optionally, an iterator (and not a sequence) yielding successive items. These
  items will be appended to the object either using ``obj.append(item)`` or, in
  batch, using ``obj.extend(list_of_items)``. This is primarily used for list
  subclasses, but may be used by other classes as long as they have
  :meth:`append` and :meth:`extend` methods with the appropriate signature.
  (Whether :meth:`append` or :meth:`extend` is used depends on which pickle
  protocol version is used as well as the number of items to append, so both
  must be supported.)

* Optionally, an iterator (not a sequence) yielding successive key-value pairs.
  These items will be stored to the object using ``obj[key] = value``. This is
  primarily used for dictionary subclasses, but may be used by other classes as
  long as they implement :meth:`__setitem__`.

.. index:: single: __reduce_ex__() (copy protocol)

Alternatively, a :meth:`__reduce_ex__` method may be defined. The only
difference is this method should take a single integer argument, the protocol
version. When defined, pickle will prefer it over the :meth:`__reduce__`
method. In addition, :meth:`__reduce__` automatically becomes a synonym for the
extended version. The main use for this method is to provide
backwards-compatible reduce values for older Python releases.
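
To illustrate the tuple form of the reduce value, the sketch below returns the
first three items: a callable, its arguments, and a state dictionary. The
``Connection`` class and its attributes are hypothetical, and the high-level
methods remain preferable where they suffice.

```python
import pickle

class Connection:
    """Hypothetical class whose __init__ must run again after unpickling."""

    def __init__(self, host, port):
        self.host = host
        self.port = port
        self.is_open = True   # derived in __init__, so not stored
        self.retries = 0

    def __reduce__(self):
        # (callable, argument tuple, optional state). With no __setstate__
        # defined, the state dictionary is merged into the new __dict__.
        return (self.__class__,
                (self.host, self.port),
                {"retries": self.retries})

conn = Connection("example.org", 8080)
conn.retries = 3
copy = pickle.loads(pickle.dumps(conn))
print(copy.host, copy.port, copy.retries, copy.is_open)
```

Unlike the default behaviour, this reduce value causes ``Connection.__init__``
to be called during unpickling, after which the saved state is applied.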
Georg Brandl116aa622007-08-15 14:28:22 +0000512
Alexandre Vassalotti758bca62008-10-18 19:25:07 +0000513.. _pickle-persistent:
514
Alexandre Vassalotti73b90a82008-10-29 23:32:33 +0000515Persistence of External Objects
516^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Georg Brandl116aa622007-08-15 14:28:22 +0000517
Christian Heimes05e8be12008-02-23 18:30:17 +0000518.. index::
519 single: persistent_id (pickle protocol)
520 single: persistent_load (pickle protocol)
521
Georg Brandl116aa622007-08-15 14:28:22 +0000522For the benefit of object persistence, the :mod:`pickle` module supports the
523notion of a reference to an object outside the pickled data stream. Such
Alexandre Vassalotti5f3b63a2008-10-18 20:47:58 +0000524objects are referenced by a persistent ID, which should be either a string of
525alphanumeric characters (for protocol 0) [#]_ or just an arbitrary object (for
526any newer protocol).
Georg Brandl116aa622007-08-15 14:28:22 +0000527
Alexandre Vassalotti5f3b63a2008-10-18 20:47:58 +0000528The resolution of such persistent IDs is not defined by the :mod:`pickle`
529module; it will delegate this resolution to the user defined methods on the
530pickler and unpickler, :meth:`persistent_id` and :meth:`persistent_load`
531respectively.
Georg Brandl116aa622007-08-15 14:28:22 +0000532
533To pickle objects that have an external persistent id, the pickler must have a
Alexandre Vassalotti5f3b63a2008-10-18 20:47:58 +0000534custom :meth:`persistent_id` method that takes an object as an argument and
Georg Brandl116aa622007-08-15 14:28:22 +0000535returns either ``None`` or the persistent id for that object. When ``None`` is
Alexandre Vassalotti5f3b63a2008-10-18 20:47:58 +0000536returned, the pickler simply pickles the object as normal. When a persistent ID
537string is returned, the pickler will pickle that object, along with a marker so
538that the unpickler will recognize it as a persistent ID.
Georg Brandl116aa622007-08-15 14:28:22 +0000539
540To unpickle external objects, the unpickler must have a custom
Alexandre Vassalotti5f3b63a2008-10-18 20:47:58 +0000541:meth:`persistent_load` method that takes a persistent ID object and returns the
542referenced object.
Georg Brandl116aa622007-08-15 14:28:22 +0000543
Alexandre Vassalotti73b90a82008-10-29 23:32:33 +0000544Here is a comprehensive example presenting how persistent ID can be used to
545pickle external objects by reference.
Georg Brandl116aa622007-08-15 14:28:22 +0000546
Alexandre Vassalotti5f3b63a2008-10-18 20:47:58 +0000547.. literalinclude:: ../includes/dbpickle.py
Alexandre Vassalottibcd1e3a2009-01-23 05:28:16 +0000548
Georg Brandl116aa622007-08-15 14:28:22 +0000549
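The same mechanism can also be sketched in a few lines without a database.
In this sketch the ``Connection`` class, the ``REGISTRY`` dictionary and the
two subclasses are hypothetical names chosen for illustration; the registry
merely stands in for whatever external store holds the real objects.

```python
import io
import pickle


class Connection:
    """Hypothetical external resource that should not be pickled by value."""

    def __init__(self, name):
        self.name = name


# Hypothetical registry standing in for an external store.
REGISTRY = {"primary": Connection("primary")}


class RegistryPickler(pickle.Pickler):
    def persistent_id(self, obj):
        if isinstance(obj, Connection):
            # An alphanumeric string, so it is also usable with protocol 0.
            return obj.name
        # Returning None tells the pickler to pickle obj as usual.
        return None


class RegistryUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        try:
            return REGISTRY[pid]
        except KeyError:
            raise pickle.UnpicklingError("unknown persistent ID: %r" % (pid,))


buffer = io.BytesIO()
RegistryPickler(buffer).dump({"conn": REGISTRY["primary"], "retries": 3})

buffer.seek(0)
restored = RegistryUnpickler(buffer).load()
```

Here ``restored["conn"]`` is the very object held in ``REGISTRY``, not a copy,
because only its persistent ID travelled through the pickle.
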
Alexandre Vassalotti73b90a82008-10-29 23:32:33 +0000550.. _pickle-state:
551
552Handling Stateful Objects
553^^^^^^^^^^^^^^^^^^^^^^^^^
554
555.. index::
556 single: __getstate__() (copy protocol)
557 single: __setstate__() (copy protocol)
558
559Here's an example that shows how to modify pickling behavior for a class.
560The :class:`TextReader` class opens a text file, and returns the line number and
561line contents each time its :meth:`readline` method is called. If a
562:class:`TextReader` instance is pickled, all attributes *except* the file object
563member are saved. When the instance is unpickled, the file is reopened, and
564reading resumes from the last location. The :meth:`__setstate__` and
565:meth:`__getstate__` methods are used to implement this behavior. ::
566
567 class TextReader:
568 """Print and number lines in a text file."""
569
570 def __init__(self, filename):
571 self.filename = filename
572 self.file = open(filename)
573 self.lineno = 0
574
575 def readline(self):
576 self.lineno += 1
577 line = self.file.readline()
578 if not line:
579 return None
Alexandre Vassalotti9d7665d2009-04-03 06:13:29 +0000580 if line.endswith('\n'):
Alexandre Vassalotti73b90a82008-10-29 23:32:33 +0000581 line = line[:-1]
582 return "%i: %s" % (self.lineno, line)
583
584 def __getstate__(self):
585 # Copy the object's state from self.__dict__ which contains
586 # all our instance attributes. Always use the dict.copy()
587 # method to avoid modifying the original state.
588 state = self.__dict__.copy()
589 # Remove the unpicklable entries.
590 del state['file']
591 return state
592
593 def __setstate__(self, state):
594 # Restore instance attributes (i.e., filename and lineno).
595 self.__dict__.update(state)
596 # Restore the previously opened file's state. To do so, we need to
597 # reopen it and read from it until the line count is restored.
598 file = open(self.filename)
599 for _ in range(self.lineno):
600 file.readline()
601 # Finally, save the file.
602 self.file = file
603
604
605A sample usage might be something like this::
606
607 >>> reader = TextReader("hello.txt")
608 >>> reader.readline()
609 '1: Hello world!'
610 >>> reader.readline()
611 '2: I am line number two.'
612 >>> new_reader = pickle.loads(pickle.dumps(reader))
613 >>> new_reader.readline()
614 '3: Goodbye!'
615
616
.. _pickle-restrict:

Restricting Globals
-------------------

.. index::
   single: find_class() (pickle protocol)

By default, unpickling will import any class or function that it finds in the
pickle data. For many applications, this behaviour is unacceptable as it
permits the unpickler to import and invoke arbitrary code. Just consider what
this hand-crafted pickle data stream does when loaded::

   >>> import pickle
   >>> pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
   hello world
   0

In this example, the unpickler imports the :func:`os.system` function and then
applies it to the string argument "echo hello world". Although this example is
inoffensive, it is not difficult to imagine one that could damage your system.

For this reason, you may want to control what gets unpickled by customizing
:meth:`Unpickler.find_class`. Despite its name, :meth:`find_class` is
called whenever a global (i.e., a class or a function) is requested. Thus it is
possible to either forbid globals completely or restrict them to a safe subset.

Here is an example of an unpickler allowing only a few safe classes from the
:mod:`builtins` module to be loaded::

   import builtins
   import io
   import pickle

   safe_builtins = {
       'range',
       'complex',
       'set',
       'frozenset',
       'slice',
   }

   class RestrictedUnpickler(pickle.Unpickler):

       def find_class(self, module, name):
           # Only allow safe classes from builtins.
           if module == "builtins" and name in safe_builtins:
               return getattr(builtins, name)
           # Forbid everything else.
           raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                        (module, name))

   def restricted_loads(s):
       """Helper function analogous to pickle.loads()."""
       return RestrictedUnpickler(io.BytesIO(s)).load()

A sample usage of our unpickler working as intended::

   >>> restricted_loads(pickle.dumps([1, 2, range(15)]))
   [1, 2, range(0, 15)]
   >>> restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
   Traceback (most recent call last):
     ...
   pickle.UnpicklingError: global 'os.system' is forbidden
   >>> restricted_loads(b'cbuiltins\neval\n'
   ...                  b'(S\'getattr(__import__("os"), "system")'
   ...                  b'("echo hello world")\'\ntR.')
   Traceback (most recent call last):
     ...
   pickle.UnpicklingError: global 'builtins.eval' is forbidden


.. XXX Add note about how extension codes could evade our protection
   mechanism (e.g. cached classes do not invoke find_class()).

As our examples show, you have to be careful with what you allow to be
unpickled. Therefore if security is a concern, you may want to consider
alternatives such as the marshalling API in :mod:`xmlrpc.client` or
third-party solutions.


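As a rough sketch of the :mod:`xmlrpc.client` alternative mentioned above, the
module-level ``dumps()`` and ``loads()`` functions can round-trip a structure
built from basic types. The ``data`` dictionary is an arbitrary example, and
this format supports only a small set of basic types rather than arbitrary
Python objects.

```python
import xmlrpc.client

# Arbitrary example data built only from basic types.
data = {"name": "example", "values": [1, 2.5, "three"], "ok": True}

# dumps() expects a tuple of parameters, so wrap the single value.
payload = xmlrpc.client.dumps((data,))

# loads() returns a (params, methodname) pair; the method name is None
# here because none was supplied to dumps().
params, method_name = xmlrpc.client.loads(payload)
restored = params[0]
```

Unlike a pickle, the payload is plain XML text that names no classes or
functions to import.
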
.. _pickle-example:

Examples
--------

For the simplest code, use the :func:`dump` and :func:`load` functions. ::

   import pickle

   # An arbitrary collection of objects supported by pickle.
   data = {
       'a': [1, 2.0, 3, 4+6j],
       'b': ("character string", b"byte string"),
       'c': {None, True, False}
   }

   with open('data.pickle', 'wb') as f:
       # Pickle the 'data' dictionary using the highest protocol available.
       pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)


The following example reads the resulting pickled data. ::

   import pickle

   with open('data.pickle', 'rb') as f:
       # The protocol version used is detected automatically, so we do not
       # have to specify it.
       data = pickle.load(f)


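Where the size of the pickle matters, one possible approach is to combine
:func:`pickletools.optimize` with compression from the :mod:`gzip` module.
The following is only a sketch with arbitrary example data; how much space is
actually saved depends on the data being pickled.

```python
import gzip
import pickle
import pickletools

# Arbitrary example data.
data = {"numbers": list(range(1000)), "label": "example"}

raw = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)

# Drop memo opcodes that the pickle never refers back to.
optimized = pickletools.optimize(raw)

# Compress the optimized pickle; it must be decompressed before loading.
compressed = gzip.compress(optimized)

restored = pickle.loads(gzip.decompress(compressed))
```
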
.. XXX: Add examples showing how to optimize pickles for size (like using
.. pickletools.optimize() or the gzip module).


.. seealso::

   Module :mod:`copyreg`
      Pickle interface constructor registration for extension types.

   Module :mod:`pickletools`
      Tools for working with and analyzing pickled data.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.


.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError` but it could be something else.

.. [#] The :mod:`copy` module uses this protocol for shallow and deep copying
   operations.

.. [#] The limitation on alphanumeric characters is due to the fact
   that persistent IDs, in protocol 0, are delimited by the newline
   character. Therefore if any kind of newline character occurs in
   persistent IDs, the resulting pickle will become unreadable.
Alexandre Vassalotti5f3b63a2008-10-18 20:47:58 +0000764 persistent IDs, the resulting pickle will become unreadable.