:mod:`pickle` --- Python object serialization
=============================================

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.
.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>
.. sectionauthor:: Barry Warsaw <barry@zope.com>


The :mod:`pickle` module implements a fundamental but powerful algorithm for
serializing and de-serializing a Python object structure.  "Pickling" is the
process whereby a Python object hierarchy is converted into a byte stream, and
"unpickling" is the inverse operation, whereby a byte stream is converted back
into an object hierarchy.  Pickling (and unpickling) is alternatively known as
"serialization", "marshalling", [#]_ or "flattening"; however, to avoid
confusion, the terms used here are "pickling" and "unpickling".


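As a minimal sketch of the round trip described above (the ``inventory`` data
is made up for illustration), an object hierarchy can be pickled to a byte
stream and rebuilt from it::

   import pickle

   # Hypothetical sample data: a small nested object hierarchy.
   inventory = {"apples": 3, "sizes": [1.5, 2.0], "tags": ("fresh", "local")}

   # Pickling converts the hierarchy into a bytes object...
   stream = pickle.dumps(inventory)

   # ...and unpickling rebuilds an equal, but distinct, hierarchy.
   restored = pickle.loads(stream)
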
Relationship to other Python modules
------------------------------------

The :mod:`pickle` module has a transparent optimizer (:mod:`_pickle`) written
in C.  It is used whenever available; otherwise, the pure Python implementation
is used.

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should always be the preferred way to serialize Python
objects.  :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant
ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing.  Recursive
  objects are objects that contain references to themselves.  These are not
  handled by marshal, and in fact, attempting to marshal recursive objects will
  crash your Python interpreter.  Object sharing happens when there are multiple
  references to the same object in different places in the object hierarchy being
  serialized.  :mod:`pickle` stores such objects only once, and ensures that all
  other references point to the master copy.  Shared objects remain shared, which
  can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances.  :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module as
  when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions.  Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases.

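The first point in the list above can be illustrated with a short sketch.  The
names ``shared``, ``container`` and ``recursive`` are invented for this
example; it only shows that sharing and self-reference survive a round trip::

   import pickle

   shared = [1, 2, 3]
   container = {"a": shared, "b": shared}   # two references to one list
   recursive = []
   recursive.append(recursive)              # a list that contains itself

   clone = pickle.loads(pickle.dumps(container))
   loop = pickle.loads(pickle.dumps(recursive))

   # Sharing is preserved in the clone: a change made through one key
   # is visible through the other.
   clone["a"].append(4)
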
.. warning::

   The :mod:`pickle` module is not intended to be secure against erroneous or
   maliciously constructed data.  Never unpickle data received from an untrusted
   or unauthenticated source.

Note that serialization is a more primitive notion than persistence; although
:mod:`pickle` reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent
access to persistent objects.  The :mod:`pickle` module can transform a complex
object into a byte stream and it can transform the byte stream into an object
with the same internal structure.  Perhaps the most obvious thing to do with
these byte streams is to write them onto a file, but it is also conceivable to
send them across a network or store them in a database.  The module
:mod:`shelve` provides a simple interface to pickle and unpickle objects on
DBM-style database files.


Data stream format
------------------

.. index::
   single: XDR
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific.  This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however, it means that non-Python
programs may not be able to reconstruct pickled Python objects.

By default, the :mod:`pickle` data format uses a compact binary representation.
The module :mod:`pickletools` contains tools for analyzing data streams
generated by :mod:`pickle`.

There are currently four different protocols which can be used for pickling:

* Protocol version 0 is the original human-readable protocol and is
  backwards compatible with earlier versions of Python.

* Protocol version 1 is the old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3.  It provides much more
  efficient pickling of :term:`new-style class`\es.

* Protocol version 3 was added in Python 3.0.  It has explicit support for
  bytes and cannot be unpickled by Python 2.x pickle modules.  This is
  the current recommended protocol; use it whenever possible.

Refer to :pep:`307` for information about improvements brought by
protocol 2.  See :mod:`pickletools`'s source code for extensive
comments about opcodes used by pickle protocols.


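The following sketch pickles one made-up object with each of the four
protocols listed above.  It assumes only that the sample object contains ASCII
strings and small integers, for which protocol 0 output is plain text::

   import pickle

   data = {"name": "spam", "count": 3}   # made-up sample object

   # Pickle the same object with each of the four protocols described above.
   streams = {proto: pickle.dumps(data, proto) for proto in range(4)}

   # Protocol 0 output is plain ASCII text for an object like this one.
   text = streams[0].decode("ascii")

   # Protocols 2 and newer start with a PROTO opcode (0x80) and the version.
   header = streams[2][:2]

   # loads() detects the protocol automatically.
   results = [pickle.loads(stream) for stream in streams.values()]
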
Module Interface
----------------

To serialize an object hierarchy, you first create a pickler, then you call the
pickler's :meth:`dump` method.  To de-serialize a data stream, you first create
an unpickler, then you call the unpickler's :meth:`load` method.  The
:mod:`pickle` module provides the following constants:


.. data:: HIGHEST_PROTOCOL

   The highest protocol version available.  This value can be passed as a
   *protocol* value.

.. data:: DEFAULT_PROTOCOL

   The default protocol used for pickling.  May be less than
   :data:`HIGHEST_PROTOCOL`.  Currently the default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.


The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:

.. function:: dump(obj, file[, protocol])

   Write a pickled representation of *obj* to the open file object *file*.  This
   is equivalent to ``Pickler(file, protocol).dump(obj)``.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2, 3.  The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported.  The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

   The *file* argument must have a :meth:`write` method that accepts a single
   bytes argument.  It can thus be a file object opened for binary writing, an
   :class:`io.BytesIO` instance, or any other custom object that meets this
   interface.

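A small sketch of the *file* requirement for :func:`dump`: the
``ChunkCollector`` class below is hypothetical and exists only to show that any
object with a suitable ``write()`` method is accepted, alongside an
:class:`io.BytesIO` buffer::

   import io
   import pickle

   class ChunkCollector:
       """Hypothetical minimal writer: it only offers a write() method."""

       def __init__(self):
           self.chunks = []

       def write(self, data):
           self.chunks.append(bytes(data))

   record = ["spam", 42, None]

   buffer = io.BytesIO()
   pickle.dump(record, buffer, -1)      # negative: highest protocol available

   collector = ChunkCollector()
   pickle.dump(record, collector, -1)
   collected = b"".join(collector.chunks)
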
.. function:: dumps(obj[, protocol])

   Return the pickled representation of the object as a :class:`bytes`
   object, instead of writing it to a file.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2, 3.  The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported.  The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

.. function:: load(file, [\*, encoding="ASCII", errors="strict"])

   Read a pickled object representation from the open file object *file* and
   return the reconstituted object hierarchy specified therein.  This is
   equivalent to ``Unpickler(file).load()``.

   The protocol version of the pickle is detected automatically, so no protocol
   argument is needed.  Bytes past the pickled object's representation are
   ignored.

   The argument *file* must have two methods, a :meth:`read` method that takes
   an integer argument, and a :meth:`readline` method that requires no
   arguments.  Both methods should return bytes.  Thus *file* can be a binary
   file object opened for reading, an :class:`io.BytesIO` object, or any other
   custom object that meets this interface.

   Optional keyword arguments are *encoding* and *errors*, which are used to
   decode 8-bit string instances pickled by Python 2.x.  These default to
   'ASCII' and 'strict', respectively.

.. function:: loads(bytes_object, [\*, encoding="ASCII", errors="strict"])

   Read a pickled object hierarchy from a :class:`bytes` object and return the
   reconstituted object hierarchy specified therein.

   The protocol version of the pickle is detected automatically, so no protocol
   argument is needed.  Bytes past the pickled object's representation are
   ignored.

   Optional keyword arguments are *encoding* and *errors*, which are used to
   decode 8-bit string instances pickled by Python 2.x.  These default to
   'ASCII' and 'strict', respectively.


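The handling of bytes past the end of a pickle can be sketched as follows.  The
sample values are arbitrary; the second half relies on the fact that each
:func:`load` call reads one pickle from the file and leaves the rest unread::

   import io
   import pickle

   # loads() stops at the end of the first pickle; trailing bytes are ignored.
   padded = pickle.dumps("first") + b"trailing bytes"
   value = pickle.loads(padded)

   # A binary file may hold several pickles back to back; each load() call
   # reads exactly one of them.
   buffer = io.BytesIO()
   pickle.dump("first", buffer)
   pickle.dump({"second": 2}, buffer)
   buffer.seek(0)
   one = pickle.load(buffer)
   two = pickle.load(buffer)
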
The :mod:`pickle` module defines three exceptions:

.. exception:: PickleError

   Common base class for the other pickling exceptions.  It inherits
   :exc:`Exception`.

.. exception:: PicklingError

   Error raised when an unpicklable object is encountered by :class:`Pickler`.
   It inherits :exc:`PickleError`.

   Refer to :ref:`pickle-picklable` to learn what kinds of objects can be
   pickled.

.. exception:: UnpicklingError

   Error raised when there is a problem unpickling an object, such as data
   corruption or a security violation.  It inherits :exc:`PickleError`.

   Note that other exceptions may also be raised during unpickling, including
   (but not necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
   :exc:`ImportError`, and :exc:`IndexError`.


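A hedged sketch of handling these exceptions follows.  The helper
``describe_failure`` is invented for this example, and which exception a given
failure raises can vary between Python versions, so the handler deliberately
catches the additional exception types mentioned above::

   import pickle

   def describe_failure(action):
       """Run *action* and return the name of the exception it raised, if any."""
       try:
           action()
       except pickle.PickleError as exc:     # PicklingError or UnpicklingError
           return type(exc).__name__
       except (AttributeError, EOFError, ImportError, IndexError) as exc:
           return type(exc).__name__         # other errors named in the text
       return None

   # A lambda has no importable name, so it cannot be pickled by reference.
   pickling = describe_failure(lambda: pickle.dumps(lambda: None))

   # A stream that does not start with a valid opcode is rejected.
   corrupt = describe_failure(lambda: pickle.loads(b"\x00"))

   # An empty stream ends before any object has been read.
   empty = describe_failure(lambda: pickle.loads(b""))
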
The :mod:`pickle` module exports two classes, :class:`Pickler` and
:class:`Unpickler`:

.. class:: Pickler(file[, protocol])

   This takes a binary file for writing a pickle data stream.

   The optional *protocol* argument tells the pickler to use the given protocol;
   supported protocols are 0, 1, 2, 3.  The default protocol is 3, a
   backward-incompatible protocol designed for Python 3.0.

   Specifying a negative protocol version selects the highest protocol version
   supported.  The higher the protocol used, the more recent the version of
   Python needed to read the pickle produced.

   The *file* argument must have a :meth:`write` method that accepts a single
   bytes argument.  It can thus be a file object opened for binary writing, an
   :class:`io.BytesIO` instance, or any other custom object that meets this
   interface.

   .. method:: dump(obj)

      Write a pickled representation of *obj* to the open file object given in
      the constructor.

   .. method:: persistent_id(obj)

      Do nothing by default.  This exists so a subclass can override it.

      If :meth:`persistent_id` returns ``None``, *obj* is pickled as usual.  Any
      other value causes :class:`Pickler` to emit the returned value as a
      persistent ID for *obj*.  The meaning of this persistent ID should be
      defined by :meth:`Unpickler.persistent_load`.  Note that the value
      returned by :meth:`persistent_id` cannot itself have a persistent ID.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. attribute:: fast

      Deprecated.  Enable fast mode if set to a true value.  The fast mode
      disables the usage of memo, therefore speeding the pickling process by not
      generating superfluous PUT opcodes.  It should not be used with
      self-referential objects; doing otherwise will cause :class:`Pickler` to
      recurse infinitely.

      Use :func:`pickletools.optimize` if you need more compact pickles.


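As a sketch of the alternative recommended for :attr:`Pickler.fast`, the
following passes a pickle through :func:`pickletools.optimize`.  The sample
data is invented, and how much space is saved depends on the object being
pickled::

   import pickle
   import pickletools

   data = [("spam", 1), ("eggs", 2)] * 50   # made-up sample data

   stream = pickle.dumps(data, 2)

   # optimize() drops PUT opcodes whose memo entries are never fetched.
   compact = pickletools.optimize(stream)

   restored = pickle.loads(compact)
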
.. class:: Unpickler(file, [\*, encoding="ASCII", errors="strict"])

   This takes a binary file for reading a pickle data stream.

   The protocol version of the pickle is detected automatically, so no
   protocol argument is needed.

   The argument *file* must have two methods, a :meth:`read` method that takes
   an integer argument, and a :meth:`readline` method that requires no
   arguments.  Both methods should return bytes.  Thus *file* can be a binary
   file object opened for reading, an :class:`io.BytesIO` object, or any other
   custom object that meets this interface.

   Optional keyword arguments are *encoding* and *errors*, which are used to
   decode 8-bit string instances pickled by Python 2.x.  These default to
   'ASCII' and 'strict', respectively.

   .. method:: load()

      Read a pickled object representation from the open file object given in
      the constructor, and return the reconstituted object hierarchy specified
      therein.  Bytes past the pickled object's representation are ignored.

   .. method:: persistent_load(pid)

      Raise an :exc:`UnpicklingError` by default.

      If defined, :meth:`persistent_load` should return the object specified by
      the persistent ID *pid*.  If an invalid persistent ID is encountered, an
      :exc:`UnpicklingError` should be raised.

      See :ref:`pickle-persistent` for details and examples of uses.

   .. method:: find_class(module, name)

      Import *module* if necessary and return the object called *name* from it,
      where the *module* and *name* arguments are :class:`str` objects.  Note
      that, despite its name, :meth:`find_class` is also used for finding
      functions.

      Subclasses may override this to gain control over what type of objects and
      how they can be loaded, potentially reducing security risks.  Refer to
      :ref:`pickle-restrict` for details.


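A minimal sketch of overriding :meth:`Unpickler.find_class` is shown below.
The names ``SAFE_BUILTINS``, ``RestrictedUnpickler`` and ``restricted_loads``
are chosen for this example, and the whitelist is an assumption, not a vetted
policy; such a filter reduces the risk of unpickling hostile data but does not
remove it::

   import builtins
   import io
   import pickle

   # Assumption for this sketch: only these builtins are considered safe.
   SAFE_BUILTINS = {"range", "complex", "set", "frozenset"}

   class RestrictedUnpickler(pickle.Unpickler):

       def find_class(self, module, name):
           if module == "builtins" and name in SAFE_BUILTINS:
               return getattr(builtins, name)
           # Refuse every other global.
           raise pickle.UnpicklingError(
               "global '%s.%s' is forbidden" % (module, name))

   def restricted_loads(data):
       """Like pickle.loads(), but using the restricted unpickler."""
       return RestrictedUnpickler(io.BytesIO(data)).load()

   allowed = restricted_loads(pickle.dumps(range(3)))

   # A hand-written protocol 0 pickle that asks for os.system.
   hostile = b"cos\nsystem\n(S'echo hello'\ntR."
   try:
       restricted_loads(hostile)
       blocked = False
   except pickle.UnpicklingError:
       blocked = True
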
.. _pickle-picklable:

What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, floating point numbers, complex numbers

* strings, bytes, bytearrays

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`__dict__` or the result of calling
  :meth:`__getstate__` is picklable (see section :ref:`pickle-inst` for details)

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file.  Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RuntimeError` will be
raised in this case.  You can carefully raise this limit with
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value.  This means that only the function name is
pickled, along with the name of the module the function is defined in.  Neither
the function's code, nor any of its function attributes are pickled.  Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [#]_

Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply.  Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   import pickle

   class Foo:
       attr = 'A class attribute'

   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined in
the top level of a module.

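Pickling by reference can be observed directly.  This sketch uses
:func:`json.dumps` purely as a convenient importable function; any function
defined at the top level of a module would behave the same way::

   import json
   import pickle

   stream = pickle.dumps(json.dumps)

   # The stream names the module and the function; it holds no code.
   has_names = b"json" in stream and b"dumps" in stream

   # Unpickling imports the module and looks the name up again, so the
   # result is the very same function object.
   restored = pickle.loads(stream)
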
Similarly, when class instances are pickled, their class's code and data are not
pickled along with them.  Only the instance data are pickled.  This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class.  If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.


.. _pickle-inst:

Pickling Class Instances
------------------------

In this section, we describe the general mechanisms available to you to define,
customize, and control how class instances are pickled and unpickled.

In most cases, no additional code is needed to make instances picklable.  By
default, pickle will retrieve the class and the attributes of an instance via
introspection.  When a class instance is unpickled, its :meth:`__init__` method
is usually *not* invoked.  The default behaviour first creates an uninitialized
instance and then restores the saved attributes.  The following code shows an
implementation of this behaviour::

   def save(obj):
       return (obj.__class__, obj.__dict__)

   def load(cls, attributes):
       obj = cls.__new__(cls)
       obj.__dict__.update(attributes)
       return obj

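The claim that :meth:`__init__` is not invoked on unpickling can be checked
with a small experiment.  The ``Counter`` class is hypothetical and must be
defined at the top level of an importable module (here, the script itself)::

   import pickle

   class Counter:
       """Hypothetical class that records how often __init__ runs."""

       init_calls = 0

       def __init__(self, start):
           Counter.init_calls += 1
           self.value = start

   original = Counter(10)                           # runs __init__ once
   restored = pickle.loads(pickle.dumps(original))  # does not run it again
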
.. index:: single: __getnewargs__() (copy protocol)

Classes can alter the default behaviour by providing one or several special
methods.  In protocol 2 and newer, classes that implement the
:meth:`__getnewargs__` method can dictate the values passed to the
:meth:`__new__` method upon unpickling.  This is often needed for classes
whose :meth:`__new__` method requires arguments.

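A sketch of :meth:`__getnewargs__` in use: the ``Tagged`` class below is
invented for illustration and has a :meth:`__new__` method that requires an
extra argument, which the default machinery would not know how to supply::

   import pickle

   class Tagged(str):
       """Hypothetical str subclass whose __new__ requires an extra argument."""

       def __new__(cls, text, tag):
           self = super().__new__(cls, text)
           self.tag = tag
           return self

       def __getnewargs__(self):
           # These values are passed to __new__ when the instance is unpickled.
           return (str(self), self.tag)

   original = Tagged("hello", tag="greeting")
   restored = pickle.loads(pickle.dumps(original, 2))
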
.. index:: single: __getstate__() (copy protocol)

Classes can further influence how their instances are pickled; if the class
defines the method :meth:`__getstate__`, it is called and the returned object is
pickled as the contents for the instance, instead of the contents of the
instance's dictionary.  If the :meth:`__getstate__` method is absent, the
instance's :attr:`__dict__` is pickled as usual.

.. index:: single: __setstate__() (copy protocol)

Upon unpickling, if the class defines :meth:`__setstate__`, it is called with
the unpickled state.  In that case, there is no requirement for the state object
to be a dictionary.  Otherwise, the pickled state must be a dictionary and its
items are assigned to the new instance's dictionary.

.. note::

   If :meth:`__getstate__` returns a false value, the :meth:`__setstate__`
   method will not be called.

Refer to the section :ref:`pickle-state` for more information about how to use
the methods :meth:`__getstate__` and :meth:`__setstate__`.

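As a brief sketch of the two methods working together, the hypothetical
``Account`` class below leaves an unpicklable lock out of its state and
recreates it when the instance is unpickled::

   import pickle
   import threading

   class Account:
       """Hypothetical class holding a lock, which cannot be pickled."""

       def __init__(self, balance):
           self.balance = balance
           self.lock = threading.Lock()

       def __getstate__(self):
           state = self.__dict__.copy()
           del state["lock"]             # leave the unpicklable lock out
           return state

       def __setstate__(self, state):
           self.__dict__.update(state)
           self.lock = threading.Lock()  # recreate it on unpickling

   restored = pickle.loads(pickle.dumps(Account(100)))
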
.. note::

   At unpickling time, some methods like :meth:`__getattr__`,
   :meth:`__getattribute__`, or :meth:`__setattr__` may be called upon the
   instance.  In case those methods rely on some internal invariant being
   true, the type should implement :meth:`__getnewargs__` to establish such an
   invariant; otherwise, neither :meth:`__new__` nor :meth:`__init__` will be
   called.

.. index::
   pair: copy; protocol
   single: __reduce__() (copy protocol)

As we shall see, pickle does not directly use the methods described above.  In
fact, these methods are part of the copy protocol which implements the
:meth:`__reduce__` special method.  The copy protocol provides a unified
interface for retrieving the data necessary for pickling and copying
objects. [#]_

Although powerful, implementing :meth:`__reduce__` directly in your classes is
error prone.  For this reason, class designers should use the high-level
interface (i.e., :meth:`__getnewargs__`, :meth:`__getstate__` and
:meth:`__setstate__`) whenever possible.  We will show, however, cases where
using :meth:`__reduce__` is the only option or leads to more efficient pickling
or both.

The interface is currently defined as follows.  The :meth:`__reduce__` method
takes no arguments and shall return either a string or, preferably, a tuple (the
returned object is often referred to as the "reduce value").

If a string is returned, the string is interpreted as the name of a
global variable.  It should be the object's local name relative to its module;
the pickle module searches the module namespace to determine the object's
module.  This behaviour is typically useful for singletons.

When a tuple is returned, it must be between two and five items long.  Optional
items can either be omitted, or ``None`` can be provided as their value.  The
semantics of each item are, in order:

.. XXX Mention __newobj__ special-case?

* A callable object that will be called to create the initial version of the
  object.

* A tuple of arguments for the callable object.  An empty tuple must be given if
  the callable does not accept any argument.

* Optionally, the object's state, which will be passed to the object's
  :meth:`__setstate__` method as previously described.  If the object has no
  such method, then the value must be a dictionary and it will be added to the
  object's :attr:`__dict__` attribute.

* Optionally, an iterator (and not a sequence) yielding successive items.  These
  items will be appended to the object either using ``obj.append(item)`` or, in
  batch, using ``obj.extend(list_of_items)``.  This is primarily used for list
  subclasses, but may be used by other classes as long as they have
  :meth:`append` and :meth:`extend` methods with the appropriate signature.
  (Whether :meth:`append` or :meth:`extend` is used depends on which pickle
  protocol version is used as well as the number of items to append, so both
  must be supported.)

* Optionally, an iterator (not a sequence) yielding successive key-value pairs.
  These items will be stored to the object using ``obj[key] = value``.  This is
  primarily used for dictionary subclasses, but may be used by other classes as
  long as they implement :meth:`__setitem__`.

.. index:: single: __reduce_ex__() (copy protocol)

Alternatively, a :meth:`__reduce_ex__` method may be defined.  The only
difference is that this method should take a single integer argument, the
protocol version.  When defined, pickle will prefer it over the
:meth:`__reduce__` method.  In addition, :meth:`__reduce__` automatically
becomes a synonym for the extended version.  The main use for this method is to
provide backwards-compatible reduce values for older Python releases.

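The simplest form of reduce value, a two-item tuple, can be sketched as
follows.  The ``Point`` class is hypothetical; it asks to be rebuilt by calling
``Point(x, y)`` and omits the optional third to fifth items::

   import pickle

   class Point:
       """Hypothetical class whose instances are rebuilt by calling Point(x, y)."""

       def __init__(self, x, y):
           self.x = x
           self.y = y

       def __reduce__(self):
           # (callable, argument tuple); the optional items are omitted.
           return (Point, (self.x, self.y))

   reduce_value = Point(1, 2).__reduce__()
   restored = pickle.loads(pickle.dumps(Point(1, 2)))

Unlike the default mechanism, this reduce value causes :meth:`__init__` to run
on unpickling, because the callable is the class itself.
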
.. _pickle-persistent:

Persistence of External Objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: persistent_id (pickle protocol)
   single: persistent_load (pickle protocol)

For the benefit of object persistence, the :mod:`pickle` module supports the
notion of a reference to an object outside the pickled data stream.  Such
objects are referenced by a persistent ID, which should be either a string of
alphanumeric characters (for protocol 0) [#]_ or just an arbitrary object (for
any newer protocol).

The resolution of such persistent IDs is not defined by the :mod:`pickle`
module; it will delegate this resolution to the user-defined methods on the
pickler and unpickler, :meth:`persistent_id` and :meth:`persistent_load`
respectively.

To pickle objects that have an external persistent ID, the pickler must have a
custom :meth:`persistent_id` method that takes an object as an argument and
returns either ``None`` or the persistent ID for that object.  When ``None`` is
returned, the pickler simply pickles the object as normal.  When a persistent ID
is returned, the pickler will pickle that ID instead of the object, along with
a marker so that the unpickler will recognize it as a persistent ID.

To unpickle external objects, the unpickler must have a custom
:meth:`persistent_load` method that takes a persistent ID object and returns the
referenced object.

Here is a comprehensive example presenting how persistent IDs can be used to
pickle external objects by reference.
Georg Brandl116aa622007-08-15 14:28:22 +0000547
Alexandre Vassalotti5f3b63a2008-10-18 20:47:58 +0000548.. literalinclude:: ../includes/dbpickle.py
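
For a more compact illustration of the same mechanism, here is a minimal
in-memory sketch. The :class:`Connection` class, the ``registry`` dictionary and
the ``("Connection", name)`` ID format are all hypothetical; a real application
would look objects up in its own external store. ::

   import io
   import pickle

   class Connection:
       """Hypothetical external resource that is never pickled by value."""

       def __init__(self, name):
           self.name = name

   # Hypothetical registry standing in for an external store.
   registry = {"primary": Connection("primary")}

   class RegistryPickler(pickle.Pickler):

       def persistent_id(self, obj):
           if isinstance(obj, Connection):
               # Emit a persistent ID instead of pickling the object itself.
               return ("Connection", obj.name)
           # Returning None tells the pickler to pickle obj as usual.
           return None

   class RegistryUnpickler(pickle.Unpickler):

       def persistent_load(self, pid):
           tag, name = pid
           if tag == "Connection":
               return registry[name]
           raise pickle.UnpicklingError("unsupported persistent ID: %r" % (pid,))

   buffer = io.BytesIO()
   RegistryPickler(buffer).dump({"db": registry["primary"], "retries": 3})
   buffer.seek(0)
   restored = RegistryUnpickler(buffer).load()

   # The external object is resolved by reference, not copied.
   assert restored["db"] is registry["primary"]

Because the persistent ID here is a tuple rather than an alphanumeric string,
this sketch relies on the default protocol being newer than protocol 0.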


.. _pickle-state:

Handling Stateful Objects
^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: __getstate__() (copy protocol)
   single: __setstate__() (copy protocol)

Here's an example that shows how to modify pickling behavior for a class.
The :class:`TextReader` class opens a text file, and returns the line number and
line contents each time its :meth:`readline` method is called. If a
:class:`TextReader` instance is pickled, all attributes *except* the file object
member are saved. When the instance is unpickled, the file is reopened, and
reading resumes from the last location. The :meth:`__setstate__` and
:meth:`__getstate__` methods are used to implement this behavior. ::

   class TextReader:
       """Print and number lines in a text file."""

       def __init__(self, filename):
           self.filename = filename
           self.file = open(filename)
           self.lineno = 0

       def readline(self):
           self.lineno += 1
           line = self.file.readline()
           if not line:
               return None
           if line.endswith('\n'):
               line = line[:-1]
           return "%i: %s" % (self.lineno, line)

       def __getstate__(self):
           # Copy the object's state from self.__dict__ which contains
           # all our instance attributes. Always use the dict.copy()
           # method to avoid modifying the original state.
           state = self.__dict__.copy()
           # Remove the unpicklable entries.
           del state['file']
           return state

       def __setstate__(self, state):
           # Restore instance attributes (i.e., filename and lineno).
           self.__dict__.update(state)
           # Restore the previously opened file's state. To do so, we need to
           # reopen it and read from it until the line count is restored.
           file = open(self.filename)
           for _ in range(self.lineno):
               file.readline()
           # Finally, save the file.
           self.file = file


A sample usage might be something like this::

   >>> reader = TextReader("hello.txt")
   >>> reader.readline()
   '1: Hello world!'
   >>> reader.readline()
   '2: I am line number two.'
   >>> new_reader = pickle.loads(pickle.dumps(reader))
   >>> new_reader.readline()
   '3: Goodbye!'


.. _pickle-restrict:

Restricting Globals
-------------------

.. index::
   single: find_class() (pickle protocol)

By default, unpickling will import any class or function that it finds in the
pickle data. For many applications, this behaviour is unacceptable as it
permits the unpickler to import and invoke arbitrary code. Just consider what
this hand-crafted pickle data stream does when loaded::

   >>> import pickle
   >>> pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
   hello world
   0

In this example, the unpickler imports the :func:`os.system` function and then
applies it to the string argument "echo hello world". Although this example is
inoffensive, it is not difficult to imagine one that could damage your system.
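
One way to see what such a stream would do without loading it is to
disassemble it with :func:`pickletools.dis`, which only decodes the opcodes and
does not import or call anything. The sketch below is an illustration, not a
security measure; the ``payload`` and ``listing`` names are arbitrary. ::

   import io
   import pickletools

   payload = b"cos\nsystem\n(S'echo hello world'\ntR."

   # Write the opcode listing to a string buffer instead of stdout.
   listing = io.StringIO()
   pickletools.dis(payload, out=listing)
   print(listing.getvalue())

The listing contains a ``GLOBAL`` opcode naming ``os system`` followed by a
``REDUCE`` opcode, which is the call that :func:`pickle.loads` would perform.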

For this reason, you may want to control what gets unpickled by customizing
:meth:`Unpickler.find_class`. Contrary to what its name suggests,
:meth:`find_class` is called whenever a global (i.e., a class or a function) is
requested. Thus it is possible to either forbid globals completely or restrict
them to a safe subset.

Here is an example of an unpickler allowing only a few safe classes from the
:mod:`builtins` module to be loaded::

   import builtins
   import io
   import pickle

   safe_builtins = {
       'range',
       'complex',
       'set',
       'frozenset',
       'slice',
   }

   class RestrictedUnpickler(pickle.Unpickler):

       def find_class(self, module, name):
           # Only allow safe classes from builtins.
           if module == "builtins" and name in safe_builtins:
               return getattr(builtins, name)
           # Forbid everything else.
           raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                        (module, name))

   def restricted_loads(s):
       """Helper function analogous to pickle.loads()."""
       return RestrictedUnpickler(io.BytesIO(s)).load()

A sample usage of our unpickler working as intended::

   >>> restricted_loads(pickle.dumps([1, 2, range(15)]))
   [1, 2, range(0, 15)]
   >>> restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
   Traceback (most recent call last):
     ...
   pickle.UnpicklingError: global 'os.system' is forbidden
   >>> restricted_loads(b'cbuiltins\neval\n'
   ...                  b'(S\'getattr(__import__("os"), "system")'
   ...                  b'("echo hello world")\'\ntR.')
   Traceback (most recent call last):
     ...
   pickle.UnpicklingError: global 'builtins.eval' is forbidden


.. XXX Add note about how extension codes could evade our protection
   mechanism (e.g. cached classes do not invoke find_class()).

As our examples show, you have to be careful with what you allow to be
unpickled. Therefore, if security is a concern, you may want to consider
alternatives such as the marshalling API in :mod:`xmlrpc.client` or
third-party solutions.


.. _pickle-example:

Examples
--------

For the simplest code, use the :func:`dump` and :func:`load` functions. ::

   import pickle

   # An arbitrary collection of objects supported by pickle.
   data = {
       'a': [1, 2.0, 3, 4+6j],
       'b': ("character string", b"byte string"),
       'c': set([None, True, False])
   }

   with open('data.pickle', 'wb') as f:
       # Pickle the 'data' dictionary using the highest protocol available.
       pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)


The following example reads the resulting pickled data. ::

   import pickle

   with open('data.pickle', 'rb') as f:
       # The protocol version used is detected automatically, so we do not
       # have to specify it.
       data = pickle.load(f)
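
The same round trip can be done in memory with :func:`dumps` and :func:`loads`.
The sketch below also passes the result through :func:`pickletools.optimize`,
one possible way to shrink a pickle; whether it saves anything worthwhile
depends on the data, so the sizes are printed rather than assumed. ::

   import pickle
   import pickletools

   data = {
       'a': [1, 2.0, 3, 4+6j],
       'b': ("character string", b"byte string"),
       'c': {None, True, False},
   }

   # Serialize to a bytes object instead of a file.
   raw = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)

   # Drop opcodes that the unpickler does not need.
   compact = pickletools.optimize(raw)

   print(len(raw), len(compact))

   # Both forms load back to an equal object.
   assert pickle.loads(raw) == data
   assert pickle.loads(compact) == data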


.. XXX: Add examples showing how to optimize pickles for size (like using
.. pickletools.optimize() or the gzip module).


.. seealso::

   Module :mod:`copyreg`
      Pickle interface constructor registration for extension types.

   Module :mod:`pickletools`
      Tools for working with and analyzing pickled data.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.


.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError`, but it could be something else.

.. [#] The :mod:`copy` module uses this protocol for shallow and deep copying
   operations.

.. [#] The limitation on alphanumeric characters is due to the fact that
   persistent IDs, in protocol 0, are delimited by the newline
   character. Therefore, if any kind of newline character occurs in a
   persistent ID, the resulting pickle will become unreadable.