:mod:`pickle` --- Python object serialization
=============================================

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.


.. % Substantial improvements by Jim Kerr <jbkerr@sr.hp.com>.
.. % Rewritten by Barry Warsaw <barry@zope.com>

The :mod:`pickle` module implements a fundamental but powerful algorithm for
serializing and de-serializing a Python object structure. "Pickling" is the
process whereby a Python object hierarchy is converted into a byte stream, and
"unpickling" is the inverse operation, whereby a byte stream is converted back
into an object hierarchy. Pickling (and unpickling) is alternatively known as
"serialization", "marshalling" [#]_, or "flattening"; to avoid confusion, the
terms used here are "pickling" and "unpickling".

This documentation describes both the :mod:`pickle` module and the
:mod:`cPickle` module.


Relationship to other Python modules
------------------------------------

The :mod:`pickle` module has an optimized cousin called the :mod:`cPickle`
module. As its name implies, :mod:`cPickle` is written in C, so it can be up to
1000 times faster than :mod:`pickle`. However, it does not support subclassing
of the :func:`Pickler` and :func:`Unpickler` classes, because in :mod:`cPickle`
these are functions, not classes. Most applications have no need for this
functionality, and can benefit from the improved performance of :mod:`cPickle`.
Other than that, the interfaces of the two modules are nearly identical; the
common interface is described in this manual and differences are pointed out
where necessary. In the following discussions, we use the term "pickle" to
collectively describe the :mod:`pickle` and :mod:`cPickle` modules.

The data streams the two modules produce are guaranteed to be interchangeable.

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should always be the preferred way to serialize Python
objects. :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing. Recursive
  objects are objects that contain references to themselves. These are not
  handled by marshal, and in fact, attempting to marshal recursive objects will
  crash your Python interpreter. Object sharing happens when there are multiple
  references to the same object in different places in the object hierarchy being
  serialized. :mod:`pickle` stores such objects only once, and ensures that all
  other references point to the master copy. Shared objects remain shared, which
  can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances. :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module as
  when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions. Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases.
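
The object-sharing and recursion behaviour described in the first point can be checked with a short sketch. It assumes a Python 3 interpreter (where pickles are ``bytes`` and :mod:`cPickle` is no longer a separate module); the variable names are illustrative only.

```python
import pickle

# One mutable list referenced from two places in the same hierarchy.
shared = [1, 2, 3]
container = {"first": shared, "second": shared}

# A list that contains a reference to itself.
recursive = []
recursive.append(recursive)

restored = pickle.loads(pickle.dumps(container, 2))
restored_recursive = pickle.loads(pickle.dumps(recursive, 2))

# The restored hierarchy is a copy, but sharing inside it is preserved:
# a change made through one reference is visible through the other.
restored["first"].append(4)
print(restored["second"])
print(restored_recursive[0] is restored_recursive)
```

The original ``shared`` list is untouched, because unpickling builds new objects rather than reusing the ones that were pickled.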

.. warning::

   The :mod:`pickle` module is not intended to be secure against erroneous or
   maliciously constructed data. Never unpickle data received from an untrusted or
   unauthenticated source.

Note that serialization is a more primitive notion than persistence; although
:mod:`pickle` reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent
access to persistent objects. The :mod:`pickle` module can transform a complex
object into a byte stream and it can transform the byte stream into an object
with the same internal structure. Perhaps the most obvious thing to do with
these byte streams is to write them onto a file, but it is also conceivable to
send them across a network or store them in a database. The module
:mod:`shelve` provides a simple interface to pickle and unpickle objects on
DBM-style database files.


Data stream format
------------------

.. index::
   single: XDR
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific. This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however, it means that non-Python
programs may not be able to reconstruct pickled Python objects.

By default, the :mod:`pickle` data format uses a printable ASCII representation.
This is slightly more voluminous than a binary representation. The big
advantage of using printable ASCII (and of some other characteristics of
:mod:`pickle`'s representation) is that for debugging or recovery purposes it is
possible for a human to read the pickled file with a standard text editor.

There are currently three different protocols which can be used for pickling.

* Protocol version 0 is the original ASCII protocol and is backwards compatible
  with earlier versions of Python.

* Protocol version 1 is the old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3. It provides much more
  efficient pickling of new-style classes.

Refer to :pep:`307` for more information.

If a *protocol* is not specified, protocol 0 is used. If *protocol* is specified
as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol version
available will be used.

A binary format, which is slightly more efficient, can be chosen by specifying a
*protocol* version >= 1.
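
As a rough illustration of protocol selection, the sketch below pickles one object with several protocols. It assumes a Python 3 interpreter, where the default protocol is no longer 0, so the protocol is always passed explicitly; ``record`` is a made-up value.

```python
import pickle

record = {"name": "spam", "values": [1, 2, 3]}

# Protocol 0 is the printable ASCII format; protocol 2 is a binary format.
as_ascii = pickle.dumps(record, 0)
as_binary = pickle.dumps(record, 2)

# A negative value and HIGHEST_PROTOCOL both select the newest protocol.
as_highest = pickle.dumps(record, pickle.HIGHEST_PROTOCOL)
as_negative = pickle.dumps(record, -1)

for label, data in [("protocol 0", as_ascii), ("protocol 2", as_binary)]:
    print(label, len(data), "bytes")

# Unpickling needs no protocol argument; the format is detected from the data.
print(pickle.loads(as_ascii) == pickle.loads(as_binary) == record)
```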


Usage
-----

To serialize an object hierarchy, you first create a pickler, then you call the
pickler's :meth:`dump` method. To de-serialize a data stream, you first create
an unpickler, then you call the unpickler's :meth:`load` method. The
:mod:`pickle` module provides the following constant:


.. data:: HIGHEST_PROTOCOL

   The highest protocol version available. This value can be passed as a
   *protocol* value.

.. note::

   Be sure to always open pickle files created with protocols >= 1 in binary mode.
   For the old ASCII-based pickle protocol 0 you can use either text mode or binary
   mode as long as you stay consistent.

   A pickle file written with protocol 0 in binary mode will contain lone linefeeds
   as line terminators and therefore will look "funny" when viewed in Notepad or
   other editors which do not support this format.

The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:


.. function:: dump(obj, file[, protocol])

   Write a pickled representation of *obj* to the open file object *file*. This is
   equivalent to ``Pickler(file, protocol).dump(obj)``.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
   version will be used.

   *file* must have a :meth:`write` method that accepts a single string argument.
   It can thus be a file object opened for writing, a :mod:`StringIO` object, or
   any other custom object that meets this interface.


.. function:: load(file)

   Read a string from the open file object *file* and interpret it as a pickle data
   stream, reconstructing and returning the original object hierarchy. This is
   equivalent to ``Unpickler(file).load()``.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments. Both
   methods should return a string. Thus *file* can be a file object opened for
   reading, a :mod:`StringIO` object, or any other custom object that meets this
   interface.

   This function automatically determines whether the data stream was written in
   binary mode or not.


.. function:: dumps(obj[, protocol])

   Return the pickled representation of the object as a string, instead of writing
   it to a file.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
   version will be used.


.. function:: loads(string)

   Read a pickled object hierarchy from a string. Characters in the string past
   the pickled object's representation are ignored.
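
The four convenience functions can be exercised together as in the sketch below. It assumes Python 3, where the pickled data is ``bytes`` rather than a string and :class:`io.BytesIO` plays the role of the :mod:`StringIO` object mentioned above; ``inventory`` is a made-up value.

```python
import io
import pickle

inventory = {"apples": 3, "pears": [1.5, 2.5]}

# dump() only needs an object with a write() method, so an in-memory
# buffer works just as well as a file opened for writing.
buffer = io.BytesIO()
pickle.dump(inventory, buffer, 2)

# load() needs read() and readline(); rewind the buffer before reading.
buffer.seek(0)
from_file = pickle.load(buffer)

# dumps()/loads() do the same without any file object.
as_bytes = pickle.dumps(inventory, 2)
from_bytes = pickle.loads(as_bytes + b"data past the pickle is ignored")

print(from_file == from_bytes == inventory)
```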

The :mod:`pickle` module also defines three exceptions:


.. exception:: PickleError

   A common base class for the other exceptions defined below. This inherits from
   :exc:`Exception`.


.. exception:: PicklingError

   This exception is raised when an unpicklable object is passed to the
   :meth:`dump` method.


.. exception:: UnpicklingError

   This exception is raised when there is a problem unpickling an object. Note that
   other exceptions may also be raised during unpickling, including (but not
   necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
   :exc:`ImportError`, and :exc:`IndexError`.
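
Because unpickling can fail with more than :exc:`UnpicklingError`, a caller that wants to survive damaged data usually catches the whole family listed above. The following is a minimal sketch, assuming Python 3; ``try_loads`` is a hypothetical helper, not part of the module, and it offers no protection against malicious data.

```python
import pickle

payload = pickle.dumps(["spam", "eggs"], 2)
truncated = payload[:-3]   # simulate a stream that was cut short


def try_loads(data):
    """Return (object, None) on success or (None, exception) on failure."""
    try:
        return pickle.loads(data), None
    except (pickle.UnpicklingError, AttributeError, EOFError,
            ImportError, IndexError) as exc:
        return None, exc


good, good_error = try_loads(payload)
bad, bad_error = try_loads(truncated)
print(good, good_error)
print(bad, type(bad_error).__name__)
```

Which of the listed exceptions a truncated stream raises depends on where the data ends and on the implementation in use, so the sketch does not rely on a particular one.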

The :mod:`pickle` module also exports two callables [#]_, :class:`Pickler` and
:class:`Unpickler`:


.. class:: Pickler(file[, protocol])

   This takes a file-like object to which it will write a pickle data stream.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
   protocol version will be used.

   *file* must have a :meth:`write` method that accepts a single string argument.
   It can thus be an open file object, a :mod:`StringIO` object, or any other
   custom object that meets this interface.

:class:`Pickler` objects define one (or two) public methods:


.. method:: Pickler.dump(obj)

   Write a pickled representation of *obj* to the open file object given in the
   constructor. Either the binary or ASCII format will be used, depending on the
   value of the *protocol* argument passed to the constructor.


.. method:: Pickler.clear_memo()

   Clears the pickler's "memo". The memo is the data structure that remembers
   which objects the pickler has already seen, so that shared or recursive objects
   are pickled by reference and not by value. This method is useful when re-using
   picklers.

   .. note::

      Prior to Python 2.3, :meth:`clear_memo` was only available on the picklers
      created by :mod:`cPickle`. In the :mod:`pickle` module, picklers have an
      instance variable called :attr:`memo` which is a Python dictionary. So to clear
      the memo for a :mod:`pickle` module pickler, you could do the following::

         mypickler.memo.clear()

      Code that does not need to support older versions of Python should simply use
      :meth:`clear_memo`.

It is possible to make multiple calls to the :meth:`dump` method of the same
:class:`Pickler` instance. These must then be matched to the same number of
calls to the :meth:`load` method of the corresponding :class:`Unpickler`
instance. If the same object is pickled by multiple :meth:`dump` calls, the
:meth:`load` calls will all yield references to the same object. [#]_
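
The pairing of :meth:`dump` and :meth:`load` calls can be sketched as follows, assuming Python 3 and an in-memory :class:`io.BytesIO` stream in place of a real file; ``settings`` is a made-up object.

```python
import io
import pickle

stream = io.BytesIO()
pickler = pickle.Pickler(stream, 2)

settings = {"debug": False}
pickler.dump(settings)
pickler.dump(settings)   # same object again: written as a memo reference

# Two dump() calls are matched by two load() calls on ONE unpickler,
# which keeps its own memo between the calls.
stream.seek(0)
unpickler = pickle.Unpickler(stream)
first = unpickler.load()
second = unpickler.load()

print(first is second)
```

Reading the second pickle with a fresh unpickler would not work, because the memo entry it refers to only exists in the unpickler that read the first one.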

:class:`Unpickler` objects are defined as:


.. class:: Unpickler(file)

   This takes a file-like object from which it will read a pickle data stream.
   This class automatically determines whether the data stream was written in
   binary mode or not, so it does not need a flag as in the :class:`Pickler`
   factory.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments. Both
   methods should return a string. Thus *file* can be a file object opened for
   reading, a :mod:`StringIO` object, or any other custom object that meets this
   interface.

:class:`Unpickler` objects have one (or two) public methods:


.. method:: Unpickler.load()

   Read a pickled object representation from the open file object given in the
   constructor, and return the reconstituted object hierarchy specified therein.

   This method automatically determines whether the data stream was written in
   binary mode or not.


.. method:: Unpickler.noload()

   This is just like :meth:`load` except that it doesn't actually create any
   objects. This is useful primarily for finding what's called "persistent ids"
   that may be referenced in a pickle data stream. See section
   :ref:`pickle-protocol` below for more details.

   **Note:** the :meth:`noload` method is currently only available on
   :class:`Unpickler` objects created with the :mod:`cPickle` module.
   :mod:`pickle` module :class:`Unpickler`\ s do not have the :meth:`noload`
   method.


What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, long integers, floating point numbers, complex numbers

* normal and Unicode strings

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`__dict__` or the result of calling
  :meth:`__getstate__` is picklable (see section :ref:`pickle-protocol` for
  details)

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file. Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RuntimeError` will be
raised in this case. You can carefully raise this limit with
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value. This means that only the function name is
pickled, along with the name of the module the function is defined in. Neither
the function's code, nor any of its function attributes are pickled. Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [#]_
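
Pickling by name reference is easy to observe with a built-in function. The sketch below assumes Python 3 and uses protocol 0 so that the stream stays readable.

```python
import pickle

# Only the module name and the function name go into the stream,
# never the function's code.
data = pickle.dumps(len, 0)
print(data)

# Unpickling looks the name up again and returns the very same object.
restored = pickle.loads(data)
print(restored is len)
```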

Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply. Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   class Foo:
       attr = 'a class attr'

   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined in
the top level of a module.

Similarly, when class instances are pickled, their class's code and data are not
pickled along with them. Only the instance data are pickled. This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class. If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.
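
One way to carry such a version number is sketched below, assuming Python 3. The ``Account`` class, its attributes and the version 1 layout (a float ``balance``) are all invented for the illustration.

```python
import pickle


class Account:
    """Hypothetical class whose stored layout changed between releases."""

    VERSION = 2

    def __init__(self, owner, balance_cents=0):
        self.owner = owner
        self.balance_cents = balance_cents

    def __getstate__(self):
        # Store the layout version next to the instance data.
        state = self.__dict__.copy()
        state["version"] = self.VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop("version", 1)
        if version < 2:
            # Assumed version 1 layout: a float 'balance' in whole units.
            state["balance_cents"] = int(round(state.pop("balance") * 100))
        self.__dict__.update(state)


# A current instance round-trips unchanged.
current = pickle.loads(pickle.dumps(Account("guido", 500), 2))

# Feeding __setstate__ an old-style state shows the conversion path.
legacy = Account.__new__(Account)
legacy.__setstate__({"owner": "barry", "balance": 12.5})

print(current.balance_cents, legacy.balance_cents)
```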


.. _pickle-protocol:

The pickle protocol
-------------------

This section describes the "pickling protocol" that defines the interface
between the pickler/unpickler and the objects that are being serialized. This
protocol provides a standard way for you to define, customize, and control how
your objects are serialized and de-serialized. The description in this section
doesn't cover specific customizations that you can employ to make the unpickling
environment slightly safer from untrusted pickle data streams; see section
:ref:`pickle-sub` for more details.


.. _pickle-inst:

Pickling and unpickling normal class instances
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: __getinitargs__() (copy protocol)
   single: __init__() (instance constructor)

.. XXX is __getinitargs__ only used with old-style classes?

When a pickled class instance is unpickled, its :meth:`__init__` method is
normally *not* invoked. If it is desirable that the :meth:`__init__` method be
called on unpickling, an old-style class can define a method
:meth:`__getinitargs__`, which should return a *tuple* containing the arguments
to be passed to the class constructor (:meth:`__init__` for example). The
:meth:`__getinitargs__` method is called at pickle time; the tuple it returns is
incorporated in the pickle for the instance.

.. index:: single: __getnewargs__() (copy protocol)

New-style types can provide a :meth:`__getnewargs__` method that is used for
protocol 2. Implementing this method is needed if the type establishes some
internal invariants when the instance is created, or if the memory allocation is
affected by the values passed to the :meth:`__new__` method for the type (as it
is for tuples and strings). Instances of a new-style type :class:`C` are
created using ::

   obj = C.__new__(C, *args)


where *args* is the result of calling :meth:`__getnewargs__` on the original
object; if there is no :meth:`__getnewargs__`, an empty tuple is assumed.
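
A tuple subclass with its own :meth:`__new__` signature is a typical case where :meth:`__getnewargs__` matters. The following is a minimal sketch, assuming Python 3 and protocol 2; the ``Point`` class is invented for the illustration.

```python
import pickle


class Point(tuple):
    """Hypothetical immutable pair whose contents are fixed in __new__."""

    def __new__(cls, x, y):
        return super().__new__(cls, (x, y))

    def __getnewargs__(self):
        # These values are passed to Point.__new__(Point, *args)
        # when the instance is recreated.
        return (self[0], self[1])


original = Point(3, 4)
restored = pickle.loads(pickle.dumps(original, 2))
print(restored, type(restored).__name__)
```

Without the method, the unpickler would have no way to supply the ``x`` and ``y`` arguments that this :meth:`__new__` requires.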

.. index::
   single: __getstate__() (copy protocol)
   single: __setstate__() (copy protocol)
   single: __dict__ (instance attribute)

Classes can further influence how their instances are pickled; if the class
defines the method :meth:`__getstate__`, it is called and the return state is
pickled as the contents for the instance, instead of the contents of the
instance's dictionary. If there is no :meth:`__getstate__` method, the
instance's :attr:`__dict__` is pickled.

Upon unpickling, if the class also defines the method :meth:`__setstate__`, it
is called with the unpickled state. [#]_ If there is no :meth:`__setstate__`
method, the pickled state must be a dictionary and its items are assigned to the
new instance's dictionary. If a class defines both :meth:`__getstate__` and
:meth:`__setstate__`, the state object needn't be a dictionary and these methods
can do what they want. [#]_

.. warning::

   For new-style classes, if :meth:`__getstate__` returns a false value, the
   :meth:`__setstate__` method will not be called.


Pickling and unpickling extension types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When the :class:`Pickler` encounters an object of a type it knows nothing about
--- such as an extension type --- it looks in two places for a hint of how to
pickle it. One alternative is for the object to implement a :meth:`__reduce__`
method. If provided, at pickling time :meth:`__reduce__` will be called with no
arguments, and it must return either a string or a tuple.

If a string is returned, it names a global variable whose contents are pickled
as normal. The string returned by :meth:`__reduce__` should be the object's
local name relative to its module; the pickle module searches the module
namespace to determine the object's module.

When a tuple is returned, it must be between two and five elements long.
Optional elements can either be omitted, or ``None`` can be provided as their
value. The semantics of each element are:

* A callable object that will be called to create the initial version of the
  object. The next element of the tuple will provide arguments for this callable,
  and later elements provide additional state information that will subsequently
  be used to fully reconstruct the pickled data.

  In the unpickling environment this object must be either a class, a callable
  registered as a "safe constructor" (see below), or it must have an attribute
  :attr:`__safe_for_unpickling__` with a true value. Otherwise, an
  :exc:`UnpicklingError` will be raised in the unpickling environment. Note that
  as usual, the callable itself is pickled by name.

* A tuple of arguments for the callable object, not ``None``.

* Optionally, the object's state, which will be passed to the object's
  :meth:`__setstate__` method as described in section :ref:`pickle-inst`. If the
  object has no :meth:`__setstate__` method, then, as above, the value must be a
  dictionary and it will be added to the object's :attr:`__dict__`.

* Optionally, an iterator (and not a sequence) yielding successive list items.
  These list items will be pickled, and appended to the object using either
  ``obj.append(item)`` or ``obj.extend(list_of_items)``. This is primarily used
  for list subclasses, but may be used by other classes as long as they have
  :meth:`append` and :meth:`extend` methods with the appropriate signature.
  (Whether :meth:`append` or :meth:`extend` is used depends on which pickle
  protocol version is used as well as the number of items to append, so both must
  be supported.)

* Optionally, an iterator (not a sequence) yielding successive dictionary items,
  which should be tuples of the form ``(key, value)``. These items will be
  pickled and stored to the object using ``obj[key] = value``. This is primarily
  used for dictionary subclasses, but may be used by other classes as long as they
  implement :meth:`__setitem__`.
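
The first three tuple elements can be seen at work in the sketch below, which assumes Python 3. The ``Connection`` class and its attributes are invented; no real network resource is involved.

```python
import pickle


class Connection:
    """Hypothetical wrapper that is rebuilt from its constructor arguments."""

    def __init__(self, host, port):
        self.host = host
        self.port = port
        self.retries = 0

    def __reduce__(self):
        # (callable, argument tuple, optional state). The callable is
        # pickled by name; the state dictionary is merged into __dict__
        # because this class defines no __setstate__.
        return (Connection, (self.host, self.port), {"retries": self.retries})


conn = Connection("db.example.org", 5432)
conn.retries = 3

restored = pickle.loads(pickle.dumps(conn, 2))
print(restored.host, restored.port, restored.retries)
```

Unlike the default instance pickling, this route calls ``Connection(host, port)`` on unpickling, so :meth:`__init__` does run before the saved state is applied.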

It is sometimes useful to know the protocol version when implementing
:meth:`__reduce__`. This can be done by implementing a method named
:meth:`__reduce_ex__` instead of :meth:`__reduce__`. :meth:`__reduce_ex__`, when
it exists, is called in preference over :meth:`__reduce__` (you may still
provide :meth:`__reduce__` for backwards compatibility). The
:meth:`__reduce_ex__` method will be called with a single integer argument, the
protocol version.

The :class:`object` class implements both :meth:`__reduce__` and
:meth:`__reduce_ex__`; however, if a subclass overrides :meth:`__reduce__` but
not :meth:`__reduce_ex__`, the :meth:`__reduce_ex__` implementation detects this
and calls :meth:`__reduce__`.

An alternative to implementing a :meth:`__reduce__` method on the object to be
pickled is to register a reduction function with the :mod:`copy_reg` module. This
module provides a way for programs to register "reduction functions" and
constructors for user-defined types. Reduction functions have the same
semantics and interface as the :meth:`__reduce__` method described above, except
that they are called with a single argument, the object to be pickled.

The registered constructor is deemed a "safe constructor" for purposes of
unpickling as described above.
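
Registering a reduction function might look like the sketch below. It assumes Python 3, where the module is spelled ``copyreg`` instead of :mod:`copy_reg`; the ``Money`` class, the ``reduce_money`` function and the ``calls`` list are invented for the illustration.

```python
import copyreg  # spelled copy_reg in the Python 2 releases described here
import pickle


class Money:
    """Hypothetical value type with no pickling support of its own."""

    def __init__(self, amount, currency):
        self.amount = amount
        self.currency = currency


calls = []  # records that the registered function was really used


def reduce_money(money):
    # Same return convention as __reduce__, but the object to be
    # pickled arrives as the single argument.
    calls.append(type(money).__name__)
    return Money, (money.amount, money.currency)


copyreg.pickle(Money, reduce_money)

restored = pickle.loads(pickle.dumps(Money(5, "EUR"), 2))
print(restored.amount, restored.currency, calls)
```

This keeps the pickling logic outside the class, which is useful when the type comes from an extension module or from code you cannot modify.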


Pickling and unpickling external objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For the benefit of object persistence, the :mod:`pickle` module supports the
notion of a reference to an object outside the pickled data stream. Such
objects are referenced by a "persistent id", which is just an arbitrary string
of printable ASCII characters. The resolution of such names is not defined by
the :mod:`pickle` module; it will delegate this resolution to user defined
functions on the pickler and unpickler. [#]_

To define external persistent id resolution, you need to set the
:attr:`persistent_id` attribute of the pickler object and the
:attr:`persistent_load` attribute of the unpickler object.

To pickle objects that have an external persistent id, the pickler must have a
custom :func:`persistent_id` method that takes an object as an argument and
returns either ``None`` or the persistent id for that object. When ``None`` is
returned, the pickler simply pickles the object as normal. When a persistent id
string is returned, the pickler will pickle that string, along with a marker so
that the unpickler will recognize the string as a persistent id.

To unpickle external objects, the unpickler must have a custom
:func:`persistent_load` function that takes a persistent id string and returns
the referenced object.

Here's a silly example that *might* shed more light::

   import pickle
   from cStringIO import StringIO

   src = StringIO()
   p = pickle.Pickler(src)

   def persistent_id(obj):
       if hasattr(obj, 'x'):
           return 'the value %d' % obj.x
       else:
           return None

   p.persistent_id = persistent_id

   class Integer:
       def __init__(self, x):
           self.x = x
       def __str__(self):
           return 'My name is integer %d' % self.x

   i = Integer(7)
   print(i)
   p.dump(i)

   datastream = src.getvalue()
   print(repr(datastream))
   dst = StringIO(datastream)

   up = pickle.Unpickler(dst)

   class FancyInteger(Integer):
       def __str__(self):
           return 'I am the integer %d' % self.x

   def persistent_load(persid):
       if persid.startswith('the value '):
           value = int(persid.split()[2])
           return FancyInteger(value)
       else:
           raise pickle.UnpicklingError('Invalid persistent id')

   up.persistent_load = persistent_load

   j = up.load()
   print(j)

In the :mod:`cPickle` module, the unpickler's :attr:`persistent_load` attribute
can also be set to a Python list, in which case, when the unpickler reaches a
persistent id, the persistent id string will simply be appended to this list.
This functionality exists so that a pickle data stream can be "sniffed" for
object references without actually instantiating all the objects in a pickle.
[#]_ Setting :attr:`persistent_load` to a list is usually used in conjunction
with the :meth:`noload` method on the Unpickler.

.. % BAW: Both pickle and cPickle support something called
.. % inst_persistent_id() which appears to give unknown types a second
.. % shot at producing a persistent id. Since Jim Fulton can't remember
.. % why it was added or what it's for, I'm leaving it undocumented.


.. _pickle-sub:

Subclassing Unpicklers
----------------------

By default, unpickling will import any class that it finds in the pickle data.
You can control exactly what gets unpickled and what gets called by customizing
your unpickler. Unfortunately, exactly how you do this is different depending
on whether you're using :mod:`pickle` or :mod:`cPickle`. [#]_

In the :mod:`pickle` module, you need to derive a subclass from
:class:`Unpickler`, overriding the :meth:`load_global` method.
:meth:`load_global` should read two lines from the pickle data stream where the
first line will be the name of the module containing the class and the second
line will be the name of the instance's class. It then looks up the class,
possibly importing the module and digging out the attribute, then it appends
what it finds to the unpickler's stack. Later on, this class will be assigned to
the :attr:`__class__` attribute of an empty class, as a way of magically
creating an instance without calling its class's :meth:`__init__`. Your job
(should you choose to accept it) would be to have :meth:`load_global` push onto
the unpickler's stack a known safe version of any class you deem safe to
unpickle. It is up to you to produce such a class. Or you could raise an error
if you want to disallow all unpickling of instances. If this sounds like a hack,
you're right. Refer to the source code to make this work.

Things are a little cleaner with :mod:`cPickle`, but not by much. To control
what gets unpickled, you can set the unpickler's :attr:`find_global` attribute
to a function or ``None``. If it is ``None`` then any attempts to unpickle
instances will raise an :exc:`UnpicklingError`. If it is a function, then it
should accept a module name and a class name, and return the corresponding class
object. It is responsible for looking up the class and performing any necessary
imports, and it may raise an error to prevent instances of the class from being
unpickled.

The moral of the story is that you should be really careful about the source of
the strings your application unpickles.
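
For readers on Python 3, the same idea is usually expressed differently: the hook there is the :meth:`find_class` method of :class:`Unpickler`, not the :meth:`load_global` or :attr:`find_global` hooks described above. The sketch below assumes Python 3; the whitelist and the ``restricted_loads`` helper are illustrative choices, and restricting globals reduces the risk of untrusted pickles without removing it.

```python
import builtins
import io
import pickle

# Illustrative whitelist; a real application would choose its own.
ALLOWED_BUILTINS = {"range", "complex", "set", "frozenset"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global the stream asks for.
        if module == "builtins" and name in ALLOWED_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))


def restricted_loads(data):
    """Hypothetical counterpart of pickle.loads() with the whitelist applied."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


allowed = restricted_loads(
    pickle.dumps([1, 2, range(3)], pickle.HIGHEST_PROTOCOL))

try:
    restricted_loads(pickle.dumps(eval, pickle.HIGHEST_PROTOCOL))
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(allowed, blocked)
```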


.. _pickle-example:

Example
-------

For the simplest code, use the :func:`dump` and :func:`load` functions. Note
that a self-referencing list is pickled and restored correctly. ::

   import pickle

   data1 = {'a': [1, 2.0, 3, 4+6j],
            'b': ('string', u'Unicode string'),
            'c': None}

   selfref_list = [1, 2, 3]
   selfref_list.append(selfref_list)

   output = open('data.pkl', 'wb')

   # Pickle dictionary using protocol 0.
   pickle.dump(data1, output)

   # Pickle the list using the highest protocol available.
   pickle.dump(selfref_list, output, -1)

   output.close()

The following example reads the resulting pickled data. When reading a
pickle-containing file, you should open the file in binary mode because you
can't be sure if the ASCII or binary format was used. ::

   import pprint, pickle

   pkl_file = open('data.pkl', 'rb')

   data1 = pickle.load(pkl_file)
   pprint.pprint(data1)

   data2 = pickle.load(pkl_file)
   pprint.pprint(data2)

   pkl_file.close()

Here's a larger example that shows how to modify pickling behavior for a class.
The :class:`TextReader` class opens a text file, and returns the line number and
line contents each time its :meth:`readline` method is called. If a
:class:`TextReader` instance is pickled, all attributes *except* the file object
member are saved. When the instance is unpickled, the file is reopened, and
reading resumes from the last location. The :meth:`__setstate__` and
:meth:`__getstate__` methods are used to implement this behavior. ::

   #!/usr/local/bin/python

   class TextReader:
       """Print and number lines in a text file."""
       def __init__(self, file):
           self.file = file
           self.fh = open(file)
           self.lineno = 0

       def readline(self):
           self.lineno = self.lineno + 1
           line = self.fh.readline()
           if not line:
               return None
           if line.endswith("\n"):
               line = line[:-1]
           return "%d: %s" % (self.lineno, line)

       def __getstate__(self):
           odict = self.__dict__.copy() # copy the dict since we change it
           del odict['fh']              # remove filehandle entry
           return odict

       def __setstate__(self, dict):
           fh = open(dict['file'])      # reopen file
           count = dict['lineno']       # read from file...
           while count:                 # until line count is restored
               fh.readline()
               count = count - 1
           self.__dict__.update(dict)   # update attributes
           self.fh = fh                 # save the file object
736
A sample usage might be something like this::

   >>> import TextReader
   >>> obj = TextReader.TextReader("TextReader.py")
   >>> obj.readline()
   '1: #!/usr/local/bin/python'
   >>> obj.readline()
   '2: '
   >>> obj.readline()
   '3: class TextReader:'
   >>> import pickle
   >>> pickle.dump(obj, open('save.p', 'wb'))

If you want to see that :mod:`pickle` works across Python processes, start
another Python session before continuing. What follows can happen from either
the same process or a new process. ::

   >>> import pickle
   >>> reader = pickle.load(open('save.p', 'rb'))
   >>> reader.readline()
   '4:     """Print and number lines in a text file."""'


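For experimenting in a single process, the following is a condensed,
self-contained variant of the same idea. It is a sketch, not the module used in
the session above: the scratch file and its contents, the ``close`` method, the
``path`` attribute and the ``demo`` helper are all made up for illustration. It
is meant to be run as a script, since the class must be importable by name for
unpickling to find it.

```python
import os
import pickle
import tempfile


class TextReader(object):
    """Number the lines of a text file; reopens the file when unpickled."""

    def __init__(self, path):
        self.path = path
        self.fh = open(path)
        self.lineno = 0

    def readline(self):
        line = self.fh.readline()
        if not line:
            return None
        self.lineno += 1
        return "%d: %s" % (self.lineno, line.rstrip("\n"))

    def close(self):
        self.fh.close()

    def __getstate__(self):
        state = self.__dict__.copy()   # copy, so the live object keeps its handle
        del state['fh']                # file objects cannot be pickled
        return state

    def __setstate__(self, state):
        fh = open(state['path'])       # reopen the file by name
        for _ in range(state['lineno']):
            fh.readline()              # skip the lines already consumed
        self.__dict__.update(state)
        self.fh = fh


def demo():
    """Read one line, pickle the reader, and read the next line from the copy."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    with os.fdopen(fd, 'w') as scratch:
        scratch.write("alpha\nbeta\ngamma\n")   # hypothetical file contents
    reader = TextReader(path)
    try:
        first = reader.readline()
        clone = pickle.loads(pickle.dumps(reader))
        try:
            second = clone.readline()
        finally:
            clone.close()
    finally:
        reader.close()
        os.remove(path)
    return first, second
```

Calling ``demo()`` returns the first line as read by the original object and
the second line as read by the unpickled copy, so the copy resumes where the
original stopped.
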
.. seealso::

   Module :mod:`copy_reg`
      Pickle interface constructor registration for extension types.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.


:mod:`cPickle` --- A faster :mod:`pickle`
=========================================

.. module:: cPickle
   :synopsis: Faster version of pickle, but not subclassable.
.. moduleauthor:: Jim Fulton <jim@zope.com>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


.. index:: module: pickle

The :mod:`cPickle` module supports serialization and de-serialization of Python
objects, providing an interface and functionality nearly identical to the
:mod:`pickle` module. There are several differences, the most important being
performance and subclassability.

First, :mod:`cPickle` can be up to 1000 times faster than :mod:`pickle` because
the former is implemented in C. Second, in the :mod:`cPickle` module the
callables :func:`Pickler` and :func:`Unpickler` are functions, not classes.
This means that you cannot use them to derive custom pickling and unpickling
subclasses. Most applications have no need for this functionality and should
benefit from the greatly improved performance of the :mod:`cPickle` module.

The pickle data format used by :mod:`pickle` and :mod:`cPickle` is the same,
so it is possible to use :mod:`pickle` and :mod:`cPickle` interchangeably with
existing pickles. [#]_

There are additional minor differences in API between :mod:`cPickle` and
:mod:`pickle`; for most applications, however, the two are interchangeable. More
documentation is provided in the :mod:`pickle` module documentation, which
includes a list of the documented differences.

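Because the two modules are interchangeable for most applications, a common
pattern is to prefer the C implementation and fall back to :mod:`pickle` where
it is missing. The following is a minimal sketch of that pattern; the
``record`` data is made up for illustration.

```python
try:
    import cPickle as pickle   # C implementation, where it exists
except ImportError:
    import pickle              # pure-Python module with the same interface

record = {'name': 'example', 'values': [1, 2, 3]}   # hypothetical data

# The rest of the program uses the name "pickle" either way.
payload = pickle.dumps(record, -1)
restored = pickle.loads(payload)
```

Code that subclasses :class:`Pickler` or :class:`Unpickler` should import
:mod:`pickle` directly instead, since the :mod:`cPickle` callables cannot be
subclassed.
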
.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module.

.. [#] In the :mod:`pickle` module these callables are classes, which you could
   subclass to customize the behavior. However, in the :mod:`cPickle` module these
   callables are factory functions and so cannot be subclassed. One common reason
   to subclass is to control what objects can actually be unpickled. See section
   :ref:`pickle-sub` for more details.

.. [#] *Warning*: this is intended for pickling multiple objects without intervening
   modifications to the objects or their parts. If you modify an object and then
   pickle it again using the same :class:`Pickler` instance, the object is not
   pickled again; a reference to it is pickled instead, and the :class:`Unpickler`
   will return the old value, not the modified one. There are two problems here: (1)
   detecting changes, and (2) marshalling a minimal set of changes. Garbage
   collection may also become a problem here.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError` but it could be something else.

.. [#] These methods can also be used to implement copying class instances.

.. [#] This protocol is also used by the shallow and deep copying operations defined in
   the :mod:`copy` module.

.. [#] The actual mechanism for associating these user-defined functions is slightly
   different for :mod:`pickle` and :mod:`cPickle`. The description given here
   works the same for both implementations. Users of the :mod:`pickle` module
   could also use subclassing to effect the same results, overriding the
   :meth:`persistent_id` and :meth:`persistent_load` methods in the derived
   classes.

.. [#] We'll leave you with the image of Guido and Jim sitting around sniffing pickles
   in their living rooms.

.. [#] A word of caution: the mechanisms described here use internal attributes and
   methods, which are subject to change in future versions of Python. We intend to
   someday provide a common interface for controlling this behavior, which will
   work in either :mod:`pickle` or :mod:`cPickle`.

.. [#] Since the pickle data format is actually a tiny stack-oriented programming
   language, and some freedom is taken in the encodings of certain objects, it is
   possible that the two modules produce different data streams for the same input
   objects. However it is guaranteed that they will always be able to read each
   other's data streams.
