:mod:`pickle` --- Python object serialization
=============================================

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.


.. % Substantial improvements by Jim Kerr <jbkerr@sr.hp.com>.
.. % Rewritten by Barry Warsaw <barry@zope.com>

The :mod:`pickle` module implements a fundamental, but powerful algorithm for
serializing and de-serializing a Python object structure. "Pickling" is the
process whereby a Python object hierarchy is converted into a byte stream, and
"unpickling" is the inverse operation, whereby a byte stream is converted back
into an object hierarchy. Pickling (and unpickling) is alternatively known as
"serialization", "marshalling," [#]_ or "flattening"; however, to avoid
confusion, the terms used here are "pickling" and "unpickling".

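As a minimal sketch of that round trip (the sample dictionary is made up for
illustration, and the code is written so that it should also run on current
Python releases, where the pickled result is a bytes object)::

   import pickle

   # Illustrative data only; any picklable object hierarchy would do.
   original = {'name': 'spam', 'sizes': [1, 2, 3], 'ratio': 0.5}

   # Pickling: convert the object hierarchy into a byte stream.
   stream = pickle.dumps(original)

   # Unpickling: rebuild an equal, but distinct, object hierarchy.
   restored = pickle.loads(stream)
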
This documentation describes both the :mod:`pickle` module and the
:mod:`cPickle` module.


Relationship to other Python modules
------------------------------------

The :mod:`pickle` module has an optimized cousin called the :mod:`cPickle`
module. As its name implies, :mod:`cPickle` is written in C, so it can be up to
1000 times faster than :mod:`pickle`. However, it does not support subclassing
of the :func:`Pickler` and :func:`Unpickler` classes, because in :mod:`cPickle`
these are functions, not classes. Most applications have no need for this
functionality, and can benefit from the improved performance of :mod:`cPickle`.
Other than that, the interfaces of the two modules are nearly identical; the
common interface is described in this manual and differences are pointed out
where necessary. In the following discussions, we use the term "pickle" to
collectively describe the :mod:`pickle` and :mod:`cPickle` modules.

The data streams the two modules produce are guaranteed to be interchangeable.

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should always be the preferred way to serialize Python
objects. :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing. Recursive
  objects are objects that contain references to themselves. These are not
  handled by marshal, and in fact, attempting to marshal recursive objects will
  crash your Python interpreter. Object sharing happens when there are multiple
  references to the same object in different places in the object hierarchy being
  serialized. :mod:`pickle` stores such objects only once, and ensures that all
  other references point to the master copy. Shared objects remain shared, which
  can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances. :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module as
  when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions. Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases.

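The first point can be checked with a short sketch. The variable names and
sample data are illustrative assumptions, not part of the module::

   import pickle

   # One mutable list referenced from two places in the hierarchy.
   shared = ['mutable', 'list']
   hierarchy = {'first': shared, 'second': shared}

   # A recursive object: a list that contains a reference to itself.
   recursive = [1, 2, 3]
   recursive.append(recursive)

   restored = pickle.loads(pickle.dumps(hierarchy))
   restored_recursive = pickle.loads(pickle.dumps(recursive))

   # The two references still point at a single list after unpickling,
   # so a change made through one is visible through the other.
   restored['first'].append('changed')

After this runs, ``restored['first']`` and ``restored['second']`` should be the
same list object, and the last item of ``restored_recursive`` should be
``restored_recursive`` itself.
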
.. warning::

   The :mod:`pickle` module is not intended to be secure against erroneous or
   maliciously constructed data. Never unpickle data received from an untrusted or
   unauthenticated source.

Note that serialization is a more primitive notion than persistence; although
:mod:`pickle` reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent
access to persistent objects. The :mod:`pickle` module can transform a complex
object into a byte stream and it can transform the byte stream into an object
with the same internal structure. Perhaps the most obvious thing to do with
these byte streams is to write them onto a file, but it is also conceivable to
send them across a network or store them in a database. The module
:mod:`shelve` provides a simple interface to pickle and unpickle objects on
DBM-style database files.


Data stream format
------------------

.. index::
   single: XDR
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific. This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however it means that non-Python
programs may not be able to reconstruct pickled Python objects.

By default, the :mod:`pickle` data format uses a printable ASCII representation.
This is slightly more voluminous than a binary representation. The big
advantage of using printable ASCII (and of some other characteristics of
:mod:`pickle`'s representation) is that for debugging or recovery purposes it is
possible for a human to read the pickled file with a standard text editor.

There are currently 3 different protocols which can be used for pickling.

* Protocol version 0 is the original ASCII protocol and is backwards compatible
  with earlier versions of Python.

* Protocol version 1 is the old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3. It provides much more
  efficient pickling of new-style classes.

Refer to :pep:`307` for more information.

If a *protocol* is not specified, protocol 0 is used. If *protocol* is specified
as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol version
available will be used.

.. versionchanged:: 2.3
   Introduced the *protocol* parameter.

A binary format, which is slightly more efficient, can be chosen by specifying a
*protocol* version >= 1.


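The following sketch pickles one object with several protocols. The data is
illustrative, and the *protocol* argument is always passed explicitly because
the default protocol may differ on interpreters other than the one described
here::

   import pickle

   data = {'numbers': [1, 2, 3], 'label': 'example'}

   # Protocol 0: the original ASCII protocol.
   ascii_pickle = pickle.dumps(data, 0)

   # Protocol 2: the binary protocol introduced in Python 2.3.
   binary_pickle = pickle.dumps(data, 2)

   # A negative value selects the highest protocol available.
   newest_pickle = pickle.dumps(data, -1)

   # loads() takes no protocol argument; it detects the format itself.
   results = [pickle.loads(p)
              for p in (ascii_pickle, binary_pickle, newest_pickle)]

For simple data such as this, the protocol 0 result should contain only ASCII
characters, while the other two are binary.
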
Usage
-----

To serialize an object hierarchy, you first create a pickler, then you call the
pickler's :meth:`dump` method. To de-serialize a data stream, you first create
an unpickler, then you call the unpickler's :meth:`load` method. The
:mod:`pickle` module provides the following constant:


.. data:: HIGHEST_PROTOCOL

   The highest protocol version available. This value can be passed as a
   *protocol* value.

   .. versionadded:: 2.3

.. note::

   Be sure to always open pickle files created with protocols >= 1 in binary mode.
   For the old ASCII-based pickle protocol 0 you can use either text mode or binary
   mode as long as you stay consistent.

   A pickle file written with protocol 0 in binary mode will contain lone linefeeds
   as line terminators and therefore will look "funny" when viewed in Notepad or
   other editors which do not support this format.

The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:


.. function:: dump(obj, file[, protocol])

   Write a pickled representation of *obj* to the open file object *file*. This is
   equivalent to ``Pickler(file, protocol).dump(obj)``.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
   version will be used.

   .. versionchanged:: 2.3
      Introduced the *protocol* parameter.

   *file* must have a :meth:`write` method that accepts a single string argument.
   It can thus be a file object opened for writing, a :mod:`StringIO` object, or
   any other custom object that meets this interface.


.. function:: load(file)

   Read a string from the open file object *file* and interpret it as a pickle data
   stream, reconstructing and returning the original object hierarchy. This is
   equivalent to ``Unpickler(file).load()``.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments. Both
   methods should return a string. Thus *file* can be a file object opened for
   reading, a :mod:`StringIO` object, or any other custom object that meets this
   interface.

   This function automatically determines whether the data stream was written in
   binary mode or not.


.. function:: dumps(obj[, protocol])

   Return the pickled representation of the object as a string, instead of writing
   it to a file.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
   version will be used.

   .. versionchanged:: 2.3
      The *protocol* parameter was added.


.. function:: loads(string)

   Read a pickled object hierarchy from a string. Characters in the string past
   the pickled object's representation are ignored.

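A short sketch of :func:`dump` and :func:`load` working on an in-memory
file-like object. It assumes ``io.BytesIO`` as the buffer in place of the
:mod:`StringIO` object mentioned above, so that it should also run on current
Python releases; the record is illustrative::

   import io
   import pickle

   record = {'id': 7, 'tags': ['a', 'b']}

   # Any object with a suitable write() method can receive the pickle.
   buffer = io.BytesIO()
   pickle.dump(record, buffer)

   # load() only needs read() and readline(); rewind before reading back.
   buffer.seek(0)
   loaded = pickle.load(buffer)
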
The :mod:`pickle` module also defines three exceptions:


.. exception:: PickleError

   A common base class for the other exceptions defined below. This inherits from
   :exc:`Exception`.


.. exception:: PicklingError

   This exception is raised when an unpicklable object is passed to the
   :meth:`dump` method.


.. exception:: UnpicklingError

   This exception is raised when there is a problem unpickling an object. Note that
   other exceptions may also be raised during unpickling, including (but not
   necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
   :exc:`ImportError`, and :exc:`IndexError`.

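Because several exception types can surface while unpickling, callers often
catch them together. The helper ``try_loads`` below is a hypothetical name used
only for this sketch, and which of the listed exceptions is actually raised for
damaged data depends on the implementation::

   import pickle

   def try_loads(data):
       """Return the unpickled object, or None if *data* cannot be unpickled."""
       try:
           return pickle.loads(data)
       except (pickle.UnpicklingError, AttributeError, EOFError,
               ImportError, IndexError):
           # These are the exceptions named above; others are possible.
           return None

   good = pickle.dumps([1, 2, 3], 2)
   truncated = good[:-1]          # drop the final STOP opcode

   ok = try_loads(good)
   failed = try_loads(truncated)
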
The :mod:`pickle` module also exports two callables [#]_, :class:`Pickler` and
:class:`Unpickler`:


.. class:: Pickler(file[, protocol])

   This takes a file-like object to which it will write a pickle data stream.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
   protocol version will be used.

   .. versionchanged:: 2.3
      Introduced the *protocol* parameter.

   *file* must have a :meth:`write` method that accepts a single string argument.
   It can thus be an open file object, a :mod:`StringIO` object, or any other
   custom object that meets this interface.

:class:`Pickler` objects define one (or two) public methods:


.. method:: Pickler.dump(obj)

   Write a pickled representation of *obj* to the open file object given in the
   constructor. Either the binary or ASCII format will be used, depending on the
   value of the *protocol* argument passed to the constructor.


.. method:: Pickler.clear_memo()

   Clears the pickler's "memo". The memo is the data structure that remembers
   which objects the pickler has already seen, so that shared or recursive objects
   are pickled by reference and not by value. This method is useful when re-using
   picklers.

   .. note::

      Prior to Python 2.3, :meth:`clear_memo` was only available on the picklers
      created by :mod:`cPickle`. In the :mod:`pickle` module, picklers have an
      instance variable called :attr:`memo` which is a Python dictionary. So to clear
      the memo for a :mod:`pickle` module pickler, you could do the following::

         mypickler.memo.clear()

      Code that does not need to support older versions of Python should simply use
      :meth:`clear_memo`.

It is possible to make multiple calls to the :meth:`dump` method of the same
:class:`Pickler` instance. These must then be matched to the same number of
calls to the :meth:`load` method of the corresponding :class:`Unpickler`
instance. If the same object is pickled by multiple :meth:`dump` calls, the
:meth:`load` calls will all yield references to the same object. [#]_

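A sketch of matched :meth:`dump` and :meth:`load` calls on one pickler and one
unpickler. ``io.BytesIO`` is an assumed stand-in for the file object, and the
variable names are illustrative::

   import io
   import pickle

   buffer = io.BytesIO()
   pickler = pickle.Pickler(buffer)

   shared = ['pickled', 'twice']
   pickler.dump(shared)   # first call writes the full list
   pickler.dump(shared)   # second call writes only a memo reference

   buffer.seek(0)
   unpickler = pickle.Unpickler(buffer)
   first = unpickler.load()
   second = unpickler.load()

Here ``first`` and ``second`` should be the same object. Calling
:meth:`clear_memo` between the two :meth:`dump` calls would instead make the
second pickle self-contained.
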
:class:`Unpickler` objects are defined as:


.. class:: Unpickler(file)

   This takes a file-like object from which it will read a pickle data stream.
   This class automatically determines whether the data stream was written in
   binary mode or not, so it does not need a flag as in the :class:`Pickler`
   factory.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments. Both
   methods should return a string. Thus *file* can be a file object opened for
   reading, a :mod:`StringIO` object, or any other custom object that meets this
   interface.

:class:`Unpickler` objects have one (or two) public methods:


.. method:: Unpickler.load()

   Read a pickled object representation from the open file object given in the
   constructor, and return the reconstituted object hierarchy specified therein.

   This method automatically determines whether the data stream was written in
   binary mode or not.


.. method:: Unpickler.noload()

   This is just like :meth:`load` except that it doesn't actually create any
   objects. This is useful primarily for finding what's called "persistent ids"
   that may be referenced in a pickle data stream. See section
   :ref:`pickle-protocol` below for more details.

   **Note:** the :meth:`noload` method is currently only available on
   :class:`Unpickler` objects created with the :mod:`cPickle` module.
   :mod:`pickle` module :class:`Unpickler`\ s do not have the :meth:`noload`
   method.


What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, long integers, floating point numbers, complex numbers

* normal and Unicode strings

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`__dict__` or :meth:`__setstate__` is
  picklable (see section :ref:`pickle-protocol` for details)

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file. Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RuntimeError` will be
raised in this case. You can carefully raise this limit with
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value. This means that only the function name is
pickled, along with the name of the module the function is defined in. Neither
the function's code, nor any of its function attributes are pickled. Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [#]_

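Pickling by name can be observed with objects that are certain to be importable
in the unpickling environment. The choice of ``len`` and
``collections.OrderedDict`` below is an illustrative assumption::

   import collections
   import pickle

   # A built-in function and a class, both reachable by module and name.
   function_pickle = pickle.dumps(len)
   class_pickle = pickle.dumps(collections.OrderedDict)

   # Unpickling imports the defining module and looks the name up again,
   # so the very same objects come back rather than copies.
   restored_function = pickle.loads(function_pickle)
   restored_class = pickle.loads(class_pickle)
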
Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply. Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   class Foo:
       attr = 'a class attr'

   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined in
the top level of a module.

Similarly, when class instances are pickled, their class's code and data are not
pickled along with them. Only the instance data are pickled. This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class. If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.


.. _pickle-protocol:

The pickle protocol
-------------------

This section describes the "pickling protocol" that defines the interface
between the pickler/unpickler and the objects that are being serialized. This
protocol provides a standard way for you to define, customize, and control how
your objects are serialized and de-serialized. The description in this section
doesn't cover specific customizations that you can employ to make the unpickling
environment slightly safer from untrusted pickle data streams; see section
:ref:`pickle-sub` for more details.


.. _pickle-inst:

Pickling and unpickling normal class instances
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: __getinitargs__() (copy protocol)
   single: __init__() (instance constructor)

.. XXX is __getinitargs__ only used with old-style classes?

When a pickled class instance is unpickled, its :meth:`__init__` method is
normally *not* invoked. If it is desirable that the :meth:`__init__` method be
called on unpickling, an old-style class can define a method
:meth:`__getinitargs__`, which should return a *tuple* containing the arguments
to be passed to the class constructor (:meth:`__init__` for example). The
:meth:`__getinitargs__` method is called at pickle time; the tuple it returns is
incorporated in the pickle for the instance.

.. index:: single: __getnewargs__() (copy protocol)

New-style types can provide a :meth:`__getnewargs__` method that is used for
protocol 2. Implementing this method is needed if the type establishes some
internal invariants when the instance is created, or if the memory allocation is
affected by the values passed to the :meth:`__new__` method for the type (as it
is for tuples and strings). Instances of a new-style type :class:`C` are
created using ::

   obj = C.__new__(C, *args)


where *args* is the result of calling :meth:`__getnewargs__` on the original
object; if there is no :meth:`__getnewargs__`, an empty tuple is assumed.

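As a sketch of :meth:`__getnewargs__`, consider a tuple subclass whose value is
fixed in :meth:`__new__`. The class ``Point`` is hypothetical, and the example
assumes it is defined at the top level of an importable module or script::

   import pickle

   class Point(tuple):
       """Hypothetical immutable 2-D point; its value is fixed in __new__."""

       def __new__(cls, x, y):
           return tuple.__new__(cls, (x, y))

       def __getnewargs__(self):
           # Arguments that will be passed to Point.__new__ on unpickling.
           return (self[0], self[1])

   p = Point(3, 4)
   q = pickle.loads(pickle.dumps(p, 2))

Without :meth:`__getnewargs__`, unpickling would call ``Point.__new__(Point)``
with no coordinates, which this class does not accept.
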
.. index::
   single: __getstate__() (copy protocol)
   single: __setstate__() (copy protocol)
   single: __dict__ (instance attribute)

Classes can further influence how their instances are pickled; if the class
defines the method :meth:`__getstate__`, it is called and the return state is
pickled as the contents for the instance, instead of the contents of the
instance's dictionary. If there is no :meth:`__getstate__` method, the
instance's :attr:`__dict__` is pickled.

Upon unpickling, if the class also defines the method :meth:`__setstate__`, it
is called with the unpickled state. [#]_ If there is no :meth:`__setstate__`
method, the pickled state must be a dictionary and its items are assigned to the
new instance's dictionary. If a class defines both :meth:`__getstate__` and
:meth:`__setstate__`, the state object needn't be a dictionary and these methods
can do what they want. [#]_

.. warning::

   For new-style classes, if :meth:`__getstate__` returns a false value, the
   :meth:`__setstate__` method will not be called.


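The two methods can be combined to carry a version number in the pickled state,
so that :meth:`__setstate__` can convert data written by an earlier class
layout. The class ``Account``, its attribute names, and the assumed older layout
are all hypothetical::

   import pickle

   class Account(object):
       """Hypothetical class whose pickled state carries a version number."""

       STATE_VERSION = 2

       def __init__(self, owner, balance):
           self.owner = owner
           self.balance = balance

       def __getstate__(self):
           state = self.__dict__.copy()
           state['version'] = self.STATE_VERSION
           return state

       def __setstate__(self, state):
           state = dict(state)
           version = state.pop('version', 1)
           if version < 2:
               # Assumed history: version 1 stored the balance as 'amount'.
               state['balance'] = state.pop('amount')
           self.__dict__.update(state)

   account = pickle.loads(pickle.dumps(Account('alice', 10)))

   # __setstate__ can also be fed a state written by the older layout.
   old_style = Account.__new__(Account)
   old_style.__setstate__({'owner': 'bob', 'amount': 5})
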
Pickling and unpickling extension types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When the :class:`Pickler` encounters an object of a type it knows nothing about
--- such as an extension type --- it looks in two places for a hint of how to
pickle it. One alternative is for the object to implement a :meth:`__reduce__`
method. If provided, at pickling time :meth:`__reduce__` will be called with no
arguments, and it must return either a string or a tuple.

If a string is returned, it names a global variable whose contents are pickled
as normal. The string returned by :meth:`__reduce__` should be the object's
local name relative to its module; the pickle module searches the module
namespace to determine the object's module.

When a tuple is returned, it must be between two and five elements long.
Optional elements can either be omitted, or ``None`` can be provided as their
value. The semantics of each element are:

* A callable object that will be called to create the initial version of the
  object. The next element of the tuple will provide arguments for this callable,
  and later elements provide additional state information that will subsequently
  be used to fully reconstruct the pickled data.

  In the unpickling environment this object must be either a class, a callable
  registered as a "safe constructor" (see below), or it must have an attribute
  :attr:`__safe_for_unpickling__` with a true value. Otherwise, an
  :exc:`UnpicklingError` will be raised in the unpickling environment. Note that
  as usual, the callable itself is pickled by name.

* A tuple of arguments for the callable object.

  .. versionchanged:: 2.5
     Formerly, this argument could also be ``None``.

* Optionally, the object's state, which will be passed to the object's
  :meth:`__setstate__` method as described in section :ref:`pickle-inst`. If the
  object has no :meth:`__setstate__` method, then, as above, the value must be a
  dictionary and it will be added to the object's :attr:`__dict__`.

* Optionally, an iterator (and not a sequence) yielding successive list items.
  These list items will be pickled, and appended to the object using either
  ``obj.append(item)`` or ``obj.extend(list_of_items)``. This is primarily used
  for list subclasses, but may be used by other classes as long as they have
  :meth:`append` and :meth:`extend` methods with the appropriate signature.
  (Whether :meth:`append` or :meth:`extend` is used depends on which pickle
  protocol version is used as well as the number of items to append, so both must
  be supported.)

* Optionally, an iterator (not a sequence) yielding successive dictionary items,
  which should be tuples of the form ``(key, value)``. These items will be
  pickled and stored to the object using ``obj[key] = value``. This is primarily
  used for dictionary subclasses, but may be used by other classes as long as they
  implement :meth:`__setitem__`.

It is sometimes useful to know the protocol version when implementing
:meth:`__reduce__`. This can be done by implementing a method named
:meth:`__reduce_ex__` instead of :meth:`__reduce__`. :meth:`__reduce_ex__`, when
it exists, is called in preference over :meth:`__reduce__` (you may still
provide :meth:`__reduce__` for backwards compatibility). The
:meth:`__reduce_ex__` method will be called with a single integer argument, the
protocol version.

The :class:`object` class implements both :meth:`__reduce__` and
:meth:`__reduce_ex__`; however, if a subclass overrides :meth:`__reduce__` but
not :meth:`__reduce_ex__`, the :meth:`__reduce_ex__` implementation detects this
and calls :meth:`__reduce__`.

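A minimal sketch of a :meth:`__reduce__` method that returns the first three
tuple elements described above. The class ``Interval`` is hypothetical and is
assumed to be defined at the top level of an importable module or script::

   import pickle

   class Interval(object):
       """Hypothetical class that is rebuilt by calling its constructor."""

       def __init__(self, low, high):
           self.low = low
           self.high = high
           self.label = None

       def __reduce__(self):
           # (callable, argument tuple, state): the class itself is the
           # callable, and because there is no __setstate__ the state
           # dictionary is added to the new instance's __dict__.
           return (Interval, (self.low, self.high), {'label': self.label})

   interval = Interval(1, 5)
   interval.label = 'working hours'
   clone = pickle.loads(pickle.dumps(interval))

Unlike the default mechanism, this causes ``Interval.__init__`` to run again
when the object is unpickled, after which the state dictionary is applied.
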
An alternative to implementing a :meth:`__reduce__` method on the object to be
pickled, is to register the callable with the :mod:`copy_reg` module. This
module provides a way for programs to register "reduction functions" and
constructors for user-defined types. Reduction functions have the same
semantics and interface as the :meth:`__reduce__` method described above, except
that they are called with a single argument, the object to be pickled.

The registered constructor is deemed a "safe constructor" for purposes of
unpickling as described above.


Pickling and unpickling external objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For the benefit of object persistence, the :mod:`pickle` module supports the
notion of a reference to an object outside the pickled data stream. Such
objects are referenced by a "persistent id", which is just an arbitrary string
of printable ASCII characters. The resolution of such names is not defined by
the :mod:`pickle` module; it will delegate this resolution to user defined
functions on the pickler and unpickler. [#]_

To define external persistent id resolution, you need to set the
:attr:`persistent_id` attribute of the pickler object and the
:attr:`persistent_load` attribute of the unpickler object.

To pickle objects that have an external persistent id, the pickler must have a
custom :func:`persistent_id` method that takes an object as an argument and
returns either ``None`` or the persistent id for that object. When ``None`` is
returned, the pickler simply pickles the object as normal. When a persistent id
string is returned, the pickler will pickle that string, along with a marker so
that the unpickler will recognize the string as a persistent id.

To unpickle external objects, the unpickler must have a custom
:func:`persistent_load` function that takes a persistent id string and returns
the referenced object.

Here's a silly example that *might* shed more light::

   import pickle
   from cStringIO import StringIO

   src = StringIO()
   p = pickle.Pickler(src)

   # Objects with an 'x' attribute are stored by persistent id only.
   def persistent_id(obj):
       if hasattr(obj, 'x'):
           return 'the value %d' % obj.x
       else:
           return None

   p.persistent_id = persistent_id

   class Integer:
       def __init__(self, x):
           self.x = x
       def __str__(self):
           return 'My name is integer %d' % self.x

   i = Integer(7)
   print i
   p.dump(i)

   datastream = src.getvalue()
   print repr(datastream)
   dst = StringIO(datastream)

   up = pickle.Unpickler(dst)

   class FancyInteger(Integer):
       def __str__(self):
           return 'I am the integer %d' % self.x

   # Resolve a persistent id back into an object of our choosing.
   def persistent_load(persid):
       if persid.startswith('the value '):
           value = int(persid.split()[2])
           return FancyInteger(value)
       else:
           raise pickle.UnpicklingError('Invalid persistent id')

   up.persistent_load = persistent_load

   j = up.load()
   print j

In the :mod:`cPickle` module, the unpickler's :attr:`persistent_load` attribute
can also be set to a Python list, in which case, when the unpickler reaches a
persistent id, the persistent id string will simply be appended to this list.
This functionality exists so that a pickle data stream can be "sniffed" for
object references without actually instantiating all the objects in a pickle.
[#]_ Setting :attr:`persistent_load` to a list is usually used in conjunction
with the :meth:`noload` method on the Unpickler.

.. % BAW: Both pickle and cPickle support something called
.. % inst_persistent_id() which appears to give unknown types a second
.. % shot at producing a persistent id. Since Jim Fulton can't remember
.. % why it was added or what it's for, I'm leaving it undocumented.


.. _pickle-sub:

Subclassing Unpicklers
----------------------

By default, unpickling will import any class that it finds in the pickle data.
You can control exactly what gets unpickled and what gets called by customizing
your unpickler. Unfortunately, exactly how you do this is different depending
on whether you're using :mod:`pickle` or :mod:`cPickle`. [#]_

In the :mod:`pickle` module, you need to derive a subclass from
:class:`Unpickler`, overriding the :meth:`load_global` method.
:meth:`load_global` should read two lines from the pickle data stream where the
first line will be the name of the module containing the class and the second
line will be the name of the instance's class. It then looks up the class,
possibly importing the module and digging out the attribute, then it appends
what it finds to the unpickler's stack. Later on, this class will be assigned to
the :attr:`__class__` attribute of an empty class, as a way of magically
creating an instance without calling its class's :meth:`__init__`. Your job
(should you choose to accept it), would be to have :meth:`load_global` push onto
the unpickler's stack, a known safe version of any class you deem safe to
unpickle. It is up to you to produce such a class. Or you could raise an error
if you want to disallow all unpickling of instances. If this sounds like a hack,
you're right. Refer to the source code to make this work.

Things are a little cleaner with :mod:`cPickle`, but not by much. To control
what gets unpickled, you can set the unpickler's :attr:`find_global` attribute
to a function or ``None``. If it is ``None`` then any attempts to unpickle
instances will raise an :exc:`UnpicklingError`. If it is a function, then it
should accept a module name and a class name, and return the corresponding class
object. It is responsible for looking up the class and performing any necessary
imports, and it may raise an error to prevent instances of the class from being
unpickled.

The moral of the story is that you should be really careful about the source of
the strings your application unpickles.


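The sketch below shows one way to restrict what an unpickler will import. It is
not the :meth:`load_global` or :attr:`find_global` mechanism described above:
it assumes an interpreter whose :class:`Unpickler` class offers an overridable
``find_class(module, name)`` method, as current Python releases do. The
allow-list and the helper ``restricted_loads`` are hypothetical names::

   import io
   import os
   import pickle

   # Hypothetical allow-list of (module, name) pairs considered safe.
   ALLOWED_GLOBALS = set([('collections', 'OrderedDict')])

   class RestrictedUnpickler(pickle.Unpickler):
       def find_class(self, module, name):
           # Called for every global the pickle refers to by name.
           if (module, name) in ALLOWED_GLOBALS:
               return pickle.Unpickler.find_class(self, module, name)
           raise pickle.UnpicklingError(
               "global '%s.%s' is forbidden" % (module, name))

   def restricted_loads(data):
       """Like pickle.loads(), but refuse globals outside the allow-list."""
       return RestrictedUnpickler(io.BytesIO(data)).load()

   # Plain containers and scalars refer to no globals, so they load.
   plain = restricted_loads(pickle.dumps([1, 'two', 3.0]))

   # A pickle that names os.system is refused before anything is called.
   try:
       restricted_loads(pickle.dumps(os.system))
       rejected = False
   except pickle.UnpicklingError:
       rejected = True

An allow-list of this kind narrows the attack surface but does not make
unpickling untrusted data safe in general.
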
.. _pickle-example:

Example
-------

For the simplest code, use the :func:`dump` and :func:`load` functions. Note
that a self-referencing list is pickled and restored correctly. ::

   import pickle

   data1 = {'a': [1, 2.0, 3, 4+6j],
            'b': ('string', u'Unicode string'),
            'c': None}

   selfref_list = [1, 2, 3]
   selfref_list.append(selfref_list)

   output = open('data.pkl', 'wb')

   # Pickle dictionary using protocol 0.
   pickle.dump(data1, output)

   # Pickle the list using the highest protocol available.
   pickle.dump(selfref_list, output, -1)

   output.close()

The following example reads the resulting pickled data. When reading a
pickle-containing file, you should open the file in binary mode because you
can't be sure if the ASCII or binary format was used. ::

   import pprint, pickle

   pkl_file = open('data.pkl', 'rb')

   data1 = pickle.load(pkl_file)
   pprint.pprint(data1)

   data2 = pickle.load(pkl_file)
   pprint.pprint(data2)

   pkl_file.close()

Here's a larger example that shows how to modify pickling behavior for a class.
The :class:`TextReader` class opens a text file, and returns the line number and
line contents each time its :meth:`readline` method is called. If a
:class:`TextReader` instance is pickled, all attributes *except* the file object
member are saved. When the instance is unpickled, the file is reopened, and
reading resumes from the last location. The :meth:`__setstate__` and
:meth:`__getstate__` methods are used to implement this behavior. ::

   #!/usr/local/bin/python

   class TextReader:
       """Print and number lines in a text file."""
       def __init__(self, file):
           self.file = file
           self.fh = open(file)
           self.lineno = 0

       def readline(self):
           self.lineno = self.lineno + 1
           line = self.fh.readline()
           if not line:
               return None
           if line.endswith("\n"):
               line = line[:-1]
           return "%d: %s" % (self.lineno, line)

       def __getstate__(self):
           odict = self.__dict__.copy() # copy the dict since we change it
           del odict['fh']              # remove filehandle entry
           return odict

       def __setstate__(self, dict):
           fh = open(dict['file'])      # reopen file
           count = dict['lineno']       # read from file...
           while count:                 # until line count is restored
               fh.readline()
               count = count - 1
           self.__dict__.update(dict)   # update attributes
           self.fh = fh                 # save the file object

A sample usage might be something like this::

   >>> import TextReader
   >>> obj = TextReader.TextReader("TextReader.py")
   >>> obj.readline()
   '1: #!/usr/local/bin/python'
   >>> obj.readline()
   '2: '
   >>> obj.readline()
   '3: class TextReader:'
   >>> import pickle
   >>> pickle.dump(obj, open('save.p', 'wb'))

If you want to see that :mod:`pickle` works across Python processes, start
another Python session before continuing. What follows can happen from either
the same process or a new process. ::

   >>> import pickle
   >>> reader = pickle.load(open('save.p', 'rb'))
   >>> reader.readline()
   '4:     """Print and number lines in a text file."""'


.. seealso::

   Module :mod:`copy_reg`
      Pickle interface constructor registration for extension types.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.


:mod:`cPickle` --- A faster :mod:`pickle`
=========================================

.. module:: cPickle
   :synopsis: Faster version of pickle, but not subclassable.
.. moduleauthor:: Jim Fulton <jim@zope.com>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


.. index:: module: pickle

The :mod:`cPickle` module supports serialization and de-serialization of Python
objects, providing an interface and functionality nearly identical to the
:mod:`pickle` module. There are several differences, the most important being
performance and subclassability.

First, :mod:`cPickle` can be up to 1000 times faster than :mod:`pickle` because
the former is implemented in C. Second, in the :mod:`cPickle` module the
callables :func:`Pickler` and :func:`Unpickler` are functions, not classes.
This means that you cannot use them to derive custom pickling and unpickling
subclasses. Most applications have no need for this functionality and should
benefit from the greatly improved performance of the :mod:`cPickle` module.

The pickle data streams produced by :mod:`pickle` and :mod:`cPickle` are
identical, so it is possible to use :mod:`pickle` and :mod:`cPickle`
interchangeably with existing pickles. [#]_

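Because each module can read the other's pickles, code that does not subclass
:class:`Pickler` or :class:`Unpickler` is often written to prefer
:mod:`cPickle` and fall back to :mod:`pickle`. The following is a minimal
sketch of that import idiom; it is a common convention rather than something
this module prescribes, and ``record`` is a hypothetical value used only for
illustration.

```python
try:
    import cPickle as pickle    # C implementation, where it is available
except ImportError:
    import pickle               # fall back to the pickle module

# Hypothetical data, used only to show the round trip.
record = {'name': 'example', 'values': [1, 2, 3]}

# Either module accepts the same calls for this simple use.
payload = pickle.dumps(record, 2)
restored = pickle.loads(payload)
```

The rest of the program refers only to the name ``pickle``, so it does not
need to know which implementation was imported.
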
There are additional minor differences in API between :mod:`cPickle` and
:mod:`pickle`; however, for most applications they are interchangeable. More
documentation is provided in the :mod:`pickle` module documentation, which
includes a list of the documented differences.

.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module.

.. [#] In the :mod:`pickle` module these callables are classes, which you could
   subclass to customize the behavior. However, in the :mod:`cPickle` module these
   callables are factory functions and so cannot be subclassed. One common reason
   to subclass is to control what objects can actually be unpickled. See section
   :ref:`pickle-sub` for more details.

.. [#] *Warning*: this is intended for pickling multiple objects without intervening
   modifications to the objects or their parts. If you modify an object and then
   pickle it again using the same :class:`Pickler` instance, the object is not
   pickled again --- a reference to it is pickled and the :class:`Unpickler` will
   return the old value, not the modified one. There are two problems here: (1)
   detecting changes, and (2) marshalling a minimal set of changes. Garbage
   collection may also become a problem here.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError` but it could be something else.

.. [#] These methods can also be used to implement copying class instances.

.. [#] This protocol is also used by the shallow and deep copying operations defined in
   the :mod:`copy` module.

.. [#] The actual mechanism for associating these user defined functions is slightly
   different for :mod:`pickle` and :mod:`cPickle`. The description given here
   works the same for both implementations. Users of the :mod:`pickle` module
   could also use subclassing to effect the same results, overriding the
   :meth:`persistent_id` and :meth:`persistent_load` methods in the derived
   classes.

.. [#] We'll leave you with the image of Guido and Jim sitting around sniffing pickles
   in their living rooms.

.. [#] A word of caution: the mechanisms described here use internal attributes and
   methods, which are subject to change in future versions of Python. We intend to
   someday provide a common interface for controlling this behavior, which will
   work in either :mod:`pickle` or :mod:`cPickle`.

.. [#] Since the pickle data format is actually a tiny stack-oriented programming
   language, and some freedom is taken in the encodings of certain objects, it is
   possible that the two modules produce different data streams for the same input
   objects. However it is guaranteed that they will always be able to read each
   other's data streams.
