:mod:`pickle` --- Python object serialization
=============================================

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.
.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>.
.. sectionauthor:: Barry Warsaw <barry@zope.com>

The :mod:`pickle` module implements a fundamental but powerful algorithm for
serializing and de-serializing a Python object structure. "Pickling" is the
process whereby a Python object hierarchy is converted into a byte stream, and
"unpickling" is the inverse operation, whereby a byte stream is converted back
into an object hierarchy. Pickling (and unpickling) is alternatively known as
"serialization", "marshalling," [#]_ or "flattening"; however, to avoid
confusion, the terms used here are "pickling" and "unpickling".

This documentation describes both the :mod:`pickle` module and the
:mod:`cPickle` module.


Relationship to other Python modules
------------------------------------

The :mod:`pickle` module has an optimized cousin called the :mod:`cPickle`
module. As its name implies, :mod:`cPickle` is written in C, so it can be up to
1000 times faster than :mod:`pickle`. However, it does not support subclassing
of the :func:`Pickler` and :func:`Unpickler` classes, because in :mod:`cPickle`
these are functions, not classes. Most applications have no need for this
functionality and can benefit from the improved performance of :mod:`cPickle`.
Other than that, the interfaces of the two modules are nearly identical; the
common interface is described in this manual and differences are pointed out
where necessary. In the following discussions, we use the term "pickle" to
collectively describe the :mod:`pickle` and :mod:`cPickle` modules.

The data streams the two modules produce are guaranteed to be interchangeable.

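Because the streams are interchangeable, a common idiom is to import :mod:`cPickle` under the name ``pickle`` and fall back to the pure-Python module when the C version cannot be imported. The sketch below shows that idiom; it is a convention, not an API of either module, and the ``data`` dictionary is only sample input. It also runs unchanged on Python 3, where :mod:`cPickle` no longer exists and the ``except`` branch is taken. ::

   try:
       import cPickle as pickle   # C implementation, when it is available
   except ImportError:
       import pickle              # pure-Python fallback

   data = {'spam': [1, 2, 3], 'eggs': (4.5, 'six')}

   # The code below behaves the same whichever module was imported.
   restored = pickle.loads(pickle.dumps(data))
   assert restored == data
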

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should always be the preferred way to serialize Python
objects. :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing. Recursive
  objects are objects that contain references to themselves. These are not
  handled by marshal, and in fact, attempting to marshal recursive objects will
  crash your Python interpreter. Object sharing happens when there are multiple
  references to the same object in different places in the object hierarchy being
  serialized. :mod:`pickle` stores such objects only once, and ensures that all
  other references point to the master copy. Shared objects remain shared, which
  can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances. :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module as
  when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions. Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases.

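The effect on shared and recursive structures is easy to check. The sketch below (sample data only) round-trips a list that appears twice in its container and a list that contains itself; identity, not just equality, survives unpickling. ::

   import pickle

   shared = [1, 2]
   pair = [shared, shared]             # two references to one list
   restored = pickle.loads(pickle.dumps(pair))
   assert restored[0] is restored[1]   # still one list, referenced twice

   loop = []
   loop.append(loop)                   # a list that contains itself
   looped = pickle.loads(pickle.dumps(loop))
   assert looped[0] is looped          # the cycle is rebuilt, not expanded
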

.. warning::

   The :mod:`pickle` module is not intended to be secure against erroneous or
   maliciously constructed data. Never unpickle data received from an untrusted or
   unauthenticated source.

Note that serialization is a more primitive notion than persistence; although
:mod:`pickle` reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent
access to persistent objects. The :mod:`pickle` module can transform a complex
object into a byte stream and it can transform the byte stream into an object
with the same internal structure. Perhaps the most obvious thing to do with
these byte streams is to write them onto a file, but it is also conceivable to
send them across a network or store them in a database. The module
:mod:`shelve` provides a simple interface to pickle and unpickle objects on
DBM-style database files.


Data stream format
------------------

.. index::
   single: XDR
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific. This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however, it means that non-Python
programs may not be able to reconstruct pickled Python objects.

By default, the :mod:`pickle` data format uses a printable ASCII representation.
This is slightly more voluminous than a binary representation. The big
advantage of using printable ASCII (and of some other characteristics of
:mod:`pickle`'s representation) is that for debugging or recovery purposes it is
possible for a human to read the pickled file with a standard text editor.

There are currently 3 different protocols which can be used for pickling.

* Protocol version 0 is the original ASCII protocol and is backwards compatible
  with earlier versions of Python.

* Protocol version 1 is the old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3. It provides much more
  efficient pickling of :term:`new-style class`\es.

Refer to :pep:`307` for more information.

If a *protocol* is not specified, protocol 0 is used. If *protocol* is specified
as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol version
available will be used.

.. versionchanged:: 2.3
   Introduced the *protocol* parameter.

A binary format, which is slightly more efficient, can be chosen by specifying a
*protocol* version >= 1.

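The sketch below pickles one sample dictionary with protocol 0 and with :const:`HIGHEST_PROTOCOL` and checks that the protocol 0 result is printable. It passes the protocol explicitly so that it behaves the same on Python 3, where :func:`dumps` returns ``bytes`` rather than a string and the default protocol is no longer 0. ::

   import pickle

   obj = {'answer': 42, 'items': [1, 2, 3]}

   text = pickle.dumps(obj, 0)                          # ASCII protocol
   binary = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)  # newest binary protocol

   # Every byte of the protocol 0 pickle is printable ASCII or a newline,
   # which is what makes it readable in a text editor.
   assert all(32 <= b < 127 or b == 10 for b in bytearray(text))

   # Both load back to an equal object; the protocol is detected on load.
   assert pickle.loads(text) == obj
   assert pickle.loads(binary) == obj
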

Usage
-----

To serialize an object hierarchy, you first create a pickler, then you call the
pickler's :meth:`dump` method. To de-serialize a data stream, you first create
an unpickler, then you call the unpickler's :meth:`load` method. The
:mod:`pickle` module provides the following constant:


.. data:: HIGHEST_PROTOCOL

   The highest protocol version available. This value can be passed as a
   *protocol* value.

   .. versionadded:: 2.3

.. note::

   Be sure to always open pickle files created with protocols >= 1 in binary mode.
   For the old ASCII-based pickle protocol 0 you can use either text mode or binary
   mode as long as you stay consistent.

   A pickle file written with protocol 0 in binary mode will contain lone linefeeds
   as line terminators and therefore will look "funny" when viewed in Notepad or
   other editors which do not support this format.

The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:


.. function:: dump(obj, file[, protocol])

   Write a pickled representation of *obj* to the open file object *file*. This is
   equivalent to ``Pickler(file, protocol).dump(obj)``.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
   version will be used.

   .. versionchanged:: 2.3
      Introduced the *protocol* parameter.

   *file* must have a :meth:`write` method that accepts a single string argument.
   It can thus be a file object opened for writing, a :mod:`StringIO` object, or
   any other custom object that meets this interface.


.. function:: load(file)

   Read a string from the open file object *file* and interpret it as a pickle data
   stream, reconstructing and returning the original object hierarchy. This is
   equivalent to ``Unpickler(file).load()``.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments. Both
   methods should return a string. Thus *file* can be a file object opened for
   reading, a :mod:`StringIO` object, or any other custom object that meets this
   interface.

   This function automatically determines whether the data stream was written in
   binary mode or not.

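Because *file* only needs the methods listed above, an in-memory buffer works as well as a real file. The sketch below uses :class:`io.BytesIO` (available from Python 2.6, and in Python 3) instead of :mod:`StringIO` so that the same code runs on both; it also shows that several pickles can be written to one stream and are returned by successive :func:`load` calls in the order they were written. ::

   import io
   import pickle

   buf = io.BytesIO()               # any object with a write() method will do
   pickle.dump([1, 2, 3], buf)      # default protocol
   pickle.dump({'key': 'value'}, buf, pickle.HIGHEST_PROTOCOL)

   buf.seek(0)                      # rewind before reading
   first = pickle.load(buf)         # the protocol of each pickle is detected
   second = pickle.load(buf)
   assert first == [1, 2, 3]
   assert second == {'key': 'value'}
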

.. function:: dumps(obj[, protocol])

   Return the pickled representation of the object as a string, instead of writing
   it to a file.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
   version will be used.

   .. versionchanged:: 2.3
      The *protocol* parameter was added.


.. function:: loads(string)

   Read a pickled object hierarchy from a string. Characters in the string past
   the pickled object's representation are ignored.

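As a quick illustration of the last point, the sketch below appends unrelated bytes to a pickle and shows that :func:`loads` stops at the end of the pickled data. The sample tuple and the trailing bytes are arbitrary; ``b''`` literals are used so that the code also runs on Python 3, where pickles are ``bytes``. ::

   import pickle

   s = pickle.dumps((1, 'two', 3.0), 0)
   assert pickle.loads(s) == (1, 'two', 3.0)

   # Everything after the end of the pickled data is ignored.
   assert pickle.loads(s + b'trailing garbage') == (1, 'two', 3.0)
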

The :mod:`pickle` module also defines three exceptions:


.. exception:: PickleError

   A common base class for the other exceptions defined below. This inherits from
   :exc:`Exception`.


.. exception:: PicklingError

   This exception is raised when an unpicklable object is passed to the
   :meth:`dump` method.


.. exception:: UnpicklingError

   This exception is raised when there is a problem unpickling an object. Note that
   other exceptions may also be raised during unpickling, including (but not
   necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
   :exc:`ImportError`, and :exc:`IndexError`.

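In practice this means a robust caller catches :exc:`PicklingError` around pickling and is prepared for more than :exc:`UnpicklingError` when loading. In the sketch below a ``lambda`` serves as the unpicklable object (it has no name the unpickler could import), and the helper ``try_dumps`` is hypothetical. ::

   import pickle

   def try_dumps(obj):
       """Hypothetical helper: return the pickle of obj, or None on failure."""
       try:
           return pickle.dumps(obj)
       except pickle.PicklingError:
           return None

   assert try_dumps([1, 2, 3]) is not None
   assert try_dumps(lambda x: x) is None   # no importable name to store

   # PickleError is the common base class of the two specific exceptions.
   assert issubclass(pickle.PicklingError, pickle.PickleError)
   assert issubclass(pickle.UnpicklingError, pickle.PickleError)

   # Unpickling can also fail with other exceptions, e.g. EOFError.
   eof_raised = False
   try:
       pickle.loads(b'')            # empty input
   except EOFError:
       eof_raised = True
   assert eof_raised
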

The :mod:`pickle` module also exports two callables [#]_, :class:`Pickler` and
:class:`Unpickler`:


.. class:: Pickler(file[, protocol])

   This takes a file-like object to which it will write a pickle data stream.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
   protocol version will be used.

   .. versionchanged:: 2.3
      Introduced the *protocol* parameter.

   *file* must have a :meth:`write` method that accepts a single string argument.
   It can thus be an open file object, a :mod:`StringIO` object, or any other
   custom object that meets this interface.

:class:`Pickler` objects define one (or two) public methods:


.. method:: Pickler.dump(obj)

   Write a pickled representation of *obj* to the open file object given in the
   constructor. Either the binary or ASCII format will be used, depending on the
   value of the *protocol* argument passed to the constructor.


.. method:: Pickler.clear_memo()

   Clears the pickler's "memo". The memo is the data structure that remembers
   which objects the pickler has already seen, so that shared or recursive objects
   are pickled by reference and not by value. This method is useful when re-using
   picklers.

   .. note::

      Prior to Python 2.3, :meth:`clear_memo` was only available on the picklers
      created by :mod:`cPickle`. In the :mod:`pickle` module, picklers have an
      instance variable called :attr:`memo` which is a Python dictionary. So to clear
      the memo for a :mod:`pickle` module pickler, you could do the following::

         mypickler.memo.clear()

      Code that does not need to support older versions of Python should simply use
      :meth:`clear_memo`.

It is possible to make multiple calls to the :meth:`dump` method of the same
:class:`Pickler` instance. These must then be matched to the same number of
calls to the :meth:`load` method of the corresponding :class:`Unpickler`
instance. If the same object is pickled by multiple :meth:`dump` calls, the
:meth:`load` calls will all yield references to the same object. [#]_

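The sketch below writes the same list twice through one :class:`Pickler` and reads it back through one :class:`Unpickler`; it then calls :meth:`clear_memo` and writes the list a third time, which yields an independent copy. An :class:`io.BytesIO` buffer stands in for a real file, and the sketch also runs on Python 3. ::

   import io
   import pickle

   shared = ['shared']
   buf = io.BytesIO()
   p = pickle.Pickler(buf)
   p.dump(shared)
   p.dump(shared)        # found in the memo: written as a back-reference
   p.clear_memo()
   p.dump(shared)        # memo cleared: written out in full again

   buf.seek(0)
   u = pickle.Unpickler(buf)
   a, b, c = u.load(), u.load(), u.load()
   assert a is b         # same pickler and unpickler: one object
   assert c == a and c is not a
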

:class:`Unpickler` objects are defined as:


.. class:: Unpickler(file)

   This takes a file-like object from which it will read a pickle data stream.
   This class automatically determines whether the data stream was written in
   binary mode or not, so it does not need a flag as in the :class:`Pickler`
   factory.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments. Both
   methods should return a string. Thus *file* can be a file object opened for
   reading, a :mod:`StringIO` object, or any other custom object that meets this
   interface.

:class:`Unpickler` objects have one (or two) public methods:


.. method:: Unpickler.load()

   Read a pickled object representation from the open file object given in the
   constructor, and return the reconstituted object hierarchy specified therein.

   This method automatically determines whether the data stream was written in
   binary mode or not.


.. method:: Unpickler.noload()

   This is just like :meth:`load` except that it doesn't actually create any
   objects. This is useful primarily for finding what's called "persistent ids"
   that may be referenced in a pickle data stream. See section
   :ref:`pickle-protocol` below for more details.

   **Note:** the :meth:`noload` method is currently only available on
   :class:`Unpickler` objects created with the :mod:`cPickle` module.
   :mod:`pickle` module :class:`Unpickler`\ s do not have the :meth:`noload`
   method.


What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, long integers, floating point numbers, complex numbers

* normal and Unicode strings

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`__dict__` or the result of calling
  :meth:`__getstate__` is picklable (see section :ref:`pickle-protocol` for
  details)

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file. Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RuntimeError` will be
raised in this case. You can carefully raise this limit with
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value. This means that only the function name is
pickled, along with the name of the module the function is defined in. Neither
the function's code nor any of its function attributes are pickled. Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object; otherwise an exception will be raised. [#]_

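The sketch below checks this by pickling a small top-level function with protocol 0: the pickle contains the function's name, but not its body (here, the string it returns), and unpickling simply looks the name up again. It assumes the code is run as a script or module, so that ``greet`` can be found again in its defining module. ::

   import pickle

   def greet():
       return 'hello from the function body'

   data = pickle.dumps(greet, 0)
   assert b'greet' in data                              # the name is stored...
   assert b'hello from the function body' not in data   # ...the code is not
   assert pickle.loads(data) is greet                   # looked up by name
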

Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply. Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   import pickle

   class Foo:
       attr = 'a class attr'

   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined at
the top level of a module.

Similarly, when class instances are pickled, their class's code and data are not
pickled along with them. Only the instance data are pickled. This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class. If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.

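A minimal sketch of that versioning idea follows. The ``Point`` class and its history are hypothetical: version 1 stored its coordinate as ``pos`` and recorded no version number, and version 2 renamed it to ``x``. Its :meth:`__setstate__` upgrades old state before installing it, so pickles written by either version load into the current class. ::

   import pickle

   class Point(object):
       VERSION = 2    # hypothetical: version 1 called the attribute 'pos'

       def __init__(self, x):
           self.x = x
           self.version = Point.VERSION

       def __setstate__(self, state):
           # Pickles from version 1 carry no 'version' key; upgrade them.
           if state.get('version', 1) < 2:
               state['x'] = state.pop('pos')
               state['version'] = 2
           self.__dict__.update(state)

   # Simulate an instance that was pickled by version 1 of the class.
   old = Point.__new__(Point)
   old.__dict__['pos'] = 3
   p = pickle.loads(pickle.dumps(old))
   assert (p.x, p.version) == (3, 2)

   # Current instances round-trip unchanged.
   q = pickle.loads(pickle.dumps(Point(5)))
   assert (q.x, q.version) == (5, 2)
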

.. _pickle-protocol:

The pickle protocol
-------------------

This section describes the "pickling protocol" that defines the interface
between the pickler/unpickler and the objects that are being serialized. This
protocol provides a standard way for you to define, customize, and control how
your objects are serialized and de-serialized. The description in this section
doesn't cover specific customizations that you can employ to make the unpickling
environment slightly safer from untrusted pickle data streams; see section
:ref:`pickle-sub` for more details.


.. _pickle-inst:

Pickling and unpickling normal class instances
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: __getinitargs__() (copy protocol)
   single: __init__() (instance constructor)

When a pickled class instance is unpickled, its :meth:`__init__` method is
normally *not* invoked. If it is desirable that the :meth:`__init__` method be
called on unpickling, an old-style class can define a method
:meth:`__getinitargs__`, which should return a *tuple* containing the arguments
to be passed to the class constructor (:meth:`__init__` for example). The
:meth:`__getinitargs__` method is called at pickle time; the tuple it returns is
incorporated in the pickle for the instance.

.. index:: single: __getnewargs__() (copy protocol)

New-style types can provide a :meth:`__getnewargs__` method that is used for
protocol 2. Implementing this method is needed if the type establishes some
internal invariants when the instance is created, or if the memory allocation is
affected by the values passed to the :meth:`__new__` method for the type (as it
is for tuples and strings). Instances of a :term:`new-style class` :class:`C`
are created using ::

   obj = C.__new__(C, *args)

where *args* is the result of calling :meth:`__getnewargs__` on the original
object; if there is no :meth:`__getnewargs__`, an empty tuple is assumed.

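As a sketch of when :meth:`__getnewargs__` is needed, the hypothetical ``Token`` class below can only be created by passing a name to :meth:`__new__`. Without :meth:`__getnewargs__`, unpickling a protocol 2 pickle would call ``Token.__new__(Token)`` and fail; with it, the saved name is passed back in. ::

   import pickle

   class Token(object):
       """Hypothetical new-style class whose __new__ requires an argument."""

       def __new__(cls, name):
           self = object.__new__(cls)
           self.name = name
           return self

       def __getnewargs__(self):
           # Unpickling calls Token.__new__(Token, *self.__getnewargs__()).
           return (self.name,)

   t = pickle.loads(pickle.dumps(Token('abc'), 2))
   assert type(t) is Token and t.name == 'abc'
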

.. index::
   single: __getstate__() (copy protocol)
   single: __setstate__() (copy protocol)
   single: __dict__ (instance attribute)

Classes can further influence how their instances are pickled; if the class
defines the method :meth:`__getstate__`, it is called and the returned state is
pickled as the contents for the instance, instead of the contents of the
instance's dictionary. If there is no :meth:`__getstate__` method, the
instance's :attr:`__dict__` is pickled.

Upon unpickling, if the class also defines the method :meth:`__setstate__`, it
is called with the unpickled state. [#]_ If there is no :meth:`__setstate__`
method, the pickled state must be a dictionary and its items are assigned to the
new instance's dictionary. If a class defines both :meth:`__getstate__` and
:meth:`__setstate__`, the state object needn't be a dictionary and these methods
can do what they want. [#]_

.. warning::

   For :term:`new-style class`\es, if :meth:`__getstate__` returns a false
   value, the :meth:`__setstate__` method will not be called.


Pickling and unpickling extension types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: __reduce__() (pickle protocol)
   single: __reduce_ex__() (pickle protocol)
   single: __safe_for_unpickling__ (pickle protocol)

When the :class:`Pickler` encounters an object of a type it knows nothing about,
such as an extension type, it looks in two places for a hint of how to
pickle it. One alternative is for the object to implement a :meth:`__reduce__`
method. If provided, at pickling time :meth:`__reduce__` will be called with no
arguments, and it must return either a string or a tuple.

If a string is returned, it names a global variable whose contents are pickled
as normal. The string returned by :meth:`__reduce__` should be the object's
local name relative to its module; the pickle module searches the module
namespace to determine the object's module.

When a tuple is returned, it must be between two and five elements long.
Optional elements can either be omitted, or ``None`` can be provided as their
value. The semantics of each element are:

* A callable object that will be called to create the initial version of the
  object. The next element of the tuple will provide arguments for this callable,
  and later elements provide additional state information that will subsequently
  be used to fully reconstruct the pickled data.

  In the unpickling environment this object must be either a class, a callable
  registered as a "safe constructor" (see below), or it must have an attribute
  :attr:`__safe_for_unpickling__` with a true value. Otherwise, an
  :exc:`UnpicklingError` will be raised in the unpickling environment. Note that
  as usual, the callable itself is pickled by name.

* A tuple of arguments for the callable object.

  .. versionchanged:: 2.5
     Formerly, this argument could also be ``None``.

* Optionally, the object's state, which will be passed to the object's
  :meth:`__setstate__` method as described in section :ref:`pickle-inst`. If the
  object has no :meth:`__setstate__` method, then, as above, the value must be a
  dictionary and it will be added to the object's :attr:`__dict__`.

* Optionally, an iterator (and not a sequence) yielding successive list items.
  These list items will be pickled, and appended to the object using either
  ``obj.append(item)`` or ``obj.extend(list_of_items)``. This is primarily used
  for list subclasses, but may be used by other classes as long as they have
  :meth:`append` and :meth:`extend` methods with the appropriate signature.
  (Whether :meth:`append` or :meth:`extend` is used depends on which pickle
  protocol version is used as well as the number of items to append, so both must
  be supported.)

* Optionally, an iterator (not a sequence) yielding successive dictionary items,
  which should be tuples of the form ``(key, value)``. These items will be
  pickled and stored to the object using ``obj[key] = value``. This is primarily
  used for dictionary subclasses, but may be used by other classes as long as they
  implement :meth:`__setitem__`.

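As a sketch of the tuple form, the hypothetical ``Interval`` class below returns a callable (the class itself, which the unpickler always accepts), its argument tuple, and a state dictionary. Because ``Interval`` overrides only :meth:`__reduce__`, the :meth:`__reduce_ex__` it inherits from :class:`object` detects the override and delegates to it. ::

   import pickle

   class Interval(object):
       """Hypothetical class rebuilt from its constructor arguments."""

       def __init__(self, lo, hi):
           self.lo, self.hi = lo, hi
           self.label = None

       def __reduce__(self):
           # (callable, args, state): unpickling calls Interval(lo, hi),
           # then adds the state dictionary to the new instance's __dict__.
           return (Interval, (self.lo, self.hi), {'label': self.label})

   iv = Interval(1, 5)
   iv.label = 'work hours'
   restored = pickle.loads(pickle.dumps(iv))
   assert (restored.lo, restored.hi, restored.label) == (1, 5, 'work hours')
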

It is sometimes useful to know the protocol version when implementing
:meth:`__reduce__`. This can be done by implementing a method named
:meth:`__reduce_ex__` instead of :meth:`__reduce__`. :meth:`__reduce_ex__`, when
it exists, is called in preference to :meth:`__reduce__` (you may still
provide :meth:`__reduce__` for backwards compatibility). The
:meth:`__reduce_ex__` method will be called with a single integer argument, the
protocol version.

The :class:`object` class implements both :meth:`__reduce__` and
:meth:`__reduce_ex__`; however, if a subclass overrides :meth:`__reduce__` but
not :meth:`__reduce_ex__`, the :meth:`__reduce_ex__` implementation detects this
and calls :meth:`__reduce__`.

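A sketch that simply records the protocol argument is shown below; the ``Probe`` class and the ``seen`` list are hypothetical, and a real class would use the number to choose a representation rather than store it. ::

   import pickle

   seen = []    # protocol numbers passed to __reduce_ex__

   class Probe(object):
       def __reduce_ex__(self, protocol):
           seen.append(protocol)
           # Same reduction for every protocol here; a real class could
           # return a more compact form when protocol >= 2.
           return (Probe, ())

   pickle.dumps(Probe(), 0)
   pickle.dumps(Probe(), 2)
   assert seen == [0, 2]
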

An alternative to implementing a :meth:`__reduce__` method on the object to be
pickled is to register the callable with the :mod:`copy_reg` module. This
module provides a way for programs to register "reduction functions" and
constructors for user-defined types. Reduction functions have the same
semantics and interface as the :meth:`__reduce__` method described above, except
that they are called with a single argument, the object to be pickled.

The registered constructor is deemed a "safe constructor" for purposes of
unpickling as described above.

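A minimal sketch of registering a reduction function follows, for a hypothetical ``Money`` class that is treated as if its source could not be changed. :mod:`copy_reg` was renamed :mod:`copyreg` in Python 3, so the sketch tries both names; ``pickle(type, function)`` is the registration call in either. ::

   try:
       import copy_reg as copyreg   # Python 2 name
   except ImportError:
       import copyreg               # Python 3 name
   import pickle

   class Money(object):
       """Hypothetical class whose source we do not want to change."""

       def __init__(self, amount, currency):
           self.amount = amount
           self.currency = currency

   def reduce_money(m):
       # Same contract as __reduce__, but receives the object as its argument.
       return (Money, (m.amount, m.currency))

   copyreg.pickle(Money, reduce_money)

   m = pickle.loads(pickle.dumps(Money(10, 'EUR')))
   assert (m.amount, m.currency) == (10, 'EUR')
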

Pickling and unpickling external objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: persistent_id (pickle protocol)
   single: persistent_load (pickle protocol)

For the benefit of object persistence, the :mod:`pickle` module supports the
notion of a reference to an object outside the pickled data stream. Such
objects are referenced by a "persistent id", which is just an arbitrary string
of printable ASCII characters. The resolution of such names is not defined by
the :mod:`pickle` module; it will delegate this resolution to user-defined
functions on the pickler and unpickler. [#]_

To define external persistent id resolution, you need to set the
:attr:`persistent_id` attribute of the pickler object and the
:attr:`persistent_load` attribute of the unpickler object.

To pickle objects that have an external persistent id, the pickler must have a
custom :func:`persistent_id` method that takes an object as an argument and
returns either ``None`` or the persistent id for that object. When ``None`` is
returned, the pickler simply pickles the object as normal. When a persistent id
string is returned, the pickler will pickle that string, along with a marker so
that the unpickler will recognize the string as a persistent id.

To unpickle external objects, the unpickler must have a custom
:func:`persistent_load` function that takes a persistent id string and returns
the referenced object.

Here's a silly example that *might* shed more light::

   import pickle
   from cStringIO import StringIO

   src = StringIO()
   p = pickle.Pickler(src)

   def persistent_id(obj):
       if hasattr(obj, 'x'):
           return 'the value %d' % obj.x
       else:
           return None

   p.persistent_id = persistent_id

   class Integer:
       def __init__(self, x):
           self.x = x
       def __str__(self):
           return 'My name is integer %d' % self.x

   i = Integer(7)
   print i
   p.dump(i)

   datastream = src.getvalue()
   print repr(datastream)
   dst = StringIO(datastream)

   up = pickle.Unpickler(dst)

   class FancyInteger(Integer):
       def __str__(self):
           return 'I am the integer %d' % self.x

   def persistent_load(persid):
       if persid.startswith('the value '):
           value = int(persid.split()[2])
           return FancyInteger(value)
       else:
           raise pickle.UnpicklingError('Invalid persistent id')

   up.persistent_load = persistent_load

   j = up.load()
   print j

In the :mod:`cPickle` module, the unpickler's :attr:`persistent_load` attribute
can also be set to a Python list, in which case, when the unpickler reaches a
persistent id, the persistent id string will simply be appended to this list.
This functionality exists so that a pickle data stream can be "sniffed" for
object references without actually instantiating all the objects in a pickle.
[#]_ Setting :attr:`persistent_load` to a list is usually used in conjunction
with the :meth:`noload` method on the Unpickler.

.. BAW: Both pickle and cPickle support something called inst_persistent_id()
   which appears to give unknown types a second shot at producing a persistent
   id. Since Jim Fulton can't remember why it was added or what it's for, I'm
   leaving it undocumented.


.. _pickle-sub:

Subclassing Unpicklers
----------------------

.. index::
   single: load_global() (pickle protocol)
   single: find_global() (pickle protocol)

By default, unpickling will import any class that it finds in the pickle data.
You can control exactly what gets unpickled and what gets called by customizing
your unpickler. Unfortunately, exactly how you do this is different depending
on whether you're using :mod:`pickle` or :mod:`cPickle`. [#]_

In the :mod:`pickle` module, you need to derive a subclass from
:class:`Unpickler`, overriding the :meth:`load_global` method.
:meth:`load_global` should read two lines from the pickle data stream where the
first line will be the name of the module containing the class and the second
line will be the name of the instance's class. It then looks up the class,
possibly importing the module and digging out the attribute, then it appends
what it finds to the unpickler's stack. Later on, this class will be assigned
to the :attr:`__class__` attribute of an empty class, as a way of magically
creating an instance without calling its class's :meth:`__init__`. Your job
(should you choose to accept it) would be to have :meth:`load_global` push onto
the unpickler's stack a known safe version of any class you deem safe to
unpickle. It is up to you to produce such a class. Or you could raise an error
if you want to disallow all unpickling of instances. If this sounds like a hack,
you're right. Refer to the source code to make this work.

Things are a little cleaner with :mod:`cPickle`, but not by much. To control
what gets unpickled, you can set the unpickler's :attr:`find_global` attribute
to a function or ``None``. If it is ``None`` then any attempts to unpickle
instances will raise an :exc:`UnpicklingError`. If it is a function, then it
should accept a module name and a class name, and return the corresponding class
object. It is responsible for looking up the class and performing any necessary
imports, and it may raise an error to prevent instances of the class from being
unpickled.

The moral of the story is that you should be really careful about the source of
the strings your application unpickles.

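In the pure-Python :mod:`pickle` module, :meth:`load_global` performs its lookup by calling the unpickler's :meth:`find_class` method with the module and class names, so a narrower alternative to replacing :meth:`load_global` is to override :meth:`find_class` (the hook that the Python 3 documentation describes for this purpose). The sketch below assumes your version's :file:`pickle.py` has that method; the allow-list ``SAFE_GLOBALS`` and the helper ``restricted_loads`` are hypothetical, and none of this applies to :mod:`cPickle`, which uses :attr:`find_global` instead. ::

   import io
   import pickle
   from collections import OrderedDict

   # Hypothetical policy: the only global this application will load.
   SAFE_GLOBALS = set([('collections', 'OrderedDict')])

   class RestrictedUnpickler(pickle.Unpickler):
       def find_class(self, module, name):
           # Called for every class or function named in the pickle.
           if (module, name) in SAFE_GLOBALS:
               return pickle.Unpickler.find_class(self, module, name)
           raise pickle.UnpicklingError('global %s.%s is forbidden'
                                        % (module, name))

   def restricted_loads(data):
       """Hypothetical helper: like pickle.loads, but with the policy above."""
       return RestrictedUnpickler(io.BytesIO(data)).load()

   # Plain containers need no globals, so they load normally.
   assert restricted_loads(pickle.dumps([1, 'a', {'b': 2}])) == [1, 'a', {'b': 2}]
   od = OrderedDict([('a', 1)])
   assert restricted_loads(pickle.dumps(od)) == od

   # A complex number is rebuilt by calling a global, which is refused here.
   try:
       restricted_loads(pickle.dumps(complex(1, 2)))
   except pickle.UnpicklingError:
       pass
   else:
       raise AssertionError('complex should have been rejected')

An allow-list like this only controls which globals are looked up; it is not a substitute for the advice above about the source of the data.
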
678
679.. _pickle-example:
680
681Example
682-------
683
684For the simplest code, use the :func:`dump` and :func:`load` functions. Note
685that a self-referencing list is pickled and restored correctly. ::
686
687 import pickle
688
689 data1 = {'a': [1, 2.0, 3, 4+6j],
690 'b': ('string', u'Unicode string'),
691 'c': None}
692
693 selfref_list = [1, 2, 3]
694 selfref_list.append(selfref_list)
695
696 output = open('data.pkl', 'wb')
697
698 # Pickle dictionary using protocol 0.
699 pickle.dump(data1, output)
700
701 # Pickle the list using the highest protocol available.
702 pickle.dump(selfref_list, output, -1)
703
704 output.close()
705
The following example reads the resulting pickled data. When reading a
pickle-containing file, you should open the file in binary mode because you
can't be sure whether the ASCII or binary format was used. ::

   import pprint, pickle

   pkl_file = open('data.pkl', 'rb')

   data1 = pickle.load(pkl_file)
   pprint.pprint(data1)

   data2 = pickle.load(pkl_file)
   pprint.pprint(data2)

   pkl_file.close()

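The reading example knows that the file holds exactly two pickles and only prints what it finds. As a self-contained sketch, the code below writes the same two objects to an in-memory :class:`io.BytesIO` buffer (used here only for convenience, in place of :file:`data.pkl`), reads pickles back until :func:`load` raises :exc:`EOFError`, and then checks that the restored self-referencing list really contains itself rather than a copy:

```python
import io
import pickle

# The same objects as in the writing example above.
data1 = {'a': [1, 2.0, 3, 4+6j],
         'b': ('string', u'Unicode string'),
         'c': None}
selfref_list = [1, 2, 3]
selfref_list.append(selfref_list)

# Write both pickles to an in-memory buffer instead of data.pkl.
buf = io.BytesIO()
pickle.dump(data1, buf)
pickle.dump(selfref_list, buf, pickle.HIGHEST_PROTOCOL)

# Read pickles back one at a time; load() raises EOFError
# once the buffer holds no more data.
buf.seek(0)
restored = []
while True:
    try:
        restored.append(pickle.load(buf))
    except EOFError:
        break

restored_dict, restored_list = restored
# Element 3 of the restored list is the list itself, not a copy.
print(restored_list[3] is restored_list)   # prints True
```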
Here's a larger example that shows how to modify pickling behavior for a class.
The :class:`TextReader` class opens a text file and returns the line number and
line contents each time its :meth:`readline` method is called. If a
:class:`TextReader` instance is pickled, all attributes *except* the file object
member are saved. When the instance is unpickled, the file is reopened, and
reading resumes from the last location. The :meth:`__setstate__` and
:meth:`__getstate__` methods are used to implement this behavior. ::

   #!/usr/local/bin/python

   class TextReader:
       """Print and number lines in a text file."""
       def __init__(self, file):
           self.file = file
           self.fh = open(file)
           self.lineno = 0

       def readline(self):
           self.lineno = self.lineno + 1
           line = self.fh.readline()
           if not line:
               return None
           if line.endswith("\n"):
               line = line[:-1]
           return "%d: %s" % (self.lineno, line)

       def __getstate__(self):
           odict = self.__dict__.copy()  # copy the dict since we change it
           del odict['fh']               # remove filehandle entry
           return odict

       def __setstate__(self, state):
           fh = open(state['file'])      # reopen file
           count = state['lineno']       # read from file...
           while count:                  # until line count is restored
               fh.readline()
               count = count - 1
           self.__dict__.update(state)   # update attributes
           self.fh = fh                  # save the file object

Assuming the class above is saved as :file:`TextReader.py`, a sample usage might
look like this::

   >>> import TextReader
   >>> obj = TextReader.TextReader("TextReader.py")
   >>> obj.readline()
   '1: #!/usr/local/bin/python'
   >>> obj.readline()
   '2: '
   >>> obj.readline()
   '3: class TextReader:'
   >>> import pickle
   >>> pickle.dump(obj, open('save.p', 'wb'))

If you want to see that :mod:`pickle` works across Python processes, start
another Python session before continuing. What follows can happen from either
the same process or a new process. ::

   >>> import pickle
   >>> reader = pickle.load(open('save.p', 'rb'))
   >>> reader.readline()
   '4:     """Print and number lines in a text file."""'


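The file handle in :class:`TextReader` is one example of a member that cannot be pickled; the same :meth:`__getstate__`/:meth:`__setstate__` pattern applies to others. As a minimal sketch that needs no file on disk, the hypothetical ``Counter`` class below drops a :class:`threading.Lock` when it is pickled and creates a fresh one when it is unpickled. The class is defined at module level so that :mod:`pickle` can find it again by name when loading.

```python
import pickle
import threading


class Counter(object):
    """Hypothetical counter that guards its value with a lock."""

    def __init__(self):
        self.count = 0
        self.lock = threading.Lock()   # lock objects cannot be pickled

    def increment(self):
        with self.lock:
            self.count += 1
            return self.count

    def __getstate__(self):
        state = self.__dict__.copy()   # copy so the live object keeps its lock
        del state['lock']              # drop the unpicklable member
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)    # restore the picklable attributes
        self.lock = threading.Lock()   # recreate what __getstate__ dropped


counter = Counter()
counter.increment()
restored = pickle.loads(pickle.dumps(counter))
```

As with :class:`TextReader`, :meth:`__getstate__` works on a copy of the instance dictionary, so pickling never disturbs the original object.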
.. seealso::

   Module :mod:`copy_reg`
      Pickle interface constructor registration for extension types.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.


:mod:`cPickle` --- A faster :mod:`pickle`
=========================================

.. module:: cPickle
   :synopsis: Faster version of pickle, but not subclassable.
.. moduleauthor:: Jim Fulton <jim@zope.com>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


.. index:: module: pickle

The :mod:`cPickle` module supports serialization and de-serialization of Python
objects, providing an interface and functionality nearly identical to the
:mod:`pickle` module. There are several differences, the most important being
performance and subclassability.

First, :mod:`cPickle` can be up to 1000 times faster than :mod:`pickle` because
the former is implemented in C. Second, in the :mod:`cPickle` module the
callables :func:`Pickler` and :func:`Unpickler` are functions, not classes.
This means that you cannot use them to derive custom pickling and unpickling
subclasses. Most applications have no need for this functionality and should
benefit from the greatly improved performance of the :mod:`cPickle` module.

The pickle data streams produced by :mod:`pickle` and :mod:`cPickle` are
identical, so it is possible to use :mod:`pickle` and :mod:`cPickle`
interchangeably with existing pickles. [#]_

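Because the two modules share this interface and can read each other's pickles, a common idiom is to import :mod:`cPickle` under the name ``pickle`` when it is available and fall back to the :mod:`pickle` module otherwise. A minimal sketch:

```python
try:
    import cPickle as pickle   # C implementation, where it exists
except ImportError:
    import pickle              # fall back to the pickle module

record = {'name': 'example', 'values': [1, 2, 3]}

# dumps(), loads() and HIGHEST_PROTOCOL are available under either name.
data = pickle.dumps(record, pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(data)
print(restored == record)   # prints True
```

Code written against the name ``pickle`` then works with either module, although it cannot subclass :func:`Pickler` or :func:`Unpickler` when :mod:`cPickle` is the one imported.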
There are additional minor differences in API between :mod:`cPickle` and
:mod:`pickle`; however, for most applications they are interchangeable. More
documentation is provided in the :mod:`pickle` module documentation, which
includes a list of the documented differences.

.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module.

.. [#] In the :mod:`pickle` module these callables are classes, which you could
   subclass to customize the behavior. However, in the :mod:`cPickle` module these
   callables are factory functions and so cannot be subclassed. One common reason
   to subclass is to control what objects can actually be unpickled. See section
   :ref:`pickle-sub` for more details.

.. [#] *Warning*: this is intended for pickling multiple objects without intervening
   modifications to the objects or their parts. If you modify an object and then
   pickle it again using the same :class:`Pickler` instance, the object is not
   pickled again; a reference to it is pickled, and the :class:`Unpickler` will
   return the old value, not the modified one. There are two problems here: (1)
   detecting changes, and (2) marshalling a minimal set of changes. Garbage
   collection may also become a problem here.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError`, but it could be something else.

.. [#] These methods can also be used to implement copying class instances.

.. [#] This protocol is also used by the shallow and deep copying operations defined in
   the :mod:`copy` module.

.. [#] The actual mechanism for associating these user-defined functions is slightly
   different for :mod:`pickle` and :mod:`cPickle`. The description given here
   works the same for both implementations. Users of the :mod:`pickle` module
   could also use subclassing to achieve the same results, overriding the
   :meth:`persistent_id` and :meth:`persistent_load` methods in the derived
   classes.

.. [#] We'll leave you with the image of Guido and Jim sitting around sniffing pickles
   in their living rooms.

.. [#] A word of caution: the mechanisms described here use internal attributes and
   methods, which are subject to change in future versions of Python. We intend to
   someday provide a common interface for controlling this behavior, which will
   work in either :mod:`pickle` or :mod:`cPickle`.

.. [#] Since the pickle data format is actually a tiny stack-oriented programming
   language, and some freedom is taken in the encodings of certain objects, it is
   possible that the two modules produce different data streams for the same input
   objects. However, it is guaranteed that they will always be able to read each
   other's data streams.
878