| \section{Built-in module \sectcode{pickle}} | 
 | \stmodindex{pickle} | 
 | \index{persistency} | 
 | \indexii{persistent}{objects} | 
 | \indexii{serializing}{objects} | 
 | \indexii{marshalling}{objects} | 
 | \indexii{flattening}{objects} | 
 | \indexii{pickling}{objects} | 
 |  | 
 | The \code{pickle} module implements a basic but powerful algorithm for | 
 | ``pickling'' (a.k.a.\ serializing, marshalling or flattening) nearly | 
 | arbitrary Python objects.  This is a more primitive notion than | 
 | persistency --- although \code{pickle} reads and writes file objects, | 
 | it does not handle the issue of naming persistent objects, nor the | 
 | (even more complicated) area of concurrent access to persistent | 
 | objects.  The \code{pickle} module can transform a complex object into | 
 | a byte stream and it can transform the byte stream into an object with | 
 | the same internal structure.  The most obvious thing to do with these | 
 | byte streams is to write them onto a file, but it is also conceivable | 
 | to send them across a network or store them in a database.  The module | 
 | \code{shelve} provides a simple interface to pickle and unpickle | 
 | objects on ``dbm''-style database files. | 
 | \stmodindex{shelve} | 
 |  | 
 | Unlike the built-in module \code{marshal}, \code{pickle} handles the | 
 | following correctly: | 
 | \stmodindex{marshal} | 
 |  | 
 | \begin{itemize} | 
 |  | 
 | \item recursive objects | 
 |  | 
 | \item pointer sharing | 
 |  | 
 | \item instances of user-defined classes | 
 |  | 
 | \end{itemize} | 
 |  | 
 | The data format used by \code{pickle} is Python-specific.  This has | 
 | the advantage that there are no restrictions imposed by external | 
 | standards such as CORBA (which probably can't represent pointer | 
 | sharing or recursive objects); however it means that non-Python | 
 | programs may not be able to reconstruct pickled Python objects. | 
 |  | 
 | The \code{pickle} data format uses a printable ASCII representation. | 
 | This is slightly more voluminous than a binary representation. | 
 | However, small integers actually take {\em less} space when | 
 | represented as minimal-size decimal strings than when represented as | 
 | 32-bit binary numbers, and strings are only much longer if they | 
 | contain many control characters or 8-bit characters.  The big | 
 | advantage of using printable ASCII (and of some other characteristics | 
 | of \code{pickle}'s representation) is that for debugging or recovery | 
 | purposes it is possible for a human to read the pickled file with a | 
 | standard text editor.  (I could have gone a step further and used a | 
 | notation like S-expressions, but the parser would have been | 
 | considerably more complicated and slower, and the files would probably | 
 | have become much larger.) | 
 |  | 
 | The \code{pickle} module doesn't handle code objects, which the | 
 | \code{marshal} module does.  I suppose \code{pickle} could, and maybe | 
 | it should, but there's probably no great need for it right now (as | 
 | long as \code{marshal} continues to be used for reading and writing | 
 | code objects), and at least this avoids the possibility of smuggling | 
 | Trojan horses into a program. | 
 | \stmodindex{marshal} | 
 |  | 
 | For the benefit of persistency modules written using \code{pickle}, it | 
 | supports the notion of a reference to an object outside the pickled | 
 | data stream.  Such objects are referenced by a name, which is an | 
 | arbitrary string of printable ASCII characters.  The resolution of | 
 | such names is not defined by the \code{pickle} module --- the | 
 | persistent object module will have to implement a method | 
 | \code{persistent_load}.  To write references to persistent objects, | 
 | the persistent module must define a method \code{persistent_id} which | 
 | returns either \code{None} or the persistent ID of the object. | 
 |  | 
 | There are some restrictions on the pickling of class instances. | 
 |  | 
 | First of all, the class must be defined at the top level in a module. | 
 |  | 
 | Next, it must normally be possible to create class instances by | 
 | calling the class without arguments.  If this is undesirable, the | 
 | class can define a method \code{__getinitargs__()}, which should | 
 | return a {\em tuple} containing the arguments to be passed to the | 
 | class constructor (\code{__init__()}). | 
 | \ttindex{__getinitargs__} | 
 | \ttindex{__init__} | 
 |  | 
 | Classes can further influence how they are pickled --- if the class | 
 | defines the method \code{__getstate__()}, it is called and the return | 
 | state is pickled as the contents for the instance, and if the class | 
 | defines the method \code{__setstate__()}, it is called with the | 
 | unpickled state.  (Note that these methods can also be used to | 
 | implement copying class instances.)  If there is no | 
 | \code{__getstate__()} method, the instance's \code{__dict__} is | 
 | pickled.  If there is no \code{__setstate__()} method, the pickled | 
 | object must be a dictionary and its items are assigned to the new | 
 | instance's dictionary.  (If a class defines both \code{__getstate__()} | 
 | and \code{__setstate__()}, the state object needn't be a dictionary | 
 | --- these methods can do what they want.)  This protocol is also used | 
 | by the shallow and deep copying operations defined in the \code{copy} | 
 | module. | 
 | \ttindex{__getstate__} | 
 | \ttindex{__setstate__} | 
 | \ttindex{__dict__} | 
 |  | 
 | Note that when class instances are pickled, their class's code and | 
 | data are not pickled along with them.  Only the instance data are | 
 | pickled.  This is done on purpose, so you can fix bugs in a class or | 
 | add methods and still load objects that were created with an earlier | 
 | version of the class.  If you plan to have long-lived objects that | 
 | will see many versions of a class, it may be worthwhile to put a version | 
 | number in the objects so that suitable conversions can be made by the | 
 | class's \code{__setstate__()} method. | 
 |  | 
 | The interface can be summarized as follows. | 
 |  | 
 | To pickle an object \code{x} onto a file \code{f}, open for writing: | 
 |  | 
 | \begin{verbatim} | 
 | p = pickle.Pickler(f) | 
 | p.dump(x) | 
 | \end{verbatim} | 
 |  | 
 | To unpickle an object \code{x} from a file \code{f}, open for reading: | 
 |  | 
 | \begin{verbatim} | 
 | u = pickle.Unpickler(f) | 
 | x = u.load(x) | 
 | \end{verbatim} | 
 |  | 
 | The \code{Pickler} class only calls the method \code{f.write} with a | 
 | string argument.  The \code{Unpickler} calls the methods \code{f.read} | 
 | (with an integer argument) and \code{f.readline} (without argument), | 
 | both returning a string.  It is explicitly allowed to pass non-file | 
 | objects here, as long as they have the right methods. | 
 |  | 
 | The following types can be pickled: | 
 | \begin{itemize} | 
 |  | 
 | \item \code{None} | 
 |  | 
 | \item integers, long integers, floating point numbers | 
 |  | 
 | \item strings | 
 |  | 
 | \item tuples, lists and dictionaries containing only picklable objects | 
 |  | 
 | \item class instances whose \code{__dict__} or \code{__setstate__()} | 
 | is picklable | 
 |  | 
 | \end{itemize} | 
 |  | 
 | Attempts to pickle unpicklable objects will raise an exception; when | 
 | this happens, an unspecified number of bytes may have been written to | 
 | the file argument. | 
 |  | 
 | It is possible to make multiple calls to \code{Pickler.dump()} or to | 
 | \code{Unpickler.load()}, as long as there is a one-to-one | 
 | correspondence between \code{Pickler} and \code{Unpickler} objects and | 
 | between \code{dump} and \code{load} calls for any pair of | 
 | corresponding \code{Pickler} and \code{Unpicklers}.  {\em Warning}: | 
 | this is intended for pickling multiple objects without intervening | 
 | modifications to the objects or their parts.  If you modify an object | 
 | and then pickle it again using the same \code{Pickler} instance, the | 
 | object is not pickled again --- a reference to it is pickled and the | 
 | \code{Unpickler} will return the old value, not the modified one.  (There | 
 | are two problems here: (a) detecting changes, and (b) marshalling a | 
 | minimal set of changes.  I have no answers.  Garbage Collection may | 
 | also become a problem here.) |