# Guido van Rossum, a48061a, 1995-01-10 00:31:14 +0000
"""\
Pickling Algorithm
------------------

This module implements a basic but powerful algorithm for "pickling" (a.k.a.
serializing, marshalling or flattening) nearly arbitrary Python objects.
This is a more primitive notion than persistency -- although pickle
reads and writes file objects, it does not handle the issue of naming
persistent objects, nor the (even more complicated) area of concurrent
access to persistent objects. The pickle module can transform a complex
object into a byte stream and it can transform the byte stream into
an object with the same internal structure. The most obvious thing to
do with these byte streams is to write them onto a file, but it is also
conceivable to send them across a network or store them in a database.

Unlike the built-in marshal module, pickle handles the following correctly:

- recursive objects
- pointer sharing
- class instances
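
As a minimal sketch of the first two points, assuming the pickle module
shipped with a current Python 3 (whose dumps/loads helpers are not part of
the version documented here), a round trip keeps both a shared reference
and a self-reference intact:

```python
import pickle

shared = ["abc"]
loop = []
loop.append(loop)               # recursive object: the list contains itself
data = [shared, shared, loop]   # pointer sharing: the same list twice

copy = pickle.loads(pickle.dumps(data))

sharing_kept = copy[0] is copy[1]       # still one object, not two copies
cycle_kept = copy[2][0] is copy[2]      # the cycle is rebuilt as well
```

Both sharing_kept and cycle_kept are true after the round trip.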

Pickle is Python-specific. This has the advantage that there are no
restrictions imposed by external standards such as CORBA (which probably
can't represent pointer sharing or recursive objects); however it means
that non-Python programs may not be able to reconstruct pickled Python
objects.

Pickle uses a printable ASCII representation. This is slightly more
voluminous than a binary representation. However, small integers actually
take *less* space when represented as minimal-size decimal strings than
when represented as 32-bit binary numbers, and strings are only much longer
if they contain control characters or 8-bit characters. The big advantage
of using printable ASCII (and of some other characteristics of pickle's
representation) is that for debugging or recovery purposes it is possible
for a human to read the pickled file with a standard text editor. (I could
have gone a step further and used a notation like S-expressions, but the
parser would have been considerably more complicated and slower, and the
files would probably have become much larger.)
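
To look at the printable representation, the sketch below assumes a
current Python 3, where the text format survives as protocol 0 (the
protocol argument does not exist in the version described here):

```python
import pickle

# Protocol 0 is the text-based format; later protocols are binary.
data = pickle.dumps([1, 2, "abc"], protocol=0)
text = data.decode("ascii")     # succeeds: every byte is 7-bit ASCII

starts_with_mark = text.startswith("(")     # "(" is the MARK opcode
ends_with_stop = text.endswith(".")         # "." is the STOP opcode
```

Decoding as ASCII succeeds, and the stream opens with MARK and closes
with STOP, matching the opcode table further down in this file.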

Pickle doesn't handle code objects, which marshal does.
I suppose pickle could, and maybe it should, but there's probably no
great need for it right now (as long as marshal continues to be used
for reading and writing code objects), and at least this avoids
the possibility of smuggling Trojan horses into a program.

For the benefit of persistency modules written using pickle, it supports
the notion of a reference to an object outside the pickled data stream.
Such objects are referenced by a name, which is an arbitrary string of
printable ASCII characters. The resolution of such names is not defined
by the pickle module -- the persistent object module will have to implement
a method "persistent_load". To write references to persistent objects,
the persistent module must define a method "persistent_id" which returns
either None or the persistent ID of the object.
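
A rough sketch of this hook pair, assuming a current Python 3 where
persistent_id and persistent_load are overridden in subclasses of Pickler
and Unpickler; the store dictionary and the class names are hypothetical:

```python
import io
import pickle

# Hypothetical external store: objects that live outside the pickle stream.
store = {"config": {"debug": True}}

class StorePickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return a name for externally stored objects, None for the rest.
        for name, value in store.items():
            if obj is value:
                return name
        return None

class StoreUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Resolve the name that persistent_id wrote into the stream.
        return store[pid]

buf = io.BytesIO()
StorePickler(buf).dump(["payload", store["config"]])
buf.seek(0)
result = StoreUnpickler(buf).load()
```

Only the name "config" travels in the stream; result[1] is the very
object held in the store, not a copy of it.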

There are some restrictions on the pickling of class instances.

First of all, the class must be defined at the top level in a module.

Next, it must normally be possible to create class instances by calling
the class without arguments. If this is undesirable, the class can
define a method __getinitargs__ (XXX not a pretty name!), which should
return a *tuple* containing the arguments to be passed to the class
constructor.

Classes can influence how they are pickled -- if the class defines
the method __getstate__, it is called and the returned state is pickled
as the contents for the instance, and if the class defines the
method __setstate__, it is called with the unpickled state. (Note
that these methods can also be used to implement copying class instances.)
If there is no __getstate__ method, the instance's __dict__
is pickled. If there is no __setstate__ method, the pickled object
must be a dictionary and its items are assigned to the new instance's
dictionary. (If a class defines both __getstate__ and __setstate__,
the state object needn't be a dictionary -- these methods can do what they
want.)
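
The two hooks can be sketched as follows, assuming a current Python 3
(run as a script, so the class is defined at the top level of a module);
the Counter class is hypothetical and exists only for illustration:

```python
import pickle

class Counter:
    # Hypothetical class: keeps a cache that should not be pickled.
    def __init__(self):
        self.count = 0
        self.cache = {}

    def __getstate__(self):
        # With __setstate__ defined, the state need not be a dictionary.
        return (self.count,)

    def __setstate__(self, state):
        (self.count,) = state
        self.cache = {}         # rebuilt empty, not restored

c = Counter()
c.count = 3
c.cache["expensive"] = 42
restored = pickle.loads(pickle.dumps(c))
```

Here the tuple returned by __getstate__ is all that is pickled, so the
cache contents never reach the stream.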

Note that when class instances are pickled, their class's code and data
are not pickled along with them. Only the instance data is pickled.
This is done on purpose, so you can fix bugs in a class or add methods and
still load objects that were created with an earlier version of the
class. If you plan to have long-lived objects that will see many versions
of a class, it may be worthwhile to put a version number in the objects so
that suitable conversions can be made by the class's __setstate__ method.
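
One possible shape for such a conversion, again assuming a current
Python 3; the Account class, its VERSION attribute and the field names
are all hypothetical:

```python
import pickle

class Account:
    # Hypothetical class; VERSION and the field names are illustrative.
    VERSION = 2

    def __init__(self, owner):
        self.owner = owner
        self.currency = "EUR"   # field added in version 2

    def __getstate__(self):
        state = dict(self.__dict__)
        state["version"] = self.VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop("version", 1)
        if version == 1:
            state["currency"] = "EUR"   # upgrade a version-1 state
        self.__dict__.update(state)

# Simulate the state of an object written by version 1 of the class.
old = Account.__new__(Account)
old.__setstate__({"owner": "alice", "version": 1})

fresh = pickle.loads(pickle.dumps(Account("bob")))
```

The version number is stored in the pickled state only; __setstate__
strips it again, so instances never carry it as an attribute.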

The interface is as follows:

To pickle an object x onto a file f, open for writing:

    p = pickle.Pickler(f)
    p.dump(x)

To unpickle an object x from a file f, open for reading:

    u = pickle.Unpickler(f)
    x = u.load()

The Pickler class only calls the method f.write with a string argument
(XXX possibly the interface should pass f.write instead of f).
The Unpickler calls the methods f.read (with an integer argument)
and f.readline (without argument), both returning a string.
It is explicitly allowed to pass non-file objects here, as long as they
have the right methods.
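
As an illustration of passing a non-file object, the sketch below assumes
a current Python 3, where the data handed to write are bytes rather than
strings; the ByteCollector class is hypothetical:

```python
import io
import pickle

class ByteCollector:
    # Hypothetical sink that is not a file: it offers only write().
    def __init__(self):
        self.parts = []

    def write(self, data):
        self.parts.append(bytes(data))
        return len(data)

sink = ByteCollector()
pickle.Pickler(sink).dump({"answer": 42})

# Any object with suitable read() and readline() methods can feed the
# Unpickler; io.BytesIO is used here for brevity.
source = io.BytesIO(b"".join(sink.parts))
value = pickle.Unpickler(source).load()
```

The pickler never asks the sink for anything but write, which is why so
small a class suffices.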

The following types can be pickled:

- None
- integers, long integers, floating point numbers
- strings
- tuples, lists and dictionaries containing picklable objects
- class instances whose __dict__ or __getstate__() result is picklable

Attempts to pickle unpicklable objects will raise an exception
after having written an unspecified number of bytes to the file argument.

It is possible to make multiple calls to Pickler.dump() or to
Unpickler.load(), as long as there is a one-to-one correspondence
between Pickler and Unpickler objects and between dump and load calls
for any pair of corresponding Pickler and Unpickler objects. WARNING: this
is intended for pickling multiple objects without intervening modifications
to the objects or their parts. If you modify an object and then pickle
it again using the same Pickler instance, the object is not pickled
again -- a reference to it is pickled and the Unpickler will return
the old value, not the modified one. (XXX There are two problems here:
(a) detecting changes, and (b) marshalling a minimal set of changes.
I have no answers. Garbage Collection may also become a problem here.)
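
The warning can be reproduced with the sketch below, which assumes a
current Python 3 and one Pickler paired with one Unpickler over the same
in-memory buffer:

```python
import io
import pickle

buf = io.BytesIO()
p = pickle.Pickler(buf)

items = [1, 2, 3]
p.dump(items)
items.append(4)         # modification between the two dumps
p.dump(items)           # writes only a reference to the memoized list

buf.seek(0)
u = pickle.Unpickler(buf)
first = u.load()
second = u.load()       # the old value, not the modified one
```

The appended 4 never reaches the stream: second is the same three-item
list object that the first load returned.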
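"""

__format_version__ = "1.0"      # File format version
__version__ = "1.2"             # Code version

from types import *
import string

AtomicTypes = [NoneType, IntType, FloatType, StringType]

def safe(object):
    # Return 1 if object holds only atomic values (possibly nested in
    # tuples), so that saving it can never lead back to its container.
    t = type(object)
    if t in AtomicTypes:
        return 1
    if t is TupleType:
        for item in object:
            if not safe(item): return 0
        return 1
    return 0

MARK = '('          # push the mark object
POP = '0'           # discard the top of the stack
DUP = '2'           # duplicate the top of the stack
STOP = '.'          # end of pickle; the top of the stack is the result
TUPLE = 't'         # build a tuple from the items above the mark
LIST = 'l'          # build a list from the items above the mark
DICT = 'd'          # build a dictionary from key/value pairs above the mark
INST = 'i'          # build a class instance; module and class name follow
GET = 'g'           # push an object from the memo; its key follows
PUT = 'p'           # store the top of the stack in the memo; its key follows
APPEND = 'a'        # append the top of the stack to the list below it
SETITEM = 's'       # add a key/value pair to the dictionary below it
BUILD = 'b'         # apply the state on top to the instance below it
NONE = 'N'          # push None
INT = 'I'           # push an integer; its repr follows
LONG = 'L'          # push a long integer; its repr follows
FLOAT = 'F'         # push a float; its repr follows
STRING = 'S'        # push a string; its repr follows
PERSID = 'P'        # push a persistent object; its ID follows
AtomicKeys = [NONE, INT, LONG, FLOAT, STRING]
AtomicMap = {
    NoneType: NONE,
    IntType: INT,
    LongType: LONG,
    FloatType: FLOAT,
    StringType: STRING,
}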

class Pickler:

    def __init__(self, file):
        self.write = file.write
        self.memo = {}

    def dump(self, object):
        self.save(object)
        self.write(STOP)

    def save(self, object):
        pid = self.persistent_id(object)
        if pid:
            self.write(PERSID + str(pid) + '\n')
            return
        d = id(object)
        if self.memo.has_key(d):
            self.write(GET + `d` + '\n')
            return
        t = type(object)
        self.dispatch[t](self, object)

    def persistent_id(self, object):
        return None

    dispatch = {}

    def save_none(self, object):
        self.write(NONE)
    dispatch[NoneType] = save_none

    def save_int(self, object):
        self.write(INT + `object` + '\n')
    dispatch[IntType] = save_int

    def save_long(self, object):
        self.write(LONG + `object` + '\n')
    dispatch[LongType] = save_long

    def save_float(self, object):
        self.write(FLOAT + `object` + '\n')
    dispatch[FloatType] = save_float

    def save_string(self, object):
        d = id(object)
        self.write(STRING + `object` + '\n')
        self.write(PUT + `d` + '\n')
        self.memo[d] = object
    dispatch[StringType] = save_string

    def save_tuple(self, object):
        d = id(object)
        self.write(MARK)
        n = len(object)
        for k in range(n):
            self.save(object[k])
            if self.memo.has_key(d):
                # Saving object[k] has saved us!
                # Discard the k+1 items written so far ...
                while k >= 0:
                    self.write(POP)
                    k = k-1
                # ... and the MARK beneath them, which would otherwise
                # be left behind on the unpickler's stack.
                self.write(POP)
                self.write(GET + `d` + '\n')
                break
        else:
            self.write(TUPLE + PUT + `d` + '\n')
            self.memo[d] = object
    dispatch[TupleType] = save_tuple

    def save_list(self, object):
        d = id(object)
        self.write(MARK)
        n = len(object)
        # Leading safe items are written before the list is created
        for k in range(n):
            item = object[k]
            if not safe(item):
                break
            self.save(item)
        else:
            k = n
        self.write(LIST + PUT + `d` + '\n')
        self.memo[d] = object
        # The remaining items are appended one by one
        for k in range(k, n):
            item = object[k]
            self.save(item)
            self.write(APPEND)
    dispatch[ListType] = save_list

    def save_dict(self, object):
        d = id(object)
        self.write(MARK)
        items = object.items()
        n = len(items)
        # Leading safe pairs are written before the dictionary is created
        for k in range(n):
            key, value = items[k]
            if not safe(key) or not safe(value):
                break
            self.save(key)
            self.save(value)
        else:
            k = n
        self.write(DICT + PUT + `d` + '\n')
        self.memo[d] = object
        # The remaining pairs are stored one by one
        for k in range(k, n):
            key, value = items[k]
            self.save(key)
            self.save(value)
            self.write(SETITEM)
    dispatch[DictionaryType] = save_dict

    def save_inst(self, object):
        d = id(object)
        cls = object.__class__
        module = whichmodule(cls)
        name = cls.__name__
        if hasattr(object, '__getinitargs__'):
            args = object.__getinitargs__()
            len(args) # XXX Assert it's a sequence
        else:
            args = ()
        self.write(MARK)
        for arg in args:
            self.save(arg)
        self.write(INST + module + '\n' + name + '\n' +
                   PUT + `d` + '\n')
        self.memo[d] = object
        try:
            getstate = object.__getstate__
        except AttributeError:
            stuff = object.__dict__
        else:
            stuff = getstate()
        self.save(stuff)
        self.write(BUILD)
    dispatch[InstanceType] = save_inst


classmap = {}

def whichmodule(cls):
    """Figure out the module in which a class occurs.

    Search sys.modules for the module.
    Cache in classmap.
    Return a module name.
    If the class cannot be found, return __main__.
    """
    if classmap.has_key(cls):
        return classmap[cls]
    import sys
    clsname = cls.__name__
    for name, module in sys.modules.items():
        if module.__name__ != '__main__' and \
           hasattr(module, clsname) and \
           getattr(module, clsname) is cls:
            break
    else:
        name = '__main__'
    classmap[cls] = name
    return name


class Unpickler:

    def __init__(self, file):
        self.readline = file.readline
        self.read = file.read
        self.memo = {}

    def load(self):
        self.mark = ['spam'] # Any new unique object
        self.stack = []
        try:
            while 1:
                key = self.read(1)
                self.dispatch[key](self)
        except STOP, value:
            return value

    def marker(self):
        # Find the topmost mark; compare by identity so that unpickled
        # objects which merely equal the mark are not mistaken for it.
        k = len(self.stack)-1
        while self.stack[k] is not self.mark: k = k-1
        return k

    dispatch = {}

    def load_persid(self):
        pid = self.readline()[:-1]
        self.stack.append(self.persistent_load(pid))
    dispatch[PERSID] = load_persid

    def load_none(self):
        self.stack.append(None)
    dispatch[NONE] = load_none

    def load_atomic(self):
        self.stack.append(eval(self.readline()[:-1]))
    dispatch[INT] = load_atomic
    dispatch[LONG] = load_atomic
    dispatch[FLOAT] = load_atomic
    dispatch[STRING] = load_atomic

    def load_tuple(self):
        k = self.marker()
        self.stack[k:] = [tuple(self.stack[k+1:])]
    dispatch[TUPLE] = load_tuple

    def load_list(self):
        k = self.marker()
        self.stack[k:] = [self.stack[k+1:]]
    dispatch[LIST] = load_list

    def load_dict(self):
        k = self.marker()
        d = {}
        items = self.stack[k+1:]
        for i in range(0, len(items), 2):
            key = items[i]
            value = items[i+1]
            d[key] = value
        self.stack[k:] = [d]
    dispatch[DICT] = load_dict

    def load_inst(self):
        k = self.marker()
        args = tuple(self.stack[k+1:])
        del self.stack[k:]
        module = self.readline()[:-1]
        name = self.readline()[:-1]
        env = {}
        try:
            exec 'from %s import %s' % (module, name) in env
        except ImportError:
            raise SystemError, \
                  "Failed to import class %s from module %s" % \
                  (name, module)
        else:
            klass = env[name]
            if type(klass) != ClassType:
                raise SystemError, \
                      "imported object %s from module %s is not a class" % \
                      (name, module)
        value = apply(klass, args)
        self.stack.append(value)
    dispatch[INST] = load_inst

    def load_pop(self):
        del self.stack[-1]
    dispatch[POP] = load_pop

    def load_dup(self):
        self.stack.append(self.stack[-1])
    dispatch[DUP] = load_dup

    def load_get(self):
        self.stack.append(self.memo[string.atoi(self.readline()[:-1])])
    dispatch[GET] = load_get

    def load_put(self):
        self.memo[string.atoi(self.readline()[:-1])] = self.stack[-1]
    dispatch[PUT] = load_put

    def load_append(self):
        value = self.stack[-1]
        del self.stack[-1]
        list = self.stack[-1]
        list.append(value)
    dispatch[APPEND] = load_append

    def load_setitem(self):
        value = self.stack[-1]
        key = self.stack[-2]
        del self.stack[-2:]
        dict = self.stack[-1]
        dict[key] = value
    dispatch[SETITEM] = load_setitem

    def load_build(self):
        value = self.stack[-1]
        del self.stack[-1]
        inst = self.stack[-1]
        try:
            setstate = inst.__setstate__
        except AttributeError:
            for key in value.keys():
                inst.__dict__[key] = value[key]
        else:
            setstate(value)
    dispatch[BUILD] = load_build

    def load_mark(self):
        self.stack.append(self.mark)
    dispatch[MARK] = load_mark

    def load_stop(self):
        value = self.stack[-1]
        del self.stack[-1]
        raise STOP, value
    dispatch[STOP] = load_stop


class C:
    def __cmp__(self, other):
        return cmp(self.__dict__, other.__dict__)

def test():
    fn = 'pickle_tmp'
    c = C()
    c.foo = 1
    c.bar = 2
    x = [0,1,2,3]
    y = ('abc', 'abc', c, c)
    x.append(y)
    x.append(y)
    x.append(5)
    f = open(fn, 'w')
    F = Pickler(f)
    F.dump(x)
    f.close()
    f = open(fn, 'r')
    U = Unpickler(f)
    x2 = U.load()
    print x
    print x2
    print x == x2
    print map(id, x)
    print map(id, x2)
    print F.memo
    print U.memo

if __name__ == '__main__':
    test()