# Guido van Rossum, a48061a, 1995-01-10 00:31:14 +0000
"""\
Pickling Algorithm
------------------

This module implements a basic but powerful algorithm for "pickling" (a.k.a.
serializing, marshalling or flattening) nearly arbitrary Python objects.
This is a more primitive notion than persistency -- although pickle
reads and writes file objects, it does not handle the issue of naming
persistent objects, nor the (even more complicated) area of concurrent
access to persistent objects. The pickle module can transform a complex
object into a byte stream and it can transform the byte stream into
an object with the same internal structure. The most obvious thing to
do with these byte streams is to write them onto a file, but it is also
conceivable to send them across a network or store them in a database.

Unlike the built-in marshal module, pickle handles the following correctly:

- recursive objects
- pointer sharing
- class instances

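The first two properties can be observed in a short sketch. It assumes a
modern Python 3 interpreter and its standard-library pickle module, which
descends from this one; it does not run the 1995 code in this file.

```python
import pickle

shared = ["leaf"]
tree = [shared, shared]   # pointer sharing: two references to one list
tree.append(tree)         # recursion: the outer list contains itself

clone = pickle.loads(pickle.dumps(tree))

# The sharing and the cycle are rebuilt rather than duplicated.
print(clone[0] is clone[1], clone[2] is clone)   # prints: True True
```

Each object is written once and later occurrences are written as references
to it. This is the job of the memo that Pickler.save() consults.
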
Pickle is Python-specific. This has the advantage that there are no
restrictions imposed by external standards such as CORBA (which probably
can't represent pointer sharing or recursive objects); however it means
that non-Python programs may not be able to reconstruct pickled Python
objects.

Pickle uses a printable ASCII representation. This is slightly more
voluminous than a binary representation. However, small integers actually
take *less* space when represented as minimal-size decimal strings than
when represented as 32-bit binary numbers, and strings are only much longer
if they contain control characters or 8-bit characters. The big advantage
of using printable ASCII (and of some other characteristics of pickle's
representation) is that for debugging or recovery purposes it is possible
for a human to read the pickled file with a standard text editor. (I could
have gone a step further and used a notation like S-expressions, but the
parser would have been considerably more complicated and slower, and the
files would probably have become much larger.)

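The size argument can be checked with a little arithmetic. The sketch below
assumes modern Python 3, and its helper names are made up for the
illustration. It compares the digits written after the INT opcode with a
4-byte binary integer, and it shows how repr() escaping makes strings with
control characters longer. It counts payload only: the opcode and the
terminating newline add two bytes to every INT record.

```python
import struct

BINARY_INT_SIZE = len(struct.pack("<i", 0))   # a 32-bit binary integer: always 4 bytes

def decimal_size(n):
    # characters needed to spell n in decimal (the payload of an INT record)
    return len(str(n))

def string_record_size(s):
    # STRING records are written with repr(), so quotes are added and
    # control characters become multi-character escapes
    return len(repr(s))

for n in (7, 42, 999, 1000, 123456):
    print(n, decimal_size(n), BINARY_INT_SIZE)

print(string_record_size("abc"))           # plain text: only the two quotes are added
print(string_record_size("ab" + chr(1)))   # chr(1) is written as a 4-character escape
```

On payload alone, break-even falls at four digits. Negative numbers need one
extra character for the sign.
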
Pickle doesn't handle code objects, which marshal does.
I suppose pickle could, and maybe it should, but there's probably no
great need for it right now (as long as marshal continues to be used
for reading and writing code objects), and at least this avoids
the possibility of smuggling Trojan horses into a program.

For the benefit of persistency modules written using pickle, it supports
the notion of a reference to an object outside the pickled data stream.
Such objects are referenced by a name, which is an arbitrary string of
printable ASCII characters. The resolution of such names is not defined
by the pickle module -- the persistent object module will have to implement
a method "persistent_load" (e.g. in a subclass of Unpickler). To write
references to persistent objects, the persistent module must define a
method "persistent_id" (e.g. in a subclass of Pickler) which returns
either None or the persistent ID of the object.

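Here is a sketch of the same two hooks as they exist in the modern Python 3
pickle module. There, persistent_id is a method of a Pickler subclass and
persistent_load is a method of an Unpickler subclass. The Record class and
the STORE dictionary are hypothetical stand-ins for an external object store.

```python
import io
import pickle

class Record:
    def __init__(self, key):
        self.key = key

STORE = {"rec-1": Record("rec-1")}   # hypothetical persistent store, keyed by name

class DBPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Write a name for objects that live in STORE; None means "pickle normally".
        if isinstance(obj, Record):
            return obj.key
        return None

class DBUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Resolve the name back to the live object; pickle itself never sees it.
        return STORE[pid]

buf = io.BytesIO()
DBPickler(buf).dump({"owner": STORE["rec-1"], "n": 3})
buf.seek(0)
data = DBUnpickler(buf).load()
print(data["owner"] is STORE["rec-1"])   # prints: True
```
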
There are some restrictions on the pickling of class instances.

First of all, the class must be defined at the top level in a module.

Next, it must normally be possible to create class instances by calling
the class without arguments. If this is undesirable, the class can
define a method __getinitargs__ (XXX not a pretty name!), which should
return a *tuple* containing the arguments to be passed to the class
constructor.

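The effect of __getinitargs__ can be mimicked in a few lines. This is a
hypothetical sketch in modern Python 3, not part of this module. snapshot()
and restore() play the roles of save_inst() and load_inst(): they record the
class, the constructor arguments and the instance dictionary, then call the
class with those arguments and copy the dictionary back. Python 3's pickle
no longer calls __getinitargs__, so the sketch calls it by hand. For brevity
it also ignores __getstate__.

```python
def snapshot(obj):
    # What save_inst() records: the class, the __init__ arguments and the state.
    getinitargs = getattr(obj, "__getinitargs__", None)
    args = tuple(getinitargs()) if getinitargs is not None else ()
    return type(obj), args, dict(obj.__dict__)

def restore(cls, args, state):
    # Like load_inst() followed by BUILD: call the class, then fill in __dict__.
    inst = cls(*args)
    inst.__dict__.update(state)
    return inst

class Point:
    def __init__(self, x, y):   # cannot be called without arguments
        self.x = x
        self.y = y

    def __getinitargs__(self):
        return (self.x, self.y)

p = Point(1, 2)
p.label = "a"
q = restore(*snapshot(p))
print(q.x, q.y, q.label)   # prints: 1 2 a
```
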
Classes can influence how they are pickled -- if the class defines
the method __getstate__, it is called and the returned state is pickled
as the contents of the instance, and if the class defines the
method __setstate__, it is called with the unpickled state. (Note
that these methods can also be used to implement copying class instances.)
If there is no __getstate__ method, the instance's __dict__
is pickled. If there is no __setstate__ method, the pickled state
must be a dictionary and its items are assigned to the new instance's
dictionary. (If a class defines both __getstate__ and __setstate__,
the state object needn't be a dictionary -- these methods can do what they
want.)

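Here is a sketch of both hooks with the modern Python 3 pickle and copy
modules. The Counter class is invented for the example. Its __getstate__
drops a cache that is cheap to rebuild and returns a tuple instead of a
dictionary. That is allowed because __setstate__ is also defined. One Python
3 detail matters here: if __getstate__ returns a false value, __setstate__ is
not called on unpickling, so this state is a non-empty tuple.

```python
import copy
import pickle

class Counter:
    def __init__(self, name):
        self.name = name
        self.count = 0
        self.cache = {}   # derived data, not worth pickling

    def __getstate__(self):
        # Not a dictionary: fine, because __setstate__ knows how to read it.
        return (self.name, self.count)

    def __setstate__(self, state):
        self.name, self.count = state
        self.cache = {}   # rebuilt instead of unpickled

c = Counter("hits")
c.count = 7
c.cache["warm"] = True

c2 = pickle.loads(pickle.dumps(c))
c3 = copy.deepcopy(c)   # the same hooks drive copying
print(c2.name, c2.count, c2.cache)
print(c3.name, c3.count, c3.cache)
```
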
Note that when class instances are pickled, their class's code and data
are not pickled along with them. Only the instance data is pickled.
This is done on purpose, so you can fix bugs in a class or add methods and
still load objects that were created with an earlier version of the
class. If you plan to have long-lived objects that will see many versions
of a class, it may be worthwhile to put a version number in the objects so
that suitable conversions can be made by the class's __setstate__ method.

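One way to follow that advice is sketched below with the modern Python 3
pickle module. __getstate__ stamps the state with a version number, and
__setstate__ upgrades older layouts. The Account class is hypothetical, and
so is its version-1 layout (balance in whole units rather than cents).

```python
import pickle

class Account:
    VERSION = 2

    def __init__(self, owner, cents):
        self.owner = owner
        self.cents = cents   # version 2: balance kept in cents

    def __getstate__(self):
        state = dict(self.__dict__)
        state["version"] = self.VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop("version", 1)   # states without a stamp are version 1
        if version == 1:
            # hypothetical version-1 layout: "balance" in whole units
            state["cents"] = state.pop("balance") * 100
        self.__dict__.update(state)

# A state dictionary as an older version of the class would have produced it.
old = Account.__new__(Account)
old.__setstate__({"owner": "ann", "balance": 5})

new = pickle.loads(pickle.dumps(Account("bob", 250)))
print(old.owner, old.cents)   # prints: ann 500
print(new.owner, new.cents)   # prints: bob 250
```
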
The interface is as follows:

To pickle an object x onto a file f, open for writing:

    p = pickle.Pickler(f)
    p.dump(x)

To unpickle an object x from a file f, open for reading:

    u = pickle.Unpickler(f)
    x = u.load()

The Pickler class only calls the method f.write with a string argument
(XXX possibly the interface should pass f.write instead of f).
The Unpickler calls the methods f.read (with an integer argument)
and f.readline (without arguments), both returning a string.
It is explicitly allowed to pass non-file objects here, as long as they
have the right methods.

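The same duck typing carries over to the modern Python 3 pickle module, as
the sketch below shows. The writer is a made-up class that has nothing but a
write() method, and the reader is an in-memory io.BytesIO. The main
difference from this module is that Python 3 pickles are bytes rather than
text, so write() receives bytes and the reader must be a binary stream.

```python
import io
import pickle

class ChunkSink:
    # Not a file: just something with a write() method, which is all Pickler needs.
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)

x = {"numbers": [1, 2, 3], "name": "spam"}

sink = ChunkSink()
p = pickle.Pickler(sink)
p.dump(x)

# io.BytesIO supplies the read() and readline() methods the Unpickler uses.
u = pickle.Unpickler(io.BytesIO(b"".join(sink.chunks)))
y = u.load()
print(y == x)   # prints: True
```
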
The following types can be pickled:

- None
- integers, long integers, floating point numbers
- strings
- tuples, lists and dictionaries containing only picklable objects
- class instances whose __dict__ or __getstate__() result is picklable

Attempts to pickle unpicklable objects will raise an exception
after having written an unspecified number of bytes to the file argument.

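If a partially written file is a problem, one workaround is to pickle into
memory first and copy the result to the file only when pickling succeeded.
The sketch below uses the modern Python 3 pickle module. dump_all_or_nothing
is a hypothetical helper, and the generator is there only because it cannot
be pickled.

```python
import io
import pickle

def dump_all_or_nothing(obj, f):
    # pickle.dumps() raises before anything has been written to f
    data = pickle.dumps(obj)
    f.write(data)

out = io.BytesIO()
failed = False
try:
    dump_all_or_nothing([1, 2, (n for n in range(3))], out)
except (TypeError, pickle.PicklingError) as exc:
    failed = True
    print("not pickled:", type(exc).__name__)
size_after_failure = len(out.getvalue())   # nothing reached the "file"

dump_all_or_nothing([1, 2], out)
print(pickle.loads(out.getvalue()))
```

The cost is holding the whole pickle in memory before writing it.
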
It is possible to make multiple calls to Pickler.dump() or to
Unpickler.load(), as long as there is a one-to-one correspondence
between Pickler and Unpickler objects and between dump and load calls
for any pair of corresponding Pickler and Unpickler objects. WARNING: this
is intended for pickling multiple objects without intervening modifications
to the objects or their parts. If you modify an object and then pickle
it again using the same Pickler instance, the object is not pickled
again -- a reference to it is pickled and the Unpickler will return
the old value, not the modified one. (XXX There are two problems here:
(a) detecting changes, and (b) marshalling a minimal set of changes.
I have no answers. Garbage Collection may also become a problem here.)
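
The warning can be reproduced with the modern Python 3 pickle module, whose
Pickler and Unpickler likewise keep their memo across dump() and load()
calls. A minimal sketch:

```python
import io
import pickle

buf = io.BytesIO()
p = pickle.Pickler(buf)

a = [1, 2]
p.dump(a)
a.append(3)   # modify the object between the two dumps
p.dump(a)     # same Pickler: only a memo reference is written this time

buf.seek(0)
u = pickle.Unpickler(buf)
first = u.load()
second = u.load()
print(first, second, second is first)   # prints: [1, 2] [1, 2] True
```
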
126"""
127
128__format_version__ = "1.0" # File format version
Guido van Rossum7849da81995-03-09 14:08:35 +0000129__version__ = "1.4" # Code version
Guido van Rossuma48061a1995-01-10 00:31:14 +0000130
131from types import *
132import string
133
Guido van Rossum7849da81995-03-09 14:08:35 +0000134PicklingError = "pickle.PicklingError"
135
AtomicTypes = [NoneType, IntType, FloatType, StringType]

# An object is "safe" if it is atomic or a tuple of safe objects:
# saving it can never lead back to the list or dictionary being saved.
def safe(object):
    t = type(object)
    if t in AtomicTypes:
        return 1
    if t is TupleType:
        for item in object:
            if not safe(item): return 0
        return 1
    return 0

# Opcodes: every record in a pickle starts with one of these characters.
MARK = '('
POP = '0'
DUP = '2'
STOP = '.'
TUPLE = 't'
LIST = 'l'
DICT = 'd'
INST = 'i'
GET = 'g'
PUT = 'p'
APPEND = 'a'
SETITEM = 's'
BUILD = 'b'
NONE = 'N'
INT = 'I'
LONG = 'L'
FLOAT = 'F'
STRING = 'S'
PERSID = 'P'
AtomicKeys = [NONE, INT, LONG, FLOAT, STRING]
AtomicMap = {
    NoneType: NONE,
    IntType: INT,
    LongType: LONG,
    FloatType: FLOAT,
    StringType: STRING,
}

class Pickler:

    def __init__(self, file):
        self.write = file.write
        self.memo = {}

    def dump(self, object):
        self.save(object)
        self.write(STOP)

    def save(self, object):
        pid = self.persistent_id(object)
        if pid:
            self.write(PERSID + str(pid) + '\n')
            return
        d = id(object)
        if self.memo.has_key(d):
            self.write(GET + `d` + '\n')
            return
        t = type(object)
        try:
            f = self.dispatch[t]
        except KeyError:
            raise PicklingError, \
                  "can't pickle %s objects" % `t.__name__`
        f(self, object)

    def persistent_id(self, object):
        return None

    dispatch = {}

    def save_none(self, object):
        self.write(NONE)
    dispatch[NoneType] = save_none

    def save_int(self, object):
        self.write(INT + `object` + '\n')
    dispatch[IntType] = save_int

    def save_long(self, object):
        self.write(LONG + `object` + '\n')
    dispatch[LongType] = save_long

    def save_float(self, object):
        self.write(FLOAT + `object` + '\n')
    dispatch[FloatType] = save_float

    def save_string(self, object):
        d = id(object)
        self.write(STRING + `object` + '\n')
        self.write(PUT + `d` + '\n')
        self.memo[d] = object
    dispatch[StringType] = save_string

    def save_tuple(self, object):
        d = id(object)
        self.write(MARK)
        n = len(object)
        for k in range(n):
            self.save(object[k])
            if self.memo.has_key(d):
                # Saving object[k] has saved us!  The tuple is recursive
                # and was already built; discard the k+1 items and the
                # MARK pushed so far, then fetch it from the memo.
                while k >= 0:
                    self.write(POP)
                    k = k-1
                self.write(POP)     # ...and the MARK itself
                self.write(GET + `d` + '\n')
                break
        else:
            self.write(TUPLE + PUT + `d` + '\n')
            self.memo[d] = object
    dispatch[TupleType] = save_tuple

    def save_list(self, object):
        d = id(object)
        self.write(MARK)
        n = len(object)
        # Push the leading safe items, then create and memoize the list
        # before saving any item that might refer back to it; those are
        # added one at a time with APPEND.
        for k in range(n):
            item = object[k]
            if not safe(item):
                break
            self.save(item)
        else:
            k = n
        self.write(LIST + PUT + `d` + '\n')
        self.memo[d] = object
        for k in range(k, n):
            item = object[k]
            self.save(item)
            self.write(APPEND)
    dispatch[ListType] = save_list

    def save_dict(self, object):
        d = id(object)
        self.write(MARK)
        items = object.items()
        n = len(items)
        for k in range(n):
            key, value = items[k]
            if not safe(key) or not safe(value):
                break
            self.save(key)
            self.save(value)
        else:
            k = n
        self.write(DICT + PUT + `d` + '\n')
        self.memo[d] = object
        for k in range(k, n):
            key, value = items[k]
            self.save(key)
            self.save(value)
            self.write(SETITEM)
    dispatch[DictionaryType] = save_dict

    def save_inst(self, object):
        d = id(object)
        cls = object.__class__
        module = whichmodule(cls)
        name = cls.__name__
        if hasattr(object, '__getinitargs__'):
            args = object.__getinitargs__()
            len(args) # XXX Assert it's a sequence
        else:
            args = ()
        self.write(MARK)
        for arg in args:
            self.save(arg)
        self.write(INST + module + '\n' + name + '\n' +
                   PUT + `d` + '\n')
        self.memo[d] = object
        try:
            getstate = object.__getstate__
        except AttributeError:
            stuff = object.__dict__
        else:
            stuff = getstate()
        self.save(stuff)
        self.write(BUILD)
    dispatch[InstanceType] = save_inst


classmap = {}

def whichmodule(cls):
    """Figure out the module in which a class occurs.

    Search sys.modules for the module.
    Cache in classmap.
    Return a module name.
    If the class cannot be found, return __main__.
    """
    if classmap.has_key(cls):
        return classmap[cls]
    import sys
    clsname = cls.__name__
    for name, module in sys.modules.items():
        if module.__name__ != '__main__' and \
           hasattr(module, clsname) and \
           getattr(module, clsname) is cls:
            break
    else:
        name = '__main__'
    classmap[cls] = name
    return name


class Unpickler:

    def __init__(self, file):
        self.readline = file.readline
        self.read = file.read
        self.memo = {}

    def load(self):
        self.mark = ['spam'] # Any new unique object
        self.stack = []
        try:
            while 1:
                key = self.read(1)
                self.dispatch[key](self)
        except STOP, value:
            return value

    def marker(self):
        # Compare by identity: an unpickled list equal to ['spam']
        # must not be mistaken for the mark.
        k = len(self.stack)-1
        while self.stack[k] is not self.mark: k = k-1
        return k

    dispatch = {}

    def load_eof(self):
        raise EOFError
    dispatch[''] = load_eof

    def load_persid(self):
        pid = self.readline()[:-1]
        self.stack.append(self.persistent_load(pid))
    dispatch[PERSID] = load_persid

    def load_none(self):
        self.stack.append(None)
    dispatch[NONE] = load_none

    def load_atomic(self):
        self.stack.append(eval(self.readline()[:-1]))
    dispatch[INT] = load_atomic
    dispatch[LONG] = load_atomic
    dispatch[FLOAT] = load_atomic
    dispatch[STRING] = load_atomic

    def load_tuple(self):
        k = self.marker()
        self.stack[k:] = [tuple(self.stack[k+1:])]
    dispatch[TUPLE] = load_tuple

    def load_list(self):
        k = self.marker()
        self.stack[k:] = [self.stack[k+1:]]
    dispatch[LIST] = load_list

    def load_dict(self):
        k = self.marker()
        d = {}
        items = self.stack[k+1:]
        for i in range(0, len(items), 2):
            key = items[i]
            value = items[i+1]
            d[key] = value
        self.stack[k:] = [d]
    dispatch[DICT] = load_dict

    def load_inst(self):
        k = self.marker()
        args = tuple(self.stack[k+1:])
        del self.stack[k:]
        module = self.readline()[:-1]
        name = self.readline()[:-1]
        env = {}
        try:
            exec 'from %s import %s' % (module, name) in env
        except ImportError:
            raise SystemError, \
                  "Failed to import class %s from module %s" % \
                  (name, module)
        else:
            klass = env[name]
            if type(klass) != ClassType:
                raise SystemError, \
                      "imported object %s from module %s is not a class" % \
                      (name, module)
            value = apply(klass, args)
            self.stack.append(value)
    dispatch[INST] = load_inst

    def load_pop(self):
        del self.stack[-1]
    dispatch[POP] = load_pop

    def load_dup(self):
        self.stack.append(self.stack[-1])
    dispatch[DUP] = load_dup

    def load_get(self):
        self.stack.append(self.memo[string.atoi(self.readline()[:-1])])
    dispatch[GET] = load_get

    def load_put(self):
        self.memo[string.atoi(self.readline()[:-1])] = self.stack[-1]
    dispatch[PUT] = load_put

    def load_append(self):
        value = self.stack[-1]
        del self.stack[-1]
        list = self.stack[-1]
        list.append(value)
    dispatch[APPEND] = load_append

    def load_setitem(self):
        value = self.stack[-1]
        key = self.stack[-2]
        del self.stack[-2:]
        dict = self.stack[-1]
        dict[key] = value
    dispatch[SETITEM] = load_setitem

    def load_build(self):
        value = self.stack[-1]
        del self.stack[-1]
        inst = self.stack[-1]
        try:
            setstate = inst.__setstate__
        except AttributeError:
            for key in value.keys():
                inst.__dict__[key] = value[key]
        else:
            setstate(value)
    dispatch[BUILD] = load_build

    def load_mark(self):
        self.stack.append(self.mark)
    dispatch[MARK] = load_mark

    def load_stop(self):
        value = self.stack[-1]
        del self.stack[-1]
        raise STOP, value
    dispatch[STOP] = load_stop


class C:
    def __cmp__(self, other):
        return cmp(self.__dict__, other.__dict__)

def test():
    fn = 'pickle_tmp'
    c = C()
    c.foo = 1
    c.bar = 2
    x = [0,1,2,3]
    y = ('abc', 'abc', c, c)
    x.append(y)
    x.append(y)
    x.append(5)
    f = open(fn, 'w')
    F = Pickler(f)
    F.dump(x)
    f.close()
    f = open(fn, 'r')
    U = Unpickler(f)
    x2 = U.load()
    print x
    print x2
    print x == x2
    print map(id, x)
    print map(id, x2)
    print F.memo
    print U.memo

if __name__ == '__main__':
    test()