# Guido van Rossum, 1995-01-10 (revision a48061a)
"""\
Pickling Algorithm
------------------

This module implements a basic but powerful algorithm for "pickling" (a.k.a.
serializing, marshalling or flattening) nearly arbitrary Python objects.
This is a more primitive notion than persistency -- although pickle
reads and writes file objects, it does not handle the issue of naming
persistent objects, nor the (even more complicated) area of concurrent
access to persistent objects. The pickle module can transform a complex
object into a byte stream and it can transform the byte stream into
an object with the same internal structure. The most obvious thing to
do with these byte streams is to write them onto a file, but it is also
conceivable to send them across a network or store them in a database.

Unlike the built-in marshal module, pickle handles the following correctly:

- recursive objects
- pointer sharing
- class instances
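
The first two items can be illustrated with a small sketch. It assumes
today's standard-library pickle module under Python 3 rather than the
Pickler class defined in this file, and the variable names are made up
for the illustration.

```python
import pickle

# Pointer sharing: the same list object appears twice.
shared = ['abc']
data = [shared, shared]

# Recursion: the outer list contains itself.
data.append(data)

restored = pickle.loads(pickle.dumps(data))

sharing_kept = restored[0] is restored[1]    # still one object, not two copies
recursion_kept = restored[2] is restored     # the cycle is rebuilt
```

Both flags should come out true: the unpickled structure has the same
shape as the original, not merely equal contents.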

Pickle is Python-specific. This has the advantage that there are no
restrictions imposed by external standards such as CORBA (which probably
can't represent pointer sharing or recursive objects); however it means
that non-Python programs may not be able to reconstruct pickled Python
objects.

Pickle uses a printable ASCII representation. This is slightly more
voluminous than a binary representation. However, small integers actually
take *less* space when represented as minimal-size decimal strings than
when represented as 32-bit binary numbers, and strings are only much longer
if they contain control characters or 8-bit characters. The big advantage
of using printable ASCII (and of some other characteristics of pickle's
representation) is that for debugging or recovery purposes it is possible
for a human to read the pickled file with a standard text editor. (I could
have gone a step further and used a notation like S-expressions, but the
parser would have been considerably more complicated and slower, and the
files would probably have become much larger.)

Pickle doesn't handle code objects, which marshal does.
I suppose pickle could, and maybe it should, but there's probably no
great need for it right now (as long as marshal continues to be used
for reading and writing code objects), and at least this avoids
the possibility of smuggling Trojan horses into a program.

For the benefit of persistency modules written using pickle, it supports
the notion of a reference to an object outside the pickled data stream.
Such objects are referenced by a name, which is an arbitrary string of
printable ASCII characters. The resolution of such names is not defined
by the pickle module -- the persistent object module will have to implement
a method "persistent_load". To write references to persistent objects,
the persistent module must define a method "persistent_id" which returns
either None or the persistent ID of the object.
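
A minimal sketch of the persistent_id / persistent_load pairing, assuming
the Python 3 standard-library pickle module, where both hooks are
supplied by subclassing. Record, STORE, RefPickler and RefUnpickler are
hypothetical names used only for this illustration.

```python
import io
import pickle

class Record:
    # Hypothetical externally stored object, identified by a string key.
    def __init__(self, key):
        self.key = key

# Stands in for an external database of persistent objects.
STORE = {'rec-1': Record('rec-1')}

class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return a name for objects kept outside the stream, else None.
        if isinstance(obj, Record):
            return obj.key
        return None

class RefUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Resolve the name back to the live object.
        return STORE[pid]

buf = io.BytesIO()
RefPickler(buf).dump(['payload', STORE['rec-1']])
buf.seek(0)
restored = RefUnpickler(buf).load()
same_object = restored[1] is STORE['rec-1']
```

Only the key travels in the byte stream; the Record itself is never
pickled, so the unpickled list refers to the very object held in STORE.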

There are some restrictions on the pickling of class instances.

First of all, the class must be defined at the top level in a module.

Next, it must normally be possible to create class instances by calling
the class without arguments. If this is undesirable, the class can
define a method __getinitargs__ (XXX not a pretty name!), which should
return a *tuple* containing the arguments to be passed to the class
constructor.

Classes can influence how they are pickled -- if the class defines
the method __getstate__, it is called and the returned state is pickled
as the contents for the instance, and if the class defines the
method __setstate__, it is called with the unpickled state. (Note
that these methods can also be used to implement copying class instances.)
If there is no __getstate__ method, the instance's __dict__
is pickled. If there is no __setstate__ method, the pickled object
must be a dictionary and its items are assigned to the new instance's
dictionary. (If a class defines both __getstate__ and __setstate__,
the state object needn't be a dictionary -- these methods can do what they
want.)

Note that when class instances are pickled, their class's code and data
are not pickled along with them. Only the instance data is pickled.
This is done on purpose, so you can fix bugs in a class or add methods and
still load objects that were created with an earlier version of the
class. If you plan to have long-lived objects that will see many versions
of a class, it may be worthwhile to put a version number in the objects so
that suitable conversions can be made by the class's __setstate__ method.
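
The version-number idea might look like the sketch below. It assumes
Python 3 and calls __getstate__ and __setstate__ by hand, pickling only
the state object, to make visible what a pickler and unpickler would do
with them. Account, VERSION and the field names are hypothetical, as is
the assumed version-1 layout.

```python
import pickle

class Account:
    # Hypothetical class; VERSION and the field names are illustrative.
    VERSION = 2

    def __init__(self, owner, balance_cents):
        self.owner = owner
        self.balance_cents = balance_cents

    def __getstate__(self):
        # Pickle a copy of the instance dictionary plus a version tag.
        state = dict(self.__dict__)
        state['version'] = self.VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop('version', 1)
        if version < 2:
            # Assumed older layout: a float 'balance' in whole units.
            state['balance_cents'] = int(round(state.pop('balance') * 100))
        self.__dict__.update(state)

# State as a version-1 instance might have written it.
old_state = {'owner': 'ann', 'balance': 12.5, 'version': 1}
old = Account.__new__(Account)
old.__setstate__(pickle.loads(pickle.dumps(old_state)))

# State written by the current class carries the current version.
new_state = pickle.loads(pickle.dumps(Account('bob', 300).__getstate__()))
```

Because only the state is stored, the conversion lives entirely in
__setstate__ and old pickles keep loading after the class changes.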

The interface is as follows:

To pickle an object x onto a file f, open for writing:

    p = pickle.Pickler(f)
    p.dump(x)

To unpickle an object x from a file f, open for reading:

    u = pickle.Unpickler(f)
    x = u.load()

The Pickler class only calls the method f.write with a string argument
(XXX possibly the interface should pass f.write instead of f).
The Unpickler calls the methods f.read (with an integer argument)
and f.readline (without argument), both returning a string.
It is explicitly allowed to pass non-file objects here, as long as they
have the right methods.
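
A sketch of passing non-file objects, assuming the Python 3
standard-library pickle module, where the data handed to write and
returned by read and readline is bytes rather than a string. Sink and
Source are hypothetical helper classes.

```python
import io
import pickle

class Sink:
    # Hypothetical write-only object; collects whatever the pickler writes.
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))

class Source:
    # Hypothetical read-only object; serves bytes from memory.
    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, n):
        return self._buf.read(n)

    def readline(self):
        return self._buf.readline()

sink = Sink()
pickle.Pickler(sink).dump({'spam': [1, 2, 3]})

source = Source(b''.join(sink.chunks))
result = pickle.Unpickler(source).load()
```

Neither class inherits from a file type; having the right methods is
enough for the round trip to work.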

The following types can be pickled:

- None
- integers, long integers, floating point numbers
- strings
- tuples, lists and dictionaries containing only picklable objects
- class instances whose __dict__ or __getstate__() result is picklable

Attempts to pickle unpicklable objects will raise an exception
after having written an unspecified number of bytes to the file argument.

It is possible to make multiple calls to Pickler.dump() or to
Unpickler.load(), as long as there is a one-to-one correspondence
between Pickler and Unpickler objects and between dump and load calls
for any pair of corresponding Pickler and Unpickler objects. WARNING: this
is intended for pickling multiple objects without intervening modifications
to the objects or their parts. If you modify an object and then pickle
it again using the same Pickler instance, the object is not pickled
again -- a reference to it is pickled and the Unpickler will return
the old value, not the modified one. (XXX There are two problems here:
(a) detecting changes, and (b) marshalling a minimal set of changes.
I have no answers. Garbage Collection may also become a problem here.)
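
The warning can be reproduced with a short sketch. It assumes the
Python 3 standard-library pickle module, whose Pickler and Unpickler
also keep their memo across calls as far as I know; the variable names
are illustrative.

```python
import io
import pickle

buf = io.BytesIO()
p = pickle.Pickler(buf)

x = [1, 2]
p.dump(x)
x.append(3)       # modify the object after the first dump
p.dump(x)         # same Pickler: only a memo reference is written

buf.seek(0)
u = pickle.Unpickler(buf)
first = u.load()
second = u.load()

stale = (second == [1, 2])       # the appended 3 never reached the stream
same_object = second is first    # the second load returns the memoized list
```

Using a fresh Pickler for the second dump (or clearing its memo) would
write the modified list in full.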
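"""

__format_version__ = "1.0"      # File format version
__version__ = "1.2"             # Code version

from types import *
import string

AtomicTypes = [NoneType, IntType, FloatType, StringType]

def safe(object):
    # True if the object can be saved without consulting the memo:
    # an atomic value, or a tuple made up only of such values.
    t = type(object)
    if t in AtomicTypes:
        return 1
    if t is TupleType:
        for item in object:
            if not safe(item): return 0
        return 1
    return 0

# One-character opcodes of the pickle format
MARK    = '('   # push the mark object
POP     = '0'   # discard the top stack item
DUP     = '2'   # duplicate the top stack item
STOP    = '.'   # end of pickle; the top stack item is the result
TUPLE   = 't'   # build a tuple from the items above the topmost mark
LIST    = 'l'   # build a list from the items above the topmost mark
DICT    = 'd'   # build a dictionary from key/value pairs above the mark
INST    = 'i'   # build an instance; module and class name lines follow
GET     = 'g'   # push the memo entry whose key follows
PUT     = 'p'   # store the top stack item in the memo; key follows
APPEND  = 'a'   # append the top stack item to the list below it
SETITEM = 's'   # store the top key/value pair in the dictionary below
BUILD   = 'b'   # restore instance state from the top stack item
NONE    = 'N'   # push None
INT     = 'I'   # push an integer; its repr follows
LONG    = 'L'   # push a long integer; its repr follows
FLOAT   = 'F'   # push a floating point number; its repr follows
STRING  = 'S'   # push a string; its repr follows
PERSID  = 'P'   # push the object named by the persistent ID that follows
AtomicKeys = [NONE, INT, LONG, FLOAT, STRING]
AtomicMap = {
    NoneType: NONE,
    IntType: INT,
    LongType: LONG,
    FloatType: FLOAT,
    StringType: STRING,
}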

class Pickler:

    def __init__(self, file):
        self.write = file.write
        self.memo = {}

    def dump(self, object):
        self.save(object)
        self.write(STOP)

    def save(self, object):
        pid = self.persistent_id(object)
        if pid:
            self.write(PERSID + str(pid) + '\n')
            return
        d = id(object)
        if self.memo.has_key(d):
            self.write(GET + `d` + '\n')
            return
        t = type(object)
        self.dispatch[t](self, object)

    def persistent_id(self, object):
        return None

    dispatch = {}

    def save_none(self, object):
        self.write(NONE)
    dispatch[NoneType] = save_none

    def save_int(self, object):
        self.write(INT + `object` + '\n')
    dispatch[IntType] = save_int

    def save_long(self, object):
        self.write(LONG + `object` + '\n')
    dispatch[LongType] = save_long

    def save_float(self, object):
        self.write(FLOAT + `object` + '\n')
    dispatch[FloatType] = save_float

    def save_string(self, object):
        d = id(object)
        self.write(STRING + `object` + '\n')
        self.write(PUT + `d` + '\n')
        self.memo[d] = object
    dispatch[StringType] = save_string

    def save_tuple(self, object):
        d = id(object)
        self.write(MARK)
        n = len(object)
        for k in range(n):
            self.save(object[k])
            if self.memo.has_key(d):
                # Saving object[k] has saved us!
                # Discard the k+1 items written so far...
                while k >= 0:
                    self.write(POP)
                    k = k-1
                # ...and the MARK itself, which would otherwise be left
                # on the unpickler's stack.
                self.write(POP)
                self.write(GET + `d` + '\n')
                break
        else:
            self.write(TUPLE + PUT + `d` + '\n')
            self.memo[d] = object
    dispatch[TupleType] = save_tuple

    def save_list(self, object):
        d = id(object)
        self.write(MARK)
        n = len(object)
        for k in range(n):
            item = object[k]
            if not safe(item):
                break
            self.save(item)
        else:
            k = n
        self.write(LIST + PUT + `d` + '\n')
        self.memo[d] = object
        for k in range(k, n):
            item = object[k]
            self.save(item)
            self.write(APPEND)
    dispatch[ListType] = save_list

    def save_dict(self, object):
        d = id(object)
        self.write(MARK)
        items = object.items()
        n = len(items)
        for k in range(n):
            key, value = items[k]
            if not safe(key) or not safe(value):
                break
            self.save(key)
            self.save(value)
        else:
            k = n
        self.write(DICT + PUT + `d` + '\n')
        self.memo[d] = object
        for k in range(k, n):
            key, value = items[k]
            self.save(key)
            self.save(value)
            self.write(SETITEM)
    dispatch[DictionaryType] = save_dict

    def save_inst(self, object):
        d = id(object)
        cls = object.__class__
        module = whichmodule(cls)
        name = cls.__name__
        if hasattr(object, '__getinitargs__'):
            args = object.__getinitargs__()
            len(args) # XXX Assert it's a sequence
        else:
            args = ()
        self.write(MARK)
        for arg in args:
            self.save(arg)
        self.write(INST + module + '\n' + name + '\n' +
                   PUT + `d` + '\n')
        self.memo[d] = object
        try:
            getstate = object.__getstate__
        except AttributeError:
            stuff = object.__dict__
        else:
            stuff = getstate()
        self.save(stuff)
        self.write(BUILD)
    dispatch[InstanceType] = save_inst


classmap = {}

def whichmodule(cls):
    """Figure out the module in which a class occurs.

    Search sys.modules for the module.
    Cache in classmap.
    Return a module name.
    If the class cannot be found, return __main__.
    """
    if classmap.has_key(cls):
        return classmap[cls]
    import sys
    clsname = cls.__name__
    for name, module in sys.modules.items():
        if module.__name__ != '__main__' and \
           hasattr(module, clsname) and \
           getattr(module, clsname) is cls:
            break
    else:
        name = '__main__'
    classmap[cls] = name
    return name


class Unpickler:

    def __init__(self, file):
        self.readline = file.readline
        self.read = file.read
        self.memo = {}

    def load(self):
        self.mark = ['spam'] # Any new unique object
        self.stack = []
        try:
            while 1:
                key = self.read(1)
                self.dispatch[key](self)
        except STOP, value:
            return value

    def marker(self):
        # Find the topmost mark.  Compare by identity, so that an
        # unpickled list equal to ['spam'] is not mistaken for it.
        k = len(self.stack)-1
        while self.stack[k] is not self.mark: k = k-1
        return k

    dispatch = {}

    def load_eof(self):
        raise EOFError
    dispatch[''] = load_eof

    def load_persid(self):
        pid = self.readline()[:-1]
        self.stack.append(self.persistent_load(pid))
    dispatch[PERSID] = load_persid

    def load_none(self):
        self.stack.append(None)
    dispatch[NONE] = load_none

    def load_atomic(self):
        self.stack.append(eval(self.readline()[:-1]))
    dispatch[INT] = load_atomic
    dispatch[LONG] = load_atomic
    dispatch[FLOAT] = load_atomic
    dispatch[STRING] = load_atomic

    def load_tuple(self):
        k = self.marker()
        self.stack[k:] = [tuple(self.stack[k+1:])]
    dispatch[TUPLE] = load_tuple

    def load_list(self):
        k = self.marker()
        self.stack[k:] = [self.stack[k+1:]]
    dispatch[LIST] = load_list

    def load_dict(self):
        k = self.marker()
        d = {}
        items = self.stack[k+1:]
        for i in range(0, len(items), 2):
            key = items[i]
            value = items[i+1]
            d[key] = value
        self.stack[k:] = [d]
    dispatch[DICT] = load_dict

    def load_inst(self):
        k = self.marker()
        args = tuple(self.stack[k+1:])
        del self.stack[k:]
        module = self.readline()[:-1]
        name = self.readline()[:-1]
        env = {}
        try:
            exec 'from %s import %s' % (module, name) in env
        except ImportError:
            raise SystemError, \
                  "Failed to import class %s from module %s" % \
                  (name, module)
        else:
            klass = env[name]
            if type(klass) != ClassType:
                raise SystemError, \
                      "imported object %s from module %s is not a class" % \
                      (name, module)
        value = apply(klass, args)
        self.stack.append(value)
    dispatch[INST] = load_inst

    def load_pop(self):
        del self.stack[-1]
    dispatch[POP] = load_pop

    def load_dup(self):
        self.stack.append(self.stack[-1])
    dispatch[DUP] = load_dup

    def load_get(self):
        self.stack.append(self.memo[string.atoi(self.readline()[:-1])])
    dispatch[GET] = load_get

    def load_put(self):
        self.memo[string.atoi(self.readline()[:-1])] = self.stack[-1]
    dispatch[PUT] = load_put

    def load_append(self):
        value = self.stack[-1]
        del self.stack[-1]
        list = self.stack[-1]
        list.append(value)
    dispatch[APPEND] = load_append

    def load_setitem(self):
        value = self.stack[-1]
        key = self.stack[-2]
        del self.stack[-2:]
        dict = self.stack[-1]
        dict[key] = value
    dispatch[SETITEM] = load_setitem

    def load_build(self):
        value = self.stack[-1]
        del self.stack[-1]
        inst = self.stack[-1]
        try:
            setstate = inst.__setstate__
        except AttributeError:
            for key in value.keys():
                inst.__dict__[key] = value[key]
        else:
            setstate(value)
    dispatch[BUILD] = load_build

    def load_mark(self):
        self.stack.append(self.mark)
    dispatch[MARK] = load_mark

    def load_stop(self):
        value = self.stack[-1]
        del self.stack[-1]
        raise STOP, value
    dispatch[STOP] = load_stop


class C:
    def __cmp__(self, other):
        return cmp(self.__dict__, other.__dict__)

def test():
    fn = 'pickle_tmp'
    c = C()
    c.foo = 1
    c.bar = 2
    x = [0,1,2,3]
    y = ('abc', 'abc', c, c)
    x.append(y)
    x.append(y)
    x.append(5)
    f = open(fn, 'w')
    F = Pickler(f)
    F.dump(x)
    f.close()
    f = open(fn, 'r')
    U = Unpickler(f)
    x2 = U.load()
    print x
    print x2
    print x == x2
    print map(id, x)
    print map(id, x2)
    print F.memo
    print U.memo

if __name__ == '__main__':
    test()