'''"Executable documentation" for the pickle module.

Extensive comments about the pickle protocols and pickle-machine opcodes
can be found here. Some functions meant for external use:

genops(pickle)
   Generate all the opcodes in a pickle, as (opcode, arg, position) triples.

dis(pickle, out=None, indentlevel=4)
   Print a symbolic disassembly of a pickle.
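
As an illustration only, here is a minimal sketch of calling both functions.
It assumes a modern Python 3 interpreter in which this module is importable
as pickletools; the exact text of the disassembly varies between versions.

```python
import io
import pickle
import pickletools

data = pickle.dumps([1, 2], protocol=0)

# genops yields (opcode, arg, position) triples; opcode.name is symbolic.
names = [opcode.name for opcode, arg, pos in pickletools.genops(data)]

# dis writes its listing to the file-like object passed as out.
buf = io.StringIO()
pickletools.dis(data, out=buf)
listing = buf.getvalue()
```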
'''

# Other ideas:
#
# - A pickle verifier: read a pickle and check it exhaustively for
#   well-formedness.
#
# - A protocol identifier: examine a pickle and return its protocol number
#   (== the highest .proto attr value among all the opcodes in the pickle).
#
# - A pickle optimizer: for example, tuple-building code is sometimes more
#   elaborate than necessary, catering for the possibility that the tuple
#   is recursive. Or lots of times a PUT is generated that's never accessed
#   by a later GET.


"""
"A pickle" is a program for a virtual pickle machine (PM, but more accurately
called an unpickling machine). It's a sequence of opcodes, interpreted by the
PM, building an arbitrarily complex Python object.

For the most part, the PM is very simple: there are no looping, testing, or
conditional instructions, no arithmetic and no function calls. Opcodes are
executed once each, from first to last, until a STOP opcode is reached.

The PM has two data areas, "the stack" and "the memo".

Many opcodes push Python objects onto the stack; e.g., INT pushes a Python
integer object on the stack, whose value is gotten from a decimal string
literal immediately following the INT opcode in the pickle bytestream. Other
opcodes take Python objects off the stack. The result of unpickling is
whatever object is left on the stack when the final STOP opcode is executed.
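
To make the stack model concrete, here is a minimal sketch, assuming a
modern Python 3 interpreter (the opcodes a pickler chooses can differ
between versions). Pickling a small int under protocol 0 is expected to
yield one push opcode followed by STOP:

```python
import pickle
import pickletools

# Protocol 0 pickle of a small int: one push opcode, then STOP.
data = pickle.dumps(5, protocol=0)
ops = [(op.name, arg) for op, arg, pos in pickletools.genops(data)]

# The unpickling result is the object left on the stack at STOP.
result = pickle.loads(data)
```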

The memo is simply an array of objects, or it can be implemented as a dict
mapping little integers to objects. The memo serves as the PM's "long term
memory", and the little integers indexing the memo are akin to variable
names. Some opcodes store the object on top of the stack into the memo at a
given index (without popping it), and others push a memo object at a given
index onto the stack again.
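
A minimal sketch of the memo at work, assuming a modern Python 3
interpreter. The list below refers to the same inner list twice, so the
second reference is expected to be emitted as a single memo read whose
index matches an earlier memo write:

```python
import pickle
import pickletools

a = []
data = pickle.dumps([a, a], protocol=0)

# Memo reads: the second reference to a should be one such read.
gets = [arg for op, arg, pos in pickletools.genops(data)
        if op.name in ('GET', 'BINGET', 'LONG_BINGET')]

# Memo writes: every index that some opcode stored into.
puts = [arg for op, arg, pos in pickletools.genops(data)
        if op.name in ('PUT', 'BINPUT', 'LONG_BINPUT')]

restored = pickle.loads(data)
```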

At heart, that's all the PM has. Subtleties arise for these reasons:

+ Object identity. Objects can be arbitrarily complex, and subobjects
  may be shared (for example, the list [a, a] refers to the same object a
  twice). It can be vital that unpickling recreate an isomorphic object
  graph, faithfully reproducing sharing.

+ Recursive objects. For example, after "L = []; L.append(L)", L is a
  list, and L[0] is the same list. This is related to the object identity
  point, and some sequences of pickle opcodes are subtle in order to
  get the right result in all cases.

+ Things pickle doesn't know everything about. Examples of things pickle
  does know everything about are Python's builtin scalar and container
  types, like ints and tuples. They generally have opcodes dedicated to
  them. For things like module references and instances of user-defined
  classes, pickle's knowledge is limited. Historically, many enhancements
  have been made to the pickle protocol in order to do a better (faster,
  and/or more compact) job on those.

+ Backward compatibility and micro-optimization. As explained below,
  pickle opcodes never go away, not even when better ways to do a thing
  get invented. The repertoire of the PM just keeps growing over time.
  For example, protocol 0 had two opcodes for building Python integers (INT
  and LONG), protocol 1 added three more for more-efficient pickling of short
  integers, and protocol 2 added two more for more-efficient pickling of
  long integers (before protocol 2, the only ways to pickle a Python long
  took time quadratic in the number of digits, for both pickling and
  unpickling). "Opcode bloat" isn't so much a subtlety as a source of
  wearying complication.
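
The recursive-object case from the list above can be checked with a short
round trip. This is a minimal sketch, assuming a modern Python 3
interpreter; the names L and R are illustrative only:

```python
import pickle

L = []
L.append(L)          # L now contains itself

# Round-trip through protocol 0; R should be a new self-containing list.
R = pickle.loads(pickle.dumps(L, protocol=0))
```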


Pickle protocols:

For compatibility, the meaning of a pickle opcode never changes. Instead new
pickle opcodes get added, and each version's unpickler can handle all the
pickle opcodes in all protocol versions to date. So old pickles continue to
be readable forever. The pickler can generally be told to restrict itself to
the subset of opcodes available under previous protocol versions too, so that
users can create pickles under the current version readable by older
versions. However, a pickle does not contain its version number embedded
within it. If an older unpickler tries to read a pickle using a later
protocol, the result is most likely an exception due to seeing an unknown (in
the older unpickler) opcode.
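
Since a pickle need not carry a version number, one way to estimate the
protocol it requires is to take the highest .proto value among its opcodes.
The sketch below assumes a modern Python 3 interpreter; guess_protocol is a
hypothetical helper, not part of this module, and its result is only a
lower bound on the protocol the pickler was asked to use:

```python
import pickle
import pickletools

def guess_protocol(data):
    # Hypothetical helper: highest .proto among the opcodes present.
    return max(op.proto for op, arg, pos in pickletools.genops(data))

obj = {'a': [1, 2]}
guesses = [guess_protocol(pickle.dumps(obj, protocol=p)) for p in (0, 1, 2)]
```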

The original pickle used what's now called "protocol 0", and what was called
"text mode" before Python 2.3. The entire pickle bytestream is made up of
printable 7-bit ASCII characters, plus the newline character, in protocol 0.
That's why it was called text mode. Protocol 0 is small and elegant, but
sometimes painfully inefficient.

The second major set of additions is now called "protocol 1", and was called
"binary mode" before Python 2.3. This added many opcodes with arguments
consisting of arbitrary bytes, including NUL bytes and unprintable "high bit"
bytes. Binary mode pickles can be substantially smaller than equivalent
text mode pickles, and sometimes faster too; e.g., BININT represents a 4-byte
int as 4 bytes following the opcode, which is cheaper to unpickle than the
(perhaps) 11-character decimal string attached to INT. Protocol 1 also added
a number of opcodes that operate on many stack elements at once (like APPENDS
and SETITEMS), and "shortcut" opcodes (like EMPTY_DICT and EMPTY_TUPLE).
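
The size difference between text mode and binary mode is easy to observe.
A minimal sketch, assuming a modern Python 3 interpreter (the exact byte
counts depend on the version, so only the comparison is relied on):

```python
import pickle
import pickletools

numbers = list(range(1000))
text_mode = pickle.dumps(numbers, protocol=0)
binary_mode = pickle.dumps(numbers, protocol=1)

# Protocol 0 output should be pure 7-bit ASCII.
is_ascii = all(byte < 128 for byte in text_mode)

# Protocol 1 should batch list items with a many-at-once opcode.
binary_names = {op.name for op, arg, pos in pickletools.genops(binary_mode)}
```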

The third major set of additions came in Python 2.3, and is called "protocol
2". This added:

- A better way to pickle instances of new-style classes (NEWOBJ).

- A way for a pickle to identify its protocol (PROTO).

- Time- and space-efficient pickling of long ints (LONG{1,4}).

- Shortcuts for small tuples (TUPLE{1,2,3}).

- Dedicated opcodes for bools (NEWTRUE, NEWFALSE).

- The "extension registry", a vector of popular objects that can be pushed
  efficiently by index (EXT{1,2,4}). This is akin to the memo and GET, but
  the registry contents are predefined (there's nothing akin to the memo's
  PUT).
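
Several of the protocol 2 additions listed above show up in even a tiny
pickle. A minimal sketch, assuming a modern Python 3 interpreter (a
different pickler version may pick other opcodes):

```python
import pickle
import pickletools

# A 2-tuple holding a bool and an int too large for 4 bytes.
data = pickle.dumps((True, 2**70), protocol=2)
ops = [(op.name, arg) for op, arg, pos in pickletools.genops(data)]
names = [name for name, arg in ops]
```

Here the first opcode is expected to be PROTO with argument 2, followed by
NEWTRUE, LONG1 and TUPLE2 in some order dictated by the tuple's contents.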

Another independent change with Python 2.3 is the abandonment of any
pretense that it might be safe to load pickles received from untrusted
parties -- no sufficient security analysis has been done to guarantee
this and there isn't a use case that warrants the expense of such an
analysis.

To this end, all tests for __safe_for_unpickling__ or for
copy_reg.safe_constructors are removed from the unpickling code.
References to these variables in the descriptions below are to be seen
as describing unpickling in Python 2.2 and before.
"""

# Meta-rule: Descriptions are stored in instances of descriptor objects,
# with plain constructors. No meta-language is defined from which
# descriptors could be constructed. If you want, e.g., XML, write a little
# program to generate XML from the objects.

##############################################################################
# Some pickle opcodes have an argument, following the opcode in the
# bytestream. An argument is of a specific type, described by an instance
# of ArgumentDescriptor. These are not to be confused with arguments taken
# off the stack -- ArgumentDescriptor applies only to arguments embedded in
# the opcode stream, immediately following an opcode.

# Represents the number of bytes consumed by an argument delimited by the
# next newline character.
UP_TO_NEWLINE = -1

# Represents the number of bytes consumed by a two-argument opcode where
# the first argument gives the number of bytes in the second argument.
TAKEN_FROM_ARGUMENT1 = -2   # num bytes is 1-byte unsigned int
TAKEN_FROM_ARGUMENT4 = -3   # num bytes is 4-byte signed little-endian int

class ArgumentDescriptor(object):
    __slots__ = (
        # name of descriptor record, also a module global name; a string
        'name',

        # length of argument, in bytes; an int; UP_TO_NEWLINE and
        # TAKEN_FROM_ARGUMENT{1,4} are negative values for variable-length
        # cases
        'n',

        # a function taking a file-like object, reading this kind of argument
        # from the object at the current position, advancing the current
        # position by n bytes, and returning the value of the argument
        'reader',

        # human-readable docs for this arg descriptor; a string
        'doc',
    )

    def __init__(self, name, n, reader, doc):
        assert isinstance(name, str)
        self.name = name

        assert isinstance(n, int) and (n >= 0 or
                                       n in (UP_TO_NEWLINE,
                                             TAKEN_FROM_ARGUMENT1,
                                             TAKEN_FROM_ARGUMENT4))
        self.n = n

        self.reader = reader

        assert isinstance(doc, str)
        self.doc = doc

from struct import unpack as _unpack

def read_uint1(f):
    r"""
    >>> import StringIO
    >>> read_uint1(StringIO.StringIO('\xff'))
    255
    """

    data = f.read(1)
    if data:
        return ord(data)
    raise ValueError("not enough data in stream to read uint1")

uint1 = ArgumentDescriptor(
    name='uint1',
    n=1,
    reader=read_uint1,
    doc="One-byte unsigned integer.")


def read_uint2(f):
    r"""
    >>> import StringIO
    >>> read_uint2(StringIO.StringIO('\xff\x00'))
    255
    >>> read_uint2(StringIO.StringIO('\xff\xff'))
    65535
    """

    data = f.read(2)
    if len(data) == 2:
        return _unpack("<H", data)[0]
    raise ValueError("not enough data in stream to read uint2")

uint2 = ArgumentDescriptor(
    name='uint2',
    n=2,
    reader=read_uint2,
    doc="Two-byte unsigned integer, little-endian.")


def read_int4(f):
    r"""
    >>> import StringIO
    >>> read_int4(StringIO.StringIO('\xff\x00\x00\x00'))
    255
    >>> read_int4(StringIO.StringIO('\x00\x00\x00\x80')) == -(2**31)
    True
    """

    data = f.read(4)
    if len(data) == 4:
        return _unpack("<i", data)[0]
    raise ValueError("not enough data in stream to read int4")

int4 = ArgumentDescriptor(
    name='int4',
    n=4,
    reader=read_int4,
    doc="Four-byte signed integer, little-endian, 2's complement.")


def read_stringnl(f, decode=True, stripquotes=True):
    r"""
    >>> import StringIO
    >>> read_stringnl(StringIO.StringIO("'abcd'\nefg\n"))
    'abcd'

    >>> read_stringnl(StringIO.StringIO("\n"))
    Traceback (most recent call last):
    ...
    ValueError: no string quotes around ''

    >>> read_stringnl(StringIO.StringIO("\n"), stripquotes=False)
    ''

    >>> read_stringnl(StringIO.StringIO("''\n"))
    ''

    >>> read_stringnl(StringIO.StringIO('"abcd"'))
    Traceback (most recent call last):
    ...
    ValueError: no newline found when trying to read stringnl

    Embedded escapes are undone in the result.
    >>> read_stringnl(StringIO.StringIO(r"'a\n\\b\x00c\td'" + "\n'e'"))
    'a\n\\b\x00c\td'
    """

    data = f.readline()
    if not data.endswith('\n'):
        raise ValueError("no newline found when trying to read stringnl")
    data = data[:-1]    # lose the newline

    if stripquotes:
        for q in "'\"":
            if data.startswith(q):
                if not data.endswith(q):
                    raise ValueError("string quote %r not found at both "
                                     "ends of %r" % (q, data))
                data = data[1:-1]
                break
        else:
            raise ValueError("no string quotes around %r" % data)

    # I'm not sure when 'string_escape' was added to the std codecs; it's
    # crazy not to use it if it's there.
    if decode:
        data = data.decode('string_escape')
    return data

stringnl = ArgumentDescriptor(
    name='stringnl',
    n=UP_TO_NEWLINE,
    reader=read_stringnl,
    doc="""A newline-terminated string.

    This is a repr-style string, with embedded escapes, and
    bracketing quotes.
    """)

def read_stringnl_noescape(f):
    return read_stringnl(f, decode=False, stripquotes=False)

stringnl_noescape = ArgumentDescriptor(
    name='stringnl_noescape',
    n=UP_TO_NEWLINE,
    reader=read_stringnl_noescape,
    doc="""A newline-terminated string.

    This is a str-style string, without embedded escapes,
    or bracketing quotes. It should consist solely of
    printable ASCII characters.
    """)

def read_stringnl_noescape_pair(f):
    r"""
    >>> import StringIO
    >>> read_stringnl_noescape_pair(StringIO.StringIO("Queue\nEmpty\njunk"))
    'Queue Empty'
    """

    return "%s %s" % (read_stringnl_noescape(f), read_stringnl_noescape(f))

stringnl_noescape_pair = ArgumentDescriptor(
    name='stringnl_noescape_pair',
    n=UP_TO_NEWLINE,
    reader=read_stringnl_noescape_pair,
    doc="""A pair of newline-terminated strings.

    These are str-style strings, without embedded
    escapes, or bracketing quotes. They should
    consist solely of printable ASCII characters.
    The pair is returned as a single string, with
    a single blank separating the two strings.
    """)

def read_string4(f):
    r"""
    >>> import StringIO
    >>> read_string4(StringIO.StringIO("\x00\x00\x00\x00abc"))
    ''
    >>> read_string4(StringIO.StringIO("\x03\x00\x00\x00abcdef"))
    'abc'
    >>> read_string4(StringIO.StringIO("\x00\x00\x00\x03abcdef"))
    Traceback (most recent call last):
    ...
    ValueError: expected 50331648 bytes in a string4, but only 6 remain
    """

    n = read_int4(f)
    if n < 0:
        raise ValueError("string4 byte count < 0: %d" % n)
    data = f.read(n)
    if len(data) == n:
        return data
    raise ValueError("expected %d bytes in a string4, but only %d remain" %
                     (n, len(data)))

string4 = ArgumentDescriptor(
    name="string4",
    n=TAKEN_FROM_ARGUMENT4,
    reader=read_string4,
    doc="""A counted string.

    The first argument is a 4-byte little-endian signed int giving
    the number of bytes in the string, and the second argument is
    that many bytes.
    """)


def read_string1(f):
    r"""
    >>> import StringIO
    >>> read_string1(StringIO.StringIO("\x00"))
    ''
    >>> read_string1(StringIO.StringIO("\x03abcdef"))
    'abc'
    """

    n = read_uint1(f)
    assert n >= 0
    data = f.read(n)
    if len(data) == n:
        return data
    raise ValueError("expected %d bytes in a string1, but only %d remain" %
                     (n, len(data)))

string1 = ArgumentDescriptor(
    name="string1",
    n=TAKEN_FROM_ARGUMENT1,
    reader=read_string1,
    doc="""A counted string.

    The first argument is a 1-byte unsigned int giving the number
    of bytes in the string, and the second argument is that many
    bytes.
    """)


def read_unicodestringnl(f):
    r"""
    >>> import StringIO
    >>> read_unicodestringnl(StringIO.StringIO("abc\uabcd\njunk"))
    u'abc\uabcd'
    """

    data = f.readline()
    if not data.endswith('\n'):
        raise ValueError("no newline found when trying to read "
                         "unicodestringnl")
    data = data[:-1]    # lose the newline
    return unicode(data, 'raw-unicode-escape')

unicodestringnl = ArgumentDescriptor(
    name='unicodestringnl',
    n=UP_TO_NEWLINE,
    reader=read_unicodestringnl,
    doc="""A newline-terminated Unicode string.

    This is raw-unicode-escape encoded, so consists of
    printable ASCII characters, and may contain embedded
    escape sequences.
    """)

def read_unicodestring4(f):
    r"""
    >>> import StringIO
    >>> s = u'abcd\uabcd'
    >>> enc = s.encode('utf-8')
    >>> enc
    'abcd\xea\xaf\x8d'
    >>> n = chr(len(enc)) + chr(0) * 3  # little-endian 4-byte length
    >>> t = read_unicodestring4(StringIO.StringIO(n + enc + 'junk'))
    >>> s == t
    True

    >>> read_unicodestring4(StringIO.StringIO(n + enc[:-1]))
    Traceback (most recent call last):
    ...
    ValueError: expected 7 bytes in a unicodestring4, but only 6 remain
    """

    n = read_int4(f)
    if n < 0:
        raise ValueError("unicodestring4 byte count < 0: %d" % n)
    data = f.read(n)
    if len(data) == n:
        return unicode(data, 'utf-8')
    raise ValueError("expected %d bytes in a unicodestring4, but only %d "
                     "remain" % (n, len(data)))

unicodestring4 = ArgumentDescriptor(
    name="unicodestring4",
    n=TAKEN_FROM_ARGUMENT4,
    reader=read_unicodestring4,
    doc="""A counted Unicode string.

    The first argument is a 4-byte little-endian signed int
    giving the number of bytes in the string, and the second
    argument -- the UTF-8 encoding of the Unicode string --
    contains that many bytes.
    """)


def read_decimalnl_short(f):
    r"""
    >>> import StringIO
    >>> read_decimalnl_short(StringIO.StringIO("1234\n56"))
    1234

    >>> read_decimalnl_short(StringIO.StringIO("1234L\n56"))
    Traceback (most recent call last):
    ...
    ValueError: trailing 'L' not allowed in '1234L'
    """

    s = read_stringnl(f, decode=False, stripquotes=False)
    if s.endswith("L"):
        raise ValueError("trailing 'L' not allowed in %r" % s)

    # It's not necessarily true that the result fits in a Python short int:
    # the pickle may have been written on a 64-bit box. There's also a hack
    # for True and False here.
    if s == "00":
        return False
    elif s == "01":
        return True

    try:
        return int(s)
    except OverflowError:
        return long(s)

def read_decimalnl_long(f):
    r"""
    >>> import StringIO

    >>> read_decimalnl_long(StringIO.StringIO("1234\n56"))
    Traceback (most recent call last):
    ...
    ValueError: trailing 'L' required in '1234'

    Someday the trailing 'L' will probably go away from this output.

    >>> read_decimalnl_long(StringIO.StringIO("1234L\n56"))
    1234L

    >>> read_decimalnl_long(StringIO.StringIO("123456789012345678901234L\n6"))
    123456789012345678901234L
    """

    s = read_stringnl(f, decode=False, stripquotes=False)
    if not s.endswith("L"):
        raise ValueError("trailing 'L' required in %r" % s)
    return long(s)


decimalnl_short = ArgumentDescriptor(
    name='decimalnl_short',
    n=UP_TO_NEWLINE,
    reader=read_decimalnl_short,
    doc="""A newline-terminated decimal integer literal.

    This never has a trailing 'L', and the integer fit
    in a short Python int on the box where the pickle
    was written -- but there's no guarantee it will fit
    in a short Python int on the box where the pickle
    is read.
    """)

decimalnl_long = ArgumentDescriptor(
    name='decimalnl_long',
    n=UP_TO_NEWLINE,
    reader=read_decimalnl_long,
    doc="""A newline-terminated decimal integer literal.

    This has a trailing 'L', and can represent integers
    of any size.
    """)


def read_floatnl(f):
    r"""
    >>> import StringIO
    >>> read_floatnl(StringIO.StringIO("-1.25\n6"))
    -1.25
    """
    s = read_stringnl(f, decode=False, stripquotes=False)
    return float(s)

floatnl = ArgumentDescriptor(
    name='floatnl',
    n=UP_TO_NEWLINE,
    reader=read_floatnl,
    doc="""A newline-terminated decimal floating literal.

    In general this requires 17 significant digits for roundtrip
    identity, and pickling then unpickling infinities, NaNs, and
    minus zero doesn't work across boxes, or on some boxes even
    on itself (e.g., Windows can't read the strings it produces
    for infinities or NaNs).
    """)

def read_float8(f):
    r"""
    >>> import StringIO, struct
    >>> raw = struct.pack(">d", -1.25)
    >>> raw
    '\xbf\xf4\x00\x00\x00\x00\x00\x00'
    >>> read_float8(StringIO.StringIO(raw + "\n"))
    -1.25
    """

    data = f.read(8)
    if len(data) == 8:
        return _unpack(">d", data)[0]
    raise ValueError("not enough data in stream to read float8")


float8 = ArgumentDescriptor(
    name='float8',
    n=8,
    reader=read_float8,
    doc="""An 8-byte binary representation of a float, big-endian.

    The format is unique to Python, and shared with the struct
    module (format string '>d') "in theory" (the struct and cPickle
    implementations don't share the code -- they should). It's
    strongly related to the IEEE-754 double format, and, in normal
    cases, is in fact identical to the big-endian 754 double format.
    On other boxes the dynamic range is limited to that of a 754
    double, and "add a half and chop" rounding is used to reduce
    the precision to 53 bits. However, even on a 754 box,
    infinities, NaNs, and minus zero may not be handled correctly
    (may not survive roundtrip pickling intact).
    """)

# Protocol 2 formats

from pickle import decode_long

def read_long1(f):
    r"""
    >>> import StringIO
    >>> read_long1(StringIO.StringIO("\x02\xff\x00"))
    255L
    >>> read_long1(StringIO.StringIO("\x02\xff\x7f"))
    32767L
    >>> read_long1(StringIO.StringIO("\x02\x00\xff"))
    -256L
    >>> read_long1(StringIO.StringIO("\x02\x00\x80"))
    -32768L
    >>>
    """

    n = read_uint1(f)
    data = f.read(n)
    if len(data) != n:
        raise ValueError("not enough data in stream to read long1")
    return decode_long(data)

long1 = ArgumentDescriptor(
    name="long1",
    n=TAKEN_FROM_ARGUMENT1,
    reader=read_long1,
    doc="""A binary long, little-endian, using 1-byte size.

    This first reads one byte as an unsigned size, then reads that
    many bytes and interprets them as a little-endian 2's-complement long.
    """)

def read_long4(f):
    r"""
    >>> import StringIO
    >>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\xff\x00"))
    255L
    >>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\xff\x7f"))
    32767L
    >>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\x00\xff"))
    -256L
    >>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\x00\x80"))
    -32768L
    >>>
    """

    n = read_int4(f)
    if n < 0:
        raise ValueError("long4 byte count < 0: %d" % n)
    data = f.read(n)
    if len(data) != n:
        raise ValueError("not enough data in stream to read long4")
    return decode_long(data)

long4 = ArgumentDescriptor(
    name="long4",
    n=TAKEN_FROM_ARGUMENT4,
    reader=read_long4,
    doc="""A binary representation of a long, little-endian.

    This first reads four bytes as a signed size (but requires the
    size to be >= 0), then reads that many bytes and interprets them
    as a little-endian 2's-complement long.
    """)


##############################################################################
# Object descriptors. The stack used by the pickle machine holds objects,
# and in the stack_before and stack_after attributes of OpcodeInfo
# descriptors we need names to describe the various types of objects that can
# appear on the stack.

class StackObject(object):
    __slots__ = (
        # name of descriptor record, for info only
        'name',

        # type of object, or tuple of type objects (meaning the object can
        # be of any type in the tuple)
        'obtype',

        # human-readable docs for this kind of stack object; a string
        'doc',
    )

    def __init__(self, name, obtype, doc):
        assert isinstance(name, str)
        self.name = name

        assert isinstance(obtype, type) or isinstance(obtype, tuple)
        if isinstance(obtype, tuple):
            for contained in obtype:
                assert isinstance(contained, type)
        self.obtype = obtype

        assert isinstance(doc, str)
        self.doc = doc


pyint = StackObject(
    name='int',
    obtype=int,
    doc="A short (as opposed to long) Python integer object.")

pylong = StackObject(
    name='long',
    obtype=long,
    doc="A long (as opposed to short) Python integer object.")

pyinteger_or_bool = StackObject(
    name='int_or_bool',
    obtype=(int, long, bool),
    doc="A Python integer object (short or long), or "
        "a Python bool.")

pybool = StackObject(
    name='bool',
    obtype=(bool,),
    doc="A Python bool object.")

pyfloat = StackObject(
    name='float',
    obtype=float,
    doc="A Python float object.")

pystring = StackObject(
    name='str',
    obtype=str,
    doc="A Python string object.")

pyunicode = StackObject(
    name='unicode',
    obtype=unicode,
    doc="A Python Unicode string object.")

pynone = StackObject(
    name="None",
    obtype=type(None),
    doc="The Python None object.")

pytuple = StackObject(
    name="tuple",
    obtype=tuple,
    doc="A Python tuple object.")

pylist = StackObject(
    name="list",
    obtype=list,
    doc="A Python list object.")

pydict = StackObject(
    name="dict",
    obtype=dict,
    doc="A Python dict object.")

anyobject = StackObject(
    name='any',
    obtype=object,
    doc="Any kind of object whatsoever.")

markobject = StackObject(
    name="mark",
    obtype=StackObject,
    doc="""'The mark' is a unique object.

    Opcodes that operate on a variable number of objects
    generally don't embed the count of objects in the opcode,
    or pull it off the stack. Instead the MARK opcode is used
    to push a special marker object on the stack, and then
    some other opcodes grab all the objects from the top of
    the stack down to (but not including) the topmost marker
    object.
    """)

stackslice = StackObject(
    name="stackslice",
    obtype=StackObject,
    doc="""An object representing a contiguous slice of the stack.

    This is used in conjunction with markobject, to represent all
    of the stack following the topmost markobject. For example,
    the POP_MARK opcode changes the stack from

        [..., markobject, stackslice]
    to
        [...]

    No matter how many objects are on the stack after the topmost
    markobject, POP_MARK gets rid of all of them (including the
    topmost markobject too).
    """)

809##############################################################################
810# Descriptors for pickle opcodes.
811
812class OpcodeInfo(object):
813
814 __slots__ = (
815 # symbolic name of opcode; a string
816 'name',
817
818 # the code used in a bytestream to represent the opcode; a
819 # one-character string
820 'code',
821
822 # If the opcode has an argument embedded in the byte string, an
823 # instance of ArgumentDescriptor specifying its type. Note that
824 # arg.reader(s) can be used to read and decode the argument from
825 # the bytestream s, and arg.doc documents the format of the raw
826 # argument bytes. If the opcode doesn't have an argument embedded
827 # in the bytestream, arg should be None.
828 'arg',
829
830 # what the stack looks like before this opcode runs; a list
831 'stack_before',
832
833 # what the stack looks like after this opcode runs; a list
834 'stack_after',
835
836 # the protocol number in which this opcode was introduced; an int
837 'proto',
838
839 # human-readable docs for this opcode; a string
840 'doc',
841 )
842
843 def __init__(self, name, code, arg,
844 stack_before, stack_after, proto, doc):
845 assert isinstance(name, str)
846 self.name = name
847
848 assert isinstance(code, str)
849 assert len(code) == 1
850 self.code = code
851
852 assert arg is None or isinstance(arg, ArgumentDescriptor)
853 self.arg = arg
854
855 assert isinstance(stack_before, list)
856 for x in stack_before:
857 assert isinstance(x, StackObject)
858 self.stack_before = stack_before
859
860 assert isinstance(stack_after, list)
861 for x in stack_after:
862 assert isinstance(x, StackObject)
863 self.stack_after = stack_after
864
865 assert isinstance(proto, int) and 0 <= proto <= 2
866 self.proto = proto
867
868 assert isinstance(doc, str)
869 self.doc = doc
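
"""
Records of this shape can be inspected interactively. The sketch below is
an illustration rather than part of this module: it assumes a Python 3
interpreter and uses the pickletools module shipped in its standard
library, whose code2op dict and opcodes list hold OpcodeInfo records with
the attributes listed in __slots__ above.

```python
import pickletools

# Look up one record by its opcode character.
info = pickletools.code2op['I']
summary = (info.name, info.proto, info.arg.name, info.stack_before)

# Opcodes without an embedded argument have arg set to None.
none_info = pickletools.code2op['N']

# Names of every opcode introduced in protocol 2.
proto2_names = [op.name for op in pickletools.opcodes if op.proto == 2]
```

Filtering on .proto in this way is also the basis of the "protocol
identifier" idea mentioned near the top of this file.
"""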

I = OpcodeInfo
opcodes = [

    # Ways to spell integers.

    I(name='INT',
      code='I',
      arg=decimalnl_short,
      stack_before=[],
      stack_after=[pyinteger_or_bool],
      proto=0,
      doc="""Push an integer or bool.

      The argument is a newline-terminated decimal literal string.

      The intent may have been that this always fit in a short Python int,
      but INT can be generated in pickles written on a 64-bit box that
      require a Python long on a 32-bit box. The difference between this
      and LONG then is that INT skips a trailing 'L', and produces a short
      int whenever possible.

      Another difference arises because, when bool was introduced as a
      distinct type in 2.3, builtin names True and False were also added to
      2.2.2, mapping to ints 1 and 0. For compatibility in both directions,
      True gets pickled as INT + "01\\n", and False as INT + "00\\n".
      Leading zeroes are never produced for a genuine integer. The 2.3
      (and later) unpicklers special-case these and return bool instead;
      earlier unpicklers ignore the leading "0" and return the int.
      """),

    I(name='BININT',
      code='J',
      arg=int4,
      stack_before=[],
      stack_after=[pyint],
      proto=1,
      doc="""Push a four-byte signed integer.

      This handles the full range of Python (short) integers on a 32-bit
      box, directly as binary bytes (1 for the opcode and 4 for the integer).
      If the integer is non-negative and fits in 1 or 2 bytes, pickling via
      BININT1 or BININT2 saves space.
      """),

    I(name='BININT1',
      code='K',
      arg=uint1,
      stack_before=[],
      stack_after=[pyint],
      proto=1,
      doc="""Push a one-byte unsigned integer.

      This is a space optimization for pickling very small non-negative ints,
      in range(256).
      """),

    I(name='BININT2',
      code='M',
      arg=uint2,
      stack_before=[],
      stack_after=[pyint],
      proto=1,
      doc="""Push a two-byte unsigned integer.

      This is a space optimization for pickling small positive ints, in
      range(256, 2**16). Integers in range(256) can also be pickled via
      BININT2, but BININT1 instead saves a byte.
      """),

    I(name='LONG',
      code='L',
      arg=decimalnl_long,
      stack_before=[],
      stack_after=[pylong],
      proto=0,
      doc="""Push a long integer.

      The same as INT, except that the literal ends with 'L', and always
      unpickles to a Python long. There doesn't seem to be a real purpose
      to the trailing 'L'.

      Note that LONG takes time quadratic in the number of digits when
      unpickling (this is simply due to the nature of decimal->binary
      conversion). Proto 2 added linear-time (in C; still quadratic-time
      in Python) LONG1 and LONG4 opcodes.
      """),

    I(name="LONG1",
      code='\x8a',
      arg=long1,
      stack_before=[],
      stack_after=[pylong],
      proto=2,
      doc="""Long integer using one-byte length.

      A more efficient encoding of a Python long; the long1 encoding
      says it all."""),

    I(name="LONG4",
      code='\x8b',
      arg=long4,
      stack_before=[],
      stack_after=[pylong],
      proto=2,
      doc="""Long integer using four-byte length.

      A more efficient encoding of a Python long; the long4 encoding
      says it all."""),

    # Ways to spell strings (8-bit, not Unicode).

    I(name='STRING',
      code='S',
      arg=stringnl,
      stack_before=[],
      stack_after=[pystring],
      proto=0,
      doc="""Push a Python string object.

      The argument is a repr-style string, with bracketing quote characters,
      and perhaps embedded escapes. The argument extends until the next
      newline character.
      """),

    I(name='BINSTRING',
      code='T',
      arg=string4,
      stack_before=[],
      stack_after=[pystring],
      proto=1,
      doc="""Push a Python string object.

      There are two arguments: the first is a 4-byte little-endian signed int
      giving the number of bytes in the string, and the second is that many
      bytes, which are taken literally as the string content.
      """),

    I(name='SHORT_BINSTRING',
      code='U',
      arg=string1,
      stack_before=[],
      stack_after=[pystring],
      proto=1,
      doc="""Push a Python string object.

      There are two arguments: the first is a 1-byte unsigned int giving
      the number of bytes in the string, and the second is that many bytes,
      which are taken literally as the string content.
      """),

    # Ways to spell None.

    I(name='NONE',
      code='N',
      arg=None,
      stack_before=[],
      stack_after=[pynone],
      proto=0,
      doc="Push None on the stack."),

    # Ways to spell bools, starting with proto 2. See INT for how this was
    # done before proto 2.

    I(name='NEWTRUE',
      code='\x88',
      arg=None,
      stack_before=[],
      stack_after=[pybool],
      proto=2,
      doc="""True.

      Push True onto the stack."""),

    I(name='NEWFALSE',
      code='\x89',
      arg=None,
      stack_before=[],
      stack_after=[pybool],
      proto=2,
      doc="""False.

      Push False onto the stack."""),

    # Ways to spell Unicode strings.

    I(name='UNICODE',
      code='V',
      arg=unicodestringnl,
      stack_before=[],
      stack_after=[pyunicode],
      proto=0,  # this may be pure-text, but it's a later addition
      doc="""Push a Python Unicode string object.

      The argument is a raw-unicode-escape encoding of a Unicode string,
      and so may contain embedded escape sequences. The argument extends
      until the next newline character.
      """),

    I(name='BINUNICODE',
      code='X',
      arg=unicodestring4,
      stack_before=[],
      stack_after=[pyunicode],
      proto=1,
      doc="""Push a Python Unicode string object.

      There are two arguments: the first is a 4-byte little-endian signed int
      giving the number of bytes in the string. The second is that many
      bytes, and is the UTF-8 encoding of the Unicode string.
      """),

    # Ways to spell floats.

    I(name='FLOAT',
      code='F',
      arg=floatnl,
      stack_before=[],
      stack_after=[pyfloat],
      proto=0,
      doc="""Newline-terminated decimal float literal.

      The argument is repr(a_float), and in general requires 17 significant
      digits for roundtrip conversion to be an identity (this is so for
      IEEE-754 double precision values, which is what Python float maps to
      on most boxes).

      In general, FLOAT cannot be used to transport infinities, NaNs, or
      minus zero across boxes (or even on a single box, if the platform C
      library can't read the strings it produces for such things -- Windows
      is like that), but may do less damage than BINFLOAT on boxes with
      greater precision or dynamic range than IEEE-754 double.
      """),

    I(name='BINFLOAT',
      code='G',
      arg=float8,
      stack_before=[],
      stack_after=[pyfloat],
      proto=1,
      doc="""Float stored in binary form, with 8 bytes of data.

      This generally requires less than half the space of FLOAT encoding.
      In general, BINFLOAT cannot be used to transport infinities, NaNs, or
      minus zero, raises an exception if the exponent exceeds the range of
      an IEEE-754 double, and retains no more than 53 bits of precision (if
      there are more than that, "add a half and chop" rounding is used to
      cut it back to 53 significant bits).
      """),

    # Ways to build lists.

    I(name='EMPTY_LIST',
      code=']',
      arg=None,
      stack_before=[],
      stack_after=[pylist],
      proto=1,
      doc="Push an empty list."),

    I(name='APPEND',
      code='a',
      arg=None,
      stack_before=[pylist, anyobject],
      stack_after=[pylist],
      proto=0,
      doc="""Append an object to a list.

      Stack before: ... pylist anyobject
      Stack after:  ... pylist+[anyobject]

      although pylist is really extended in-place.
      """),

    I(name='APPENDS',
      code='e',
      arg=None,
      stack_before=[pylist, markobject, stackslice],
      stack_after=[pylist],
      proto=1,
      doc="""Extend a list by a slice of stack objects.

      Stack before: ... pylist markobject stackslice
      Stack after:  ... pylist+stackslice

      although pylist is really extended in-place.
      """),

    I(name='LIST',
      code='l',
      arg=None,
      stack_before=[markobject, stackslice],
      stack_after=[pylist],
      proto=0,
      doc="""Build a list out of the topmost stack slice, after markobject.

      All the stack entries following the topmost markobject are placed into
      a single Python list, which single list object replaces all of the
      stack from the topmost markobject onward. For example,

      Stack before: ... markobject 1 2 3 'abc'
      Stack after:  ... [1, 2, 3, 'abc']
      """),

    # Ways to build tuples.

    I(name='EMPTY_TUPLE',
      code=')',
      arg=None,
      stack_before=[],
      stack_after=[pytuple],
      proto=1,
      doc="Push an empty tuple."),

    I(name='TUPLE',
      code='t',
      arg=None,
      stack_before=[markobject, stackslice],
      stack_after=[pytuple],
      proto=0,
      doc="""Build a tuple out of the topmost stack slice, after markobject.

      All the stack entries following the topmost markobject are placed into
      a single Python tuple, which single tuple object replaces all of the
      stack from the topmost markobject onward. For example,

      Stack before: ... markobject 1 2 3 'abc'
      Stack after:  ... (1, 2, 3, 'abc')
      """),

    I(name='TUPLE1',
      code='\x85',
      arg=None,
      stack_before=[anyobject],
      stack_after=[pytuple],
      proto=2,
      doc="""One-tuple.

      This code pops one value off the stack and pushes a tuple of
      length 1 whose one item is that value back onto it. IOW:

          stack[-1] = tuple(stack[-1:])
      """),

    I(name='TUPLE2',
      code='\x86',
      arg=None,
      stack_before=[anyobject, anyobject],
      stack_after=[pytuple],
      proto=2,
      doc="""Two-tuple.

      This code pops two values off the stack and pushes a tuple
      of length 2 whose items are those values back onto it. IOW:

          stack[-2:] = [tuple(stack[-2:])]
      """),

    I(name='TUPLE3',
      code='\x87',
      arg=None,
      stack_before=[anyobject, anyobject, anyobject],
      stack_after=[pytuple],
      proto=2,
      doc="""Three-tuple.

      This code pops three values off the stack and pushes a tuple
      of length 3 whose items are those values back onto it. IOW:

          stack[-3:] = [tuple(stack[-3:])]
      """),

    # Ways to build dicts.

    I(name='EMPTY_DICT',
      code='}',
      arg=None,
      stack_before=[],
      stack_after=[pydict],
      proto=1,
      doc="Push an empty dict."),

    I(name='DICT',
      code='d',
      arg=None,
      stack_before=[markobject, stackslice],
      stack_after=[pydict],
      proto=0,
      doc="""Build a dict out of the topmost stack slice, after markobject.

      All the stack entries following the topmost markobject are placed into
      a single Python dict, which single dict object replaces all of the
      stack from the topmost markobject onward. The stack slice alternates
      key, value, key, value, .... For example,

      Stack before: ... markobject 1 2 3 'abc'
      Stack after:  ... {1: 2, 3: 'abc'}
      """),

    I(name='SETITEM',
      code='s',
      arg=None,
      stack_before=[pydict, anyobject, anyobject],
      stack_after=[pydict],
      proto=0,
      doc="""Add a key+value pair to an existing dict.

      Stack before: ... pydict key value
      Stack after:  ... pydict

      where pydict has been modified via pydict[key] = value.
      """),

    I(name='SETITEMS',
      code='u',
      arg=None,
      stack_before=[pydict, markobject, stackslice],
      stack_after=[pydict],
      proto=1,
      doc="""Add an arbitrary number of key+value pairs to an existing dict.

      The slice of the stack following the topmost markobject is taken as
      an alternating sequence of keys and values, added to the dict
      immediately under the topmost markobject. Everything at and after the
      topmost markobject is popped, leaving the mutated dict at the top
      of the stack.

      Stack before: ... pydict markobject key_1 value_1 ... key_n value_n
      Stack after:  ... pydict

      where pydict has been modified via pydict[key_i] = value_i for i in
      1, 2, ..., n, and in that order.
      """),

    # Stack manipulation.

    I(name='POP',
      code='0',
      arg=None,
      stack_before=[anyobject],
      stack_after=[],
      proto=0,
      doc="Discard the top stack item, shrinking the stack by one item."),

    I(name='DUP',
      code='2',
      arg=None,
      stack_before=[anyobject],
      stack_after=[anyobject, anyobject],
      proto=0,
      doc="Push the top stack item onto the stack again, duplicating it."),

    I(name='MARK',
      code='(',
      arg=None,
      stack_before=[],
      stack_after=[markobject],
      proto=0,
      doc="""Push markobject onto the stack.

      markobject is a unique object, used by other opcodes to identify a
      region of the stack containing a variable number of objects for them
      to work on. See markobject.doc for more detail.
      """),

    I(name='POP_MARK',
      code='1',
      arg=None,
      stack_before=[markobject, stackslice],
      stack_after=[],
      proto=0,
      doc="""Pop all the stack objects at and above the topmost markobject.

      When an opcode using a variable number of stack objects is done,
      POP_MARK is used to remove those objects, and to remove the markobject
      that delimited their starting position on the stack.
      """),

    # Memo manipulation. There are really only two operations (get and put),
    # each in all-text, "short binary", and "long binary" flavors.

    I(name='GET',
      code='g',
      arg=decimalnl_short,
      stack_before=[],
      stack_after=[anyobject],
      proto=0,
      doc="""Read an object from the memo and push it on the stack.

      The index of the memo object to push is given by the newline-terminated
      decimal string following. BINGET and LONG_BINGET are space-optimized
      versions.
      """),

    I(name='BINGET',
      code='h',
      arg=uint1,
      stack_before=[],
      stack_after=[anyobject],
      proto=1,
      doc="""Read an object from the memo and push it on the stack.

      The index of the memo object to push is given by the 1-byte unsigned
      integer following.
      """),

    I(name='LONG_BINGET',
      code='j',
      arg=int4,
      stack_before=[],
      stack_after=[anyobject],
      proto=1,
      doc="""Read an object from the memo and push it on the stack.

      The index of the memo object to push is given by the 4-byte signed
      little-endian integer following.
      """),

    I(name='PUT',
      code='p',
      arg=decimalnl_short,
      stack_before=[],
      stack_after=[],
      proto=0,
      doc="""Store the stack top into the memo. The stack is not popped.

      The index of the memo location to write into is given by the
      newline-terminated decimal string following. BINPUT and LONG_BINPUT
      are space-optimized versions.
      """),

    I(name='BINPUT',
      code='q',
      arg=uint1,
      stack_before=[],
      stack_after=[],
      proto=1,
      doc="""Store the stack top into the memo. The stack is not popped.

      The index of the memo location to write into is given by the 1-byte
      unsigned integer following.
      """),

    I(name='LONG_BINPUT',
      code='r',
      arg=int4,
      stack_before=[],
      stack_after=[],
      proto=1,
      doc="""Store the stack top into the memo. The stack is not popped.

      The index of the memo location to write into is given by the 4-byte
      signed little-endian integer following.
      """),

    # Access the extension registry (predefined objects). Akin to the GET
    # family.

    I(name='EXT1',
      code='\x82',
      arg=uint1,
      stack_before=[],
      stack_after=[anyobject],
      proto=2,
      doc="""Extension code.

      This code and the similar EXT2 and EXT4 allow using a registry
      of popular objects that are pickled by name, typically classes.
      It is envisioned that through a global negotiation and
      registration process, third parties can set up a mapping between
      ints and object names.

      In order to guarantee pickle interchangeability, the extension
      code registry ought to be global, although a range of codes may
      be reserved for private use.

      EXT1 has a 1-byte integer argument. This is used to index into the
      extension registry, and the object at that index is pushed on the stack.
      """),

    I(name='EXT2',
      code='\x83',
      arg=uint2,
      stack_before=[],
      stack_after=[anyobject],
      proto=2,
      doc="""Extension code.

      See EXT1. EXT2 has a two-byte integer argument.
      """),

    I(name='EXT4',
      code='\x84',
      arg=int4,
      stack_before=[],
      stack_after=[anyobject],
      proto=2,
      doc="""Extension code.

      See EXT1. EXT4 has a four-byte integer argument.
      """),

    # Push a class object, or module function, on the stack, via its module
    # and name.

    I(name='GLOBAL',
      code='c',
      arg=stringnl_noescape_pair,
      stack_before=[],
      stack_after=[anyobject],
      proto=0,
      doc="""Push a global object (module.attr) on the stack.

      Two newline-terminated strings follow the GLOBAL opcode. The first is
      taken as a module name, and the second as a class name. The class
      object module.class is pushed on the stack. More accurately, the
      object returned by self.find_class(module, class) is pushed on the
      stack, so unpickling subclasses can override this form of lookup.
      """),

    # Ways to build objects of classes pickle doesn't know about directly
    # (user-defined classes). I despair of documenting this accurately
    # and comprehensibly -- you really have to read the pickle code to
    # find all the special cases.

    I(name='REDUCE',
      code='R',
      arg=None,
      stack_before=[anyobject, anyobject],
      stack_after=[anyobject],
      proto=0,
      doc="""Push an object built from a callable and an argument tuple.

      The opcode is named as a reminder of the __reduce__() method.

      Stack before: ... callable pytuple
      Stack after:  ... callable(*pytuple)

      The callable and the argument tuple are the first two items returned
      by a __reduce__ method. Applying the callable to the argtuple is
      supposed to reproduce the original object, or at least get it started.
      If the __reduce__ method returns a 3-tuple, the last component is an
      argument to be passed to the object's __setstate__, and then the REDUCE
      opcode is followed by code to create setstate's argument, and then a
      BUILD opcode to apply __setstate__ to that argument.

      There are lots of special cases here. The argtuple can be None, in
      which case callable.__basicnew__() is called instead to produce the
      object to be pushed on the stack. This appears to be a trick unique
      to ExtensionClasses, and is deprecated regardless.

      If type(callable) is not ClassType, REDUCE complains unless the
      callable has been registered with the copy_reg module's
      safe_constructors dict, or the callable has a magic
      '__safe_for_unpickling__' attribute with a true value. I'm not sure
      why it does this, but I've sure seen this complaint often enough when
      I didn't want to <wink>.
      """),

    I(name='BUILD',
      code='b',
      arg=None,
      stack_before=[anyobject, anyobject],
      stack_after=[anyobject],
      proto=0,
      doc="""Finish building an object, via __setstate__ or dict update.

      Stack before: ... anyobject argument
      Stack after:  ... anyobject

      where anyobject may have been mutated, as follows:

      If the object has a __setstate__ method,

          anyobject.__setstate__(argument)

      is called.

      Else the argument must be a dict, the object must have a __dict__, and
      the object is updated via

          anyobject.__dict__.update(argument)

      This may raise RuntimeError in restricted execution mode (which
      disallows access to __dict__ directly); in that case, the object
      is updated instead via

          for k, v in argument.items():
              anyobject[k] = v
      """),

    I(name='INST',
      code='i',
      arg=stringnl_noescape_pair,
      stack_before=[markobject, stackslice],
      stack_after=[anyobject],
      proto=0,
      doc="""Build a class instance.

      This is the protocol 0 version of protocol 1's OBJ opcode.
      INST is followed by two newline-terminated strings, giving a
      module and class name, just as for the GLOBAL opcode (and see
      GLOBAL for more details about that). self.find_class(module, name)
      is used to get a class object.

      In addition, all the objects on the stack following the topmost
      markobject are gathered into a tuple and popped (along with the
      topmost markobject), just as for the TUPLE opcode.

      Now it gets complicated. If all of these are true:

        + The argtuple is empty (markobject was at the top of the stack
          at the start).

        + It's an old-style class object (the type of the class object is
          ClassType).

        + The class object does not have a __getinitargs__ attribute.

      then we want to create an old-style class instance without invoking
      its __init__() method (pickle has waffled on this over the years; not
      calling __init__() is current wisdom). In this case, an instance of
      an old-style dummy class is created, and then we try to rebind its
      __class__ attribute to the desired class object. If this succeeds,
      the new instance object is pushed on the stack, and we're done. In
      restricted execution mode it can fail (assignment to __class__ is
      disallowed), and I'm not really sure what happens then -- it looks
      like the code ends up calling the class object's __init__ anyway,
      via falling into the next case.

      Else (the argtuple is not empty, it's not an old-style class object,
      or the class object does have a __getinitargs__ attribute), the code
      first insists that the class object have a __safe_for_unpickling__
      attribute. Unlike as for the __safe_for_unpickling__ check in REDUCE,
      it doesn't matter whether this attribute has a true or false value, it
      only matters whether it exists (XXX this is a bug; cPickle
      requires the attribute to be true). If __safe_for_unpickling__
      doesn't exist, UnpicklingError is raised.

      Else (the class object does have a __safe_for_unpickling__ attr),
      the class object obtained from INST's arguments is applied to the
      argtuple obtained from the stack, and the resulting instance object
      is pushed on the stack.
      """),

    I(name='OBJ',
      code='o',
      arg=None,
      stack_before=[markobject, anyobject, stackslice],
      stack_after=[anyobject],
      proto=1,
      doc="""Build a class instance.

      This is the protocol 1 version of protocol 0's INST opcode, and is
      very much like it. The major difference is that the class object
      is taken off the stack, allowing it to be retrieved from the memo
      repeatedly if several instances of the same class are created. This
      can be much more efficient (in both time and space) than repeatedly
      embedding the module and class names in INST opcodes.

      Unlike INST, OBJ takes no arguments from the opcode stream. Instead
      the class object is taken off the stack, immediately above the
      topmost markobject:

      Stack before: ... markobject classobject stackslice
      Stack after:  ... new_instance_object

      As for INST, the remainder of the stack above the markobject is
      gathered into an argument tuple, and then the logic seems identical,
      except that no __safe_for_unpickling__ check is done (XXX this is
      a bug; cPickle does test __safe_for_unpickling__). See INST for
      the gory details.
      """),

    I(name='NEWOBJ',
      code='\x81',
      arg=None,
      stack_before=[anyobject, anyobject],
      stack_after=[anyobject],
      proto=2,
      doc="""Build an object instance.

      The stack before should be thought of as containing a class
      object followed by an argument tuple (the tuple being the stack
      top). Call these cls and args. They are popped off the stack,
      and the value returned by cls.__new__(cls, *args) is pushed back
      onto the stack.
      """),

    # Machine control.

    I(name='PROTO',
      code='\x80',
      arg=uint1,
      stack_before=[],
      stack_after=[],
      proto=2,
      doc="""Protocol version indicator.

      For protocol 2 and above, a pickle must start with this opcode.
      The argument is the protocol version, an int in range(2, 256).
      """),

    I(name='STOP',
      code='.',
      arg=None,
      stack_before=[anyobject],
      stack_after=[],
      proto=0,
      doc="""Stop the unpickling machine.

      Every pickle ends with this opcode. The object at the top of the stack
      is popped, and that's the result of unpickling. The stack should be
      empty then.
      """),

    # Ways to deal with persistent IDs.

    I(name='PERSID',
      code='P',
      arg=stringnl_noescape,
      stack_before=[],
      stack_after=[anyobject],
      proto=0,
      doc="""Push an object identified by a persistent ID.

      The pickle module doesn't define what a persistent ID means. PERSID's
      argument is a newline-terminated str-style (no embedded escapes, no
      bracketing quote characters) string, which *is* "the persistent ID".
      The unpickler passes this string to self.persistent_load(). Whatever
      object that returns is pushed on the stack. There is no implementation
      of persistent_load() in Python's unpickler: it must be supplied by an
      unpickler subclass.
      """),

    I(name='BINPERSID',
      code='Q',
      arg=None,
      stack_before=[anyobject],
      stack_after=[anyobject],
      proto=1,
      doc="""Push an object identified by a persistent ID.

      Like PERSID, except the persistent ID is popped off the stack (instead
      of being a string embedded in the opcode bytestream). The persistent
      ID is passed to self.persistent_load(), and whatever object that
      returns is pushed on the stack. See PERSID for more detail.
      """),
]
del I
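
"""
Several of the protocol 0 opcodes documented in the table above can be tried
out with hand-written pickles. This is a sketch for experimentation only,
assuming a Python 3 interpreter (where pickles are bytes objects and a
STRING argument is decoded to str); the byte strings are written by hand for
illustration and are not captured pickler output.

```python
import pickle

# INT: "01" and "00" are the special bool spellings; "1" is a plain int.
true_value = pickle.loads(b'I01\n.')
false_value = pickle.loads(b'I00\n.')
one = pickle.loads(b'I1\n.')

# MARK ... LIST / TUPLE / DICT collect the stack slice above the mark.
a_list = pickle.loads(b'(I1\nI2\nI3\nl.')
a_tuple = pickle.loads(b'(I1\nI2\nt.')
a_dict = pickle.loads(b"(I1\nI2\nI3\nS'abc'\nd.")
```

Each pickle ends with STOP ('.'), which pops the single remaining stack
item as the result. As always, only unpickle data from a trusted source.
"""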

# Verify uniqueness of .name and .code members.
name2i = {}
code2i = {}

for i, d in enumerate(opcodes):
    if d.name in name2i:
        raise ValueError("repeated name %r at indices %d and %d" %
                         (d.name, name2i[d.name], i))
    if d.code in code2i:
        raise ValueError("repeated code %r at indices %d and %d" %
                         (d.code, code2i[d.code], i))

    name2i[d.name] = i
    code2i[d.code] = i

del name2i, code2i, i, d

##############################################################################
# Build a code2op dict, mapping opcode characters to OpcodeInfo records.
# Also ensure we've got the same stuff as pickle.py, although the
# introspection here is dicey.

code2op = {}
for d in opcodes:
    code2op[d.code] = d
del d

def assure_pickle_consistency(verbose=False):
    import pickle, re

    copy = code2op.copy()
    for name in pickle.__all__:
        if not re.match("[A-Z][A-Z0-9_]+$", name):
            if verbose:
                print "skipping %r: it doesn't look like an opcode name" % name
            continue
        picklecode = getattr(pickle, name)
        if not isinstance(picklecode, str) or len(picklecode) != 1:
            if verbose:
                print ("skipping %r: value %r doesn't look like a pickle "
                       "code" % (name, picklecode))
            continue
        if picklecode in copy:
            if verbose:
                print "checking name %r w/ code %r for consistency" % (
                      name, picklecode)
            d = copy[picklecode]
            if d.name != name:
                raise ValueError("for pickle code %r, pickle.py uses name %r "
                                 "but we're using name %r" % (picklecode,
                                                              name,
                                                              d.name))
            # Forget this one. Any left over in copy at the end are a problem
            # of a different kind.
            del copy[picklecode]
        else:
            raise ValueError("pickle.py appears to have a pickle opcode with "
                             "name %r and code %r, but we don't" %
                             (name, picklecode))
    if copy:
        msg = ["we appear to have pickle opcodes that pickle.py doesn't have:"]
        for code, d in copy.items():
            msg.append(" name %r with code %r" % (d.name, code))
        raise ValueError("\n".join(msg))

assure_pickle_consistency()
del assure_pickle_consistency
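
"""
One direction of the check above can be restated compactly from outside the
module. The sketch below is an illustration, not a replacement for
assure_pickle_consistency(): it assumes a Python 3 interpreter, where the
opcode constants in pickle are one-byte bytes objects while OpcodeInfo.code
is a one-character str. The name mismatches is invented for the example.

```python
import pickle
import pickletools

# Collect every opcode whose descriptor disagrees with the constant of the
# same name in the pickle module (or which pickle does not define at all).
mismatches = [
    op.name
    for op in pickletools.opcodes
    if getattr(pickle, op.name, None) != op.code.encode('latin-1')
]
```

An empty list means every descriptor has a matching constant; unlike the
function above, this does not look for pickle opcodes that lack a
descriptor.
"""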

##############################################################################
# A pickle opcode generator.

def genops(pickle):
    """Generate all the opcodes in a pickle.

    'pickle' is a file-like object, or string, containing the pickle.

    Each opcode in the pickle is generated, from the current pickle position,
    stopping after a STOP opcode is delivered. A triple is generated for
    each opcode:

        opcode, arg, pos

    opcode is an OpcodeInfo record, describing the current opcode.

    If the opcode has an argument embedded in the pickle, arg is its decoded
    value, as a Python object. If the opcode doesn't have an argument, arg
    is None.

    If the pickle has a tell() method, pos was the value of pickle.tell()
    before reading the current opcode. If the pickle is a string object,
    it's wrapped in a StringIO object, and the latter's tell() result is
    used. Else (the pickle doesn't have a tell(), and it's not obvious how
    to query its current position) pos is None.
    """

    import cStringIO as StringIO

    if isinstance(pickle, str):
        pickle = StringIO.StringIO(pickle)

    if hasattr(pickle, "tell"):
        getpos = pickle.tell
    else:
        getpos = lambda: None

    while True:
        pos = getpos()
        code = pickle.read(1)
        opcode = code2op.get(code)
        if opcode is None:
            if code == "":
                raise ValueError("pickle exhausted before seeing STOP")
            else:
                raise ValueError("at position %s, opcode %r unknown" % (
                                 pos is None and "<unknown>" or pos,
                                 code))
        if opcode.arg is None:
            arg = None
        else:
            arg = opcode.arg.reader(pickle)
        yield opcode, arg, pos
        if code == '.':
            assert opcode.name == 'STOP'
            break
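
# Illustrative usage sketch, not part of this module.  It assumes the module
# is importable as the standard library's pickletools and that it runs on a
# Python 3 interpreter, where a pickle is a bytes object rather than a str.
# It shows the (opcode, arg, pos) triples described in the genops()
# docstring; the variable names are hypothetical.

```python
import pickle
import pickletools  # assumption: this module as shipped in the stdlib

data = pickle.dumps([1, 2, (3, 4)], 2)

# Each triple is (OpcodeInfo record, decoded argument or None, byte offset).
ops = [(op.name, arg, pos) for op, arg, pos in pickletools.genops(data)]

for name, arg, pos in ops:
    print(pos, name, arg)
```

# Because genops() stops after delivering STOP, the last triple always names
# the STOP opcode, and the offsets increase monotonically.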

##############################################################################
# A symbolic pickle disassembler.

def dis(pickle, out=None, indentlevel=4):
    """Produce a symbolic disassembly of a pickle.

    'pickle' is a file-like object, or string, containing a (at least one)
    pickle. The pickle is disassembled from the current position, through
    the first STOP opcode encountered.

    Optional arg 'out' is a file-like object to which the disassembly is
    printed. It defaults to sys.stdout.

    Optional arg indentlevel is the number of blanks by which to indent
    a new MARK level. It defaults to 4.
    """

    markstack = []
    indentchunk = ' ' * indentlevel
    for opcode, arg, pos in genops(pickle):
        if pos is not None:
            print >> out, "%5d:" % pos,

        line = "%-4s %s%s" % (repr(opcode.code)[1:-1],
                              indentchunk * len(markstack),
                              opcode.name)

        markmsg = None
        if markstack and markobject in opcode.stack_before:
            assert markobject not in opcode.stack_after
            markpos = markstack.pop()
            if markpos is not None:
                markmsg = "(MARK at %d)" % markpos

        if arg is not None or markmsg:
            # make a mild effort to align arguments
            line += ' ' * (10 - len(opcode.name))
            if arg is not None:
                line += ' ' + repr(arg)
            if markmsg:
                line += ' ' + markmsg
        print >> out, line

        if markobject in opcode.stack_after:
            assert markobject not in opcode.stack_before
            markstack.append(pos)
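
# Illustrative usage sketch, not part of this module.  It assumes the module
# is importable as the standard library's pickletools on a Python 3
# interpreter, and shows the 'out' and 'indentlevel' arguments described in
# the dis() docstring by capturing the listing in memory instead of printing
# it to sys.stdout.  Later versions of the module may print extra lines, so
# the sketch only inspects the opcode names.

```python
import io
import pickle
import pickletools  # assumption: this module as shipped in the stdlib

buf = io.StringIO()
# A list with several items is pickled as MARK ... APPENDS in protocol 2,
# so the listing contains one indented MARK level.
pickletools.dis(pickle.dumps([1, 2, 3], 2), out=buf, indentlevel=2)
listing = buf.getvalue()
print(listing)
```

# Opcodes between the MARK and the APPENDS that consumes it are indented by
# indentlevel blanks, and APPENDS is annotated with the MARK's position.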


_dis_test = r"""
>>> import pickle
>>> x = [1, 2, (3, 4), {'abc': u"def"}]
>>> pkl = pickle.dumps(x, 0)
>>> dis(pkl)
    0: (    MARK
    1: l        LIST       (MARK at 0)
    2: p    PUT        0
    5: I    INT        1
    8: a    APPEND
    9: I    INT        2
   12: a    APPEND
   13: (    MARK
   14: I        INT        3
   17: I        INT        4
   20: t        TUPLE      (MARK at 13)
   21: p    PUT        1
   24: a    APPEND
   25: (    MARK
   26: d        DICT       (MARK at 25)
   27: p    PUT        2
   30: S    STRING     'abc'
   37: p    PUT        3
   40: V    UNICODE    u'def'
   45: p    PUT        4
   48: s    SETITEM
   49: a    APPEND
   50: .    STOP

Try again with a "binary" pickle.

>>> pkl = pickle.dumps(x, 1)
>>> dis(pkl)
    0: ]    EMPTY_LIST
    1: q    BINPUT     0
    3: (    MARK
    4: K        BININT1    1
    6: K        BININT1    2
    8: (        MARK
    9: K            BININT1    3
   11: K            BININT1    4
   13: t            TUPLE      (MARK at 8)
   14: q        BINPUT     1
   16: }        EMPTY_DICT
   17: q        BINPUT     2
   19: U        SHORT_BINSTRING 'abc'
   24: q        BINPUT     3
   26: X        BINUNICODE u'def'
   34: q        BINPUT     4
   36: s        SETITEM
   37: e        APPENDS    (MARK at 3)
   38: .    STOP

Exercise the INST/OBJ/BUILD family.

>>> import random
>>> dis(pickle.dumps(random.random, 0))
    0: c    GLOBAL     'random random'
   15: p    PUT        0
   18: .    STOP

>>> x = [pickle.PicklingError()] * 2
>>> dis(pickle.dumps(x, 0))
    0: (    MARK
    1: l        LIST       (MARK at 0)
    2: p    PUT        0
    5: (    MARK
    6: i        INST       'pickle PicklingError' (MARK at 5)
   28: p    PUT        1
   31: (    MARK
   32: d        DICT       (MARK at 31)
   33: p    PUT        2
   36: S    STRING     'args'
   44: p    PUT        3
   47: (    MARK
   48: t        TUPLE      (MARK at 47)
   49: s    SETITEM
   50: b    BUILD
   51: a    APPEND
   52: g    GET        1
   55: a    APPEND
   56: .    STOP

>>> dis(pickle.dumps(x, 1))
    0: ]    EMPTY_LIST
    1: q    BINPUT     0
    3: (    MARK
    4: (        MARK
    5: c            GLOBAL     'pickle PicklingError'
   27: q            BINPUT     1
   29: o            OBJ        (MARK at 4)
   30: q        BINPUT     2
   32: }        EMPTY_DICT
   33: q        BINPUT     3
   35: U        SHORT_BINSTRING 'args'
   41: q        BINPUT     4
   43: )        EMPTY_TUPLE
   44: s        SETITEM
   45: b        BUILD
   46: h        BINGET     2
   48: e        APPENDS    (MARK at 3)
   49: .    STOP

Try "the canonical" recursive-object test.

>>> L = []
>>> T = L,
>>> L.append(T)
>>> L[0] is T
True
>>> T[0] is L
True
>>> L[0][0] is L
True
>>> T[0][0] is T
True
>>> dis(pickle.dumps(L, 0))
    0: (    MARK
    1: l        LIST       (MARK at 0)
    2: p    PUT        0
    5: (    MARK
    6: g        GET        0
    9: t        TUPLE      (MARK at 5)
   10: p    PUT        1
   13: a    APPEND
   14: .    STOP
>>> dis(pickle.dumps(L, 1))
    0: ]    EMPTY_LIST
    1: q    BINPUT     0
    3: (    MARK
    4: h        BINGET     0
    6: t        TUPLE      (MARK at 3)
    7: q    BINPUT     1
    9: a    APPEND
   10: .    STOP

The protocol 0 pickle of the tuple causes the disassembly to get confused,
as it doesn't realize that the POP opcode at 16 gets rid of the MARK at 0
(so the output remains indented until the end). The protocol 1 pickle
doesn't trigger this glitch, because the disassembler realizes that
POP_MARK gets rid of the MARK. Doing a better job on the protocol 0
pickle would require the disassembler to emulate the stack.

>>> dis(pickle.dumps(T, 0))
    0: (    MARK
    1: (        MARK
    2: l            LIST       (MARK at 1)
    3: p        PUT        0
    6: (        MARK
    7: g            GET        0
   10: t            TUPLE      (MARK at 6)
   11: p        PUT        1
   14: a        APPEND
   15: 0        POP
   16: 0        POP
   17: g        GET        1
   20: .        STOP
>>> dis(pickle.dumps(T, 1))
    0: (    MARK
    1: ]        EMPTY_LIST
    2: q        BINPUT     0
    4: (        MARK
    5: h            BINGET     0
    7: t            TUPLE      (MARK at 4)
    8: q        BINPUT     1
   10: a        APPEND
   11: 1        POP_MARK   (MARK at 0)
   12: h    BINGET     1
   14: .    STOP

Try protocol 2.

>>> dis(pickle.dumps(L, 2))
    0: \x80 PROTO      2
    2: ]    EMPTY_LIST
    3: q    BINPUT     0
    5: h    BINGET     0
    7: \x85 TUPLE1
    8: q    BINPUT     1
   10: a    APPEND
   11: .    STOP

>>> dis(pickle.dumps(T, 2))
    0: \x80 PROTO      2
    2: ]    EMPTY_LIST
    3: q    BINPUT     0
    5: h    BINGET     0
    7: \x85 TUPLE1
    8: q    BINPUT     1
   10: a    APPEND
   11: 0    POP
   12: h    BINGET     1
   14: .    STOP
"""

__test__ = {'disassembler_test': _dis_test,
           }

def _test():
    import doctest
    return doctest.testmod()

if __name__ == "__main__":
    _test()