'''"Executable documentation" for the pickle module.

Extensive comments about the pickle protocols and pickle-machine opcodes
can be found here. Some functions meant for external use:

genops(pickle)
   Generate all the opcodes in a pickle, as (opcode, arg, position) triples.

dis(pickle, out=None, memo=None, indentlevel=4)
   Print a symbolic disassembly of a pickle.
'''

import codecs
import pickle
import re

__all__ = ['dis',
           'genops',
          ]

bytes_types = pickle.bytes_types

# Other ideas:
#
# - A pickle verifier: read a pickle and check it exhaustively for
#   well-formedness. dis() does a lot of this already.
#
# - A protocol identifier: examine a pickle and return its protocol number
#   (== the highest .proto attr value among all the opcodes in the pickle).
#   dis() already prints this info at the end.
#
# - A pickle optimizer: for example, tuple-building code is sometimes more
#   elaborate than necessary, catering for the possibility that the tuple
#   is recursive. Or lots of times a PUT is generated that's never accessed
#   by a later GET.


"""
"A pickle" is a program for a virtual pickle machine (PM, but more accurately
called an unpickling machine). It's a sequence of opcodes, interpreted by the
PM, building an arbitrarily complex Python object.

For the most part, the PM is very simple: there are no looping, testing, or
conditional instructions, no arithmetic and no function calls. Opcodes are
executed once each, from first to last, until a STOP opcode is reached.

The PM has two data areas, "the stack" and "the memo".

Many opcodes push Python objects onto the stack; e.g., INT pushes a Python
integer object on the stack, whose value is obtained from a decimal string
literal immediately following the INT opcode in the pickle bytestream. Other
opcodes take Python objects off the stack. The result of unpickling is
whatever object is left on the stack when the final STOP opcode is executed.

The memo is simply an array of objects, or it can be implemented as a dict
mapping little integers to objects. The memo serves as the PM's "long term
memory", and the little integers indexing the memo are akin to variable
names. Some opcodes copy the object at the top of the stack into the memo at
a given index (leaving it on the stack), and others push a memo object at a
given index onto the stack again.

At heart, that's all the PM has. Subtleties arise for these reasons:

+ Object identity. Objects can be arbitrarily complex, and subobjects
  may be shared (for example, the list [a, a] refers to the same object a
  twice). It can be vital that unpickling recreate an isomorphic object
  graph, faithfully reproducing sharing.

+ Recursive objects. For example, after "L = []; L.append(L)", L is a
  list, and L[0] is the same list. This is related to the object identity
  point, and some sequences of pickle opcodes are subtle in order to
  get the right result in all cases.

+ Things pickle doesn't know everything about. Examples of things pickle
  does know everything about are Python's builtin scalar and container
  types, like ints and tuples. They generally have opcodes dedicated to
  them. For things like module references and instances of user-defined
  classes, pickle's knowledge is limited. Historically, many enhancements
  have been made to the pickle protocol in order to do a better (faster,
  and/or more compact) job on those.

+ Backward compatibility and micro-optimization. As explained below,
  pickle opcodes never go away, not even when better ways to do a thing
  get invented. The repertoire of the PM just keeps growing over time.
  For example, protocol 0 had two opcodes for building Python integers (INT
  and LONG), protocol 1 added three more for more-efficient pickling of short
  integers, and protocol 2 added two more for more-efficient pickling of
  long integers (before protocol 2, the only ways to pickle a Python long
  took time quadratic in the number of digits, for both pickling and
  unpickling). "Opcode bloat" isn't so much a subtlety as a source of
  wearying complication.


Pickle protocols:

For compatibility, the meaning of a pickle opcode never changes. Instead new
pickle opcodes get added, and each version's unpickler can handle all the
pickle opcodes in all protocol versions to date. So old pickles continue to
be readable forever. The pickler can generally be told to restrict itself to
the subset of opcodes available under previous protocol versions too, so that
users can create pickles under the current version readable by older
versions. However, a pickle written under protocol 0 or 1 does not contain its
version number embedded within it. If an older unpickler tries to read a
pickle using a later protocol, the result is most likely an exception due to
seeing an unknown (in the older unpickler) opcode.

The original pickle used what's now called "protocol 0", and what was called
"text mode" before Python 2.3. The entire pickle bytestream is made up of
printable 7-bit ASCII characters, plus the newline character, in protocol 0.
That's why it was called text mode. Protocol 0 is small and elegant, but
sometimes painfully inefficient.

The second major set of additions is now called "protocol 1", and was called
"binary mode" before Python 2.3. This added many opcodes with arguments
consisting of arbitrary bytes, including NUL bytes and unprintable "high bit"
bytes. Binary mode pickles can be substantially smaller than equivalent
text mode pickles, and sometimes faster too; e.g., BININT represents a 4-byte
int as 4 bytes following the opcode, which is cheaper to unpickle than the
(perhaps) 11-character decimal string attached to INT. Protocol 1 also added
a number of opcodes that operate on many stack elements at once (like APPENDS
and SETITEMS), and "shortcut" opcodes (like EMPTY_DICT and EMPTY_TUPLE).

The third major set of additions came in Python 2.3, and is called "protocol
2". This added:

- A better way to pickle instances of new-style classes (NEWOBJ).

- A way for a pickle to identify its protocol (PROTO).

- Time- and space-efficient pickling of long ints (LONG{1,4}).

- Shortcuts for small tuples (TUPLE{1,2,3}).

- Dedicated opcodes for bools (NEWTRUE, NEWFALSE).

- The "extension registry", a vector of popular objects that can be pushed
  efficiently by index (EXT{1,2,4}). This is akin to the memo and GET, but
  the registry contents are predefined (there's nothing akin to the memo's
  PUT).

Another independent change with Python 2.3 is the abandonment of any
pretense that it might be safe to load pickles received from untrusted
parties -- no sufficient security analysis has been done to guarantee
this and there isn't a use case that warrants the expense of such an
analysis.

To this end, all tests for __safe_for_unpickling__ or for
copy_reg.safe_constructors are removed from the unpickling code.
References to these variables in the descriptions below are to be seen
as describing unpickling in Python 2.2 and before.
"""
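
# Illustration only (hypothetical helper, not part of this module's API): a
# minimal sketch of the stack and memo described above. _example_run_pm
# interprets just seven protocol-0 opcodes (MARK, LIST, PUT, GET, INT, APPEND,
# STOP); a real unpickler handles many more. PUT copies the stack top into
# the memo without popping it, and GET pushes that same object again, which
# is how shared and recursive structures are rebuilt.
import io


def _example_run_pm(data):
    f = io.BytesIO(data)
    stack, memo = [], {}
    mark = object()                     # sentinel pushed by MARK
    while True:
        op = f.read(1)
        if op == b"(":                  # MARK
            stack.append(mark)
        elif op == b"l":                # LIST: collect the items above the mark
            k = max(i for i, x in enumerate(stack) if x is mark)
            items = stack[k + 1:]
            del stack[k:]
            stack.append(items)
        elif op == b"p":                # PUT <decimal index>: stack top -> memo
            memo[int(f.readline())] = stack[-1]
        elif op == b"g":                # GET <decimal index>: memo -> stack
            stack.append(memo[int(f.readline())])
        elif op == b"I":                # INT <decimal literal>
            stack.append(int(f.readline()))
        elif op == b"a":                # APPEND: pop a value, append it to the list below
            value = stack.pop()
            stack[-1].append(value)
        elif op == b".":                # STOP: the result is the stack top
            return stack.pop()
        else:
            raise ValueError("unsupported opcode %r" % op)

# For example, _example_run_pm(b"(lp0\nI1\nag0\na.") rebuilds L = [1, L]:
# the GET of memo index 0 pushes the very list that is being filled in.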

# Meta-rule: Descriptions are stored in instances of descriptor objects,
# with plain constructors. No meta-language is defined from which
# descriptors could be constructed. If you want, e.g., XML, write a little
# program to generate XML from the objects.

##############################################################################
# Some pickle opcodes have an argument, following the opcode in the
# bytestream. An argument is of a specific type, described by an instance
# of ArgumentDescriptor. These are not to be confused with arguments taken
# off the stack -- ArgumentDescriptor applies only to arguments embedded in
# the opcode stream, immediately following an opcode.

# Represents the number of bytes consumed by an argument delimited by the
# next newline character.
UP_TO_NEWLINE = -1

# Represents the number of bytes consumed by a two-argument opcode where
# the first argument gives the number of bytes in the second argument.
TAKEN_FROM_ARGUMENT1 = -2   # num bytes is 1-byte unsigned int
TAKEN_FROM_ARGUMENT4 = -3   # num bytes is 4-byte signed little-endian int
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000172
173class ArgumentDescriptor(object):
174 __slots__ = (
175 # name of descriptor record, also a module global name; a string
176 'name',
177
178 # length of argument, in bytes; an int; UP_TO_NEWLINE and
Tim Petersfdb8cfa2003-01-28 00:13:19 +0000179 # TAKEN_FROM_ARGUMENT{1,4} are negative values for variable-length
180 # cases
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000181 'n',
182
183 # a function taking a file-like object, reading this kind of argument
184 # from the object at the current position, advancing the current
185 # position by n bytes, and returning the value of the argument
186 'reader',
187
188 # human-readable docs for this arg descriptor; a string
189 'doc',
190 )
191
192 def __init__(self, name, n, reader, doc):
193 assert isinstance(name, str)
194 self.name = name
195
196 assert isinstance(n, int) and (n >= 0 or
Tim Petersfdb8cfa2003-01-28 00:13:19 +0000197 n in (UP_TO_NEWLINE,
198 TAKEN_FROM_ARGUMENT1,
199 TAKEN_FROM_ARGUMENT4))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000200 self.n = n
201
202 self.reader = reader
203
204 assert isinstance(doc, str)
205 self.doc = doc
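
# Illustration only (hypothetical helper, not used elsewhere): how the n
# attribute tells a reader where an argument ends. It repeats the values of
# UP_TO_NEWLINE (-1), TAKEN_FROM_ARGUMENT1 (-2) and TAKEN_FROM_ARGUMENT4 (-3)
# as literals so that the sketch stands alone.
import struct


def _example_arg_size(n, data):
    """Return how many leading bytes of data an argument of length n uses."""
    if n >= 0:                          # fixed-size argument
        return n
    if n == -1:                         # UP_TO_NEWLINE: through the newline
        return data.index(b"\n") + 1
    if n == -2:                         # 1-byte unsigned count, then the bytes
        return 1 + data[0]
    if n == -3:                         # 4-byte signed LE count, then the bytes
        return 4 + struct.unpack("<i", data[:4])[0]
    raise ValueError("unknown argument length code %d" % n)

# E.g. _example_arg_size(-3, b"\x03\x00\x00\x00abcdef") is 7: a 4-byte count
# of 3 followed by b"abc".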

from struct import unpack as _unpack

def read_uint1(f):
    r"""
    >>> import io
    >>> read_uint1(io.BytesIO(b'\xff'))
    255
    """

    data = f.read(1)
    if data:
        return data[0]
    raise ValueError("not enough data in stream to read uint1")

uint1 = ArgumentDescriptor(
            name='uint1',
            n=1,
            reader=read_uint1,
            doc="One-byte unsigned integer.")


def read_uint2(f):
    r"""
    >>> import io
    >>> read_uint2(io.BytesIO(b'\xff\x00'))
    255
    >>> read_uint2(io.BytesIO(b'\xff\xff'))
    65535
    """

    data = f.read(2)
    if len(data) == 2:
        return _unpack("<H", data)[0]
    raise ValueError("not enough data in stream to read uint2")

uint2 = ArgumentDescriptor(
            name='uint2',
            n=2,
            reader=read_uint2,
            doc="Two-byte unsigned integer, little-endian.")


def read_int4(f):
    r"""
    >>> import io
    >>> read_int4(io.BytesIO(b'\xff\x00\x00\x00'))
    255
    >>> read_int4(io.BytesIO(b'\x00\x00\x00\x80')) == -(2**31)
    True
    """

    data = f.read(4)
    if len(data) == 4:
        return _unpack("<i", data)[0]
    raise ValueError("not enough data in stream to read int4")

int4 = ArgumentDescriptor(
           name='int4',
           n=4,
           reader=read_int4,
           doc="Four-byte signed integer, little-endian, 2's complement.")
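
# Illustration only (hypothetical helper, not part of this module's API): the
# writing side of the fixed-size integer arguments above is struct.pack with
# the matching little-endian format ("<B" for uint1, "<H" for uint2, "<i" for
# int4). _example_int_args also builds the decimal argument an INT opcode
# would carry for the same value, to show the size difference that motivated
# BININT in protocol 1.
import struct


def _example_int_args(value):
    """Return (decimal INT argument, int4 BININT argument) for value."""
    decimal = str(value).encode("ascii") + b"\n"    # newline-terminated text
    binary = struct.pack("<i", value)               # always exactly 4 bytes
    return decimal, binary

# For instance, _example_int_args(-2**31) gives a 12-byte decimal argument
# (11 characters plus the newline) versus the 4-byte b'\x00\x00\x00\x80'.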


def read_stringnl(f, decode=True, stripquotes=True):
    r"""
    >>> import io
    >>> read_stringnl(io.BytesIO(b"'abcd'\nefg\n"))
    'abcd'

    >>> read_stringnl(io.BytesIO(b"\n"))
    Traceback (most recent call last):
    ...
    ValueError: no string quotes around b''

    >>> read_stringnl(io.BytesIO(b"\n"), stripquotes=False)
    ''

    >>> read_stringnl(io.BytesIO(b"''\n"))
    ''

    >>> read_stringnl(io.BytesIO(b'"abcd"'))
    Traceback (most recent call last):
    ...
    ValueError: no newline found when trying to read stringnl

    Embedded escapes are undone in the result.
    >>> read_stringnl(io.BytesIO(br"'a\n\\b\x00c\td'" + b"\n'e'"))
    'a\n\\b\x00c\td'
    """

    data = f.readline()
    if not data.endswith(b'\n'):
        raise ValueError("no newline found when trying to read stringnl")
    data = data[:-1]    # lose the newline

    if stripquotes:
        for q in (b'"', b"'"):
            if data.startswith(q):
                if not data.endswith(q):
                    raise ValueError("string quote %r not found at both "
                                     "ends of %r" % (q, data))
                data = data[1:-1]
                break
        else:
            raise ValueError("no string quotes around %r" % data)

    if decode:
        data = codecs.escape_decode(data)[0].decode("ascii")
    return data

stringnl = ArgumentDescriptor(
               name='stringnl',
               n=UP_TO_NEWLINE,
               reader=read_stringnl,
               doc="""A newline-terminated string.

                   This is a repr-style string, with embedded escapes, and
                   bracketing quotes.
                   """)
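
# Illustration only (hypothetical helpers, not part of this module's API):
# one way to produce and consume a stringnl argument, assuming that repr() of
# a bytes object is an acceptable repr-style literal. Dropping the leading "b"
# from repr(raw) leaves a quoted literal with backslash escapes, and
# codecs.escape_decode (which read_stringnl above also uses) undoes them.
import codecs


def _example_write_stringnl(raw):
    return repr(raw)[1:].encode("ascii") + b"\n"


def _example_read_stringnl(line):
    if not line.endswith(b"\n"):
        raise ValueError("no newline found")
    body = line[:-1]                    # drop the trailing newline
    if len(body) < 2 or body[:1] not in (b"'", b'"') or body[-1:] != body[:1]:
        raise ValueError("no matching quotes around %r" % body)
    return codecs.escape_decode(body[1:-1])[0]

# _example_write_stringnl(b"abcd") is b"'abcd'\n"; bytes containing a single
# quote come out wrapped in double quotes instead, which the reader accepts.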

def read_stringnl_noescape(f):
    return read_stringnl(f, stripquotes=False)

stringnl_noescape = ArgumentDescriptor(
                        name='stringnl_noescape',
                        n=UP_TO_NEWLINE,
                        reader=read_stringnl_noescape,
                        doc="""A newline-terminated string.

                            This is a str-style string, without embedded escapes,
                            or bracketing quotes. It should consist solely of
                            printable ASCII characters.
                            """)

def read_stringnl_noescape_pair(f):
    r"""
    >>> import io
    >>> read_stringnl_noescape_pair(io.BytesIO(b"Queue\nEmpty\njunk"))
    'Queue Empty'
    """

    return "%s %s" % (read_stringnl_noescape(f), read_stringnl_noescape(f))

stringnl_noescape_pair = ArgumentDescriptor(
                             name='stringnl_noescape_pair',
                             n=UP_TO_NEWLINE,
                             reader=read_stringnl_noescape_pair,
                             doc="""A pair of newline-terminated strings.

                                 These are str-style strings, without embedded
                                 escapes, or bracketing quotes. They should
                                 consist solely of printable ASCII characters.
                                 The pair is returned as a single string, with
                                 a single blank separating the two strings.
                                 """)

def read_string4(f):
    r"""
    >>> import io
    >>> read_string4(io.BytesIO(b"\x00\x00\x00\x00abc"))
    ''
    >>> read_string4(io.BytesIO(b"\x03\x00\x00\x00abcdef"))
    'abc'
    >>> read_string4(io.BytesIO(b"\x00\x00\x00\x03abcdef"))
    Traceback (most recent call last):
    ...
    ValueError: expected 50331648 bytes in a string4, but only 6 remain
    """

    n = read_int4(f)
    if n < 0:
        raise ValueError("string4 byte count < 0: %d" % n)
    data = f.read(n)
    if len(data) == n:
        return data.decode("latin-1")
    raise ValueError("expected %d bytes in a string4, but only %d remain" %
                     (n, len(data)))

string4 = ArgumentDescriptor(
              name="string4",
              n=TAKEN_FROM_ARGUMENT4,
              reader=read_string4,
              doc="""A counted string.

                  The first argument is a 4-byte little-endian signed int giving
                  the number of bytes in the string, and the second argument is
                  that many bytes.
                  """)


def read_string1(f):
    r"""
    >>> import io
    >>> read_string1(io.BytesIO(b"\x00"))
    ''
    >>> read_string1(io.BytesIO(b"\x03abcdef"))
    'abc'
    """

    n = read_uint1(f)
    assert n >= 0
    data = f.read(n)
    if len(data) == n:
        return data.decode("latin-1")
    raise ValueError("expected %d bytes in a string1, but only %d remain" %
                     (n, len(data)))

string1 = ArgumentDescriptor(
              name="string1",
              n=TAKEN_FROM_ARGUMENT1,
              reader=read_string1,
              doc="""A counted string.

                  The first argument is a 1-byte unsigned int giving the number
                  of bytes in the string, and the second argument is that many
                  bytes.
                  """)
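
# Illustration only (hypothetical helper, not part of this module's API):
# writing a string1 or string4 argument is a length prefix followed by the raw
# bytes. _example_counted uses a 1-byte unsigned prefix or a 4-byte signed
# little-endian prefix, matching read_string1 and read_string4 above.
import struct


def _example_counted(data, width):
    if width == 1:
        if len(data) > 0xff:
            raise ValueError("too long for a 1-byte count: %d" % len(data))
        prefix = struct.pack("<B", len(data))
    elif width == 4:
        prefix = struct.pack("<i", len(data))
    else:
        raise ValueError("width must be 1 or 4")
    return prefix + data

# _example_counted(b"abc", 1) is b"\x03abc"; with width 4 it is
# b"\x03\x00\x00\x00abc".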


def read_unicodestringnl(f):
    r"""
    >>> import io
    >>> read_unicodestringnl(io.BytesIO(b"abc\\uabcd\njunk")) == 'abc\uabcd'
    True
    """

    data = f.readline()
    if not data.endswith(b'\n'):
        raise ValueError("no newline found when trying to read "
                         "unicodestringnl")
    data = data[:-1]    # lose the newline
    return str(data, 'raw-unicode-escape')

unicodestringnl = ArgumentDescriptor(
                      name='unicodestringnl',
                      n=UP_TO_NEWLINE,
                      reader=read_unicodestringnl,
                      doc="""A newline-terminated Unicode string.

                          This is raw-unicode-escape encoded, so consists of
                          printable ASCII characters, and may contain embedded
                          escape sequences.
                          """)
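
# Illustration only (hypothetical helper, not part of this module's API): the
# raw-unicode-escape codec turns non-Latin-1 code points into \uXXXX escapes,
# but it leaves backslashes and newlines alone, and a bare newline would end
# the argument early. This sketch therefore escapes those two characters by
# hand before encoding (an assumption of the sketch; CPython's pickle.py
# escapes these and a few more in a similar way).
def _example_write_unicodestringnl(text):
    escaped = text.replace("\\", "\\u005c").replace("\n", "\\u000a")
    return escaped.encode("raw-unicode-escape") + b"\n"

# _example_write_unicodestringnl("abc\uabcd") is b"abc\\uabcd\n", which
# read_unicodestringnl above decodes back to "abc\uabcd".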

def read_unicodestring4(f):
    r"""
    >>> import io
    >>> s = 'abcd\uabcd'
    >>> enc = s.encode('utf-8')
    >>> enc
    b'abcd\xea\xaf\x8d'
    >>> n = bytes([len(enc), 0, 0, 0])  # little-endian 4-byte length
    >>> t = read_unicodestring4(io.BytesIO(n + enc + b'junk'))
    >>> s == t
    True

    >>> read_unicodestring4(io.BytesIO(n + enc[:-1]))
    Traceback (most recent call last):
    ...
    ValueError: expected 7 bytes in a unicodestring4, but only 6 remain
    """

    n = read_int4(f)
    if n < 0:
        raise ValueError("unicodestring4 byte count < 0: %d" % n)
    data = f.read(n)
    if len(data) == n:
        return str(data, 'utf-8')
    raise ValueError("expected %d bytes in a unicodestring4, but only %d "
                     "remain" % (n, len(data)))

unicodestring4 = ArgumentDescriptor(
                     name="unicodestring4",
                     n=TAKEN_FROM_ARGUMENT4,
                     reader=read_unicodestring4,
                     doc="""A counted Unicode string.

                         The first argument is a 4-byte little-endian signed int
                         giving the number of bytes in the string, and the second
                         argument -- the UTF-8 encoding of the Unicode string --
                         contains that many bytes.
                         """)
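
# Illustration only (hypothetical helper, not part of this module's API): the
# writing side of unicodestring4 is the UTF-8 encoding of the text, prefixed
# by its byte length as a 4-byte signed little-endian int.
import struct


def _example_write_unicodestring4(text):
    enc = text.encode("utf-8")
    return struct.pack("<i", len(enc)) + enc

# _example_write_unicodestring4("abcd\uabcd") reproduces the 7-byte example
# in read_unicodestring4's docstring, prefixed by b"\x07\x00\x00\x00".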


def read_decimalnl_short(f):
    r"""
    >>> import io
    >>> read_decimalnl_short(io.BytesIO(b"1234\n56"))
    1234

    >>> read_decimalnl_short(io.BytesIO(b"1234L\n56"))
    Traceback (most recent call last):
    ...
    ValueError: trailing 'L' not allowed in b'1234L'
    """

    s = read_stringnl(f, decode=False, stripquotes=False)
    if s.endswith(b"L"):
        raise ValueError("trailing 'L' not allowed in %r" % s)

    # The value may not fit in a C long on this box (the pickle may have been
    # written on a 64-bit box), but Python ints are unbounded, so int() copes.
    # There's also a hack for True and False here.
    if s == b"00":
        return False
    elif s == b"01":
        return True

    return int(s)

def read_decimalnl_long(f):
    r"""
    >>> import io

    >>> read_decimalnl_long(io.BytesIO(b"1234L\n56"))
    1234

    >>> read_decimalnl_long(io.BytesIO(b"123456789012345678901234L\n6"))
    123456789012345678901234
    """

    s = read_stringnl(f, decode=False, stripquotes=False)
    if s.endswith(b"L"):
        s = s[:-1]      # int() in Python 3 rejects the trailing 'L'
    return int(s)


decimalnl_short = ArgumentDescriptor(
                      name='decimalnl_short',
                      n=UP_TO_NEWLINE,
                      reader=read_decimalnl_short,
                      doc="""A newline-terminated decimal integer literal.

                          This never has a trailing 'L', and the integer fit
                          in a short Python int on the box where the pickle
                          was written -- but there's no guarantee it will fit
                          in a short Python int on the box where the pickle
                          is read.
                          """)

decimalnl_long = ArgumentDescriptor(
                     name='decimalnl_long',
                     n=UP_TO_NEWLINE,
                     reader=read_decimalnl_long,
                     doc="""A newline-terminated decimal integer literal.

                         This has a trailing 'L', and can represent integers
                         of any size.
                         """)
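
# Illustration only (hypothetical helper, not part of this module's API): a
# writer for the two decimal formats. The bool test comes first because bool
# is a subclass of int; it produces the "00"/"01" spellings that
# read_decimalnl_short maps back to False/True.
def _example_write_decimalnl(value, long_form=False):
    if isinstance(value, bool):
        return b"01\n" if value else b"00\n"
    text = str(int(value))
    if long_form:
        text += "L"                     # decimalnl_long keeps a trailing 'L'
    return text.encode("ascii") + b"\n"

# _example_write_decimalnl(1234) is b"1234\n"; with long_form=True it is
# b"1234L\n".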


def read_floatnl(f):
    r"""
    >>> import io
    >>> read_floatnl(io.BytesIO(b"-1.25\n6"))
    -1.25
    """
    s = read_stringnl(f, decode=False, stripquotes=False)
    return float(s)

floatnl = ArgumentDescriptor(
              name='floatnl',
              n=UP_TO_NEWLINE,
              reader=read_floatnl,
              doc="""A newline-terminated decimal floating literal.

                  In general this requires 17 significant digits for roundtrip
                  identity, and pickling then unpickling infinities, NaNs, and
                  minus zero doesn't work across boxes, or on some boxes even
                  on itself (e.g., Windows can't read the strings it produces
                  for infinities or NaNs).
                  """)
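
# Illustration only (hypothetical helper, not part of this module's API):
# since Python 3.1, on platforms with IEEE-754 doubles, repr() of a float is
# the shortest string that converts back to the same value, so a floatnl
# writer can use it directly rather than always emitting 17 significant digits.
def _example_write_floatnl(x):
    return repr(x).encode("ascii") + b"\n"

# _example_write_floatnl(-1.25) is b"-1.25\n", the input used in
# read_floatnl's docstring.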

def read_float8(f):
    r"""
    >>> import io, struct
    >>> raw = struct.pack(">d", -1.25)
    >>> raw
    b'\xbf\xf4\x00\x00\x00\x00\x00\x00'
    >>> read_float8(io.BytesIO(raw + b"\n"))
    -1.25
    """

    data = f.read(8)
    if len(data) == 8:
        return _unpack(">d", data)[0]
    raise ValueError("not enough data in stream to read float8")


float8 = ArgumentDescriptor(
             name='float8',
             n=8,
             reader=read_float8,
             doc="""An 8-byte binary representation of a float, big-endian.

                 The format is unique to Python, and shared with the struct
                 module (format string '>d') "in theory" (the struct and pickle
                 implementations don't share the code -- they should). It's
                 strongly related to the IEEE-754 double format, and, in normal
                 cases, is in fact identical to the big-endian 754 double format.
                 On other boxes the dynamic range is limited to that of a 754
                 double, and "add a half and chop" rounding is used to reduce
                 the precision to 53 bits. However, even on a 754 box,
                 infinities, NaNs, and minus zero may not be handled correctly
                 (may not survive roundtrip pickling intact).
                 """)
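
# Illustration only (hypothetical helper, not part of this module's API): the
# writing side of float8 is struct.pack(">d", x), the big-endian double that
# read_float8 above unpacks.
import struct


def _example_write_float8(x):
    return struct.pack(">d", x)

# _example_write_float8(-1.25) matches the raw bytes shown in read_float8's
# docstring.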

# Protocol 2 formats

from pickle import decode_long

def read_long1(f):
    r"""
    >>> import io
    >>> read_long1(io.BytesIO(b"\x00"))
    0
    >>> read_long1(io.BytesIO(b"\x02\xff\x00"))
    255
    >>> read_long1(io.BytesIO(b"\x02\xff\x7f"))
    32767
    >>> read_long1(io.BytesIO(b"\x02\x00\xff"))
    -256
    >>> read_long1(io.BytesIO(b"\x02\x00\x80"))
    -32768
    """

    n = read_uint1(f)
    data = f.read(n)
    if len(data) != n:
        raise ValueError("not enough data in stream to read long1")
    return decode_long(data)

long1 = ArgumentDescriptor(
            name="long1",
            n=TAKEN_FROM_ARGUMENT1,
            reader=read_long1,
            doc="""A binary long, little-endian, using 1-byte size.

                This first reads one byte as an unsigned size, then reads that
                many bytes and interprets them as a little-endian 2's-complement long.
                If the size is 0, that's taken as a shortcut for the int 0.
                """)

def read_long4(f):
    r"""
    >>> import io
    >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\xff\x00"))
    255
    >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\xff\x7f"))
    32767
    >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\x00\xff"))
    -256
    >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\x00\x80"))
    -32768
    >>> read_long4(io.BytesIO(b"\x00\x00\x00\x00"))
    0
    """

    n = read_int4(f)
    if n < 0:
        raise ValueError("long4 byte count < 0: %d" % n)
    data = f.read(n)
    if len(data) != n:
        raise ValueError("not enough data in stream to read long4")
    return decode_long(data)

long4 = ArgumentDescriptor(
            name="long4",
            n=TAKEN_FROM_ARGUMENT4,
            reader=read_long4,
            doc="""A binary representation of a long, little-endian.

                This first reads four bytes as a signed size (but requires the
                size to be >= 0), then reads that many bytes and interprets them
                as a little-endian 2's-complement long. If the size is 0, that's taken
                as a shortcut for the int 0, although LONG1 should really be used
                then instead (and in any case where # of bytes < 256).
                """)


##############################################################################
# Object descriptors. The stack used by the pickle machine holds objects,
# and in the stack_before and stack_after attributes of OpcodeInfo
# descriptors we need names to describe the various types of objects that can
# appear on the stack.

class StackObject(object):
    __slots__ = (
        # name of descriptor record, for info only
        'name',

        # type of object, or tuple of type objects (meaning the object can
        # be of any type in the tuple)
        'obtype',

        # human-readable docs for this kind of stack object; a string
        'doc',
    )

    def __init__(self, name, obtype, doc):
        assert isinstance(name, str)
        self.name = name

        assert isinstance(obtype, type) or isinstance(obtype, tuple)
        if isinstance(obtype, tuple):
            for contained in obtype:
                assert isinstance(contained, type)
        self.obtype = obtype

        assert isinstance(doc, str)
        self.doc = doc

    def __repr__(self):
        return self.name
722
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000723
724pyint = StackObject(
725 name='int',
726 obtype=int,
727 doc="A short (as opposed to long) Python integer object.")
728
729pylong = StackObject(
730 name='long',
Guido van Rossume2a383d2007-01-15 16:59:06 +0000731 obtype=int,
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000732 doc="A long (as opposed to short) Python integer object.")
733
734pyinteger_or_bool = StackObject(
735 name='int_or_bool',
Guido van Rossume2a383d2007-01-15 16:59:06 +0000736 obtype=(int, int, bool),
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000737 doc="A Python integer object (short or long), or "
738 "a Python bool.")
739
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000740pybool = StackObject(
741 name='bool',
742 obtype=(bool,),
743 doc="A Python bool object.")
744
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000745pyfloat = StackObject(
746 name='float',
747 obtype=float,
748 doc="A Python float object.")
749
750pystring = StackObject(
Guido van Rossum98297ee2007-11-06 21:34:58 +0000751 name='bytes',
752 obtype=bytes,
753 doc="A Python bytes object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000754
755pyunicode = StackObject(
Guido van Rossum98297ee2007-11-06 21:34:58 +0000756 name='str',
Guido van Rossumef87d6e2007-05-02 19:09:54 +0000757 obtype=str,
Guido van Rossum98297ee2007-11-06 21:34:58 +0000758 doc="A Python string object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000759
760pynone = StackObject(
761 name="None",
762 obtype=type(None),
763 doc="The Python None object.")
764
765pytuple = StackObject(
766 name="tuple",
767 obtype=tuple,
768 doc="A Python tuple object.")
769
770pylist = StackObject(
771 name="list",
772 obtype=list,
773 doc="A Python list object.")
774
775pydict = StackObject(
776 name="dict",
777 obtype=dict,
778 doc="A Python dict object.")
779
780anyobject = StackObject(
781 name='any',
782 obtype=object,
783 doc="Any kind of object whatsoever.")
784
785markobject = StackObject(
786 name="mark",
787 obtype=StackObject,
788 doc="""'The mark' is a unique object.
789
790 Opcodes that operate on a variable number of objects
791 generally don't embed the count of objects in the opcode,
792 or pull it off the stack. Instead the MARK opcode is used
793 to push a special marker object on the stack, and then
794 some other opcodes grab all the objects from the top of
795 the stack down to (but not including) the topmost marker
796 object.
797 """)
798
799stackslice = StackObject(
800 name="stackslice",
801 obtype=StackObject,
802 doc="""An object representing a contiguous slice of the stack.
803
804 This is used in conjuction with markobject, to represent all
805 of the stack following the topmost markobject. For example,
806 the POP_MARK opcode changes the stack from
807
808 [..., markobject, stackslice]
809 to
810 [...]
811
812 No matter how many object are on the stack after the topmost
813 markobject, POP_MARK gets rid of all of them (including the
814 topmost markobject too).
815 """)
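
# Illustration only, not part of this module's API: a minimal sketch of how
# an unpickler might realize markobject and stackslice on a plain Python
# list.  MARK pushes a unique sentinel; an opcode such as LIST, TUPLE or
# POP_MARK then takes everything above the topmost sentinel (the stackslice)
# and discards that sentinel too.  The names _SKETCH_MARK and
# _sketch_pop_stackslice are hypothetical.
_SKETCH_MARK = object()

def _sketch_pop_stackslice(stack):
    # Scan down from the top of the stack for the topmost mark.
    for k in range(len(stack) - 1, -1, -1):
        if stack[k] is _SKETCH_MARK:
            items = stack[k + 1:]   # the stackslice, in stack order
            del stack[k:]           # pop the slice and the mark itself
            return items
    raise ValueError("no mark on the stack")

# Under this sketch, LIST would push list(_sketch_pop_stackslice(stack)),
# TUPLE would push tuple(...) of the same slice, and POP_MARK would simply
# discard the returned slice.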

##############################################################################
# Descriptors for pickle opcodes.

class OpcodeInfo(object):

    __slots__ = (
        # symbolic name of opcode; a string
        'name',

        # the code used in a bytestream to represent the opcode; a
        # one-character string
        'code',

        # If the opcode has an argument embedded in the byte string, an
        # instance of ArgumentDescriptor specifying its type. Note that
        # arg.reader(s) can be used to read and decode the argument from
        # the bytestream s, and arg.doc documents the format of the raw
        # argument bytes. If the opcode doesn't have an argument embedded
        # in the bytestream, arg should be None.
        'arg',

        # what the stack looks like before this opcode runs; a list
        'stack_before',

        # what the stack looks like after this opcode runs; a list
        'stack_after',

        # the protocol number in which this opcode was introduced; an int
        'proto',

        # human-readable docs for this opcode; a string
        'doc',
    )

    def __init__(self, name, code, arg,
                 stack_before, stack_after, proto, doc):
        assert isinstance(name, str)
        self.name = name

        assert isinstance(code, str)
        assert len(code) == 1
        self.code = code

        assert arg is None or isinstance(arg, ArgumentDescriptor)
        self.arg = arg

        assert isinstance(stack_before, list)
        for x in stack_before:
            assert isinstance(x, StackObject)
        self.stack_before = stack_before

        assert isinstance(stack_after, list)
        for x in stack_after:
            assert isinstance(x, StackObject)
        self.stack_after = stack_after

        assert isinstance(proto, int) and 0 <= proto <= 2
        self.proto = proto

        assert isinstance(doc, str)
        self.doc = doc
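
# Illustration only, not part of this module's API: stack_before and
# stack_after summarize an opcode's effect on the stack.  For example, the
# TUPLE2 record below has stack_before=[anyobject, anyobject] and
# stack_after=[pytuple]: the two topmost items are replaced by one 2-tuple.
# _sketch_apply_tuple_n is a hypothetical helper that performs that
# transformation for TUPLE1, TUPLE2 and TUPLE3 on a plain list.
def _sketch_apply_tuple_n(stack, n):
    if n not in (1, 2, 3):
        raise ValueError("only TUPLE1, TUPLE2 and TUPLE3 exist")
    if len(stack) < n:
        raise IndexError("stack underflow")
    # Equivalent to the "IOW" lines in the TUPLE1/TUPLE2/TUPLE3 docs.
    stack[-n:] = [tuple(stack[-n:])]
    return stack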

I = OpcodeInfo
opcodes = [

    # Ways to spell integers.

    I(name='INT',
      code='I',
      arg=decimalnl_short,
      stack_before=[],
      stack_after=[pyinteger_or_bool],
      proto=0,
      doc="""Push an integer or bool.

      The argument is a newline-terminated decimal literal string.

      The intent may have been that this always fit in a short Python int,
      but INT can be generated in pickles written on a 64-bit box that
      require a Python long on a 32-bit box. The difference between this
      and LONG then is that INT skips a trailing 'L', and produces a short
      int whenever possible.

      Another difference stems from the fact that, when bool was introduced
      as a distinct type in 2.3, builtin names True and False were also added
      to 2.2.2, mapping to ints 1 and 0. For compatibility in both
      directions, True gets pickled as "I01\\n" (the INT opcode followed by
      "01\\n"), and False as "I00\\n". Leading zeroes are never produced for
      a genuine integer. The 2.3 (and later) unpicklers special-case these
      and return bool instead; earlier unpicklers ignore the leading "0" and
      return the int.
      """),

    I(name='BININT',
      code='J',
      arg=int4,
      stack_before=[],
      stack_after=[pyint],
      proto=1,
      doc="""Push a four-byte signed integer.

      This handles the full range of Python (short) integers on a 32-bit
      box, directly as binary bytes (1 for the opcode and 4 for the integer).
      If the integer is non-negative and fits in 1 or 2 bytes, pickling via
      BININT1 or BININT2 saves space.
      """),

    I(name='BININT1',
      code='K',
      arg=uint1,
      stack_before=[],
      stack_after=[pyint],
      proto=1,
      doc="""Push a one-byte unsigned integer.

      This is a space optimization for pickling very small non-negative ints,
      in range(256).
      """),

    I(name='BININT2',
      code='M',
      arg=uint2,
      stack_before=[],
      stack_after=[pyint],
      proto=1,
      doc="""Push a two-byte unsigned integer.

      This is a space optimization for pickling small positive ints, in
      range(256, 2**16). Integers in range(256) can also be pickled via
      BININT2, but BININT1 instead saves a byte.
      """),

    I(name='LONG',
      code='L',
      arg=decimalnl_long,
      stack_before=[],
      stack_after=[pylong],
      proto=0,
      doc="""Push a long integer.

      The same as INT, except that the literal ends with 'L', and always
      unpickles to a Python long. There doesn't seem to be a real purpose
      to the trailing 'L'.

      Note that LONG takes time quadratic in the number of digits when
      unpickling (this is simply due to the nature of decimal->binary
      conversion). Proto 2 added linear-time (in C; still quadratic-time
      in Python) LONG1 and LONG4 opcodes.
      """),

    I(name="LONG1",
      code='\x8a',
      arg=long1,
      stack_before=[],
      stack_after=[pylong],
      proto=2,
      doc="""Long integer using one-byte length.

      A more efficient encoding of a Python long; the long1 encoding
      says it all."""),

    I(name="LONG4",
      code='\x8b',
      arg=long4,
      stack_before=[],
      stack_after=[pylong],
      proto=2,
      doc="""Long integer using four-byte length.

      A more efficient encoding of a Python long; the long4 encoding
      says it all."""),

    # Ways to spell strings (8-bit, not Unicode).

    I(name='STRING',
      code='S',
      arg=stringnl,
      stack_before=[],
      stack_after=[pystring],
      proto=0,
      doc="""Push a Python string object.

      The argument is a repr-style string, with bracketing quote characters,
      and perhaps embedded escapes. The argument extends until the next
      newline character.
      """),

    I(name='BINSTRING',
      code='T',
      arg=string4,
      stack_before=[],
      stack_after=[pystring],
      proto=1,
      doc="""Push a Python string object.

      There are two arguments: the first is a 4-byte little-endian signed int
      giving the number of bytes in the string, and the second is that many
      bytes, which are taken literally as the string content.
      """),

    I(name='SHORT_BINSTRING',
      code='U',
      arg=string1,
      stack_before=[],
      stack_after=[pystring],
      proto=1,
      doc="""Push a Python string object.

      There are two arguments: the first is a 1-byte unsigned int giving
      the number of bytes in the string, and the second is that many bytes,
      which are taken literally as the string content.
      """),

    # Ways to spell None.

    I(name='NONE',
      code='N',
      arg=None,
      stack_before=[],
      stack_after=[pynone],
      proto=0,
      doc="Push None on the stack."),

    # Ways to spell bools, starting with proto 2. See INT for how this was
    # done before proto 2.

    I(name='NEWTRUE',
      code='\x88',
      arg=None,
      stack_before=[],
      stack_after=[pybool],
      proto=2,
      doc="""True.

      Push True onto the stack."""),

    I(name='NEWFALSE',
      code='\x89',
      arg=None,
      stack_before=[],
      stack_after=[pybool],
      proto=2,
      doc="""False.

      Push False onto the stack."""),

    # Ways to spell Unicode strings.

    I(name='UNICODE',
      code='V',
      arg=unicodestringnl,
      stack_before=[],
      stack_after=[pyunicode],
      proto=0,  # this may be pure-text, but it's a later addition
      doc="""Push a Python Unicode string object.

      The argument is a raw-unicode-escape encoding of a Unicode string,
      and so may contain embedded escape sequences. The argument extends
      until the next newline character.
      """),

    I(name='BINUNICODE',
      code='X',
      arg=unicodestring4,
      stack_before=[],
      stack_after=[pyunicode],
      proto=1,
      doc="""Push a Python Unicode string object.

      There are two arguments: the first is a 4-byte little-endian signed int
      giving the number of bytes in the string. The second is that many
      bytes, and is the UTF-8 encoding of the Unicode string.
      """),

    # Ways to spell floats.

    I(name='FLOAT',
      code='F',
      arg=floatnl,
      stack_before=[],
      stack_after=[pyfloat],
      proto=0,
      doc="""Newline-terminated decimal float literal.

      The argument is repr(a_float), and in general requires 17 significant
      digits for roundtrip conversion to be an identity (this is so for
      IEEE-754 double precision values, which is what Python float maps to
      on most boxes).

      In general, FLOAT cannot be used to transport infinities, NaNs, or
      minus zero across boxes (or even on a single box, if the platform C
      library can't read the strings it produces for such things -- Windows
      is like that), but may do less damage than BINFLOAT on boxes with
      greater precision or dynamic range than IEEE-754 double.
      """),

    I(name='BINFLOAT',
      code='G',
      arg=float8,
      stack_before=[],
      stack_after=[pyfloat],
      proto=1,
      doc="""Float stored in binary form, with 8 bytes of data.

      This generally requires less than half the space of FLOAT encoding.
      In general, BINFLOAT cannot be used to transport infinities, NaNs, or
      minus zero, raises an exception if the exponent exceeds the range of
      an IEEE-754 double, and retains no more than 53 bits of precision (if
      there are more than that, "add a half and chop" rounding is used to
      cut it back to 53 significant bits).
      """),

    # Ways to build lists.

    I(name='EMPTY_LIST',
      code=']',
      arg=None,
      stack_before=[],
      stack_after=[pylist],
      proto=1,
      doc="Push an empty list."),

    I(name='APPEND',
      code='a',
      arg=None,
      stack_before=[pylist, anyobject],
      stack_after=[pylist],
      proto=0,
      doc="""Append an object to a list.

      Stack before: ... pylist anyobject
      Stack after:  ... pylist+[anyobject]

      although pylist is really extended in-place.
      """),

    I(name='APPENDS',
      code='e',
      arg=None,
      stack_before=[pylist, markobject, stackslice],
      stack_after=[pylist],
      proto=1,
      doc="""Extend a list by a slice of stack objects.

      Stack before: ... pylist markobject stackslice
      Stack after:  ... pylist+stackslice

      although pylist is really extended in-place.
      """),

    I(name='LIST',
      code='l',
      arg=None,
      stack_before=[markobject, stackslice],
      stack_after=[pylist],
      proto=0,
      doc="""Build a list out of the topmost stack slice, after markobject.

      All the stack entries following the topmost markobject are placed into
      a single Python list, which single list object replaces all of the
      stack from the topmost markobject onward. For example,

      Stack before: ... markobject 1 2 3 'abc'
      Stack after:  ... [1, 2, 3, 'abc']
      """),

    # Ways to build tuples.

    I(name='EMPTY_TUPLE',
      code=')',
      arg=None,
      stack_before=[],
      stack_after=[pytuple],
      proto=1,
      doc="Push an empty tuple."),

    I(name='TUPLE',
      code='t',
      arg=None,
      stack_before=[markobject, stackslice],
      stack_after=[pytuple],
      proto=0,
      doc="""Build a tuple out of the topmost stack slice, after markobject.

      All the stack entries following the topmost markobject are placed into
      a single Python tuple, which single tuple object replaces all of the
      stack from the topmost markobject onward. For example,

      Stack before: ... markobject 1 2 3 'abc'
      Stack after:  ... (1, 2, 3, 'abc')
      """),

    I(name='TUPLE1',
      code='\x85',
      arg=None,
      stack_before=[anyobject],
      stack_after=[pytuple],
      proto=2,
      doc="""One-tuple.

      This code pops one value off the stack and pushes a tuple of
      length 1 whose one item is that value back onto it. IOW:

          stack[-1] = tuple(stack[-1:])
      """),

    I(name='TUPLE2',
      code='\x86',
      arg=None,
      stack_before=[anyobject, anyobject],
      stack_after=[pytuple],
      proto=2,
      doc="""Two-tuple.

      This code pops two values off the stack and pushes a tuple
      of length 2 whose items are those values back onto it. IOW:

          stack[-2:] = [tuple(stack[-2:])]
      """),

    I(name='TUPLE3',
      code='\x87',
      arg=None,
      stack_before=[anyobject, anyobject, anyobject],
      stack_after=[pytuple],
      proto=2,
      doc="""Three-tuple.

      This code pops three values off the stack and pushes a tuple
      of length 3 whose items are those values back onto it. IOW:

          stack[-3:] = [tuple(stack[-3:])]
      """),

    # Ways to build dicts.

    I(name='EMPTY_DICT',
      code='}',
      arg=None,
      stack_before=[],
      stack_after=[pydict],
      proto=1,
      doc="Push an empty dict."),

    I(name='DICT',
      code='d',
      arg=None,
      stack_before=[markobject, stackslice],
      stack_after=[pydict],
      proto=0,
      doc="""Build a dict out of the topmost stack slice, after markobject.

      All the stack entries following the topmost markobject are placed into
      a single Python dict, which single dict object replaces all of the
      stack from the topmost markobject onward. The stack slice alternates
      key, value, key, value, .... For example,

      Stack before: ... markobject 1 2 3 'abc'
      Stack after:  ... {1: 2, 3: 'abc'}
      """),

    I(name='SETITEM',
      code='s',
      arg=None,
      stack_before=[pydict, anyobject, anyobject],
      stack_after=[pydict],
      proto=0,
      doc="""Add a key+value pair to an existing dict.

      Stack before: ... pydict key value
      Stack after:  ... pydict

      where pydict has been modified via pydict[key] = value.
      """),

    I(name='SETITEMS',
      code='u',
      arg=None,
      stack_before=[pydict, markobject, stackslice],
      stack_after=[pydict],
      proto=1,
      doc="""Add an arbitrary number of key+value pairs to an existing dict.

      The slice of the stack following the topmost markobject is taken as
      an alternating sequence of keys and values, added to the dict
      immediately under the topmost markobject. Everything at and after the
      topmost markobject is popped, leaving the mutated dict at the top
      of the stack.

      Stack before: ... pydict markobject key_1 value_1 ... key_n value_n
      Stack after:  ... pydict

      where pydict has been modified via pydict[key_i] = value_i for i in
      1, 2, ..., n, and in that order.
      """),

    # Stack manipulation.

    I(name='POP',
      code='0',
      arg=None,
      stack_before=[anyobject],
      stack_after=[],
      proto=0,
      doc="Discard the top stack item, shrinking the stack by one item."),

    I(name='DUP',
      code='2',
      arg=None,
      stack_before=[anyobject],
      stack_after=[anyobject, anyobject],
      proto=0,
      doc="Push the top stack item onto the stack again, duplicating it."),

    I(name='MARK',
      code='(',
      arg=None,
      stack_before=[],
      stack_after=[markobject],
      proto=0,
      doc="""Push markobject onto the stack.

      markobject is a unique object, used by other opcodes to identify a
      region of the stack containing a variable number of objects for them
      to work on. See markobject.doc for more detail.
      """),

    I(name='POP_MARK',
      code='1',
      arg=None,
      stack_before=[markobject, stackslice],
      stack_after=[],
      proto=0,
      doc="""Pop all the stack objects at and above the topmost markobject.

      When an opcode using a variable number of stack objects is done,
      POP_MARK is used to remove those objects, and to remove the markobject
      that delimited their starting position on the stack.
      """),

    # Memo manipulation. There are really only two operations (get and put),
    # each in all-text, "short binary", and "long binary" flavors.

    I(name='GET',
      code='g',
      arg=decimalnl_short,
      stack_before=[],
      stack_after=[anyobject],
      proto=0,
      doc="""Read an object from the memo and push it on the stack.

      The index of the memo object to push is given by the newline-terminated
      decimal string following. BINGET and LONG_BINGET are space-optimized
      versions.
      """),

    I(name='BINGET',
      code='h',
      arg=uint1,
      stack_before=[],
      stack_after=[anyobject],
      proto=1,
      doc="""Read an object from the memo and push it on the stack.

      The index of the memo object to push is given by the 1-byte unsigned
      integer following.
      """),

    I(name='LONG_BINGET',
      code='j',
      arg=int4,
      stack_before=[],
      stack_after=[anyobject],
      proto=1,
      doc="""Read an object from the memo and push it on the stack.

      The index of the memo object to push is given by the 4-byte signed
      little-endian integer following.
      """),

    I(name='PUT',
      code='p',
      arg=decimalnl_short,
      stack_before=[],
      stack_after=[],
      proto=0,
      doc="""Store the stack top into the memo. The stack is not popped.

      The index of the memo location to write into is given by the
      newline-terminated decimal string following. BINPUT and LONG_BINPUT
      are space-optimized versions.
      """),

    I(name='BINPUT',
      code='q',
      arg=uint1,
      stack_before=[],
      stack_after=[],
      proto=1,
      doc="""Store the stack top into the memo. The stack is not popped.

      The index of the memo location to write into is given by the 1-byte
      unsigned integer following.
      """),

    I(name='LONG_BINPUT',
      code='r',
      arg=int4,
      stack_before=[],
      stack_after=[],
      proto=1,
      doc="""Store the stack top into the memo. The stack is not popped.

      The index of the memo location to write into is given by the 4-byte
      signed little-endian integer following.
      """),

    # Access the extension registry (predefined objects). Akin to the GET
    # family.

    I(name='EXT1',
      code='\x82',
      arg=uint1,
      stack_before=[],
      stack_after=[anyobject],
      proto=2,
      doc="""Extension code.

      This code and the similar EXT2 and EXT4 allow using a registry
      of popular objects that are pickled by name, typically classes.
      It is envisioned that through a global negotiation and
      registration process, third parties can set up a mapping between
      ints and object names.

      In order to guarantee pickle interchangeability, the extension
      code registry ought to be global, although a range of codes may
      be reserved for private use.

      EXT1 has a 1-byte integer argument. This is used to index into the
      extension registry, and the object at that index is pushed on the stack.
      """),

    I(name='EXT2',
      code='\x83',
      arg=uint2,
      stack_before=[],
      stack_after=[anyobject],
      proto=2,
      doc="""Extension code.

      See EXT1. EXT2 has a two-byte integer argument.
      """),

    I(name='EXT4',
      code='\x84',
      arg=int4,
      stack_before=[],
      stack_after=[anyobject],
      proto=2,
      doc="""Extension code.

      See EXT1. EXT4 has a four-byte integer argument.
      """),

    # Push a class object, or module function, on the stack, via its module
    # and name.

    I(name='GLOBAL',
      code='c',
      arg=stringnl_noescape_pair,
      stack_before=[],
      stack_after=[anyobject],
      proto=0,
      doc="""Push a global object (module.attr) on the stack.

      Two newline-terminated strings follow the GLOBAL opcode. The first is
      taken as a module name, and the second as a class name. The class
      object module.class is pushed on the stack. More accurately, the
      object returned by self.find_class(module, class) is pushed on the
      stack, so unpickling subclasses can override this form of lookup.
      """),

    # Ways to build objects of classes pickle doesn't know about directly
    # (user-defined classes). I despair of documenting this accurately
    # and comprehensibly -- you really have to read the pickle code to
    # find all the special cases.

    I(name='REDUCE',
      code='R',
      arg=None,
      stack_before=[anyobject, anyobject],
      stack_after=[anyobject],
      proto=0,
      doc="""Push an object built from a callable and an argument tuple.

      The opcode is named as a reminder of the __reduce__() method.

      Stack before: ... callable pytuple
      Stack after:  ... callable(*pytuple)

      The callable and the argument tuple are the first two items returned
      by a __reduce__ method. Applying the callable to the argtuple is
      supposed to reproduce the original object, or at least get it started.
      If the __reduce__ method returns a 3-tuple, the last component is an
      argument to be passed to the object's __setstate__, and then the REDUCE
      opcode is followed by code to create setstate's argument, and then a
      BUILD opcode to apply __setstate__ to that argument.

      If not isinstance(callable, type), REDUCE complains unless the
      callable has been registered with the copy_reg module's
      safe_constructors dict, or the callable has a magic
      '__safe_for_unpickling__' attribute with a true value. I'm not sure
      why it does this, but I've sure seen this complaint often enough when
      I didn't want to <wink>.
      """),

    I(name='BUILD',
      code='b',
      arg=None,
      stack_before=[anyobject, anyobject],
      stack_after=[anyobject],
      proto=0,
      doc="""Finish building an object, via __setstate__ or dict update.

      Stack before: ... anyobject argument
      Stack after:  ... anyobject

      where anyobject may have been mutated, as follows:

      If the object has a __setstate__ method,

          anyobject.__setstate__(argument)

      is called.

      Else the argument must be a dict, the object must have a __dict__, and
      the object is updated via

          anyobject.__dict__.update(argument)
      """),

    I(name='INST',
      code='i',
      arg=stringnl_noescape_pair,
      stack_before=[markobject, stackslice],
      stack_after=[anyobject],
      proto=0,
      doc="""Build a class instance.

      This is the protocol 0 version of protocol 1's OBJ opcode.
      INST is followed by two newline-terminated strings, giving a
      module and class name, just as for the GLOBAL opcode (and see
      GLOBAL for more details about that). self.find_class(module, name)
      is used to get a class object.

      In addition, all the objects on the stack following the topmost
      markobject are gathered into a tuple and popped (along with the
      topmost markobject), just as for the TUPLE opcode.

      Now it gets complicated. If all of these are true:

        + The argtuple is empty (markobject was at the top of the stack
          at the start).

        + The class object does not have a __getinitargs__ attribute.

      then we want to create an old-style class instance without invoking
      its __init__() method (pickle has waffled on this over the years; not
      calling __init__() is current wisdom). In this case, an instance of
      an old-style dummy class is created, and then we try to rebind its
      __class__ attribute to the desired class object. If this succeeds,
      the new instance object is pushed on the stack, and we're done.

      Else (the argtuple is not empty, it's not an old-style class object,
      or the class object does have a __getinitargs__ attribute), the code
      first insists that the class object have a __safe_for_unpickling__
      attribute. Unlike the __safe_for_unpickling__ check in REDUCE, it
      doesn't matter whether this attribute has a true or false value; it
      only matters whether it exists (XXX this is a bug). If
      __safe_for_unpickling__ doesn't exist, UnpicklingError is raised.

      Else (the class object does have a __safe_for_unpickling__ attr),
      the class object obtained from INST's arguments is applied to the
      argtuple obtained from the stack, and the resulting instance object
      is pushed on the stack.

      NOTE: checks for __safe_for_unpickling__ went away in Python 2.3.
      """),

    I(name='OBJ',
      code='o',
      arg=None,
      stack_before=[markobject, anyobject, stackslice],
      stack_after=[anyobject],
      proto=1,
      doc="""Build a class instance.

      This is the protocol 1 version of protocol 0's INST opcode, and is
      very much like it. The major difference is that the class object
      is taken off the stack, allowing it to be retrieved from the memo
      repeatedly if several instances of the same class are created. This
      can be much more efficient (in both time and space) than repeatedly
      embedding the module and class names in INST opcodes.

      Unlike INST, OBJ takes no arguments from the opcode stream. Instead
      the class object is taken off the stack, immediately above the
      topmost markobject:

      Stack before: ... markobject classobject stackslice
      Stack after:  ... new_instance_object

      As for INST, the remainder of the stack above the markobject is
      gathered into an argument tuple, and then the logic seems identical,
      except that no __safe_for_unpickling__ check is done (XXX this is
      a bug). See INST for the gory details.

      NOTE: In Python 2.3, INST and OBJ are identical except for how they
      get the class object. That was always the intent; the implementations
      had diverged for accidental reasons.
      """),

    I(name='NEWOBJ',
      code='\x81',
      arg=None,
      stack_before=[anyobject, anyobject],
      stack_after=[anyobject],
      proto=2,
      doc="""Build an object instance.

      The stack before should be thought of as containing a class
      object followed by an argument tuple (the tuple being the stack
      top). Call these cls and args. They are popped off the stack,
      and the value returned by cls.__new__(cls, *args) is pushed back
      onto the stack.
      """),

    # Machine control.

    I(name='PROTO',
      code='\x80',
      arg=uint1,
      stack_before=[],
      stack_after=[],
      proto=2,
      doc="""Protocol version indicator.

      For protocol 2 and above, a pickle must start with this opcode.
      The argument is the protocol version, an int in range(2, 256).
      """),

    I(name='STOP',
      code='.',
      arg=None,
      stack_before=[anyobject],
      stack_after=[],
      proto=0,
      doc="""Stop the unpickling machine.

      Every pickle ends with this opcode. The object at the top of the stack
      is popped, and that's the result of unpickling. The stack should be
      empty then.
      """),

    # Ways to deal with persistent IDs.

    I(name='PERSID',
      code='P',
      arg=stringnl_noescape,
      stack_before=[],
      stack_after=[anyobject],
      proto=0,
      doc="""Push an object identified by a persistent ID.

      The pickle module doesn't define what a persistent ID means. PERSID's
      argument is a newline-terminated str-style (no embedded escapes, no
      bracketing quote characters) string, which *is* "the persistent ID".
      The unpickler passes this string to self.persistent_load(). Whatever
      object that returns is pushed on the stack. There is no implementation
      of persistent_load() in Python's unpickler: it must be supplied by an
      unpickler subclass.
      """),

    I(name='BINPERSID',
      code='Q',
      arg=None,
      stack_before=[anyobject],
      stack_after=[anyobject],
      proto=1,
      doc="""Push an object identified by a persistent ID.

      Like PERSID, except the persistent ID is popped off the stack (instead
      of being a string embedded in the opcode bytestream). The persistent
      ID is passed to self.persistent_load(), and whatever object that
      returns is pushed on the stack. See PERSID for more detail.
      """),
]
1712del I
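
# Illustrative sketch, not used elsewhere in this module: the stack effect
# that the NEWOBJ entry above describes. The helper name _sketch_newobj is
# hypothetical, and `stack` is assumed to be a plain Python list standing in
# for the unpickling machine's stack. Note that cls.__init__ is not called;
# any further state is typically restored by later opcodes such as BUILD.
def _sketch_newobj(stack):
    args = stack.pop()                      # the argument tuple is the stack top
    cls = stack.pop()                       # the class object sits just below it
    stack.append(cls.__new__(cls, *args))   # push the new, uninitialized instance
    return stack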

# Verify uniqueness of .name and .code members.
name2i = {}
code2i = {}

for i, d in enumerate(opcodes):
    if d.name in name2i:
        raise ValueError("repeated name %r at indices %d and %d" %
                         (d.name, name2i[d.name], i))
    if d.code in code2i:
        raise ValueError("repeated code %r at indices %d and %d" %
                         (d.code, code2i[d.code], i))

    name2i[d.name] = i
    code2i[d.code] = i

del name2i, code2i, i, d
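
# Illustrative sketch, not used elsewhere in this module: per the PROTO entry,
# a protocol 2+ pickle starts with b'\x80' followed by a one-byte protocol
# number, while protocol 0 and 1 pickles carry no such marker. The helper
# name is hypothetical and assumes the whole pickle is held in a bytes object.
def _sketch_sniff_proto(data):
    if len(data) >= 2 and data[0] == 0x80:  # indexing bytes yields an int
        return data[1]                      # the uint1 argument of PROTO
    return None                             # no PROTO: protocol 0 or 1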

##############################################################################
# Build a code2op dict, mapping opcode characters to OpcodeInfo records.
# Also ensure we've got the same stuff as pickle.py, although the
# introspection here is dicey.

code2op = {}
for d in opcodes:
    code2op[d.code] = d
del d
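
# Illustrative sketch, not used elsewhere in this module: the PERSID and
# BINPERSID round trip described in the opcode table, written against the
# pickle.Pickler.persistent_id() and pickle.Unpickler.persistent_load()
# hooks. _SketchRef and _sketch_persid_roundtrip are hypothetical names, and
# the `store` dict stands in for whatever external storage the IDs name.
import io
import pickle

class _SketchRef:
    def __init__(self, key):
        self.key = key  # an ASCII str, so protocol 0 can write it via PERSID

def _sketch_persid_roundtrip(obj, store, protocol):
    class RefPickler(pickle.Pickler):
        def persistent_id(self, o):
            # A non-None result makes the pickler emit PERSID (protocol 0)
            # or BINPERSID (protocol >= 1) instead of pickling `o` itself.
            return o.key if isinstance(o, _SketchRef) else None

    class RefUnpickler(pickle.Unpickler):
        def persistent_load(self, pid):
            # Whatever this returns is pushed on the unpickler's stack.
            return store[pid]

    buf = io.BytesIO()
    RefPickler(buf, protocol).dump(obj)
    data = buf.getvalue()
    return data, RefUnpickler(io.BytesIO(data)).load()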

def assure_pickle_consistency(verbose=False):
    """Check that code2op agrees with the opcode constants in pickle.__all__.

    Raise ValueError on any name or code mismatch in either direction.
    """
    copy = code2op.copy()
    for name in pickle.__all__:
        if not re.match("[A-Z][A-Z0-9_]+$", name):
            if verbose:
                print("skipping %r: it doesn't look like an opcode name" % name)
            continue
        picklecode = getattr(pickle, name)
        if not isinstance(picklecode, bytes) or len(picklecode) != 1:
            if verbose:
                print("skipping %r: value %r doesn't look like a pickle "
                      "code" % (name, picklecode))
            continue
        picklecode = picklecode.decode("latin-1")
        if picklecode in copy:
            if verbose:
                print("checking name %r w/ code %r for consistency" % (
                      name, picklecode))
            d = copy[picklecode]
            if d.name != name:
                raise ValueError("for pickle code %r, pickle.py uses name %r "
                                 "but we're using name %r" % (picklecode,
                                                              name,
                                                              d.name))
            # Forget this one. Any left over in copy at the end are a problem
            # of a different kind.
            del copy[picklecode]
        else:
            raise ValueError("pickle.py appears to have a pickle opcode with "
                             "name %r and code %r, but we don't" %
                             (name, picklecode))
    if copy:
        msg = ["we appear to have pickle opcodes that pickle.py doesn't have:"]
        for code, d in copy.items():
            msg.append("    name %r with code %r" % (d.name, code))
        raise ValueError("\n".join(msg))

assure_pickle_consistency()
del assure_pickle_consistency
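
# Illustrative sketch, not used elsewhere in this module: the name-based
# introspection that assure_pickle_consistency() relies on, i.e. collect
# every all-caps attribute of the pickle module whose value is a single
# byte. The helper name is hypothetical, and the result depends on the
# pickle module actually in use.
import pickle
import re

def _sketch_pickle_module_opcodes():
    found = {}
    for name in dir(pickle):
        if not re.match("[A-Z][A-Z0-9_]+$", name):
            continue                        # not spelled like an opcode name
        value = getattr(pickle, name)
        if isinstance(value, bytes) and len(value) == 1:
            found[value] = name             # map the opcode byte to its name
    return found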
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001781
1782##############################################################################
1783# A pickle opcode generator.
1784
1785def genops(pickle):
Guido van Rossuma72ded92003-01-27 19:40:47 +00001786 """Generate all the opcodes in a pickle.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001787
1788 'pickle' is a file-like object, or string, containing the pickle.
1789
1790 Each opcode in the pickle is generated, from the current pickle position,
1791 stopping after a STOP opcode is delivered. A triple is generated for
1792 each opcode:
1793
1794 opcode, arg, pos
1795
1796 opcode is an OpcodeInfo record, describing the current opcode.
1797
1798 If the opcode has an argument embedded in the pickle, arg is its decoded
1799 value, as a Python object. If the opcode doesn't have an argument, arg
1800 is None.
1801
1802 If the pickle has a tell() method, pos was the value of pickle.tell()
Guido van Rossum34d19282007-08-09 01:03:29 +00001803 before reading the current opcode. If the pickle is a bytes object,
1804 it's wrapped in a BytesIO object, and the latter's tell() result is
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001805 used. Else (the pickle doesn't have a tell(), and it's not obvious how
1806 to query its current position) pos is None.
1807 """
1808
Guido van Rossum98297ee2007-11-06 21:34:58 +00001809 if isinstance(pickle, bytes_types):
Guido van Rossumcfe5f202007-05-08 21:26:54 +00001810 import io
1811 pickle = io.BytesIO(pickle)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001812
1813 if hasattr(pickle, "tell"):
1814 getpos = pickle.tell
1815 else:
1816 getpos = lambda: None
1817
1818 while True:
1819 pos = getpos()
1820 code = pickle.read(1)
Guido van Rossumcfe5f202007-05-08 21:26:54 +00001821 opcode = code2op.get(code.decode("latin-1"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001822 if opcode is None:
Guido van Rossumcfe5f202007-05-08 21:26:54 +00001823 if code == b"":
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001824 raise ValueError("pickle exhausted before seeing STOP")
1825 else:
1826 raise ValueError("at position %s, opcode %r unknown" % (
1827 pos is None and "<unknown>" or pos,
1828 code))
1829 if opcode.arg is None:
1830 arg = None
1831 else:
1832 arg = opcode.arg.reader(pickle)
1833 yield opcode, arg, pos
Guido van Rossumcfe5f202007-05-08 21:26:54 +00001834 if code == b'.':
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001835 assert opcode.name == 'STOP'
1836 break
1837
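# Illustrative sketch, not used elsewhere in this module: a toy version of
# the genops() loop above. As an assumption for brevity, it knows only the
# handful of opcodes found in a small protocol 2 list pickle, each taking
# either no argument or a one-byte unsigned argument. Like genops(), it
# yields (name, arg, pos) triples and stops after STOP. All names here are
# hypothetical.
import io

_SKETCH_TABLE = {
    b'\x80': ('PROTO', 1),
    b']': ('EMPTY_LIST', 0),
    b'q': ('BINPUT', 1),
    b'(': ('MARK', 0),
    b'K': ('BININT1', 1),
    b'e': ('APPENDS', 0),
    b'.': ('STOP', 0),
}

def _sketch_genops(data):
    f = io.BytesIO(data)
    while True:
        pos = f.tell()
        code = f.read(1)
        if code == b'':
            raise ValueError("pickle exhausted before seeing STOP")
        if code not in _SKETCH_TABLE:
            raise ValueError("at position %d, opcode %r unknown" % (pos, code))
        name, nargbytes = _SKETCH_TABLE[code]
        arg = None
        if nargbytes:
            raw = f.read(1)
            if len(raw) != 1:
                raise ValueError("truncated argument for %s at %d" % (name, pos))
            arg = raw[0]                    # decode the uint1 argument
        yield name, arg, pos
        if name == 'STOP':
            break
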
##############################################################################
# A symbolic pickle disassembler.

def dis(pickle, out=None, memo=None, indentlevel=4):
    """Produce a symbolic disassembly of a pickle.

    'pickle' is a file-like object, or a bytes object, containing at least
    one pickle. The pickle is disassembled from the current position, through
    the first STOP opcode encountered.

    Optional arg 'out' is a file-like object to which the disassembly is
    printed. It defaults to sys.stdout.

    Optional arg 'memo' is a Python dict, used as the pickle's memo. It
    may be mutated by dis(), if the pickle contains PUT, BINPUT or LONG_BINPUT
    opcodes. Passing the same memo object to another dis() call then allows
    disassembly to proceed across multiple pickles that were all created by
    the same pickler with the same memo. Ordinarily you don't need to worry
    about this.

    Optional arg indentlevel is the number of blanks by which to indent
    a new MARK level. It defaults to 4.

    In addition to printing the disassembly, some sanity checks are made:

    + All embedded opcode arguments "make sense".

    + Explicit and implicit pop operations have enough items on the stack.

    + When an opcode implicitly refers to a markobject, a markobject is
      actually on the stack.

    + A memo entry isn't referenced before it's defined.

    + The markobject isn't stored in the memo.

    + A memo entry isn't redefined.
    """

    # Most of the hair here is for sanity checks, but most of it is needed
    # anyway to detect when a protocol 0 POP takes a MARK off the stack
    # (which in turn is needed to indent MARK blocks correctly).

    stack = []          # crude emulation of unpickler stack
    if memo is None:
        memo = {}       # crude emulation of unpickler memo
    maxproto = -1       # max protocol number seen
    markstack = []      # bytecode positions of MARK opcodes
    indentchunk = ' ' * indentlevel
    errormsg = None
    for opcode, arg, pos in genops(pickle):
        if pos is not None:
            print("%5d:" % pos, end=' ', file=out)

        line = "%-4s %s%s" % (repr(opcode.code)[1:-1],
                              indentchunk * len(markstack),
                              opcode.name)

        maxproto = max(maxproto, opcode.proto)
        before = opcode.stack_before    # don't mutate
        after = opcode.stack_after      # don't mutate
        numtopop = len(before)

        # See whether a MARK should be popped.
        markmsg = None
        if markobject in before or (opcode.name == "POP" and
                                    stack and
                                    stack[-1] is markobject):
            assert markobject not in after
            if __debug__:
                if markobject in before:
                    assert before[-1] is stackslice
            if markstack:
                markpos = markstack.pop()
                if markpos is None:
                    markmsg = "(MARK at unknown opcode offset)"
                else:
                    markmsg = "(MARK at %d)" % markpos
                # Pop everything at and after the topmost markobject.
                while stack[-1] is not markobject:
                    stack.pop()
                stack.pop()
                # Stop later code from popping too much.
                try:
                    numtopop = before.index(markobject)
                except ValueError:
                    assert opcode.name == "POP"
                    numtopop = 0
            else:
                errormsg = markmsg = "no MARK exists on stack"

        # Check for correct memo usage.
        if opcode.name in ("PUT", "BINPUT", "LONG_BINPUT"):
            assert arg is not None
            if arg in memo:
                errormsg = "memo key %r already defined" % arg
            elif not stack:
                errormsg = "stack is empty -- can't store into memo"
            elif stack[-1] is markobject:
                errormsg = "can't store markobject in the memo"
            else:
                memo[arg] = stack[-1]

        elif opcode.name in ("GET", "BINGET", "LONG_BINGET"):
            if arg in memo:
                assert len(after) == 1
                after = [memo[arg]]     # for better stack emulation
            else:
                errormsg = "memo key %r has never been stored into" % arg

        if arg is not None or markmsg:
            # make a mild effort to align arguments
            line += ' ' * (10 - len(opcode.name))
            if arg is not None:
                line += ' ' + repr(arg)
            if markmsg:
                line += ' ' + markmsg
        print(line, file=out)

        if errormsg:
            # Note that we delayed complaining until the offending opcode
            # was printed.
            raise ValueError(errormsg)

        # Emulate the stack effects.
        if len(stack) < numtopop:
            raise ValueError("tries to pop %d items from stack with "
                             "only %d items" % (numtopop, len(stack)))
        if numtopop:
            del stack[-numtopop:]
        if markobject in after:
            assert markobject not in before
            markstack.append(pos)

        stack.extend(after)

    print("highest protocol among opcodes =", maxproto, file=out)
    if stack:
        raise ValueError("stack not empty after STOP: %r" % stack)

# For use in the doctest, simply as an example of a class to pickle.
class _Example:
    def __init__(self, value):
        self.value = value

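# Illustrative sketch, not used elsewhere in this module: the layout of one
# line of dis() output, rebuilt from the same format strings dis() uses, so
# the expected output in _dis_test below can be checked by hand. The helper
# name and its `depth` parameter (the number of open MARKs) are hypothetical.
def _sketch_format_dis_line(pos, code, depth, name, arg=None, markmsg=None,
                            indentlevel=4):
    line = "%5d: " % pos                        # dis() prints "%5d:" then a blank
    line += "%-4s %s%s" % (repr(code)[1:-1],    # escaped opcode character
                           ' ' * indentlevel * depth,
                           name)
    if arg is not None or markmsg:
        line += ' ' * (10 - len(name))          # mild effort to align arguments
        if arg is not None:
            line += ' ' + repr(arg)
        if markmsg:
            line += ' ' + markmsg
    return line
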
_dis_test = r"""
>>> import pickle
>>> x = [1, 2, (3, 4), {bytes(b'abc'): "def"}]
>>> pkl = pickle.dumps(x, 0)
>>> dis(pkl)
    0: (    MARK
    1: l        LIST       (MARK at 0)
    2: p    PUT        0
    5: L    LONG       1
    8: a    APPEND
    9: L    LONG       2
   12: a    APPEND
   13: (    MARK
   14: L        LONG       3
   17: L        LONG       4
   20: t        TUPLE      (MARK at 13)
   21: p    PUT        1
   24: a    APPEND
   25: (    MARK
   26: d        DICT       (MARK at 25)
   27: p    PUT        2
   30: S    STRING     'abc'
   37: p    PUT        3
   40: V    UNICODE    'def'
   45: p    PUT        4
   48: s    SETITEM
   49: a    APPEND
   50: .    STOP
highest protocol among opcodes = 0

Try again with a "binary" pickle.

>>> pkl = pickle.dumps(x, 1)
>>> dis(pkl)
    0: ]    EMPTY_LIST
    1: q    BINPUT     0
    3: (    MARK
    4: K        BININT1    1
    6: K        BININT1    2
    8: (        MARK
    9: K            BININT1    3
   11: K            BININT1    4
   13: t            TUPLE      (MARK at 8)
   14: q        BINPUT     1
   16: }        EMPTY_DICT
   17: q        BINPUT     2
   19: U        SHORT_BINSTRING 'abc'
   24: q        BINPUT     3
   26: X        BINUNICODE 'def'
   34: q        BINPUT     4
   36: s        SETITEM
   37: e        APPENDS    (MARK at 3)
   38: .    STOP
highest protocol among opcodes = 1

Exercise the GLOBAL/REDUCE/BUILD family.

>>> import random
>>> dis(pickle.dumps(random.getrandbits, 0))
    0: c    GLOBAL     'random getrandbits'
   20: p    PUT        0
   23: .    STOP
highest protocol among opcodes = 0

>>> from pickletools import _Example
>>> x = [_Example(42)] * 2
>>> dis(pickle.dumps(x, 0))
    0: (    MARK
    1: l        LIST       (MARK at 0)
    2: p    PUT        0
    5: c    GLOBAL     'copy_reg _reconstructor'
   30: p    PUT        1
   33: (    MARK
   34: c        GLOBAL     'pickletools _Example'
   56: p        PUT        2
   59: c        GLOBAL     '__builtin__ object'
   79: p        PUT        3
   82: N        NONE
   83: t        TUPLE      (MARK at 33)
   84: p    PUT        4
   87: R    REDUCE
   88: p    PUT        5
   91: (    MARK
   92: d        DICT       (MARK at 91)
   93: p    PUT        6
   96: V    UNICODE    'value'
  103: p    PUT        7
  106: L    LONG       42
  110: s    SETITEM
  111: b    BUILD
  112: a    APPEND
  113: g    GET        5
  116: a    APPEND
  117: .    STOP
highest protocol among opcodes = 0

>>> dis(pickle.dumps(x, 1))
    0: ]    EMPTY_LIST
    1: q    BINPUT     0
    3: (    MARK
    4: c        GLOBAL     'copy_reg _reconstructor'
   29: q        BINPUT     1
   31: (        MARK
   32: c            GLOBAL     'pickletools _Example'
   54: q            BINPUT     2
   56: c            GLOBAL     '__builtin__ object'
   76: q            BINPUT     3
   78: N            NONE
   79: t            TUPLE      (MARK at 31)
   80: q        BINPUT     4
   82: R        REDUCE
   83: q        BINPUT     5
   85: }        EMPTY_DICT
   86: q        BINPUT     6
   88: X        BINUNICODE 'value'
   98: q        BINPUT     7
  100: K        BININT1    42
  102: s        SETITEM
  103: b        BUILD
  104: h        BINGET     5
  106: e        APPENDS    (MARK at 3)
  107: .    STOP
highest protocol among opcodes = 1

Try "the canonical" recursive-object test.

>>> L = []
>>> T = L,
>>> L.append(T)
>>> L[0] is T
True
>>> T[0] is L
True
>>> L[0][0] is L
True
>>> T[0][0] is T
True
>>> dis(pickle.dumps(L, 0))
    0: (    MARK
    1: l        LIST       (MARK at 0)
    2: p    PUT        0
    5: (    MARK
    6: g        GET        0
    9: t        TUPLE      (MARK at 5)
   10: p    PUT        1
   13: a    APPEND
   14: .    STOP
highest protocol among opcodes = 0

>>> dis(pickle.dumps(L, 1))
    0: ]    EMPTY_LIST
    1: q    BINPUT     0
    3: (    MARK
    4: h        BINGET     0
    6: t        TUPLE      (MARK at 3)
    7: q    BINPUT     1
    9: a    APPEND
   10: .    STOP
highest protocol among opcodes = 1

Note that, in the protocol 0 pickle of the recursive tuple, the disassembler
has to emulate the stack in order to realize that the POP opcode at 16 gets
rid of the MARK at 0.

>>> dis(pickle.dumps(T, 0))
    0: (    MARK
    1: (        MARK
    2: l            LIST       (MARK at 1)
    3: p        PUT        0
    6: (        MARK
    7: g            GET        0
   10: t            TUPLE      (MARK at 6)
   11: p        PUT        1
   14: a        APPEND
   15: 0        POP
   16: 0        POP        (MARK at 0)
   17: g    GET        1
   20: .    STOP
highest protocol among opcodes = 0

>>> dis(pickle.dumps(T, 1))
    0: (    MARK
    1: ]        EMPTY_LIST
    2: q        BINPUT     0
    4: (        MARK
    5: h            BINGET     0
    7: t            TUPLE      (MARK at 4)
    8: q        BINPUT     1
   10: a        APPEND
   11: 1        POP_MARK   (MARK at 0)
   12: h    BINGET     1
   14: .    STOP
highest protocol among opcodes = 1

Try protocol 2.

>>> dis(pickle.dumps(L, 2))
    0: \x80 PROTO      2
    2: ]    EMPTY_LIST
    3: q    BINPUT     0
    5: h    BINGET     0
    7: \x85 TUPLE1
    8: q    BINPUT     1
   10: a    APPEND
   11: .    STOP
highest protocol among opcodes = 2

>>> dis(pickle.dumps(T, 2))
    0: \x80 PROTO      2
    2: ]    EMPTY_LIST
    3: q    BINPUT     0
    5: h    BINGET     0
    7: \x85 TUPLE1
    8: q    BINPUT     1
   10: a    APPEND
   11: 0    POP
   12: h    BINGET     1
   14: .    STOP
highest protocol among opcodes = 2
"""

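# Illustrative sketch, not used elsewhere in this module: the property the
# recursive-object examples in _dis_test exist to exercise, checked with the
# public pickle API alone. Unpickling restores the cycle itself, so the
# loaded list holds a tuple that holds that same list. The helper name is
# hypothetical.
import pickle

def _sketch_recursive_roundtrip(protocol):
    L = []
    T = L,
    L.append(T)
    L2 = pickle.loads(pickle.dumps(L, protocol))
    T2 = pickle.loads(pickle.dumps(T, protocol))
    return L2, T2
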
_memo_test = r"""
>>> import pickle
>>> import io
>>> f = io.BytesIO()
>>> p = pickle.Pickler(f, 2)
>>> x = [1, 2, 3]
>>> p.dump(x)
>>> p.dump(x)
>>> f.seek(0)
0
>>> memo = {}
>>> dis(f, memo=memo)
    0: \x80 PROTO      2
    2: ]    EMPTY_LIST
    3: q    BINPUT     0
    5: (    MARK
    6: K        BININT1    1
    8: K        BININT1    2
   10: K        BININT1    3
   12: e        APPENDS    (MARK at 5)
   13: .    STOP
highest protocol among opcodes = 2
>>> dis(f, memo=memo)
   14: \x80 PROTO      2
   16: h    BINGET     0
   18: .    STOP
highest protocol among opcodes = 2
"""

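# Illustrative sketch, not used elsewhere in this module: the memo
# bookkeeping dis() performs, reduced (as an assumption) to (name, arg)
# pairs with no stack emulation. Passing the same `memo` dict to a second
# call lets a later pickle's GET resolve a key stored by an earlier one,
# which is what _memo_test above demonstrates with dis(). The helper names
# are hypothetical, and stored values are placeholders, not real objects.
_SKETCH_PUTS = ('PUT', 'BINPUT', 'LONG_BINPUT')
_SKETCH_GETS = ('GET', 'BINGET', 'LONG_BINGET')

def _sketch_check_memo(ops, memo=None):
    if memo is None:
        memo = {}
    for name, arg in ops:
        if name in _SKETCH_PUTS:
            if arg in memo:
                raise ValueError("memo key %r already defined" % arg)
            memo[arg] = name    # placeholder for the stored stack object
        elif name in _SKETCH_GETS and arg not in memo:
            raise ValueError("memo key %r has never been stored into" % arg)
    return memo
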
__test__ = {'disassembler_test': _dis_test,
            'disassembler_memo_test': _memo_test,
           }

def _test():
    import doctest
    return doctest.testmod()

if __name__ == "__main__":
    _test()