blob: cf5df4158a0ac4fe4cd94d106e18a7d16a87ebc5 [file] [log] [blame]
Skip Montanaro54455942003-01-29 15:41:33 +00001'''"Executable documentation" for the pickle module.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002
3Extensive comments about the pickle protocols and pickle-machine opcodes
4can be found here. Some functions meant for external use:
5
6genops(pickle)
7 Generate all the opcodes in a pickle, as (opcode, arg, position) triples.
8
Andrew M. Kuchlingd0c53fe2004-08-07 16:51:30 +00009dis(pickle, out=None, memo=None, indentlevel=4)
Tim Peters8ecfc8e2003-01-27 18:51:48 +000010 Print a symbolic disassembly of a pickle.
Skip Montanaro54455942003-01-29 15:41:33 +000011'''
Tim Peters8ecfc8e2003-01-27 18:51:48 +000012
Walter Dörwald42748a82007-06-12 16:40:17 +000013import codecs
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +010014import io
Guido van Rossum98297ee2007-11-06 21:34:58 +000015import pickle
16import re
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -070017import sys
Walter Dörwald42748a82007-06-12 16:40:17 +000018
Christian Heimes3feef612008-02-11 06:19:17 +000019__all__ = ['dis', 'genops', 'optimize']
Tim Peters90cf2122004-11-06 23:45:48 +000020
Guido van Rossum98297ee2007-11-06 21:34:58 +000021bytes_types = pickle.bytes_types
22
Tim Peters8ecfc8e2003-01-27 18:51:48 +000023# Other ideas:
24#
25# - A pickle verifier: read a pickle and check it exhaustively for
Tim Petersc1c2b3e2003-01-29 20:12:21 +000026# well-formedness. dis() does a lot of this already.
Tim Peters8ecfc8e2003-01-27 18:51:48 +000027#
28# - A protocol identifier: examine a pickle and return its protocol number
29# (== the highest .proto attr value among all the opcodes in the pickle).
Tim Petersc1c2b3e2003-01-29 20:12:21 +000030# dis() already prints this info at the end.
Tim Peters8ecfc8e2003-01-27 18:51:48 +000031#
32# - A pickle optimizer: for example, tuple-building code is sometimes more
33# elaborate than necessary, catering for the possibility that the tuple
34# is recursive. Or lots of times a PUT is generated that's never accessed
35# by a later GET.
36
37
Victor Stinner765531d2013-03-26 01:11:54 +010038# "A pickle" is a program for a virtual pickle machine (PM, but more accurately
39# called an unpickling machine). It's a sequence of opcodes, interpreted by the
40# PM, building an arbitrarily complex Python object.
41#
42# For the most part, the PM is very simple: there are no looping, testing, or
43# conditional instructions, no arithmetic and no function calls. Opcodes are
44# executed once each, from first to last, until a STOP opcode is reached.
45#
46# The PM has two data areas, "the stack" and "the memo".
47#
48# Many opcodes push Python objects onto the stack; e.g., INT pushes a Python
49# integer object on the stack, whose value is gotten from a decimal string
50# literal immediately following the INT opcode in the pickle bytestream. Other
51# opcodes take Python objects off the stack. The result of unpickling is
52# whatever object is left on the stack when the final STOP opcode is executed.
53#
54# The memo is simply an array of objects, or it can be implemented as a dict
55# mapping little integers to objects. The memo serves as the PM's "long term
56# memory", and the little integers indexing the memo are akin to variable
57# names. Some opcodes pop a stack object into the memo at a given index,
58# and others push a memo object at a given index onto the stack again.
59#
60# At heart, that's all the PM has. Subtleties arise for these reasons:
61#
62# + Object identity. Objects can be arbitrarily complex, and subobjects
63# may be shared (for example, the list [a, a] refers to the same object a
64# twice). It can be vital that unpickling recreate an isomorphic object
65# graph, faithfully reproducing sharing.
66#
67# + Recursive objects. For example, after "L = []; L.append(L)", L is a
68# list, and L[0] is the same list. This is related to the object identity
69# point, and some sequences of pickle opcodes are subtle in order to
70# get the right result in all cases.
71#
72# + Things pickle doesn't know everything about. Examples of things pickle
73# does know everything about are Python's builtin scalar and container
74# types, like ints and tuples. They generally have opcodes dedicated to
75# them. For things like module references and instances of user-defined
76# classes, pickle's knowledge is limited. Historically, many enhancements
77# have been made to the pickle protocol in order to do a better (faster,
78# and/or more compact) job on those.
79#
80# + Backward compatibility and micro-optimization. As explained below,
81# pickle opcodes never go away, not even when better ways to do a thing
82# get invented. The repertoire of the PM just keeps growing over time.
83# For example, protocol 0 had two opcodes for building Python integers (INT
84# and LONG), protocol 1 added three more for more-efficient pickling of short
85# integers, and protocol 2 added two more for more-efficient pickling of
86# long integers (before protocol 2, the only ways to pickle a Python long
87# took time quadratic in the number of digits, for both pickling and
88# unpickling). "Opcode bloat" isn't so much a subtlety as a source of
89# wearying complication.
90#
91#
92# Pickle protocols:
93#
94# For compatibility, the meaning of a pickle opcode never changes. Instead new
95# pickle opcodes get added, and each version's unpickler can handle all the
96# pickle opcodes in all protocol versions to date. So old pickles continue to
97# be readable forever. The pickler can generally be told to restrict itself to
98# the subset of opcodes available under previous protocol versions too, so that
99# users can create pickles under the current version readable by older
100# versions. However, a pickle does not contain its version number embedded
101# within it. If an older unpickler tries to read a pickle using a later
102# protocol, the result is most likely an exception due to seeing an unknown (in
103# the older unpickler) opcode.
104#
105# The original pickle used what's now called "protocol 0", and what was called
106# "text mode" before Python 2.3. The entire pickle bytestream is made up of
107# printable 7-bit ASCII characters, plus the newline character, in protocol 0.
108# That's why it was called text mode. Protocol 0 is small and elegant, but
109# sometimes painfully inefficient.
110#
111# The second major set of additions is now called "protocol 1", and was called
112# "binary mode" before Python 2.3. This added many opcodes with arguments
113# consisting of arbitrary bytes, including NUL bytes and unprintable "high bit"
114# bytes. Binary mode pickles can be substantially smaller than equivalent
115# text mode pickles, and sometimes faster too; e.g., BININT represents a 4-byte
116# int as 4 bytes following the opcode, which is cheaper to unpickle than the
117# (perhaps) 11-character decimal string attached to INT. Protocol 1 also added
118# a number of opcodes that operate on many stack elements at once (like APPENDS
119# and SETITEMS), and "shortcut" opcodes (like EMPTY_DICT and EMPTY_TUPLE).
120#
121# The third major set of additions came in Python 2.3, and is called "protocol
122# 2". This added:
123#
124# - A better way to pickle instances of new-style classes (NEWOBJ).
125#
126# - A way for a pickle to identify its protocol (PROTO).
127#
128# - Time- and space- efficient pickling of long ints (LONG{1,4}).
129#
130# - Shortcuts for small tuples (TUPLE{1,2,3}}.
131#
132# - Dedicated opcodes for bools (NEWTRUE, NEWFALSE).
133#
134# - The "extension registry", a vector of popular objects that can be pushed
135# efficiently by index (EXT{1,2,4}). This is akin to the memo and GET, but
136# the registry contents are predefined (there's nothing akin to the memo's
137# PUT).
138#
139# Another independent change with Python 2.3 is the abandonment of any
140# pretense that it might be safe to load pickles received from untrusted
141# parties -- no sufficient security analysis has been done to guarantee
142# this and there isn't a use case that warrants the expense of such an
143# analysis.
144#
145# To this end, all tests for __safe_for_unpickling__ or for
146# copyreg.safe_constructors are removed from the unpickling code.
147# References to these variables in the descriptions below are to be seen
148# as describing unpickling in Python 2.2 and before.
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000149
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000150
151# Meta-rule: Descriptions are stored in instances of descriptor objects,
152# with plain constructors. No meta-language is defined from which
153# descriptors could be constructed. If you want, e.g., XML, write a little
154# program to generate XML from the objects.
155
156##############################################################################
157# Some pickle opcodes have an argument, following the opcode in the
158# bytestream. An argument is of a specific type, described by an instance
159# of ArgumentDescriptor. These are not to be confused with arguments taken
160# off the stack -- ArgumentDescriptor applies only to arguments embedded in
161# the opcode stream, immediately following an opcode.
162
163# Represents the number of bytes consumed by an argument delimited by the
164# next newline character.
165UP_TO_NEWLINE = -1
166
167# Represents the number of bytes consumed by a two-argument opcode where
168# the first argument gives the number of bytes in the second argument.
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -0700169TAKEN_FROM_ARGUMENT1 = -2 # num bytes is 1-byte unsigned int
170TAKEN_FROM_ARGUMENT4 = -3 # num bytes is 4-byte signed little-endian int
171TAKEN_FROM_ARGUMENT4U = -4 # num bytes is 4-byte unsigned little-endian int
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100172TAKEN_FROM_ARGUMENT8U = -5 # num bytes is 8-byte unsigned little-endian int
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000173
174class ArgumentDescriptor(object):
175 __slots__ = (
176 # name of descriptor record, also a module global name; a string
177 'name',
178
179 # length of argument, in bytes; an int; UP_TO_NEWLINE and
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100180 # TAKEN_FROM_ARGUMENT{1,4,8} are negative values for variable-length
Tim Petersfdb8cfa2003-01-28 00:13:19 +0000181 # cases
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000182 'n',
183
184 # a function taking a file-like object, reading this kind of argument
185 # from the object at the current position, advancing the current
186 # position by n bytes, and returning the value of the argument
187 'reader',
188
189 # human-readable docs for this arg descriptor; a string
190 'doc',
191 )
192
193 def __init__(self, name, n, reader, doc):
194 assert isinstance(name, str)
195 self.name = name
196
197 assert isinstance(n, int) and (n >= 0 or
Tim Petersfdb8cfa2003-01-28 00:13:19 +0000198 n in (UP_TO_NEWLINE,
199 TAKEN_FROM_ARGUMENT1,
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -0700200 TAKEN_FROM_ARGUMENT4,
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100201 TAKEN_FROM_ARGUMENT4U,
202 TAKEN_FROM_ARGUMENT8U))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000203 self.n = n
204
205 self.reader = reader
206
207 assert isinstance(doc, str)
208 self.doc = doc
209
210from struct import unpack as _unpack
211
212def read_uint1(f):
Tim Peters55762f52003-01-28 16:01:25 +0000213 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000214 >>> import io
215 >>> read_uint1(io.BytesIO(b'\xff'))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000216 255
217 """
218
219 data = f.read(1)
220 if data:
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000221 return data[0]
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000222 raise ValueError("not enough data in stream to read uint1")
223
224uint1 = ArgumentDescriptor(
225 name='uint1',
226 n=1,
227 reader=read_uint1,
228 doc="One-byte unsigned integer.")
229
230
231def read_uint2(f):
Tim Peters55762f52003-01-28 16:01:25 +0000232 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000233 >>> import io
234 >>> read_uint2(io.BytesIO(b'\xff\x00'))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000235 255
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000236 >>> read_uint2(io.BytesIO(b'\xff\xff'))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000237 65535
238 """
239
240 data = f.read(2)
241 if len(data) == 2:
242 return _unpack("<H", data)[0]
243 raise ValueError("not enough data in stream to read uint2")
244
245uint2 = ArgumentDescriptor(
246 name='uint2',
247 n=2,
248 reader=read_uint2,
249 doc="Two-byte unsigned integer, little-endian.")
250
251
252def read_int4(f):
Tim Peters55762f52003-01-28 16:01:25 +0000253 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000254 >>> import io
255 >>> read_int4(io.BytesIO(b'\xff\x00\x00\x00'))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000256 255
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000257 >>> read_int4(io.BytesIO(b'\x00\x00\x00\x80')) == -(2**31)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000258 True
259 """
260
261 data = f.read(4)
262 if len(data) == 4:
263 return _unpack("<i", data)[0]
264 raise ValueError("not enough data in stream to read int4")
265
266int4 = ArgumentDescriptor(
267 name='int4',
268 n=4,
269 reader=read_int4,
270 doc="Four-byte signed integer, little-endian, 2's complement.")
271
272
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -0700273def read_uint4(f):
274 r"""
275 >>> import io
276 >>> read_uint4(io.BytesIO(b'\xff\x00\x00\x00'))
277 255
278 >>> read_uint4(io.BytesIO(b'\x00\x00\x00\x80')) == 2**31
279 True
280 """
281
282 data = f.read(4)
283 if len(data) == 4:
284 return _unpack("<I", data)[0]
285 raise ValueError("not enough data in stream to read uint4")
286
287uint4 = ArgumentDescriptor(
288 name='uint4',
289 n=4,
290 reader=read_uint4,
291 doc="Four-byte unsigned integer, little-endian.")
292
293
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100294def read_uint8(f):
295 r"""
296 >>> import io
297 >>> read_uint8(io.BytesIO(b'\xff\x00\x00\x00\x00\x00\x00\x00'))
298 255
299 >>> read_uint8(io.BytesIO(b'\xff' * 8)) == 2**64-1
300 True
301 """
302
303 data = f.read(8)
304 if len(data) == 8:
305 return _unpack("<Q", data)[0]
306 raise ValueError("not enough data in stream to read uint8")
307
308uint8 = ArgumentDescriptor(
309 name='uint8',
310 n=8,
311 reader=read_uint8,
312 doc="Eight-byte unsigned integer, little-endian.")
313
314
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000315def read_stringnl(f, decode=True, stripquotes=True):
Tim Peters55762f52003-01-28 16:01:25 +0000316 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000317 >>> import io
318 >>> read_stringnl(io.BytesIO(b"'abcd'\nefg\n"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000319 'abcd'
320
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000321 >>> read_stringnl(io.BytesIO(b"\n"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000322 Traceback (most recent call last):
323 ...
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000324 ValueError: no string quotes around b''
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000325
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000326 >>> read_stringnl(io.BytesIO(b"\n"), stripquotes=False)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000327 ''
328
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000329 >>> read_stringnl(io.BytesIO(b"''\n"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000330 ''
331
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000332 >>> read_stringnl(io.BytesIO(b'"abcd"'))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000333 Traceback (most recent call last):
334 ...
335 ValueError: no newline found when trying to read stringnl
336
337 Embedded escapes are undone in the result.
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000338 >>> read_stringnl(io.BytesIO(br"'a\n\\b\x00c\td'" + b"\n'e'"))
Tim Peters55762f52003-01-28 16:01:25 +0000339 'a\n\\b\x00c\td'
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000340 """
341
Guido van Rossum26986312007-07-17 00:19:46 +0000342 data = f.readline()
Guido van Rossum26d95c32007-08-27 23:18:54 +0000343 if not data.endswith(b'\n'):
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000344 raise ValueError("no newline found when trying to read stringnl")
345 data = data[:-1] # lose the newline
346
347 if stripquotes:
Guido van Rossum26d95c32007-08-27 23:18:54 +0000348 for q in (b'"', b"'"):
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000349 if data.startswith(q):
350 if not data.endswith(q):
351 raise ValueError("strinq quote %r not found at both "
352 "ends of %r" % (q, data))
353 data = data[1:-1]
354 break
355 else:
356 raise ValueError("no string quotes around %r" % data)
357
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000358 if decode:
Guido van Rossum98297ee2007-11-06 21:34:58 +0000359 data = codecs.escape_decode(data)[0].decode("ascii")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000360 return data
361
362stringnl = ArgumentDescriptor(
363 name='stringnl',
364 n=UP_TO_NEWLINE,
365 reader=read_stringnl,
366 doc="""A newline-terminated string.
367
368 This is a repr-style string, with embedded escapes, and
369 bracketing quotes.
370 """)
371
372def read_stringnl_noescape(f):
Guido van Rossum98297ee2007-11-06 21:34:58 +0000373 return read_stringnl(f, stripquotes=False)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000374
375stringnl_noescape = ArgumentDescriptor(
376 name='stringnl_noescape',
377 n=UP_TO_NEWLINE,
378 reader=read_stringnl_noescape,
379 doc="""A newline-terminated string.
380
381 This is a str-style string, without embedded escapes,
382 or bracketing quotes. It should consist solely of
383 printable ASCII characters.
384 """)
385
386def read_stringnl_noescape_pair(f):
Tim Peters55762f52003-01-28 16:01:25 +0000387 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000388 >>> import io
389 >>> read_stringnl_noescape_pair(io.BytesIO(b"Queue\nEmpty\njunk"))
Tim Petersd916cf42003-01-27 19:01:47 +0000390 'Queue Empty'
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000391 """
392
Tim Petersd916cf42003-01-27 19:01:47 +0000393 return "%s %s" % (read_stringnl_noescape(f), read_stringnl_noescape(f))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000394
395stringnl_noescape_pair = ArgumentDescriptor(
396 name='stringnl_noescape_pair',
397 n=UP_TO_NEWLINE,
398 reader=read_stringnl_noescape_pair,
399 doc="""A pair of newline-terminated strings.
400
401 These are str-style strings, without embedded
402 escapes, or bracketing quotes. They should
403 consist solely of printable ASCII characters.
404 The pair is returned as a single string, with
Tim Petersd916cf42003-01-27 19:01:47 +0000405 a single blank separating the two strings.
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000406 """)
407
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100408
409def read_string1(f):
410 r"""
411 >>> import io
412 >>> read_string1(io.BytesIO(b"\x00"))
413 ''
414 >>> read_string1(io.BytesIO(b"\x03abcdef"))
415 'abc'
416 """
417
418 n = read_uint1(f)
419 assert n >= 0
420 data = f.read(n)
421 if len(data) == n:
422 return data.decode("latin-1")
423 raise ValueError("expected %d bytes in a string1, but only %d remain" %
424 (n, len(data)))
425
426string1 = ArgumentDescriptor(
427 name="string1",
428 n=TAKEN_FROM_ARGUMENT1,
429 reader=read_string1,
430 doc="""A counted string.
431
432 The first argument is a 1-byte unsigned int giving the number
433 of bytes in the string, and the second argument is that many
434 bytes.
435 """)
436
437
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000438def read_string4(f):
Tim Peters55762f52003-01-28 16:01:25 +0000439 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000440 >>> import io
441 >>> read_string4(io.BytesIO(b"\x00\x00\x00\x00abc"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000442 ''
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000443 >>> read_string4(io.BytesIO(b"\x03\x00\x00\x00abcdef"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000444 'abc'
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000445 >>> read_string4(io.BytesIO(b"\x00\x00\x00\x03abcdef"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000446 Traceback (most recent call last):
447 ...
448 ValueError: expected 50331648 bytes in a string4, but only 6 remain
449 """
450
451 n = read_int4(f)
452 if n < 0:
453 raise ValueError("string4 byte count < 0: %d" % n)
454 data = f.read(n)
455 if len(data) == n:
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000456 return data.decode("latin-1")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000457 raise ValueError("expected %d bytes in a string4, but only %d remain" %
458 (n, len(data)))
459
460string4 = ArgumentDescriptor(
461 name="string4",
Tim Petersfdb8cfa2003-01-28 00:13:19 +0000462 n=TAKEN_FROM_ARGUMENT4,
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000463 reader=read_string4,
464 doc="""A counted string.
465
466 The first argument is a 4-byte little-endian signed int giving
467 the number of bytes in the string, and the second argument is
468 that many bytes.
469 """)
470
471
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100472def read_bytes1(f):
Tim Peters55762f52003-01-28 16:01:25 +0000473 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000474 >>> import io
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100475 >>> read_bytes1(io.BytesIO(b"\x00"))
476 b''
477 >>> read_bytes1(io.BytesIO(b"\x03abcdef"))
478 b'abc'
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000479 """
480
481 n = read_uint1(f)
482 assert n >= 0
483 data = f.read(n)
484 if len(data) == n:
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100485 return data
486 raise ValueError("expected %d bytes in a bytes1, but only %d remain" %
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000487 (n, len(data)))
488
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100489bytes1 = ArgumentDescriptor(
490 name="bytes1",
Tim Petersfdb8cfa2003-01-28 00:13:19 +0000491 n=TAKEN_FROM_ARGUMENT1,
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100492 reader=read_bytes1,
493 doc="""A counted bytes string.
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000494
495 The first argument is a 1-byte unsigned int giving the number
496 of bytes in the string, and the second argument is that many
497 bytes.
498 """)
499
500
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -0700501def read_bytes1(f):
502 r"""
503 >>> import io
504 >>> read_bytes1(io.BytesIO(b"\x00"))
505 b''
506 >>> read_bytes1(io.BytesIO(b"\x03abcdef"))
507 b'abc'
508 """
509
510 n = read_uint1(f)
511 assert n >= 0
512 data = f.read(n)
513 if len(data) == n:
514 return data
515 raise ValueError("expected %d bytes in a bytes1, but only %d remain" %
516 (n, len(data)))
517
518bytes1 = ArgumentDescriptor(
519 name="bytes1",
520 n=TAKEN_FROM_ARGUMENT1,
521 reader=read_bytes1,
522 doc="""A counted bytes string.
523
524 The first argument is a 1-byte unsigned int giving the number
525 of bytes, and the second argument is that many bytes.
526 """)
527
528
529def read_bytes4(f):
530 r"""
531 >>> import io
532 >>> read_bytes4(io.BytesIO(b"\x00\x00\x00\x00abc"))
533 b''
534 >>> read_bytes4(io.BytesIO(b"\x03\x00\x00\x00abcdef"))
535 b'abc'
536 >>> read_bytes4(io.BytesIO(b"\x00\x00\x00\x03abcdef"))
537 Traceback (most recent call last):
538 ...
539 ValueError: expected 50331648 bytes in a bytes4, but only 6 remain
540 """
541
542 n = read_uint4(f)
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100543 assert n >= 0
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -0700544 if n > sys.maxsize:
545 raise ValueError("bytes4 byte count > sys.maxsize: %d" % n)
546 data = f.read(n)
547 if len(data) == n:
548 return data
549 raise ValueError("expected %d bytes in a bytes4, but only %d remain" %
550 (n, len(data)))
551
552bytes4 = ArgumentDescriptor(
553 name="bytes4",
554 n=TAKEN_FROM_ARGUMENT4U,
555 reader=read_bytes4,
556 doc="""A counted bytes string.
557
558 The first argument is a 4-byte little-endian unsigned int giving
559 the number of bytes, and the second argument is that many bytes.
560 """)
561
562
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100563def read_bytes8(f):
564 r"""
Gregory P. Smith057e58d2013-11-23 20:40:46 +0000565 >>> import io, struct, sys
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100566 >>> read_bytes8(io.BytesIO(b"\x00\x00\x00\x00\x00\x00\x00\x00abc"))
567 b''
568 >>> read_bytes8(io.BytesIO(b"\x03\x00\x00\x00\x00\x00\x00\x00abcdef"))
569 b'abc'
Gregory P. Smith057e58d2013-11-23 20:40:46 +0000570 >>> bigsize8 = struct.pack("<Q", sys.maxsize//3)
571 >>> read_bytes8(io.BytesIO(bigsize8 + b"abcdef")) #doctest: +ELLIPSIS
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100572 Traceback (most recent call last):
573 ...
Gregory P. Smith057e58d2013-11-23 20:40:46 +0000574 ValueError: expected ... bytes in a bytes8, but only 6 remain
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100575 """
576
577 n = read_uint8(f)
578 assert n >= 0
579 if n > sys.maxsize:
580 raise ValueError("bytes8 byte count > sys.maxsize: %d" % n)
581 data = f.read(n)
582 if len(data) == n:
583 return data
584 raise ValueError("expected %d bytes in a bytes8, but only %d remain" %
585 (n, len(data)))
586
587bytes8 = ArgumentDescriptor(
588 name="bytes8",
589 n=TAKEN_FROM_ARGUMENT8U,
590 reader=read_bytes8,
591 doc="""A counted bytes string.
592
593 The first argument is a 8-byte little-endian unsigned int giving
594 the number of bytes, and the second argument is that many bytes.
595 """)
596
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000597def read_unicodestringnl(f):
Tim Peters55762f52003-01-28 16:01:25 +0000598 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000599 >>> import io
600 >>> read_unicodestringnl(io.BytesIO(b"abc\\uabcd\njunk")) == 'abc\uabcd'
601 True
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000602 """
603
Guido van Rossum26986312007-07-17 00:19:46 +0000604 data = f.readline()
Guido van Rossum26d95c32007-08-27 23:18:54 +0000605 if not data.endswith(b'\n'):
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000606 raise ValueError("no newline found when trying to read "
607 "unicodestringnl")
608 data = data[:-1] # lose the newline
Guido van Rossumef87d6e2007-05-02 19:09:54 +0000609 return str(data, 'raw-unicode-escape')
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000610
611unicodestringnl = ArgumentDescriptor(
612 name='unicodestringnl',
613 n=UP_TO_NEWLINE,
614 reader=read_unicodestringnl,
615 doc="""A newline-terminated Unicode string.
616
617 This is raw-unicode-escape encoded, so consists of
618 printable ASCII characters, and may contain embedded
619 escape sequences.
620 """)
621
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100622
623def read_unicodestring1(f):
624 r"""
625 >>> import io
626 >>> s = 'abcd\uabcd'
627 >>> enc = s.encode('utf-8')
628 >>> enc
629 b'abcd\xea\xaf\x8d'
630 >>> n = bytes([len(enc)]) # little-endian 1-byte length
631 >>> t = read_unicodestring1(io.BytesIO(n + enc + b'junk'))
632 >>> s == t
633 True
634
635 >>> read_unicodestring1(io.BytesIO(n + enc[:-1]))
636 Traceback (most recent call last):
637 ...
638 ValueError: expected 7 bytes in a unicodestring1, but only 6 remain
639 """
640
641 n = read_uint1(f)
642 assert n >= 0
643 data = f.read(n)
644 if len(data) == n:
645 return str(data, 'utf-8', 'surrogatepass')
646 raise ValueError("expected %d bytes in a unicodestring1, but only %d "
647 "remain" % (n, len(data)))
648
649unicodestring1 = ArgumentDescriptor(
650 name="unicodestring1",
651 n=TAKEN_FROM_ARGUMENT1,
652 reader=read_unicodestring1,
653 doc="""A counted Unicode string.
654
655 The first argument is a 1-byte little-endian signed int
656 giving the number of bytes in the string, and the second
657 argument-- the UTF-8 encoding of the Unicode string --
658 contains that many bytes.
659 """)
660
661
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000662def read_unicodestring4(f):
Tim Peters55762f52003-01-28 16:01:25 +0000663 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000664 >>> import io
665 >>> s = 'abcd\uabcd'
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000666 >>> enc = s.encode('utf-8')
667 >>> enc
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000668 b'abcd\xea\xaf\x8d'
669 >>> n = bytes([len(enc), 0, 0, 0]) # little-endian 4-byte length
670 >>> t = read_unicodestring4(io.BytesIO(n + enc + b'junk'))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000671 >>> s == t
672 True
673
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000674 >>> read_unicodestring4(io.BytesIO(n + enc[:-1]))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000675 Traceback (most recent call last):
676 ...
677 ValueError: expected 7 bytes in a unicodestring4, but only 6 remain
678 """
679
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -0700680 n = read_uint4(f)
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100681 assert n >= 0
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -0700682 if n > sys.maxsize:
683 raise ValueError("unicodestring4 byte count > sys.maxsize: %d" % n)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000684 data = f.read(n)
685 if len(data) == n:
Victor Stinner485fb562010-04-13 11:07:24 +0000686 return str(data, 'utf-8', 'surrogatepass')
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000687 raise ValueError("expected %d bytes in a unicodestring4, but only %d "
688 "remain" % (n, len(data)))
689
690unicodestring4 = ArgumentDescriptor(
691 name="unicodestring4",
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -0700692 n=TAKEN_FROM_ARGUMENT4U,
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000693 reader=read_unicodestring4,
694 doc="""A counted Unicode string.
695
696 The first argument is a 4-byte little-endian signed int
697 giving the number of bytes in the string, and the second
698 argument-- the UTF-8 encoding of the Unicode string --
699 contains that many bytes.
700 """)
701
702
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100703def read_unicodestring8(f):
704 r"""
705 >>> import io
706 >>> s = 'abcd\uabcd'
707 >>> enc = s.encode('utf-8')
708 >>> enc
709 b'abcd\xea\xaf\x8d'
710 >>> n = bytes([len(enc)]) + bytes(7) # little-endian 8-byte length
711 >>> t = read_unicodestring8(io.BytesIO(n + enc + b'junk'))
712 >>> s == t
713 True
714
715 >>> read_unicodestring8(io.BytesIO(n + enc[:-1]))
716 Traceback (most recent call last):
717 ...
718 ValueError: expected 7 bytes in a unicodestring8, but only 6 remain
719 """
720
721 n = read_uint8(f)
722 assert n >= 0
723 if n > sys.maxsize:
724 raise ValueError("unicodestring8 byte count > sys.maxsize: %d" % n)
725 data = f.read(n)
726 if len(data) == n:
727 return str(data, 'utf-8', 'surrogatepass')
728 raise ValueError("expected %d bytes in a unicodestring8, but only %d "
729 "remain" % (n, len(data)))
730
731unicodestring8 = ArgumentDescriptor(
732 name="unicodestring8",
733 n=TAKEN_FROM_ARGUMENT8U,
734 reader=read_unicodestring8,
735 doc="""A counted Unicode string.
736
737 The first argument is a 8-byte little-endian signed int
738 giving the number of bytes in the string, and the second
739 argument-- the UTF-8 encoding of the Unicode string --
740 contains that many bytes.
741 """)
742
743
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000744def read_decimalnl_short(f):
Tim Peters55762f52003-01-28 16:01:25 +0000745 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000746 >>> import io
747 >>> read_decimalnl_short(io.BytesIO(b"1234\n56"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000748 1234
749
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000750 >>> read_decimalnl_short(io.BytesIO(b"1234L\n56"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000751 Traceback (most recent call last):
752 ...
Serhiy Storchaka95949422013-08-27 19:40:23 +0300753 ValueError: invalid literal for int() with base 10: b'1234L'
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000754 """
755
756 s = read_stringnl(f, decode=False, stripquotes=False)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000757
Serhiy Storchaka95949422013-08-27 19:40:23 +0300758 # There's a hack for True and False here.
Jeremy Hyltona5dc3db2007-08-29 19:07:40 +0000759 if s == b"00":
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000760 return False
Jeremy Hyltona5dc3db2007-08-29 19:07:40 +0000761 elif s == b"01":
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000762 return True
763
Florent Xicluna2bb96f52011-10-23 22:11:00 +0200764 return int(s)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000765
766def read_decimalnl_long(f):
Tim Peters55762f52003-01-28 16:01:25 +0000767 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000768 >>> import io
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000769
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000770 >>> read_decimalnl_long(io.BytesIO(b"1234L\n56"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000771 1234
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000772
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000773 >>> read_decimalnl_long(io.BytesIO(b"123456789012345678901234L\n6"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000774 123456789012345678901234
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000775 """
776
777 s = read_stringnl(f, decode=False, stripquotes=False)
Mark Dickinson8dd05142009-01-20 20:43:58 +0000778 if s[-1:] == b'L':
779 s = s[:-1]
Guido van Rossume2a383d2007-01-15 16:59:06 +0000780 return int(s)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000781
782
783decimalnl_short = ArgumentDescriptor(
784 name='decimalnl_short',
785 n=UP_TO_NEWLINE,
786 reader=read_decimalnl_short,
787 doc="""A newline-terminated decimal integer literal.
788
789 This never has a trailing 'L', and the integer fit
790 in a short Python int on the box where the pickle
791 was written -- but there's no guarantee it will fit
792 in a short Python int on the box where the pickle
793 is read.
794 """)
795
796decimalnl_long = ArgumentDescriptor(
797 name='decimalnl_long',
798 n=UP_TO_NEWLINE,
799 reader=read_decimalnl_long,
800 doc="""A newline-terminated decimal integer literal.
801
802 This has a trailing 'L', and can represent integers
803 of any size.
804 """)
805
806
807def read_floatnl(f):
Tim Peters55762f52003-01-28 16:01:25 +0000808 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000809 >>> import io
810 >>> read_floatnl(io.BytesIO(b"-1.25\n6"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000811 -1.25
812 """
813 s = read_stringnl(f, decode=False, stripquotes=False)
814 return float(s)
815
816floatnl = ArgumentDescriptor(
817 name='floatnl',
818 n=UP_TO_NEWLINE,
819 reader=read_floatnl,
820 doc="""A newline-terminated decimal floating literal.
821
822 In general this requires 17 significant digits for roundtrip
823 identity, and pickling then unpickling infinities, NaNs, and
824 minus zero doesn't work across boxes, or on some boxes even
825 on itself (e.g., Windows can't read the strings it produces
826 for infinities or NaNs).
827 """)
828
829def read_float8(f):
Tim Peters55762f52003-01-28 16:01:25 +0000830 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000831 >>> import io, struct
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000832 >>> raw = struct.pack(">d", -1.25)
833 >>> raw
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000834 b'\xbf\xf4\x00\x00\x00\x00\x00\x00'
835 >>> read_float8(io.BytesIO(raw + b"\n"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000836 -1.25
837 """
838
839 data = f.read(8)
840 if len(data) == 8:
841 return _unpack(">d", data)[0]
842 raise ValueError("not enough data in stream to read float8")
843
844
845float8 = ArgumentDescriptor(
846 name='float8',
847 n=8,
848 reader=read_float8,
849 doc="""An 8-byte binary representation of a float, big-endian.
850
851 The format is unique to Python, and shared with the struct
Guido van Rossum99603b02007-07-20 00:22:32 +0000852 module (format string '>d') "in theory" (the struct and pickle
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000853 implementations don't share the code -- they should). It's
854 strongly related to the IEEE-754 double format, and, in normal
855 cases, is in fact identical to the big-endian 754 double format.
856 On other boxes the dynamic range is limited to that of a 754
857 double, and "add a half and chop" rounding is used to reduce
858 the precision to 53 bits. However, even on a 754 box,
859 infinities, NaNs, and minus zero may not be handled correctly
860 (may not survive roundtrip pickling intact).
861 """)
862
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000863# Protocol 2 formats
864
Tim Petersc0c12b52003-01-29 00:56:17 +0000865from pickle import decode_long
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000866
867def read_long1(f):
868 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000869 >>> import io
870 >>> read_long1(io.BytesIO(b"\x00"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000871 0
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000872 >>> read_long1(io.BytesIO(b"\x02\xff\x00"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000873 255
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000874 >>> read_long1(io.BytesIO(b"\x02\xff\x7f"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000875 32767
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000876 >>> read_long1(io.BytesIO(b"\x02\x00\xff"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000877 -256
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000878 >>> read_long1(io.BytesIO(b"\x02\x00\x80"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000879 -32768
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000880 """
881
882 n = read_uint1(f)
883 data = f.read(n)
884 if len(data) != n:
885 raise ValueError("not enough data in stream to read long1")
886 return decode_long(data)
887
888long1 = ArgumentDescriptor(
889 name="long1",
Tim Petersfdb8cfa2003-01-28 00:13:19 +0000890 n=TAKEN_FROM_ARGUMENT1,
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000891 reader=read_long1,
892 doc="""A binary long, little-endian, using 1-byte size.
893
894 This first reads one byte as an unsigned size, then reads that
Tim Petersbdbe7412003-01-27 23:54:04 +0000895 many bytes and interprets them as a little-endian 2's-complement long.
Tim Peters4b23f2b2003-01-31 16:43:39 +0000896 If the size is 0, that's taken as a shortcut for the long 0L.
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000897 """)
898
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000899def read_long4(f):
900 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000901 >>> import io
902 >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\xff\x00"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000903 255
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000904 >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\xff\x7f"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000905 32767
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000906 >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\x00\xff"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000907 -256
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000908 >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\x00\x80"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000909 -32768
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000910 >>> read_long1(io.BytesIO(b"\x00\x00\x00\x00"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000911 0
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000912 """
913
914 n = read_int4(f)
915 if n < 0:
Neal Norwitz784a3f52003-01-28 00:20:41 +0000916 raise ValueError("long4 byte count < 0: %d" % n)
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000917 data = f.read(n)
918 if len(data) != n:
Neal Norwitz784a3f52003-01-28 00:20:41 +0000919 raise ValueError("not enough data in stream to read long4")
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000920 return decode_long(data)
921
922long4 = ArgumentDescriptor(
923 name="long4",
Tim Petersfdb8cfa2003-01-28 00:13:19 +0000924 n=TAKEN_FROM_ARGUMENT4,
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000925 reader=read_long4,
926 doc="""A binary representation of a long, little-endian.
927
928 This first reads four bytes as a signed size (but requires the
929 size to be >= 0), then reads that many bytes and interprets them
Tim Peters4b23f2b2003-01-31 16:43:39 +0000930 as a little-endian 2's-complement long. If the size is 0, that's taken
Guido van Rossume2a383d2007-01-15 16:59:06 +0000931 as a shortcut for the int 0, although LONG1 should really be used
Tim Peters4b23f2b2003-01-31 16:43:39 +0000932 then instead (and in any case where # of bytes < 256).
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000933 """)
934
935
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000936##############################################################################
937# Object descriptors. The stack used by the pickle machine holds objects,
938# and in the stack_before and stack_after attributes of OpcodeInfo
939# descriptors we need names to describe the various types of objects that can
940# appear on the stack.
941
942class StackObject(object):
943 __slots__ = (
944 # name of descriptor record, for info only
945 'name',
946
947 # type of object, or tuple of type objects (meaning the object can
948 # be of any type in the tuple)
949 'obtype',
950
951 # human-readable docs for this kind of stack object; a string
952 'doc',
953 )
954
955 def __init__(self, name, obtype, doc):
Guido van Rossum3172c5d2007-10-16 18:12:55 +0000956 assert isinstance(name, str)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000957 self.name = name
958
959 assert isinstance(obtype, type) or isinstance(obtype, tuple)
960 if isinstance(obtype, tuple):
961 for contained in obtype:
962 assert isinstance(contained, type)
963 self.obtype = obtype
964
Guido van Rossum3172c5d2007-10-16 18:12:55 +0000965 assert isinstance(doc, str)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000966 self.doc = doc
967
Tim Petersc1c2b3e2003-01-29 20:12:21 +0000968 def __repr__(self):
969 return self.name
970
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000971
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800972pyint = pylong = StackObject(
973 name='int',
974 obtype=int,
975 doc="A Python integer object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000976
977pyinteger_or_bool = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800978 name='int_or_bool',
979 obtype=(int, bool),
980 doc="A Python integer or boolean object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000981
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000982pybool = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800983 name='bool',
984 obtype=bool,
985 doc="A Python boolean object.")
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000986
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000987pyfloat = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800988 name='float',
989 obtype=float,
990 doc="A Python float object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000991
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800992pybytes_or_str = pystring = StackObject(
993 name='bytes_or_str',
994 obtype=(bytes, str),
995 doc="A Python bytes or (Unicode) string object.")
Guido van Rossumf4169812008-03-17 22:56:06 +0000996
997pybytes = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800998 name='bytes',
999 obtype=bytes,
1000 doc="A Python bytes object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001001
1002pyunicode = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001003 name='str',
1004 obtype=str,
1005 doc="A Python (Unicode) string object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001006
1007pynone = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001008 name="None",
1009 obtype=type(None),
1010 doc="The Python None object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001011
1012pytuple = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001013 name="tuple",
1014 obtype=tuple,
1015 doc="A Python tuple object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001016
1017pylist = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001018 name="list",
1019 obtype=list,
1020 doc="A Python list object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001021
1022pydict = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001023 name="dict",
1024 obtype=dict,
1025 doc="A Python dict object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001026
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001027pyset = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001028 name="set",
1029 obtype=set,
1030 doc="A Python set object.")
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001031
1032pyfrozenset = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001033 name="frozenset",
1034 obtype=set,
1035 doc="A Python frozenset object.")
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001036
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001037anyobject = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001038 name='any',
1039 obtype=object,
1040 doc="Any kind of object whatsoever.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001041
1042markobject = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001043 name="mark",
1044 obtype=StackObject,
1045 doc="""'The mark' is a unique object.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001046
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001047Opcodes that operate on a variable number of objects
1048generally don't embed the count of objects in the opcode,
1049or pull it off the stack. Instead the MARK opcode is used
1050to push a special marker object on the stack, and then
1051some other opcodes grab all the objects from the top of
1052the stack down to (but not including) the topmost marker
1053object.
1054""")
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001055
1056stackslice = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001057 name="stackslice",
1058 obtype=StackObject,
1059 doc="""An object representing a contiguous slice of the stack.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001060
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001061This is used in conjunction with markobject, to represent all
1062of the stack following the topmost markobject. For example,
1063the POP_MARK opcode changes the stack from
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001064
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001065 [..., markobject, stackslice]
1066to
1067 [...]
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001068
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001069No matter how many object are on the stack after the topmost
1070markobject, POP_MARK gets rid of all of them (including the
1071topmost markobject too).
1072""")
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001073
1074##############################################################################
1075# Descriptors for pickle opcodes.
1076
1077class OpcodeInfo(object):
1078
1079 __slots__ = (
1080 # symbolic name of opcode; a string
1081 'name',
1082
1083 # the code used in a bytestream to represent the opcode; a
1084 # one-character string
1085 'code',
1086
1087 # If the opcode has an argument embedded in the byte string, an
1088 # instance of ArgumentDescriptor specifying its type. Note that
1089 # arg.reader(s) can be used to read and decode the argument from
1090 # the bytestream s, and arg.doc documents the format of the raw
1091 # argument bytes. If the opcode doesn't have an argument embedded
1092 # in the bytestream, arg should be None.
1093 'arg',
1094
1095 # what the stack looks like before this opcode runs; a list
1096 'stack_before',
1097
1098 # what the stack looks like after this opcode runs; a list
1099 'stack_after',
1100
1101 # the protocol number in which this opcode was introduced; an int
1102 'proto',
1103
1104 # human-readable docs for this opcode; a string
1105 'doc',
1106 )
1107
1108 def __init__(self, name, code, arg,
1109 stack_before, stack_after, proto, doc):
Guido van Rossum3172c5d2007-10-16 18:12:55 +00001110 assert isinstance(name, str)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001111 self.name = name
1112
Guido van Rossum3172c5d2007-10-16 18:12:55 +00001113 assert isinstance(code, str)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001114 assert len(code) == 1
1115 self.code = code
1116
1117 assert arg is None or isinstance(arg, ArgumentDescriptor)
1118 self.arg = arg
1119
1120 assert isinstance(stack_before, list)
1121 for x in stack_before:
1122 assert isinstance(x, StackObject)
1123 self.stack_before = stack_before
1124
1125 assert isinstance(stack_after, list)
1126 for x in stack_after:
1127 assert isinstance(x, StackObject)
1128 self.stack_after = stack_after
1129
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001130 assert isinstance(proto, int) and 0 <= proto <= pickle.HIGHEST_PROTOCOL
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001131 self.proto = proto
1132
Guido van Rossum3172c5d2007-10-16 18:12:55 +00001133 assert isinstance(doc, str)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001134 self.doc = doc
1135
1136I = OpcodeInfo
1137opcodes = [
1138
1139 # Ways to spell integers.
1140
1141 I(name='INT',
1142 code='I',
1143 arg=decimalnl_short,
1144 stack_before=[],
1145 stack_after=[pyinteger_or_bool],
1146 proto=0,
1147 doc="""Push an integer or bool.
1148
1149 The argument is a newline-terminated decimal literal string.
1150
1151 The intent may have been that this always fit in a short Python int,
1152 but INT can be generated in pickles written on a 64-bit box that
1153 require a Python long on a 32-bit box. The difference between this
1154 and LONG then is that INT skips a trailing 'L', and produces a short
1155 int whenever possible.
1156
1157 Another difference is due to that, when bool was introduced as a
1158 distinct type in 2.3, builtin names True and False were also added to
1159 2.2.2, mapping to ints 1 and 0. For compatibility in both directions,
1160 True gets pickled as INT + "I01\\n", and False as INT + "I00\\n".
1161 Leading zeroes are never produced for a genuine integer. The 2.3
1162 (and later) unpicklers special-case these and return bool instead;
1163 earlier unpicklers ignore the leading "0" and return the int.
1164 """),
1165
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001166 I(name='BININT',
1167 code='J',
1168 arg=int4,
1169 stack_before=[],
1170 stack_after=[pyint],
1171 proto=1,
1172 doc="""Push a four-byte signed integer.
1173
1174 This handles the full range of Python (short) integers on a 32-bit
1175 box, directly as binary bytes (1 for the opcode and 4 for the integer).
1176 If the integer is non-negative and fits in 1 or 2 bytes, pickling via
1177 BININT1 or BININT2 saves space.
1178 """),
1179
1180 I(name='BININT1',
1181 code='K',
1182 arg=uint1,
1183 stack_before=[],
1184 stack_after=[pyint],
1185 proto=1,
1186 doc="""Push a one-byte unsigned integer.
1187
1188 This is a space optimization for pickling very small non-negative ints,
1189 in range(256).
1190 """),
1191
1192 I(name='BININT2',
1193 code='M',
1194 arg=uint2,
1195 stack_before=[],
1196 stack_after=[pyint],
1197 proto=1,
1198 doc="""Push a two-byte unsigned integer.
1199
1200 This is a space optimization for pickling small positive ints, in
1201 range(256, 2**16). Integers in range(256) can also be pickled via
1202 BININT2, but BININT1 instead saves a byte.
1203 """),
1204
Tim Petersfdc03462003-01-28 04:56:33 +00001205 I(name='LONG',
1206 code='L',
1207 arg=decimalnl_long,
1208 stack_before=[],
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001209 stack_after=[pyint],
Tim Petersfdc03462003-01-28 04:56:33 +00001210 proto=0,
1211 doc="""Push a long integer.
1212
1213 The same as INT, except that the literal ends with 'L', and always
1214 unpickles to a Python long. There doesn't seem a real purpose to the
1215 trailing 'L'.
1216
1217 Note that LONG takes time quadratic in the number of digits when
1218 unpickling (this is simply due to the nature of decimal->binary
1219 conversion). Proto 2 added linear-time (in C; still quadratic-time
1220 in Python) LONG1 and LONG4 opcodes.
1221 """),
1222
1223 I(name="LONG1",
1224 code='\x8a',
1225 arg=long1,
1226 stack_before=[],
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001227 stack_after=[pyint],
Tim Petersfdc03462003-01-28 04:56:33 +00001228 proto=2,
1229 doc="""Long integer using one-byte length.
1230
1231 A more efficient encoding of a Python long; the long1 encoding
1232 says it all."""),
1233
1234 I(name="LONG4",
1235 code='\x8b',
1236 arg=long4,
1237 stack_before=[],
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001238 stack_after=[pyint],
Tim Petersfdc03462003-01-28 04:56:33 +00001239 proto=2,
1240 doc="""Long integer using found-byte length.
1241
1242 A more efficient encoding of a Python long; the long4 encoding
1243 says it all."""),
1244
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001245 # Ways to spell strings (8-bit, not Unicode).
1246
1247 I(name='STRING',
1248 code='S',
1249 arg=stringnl,
1250 stack_before=[],
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001251 stack_after=[pybytes_or_str],
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001252 proto=0,
1253 doc="""Push a Python string object.
1254
1255 The argument is a repr-style string, with bracketing quote characters,
1256 and perhaps embedded escapes. The argument extends until the next
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001257 newline character. These are usually decoded into a str instance
Guido van Rossumf4169812008-03-17 22:56:06 +00001258 using the encoding given to the Unpickler constructor. or the default,
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001259 'ASCII'. If the encoding given was 'bytes' however, they will be
1260 decoded as bytes object instead.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001261 """),
1262
1263 I(name='BINSTRING',
1264 code='T',
1265 arg=string4,
1266 stack_before=[],
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001267 stack_after=[pybytes_or_str],
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001268 proto=1,
1269 doc="""Push a Python string object.
1270
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001271 There are two arguments: the first is a 4-byte little-endian
1272 signed int giving the number of bytes in the string, and the
1273 second is that many bytes, which are taken literally as the string
1274 content. These are usually decoded into a str instance using the
1275 encoding given to the Unpickler constructor. or the default,
1276 'ASCII'. If the encoding given was 'bytes' however, they will be
1277 decoded as bytes object instead.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001278 """),
1279
1280 I(name='SHORT_BINSTRING',
1281 code='U',
1282 arg=string1,
1283 stack_before=[],
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001284 stack_after=[pybytes_or_str],
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001285 proto=1,
1286 doc="""Push a Python string object.
1287
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001288 There are two arguments: the first is a 1-byte unsigned int giving
1289 the number of bytes in the string, and the second is that many
1290 bytes, which are taken literally as the string content. These are
1291 usually decoded into a str instance using the encoding given to
1292 the Unpickler constructor. or the default, 'ASCII'. If the
1293 encoding given was 'bytes' however, they will be decoded as bytes
1294 object instead.
Guido van Rossumf4169812008-03-17 22:56:06 +00001295 """),
1296
1297 # Bytes (protocol 3 only; older protocols don't support bytes at all)
1298
1299 I(name='BINBYTES',
1300 code='B',
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001301 arg=bytes4,
Guido van Rossumf4169812008-03-17 22:56:06 +00001302 stack_before=[],
1303 stack_after=[pybytes],
1304 proto=3,
1305 doc="""Push a Python bytes object.
1306
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001307 There are two arguments: the first is a 4-byte little-endian unsigned int
1308 giving the number of bytes, and the second is that many bytes, which are
1309 taken literally as the bytes content.
Guido van Rossumf4169812008-03-17 22:56:06 +00001310 """),
1311
1312 I(name='SHORT_BINBYTES',
1313 code='C',
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001314 arg=bytes1,
Guido van Rossumf4169812008-03-17 22:56:06 +00001315 stack_before=[],
1316 stack_after=[pybytes],
Collin Wintere61d4372009-05-20 17:46:47 +00001317 proto=3,
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001318 doc="""Push a Python bytes object.
Guido van Rossumf4169812008-03-17 22:56:06 +00001319
1320 There are two arguments: the first is a 1-byte unsigned int giving
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001321 the number of bytes, and the second is that many bytes, which are taken
1322 literally as the string content.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001323 """),
1324
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001325 I(name='BINBYTES8',
1326 code='\x8e',
1327 arg=bytes8,
1328 stack_before=[],
1329 stack_after=[pybytes],
1330 proto=4,
1331 doc="""Push a Python bytes object.
1332
1333 There are two arguments: the first is a 8-byte unsigned int giving
1334 the number of bytes in the string, and the second is that many bytes,
1335 which are taken literally as the string content.
1336 """),
1337
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001338 # Ways to spell None.
1339
1340 I(name='NONE',
1341 code='N',
1342 arg=None,
1343 stack_before=[],
1344 stack_after=[pynone],
1345 proto=0,
1346 doc="Push None on the stack."),
1347
Tim Petersfdc03462003-01-28 04:56:33 +00001348 # Ways to spell bools, starting with proto 2. See INT for how this was
1349 # done before proto 2.
1350
1351 I(name='NEWTRUE',
1352 code='\x88',
1353 arg=None,
1354 stack_before=[],
1355 stack_after=[pybool],
1356 proto=2,
1357 doc="""True.
1358
1359 Push True onto the stack."""),
1360
1361 I(name='NEWFALSE',
1362 code='\x89',
1363 arg=None,
1364 stack_before=[],
1365 stack_after=[pybool],
1366 proto=2,
1367 doc="""True.
1368
1369 Push False onto the stack."""),
1370
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001371 # Ways to spell Unicode strings.
1372
1373 I(name='UNICODE',
1374 code='V',
1375 arg=unicodestringnl,
1376 stack_before=[],
1377 stack_after=[pyunicode],
1378 proto=0, # this may be pure-text, but it's a later addition
1379 doc="""Push a Python Unicode string object.
1380
1381 The argument is a raw-unicode-escape encoding of a Unicode string,
1382 and so may contain embedded escape sequences. The argument extends
1383 until the next newline character.
1384 """),
1385
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001386 I(name='SHORT_BINUNICODE',
1387 code='\x8c',
1388 arg=unicodestring1,
1389 stack_before=[],
1390 stack_after=[pyunicode],
1391 proto=4,
1392 doc="""Push a Python Unicode string object.
1393
1394 There are two arguments: the first is a 1-byte little-endian signed int
1395 giving the number of bytes in the string. The second is that many
1396 bytes, and is the UTF-8 encoding of the Unicode string.
1397 """),
1398
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001399 I(name='BINUNICODE',
1400 code='X',
1401 arg=unicodestring4,
1402 stack_before=[],
1403 stack_after=[pyunicode],
1404 proto=1,
1405 doc="""Push a Python Unicode string object.
1406
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001407 There are two arguments: the first is a 4-byte little-endian unsigned int
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001408 giving the number of bytes in the string. The second is that many
1409 bytes, and is the UTF-8 encoding of the Unicode string.
1410 """),
1411
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001412 I(name='BINUNICODE8',
1413 code='\x8d',
1414 arg=unicodestring8,
1415 stack_before=[],
1416 stack_after=[pyunicode],
1417 proto=4,
1418 doc="""Push a Python Unicode string object.
1419
1420 There are two arguments: the first is a 8-byte little-endian signed int
1421 giving the number of bytes in the string. The second is that many
1422 bytes, and is the UTF-8 encoding of the Unicode string.
1423 """),
1424
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001425 # Ways to spell floats.
1426
1427 I(name='FLOAT',
1428 code='F',
1429 arg=floatnl,
1430 stack_before=[],
1431 stack_after=[pyfloat],
1432 proto=0,
1433 doc="""Newline-terminated decimal float literal.
1434
1435 The argument is repr(a_float), and in general requires 17 significant
1436 digits for roundtrip conversion to be an identity (this is so for
1437 IEEE-754 double precision values, which is what Python float maps to
1438 on most boxes).
1439
1440 In general, FLOAT cannot be used to transport infinities, NaNs, or
1441 minus zero across boxes (or even on a single box, if the platform C
1442 library can't read the strings it produces for such things -- Windows
1443 is like that), but may do less damage than BINFLOAT on boxes with
1444 greater precision or dynamic range than IEEE-754 double.
1445 """),
1446
1447 I(name='BINFLOAT',
1448 code='G',
1449 arg=float8,
1450 stack_before=[],
1451 stack_after=[pyfloat],
1452 proto=1,
1453 doc="""Float stored in binary form, with 8 bytes of data.
1454
1455 This generally requires less than half the space of FLOAT encoding.
1456 In general, BINFLOAT cannot be used to transport infinities, NaNs, or
1457 minus zero, raises an exception if the exponent exceeds the range of
1458 an IEEE-754 double, and retains no more than 53 bits of precision (if
1459 there are more than that, "add a half and chop" rounding is used to
1460 cut it back to 53 significant bits).
1461 """),
1462
1463 # Ways to build lists.
1464
1465 I(name='EMPTY_LIST',
1466 code=']',
1467 arg=None,
1468 stack_before=[],
1469 stack_after=[pylist],
1470 proto=1,
1471 doc="Push an empty list."),
1472
1473 I(name='APPEND',
1474 code='a',
1475 arg=None,
1476 stack_before=[pylist, anyobject],
1477 stack_after=[pylist],
1478 proto=0,
1479 doc="""Append an object to a list.
1480
1481 Stack before: ... pylist anyobject
1482 Stack after: ... pylist+[anyobject]
Tim Peters81098ac2003-01-28 05:12:08 +00001483
1484 although pylist is really extended in-place.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001485 """),
1486
1487 I(name='APPENDS',
1488 code='e',
1489 arg=None,
1490 stack_before=[pylist, markobject, stackslice],
1491 stack_after=[pylist],
1492 proto=1,
1493 doc="""Extend a list by a slice of stack objects.
1494
1495 Stack before: ... pylist markobject stackslice
1496 Stack after: ... pylist+stackslice
Tim Peters81098ac2003-01-28 05:12:08 +00001497
1498 although pylist is really extended in-place.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001499 """),
1500
1501 I(name='LIST',
1502 code='l',
1503 arg=None,
1504 stack_before=[markobject, stackslice],
1505 stack_after=[pylist],
1506 proto=0,
1507 doc="""Build a list out of the topmost stack slice, after markobject.
1508
1509 All the stack entries following the topmost markobject are placed into
1510 a single Python list, which single list object replaces all of the
1511 stack from the topmost markobject onward. For example,
1512
1513 Stack before: ... markobject 1 2 3 'abc'
1514 Stack after: ... [1, 2, 3, 'abc']
1515 """),
1516
1517 # Ways to build tuples.
1518
1519 I(name='EMPTY_TUPLE',
1520 code=')',
1521 arg=None,
1522 stack_before=[],
1523 stack_after=[pytuple],
1524 proto=1,
1525 doc="Push an empty tuple."),
1526
1527 I(name='TUPLE',
1528 code='t',
1529 arg=None,
1530 stack_before=[markobject, stackslice],
1531 stack_after=[pytuple],
1532 proto=0,
1533 doc="""Build a tuple out of the topmost stack slice, after markobject.
1534
1535 All the stack entries following the topmost markobject are placed into
1536 a single Python tuple, which single tuple object replaces all of the
1537 stack from the topmost markobject onward. For example,
1538
1539 Stack before: ... markobject 1 2 3 'abc'
1540 Stack after: ... (1, 2, 3, 'abc')
1541 """),
1542
Tim Petersfdc03462003-01-28 04:56:33 +00001543 I(name='TUPLE1',
1544 code='\x85',
1545 arg=None,
1546 stack_before=[anyobject],
1547 stack_after=[pytuple],
1548 proto=2,
Alexander Belopolsky44c2ffd2010-07-16 14:39:45 +00001549 doc="""Build a one-tuple out of the topmost item on the stack.
Tim Petersfdc03462003-01-28 04:56:33 +00001550
1551 This code pops one value off the stack and pushes a tuple of
Alexander Belopolsky44c2ffd2010-07-16 14:39:45 +00001552 length 1 whose one item is that value back onto it. In other
1553 words:
Tim Petersfdc03462003-01-28 04:56:33 +00001554
1555 stack[-1] = tuple(stack[-1:])
1556 """),
1557
1558 I(name='TUPLE2',
1559 code='\x86',
1560 arg=None,
1561 stack_before=[anyobject, anyobject],
1562 stack_after=[pytuple],
1563 proto=2,
Alexander Belopolsky44c2ffd2010-07-16 14:39:45 +00001564 doc="""Build a two-tuple out of the top two items on the stack.
Tim Petersfdc03462003-01-28 04:56:33 +00001565
Alexander Belopolsky44c2ffd2010-07-16 14:39:45 +00001566 This code pops two values off the stack and pushes a tuple of
1567 length 2 whose items are those values back onto it. In other
1568 words:
Tim Petersfdc03462003-01-28 04:56:33 +00001569
1570 stack[-2:] = [tuple(stack[-2:])]
1571 """),
1572
1573 I(name='TUPLE3',
1574 code='\x87',
1575 arg=None,
1576 stack_before=[anyobject, anyobject, anyobject],
1577 stack_after=[pytuple],
1578 proto=2,
Alexander Belopolsky44c2ffd2010-07-16 14:39:45 +00001579 doc="""Build a three-tuple out of the top three items on the stack.
Tim Petersfdc03462003-01-28 04:56:33 +00001580
Alexander Belopolsky44c2ffd2010-07-16 14:39:45 +00001581 This code pops three values off the stack and pushes a tuple of
1582 length 3 whose items are those values back onto it. In other
1583 words:
Tim Petersfdc03462003-01-28 04:56:33 +00001584
1585 stack[-3:] = [tuple(stack[-3:])]
1586 """),
1587
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001588 # Ways to build dicts.
1589
1590 I(name='EMPTY_DICT',
1591 code='}',
1592 arg=None,
1593 stack_before=[],
1594 stack_after=[pydict],
1595 proto=1,
1596 doc="Push an empty dict."),
1597
1598 I(name='DICT',
1599 code='d',
1600 arg=None,
1601 stack_before=[markobject, stackslice],
1602 stack_after=[pydict],
1603 proto=0,
1604 doc="""Build a dict out of the topmost stack slice, after markobject.
1605
1606 All the stack entries following the topmost markobject are placed into
1607 a single Python dict, which single dict object replaces all of the
1608 stack from the topmost markobject onward. The stack slice alternates
1609 key, value, key, value, .... For example,
1610
1611 Stack before: ... markobject 1 2 3 'abc'
1612 Stack after: ... {1: 2, 3: 'abc'}
1613 """),
1614
1615 I(name='SETITEM',
1616 code='s',
1617 arg=None,
1618 stack_before=[pydict, anyobject, anyobject],
1619 stack_after=[pydict],
1620 proto=0,
1621 doc="""Add a key+value pair to an existing dict.
1622
1623 Stack before: ... pydict key value
1624 Stack after: ... pydict
1625
1626 where pydict has been modified via pydict[key] = value.
1627 """),
1628
1629 I(name='SETITEMS',
1630 code='u',
1631 arg=None,
1632 stack_before=[pydict, markobject, stackslice],
1633 stack_after=[pydict],
1634 proto=1,
1635 doc="""Add an arbitrary number of key+value pairs to an existing dict.
1636
1637 The slice of the stack following the topmost markobject is taken as
1638 an alternating sequence of keys and values, added to the dict
1639 immediately under the topmost markobject. Everything at and after the
1640 topmost markobject is popped, leaving the mutated dict at the top
1641 of the stack.
1642
1643 Stack before: ... pydict markobject key_1 value_1 ... key_n value_n
1644 Stack after: ... pydict
1645
1646 where pydict has been modified via pydict[key_i] = value_i for i in
1647 1, 2, ..., n, and in that order.
1648 """),
1649
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001650 # Ways to build sets
1651
1652 I(name='EMPTY_SET',
1653 code='\x8f',
1654 arg=None,
1655 stack_before=[],
1656 stack_after=[pyset],
1657 proto=4,
1658 doc="Push an empty set."),
1659
1660 I(name='ADDITEMS',
1661 code='\x90',
1662 arg=None,
1663 stack_before=[pyset, markobject, stackslice],
1664 stack_after=[pyset],
1665 proto=4,
1666 doc="""Add an arbitrary number of items to an existing set.
1667
1668 The slice of the stack following the topmost markobject is taken as
1669 a sequence of items, added to the set immediately under the topmost
1670 markobject. Everything at and after the topmost markobject is popped,
1671 leaving the mutated set at the top of the stack.
1672
1673 Stack before: ... pyset markobject item_1 ... item_n
1674 Stack after: ... pyset
1675
1676 where pyset has been modified via pyset.add(item_i) = item_i for i in
1677 1, 2, ..., n, and in that order.
1678 """),
1679
1680 # Way to build frozensets
1681
1682 I(name='FROZENSET',
1683 code='\x91',
1684 arg=None,
1685 stack_before=[markobject, stackslice],
1686 stack_after=[pyfrozenset],
1687 proto=4,
1688 doc="""Build a frozenset out of the topmost slice, after markobject.
1689
1690 All the stack entries following the topmost markobject are placed into
1691 a single Python frozenset, which single frozenset object replaces all
1692 of the stack from the topmost markobject onward. For example,
1693
1694 Stack before: ... markobject 1 2 3
1695 Stack after: ... frozenset({1, 2, 3})
1696 """),
1697
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001698 # Stack manipulation.
1699
1700 I(name='POP',
1701 code='0',
1702 arg=None,
1703 stack_before=[anyobject],
1704 stack_after=[],
1705 proto=0,
1706 doc="Discard the top stack item, shrinking the stack by one item."),
1707
1708 I(name='DUP',
1709 code='2',
1710 arg=None,
1711 stack_before=[anyobject],
1712 stack_after=[anyobject, anyobject],
1713 proto=0,
1714 doc="Push the top stack item onto the stack again, duplicating it."),
1715
1716 I(name='MARK',
1717 code='(',
1718 arg=None,
1719 stack_before=[],
1720 stack_after=[markobject],
1721 proto=0,
1722 doc="""Push markobject onto the stack.
1723
1724 markobject is a unique object, used by other opcodes to identify a
1725 region of the stack containing a variable number of objects for them
1726 to work on. See markobject.doc for more detail.
1727 """),
1728
1729 I(name='POP_MARK',
1730 code='1',
1731 arg=None,
1732 stack_before=[markobject, stackslice],
1733 stack_after=[],
Collin Wintere61d4372009-05-20 17:46:47 +00001734 proto=1,
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001735 doc="""Pop all the stack objects at and above the topmost markobject.
1736
1737 When an opcode using a variable number of stack objects is done,
1738 POP_MARK is used to remove those objects, and to remove the markobject
1739 that delimited their starting position on the stack.
1740 """),
1741
1742 # Memo manipulation. There are really only two operations (get and put),
1743 # each in all-text, "short binary", and "long binary" flavors.
1744
1745 I(name='GET',
1746 code='g',
1747 arg=decimalnl_short,
1748 stack_before=[],
1749 stack_after=[anyobject],
1750 proto=0,
1751 doc="""Read an object from the memo and push it on the stack.
1752
Ezio Melotti13925002011-03-16 11:05:33 +02001753 The index of the memo object to push is given by the newline-terminated
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001754 decimal string following. BINGET and LONG_BINGET are space-optimized
1755 versions.
1756 """),
1757
1758 I(name='BINGET',
1759 code='h',
1760 arg=uint1,
1761 stack_before=[],
1762 stack_after=[anyobject],
1763 proto=1,
1764 doc="""Read an object from the memo and push it on the stack.
1765
1766 The index of the memo object to push is given by the 1-byte unsigned
1767 integer following.
1768 """),
1769
1770 I(name='LONG_BINGET',
1771 code='j',
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001772 arg=uint4,
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001773 stack_before=[],
1774 stack_after=[anyobject],
1775 proto=1,
1776 doc="""Read an object from the memo and push it on the stack.
1777
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001778 The index of the memo object to push is given by the 4-byte unsigned
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001779 little-endian integer following.
1780 """),
1781
1782 I(name='PUT',
1783 code='p',
1784 arg=decimalnl_short,
1785 stack_before=[],
1786 stack_after=[],
1787 proto=0,
1788 doc="""Store the stack top into the memo. The stack is not popped.
1789
1790 The index of the memo location to write into is given by the newline-
1791 terminated decimal string following. BINPUT and LONG_BINPUT are
1792 space-optimized versions.
1793 """),
1794
1795 I(name='BINPUT',
1796 code='q',
1797 arg=uint1,
1798 stack_before=[],
1799 stack_after=[],
1800 proto=1,
1801 doc="""Store the stack top into the memo. The stack is not popped.
1802
1803 The index of the memo location to write into is given by the 1-byte
1804 unsigned integer following.
1805 """),
1806
1807 I(name='LONG_BINPUT',
1808 code='r',
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001809 arg=uint4,
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001810 stack_before=[],
1811 stack_after=[],
1812 proto=1,
1813 doc="""Store the stack top into the memo. The stack is not popped.
1814
1815 The index of the memo location to write into is given by the 4-byte
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001816 unsigned little-endian integer following.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001817 """),
1818
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001819 I(name='MEMOIZE',
1820 code='\x94',
1821 arg=None,
1822 stack_before=[anyobject],
1823 stack_after=[anyobject],
1824 proto=4,
1825 doc="""Store the stack top into the memo. The stack is not popped.
1826
1827 The index of the memo location to write is the number of
1828 elements currently present in the memo.
1829 """),
1830
Tim Petersfdc03462003-01-28 04:56:33 +00001831 # Access the extension registry (predefined objects). Akin to the GET
1832 # family.
1833
1834 I(name='EXT1',
1835 code='\x82',
1836 arg=uint1,
1837 stack_before=[],
1838 stack_after=[anyobject],
1839 proto=2,
1840 doc="""Extension code.
1841
1842 This code and the similar EXT2 and EXT4 allow using a registry
1843 of popular objects that are pickled by name, typically classes.
1844 It is envisioned that through a global negotiation and
1845 registration process, third parties can set up a mapping between
1846 ints and object names.
1847
1848 In order to guarantee pickle interchangeability, the extension
1849 code registry ought to be global, although a range of codes may
1850 be reserved for private use.
1851
1852 EXT1 has a 1-byte integer argument. This is used to index into the
1853 extension registry, and the object at that index is pushed on the stack.
1854 """),
1855
1856 I(name='EXT2',
1857 code='\x83',
1858 arg=uint2,
1859 stack_before=[],
1860 stack_after=[anyobject],
1861 proto=2,
1862 doc="""Extension code.
1863
1864 See EXT1. EXT2 has a two-byte integer argument.
1865 """),
1866
1867 I(name='EXT4',
1868 code='\x84',
1869 arg=int4,
1870 stack_before=[],
1871 stack_after=[anyobject],
1872 proto=2,
1873 doc="""Extension code.
1874
1875 See EXT1. EXT4 has a four-byte integer argument.
1876 """),
1877
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001878 # Push a class object, or module function, on the stack, via its module
1879 # and name.
1880
1881 I(name='GLOBAL',
1882 code='c',
1883 arg=stringnl_noescape_pair,
1884 stack_before=[],
1885 stack_after=[anyobject],
1886 proto=0,
1887 doc="""Push a global object (module.attr) on the stack.
1888
1889 Two newline-terminated strings follow the GLOBAL opcode. The first is
1890 taken as a module name, and the second as a class name. The class
1891 object module.class is pushed on the stack. More accurately, the
1892 object returned by self.find_class(module, class) is pushed on the
1893 stack, so unpickling subclasses can override this form of lookup.
1894 """),
1895
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001896 I(name='STACK_GLOBAL',
1897 code='\x93',
1898 arg=None,
1899 stack_before=[pyunicode, pyunicode],
1900 stack_after=[anyobject],
Serhiy Storchaka5805dde2015-10-13 21:12:32 +03001901 proto=4,
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001902 doc="""Push a global object (module.attr) on the stack.
1903 """),
1904
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001905 # Ways to build objects of classes pickle doesn't know about directly
1906 # (user-defined classes). I despair of documenting this accurately
1907 # and comprehensibly -- you really have to read the pickle code to
1908 # find all the special cases.
1909
1910 I(name='REDUCE',
1911 code='R',
1912 arg=None,
1913 stack_before=[anyobject, anyobject],
1914 stack_after=[anyobject],
1915 proto=0,
1916 doc="""Push an object built from a callable and an argument tuple.
1917
1918 The opcode is named to remind of the __reduce__() method.
1919
1920 Stack before: ... callable pytuple
1921 Stack after: ... callable(*pytuple)
1922
1923 The callable and the argument tuple are the first two items returned
1924 by a __reduce__ method. Applying the callable to the argtuple is
1925 supposed to reproduce the original object, or at least get it started.
1926 If the __reduce__ method returns a 3-tuple, the last component is an
1927 argument to be passed to the object's __setstate__, and then the REDUCE
1928 opcode is followed by code to create setstate's argument, and then a
1929 BUILD opcode to apply __setstate__ to that argument.
1930
Guido van Rossum13257902007-06-07 23:15:56 +00001931 If not isinstance(callable, type), REDUCE complains unless the
Alexandre Vassalottif7fa63d2008-05-11 08:55:36 +00001932 callable has been registered with the copyreg module's
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001933 safe_constructors dict, or the callable has a magic
1934 '__safe_for_unpickling__' attribute with a true value. I'm not sure
1935 why it does this, but I've sure seen this complaint often enough when
1936 I didn't want to <wink>.
1937 """),
1938
1939 I(name='BUILD',
1940 code='b',
1941 arg=None,
1942 stack_before=[anyobject, anyobject],
1943 stack_after=[anyobject],
1944 proto=0,
1945 doc="""Finish building an object, via __setstate__ or dict update.
1946
1947 Stack before: ... anyobject argument
1948 Stack after: ... anyobject
1949
1950 where anyobject may have been mutated, as follows:
1951
1952 If the object has a __setstate__ method,
1953
1954 anyobject.__setstate__(argument)
1955
1956 is called.
1957
1958 Else the argument must be a dict, the object must have a __dict__, and
1959 the object is updated via
1960
1961 anyobject.__dict__.update(argument)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001962 """),
1963
1964 I(name='INST',
1965 code='i',
1966 arg=stringnl_noescape_pair,
1967 stack_before=[markobject, stackslice],
1968 stack_after=[anyobject],
1969 proto=0,
1970 doc="""Build a class instance.
1971
1972 This is the protocol 0 version of protocol 1's OBJ opcode.
1973 INST is followed by two newline-terminated strings, giving a
1974 module and class name, just as for the GLOBAL opcode (and see
1975 GLOBAL for more details about that). self.find_class(module, name)
1976 is used to get a class object.
1977
1978 In addition, all the objects on the stack following the topmost
1979 markobject are gathered into a tuple and popped (along with the
1980 topmost markobject), just as for the TUPLE opcode.
1981
1982 Now it gets complicated. If all of these are true:
1983
1984 + The argtuple is empty (markobject was at the top of the stack
1985 at the start).
1986
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001987 + The class object does not have a __getinitargs__ attribute.
1988
1989 then we want to create an old-style class instance without invoking
1990 its __init__() method (pickle has waffled on this over the years; not
1991 calling __init__() is current wisdom). In this case, an instance of
1992 an old-style dummy class is created, and then we try to rebind its
1993 __class__ attribute to the desired class object. If this succeeds,
Guido van Rossuma8add0e2007-05-14 22:03:55 +00001994 the new instance object is pushed on the stack, and we're done.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001995
1996 Else (the argtuple is not empty, it's not an old-style class object,
1997 or the class object does have a __getinitargs__ attribute), the code
1998 first insists that the class object have a __safe_for_unpickling__
1999 attribute. Unlike as for the __safe_for_unpickling__ check in REDUCE,
2000 it doesn't matter whether this attribute has a true or false value, it
Guido van Rossum99603b02007-07-20 00:22:32 +00002001 only matters whether it exists (XXX this is a bug). If
2002 __safe_for_unpickling__ doesn't exist, UnpicklingError is raised.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002003
2004 Else (the class object does have a __safe_for_unpickling__ attr),
2005 the class object obtained from INST's arguments is applied to the
2006 argtuple obtained from the stack, and the resulting instance object
2007 is pushed on the stack.
Tim Peters2b93c4c2003-01-30 16:35:08 +00002008
2009 NOTE: checks for __safe_for_unpickling__ went away in Python 2.3.
Florent Xiclunaaa6c1d22011-12-12 18:54:29 +01002010 NOTE: the distinction between old-style and new-style classes does
2011 not make sense in Python 3.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002012 """),
2013
2014 I(name='OBJ',
2015 code='o',
2016 arg=None,
2017 stack_before=[markobject, anyobject, stackslice],
2018 stack_after=[anyobject],
2019 proto=1,
2020 doc="""Build a class instance.
2021
2022 This is the protocol 1 version of protocol 0's INST opcode, and is
2023 very much like it. The major difference is that the class object
2024 is taken off the stack, allowing it to be retrieved from the memo
2025 repeatedly if several instances of the same class are created. This
2026 can be much more efficient (in both time and space) than repeatedly
2027 embedding the module and class names in INST opcodes.
2028
2029 Unlike INST, OBJ takes no arguments from the opcode stream. Instead
2030 the class object is taken off the stack, immediately above the
2031 topmost markobject:
2032
2033 Stack before: ... markobject classobject stackslice
2034 Stack after: ... new_instance_object
2035
2036 As for INST, the remainder of the stack above the markobject is
2037 gathered into an argument tuple, and then the logic seems identical,
Guido van Rossumecb11042003-01-29 06:24:30 +00002038 except that no __safe_for_unpickling__ check is done (XXX this is
Guido van Rossum99603b02007-07-20 00:22:32 +00002039 a bug). See INST for the gory details.
Tim Peters2b93c4c2003-01-30 16:35:08 +00002040
2041 NOTE: In Python 2.3, INST and OBJ are identical except for how they
2042 get the class object. That was always the intent; the implementations
2043 had diverged for accidental reasons.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002044 """),
2045
Tim Petersfdc03462003-01-28 04:56:33 +00002046 I(name='NEWOBJ',
2047 code='\x81',
2048 arg=None,
2049 stack_before=[anyobject, anyobject],
2050 stack_after=[anyobject],
2051 proto=2,
2052 doc="""Build an object instance.
2053
2054 The stack before should be thought of as containing a class
2055 object followed by an argument tuple (the tuple being the stack
2056 top). Call these cls and args. They are popped off the stack,
2057 and the value returned by cls.__new__(cls, *args) is pushed back
2058 onto the stack.
2059 """),
2060
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002061 I(name='NEWOBJ_EX',
2062 code='\x92',
2063 arg=None,
2064 stack_before=[anyobject, anyobject, anyobject],
2065 stack_after=[anyobject],
2066 proto=4,
2067 doc="""Build an object instance.
2068
2069 The stack before should be thought of as containing a class
2070 object followed by an argument tuple and by a keyword argument dict
2071 (the dict being the stack top). Call these cls and args. They are
2072 popped off the stack, and the value returned by
2073 cls.__new__(cls, *args, *kwargs) is pushed back onto the stack.
2074 """),
2075
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002076 # Machine control.
2077
Tim Petersfdc03462003-01-28 04:56:33 +00002078 I(name='PROTO',
2079 code='\x80',
2080 arg=uint1,
2081 stack_before=[],
2082 stack_after=[],
2083 proto=2,
2084 doc="""Protocol version indicator.
2085
2086 For protocol 2 and above, a pickle must start with this opcode.
2087 The argument is the protocol version, an int in range(2, 256).
2088 """),
2089
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002090 I(name='STOP',
2091 code='.',
2092 arg=None,
2093 stack_before=[anyobject],
2094 stack_after=[],
2095 proto=0,
2096 doc="""Stop the unpickling machine.
2097
2098 Every pickle ends with this opcode. The object at the top of the stack
2099 is popped, and that's the result of unpickling. The stack should be
2100 empty then.
2101 """),
2102
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002103 # Framing support.
2104
2105 I(name='FRAME',
2106 code='\x95',
2107 arg=uint8,
2108 stack_before=[],
2109 stack_after=[],
2110 proto=4,
2111 doc="""Indicate the beginning of a new frame.
2112
2113 The unpickler may use this opcode to safely prefetch data from its
2114 underlying stream.
2115 """),
2116
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002117 # Ways to deal with persistent IDs.
2118
2119 I(name='PERSID',
2120 code='P',
2121 arg=stringnl_noescape,
2122 stack_before=[],
2123 stack_after=[anyobject],
2124 proto=0,
2125 doc="""Push an object identified by a persistent ID.
2126
2127 The pickle module doesn't define what a persistent ID means. PERSID's
2128 argument is a newline-terminated str-style (no embedded escapes, no
2129 bracketing quote characters) string, which *is* "the persistent ID".
2130 The unpickler passes this string to self.persistent_load(). Whatever
2131 object that returns is pushed on the stack. There is no implementation
2132 of persistent_load() in Python's unpickler: it must be supplied by an
2133 unpickler subclass.
2134 """),
2135
2136 I(name='BINPERSID',
2137 code='Q',
2138 arg=None,
2139 stack_before=[anyobject],
2140 stack_after=[anyobject],
2141 proto=1,
2142 doc="""Push an object identified by a persistent ID.
2143
2144 Like PERSID, except the persistent ID is popped off the stack (instead
2145 of being a string embedded in the opcode bytestream). The persistent
2146 ID is passed to self.persistent_load(), and whatever object that
2147 returns is pushed on the stack. See PERSID for more detail.
2148 """),
2149]
2150del I
2151
2152# Verify uniqueness of .name and .code members.
2153name2i = {}
2154code2i = {}
2155
2156for i, d in enumerate(opcodes):
2157 if d.name in name2i:
2158 raise ValueError("repeated name %r at indices %d and %d" %
2159 (d.name, name2i[d.name], i))
2160 if d.code in code2i:
2161 raise ValueError("repeated code %r at indices %d and %d" %
2162 (d.code, code2i[d.code], i))
2163
2164 name2i[d.name] = i
2165 code2i[d.code] = i
2166
2167del name2i, code2i, i, d
2168
2169##############################################################################
2170# Build a code2op dict, mapping opcode characters to OpcodeInfo records.
2171# Also ensure we've got the same stuff as pickle.py, although the
2172# introspection here is dicey.
2173
2174code2op = {}
2175for d in opcodes:
2176 code2op[d.code] = d
2177del d
2178
2179def assure_pickle_consistency(verbose=False):
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002180
2181 copy = code2op.copy()
2182 for name in pickle.__all__:
2183 if not re.match("[A-Z][A-Z0-9_]+$", name):
2184 if verbose:
Guido van Rossumbe19ed72007-02-09 05:37:30 +00002185 print("skipping %r: it doesn't look like an opcode name" % name)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002186 continue
2187 picklecode = getattr(pickle, name)
Guido van Rossum617dbc42007-05-07 23:57:08 +00002188 if not isinstance(picklecode, bytes) or len(picklecode) != 1:
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002189 if verbose:
Guido van Rossumbe19ed72007-02-09 05:37:30 +00002190 print(("skipping %r: value %r doesn't look like a pickle "
2191 "code" % (name, picklecode)))
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002192 continue
Guido van Rossum617dbc42007-05-07 23:57:08 +00002193 picklecode = picklecode.decode("latin-1")
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002194 if picklecode in copy:
2195 if verbose:
Guido van Rossumbe19ed72007-02-09 05:37:30 +00002196 print("checking name %r w/ code %r for consistency" % (
2197 name, picklecode))
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002198 d = copy[picklecode]
2199 if d.name != name:
2200 raise ValueError("for pickle code %r, pickle.py uses name %r "
2201 "but we're using name %r" % (picklecode,
2202 name,
2203 d.name))
2204 # Forget this one. Any left over in copy at the end are a problem
2205 # of a different kind.
2206 del copy[picklecode]
2207 else:
2208 raise ValueError("pickle.py appears to have a pickle opcode with "
2209 "name %r and code %r, but we don't" %
2210 (name, picklecode))
2211 if copy:
2212 msg = ["we appear to have pickle opcodes that pickle.py doesn't have:"]
2213 for code, d in copy.items():
2214 msg.append(" name %r with code %r" % (d.name, code))
2215 raise ValueError("\n".join(msg))
2216
2217assure_pickle_consistency()
Tim Petersc0c12b52003-01-29 00:56:17 +00002218del assure_pickle_consistency
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002219
2220##############################################################################
2221# A pickle opcode generator.
2222
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002223def _genops(data, yield_end_pos=False):
2224 if isinstance(data, bytes_types):
2225 data = io.BytesIO(data)
2226
2227 if hasattr(data, "tell"):
2228 getpos = data.tell
2229 else:
2230 getpos = lambda: None
2231
2232 while True:
2233 pos = getpos()
2234 code = data.read(1)
2235 opcode = code2op.get(code.decode("latin-1"))
2236 if opcode is None:
2237 if code == b"":
2238 raise ValueError("pickle exhausted before seeing STOP")
2239 else:
2240 raise ValueError("at position %s, opcode %r unknown" % (
2241 "<unknown>" if pos is None else pos,
2242 code))
2243 if opcode.arg is None:
2244 arg = None
2245 else:
2246 arg = opcode.arg.reader(data)
2247 if yield_end_pos:
2248 yield opcode, arg, pos, getpos()
2249 else:
2250 yield opcode, arg, pos
2251 if code == b'.':
2252 assert opcode.name == 'STOP'
2253 break
2254
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002255def genops(pickle):
Guido van Rossuma72ded92003-01-27 19:40:47 +00002256 """Generate all the opcodes in a pickle.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002257
2258 'pickle' is a file-like object, or string, containing the pickle.
2259
2260 Each opcode in the pickle is generated, from the current pickle position,
2261 stopping after a STOP opcode is delivered. A triple is generated for
2262 each opcode:
2263
2264 opcode, arg, pos
2265
2266 opcode is an OpcodeInfo record, describing the current opcode.
2267
2268 If the opcode has an argument embedded in the pickle, arg is its decoded
2269 value, as a Python object. If the opcode doesn't have an argument, arg
2270 is None.
2271
2272 If the pickle has a tell() method, pos was the value of pickle.tell()
Guido van Rossum34d19282007-08-09 01:03:29 +00002273 before reading the current opcode. If the pickle is a bytes object,
2274 it's wrapped in a BytesIO object, and the latter's tell() result is
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002275 used. Else (the pickle doesn't have a tell(), and it's not obvious how
2276 to query its current position) pos is None.
2277 """
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002278 return _genops(pickle)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002279
2280##############################################################################
Christian Heimes3feef612008-02-11 06:19:17 +00002281# A pickle optimizer.
2282
2283def optimize(p):
2284 'Optimize a pickle string by removing unused PUT opcodes'
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002285 put = 'PUT'
2286 get = 'GET'
2287 oldids = set() # set of all PUT ids
2288 newids = {} # set of ids used by a GET opcode
2289 opcodes = [] # (op, idx) or (pos, end_pos)
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002290 proto = 0
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002291 protoheader = b''
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002292 for opcode, arg, pos, end_pos in _genops(p, yield_end_pos=True):
Christian Heimes3feef612008-02-11 06:19:17 +00002293 if 'PUT' in opcode.name:
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002294 oldids.add(arg)
2295 opcodes.append((put, arg))
2296 elif opcode.name == 'MEMOIZE':
2297 idx = len(oldids)
2298 oldids.add(idx)
2299 opcodes.append((put, idx))
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002300 elif 'FRAME' in opcode.name:
2301 pass
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002302 elif 'GET' in opcode.name:
2303 if opcode.proto > proto:
2304 proto = opcode.proto
2305 newids[arg] = None
2306 opcodes.append((get, arg))
2307 elif opcode.name == 'PROTO':
2308 if arg > proto:
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002309 proto = arg
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002310 if pos == 0:
2311 protoheader = p[pos: end_pos]
2312 else:
2313 opcodes.append((pos, end_pos))
2314 else:
2315 opcodes.append((pos, end_pos))
2316 del oldids
Christian Heimes3feef612008-02-11 06:19:17 +00002317
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002318 # Copy the opcodes except for PUTS without a corresponding GET
2319 out = io.BytesIO()
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002320 # Write the PROTO header before any framing
2321 out.write(protoheader)
2322 pickler = pickle._Pickler(out, proto)
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002323 if proto >= 4:
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002324 pickler.framer.start_framing()
2325 idx = 0
2326 for op, arg in opcodes:
2327 if op is put:
2328 if arg not in newids:
2329 continue
2330 data = pickler.put(idx)
2331 newids[arg] = idx
2332 idx += 1
2333 elif op is get:
2334 data = pickler.get(newids[arg])
2335 else:
2336 data = p[op:arg]
2337 pickler.framer.commit_frame()
2338 pickler.write(data)
2339 pickler.framer.end_framing()
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002340 return out.getvalue()
Christian Heimes3feef612008-02-11 06:19:17 +00002341
2342##############################################################################
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002343# A symbolic pickle disassembler.
2344
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002345def dis(pickle, out=None, memo=None, indentlevel=4, annotate=0):
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002346 """Produce a symbolic disassembly of a pickle.
2347
2348 'pickle' is a file-like object, or string, containing a (at least one)
2349 pickle. The pickle is disassembled from the current position, through
2350 the first STOP opcode encountered.
2351
2352 Optional arg 'out' is a file-like object to which the disassembly is
2353 printed. It defaults to sys.stdout.
2354
Tim Peters62235e72003-02-05 19:55:53 +00002355 Optional arg 'memo' is a Python dict, used as the pickle's memo. It
2356 may be mutated by dis(), if the pickle contains PUT or BINPUT opcodes.
2357 Passing the same memo object to another dis() call then allows disassembly
2358 to proceed across multiple pickles that were all created by the same
2359 pickler with the same memo. Ordinarily you don't need to worry about this.
2360
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002361 Optional arg 'indentlevel' is the number of blanks by which to indent
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002362 a new MARK level. It defaults to 4.
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002363
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002364 Optional arg 'annotate' if nonzero instructs dis() to add short
2365 description of the opcode on each line of disassembled output.
2366 The value given to 'annotate' must be an integer and is used as a
2367 hint for the column where annotation should start. The default
2368 value is 0, meaning no annotations.
2369
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002370 In addition to printing the disassembly, some sanity checks are made:
2371
2372 + All embedded opcode arguments "make sense".
2373
2374 + Explicit and implicit pop operations have enough items on the stack.
2375
2376 + When an opcode implicitly refers to a markobject, a markobject is
2377 actually on the stack.
2378
2379 + A memo entry isn't referenced before it's defined.
2380
2381 + The markobject isn't stored in the memo.
2382
2383 + A memo entry isn't redefined.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002384 """
2385
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002386 # Most of the hair here is for sanity checks, but most of it is needed
2387 # anyway to detect when a protocol 0 POP takes a MARK off the stack
2388 # (which in turn is needed to indent MARK blocks correctly).
2389
2390 stack = [] # crude emulation of unpickler stack
Tim Peters62235e72003-02-05 19:55:53 +00002391 if memo is None:
Ezio Melotti30b9d5d2013-08-17 15:50:46 +03002392 memo = {} # crude emulation of unpickler memo
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002393 maxproto = -1 # max protocol number seen
2394 markstack = [] # bytecode positions of MARK opcodes
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002395 indentchunk = ' ' * indentlevel
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002396 errormsg = None
Ezio Melotti30b9d5d2013-08-17 15:50:46 +03002397 annocol = annotate # column hint for annotations
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002398 for opcode, arg, pos in genops(pickle):
2399 if pos is not None:
Guido van Rossumbe19ed72007-02-09 05:37:30 +00002400 print("%5d:" % pos, end=' ', file=out)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002401
Tim Petersd0f7c862003-01-28 15:27:57 +00002402 line = "%-4s %s%s" % (repr(opcode.code)[1:-1],
2403 indentchunk * len(markstack),
2404 opcode.name)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002405
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002406 maxproto = max(maxproto, opcode.proto)
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002407 before = opcode.stack_before # don't mutate
2408 after = opcode.stack_after # don't mutate
Tim Peters43277d62003-01-30 15:02:12 +00002409 numtopop = len(before)
2410
2411 # See whether a MARK should be popped.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002412 markmsg = None
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002413 if markobject in before or (opcode.name == "POP" and
2414 stack and
2415 stack[-1] is markobject):
2416 assert markobject not in after
Tim Peters43277d62003-01-30 15:02:12 +00002417 if __debug__:
2418 if markobject in before:
2419 assert before[-1] is stackslice
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002420 if markstack:
2421 markpos = markstack.pop()
2422 if markpos is None:
2423 markmsg = "(MARK at unknown opcode offset)"
2424 else:
2425 markmsg = "(MARK at %d)" % markpos
2426 # Pop everything at and after the topmost markobject.
2427 while stack[-1] is not markobject:
2428 stack.pop()
2429 stack.pop()
Tim Peters43277d62003-01-30 15:02:12 +00002430 # Stop later code from popping too much.
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002431 try:
Tim Peters43277d62003-01-30 15:02:12 +00002432 numtopop = before.index(markobject)
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002433 except ValueError:
2434 assert opcode.name == "POP"
Tim Peters43277d62003-01-30 15:02:12 +00002435 numtopop = 0
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002436 else:
2437 errormsg = markmsg = "no MARK exists on stack"
2438
2439 # Check for correct memo usage.
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002440 if opcode.name in ("PUT", "BINPUT", "LONG_BINPUT", "MEMOIZE"):
2441 if opcode.name == "MEMOIZE":
2442 memo_idx = len(memo)
2443 else:
2444 assert arg is not None
2445 memo_idx = arg
2446 if memo_idx in memo:
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002447 errormsg = "memo key %r already defined" % arg
2448 elif not stack:
2449 errormsg = "stack is empty -- can't store into memo"
2450 elif stack[-1] is markobject:
2451 errormsg = "can't store markobject in the memo"
2452 else:
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002453 memo[memo_idx] = stack[-1]
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002454 elif opcode.name in ("GET", "BINGET", "LONG_BINGET"):
2455 if arg in memo:
2456 assert len(after) == 1
2457 after = [memo[arg]] # for better stack emulation
2458 else:
2459 errormsg = "memo key %r has never been stored into" % arg
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002460
2461 if arg is not None or markmsg:
2462 # make a mild effort to align arguments
2463 line += ' ' * (10 - len(opcode.name))
2464 if arg is not None:
2465 line += ' ' + repr(arg)
2466 if markmsg:
2467 line += ' ' + markmsg
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002468 if annotate:
2469 line += ' ' * (annocol - len(line))
2470 # make a mild effort to align annotations
2471 annocol = len(line)
2472 if annocol > 50:
2473 annocol = annotate
2474 line += ' ' + opcode.doc.split('\n', 1)[0]
Guido van Rossumbe19ed72007-02-09 05:37:30 +00002475 print(line, file=out)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002476
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002477 if errormsg:
2478 # Note that we delayed complaining until the offending opcode
2479 # was printed.
2480 raise ValueError(errormsg)
2481
2482 # Emulate the stack effects.
Tim Peters43277d62003-01-30 15:02:12 +00002483 if len(stack) < numtopop:
2484 raise ValueError("tries to pop %d items from stack with "
2485 "only %d items" % (numtopop, len(stack)))
2486 if numtopop:
2487 del stack[-numtopop:]
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002488 if markobject in after:
Tim Peters43277d62003-01-30 15:02:12 +00002489 assert markobject not in before
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002490 markstack.append(pos)
2491
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002492 stack.extend(after)
2493
Guido van Rossumbe19ed72007-02-09 05:37:30 +00002494 print("highest protocol among opcodes =", maxproto, file=out)
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002495 if stack:
2496 raise ValueError("stack not empty after STOP: %r" % stack)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002497
Tim Peters90718a42005-02-15 16:22:34 +00002498# For use in the doctest, simply as an example of a class to pickle.
2499class _Example:
2500 def __init__(self, value):
2501 self.value = value
2502
Guido van Rossum03e35322003-01-28 15:37:13 +00002503_dis_test = r"""
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002504>>> import pickle
Guido van Rossumf4169812008-03-17 22:56:06 +00002505>>> x = [1, 2, (3, 4), {b'abc': "def"}]
2506>>> pkl0 = pickle.dumps(x, 0)
2507>>> dis(pkl0)
Tim Petersd0f7c862003-01-28 15:27:57 +00002508 0: ( MARK
2509 1: l LIST (MARK at 0)
2510 2: p PUT 0
Guido van Rossumf4100002007-01-15 00:21:46 +00002511 5: L LONG 1
Mark Dickinson8dd05142009-01-20 20:43:58 +00002512 9: a APPEND
2513 10: L LONG 2
2514 14: a APPEND
2515 15: ( MARK
2516 16: L LONG 3
2517 20: L LONG 4
2518 24: t TUPLE (MARK at 15)
2519 25: p PUT 1
2520 28: a APPEND
2521 29: ( MARK
2522 30: d DICT (MARK at 29)
2523 31: p PUT 2
Alexandre Vassalotti3bfc65a2011-12-13 13:08:09 -05002524 34: c GLOBAL '_codecs encode'
2525 50: p PUT 3
2526 53: ( MARK
2527 54: V UNICODE 'abc'
Antoine Pitroud9dfaa92009-06-04 20:32:06 +00002528 59: p PUT 4
Alexandre Vassalotti3bfc65a2011-12-13 13:08:09 -05002529 62: V UNICODE 'latin1'
2530 70: p PUT 5
2531 73: t TUPLE (MARK at 53)
2532 74: p PUT 6
2533 77: R REDUCE
2534 78: p PUT 7
2535 81: V UNICODE 'def'
2536 86: p PUT 8
2537 89: s SETITEM
2538 90: a APPEND
2539 91: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002540highest protocol among opcodes = 0
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002541
2542Try again with a "binary" pickle.
2543
Guido van Rossumf4169812008-03-17 22:56:06 +00002544>>> pkl1 = pickle.dumps(x, 1)
2545>>> dis(pkl1)
Tim Petersd0f7c862003-01-28 15:27:57 +00002546 0: ] EMPTY_LIST
2547 1: q BINPUT 0
2548 3: ( MARK
2549 4: K BININT1 1
2550 6: K BININT1 2
2551 8: ( MARK
2552 9: K BININT1 3
2553 11: K BININT1 4
2554 13: t TUPLE (MARK at 8)
2555 14: q BINPUT 1
2556 16: } EMPTY_DICT
2557 17: q BINPUT 2
Alexandre Vassalotti3bfc65a2011-12-13 13:08:09 -05002558 19: c GLOBAL '_codecs encode'
2559 35: q BINPUT 3
2560 37: ( MARK
2561 38: X BINUNICODE 'abc'
2562 46: q BINPUT 4
2563 48: X BINUNICODE 'latin1'
2564 59: q BINPUT 5
2565 61: t TUPLE (MARK at 37)
2566 62: q BINPUT 6
2567 64: R REDUCE
2568 65: q BINPUT 7
2569 67: X BINUNICODE 'def'
2570 75: q BINPUT 8
2571 77: s SETITEM
2572 78: e APPENDS (MARK at 3)
2573 79: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002574highest protocol among opcodes = 1
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002575
2576Exercise the INST/OBJ/BUILD family.
2577
Mark Dickinsoncddcf442009-01-24 21:46:33 +00002578>>> import pickletools
2579>>> dis(pickle.dumps(pickletools.dis, 0))
2580 0: c GLOBAL 'pickletools dis'
2581 17: p PUT 0
2582 20: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002583highest protocol among opcodes = 0
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002584
Tim Peters90718a42005-02-15 16:22:34 +00002585>>> from pickletools import _Example
2586>>> x = [_Example(42)] * 2
Guido van Rossumf29d3d62003-01-27 22:47:53 +00002587>>> dis(pickle.dumps(x, 0))
Tim Petersd0f7c862003-01-28 15:27:57 +00002588 0: ( MARK
2589 1: l LIST (MARK at 0)
2590 2: p PUT 0
Antoine Pitroud9dfaa92009-06-04 20:32:06 +00002591 5: c GLOBAL 'copy_reg _reconstructor'
2592 30: p PUT 1
2593 33: ( MARK
2594 34: c GLOBAL 'pickletools _Example'
2595 56: p PUT 2
2596 59: c GLOBAL '__builtin__ object'
2597 79: p PUT 3
2598 82: N NONE
2599 83: t TUPLE (MARK at 33)
2600 84: p PUT 4
2601 87: R REDUCE
2602 88: p PUT 5
2603 91: ( MARK
2604 92: d DICT (MARK at 91)
2605 93: p PUT 6
2606 96: V UNICODE 'value'
2607 103: p PUT 7
2608 106: L LONG 42
2609 111: s SETITEM
2610 112: b BUILD
Mark Dickinson8dd05142009-01-20 20:43:58 +00002611 113: a APPEND
Antoine Pitroud9dfaa92009-06-04 20:32:06 +00002612 114: g GET 5
2613 117: a APPEND
2614 118: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002615highest protocol among opcodes = 0
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002616
2617>>> dis(pickle.dumps(x, 1))
Tim Petersd0f7c862003-01-28 15:27:57 +00002618 0: ] EMPTY_LIST
2619 1: q BINPUT 0
2620 3: ( MARK
Antoine Pitroud9dfaa92009-06-04 20:32:06 +00002621 4: c GLOBAL 'copy_reg _reconstructor'
2622 29: q BINPUT 1
2623 31: ( MARK
2624 32: c GLOBAL 'pickletools _Example'
2625 54: q BINPUT 2
2626 56: c GLOBAL '__builtin__ object'
2627 76: q BINPUT 3
2628 78: N NONE
2629 79: t TUPLE (MARK at 31)
2630 80: q BINPUT 4
2631 82: R REDUCE
2632 83: q BINPUT 5
2633 85: } EMPTY_DICT
2634 86: q BINPUT 6
2635 88: X BINUNICODE 'value'
2636 98: q BINPUT 7
2637 100: K BININT1 42
2638 102: s SETITEM
2639 103: b BUILD
2640 104: h BINGET 5
2641 106: e APPENDS (MARK at 3)
2642 107: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002643highest protocol among opcodes = 1
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002644
2645Try "the canonical" recursive-object test.
2646
2647>>> L = []
2648>>> T = L,
2649>>> L.append(T)
2650>>> L[0] is T
2651True
2652>>> T[0] is L
2653True
2654>>> L[0][0] is L
2655True
2656>>> T[0][0] is T
2657True
Guido van Rossumf29d3d62003-01-27 22:47:53 +00002658>>> dis(pickle.dumps(L, 0))
Tim Petersd0f7c862003-01-28 15:27:57 +00002659 0: ( MARK
2660 1: l LIST (MARK at 0)
2661 2: p PUT 0
2662 5: ( MARK
2663 6: g GET 0
2664 9: t TUPLE (MARK at 5)
2665 10: p PUT 1
2666 13: a APPEND
2667 14: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002668highest protocol among opcodes = 0
2669
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002670>>> dis(pickle.dumps(L, 1))
Tim Petersd0f7c862003-01-28 15:27:57 +00002671 0: ] EMPTY_LIST
2672 1: q BINPUT 0
2673 3: ( MARK
2674 4: h BINGET 0
2675 6: t TUPLE (MARK at 3)
2676 7: q BINPUT 1
2677 9: a APPEND
2678 10: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002679highest protocol among opcodes = 1
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002680
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002681Note that, in the protocol 0 pickle of the recursive tuple, the disassembler
2682has to emulate the stack in order to realize that the POP opcode at 16 gets
2683rid of the MARK at 0.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002684
Guido van Rossumf29d3d62003-01-27 22:47:53 +00002685>>> dis(pickle.dumps(T, 0))
Tim Petersd0f7c862003-01-28 15:27:57 +00002686 0: ( MARK
2687 1: ( MARK
2688 2: l LIST (MARK at 1)
2689 3: p PUT 0
2690 6: ( MARK
2691 7: g GET 0
2692 10: t TUPLE (MARK at 6)
2693 11: p PUT 1
2694 14: a APPEND
2695 15: 0 POP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002696 16: 0 POP (MARK at 0)
2697 17: g GET 1
2698 20: . STOP
2699highest protocol among opcodes = 0
2700
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002701>>> dis(pickle.dumps(T, 1))
Tim Petersd0f7c862003-01-28 15:27:57 +00002702 0: ( MARK
2703 1: ] EMPTY_LIST
2704 2: q BINPUT 0
2705 4: ( MARK
2706 5: h BINGET 0
2707 7: t TUPLE (MARK at 4)
2708 8: q BINPUT 1
2709 10: a APPEND
2710 11: 1 POP_MARK (MARK at 0)
2711 12: h BINGET 1
2712 14: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002713highest protocol among opcodes = 1
Tim Petersd0f7c862003-01-28 15:27:57 +00002714
2715Try protocol 2.
2716
2717>>> dis(pickle.dumps(L, 2))
2718 0: \x80 PROTO 2
2719 2: ] EMPTY_LIST
2720 3: q BINPUT 0
2721 5: h BINGET 0
2722 7: \x85 TUPLE1
2723 8: q BINPUT 1
2724 10: a APPEND
2725 11: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002726highest protocol among opcodes = 2
Tim Petersd0f7c862003-01-28 15:27:57 +00002727
2728>>> dis(pickle.dumps(T, 2))
2729 0: \x80 PROTO 2
2730 2: ] EMPTY_LIST
2731 3: q BINPUT 0
2732 5: h BINGET 0
2733 7: \x85 TUPLE1
2734 8: q BINPUT 1
2735 10: a APPEND
2736 11: 0 POP
2737 12: h BINGET 1
2738 14: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002739highest protocol among opcodes = 2
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002740
2741Try protocol 3 with annotations:
2742
2743>>> dis(pickle.dumps(T, 3), annotate=1)
2744 0: \x80 PROTO 3 Protocol version indicator.
2745 2: ] EMPTY_LIST Push an empty list.
2746 3: q BINPUT 0 Store the stack top into the memo. The stack is not popped.
2747 5: h BINGET 0 Read an object from the memo and push it on the stack.
2748 7: \x85 TUPLE1 Build a one-tuple out of the topmost item on the stack.
2749 8: q BINPUT 1 Store the stack top into the memo. The stack is not popped.
2750 10: a APPEND Append an object to a list.
2751 11: 0 POP Discard the top stack item, shrinking the stack by one item.
2752 12: h BINGET 1 Read an object from the memo and push it on the stack.
2753 14: . STOP Stop the unpickling machine.
2754highest protocol among opcodes = 2
2755
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002756"""
2757
Tim Peters62235e72003-02-05 19:55:53 +00002758_memo_test = r"""
2759>>> import pickle
Guido van Rossumcfe5f202007-05-08 21:26:54 +00002760>>> import io
2761>>> f = io.BytesIO()
Tim Peters62235e72003-02-05 19:55:53 +00002762>>> p = pickle.Pickler(f, 2)
2763>>> x = [1, 2, 3]
2764>>> p.dump(x)
2765>>> p.dump(x)
2766>>> f.seek(0)
Guido van Rossumcfe5f202007-05-08 21:26:54 +000027670
Tim Peters62235e72003-02-05 19:55:53 +00002768>>> memo = {}
2769>>> dis(f, memo=memo)
2770 0: \x80 PROTO 2
2771 2: ] EMPTY_LIST
2772 3: q BINPUT 0
2773 5: ( MARK
2774 6: K BININT1 1
2775 8: K BININT1 2
2776 10: K BININT1 3
2777 12: e APPENDS (MARK at 5)
2778 13: . STOP
2779highest protocol among opcodes = 2
2780>>> dis(f, memo=memo)
2781 14: \x80 PROTO 2
2782 16: h BINGET 0
2783 18: . STOP
2784highest protocol among opcodes = 2
2785"""
2786
Guido van Rossum57028352003-01-28 15:09:10 +00002787__test__ = {'disassembler_test': _dis_test,
Tim Peters62235e72003-02-05 19:55:53 +00002788 'disassembler_memo_test': _memo_test,
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002789 }
2790
2791def _test():
2792 import doctest
2793 return doctest.testmod()
2794
2795if __name__ == "__main__":
Alexander Belopolsky60c762b2010-07-03 20:35:53 +00002796 import sys, argparse
2797 parser = argparse.ArgumentParser(
2798 description='disassemble one or more pickle files')
2799 parser.add_argument(
2800 'pickle_file', type=argparse.FileType('br'),
2801 nargs='*', help='the pickle file')
2802 parser.add_argument(
2803 '-o', '--output', default=sys.stdout, type=argparse.FileType('w'),
2804 help='the file where the output should be written')
2805 parser.add_argument(
2806 '-m', '--memo', action='store_true',
2807 help='preserve memo between disassemblies')
2808 parser.add_argument(
2809 '-l', '--indentlevel', default=4, type=int,
2810 help='the number of blanks by which to indent a new MARK level')
2811 parser.add_argument(
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002812 '-a', '--annotate', action='store_true',
2813 help='annotate each line with a short opcode description')
2814 parser.add_argument(
Alexander Belopolsky60c762b2010-07-03 20:35:53 +00002815 '-p', '--preamble', default="==> {name} <==",
2816 help='if more than one pickle file is specified, print this before'
2817 ' each disassembly')
2818 parser.add_argument(
2819 '-t', '--test', action='store_true',
2820 help='run self-test suite')
2821 parser.add_argument(
2822 '-v', action='store_true',
2823 help='run verbosely; only affects self-test run')
2824 args = parser.parse_args()
2825 if args.test:
2826 _test()
2827 else:
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002828 annotate = 30 if args.annotate else 0
Alexander Belopolsky60c762b2010-07-03 20:35:53 +00002829 if not args.pickle_file:
2830 parser.print_help()
2831 elif len(args.pickle_file) == 1:
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002832 dis(args.pickle_file[0], args.output, None,
2833 args.indentlevel, annotate)
Alexander Belopolsky60c762b2010-07-03 20:35:53 +00002834 else:
2835 memo = {} if args.memo else None
2836 for f in args.pickle_file:
2837 preamble = args.preamble.format(name=f.name)
2838 args.output.write(preamble + '\n')
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002839 dis(f, args.output, memo, args.indentlevel, annotate)