blob: ed8bee36e8c586abc87770cbcf4237e3d91fb9fa [file] [log] [blame]
Skip Montanaro54455942003-01-29 15:41:33 +00001'''"Executable documentation" for the pickle module.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002
3Extensive comments about the pickle protocols and pickle-machine opcodes
4can be found here. Some functions meant for external use:
5
6genops(pickle)
7 Generate all the opcodes in a pickle, as (opcode, arg, position) triples.
8
Andrew M. Kuchlingd0c53fe2004-08-07 16:51:30 +00009dis(pickle, out=None, memo=None, indentlevel=4)
Tim Peters8ecfc8e2003-01-27 18:51:48 +000010 Print a symbolic disassembly of a pickle.
Skip Montanaro54455942003-01-29 15:41:33 +000011'''
Tim Peters8ecfc8e2003-01-27 18:51:48 +000012
Walter Dörwald42748a82007-06-12 16:40:17 +000013import codecs
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +010014import io
Guido van Rossum98297ee2007-11-06 21:34:58 +000015import pickle
16import re
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -070017import sys
Walter Dörwald42748a82007-06-12 16:40:17 +000018
Christian Heimes3feef612008-02-11 06:19:17 +000019__all__ = ['dis', 'genops', 'optimize']
Tim Peters90cf2122004-11-06 23:45:48 +000020
Guido van Rossum98297ee2007-11-06 21:34:58 +000021bytes_types = pickle.bytes_types
22
Tim Peters8ecfc8e2003-01-27 18:51:48 +000023# Other ideas:
24#
25# - A pickle verifier: read a pickle and check it exhaustively for
Tim Petersc1c2b3e2003-01-29 20:12:21 +000026# well-formedness. dis() does a lot of this already.
Tim Peters8ecfc8e2003-01-27 18:51:48 +000027#
28# - A protocol identifier: examine a pickle and return its protocol number
29# (== the highest .proto attr value among all the opcodes in the pickle).
Tim Petersc1c2b3e2003-01-29 20:12:21 +000030# dis() already prints this info at the end.
Tim Peters8ecfc8e2003-01-27 18:51:48 +000031#
32# - A pickle optimizer: for example, tuple-building code is sometimes more
33# elaborate than necessary, catering for the possibility that the tuple
34# is recursive. Or lots of times a PUT is generated that's never accessed
35# by a later GET.
36
37
Victor Stinner765531d2013-03-26 01:11:54 +010038# "A pickle" is a program for a virtual pickle machine (PM, but more accurately
39# called an unpickling machine). It's a sequence of opcodes, interpreted by the
40# PM, building an arbitrarily complex Python object.
41#
42# For the most part, the PM is very simple: there are no looping, testing, or
43# conditional instructions, no arithmetic and no function calls. Opcodes are
44# executed once each, from first to last, until a STOP opcode is reached.
45#
46# The PM has two data areas, "the stack" and "the memo".
47#
48# Many opcodes push Python objects onto the stack; e.g., INT pushes a Python
49# integer object on the stack, whose value is gotten from a decimal string
50# literal immediately following the INT opcode in the pickle bytestream. Other
51# opcodes take Python objects off the stack. The result of unpickling is
52# whatever object is left on the stack when the final STOP opcode is executed.
53#
54# The memo is simply an array of objects, or it can be implemented as a dict
55# mapping little integers to objects. The memo serves as the PM's "long term
56# memory", and the little integers indexing the memo are akin to variable
57# names. Some opcodes pop a stack object into the memo at a given index,
58# and others push a memo object at a given index onto the stack again.
59#
60# At heart, that's all the PM has. Subtleties arise for these reasons:
61#
62# + Object identity. Objects can be arbitrarily complex, and subobjects
63# may be shared (for example, the list [a, a] refers to the same object a
64# twice). It can be vital that unpickling recreate an isomorphic object
65# graph, faithfully reproducing sharing.
66#
67# + Recursive objects. For example, after "L = []; L.append(L)", L is a
68# list, and L[0] is the same list. This is related to the object identity
69# point, and some sequences of pickle opcodes are subtle in order to
70# get the right result in all cases.
71#
72# + Things pickle doesn't know everything about. Examples of things pickle
73# does know everything about are Python's builtin scalar and container
74# types, like ints and tuples. They generally have opcodes dedicated to
75# them. For things like module references and instances of user-defined
76# classes, pickle's knowledge is limited. Historically, many enhancements
77# have been made to the pickle protocol in order to do a better (faster,
78# and/or more compact) job on those.
79#
80# + Backward compatibility and micro-optimization. As explained below,
81# pickle opcodes never go away, not even when better ways to do a thing
82# get invented. The repertoire of the PM just keeps growing over time.
83# For example, protocol 0 had two opcodes for building Python integers (INT
84# and LONG), protocol 1 added three more for more-efficient pickling of short
85# integers, and protocol 2 added two more for more-efficient pickling of
86# long integers (before protocol 2, the only ways to pickle a Python long
87# took time quadratic in the number of digits, for both pickling and
88# unpickling). "Opcode bloat" isn't so much a subtlety as a source of
89# wearying complication.
90#
91#
92# Pickle protocols:
93#
94# For compatibility, the meaning of a pickle opcode never changes. Instead new
95# pickle opcodes get added, and each version's unpickler can handle all the
96# pickle opcodes in all protocol versions to date. So old pickles continue to
97# be readable forever. The pickler can generally be told to restrict itself to
98# the subset of opcodes available under previous protocol versions too, so that
99# users can create pickles under the current version readable by older
100# versions. However, a pickle does not contain its version number embedded
101# within it. If an older unpickler tries to read a pickle using a later
102# protocol, the result is most likely an exception due to seeing an unknown (in
103# the older unpickler) opcode.
104#
105# The original pickle used what's now called "protocol 0", and what was called
106# "text mode" before Python 2.3. The entire pickle bytestream is made up of
107# printable 7-bit ASCII characters, plus the newline character, in protocol 0.
108# That's why it was called text mode. Protocol 0 is small and elegant, but
109# sometimes painfully inefficient.
110#
111# The second major set of additions is now called "protocol 1", and was called
112# "binary mode" before Python 2.3. This added many opcodes with arguments
113# consisting of arbitrary bytes, including NUL bytes and unprintable "high bit"
114# bytes. Binary mode pickles can be substantially smaller than equivalent
115# text mode pickles, and sometimes faster too; e.g., BININT represents a 4-byte
116# int as 4 bytes following the opcode, which is cheaper to unpickle than the
117# (perhaps) 11-character decimal string attached to INT. Protocol 1 also added
118# a number of opcodes that operate on many stack elements at once (like APPENDS
119# and SETITEMS), and "shortcut" opcodes (like EMPTY_DICT and EMPTY_TUPLE).
120#
121# The third major set of additions came in Python 2.3, and is called "protocol
122# 2". This added:
123#
124# - A better way to pickle instances of new-style classes (NEWOBJ).
125#
126# - A way for a pickle to identify its protocol (PROTO).
127#
128# - Time- and space- efficient pickling of long ints (LONG{1,4}).
129#
130# - Shortcuts for small tuples (TUPLE{1,2,3}}.
131#
132# - Dedicated opcodes for bools (NEWTRUE, NEWFALSE).
133#
134# - The "extension registry", a vector of popular objects that can be pushed
135# efficiently by index (EXT{1,2,4}). This is akin to the memo and GET, but
136# the registry contents are predefined (there's nothing akin to the memo's
137# PUT).
138#
139# Another independent change with Python 2.3 is the abandonment of any
140# pretense that it might be safe to load pickles received from untrusted
141# parties -- no sufficient security analysis has been done to guarantee
142# this and there isn't a use case that warrants the expense of such an
143# analysis.
144#
145# To this end, all tests for __safe_for_unpickling__ or for
146# copyreg.safe_constructors are removed from the unpickling code.
147# References to these variables in the descriptions below are to be seen
148# as describing unpickling in Python 2.2 and before.
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000149
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000150
151# Meta-rule: Descriptions are stored in instances of descriptor objects,
152# with plain constructors. No meta-language is defined from which
153# descriptors could be constructed. If you want, e.g., XML, write a little
154# program to generate XML from the objects.
155
156##############################################################################
157# Some pickle opcodes have an argument, following the opcode in the
158# bytestream. An argument is of a specific type, described by an instance
159# of ArgumentDescriptor. These are not to be confused with arguments taken
160# off the stack -- ArgumentDescriptor applies only to arguments embedded in
161# the opcode stream, immediately following an opcode.
162
163# Represents the number of bytes consumed by an argument delimited by the
164# next newline character.
165UP_TO_NEWLINE = -1
166
167# Represents the number of bytes consumed by a two-argument opcode where
168# the first argument gives the number of bytes in the second argument.
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -0700169TAKEN_FROM_ARGUMENT1 = -2 # num bytes is 1-byte unsigned int
170TAKEN_FROM_ARGUMENT4 = -3 # num bytes is 4-byte signed little-endian int
171TAKEN_FROM_ARGUMENT4U = -4 # num bytes is 4-byte unsigned little-endian int
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100172TAKEN_FROM_ARGUMENT8U = -5 # num bytes is 8-byte unsigned little-endian int
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000173
174class ArgumentDescriptor(object):
175 __slots__ = (
176 # name of descriptor record, also a module global name; a string
177 'name',
178
179 # length of argument, in bytes; an int; UP_TO_NEWLINE and
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100180 # TAKEN_FROM_ARGUMENT{1,4,8} are negative values for variable-length
Tim Petersfdb8cfa2003-01-28 00:13:19 +0000181 # cases
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000182 'n',
183
184 # a function taking a file-like object, reading this kind of argument
185 # from the object at the current position, advancing the current
186 # position by n bytes, and returning the value of the argument
187 'reader',
188
189 # human-readable docs for this arg descriptor; a string
190 'doc',
191 )
192
193 def __init__(self, name, n, reader, doc):
194 assert isinstance(name, str)
195 self.name = name
196
197 assert isinstance(n, int) and (n >= 0 or
Tim Petersfdb8cfa2003-01-28 00:13:19 +0000198 n in (UP_TO_NEWLINE,
199 TAKEN_FROM_ARGUMENT1,
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -0700200 TAKEN_FROM_ARGUMENT4,
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100201 TAKEN_FROM_ARGUMENT4U,
202 TAKEN_FROM_ARGUMENT8U))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000203 self.n = n
204
205 self.reader = reader
206
207 assert isinstance(doc, str)
208 self.doc = doc
209
210from struct import unpack as _unpack
211
212def read_uint1(f):
Tim Peters55762f52003-01-28 16:01:25 +0000213 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000214 >>> import io
215 >>> read_uint1(io.BytesIO(b'\xff'))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000216 255
217 """
218
219 data = f.read(1)
220 if data:
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000221 return data[0]
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000222 raise ValueError("not enough data in stream to read uint1")
223
224uint1 = ArgumentDescriptor(
225 name='uint1',
226 n=1,
227 reader=read_uint1,
228 doc="One-byte unsigned integer.")
229
230
231def read_uint2(f):
Tim Peters55762f52003-01-28 16:01:25 +0000232 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000233 >>> import io
234 >>> read_uint2(io.BytesIO(b'\xff\x00'))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000235 255
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000236 >>> read_uint2(io.BytesIO(b'\xff\xff'))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000237 65535
238 """
239
240 data = f.read(2)
241 if len(data) == 2:
242 return _unpack("<H", data)[0]
243 raise ValueError("not enough data in stream to read uint2")
244
245uint2 = ArgumentDescriptor(
246 name='uint2',
247 n=2,
248 reader=read_uint2,
249 doc="Two-byte unsigned integer, little-endian.")
250
251
252def read_int4(f):
Tim Peters55762f52003-01-28 16:01:25 +0000253 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000254 >>> import io
255 >>> read_int4(io.BytesIO(b'\xff\x00\x00\x00'))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000256 255
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000257 >>> read_int4(io.BytesIO(b'\x00\x00\x00\x80')) == -(2**31)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000258 True
259 """
260
261 data = f.read(4)
262 if len(data) == 4:
263 return _unpack("<i", data)[0]
264 raise ValueError("not enough data in stream to read int4")
265
266int4 = ArgumentDescriptor(
267 name='int4',
268 n=4,
269 reader=read_int4,
270 doc="Four-byte signed integer, little-endian, 2's complement.")
271
272
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -0700273def read_uint4(f):
274 r"""
275 >>> import io
276 >>> read_uint4(io.BytesIO(b'\xff\x00\x00\x00'))
277 255
278 >>> read_uint4(io.BytesIO(b'\x00\x00\x00\x80')) == 2**31
279 True
280 """
281
282 data = f.read(4)
283 if len(data) == 4:
284 return _unpack("<I", data)[0]
285 raise ValueError("not enough data in stream to read uint4")
286
287uint4 = ArgumentDescriptor(
288 name='uint4',
289 n=4,
290 reader=read_uint4,
291 doc="Four-byte unsigned integer, little-endian.")
292
293
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100294def read_uint8(f):
295 r"""
296 >>> import io
297 >>> read_uint8(io.BytesIO(b'\xff\x00\x00\x00\x00\x00\x00\x00'))
298 255
299 >>> read_uint8(io.BytesIO(b'\xff' * 8)) == 2**64-1
300 True
301 """
302
303 data = f.read(8)
304 if len(data) == 8:
305 return _unpack("<Q", data)[0]
306 raise ValueError("not enough data in stream to read uint8")
307
308uint8 = ArgumentDescriptor(
309 name='uint8',
310 n=8,
311 reader=read_uint8,
312 doc="Eight-byte unsigned integer, little-endian.")
313
314
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000315def read_stringnl(f, decode=True, stripquotes=True):
Tim Peters55762f52003-01-28 16:01:25 +0000316 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000317 >>> import io
318 >>> read_stringnl(io.BytesIO(b"'abcd'\nefg\n"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000319 'abcd'
320
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000321 >>> read_stringnl(io.BytesIO(b"\n"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000322 Traceback (most recent call last):
323 ...
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000324 ValueError: no string quotes around b''
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000325
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000326 >>> read_stringnl(io.BytesIO(b"\n"), stripquotes=False)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000327 ''
328
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000329 >>> read_stringnl(io.BytesIO(b"''\n"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000330 ''
331
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000332 >>> read_stringnl(io.BytesIO(b'"abcd"'))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000333 Traceback (most recent call last):
334 ...
335 ValueError: no newline found when trying to read stringnl
336
337 Embedded escapes are undone in the result.
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000338 >>> read_stringnl(io.BytesIO(br"'a\n\\b\x00c\td'" + b"\n'e'"))
Tim Peters55762f52003-01-28 16:01:25 +0000339 'a\n\\b\x00c\td'
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000340 """
341
Guido van Rossum26986312007-07-17 00:19:46 +0000342 data = f.readline()
Guido van Rossum26d95c32007-08-27 23:18:54 +0000343 if not data.endswith(b'\n'):
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000344 raise ValueError("no newline found when trying to read stringnl")
345 data = data[:-1] # lose the newline
346
347 if stripquotes:
Guido van Rossum26d95c32007-08-27 23:18:54 +0000348 for q in (b'"', b"'"):
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000349 if data.startswith(q):
350 if not data.endswith(q):
351 raise ValueError("strinq quote %r not found at both "
352 "ends of %r" % (q, data))
353 data = data[1:-1]
354 break
355 else:
356 raise ValueError("no string quotes around %r" % data)
357
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000358 if decode:
Guido van Rossum98297ee2007-11-06 21:34:58 +0000359 data = codecs.escape_decode(data)[0].decode("ascii")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000360 return data
361
362stringnl = ArgumentDescriptor(
363 name='stringnl',
364 n=UP_TO_NEWLINE,
365 reader=read_stringnl,
366 doc="""A newline-terminated string.
367
368 This is a repr-style string, with embedded escapes, and
369 bracketing quotes.
370 """)
371
372def read_stringnl_noescape(f):
Guido van Rossum98297ee2007-11-06 21:34:58 +0000373 return read_stringnl(f, stripquotes=False)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000374
375stringnl_noescape = ArgumentDescriptor(
376 name='stringnl_noescape',
377 n=UP_TO_NEWLINE,
378 reader=read_stringnl_noescape,
379 doc="""A newline-terminated string.
380
381 This is a str-style string, without embedded escapes,
382 or bracketing quotes. It should consist solely of
383 printable ASCII characters.
384 """)
385
386def read_stringnl_noescape_pair(f):
Tim Peters55762f52003-01-28 16:01:25 +0000387 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000388 >>> import io
389 >>> read_stringnl_noescape_pair(io.BytesIO(b"Queue\nEmpty\njunk"))
Tim Petersd916cf42003-01-27 19:01:47 +0000390 'Queue Empty'
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000391 """
392
Tim Petersd916cf42003-01-27 19:01:47 +0000393 return "%s %s" % (read_stringnl_noescape(f), read_stringnl_noescape(f))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000394
395stringnl_noescape_pair = ArgumentDescriptor(
396 name='stringnl_noescape_pair',
397 n=UP_TO_NEWLINE,
398 reader=read_stringnl_noescape_pair,
399 doc="""A pair of newline-terminated strings.
400
401 These are str-style strings, without embedded
402 escapes, or bracketing quotes. They should
403 consist solely of printable ASCII characters.
404 The pair is returned as a single string, with
Tim Petersd916cf42003-01-27 19:01:47 +0000405 a single blank separating the two strings.
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000406 """)
407
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100408
409def read_string1(f):
410 r"""
411 >>> import io
412 >>> read_string1(io.BytesIO(b"\x00"))
413 ''
414 >>> read_string1(io.BytesIO(b"\x03abcdef"))
415 'abc'
416 """
417
418 n = read_uint1(f)
419 assert n >= 0
420 data = f.read(n)
421 if len(data) == n:
422 return data.decode("latin-1")
423 raise ValueError("expected %d bytes in a string1, but only %d remain" %
424 (n, len(data)))
425
426string1 = ArgumentDescriptor(
427 name="string1",
428 n=TAKEN_FROM_ARGUMENT1,
429 reader=read_string1,
430 doc="""A counted string.
431
432 The first argument is a 1-byte unsigned int giving the number
433 of bytes in the string, and the second argument is that many
434 bytes.
435 """)
436
437
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000438def read_string4(f):
Tim Peters55762f52003-01-28 16:01:25 +0000439 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000440 >>> import io
441 >>> read_string4(io.BytesIO(b"\x00\x00\x00\x00abc"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000442 ''
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000443 >>> read_string4(io.BytesIO(b"\x03\x00\x00\x00abcdef"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000444 'abc'
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000445 >>> read_string4(io.BytesIO(b"\x00\x00\x00\x03abcdef"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000446 Traceback (most recent call last):
447 ...
448 ValueError: expected 50331648 bytes in a string4, but only 6 remain
449 """
450
451 n = read_int4(f)
452 if n < 0:
453 raise ValueError("string4 byte count < 0: %d" % n)
454 data = f.read(n)
455 if len(data) == n:
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000456 return data.decode("latin-1")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000457 raise ValueError("expected %d bytes in a string4, but only %d remain" %
458 (n, len(data)))
459
460string4 = ArgumentDescriptor(
461 name="string4",
Tim Petersfdb8cfa2003-01-28 00:13:19 +0000462 n=TAKEN_FROM_ARGUMENT4,
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000463 reader=read_string4,
464 doc="""A counted string.
465
466 The first argument is a 4-byte little-endian signed int giving
467 the number of bytes in the string, and the second argument is
468 that many bytes.
469 """)
470
471
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100472def read_bytes1(f):
Tim Peters55762f52003-01-28 16:01:25 +0000473 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000474 >>> import io
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100475 >>> read_bytes1(io.BytesIO(b"\x00"))
476 b''
477 >>> read_bytes1(io.BytesIO(b"\x03abcdef"))
478 b'abc'
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000479 """
480
481 n = read_uint1(f)
482 assert n >= 0
483 data = f.read(n)
484 if len(data) == n:
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100485 return data
486 raise ValueError("expected %d bytes in a bytes1, but only %d remain" %
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000487 (n, len(data)))
488
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100489bytes1 = ArgumentDescriptor(
490 name="bytes1",
Tim Petersfdb8cfa2003-01-28 00:13:19 +0000491 n=TAKEN_FROM_ARGUMENT1,
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100492 reader=read_bytes1,
493 doc="""A counted bytes string.
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000494
495 The first argument is a 1-byte unsigned int giving the number
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -0700496 of bytes, and the second argument is that many bytes.
497 """)
498
499
500def read_bytes4(f):
501 r"""
502 >>> import io
503 >>> read_bytes4(io.BytesIO(b"\x00\x00\x00\x00abc"))
504 b''
505 >>> read_bytes4(io.BytesIO(b"\x03\x00\x00\x00abcdef"))
506 b'abc'
507 >>> read_bytes4(io.BytesIO(b"\x00\x00\x00\x03abcdef"))
508 Traceback (most recent call last):
509 ...
510 ValueError: expected 50331648 bytes in a bytes4, but only 6 remain
511 """
512
513 n = read_uint4(f)
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100514 assert n >= 0
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -0700515 if n > sys.maxsize:
516 raise ValueError("bytes4 byte count > sys.maxsize: %d" % n)
517 data = f.read(n)
518 if len(data) == n:
519 return data
520 raise ValueError("expected %d bytes in a bytes4, but only %d remain" %
521 (n, len(data)))
522
523bytes4 = ArgumentDescriptor(
524 name="bytes4",
525 n=TAKEN_FROM_ARGUMENT4U,
526 reader=read_bytes4,
527 doc="""A counted bytes string.
528
529 The first argument is a 4-byte little-endian unsigned int giving
530 the number of bytes, and the second argument is that many bytes.
531 """)
532
533
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100534def read_bytes8(f):
535 r"""
Gregory P. Smith057e58d2013-11-23 20:40:46 +0000536 >>> import io, struct, sys
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100537 >>> read_bytes8(io.BytesIO(b"\x00\x00\x00\x00\x00\x00\x00\x00abc"))
538 b''
539 >>> read_bytes8(io.BytesIO(b"\x03\x00\x00\x00\x00\x00\x00\x00abcdef"))
540 b'abc'
Gregory P. Smith057e58d2013-11-23 20:40:46 +0000541 >>> bigsize8 = struct.pack("<Q", sys.maxsize//3)
542 >>> read_bytes8(io.BytesIO(bigsize8 + b"abcdef")) #doctest: +ELLIPSIS
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100543 Traceback (most recent call last):
544 ...
Gregory P. Smith057e58d2013-11-23 20:40:46 +0000545 ValueError: expected ... bytes in a bytes8, but only 6 remain
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100546 """
547
548 n = read_uint8(f)
549 assert n >= 0
550 if n > sys.maxsize:
551 raise ValueError("bytes8 byte count > sys.maxsize: %d" % n)
552 data = f.read(n)
553 if len(data) == n:
554 return data
555 raise ValueError("expected %d bytes in a bytes8, but only %d remain" %
556 (n, len(data)))
557
558bytes8 = ArgumentDescriptor(
559 name="bytes8",
560 n=TAKEN_FROM_ARGUMENT8U,
561 reader=read_bytes8,
562 doc="""A counted bytes string.
563
Martin Panter4c359642016-05-08 13:53:41 +0000564 The first argument is an 8-byte little-endian unsigned int giving
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100565 the number of bytes, and the second argument is that many bytes.
566 """)
567
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000568def read_unicodestringnl(f):
Tim Peters55762f52003-01-28 16:01:25 +0000569 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000570 >>> import io
571 >>> read_unicodestringnl(io.BytesIO(b"abc\\uabcd\njunk")) == 'abc\uabcd'
572 True
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000573 """
574
Guido van Rossum26986312007-07-17 00:19:46 +0000575 data = f.readline()
Guido van Rossum26d95c32007-08-27 23:18:54 +0000576 if not data.endswith(b'\n'):
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000577 raise ValueError("no newline found when trying to read "
578 "unicodestringnl")
579 data = data[:-1] # lose the newline
Guido van Rossumef87d6e2007-05-02 19:09:54 +0000580 return str(data, 'raw-unicode-escape')
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000581
582unicodestringnl = ArgumentDescriptor(
583 name='unicodestringnl',
584 n=UP_TO_NEWLINE,
585 reader=read_unicodestringnl,
586 doc="""A newline-terminated Unicode string.
587
588 This is raw-unicode-escape encoded, so consists of
589 printable ASCII characters, and may contain embedded
590 escape sequences.
591 """)
592
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100593
594def read_unicodestring1(f):
595 r"""
596 >>> import io
597 >>> s = 'abcd\uabcd'
598 >>> enc = s.encode('utf-8')
599 >>> enc
600 b'abcd\xea\xaf\x8d'
601 >>> n = bytes([len(enc)]) # little-endian 1-byte length
602 >>> t = read_unicodestring1(io.BytesIO(n + enc + b'junk'))
603 >>> s == t
604 True
605
606 >>> read_unicodestring1(io.BytesIO(n + enc[:-1]))
607 Traceback (most recent call last):
608 ...
609 ValueError: expected 7 bytes in a unicodestring1, but only 6 remain
610 """
611
612 n = read_uint1(f)
613 assert n >= 0
614 data = f.read(n)
615 if len(data) == n:
616 return str(data, 'utf-8', 'surrogatepass')
617 raise ValueError("expected %d bytes in a unicodestring1, but only %d "
618 "remain" % (n, len(data)))
619
620unicodestring1 = ArgumentDescriptor(
621 name="unicodestring1",
622 n=TAKEN_FROM_ARGUMENT1,
623 reader=read_unicodestring1,
624 doc="""A counted Unicode string.
625
626 The first argument is a 1-byte little-endian signed int
627 giving the number of bytes in the string, and the second
628 argument-- the UTF-8 encoding of the Unicode string --
629 contains that many bytes.
630 """)
631
632
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000633def read_unicodestring4(f):
Tim Peters55762f52003-01-28 16:01:25 +0000634 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000635 >>> import io
636 >>> s = 'abcd\uabcd'
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000637 >>> enc = s.encode('utf-8')
638 >>> enc
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000639 b'abcd\xea\xaf\x8d'
640 >>> n = bytes([len(enc), 0, 0, 0]) # little-endian 4-byte length
641 >>> t = read_unicodestring4(io.BytesIO(n + enc + b'junk'))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000642 >>> s == t
643 True
644
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000645 >>> read_unicodestring4(io.BytesIO(n + enc[:-1]))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000646 Traceback (most recent call last):
647 ...
648 ValueError: expected 7 bytes in a unicodestring4, but only 6 remain
649 """
650
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -0700651 n = read_uint4(f)
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100652 assert n >= 0
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -0700653 if n > sys.maxsize:
654 raise ValueError("unicodestring4 byte count > sys.maxsize: %d" % n)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000655 data = f.read(n)
656 if len(data) == n:
Victor Stinner485fb562010-04-13 11:07:24 +0000657 return str(data, 'utf-8', 'surrogatepass')
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000658 raise ValueError("expected %d bytes in a unicodestring4, but only %d "
659 "remain" % (n, len(data)))
660
661unicodestring4 = ArgumentDescriptor(
662 name="unicodestring4",
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -0700663 n=TAKEN_FROM_ARGUMENT4U,
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000664 reader=read_unicodestring4,
665 doc="""A counted Unicode string.
666
667 The first argument is a 4-byte little-endian signed int
668 giving the number of bytes in the string, and the second
669 argument-- the UTF-8 encoding of the Unicode string --
670 contains that many bytes.
671 """)
672
673
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100674def read_unicodestring8(f):
675 r"""
676 >>> import io
677 >>> s = 'abcd\uabcd'
678 >>> enc = s.encode('utf-8')
679 >>> enc
680 b'abcd\xea\xaf\x8d'
Serhiy Storchaka5f1a5182016-09-11 14:41:02 +0300681 >>> n = bytes([len(enc)]) + b'\0' * 7 # little-endian 8-byte length
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100682 >>> t = read_unicodestring8(io.BytesIO(n + enc + b'junk'))
683 >>> s == t
684 True
685
686 >>> read_unicodestring8(io.BytesIO(n + enc[:-1]))
687 Traceback (most recent call last):
688 ...
689 ValueError: expected 7 bytes in a unicodestring8, but only 6 remain
690 """
691
692 n = read_uint8(f)
693 assert n >= 0
694 if n > sys.maxsize:
695 raise ValueError("unicodestring8 byte count > sys.maxsize: %d" % n)
696 data = f.read(n)
697 if len(data) == n:
698 return str(data, 'utf-8', 'surrogatepass')
699 raise ValueError("expected %d bytes in a unicodestring8, but only %d "
700 "remain" % (n, len(data)))
701
702unicodestring8 = ArgumentDescriptor(
703 name="unicodestring8",
704 n=TAKEN_FROM_ARGUMENT8U,
705 reader=read_unicodestring8,
706 doc="""A counted Unicode string.
707
Martin Panter4c359642016-05-08 13:53:41 +0000708 The first argument is an 8-byte little-endian signed int
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100709 giving the number of bytes in the string, and the second
710 argument-- the UTF-8 encoding of the Unicode string --
711 contains that many bytes.
712 """)
713
714
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000715def read_decimalnl_short(f):
Tim Peters55762f52003-01-28 16:01:25 +0000716 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000717 >>> import io
718 >>> read_decimalnl_short(io.BytesIO(b"1234\n56"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000719 1234
720
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000721 >>> read_decimalnl_short(io.BytesIO(b"1234L\n56"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000722 Traceback (most recent call last):
723 ...
Serhiy Storchaka95949422013-08-27 19:40:23 +0300724 ValueError: invalid literal for int() with base 10: b'1234L'
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000725 """
726
727 s = read_stringnl(f, decode=False, stripquotes=False)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000728
Serhiy Storchaka95949422013-08-27 19:40:23 +0300729 # There's a hack for True and False here.
Jeremy Hyltona5dc3db2007-08-29 19:07:40 +0000730 if s == b"00":
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000731 return False
Jeremy Hyltona5dc3db2007-08-29 19:07:40 +0000732 elif s == b"01":
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000733 return True
734
Florent Xicluna2bb96f52011-10-23 22:11:00 +0200735 return int(s)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000736
737def read_decimalnl_long(f):
Tim Peters55762f52003-01-28 16:01:25 +0000738 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000739 >>> import io
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000740
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000741 >>> read_decimalnl_long(io.BytesIO(b"1234L\n56"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000742 1234
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000743
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000744 >>> read_decimalnl_long(io.BytesIO(b"123456789012345678901234L\n6"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000745 123456789012345678901234
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000746 """
747
748 s = read_stringnl(f, decode=False, stripquotes=False)
Mark Dickinson8dd05142009-01-20 20:43:58 +0000749 if s[-1:] == b'L':
750 s = s[:-1]
Guido van Rossume2a383d2007-01-15 16:59:06 +0000751 return int(s)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000752
753
754decimalnl_short = ArgumentDescriptor(
755 name='decimalnl_short',
756 n=UP_TO_NEWLINE,
757 reader=read_decimalnl_short,
758 doc="""A newline-terminated decimal integer literal.
759
760 This never has a trailing 'L', and the integer fit
761 in a short Python int on the box where the pickle
762 was written -- but there's no guarantee it will fit
763 in a short Python int on the box where the pickle
764 is read.
765 """)
766
767decimalnl_long = ArgumentDescriptor(
768 name='decimalnl_long',
769 n=UP_TO_NEWLINE,
770 reader=read_decimalnl_long,
771 doc="""A newline-terminated decimal integer literal.
772
773 This has a trailing 'L', and can represent integers
774 of any size.
775 """)
776
777
778def read_floatnl(f):
Tim Peters55762f52003-01-28 16:01:25 +0000779 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000780 >>> import io
781 >>> read_floatnl(io.BytesIO(b"-1.25\n6"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000782 -1.25
783 """
784 s = read_stringnl(f, decode=False, stripquotes=False)
785 return float(s)
786
787floatnl = ArgumentDescriptor(
788 name='floatnl',
789 n=UP_TO_NEWLINE,
790 reader=read_floatnl,
791 doc="""A newline-terminated decimal floating literal.
792
793 In general this requires 17 significant digits for roundtrip
794 identity, and pickling then unpickling infinities, NaNs, and
795 minus zero doesn't work across boxes, or on some boxes even
796 on itself (e.g., Windows can't read the strings it produces
797 for infinities or NaNs).
798 """)
799
800def read_float8(f):
Tim Peters55762f52003-01-28 16:01:25 +0000801 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000802 >>> import io, struct
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000803 >>> raw = struct.pack(">d", -1.25)
804 >>> raw
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000805 b'\xbf\xf4\x00\x00\x00\x00\x00\x00'
806 >>> read_float8(io.BytesIO(raw + b"\n"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000807 -1.25
808 """
809
810 data = f.read(8)
811 if len(data) == 8:
812 return _unpack(">d", data)[0]
813 raise ValueError("not enough data in stream to read float8")
814
815
816float8 = ArgumentDescriptor(
817 name='float8',
818 n=8,
819 reader=read_float8,
820 doc="""An 8-byte binary representation of a float, big-endian.
821
822 The format is unique to Python, and shared with the struct
Guido van Rossum99603b02007-07-20 00:22:32 +0000823 module (format string '>d') "in theory" (the struct and pickle
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000824 implementations don't share the code -- they should). It's
825 strongly related to the IEEE-754 double format, and, in normal
826 cases, is in fact identical to the big-endian 754 double format.
827 On other boxes the dynamic range is limited to that of a 754
828 double, and "add a half and chop" rounding is used to reduce
829 the precision to 53 bits. However, even on a 754 box,
830 infinities, NaNs, and minus zero may not be handled correctly
831 (may not survive roundtrip pickling intact).
832 """)
833
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000834# Protocol 2 formats
835
Tim Petersc0c12b52003-01-29 00:56:17 +0000836from pickle import decode_long
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000837
838def read_long1(f):
839 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000840 >>> import io
841 >>> read_long1(io.BytesIO(b"\x00"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000842 0
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000843 >>> read_long1(io.BytesIO(b"\x02\xff\x00"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000844 255
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000845 >>> read_long1(io.BytesIO(b"\x02\xff\x7f"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000846 32767
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000847 >>> read_long1(io.BytesIO(b"\x02\x00\xff"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000848 -256
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000849 >>> read_long1(io.BytesIO(b"\x02\x00\x80"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000850 -32768
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000851 """
852
853 n = read_uint1(f)
854 data = f.read(n)
855 if len(data) != n:
856 raise ValueError("not enough data in stream to read long1")
857 return decode_long(data)
858
859long1 = ArgumentDescriptor(
860 name="long1",
Tim Petersfdb8cfa2003-01-28 00:13:19 +0000861 n=TAKEN_FROM_ARGUMENT1,
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000862 reader=read_long1,
863 doc="""A binary long, little-endian, using 1-byte size.
864
865 This first reads one byte as an unsigned size, then reads that
Tim Petersbdbe7412003-01-27 23:54:04 +0000866 many bytes and interprets them as a little-endian 2's-complement long.
Tim Peters4b23f2b2003-01-31 16:43:39 +0000867 If the size is 0, that's taken as a shortcut for the long 0L.
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000868 """)
869
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000870def read_long4(f):
871 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000872 >>> import io
873 >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\xff\x00"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000874 255
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000875 >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\xff\x7f"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000876 32767
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000877 >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\x00\xff"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000878 -256
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000879 >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\x00\x80"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000880 -32768
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000881 >>> read_long1(io.BytesIO(b"\x00\x00\x00\x00"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000882 0
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000883 """
884
885 n = read_int4(f)
886 if n < 0:
Neal Norwitz784a3f52003-01-28 00:20:41 +0000887 raise ValueError("long4 byte count < 0: %d" % n)
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000888 data = f.read(n)
889 if len(data) != n:
Neal Norwitz784a3f52003-01-28 00:20:41 +0000890 raise ValueError("not enough data in stream to read long4")
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000891 return decode_long(data)
892
893long4 = ArgumentDescriptor(
894 name="long4",
Tim Petersfdb8cfa2003-01-28 00:13:19 +0000895 n=TAKEN_FROM_ARGUMENT4,
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000896 reader=read_long4,
897 doc="""A binary representation of a long, little-endian.
898
899 This first reads four bytes as a signed size (but requires the
900 size to be >= 0), then reads that many bytes and interprets them
Tim Peters4b23f2b2003-01-31 16:43:39 +0000901 as a little-endian 2's-complement long. If the size is 0, that's taken
Guido van Rossume2a383d2007-01-15 16:59:06 +0000902 as a shortcut for the int 0, although LONG1 should really be used
Tim Peters4b23f2b2003-01-31 16:43:39 +0000903 then instead (and in any case where # of bytes < 256).
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000904 """)
905
906
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000907##############################################################################
908# Object descriptors. The stack used by the pickle machine holds objects,
909# and in the stack_before and stack_after attributes of OpcodeInfo
910# descriptors we need names to describe the various types of objects that can
911# appear on the stack.
912
913class StackObject(object):
914 __slots__ = (
915 # name of descriptor record, for info only
916 'name',
917
918 # type of object, or tuple of type objects (meaning the object can
919 # be of any type in the tuple)
920 'obtype',
921
922 # human-readable docs for this kind of stack object; a string
923 'doc',
924 )
925
926 def __init__(self, name, obtype, doc):
Guido van Rossum3172c5d2007-10-16 18:12:55 +0000927 assert isinstance(name, str)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000928 self.name = name
929
930 assert isinstance(obtype, type) or isinstance(obtype, tuple)
931 if isinstance(obtype, tuple):
932 for contained in obtype:
933 assert isinstance(contained, type)
934 self.obtype = obtype
935
Guido van Rossum3172c5d2007-10-16 18:12:55 +0000936 assert isinstance(doc, str)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000937 self.doc = doc
938
Tim Petersc1c2b3e2003-01-29 20:12:21 +0000939 def __repr__(self):
940 return self.name
941
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000942
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800943pyint = pylong = StackObject(
944 name='int',
945 obtype=int,
946 doc="A Python integer object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000947
948pyinteger_or_bool = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800949 name='int_or_bool',
950 obtype=(int, bool),
951 doc="A Python integer or boolean object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000952
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000953pybool = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800954 name='bool',
955 obtype=bool,
956 doc="A Python boolean object.")
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000957
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000958pyfloat = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800959 name='float',
960 obtype=float,
961 doc="A Python float object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000962
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800963pybytes_or_str = pystring = StackObject(
964 name='bytes_or_str',
965 obtype=(bytes, str),
966 doc="A Python bytes or (Unicode) string object.")
Guido van Rossumf4169812008-03-17 22:56:06 +0000967
968pybytes = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800969 name='bytes',
970 obtype=bytes,
971 doc="A Python bytes object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000972
973pyunicode = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800974 name='str',
975 obtype=str,
976 doc="A Python (Unicode) string object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000977
978pynone = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800979 name="None",
980 obtype=type(None),
981 doc="The Python None object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000982
983pytuple = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800984 name="tuple",
985 obtype=tuple,
986 doc="A Python tuple object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000987
988pylist = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800989 name="list",
990 obtype=list,
991 doc="A Python list object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000992
993pydict = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800994 name="dict",
995 obtype=dict,
996 doc="A Python dict object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000997
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100998pyset = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800999 name="set",
1000 obtype=set,
1001 doc="A Python set object.")
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001002
1003pyfrozenset = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001004 name="frozenset",
1005 obtype=set,
1006 doc="A Python frozenset object.")
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001007
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001008anyobject = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001009 name='any',
1010 obtype=object,
1011 doc="Any kind of object whatsoever.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001012
1013markobject = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001014 name="mark",
1015 obtype=StackObject,
1016 doc="""'The mark' is a unique object.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001017
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001018Opcodes that operate on a variable number of objects
1019generally don't embed the count of objects in the opcode,
1020or pull it off the stack. Instead the MARK opcode is used
1021to push a special marker object on the stack, and then
1022some other opcodes grab all the objects from the top of
1023the stack down to (but not including) the topmost marker
1024object.
1025""")
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001026
1027stackslice = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001028 name="stackslice",
1029 obtype=StackObject,
1030 doc="""An object representing a contiguous slice of the stack.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001031
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001032This is used in conjunction with markobject, to represent all
1033of the stack following the topmost markobject. For example,
1034the POP_MARK opcode changes the stack from
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001035
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001036 [..., markobject, stackslice]
1037to
1038 [...]
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001039
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001040No matter how many object are on the stack after the topmost
1041markobject, POP_MARK gets rid of all of them (including the
1042topmost markobject too).
1043""")
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001044
1045##############################################################################
1046# Descriptors for pickle opcodes.
1047
1048class OpcodeInfo(object):
1049
1050 __slots__ = (
1051 # symbolic name of opcode; a string
1052 'name',
1053
1054 # the code used in a bytestream to represent the opcode; a
1055 # one-character string
1056 'code',
1057
1058 # If the opcode has an argument embedded in the byte string, an
1059 # instance of ArgumentDescriptor specifying its type. Note that
1060 # arg.reader(s) can be used to read and decode the argument from
1061 # the bytestream s, and arg.doc documents the format of the raw
1062 # argument bytes. If the opcode doesn't have an argument embedded
1063 # in the bytestream, arg should be None.
1064 'arg',
1065
1066 # what the stack looks like before this opcode runs; a list
1067 'stack_before',
1068
1069 # what the stack looks like after this opcode runs; a list
1070 'stack_after',
1071
1072 # the protocol number in which this opcode was introduced; an int
1073 'proto',
1074
1075 # human-readable docs for this opcode; a string
1076 'doc',
1077 )
1078
1079 def __init__(self, name, code, arg,
1080 stack_before, stack_after, proto, doc):
Guido van Rossum3172c5d2007-10-16 18:12:55 +00001081 assert isinstance(name, str)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001082 self.name = name
1083
Guido van Rossum3172c5d2007-10-16 18:12:55 +00001084 assert isinstance(code, str)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001085 assert len(code) == 1
1086 self.code = code
1087
1088 assert arg is None or isinstance(arg, ArgumentDescriptor)
1089 self.arg = arg
1090
1091 assert isinstance(stack_before, list)
1092 for x in stack_before:
1093 assert isinstance(x, StackObject)
1094 self.stack_before = stack_before
1095
1096 assert isinstance(stack_after, list)
1097 for x in stack_after:
1098 assert isinstance(x, StackObject)
1099 self.stack_after = stack_after
1100
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001101 assert isinstance(proto, int) and 0 <= proto <= pickle.HIGHEST_PROTOCOL
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001102 self.proto = proto
1103
Guido van Rossum3172c5d2007-10-16 18:12:55 +00001104 assert isinstance(doc, str)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001105 self.doc = doc
1106
1107I = OpcodeInfo
1108opcodes = [
1109
1110 # Ways to spell integers.
1111
1112 I(name='INT',
1113 code='I',
1114 arg=decimalnl_short,
1115 stack_before=[],
1116 stack_after=[pyinteger_or_bool],
1117 proto=0,
1118 doc="""Push an integer or bool.
1119
1120 The argument is a newline-terminated decimal literal string.
1121
1122 The intent may have been that this always fit in a short Python int,
1123 but INT can be generated in pickles written on a 64-bit box that
1124 require a Python long on a 32-bit box. The difference between this
1125 and LONG then is that INT skips a trailing 'L', and produces a short
1126 int whenever possible.
1127
1128 Another difference is due to that, when bool was introduced as a
1129 distinct type in 2.3, builtin names True and False were also added to
1130 2.2.2, mapping to ints 1 and 0. For compatibility in both directions,
1131 True gets pickled as INT + "I01\\n", and False as INT + "I00\\n".
1132 Leading zeroes are never produced for a genuine integer. The 2.3
1133 (and later) unpicklers special-case these and return bool instead;
1134 earlier unpicklers ignore the leading "0" and return the int.
1135 """),
1136
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001137 I(name='BININT',
1138 code='J',
1139 arg=int4,
1140 stack_before=[],
1141 stack_after=[pyint],
1142 proto=1,
1143 doc="""Push a four-byte signed integer.
1144
1145 This handles the full range of Python (short) integers on a 32-bit
1146 box, directly as binary bytes (1 for the opcode and 4 for the integer).
1147 If the integer is non-negative and fits in 1 or 2 bytes, pickling via
1148 BININT1 or BININT2 saves space.
1149 """),
1150
1151 I(name='BININT1',
1152 code='K',
1153 arg=uint1,
1154 stack_before=[],
1155 stack_after=[pyint],
1156 proto=1,
1157 doc="""Push a one-byte unsigned integer.
1158
1159 This is a space optimization for pickling very small non-negative ints,
1160 in range(256).
1161 """),
1162
1163 I(name='BININT2',
1164 code='M',
1165 arg=uint2,
1166 stack_before=[],
1167 stack_after=[pyint],
1168 proto=1,
1169 doc="""Push a two-byte unsigned integer.
1170
1171 This is a space optimization for pickling small positive ints, in
1172 range(256, 2**16). Integers in range(256) can also be pickled via
1173 BININT2, but BININT1 instead saves a byte.
1174 """),
1175
Tim Petersfdc03462003-01-28 04:56:33 +00001176 I(name='LONG',
1177 code='L',
1178 arg=decimalnl_long,
1179 stack_before=[],
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001180 stack_after=[pyint],
Tim Petersfdc03462003-01-28 04:56:33 +00001181 proto=0,
1182 doc="""Push a long integer.
1183
1184 The same as INT, except that the literal ends with 'L', and always
1185 unpickles to a Python long. There doesn't seem a real purpose to the
1186 trailing 'L'.
1187
1188 Note that LONG takes time quadratic in the number of digits when
1189 unpickling (this is simply due to the nature of decimal->binary
1190 conversion). Proto 2 added linear-time (in C; still quadratic-time
1191 in Python) LONG1 and LONG4 opcodes.
1192 """),
1193
1194 I(name="LONG1",
1195 code='\x8a',
1196 arg=long1,
1197 stack_before=[],
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001198 stack_after=[pyint],
Tim Petersfdc03462003-01-28 04:56:33 +00001199 proto=2,
1200 doc="""Long integer using one-byte length.
1201
1202 A more efficient encoding of a Python long; the long1 encoding
1203 says it all."""),
1204
1205 I(name="LONG4",
1206 code='\x8b',
1207 arg=long4,
1208 stack_before=[],
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001209 stack_after=[pyint],
Tim Petersfdc03462003-01-28 04:56:33 +00001210 proto=2,
1211 doc="""Long integer using found-byte length.
1212
1213 A more efficient encoding of a Python long; the long4 encoding
1214 says it all."""),
1215
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001216 # Ways to spell strings (8-bit, not Unicode).
1217
1218 I(name='STRING',
1219 code='S',
1220 arg=stringnl,
1221 stack_before=[],
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001222 stack_after=[pybytes_or_str],
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001223 proto=0,
1224 doc="""Push a Python string object.
1225
1226 The argument is a repr-style string, with bracketing quote characters,
1227 and perhaps embedded escapes. The argument extends until the next
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001228 newline character. These are usually decoded into a str instance
Guido van Rossumf4169812008-03-17 22:56:06 +00001229 using the encoding given to the Unpickler constructor. or the default,
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001230 'ASCII'. If the encoding given was 'bytes' however, they will be
1231 decoded as bytes object instead.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001232 """),
1233
1234 I(name='BINSTRING',
1235 code='T',
1236 arg=string4,
1237 stack_before=[],
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001238 stack_after=[pybytes_or_str],
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001239 proto=1,
1240 doc="""Push a Python string object.
1241
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001242 There are two arguments: the first is a 4-byte little-endian
1243 signed int giving the number of bytes in the string, and the
1244 second is that many bytes, which are taken literally as the string
1245 content. These are usually decoded into a str instance using the
1246 encoding given to the Unpickler constructor. or the default,
1247 'ASCII'. If the encoding given was 'bytes' however, they will be
1248 decoded as bytes object instead.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001249 """),
1250
1251 I(name='SHORT_BINSTRING',
1252 code='U',
1253 arg=string1,
1254 stack_before=[],
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001255 stack_after=[pybytes_or_str],
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001256 proto=1,
1257 doc="""Push a Python string object.
1258
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001259 There are two arguments: the first is a 1-byte unsigned int giving
1260 the number of bytes in the string, and the second is that many
1261 bytes, which are taken literally as the string content. These are
1262 usually decoded into a str instance using the encoding given to
1263 the Unpickler constructor. or the default, 'ASCII'. If the
1264 encoding given was 'bytes' however, they will be decoded as bytes
1265 object instead.
Guido van Rossumf4169812008-03-17 22:56:06 +00001266 """),
1267
1268 # Bytes (protocol 3 only; older protocols don't support bytes at all)
1269
1270 I(name='BINBYTES',
1271 code='B',
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001272 arg=bytes4,
Guido van Rossumf4169812008-03-17 22:56:06 +00001273 stack_before=[],
1274 stack_after=[pybytes],
1275 proto=3,
1276 doc="""Push a Python bytes object.
1277
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001278 There are two arguments: the first is a 4-byte little-endian unsigned int
1279 giving the number of bytes, and the second is that many bytes, which are
1280 taken literally as the bytes content.
Guido van Rossumf4169812008-03-17 22:56:06 +00001281 """),
1282
1283 I(name='SHORT_BINBYTES',
1284 code='C',
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001285 arg=bytes1,
Guido van Rossumf4169812008-03-17 22:56:06 +00001286 stack_before=[],
1287 stack_after=[pybytes],
Collin Wintere61d4372009-05-20 17:46:47 +00001288 proto=3,
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001289 doc="""Push a Python bytes object.
Guido van Rossumf4169812008-03-17 22:56:06 +00001290
1291 There are two arguments: the first is a 1-byte unsigned int giving
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001292 the number of bytes, and the second is that many bytes, which are taken
1293 literally as the string content.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001294 """),
1295
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001296 I(name='BINBYTES8',
1297 code='\x8e',
1298 arg=bytes8,
1299 stack_before=[],
1300 stack_after=[pybytes],
1301 proto=4,
1302 doc="""Push a Python bytes object.
1303
Martin Panter4c359642016-05-08 13:53:41 +00001304 There are two arguments: the first is an 8-byte unsigned int giving
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001305 the number of bytes in the string, and the second is that many bytes,
1306 which are taken literally as the string content.
1307 """),
1308
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001309 # Ways to spell None.
1310
1311 I(name='NONE',
1312 code='N',
1313 arg=None,
1314 stack_before=[],
1315 stack_after=[pynone],
1316 proto=0,
1317 doc="Push None on the stack."),
1318
Tim Petersfdc03462003-01-28 04:56:33 +00001319 # Ways to spell bools, starting with proto 2. See INT for how this was
1320 # done before proto 2.
1321
1322 I(name='NEWTRUE',
1323 code='\x88',
1324 arg=None,
1325 stack_before=[],
1326 stack_after=[pybool],
1327 proto=2,
Krzysztof Wroblewski488cfb72018-09-22 16:13:53 +01001328 doc="Push True onto the stack."),
Tim Petersfdc03462003-01-28 04:56:33 +00001329
1330 I(name='NEWFALSE',
1331 code='\x89',
1332 arg=None,
1333 stack_before=[],
1334 stack_after=[pybool],
1335 proto=2,
Krzysztof Wroblewski488cfb72018-09-22 16:13:53 +01001336 doc="Push False onto the stack."),
Tim Petersfdc03462003-01-28 04:56:33 +00001337
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001338 # Ways to spell Unicode strings.
1339
1340 I(name='UNICODE',
1341 code='V',
1342 arg=unicodestringnl,
1343 stack_before=[],
1344 stack_after=[pyunicode],
1345 proto=0, # this may be pure-text, but it's a later addition
1346 doc="""Push a Python Unicode string object.
1347
1348 The argument is a raw-unicode-escape encoding of a Unicode string,
1349 and so may contain embedded escape sequences. The argument extends
1350 until the next newline character.
1351 """),
1352
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001353 I(name='SHORT_BINUNICODE',
1354 code='\x8c',
1355 arg=unicodestring1,
1356 stack_before=[],
1357 stack_after=[pyunicode],
1358 proto=4,
1359 doc="""Push a Python Unicode string object.
1360
1361 There are two arguments: the first is a 1-byte little-endian signed int
1362 giving the number of bytes in the string. The second is that many
1363 bytes, and is the UTF-8 encoding of the Unicode string.
1364 """),
1365
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001366 I(name='BINUNICODE',
1367 code='X',
1368 arg=unicodestring4,
1369 stack_before=[],
1370 stack_after=[pyunicode],
1371 proto=1,
1372 doc="""Push a Python Unicode string object.
1373
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001374 There are two arguments: the first is a 4-byte little-endian unsigned int
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001375 giving the number of bytes in the string. The second is that many
1376 bytes, and is the UTF-8 encoding of the Unicode string.
1377 """),
1378
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001379 I(name='BINUNICODE8',
1380 code='\x8d',
1381 arg=unicodestring8,
1382 stack_before=[],
1383 stack_after=[pyunicode],
1384 proto=4,
1385 doc="""Push a Python Unicode string object.
1386
Martin Panter4c359642016-05-08 13:53:41 +00001387 There are two arguments: the first is an 8-byte little-endian signed int
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001388 giving the number of bytes in the string. The second is that many
1389 bytes, and is the UTF-8 encoding of the Unicode string.
1390 """),
1391
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001392 # Ways to spell floats.
1393
1394 I(name='FLOAT',
1395 code='F',
1396 arg=floatnl,
1397 stack_before=[],
1398 stack_after=[pyfloat],
1399 proto=0,
1400 doc="""Newline-terminated decimal float literal.
1401
1402 The argument is repr(a_float), and in general requires 17 significant
1403 digits for roundtrip conversion to be an identity (this is so for
1404 IEEE-754 double precision values, which is what Python float maps to
1405 on most boxes).
1406
1407 In general, FLOAT cannot be used to transport infinities, NaNs, or
1408 minus zero across boxes (or even on a single box, if the platform C
1409 library can't read the strings it produces for such things -- Windows
1410 is like that), but may do less damage than BINFLOAT on boxes with
1411 greater precision or dynamic range than IEEE-754 double.
1412 """),
1413
1414 I(name='BINFLOAT',
1415 code='G',
1416 arg=float8,
1417 stack_before=[],
1418 stack_after=[pyfloat],
1419 proto=1,
1420 doc="""Float stored in binary form, with 8 bytes of data.
1421
1422 This generally requires less than half the space of FLOAT encoding.
1423 In general, BINFLOAT cannot be used to transport infinities, NaNs, or
1424 minus zero, raises an exception if the exponent exceeds the range of
1425 an IEEE-754 double, and retains no more than 53 bits of precision (if
1426 there are more than that, "add a half and chop" rounding is used to
1427 cut it back to 53 significant bits).
1428 """),
1429
1430 # Ways to build lists.
1431
1432 I(name='EMPTY_LIST',
1433 code=']',
1434 arg=None,
1435 stack_before=[],
1436 stack_after=[pylist],
1437 proto=1,
1438 doc="Push an empty list."),
1439
1440 I(name='APPEND',
1441 code='a',
1442 arg=None,
1443 stack_before=[pylist, anyobject],
1444 stack_after=[pylist],
1445 proto=0,
1446 doc="""Append an object to a list.
1447
1448 Stack before: ... pylist anyobject
1449 Stack after: ... pylist+[anyobject]
Tim Peters81098ac2003-01-28 05:12:08 +00001450
1451 although pylist is really extended in-place.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001452 """),
1453
1454 I(name='APPENDS',
1455 code='e',
1456 arg=None,
1457 stack_before=[pylist, markobject, stackslice],
1458 stack_after=[pylist],
1459 proto=1,
1460 doc="""Extend a list by a slice of stack objects.
1461
1462 Stack before: ... pylist markobject stackslice
1463 Stack after: ... pylist+stackslice
Tim Peters81098ac2003-01-28 05:12:08 +00001464
1465 although pylist is really extended in-place.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001466 """),
1467
1468 I(name='LIST',
1469 code='l',
1470 arg=None,
1471 stack_before=[markobject, stackslice],
1472 stack_after=[pylist],
1473 proto=0,
1474 doc="""Build a list out of the topmost stack slice, after markobject.
1475
1476 All the stack entries following the topmost markobject are placed into
1477 a single Python list, which single list object replaces all of the
1478 stack from the topmost markobject onward. For example,
1479
1480 Stack before: ... markobject 1 2 3 'abc'
1481 Stack after: ... [1, 2, 3, 'abc']
1482 """),
1483
1484 # Ways to build tuples.
1485
1486 I(name='EMPTY_TUPLE',
1487 code=')',
1488 arg=None,
1489 stack_before=[],
1490 stack_after=[pytuple],
1491 proto=1,
1492 doc="Push an empty tuple."),
1493
1494 I(name='TUPLE',
1495 code='t',
1496 arg=None,
1497 stack_before=[markobject, stackslice],
1498 stack_after=[pytuple],
1499 proto=0,
1500 doc="""Build a tuple out of the topmost stack slice, after markobject.
1501
1502 All the stack entries following the topmost markobject are placed into
1503 a single Python tuple, which single tuple object replaces all of the
1504 stack from the topmost markobject onward. For example,
1505
1506 Stack before: ... markobject 1 2 3 'abc'
1507 Stack after: ... (1, 2, 3, 'abc')
1508 """),
1509
Tim Petersfdc03462003-01-28 04:56:33 +00001510 I(name='TUPLE1',
1511 code='\x85',
1512 arg=None,
1513 stack_before=[anyobject],
1514 stack_after=[pytuple],
1515 proto=2,
Alexander Belopolsky44c2ffd2010-07-16 14:39:45 +00001516 doc="""Build a one-tuple out of the topmost item on the stack.
Tim Petersfdc03462003-01-28 04:56:33 +00001517
1518 This code pops one value off the stack and pushes a tuple of
Alexander Belopolsky44c2ffd2010-07-16 14:39:45 +00001519 length 1 whose one item is that value back onto it. In other
1520 words:
Tim Petersfdc03462003-01-28 04:56:33 +00001521
1522 stack[-1] = tuple(stack[-1:])
1523 """),
1524
1525 I(name='TUPLE2',
1526 code='\x86',
1527 arg=None,
1528 stack_before=[anyobject, anyobject],
1529 stack_after=[pytuple],
1530 proto=2,
Alexander Belopolsky44c2ffd2010-07-16 14:39:45 +00001531 doc="""Build a two-tuple out of the top two items on the stack.
Tim Petersfdc03462003-01-28 04:56:33 +00001532
Alexander Belopolsky44c2ffd2010-07-16 14:39:45 +00001533 This code pops two values off the stack and pushes a tuple of
1534 length 2 whose items are those values back onto it. In other
1535 words:
Tim Petersfdc03462003-01-28 04:56:33 +00001536
1537 stack[-2:] = [tuple(stack[-2:])]
1538 """),
1539
1540 I(name='TUPLE3',
1541 code='\x87',
1542 arg=None,
1543 stack_before=[anyobject, anyobject, anyobject],
1544 stack_after=[pytuple],
1545 proto=2,
Alexander Belopolsky44c2ffd2010-07-16 14:39:45 +00001546 doc="""Build a three-tuple out of the top three items on the stack.
Tim Petersfdc03462003-01-28 04:56:33 +00001547
Alexander Belopolsky44c2ffd2010-07-16 14:39:45 +00001548 This code pops three values off the stack and pushes a tuple of
1549 length 3 whose items are those values back onto it. In other
1550 words:
Tim Petersfdc03462003-01-28 04:56:33 +00001551
1552 stack[-3:] = [tuple(stack[-3:])]
1553 """),
1554
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001555 # Ways to build dicts.
1556
1557 I(name='EMPTY_DICT',
1558 code='}',
1559 arg=None,
1560 stack_before=[],
1561 stack_after=[pydict],
1562 proto=1,
1563 doc="Push an empty dict."),
1564
1565 I(name='DICT',
1566 code='d',
1567 arg=None,
1568 stack_before=[markobject, stackslice],
1569 stack_after=[pydict],
1570 proto=0,
1571 doc="""Build a dict out of the topmost stack slice, after markobject.
1572
1573 All the stack entries following the topmost markobject are placed into
1574 a single Python dict, which single dict object replaces all of the
1575 stack from the topmost markobject onward. The stack slice alternates
1576 key, value, key, value, .... For example,
1577
1578 Stack before: ... markobject 1 2 3 'abc'
1579 Stack after: ... {1: 2, 3: 'abc'}
1580 """),
1581
1582 I(name='SETITEM',
1583 code='s',
1584 arg=None,
1585 stack_before=[pydict, anyobject, anyobject],
1586 stack_after=[pydict],
1587 proto=0,
1588 doc="""Add a key+value pair to an existing dict.
1589
1590 Stack before: ... pydict key value
1591 Stack after: ... pydict
1592
1593 where pydict has been modified via pydict[key] = value.
1594 """),
1595
1596 I(name='SETITEMS',
1597 code='u',
1598 arg=None,
1599 stack_before=[pydict, markobject, stackslice],
1600 stack_after=[pydict],
1601 proto=1,
1602 doc="""Add an arbitrary number of key+value pairs to an existing dict.
1603
1604 The slice of the stack following the topmost markobject is taken as
1605 an alternating sequence of keys and values, added to the dict
1606 immediately under the topmost markobject. Everything at and after the
1607 topmost markobject is popped, leaving the mutated dict at the top
1608 of the stack.
1609
1610 Stack before: ... pydict markobject key_1 value_1 ... key_n value_n
1611 Stack after: ... pydict
1612
1613 where pydict has been modified via pydict[key_i] = value_i for i in
1614 1, 2, ..., n, and in that order.
1615 """),
1616
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001617 # Ways to build sets
1618
1619 I(name='EMPTY_SET',
1620 code='\x8f',
1621 arg=None,
1622 stack_before=[],
1623 stack_after=[pyset],
1624 proto=4,
1625 doc="Push an empty set."),
1626
1627 I(name='ADDITEMS',
1628 code='\x90',
1629 arg=None,
1630 stack_before=[pyset, markobject, stackslice],
1631 stack_after=[pyset],
1632 proto=4,
1633 doc="""Add an arbitrary number of items to an existing set.
1634
1635 The slice of the stack following the topmost markobject is taken as
1636 a sequence of items, added to the set immediately under the topmost
1637 markobject. Everything at and after the topmost markobject is popped,
1638 leaving the mutated set at the top of the stack.
1639
1640 Stack before: ... pyset markobject item_1 ... item_n
1641 Stack after: ... pyset
1642
1643 where pyset has been modified via pyset.add(item_i) = item_i for i in
1644 1, 2, ..., n, and in that order.
1645 """),
1646
1647 # Way to build frozensets
1648
1649 I(name='FROZENSET',
1650 code='\x91',
1651 arg=None,
1652 stack_before=[markobject, stackslice],
1653 stack_after=[pyfrozenset],
1654 proto=4,
1655 doc="""Build a frozenset out of the topmost slice, after markobject.
1656
1657 All the stack entries following the topmost markobject are placed into
1658 a single Python frozenset, which single frozenset object replaces all
1659 of the stack from the topmost markobject onward. For example,
1660
1661 Stack before: ... markobject 1 2 3
1662 Stack after: ... frozenset({1, 2, 3})
1663 """),
1664
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001665 # Stack manipulation.
1666
1667 I(name='POP',
1668 code='0',
1669 arg=None,
1670 stack_before=[anyobject],
1671 stack_after=[],
1672 proto=0,
1673 doc="Discard the top stack item, shrinking the stack by one item."),
1674
1675 I(name='DUP',
1676 code='2',
1677 arg=None,
1678 stack_before=[anyobject],
1679 stack_after=[anyobject, anyobject],
1680 proto=0,
1681 doc="Push the top stack item onto the stack again, duplicating it."),
1682
1683 I(name='MARK',
1684 code='(',
1685 arg=None,
1686 stack_before=[],
1687 stack_after=[markobject],
1688 proto=0,
1689 doc="""Push markobject onto the stack.
1690
1691 markobject is a unique object, used by other opcodes to identify a
1692 region of the stack containing a variable number of objects for them
1693 to work on. See markobject.doc for more detail.
1694 """),
1695
1696 I(name='POP_MARK',
1697 code='1',
1698 arg=None,
1699 stack_before=[markobject, stackslice],
1700 stack_after=[],
Collin Wintere61d4372009-05-20 17:46:47 +00001701 proto=1,
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001702 doc="""Pop all the stack objects at and above the topmost markobject.
1703
1704 When an opcode using a variable number of stack objects is done,
1705 POP_MARK is used to remove those objects, and to remove the markobject
1706 that delimited their starting position on the stack.
1707 """),
1708
1709 # Memo manipulation. There are really only two operations (get and put),
1710 # each in all-text, "short binary", and "long binary" flavors.
1711
1712 I(name='GET',
1713 code='g',
1714 arg=decimalnl_short,
1715 stack_before=[],
1716 stack_after=[anyobject],
1717 proto=0,
1718 doc="""Read an object from the memo and push it on the stack.
1719
Ezio Melotti13925002011-03-16 11:05:33 +02001720 The index of the memo object to push is given by the newline-terminated
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001721 decimal string following. BINGET and LONG_BINGET are space-optimized
1722 versions.
1723 """),
1724
1725 I(name='BINGET',
1726 code='h',
1727 arg=uint1,
1728 stack_before=[],
1729 stack_after=[anyobject],
1730 proto=1,
1731 doc="""Read an object from the memo and push it on the stack.
1732
1733 The index of the memo object to push is given by the 1-byte unsigned
1734 integer following.
1735 """),
1736
1737 I(name='LONG_BINGET',
1738 code='j',
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001739 arg=uint4,
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001740 stack_before=[],
1741 stack_after=[anyobject],
1742 proto=1,
1743 doc="""Read an object from the memo and push it on the stack.
1744
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001745 The index of the memo object to push is given by the 4-byte unsigned
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001746 little-endian integer following.
1747 """),
1748
1749 I(name='PUT',
1750 code='p',
1751 arg=decimalnl_short,
1752 stack_before=[],
1753 stack_after=[],
1754 proto=0,
1755 doc="""Store the stack top into the memo. The stack is not popped.
1756
1757 The index of the memo location to write into is given by the newline-
1758 terminated decimal string following. BINPUT and LONG_BINPUT are
1759 space-optimized versions.
1760 """),
1761
1762 I(name='BINPUT',
1763 code='q',
1764 arg=uint1,
1765 stack_before=[],
1766 stack_after=[],
1767 proto=1,
1768 doc="""Store the stack top into the memo. The stack is not popped.
1769
1770 The index of the memo location to write into is given by the 1-byte
1771 unsigned integer following.
1772 """),
1773
1774 I(name='LONG_BINPUT',
1775 code='r',
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001776 arg=uint4,
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001777 stack_before=[],
1778 stack_after=[],
1779 proto=1,
1780 doc="""Store the stack top into the memo. The stack is not popped.
1781
1782 The index of the memo location to write into is given by the 4-byte
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001783 unsigned little-endian integer following.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001784 """),
1785
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001786 I(name='MEMOIZE',
1787 code='\x94',
1788 arg=None,
1789 stack_before=[anyobject],
1790 stack_after=[anyobject],
1791 proto=4,
1792 doc="""Store the stack top into the memo. The stack is not popped.
1793
1794 The index of the memo location to write is the number of
1795 elements currently present in the memo.
1796 """),
1797
Tim Petersfdc03462003-01-28 04:56:33 +00001798 # Access the extension registry (predefined objects). Akin to the GET
1799 # family.
1800
1801 I(name='EXT1',
1802 code='\x82',
1803 arg=uint1,
1804 stack_before=[],
1805 stack_after=[anyobject],
1806 proto=2,
1807 doc="""Extension code.
1808
1809 This code and the similar EXT2 and EXT4 allow using a registry
1810 of popular objects that are pickled by name, typically classes.
1811 It is envisioned that through a global negotiation and
1812 registration process, third parties can set up a mapping between
1813 ints and object names.
1814
1815 In order to guarantee pickle interchangeability, the extension
1816 code registry ought to be global, although a range of codes may
1817 be reserved for private use.
1818
1819 EXT1 has a 1-byte integer argument. This is used to index into the
1820 extension registry, and the object at that index is pushed on the stack.
1821 """),
1822
1823 I(name='EXT2',
1824 code='\x83',
1825 arg=uint2,
1826 stack_before=[],
1827 stack_after=[anyobject],
1828 proto=2,
1829 doc="""Extension code.
1830
1831 See EXT1. EXT2 has a two-byte integer argument.
1832 """),
1833
1834 I(name='EXT4',
1835 code='\x84',
1836 arg=int4,
1837 stack_before=[],
1838 stack_after=[anyobject],
1839 proto=2,
1840 doc="""Extension code.
1841
1842 See EXT1. EXT4 has a four-byte integer argument.
1843 """),
1844
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001845 # Push a class object, or module function, on the stack, via its module
1846 # and name.
1847
1848 I(name='GLOBAL',
1849 code='c',
1850 arg=stringnl_noescape_pair,
1851 stack_before=[],
1852 stack_after=[anyobject],
1853 proto=0,
1854 doc="""Push a global object (module.attr) on the stack.
1855
1856 Two newline-terminated strings follow the GLOBAL opcode. The first is
1857 taken as a module name, and the second as a class name. The class
1858 object module.class is pushed on the stack. More accurately, the
1859 object returned by self.find_class(module, class) is pushed on the
1860 stack, so unpickling subclasses can override this form of lookup.
1861 """),
1862
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001863 I(name='STACK_GLOBAL',
1864 code='\x93',
1865 arg=None,
1866 stack_before=[pyunicode, pyunicode],
1867 stack_after=[anyobject],
Serhiy Storchaka5805dde2015-10-13 21:12:32 +03001868 proto=4,
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001869 doc="""Push a global object (module.attr) on the stack.
1870 """),
1871
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001872 # Ways to build objects of classes pickle doesn't know about directly
1873 # (user-defined classes). I despair of documenting this accurately
1874 # and comprehensibly -- you really have to read the pickle code to
1875 # find all the special cases.
1876
1877 I(name='REDUCE',
1878 code='R',
1879 arg=None,
1880 stack_before=[anyobject, anyobject],
1881 stack_after=[anyobject],
1882 proto=0,
1883 doc="""Push an object built from a callable and an argument tuple.
1884
1885 The opcode is named to remind of the __reduce__() method.
1886
1887 Stack before: ... callable pytuple
1888 Stack after: ... callable(*pytuple)
1889
1890 The callable and the argument tuple are the first two items returned
1891 by a __reduce__ method. Applying the callable to the argtuple is
1892 supposed to reproduce the original object, or at least get it started.
1893 If the __reduce__ method returns a 3-tuple, the last component is an
1894 argument to be passed to the object's __setstate__, and then the REDUCE
1895 opcode is followed by code to create setstate's argument, and then a
1896 BUILD opcode to apply __setstate__ to that argument.
1897
Guido van Rossum13257902007-06-07 23:15:56 +00001898 If not isinstance(callable, type), REDUCE complains unless the
Alexandre Vassalottif7fa63d2008-05-11 08:55:36 +00001899 callable has been registered with the copyreg module's
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001900 safe_constructors dict, or the callable has a magic
1901 '__safe_for_unpickling__' attribute with a true value. I'm not sure
1902 why it does this, but I've sure seen this complaint often enough when
1903 I didn't want to <wink>.
1904 """),
1905
1906 I(name='BUILD',
1907 code='b',
1908 arg=None,
1909 stack_before=[anyobject, anyobject],
1910 stack_after=[anyobject],
1911 proto=0,
1912 doc="""Finish building an object, via __setstate__ or dict update.
1913
1914 Stack before: ... anyobject argument
1915 Stack after: ... anyobject
1916
1917 where anyobject may have been mutated, as follows:
1918
1919 If the object has a __setstate__ method,
1920
1921 anyobject.__setstate__(argument)
1922
1923 is called.
1924
1925 Else the argument must be a dict, the object must have a __dict__, and
1926 the object is updated via
1927
1928 anyobject.__dict__.update(argument)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001929 """),
1930
1931 I(name='INST',
1932 code='i',
1933 arg=stringnl_noescape_pair,
1934 stack_before=[markobject, stackslice],
1935 stack_after=[anyobject],
1936 proto=0,
1937 doc="""Build a class instance.
1938
1939 This is the protocol 0 version of protocol 1's OBJ opcode.
1940 INST is followed by two newline-terminated strings, giving a
1941 module and class name, just as for the GLOBAL opcode (and see
1942 GLOBAL for more details about that). self.find_class(module, name)
1943 is used to get a class object.
1944
1945 In addition, all the objects on the stack following the topmost
1946 markobject are gathered into a tuple and popped (along with the
1947 topmost markobject), just as for the TUPLE opcode.
1948
1949 Now it gets complicated. If all of these are true:
1950
1951 + The argtuple is empty (markobject was at the top of the stack
1952 at the start).
1953
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001954 + The class object does not have a __getinitargs__ attribute.
1955
1956 then we want to create an old-style class instance without invoking
1957 its __init__() method (pickle has waffled on this over the years; not
1958 calling __init__() is current wisdom). In this case, an instance of
1959 an old-style dummy class is created, and then we try to rebind its
1960 __class__ attribute to the desired class object. If this succeeds,
Guido van Rossuma8add0e2007-05-14 22:03:55 +00001961 the new instance object is pushed on the stack, and we're done.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001962
1963 Else (the argtuple is not empty, it's not an old-style class object,
1964 or the class object does have a __getinitargs__ attribute), the code
1965 first insists that the class object have a __safe_for_unpickling__
1966 attribute. Unlike as for the __safe_for_unpickling__ check in REDUCE,
1967 it doesn't matter whether this attribute has a true or false value, it
Guido van Rossum99603b02007-07-20 00:22:32 +00001968 only matters whether it exists (XXX this is a bug). If
1969 __safe_for_unpickling__ doesn't exist, UnpicklingError is raised.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001970
1971 Else (the class object does have a __safe_for_unpickling__ attr),
1972 the class object obtained from INST's arguments is applied to the
1973 argtuple obtained from the stack, and the resulting instance object
1974 is pushed on the stack.
Tim Peters2b93c4c2003-01-30 16:35:08 +00001975
1976 NOTE: checks for __safe_for_unpickling__ went away in Python 2.3.
Florent Xiclunaaa6c1d22011-12-12 18:54:29 +01001977 NOTE: the distinction between old-style and new-style classes does
1978 not make sense in Python 3.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001979 """),
1980
1981 I(name='OBJ',
1982 code='o',
1983 arg=None,
1984 stack_before=[markobject, anyobject, stackslice],
1985 stack_after=[anyobject],
1986 proto=1,
1987 doc="""Build a class instance.
1988
1989 This is the protocol 1 version of protocol 0's INST opcode, and is
1990 very much like it. The major difference is that the class object
1991 is taken off the stack, allowing it to be retrieved from the memo
1992 repeatedly if several instances of the same class are created. This
1993 can be much more efficient (in both time and space) than repeatedly
1994 embedding the module and class names in INST opcodes.
1995
1996 Unlike INST, OBJ takes no arguments from the opcode stream. Instead
1997 the class object is taken off the stack, immediately above the
1998 topmost markobject:
1999
2000 Stack before: ... markobject classobject stackslice
2001 Stack after: ... new_instance_object
2002
2003 As for INST, the remainder of the stack above the markobject is
2004 gathered into an argument tuple, and then the logic seems identical,
Guido van Rossumecb11042003-01-29 06:24:30 +00002005 except that no __safe_for_unpickling__ check is done (XXX this is
Guido van Rossum99603b02007-07-20 00:22:32 +00002006 a bug). See INST for the gory details.
Tim Peters2b93c4c2003-01-30 16:35:08 +00002007
2008 NOTE: In Python 2.3, INST and OBJ are identical except for how they
2009 get the class object. That was always the intent; the implementations
2010 had diverged for accidental reasons.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002011 """),
2012
Tim Petersfdc03462003-01-28 04:56:33 +00002013 I(name='NEWOBJ',
2014 code='\x81',
2015 arg=None,
2016 stack_before=[anyobject, anyobject],
2017 stack_after=[anyobject],
2018 proto=2,
2019 doc="""Build an object instance.
2020
2021 The stack before should be thought of as containing a class
2022 object followed by an argument tuple (the tuple being the stack
2023 top). Call these cls and args. They are popped off the stack,
2024 and the value returned by cls.__new__(cls, *args) is pushed back
2025 onto the stack.
2026 """),
2027
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002028 I(name='NEWOBJ_EX',
2029 code='\x92',
2030 arg=None,
2031 stack_before=[anyobject, anyobject, anyobject],
2032 stack_after=[anyobject],
2033 proto=4,
2034 doc="""Build an object instance.
2035
2036 The stack before should be thought of as containing a class
2037 object followed by an argument tuple and by a keyword argument dict
2038 (the dict being the stack top). Call these cls and args. They are
2039 popped off the stack, and the value returned by
2040 cls.__new__(cls, *args, *kwargs) is pushed back onto the stack.
2041 """),
2042
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002043 # Machine control.
2044
Tim Petersfdc03462003-01-28 04:56:33 +00002045 I(name='PROTO',
2046 code='\x80',
2047 arg=uint1,
2048 stack_before=[],
2049 stack_after=[],
2050 proto=2,
2051 doc="""Protocol version indicator.
2052
2053 For protocol 2 and above, a pickle must start with this opcode.
2054 The argument is the protocol version, an int in range(2, 256).
2055 """),
2056
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002057 I(name='STOP',
2058 code='.',
2059 arg=None,
2060 stack_before=[anyobject],
2061 stack_after=[],
2062 proto=0,
2063 doc="""Stop the unpickling machine.
2064
2065 Every pickle ends with this opcode. The object at the top of the stack
2066 is popped, and that's the result of unpickling. The stack should be
2067 empty then.
2068 """),
2069
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002070 # Framing support.
2071
2072 I(name='FRAME',
2073 code='\x95',
2074 arg=uint8,
2075 stack_before=[],
2076 stack_after=[],
2077 proto=4,
2078 doc="""Indicate the beginning of a new frame.
2079
2080 The unpickler may use this opcode to safely prefetch data from its
2081 underlying stream.
2082 """),
2083
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002084 # Ways to deal with persistent IDs.
2085
2086 I(name='PERSID',
2087 code='P',
2088 arg=stringnl_noescape,
2089 stack_before=[],
2090 stack_after=[anyobject],
2091 proto=0,
2092 doc="""Push an object identified by a persistent ID.
2093
2094 The pickle module doesn't define what a persistent ID means. PERSID's
2095 argument is a newline-terminated str-style (no embedded escapes, no
2096 bracketing quote characters) string, which *is* "the persistent ID".
2097 The unpickler passes this string to self.persistent_load(). Whatever
2098 object that returns is pushed on the stack. There is no implementation
2099 of persistent_load() in Python's unpickler: it must be supplied by an
2100 unpickler subclass.
2101 """),
2102
2103 I(name='BINPERSID',
2104 code='Q',
2105 arg=None,
2106 stack_before=[anyobject],
2107 stack_after=[anyobject],
2108 proto=1,
2109 doc="""Push an object identified by a persistent ID.
2110
2111 Like PERSID, except the persistent ID is popped off the stack (instead
2112 of being a string embedded in the opcode bytestream). The persistent
2113 ID is passed to self.persistent_load(), and whatever object that
2114 returns is pushed on the stack. See PERSID for more detail.
2115 """),
2116]
2117del I
2118
2119# Verify uniqueness of .name and .code members.
2120name2i = {}
2121code2i = {}
2122
2123for i, d in enumerate(opcodes):
2124 if d.name in name2i:
2125 raise ValueError("repeated name %r at indices %d and %d" %
2126 (d.name, name2i[d.name], i))
2127 if d.code in code2i:
2128 raise ValueError("repeated code %r at indices %d and %d" %
2129 (d.code, code2i[d.code], i))
2130
2131 name2i[d.name] = i
2132 code2i[d.code] = i
2133
2134del name2i, code2i, i, d
2135
2136##############################################################################
2137# Build a code2op dict, mapping opcode characters to OpcodeInfo records.
2138# Also ensure we've got the same stuff as pickle.py, although the
2139# introspection here is dicey.
2140
2141code2op = {}
2142for d in opcodes:
2143 code2op[d.code] = d
2144del d
2145
2146def assure_pickle_consistency(verbose=False):
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002147
2148 copy = code2op.copy()
2149 for name in pickle.__all__:
2150 if not re.match("[A-Z][A-Z0-9_]+$", name):
2151 if verbose:
Guido van Rossumbe19ed72007-02-09 05:37:30 +00002152 print("skipping %r: it doesn't look like an opcode name" % name)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002153 continue
2154 picklecode = getattr(pickle, name)
Guido van Rossum617dbc42007-05-07 23:57:08 +00002155 if not isinstance(picklecode, bytes) or len(picklecode) != 1:
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002156 if verbose:
Guido van Rossumbe19ed72007-02-09 05:37:30 +00002157 print(("skipping %r: value %r doesn't look like a pickle "
2158 "code" % (name, picklecode)))
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002159 continue
Guido van Rossum617dbc42007-05-07 23:57:08 +00002160 picklecode = picklecode.decode("latin-1")
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002161 if picklecode in copy:
2162 if verbose:
Guido van Rossumbe19ed72007-02-09 05:37:30 +00002163 print("checking name %r w/ code %r for consistency" % (
2164 name, picklecode))
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002165 d = copy[picklecode]
2166 if d.name != name:
2167 raise ValueError("for pickle code %r, pickle.py uses name %r "
2168 "but we're using name %r" % (picklecode,
2169 name,
2170 d.name))
2171 # Forget this one. Any left over in copy at the end are a problem
2172 # of a different kind.
2173 del copy[picklecode]
2174 else:
2175 raise ValueError("pickle.py appears to have a pickle opcode with "
2176 "name %r and code %r, but we don't" %
2177 (name, picklecode))
2178 if copy:
2179 msg = ["we appear to have pickle opcodes that pickle.py doesn't have:"]
2180 for code, d in copy.items():
2181 msg.append(" name %r with code %r" % (d.name, code))
2182 raise ValueError("\n".join(msg))
2183
2184assure_pickle_consistency()
Tim Petersc0c12b52003-01-29 00:56:17 +00002185del assure_pickle_consistency
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002186
2187##############################################################################
2188# A pickle opcode generator.
2189
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002190def _genops(data, yield_end_pos=False):
2191 if isinstance(data, bytes_types):
2192 data = io.BytesIO(data)
2193
2194 if hasattr(data, "tell"):
2195 getpos = data.tell
2196 else:
2197 getpos = lambda: None
2198
2199 while True:
2200 pos = getpos()
2201 code = data.read(1)
2202 opcode = code2op.get(code.decode("latin-1"))
2203 if opcode is None:
2204 if code == b"":
2205 raise ValueError("pickle exhausted before seeing STOP")
2206 else:
2207 raise ValueError("at position %s, opcode %r unknown" % (
2208 "<unknown>" if pos is None else pos,
2209 code))
2210 if opcode.arg is None:
2211 arg = None
2212 else:
2213 arg = opcode.arg.reader(data)
2214 if yield_end_pos:
2215 yield opcode, arg, pos, getpos()
2216 else:
2217 yield opcode, arg, pos
2218 if code == b'.':
2219 assert opcode.name == 'STOP'
2220 break
2221
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002222def genops(pickle):
Guido van Rossuma72ded92003-01-27 19:40:47 +00002223 """Generate all the opcodes in a pickle.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002224
2225 'pickle' is a file-like object, or string, containing the pickle.
2226
2227 Each opcode in the pickle is generated, from the current pickle position,
2228 stopping after a STOP opcode is delivered. A triple is generated for
2229 each opcode:
2230
2231 opcode, arg, pos
2232
2233 opcode is an OpcodeInfo record, describing the current opcode.
2234
2235 If the opcode has an argument embedded in the pickle, arg is its decoded
2236 value, as a Python object. If the opcode doesn't have an argument, arg
2237 is None.
2238
2239 If the pickle has a tell() method, pos was the value of pickle.tell()
Guido van Rossum34d19282007-08-09 01:03:29 +00002240 before reading the current opcode. If the pickle is a bytes object,
2241 it's wrapped in a BytesIO object, and the latter's tell() result is
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002242 used. Else (the pickle doesn't have a tell(), and it's not obvious how
2243 to query its current position) pos is None.
2244 """
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002245 return _genops(pickle)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002246
2247##############################################################################
Christian Heimes3feef612008-02-11 06:19:17 +00002248# A pickle optimizer.
2249
2250def optimize(p):
2251 'Optimize a pickle string by removing unused PUT opcodes'
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002252 put = 'PUT'
2253 get = 'GET'
2254 oldids = set() # set of all PUT ids
2255 newids = {} # set of ids used by a GET opcode
2256 opcodes = [] # (op, idx) or (pos, end_pos)
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002257 proto = 0
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002258 protoheader = b''
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002259 for opcode, arg, pos, end_pos in _genops(p, yield_end_pos=True):
Christian Heimes3feef612008-02-11 06:19:17 +00002260 if 'PUT' in opcode.name:
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002261 oldids.add(arg)
2262 opcodes.append((put, arg))
2263 elif opcode.name == 'MEMOIZE':
2264 idx = len(oldids)
2265 oldids.add(idx)
2266 opcodes.append((put, idx))
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002267 elif 'FRAME' in opcode.name:
2268 pass
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002269 elif 'GET' in opcode.name:
2270 if opcode.proto > proto:
2271 proto = opcode.proto
2272 newids[arg] = None
2273 opcodes.append((get, arg))
2274 elif opcode.name == 'PROTO':
2275 if arg > proto:
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002276 proto = arg
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002277 if pos == 0:
Olivier Grisel3cd7c6e2018-01-06 16:18:54 +01002278 protoheader = p[pos:end_pos]
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002279 else:
2280 opcodes.append((pos, end_pos))
2281 else:
2282 opcodes.append((pos, end_pos))
2283 del oldids
Christian Heimes3feef612008-02-11 06:19:17 +00002284
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002285 # Copy the opcodes except for PUTS without a corresponding GET
2286 out = io.BytesIO()
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002287 # Write the PROTO header before any framing
2288 out.write(protoheader)
2289 pickler = pickle._Pickler(out, proto)
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002290 if proto >= 4:
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002291 pickler.framer.start_framing()
2292 idx = 0
2293 for op, arg in opcodes:
Olivier Grisel3cd7c6e2018-01-06 16:18:54 +01002294 frameless = False
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002295 if op is put:
2296 if arg not in newids:
2297 continue
2298 data = pickler.put(idx)
2299 newids[arg] = idx
2300 idx += 1
2301 elif op is get:
2302 data = pickler.get(newids[arg])
2303 else:
2304 data = p[op:arg]
Olivier Grisel3cd7c6e2018-01-06 16:18:54 +01002305 frameless = len(data) > pickler.framer._FRAME_SIZE_TARGET
2306 pickler.framer.commit_frame(force=frameless)
2307 if frameless:
2308 pickler.framer.file_write(data)
2309 else:
2310 pickler.write(data)
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002311 pickler.framer.end_framing()
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002312 return out.getvalue()
Christian Heimes3feef612008-02-11 06:19:17 +00002313
2314##############################################################################
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002315# A symbolic pickle disassembler.
2316
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002317def dis(pickle, out=None, memo=None, indentlevel=4, annotate=0):
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002318 """Produce a symbolic disassembly of a pickle.
2319
2320 'pickle' is a file-like object, or string, containing a (at least one)
2321 pickle. The pickle is disassembled from the current position, through
2322 the first STOP opcode encountered.
2323
2324 Optional arg 'out' is a file-like object to which the disassembly is
2325 printed. It defaults to sys.stdout.
2326
Tim Peters62235e72003-02-05 19:55:53 +00002327 Optional arg 'memo' is a Python dict, used as the pickle's memo. It
2328 may be mutated by dis(), if the pickle contains PUT or BINPUT opcodes.
2329 Passing the same memo object to another dis() call then allows disassembly
2330 to proceed across multiple pickles that were all created by the same
2331 pickler with the same memo. Ordinarily you don't need to worry about this.
2332
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002333 Optional arg 'indentlevel' is the number of blanks by which to indent
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002334 a new MARK level. It defaults to 4.
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002335
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002336 Optional arg 'annotate' if nonzero instructs dis() to add short
2337 description of the opcode on each line of disassembled output.
2338 The value given to 'annotate' must be an integer and is used as a
2339 hint for the column where annotation should start. The default
2340 value is 0, meaning no annotations.
2341
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002342 In addition to printing the disassembly, some sanity checks are made:
2343
2344 + All embedded opcode arguments "make sense".
2345
2346 + Explicit and implicit pop operations have enough items on the stack.
2347
2348 + When an opcode implicitly refers to a markobject, a markobject is
2349 actually on the stack.
2350
2351 + A memo entry isn't referenced before it's defined.
2352
2353 + The markobject isn't stored in the memo.
2354
2355 + A memo entry isn't redefined.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002356 """
2357
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002358 # Most of the hair here is for sanity checks, but most of it is needed
2359 # anyway to detect when a protocol 0 POP takes a MARK off the stack
2360 # (which in turn is needed to indent MARK blocks correctly).
2361
2362 stack = [] # crude emulation of unpickler stack
Tim Peters62235e72003-02-05 19:55:53 +00002363 if memo is None:
Ezio Melotti30b9d5d2013-08-17 15:50:46 +03002364 memo = {} # crude emulation of unpickler memo
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002365 maxproto = -1 # max protocol number seen
2366 markstack = [] # bytecode positions of MARK opcodes
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002367 indentchunk = ' ' * indentlevel
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002368 errormsg = None
Ezio Melotti30b9d5d2013-08-17 15:50:46 +03002369 annocol = annotate # column hint for annotations
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002370 for opcode, arg, pos in genops(pickle):
2371 if pos is not None:
Guido van Rossumbe19ed72007-02-09 05:37:30 +00002372 print("%5d:" % pos, end=' ', file=out)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002373
Tim Petersd0f7c862003-01-28 15:27:57 +00002374 line = "%-4s %s%s" % (repr(opcode.code)[1:-1],
2375 indentchunk * len(markstack),
2376 opcode.name)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002377
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002378 maxproto = max(maxproto, opcode.proto)
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002379 before = opcode.stack_before # don't mutate
2380 after = opcode.stack_after # don't mutate
Tim Peters43277d62003-01-30 15:02:12 +00002381 numtopop = len(before)
2382
2383 # See whether a MARK should be popped.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002384 markmsg = None
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002385 if markobject in before or (opcode.name == "POP" and
2386 stack and
2387 stack[-1] is markobject):
2388 assert markobject not in after
Tim Peters43277d62003-01-30 15:02:12 +00002389 if __debug__:
2390 if markobject in before:
2391 assert before[-1] is stackslice
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002392 if markstack:
2393 markpos = markstack.pop()
2394 if markpos is None:
2395 markmsg = "(MARK at unknown opcode offset)"
2396 else:
2397 markmsg = "(MARK at %d)" % markpos
2398 # Pop everything at and after the topmost markobject.
2399 while stack[-1] is not markobject:
2400 stack.pop()
2401 stack.pop()
Tim Peters43277d62003-01-30 15:02:12 +00002402 # Stop later code from popping too much.
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002403 try:
Tim Peters43277d62003-01-30 15:02:12 +00002404 numtopop = before.index(markobject)
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002405 except ValueError:
2406 assert opcode.name == "POP"
Tim Peters43277d62003-01-30 15:02:12 +00002407 numtopop = 0
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002408 else:
2409 errormsg = markmsg = "no MARK exists on stack"
2410
2411 # Check for correct memo usage.
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002412 if opcode.name in ("PUT", "BINPUT", "LONG_BINPUT", "MEMOIZE"):
2413 if opcode.name == "MEMOIZE":
2414 memo_idx = len(memo)
Serhiy Storchakadbc517c2015-10-13 21:20:14 +03002415 markmsg = "(as %d)" % memo_idx
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002416 else:
2417 assert arg is not None
2418 memo_idx = arg
2419 if memo_idx in memo:
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002420 errormsg = "memo key %r already defined" % arg
2421 elif not stack:
2422 errormsg = "stack is empty -- can't store into memo"
2423 elif stack[-1] is markobject:
2424 errormsg = "can't store markobject in the memo"
2425 else:
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002426 memo[memo_idx] = stack[-1]
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002427 elif opcode.name in ("GET", "BINGET", "LONG_BINGET"):
2428 if arg in memo:
2429 assert len(after) == 1
2430 after = [memo[arg]] # for better stack emulation
2431 else:
2432 errormsg = "memo key %r has never been stored into" % arg
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002433
2434 if arg is not None or markmsg:
2435 # make a mild effort to align arguments
2436 line += ' ' * (10 - len(opcode.name))
2437 if arg is not None:
2438 line += ' ' + repr(arg)
2439 if markmsg:
2440 line += ' ' + markmsg
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002441 if annotate:
2442 line += ' ' * (annocol - len(line))
2443 # make a mild effort to align annotations
2444 annocol = len(line)
2445 if annocol > 50:
2446 annocol = annotate
2447 line += ' ' + opcode.doc.split('\n', 1)[0]
Guido van Rossumbe19ed72007-02-09 05:37:30 +00002448 print(line, file=out)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002449
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002450 if errormsg:
2451 # Note that we delayed complaining until the offending opcode
2452 # was printed.
2453 raise ValueError(errormsg)
2454
2455 # Emulate the stack effects.
Tim Peters43277d62003-01-30 15:02:12 +00002456 if len(stack) < numtopop:
2457 raise ValueError("tries to pop %d items from stack with "
2458 "only %d items" % (numtopop, len(stack)))
2459 if numtopop:
2460 del stack[-numtopop:]
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002461 if markobject in after:
Tim Peters43277d62003-01-30 15:02:12 +00002462 assert markobject not in before
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002463 markstack.append(pos)
2464
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002465 stack.extend(after)
2466
Guido van Rossumbe19ed72007-02-09 05:37:30 +00002467 print("highest protocol among opcodes =", maxproto, file=out)
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002468 if stack:
2469 raise ValueError("stack not empty after STOP: %r" % stack)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002470
Tim Peters90718a42005-02-15 16:22:34 +00002471# For use in the doctest, simply as an example of a class to pickle.
2472class _Example:
2473 def __init__(self, value):
2474 self.value = value
2475
Guido van Rossum03e35322003-01-28 15:37:13 +00002476_dis_test = r"""
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002477>>> import pickle
Guido van Rossumf4169812008-03-17 22:56:06 +00002478>>> x = [1, 2, (3, 4), {b'abc': "def"}]
2479>>> pkl0 = pickle.dumps(x, 0)
2480>>> dis(pkl0)
Tim Petersd0f7c862003-01-28 15:27:57 +00002481 0: ( MARK
2482 1: l LIST (MARK at 0)
2483 2: p PUT 0
Serhiy Storchaka3daaafb2017-11-16 09:44:43 +02002484 5: I INT 1
2485 8: a APPEND
2486 9: I INT 2
2487 12: a APPEND
2488 13: ( MARK
2489 14: I INT 3
2490 17: I INT 4
2491 20: t TUPLE (MARK at 13)
2492 21: p PUT 1
2493 24: a APPEND
2494 25: ( MARK
2495 26: d DICT (MARK at 25)
2496 27: p PUT 2
2497 30: c GLOBAL '_codecs encode'
2498 46: p PUT 3
2499 49: ( MARK
2500 50: V UNICODE 'abc'
2501 55: p PUT 4
2502 58: V UNICODE 'latin1'
2503 66: p PUT 5
2504 69: t TUPLE (MARK at 49)
2505 70: p PUT 6
2506 73: R REDUCE
2507 74: p PUT 7
2508 77: V UNICODE 'def'
2509 82: p PUT 8
2510 85: s SETITEM
2511 86: a APPEND
2512 87: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002513highest protocol among opcodes = 0
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002514
2515Try again with a "binary" pickle.
2516
Guido van Rossumf4169812008-03-17 22:56:06 +00002517>>> pkl1 = pickle.dumps(x, 1)
2518>>> dis(pkl1)
Tim Petersd0f7c862003-01-28 15:27:57 +00002519 0: ] EMPTY_LIST
2520 1: q BINPUT 0
2521 3: ( MARK
2522 4: K BININT1 1
2523 6: K BININT1 2
2524 8: ( MARK
2525 9: K BININT1 3
2526 11: K BININT1 4
2527 13: t TUPLE (MARK at 8)
2528 14: q BINPUT 1
2529 16: } EMPTY_DICT
2530 17: q BINPUT 2
Alexandre Vassalotti3bfc65a2011-12-13 13:08:09 -05002531 19: c GLOBAL '_codecs encode'
2532 35: q BINPUT 3
2533 37: ( MARK
2534 38: X BINUNICODE 'abc'
2535 46: q BINPUT 4
2536 48: X BINUNICODE 'latin1'
2537 59: q BINPUT 5
2538 61: t TUPLE (MARK at 37)
2539 62: q BINPUT 6
2540 64: R REDUCE
2541 65: q BINPUT 7
2542 67: X BINUNICODE 'def'
2543 75: q BINPUT 8
2544 77: s SETITEM
2545 78: e APPENDS (MARK at 3)
2546 79: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002547highest protocol among opcodes = 1
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002548
2549Exercise the INST/OBJ/BUILD family.
2550
Mark Dickinsoncddcf442009-01-24 21:46:33 +00002551>>> import pickletools
2552>>> dis(pickle.dumps(pickletools.dis, 0))
2553 0: c GLOBAL 'pickletools dis'
2554 17: p PUT 0
2555 20: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002556highest protocol among opcodes = 0
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002557
Tim Peters90718a42005-02-15 16:22:34 +00002558>>> from pickletools import _Example
2559>>> x = [_Example(42)] * 2
Guido van Rossumf29d3d62003-01-27 22:47:53 +00002560>>> dis(pickle.dumps(x, 0))
Tim Petersd0f7c862003-01-28 15:27:57 +00002561 0: ( MARK
2562 1: l LIST (MARK at 0)
2563 2: p PUT 0
Antoine Pitroud9dfaa92009-06-04 20:32:06 +00002564 5: c GLOBAL 'copy_reg _reconstructor'
2565 30: p PUT 1
2566 33: ( MARK
2567 34: c GLOBAL 'pickletools _Example'
2568 56: p PUT 2
2569 59: c GLOBAL '__builtin__ object'
2570 79: p PUT 3
2571 82: N NONE
2572 83: t TUPLE (MARK at 33)
2573 84: p PUT 4
2574 87: R REDUCE
2575 88: p PUT 5
2576 91: ( MARK
2577 92: d DICT (MARK at 91)
2578 93: p PUT 6
2579 96: V UNICODE 'value'
2580 103: p PUT 7
Serhiy Storchaka3daaafb2017-11-16 09:44:43 +02002581 106: I INT 42
2582 110: s SETITEM
2583 111: b BUILD
2584 112: a APPEND
2585 113: g GET 5
2586 116: a APPEND
2587 117: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002588highest protocol among opcodes = 0
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002589
2590>>> dis(pickle.dumps(x, 1))
Tim Petersd0f7c862003-01-28 15:27:57 +00002591 0: ] EMPTY_LIST
2592 1: q BINPUT 0
2593 3: ( MARK
Antoine Pitroud9dfaa92009-06-04 20:32:06 +00002594 4: c GLOBAL 'copy_reg _reconstructor'
2595 29: q BINPUT 1
2596 31: ( MARK
2597 32: c GLOBAL 'pickletools _Example'
2598 54: q BINPUT 2
2599 56: c GLOBAL '__builtin__ object'
2600 76: q BINPUT 3
2601 78: N NONE
2602 79: t TUPLE (MARK at 31)
2603 80: q BINPUT 4
2604 82: R REDUCE
2605 83: q BINPUT 5
2606 85: } EMPTY_DICT
2607 86: q BINPUT 6
2608 88: X BINUNICODE 'value'
2609 98: q BINPUT 7
2610 100: K BININT1 42
2611 102: s SETITEM
2612 103: b BUILD
2613 104: h BINGET 5
2614 106: e APPENDS (MARK at 3)
2615 107: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002616highest protocol among opcodes = 1
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002617
2618Try "the canonical" recursive-object test.
2619
2620>>> L = []
2621>>> T = L,
2622>>> L.append(T)
2623>>> L[0] is T
2624True
2625>>> T[0] is L
2626True
2627>>> L[0][0] is L
2628True
2629>>> T[0][0] is T
2630True
Guido van Rossumf29d3d62003-01-27 22:47:53 +00002631>>> dis(pickle.dumps(L, 0))
Tim Petersd0f7c862003-01-28 15:27:57 +00002632 0: ( MARK
2633 1: l LIST (MARK at 0)
2634 2: p PUT 0
2635 5: ( MARK
2636 6: g GET 0
2637 9: t TUPLE (MARK at 5)
2638 10: p PUT 1
2639 13: a APPEND
2640 14: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002641highest protocol among opcodes = 0
2642
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002643>>> dis(pickle.dumps(L, 1))
Tim Petersd0f7c862003-01-28 15:27:57 +00002644 0: ] EMPTY_LIST
2645 1: q BINPUT 0
2646 3: ( MARK
2647 4: h BINGET 0
2648 6: t TUPLE (MARK at 3)
2649 7: q BINPUT 1
2650 9: a APPEND
2651 10: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002652highest protocol among opcodes = 1
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002653
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002654Note that, in the protocol 0 pickle of the recursive tuple, the disassembler
2655has to emulate the stack in order to realize that the POP opcode at 16 gets
2656rid of the MARK at 0.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002657
Guido van Rossumf29d3d62003-01-27 22:47:53 +00002658>>> dis(pickle.dumps(T, 0))
Tim Petersd0f7c862003-01-28 15:27:57 +00002659 0: ( MARK
2660 1: ( MARK
2661 2: l LIST (MARK at 1)
2662 3: p PUT 0
2663 6: ( MARK
2664 7: g GET 0
2665 10: t TUPLE (MARK at 6)
2666 11: p PUT 1
2667 14: a APPEND
2668 15: 0 POP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002669 16: 0 POP (MARK at 0)
2670 17: g GET 1
2671 20: . STOP
2672highest protocol among opcodes = 0
2673
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002674>>> dis(pickle.dumps(T, 1))
Tim Petersd0f7c862003-01-28 15:27:57 +00002675 0: ( MARK
2676 1: ] EMPTY_LIST
2677 2: q BINPUT 0
2678 4: ( MARK
2679 5: h BINGET 0
2680 7: t TUPLE (MARK at 4)
2681 8: q BINPUT 1
2682 10: a APPEND
2683 11: 1 POP_MARK (MARK at 0)
2684 12: h BINGET 1
2685 14: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002686highest protocol among opcodes = 1
Tim Petersd0f7c862003-01-28 15:27:57 +00002687
2688Try protocol 2.
2689
2690>>> dis(pickle.dumps(L, 2))
2691 0: \x80 PROTO 2
2692 2: ] EMPTY_LIST
2693 3: q BINPUT 0
2694 5: h BINGET 0
2695 7: \x85 TUPLE1
2696 8: q BINPUT 1
2697 10: a APPEND
2698 11: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002699highest protocol among opcodes = 2
Tim Petersd0f7c862003-01-28 15:27:57 +00002700
2701>>> dis(pickle.dumps(T, 2))
2702 0: \x80 PROTO 2
2703 2: ] EMPTY_LIST
2704 3: q BINPUT 0
2705 5: h BINGET 0
2706 7: \x85 TUPLE1
2707 8: q BINPUT 1
2708 10: a APPEND
2709 11: 0 POP
2710 12: h BINGET 1
2711 14: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002712highest protocol among opcodes = 2
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002713
2714Try protocol 3 with annotations:
2715
2716>>> dis(pickle.dumps(T, 3), annotate=1)
2717 0: \x80 PROTO 3 Protocol version indicator.
2718 2: ] EMPTY_LIST Push an empty list.
2719 3: q BINPUT 0 Store the stack top into the memo. The stack is not popped.
2720 5: h BINGET 0 Read an object from the memo and push it on the stack.
2721 7: \x85 TUPLE1 Build a one-tuple out of the topmost item on the stack.
2722 8: q BINPUT 1 Store the stack top into the memo. The stack is not popped.
2723 10: a APPEND Append an object to a list.
2724 11: 0 POP Discard the top stack item, shrinking the stack by one item.
2725 12: h BINGET 1 Read an object from the memo and push it on the stack.
2726 14: . STOP Stop the unpickling machine.
2727highest protocol among opcodes = 2
2728
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002729"""
2730
Tim Peters62235e72003-02-05 19:55:53 +00002731_memo_test = r"""
2732>>> import pickle
Guido van Rossumcfe5f202007-05-08 21:26:54 +00002733>>> import io
2734>>> f = io.BytesIO()
Tim Peters62235e72003-02-05 19:55:53 +00002735>>> p = pickle.Pickler(f, 2)
2736>>> x = [1, 2, 3]
2737>>> p.dump(x)
2738>>> p.dump(x)
2739>>> f.seek(0)
Guido van Rossumcfe5f202007-05-08 21:26:54 +000027400
Tim Peters62235e72003-02-05 19:55:53 +00002741>>> memo = {}
2742>>> dis(f, memo=memo)
2743 0: \x80 PROTO 2
2744 2: ] EMPTY_LIST
2745 3: q BINPUT 0
2746 5: ( MARK
2747 6: K BININT1 1
2748 8: K BININT1 2
2749 10: K BININT1 3
2750 12: e APPENDS (MARK at 5)
2751 13: . STOP
2752highest protocol among opcodes = 2
2753>>> dis(f, memo=memo)
2754 14: \x80 PROTO 2
2755 16: h BINGET 0
2756 18: . STOP
2757highest protocol among opcodes = 2
2758"""
2759
Guido van Rossum57028352003-01-28 15:09:10 +00002760__test__ = {'disassembler_test': _dis_test,
Tim Peters62235e72003-02-05 19:55:53 +00002761 'disassembler_memo_test': _memo_test,
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002762 }
2763
2764def _test():
2765 import doctest
2766 return doctest.testmod()
2767
2768if __name__ == "__main__":
Benjamin Peterson669ff662015-10-28 23:15:13 -07002769 import argparse
Alexander Belopolsky60c762b2010-07-03 20:35:53 +00002770 parser = argparse.ArgumentParser(
2771 description='disassemble one or more pickle files')
2772 parser.add_argument(
2773 'pickle_file', type=argparse.FileType('br'),
2774 nargs='*', help='the pickle file')
2775 parser.add_argument(
2776 '-o', '--output', default=sys.stdout, type=argparse.FileType('w'),
2777 help='the file where the output should be written')
2778 parser.add_argument(
2779 '-m', '--memo', action='store_true',
2780 help='preserve memo between disassemblies')
2781 parser.add_argument(
2782 '-l', '--indentlevel', default=4, type=int,
2783 help='the number of blanks by which to indent a new MARK level')
2784 parser.add_argument(
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002785 '-a', '--annotate', action='store_true',
2786 help='annotate each line with a short opcode description')
2787 parser.add_argument(
Alexander Belopolsky60c762b2010-07-03 20:35:53 +00002788 '-p', '--preamble', default="==> {name} <==",
2789 help='if more than one pickle file is specified, print this before'
2790 ' each disassembly')
2791 parser.add_argument(
2792 '-t', '--test', action='store_true',
2793 help='run self-test suite')
2794 parser.add_argument(
2795 '-v', action='store_true',
2796 help='run verbosely; only affects self-test run')
2797 args = parser.parse_args()
2798 if args.test:
2799 _test()
2800 else:
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002801 annotate = 30 if args.annotate else 0
Alexander Belopolsky60c762b2010-07-03 20:35:53 +00002802 if not args.pickle_file:
2803 parser.print_help()
2804 elif len(args.pickle_file) == 1:
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002805 dis(args.pickle_file[0], args.output, None,
2806 args.indentlevel, annotate)
Alexander Belopolsky60c762b2010-07-03 20:35:53 +00002807 else:
2808 memo = {} if args.memo else None
2809 for f in args.pickle_file:
2810 preamble = args.preamble.format(name=f.name)
2811 args.output.write(preamble + '\n')
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002812 dis(f, args.output, memo, args.indentlevel, annotate)