blob: 8486cbf84368786546fff02d918c8c28a1b2f3f2 [file] [log] [blame]
Skip Montanaro54455942003-01-29 15:41:33 +00001'''"Executable documentation" for the pickle module.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002
3Extensive comments about the pickle protocols and pickle-machine opcodes
4can be found here. Some functions meant for external use:
5
6genops(pickle)
7 Generate all the opcodes in a pickle, as (opcode, arg, position) triples.
8
Andrew M. Kuchlingd0c53fe2004-08-07 16:51:30 +00009dis(pickle, out=None, memo=None, indentlevel=4)
Tim Peters8ecfc8e2003-01-27 18:51:48 +000010 Print a symbolic disassembly of a pickle.
Skip Montanaro54455942003-01-29 15:41:33 +000011'''
Tim Peters8ecfc8e2003-01-27 18:51:48 +000012
Walter Dörwald42748a82007-06-12 16:40:17 +000013import codecs
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +010014import io
Guido van Rossum98297ee2007-11-06 21:34:58 +000015import pickle
16import re
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -070017import sys
Walter Dörwald42748a82007-06-12 16:40:17 +000018
Christian Heimes3feef612008-02-11 06:19:17 +000019__all__ = ['dis', 'genops', 'optimize']
Tim Peters90cf2122004-11-06 23:45:48 +000020
Guido van Rossum98297ee2007-11-06 21:34:58 +000021bytes_types = pickle.bytes_types
22
Tim Peters8ecfc8e2003-01-27 18:51:48 +000023# Other ideas:
24#
25# - A pickle verifier: read a pickle and check it exhaustively for
Tim Petersc1c2b3e2003-01-29 20:12:21 +000026# well-formedness. dis() does a lot of this already.
Tim Peters8ecfc8e2003-01-27 18:51:48 +000027#
28# - A protocol identifier: examine a pickle and return its protocol number
29# (== the highest .proto attr value among all the opcodes in the pickle).
Tim Petersc1c2b3e2003-01-29 20:12:21 +000030# dis() already prints this info at the end.
Tim Peters8ecfc8e2003-01-27 18:51:48 +000031#
32# - A pickle optimizer: for example, tuple-building code is sometimes more
33# elaborate than necessary, catering for the possibility that the tuple
34# is recursive. Or lots of times a PUT is generated that's never accessed
35# by a later GET.
36
37
Victor Stinner765531d2013-03-26 01:11:54 +010038# "A pickle" is a program for a virtual pickle machine (PM, but more accurately
39# called an unpickling machine). It's a sequence of opcodes, interpreted by the
40# PM, building an arbitrarily complex Python object.
41#
42# For the most part, the PM is very simple: there are no looping, testing, or
43# conditional instructions, no arithmetic and no function calls. Opcodes are
44# executed once each, from first to last, until a STOP opcode is reached.
45#
46# The PM has two data areas, "the stack" and "the memo".
47#
48# Many opcodes push Python objects onto the stack; e.g., INT pushes a Python
49# integer object on the stack, whose value is gotten from a decimal string
50# literal immediately following the INT opcode in the pickle bytestream. Other
51# opcodes take Python objects off the stack. The result of unpickling is
52# whatever object is left on the stack when the final STOP opcode is executed.
53#
54# The memo is simply an array of objects, or it can be implemented as a dict
55# mapping little integers to objects. The memo serves as the PM's "long term
56# memory", and the little integers indexing the memo are akin to variable
57# names. Some opcodes pop a stack object into the memo at a given index,
58# and others push a memo object at a given index onto the stack again.
59#
60# At heart, that's all the PM has. Subtleties arise for these reasons:
61#
62# + Object identity. Objects can be arbitrarily complex, and subobjects
63# may be shared (for example, the list [a, a] refers to the same object a
64# twice). It can be vital that unpickling recreate an isomorphic object
65# graph, faithfully reproducing sharing.
66#
67# + Recursive objects. For example, after "L = []; L.append(L)", L is a
68# list, and L[0] is the same list. This is related to the object identity
69# point, and some sequences of pickle opcodes are subtle in order to
70# get the right result in all cases.
71#
72# + Things pickle doesn't know everything about. Examples of things pickle
73# does know everything about are Python's builtin scalar and container
74# types, like ints and tuples. They generally have opcodes dedicated to
75# them. For things like module references and instances of user-defined
76# classes, pickle's knowledge is limited. Historically, many enhancements
77# have been made to the pickle protocol in order to do a better (faster,
78# and/or more compact) job on those.
79#
80# + Backward compatibility and micro-optimization. As explained below,
81# pickle opcodes never go away, not even when better ways to do a thing
82# get invented. The repertoire of the PM just keeps growing over time.
83# For example, protocol 0 had two opcodes for building Python integers (INT
84# and LONG), protocol 1 added three more for more-efficient pickling of short
85# integers, and protocol 2 added two more for more-efficient pickling of
86# long integers (before protocol 2, the only ways to pickle a Python long
87# took time quadratic in the number of digits, for both pickling and
88# unpickling). "Opcode bloat" isn't so much a subtlety as a source of
89# wearying complication.
90#
91#
92# Pickle protocols:
93#
94# For compatibility, the meaning of a pickle opcode never changes. Instead new
95# pickle opcodes get added, and each version's unpickler can handle all the
96# pickle opcodes in all protocol versions to date. So old pickles continue to
97# be readable forever. The pickler can generally be told to restrict itself to
98# the subset of opcodes available under previous protocol versions too, so that
99# users can create pickles under the current version readable by older
100# versions. However, a pickle does not contain its version number embedded
101# within it. If an older unpickler tries to read a pickle using a later
102# protocol, the result is most likely an exception due to seeing an unknown (in
103# the older unpickler) opcode.
104#
105# The original pickle used what's now called "protocol 0", and what was called
106# "text mode" before Python 2.3. The entire pickle bytestream is made up of
107# printable 7-bit ASCII characters, plus the newline character, in protocol 0.
108# That's why it was called text mode. Protocol 0 is small and elegant, but
109# sometimes painfully inefficient.
110#
111# The second major set of additions is now called "protocol 1", and was called
112# "binary mode" before Python 2.3. This added many opcodes with arguments
113# consisting of arbitrary bytes, including NUL bytes and unprintable "high bit"
114# bytes. Binary mode pickles can be substantially smaller than equivalent
115# text mode pickles, and sometimes faster too; e.g., BININT represents a 4-byte
116# int as 4 bytes following the opcode, which is cheaper to unpickle than the
117# (perhaps) 11-character decimal string attached to INT. Protocol 1 also added
118# a number of opcodes that operate on many stack elements at once (like APPENDS
119# and SETITEMS), and "shortcut" opcodes (like EMPTY_DICT and EMPTY_TUPLE).
120#
121# The third major set of additions came in Python 2.3, and is called "protocol
122# 2". This added:
123#
124# - A better way to pickle instances of new-style classes (NEWOBJ).
125#
126# - A way for a pickle to identify its protocol (PROTO).
127#
128# - Time- and space- efficient pickling of long ints (LONG{1,4}).
129#
130# - Shortcuts for small tuples (TUPLE{1,2,3}}.
131#
132# - Dedicated opcodes for bools (NEWTRUE, NEWFALSE).
133#
134# - The "extension registry", a vector of popular objects that can be pushed
135# efficiently by index (EXT{1,2,4}). This is akin to the memo and GET, but
136# the registry contents are predefined (there's nothing akin to the memo's
137# PUT).
138#
139# Another independent change with Python 2.3 is the abandonment of any
140# pretense that it might be safe to load pickles received from untrusted
141# parties -- no sufficient security analysis has been done to guarantee
142# this and there isn't a use case that warrants the expense of such an
143# analysis.
144#
145# To this end, all tests for __safe_for_unpickling__ or for
146# copyreg.safe_constructors are removed from the unpickling code.
147# References to these variables in the descriptions below are to be seen
148# as describing unpickling in Python 2.2 and before.
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000149
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000150
151# Meta-rule: Descriptions are stored in instances of descriptor objects,
152# with plain constructors. No meta-language is defined from which
153# descriptors could be constructed. If you want, e.g., XML, write a little
154# program to generate XML from the objects.
155
156##############################################################################
157# Some pickle opcodes have an argument, following the opcode in the
158# bytestream. An argument is of a specific type, described by an instance
159# of ArgumentDescriptor. These are not to be confused with arguments taken
160# off the stack -- ArgumentDescriptor applies only to arguments embedded in
161# the opcode stream, immediately following an opcode.
162
163# Represents the number of bytes consumed by an argument delimited by the
164# next newline character.
165UP_TO_NEWLINE = -1
166
167# Represents the number of bytes consumed by a two-argument opcode where
168# the first argument gives the number of bytes in the second argument.
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -0700169TAKEN_FROM_ARGUMENT1 = -2 # num bytes is 1-byte unsigned int
170TAKEN_FROM_ARGUMENT4 = -3 # num bytes is 4-byte signed little-endian int
171TAKEN_FROM_ARGUMENT4U = -4 # num bytes is 4-byte unsigned little-endian int
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100172TAKEN_FROM_ARGUMENT8U = -5 # num bytes is 8-byte unsigned little-endian int
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000173
174class ArgumentDescriptor(object):
175 __slots__ = (
176 # name of descriptor record, also a module global name; a string
177 'name',
178
179 # length of argument, in bytes; an int; UP_TO_NEWLINE and
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100180 # TAKEN_FROM_ARGUMENT{1,4,8} are negative values for variable-length
Tim Petersfdb8cfa2003-01-28 00:13:19 +0000181 # cases
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000182 'n',
183
184 # a function taking a file-like object, reading this kind of argument
185 # from the object at the current position, advancing the current
186 # position by n bytes, and returning the value of the argument
187 'reader',
188
189 # human-readable docs for this arg descriptor; a string
190 'doc',
191 )
192
193 def __init__(self, name, n, reader, doc):
194 assert isinstance(name, str)
195 self.name = name
196
197 assert isinstance(n, int) and (n >= 0 or
Tim Petersfdb8cfa2003-01-28 00:13:19 +0000198 n in (UP_TO_NEWLINE,
199 TAKEN_FROM_ARGUMENT1,
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -0700200 TAKEN_FROM_ARGUMENT4,
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100201 TAKEN_FROM_ARGUMENT4U,
202 TAKEN_FROM_ARGUMENT8U))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000203 self.n = n
204
205 self.reader = reader
206
207 assert isinstance(doc, str)
208 self.doc = doc
209
210from struct import unpack as _unpack
211
212def read_uint1(f):
Tim Peters55762f52003-01-28 16:01:25 +0000213 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000214 >>> import io
215 >>> read_uint1(io.BytesIO(b'\xff'))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000216 255
217 """
218
219 data = f.read(1)
220 if data:
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000221 return data[0]
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000222 raise ValueError("not enough data in stream to read uint1")
223
224uint1 = ArgumentDescriptor(
225 name='uint1',
226 n=1,
227 reader=read_uint1,
228 doc="One-byte unsigned integer.")
229
230
231def read_uint2(f):
Tim Peters55762f52003-01-28 16:01:25 +0000232 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000233 >>> import io
234 >>> read_uint2(io.BytesIO(b'\xff\x00'))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000235 255
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000236 >>> read_uint2(io.BytesIO(b'\xff\xff'))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000237 65535
238 """
239
240 data = f.read(2)
241 if len(data) == 2:
242 return _unpack("<H", data)[0]
243 raise ValueError("not enough data in stream to read uint2")
244
245uint2 = ArgumentDescriptor(
246 name='uint2',
247 n=2,
248 reader=read_uint2,
249 doc="Two-byte unsigned integer, little-endian.")
250
251
252def read_int4(f):
Tim Peters55762f52003-01-28 16:01:25 +0000253 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000254 >>> import io
255 >>> read_int4(io.BytesIO(b'\xff\x00\x00\x00'))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000256 255
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000257 >>> read_int4(io.BytesIO(b'\x00\x00\x00\x80')) == -(2**31)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000258 True
259 """
260
261 data = f.read(4)
262 if len(data) == 4:
263 return _unpack("<i", data)[0]
264 raise ValueError("not enough data in stream to read int4")
265
266int4 = ArgumentDescriptor(
267 name='int4',
268 n=4,
269 reader=read_int4,
270 doc="Four-byte signed integer, little-endian, 2's complement.")
271
272
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -0700273def read_uint4(f):
274 r"""
275 >>> import io
276 >>> read_uint4(io.BytesIO(b'\xff\x00\x00\x00'))
277 255
278 >>> read_uint4(io.BytesIO(b'\x00\x00\x00\x80')) == 2**31
279 True
280 """
281
282 data = f.read(4)
283 if len(data) == 4:
284 return _unpack("<I", data)[0]
285 raise ValueError("not enough data in stream to read uint4")
286
287uint4 = ArgumentDescriptor(
288 name='uint4',
289 n=4,
290 reader=read_uint4,
291 doc="Four-byte unsigned integer, little-endian.")
292
293
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100294def read_uint8(f):
295 r"""
296 >>> import io
297 >>> read_uint8(io.BytesIO(b'\xff\x00\x00\x00\x00\x00\x00\x00'))
298 255
299 >>> read_uint8(io.BytesIO(b'\xff' * 8)) == 2**64-1
300 True
301 """
302
303 data = f.read(8)
304 if len(data) == 8:
305 return _unpack("<Q", data)[0]
306 raise ValueError("not enough data in stream to read uint8")
307
308uint8 = ArgumentDescriptor(
309 name='uint8',
310 n=8,
311 reader=read_uint8,
312 doc="Eight-byte unsigned integer, little-endian.")
313
314
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000315def read_stringnl(f, decode=True, stripquotes=True):
Tim Peters55762f52003-01-28 16:01:25 +0000316 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000317 >>> import io
318 >>> read_stringnl(io.BytesIO(b"'abcd'\nefg\n"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000319 'abcd'
320
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000321 >>> read_stringnl(io.BytesIO(b"\n"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000322 Traceback (most recent call last):
323 ...
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000324 ValueError: no string quotes around b''
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000325
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000326 >>> read_stringnl(io.BytesIO(b"\n"), stripquotes=False)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000327 ''
328
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000329 >>> read_stringnl(io.BytesIO(b"''\n"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000330 ''
331
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000332 >>> read_stringnl(io.BytesIO(b'"abcd"'))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000333 Traceback (most recent call last):
334 ...
335 ValueError: no newline found when trying to read stringnl
336
337 Embedded escapes are undone in the result.
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000338 >>> read_stringnl(io.BytesIO(br"'a\n\\b\x00c\td'" + b"\n'e'"))
Tim Peters55762f52003-01-28 16:01:25 +0000339 'a\n\\b\x00c\td'
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000340 """
341
Guido van Rossum26986312007-07-17 00:19:46 +0000342 data = f.readline()
Guido van Rossum26d95c32007-08-27 23:18:54 +0000343 if not data.endswith(b'\n'):
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000344 raise ValueError("no newline found when trying to read stringnl")
345 data = data[:-1] # lose the newline
346
347 if stripquotes:
Guido van Rossum26d95c32007-08-27 23:18:54 +0000348 for q in (b'"', b"'"):
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000349 if data.startswith(q):
350 if not data.endswith(q):
351 raise ValueError("strinq quote %r not found at both "
352 "ends of %r" % (q, data))
353 data = data[1:-1]
354 break
355 else:
356 raise ValueError("no string quotes around %r" % data)
357
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000358 if decode:
Guido van Rossum98297ee2007-11-06 21:34:58 +0000359 data = codecs.escape_decode(data)[0].decode("ascii")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000360 return data
361
362stringnl = ArgumentDescriptor(
363 name='stringnl',
364 n=UP_TO_NEWLINE,
365 reader=read_stringnl,
366 doc="""A newline-terminated string.
367
368 This is a repr-style string, with embedded escapes, and
369 bracketing quotes.
370 """)
371
372def read_stringnl_noescape(f):
Guido van Rossum98297ee2007-11-06 21:34:58 +0000373 return read_stringnl(f, stripquotes=False)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000374
375stringnl_noescape = ArgumentDescriptor(
376 name='stringnl_noescape',
377 n=UP_TO_NEWLINE,
378 reader=read_stringnl_noescape,
379 doc="""A newline-terminated string.
380
381 This is a str-style string, without embedded escapes,
382 or bracketing quotes. It should consist solely of
383 printable ASCII characters.
384 """)
385
386def read_stringnl_noescape_pair(f):
Tim Peters55762f52003-01-28 16:01:25 +0000387 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000388 >>> import io
389 >>> read_stringnl_noescape_pair(io.BytesIO(b"Queue\nEmpty\njunk"))
Tim Petersd916cf42003-01-27 19:01:47 +0000390 'Queue Empty'
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000391 """
392
Tim Petersd916cf42003-01-27 19:01:47 +0000393 return "%s %s" % (read_stringnl_noescape(f), read_stringnl_noescape(f))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000394
395stringnl_noescape_pair = ArgumentDescriptor(
396 name='stringnl_noescape_pair',
397 n=UP_TO_NEWLINE,
398 reader=read_stringnl_noescape_pair,
399 doc="""A pair of newline-terminated strings.
400
401 These are str-style strings, without embedded
402 escapes, or bracketing quotes. They should
403 consist solely of printable ASCII characters.
404 The pair is returned as a single string, with
Tim Petersd916cf42003-01-27 19:01:47 +0000405 a single blank separating the two strings.
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000406 """)
407
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100408
409def read_string1(f):
410 r"""
411 >>> import io
412 >>> read_string1(io.BytesIO(b"\x00"))
413 ''
414 >>> read_string1(io.BytesIO(b"\x03abcdef"))
415 'abc'
416 """
417
418 n = read_uint1(f)
419 assert n >= 0
420 data = f.read(n)
421 if len(data) == n:
422 return data.decode("latin-1")
423 raise ValueError("expected %d bytes in a string1, but only %d remain" %
424 (n, len(data)))
425
426string1 = ArgumentDescriptor(
427 name="string1",
428 n=TAKEN_FROM_ARGUMENT1,
429 reader=read_string1,
430 doc="""A counted string.
431
432 The first argument is a 1-byte unsigned int giving the number
433 of bytes in the string, and the second argument is that many
434 bytes.
435 """)
436
437
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000438def read_string4(f):
Tim Peters55762f52003-01-28 16:01:25 +0000439 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000440 >>> import io
441 >>> read_string4(io.BytesIO(b"\x00\x00\x00\x00abc"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000442 ''
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000443 >>> read_string4(io.BytesIO(b"\x03\x00\x00\x00abcdef"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000444 'abc'
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000445 >>> read_string4(io.BytesIO(b"\x00\x00\x00\x03abcdef"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000446 Traceback (most recent call last):
447 ...
448 ValueError: expected 50331648 bytes in a string4, but only 6 remain
449 """
450
451 n = read_int4(f)
452 if n < 0:
453 raise ValueError("string4 byte count < 0: %d" % n)
454 data = f.read(n)
455 if len(data) == n:
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000456 return data.decode("latin-1")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000457 raise ValueError("expected %d bytes in a string4, but only %d remain" %
458 (n, len(data)))
459
460string4 = ArgumentDescriptor(
461 name="string4",
Tim Petersfdb8cfa2003-01-28 00:13:19 +0000462 n=TAKEN_FROM_ARGUMENT4,
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000463 reader=read_string4,
464 doc="""A counted string.
465
466 The first argument is a 4-byte little-endian signed int giving
467 the number of bytes in the string, and the second argument is
468 that many bytes.
469 """)
470
471
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100472def read_bytes1(f):
Tim Peters55762f52003-01-28 16:01:25 +0000473 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000474 >>> import io
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100475 >>> read_bytes1(io.BytesIO(b"\x00"))
476 b''
477 >>> read_bytes1(io.BytesIO(b"\x03abcdef"))
478 b'abc'
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000479 """
480
481 n = read_uint1(f)
482 assert n >= 0
483 data = f.read(n)
484 if len(data) == n:
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100485 return data
486 raise ValueError("expected %d bytes in a bytes1, but only %d remain" %
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000487 (n, len(data)))
488
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100489bytes1 = ArgumentDescriptor(
490 name="bytes1",
Tim Petersfdb8cfa2003-01-28 00:13:19 +0000491 n=TAKEN_FROM_ARGUMENT1,
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100492 reader=read_bytes1,
493 doc="""A counted bytes string.
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000494
495 The first argument is a 1-byte unsigned int giving the number
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -0700496 of bytes, and the second argument is that many bytes.
497 """)
498
499
500def read_bytes4(f):
501 r"""
502 >>> import io
503 >>> read_bytes4(io.BytesIO(b"\x00\x00\x00\x00abc"))
504 b''
505 >>> read_bytes4(io.BytesIO(b"\x03\x00\x00\x00abcdef"))
506 b'abc'
507 >>> read_bytes4(io.BytesIO(b"\x00\x00\x00\x03abcdef"))
508 Traceback (most recent call last):
509 ...
510 ValueError: expected 50331648 bytes in a bytes4, but only 6 remain
511 """
512
513 n = read_uint4(f)
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100514 assert n >= 0
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -0700515 if n > sys.maxsize:
516 raise ValueError("bytes4 byte count > sys.maxsize: %d" % n)
517 data = f.read(n)
518 if len(data) == n:
519 return data
520 raise ValueError("expected %d bytes in a bytes4, but only %d remain" %
521 (n, len(data)))
522
523bytes4 = ArgumentDescriptor(
524 name="bytes4",
525 n=TAKEN_FROM_ARGUMENT4U,
526 reader=read_bytes4,
527 doc="""A counted bytes string.
528
529 The first argument is a 4-byte little-endian unsigned int giving
530 the number of bytes, and the second argument is that many bytes.
531 """)
532
533
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100534def read_bytes8(f):
535 r"""
Gregory P. Smith057e58d2013-11-23 20:40:46 +0000536 >>> import io, struct, sys
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100537 >>> read_bytes8(io.BytesIO(b"\x00\x00\x00\x00\x00\x00\x00\x00abc"))
538 b''
539 >>> read_bytes8(io.BytesIO(b"\x03\x00\x00\x00\x00\x00\x00\x00abcdef"))
540 b'abc'
Gregory P. Smith057e58d2013-11-23 20:40:46 +0000541 >>> bigsize8 = struct.pack("<Q", sys.maxsize//3)
542 >>> read_bytes8(io.BytesIO(bigsize8 + b"abcdef")) #doctest: +ELLIPSIS
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100543 Traceback (most recent call last):
544 ...
Gregory P. Smith057e58d2013-11-23 20:40:46 +0000545 ValueError: expected ... bytes in a bytes8, but only 6 remain
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100546 """
547
548 n = read_uint8(f)
549 assert n >= 0
550 if n > sys.maxsize:
551 raise ValueError("bytes8 byte count > sys.maxsize: %d" % n)
552 data = f.read(n)
553 if len(data) == n:
554 return data
555 raise ValueError("expected %d bytes in a bytes8, but only %d remain" %
556 (n, len(data)))
557
558bytes8 = ArgumentDescriptor(
559 name="bytes8",
560 n=TAKEN_FROM_ARGUMENT8U,
561 reader=read_bytes8,
562 doc="""A counted bytes string.
563
Martin Panter4c359642016-05-08 13:53:41 +0000564 The first argument is an 8-byte little-endian unsigned int giving
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100565 the number of bytes, and the second argument is that many bytes.
566 """)
567
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000568def read_unicodestringnl(f):
Tim Peters55762f52003-01-28 16:01:25 +0000569 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000570 >>> import io
571 >>> read_unicodestringnl(io.BytesIO(b"abc\\uabcd\njunk")) == 'abc\uabcd'
572 True
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000573 """
574
Guido van Rossum26986312007-07-17 00:19:46 +0000575 data = f.readline()
Guido van Rossum26d95c32007-08-27 23:18:54 +0000576 if not data.endswith(b'\n'):
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000577 raise ValueError("no newline found when trying to read "
578 "unicodestringnl")
579 data = data[:-1] # lose the newline
Guido van Rossumef87d6e2007-05-02 19:09:54 +0000580 return str(data, 'raw-unicode-escape')
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000581
582unicodestringnl = ArgumentDescriptor(
583 name='unicodestringnl',
584 n=UP_TO_NEWLINE,
585 reader=read_unicodestringnl,
586 doc="""A newline-terminated Unicode string.
587
588 This is raw-unicode-escape encoded, so consists of
589 printable ASCII characters, and may contain embedded
590 escape sequences.
591 """)
592
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100593
594def read_unicodestring1(f):
595 r"""
596 >>> import io
597 >>> s = 'abcd\uabcd'
598 >>> enc = s.encode('utf-8')
599 >>> enc
600 b'abcd\xea\xaf\x8d'
601 >>> n = bytes([len(enc)]) # little-endian 1-byte length
602 >>> t = read_unicodestring1(io.BytesIO(n + enc + b'junk'))
603 >>> s == t
604 True
605
606 >>> read_unicodestring1(io.BytesIO(n + enc[:-1]))
607 Traceback (most recent call last):
608 ...
609 ValueError: expected 7 bytes in a unicodestring1, but only 6 remain
610 """
611
612 n = read_uint1(f)
613 assert n >= 0
614 data = f.read(n)
615 if len(data) == n:
616 return str(data, 'utf-8', 'surrogatepass')
617 raise ValueError("expected %d bytes in a unicodestring1, but only %d "
618 "remain" % (n, len(data)))
619
620unicodestring1 = ArgumentDescriptor(
621 name="unicodestring1",
622 n=TAKEN_FROM_ARGUMENT1,
623 reader=read_unicodestring1,
624 doc="""A counted Unicode string.
625
626 The first argument is a 1-byte little-endian signed int
627 giving the number of bytes in the string, and the second
628 argument-- the UTF-8 encoding of the Unicode string --
629 contains that many bytes.
630 """)
631
632
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000633def read_unicodestring4(f):
Tim Peters55762f52003-01-28 16:01:25 +0000634 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000635 >>> import io
636 >>> s = 'abcd\uabcd'
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000637 >>> enc = s.encode('utf-8')
638 >>> enc
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000639 b'abcd\xea\xaf\x8d'
640 >>> n = bytes([len(enc), 0, 0, 0]) # little-endian 4-byte length
641 >>> t = read_unicodestring4(io.BytesIO(n + enc + b'junk'))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000642 >>> s == t
643 True
644
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000645 >>> read_unicodestring4(io.BytesIO(n + enc[:-1]))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000646 Traceback (most recent call last):
647 ...
648 ValueError: expected 7 bytes in a unicodestring4, but only 6 remain
649 """
650
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -0700651 n = read_uint4(f)
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100652 assert n >= 0
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -0700653 if n > sys.maxsize:
654 raise ValueError("unicodestring4 byte count > sys.maxsize: %d" % n)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000655 data = f.read(n)
656 if len(data) == n:
Victor Stinner485fb562010-04-13 11:07:24 +0000657 return str(data, 'utf-8', 'surrogatepass')
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000658 raise ValueError("expected %d bytes in a unicodestring4, but only %d "
659 "remain" % (n, len(data)))
660
661unicodestring4 = ArgumentDescriptor(
662 name="unicodestring4",
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -0700663 n=TAKEN_FROM_ARGUMENT4U,
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000664 reader=read_unicodestring4,
665 doc="""A counted Unicode string.
666
667 The first argument is a 4-byte little-endian signed int
668 giving the number of bytes in the string, and the second
669 argument-- the UTF-8 encoding of the Unicode string --
670 contains that many bytes.
671 """)
672
673
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100674def read_unicodestring8(f):
675 r"""
676 >>> import io
677 >>> s = 'abcd\uabcd'
678 >>> enc = s.encode('utf-8')
679 >>> enc
680 b'abcd\xea\xaf\x8d'
Serhiy Storchaka5f1a5182016-09-11 14:41:02 +0300681 >>> n = bytes([len(enc)]) + b'\0' * 7 # little-endian 8-byte length
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100682 >>> t = read_unicodestring8(io.BytesIO(n + enc + b'junk'))
683 >>> s == t
684 True
685
686 >>> read_unicodestring8(io.BytesIO(n + enc[:-1]))
687 Traceback (most recent call last):
688 ...
689 ValueError: expected 7 bytes in a unicodestring8, but only 6 remain
690 """
691
692 n = read_uint8(f)
693 assert n >= 0
694 if n > sys.maxsize:
695 raise ValueError("unicodestring8 byte count > sys.maxsize: %d" % n)
696 data = f.read(n)
697 if len(data) == n:
698 return str(data, 'utf-8', 'surrogatepass')
699 raise ValueError("expected %d bytes in a unicodestring8, but only %d "
700 "remain" % (n, len(data)))
701
702unicodestring8 = ArgumentDescriptor(
703 name="unicodestring8",
704 n=TAKEN_FROM_ARGUMENT8U,
705 reader=read_unicodestring8,
706 doc="""A counted Unicode string.
707
Martin Panter4c359642016-05-08 13:53:41 +0000708 The first argument is an 8-byte little-endian signed int
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100709 giving the number of bytes in the string, and the second
710 argument-- the UTF-8 encoding of the Unicode string --
711 contains that many bytes.
712 """)
713
714
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000715def read_decimalnl_short(f):
Tim Peters55762f52003-01-28 16:01:25 +0000716 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000717 >>> import io
718 >>> read_decimalnl_short(io.BytesIO(b"1234\n56"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000719 1234
720
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000721 >>> read_decimalnl_short(io.BytesIO(b"1234L\n56"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000722 Traceback (most recent call last):
723 ...
Serhiy Storchaka95949422013-08-27 19:40:23 +0300724 ValueError: invalid literal for int() with base 10: b'1234L'
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000725 """
726
727 s = read_stringnl(f, decode=False, stripquotes=False)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000728
Serhiy Storchaka95949422013-08-27 19:40:23 +0300729 # There's a hack for True and False here.
Jeremy Hyltona5dc3db2007-08-29 19:07:40 +0000730 if s == b"00":
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000731 return False
Jeremy Hyltona5dc3db2007-08-29 19:07:40 +0000732 elif s == b"01":
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000733 return True
734
Florent Xicluna2bb96f52011-10-23 22:11:00 +0200735 return int(s)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000736
737def read_decimalnl_long(f):
Tim Peters55762f52003-01-28 16:01:25 +0000738 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000739 >>> import io
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000740
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000741 >>> read_decimalnl_long(io.BytesIO(b"1234L\n56"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000742 1234
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000743
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000744 >>> read_decimalnl_long(io.BytesIO(b"123456789012345678901234L\n6"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000745 123456789012345678901234
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000746 """
747
748 s = read_stringnl(f, decode=False, stripquotes=False)
Mark Dickinson8dd05142009-01-20 20:43:58 +0000749 if s[-1:] == b'L':
750 s = s[:-1]
Guido van Rossume2a383d2007-01-15 16:59:06 +0000751 return int(s)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000752
753
754decimalnl_short = ArgumentDescriptor(
755 name='decimalnl_short',
756 n=UP_TO_NEWLINE,
757 reader=read_decimalnl_short,
758 doc="""A newline-terminated decimal integer literal.
759
760 This never has a trailing 'L', and the integer fit
761 in a short Python int on the box where the pickle
762 was written -- but there's no guarantee it will fit
763 in a short Python int on the box where the pickle
764 is read.
765 """)
766
767decimalnl_long = ArgumentDescriptor(
768 name='decimalnl_long',
769 n=UP_TO_NEWLINE,
770 reader=read_decimalnl_long,
771 doc="""A newline-terminated decimal integer literal.
772
773 This has a trailing 'L', and can represent integers
774 of any size.
775 """)
776
777
778def read_floatnl(f):
Tim Peters55762f52003-01-28 16:01:25 +0000779 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000780 >>> import io
781 >>> read_floatnl(io.BytesIO(b"-1.25\n6"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000782 -1.25
783 """
784 s = read_stringnl(f, decode=False, stripquotes=False)
785 return float(s)
786
787floatnl = ArgumentDescriptor(
788 name='floatnl',
789 n=UP_TO_NEWLINE,
790 reader=read_floatnl,
791 doc="""A newline-terminated decimal floating literal.
792
793 In general this requires 17 significant digits for roundtrip
794 identity, and pickling then unpickling infinities, NaNs, and
795 minus zero doesn't work across boxes, or on some boxes even
796 on itself (e.g., Windows can't read the strings it produces
797 for infinities or NaNs).
798 """)
799
800def read_float8(f):
Tim Peters55762f52003-01-28 16:01:25 +0000801 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000802 >>> import io, struct
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000803 >>> raw = struct.pack(">d", -1.25)
804 >>> raw
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000805 b'\xbf\xf4\x00\x00\x00\x00\x00\x00'
806 >>> read_float8(io.BytesIO(raw + b"\n"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000807 -1.25
808 """
809
810 data = f.read(8)
811 if len(data) == 8:
812 return _unpack(">d", data)[0]
813 raise ValueError("not enough data in stream to read float8")
814
815
816float8 = ArgumentDescriptor(
817 name='float8',
818 n=8,
819 reader=read_float8,
820 doc="""An 8-byte binary representation of a float, big-endian.
821
822 The format is unique to Python, and shared with the struct
Guido van Rossum99603b02007-07-20 00:22:32 +0000823 module (format string '>d') "in theory" (the struct and pickle
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000824 implementations don't share the code -- they should). It's
825 strongly related to the IEEE-754 double format, and, in normal
826 cases, is in fact identical to the big-endian 754 double format.
827 On other boxes the dynamic range is limited to that of a 754
828 double, and "add a half and chop" rounding is used to reduce
829 the precision to 53 bits. However, even on a 754 box,
830 infinities, NaNs, and minus zero may not be handled correctly
831 (may not survive roundtrip pickling intact).
832 """)
833
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000834# Protocol 2 formats
835
Tim Petersc0c12b52003-01-29 00:56:17 +0000836from pickle import decode_long
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000837
838def read_long1(f):
839 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000840 >>> import io
841 >>> read_long1(io.BytesIO(b"\x00"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000842 0
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000843 >>> read_long1(io.BytesIO(b"\x02\xff\x00"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000844 255
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000845 >>> read_long1(io.BytesIO(b"\x02\xff\x7f"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000846 32767
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000847 >>> read_long1(io.BytesIO(b"\x02\x00\xff"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000848 -256
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000849 >>> read_long1(io.BytesIO(b"\x02\x00\x80"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000850 -32768
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000851 """
852
853 n = read_uint1(f)
854 data = f.read(n)
855 if len(data) != n:
856 raise ValueError("not enough data in stream to read long1")
857 return decode_long(data)
858
859long1 = ArgumentDescriptor(
860 name="long1",
Tim Petersfdb8cfa2003-01-28 00:13:19 +0000861 n=TAKEN_FROM_ARGUMENT1,
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000862 reader=read_long1,
863 doc="""A binary long, little-endian, using 1-byte size.
864
865 This first reads one byte as an unsigned size, then reads that
Tim Petersbdbe7412003-01-27 23:54:04 +0000866 many bytes and interprets them as a little-endian 2's-complement long.
Tim Peters4b23f2b2003-01-31 16:43:39 +0000867 If the size is 0, that's taken as a shortcut for the long 0L.
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000868 """)
869
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000870def read_long4(f):
871 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000872 >>> import io
873 >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\xff\x00"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000874 255
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000875 >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\xff\x7f"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000876 32767
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000877 >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\x00\xff"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000878 -256
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000879 >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\x00\x80"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000880 -32768
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000881 >>> read_long1(io.BytesIO(b"\x00\x00\x00\x00"))
Guido van Rossume2b70bc2006-08-18 22:13:04 +0000882 0
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000883 """
884
885 n = read_int4(f)
886 if n < 0:
Neal Norwitz784a3f52003-01-28 00:20:41 +0000887 raise ValueError("long4 byte count < 0: %d" % n)
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000888 data = f.read(n)
889 if len(data) != n:
Neal Norwitz784a3f52003-01-28 00:20:41 +0000890 raise ValueError("not enough data in stream to read long4")
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000891 return decode_long(data)
892
893long4 = ArgumentDescriptor(
894 name="long4",
Tim Petersfdb8cfa2003-01-28 00:13:19 +0000895 n=TAKEN_FROM_ARGUMENT4,
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000896 reader=read_long4,
897 doc="""A binary representation of a long, little-endian.
898
899 This first reads four bytes as a signed size (but requires the
900 size to be >= 0), then reads that many bytes and interprets them
Tim Peters4b23f2b2003-01-31 16:43:39 +0000901 as a little-endian 2's-complement long. If the size is 0, that's taken
Guido van Rossume2a383d2007-01-15 16:59:06 +0000902 as a shortcut for the int 0, although LONG1 should really be used
Tim Peters4b23f2b2003-01-31 16:43:39 +0000903 then instead (and in any case where # of bytes < 256).
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000904 """)
905
906
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000907##############################################################################
908# Object descriptors. The stack used by the pickle machine holds objects,
909# and in the stack_before and stack_after attributes of OpcodeInfo
910# descriptors we need names to describe the various types of objects that can
911# appear on the stack.
912
913class StackObject(object):
914 __slots__ = (
915 # name of descriptor record, for info only
916 'name',
917
918 # type of object, or tuple of type objects (meaning the object can
919 # be of any type in the tuple)
920 'obtype',
921
922 # human-readable docs for this kind of stack object; a string
923 'doc',
924 )
925
926 def __init__(self, name, obtype, doc):
Guido van Rossum3172c5d2007-10-16 18:12:55 +0000927 assert isinstance(name, str)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000928 self.name = name
929
930 assert isinstance(obtype, type) or isinstance(obtype, tuple)
931 if isinstance(obtype, tuple):
932 for contained in obtype:
933 assert isinstance(contained, type)
934 self.obtype = obtype
935
Guido van Rossum3172c5d2007-10-16 18:12:55 +0000936 assert isinstance(doc, str)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000937 self.doc = doc
938
Tim Petersc1c2b3e2003-01-29 20:12:21 +0000939 def __repr__(self):
940 return self.name
941
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000942
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800943pyint = pylong = StackObject(
944 name='int',
945 obtype=int,
946 doc="A Python integer object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000947
948pyinteger_or_bool = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800949 name='int_or_bool',
950 obtype=(int, bool),
951 doc="A Python integer or boolean object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000952
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000953pybool = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800954 name='bool',
955 obtype=bool,
956 doc="A Python boolean object.")
Guido van Rossum5a2d8f52003-01-27 21:44:25 +0000957
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000958pyfloat = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800959 name='float',
960 obtype=float,
961 doc="A Python float object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000962
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800963pybytes_or_str = pystring = StackObject(
964 name='bytes_or_str',
965 obtype=(bytes, str),
966 doc="A Python bytes or (Unicode) string object.")
Guido van Rossumf4169812008-03-17 22:56:06 +0000967
968pybytes = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800969 name='bytes',
970 obtype=bytes,
971 doc="A Python bytes object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000972
973pyunicode = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800974 name='str',
975 obtype=str,
976 doc="A Python (Unicode) string object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000977
978pynone = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800979 name="None",
980 obtype=type(None),
981 doc="The Python None object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000982
983pytuple = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800984 name="tuple",
985 obtype=tuple,
986 doc="A Python tuple object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000987
988pylist = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800989 name="list",
990 obtype=list,
991 doc="A Python list object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000992
993pydict = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800994 name="dict",
995 obtype=dict,
996 doc="A Python dict object.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000997
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +0100998pyset = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -0800999 name="set",
1000 obtype=set,
1001 doc="A Python set object.")
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001002
1003pyfrozenset = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001004 name="frozenset",
1005 obtype=set,
1006 doc="A Python frozenset object.")
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001007
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001008anyobject = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001009 name='any',
1010 obtype=object,
1011 doc="Any kind of object whatsoever.")
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001012
1013markobject = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001014 name="mark",
1015 obtype=StackObject,
1016 doc="""'The mark' is a unique object.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001017
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001018Opcodes that operate on a variable number of objects
1019generally don't embed the count of objects in the opcode,
1020or pull it off the stack. Instead the MARK opcode is used
1021to push a special marker object on the stack, and then
1022some other opcodes grab all the objects from the top of
1023the stack down to (but not including) the topmost marker
1024object.
1025""")
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001026
1027stackslice = StackObject(
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001028 name="stackslice",
1029 obtype=StackObject,
1030 doc="""An object representing a contiguous slice of the stack.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001031
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001032This is used in conjunction with markobject, to represent all
1033of the stack following the topmost markobject. For example,
1034the POP_MARK opcode changes the stack from
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001035
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001036 [..., markobject, stackslice]
1037to
1038 [...]
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001039
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001040No matter how many object are on the stack after the topmost
1041markobject, POP_MARK gets rid of all of them (including the
1042topmost markobject too).
1043""")
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001044
1045##############################################################################
1046# Descriptors for pickle opcodes.
1047
1048class OpcodeInfo(object):
1049
1050 __slots__ = (
1051 # symbolic name of opcode; a string
1052 'name',
1053
1054 # the code used in a bytestream to represent the opcode; a
1055 # one-character string
1056 'code',
1057
1058 # If the opcode has an argument embedded in the byte string, an
1059 # instance of ArgumentDescriptor specifying its type. Note that
1060 # arg.reader(s) can be used to read and decode the argument from
1061 # the bytestream s, and arg.doc documents the format of the raw
1062 # argument bytes. If the opcode doesn't have an argument embedded
1063 # in the bytestream, arg should be None.
1064 'arg',
1065
1066 # what the stack looks like before this opcode runs; a list
1067 'stack_before',
1068
1069 # what the stack looks like after this opcode runs; a list
1070 'stack_after',
1071
1072 # the protocol number in which this opcode was introduced; an int
1073 'proto',
1074
1075 # human-readable docs for this opcode; a string
1076 'doc',
1077 )
1078
1079 def __init__(self, name, code, arg,
1080 stack_before, stack_after, proto, doc):
Guido van Rossum3172c5d2007-10-16 18:12:55 +00001081 assert isinstance(name, str)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001082 self.name = name
1083
Guido van Rossum3172c5d2007-10-16 18:12:55 +00001084 assert isinstance(code, str)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001085 assert len(code) == 1
1086 self.code = code
1087
1088 assert arg is None or isinstance(arg, ArgumentDescriptor)
1089 self.arg = arg
1090
1091 assert isinstance(stack_before, list)
1092 for x in stack_before:
1093 assert isinstance(x, StackObject)
1094 self.stack_before = stack_before
1095
1096 assert isinstance(stack_after, list)
1097 for x in stack_after:
1098 assert isinstance(x, StackObject)
1099 self.stack_after = stack_after
1100
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001101 assert isinstance(proto, int) and 0 <= proto <= pickle.HIGHEST_PROTOCOL
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001102 self.proto = proto
1103
Guido van Rossum3172c5d2007-10-16 18:12:55 +00001104 assert isinstance(doc, str)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001105 self.doc = doc
1106
1107I = OpcodeInfo
1108opcodes = [
1109
1110 # Ways to spell integers.
1111
1112 I(name='INT',
1113 code='I',
1114 arg=decimalnl_short,
1115 stack_before=[],
1116 stack_after=[pyinteger_or_bool],
1117 proto=0,
1118 doc="""Push an integer or bool.
1119
1120 The argument is a newline-terminated decimal literal string.
1121
1122 The intent may have been that this always fit in a short Python int,
1123 but INT can be generated in pickles written on a 64-bit box that
1124 require a Python long on a 32-bit box. The difference between this
1125 and LONG then is that INT skips a trailing 'L', and produces a short
1126 int whenever possible.
1127
1128 Another difference is due to that, when bool was introduced as a
1129 distinct type in 2.3, builtin names True and False were also added to
1130 2.2.2, mapping to ints 1 and 0. For compatibility in both directions,
1131 True gets pickled as INT + "I01\\n", and False as INT + "I00\\n".
1132 Leading zeroes are never produced for a genuine integer. The 2.3
1133 (and later) unpicklers special-case these and return bool instead;
1134 earlier unpicklers ignore the leading "0" and return the int.
1135 """),
1136
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001137 I(name='BININT',
1138 code='J',
1139 arg=int4,
1140 stack_before=[],
1141 stack_after=[pyint],
1142 proto=1,
1143 doc="""Push a four-byte signed integer.
1144
1145 This handles the full range of Python (short) integers on a 32-bit
1146 box, directly as binary bytes (1 for the opcode and 4 for the integer).
1147 If the integer is non-negative and fits in 1 or 2 bytes, pickling via
1148 BININT1 or BININT2 saves space.
1149 """),
1150
1151 I(name='BININT1',
1152 code='K',
1153 arg=uint1,
1154 stack_before=[],
1155 stack_after=[pyint],
1156 proto=1,
1157 doc="""Push a one-byte unsigned integer.
1158
1159 This is a space optimization for pickling very small non-negative ints,
1160 in range(256).
1161 """),
1162
1163 I(name='BININT2',
1164 code='M',
1165 arg=uint2,
1166 stack_before=[],
1167 stack_after=[pyint],
1168 proto=1,
1169 doc="""Push a two-byte unsigned integer.
1170
1171 This is a space optimization for pickling small positive ints, in
1172 range(256, 2**16). Integers in range(256) can also be pickled via
1173 BININT2, but BININT1 instead saves a byte.
1174 """),
1175
Tim Petersfdc03462003-01-28 04:56:33 +00001176 I(name='LONG',
1177 code='L',
1178 arg=decimalnl_long,
1179 stack_before=[],
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001180 stack_after=[pyint],
Tim Petersfdc03462003-01-28 04:56:33 +00001181 proto=0,
1182 doc="""Push a long integer.
1183
1184 The same as INT, except that the literal ends with 'L', and always
1185 unpickles to a Python long. There doesn't seem a real purpose to the
1186 trailing 'L'.
1187
1188 Note that LONG takes time quadratic in the number of digits when
1189 unpickling (this is simply due to the nature of decimal->binary
1190 conversion). Proto 2 added linear-time (in C; still quadratic-time
1191 in Python) LONG1 and LONG4 opcodes.
1192 """),
1193
1194 I(name="LONG1",
1195 code='\x8a',
1196 arg=long1,
1197 stack_before=[],
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001198 stack_after=[pyint],
Tim Petersfdc03462003-01-28 04:56:33 +00001199 proto=2,
1200 doc="""Long integer using one-byte length.
1201
1202 A more efficient encoding of a Python long; the long1 encoding
1203 says it all."""),
1204
1205 I(name="LONG4",
1206 code='\x8b',
1207 arg=long4,
1208 stack_before=[],
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001209 stack_after=[pyint],
Tim Petersfdc03462003-01-28 04:56:33 +00001210 proto=2,
1211 doc="""Long integer using found-byte length.
1212
1213 A more efficient encoding of a Python long; the long4 encoding
1214 says it all."""),
1215
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001216 # Ways to spell strings (8-bit, not Unicode).
1217
1218 I(name='STRING',
1219 code='S',
1220 arg=stringnl,
1221 stack_before=[],
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001222 stack_after=[pybytes_or_str],
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001223 proto=0,
1224 doc="""Push a Python string object.
1225
1226 The argument is a repr-style string, with bracketing quote characters,
1227 and perhaps embedded escapes. The argument extends until the next
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001228 newline character. These are usually decoded into a str instance
Guido van Rossumf4169812008-03-17 22:56:06 +00001229 using the encoding given to the Unpickler constructor. or the default,
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001230 'ASCII'. If the encoding given was 'bytes' however, they will be
1231 decoded as bytes object instead.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001232 """),
1233
1234 I(name='BINSTRING',
1235 code='T',
1236 arg=string4,
1237 stack_before=[],
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001238 stack_after=[pybytes_or_str],
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001239 proto=1,
1240 doc="""Push a Python string object.
1241
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001242 There are two arguments: the first is a 4-byte little-endian
1243 signed int giving the number of bytes in the string, and the
1244 second is that many bytes, which are taken literally as the string
1245 content. These are usually decoded into a str instance using the
1246 encoding given to the Unpickler constructor. or the default,
1247 'ASCII'. If the encoding given was 'bytes' however, they will be
1248 decoded as bytes object instead.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001249 """),
1250
1251 I(name='SHORT_BINSTRING',
1252 code='U',
1253 arg=string1,
1254 stack_before=[],
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001255 stack_after=[pybytes_or_str],
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001256 proto=1,
1257 doc="""Push a Python string object.
1258
Alexandre Vassalottid05c9ff2013-12-07 01:09:27 -08001259 There are two arguments: the first is a 1-byte unsigned int giving
1260 the number of bytes in the string, and the second is that many
1261 bytes, which are taken literally as the string content. These are
1262 usually decoded into a str instance using the encoding given to
1263 the Unpickler constructor. or the default, 'ASCII'. If the
1264 encoding given was 'bytes' however, they will be decoded as bytes
1265 object instead.
Guido van Rossumf4169812008-03-17 22:56:06 +00001266 """),
1267
1268 # Bytes (protocol 3 only; older protocols don't support bytes at all)
1269
1270 I(name='BINBYTES',
1271 code='B',
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001272 arg=bytes4,
Guido van Rossumf4169812008-03-17 22:56:06 +00001273 stack_before=[],
1274 stack_after=[pybytes],
1275 proto=3,
1276 doc="""Push a Python bytes object.
1277
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001278 There are two arguments: the first is a 4-byte little-endian unsigned int
1279 giving the number of bytes, and the second is that many bytes, which are
1280 taken literally as the bytes content.
Guido van Rossumf4169812008-03-17 22:56:06 +00001281 """),
1282
1283 I(name='SHORT_BINBYTES',
1284 code='C',
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001285 arg=bytes1,
Guido van Rossumf4169812008-03-17 22:56:06 +00001286 stack_before=[],
1287 stack_after=[pybytes],
Collin Wintere61d4372009-05-20 17:46:47 +00001288 proto=3,
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001289 doc="""Push a Python bytes object.
Guido van Rossumf4169812008-03-17 22:56:06 +00001290
1291 There are two arguments: the first is a 1-byte unsigned int giving
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001292 the number of bytes, and the second is that many bytes, which are taken
1293 literally as the string content.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001294 """),
1295
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001296 I(name='BINBYTES8',
1297 code='\x8e',
1298 arg=bytes8,
1299 stack_before=[],
1300 stack_after=[pybytes],
1301 proto=4,
1302 doc="""Push a Python bytes object.
1303
Martin Panter4c359642016-05-08 13:53:41 +00001304 There are two arguments: the first is an 8-byte unsigned int giving
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001305 the number of bytes in the string, and the second is that many bytes,
1306 which are taken literally as the string content.
1307 """),
1308
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001309 # Ways to spell None.
1310
1311 I(name='NONE',
1312 code='N',
1313 arg=None,
1314 stack_before=[],
1315 stack_after=[pynone],
1316 proto=0,
1317 doc="Push None on the stack."),
1318
Tim Petersfdc03462003-01-28 04:56:33 +00001319 # Ways to spell bools, starting with proto 2. See INT for how this was
1320 # done before proto 2.
1321
1322 I(name='NEWTRUE',
1323 code='\x88',
1324 arg=None,
1325 stack_before=[],
1326 stack_after=[pybool],
1327 proto=2,
1328 doc="""True.
1329
1330 Push True onto the stack."""),
1331
1332 I(name='NEWFALSE',
1333 code='\x89',
1334 arg=None,
1335 stack_before=[],
1336 stack_after=[pybool],
1337 proto=2,
1338 doc="""True.
1339
1340 Push False onto the stack."""),
1341
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001342 # Ways to spell Unicode strings.
1343
1344 I(name='UNICODE',
1345 code='V',
1346 arg=unicodestringnl,
1347 stack_before=[],
1348 stack_after=[pyunicode],
1349 proto=0, # this may be pure-text, but it's a later addition
1350 doc="""Push a Python Unicode string object.
1351
1352 The argument is a raw-unicode-escape encoding of a Unicode string,
1353 and so may contain embedded escape sequences. The argument extends
1354 until the next newline character.
1355 """),
1356
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001357 I(name='SHORT_BINUNICODE',
1358 code='\x8c',
1359 arg=unicodestring1,
1360 stack_before=[],
1361 stack_after=[pyunicode],
1362 proto=4,
1363 doc="""Push a Python Unicode string object.
1364
1365 There are two arguments: the first is a 1-byte little-endian signed int
1366 giving the number of bytes in the string. The second is that many
1367 bytes, and is the UTF-8 encoding of the Unicode string.
1368 """),
1369
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001370 I(name='BINUNICODE',
1371 code='X',
1372 arg=unicodestring4,
1373 stack_before=[],
1374 stack_after=[pyunicode],
1375 proto=1,
1376 doc="""Push a Python Unicode string object.
1377
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001378 There are two arguments: the first is a 4-byte little-endian unsigned int
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001379 giving the number of bytes in the string. The second is that many
1380 bytes, and is the UTF-8 encoding of the Unicode string.
1381 """),
1382
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001383 I(name='BINUNICODE8',
1384 code='\x8d',
1385 arg=unicodestring8,
1386 stack_before=[],
1387 stack_after=[pyunicode],
1388 proto=4,
1389 doc="""Push a Python Unicode string object.
1390
Martin Panter4c359642016-05-08 13:53:41 +00001391 There are two arguments: the first is an 8-byte little-endian signed int
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001392 giving the number of bytes in the string. The second is that many
1393 bytes, and is the UTF-8 encoding of the Unicode string.
1394 """),
1395
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001396 # Ways to spell floats.
1397
1398 I(name='FLOAT',
1399 code='F',
1400 arg=floatnl,
1401 stack_before=[],
1402 stack_after=[pyfloat],
1403 proto=0,
1404 doc="""Newline-terminated decimal float literal.
1405
1406 The argument is repr(a_float), and in general requires 17 significant
1407 digits for roundtrip conversion to be an identity (this is so for
1408 IEEE-754 double precision values, which is what Python float maps to
1409 on most boxes).
1410
1411 In general, FLOAT cannot be used to transport infinities, NaNs, or
1412 minus zero across boxes (or even on a single box, if the platform C
1413 library can't read the strings it produces for such things -- Windows
1414 is like that), but may do less damage than BINFLOAT on boxes with
1415 greater precision or dynamic range than IEEE-754 double.
1416 """),
1417
1418 I(name='BINFLOAT',
1419 code='G',
1420 arg=float8,
1421 stack_before=[],
1422 stack_after=[pyfloat],
1423 proto=1,
1424 doc="""Float stored in binary form, with 8 bytes of data.
1425
1426 This generally requires less than half the space of FLOAT encoding.
1427 In general, BINFLOAT cannot be used to transport infinities, NaNs, or
1428 minus zero, raises an exception if the exponent exceeds the range of
1429 an IEEE-754 double, and retains no more than 53 bits of precision (if
1430 there are more than that, "add a half and chop" rounding is used to
1431 cut it back to 53 significant bits).
1432 """),
1433
1434 # Ways to build lists.
1435
1436 I(name='EMPTY_LIST',
1437 code=']',
1438 arg=None,
1439 stack_before=[],
1440 stack_after=[pylist],
1441 proto=1,
1442 doc="Push an empty list."),
1443
1444 I(name='APPEND',
1445 code='a',
1446 arg=None,
1447 stack_before=[pylist, anyobject],
1448 stack_after=[pylist],
1449 proto=0,
1450 doc="""Append an object to a list.
1451
1452 Stack before: ... pylist anyobject
1453 Stack after: ... pylist+[anyobject]
Tim Peters81098ac2003-01-28 05:12:08 +00001454
1455 although pylist is really extended in-place.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001456 """),
1457
1458 I(name='APPENDS',
1459 code='e',
1460 arg=None,
1461 stack_before=[pylist, markobject, stackslice],
1462 stack_after=[pylist],
1463 proto=1,
1464 doc="""Extend a list by a slice of stack objects.
1465
1466 Stack before: ... pylist markobject stackslice
1467 Stack after: ... pylist+stackslice
Tim Peters81098ac2003-01-28 05:12:08 +00001468
1469 although pylist is really extended in-place.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001470 """),
1471
1472 I(name='LIST',
1473 code='l',
1474 arg=None,
1475 stack_before=[markobject, stackslice],
1476 stack_after=[pylist],
1477 proto=0,
1478 doc="""Build a list out of the topmost stack slice, after markobject.
1479
1480 All the stack entries following the topmost markobject are placed into
1481 a single Python list, which single list object replaces all of the
1482 stack from the topmost markobject onward. For example,
1483
1484 Stack before: ... markobject 1 2 3 'abc'
1485 Stack after: ... [1, 2, 3, 'abc']
1486 """),
1487
1488 # Ways to build tuples.
1489
1490 I(name='EMPTY_TUPLE',
1491 code=')',
1492 arg=None,
1493 stack_before=[],
1494 stack_after=[pytuple],
1495 proto=1,
1496 doc="Push an empty tuple."),
1497
1498 I(name='TUPLE',
1499 code='t',
1500 arg=None,
1501 stack_before=[markobject, stackslice],
1502 stack_after=[pytuple],
1503 proto=0,
1504 doc="""Build a tuple out of the topmost stack slice, after markobject.
1505
1506 All the stack entries following the topmost markobject are placed into
1507 a single Python tuple, which single tuple object replaces all of the
1508 stack from the topmost markobject onward. For example,
1509
1510 Stack before: ... markobject 1 2 3 'abc'
1511 Stack after: ... (1, 2, 3, 'abc')
1512 """),
1513
Tim Petersfdc03462003-01-28 04:56:33 +00001514 I(name='TUPLE1',
1515 code='\x85',
1516 arg=None,
1517 stack_before=[anyobject],
1518 stack_after=[pytuple],
1519 proto=2,
Alexander Belopolsky44c2ffd2010-07-16 14:39:45 +00001520 doc="""Build a one-tuple out of the topmost item on the stack.
Tim Petersfdc03462003-01-28 04:56:33 +00001521
1522 This code pops one value off the stack and pushes a tuple of
Alexander Belopolsky44c2ffd2010-07-16 14:39:45 +00001523 length 1 whose one item is that value back onto it. In other
1524 words:
Tim Petersfdc03462003-01-28 04:56:33 +00001525
1526 stack[-1] = tuple(stack[-1:])
1527 """),
1528
1529 I(name='TUPLE2',
1530 code='\x86',
1531 arg=None,
1532 stack_before=[anyobject, anyobject],
1533 stack_after=[pytuple],
1534 proto=2,
Alexander Belopolsky44c2ffd2010-07-16 14:39:45 +00001535 doc="""Build a two-tuple out of the top two items on the stack.
Tim Petersfdc03462003-01-28 04:56:33 +00001536
Alexander Belopolsky44c2ffd2010-07-16 14:39:45 +00001537 This code pops two values off the stack and pushes a tuple of
1538 length 2 whose items are those values back onto it. In other
1539 words:
Tim Petersfdc03462003-01-28 04:56:33 +00001540
1541 stack[-2:] = [tuple(stack[-2:])]
1542 """),
1543
1544 I(name='TUPLE3',
1545 code='\x87',
1546 arg=None,
1547 stack_before=[anyobject, anyobject, anyobject],
1548 stack_after=[pytuple],
1549 proto=2,
Alexander Belopolsky44c2ffd2010-07-16 14:39:45 +00001550 doc="""Build a three-tuple out of the top three items on the stack.
Tim Petersfdc03462003-01-28 04:56:33 +00001551
Alexander Belopolsky44c2ffd2010-07-16 14:39:45 +00001552 This code pops three values off the stack and pushes a tuple of
1553 length 3 whose items are those values back onto it. In other
1554 words:
Tim Petersfdc03462003-01-28 04:56:33 +00001555
1556 stack[-3:] = [tuple(stack[-3:])]
1557 """),
1558
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001559 # Ways to build dicts.
1560
1561 I(name='EMPTY_DICT',
1562 code='}',
1563 arg=None,
1564 stack_before=[],
1565 stack_after=[pydict],
1566 proto=1,
1567 doc="Push an empty dict."),
1568
1569 I(name='DICT',
1570 code='d',
1571 arg=None,
1572 stack_before=[markobject, stackslice],
1573 stack_after=[pydict],
1574 proto=0,
1575 doc="""Build a dict out of the topmost stack slice, after markobject.
1576
1577 All the stack entries following the topmost markobject are placed into
1578 a single Python dict, which single dict object replaces all of the
1579 stack from the topmost markobject onward. The stack slice alternates
1580 key, value, key, value, .... For example,
1581
1582 Stack before: ... markobject 1 2 3 'abc'
1583 Stack after: ... {1: 2, 3: 'abc'}
1584 """),
1585
1586 I(name='SETITEM',
1587 code='s',
1588 arg=None,
1589 stack_before=[pydict, anyobject, anyobject],
1590 stack_after=[pydict],
1591 proto=0,
1592 doc="""Add a key+value pair to an existing dict.
1593
1594 Stack before: ... pydict key value
1595 Stack after: ... pydict
1596
1597 where pydict has been modified via pydict[key] = value.
1598 """),
1599
1600 I(name='SETITEMS',
1601 code='u',
1602 arg=None,
1603 stack_before=[pydict, markobject, stackslice],
1604 stack_after=[pydict],
1605 proto=1,
1606 doc="""Add an arbitrary number of key+value pairs to an existing dict.
1607
1608 The slice of the stack following the topmost markobject is taken as
1609 an alternating sequence of keys and values, added to the dict
1610 immediately under the topmost markobject. Everything at and after the
1611 topmost markobject is popped, leaving the mutated dict at the top
1612 of the stack.
1613
1614 Stack before: ... pydict markobject key_1 value_1 ... key_n value_n
1615 Stack after: ... pydict
1616
1617 where pydict has been modified via pydict[key_i] = value_i for i in
1618 1, 2, ..., n, and in that order.
1619 """),
1620
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001621 # Ways to build sets
1622
1623 I(name='EMPTY_SET',
1624 code='\x8f',
1625 arg=None,
1626 stack_before=[],
1627 stack_after=[pyset],
1628 proto=4,
1629 doc="Push an empty set."),
1630
1631 I(name='ADDITEMS',
1632 code='\x90',
1633 arg=None,
1634 stack_before=[pyset, markobject, stackslice],
1635 stack_after=[pyset],
1636 proto=4,
1637 doc="""Add an arbitrary number of items to an existing set.
1638
1639 The slice of the stack following the topmost markobject is taken as
1640 a sequence of items, added to the set immediately under the topmost
1641 markobject. Everything at and after the topmost markobject is popped,
1642 leaving the mutated set at the top of the stack.
1643
1644 Stack before: ... pyset markobject item_1 ... item_n
1645 Stack after: ... pyset
1646
1647 where pyset has been modified via pyset.add(item_i) = item_i for i in
1648 1, 2, ..., n, and in that order.
1649 """),
1650
1651 # Way to build frozensets
1652
1653 I(name='FROZENSET',
1654 code='\x91',
1655 arg=None,
1656 stack_before=[markobject, stackslice],
1657 stack_after=[pyfrozenset],
1658 proto=4,
1659 doc="""Build a frozenset out of the topmost slice, after markobject.
1660
1661 All the stack entries following the topmost markobject are placed into
1662 a single Python frozenset, which single frozenset object replaces all
1663 of the stack from the topmost markobject onward. For example,
1664
1665 Stack before: ... markobject 1 2 3
1666 Stack after: ... frozenset({1, 2, 3})
1667 """),
1668
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001669 # Stack manipulation.
1670
1671 I(name='POP',
1672 code='0',
1673 arg=None,
1674 stack_before=[anyobject],
1675 stack_after=[],
1676 proto=0,
1677 doc="Discard the top stack item, shrinking the stack by one item."),
1678
1679 I(name='DUP',
1680 code='2',
1681 arg=None,
1682 stack_before=[anyobject],
1683 stack_after=[anyobject, anyobject],
1684 proto=0,
1685 doc="Push the top stack item onto the stack again, duplicating it."),
1686
1687 I(name='MARK',
1688 code='(',
1689 arg=None,
1690 stack_before=[],
1691 stack_after=[markobject],
1692 proto=0,
1693 doc="""Push markobject onto the stack.
1694
1695 markobject is a unique object, used by other opcodes to identify a
1696 region of the stack containing a variable number of objects for them
1697 to work on. See markobject.doc for more detail.
1698 """),
1699
1700 I(name='POP_MARK',
1701 code='1',
1702 arg=None,
1703 stack_before=[markobject, stackslice],
1704 stack_after=[],
Collin Wintere61d4372009-05-20 17:46:47 +00001705 proto=1,
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001706 doc="""Pop all the stack objects at and above the topmost markobject.
1707
1708 When an opcode using a variable number of stack objects is done,
1709 POP_MARK is used to remove those objects, and to remove the markobject
1710 that delimited their starting position on the stack.
1711 """),
1712
1713 # Memo manipulation. There are really only two operations (get and put),
1714 # each in all-text, "short binary", and "long binary" flavors.
1715
1716 I(name='GET',
1717 code='g',
1718 arg=decimalnl_short,
1719 stack_before=[],
1720 stack_after=[anyobject],
1721 proto=0,
1722 doc="""Read an object from the memo and push it on the stack.
1723
Ezio Melotti13925002011-03-16 11:05:33 +02001724 The index of the memo object to push is given by the newline-terminated
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001725 decimal string following. BINGET and LONG_BINGET are space-optimized
1726 versions.
1727 """),
1728
1729 I(name='BINGET',
1730 code='h',
1731 arg=uint1,
1732 stack_before=[],
1733 stack_after=[anyobject],
1734 proto=1,
1735 doc="""Read an object from the memo and push it on the stack.
1736
1737 The index of the memo object to push is given by the 1-byte unsigned
1738 integer following.
1739 """),
1740
1741 I(name='LONG_BINGET',
1742 code='j',
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001743 arg=uint4,
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001744 stack_before=[],
1745 stack_after=[anyobject],
1746 proto=1,
1747 doc="""Read an object from the memo and push it on the stack.
1748
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001749 The index of the memo object to push is given by the 4-byte unsigned
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001750 little-endian integer following.
1751 """),
1752
1753 I(name='PUT',
1754 code='p',
1755 arg=decimalnl_short,
1756 stack_before=[],
1757 stack_after=[],
1758 proto=0,
1759 doc="""Store the stack top into the memo. The stack is not popped.
1760
1761 The index of the memo location to write into is given by the newline-
1762 terminated decimal string following. BINPUT and LONG_BINPUT are
1763 space-optimized versions.
1764 """),
1765
1766 I(name='BINPUT',
1767 code='q',
1768 arg=uint1,
1769 stack_before=[],
1770 stack_after=[],
1771 proto=1,
1772 doc="""Store the stack top into the memo. The stack is not popped.
1773
1774 The index of the memo location to write into is given by the 1-byte
1775 unsigned integer following.
1776 """),
1777
1778 I(name='LONG_BINPUT',
1779 code='r',
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001780 arg=uint4,
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001781 stack_before=[],
1782 stack_after=[],
1783 proto=1,
1784 doc="""Store the stack top into the memo. The stack is not popped.
1785
1786 The index of the memo location to write into is given by the 4-byte
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001787 unsigned little-endian integer following.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001788 """),
1789
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001790 I(name='MEMOIZE',
1791 code='\x94',
1792 arg=None,
1793 stack_before=[anyobject],
1794 stack_after=[anyobject],
1795 proto=4,
1796 doc="""Store the stack top into the memo. The stack is not popped.
1797
1798 The index of the memo location to write is the number of
1799 elements currently present in the memo.
1800 """),
1801
Tim Petersfdc03462003-01-28 04:56:33 +00001802 # Access the extension registry (predefined objects). Akin to the GET
1803 # family.
1804
1805 I(name='EXT1',
1806 code='\x82',
1807 arg=uint1,
1808 stack_before=[],
1809 stack_after=[anyobject],
1810 proto=2,
1811 doc="""Extension code.
1812
1813 This code and the similar EXT2 and EXT4 allow using a registry
1814 of popular objects that are pickled by name, typically classes.
1815 It is envisioned that through a global negotiation and
1816 registration process, third parties can set up a mapping between
1817 ints and object names.
1818
1819 In order to guarantee pickle interchangeability, the extension
1820 code registry ought to be global, although a range of codes may
1821 be reserved for private use.
1822
1823 EXT1 has a 1-byte integer argument. This is used to index into the
1824 extension registry, and the object at that index is pushed on the stack.
1825 """),
1826
1827 I(name='EXT2',
1828 code='\x83',
1829 arg=uint2,
1830 stack_before=[],
1831 stack_after=[anyobject],
1832 proto=2,
1833 doc="""Extension code.
1834
1835 See EXT1. EXT2 has a two-byte integer argument.
1836 """),
1837
1838 I(name='EXT4',
1839 code='\x84',
1840 arg=int4,
1841 stack_before=[],
1842 stack_after=[anyobject],
1843 proto=2,
1844 doc="""Extension code.
1845
1846 See EXT1. EXT4 has a four-byte integer argument.
1847 """),
1848
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001849 # Push a class object, or module function, on the stack, via its module
1850 # and name.
1851
1852 I(name='GLOBAL',
1853 code='c',
1854 arg=stringnl_noescape_pair,
1855 stack_before=[],
1856 stack_after=[anyobject],
1857 proto=0,
1858 doc="""Push a global object (module.attr) on the stack.
1859
1860 Two newline-terminated strings follow the GLOBAL opcode. The first is
1861 taken as a module name, and the second as a class name. The class
1862 object module.class is pushed on the stack. More accurately, the
1863 object returned by self.find_class(module, class) is pushed on the
1864 stack, so unpickling subclasses can override this form of lookup.
1865 """),
1866
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001867 I(name='STACK_GLOBAL',
1868 code='\x93',
1869 arg=None,
1870 stack_before=[pyunicode, pyunicode],
1871 stack_after=[anyobject],
Serhiy Storchaka5805dde2015-10-13 21:12:32 +03001872 proto=4,
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001873 doc="""Push a global object (module.attr) on the stack.
1874 """),
1875
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001876 # Ways to build objects of classes pickle doesn't know about directly
1877 # (user-defined classes). I despair of documenting this accurately
1878 # and comprehensibly -- you really have to read the pickle code to
1879 # find all the special cases.
1880
1881 I(name='REDUCE',
1882 code='R',
1883 arg=None,
1884 stack_before=[anyobject, anyobject],
1885 stack_after=[anyobject],
1886 proto=0,
1887 doc="""Push an object built from a callable and an argument tuple.
1888
1889 The opcode is named to remind of the __reduce__() method.
1890
1891 Stack before: ... callable pytuple
1892 Stack after: ... callable(*pytuple)
1893
1894 The callable and the argument tuple are the first two items returned
1895 by a __reduce__ method. Applying the callable to the argtuple is
1896 supposed to reproduce the original object, or at least get it started.
1897 If the __reduce__ method returns a 3-tuple, the last component is an
1898 argument to be passed to the object's __setstate__, and then the REDUCE
1899 opcode is followed by code to create setstate's argument, and then a
1900 BUILD opcode to apply __setstate__ to that argument.
1901
Guido van Rossum13257902007-06-07 23:15:56 +00001902 If not isinstance(callable, type), REDUCE complains unless the
Alexandre Vassalottif7fa63d2008-05-11 08:55:36 +00001903 callable has been registered with the copyreg module's
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001904 safe_constructors dict, or the callable has a magic
1905 '__safe_for_unpickling__' attribute with a true value. I'm not sure
1906 why it does this, but I've sure seen this complaint often enough when
1907 I didn't want to <wink>.
1908 """),
1909
1910 I(name='BUILD',
1911 code='b',
1912 arg=None,
1913 stack_before=[anyobject, anyobject],
1914 stack_after=[anyobject],
1915 proto=0,
1916 doc="""Finish building an object, via __setstate__ or dict update.
1917
1918 Stack before: ... anyobject argument
1919 Stack after: ... anyobject
1920
1921 where anyobject may have been mutated, as follows:
1922
1923 If the object has a __setstate__ method,
1924
1925 anyobject.__setstate__(argument)
1926
1927 is called.
1928
1929 Else the argument must be a dict, the object must have a __dict__, and
1930 the object is updated via
1931
1932 anyobject.__dict__.update(argument)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001933 """),
1934
1935 I(name='INST',
1936 code='i',
1937 arg=stringnl_noescape_pair,
1938 stack_before=[markobject, stackslice],
1939 stack_after=[anyobject],
1940 proto=0,
1941 doc="""Build a class instance.
1942
1943 This is the protocol 0 version of protocol 1's OBJ opcode.
1944 INST is followed by two newline-terminated strings, giving a
1945 module and class name, just as for the GLOBAL opcode (and see
1946 GLOBAL for more details about that). self.find_class(module, name)
1947 is used to get a class object.
1948
1949 In addition, all the objects on the stack following the topmost
1950 markobject are gathered into a tuple and popped (along with the
1951 topmost markobject), just as for the TUPLE opcode.
1952
1953 Now it gets complicated. If all of these are true:
1954
1955 + The argtuple is empty (markobject was at the top of the stack
1956 at the start).
1957
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001958 + The class object does not have a __getinitargs__ attribute.
1959
1960 then we want to create an old-style class instance without invoking
1961 its __init__() method (pickle has waffled on this over the years; not
1962 calling __init__() is current wisdom). In this case, an instance of
1963 an old-style dummy class is created, and then we try to rebind its
1964 __class__ attribute to the desired class object. If this succeeds,
Guido van Rossuma8add0e2007-05-14 22:03:55 +00001965 the new instance object is pushed on the stack, and we're done.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001966
1967 Else (the argtuple is not empty, it's not an old-style class object,
1968 or the class object does have a __getinitargs__ attribute), the code
1969 first insists that the class object have a __safe_for_unpickling__
1970 attribute. Unlike as for the __safe_for_unpickling__ check in REDUCE,
1971 it doesn't matter whether this attribute has a true or false value, it
Guido van Rossum99603b02007-07-20 00:22:32 +00001972 only matters whether it exists (XXX this is a bug). If
1973 __safe_for_unpickling__ doesn't exist, UnpicklingError is raised.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001974
1975 Else (the class object does have a __safe_for_unpickling__ attr),
1976 the class object obtained from INST's arguments is applied to the
1977 argtuple obtained from the stack, and the resulting instance object
1978 is pushed on the stack.
Tim Peters2b93c4c2003-01-30 16:35:08 +00001979
1980 NOTE: checks for __safe_for_unpickling__ went away in Python 2.3.
Florent Xiclunaaa6c1d22011-12-12 18:54:29 +01001981 NOTE: the distinction between old-style and new-style classes does
1982 not make sense in Python 3.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001983 """),
1984
1985 I(name='OBJ',
1986 code='o',
1987 arg=None,
1988 stack_before=[markobject, anyobject, stackslice],
1989 stack_after=[anyobject],
1990 proto=1,
1991 doc="""Build a class instance.
1992
1993 This is the protocol 1 version of protocol 0's INST opcode, and is
1994 very much like it. The major difference is that the class object
1995 is taken off the stack, allowing it to be retrieved from the memo
1996 repeatedly if several instances of the same class are created. This
1997 can be much more efficient (in both time and space) than repeatedly
1998 embedding the module and class names in INST opcodes.
1999
2000 Unlike INST, OBJ takes no arguments from the opcode stream. Instead
2001 the class object is taken off the stack, immediately above the
2002 topmost markobject:
2003
2004 Stack before: ... markobject classobject stackslice
2005 Stack after: ... new_instance_object
2006
2007 As for INST, the remainder of the stack above the markobject is
2008 gathered into an argument tuple, and then the logic seems identical,
Guido van Rossumecb11042003-01-29 06:24:30 +00002009 except that no __safe_for_unpickling__ check is done (XXX this is
Guido van Rossum99603b02007-07-20 00:22:32 +00002010 a bug). See INST for the gory details.
Tim Peters2b93c4c2003-01-30 16:35:08 +00002011
2012 NOTE: In Python 2.3, INST and OBJ are identical except for how they
2013 get the class object. That was always the intent; the implementations
2014 had diverged for accidental reasons.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002015 """),
2016
Tim Petersfdc03462003-01-28 04:56:33 +00002017 I(name='NEWOBJ',
2018 code='\x81',
2019 arg=None,
2020 stack_before=[anyobject, anyobject],
2021 stack_after=[anyobject],
2022 proto=2,
2023 doc="""Build an object instance.
2024
2025 The stack before should be thought of as containing a class
2026 object followed by an argument tuple (the tuple being the stack
2027 top). Call these cls and args. They are popped off the stack,
2028 and the value returned by cls.__new__(cls, *args) is pushed back
2029 onto the stack.
2030 """),
2031
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002032 I(name='NEWOBJ_EX',
2033 code='\x92',
2034 arg=None,
2035 stack_before=[anyobject, anyobject, anyobject],
2036 stack_after=[anyobject],
2037 proto=4,
2038 doc="""Build an object instance.
2039
2040 The stack before should be thought of as containing a class
2041 object followed by an argument tuple and by a keyword argument dict
2042 (the dict being the stack top). Call these cls and args. They are
2043 popped off the stack, and the value returned by
2044 cls.__new__(cls, *args, *kwargs) is pushed back onto the stack.
2045 """),
2046
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002047 # Machine control.
2048
Tim Petersfdc03462003-01-28 04:56:33 +00002049 I(name='PROTO',
2050 code='\x80',
2051 arg=uint1,
2052 stack_before=[],
2053 stack_after=[],
2054 proto=2,
2055 doc="""Protocol version indicator.
2056
2057 For protocol 2 and above, a pickle must start with this opcode.
2058 The argument is the protocol version, an int in range(2, 256).
2059 """),
2060
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002061 I(name='STOP',
2062 code='.',
2063 arg=None,
2064 stack_before=[anyobject],
2065 stack_after=[],
2066 proto=0,
2067 doc="""Stop the unpickling machine.
2068
2069 Every pickle ends with this opcode. The object at the top of the stack
2070 is popped, and that's the result of unpickling. The stack should be
2071 empty then.
2072 """),
2073
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002074 # Framing support.
2075
2076 I(name='FRAME',
2077 code='\x95',
2078 arg=uint8,
2079 stack_before=[],
2080 stack_after=[],
2081 proto=4,
2082 doc="""Indicate the beginning of a new frame.
2083
2084 The unpickler may use this opcode to safely prefetch data from its
2085 underlying stream.
2086 """),
2087
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002088 # Ways to deal with persistent IDs.
2089
2090 I(name='PERSID',
2091 code='P',
2092 arg=stringnl_noescape,
2093 stack_before=[],
2094 stack_after=[anyobject],
2095 proto=0,
2096 doc="""Push an object identified by a persistent ID.
2097
2098 The pickle module doesn't define what a persistent ID means. PERSID's
2099 argument is a newline-terminated str-style (no embedded escapes, no
2100 bracketing quote characters) string, which *is* "the persistent ID".
2101 The unpickler passes this string to self.persistent_load(). Whatever
2102 object that returns is pushed on the stack. There is no implementation
2103 of persistent_load() in Python's unpickler: it must be supplied by an
2104 unpickler subclass.
2105 """),
2106
2107 I(name='BINPERSID',
2108 code='Q',
2109 arg=None,
2110 stack_before=[anyobject],
2111 stack_after=[anyobject],
2112 proto=1,
2113 doc="""Push an object identified by a persistent ID.
2114
2115 Like PERSID, except the persistent ID is popped off the stack (instead
2116 of being a string embedded in the opcode bytestream). The persistent
2117 ID is passed to self.persistent_load(), and whatever object that
2118 returns is pushed on the stack. See PERSID for more detail.
2119 """),
2120]
2121del I
2122
2123# Verify uniqueness of .name and .code members.
2124name2i = {}
2125code2i = {}
2126
2127for i, d in enumerate(opcodes):
2128 if d.name in name2i:
2129 raise ValueError("repeated name %r at indices %d and %d" %
2130 (d.name, name2i[d.name], i))
2131 if d.code in code2i:
2132 raise ValueError("repeated code %r at indices %d and %d" %
2133 (d.code, code2i[d.code], i))
2134
2135 name2i[d.name] = i
2136 code2i[d.code] = i
2137
2138del name2i, code2i, i, d
2139
2140##############################################################################
2141# Build a code2op dict, mapping opcode characters to OpcodeInfo records.
2142# Also ensure we've got the same stuff as pickle.py, although the
2143# introspection here is dicey.
2144
2145code2op = {}
2146for d in opcodes:
2147 code2op[d.code] = d
2148del d
2149
2150def assure_pickle_consistency(verbose=False):
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002151
2152 copy = code2op.copy()
2153 for name in pickle.__all__:
2154 if not re.match("[A-Z][A-Z0-9_]+$", name):
2155 if verbose:
Guido van Rossumbe19ed72007-02-09 05:37:30 +00002156 print("skipping %r: it doesn't look like an opcode name" % name)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002157 continue
2158 picklecode = getattr(pickle, name)
Guido van Rossum617dbc42007-05-07 23:57:08 +00002159 if not isinstance(picklecode, bytes) or len(picklecode) != 1:
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002160 if verbose:
Guido van Rossumbe19ed72007-02-09 05:37:30 +00002161 print(("skipping %r: value %r doesn't look like a pickle "
2162 "code" % (name, picklecode)))
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002163 continue
Guido van Rossum617dbc42007-05-07 23:57:08 +00002164 picklecode = picklecode.decode("latin-1")
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002165 if picklecode in copy:
2166 if verbose:
Guido van Rossumbe19ed72007-02-09 05:37:30 +00002167 print("checking name %r w/ code %r for consistency" % (
2168 name, picklecode))
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002169 d = copy[picklecode]
2170 if d.name != name:
2171 raise ValueError("for pickle code %r, pickle.py uses name %r "
2172 "but we're using name %r" % (picklecode,
2173 name,
2174 d.name))
2175 # Forget this one. Any left over in copy at the end are a problem
2176 # of a different kind.
2177 del copy[picklecode]
2178 else:
2179 raise ValueError("pickle.py appears to have a pickle opcode with "
2180 "name %r and code %r, but we don't" %
2181 (name, picklecode))
2182 if copy:
2183 msg = ["we appear to have pickle opcodes that pickle.py doesn't have:"]
2184 for code, d in copy.items():
2185 msg.append(" name %r with code %r" % (d.name, code))
2186 raise ValueError("\n".join(msg))
2187
2188assure_pickle_consistency()
Tim Petersc0c12b52003-01-29 00:56:17 +00002189del assure_pickle_consistency
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002190
2191##############################################################################
2192# A pickle opcode generator.
2193
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002194def _genops(data, yield_end_pos=False):
2195 if isinstance(data, bytes_types):
2196 data = io.BytesIO(data)
2197
2198 if hasattr(data, "tell"):
2199 getpos = data.tell
2200 else:
2201 getpos = lambda: None
2202
2203 while True:
2204 pos = getpos()
2205 code = data.read(1)
2206 opcode = code2op.get(code.decode("latin-1"))
2207 if opcode is None:
2208 if code == b"":
2209 raise ValueError("pickle exhausted before seeing STOP")
2210 else:
2211 raise ValueError("at position %s, opcode %r unknown" % (
2212 "<unknown>" if pos is None else pos,
2213 code))
2214 if opcode.arg is None:
2215 arg = None
2216 else:
2217 arg = opcode.arg.reader(data)
2218 if yield_end_pos:
2219 yield opcode, arg, pos, getpos()
2220 else:
2221 yield opcode, arg, pos
2222 if code == b'.':
2223 assert opcode.name == 'STOP'
2224 break
2225
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002226def genops(pickle):
Guido van Rossuma72ded92003-01-27 19:40:47 +00002227 """Generate all the opcodes in a pickle.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002228
2229 'pickle' is a file-like object, or string, containing the pickle.
2230
2231 Each opcode in the pickle is generated, from the current pickle position,
2232 stopping after a STOP opcode is delivered. A triple is generated for
2233 each opcode:
2234
2235 opcode, arg, pos
2236
2237 opcode is an OpcodeInfo record, describing the current opcode.
2238
2239 If the opcode has an argument embedded in the pickle, arg is its decoded
2240 value, as a Python object. If the opcode doesn't have an argument, arg
2241 is None.
2242
2243 If the pickle has a tell() method, pos was the value of pickle.tell()
Guido van Rossum34d19282007-08-09 01:03:29 +00002244 before reading the current opcode. If the pickle is a bytes object,
2245 it's wrapped in a BytesIO object, and the latter's tell() result is
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002246 used. Else (the pickle doesn't have a tell(), and it's not obvious how
2247 to query its current position) pos is None.
2248 """
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002249 return _genops(pickle)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002250
2251##############################################################################
Christian Heimes3feef612008-02-11 06:19:17 +00002252# A pickle optimizer.
2253
2254def optimize(p):
2255 'Optimize a pickle string by removing unused PUT opcodes'
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002256 put = 'PUT'
2257 get = 'GET'
2258 oldids = set() # set of all PUT ids
2259 newids = {} # set of ids used by a GET opcode
2260 opcodes = [] # (op, idx) or (pos, end_pos)
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002261 proto = 0
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002262 protoheader = b''
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002263 for opcode, arg, pos, end_pos in _genops(p, yield_end_pos=True):
Christian Heimes3feef612008-02-11 06:19:17 +00002264 if 'PUT' in opcode.name:
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002265 oldids.add(arg)
2266 opcodes.append((put, arg))
2267 elif opcode.name == 'MEMOIZE':
2268 idx = len(oldids)
2269 oldids.add(idx)
2270 opcodes.append((put, idx))
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002271 elif 'FRAME' in opcode.name:
2272 pass
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002273 elif 'GET' in opcode.name:
2274 if opcode.proto > proto:
2275 proto = opcode.proto
2276 newids[arg] = None
2277 opcodes.append((get, arg))
2278 elif opcode.name == 'PROTO':
2279 if arg > proto:
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002280 proto = arg
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002281 if pos == 0:
Olivier Grisel3cd7c6e2018-01-06 16:18:54 +01002282 protoheader = p[pos:end_pos]
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002283 else:
2284 opcodes.append((pos, end_pos))
2285 else:
2286 opcodes.append((pos, end_pos))
2287 del oldids
Christian Heimes3feef612008-02-11 06:19:17 +00002288
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002289 # Copy the opcodes except for PUTS without a corresponding GET
2290 out = io.BytesIO()
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002291 # Write the PROTO header before any framing
2292 out.write(protoheader)
2293 pickler = pickle._Pickler(out, proto)
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002294 if proto >= 4:
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002295 pickler.framer.start_framing()
2296 idx = 0
2297 for op, arg in opcodes:
Olivier Grisel3cd7c6e2018-01-06 16:18:54 +01002298 frameless = False
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002299 if op is put:
2300 if arg not in newids:
2301 continue
2302 data = pickler.put(idx)
2303 newids[arg] = idx
2304 idx += 1
2305 elif op is get:
2306 data = pickler.get(newids[arg])
2307 else:
2308 data = p[op:arg]
Olivier Grisel3cd7c6e2018-01-06 16:18:54 +01002309 frameless = len(data) > pickler.framer._FRAME_SIZE_TARGET
2310 pickler.framer.commit_frame(force=frameless)
2311 if frameless:
2312 pickler.framer.file_write(data)
2313 else:
2314 pickler.write(data)
Serhiy Storchaka05dadcf2014-12-16 18:00:56 +02002315 pickler.framer.end_framing()
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002316 return out.getvalue()
Christian Heimes3feef612008-02-11 06:19:17 +00002317
2318##############################################################################
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002319# A symbolic pickle disassembler.
2320
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002321def dis(pickle, out=None, memo=None, indentlevel=4, annotate=0):
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002322 """Produce a symbolic disassembly of a pickle.
2323
2324 'pickle' is a file-like object, or string, containing a (at least one)
2325 pickle. The pickle is disassembled from the current position, through
2326 the first STOP opcode encountered.
2327
2328 Optional arg 'out' is a file-like object to which the disassembly is
2329 printed. It defaults to sys.stdout.
2330
Tim Peters62235e72003-02-05 19:55:53 +00002331 Optional arg 'memo' is a Python dict, used as the pickle's memo. It
2332 may be mutated by dis(), if the pickle contains PUT or BINPUT opcodes.
2333 Passing the same memo object to another dis() call then allows disassembly
2334 to proceed across multiple pickles that were all created by the same
2335 pickler with the same memo. Ordinarily you don't need to worry about this.
2336
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002337 Optional arg 'indentlevel' is the number of blanks by which to indent
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002338 a new MARK level. It defaults to 4.
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002339
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002340 Optional arg 'annotate' if nonzero instructs dis() to add short
2341 description of the opcode on each line of disassembled output.
2342 The value given to 'annotate' must be an integer and is used as a
2343 hint for the column where annotation should start. The default
2344 value is 0, meaning no annotations.
2345
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002346 In addition to printing the disassembly, some sanity checks are made:
2347
2348 + All embedded opcode arguments "make sense".
2349
2350 + Explicit and implicit pop operations have enough items on the stack.
2351
2352 + When an opcode implicitly refers to a markobject, a markobject is
2353 actually on the stack.
2354
2355 + A memo entry isn't referenced before it's defined.
2356
2357 + The markobject isn't stored in the memo.
2358
2359 + A memo entry isn't redefined.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002360 """
2361
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002362 # Most of the hair here is for sanity checks, but most of it is needed
2363 # anyway to detect when a protocol 0 POP takes a MARK off the stack
2364 # (which in turn is needed to indent MARK blocks correctly).
2365
2366 stack = [] # crude emulation of unpickler stack
Tim Peters62235e72003-02-05 19:55:53 +00002367 if memo is None:
Ezio Melotti30b9d5d2013-08-17 15:50:46 +03002368 memo = {} # crude emulation of unpickler memo
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002369 maxproto = -1 # max protocol number seen
2370 markstack = [] # bytecode positions of MARK opcodes
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002371 indentchunk = ' ' * indentlevel
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002372 errormsg = None
Ezio Melotti30b9d5d2013-08-17 15:50:46 +03002373 annocol = annotate # column hint for annotations
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002374 for opcode, arg, pos in genops(pickle):
2375 if pos is not None:
Guido van Rossumbe19ed72007-02-09 05:37:30 +00002376 print("%5d:" % pos, end=' ', file=out)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002377
Tim Petersd0f7c862003-01-28 15:27:57 +00002378 line = "%-4s %s%s" % (repr(opcode.code)[1:-1],
2379 indentchunk * len(markstack),
2380 opcode.name)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002381
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002382 maxproto = max(maxproto, opcode.proto)
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002383 before = opcode.stack_before # don't mutate
2384 after = opcode.stack_after # don't mutate
Tim Peters43277d62003-01-30 15:02:12 +00002385 numtopop = len(before)
2386
2387 # See whether a MARK should be popped.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002388 markmsg = None
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002389 if markobject in before or (opcode.name == "POP" and
2390 stack and
2391 stack[-1] is markobject):
2392 assert markobject not in after
Tim Peters43277d62003-01-30 15:02:12 +00002393 if __debug__:
2394 if markobject in before:
2395 assert before[-1] is stackslice
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002396 if markstack:
2397 markpos = markstack.pop()
2398 if markpos is None:
2399 markmsg = "(MARK at unknown opcode offset)"
2400 else:
2401 markmsg = "(MARK at %d)" % markpos
2402 # Pop everything at and after the topmost markobject.
2403 while stack[-1] is not markobject:
2404 stack.pop()
2405 stack.pop()
Tim Peters43277d62003-01-30 15:02:12 +00002406 # Stop later code from popping too much.
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002407 try:
Tim Peters43277d62003-01-30 15:02:12 +00002408 numtopop = before.index(markobject)
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002409 except ValueError:
2410 assert opcode.name == "POP"
Tim Peters43277d62003-01-30 15:02:12 +00002411 numtopop = 0
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002412 else:
2413 errormsg = markmsg = "no MARK exists on stack"
2414
2415 # Check for correct memo usage.
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002416 if opcode.name in ("PUT", "BINPUT", "LONG_BINPUT", "MEMOIZE"):
2417 if opcode.name == "MEMOIZE":
2418 memo_idx = len(memo)
Serhiy Storchakadbc517c2015-10-13 21:20:14 +03002419 markmsg = "(as %d)" % memo_idx
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002420 else:
2421 assert arg is not None
2422 memo_idx = arg
2423 if memo_idx in memo:
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002424 errormsg = "memo key %r already defined" % arg
2425 elif not stack:
2426 errormsg = "stack is empty -- can't store into memo"
2427 elif stack[-1] is markobject:
2428 errormsg = "can't store markobject in the memo"
2429 else:
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002430 memo[memo_idx] = stack[-1]
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002431 elif opcode.name in ("GET", "BINGET", "LONG_BINGET"):
2432 if arg in memo:
2433 assert len(after) == 1
2434 after = [memo[arg]] # for better stack emulation
2435 else:
2436 errormsg = "memo key %r has never been stored into" % arg
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002437
2438 if arg is not None or markmsg:
2439 # make a mild effort to align arguments
2440 line += ' ' * (10 - len(opcode.name))
2441 if arg is not None:
2442 line += ' ' + repr(arg)
2443 if markmsg:
2444 line += ' ' + markmsg
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002445 if annotate:
2446 line += ' ' * (annocol - len(line))
2447 # make a mild effort to align annotations
2448 annocol = len(line)
2449 if annocol > 50:
2450 annocol = annotate
2451 line += ' ' + opcode.doc.split('\n', 1)[0]
Guido van Rossumbe19ed72007-02-09 05:37:30 +00002452 print(line, file=out)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002453
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002454 if errormsg:
2455 # Note that we delayed complaining until the offending opcode
2456 # was printed.
2457 raise ValueError(errormsg)
2458
2459 # Emulate the stack effects.
Tim Peters43277d62003-01-30 15:02:12 +00002460 if len(stack) < numtopop:
2461 raise ValueError("tries to pop %d items from stack with "
2462 "only %d items" % (numtopop, len(stack)))
2463 if numtopop:
2464 del stack[-numtopop:]
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002465 if markobject in after:
Tim Peters43277d62003-01-30 15:02:12 +00002466 assert markobject not in before
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002467 markstack.append(pos)
2468
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002469 stack.extend(after)
2470
Guido van Rossumbe19ed72007-02-09 05:37:30 +00002471 print("highest protocol among opcodes =", maxproto, file=out)
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002472 if stack:
2473 raise ValueError("stack not empty after STOP: %r" % stack)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002474
Tim Peters90718a42005-02-15 16:22:34 +00002475# For use in the doctest, simply as an example of a class to pickle.
2476class _Example:
2477 def __init__(self, value):
2478 self.value = value
2479
Guido van Rossum03e35322003-01-28 15:37:13 +00002480_dis_test = r"""
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002481>>> import pickle
Guido van Rossumf4169812008-03-17 22:56:06 +00002482>>> x = [1, 2, (3, 4), {b'abc': "def"}]
2483>>> pkl0 = pickle.dumps(x, 0)
2484>>> dis(pkl0)
Tim Petersd0f7c862003-01-28 15:27:57 +00002485 0: ( MARK
2486 1: l LIST (MARK at 0)
2487 2: p PUT 0
Serhiy Storchaka3daaafb2017-11-16 09:44:43 +02002488 5: I INT 1
2489 8: a APPEND
2490 9: I INT 2
2491 12: a APPEND
2492 13: ( MARK
2493 14: I INT 3
2494 17: I INT 4
2495 20: t TUPLE (MARK at 13)
2496 21: p PUT 1
2497 24: a APPEND
2498 25: ( MARK
2499 26: d DICT (MARK at 25)
2500 27: p PUT 2
2501 30: c GLOBAL '_codecs encode'
2502 46: p PUT 3
2503 49: ( MARK
2504 50: V UNICODE 'abc'
2505 55: p PUT 4
2506 58: V UNICODE 'latin1'
2507 66: p PUT 5
2508 69: t TUPLE (MARK at 49)
2509 70: p PUT 6
2510 73: R REDUCE
2511 74: p PUT 7
2512 77: V UNICODE 'def'
2513 82: p PUT 8
2514 85: s SETITEM
2515 86: a APPEND
2516 87: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002517highest protocol among opcodes = 0
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002518
2519Try again with a "binary" pickle.
2520
Guido van Rossumf4169812008-03-17 22:56:06 +00002521>>> pkl1 = pickle.dumps(x, 1)
2522>>> dis(pkl1)
Tim Petersd0f7c862003-01-28 15:27:57 +00002523 0: ] EMPTY_LIST
2524 1: q BINPUT 0
2525 3: ( MARK
2526 4: K BININT1 1
2527 6: K BININT1 2
2528 8: ( MARK
2529 9: K BININT1 3
2530 11: K BININT1 4
2531 13: t TUPLE (MARK at 8)
2532 14: q BINPUT 1
2533 16: } EMPTY_DICT
2534 17: q BINPUT 2
Alexandre Vassalotti3bfc65a2011-12-13 13:08:09 -05002535 19: c GLOBAL '_codecs encode'
2536 35: q BINPUT 3
2537 37: ( MARK
2538 38: X BINUNICODE 'abc'
2539 46: q BINPUT 4
2540 48: X BINUNICODE 'latin1'
2541 59: q BINPUT 5
2542 61: t TUPLE (MARK at 37)
2543 62: q BINPUT 6
2544 64: R REDUCE
2545 65: q BINPUT 7
2546 67: X BINUNICODE 'def'
2547 75: q BINPUT 8
2548 77: s SETITEM
2549 78: e APPENDS (MARK at 3)
2550 79: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002551highest protocol among opcodes = 1
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002552
2553Exercise the INST/OBJ/BUILD family.
2554
Mark Dickinsoncddcf442009-01-24 21:46:33 +00002555>>> import pickletools
2556>>> dis(pickle.dumps(pickletools.dis, 0))
2557 0: c GLOBAL 'pickletools dis'
2558 17: p PUT 0
2559 20: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002560highest protocol among opcodes = 0
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002561
Tim Peters90718a42005-02-15 16:22:34 +00002562>>> from pickletools import _Example
2563>>> x = [_Example(42)] * 2
Guido van Rossumf29d3d62003-01-27 22:47:53 +00002564>>> dis(pickle.dumps(x, 0))
Tim Petersd0f7c862003-01-28 15:27:57 +00002565 0: ( MARK
2566 1: l LIST (MARK at 0)
2567 2: p PUT 0
Antoine Pitroud9dfaa92009-06-04 20:32:06 +00002568 5: c GLOBAL 'copy_reg _reconstructor'
2569 30: p PUT 1
2570 33: ( MARK
2571 34: c GLOBAL 'pickletools _Example'
2572 56: p PUT 2
2573 59: c GLOBAL '__builtin__ object'
2574 79: p PUT 3
2575 82: N NONE
2576 83: t TUPLE (MARK at 33)
2577 84: p PUT 4
2578 87: R REDUCE
2579 88: p PUT 5
2580 91: ( MARK
2581 92: d DICT (MARK at 91)
2582 93: p PUT 6
2583 96: V UNICODE 'value'
2584 103: p PUT 7
Serhiy Storchaka3daaafb2017-11-16 09:44:43 +02002585 106: I INT 42
2586 110: s SETITEM
2587 111: b BUILD
2588 112: a APPEND
2589 113: g GET 5
2590 116: a APPEND
2591 117: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002592highest protocol among opcodes = 0
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002593
2594>>> dis(pickle.dumps(x, 1))
Tim Petersd0f7c862003-01-28 15:27:57 +00002595 0: ] EMPTY_LIST
2596 1: q BINPUT 0
2597 3: ( MARK
Antoine Pitroud9dfaa92009-06-04 20:32:06 +00002598 4: c GLOBAL 'copy_reg _reconstructor'
2599 29: q BINPUT 1
2600 31: ( MARK
2601 32: c GLOBAL 'pickletools _Example'
2602 54: q BINPUT 2
2603 56: c GLOBAL '__builtin__ object'
2604 76: q BINPUT 3
2605 78: N NONE
2606 79: t TUPLE (MARK at 31)
2607 80: q BINPUT 4
2608 82: R REDUCE
2609 83: q BINPUT 5
2610 85: } EMPTY_DICT
2611 86: q BINPUT 6
2612 88: X BINUNICODE 'value'
2613 98: q BINPUT 7
2614 100: K BININT1 42
2615 102: s SETITEM
2616 103: b BUILD
2617 104: h BINGET 5
2618 106: e APPENDS (MARK at 3)
2619 107: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002620highest protocol among opcodes = 1
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002621
2622Try "the canonical" recursive-object test.
2623
2624>>> L = []
2625>>> T = L,
2626>>> L.append(T)
2627>>> L[0] is T
2628True
2629>>> T[0] is L
2630True
2631>>> L[0][0] is L
2632True
2633>>> T[0][0] is T
2634True
Guido van Rossumf29d3d62003-01-27 22:47:53 +00002635>>> dis(pickle.dumps(L, 0))
Tim Petersd0f7c862003-01-28 15:27:57 +00002636 0: ( MARK
2637 1: l LIST (MARK at 0)
2638 2: p PUT 0
2639 5: ( MARK
2640 6: g GET 0
2641 9: t TUPLE (MARK at 5)
2642 10: p PUT 1
2643 13: a APPEND
2644 14: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002645highest protocol among opcodes = 0
2646
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002647>>> dis(pickle.dumps(L, 1))
Tim Petersd0f7c862003-01-28 15:27:57 +00002648 0: ] EMPTY_LIST
2649 1: q BINPUT 0
2650 3: ( MARK
2651 4: h BINGET 0
2652 6: t TUPLE (MARK at 3)
2653 7: q BINPUT 1
2654 9: a APPEND
2655 10: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002656highest protocol among opcodes = 1
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002657
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002658Note that, in the protocol 0 pickle of the recursive tuple, the disassembler
2659has to emulate the stack in order to realize that the POP opcode at 16 gets
2660rid of the MARK at 0.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002661
Guido van Rossumf29d3d62003-01-27 22:47:53 +00002662>>> dis(pickle.dumps(T, 0))
Tim Petersd0f7c862003-01-28 15:27:57 +00002663 0: ( MARK
2664 1: ( MARK
2665 2: l LIST (MARK at 1)
2666 3: p PUT 0
2667 6: ( MARK
2668 7: g GET 0
2669 10: t TUPLE (MARK at 6)
2670 11: p PUT 1
2671 14: a APPEND
2672 15: 0 POP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002673 16: 0 POP (MARK at 0)
2674 17: g GET 1
2675 20: . STOP
2676highest protocol among opcodes = 0
2677
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002678>>> dis(pickle.dumps(T, 1))
Tim Petersd0f7c862003-01-28 15:27:57 +00002679 0: ( MARK
2680 1: ] EMPTY_LIST
2681 2: q BINPUT 0
2682 4: ( MARK
2683 5: h BINGET 0
2684 7: t TUPLE (MARK at 4)
2685 8: q BINPUT 1
2686 10: a APPEND
2687 11: 1 POP_MARK (MARK at 0)
2688 12: h BINGET 1
2689 14: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002690highest protocol among opcodes = 1
Tim Petersd0f7c862003-01-28 15:27:57 +00002691
2692Try protocol 2.
2693
2694>>> dis(pickle.dumps(L, 2))
2695 0: \x80 PROTO 2
2696 2: ] EMPTY_LIST
2697 3: q BINPUT 0
2698 5: h BINGET 0
2699 7: \x85 TUPLE1
2700 8: q BINPUT 1
2701 10: a APPEND
2702 11: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002703highest protocol among opcodes = 2
Tim Petersd0f7c862003-01-28 15:27:57 +00002704
2705>>> dis(pickle.dumps(T, 2))
2706 0: \x80 PROTO 2
2707 2: ] EMPTY_LIST
2708 3: q BINPUT 0
2709 5: h BINGET 0
2710 7: \x85 TUPLE1
2711 8: q BINPUT 1
2712 10: a APPEND
2713 11: 0 POP
2714 12: h BINGET 1
2715 14: . STOP
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002716highest protocol among opcodes = 2
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002717
2718Try protocol 3 with annotations:
2719
2720>>> dis(pickle.dumps(T, 3), annotate=1)
2721 0: \x80 PROTO 3 Protocol version indicator.
2722 2: ] EMPTY_LIST Push an empty list.
2723 3: q BINPUT 0 Store the stack top into the memo. The stack is not popped.
2724 5: h BINGET 0 Read an object from the memo and push it on the stack.
2725 7: \x85 TUPLE1 Build a one-tuple out of the topmost item on the stack.
2726 8: q BINPUT 1 Store the stack top into the memo. The stack is not popped.
2727 10: a APPEND Append an object to a list.
2728 11: 0 POP Discard the top stack item, shrinking the stack by one item.
2729 12: h BINGET 1 Read an object from the memo and push it on the stack.
2730 14: . STOP Stop the unpickling machine.
2731highest protocol among opcodes = 2
2732
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002733"""
2734
Tim Peters62235e72003-02-05 19:55:53 +00002735_memo_test = r"""
2736>>> import pickle
Guido van Rossumcfe5f202007-05-08 21:26:54 +00002737>>> import io
2738>>> f = io.BytesIO()
Tim Peters62235e72003-02-05 19:55:53 +00002739>>> p = pickle.Pickler(f, 2)
2740>>> x = [1, 2, 3]
2741>>> p.dump(x)
2742>>> p.dump(x)
2743>>> f.seek(0)
Guido van Rossumcfe5f202007-05-08 21:26:54 +000027440
Tim Peters62235e72003-02-05 19:55:53 +00002745>>> memo = {}
2746>>> dis(f, memo=memo)
2747 0: \x80 PROTO 2
2748 2: ] EMPTY_LIST
2749 3: q BINPUT 0
2750 5: ( MARK
2751 6: K BININT1 1
2752 8: K BININT1 2
2753 10: K BININT1 3
2754 12: e APPENDS (MARK at 5)
2755 13: . STOP
2756highest protocol among opcodes = 2
2757>>> dis(f, memo=memo)
2758 14: \x80 PROTO 2
2759 16: h BINGET 0
2760 18: . STOP
2761highest protocol among opcodes = 2
2762"""
2763
Guido van Rossum57028352003-01-28 15:09:10 +00002764__test__ = {'disassembler_test': _dis_test,
Tim Peters62235e72003-02-05 19:55:53 +00002765 'disassembler_memo_test': _memo_test,
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002766 }
2767
2768def _test():
2769 import doctest
2770 return doctest.testmod()
2771
2772if __name__ == "__main__":
Benjamin Peterson669ff662015-10-28 23:15:13 -07002773 import argparse
Alexander Belopolsky60c762b2010-07-03 20:35:53 +00002774 parser = argparse.ArgumentParser(
2775 description='disassemble one or more pickle files')
2776 parser.add_argument(
2777 'pickle_file', type=argparse.FileType('br'),
2778 nargs='*', help='the pickle file')
2779 parser.add_argument(
2780 '-o', '--output', default=sys.stdout, type=argparse.FileType('w'),
2781 help='the file where the output should be written')
2782 parser.add_argument(
2783 '-m', '--memo', action='store_true',
2784 help='preserve memo between disassemblies')
2785 parser.add_argument(
2786 '-l', '--indentlevel', default=4, type=int,
2787 help='the number of blanks by which to indent a new MARK level')
2788 parser.add_argument(
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002789 '-a', '--annotate', action='store_true',
2790 help='annotate each line with a short opcode description')
2791 parser.add_argument(
Alexander Belopolsky60c762b2010-07-03 20:35:53 +00002792 '-p', '--preamble', default="==> {name} <==",
2793 help='if more than one pickle file is specified, print this before'
2794 ' each disassembly')
2795 parser.add_argument(
2796 '-t', '--test', action='store_true',
2797 help='run self-test suite')
2798 parser.add_argument(
2799 '-v', action='store_true',
2800 help='run verbosely; only affects self-test run')
2801 args = parser.parse_args()
2802 if args.test:
2803 _test()
2804 else:
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002805 annotate = 30 if args.annotate else 0
Alexander Belopolsky60c762b2010-07-03 20:35:53 +00002806 if not args.pickle_file:
2807 parser.print_help()
2808 elif len(args.pickle_file) == 1:
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002809 dis(args.pickle_file[0], args.output, None,
2810 args.indentlevel, annotate)
Alexander Belopolsky60c762b2010-07-03 20:35:53 +00002811 else:
2812 memo = {} if args.memo else None
2813 for f in args.pickle_file:
2814 preamble = args.preamble.format(name=f.name)
2815 args.output.write(preamble + '\n')
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002816 dis(f, args.output, memo, args.indentlevel, annotate)