'''"Executable documentation" for the pickle module.

Extensive comments about the pickle protocols and pickle-machine opcodes
can be found here. Some functions meant for external use:

genops(pickle)
   Generate all the opcodes in a pickle, as (opcode, arg, position) triples.

dis(pickle, out=None, memo=None, indentlevel=4)
   Print a symbolic disassembly of a pickle.
'''

import codecs
import io
import pickle
import re
import sys

__all__ = ['dis', 'genops', 'optimize']

bytes_types = pickle.bytes_types

# Other ideas:
#
# - A pickle verifier: read a pickle and check it exhaustively for
#   well-formedness. dis() does a lot of this already.
#
# - A protocol identifier: examine a pickle and return its protocol number
#   (== the highest .proto attr value among all the opcodes in the pickle).
#   dis() already prints this info at the end.
#
# - A pickle optimizer: for example, tuple-building code is sometimes more
#   elaborate than necessary, catering for the possibility that the tuple
#   is recursive. Or lots of times a PUT is generated that's never accessed
#   by a later GET.


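# A minimal sketch of the "protocol identifier" idea, assuming this module is
# importable as pickletools (as in the standard library). The helper name
# highest_protocol is hypothetical and is not part of this module.

```python
import pickle
import pickletools


def highest_protocol(data):
    """Return the highest .proto value among the opcodes in a pickle."""
    return max(opcode.proto for opcode, arg, pos in pickletools.genops(data))


# A protocol 0 pickle uses only protocol-0 opcodes, while a protocol 2 pickle
# opens with PROTO, which is itself a protocol-2 opcode.
sample = [1, 2, (3, 4)]
print(highest_protocol(pickle.dumps(sample, protocol=0)))
print(highest_protocol(pickle.dumps(sample, protocol=2)))
```
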
# "A pickle" is a program for a virtual pickle machine (PM, but more accurately
# called an unpickling machine). It's a sequence of opcodes, interpreted by the
# PM, building an arbitrarily complex Python object.
#
# For the most part, the PM is very simple: there are no looping, testing, or
# conditional instructions, no arithmetic and no function calls. Opcodes are
# executed once each, from first to last, until a STOP opcode is reached.
#
# The PM has two data areas, "the stack" and "the memo".
#
# Many opcodes push Python objects onto the stack; e.g., INT pushes a Python
# integer object on the stack, whose value is gotten from a decimal string
# literal immediately following the INT opcode in the pickle bytestream. Other
# opcodes take Python objects off the stack. The result of unpickling is
# whatever object is left on the stack when the final STOP opcode is executed.
#
# The memo is simply an array of objects, or it can be implemented as a dict
# mapping little integers to objects. The memo serves as the PM's "long term
# memory", and the little integers indexing the memo are akin to variable
# names. Some opcodes store the object on top of the stack into the memo at
# a given index (without popping it), and others push a memo object at a
# given index onto the stack again.
#
# At heart, that's all the PM has. Subtleties arise for these reasons:
#
# + Object identity. Objects can be arbitrarily complex, and subobjects
#   may be shared (for example, the list [a, a] refers to the same object a
#   twice). It can be vital that unpickling recreate an isomorphic object
#   graph, faithfully reproducing sharing.
#
# + Recursive objects. For example, after "L = []; L.append(L)", L is a
#   list, and L[0] is the same list. This is related to the object identity
#   point, and some sequences of pickle opcodes are subtle in order to
#   get the right result in all cases.
#
# + Things pickle doesn't know everything about. Examples of things pickle
#   does know everything about are Python's builtin scalar and container
#   types, like ints and tuples. They generally have opcodes dedicated to
#   them. For things like module references and instances of user-defined
#   classes, pickle's knowledge is limited. Historically, many enhancements
#   have been made to the pickle protocol in order to do a better (faster,
#   and/or more compact) job on those.
#
# + Backward compatibility and micro-optimization. As explained below,
#   pickle opcodes never go away, not even when better ways to do a thing
#   get invented. The repertoire of the PM just keeps growing over time.
#   For example, protocol 0 had two opcodes for building Python integers (INT
#   and LONG), protocol 1 added three more for more-efficient pickling of short
#   integers, and protocol 2 added two more for more-efficient pickling of
#   long integers (before protocol 2, the only ways to pickle a Python long
#   took time quadratic in the number of digits, for both pickling and
#   unpickling). "Opcode bloat" isn't so much a subtlety as a source of
#   wearying complication.
#
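# The object identity and recursion points can be checked with a short
# round trip. This is an illustrative sketch using only the public pickle
# API; the variable names are made up for the example.

```python
import pickle

shared = ["payload"]
pair = [shared, shared]            # the same object referenced twice
clone = pickle.loads(pickle.dumps(pair))
print(clone[0] is clone[1])        # sharing is reproduced in the copy

loop = []
loop.append(loop)                  # loop[0] is loop itself
loop_clone = pickle.loads(pickle.dumps(loop))
print(loop_clone[0] is loop_clone)
```
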
#
# Pickle protocols:
#
# For compatibility, the meaning of a pickle opcode never changes. Instead new
# pickle opcodes get added, and each version's unpickler can handle all the
# pickle opcodes in all protocol versions to date. So old pickles continue to
# be readable forever. The pickler can generally be told to restrict itself to
# the subset of opcodes available under previous protocol versions too, so that
# users can create pickles under the current version readable by older
# versions. However, a pickle does not contain its version number embedded
# within it. If an older unpickler tries to read a pickle using a later
# protocol, the result is most likely an exception due to seeing an unknown (in
# the older unpickler) opcode.
#
# The original pickle used what's now called "protocol 0", and what was called
# "text mode" before Python 2.3. The entire pickle bytestream is made up of
# printable 7-bit ASCII characters, plus the newline character, in protocol 0.
# That's why it was called text mode. Protocol 0 is small and elegant, but
# sometimes painfully inefficient.
#
# The second major set of additions is now called "protocol 1", and was called
# "binary mode" before Python 2.3. This added many opcodes with arguments
# consisting of arbitrary bytes, including NUL bytes and unprintable "high bit"
# bytes. Binary mode pickles can be substantially smaller than equivalent
# text mode pickles, and sometimes faster too; e.g., BININT represents a 4-byte
# int as 4 bytes following the opcode, which is cheaper to unpickle than the
# (perhaps) 11-character decimal string attached to INT. Protocol 1 also added
# a number of opcodes that operate on many stack elements at once (like APPENDS
# and SETITEMS), and "shortcut" opcodes (like EMPTY_DICT and EMPTY_TUPLE).
#
# The third major set of additions came in Python 2.3, and is called "protocol
# 2". This added:
#
# - A better way to pickle instances of new-style classes (NEWOBJ).
#
# - A way for a pickle to identify its protocol (PROTO).
#
# - Time- and space- efficient pickling of long ints (LONG{1,4}).
#
# - Shortcuts for small tuples (TUPLE{1,2,3}).
#
# - Dedicated opcodes for bools (NEWTRUE, NEWFALSE).
#
# - The "extension registry", a vector of popular objects that can be pushed
#   efficiently by index (EXT{1,2,4}). This is akin to the memo and GET, but
#   the registry contents are predefined (there's nothing akin to the memo's
#   PUT).
#
# Another independent change with Python 2.3 is the abandonment of any
# pretense that it might be safe to load pickles received from untrusted
# parties -- no sufficient security analysis has been done to guarantee
# this and there isn't a use case that warrants the expense of such an
# analysis.
#
# To this end, all tests for __safe_for_unpickling__ or for
# copyreg.safe_constructors are removed from the unpickling code.
# References to these variables in the descriptions below are to be seen
# as describing unpickling in Python 2.2 and before.


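# The protocol differences described above can be observed from the outside.
# A small sketch using only pickle.dumps(); the sample data and variable
# names are arbitrary choices for illustration.

```python
import pickle

data = list(range(1000))
text_pickle = pickle.dumps(data, protocol=0)
binary_pickle = pickle.dumps(data, protocol=1)

# Protocol 0 emits only 7-bit ASCII bytes.
print(all(byte < 128 for byte in text_pickle))
# Protocol 1 stores these small ints in compact binary form.
print(len(binary_pickle) < len(text_pickle))
# From protocol 2 on, a pickle opens with the PROTO opcode (0x80) followed
# by the protocol number.
print(pickle.dumps(data, protocol=2)[:2] == b"\x80\x02")
```
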
# Meta-rule: Descriptions are stored in instances of descriptor objects,
# with plain constructors. No meta-language is defined from which
# descriptors could be constructed. If you want, e.g., XML, write a little
# program to generate XML from the objects.

##############################################################################
# Some pickle opcodes have an argument, following the opcode in the
# bytestream. An argument is of a specific type, described by an instance
# of ArgumentDescriptor. These are not to be confused with arguments taken
# off the stack -- ArgumentDescriptor applies only to arguments embedded in
# the opcode stream, immediately following an opcode.

# Represents the number of bytes consumed by an argument delimited by the
# next newline character.
UP_TO_NEWLINE = -1

# Represents the number of bytes consumed by a two-argument opcode where
# the first argument gives the number of bytes in the second argument.
TAKEN_FROM_ARGUMENT1  = -2   # num bytes is 1-byte unsigned int
TAKEN_FROM_ARGUMENT4  = -3   # num bytes is 4-byte signed little-endian int
TAKEN_FROM_ARGUMENT4U = -4   # num bytes is 4-byte unsigned little-endian int
TAKEN_FROM_ARGUMENT8U = -5   # num bytes is 8-byte unsigned little-endian int

class ArgumentDescriptor(object):
    __slots__ = (
        # name of descriptor record, also a module global name; a string
        'name',

        # length of argument, in bytes; an int; UP_TO_NEWLINE and
        # TAKEN_FROM_ARGUMENT{1,4,4U,8U} are negative values for
        # variable-length cases
        'n',

        # a function taking a file-like object, reading this kind of argument
        # from the object at the current position, advancing the current
        # position by n bytes, and returning the value of the argument
        'reader',

        # human-readable docs for this arg descriptor; a string
        'doc',
    )

    def __init__(self, name, n, reader, doc):
        assert isinstance(name, str)
        self.name = name

        assert isinstance(n, int) and (n >= 0 or
                                       n in (UP_TO_NEWLINE,
                                             TAKEN_FROM_ARGUMENT1,
                                             TAKEN_FROM_ARGUMENT4,
                                             TAKEN_FROM_ARGUMENT4U,
                                             TAKEN_FROM_ARGUMENT8U))
        self.n = n

        self.reader = reader

        assert isinstance(doc, str)
        self.doc = doc

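# A usage sketch for descriptor instances, assuming this module is importable
# as pickletools (as in the standard library). It only reads attributes that
# the class above defines.

```python
import io
import pickletools

# uint1 is a fixed-width descriptor: it always consumes exactly one byte.
print(pickletools.uint1.name, pickletools.uint1.n)
print(pickletools.uint1.reader(io.BytesIO(b"\xff")))

# stringnl is variable-length, so n holds a negative sentinel instead.
print(pickletools.stringnl.n == pickletools.UP_TO_NEWLINE)
print(pickletools.stringnl.reader(io.BytesIO(b"'abcd'\nrest")))
```
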
from struct import unpack as _unpack

def read_uint1(f):
    r"""
    >>> import io
    >>> read_uint1(io.BytesIO(b'\xff'))
    255
    """

    data = f.read(1)
    if data:
        return data[0]
    raise ValueError("not enough data in stream to read uint1")

uint1 = ArgumentDescriptor(
            name='uint1',
            n=1,
            reader=read_uint1,
            doc="One-byte unsigned integer.")


def read_uint2(f):
    r"""
    >>> import io
    >>> read_uint2(io.BytesIO(b'\xff\x00'))
    255
    >>> read_uint2(io.BytesIO(b'\xff\xff'))
    65535
    """

    data = f.read(2)
    if len(data) == 2:
        return _unpack("<H", data)[0]
    raise ValueError("not enough data in stream to read uint2")

uint2 = ArgumentDescriptor(
            name='uint2',
            n=2,
            reader=read_uint2,
            doc="Two-byte unsigned integer, little-endian.")


def read_int4(f):
    r"""
    >>> import io
    >>> read_int4(io.BytesIO(b'\xff\x00\x00\x00'))
    255
    >>> read_int4(io.BytesIO(b'\x00\x00\x00\x80')) == -(2**31)
    True
    """

    data = f.read(4)
    if len(data) == 4:
        return _unpack("<i", data)[0]
    raise ValueError("not enough data in stream to read int4")

int4 = ArgumentDescriptor(
           name='int4',
           n=4,
           reader=read_int4,
           doc="Four-byte signed integer, little-endian, 2's complement.")


def read_uint4(f):
    r"""
    >>> import io
    >>> read_uint4(io.BytesIO(b'\xff\x00\x00\x00'))
    255
    >>> read_uint4(io.BytesIO(b'\x00\x00\x00\x80')) == 2**31
    True
    """

    data = f.read(4)
    if len(data) == 4:
        return _unpack("<I", data)[0]
    raise ValueError("not enough data in stream to read uint4")

uint4 = ArgumentDescriptor(
            name='uint4',
            n=4,
            reader=read_uint4,
            doc="Four-byte unsigned integer, little-endian.")


def read_uint8(f):
    r"""
    >>> import io
    >>> read_uint8(io.BytesIO(b'\xff\x00\x00\x00\x00\x00\x00\x00'))
    255
    >>> read_uint8(io.BytesIO(b'\xff' * 8)) == 2**64-1
    True
    """

    data = f.read(8)
    if len(data) == 8:
        return _unpack("<Q", data)[0]
    raise ValueError("not enough data in stream to read uint8")

uint8 = ArgumentDescriptor(
            name='uint8',
            n=8,
            reader=read_uint8,
            doc="Eight-byte unsigned integer, little-endian.")


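# The fixed-width readers above all follow one pattern: read exactly n bytes,
# fail on a short read, then decode with a little-endian struct format. The
# sketch below generalizes that pattern; read_fixed is a hypothetical helper
# and is not defined or used by this module.

```python
import io
import struct


def read_fixed(f, fmt):
    """Read one struct-formatted value from f, or raise on a short read."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("expected %d bytes, but only %d remain"
                         % (size, len(data)))
    return struct.unpack(fmt, data)[0]


stream = io.BytesIO(b"\xff\x00" + b"\x00\x00\x00\x80")
print(read_fixed(stream, "<H"))    # two-byte unsigned, little-endian
print(read_fixed(stream, "<i"))    # four-byte signed, little-endian
```
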
def read_stringnl(f, decode=True, stripquotes=True):
    r"""
    >>> import io
    >>> read_stringnl(io.BytesIO(b"'abcd'\nefg\n"))
    'abcd'

    >>> read_stringnl(io.BytesIO(b"\n"))
    Traceback (most recent call last):
    ...
    ValueError: no string quotes around b''

    >>> read_stringnl(io.BytesIO(b"\n"), stripquotes=False)
    ''

    >>> read_stringnl(io.BytesIO(b"''\n"))
    ''

    >>> read_stringnl(io.BytesIO(b'"abcd"'))
    Traceback (most recent call last):
    ...
    ValueError: no newline found when trying to read stringnl

    Embedded escapes are undone in the result.
    >>> read_stringnl(io.BytesIO(br"'a\n\\b\x00c\td'" + b"\n'e'"))
    'a\n\\b\x00c\td'
    """

    data = f.readline()
    if not data.endswith(b'\n'):
        raise ValueError("no newline found when trying to read stringnl")
    data = data[:-1]    # lose the newline

    if stripquotes:
        for q in (b'"', b"'"):
            if data.startswith(q):
                if not data.endswith(q):
                    raise ValueError("string quote %r not found at both "
                                     "ends of %r" % (q, data))
                data = data[1:-1]
                break
        else:
            raise ValueError("no string quotes around %r" % data)

    if decode:
        data = codecs.escape_decode(data)[0].decode("ascii")
    return data

stringnl = ArgumentDescriptor(
               name='stringnl',
               n=UP_TO_NEWLINE,
               reader=read_stringnl,
               doc="""A newline-terminated string.

                   This is a repr-style string, with embedded escapes, and
                   bracketing quotes.
                   """)

def read_stringnl_noescape(f):
    return read_stringnl(f, stripquotes=False)

stringnl_noescape = ArgumentDescriptor(
                        name='stringnl_noescape',
                        n=UP_TO_NEWLINE,
                        reader=read_stringnl_noescape,
                        doc="""A newline-terminated string.

                        This is a str-style string, without embedded escapes,
                        or bracketing quotes. It should consist solely of
                        printable ASCII characters.
                        """)

def read_stringnl_noescape_pair(f):
    r"""
    >>> import io
    >>> read_stringnl_noescape_pair(io.BytesIO(b"Queue\nEmpty\njunk"))
    'Queue Empty'
    """

    return "%s %s" % (read_stringnl_noescape(f), read_stringnl_noescape(f))

stringnl_noescape_pair = ArgumentDescriptor(
                             name='stringnl_noescape_pair',
                             n=UP_TO_NEWLINE,
                             reader=read_stringnl_noescape_pair,
                             doc="""A pair of newline-terminated strings.

                             These are str-style strings, without embedded
                             escapes, or bracketing quotes. They should
                             consist solely of printable ASCII characters.
                             The pair is returned as a single string, with
                             a single blank separating the two strings.
                             """)


def read_string1(f):
    r"""
    >>> import io
    >>> read_string1(io.BytesIO(b"\x00"))
    ''
    >>> read_string1(io.BytesIO(b"\x03abcdef"))
    'abc'
    """

    n = read_uint1(f)
    assert n >= 0
    data = f.read(n)
    if len(data) == n:
        return data.decode("latin-1")
    raise ValueError("expected %d bytes in a string1, but only %d remain" %
                     (n, len(data)))

string1 = ArgumentDescriptor(
              name="string1",
              n=TAKEN_FROM_ARGUMENT1,
              reader=read_string1,
              doc="""A counted string.

              The first argument is a 1-byte unsigned int giving the number
              of bytes in the string, and the second argument is that many
              bytes.
              """)


def read_string4(f):
    r"""
    >>> import io
    >>> read_string4(io.BytesIO(b"\x00\x00\x00\x00abc"))
    ''
    >>> read_string4(io.BytesIO(b"\x03\x00\x00\x00abcdef"))
    'abc'
    >>> read_string4(io.BytesIO(b"\x00\x00\x00\x03abcdef"))
    Traceback (most recent call last):
    ...
    ValueError: expected 50331648 bytes in a string4, but only 6 remain
    """

    n = read_int4(f)
    if n < 0:
        raise ValueError("string4 byte count < 0: %d" % n)
    data = f.read(n)
    if len(data) == n:
        return data.decode("latin-1")
    raise ValueError("expected %d bytes in a string4, but only %d remain" %
                     (n, len(data)))

string4 = ArgumentDescriptor(
              name="string4",
              n=TAKEN_FROM_ARGUMENT4,
              reader=read_string4,
              doc="""A counted string.

              The first argument is a 4-byte little-endian signed int giving
              the number of bytes in the string, and the second argument is
              that many bytes.
              """)


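# A counted string is a length prefix followed by that many payload bytes.
# The sketch below builds one by hand and reads it back, assuming this module
# is importable as pickletools; encode_string4 is a hypothetical helper that
# this module does not define.

```python
import io
import pickletools
import struct


def encode_string4(payload):
    """Prefix payload with its length as a 4-byte little-endian signed int."""
    return struct.pack("<i", len(payload)) + payload


encoded = encode_string4(b"abc")
print(encoded)
# Bytes after the counted payload are left unread in the stream.
print(pickletools.read_string4(io.BytesIO(encoded + b"trailing")))
```
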
def read_bytes1(f):
    r"""
    >>> import io
    >>> read_bytes1(io.BytesIO(b"\x00"))
    b''
    >>> read_bytes1(io.BytesIO(b"\x03abcdef"))
    b'abc'
    """

    n = read_uint1(f)
    assert n >= 0
    data = f.read(n)
    if len(data) == n:
        return data
    raise ValueError("expected %d bytes in a bytes1, but only %d remain" %
                     (n, len(data)))

bytes1 = ArgumentDescriptor(
              name="bytes1",
              n=TAKEN_FROM_ARGUMENT1,
              reader=read_bytes1,
              doc="""A counted bytes string.

              The first argument is a 1-byte unsigned int giving the number
              of bytes in the string, and the second argument is that many
              bytes.
              """)


def read_bytes4(f):
    r"""
    >>> import io
    >>> read_bytes4(io.BytesIO(b"\x00\x00\x00\x00abc"))
    b''
    >>> read_bytes4(io.BytesIO(b"\x03\x00\x00\x00abcdef"))
    b'abc'
    >>> read_bytes4(io.BytesIO(b"\x00\x00\x00\x03abcdef"))
    Traceback (most recent call last):
    ...
    ValueError: expected 50331648 bytes in a bytes4, but only 6 remain
    """

    n = read_uint4(f)
    assert n >= 0
    if n > sys.maxsize:
        raise ValueError("bytes4 byte count > sys.maxsize: %d" % n)
    data = f.read(n)
    if len(data) == n:
        return data
    raise ValueError("expected %d bytes in a bytes4, but only %d remain" %
                     (n, len(data)))

bytes4 = ArgumentDescriptor(
              name="bytes4",
              n=TAKEN_FROM_ARGUMENT4U,
              reader=read_bytes4,
              doc="""A counted bytes string.

              The first argument is a 4-byte little-endian unsigned int giving
              the number of bytes, and the second argument is that many bytes.
              """)


def read_bytes8(f):
    r"""
    >>> import io
    >>> read_bytes8(io.BytesIO(b"\x00\x00\x00\x00\x00\x00\x00\x00abc"))
    b''
    >>> read_bytes8(io.BytesIO(b"\x03\x00\x00\x00\x00\x00\x00\x00abcdef"))
    b'abc'
    >>> read_bytes8(io.BytesIO(b"\x00\x00\x00\x00\x00\x00\x03\x00abcdef"))
    Traceback (most recent call last):
    ...
    ValueError: expected 844424930131968 bytes in a bytes8, but only 6 remain
    """

    n = read_uint8(f)
    assert n >= 0
    if n > sys.maxsize:
        raise ValueError("bytes8 byte count > sys.maxsize: %d" % n)
    data = f.read(n)
    if len(data) == n:
        return data
    raise ValueError("expected %d bytes in a bytes8, but only %d remain" %
                     (n, len(data)))

bytes8 = ArgumentDescriptor(
              name="bytes8",
              n=TAKEN_FROM_ARGUMENT8U,
              reader=read_bytes8,
              doc="""A counted bytes string.

              The first argument is an 8-byte little-endian unsigned int giving
              the number of bytes, and the second argument is that many bytes.
              """)

def read_unicodestringnl(f):
    r"""
    >>> import io
    >>> read_unicodestringnl(io.BytesIO(b"abc\\uabcd\njunk")) == 'abc\uabcd'
    True
    """

    data = f.readline()
    if not data.endswith(b'\n'):
        raise ValueError("no newline found when trying to read "
                         "unicodestringnl")
    data = data[:-1]    # lose the newline
    return str(data, 'raw-unicode-escape')

unicodestringnl = ArgumentDescriptor(
                      name='unicodestringnl',
                      n=UP_TO_NEWLINE,
                      reader=read_unicodestringnl,
                      doc="""A newline-terminated Unicode string.

                      This is raw-unicode-escape encoded, so consists of
                      printable ASCII characters, and may contain embedded
                      escape sequences.
                      """)


def read_unicodestring1(f):
    r"""
    >>> import io
    >>> s = 'abcd\uabcd'
    >>> enc = s.encode('utf-8')
    >>> enc
    b'abcd\xea\xaf\x8d'
    >>> n = bytes([len(enc)])  # 1-byte length
    >>> t = read_unicodestring1(io.BytesIO(n + enc + b'junk'))
    >>> s == t
    True

    >>> read_unicodestring1(io.BytesIO(n + enc[:-1]))
    Traceback (most recent call last):
    ...
    ValueError: expected 7 bytes in a unicodestring1, but only 6 remain
    """

    n = read_uint1(f)
    assert n >= 0
    data = f.read(n)
    if len(data) == n:
        return str(data, 'utf-8', 'surrogatepass')
    raise ValueError("expected %d bytes in a unicodestring1, but only %d "
                     "remain" % (n, len(data)))

unicodestring1 = ArgumentDescriptor(
                    name="unicodestring1",
                    n=TAKEN_FROM_ARGUMENT1,
                    reader=read_unicodestring1,
                    doc="""A counted Unicode string.

                    The first argument is a 1-byte unsigned int
                    giving the number of bytes in the string, and the second
                    argument -- the UTF-8 encoding of the Unicode string --
                    contains that many bytes.
                    """)


def read_unicodestring4(f):
    r"""
    >>> import io
    >>> s = 'abcd\uabcd'
    >>> enc = s.encode('utf-8')
    >>> enc
    b'abcd\xea\xaf\x8d'
    >>> n = bytes([len(enc), 0, 0, 0])  # little-endian 4-byte length
    >>> t = read_unicodestring4(io.BytesIO(n + enc + b'junk'))
    >>> s == t
    True

    >>> read_unicodestring4(io.BytesIO(n + enc[:-1]))
    Traceback (most recent call last):
    ...
    ValueError: expected 7 bytes in a unicodestring4, but only 6 remain
    """

    n = read_uint4(f)
    assert n >= 0
    if n > sys.maxsize:
        raise ValueError("unicodestring4 byte count > sys.maxsize: %d" % n)
    data = f.read(n)
    if len(data) == n:
        return str(data, 'utf-8', 'surrogatepass')
    raise ValueError("expected %d bytes in a unicodestring4, but only %d "
                     "remain" % (n, len(data)))

unicodestring4 = ArgumentDescriptor(
                    name="unicodestring4",
                    n=TAKEN_FROM_ARGUMENT4U,
                    reader=read_unicodestring4,
                    doc="""A counted Unicode string.

                    The first argument is a 4-byte little-endian unsigned int
                    giving the number of bytes in the string, and the second
                    argument -- the UTF-8 encoding of the Unicode string --
                    contains that many bytes.
                    """)


def read_unicodestring8(f):
    r"""
    >>> import io
    >>> s = 'abcd\uabcd'
    >>> enc = s.encode('utf-8')
    >>> enc
    b'abcd\xea\xaf\x8d'
    >>> n = bytes([len(enc)]) + bytes(7)  # little-endian 8-byte length
    >>> t = read_unicodestring8(io.BytesIO(n + enc + b'junk'))
    >>> s == t
    True

    >>> read_unicodestring8(io.BytesIO(n + enc[:-1]))
    Traceback (most recent call last):
    ...
    ValueError: expected 7 bytes in a unicodestring8, but only 6 remain
    """

    n = read_uint8(f)
    assert n >= 0
    if n > sys.maxsize:
        raise ValueError("unicodestring8 byte count > sys.maxsize: %d" % n)
    data = f.read(n)
    if len(data) == n:
        return str(data, 'utf-8', 'surrogatepass')
    raise ValueError("expected %d bytes in a unicodestring8, but only %d "
                     "remain" % (n, len(data)))

unicodestring8 = ArgumentDescriptor(
                    name="unicodestring8",
                    n=TAKEN_FROM_ARGUMENT8U,
                    reader=read_unicodestring8,
                    doc="""A counted Unicode string.

                    The first argument is an 8-byte little-endian unsigned int
                    giving the number of bytes in the string, and the second
                    argument -- the UTF-8 encoding of the Unicode string --
                    contains that many bytes.
                    """)


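# The counted Unicode readers decode with the 'surrogatepass' error handler,
# so a lone surrogate code point can round-trip. A small sketch, assuming this
# module is importable as pickletools; the sample text is arbitrary.

```python
import io
import pickletools
import struct

text = "abcd\uabcd\ud800"          # ends with a lone surrogate
payload = text.encode("utf-8", "surrogatepass")
framed = struct.pack("<I", len(payload)) + payload   # 4-byte unsigned count
print(pickletools.read_unicodestring4(io.BytesIO(framed)) == text)
```

# Strict UTF-8 encoding rejects the same text, which is why the handler is
# needed on both the encoding and the decoding side.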
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000743def read_decimalnl_short(f):
Tim Peters55762f52003-01-28 16:01:25 +0000744 r"""
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000745 >>> import io
746 >>> read_decimalnl_short(io.BytesIO(b"1234\n56"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000747 1234
748
Guido van Rossumcfe5f202007-05-08 21:26:54 +0000749 >>> read_decimalnl_short(io.BytesIO(b"1234L\n56"))
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000750 Traceback (most recent call last):
751 ...
Serhiy Storchaka95949422013-08-27 19:40:23 +0300752 ValueError: invalid literal for int() with base 10: b'1234L'
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000753 """
754
755 s = read_stringnl(f, decode=False, stripquotes=False)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000756
Serhiy Storchaka95949422013-08-27 19:40:23 +0300757 # There's a hack for True and False here.
Jeremy Hyltona5dc3db2007-08-29 19:07:40 +0000758 if s == b"00":
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000759 return False
Jeremy Hyltona5dc3db2007-08-29 19:07:40 +0000760 elif s == b"01":
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000761 return True
762
Florent Xicluna2bb96f52011-10-23 22:11:00 +0200763 return int(s)
Tim Peters8ecfc8e2003-01-27 18:51:48 +0000764
def read_decimalnl_long(f):
    r"""
    >>> import io

    >>> read_decimalnl_long(io.BytesIO(b"1234L\n56"))
    1234

    >>> read_decimalnl_long(io.BytesIO(b"123456789012345678901234L\n6"))
    123456789012345678901234
    """

    s = read_stringnl(f, decode=False, stripquotes=False)
    if s[-1:] == b'L':
        s = s[:-1]
    return int(s)


decimalnl_short = ArgumentDescriptor(
    name='decimalnl_short',
    n=UP_TO_NEWLINE,
    reader=read_decimalnl_short,
    doc="""A newline-terminated decimal integer literal.

    This never has a trailing 'L', and the integer fit
    in a short Python int on the box where the pickle
    was written -- but there's no guarantee it will fit
    in a short Python int on the box where the pickle
    is read.
    """)

decimalnl_long = ArgumentDescriptor(
    name='decimalnl_long',
    n=UP_TO_NEWLINE,
    reader=read_decimalnl_long,
    doc="""A newline-terminated decimal integer literal.

    This has a trailing 'L', and can represent integers
    of any size.
    """)
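
# Illustrative sketch only, not part of this module's API: the two decimal
# spellings above, as they appear in protocol 0 pickles written by the
# standard pickle module on CPython 3. The helper name is hypothetical; it
# assumes the pickle is a single opcode with a newline-terminated argument,
# followed by STOP.

import pickle

def _sketch_decimalnl_argument(obj):
    """Return (opcode byte, raw argument) for a protocol 0 pickle of obj."""
    data = pickle.dumps(obj, protocol=0)
    # Expected layout: opcode byte, argument, b"\n", then STOP (b".").
    if not data.endswith(b"\n."):
        raise ValueError("not a single newline-terminated argument")
    return data[:1], data[1:-2]

# Small ints use INT with no trailing 'L', bools use the "01"/"00" hack,
# and ints outside the signed 32-bit range use LONG with a trailing 'L'.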


def read_floatnl(f):
    r"""
    >>> import io
    >>> read_floatnl(io.BytesIO(b"-1.25\n6"))
    -1.25
    """
    s = read_stringnl(f, decode=False, stripquotes=False)
    return float(s)

floatnl = ArgumentDescriptor(
    name='floatnl',
    n=UP_TO_NEWLINE,
    reader=read_floatnl,
    doc="""A newline-terminated decimal floating literal.

    In general this requires 17 significant digits for roundtrip
    identity, and pickling then unpickling infinities, NaNs, and
    minus zero doesn't work across boxes, or on some boxes even
    on itself (e.g., Windows can't read the strings it produces
    for infinities or NaNs).
    """)
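
# Illustrative sketch only, not part of this module's API: a quick check of
# the "17 significant digits" remark above. The helper name is hypothetical,
# and the check assumes Python floats are IEEE-754 doubles.

def _sketch_roundtrips(x, digits):
    """Return True if x survives formatting with `digits` significant digits."""
    return float("%.*g" % (digits, x)) == x

# For example, 0.1 + 0.2 is not recovered from 16 digits but is from 17,
# while a value such as 0.5 round-trips with a single digit.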

def read_float8(f):
    r"""
    >>> import io, struct
    >>> raw = struct.pack(">d", -1.25)
    >>> raw
    b'\xbf\xf4\x00\x00\x00\x00\x00\x00'
    >>> read_float8(io.BytesIO(raw + b"\n"))
    -1.25
    """

    data = f.read(8)
    if len(data) == 8:
        return _unpack(">d", data)[0]
    raise ValueError("not enough data in stream to read float8")


float8 = ArgumentDescriptor(
    name='float8',
    n=8,
    reader=read_float8,
    doc="""An 8-byte binary representation of a float, big-endian.

    The format is unique to Python, and shared with the struct
    module (format string '>d') "in theory" (the struct and pickle
    implementations don't share the code -- they should). It's
    strongly related to the IEEE-754 double format, and, in normal
    cases, is in fact identical to the big-endian 754 double format.
    On other boxes the dynamic range is limited to that of a 754
    double, and "add a half and chop" rounding is used to reduce
    the precision to 53 bits. However, even on a 754 box,
    infinities, NaNs, and minus zero may not be handled correctly
    (may not survive roundtrip pickling intact).
    """)
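
# Illustrative sketch only, not part of this module's API: the float8
# layout above is what struct produces for '>d', so a BINFLOAT opcode and
# its argument can be assembled by hand. The helper name is hypothetical.

import pickle
import struct

def _sketch_binfloat(x):
    """Return the BINFLOAT opcode byte followed by the float8 argument."""
    return b"G" + struct.pack(">d", x)

# A protocol 1 pickle of a float is expected to be exactly these bytes
# followed by the STOP opcode (b".").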

# Protocol 2 formats

from pickle import decode_long

def read_long1(f):
    r"""
    >>> import io
    >>> read_long1(io.BytesIO(b"\x00"))
    0
    >>> read_long1(io.BytesIO(b"\x02\xff\x00"))
    255
    >>> read_long1(io.BytesIO(b"\x02\xff\x7f"))
    32767
    >>> read_long1(io.BytesIO(b"\x02\x00\xff"))
    -256
    >>> read_long1(io.BytesIO(b"\x02\x00\x80"))
    -32768
    """

    n = read_uint1(f)
    data = f.read(n)
    if len(data) != n:
        raise ValueError("not enough data in stream to read long1")
    return decode_long(data)

long1 = ArgumentDescriptor(
    name="long1",
    n=TAKEN_FROM_ARGUMENT1,
    reader=read_long1,
    doc="""A binary long, little-endian, using 1-byte size.

    This first reads one byte as an unsigned size, then reads that
    many bytes and interprets them as a little-endian 2's-complement long.
    If the size is 0, that's taken as a shortcut for the int 0.
    """)
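
# Illustrative sketch only, not part of this module's API: the long1 layout
# above, decoded from a bytes object with int.from_bytes instead of
# decode_long. The helper name is hypothetical.

def _sketch_read_long1(data):
    """Decode a long1 argument (1-byte size, then payload) from bytes."""
    if not data:
        raise ValueError("not enough data to read the size byte")
    n = data[0]
    payload = data[1:1 + n]
    if len(payload) != n:
        raise ValueError("not enough data to read long1")
    # An empty payload decodes to 0, matching the size-0 shortcut.
    return int.from_bytes(payload, "little", signed=True)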

def read_long4(f):
    r"""
    >>> import io
    >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\xff\x00"))
    255
    >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\xff\x7f"))
    32767
    >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\x00\xff"))
    -256
    >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\x00\x80"))
    -32768
    >>> read_long4(io.BytesIO(b"\x00\x00\x00\x00"))
    0
    """

    n = read_int4(f)
    if n < 0:
        raise ValueError("long4 byte count < 0: %d" % n)
    data = f.read(n)
    if len(data) != n:
        raise ValueError("not enough data in stream to read long4")
    return decode_long(data)

long4 = ArgumentDescriptor(
    name="long4",
    n=TAKEN_FROM_ARGUMENT4,
    reader=read_long4,
    doc="""A binary representation of a long, little-endian.

    This first reads four bytes as a signed size (but requires the
    size to be >= 0), then reads that many bytes and interprets them
    as a little-endian 2's-complement long. If the size is 0, that's taken
    as a shortcut for the int 0, although LONG1 should really be used
    then instead (and in any case where # of bytes < 256).
    """)
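
# Illustrative sketch only, not part of this module's API: choosing between
# the long1 and long4 layouts as the docs above suggest, i.e. long1 whenever
# the payload is shorter than 256 bytes. The helper name is hypothetical;
# pickle.encode_long supplies the little-endian 2's-complement payload.

import pickle
import struct

def _sketch_frame_long(value):
    """Return LONG1 or LONG4 opcode bytes (opcode + argument) for value."""
    payload = pickle.encode_long(value)
    n = len(payload)
    if n < 256:
        return b"\x8a" + bytes([n]) + payload       # LONG1: 1-byte size
    return b"\x8b" + struct.pack("<i", n) + payload  # LONG4: 4-byte size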


##############################################################################
# Object descriptors. The stack used by the pickle machine holds objects,
# and in the stack_before and stack_after attributes of OpcodeInfo
# descriptors we need names to describe the various types of objects that can
# appear on the stack.

class StackObject(object):
    __slots__ = (
        # name of descriptor record, for info only
        'name',

        # type of object, or tuple of type objects (meaning the object can
        # be of any type in the tuple)
        'obtype',

        # human-readable docs for this kind of stack object; a string
        'doc',
    )

    def __init__(self, name, obtype, doc):
        assert isinstance(name, str)
        self.name = name

        assert isinstance(obtype, type) or isinstance(obtype, tuple)
        if isinstance(obtype, tuple):
            for contained in obtype:
                assert isinstance(contained, type)
        self.obtype = obtype

        assert isinstance(doc, str)
        self.doc = doc

    def __repr__(self):
        return self.name


pyint = StackObject(
    name='int',
    obtype=int,
    doc="A short (as opposed to long) Python integer object.")

pylong = StackObject(
    name='long',
    obtype=int,
    doc="A long (as opposed to short) Python integer object.")

pyinteger_or_bool = StackObject(
    name='int_or_bool',
    obtype=(int, bool),
    doc="A Python integer object (short or long), or "
        "a Python bool.")

pybool = StackObject(
    name='bool',
    obtype=(bool,),
    doc="A Python bool object.")

pyfloat = StackObject(
    name='float',
    obtype=float,
    doc="A Python float object.")

pystring = StackObject(
    name='string',
    obtype=bytes,
    doc="A Python (8-bit) string object.")

pybytes = StackObject(
    name='bytes',
    obtype=bytes,
    doc="A Python bytes object.")

pyunicode = StackObject(
    name='str',
    obtype=str,
    doc="A Python (Unicode) string object.")

pynone = StackObject(
    name="None",
    obtype=type(None),
    doc="The Python None object.")

pytuple = StackObject(
    name="tuple",
    obtype=tuple,
    doc="A Python tuple object.")

pylist = StackObject(
    name="list",
    obtype=list,
    doc="A Python list object.")

pydict = StackObject(
    name="dict",
    obtype=dict,
    doc="A Python dict object.")

pyset = StackObject(
    name="set",
    obtype=set,
    doc="A Python set object.")

pyfrozenset = StackObject(
    name="frozenset",
    obtype=frozenset,
    doc="A Python frozenset object.")

anyobject = StackObject(
    name='any',
    obtype=object,
    doc="Any kind of object whatsoever.")

markobject = StackObject(
    name="mark",
    obtype=StackObject,
    doc="""'The mark' is a unique object.

    Opcodes that operate on a variable number of objects
    generally don't embed the count of objects in the opcode,
    or pull it off the stack. Instead the MARK opcode is used
    to push a special marker object on the stack, and then
    some other opcodes grab all the objects from the top of
    the stack down to (but not including) the topmost marker
    object.
    """)

stackslice = StackObject(
    name="stackslice",
    obtype=StackObject,
    doc="""An object representing a contiguous slice of the stack.

    This is used in conjunction with markobject, to represent all
    of the stack following the topmost markobject. For example,
    the POP_MARK opcode changes the stack from

        [..., markobject, stackslice]
    to
        [...]

    No matter how many objects are on the stack after the topmost
    markobject, POP_MARK gets rid of all of them (including the
    topmost markobject too).
    """)
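
# Illustrative sketch only, not part of this module's API: a toy model of
# the mark / stack-slice convention described above. The sentinel and
# helper names are hypothetical; a plain Python list stands in for the
# pickle machine's stack.

_SKETCH_MARK = object()  # stand-in for the unique mark object

def _sketch_pop_mark(stack):
    """Remove and return the slice above the topmost mark; drop the mark."""
    for i in range(len(stack) - 1, -1, -1):
        if stack[i] is _SKETCH_MARK:
            items = stack[i + 1:]
            del stack[i:]
            return items
    raise ValueError("no mark on the stack")

def _sketch_list_opcode(stack):
    """Mimic LIST: [..., mark, stackslice] becomes [..., list(stackslice)]."""
    stack.append(_sketch_pop_mark(stack))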

##############################################################################
# Descriptors for pickle opcodes.

class OpcodeInfo(object):

    __slots__ = (
        # symbolic name of opcode; a string
        'name',

        # the code used in a bytestream to represent the opcode; a
        # one-character string
        'code',

        # If the opcode has an argument embedded in the byte string, an
        # instance of ArgumentDescriptor specifying its type. Note that
        # arg.reader(s) can be used to read and decode the argument from
        # the bytestream s, and arg.doc documents the format of the raw
        # argument bytes. If the opcode doesn't have an argument embedded
        # in the bytestream, arg should be None.
        'arg',

        # what the stack looks like before this opcode runs; a list
        'stack_before',

        # what the stack looks like after this opcode runs; a list
        'stack_after',

        # the protocol number in which this opcode was introduced; an int
        'proto',

        # human-readable docs for this opcode; a string
        'doc',
    )

    def __init__(self, name, code, arg,
                 stack_before, stack_after, proto, doc):
        assert isinstance(name, str)
        self.name = name

        assert isinstance(code, str)
        assert len(code) == 1
        self.code = code

        assert arg is None or isinstance(arg, ArgumentDescriptor)
        self.arg = arg

        assert isinstance(stack_before, list)
        for x in stack_before:
            assert isinstance(x, StackObject)
        self.stack_before = stack_before

        assert isinstance(stack_after, list)
        for x in stack_after:
            assert isinstance(x, StackObject)
        self.stack_after = stack_after

        assert isinstance(proto, int) and 0 <= proto <= pickle.HIGHEST_PROTOCOL
        self.proto = proto

        assert isinstance(doc, str)
        self.doc = doc
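
# Illustrative sketch only, not part of this module's API: one possible use
# of the stack_before / stack_after lists, namely computing an opcode's net
# stack effect. The record type and helper are hypothetical stand-ins; plain
# strings take the place of StackObject instances so the sketch is
# self-contained.

from collections import namedtuple

_SketchOp = namedtuple("_SketchOp", "name stack_before stack_after")

def _sketch_net_stack_effect(op):
    """Return the change in stack depth, or None if it depends on a slice."""
    if "stackslice" in op.stack_before or "stackslice" in op.stack_after:
        return None  # variable-length: depends on the position of the mark
    return len(op.stack_after) - len(op.stack_before)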

I = OpcodeInfo
opcodes = [

    # Ways to spell integers.

    I(name='INT',
      code='I',
      arg=decimalnl_short,
      stack_before=[],
      stack_after=[pyinteger_or_bool],
      proto=0,
      doc="""Push an integer or bool.

      The argument is a newline-terminated decimal literal string.

      The intent may have been that this always fit in a short Python int,
      but INT can be generated in pickles written on a 64-bit box that
      require a Python long on a 32-bit box. The difference between this
      and LONG then is that INT skips a trailing 'L', and produces a short
      int whenever possible.

      Another difference arises because, when bool was introduced as a
      distinct type in 2.3, builtin names True and False were also added to
      2.2.2, mapping to ints 1 and 0. For compatibility in both directions,
      True gets pickled as INT + "01\\n", and False as INT + "00\\n".
      Leading zeroes are never produced for a genuine integer. The 2.3
      (and later) unpicklers special-case these and return bool instead;
      earlier unpicklers ignore the leading "0" and return the int.
      """),

    I(name='BININT',
      code='J',
      arg=int4,
      stack_before=[],
      stack_after=[pyint],
      proto=1,
      doc="""Push a four-byte signed integer.

      This handles the full range of Python (short) integers on a 32-bit
      box, directly as binary bytes (1 for the opcode and 4 for the integer).
      If the integer is non-negative and fits in 1 or 2 bytes, pickling via
      BININT1 or BININT2 saves space.
      """),

    I(name='BININT1',
      code='K',
      arg=uint1,
      stack_before=[],
      stack_after=[pyint],
      proto=1,
      doc="""Push a one-byte unsigned integer.

      This is a space optimization for pickling very small non-negative ints,
      in range(256).
      """),

    I(name='BININT2',
      code='M',
      arg=uint2,
      stack_before=[],
      stack_after=[pyint],
      proto=1,
      doc="""Push a two-byte unsigned integer.

      This is a space optimization for pickling small positive ints, in
      range(256, 2**16). Integers in range(256) can also be pickled via
      BININT2, but BININT1 instead saves a byte.
      """),

    I(name='LONG',
      code='L',
      arg=decimalnl_long,
      stack_before=[],
      stack_after=[pylong],
      proto=0,
      doc="""Push a long integer.

      The same as INT, except that the literal ends with 'L', and always
      unpickles to a Python long. There doesn't seem to be a real purpose
      to the trailing 'L'.

      Note that LONG takes time quadratic in the number of digits when
      unpickling (this is simply due to the nature of decimal->binary
      conversion). Proto 2 added linear-time (in C; still quadratic-time
      in Python) LONG1 and LONG4 opcodes.
      """),

    I(name="LONG1",
      code='\x8a',
      arg=long1,
      stack_before=[],
      stack_after=[pylong],
      proto=2,
      doc="""Long integer using one-byte length.

      A more efficient encoding of a Python long; the long1 encoding
      says it all."""),

    I(name="LONG4",
      code='\x8b',
      arg=long4,
      stack_before=[],
      stack_after=[pylong],
      proto=2,
      doc="""Long integer using four-byte length.

      A more efficient encoding of a Python long; the long4 encoding
      says it all."""),

    # Ways to spell strings (8-bit, not Unicode).

    I(name='STRING',
      code='S',
      arg=stringnl,
      stack_before=[],
      stack_after=[pystring],
      proto=0,
      doc="""Push a Python string object.

      The argument is a repr-style string, with bracketing quote characters,
      and perhaps embedded escapes. The argument extends until the next
      newline character. (Actually, it is decoded into a str instance
      using the encoding given to the Unpickler constructor, or the default,
      'ASCII'.)
      """),

    I(name='BINSTRING',
      code='T',
      arg=string4,
      stack_before=[],
      stack_after=[pystring],
      proto=1,
      doc="""Push a Python string object.

      There are two arguments: the first is a 4-byte little-endian signed int
      giving the number of bytes in the string, and the second is that many
      bytes, which are taken literally as the string content. (Actually,
      they are decoded into a str instance using the encoding given to the
      Unpickler constructor, or the default, 'ASCII'.)
      """),

    I(name='SHORT_BINSTRING',
      code='U',
      arg=string1,
      stack_before=[],
      stack_after=[pystring],
      proto=1,
      doc="""Push a Python string object.

      There are two arguments: the first is a 1-byte unsigned int giving
      the number of bytes in the string, and the second is that many bytes,
      which are taken literally as the string content. (Actually, they
      are decoded into a str instance using the encoding given to the
      Unpickler constructor, or the default, 'ASCII'.)
      """),

    # Bytes (protocol 3 and higher; older protocols don't support bytes at all)

    I(name='BINBYTES',
      code='B',
      arg=bytes4,
      stack_before=[],
      stack_after=[pybytes],
      proto=3,
      doc="""Push a Python bytes object.

      There are two arguments: the first is a 4-byte little-endian unsigned int
      giving the number of bytes, and the second is that many bytes, which are
      taken literally as the bytes content.
      """),

    I(name='SHORT_BINBYTES',
      code='C',
      arg=bytes1,
      stack_before=[],
      stack_after=[pybytes],
      proto=3,
      doc="""Push a Python bytes object.

      There are two arguments: the first is a 1-byte unsigned int giving
      the number of bytes, and the second is that many bytes, which are taken
      literally as the bytes content.
      """),

    I(name='BINBYTES8',
      code='\x8e',
      arg=bytes8,
      stack_before=[],
      stack_after=[pybytes],
      proto=4,
      doc="""Push a Python bytes object.

      There are two arguments: the first is an 8-byte little-endian unsigned
      int giving the number of bytes, and the second is that many bytes,
      which are taken literally as the bytes content.
      """),

    # Ways to spell None.

    I(name='NONE',
      code='N',
      arg=None,
      stack_before=[],
      stack_after=[pynone],
      proto=0,
      doc="Push None on the stack."),

    # Ways to spell bools, starting with proto 2. See INT for how this was
    # done before proto 2.

    I(name='NEWTRUE',
      code='\x88',
      arg=None,
      stack_before=[],
      stack_after=[pybool],
      proto=2,
      doc="""True.

      Push True onto the stack."""),

    I(name='NEWFALSE',
      code='\x89',
      arg=None,
      stack_before=[],
      stack_after=[pybool],
      proto=2,
      doc="""False.

      Push False onto the stack."""),

    # Ways to spell Unicode strings.

    I(name='UNICODE',
      code='V',
      arg=unicodestringnl,
      stack_before=[],
      stack_after=[pyunicode],
      proto=0,  # this may be pure-text, but it's a later addition
      doc="""Push a Python Unicode string object.

      The argument is a raw-unicode-escape encoding of a Unicode string,
      and so may contain embedded escape sequences. The argument extends
      until the next newline character.
      """),

    I(name='SHORT_BINUNICODE',
      code='\x8c',
      arg=unicodestring1,
      stack_before=[],
      stack_after=[pyunicode],
      proto=4,
      doc="""Push a Python Unicode string object.

      There are two arguments: the first is a 1-byte unsigned int
      giving the number of bytes in the string. The second is that many
      bytes, and is the UTF-8 encoding of the Unicode string.
      """),

    I(name='BINUNICODE',
      code='X',
      arg=unicodestring4,
      stack_before=[],
      stack_after=[pyunicode],
      proto=1,
      doc="""Push a Python Unicode string object.

      There are two arguments: the first is a 4-byte little-endian unsigned int
      giving the number of bytes in the string. The second is that many
      bytes, and is the UTF-8 encoding of the Unicode string.
      """),

    I(name='BINUNICODE8',
      code='\x8d',
      arg=unicodestring8,
      stack_before=[],
      stack_after=[pyunicode],
      proto=4,
      doc="""Push a Python Unicode string object.

      There are two arguments: the first is an 8-byte little-endian unsigned
      int giving the number of bytes in the string. The second is that many
      bytes, and is the UTF-8 encoding of the Unicode string.
      """),

    # Ways to spell floats.

    I(name='FLOAT',
      code='F',
      arg=floatnl,
      stack_before=[],
      stack_after=[pyfloat],
      proto=0,
      doc="""Newline-terminated decimal float literal.

      The argument is repr(a_float), and in general requires 17 significant
      digits for roundtrip conversion to be an identity (this is so for
      IEEE-754 double precision values, which is what Python float maps to
      on most boxes).

      In general, FLOAT cannot be used to transport infinities, NaNs, or
      minus zero across boxes (or even on a single box, if the platform C
      library can't read the strings it produces for such things -- Windows
      is like that), but may do less damage than BINFLOAT on boxes with
      greater precision or dynamic range than IEEE-754 double.
      """),

    I(name='BINFLOAT',
      code='G',
      arg=float8,
      stack_before=[],
      stack_after=[pyfloat],
      proto=1,
      doc="""Float stored in binary form, with 8 bytes of data.

      This generally requires less than half the space of FLOAT encoding.
      In general, BINFLOAT cannot be used to transport infinities, NaNs, or
      minus zero, raises an exception if the exponent exceeds the range of
      an IEEE-754 double, and retains no more than 53 bits of precision (if
      there are more than that, "add a half and chop" rounding is used to
      cut it back to 53 significant bits).
      """),

    # Ways to build lists.

    I(name='EMPTY_LIST',
      code=']',
      arg=None,
      stack_before=[],
      stack_after=[pylist],
      proto=1,
      doc="Push an empty list."),

    I(name='APPEND',
      code='a',
      arg=None,
      stack_before=[pylist, anyobject],
      stack_after=[pylist],
      proto=0,
      doc="""Append an object to a list.

      Stack before: ... pylist anyobject
      Stack after: ... pylist+[anyobject]

      although pylist is really extended in-place.
      """),

    I(name='APPENDS',
      code='e',
      arg=None,
      stack_before=[pylist, markobject, stackslice],
      stack_after=[pylist],
      proto=1,
      doc="""Extend a list by a slice of stack objects.

      Stack before: ... pylist markobject stackslice
      Stack after: ... pylist+stackslice

      although pylist is really extended in-place.
      """),

    I(name='LIST',
      code='l',
      arg=None,
      stack_before=[markobject, stackslice],
      stack_after=[pylist],
      proto=0,
      doc="""Build a list out of the topmost stack slice, after markobject.

      All the stack entries following the topmost markobject are placed into
      a single Python list, which single list object replaces all of the
      stack from the topmost markobject onward. For example,

      Stack before: ... markobject 1 2 3 'abc'
      Stack after: ... [1, 2, 3, 'abc']
      """),

    # Ways to build tuples.

    I(name='EMPTY_TUPLE',
      code=')',
      arg=None,
      stack_before=[],
      stack_after=[pytuple],
      proto=1,
      doc="Push an empty tuple."),

    I(name='TUPLE',
      code='t',
      arg=None,
      stack_before=[markobject, stackslice],
      stack_after=[pytuple],
      proto=0,
      doc="""Build a tuple out of the topmost stack slice, after markobject.

      All the stack entries following the topmost markobject are placed into
      a single Python tuple, which single tuple object replaces all of the
      stack from the topmost markobject onward. For example,

      Stack before: ... markobject 1 2 3 'abc'
      Stack after: ... (1, 2, 3, 'abc')
      """),

    I(name='TUPLE1',
      code='\x85',
      arg=None,
      stack_before=[anyobject],
      stack_after=[pytuple],
      proto=2,
      doc="""Build a one-tuple out of the topmost item on the stack.

      This code pops one value off the stack and pushes a tuple of
      length 1 whose one item is that value back onto it. In other
      words:

          stack[-1] = tuple(stack[-1:])
      """),

    I(name='TUPLE2',
      code='\x86',
      arg=None,
      stack_before=[anyobject, anyobject],
      stack_after=[pytuple],
      proto=2,
      doc="""Build a two-tuple out of the top two items on the stack.

      This code pops two values off the stack and pushes a tuple of
      length 2 whose items are those values back onto it. In other
      words:

          stack[-2:] = [tuple(stack[-2:])]
      """),

    I(name='TUPLE3',
      code='\x87',
      arg=None,
      stack_before=[anyobject, anyobject, anyobject],
      stack_after=[pytuple],
      proto=2,
      doc="""Build a three-tuple out of the top three items on the stack.

      This code pops three values off the stack and pushes a tuple of
      length 3 whose items are those values back onto it. In other
      words:

          stack[-3:] = [tuple(stack[-3:])]
      """),

    # Ways to build dicts.

    I(name='EMPTY_DICT',
      code='}',
      arg=None,
      stack_before=[],
      stack_after=[pydict],
      proto=1,
      doc="Push an empty dict."),

    I(name='DICT',
      code='d',
      arg=None,
      stack_before=[markobject, stackslice],
      stack_after=[pydict],
      proto=0,
      doc="""Build a dict out of the topmost stack slice, after markobject.

      All the stack entries following the topmost markobject are placed into
      a single Python dict, which single dict object replaces all of the
      stack from the topmost markobject onward. The stack slice alternates
      key, value, key, value, .... For example,

      Stack before: ... markobject 1 2 3 'abc'
      Stack after: ... {1: 2, 3: 'abc'}
      """),

    I(name='SETITEM',
      code='s',
      arg=None,
      stack_before=[pydict, anyobject, anyobject],
      stack_after=[pydict],
      proto=0,
      doc="""Add a key+value pair to an existing dict.

      Stack before: ... pydict key value
      Stack after: ... pydict

      where pydict has been modified via pydict[key] = value.
      """),

    I(name='SETITEMS',
      code='u',
      arg=None,
      stack_before=[pydict, markobject, stackslice],
      stack_after=[pydict],
      proto=1,
      doc="""Add an arbitrary number of key+value pairs to an existing dict.

      The slice of the stack following the topmost markobject is taken as
      an alternating sequence of keys and values, added to the dict
      immediately under the topmost markobject. Everything at and after the
      topmost markobject is popped, leaving the mutated dict at the top
      of the stack.

      Stack before: ... pydict markobject key_1 value_1 ... key_n value_n
      Stack after: ... pydict

      where pydict has been modified via pydict[key_i] = value_i for i in
      1, 2, ..., n, and in that order.
      """),

Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001650 # Ways to build sets
1651
1652 I(name='EMPTY_SET',
1653 code='\x8f',
1654 arg=None,
1655 stack_before=[],
1656 stack_after=[pyset],
1657 proto=4,
1658 doc="Push an empty set."),
1659
1660 I(name='ADDITEMS',
1661 code='\x90',
1662 arg=None,
1663 stack_before=[pyset, markobject, stackslice],
1664 stack_after=[pyset],
1665 proto=4,
1666 doc="""Add an arbitrary number of items to an existing set.
1667
1668 The slice of the stack following the topmost markobject is taken as
1669 a sequence of items, added to the set immediately under the topmost
1670 markobject. Everything at and after the topmost markobject is popped,
1671 leaving the mutated set at the top of the stack.
1672
1673 Stack before: ... pyset markobject item_1 ... item_n
1674 Stack after: ... pyset
1675
1676 where pyset has been modified via pyset.add(item_i) for i in
1677 1, 2, ..., n, and in that order.
1678 """),
1679
1680 # Way to build frozensets
1681
1682 I(name='FROZENSET',
1683 code='\x91',
1684 arg=None,
1685 stack_before=[markobject, stackslice],
1686 stack_after=[pyfrozenset],
1687 proto=4,
1688 doc="""Build a frozenset out of the topmost slice, after markobject.
1689
1690 All the stack entries following the topmost markobject are placed into
1691 a single Python frozenset, which single frozenset object replaces all
1692 of the stack from the topmost markobject onward. For example,
1693
1694 Stack before: ... markobject 1 2 3
1695 Stack after: ... frozenset({1, 2, 3})
1696 """),
1697
1698 # Stack manipulation.
1699
1700 I(name='POP',
1701 code='0',
1702 arg=None,
1703 stack_before=[anyobject],
1704 stack_after=[],
1705 proto=0,
1706 doc="Discard the top stack item, shrinking the stack by one item."),
1707
1708 I(name='DUP',
1709 code='2',
1710 arg=None,
1711 stack_before=[anyobject],
1712 stack_after=[anyobject, anyobject],
1713 proto=0,
1714 doc="Push the top stack item onto the stack again, duplicating it."),
1715
1716 I(name='MARK',
1717 code='(',
1718 arg=None,
1719 stack_before=[],
1720 stack_after=[markobject],
1721 proto=0,
1722 doc="""Push markobject onto the stack.
1723
1724 markobject is a unique object, used by other opcodes to identify a
1725 region of the stack containing a variable number of objects for them
1726 to work on. See markobject.doc for more detail.
1727 """),
1728
1729 I(name='POP_MARK',
1730 code='1',
1731 arg=None,
1732 stack_before=[markobject, stackslice],
1733 stack_after=[],
Collin Wintere61d4372009-05-20 17:46:47 +00001734 proto=1,
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001735 doc="""Pop all the stack objects at and above the topmost markobject.
1736
1737 When an opcode using a variable number of stack objects is done,
1738 POP_MARK is used to remove those objects, and to remove the markobject
1739 that delimited their starting position on the stack.
1740 """),
1741
1742 # Memo manipulation. There are really only two operations (get and put),
1743 # each in all-text, "short binary", and "long binary" flavors.
1744
1745 I(name='GET',
1746 code='g',
1747 arg=decimalnl_short,
1748 stack_before=[],
1749 stack_after=[anyobject],
1750 proto=0,
1751 doc="""Read an object from the memo and push it on the stack.
1752
Ezio Melotti13925002011-03-16 11:05:33 +02001753 The index of the memo object to push is given by the newline-terminated
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001754 decimal string following. BINGET and LONG_BINGET are space-optimized
1755 versions.
1756 """),
1757
1758 I(name='BINGET',
1759 code='h',
1760 arg=uint1,
1761 stack_before=[],
1762 stack_after=[anyobject],
1763 proto=1,
1764 doc="""Read an object from the memo and push it on the stack.
1765
1766 The index of the memo object to push is given by the 1-byte unsigned
1767 integer following.
1768 """),
1769
1770 I(name='LONG_BINGET',
1771 code='j',
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001772 arg=uint4,
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001773 stack_before=[],
1774 stack_after=[anyobject],
1775 proto=1,
1776 doc="""Read an object from the memo and push it on the stack.
1777
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001778 The index of the memo object to push is given by the 4-byte unsigned
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001779 little-endian integer following.
1780 """),
1781
1782 I(name='PUT',
1783 code='p',
1784 arg=decimalnl_short,
1785 stack_before=[],
1786 stack_after=[],
1787 proto=0,
1788 doc="""Store the stack top into the memo. The stack is not popped.
1789
1790 The index of the memo location to write into is given by the newline-
1791 terminated decimal string following. BINPUT and LONG_BINPUT are
1792 space-optimized versions.
1793 """),
1794
1795 I(name='BINPUT',
1796 code='q',
1797 arg=uint1,
1798 stack_before=[],
1799 stack_after=[],
1800 proto=1,
1801 doc="""Store the stack top into the memo. The stack is not popped.
1802
1803 The index of the memo location to write into is given by the 1-byte
1804 unsigned integer following.
1805 """),
1806
1807 I(name='LONG_BINPUT',
1808 code='r',
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001809 arg=uint4,
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001810 stack_before=[],
1811 stack_after=[],
1812 proto=1,
1813 doc="""Store the stack top into the memo. The stack is not popped.
1814
1815 The index of the memo location to write into is given by the 4-byte
Alexandre Vassalotti8db89ca2013-04-14 03:30:35 -07001816 unsigned little-endian integer following.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001817 """),
1818
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001819 I(name='MEMOIZE',
1820 code='\x94',
1821 arg=None,
1822 stack_before=[anyobject],
1823 stack_after=[anyobject],
1824 proto=4,
1825 doc="""Store the stack top into the memo. The stack is not popped.
1826
1827 The index of the memo location to write is the number of
1828 elements currently present in the memo.
1829 """),
1830
1831 # Access the extension registry (predefined objects). Akin to the GET
1832 # family.
1833
1834 I(name='EXT1',
1835 code='\x82',
1836 arg=uint1,
1837 stack_before=[],
1838 stack_after=[anyobject],
1839 proto=2,
1840 doc="""Extension code.
1841
1842 This code and the similar EXT2 and EXT4 allow using a registry
1843 of popular objects that are pickled by name, typically classes.
1844 It is envisioned that through a global negotiation and
1845 registration process, third parties can set up a mapping between
1846 ints and object names.
1847
1848 In order to guarantee pickle interchangeability, the extension
1849 code registry ought to be global, although a range of codes may
1850 be reserved for private use.
1851
1852 EXT1 has a 1-byte integer argument. This is used to index into the
1853 extension registry, and the object at that index is pushed on the stack.
1854 """),
1855
1856 I(name='EXT2',
1857 code='\x83',
1858 arg=uint2,
1859 stack_before=[],
1860 stack_after=[anyobject],
1861 proto=2,
1862 doc="""Extension code.
1863
1864 See EXT1. EXT2 has a two-byte integer argument.
1865 """),
1866
1867 I(name='EXT4',
1868 code='\x84',
1869 arg=int4,
1870 stack_before=[],
1871 stack_after=[anyobject],
1872 proto=2,
1873 doc="""Extension code.
1874
1875 See EXT1. EXT4 has a four-byte integer argument.
1876 """),
1877
1878 # Push a class object, or module function, on the stack, via its module
1879 # and name.
1880
1881 I(name='GLOBAL',
1882 code='c',
1883 arg=stringnl_noescape_pair,
1884 stack_before=[],
1885 stack_after=[anyobject],
1886 proto=0,
1887 doc="""Push a global object (module.attr) on the stack.
1888
1889 Two newline-terminated strings follow the GLOBAL opcode. The first is
1890 taken as a module name, and the second as a class name. The class
1891 object module.class is pushed on the stack. More accurately, the
1892 object returned by self.find_class(module, class) is pushed on the
1893 stack, so unpickling subclasses can override this form of lookup.
1894 """),
1895
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01001896 I(name='STACK_GLOBAL',
1897 code='\x93',
1898 arg=None,
1899 stack_before=[pyunicode, pyunicode],
1900 stack_after=[anyobject],
1901 proto=4,
1902 doc="""Push a global object (module.attr) on the stack.
1903 """),
1904
1905 # Ways to build objects of classes pickle doesn't know about directly
1906 # (user-defined classes). I despair of documenting this accurately
1907 # and comprehensibly -- you really have to read the pickle code to
1908 # find all the special cases.
1909
1910 I(name='REDUCE',
1911 code='R',
1912 arg=None,
1913 stack_before=[anyobject, anyobject],
1914 stack_after=[anyobject],
1915 proto=0,
1916 doc="""Push an object built from a callable and an argument tuple.
1917
1918 The opcode is named after the __reduce__() method.
1919
1920 Stack before: ... callable pytuple
1921 Stack after: ... callable(*pytuple)
1922
1923 The callable and the argument tuple are the first two items returned
1924 by a __reduce__ method. Applying the callable to the argtuple is
1925 supposed to reproduce the original object, or at least get it started.
1926 If the __reduce__ method returns a 3-tuple, the last component is an
1927 argument to be passed to the object's __setstate__, and then the REDUCE
1928 opcode is followed by code to create setstate's argument, and then a
1929 BUILD opcode to apply __setstate__ to that argument.
1930
Guido van Rossum13257902007-06-07 23:15:56 +00001931 If not isinstance(callable, type), REDUCE complains unless the
Alexandre Vassalottif7fa63d2008-05-11 08:55:36 +00001932 callable has been registered with the copyreg module's
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001933 safe_constructors dict, or the callable has a magic
1934 '__safe_for_unpickling__' attribute with a true value. I'm not sure
1935 why it does this, but I've sure seen this complaint often enough when
1936 I didn't want to <wink>.
1937 """),
1938
1939 I(name='BUILD',
1940 code='b',
1941 arg=None,
1942 stack_before=[anyobject, anyobject],
1943 stack_after=[anyobject],
1944 proto=0,
1945 doc="""Finish building an object, via __setstate__ or dict update.
1946
1947 Stack before: ... anyobject argument
1948 Stack after: ... anyobject
1949
1950 where anyobject may have been mutated, as follows:
1951
1952 If the object has a __setstate__ method,
1953
1954 anyobject.__setstate__(argument)
1955
1956 is called.
1957
1958 Else the argument must be a dict, the object must have a __dict__, and
1959 the object is updated via
1960
1961 anyobject.__dict__.update(argument)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001962 """),
1963
1964 I(name='INST',
1965 code='i',
1966 arg=stringnl_noescape_pair,
1967 stack_before=[markobject, stackslice],
1968 stack_after=[anyobject],
1969 proto=0,
1970 doc="""Build a class instance.
1971
1972 This is the protocol 0 version of protocol 1's OBJ opcode.
1973 INST is followed by two newline-terminated strings, giving a
1974 module and class name, just as for the GLOBAL opcode (and see
1975 GLOBAL for more details about that). self.find_class(module, name)
1976 is used to get a class object.
1977
1978 In addition, all the objects on the stack following the topmost
1979 markobject are gathered into a tuple and popped (along with the
1980 topmost markobject), just as for the TUPLE opcode.
1981
1982 Now it gets complicated. If all of these are true:
1983
1984 + The argtuple is empty (markobject was at the top of the stack
1985 at the start).
1986
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001987 + The class object does not have a __getinitargs__ attribute.
1988
1989 then we want to create an old-style class instance without invoking
1990 its __init__() method (pickle has waffled on this over the years; not
1991 calling __init__() is current wisdom). In this case, an instance of
1992 an old-style dummy class is created, and then we try to rebind its
1993 __class__ attribute to the desired class object. If this succeeds,
Guido van Rossuma8add0e2007-05-14 22:03:55 +00001994 the new instance object is pushed on the stack, and we're done.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00001995
1996 Else (the argtuple is not empty, it's not an old-style class object,
1997 or the class object does have a __getinitargs__ attribute), the code
1998 first insists that the class object have a __safe_for_unpickling__
1999 attribute. Unlike as for the __safe_for_unpickling__ check in REDUCE,
2000 it doesn't matter whether this attribute has a true or false value, it
Guido van Rossum99603b02007-07-20 00:22:32 +00002001 only matters whether it exists (XXX this is a bug). If
2002 __safe_for_unpickling__ doesn't exist, UnpicklingError is raised.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002003
2004 Else (the class object does have a __safe_for_unpickling__ attr),
2005 the class object obtained from INST's arguments is applied to the
2006 argtuple obtained from the stack, and the resulting instance object
2007 is pushed on the stack.
Tim Peters2b93c4c2003-01-30 16:35:08 +00002008
2009 NOTE: checks for __safe_for_unpickling__ went away in Python 2.3.
Florent Xiclunaaa6c1d22011-12-12 18:54:29 +01002010 NOTE: the distinction between old-style and new-style classes does
2011 not make sense in Python 3.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002012 """),
2013
2014 I(name='OBJ',
2015 code='o',
2016 arg=None,
2017 stack_before=[markobject, anyobject, stackslice],
2018 stack_after=[anyobject],
2019 proto=1,
2020 doc="""Build a class instance.
2021
2022 This is the protocol 1 version of protocol 0's INST opcode, and is
2023 very much like it. The major difference is that the class object
2024 is taken off the stack, allowing it to be retrieved from the memo
2025 repeatedly if several instances of the same class are created. This
2026 can be much more efficient (in both time and space) than repeatedly
2027 embedding the module and class names in INST opcodes.
2028
2029 Unlike INST, OBJ takes no arguments from the opcode stream. Instead
2030 the class object is taken off the stack, immediately above the
2031 topmost markobject:
2032
2033 Stack before: ... markobject classobject stackslice
2034 Stack after: ... new_instance_object
2035
2036 As for INST, the remainder of the stack above the markobject is
2037 gathered into an argument tuple, and then the logic seems identical,
Guido van Rossumecb11042003-01-29 06:24:30 +00002038 except that no __safe_for_unpickling__ check is done (XXX this is
Guido van Rossum99603b02007-07-20 00:22:32 +00002039 a bug). See INST for the gory details.
Tim Peters2b93c4c2003-01-30 16:35:08 +00002040
2041 NOTE: In Python 2.3, INST and OBJ are identical except for how they
2042 get the class object. That was always the intent; the implementations
2043 had diverged for accidental reasons.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002044 """),
2045
Tim Petersfdc03462003-01-28 04:56:33 +00002046 I(name='NEWOBJ',
2047 code='\x81',
2048 arg=None,
2049 stack_before=[anyobject, anyobject],
2050 stack_after=[anyobject],
2051 proto=2,
2052 doc="""Build an object instance.
2053
2054 The stack before should be thought of as containing a class
2055 object followed by an argument tuple (the tuple being the stack
2056 top). Call these cls and args. They are popped off the stack,
2057 and the value returned by cls.__new__(cls, *args) is pushed back
2058 onto the stack.
2059 """),
2060
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002061 I(name='NEWOBJ_EX',
2062 code='\x92',
2063 arg=None,
2064 stack_before=[anyobject, anyobject, anyobject],
2065 stack_after=[anyobject],
2066 proto=4,
2067 doc="""Build an object instance.
2068
2069 The stack before should be thought of as containing a class
2070 object followed by an argument tuple and by a keyword argument dict
2071 (the dict being the stack top). Call these cls, args and kwargs. They
2072 are popped off the stack, and the value returned by
2073 cls.__new__(cls, *args, **kwargs) is pushed back onto the stack.
2074 """),
2075
2076 # Machine control.
2077
Tim Petersfdc03462003-01-28 04:56:33 +00002078 I(name='PROTO',
2079 code='\x80',
2080 arg=uint1,
2081 stack_before=[],
2082 stack_after=[],
2083 proto=2,
2084 doc="""Protocol version indicator.
2085
2086 For protocol 2 and above, a pickle must start with this opcode.
2087 The argument is the protocol version, an int in range(2, 256).
2088 """),
2089
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002090 I(name='STOP',
2091 code='.',
2092 arg=None,
2093 stack_before=[anyobject],
2094 stack_after=[],
2095 proto=0,
2096 doc="""Stop the unpickling machine.
2097
2098 Every pickle ends with this opcode. The object at the top of the stack
2099 is popped, and that's the result of unpickling. The stack should be
2100 empty then.
2101 """),
2102
2103 # Framing support.
2104
2105 I(name='FRAME',
2106 code='\x95',
2107 arg=uint8,
2108 stack_before=[],
2109 stack_after=[],
2110 proto=4,
2111 doc="""Indicate the beginning of a new frame.
2112
2113 The unpickler may use this opcode to safely prefetch data from its
2114 underlying stream.
2115 """),
2116
2117 # Ways to deal with persistent IDs.
2118
2119 I(name='PERSID',
2120 code='P',
2121 arg=stringnl_noescape,
2122 stack_before=[],
2123 stack_after=[anyobject],
2124 proto=0,
2125 doc="""Push an object identified by a persistent ID.
2126
2127 The pickle module doesn't define what a persistent ID means. PERSID's
2128 argument is a newline-terminated str-style (no embedded escapes, no
2129 bracketing quote characters) string, which *is* "the persistent ID".
2130 The unpickler passes this string to self.persistent_load(). Whatever
2131 object that returns is pushed on the stack. There is no implementation
2132 of persistent_load() in Python's unpickler: it must be supplied by an
2133 unpickler subclass.
2134 """),
2135
2136 I(name='BINPERSID',
2137 code='Q',
2138 arg=None,
2139 stack_before=[anyobject],
2140 stack_after=[anyobject],
2141 proto=1,
2142 doc="""Push an object identified by a persistent ID.
2143
2144 Like PERSID, except the persistent ID is popped off the stack (instead
2145 of being a string embedded in the opcode bytestream). The persistent
2146 ID is passed to self.persistent_load(), and whatever object that
2147 returns is pushed on the stack. See PERSID for more detail.
2148 """),
2149]
2150del I
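# The stack effects documented above for MARK, DICT and SETITEMS can be
# sketched with a plain Python list. This is only an illustrative toy, not
# how pickle.py implements these opcodes; MARK, pop_to_mark, op_dict and
# op_setitems are hypothetical names introduced for this sketch.

```python
MARK = object()  # hypothetical stand-in for the markobject


def pop_to_mark(stack):
    """Remove the topmost MARK and everything above it; return those items."""
    # Search from the top of the stack for the most recent MARK.
    idx = len(stack) - 1 - stack[::-1].index(MARK)
    items = stack[idx + 1:]
    del stack[idx:]
    return items


def op_dict(stack):
    """DICT: replace 'MARK k1 v1 k2 v2 ...' with a single new dict."""
    items = pop_to_mark(stack)
    stack.append(dict(zip(items[0::2], items[1::2])))


def op_setitems(stack):
    """SETITEMS: add 'MARK k1 v1 ...' to the dict just below the MARK."""
    items = pop_to_mark(stack)
    target = stack[-1]
    for key, value in zip(items[0::2], items[1::2]):
        target[key] = value  # applied in order, as the doc describes


stack = [MARK, 1, 2, 3, 'abc']
op_dict(stack)                    # stack now holds one dict
stack += [MARK, 'k', 'v']
op_setitems(stack)                # the dict is mutated in place
```

# After both calls the stack holds a single dict; the MARK and the
# key/value entries above it are gone.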
2151
2152# Verify uniqueness of .name and .code members.
2153name2i = {}
2154code2i = {}
2155
2156for i, d in enumerate(opcodes):
2157 if d.name in name2i:
2158 raise ValueError("repeated name %r at indices %d and %d" %
2159 (d.name, name2i[d.name], i))
2160 if d.code in code2i:
2161 raise ValueError("repeated code %r at indices %d and %d" %
2162 (d.code, code2i[d.code], i))
2163
2164 name2i[d.name] = i
2165 code2i[d.code] = i
2166
2167del name2i, code2i, i, d
2168
2169##############################################################################
2170# Build a code2op dict, mapping opcode characters to OpcodeInfo records.
2171# Also ensure we've got the same stuff as pickle.py, although the
2172# introspection here is dicey.
2173
2174code2op = {}
2175for d in opcodes:
2176 code2op[d.code] = d
2177del d
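# A short sketch of looking up an opcode record by its one-character code.
# It assumes the module is imported under its usual name, pickletools, and
# that code2op stays reachable as a module attribute (it is not listed in
# __all__, so treat that as an assumption rather than a guarantee).

```python
import pickletools

# '}' is the code given above for EMPTY_DICT.
info = pickletools.code2op['}']

# Each record carries the fields passed to I(...) in the opcode table.
summary = (info.name, info.proto, info.arg)
```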
2178
2179def assure_pickle_consistency(verbose=False):
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002180
2181 copy = code2op.copy()
2182 for name in pickle.__all__:
2183 if not re.match("[A-Z][A-Z0-9_]+$", name):
2184 if verbose:
Guido van Rossumbe19ed72007-02-09 05:37:30 +00002185 print("skipping %r: it doesn't look like an opcode name" % name)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002186 continue
2187 picklecode = getattr(pickle, name)
Guido van Rossum617dbc42007-05-07 23:57:08 +00002188 if not isinstance(picklecode, bytes) or len(picklecode) != 1:
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002189 if verbose:
Guido van Rossumbe19ed72007-02-09 05:37:30 +00002190 print(("skipping %r: value %r doesn't look like a pickle "
2191 "code" % (name, picklecode)))
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002192 continue
Guido van Rossum617dbc42007-05-07 23:57:08 +00002193 picklecode = picklecode.decode("latin-1")
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002194 if picklecode in copy:
2195 if verbose:
Guido van Rossumbe19ed72007-02-09 05:37:30 +00002196 print("checking name %r w/ code %r for consistency" % (
2197 name, picklecode))
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002198 d = copy[picklecode]
2199 if d.name != name:
2200 raise ValueError("for pickle code %r, pickle.py uses name %r "
2201 "but we're using name %r" % (picklecode,
2202 name,
2203 d.name))
2204 # Forget this one. Any left over in copy at the end are a problem
2205 # of a different kind.
2206 del copy[picklecode]
2207 else:
2208 raise ValueError("pickle.py appears to have a pickle opcode with "
2209 "name %r and code %r, but we don't" %
2210 (name, picklecode))
2211 if copy:
2212 msg = ["we appear to have pickle opcodes that pickle.py doesn't have:"]
2213 for code, d in copy.items():
2214 msg.append(" name %r with code %r" % (d.name, code))
2215 raise ValueError("\n".join(msg))
2216
2217assure_pickle_consistency()
Tim Petersc0c12b52003-01-29 00:56:17 +00002218del assure_pickle_consistency
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002219
2220##############################################################################
2221# A pickle opcode generator.
2222
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002223def _genops(data, yield_end_pos=False):
2224 if isinstance(data, bytes_types):
2225 data = io.BytesIO(data)
2226
2227 if hasattr(data, "tell"):
2228 getpos = data.tell
2229 else:
2230 getpos = lambda: None
2231
2232 while True:
2233 pos = getpos()
2234 code = data.read(1)
2235 opcode = code2op.get(code.decode("latin-1"))
2236 if opcode is None:
2237 if code == b"":
2238 raise ValueError("pickle exhausted before seeing STOP")
2239 else:
2240 raise ValueError("at position %s, opcode %r unknown" % (
2241 "<unknown>" if pos is None else pos,
2242 code))
2243 if opcode.arg is None:
2244 arg = None
2245 else:
2246 arg = opcode.arg.reader(data)
2247 if yield_end_pos:
2248 yield opcode, arg, pos, getpos()
2249 else:
2250 yield opcode, arg, pos
2251 if code == b'.':
2252 assert opcode.name == 'STOP'
2253 break
2254
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002255def genops(pickle):
Guido van Rossuma72ded92003-01-27 19:40:47 +00002256 """Generate all the opcodes in a pickle.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002257
2258 'pickle' is a file-like object, or bytes object, containing the pickle.
2259
2260 Each opcode in the pickle is generated, from the current pickle position,
2261 stopping after a STOP opcode is delivered. A triple is generated for
2262 each opcode:
2263
2264 opcode, arg, pos
2265
2266 opcode is an OpcodeInfo record, describing the current opcode.
2267
2268 If the opcode has an argument embedded in the pickle, arg is its decoded
2269 value, as a Python object. If the opcode doesn't have an argument, arg
2270 is None.
2271
2272 If the pickle has a tell() method, pos was the value of pickle.tell()
Guido van Rossum34d19282007-08-09 01:03:29 +00002273 before reading the current opcode. If the pickle is a bytes object,
2274 it's wrapped in a BytesIO object, and the latter's tell() result is
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002275 used. Else (the pickle doesn't have a tell(), and it's not obvious how
2276 to query its current position) pos is None.
2277 """
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002278 return _genops(pickle)
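# A minimal usage sketch for genops(), assuming the module is imported as
# pickletools. The sample object and the choice of protocol 2 are
# illustrative only.

```python
import pickle
import pickletools

data = pickle.dumps([1, 2], protocol=2)

# Each triple is (OpcodeInfo record, decoded argument or None, position).
triples = list(pickletools.genops(data))
names = [opcode.name for opcode, arg, pos in triples]
first_pos = triples[0][2]   # data is bytes, so positions are available
```

# Because data is a bytes object, genops() wraps it in a BytesIO and pos is
# an integer offset; for a stream without tell(), pos would be None.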
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002279
2280##############################################################################
Christian Heimes3feef612008-02-11 06:19:17 +00002281# A pickle optimizer.
2282
2283def optimize(p):
2284 'Optimize a pickle (a bytes object) by removing unused PUT opcodes.'
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002285 not_a_put = object()
2286 gets = { not_a_put } # set of args used by a GET opcode
2287 opcodes = [] # (startpos, stoppos, putid)
2288 proto = 0
2289 for opcode, arg, pos, end_pos in _genops(p, yield_end_pos=True):
Christian Heimes3feef612008-02-11 06:19:17 +00002290 if 'PUT' in opcode.name:
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002291 opcodes.append((pos, end_pos, arg))
2292 elif 'FRAME' in opcode.name:
2293 pass
2294 else:
2295 if 'GET' in opcode.name:
2296 gets.add(arg)
2297 elif opcode.name == 'PROTO':
2298 assert pos == 0, pos
2299 proto = arg
2300 opcodes.append((pos, end_pos, not_a_put))
Christian Heimes3feef612008-02-11 06:19:17 +00002302
2303 # Copy the opcodes except for PUTs without a corresponding GET
2304 out = io.BytesIO()
2305 opcodes = iter(opcodes)
2306 if proto >= 2:
2307 # Write the PROTO header before any framing
2308 start, stop, _ = next(opcodes)
2309 out.write(p[start:stop])
2310 buf = pickle._Framer(out.write)
2311 if proto >= 4:
2312 buf.start_framing()
2313 for start, stop, putid in opcodes:
2314 if putid in gets:
2315 buf.write(p[start:stop])
2316 if proto >= 4:
2317 buf.end_framing()
2318 return out.getvalue()
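# A hedged usage sketch for optimize(), assuming the module is imported as
# pickletools. The sample object contains no shared references, so no GET
# opcode is emitted and every PUT is unused.

```python
import pickle
import pickletools

obj = [1, 2, (3, 4)]
raw = pickle.dumps(obj, protocol=2)
slim = pickletools.optimize(raw)


def count_puts(data):
    """Count the PUT-family opcodes in a pickle."""
    return sum('PUT' in op.name for op, arg, pos in pickletools.genops(data))


puts_before = count_puts(raw)
puts_after = count_puts(slim)
restored = pickle.loads(slim)
```

# The optimized pickle still loads to an equal object; only memo writes
# that nothing reads back have been dropped.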
Christian Heimes3feef612008-02-11 06:19:17 +00002319
2320##############################################################################
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002321# A symbolic pickle disassembler.
2322
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002323def dis(pickle, out=None, memo=None, indentlevel=4, annotate=0):
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002324 """Produce a symbolic disassembly of a pickle.
2325
2326 'pickle' is a file-like object, or bytes object, containing at least one
2327 pickle. The pickle is disassembled from the current position, through
2328 the first STOP opcode encountered.
2329
2330 Optional arg 'out' is a file-like object to which the disassembly is
2331 printed. It defaults to sys.stdout.
2332
Tim Peters62235e72003-02-05 19:55:53 +00002333 Optional arg 'memo' is a Python dict, used as the pickle's memo. It
2334 may be mutated by dis(), if the pickle contains PUT-family or MEMOIZE opcodes.
2335 Passing the same memo object to another dis() call then allows disassembly
2336 to proceed across multiple pickles that were all created by the same
2337 pickler with the same memo. Ordinarily you don't need to worry about this.
2338
Alexander Belopolsky929d3842010-07-17 15:51:21 +00002339 Optional arg 'indentlevel' is the number of blanks by which to indent
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002340 a new MARK level. It defaults to 4.
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002341
2342 Optional arg 'annotate', if nonzero, instructs dis() to add a short
2343 description of the opcode on each line of disassembled output.
2344 The value given to 'annotate' must be an integer and is used as a
2345 hint for the column where annotation should start. The default
2346 value is 0, meaning no annotations.
2347
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002348 In addition to printing the disassembly, some sanity checks are made:
2349
2350 + All embedded opcode arguments "make sense".
2351
2352 + Explicit and implicit pop operations have enough items on the stack.
2353
2354 + When an opcode implicitly refers to a markobject, a markobject is
2355 actually on the stack.
2356
2357 + A memo entry isn't referenced before it's defined.
2358
2359 + The markobject isn't stored in the memo.
2360
2361 + A memo entry isn't redefined.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002362 """
2363
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002364 # Most of the hair here is for sanity checks, but most of it is needed
2365 # anyway to detect when a protocol 0 POP takes a MARK off the stack
2366 # (which in turn is needed to indent MARK blocks correctly).
2367
2368 stack = [] # crude emulation of unpickler stack
Tim Peters62235e72003-02-05 19:55:53 +00002369 if memo is None:
Ezio Melotti30b9d5d2013-08-17 15:50:46 +03002370 memo = {} # crude emulation of unpickler memo
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002371 maxproto = -1 # max protocol number seen
2372 markstack = [] # bytecode positions of MARK opcodes
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002373 indentchunk = ' ' * indentlevel
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002374 errormsg = None
Ezio Melotti30b9d5d2013-08-17 15:50:46 +03002375 annocol = annotate # column hint for annotations
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002376 for opcode, arg, pos in genops(pickle):
2377 if pos is not None:
Guido van Rossumbe19ed72007-02-09 05:37:30 +00002378 print("%5d:" % pos, end=' ', file=out)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002379
Tim Petersd0f7c862003-01-28 15:27:57 +00002380 line = "%-4s %s%s" % (repr(opcode.code)[1:-1],
2381 indentchunk * len(markstack),
2382 opcode.name)
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002383
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002384 maxproto = max(maxproto, opcode.proto)
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002385 before = opcode.stack_before # don't mutate
2386 after = opcode.stack_after # don't mutate
Tim Peters43277d62003-01-30 15:02:12 +00002387 numtopop = len(before)
2388
2389 # See whether a MARK should be popped.
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002390 markmsg = None
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002391 if markobject in before or (opcode.name == "POP" and
2392 stack and
2393 stack[-1] is markobject):
2394 assert markobject not in after
Tim Peters43277d62003-01-30 15:02:12 +00002395 if __debug__:
2396 if markobject in before:
2397 assert before[-1] is stackslice
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002398 if markstack:
2399 markpos = markstack.pop()
2400 if markpos is None:
2401 markmsg = "(MARK at unknown opcode offset)"
2402 else:
2403 markmsg = "(MARK at %d)" % markpos
2404 # Pop everything at and after the topmost markobject.
2405 while stack[-1] is not markobject:
2406 stack.pop()
2407 stack.pop()
Tim Peters43277d62003-01-30 15:02:12 +00002408 # Stop later code from popping too much.
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002409 try:
Tim Peters43277d62003-01-30 15:02:12 +00002410 numtopop = before.index(markobject)
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002411 except ValueError:
2412 assert opcode.name == "POP"
Tim Peters43277d62003-01-30 15:02:12 +00002413 numtopop = 0
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002414 else:
2415 errormsg = markmsg = "no MARK exists on stack"
2416
2417 # Check for correct memo usage.
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002418 if opcode.name in ("PUT", "BINPUT", "LONG_BINPUT", "MEMOIZE"):
2419 if opcode.name == "MEMOIZE":
2420 memo_idx = len(memo)
2421 else:
2422 assert arg is not None
2423 memo_idx = arg
2424 if memo_idx in memo:
2425 errormsg = "memo key %r already defined" % memo_idx
2426 elif not stack:
2427 errormsg = "stack is empty -- can't store into memo"
2428 elif stack[-1] is markobject:
2429 errormsg = "can't store markobject in the memo"
2430 else:
Antoine Pitrouc9dc4a22013-11-23 18:59:12 +01002431 memo[memo_idx] = stack[-1]
Tim Petersc1c2b3e2003-01-29 20:12:21 +00002432 elif opcode.name in ("GET", "BINGET", "LONG_BINGET"):
2433 if arg in memo:
2434 assert len(after) == 1
2435 after = [memo[arg]] # for better stack emulation
2436 else:
2437 errormsg = "memo key %r has never been stored into" % arg
Tim Peters8ecfc8e2003-01-27 18:51:48 +00002438
2439 if arg is not None or markmsg:
2440 # make a mild effort to align arguments
2441 line += ' ' * (10 - len(opcode.name))
2442 if arg is not None:
2443 line += ' ' + repr(arg)
2444 if markmsg:
2445 line += ' ' + markmsg
        if annotate:
            line += ' ' * (annocol - len(line))
            # make a mild effort to align annotations
            annocol = len(line)
            if annocol > 50:
                annocol = annotate
            line += ' ' + opcode.doc.split('\n', 1)[0]
        print(line, file=out)

        if errormsg:
            # Note that we delayed complaining until the offending opcode
            # was printed.
            raise ValueError(errormsg)

        # Emulate the stack effects.
        if len(stack) < numtopop:
            raise ValueError("tries to pop %d items from stack with "
                             "only %d items" % (numtopop, len(stack)))
        if numtopop:
            del stack[-numtopop:]
        if markobject in after:
            assert markobject not in before
            markstack.append(pos)

        stack.extend(after)

    print("highest protocol among opcodes =", maxproto, file=out)
    if stack:
        raise ValueError("stack not empty after STOP: %r" % stack)
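
# Illustrative sketch, not part of the original module: the "highest protocol
# among opcodes" figure that dis() prints can also be computed directly from
# genops(), without emulating the stack.  The helper name
# _example_highest_protocol is hypothetical and exists only for this example.
def _example_highest_protocol(data):
    """Return the largest .proto value among the opcodes in pickle `data`."""
    import pickletools  # imported locally so the sketch is self-contained
    return max(opcode.proto for opcode, arg, pos in pickletools.genops(data))

# For instance, _example_highest_protocol(pickle.dumps([1, 2, 3], 2)) is
# expected to return 2, matching the last line dis() prints for that pickle.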

# For use in the doctest, simply as an example of a class to pickle.
class _Example:
    def __init__(self, value):
        self.value = value

_dis_test = r"""
>>> import pickle
>>> x = [1, 2, (3, 4), {b'abc': "def"}]
>>> pkl0 = pickle.dumps(x, 0)
>>> dis(pkl0)
    0: (    MARK
    1: l        LIST       (MARK at 0)
    2: p    PUT        0
    5: L    LONG       1
    9: a    APPEND
   10: L    LONG       2
   14: a    APPEND
   15: (    MARK
   16: L        LONG       3
   20: L        LONG       4
   24: t        TUPLE      (MARK at 15)
   25: p    PUT        1
   28: a    APPEND
   29: (    MARK
   30: d        DICT       (MARK at 29)
   31: p    PUT        2
   34: c    GLOBAL     '_codecs encode'
   50: p    PUT        3
   53: (    MARK
   54: V        UNICODE    'abc'
   59: p        PUT        4
   62: V        UNICODE    'latin1'
   70: p        PUT        5
   73: t        TUPLE      (MARK at 53)
   74: p    PUT        6
   77: R    REDUCE
   78: p    PUT        7
   81: V    UNICODE    'def'
   86: p    PUT        8
   89: s    SETITEM
   90: a    APPEND
   91: .    STOP
highest protocol among opcodes = 0

Try again with a "binary" pickle.

>>> pkl1 = pickle.dumps(x, 1)
>>> dis(pkl1)
    0: ]    EMPTY_LIST
    1: q    BINPUT     0
    3: (    MARK
    4: K        BININT1    1
    6: K        BININT1    2
    8: (        MARK
    9: K            BININT1    3
   11: K            BININT1    4
   13: t            TUPLE      (MARK at 8)
   14: q        BINPUT     1
   16: }        EMPTY_DICT
   17: q        BINPUT     2
   19: c        GLOBAL     '_codecs encode'
   35: q        BINPUT     3
   37: (        MARK
   38: X            BINUNICODE 'abc'
   46: q            BINPUT     4
   48: X            BINUNICODE 'latin1'
   59: q            BINPUT     5
   61: t            TUPLE      (MARK at 37)
   62: q        BINPUT     6
   64: R        REDUCE
   65: q        BINPUT     7
   67: X        BINUNICODE 'def'
   75: q        BINPUT     8
   77: s        SETITEM
   78: e        APPENDS    (MARK at 3)
   79: .    STOP
highest protocol among opcodes = 1

Exercise the INST/OBJ/BUILD family.

>>> import pickletools
>>> dis(pickle.dumps(pickletools.dis, 0))
    0: c    GLOBAL     'pickletools dis'
   17: p    PUT        0
   20: .    STOP
highest protocol among opcodes = 0

>>> from pickletools import _Example
>>> x = [_Example(42)] * 2
>>> dis(pickle.dumps(x, 0))
    0: (    MARK
    1: l        LIST       (MARK at 0)
    2: p    PUT        0
    5: c    GLOBAL     'copy_reg _reconstructor'
   30: p    PUT        1
   33: (    MARK
   34: c        GLOBAL     'pickletools _Example'
   56: p        PUT        2
   59: c        GLOBAL     '__builtin__ object'
   79: p        PUT        3
   82: N        NONE
   83: t        TUPLE      (MARK at 33)
   84: p    PUT        4
   87: R    REDUCE
   88: p    PUT        5
   91: (    MARK
   92: d        DICT       (MARK at 91)
   93: p    PUT        6
   96: V    UNICODE    'value'
  103: p    PUT        7
  106: L    LONG       42
  111: s    SETITEM
  112: b    BUILD
  113: a    APPEND
  114: g    GET        5
  117: a    APPEND
  118: .    STOP
highest protocol among opcodes = 0

>>> dis(pickle.dumps(x, 1))
    0: ]    EMPTY_LIST
    1: q    BINPUT     0
    3: (    MARK
    4: c        GLOBAL     'copy_reg _reconstructor'
   29: q        BINPUT     1
   31: (        MARK
   32: c            GLOBAL     'pickletools _Example'
   54: q            BINPUT     2
   56: c            GLOBAL     '__builtin__ object'
   76: q            BINPUT     3
   78: N            NONE
   79: t            TUPLE      (MARK at 31)
   80: q        BINPUT     4
   82: R        REDUCE
   83: q        BINPUT     5
   85: }        EMPTY_DICT
   86: q        BINPUT     6
   88: X        BINUNICODE 'value'
   98: q        BINPUT     7
  100: K        BININT1    42
  102: s        SETITEM
  103: b        BUILD
  104: h        BINGET     5
  106: e        APPENDS    (MARK at 3)
  107: .    STOP
highest protocol among opcodes = 1

Try "the canonical" recursive-object test.

>>> L = []
>>> T = L,
>>> L.append(T)
>>> L[0] is T
True
>>> T[0] is L
True
>>> L[0][0] is L
True
>>> T[0][0] is T
True
>>> dis(pickle.dumps(L, 0))
    0: (    MARK
    1: l        LIST       (MARK at 0)
    2: p    PUT        0
    5: (    MARK
    6: g        GET        0
    9: t        TUPLE      (MARK at 5)
   10: p    PUT        1
   13: a    APPEND
   14: .    STOP
highest protocol among opcodes = 0

>>> dis(pickle.dumps(L, 1))
    0: ]    EMPTY_LIST
    1: q    BINPUT     0
    3: (    MARK
    4: h        BINGET     0
    6: t        TUPLE      (MARK at 3)
    7: q    BINPUT     1
    9: a    APPEND
   10: .    STOP
highest protocol among opcodes = 1

Note that, in the protocol 0 pickle of the recursive tuple, the disassembler
has to emulate the stack in order to realize that the POP opcode at 16 gets
rid of the MARK at 0.

>>> dis(pickle.dumps(T, 0))
    0: (    MARK
    1: (        MARK
    2: l            LIST       (MARK at 1)
    3: p        PUT        0
    6: (        MARK
    7: g            GET        0
   10: t            TUPLE      (MARK at 6)
   11: p        PUT        1
   14: a        APPEND
   15: 0        POP
   16: 0        POP        (MARK at 0)
   17: g    GET        1
   20: .    STOP
highest protocol among opcodes = 0

>>> dis(pickle.dumps(T, 1))
    0: (    MARK
    1: ]        EMPTY_LIST
    2: q        BINPUT     0
    4: (        MARK
    5: h            BINGET     0
    7: t            TUPLE      (MARK at 4)
    8: q        BINPUT     1
   10: a        APPEND
   11: 1        POP_MARK   (MARK at 0)
   12: h    BINGET     1
   14: .    STOP
highest protocol among opcodes = 1

Try protocol 2.

>>> dis(pickle.dumps(L, 2))
    0: \x80 PROTO      2
    2: ]    EMPTY_LIST
    3: q    BINPUT     0
    5: h    BINGET     0
    7: \x85 TUPLE1
    8: q    BINPUT     1
   10: a    APPEND
   11: .    STOP
highest protocol among opcodes = 2

>>> dis(pickle.dumps(T, 2))
    0: \x80 PROTO      2
    2: ]    EMPTY_LIST
    3: q    BINPUT     0
    5: h    BINGET     0
    7: \x85 TUPLE1
    8: q    BINPUT     1
   10: a    APPEND
   11: 0    POP
   12: h    BINGET     1
   14: .    STOP
highest protocol among opcodes = 2

Try protocol 3 with annotations:

>>> dis(pickle.dumps(T, 3), annotate=1)
    0: \x80 PROTO      3 Protocol version indicator.
    2: ]    EMPTY_LIST   Push an empty list.
    3: q    BINPUT     0 Store the stack top into the memo.  The stack is not popped.
    5: h    BINGET     0 Read an object from the memo and push it on the stack.
    7: \x85 TUPLE1       Build a one-tuple out of the topmost item on the stack.
    8: q    BINPUT     1 Store the stack top into the memo.  The stack is not popped.
   10: a    APPEND       Append an object to a list.
   11: 0    POP          Discard the top stack item, shrinking the stack by one item.
   12: h    BINGET     1 Read an object from the memo and push it on the stack.
   14: .    STOP         Stop the unpickling machine.
highest protocol among opcodes = 2

"""

_memo_test = r"""
>>> import pickle
>>> import io
>>> f = io.BytesIO()
>>> p = pickle.Pickler(f, 2)
>>> x = [1, 2, 3]
>>> p.dump(x)
>>> p.dump(x)
>>> f.seek(0)
0
>>> memo = {}
>>> dis(f, memo=memo)
    0: \x80 PROTO      2
    2: ]    EMPTY_LIST
    3: q    BINPUT     0
    5: (    MARK
    6: K        BININT1    1
    8: K        BININT1    2
   10: K        BININT1    3
   12: e        APPENDS    (MARK at 5)
   13: .    STOP
highest protocol among opcodes = 2
>>> dis(f, memo=memo)
   14: \x80 PROTO      2
   16: h    BINGET     0
   18: .    STOP
highest protocol among opcodes = 2
"""
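
# Illustrative sketch, not part of the original module: dis() checks memo usage
# while it disassembles, and it reports a problem only after printing the
# offending opcode.  The hand-written pickle below reads memo key 0 without
# ever storing it.  The helper name _example_bad_memo is hypothetical.
def _example_bad_memo():
    """Disassemble a malformed pickle; return (text printed, error message)."""
    import io
    import pickletools  # local imports keep the sketch self-contained
    out = io.StringIO()
    try:
        pickletools.dis(b'g0\n.', out=out)   # GET 0, then STOP
    except ValueError as exc:
        return out.getvalue(), str(exc)
    return out.getvalue(), None

# The returned text should contain the GET line that triggered the error, and
# the message should say that the memo key has never been stored into.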

__test__ = {'disassembler_test': _dis_test,
            'disassembler_memo_test': _memo_test,
           }

def _test():
    import doctest
    return doctest.testmod()

if __name__ == "__main__":
    import sys, argparse
    parser = argparse.ArgumentParser(
        description='disassemble one or more pickle files')
    parser.add_argument(
        'pickle_file', type=argparse.FileType('br'),
        nargs='*', help='the pickle file')
    parser.add_argument(
        '-o', '--output', default=sys.stdout, type=argparse.FileType('w'),
        help='the file where the output should be written')
    parser.add_argument(
        '-m', '--memo', action='store_true',
        help='preserve memo between disassemblies')
    parser.add_argument(
        '-l', '--indentlevel', default=4, type=int,
        help='the number of blanks by which to indent a new MARK level')
    parser.add_argument(
        '-a', '--annotate',  action='store_true',
        help='annotate each line with a short opcode description')
    parser.add_argument(
        '-p', '--preamble', default="==> {name} <==",
        help='if more than one pickle file is specified, print this before'
        ' each disassembly')
    parser.add_argument(
        '-t', '--test', action='store_true',
        help='run self-test suite')
    parser.add_argument(
        '-v', action='store_true',
        help='run verbosely; only affects self-test run')
    args = parser.parse_args()
    if args.test:
        _test()
    else:
        annotate = 30 if args.annotate else 0
        if not args.pickle_file:
            parser.print_help()
        elif len(args.pickle_file) == 1:
            dis(args.pickle_file[0], args.output, None,
                args.indentlevel, annotate)
        else:
            memo = {} if args.memo else None
            for f in args.pickle_file:
                preamble = args.preamble.format(name=f.name)
                args.output.write(preamble + '\n')
                dis(f, args.output, memo, args.indentlevel, annotate)