Blame - Lib/pickletools.py - platform/external/python/cpython3

blob: d41bada04242ca13d03cd9eccf7bd3227b0f720a

""""Executable documentation" for the pickle module.

Extensive comments about the pickle protocols and pickle-machine opcodes
can be found here. Some functions meant for external use:

genops(pickle)
   Generate all the opcodes in a pickle, as (opcode, arg, position) triples.

dis(pickle, out=None, indentlevel=4)
   Print a symbolic disassembly of a pickle.
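For instance, assuming a modern Python 3 interpreter (this 2003 file itself targets Python 2.3), genops and dis from the standard pickletools module can be run on a small protocol-0 pickle. The variable names are only illustrative:

```python
import pickle
import pickletools

data = pickle.dumps([1, 2], protocol=0)

# genops yields (opcode descriptor, decoded argument or None, byte offset).
for opcode, arg, pos in pickletools.genops(data):
    print(pos, opcode.name, arg)

# dis prints an annotated, human-readable listing of the same opcodes.
pickletools.dis(data)
```

Each triple pairs an opcode descriptor with its decoded argument (None when the opcode takes none) and the offset at which the opcode starts.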
"""

# Other ideas:
#
# - A pickle verifier: read a pickle and check it exhaustively for
#   well-formedness.
#
# - A protocol identifier: examine a pickle and return its protocol number
#   (== the highest .proto attr value among all the opcodes in the pickle).
#
# - A pickle optimizer: for example, tuple-building code is sometimes more
#   elaborate than necessary, catering for the possibility that the tuple
#   is recursive. Or lots of times a PUT is generated that's never accessed
#   by a later GET.


"""
"A pickle" is a program for a virtual pickle machine (PM, but more accurately
called an unpickling machine). It's a sequence of opcodes, interpreted by the
PM, building an arbitrarily complex Python object.

For the most part, the PM is very simple: there are no looping, testing, or
conditional instructions, no arithmetic and no function calls. Opcodes are
executed once each, from first to last, until a STOP opcode is reached.

The PM has two data areas, "the stack" and "the memo".

Many opcodes push Python objects onto the stack; e.g., INT pushes a Python
integer object on the stack, whose value is read from a decimal string
literal immediately following the INT opcode in the pickle bytestream. Other
opcodes take Python objects off the stack. The result of unpickling is
whatever object is left on the stack when the final STOP opcode is executed.

The memo is simply an array of objects, or it can be implemented as a dict
mapping little integers to objects. The memo serves as the PM's "long term
memory", and the little integers indexing the memo are akin to variable
names. Some opcodes copy the object on top of the stack into the memo at a
given index, and others push a memo object at a given index onto the stack
again.
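A toy interpreter makes the two data areas concrete. This is a minimal Python 3 sketch, not the real unpickler: the function name toy_unpickle and the MARK sentinel are hypothetical, and only seven protocol-0 opcodes are handled (MARK, INT, LIST, APPEND, PUT, GET, STOP), each a single byte optionally followed by a newline-terminated decimal argument.

```python
import io

MARK = object()  # hypothetical sentinel standing in for the PM's mark object

def toy_unpickle(data):
    # Interpret a tiny subset of protocol-0 opcodes with a stack and a memo.
    f = io.BytesIO(data)
    stack, memo = [], {}
    while True:
        op = f.read(1)
        if op == b"(":                  # MARK: push the mark sentinel
            stack.append(MARK)
        elif op == b"I":                # INT: decimal literal up to the newline
            stack.append(int(f.readline()))
        elif op == b"l":                # LIST: gather everything above the topmost mark
            k = max(i for i, x in enumerate(stack) if x is MARK)
            items = stack[k + 1:]
            del stack[k:]               # drop the items and the mark itself
            stack.append(items)
        elif op == b"a":                # APPEND: pop a value onto the list below it
            value = stack.pop()
            stack[-1].append(value)
        elif op == b"p":                # PUT: copy the top of the stack into the memo
            memo[int(f.readline())] = stack[-1]
        elif op == b"g":                # GET: push a memo entry again
            stack.append(memo[int(f.readline())])
        elif op == b".":                # STOP: the result is the object on top
            return stack.pop()
        else:
            raise ValueError("opcode %r not handled by this sketch" % op)
```

Fed the bytes a Python 3 protocol-0 pickler produces for [a, a] with a = [7], the sketch returns a list whose two elements are the same object: the second reference is encoded as a GET of the memo slot filled by the first PUT, not as a second copy.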

At heart, that's all the PM has. Subtleties arise for these reasons:

+ Object identity. Objects can be arbitrarily complex, and subobjects
  may be shared (for example, the list [a, a] refers to the same object a
  twice). It can be vital that unpickling recreate an isomorphic object
  graph, faithfully reproducing sharing.

+ Recursive objects. For example, after "L = []; L.append(L)", L is a
  list, and L[0] is the same list. This is related to the object identity
  point, and some sequences of pickle opcodes are subtle in order to
  get the right result in all cases.

+ Things pickle doesn't know everything about. Examples of things pickle
  does know everything about are Python's builtin scalar and container
  types, like ints and tuples. They generally have opcodes dedicated to
  them. For things like module references and instances of user-defined
  classes, pickle's knowledge is limited. Historically, many enhancements
  have been made to the pickle protocol in order to do a better (faster,
  and/or more compact) job on those.

+ Backward compatibility and micro-optimization. As explained below,
  pickle opcodes never go away, not even when better ways to do a thing
  get invented. The repertoire of the PM just keeps growing over time.
  For example, protocol 0 had two opcodes for building Python integers (INT
  and LONG), protocol 1 added three more for more-efficient pickling of short
  integers, and protocol 2 added two more for more-efficient pickling of
  long integers (before protocol 2, the only ways to pickle a Python long
  took time quadratic in the number of digits, for both pickling and
  unpickling). "Opcode bloat" isn't so much a subtlety as a source of
  wearying complication.
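Both the identity and the recursion points can be checked against Python 3's pickle module, used here as an assumed reference implementation. The round trips succeed because the pickler memoizes each list with a PUT (BINPUT at protocol 2) the first time it saves it, and refers back to it with a GET (BINGET) afterwards:

```python
import pickle
import pickletools

# Object identity: [a, a] comes back as one new list referenced twice.
a = [1, 2]
pair = pickle.loads(pickle.dumps([a, a], protocol=2))
assert pair[0] is pair[1] and pair[0] is not a

# Recursion: L contains itself, and so does the unpickled copy.
L = []
L.append(L)
M = pickle.loads(pickle.dumps(L, protocol=2))
assert M[0] is M

# The opcodes behind it: BINPUT memoizes the list, BINGET fetches it back.
ops = [op.name for op, arg, pos in pickletools.genops(pickle.dumps(L, protocol=2))]
print(ops)
```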


Pickle protocols:

For compatibility, the meaning of a pickle opcode never changes. Instead new
pickle opcodes get added, and each version's unpickler can handle all the
pickle opcodes in all protocol versions to date. So old pickles continue to
be readable forever. The pickler can generally be told to restrict itself to
the subset of opcodes available under previous protocol versions too, so that
users can create pickles under the current version readable by older
versions. However, a pickle does not contain its version number embedded
within it. If an older unpickler tries to read a pickle using a later
protocol, the result is most likely an exception due to seeing an unknown (in
the older unpickler) opcode.
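The "protocol identifier" idea listed near the top of this file can be sketched with Python 3's pickletools, whose opcode descriptors carry the same proto attribute as the OpcodeInfo records below. The helper name lowest_protocol is hypothetical, and its result is only a lower bound: it reports the oldest protocol whose unpickler knows every opcode present, not necessarily the protocol the pickler was asked to use.

```python
import pickle
import pickletools

def lowest_protocol(data):
    # Highest .proto among the opcodes: the oldest protocol able to read them all.
    return max(op.proto for op, arg, pos in pickletools.genops(data))

for proto in (0, 1, 2):
    print(proto, lowest_protocol(pickle.dumps([1, 2], protocol=proto)))
```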

The original pickle used what's now called "protocol 0", and what was called
"text mode" before Python 2.3. The entire pickle bytestream is made up of
printable 7-bit ASCII characters, plus the newline character, in protocol 0.
That's why it was called text mode. Protocol 0 is small and elegant, but
sometimes painfully inefficient.
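The text-mode property is easy to test on a given pickle. In this Python 3 sketch, is_text_mode is a hypothetical helper. The property holds for the ASCII-only object used here, but Python 3 writes protocol-0 strings with the raw-unicode-escape codec, so a character such as 'é' can come out as a single non-ASCII byte.

```python
import pickle

def is_text_mode(data):
    # True when every byte is printable 7-bit ASCII or a newline (byte 10).
    return all(32 <= b < 127 or b == 10 for b in data)

obj = {"spam": [1, 2.5, None, (3, 4)]}
print(is_text_mode(pickle.dumps(obj, protocol=0)))  # True
print(is_text_mode(pickle.dumps(obj, protocol=1)))  # False: binary arguments appear
```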

The second major set of additions is now called "protocol 1", and was called
"binary mode" before Python 2.3. This added many opcodes with arguments
consisting of arbitrary bytes, including NUL bytes and unprintable "high bit"
bytes. Binary mode pickles can be substantially smaller than equivalent
text mode pickles, and sometimes faster too; e.g., BININT represents a 4-byte
int as 4 bytes following the opcode, which is cheaper to unpickle than the
(perhaps) 11-character decimal string attached to INT. Protocol 1 also added
a number of opcodes that operate on many stack elements at once (like APPENDS
and SETITEMS), and "shortcut" opcodes (like EMPTY_DICT and EMPTY_TUPLE).
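The size difference shows up when both encodings of the most negative 4-byte int are built by hand. This Python 3 sketch assumes the standard one-byte opcode codes 'I' for INT, 'J' for BININT and '.' for STOP, and uses pickle.loads only to confirm that both spellings decode to the same value:

```python
import pickle
import struct

n = -2**31                                     # 11 characters in decimal
text = b"I" + str(n).encode("ascii") + b"\n"   # INT: decimal literal plus newline
binary = b"J" + struct.pack("<i", n)           # BININT: 4 bytes, little-endian

print(len(text), len(binary))                  # 13 5 (opcode byte included)

# Both decode to the same integer once a STOP opcode is appended.
assert pickle.loads(text + b".") == pickle.loads(binary + b".") == n
```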

The third major set of additions came in Python 2.3, and is called "protocol
2". This added:

- A better way to pickle instances of new-style classes (NEWOBJ).

- A way for a pickle to identify its protocol (PROTO).

- Time- and space-efficient pickling of long ints (LONG{1,4}).

- Shortcuts for small tuples (TUPLE{1,2,3}).

- Dedicated opcodes for bools (NEWTRUE, NEWFALSE).

- The "extension registry", a vector of popular objects that can be pushed
  efficiently by index (EXT{1,2,4}). This is akin to the memo and GET, but
  the registry contents are predefined (there's nothing akin to the memo's
  PUT).
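A rough check of which of these opcodes a protocol-2 pickler actually emits, assuming Python 3's pickle and pickletools modules (the helper opcode_names and the other names are illustrative). The last lines show that LONG1's payload, a little-endian two's-complement byte string, can be decoded in Python 3 with int.from_bytes; the expected values match the decode_long doctests further down.

```python
import pickle
import pickletools

def opcode_names(data):
    # Set of opcode names appearing anywhere in a pickle.
    return {op.name for op, arg, pos in pickletools.genops(data)}

obj = (True, 2**100)
new_in_2 = {"PROTO", "NEWTRUE", "LONG1", "TUPLE2"}
print(sorted(new_in_2 & opcode_names(pickle.dumps(obj, protocol=2))))  # all four
print(sorted(new_in_2 & opcode_names(pickle.dumps(obj, protocol=1))))  # []

# LONG1 payloads are little-endian two's complement, as decode_long expects.
for payload, expected in [(bytes([0xFF, 0x00]), 255),
                          (bytes([0x00, 0x80]), -32768),
                          (bytes([0x80]), -128),
                          (b"", 0)]:
    assert int.from_bytes(payload, "little", signed=True) == expected
```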
"""

# Meta-rule: Descriptions are stored in instances of descriptor objects,
# with plain constructors. No meta-language is defined from which
# descriptors could be constructed. If you want, e.g., XML, write a little
# program to generate XML from the objects.

##############################################################################
# Some pickle opcodes have an argument, following the opcode in the
# bytestream. An argument is of a specific type, described by an instance
# of ArgumentDescriptor. These are not to be confused with arguments taken
# off the stack -- ArgumentDescriptor applies only to arguments embedded in
# the opcode stream, immediately following an opcode.

# Represents the number of bytes consumed by an argument delimited by the
# next newline character.
UP_TO_NEWLINE = -1

# Represents the number of bytes consumed by a two-argument opcode where
# the first argument gives the number of bytes in the second argument.
TAKEN_FROM_ARGUMENT1 = -2   # num bytes is 1-byte unsigned int
TAKEN_FROM_ARGUMENT4 = -3   # num bytes is 4-byte signed little-endian int

class ArgumentDescriptor(object):
    __slots__ = (
        # name of descriptor record, also a module global name; a string
        'name',

        # length of argument, in bytes; an int; UP_TO_NEWLINE and
        # TAKEN_FROM_ARGUMENT{1,4} are negative values for variable-length
        # cases
        'n',

        # a function taking a file-like object, reading this kind of argument
        # from the object at the current position, advancing the current
        # position by n bytes, and returning the value of the argument
        'reader',

        # human-readable docs for this arg descriptor; a string
        'doc',
    )

    def __init__(self, name, n, reader, doc):
        assert isinstance(name, str)
        self.name = name

        assert isinstance(n, int) and (n >= 0 or
                                       n in (UP_TO_NEWLINE,
                                             TAKEN_FROM_ARGUMENT1,
                                             TAKEN_FROM_ARGUMENT4))
        self.n = n

        self.reader = reader

        assert isinstance(doc, str)
        self.doc = doc

from struct import unpack as _unpack

def read_uint1(f):
    """
    >>> import StringIO
    >>> read_uint1(StringIO.StringIO('\\xff'))
    255
    """

    data = f.read(1)
    if data:
        return ord(data)
    raise ValueError("not enough data in stream to read uint1")

uint1 = ArgumentDescriptor(
            name='uint1',
            n=1,
            reader=read_uint1,
            doc="One-byte unsigned integer.")


def read_uint2(f):
    """
    >>> import StringIO
    >>> read_uint2(StringIO.StringIO('\\xff\\x00'))
    255
    >>> read_uint2(StringIO.StringIO('\\xff\\xff'))
    65535
    """

    data = f.read(2)
    if len(data) == 2:
        return _unpack("<H", data)[0]
    raise ValueError("not enough data in stream to read uint2")

uint2 = ArgumentDescriptor(
            name='uint2',
            n=2,
            reader=read_uint2,
            doc="Two-byte unsigned integer, little-endian.")


def read_int4(f):
    """
    >>> import StringIO
    >>> read_int4(StringIO.StringIO('\\xff\\x00\\x00\\x00'))
    255
    >>> read_int4(StringIO.StringIO('\\x00\\x00\\x00\\x80')) == -(2**31)
    True
    """

    data = f.read(4)
    if len(data) == 4:
        return _unpack("<i", data)[0]
    raise ValueError("not enough data in stream to read int4")

int4 = ArgumentDescriptor(
           name='int4',
           n=4,
           reader=read_int4,
           doc="Four-byte signed integer, little-endian, 2's complement.")


def read_stringnl(f, decode=True, stripquotes=True):
    """
    >>> import StringIO
    >>> read_stringnl(StringIO.StringIO("'abcd'\\nefg\\n"))
    'abcd'

    >>> read_stringnl(StringIO.StringIO("\\n"))
    Traceback (most recent call last):
    ...
    ValueError: no string quotes around ''

    >>> read_stringnl(StringIO.StringIO("\\n"), stripquotes=False)
    ''

    >>> read_stringnl(StringIO.StringIO("''\\n"))
    ''

    >>> read_stringnl(StringIO.StringIO('"abcd"'))
    Traceback (most recent call last):
    ...
    ValueError: no newline found when trying to read stringnl

    Embedded escapes are undone in the result.
    >>> read_stringnl(StringIO.StringIO("'a\\\\nb\\x00c\\td'\\n'e'"))
    'a\\nb\\x00c\\td'
    """

    data = f.readline()
    if not data.endswith('\n'):
        raise ValueError("no newline found when trying to read stringnl")
    data = data[:-1]    # lose the newline

    if stripquotes:
        for q in "'\"":
            if data.startswith(q):
                if not data.endswith(q):
                    raise ValueError("string quote %r not found at both "
                                     "ends of %r" % (q, data))
                data = data[1:-1]
                break
        else:
            raise ValueError("no string quotes around %r" % data)

    # I'm not sure when 'string_escape' was added to the std codecs; it's
    # crazy not to use it if it's there.
    if decode:
        data = data.decode('string_escape')
    return data

stringnl = ArgumentDescriptor(
               name='stringnl',
               n=UP_TO_NEWLINE,
               reader=read_stringnl,
               doc="""A newline-terminated string.

                   This is a repr-style string, with embedded escapes, and
                   bracketing quotes.
                   """)

def read_stringnl_noescape(f):
    return read_stringnl(f, decode=False, stripquotes=False)

stringnl_noescape = ArgumentDescriptor(
                        name='stringnl_noescape',
                        n=UP_TO_NEWLINE,
                        reader=read_stringnl_noescape,
                        doc="""A newline-terminated string.

                        This is a str-style string, without embedded escapes,
                        or bracketing quotes. It should consist solely of
                        printable ASCII characters.
                        """)

def read_stringnl_noescape_pair(f):
    """
    >>> import StringIO
    >>> read_stringnl_noescape_pair(StringIO.StringIO("Queue\\nEmpty\\njunk"))
    'Queue Empty'
    """

    return "%s %s" % (read_stringnl_noescape(f), read_stringnl_noescape(f))

stringnl_noescape_pair = ArgumentDescriptor(
                             name='stringnl_noescape_pair',
                             n=UP_TO_NEWLINE,
                             reader=read_stringnl_noescape_pair,
                             doc="""A pair of newline-terminated strings.

                             These are str-style strings, without embedded
                             escapes, or bracketing quotes. They should
                             consist solely of printable ASCII characters.
                             The pair is returned as a single string, with
                             a single blank separating the two strings.
                             """)

def read_string4(f):
    """
    >>> import StringIO
    >>> read_string4(StringIO.StringIO("\\x00\\x00\\x00\\x00abc"))
    ''
    >>> read_string4(StringIO.StringIO("\\x03\\x00\\x00\\x00abcdef"))
    'abc'
    >>> read_string4(StringIO.StringIO("\\x00\\x00\\x00\\x03abcdef"))
    Traceback (most recent call last):
    ...
    ValueError: expected 50331648 bytes in a string4, but only 6 remain
    """

    n = read_int4(f)
    if n < 0:
        raise ValueError("string4 byte count < 0: %d" % n)
    data = f.read(n)
    if len(data) == n:
        return data
    raise ValueError("expected %d bytes in a string4, but only %d remain" %
                     (n, len(data)))

string4 = ArgumentDescriptor(
              name="string4",
              n=TAKEN_FROM_ARGUMENT4,
              reader=read_string4,
              doc="""A counted string.

              The first argument is a 4-byte little-endian signed int giving
              the number of bytes in the string, and the second argument is
              that many bytes.
              """)


def read_string1(f):
    """
    >>> import StringIO
    >>> read_string1(StringIO.StringIO("\\x00"))
    ''
    >>> read_string1(StringIO.StringIO("\\x03abcdef"))
    'abc'
    """

    n = read_uint1(f)
    assert n >= 0
    data = f.read(n)
    if len(data) == n:
        return data
    raise ValueError("expected %d bytes in a string1, but only %d remain" %
                     (n, len(data)))

string1 = ArgumentDescriptor(
              name="string1",
              n=TAKEN_FROM_ARGUMENT1,
              reader=read_string1,
              doc="""A counted string.

              The first argument is a 1-byte unsigned int giving the number
              of bytes in the string, and the second argument is that many
              bytes.
              """)


def read_unicodestringnl(f):
    """
    >>> import StringIO
    >>> read_unicodestringnl(StringIO.StringIO("abc\\uabcd\\njunk"))
    u'abc\\uabcd'
    """

    data = f.readline()
    if not data.endswith('\n'):
        raise ValueError("no newline found when trying to read "
                         "unicodestringnl")
    data = data[:-1]    # lose the newline
    return unicode(data, 'raw-unicode-escape')

unicodestringnl = ArgumentDescriptor(
                      name='unicodestringnl',
                      n=UP_TO_NEWLINE,
                      reader=read_unicodestringnl,
                      doc="""A newline-terminated Unicode string.

                      This is raw-unicode-escape encoded, so consists of
                      printable ASCII characters, and may contain embedded
                      escape sequences.
                      """)

def read_unicodestring4(f):
    """
    >>> import StringIO
    >>> s = u'abcd\\uabcd'
    >>> enc = s.encode('utf-8')
    >>> enc
    'abcd\\xea\\xaf\\x8d'
    >>> n = chr(len(enc)) + chr(0) * 3  # little-endian 4-byte length
    >>> t = read_unicodestring4(StringIO.StringIO(n + enc + 'junk'))
    >>> s == t
    True

    >>> read_unicodestring4(StringIO.StringIO(n + enc[:-1]))
    Traceback (most recent call last):
    ...
    ValueError: expected 7 bytes in a unicodestring4, but only 6 remain
    """

    n = read_int4(f)
    if n < 0:
        raise ValueError("unicodestring4 byte count < 0: %d" % n)
    data = f.read(n)
    if len(data) == n:
        return unicode(data, 'utf-8')
    raise ValueError("expected %d bytes in a unicodestring4, but only %d "
                     "remain" % (n, len(data)))

unicodestring4 = ArgumentDescriptor(
                     name="unicodestring4",
                     n=TAKEN_FROM_ARGUMENT4,
                     reader=read_unicodestring4,
                     doc="""A counted Unicode string.

                     The first argument is a 4-byte little-endian signed int
                     giving the number of bytes in the string, and the second
                     argument -- the UTF-8 encoding of the Unicode string --
                     contains that many bytes.
                     """)


def read_decimalnl_short(f):
    """
    >>> import StringIO
    >>> read_decimalnl_short(StringIO.StringIO("1234\\n56"))
    1234

    >>> read_decimalnl_short(StringIO.StringIO("1234L\\n56"))
    Traceback (most recent call last):
    ...
    ValueError: trailing 'L' not allowed in '1234L'
    """

    s = read_stringnl(f, decode=False, stripquotes=False)
    if s.endswith("L"):
        raise ValueError("trailing 'L' not allowed in %r" % s)

    # It's not necessarily true that the result fits in a Python short int:
    # the pickle may have been written on a 64-bit box. There's also a hack
    # for True and False here.
    if s == "00":
        return False
    elif s == "01":
        return True

    try:
        return int(s)
    except OverflowError:
        return long(s)

def read_decimalnl_long(f):
    """
    >>> import StringIO

    >>> read_decimalnl_long(StringIO.StringIO("1234\\n56"))
    Traceback (most recent call last):
    ...
    ValueError: trailing 'L' required in '1234'

    Someday the trailing 'L' will probably go away from this output.

    >>> read_decimalnl_long(StringIO.StringIO("1234L\\n56"))
    1234L

    >>> read_decimalnl_long(StringIO.StringIO("123456789012345678901234L\\n6"))
    123456789012345678901234L
    """

    s = read_stringnl(f, decode=False, stripquotes=False)
    if not s.endswith("L"):
        raise ValueError("trailing 'L' required in %r" % s)
    return long(s)


decimalnl_short = ArgumentDescriptor(
                      name='decimalnl_short',
                      n=UP_TO_NEWLINE,
                      reader=read_decimalnl_short,
                      doc="""A newline-terminated decimal integer literal.

                      This never has a trailing 'L', and the integer fit
                      in a short Python int on the box where the pickle
                      was written -- but there's no guarantee it will fit
                      in a short Python int on the box where the pickle
                      is read.
                      """)

decimalnl_long = ArgumentDescriptor(
                     name='decimalnl_long',
                     n=UP_TO_NEWLINE,
                     reader=read_decimalnl_long,
                     doc="""A newline-terminated decimal integer literal.

                     This has a trailing 'L', and can represent integers
                     of any size.
                     """)


def read_floatnl(f):
    """
    >>> import StringIO
    >>> read_floatnl(StringIO.StringIO("-1.25\\n6"))
    -1.25
    """
    s = read_stringnl(f, decode=False, stripquotes=False)
    return float(s)

floatnl = ArgumentDescriptor(
              name='floatnl',
              n=UP_TO_NEWLINE,
              reader=read_floatnl,
              doc="""A newline-terminated decimal floating literal.

              In general this requires 17 significant digits for roundtrip
              identity, and pickling then unpickling infinities, NaNs, and
              minus zero doesn't work across boxes, or on some boxes even
              on itself (e.g., Windows can't read the strings it produces
              for infinities or NaNs).
              """)

def read_float8(f):
    """
    >>> import StringIO, struct
    >>> raw = struct.pack(">d", -1.25)
    >>> raw
    '\\xbf\\xf4\\x00\\x00\\x00\\x00\\x00\\x00'
    >>> read_float8(StringIO.StringIO(raw + "\\n"))
    -1.25
    """

    data = f.read(8)
    if len(data) == 8:
        return _unpack(">d", data)[0]
    raise ValueError("not enough data in stream to read float8")


float8 = ArgumentDescriptor(
             name='float8',
             n=8,
             reader=read_float8,
             doc="""An 8-byte binary representation of a float, big-endian.

             The format is unique to Python, and shared with the struct
             module (format string '>d') "in theory" (the struct and cPickle
             implementations don't share the code -- they should). It's
             strongly related to the IEEE-754 double format, and, on boxes
             that use 754 doubles (the normal case), is in fact identical to
             the big-endian 754 double format. On other boxes the dynamic
             range is limited to that of a 754 double, and "add a half and
             chop" rounding is used to reduce the precision to 53 bits.
             However, even on a 754 box, infinities, NaNs, and minus zero
             may not be handled correctly (may not survive roundtrip
             pickling intact).
             """)

# Protocol 2 formats

def decode_long(data):
    r"""Decode a long from a two's complement little-endian binary string.
    >>> decode_long("\xff\x00")
    255L
    >>> decode_long("\xff\x7f")
    32767L
    >>> decode_long("\x00\xff")
    -256L
    >>> decode_long("\x00\x80")
    -32768L
    >>> decode_long("\x80")
    -128L
    >>> decode_long("\x7f")
    127L
    """
    x = 0L
    i = 0L
    for c in data:
        x |= long(ord(c)) << i
        i += 8L
    if data and ord(c) >= 0x80:
        x -= 1L << i
    return x

def read_long1(f):
    r"""
    >>> import StringIO
    >>> read_long1(StringIO.StringIO("\x02\xff\x00"))
    255L
    >>> read_long1(StringIO.StringIO("\x02\xff\x7f"))
    32767L
    >>> read_long1(StringIO.StringIO("\x02\x00\xff"))
    -256L
    >>> read_long1(StringIO.StringIO("\x02\x00\x80"))
    -32768L
    >>>
    """

    n = read_uint1(f)
    data = f.read(n)
    if len(data) != n:
        raise ValueError("not enough data in stream to read long1")
    return decode_long(data)

long1 = ArgumentDescriptor(
            name="long1",
            n=TAKEN_FROM_ARGUMENT1,
            reader=read_long1,
            doc="""A binary long, little-endian, using 1-byte size.

            This first reads one byte as an unsigned size, then reads that
            many bytes and interprets them as a little-endian 2's-complement long.
            """)

def read_long4(f):
    r"""
    >>> import StringIO
    >>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\xff\x00"))
    255L
    >>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\xff\x7f"))
    32767L
    >>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\x00\xff"))
    -256L
    >>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\x00\x80"))
    -32768L
    >>>
    """

    n = read_int4(f)
    if n < 0:
        raise ValueError("long4 byte count < 0: %d" % n)
    data = f.read(n)
    if len(data) != n:
        raise ValueError("not enough data in stream to read long4")
    return decode_long(data)

long4 = ArgumentDescriptor(
            name="long4",
            n=TAKEN_FROM_ARGUMENT4,
            reader=read_long4,
            doc="""A binary representation of a long, little-endian.

            This first reads four bytes as a signed size (but requires the
            size to be >= 0), then reads that many bytes and interprets them
            as a little-endian 2's-complement long.
            """)


Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	694	##############################################################################
				695	# Object descriptors. The stack used by the pickle machine holds objects,
				696	# and in the stack_before and stack_after attributes of OpcodeInfo
				697	# descriptors we need names to describe the various types of objects that can
				698	# appear on the stack.
				699
				700	class StackObject(object):
				701	__slots__ = (
				702	# name of descriptor record, for info only
				703	'name',
				704
				705	# type of object, or tuple of type objects (meaning the object can
				706	# be of any type in the tuple)
				707	'obtype',
				708
				709	# human-readable docs for this kind of stack object; a string
				710	'doc',
				711	)
				712
				713	def __init__(self, name, obtype, doc):
				714	assert isinstance(name, str)
				715	self.name = name
				716
				717	assert isinstance(obtype, type) or isinstance(obtype, tuple)
				718	if isinstance(obtype, tuple):
				719	for contained in obtype:
				720	assert isinstance(contained, type)
				721	self.obtype = obtype
				722
				723	assert isinstance(doc, str)
				724	self.doc = doc
				725
				726
				727	pyint = StackObject(
				728	name='int',
				729	obtype=int,
				730	doc="A short (as opposed to long) Python integer object.")
				731
				732	pylong = StackObject(
				733	name='long',
				734	obtype=long,
				735	doc="A long (as opposed to short) Python integer object.")
				736
				737	pyinteger_or_bool = StackObject(
				738	name='int_or_bool',
				739	obtype=(int, long, bool),
				740	doc="A Python integer object (short or long), or "
				741	"a Python bool.")
				742
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	743	pybool = StackObject(
				744	name='bool',
				745	obtype=(bool,),
				746	doc="A Python bool object.")
				747
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	748	pyfloat = StackObject(
				749	name='float',
				750	obtype=float,
				751	doc="A Python float object.")
				752
				753	pystring = StackObject(
				754	name='str',
				755	obtype=str,
				756	doc="A Python string object.")
				757
				758	pyunicode = StackObject(
				759	name='unicode',
				760	obtype=unicode,
				761	doc="A Python Unicode string object.")
				762
				763	pynone = StackObject(
				764	name="None",
				765	obtype=type(None),
				766	doc="The Python None object.")
				767
				768	pytuple = StackObject(
				769	name="tuple",
				770	obtype=tuple,
				771	doc="A Python tuple object.")
				772
				773	pylist = StackObject(
				774	name="list",
				775	obtype=list,
				776	doc="A Python list object.")
				777
				778	pydict = StackObject(
				779	name="dict",
				780	obtype=dict,
				781	doc="A Python dict object.")
				782
				783	anyobject = StackObject(
				784	name='any',
				785	obtype=object,
				786	doc="Any kind of object whatsoever.")
				787
				788	markobject = StackObject(
				789	name="mark",
				790	obtype=StackObject,
				791	doc="""'The mark' is a unique object.
				792
				793	Opcodes that operate on a variable number of objects
				794	generally don't embed the count of objects in the opcode,
				795	or pull it off the stack. Instead the MARK opcode is used
				796	to push a special marker object on the stack, and then
				797	some other opcodes grab all the objects from the top of
				798	the stack down to (but not including) the topmost marker
				799	object.
				800	""")
				801
				802	stackslice = StackObject(
				803	name="stackslice",
				804	obtype=StackObject,
				805	doc="""An object representing a contiguous slice of the stack.
				806
				807	This is used in conjunction with markobject, to represent all
				808	of the stack following the topmost markobject. For example,
				809	the POP_MARK opcode changes the stack from
				810
				811	[..., markobject, stackslice]
				812	to
				813	[...]
				814
				815	No matter how many objects are on the stack after the topmost
				816	markobject, POP_MARK gets rid of all of them (including the
				817	topmost markobject itself).
				818	""")
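A minimal sketch of how an unpickler might implement the markobject/stackslice convention. It assumes a plain Python list as the stack and a private sentinel as the mark; `MARK` and `pop_mark` are illustrative names, not the pickle module's API.

```python
# Hypothetical sentinel standing in for "the mark"; any unique object works.
MARK = object()


def pop_mark(stack):
    """Remove and return the stackslice above the topmost mark.

    The mark itself is removed as well, as POP_MARK does.
    """
    # Scan from the top of the stack downward for the topmost mark.
    for i in range(len(stack) - 1, -1, -1):
        if stack[i] is MARK:
            stackslice = stack[i + 1:]
            del stack[i:]  # drop the mark and everything above it
            return stackslice
    raise ValueError("no mark on the stack")


stack = ["below", MARK, 1, 2, 3]
print(pop_mark(stack), stack)  # [1, 2, 3] ['below']
```

LIST, TUPLE, DICT, APPENDS, and SETITEMS can all be read as "pop_mark, then do something with the slice".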
				819
				820	##############################################################################
				821	# Descriptors for pickle opcodes.
				822
				823	class OpcodeInfo(object):
				824
				825	__slots__ = (
				826	# symbolic name of opcode; a string
				827	'name',
				828
				829	# the code used in a bytestream to represent the opcode; a
				830	# one-character string
				831	'code',
				832
				833	# If the opcode has an argument embedded in the byte string, an
				834	# instance of ArgumentDescriptor specifying its type. Note that
				835	# arg.reader(s) can be used to read and decode the argument from
				836	# the bytestream s, and arg.doc documents the format of the raw
				837	# argument bytes. If the opcode doesn't have an argument embedded
				838	# in the bytestream, arg should be None.
				839	'arg',
				840
				841	# what the stack looks like before this opcode runs; a list
				842	'stack_before',
				843
				844	# what the stack looks like after this opcode runs; a list
				845	'stack_after',
				846
				847	# the protocol number in which this opcode was introduced; an int
				848	'proto',
				849
				850	# human-readable docs for this opcode; a string
				851	'doc',
				852	)
				853
				854	def __init__(self, name, code, arg,
				855	stack_before, stack_after, proto, doc):
				856	assert isinstance(name, str)
				857	self.name = name
				858
				859	assert isinstance(code, str)
				860	assert len(code) == 1
				861	self.code = code
				862
				863	assert arg is None or isinstance(arg, ArgumentDescriptor)
				864	self.arg = arg
				865
				866	assert isinstance(stack_before, list)
				867	for x in stack_before:
				868	assert isinstance(x, StackObject)
				869	self.stack_before = stack_before
				870
				871	assert isinstance(stack_after, list)
				872	for x in stack_after:
				873	assert isinstance(x, StackObject)
				874	self.stack_after = stack_after
				875
				876	assert isinstance(proto, int) and 0 <= proto <= 2
				877	self.proto = proto
				878
				879	assert isinstance(doc, str)
				880	self.doc = doc
				881
				882	I = OpcodeInfo
				883	opcodes = [
				884
				885	# Ways to spell integers.
				886
				887	I(name='INT',
				888	code='I',
				889	arg=decimalnl_short,
				890	stack_before=[],
				891	stack_after=[pyinteger_or_bool],
				892	proto=0,
				893	doc="""Push an integer or bool.
				894
				895	The argument is a newline-terminated decimal literal string.
				896
				897	The intent may have been that this always fit in a short Python int,
				898	but INT can be generated in pickles written on a 64-bit box that
				899	require a Python long on a 32-bit box. The difference between this
				900	and LONG then is that INT skips a trailing 'L', and produces a short
				901	int whenever possible.
				902
				903	Another difference arises because, when bool was introduced as a
				904	distinct type in 2.3, builtin names True and False were also added to
				905	2.2.2, mapping to ints 1 and 0. For compatibility in both directions,
				906	True gets pickled as INT + "01\\n", and False as INT + "00\\n".
				907	Leading zeroes are never produced for a genuine integer. The 2.3
				908	(and later) unpicklers special-case these and return bool instead;
				909	earlier unpicklers ignore the leading "0" and return the int.
				910	"""),
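The INT rule above, including the "01"/"00" bool spellings, can be sketched as a small decoder. The `decode_int_arg` helper is hypothetical. The cross-check runs against a modern Python 3 `pickle`, which still special-cases these two spellings.

```python
import pickle


def decode_int_arg(line: bytes):
    """Decode INT's newline-terminated decimal argument.

    "01" and "00" mean True and False (the protocol-0 bool spelling);
    anything else is parsed as an ordinary integer.
    """
    if not line.endswith(b"\n"):
        raise ValueError("INT argument must end with a newline")
    digits = line[:-1]
    if digits == b"01":
        return True
    if digits == b"00":
        return False
    return int(digits)


print(decode_int_arg(b"01\n"), decode_int_arg(b"1\n"))  # True 1
# Cross-check with the real unpickler ('I' is INT, '.' is STOP).
print(pickle.loads(b"I01\n."), pickle.loads(b"I1\n."))  # True 1
```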
				911
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	912	I(name='BININT',
				913	code='J',
				914	arg=int4,
				915	stack_before=[],
				916	stack_after=[pyint],
				917	proto=1,
				918	doc="""Push a four-byte signed integer.
				919
				920	This handles the full range of Python (short) integers on a 32-bit
				921	box, directly as binary bytes (1 for the opcode and 4 for the integer).
				922	If the integer is non-negative and fits in 1 or 2 bytes, pickling via
				923	BININT1 or BININT2 saves space.
				924	"""),
				925
				926	I(name='BININT1',
				927	code='K',
				928	arg=uint1,
				929	stack_before=[],
				930	stack_after=[pyint],
				931	proto=1,
				932	doc="""Push a one-byte unsigned integer.
				933
				934	This is a space optimization for pickling very small non-negative ints,
				935	in range(256).
				936	"""),
				937
				938	I(name='BININT2',
				939	code='M',
				940	arg=uint2,
				941	stack_before=[],
				942	stack_after=[pyint],
				943	proto=1,
				944	doc="""Push a two-byte unsigned integer.
				945
				946	This is a space optimization for pickling small positive ints, in
				947	range(256, 2**16). Integers in range(256) can also be pickled via
				948	BININT2, but BININT1 instead saves a byte.
				949	"""),
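The space trade-off among BININT, BININT1, and BININT2 comes down to picking the shortest fixed-width encoding that holds the value. This sketch assumes the little-endian layouts CPython uses for the uint1, uint2, and int4 arguments. `encode_binint` is a hypothetical helper.

```python
import pickle
import struct


def encode_binint(n: int) -> bytes:
    """Pick the shortest of BININT1 ('K'), BININT2 ('M') and BININT ('J')."""
    if 0 <= n < 256:
        return b"K" + struct.pack("<B", n)  # 1-byte unsigned
    if 0 <= n < 2**16:
        return b"M" + struct.pack("<H", n)  # 2-byte unsigned, little-endian
    if -2**31 <= n < 2**31:
        return b"J" + struct.pack("<i", n)  # 4-byte signed, little-endian
    raise OverflowError("too large for BININT; a LONG opcode is needed")


for n in (7, 300, -1):
    op = encode_binint(n)
    print(n, op, pickle.loads(op + b"."))  # '.' is STOP
```

For 300 this yields the same bytes as Python 3's `pickle.dumps(300, protocol=1)`.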
				950
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame]	951	I(name='LONG',
				952	code='L',
				953	arg=decimalnl_long,
				954	stack_before=[],
				955	stack_after=[pylong],
				956	proto=0,
				957	doc="""Push a long integer.
				958
				959	The same as INT, except that the literal ends with 'L', and always
				960	unpickles to a Python long. There doesn't seem to be a real purpose to the
				961	trailing 'L'.
				962
				963	Note that LONG takes time quadratic in the number of digits when
				964	unpickling (this is simply due to the nature of decimal->binary
				965	conversion). Proto 2 added linear-time (in C; still quadratic-time
				966	in Python) LONG1 and LONG4 opcodes.
				967	"""),
				968
				969	I(name="LONG1",
				970	code='\x8a',
				971	arg=long1,
				972	stack_before=[],
				973	stack_after=[pylong],
				974	proto=2,
				975	doc="""Long integer using one-byte length.
				976
				977	A more efficient encoding of a Python long; the long1 encoding
				978	says it all."""),
				979
				980	I(name="LONG4",
				981	code='\x8b',
				982	arg=long4,
				983	stack_before=[],
				984	stack_after=[pylong],
				985	proto=2,
				986	doc="""Long integer using four-byte length.
				987
				988	A more efficient encoding of a Python long; the long4 encoding
				989	says it all."""),
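The long1 format itself is described earlier in this file. The sketch below assumes the layout CPython uses: one unsigned length byte, then that many bytes of little-endian two's complement, with zero encoded as no bytes at all. `encode_long1` is a hypothetical helper.

```python
import pickle


def encode_long1(n: int) -> bytes:
    """Encode n as a LONG1 opcode ('\\x8a') followed by its argument."""
    size = 0
    if n != 0:
        # Smallest byte count whose signed range contains n.
        size = 1
        while not -(1 << (8 * size - 1)) <= n < (1 << (8 * size - 1)):
            size += 1
    if size > 255:
        raise OverflowError("too long for LONG1; use LONG4")
    payload = n.to_bytes(size, "little", signed=True)
    return b"\x8a" + bytes([size]) + payload


print(encode_long1(255))  # b'\x8a\x02\xff\x00'
```

Because 255 does not fit in one signed byte, it needs a second, zero byte. LONG4 differs only in using a 4-byte length.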
				990
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	991	# Ways to spell strings (8-bit, not Unicode).
				992
				993	I(name='STRING',
				994	code='S',
				995	arg=stringnl,
				996	stack_before=[],
				997	stack_after=[pystring],
				998	proto=0,
				999	doc="""Push a Python string object.
				1000
				1001	The argument is a repr-style string, with bracketing quote characters,
				1002	and perhaps embedded escapes. The argument extends until the next
				1003	newline character.
				1004	"""),
				1005
				1006	I(name='BINSTRING',
				1007	code='T',
				1008	arg=string4,
				1009	stack_before=[],
				1010	stack_after=[pystring],
				1011	proto=1,
				1012	doc="""Push a Python string object.
				1013
				1014	There are two arguments: the first is a 4-byte little-endian signed int
				1015	giving the number of bytes in the string, and the second is that many
				1016	bytes, which are taken literally as the string content.
				1017	"""),
				1018
				1019	I(name='SHORT_BINSTRING',
				1020	code='U',
				1021	arg=string1,
				1022	stack_before=[],
				1023	stack_after=[pystring],
				1024	proto=1,
				1025	doc="""Push a Python string object.
				1026
				1027	There are two arguments: the first is a 1-byte unsigned int giving
				1028	the number of bytes in the string, and the second is that many bytes,
				1029	which are taken literally as the string content.
				1030	"""),
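The choice between SHORT_BINSTRING and BINSTRING depends only on whether the length fits in one byte. This is a sketch; `encode_binstring` is hypothetical. Note that Python 3's `pickle.loads` decodes these 8-bit strings to `str` by default, so the example passes `encoding="bytes"` to get the raw bytes back.

```python
import pickle
import struct


def encode_binstring(data: bytes) -> bytes:
    """SHORT_BINSTRING ('U') if the length fits in one byte, else BINSTRING ('T')."""
    if len(data) < 256:
        return b"U" + struct.pack("<B", len(data)) + data
    return b"T" + struct.pack("<i", len(data)) + data


op = encode_binstring(b"abc")
print(op)  # b'U\x03abc'
print(pickle.loads(op + b".", encoding="bytes"))  # b'abc'
```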
				1031
				1032	# Ways to spell None.
				1033
				1034	I(name='NONE',
				1035	code='N',
				1036	arg=None,
				1037	stack_before=[],
				1038	stack_after=[pynone],
				1039	proto=0,
				1040	doc="Push None on the stack."),
				1041
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame]	1042	# Ways to spell bools, starting with proto 2. See INT for how this was
				1043	# done before proto 2.
				1044
				1045	I(name='NEWTRUE',
				1046	code='\x88',
				1047	arg=None,
				1048	stack_before=[],
				1049	stack_after=[pybool],
				1050	proto=2,
				1051	doc="""True.
				1052
				1053	Push True onto the stack."""),
				1054
				1055	I(name='NEWFALSE',
				1056	code='\x89',
				1057	arg=None,
				1058	stack_before=[],
				1059	stack_after=[pybool],
				1060	proto=2,
				1061	doc="""False.
				1062
				1063	Push False onto the stack."""),
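The protocol-0 and protocol-2 spellings of the bools can be compared directly. This uses Python 3's `pickle.dumps`, which still emits INT for bools below protocol 2.

```python
import pickle

# Protocol 0 spells bools with INT; protocol 2 uses PROTO + NEWTRUE/NEWFALSE.
for proto in (0, 2):
    print(proto, pickle.dumps(True, protocol=proto), pickle.dumps(False, protocol=proto))
```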
				1064
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1065	# Ways to spell Unicode strings.
				1066
				1067	I(name='UNICODE',
				1068	code='V',
				1069	arg=unicodestringnl,
				1070	stack_before=[],
				1071	stack_after=[pyunicode],
				1072	proto=0, # this may be pure-text, but it's a later addition
				1073	doc="""Push a Python Unicode string object.
				1074
				1075	The argument is a raw-unicode-escape encoding of a Unicode string,
				1076	and so may contain embedded escape sequences. The argument extends
				1077	until the next newline character.
				1078	"""),
				1079
				1080	I(name='BINUNICODE',
				1081	code='X',
				1082	arg=unicodestring4,
				1083	stack_before=[],
				1084	stack_after=[pyunicode],
				1085	proto=1,
				1086	doc="""Push a Python Unicode string object.
				1087
				1088	There are two arguments: the first is a 4-byte little-endian signed int
				1089	giving the number of bytes in the string. The second is that many
				1090	bytes, and is the UTF-8 encoding of the Unicode string.
				1091	"""),
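Here is a sketch of BINUNICODE's layout, checked against Python 3's unpickler. The length field counts UTF-8 bytes, not characters. `encode_binunicode` is a hypothetical helper.

```python
import pickle
import struct


def encode_binunicode(text: str) -> bytes:
    """BINUNICODE ('X'): 4-byte little-endian byte count, then the UTF-8 bytes."""
    utf8 = text.encode("utf-8")
    return b"X" + struct.pack("<i", len(utf8)) + utf8


op = encode_binunicode("h\u00e9llo")
print(op)  # b'X\x06\x00\x00\x00h\xc3\xa9llo'
print(pickle.loads(op + b"."))  # héllo
```

The string has five characters but six UTF-8 bytes. The text-mode UNICODE opcode instead carries the raw-unicode-escape form, terminated by a newline.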
				1092
				1093	# Ways to spell floats.
				1094
				1095	I(name='FLOAT',
				1096	code='F',
				1097	arg=floatnl,
				1098	stack_before=[],
				1099	stack_after=[pyfloat],
				1100	proto=0,
				1101	doc="""Newline-terminated decimal float literal.
				1102
				1103	The argument is repr(a_float), and in general requires 17 significant
				1104	digits for roundtrip conversion to be an identity (this is so for
				1105	IEEE-754 double precision values, which is what Python float maps to
				1106	on most boxes).
				1107
				1108	In general, FLOAT cannot be used to transport infinities, NaNs, or
				1109	minus zero across boxes (or even on a single box, if the platform C
				1110	library can't read the strings it produces for such things -- Windows
				1111	is like that), but may do less damage than BINFLOAT on boxes with
				1112	greater precision or dynamic range than IEEE-754 double.
				1113	"""),
				1114
				1115	I(name='BINFLOAT',
				1116	code='G',
				1117	arg=float8,
				1118	stack_before=[],
				1119	stack_after=[pyfloat],
				1120	proto=1,
				1121	doc="""Float stored in binary form, with 8 bytes of data.
				1122
				1123	This generally requires less than half the space of FLOAT encoding.
				1124	In general, BINFLOAT cannot be used to transport infinities, NaNs, or
				1125	minus zero, raises an exception if the exponent exceeds the range of
				1126	an IEEE-754 double, and retains no more than 53 bits of precision (if
				1127	there are more than that, "add a half and chop" rounding is used to
				1128	cut it back to 53 significant bits).
				1129	"""),
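The FLOAT vs. BINFLOAT size comparison can be checked with a short sketch. It assumes `"%.17g"` as a stand-in for the 17-significant-digit repr described above; modern Python's `repr(0.1)` prints the shorter "0.1". It also assumes that BINFLOAT's 8 bytes are a big-endian IEEE-754 double, as in CPython.

```python
import pickle
import struct

x = 0.1
# FLOAT ('F'): decimal text, newline-terminated, 17 significant digits.
text_op = b"F" + ("%.17g" % x).encode("ascii") + b"\n"
# BINFLOAT ('G'): 8 bytes of IEEE-754 double, big-endian.
bin_op = b"G" + struct.pack(">d", x)

print(text_op, len(text_op))  # b'F0.10000000000000001\n' 21
print(len(bin_op))  # 9
```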
				1130
				1131	# Ways to build lists.
				1132
				1133	I(name='EMPTY_LIST',
				1134	code=']',
				1135	arg=None,
				1136	stack_before=[],
				1137	stack_after=[pylist],
				1138	proto=1,
				1139	doc="Push an empty list."),
				1140
				1141	I(name='APPEND',
				1142	code='a',
				1143	arg=None,
				1144	stack_before=[pylist, anyobject],
				1145	stack_after=[pylist],
				1146	proto=0,
				1147	doc="""Append an object to a list.
				1148
				1149	Stack before: ... pylist anyobject
				1150	Stack after: ... pylist+[anyobject]
Tim Peters	81098ac	2003-01-28 05:12:08 +0000	[diff] [blame^]	1151
				1152	although pylist is really extended in-place.
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1153	"""),
				1154
				1155	I(name='APPENDS',
				1156	code='e',
				1157	arg=None,
				1158	stack_before=[pylist, markobject, stackslice],
				1159	stack_after=[pylist],
				1160	proto=1,
				1161	doc="""Extend a list by a slice of stack objects.
				1162
				1163	Stack before: ... pylist markobject stackslice
				1164	Stack after: ... pylist+stackslice
Tim Peters	81098ac	2003-01-28 05:12:08 +0000	[diff] [blame^]	1165
				1166	although pylist is really extended in-place.
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1167	"""),
				1168
				1169	I(name='LIST',
				1170	code='l',
				1171	arg=None,
				1172	stack_before=[markobject, stackslice],
				1173	stack_after=[pylist],
				1174	proto=0,
				1175	doc="""Build a list out of the topmost stack slice, after markobject.
				1176
				1177	All the stack entries following the topmost markobject are placed into
				1178	a single Python list, which single list object replaces all of the
				1179	stack from the topmost markobject onward. For example,
				1180
				1181	Stack before: ... markobject 1 2 3 'abc'
				1182	Stack after: ... [1, 2, 3, 'abc']
				1183	"""),
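The three list-building routes can be hand-assembled and fed to Python 3's unpickler. The opcode constants and the `binint1` helper are defined locally for this sketch.

```python
import pickle

EMPTY_LIST, APPEND, APPENDS, LIST, MARK, STOP = b"]", b"a", b"e", b"l", b"(", b"."


def binint1(n):
    return b"K" + bytes([n])  # BININT1: push a small non-negative int


via_append = EMPTY_LIST + binint1(7) + APPEND + STOP
via_appends = EMPTY_LIST + MARK + binint1(1) + binint1(2) + binint1(3) + APPENDS + STOP
via_list = MARK + binint1(1) + binint1(2) + binint1(3) + LIST + STOP

print(pickle.loads(via_append), pickle.loads(via_appends), pickle.loads(via_list))
# [7] [1, 2, 3] [1, 2, 3]
```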
				1184
				1185	# Ways to build tuples.
				1186
				1187	I(name='EMPTY_TUPLE',
				1188	code=')',
				1189	arg=None,
				1190	stack_before=[],
				1191	stack_after=[pytuple],
				1192	proto=1,
				1193	doc="Push an empty tuple."),
				1194
				1195	I(name='TUPLE',
				1196	code='t',
				1197	arg=None,
				1198	stack_before=[markobject, stackslice],
				1199	stack_after=[pytuple],
				1200	proto=0,
				1201	doc="""Build a tuple out of the topmost stack slice, after markobject.
				1202
				1203	All the stack entries following the topmost markobject are placed into
				1204	a single Python tuple, which single tuple object replaces all of the
				1205	stack from the topmost markobject onward. For example,
				1206
				1207	Stack before: ... markobject 1 2 3 'abc'
				1208	Stack after: ... (1, 2, 3, 'abc')
				1209	"""),
				1210
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame]	1211	I(name='TUPLE1',
				1212	code='\x85',
				1213	arg=None,
				1214	stack_before=[anyobject],
				1215	stack_after=[pytuple],
				1216	proto=2,
				1217	doc="""One-tuple.
				1218
				1219	This code pops one value off the stack and pushes a tuple of
				1220	length 1 whose one item is that value back onto it. IOW:
				1221
				1222	stack[-1] = tuple(stack[-1:])
				1223	"""),
				1224
				1225	I(name='TUPLE2',
				1226	code='\x86',
				1227	arg=None,
				1228	stack_before=[anyobject, anyobject],
				1229	stack_after=[pytuple],
				1230	proto=2,
				1231	doc="""Two-tuple.
				1232
				1233	This code pops two values off the stack and pushes a tuple
				1234	of length 2 whose items are those values back onto it. IOW:
				1235
				1236	stack[-2:] = [tuple(stack[-2:])]
				1237	"""),
				1238
				1239	I(name='TUPLE3',
				1240	code='\x87',
				1241	arg=None,
				1242	stack_before=[anyobject, anyobject, anyobject],
				1243	stack_after=[pytuple],
				1244	proto=2,
				1245	doc="""Three-tuple.
				1246
				1247	This code pops three values off the stack and pushes a tuple
				1248	of length 3 whose items are those values back onto it. IOW:
				1249
				1250	stack[-3:] = [tuple(stack[-3:])]
				1251	"""),
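The slice assignments quoted in the TUPLE1/2/3 docs can be run on a plain list. The same opcodes can then be exercised in real pickles ('K' is BININT1, '.' is STOP) using Python 3's `pickle`.

```python
import pickle

# TUPLE3's rule, applied to an ordinary list used as a stack.
stack = ["x", 1, 2, 3]
stack[-3:] = [tuple(stack[-3:])]
print(stack)  # ['x', (1, 2, 3)]

print(pickle.loads(b")."))  # EMPTY_TUPLE: ()
print(pickle.loads(b"K\x01\x85."))  # TUPLE1: (1,)
print(pickle.loads(b"K\x01K\x02\x86."))  # TUPLE2: (1, 2)
print(pickle.loads(b"(K\x01K\x02K\x03t."))  # MARK ... TUPLE: (1, 2, 3)
```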
				1252
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1253	# Ways to build dicts.
				1254
				1255	I(name='EMPTY_DICT',
				1256	code='}',
				1257	arg=None,
				1258	stack_before=[],
				1259	stack_after=[pydict],
				1260	proto=1,
				1261	doc="Push an empty dict."),
				1262
				1263	I(name='DICT',
				1264	code='d',
				1265	arg=None,
				1266	stack_before=[markobject, stackslice],
				1267	stack_after=[pydict],
				1268	proto=0,
				1269	doc="""Build a dict out of the topmost stack slice, after markobject.
				1270
				1271	All the stack entries following the topmost markobject are placed into
				1272	a single Python dict, which single dict object replaces all of the
				1273	stack from the topmost markobject onward. The stack slice alternates
				1274	key, value, key, value, .... For example,
				1275
				1276	Stack before: ... markobject 1 2 3 'abc'
				1277	Stack after: ... {1: 2, 3: 'abc'}
				1278	"""),
				1279
				1280	I(name='SETITEM',
				1281	code='s',
				1282	arg=None,
				1283	stack_before=[pydict, anyobject, anyobject],
				1284	stack_after=[pydict],
				1285	proto=0,
				1286	doc="""Add a key+value pair to an existing dict.
				1287
				1288	Stack before: ... pydict key value
				1289	Stack after: ... pydict
				1290
				1291	where pydict has been modified via pydict[key] = value.
				1292	"""),
				1293
				1294	I(name='SETITEMS',
				1295	code='u',
				1296	arg=None,
				1297	stack_before=[pydict, markobject, stackslice],
				1298	stack_after=[pydict],
				1299	proto=1,
				1300	doc="""Add an arbitrary number of key+value pairs to an existing dict.
				1301
				1302	The slice of the stack following the topmost markobject is taken as
				1303	an alternating sequence of keys and values, added to the dict
				1304	immediately under the topmost markobject. Everything at and after the
				1305	topmost markobject is popped, leaving the mutated dict at the top
				1306	of the stack.
				1307
				1308	Stack before: ... pydict markobject key_1 value_1 ... key_n value_n
				1309	Stack after: ... pydict
				1310
				1311	where pydict has been modified via pydict[key_i] = value_i for i in
				1312	1, 2, ..., n, and in that order.
				1313	"""),
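The three dict-building routes can likewise be hand-assembled and checked with Python 3's `pickle.loads`. The key and value pushes are BININT1 opcodes; the variable names are local to this sketch.

```python
import pickle

k1, k2, k3, k4 = b"K\x01", b"K\x02", b"K\x03", b"K\x04"  # BININT1 1..4

print(pickle.loads(b"(" + k1 + k2 + k3 + k4 + b"d."))  # DICT: {1: 2, 3: 4}
print(pickle.loads(b"}" + k1 + k2 + b"s."))  # SETITEM: {1: 2}
print(pickle.loads(b"}(" + k1 + k2 + k3 + k4 + b"u."))  # SETITEMS: {1: 2, 3: 4}
```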
				1314
				1315	# Stack manipulation.
				1316
				1317	I(name='POP',
				1318	code='0',
				1319	arg=None,
				1320	stack_before=[anyobject],
				1321	stack_after=[],
				1322	proto=0,
				1323	doc="Discard the top stack item, shrinking the stack by one item."),
				1324
				1325	I(name='DUP',
				1326	code='2',
				1327	arg=None,
				1328	stack_before=[anyobject],
				1329	stack_after=[anyobject, anyobject],
				1330	proto=0,
				1331	doc="Push the top stack item onto the stack again, duplicating it."),
				1332
				1333	I(name='MARK',
				1334	code='(',
				1335	arg=None,
				1336	stack_before=[],
				1337	stack_after=[markobject],
				1338	proto=0,
				1339	doc="""Push markobject onto the stack.
				1340
				1341	markobject is a unique object, used by other opcodes to identify a
				1342	region of the stack containing a variable number of objects for them
				1343	to work on. See markobject.doc for more detail.
				1344	"""),
				1345
				1346	I(name='POP_MARK',
				1347	code='1',
				1348	arg=None,
				1349	stack_before=[markobject, stackslice],
				1350	stack_after=[],
				1351	proto=0,
				1352	doc="""Pop all the stack objects at and above the topmost markobject.
				1353
				1354	When an opcode using a variable number of stack objects is done,
				1355	POP_MARK is used to remove those objects, and to remove the markobject
				1356	that delimited their starting position on the stack.
				1357	"""),
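The stack-manipulation opcodes can each be observed through the value STOP finally returns. This uses hand-assembled pickles and Python 3's unpickler; the constant names are local to this sketch.

```python
import pickle

one, two, five, nine = b"K\x01", b"K\x02", b"K\x05", b"K\x09"  # BININT1 pushes
POP, DUP, MARK, POP_MARK, TUPLE2, STOP = b"0", b"2", b"(", b"1", b"\x86", b"."

print(pickle.loads(one + two + POP + STOP))  # 1: POP discarded the 2
print(pickle.loads(five + DUP + TUPLE2 + STOP))  # (5, 5): DUP copied the 5
print(pickle.loads(nine + MARK + one + two + POP_MARK + STOP))  # 9: mark, 1 and 2 removed
```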
				1358
				1359	# Memo manipulation. There are really only two operations (get and put),
				1360	# each in all-text, "short binary", and "long binary" flavors.
				1361
				1362	I(name='GET',
				1363	code='g',
				1364	arg=decimalnl_short,
				1365	stack_before=[],
				1366	stack_after=[anyobject],
				1367	proto=0,
				1368	doc="""Read an object from the memo and push it on the stack.
				1369
				1370	The index of the memo object to push is given by the newline-terminated
				1371	decimal string following. BINGET and LONG_BINGET are space-optimized
				1372	versions.
				1373	"""),
				1374
				1375	I(name='BINGET',
				1376	code='h',
				1377	arg=uint1,
				1378	stack_before=[],
				1379	stack_after=[anyobject],
				1380	proto=1,
				1381	doc="""Read an object from the memo and push it on the stack.
				1382
				1383	The index of the memo object to push is given by the 1-byte unsigned
				1384	integer following.
				1385	"""),
				1386
				1387	I(name='LONG_BINGET',
				1388	code='j',
				1389	arg=int4,
				1390	stack_before=[],
				1391	stack_after=[anyobject],
				1392	proto=1,
				1393	doc="""Read an object from the memo and push it on the stack.
				1394
				1395	The index of the memo object to push is given by the 4-byte signed
				1396	little-endian integer following.
				1397	"""),
				1398
				1399	I(name='PUT',
				1400	code='p',
				1401	arg=decimalnl_short,
				1402	stack_before=[],
				1403	stack_after=[],
				1404	proto=0,
				1405	doc="""Store the stack top into the memo. The stack is not popped.
				1406
				1407	The index of the memo location to write into is given by the newline-
				1408	terminated decimal string following. BINPUT and LONG_BINPUT are
				1409	space-optimized versions.
				1410	"""),
				1411
				1412	I(name='BINPUT',
				1413	code='q',
				1414	arg=uint1,
				1415	stack_before=[],
				1416	stack_after=[],
				1417	proto=1,
				1418	doc="""Store the stack top into the memo. The stack is not popped.
				1419
				1420	The index of the memo location to write into is given by the 1-byte
				1421	unsigned integer following.
				1422	"""),
				1423
				1424	I(name='LONG_BINPUT',
				1425	code='r',
				1426	arg=int4,
				1427	stack_before=[],
				1428	stack_after=[],
				1429	proto=1,
				1430	doc="""Store the stack top into the memo. The stack is not popped.
				1431
				1432	The index of the memo location to write into is given by the 4-byte
				1433	signed little-endian integer following.
				1434	"""),
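The memo is what lets a pickle refer to one object twice. In this hand-assembled sketch, run with Python 3's `pickle`, a list is stored in memo slot 0 and fetched again. The result is a 2-tuple whose items are the same object.

```python
import pickle

# EMPTY_LIST, BINPUT 0, BINGET 0, TUPLE2, STOP
data = b"]" + b"q\x00" + b"h\x00" + b"\x86" + b"."
t = pickle.loads(data)
print(t, t[0] is t[1])  # ([], []) True

# The same program spelled with the text-mode PUT and GET opcodes.
t0 = pickle.loads(b"]p0\ng0\n\x86.")
print(t0[0] is t0[1])  # True
```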
				1435
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame]	1436	# Access the extension registry (predefined objects). Akin to the GET
				1437	# family.
				1438
				1439	I(name='EXT1',
				1440	code='\x82',
				1441	arg=uint1,
				1442	stack_before=[],
				1443	stack_after=[anyobject],
				1444	proto=2,
				1445	doc="""Extension code.
				1446
				1447	This code and the similar EXT2 and EXT4 allow using a registry
				1448	of popular objects that are pickled by name, typically classes.
				1449	It is envisioned that through a global negotiation and
				1450	registration process, third parties can set up a mapping between
				1451	ints and object names.
				1452
				1453	In order to guarantee pickle interchangeability, the extension
				1454	code registry ought to be global, although a range of codes may
				1455	be reserved for private use.
				1456
				1457	EXT1 has a 1-byte integer argument. This is used to index into the
				1458	extension registry, and the object at that index is pushed on the stack.
				1459	"""),
				1460
				1461	I(name='EXT2',
				1462	code='\x83',
				1463	arg=uint2,
				1464	stack_before=[],
				1465	stack_after=[anyobject],
				1466	proto=2,
				1467	doc="""Extension code.
				1468
				1469	See EXT1. EXT2 has a two-byte integer argument.
				1470	"""),
				1471
				1472	I(name='EXT4',
				1473	code='\x84',
				1474	arg=int4,
				1475	stack_before=[],
				1476	stack_after=[anyobject],
				1477	proto=2,
				1478	doc="""Extension code.
				1479
				1480	See EXT1. EXT4 has a four-byte integer argument.
				1481	"""),
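The extension registry can be exercised through Python 3's `copyreg` module (spelled `copy_reg` in the 2003-era Python this file describes). The code 240 and the choice of `collections.OrderedDict` are arbitrary, for illustration only. The registration is removed again afterwards, because the registry is process-global.

```python
import collections
import copyreg
import pickle
import pickletools


def ext_roundtrip(code=240):
    """Register OrderedDict under an extension code, pickle the class with
    protocol 2, and report the opcode names plus the unpickled result."""
    copyreg.add_extension("collections", "OrderedDict", code)
    try:
        data = pickle.dumps(collections.OrderedDict, protocol=2)
        names = [op.name for op, arg, pos in pickletools.genops(data)]
        return names, pickle.loads(data)
    finally:
        copyreg.remove_extension("collections", "OrderedDict", code)


names, obj = ext_roundtrip()
print(names)  # includes 'EXT1' rather than 'GLOBAL'
print(obj is collections.OrderedDict)  # True
```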
				1482
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1483	# Push a class object, or module function, on the stack, via its module
				1484	# and name.
				1485
				1486	I(name='GLOBAL',
				1487	code='c',
				1488	arg=stringnl_noescape_pair,
				1489	stack_before=[],
				1490	stack_after=[anyobject],
				1491	proto=0,
				1492	doc="""Push a global object (module.attr) on the stack.
				1493
				1494	Two newline-terminated strings follow the GLOBAL opcode. The first is
				1495	taken as a module name, and the second as a class name. The class
				1496	object module.class is pushed on the stack. More accurately, the
				1497	object returned by self.find_class(module, class) is pushed on the
				1498	stack, so unpickling subclasses can override this form of lookup.
				1499	"""),
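Because GLOBAL goes through `find_class`, a subclass can restrict which globals a pickle may load. The following sketch uses Python 3's documented `pickle.Unpickler.find_class` hook; the `ALLOWED` set is a hypothetical allow-list.

```python
import io
import pickle


class RestrictedUnpickler(pickle.Unpickler):
    """Only resolve module.name pairs on an explicit allow-list."""

    ALLOWED = {("builtins", "len")}  # hypothetical allow-list for this sketch

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()


print(restricted_loads(b"cbuiltins\nlen\n.") is len)  # True
try:
    restricted_loads(b"cos\nsystem\n.")
    blocked = False
except pickle.UnpicklingError:
    blocked = True
print(blocked)  # True
```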
				1500
				1501	# Ways to build objects of classes pickle doesn't know about directly
				1502	# (user-defined classes). I despair of documenting this accurately
				1503	# and comprehensibly -- you really have to read the pickle code to
				1504	# find all the special cases.
				1505
				1506	I(name='REDUCE',
				1507	code='R',
				1508	arg=None,
				1509	stack_before=[anyobject, anyobject],
				1510	stack_after=[anyobject],
				1511	proto=0,
				1512	doc="""Push an object built from a callable and an argument tuple.
				1513
				1514	The opcode is named after the __reduce__() method.
				1515
				1516	Stack before: ... callable pytuple
				1517	Stack after: ... callable(*pytuple)
				1518
				1519	The callable and the argument tuple are the first two items returned
				1520	by a __reduce__ method. Applying the callable to the argtuple is
				1521	supposed to reproduce the original object, or at least get it started.
				1522	If the __reduce__ method returns a 3-tuple, the last component is an
				1523	argument to be passed to the object's __setstate__, and then the REDUCE
				1524	opcode is followed by code to create setstate's argument, and then a
				1525	BUILD opcode to apply __setstate__ to that argument.
				1526
				1527	There are lots of special cases here. The argtuple can be None, in
				1528	which case callable.__basicnew__() is called instead to produce the
				1529	object to be pushed on the stack. This appears to be a trick unique
				1530	to ExtensionClasses, and is deprecated regardless.
				1531
				1532	If type(callable) is not ClassType, REDUCE complains unless the
				1533	callable has been registered with the copy_reg module's
				1534	safe_constructors dict, or the callable has a magic
				1535	'__safe_for_unpickling__' attribute with a true value. I'm not sure
				1536	why it does this, but I've sure seen this complaint often enough when
				1537	I didn't want to <wink>.
				1538	"""),
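A hand-assembled pickle makes REDUCE's "callable(*argtuple)" rule concrete. This sketch is run with a modern Python 3 unpickler, which does not perform the safe_constructors check described above. It pushes `builtins.len` and the tuple `("abc",)`, then applies one to the other.

```python
import pickle

data = (
    b"cbuiltins\nlen\n"  # GLOBAL: push the callable
    + b"(X\x03\x00\x00\x00abc"  # MARK, BINUNICODE "abc"
    + b"t"  # TUPLE: ("abc",)
    + b"R"  # REDUCE: callable(*argtuple)
    + b"."  # STOP
)
print(pickle.loads(data))  # 3
```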
				1539
				1540	I(name='BUILD',
				1541	code='b',
				1542	arg=None,
				1543	stack_before=[anyobject, anyobject],
				1544	stack_after=[anyobject],
				1545	proto=0,
				1546	doc="""Finish building an object, via __setstate__ or dict update.
				1547
				1548	Stack before: ... anyobject argument
				1549	Stack after: ... anyobject
				1550
				1551	where anyobject may have been mutated, as follows:
				1552
				1553	If the object has a __setstate__ method,
				1554
				1555	anyobject.__setstate__(argument)
				1556
				1557	is called.
				1558
				1559	Else the argument must be a dict, the object must have a __dict__, and
				1560	the object is updated via
				1561
				1562	anyobject.__dict__.update(argument)
				1563
				1564	This may raise RuntimeError in restricted execution mode (which
				1565	disallows access to __dict__ directly); in that case, the object
				1566	is updated instead via
				1567
				1568	for k, v in argument.items():
				1569	    setattr(anyobject, k, v)
				1570	"""),
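BUILD's two main cases can be modeled in plain Python. This is a sketch of the rule as stated above, not the unpickler's code. It covers only "call __setstate__ if present, else update __dict__". `apply_build`, `Plain`, and `Custom` are hypothetical names.

```python
def apply_build(obj, state):
    """Prefer obj.__setstate__(state); otherwise merge state into obj.__dict__."""
    setstate = getattr(obj, "__setstate__", None)
    if setstate is not None:
        setstate(state)
    else:
        obj.__dict__.update(state)
    return obj


class Plain:
    pass


class Custom:
    def __setstate__(self, state):
        self.restored_from = state


p = apply_build(Plain(), {"x": 1})
c = apply_build(Custom(), {"x": 1})
print(p.x, c.restored_from)  # 1 {'x': 1}
```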
				1571
				1572	I(name='INST',
				1573	code='i',
				1574	arg=stringnl_noescape_pair,
				1575	stack_before=[markobject, stackslice],
				1576	stack_after=[anyobject],
				1577	proto=0,
				1578	doc="""Build a class instance.
				1579
				1580	This is the protocol 0 version of protocol 1's OBJ opcode.
				1581	INST is followed by two newline-terminated strings, giving a
				1582	module and class name, just as for the GLOBAL opcode (and see
				1583	GLOBAL for more details about that). self.find_class(module, name)
				1584	is used to get a class object.
				1585
				1586	In addition, all the objects on the stack following the topmost
				1587	markobject are gathered into a tuple and popped (along with the
				1588	topmost markobject), just as for the TUPLE opcode.
				1589
				1590	Now it gets complicated. If all of these are true:
				1591
				1592	+ The argtuple is empty (markobject was at the top of the stack
				1593	at the start).
				1594
				1595	+ It's an old-style class object (the type of the class object is
				1596	ClassType).
				1597
				1598	+ The class object does not have a __getinitargs__ attribute.
				1599
				1600	then we want to create an old-style class instance without invoking
				1601	its __init__() method (pickle has waffled on this over the years; not
				1602	calling __init__() is current wisdom). In this case, an instance of
				1603	an old-style dummy class is created, and then we try to rebind its
				1604	__class__ attribute to the desired class object. If this succeeds,
				1605	the new instance object is pushed on the stack, and we're done. In
				1606	restricted execution mode it can fail (assignment to __class__ is
				1607	disallowed), and I'm not really sure what happens then -- it looks
				1608	like the code ends up calling the class object's __init__ anyway,
				1609	via falling into the next case.
				1610
				1611	Else (the argtuple is not empty, it's not an old-style class object,
				1612	or the class object does have a __getinitargs__ attribute), the code
				1613	first insists that the class object have a __safe_for_unpickling__
				1614	attribute. Unlike the __safe_for_unpickling__ check in REDUCE,
				1615	it doesn't matter whether this attribute has a true or false value, it
				1616	only matters whether it exists (XXX this smells like a bug). If
				1617	__safe_for_unpickling__ doesn't exist, UnpicklingError is raised.
				1618
				1619	Else (the class object does have a __safe_for_unpickling__ attr),
				1620	the class object obtained from INST's arguments is applied to the
				1621	argtuple obtained from the stack, and the resulting instance object
				1622	is pushed on the stack.
				1623	"""),
				1624
				1625	I(name='OBJ',
				1626	code='o',
				1627	arg=None,
				1628	stack_before=[markobject, anyobject, stackslice],
				1629	stack_after=[anyobject],
				1630	proto=1,
				1631	doc="""Build a class instance.
				1632
				1633	This is the protocol 1 version of protocol 0's INST opcode, and is
				1634	very much like it. The major difference is that the class object
				1635	is taken off the stack, allowing it to be retrieved from the memo
				1636	repeatedly if several instances of the same class are created. This
				1637	can be much more efficient (in both time and space) than repeatedly
				1638	embedding the module and class names in INST opcodes.
				1639
				1640	Unlike INST, OBJ takes no arguments from the opcode stream. Instead
				1641	the class object is taken off the stack, immediately above the
				1642	topmost markobject:
				1643
				1644	Stack before: ... markobject classobject stackslice
				1645	Stack after: ... new_instance_object
				1646
				1647	As for INST, the remainder of the stack above the markobject is
				1648	gathered into an argument tuple, and then the logic seems identical,
				1649	except that no __safe_for_unpickling__ check is done (XXX this smells
				1650	like a bug). See INST for the gory details.
				1651	"""),
				1652
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame]	1653	I(name='NEWOBJ',
				1654	code='\x81',
				1655	arg=None,
				1656	stack_before=[anyobject, anyobject],
				1657	stack_after=[anyobject],
				1658	proto=2,
				1659	doc="""Build an object instance.
				1660
				1661	The stack before should be thought of as containing a class
				1662	object followed by an argument tuple (the tuple being the stack
				1663	top). Call these cls and args. They are popped off the stack,
				1664	and the value returned by cls.__new__(cls, *args) is pushed back
				1665	onto the stack.
				1666	"""),
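INST, OBJ, and NEWOBJ differ mainly in where the class comes from and how it is called. The sketch builds `complex(1, 2)` all three ways and runs them with Python 3's unpickler. Python 3 has no old-style classes, so not all of the special cases described above still apply.

```python
import pickle

cls = b"builtins\ncomplex\n"  # module and class name, as for GLOBAL
args = b"K\x01K\x02"  # BININT1 1, BININT1 2

inst = b"(" + args + b"i" + cls + b"."  # INST: names embedded in the opcode stream
obj = b"(" + b"c" + cls + args + b"o" + b"."  # OBJ: class taken off the stack
newobj = b"\x80\x02c" + cls + args + b"\x86\x81."  # NEWOBJ: cls.__new__(cls, *args)

for name, data in (("INST", inst), ("OBJ", obj), ("NEWOBJ", newobj)):
    print(name, pickle.loads(data))  # each prints (1+2j)
```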
				1667
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1668	# Machine control.
				1669
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame]	1670	I(name='PROTO',
				1671	code='\x80',
				1672	arg=uint1,
				1673	stack_before=[],
				1674	stack_after=[],
				1675	proto=2,
				1676	doc="""Protocol version indicator.
				1677
				1678	For protocol 2 and above, a pickle must start with this opcode.
				1679	The argument is the protocol version, an int in range(2, 256).
				1680	"""),
				1681
    I(name='STOP',
      code='.',
      arg=None,
      stack_before=[anyobject],
      stack_after=[],
      proto=0,
      doc="""Stop the unpickling machine.

      Every pickle ends with this opcode. The object at the top of the stack
      is popped, and that's the result of unpickling. The stack should then
      be empty.
      """),

    # Ways to deal with persistent IDs.

    I(name='PERSID',
      code='P',
      arg=stringnl_noescape,
      stack_before=[],
      stack_after=[anyobject],
      proto=0,
      doc="""Push an object identified by a persistent ID.

      The pickle module doesn't define what a persistent ID means. PERSID's
      argument is a newline-terminated str-style (no embedded escapes, no
      bracketing quote characters) string, which is "the persistent ID".
      The unpickler passes this string to self.persistent_load(), and
      whatever object that call returns is pushed onto the stack. Python's
      unpickler has no implementation of persistent_load(): it must be
      supplied by an unpickler subclass.
      """),

    I(name='BINPERSID',
      code='Q',
      arg=None,
      stack_before=[anyobject],
      stack_after=[anyobject],
      proto=1,
      doc="""Push an object identified by a persistent ID.

      Like PERSID, except the persistent ID is popped off the stack (instead
      of being a string embedded in the opcode bytestream). The persistent
      ID is passed to self.persistent_load(), and whatever object that call
      returns is pushed onto the stack. See PERSID for more detail.
      """),
]
del I
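
r"""
A minimal sketch of the persistent-ID protocol behind PERSID and
BINPERSID. It is written for Python 3's pickle module (an assumption;
this module predates it). The EXTERNAL table and the RefPickler,
RefUnpickler, dumps_with_refs, loads_with_refs and opcode_names names are
hypothetical; persistent_id() and persistent_load() are the documented
Pickler/Unpickler hooks. Protocol 0 writes the ID inline after PERSID,
while protocol 1 and above push the ID as an object and emit BINPERSID.

```python
import io
import pickle
import pickletools

# Hypothetical external store: objects found here are pickled by reference.
EXTERNAL = {"user:42": {"name": "alice"}}

class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return a persistent ID for externally stored objects;
        # None tells the pickler to pickle obj by value as usual.
        for key, value in EXTERNAL.items():
            if value is obj:
                return key
        return None

class RefUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Called for PERSID/BINPERSID with the persistent ID; the object
        # returned here is what gets pushed on the unpickler's stack.
        return EXTERNAL[pid]

def dumps_with_refs(obj, protocol):
    buf = io.BytesIO()
    RefPickler(buf, protocol).dump(obj)
    return buf.getvalue()

def loads_with_refs(data):
    return RefUnpickler(io.BytesIO(data)).load()

def opcode_names(data):
    # List the opcode names in a pickle, to see which persistent-ID
    # opcode the pickler chose.
    return [op.name for op, arg, pos in pickletools.genops(data)]
```

Round-tripping [EXTERNAL["user:42"]] through these helpers gives back a
list whose element is the very same dict object, not a copy, and the
dict's contents never appear in the pickle bytes.
"""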

# Verify uniqueness of .name and .code members.
name2i = {}
code2i = {}

for i, d in enumerate(opcodes):
    if d.name in name2i:
        raise ValueError("repeated name %r at indices %d and %d" %
                         (d.name, name2i[d.name], i))
    if d.code in code2i:
        raise ValueError("repeated code %r at indices %d and %d" %
                         (d.code, code2i[d.code], i))

    name2i[d.name] = i
    code2i[d.code] = i

del name2i, code2i, i, d

##############################################################################
# Build a code2op dict, mapping opcode characters to OpcodeInfo records.
# Also ensure we've got the same stuff as pickle.py, although the
# introspection here is dicey.

code2op = {}
for d in opcodes:
    code2op[d.code] = d
del d
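
r"""
A minimal sketch of the same byte-to-opcode lookup, written against
Python 3's pickletools.opcodes (an assumption: there .code is a
1-character str while pickles are bytes, so each byte is mapped through
chr()). It checks the PROTO and STOP rules documented above: a protocol
2+ pickle starts with PROTO followed by the protocol number, and every
pickle ends with STOP. The code2name and frame_opcodes names are
hypothetical.

```python
import pickle
import pickletools

# Hypothetical name-only analogue of code2op.
code2name = {op.code: op.name for op in pickletools.opcodes}

def frame_opcodes(data):
    # Name the first and last opcode bytes of one complete pickle.
    first = code2name[chr(data[0])]
    last = code2name[chr(data[-1])]
    # PROTO's argument is a single unsigned byte (uint1).
    proto = data[1] if first == "PROTO" else None
    return first, proto, last
```

For example, frame_opcodes(pickle.dumps([1, 2], 2)) gives
('PROTO', 2, 'STOP'). Only the first and last bytes are read as opcodes;
interior bytes may be argument data, so a full walk needs genops().
"""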

def assure_pickle_consistency(verbose=False):
    import pickle, re

    copy = code2op.copy()
    for name in pickle.__all__:
        if not re.match("[A-Z][A-Z0-9_]+$", name):
            if verbose:
                print "skipping %r: it doesn't look like an opcode name" % name
            continue
        picklecode = getattr(pickle, name)
        if not isinstance(picklecode, str) or len(picklecode) != 1:
            if verbose:
                print ("skipping %r: value %r doesn't look like a pickle "
                       "code" % (name, picklecode))
            continue
        if picklecode in copy:
            if verbose:
                print "checking name %r w/ code %r for consistency" % (
                      name, picklecode)
            d = copy[picklecode]
            if d.name != name:
                raise ValueError("for pickle code %r, pickle.py uses name %r "
                                 "but we're using name %r" % (picklecode,
                                                              name,
                                                              d.name))
            # Forget this one. Any left over in copy at the end are a problem
            # of a different kind.
            del copy[picklecode]
        else:
            raise ValueError("pickle.py appears to have a pickle opcode with "
                             "name %r and code %r, but we don't" %
                             (name, picklecode))
    if copy:
        msg = ["we appear to have pickle opcodes that pickle.py doesn't have:"]
        for code, d in copy.items():
            msg.append("    name %r with code %r" % (d.name, code))
        raise ValueError("\n".join(msg))

assure_pickle_consistency()

##############################################################################
# A pickle opcode generator.

def genops(pickle):
    """Generate all the opcodes in a pickle.

    'pickle' is a file-like object, or string, containing the pickle.

    Opcodes are generated starting at the current pickle position, and
    generation stops after a STOP opcode has been delivered. A triple is
    generated for each opcode:

        opcode, arg, pos

    opcode is an OpcodeInfo record, describing the current opcode.

    If the opcode has an argument embedded in the pickle, arg is its decoded
    value, as a Python object. If the opcode doesn't have an argument, arg
    is None.

    If the pickle has a tell() method, pos is the value of pickle.tell()
    before reading the current opcode. If the pickle is a string object,
    it's wrapped in a StringIO object, and the latter's tell() result is
    used. Otherwise (the pickle doesn't have a tell(), and it's not obvious
    how to query its current position) pos is None.
    """

    import cStringIO as StringIO

    if isinstance(pickle, str):
        pickle = StringIO.StringIO(pickle)

    if hasattr(pickle, "tell"):
        getpos = pickle.tell
    else:
        getpos = lambda: None

    while True:
        pos = getpos()
        code = pickle.read(1)
        opcode = code2op.get(code)
        if opcode is None:
            if code == "":
                raise ValueError("pickle exhausted before seeing STOP")
            else:
                raise ValueError("at position %s, opcode %r unknown" % (
                                 pos is None and "<unknown>" or pos,
                                 code))
        if opcode.arg is None:
            arg = None
        else:
            arg = opcode.arg.reader(pickle)
        yield opcode, arg, pos
        if code == '.':
            assert opcode.name == 'STOP'
            break
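
r"""
A minimal sketch of the "protocol identifier" idea from the notes at the
top of this module: the protocol a pickle needs is the highest .proto
value among its opcodes. It is written against Python 3's
pickletools.genops() (an assumption), which yields the same
(opcode, arg, pos) triples as genops() here; the required_protocol name
is hypothetical.

```python
import pickle
import pickletools

def required_protocol(data):
    # Walk every opcode up to and including STOP and keep the largest
    # protocol number in which any of them was introduced.
    highest = 0
    for opcode, arg, pos in pickletools.genops(data):
        highest = max(highest, opcode.proto)
    return highest
```

A protocol 0 pickle of [1, 2, (3, 4)] reports 0 and a protocol 2 pickle
reports 2, since it starts with PROTO. The result can be lower than the
protocol passed to dumps(): pickle.dumps(1, 3) only needs protocol 2.
"""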

##############################################################################
# A symbolic pickle disassembler.

def dis(pickle, out=None, indentlevel=4):
    """Produce a symbolic disassembly of a pickle.

    'pickle' is a file-like object, or string, containing at least one
    pickle. The pickle is disassembled from the current position, through
    the first STOP opcode encountered.

    Optional arg 'out' is a file-like object to which the disassembly is
    printed. It defaults to sys.stdout.

    Optional arg 'indentlevel' is the number of blanks by which to indent
    a new MARK level. It defaults to 4.
    """

    markstack = []
    indentchunk = ' ' * indentlevel
    for opcode, arg, pos in genops(pickle):
        if pos is not None:
            print >> out, "%5d:" % pos,

        line = "%s %s%s" % (opcode.code,
                            indentchunk * len(markstack),
                            opcode.name)

        markmsg = None
        if markstack and markobject in opcode.stack_before:
            assert markobject not in opcode.stack_after
            markpos = markstack.pop()
            if markpos is not None:
                markmsg = "(MARK at %d)" % markpos

        if arg is not None or markmsg:
            # make a mild effort to align arguments
            line += ' ' * (10 - len(opcode.name))
            if arg is not None:
                line += ' ' + repr(arg)
            if markmsg:
                line += ' ' + markmsg
        print >> out, line

        if markobject in opcode.stack_after:
            assert markobject not in opcode.stack_before
            markstack.append(pos)
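
r"""
A minimal sketch of the MARK bookkeeping dis() does, pulled out on its
own: it pairs each MARK with the opcode that consumes it and reports any
MARKs left unmatched. It is written against Python 3's pickletools (an
assumption), whose genops() and markobject play the same roles as here;
the pair_marks name is hypothetical. Like dis(), it only recognizes
opcodes that list markobject in stack_before, so the POP-based cleanup in
a protocol 0 pickle of a recursive tuple leaves a MARK unmatched.

```python
import pickle
import pickletools

def pair_marks(data):
    pending = []  # positions of MARKs not yet consumed, innermost last
    pairs = {}    # MARK position -> position of the consuming opcode
    for opcode, arg, pos in pickletools.genops(data):
        if pending and pickletools.markobject in opcode.stack_before:
            pairs[pending.pop()] = pos
        if pickletools.markobject in opcode.stack_after:
            pending.append(pos)
    return pairs, pending
```

For the recursive tuple T built in the tests that follow, the protocol 0
pickle leaves the MARK at 0 in pending, while the protocol 1 pickle pairs
it with the POP_MARK at 11.
"""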


_dis_test = """
>>> import pickle
>>> x = [1, 2, (3, 4), {'abc': u"def"}]
>>> pik = pickle.dumps(x, 0)
>>> dis(pik)
    0: ( MARK
    1: l     LIST       (MARK at 0)
    2: p PUT        0
    5: I INT        1
    8: a APPEND
    9: I INT        2
   12: a APPEND
   13: ( MARK
   14: I     INT        3
   17: I     INT        4
   20: t     TUPLE      (MARK at 13)
   21: p PUT        1
   24: a APPEND
   25: ( MARK
   26: d     DICT       (MARK at 25)
   27: p PUT        2
   30: S STRING     'abc'
   37: p PUT        3
   40: V UNICODE    u'def'
   45: p PUT        4
   48: s SETITEM
   49: a APPEND
   50: . STOP

Try again with a "binary" pickle.

>>> pik = pickle.dumps(x, 1)
>>> dis(pik)
    0: ] EMPTY_LIST
    1: q BINPUT     0
    3: ( MARK
    4: K     BININT1    1
    6: K     BININT1    2
    8: (     MARK
    9: K         BININT1    3
   11: K         BININT1    4
   13: t         TUPLE      (MARK at 8)
   14: q     BINPUT     1
   16: }     EMPTY_DICT
   17: q     BINPUT     2
   19: U     SHORT_BINSTRING 'abc'
   24: q     BINPUT     3
   26: X     BINUNICODE u'def'
   34: q     BINPUT     4
   36: s     SETITEM
   37: e     APPENDS    (MARK at 3)
   38: . STOP

Exercise the INST/OBJ/BUILD family.

>>> import random
>>> dis(pickle.dumps(random.random, 0))
    0: c GLOBAL     'random random'
   15: p PUT        0
   18: . STOP

>>> x = [pickle.PicklingError()] * 2
>>> dis(pickle.dumps(x, 0))
    0: ( MARK
    1: l     LIST       (MARK at 0)
    2: p PUT        0
    5: ( MARK
    6: i     INST       'pickle PicklingError' (MARK at 5)
   28: p PUT        1
   31: ( MARK
   32: d     DICT       (MARK at 31)
   33: p PUT        2
   36: S STRING     'args'
   44: p PUT        3
   47: ( MARK
   48: t     TUPLE      (MARK at 47)
   49: p PUT        4
   52: s SETITEM
   53: b BUILD
   54: a APPEND
   55: g GET        1
   58: a APPEND
   59: . STOP

>>> dis(pickle.dumps(x, 1))
    0: ] EMPTY_LIST
    1: q BINPUT     0
    3: ( MARK
    4: (     MARK
    5: c         GLOBAL     'pickle PicklingError'
   27: q         BINPUT     1
   29: o         OBJ        (MARK at 4)
   30: q     BINPUT     2
   32: }     EMPTY_DICT
   33: q     BINPUT     3
   35: U     SHORT_BINSTRING 'args'
   41: q     BINPUT     4
   43: )     EMPTY_TUPLE
   44: s     SETITEM
   45: b     BUILD
   46: h     BINGET     2
   48: e     APPENDS    (MARK at 3)
   49: . STOP

Try "the canonical" recursive-object test.

>>> L = []
>>> T = L,
>>> L.append(T)
>>> L[0] is T
True
>>> T[0] is L
True
>>> L[0][0] is L
True
>>> T[0][0] is T
True
>>> dis(pickle.dumps(L, 0))
    0: ( MARK
    1: l     LIST       (MARK at 0)
    2: p PUT        0
    5: ( MARK
    6: g     GET        0
    9: t     TUPLE      (MARK at 5)
   10: p PUT        1
   13: a APPEND
   14: . STOP
>>> dis(pickle.dumps(L, 1))
    0: ] EMPTY_LIST
    1: q BINPUT     0
    3: ( MARK
    4: h     BINGET     0
    6: t     TUPLE      (MARK at 3)
    7: q BINPUT     1
    9: a APPEND
   10: . STOP

The protocol 0 pickle of the tuple causes the disassembly to get confused,
as it doesn't realize that the POP opcode at 16 gets rid of the MARK at 0
(so the output remains indented until the end). The protocol 1 pickle
doesn't trigger this glitch, because the disassembler realizes that
POP_MARK gets rid of the MARK. Doing a better job on the protocol 0
pickle would require the disassembler to emulate the stack.

>>> dis(pickle.dumps(T, 0))
    0: ( MARK
    1: (     MARK
    2: l         LIST       (MARK at 1)
    3: p     PUT        0
    6: (     MARK
    7: g         GET        0
   10: t         TUPLE      (MARK at 6)
   11: p     PUT        1
   14: a     APPEND
   15: 0     POP
   16: 0     POP
   17: g     GET        1
   20: .     STOP
>>> dis(pickle.dumps(T, 1))
    0: ( MARK
    1: ]     EMPTY_LIST
    2: q     BINPUT     0
    4: (     MARK
    5: h         BINGET     0
    7: t         TUPLE      (MARK at 4)
    8: q     BINPUT     1
   10: a     APPEND
   11: 1     POP_MARK   (MARK at 0)
   12: h BINGET     1
   14: . STOP
"""

__test__ = {'disassembler_test': _dis_test,
           }

def _test():
    import doctest
    return doctest.testmod()

if __name__ == "__main__":
    _test()