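# Lib/pickletools.py (platform/external/python/cpython2)
# blob: cbc265fc8049ead47903501efead3cfa4faff5f0
# Blame: Tim Peters (8ecfc8e, d916cf4, 1996e23, 217e571, 5eed340) and
# Guido van Rossum (5a2d8f5), all committed 2003-01-27.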
""""Executable documentation" for the pickle module.

Extensive comments about the pickle protocols and pickle-machine opcodes
can be found here. Some functions meant for external use:

genops(pickle)
    Generate all the opcodes in a pickle, as (opcode, arg, position) triples.

dis(pickle, out=None, indentlevel=4)
    Print a symbolic disassembly of a pickle.
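
A minimal sketch of consuming the (opcode, arg, position) triples. It is
written for the Python 3 standard library's pickletools module, on the
assumption that its genops() follows the same contract described here.
opcode_names is a hypothetical helper:

```python
import pickle
import pickletools


def opcode_names(data):
    # Symbolic name of every opcode, in bytestream order.
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]


data = pickle.dumps([1, 2], protocol=0)
for opcode, arg, pos in pickletools.genops(data):
    # pos is the byte offset of the opcode; arg is its decoded argument
    # (None for opcodes that take no embedded argument).
    print(pos, opcode.name, arg)
```

pickletools.dis(data) prints the same information as an annotated listing.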
"""

# Other ideas:
#
# - A pickle verifier: read a pickle and check it exhaustively for
#   well-formedness.
#
# - A protocol identifier: examine a pickle and return its protocol number
#   (== the highest .proto attr value among all the opcodes in the pickle).
#
# - A pickle optimizer: for example, tuple-building code is sometimes more
#   elaborate than necessary, catering for the possibility that the tuple
#   is recursive. Or lots of times a PUT is generated that's never accessed
#   by a later GET.


"""
"A pickle" is a program for a virtual pickle machine (PM, but more accurately
called an unpickling machine). It's a sequence of opcodes, interpreted by the
PM, building an arbitrarily complex Python object.

For the most part, the PM is very simple: there are no looping, testing, or
conditional instructions, no arithmetic and no function calls. Opcodes are
executed once each, from first to last, until a STOP opcode is reached.

The PM has two data areas, "the stack" and "the memo".

Many opcodes push Python objects onto the stack; e.g., INT pushes a Python
integer object on the stack, whose value is gotten from a decimal string
literal immediately following the INT opcode in the pickle bytestream. Other
opcodes take Python objects off the stack. The result of unpickling is
whatever object is left on the stack when the final STOP opcode is executed.

The memo is simply an array of objects, or it can be implemented as a dict
mapping little integers to objects. The memo serves as the PM's "long term
memory", and the little integers indexing the memo are akin to variable
names. Some opcodes store the object on top of the stack into the memo at a
given index (without popping it), and others push a memo object at a given
index onto the stack again.
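
The stack and memo described above can be sketched as a toy interpreter.
run_pm is hypothetical and is not how pickle's real unpickler is written.
It relies on the Python 3 standard library's pickletools.genops to decode
opcodes, and it understands only MARK, LIST, INT, APPEND, PUT, GET and STOP.
That is enough for protocol-0 pickles of lists of small ints:

```python
import pickle
import pickletools


def run_pm(data):
    # Hypothetical toy PM: understands only MARK, LIST, INT, APPEND, PUT,
    # GET and STOP, and trusts its input completely.
    mark = object()                  # unique marker pushed by MARK
    stack = []
    memo = {}
    for opcode, arg, pos in pickletools.genops(data):
        name = opcode.name
        if name == 'MARK':
            stack.append(mark)
        elif name == 'LIST':
            # Everything above the topmost mark becomes the new list;
            # the mark itself is discarded.
            k = max(i for i, x in enumerate(stack) if x is mark)
            items = stack[k + 1:]
            del stack[k:]
            stack.append(items)
        elif name == 'INT':
            stack.append(arg)
        elif name == 'APPEND':
            value = stack.pop()
            stack[-1].append(value)
        elif name == 'PUT':
            memo[arg] = stack[-1]    # store the stack top; don't pop it
        elif name == 'GET':
            stack.append(memo[arg])
        elif name == 'STOP':
            return stack.pop()       # the result is the object left on top
        else:
            raise ValueError('toy PM cannot handle %s at %d' % (name, pos))
    raise ValueError('pickle has no STOP opcode')


a = []
result = run_pm(pickle.dumps([a, a], protocol=0))
print(result, result[0] is result[1])
```

A real unpickler dispatches on every opcode and validates its input. This
sketch only shows how the stack, the mark and the memo cooperate. Note that
the shared list comes back as one object because the second occurrence is
fetched from the memo with GET.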

At heart, that's all the PM has. Subtleties arise for these reasons:

+ Object identity. Objects can be arbitrarily complex, and subobjects
  may be shared (for example, the list [a, a] refers to the same object a
  twice). It can be vital that unpickling recreate an isomorphic object
  graph, faithfully reproducing sharing.

+ Recursive objects. For example, after "L = []; L.append(L)", L is a
  list, and L[0] is the same list. This is related to the object identity
  point, and some sequences of pickle opcodes are subtle in order to
  get the right result in all cases.

+ Things pickle doesn't know everything about. Examples of things pickle
  does know everything about are Python's builtin scalar and container
  types, like ints and tuples. They generally have opcodes dedicated to
  them. For things like module references and instances of user-defined
  classes, pickle's knowledge is limited. Historically, many enhancements
  have been made to the pickle protocol in order to do a better (faster,
  and/or more compact) job on those.

+ Backward compatibility and micro-optimization. As explained below,
  pickle opcodes never go away, not even when better ways to do a thing
  get invented. The repertoire of the PM just keeps growing over time.
  So, e.g., there are now five distinct opcodes for building a Python integer,
  four of them devoted to "short" integers. Even so, the only way to pickle
  a Python long int takes time quadratic in the number of digits, for both
  pickling and unpickling. This isn't so much a subtlety as a source of
  wearying complication.
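
The sharing and recursion points can be checked directly. This sketch
assumes Python 3's pickle and pickletools modules and protocol 0.
opcode_names is a hypothetical helper:

```python
import pickle
import pickletools


def opcode_names(data):
    # Symbolic name of every opcode, in bytestream order.
    return [op.name for op, arg, pos in pickletools.genops(data)]


# Sharing: both elements of the outer list are the same object.
a = []
shared = pickle.loads(pickle.dumps([a, a], protocol=0))
print(shared[0] is shared[1])

# Recursion: the list contains itself.
L = []
L.append(L)
data = pickle.dumps(L, protocol=0)
print(opcode_names(data))
restored = pickle.loads(data)
print(restored[0] is restored)
```

In the recursive case the new list is stored with PUT as soon as it is
created. The later GET can then push that same, still unfinished, list back
onto the stack before APPEND adds it to itself.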


Pickle protocols:

For compatibility, the meaning of a pickle opcode never changes. Instead new
pickle opcodes get added, and each version's unpickler can handle all the
pickle opcodes in all protocol versions to date. So old pickles continue to
be readable forever. The pickler can generally be told to restrict itself to
the subset of opcodes available under previous protocol versions too, so that
users can create pickles under the current version readable by older
versions. However, a pickle does not contain its version number embedded
within it. If an older unpickler tries to read a pickle using a later
protocol, the result is most likely an exception due to seeing an unknown (in
the older unpickler) opcode.
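
Since the version isn't recorded, it can only be inferred from the opcodes
a pickle actually uses. The sketch below follows the "protocol identifier"
idea from the comments at the top of this file. It assumes Python 3's
pickletools, whose opcode descriptors carry the same .proto attribute.
pickle_protocol is a hypothetical name:

```python
import pickle
import pickletools


def pickle_protocol(data):
    # The highest .proto value among all opcodes in the pickle: the oldest
    # protocol version whose unpickler knows every opcode used here.
    return max(opcode.proto for opcode, arg, pos in pickletools.genops(data))


for proto in (0, 1, 2):
    print(proto, pickle_protocol(pickle.dumps([1, 2], protocol=proto)))
```

The result is the oldest protocol an unpickler must support. It is not
necessarily the protocol the pickler was asked to use. For example,
pickle.dumps(None, protocol=1) contains only protocol-0 opcodes, so it
reports 0.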

The original pickle used what's now called "protocol 0", and what was called
"text mode" before Python 2.3. The entire pickle bytestream is made up of
printable 7-bit ASCII characters, plus the newline character, in protocol 0.
That's why it was called text mode.
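
A quick way to see the difference is sketched below with Python 3's pickle
module and ASCII-only data. The restriction to ASCII data is deliberate:
Python 3 writes non-ASCII str values in protocol 0 with the
raw-unicode-escape codec, which can emit bytes above 0x7f. is_text_mode is
a hypothetical helper:

```python
import pickle


def is_text_mode(data):
    # True if every byte is printable 7-bit ASCII or a newline (byte 10).
    return all(32 <= b < 127 or b == 10 for b in bytearray(data))


obj = {'a': [1, 2.5, None]}
print(is_text_mode(pickle.dumps(obj, protocol=0)))
print(is_text_mode(pickle.dumps(obj, protocol=1)))
```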

The second major set of additions is now called "protocol 1", and was called
"binary mode" before Python 2.3. This added many opcodes with arguments
consisting of arbitrary bytes, including NUL bytes and unprintable "high bit"
bytes. Binary mode pickles can be substantially smaller than equivalent
text mode pickles, and sometimes faster too; e.g., BININT represents a 4-byte
int as 4 bytes following the opcode, which is cheaper to unpickle than the
(perhaps) 11-character decimal string attached to INT.
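
The size difference is easy to observe. This sketch assumes Python 3's
pickle and pickletools. The exact opcode choices are CPython's pickler
behaviour, not something the protocol mandates. For protocols 0 and 1 no
PROTO opcode is written, so the first opcode is the integer opcode.
int_opcode is a hypothetical helper:

```python
import pickle
import pickletools


def int_opcode(value, proto):
    # Name of the first opcode in the pickle, and the pickle's total size.
    data = pickle.dumps(value, protocol=proto)
    first_opcode = next(pickletools.genops(data))[0]
    return first_opcode.name, len(data)


for value in (1, 300, 123456789, -1):
    print(value, int_opcode(value, 0), int_opcode(value, 1))
```

For 123456789 this gives ('INT', 12) against ('BININT', 6). The text form
is one opcode byte, nine digits, a newline and STOP. The binary form is one
opcode byte, four argument bytes and STOP. Small non-negative values get
the even shorter BININT1 and BININT2 opcodes.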

The third major set of additions came in Python 2.3, and is called "protocol
2". XXX Write a short blurb when Guido figures out what they are <wink>. XXX
"""

# Meta-rule: Descriptions are stored in instances of descriptor objects,
# with plain constructors. No meta-language is defined from which
# descriptors could be constructed. If you want, e.g., XML, write a little
# program to generate XML from the objects.

##############################################################################
# Some pickle opcodes have an argument, following the opcode in the
# bytestream. An argument is of a specific type, described by an instance
# of ArgumentDescriptor. These are not to be confused with arguments taken
# off the stack -- ArgumentDescriptor applies only to arguments embedded in
# the opcode stream, immediately following an opcode.

# Represents the number of bytes consumed by an argument delimited by the
# next newline character.
UP_TO_NEWLINE = -1

# Represents the number of bytes consumed by a two-argument opcode where
# the first argument gives the number of bytes in the second argument.
TAKEN_FROM_ARGUMENT = -2

class ArgumentDescriptor(object):
    __slots__ = (
        # name of descriptor record, also a module global name; a string
        'name',

        # length of argument, in bytes; an int; UP_TO_NEWLINE and
        # TAKEN_FROM_ARGUMENT are negative values for variable-length cases
        'n',

        # a function taking a file-like object, reading this kind of argument
        # from the object at the current position, advancing the current
        # position by n bytes, and returning the value of the argument
        'reader',

        # human-readable docs for this arg descriptor; a string
        'doc',
    )

    def __init__(self, name, n, reader, doc):
        assert isinstance(name, str)
        self.name = name

        assert isinstance(n, int) and (n >= 0 or
                                       n is UP_TO_NEWLINE or
                                       n is TAKEN_FROM_ARGUMENT)
        self.n = n

        self.reader = reader

        assert isinstance(doc, str)
        self.doc = doc

from struct import unpack as _unpack

def read_uint1(f):
    """
    >>> import StringIO
    >>> read_uint1(StringIO.StringIO('\\xff'))
    255
    """

    data = f.read(1)
    if data:
        return ord(data)
    raise ValueError("not enough data in stream to read uint1")

uint1 = ArgumentDescriptor(
            name='uint1',
            n=1,
            reader=read_uint1,
            doc="One-byte unsigned integer.")


def read_uint2(f):
    """
    >>> import StringIO
    >>> read_uint2(StringIO.StringIO('\\xff\\x00'))
    255
    >>> read_uint2(StringIO.StringIO('\\xff\\xff'))
    65535
    """

    data = f.read(2)
    if len(data) == 2:
        return _unpack("<H", data)[0]
    raise ValueError("not enough data in stream to read uint2")

uint2 = ArgumentDescriptor(
            name='uint2',
            n=2,
            reader=read_uint2,
            doc="Two-byte unsigned integer, little-endian.")


def read_int4(f):
    """
    >>> import StringIO
    >>> read_int4(StringIO.StringIO('\\xff\\x00\\x00\\x00'))
    255
    >>> read_int4(StringIO.StringIO('\\x00\\x00\\x00\\x80')) == -(2**31)
    True
    """

    data = f.read(4)
    if len(data) == 4:
        return _unpack("<i", data)[0]
    raise ValueError("not enough data in stream to read int4")

int4 = ArgumentDescriptor(
           name='int4',
           n=4,
           reader=read_int4,
           doc="Four-byte signed integer, little-endian, 2's complement.")


def read_stringnl(f, decode=True, stripquotes=True):
    """
    >>> import StringIO
    >>> read_stringnl(StringIO.StringIO("'abcd'\\nefg\\n"))
    'abcd'

    >>> read_stringnl(StringIO.StringIO("\\n"))
    Traceback (most recent call last):
    ...
    ValueError: no string quotes around ''

    >>> read_stringnl(StringIO.StringIO("\\n"), stripquotes=False)
    ''

    >>> read_stringnl(StringIO.StringIO("''\\n"))
    ''

    >>> read_stringnl(StringIO.StringIO('"abcd"'))
    Traceback (most recent call last):
    ...
    ValueError: no newline found when trying to read stringnl

    Embedded escapes are undone in the result.
    >>> read_stringnl(StringIO.StringIO("'a\\\\nb\\x00c\\td'\\n'e'"))
    'a\\nb\\x00c\\td'
    """

    data = f.readline()
    if not data.endswith('\n'):
        raise ValueError("no newline found when trying to read stringnl")
    data = data[:-1]    # lose the newline

    if stripquotes:
        for q in "'\"":
            if data.startswith(q):
                if not data.endswith(q):
                    raise ValueError("string quote %r not found at both "
                                     "ends of %r" % (q, data))
                data = data[1:-1]
                break
        else:
            raise ValueError("no string quotes around %r" % data)

    # I'm not sure when 'string_escape' was added to the std codecs; it's
    # crazy not to use it if it's there.
    if decode:
        data = data.decode('string_escape')
    return data

stringnl = ArgumentDescriptor(
               name='stringnl',
               n=UP_TO_NEWLINE,
               reader=read_stringnl,
               doc="""A newline-terminated string.

               This is a repr-style string, with embedded escapes, and
               bracketing quotes.
               """)

def read_stringnl_noescape(f):
    return read_stringnl(f, decode=False, stripquotes=False)

stringnl_noescape = ArgumentDescriptor(
                        name='stringnl_noescape',
                        n=UP_TO_NEWLINE,
                        reader=read_stringnl_noescape,
                        doc="""A newline-terminated string.

                        This is a str-style string, without embedded escapes,
                        or bracketing quotes. It should consist solely of
                        printable ASCII characters.
                        """)

def read_stringnl_noescape_pair(f):
    """
    >>> import StringIO
    >>> read_stringnl_noescape_pair(StringIO.StringIO("Queue\\nEmpty\\njunk"))
    'Queue Empty'
    """

    return "%s %s" % (read_stringnl_noescape(f), read_stringnl_noescape(f))

stringnl_noescape_pair = ArgumentDescriptor(
                             name='stringnl_noescape_pair',
                             n=UP_TO_NEWLINE,
                             reader=read_stringnl_noescape_pair,
                             doc="""A pair of newline-terminated strings.

                             These are str-style strings, without embedded
                             escapes, or bracketing quotes. They should
                             consist solely of printable ASCII characters.
                             The pair is returned as a single string, with
                             a single blank separating the two strings.
                             """)

def read_string4(f):
    """
    >>> import StringIO
    >>> read_string4(StringIO.StringIO("\\x00\\x00\\x00\\x00abc"))
    ''
    >>> read_string4(StringIO.StringIO("\\x03\\x00\\x00\\x00abcdef"))
    'abc'
    >>> read_string4(StringIO.StringIO("\\x00\\x00\\x00\\x03abcdef"))
    Traceback (most recent call last):
    ...
    ValueError: expected 50331648 bytes in a string4, but only 6 remain
    """

    n = read_int4(f)
    if n < 0:
        raise ValueError("string4 byte count < 0: %d" % n)
    data = f.read(n)
    if len(data) == n:
        return data
    raise ValueError("expected %d bytes in a string4, but only %d remain" %
                     (n, len(data)))

string4 = ArgumentDescriptor(
              name="string4",
              n=TAKEN_FROM_ARGUMENT,
              reader=read_string4,
              doc="""A counted string.

              The first argument is a 4-byte little-endian signed int giving
              the number of bytes in the string, and the second argument is
              that many bytes.
              """)


def read_string1(f):
    """
    >>> import StringIO
    >>> read_string1(StringIO.StringIO("\\x00"))
    ''
    >>> read_string1(StringIO.StringIO("\\x03abcdef"))
    'abc'
    """

    n = read_uint1(f)
    assert n >= 0
    data = f.read(n)
    if len(data) == n:
        return data
    raise ValueError("expected %d bytes in a string1, but only %d remain" %
                     (n, len(data)))

string1 = ArgumentDescriptor(
              name="string1",
              n=TAKEN_FROM_ARGUMENT,
              reader=read_string1,
              doc="""A counted string.

              The first argument is a 1-byte unsigned int giving the number
              of bytes in the string, and the second argument is that many
              bytes.
              """)


def read_unicodestringnl(f):
    """
    >>> import StringIO
    >>> read_unicodestringnl(StringIO.StringIO("abc\\uabcd\\njunk"))
    u'abc\\uabcd'
    """

    data = f.readline()
    if not data.endswith('\n'):
        raise ValueError("no newline found when trying to read "
                         "unicodestringnl")
    data = data[:-1]    # lose the newline
    return unicode(data, 'raw-unicode-escape')

unicodestringnl = ArgumentDescriptor(
                      name='unicodestringnl',
                      n=UP_TO_NEWLINE,
                      reader=read_unicodestringnl,
                      doc="""A newline-terminated Unicode string.

                      This is raw-unicode-escape encoded, so consists of
                      printable ASCII characters, and may contain embedded
                      escape sequences.
                      """)

def read_unicodestring4(f):
    """
    >>> import StringIO
    >>> s = u'abcd\\uabcd'
    >>> enc = s.encode('utf-8')
    >>> enc
    'abcd\\xea\\xaf\\x8d'
    >>> n = chr(len(enc)) + chr(0) * 3 # little-endian 4-byte length
    >>> t = read_unicodestring4(StringIO.StringIO(n + enc + 'junk'))
    >>> s == t
    True

    >>> read_unicodestring4(StringIO.StringIO(n + enc[:-1]))
    Traceback (most recent call last):
    ...
    ValueError: expected 7 bytes in a unicodestring4, but only 6 remain
    """

    n = read_int4(f)
    if n < 0:
        raise ValueError("unicodestring4 byte count < 0: %d" % n)
    data = f.read(n)
    if len(data) == n:
        return unicode(data, 'utf-8')
    raise ValueError("expected %d bytes in a unicodestring4, but only %d "
                     "remain" % (n, len(data)))

unicodestring4 = ArgumentDescriptor(
                     name="unicodestring4",
                     n=TAKEN_FROM_ARGUMENT,
                     reader=read_unicodestring4,
                     doc="""A counted Unicode string.

                     The first argument is a 4-byte little-endian signed int
                     giving the number of bytes in the string, and the second
                     argument (the UTF-8 encoding of the Unicode string)
                     contains that many bytes.
                     """)


def read_decimalnl_short(f):
    """
    >>> import StringIO
    >>> read_decimalnl_short(StringIO.StringIO("1234\\n56"))
    1234

    >>> read_decimalnl_short(StringIO.StringIO("1234L\\n56"))
    Traceback (most recent call last):
    ...
    ValueError: trailing 'L' not allowed in '1234L'
    """

    s = read_stringnl(f, decode=False, stripquotes=False)
    if s.endswith("L"):
        raise ValueError("trailing 'L' not allowed in %r" % s)

    # It's not necessarily true that the result fits in a Python short int:
    # the pickle may have been written on a 64-bit box. There's also a hack
    # for True and False here.
    if s == "00":
        return False
    elif s == "01":
        return True

    try:
        return int(s)
    except OverflowError:
        return long(s)

def read_decimalnl_long(f):
    """
    >>> import StringIO

    >>> read_decimalnl_long(StringIO.StringIO("1234\\n56"))
    Traceback (most recent call last):
    ...
    ValueError: trailing 'L' required in '1234'

    Someday the trailing 'L' will probably go away from this output.

    >>> read_decimalnl_long(StringIO.StringIO("1234L\\n56"))
    1234L

    >>> read_decimalnl_long(StringIO.StringIO("123456789012345678901234L\\n6"))
    123456789012345678901234L
    """

    s = read_stringnl(f, decode=False, stripquotes=False)
    if not s.endswith("L"):
        raise ValueError("trailing 'L' required in %r" % s)
    return long(s)


decimalnl_short = ArgumentDescriptor(
                      name='decimalnl_short',
                      n=UP_TO_NEWLINE,
                      reader=read_decimalnl_short,
                      doc="""A newline-terminated decimal integer literal.

                      This never has a trailing 'L', and the integer fit
                      in a short Python int on the box where the pickle
                      was written -- but there's no guarantee it will fit
                      in a short Python int on the box where the pickle
                      is read.
                      """)

decimalnl_long = ArgumentDescriptor(
                     name='decimalnl_long',
                     n=UP_TO_NEWLINE,
                     reader=read_decimalnl_long,
                     doc="""A newline-terminated decimal integer literal.

                     This has a trailing 'L', and can represent integers
                     of any size.
                     """)


def read_floatnl(f):
    """
    >>> import StringIO
    >>> read_floatnl(StringIO.StringIO("-1.25\\n6"))
    -1.25
    """
    s = read_stringnl(f, decode=False, stripquotes=False)
    return float(s)

floatnl = ArgumentDescriptor(
              name='floatnl',
              n=UP_TO_NEWLINE,
              reader=read_floatnl,
              doc="""A newline-terminated decimal floating literal.

              In general this requires 17 significant digits for roundtrip
              identity, and pickling then unpickling infinities, NaNs, and
              minus zero doesn't work across boxes, or, on some boxes, even
              within a single box (e.g., Windows can't read the strings it
              produces for infinities or NaNs).
              """)

def read_float8(f):
    """
    >>> import StringIO, struct
    >>> raw = struct.pack(">d", -1.25)
    >>> raw
    '\\xbf\\xf4\\x00\\x00\\x00\\x00\\x00\\x00'
    >>> read_float8(StringIO.StringIO(raw + "\\n"))
    -1.25
    """

    data = f.read(8)
    if len(data) == 8:
        return _unpack(">d", data)[0]
    raise ValueError("not enough data in stream to read float8")


float8 = ArgumentDescriptor(
             name='float8',
             n=8,
             reader=read_float8,
             doc="""An 8-byte binary representation of a float, big-endian.

             The format is unique to Python, and shared with the struct
             module (format string '>d') "in theory" (the struct and cPickle
             implementations don't share the code -- they should). It's
             strongly related to the IEEE-754 double format, and, on 754
             boxes (the normal case), is in fact identical to the big-endian
             754 double format. On other boxes the dynamic range is limited
             to that of a 754 double, and "add a half and chop" rounding is
             used to reduce the precision to 53 bits. However, even on a 754
             box, infinities, NaNs, and minus zero may not be handled
             correctly (may not survive roundtrip pickling intact).
             """)

# Protocol 2 formats

def decode_long(data):
    r"""Decode a long from a two's complement little-endian binary string.
    >>> decode_long("\xff\x00")
    255L
    >>> decode_long("\xff\x7f")
    32767L
    >>> decode_long("\x00\xff")
    -256L
    >>> decode_long("\x00\x80")
    -32768L
    >>> decode_long("\x80")
    -128L
    >>> decode_long("\x7f")
    127L
    """
    x = 0L
    i = 0L
    for c in data:
        x |= long(ord(c)) << i
        i += 8L
    # If the high bit of the last (most significant) byte is set, the value
    # is negative: subtract 2**(8*len(data)) to sign-extend it.
    if data and ord(c) >= 0x80:
        x -= 1L << i
    return x

def read_long1(f):
    r"""
    >>> import StringIO
    >>> read_long1(StringIO.StringIO("\x02\xff\x00"))
    255L
    >>> read_long1(StringIO.StringIO("\x02\xff\x7f"))
    32767L
    >>> read_long1(StringIO.StringIO("\x02\x00\xff"))
    -256L
    >>> read_long1(StringIO.StringIO("\x02\x00\x80"))
    -32768L
    >>>
    """

    n = read_uint1(f)
    data = f.read(n)
    if len(data) != n:
        raise ValueError("not enough data in stream to read long1")
    return decode_long(data)

long1 = ArgumentDescriptor(
            name="long1",
            n=TAKEN_FROM_ARGUMENT,
            reader=read_long1,
            doc="""A binary long, little-endian, using 1-byte size.

            This first reads one byte as an unsigned size, then reads that
            many bytes and interprets them as a little-endian long.
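
A sketch of this layout, written for Python 3. int.to_bytes and
int.from_bytes are Python 3 APIs. encode_long1 and decode_long1 are
hypothetical names, not part of pickle. The size byte is followed by the
value in little-endian two's complement, the encoding decode_long()
handles. The encoder below is not guaranteed to produce the shortest
possible form for every negative value:

```python
import pickle


def encode_long1(value):
    # Hypothetical encoder: a 1-byte unsigned size, then the value as a
    # little-endian two's complement byte string. Zero is an empty string.
    if value == 0:
        payload = b''
    else:
        # Round up to whole bytes, leaving room for the sign bit.
        nbytes = (value.bit_length() >> 3) + 1
        payload = value.to_bytes(nbytes, 'little', signed=True)
    if len(payload) > 255:
        raise ValueError('value too large for a 1-byte size field')
    return bytes([len(payload)]) + payload


def decode_long1(data):
    # Inverse of encode_long1: read the size byte, then that many bytes.
    n = data[0]
    payload = data[1:1 + n]
    if len(payload) != n:
        raise ValueError('not enough data to read long1')
    return int.from_bytes(payload, 'little', signed=True)


print(decode_long1(bytes([2, 0xff, 0x00])))   # 255
print(decode_long1(bytes([2, 0x00, 0x80])))   # -32768

# CPython's protocol-2 pickler writes PROTO 2, then the LONG1 opcode,
# then this argument, then STOP, for ints too big for BININT.
data = pickle.dumps(2**70, protocol=2)
print(decode_long1(data[3:-1]) == 2**70)
```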
            """)

def read_long2(f):
    r"""
    >>> import StringIO
    >>> read_long2(StringIO.StringIO("\x02\x00\xff\x00"))
    255L
    >>> read_long2(StringIO.StringIO("\x02\x00\xff\x7f"))
    32767L
    >>> read_long2(StringIO.StringIO("\x02\x00\x00\xff"))
    -256L
    >>> read_long2(StringIO.StringIO("\x02\x00\x00\x80"))
    -32768L
    >>>
    """

    n = read_uint2(f)
    data = f.read(n)
    if len(data) != n:
        raise ValueError("not enough data in stream to read long2")
    return decode_long(data)

long2 = ArgumentDescriptor(
            name="long2",
            n=TAKEN_FROM_ARGUMENT,
            reader=read_long2,
            doc="""A binary long, little-endian, using 2-byte size.

            This first reads two bytes as an unsigned size, then reads that
            many bytes and interprets them as a little-endian long.
            """)

def read_long4(f):
    r"""
    >>> import StringIO
    >>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\xff\x00"))
    255L
    >>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\xff\x7f"))
    32767L
    >>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\x00\xff"))
    -256L
    >>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\x00\x80"))
    -32768L
    >>>
    """

    n = read_int4(f)
    if n < 0:
        raise ValueError("long4 byte count < 0: %d" % n)
    data = f.read(n)
    if len(data) != n:
        raise ValueError("not enough data in stream to read long4")
    return decode_long(data)

long4 = ArgumentDescriptor(
            name="long4",
            n=TAKEN_FROM_ARGUMENT,
            reader=read_long4,
            doc="""A binary representation of a long, little-endian.

            This first reads four bytes as a signed size (but requires the
            size to be >= 0), then reads that many bytes and interprets them
            as a little-endian long.
            """)


##############################################################################
# Object descriptors. The stack used by the pickle machine holds objects,
# and in the stack_before and stack_after attributes of OpcodeInfo
# descriptors we need names to describe the various types of objects that can
# appear on the stack.

class StackObject(object):
    __slots__ = (
        # name of descriptor record, for info only
        'name',

        # type of object, or tuple of type objects (meaning the object can
        # be of any type in the tuple)
        'obtype',

        # human-readable docs for this kind of stack object; a string
        'doc',
    )

    def __init__(self, name, obtype, doc):
        assert isinstance(name, str)
        self.name = name

        assert isinstance(obtype, type) or isinstance(obtype, tuple)
        if isinstance(obtype, tuple):
            for contained in obtype:
                assert isinstance(contained, type)
        self.obtype = obtype

        assert isinstance(doc, str)
        self.doc = doc


pyint = StackObject(
            name='int',
            obtype=int,
            doc="A short (as opposed to long) Python integer object.")

pylong = StackObject(
             name='long',
             obtype=long,
             doc="A long (as opposed to short) Python integer object.")

pyinteger_or_bool = StackObject(
                        name='int_or_bool',
                        obtype=(int, long, bool),
                        doc="A Python integer object (short or long), or "
                            "a Python bool.")

pybool = StackObject(
             name='bool',
             obtype=(bool,),
             doc="A Python bool object.")

pyfloat = StackObject(
              name='float',
              obtype=float,
              doc="A Python float object.")

pystring = StackObject(
               name='str',
               obtype=str,
               doc="A Python string object.")

pyunicode = StackObject(
                name='unicode',
                obtype=unicode,
                doc="A Python Unicode string object.")

pynone = StackObject(
             name="None",
             obtype=type(None),
             doc="The Python None object.")

pytuple = StackObject(
              name="tuple",
              obtype=tuple,
              doc="A Python tuple object.")

pylist = StackObject(
             name="list",
             obtype=list,
             doc="A Python list object.")

pydict = StackObject(
             name="dict",
             obtype=dict,
             doc="A Python dict object.")

anyobject = StackObject(
                name='any',
                obtype=object,
                doc="Any kind of object whatsoever.")

markobject = StackObject(
                 name="mark",
                 obtype=StackObject,
                 doc="""'The mark' is a unique object.

                 Opcodes that operate on a variable number of objects
                 generally don't embed the count of objects in the opcode,
                 or pull it off the stack. Instead the MARK opcode is used
                 to push a special marker object on the stack, and then
                 some other opcodes grab all the objects from the top of
                 the stack down to (but not including) the topmost marker
                 object.
                 """)

stackslice = StackObject(
                 name="stackslice",
                 obtype=StackObject,
                 doc="""An object representing a contiguous slice of the stack.

                 This is used in conjunction with markobject, to represent all
                 of the stack following the topmost markobject. For example,
                 the POP_MARK opcode changes the stack from

                     [..., markobject, stackslice]
                 to
                     [...]

                 No matter how many objects are on the stack after the topmost
                 markobject, POP_MARK gets rid of all of them (including the
                 topmost markobject too).
                 """)

				827	##############################################################################
				828	# Descriptors for pickle opcodes.
				829
				830	class OpcodeInfo(object):
				831
				832	__slots__ = (
				833	# symbolic name of opcode; a string
				834	'name',
				835
				836	# the code used in a bytestream to represent the opcode; a
				837	# one-character string
				838	'code',
				839
				840	# If the opcode has an argument embedded in the byte string, an
				841	# instance of ArgumentDescriptor specifying its type. Note that
				842	# arg.reader(s) can be used to read and decode the argument from
				843	# the bytestream s, and arg.doc documents the format of the raw
				844	# argument bytes. If the opcode doesn't have an argument embedded
				845	# in the bytestream, arg should be None.
				846	'arg',
				847
				848	# what the stack looks like before this opcode runs; a list
				849	'stack_before',
				850
				851	# what the stack looks like after this opcode runs; a list
				852	'stack_after',
				853
				854	# the protocol number in which this opcode was introduced; an int
				855	'proto',
				856
				857	# human-readable docs for this opcode; a string
				858	'doc',
				859	)
				860
				861	def __init__(self, name, code, arg,
				862	stack_before, stack_after, proto, doc):
				863	assert isinstance(name, str)
				864	self.name = name
				865
				866	assert isinstance(code, str)
				867	assert len(code) == 1
				868	self.code = code
				869
				870	assert arg is None or isinstance(arg, ArgumentDescriptor)
				871	self.arg = arg
				872
				873	assert isinstance(stack_before, list)
				874	for x in stack_before:
				875	assert isinstance(x, StackObject)
				876	self.stack_before = stack_before
				877
				878	assert isinstance(stack_after, list)
				879	for x in stack_after:
				880	assert isinstance(x, StackObject)
				881	self.stack_after = stack_after
				882
				883	assert isinstance(proto, int) and 0 <= proto <= 2
				884	self.proto = proto
				885
				886	assert isinstance(doc, str)
				887	self.doc = doc
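The OpcodeInfo records form a table that can be queried. The sketch below shows one way to consult it. It is written for Python 3, whose `pickletools` module still exposes the list as `pickletools.opcodes`. The helper `describe()` is hypothetical: it just reports an opcode's protocol and stack effect.

```python
import pickletools

# Index Python 3's opcode table by each record's one-character code.
code2op = {op.code: op for op in pickletools.opcodes}


def describe(code):
    """Summarize one opcode as 'NAME (proto N): before -> after'."""
    op = code2op[code]
    before = " ".join(s.name for s in op.stack_before) or "-"
    after = " ".join(s.name for s in op.stack_after) or "-"
    return "%s (proto %d): %s -> %s" % (op.name, op.proto, before, after)


for c in "IaK":
    print(describe(c))
```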

I = OpcodeInfo
opcodes = [

    # Ways to spell integers.

    I(name='INT',
      code='I',
      arg=decimalnl_short,
      stack_before=[],
      stack_after=[pyinteger_or_bool],
      proto=0,
      doc="""Push an integer or bool.

      The argument is a newline-terminated decimal literal string.

      The intent may have been that this always fit in a short Python int,
      but INT can be generated in pickles written on a 64-bit box that
      require a Python long on a 32-bit box. The difference between this
      and LONG then is that INT skips a trailing 'L', and produces a short
      int whenever possible.

      Another difference stems from the fact that, when bool was introduced
      as a distinct type in 2.3, builtin names True and False were also
      added to 2.2.2, mapping to ints 1 and 0. For compatibility in both
      directions, True gets pickled as "I01\\n" (INT + "01\\n"), and False
      as "I00\\n" (INT + "00\\n"). Leading zeroes are never produced for a
      genuine integer. The 2.3 (and later) unpicklers special-case these
      and return bool instead; earlier unpicklers ignore the leading "0"
      and return the int.
      """),
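The bool special case can be sketched as follows. `decode_int_arg()` is a hypothetical helper, not pickle's code. It decodes INT's text argument the way a 2.3-or-later unpickler is described as doing. The last line assumes Python 3, whose pickler still writes True this way at protocol 0.

```python
import pickle


def decode_int_arg(line):
    """Decode the newline-terminated text argument of an INT opcode."""
    digits = line.rstrip("\n")
    # "01" and "00" are the bool spellings; a genuine integer is never
    # written with a leading zero, so these two strings are unambiguous.
    if digits == "01":
        return True
    if digits == "00":
        return False
    return int(digits)


print(pickle.dumps(True, protocol=0))  # b'I01\n.'
```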

    I(name='LONG',
      code='L',
      arg=decimalnl_long,
      stack_before=[],
      stack_after=[pylong],
      proto=0,
      doc="""Push a long integer.

      The same as INT, except that the literal ends with 'L', and always
      unpickles to a Python long. There doesn't seem to be a real purpose
      to the trailing 'L'.
      """),

    I(name='BININT',
      code='J',
      arg=int4,
      stack_before=[],
      stack_after=[pyint],
      proto=1,
      doc="""Push a four-byte signed integer.

      This handles the full range of Python (short) integers on a 32-bit
      box, directly as binary bytes (1 for the opcode and 4 for the integer).
      If the integer is non-negative and fits in 1 or 2 bytes, pickling via
      BININT1 or BININT2 saves space.
      """),

    I(name='BININT1',
      code='K',
      arg=uint1,
      stack_before=[],
      stack_after=[pyint],
      proto=1,
      doc="""Push a one-byte unsigned integer.

      This is a space optimization for pickling very small non-negative ints,
      in range(256).
      """),

    I(name='BININT2',
      code='M',
      arg=uint2,
      stack_before=[],
      stack_after=[pyint],
      proto=1,
      doc="""Push a two-byte unsigned integer.

      This is a space optimization for pickling small positive ints, in
      range(256, 2**16). Integers in range(256) can also be pickled via
      BININT2, but BININT1 instead saves a byte.
      """),
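Here is a sketch of choosing the shortest of the three binary integer opcodes. It assumes Python 3 and the `struct` module, and `encode_binint()` is a hypothetical helper. All three arguments are little-endian, which is what the `<` format prefix requests.

```python
import pickle
import struct


def encode_binint(n):
    """Return the shortest protocol-1 opcode + argument bytes for int n."""
    if not -2**31 <= n < 2**31:
        raise ValueError("needs a LONG-family opcode")
    if 0 <= n < 256:
        return b"K" + struct.pack("<B", n)  # BININT1: 1 unsigned byte
    if 0 <= n < 2**16:
        return b"M" + struct.pack("<H", n)  # BININT2: 2 unsigned bytes
    return b"J" + struct.pack("<i", n)      # BININT: 4 signed bytes


# Python 3's pickler makes the same choice at protocol 1.
print(pickle.dumps(300, protocol=1) == encode_binint(300) + b".")  # True
```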

    # Ways to spell strings (8-bit, not Unicode).

    I(name='STRING',
      code='S',
      arg=stringnl,
      stack_before=[],
      stack_after=[pystring],
      proto=0,
      doc="""Push a Python string object.

      The argument is a repr-style string, with bracketing quote characters,
      and perhaps embedded escapes. The argument extends until the next
      newline character.
      """),

    I(name='BINSTRING',
      code='T',
      arg=string4,
      stack_before=[],
      stack_after=[pystring],
      proto=1,
      doc="""Push a Python string object.

      There are two arguments: the first is a 4-byte little-endian signed int
      giving the number of bytes in the string, and the second is that many
      bytes, which are taken literally as the string content.
      """),

    I(name='SHORT_BINSTRING',
      code='U',
      arg=string1,
      stack_before=[],
      stack_after=[pystring],
      proto=1,
      doc="""Push a Python string object.

      There are two arguments: the first is a 1-byte unsigned int giving
      the number of bytes in the string, and the second is that many bytes,
      which are taken literally as the string content.
      """),
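Both binary string opcodes consist of a length prefix followed by raw bytes. The Python 3 sketch below reads either one from a byte string. `read_binstring()` is a hypothetical name, and unlike a real unpickler it does not detect truncated input.

```python
import struct


def read_binstring(data, pos):
    """Decode BINSTRING ('T') or SHORT_BINSTRING ('U') at data[pos].

    Returns (payload_bytes, position_after_payload).
    """
    op = data[pos:pos + 1]
    if op == b"T":
        (n,) = struct.unpack("<i", data[pos + 1:pos + 5])  # 4-byte signed LE
        if n < 0:
            raise ValueError("BINSTRING with negative length")
        start = pos + 5
    elif op == b"U":
        n = data[pos + 1]  # 1-byte unsigned length (indexing bytes gives int)
        start = pos + 2
    else:
        raise ValueError("not a binary string opcode: %r" % op)
    return data[start:start + n], start + n
```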

    # Ways to spell None.

    I(name='NONE',
      code='N',
      arg=None,
      stack_before=[],
      stack_after=[pynone],
      proto=0,
      doc="Push None on the stack."),

    # Ways to spell Unicode strings.

    I(name='UNICODE',
      code='V',
      arg=unicodestringnl,
      stack_before=[],
      stack_after=[pyunicode],
      proto=0,  # this may be pure-text, but it's a later addition
      doc="""Push a Python Unicode string object.

      The argument is a raw-unicode-escape encoding of a Unicode string,
      and so may contain embedded escape sequences. The argument extends
      until the next newline character.
      """),

    I(name='BINUNICODE',
      code='X',
      arg=unicodestring4,
      stack_before=[],
      stack_after=[pyunicode],
      proto=1,
      doc="""Push a Python Unicode string object.

      There are two arguments: the first is a 4-byte little-endian signed int
      giving the number of bytes in the string. The second is that many
      bytes, and is the UTF-8 encoding of the Unicode string.
      """),
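BINUNICODE's length counts UTF-8 bytes, not characters, which matters for non-ASCII text. Below is a Python 3 sketch using a hypothetical helper, `read_binunicode()`.

```python
import struct


def read_binunicode(data, pos):
    """Decode a BINUNICODE ('X') opcode at data[pos].

    Returns (text, position_after_payload).  The 4-byte little-endian
    length is the number of UTF-8 bytes, not the number of characters.
    """
    if data[pos:pos + 1] != b"X":
        raise ValueError("not a BINUNICODE opcode")
    (n,) = struct.unpack("<i", data[pos + 1:pos + 5])
    raw = data[pos + 5:pos + 5 + n]
    return raw.decode("utf-8"), pos + 5 + n
```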

    # Ways to spell floats.

    I(name='FLOAT',
      code='F',
      arg=floatnl,
      stack_before=[],
      stack_after=[pyfloat],
      proto=0,
      doc="""Newline-terminated decimal float literal.

      The argument is repr(a_float), and in general requires 17 significant
      digits for roundtrip conversion to be an identity (this is so for
      IEEE-754 double precision values, which is what Python float maps to
      on most boxes).

      In general, FLOAT cannot be used to transport infinities, NaNs, or
      minus zero across boxes (or even on a single box, if the platform C
      library can't read the strings it produces for such things -- Windows
      is like that), but may do less damage than BINFLOAT on boxes with
      greater precision or dynamic range than IEEE-754 double.
      """),

    I(name='BINFLOAT',
      code='G',
      arg=float8,
      stack_before=[],
      stack_after=[pyfloat],
      proto=1,
      doc="""Float stored in binary form, with 8 bytes of data.

      This generally requires less than half the space of FLOAT encoding.
      In general, BINFLOAT cannot be used to transport infinities, NaNs, or
      minus zero, raises an exception if the exponent exceeds the range of
      an IEEE-754 double, and retains no more than 53 bits of precision (if
      there are more than that, "add a half and chop" rounding is used to
      cut it back to 53 significant bits).
      """),
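To make the size difference concrete, here is a Python 3 sketch that produces both encodings of one float. It assumes BINFLOAT's 8 bytes are a big-endian IEEE-754 double, which is what pickle writes via `struct.pack('>d', ...)`. It uses `'%.17g'` to get the 17 significant digits mentioned for FLOAT. `float_args()` is a hypothetical helper.

```python
import struct


def float_args(x):
    """Return (FLOAT bytes, BINFLOAT bytes) for the float x."""
    # FLOAT: opcode, decimal text with 17 significant digits, newline.
    text = b"F" + ("%.17g" % x).encode("ascii") + b"\n"
    # BINFLOAT: opcode plus an 8-byte big-endian IEEE-754 double.
    binary = b"G" + struct.pack(">d", x)
    return text, binary


t, b = float_args(0.1)
print(len(t), len(b))  # 21 9
```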

    # Ways to build lists.

    I(name='EMPTY_LIST',
      code=']',
      arg=None,
      stack_before=[],
      stack_after=[pylist],
      proto=1,
      doc="Push an empty list."),

    I(name='APPEND',
      code='a',
      arg=None,
      stack_before=[pylist, anyobject],
      stack_after=[pylist],
      proto=0,
      doc="""Append an object to a list.

      Stack before: ... pylist anyobject
      Stack after: ... pylist+[anyobject]
      """),

    I(name='APPENDS',
      code='e',
      arg=None,
      stack_before=[pylist, markobject, stackslice],
      stack_after=[pylist],
      proto=1,
      doc="""Extend a list by a slice of stack objects.

      Stack before: ... pylist markobject stackslice
      Stack after: ... pylist+stackslice
      """),

    I(name='LIST',
      code='l',
      arg=None,
      stack_before=[markobject, stackslice],
      stack_after=[pylist],
      proto=0,
      doc="""Build a list out of the topmost stack slice, after markobject.

      All the stack entries following the topmost markobject are placed into
      a single Python list, which single list object replaces all of the
      stack from the topmost markobject onward. For example,

      Stack before: ... markobject 1 2 3 'abc'
      Stack after: ... [1, 2, 3, 'abc']
      """),
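The list opcodes only make sense together with MARK, so a tiny interpreter helps. The sketch below is a minimal pickle machine restricted to a few opcodes. It is fed (name, arg) pairs instead of bytes. `run_list_ops()` and the `MARK` sentinel are hypothetical stand-ins, not pickle's own implementation.

```python
MARK = object()  # unique sentinel standing in for markobject


def run_list_ops(ops):
    """Run (opcode_name, arg) pairs through a list-building subset of the PM."""
    stack = []

    def pop_to_mark():
        # Topmost MARK; max() raises ValueError if there is no MARK at all.
        k = max(i for i, x in enumerate(stack) if x is MARK)
        items = stack[k + 1:]
        del stack[k:]  # removes the slice and the MARK itself
        return items

    for name, arg in ops:
        if name == "INT":
            stack.append(arg)
        elif name == "MARK":
            stack.append(MARK)
        elif name == "EMPTY_LIST":
            stack.append([])
        elif name == "APPEND":
            value = stack.pop()
            stack[-1].append(value)
        elif name == "APPENDS":
            items = pop_to_mark()
            stack[-1].extend(items)
        elif name == "LIST":
            stack.append(pop_to_mark())
        elif name == "STOP":
            return stack.pop()
        else:
            raise ValueError("opcode not handled by this sketch: " + name)
    raise ValueError("pickle ended without STOP")


ops = [("EMPTY_LIST", None), ("INT", 1), ("APPEND", None),
       ("MARK", None), ("INT", 2), ("INT", 3), ("APPENDS", None),
       ("STOP", None)]
print(run_list_ops(ops))  # [1, 2, 3]
```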

    # Ways to build tuples.

    I(name='EMPTY_TUPLE',
      code=')',
      arg=None,
      stack_before=[],
      stack_after=[pytuple],
      proto=1,
      doc="Push an empty tuple."),

    I(name='TUPLE',
      code='t',
      arg=None,
      stack_before=[markobject, stackslice],
      stack_after=[pytuple],
      proto=0,
      doc="""Build a tuple out of the topmost stack slice, after markobject.

      All the stack entries following the topmost markobject are placed into
      a single Python tuple, which single tuple object replaces all of the
      stack from the topmost markobject onward. For example,

      Stack before: ... markobject 1 2 3 'abc'
      Stack after: ... (1, 2, 3, 'abc')
      """),

    # Ways to build dicts.

    I(name='EMPTY_DICT',
      code='}',
      arg=None,
      stack_before=[],
      stack_after=[pydict],
      proto=1,
      doc="Push an empty dict."),

    I(name='DICT',
      code='d',
      arg=None,
      stack_before=[markobject, stackslice],
      stack_after=[pydict],
      proto=0,
      doc="""Build a dict out of the topmost stack slice, after markobject.

      All the stack entries following the topmost markobject are placed into
      a single Python dict, which single dict object replaces all of the
      stack from the topmost markobject onward. The stack slice alternates
      key, value, key, value, .... For example,

      Stack before: ... markobject 1 2 3 'abc'
      Stack after: ... {1: 2, 3: 'abc'}
      """),

    I(name='SETITEM',
      code='s',
      arg=None,
      stack_before=[pydict, anyobject, anyobject],
      stack_after=[pydict],
      proto=0,
      doc="""Add a key+value pair to an existing dict.

      Stack before: ... pydict key value
      Stack after: ... pydict

      where pydict has been modified via pydict[key] = value.
      """),

    I(name='SETITEMS',
      code='u',
      arg=None,
      stack_before=[pydict, markobject, stackslice],
      stack_after=[pydict],
      proto=1,
      doc="""Add an arbitrary number of key+value pairs to an existing dict.

      The slice of the stack following the topmost markobject is taken as
      an alternating sequence of keys and values, added to the dict
      immediately under the topmost markobject. Everything at and after the
      topmost markobject is popped, leaving the mutated dict at the top
      of the stack.

      Stack before: ... pydict markobject key_1 value_1 ... key_n value_n
      Stack after: ... pydict

      where pydict has been modified via pydict[key_i] = value_i for i in
      1, 2, ..., n, and in that order.
      """),
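DICT and SETITEMS share the alternating key/value convention. It can be sketched in a few lines of Python 3; `apply_setitems()` is hypothetical. Storing pairs strictly in order means a repeated key ends up with its last value.

```python
def apply_setitems(d, stackslice):
    """Store key_1, value_1, ..., key_n, value_n from stackslice into d, in order."""
    if len(stackslice) % 2:
        raise ValueError("odd-length stack slice: a key has no value")
    for i in range(0, len(stackslice), 2):
        d[stackslice[i]] = stackslice[i + 1]
    return d


print(apply_setitems({}, [1, 2, 3, "abc"]))  # {1: 2, 3: 'abc'}
```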

    # Stack manipulation.

    I(name='POP',
      code='0',
      arg=None,
      stack_before=[anyobject],
      stack_after=[],
      proto=0,
      doc="Discard the top stack item, shrinking the stack by one item."),

    I(name='DUP',
      code='2',
      arg=None,
      stack_before=[anyobject],
      stack_after=[anyobject, anyobject],
      proto=0,
      doc="Push the top stack item onto the stack again, duplicating it."),

    I(name='MARK',
      code='(',
      arg=None,
      stack_before=[],
      stack_after=[markobject],
      proto=0,
      doc="""Push markobject onto the stack.

      markobject is a unique object, used by other opcodes to identify a
      region of the stack containing a variable number of objects for them
      to work on. See markobject.doc for more detail.
      """),

    I(name='POP_MARK',
      code='1',
      arg=None,
      stack_before=[markobject, stackslice],
      stack_after=[],
      proto=0,
      doc="""Pop all the stack objects at and above the topmost markobject.

      When an opcode using a variable number of stack objects is done,
      POP_MARK is used to remove those objects, and to remove the markobject
      that delimited their starting position on the stack.
      """),

    # Memo manipulation. There are really only two operations (get and put),
    # each in all-text, "short binary", and "long binary" flavors.

    I(name='GET',
      code='g',
      arg=decimalnl_short,
      stack_before=[],
      stack_after=[anyobject],
      proto=0,
      doc="""Read an object from the memo and push it on the stack.

      The index of the memo object to push is given by the newline-terminated
      decimal string following. BINGET and LONG_BINGET are space-optimized
      versions.
      """),

    I(name='BINGET',
      code='h',
      arg=uint1,
      stack_before=[],
      stack_after=[anyobject],
      proto=1,
      doc="""Read an object from the memo and push it on the stack.

      The index of the memo object to push is given by the 1-byte unsigned
      integer following.
      """),

    I(name='LONG_BINGET',
      code='j',
      arg=int4,
      stack_before=[],
      stack_after=[anyobject],
      proto=1,
      doc="""Read an object from the memo and push it on the stack.

      The index of the memo object to push is given by the 4-byte signed
      little-endian integer following.
      """),

    I(name='PUT',
      code='p',
      arg=decimalnl_short,
      stack_before=[],
      stack_after=[],
      proto=0,
      doc="""Store the stack top into the memo. The stack is not popped.

      The index of the memo location to write into is given by the newline-
      terminated decimal string following. BINPUT and LONG_BINPUT are
      space-optimized versions.
      """),

    I(name='BINPUT',
      code='q',
      arg=uint1,
      stack_before=[],
      stack_after=[],
      proto=1,
      doc="""Store the stack top into the memo. The stack is not popped.

      The index of the memo location to write into is given by the 1-byte
      unsigned integer following.
      """),

    I(name='LONG_BINPUT',
      code='r',
      arg=int4,
      stack_before=[],
      stack_after=[],
      proto=1,
      doc="""Store the stack top into the memo. The stack is not popped.

      The index of the memo location to write into is given by the 4-byte
      signed little-endian integer following.
      """),
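PUT differs from a pop, and GET matters, because of object identity: the memo lets a pickle refer to the same object twice. Below is a minimal Python 3 sketch with a hypothetical `Memo` class.

```python
class Memo:
    """PUT/GET semantics in isolation: a stack plus an index -> object memo."""

    def __init__(self):
        self.stack = []
        self.memo = {}

    def put(self, index):
        # PUT/BINPUT/LONG_BINPUT: copy the stack top into the memo, no pop.
        self.memo[index] = self.stack[-1]

    def get(self, index):
        # GET/BINGET/LONG_BINGET: push the memoized object itself, not a copy.
        self.stack.append(self.memo[index])


m = Memo()
m.stack.append([1, 2])
m.put(0)
m.get(0)
print(m.stack[0] is m.stack[1])  # True
```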

    # Push a class object, or module function, on the stack, via its module
    # and name.

    I(name='GLOBAL',
      code='c',
      arg=stringnl_noescape_pair,
      stack_before=[],
      stack_after=[anyobject],
      proto=0,
      doc="""Push a global object (module.attr) on the stack.

      Two newline-terminated strings follow the GLOBAL opcode. The first is
      taken as a module name, and the second as a class name. The class
      object module.class is pushed on the stack. More accurately, the
      object returned by self.find_class(module, class) is pushed on the
      stack, so unpickling subclasses can override this form of lookup.
      """),
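Because the lookup goes through `find_class()`, a subclass can refuse globals it does not trust. The sketch below targets Python 3's `pickle.Unpickler`, where `find_class(module, name)` is the documented hook. The allow-list contents and the names `RestrictedUnpickler` and `restricted_loads()` are illustrative assumptions.

```python
import io
import pickle


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global not on an allow-list."""

    # Hypothetical policy for illustration; adjust to your own needs.
    ALLOWED = frozenset([("decimal", "Decimal")])

    def find_class(self, module, name):
        # The module and class strings read by GLOBAL arrive here instead
        # of being imported directly, so this is the place to veto them.
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError("forbidden global %s.%s" % (module, name))


def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Pickles that contain no globals at all (lists, ints, strings) load as usual. Anything naming a global outside the allow-list raises `UnpicklingError`.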

    # Ways to build objects of classes pickle doesn't know about directly
    # (user-defined classes). I despair of documenting this accurately
    # and comprehensibly -- you really have to read the pickle code to
    # find all the special cases.

    I(name='REDUCE',
      code='R',
      arg=None,
      stack_before=[anyobject, anyobject],
      stack_after=[anyobject],
      proto=0,
      doc="""Push an object built from a callable and an argument tuple.

      The opcode is named as a reminder of the __reduce__() method.

      Stack before: ... callable pytuple
      Stack after: ... callable(*pytuple)

      The callable and the argument tuple are the first two items returned
      by a __reduce__ method. Applying the callable to the argtuple is
      supposed to reproduce the original object, or at least get it started.
      If the __reduce__ method returns a 3-tuple, the last component is an
      argument to be passed to the object's __setstate__, and then the REDUCE
      opcode is followed by code to create setstate's argument, and then a
      BUILD opcode to apply __setstate__ to that argument.

      There are lots of special cases here. The argtuple can be None, in
      which case callable.__basicnew__() is called instead to produce the
      object to be pushed on the stack. This appears to be a trick unique
      to ExtensionClasses, and is deprecated regardless.

      If type(callable) is not ClassType, REDUCE complains unless the
      callable has been registered with the copy_reg module's
      safe_constructors dict, or the callable has a magic
      '__safe_for_unpickling__' attribute with a true value. I'm not sure
      why it does this, but I've sure seen this complaint often enough when
      I didn't want to <wink>.
      """),
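Stripped of its special cases, REDUCE is a single call. Below is a Python 3 sketch with a hypothetical `do_reduce()` helper and a toy `Point` class whose `__reduce__` returns the (callable, argtuple) pair. None of the historical checks described above are modeled.

```python
def do_reduce(stack):
    """Pop an argument tuple and a callable; push callable(*argtuple)."""
    args = stack.pop()
    func = stack.pop()
    stack.append(func(*args))


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __reduce__(self):
        # (callable, argtuple): enough for REDUCE to rebuild the object.
        return (Point, (self.x, self.y))


p = Point(3, 4)
stack = list(p.__reduce__())  # [Point, (3, 4)], as the pickle would push them
do_reduce(stack)
q = stack[0]
print(q.x, q.y)  # 3 4
```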

    I(name='BUILD',
      code='b',
      arg=None,
      stack_before=[anyobject, anyobject],
      stack_after=[anyobject],
      proto=0,
      doc="""Finish building an object, via __setstate__ or dict update.

      Stack before: ... anyobject argument
      Stack after: ... anyobject

      where anyobject may have been mutated, as follows:

      If the object has a __setstate__ method,

          anyobject.__setstate__(argument)

      is called.

      Else the argument must be a dict, the object must have a __dict__, and
      the object is updated via

          anyobject.__dict__.update(argument)

      This may raise RuntimeError in restricted execution mode (which
      disallows access to __dict__ directly); in that case, the object
      is updated instead via

          for k, v in argument.items():
              anyobject[k] = v
      """),
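The sketch below shows the two main branches of BUILD in Python 3, using a hypothetical `do_build()` helper. The restricted-execution fallback is omitted because restricted execution does not exist in Python 3.

```python
def do_build(stack):
    """Pop the state argument and apply it to the object left on top."""
    state = stack.pop()
    obj = stack[-1]
    setstate = getattr(obj, "__setstate__", None)
    if setstate is not None:
        setstate(state)              # the object decides what state means
    else:
        obj.__dict__.update(state)   # default: state must be a dict


class Plain:
    pass


class Custom:
    def __setstate__(self, state):
        self.total = sum(state)


stack = [Plain(), {"a": 1}]
do_build(stack)
print(stack[0].a)  # 1
```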

    I(name='INST',
      code='i',
      arg=stringnl_noescape_pair,
      stack_before=[markobject, stackslice],
      stack_after=[anyobject],
      proto=0,
      doc="""Build a class instance.

      This is the protocol 0 version of protocol 1's OBJ opcode.
      INST is followed by two newline-terminated strings, giving a
      module and class name, just as for the GLOBAL opcode (and see
      GLOBAL for more details about that). self.find_class(module, name)
      is used to get a class object.

      In addition, all the objects on the stack following the topmost
      markobject are gathered into a tuple and popped (along with the
      topmost markobject), just as for the TUPLE opcode.

      Now it gets complicated. If all of these are true:

        + The argtuple is empty (markobject was at the top of the stack
          at the start).

        + It's an old-style class object (the type of the class object is
          ClassType).

        + The class object does not have a __getinitargs__ attribute.

      then we want to create an old-style class instance without invoking
      its __init__() method (pickle has waffled on this over the years; not
      calling __init__() is current wisdom). In this case, an instance of
      an old-style dummy class is created, and then we try to rebind its
      __class__ attribute to the desired class object. If this succeeds,
      the new instance object is pushed on the stack, and we're done. In
      restricted execution mode it can fail (assignment to __class__ is
      disallowed), and I'm not really sure what happens then -- it looks
      like the code ends up calling the class object's __init__ anyway,
      via falling into the next case.

      Else (the argtuple is not empty, it's not an old-style class object,
      or the class object does have a __getinitargs__ attribute), the code
      first insists that the class object have a __safe_for_unpickling__
      attribute. Unlike the __safe_for_unpickling__ check in REDUCE, it
      doesn't matter whether this attribute has a true or false value; it
      only matters whether it exists (XXX this smells like a bug). If
      __safe_for_unpickling__ doesn't exist, UnpicklingError is raised.

      Else (the class object does have a __safe_for_unpickling__ attr),
      the class object obtained from INST's arguments is applied to the
      argtuple obtained from the stack, and the resulting instance object
      is pushed on the stack.
      """),

    I(name='OBJ',
      code='o',
      arg=None,
      stack_before=[markobject, anyobject, stackslice],
      stack_after=[anyobject],
      proto=1,
      doc="""Build a class instance.

      This is the protocol 1 version of protocol 0's INST opcode, and is
      very much like it. The major difference is that the class object
      is taken off the stack, allowing it to be retrieved from the memo
      repeatedly if several instances of the same class are created. This
      can be much more efficient (in both time and space) than repeatedly
      embedding the module and class names in INST opcodes.

      Unlike INST, OBJ takes no arguments from the opcode stream. Instead
      the class object is taken off the stack, immediately above the
      topmost markobject:

      Stack before: ... markobject classobject stackslice
      Stack after: ... new_instance_object

      As for INST, the remainder of the stack above the markobject is
      gathered into an argument tuple, and then the logic seems identical,
      except that no __safe_for_unpickling__ check is done (XXX this smells
      like a bug). See INST for the gory details.
      """),

    # Machine control.

    I(name='STOP',
      code='.',
      arg=None,
      stack_before=[anyobject],
      stack_after=[],
      proto=0,
      doc="""Stop the unpickling machine.

      Every pickle ends with this opcode. The object at the top of the stack
      is popped, and that's the result of unpickling. The stack should be
      empty then.
      """),

    # Ways to deal with persistent IDs.

    I(name='PERSID',
      code='P',
      arg=stringnl_noescape,
      stack_before=[],
      stack_after=[anyobject],
      proto=0,
      doc="""Push an object identified by a persistent ID.

      The pickle module doesn't define what a persistent ID means. PERSID's
      argument is a newline-terminated str-style (no embedded escapes, no
      bracketing quote characters) string, which is "the persistent ID".
      The unpickler passes this string to self.persistent_load(). Whatever
      object that returns is pushed on the stack. There is no implementation
      of persistent_load() in Python's unpickler: it must be supplied by an
      unpickler subclass.
      """),

    I(name='BINPERSID',
      code='Q',
      arg=None,
      stack_before=[anyobject],
      stack_after=[anyobject],
      proto=1,
      doc="""Push an object identified by a persistent ID.

      Like PERSID, except the persistent ID is popped off the stack (instead
      of being a string embedded in the opcode bytestream). The persistent
      ID is passed to self.persistent_load(), and whatever object that
      returns is pushed on the stack. See PERSID for more detail.
      """),
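Python 3's `pickle` module exposes both halves of this mechanism. A `Pickler` subclass may define `persistent_id()`, and an `Unpickler` subclass may define `persistent_load()`. The sketch below keeps one object out of the pickle and looks it up by ID. The `EXTERNAL` store and the helper names are hypothetical.

```python
import io
import pickle

# Hypothetical external store: objects referred to by ID instead of pickled.
EXTERNAL = {"db-row-7": {"id": 7, "name": "seven"}}


class PersPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return an ID for objects kept outside the pickle; None means
        # "pickle this object normally".
        for key, value in EXTERNAL.items():
            if value is obj:
                return key
        return None


class PersUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Receives the ID from PERSID (protocol 0) or BINPERSID (protocol 1+).
        return EXTERNAL[pid]


def dumps_with_ids(obj, protocol):
    buf = io.BytesIO()
    PersPickler(buf, protocol=protocol).dump(obj)
    return buf.getvalue()


def loads_with_ids(data):
    return PersUnpickler(io.BytesIO(data)).load()
```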

    # Protocol 2 opcodes

    I(name='PROTO',
      code='\x80',
      arg=uint1,
      stack_before=[],
      stack_after=[],
      proto=2,
      doc="""Protocol version indicator.

      For protocol 2 and above, a pickle must start with this opcode.
      The argument is the protocol version, an int in range(2, 256).
      """),
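Protocols 0 and 1 have no PROTO opcode, so only protocol 2 and later pickles can be identified from their first two bytes. Below is a Python 3 sketch with a hypothetical `pickle_protocol()` helper.

```python
import pickle


def pickle_protocol(data):
    """Return the protocol named by a leading PROTO opcode, else None.

    None only means "protocol 0 or 1"; those two can't be told apart
    from the first bytes alone.
    """
    if len(data) >= 2 and data[0] == 0x80:  # PROTO's code is '\x80'
        return data[1]                      # its uint1 argument
    return None


print(pickle_protocol(pickle.dumps([1, 2], protocol=2)))  # 2
```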

    I(name='NEWOBJ',
      code='\x81',
      arg=None,
      stack_before=[anyobject, anyobject],
      stack_after=[anyobject],
      proto=2,
      doc="""Build an object instance.

      The stack before should be thought of as containing a class
      object followed by an argument tuple (the tuple being the stack
      top). Call these cls and args. They are popped off the stack,
      and the value returned by cls.__new__(cls, *args) is pushed back
      onto the stack.
      """),
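Below is a Python 3 sketch of NEWOBJ's effect, using a hypothetical `do_newobj()` helper. Only `__new__` runs; `__init__` is never called.

```python
def do_newobj(stack):
    """Pop cls and args; push cls.__new__(cls, *args) without calling __init__."""
    args = stack.pop()
    cls = stack.pop()
    stack.append(cls.__new__(cls, *args))


class Counter:
    def __init__(self):
        # NEWOBJ bypasses __init__, so this attribute is never set.
        self.count = 0


stack = [Counter, ()]
do_newobj(stack)
obj = stack[0]
print(type(obj).__name__, hasattr(obj, "count"))  # Counter False
```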

    I(name='EXT1',
      code='\x82',
      arg=uint1,
      stack_before=[],
      stack_after=[anyobject],
      proto=2,
      doc="""Extension code.

      This code and the similar EXT2 and EXT4 allow using a registry
      of popular objects that are pickled by name, typically classes.
      It is envisioned that through a global negotiation and
      registration process, third parties can set up a mapping between
      ints and object names.

      In order to guarantee pickle interchangeability, the extension
      code registry ought to be global, although a range of codes may
      be reserved for private use.
      """),
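Python 3's `copyreg` module provides `add_extension()` and `remove_extension()` for this registry. The sketch below registers `collections.OrderedDict` under code 240, so that a protocol 2 pickle of the class uses EXT1 instead of the module and class names. It assumes 240-255 is the private-use range described in PEP 307; check before relying on that.

```python
import collections
import copyreg
import pickle

CODE = 240  # assumed private-use extension code

copyreg.add_extension("collections", "OrderedDict", CODE)
try:
    data = pickle.dumps(collections.OrderedDict, protocol=2)
    restored = pickle.loads(data)
finally:
    # Always undo the registration; the registry is process-global.
    copyreg.remove_extension("collections", "OrderedDict", CODE)

print(b"\x82\xf0" in data)  # True: EXT1 with argument 240
```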

    I(name='EXT2',
      code='\x83',
      arg=uint2,
      stack_before=[],
      stack_after=[anyobject],
      proto=2,
      doc="""Extension code.

      See EXT1.
      """),

    I(name='EXT4',
      code='\x84',
      arg=int4,
      stack_before=[],
      stack_after=[anyobject],
      proto=2,
      doc="""Extension code.

      See EXT1.
      """),

    I(name='TUPLE1',
      code='\x85',
      arg=None,
      stack_before=[anyobject],
      stack_after=[pytuple],
      proto=2,
      doc="""One-tuple.

      This code pops one value off the stack and pushes a tuple of
      length 1 whose one item is that value back onto it. IOW:

          stack[-1] = tuple(stack[-1:])
      """),

    I(name='TUPLE2',
      code='\x86',
      arg=None,
      stack_before=[anyobject, anyobject],
      stack_after=[pytuple],
      proto=2,
      doc="""Two-tuple.

      This code pops two values off the stack and pushes a tuple
      of length 2 whose items are those values back onto it. IOW:

          stack[-2:] = [tuple(stack[-2:])]
      """),

    I(name='TUPLE3',
      code='\x87',
      arg=None,
      stack_before=[anyobject, anyobject, anyobject],
      stack_after=[pytuple],
      proto=2,
      doc="""Three-tuple.

      This code pops three values off the stack and pushes a tuple
      of length 3 whose items are those values back onto it. IOW:

          stack[-3:] = [tuple(stack[-3:])]
      """),
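The three TUPLEn opcodes share one rule, `stack[-n:] = [tuple(stack[-n:])]`. Below is a Python 3 sketch using a hypothetical `do_tuple_n()` helper. It also checks that Python 3's pickler emits TUPLE2 (code 0x86) for a pair at protocol 2.

```python
import pickle


def do_tuple_n(stack, n):
    """Apply TUPLE1, TUPLE2 or TUPLE3 (n = 1, 2, 3) to a list used as a stack."""
    if n not in (1, 2, 3):
        raise ValueError("TUPLEn exists only for n in 1..3")
    if len(stack) < n:
        # Without this check a short stack would be silently mis-sliced.
        raise IndexError("stack underflow")
    stack[-n:] = [tuple(stack[-n:])]


stack = ["x", 1, 2]
do_tuple_n(stack, 2)
print(stack)  # ['x', (1, 2)]
print(b"\x86" in pickle.dumps((1, 2), protocol=2))  # True
```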

    I(name='NEWTRUE',
      code='\x88',
      arg=None,
      stack_before=[],
      stack_after=[pybool],
      proto=2,
      doc="""True.

      Push True onto the stack."""),

    I(name='NEWFALSE',
      code='\x89',
      arg=None,
      stack_before=[],
      stack_after=[pybool],
      proto=2,
      doc="""False.

      Push False onto the stack."""),

    I(name="LONG1",
      code='\x8a',
      arg=long1,
      stack_before=[],
      stack_after=[pylong],
      proto=2,
      doc="""Long integer using one-byte length.

      A more efficient encoding of a Python long; the long1 encoding
      says it all."""),

    I(name="LONG2",
      code='\x8b',
      arg=long2,
      stack_before=[],
      stack_after=[pylong],
      proto=2,
      doc="""Long integer using two-byte length.

      A more efficient encoding of a Python long; the long2 encoding
      says it all."""),

    I(name="LONG4",
      code='\x8c',
      arg=long4,
      stack_before=[],
      stack_after=[pylong],
      proto=2,
      doc="""Long integer using four-byte length.

      A more efficient encoding of a Python long; the long4 encoding
      says it all."""),
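The LONG1 argument is a 1-byte length followed by that many bytes of little-endian two's-complement integer, and an empty byte string means 0. Below is a Python 3 sketch of decoding it with `int.from_bytes`. `read_long1()` and `decode_long()` are hypothetical helpers.

```python
import pickle


def decode_long(data):
    """Little-endian two's-complement bytes -> int (b'' decodes to 0)."""
    return int.from_bytes(data, "little", signed=True)


def read_long1(blob, pos):
    """Decode a LONG1 opcode (code 0x8a) at blob[pos]; return (value, next_pos)."""
    if blob[pos] != 0x8A:
        raise ValueError("not a LONG1 opcode")
    n = blob[pos + 1]  # 1-byte unsigned length
    start = pos + 2
    return decode_long(blob[start:start + n]), start + n


data = pickle.dumps(2**64, protocol=2)  # PROTO 2, then LONG1, then STOP
print(read_long1(data, 2)[0] == 2**64)  # True
```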

]
del I

# Verify uniqueness of .name and .code members.
name2i = {}
code2i = {}

for i, d in enumerate(opcodes):
    if d.name in name2i:
        raise ValueError("repeated name %r at indices %d and %d" %
                         (d.name, name2i[d.name], i))
    if d.code in code2i:
        raise ValueError("repeated code %r at indices %d and %d" %
                         (d.code, code2i[d.code], i))

    name2i[d.name] = i
    code2i[d.code] = i

del name2i, code2i, i, d

##############################################################################
# Build a code2op dict, mapping opcode characters to OpcodeInfo records.
# Also ensure we've got the same stuff as pickle.py, although the
# introspection here is dicey.

code2op = {}
for d in opcodes:
    code2op[d.code] = d
del d
				1759
				1760	def assure_pickle_consistency(verbose=False):
				1761	    import pickle, re
				1762
				1763	    copy = code2op.copy()
				1764	    for name in pickle.__all__:
				1765	        if not re.match("[A-Z][A-Z0-9_]+$", name):
				1766	            if verbose:
				1767	                print "skipping %r: it doesn't look like an opcode name" % name
				1768	            continue
				1769	        picklecode = getattr(pickle, name)
				1770	        if not isinstance(picklecode, str) or len(picklecode) != 1:
				1771	            if verbose:
				1772	                print ("skipping %r: value %r doesn't look like a pickle "
				1773	                       "code" % (name, picklecode))
				1774	            continue
				1775	        if picklecode in copy:
				1776	            if verbose:
				1777	                print "checking name %r w/ code %r for consistency" % (
				1778	                      name, picklecode)
				1779	            d = copy[picklecode]
				1780	            if d.name != name:
				1781	                raise ValueError("for pickle code %r, pickle.py uses name %r "
				1782	                                 "but we're using name %r" % (picklecode,
				1783	                                                              name,
				1784	                                                              d.name))
				1785	            # Forget this one. Any left over in copy at the end are a problem
				1786	            # of a different kind.
				1787	            del copy[picklecode]
				1788	        else:
				1789	            raise ValueError("pickle.py appears to have a pickle opcode with "
				1790	                             "name %r and code %r, but we don't" %
				1791	                             (name, picklecode))
				1792	    if copy:
				1793	        msg = ["we appear to have pickle opcodes that pickle.py doesn't have:"]
				1794	        for code, d in copy.items():
				1795	            msg.append(" name %r with code %r" % (d.name, code))
				1796	        raise ValueError("\n".join(msg))
				1797
				1798	assure_pickle_consistency()
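assure_pickle_consistency() depends on pickle exporting each opcode as a one-character module constant. The following Python 3 sketch performs the same two-way cross-check. It assumes that Python 3's pickle stores these constants as 1-byte `bytes` objects, so they are decoded as Latin-1 before being compared with `OpcodeInfo.code`, which is a str. Unlike the original, it walks `dir(pickle)` rather than `pickle.__all__`. `cross_check` is a hypothetical name.

```python
import pickle
import pickletools
import re


def cross_check():
    """Return (matched names, problems) comparing pickle constants to pickletools."""
    remaining = {op.code: op.name for op in pickletools.opcodes}
    matched, problems = [], []
    for name in sorted(dir(pickle)):
        if not re.match(r"[A-Z][A-Z0-9_]+$", name):
            continue  # not shaped like an opcode constant
        value = getattr(pickle, name)
        if not isinstance(value, bytes) or len(value) != 1:
            continue  # e.g. HIGHEST_PROTOCOL (an int) or TRUE (several bytes)
        code = value.decode("latin-1")
        ours = remaining.pop(code, None)
        if ours is None:
            problems.append("pickle has %s=%r; pickletools does not" % (name, code))
        elif ours != name:
            problems.append("code %r: pickle calls it %s, pickletools %s"
                            % (code, name, ours))
        else:
            matched.append(name)
    # Anything still left was described by pickletools but never seen in pickle.
    for code, name in sorted(remaining.items()):
        problems.append("pickletools has %s=%r; pickle does not" % (name, code))
    return matched, problems


matched, problems = cross_check()
print(len(matched), "opcodes agree;", problems or "no problems")
```

Collecting problems instead of raising on the first one is a deliberate choice: it reports every discrepancy in a single run.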
				1799
				1800	##############################################################################
				1801	# A pickle opcode generator.
				1802
				1803	def genops(pickle):
Guido van Rossum	a72ded9	2003-01-27 19:40:47 +0000		1804	    """Generate all the opcodes in a pickle.
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000		1805
				1806	    'pickle' is a file-like object, or string, containing the pickle.
				1807
				1808	    Each opcode in the pickle is generated, from the current pickle position,
				1809	    stopping after a STOP opcode is delivered. A triple is generated for
				1810	    each opcode:
				1811
				1812	        opcode, arg, pos
				1813
				1814	    opcode is an OpcodeInfo record, describing the current opcode.
				1815
				1816	    If the opcode has an argument embedded in the pickle, arg is its decoded
				1817	    value, as a Python object. If the opcode doesn't have an argument, arg
				1818	    is None.
				1819
				1820	    If the pickle has a tell() method, pos was the value of pickle.tell()
				1821	    before reading the current opcode. If the pickle is a string object,
				1822	    it's wrapped in a StringIO object, and the latter's tell() result is
				1823	    used. Else (the pickle doesn't have a tell(), and it's not obvious how
				1824	    to query its current position) pos is None.
				1825	    """
				1826
				1827	    import cStringIO as StringIO
				1828
				1829	    if isinstance(pickle, str):
				1830	        pickle = StringIO.StringIO(pickle)
				1831
				1832	    if hasattr(pickle, "tell"):
				1833	        getpos = pickle.tell
				1834	    else:
				1835	        getpos = lambda: None
				1836
				1837	    while True:
				1838	        pos = getpos()
				1839	        code = pickle.read(1)
				1840	        opcode = code2op.get(code)
				1841	        if opcode is None:
				1842	            if code == "":
				1843	                raise ValueError("pickle exhausted before seeing STOP")
				1844	            else:
				1845	                raise ValueError("at position %s, opcode %r unknown" % (
				1846	                                 pos is None and "<unknown>" or pos,
				1847	                                 code))
				1848	        if opcode.arg is None:
				1849	            arg = None
				1850	        else:
				1851	            arg = opcode.arg.reader(pickle)
				1852	        yield opcode, arg, pos
				1853	        if code == '.':
				1854	            assert opcode.name == 'STOP'
				1855	            break
				1856
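The pos rules in the genops docstring can be checked directly. This is a minimal sketch, assuming Python 3's `pickletools.genops`, which keeps the same (opcode, arg, pos) contract but accepts bytes rather than str. The pickle is a hand-written protocol 0 pickle of `[1, 2]`. `NoTell` is a hypothetical wrapper that hides tell(), so every pos comes back as None.

```python
import io
import pickletools

# Protocol 0 pickle of [1, 2], written by hand: MARK LIST PUT INT APPEND INT APPEND STOP.
PICKLE = b"(lp0\nI1\naI2\na."


def triples(source):
    """List (opcode name, decoded arg, pos) for each opcode genops() yields."""
    return [(op.name, arg, pos) for op, arg, pos in pickletools.genops(source)]


class NoTell:
    """Hypothetical file-like wrapper with read()/readline() but no tell()."""

    def __init__(self, data):
        self._f = io.BytesIO(data)

    def read(self, n=-1):
        return self._f.read(n)

    def readline(self):
        return self._f.readline()


with_pos = triples(PICKLE)             # bytes get wrapped, so pos comes from tell()
without_pos = triples(NoTell(PICKLE))  # no tell(), so every pos is None
for row in with_pos:
    print(row)
```

The offsets 0, 1, 2, 5, 8, 9 and 12 match the start of the first protocol 0 disassembly in the tests below, which begins with the same byte layout.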
				1857	##############################################################################
				1858	# A symbolic pickle disassembler.
				1859
				1860	def dis(pickle, out=None, indentlevel=4):
				1861	    """Produce a symbolic disassembly of a pickle.
				1862
				1863	    'pickle' is a file-like object, or string, containing at least one
				1864	    pickle. The pickle is disassembled from the current position, through
				1865	    the first STOP opcode encountered.
				1866
				1867	    Optional arg 'out' is a file-like object to which the disassembly is
				1868	    printed. It defaults to sys.stdout.
				1869
				1870	    Optional arg 'indentlevel' is the number of blanks by which to indent
				1871	    a new MARK level. It defaults to 4.
				1872	    """
				1873
				1874	    markstack = []
				1875	    indentchunk = ' ' * indentlevel
				1876	    for opcode, arg, pos in genops(pickle):
				1877	        if pos is not None:
				1878	            print >> out, "%5d:" % pos,
				1879
				1880	        line = "%s %s%s" % (opcode.code,
				1881	                            indentchunk * len(markstack),
				1882	                            opcode.name)
				1883
				1884	        markmsg = None
				1885	        if markstack and markobject in opcode.stack_before:
				1886	            assert markobject not in opcode.stack_after
				1887	            markpos = markstack.pop()
				1888	            if markpos is not None:
				1889	                markmsg = "(MARK at %d)" % markpos
				1890
				1891	        if arg is not None or markmsg:
				1892	            # make a mild effort to align arguments
				1893	            line += ' ' * (10 - len(opcode.name))
				1894	            if arg is not None:
				1895	                line += ' ' + repr(arg)
				1896	            if markmsg:
				1897	                line += ' ' + markmsg
				1898	        print >> out, line
				1899
				1900	        if markobject in opcode.stack_after:
				1901	            assert markobject not in opcode.stack_before
				1902	            markstack.append(pos)
				1903
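dis() pairs each mark-consuming opcode with a MARK using only the stack_before/stack_after signatures. It never emulates the stack. The Python 3 sketch below isolates that bookkeeping. `mark_pairs` is a hypothetical helper, while `pickletools.markobject` and `OpcodeInfo.stack_before`/`stack_after` are real pickletools names. The sketch reproduces the protocol 0 tuple limitation noted in the tests: the outer MARK stays unmatched because POP is not recognized as consuming it. The input bytes are reconstructed from the protocol 0 disassembly of T shown in the tests.

```python
import pickletools

# Protocol 0 pickle of T (where L = [T] and T = (L,)), rebuilt from its disassembly.
RECURSIVE_TUPLE = b"((lp0\n(g0\ntp1\na00g1\n."


def mark_pairs(data):
    """Return ([(opcode name, pos, MARK pos), ...], unmatched MARK positions)."""
    markstack, pairs = [], []
    for op, arg, pos in pickletools.genops(data):
        # Same test dis() uses: an opcode that takes a mark pops the newest MARK.
        if markstack and pickletools.markobject in op.stack_before:
            pairs.append((op.name, pos, markstack.pop()))
        # An opcode that leaves a mark on the stack (in practice, MARK) opens a level.
        if pickletools.markobject in op.stack_after:
            markstack.append(pos)
    return pairs, markstack


pairs, unmatched = mark_pairs(RECURSIVE_TUPLE)
print(pairs)      # LIST closes the MARK at 1, TUPLE closes the MARK at 6
print(unmatched)  # the outer MARK at 0 is never paired: POP isn't treated as a mark pop
```

Handling POP correctly would require tracking whether the stack top is a mark, which is the stack emulation the test notes say would be needed.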
				1904
				1905	_dis_test = """
				1906	>>> import pickle
				1907	>>> x = [1, 2, (3, 4), {'abc': u"def"}]
Guido van Rossum	f29d3d6	2003-01-27 22:47:53 +0000		1908	>>> pik = pickle.dumps(x, 0)
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000		1909	>>> dis(pik)
				1910	    0: ( MARK
				1911	    1: l     LIST       (MARK at 0)
				1912	    2: p PUT        0
				1913	    5: I INT        1
				1914	    8: a APPEND
				1915	    9: I INT        2
				1916	   12: a APPEND
				1917	   13: ( MARK
				1918	   14: I     INT        3
				1919	   17: I     INT        4
				1920	   20: t     TUPLE      (MARK at 13)
				1921	   21: p PUT        1
				1922	   24: a APPEND
				1923	   25: ( MARK
				1924	   26: d     DICT       (MARK at 25)
				1925	   27: p PUT        2
				1926	   30: S STRING     'abc'
				1927	   37: p PUT        3
				1928	   40: V UNICODE    u'def'
				1929	   45: p PUT        4
				1930	   48: s SETITEM
				1931	   49: a APPEND
				1932	   50: . STOP
				1933
				1934	Try again with a "binary" pickle.
				1935
				1936	>>> pik = pickle.dumps(x, 1)
				1937	>>> dis(pik)
				1938	    0: ] EMPTY_LIST
				1939	    1: q BINPUT     0
				1940	    3: ( MARK
				1941	    4: K     BININT1    1
				1942	    6: K     BININT1    2
				1943	    8: (     MARK
				1944	    9: K         BININT1    3
				1945	   11: K         BININT1    4
				1946	   13: t         TUPLE      (MARK at 8)
				1947	   14: q     BINPUT     1
				1948	   16: }     EMPTY_DICT
				1949	   17: q     BINPUT     2
				1950	   19: U     SHORT_BINSTRING 'abc'
				1951	   24: q     BINPUT     3
				1952	   26: X     BINUNICODE u'def'
				1953	   34: q     BINPUT     4
				1954	   36: s     SETITEM
				1955	   37: e     APPENDS    (MARK at 3)
				1956	   38: . STOP
				1957
				1958	Exercise the INST/OBJ/BUILD family.
				1959
				1960	>>> import random
Guido van Rossum	f29d3d6	2003-01-27 22:47:53 +0000		1961	>>> dis(pickle.dumps(random.random, 0))
Tim Peters	d916cf4	2003-01-27 19:01:47 +0000		1962	    0: c GLOBAL     'random random'
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000		1963	   15: p PUT        0
				1964	   18: . STOP
				1965
				1966	>>> x = [pickle.PicklingError()] * 2
Guido van Rossum	f29d3d6	2003-01-27 22:47:53 +0000		1967	>>> dis(pickle.dumps(x, 0))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000		1968	    0: ( MARK
				1969	    1: l     LIST       (MARK at 0)
				1970	    2: p PUT        0
				1971	    5: ( MARK
Tim Peters	d916cf4	2003-01-27 19:01:47 +0000		1972	    6: i     INST       'pickle PicklingError' (MARK at 5)
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000		1973	   28: p PUT        1
				1974	   31: ( MARK
				1975	   32: d     DICT       (MARK at 31)
				1976	   33: p PUT        2
				1977	   36: S STRING     'args'
				1978	   44: p PUT        3
				1979	   47: ( MARK
				1980	   48: t     TUPLE      (MARK at 47)
				1981	   49: p PUT        4
				1982	   52: s SETITEM
				1983	   53: b BUILD
				1984	   54: a APPEND
				1985	   55: g GET        1
				1986	   58: a APPEND
				1987	   59: . STOP
				1988
				1989	>>> dis(pickle.dumps(x, 1))
				1990	    0: ] EMPTY_LIST
				1991	    1: q BINPUT     0
				1992	    3: ( MARK
				1993	    4: (     MARK
Tim Peters	d916cf4	2003-01-27 19:01:47 +0000		1994	    5: c         GLOBAL     'pickle PicklingError'
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000		1995	   27: q         BINPUT     1
				1996	   29: o         OBJ        (MARK at 4)
				1997	   30: q     BINPUT     2
				1998	   32: }     EMPTY_DICT
				1999	   33: q     BINPUT     3
				2000	   35: U     SHORT_BINSTRING 'args'
				2001	   41: q     BINPUT     4
				2002	   43: )     EMPTY_TUPLE
				2003	   44: s     SETITEM
				2004	   45: b     BUILD
				2005	   46: h     BINGET     2
				2006	   48: e     APPENDS    (MARK at 3)
				2007	   49: . STOP
				2008
				2009	Try "the canonical" recursive-object test.
				2010
				2011	>>> L = []
				2012	>>> T = L,
				2013	>>> L.append(T)
				2014	>>> L[0] is T
				2015	True
				2016	>>> T[0] is L
				2017	True
				2018	>>> L[0][0] is L
				2019	True
				2020	>>> T[0][0] is T
				2021	True
Guido van Rossum	f29d3d6	2003-01-27 22:47:53 +0000		2022	>>> dis(pickle.dumps(L, 0))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000		2023	    0: ( MARK
				2024	    1: l     LIST       (MARK at 0)
				2025	    2: p PUT        0
				2026	    5: ( MARK
				2027	    6: g     GET        0
				2028	    9: t     TUPLE      (MARK at 5)
				2029	   10: p PUT        1
				2030	   13: a APPEND
				2031	   14: . STOP
				2032	>>> dis(pickle.dumps(L, 1))
				2033	    0: ] EMPTY_LIST
				2034	    1: q BINPUT     0
				2035	    3: ( MARK
				2036	    4: h     BINGET     0
				2037	    6: t     TUPLE      (MARK at 3)
				2038	    7: q BINPUT     1
				2039	    9: a APPEND
				2040	   10: . STOP
				2041
				2042	The protocol 0 pickle of the tuple causes the disassembly to get confused,
				2043	as it doesn't realize that the POP opcode at 16 gets rid of the MARK at 0
				2044	(so the output remains indented until the end). The protocol 1 pickle
				2045	doesn't trigger this glitch, because the disassembler realizes that
				2046	POP_MARK gets rid of the MARK. Doing a better job on the protocol 0
				2047	pickle would require the disassembler to emulate the stack.
				2048
Guido van Rossum	f29d3d6	2003-01-27 22:47:53 +0000		2049	>>> dis(pickle.dumps(T, 0))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000		2050	    0: ( MARK
				2051	    1: (     MARK
				2052	    2: l         LIST       (MARK at 1)
				2053	    3: p     PUT        0
				2054	    6: (     MARK
				2055	    7: g         GET        0
				2056	   10: t         TUPLE      (MARK at 6)
				2057	   11: p     PUT        1
				2058	   14: a     APPEND
				2059	   15: 0     POP
				2060	   16: 0     POP
				2061	   17: g     GET        1
				2062	   20: .     STOP
				2063	>>> dis(pickle.dumps(T, 1))
				2064	    0: ( MARK
				2065	    1: ]     EMPTY_LIST
				2066	    2: q     BINPUT     0
				2067	    4: (     MARK
				2068	    5: h         BINGET     0
				2069	    7: t         TUPLE      (MARK at 4)
				2070	    8: q     BINPUT     1
				2071	   10: a     APPEND
				2072	   11: 1     POP_MARK   (MARK at 0)
				2073	   12: h BINGET     1
				2074	   14: . STOP
				2075	"""
				2076
				2077	__test__ = {'disassembler_test': _dis_test,
				2078	}
				2079
				2080	def _test():
				2081	    import doctest
				2082	    return doctest.testmod()
				2083
				2084	if __name__ == "__main__":
				2085	    _test()