Blame - Lib/pickletools.py - platform/external/python/cpython3

blob: 59e0a1520245c34ab4713ddb4aa2e2c4ad90b246

Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1	""""Executable documentation" for the pickle module.
				2
				3	Extensive comments about the pickle protocols and pickle-machine opcodes
				4	can be found here. Some functions meant for external use:
				5
				6	genops(pickle)
				7	Generate all the opcodes in a pickle, as (opcode, arg, position) triples.
				8
				9	dis(pickle, out=None, indentlevel=4)
				10	Print a symbolic disassembly of a pickle.
				11	"""
				12
				13	# Other ideas:
				14	#
				15	# - A pickle verifier: read a pickle and check it exhaustively for
				16	# well-formedness.
				17	#
				18	# - A protocol identifier: examine a pickle and return its protocol number
				19	# (== the highest .proto attr value among all the opcodes in the pickle).
				20	#
				21	# - A pickle optimizer: for example, tuple-building code is sometimes more
				22	# elaborate than necessary, catering for the possibility that the tuple
				23	# is recursive. Or lots of times a PUT is generated that's never accessed
				24	# by a later GET.
				25
				26
				27	"""
				28	"A pickle" is a program for a virtual pickle machine (PM, but more accurately
				29	called an unpickling machine). It's a sequence of opcodes, interpreted by the
				30	PM, building an arbitrarily complex Python object.
				31
				32	For the most part, the PM is very simple: there are no looping, testing, or
				33	conditional instructions, no arithmetic and no function calls. Opcodes are
				34	executed once each, from first to last, until a STOP opcode is reached.
				35
				36	The PM has two data areas, "the stack" and "the memo".
				37
				38	Many opcodes push Python objects onto the stack; e.g., INT pushes a Python
				39	integer object on the stack, whose value is gotten from a decimal string
				40	literal immediately following the INT opcode in the pickle bytestream. Other
				41	opcodes take Python objects off the stack. The result of unpickling is
				42	whatever object is left on the stack when the final STOP opcode is executed.
				43
				44	The memo is simply an array of objects, or it can be implemented as a dict
				45	mapping little integers to objects. The memo serves as the PM's "long term
				46	memory", and the little integers indexing the memo are akin to variable
				47	names. Some opcodes pop a stack object into the memo at a given index,
				48	and others push a memo object at a given index onto the stack again.
				49
				50	At heart, that's all the PM has. Subtleties arise for these reasons:
				51
				52	+ Object identity. Objects can be arbitrarily complex, and subobjects
				53	may be shared (for example, the list [a, a] refers to the same object a
				54	twice). It can be vital that unpickling recreate an isomorphic object
				55	graph, faithfully reproducing sharing.
				56
				57	+ Recursive objects. For example, after "L = []; L.append(L)", L is a
				58	list, and L[0] is the same list. This is related to the object identity
				59	point, and some sequences of pickle opcodes are subtle in order to
				60	get the right result in all cases.
				61
				62	+ Things pickle doesn't know everything about. Examples of things pickle
				63	does know everything about are Python's builtin scalar and container
				64	types, like ints and tuples. They generally have opcodes dedicated to
				65	them. For things like module references and instances of user-defined
				66	classes, pickle's knowledge is limited. Historically, many enhancements
				67	have been made to the pickle protocol in order to do a better (faster,
				68	and/or more compact) job on those.
				69
				70	+ Backward compatibility and micro-optimization. As explained below,
				71	pickle opcodes never go away, not even when better ways to do a thing
				72	get invented. The repertoire of the PM just keeps growing over time.
Tim Peters	1996e23	2003-01-27 19:38:34 +0000	[diff] [blame]	73	So, e.g., there are now five distinct opcodes for building a Python integer,
				74	four of them devoted to "short" integers. Even so, the only way to pickle
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	75	a Python long int takes time quadratic in the number of digits, for both
				76	pickling and unpickling. This isn't so much a subtlety as a source of
				77	wearying complication.
				78
				79
				80	Pickle protocols:
				81
				82	For compatibility, the meaning of a pickle opcode never changes. Instead new
				83	pickle opcodes get added, and each version's unpickler can handle all the
				84	pickle opcodes in all protocol versions to date. So old pickles continue to
				85	be readable forever. The pickler can generally be told to restrict itself to
				86	the subset of opcodes available under previous protocol versions too, so that
				87	users can create pickles under the current version readable by older
				88	versions. However, a pickle does not contain its version number embedded
				89	within it. If an older unpickler tries to read a pickle using a later
				90	protocol, the result is most likely an exception due to seeing an unknown (in
				91	the older unpickler) opcode.
				92
				93	The original pickle used what's now called "protocol 0", and what was called
				94	"text mode" before Python 2.3. The entire pickle bytestream is made up of
				95	printable 7-bit ASCII characters, plus the newline character, in protocol 0.
				96	That's why it was called text mode.
				97
				98	The second major set of additions is now called "protocol 1", and was called
				99	"binary mode" before Python 2.3. This added many opcodes with arguments
				100	consisting of arbitrary bytes, including NUL bytes and unprintable "high bit"
				101	bytes. Binary mode pickles can be substantially smaller than equivalent
				102	text mode pickles, and sometimes faster too; e.g., BININT represents a 4-byte
				103	int as 4 bytes following the opcode, which is cheaper to unpickle than the
				104	(perhaps) 11-character decimal string attached to INT.
				105
				106	The third major set of additions came in Python 2.3, and is called "protocol
				107	2". XXX Write a short blurb when Guido figures out what they are <wink>. XXX
				108	"""
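
The stack-and-memo model described in the docstring above can be observed with the standard library itself. This is a minimal sketch, assuming Python 3 (the listing here is the Python 2 era source) and the `pickle` and `pickletools` modules as shipped today; the variable names are illustrative only.

```python
import pickle
import pickletools

# A list that refers to the same inner list twice: [a, a].
a = []
shared = [a, a]

# Protocol 0 is the original "text mode" protocol.
data = pickle.dumps(shared, protocol=0)

# genops yields (opcode, arg, position) triples; collect the opcode names.
names = [op.name for op, arg, pos in pickletools.genops(data)]

# The second reference to `a` is fetched back from the memo (a GET opcode)
# instead of being rebuilt, so sharing survives the round trip.
restored = pickle.loads(data)
print(names)
print(restored[0] is restored[1])
```

The opcode sequence always ends with STOP, and the object left on the stack at that point is what `pickle.loads` returns.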
				109
				110	# Meta-rule: Descriptions are stored in instances of descriptor objects,
				111	# with plain constructors. No meta-language is defined from which
				112	# descriptors could be constructed. If you want, e.g., XML, write a little
				113	# program to generate XML from the objects.
				114
				115	##############################################################################
				116	# Some pickle opcodes have an argument, following the opcode in the
				117	# bytestream. An argument is of a specific type, described by an instance
				118	# of ArgumentDescriptor. These are not to be confused with arguments taken
				119	# off the stack -- ArgumentDescriptor applies only to arguments embedded in
				120	# the opcode stream, immediately following an opcode.
				121
				122	# Represents the number of bytes consumed by an argument delimited by the
				123	# next newline character.
				124	UP_TO_NEWLINE = -1
				125
				126	# Represents the number of bytes consumed by a two-argument opcode where
				127	# the first argument gives the number of bytes in the second argument.
				128	TAKEN_FROM_ARGUMENT = -2
				129
				130	class ArgumentDescriptor(object):
				131	__slots__ = (
				132	# name of descriptor record, also a module global name; a string
				133	'name',
				134
				135	# length of argument, in bytes; an int; UP_TO_NEWLINE and
				136	# TAKEN_FROM_ARGUMENT are negative values for variable-length cases
				137	'n',
				138
				139	# a function taking a file-like object, reading this kind of argument
				140	# from the object at the current position, advancing the current
				141	# position by n bytes, and returning the value of the argument
				142	'reader',
				143
				144	# human-readable docs for this arg descriptor; a string
				145	'doc',
				146	)
				147
				148	def __init__(self, name, n, reader, doc):
				149	assert isinstance(name, str)
				150	self.name = name
				151
				152	assert isinstance(n, int) and (n >= 0 or
				153	n == UP_TO_NEWLINE or
				154	n == TAKEN_FROM_ARGUMENT)
				155	self.n = n
				156
				157	self.reader = reader
				158
				159	assert isinstance(doc, str)
				160	self.doc = doc
				161
				162	from struct import unpack as _unpack
				163
				164	def read_uint1(f):
				165	"""
				166	>>> import StringIO
				167	>>> read_uint1(StringIO.StringIO('\\xff'))
				168	255
				169	"""
				170
				171	data = f.read(1)
				172	if data:
				173	return ord(data)
				174	raise ValueError("not enough data in stream to read uint1")
				175
				176	uint1 = ArgumentDescriptor(
				177	name='uint1',
				178	n=1,
				179	reader=read_uint1,
				180	doc="One-byte unsigned integer.")
				181
				182
				183	def read_uint2(f):
				184	"""
				185	>>> import StringIO
				186	>>> read_uint2(StringIO.StringIO('\\xff\\x00'))
				187	255
				188	>>> read_uint2(StringIO.StringIO('\\xff\\xff'))
				189	65535
				190	"""
				191
				192	data = f.read(2)
				193	if len(data) == 2:
				194	return _unpack("<H", data)[0]
				195	raise ValueError("not enough data in stream to read uint2")
				196
				197	uint2 = ArgumentDescriptor(
				198	name='uint2',
				199	n=2,
				200	reader=read_uint2,
				201	doc="Two-byte unsigned integer, little-endian.")
				202
				203
				204	def read_int4(f):
				205	"""
				206	>>> import StringIO
				207	>>> read_int4(StringIO.StringIO('\\xff\\x00\\x00\\x00'))
				208	255
				209	>>> read_int4(StringIO.StringIO('\\x00\\x00\\x00\\x80')) == -(2**31)
				210	True
				211	"""
				212
				213	data = f.read(4)
				214	if len(data) == 4:
				215	return _unpack("<i", data)[0]
				216	raise ValueError("not enough data in stream to read int4")
				217
				218	int4 = ArgumentDescriptor(
				219	name='int4',
				220	n=4,
				221	reader=read_int4,
				222	doc="Four-byte signed integer, little-endian, 2's complement.")
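
The fixed-width readers above follow one pattern: read exactly n bytes, then decode them with a little-endian `struct` format. The sketch below restates `read_uint2` and `read_int4` for Python 3, where binary streams are `io.BytesIO` objects holding `bytes`; that port is an assumption, since the listing itself uses Python 2's `StringIO`.

```python
import io
from struct import unpack

def read_uint2(f):
    """Read a two-byte unsigned little-endian integer from a binary stream."""
    data = f.read(2)
    if len(data) == 2:
        return unpack("<H", data)[0]
    raise ValueError("not enough data in stream to read uint2")

def read_int4(f):
    """Read a four-byte signed little-endian integer from a binary stream."""
    data = f.read(4)
    if len(data) == 4:
        return unpack("<i", data)[0]
    raise ValueError("not enough data in stream to read int4")

# The same inputs as the doctests in the listing, as bytes literals.
small = read_uint2(io.BytesIO(b"\xff\x00"))
largest = read_uint2(io.BytesIO(b"\xff\xff"))
most_negative = read_int4(io.BytesIO(b"\x00\x00\x00\x80"))
```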
				223
				224
				225	def read_stringnl(f, decode=True, stripquotes=True):
				226	"""
				227	>>> import StringIO
				228	>>> read_stringnl(StringIO.StringIO("'abcd'\\nefg\\n"))
				229	'abcd'
				230
				231	>>> read_stringnl(StringIO.StringIO("\\n"))
				232	Traceback (most recent call last):
				233	...
				234	ValueError: no string quotes around ''
				235
				236	>>> read_stringnl(StringIO.StringIO("\\n"), stripquotes=False)
				237	''
				238
				239	>>> read_stringnl(StringIO.StringIO("''\\n"))
				240	''
				241
				242	>>> read_stringnl(StringIO.StringIO('"abcd"'))
				243	Traceback (most recent call last):
				244	...
				245	ValueError: no newline found when trying to read stringnl
				246
				247	Embedded escapes are undone in the result.
				248	>>> read_stringnl(StringIO.StringIO("'a\\\\nb\\x00c\\td'\\n'e'"))
				249	'a\\nb\\x00c\\td'
				250	"""
				251
				252	data = f.readline()
				253	if not data.endswith('\n'):
				254	raise ValueError("no newline found when trying to read stringnl")
				255	data = data[:-1] # lose the newline
				256
				257	if stripquotes:
				258	for q in "'\"":
				259	if data.startswith(q):
				260	if not data.endswith(q):
				261	raise ValueError("string quote %r not found at both "
				262	"ends of %r" % (q, data))
				263	data = data[1:-1]
				264	break
				265	else:
				266	raise ValueError("no string quotes around %r" % data)
				267
				268	# I'm not sure when 'string_escape' was added to the std codecs; it's
				269	# crazy not to use it if it's there.
				270	if decode:
				271	data = data.decode('string_escape')
				272	return data
				273
				274	stringnl = ArgumentDescriptor(
				275	name='stringnl',
				276	n=UP_TO_NEWLINE,
				277	reader=read_stringnl,
				278	doc="""A newline-terminated string.
				279
				280	This is a repr-style string, with embedded escapes, and
				281	bracketing quotes.
				282	""")
				283
				284	def read_stringnl_noescape(f):
				285	return read_stringnl(f, decode=False, stripquotes=False)
				286
				287	stringnl_noescape = ArgumentDescriptor(
				288	name='stringnl_noescape',
				289	n=UP_TO_NEWLINE,
				290	reader=read_stringnl_noescape,
				291	doc="""A newline-terminated string.
				292
				293	This is a str-style string, without embedded escapes,
				294	or bracketing quotes. It should consist solely of
				295	printable ASCII characters.
				296	""")
				297
				298	def read_stringnl_noescape_pair(f):
				299	"""
				300	>>> import StringIO
				301	>>> read_stringnl_noescape_pair(StringIO.StringIO("Queue\\nEmpty\\njunk"))
Tim Peters	d916cf4	2003-01-27 19:01:47 +0000	[diff] [blame]	302	'Queue Empty'
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	303	"""
				304
Tim Peters	d916cf4	2003-01-27 19:01:47 +0000	[diff] [blame]	305	return "%s %s" % (read_stringnl_noescape(f), read_stringnl_noescape(f))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	306
				307	stringnl_noescape_pair = ArgumentDescriptor(
				308	name='stringnl_noescape_pair',
				309	n=UP_TO_NEWLINE,
				310	reader=read_stringnl_noescape_pair,
				311	doc="""A pair of newline-terminated strings.
				312
				313	These are str-style strings, without embedded
				314	escapes, or bracketing quotes. They should
				315	consist solely of printable ASCII characters.
				316	The pair is returned as a single string, with
Tim Peters	d916cf4	2003-01-27 19:01:47 +0000	[diff] [blame]	317	a single blank separating the two strings.
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	318	""")
				319
				320	def read_string4(f):
				321	"""
				322	>>> import StringIO
				323	>>> read_string4(StringIO.StringIO("\\x00\\x00\\x00\\x00abc"))
				324	''
				325	>>> read_string4(StringIO.StringIO("\\x03\\x00\\x00\\x00abcdef"))
				326	'abc'
				327	>>> read_string4(StringIO.StringIO("\\x00\\x00\\x00\\x03abcdef"))
				328	Traceback (most recent call last):
				329	...
				330	ValueError: expected 50331648 bytes in a string4, but only 6 remain
				331	"""
				332
				333	n = read_int4(f)
				334	if n < 0:
				335	raise ValueError("string4 byte count < 0: %d" % n)
				336	data = f.read(n)
				337	if len(data) == n:
				338	return data
				339	raise ValueError("expected %d bytes in a string4, but only %d remain" %
				340	(n, len(data)))
				341
				342	string4 = ArgumentDescriptor(
				343	name="string4",
				344	n=TAKEN_FROM_ARGUMENT,
				345	reader=read_string4,
				346	doc="""A counted string.
				347
				348	The first argument is a 4-byte little-endian signed int giving
				349	the number of bytes in the string, and the second argument is
				350	that many bytes.
				351	""")
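
A counted string is a length prefix followed by that many payload bytes, which is what TAKEN_FROM_ARGUMENT signals. The following sketch ports `read_string4` to Python 3 under the assumption that the stream is an `io.BytesIO`; the length decoding is inlined with `struct.unpack` so the block stands alone.

```python
import io
from struct import unpack

def read_string4(f):
    """Read a 4-byte little-endian signed length, then that many bytes."""
    header = f.read(4)
    if len(header) != 4:
        raise ValueError("not enough data in stream to read int4")
    n = unpack("<i", header)[0]
    if n < 0:
        raise ValueError("string4 byte count < 0: %d" % n)
    data = f.read(n)
    if len(data) == n:
        return data
    raise ValueError("expected %d bytes in a string4, but only %d remain" %
                     (n, len(data)))

payload = read_string4(io.BytesIO(b"\x03\x00\x00\x00abcdef"))

# A big-endian length prefix is misread as a huge little-endian count.
try:
    read_string4(io.BytesIO(b"\x00\x00\x00\x03abcdef"))
    error_text = None
except ValueError as exc:
    error_text = str(exc)
```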
				352
				353
				354	def read_string1(f):
				355	"""
				356	>>> import StringIO
				357	>>> read_string1(StringIO.StringIO("\\x00"))
				358	''
				359	>>> read_string1(StringIO.StringIO("\\x03abcdef"))
				360	'abc'
				361	"""
				362
				363	n = read_uint1(f)
				364	assert n >= 0
				365	data = f.read(n)
				366	if len(data) == n:
				367	return data
				368	raise ValueError("expected %d bytes in a string1, but only %d remain" %
				369	(n, len(data)))
				370
				371	string1 = ArgumentDescriptor(
				372	name="string1",
				373	n=TAKEN_FROM_ARGUMENT,
				374	reader=read_string1,
				375	doc="""A counted string.
				376
				377	The first argument is a 1-byte unsigned int giving the number
				378	of bytes in the string, and the second argument is that many
				379	bytes.
				380	""")
				381
				382
				383	def read_unicodestringnl(f):
				384	"""
				385	>>> import StringIO
				386	>>> read_unicodestringnl(StringIO.StringIO("abc\\uabcd\\njunk"))
				387	u'abc\\uabcd'
				388	"""
				389
				390	data = f.readline()
				391	if not data.endswith('\n'):
				392	raise ValueError("no newline found when trying to read "
				393	"unicodestringnl")
				394	data = data[:-1] # lose the newline
				395	return unicode(data, 'raw-unicode-escape')
				396
				397	unicodestringnl = ArgumentDescriptor(
				398	name='unicodestringnl',
				399	n=UP_TO_NEWLINE,
				400	reader=read_unicodestringnl,
				401	doc="""A newline-terminated Unicode string.
				402
				403	This is raw-unicode-escape encoded, so consists of
				404	printable ASCII characters, and may contain embedded
				405	escape sequences.
				406	""")
				407
				408	def read_unicodestring4(f):
				409	"""
				410	>>> import StringIO
				411	>>> s = u'abcd\\uabcd'
				412	>>> enc = s.encode('utf-8')
				413	>>> enc
				414	'abcd\\xea\\xaf\\x8d'
				415	>>> n = chr(len(enc)) + chr(0) * 3 # little-endian 4-byte length
				416	>>> t = read_unicodestring4(StringIO.StringIO(n + enc + 'junk'))
				417	>>> s == t
				418	True
				419
				420	>>> read_unicodestring4(StringIO.StringIO(n + enc[:-1]))
				421	Traceback (most recent call last):
				422	...
				423	ValueError: expected 7 bytes in a unicodestring4, but only 6 remain
				424	"""
				425
				426	n = read_int4(f)
				427	if n < 0:
				428	raise ValueError("unicodestring4 byte count < 0: %d" % n)
				429	data = f.read(n)
				430	if len(data) == n:
				431	return unicode(data, 'utf-8')
				432	raise ValueError("expected %d bytes in a unicodestring4, but only %d "
				433	"remain" % (n, len(data)))
				434
				435	unicodestring4 = ArgumentDescriptor(
				436	name="unicodestring4",
				437	n=TAKEN_FROM_ARGUMENT,
				438	reader=read_unicodestring4,
				439	doc="""A counted Unicode string.
				440
				441	The first argument is a 4-byte little-endian signed int
				442	giving the number of bytes in the string, and the second
				443	argument-- the UTF-8 encoding of the Unicode string --
				444	contains that many bytes.
				445	""")
				446
				447
				448	def read_decimalnl_short(f):
				449	"""
				450	>>> import StringIO
				451	>>> read_decimalnl_short(StringIO.StringIO("1234\\n56"))
				452	1234
				453
				454	>>> read_decimalnl_short(StringIO.StringIO("1234L\\n56"))
				455	Traceback (most recent call last):
				456	...
				457	ValueError: trailing 'L' not allowed in '1234L'
				458	"""
				459
				460	s = read_stringnl(f, decode=False, stripquotes=False)
				461	if s.endswith("L"):
				462	raise ValueError("trailing 'L' not allowed in %r" % s)
				463
				464	# It's not necessarily true that the result fits in a Python short int:
				465	# the pickle may have been written on a 64-bit box. There's also a hack
				466	# for True and False here.
				467	if s == "00":
				468	return False
				469	elif s == "01":
				470	return True
				471
				472	try:
				473	return int(s)
				474	except OverflowError:
				475	return long(s)
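
The True/False hack in `read_decimalnl_short` mirrors what the unpickler does with the INT opcode. As a small check, assuming a Python 3 interpreter and its standard `pickle` module, hand-written protocol 0 pickles show the difference between the two-character forms and a genuine integer:

```python
import pickle

# "I" is the INT opcode, "." is STOP; the argument is the text in between.
true_value = pickle.loads(b"I01\n.")
false_value = pickle.loads(b"I00\n.")

# A genuine integer never has a leading zero and stays an int.
one = pickle.loads(b"I1\n.")
```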
				476
				477	def read_decimalnl_long(f):
				478	"""
				479	>>> import StringIO
				480
				481	>>> read_decimalnl_long(StringIO.StringIO("1234\\n56"))
				482	Traceback (most recent call last):
				483	...
				484	ValueError: trailing 'L' required in '1234'
				485
				486	Someday the trailing 'L' will probably go away from this output.
				487
				488	>>> read_decimalnl_long(StringIO.StringIO("1234L\\n56"))
				489	1234L
				490
				491	>>> read_decimalnl_long(StringIO.StringIO("123456789012345678901234L\\n6"))
				492	123456789012345678901234L
				493	"""
				494
				495	s = read_stringnl(f, decode=False, stripquotes=False)
				496	if not s.endswith("L"):
				497	raise ValueError("trailing 'L' required in %r" % s)
				498	return long(s)
				499
				500
				501	decimalnl_short = ArgumentDescriptor(
				502	name='decimalnl_short',
				503	n=UP_TO_NEWLINE,
				504	reader=read_decimalnl_short,
				505	doc="""A newline-terminated decimal integer literal.
				506
				507	This never has a trailing 'L', and the integer fit
				508	in a short Python int on the box where the pickle
				509	was written -- but there's no guarantee it will fit
				510	in a short Python int on the box where the pickle
				511	is read.
				512	""")
				513
				514	decimalnl_long = ArgumentDescriptor(
				515	name='decimalnl_long',
				516	n=UP_TO_NEWLINE,
				517	reader=read_decimalnl_long,
				518	doc="""A newline-terminated decimal integer literal.
				519
				520	This has a trailing 'L', and can represent integers
				521	of any size.
				522	""")
				523
				524
				525	def read_floatnl(f):
				526	"""
				527	>>> import StringIO
				528	>>> read_floatnl(StringIO.StringIO("-1.25\\n6"))
				529	-1.25
				530	"""
				531	s = read_stringnl(f, decode=False, stripquotes=False)
				532	return float(s)
				533
				534	floatnl = ArgumentDescriptor(
				535	name='floatnl',
				536	n=UP_TO_NEWLINE,
				537	reader=read_floatnl,
				538	doc="""A newline-terminated decimal floating literal.
				539
				540	In general this requires 17 significant digits for roundtrip
				541	identity, and pickling then unpickling infinities, NaNs, and
				542	minus zero doesn't work across boxes, or on some boxes even
				543	on itself (e.g., Windows can't read the strings it produces
				544	for infinities or NaNs).
				545	""")
				546
				547	def read_float8(f):
				548	"""
				549	>>> import StringIO, struct
				550	>>> raw = struct.pack(">d", -1.25)
				551	>>> raw
				552	'\\xbf\\xf4\\x00\\x00\\x00\\x00\\x00\\x00'
				553	>>> read_float8(StringIO.StringIO(raw + "\\n"))
				554	-1.25
				555	"""
				556
				557	data = f.read(8)
				558	if len(data) == 8:
				559	return _unpack(">d", data)[0]
				560	raise ValueError("not enough data in stream to read float8")
				561
				562
				563	float8 = ArgumentDescriptor(
				564	name='float8',
				565	n=8,
				566	reader=read_float8,
				567	doc="""An 8-byte binary representation of a float, big-endian.
				568
				569	The format is unique to Python, and shared with the struct
				570	module (format string '>d') "in theory" (the struct and cPickle
				571	implementations don't share the code -- they should). It's
				572	strongly related to the IEEE-754 double format, and, in normal
				573	cases, is in fact identical to the big-endian 754 double format.
				574	On other boxes the dynamic range is limited to that of a 754
				575	double, and "add a half and chop" rounding is used to reduce
				576	the precision to 53 bits. However, even on a 754 box,
				577	infinities, NaNs, and minus zero may not be handled correctly
				578	(may not survive roundtrip pickling intact).
				579	""")
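
For ordinary values the float8 argument is the big-endian IEEE-754 double produced by the `struct` format `'>d'`. This sketch, written for Python 3 with `io.BytesIO` (an assumption, as the listing targets Python 2), repeats the doctest from `read_float8`:

```python
import io
import struct

def read_float8(f):
    """Read an 8-byte big-endian binary float from a binary stream."""
    data = f.read(8)
    if len(data) == 8:
        return struct.unpack(">d", data)[0]
    raise ValueError("not enough data in stream to read float8")

raw = struct.pack(">d", -1.25)
value = read_float8(io.BytesIO(raw + b"\n"))
```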
				580
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame^]	581	# Protocol 2 formats
				582
				583	def decode_long(data):
				584	r"""Decode a long from a two's complement little-endian binary string.
				585	>>> decode_long("\xff\x00")
				586	255L
				587	>>> decode_long("\xff\x7f")
				588	32767L
				589	>>> decode_long("\x00\xff")
				590	-256L
				591	>>> decode_long("\x00\x80")
				592	-32768L
				593	>>>
				594	"""
				595	x = 0L
				596	i = 0L
				597	for c in data:
				598	x |= long(ord(c)) << i
				599	i += 8L
				600	if i and (x & (1L << (i-1L))):
				601	x -= 1L << i
				602	return x
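
The loop in `decode_long` assembles the bytes least-significant first and then subtracts 2**(8*n) when the top bit is set. Below is a sketch of the same logic for Python 3, where there is no separate long type and iterating over `bytes` yields integers; the comparison with `int.from_bytes` is added here as a cross-check and is not part of the listing.

```python
def decode_long(data):
    """Decode a two's-complement little-endian byte string into an int."""
    x = 0
    shift = 0
    for byte in data:  # each item of a bytes object is already an int
        x |= byte << shift
        shift += 8
    # A set top bit in the last byte means the value is negative.
    if shift and (x & (1 << (shift - 1))):
        x -= 1 << shift
    return x

samples = [b"", b"\xff\x00", b"\xff\x7f", b"\x00\xff", b"\x00\x80"]
decoded = [decode_long(s) for s in samples]
builtin = [int.from_bytes(s, "little", signed=True) for s in samples]
```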
				603
				604	def read_long1(f):
				605	r"""
				606	>>> import StringIO
				607	>>> read_long1(StringIO.StringIO("\x02\xff\x00"))
				608	255L
				609	>>> read_long1(StringIO.StringIO("\x02\xff\x7f"))
				610	32767L
				611	>>> read_long1(StringIO.StringIO("\x02\x00\xff"))
				612	-256L
				613	>>> read_long1(StringIO.StringIO("\x02\x00\x80"))
				614	-32768L
				615	>>>
				616	"""
				617
				618	n = read_uint1(f)
				619	data = f.read(n)
				620	if len(data) != n:
				621	raise ValueError("not enough data in stream to read long1")
				622	return decode_long(data)
				623
				624	long1 = ArgumentDescriptor(
				625	name="long1",
				626	n=TAKEN_FROM_ARGUMENT,
				627	reader=read_long1,
				628	doc="""A binary long, little-endian, using 1-byte size.
				629
				630	This first reads one byte as an unsigned size, then reads that
				631	many bytes and interprets them as a little-endian long.
				632	""")
				633
				634	def read_long2(f):
				635	r"""
				636	>>> import StringIO
				637	>>> read_long2(StringIO.StringIO("\x02\x00\xff\x00"))
				638	255L
				639	>>> read_long2(StringIO.StringIO("\x02\x00\xff\x7f"))
				640	32767L
				641	>>> read_long2(StringIO.StringIO("\x02\x00\x00\xff"))
				642	-256L
				643	>>> read_long2(StringIO.StringIO("\x02\x00\x00\x80"))
				644	-32768L
				645	>>>
				646	"""
				647
				648	n = read_uint2(f)
				649	data = f.read(n)
				650	if len(data) != n:
				651	raise ValueError("not enough data in stream to read long2")
				652	return decode_long(data)
				653
				654	long2 = ArgumentDescriptor(
				655	name="long2",
				656	n=TAKEN_FROM_ARGUMENT,
				657	reader=read_long2,
				658	doc="""A binary long, little-endian, using 2-byte size.
				659
				660	This first reads two bytes as an unsigned size, then reads that
				661	many bytes and interprets them as a little-endian long.
				662	""")
				663
				664	def read_long4(f):
				665	r"""
				666	>>> import StringIO
				667	>>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\xff\x00"))
				668	255L
				669	>>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\xff\x7f"))
				670	32767L
				671	>>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\x00\xff"))
				672	-256L
				673	>>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\x00\x80"))
				674	-32768L
				675	>>>
				676	"""
				677
				678	n = read_int4(f)
				679	if n < 0:
				680	raise ValueError("long4 byte count < 0: %d" % n)
				681	data = f.read(n)
				682	if len(data) != n:
				683	raise ValueError("not enough data in stream to read long4")
				684	return decode_long(data)
				685
				686	long4 = ArgumentDescriptor(
				687	name="long4",
				688	n=TAKEN_FROM_ARGUMENT,
				689	reader=read_long4,
				690	doc="""A binary representation of a long, little-endian.
				691
				692	This first reads four bytes as a signed size (but requires the
				693	size to be >= 0), then reads that many bytes and interprets them
				694	as a little-endian long.
				695	""")
				696
				697
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	698	##############################################################################
				699	# Object descriptors. The stack used by the pickle machine holds objects,
				700	# and in the stack_before and stack_after attributes of OpcodeInfo
				701	# descriptors we need names to describe the various types of objects that can
				702	# appear on the stack.
				703
				704	class StackObject(object):
				705	__slots__ = (
				706	# name of descriptor record, for info only
				707	'name',
				708
				709	# type of object, or tuple of type objects (meaning the object can
				710	# be of any type in the tuple)
				711	'obtype',
				712
				713	# human-readable docs for this kind of stack object; a string
				714	'doc',
				715	)
				716
				717	def __init__(self, name, obtype, doc):
				718	assert isinstance(name, str)
				719	self.name = name
				720
				721	assert isinstance(obtype, type) or isinstance(obtype, tuple)
				722	if isinstance(obtype, tuple):
				723	for contained in obtype:
				724	assert isinstance(contained, type)
				725	self.obtype = obtype
				726
				727	assert isinstance(doc, str)
				728	self.doc = doc
				729
				730
				731	pyint = StackObject(
				732	name='int',
				733	obtype=int,
				734	doc="A short (as opposed to long) Python integer object.")
				735
				736	pylong = StackObject(
				737	name='long',
				738	obtype=long,
				739	doc="A long (as opposed to short) Python integer object.")
				740
				741	pyinteger_or_bool = StackObject(
				742	name='int_or_bool',
				743	obtype=(int, long, bool),
				744	doc="A Python integer object (short or long), or "
				745	"a Python bool.")
				746
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame^]	747	pybool = StackObject(
				748	name='bool',
				749	obtype=(bool,),
				750	doc="A Python bool object.")
				751
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	752	pyfloat = StackObject(
				753	name='float',
				754	obtype=float,
				755	doc="A Python float object.")
				756
				757	pystring = StackObject(
				758	name='str',
				759	obtype=str,
				760	doc="A Python string object.")
				761
				762	pyunicode = StackObject(
				763	name='unicode',
				764	obtype=unicode,
				765	doc="A Python Unicode string object.")
				766
				767	pynone = StackObject(
				768	name="None",
				769	obtype=type(None),
				770	doc="The Python None object.")
				771
				772	pytuple = StackObject(
				773	name="tuple",
				774	obtype=tuple,
				775	doc="A Python tuple object.")
				776
				777	pylist = StackObject(
				778	name="list",
				779	obtype=list,
				780	doc="A Python list object.")
				781
				782	pydict = StackObject(
				783	name="dict",
				784	obtype=dict,
				785	doc="A Python dict object.")
				786
				787	anyobject = StackObject(
				788	name='any',
				789	obtype=object,
				790	doc="Any kind of object whatsoever.")
				791
				792	markobject = StackObject(
				793	name="mark",
				794	obtype=StackObject,
				795	doc="""'The mark' is a unique object.
				796
				797	Opcodes that operate on a variable number of objects
				798	generally don't embed the count of objects in the opcode,
				799	or pull it off the stack. Instead the MARK opcode is used
				800	to push a special marker object on the stack, and then
				801	some other opcodes grab all the objects from the top of
				802	the stack down to (but not including) the topmost marker
				803	object.
				804	""")
				805
				806	stackslice = StackObject(
				807	name="stackslice",
				808	obtype=StackObject,
				809	doc="""An object representing a contiguous slice of the stack.
				810
				811	This is used in conjunction with markobject, to represent all
				812	of the stack following the topmost markobject. For example,
				813	the POP_MARK opcode changes the stack from
				814
				815	[..., markobject, stackslice]
				816	to
				817	[...]
				818
				819	No matter how many objects are on the stack after the topmost
				820	markobject, POP_MARK gets rid of all of them (including the
				821	topmost markobject too).
				822	""")
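
The mark and the stack slice above it can be seen at work in hand-written protocol 0 pickles. The byte strings below are illustrative constructions, not output captured from a pickler, and the sketch assumes Python 3's standard `pickle` module.

```python
import pickle

# MARK "(", four INT opcodes, then TUPLE "t": TUPLE collects everything
# above the topmost mark into one tuple and removes the mark.
built = pickle.loads(b"(I1\nI2\nI3\nI4\nt.")

# MARK, two INT opcodes, then POP_MARK "1": the mark and the slice above
# it are discarded, so only the INT pushed afterwards is left for STOP.
after_pop = pickle.loads(b"(I1\nI2\n1I5\n.")
```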
				823
				824	##############################################################################
				825	# Descriptors for pickle opcodes.
				826
				827	class OpcodeInfo(object):
				828
				829	    __slots__ = (
				830	        # symbolic name of opcode; a string
				831	        'name',
				832
				833	        # the code used in a bytestream to represent the opcode; a
				834	        # one-character string
				835	        'code',
				836
				837	        # If the opcode has an argument embedded in the byte string, an
				838	        # instance of ArgumentDescriptor specifying its type. Note that
				839	        # arg.reader(s) can be used to read and decode the argument from
				840	        # the bytestream s, and arg.doc documents the format of the raw
				841	        # argument bytes. If the opcode doesn't have an argument embedded
				842	        # in the bytestream, arg should be None.
				843	        'arg',
				844
				845	        # what the stack looks like before this opcode runs; a list
				846	        'stack_before',
				847
				848	        # what the stack looks like after this opcode runs; a list
				849	        'stack_after',
				850
				851	        # the protocol number in which this opcode was introduced; an int
				852	        'proto',
				853
				854	        # human-readable docs for this opcode; a string
				855	        'doc',
				856	    )
				857
				858	    def __init__(self, name, code, arg,
				859	                 stack_before, stack_after, proto, doc):
				860	        assert isinstance(name, str)
				861	        self.name = name
				862
				863	        assert isinstance(code, str)
				864	        assert len(code) == 1
				865	        self.code = code
				866
				867	        assert arg is None or isinstance(arg, ArgumentDescriptor)
				868	        self.arg = arg
				869
				870	        assert isinstance(stack_before, list)
				871	        for x in stack_before:
				872	            assert isinstance(x, StackObject)
				873	        self.stack_before = stack_before
				874
				875	        assert isinstance(stack_after, list)
				876	        for x in stack_after:
				877	            assert isinstance(x, StackObject)
				878	        self.stack_after = stack_after
				879
				880	        assert isinstance(proto, int) and 0 <= proto <= 2
				881	        self.proto = proto
				882
				883	        assert isinstance(doc, str)
				884	        self.doc = doc
				885
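# A minimal sketch, assuming a modern Python 3 interpreter whose standard
# library still ships this module as `pickletools`. It reads the descriptor
# fields named in __slots__ for a single opcode. The helper name `describe`
# is hypothetical, and today's module differs from this 2003 snapshot in
# places (later protocols added more opcodes, for example).

```python
import pickletools


def describe(name):
    """Return the descriptor fields of the opcode called `name` as a dict."""
    for op in pickletools.opcodes:
        if op.name == name:
            return {
                "name": op.name,
                "code": op.code,
                # arg is None or an ArgumentDescriptor; keep only its name
                "arg": None if op.arg is None else op.arg.name,
                # stack_before / stack_after are lists of StackObject records
                "stack_before": [obj.name for obj in op.stack_before],
                "stack_after": [obj.name for obj in op.stack_after],
                "proto": op.proto,
            }
    raise KeyError(name)


info = describe("BININT1")
print(info["code"], info["arg"], info["proto"])
```

# The linear scan keeps the sketch independent of any lookup table; a dict
# keyed by name would be the obvious choice for repeated queries.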
				886	I = OpcodeInfo
				887	opcodes = [
				888
				889	# Ways to spell integers.
				890
				891	I(name='INT',
				892	code='I',
				893	arg=decimalnl_short,
				894	stack_before=[],
				895	stack_after=[pyinteger_or_bool],
				896	proto=0,
				897	doc="""Push an integer or bool.
				898
				899	The argument is a newline-terminated decimal literal string.
				900
				901	The intent may have been that this always fit in a short Python int,
				902	but INT can be generated in pickles written on a 64-bit box that
				903	require a Python long on a 32-bit box. The difference between this
				904	and LONG then is that INT skips a trailing 'L', and produces a short
				905	int whenever possible.
				906
				907	Another difference arises because, when bool was introduced as a
				908	distinct type in 2.3, builtin names True and False were also added to
				909	2.2.2, mapping to ints 1 and 0. For compatibility in both directions,
				910	True gets pickled as INT + "01\\n", and False as INT + "00\\n".
				911	Leading zeroes are never produced for a genuine integer. The 2.3
				912	(and later) unpicklers special-case these and return bool instead;
				913	earlier unpicklers ignore the leading "0" and return the int.
				914	"""),
				915
				916	I(name='LONG',
				917	code='L',
				918	arg=decimalnl_long,
				919	stack_before=[],
				920	stack_after=[pylong],
				921	proto=0,
				922	doc="""Push a long integer.
				923
				924	The same as INT, except that the literal ends with 'L', and always
				925	unpickles to a Python long. There doesn't seem to be a real purpose
				926	to the trailing 'L'.
				927	"""),
				928
				929	I(name='BININT',
				930	code='J',
				931	arg=int4,
				932	stack_before=[],
				933	stack_after=[pyint],
				934	proto=1,
				935	doc="""Push a four-byte signed integer.
				936
				937	This handles the full range of Python (short) integers on a 32-bit
				938	box, directly as binary bytes (1 for the opcode and 4 for the integer).
				939	If the integer is non-negative and fits in 1 or 2 bytes, pickling via
				940	BININT1 or BININT2 saves space.
				941	"""),
				942
				943	I(name='BININT1',
				944	code='K',
				945	arg=uint1,
				946	stack_before=[],
				947	stack_after=[pyint],
				948	proto=1,
				949	doc="""Push a one-byte unsigned integer.
				950
				951	This is a space optimization for pickling very small non-negative ints,
				952	in range(256).
				953	"""),
				954
				955	I(name='BININT2',
				956	code='M',
				957	arg=uint2,
				958	stack_before=[],
				959	stack_after=[pyint],
				960	proto=1,
				961	doc="""Push a two-byte unsigned integer.
				962
				963	This is a space optimization for pickling small positive ints, in
				964	range(256, 2**16). Integers in range(256) can also be pickled via
				965	BININT2, but BININT1 instead saves a byte.
				966	"""),
				967
				968	# Ways to spell strings (8-bit, not Unicode).
				969
				970	I(name='STRING',
				971	code='S',
				972	arg=stringnl,
				973	stack_before=[],
				974	stack_after=[pystring],
				975	proto=0,
				976	doc="""Push a Python string object.
				977
				978	The argument is a repr-style string, with bracketing quote characters,
				979	and perhaps embedded escapes. The argument extends until the next
				980	newline character.
				981	"""),
				982
				983	I(name='BINSTRING',
				984	code='T',
				985	arg=string4,
				986	stack_before=[],
				987	stack_after=[pystring],
				988	proto=1,
				989	doc="""Push a Python string object.
				990
				991	There are two arguments: the first is a 4-byte little-endian signed int
				992	giving the number of bytes in the string, and the second is that many
				993	bytes, which are taken literally as the string content.
				994	"""),
				995
				996	I(name='SHORT_BINSTRING',
				997	code='U',
				998	arg=string1,
				999	stack_before=[],
				1000	stack_after=[pystring],
				1001	proto=1,
				1002	doc="""Push a Python string object.
				1003
				1004	There are two arguments: the first is a 1-byte unsigned int giving
				1005	the number of bytes in the string, and the second is that many bytes,
				1006	which are taken literally as the string content.
				1007	"""),
				1008
				1009	# Ways to spell None.
				1010
				1011	I(name='NONE',
				1012	code='N',
				1013	arg=None,
				1014	stack_before=[],
				1015	stack_after=[pynone],
				1016	proto=0,
				1017	doc="Push None on the stack."),
				1018
				1019	# Ways to spell Unicode strings.
				1020
				1021	I(name='UNICODE',
				1022	code='V',
				1023	arg=unicodestringnl,
				1024	stack_before=[],
				1025	stack_after=[pyunicode],
				1026	proto=0, # this may be pure-text, but it's a later addition
				1027	doc="""Push a Python Unicode string object.
				1028
				1029	The argument is a raw-unicode-escape encoding of a Unicode string,
				1030	and so may contain embedded escape sequences. The argument extends
				1031	until the next newline character.
				1032	"""),
				1033
				1034	I(name='BINUNICODE',
				1035	code='X',
				1036	arg=unicodestring4,
				1037	stack_before=[],
				1038	stack_after=[pyunicode],
				1039	proto=1,
				1040	doc="""Push a Python Unicode string object.
				1041
				1042	There are two arguments: the first is a 4-byte little-endian signed int
				1043	giving the number of bytes in the string. The second is that many
				1044	bytes, and is the UTF-8 encoding of the Unicode string.
				1045	"""),
				1046
				1047	# Ways to spell floats.
				1048
				1049	I(name='FLOAT',
				1050	code='F',
				1051	arg=floatnl,
				1052	stack_before=[],
				1053	stack_after=[pyfloat],
				1054	proto=0,
				1055	doc="""Newline-terminated decimal float literal.
				1056
				1057	The argument is repr(a_float), and in general requires 17 significant
				1058	digits for roundtrip conversion to be an identity (this is so for
				1059	IEEE-754 double precision values, which is what Python float maps to
				1060	on most boxes).
				1061
				1062	In general, FLOAT cannot be used to transport infinities, NaNs, or
				1063	minus zero across boxes (or even on a single box, if the platform C
				1064	library can't read the strings it produces for such things -- Windows
				1065	is like that), but may do less damage than BINFLOAT on boxes with
				1066	greater precision or dynamic range than IEEE-754 double.
				1067	"""),
				1068
				1069	I(name='BINFLOAT',
				1070	code='G',
				1071	arg=float8,
				1072	stack_before=[],
				1073	stack_after=[pyfloat],
				1074	proto=1,
				1075	doc="""Float stored in binary form, with 8 bytes of data.
				1076
				1077	This generally requires less than half the space of FLOAT encoding.
				1078	In general, BINFLOAT cannot be used to transport infinities, NaNs, or
				1079	minus zero, raises an exception if the exponent exceeds the range of
				1080	an IEEE-754 double, and retains no more than 53 bits of precision (if
				1081	there are more than that, "add a half and chop" rounding is used to
				1082	cut it back to 53 significant bits).
				1083	"""),
				1084
				1085	# Ways to build lists.
				1086
				1087	I(name='EMPTY_LIST',
				1088	code=']',
				1089	arg=None,
				1090	stack_before=[],
				1091	stack_after=[pylist],
				1092	proto=1,
				1093	doc="Push an empty list."),
				1094
				1095	I(name='APPEND',
				1096	code='a',
				1097	arg=None,
				1098	stack_before=[pylist, anyobject],
				1099	stack_after=[pylist],
				1100	proto=0,
				1101	doc="""Append an object to a list.
				1102
				1103	Stack before: ... pylist anyobject
				1104	Stack after: ... pylist+[anyobject]
				1105	"""),
				1106
				1107	I(name='APPENDS',
				1108	code='e',
				1109	arg=None,
				1110	stack_before=[pylist, markobject, stackslice],
				1111	stack_after=[pylist],
				1112	proto=1,
				1113	doc="""Extend a list by a slice of stack objects.
				1114
				1115	Stack before: ... pylist markobject stackslice
				1116	Stack after: ... pylist+stackslice
				1117	"""),
				1118
				1119	I(name='LIST',
				1120	code='l',
				1121	arg=None,
				1122	stack_before=[markobject, stackslice],
				1123	stack_after=[pylist],
				1124	proto=0,
				1125	doc="""Build a list out of the topmost stack slice, after markobject.
				1126
				1127	All the stack entries following the topmost markobject are placed into
				1128	a single Python list, which single list object replaces all of the
				1129	stack from the topmost markobject onward. For example,
				1130
				1131	Stack before: ... markobject 1 2 3 'abc'
				1132	Stack after: ... [1, 2, 3, 'abc']
				1133	"""),
				1134
				1135	# Ways to build tuples.
				1136
				1137	I(name='EMPTY_TUPLE',
				1138	code=')',
				1139	arg=None,
				1140	stack_before=[],
				1141	stack_after=[pytuple],
				1142	proto=1,
				1143	doc="Push an empty tuple."),
				1144
				1145	I(name='TUPLE',
				1146	code='t',
				1147	arg=None,
				1148	stack_before=[markobject, stackslice],
				1149	stack_after=[pytuple],
				1150	proto=0,
				1151	doc="""Build a tuple out of the topmost stack slice, after markobject.
				1152
				1153	All the stack entries following the topmost markobject are placed into
				1154	a single Python tuple, which single tuple object replaces all of the
				1155	stack from the topmost markobject onward. For example,
				1156
				1157	Stack before: ... markobject 1 2 3 'abc'
				1158	Stack after: ... (1, 2, 3, 'abc')
				1159	"""),
				1160
				1161	# Ways to build dicts.
				1162
				1163	I(name='EMPTY_DICT',
				1164	code='}',
				1165	arg=None,
				1166	stack_before=[],
				1167	stack_after=[pydict],
				1168	proto=1,
				1169	doc="Push an empty dict."),
				1170
				1171	I(name='DICT',
				1172	code='d',
				1173	arg=None,
				1174	stack_before=[markobject, stackslice],
				1175	stack_after=[pydict],
				1176	proto=0,
				1177	doc="""Build a dict out of the topmost stack slice, after markobject.
				1178
				1179	All the stack entries following the topmost markobject are placed into
				1180	a single Python dict, which single dict object replaces all of the
				1181	stack from the topmost markobject onward. The stack slice alternates
				1182	key, value, key, value, .... For example,
				1183
				1184	Stack before: ... markobject 1 2 3 'abc'
				1185	Stack after: ... {1: 2, 3: 'abc'}
				1186	"""),
				1187
				1188	I(name='SETITEM',
				1189	code='s',
				1190	arg=None,
				1191	stack_before=[pydict, anyobject, anyobject],
				1192	stack_after=[pydict],
				1193	proto=0,
				1194	doc="""Add a key+value pair to an existing dict.
				1195
				1196	Stack before: ... pydict key value
				1197	Stack after: ... pydict
				1198
				1199	where pydict has been modified via pydict[key] = value.
				1200	"""),
				1201
				1202	I(name='SETITEMS',
				1203	code='u',
				1204	arg=None,
				1205	stack_before=[pydict, markobject, stackslice],
				1206	stack_after=[pydict],
				1207	proto=1,
				1208	doc="""Add an arbitrary number of key+value pairs to an existing dict.
				1209
				1210	The slice of the stack following the topmost markobject is taken as
				1211	an alternating sequence of keys and values, added to the dict
				1212	immediately under the topmost markobject. Everything at and after the
				1213	topmost markobject is popped, leaving the mutated dict at the top
				1214	of the stack.
				1215
				1216	Stack before: ... pydict markobject key_1 value_1 ... key_n value_n
				1217	Stack after: ... pydict
				1218
				1219	where pydict has been modified via pydict[key_i] = value_i for i in
				1220	1, 2, ..., n, and in that order.
				1221	"""),
				1222
				1223	# Stack manipulation.
				1224
				1225	I(name='POP',
				1226	code='0',
				1227	arg=None,
				1228	stack_before=[anyobject],
				1229	stack_after=[],
				1230	proto=0,
				1231	doc="Discard the top stack item, shrinking the stack by one item."),
				1232
				1233	I(name='DUP',
				1234	code='2',
				1235	arg=None,
				1236	stack_before=[anyobject],
				1237	stack_after=[anyobject, anyobject],
				1238	proto=0,
				1239	doc="Push the top stack item onto the stack again, duplicating it."),
				1240
				1241	I(name='MARK',
				1242	code='(',
				1243	arg=None,
				1244	stack_before=[],
				1245	stack_after=[markobject],
				1246	proto=0,
				1247	doc="""Push markobject onto the stack.
				1248
				1249	markobject is a unique object, used by other opcodes to identify a
				1250	region of the stack containing a variable number of objects for them
				1251	to work on. See markobject.doc for more detail.
				1252	"""),
				1253
				1254	I(name='POP_MARK',
				1255	code='1',
				1256	arg=None,
				1257	stack_before=[markobject, stackslice],
				1258	stack_after=[],
				1259	proto=0,
				1260	doc="""Pop all the stack objects at and above the topmost markobject.
				1261
				1262	When an opcode using a variable number of stack objects is done,
				1263	POP_MARK is used to remove those objects, and to remove the markobject
				1264	that delimited their starting position on the stack.
				1265	"""),
				1266
				1267	# Memo manipulation. There are really only two operations (get and put),
				1268	# each in all-text, "short binary", and "long binary" flavors.
				1269
				1270	I(name='GET',
				1271	code='g',
				1272	arg=decimalnl_short,
				1273	stack_before=[],
				1274	stack_after=[anyobject],
				1275	proto=0,
				1276	doc="""Read an object from the memo and push it on the stack.
				1277
				1278	The index of the memo object to push is given by the newline-terminated
				1279	decimal string following. BINGET and LONG_BINGET are space-optimized
				1280	versions.
				1281	"""),
				1282
				1283	I(name='BINGET',
				1284	code='h',
				1285	arg=uint1,
				1286	stack_before=[],
				1287	stack_after=[anyobject],
				1288	proto=1,
				1289	doc="""Read an object from the memo and push it on the stack.
				1290
				1291	The index of the memo object to push is given by the 1-byte unsigned
				1292	integer following.
				1293	"""),
				1294
				1295	I(name='LONG_BINGET',
				1296	code='j',
				1297	arg=int4,
				1298	stack_before=[],
				1299	stack_after=[anyobject],
				1300	proto=1,
				1301	doc="""Read an object from the memo and push it on the stack.
				1302
				1303	The index of the memo object to push is given by the 4-byte signed
				1304	little-endian integer following.
				1305	"""),
				1306
				1307	I(name='PUT',
				1308	code='p',
				1309	arg=decimalnl_short,
				1310	stack_before=[],
				1311	stack_after=[],
				1312	proto=0,
				1313	doc="""Store the stack top into the memo. The stack is not popped.
				1314
				1315	The index of the memo location to write into is given by the newline-
				1316	terminated decimal string following. BINPUT and LONG_BINPUT are
				1317	space-optimized versions.
				1318	"""),
				1319
				1320	I(name='BINPUT',
				1321	code='q',
				1322	arg=uint1,
				1323	stack_before=[],
				1324	stack_after=[],
				1325	proto=1,
				1326	doc="""Store the stack top into the memo. The stack is not popped.
				1327
				1328	The index of the memo location to write into is given by the 1-byte
				1329	unsigned integer following.
				1330	"""),
				1331
				1332	I(name='LONG_BINPUT',
				1333	code='r',
				1334	arg=int4,
				1335	stack_before=[],
				1336	stack_after=[],
				1337	proto=1,
				1338	doc="""Store the stack top into the memo. The stack is not popped.
				1339
				1340	The index of the memo location to write into is given by the 4-byte
				1341	signed little-endian integer following.
				1342	"""),
				1343
				1344	# Push a class object, or module function, on the stack, via its module
				1345	# and name.
				1346
				1347	I(name='GLOBAL',
				1348	code='c',
				1349	arg=stringnl_noescape_pair,
				1350	stack_before=[],
				1351	stack_after=[anyobject],
				1352	proto=0,
				1353	doc="""Push a global object (module.attr) on the stack.
				1354
				1355	Two newline-terminated strings follow the GLOBAL opcode. The first is
				1356	taken as a module name, and the second as a class name. The class
				1357	object module.class is pushed on the stack. More accurately, the
				1358	object returned by self.find_class(module, class) is pushed on the
				1359	stack, so unpickling subclasses can override this form of lookup.
				1360	"""),
				1361
				1362	# Ways to build objects of classes pickle doesn't know about directly
				1363	# (user-defined classes). I despair of documenting this accurately
				1364	# and comprehensibly -- you really have to read the pickle code to
				1365	# find all the special cases.
				1366
				1367	I(name='REDUCE',
				1368	code='R',
				1369	arg=None,
				1370	stack_before=[anyobject, anyobject],
				1371	stack_after=[anyobject],
				1372	proto=0,
				1373	doc="""Push an object built from a callable and an argument tuple.
				1374
				1375	The opcode is named as a reminder of the __reduce__() method.
				1376
				1377	Stack before: ... callable pytuple
				1378	Stack after: ... callable(*pytuple)
				1379
				1380	The callable and the argument tuple are the first two items returned
				1381	by a __reduce__ method. Applying the callable to the argtuple is
				1382	supposed to reproduce the original object, or at least get it started.
				1383	If the __reduce__ method returns a 3-tuple, the last component is an
				1384	argument to be passed to the object's __setstate__, and then the REDUCE
				1385	opcode is followed by code to create setstate's argument, and then a
				1386	BUILD opcode to apply __setstate__ to that argument.
				1387
				1388	There are lots of special cases here. The argtuple can be None, in
				1389	which case callable.__basicnew__() is called instead to produce the
				1390	object to be pushed on the stack. This appears to be a trick unique
				1391	to ExtensionClasses, and is deprecated regardless.
				1392
				1393	If type(callable) is not ClassType, REDUCE complains unless the
				1394	callable has been registered with the copy_reg module's
				1395	safe_constructors dict, or the callable has a magic
				1396	'__safe_for_unpickling__' attribute with a true value. I'm not sure
				1397	why it does this, but I've sure seen this complaint often enough when
				1398	I didn't want to <wink>.
				1399	"""),
				1400
				1401	I(name='BUILD',
				1402	code='b',
				1403	arg=None,
				1404	stack_before=[anyobject, anyobject],
				1405	stack_after=[anyobject],
				1406	proto=0,
				1407	doc="""Finish building an object, via __setstate__ or dict update.
				1408
				1409	Stack before: ... anyobject argument
				1410	Stack after: ... anyobject
				1411
				1412	where anyobject may have been mutated, as follows:
				1413
				1414	If the object has a __setstate__ method,
				1415
				1416	anyobject.__setstate__(argument)
				1417
				1418	is called.
				1419
				1420	Else the argument must be a dict, the object must have a __dict__, and
				1421	the object is updated via
				1422
				1423	anyobject.__dict__.update(argument)
				1424
				1425	This may raise RuntimeError in restricted execution mode (which
				1426	disallows access to __dict__ directly); in that case, the object
				1427	is updated instead via
				1428
				1429	for k, v in argument.items():
				1430	anyobject[k] = v
				1431	"""),
				1432
				1433	I(name='INST',
				1434	code='i',
				1435	arg=stringnl_noescape_pair,
				1436	stack_before=[markobject, stackslice],
				1437	stack_after=[anyobject],
				1438	proto=0,
				1439	doc="""Build a class instance.
				1440
				1441	This is the protocol 0 version of protocol 1's OBJ opcode.
				1442	INST is followed by two newline-terminated strings, giving a
				1443	module and class name, just as for the GLOBAL opcode (and see
				1444	GLOBAL for more details about that). self.find_class(module, name)
				1445	is used to get a class object.
				1446
				1447	In addition, all the objects on the stack following the topmost
				1448	markobject are gathered into a tuple and popped (along with the
				1449	topmost markobject), just as for the TUPLE opcode.
				1450
				1451	Now it gets complicated. If all of these are true:
				1452
				1453	+ The argtuple is empty (markobject was at the top of the stack
				1454	at the start).
				1455
				1456	+ It's an old-style class object (the type of the class object is
				1457	ClassType).
				1458
				1459	+ The class object does not have a __getinitargs__ attribute.
				1460
				1461	then we want to create an old-style class instance without invoking
				1462	its __init__() method (pickle has waffled on this over the years; not
				1463	calling __init__() is current wisdom). In this case, an instance of
				1464	an old-style dummy class is created, and then we try to rebind its
				1465	__class__ attribute to the desired class object. If this succeeds,
				1466	the new instance object is pushed on the stack, and we're done. In
				1467	restricted execution mode it can fail (assignment to __class__ is
				1468	disallowed), and I'm not really sure what happens then -- it looks
				1469	like the code ends up calling the class object's __init__ anyway,
				1470	via falling into the next case.
				1471
				1472	Else (the argtuple is not empty, it's not an old-style class object,
				1473	or the class object does have a __getinitargs__ attribute), the code
				1474	first insists that the class object have a __safe_for_unpickling__
				1475	attribute. Unlike the __safe_for_unpickling__ check in REDUCE,
				1476	it doesn't matter whether this attribute has a true or false value, it
				1477	only matters whether it exists (XXX this smells like a bug). If
				1478	__safe_for_unpickling__ doesn't exist, UnpicklingError is raised.
				1479
				1480	Else (the class object does have a __safe_for_unpickling__ attr),
				1481	the class object obtained from INST's arguments is applied to the
				1482	argtuple obtained from the stack, and the resulting instance object
				1483	is pushed on the stack.
				1484	"""),
				1485
				1486	I(name='OBJ',
				1487	code='o',
				1488	arg=None,
				1489	stack_before=[markobject, anyobject, stackslice],
				1490	stack_after=[anyobject],
				1491	proto=1,
				1492	doc="""Build a class instance.
				1493
				1494	This is the protocol 1 version of protocol 0's INST opcode, and is
				1495	very much like it. The major difference is that the class object
				1496	is taken off the stack, allowing it to be retrieved from the memo
				1497	repeatedly if several instances of the same class are created. This
				1498	can be much more efficient (in both time and space) than repeatedly
				1499	embedding the module and class names in INST opcodes.
				1500
				1501	Unlike INST, OBJ takes no arguments from the opcode stream. Instead
				1502	the class object is taken off the stack, immediately above the
				1503	topmost markobject:
				1504
				1505	Stack before: ... markobject classobject stackslice
				1506	Stack after: ... new_instance_object
				1507
				1508	As for INST, the remainder of the stack above the markobject is
				1509	gathered into an argument tuple, and then the logic seems identical,
				1510	except that no __safe_for_unpickling__ check is done (XXX this smells
				1511	like a bug). See INST for the gory details.
				1512	"""),
				1513
				1514	# Machine control.
				1515
				1516	I(name='STOP',
				1517	code='.',
				1518	arg=None,
				1519	stack_before=[anyobject],
				1520	stack_after=[],
				1521	proto=0,
				1522	doc="""Stop the unpickling machine.
				1523
				1524	Every pickle ends with this opcode. The object at the top of the stack
				1525	is popped, and that's the result of unpickling. The stack should be
				1526	empty then.
				1527	"""),
				1528
				1529	# Ways to deal with persistent IDs.
				1530
				1531	I(name='PERSID',
				1532	code='P',
				1533	arg=stringnl_noescape,
				1534	stack_before=[],
				1535	stack_after=[anyobject],
				1536	proto=0,
				1537	doc="""Push an object identified by a persistent ID.
				1538
				1539	The pickle module doesn't define what a persistent ID means. PERSID's
				1540	argument is a newline-terminated str-style (no embedded escapes, no
				1541	bracketing quote characters) string, which is "the persistent ID".
				1542	The unpickler passes this string to self.persistent_load(). Whatever
				1543	object that returns is pushed on the stack. There is no implementation
				1544	of persistent_load() in Python's unpickler: it must be supplied by an
				1545	unpickler subclass.
				1546	"""),
				1547
				1548	I(name='BINPERSID',
				1549	code='Q',
				1550	arg=None,
				1551	stack_before=[anyobject],
				1552	stack_after=[anyobject],
				1553	proto=1,
				1554	doc="""Push an object identified by a persistent ID.
				1555
				1556	Like PERSID, except the persistent ID is popped off the stack (instead
				1557	of being a string embedded in the opcode bytestream). The persistent
				1558	ID is passed to self.persistent_load(), and whatever object that
				1559	returns is pushed on the stack. See PERSID for more detail.
				1560	"""),
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame^]	1561
				1562	# Protocol 2 opcodes
				1563
				1564	I(name='PROTO',
				1565	code='\x80',
				1566	arg=uint1,
				1567	stack_before=[],
				1568	stack_after=[],
				1569	proto=2,
				1570	doc="""Protocol version indicator.
				1571
				1572	For protocol 2 and above, a pickle must start with this opcode.
				1573	The argument is the protocol version, an int in range(2, 256).
				1574	"""),
				1575
				1576	I(name='NEWOBJ',
				1577	code='\x81',
				1578	arg=None,
				1579	stack_before=[anyobject, anyobject],
				1580	stack_after=[anyobject],
				1581	proto=2,
				1582	doc="""Build an object instance.
				1583
				1584	The stack before should be thought of as containing a class
				1585	object followed by an argument tuple (the tuple being the stack
				1586	top). Call these cls and args. They are popped off the stack,
				1587	and the value returned by cls.__new__(cls, *args) is pushed back
				1588	onto the stack.
				1589	"""),
				1590
				1591	I(name='EXT1',
				1592	code='\x82',
				1593	arg=uint1,
				1594	stack_before=[],
				1595	stack_after=[anyobject],
				1596	proto=2,
				1597	doc="""Extension code.
				1598
				1599	This code and the similar EXT2 and EXT4 allow using a registry
				1600	of popular objects that are pickled by name, typically classes.
				1601	It is envisioned that through a global negotiation and
				1602	registration process, third parties can set up a mapping between
				1603	ints and object names.
				1604
				1605	In order to guarantee pickle interchangeability, the extension
				1606	code registry ought to be global, although a range of codes may
				1607	be reserved for private use.
				1608	"""),
				1609
				1610	I(name='EXT2',
				1611	code='\x83',
				1612	arg=uint2,
				1613	stack_before=[],
				1614	stack_after=[anyobject],
				1615	proto=2,
				1616	doc="""Extension code.
				1617
				1618	See EXT1.
				1619	"""),
				1620
				1621	I(name='EXT4',
				1622	code='\x84',
				1623	arg=int4,
				1624	stack_before=[],
				1625	stack_after=[anyobject],
				1626	proto=2,
				1627	doc="""Extension code.
				1628
				1629	See EXT1.
				1630	"""),
				1631
				1632	I(name='TUPLE1',
				1633	code='\x85',
				1634	arg=None,
				1635	stack_before=[anyobject],
				1636	stack_after=[pytuple],
				1637	proto=2,
				1638	doc="""One-tuple.
				1639
				1640	This code pops one value off the stack and pushes a tuple of
				1641	length 1 whose one item is that value back onto it. IOW:
				1642
				1643	stack[-1] = tuple(stack[-1:])
				1644	"""),
				1645
				1646	I(name='TUPLE2',
				1647	code='\x86',
				1648	arg=None,
				1649	stack_before=[anyobject, anyobject],
				1650	stack_after=[pytuple],
				1651	proto=2,
				1652	doc="""Two-tuple.
				1653
				1654	This code pops two values off the stack and pushes a tuple
				1655	of length 2 whose items are those values back onto it. IOW:
				1656
				1657	stack[-2:] = [tuple(stack[-2:])]
				1658	"""),
				1659
				1660	I(name='TUPLE3',
				1661	code='\x87',
				1662	arg=None,
				1663	stack_before=[anyobject, anyobject, anyobject],
				1664	stack_after=[pytuple],
				1665	proto=2,
				1666	doc="""Three-tuple.
				1667
				1668	This code pops three values off the stack and pushes a tuple
				1669	of length 3 whose items are those values back onto it. IOW:
				1670
				1671	stack[-3:] = [tuple(stack[-3:])]
				1672	"""),
				1673
				1674	I(name='NEWTRUE',
				1675	code='\x88',
				1676	arg=None,
				1677	stack_before=[],
				1678	stack_after=[pybool],
				1679	proto=2,
				1680	doc="""True.
				1681
				1682	Push True onto the stack."""),
				1683
				1684	I(name='NEWFALSE',
				1685	code='\x89',
				1686	arg=None,
				1687	stack_before=[],
				1688	stack_after=[pybool],
				1689	proto=2,
				1690	doc="""False.
				1691
				1692	Push False onto the stack."""),
				1693
				1694	I(name="LONG1",
				1695	code='\x8a',
				1696	arg=long1,
				1697	stack_before=[],
				1698	stack_after=[pylong],
				1699	proto=2,
				1700	doc="""Long integer using one-byte length.
				1701
				1702	A more efficient encoding of a Python long; the long1 encoding
				1703	says it all."""),
				1704
				1705	I(name="LONG2",
				1706	code='\x8b',
				1707	arg=long2,
				1708	stack_before=[],
				1709	stack_after=[pylong],
				1710	proto=2,
				1711	doc="""Long integer using two-byte length.
				1712
				1713	A more efficient encoding of a Python long; the long2 encoding
				1714	says it all."""),
				1715
				1716	I(name="LONG4",
				1717	code='\x8c',
				1718	arg=long4,
				1719	stack_before=[],
				1720	stack_after=[pylong],
				1721	proto=2,
				1722	doc="""Long integer using four-byte length.
				1723
				1724	A more efficient encoding of a Python long; the long4 encoding
				1725	says it all."""),
				1726
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1727	]
				1728	del I
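
# The opcode table can be seen at work by pickling a few values and listing
# the opcodes that result. This is a sketch under the assumption of a modern
# Python 3 interpreter, not of the Python 2.3-era code shown here; the helper
# name `opcode_names` is hypothetical. It relies only on pickle.dumps() and
# pickletools.genops(), both of which exist in the current standard library.

```python
import pickle
import pickletools


def opcode_names(obj, protocol):
    """List the names of the opcodes the pickler emits for `obj`."""
    data = pickle.dumps(obj, protocol=protocol)
    # genops yields (opcode, arg, pos) triples; only the name is kept here
    return [op.name for op, arg, pos in pickletools.genops(data)]


for value in (5, 300, 70000, True):
    for protocol in (0, 1, 2):
        print(repr(value), protocol, opcode_names(value, protocol))
```

# Protocol 0 spells every one of these values with the text opcode INT, while
# protocol 1 picks the narrowest binary form (BININT1, BININT2 or BININT), and
# protocol 2 adds the leading PROTO opcode and the dedicated NEWTRUE opcode.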
				1729
				1730	# Verify uniqueness of .name and .code members.
				1731	name2i = {}
				1732	code2i = {}
				1733
				1734	for i, d in enumerate(opcodes):
				1735	    if d.name in name2i:
				1736	        raise ValueError("repeated name %r at indices %d and %d" %
				1737	                         (d.name, name2i[d.name], i))
				1738	    if d.code in code2i:
				1739	        raise ValueError("repeated code %r at indices %d and %d" %
				1740	                         (d.code, code2i[d.code], i))
				1741
				1742	    name2i[d.name] = i
				1743	    code2i[d.code] = i
				1744
				1745	del name2i, code2i, i, d
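
# The uniqueness check can be tried in isolation on a toy table. This is a
# sketch only: `Op` is a hypothetical stand-in for OpcodeInfo that carries
# just the two checked fields, and `check_unique` is a hypothetical helper
# that folds the name and code checks into one function.

```python
from collections import namedtuple

# Hypothetical stand-in for OpcodeInfo, with only the two checked fields.
Op = namedtuple("Op", ["name", "code"])


def check_unique(records, attr):
    """Raise ValueError if two records share the same value for `attr`."""
    first_seen = {}  # value -> index of its first occurrence
    for i, rec in enumerate(records):
        value = getattr(rec, attr)
        if value in first_seen:
            raise ValueError("repeated %s %r at indices %d and %d" %
                             (attr, value, first_seen[value], i))
        first_seen[value] = i


good = [Op("INT", "I"), Op("LONG", "L"), Op("NONE", "N")]
bad = [Op("INT", "I"), Op("LONG", "L"), Op("BININT", "I")]  # code 'I' reused

check_unique(good, "name")
check_unique(good, "code")

try:
    check_unique(bad, "code")
except ValueError as exc:
    error_message = str(exc)
else:
    error_message = None
print(error_message)
```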
				1746
				1747	##############################################################################
				1748	# Build a code2op dict, mapping opcode characters to OpcodeInfo records.
				1749	# Also ensure we've got the same stuff as pickle.py, although the
				1750	# introspection here is dicey.
				1751
				1752	code2op = {}
				1753	for d in opcodes:
				1754	    code2op[d.code] = d
				1755	del d
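
# A short lookup sketch, assuming a modern Python 3 interpreter, where the
# module still exposes the code2op dict. There a pickle is a bytes object,
# so the first byte is decoded as latin-1 before the lookup; that detail is
# an assumption about Python 3, not about this snapshot, and the helper name
# `first_opcode` is hypothetical.

```python
import pickle
import pickletools


def first_opcode(data):
    """Return the OpcodeInfo record for the first byte of a pickle."""
    # code2op is keyed by one-character strings, so decode the single byte
    return pickletools.code2op[data[:1].decode("latin-1")]


print(first_opcode(pickle.dumps(None, protocol=0)).name)
print(first_opcode(pickle.dumps(None, protocol=2)).name)
```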
				1756
				1757	def assure_pickle_consistency(verbose=False):
				1758	    import pickle, re
				1759
				1760	    copy = code2op.copy()
				1761	    for name in pickle.__all__:
				1762	        if not re.match("[A-Z][A-Z0-9_]+$", name):
				1763	            if verbose:
				1764	                print "skipping %r: it doesn't look like an opcode name" % name
				1765	            continue
				1766	        picklecode = getattr(pickle, name)
				1767	        if not isinstance(picklecode, str) or len(picklecode) != 1:
				1768	            if verbose:
				1769	                print ("skipping %r: value %r doesn't look like a pickle "
				1770	                       "code" % (name, picklecode))
				1771	            continue
				1772	        if picklecode in copy:
				1773	            if verbose:
				1774	                print "checking name %r w/ code %r for consistency" % (
				1775	                      name, picklecode)
				1776	            d = copy[picklecode]
				1777	            if d.name != name:
				1778	                raise ValueError("for pickle code %r, pickle.py uses name %r "
				1779	                                 "but we're using name %r" % (picklecode,
				1780	                                                              name,
				1781	                                                              d.name))
				1782	            # Forget this one. Any left over in copy at the end are a problem
				1783	            # of a different kind.
				1784	            del copy[picklecode]
				1785	        else:
				1786	            raise ValueError("pickle.py appears to have a pickle opcode with "
				1787	                             "name %r and code %r, but we don't" %
				1788	                             (name, picklecode))
				1789	    if copy:
				1790	        msg = ["we appear to have pickle opcodes that pickle.py doesn't have:"]
				1791	        for code, d in copy.items():
				1792	            msg.append(" name %r with code %r" % (d.name, code))
				1793	        raise ValueError("\n".join(msg))
				1794
				1795	assure_pickle_consistency()
				1796
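The same cross-check can be tried against a current interpreter. The sketch below assumes Python 3, where the opcode constants in `pickle` are one-byte `bytes` objects rather than `str`, and where `pickletools.code2op` is keyed by one-character strings. `find_mismatches` is a hypothetical helper that collects disagreements instead of raising.

```python
import pickle
import pickletools
import re


def find_mismatches():
    """Return (name, code) pairs where pickle and pickletools disagree."""
    mismatches = []
    for name in pickle.__all__:
        if not re.match("[A-Z][A-Z0-9_]+$", name):
            continue  # does not look like an opcode name
        value = getattr(pickle, name)
        if not isinstance(value, bytes) or len(value) != 1:
            continue  # e.g. HIGHEST_PROTOCOL is an int, not an opcode
        # code2op keys are str, so decode the single opcode byte.
        info = pickletools.code2op.get(value.decode("latin-1"))
        if info is None or info.name != name:
            mismatches.append((name, value))
    return mismatches


print(find_mismatches())
```

Unlike the function above, this sketch only checks one direction: it does not report opcodes that `pickletools` knows about but `pickle` does not export.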
				1797	##############################################################################
				1798	# A pickle opcode generator.
				1799
				1800	def genops(pickle):
Guido van Rossum	a72ded9	2003-01-27 19:40:47 +0000	[diff] [blame]	1801	"""Generate all the opcodes in a pickle.
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1802
				1803	'pickle' is a file-like object, or string, containing the pickle.
				1804
				1805	Each opcode in the pickle is generated, from the current pickle position,
				1806	stopping after a STOP opcode is delivered. A triple is generated for
				1807	each opcode:
				1808
				1809	opcode, arg, pos
				1810
				1811	opcode is an OpcodeInfo record, describing the current opcode.
				1812
				1813	If the opcode has an argument embedded in the pickle, arg is its decoded
				1814	value, as a Python object. If the opcode doesn't have an argument, arg
				1815	is None.
				1816
				1817	If the pickle has a tell() method, pos was the value of pickle.tell()
				1818	before reading the current opcode. If the pickle is a string object,
				1819	it's wrapped in a StringIO object, and the latter's tell() result is
				1820	used. Else (the pickle doesn't have a tell(), and it's not obvious how
				1821	to query its current position) pos is None.
				1822	"""
				1823
				1824	import cStringIO as StringIO
				1825
				1826	if isinstance(pickle, str):
				1827	pickle = StringIO.StringIO(pickle)
				1828
				1829	if hasattr(pickle, "tell"):
				1830	getpos = pickle.tell
				1831	else:
				1832	getpos = lambda: None
				1833
				1834	while True:
				1835	pos = getpos()
				1836	code = pickle.read(1)
				1837	opcode = code2op.get(code)
				1838	if opcode is None:
				1839	if code == "":
				1840	raise ValueError("pickle exhausted before seeing STOP")
				1841	else:
				1842	raise ValueError("at position %s, opcode %r unknown" % (
				1843	pos is None and "<unknown>" or pos,
				1844	code))
				1845	if opcode.arg is None:
				1846	arg = None
				1847	else:
				1848	arg = opcode.arg.reader(pickle)
				1849	yield opcode, arg, pos
				1850	if code == '.':
				1851	assert opcode.name == 'STOP'
				1852	break
				1853
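A short usage sketch for `genops`, assuming the Python 3 standard-library `pickletools`, where the pickle is passed as `bytes` instead of `str`. The pickle here is written by hand (INT 42, then STOP) so that the positions are easy to follow.

```python
import pickletools

raw = b"I42\n."  # hand-written protocol 0 pickle: INT 42, then STOP

# Each item is (OpcodeInfo, decoded argument or None, byte offset).
triples = [(op.name, arg, pos) for op, arg, pos in pickletools.genops(raw)]
print(triples)  # [('INT', 42, 0), ('STOP', None, 4)]

# A pickle with no STOP opcode is reported once the input runs out.
try:
    list(pickletools.genops(b"I42\n"))
except ValueError as exc:
    truncated_error = str(exc)
    print(truncated_error)
```

Because `genops` is a generator, the error for a truncated pickle only surfaces when the generator is consumed, which is why the second call is wrapped in `list()`.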
				1854	##############################################################################
				1855	# A symbolic pickle disassembler.
				1856
				1857	def dis(pickle, out=None, indentlevel=4):
				1858	"""Produce a symbolic disassembly of a pickle.
				1859
				1860	'pickle' is a file-like object, or string, containing at least one
				1861	pickle. The pickle at the current position is disassembled, through
				1862	the first STOP opcode encountered.
				1863
				1864	Optional arg 'out' is a file-like object to which the disassembly is
				1865	printed. It defaults to sys.stdout.
				1866
				1867	Optional arg indentlevel is the number of blanks by which to indent
				1868	a new MARK level. It defaults to 4.
				1869	"""
				1870
				1871	markstack = []
				1872	indentchunk = ' ' * indentlevel
				1873	for opcode, arg, pos in genops(pickle):
				1874	if pos is not None:
				1875	print >> out, "%5d:" % pos,
				1876
				1877	line = "%s %s%s" % (opcode.code,
				1878	indentchunk * len(markstack),
				1879	opcode.name)
				1880
				1881	markmsg = None
				1882	if markstack and markobject in opcode.stack_before:
				1883	assert markobject not in opcode.stack_after
				1884	markpos = markstack.pop()
				1885	if markpos is not None:
				1886	markmsg = "(MARK at %d)" % markpos
				1887
				1888	if arg is not None or markmsg:
				1889	# make a mild effort to align arguments
				1890	line += ' ' * (10 - len(opcode.name))
				1891	if arg is not None:
				1892	line += ' ' + repr(arg)
				1893	if markmsg:
				1894	line += ' ' + markmsg
				1895	print >> out, line
				1896
				1897	if markobject in opcode.stack_after:
				1898	assert markobject not in opcode.stack_before
				1899	markstack.append(pos)
				1900
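A usage sketch for `dis`, again assuming the Python 3 standard-library `pickletools`. It captures the listing in an `io.StringIO` through the `out` argument. The hand-written pickle contains one MARK so that the "(MARK at N)" annotation appears. The exact column layout of the listing may differ between Python versions, so the sketch only prints it.

```python
import io
import pickletools

raw = b"(I1\nI2\nt."  # MARK, INT 1, INT 2, TUPLE, STOP

buf = io.StringIO()
pickletools.dis(raw, out=buf)  # write the listing to buf, not sys.stdout
listing = buf.getvalue()
print(listing)
```

The TUPLE line is the one that consumes the MARK pushed at offset 0, so it carries the annotation pointing back to that offset.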
				1901
				1902	_dis_test = """
				1903	>>> import pickle
				1904	>>> x = [1, 2, (3, 4), {'abc': u"def"}]
				1905	>>> pik = pickle.dumps(x)
				1906	>>> dis(pik)
				1907	0: ( MARK
				1908	1: l LIST (MARK at 0)
				1909	2: p PUT 0
				1910	5: I INT 1
				1911	8: a APPEND
				1912	9: I INT 2
				1913	12: a APPEND
				1914	13: ( MARK
				1915	14: I INT 3
				1916	17: I INT 4
				1917	20: t TUPLE (MARK at 13)
				1918	21: p PUT 1
				1919	24: a APPEND
				1920	25: ( MARK
				1921	26: d DICT (MARK at 25)
				1922	27: p PUT 2
				1923	30: S STRING 'abc'
				1924	37: p PUT 3
				1925	40: V UNICODE u'def'
				1926	45: p PUT 4
				1927	48: s SETITEM
				1928	49: a APPEND
				1929	50: . STOP
				1930
				1931	Try again with a "binary" pickle.
				1932
				1933	>>> pik = pickle.dumps(x, 1)
				1934	>>> dis(pik)
				1935	0: ] EMPTY_LIST
				1936	1: q BINPUT 0
				1937	3: ( MARK
				1938	4: K BININT1 1
				1939	6: K BININT1 2
				1940	8: ( MARK
				1941	9: K BININT1 3
				1942	11: K BININT1 4
				1943	13: t TUPLE (MARK at 8)
				1944	14: q BINPUT 1
				1945	16: } EMPTY_DICT
				1946	17: q BINPUT 2
				1947	19: U SHORT_BINSTRING 'abc'
				1948	24: q BINPUT 3
				1949	26: X BINUNICODE u'def'
				1950	34: q BINPUT 4
				1951	36: s SETITEM
				1952	37: e APPENDS (MARK at 3)
				1953	38: . STOP
				1954
				1955	Exercise the INST/OBJ/BUILD family.
				1956
				1957	>>> import random
				1958	>>> dis(pickle.dumps(random.random))
Tim Peters	d916cf4	2003-01-27 19:01:47 +0000	[diff] [blame]	1959	0: c GLOBAL 'random random'
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1960	15: p PUT 0
				1961	18: . STOP
				1962
				1963	>>> x = [pickle.PicklingError()] * 2
				1964	>>> dis(pickle.dumps(x))
				1965	0: ( MARK
				1966	1: l LIST (MARK at 0)
				1967	2: p PUT 0
				1968	5: ( MARK
Tim Peters	d916cf4	2003-01-27 19:01:47 +0000	[diff] [blame]	1969	6: i INST 'pickle PicklingError' (MARK at 5)
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1970	28: p PUT 1
				1971	31: ( MARK
				1972	32: d DICT (MARK at 31)
				1973	33: p PUT 2
				1974	36: S STRING 'args'
				1975	44: p PUT 3
				1976	47: ( MARK
				1977	48: t TUPLE (MARK at 47)
				1978	49: p PUT 4
				1979	52: s SETITEM
				1980	53: b BUILD
				1981	54: a APPEND
				1982	55: g GET 1
				1983	58: a APPEND
				1984	59: . STOP
				1985
				1986	>>> dis(pickle.dumps(x, 1))
				1987	0: ] EMPTY_LIST
				1988	1: q BINPUT 0
				1989	3: ( MARK
				1990	4: ( MARK
Tim Peters	d916cf4	2003-01-27 19:01:47 +0000	[diff] [blame]	1991	5: c GLOBAL 'pickle PicklingError'
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1992	27: q BINPUT 1
				1993	29: o OBJ (MARK at 4)
				1994	30: q BINPUT 2
				1995	32: } EMPTY_DICT
				1996	33: q BINPUT 3
				1997	35: U SHORT_BINSTRING 'args'
				1998	41: q BINPUT 4
				1999	43: ) EMPTY_TUPLE
				2000	44: s SETITEM
				2001	45: b BUILD
				2002	46: h BINGET 2
				2003	48: e APPENDS (MARK at 3)
				2004	49: . STOP
				2005
				2006	Try "the canonical" recursive-object test.
				2007
				2008	>>> L = []
				2009	>>> T = L,
				2010	>>> L.append(T)
				2011	>>> L[0] is T
				2012	True
				2013	>>> T[0] is L
				2014	True
				2015	>>> L[0][0] is L
				2016	True
				2017	>>> T[0][0] is T
				2018	True
				2019	>>> dis(pickle.dumps(L))
				2020	0: ( MARK
				2021	1: l LIST (MARK at 0)
				2022	2: p PUT 0
				2023	5: ( MARK
				2024	6: g GET 0
				2025	9: t TUPLE (MARK at 5)
				2026	10: p PUT 1
				2027	13: a APPEND
				2028	14: . STOP
				2029	>>> dis(pickle.dumps(L, 1))
				2030	0: ] EMPTY_LIST
				2031	1: q BINPUT 0
				2032	3: ( MARK
				2033	4: h BINGET 0
				2034	6: t TUPLE (MARK at 3)
				2035	7: q BINPUT 1
				2036	9: a APPEND
				2037	10: . STOP
				2038
				2039	The protocol 0 pickle of the tuple causes the disassembly to get confused,
				2040	as it doesn't realize that the POP opcode at 16 gets rid of the MARK at 0
				2041	(so the output remains indented until the end). The protocol 1 pickle
				2042	doesn't trigger this glitch, because the disassembler realizes that
				2043	POP_MARK gets rid of the MARK. Doing a better job on the protocol 0
				2044	pickle would require the disassembler to emulate the stack.
				2045
				2046	>>> dis(pickle.dumps(T))
				2047	0: ( MARK
				2048	1: ( MARK
				2049	2: l LIST (MARK at 1)
				2050	3: p PUT 0
				2051	6: ( MARK
				2052	7: g GET 0
				2053	10: t TUPLE (MARK at 6)
				2054	11: p PUT 1
				2055	14: a APPEND
				2056	15: 0 POP
				2057	16: 0 POP
				2058	17: g GET 1
				2059	20: . STOP
				2060	>>> dis(pickle.dumps(T, 1))
				2061	0: ( MARK
				2062	1: ] EMPTY_LIST
				2063	2: q BINPUT 0
				2064	4: ( MARK
				2065	5: h BINGET 0
				2066	7: t TUPLE (MARK at 4)
				2067	8: q BINPUT 1
				2068	10: a APPEND
				2069	11: 1 POP_MARK (MARK at 0)
				2070	12: h BINGET 1
				2071	14: . STOP
				2072	"""
				2073
				2074	__test__ = {'disassembler_test': _dis_test,
				2075	}
				2076
				2077	def _test():
				2078	import doctest
				2079	return doctest.testmod()
				2080
				2081	if __name__ == "__main__":
				2082	_test()