Blame - Lib/pickletools.py - platform/external/python/cpython3

blob: 152ea8816cbfe5138ad6fe63e49fa57402827a9f

Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1	""""Executable documentation" for the pickle module.
				2
				3	Extensive comments about the pickle protocols and pickle-machine opcodes
				4	can be found here. Some functions meant for external use:
				5
				6	genops(pickle)
				7	Generate all the opcodes in a pickle, as (opcode, arg, position) triples.
				8
				9	dis(pickle, out=None, indentlevel=4)
				10	Print a symbolic disassembly of a pickle.
				11	"""
				12
				13	# Other ideas:
				14	#
				15	# - A pickle verifier: read a pickle and check it exhaustively for
				16	# well-formedness.
				17	#
				18	# - A protocol identifier: examine a pickle and return its protocol number
				19	# (== the highest .proto attr value among all the opcodes in the pickle).
				20	#
				21	# - A pickle optimizer: for example, tuple-building code is sometimes more
				22	# elaborate than necessary, catering for the possibility that the tuple
				23	# is recursive. Or lots of times a PUT is generated that's never accessed
				24	# by a later GET.
				25
				26
				27	"""
				28	"A pickle" is a program for a virtual pickle machine (PM, but more accurately
				29	called an unpickling machine). It's a sequence of opcodes, interpreted by the
				30	PM, building an arbitrarily complex Python object.
				31
				32	For the most part, the PM is very simple: there are no looping, testing, or
				33	conditional instructions, no arithmetic and no function calls. Opcodes are
				34	executed once each, from first to last, until a STOP opcode is reached.
				35
				36	The PM has two data areas, "the stack" and "the memo".
				37
				38	Many opcodes push Python objects onto the stack; e.g., INT pushes a Python
				39	integer object on the stack, whose value is gotten from a decimal string
				40	literal immediately following the INT opcode in the pickle bytestream. Other
				41	opcodes take Python objects off the stack. The result of unpickling is
				42	whatever object is left on the stack when the final STOP opcode is executed.
				43
				44	The memo is simply an array of objects, or it can be implemented as a dict
				45	mapping little integers to objects. The memo serves as the PM's "long term
				46	memory", and the little integers indexing the memo are akin to variable
				47	names. Some opcodes pop a stack object into the memo at a given index,
				48	and others push a memo object at a given index onto the stack again.
				49
				50	At heart, that's all the PM has. Subtleties arise for these reasons:
				51
				52	+ Object identity. Objects can be arbitrarily complex, and subobjects
				53	may be shared (for example, the list [a, a] refers to the same object a
				54	twice). It can be vital that unpickling recreate an isomorphic object
				55	graph, faithfully reproducing sharing.
				56
				57	+ Recursive objects. For example, after "L = []; L.append(L)", L is a
				58	list, and L[0] is the same list. This is related to the object identity
				59	point, and some sequences of pickle opcodes are subtle in order to
				60	get the right result in all cases.
				61
				62	+ Things pickle doesn't know everything about. Examples of things pickle
				63	does know everything about are Python's builtin scalar and container
				64	types, like ints and tuples. They generally have opcodes dedicated to
				65	them. For things like module references and instances of user-defined
				66	classes, pickle's knowledge is limited. Historically, many enhancements
				67	have been made to the pickle protocol in order to do a better (faster,
				68	and/or more compact) job on those.
				69
				70	+ Backward compatibility and micro-optimization. As explained below,
				71	pickle opcodes never go away, not even when better ways to do a thing
				72	get invented. The repertoire of the PM just keeps growing over time.
Tim Peters	1996e23	2003-01-27 19:38:34 +0000	[diff] [blame]	73	So, e.g., there are now five distinct opcodes for building a Python integer,
				74	four of them devoted to "short" integers. Even so, the only way to pickle
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	75	a Python long int takes time quadratic in the number of digits, for both
				76	pickling and unpickling. This isn't so much a subtlety as a source of
				77	wearying complication.
				78
				79
				80	Pickle protocols:
				81
				82	For compatibility, the meaning of a pickle opcode never changes. Instead new
				83	pickle opcodes get added, and each version's unpickler can handle all the
				84	pickle opcodes in all protocol versions to date. So old pickles continue to
				85	be readable forever. The pickler can generally be told to restrict itself to
				86	the subset of opcodes available under previous protocol versions too, so that
				87	users can create pickles under the current version readable by older
				88	versions. However, a pickle does not contain its version number embedded
				89	within it. If an older unpickler tries to read a pickle using a later
				90	protocol, the result is most likely an exception due to seeing an unknown (in
				91	the older unpickler) opcode.
				92
				93	The original pickle used what's now called "protocol 0", and what was called
				94	"text mode" before Python 2.3. The entire pickle bytestream is made up of
				95	printable 7-bit ASCII characters, plus the newline character, in protocol 0.
				96	That's why it was called text mode.
				97
				98	The second major set of additions is now called "protocol 1", and was called
				99	"binary mode" before Python 2.3. This added many opcodes with arguments
				100	consisting of arbitrary bytes, including NUL bytes and unprintable "high bit"
				101	bytes. Binary mode pickles can be substantially smaller than equivalent
				102	text mode pickles, and sometimes faster too; e.g., BININT represents a 4-byte
				103	int as 4 bytes following the opcode, which is cheaper to unpickle than the
				104	(perhaps) 11-character decimal string attached to INT.
				105
				106	The third major set of additions came in Python 2.3, and is called "protocol
				107	2". XXX Write a short blurb when Guido figures out what they are <wink>. XXX
				108	"""
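
The stack-and-memo model and the protocol notes in the docstring above can be observed from outside with the standard library. The following is a minimal sketch, assuming a modern Python 3 interpreter (not the Python 2 dialect of this revision); the variable names are illustrative only. It pickles a list that refers to the same inner list twice, lists the opcodes, and checks that sharing survives the round trip.

```python
import pickle
import pickletools

shared = ["x"]
original = [shared, shared]  # the same inner list is referenced twice

# Protocol 0 ("text mode"): every byte is 7-bit ASCII.
payload = pickle.dumps(original, protocol=0)
is_text_mode = all(byte < 0x80 for byte in payload)

# genops yields (opcode, arg, position) triples; keep only the names.
op_names = [op.name for op, arg, pos in pickletools.genops(payload)]

# The memo (PUT/GET) is what lets the unpickler rebuild the sharing.
restored = pickle.loads(payload)

# Protocol 1 ("binary mode"): BININT stores a 4-byte little-endian int.
binint_pickle = pickle.dumps(100000, protocol=1)
```

In this sketch the second reference to `shared` is written as a GET of a memo slot filled by an earlier PUT, which is why `restored[0] is restored[1]` holds after unpickling.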
				109
				110	# Meta-rule: Descriptions are stored in instances of descriptor objects,
				111	# with plain constructors. No meta-language is defined from which
				112	# descriptors could be constructed. If you want, e.g., XML, write a little
				113	# program to generate XML from the objects.
				114
				115	##############################################################################
				116	# Some pickle opcodes have an argument, following the opcode in the
				117	# bytestream. An argument is of a specific type, described by an instance
				118	# of ArgumentDescriptor. These are not to be confused with arguments taken
				119	# off the stack -- ArgumentDescriptor applies only to arguments embedded in
				120	# the opcode stream, immediately following an opcode.
				121
				122	# Represents the number of bytes consumed by an argument delimited by the
				123	# next newline character.
				124	UP_TO_NEWLINE = -1
				125
				126	# Represents the number of bytes consumed by a two-argument opcode where
				127	# the first argument gives the number of bytes in the second argument.
Tim Peters	fdb8cfa	2003-01-28 00:13:19 +0000	[diff] [blame]	128	TAKEN_FROM_ARGUMENT1 = -2 # num bytes is 1-byte unsigned int
				129	TAKEN_FROM_ARGUMENT4 = -3 # num bytes is 4-byte signed little-endian int
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	130
				131	class ArgumentDescriptor(object):
				132	__slots__ = (
				133	# name of descriptor record, also a module global name; a string
				134	'name',
				135
				136	# length of argument, in bytes; an int; UP_TO_NEWLINE and
Tim Peters	fdb8cfa	2003-01-28 00:13:19 +0000	[diff] [blame]	137	# TAKEN_FROM_ARGUMENT{1,4} are negative values for variable-length
				138	# cases
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	139	'n',
				140
				141	# a function taking a file-like object, reading this kind of argument
				142	# from the object at the current position, advancing the current
				143	# position by n bytes, and returning the value of the argument
				144	'reader',
				145
				146	# human-readable docs for this arg descriptor; a string
				147	'doc',
				148	)
				149
				150	def __init__(self, name, n, reader, doc):
				151	assert isinstance(name, str)
				152	self.name = name
				153
				154	assert isinstance(n, int) and (n >= 0 or
Tim Peters	fdb8cfa	2003-01-28 00:13:19 +0000	[diff] [blame]	155	n in (UP_TO_NEWLINE,
				156	TAKEN_FROM_ARGUMENT1,
				157	TAKEN_FROM_ARGUMENT4))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	158	self.n = n
				159
				160	self.reader = reader
				161
				162	assert isinstance(doc, str)
				163	self.doc = doc
				164
				165	from struct import unpack as _unpack
				166
				167	def read_uint1(f):
				168	"""
				169	>>> import StringIO
				170	>>> read_uint1(StringIO.StringIO('\\xff'))
				171	255
				172	"""
				173
				174	data = f.read(1)
				175	if data:
				176	return ord(data)
				177	raise ValueError("not enough data in stream to read uint1")
				178
				179	uint1 = ArgumentDescriptor(
				180	name='uint1',
				181	n=1,
				182	reader=read_uint1,
				183	doc="One-byte unsigned integer.")
				184
				185
				186	def read_uint2(f):
				187	"""
				188	>>> import StringIO
				189	>>> read_uint2(StringIO.StringIO('\\xff\\x00'))
				190	255
				191	>>> read_uint2(StringIO.StringIO('\\xff\\xff'))
				192	65535
				193	"""
				194
				195	data = f.read(2)
				196	if len(data) == 2:
				197	return _unpack("<H", data)[0]
				198	raise ValueError("not enough data in stream to read uint2")
				199
				200	uint2 = ArgumentDescriptor(
				201	name='uint2',
				202	n=2,
				203	reader=read_uint2,
				204	doc="Two-byte unsigned integer, little-endian.")
				205
				206
				207	def read_int4(f):
				208	"""
				209	>>> import StringIO
				210	>>> read_int4(StringIO.StringIO('\\xff\\x00\\x00\\x00'))
				211	255
				212	>>> read_int4(StringIO.StringIO('\\x00\\x00\\x00\\x80')) == -(2**31)
				213	True
				214	"""
				215
				216	data = f.read(4)
				217	if len(data) == 4:
				218	return _unpack("<i", data)[0]
				219	raise ValueError("not enough data in stream to read int4")
				220
				221	int4 = ArgumentDescriptor(
				222	name='int4',
				223	n=4,
				224	reader=read_int4,
				225	doc="Four-byte signed integer, little-endian, 2's complement.")
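
The fixed-width readers above share one pattern: read exactly n bytes, fail if the stream is short, and unpack little-endian. A minimal sketch of that pattern for Python 3 byte streams follows; `read_exact`, `read_uint2_py3`, and `read_int4_py3` are hypothetical names introduced here, not part of this module.

```python
import io
import struct


def read_exact(stream, size, what):
    """Read exactly `size` bytes or raise ValueError, like the readers above."""
    data = stream.read(size)
    if len(data) != size:
        raise ValueError("not enough data in stream to read %s" % what)
    return data


def read_uint2_py3(stream):
    # "<H": little-endian unsigned 16-bit integer
    return struct.unpack("<H", read_exact(stream, 2, "uint2"))[0]


def read_int4_py3(stream):
    # "<i": little-endian signed 32-bit integer, two's complement
    return struct.unpack("<i", read_exact(stream, 4, "int4"))[0]


low_byte_first = read_uint2_py3(io.BytesIO(b"\xff\x00"))
most_negative = read_int4_py3(io.BytesIO(b"\x00\x00\x00\x80"))
```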
				226
				227
				228	def read_stringnl(f, decode=True, stripquotes=True):
				229	"""
				230	>>> import StringIO
				231	>>> read_stringnl(StringIO.StringIO("'abcd'\\nefg\\n"))
				232	'abcd'
				233
				234	>>> read_stringnl(StringIO.StringIO("\\n"))
				235	Traceback (most recent call last):
				236	...
				237	ValueError: no string quotes around ''
				238
				239	>>> read_stringnl(StringIO.StringIO("\\n"), stripquotes=False)
				240	''
				241
				242	>>> read_stringnl(StringIO.StringIO("''\\n"))
				243	''
				244
				245	>>> read_stringnl(StringIO.StringIO('"abcd"'))
				246	Traceback (most recent call last):
				247	...
				248	ValueError: no newline found when trying to read stringnl
				249
				250	Embedded escapes are undone in the result.
				251	>>> read_stringnl(StringIO.StringIO("'a\\\\nb\\x00c\\td'\\n'e'"))
				252	'a\\nb\\x00c\\td'
				253	"""
				254
				255	data = f.readline()
				256	if not data.endswith('\n'):
				257	raise ValueError("no newline found when trying to read stringnl")
				258	data = data[:-1] # lose the newline
				259
				260	if stripquotes:
				261	for q in "'\"":
				262	if data.startswith(q):
				263	if not data.endswith(q):
				264	raise ValueError("strinq quote %r not found at both "
				265	"ends of %r" % (q, data))
				266	data = data[1:-1]
				267	break
				268	else:
				269	raise ValueError("no string quotes around %r" % data)
				270
				271	# I'm not sure when 'string_escape' was added to the std codecs; it's
				272	# crazy not to use it if it's there.
				273	if decode:
				274	data = data.decode('string_escape')
				275	return data
				276
				277	stringnl = ArgumentDescriptor(
				278	name='stringnl',
				279	n=UP_TO_NEWLINE,
				280	reader=read_stringnl,
				281	doc="""A newline-terminated string.
				282
				283	This is a repr-style string, with embedded escapes, and
				284	bracketing quotes.
				285	""")
				286
				287	def read_stringnl_noescape(f):
				288	return read_stringnl(f, decode=False, stripquotes=False)
				289
				290	stringnl_noescape = ArgumentDescriptor(
				291	name='stringnl_noescape',
				292	n=UP_TO_NEWLINE,
				293	reader=read_stringnl_noescape,
				294	doc="""A newline-terminated string.
				295
				296	This is a str-style string, without embedded escapes,
				297	or bracketing quotes. It should consist solely of
				298	printable ASCII characters.
				299	""")
				300
				301	def read_stringnl_noescape_pair(f):
				302	"""
				303	>>> import StringIO
				304	>>> read_stringnl_noescape_pair(StringIO.StringIO("Queue\\nEmpty\\njunk"))
Tim Peters	d916cf4	2003-01-27 19:01:47 +0000	[diff] [blame]	305	'Queue Empty'
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	306	"""
				307
Tim Peters	d916cf4	2003-01-27 19:01:47 +0000	[diff] [blame]	308	return "%s %s" % (read_stringnl_noescape(f), read_stringnl_noescape(f))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	309
				310	stringnl_noescape_pair = ArgumentDescriptor(
				311	name='stringnl_noescape_pair',
				312	n=UP_TO_NEWLINE,
				313	reader=read_stringnl_noescape_pair,
				314	doc="""A pair of newline-terminated strings.
				315
				316	These are str-style strings, without embedded
				317	escapes, or bracketing quotes. They should
				318	consist solely of printable ASCII characters.
				319	The pair is returned as a single string, with
Tim Peters	d916cf4	2003-01-27 19:01:47 +0000	[diff] [blame]	320	a single blank separating the two strings.
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	321	""")
				322
				323	def read_string4(f):
				324	"""
				325	>>> import StringIO
				326	>>> read_string4(StringIO.StringIO("\\x00\\x00\\x00\\x00abc"))
				327	''
				328	>>> read_string4(StringIO.StringIO("\\x03\\x00\\x00\\x00abcdef"))
				329	'abc'
				330	>>> read_string4(StringIO.StringIO("\\x00\\x00\\x00\\x03abcdef"))
				331	Traceback (most recent call last):
				332	...
				333	ValueError: expected 50331648 bytes in a string4, but only 6 remain
				334	"""
				335
				336	n = read_int4(f)
				337	if n < 0:
				338	raise ValueError("string4 byte count < 0: %d" % n)
				339	data = f.read(n)
				340	if len(data) == n:
				341	return data
				342	raise ValueError("expected %d bytes in a string4, but only %d remain" %
				343	(n, len(data)))
				344
				345	string4 = ArgumentDescriptor(
				346	name="string4",
Tim Peters	fdb8cfa	2003-01-28 00:13:19 +0000	[diff] [blame]	347	n=TAKEN_FROM_ARGUMENT4,
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	348	reader=read_string4,
				349	doc="""A counted string.
				350
				351	The first argument is a 4-byte little-endian signed int giving
				352	the number of bytes in the string, and the second argument is
				353	that many bytes.
				354	""")
				355
				356
				357	def read_string1(f):
				358	"""
				359	>>> import StringIO
				360	>>> read_string1(StringIO.StringIO("\\x00"))
				361	''
				362	>>> read_string1(StringIO.StringIO("\\x03abcdef"))
				363	'abc'
				364	"""
				365
				366	n = read_uint1(f)
				367	assert n >= 0
				368	data = f.read(n)
				369	if len(data) == n:
				370	return data
				371	raise ValueError("expected %d bytes in a string1, but only %d remain" %
				372	(n, len(data)))
				373
				374	string1 = ArgumentDescriptor(
				375	name="string1",
Tim Peters	fdb8cfa	2003-01-28 00:13:19 +0000	[diff] [blame]	376	n=TAKEN_FROM_ARGUMENT1,
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	377	reader=read_string1,
				378	doc="""A counted string.
				379
				380	The first argument is a 1-byte unsigned int giving the number
				381	of bytes in the string, and the second argument is that many
				382	bytes.
				383	""")
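
string1 and string4 differ only in how the byte count is encoded. A minimal sketch of a single counted-bytes reader for Python 3 follows, assuming the count is described by a struct format string; `read_counted_bytes` is a hypothetical helper, not a function of this module.

```python
import io
import struct


def read_counted_bytes(stream, count_format, what):
    """Read a length prefix described by `count_format`, then that many bytes."""
    count_size = struct.calcsize(count_format)
    header = stream.read(count_size)
    if len(header) != count_size:
        raise ValueError("not enough data in stream to read %s count" % what)
    (n,) = struct.unpack(count_format, header)
    if n < 0:
        raise ValueError("%s byte count < 0: %d" % (what, n))
    data = stream.read(n)
    if len(data) != n:
        raise ValueError("expected %d bytes in a %s, but only %d remain"
                         % (n, what, len(data)))
    return data


# "<B" is a 1-byte unsigned count (string1); "<i" is a 4-byte signed
# little-endian count (string4).
short_form = read_counted_bytes(io.BytesIO(b"\x03abcdef"), "<B", "string1")
long_form = read_counted_bytes(io.BytesIO(b"\x03\x00\x00\x00abcdef"), "<i", "string4")
```

Passing the count bytes in big-endian order by mistake, as in the third string4 doctest above, yields a huge count and therefore a ValueError rather than silently wrong data.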
				384
				385
				386	def read_unicodestringnl(f):
				387	"""
				388	>>> import StringIO
				389	>>> read_unicodestringnl(StringIO.StringIO("abc\\uabcd\\njunk"))
				390	u'abc\\uabcd'
				391	"""
				392
				393	data = f.readline()
				394	if not data.endswith('\n'):
				395	raise ValueError("no newline found when trying to read "
				396	"unicodestringnl")
				397	data = data[:-1] # lose the newline
				398	return unicode(data, 'raw-unicode-escape')
				399
				400	unicodestringnl = ArgumentDescriptor(
				401	name='unicodestringnl',
				402	n=UP_TO_NEWLINE,
				403	reader=read_unicodestringnl,
				404	doc="""A newline-terminated Unicode string.
				405
				406	This is raw-unicode-escape encoded, so consists of
				407	printable ASCII characters, and may contain embedded
				408	escape sequences.
				409	""")
				410
				411	def read_unicodestring4(f):
				412	"""
				413	>>> import StringIO
				414	>>> s = u'abcd\\uabcd'
				415	>>> enc = s.encode('utf-8')
				416	>>> enc
				417	'abcd\\xea\\xaf\\x8d'
				418	>>> n = chr(len(enc)) + chr(0) * 3 # little-endian 4-byte length
				419	>>> t = read_unicodestring4(StringIO.StringIO(n + enc + 'junk'))
				420	>>> s == t
				421	True
				422
				423	>>> read_unicodestring4(StringIO.StringIO(n + enc[:-1]))
				424	Traceback (most recent call last):
				425	...
				426	ValueError: expected 7 bytes in a unicodestring4, but only 6 remain
				427	"""
				428
				429	n = read_int4(f)
				430	if n < 0:
				431	raise ValueError("unicodestring4 byte count < 0: %d" % n)
				432	data = f.read(n)
				433	if len(data) == n:
				434	return unicode(data, 'utf-8')
				435	raise ValueError("expected %d bytes in a unicodestring4, but only %d "
				436	"remain" % (n, len(data)))
				437
				438	unicodestring4 = ArgumentDescriptor(
				439	name="unicodestring4",
Tim Peters	fdb8cfa	2003-01-28 00:13:19 +0000	[diff] [blame]	440	n=TAKEN_FROM_ARGUMENT4,
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	441	reader=read_unicodestring4,
				442	doc="""A counted Unicode string.
				443
				444	The first argument is a 4-byte little-endian signed int
				445	giving the number of bytes in the string, and the second
				446	argument -- the UTF-8 encoding of the Unicode string --
				447	contains that many bytes.
				448	""")
				449
				450
				451	def read_decimalnl_short(f):
				452	"""
				453	>>> import StringIO
				454	>>> read_decimalnl_short(StringIO.StringIO("1234\\n56"))
				455	1234
				456
				457	>>> read_decimalnl_short(StringIO.StringIO("1234L\\n56"))
				458	Traceback (most recent call last):
				459	...
				460	ValueError: trailing 'L' not allowed in '1234L'
				461	"""
				462
				463	s = read_stringnl(f, decode=False, stripquotes=False)
				464	if s.endswith("L"):
				465	raise ValueError("trailing 'L' not allowed in %r" % s)
				466
				467	# It's not necessarily true that the result fits in a Python short int:
				468	# the pickle may have been written on a 64-bit box. There's also a hack
				469	# for True and False here.
				470	if s == "00":
				471	return False
				472	elif s == "01":
				473	return True
				474
				475	try:
				476	return int(s)
				477	except OverflowError:
				478	return long(s)
				479
				480	def read_decimalnl_long(f):
				481	"""
				482	>>> import StringIO
				483
				484	>>> read_decimalnl_long(StringIO.StringIO("1234\\n56"))
				485	Traceback (most recent call last):
				486	...
				487	ValueError: trailing 'L' required in '1234'
				488
				489	Someday the trailing 'L' will probably go away from this output.
				490
				491	>>> read_decimalnl_long(StringIO.StringIO("1234L\\n56"))
				492	1234L
				493
				494	>>> read_decimalnl_long(StringIO.StringIO("123456789012345678901234L\\n6"))
				495	123456789012345678901234L
				496	"""
				497
				498	s = read_stringnl(f, decode=False, stripquotes=False)
				499	if not s.endswith("L"):
				500	raise ValueError("trailing 'L' required in %r" % s)
				501	return long(s)
				502
				503
				504	decimalnl_short = ArgumentDescriptor(
				505	name='decimalnl_short',
				506	n=UP_TO_NEWLINE,
				507	reader=read_decimalnl_short,
				508	doc="""A newline-terminated decimal integer literal.
				509
				510	This never has a trailing 'L', and the integer fit
				511	in a short Python int on the box where the pickle
				512	was written -- but there's no guarantee it will fit
				513	in a short Python int on the box where the pickle
				514	is read.
				515	""")
				516
				517	decimalnl_long = ArgumentDescriptor(
				518	name='decimalnl_long',
				519	n=UP_TO_NEWLINE,
				520	reader=read_decimalnl_long,
				521	doc="""A newline-terminated decimal integer literal.
				522
				523	This has a trailing 'L', and can represent integers
				524	of any size.
				525	""")
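
The "00"/"01" hack for bools and the trailing-'L' rule can be checked against hand-written protocol 0 pickles. This is a small sketch assuming Python 3's pickle module, which still accepts these old spellings; the byte strings are written by hand for illustration.

```python
import pickle

# INT opcode ('I') followed by a newline-terminated decimal literal, then STOP ('.').
true_value = pickle.loads(b"I01\n.")   # "01" is the spelling of True
false_value = pickle.loads(b"I00\n.")  # "00" is the spelling of False
one_value = pickle.loads(b"I1\n.")     # an ordinary integer

# LONG opcode ('L') with the trailing 'L' described for decimalnl_long.
long_value = pickle.loads(b"L1234L\n.")
```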
				526
				527
				528	def read_floatnl(f):
				529	"""
				530	>>> import StringIO
				531	>>> read_floatnl(StringIO.StringIO("-1.25\\n6"))
				532	-1.25
				533	"""
				534	s = read_stringnl(f, decode=False, stripquotes=False)
				535	return float(s)
				536
				537	floatnl = ArgumentDescriptor(
				538	name='floatnl',
				539	n=UP_TO_NEWLINE,
				540	reader=read_floatnl,
				541	doc="""A newline-terminated decimal floating literal.
				542
				543	In general this requires 17 significant digits for roundtrip
				544	identity, and pickling then unpickling infinities, NaNs, and
				545	minus zero doesn't work across boxes, or on some boxes even
				546	on itself (e.g., Windows can't read the strings it produces
				547	for infinities or NaNs).
				548	""")
				549
				550	def read_float8(f):
				551	"""
				552	>>> import StringIO, struct
				553	>>> raw = struct.pack(">d", -1.25)
				554	>>> raw
				555	'\\xbf\\xf4\\x00\\x00\\x00\\x00\\x00\\x00'
				556	>>> read_float8(StringIO.StringIO(raw + "\\n"))
				557	-1.25
				558	"""
				559
				560	data = f.read(8)
				561	if len(data) == 8:
				562	return _unpack(">d", data)[0]
				563	raise ValueError("not enough data in stream to read float8")
				564
				565
				566	float8 = ArgumentDescriptor(
				567	name='float8',
				568	n=8,
				569	reader=read_float8,
				570	doc="""An 8-byte binary representation of a float, big-endian.
				571
				572	The format is unique to Python, and shared with the struct
				573	module (format string '>d') "in theory" (the struct and cPickle
				574	implementations don't share the code -- they should). It's
				575	strongly related to the IEEE-754 double format, and, in normal
				576	cases, is in fact identical to the big-endian 754 double format.
				577	On other boxes the dynamic range is limited to that of a 754
				578	double, and "add a half and chop" rounding is used to reduce
				579	the precision to 53 bits. However, even on a 754 box,
				580	infinities, NaNs, and minus zero may not be handled correctly
				581	(may not survive roundtrip pickling intact).
				582	""")
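
The float8 layout can be reproduced with the struct module. The sketch below assumes Python 3 on an IEEE-754 platform and shows that the BINFLOAT opcode ('G') in a protocol 1 pickle is followed by the same eight big-endian bytes.

```python
import pickle
import struct

raw = struct.pack(">d", -1.25)          # big-endian IEEE-754 double
value = struct.unpack(">d", raw)[0]     # round trip back to a float

# A protocol 1 pickle of a float: BINFLOAT opcode, 8 argument bytes, STOP.
binfloat_pickle = pickle.dumps(-1.25, protocol=1)
```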
				583
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	584	# Protocol 2 formats
				585
				586	def decode_long(data):
				587	r"""Decode a long from a two's complement little-endian binary string.
				588	>>> decode_long("\xff\x00")
				589	255L
				590	>>> decode_long("\xff\x7f")
				591	32767L
				592	>>> decode_long("\x00\xff")
				593	-256L
				594	>>> decode_long("\x00\x80")
				595	-32768L
Tim Peters	217e571	2003-01-27 23:51:11 +0000	[diff] [blame]	596	>>> decode_long("\x80")
				597	-128L
				598	>>> decode_long("\x7f")
				599	127L
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	600	"""
				601	x = 0L
				602	i = 0L
				603	for c in data:
				604	x |= long(ord(c)) << i
				605	i += 8L
Tim Peters	217e571	2003-01-27 23:51:11 +0000	[diff] [blame]	606	if data and ord(c) >= 0x80:
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	607	x -= 1L << i
				608	return x
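
For comparison, the same decoding in Python 3 is a one-liner, because ints are unbounded and `int.from_bytes` understands two's complement directly. This is a sketch, not the module's code; `decode_long_py3` is a name introduced here.

```python
def decode_long_py3(data):
    """Decode a little-endian two's-complement byte string into an int."""
    # An empty byte string decodes to 0, matching the loop-based version above.
    return int.from_bytes(data, byteorder="little", signed=True)


examples = {
    b"\xff\x00": decode_long_py3(b"\xff\x00"),
    b"\x00\xff": decode_long_py3(b"\x00\xff"),
    b"\x80": decode_long_py3(b"\x80"),
}
```

The high bit of the last (most significant) byte carries the sign, which is why `b"\xff\x00"` needs its trailing zero byte to stay positive.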
				609
				610	def read_long1(f):
				611	r"""
				612	>>> import StringIO
				613	>>> read_long1(StringIO.StringIO("\x02\xff\x00"))
				614	255L
				615	>>> read_long1(StringIO.StringIO("\x02\xff\x7f"))
				616	32767L
				617	>>> read_long1(StringIO.StringIO("\x02\x00\xff"))
				618	-256L
				619	>>> read_long1(StringIO.StringIO("\x02\x00\x80"))
				620	-32768L
Tim Peters	5eed340	2003-01-27 23:51:36 +0000	[diff] [blame]	621	>>>
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	622	"""
				623
				624	n = read_uint1(f)
				625	data = f.read(n)
				626	if len(data) != n:
				627	raise ValueError("not enough data in stream to read long1")
				628	return decode_long(data)
				629
				630	long1 = ArgumentDescriptor(
				631	name="long1",
Tim Peters	fdb8cfa	2003-01-28 00:13:19 +0000	[diff] [blame]	632	n=TAKEN_FROM_ARGUMENT1,
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	633	reader=read_long1,
				634	doc="""A binary long, little-endian, using 1-byte size.
				635
				636	This first reads one byte as an unsigned size, then reads that
Tim Peters	bdbe741	2003-01-27 23:54:04 +0000	[diff] [blame]	637	many bytes and interprets them as a little-endian 2's-complement long.
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	638	""")
				639
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	640	def read_long4(f):
				641	r"""
				642	>>> import StringIO
				643	>>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\xff\x00"))
				644	255L
				645	>>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\xff\x7f"))
				646	32767L
				647	>>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\x00\xff"))
				648	-256L
				649	>>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\x00\x80"))
				650	-32768L
Tim Peters	5eed340	2003-01-27 23:51:36 +0000	[diff] [blame]	651	>>>
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	652	"""
				653
				654	n = read_int4(f)
				655	if n < 0:
Neal Norwitz	784a3f5	2003-01-28 00:20:41 +0000	[diff] [blame^]	656	raise ValueError("long4 byte count < 0: %d" % n)
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	657	data = f.read(n)
				658	if len(data) != n:
Neal Norwitz	784a3f5	2003-01-28 00:20:41 +0000	[diff] [blame^]	659	raise ValueError("not enough data in stream to read long4")
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	660	return decode_long(data)
				661
				662	long4 = ArgumentDescriptor(
				663	name="long4",
Tim Peters	fdb8cfa	2003-01-28 00:13:19 +0000	[diff] [blame]	664	n=TAKEN_FROM_ARGUMENT4,
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	665	reader=read_long4,
				666	doc="""A binary representation of a long, little-endian.
				667
				668	This first reads four bytes as a signed size (but requires the
				669	size to be >= 0), then reads that many bytes and interprets them
Tim Peters	bdbe741	2003-01-27 23:54:04 +0000	[diff] [blame]	670	as a little-endian 2's-complement long.
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	671	""")
				672
				673
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	674	##############################################################################
				675	# Object descriptors. The stack used by the pickle machine holds objects,
				676	# and in the stack_before and stack_after attributes of OpcodeInfo
				677	# descriptors we need names to describe the various types of objects that can
				678	# appear on the stack.
				679
				680	class StackObject(object):
				681	__slots__ = (
				682	# name of descriptor record, for info only
				683	'name',
				684
				685	# type of object, or tuple of type objects (meaning the object can
				686	# be of any type in the tuple)
				687	'obtype',
				688
				689	# human-readable docs for this kind of stack object; a string
				690	'doc',
				691	)
				692
				693	def __init__(self, name, obtype, doc):
				694	assert isinstance(name, str)
				695	self.name = name
				696
				697	assert isinstance(obtype, type) or isinstance(obtype, tuple)
				698	if isinstance(obtype, tuple):
				699	for contained in obtype:
				700	assert isinstance(contained, type)
				701	self.obtype = obtype
				702
				703	assert isinstance(doc, str)
				704	self.doc = doc
				705
				706
				707	pyint = StackObject(
				708	name='int',
				709	obtype=int,
				710	doc="A short (as opposed to long) Python integer object.")
				711
				712	pylong = StackObject(
				713	name='long',
				714	obtype=long,
				715	doc="A long (as opposed to short) Python integer object.")
				716
				717	pyinteger_or_bool = StackObject(
				718	name='int_or_bool',
				719	obtype=(int, long, bool),
				720	doc="A Python integer object (short or long), or "
				721	"a Python bool.")
				722
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	723	pybool = StackObject(
				724	name='bool',
				725	obtype=(bool,),
				726	doc="A Python bool object.")
				727
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	728	pyfloat = StackObject(
				729	name='float',
				730	obtype=float,
				731	doc="A Python float object.")
				732
				733	pystring = StackObject(
				734	name='str',
				735	obtype=str,
				736	doc="A Python string object.")
				737
				738	pyunicode = StackObject(
				739	name='unicode',
				740	obtype=unicode,
				741	doc="A Python Unicode string object.")
				742
				743	pynone = StackObject(
				744	name="None",
				745	obtype=type(None),
				746	doc="The Python None object.")
				747
				748	pytuple = StackObject(
				749	name="tuple",
				750	obtype=tuple,
				751	doc="A Python tuple object.")
				752
				753	pylist = StackObject(
				754	name="list",
				755	obtype=list,
				756	doc="A Python list object.")
				757
				758	pydict = StackObject(
				759	name="dict",
				760	obtype=dict,
				761	doc="A Python dict object.")
				762
				763	anyobject = StackObject(
				764	name='any',
				765	obtype=object,
				766	doc="Any kind of object whatsoever.")
				767
				768	markobject = StackObject(
				769	name="mark",
				770	obtype=StackObject,
				771	doc="""'The mark' is a unique object.
				772
				773	Opcodes that operate on a variable number of objects
				774	generally don't embed the count of objects in the opcode,
				775	or pull it off the stack. Instead the MARK opcode is used
				776	to push a special marker object on the stack, and then
				777	some other opcodes grab all the objects from the top of
				778	the stack down to (but not including) the topmost marker
				779	object.
				780	""")
				781
				782	stackslice = StackObject(
				783	name="stackslice",
				784	obtype=StackObject,
				785	doc="""An object representing a contiguous slice of the stack.
				786
				787	This is used in conjunction with markobject, to represent all
				788	of the stack following the topmost markobject. For example,
				789	the POP_MARK opcode changes the stack from
				790
				791	[..., markobject, stackslice]
				792	to
				793	[...]
				794
				795	No matter how many objects are on the stack after the topmost
				796	markobject, POP_MARK gets rid of all of them (including the
				797	topmost markobject too).
				798	""")
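
The markobject/stackslice convention amounts to "pop everything down to the topmost mark". A toy sketch of that stack operation follows; `MARK` and `pop_to_mark` are hypothetical names for illustration and do not reflect how any real unpickler stores its stack.

```python
MARK = object()  # stand-in for the unique mark object


def pop_to_mark(stack):
    """Remove the topmost MARK and everything above it; return the removed slice."""
    for index in range(len(stack) - 1, -1, -1):
        if stack[index] is MARK:
            items = stack[index + 1:]   # the "stackslice", excluding the mark
            del stack[index:]           # drop the mark and the slice
            return items
    raise ValueError("no MARK found on the stack")


stack = ["base", MARK, "inner", MARK, 1, 2, 3]
top_slice = pop_to_mark(stack)
```

Only the topmost mark is consumed, so nested marks lower in the stack remain available to later opcodes.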
				799
				800	##############################################################################
				801	# Descriptors for pickle opcodes.
				802
				803	class OpcodeInfo(object):
				804
				805	__slots__ = (
				806	# symbolic name of opcode; a string
				807	'name',
				808
				809	# the code used in a bytestream to represent the opcode; a
				810	# one-character string
				811	'code',
				812
				813	# If the opcode has an argument embedded in the byte string, an
				814	# instance of ArgumentDescriptor specifying its type. Note that
				815	# arg.reader(s) can be used to read and decode the argument from
				816	# the bytestream s, and arg.doc documents the format of the raw
				817	# argument bytes. If the opcode doesn't have an argument embedded
				818	# in the bytestream, arg should be None.
				819	'arg',
				820
				821	# what the stack looks like before this opcode runs; a list
				822	'stack_before',
				823
				824	# what the stack looks like after this opcode runs; a list
				825	'stack_after',
				826
				827	# the protocol number in which this opcode was introduced; an int
				828	'proto',
				829
				830	# human-readable docs for this opcode; a string
				831	'doc',
				832	)
				833
				834	def __init__(self, name, code, arg,
				835	stack_before, stack_after, proto, doc):
				836	assert isinstance(name, str)
				837	self.name = name
				838
				839	assert isinstance(code, str)
				840	assert len(code) == 1
				841	self.code = code
				842
				843	assert arg is None or isinstance(arg, ArgumentDescriptor)
				844	self.arg = arg
				845
				846	assert isinstance(stack_before, list)
				847	for x in stack_before:
				848	assert isinstance(x, StackObject)
				849	self.stack_before = stack_before
				850
				851	assert isinstance(stack_after, list)
				852	for x in stack_after:
				853	assert isinstance(x, StackObject)
				854	self.stack_after = stack_after
				855
				856	assert isinstance(proto, int) and 0 <= proto <= 2
				857	self.proto = proto
				858
				859	assert isinstance(doc, str)
				860	self.doc = doc
				861
				862	I = OpcodeInfo
				863	opcodes = [
				864
				865	# Ways to spell integers.
				866
				867	I(name='INT',
				868	code='I',
				869	arg=decimalnl_short,
				870	stack_before=[],
				871	stack_after=[pyinteger_or_bool],
				872	proto=0,
				873	doc="""Push an integer or bool.
				874
				875	The argument is a newline-terminated decimal literal string.
				876
				877	The intent may have been that this always fit in a short Python int,
				878	but INT can be generated in pickles written on a 64-bit box that
				879	require a Python long on a 32-bit box. The difference between this
				880	and LONG then is that INT skips a trailing 'L', and produces a short
				881	int whenever possible.
				882
				883	Another difference arises because, when bool was introduced as a
				884	distinct type in 2.3, builtin names True and False were also added to
				885	2.2.2, mapping to ints 1 and 0. For compatibility in both directions,
				886	True gets pickled as INT + "01\\n", and False as INT + "00\\n".
				887	Leading zeroes are never produced for a genuine integer. The 2.3
				888	(and later) unpicklers special-case these and return bool instead;
				889	earlier unpicklers ignore the leading "0" and return the int.
				890	"""),
				891
				892	I(name='LONG',
				893	code='L',
				894	arg=decimalnl_long,
				895	stack_before=[],
				896	stack_after=[pylong],
				897	proto=0,
				898	doc="""Push a long integer.
				899
				900	The same as INT, except that the literal ends with 'L', and always
				901	unpickles to a Python long. There doesn't seem to be a real purpose to
				902	the trailing 'L'.
				903	"""),
				904
				905	I(name='BININT',
				906	code='J',
				907	arg=int4,
				908	stack_before=[],
				909	stack_after=[pyint],
				910	proto=1,
				911	doc="""Push a four-byte signed integer.
				912
				913	This handles the full range of Python (short) integers on a 32-bit
				914	box, directly as binary bytes (1 for the opcode and 4 for the integer).
				915	If the integer is non-negative and fits in 1 or 2 bytes, pickling via
				916	BININT1 or BININT2 saves space.
				917	"""),
				918
				919	I(name='BININT1',
				920	code='K',
				921	arg=uint1,
				922	stack_before=[],
				923	stack_after=[pyint],
				924	proto=1,
				925	doc="""Push a one-byte unsigned integer.
				926
				927	This is a space optimization for pickling very small non-negative ints,
				928	in range(256).
				929	"""),
				930
				931	I(name='BININT2',
				932	code='M',
				933	arg=uint2,
				934	stack_before=[],
				935	stack_after=[pyint],
				936	proto=1,
				937	doc="""Push a two-byte unsigned integer.
				938
				939	This is a space optimization for pickling small positive ints, in
				940	range(256, 2**16). Integers in range(256) can also be pickled via
				941	BININT2, but BININT1 instead saves a byte.
				942	"""),
				943
				944	# Ways to spell strings (8-bit, not Unicode).
				945
				946	I(name='STRING',
				947	code='S',
				948	arg=stringnl,
				949	stack_before=[],
				950	stack_after=[pystring],
				951	proto=0,
				952	doc="""Push a Python string object.
				953
				954	The argument is a repr-style string, with bracketing quote characters,
				955	and perhaps embedded escapes. The argument extends until the next
				956	newline character.
				957	"""),
				958
				959	I(name='BINSTRING',
				960	code='T',
				961	arg=string4,
				962	stack_before=[],
				963	stack_after=[pystring],
				964	proto=1,
				965	doc="""Push a Python string object.
				966
				967	There are two arguments: the first is a 4-byte little-endian signed int
				968	giving the number of bytes in the string, and the second is that many
				969	bytes, which are taken literally as the string content.
				970	"""),
				971
				972	I(name='SHORT_BINSTRING',
				973	code='U',
				974	arg=string1,
				975	stack_before=[],
				976	stack_after=[pystring],
				977	proto=1,
				978	doc="""Push a Python string object.
				979
				980	There are two arguments: the first is a 1-byte unsigned int giving
				981	the number of bytes in the string, and the second is that many bytes,
				982	which are taken literally as the string content.
				983	"""),
				984
				985	# Ways to spell None.
				986
				987	I(name='NONE',
				988	code='N',
				989	arg=None,
				990	stack_before=[],
				991	stack_after=[pynone],
				992	proto=0,
				993	doc="Push None on the stack."),
				994
				995	# Ways to spell Unicode strings.
				996
				997	I(name='UNICODE',
				998	code='V',
				999	arg=unicodestringnl,
				1000	stack_before=[],
				1001	stack_after=[pyunicode],
				1002	proto=0, # this may be pure-text, but it's a later addition
				1003	doc="""Push a Python Unicode string object.
				1004
				1005	The argument is a raw-unicode-escape encoding of a Unicode string,
				1006	and so may contain embedded escape sequences. The argument extends
				1007	until the next newline character.
				1008	"""),
				1009
				1010	I(name='BINUNICODE',
				1011	code='X',
				1012	arg=unicodestring4,
				1013	stack_before=[],
				1014	stack_after=[pyunicode],
				1015	proto=1,
				1016	doc="""Push a Python Unicode string object.
				1017
				1018	There are two arguments: the first is a 4-byte little-endian signed int
				1019	giving the number of bytes in the string. The second is that many
				1020	bytes, and is the UTF-8 encoding of the Unicode string.
				1021	"""),
				1022
				1023	# Ways to spell floats.
				1024
				1025	I(name='FLOAT',
				1026	code='F',
				1027	arg=floatnl,
				1028	stack_before=[],
				1029	stack_after=[pyfloat],
				1030	proto=0,
				1031	doc="""Newline-terminated decimal float literal.
				1032
				1033	The argument is repr(a_float), and in general requires 17 significant
				1034	digits for roundtrip conversion to be an identity (this is so for
				1035	IEEE-754 double precision values, which is what Python float maps to
				1036	on most boxes).
				1037
				1038	In general, FLOAT cannot be used to transport infinities, NaNs, or
				1039	minus zero across boxes (or even on a single box, if the platform C
				1040	library can't read the strings it produces for such things -- Windows
				1041	is like that), but may do less damage than BINFLOAT on boxes with
				1042	greater precision or dynamic range than IEEE-754 double.
				1043	"""),
				1044
				1045	I(name='BINFLOAT',
				1046	code='G',
				1047	arg=float8,
				1048	stack_before=[],
				1049	stack_after=[pyfloat],
				1050	proto=1,
				1051	doc="""Float stored in binary form, with 8 bytes of data.
				1052
				1053	This generally requires less than half the space of FLOAT encoding.
				1054	In general, BINFLOAT cannot be used to transport infinities, NaNs, or
				1055	minus zero, raises an exception if the exponent exceeds the range of
				1056	an IEEE-754 double, and retains no more than 53 bits of precision (if
				1057	there are more than that, "add a half and chop" rounding is used to
				1058	cut it back to 53 significant bits).
				1059	"""),
				1060
				1061	# Ways to build lists.
				1062
				1063	I(name='EMPTY_LIST',
				1064	code=']',
				1065	arg=None,
				1066	stack_before=[],
				1067	stack_after=[pylist],
				1068	proto=1,
				1069	doc="Push an empty list."),
				1070
				1071	I(name='APPEND',
				1072	code='a',
				1073	arg=None,
				1074	stack_before=[pylist, anyobject],
				1075	stack_after=[pylist],
				1076	proto=0,
				1077	doc="""Append an object to a list.
				1078
				1079	Stack before: ... pylist anyobject
				1080	Stack after: ... pylist+[anyobject]
				1081	"""),
				1082
				1083	I(name='APPENDS',
				1084	code='e',
				1085	arg=None,
				1086	stack_before=[pylist, markobject, stackslice],
				1087	stack_after=[pylist],
				1088	proto=1,
				1089	doc="""Extend a list by a slice of stack objects.
				1090
				1091	Stack before: ... pylist markobject stackslice
				1092	Stack after: ... pylist+stackslice
				1093	"""),
				1094
				1095	I(name='LIST',
				1096	code='l',
				1097	arg=None,
				1098	stack_before=[markobject, stackslice],
				1099	stack_after=[pylist],
				1100	proto=0,
				1101	doc="""Build a list out of the topmost stack slice, after markobject.
				1102
				1103	All the stack entries following the topmost markobject are placed into
				1104	a single Python list, which single list object replaces all of the
				1105	stack from the topmost markobject onward. For example,
				1106
				1107	Stack before: ... markobject 1 2 3 'abc'
				1108	Stack after: ... [1, 2, 3, 'abc']
				1109	"""),
				1110
				1111	# Ways to build tuples.
				1112
				1113	I(name='EMPTY_TUPLE',
				1114	code=')',
				1115	arg=None,
				1116	stack_before=[],
				1117	stack_after=[pytuple],
				1118	proto=1,
				1119	doc="Push an empty tuple."),
				1120
				1121	I(name='TUPLE',
				1122	code='t',
				1123	arg=None,
				1124	stack_before=[markobject, stackslice],
				1125	stack_after=[pytuple],
				1126	proto=0,
				1127	doc="""Build a tuple out of the topmost stack slice, after markobject.
				1128
				1129	All the stack entries following the topmost markobject are placed into
				1130	a single Python tuple, which single tuple object replaces all of the
				1131	stack from the topmost markobject onward. For example,
				1132
				1133	Stack before: ... markobject 1 2 3 'abc'
				1134	Stack after: ... (1, 2, 3, 'abc')
				1135	"""),
				1136
				1137	# Ways to build dicts.
				1138
				1139	I(name='EMPTY_DICT',
				1140	code='}',
				1141	arg=None,
				1142	stack_before=[],
				1143	stack_after=[pydict],
				1144	proto=1,
				1145	doc="Push an empty dict."),
				1146
				1147	I(name='DICT',
				1148	code='d',
				1149	arg=None,
				1150	stack_before=[markobject, stackslice],
				1151	stack_after=[pydict],
				1152	proto=0,
				1153	doc="""Build a dict out of the topmost stack slice, after markobject.
				1154
				1155	All the stack entries following the topmost markobject are placed into
				1156	a single Python dict, which single dict object replaces all of the
				1157	stack from the topmost markobject onward. The stack slice alternates
				1158	key, value, key, value, .... For example,
				1159
				1160	Stack before: ... markobject 1 2 3 'abc'
				1161	Stack after: ... {1: 2, 3: 'abc'}
				1162	"""),
				1163
				1164	I(name='SETITEM',
				1165	code='s',
				1166	arg=None,
				1167	stack_before=[pydict, anyobject, anyobject],
				1168	stack_after=[pydict],
				1169	proto=0,
				1170	doc="""Add a key+value pair to an existing dict.
				1171
				1172	Stack before: ... pydict key value
				1173	Stack after: ... pydict
				1174
				1175	where pydict has been modified via pydict[key] = value.
				1176	"""),
				1177
				1178	I(name='SETITEMS',
				1179	code='u',
				1180	arg=None,
				1181	stack_before=[pydict, markobject, stackslice],
				1182	stack_after=[pydict],
				1183	proto=1,
				1184	doc="""Add an arbitrary number of key+value pairs to an existing dict.
				1185
				1186	The slice of the stack following the topmost markobject is taken as
				1187	an alternating sequence of keys and values, added to the dict
				1188	immediately under the topmost markobject. Everything at and after the
				1189	topmost markobject is popped, leaving the mutated dict at the top
				1190	of the stack.
				1191
				1192	Stack before: ... pydict markobject key_1 value_1 ... key_n value_n
				1193	Stack after: ... pydict
				1194
				1195	where pydict has been modified via pydict[key_i] = value_i for i in
				1196	1, 2, ..., n, and in that order.
				1197	"""),
				1198
				1199	# Stack manipulation.
				1200
				1201	I(name='POP',
				1202	code='0',
				1203	arg=None,
				1204	stack_before=[anyobject],
				1205	stack_after=[],
				1206	proto=0,
				1207	doc="Discard the top stack item, shrinking the stack by one item."),
				1208
				1209	I(name='DUP',
				1210	code='2',
				1211	arg=None,
				1212	stack_before=[anyobject],
				1213	stack_after=[anyobject, anyobject],
				1214	proto=0,
				1215	doc="Push the top stack item onto the stack again, duplicating it."),
				1216
				1217	I(name='MARK',
				1218	code='(',
				1219	arg=None,
				1220	stack_before=[],
				1221	stack_after=[markobject],
				1222	proto=0,
				1223	doc="""Push markobject onto the stack.
				1224
				1225	markobject is a unique object, used by other opcodes to identify a
				1226	region of the stack containing a variable number of objects for them
				1227	to work on. See markobject.doc for more detail.
				1228	"""),
				1229
				1230	I(name='POP_MARK',
				1231	code='1',
				1232	arg=None,
				1233	stack_before=[markobject, stackslice],
				1234	stack_after=[],
				1235	proto=0,
				1236	doc="""Pop all the stack objects at and above the topmost markobject.
				1237
				1238	When an opcode using a variable number of stack objects is done,
				1239	POP_MARK is used to remove those objects, and to remove the markobject
				1240	that delimited their starting position on the stack.
				1241	"""),
				1242
				1243	# Memo manipulation. There are really only two operations (get and put),
				1244	# each in all-text, "short binary", and "long binary" flavors.
				1245
				1246	I(name='GET',
				1247	code='g',
				1248	arg=decimalnl_short,
				1249	stack_before=[],
				1250	stack_after=[anyobject],
				1251	proto=0,
				1252	doc="""Read an object from the memo and push it on the stack.
				1253
				1254	The index of the memo object to push is given by the newline-terminated
				1255	decimal string following. BINGET and LONG_BINGET are space-optimized
				1256	versions.
				1257	"""),
				1258
				1259	I(name='BINGET',
				1260	code='h',
				1261	arg=uint1,
				1262	stack_before=[],
				1263	stack_after=[anyobject],
				1264	proto=1,
				1265	doc="""Read an object from the memo and push it on the stack.
				1266
				1267	The index of the memo object to push is given by the 1-byte unsigned
				1268	integer following.
				1269	"""),
				1270
				1271	I(name='LONG_BINGET',
				1272	code='j',
				1273	arg=int4,
				1274	stack_before=[],
				1275	stack_after=[anyobject],
				1276	proto=1,
				1277	doc="""Read an object from the memo and push it on the stack.
				1278
				1279	The index of the memo object to push is given by the 4-byte signed
				1280	little-endian integer following.
				1281	"""),
				1282
				1283	I(name='PUT',
				1284	code='p',
				1285	arg=decimalnl_short,
				1286	stack_before=[],
				1287	stack_after=[],
				1288	proto=0,
				1289	doc="""Store the stack top into the memo. The stack is not popped.
				1290
				1291	The index of the memo location to write into is given by the newline-
				1292	terminated decimal string following. BINPUT and LONG_BINPUT are
				1293	space-optimized versions.
				1294	"""),
				1295
				1296	I(name='BINPUT',
				1297	code='q',
				1298	arg=uint1,
				1299	stack_before=[],
				1300	stack_after=[],
				1301	proto=1,
				1302	doc="""Store the stack top into the memo. The stack is not popped.
				1303
				1304	The index of the memo location to write into is given by the 1-byte
				1305	unsigned integer following.
				1306	"""),
				1307
				1308	I(name='LONG_BINPUT',
				1309	code='r',
				1310	arg=int4,
				1311	stack_before=[],
				1312	stack_after=[],
				1313	proto=1,
				1314	doc="""Store the stack top into the memo. The stack is not popped.
				1315
				1316	The index of the memo location to write into is given by the 4-byte
				1317	signed little-endian integer following.
				1318	"""),
				1319
				1320	# Push a class object, or module function, on the stack, via its module
				1321	# and name.
				1322
				1323	I(name='GLOBAL',
				1324	code='c',
				1325	arg=stringnl_noescape_pair,
				1326	stack_before=[],
				1327	stack_after=[anyobject],
				1328	proto=0,
				1329	doc="""Push a global object (module.attr) on the stack.
				1330
				1331	Two newline-terminated strings follow the GLOBAL opcode. The first is
				1332	taken as a module name, and the second as a class name. The class
				1333	object module.class is pushed on the stack. More accurately, the
				1334	object returned by self.find_class(module, class) is pushed on the
				1335	stack, so unpickling subclasses can override this form of lookup.
				1336	"""),
				1337
				1338	# Ways to build objects of classes pickle doesn't know about directly
				1339	# (user-defined classes). I despair of documenting this accurately
				1340	# and comprehensibly -- you really have to read the pickle code to
				1341	# find all the special cases.
				1342
				1343	I(name='REDUCE',
				1344	code='R',
				1345	arg=None,
				1346	stack_before=[anyobject, anyobject],
				1347	stack_after=[anyobject],
				1348	proto=0,
				1349	doc="""Push an object built from a callable and an argument tuple.
				1350
				1351	The opcode is named to recall the __reduce__() method.
				1352
				1353	Stack before: ... callable pytuple
				1354	Stack after: ... callable(*pytuple)
				1355
				1356	The callable and the argument tuple are the first two items returned
				1357	by a __reduce__ method. Applying the callable to the argtuple is
				1358	supposed to reproduce the original object, or at least get it started.
				1359	If the __reduce__ method returns a 3-tuple, the last component is an
				1360	argument to be passed to the object's __setstate__, and then the REDUCE
				1361	opcode is followed by code to create setstate's argument, and then a
				1362	BUILD opcode to apply __setstate__ to that argument.
				1363
				1364	There are lots of special cases here. The argtuple can be None, in
				1365	which case callable.__basicnew__() is called instead to produce the
				1366	object to be pushed on the stack. This appears to be a trick unique
				1367	to ExtensionClasses, and is deprecated regardless.
				1368
				1369	If type(callable) is not ClassType, REDUCE complains unless the
				1370	callable has been registered with the copy_reg module's
				1371	safe_constructors dict, or the callable has a magic
				1372	'__safe_for_unpickling__' attribute with a true value. I'm not sure
				1373	why it does this, but I've sure seen this complaint often enough when
				1374	I didn't want to <wink>.
				1375	"""),
				1376
				1377	I(name='BUILD',
				1378	code='b',
				1379	arg=None,
				1380	stack_before=[anyobject, anyobject],
				1381	stack_after=[anyobject],
				1382	proto=0,
				1383	doc="""Finish building an object, via __setstate__ or dict update.
				1384
				1385	Stack before: ... anyobject argument
				1386	Stack after: ... anyobject
				1387
				1388	where anyobject may have been mutated, as follows:
				1389
				1390	If the object has a __setstate__ method,
				1391
				1392	anyobject.__setstate__(argument)
				1393
				1394	is called.
				1395
				1396	Else the argument must be a dict, the object must have a __dict__, and
				1397	the object is updated via
				1398
				1399	anyobject.__dict__.update(argument)
				1400
				1401	This may raise RuntimeError in restricted execution mode (which
				1402	disallows access to __dict__ directly); in that case, the object
				1403	is updated instead via
				1404
				1405	for k, v in argument.items():
				1406	setattr(anyobject, k, v)
				1407	"""),
				1408
				1409	I(name='INST',
				1410	code='i',
				1411	arg=stringnl_noescape_pair,
				1412	stack_before=[markobject, stackslice],
				1413	stack_after=[anyobject],
				1414	proto=0,
				1415	doc="""Build a class instance.
				1416
				1417	This is the protocol 0 version of protocol 1's OBJ opcode.
				1418	INST is followed by two newline-terminated strings, giving a
				1419	module and class name, just as for the GLOBAL opcode (and see
				1420	GLOBAL for more details about that). self.find_class(module, name)
				1421	is used to get a class object.
				1422
				1423	In addition, all the objects on the stack following the topmost
				1424	markobject are gathered into a tuple and popped (along with the
				1425	topmost markobject), just as for the TUPLE opcode.
				1426
				1427	Now it gets complicated. If all of these are true:
				1428
				1429	+ The argtuple is empty (markobject was at the top of the stack
				1430	at the start).
				1431
				1432	+ It's an old-style class object (the type of the class object is
				1433	ClassType).
				1434
				1435	+ The class object does not have a __getinitargs__ attribute.
				1436
				1437	then we want to create an old-style class instance without invoking
				1438	its __init__() method (pickle has waffled on this over the years; not
				1439	calling __init__() is current wisdom). In this case, an instance of
				1440	an old-style dummy class is created, and then we try to rebind its
				1441	__class__ attribute to the desired class object. If this succeeds,
				1442	the new instance object is pushed on the stack, and we're done. In
				1443	restricted execution mode it can fail (assignment to __class__ is
				1444	disallowed), and I'm not really sure what happens then -- it looks
				1445	like the code ends up calling the class object's __init__ anyway,
				1446	via falling into the next case.
				1447
				1448	Else (the argtuple is not empty, it's not an old-style class object,
				1449	or the class object does have a __getinitargs__ attribute), the code
				1450	first insists that the class object have a __safe_for_unpickling__
				1451	attribute. Unlike the __safe_for_unpickling__ check in REDUCE,
				1452	it doesn't matter whether this attribute has a true or false value, it
				1453	only matters whether it exists (XXX this smells like a bug). If
				1454	__safe_for_unpickling__ doesn't exist, UnpicklingError is raised.
				1455
				1456	Else (the class object does have a __safe_for_unpickling__ attr),
				1457	the class object obtained from INST's arguments is applied to the
				1458	argtuple obtained from the stack, and the resulting instance object
				1459	is pushed on the stack.
				1460	"""),
				1461
				1462	I(name='OBJ',
				1463	code='o',
				1464	arg=None,
				1465	stack_before=[markobject, anyobject, stackslice],
				1466	stack_after=[anyobject],
				1467	proto=1,
				1468	doc="""Build a class instance.
				1469
				1470	This is the protocol 1 version of protocol 0's INST opcode, and is
				1471	very much like it. The major difference is that the class object
				1472	is taken off the stack, allowing it to be retrieved from the memo
				1473	repeatedly if several instances of the same class are created. This
				1474	can be much more efficient (in both time and space) than repeatedly
				1475	embedding the module and class names in INST opcodes.
				1476
				1477	Unlike INST, OBJ takes no arguments from the opcode stream. Instead
				1478	the class object is taken off the stack, immediately above the
				1479	topmost markobject:
				1480
				1481	Stack before: ... markobject classobject stackslice
				1482	Stack after: ... new_instance_object
				1483
				1484	As for INST, the remainder of the stack above the markobject is
				1485	gathered into an argument tuple, and then the logic seems identical,
				1486	except that no __safe_for_unpickling__ check is done (XXX this smells
				1487	like a bug). See INST for the gory details.
				1488	"""),
				1489
				1490	# Machine control.
				1491
				1492	I(name='STOP',
				1493	code='.',
				1494	arg=None,
				1495	stack_before=[anyobject],
				1496	stack_after=[],
				1497	proto=0,
				1498	doc="""Stop the unpickling machine.
				1499
				1500	Every pickle ends with this opcode. The object at the top of the stack
				1501	is popped, and that's the result of unpickling. The stack should be
				1502	empty then.
				1503	"""),
				1504
				1505	# Ways to deal with persistent IDs.
				1506
				1507	I(name='PERSID',
				1508	code='P',
				1509	arg=stringnl_noescape,
				1510	stack_before=[],
				1511	stack_after=[anyobject],
				1512	proto=0,
				1513	doc="""Push an object identified by a persistent ID.
				1514
				1515	The pickle module doesn't define what a persistent ID means. PERSID's
				1516	argument is a newline-terminated str-style (no embedded escapes, no
				1517	bracketing quote characters) string, which is "the persistent ID".
				1518	The unpickler passes this string to self.persistent_load(). Whatever
				1519	object that returns is pushed on the stack. There is no implementation
				1520	of persistent_load() in Python's unpickler: it must be supplied by an
				1521	unpickler subclass.
				1522	"""),
				1523
				1524	I(name='BINPERSID',
				1525	code='Q',
				1526	arg=None,
				1527	stack_before=[anyobject],
				1528	stack_after=[anyobject],
				1529	proto=1,
				1530	doc="""Push an object identified by a persistent ID.
				1531
				1532	Like PERSID, except the persistent ID is popped off the stack (instead
				1533	of being a string embedded in the opcode bytestream). The persistent
				1534	ID is passed to self.persistent_load(), and whatever object that
				1535	returns is pushed on the stack. See PERSID for more detail.
				1536	"""),
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	1537
				1538	# Protocol 2 opcodes
				1539
				1540	I(name='PROTO',
				1541	code='\x80',
				1542	arg=uint1,
				1543	stack_before=[],
				1544	stack_after=[],
				1545	proto=2,
				1546	doc="""Protocol version indicator.
				1547
				1548	For protocol 2 and above, a pickle must start with this opcode.
				1549	The argument is the protocol version, an int in range(2, 256).
				1550	"""),
				1551
				1552	I(name='NEWOBJ',
				1553	code='\x81',
				1554	arg=None,
				1555	stack_before=[anyobject, anyobject],
				1556	stack_after=[anyobject],
				1557	proto=2,
				1558	doc="""Build an object instance.
				1559
				1560	The stack before should be thought of as containing a class
				1561	object followed by an argument tuple (the tuple being the stack
				1562	top). Call these cls and args. They are popped off the stack,
				1563	and the value returned by cls.__new__(cls, *args) is pushed back
				1564	onto the stack.
				1565	"""),
				1566
				1567	I(name='EXT1',
				1568	code='\x82',
				1569	arg=uint1,
				1570	stack_before=[],
				1571	stack_after=[anyobject],
				1572	proto=2,
				1573	doc="""Extension code.
				1574
				1575	This code and the similar EXT2 and EXT4 allow using a registry
				1576	of popular objects that are pickled by name, typically classes.
				1577	It is envisioned that through a global negotiation and
				1578	registration process, third parties can set up a mapping between
				1579	ints and object names.
				1580
				1581	In order to guarantee pickle interchangeability, the extension
				1582	code registry ought to be global, although a range of codes may
				1583	be reserved for private use.
				1584	"""),
				1585
				1586	I(name='EXT2',
				1587	code='\x83',
				1588	arg=uint2,
				1589	stack_before=[],
				1590	stack_after=[anyobject],
				1591	proto=2,
				1592	doc="""Extension code.
				1593
				1594	See EXT1.
				1595	"""),
				1596
				1597	I(name='EXT4',
				1598	code='\x84',
				1599	arg=int4,
				1600	stack_before=[],
				1601	stack_after=[anyobject],
				1602	proto=2,
				1603	doc="""Extension code.
				1604
				1605	See EXT1.
				1606	"""),
				1607
				1608	I(name='TUPLE1',
				1609	code='\x85',
				1610	arg=None,
				1611	stack_before=[anyobject],
				1612	stack_after=[pytuple],
				1613	proto=2,
				1614	doc="""One-tuple.
				1615
				1616	This code pops one value off the stack and pushes a tuple of
				1617	length 1 whose one item is that value back onto it. IOW:
				1618
				1619	stack[-1] = tuple(stack[-1:])
				1620	"""),
				1621
				1622	I(name='TUPLE2',
				1623	code='\x86',
				1624	arg=None,
				1625	stack_before=[anyobject, anyobject],
				1626	stack_after=[pytuple],
				1627	proto=2,
				1628	doc="""Two-tuple.
				1629
				1630	This code pops two values off the stack and pushes a tuple
				1631	of length 2 whose items are those values back onto it. IOW:
				1632
				1633	stack[-2:] = [tuple(stack[-2:])]
				1634	"""),
				1635
				1636	I(name='TUPLE3',
				1637	code='\x87',
				1638	arg=None,
				1639	stack_before=[anyobject, anyobject, anyobject],
				1640	stack_after=[pytuple],
				1641	proto=2,
				1642	doc="""Three-tuple.
				1643
				1644	This code pops three values off the stack and pushes a tuple
				1645	of length 3 whose items are those values back onto it. IOW:
				1646
				1647	stack[-3:] = [tuple(stack[-3:])]
				1648	"""),
				1649
				1650	I(name='NEWTRUE',
				1651	code='\x88',
				1652	arg=None,
				1653	stack_before=[],
				1654	stack_after=[pybool],
				1655	proto=2,
				1656	doc="""True.
				1657
				1658	Push True onto the stack."""),
				1659
				1660	I(name='NEWFALSE',
				1661	code='\x89',
				1662	arg=None,
				1663	stack_before=[],
				1664	stack_after=[pybool],
				1665	proto=2,
				1666	doc="""False.
				1667
				1668	Push False onto the stack."""),
				1669
				1670	I(name="LONG1",
				1671	code='\x8a',
				1672	arg=long1,
				1673	stack_before=[],
				1674	stack_after=[pylong],
				1675	proto=2,
				1676	doc="""Long integer using one-byte length.
				1677
				1678	A more efficient encoding of a Python long; the long1 encoding
				1679	says it all."""),
				1680
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	1681	I(name="LONG4",
Tim Peters	fdb8cfa	2003-01-28 00:13:19 +0000	[diff] [blame]	1682	code='\x8b',
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	1683	arg=long4,
				1684	stack_before=[],
				1685	stack_after=[pylong],
				1686	proto=2,
				1687	doc="""Long integer using four-byte length.
				1688
				1689	A more efficient encoding of a Python long; the long4 encoding
				1690	says it all."""),
				1691
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1692	]
				1693	del I
				1694
				1695	# Verify uniqueness of .name and .code members.
				1696	name2i = {}
				1697	code2i = {}
				1698
				1699	for i, d in enumerate(opcodes):
				1700	if d.name in name2i:
				1701	raise ValueError("repeated name %r at indices %d and %d" %
				1702	(d.name, name2i[d.name], i))
				1703	if d.code in code2i:
				1704	raise ValueError("repeated code %r at indices %d and %d" %
				1705	(d.code, code2i[d.code], i))
				1706
				1707	name2i[d.name] = i
				1708	code2i[d.code] = i
				1709
				1710	del name2i, code2i, i, d
				1711
				1712	##############################################################################
				1713	# Build a code2op dict, mapping opcode characters to OpcodeInfo records.
				1714	# Also ensure we've got the same stuff as pickle.py, although the
				1715	# introspection here is dicey.
				1716
				1717	code2op = {}
				1718	for d in opcodes:
				1719	code2op[d.code] = d
				1720	del d
				1721
				1722	def assure_pickle_consistency(verbose=False):
				1723	import pickle, re
				1724
				1725	copy = code2op.copy()
				1726	for name in pickle.__all__:
				1727	if not re.match("[A-Z][A-Z0-9_]+$", name):
				1728	if verbose:
				1729	print "skipping %r: it doesn't look like an opcode name" % name
				1730	continue
				1731	picklecode = getattr(pickle, name)
				1732	if not isinstance(picklecode, str) or len(picklecode) != 1:
				1733	if verbose:
				1734	print ("skipping %r: value %r doesn't look like a pickle "
				1735	"code" % (name, picklecode))
				1736	continue
				1737	if picklecode in copy:
				1738	if verbose:
				1739	print "checking name %r w/ code %r for consistency" % (
				1740	name, picklecode)
				1741	d = copy[picklecode]
				1742	if d.name != name:
				1743	raise ValueError("for pickle code %r, pickle.py uses name %r "
				1744	"but we're using name %r" % (picklecode,
				1745	name,
				1746	d.name))
				1747	# Forget this one. Any left over in copy at the end are a problem
				1748	# of a different kind.
				1749	del copy[picklecode]
				1750	else:
				1751	raise ValueError("pickle.py appears to have a pickle opcode with "
				1752	"name %r and code %r, but we don't" %
				1753	(name, picklecode))
				1754	if copy:
				1755	msg = ["we appear to have pickle opcodes that pickle.py doesn't have:"]
				1756	for code, d in copy.items():
				1757	msg.append(" name %r with code %r" % (d.name, code))
				1758	raise ValueError("\n".join(msg))
				1759
				1760	assure_pickle_consistency()
				1761
				1762	##############################################################################
				1763	# A pickle opcode generator.
				1764
				1765	def genops(pickle):
Guido van Rossum	a72ded9	2003-01-27 19:40:47 +0000	[diff] [blame]	1766	"""Generate all the opcodes in a pickle.
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1767
				1768	'pickle' is a file-like object, or string, containing the pickle.
				1769
				1770	Each opcode in the pickle is generated, from the current pickle position,
				1771	stopping after a STOP opcode is delivered. A triple is generated for
				1772	each opcode:
				1773
				1774	opcode, arg, pos
				1775
				1776	opcode is an OpcodeInfo record, describing the current opcode.
				1777
				1778	If the opcode has an argument embedded in the pickle, arg is its decoded
				1779	value, as a Python object. If the opcode doesn't have an argument, arg
				1780	is None.
				1781
				1782	If the pickle has a tell() method, pos was the value of pickle.tell()
				1783	before reading the current opcode. If the pickle is a string object,
				1784	it's wrapped in a StringIO object, and the latter's tell() result is
				1785	used. Else (the pickle doesn't have a tell(), and it's not obvious how
				1786	to query its current position) pos is None.
				1787	"""
				1788
				1789	import cStringIO as StringIO
				1790
				1791	if isinstance(pickle, str):
				1792	pickle = StringIO.StringIO(pickle)
				1793
				1794	if hasattr(pickle, "tell"):
				1795	getpos = pickle.tell
				1796	else:
				1797	getpos = lambda: None
				1798
				1799	while True:
				1800	pos = getpos()
				1801	code = pickle.read(1)
				1802	opcode = code2op.get(code)
				1803	if opcode is None:
				1804	if code == "":
				1805	raise ValueError("pickle exhausted before seeing STOP")
				1806	else:
				1807	raise ValueError("at position %s, opcode %r unknown" % (
				1808	pos is None and "<unknown>" or pos,
				1809	code))
				1810	if opcode.arg is None:
				1811	arg = None
				1812	else:
				1813	arg = opcode.arg.reader(pickle)
				1814	yield opcode, arg, pos
				1815	if code == '.':
				1816	assert opcode.name == 'STOP'
				1817	break
				1818
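# Illustration (not part of the original module). A minimal usage sketch
# for the (opcode, arg, pos) triples that genops() yields, assuming
# Python 3 and the standard-library pickletools module rather than this
# exact revision. opcode_rows is a hypothetical helper name.

```python
import pickle
import pickletools


def opcode_rows(pik):
    """Return (pos, name, arg) for each opcode up to and including STOP."""
    rows = []
    for opcode, arg, pos in pickletools.genops(pik):
        # opcode is an OpcodeInfo record; arg is None when the opcode
        # carries no embedded argument.
        rows.append((pos, opcode.name, arg))
    return rows


if __name__ == "__main__":
    for pos, name, arg in opcode_rows(pickle.dumps([1, 2, (3, 4)], 0)):
        print(pos, name, "" if arg is None else repr(arg))
```

# Because a bytes object is wrapped in a file-like object with tell(),
# pos is an integer offset here; it would be None only for a stream
# without tell().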
				1819	##############################################################################
				1820	# A symbolic pickle disassembler.
				1821
				1822	def dis(pickle, out=None, indentlevel=4):
				1823	"""Produce a symbolic disassembly of a pickle.
				1824
				1825	'pickle' is a file-like object, or string, containing at least one
				1826	pickle. The pickle is disassembled from the current position, through
				1827	the first STOP opcode encountered.
				1828
				1829	Optional arg 'out' is a file-like object to which the disassembly is
				1830	printed. It defaults to sys.stdout.
				1831
				1832	Optional arg indentlevel is the number of blanks by which to indent
				1833	a new MARK level. It defaults to 4.
				1834	"""
				1835
				1836	markstack = []
				1837	indentchunk = ' ' * indentlevel
				1838	for opcode, arg, pos in genops(pickle):
				1839	if pos is not None:
				1840	print >> out, "%5d:" % pos,
				1841
				1842	line = "%s %s%s" % (opcode.code,
				1843	indentchunk * len(markstack),
				1844	opcode.name)
				1845
				1846	markmsg = None
				1847	if markstack and markobject in opcode.stack_before:
				1848	assert markobject not in opcode.stack_after
				1849	markpos = markstack.pop()
				1850	if markpos is not None:
				1851	markmsg = "(MARK at %d)" % markpos
				1852
				1853	if arg is not None or markmsg:
				1854	# make a mild effort to align arguments
				1855	line += ' ' * (10 - len(opcode.name))
				1856	if arg is not None:
				1857	line += ' ' + repr(arg)
				1858	if markmsg:
				1859	line += ' ' + markmsg
				1860	print >> out, line
				1861
				1862	if markobject in opcode.stack_after:
				1863	assert markobject not in opcode.stack_before
				1864	markstack.append(pos)
				1865
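# Illustration (not part of the original module). The mark-stack
# bookkeeping in dis() above can be sketched on its own, assuming Python 3
# and the standard-library pickletools module. The depth reported for an
# opcode is the number of open MARKs before that opcode pops or pushes
# one, which is the indentation dis() uses. mark_depths is a hypothetical
# helper name.

```python
import pickletools


def mark_depths(pik):
    """Return (pos, name, depth) for each opcode in the first pickle."""
    markstack = []
    rows = []
    for opcode, arg, pos in pickletools.genops(pik):
        # Indentation is decided before the opcode touches the mark stack.
        depth = len(markstack)
        # An opcode that consumes a markobject closes the innermost MARK.
        if markstack and pickletools.markobject in opcode.stack_before:
            markstack.pop()
        # An opcode that leaves a markobject behind opens a new level.
        if pickletools.markobject in opcode.stack_after:
            markstack.append(pos)
        rows.append((pos, opcode.name, depth))
    return rows
```

# Like dis(), this sketch only inspects stack_before and stack_after; it
# does not emulate the stack, so a plain POP that happens to remove a
# MARK is not noticed.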
				1866
				1867	_dis_test = """
				1868	>>> import pickle
				1869	>>> x = [1, 2, (3, 4), {'abc': u"def"}]
Guido van Rossum	f29d3d6	2003-01-27 22:47:53 +0000	[diff] [blame]	1870	>>> pik = pickle.dumps(x, 0)
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1871	>>> dis(pik)
				1872	0: ( MARK
				1873	1: l LIST (MARK at 0)
				1874	2: p PUT 0
				1875	5: I INT 1
				1876	8: a APPEND
				1877	9: I INT 2
				1878	12: a APPEND
				1879	13: ( MARK
				1880	14: I INT 3
				1881	17: I INT 4
				1882	20: t TUPLE (MARK at 13)
				1883	21: p PUT 1
				1884	24: a APPEND
				1885	25: ( MARK
				1886	26: d DICT (MARK at 25)
				1887	27: p PUT 2
				1888	30: S STRING 'abc'
				1889	37: p PUT 3
				1890	40: V UNICODE u'def'
				1891	45: p PUT 4
				1892	48: s SETITEM
				1893	49: a APPEND
				1894	50: . STOP
				1895
				1896	Try again with a "binary" pickle.
				1897
				1898	>>> pik = pickle.dumps(x, 1)
				1899	>>> dis(pik)
				1900	0: ] EMPTY_LIST
				1901	1: q BINPUT 0
				1902	3: ( MARK
				1903	4: K BININT1 1
				1904	6: K BININT1 2
				1905	8: ( MARK
				1906	9: K BININT1 3
				1907	11: K BININT1 4
				1908	13: t TUPLE (MARK at 8)
				1909	14: q BINPUT 1
				1910	16: } EMPTY_DICT
				1911	17: q BINPUT 2
				1912	19: U SHORT_BINSTRING 'abc'
				1913	24: q BINPUT 3
				1914	26: X BINUNICODE u'def'
				1915	34: q BINPUT 4
				1916	36: s SETITEM
				1917	37: e APPENDS (MARK at 3)
				1918	38: . STOP
				1919
				1920	Exercise the INST/OBJ/BUILD family.
				1921
				1922	>>> import random
Guido van Rossum	f29d3d6	2003-01-27 22:47:53 +0000	[diff] [blame]	1923	>>> dis(pickle.dumps(random.random, 0))
Tim Peters	d916cf4	2003-01-27 19:01:47 +0000	[diff] [blame]	1924	0: c GLOBAL 'random random'
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1925	15: p PUT 0
				1926	18: . STOP
				1927
				1928	>>> x = [pickle.PicklingError()] * 2
Guido van Rossum	f29d3d6	2003-01-27 22:47:53 +0000	[diff] [blame]	1929	>>> dis(pickle.dumps(x, 0))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1930	0: ( MARK
				1931	1: l LIST (MARK at 0)
				1932	2: p PUT 0
				1933	5: ( MARK
Tim Peters	d916cf4	2003-01-27 19:01:47 +0000	[diff] [blame]	1934	6: i INST 'pickle PicklingError' (MARK at 5)
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1935	28: p PUT 1
				1936	31: ( MARK
				1937	32: d DICT (MARK at 31)
				1938	33: p PUT 2
				1939	36: S STRING 'args'
				1940	44: p PUT 3
				1941	47: ( MARK
				1942	48: t TUPLE (MARK at 47)
				1943	49: p PUT 4
				1944	52: s SETITEM
				1945	53: b BUILD
				1946	54: a APPEND
				1947	55: g GET 1
				1948	58: a APPEND
				1949	59: . STOP
				1950
				1951	>>> dis(pickle.dumps(x, 1))
				1952	0: ] EMPTY_LIST
				1953	1: q BINPUT 0
				1954	3: ( MARK
				1955	4: ( MARK
Tim Peters	d916cf4	2003-01-27 19:01:47 +0000	[diff] [blame]	1956	5: c GLOBAL 'pickle PicklingError'
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1957	27: q BINPUT 1
				1958	29: o OBJ (MARK at 4)
				1959	30: q BINPUT 2
				1960	32: } EMPTY_DICT
				1961	33: q BINPUT 3
				1962	35: U SHORT_BINSTRING 'args'
				1963	41: q BINPUT 4
				1964	43: ) EMPTY_TUPLE
				1965	44: s SETITEM
				1966	45: b BUILD
				1967	46: h BINGET 2
				1968	48: e APPENDS (MARK at 3)
				1969	49: . STOP
				1970
				1971	Try "the canonical" recursive-object test.
				1972
				1973	>>> L = []
				1974	>>> T = L,
				1975	>>> L.append(T)
				1976	>>> L[0] is T
				1977	True
				1978	>>> T[0] is L
				1979	True
				1980	>>> L[0][0] is L
				1981	True
				1982	>>> T[0][0] is T
				1983	True
Guido van Rossum	f29d3d6	2003-01-27 22:47:53 +0000	[diff] [blame]	1984	>>> dis(pickle.dumps(L, 0))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1985	0: ( MARK
				1986	1: l LIST (MARK at 0)
				1987	2: p PUT 0
				1988	5: ( MARK
				1989	6: g GET 0
				1990	9: t TUPLE (MARK at 5)
				1991	10: p PUT 1
				1992	13: a APPEND
				1993	14: . STOP
				1994	>>> dis(pickle.dumps(L, 1))
				1995	0: ] EMPTY_LIST
				1996	1: q BINPUT 0
				1997	3: ( MARK
				1998	4: h BINGET 0
				1999	6: t TUPLE (MARK at 3)
				2000	7: q BINPUT 1
				2001	9: a APPEND
				2002	10: . STOP
				2003
				2004	The protocol 0 pickle of the tuple causes the disassembly to get confused,
				2005	as it doesn't realize that the POP opcode at 16 gets rid of the MARK at 0
				2006	(so the output remains indented until the end). The protocol 1 pickle
				2007	doesn't trigger this glitch, because the disassembler realizes that
				2008	POP_MARK gets rid of the MARK. Doing a better job on the protocol 0
				2009	pickle would require the disassembler to emulate the stack.
				2010
Guido van Rossum	f29d3d6	2003-01-27 22:47:53 +0000	[diff] [blame]	2011	>>> dis(pickle.dumps(T, 0))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	2012	0: ( MARK
				2013	1: ( MARK
				2014	2: l LIST (MARK at 1)
				2015	3: p PUT 0
				2016	6: ( MARK
				2017	7: g GET 0
				2018	10: t TUPLE (MARK at 6)
				2019	11: p PUT 1
				2020	14: a APPEND
				2021	15: 0 POP
				2022	16: 0 POP
				2023	17: g GET 1
				2024	20: . STOP
				2025	>>> dis(pickle.dumps(T, 1))
				2026	0: ( MARK
				2027	1: ] EMPTY_LIST
				2028	2: q BINPUT 0
				2029	4: ( MARK
				2030	5: h BINGET 0
				2031	7: t TUPLE (MARK at 4)
				2032	8: q BINPUT 1
				2033	10: a APPEND
				2034	11: 1 POP_MARK (MARK at 0)
				2035	12: h BINGET 1
				2036	14: . STOP
				2037	"""
				2038
				2039	__test__ = {'disassembler_test': _dis_test,
				2040	}
				2041
				2042	def _test():
				2043	import doctest
				2044	return doctest.testmod()
				2045
				2046	if __name__ == "__main__":
				2047	_test()