Blame - Lib/pickletools.py - platform/external/python/cpython3

Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1	""""Executable documentation" for the pickle module.
				2
				3	Extensive comments about the pickle protocols and pickle-machine opcodes
				4	can be found here. Some functions meant for external use:
				5
				6	genops(pickle)
				7	Generate all the opcodes in a pickle, as (opcode, arg, position) triples.
				8
				9	dis(pickle, out=None, indentlevel=4)
				10	Print a symbolic disassembly of a pickle.
				11	"""
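A minimal usage sketch for the two entry points named in the docstring above. It is written for Python 3 and the modern `pickletools` module, while the listing here is Python 2 era code; the sample value is an arbitrary choice for illustration.

```python
import io
import pickle
import pickletools

# Protocol 0 keeps the byte stream printable, which makes it easy to inspect.
data = pickle.dumps((1, "a"), protocol=0)

# genops yields (OpcodeInfo, argument, byte position) triples.
triples = list(pickletools.genops(data))
opcode_names = [opcode.name for opcode, arg, pos in triples]

# dis writes a symbolic disassembly to any text file-like object.
listing = io.StringIO()
pickletools.dis(data, out=listing)
print(listing.getvalue())
```

Every pickle ends with a STOP opcode, so `opcode_names[-1]` is a quick sanity check on a stream.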
				12
				13	# Other ideas:
				14	#
				15	# - A pickle verifier: read a pickle and check it exhaustively for
				16	# well-formedness.
				17	#
				18	# - A protocol identifier: examine a pickle and return its protocol number
				19	# (== the highest .proto attr value among all the opcodes in the pickle).
				20	#
				21	# - A pickle optimizer: for example, tuple-building code is sometimes more
				22	# elaborate than necessary, catering for the possibility that the tuple
				23	# is recursive. Or lots of times a PUT is generated that's never accessed
				24	# by a later GET.
				25
				26
				27	"""
				28	"A pickle" is a program for a virtual pickle machine (PM, but more accurately
				29	called an unpickling machine). It's a sequence of opcodes, interpreted by the
				30	PM, building an arbitrarily complex Python object.
				31
				32	For the most part, the PM is very simple: there are no looping, testing, or
				33	conditional instructions, no arithmetic and no function calls. Opcodes are
				34	executed once each, from first to last, until a STOP opcode is reached.
				35
				36	The PM has two data areas, "the stack" and "the memo".
				37
				38	Many opcodes push Python objects onto the stack; e.g., INT pushes a Python
				39	integer object on the stack, whose value is gotten from a decimal string
				40	literal immediately following the INT opcode in the pickle bytestream. Other
				41	opcodes take Python objects off the stack. The result of unpickling is
				42	whatever object is left on the stack when the final STOP opcode is executed.
				43
				44	The memo is simply an array of objects, or it can be implemented as a dict
				45	mapping little integers to objects. The memo serves as the PM's "long term
				46	memory", and the little integers indexing the memo are akin to variable
				47	names. Some opcodes pop a stack object into the memo at a given index,
				48	and others push a memo object at a given index onto the stack again.
				49
				50	At heart, that's all the PM has. Subtleties arise for these reasons:
				51
				52	+ Object identity. Objects can be arbitrarily complex, and subobjects
				53	may be shared (for example, the list [a, a] refers to the same object a
				54	twice). It can be vital that unpickling recreate an isomorphic object
				55	graph, faithfully reproducing sharing.
				56
				57	+ Recursive objects. For example, after "L = []; L.append(L)", L is a
				58	list, and L[0] is the same list. This is related to the object identity
				59	point, and some sequences of pickle opcodes are subtle in order to
				60	get the right result in all cases.
				61
				62	+ Things pickle doesn't know everything about. Examples of things pickle
				63	does know everything about are Python's builtin scalar and container
				64	types, like ints and tuples. They generally have opcodes dedicated to
				65	them. For things like module references and instances of user-defined
				66	classes, pickle's knowledge is limited. Historically, many enhancements
				67	have been made to the pickle protocol in order to do a better (faster,
				68	and/or more compact) job on those.
				69
				70	+ Backward compatibility and micro-optimization. As explained below,
				71	pickle opcodes never go away, not even when better ways to do a thing
				72	get invented. The repertoire of the PM just keeps growing over time.
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame^]	73	For example, protocol 0 had two opcodes for building Python integers (INT
				74	and LONG), protocol 1 added three more for more-efficient pickling of short
				75	integers, and protocol 2 added two more for more-efficient pickling of
				76	long integers (before protocol 2, the only ways to pickle a Python long
				77	took time quadratic in the number of digits, for both pickling and
				78	unpickling). "Opcode bloat" isn't so much a subtlety as a source of
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	79	wearying complication.
				80
				81
				82	Pickle protocols:
				83
				84	For compatibility, the meaning of a pickle opcode never changes. Instead new
				85	pickle opcodes get added, and each version's unpickler can handle all the
				86	pickle opcodes in all protocol versions to date. So old pickles continue to
				87	be readable forever. The pickler can generally be told to restrict itself to
				88	the subset of opcodes available under previous protocol versions too, so that
				89	users can create pickles under the current version readable by older
				90	versions. However, a pickle does not contain its version number embedded
				91	within it. If an older unpickler tries to read a pickle using a later
				92	protocol, the result is most likely an exception due to seeing an unknown (in
				93	the older unpickler) opcode.
				94
				95	The original pickle used what's now called "protocol 0", and what was called
				96	"text mode" before Python 2.3. The entire pickle bytestream is made up of
				97	printable 7-bit ASCII characters, plus the newline character, in protocol 0.
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame^]	98	That's why it was called text mode. Protocol 0 is small and elegant, but
				99	sometimes painfully inefficient.
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	100
				101	The second major set of additions is now called "protocol 1", and was called
				102	"binary mode" before Python 2.3. This added many opcodes with arguments
				103	consisting of arbitrary bytes, including NUL bytes and unprintable "high bit"
				104	bytes. Binary mode pickles can be substantially smaller than equivalent
				105	text mode pickles, and sometimes faster too; e.g., BININT represents a 4-byte
				106	int as 4 bytes following the opcode, which is cheaper to unpickle than the
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame^]	107	(perhaps) 11-character decimal string attached to INT. Protocol 1 also added
				108	a number of opcodes that operate on many stack elements at once (like APPENDS
				109	and SETITEMS).
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	110
				111	The third major set of additions came in Python 2.3, and is called "protocol
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame^]	112	2". This added:
				113
				114	- A better way to pickle instances of new-style classes (NEWOBJ).
				115
				116	- A way for a pickle to identify its protocol (PROTO).
				117
				118	- Time- and space-efficient pickling of long ints (LONG{1,4}).
				119
				120	- Shortcuts for small tuples (TUPLE{1,2,3}).
				121
				122	- Dedicated opcodes for bools (NEWTRUE, NEWFALSE).
				123
				124	- The "extension registry", a vector of popular objects that can be pushed
				125	efficiently by index (EXT{1,2,4}). This is akin to the memo and GET, but
				126	the registry contents are predefined (there's nothing akin to the memo's
				127	PUT).
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	128	"""
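The object identity and recursion points from the docstring can be checked directly with the `pickle` module. This is a small Python 3 sketch; the variable names are illustrative only.

```python
import pickle

# Shared subobject: the same list appears twice in the outer list.
shared = [1, 2]
outer = [shared, shared]
restored = pickle.loads(pickle.dumps(outer))

# Recursive object, as in "L = []; L.append(L)".
recursive = []
recursive.append(recursive)
restored_recursive = pickle.loads(pickle.dumps(recursive))

# Sharing survives the round trip, but the restored objects are new ones.
print(restored[0] is restored[1])
print(restored_recursive[0] is restored_recursive)
```

The memo is what makes this work: the second reference to `shared` is pickled as a memo lookup rather than as a second copy of the list.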
				129
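To see the protocol differences described above, one can compare the opcodes produced for the same value under protocols 0 and 2. The sketch below (Python 3) also implements the "protocol identifier" idea from the comments near the top of the file; `highest_protocol` and `opcode_names` are hypothetical helper names, not part of `pickletools`.

```python
import pickle
import pickletools

value = [True, 2**70, (1, 2)]

def opcode_names(data):
    """Return the symbolic opcode names in a pickle, in order."""
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]

def highest_protocol(data):
    """Return the highest .proto value among the opcodes in a pickle."""
    return max(opcode.proto for opcode, arg, pos in pickletools.genops(data))

p0 = pickle.dumps(value, protocol=0)
p2 = pickle.dumps(value, protocol=2)

names0 = opcode_names(p0)
names2 = opcode_names(p2)

# Protocol 0 output is 7-bit ASCII; protocol 2 announces itself with PROTO.
print(all(byte < 128 for byte in p0))
print(highest_protocol(p0), highest_protocol(p2))
```

Under protocol 2 the bool, the long integer, and the pair each get a dedicated opcode (NEWTRUE, LONG1, TUPLE2), which is the "opcode bloat" the docstring mentions.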
				130	# Meta-rule: Descriptions are stored in instances of descriptor objects,
				131	# with plain constructors. No meta-language is defined from which
				132	# descriptors could be constructed. If you want, e.g., XML, write a little
				133	# program to generate XML from the objects.
				134
				135	##############################################################################
				136	# Some pickle opcodes have an argument, following the opcode in the
				137	# bytestream. An argument is of a specific type, described by an instance
				138	# of ArgumentDescriptor. These are not to be confused with arguments taken
				139	# off the stack -- ArgumentDescriptor applies only to arguments embedded in
				140	# the opcode stream, immediately following an opcode.
				141
				142	# Represents the number of bytes consumed by an argument delimited by the
				143	# next newline character.
				144	UP_TO_NEWLINE = -1
				145
				146	# Represents the number of bytes consumed by a two-argument opcode where
				147	# the first argument gives the number of bytes in the second argument.
Tim Peters	fdb8cfa	2003-01-28 00:13:19 +0000	[diff] [blame]	148	TAKEN_FROM_ARGUMENT1 = -2 # num bytes is 1-byte unsigned int
				149	TAKEN_FROM_ARGUMENT4 = -3 # num bytes is 4-byte signed little-endian int
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	150
				151	class ArgumentDescriptor(object):
				152	__slots__ = (
				153	# name of descriptor record, also a module global name; a string
				154	'name',
				155
				156	# length of argument, in bytes; an int; UP_TO_NEWLINE and
Tim Peters	fdb8cfa	2003-01-28 00:13:19 +0000	[diff] [blame]	157	# TAKEN_FROM_ARGUMENT{1,4} are negative values for variable-length
				158	# cases
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	159	'n',
				160
				161	# a function taking a file-like object, reading this kind of argument
				162	# from the object at the current position, advancing the current
				163	# position by n bytes, and returning the value of the argument
				164	'reader',
				165
				166	# human-readable docs for this arg descriptor; a string
				167	'doc',
				168	)
				169
				170	def __init__(self, name, n, reader, doc):
				171	assert isinstance(name, str)
				172	self.name = name
				173
				174	assert isinstance(n, int) and (n >= 0 or
Tim Peters	fdb8cfa	2003-01-28 00:13:19 +0000	[diff] [blame]	175	n in (UP_TO_NEWLINE,
				176	TAKEN_FROM_ARGUMENT1,
				177	TAKEN_FROM_ARGUMENT4))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	178	self.n = n
				179
				180	self.reader = reader
				181
				182	assert isinstance(doc, str)
				183	self.doc = doc
				184
				185	from struct import unpack as _unpack
				186
				187	def read_uint1(f):
				188	"""
				189	>>> import StringIO
				190	>>> read_uint1(StringIO.StringIO('\\xff'))
				191	255
				192	"""
				193
				194	data = f.read(1)
				195	if data:
				196	return ord(data)
				197	raise ValueError("not enough data in stream to read uint1")
				198
				199	uint1 = ArgumentDescriptor(
				200	name='uint1',
				201	n=1,
				202	reader=read_uint1,
				203	doc="One-byte unsigned integer.")
				204
				205
				206	def read_uint2(f):
				207	"""
				208	>>> import StringIO
				209	>>> read_uint2(StringIO.StringIO('\\xff\\x00'))
				210	255
				211	>>> read_uint2(StringIO.StringIO('\\xff\\xff'))
				212	65535
				213	"""
				214
				215	data = f.read(2)
				216	if len(data) == 2:
				217	return _unpack("<H", data)[0]
				218	raise ValueError("not enough data in stream to read uint2")
				219
				220	uint2 = ArgumentDescriptor(
				221	name='uint2',
				222	n=2,
				223	reader=read_uint2,
				224	doc="Two-byte unsigned integer, little-endian.")
				225
				226
				227	def read_int4(f):
				228	"""
				229	>>> import StringIO
				230	>>> read_int4(StringIO.StringIO('\\xff\\x00\\x00\\x00'))
				231	255
				232	>>> read_int4(StringIO.StringIO('\\x00\\x00\\x00\\x80')) == -(2**31)
				233	True
				234	"""
				235
				236	data = f.read(4)
				237	if len(data) == 4:
				238	return _unpack("<i", data)[0]
				239	raise ValueError("not enough data in stream to read int4")
				240
				241	int4 = ArgumentDescriptor(
				242	name='int4',
				243	n=4,
				244	reader=read_int4,
				245	doc="Four-byte signed integer, little-endian, 2's complement.")
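The three fixed-width readers above rely on Python 2 byte strings and `StringIO`. A rough Python 3 equivalent, assuming a binary file-like object, could look like the following; `read_exact` is a hypothetical helper introduced only for this sketch.

```python
import io
import struct

def read_exact(f, n, what):
    """Read exactly n bytes from f, or raise ValueError naming the argument type."""
    data = f.read(n)
    if len(data) != n:
        raise ValueError("not enough data in stream to read %s" % what)
    return data

def read_uint1(f):
    # Indexing a bytes object yields an int in Python 3, so ord() is not needed.
    return read_exact(f, 1, "uint1")[0]

def read_uint2(f):
    # "<H" is a little-endian unsigned 2-byte integer.
    return struct.unpack("<H", read_exact(f, 2, "uint2"))[0]

def read_int4(f):
    # "<i" is a little-endian signed 4-byte integer (two's complement).
    return struct.unpack("<i", read_exact(f, 4, "int4"))[0]

print(read_uint2(io.BytesIO(b"\xff\xff")))
```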
				246
				247
				248	def read_stringnl(f, decode=True, stripquotes=True):
				249	"""
				250	>>> import StringIO
				251	>>> read_stringnl(StringIO.StringIO("'abcd'\\nefg\\n"))
				252	'abcd'
				253
				254	>>> read_stringnl(StringIO.StringIO("\\n"))
				255	Traceback (most recent call last):
				256	...
				257	ValueError: no string quotes around ''
				258
				259	>>> read_stringnl(StringIO.StringIO("\\n"), stripquotes=False)
				260	''
				261
				262	>>> read_stringnl(StringIO.StringIO("''\\n"))
				263	''
				264
				265	>>> read_stringnl(StringIO.StringIO('"abcd"'))
				266	Traceback (most recent call last):
				267	...
				268	ValueError: no newline found when trying to read stringnl
				269
				270	Embedded escapes are undone in the result.
				271	>>> read_stringnl(StringIO.StringIO("'a\\\\nb\\x00c\\td'\\n'e'"))
				272	'a\\nb\\x00c\\td'
				273	"""
				274
				275	    data = f.readline()
				276	    if not data.endswith('\n'):
				277	        raise ValueError("no newline found when trying to read stringnl")
				278	    data = data[:-1] # lose the newline
				279
				280	    if stripquotes:
				281	        for q in "'\"":
				282	            if data.startswith(q):
				283	                if not data.endswith(q):
				284	                    raise ValueError("string quote %r not found at both "
				285	                                     "ends of %r" % (q, data))
				286	                data = data[1:-1]
				287	                break
				288	        else:
				289	            raise ValueError("no string quotes around %r" % data)
				290
				291	    # I'm not sure when 'string_escape' was added to the std codecs; it's
				292	    # crazy not to use it if it's there.
				293	    if decode:
				294	        data = data.decode('string_escape')
				295	    return data
				296
				297	stringnl = ArgumentDescriptor(
				298	name='stringnl',
				299	n=UP_TO_NEWLINE,
				300	reader=read_stringnl,
				301	doc="""A newline-terminated string.
				302
				303	This is a repr-style string, with embedded escapes, and
				304	bracketing quotes.
				305	""")
				306
				307	def read_stringnl_noescape(f):
				308	return read_stringnl(f, decode=False, stripquotes=False)
				309
				310	stringnl_noescape = ArgumentDescriptor(
				311	name='stringnl_noescape',
				312	n=UP_TO_NEWLINE,
				313	reader=read_stringnl_noescape,
				314	doc="""A newline-terminated string.
				315
				316	This is a str-style string, without embedded escapes,
				317	or bracketing quotes. It should consist solely of
				318	printable ASCII characters.
				319	""")
				320
				321	def read_stringnl_noescape_pair(f):
				322	"""
				323	>>> import StringIO
				324	>>> read_stringnl_noescape_pair(StringIO.StringIO("Queue\\nEmpty\\njunk"))
Tim Peters	d916cf4	2003-01-27 19:01:47 +0000	[diff] [blame]	325	'Queue Empty'
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	326	"""
				327
Tim Peters	d916cf4	2003-01-27 19:01:47 +0000	[diff] [blame]	328	return "%s %s" % (read_stringnl_noescape(f), read_stringnl_noescape(f))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	329
				330	stringnl_noescape_pair = ArgumentDescriptor(
				331	name='stringnl_noescape_pair',
				332	n=UP_TO_NEWLINE,
				333	reader=read_stringnl_noescape_pair,
				334	doc="""A pair of newline-terminated strings.
				335
				336	These are str-style strings, without embedded
				337	escapes, or bracketing quotes. They should
				338	consist solely of printable ASCII characters.
				339	The pair is returned as a single string, with
Tim Peters	d916cf4	2003-01-27 19:01:47 +0000	[diff] [blame]	340	a single blank separating the two strings.
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	341	""")
				342
				343	def read_string4(f):
				344	"""
				345	>>> import StringIO
				346	>>> read_string4(StringIO.StringIO("\\x00\\x00\\x00\\x00abc"))
				347	''
				348	>>> read_string4(StringIO.StringIO("\\x03\\x00\\x00\\x00abcdef"))
				349	'abc'
				350	>>> read_string4(StringIO.StringIO("\\x00\\x00\\x00\\x03abcdef"))
				351	Traceback (most recent call last):
				352	...
				353	ValueError: expected 50331648 bytes in a string4, but only 6 remain
				354	"""
				355
				356	n = read_int4(f)
				357	if n < 0:
				358	raise ValueError("string4 byte count < 0: %d" % n)
				359	data = f.read(n)
				360	if len(data) == n:
				361	return data
				362	raise ValueError("expected %d bytes in a string4, but only %d remain" %
				363	(n, len(data)))
				364
				365	string4 = ArgumentDescriptor(
				366	name="string4",
Tim Peters	fdb8cfa	2003-01-28 00:13:19 +0000	[diff] [blame]	367	n=TAKEN_FROM_ARGUMENT4,
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	368	reader=read_string4,
				369	doc="""A counted string.
				370
				371	The first argument is a 4-byte little-endian signed int giving
				372	the number of bytes in the string, and the second argument is
				373	that many bytes.
				374	""")
				375
				376
				377	def read_string1(f):
				378	"""
				379	>>> import StringIO
				380	>>> read_string1(StringIO.StringIO("\\x00"))
				381	''
				382	>>> read_string1(StringIO.StringIO("\\x03abcdef"))
				383	'abc'
				384	"""
				385
				386	n = read_uint1(f)
				387	assert n >= 0
				388	data = f.read(n)
				389	if len(data) == n:
				390	return data
				391	raise ValueError("expected %d bytes in a string1, but only %d remain" %
				392	(n, len(data)))
				393
				394	string1 = ArgumentDescriptor(
				395	name="string1",
Tim Peters	fdb8cfa	2003-01-28 00:13:19 +0000	[diff] [blame]	396	n=TAKEN_FROM_ARGUMENT1,
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	397	reader=read_string1,
				398	doc="""A counted string.
				399
				400	The first argument is a 1-byte unsigned int giving the number
				401	of bytes in the string, and the second argument is that many
				402	bytes.
				403	""")
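Both counted-string readers follow the same pattern: read a length prefix, then read that many bytes. The Python 3 sketch below factors that pattern into one hypothetical helper, `read_counted`, which is an assumption of this example and not a function in `pickletools`.

```python
import io
import struct

def read_counted(f, count_format, what):
    """Read a length prefix in the given struct format, then that many bytes."""
    size = struct.calcsize(count_format)
    header = f.read(size)
    if len(header) != size:
        raise ValueError("not enough data in stream to read %s count" % what)
    (n,) = struct.unpack(count_format, header)
    if n < 0:
        raise ValueError("%s byte count < 0: %d" % (what, n))
    data = f.read(n)
    if len(data) != n:
        raise ValueError("expected %d bytes in a %s, but only %d remain" %
                         (n, what, len(data)))
    return data

def read_string1(f):
    # 1-byte unsigned count
    return read_counted(f, "<B", "string1")

def read_string4(f):
    # 4-byte little-endian signed count
    return read_counted(f, "<i", "string4")

print(read_string4(io.BytesIO(b"\x03\x00\x00\x00abcdef")))
```

Getting the byte order of the count wrong produces a huge length, which is why the third doctest for `read_string4` above reports 50331648 expected bytes.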
				404
				405
				406	def read_unicodestringnl(f):
				407	"""
				408	>>> import StringIO
				409	>>> read_unicodestringnl(StringIO.StringIO("abc\\uabcd\\njunk"))
				410	u'abc\\uabcd'
				411	"""
				412
				413	data = f.readline()
				414	if not data.endswith('\n'):
				415	raise ValueError("no newline found when trying to read "
				416	"unicodestringnl")
				417	data = data[:-1] # lose the newline
				418	return unicode(data, 'raw-unicode-escape')
				419
				420	unicodestringnl = ArgumentDescriptor(
				421	name='unicodestringnl',
				422	n=UP_TO_NEWLINE,
				423	reader=read_unicodestringnl,
				424	doc="""A newline-terminated Unicode string.
				425
				426	This is raw-unicode-escape encoded, so consists of
				427	printable ASCII characters, and may contain embedded
				428	escape sequences.
				429	""")
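The raw-unicode-escape encoding mentioned in the descriptor above can be tried out in Python 3, where `unicode` is simply `str`. This is only a sketch of the decoding step; the sample text is taken from the doctest.

```python
import io

text = "abc\uabcd"

# Non-Latin-1 characters become literal \uXXXX escapes, so the line stays ASCII.
line = text.encode("raw-unicode-escape") + b"\n"

def read_unicodestringnl(f):
    """Read one newline-terminated, raw-unicode-escape encoded string."""
    data = f.readline()
    if not data.endswith(b"\n"):
        raise ValueError("no newline found when trying to read "
                         "unicodestringnl")
    return data[:-1].decode("raw-unicode-escape")

print(read_unicodestringnl(io.BytesIO(line + b"junk")) == text)
```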
				430
				431	def read_unicodestring4(f):
				432	"""
				433	>>> import StringIO
				434	>>> s = u'abcd\\uabcd'
				435	>>> enc = s.encode('utf-8')
				436	>>> enc
				437	'abcd\\xea\\xaf\\x8d'
				438	>>> n = chr(len(enc)) + chr(0) * 3 # little-endian 4-byte length
				439	>>> t = read_unicodestring4(StringIO.StringIO(n + enc + 'junk'))
				440	>>> s == t
				441	True
				442
				443	>>> read_unicodestring4(StringIO.StringIO(n + enc[:-1]))
				444	Traceback (most recent call last):
				445	...
				446	ValueError: expected 7 bytes in a unicodestring4, but only 6 remain
				447	"""
				448
				449	n = read_int4(f)
				450	if n < 0:
				451	raise ValueError("unicodestring4 byte count < 0: %d" % n)
				452	data = f.read(n)
				453	if len(data) == n:
				454	return unicode(data, 'utf-8')
				455	raise ValueError("expected %d bytes in a unicodestring4, but only %d "
				456	"remain" % (n, len(data)))
				457
				458	unicodestring4 = ArgumentDescriptor(
				459	name="unicodestring4",
Tim Peters	fdb8cfa	2003-01-28 00:13:19 +0000	[diff] [blame]	460	n=TAKEN_FROM_ARGUMENT4,
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	461	reader=read_unicodestring4,
				462	doc="""A counted Unicode string.
				463
				464	The first argument is a 4-byte little-endian signed int
				465	giving the number of bytes in the string, and the second
				466	argument -- the UTF-8 encoding of the Unicode string --
				467	contains that many bytes.
				468	""")
				469
				470
				471	def read_decimalnl_short(f):
				472	"""
				473	>>> import StringIO
				474	>>> read_decimalnl_short(StringIO.StringIO("1234\\n56"))
				475	1234
				476
				477	>>> read_decimalnl_short(StringIO.StringIO("1234L\\n56"))
				478	Traceback (most recent call last):
				479	...
				480	ValueError: trailing 'L' not allowed in '1234L'
				481	"""
				482
				483	s = read_stringnl(f, decode=False, stripquotes=False)
				484	if s.endswith("L"):
				485	raise ValueError("trailing 'L' not allowed in %r" % s)
				486
				487	# It's not necessarily true that the result fits in a Python short int:
				488	# the pickle may have been written on a 64-bit box. There's also a hack
				489	# for True and False here.
				490	if s == "00":
				491	return False
				492	elif s == "01":
				493	return True
				494
				495	try:
				496	return int(s)
				497	except OverflowError:
				498	return long(s)
				499
				500	def read_decimalnl_long(f):
				501	"""
				502	>>> import StringIO
				503
				504	>>> read_decimalnl_long(StringIO.StringIO("1234\\n56"))
				505	Traceback (most recent call last):
				506	...
				507	ValueError: trailing 'L' required in '1234'
				508
				509	Someday the trailing 'L' will probably go away from this output.
				510
				511	>>> read_decimalnl_long(StringIO.StringIO("1234L\\n56"))
				512	1234L
				513
				514	>>> read_decimalnl_long(StringIO.StringIO("123456789012345678901234L\\n6"))
				515	123456789012345678901234L
				516	"""
				517
				518	s = read_stringnl(f, decode=False, stripquotes=False)
				519	if not s.endswith("L"):
				520	raise ValueError("trailing 'L' required in %r" % s)
				521	return long(s)
				522
				523
				524	decimalnl_short = ArgumentDescriptor(
				525	name='decimalnl_short',
				526	n=UP_TO_NEWLINE,
				527	reader=read_decimalnl_short,
				528	doc="""A newline-terminated decimal integer literal.
				529
				530	This never has a trailing 'L', and the integer fit
				531	in a short Python int on the box where the pickle
				532	was written -- but there's no guarantee it will fit
				533	in a short Python int on the box where the pickle
				534	is read.
				535	""")
				536
				537	decimalnl_long = ArgumentDescriptor(
				538	name='decimalnl_long',
				539	n=UP_TO_NEWLINE,
				540	reader=read_decimalnl_long,
				541	doc="""A newline-terminated decimal integer literal.
				542
				543	This has a trailing 'L', and can represent integers
				544	of any size.
				545	""")
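The True/False hack in `read_decimalnl_short` exists because protocol 0 pickles bools through the INT opcode with the arguments "01" and "00". A Python 3 sketch of the short-integer reader, assuming a binary stream and ignoring the separate long type that Python 3 no longer has:

```python
import io
import pickle

def read_decimalnl_short(f):
    """Read a newline-terminated decimal literal, honoring the bool hack."""
    line = f.readline()
    if not line.endswith(b"\n"):
        raise ValueError("no newline found when trying to read decimalnl_short")
    s = line[:-1]
    if s.endswith(b"L"):
        raise ValueError("trailing 'L' not allowed in %r" % s)
    # "00" and "01" are reserved spellings for False and True.
    if s == b"00":
        return False
    if s == b"01":
        return True
    return int(s.decode("ascii"))

# Skip the one-byte INT opcode, then read its argument.
stream = io.BytesIO(pickle.dumps(True, protocol=0))
stream.read(1)
print(read_decimalnl_short(stream))
```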
				546
				547
				548	def read_floatnl(f):
				549	"""
				550	>>> import StringIO
				551	>>> read_floatnl(StringIO.StringIO("-1.25\\n6"))
				552	-1.25
				553	"""
				554	s = read_stringnl(f, decode=False, stripquotes=False)
				555	return float(s)
				556
				557	floatnl = ArgumentDescriptor(
				558	name='floatnl',
				559	n=UP_TO_NEWLINE,
				560	reader=read_floatnl,
				561	doc="""A newline-terminated decimal floating literal.
				562
				563	In general this requires 17 significant digits for roundtrip
				564	identity, and pickling then unpickling infinities, NaNs, and
				565	minus zero doesn't work across boxes, or on some boxes even
				566	on itself (e.g., Windows can't read the strings it produces
				567	for infinities or NaNs).
				568	""")
				569
				570	def read_float8(f):
				571	"""
				572	>>> import StringIO, struct
				573	>>> raw = struct.pack(">d", -1.25)
				574	>>> raw
				575	'\\xbf\\xf4\\x00\\x00\\x00\\x00\\x00\\x00'
				576	>>> read_float8(StringIO.StringIO(raw + "\\n"))
				577	-1.25
				578	"""
				579
				580	data = f.read(8)
				581	if len(data) == 8:
				582	return _unpack(">d", data)[0]
				583	raise ValueError("not enough data in stream to read float8")
				584
				585
				586	float8 = ArgumentDescriptor(
				587	name='float8',
				588	n=8,
				589	reader=read_float8,
				590	doc="""An 8-byte binary representation of a float, big-endian.
				591
				592	The format is unique to Python, and shared with the struct
				593	module (format string '>d') "in theory" (the struct and cPickle
				594	implementations don't share the code -- they should). It's
				595	strongly related to the IEEE-754 double format, and, in normal
				596	cases, is in fact identical to the big-endian 754 double format.
				597	On other boxes the dynamic range is limited to that of a 754
				598	double, and "add a half and chop" rounding is used to reduce
				599	the precision to 53 bits. However, even on a 754 box,
				600	infinities, NaNs, and minus zero may not be handled correctly
				601	(may not survive roundtrip pickling intact).
				602	""")
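On an IEEE-754 machine the float8 format is the same as `struct`'s `'>d'`, which the following Python 3 sketch illustrates with the value from the doctest. The comparison against `pickle.dumps` assumes protocol 1, where a float is written as the one-byte BINFLOAT opcode `G` followed by the eight bytes.

```python
import io
import pickle
import struct

raw = struct.pack(">d", -1.25)

def read_float8(f):
    """Read an 8-byte big-endian double from a binary stream."""
    data = f.read(8)
    if len(data) != 8:
        raise ValueError("not enough data in stream to read float8")
    return struct.unpack(">d", data)[0]

# A protocol 1 pickle of a float: opcode byte, 8 payload bytes, then STOP.
pickled = pickle.dumps(-1.25, protocol=1)
print(read_float8(io.BytesIO(pickled[1:])))
```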
				603
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	604	# Protocol 2 formats
				605
				606	def decode_long(data):
				607	r"""Decode a long from a two's complement little-endian binary string.
				608	>>> decode_long("\xff\x00")
				609	255L
				610	>>> decode_long("\xff\x7f")
				611	32767L
				612	>>> decode_long("\x00\xff")
				613	-256L
				614	>>> decode_long("\x00\x80")
				615	-32768L
Tim Peters	217e571	2003-01-27 23:51:11 +0000	[diff] [blame]	616	>>> decode_long("\x80")
				617	-128L
				618	>>> decode_long("\x7f")
				619	127L
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	620	"""
				621	    x = 0L
				622	    i = 0L
				623	    for c in data:
				624	        x |= long(ord(c)) << i
				625	        i += 8L
Tim Peters	217e571	2003-01-27 23:51:11 +0000	[diff] [blame]	626	    if data and ord(c) >= 0x80:
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	627	        x -= 1L << i
				628	    return x
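In Python 3 the same decoding is available through `int.from_bytes`. The sketch below shows that one-liner next to a direct port of the loop above, so the two can be compared; `decode_long_manual` is a name chosen for this example.

```python
def decode_long(data):
    """Decode a little-endian two's-complement byte string into an int."""
    return int.from_bytes(data, byteorder="little", signed=True)

def decode_long_manual(data):
    """Port of the shift-and-or loop, for comparison."""
    x = 0
    for index, byte in enumerate(data):
        x |= byte << (8 * index)
    # A set high bit in the last byte means the value is negative.
    if data and data[-1] >= 0x80:
        x -= 1 << (8 * len(data))
    return x

samples = [b"\xff\x00", b"\xff\x7f", b"\x00\xff", b"\x00\x80", b"\x80", b"\x7f", b""]
print([decode_long(s) for s in samples])
```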
				629
				630	def read_long1(f):
				631	r"""
				632	>>> import StringIO
				633	>>> read_long1(StringIO.StringIO("\x02\xff\x00"))
				634	255L
				635	>>> read_long1(StringIO.StringIO("\x02\xff\x7f"))
				636	32767L
				637	>>> read_long1(StringIO.StringIO("\x02\x00\xff"))
				638	-256L
				639	>>> read_long1(StringIO.StringIO("\x02\x00\x80"))
				640	-32768L
Tim Peters	5eed340	2003-01-27 23:51:36 +0000	[diff] [blame]	641	>>>
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	642	"""
				643
				644	n = read_uint1(f)
				645	data = f.read(n)
				646	if len(data) != n:
				647	raise ValueError("not enough data in stream to read long1")
				648	return decode_long(data)
				649
				650	long1 = ArgumentDescriptor(
				651	name="long1",
Tim Peters	fdb8cfa	2003-01-28 00:13:19 +0000	[diff] [blame]	652	n=TAKEN_FROM_ARGUMENT1,
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	653	reader=read_long1,
				654	doc="""A binary long, little-endian, using 1-byte size.
				655
				656	This first reads one byte as an unsigned size, then reads that
Tim Peters	bdbe741	2003-01-27 23:54:04 +0000	[diff] [blame]	657	many bytes and interprets them as a little-endian 2's-complement long.
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	658	""")
				659
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	660	def read_long4(f):
				661	r"""
				662	>>> import StringIO
				663	>>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\xff\x00"))
				664	255L
				665	>>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\xff\x7f"))
				666	32767L
				667	>>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\x00\xff"))
				668	-256L
				669	>>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\x00\x80"))
				670	-32768L
Tim Peters	5eed340	2003-01-27 23:51:36 +0000	[diff] [blame]	671	>>>
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	672	"""
				673
				674	n = read_int4(f)
				675	if n < 0:
Neal Norwitz	784a3f5	2003-01-28 00:20:41 +0000	[diff] [blame]	676	raise ValueError("long4 byte count < 0: %d" % n)
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	677	data = f.read(n)
				678	if len(data) != n:
Neal Norwitz	784a3f5	2003-01-28 00:20:41 +0000	[diff] [blame]	679	raise ValueError("not enough data in stream to read long4")
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	680	return decode_long(data)
				681
				682	long4 = ArgumentDescriptor(
				683	name="long4",
Tim Peters	fdb8cfa	2003-01-28 00:13:19 +0000	[diff] [blame]	684	n=TAKEN_FROM_ARGUMENT4,
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	685	reader=read_long4,
				686	doc="""A binary representation of a long, little-endian.
				687
				688	This first reads four bytes as a signed size (but requires the
				689	size to be >= 0), then reads that many bytes and interprets them
Tim Peters	bdbe741	2003-01-27 23:54:04 +0000	[diff] [blame]	690	as a little-endian 2's-complement long.
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	691	""")
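To see the long1 format in an actual pickle, one can disassemble a protocol 2 pickle of an integer too large for the 4-byte BININT opcode. A Python 3 sketch, with 2**70 as an arbitrary sample value:

```python
import pickle
import pickletools

big = 2**70
data = pickle.dumps(big, protocol=2)

# Each entry pairs the opcode name with its decoded in-stream argument.
ops = [(opcode.name, arg) for opcode, arg, pos in pickletools.genops(data)]
for name, arg in ops:
    print(name, arg)
```

The LONG1 argument comes back already decoded, since `genops` runs the opcode's argument reader on the bytes that follow it.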
				692
				693
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	694	##############################################################################
				695	# Object descriptors. The stack used by the pickle machine holds objects,
				696	# and in the stack_before and stack_after attributes of OpcodeInfo
				697	# descriptors we need names to describe the various types of objects that can
				698	# appear on the stack.
				699
				700	class StackObject(object):
				701	__slots__ = (
				702	# name of descriptor record, for info only
				703	'name',
				704
				705	# type of object, or tuple of type objects (meaning the object can
				706	# be of any type in the tuple)
				707	'obtype',
				708
				709	# human-readable docs for this kind of stack object; a string
				710	'doc',
				711	)
				712
				713	def __init__(self, name, obtype, doc):
				714	assert isinstance(name, str)
				715	self.name = name
				716
				717	assert isinstance(obtype, type) or isinstance(obtype, tuple)
				718	if isinstance(obtype, tuple):
				719	for contained in obtype:
				720	assert isinstance(contained, type)
				721	self.obtype = obtype
				722
				723	assert isinstance(doc, str)
				724	self.doc = doc
				725
				726
				727	pyint = StackObject(
				728	name='int',
				729	obtype=int,
				730	doc="A short (as opposed to long) Python integer object.")
				731
				732	pylong = StackObject(
				733	name='long',
				734	obtype=long,
				735	doc="A long (as opposed to short) Python integer object.")
				736
				737	pyinteger_or_bool = StackObject(
				738	name='int_or_bool',
				739	obtype=(int, long, bool),
				740	doc="A Python integer object (short or long), or "
				741	"a Python bool.")
				742
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	743	pybool = StackObject(
				744	name='bool',
				745	obtype=(bool,),
				746	doc="A Python bool object.")
				747
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	748	pyfloat = StackObject(
				749	name='float',
				750	obtype=float,
				751	doc="A Python float object.")
				752
				753	pystring = StackObject(
				754	name='str',
				755	obtype=str,
				756	doc="A Python string object.")
				757
				758	pyunicode = StackObject(
				759	name='unicode',
				760	obtype=unicode,
				761	doc="A Python Unicode string object.")
				762
				763	pynone = StackObject(
				764	name="None",
				765	obtype=type(None),
				766	doc="The Python None object.")
				767
				768	pytuple = StackObject(
				769	name="tuple",
				770	obtype=tuple,
				771	doc="A Python tuple object.")
				772
				773	pylist = StackObject(
				774	name="list",
				775	obtype=list,
				776	doc="A Python list object.")
				777
				778	pydict = StackObject(
				779	name="dict",
				780	obtype=dict,
				781	doc="A Python dict object.")
				782
				783	anyobject = StackObject(
				784	name='any',
				785	obtype=object,
				786	doc="Any kind of object whatsoever.")
				787
				788	markobject = StackObject(
				789	name="mark",
				790	obtype=StackObject,
				791	doc="""'The mark' is a unique object.
				792
				793	Opcodes that operate on a variable number of objects
				794	generally don't embed the count of objects in the opcode,
				795	or pull it off the stack. Instead the MARK opcode is used
				796	to push a special marker object on the stack, and then
				797	some other opcodes grab all the objects from the top of
				798	the stack down to (but not including) the topmost marker
				799	object.
				800	""")
				801
				802	stackslice = StackObject(
				803	name="stackslice",
				804	obtype=StackObject,
				805	doc="""An object representing a contiguous slice of the stack.
				806
				807	This is used in conjunction with markobject, to represent all
				808	of the stack following the topmost markobject. For example,
				809	the POP_MARK opcode changes the stack from
				810
				811	[..., markobject, stackslice]
				812	to
				813	[...]
				814
				815	No matter how many objects are on the stack after the topmost
				816	markobject, POP_MARK gets rid of all of them (including the
				817	topmost markobject too).
				818	""")
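
The mark mechanism described in markobject.doc and stackslice.doc can be sketched with a plain Python list. This is only a toy model, not the unpickler's actual code: the names MARK and pop_to_mark are hypothetical and exist only for this illustration.

```python
# Toy model of the mark mechanism; this is not the real unpickler.
MARK = object()  # hypothetical sentinel standing in for markobject


def pop_to_mark(stack):
    """Remove the topmost MARK and everything above it; return the items above it."""
    for k in range(len(stack) - 1, -1, -1):
        if stack[k] is MARK:
            break
    else:
        raise ValueError("no mark on the stack")
    items = stack[k + 1:]
    del stack[k:]
    return items


# LIST-like behavior: gather the slice above the mark into one list and push it.
stack = ["below", MARK, 1, 2, 3, "abc"]
stack.append(pop_to_mark(stack))

# POP_MARK-like behavior: discard the mark and the slice above it.
other = ["x", MARK, 1, 2]
pop_to_mark(other)
```

Searching from the top of the stack downward is what makes "the topmost markobject" the delimiter, so nested marks would pair up naturally in this model.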
				819
				820	##############################################################################
				821	# Descriptors for pickle opcodes.
				822
				823	class OpcodeInfo(object):
				824
				825	__slots__ = (
				826	# symbolic name of opcode; a string
				827	'name',
				828
				829	# the code used in a bytestream to represent the opcode; a
				830	# one-character string
				831	'code',
				832
				833	# If the opcode has an argument embedded in the byte string, an
				834	# instance of ArgumentDescriptor specifying its type. Note that
				835	# arg.reader(s) can be used to read and decode the argument from
				836	# the bytestream s, and arg.doc documents the format of the raw
				837	# argument bytes. If the opcode doesn't have an argument embedded
				838	# in the bytestream, arg should be None.
				839	'arg',
				840
				841	# what the stack looks like before this opcode runs; a list
				842	'stack_before',
				843
				844	# what the stack looks like after this opcode runs; a list
				845	'stack_after',
				846
				847	# the protocol number in which this opcode was introduced; an int
				848	'proto',
				849
				850	# human-readable docs for this opcode; a string
				851	'doc',
				852	)
				853
				854	def __init__(self, name, code, arg,
				855	stack_before, stack_after, proto, doc):
				856	assert isinstance(name, str)
				857	self.name = name
				858
				859	assert isinstance(code, str)
				860	assert len(code) == 1
				861	self.code = code
				862
				863	assert arg is None or isinstance(arg, ArgumentDescriptor)
				864	self.arg = arg
				865
				866	assert isinstance(stack_before, list)
				867	for x in stack_before:
				868	assert isinstance(x, StackObject)
				869	self.stack_before = stack_before
				870
				871	assert isinstance(stack_after, list)
				872	for x in stack_after:
				873	assert isinstance(x, StackObject)
				874	self.stack_after = stack_after
				875
				876	assert isinstance(proto, int) and 0 <= proto <= 2
				877	self.proto = proto
				878
				879	assert isinstance(doc, str)
				880	self.doc = doc
				881
				882	I = OpcodeInfo
				883	opcodes = [
				884
				885	# Ways to spell integers.
				886
				887	I(name='INT',
				888	code='I',
				889	arg=decimalnl_short,
				890	stack_before=[],
				891	stack_after=[pyinteger_or_bool],
				892	proto=0,
				893	doc="""Push an integer or bool.
				894
				895	The argument is a newline-terminated decimal literal string.
				896
				897	The intent may have been that this always fit in a short Python int,
				898	but INT can be generated in pickles written on a 64-bit box that
				899	require a Python long on a 32-bit box. The difference between this
				900	and LONG then is that INT skips a trailing 'L', and produces a short
				901	int whenever possible.
				902
				903	Another difference arises because, when bool was introduced as a
				904	distinct type in 2.3, builtin names True and False were also added to
				905	2.2.2, mapping to ints 1 and 0. For compatibility in both directions,
				906	True gets pickled as INT + "I01\\n", and False as INT + "I00\\n".
				907	Leading zeroes are never produced for a genuine integer. The 2.3
				908	(and later) unpicklers special-case these and return bool instead;
				909	earlier unpicklers ignore the leading "0" and return the int.
				910	"""),
				911
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	912	I(name='BININT',
				913	code='J',
				914	arg=int4,
				915	stack_before=[],
				916	stack_after=[pyint],
				917	proto=1,
				918	doc="""Push a four-byte signed integer.
				919
				920	This handles the full range of Python (short) integers on a 32-bit
				921	box, directly as binary bytes (1 for the opcode and 4 for the integer).
				922	If the integer is non-negative and fits in 1 or 2 bytes, pickling via
				923	BININT1 or BININT2 saves space.
				924	"""),
				925
				926	I(name='BININT1',
				927	code='K',
				928	arg=uint1,
				929	stack_before=[],
				930	stack_after=[pyint],
				931	proto=1,
				932	doc="""Push a one-byte unsigned integer.
				933
				934	This is a space optimization for pickling very small non-negative ints,
				935	in range(256).
				936	"""),
				937
				938	I(name='BININT2',
				939	code='M',
				940	arg=uint2,
				941	stack_before=[],
				942	stack_after=[pyint],
				943	proto=1,
				944	doc="""Push a two-byte unsigned integer.
				945
				946	This is a space optimization for pickling small positive ints, in
				947	range(256, 2**16). Integers in range(256) can also be pickled via
				948	BININT2, but BININT1 instead saves a byte.
				949	"""),
				950
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame^]	951	I(name='LONG',
				952	code='L',
				953	arg=decimalnl_long,
				954	stack_before=[],
				955	stack_after=[pylong],
				956	proto=0,
				957	doc="""Push a long integer.
				958
				959	The same as INT, except that the literal ends with 'L', and always
				960	unpickles to a Python long. There doesn't seem to be a real purpose to the
				961	trailing 'L'.
				962
				963	Note that LONG takes time quadratic in the number of digits when
				964	unpickling (this is simply due to the nature of decimal->binary
				965	conversion). Proto 2 added linear-time (in C; still quadratic-time
				966	in Python) LONG1 and LONG4 opcodes.
				967	"""),
				968
				969	I(name="LONG1",
				970	code='\x8a',
				971	arg=long1,
				972	stack_before=[],
				973	stack_after=[pylong],
				974	proto=2,
				975	doc="""Long integer using one-byte length.
				976
				977	A more efficient encoding of a Python long; the long1 encoding
				978	says it all."""),
				979
				980	I(name="LONG4",
				981	code='\x8b',
				982	arg=long4,
				983	stack_before=[],
				984	stack_after=[pylong],
				985	proto=2,
				986	doc="""Long integer using four-byte length.
				987
				988	A more efficient encoding of a Python long; the long4 encoding
				989	says it all."""),
				990
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	991	# Ways to spell strings (8-bit, not Unicode).
				992
				993	I(name='STRING',
				994	code='S',
				995	arg=stringnl,
				996	stack_before=[],
				997	stack_after=[pystring],
				998	proto=0,
				999	doc="""Push a Python string object.
				1000
				1001	The argument is a repr-style string, with bracketing quote characters,
				1002	and perhaps embedded escapes. The argument extends until the next
				1003	newline character.
				1004	"""),
				1005
				1006	I(name='BINSTRING',
				1007	code='T',
				1008	arg=string4,
				1009	stack_before=[],
				1010	stack_after=[pystring],
				1011	proto=1,
				1012	doc="""Push a Python string object.
				1013
				1014	There are two arguments: the first is a 4-byte little-endian signed int
				1015	giving the number of bytes in the string, and the second is that many
				1016	bytes, which are taken literally as the string content.
				1017	"""),
				1018
				1019	I(name='SHORT_BINSTRING',
				1020	code='U',
				1021	arg=string1,
				1022	stack_before=[],
				1023	stack_after=[pystring],
				1024	proto=1,
				1025	doc="""Push a Python string object.
				1026
				1027	There are two arguments: the first is a 1-byte unsigned int giving
				1028	the number of bytes in the string, and the second is that many bytes,
				1029	which are taken literally as the string content.
				1030	"""),
				1031
				1032	# Ways to spell None.
				1033
				1034	I(name='NONE',
				1035	code='N',
				1036	arg=None,
				1037	stack_before=[],
				1038	stack_after=[pynone],
				1039	proto=0,
				1040	doc="Push None on the stack."),
				1041
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame^]	1042	# Ways to spell bools, starting with proto 2. See INT for how this was
				1043	# done before proto 2.
				1044
				1045	I(name='NEWTRUE',
				1046	code='\x88',
				1047	arg=None,
				1048	stack_before=[],
				1049	stack_after=[pybool],
				1050	proto=2,
				1051	doc="""True.
				1052
				1053	Push True onto the stack."""),
				1054
				1055	I(name='NEWFALSE',
				1056	code='\x89',
				1057	arg=None,
				1058	stack_before=[],
				1059	stack_after=[pybool],
				1060	proto=2,
				1061	doc="""False.
				1062
				1063	Push False onto the stack."""),
				1064
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1065	# Ways to spell Unicode strings.
				1066
				1067	I(name='UNICODE',
				1068	code='V',
				1069	arg=unicodestringnl,
				1070	stack_before=[],
				1071	stack_after=[pyunicode],
				1072	proto=0, # this may be pure-text, but it's a later addition
				1073	doc="""Push a Python Unicode string object.
				1074
				1075	The argument is a raw-unicode-escape encoding of a Unicode string,
				1076	and so may contain embedded escape sequences. The argument extends
				1077	until the next newline character.
				1078	"""),
				1079
				1080	I(name='BINUNICODE',
				1081	code='X',
				1082	arg=unicodestring4,
				1083	stack_before=[],
				1084	stack_after=[pyunicode],
				1085	proto=1,
				1086	doc="""Push a Python Unicode string object.
				1087
				1088	There are two arguments: the first is a 4-byte little-endian signed int
				1089	giving the number of bytes in the string. The second is that many
				1090	bytes, and is the UTF-8 encoding of the Unicode string.
				1091	"""),
				1092
				1093	# Ways to spell floats.
				1094
				1095	I(name='FLOAT',
				1096	code='F',
				1097	arg=floatnl,
				1098	stack_before=[],
				1099	stack_after=[pyfloat],
				1100	proto=0,
				1101	doc="""Newline-terminated decimal float literal.
				1102
				1103	The argument is repr(a_float), and in general requires 17 significant
				1104	digits for roundtrip conversion to be an identity (this is so for
				1105	IEEE-754 double precision values, which is what Python float maps to
				1106	on most boxes).
				1107
				1108	In general, FLOAT cannot be used to transport infinities, NaNs, or
				1109	minus zero across boxes (or even on a single box, if the platform C
				1110	library can't read the strings it produces for such things -- Windows
				1111	is like that), but may do less damage than BINFLOAT on boxes with
				1112	greater precision or dynamic range than IEEE-754 double.
				1113	"""),
				1114
				1115	I(name='BINFLOAT',
				1116	code='G',
				1117	arg=float8,
				1118	stack_before=[],
				1119	stack_after=[pyfloat],
				1120	proto=1,
				1121	doc="""Float stored in binary form, with 8 bytes of data.
				1122
				1123	This generally requires less than half the space of FLOAT encoding.
				1124	In general, BINFLOAT cannot be used to transport infinities, NaNs, or
				1125	minus zero, raises an exception if the exponent exceeds the range of
				1126	an IEEE-754 double, and retains no more than 53 bits of precision (if
				1127	there are more than that, "add a half and chop" rounding is used to
				1128	cut it back to 53 significant bits).
				1129	"""),
				1130
				1131	# Ways to build lists.
				1132
				1133	I(name='EMPTY_LIST',
				1134	code=']',
				1135	arg=None,
				1136	stack_before=[],
				1137	stack_after=[pylist],
				1138	proto=1,
				1139	doc="Push an empty list."),
				1140
				1141	I(name='APPEND',
				1142	code='a',
				1143	arg=None,
				1144	stack_before=[pylist, anyobject],
				1145	stack_after=[pylist],
				1146	proto=0,
				1147	doc="""Append an object to a list.
				1148
				1149	Stack before: ... pylist anyobject
				1150	Stack after: ... pylist+[anyobject]
				1151	"""),
				1152
				1153	I(name='APPENDS',
				1154	code='e',
				1155	arg=None,
				1156	stack_before=[pylist, markobject, stackslice],
				1157	stack_after=[pylist],
				1158	proto=1,
				1159	doc="""Extend a list by a slice of stack objects.
				1160
				1161	Stack before: ... pylist markobject stackslice
				1162	Stack after: ... pylist+stackslice
				1163	"""),
				1164
				1165	I(name='LIST',
				1166	code='l',
				1167	arg=None,
				1168	stack_before=[markobject, stackslice],
				1169	stack_after=[pylist],
				1170	proto=0,
				1171	doc="""Build a list out of the topmost stack slice, after markobject.
				1172
				1173	All the stack entries following the topmost markobject are placed into
				1174	a single Python list, which single list object replaces all of the
				1175	stack from the topmost markobject onward. For example,
				1176
				1177	Stack before: ... markobject 1 2 3 'abc'
				1178	Stack after: ... [1, 2, 3, 'abc']
				1179	"""),
				1180
				1181	# Ways to build tuples.
				1182
				1183	I(name='EMPTY_TUPLE',
				1184	code=')',
				1185	arg=None,
				1186	stack_before=[],
				1187	stack_after=[pytuple],
				1188	proto=1,
				1189	doc="Push an empty tuple."),
				1190
				1191	I(name='TUPLE',
				1192	code='t',
				1193	arg=None,
				1194	stack_before=[markobject, stackslice],
				1195	stack_after=[pytuple],
				1196	proto=0,
				1197	doc="""Build a tuple out of the topmost stack slice, after markobject.
				1198
				1199	All the stack entries following the topmost markobject are placed into
				1200	a single Python tuple, which single tuple object replaces all of the
				1201	stack from the topmost markobject onward. For example,
				1202
				1203	Stack before: ... markobject 1 2 3 'abc'
				1204	Stack after: ... (1, 2, 3, 'abc')
				1205	"""),
				1206
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame^]	1207	I(name='TUPLE1',
				1208	code='\x85',
				1209	arg=None,
				1210	stack_before=[anyobject],
				1211	stack_after=[pytuple],
				1212	proto=2,
				1213	doc="""One-tuple.
				1214
				1215	This code pops one value off the stack and pushes a tuple of
				1216	length 1 whose one item is that value back onto it. IOW:
				1217
				1218	stack[-1] = tuple(stack[-1:])
				1219	"""),
				1220
				1221	I(name='TUPLE2',
				1222	code='\x86',
				1223	arg=None,
				1224	stack_before=[anyobject, anyobject],
				1225	stack_after=[pytuple],
				1226	proto=2,
				1227	doc="""Two-tuple.
				1228
				1229	This code pops two values off the stack and pushes a tuple
				1230	of length 2 whose items are those values back onto it. IOW:
				1231
				1232	stack[-2:] = [tuple(stack[-2:])]
				1233	"""),
				1234
				1235	I(name='TUPLE3',
				1236	code='\x87',
				1237	arg=None,
				1238	stack_before=[anyobject, anyobject, anyobject],
				1239	stack_after=[pytuple],
				1240	proto=2,
				1241	doc="""Three-tuple.
				1242
				1243	This code pops three values off the stack and pushes a tuple
				1244	of length 3 whose items are those values back onto it. IOW:
				1245
				1246	stack[-3:] = [tuple(stack[-3:])]
				1247	"""),
				1248
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1249	# Ways to build dicts.
				1250
				1251	I(name='EMPTY_DICT',
				1252	code='}',
				1253	arg=None,
				1254	stack_before=[],
				1255	stack_after=[pydict],
				1256	proto=1,
				1257	doc="Push an empty dict."),
				1258
				1259	I(name='DICT',
				1260	code='d',
				1261	arg=None,
				1262	stack_before=[markobject, stackslice],
				1263	stack_after=[pydict],
				1264	proto=0,
				1265	doc="""Build a dict out of the topmost stack slice, after markobject.
				1266
				1267	All the stack entries following the topmost markobject are placed into
				1268	a single Python dict, which single dict object replaces all of the
				1269	stack from the topmost markobject onward. The stack slice alternates
				1270	key, value, key, value, .... For example,
				1271
				1272	Stack before: ... markobject 1 2 3 'abc'
				1273	Stack after: ... {1: 2, 3: 'abc'}
				1274	"""),
				1275
				1276	I(name='SETITEM',
				1277	code='s',
				1278	arg=None,
				1279	stack_before=[pydict, anyobject, anyobject],
				1280	stack_after=[pydict],
				1281	proto=0,
				1282	doc="""Add a key+value pair to an existing dict.
				1283
				1284	Stack before: ... pydict key value
				1285	Stack after: ... pydict
				1286
				1287	where pydict has been modified via pydict[key] = value.
				1288	"""),
				1289
				1290	I(name='SETITEMS',
				1291	code='u',
				1292	arg=None,
				1293	stack_before=[pydict, markobject, stackslice],
				1294	stack_after=[pydict],
				1295	proto=1,
				1296	doc="""Add an arbitrary number of key+value pairs to an existing dict.
				1297
				1298	The slice of the stack following the topmost markobject is taken as
				1299	an alternating sequence of keys and values, added to the dict
				1300	immediately under the topmost markobject. Everything at and after the
				1301	topmost markobject is popped, leaving the mutated dict at the top
				1302	of the stack.
				1303
				1304	Stack before: ... pydict markobject key_1 value_1 ... key_n value_n
				1305	Stack after: ... pydict
				1306
				1307	where pydict has been modified via pydict[key_i] = value_i for i in
				1308	1, 2, ..., n, and in that order.
				1309	"""),
				1310
				1311	# Stack manipulation.
				1312
				1313	I(name='POP',
				1314	code='0',
				1315	arg=None,
				1316	stack_before=[anyobject],
				1317	stack_after=[],
				1318	proto=0,
				1319	doc="Discard the top stack item, shrinking the stack by one item."),
				1320
				1321	I(name='DUP',
				1322	code='2',
				1323	arg=None,
				1324	stack_before=[anyobject],
				1325	stack_after=[anyobject, anyobject],
				1326	proto=0,
				1327	doc="Push the top stack item onto the stack again, duplicating it."),
				1328
				1329	I(name='MARK',
				1330	code='(',
				1331	arg=None,
				1332	stack_before=[],
				1333	stack_after=[markobject],
				1334	proto=0,
				1335	doc="""Push markobject onto the stack.
				1336
				1337	markobject is a unique object, used by other opcodes to identify a
				1338	region of the stack containing a variable number of objects for them
				1339	to work on. See markobject.doc for more detail.
				1340	"""),
				1341
				1342	I(name='POP_MARK',
				1343	code='1',
				1344	arg=None,
				1345	stack_before=[markobject, stackslice],
				1346	stack_after=[],
				1347	proto=0,
				1348	doc="""Pop all the stack objects at and above the topmost markobject.
				1349
				1350	When an opcode using a variable number of stack objects is done,
				1351	POP_MARK is used to remove those objects, and to remove the markobject
				1352	that delimited their starting position on the stack.
				1353	"""),
				1354
				1355	# Memo manipulation. There are really only two operations (get and put),
				1356	# each in all-text, "short binary", and "long binary" flavors.
				1357
				1358	I(name='GET',
				1359	code='g',
				1360	arg=decimalnl_short,
				1361	stack_before=[],
				1362	stack_after=[anyobject],
				1363	proto=0,
				1364	doc="""Read an object from the memo and push it on the stack.
				1365
				1366	The index of the memo object to push is given by the newline-terminated
				1367	decimal string following. BINGET and LONG_BINGET are space-optimized
				1368	versions.
				1369	"""),
				1370
				1371	I(name='BINGET',
				1372	code='h',
				1373	arg=uint1,
				1374	stack_before=[],
				1375	stack_after=[anyobject],
				1376	proto=1,
				1377	doc="""Read an object from the memo and push it on the stack.
				1378
				1379	The index of the memo object to push is given by the 1-byte unsigned
				1380	integer following.
				1381	"""),
				1382
				1383	I(name='LONG_BINGET',
				1384	code='j',
				1385	arg=int4,
				1386	stack_before=[],
				1387	stack_after=[anyobject],
				1388	proto=1,
				1389	doc="""Read an object from the memo and push it on the stack.
				1390
				1391	The index of the memo object to push is given by the 4-byte signed
				1392	little-endian integer following.
				1393	"""),
				1394
				1395	I(name='PUT',
				1396	code='p',
				1397	arg=decimalnl_short,
				1398	stack_before=[],
				1399	stack_after=[],
				1400	proto=0,
				1401	doc="""Store the stack top into the memo. The stack is not popped.
				1402
				1403	The index of the memo location to write into is given by the newline-
				1404	terminated decimal string following. BINPUT and LONG_BINPUT are
				1405	space-optimized versions.
				1406	"""),
				1407
				1408	I(name='BINPUT',
				1409	code='q',
				1410	arg=uint1,
				1411	stack_before=[],
				1412	stack_after=[],
				1413	proto=1,
				1414	doc="""Store the stack top into the memo. The stack is not popped.
				1415
				1416	The index of the memo location to write into is given by the 1-byte
				1417	unsigned integer following.
				1418	"""),
				1419
				1420	I(name='LONG_BINPUT',
				1421	code='r',
				1422	arg=int4,
				1423	stack_before=[],
				1424	stack_after=[],
				1425	proto=1,
				1426	doc="""Store the stack top into the memo. The stack is not popped.
				1427
				1428	The index of the memo location to write into is given by the 4-byte
				1429	signed little-endian integer following.
				1430	"""),
				1431
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame^]	1432	# Access the extension registry (predefined objects). Akin to the GET
				1433	# family.
				1434
				1435	I(name='EXT1',
				1436	code='\x82',
				1437	arg=uint1,
				1438	stack_before=[],
				1439	stack_after=[anyobject],
				1440	proto=2,
				1441	doc="""Extension code.
				1442
				1443	This code and the similar EXT2 and EXT4 allow using a registry
				1444	of popular objects that are pickled by name, typically classes.
				1445	It is envisioned that through a global negotiation and
				1446	registration process, third parties can set up a mapping between
				1447	ints and object names.
				1448
				1449	In order to guarantee pickle interchangeability, the extension
				1450	code registry ought to be global, although a range of codes may
				1451	be reserved for private use.
				1452
				1453	EXT1 has a 1-byte integer argument. This is used to index into the
				1454	extension registry, and the object at that index is pushed on the stack.
				1455	"""),
				1456
				1457	I(name='EXT2',
				1458	code='\x83',
				1459	arg=uint2,
				1460	stack_before=[],
				1461	stack_after=[anyobject],
				1462	proto=2,
				1463	doc="""Extension code.
				1464
				1465	See EXT1. EXT2 has a two-byte integer argument.
				1466	"""),
				1467
				1468	I(name='EXT4',
				1469	code='\x84',
				1470	arg=int4,
				1471	stack_before=[],
				1472	stack_after=[anyobject],
				1473	proto=2,
				1474	doc="""Extension code.
				1475
				1476	See EXT1. EXT4 has a four-byte integer argument.
				1477	"""),
				1478
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1479	# Push a class object, or module function, on the stack, via its module
				1480	# and name.
				1481
				1482	I(name='GLOBAL',
				1483	code='c',
				1484	arg=stringnl_noescape_pair,
				1485	stack_before=[],
				1486	stack_after=[anyobject],
				1487	proto=0,
				1488	doc="""Push a global object (module.attr) on the stack.
				1489
				1490	Two newline-terminated strings follow the GLOBAL opcode. The first is
				1491	taken as a module name, and the second as a class name. The class
				1492	object module.class is pushed on the stack. More accurately, the
				1493	object returned by self.find_class(module, class) is pushed on the
				1494	stack, so unpickling subclasses can override this form of lookup.
				1495	"""),
				1496
				1497	# Ways to build objects of classes pickle doesn't know about directly
				1498	# (user-defined classes). I despair of documenting this accurately
				1499	# and comprehensibly -- you really have to read the pickle code to
				1500	# find all the special cases.
				1501
				1502	I(name='REDUCE',
				1503	code='R',
				1504	arg=None,
				1505	stack_before=[anyobject, anyobject],
				1506	stack_after=[anyobject],
				1507	proto=0,
				1508	doc="""Push an object built from a callable and an argument tuple.
				1509
				1510	The opcode is named to remind of the __reduce__() method.
				1511
				1512	Stack before: ... callable pytuple
				1513	Stack after: ... callable(*pytuple)
				1514
				1515	The callable and the argument tuple are the first two items returned
				1516	by a __reduce__ method. Applying the callable to the argtuple is
				1517	supposed to reproduce the original object, or at least get it started.
				1518	If the __reduce__ method returns a 3-tuple, the last component is an
				1519	argument to be passed to the object's __setstate__, and then the REDUCE
				1520	opcode is followed by code to create setstate's argument, and then a
				1521	BUILD opcode to apply __setstate__ to that argument.
				1522
				1523	There are lots of special cases here. The argtuple can be None, in
				1524	which case callable.__basicnew__() is called instead to produce the
				1525	object to be pushed on the stack. This appears to be a trick unique
				1526	to ExtensionClasses, and is deprecated regardless.
				1527
				1528	If type(callable) is not ClassType, REDUCE complains unless the
				1529	callable has been registered with the copy_reg module's
				1530	safe_constructors dict, or the callable has a magic
				1531	'__safe_for_unpickling__' attribute with a true value. I'm not sure
				1532	why it does this, but I've sure seen this complaint often enough when
				1533	I didn't want to <wink>.
				1534	"""),
				1535
				1536	I(name='BUILD',
				1537	code='b',
				1538	arg=None,
				1539	stack_before=[anyobject, anyobject],
				1540	stack_after=[anyobject],
				1541	proto=0,
				1542	doc="""Finish building an object, via __setstate__ or dict update.
				1543
				1544	Stack before: ... anyobject argument
				1545	Stack after: ... anyobject
				1546
				1547	where anyobject may have been mutated, as follows:
				1548
				1549	If the object has a __setstate__ method,
				1550
				1551	anyobject.__setstate__(argument)
				1552
				1553	is called.
				1554
				1555	Else the argument must be a dict, the object must have a __dict__, and
				1556	the object is updated via
				1557
				1558	anyobject.__dict__.update(argument)
				1559
				1560	This may raise RuntimeError in restricted execution mode (which
				1561	disallows access to __dict__ directly); in that case, the object
				1562	is updated instead via
				1563
				1564	for k, v in argument.items():
				1565	anyobject[k] = v
				1566	"""),
				1567
				1568	I(name='INST',
				1569	code='i',
				1570	arg=stringnl_noescape_pair,
				1571	stack_before=[markobject, stackslice],
				1572	stack_after=[anyobject],
				1573	proto=0,
				1574	doc="""Build a class instance.
				1575
				1576	This is the protocol 0 version of protocol 1's OBJ opcode.
				1577	INST is followed by two newline-terminated strings, giving a
				1578	module and class name, just as for the GLOBAL opcode (and see
				1579	GLOBAL for more details about that). self.find_class(module, name)
				1580	is used to get a class object.
				1581
				1582	In addition, all the objects on the stack following the topmost
				1583	markobject are gathered into a tuple and popped (along with the
				1584	topmost markobject), just as for the TUPLE opcode.
				1585
				1586	Now it gets complicated. If all of these are true:
				1587
				1588	+ The argtuple is empty (markobject was at the top of the stack
				1589	at the start).
				1590
				1591	+ It's an old-style class object (the type of the class object is
				1592	ClassType).
				1593
				1594	+ The class object does not have a __getinitargs__ attribute.
				1595
				1596	then we want to create an old-style class instance without invoking
				1597	its __init__() method (pickle has waffled on this over the years; not
				1598	calling __init__() is current wisdom). In this case, an instance of
				1599	an old-style dummy class is created, and then we try to rebind its
				1600	__class__ attribute to the desired class object. If this succeeds,
				1601	the new instance object is pushed on the stack, and we're done. In
				1602	restricted execution mode it can fail (assignment to __class__ is
				1603	disallowed), and I'm not really sure what happens then -- it looks
				1604	like the code ends up calling the class object's __init__ anyway,
				1605	via falling into the next case.
				1606
				1607	Else (the argtuple is not empty, it's not an old-style class object,
				1608	or the class object does have a __getinitargs__ attribute), the code
				1609	first insists that the class object have a __safe_for_unpickling__
				1610	attribute. Unlike the __safe_for_unpickling__ check in REDUCE,
				1611	it doesn't matter whether this attribute has a true or false value, it
				1612	only matters whether it exists (XXX this smells like a bug). If
				1613	__safe_for_unpickling__ doesn't exist, UnpicklingError is raised.
				1614
				1615	Else (the class object does have a __safe_for_unpickling__ attr),
				1616	the class object obtained from INST's arguments is applied to the
				1617	argtuple obtained from the stack, and the resulting instance object
				1618	is pushed on the stack.
				1619	"""),
				1620
				1621	I(name='OBJ',
				1622	code='o',
				1623	arg=None,
				1624	stack_before=[markobject, anyobject, stackslice],
				1625	stack_after=[anyobject],
				1626	proto=1,
				1627	doc="""Build a class instance.
				1628
				1629	This is the protocol 1 version of protocol 0's INST opcode, and is
				1630	very much like it. The major difference is that the class object
				1631	is taken off the stack, allowing it to be retrieved from the memo
				1632	repeatedly if several instances of the same class are created. This
				1633	can be much more efficient (in both time and space) than repeatedly
				1634	embedding the module and class names in INST opcodes.
				1635
				1636	Unlike INST, OBJ takes no arguments from the opcode stream. Instead
				1637	the class object is taken off the stack, immediately above the
				1638	topmost markobject:
				1639
				1640	Stack before: ... markobject classobject stackslice
				1641	Stack after: ... new_instance_object
				1642
				1643	As for INST, the remainder of the stack above the markobject is
				1644	gathered into an argument tuple, and then the logic seems identical,
				1645	except that no __safe_for_unpickling__ check is done (XXX this smells
				1646	like a bug). See INST for the gory details.
				1647	"""),
				1648
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame^]	1649	I(name='NEWOBJ',
				1650	code='\x81',
				1651	arg=None,
				1652	stack_before=[anyobject, anyobject],
				1653	stack_after=[anyobject],
				1654	proto=2,
				1655	doc="""Build an object instance.
				1656
				1657	The stack before should be thought of as containing a class
				1658	object followed by an argument tuple (the tuple being the stack
				1659	top). Call these cls and args. They are popped off the stack,
				1660	and the value returned by cls.__new__(cls, *args) is pushed back
				1661	onto the stack.
				1662	"""),
				1663
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1664	# Machine control.
				1665
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame^]	1666	I(name='PROTO',
				1667	code='\x80',
				1668	arg=uint1,
				1669	stack_before=[],
				1670	stack_after=[],
				1671	proto=2,
				1672	doc="""Protocol version indicator.
				1673
				1674	For protocol 2 and above, a pickle must start with this opcode.
				1675	The argument is the protocol version, an int in range(2, 256).
				1676	"""),
				1677
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1678	I(name='STOP',
				1679	code='.',
				1680	arg=None,
				1681	stack_before=[anyobject],
				1682	stack_after=[],
				1683	proto=0,
				1684	doc="""Stop the unpickling machine.
				1685
				1686	Every pickle ends with this opcode. The object at the top of the stack
				1687	is popped, and that's the result of unpickling. The stack should be
				1688	empty then.
				1689	"""),
				1690
				1691	# Ways to deal with persistent IDs.
				1692
				1693	I(name='PERSID',
				1694	code='P',
				1695	arg=stringnl_noescape,
				1696	stack_before=[],
				1697	stack_after=[anyobject],
				1698	proto=0,
				1699	doc="""Push an object identified by a persistent ID.
				1700
				1701	The pickle module doesn't define what a persistent ID means. PERSID's
				1702	argument is a newline-terminated str-style (no embedded escapes, no
				1703	bracketing quote characters) string, which is "the persistent ID".
				1704	The unpickler passes this string to self.persistent_load(). Whatever
				1705	object that returns is pushed on the stack. There is no implementation
				1706	of persistent_load() in Python's unpickler: it must be supplied by an
				1707	unpickler subclass.
				1708	"""),
				1709
				1710	I(name='BINPERSID',
				1711	code='Q',
				1712	arg=None,
				1713	stack_before=[anyobject],
				1714	stack_after=[anyobject],
				1715	proto=1,
				1716	doc="""Push an object identified by a persistent ID.
				1717
				1718	Like PERSID, except the persistent ID is popped off the stack (instead
				1719	of being a string embedded in the opcode bytestream). The persistent
				1720	ID is passed to self.persistent_load(), and whatever object that
				1721	returns is pushed on the stack. See PERSID for more detail.
				1722	"""),
				1723	]
				1724	del I
				1725
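The PERSID and BINPERSID entries above say that persistent_load() must come from an unpickler subclass. A minimal sketch of that pairing follows. It assumes the Python 3 standard library `pickle` module rather than this historical revision, and `STORE`, `KeyPickler`, `KeyUnpickler` and `roundtrip` are hypothetical names made up for the illustration.

```python
import io
import pickle
import pickletools

# Hypothetical external store: the pickle records only a key into it.
STORE = {"cfg-1": {"debug": True}}


class KeyPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return a persistent ID for objects that live in STORE,
        # or None to have the object pickled normally.
        for key, value in STORE.items():
            if obj is value:
                return key
        return None


class KeyUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Receives the ID read from the pickle; the object returned
        # here is what the unpickler pushes on the stack.
        return STORE[pid]


def roundtrip(obj, protocol):
    """Pickle obj with KeyPickler, then load it back with KeyUnpickler."""
    buf = io.BytesIO()
    KeyPickler(buf, protocol).dump(obj)
    data = buf.getvalue()
    return data, KeyUnpickler(io.BytesIO(data)).load()


data0, obj0 = roundtrip([STORE["cfg-1"]], 0)
data1, obj1 = roundtrip([STORE["cfg-1"]], 1)

# Opcode names seen in each pickle, for comparison with the docs above.
names0 = [op.name for op, _, _ in pickletools.genops(data0)]
names1 = [op.name for op, _, _ in pickletools.genops(data1)]
```

Under these assumptions the text protocol embeds the ID in the bytestream (PERSID), while the binary protocol pushes the ID first and then pops it (BINPERSID).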
				1726	# Verify uniqueness of .name and .code members.
				1727	name2i = {}
				1728	code2i = {}
				1729
				1730	for i, d in enumerate(opcodes):
				1731	if d.name in name2i:
				1732	raise ValueError("repeated name %r at indices %d and %d" %
				1733	(d.name, name2i[d.name], i))
				1734	if d.code in code2i:
				1735	raise ValueError("repeated code %r at indices %d and %d" %
				1736	(d.code, code2i[d.code], i))
				1737
				1738	name2i[d.name] = i
				1739	code2i[d.code] = i
				1740
				1741	del name2i, code2i, i, d
				1742
				1743	##############################################################################
				1744	# Build a code2op dict, mapping opcode characters to OpcodeInfo records.
				1745	# Also ensure we've got the same stuff as pickle.py, although the
				1746	# introspection here is dicey.
				1747
				1748	code2op = {}
				1749	for d in opcodes:
				1750	code2op[d.code] = d
				1751	del d
				1752
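As a quick check on the table built above, the opcodes documented in this part of the file can be looked up by code. This is only a sketch against the Python 3 standard library `pickletools`, where `code2op` is keyed by one-character strings; `summary` and `names` are illustrative variables, not part of the module.

```python
import pickletools

# Codes of the opcodes described above: NEWOBJ, PROTO, STOP, PERSID, BINPERSID.
codes = ("\x81", "\x80", ".", "P", "Q")

# Map each opcode name to the protocol that introduced it.
summary = {pickletools.code2op[c].name: pickletools.code2op[c].proto
           for c in codes}

# The uniqueness property verified above: no name appears twice.
names = [op.name for op in pickletools.opcodes]
names_are_unique = len(names) == len(set(names))

for name, proto in summary.items():
    print(name, proto)
```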
				1753	def assure_pickle_consistency(verbose=False):
				1754	import pickle, re
				1755
				1756	copy = code2op.copy()
				1757	for name in pickle.__all__:
				1758	if not re.match("[A-Z][A-Z0-9_]+$", name):
				1759	if verbose:
				1760	print "skipping %r: it doesn't look like an opcode name" % name
				1761	continue
				1762	picklecode = getattr(pickle, name)
				1763	if not isinstance(picklecode, str) or len(picklecode) != 1:
				1764	if verbose:
				1765	print ("skipping %r: value %r doesn't look like a pickle "
				1766	"code" % (name, picklecode))
				1767	continue
				1768	if picklecode in copy:
				1769	if verbose:
				1770	print "checking name %r w/ code %r for consistency" % (
				1771	name, picklecode)
				1772	d = copy[picklecode]
				1773	if d.name != name:
				1774	raise ValueError("for pickle code %r, pickle.py uses name %r "
				1775	"but we're using name %r" % (picklecode,
				1776	name,
				1777	d.name))
				1778	# Forget this one. Any left over in copy at the end are a problem
				1779	# of a different kind.
				1780	del copy[picklecode]
				1781	else:
				1782	raise ValueError("pickle.py appears to have a pickle opcode with "
				1783	"name %r and code %r, but we don't" %
				1784	(name, picklecode))
				1785	if copy:
				1786	msg = ["we appear to have pickle opcodes that pickle.py doesn't have:"]
				1787	for code, d in copy.items():
				1788	msg.append(" name %r with code %r" % (d.name, code))
				1789	raise ValueError("\n".join(msg))
				1790
				1791	assure_pickle_consistency()
				1792
				1793	##############################################################################
				1794	# A pickle opcode generator.
				1795
				1796	def genops(pickle):
Guido van Rossum	a72ded9	2003-01-27 19:40:47 +0000	[diff] [blame]	1797	"""Generate all the opcodes in a pickle.
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1798
				1799	'pickle' is a file-like object, or string, containing the pickle.
				1800
				1801	Each opcode in the pickle is generated, from the current pickle position,
				1802	stopping after a STOP opcode is delivered. A triple is generated for
				1803	each opcode:
				1804
				1805	opcode, arg, pos
				1806
				1807	opcode is an OpcodeInfo record, describing the current opcode.
				1808
				1809	If the opcode has an argument embedded in the pickle, arg is its decoded
				1810	value, as a Python object. If the opcode doesn't have an argument, arg
				1811	is None.
				1812
				1813	If the pickle has a tell() method, pos was the value of pickle.tell()
				1814	before reading the current opcode. If the pickle is a string object,
				1815	it's wrapped in a StringIO object, and the latter's tell() result is
				1816	used. Else (the pickle doesn't have a tell(), and it's not obvious how
				1817	to query its current position) pos is None.
				1818	"""
				1819
				1820	import cStringIO as StringIO
				1821
				1822	if isinstance(pickle, str):
				1823	pickle = StringIO.StringIO(pickle)
				1824
				1825	if hasattr(pickle, "tell"):
				1826	getpos = pickle.tell
				1827	else:
				1828	getpos = lambda: None
				1829
				1830	while True:
				1831	pos = getpos()
				1832	code = pickle.read(1)
				1833	opcode = code2op.get(code)
				1834	if opcode is None:
				1835	if code == "":
				1836	raise ValueError("pickle exhausted before seeing STOP")
				1837	else:
				1838	raise ValueError("at position %s, opcode %r unknown" % (
				1839	pos is None and "<unknown>" or pos,
				1840	code))
				1841	if opcode.arg is None:
				1842	arg = None
				1843	else:
				1844	arg = opcode.arg.reader(pickle)
				1845	yield opcode, arg, pos
				1846	if code == '.':
				1847	assert opcode.name == 'STOP'
				1848	break
				1849
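A short usage sketch for genops() follows. It runs against the Python 3 standard library `pickletools`, which takes bytes rather than the str accepted by this revision, and the pickle is written by hand so the positions do not depend on the pickler. `data` and `triples` are illustrative names.

```python
import pickletools

# Hand-written protocol 0 pickle of the list [1, 2].
data = b"(lp0\nI1\naI2\na."

# Reduce each (opcode, arg, pos) triple to (name, arg, pos) for display.
triples = [(op.name, arg, pos) for op, arg, pos in pickletools.genops(data)]

for name, arg, pos in triples:
    print(pos, name, arg)
```

Opcodes without an embedded argument (MARK, LIST, APPEND, STOP) report arg as None, and iteration ends once STOP has been delivered.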
				1850	##############################################################################
				1851	# A symbolic pickle disassembler.
				1852
				1853	def dis(pickle, out=None, indentlevel=4):
				1854	"""Produce a symbolic disassembly of a pickle.
				1855
				1856	'pickle' is a file-like object, or string, containing at least one
				1857	pickle. The pickle is disassembled from the current position, through
				1858	the first STOP opcode encountered.
				1859
				1860	Optional arg 'out' is a file-like object to which the disassembly is
				1861	printed. It defaults to sys.stdout.
				1862
				1863	Optional arg indentlevel is the number of blanks by which to indent
				1864	a new MARK level. It defaults to 4.
				1865	"""
				1866
				1867	markstack = []
				1868	indentchunk = ' ' * indentlevel
				1869	for opcode, arg, pos in genops(pickle):
				1870	if pos is not None:
				1871	print >> out, "%5d:" % pos,
				1872
				1873	line = "%s %s%s" % (opcode.code,
				1874	indentchunk * len(markstack),
				1875	opcode.name)
				1876
				1877	markmsg = None
				1878	if markstack and markobject in opcode.stack_before:
				1879	assert markobject not in opcode.stack_after
				1880	markpos = markstack.pop()
				1881	if markpos is not None:
				1882	markmsg = "(MARK at %d)" % markpos
				1883
				1884	if arg is not None or markmsg:
				1885	# make a mild effort to align arguments
				1886	line += ' ' * (10 - len(opcode.name))
				1887	if arg is not None:
				1888	line += ' ' + repr(arg)
				1889	if markmsg:
				1890	line += ' ' + markmsg
				1891	print >> out, line
				1892
				1893	if markobject in opcode.stack_after:
				1894	assert markobject not in opcode.stack_before
				1895	markstack.append(pos)
				1896
				1897
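The 'out' argument of dis() makes it possible to capture the listing instead of printing it. The sketch below assumes the Python 3 standard library `pickletools`, whose dis() takes extra optional arguments beyond those shown in this revision. `data` and `listing` are illustrative names.

```python
import io
import pickletools

# Hand-written protocol 0 pickle of the list [1, 2].
data = b"(lp0\nI1\naI2\na."

# Send the disassembly to an in-memory text buffer rather than sys.stdout.
out = io.StringIO()
pickletools.dis(data, out=out)
listing = out.getvalue()

print(listing)
```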
				1898	_dis_test = """
				1899	>>> import pickle
				1900	>>> x = [1, 2, (3, 4), {'abc': u"def"}]
Guido van Rossum	f29d3d6	2003-01-27 22:47:53 +0000	[diff] [blame]	1901	>>> pik = pickle.dumps(x, 0)
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1902	>>> dis(pik)
				1903	0: ( MARK
				1904	1: l LIST (MARK at 0)
				1905	2: p PUT 0
				1906	5: I INT 1
				1907	8: a APPEND
				1908	9: I INT 2
				1909	12: a APPEND
				1910	13: ( MARK
				1911	14: I INT 3
				1912	17: I INT 4
				1913	20: t TUPLE (MARK at 13)
				1914	21: p PUT 1
				1915	24: a APPEND
				1916	25: ( MARK
				1917	26: d DICT (MARK at 25)
				1918	27: p PUT 2
				1919	30: S STRING 'abc'
				1920	37: p PUT 3
				1921	40: V UNICODE u'def'
				1922	45: p PUT 4
				1923	48: s SETITEM
				1924	49: a APPEND
				1925	50: . STOP
				1926
				1927	Try again with a "binary" pickle.
				1928
				1929	>>> pik = pickle.dumps(x, 1)
				1930	>>> dis(pik)
				1931	0: ] EMPTY_LIST
				1932	1: q BINPUT 0
				1933	3: ( MARK
				1934	4: K BININT1 1
				1935	6: K BININT1 2
				1936	8: ( MARK
				1937	9: K BININT1 3
				1938	11: K BININT1 4
				1939	13: t TUPLE (MARK at 8)
				1940	14: q BINPUT 1
				1941	16: } EMPTY_DICT
				1942	17: q BINPUT 2
				1943	19: U SHORT_BINSTRING 'abc'
				1944	24: q BINPUT 3
				1945	26: X BINUNICODE u'def'
				1946	34: q BINPUT 4
				1947	36: s SETITEM
				1948	37: e APPENDS (MARK at 3)
				1949	38: . STOP
				1950
				1951	Exercise the INST/OBJ/BUILD family.
				1952
				1953	>>> import random
Guido van Rossum	f29d3d6	2003-01-27 22:47:53 +0000	[diff] [blame]	1954	>>> dis(pickle.dumps(random.random, 0))
Tim Peters	d916cf4	2003-01-27 19:01:47 +0000	[diff] [blame]	1955	0: c GLOBAL 'random random'
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1956	15: p PUT 0
				1957	18: . STOP
				1958
				1959	>>> x = [pickle.PicklingError()] * 2
Guido van Rossum	f29d3d6	2003-01-27 22:47:53 +0000	[diff] [blame]	1960	>>> dis(pickle.dumps(x, 0))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1961	0: ( MARK
				1962	1: l LIST (MARK at 0)
				1963	2: p PUT 0
				1964	5: ( MARK
Tim Peters	d916cf4	2003-01-27 19:01:47 +0000	[diff] [blame]	1965	6: i INST 'pickle PicklingError' (MARK at 5)
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1966	28: p PUT 1
				1967	31: ( MARK
				1968	32: d DICT (MARK at 31)
				1969	33: p PUT 2
				1970	36: S STRING 'args'
				1971	44: p PUT 3
				1972	47: ( MARK
				1973	48: t TUPLE (MARK at 47)
				1974	49: p PUT 4
				1975	52: s SETITEM
				1976	53: b BUILD
				1977	54: a APPEND
				1978	55: g GET 1
				1979	58: a APPEND
				1980	59: . STOP
				1981
				1982	>>> dis(pickle.dumps(x, 1))
				1983	0: ] EMPTY_LIST
				1984	1: q BINPUT 0
				1985	3: ( MARK
				1986	4: ( MARK
Tim Peters	d916cf4	2003-01-27 19:01:47 +0000	[diff] [blame]	1987	5: c GLOBAL 'pickle PicklingError'
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1988	27: q BINPUT 1
				1989	29: o OBJ (MARK at 4)
				1990	30: q BINPUT 2
				1991	32: } EMPTY_DICT
				1992	33: q BINPUT 3
				1993	35: U SHORT_BINSTRING 'args'
				1994	41: q BINPUT 4
				1995	43: ) EMPTY_TUPLE
				1996	44: s SETITEM
				1997	45: b BUILD
				1998	46: h BINGET 2
				1999	48: e APPENDS (MARK at 3)
				2000	49: . STOP
				2001
				2002	Try "the canonical" recursive-object test.
				2003
				2004	>>> L = []
				2005	>>> T = L,
				2006	>>> L.append(T)
				2007	>>> L[0] is T
				2008	True
				2009	>>> T[0] is L
				2010	True
				2011	>>> L[0][0] is L
				2012	True
				2013	>>> T[0][0] is T
				2014	True
Guido van Rossum	f29d3d6	2003-01-27 22:47:53 +0000	[diff] [blame]	2015	>>> dis(pickle.dumps(L, 0))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	2016	0: ( MARK
				2017	1: l LIST (MARK at 0)
				2018	2: p PUT 0
				2019	5: ( MARK
				2020	6: g GET 0
				2021	9: t TUPLE (MARK at 5)
				2022	10: p PUT 1
				2023	13: a APPEND
				2024	14: . STOP
				2025	>>> dis(pickle.dumps(L, 1))
				2026	0: ] EMPTY_LIST
				2027	1: q BINPUT 0
				2028	3: ( MARK
				2029	4: h BINGET 0
				2030	6: t TUPLE (MARK at 3)
				2031	7: q BINPUT 1
				2032	9: a APPEND
				2033	10: . STOP
				2034
				2035	The protocol 0 pickle of the tuple causes the disassembly to get confused,
				2036	as it doesn't realize that the POP opcode at 16 gets rid of the MARK at 0
				2037	(so the output remains indented until the end). The protocol 1 pickle
				2038	doesn't trigger this glitch, because the disassembler realizes that
				2039	POP_MARK gets rid of the MARK. Doing a better job on the protocol 0
				2040	pickle would require the disassembler to emulate the stack.
				2041
Guido van Rossum	f29d3d6	2003-01-27 22:47:53 +0000	[diff] [blame]	2042	>>> dis(pickle.dumps(T, 0))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	2043	0: ( MARK
				2044	1: ( MARK
				2045	2: l LIST (MARK at 1)
				2046	3: p PUT 0
				2047	6: ( MARK
				2048	7: g GET 0
				2049	10: t TUPLE (MARK at 6)
				2050	11: p PUT 1
				2051	14: a APPEND
				2052	15: 0 POP
				2053	16: 0 POP
				2054	17: g GET 1
				2055	20: . STOP
				2056	>>> dis(pickle.dumps(T, 1))
				2057	0: ( MARK
				2058	1: ] EMPTY_LIST
				2059	2: q BINPUT 0
				2060	4: ( MARK
				2061	5: h BINGET 0
				2062	7: t TUPLE (MARK at 4)
				2063	8: q BINPUT 1
				2064	10: a APPEND
				2065	11: 1 POP_MARK (MARK at 0)
				2066	12: h BINGET 1
				2067	14: . STOP
				2068	"""
				2069
				2070	__test__ = {'disassembler_test': _dis_test,
				2071	}
				2072
				2073	def _test():
				2074	import doctest
				2075	return doctest.testmod()
				2076
				2077	if __name__ == "__main__":
				2078	_test()
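The recursive-object doctest above only inspects the opcodes. A complementary sketch, assuming the Python 3 standard library `pickle`, checks that the self-references survive a round trip under protocols 0 and 1. `results` is an illustrative name.

```python
import pickle

# The "canonical" recursive pair: L contains T, and T contains L.
L = []
T = (L,)
L.append(T)

results = {}
for protocol in (0, 1):
    L2 = pickle.loads(pickle.dumps(L, protocol))
    T2 = pickle.loads(pickle.dumps(T, protocol))
    # Record whether each copy still refers back to itself.
    results[protocol] = (L2[0][0] is L2, T2[0][0] is T2)

print(results)
```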