Blame - Lib/pickletools.py - platform/external/python/cpython3

blob: 183db1022b35da9ca51d6ed31957ded295f5d0e6

Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1	""""Executable documentation" for the pickle module.
				2
				3	Extensive comments about the pickle protocols and pickle-machine opcodes
				4	can be found here. Some functions meant for external use:
				5
				6	genops(pickle)
				7	Generate all the opcodes in a pickle, as (opcode, arg, position) triples.
				8
				9	dis(pickle, out=None, indentlevel=4)
				10	Print a symbolic disassembly of a pickle.
				11	"""
				12
				13	# Other ideas:
				14	#
				15	# - A pickle verifier: read a pickle and check it exhaustively for
				16	# well-formedness.
				17	#
				18	# - A protocol identifier: examine a pickle and return its protocol number
				19	# (== the highest .proto attr value among all the opcodes in the pickle).
				20	#
				21	# - A pickle optimizer: for example, tuple-building code is sometimes more
				22	# elaborate than necessary, catering for the possibility that the tuple
				23	# is recursive. Or lots of times a PUT is generated that's never accessed
				24	# by a later GET.
				25
				26
				27	"""
				28	"A pickle" is a program for a virtual pickle machine (PM, but more accurately
				29	called an unpickling machine). It's a sequence of opcodes, interpreted by the
				30	PM, building an arbitrarily complex Python object.
				31
				32	For the most part, the PM is very simple: there are no looping, testing, or
				33	conditional instructions, no arithmetic and no function calls. Opcodes are
				34	executed once each, from first to last, until a STOP opcode is reached.
				35
				36	The PM has two data areas, "the stack" and "the memo".
				37
				38	Many opcodes push Python objects onto the stack; e.g., INT pushes a Python
				39	integer object on the stack, whose value is gotten from a decimal string
				40	literal immediately following the INT opcode in the pickle bytestream. Other
				41	opcodes take Python objects off the stack. The result of unpickling is
				42	whatever object is left on the stack when the final STOP opcode is executed.
				43
				44	The memo is simply an array of objects, or it can be implemented as a dict
				45	mapping little integers to objects. The memo serves as the PM's "long term
				46	memory", and the little integers indexing the memo are akin to variable
				47	names. Some opcodes pop a stack object into the memo at a given index,
				48	and others push a memo object at a given index onto the stack again.
				49
				50	At heart, that's all the PM has. Subtleties arise for these reasons:
				51
				52	+ Object identity. Objects can be arbitrarily complex, and subobjects
				53	may be shared (for example, the list [a, a] refers to the same object a
				54	twice). It can be vital that unpickling recreate an isomorphic object
				55	graph, faithfully reproducing sharing.
				56
				57	+ Recursive objects. For example, after "L = []; L.append(L)", L is a
				58	list, and L[0] is the same list. This is related to the object identity
				59	point, and some sequences of pickle opcodes are subtle in order to
				60	get the right result in all cases.
				61
				62	+ Things pickle doesn't know everything about. Examples of things pickle
				63	does know everything about are Python's builtin scalar and container
				64	types, like ints and tuples. They generally have opcodes dedicated to
				65	them. For things like module references and instances of user-defined
				66	classes, pickle's knowledge is limited. Historically, many enhancements
				67	have been made to the pickle protocol in order to do a better (faster,
				68	and/or more compact) job on those.
				69
				70	+ Backward compatibility and micro-optimization. As explained below,
				71	pickle opcodes never go away, not even when better ways to do a thing
				72	get invented. The repertoire of the PM just keeps growing over time.
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame]	73	For example, protocol 0 had two opcodes for building Python integers (INT
				74	and LONG), protocol 1 added three more for more-efficient pickling of short
				75	integers, and protocol 2 added two more for more-efficient pickling of
				76	long integers (before protocol 2, the only ways to pickle a Python long
				77	took time quadratic in the number of digits, for both pickling and
				78	unpickling). "Opcode bloat" isn't so much a subtlety as a source of
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	79	wearying complication.
				80
				81
				82	Pickle protocols:
				83
				84	For compatibility, the meaning of a pickle opcode never changes. Instead new
				85	pickle opcodes get added, and each version's unpickler can handle all the
				86	pickle opcodes in all protocol versions to date. So old pickles continue to
				87	be readable forever. The pickler can generally be told to restrict itself to
				88	the subset of opcodes available under previous protocol versions too, so that
				89	users can create pickles under the current version readable by older
				90	versions. However, a pickle does not contain its version number embedded
				91	within it. If an older unpickler tries to read a pickle using a later
				92	protocol, the result is most likely an exception due to seeing an unknown (in
				93	the older unpickler) opcode.
				94
				95	The original pickle used what's now called "protocol 0", and what was called
				96	"text mode" before Python 2.3. The entire pickle bytestream is made up of
				97	printable 7-bit ASCII characters, plus the newline character, in protocol 0.
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame]	98	That's why it was called text mode. Protocol 0 is small and elegant, but
				99	sometimes painfully inefficient.
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	100
				101	The second major set of additions is now called "protocol 1", and was called
				102	"binary mode" before Python 2.3. This added many opcodes with arguments
				103	consisting of arbitrary bytes, including NUL bytes and unprintable "high bit"
				104	bytes. Binary mode pickles can be substantially smaller than equivalent
				105	text mode pickles, and sometimes faster too; e.g., BININT represents a 4-byte
				106	int as 4 bytes following the opcode, which is cheaper to unpickle than the
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame]	107	(perhaps) 11-character decimal string attached to INT. Protocol 1 also added
				108	a number of opcodes that operate on many stack elements at once (like APPENDS
Tim Peters	81098ac	2003-01-28 05:12:08 +0000	[diff] [blame]	109	and SETITEMS), and "shortcut" opcodes (like EMPTY_DICT and EMPTY_TUPLE).
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	110
				111	The third major set of additions came in Python 2.3, and is called "protocol
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame]	112	2". This added:
				113
				114	- A better way to pickle instances of new-style classes (NEWOBJ).
				115
				116	- A way for a pickle to identify its protocol (PROTO).
				117
				118	- Time- and space-efficient pickling of long ints (LONG{1,4}).
				119
				120	- Shortcuts for small tuples (TUPLE{1,2,3}).
				121
				122	- Dedicated opcodes for bools (NEWTRUE, NEWFALSE).
				123
				124	- The "extension registry", a vector of popular objects that can be pushed
				125	efficiently by index (EXT{1,2,4}). This is akin to the memo and GET, but
				126	the registry contents are predefined (there's nothing akin to the memo's
				127	PUT).
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	128	"""
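The protocol notes above can be illustrated with a short sketch. It assumes a modern Python 3 interpreter and its standard `pickle` and `pickletools` modules, not the Python 2 code in this listing; the helper name `opcode_names` is made up for this example. It lists the opcode names emitted for the same small integer under protocols 0, 1, and 2, then checks that sharing and recursion survive a round trip.

```python
import pickle
import pickletools


def opcode_names(obj, protocol):
    """Return the opcode names used to pickle obj under the given protocol."""
    data = pickle.dumps(obj, protocol=protocol)
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]


# The same small int draws on a different opcode repertoire per protocol.
names_by_protocol = {p: opcode_names(255, p) for p in (0, 1, 2)}

# Object identity: both slots of the restored list refer to one object.
a = []
shared = pickle.loads(pickle.dumps([a, a]))

# Recursive object: the restored list contains itself.
L = []
L.append(L)
recursive = pickle.loads(pickle.dumps(L))
```

Printing `names_by_protocol` is an easy way to see the "opcode bloat" described above: only the protocol 2 pickle starts with PROTO, and the binary protocols use a compact BININT1 where protocol 0 uses a decimal text opcode.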
				129
				130	# Meta-rule: Descriptions are stored in instances of descriptor objects,
				131	# with plain constructors. No meta-language is defined from which
				132	# descriptors could be constructed. If you want, e.g., XML, write a little
				133	# program to generate XML from the objects.
				134
				135	##############################################################################
				136	# Some pickle opcodes have an argument, following the opcode in the
				137	# bytestream. An argument is of a specific type, described by an instance
				138	# of ArgumentDescriptor. These are not to be confused with arguments taken
				139	# off the stack -- ArgumentDescriptor applies only to arguments embedded in
				140	# the opcode stream, immediately following an opcode.
				141
				142	# Represents the number of bytes consumed by an argument delimited by the
				143	# next newline character.
				144	UP_TO_NEWLINE = -1
				145
				146	# Represents the number of bytes consumed by a two-argument opcode where
				147	# the first argument gives the number of bytes in the second argument.
Tim Peters	fdb8cfa	2003-01-28 00:13:19 +0000	[diff] [blame]	148	TAKEN_FROM_ARGUMENT1 = -2 # num bytes is 1-byte unsigned int
				149	TAKEN_FROM_ARGUMENT4 = -3 # num bytes is 4-byte signed little-endian int
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	150
				151	class ArgumentDescriptor(object):
				152	__slots__ = (
				153	# name of descriptor record, also a module global name; a string
				154	'name',
				155
				156	# length of argument, in bytes; an int; UP_TO_NEWLINE and
Tim Peters	fdb8cfa	2003-01-28 00:13:19 +0000	[diff] [blame]	157	# TAKEN_FROM_ARGUMENT{1,4} are negative values for variable-length
				158	# cases
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	159	'n',
				160
				161	# a function taking a file-like object, reading this kind of argument
				162	# from the object at the current position, advancing the current
				163	# position by n bytes, and returning the value of the argument
				164	'reader',
				165
				166	# human-readable docs for this arg descriptor; a string
				167	'doc',
				168	)
				169
				170	def __init__(self, name, n, reader, doc):
				171	assert isinstance(name, str)
				172	self.name = name
				173
				174	assert isinstance(n, int) and (n >= 0 or
Tim Peters	fdb8cfa	2003-01-28 00:13:19 +0000	[diff] [blame]	175	n in (UP_TO_NEWLINE,
				176	TAKEN_FROM_ARGUMENT1,
				177	TAKEN_FROM_ARGUMENT4))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	178	self.n = n
				179
				180	self.reader = reader
				181
				182	assert isinstance(doc, str)
				183	self.doc = doc
				184
				185	from struct import unpack as _unpack
				186
				187	def read_uint1(f):
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	188	r"""
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	189	>>> import StringIO
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	190	>>> read_uint1(StringIO.StringIO('\xff'))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	191	255
				192	"""
				193
				194	data = f.read(1)
				195	if data:
				196	return ord(data)
				197	raise ValueError("not enough data in stream to read uint1")
				198
				199	uint1 = ArgumentDescriptor(
				200	name='uint1',
				201	n=1,
				202	reader=read_uint1,
				203	doc="One-byte unsigned integer.")
				204
				205
				206	def read_uint2(f):
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	207	r"""
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	208	>>> import StringIO
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	209	>>> read_uint2(StringIO.StringIO('\xff\x00'))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	210	255
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	211	>>> read_uint2(StringIO.StringIO('\xff\xff'))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	212	65535
				213	"""
				214
				215	data = f.read(2)
				216	if len(data) == 2:
				217	return _unpack("<H", data)[0]
				218	raise ValueError("not enough data in stream to read uint2")
				219
				220	uint2 = ArgumentDescriptor(
				221	name='uint2',
				222	n=2,
				223	reader=read_uint2,
				224	doc="Two-byte unsigned integer, little-endian.")
				225
				226
				227	def read_int4(f):
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	228	r"""
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	229	>>> import StringIO
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	230	>>> read_int4(StringIO.StringIO('\xff\x00\x00\x00'))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	231	255
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	232	>>> read_int4(StringIO.StringIO('\x00\x00\x00\x80')) == -(2**31)
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	233	True
				234	"""
				235
				236	data = f.read(4)
				237	if len(data) == 4:
				238	return _unpack("<i", data)[0]
				239	raise ValueError("not enough data in stream to read int4")
				240
				241	int4 = ArgumentDescriptor(
				242	name='int4',
				243	n=4,
				244	reader=read_int4,
				245	doc="Four-byte signed integer, little-endian, 2's complement.")
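The doctests for the fixed-width readers above are written for Python 2 (`StringIO` and byte-valued `str`). As a rough Python 3 sketch of the same idea, assuming `io.BytesIO` as the file-like object, one generic helper can cover uint1, uint2, and int4. The name `read_fixed` is hypothetical and is not part of this module.

```python
import io
import struct


def read_fixed(f, fmt, what):
    """Read one struct-formatted value from f, or raise ValueError if short."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("not enough data in stream to read %s" % what)
    return struct.unpack(fmt, data)[0]


# "<B" is a 1-byte unsigned int, "<H" a 2-byte unsigned little-endian int,
# and "<i" a 4-byte signed little-endian int.
uint1_value = read_fixed(io.BytesIO(b"\xff"), "<B", "uint1")
uint2_value = read_fixed(io.BytesIO(b"\xff\xff"), "<H", "uint2")
int4_value = read_fixed(io.BytesIO(b"\x00\x00\x00\x80"), "<i", "int4")
```

The explicit `<` prefix matters: it fixes both byte order and size, so the result does not depend on the machine that reads the pickle.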
				246
				247
				248	def read_stringnl(f, decode=True, stripquotes=True):
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	249	r"""
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	250	>>> import StringIO
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	251	>>> read_stringnl(StringIO.StringIO("'abcd'\nefg\n"))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	252	'abcd'
				253
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	254	>>> read_stringnl(StringIO.StringIO("\n"))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	255	Traceback (most recent call last):
				256	...
				257	ValueError: no string quotes around ''
				258
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	259	>>> read_stringnl(StringIO.StringIO("\n"), stripquotes=False)
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	260	''
				261
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	262	>>> read_stringnl(StringIO.StringIO("''\n"))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	263	''
				264
				265	>>> read_stringnl(StringIO.StringIO('"abcd"'))
				266	Traceback (most recent call last):
				267	...
				268	ValueError: no newline found when trying to read stringnl
				269
				270	Embedded escapes are undone in the result.
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	271	>>> read_stringnl(StringIO.StringIO(r"'a\n\\b\x00c\td'" + "\n'e'"))
				272	'a\n\\b\x00c\td'
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	273	"""
				274
				275	data = f.readline()
				276	if not data.endswith('\n'):
				277	raise ValueError("no newline found when trying to read stringnl")
				278	data = data[:-1] # lose the newline
				279
				280	if stripquotes:
				281	for q in "'\"":
				282	if data.startswith(q):
				283	if not data.endswith(q):
				284	raise ValueError("string quote %r not found at both "
				285	"ends of %r" % (q, data))
				286	data = data[1:-1]
				287	break
				288	else:
				289	raise ValueError("no string quotes around %r" % data)
				290
				291	# I'm not sure when 'string_escape' was added to the std codecs; it's
				292	# crazy not to use it if it's there.
				293	if decode:
				294	data = data.decode('string_escape')
				295	return data
				296
				297	stringnl = ArgumentDescriptor(
				298	name='stringnl',
				299	n=UP_TO_NEWLINE,
				300	reader=read_stringnl,
				301	doc="""A newline-terminated string.
				302
				303	This is a repr-style string, with embedded escapes, and
				304	bracketing quotes.
				305	""")
				306
				307	def read_stringnl_noescape(f):
				308	return read_stringnl(f, decode=False, stripquotes=False)
				309
				310	stringnl_noescape = ArgumentDescriptor(
				311	name='stringnl_noescape',
				312	n=UP_TO_NEWLINE,
				313	reader=read_stringnl_noescape,
				314	doc="""A newline-terminated string.
				315
				316	This is a str-style string, without embedded escapes,
				317	or bracketing quotes. It should consist solely of
				318	printable ASCII characters.
				319	""")
				320
				321	def read_stringnl_noescape_pair(f):
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	322	r"""
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	323	>>> import StringIO
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	324	>>> read_stringnl_noescape_pair(StringIO.StringIO("Queue\nEmpty\njunk"))
Tim Peters	d916cf4	2003-01-27 19:01:47 +0000	[diff] [blame]	325	'Queue Empty'
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	326	"""
				327
Tim Peters	d916cf4	2003-01-27 19:01:47 +0000	[diff] [blame]	328	return "%s %s" % (read_stringnl_noescape(f), read_stringnl_noescape(f))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	329
				330	stringnl_noescape_pair = ArgumentDescriptor(
				331	name='stringnl_noescape_pair',
				332	n=UP_TO_NEWLINE,
				333	reader=read_stringnl_noescape_pair,
				334	doc="""A pair of newline-terminated strings.
				335
				336	These are str-style strings, without embedded
				337	escapes, or bracketing quotes. They should
				338	consist solely of printable ASCII characters.
				339	The pair is returned as a single string, with
Tim Peters	d916cf4	2003-01-27 19:01:47 +0000	[diff] [blame]	340	a single blank separating the two strings.
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	341	""")
				342
				343	def read_string4(f):
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	344	r"""
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	345	>>> import StringIO
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	346	>>> read_string4(StringIO.StringIO("\x00\x00\x00\x00abc"))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	347	''
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	348	>>> read_string4(StringIO.StringIO("\x03\x00\x00\x00abcdef"))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	349	'abc'
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	350	>>> read_string4(StringIO.StringIO("\x00\x00\x00\x03abcdef"))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	351	Traceback (most recent call last):
				352	...
				353	ValueError: expected 50331648 bytes in a string4, but only 6 remain
				354	"""
				355
				356	n = read_int4(f)
				357	if n < 0:
				358	raise ValueError("string4 byte count < 0: %d" % n)
				359	data = f.read(n)
				360	if len(data) == n:
				361	return data
				362	raise ValueError("expected %d bytes in a string4, but only %d remain" %
				363	(n, len(data)))
				364
				365	string4 = ArgumentDescriptor(
				366	name="string4",
Tim Peters	fdb8cfa	2003-01-28 00:13:19 +0000	[diff] [blame]	367	n=TAKEN_FROM_ARGUMENT4,
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	368	reader=read_string4,
				369	doc="""A counted string.
				370
				371	The first argument is a 4-byte little-endian signed int giving
				372	the number of bytes in the string, and the second argument is
				373	that many bytes.
				374	""")
				375
				376
				377	def read_string1(f):
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	378	r"""
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	379	>>> import StringIO
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	380	>>> read_string1(StringIO.StringIO("\x00"))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	381	''
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	382	>>> read_string1(StringIO.StringIO("\x03abcdef"))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	383	'abc'
				384	"""
				385
				386	n = read_uint1(f)
				387	assert n >= 0
				388	data = f.read(n)
				389	if len(data) == n:
				390	return data
				391	raise ValueError("expected %d bytes in a string1, but only %d remain" %
				392	(n, len(data)))
				393
				394	string1 = ArgumentDescriptor(
				395	name="string1",
Tim Peters	fdb8cfa	2003-01-28 00:13:19 +0000	[diff] [blame]	396	n=TAKEN_FROM_ARGUMENT1,
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	397	reader=read_string1,
				398	doc="""A counted string.
				399
				400	The first argument is a 1-byte unsigned int giving the number
				401	of bytes in the string, and the second argument is that many
				402	bytes.
				403	""")
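The two counted-string formats above differ only in the width of the length prefix. A minimal Python 3 sketch of that pattern follows, assuming `io.BytesIO` input; `read_counted_bytes` and its `count_format` parameter are hypothetical names chosen for illustration.

```python
import io
import struct


def read_counted_bytes(f, count_format, what):
    """Read a length prefix in count_format, then that many payload bytes."""
    size = struct.calcsize(count_format)
    header = f.read(size)
    if len(header) != size:
        raise ValueError("not enough data in stream to read %s count" % what)
    (n,) = struct.unpack(count_format, header)
    if n < 0:
        # Only possible for the signed 4-byte prefix.
        raise ValueError("%s byte count < 0: %d" % (what, n))
    data = f.read(n)
    if len(data) != n:
        raise ValueError("expected %d bytes in a %s, but only %d remain"
                         % (n, what, len(data)))
    return data


# string4 style: 4-byte signed little-endian count.
four = read_counted_bytes(io.BytesIO(b"\x03\x00\x00\x00abcdef"), "<i", "string4")
# string1 style: 1-byte unsigned count.
one = read_counted_bytes(io.BytesIO(b"\x03abcdef"), "<B", "string1")
```

Feeding the 4-byte reader a big-endian count such as `b"\x00\x00\x00\x03abcdef"` reproduces the failure in the doctest above: the count is read as 50331648 and the payload comes up short.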
				404
				405
				406	def read_unicodestringnl(f):
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	407	r"""
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	408	>>> import StringIO
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	409	>>> read_unicodestringnl(StringIO.StringIO("abc\uabcd\njunk"))
				410	u'abc\uabcd'
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	411	"""
				412
				413	data = f.readline()
				414	if not data.endswith('\n'):
				415	raise ValueError("no newline found when trying to read "
				416	"unicodestringnl")
				417	data = data[:-1] # lose the newline
				418	return unicode(data, 'raw-unicode-escape')
				419
				420	unicodestringnl = ArgumentDescriptor(
				421	name='unicodestringnl',
				422	n=UP_TO_NEWLINE,
				423	reader=read_unicodestringnl,
				424	doc="""A newline-terminated Unicode string.
				425
				426	This is raw-unicode-escape encoded, so consists of
				427	printable ASCII characters, and may contain embedded
				428	escape sequences.
				429	""")
				430
				431	def read_unicodestring4(f):
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	432	r"""
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	433	>>> import StringIO
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	434	>>> s = u'abcd\uabcd'
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	435	>>> enc = s.encode('utf-8')
				436	>>> enc
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	437	'abcd\xea\xaf\x8d'
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	438	>>> n = chr(len(enc)) + chr(0) * 3 # little-endian 4-byte length
				439	>>> t = read_unicodestring4(StringIO.StringIO(n + enc + 'junk'))
				440	>>> s == t
				441	True
				442
				443	>>> read_unicodestring4(StringIO.StringIO(n + enc[:-1]))
				444	Traceback (most recent call last):
				445	...
				446	ValueError: expected 7 bytes in a unicodestring4, but only 6 remain
				447	"""
				448
				449	n = read_int4(f)
				450	if n < 0:
				451	raise ValueError("unicodestring4 byte count < 0: %d" % n)
				452	data = f.read(n)
				453	if len(data) == n:
				454	return unicode(data, 'utf-8')
				455	raise ValueError("expected %d bytes in a unicodestring4, but only %d "
				456	"remain" % (n, len(data)))
				457
				458	unicodestring4 = ArgumentDescriptor(
				459	name="unicodestring4",
Tim Peters	fdb8cfa	2003-01-28 00:13:19 +0000	[diff] [blame]	460	n=TAKEN_FROM_ARGUMENT4,
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	461	reader=read_unicodestring4,
				462	doc="""A counted Unicode string.
				463
				464	The first argument is a 4-byte little-endian signed int
				465	giving the number of bytes in the string, and the second
				466	argument -- the UTF-8 encoding of the Unicode string --
				467	contains that many bytes.
				468	""")
				469
				470
				471	def read_decimalnl_short(f):
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	472	r"""
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	473	>>> import StringIO
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	474	>>> read_decimalnl_short(StringIO.StringIO("1234\n56"))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	475	1234
				476
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	477	>>> read_decimalnl_short(StringIO.StringIO("1234L\n56"))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	478	Traceback (most recent call last):
				479	...
				480	ValueError: trailing 'L' not allowed in '1234L'
				481	"""
				482
				483	s = read_stringnl(f, decode=False, stripquotes=False)
				484	if s.endswith("L"):
				485	raise ValueError("trailing 'L' not allowed in %r" % s)
				486
				487	# It's not necessarily true that the result fits in a Python short int:
				488	# the pickle may have been written on a 64-bit box. There's also a hack
				489	# for True and False here.
				490	if s == "00":
				491	return False
				492	elif s == "01":
				493	return True
				494
				495	try:
				496	return int(s)
				497	except OverflowError:
				498	return long(s)
				499
				500	def read_decimalnl_long(f):
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	501	r"""
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	502	>>> import StringIO
				503
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	504	>>> read_decimalnl_long(StringIO.StringIO("1234\n56"))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	505	Traceback (most recent call last):
				506	...
				507	ValueError: trailing 'L' required in '1234'
				508
				509	Someday the trailing 'L' will probably go away from this output.
				510
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	511	>>> read_decimalnl_long(StringIO.StringIO("1234L\n56"))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	512	1234L
				513
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	514	>>> read_decimalnl_long(StringIO.StringIO("123456789012345678901234L\n6"))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	515	123456789012345678901234L
				516	"""
				517
				518	s = read_stringnl(f, decode=False, stripquotes=False)
				519	if not s.endswith("L"):
				520	raise ValueError("trailing 'L' required in %r" % s)
				521	return long(s)
				522
				523
				524	decimalnl_short = ArgumentDescriptor(
				525	name='decimalnl_short',
				526	n=UP_TO_NEWLINE,
				527	reader=read_decimalnl_short,
				528	doc="""A newline-terminated decimal integer literal.
				529
				530	This never has a trailing 'L', and the integer fit
				531	in a short Python int on the box where the pickle
				532	was written -- but there's no guarantee it will fit
				533	in a short Python int on the box where the pickle
				534	is read.
				535	""")
				536
				537	decimalnl_long = ArgumentDescriptor(
				538	name='decimalnl_long',
				539	n=UP_TO_NEWLINE,
				540	reader=read_decimalnl_long,
				541	doc="""A newline-terminated decimal integer literal.
				542
				543	This has a trailing 'L', and can represent integers
				544	of any size.
				545	""")
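The "hack for True and False" in read_decimalnl_short can be observed from the outside. This sketch assumes a Python 3 interpreter, whose standard unpickler still gives the INT arguments "00" and "01" the same special treatment; it hand-writes three tiny protocol 0 pickles and loads them.

```python
import pickle

# Each pickle is an INT opcode ("I"), a newline-terminated decimal
# argument, and a STOP opcode (".").
restored_true = pickle.loads(b"I01\n.")
restored_false = pickle.loads(b"I00\n.")
restored_one = pickle.loads(b"I1\n.")

# Round trip of a real bool through protocol 0.
round_tripped = pickle.loads(pickle.dumps(True, protocol=0))
```

Only the exact two-character arguments are treated as bools; a plain "1" comes back as an ordinary integer.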
				546
				547
				548	def read_floatnl(f):
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	549	r"""
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	550	>>> import StringIO
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	551	>>> read_floatnl(StringIO.StringIO("-1.25\n6"))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	552	-1.25
				553	"""
				554	s = read_stringnl(f, decode=False, stripquotes=False)
				555	return float(s)
				556
				557	floatnl = ArgumentDescriptor(
				558	name='floatnl',
				559	n=UP_TO_NEWLINE,
				560	reader=read_floatnl,
				561	doc="""A newline-terminated decimal floating literal.
				562
				563	In general this requires 17 significant digits for roundtrip
				564	identity, and pickling then unpickling infinities, NaNs, and
				565	minus zero doesn't work across boxes, or on some boxes even
				566	on itself (e.g., Windows can't read the strings it produces
				567	for infinities or NaNs).
				568	""")
				569
				570	def read_float8(f):
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	571	r"""
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	572	>>> import StringIO, struct
				573	>>> raw = struct.pack(">d", -1.25)
				574	>>> raw
Tim Peters	55762f5	2003-01-28 16:01:25 +0000	[diff] [blame]	575	'\xbf\xf4\x00\x00\x00\x00\x00\x00'
				576	>>> read_float8(StringIO.StringIO(raw + "\n"))
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	577	-1.25
				578	"""
				579
				580	data = f.read(8)
				581	if len(data) == 8:
				582	return _unpack(">d", data)[0]
				583	raise ValueError("not enough data in stream to read float8")
				584
				585
				586	float8 = ArgumentDescriptor(
				587	name='float8',
				588	n=8,
				589	reader=read_float8,
				590	doc="""An 8-byte binary representation of a float, big-endian.
				591
				592	The format is unique to Python, and shared with the struct
				593	module (format string '>d') "in theory" (the struct and cPickle
				594	implementations don't share the code -- they should). It's
				595	strongly related to the IEEE-754 double format, and, in normal
				596	cases, is in fact identical to the big-endian 754 double format.
				597	On other boxes the dynamic range is limited to that of a 754
				598	double, and "add a half and chop" rounding is used to reduce
				599	the precision to 53 bits. However, even on a 754 box,
				600	infinities, NaNs, and minus zero may not be handled correctly
				601	(may not survive roundtrip pickling intact).
				602	""")
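The float8 layout can be checked directly with the struct module. The sketch below assumes Python 3 on a platform with IEEE-754 doubles; it says nothing about how the 2003 pickle implementations behaved, only what the `'>d'` format does with the same bytes today.

```python
import math
import struct

# -1.25 packs to the big-endian byte string shown in the doctest above.
raw = struct.pack(">d", -1.25)
value = struct.unpack(">d", raw)[0]


def roundtrip(x):
    """Pack x as a big-endian double and unpack it again."""
    return struct.unpack(">d", struct.pack(">d", x))[0]


# Special values, which the descriptor doc warns may not survive pickling.
minus_zero = roundtrip(-0.0)
infinity = roundtrip(float("inf"))
not_a_number = roundtrip(float("nan"))
```

Because `-0.0 == 0.0` compares true, the sign has to be tested with `math.copysign`, and NaN has to be tested with `math.isnan` since it never compares equal to itself.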
				603
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	604	# Protocol 2 formats
				605
Tim Peters	c0c12b5	2003-01-29 00:56:17 +0000	[diff] [blame^]	606	from pickle import decode_long
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	607
				608	def read_long1(f):
				609	r"""
				610	>>> import StringIO
				611	>>> read_long1(StringIO.StringIO("\x02\xff\x00"))
				612	255L
				613	>>> read_long1(StringIO.StringIO("\x02\xff\x7f"))
				614	32767L
				615	>>> read_long1(StringIO.StringIO("\x02\x00\xff"))
				616	-256L
				617	>>> read_long1(StringIO.StringIO("\x02\x00\x80"))
				618	-32768L
Tim Peters	5eed340	2003-01-27 23:51:36 +0000	[diff] [blame]	619	>>>
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	620	"""
				621
				622	n = read_uint1(f)
				623	data = f.read(n)
				624	if len(data) != n:
				625	raise ValueError("not enough data in stream to read long1")
				626	return decode_long(data)
				627
				628	long1 = ArgumentDescriptor(
				629	name="long1",
Tim Peters	fdb8cfa	2003-01-28 00:13:19 +0000	[diff] [blame]	630	n=TAKEN_FROM_ARGUMENT1,
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	631	reader=read_long1,
				632	doc="""A binary long, little-endian, using 1-byte size.
				633
				634	This first reads one byte as an unsigned size, then reads that
Tim Peters	bdbe741	2003-01-27 23:54:04 +0000	[diff] [blame]	635	many bytes and interprets them as a little-endian 2's-complement long.
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	636	""")
				637
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	638	def read_long4(f):
				639	r"""
				640	>>> import StringIO
				641	>>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\xff\x00"))
				642	255L
				643	>>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\xff\x7f"))
				644	32767L
				645	>>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\x00\xff"))
				646	-256L
				647	>>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\x00\x80"))
				648	-32768L
Tim Peters	5eed340	2003-01-27 23:51:36 +0000	[diff] [blame]	649	>>>
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	650	"""
				651
				652	n = read_int4(f)
				653	if n < 0:
Neal Norwitz	784a3f5	2003-01-28 00:20:41 +0000	[diff] [blame]	654	raise ValueError("long4 byte count < 0: %d" % n)
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	655	data = f.read(n)
				656	if len(data) != n:
Neal Norwitz	784a3f5	2003-01-28 00:20:41 +0000	[diff] [blame]	657	raise ValueError("not enough data in stream to read long4")
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	658	return decode_long(data)
				659
				660	long4 = ArgumentDescriptor(
				661	name="long4",
Tim Peters	fdb8cfa	2003-01-28 00:13:19 +0000	[diff] [blame]	662	n=TAKEN_FROM_ARGUMENT4,
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	663	reader=read_long4,
				664	doc="""A binary representation of a long, little-endian.
				665
				666	This first reads four bytes as a signed size (but requires the
				667	size to be >= 0), then reads that many bytes and interprets them
Tim Peters	bdbe741	2003-01-27 23:54:04 +0000	[diff] [blame]	668	as a little-endian 2's-complement long.
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	669	""")
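# A minimal sketch of the long4 decoding described above, written for a
# modern Python 3 interpreter (an assumption; the code in this file targets
# Python 2). read_long4_sketch is a hypothetical name, and int.from_bytes
# stands in for this module's decode_long.

```python
import io
import struct


def read_long4_sketch(f):
    """Read a 4-byte signed little-endian size, then that many payload bytes."""
    header = f.read(4)
    if len(header) != 4:
        raise ValueError("not enough data in stream to read long4 size")
    (n,) = struct.unpack("<i", header)
    if n < 0:
        raise ValueError("long4 byte count < 0: %d" % n)
    data = f.read(n)
    if len(data) != n:
        raise ValueError("not enough data in stream to read long4")
    # Little-endian 2's-complement; an empty payload decodes to 0.
    return int.from_bytes(data, "little", signed=True)


# Same inputs as the read_long4 doctest, as bytes instead of Python 2 str.
values = [
    read_long4_sketch(io.BytesIO(b"\x02\x00\x00\x00" + payload))
    for payload in (b"\xff\x00", b"\xff\x7f", b"\x00\xff", b"\x00\x80")
]
print(values)
```

# The high bit of the last payload byte carries the sign, which is why
# "\x00\xff" decodes to a negative value while "\xff\x00" does not.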
				670
				671
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	672	##############################################################################
				673	# Object descriptors. The stack used by the pickle machine holds objects,
				674	# and in the stack_before and stack_after attributes of OpcodeInfo
				675	# descriptors we need names to describe the various types of objects that can
				676	# appear on the stack.
				677
				678	class StackObject(object):
				679	__slots__ = (
				680	# name of descriptor record, for info only
				681	'name',
				682
				683	# type of object, or tuple of type objects (meaning the object can
				684	# be of any type in the tuple)
				685	'obtype',
				686
				687	# human-readable docs for this kind of stack object; a string
				688	'doc',
				689	)
				690
				691	def __init__(self, name, obtype, doc):
				692	assert isinstance(name, str)
				693	self.name = name
				694
				695	assert isinstance(obtype, type) or isinstance(obtype, tuple)
				696	if isinstance(obtype, tuple):
				697	for contained in obtype:
				698	assert isinstance(contained, type)
				699	self.obtype = obtype
				700
				701	assert isinstance(doc, str)
				702	self.doc = doc
				703
				704
				705	pyint = StackObject(
				706	name='int',
				707	obtype=int,
				708	doc="A short (as opposed to long) Python integer object.")
				709
				710	pylong = StackObject(
				711	name='long',
				712	obtype=long,
				713	doc="A long (as opposed to short) Python integer object.")
				714
				715	pyinteger_or_bool = StackObject(
				716	name='int_or_bool',
				717	obtype=(int, long, bool),
				718	doc="A Python integer object (short or long), or "
				719	"a Python bool.")
				720
Guido van Rossum	5a2d8f5	2003-01-27 21:44:25 +0000	[diff] [blame]	721	pybool = StackObject(
				722	name='bool',
				723	obtype=(bool,),
				724	doc="A Python bool object.")
				725
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	726	pyfloat = StackObject(
				727	name='float',
				728	obtype=float,
				729	doc="A Python float object.")
				730
				731	pystring = StackObject(
				732	name='str',
				733	obtype=str,
				734	doc="A Python string object.")
				735
				736	pyunicode = StackObject(
				737	name='unicode',
				738	obtype=unicode,
				739	doc="A Python Unicode string object.")
				740
				741	pynone = StackObject(
				742	name="None",
				743	obtype=type(None),
				744	doc="The Python None object.")
				745
				746	pytuple = StackObject(
				747	name="tuple",
				748	obtype=tuple,
				749	doc="A Python tuple object.")
				750
				751	pylist = StackObject(
				752	name="list",
				753	obtype=list,
				754	doc="A Python list object.")
				755
				756	pydict = StackObject(
				757	name="dict",
				758	obtype=dict,
				759	doc="A Python dict object.")
				760
				761	anyobject = StackObject(
				762	name='any',
				763	obtype=object,
				764	doc="Any kind of object whatsoever.")
				765
				766	markobject = StackObject(
				767	name="mark",
				768	obtype=StackObject,
				769	doc="""'The mark' is a unique object.
				770
				771	Opcodes that operate on a variable number of objects
				772	generally don't embed the count of objects in the opcode,
				773	or pull it off the stack. Instead the MARK opcode is used
				774	to push a special marker object on the stack, and then
				775	some other opcodes grab all the objects from the top of
				776	the stack down to (but not including) the topmost marker
				777	object.
				778	""")
				779
				780	stackslice = StackObject(
				781	name="stackslice",
				782	obtype=StackObject,
				783	doc="""An object representing a contiguous slice of the stack.
				784
				785	This is used in conjunction with markobject, to represent all
				786	of the stack following the topmost markobject. For example,
				787	the POP_MARK opcode changes the stack from
				788
				789	[..., markobject, stackslice]
				790	to
				791	[...]
				792
				793	No matter how many objects are on the stack after the topmost
				794	markobject, POP_MARK gets rid of all of them (including the
				795	topmost markobject too).
				796	""")
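# To make the markobject/stackslice convention concrete, here is a small
# sketch that models the stack as a plain Python list. MARK and pop_to_mark
# are hypothetical names for illustration only; the real unpickler keeps its
# own internal representation.

```python
MARK = object()  # hypothetical stand-in for the PM's unique markobject


def pop_to_mark(stack):
    """Remove the topmost mark and everything above it; return what was above it."""
    for i in range(len(stack) - 1, -1, -1):
        if stack[i] is MARK:
            stackslice = stack[i + 1:]
            del stack[i:]
            return stackslice
    raise ValueError("no markobject on the stack")


# Two marks are on the stack; only the topmost one delimits the slice.
stack = ["a", MARK, "b", MARK, 1, 2, 3]
top_slice = pop_to_mark(stack)
print(top_slice)
print(len(stack))
```

# Opcodes such as LIST, TUPLE and DICT do the same search, then push one
# new object built from the returned slice.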
				797
				798	##############################################################################
				799	# Descriptors for pickle opcodes.
				800
				801	class OpcodeInfo(object):
				802
				803	__slots__ = (
				804	# symbolic name of opcode; a string
				805	'name',
				806
				807	# the code used in a bytestream to represent the opcode; a
				808	# one-character string
				809	'code',
				810
				811	# If the opcode has an argument embedded in the byte string, an
				812	# instance of ArgumentDescriptor specifying its type. Note that
				813	# arg.reader(s) can be used to read and decode the argument from
				814	# the bytestream s, and arg.doc documents the format of the raw
				815	# argument bytes. If the opcode doesn't have an argument embedded
				816	# in the bytestream, arg should be None.
				817	'arg',
				818
				819	# what the stack looks like before this opcode runs; a list
				820	'stack_before',
				821
				822	# what the stack looks like after this opcode runs; a list
				823	'stack_after',
				824
				825	# the protocol number in which this opcode was introduced; an int
				826	'proto',
				827
				828	# human-readable docs for this opcode; a string
				829	'doc',
				830	)
				831
				832	def __init__(self, name, code, arg,
				833	stack_before, stack_after, proto, doc):
				834	assert isinstance(name, str)
				835	self.name = name
				836
				837	assert isinstance(code, str)
				838	assert len(code) == 1
				839	self.code = code
				840
				841	assert arg is None or isinstance(arg, ArgumentDescriptor)
				842	self.arg = arg
				843
				844	assert isinstance(stack_before, list)
				845	for x in stack_before:
				846	assert isinstance(x, StackObject)
				847	self.stack_before = stack_before
				848
				849	assert isinstance(stack_after, list)
				850	for x in stack_after:
				851	assert isinstance(x, StackObject)
				852	self.stack_after = stack_after
				853
				854	assert isinstance(proto, int) and 0 <= proto <= 2
				855	self.proto = proto
				856
				857	assert isinstance(doc, str)
				858	self.doc = doc
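
# A short usage sketch of these descriptors, assuming a modern Python 3
# interpreter, where the standard pickletools module exposes the table as
# pickletools.opcodes and the generator as pickletools.genops. The
# dictionary by_name is a hypothetical helper, not part of the module.

```python
import pickletools

# Index the opcode table by symbolic name.
by_name = {op.name: op for op in pickletools.opcodes}

appends = by_name["APPENDS"]
print(appends.code, appends.proto)
print([obj.name for obj in appends.stack_before])
print([obj.name for obj in appends.stack_after])

# genops walks a pickle and yields (opcode, arg, position) triples.
# The pickle here is BININT1 1, BININT1 2, TUPLE2, STOP.
ops = [(op.name, arg, pos)
       for op, arg, pos in pickletools.genops(b"K\x01K\x02\x86.")]
print(ops)
```

# stack_before and stack_after hold StackObject instances, so their .name
# and .doc attributes can be used to render a readable stack effect.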
				859
				860	I = OpcodeInfo
				861	opcodes = [
				862
				863	# Ways to spell integers.
				864
				865	I(name='INT',
				866	code='I',
				867	arg=decimalnl_short,
				868	stack_before=[],
				869	stack_after=[pyinteger_or_bool],
				870	proto=0,
				871	doc="""Push an integer or bool.
				872
				873	The argument is a newline-terminated decimal literal string.
				874
				875	The intent may have been that this always fit in a short Python int,
				876	but INT can be generated in pickles written on a 64-bit box that
				877	require a Python long on a 32-bit box. The difference between this
				878	and LONG then is that INT skips a trailing 'L', and produces a short
				879	int whenever possible.
				880
				881	Another difference arises because, when bool was introduced as a
				882	distinct type in 2.3, builtin names True and False were also added to
				883	2.2.2, mapping to ints 1 and 0. For compatibility in both directions,
				884	True gets pickled as "I01\\n" (INT + "01\\n"), and False as "I00\\n".
				885	Leading zeroes are never produced for a genuine integer. The 2.3
				886	(and later) unpicklers special-case these and return bool instead;
				887	earlier unpicklers ignore the leading "0" and return the int.
				888	"""),
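
# The bool special case can be observed with hand-written pickles. This is
# a sketch assuming a modern Python 3 interpreter, whose unpickler still
# honors the "01"/"00" spellings; the variable names are illustrative.

```python
import pickle

# Each pickle is the INT opcode 'I', a decimal literal, a newline, then
# STOP ('.').
as_true = pickle.loads(b"I01\n.")
as_false = pickle.loads(b"I00\n.")
as_int = pickle.loads(b"I1\n.")
print(repr(as_true), repr(as_false), repr(as_int))

# Protocol 0 still spells bools this way when pickling.
true_pickle = pickle.dumps(True, protocol=0)
print(true_pickle)
```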
				889
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	890	I(name='BININT',
				891	code='J',
				892	arg=int4,
				893	stack_before=[],
				894	stack_after=[pyint],
				895	proto=1,
				896	doc="""Push a four-byte signed integer.
				897
				898	This handles the full range of Python (short) integers on a 32-bit
				899	box, directly as binary bytes (1 for the opcode and 4 for the integer).
				900	If the integer is non-negative and fits in 1 or 2 bytes, pickling via
				901	BININT1 or BININT2 saves space.
				902	"""),
				903
				904	I(name='BININT1',
				905	code='K',
				906	arg=uint1,
				907	stack_before=[],
				908	stack_after=[pyint],
				909	proto=1,
				910	doc="""Push a one-byte unsigned integer.
				911
				912	This is a space optimization for pickling very small non-negative ints,
				913	in range(256).
				914	"""),
				915
				916	I(name='BININT2',
				917	code='M',
				918	arg=uint2,
				919	stack_before=[],
				920	stack_after=[pyint],
				921	proto=1,
				922	doc="""Push a two-byte unsigned integer.
				923
				924	This is a space optimization for pickling small positive ints, in
				925	range(256, 2**16). Integers in range(256) can also be pickled via
				926	BININT2, but BININT1 instead saves a byte.
				927	"""),
				928
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame]	929	I(name='LONG',
				930	code='L',
				931	arg=decimalnl_long,
				932	stack_before=[],
				933	stack_after=[pylong],
				934	proto=0,
				935	doc="""Push a long integer.
				936
				937	The same as INT, except that the literal ends with 'L', and always
				938	unpickles to a Python long. There doesn't seem to be a real purpose
				939	to the trailing 'L'.
				940
				941	Note that LONG takes time quadratic in the number of digits when
				942	unpickling (this is simply due to the nature of decimal->binary
				943	conversion). Proto 2 added linear-time (in C; still quadratic-time
				944	in Python) LONG1 and LONG4 opcodes.
				945	"""),
				946
				947	I(name="LONG1",
				948	code='\x8a',
				949	arg=long1,
				950	stack_before=[],
				951	stack_after=[pylong],
				952	proto=2,
				953	doc="""Long integer using one-byte length.
				954
				955	A more efficient encoding of a Python long; the long1 encoding
				956	says it all."""),
				957
				958	I(name="LONG4",
				959	code='\x8b',
				960	arg=long4,
				961	stack_before=[],
				962	stack_after=[pylong],
				963	proto=2,
				964	doc="""Long integer using four-byte length.
				965
				966	A more efficient encoding of a Python long; the long4 encoding
				967	says it all."""),
				968
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	969	# Ways to spell strings (8-bit, not Unicode).
				970
				971	I(name='STRING',
				972	code='S',
				973	arg=stringnl,
				974	stack_before=[],
				975	stack_after=[pystring],
				976	proto=0,
				977	doc="""Push a Python string object.
				978
				979	The argument is a repr-style string, with bracketing quote characters,
				980	and perhaps embedded escapes. The argument extends until the next
				981	newline character.
				982	"""),
				983
				984	I(name='BINSTRING',
				985	code='T',
				986	arg=string4,
				987	stack_before=[],
				988	stack_after=[pystring],
				989	proto=1,
				990	doc="""Push a Python string object.
				991
				992	There are two arguments: the first is a 4-byte little-endian signed int
				993	giving the number of bytes in the string, and the second is that many
				994	bytes, which are taken literally as the string content.
				995	"""),
				996
				997	I(name='SHORT_BINSTRING',
				998	code='U',
				999	arg=string1,
				1000	stack_before=[],
				1001	stack_after=[pystring],
				1002	proto=1,
				1003	doc="""Push a Python string object.
				1004
				1005	There are two arguments: the first is a 1-byte unsigned int giving
				1006	the number of bytes in the string, and the second is that many bytes,
				1007	which are taken literally as the string content.
				1008	"""),
				1009
				1010	# Ways to spell None.
				1011
				1012	I(name='NONE',
				1013	code='N',
				1014	arg=None,
				1015	stack_before=[],
				1016	stack_after=[pynone],
				1017	proto=0,
				1018	doc="Push None on the stack."),
				1019
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame]	1020	# Ways to spell bools, starting with proto 2. See INT for how this was
				1021	# done before proto 2.
				1022
				1023	I(name='NEWTRUE',
				1024	code='\x88',
				1025	arg=None,
				1026	stack_before=[],
				1027	stack_after=[pybool],
				1028	proto=2,
				1029	doc="""True.
				1030
				1031	Push True onto the stack."""),
				1032
				1033	I(name='NEWFALSE',
				1034	code='\x89',
				1035	arg=None,
				1036	stack_before=[],
				1037	stack_after=[pybool],
				1038	proto=2,
				1039	doc="""False.
				1040
				1041	Push False onto the stack."""),
				1042
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1043	# Ways to spell Unicode strings.
				1044
				1045	I(name='UNICODE',
				1046	code='V',
				1047	arg=unicodestringnl,
				1048	stack_before=[],
				1049	stack_after=[pyunicode],
				1050	proto=0, # this may be pure-text, but it's a later addition
				1051	doc="""Push a Python Unicode string object.
				1052
				1053	The argument is a raw-unicode-escape encoding of a Unicode string,
				1054	and so may contain embedded escape sequences. The argument extends
				1055	until the next newline character.
				1056	"""),
				1057
				1058	I(name='BINUNICODE',
				1059	code='X',
				1060	arg=unicodestring4,
				1061	stack_before=[],
				1062	stack_after=[pyunicode],
				1063	proto=1,
				1064	doc="""Push a Python Unicode string object.
				1065
				1066	There are two arguments: the first is a 4-byte little-endian signed int
				1067	giving the number of bytes in the string. The second is that many
				1068	bytes, and is the UTF-8 encoding of the Unicode string.
				1069	"""),
				1070
				1071	# Ways to spell floats.
				1072
				1073	I(name='FLOAT',
				1074	code='F',
				1075	arg=floatnl,
				1076	stack_before=[],
				1077	stack_after=[pyfloat],
				1078	proto=0,
				1079	doc="""Newline-terminated decimal float literal.
				1080
				1081	The argument is repr(a_float), and in general requires 17 significant
				1082	digits for roundtrip conversion to be an identity (this is so for
				1083	IEEE-754 double precision values, which is what Python float maps to
				1084	on most boxes).
				1085
				1086	In general, FLOAT cannot be used to transport infinities, NaNs, or
				1087	minus zero across boxes (or even on a single box, if the platform C
				1088	library can't read the strings it produces for such things -- Windows
				1089	is like that), but may do less damage than BINFLOAT on boxes with
				1090	greater precision or dynamic range than IEEE-754 double.
				1091	"""),
				1092
				1093	I(name='BINFLOAT',
				1094	code='G',
				1095	arg=float8,
				1096	stack_before=[],
				1097	stack_after=[pyfloat],
				1098	proto=1,
				1099	doc="""Float stored in binary form, with 8 bytes of data.
				1100
				1101	This generally requires less than half the space of FLOAT encoding.
				1102	In general, BINFLOAT cannot be used to transport infinities, NaNs, or
				1103	minus zero, raises an exception if the exponent exceeds the range of
				1104	an IEEE-754 double, and retains no more than 53 bits of precision (if
				1105	there are more than that, "add a half and chop" rounding is used to
				1106	cut it back to 53 significant bits).
				1107	"""),
				1108
				1109	# Ways to build lists.
				1110
				1111	I(name='EMPTY_LIST',
				1112	code=']',
				1113	arg=None,
				1114	stack_before=[],
				1115	stack_after=[pylist],
				1116	proto=1,
				1117	doc="Push an empty list."),
				1118
				1119	I(name='APPEND',
				1120	code='a',
				1121	arg=None,
				1122	stack_before=[pylist, anyobject],
				1123	stack_after=[pylist],
				1124	proto=0,
				1125	doc="""Append an object to a list.
				1126
				1127	Stack before: ... pylist anyobject
				1128	Stack after: ... pylist+[anyobject]
Tim Peters	81098ac	2003-01-28 05:12:08 +0000	[diff] [blame]	1129
				1130	although pylist is really extended in-place.
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1131	"""),
				1132
				1133	I(name='APPENDS',
				1134	code='e',
				1135	arg=None,
				1136	stack_before=[pylist, markobject, stackslice],
				1137	stack_after=[pylist],
				1138	proto=1,
				1139	doc="""Extend a list by a slice of stack objects.
				1140
				1141	Stack before: ... pylist markobject stackslice
				1142	Stack after: ... pylist+stackslice
Tim Peters	81098ac	2003-01-28 05:12:08 +0000	[diff] [blame]	1143
				1144	although pylist is really extended in-place.
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1145	"""),
				1146
				1147	I(name='LIST',
				1148	code='l',
				1149	arg=None,
				1150	stack_before=[markobject, stackslice],
				1151	stack_after=[pylist],
				1152	proto=0,
				1153	doc="""Build a list out of the topmost stack slice, after markobject.
				1154
				1155	All the stack entries following the topmost markobject are placed into
				1156	a single Python list, which single list object replaces all of the
				1157	stack from the topmost markobject onward. For example,
				1158
				1159	Stack before: ... markobject 1 2 3 'abc'
				1160	Stack after: ... [1, 2, 3, 'abc']
				1161	"""),
				1162
				1163	# Ways to build tuples.
				1164
				1165	I(name='EMPTY_TUPLE',
				1166	code=')',
				1167	arg=None,
				1168	stack_before=[],
				1169	stack_after=[pytuple],
				1170	proto=1,
				1171	doc="Push an empty tuple."),
				1172
				1173	I(name='TUPLE',
				1174	code='t',
				1175	arg=None,
				1176	stack_before=[markobject, stackslice],
				1177	stack_after=[pytuple],
				1178	proto=0,
				1179	doc="""Build a tuple out of the topmost stack slice, after markobject.
				1180
				1181	All the stack entries following the topmost markobject are placed into
				1182	a single Python tuple, which single tuple object replaces all of the
				1183	stack from the topmost markobject onward. For example,
				1184
				1185	Stack before: ... markobject 1 2 3 'abc'
				1186	Stack after: ... (1, 2, 3, 'abc')
				1187	"""),
				1188
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame]	1189	I(name='TUPLE1',
				1190	code='\x85',
				1191	arg=None,
				1192	stack_before=[anyobject],
				1193	stack_after=[pytuple],
				1194	proto=2,
				1195	doc="""One-tuple.
				1196
				1197	This code pops one value off the stack and pushes a tuple of
				1198	length 1 whose one item is that value back onto it. IOW:
				1199
				1200	stack[-1] = tuple(stack[-1:])
				1201	"""),
				1202
				1203	I(name='TUPLE2',
				1204	code='\x86',
				1205	arg=None,
				1206	stack_before=[anyobject, anyobject],
				1207	stack_after=[pytuple],
				1208	proto=2,
				1209	doc="""Two-tuple.
				1210
				1211	This code pops two values off the stack and pushes a tuple
				1212	of length 2 whose items are those values back onto it. IOW:
				1213
				1214	stack[-2:] = [tuple(stack[-2:])]
				1215	"""),
				1216
				1217	I(name='TUPLE3',
				1218	code='\x87',
				1219	arg=None,
				1220	stack_before=[anyobject, anyobject, anyobject],
				1221	stack_after=[pytuple],
				1222	proto=2,
				1223	doc="""Three-tuple.
				1224
				1225	This code pops three values off the stack and pushes a tuple
				1226	of length 3 whose items are those values back onto it. IOW:
				1227
				1228	stack[-3:] = [tuple(stack[-3:])]
				1229	"""),
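
# The slice assignments quoted in the TUPLE1, TUPLE2 and TUPLE3 docs can be
# run directly on a toy stack, and checked against the real unpickler. A
# sketch assuming a modern Python 3 interpreter; 'K' is BININT1 and '.' is
# STOP, and all variable names are illustrative.

```python
import pickle

# The docs' list spellings, applied in turn to one toy stack.
stack = ["x", 1, 2, 3]
stack[-3:] = [tuple(stack[-3:])]   # TUPLE3: three items become one tuple
after_tuple3 = list(stack)
stack[-2:] = [tuple(stack[-2:])]   # TUPLE2: two items become one tuple
after_tuple2 = list(stack)
stack[-1] = tuple(stack[-1:])      # TUPLE1: one item becomes a 1-tuple
after_tuple1 = list(stack)

# The same opcodes in hand-written pickles.
one = pickle.loads(b"K\x01\x85.")
two = pickle.loads(b"K\x01K\x02\x86.")
three = pickle.loads(b"K\x01K\x02K\x03\x87.")
print(one, two, three)
```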
				1230
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1231	# Ways to build dicts.
				1232
				1233	I(name='EMPTY_DICT',
				1234	code='}',
				1235	arg=None,
				1236	stack_before=[],
				1237	stack_after=[pydict],
				1238	proto=1,
				1239	doc="Push an empty dict."),
				1240
				1241	I(name='DICT',
				1242	code='d',
				1243	arg=None,
				1244	stack_before=[markobject, stackslice],
				1245	stack_after=[pydict],
				1246	proto=0,
				1247	doc="""Build a dict out of the topmost stack slice, after markobject.
				1248
				1249	All the stack entries following the topmost markobject are placed into
				1250	a single Python dict, which single dict object replaces all of the
				1251	stack from the topmost markobject onward. The stack slice alternates
				1252	key, value, key, value, .... For example,
				1253
				1254	Stack before: ... markobject 1 2 3 'abc'
				1255	Stack after: ... {1: 2, 3: 'abc'}
				1256	"""),
				1257
				1258	I(name='SETITEM',
				1259	code='s',
				1260	arg=None,
				1261	stack_before=[pydict, anyobject, anyobject],
				1262	stack_after=[pydict],
				1263	proto=0,
				1264	doc="""Add a key+value pair to an existing dict.
				1265
				1266	Stack before: ... pydict key value
				1267	Stack after: ... pydict
				1268
				1269	where pydict has been modified via pydict[key] = value.
				1270	"""),
				1271
				1272	I(name='SETITEMS',
				1273	code='u',
				1274	arg=None,
				1275	stack_before=[pydict, markobject, stackslice],
				1276	stack_after=[pydict],
				1277	proto=1,
				1278	doc="""Add an arbitrary number of key+value pairs to an existing dict.
				1279
				1280	The slice of the stack following the topmost markobject is taken as
				1281	an alternating sequence of keys and values, added to the dict
				1282	immediately under the topmost markobject. Everything at and after the
				1283	topmost markobject is popped, leaving the mutated dict at the top
				1284	of the stack.
				1285
				1286	Stack before: ... pydict markobject key_1 value_1 ... key_n value_n
				1287	Stack after: ... pydict
				1288
				1289	where pydict has been modified via pydict[key_i] = value_i for i in
				1290	1, 2, ..., n, and in that order.
				1291	"""),
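
# The three dict-building opcodes can be compared with hand-written
# pickles. A sketch assuming a modern Python 3 interpreter; 'K' is BININT1,
# '(' is MARK, '}' is EMPTY_DICT and '.' is STOP. Variable names are
# illustrative.

```python
import pickle

# MARK 1 2 3 4 DICT
built_by_dict = pickle.loads(b"(K\x01K\x02K\x03K\x04d.")

# EMPTY_DICT 1 2 SETITEM
built_by_setitem = pickle.loads(b"}K\x01K\x02s.")

# EMPTY_DICT MARK 1 2 3 4 SETITEMS
built_by_setitems = pickle.loads(b"}(K\x01K\x02K\x03K\x04u.")

# Pairs are applied in order, so a repeated key keeps its last value.
repeated_key = pickle.loads(b"}(K\x01K\x02K\x01K\x03u.")

print(built_by_dict, built_by_setitem, built_by_setitems, repeated_key)
```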
				1292
				1293	# Stack manipulation.
				1294
				1295	I(name='POP',
				1296	code='0',
				1297	arg=None,
				1298	stack_before=[anyobject],
				1299	stack_after=[],
				1300	proto=0,
				1301	doc="Discard the top stack item, shrinking the stack by one item."),
				1302
				1303	I(name='DUP',
				1304	code='2',
				1305	arg=None,
				1306	stack_before=[anyobject],
				1307	stack_after=[anyobject, anyobject],
				1308	proto=0,
				1309	doc="Push the top stack item onto the stack again, duplicating it."),
				1310
				1311	I(name='MARK',
				1312	code='(',
				1313	arg=None,
				1314	stack_before=[],
				1315	stack_after=[markobject],
				1316	proto=0,
				1317	doc="""Push markobject onto the stack.
				1318
				1319	markobject is a unique object, used by other opcodes to identify a
				1320	region of the stack containing a variable number of objects for them
				1321	to work on. See markobject.doc for more detail.
				1322	"""),
				1323
				1324	I(name='POP_MARK',
				1325	code='1',
				1326	arg=None,
				1327	stack_before=[markobject, stackslice],
				1328	stack_after=[],
				1329	proto=0,
				1330	doc="""Pop all the stack objects at and above the topmost markobject.
				1331
				1332	When an opcode using a variable number of stack objects is done,
				1333	POP_MARK is used to remove those objects, and to remove the markobject
				1334	that delimited their starting position on the stack.
				1335	"""),
				1336
				1337	# Memo manipulation. There are really only two operations (get and put),
				1338	# each in all-text, "short binary", and "long binary" flavors.
				1339
				1340	I(name='GET',
				1341	code='g',
				1342	arg=decimalnl_short,
				1343	stack_before=[],
				1344	stack_after=[anyobject],
				1345	proto=0,
				1346	doc="""Read an object from the memo and push it on the stack.
				1347
				1348	The index of the memo object to push is given by the newline-terminated
				1349	decimal string following. BINGET and LONG_BINGET are space-optimized
				1350	versions.
				1351	"""),
				1352
				1353	I(name='BINGET',
				1354	code='h',
				1355	arg=uint1,
				1356	stack_before=[],
				1357	stack_after=[anyobject],
				1358	proto=1,
				1359	doc="""Read an object from the memo and push it on the stack.
				1360
				1361	The index of the memo object to push is given by the 1-byte unsigned
				1362	integer following.
				1363	"""),
				1364
				1365	I(name='LONG_BINGET',
				1366	code='j',
				1367	arg=int4,
				1368	stack_before=[],
				1369	stack_after=[anyobject],
				1370	proto=1,
				1371	doc="""Read an object from the memo and push it on the stack.
				1372
				1373	The index of the memo object to push is given by the 4-byte signed
				1374	little-endian integer following.
				1375	"""),
				1376
				1377	I(name='PUT',
				1378	code='p',
				1379	arg=decimalnl_short,
				1380	stack_before=[],
				1381	stack_after=[],
				1382	proto=0,
				1383	doc="""Store the stack top into the memo. The stack is not popped.
				1384
				1385	The index of the memo location to write into is given by the newline-
				1386	terminated decimal string following. BINPUT and LONG_BINPUT are
				1387	space-optimized versions.
				1388	"""),
				1389
				1390	I(name='BINPUT',
				1391	code='q',
				1392	arg=uint1,
				1393	stack_before=[],
				1394	stack_after=[],
				1395	proto=1,
				1396	doc="""Store the stack top into the memo. The stack is not popped.
				1397
				1398	The index of the memo location to write into is given by the 1-byte
				1399	unsigned integer following.
				1400	"""),
				1401
				1402	I(name='LONG_BINPUT',
				1403	code='r',
				1404	arg=int4,
				1405	stack_before=[],
				1406	stack_after=[],
				1407	proto=1,
				1408	doc="""Store the stack top into the memo. The stack is not popped.
				1409
				1410	The index of the memo location to write into is given by the 4-byte
				1411	signed little-endian integer following.
				1412	"""),
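
# A sketch of how the put and get families cooperate to preserve object
# identity, assuming a modern Python 3 interpreter. The pickles are written
# by hand and the variable names are illustrative.

```python
import pickle

# EMPTY_LIST, BINPUT 0 (memo[0] = stack top, stack not popped),
# BINGET 0 (push memo[0] again), TUPLE2, STOP.
pair = pickle.loads(b"]q\x00h\x00\x86.")
print(pair, pair[0] is pair[1])

# The text-mode PUT and GET take newline-terminated decimal indices instead:
# MARK, LIST, PUT 0, GET 0, TUPLE2, STOP.
pair_text = pickle.loads(b"(lp0\ng0\n\x86.")
print(pair_text, pair_text[0] is pair_text[1])
```

# Both tuples hold the same list object twice, not two equal copies.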
				1413
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame]	1414	# Access the extension registry (predefined objects). Akin to the GET
				1415	# family.
				1416
				1417	I(name='EXT1',
				1418	code='\x82',
				1419	arg=uint1,
				1420	stack_before=[],
				1421	stack_after=[anyobject],
				1422	proto=2,
				1423	doc="""Extension code.
				1424
				1425	This code and the similar EXT2 and EXT4 allow using a registry
				1426	of popular objects that are pickled by name, typically classes.
				1427	It is envisioned that through a global negotiation and
				1428	registration process, third parties can set up a mapping between
				1429	ints and object names.
				1430
				1431	In order to guarantee pickle interchangeability, the extension
				1432	code registry ought to be global, although a range of codes may
				1433	be reserved for private use.
				1434
				1435	EXT1 has a 1-byte integer argument. This is used to index into the
				1436	extension registry, and the object at that index is pushed on the stack.
				1437	"""),
				1438
				1439	I(name='EXT2',
				1440	code='\x83',
				1441	arg=uint2,
				1442	stack_before=[],
				1443	stack_after=[anyobject],
				1444	proto=2,
				1445	doc="""Extension code.
				1446
				1447	See EXT1. EXT2 has a two-byte integer argument.
				1448	"""),
				1449
				1450	I(name='EXT4',
				1451	code='\x84',
				1452	arg=int4,
				1453	stack_before=[],
				1454	stack_after=[anyobject],
				1455	proto=2,
				1456	doc="""Extension code.
				1457
				1458	See EXT1. EXT4 has a four-byte integer argument.
				1459	"""),
				1460
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1461	# Push a class object, or module function, on the stack, via its module
				1462	# and name.
				1463
				1464	I(name='GLOBAL',
				1465	code='c',
				1466	arg=stringnl_noescape_pair,
				1467	stack_before=[],
				1468	stack_after=[anyobject],
				1469	proto=0,
				1470	doc="""Push a global object (module.attr) on the stack.
				1471
				1472	Two newline-terminated strings follow the GLOBAL opcode. The first is
				1473	taken as a module name, and the second as a class name. The class
				1474	object module.class is pushed on the stack. More accurately, the
				1475	object returned by self.find_class(module, class) is pushed on the
				1476	stack, so unpickling subclasses can override this form of lookup.
				1477	"""),
				1478
				1479	# Ways to build objects of classes pickle doesn't know about directly
				1480	# (user-defined classes). I despair of documenting this accurately
				1481	# and comprehensibly -- you really have to read the pickle code to
				1482	# find all the special cases.
				1483
				1484	I(name='REDUCE',
				1485	code='R',
				1486	arg=None,
				1487	stack_before=[anyobject, anyobject],
				1488	stack_after=[anyobject],
				1489	proto=0,
				1490	doc="""Push an object built from a callable and an argument tuple.
				1491
				1492	The opcode is named after the __reduce__() method.
				1493
				1494	Stack before: ... callable pytuple
				1495	Stack after: ... callable(*pytuple)
				1496
				1497	The callable and the argument tuple are the first two items returned
				1498	by a __reduce__ method. Applying the callable to the argtuple is
				1499	supposed to reproduce the original object, or at least get it started.
				1500	If the __reduce__ method returns a 3-tuple, the last component is an
				1501	argument to be passed to the object's __setstate__, and then the REDUCE
				1502	opcode is followed by code to create setstate's argument, and then a
				1503	BUILD opcode to apply __setstate__ to that argument.
				1504
				1505	There are lots of special cases here. The argtuple can be None, in
				1506	which case callable.__basicnew__() is called instead to produce the
				1507	object to be pushed on the stack. This appears to be a trick unique
				1508	to ExtensionClasses, and is deprecated regardless.
				1509
				1510	If type(callable) is not ClassType, REDUCE complains unless the
				1511	callable has been registered with the copy_reg module's
				1512	safe_constructors dict, or the callable has a magic
				1513	'__safe_for_unpickling__' attribute with a true value. I'm not sure
				1514	why it does this, but I've sure seen this complaint often enough when
				1515	I didn't want to <wink>.
				1516	"""),
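
# The basic "callable(*pytuple)" behavior of REDUCE can be seen with a
# hand-written pickle. A sketch assuming a modern Python 3 interpreter,
# where the builtins module is named "builtins"; the special cases described
# above (__basicnew__, safe_constructors) are not exercised here.

```python
import pickle

# GLOBAL "builtins" "complex", BININT1 1, BININT1 2, TUPLE2, REDUCE, STOP.
# Only unpickle data you trust; this pickle is written by hand.
data = b"cbuiltins\ncomplex\nK\x01K\x02\x86R."
value = pickle.loads(data)
print(value)  # the result of complex(*(1, 2))
```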
				1517
				1518	I(name='BUILD',
				1519	code='b',
				1520	arg=None,
				1521	stack_before=[anyobject, anyobject],
				1522	stack_after=[anyobject],
				1523	proto=0,
				1524	doc="""Finish building an object, via __setstate__ or dict update.
				1525
				1526	Stack before: ... anyobject argument
				1527	Stack after: ... anyobject
				1528
				1529	where anyobject may have been mutated, as follows:
				1530
				1531	If the object has a __setstate__ method,
				1532
				1533	anyobject.__setstate__(argument)
				1534
				1535	is called.
				1536
				1537	Else the argument must be a dict, the object must have a __dict__, and
				1538	the object is updated via
				1539
				1540	anyobject.__dict__.update(argument)
				1541
				1542	This may raise RuntimeError in restricted execution mode (which
				1543	disallows access to __dict__ directly); in that case, the object
				1544	is updated instead via
				1545
				1546	for k, v in argument.items():
				1547	anyobject[k] = v
				1548	"""),
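
# A simplified sketch of the first two BUILD cases, assuming a modern
# Python 3 interpreter. build_sketch, WithSetstate and Plain are
# hypothetical names, and the restricted-execution fallback is omitted.

```python
def build_sketch(obj, argument):
    """Apply BUILD's two main cases to obj and return it."""
    setstate = getattr(obj, "__setstate__", None)
    if setstate is not None:
        # Case 1: the object decides how to absorb the state.
        setstate(argument)
    else:
        # Case 2: the argument must be a dict merged into obj.__dict__.
        obj.__dict__.update(argument)
    return obj


class WithSetstate:
    def __setstate__(self, state):
        self.restored = state


class Plain:
    pass


a = build_sketch(WithSetstate(), {"x": 1})
b = build_sketch(Plain(), {"x": 1})
print(a.restored, b.x)
```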
				1549
				1550	I(name='INST',
				1551	code='i',
				1552	arg=stringnl_noescape_pair,
				1553	stack_before=[markobject, stackslice],
				1554	stack_after=[anyobject],
				1555	proto=0,
				1556	doc="""Build a class instance.
				1557
				1558	This is the protocol 0 version of protocol 1's OBJ opcode.
				1559	INST is followed by two newline-terminated strings, giving a
				1560	module and class name, just as for the GLOBAL opcode (and see
				1561	GLOBAL for more details about that). self.find_class(module, name)
				1562	is used to get a class object.
				1563
				1564	In addition, all the objects on the stack following the topmost
				1565	markobject are gathered into a tuple and popped (along with the
				1566	topmost markobject), just as for the TUPLE opcode.
				1567
				1568	Now it gets complicated. If all of these are true:
				1569
				1570	+ The argtuple is empty (markobject was at the top of the stack
				1571	at the start).
				1572
				1573	+ It's an old-style class object (the type of the class object is
				1574	ClassType).
				1575
				1576	+ The class object does not have a __getinitargs__ attribute.
				1577
				1578	then we want to create an old-style class instance without invoking
				1579	its __init__() method (pickle has waffled on this over the years; not
				1580	calling __init__() is current wisdom). In this case, an instance of
				1581	an old-style dummy class is created, and then we try to rebind its
				1582	__class__ attribute to the desired class object. If this succeeds,
				1583	the new instance object is pushed on the stack, and we're done. In
				1584	restricted execution mode it can fail (assignment to __class__ is
				1585	disallowed), and I'm not really sure what happens then -- it looks
				1586	like the code ends up calling the class object's __init__ anyway,
				1587	via falling into the next case.
				1588
				1589	Else (the argtuple is not empty, it's not an old-style class object,
				1590	or the class object does have a __getinitargs__ attribute), the code
				1591	first insists that the class object have a __safe_for_unpickling__
				1592	attribute. Unlike the __safe_for_unpickling__ check in REDUCE,
				1593	it doesn't matter whether this attribute has a true or false value; it
				1594	only matters whether it exists (XXX this smells like a bug). If
				1595	__safe_for_unpickling__ doesn't exist, UnpicklingError is raised.
				1596
				1597	Else (the class object does have a __safe_for_unpickling__ attr),
				1598	the class object obtained from INST's arguments is applied to the
				1599	argtuple obtained from the stack, and the resulting instance object
				1600	is pushed on the stack.
				1601	"""),
				1602
				1603	I(name='OBJ',
				1604	code='o',
				1605	arg=None,
				1606	stack_before=[markobject, anyobject, stackslice],
				1607	stack_after=[anyobject],
				1608	proto=1,
				1609	doc="""Build a class instance.
				1610
				1611	This is the protocol 1 version of protocol 0's INST opcode, and is
				1612	very much like it. The major difference is that the class object
				1613	is taken off the stack, allowing it to be retrieved from the memo
				1614	repeatedly if several instances of the same class are created. This
				1615	can be much more efficient (in both time and space) than repeatedly
				1616	embedding the module and class names in INST opcodes.
				1617
				1618	Unlike INST, OBJ takes no arguments from the opcode stream. Instead
				1619	the class object is taken off the stack, immediately above the
				1620	topmost markobject:
				1621
				1622	Stack before: ... markobject classobject stackslice
				1623	Stack after: ... new_instance_object
				1624
				1625	As for INST, the remainder of the stack above the markobject is
				1626	gathered into an argument tuple, and then the logic seems identical,
				1627	except that no __safe_for_unpickling__ check is done (XXX this smells
				1628	like a bug). See INST for the gory details.
				1629	"""),
				1630
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame]	1631	I(name='NEWOBJ',
				1632	code='\x81',
				1633	arg=None,
				1634	stack_before=[anyobject, anyobject],
				1635	stack_after=[anyobject],
				1636	proto=2,
				1637	doc="""Build an object instance.
				1638
				1639	The stack before should be thought of as containing a class
				1640	object followed by an argument tuple (the tuple being the stack
				1641	top). Call these cls and args. They are popped off the stack,
				1642	and the value returned by cls.__new__(cls, *args) is pushed back
				1643	onto the stack.
				1644	"""),
				1645
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1646	# Machine control.
				1647
Tim Peters	fdc0346	2003-01-28 04:56:33 +0000	[diff] [blame]	1648	I(name='PROTO',
				1649	code='\x80',
				1650	arg=uint1,
				1651	stack_before=[],
				1652	stack_after=[],
				1653	proto=2,
				1654	doc="""Protocol version indicator.
				1655
				1656	For protocol 2 and above, a pickle must start with this opcode.
				1657	The argument is the protocol version, an int in range(2, 256).
				1658	"""),
				1659
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1660	I(name='STOP',
				1661	code='.',
				1662	arg=None,
				1663	stack_before=[anyobject],
				1664	stack_after=[],
				1665	proto=0,
				1666	doc="""Stop the unpickling machine.
				1667
				1668	Every pickle ends with this opcode. The object at the top of the stack
				1669	is popped, and that's the result of unpickling. The stack should be
				1670	empty then.
				1671	"""),
				1672
				1673	# Ways to deal with persistent IDs.
				1674
				1675	I(name='PERSID',
				1676	code='P',
				1677	arg=stringnl_noescape,
				1678	stack_before=[],
				1679	stack_after=[anyobject],
				1680	proto=0,
				1681	doc="""Push an object identified by a persistent ID.
				1682
				1683	The pickle module doesn't define what a persistent ID means. PERSID's
				1684	argument is a newline-terminated str-style (no embedded escapes, no
				1685	bracketing quote characters) string, which is "the persistent ID".
				1686	The unpickler passes this string to self.persistent_load(). Whatever
				1687	object that returns is pushed on the stack. There is no implementation
				1688	of persistent_load() in Python's unpickler: it must be supplied by an
				1689	unpickler subclass.
				1690	"""),
				1691
				1692	I(name='BINPERSID',
				1693	code='Q',
				1694	arg=None,
				1695	stack_before=[anyobject],
				1696	stack_after=[anyobject],
				1697	proto=1,
				1698	doc="""Push an object identified by a persistent ID.
				1699
				1700	Like PERSID, except the persistent ID is popped off the stack (instead
				1701	of being a string embedded in the opcode bytestream). The persistent
				1702	ID is passed to self.persistent_load(), and whatever object that
				1703	returns is pushed on the stack. See PERSID for more detail.
				1704	"""),
				1705	]
				1706	del I
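
# Illustration only, not part of this module: the PERSID/BINPERSID docs above
# say the unpickler hands the persistent ID to self.persistent_load(), which a
# subclass must supply. A minimal sketch of that round trip, assuming a modern
# Python 3 pickle module. EXTERNAL, RefPickler and RefUnpickler are
# hypothetical names invented for this example.

```python
import io
import pickle
import pickletools

# Hypothetical external store: persistent IDs are simply keys into this dict.
EXTERNAL = {"cfg": {"debug": True}}


class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return an ID for objects that live in EXTERNAL; None means
        # "pickle this object the usual way".
        for key, value in EXTERNAL.items():
            if obj is value:
                return key
        return None


class RefUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Called when the PM executes PERSID or BINPERSID.
        return EXTERNAL[pid]


buf = io.BytesIO()
RefPickler(buf, protocol=2).dump([EXTERNAL["cfg"], 7])
payload = buf.getvalue()

# With a binary protocol the ID is pushed on the stack and consumed by
# BINPERSID rather than being embedded after a PERSID opcode.
opnames = [op.name for op, arg, pos in pickletools.genops(payload)]

restored = RefUnpickler(io.BytesIO(payload)).load()
```

# The restored list refers to the very same external object rather than a
# copy, because only its ID travelled through the pickle.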
				1707
				1708	# Verify uniqueness of .name and .code members.
				1709	name2i = {}
				1710	code2i = {}
				1711
				1712	for i, d in enumerate(opcodes):
				1713	if d.name in name2i:
				1714	raise ValueError("repeated name %r at indices %d and %d" %
				1715	(d.name, name2i[d.name], i))
				1716	if d.code in code2i:
				1717	raise ValueError("repeated code %r at indices %d and %d" %
				1718	(d.code, code2i[d.code], i))
				1719
				1720	name2i[d.name] = i
				1721	code2i[d.code] = i
				1722
				1723	del name2i, code2i, i, d
				1724
				1725	##############################################################################
				1726	# Build a code2op dict, mapping opcode characters to OpcodeInfo records.
				1727	# Also ensure we've got the same stuff as pickle.py, although the
				1728	# introspection here is dicey.
				1729
				1730	code2op = {}
				1731	for d in opcodes:
				1732	code2op[d.code] = d
				1733	del d
				1734
				1735	def assure_pickle_consistency(verbose=False):
				1736	import pickle, re
				1737
				1738	copy = code2op.copy()
				1739	for name in pickle.__all__:
				1740	if not re.match("[A-Z][A-Z0-9_]+$", name):
				1741	if verbose:
				1742	print "skipping %r: it doesn't look like an opcode name" % name
				1743	continue
				1744	picklecode = getattr(pickle, name)
				1745	if not isinstance(picklecode, str) or len(picklecode) != 1:
				1746	if verbose:
				1747	print ("skipping %r: value %r doesn't look like a pickle "
				1748	"code" % (name, picklecode))
				1749	continue
				1750	if picklecode in copy:
				1751	if verbose:
				1752	print "checking name %r w/ code %r for consistency" % (
				1753	name, picklecode)
				1754	d = copy[picklecode]
				1755	if d.name != name:
				1756	raise ValueError("for pickle code %r, pickle.py uses name %r "
				1757	"but we're using name %r" % (picklecode,
				1758	name,
				1759	d.name))
				1760	# Forget this one. Any left over in copy at the end are a problem
				1761	# of a different kind.
				1762	del copy[picklecode]
				1763	else:
				1764	raise ValueError("pickle.py appears to have a pickle opcode with "
				1765	"name %r and code %r, but we don't" %
				1766	(name, picklecode))
				1767	if copy:
				1768	msg = ["we appear to have pickle opcodes that pickle.py doesn't have:"]
				1769	for code, d in copy.items():
				1770	msg.append(" name %r with code %r" % (d.name, code))
				1771	raise ValueError("\n".join(msg))
				1772
				1773	assure_pickle_consistency()
Tim Peters	c0c12b5	2003-01-29 00:56:17 +0000	[diff] [blame^]	1774	del assure_pickle_consistency
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1775
				1776	##############################################################################
				1777	# A pickle opcode generator.
				1778
				1779	def genops(pickle):
Guido van Rossum	a72ded9	2003-01-27 19:40:47 +0000	[diff] [blame]	1780	"""Generate all the opcodes in a pickle.
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1781
				1782	'pickle' is a file-like object, or string, containing the pickle.
				1783
				1784	Each opcode in the pickle is generated, from the current pickle position,
				1785	stopping after a STOP opcode is delivered. A triple is generated for
				1786	each opcode:
				1787
				1788	opcode, arg, pos
				1789
				1790	opcode is an OpcodeInfo record, describing the current opcode.
				1791
				1792	If the opcode has an argument embedded in the pickle, arg is its decoded
				1793	value, as a Python object. If the opcode doesn't have an argument, arg
				1794	is None.
				1795
				1796	If the pickle has a tell() method, pos was the value of pickle.tell()
				1797	before reading the current opcode. If the pickle is a string object,
				1798	it's wrapped in a StringIO object, and the latter's tell() result is
				1799	used. Else (the pickle doesn't have a tell(), and it's not obvious how
				1800	to query its current position) pos is None.
				1801	"""
				1802
				1803	import cStringIO as StringIO
				1804
				1805	if isinstance(pickle, str):
				1806	pickle = StringIO.StringIO(pickle)
				1807
				1808	if hasattr(pickle, "tell"):
				1809	getpos = pickle.tell
				1810	else:
				1811	getpos = lambda: None
				1812
				1813	while True:
				1814	pos = getpos()
				1815	code = pickle.read(1)
				1816	opcode = code2op.get(code)
				1817	if opcode is None:
				1818	if code == "":
				1819	raise ValueError("pickle exhausted before seeing STOP")
				1820	else:
				1821	raise ValueError("at position %s, opcode %r unknown" % (
				1822	pos is None and "<unknown>" or pos,
				1823	code))
				1824	if opcode.arg is None:
				1825	arg = None
				1826	else:
				1827	arg = opcode.arg.reader(pickle)
				1828	yield opcode, arg, pos
				1829	if code == '.':
				1830	assert opcode.name == 'STOP'
				1831	break
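
# Illustration only, not part of this module: a short usage sketch for
# genops(), assuming a modern Python 3 where this file is importable as the
# standard pickletools module and pickles are bytes objects. The variable
# names (data, ops) are made up for the example.

```python
import pickle
import pickletools

data = pickle.dumps([1, 2, (3, 4)], protocol=2)

# Each item is an (opcode, arg, pos) triple; opcode is an OpcodeInfo record.
ops = [(op.name, arg, pos) for op, arg, pos in pickletools.genops(data)]

for name, arg, pos in ops:
    print(pos, name, arg)
```

# A protocol 2 pickle starts with PROTO (argument 2) at position 0, and the
# generator stops right after delivering STOP, the last byte of the pickle.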
				1832
				1833	##############################################################################
				1834	# A symbolic pickle disassembler.
				1835
				1836	def dis(pickle, out=None, indentlevel=4):
				1837	"""Produce a symbolic disassembly of a pickle.
				1838
				1839	'pickle' is a file-like object, or string, containing a (at least one)
				1840	pickle. The pickle is disassembled from the current position, through
				1841	the first STOP opcode encountered.
				1842
				1843	Optional arg 'out' is a file-like object to which the disassembly is
				1844	printed. It defaults to sys.stdout.
				1845
				1846	Optional arg indentlevel is the number of blanks by which to indent
				1847	a new MARK level. It defaults to 4.
				1848	"""
				1849
				1850	markstack = []
				1851	indentchunk = ' ' * indentlevel
				1852	for opcode, arg, pos in genops(pickle):
				1853	if pos is not None:
				1854	print >> out, "%5d:" % pos,
				1855
Tim Peters	d0f7c86	2003-01-28 15:27:57 +0000	[diff] [blame]	1856	line = "%-4s %s%s" % (repr(opcode.code)[1:-1],
				1857	indentchunk * len(markstack),
				1858	opcode.name)
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1859
				1860	markmsg = None
				1861	if markstack and markobject in opcode.stack_before:
				1862	assert markobject not in opcode.stack_after
				1863	markpos = markstack.pop()
				1864	if markpos is not None:
				1865	markmsg = "(MARK at %d)" % markpos
				1866
				1867	if arg is not None or markmsg:
				1868	# make a mild effort to align arguments
				1869	line += ' ' * (10 - len(opcode.name))
				1870	if arg is not None:
				1871	line += ' ' + repr(arg)
				1872	if markmsg:
				1873	line += ' ' + markmsg
				1874	print >> out, line
				1875
				1876	if markobject in opcode.stack_after:
				1877	assert markobject not in opcode.stack_before
				1878	markstack.append(pos)
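
# Illustration only, not part of this module: dis() writes to sys.stdout by
# default, so capturing the listing means passing a file-like object as 'out'.
# A minimal sketch, assuming a modern Python 3 where this file is importable
# as the standard pickletools module. The names listing and lines are
# hypothetical.

```python
import io
import pickle
import pickletools

listing = io.StringIO()
pickletools.dis(pickle.dumps((1, 2), protocol=2), out=listing)

# One line per opcode; later versions may append summary lines as well.
lines = listing.getvalue().splitlines()
for line in lines:
    print(line)
```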
				1879
				1880
Guido van Rossum	03e3532	2003-01-28 15:37:13 +0000	[diff] [blame]	1881	_dis_test = r"""
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1882	>>> import pickle
				1883	>>> x = [1, 2, (3, 4), {'abc': u"def"}]
Guido van Rossum	5702835	2003-01-28 15:09:10 +0000	[diff] [blame]	1884	>>> pkl = pickle.dumps(x, 0)
				1885	>>> dis(pkl)
Tim Peters	d0f7c86	2003-01-28 15:27:57 +0000	[diff] [blame]	1886	0: ( MARK
				1887	1: l LIST (MARK at 0)
				1888	2: p PUT 0
				1889	5: I INT 1
				1890	8: a APPEND
				1891	9: I INT 2
				1892	12: a APPEND
				1893	13: ( MARK
				1894	14: I INT 3
				1895	17: I INT 4
				1896	20: t TUPLE (MARK at 13)
				1897	21: p PUT 1
				1898	24: a APPEND
				1899	25: ( MARK
				1900	26: d DICT (MARK at 25)
				1901	27: p PUT 2
				1902	30: S STRING 'abc'
				1903	37: p PUT 3
				1904	40: V UNICODE u'def'
				1905	45: p PUT 4
				1906	48: s SETITEM
				1907	49: a APPEND
				1908	50: . STOP
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1909
				1910	Try again with a "binary" pickle.
				1911
Guido van Rossum	5702835	2003-01-28 15:09:10 +0000	[diff] [blame]	1912	>>> pkl = pickle.dumps(x, 1)
				1913	>>> dis(pkl)
Tim Peters	d0f7c86	2003-01-28 15:27:57 +0000	[diff] [blame]	1914	0: ] EMPTY_LIST
				1915	1: q BINPUT 0
				1916	3: ( MARK
				1917	4: K BININT1 1
				1918	6: K BININT1 2
				1919	8: ( MARK
				1920	9: K BININT1 3
				1921	11: K BININT1 4
				1922	13: t TUPLE (MARK at 8)
				1923	14: q BINPUT 1
				1924	16: } EMPTY_DICT
				1925	17: q BINPUT 2
				1926	19: U SHORT_BINSTRING 'abc'
				1927	24: q BINPUT 3
				1928	26: X BINUNICODE u'def'
				1929	34: q BINPUT 4
				1930	36: s SETITEM
				1931	37: e APPENDS (MARK at 3)
				1932	38: . STOP
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1933
				1934	Exercise the INST/OBJ/BUILD family.
				1935
				1936	>>> import random
Guido van Rossum	f29d3d6	2003-01-27 22:47:53 +0000	[diff] [blame]	1937	>>> dis(pickle.dumps(random.random, 0))
Tim Peters	d0f7c86	2003-01-28 15:27:57 +0000	[diff] [blame]	1938	0: c GLOBAL 'random random'
				1939	15: p PUT 0
				1940	18: . STOP
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1941
				1942	>>> x = [pickle.PicklingError()] * 2
Guido van Rossum	f29d3d6	2003-01-27 22:47:53 +0000	[diff] [blame]	1943	>>> dis(pickle.dumps(x, 0))
Tim Peters	d0f7c86	2003-01-28 15:27:57 +0000	[diff] [blame]	1944	0: ( MARK
				1945	1: l LIST (MARK at 0)
				1946	2: p PUT 0
				1947	5: ( MARK
				1948	6: i INST 'pickle PicklingError' (MARK at 5)
				1949	28: p PUT 1
				1950	31: ( MARK
				1951	32: d DICT (MARK at 31)
				1952	33: p PUT 2
				1953	36: S STRING 'args'
				1954	44: p PUT 3
				1955	47: ( MARK
				1956	48: t TUPLE (MARK at 47)
				1957	49: s SETITEM
				1958	50: b BUILD
				1959	51: a APPEND
				1960	52: g GET 1
				1961	55: a APPEND
				1962	56: . STOP
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1963
				1964	>>> dis(pickle.dumps(x, 1))
Tim Peters	d0f7c86	2003-01-28 15:27:57 +0000	[diff] [blame]	1965	0: ] EMPTY_LIST
				1966	1: q BINPUT 0
				1967	3: ( MARK
				1968	4: ( MARK
				1969	5: c GLOBAL 'pickle PicklingError'
				1970	27: q BINPUT 1
				1971	29: o OBJ (MARK at 4)
				1972	30: q BINPUT 2
				1973	32: } EMPTY_DICT
				1974	33: q BINPUT 3
				1975	35: U SHORT_BINSTRING 'args'
				1976	41: q BINPUT 4
				1977	43: ) EMPTY_TUPLE
				1978	44: s SETITEM
				1979	45: b BUILD
				1980	46: h BINGET 2
				1981	48: e APPENDS (MARK at 3)
				1982	49: . STOP
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	1983
				1984	Try "the canonical" recursive-object test.
				1985
				1986	>>> L = []
				1987	>>> T = L,
				1988	>>> L.append(T)
				1989	>>> L[0] is T
				1990	True
				1991	>>> T[0] is L
				1992	True
				1993	>>> L[0][0] is L
				1994	True
				1995	>>> T[0][0] is T
				1996	True
Guido van Rossum	f29d3d6	2003-01-27 22:47:53 +0000	[diff] [blame]	1997	>>> dis(pickle.dumps(L, 0))
Tim Peters	d0f7c86	2003-01-28 15:27:57 +0000	[diff] [blame]	1998	0: ( MARK
				1999	1: l LIST (MARK at 0)
				2000	2: p PUT 0
				2001	5: ( MARK
				2002	6: g GET 0
				2003	9: t TUPLE (MARK at 5)
				2004	10: p PUT 1
				2005	13: a APPEND
				2006	14: . STOP
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	2007	>>> dis(pickle.dumps(L, 1))
Tim Peters	d0f7c86	2003-01-28 15:27:57 +0000	[diff] [blame]	2008	0: ] EMPTY_LIST
				2009	1: q BINPUT 0
				2010	3: ( MARK
				2011	4: h BINGET 0
				2012	6: t TUPLE (MARK at 3)
				2013	7: q BINPUT 1
				2014	9: a APPEND
				2015	10: . STOP
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	2016
				2017	The protocol 0 pickle of the tuple causes the disassembly to get confused,
				2018	as it doesn't realize that the POP opcode at 16 gets rid of the MARK at 0
				2019	(so the output remains indented until the end). The protocol 1 pickle
				2020	doesn't trigger this glitch, because the disassembler realizes that
				2021	POP_MARK gets rid of the MARK. Doing a better job on the protocol 0
				2022	pickle would require the disassembler to emulate the stack.
				2023
Guido van Rossum	f29d3d6	2003-01-27 22:47:53 +0000	[diff] [blame]	2024	>>> dis(pickle.dumps(T, 0))
Tim Peters	d0f7c86	2003-01-28 15:27:57 +0000	[diff] [blame]	2025	0: ( MARK
				2026	1: ( MARK
				2027	2: l LIST (MARK at 1)
				2028	3: p PUT 0
				2029	6: ( MARK
				2030	7: g GET 0
				2031	10: t TUPLE (MARK at 6)
				2032	11: p PUT 1
				2033	14: a APPEND
				2034	15: 0 POP
				2035	16: 0 POP
				2036	17: g GET 1
				2037	20: . STOP
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	2038	>>> dis(pickle.dumps(T, 1))
Tim Peters	d0f7c86	2003-01-28 15:27:57 +0000	[diff] [blame]	2039	0: ( MARK
				2040	1: ] EMPTY_LIST
				2041	2: q BINPUT 0
				2042	4: ( MARK
				2043	5: h BINGET 0
				2044	7: t TUPLE (MARK at 4)
				2045	8: q BINPUT 1
				2046	10: a APPEND
				2047	11: 1 POP_MARK (MARK at 0)
				2048	12: h BINGET 1
				2049	14: . STOP
				2050
				2051	Try protocol 2.
				2052
				2053	>>> dis(pickle.dumps(L, 2))
				2054	0: \x80 PROTO 2
				2055	2: ] EMPTY_LIST
				2056	3: q BINPUT 0
				2057	5: h BINGET 0
				2058	7: \x85 TUPLE1
				2059	8: q BINPUT 1
				2060	10: a APPEND
				2061	11: . STOP
				2062
				2063	>>> dis(pickle.dumps(T, 2))
				2064	0: \x80 PROTO 2
				2065	2: ] EMPTY_LIST
				2066	3: q BINPUT 0
				2067	5: h BINGET 0
				2068	7: \x85 TUPLE1
				2069	8: q BINPUT 1
				2070	10: a APPEND
				2071	11: 0 POP
				2072	12: h BINGET 1
				2073	14: . STOP
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	2074	"""
				2075
Guido van Rossum	5702835	2003-01-28 15:09:10 +0000	[diff] [blame]	2076	__test__ = {'disassembler_test': _dis_test,
Tim Peters	8ecfc8e	2003-01-27 18:51:48 +0000	[diff] [blame]	2077	}
				2078
				2079	def _test():
				2080	import doctest
				2081	return doctest.testmod()
				2082
				2083	if __name__ == "__main__":
				2084	_test()