#! /usr/bin/env python

"""
Module difflib -- helpers for computing deltas between objects.

Function get_close_matches(word, possibilities, n=3, cutoff=0.6):
    Use SequenceMatcher to return list of the best "good enough" matches.

Function ndiff(a, b):
    Return a delta: the difference between `a` and `b` (lists of strings).

Function restore(delta, which):
    Return one of the two sequences that generated an ndiff delta.

Class SequenceMatcher:
    A flexible class for comparing pairs of sequences of any type.

Class Differ:
    For producing human-readable deltas from sequences of lines of text.
"""

__all__ = ['get_close_matches', 'ndiff', 'restore', 'SequenceMatcher',
           'Differ']
class SequenceMatcher:

    """
    SequenceMatcher is a flexible class for comparing pairs of sequences of
    any type, so long as the sequence elements are hashable.  The basic
    algorithm predates, and is a little fancier than, an algorithm
    published in the late 1980's by Ratcliff and Obershelp under the
    hyperbolic name "gestalt pattern matching".  The basic idea is to find
    the longest contiguous matching subsequence that contains no "junk"
    elements (R-O doesn't address junk).  The same idea is then applied
    recursively to the pieces of the sequences to the left and to the right
    of the matching subsequence.  This does not yield minimal edit
    sequences, but does tend to yield matches that "look right" to people.

    SequenceMatcher tries to compute a "human-friendly diff" between two
    sequences.  Unlike e.g. UNIX(tm) diff, the fundamental notion is the
    longest *contiguous* & junk-free matching subsequence.  That's what
    catches people's eyes.  The Windows(tm) windiff has another interesting
    notion, pairing up elements that appear uniquely in each sequence.
    That, and the method here, appear to yield more intuitive difference
    reports than does diff.  This method appears to be the least vulnerable
    to synching up on blocks of "junk lines", though (like blank lines in
    ordinary text files, or maybe "<P>" lines in HTML files).  That may be
    because this is the only method of the 3 that has a *concept* of
    "junk" <wink>.

    Example, comparing two strings, and considering blanks to be "junk":

    >>> s = SequenceMatcher(lambda x: x == " ",
    ...                     "private Thread currentThread;",
    ...                     "private volatile Thread currentThread;")
    >>>

    .ratio() returns a float in [0, 1], measuring the "similarity" of the
    sequences.  As a rule of thumb, a .ratio() value over 0.6 means the
    sequences are close matches:

    >>> print round(s.ratio(), 3)
    0.866
    >>>

    If you're only interested in where the sequences match,
    .get_matching_blocks() is handy:

    >>> for block in s.get_matching_blocks():
    ...     print "a[%d] and b[%d] match for %d elements" % block
    a[0] and b[0] match for 8 elements
    a[8] and b[17] match for 6 elements
    a[14] and b[23] match for 15 elements
    a[29] and b[38] match for 0 elements

    Note that the last tuple returned by .get_matching_blocks() is always a
    dummy, (len(a), len(b), 0), and this is the only case in which the last
    tuple element (number of elements matched) is 0.

    If you want to know how to change the first sequence into the second,
    use .get_opcodes():

    >>> for opcode in s.get_opcodes():
    ...     print "%6s a[%d:%d] b[%d:%d]" % opcode
     equal a[0:8] b[0:8]
    insert a[8:8] b[8:17]
     equal a[8:14] b[17:23]
     equal a[14:29] b[23:38]

    See the Differ class for a fancy human-friendly file differencer, which
    uses SequenceMatcher both to compare sequences of lines, and to compare
    sequences of characters within similar (near-matching) lines.

    See also function get_close_matches() in this module, which shows how
    simple code building on SequenceMatcher can be used to do useful work.

    Timing:  Basic R-O is cubic time worst case and quadratic time expected
    case.  SequenceMatcher is quadratic time for the worst case and has
    expected-case behavior dependent in a complicated way on how many
    elements the sequences have in common; best case time is linear.

    Methods:

    __init__(isjunk=None, a='', b='')
        Construct a SequenceMatcher.

    set_seqs(a, b)
        Set the two sequences to be compared.

    set_seq1(a)
        Set the first sequence to be compared.

    set_seq2(b)
        Set the second sequence to be compared.

    find_longest_match(alo, ahi, blo, bhi)
        Find longest matching block in a[alo:ahi] and b[blo:bhi].

    get_matching_blocks()
        Return list of triples describing matching subsequences.

    get_opcodes()
        Return list of 5-tuples describing how to turn a into b.

    ratio()
        Return a measure of the sequences' similarity (float in [0,1]).

    quick_ratio()
        Return an upper bound on .ratio() relatively quickly.

    real_quick_ratio()
        Return an upper bound on ratio() very quickly.
    """

    def __init__(self, isjunk=None, a='', b=''):
        """Construct a SequenceMatcher.

        Optional arg isjunk is None (the default), or a one-argument
        function that takes a sequence element and returns true iff the
        element is junk.  None is equivalent to passing "lambda x: 0", i.e.
        no elements are considered to be junk.  For example, pass
            lambda x: x in " \\t"
        if you're comparing lines as sequences of characters, and don't
        want to synch up on blanks or hard tabs.

        Optional arg a is the first of two sequences to be compared.  By
        default, an empty string.  The elements of a must be hashable.  See
        also .set_seqs() and .set_seq1().

        Optional arg b is the second of two sequences to be compared.  By
        default, an empty string.  The elements of b must be hashable.  See
        also .set_seqs() and .set_seq2().
        """

        # Members:
        # a
        #      first sequence
        # b
        #      second sequence; differences are computed as "what do
        #      we need to do to 'a' to change it into 'b'?"
        # b2j
        #      for x in b, b2j[x] is a list of the indices (into b)
        #      at which x appears; junk elements do not appear
        # b2jhas
        #      b2j.has_key
        # fullbcount
        #      for x in b, fullbcount[x] == the number of times x
        #      appears in b; only materialized if really needed (used
        #      only for computing quick_ratio())
        # matching_blocks
        #      a list of (i, j, k) triples, where a[i:i+k] == b[j:j+k];
        #      ascending & non-overlapping in i and in j; terminated by
        #      a dummy (len(a), len(b), 0) sentinel
        # opcodes
        #      a list of (tag, i1, i2, j1, j2) tuples, where tag is
        #      one of
        #          'replace'   a[i1:i2] should be replaced by b[j1:j2]
        #          'delete'    a[i1:i2] should be deleted
        #          'insert'    b[j1:j2] should be inserted
        #          'equal'     a[i1:i2] == b[j1:j2]
        # isjunk
        #      a user-supplied function taking a sequence element and
        #      returning true iff the element is "junk" -- this has
        #      subtle but helpful effects on the algorithm, which I'll
        #      get around to writing up someday <0.9 wink>.
        #      DON'T USE!  Only __chain_b uses this.  Use isbjunk.
        # isbjunk
        #      for x in b, isbjunk(x) == isjunk(x) but much faster;
        #      it's really the has_key method of a hidden dict.
        #      DOES NOT WORK for x in a!

        self.isjunk = isjunk
        self.a = self.b = None
        self.set_seqs(a, b)

    def set_seqs(self, a, b):
        """Set the two sequences to be compared.

        >>> s = SequenceMatcher()
        >>> s.set_seqs("abcd", "bcde")
        >>> s.ratio()
        0.75
        """

        self.set_seq1(a)
        self.set_seq2(b)

    def set_seq1(self, a):
        """Set the first sequence to be compared.

        The second sequence to be compared is not changed.

        >>> s = SequenceMatcher(None, "abcd", "bcde")
        >>> s.ratio()
        0.75
        >>> s.set_seq1("bcde")
        >>> s.ratio()
        1.0
        >>>

        SequenceMatcher computes and caches detailed information about the
        second sequence, so if you want to compare one sequence S against
        many sequences, use .set_seq2(S) once and call .set_seq1(x)
        repeatedly for each of the other sequences.

        See also set_seqs() and set_seq2().
        """

        if a is self.a:
            return
        self.a = a
        self.matching_blocks = self.opcodes = None

    def set_seq2(self, b):
        """Set the second sequence to be compared.

        The first sequence to be compared is not changed.

        >>> s = SequenceMatcher(None, "abcd", "bcde")
        >>> s.ratio()
        0.75
        >>> s.set_seq2("abcd")
        >>> s.ratio()
        1.0
        >>>

        SequenceMatcher computes and caches detailed information about the
        second sequence, so if you want to compare one sequence S against
        many sequences, use .set_seq2(S) once and call .set_seq1(x)
        repeatedly for each of the other sequences.

        See also set_seqs() and set_seq1().
        """

        if b is self.b:
            return
        self.b = b
        self.matching_blocks = self.opcodes = None
        self.fullbcount = None
        self.__chain_b()

    # For each element x in b, set b2j[x] to a list of the indices in
    # b where x appears; the indices are in increasing order; note that
    # the number of times x appears in b is len(b2j[x]) ...
    # when self.isjunk is defined, junk elements don't show up in this
    # map at all, which stops the central find_longest_match method
    # from starting any matching block at a junk element ...
    # also creates the fast isbjunk function ...
    # note that this is only called when b changes; so for cross-product
    # kinds of matches, it's best to call set_seq2 once, then set_seq1
    # repeatedly

    def __chain_b(self):
        # Because isjunk is a user-defined (not C) function, and we test
        # for junk a LOT, it's important to minimize the number of calls.
        # Before the tricks described here, __chain_b was by far the most
        # time-consuming routine in the whole module!  If anyone sees
        # Jim Roskind, thank him again for profile.py -- I never would
        # have guessed that.
        # The first trick is to build b2j ignoring the possibility
        # of junk.  I.e., we don't call isjunk at all yet.  Throwing
        # out the junk later is much cheaper than building b2j "right"
        # from the start.
        b = self.b
        self.b2j = b2j = {}
        self.b2jhas = b2jhas = b2j.has_key
        for i in xrange(len(b)):
            elt = b[i]
            if b2jhas(elt):
                b2j[elt].append(i)
            else:
                b2j[elt] = [i]

        # Now b2j.keys() contains elements uniquely, and especially when
        # the sequence is a string, that's usually a good deal smaller
        # than len(string).  The difference is the number of isjunk calls
        # saved.
        isjunk, junkdict = self.isjunk, {}
        if isjunk:
            for elt in b2j.keys():
                if isjunk(elt):
                    junkdict[elt] = 1   # value irrelevant; it's a set
                    del b2j[elt]

        # Now for x in b, isjunk(x) == junkdict.has_key(x), but the
        # latter is much faster.  Note too that while there may be a
        # lot of junk in the sequence, the number of *unique* junk
        # elements is probably small.  So the memory burden of keeping
        # this dict alive is likely trivial compared to the size of b2j.
        self.isbjunk = junkdict.has_key

    def find_longest_match(self, alo, ahi, blo, bhi):
        """Find longest matching block in a[alo:ahi] and b[blo:bhi].

        If isjunk is not defined:

        Return (i,j,k) such that a[i:i+k] is equal to b[j:j+k], where
            alo <= i <= i+k <= ahi
            blo <= j <= j+k <= bhi
        and for all (i',j',k') meeting those conditions,
            k >= k'
            i <= i'
            and if i == i', j <= j'

        In other words, of all maximal matching blocks, return one that
        starts earliest in a, and of all those maximal matching blocks that
        start earliest in a, return the one that starts earliest in b.

        >>> s = SequenceMatcher(None, " abcd", "abcd abcd")
        >>> s.find_longest_match(0, 5, 0, 9)
        (0, 4, 5)

        If isjunk is defined, first the longest matching block is
        determined as above, but with the additional restriction that no
        junk element appears in the block.  Then that block is extended as
        far as possible by matching (only) junk elements on both sides.  So
        the resulting block never matches on junk except as identical junk
        happens to be adjacent to an "interesting" match.

        Here's the same example as before, but considering blanks to be
        junk.  That prevents " abcd" from matching the " abcd" at the tail
        end of the second sequence directly.  Instead only the "abcd" can
        match, and matches the leftmost "abcd" in the second sequence:

        >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
        >>> s.find_longest_match(0, 5, 0, 9)
        (1, 0, 4)

        If no blocks match, return (alo, blo, 0).

        >>> s = SequenceMatcher(None, "ab", "c")
        >>> s.find_longest_match(0, 2, 0, 1)
        (0, 0, 0)
        """

        # CAUTION:  stripping common prefix or suffix would be incorrect.
        # E.g.,
        #    ab
        #    acab
        # Longest matching block is "ab", but if common prefix is
        # stripped, it's "a" (tied with "b").  UNIX(tm) diff does so
        # strip, so ends up claiming that ab is changed to acab by
        # inserting "ca" in the middle.  That's minimal but unintuitive:
        # "it's obvious" that someone inserted "ac" at the front.
        # Windiff ends up at the same place as diff, but by pairing up
        # the unique 'b's and then matching the first two 'a's.

        a, b, b2j, isbjunk = self.a, self.b, self.b2j, self.isbjunk
        besti, bestj, bestsize = alo, blo, 0
        # find longest junk-free match
        # during an iteration of the loop, j2len[j] = length of longest
        # junk-free match ending with a[i-1] and b[j]
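        # illustrative walk-through: for a = "xab", b = "yab", processing
        # a[1] == 'a' leaves j2len == {1: 1}; then for a[2] == 'b',
        # newj2len[2] == j2lenget(1, 0) + 1 == 2, and the best block
        # becomes (besti, bestj, bestsize) == (1, 1, 2)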
        j2len = {}
        nothing = []
        for i in xrange(alo, ahi):
            # look at all instances of a[i] in b; note that because
            # b2j has no junk keys, the loop is skipped if a[i] is junk
            j2lenget = j2len.get
            newj2len = {}
            for j in b2j.get(a[i], nothing):
                # a[i] matches b[j]
                if j < blo:
                    continue
                if j >= bhi:
                    break
                k = newj2len[j] = j2lenget(j-1, 0) + 1
                if k > bestsize:
                    besti, bestj, bestsize = i-k+1, j-k+1, k
            j2len = newj2len

        # Now that we have a wholly interesting match (albeit possibly
        # empty!), we may as well suck up the matching junk on each
        # side of it too.  Can't think of a good reason not to, and it
        # saves post-processing the (possibly considerable) expense of
        # figuring out what to do with it.  In the case of an empty
        # interesting match, this is clearly the right thing to do,
        # because no other kind of match is possible in the regions.
        while besti > alo and bestj > blo and \
              isbjunk(b[bestj-1]) and \
              a[besti-1] == b[bestj-1]:
            besti, bestj, bestsize = besti-1, bestj-1, bestsize+1
        while besti+bestsize < ahi and bestj+bestsize < bhi and \
              isbjunk(b[bestj+bestsize]) and \
              a[besti+bestsize] == b[bestj+bestsize]:
            bestsize = bestsize + 1

        return besti, bestj, bestsize

    def get_matching_blocks(self):
        """Return list of triples describing matching subsequences.

        Each triple is of the form (i, j, n), and means that
        a[i:i+n] == b[j:j+n].  The triples are monotonically increasing in
        i and in j.

        The last triple is a dummy, (len(a), len(b), 0), and is the only
        triple with n==0.

        >>> s = SequenceMatcher(None, "abxcd", "abcd")
        >>> s.get_matching_blocks()
        [(0, 0, 2), (3, 2, 2), (5, 4, 0)]
        """

        if self.matching_blocks is not None:
            return self.matching_blocks
        self.matching_blocks = []
        la, lb = len(self.a), len(self.b)
        self.__helper(0, la, 0, lb, self.matching_blocks)
        self.matching_blocks.append( (la, lb, 0) )
        return self.matching_blocks

    # builds list of matching blocks covering a[alo:ahi] and
    # b[blo:bhi], appending them in increasing order to answer
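    # illustrative trace for the docstring's a = "abxcd", b = "abcd":
    # the top-level call finds (0, 0, 2) ("ab"); recursing to the right
    # of it (a[2:5] vs b[2:4]) finds (3, 2, 2) ("cd"); nothing is left
    # to recurse on, so answer is [(0, 0, 2), (3, 2, 2)] before the
    # (5, 4, 0) sentinel is appended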

    def __helper(self, alo, ahi, blo, bhi, answer):
        i, j, k = x = self.find_longest_match(alo, ahi, blo, bhi)
        # a[alo:i] vs b[blo:j] unknown
        # a[i:i+k] same as b[j:j+k]
        # a[i+k:ahi] vs b[j+k:bhi] unknown
        if k:
            if alo < i and blo < j:
                self.__helper(alo, i, blo, j, answer)
            answer.append(x)
            if i+k < ahi and j+k < bhi:
                self.__helper(i+k, ahi, j+k, bhi, answer)

    def get_opcodes(self):
        """Return list of 5-tuples describing how to turn a into b.

        Each tuple is of the form (tag, i1, i2, j1, j2).  The first tuple
        has i1 == j1 == 0, and remaining tuples have i1 == the i2 from the
        tuple preceding it, and likewise for j1 == the previous j2.

        The tags are strings, with these meanings:

        'replace':  a[i1:i2] should be replaced by b[j1:j2]
        'delete':   a[i1:i2] should be deleted.
                    Note that j1==j2 in this case.
        'insert':   b[j1:j2] should be inserted at a[i1:i1].
                    Note that i1==i2 in this case.
        'equal':    a[i1:i2] == b[j1:j2]

        >>> a = "qabxcd"
        >>> b = "abycdf"
        >>> s = SequenceMatcher(None, a, b)
        >>> for tag, i1, i2, j1, j2 in s.get_opcodes():
        ...    print ("%7s a[%d:%d] (%s) b[%d:%d] (%s)" %
        ...           (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))
         delete a[0:1] (q) b[0:0] ()
          equal a[1:3] (ab) b[0:2] (ab)
        replace a[3:4] (x) b[2:3] (y)
          equal a[4:6] (cd) b[3:5] (cd)
         insert a[6:6] () b[5:6] (f)
        """

        if self.opcodes is not None:
            return self.opcodes
        i = j = 0
        self.opcodes = answer = []
        for ai, bj, size in self.get_matching_blocks():
            # invariant:  we've pumped out correct diffs to change
            # a[:i] into b[:j], and the next matching block is
            # a[ai:ai+size] == b[bj:bj+size].  So we need to pump
            # out a diff to change a[i:ai] into b[j:bj], pump out
            # the matching block, and move (i,j) beyond the match
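            # illustrative: with the docstring's a = "qabxcd", b = "abycdf",
            # the first matching block (1, 0, 2) arrives with (i, j) ==
            # (0, 0), so a 'delete' for a[0:1] is pumped out before the
            # 'equal' for a[1:3] == b[0:2], leaving (i, j) == (3, 2)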
            tag = ''
            if i < ai and j < bj:
                tag = 'replace'
            elif i < ai:
                tag = 'delete'
            elif j < bj:
                tag = 'insert'
            if tag:
                answer.append( (tag, i, ai, j, bj) )
            i, j = ai+size, bj+size
            # the list of matching blocks is terminated by a
            # sentinel with size 0
            if size:
                answer.append( ('equal', ai, i, bj, j) )
        return answer

    def ratio(self):
        """Return a measure of the sequences' similarity (float in [0,1]).

        Where T is the total number of elements in both sequences, and
        M is the number of matches, this is 2.0*M / T.
        Note that this is 1 if the sequences are identical, and 0 if
        they have nothing in common.

        .ratio() is expensive to compute if you haven't already computed
        .get_matching_blocks() or .get_opcodes(), in which case you may
        want to try .quick_ratio() or .real_quick_ratio() first to get an
        upper bound.

        >>> s = SequenceMatcher(None, "abcd", "bcde")
        >>> s.ratio()
        0.75
        >>> s.quick_ratio()
        0.75
        >>> s.real_quick_ratio()
        1.0
        """

        matches = reduce(lambda sum, triple: sum + triple[-1],
                         self.get_matching_blocks(), 0)
        return 2.0 * matches / (len(self.a) + len(self.b))

    def quick_ratio(self):
        """Return an upper bound on ratio() relatively quickly.

        This isn't defined beyond that it is an upper bound on .ratio(), and
        is faster to compute.
        """

        # viewing a and b as multisets, set matches to the cardinality
        # of their intersection; this counts the number of matches
        # without regard to order, so is clearly an upper bound
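        # illustrative: for a = "abbc", b = "bcd" the multiset
        # intersection is {'b': 1, 'c': 1}, so matches == 2 and the
        # bound is 2.0 * 2 / (4 + 3), about 0.57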
        if self.fullbcount is None:
            self.fullbcount = fullbcount = {}
            for elt in self.b:
                fullbcount[elt] = fullbcount.get(elt, 0) + 1
        fullbcount = self.fullbcount
        # avail[x] is the number of times x appears in 'b' less the
        # number of times we've seen it in 'a' so far ... kinda
        avail = {}
        availhas, matches = avail.has_key, 0
        for elt in self.a:
            if availhas(elt):
                numb = avail[elt]
            else:
                numb = fullbcount.get(elt, 0)
            avail[elt] = numb - 1
            if numb > 0:
                matches = matches + 1
        return 2.0 * matches / (len(self.a) + len(self.b))

    def real_quick_ratio(self):
        """Return an upper bound on ratio() very quickly.

        This isn't defined beyond that it is an upper bound on .ratio(), and
        is faster to compute than either .ratio() or .quick_ratio().
        """

        la, lb = len(self.a), len(self.b)
        # can't have more matches than the number of elements in the
        # shorter sequence
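        # (e.g. la == 4, lb == 2 can match at most 2 elements, for a
        # bound of 2.0 * 2 / 6)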
        return 2.0 * min(la, lb) / (la + lb)

def get_close_matches(word, possibilities, n=3, cutoff=0.6):
    """Use SequenceMatcher to return list of the best "good enough" matches.

    word is a sequence for which close matches are desired (typically a
    string).

    possibilities is a list of sequences against which to match word
    (typically a list of strings).

    Optional arg n (default 3) is the maximum number of close matches to
    return.  n must be > 0.

    Optional arg cutoff (default 0.6) is a float in [0, 1].  Possibilities
    that don't score at least that similar to word are ignored.

    The best (no more than n) matches among the possibilities are returned
    in a list, sorted by similarity score, most similar first.

    >>> get_close_matches("appel", ["ape", "apple", "peach", "puppy"])
    ['apple', 'ape']
    >>> import keyword as _keyword
    >>> get_close_matches("wheel", _keyword.kwlist)
    ['while']
    >>> get_close_matches("apple", _keyword.kwlist)
    []
    >>> get_close_matches("accept", _keyword.kwlist)
    ['except']
    """

    if not n > 0:
        raise ValueError("n must be > 0: " + `n`)
    if not 0.0 <= cutoff <= 1.0:
        raise ValueError("cutoff must be in [0.0, 1.0]: " + `cutoff`)
    result = []
    s = SequenceMatcher()
    s.set_seq2(word)
    for x in possibilities:
        s.set_seq1(x)
        if s.real_quick_ratio() >= cutoff and \
           s.quick_ratio() >= cutoff and \
           s.ratio() >= cutoff:
            result.append((s.ratio(), x))
    # Sort by score.
    result.sort()
    # Retain only the best n.
    result = result[-n:]
    # Move best-scorer to head of list.
    result.reverse()
    # Strip scores.
    return [x for score, x in result]


def _count_leading(line, ch):
    """
    Return number of `ch` characters at the start of `line`.

    Example:

    >>> _count_leading('   abc', ' ')
    3
    """

    i, n = 0, len(line)
    while i < n and line[i] == ch:
        i += 1
    return i
class Differ:
    r"""
    Differ is a class for comparing sequences of lines of text, and
    producing human-readable differences or deltas.  Differ uses
    SequenceMatcher both to compare sequences of lines, and to compare
    sequences of characters within similar (near-matching) lines.

    Each line of a Differ delta begins with a two-letter code:

        '- '    line unique to sequence 1
        '+ '    line unique to sequence 2
        '  '    line common to both sequences
        '? '    line not present in either input sequence

    Lines beginning with '? ' attempt to guide the eye to intraline
    differences, and were not present in either input sequence.  These
    lines can be confusing if the sequences contain tab characters.

    Note that Differ makes no claim to produce a *minimal* diff.  To the
    contrary, minimal diffs are often counter-intuitive, because they
    synch up anywhere possible, sometimes on accidental matches 100 pages
    apart.  Restricting synch points to contiguous matches preserves some
    notion of locality, at the occasional cost of producing a longer diff.

    Example: Comparing two texts.

    First we set up the texts, sequences of individual single-line strings
    ending with newlines (such sequences can also be obtained from the
    `readlines()` method of file-like objects):

    >>> text1 = '''  1. Beautiful is better than ugly.
    ...   2. Explicit is better than implicit.
    ...   3. Simple is better than complex.
    ...   4. Complex is better than complicated.
    ... '''.splitlines(1)
    >>> len(text1)
    4
    >>> text1[0][-1]
    '\n'
    >>> text2 = '''  1. Beautiful is better than ugly.
    ...   3.   Simple is better than complex.
    ...   4. Complicated is better than complex.
    ...   5. Flat is better than nested.
    ... '''.splitlines(1)

    Next we instantiate a Differ object:

    >>> d = Differ()

    Note that when instantiating a Differ object we may pass functions to
    filter out line and character 'junk'.  See Differ.__init__ for details.

    Finally, we compare the two:

    >>> result = list(d.compare(text1, text2))

    'result' is a list of strings, so let's pretty-print it:

    >>> from pprint import pprint as _pprint
    >>> _pprint(result)
    ['    1. Beautiful is better than ugly.\n',
     '-   2. Explicit is better than implicit.\n',
     '-   3. Simple is better than complex.\n',
     '+   3.   Simple is better than complex.\n',
     '?     ++\n',
     '-   4. Complex is better than complicated.\n',
     '?            ^                     ---- ^\n',
     '+   4. Complicated is better than complex.\n',
     '?           ++++ ^                      ^\n',
     '+   5. Flat is better than nested.\n']

    As a single multi-line string it looks like this:

    >>> print ''.join(result),
        1. Beautiful is better than ugly.
    -   2. Explicit is better than implicit.
    -   3. Simple is better than complex.
    +   3.   Simple is better than complex.
    ?     ++
    -   4. Complex is better than complicated.
    ?            ^                     ---- ^
    +   4. Complicated is better than complex.
    ?           ++++ ^                      ^
    +   5. Flat is better than nested.

    Methods:

    __init__(linejunk=None, charjunk=None)
        Construct a text differencer, with optional filters.

    compare(a, b)
        Compare two sequences of lines; generate the resulting delta.
    """

    def __init__(self, linejunk=None, charjunk=None):
        """
        Construct a text differencer, with optional filters.

        The two optional keyword parameters are for filter functions:

        - `linejunk`: A function that should accept a single string argument,
          and return true iff the string is junk.  The module-level function
          `IS_LINE_JUNK` may be used to filter out lines without visible
          characters, except for at most one splat ('#').

        - `charjunk`: A function that should accept a string of length 1.
          The module-level function `IS_CHARACTER_JUNK` may be used to
          filter out whitespace characters (a blank or tab; **note**: bad
          idea to include newline in this!).
        """

        self.linejunk = linejunk
        self.charjunk = charjunk

    def compare(self, a, b):
        r"""
        Compare two sequences of lines; generate the resulting delta.

        Each sequence must contain individual single-line strings ending
        with newlines.  Such sequences can be obtained from the
        `readlines()` method of file-like objects.  The delta generated
        also consists of newline-terminated strings, ready to be printed
        as-is via the writelines() method of a file-like object.

        Example:

        >>> print ''.join(Differ().compare('one\ntwo\nthree\n'.splitlines(1),
        ...                                'ore\ntree\nemu\n'.splitlines(1))),
        - one
        ?  ^
        + ore
        ?  ^
        - two
        - three
        ?  -
        + tree
        + emu
        """

        cruncher = SequenceMatcher(self.linejunk, a, b)
        for tag, alo, ahi, blo, bhi in cruncher.get_opcodes():
            if tag == 'replace':
                g = self._fancy_replace(a, alo, ahi, b, blo, bhi)
            elif tag == 'delete':
                g = self._dump('-', a, alo, ahi)
            elif tag == 'insert':
                g = self._dump('+', b, blo, bhi)
            elif tag == 'equal':
                g = self._dump(' ', a, alo, ahi)
            else:
                raise ValueError, 'unknown tag ' + `tag`

            for line in g:
                yield line

    def _dump(self, tag, x, lo, hi):
        """Generate comparison results for a same-tagged range."""
        for i in xrange(lo, hi):
            yield '%s %s' % (tag, x[i])

    def _plain_replace(self, a, alo, ahi, b, blo, bhi):
        assert alo < ahi and blo < bhi
        # dump the shorter block first -- reduces the burden on short-term
        # memory if the blocks are of very different sizes
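        # (e.g. if one line was replaced by ten, the single '-' line
        # comes out before the ten '+' lines)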
        if bhi - blo < ahi - alo:
            first  = self._dump('+', b, blo, bhi)
            second = self._dump('-', a, alo, ahi)
        else:
            first  = self._dump('-', a, alo, ahi)
            second = self._dump('+', b, blo, bhi)

        for g in first, second:
            for line in g:
                yield line

    def _fancy_replace(self, a, alo, ahi, b, blo, bhi):
        r"""
        When replacing one block of lines with another, search the blocks
        for *similar* lines; the best-matching pair (if any) is used as a
        synch point, and intraline difference marking is done on the
        similar pair.  Lots of work, but often worth it.

        Example:

        >>> d = Differ()
        >>> results = d._fancy_replace(['abcDefghiJkl\n'], 0, 1,
        ...                            ['abcdefGhijkl\n'], 0, 1)
        >>> print ''.join(results),
        - abcDefghiJkl
        ?    ^  ^  ^
        + abcdefGhijkl
        ?    ^  ^  ^
        """

        # don't synch up unless the lines have a similarity score of at
        # least cutoff; best_ratio tracks the best score seen so far
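        # (0.74 rather than 0.75, apparently so a pair scoring exactly
        # cutoff can still win the strict '>' comparisons below)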
        best_ratio, cutoff = 0.74, 0.75
        cruncher = SequenceMatcher(self.charjunk)
        eqi, eqj = None, None   # 1st indices of equal lines (if any)

        # search for the pair that matches best without being identical
        # (identical lines must be junk lines, & we don't want to synch up
        # on junk -- unless we have to)
        for j in xrange(blo, bhi):
            bj = b[j]
            cruncher.set_seq2(bj)
            for i in xrange(alo, ahi):
                ai = a[i]
                if ai == bj:
                    if eqi is None:
                        eqi, eqj = i, j
                    continue
                cruncher.set_seq1(ai)
                # computing similarity is expensive, so use the quick
                # upper bounds first -- have seen this speed up messy
                # compares by a factor of 3.
                # note that ratio() is only expensive to compute the first
                # time it's called on a sequence pair; the expensive part
                # of the computation is cached by cruncher
                if cruncher.real_quick_ratio() > best_ratio and \
                      cruncher.quick_ratio() > best_ratio and \
                      cruncher.ratio() > best_ratio:
                    best_ratio, best_i, best_j = cruncher.ratio(), i, j
        if best_ratio < cutoff:
            # no non-identical "pretty close" pair
            if eqi is None:
                # no identical pair either -- treat it as a straight replace
                for line in self._plain_replace(a, alo, ahi, b, blo, bhi):
                    yield line
                return
            # no close pair, but an identical pair -- synch up on that
            best_i, best_j, best_ratio = eqi, eqj, 1.0
        else:
            # there's a close pair, so forget the identical pair (if any)
            eqi = None

        # a[best_i] very similar to b[best_j]; eqi is None iff they're not
        # identical

        # pump out diffs from before the synch point
        for line in self._fancy_helper(a, alo, best_i, b, blo, best_j):
            yield line

        # do intraline marking on the synch pair
        aelt, belt = a[best_i], b[best_j]
        if eqi is None:
            # pump out a '-', '?', '+', '?' quad for the synched lines
            atags = btags = ""
            cruncher.set_seqs(aelt, belt)
            for tag, ai1, ai2, bj1, bj2 in cruncher.get_opcodes():
                la, lb = ai2 - ai1, bj2 - bj1
                if tag == 'replace':
                    atags += '^' * la
                    btags += '^' * lb
                elif tag == 'delete':
                    atags += '-' * la
                elif tag == 'insert':
                    btags += '+' * lb
                elif tag == 'equal':
                    atags += ' ' * la
                    btags += ' ' * lb
                else:
                    raise ValueError, 'unknown tag ' + `tag`
            for line in self._qformat(aelt, belt, atags, btags):
                yield line
        else:
            # the synch pair is identical
            yield '  ' + aelt

        # pump out diffs from after the synch point
        for line in self._fancy_helper(a, best_i+1, ahi, b, best_j+1, bhi):
            yield line

    def _fancy_helper(self, a, alo, ahi, b, blo, bhi):
        g = []
        if alo < ahi:
            if blo < bhi:
                g = self._fancy_replace(a, alo, ahi, b, blo, bhi)
            else:
                g = self._dump('-', a, alo, ahi)
        elif blo < bhi:
            g = self._dump('+', b, blo, bhi)

        for line in g:
            yield line

    def _qformat(self, aline, bline, atags, btags):
        r"""
        Format "?" output and deal with leading tabs.

        Example:

        >>> d = Differ()
        >>> results = d._qformat('\tabcDefghiJkl\n', '\t\tabcdefGhijkl\n',
        ...                      '  ^ ^  ^      ', '+  ^ ^  ^      ')
        >>> for line in results: print repr(line)
        ...
        '- \tabcDefghiJkl\n'
        '? \t ^ ^  ^\n'
        '+ \t\tabcdefGhijkl\n'
        '? \t  ^ ^  ^\n'
        """

        # Can hurt, but will probably help most of the time.
        common = min(_count_leading(aline, "\t"),
                     _count_leading(bline, "\t"))
        common = min(common, _count_leading(atags[:common], " "))
        atags = atags[common:].rstrip()
        btags = btags[common:].rstrip()

        yield "- " + aline
        if atags:
            yield "? %s%s\n" % ("\t" * common, atags)

        yield "+ " + bline
        if btags:
            yield "? %s%s\n" % ("\t" * common, btags)

# With respect to junk, an earlier version of ndiff simply refused to
# *start* a match with a junk element.  The result was cases like this:
#     before: private Thread currentThread;
#     after:  private volatile Thread currentThread;
# If you consider whitespace to be junk, the longest contiguous match
# not starting with junk is "e Thread currentThread".  So ndiff reported
# that "e volatil" was inserted between the 't' and the 'e' in "private".
# While an accurate view, to people that's absurd.  The current version
# looks for matching blocks that are entirely junk-free, then extends the
# longest one of those as far as possible but only with matching junk.
# So now "currentThread" is matched, then extended to suck up the
# preceding blank; then "private" is matched, and extended to suck up the
# following blank; then "Thread" is matched; and finally ndiff reports
# that "volatile " was inserted before "Thread".  The only quibble
# remaining is that perhaps it was really the case that " volatile"
# was inserted after "private".  I can live with that <wink>.
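
# A sketch of that behavior (illustrative only), treating blanks as junk:
#
#     sm = SequenceMatcher(lambda x: x == " ",
#                          "private Thread currentThread;",
#                          "private volatile Thread currentThread;")
#     print sm.find_longest_match(0, 29, 0, 38)
#
# prints (14, 23, 15): the junk-free match "currentThread;" is found
# first, then extended leftward through the single matching blank.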

import re

def IS_LINE_JUNK(line, pat=re.compile(r"\s*#?\s*$").match):
    r"""
    Return 1 for ignorable line: iff `line` is blank or contains a single '#'.

    Examples:

    >>> IS_LINE_JUNK('\n')
    1
    >>> IS_LINE_JUNK('  #   \n')
    1
    >>> IS_LINE_JUNK('hello\n')
    0
    """

    return pat(line) is not None

def IS_CHARACTER_JUNK(ch, ws=" \t"):
    r"""
    Return 1 for ignorable character: iff `ch` is a space or tab.

    Examples:

    >>> IS_CHARACTER_JUNK(' ')
    1
    >>> IS_CHARACTER_JUNK('\t')
    1
    >>> IS_CHARACTER_JUNK('\n')
    0
    >>> IS_CHARACTER_JUNK('x')
    0
    """

    return ch in ws

del re

def ndiff(a, b, linejunk=IS_LINE_JUNK, charjunk=IS_CHARACTER_JUNK):
    r"""
    Compare `a` and `b` (lists of strings); return a `Differ`-style delta.

    Optional keyword parameters `linejunk` and `charjunk` are for filter
    functions (or None):

    - linejunk: A function that should accept a single string argument, and
      return true iff the string is junk.  The default is module-level
      function IS_LINE_JUNK, which filters out lines without visible
      characters, except for at most one splat ('#').

    - charjunk: A function that should accept a string of length 1.  The
      default is module-level function IS_CHARACTER_JUNK, which filters out
      whitespace characters (a blank or tab; note: bad idea to include
      newline in this!).

    Tools/scripts/ndiff.py is a command-line front-end to this function.

    Example:

    >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
    ...              'ore\ntree\nemu\n'.splitlines(1))
    >>> print ''.join(diff),
    - one
    ?  ^
    + ore
    ?  ^
    - two
    - three
    ?  -
    + tree
    + emu
    """
    return Differ(linejunk, charjunk).compare(a, b)

def restore(delta, which):
    r"""
    Generate one of the two sequences that generated a delta.

    Given a `delta` produced by `Differ.compare()` or `ndiff()`, extract
    lines originating from file 1 or 2 (parameter `which`), stripping off
    line prefixes.

    Examples:

    >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
    ...              'ore\ntree\nemu\n'.splitlines(1))
    >>> diff = list(diff)
    >>> print ''.join(restore(diff, 1)),
    one
    two
    three
    >>> print ''.join(restore(diff, 2)),
    ore
    tree
    emu
    """
    try:
        tag = {1: "- ", 2: "+ "}[int(which)]
    except KeyError:
        raise ValueError, ('unknown delta choice (must be 1 or 2): %r'
                           % which)
    prefixes = ("  ", tag)
    for line in delta:
        if line[:2] in prefixes:
            yield line[2:]

def _test():
    import doctest, difflib
    return doctest.testmod(difflib)

if __name__ == "__main__":
    _test()