Blame - Doc/lib/libdifflib.tex - platform/external/python/cpython3

Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	1	\section{\module{difflib} ---
				2	Helpers for computing deltas}
				3
				4	\declaremodule{standard}{difflib}
				5	\modulesynopsis{Helpers for computing differences between objects.}
				6	\moduleauthor{Tim Peters}{tim.one@home.com}
				7	\sectionauthor{Tim Peters}{tim.one@home.com}
				8	% LaTeXification by Fred L. Drake, Jr. <fdrake@acm.org>.
				9
Fred Drake	da00cda	2001-04-10 19:56:09 +0000	[diff] [blame]	10	\versionadded{2.1}
				11
				12
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	13	\begin{classdesc*}{SequenceMatcher}
				14	This is a flexible class for comparing pairs of sequences of any
				15	type, so long as the sequence elements are hashable. The basic
				16	algorithm predates, and is a little fancier than, an algorithm
				17	published in the late 1980's by Ratcliff and Obershelp under the
				18	hyperbolic name ``gestalt pattern matching.'' The idea is to find
				19	the longest contiguous matching subsequence that contains no
				20	``junk'' elements (the Ratcliff and Obershelp algorithm doesn't
				21	address junk). The same idea is then applied recursively to the
				22	pieces of the sequences to the left and to the right of the matching
				23	subsequence. This does not yield minimal edit sequences, but does
				24	tend to yield matches that ``look right'' to people.
				25
				26	\strong{Timing:} The basic Ratcliff-Obershelp algorithm is cubic
				27	time in the worst case and quadratic time in the expected case.
				28	\class{SequenceMatcher} is quadratic time for the worst case and has
				29	expected-case behavior dependent in a complicated way on how many
				30	elements the sequences have in common; best case time is linear.
				31	\end{classdesc*}
				32
				33	\begin{classdesc*}{Differ}
This is a class for comparing sequences of lines of text, and
producing human-readable differences or deltas. \class{Differ} uses
\class{SequenceMatcher} both to compare sequences of lines, and to
compare sequences of characters within similar (near-matching)
lines.
				39
				40	Each line of a \class{Differ} delta begins with a two-letter code:
				41
\begin{tableii}{l|l}{code}{Code}{Meaning}
  \lineii{'- '}{line unique to sequence 1}
  \lineii{'+ '}{line unique to sequence 2}
  \lineii{'  '}{line common to both sequences}
  \lineii{'? '}{line not present in either input sequence}
\end{tableii}
				48
				49	Lines beginning with `\code{?~}' attempt to guide the eye to
				50	intraline differences, and were not present in either input
				51	sequence. These lines can be confusing if the sequences contain tab
				52	characters.
				53	\end{classdesc*}
				54
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	55	\begin{funcdesc}{get_close_matches}{word, possibilities\optional{,
				56	n\optional{, cutoff}}}
				57	Return a list of the best ``good enough'' matches. \var{word} is a
				58	sequence for which close matches are desired (typically a string),
				59	and \var{possibilities} is a list of sequences against which to
				60	match \var{word} (typically a list of strings).
				61
				62	Optional argument \var{n} (default \code{3}) is the maximum number
				63	of close matches to return; \var{n} must be greater than \code{0}.
				64
				65	Optional argument \var{cutoff} (default \code{0.6}) is a float in
				66	the range [0, 1]. Possibilities that don't score at least that
				67	similar to \var{word} are ignored.
				68
				69	The best (no more than \var{n}) matches among the possibilities are
				70	returned in a list, sorted by similarity score, most similar first.
				71
				72	\begin{verbatim}
				73	>>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
				74	['apple', 'ape']
				75	>>> import keyword
				76	>>> get_close_matches('wheel', keyword.kwlist)
				77	['while']
				78	>>> get_close_matches('apple', keyword.kwlist)
				79	[]
				80	>>> get_close_matches('accept', keyword.kwlist)
				81	['except']
				82	\end{verbatim}
				83	\end{funcdesc}
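
The effect of the two optional arguments can be sketched as follows. This is a minimal illustration written for Python 3 (the interactive sessions in this section use Python 2 syntax). The word list is the one from the session above, and the variable names are made up for the illustration.

```python
from difflib import get_close_matches

words = ['ape', 'apple', 'peach', 'puppy']

# n caps the number of matches returned (most similar first).
best_one = get_close_matches('appel', words, n=1)

# A stricter cutoff discards candidates that the default 0.6 would accept.
strict = get_close_matches('appel', words, cutoff=0.9)

print(best_one)  # ['apple']
print(strict)    # []
```

With the default cutoff both \code{'apple'} and \code{'ape'} qualify; raising the cutoff to \code{0.9} rejects both, since neither scores that high against \code{'appel'}.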
				84
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	85	\begin{funcdesc}{ndiff}{a, b\optional{, linejunk\optional{,
				86	charjunk}}}
Compare \var{a} and \var{b} (lists of strings); return a
\class{Differ}-style delta (a generator that yields the delta lines).
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	89
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	90	Optional keyword parameters \var{linejunk} and \var{charjunk} are
				91	for filter functions (or \code{None}):
				92
\var{linejunk}: A function that accepts a single string
argument, and returns true if the string is junk, or false if not.
Starting with Python 2.3, the default is \code{None}. Before then,
the default was the module-level function
\function{IS_LINE_JUNK()}, which filters out lines without visible
characters, except for at most one pound character (\character{\#}).
As of Python 2.3, the underlying \class{SequenceMatcher} class
does a dynamic analysis of which lines are so frequent as to
constitute noise, and this usually works better than the pre-2.3
default.
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	103
\var{charjunk}: A function that accepts a character (a string of
length 1), and returns true if the character is junk, or false if not.
The default is the module-level function \function{IS_CHARACTER_JUNK()},
which filters out whitespace characters (a blank or tab; note: it is
a bad idea to include newline in this!).
				109
				110	\file{Tools/scripts/ndiff.py} is a command-line front-end to this
				111	function.
				112
\begin{verbatim}
>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
...              'ore\ntree\nemu\n'.splitlines(1))
>>> print ''.join(diff),
- one
?  ^
+ ore
?  ^
- two
- three
?  -
+ tree
+ emu
\end{verbatim}
				127	\end{funcdesc}
				128
				129	\begin{funcdesc}{restore}{sequence, which}
				130	Return one of the two sequences that generated a delta.
				131
				132	Given a \var{sequence} produced by \method{Differ.compare()} or
				133	\function{ndiff()}, extract lines originating from file 1 or 2
				134	(parameter \var{which}), stripping off line prefixes.
				135
				136	Example:
				137
\begin{verbatim}
>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
...              'ore\ntree\nemu\n'.splitlines(1))
>>> diff = list(diff) # materialize the generated delta into a list
>>> print ''.join(restore(diff, 1)),
one
two
three
>>> print ''.join(restore(diff, 2)),
ore
tree
emu
\end{verbatim}
				151
				152	\end{funcdesc}
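
The round trip described above can also be checked programmatically. The following is a small sketch for Python 3, reusing the two inputs from the session above; the names \code{first} and \code{second} are illustrative only.

```python
from difflib import ndiff, restore

a = 'one\ntwo\nthree\n'.splitlines(True)
b = 'ore\ntree\nemu\n'.splitlines(True)

# Materialize the delta: it is consumed twice below, and a generator
# can only be iterated once.
delta = list(ndiff(a, b))

first = list(restore(delta, 1))   # lines that came from sequence 1
second = list(restore(delta, 2))  # lines that came from sequence 2

print(first == a, second == b)  # True True
```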
				153
				154
Fred Drake	7f10cce	2001-10-26 03:04:23 +0000	[diff] [blame]	155	\begin{funcdesc}{IS_LINE_JUNK}{line}
				156	Return true for ignorable lines. The line \var{line} is ignorable
				157	if \var{line} is blank or contains a single \character{\#},
				158	otherwise it is not ignorable. Used as a default for parameter
Tim Peters	81b9251	2002-04-29 01:37:32 +0000	[diff] [blame]	159	\var{linejunk} in \function{ndiff()} before Python 2.3.
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	160	\end{funcdesc}
				161
				162
Fred Drake	7f10cce	2001-10-26 03:04:23 +0000	[diff] [blame]	163	\begin{funcdesc}{IS_CHARACTER_JUNK}{ch}
				164	Return true for ignorable characters. The character \var{ch} is
				165	ignorable if \var{ch} is a space or tab, otherwise it is not
				166	ignorable. Used as a default for parameter \var{charjunk} in
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	167	\function{ndiff()}.
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	168	\end{funcdesc}
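
The two junk predicates can be called directly to see what they classify as ignorable. This is a minimal Python 3 sketch; the sample strings are chosen for illustration.

```python
from difflib import IS_LINE_JUNK, IS_CHARACTER_JUNK

# Blank lines and lines holding only a '#' are junk; ordinary text is not.
line_results = [IS_LINE_JUNK(s) for s in ('\n', '  #   \n', 'hello\n')]

# Only a space or a tab counts as a junk character; newline does not.
char_results = [IS_CHARACTER_JUNK(c) for c in (' ', '\t', '\n', 'x')]

print(line_results)  # [True, True, False]
print(char_results)  # [True, True, False, False]
```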
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	169
				170
Fred Drake	6fda3ac	2001-04-10 18:41:16 +0000	[diff] [blame]	171	\begin{seealso}
				172	\seetitle{Pattern Matching: The Gestalt Approach}{Discussion of a
				173	similar algorithm by John W. Ratcliff and D. E. Metzener.
				174	This was published in
				175	\citetitle[http://www.ddj.com/]{Dr. Dobb's Journal} in
				176	July, 1988.}
				177	\end{seealso}
				178
				179
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	180	\subsection{SequenceMatcher Objects \label{sequence-matcher}}
				181
Fred Drake	96d7a70	2001-05-11 01:08:13 +0000	[diff] [blame]	182	The \class{SequenceMatcher} class has this constructor:
				183
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	184	\begin{classdesc}{SequenceMatcher}{\optional{isjunk\optional{,
				185	a\optional{, b}}}}
Optional argument \var{isjunk} must be \code{None} (the default) or
a one-argument function that takes a sequence element and returns
true if and only if the element is ``junk'' and should be ignored.
Passing \code{None} for \var{isjunk} is equivalent to passing
\code{lambda x: 0}; in other words, no elements are ignored. For
example, pass:
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	192
\begin{verbatim}
lambda x: x in " \t"
\end{verbatim}
				196
				197	if you're comparing lines as sequences of characters, and don't want
				198	to synch up on blanks or hard tabs.
				199
				200	The optional arguments \var{a} and \var{b} are sequences to be
				201	compared; both default to empty strings. The elements of both
				202	sequences must be hashable.
				203	\end{classdesc}
				204
				205
				206	\class{SequenceMatcher} objects have the following methods:
				207
				208	\begin{methoddesc}{set_seqs}{a, b}
				209	Set the two sequences to be compared.
				210	\end{methoddesc}
				211
				212	\class{SequenceMatcher} computes and caches detailed information about
				213	the second sequence, so if you want to compare one sequence against
				214	many sequences, use \method{set_seq2()} to set the commonly used
				215	sequence once and call \method{set_seq1()} repeatedly, once for each
				216	of the other sequences.
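
The one-against-many pattern described in this paragraph might look like the sketch below (Python 3). The helper name \code{best\_match} and the sample data are hypothetical and are not part of the module.

```python
from difflib import SequenceMatcher


def best_match(target, candidates):
    """Return (candidate, score) for the candidate most similar to target."""
    matcher = SequenceMatcher()
    matcher.set_seq2(target)          # analyzed and cached once
    best, best_score = None, 0.0
    for candidate in candidates:
        matcher.set_seq1(candidate)   # cheap: the cache for seq2 is kept
        score = matcher.ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


result = best_match('appel', ['ape', 'apple', 'peach', 'puppy'])
print(result)  # ('apple', 0.8)
```

Setting the shared sequence with \method{set_seq2()} rather than \method{set_seq1()} is the point of the design: the per-sequence analysis is done for the second sequence only.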
				217
				218	\begin{methoddesc}{set_seq1}{a}
				219	Set the first sequence to be compared. The second sequence to be
				220	compared is not changed.
				221	\end{methoddesc}
				222
				223	\begin{methoddesc}{set_seq2}{b}
				224	Set the second sequence to be compared. The first sequence to be
				225	compared is not changed.
				226	\end{methoddesc}
				227
				228	\begin{methoddesc}{find_longest_match}{alo, ahi, blo, bhi}
				229	Find longest matching block in \code{\var{a}[\var{alo}:\var{ahi}]}
				230	and \code{\var{b}[\var{blo}:\var{bhi}]}.
				231
If \var{isjunk} was omitted or \code{None},
\method{find_longest_match()} returns \code{(\var{i}, \var{j},
\var{k})} such that \code{\var{a}[\var{i}:\var{i}+\var{k}]} is equal
to \code{\var{b}[\var{j}:\var{j}+\var{k}]}, where
\code{\var{alo} <= \var{i} <= \var{i}+\var{k} <= \var{ahi}} and
\code{\var{blo} <= \var{j} <= \var{j}+\var{k} <= \var{bhi}}.
For all \code{(\var{i'}, \var{j'}, \var{k'})} meeting those
conditions, the additional conditions
\code{\var{k} >= \var{k'}},
\code{\var{i} <= \var{i'}},
and if \code{\var{i} == \var{i'}}, \code{\var{j} <= \var{j'}}
are also met.
				244	In other words, of all maximal matching blocks, return one that
				245	starts earliest in \var{a}, and of all those maximal matching blocks
				246	that start earliest in \var{a}, return the one that starts earliest
				247	in \var{b}.
				248
				249	\begin{verbatim}
				250	>>> s = SequenceMatcher(None, " abcd", "abcd abcd")
				251	>>> s.find_longest_match(0, 5, 0, 9)
				252	(0, 4, 5)
				253	\end{verbatim}
				254
				255	If \var{isjunk} was provided, first the longest matching block is
				256	determined as above, but with the additional restriction that no
				257	junk element appears in the block. Then that block is extended as
				258	far as possible by matching (only) junk elements on both sides.
				259	So the resulting block never matches on junk except as identical
				260	junk happens to be adjacent to an interesting match.
				261
Here's the same example as before, but considering blanks to be junk.
That prevents \code{' abcd'} from matching the \code{' abcd'} at the
tail end of the second sequence directly. Instead only the
\code{'abcd'} can match, and matches the leftmost \code{'abcd'} in
the second sequence:
				267
				268	\begin{verbatim}
				269	>>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
				270	>>> s.find_longest_match(0, 5, 0, 9)
				271	(1, 0, 4)
				272	\end{verbatim}
				273
				274	If no blocks match, this returns \code{(\var{alo}, \var{blo}, 0)}.
				275	\end{methoddesc}
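
The no-match case can be illustrated with two sequences that share no elements. This is a minimal Python 3 sketch; the bounds \code{1, 3, 2, 3} are arbitrary values chosen so that the returned indices are visibly the lower bounds that were passed in.

```python
from difflib import SequenceMatcher

s = SequenceMatcher(None, "abc", "xyz")

# Nothing in a[1:3] matches anything in b[2:3], so the result is
# (alo, blo, 0) with alo == 1 and blo == 2.
i, j, k = s.find_longest_match(1, 3, 2, 3)

print((i, j, k))  # (1, 2, 0)
```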
				276
				277	\begin{methoddesc}{get_matching_blocks}{}
				278	Return list of triples describing matching subsequences.
				279	Each triple is of the form \code{(\var{i}, \var{j}, \var{n})}, and
				280	means that \code{\var{a}[\var{i}:\var{i}+\var{n}] ==
				281	\var{b}[\var{j}:\var{j}+\var{n}]}. The triples are monotonically
				282	increasing in \var{i} and \var{j}.
				283
				284	The last triple is a dummy, and has the value \code{(len(\var{a}),
				285	len(\var{b}), 0)}. It is the only triple with \code{\var{n} == 0}.
				286	% Explain why a dummy is used!
				287
				288	\begin{verbatim}
				289	>>> s = SequenceMatcher(None, "abxcd", "abcd")
				290	>>> s.get_matching_blocks()
				291	[(0, 0, 2), (3, 2, 2), (5, 4, 0)]
				292	\end{verbatim}
				293	\end{methoddesc}
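
One practical consequence of the trailing dummy triple is that code walking the blocks needs no special case for unmatched material at the end of either sequence. This is an interpretation rather than a rationale stated in this section. The sketch below (Python 3) uses a hypothetical helper, \code{unmatched\_regions}, to collect the gaps between consecutive matching blocks.

```python
from difflib import SequenceMatcher


def unmatched_regions(a, b):
    """Return (piece_of_a, piece_of_b) pairs lying between matching blocks."""
    s = SequenceMatcher(None, a, b)
    i = j = 0          # end of the previous matching block in a and b
    gaps = []
    for ai, bj, n in s.get_matching_blocks():
        if i < ai or j < bj:
            gaps.append((a[i:ai], b[j:bj]))
        i, j = ai + n, bj + n
    return gaps


middle = unmatched_regions("abxcd", "abcd")
tail = unmatched_regions("abcd", "abcdef")
print(middle)  # [('x', '')]
print(tail)    # [('', 'ef')]
```

In the second call the only real block is \code{(0, 0, 4)}; the trailing \code{'ef'} is reported because the dummy \code{(4, 6, 0)} is processed by the same loop body.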
				294
				295	\begin{methoddesc}{get_opcodes}{}
Return list of 5-tuples describing how to turn \var{a} into \var{b}.
Each tuple is of the form \code{(\var{tag}, \var{i1}, \var{i2},
\var{j1}, \var{j2})}. The first tuple has \code{\var{i1} ==
\var{j1} == 0}, and remaining tuples have \var{i1} equal to the
\var{i2} from the preceding tuple, and, likewise, \var{j1} equal to
the previous \var{j2}.
				302
				303	The \var{tag} values are strings, with these meanings:
				304
\begin{tableii}{l|l}{code}{Value}{Meaning}
  \lineii{'replace'}{\code{\var{a}[\var{i1}:\var{i2}]} should be
                     replaced by \code{\var{b}[\var{j1}:\var{j2}]}.}
  \lineii{'delete'}{\code{\var{a}[\var{i1}:\var{i2}]} should be
                    deleted. Note that \code{\var{j1} == \var{j2}} in
                    this case.}
  \lineii{'insert'}{\code{\var{b}[\var{j1}:\var{j2}]} should be
                    inserted at \code{\var{a}[\var{i1}:\var{i1}]}.
                    Note that \code{\var{i1} == \var{i2}} in this
                    case.}
  \lineii{'equal'}{\code{\var{a}[\var{i1}:\var{i2}] ==
                   \var{b}[\var{j1}:\var{j2}]} (the sub-sequences are
                   equal).}
\end{tableii}
				319
				320	For example:
				321
\begin{verbatim}
>>> a = "qabxcd"
>>> b = "abycdf"
>>> s = SequenceMatcher(None, a, b)
>>> for tag, i1, i2, j1, j2 in s.get_opcodes():
...     print ("%7s a[%d:%d] (%s) b[%d:%d] (%s)" %
...            (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))
 delete a[0:1] (q) b[0:0] ()
  equal a[1:3] (ab) b[0:2] (ab)
replace a[3:4] (x) b[2:3] (y)
  equal a[4:6] (cd) b[3:5] (cd)
 insert a[6:6] () b[5:6] (f)
\end{verbatim}
				335	\end{methoddesc}
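
Because the opcodes describe how to turn \var{a} into \var{b}, replaying them rebuilds \var{b}. The sketch below (Python 3) shows one way to do that for strings; the helper name \code{apply\_opcodes} is hypothetical.

```python
from difflib import SequenceMatcher


def apply_opcodes(a, b):
    """Rebuild b by replaying the opcodes that turn a into b."""
    s = SequenceMatcher(None, a, b)
    pieces = []
    for tag, i1, i2, j1, j2 in s.get_opcodes():
        if tag == 'equal':
            pieces.append(a[i1:i2])      # keep the shared run from a
        elif tag in ('replace', 'insert'):
            pieces.append(b[j1:j2])      # take the new material from b
        # 'delete': a[i1:i2] is dropped, so nothing is appended
    return ''.join(pieces)


rebuilt = apply_opcodes("qabxcd", "abycdf")
print(rebuilt)  # abycdf
```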
				336
				337	\begin{methoddesc}{ratio}{}
				338	Return a measure of the sequences' similarity as a float in the
				339	range [0, 1].
				340
Where T is the total number of elements in both sequences, and M is
the number of matches, this is 2.0*M / T. Note that this is
\code{1.0} if the sequences are identical, and \code{0.0} if they
have nothing in common.
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	345
				346	This is expensive to compute if \method{get_matching_blocks()} or
				347	\method{get_opcodes()} hasn't already been called, in which case you
				348	may want to try \method{quick_ratio()} or
				349	\method{real_quick_ratio()} first to get an upper bound.
				350	\end{methoddesc}
				351
				352	\begin{methoddesc}{quick_ratio}{}
				353	Return an upper bound on \method{ratio()} relatively quickly.
				354
				355	This isn't defined beyond that it is an upper bound on
				356	\method{ratio()}, and is faster to compute.
				357	\end{methoddesc}
				358
				359	\begin{methoddesc}{real_quick_ratio}{}
				360	Return an upper bound on \method{ratio()} very quickly.
				361
				362	This isn't defined beyond that it is an upper bound on
				363	\method{ratio()}, and is faster to compute than either
				364	\method{ratio()} or \method{quick_ratio()}.
				365	\end{methoddesc}
				366
The three methods that return the ratio of matching to total characters
can give different results due to differing levels of approximation,
although \method{quick_ratio()} and \method{real_quick_ratio()} are always
at least as large as \method{ratio()}:
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	371
				372	\begin{verbatim}
				373	>>> s = SequenceMatcher(None, "abcd", "bcde")
				374	>>> s.ratio()
				375	0.75
				376	>>> s.quick_ratio()
				377	0.75
				378	>>> s.real_quick_ratio()
				379	1.0
				380	\end{verbatim}
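
The formula 2.0*M / T given for \method{ratio()} can be reproduced by hand from the matching blocks. This is a minimal Python 3 sketch using the same two strings as the session above; \code{matches}, \code{total} and \code{by\_hand} are illustrative names.

```python
from difflib import SequenceMatcher

a, b = "abcd", "bcde"
s = SequenceMatcher(None, a, b)

# M: total size of all matching blocks (the dummy block adds 0).
matches = sum(n for _, _, n in s.get_matching_blocks())
# T: total number of elements in both sequences.
total = len(a) + len(b)

by_hand = 2.0 * matches / total
print(by_hand, s.ratio())  # 0.75 0.75
```

Here M is 3 (the shared run \code{'bcd'}) and T is 8, which gives 0.75.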
				381
				382
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	383	\subsection{SequenceMatcher Examples \label{sequencematcher-examples}}
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	384
				385
This example compares two strings, considering blanks to be ``junk'':
				387
				388	\begin{verbatim}
				389	>>> s = SequenceMatcher(lambda x: x == " ",
				390	... "private Thread currentThread;",
				391	... "private volatile Thread currentThread;")
				392	\end{verbatim}
				393
				394	\method{ratio()} returns a float in [0, 1], measuring the similarity
				395	of the sequences. As a rule of thumb, a \method{ratio()} value over
				396	0.6 means the sequences are close matches:
				397
				398	\begin{verbatim}
				399	>>> print round(s.ratio(), 3)
				400	0.866
				401	\end{verbatim}
				402
				403	If you're only interested in where the sequences match,
				404	\method{get_matching_blocks()} is handy:
				405
\begin{verbatim}
>>> for block in s.get_matching_blocks():
...     print "a[%d] and b[%d] match for %d elements" % block
a[0] and b[0] match for 8 elements
a[8] and b[17] match for 6 elements
a[14] and b[23] match for 15 elements
a[29] and b[38] match for 0 elements
\end{verbatim}
				414
				415	Note that the last tuple returned by \method{get_matching_blocks()} is
				416	always a dummy, \code{(len(\var{a}), len(\var{b}), 0)}, and this is
				417	the only case in which the last tuple element (number of elements
				418	matched) is \code{0}.
				419
				420	If you want to know how to change the first sequence into the second,
				421	use \method{get_opcodes()}:
				422
\begin{verbatim}
>>> for opcode in s.get_opcodes():
...     print "%6s a[%d:%d] b[%d:%d]" % opcode
 equal a[0:8] b[0:8]
insert a[8:8] b[8:17]
 equal a[8:14] b[17:23]
 equal a[14:29] b[23:38]
\end{verbatim}
				431
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	432	See also the function \function{get_close_matches()} in this module,
				433	which shows how simple code building on \class{SequenceMatcher} can be
				434	used to do useful work.
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	435
				436
				437	\subsection{Differ Objects \label{differ-objects}}
				438
Note that \class{Differ}-generated deltas make no claim to be
\strong{minimal} diffs. To the contrary, minimal diffs are often
counter-intuitive, because they synch up anywhere possible, sometimes
on accidental matches 100 pages apart. Restricting synch points to
contiguous matches preserves some notion of locality, at the
occasional cost of producing a longer diff.
				445
				446	The \class{Differ} class has this constructor:
				447
				448	\begin{classdesc}{Differ}{\optional{linejunk\optional{, charjunk}}}
				449	Optional keyword parameters \var{linejunk} and \var{charjunk} are
				450	for filter functions (or \code{None}):
				451
Tim Peters	81b9251	2002-04-29 01:37:32 +0000	[diff] [blame]	452	\var{linejunk}: A function that accepts a single string
				453	argument, and returns true if the string is junk. The default is
				454	\code{None}, meaning that no line is considered junk.
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	455
Tim Peters	81b9251	2002-04-29 01:37:32 +0000	[diff] [blame]	456	\var{charjunk}: A function that accepts a single character argument
				457	(a string of length 1), and returns true if the character is junk.
				458	The default is \code{None}, meaning that no character is
				459	considered junk.
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	460	\end{classdesc}
				461
				462	\class{Differ} objects are used (deltas generated) via a single
				463	method:
				464
				465	\begin{methoddesc}{compare}{a, b}
Compare two sequences of lines, and generate the delta (a sequence
of lines).

Each sequence must contain individual single-line strings ending
with newlines. Such sequences can be obtained from the
\method{readlines()} method of file-like objects. The delta generated
also consists of newline-terminated strings, ready to be printed as-is
via the \method{writelines()} method of a file-like object.
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	474	\end{methoddesc}
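
A short sketch of \method{compare()} in use, written for Python 3. The two input lists are small made-up examples of newline-terminated lines.

```python
import sys
from difflib import Differ

a = ['one\n', 'two\n', 'three\n']
b = ['ore\n', 'tree\n', 'emu\n']

# compare() yields the delta lazily; list() collects it for reuse.
delta = list(Differ().compare(a, b))

# Every delta line is newline-terminated, so it can be written as-is.
sys.stdout.writelines(delta)
```

Each line written starts with one of the two-letter codes listed for \class{Differ} at the top of this section.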
				475
				476
				477	\subsection{Differ Example \label{differ-examples}}
				478
				479	This example compares two texts. First we set up the texts, sequences
				480	of individual single-line strings ending with newlines (such sequences
				481	can also be obtained from the \method{readlines()} method of file-like
				482	objects):
				483
\begin{verbatim}
>>> text1 = '''  1. Beautiful is better than ugly.
...   2. Explicit is better than implicit.
...   3. Simple is better than complex.
...   4. Complex is better than complicated.
... '''.splitlines(1)
>>> len(text1)
4
>>> text1[0][-1]
'\n'
>>> text2 = '''  1. Beautiful is better than ugly.
...   3.   Simple is better than complex.
...   4. Complicated is better than complex.
...   5. Flat is better than nested.
... '''.splitlines(1)
\end{verbatim}
				500
				501	Next we instantiate a Differ object:
				502
				503	\begin{verbatim}
				504	>>> d = Differ()
				505	\end{verbatim}
				506
Note that when instantiating a \class{Differ} object we may pass
functions to filter out line and character ``junk.'' See the
\class{Differ} constructor for details.
				510
				511	Finally, we compare the two:
				512
\begin{verbatim}
>>> result = list(d.compare(text1, text2))
\end{verbatim}
				516
				517	\code{result} is a list of strings, so let's pretty-print it:
				518
\begin{verbatim}
>>> from pprint import pprint
>>> pprint(result)
['    1. Beautiful is better than ugly.\n',
 '-   2. Explicit is better than implicit.\n',
 '-   3. Simple is better than complex.\n',
 '+   3.   Simple is better than complex.\n',
 '?     ++                                \n',
 '-   4. Complex is better than complicated.\n',
 '?            ^                     ---- ^  \n',
 '+   4. Complicated is better than complex.\n',
 '?           ++++ ^                      ^  \n',
 '+   5. Flat is better than nested.\n']
\end{verbatim}
				533
				534	As a single multi-line string it looks like this:
				535
\begin{verbatim}
>>> import sys
>>> sys.stdout.writelines(result)
    1. Beautiful is better than ugly.
-   2. Explicit is better than implicit.
-   3. Simple is better than complex.
+   3.   Simple is better than complex.
?     ++
-   4. Complex is better than complicated.
?            ^                     ---- ^
+   4. Complicated is better than complex.
?           ++++ ^                      ^
+   5. Flat is better than nested.
\end{verbatim}