:mod:`difflib` --- Helpers for computing deltas
===============================================

.. module:: difflib
   :synopsis: Helpers for computing differences between objects.

.. moduleauthor:: Tim Peters <tim_one@users.sourceforge.net>
.. sectionauthor:: Tim Peters <tim_one@users.sourceforge.net>
.. Markup by Fred L. Drake, Jr. <fdrake@acm.org>

**Source code:** :source:`Lib/difflib.py`

.. testsetup::

   import sys
   from difflib import *

--------------

This module provides classes and functions for comparing sequences. It
can be used, for example, to compare files, and it can produce difference
information in various formats, including HTML and context and unified
diffs. For comparing directories and files, see also the :mod:`filecmp` module.


.. class:: SequenceMatcher
   :noindex:

   This is a flexible class for comparing pairs of sequences of any type, so long
   as the sequence elements are :term:`hashable`. The basic algorithm predates, and is a
   little fancier than, an algorithm published in the late 1980s by Ratcliff and
   Obershelp under the hyperbolic name "gestalt pattern matching." The idea is to
   find the longest contiguous matching subsequence that contains no "junk"
   elements; these "junk" elements are ones that are uninteresting in some
   sense, such as blank lines or whitespace. (Handling junk is an
   extension to the Ratcliff and Obershelp algorithm.) The same
   idea is then applied recursively to the pieces of the sequences to the left and
   to the right of the matching subsequence. This does not yield minimal edit
   sequences, but does tend to yield matches that "look right" to people.

   **Timing:** The basic Ratcliff-Obershelp algorithm is cubic time in the worst
   case and quadratic time in the expected case. :class:`SequenceMatcher` is
   quadratic time for the worst case and has expected-case behavior dependent in a
   complicated way on how many elements the sequences have in common; best case
   time is linear.

   **Automatic junk heuristic:** :class:`SequenceMatcher` supports a heuristic that
   automatically treats certain sequence items as junk. The heuristic counts how many
   times each individual item appears in the sequence. If an item's duplicates (after
   the first one) account for more than 1% of the sequence and the sequence is at least
   200 items long, this item is marked as "popular" and is treated as junk for
   the purpose of sequence matching. This heuristic can be turned off by setting
   the ``autojunk`` argument to ``False`` when creating the :class:`SequenceMatcher`.

   .. versionadded:: 3.2
      The *autojunk* parameter.


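The effect of the automatic junk heuristic can be observed through the ``bpopular`` attribute of a :class:`SequenceMatcher`. The following is a minimal sketch; the 300-item input string is an arbitrary choice made only so that the sequence is at least 200 items long.

```python
from difflib import SequenceMatcher

# 300 items built from only two distinct characters: each character's
# duplicates far exceed 1% of the sequence, and the length is >= 200.
b = "ab" * 150

with_heuristic = SequenceMatcher(None, "", b)  # autojunk defaults to True
without_heuristic = SequenceMatcher(None, "", b, autojunk=False)

print(sorted(with_heuristic.bpopular))     # ['a', 'b']
print(sorted(without_heuristic.bpopular))  # []
```

With ``autojunk=False`` no element is marked popular, so every element of the second sequence remains available as a match anchor.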
.. class:: Differ

   This is a class for comparing sequences of lines of text, and producing
   human-readable differences or deltas. Differ uses :class:`SequenceMatcher`
   both to compare sequences of lines, and to compare sequences of characters
   within similar (near-matching) lines.

   Each line of a :class:`Differ` delta begins with a two-letter code:

   +----------+-------------------------------------------+
   | Code     | Meaning                                   |
   +==========+===========================================+
   | ``'- '`` | line unique to sequence 1                 |
   +----------+-------------------------------------------+
   | ``'+ '`` | line unique to sequence 2                 |
   +----------+-------------------------------------------+
   | ``'  '`` | line common to both sequences             |
   +----------+-------------------------------------------+
   | ``'? '`` | line not present in either input sequence |
   +----------+-------------------------------------------+

   Lines beginning with '``?``' attempt to guide the eye to intraline differences,
   and were not present in either input sequence. These lines can be confusing if
   the sequences contain tab characters.


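As a small illustration of the two-letter codes, the sketch below runs :class:`Differ` on two short lists of lines. The sample lines are made up for this example, and they are deliberately dissimilar so that no ``'? '`` guide lines are produced.

```python
from difflib import Differ

before = ["apple\n", "banana\n"]
after = ["apple\n", "cherry\n"]

# compare() returns a generator of delta lines; each starts with a code.
delta = list(Differ().compare(before, after))

for line in delta:
    print(repr(line))
# '  apple\n'
# '- banana\n'
# '+ cherry\n'
```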
.. class:: HtmlDiff

   This class can be used to create an HTML table (or a complete HTML file
   containing the table) showing a side by side, line by line comparison of text
   with inter-line and intra-line change highlights. The table can be generated in
   either full or contextual difference mode.

   The constructor for this class is:


   .. method:: __init__(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK)

      Initializes an instance of :class:`HtmlDiff`.

      *tabsize* is an optional keyword argument to specify tab stop spacing and
      defaults to ``8``.

      *wrapcolumn* is an optional keyword argument to specify the column number at
      which lines are broken and wrapped. It defaults to ``None``, in which case
      lines are not wrapped.

      *linejunk* and *charjunk* are optional keyword arguments passed into :func:`ndiff`
      (used by :class:`HtmlDiff` to generate the side by side HTML differences). See
      :func:`ndiff` documentation for argument default values and descriptions.

   The following methods are public:

   .. method:: make_file(fromlines, tolines, fromdesc='', todesc='', context=False, \
                         numlines=5, *, charset='utf-8')

      Compares *fromlines* and *tolines* (lists of strings) and returns a string which
      is a complete HTML file containing a table showing line by line differences with
      inter-line and intra-line changes highlighted.

      *fromdesc* and *todesc* are optional keyword arguments to specify from/to file
      column header strings (both default to an empty string).

      *context* and *numlines* are both optional keyword arguments. Set *context* to
      ``True`` when contextual differences are to be shown, else the default is
      ``False`` to show the full files. *numlines* defaults to ``5``. When *context*
      is ``True``, *numlines* controls the number of context lines which surround the
      difference highlights. When *context* is ``False``, *numlines* controls the
      number of lines which are shown before a difference highlight when using the
      "next" hyperlinks (setting it to zero would cause the "next" hyperlinks to place
      the next difference highlight at the top of the browser without any leading
      context).

      .. note::
         *fromdesc* and *todesc* are interpreted as unescaped HTML and should be
         properly escaped when they come from untrusted sources.

      .. versionchanged:: 3.5
         The *charset* keyword-only argument was added. The default charset of
         the HTML document changed from ``'ISO-8859-1'`` to ``'utf-8'``.

   .. method:: make_table(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5)

      Compares *fromlines* and *tolines* (lists of strings) and returns a string which
      is a complete HTML table showing line by line differences with inter-line and
      intra-line changes highlighted.

      The arguments for this method are the same as those for the :meth:`make_file`
      method.

   :file:`Tools/scripts/diff.py` is a command-line front-end to this class and
   contains a good example of its use.


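Because *fromdesc* and *todesc* are inserted as unescaped HTML, one way to handle labels from an untrusted source is to pass them through :func:`html.escape` first. The sketch below assumes hypothetical label strings and sample lines chosen only for illustration.

```python
import html
from difflib import HtmlDiff

old = ["bacon", "eggs", "ham"]
new = ["bacon", "spam", "ham"]

# Hypothetical untrusted column labels: escape them before use.
from_label = html.escape("<old> menu")
to_label = html.escape("<new> menu")

# make_table() returns only the <table> markup, not a full HTML page.
table = HtmlDiff().make_table(old, new, fromdesc=from_label, todesc=to_label)

print("&lt;old&gt; menu" in table)  # True
```

The same arguments can be passed to :meth:`make_file` when a complete HTML document is wanted instead of a table fragment.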
.. function:: context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\\n')

   Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
   generating the delta lines) in context diff format.

   Context diffs are a compact way of showing just the lines that have changed plus
   a few lines of context. The changes are shown in a before/after style. The
   number of context lines is set by *n*, which defaults to three.

   By default, the diff control lines (those with ``***`` or ``---``) are created
   with a trailing newline. This is helpful so that inputs created from
   :func:`io.IOBase.readlines` result in diffs that are suitable for use with
   :func:`io.IOBase.writelines` since both the inputs and outputs have trailing
   newlines.

   For inputs that do not have trailing newlines, set the *lineterm* argument to
   ``""`` so that the output will be uniformly newline free.

   The context diff format normally has a header for filenames and modification
   times. Any or all of these may be specified using strings for *fromfile*,
   *tofile*, *fromfiledate*, and *tofiledate*. The modification times are normally
   expressed in the ISO 8601 format. If not specified, the
   strings default to blanks.

      >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
      >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
      >>> sys.stdout.writelines(context_diff(s1, s2, fromfile='before.py', tofile='after.py'))
      *** before.py
      --- after.py
      ***************
      *** 1,4 ****
      ! bacon
      ! eggs
      ! ham
        guido
      --- 1,4 ----
      ! python
      ! eggy
      ! hamster
        guido

   See :ref:`difflib-interface` for a more detailed example.


.. function:: get_close_matches(word, possibilities, n=3, cutoff=0.6)

   Return a list of the best "good enough" matches. *word* is a sequence for which
   close matches are desired (typically a string), and *possibilities* is a list of
   sequences against which to match *word* (typically a list of strings).

   Optional argument *n* (default ``3``) is the maximum number of close matches to
   return; *n* must be greater than ``0``.

   Optional argument *cutoff* (default ``0.6``) is a float in the range [0, 1].
   Possibilities that don't score at least that similar to *word* are ignored.

   The best (no more than *n*) matches among the possibilities are returned in a
   list, sorted by similarity score, most similar first.

      >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
      ['apple', 'ape']
      >>> import keyword
      >>> get_close_matches('wheel', keyword.kwlist)
      ['while']
      >>> get_close_matches('pineapple', keyword.kwlist)
      []
      >>> get_close_matches('accept', keyword.kwlist)
      ['except']


.. function:: ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK)

   Compare *a* and *b* (lists of strings); return a :class:`Differ`\ -style
   delta (a :term:`generator` generating the delta lines).

   Optional keyword parameters *linejunk* and *charjunk* are filtering functions
   (or ``None``):

   *linejunk*: A function that accepts a single string argument, and returns
   true if the string is junk, or false if not. The default is ``None``. There
   is also a module-level function :func:`IS_LINE_JUNK`, which filters out lines
   without visible characters, except for at most one pound character (``'#'``).
   However, the underlying :class:`SequenceMatcher` class does a dynamic
   analysis of which lines are so frequent as to constitute noise, and this
   usually works better than using this function.

   *charjunk*: A function that accepts a character (a string of length 1), and
   returns true if the character is junk, or false if not. The default is the
   module-level function :func:`IS_CHARACTER_JUNK`, which filters out whitespace
   characters (a blank or tab; it's a bad idea to include newline in this!).

   :file:`Tools/scripts/ndiff.py` is a command-line front-end to this function.

      >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
      ...              'ore\ntree\nemu\n'.splitlines(keepends=True))
      >>> print(''.join(diff), end="")
      - one
      ?  ^
      + ore
      ?  ^
      - two
      - three
      ?  -
      + tree
      + emu


.. function:: restore(sequence, which)

   Return one of the two sequences that generated a delta.

   Given a *sequence* produced by :meth:`Differ.compare` or :func:`ndiff`, extract
   lines originating from file 1 or 2 (parameter *which*), stripping off line
   prefixes.

   Example:

      >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
      ...              'ore\ntree\nemu\n'.splitlines(keepends=True))
      >>> diff = list(diff) # materialize the generated delta into a list
      >>> print(''.join(restore(diff, 1)), end="")
      one
      two
      three
      >>> print(''.join(restore(diff, 2)), end="")
      ore
      tree
      emu


.. function:: unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\\n')

   Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
   generating the delta lines) in unified diff format.

   Unified diffs are a compact way of showing just the lines that have changed plus
   a few lines of context. The changes are shown in an inline style (instead of
   separate before/after blocks). The number of context lines is set by *n*, which
   defaults to three.

   By default, the diff control lines (those with ``---``, ``+++``, or ``@@``) are
   created with a trailing newline. This is helpful so that inputs created from
   :func:`io.IOBase.readlines` result in diffs that are suitable for use with
   :func:`io.IOBase.writelines` since both the inputs and outputs have trailing
   newlines.

   For inputs that do not have trailing newlines, set the *lineterm* argument to
   ``""`` so that the output will be uniformly newline free.

   The unified diff format normally has a header for filenames and modification
   times. Any or all of these may be specified using strings for *fromfile*,
   *tofile*, *fromfiledate*, and *tofiledate*. The modification times are normally
   expressed in the ISO 8601 format. If not specified, the
   strings default to blanks.

      >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
      >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
      >>> sys.stdout.writelines(unified_diff(s1, s2, fromfile='before.py', tofile='after.py'))
      --- before.py
      +++ after.py
      @@ -1,4 +1,4 @@
      -bacon
      -eggs
      -ham
      +python
      +eggy
      +hamster
       guido

   See :ref:`difflib-interface` for a more detailed example.

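The *lineterm* argument matters when the input lines carry no trailing newlines. The following sketch, using made-up input lists, passes ``lineterm=""`` so that the control lines are also newline free and the result can be joined with ``"\n"``.

```python
from difflib import unified_diff

# Lines without trailing newlines, e.g. produced by str.splitlines().
before = ["bacon", "eggs"]
after = ["bacon", "spam"]

delta = list(unified_diff(before, after,
                          fromfile="before", tofile="after", lineterm=""))

print("\n".join(delta))
# --- before
# +++ after
# @@ -1,2 +1,2 @@
#  bacon
# -eggs
# +spam
```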
.. function:: diff_bytes(dfunc, a, b, fromfile=b'', tofile=b'', fromfiledate=b'', tofiledate=b'', n=3, lineterm=b'\\n')

   Compare *a* and *b* (lists of bytes objects) using *dfunc*; yield a
   sequence of delta lines (also bytes) in the format returned by *dfunc*.
   *dfunc* must be a callable, typically either :func:`unified_diff` or
   :func:`context_diff`.

   Allows you to compare data with unknown or inconsistent encoding. All
   inputs except *n* must be bytes objects, not str. Works by losslessly
   converting all inputs (except *n*) to str, and calling
   ``dfunc(a, b, fromfile, tofile, fromfiledate, tofiledate, n, lineterm)``.
   The output of *dfunc* is then converted back to bytes, so the delta lines that
   you receive have the same unknown/inconsistent encodings as *a* and *b*.

   .. versionadded:: 3.5

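A short sketch of :func:`diff_bytes` follows. The input lines are invented for illustration and deliberately contain a byte (``0xE9``) that is not valid UTF-8, to show that the delta lines come back as the same raw bytes.

```python
from difflib import diff_bytes, unified_diff

# Byte strings in an unknown encoding (0xE9 is not valid UTF-8 on its own).
old = [b"caf\xe9\n", b"tea\n"]
new = [b"caf\xe9\n", b"milk\n"]

# File names must also be bytes; n keeps its default of 3.
delta = list(diff_bytes(unified_diff, old, new, b"old", b"new"))

for line in delta:
    print(line)
# b'--- old\n'
# b'+++ new\n'
# b'@@ -1,2 +1,2 @@\n'
# b' caf\xe9\n'
# b'-tea\n'
# b'+milk\n'
```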
.. function:: IS_LINE_JUNK(line)

   Return ``True`` for ignorable lines. The line *line* is ignorable if it is
   blank or contains a single ``'#'``; otherwise it is not ignorable. Used as a
   default for parameter *linejunk* in :func:`ndiff` in older versions.


.. function:: IS_CHARACTER_JUNK(ch)

   Return ``True`` for ignorable characters. The character *ch* is ignorable if
   it is a space or tab; otherwise it is not ignorable. Used as a default for
   parameter *charjunk* in :func:`ndiff`.


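The two predicates can be tried directly. The sample inputs below are chosen only to cover the cases described above.

```python
from difflib import IS_CHARACTER_JUNK, IS_LINE_JUNK

# Blank lines and lines holding a single '#' are ignorable.
line_results = [bool(IS_LINE_JUNK(s)) for s in ("\n", "  #   \n", "hello\n")]
print(line_results)  # [True, True, False]

# Only a space or a tab counts as an ignorable character.
char_results = [bool(IS_CHARACTER_JUNK(c)) for c in (" ", "\t", "\n", "x")]
print(char_results)  # [True, True, False, False]
```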
.. seealso::

   `Pattern Matching: The Gestalt Approach <http://www.drdobbs.com/database/pattern-matching-the-gestalt-approach/184407970>`_
      Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener. This
      was published in `Dr. Dobb's Journal <http://www.drdobbs.com/>`_ in July, 1988.


.. _sequence-matcher:

SequenceMatcher Objects
-----------------------

The :class:`SequenceMatcher` class has this constructor:


.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)

   Optional argument *isjunk* must be ``None`` (the default) or a one-argument
   function that takes a sequence element and returns true if and only if the
   element is "junk" and should be ignored. Passing ``None`` for *isjunk* is
   equivalent to passing ``lambda x: False``; in other words, no elements are ignored.
   For example, pass::

      lambda x: x in " \t"

   if you're comparing lines as sequences of characters, and don't want to
   synchronize on blanks or hard tabs.

   The optional arguments *a* and *b* are sequences to be compared; both default to
   empty strings. The elements of both sequences must be :term:`hashable`.

   The optional argument *autojunk* can be used to disable the automatic junk
   heuristic.

   .. versionadded:: 3.2
      The *autojunk* parameter.

   SequenceMatcher objects get three data attributes: *bjunk* is the
   set of elements of *b* for which *isjunk* is ``True``; *bpopular* is the set of
   non-junk elements considered popular by the heuristic (if it is not
   disabled); *b2j* is a dict mapping the remaining elements of *b* to a list
   of positions where they occur. All three are reset whenever *b* is reset
   with :meth:`set_seqs` or :meth:`set_seq2`.

   .. versionadded:: 3.2
      The *bjunk* and *bpopular* attributes.

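The three data attributes can be inspected after construction. In this sketch the junk predicate is the blank-or-tab lambda from the text, and the two short strings are arbitrary sample inputs.

```python
from difflib import SequenceMatcher

# Treat blanks and hard tabs as junk.
s = SequenceMatcher(lambda ch: ch in " \t", "a b", "a b\tc")

print(s.bjunk == {" ", "\t"})  # True: junk elements found in b
print(s.bpopular)              # set(): b is far shorter than 200 items
print(s.b2j)                   # {'a': [0], 'b': [2], 'c': [4]}
```

Junk elements are removed from ``b2j``, which is why positions 1 and 3 of the second sequence do not appear in it.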
   :class:`SequenceMatcher` objects have the following methods:

   .. method:: set_seqs(a, b)

      Set the two sequences to be compared.

   :class:`SequenceMatcher` computes and caches detailed information about the
   second sequence, so if you want to compare one sequence against many
   sequences, use :meth:`set_seq2` to set the commonly used sequence once and
   call :meth:`set_seq1` repeatedly, once for each of the other sequences.


   .. method:: set_seq1(a)

      Set the first sequence to be compared. The second sequence to be compared
      is not changed.


   .. method:: set_seq2(b)

      Set the second sequence to be compared. The first sequence to be compared
      is not changed.


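The one-against-many pattern described above might look like the following sketch. The word list is sample data, and scoring with the :meth:`ratio` method is an assumption of this example rather than something the pattern requires.

```python
from difflib import SequenceMatcher

target = "apple"
candidates = ["ape", "apple", "peach", "puppy"]

matcher = SequenceMatcher()
matcher.set_seq2(target)  # analysed once; the cached data is reused below

scores = {}
for candidate in candidates:
    matcher.set_seq1(candidate)  # only the first sequence changes
    scores[candidate] = matcher.ratio()

best = max(scores, key=scores.get)
print(best)  # apple
```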
   .. method:: find_longest_match(alo=0, ahi=None, blo=0, bhi=None)

      Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.

      If *isjunk* was omitted or ``None``, :meth:`find_longest_match` returns
      ``(i, j, k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where
      ``alo <= i <= i+k <= ahi`` and ``blo <= j <= j+k <= bhi``. For all
      ``(i', j', k')`` meeting those conditions, the additional conditions
      ``k >= k'``, ``i <= i'``, and if ``i == i'``, ``j <= j'`` are also met. In
      other words, of all maximal matching blocks, return one that starts earliest
      in *a*, and of all those maximal matching blocks that start earliest in *a*,
      return the one that starts earliest in *b*.

         >>> s = SequenceMatcher(None, " abcd", "abcd abcd")
         >>> s.find_longest_match(0, 5, 0, 9)
         Match(a=0, b=4, size=5)

      If *isjunk* was provided, first the longest matching block is determined
      as above, but with the additional restriction that no junk element appears
      in the block. Then that block is extended as far as possible by matching
      (only) junk elements on both sides. So the resulting block never matches
      on junk except as identical junk happens to be adjacent to an interesting
      match.

      Here's the same example as before, but considering blanks to be junk. That
      prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the
      second sequence directly. Instead only the ``'abcd'`` can match, and
      matches the leftmost ``'abcd'`` in the second sequence:

         >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
         >>> s.find_longest_match(0, 5, 0, 9)
         Match(a=1, b=0, size=4)

      If no blocks match, this returns ``(alo, blo, 0)``.

      This method returns a :term:`named tuple` ``Match(a, b, size)``.

      .. versionchanged:: 3.9
         Added default arguments.


Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	466	.. method:: get_matching_blocks()
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	467
Terry Jan Reedy	d9bff4e	2018-10-26 23:03:08 -0400	[diff] [blame]	468	Return list of triples describing non-overlapping matching subsequences.
				469	Each triple is of the form ``(i, j, n)``,
				470	and means that ``a[i:i+n] == b[j:j+n]``. The
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	471	triples are monotonically increasing in i and j.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	472
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	473	The last triple is a dummy, and has the value ``(len(a), len(b), 0)``. It
				474	is the only triple with ``n == 0``. If ``(i, j, n)`` and ``(i', j', n')``
				475	are adjacent triples in the list, and the second is not the last triple in
Terry Jan Reedy	d9bff4e	2018-10-26 23:03:08 -0400	[diff] [blame]	476	the list, then ``i+n < i'`` or ``j+n < j'``; in other words, adjacent
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	477	triples always describe non-adjacent equal blocks.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	478
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	479	.. XXX Explain why a dummy is used!
Christian Heimes	5b5e81c	2007-12-31 16:14:33 +0000	[diff] [blame]	480
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	481	.. doctest::
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	482
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	483	>>> s = SequenceMatcher(None, "abxcd", "abcd")
				484	>>> s.get_matching_blocks()
				485	[Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]
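One practical consequence of the trailing dummy triple is that a single loop over the matching blocks also visits the unmatched tail of both sequences. The sketch below illustrates this; ``unmatched_ranges`` is a hypothetical helper written for this example, not part of :mod:`difflib`.

```python
from difflib import SequenceMatcher


def unmatched_ranges(a, b):
    """Yield (i1, i2, j1, j2) for each stretch where a and b do not match.

    Hypothetical helper, for illustration only.
    """
    i = j = 0
    for ai, bj, size in SequenceMatcher(None, a, b).get_matching_blocks():
        # Whatever lies between the end of the previous block and the start
        # of this one is unmatched.  The final dummy block
        # (len(a), len(b), 0) makes the same test cover the tail.
        if i < ai or j < bj:
            yield (i, ai, j, bj)
        i, j = ai + size, bj + size


print(list(unmatched_ranges("abxcd", "abcd")))   # [(2, 3, 2, 2)]
print(list(unmatched_ranges("abcd", "abcdef")))  # [(4, 4, 4, 6)]
```

Without the dummy, the second call would need a separate check after the loop to notice the trailing ``"ef"``.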
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	486
				487
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	488	.. method:: get_opcodes()
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	489
Return a list of 5-tuples describing how to turn *a* into *b*. Each tuple is
of the form ``(tag, i1, i2, j1, j2)``. The first tuple has ``i1 == j1 ==
0``, and each remaining tuple has *i1* equal to the *i2* of the preceding
tuple and, likewise, *j1* equal to the preceding *j2*.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	494
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	495	The tag values are strings, with these meanings:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	496
+---------------+---------------------------------------------+
| Value         | Meaning                                     |
+===============+=============================================+
| ``'replace'`` | ``a[i1:i2]`` should be replaced by          |
|               | ``b[j1:j2]``.                               |
+---------------+---------------------------------------------+
| ``'delete'``  | ``a[i1:i2]`` should be deleted. Note that   |
|               | ``j1 == j2`` in this case.                  |
+---------------+---------------------------------------------+
| ``'insert'``  | ``b[j1:j2]`` should be inserted at          |
|               | ``a[i1:i1]``. Note that ``i1 == i2`` in     |
|               | this case.                                  |
+---------------+---------------------------------------------+
| ``'equal'``   | ``a[i1:i2] == b[j1:j2]`` (the sub-sequences |
|               | are equal).                                 |
+---------------+---------------------------------------------+
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	513
For example::

   >>> a = "qabxcd"
   >>> b = "abycdf"
   >>> s = SequenceMatcher(None, a, b)
   >>> for tag, i1, i2, j1, j2 in s.get_opcodes():
   ...     print('{:7}   a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(
   ...         tag, i1, i2, j1, j2, a[i1:i2], b[j1:j2]))
   delete    a[0:1] --> b[0:0]      'q' --> ''
   equal     a[1:3] --> b[0:2]     'ab' --> 'ab'
   replace   a[3:4] --> b[2:3]      'x' --> 'y'
   equal     a[4:6] --> b[3:5]     'cd' --> 'cd'
   insert    a[6:6] --> b[5:6]       '' --> 'f'
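To make the meaning of the tags concrete, the opcodes can be replayed to rebuild *b* from *a*. This is a minimal sketch; ``apply_opcodes`` is a hypothetical helper name, not a :mod:`difflib` function.

```python
from difflib import SequenceMatcher


def apply_opcodes(a, b):
    """Rebuild the string b from the string a by replaying the opcodes.

    Hypothetical helper, for illustration only.
    """
    out = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag == 'equal':
            out.append(a[i1:i2])   # unchanged slice, copied from a
        elif tag in ('replace', 'insert'):
            out.append(b[j1:j2])   # new material, taken from b
        # 'delete' contributes nothing to the result
    return ''.join(out)


print(apply_opcodes("qabxcd", "abycdf"))  # abycdf
```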
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	527
				528
Georg Brandl	c2a4f4f	2009-04-10 09:03:43 +0000	[diff] [blame]	529	.. method:: get_grouped_opcodes(n=3)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	530
Return a :term:`generator` of groups with up to *n* lines of context.

Starting with the opcodes returned by :meth:`get_opcodes`, this method
splits out smaller change clusters and eliminates intervening ranges which
have no changes.

Each group is a list of opcodes in the same format as :meth:`get_opcodes`.
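As a small illustration, assume two 40-element lists that differ in two widely separated places. With ``n=3`` the two changes should end up in separate groups, each padded with at most three elements of context on either side. The input data here is made up for the example.

```python
from difflib import SequenceMatcher

# Made-up input: 40 unique items, with changes at positions 5 and 30.
a = [str(i) for i in range(40)]
b = list(a)
b[5] = 'x'
b[30] = 'y'

groups = list(SequenceMatcher(None, a, b).get_grouped_opcodes(n=3))
for group in groups:
    first, last = group[0], group[-1]
    # The span covered by a group runs from its first opcode to its last.
    print('a[{}:{}] -> b[{}:{}]'.format(first[1], last[2], first[3], last[4]))
    for tag, i1, i2, j1, j2 in group:
        print('  {:7} a[{}:{}] b[{}:{}]'.format(tag, i1, i2, j1, j2))
```

The long unchanged run between the two edits (``a[9:27]``) is absent from the output, which is what makes the groups suitable for building context-style and unified-style hunks.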
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	538
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	539
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	540	.. method:: ratio()
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	541
Return a measure of the sequences' similarity as a float in the range
[0, 1].

Where T is the total number of elements in both sequences, and M is the
number of matches, this is ``2.0*M / T``. Note that this is ``1.0`` if the
sequences are identical, and ``0.0`` if they have nothing in common.
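The formula can be checked by hand from the matching blocks. In this sketch M is taken to be the sum of the block sizes and T the combined length of the two inputs.

```python
from difflib import SequenceMatcher

a, b = "abcd", "bcde"
s = SequenceMatcher(None, a, b)

matches = sum(block.size for block in s.get_matching_blocks())  # M
total = len(a) + len(b)                                         # T
manual = 2.0 * matches / total

print(matches, total, manual)  # 3 8 0.75
print(s.ratio())               # 0.75
```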
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	548
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	549	This is expensive to compute if :meth:`get_matching_blocks` or
				550	:meth:`get_opcodes` hasn't already been called, in which case you may want
				551	to try :meth:`quick_ratio` or :meth:`real_quick_ratio` first to get an
				552	upper bound.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	553
sweeneyde	e9cbcd0	2019-08-07 00:37:08 -0400	[diff] [blame]	554	.. note::
				555
				556	Caution: The result of a :meth:`ratio` call may depend on the order of
				557	the arguments. For instance::
				558
				559	>>> SequenceMatcher(None, 'tide', 'diet').ratio()
				560	0.25
				561	>>> SequenceMatcher(None, 'diet', 'tide').ratio()
				562	0.5
				563
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	564
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	565	.. method:: quick_ratio()
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	566
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	567	Return an upper bound on :meth:`ratio` relatively quickly.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	568
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	569
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	570	.. method:: real_quick_ratio()
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	571
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	572	Return an upper bound on :meth:`ratio` very quickly.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	573
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	574
				575	The three methods that return the ratio of matching to total characters can give
				576	different results due to differing levels of approximation, although
				577	:meth:`quick_ratio` and :meth:`real_quick_ratio` are always at least as large as
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	578	:meth:`ratio`:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	579
				580	>>> s = SequenceMatcher(None, "abcd", "bcde")
				581	>>> s.ratio()
				582	0.75
				583	>>> s.quick_ratio()
				584	0.75
				585	>>> s.real_quick_ratio()
				586	1.0
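Because both quick methods return upper bounds, they can serve as cheap filters ahead of the expensive :meth:`ratio` call: if an upper bound is already below the cutoff, :meth:`ratio` cannot reach it. The following is a sketch of that idea; ``is_close_match`` and its default ``cutoff`` of 0.6 are assumptions made for this example.

```python
from difflib import SequenceMatcher


def is_close_match(a, b, cutoff=0.6):
    """Return True if ratio() >= cutoff, trying the cheap bounds first.

    Hypothetical helper, for illustration only.
    """
    s = SequenceMatcher(None, a, b)
    # "and" short-circuits, so ratio() only runs when both
    # upper bounds leave room for a match.
    return (s.real_quick_ratio() >= cutoff
            and s.quick_ratio() >= cutoff
            and s.ratio() >= cutoff)


print(is_close_match("abcd", "bcde"))       # True
print(is_close_match("abcd", "wxyz"))       # False
print(is_close_match("a", "abcdefghij"))    # False
```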
				587
				588
				589	.. _sequencematcher-examples:
				590
				591	SequenceMatcher Examples
				592	------------------------
				593
Terry Reedy	74a7c67	2010-12-03 18:57:42 +0000	[diff] [blame]	594	This example compares two strings, considering blanks to be "junk":
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	595
				596	>>> s = SequenceMatcher(lambda x: x == " ",
				597	... "private Thread currentThread;",
				598	... "private volatile Thread currentThread;")
				599
				600	:meth:`ratio` returns a float in [0, 1], measuring the similarity of the
				601	sequences. As a rule of thumb, a :meth:`ratio` value over 0.6 means the
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	602	sequences are close matches:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	603
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame]	604	>>> print(round(s.ratio(), 3))
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	605	0.866
				606
				607	If you're only interested in where the sequences match,
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	608	:meth:`get_matching_blocks` is handy:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	609
>>> for block in s.get_matching_blocks():
...     print("a[%d] and b[%d] match for %d elements" % block)
a[0] and b[0] match for 8 elements
a[8] and b[17] match for 21 elements
a[29] and b[38] match for 0 elements
				615
				616	Note that the last tuple returned by :meth:`get_matching_blocks` is always a
				617	dummy, ``(len(a), len(b), 0)``, and this is the only case in which the last
				618	tuple element (number of elements matched) is ``0``.
				619
				620	If you want to know how to change the first sequence into the second, use
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	621	:meth:`get_opcodes`:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	622
>>> for opcode in s.get_opcodes():
...     print("%6s a[%d:%d] b[%d:%d]" % opcode)
 equal a[0:8] b[0:8]
insert a[8:8] b[8:17]
 equal a[8:29] b[17:38]
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	628
Raymond Hettinger	58c8c26	2009-04-27 21:01:21 +0000	[diff] [blame]	629	.. seealso::
				630
* The :func:`get_close_matches` function in this module, which shows how
  simple code building on :class:`SequenceMatcher` can be used to do useful
  work.
				634
				635	* `Simple version control recipe
Serhiy Storchaka	6dff020	2016-05-07 10:49:07 +0300	[diff] [blame]	636	<https://code.activestate.com/recipes/576729/>`_ for a small application
Raymond Hettinger	58c8c26	2009-04-27 21:01:21 +0000	[diff] [blame]	637	built with :class:`SequenceMatcher`.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	638
				639
				640	.. _differ-objects:
				641
				642	Differ Objects
				643	--------------
				644
Note that :class:`Differ`\ -generated deltas make no claim to be minimal
diffs. To the contrary, minimal diffs are often counter-intuitive, because they
synch up anywhere possible, sometimes at accidental matches 100 pages apart.
Restricting synch points to contiguous matches preserves some notion of
locality, at the occasional cost of producing a longer diff.
				650
				651	The :class:`Differ` class has this constructor:
				652
				653
Georg Brandl	c2a4f4f	2009-04-10 09:03:43 +0000	[diff] [blame]	654	.. class:: Differ(linejunk=None, charjunk=None)
Victor Stinner	8f88190	2020-08-19 19:25:22 +0200	[diff] [blame]	655	:noindex:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	656
Optional keyword parameters *linejunk* and *charjunk* are for filter functions
(or ``None``):

*linejunk*: A function that accepts a single string argument, and returns true
if the string is junk. The default is ``None``, meaning that no line is
considered junk.

*charjunk*: A function that accepts a single character argument (a string of
length 1), and returns true if the character is junk. The default is ``None``,
meaning that no character is considered junk.
				667
These junk-filtering functions speed up matching to find
differences and do not cause any differing lines or characters to
be ignored. For an explanation, read how *isjunk* is treated in the
description of the :meth:`~SequenceMatcher.find_longest_match` method.
				673
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	674	:class:`Differ` objects are used (deltas generated) via a single method:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	675
				676
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	677	.. method:: Differ.compare(a, b)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	678
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	679	Compare two sequences of lines, and generate the delta (a sequence of lines).
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	680
Serhiy Storchaka	bfdcd43	2013-10-13 23:09:14 +0300	[diff] [blame]	681	Each sequence must contain individual single-line strings ending with
				682	newlines. Such sequences can be obtained from the
				683	:meth:`~io.IOBase.readlines` method of file-like objects. The delta
				684	generated also consists of newline-terminated strings, ready to be
				685	printed as-is via the :meth:`~io.IOBase.writelines` method of a
				686	file-like object.
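Each line of the delta starts with a two-character code, so the delta can be filtered with ordinary string tests. A minimal sketch, using made-up input lines, that keeps only the lines unique to one side:

```python
from difflib import Differ

# Made-up input; every line ends with a newline, as compare() expects.
old = ['one\n', 'two\n', 'three\n']
new = ['one\n', 'three\n', 'four\n']

delta = list(Differ().compare(old, new))

# '- ' marks lines only in the first sequence, '+ ' lines only in the second.
changed = [line for line in delta if line.startswith(('- ', '+ '))]

print(delta)    # ['  one\n', '- two\n', '  three\n', '+ four\n']
print(changed)  # ['- two\n', '+ four\n']
```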
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	687
				688
				689	.. _differ-examples:
				690
				691	Differ Example
				692	--------------
				693
This example compares two texts. First we set up the texts, sequences of
individual single-line strings ending with newlines (such sequences can also be
obtained from the :meth:`~io.IOBase.readlines` method of file-like objects):
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	697
>>> text1 = '''  1. Beautiful is better than ugly.
...   2. Explicit is better than implicit.
...   3. Simple is better than complex.
...   4. Complex is better than complicated.
... '''.splitlines(keepends=True)
>>> len(text1)
4
>>> text1[0][-1]
'\n'
>>> text2 = '''  1. Beautiful is better than ugly.
...   3.   Simple is better than complex.
...   4. Complicated is better than complex.
...   5. Flat is better than nested.
... '''.splitlines(keepends=True)

Next we instantiate a Differ object:

>>> d = Differ()

Note that when instantiating a :class:`Differ` object we may pass functions to
filter out line and character "junk". See the :class:`Differ` constructor for
details.

Finally, we compare the two:

>>> result = list(d.compare(text1, text2))

``result`` is a list of strings, so let's pretty-print it:

>>> from pprint import pprint
>>> pprint(result)
['    1. Beautiful is better than ugly.\n',
 '-   2. Explicit is better than implicit.\n',
 '-   3. Simple is better than complex.\n',
 '+   3.   Simple is better than complex.\n',
 '?     ++\n',
 '-   4. Complex is better than complicated.\n',
 '?            ^                     ---- ^\n',
 '+   4. Complicated is better than complex.\n',
 '?           ++++ ^                      ^\n',
 '+   5. Flat is better than nested.\n']

As a single multi-line string it looks like this:

>>> import sys
>>> sys.stdout.writelines(result)
    1. Beautiful is better than ugly.
-   2. Explicit is better than implicit.
-   3. Simple is better than complex.
+   3.   Simple is better than complex.
?     ++
-   4. Complex is better than complicated.
?            ^                     ---- ^
+   4. Complicated is better than complex.
?           ++++ ^                      ^
+   5. Flat is better than nested.
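A :class:`Differ` delta keeps every line of both inputs, so either input can be recovered from it. The sketch below uses the module's :func:`restore` function on a shortened version of the texts above; the two-line inputs are chosen only to keep the example small.

```python
from difflib import Differ, restore

text1 = ['  1. Beautiful is better than ugly.\n',
         '  2. Explicit is better than implicit.\n']
text2 = ['  1. Beautiful is better than ugly.\n',
         '  3.   Simple is better than complex.\n']

result = list(Differ().compare(text1, text2))

# restore(delta, 1) yields the lines of the first sequence,
# restore(delta, 2) those of the second.
recovered1 = list(restore(result, 1))
recovered2 = list(restore(result, 2))

print(recovered1 == text1, recovered2 == text2)  # True True
```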
				754
Christian Heimes	8640e74	2008-02-23 16:23:06 +0000	[diff] [blame]	755
				756	.. _difflib-interface:
				757
				758	A command-line interface to difflib
				759	-----------------------------------
				760
				761	This example shows how to use difflib to create a ``diff``-like utility.
				762	It is also contained in the Python source distribution, as
				763	:file:`Tools/scripts/diff.py`.
				764
Berker Peksag	707deb9	2015-07-30 00:03:48 +0300	[diff] [blame]	765	.. literalinclude:: ../../Tools/scripts/diff.py