Blame - Doc/library/difflib.rst - platform/external/python/cpython3

blob: 38245e776ac3910bb1d07ab3435a7957f63d45e9 [file] [log] [blame]

Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1	:mod:`difflib` --- Helpers for computing deltas
				2	===============================================
				3
				4	.. module:: difflib
				5	:synopsis: Helpers for computing differences between objects.
				6	.. moduleauthor:: Tim Peters <tim_one@users.sourceforge.net>
				7	.. sectionauthor:: Tim Peters <tim_one@users.sourceforge.net>
Christian Heimes	5b5e81c	2007-12-31 16:14:33 +0000	[diff] [blame]	8	.. Markup by Fred L. Drake, Jr. <fdrake@acm.org>
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	9
Andrew Kuchling	2e3743c	2014-03-19 16:23:01 -0400	[diff] [blame]	10	Source code: :source:`Lib/difflib.py`
				11
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	12	.. testsetup::
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	13
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	14	import sys
				15	from difflib import *
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	16
Georg Brandl	9afde1c	2007-11-01 20:32:30 +0000	[diff] [blame]	17	This module provides classes and functions for comparing sequences. It
				18	can be used for example, for comparing files, and can produce difference
				19	information in various formats, including HTML and context and unified
				20	diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
				21
Terry Reedy	99f9637	2010-11-25 06:12:34 +0000	[diff] [blame]	22
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	23	.. class:: SequenceMatcher
				24
				25	This is a flexible class for comparing pairs of sequences of any type, so long
Guido van Rossum	2cc30da	2007-11-02 23:46:40 +0000	[diff] [blame]	26	as the sequence elements are :term:`hashable`. The basic algorithm predates, and is a
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	27	little fancier than, an algorithm published in the late 1980's by Ratcliff and
				28	Obershelp under the hyperbolic name "gestalt pattern matching." The idea is to
				29	find the longest contiguous matching subsequence that contains no "junk"
Andrew Kuchling	c51da2b	2014-03-19 16:43:06 -0400	[diff] [blame]	30	elements; these "junk" elements are ones that are uninteresting in some
				31	sense, such as blank lines or whitespace. (Handling junk is an
				32	extension to the Ratcliff and Obershelp algorithm.) The same
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	33	idea is then applied recursively to the pieces of the sequences to the left and
				34	to the right of the matching subsequence. This does not yield minimal edit
				35	sequences, but does tend to yield matches that "look right" to people.
				36
				37	Timing: The basic Ratcliff-Obershelp algorithm is cubic time in the worst
				38	case and quadratic time in the expected case. :class:`SequenceMatcher` is
				39	quadratic time for the worst case and has expected-case behavior dependent in a
				40	complicated way on how many elements the sequences have in common; best case
				41	time is linear.
				42
Terry Reedy	99f9637	2010-11-25 06:12:34 +0000	[diff] [blame]	43	Automatic junk heuristic: :class:`SequenceMatcher` supports a heuristic that
				44	automatically treats certain sequence items as junk. The heuristic counts how many
				45	times each individual item appears in the sequence. If an item's duplicates (after
				46	the first one) account for more than 1% of the sequence and the sequence is at least
				47	200 items long, this item is marked as "popular" and is treated as junk for
				48	the purpose of sequence matching. This heuristic can be turned off by setting
				49	the ``autojunk`` argument to ``False`` when creating the :class:`SequenceMatcher`.
				50
Terry Reedy	dc9b17d	2010-11-27 20:52:14 +0000	[diff] [blame]	51	.. versionadded:: 3.2
				52	The autojunk parameter.
				53
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	54
				55	.. class:: Differ
				56
				57	This is a class for comparing sequences of lines of text, and producing
				58	human-readable differences or deltas. Differ uses :class:`SequenceMatcher`
				59	both to compare sequences of lines, and to compare sequences of characters
				60	within similar (near-matching) lines.
				61
				62	Each line of a :class:`Differ` delta begins with a two-letter code:
				63
				64	+----------+-------------------------------------------+
				65	\| Code \| Meaning \|
				66	+==========+===========================================+
				67	\| ``'- '`` \| line unique to sequence 1 \|
				68	+----------+-------------------------------------------+
				69	\| ``'+ '`` \| line unique to sequence 2 \|
				70	+----------+-------------------------------------------+
				71	\| ``' '`` \| line common to both sequences \|
				72	+----------+-------------------------------------------+
				73	\| ``'? '`` \| line not present in either input sequence \|
				74	+----------+-------------------------------------------+
				75
				76	Lines beginning with '``?``' attempt to guide the eye to intraline differences,
				77	and were not present in either input sequence. These lines can be confusing if
				78	the sequences contain tab characters.
				79
				80
				81	.. class:: HtmlDiff
				82
				83	This class can be used to create an HTML table (or a complete HTML file
				84	containing the table) showing a side by side, line by line comparison of text
				85	with inter-line and intra-line change highlights. The table can be generated in
				86	either full or contextual difference mode.
				87
				88	The constructor for this class is:
				89
				90
Georg Brandl	c2a4f4f	2009-04-10 09:03:43 +0000	[diff] [blame]	91	.. method:: __init__(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	92
				93	Initializes instance of :class:`HtmlDiff`.
				94
				95	tabsize is an optional keyword argument to specify tab stop spacing and
				96	defaults to ``8``.
				97
				98	wrapcolumn is an optional keyword to specify column number where lines are
				99	broken and wrapped, defaults to ``None`` where lines are not wrapped.
				100
				101	linejunk and charjunk are optional keyword arguments passed into ``ndiff()``
				102	(used by :class:`HtmlDiff` to generate the side by side HTML differences). See
				103	``ndiff()`` documentation for argument default values and descriptions.
				104
				105	The following methods are public:
				106
Berker Peksag	102029d	2015-03-15 01:18:47 +0200	[diff] [blame]	107	.. method:: make_file(fromlines, tolines, fromdesc='', todesc='', context=False, \
				108	numlines=5, *, charset='utf-8')
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	109
				110	Compares fromlines and tolines (lists of strings) and returns a string which
				111	is a complete HTML file containing a table showing line by line differences with
				112	inter-line and intra-line changes highlighted.
				113
				114	fromdesc and todesc are optional keyword arguments to specify from/to file
				115	column header strings (both default to an empty string).
				116
				117	context and numlines are both optional keyword arguments. Set context to
				118	``True`` when contextual differences are to be shown, else the default is
				119	``False`` to show the full files. numlines defaults to ``5``. When context
				120	is ``True`` numlines controls the number of context lines which surround the
				121	difference highlights. When context is ``False`` numlines controls the
				122	number of lines which are shown before a difference highlight when using the
				123	"next" hyperlinks (setting to zero would cause the "next" hyperlinks to place
				124	the next difference highlight at the top of the browser without any leading
				125	context).
				126
Berker Peksag	102029d	2015-03-15 01:18:47 +0200	[diff] [blame]	127	.. versionchanged:: 3.5
				128	charset keyword-only argument was added. The default charset of
				129	HTML document changed from ``'ISO-8859-1'`` to ``'utf-8'``.
				130
Georg Brandl	c2a4f4f	2009-04-10 09:03:43 +0000	[diff] [blame]	131	.. method:: make_table(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	132
				133	Compares fromlines and tolines (lists of strings) and returns a string which
				134	is a complete HTML table showing line by line differences with inter-line and
				135	intra-line changes highlighted.
				136
				137	The arguments for this method are the same as those for the :meth:`make_file`
				138	method.
				139
				140	:file:`Tools/scripts/diff.py` is a command-line front-end to this class and
				141	contains a good example of its use.
				142
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	143
Georg Brandl	c2a4f4f	2009-04-10 09:03:43 +0000	[diff] [blame]	144	.. function:: context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\\n')
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	145
Georg Brandl	9afde1c	2007-11-01 20:32:30 +0000	[diff] [blame]	146	Compare a and b (lists of strings); return a delta (a :term:`generator`
				147	generating the delta lines) in context diff format.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	148
				149	Context diffs are a compact way of showing just the lines that have changed plus
				150	a few lines of context. The changes are shown in a before/after style. The
				151	number of context lines is set by n which defaults to three.
				152
				153	By default, the diff control lines (those with ``***`` or ``---``) are created
				154	with a trailing newline. This is helpful so that inputs created from
Serhiy Storchaka	bfdcd43	2013-10-13 23:09:14 +0300	[diff] [blame]	155	:func:`io.IOBase.readlines` result in diffs that are suitable for use with
				156	:func:`io.IOBase.writelines` since both the inputs and outputs have trailing
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	157	newlines.
				158
				159	For inputs that do not have trailing newlines, set the lineterm argument to
				160	``""`` so that the output will be uniformly newline free.
				161
				162	The context diff format normally has a header for filenames and modification
				163	times. Any or all of these may be specified using strings for fromfile,
R. David Murray	b2416e5	2010-04-12 16:58:02 +0000	[diff] [blame]	164	tofile, fromfiledate, and tofiledate. The modification times are normally
				165	expressed in the ISO 8601 format. If not specified, the
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	166	strings default to blanks.
				167
Christian Heimes	8640e74	2008-02-23 16:23:06 +0000	[diff] [blame]	168	>>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
				169	>>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
				170	>>> for line in context_diff(s1, s2, fromfile='before.py', tofile='after.py'):
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	171	... sys.stdout.write(line) # doctest: +NORMALIZE_WHITESPACE
Christian Heimes	8640e74	2008-02-23 16:23:06 +0000	[diff] [blame]	172	*** before.py
				173	--- after.py
				174	***************
				175	* 1,4 **
				176	! bacon
				177	! eggs
				178	! ham
				179	guido
				180	--- 1,4 ----
				181	! python
				182	! eggy
				183	! hamster
				184	guido
				185
				186	See :ref:`difflib-interface` for a more detailed example.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	187
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	188
Georg Brandl	c2a4f4f	2009-04-10 09:03:43 +0000	[diff] [blame]	189	.. function:: get_close_matches(word, possibilities, n=3, cutoff=0.6)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	190
				191	Return a list of the best "good enough" matches. word is a sequence for which
				192	close matches are desired (typically a string), and possibilities is a list of
				193	sequences against which to match word (typically a list of strings).
				194
				195	Optional argument n (default ``3``) is the maximum number of close matches to
				196	return; n must be greater than ``0``.
				197
				198	Optional argument cutoff (default ``0.6``) is a float in the range [0, 1].
				199	Possibilities that don't score at least that similar to word are ignored.
				200
				201	The best (no more than n) matches among the possibilities are returned in a
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	202	list, sorted by similarity score, most similar first.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	203
				204	>>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
				205	['apple', 'ape']
				206	>>> import keyword
				207	>>> get_close_matches('wheel', keyword.kwlist)
				208	['while']
				209	>>> get_close_matches('apple', keyword.kwlist)
				210	[]
				211	>>> get_close_matches('accept', keyword.kwlist)
				212	['except']
				213
				214
Georg Brandl	c2a4f4f	2009-04-10 09:03:43 +0000	[diff] [blame]	215	.. function:: ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	216
Georg Brandl	9afde1c	2007-11-01 20:32:30 +0000	[diff] [blame]	217	Compare a and b (lists of strings); return a :class:`Differ`\ -style
				218	delta (a :term:`generator` generating the delta lines).
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	219
Andrew Kuchling	c51da2b	2014-03-19 16:43:06 -0400	[diff] [blame]	220	Optional keyword parameters linejunk and charjunk are filtering functions
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	221	(or ``None``):
				222
Georg Brandl	e6bcc91	2008-05-12 18:05:20 +0000	[diff] [blame]	223	linejunk: A function that accepts a single string argument, and returns
				224	true if the string is junk, or false if not. The default is ``None``. There
				225	is also a module-level function :func:`IS_LINE_JUNK`, which filters out lines
				226	without visible characters, except for at most one pound character (``'#'``)
				227	-- however the underlying :class:`SequenceMatcher` class does a dynamic
				228	analysis of which lines are so frequent as to constitute noise, and this
				229	usually works better than using this function.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	230
				231	charjunk: A function that accepts a character (a string of length 1), and
				232	returns if the character is junk, or false if not. The default is module-level
				233	function :func:`IS_CHARACTER_JUNK`, which filters out whitespace characters (a
Andrew Kuchling	c51da2b	2014-03-19 16:43:06 -0400	[diff] [blame]	234	blank or tab; it's a bad idea to include newline in this!).
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	235
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	236	:file:`Tools/scripts/ndiff.py` is a command-line front-end to this function.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	237
Terry Jan Reedy	bddecc3	2014-04-18 17:00:19 -0400	[diff] [blame]	238	>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
				239	... 'ore\ntree\nemu\n'.splitlines(keepends=True))
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame]	240	>>> print(''.join(diff), end="")
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	241	- one
				242	? ^
				243	+ ore
				244	? ^
				245	- two
				246	- three
				247	? -
				248	+ tree
				249	+ emu
				250
				251
				252	.. function:: restore(sequence, which)
				253
				254	Return one of the two sequences that generated a delta.
				255
				256	Given a sequence produced by :meth:`Differ.compare` or :func:`ndiff`, extract
				257	lines originating from file 1 or 2 (parameter which), stripping off line
				258	prefixes.
				259
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	260	Example:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	261
Terry Jan Reedy	bddecc3	2014-04-18 17:00:19 -0400	[diff] [blame]	262	>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
				263	... 'ore\ntree\nemu\n'.splitlines(keepends=True))
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	264	>>> diff = list(diff) # materialize the generated delta into a list
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame]	265	>>> print(''.join(restore(diff, 1)), end="")
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	266	one
				267	two
				268	three
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame]	269	>>> print(''.join(restore(diff, 2)), end="")
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	270	ore
				271	tree
				272	emu
				273
				274
Georg Brandl	c2a4f4f	2009-04-10 09:03:43 +0000	[diff] [blame]	275	.. function:: unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\\n')
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	276
Georg Brandl	9afde1c	2007-11-01 20:32:30 +0000	[diff] [blame]	277	Compare a and b (lists of strings); return a delta (a :term:`generator`
				278	generating the delta lines) in unified diff format.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	279
				280	Unified diffs are a compact way of showing just the lines that have changed plus
				281	a few lines of context. The changes are shown in a inline style (instead of
				282	separate before/after blocks). The number of context lines is set by n which
				283	defaults to three.
				284
				285	By default, the diff control lines (those with ``---``, ``+++``, or ``@@``) are
				286	created with a trailing newline. This is helpful so that inputs created from
Serhiy Storchaka	bfdcd43	2013-10-13 23:09:14 +0300	[diff] [blame]	287	:func:`io.IOBase.readlines` result in diffs that are suitable for use with
				288	:func:`io.IOBase.writelines` since both the inputs and outputs have trailing
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	289	newlines.
				290
				291	For inputs that do not have trailing newlines, set the lineterm argument to
				292	``""`` so that the output will be uniformly newline free.
				293
				294	The context diff format normally has a header for filenames and modification
				295	times. Any or all of these may be specified using strings for fromfile,
R. David Murray	b2416e5	2010-04-12 16:58:02 +0000	[diff] [blame]	296	tofile, fromfiledate, and tofiledate. The modification times are normally
				297	expressed in the ISO 8601 format. If not specified, the
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	298	strings default to blanks.
				299
Christian Heimes	8640e74	2008-02-23 16:23:06 +0000	[diff] [blame]	300
				301	>>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
				302	>>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
				303	>>> for line in unified_diff(s1, s2, fromfile='before.py', tofile='after.py'):
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	304	... sys.stdout.write(line) # doctest: +NORMALIZE_WHITESPACE
Christian Heimes	8640e74	2008-02-23 16:23:06 +0000	[diff] [blame]	305	--- before.py
				306	+++ after.py
				307	@@ -1,4 +1,4 @@
				308	-bacon
				309	-eggs
				310	-ham
				311	+python
				312	+eggy
				313	+hamster
				314	guido
				315
				316	See :ref:`difflib-interface` for a more detailed example.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	317
Greg Ward	4d9d256	2015-04-20 20:21:21 -0400	[diff] [blame]	318	.. function:: diff_bytes(dfunc, a, b, fromfile=b'', tofile=b'', fromfiledate=b'', tofiledate=b'', n=3, lineterm=b'\\n')
				319
				320	Compare a and b (lists of bytes objects) using dfunc; yield a
				321	sequence of delta lines (also bytes) in the format returned by dfunc.
				322	dfunc must be a callable, typically either :func:`unified_diff` or
				323	:func:`context_diff`.
				324
				325	Allows you to compare data with unknown or inconsistent encoding. All
				326	inputs except n must be bytes objects, not str. Works by losslessly
				327	converting all inputs (except n) to str, and calling ``dfunc(a, b,
				328	fromfile, tofile, fromfiledate, tofiledate, n, lineterm)``. The output of
				329	dfunc is then converted back to bytes, so the delta lines that you
				330	receive have the same unknown/inconsistent encodings as a and b.
				331
				332	.. versionadded:: 3.5
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	333
				334	.. function:: IS_LINE_JUNK(line)
				335
				336	Return true for ignorable lines. The line line is ignorable if line is
				337	blank or contains a single ``'#'``, otherwise it is not ignorable. Used as a
Georg Brandl	e6bcc91	2008-05-12 18:05:20 +0000	[diff] [blame]	338	default for parameter linejunk in :func:`ndiff` in older versions.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	339
				340
				341	.. function:: IS_CHARACTER_JUNK(ch)
				342
				343	Return true for ignorable characters. The character ch is ignorable if ch
				344	is a space or tab, otherwise it is not ignorable. Used as a default for
				345	parameter charjunk in :func:`ndiff`.
				346
				347
				348	.. seealso::
				349
Georg Brandl	525d355	2014-10-29 10:26:56 +0100	[diff] [blame]	350	`Pattern Matching: The Gestalt Approach <http://www.drdobbs.com/database/pattern-matching-the-gestalt-approach/184407970>`_
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	351	Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener. This
Georg Brandl	525d355	2014-10-29 10:26:56 +0100	[diff] [blame]	352	was published in `Dr. Dobb's Journal <http://www.drdobbs.com/>`_ in July, 1988.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	353
				354
				355	.. _sequence-matcher:
				356
				357	SequenceMatcher Objects
				358	-----------------------
				359
				360	The :class:`SequenceMatcher` class has this constructor:
				361
				362
Terry Reedy	99f9637	2010-11-25 06:12:34 +0000	[diff] [blame]	363	.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	364
				365	Optional argument isjunk must be ``None`` (the default) or a one-argument
				366	function that takes a sequence element and returns true if and only if the
				367	element is "junk" and should be ignored. Passing ``None`` for isjunk is
				368	equivalent to passing ``lambda x: 0``; in other words, no elements are ignored.
				369	For example, pass::
				370
				371	lambda x: x in " \t"
				372
				373	if you're comparing lines as sequences of characters, and don't want to synch up
				374	on blanks or hard tabs.
				375
				376	The optional arguments a and b are sequences to be compared; both default to
Guido van Rossum	2cc30da	2007-11-02 23:46:40 +0000	[diff] [blame]	377	empty strings. The elements of both sequences must be :term:`hashable`.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	378
Terry Reedy	99f9637	2010-11-25 06:12:34 +0000	[diff] [blame]	379	The optional argument autojunk can be used to disable the automatic junk
				380	heuristic.
				381
Terry Reedy	dc9b17d	2010-11-27 20:52:14 +0000	[diff] [blame]	382	.. versionadded:: 3.2
				383	The autojunk parameter.
				384
Terry Reedy	74a7c67	2010-12-03 18:57:42 +0000	[diff] [blame]	385	SequenceMatcher objects get three data attributes: bjunk is the
Serhiy Storchaka	fbc1c26	2013-11-29 12:17:13 +0200	[diff] [blame]	386	set of elements of b for which isjunk is ``True``; bpopular is the set of
Terry Reedy	17a5925	2010-12-15 20:18:10 +0000	[diff] [blame]	387	non-junk elements considered popular by the heuristic (if it is not
				388	disabled); b2j is a dict mapping the remaining elements of b to a list
				389	of positions where they occur. All three are reset whenever b is reset
				390	with :meth:`set_seqs` or :meth:`set_seq2`.
Terry Reedy	74a7c67	2010-12-03 18:57:42 +0000	[diff] [blame]	391
Georg Brandl	500be24	2010-12-03 19:56:42 +0000	[diff] [blame]	392	.. versionadded:: 3.2
Terry Reedy	74a7c67	2010-12-03 18:57:42 +0000	[diff] [blame]	393	The bjunk and bpopular attributes.
				394
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	395	:class:`SequenceMatcher` objects have the following methods:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	396
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	397	.. method:: set_seqs(a, b)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	398
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	399	Set the two sequences to be compared.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	400
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	401	:class:`SequenceMatcher` computes and caches detailed information about the
				402	second sequence, so if you want to compare one sequence against many
				403	sequences, use :meth:`set_seq2` to set the commonly used sequence once and
				404	call :meth:`set_seq1` repeatedly, once for each of the other sequences.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	405
				406
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	407	.. method:: set_seq1(a)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	408
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	409	Set the first sequence to be compared. The second sequence to be compared
				410	is not changed.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	411
				412
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	413	.. method:: set_seq2(b)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	414
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	415	Set the second sequence to be compared. The first sequence to be compared
				416	is not changed.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	417
				418
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	419	.. method:: find_longest_match(alo, ahi, blo, bhi)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	420
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	421	Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	422
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	423	If isjunk was omitted or ``None``, :meth:`find_longest_match` returns
				424	``(i, j, k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where ``alo
				425	<= i <= i+k <= ahi`` and ``blo <= j <= j+k <= bhi``. For all ``(i', j',
				426	k')`` meeting those conditions, the additional conditions ``k >= k'``, ``i
				427	<= i'``, and if ``i == i'``, ``j <= j'`` are also met. In other words, of
				428	all maximal matching blocks, return one that starts earliest in a, and
				429	of all those maximal matching blocks that start earliest in a, return
				430	the one that starts earliest in b.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	431
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	432	>>> s = SequenceMatcher(None, " abcd", "abcd abcd")
				433	>>> s.find_longest_match(0, 5, 0, 9)
				434	Match(a=0, b=4, size=5)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	435
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	436	If isjunk was provided, first the longest matching block is determined
				437	as above, but with the additional restriction that no junk element appears
				438	in the block. Then that block is extended as far as possible by matching
				439	(only) junk elements on both sides. So the resulting block never matches
				440	on junk except as identical junk happens to be adjacent to an interesting
				441	match.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	442
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	443	Here's the same example as before, but considering blanks to be junk. That
				444	prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the
				445	second sequence directly. Instead only the ``'abcd'`` can match, and
				446	matches the leftmost ``'abcd'`` in the second sequence:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	447
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	448	>>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
				449	>>> s.find_longest_match(0, 5, 0, 9)
				450	Match(a=1, b=0, size=4)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	451
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	452	If no blocks match, this returns ``(alo, blo, 0)``.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	453
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	454	This method returns a :term:`named tuple` ``Match(a, b, size)``.
Christian Heimes	25bb783	2008-01-11 16:17:00 +0000	[diff] [blame]	455
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	456
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	457	.. method:: get_matching_blocks()
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	458
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	459	Return list of triples describing matching subsequences. Each triple is of
				460	the form ``(i, j, n)``, and means that ``a[i:i+n] == b[j:j+n]``. The
				461	triples are monotonically increasing in i and j.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	462
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	463	The last triple is a dummy, and has the value ``(len(a), len(b), 0)``. It
				464	is the only triple with ``n == 0``. If ``(i, j, n)`` and ``(i', j', n')``
				465	are adjacent triples in the list, and the second is not the last triple in
				466	the list, then ``i+n != i'`` or ``j+n != j'``; in other words, adjacent
				467	triples always describe non-adjacent equal blocks.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	468
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	469	.. XXX Explain why a dummy is used!
Christian Heimes	5b5e81c	2007-12-31 16:14:33 +0000	[diff] [blame]	470
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	471	.. doctest::
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	472
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	473	>>> s = SequenceMatcher(None, "abxcd", "abcd")
				474	>>> s.get_matching_blocks()
				475	[Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	476
				477
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	478	.. method:: get_opcodes()
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	479
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	480	Return list of 5-tuples describing how to turn a into b. Each tuple is
				481	of the form ``(tag, i1, i2, j1, j2)``. The first tuple has ``i1 == j1 ==
				482	0``, and remaining tuples have i1 equal to the i2 from the preceding
				483	tuple, and, likewise, j1 equal to the previous j2.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	484
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	485	The tag values are strings, with these meanings:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	486
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	487	+---------------+---------------------------------------------+
				488	\| Value \| Meaning \|
				489	+===============+=============================================+
				490	\| ``'replace'`` \| ``a[i1:i2]`` should be replaced by \|
				491	\| \| ``b[j1:j2]``. \|
				492	+---------------+---------------------------------------------+
				493	\| ``'delete'`` \| ``a[i1:i2]`` should be deleted. Note that \|
				494	\| \| ``j1 == j2`` in this case. \|
				495	+---------------+---------------------------------------------+
				496	\| ``'insert'`` \| ``b[j1:j2]`` should be inserted at \|
				497	\| \| ``a[i1:i1]``. Note that ``i1 == i2`` in \|
				498	\| \| this case. \|
				499	+---------------+---------------------------------------------+
				500	\| ``'equal'`` \| ``a[i1:i2] == b[j1:j2]`` (the sub-sequences \|
				501	\| \| are equal). \|
				502	+---------------+---------------------------------------------+
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	503
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	504	For example:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	505
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	506	>>> a = "qabxcd"
				507	>>> b = "abycdf"
				508	>>> s = SequenceMatcher(None, a, b)
				509	>>> for tag, i1, i2, j1, j2 in s.get_opcodes():
Raymond Hettinger	dbb677a	2011-04-09 19:41:00 -0700	[diff] [blame]	510	print('{:7} a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(
				511	tag, i1, i2, j1, j2, a[i1:i2], b[j1:j2]))
				512
				513
				514	delete a[0:1] --> b[0:0] 'q' --> ''
				515	equal a[1:3] --> b[0:2] 'ab' --> 'ab'
				516	replace a[3:4] --> b[2:3] 'x' --> 'y'
				517	equal a[4:6] --> b[3:5] 'cd' --> 'cd'
				518	insert a[6:6] --> b[5:6] '' --> 'f'
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	519
				520
Georg Brandl	c2a4f4f	2009-04-10 09:03:43 +0000	[diff] [blame]	521	.. method:: get_grouped_opcodes(n=3)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	522
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	523	Return a :term:`generator` of groups with up to n lines of context.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	524
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	525	Starting with the groups returned by :meth:`get_opcodes`, this method
				526	splits out smaller change clusters and eliminates intervening ranges which
				527	have no changes.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	528
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	529	The groups are returned in the same format as :meth:`get_opcodes`.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	530
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	531
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	532	.. method:: ratio()
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	533
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	534	Return a measure of the sequences' similarity as a float in the range [0,
				535	1].
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	536
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	537	Where T is the total number of elements in both sequences, and M is the
				538	number of matches, this is 2.0\*M / T. Note that this is ``1.0`` if the
				539	sequences are identical, and ``0.0`` if they have nothing in common.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	540
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	541	This is expensive to compute if :meth:`get_matching_blocks` or
				542	:meth:`get_opcodes` hasn't already been called, in which case you may want
				543	to try :meth:`quick_ratio` or :meth:`real_quick_ratio` first to get an
				544	upper bound.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	545
				546
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	547	.. method:: quick_ratio()
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	548
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	549	Return an upper bound on :meth:`ratio` relatively quickly.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	550
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	551
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	552	.. method:: real_quick_ratio()
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	553
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	554	Return an upper bound on :meth:`ratio` very quickly.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	555
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	556
				557	The three methods that return the ratio of matching to total characters can give
				558	different results due to differing levels of approximation, although
				559	:meth:`quick_ratio` and :meth:`real_quick_ratio` are always at least as large as
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	560	:meth:`ratio`:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	561
				562	>>> s = SequenceMatcher(None, "abcd", "bcde")
				563	>>> s.ratio()
				564	0.75
				565	>>> s.quick_ratio()
				566	0.75
				567	>>> s.real_quick_ratio()
				568	1.0
				569
				570
				571	.. _sequencematcher-examples:
				572
				573	SequenceMatcher Examples
				574	------------------------
				575
Terry Reedy	74a7c67	2010-12-03 18:57:42 +0000	[diff] [blame]	576	This example compares two strings, considering blanks to be "junk":
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	577
				578	>>> s = SequenceMatcher(lambda x: x == " ",
				579	... "private Thread currentThread;",
				580	... "private volatile Thread currentThread;")
				581
				582	:meth:`ratio` returns a float in [0, 1], measuring the similarity of the
				583	sequences. As a rule of thumb, a :meth:`ratio` value over 0.6 means the
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	584	sequences are close matches:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	585
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame]	586	>>> print(round(s.ratio(), 3))
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	587	0.866
				588
				589	If you're only interested in where the sequences match,
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	590	:meth:`get_matching_blocks` is handy:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	591
				592	>>> for block in s.get_matching_blocks():
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame]	593	... print("a[%d] and b[%d] match for %d elements" % block)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	594	a[0] and b[0] match for 8 elements
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	595	a[8] and b[17] match for 21 elements
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	596	a[29] and b[38] match for 0 elements
				597
				598	Note that the last tuple returned by :meth:`get_matching_blocks` is always a
				599	dummy, ``(len(a), len(b), 0)``, and this is the only case in which the last
				600	tuple element (number of elements matched) is ``0``.
				601
				602	If you want to know how to change the first sequence into the second, use
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	603	:meth:`get_opcodes`:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	604
				605	>>> for opcode in s.get_opcodes():
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame]	606	... print("%6s a[%d:%d] b[%d:%d]" % opcode)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	607	equal a[0:8] b[0:8]
				608	insert a[8:8] b[8:17]
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	609	equal a[8:29] b[17:38]
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	610
Raymond Hettinger	58c8c26	2009-04-27 21:01:21 +0000	[diff] [blame]	611	.. seealso::
				612
				613	* The :func:`get_close_matches` function in this module which shows how
				614	simple code building on :class:`SequenceMatcher` can be used to do useful
				615	work.
				616
				617	* `Simple version control recipe
				618	<http://code.activestate.com/recipes/576729/>`_ for a small application
				619	built with :class:`SequenceMatcher`.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	620
				621
				622	.. _differ-objects:
				623
				624	Differ Objects
				625	--------------
				626
				627	Note that :class:`Differ`\ -generated deltas make no claim to be minimal
				628	diffs. To the contrary, minimal diffs are often counter-intuitive, because they
				629	synch up anywhere possible, sometimes accidental matches 100 pages apart.
				630	Restricting synch points to contiguous matches preserves some notion of
				631	locality, at the occasional cost of producing a longer diff.
				632
				633	The :class:`Differ` class has this constructor:
				634
				635
Georg Brandl	c2a4f4f	2009-04-10 09:03:43 +0000	[diff] [blame]	636	.. class:: Differ(linejunk=None, charjunk=None)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	637
				638	Optional keyword parameters linejunk and charjunk are for filter functions
				639	(or ``None``):
				640
				641	linejunk: A function that accepts a single string argument, and returns true
				642	if the string is junk. The default is ``None``, meaning that no line is
				643	considered junk.
				644
				645	charjunk: A function that accepts a single character argument (a string of
				646	length 1), and returns true if the character is junk. The default is ``None``,
				647	meaning that no character is considered junk.
				648
Andrew Kuchling	c51da2b	2014-03-19 16:43:06 -0400	[diff] [blame]	649	These junk-filtering functions speed up matching to find
				650	differences and do not cause any differing lines or characters to
				651	be ignored. Read the description of the
				652	:meth:`~SequenceMatcher.find_longest_match` method's isjunk
				653	parameter for an explanation.
				654
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	655	:class:`Differ` objects are used (deltas generated) via a single method:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	656
				657
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	658	.. method:: Differ.compare(a, b)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	659
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	660	Compare two sequences of lines, and generate the delta (a sequence of lines).
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	661
Serhiy Storchaka	bfdcd43	2013-10-13 23:09:14 +0300	[diff] [blame]	662	Each sequence must contain individual single-line strings ending with
				663	newlines. Such sequences can be obtained from the
				664	:meth:`~io.IOBase.readlines` method of file-like objects. The delta
				665	generated also consists of newline-terminated strings, ready to be
				666	printed as-is via the :meth:`~io.IOBase.writelines` method of a
				667	file-like object.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	668
				669
				670	.. _differ-examples:
				671
				672	Differ Example
				673	--------------
				674
				675	This example compares two texts. First we set up the texts, sequences of
				676	individual single-line strings ending with newlines (such sequences can also be
Serhiy Storchaka	bfdcd43	2013-10-13 23:09:14 +0300	[diff] [blame]	677	obtained from the :meth:`~io.BaseIO.readlines` method of file-like objects):
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	678
				679	>>> text1 = ''' 1. Beautiful is better than ugly.
				680	... 2. Explicit is better than implicit.
				681	... 3. Simple is better than complex.
				682	... 4. Complex is better than complicated.
Terry Jan Reedy	bddecc3	2014-04-18 17:00:19 -0400	[diff] [blame]	683	... '''.splitlines(keepends=True)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	684	>>> len(text1)
				685	4
				686	>>> text1[0][-1]
				687	'\n'
				688	>>> text2 = ''' 1. Beautiful is better than ugly.
				689	... 3. Simple is better than complex.
				690	... 4. Complicated is better than complex.
				691	... 5. Flat is better than nested.
Terry Jan Reedy	bddecc3	2014-04-18 17:00:19 -0400	[diff] [blame]	692	... '''.splitlines(keepends=True)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	693
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	694	Next we instantiate a Differ object:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	695
				696	>>> d = Differ()
				697
				698	Note that when instantiating a :class:`Differ` object we may pass functions to
				699	filter out line and character "junk." See the :meth:`Differ` constructor for
				700	details.
				701
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	702	Finally, we compare the two:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	703
				704	>>> result = list(d.compare(text1, text2))
				705
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	706	``result`` is a list of strings, so let's pretty-print it:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	707
				708	>>> from pprint import pprint
				709	>>> pprint(result)
				710	[' 1. Beautiful is better than ugly.\n',
				711	'- 2. Explicit is better than implicit.\n',
				712	'- 3. Simple is better than complex.\n',
				713	'+ 3. Simple is better than complex.\n',
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	714	'? ++\n',
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	715	'- 4. Complex is better than complicated.\n',
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	716	'? ^ ---- ^\n',
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	717	'+ 4. Complicated is better than complex.\n',
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	718	'? ++++ ^ ^\n',
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	719	'+ 5. Flat is better than nested.\n']
				720
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	721	As a single multi-line string it looks like this:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	722
				723	>>> import sys
				724	>>> sys.stdout.writelines(result)
				725	1. Beautiful is better than ugly.
				726	- 2. Explicit is better than implicit.
				727	- 3. Simple is better than complex.
				728	+ 3. Simple is better than complex.
				729	? ++
				730	- 4. Complex is better than complicated.
				731	? ^ ---- ^
				732	+ 4. Complicated is better than complex.
				733	? ++++ ^ ^
				734	+ 5. Flat is better than nested.
				735
Christian Heimes	8640e74	2008-02-23 16:23:06 +0000	[diff] [blame]	736
				737	.. _difflib-interface:
				738
				739	A command-line interface to difflib
				740	-----------------------------------
				741
				742	This example shows how to use difflib to create a ``diff``-like utility.
				743	It is also contained in the Python source distribution, as
				744	:file:`Tools/scripts/diff.py`.
				745
Berker Peksag	707deb9	2015-07-30 00:03:48 +0300	[diff] [blame]	746	.. literalinclude:: ../../Tools/scripts/diff.py