:mod:`difflib` --- Helpers for computing deltas
===============================================

.. module:: difflib
   :synopsis: Helpers for computing differences between objects.
.. moduleauthor:: Tim Peters <tim_one@users.sourceforge.net>
.. sectionauthor:: Tim Peters <tim_one@users.sourceforge.net>
.. Markup by Fred L. Drake, Jr. <fdrake@acm.org>

Source code: :source:`Lib/difflib.py`

.. testsetup::

   import sys
   from difflib import *

This module provides classes and functions for comparing sequences. It
can be used, for example, for comparing files, and can produce difference
information in various formats, including HTML and context and unified
diffs. For comparing directories and files, see also the :mod:`filecmp` module.


.. class:: SequenceMatcher

   This is a flexible class for comparing pairs of sequences of any type, so long
   as the sequence elements are :term:`hashable`. The basic algorithm predates, and is a
   little fancier than, an algorithm published in the late 1980s by Ratcliff and
   Obershelp under the hyperbolic name "gestalt pattern matching." The idea is to
   find the longest contiguous matching subsequence that contains no "junk"
   elements; these "junk" elements are ones that are uninteresting in some
   sense, such as blank lines or whitespace. (Handling junk is an
   extension to the Ratcliff and Obershelp algorithm.) The same
   idea is then applied recursively to the pieces of the sequences to the left and
   to the right of the matching subsequence. This does not yield minimal edit
   sequences, but does tend to yield matches that "look right" to people.

   Timing: The basic Ratcliff-Obershelp algorithm is cubic time in the worst
   case and quadratic time in the expected case. :class:`SequenceMatcher` is
   quadratic time for the worst case and has expected-case behavior dependent in a
   complicated way on how many elements the sequences have in common; best-case
   time is linear.

   Automatic junk heuristic: :class:`SequenceMatcher` supports a heuristic that
   automatically treats certain sequence items as junk. The heuristic counts how many
   times each individual item appears in the sequence. If an item's duplicates (after
   the first one) account for more than 1% of the sequence and the sequence is at least
   200 items long, this item is marked as "popular" and is treated as junk for
   the purpose of sequence matching. This heuristic can be turned off by setting
   the ``autojunk`` argument to ``False`` when creating the :class:`SequenceMatcher`.

   .. versionadded:: 3.2
      The autojunk parameter.

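A minimal sketch of the automatic junk heuristic, assuming a made-up sequence of 205 text lines in which one item (a blank line) repeats. It inspects the ``bpopular`` attribute, which :class:`SequenceMatcher` objects expose for exactly this purpose:

```python
from difflib import SequenceMatcher

# 200 distinct lines plus 5 identical blank lines: 205 items in total, so
# the 4 duplicate blank lines make up roughly 2% of the sequence.
lines = [f"line {i}\n" for i in range(200)] + ["\n"] * 5

# autojunk=True (the default): the blank line is flagged as popular and
# left out of the b2j index used for matching.
sm = SequenceMatcher(None, lines, lines)
print(sm.bpopular)     # {'\n'}
print("\n" in sm.b2j)  # False

# autojunk=False: nothing is flagged.
sm_off = SequenceMatcher(None, lines, lines, autojunk=False)
print(sm_off.bpopular)  # set()

# Fewer than 200 items: the heuristic does not apply at all.
short = lines[:100] + ["\n"] * 5
print(SequenceMatcher(None, short, short).bpopular)  # set()
```

The sketch only shows which items get flagged; how much that changes the final matches depends on the inputs.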

.. class:: Differ

   This is a class for comparing sequences of lines of text, and producing
   human-readable differences or deltas. Differ uses :class:`SequenceMatcher`
   both to compare sequences of lines, and to compare sequences of characters
   within similar (near-matching) lines.

   Each line of a :class:`Differ` delta begins with a two-character code:

   +----------+-------------------------------------------+
   | Code     | Meaning                                   |
   +==========+===========================================+
   | ``'- '`` | line unique to sequence 1                 |
   +----------+-------------------------------------------+
   | ``'+ '`` | line unique to sequence 2                 |
   +----------+-------------------------------------------+
   | ``'  '`` | line common to both sequences             |
   +----------+-------------------------------------------+
   | ``'? '`` | line not present in either input sequence |
   +----------+-------------------------------------------+

   Lines beginning with '``?``' attempt to guide the eye to intraline differences,
   and were not present in either input sequence. These lines can be confusing if
   the sequences contain tab characters.

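A short sketch of the codes in practice, using the ``compare`` method of :class:`Differ` on two made-up lists of lines. No line here is merely similar to another, so no ``'? '`` guide lines appear:

```python
from difflib import Differ

a = ["apple\n", "banana\n", "cherry\n"]
b = ["apple\n", "cherry\n", "date\n"]

# compare() yields one delta line at a time; each starts with a 2-char code.
delta = list(Differ().compare(a, b))
print("".join(delta), end="")
# Output:
#   apple
# - banana
#   cherry
# + date
```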

.. class:: HtmlDiff

   This class can be used to create an HTML table (or a complete HTML file
   containing the table) showing a side-by-side, line-by-line comparison of text
   with inter-line and intra-line change highlights. The table can be generated in
   either full or contextual difference mode.

   The constructor for this class is:


   .. method:: __init__(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK)

      Initializes an instance of :class:`HtmlDiff`.

      tabsize is an optional keyword argument to specify tab stop spacing and
      defaults to ``8``.

      wrapcolumn is an optional keyword argument to specify the column number where
      lines are broken and wrapped; it defaults to ``None``, in which case lines are
      not wrapped.

      linejunk and charjunk are optional keyword arguments passed into :func:`ndiff`
      (used by :class:`HtmlDiff` to generate the side-by-side HTML differences). See
      the :func:`ndiff` documentation for argument default values and descriptions.

   The following methods are public:

   .. method:: make_file(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5)

      Compares fromlines and tolines (lists of strings) and returns a string which
      is a complete HTML file containing a table showing line-by-line differences with
      inter-line and intra-line changes highlighted.

      fromdesc and todesc are optional keyword arguments to specify from/to file
      column header strings (both default to an empty string).

      context and numlines are both optional keyword arguments. Set context to
      ``True`` when contextual differences are to be shown; the default, ``False``,
      shows the full files. numlines defaults to ``5``. When context
      is ``True``, numlines controls the number of context lines which surround the
      difference highlights. When context is ``False``, numlines controls the
      number of lines which are shown before a difference highlight when using the
      "next" hyperlinks (setting it to zero would cause the "next" hyperlinks to place
      the next difference highlight at the top of the browser without any leading
      context).

   .. method:: make_table(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5)

      Compares fromlines and tolines (lists of strings) and returns a string which
      is a complete HTML table showing line-by-line differences with inter-line and
      intra-line changes highlighted.

      The arguments for this method are the same as those for the :meth:`make_file`
      method.

   :file:`Tools/scripts/diff.py` is a command-line front-end to this class and
   contains a good example of its use.

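A minimal sketch of both public methods, reusing the sample lines from the :func:`context_diff` example. The ``tabsize`` and ``wrapcolumn`` values are arbitrary choices for illustration; writing the returned string to a file and opening it in a browser shows the side-by-side view:

```python
from difflib import HtmlDiff

before = ["bacon\n", "eggs\n", "ham\n", "guido\n"]
after = ["python\n", "eggy\n", "hamster\n", "guido\n"]

# Expand tabs to 4 columns and wrap lines longer than 40 characters.
differ = HtmlDiff(tabsize=4, wrapcolumn=40)

# A complete HTML document containing the comparison table.
page = differ.make_file(before, after, fromdesc="before.py", todesc="after.py")

# Only the <table> markup, limited to changed regions plus 1 line of context.
table = differ.make_table(before, after, fromdesc="before.py",
                          todesc="after.py", context=True, numlines=1)

print(len(page) > len(table))  # True: the file adds styles and a legend
```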

.. function:: context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\\n')

   Compare a and b (lists of strings); return a delta (a :term:`generator`
   generating the delta lines) in context diff format.

   Context diffs are a compact way of showing just the lines that have changed plus
   a few lines of context. The changes are shown in a before/after style. The
   number of context lines is set by n, which defaults to three.

   By default, the diff control lines (those with ``***`` or ``---``) are created
   with a trailing newline. This is helpful so that inputs created from
   :func:`io.IOBase.readlines` result in diffs that are suitable for use with
   :func:`io.IOBase.writelines` since both the inputs and outputs have trailing
   newlines.

   For inputs that do not have trailing newlines, set the lineterm argument to
   ``""`` so that the output will be uniformly newline free.

   The context diff format normally has a header for filenames and modification
   times. Any or all of these may be specified using strings for fromfile,
   tofile, fromfiledate, and tofiledate. The modification times are normally
   expressed in the ISO 8601 format. If not specified, the
   strings default to blanks.

   >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
   >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
   >>> sys.stdout.writelines(context_diff(s1, s2, fromfile='before.py', tofile='after.py'))
   *** before.py
   --- after.py
   ***************
   *** 1,4 ****
   ! bacon
   ! eggs
   ! ham
     guido
   --- 1,4 ----
   ! python
   ! eggy
   ! hamster
     guido

   See :ref:`difflib-interface` for a more detailed example.

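A sketch of the ``lineterm=""`` case described above, assuming the input lines were produced with :meth:`str.splitlines`, which drops the line endings:

```python
from difflib import context_diff

# splitlines() drops the line endings, so the inputs have no newlines.
before = "bacon\neggs\nham\nguido".splitlines()
after = "python\neggy\nhamster\nguido".splitlines()

# lineterm="" keeps the generated control lines newline free as well.
diff = list(context_diff(before, after, fromfile="before.py",
                         tofile="after.py", lineterm=""))
print("\n".join(diff))
```

Joining with ``"\n"`` at the end restores a printable layout without doubling any newlines.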

.. function:: get_close_matches(word, possibilities, n=3, cutoff=0.6)

   Return a list of the best "good enough" matches. word is a sequence for which
   close matches are desired (typically a string), and possibilities is a list of
   sequences against which to match word (typically a list of strings).

   Optional argument n (default ``3``) is the maximum number of close matches to
   return; n must be greater than ``0``.

   Optional argument cutoff (default ``0.6``) is a float in the range [0, 1].
   Possibilities that don't score at least that similar to word are ignored.

   The best (no more than n) matches among the possibilities are returned in a
   list, sorted by similarity score, most similar first.

   >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
   ['apple', 'ape']
   >>> import keyword
   >>> get_close_matches('wheel', keyword.kwlist)
   ['while']
   >>> get_close_matches('apple', keyword.kwlist)
   []
   >>> get_close_matches('accept', keyword.kwlist)
   ['except']

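A small sketch of how n and cutoff narrow the result, reusing the word list from the first example above:

```python
from difflib import get_close_matches

words = ["ape", "apple", "peach", "puppy"]

# n caps the number of results; the best-scoring match comes first.
print(get_close_matches("appel", words, n=1))          # ['apple']

# "apple" scores 0.8 against "appel", so a cutoff of 0.85 rejects it,
# and every other candidate scores lower still.
print(get_close_matches("appel", words, cutoff=0.85))  # []
```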

.. function:: ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK)

   Compare a and b (lists of strings); return a :class:`Differ`\ -style
   delta (a :term:`generator` generating the delta lines).

   Optional keyword parameters linejunk and charjunk are filtering functions
   (or ``None``):

   linejunk: A function that accepts a single string argument, and returns
   true if the string is junk, or false if not. The default is ``None``. There
   is also a module-level function :func:`IS_LINE_JUNK`, which filters out lines
   without visible characters, except for at most one pound character (``'#'``).
   However, the underlying :class:`SequenceMatcher` class does a dynamic
   analysis of which lines are so frequent as to constitute noise, and this
   usually works better than using this function.

   charjunk: A function that accepts a character (a string of length 1), and
   returns true if the character is junk, or false if not. The default is the
   module-level function :func:`IS_CHARACTER_JUNK`, which filters out whitespace
   characters (a blank or tab; it's a bad idea to include newline in this!).

   :file:`Tools/scripts/ndiff.py` is a command-line front-end to this function.

   >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
   ...              'ore\ntree\nemu\n'.splitlines(keepends=True))
   >>> print(''.join(diff), end="")
   - one
   ?  ^
   + ore
   ?  ^
   - two
   - three
   ?  -
   + tree
   + emu

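Because every delta line starts with one of the two-character codes, the output is easy to post-process. A minimal sketch that tallies the lines of the example above by code:

```python
from collections import Counter
from difflib import ndiff

a = "one\ntwo\nthree\n".splitlines(keepends=True)
b = "ore\ntree\nemu\n".splitlines(keepends=True)

# Group the delta lines by their two-character code.
codes = Counter(line[:2] for line in ndiff(a, b))

# No line is common to both inputs, so every line of a is reported once
# with '- ', every line of b once with '+ ', and none with '  '.
print(codes["- "], codes["+ "], codes["  "])  # 3 3 0
```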

.. function:: restore(sequence, which)

   Return one of the two sequences that generated a delta.

   Given a sequence produced by :meth:`Differ.compare` or :func:`ndiff`, extract
   lines originating from file 1 or 2 (parameter which), stripping off line
   prefixes.

   Example:

   >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
   ...              'ore\ntree\nemu\n'.splitlines(keepends=True))
   >>> diff = list(diff) # materialize the generated delta into a list
   >>> print(''.join(restore(diff, 1)), end="")
   one
   two
   three
   >>> print(''.join(restore(diff, 2)), end="")
   ore
   tree
   emu

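Since :func:`restore` keeps the ``'  '`` lines plus either the ``'- '`` lines (for 1) or the ``'+ '`` lines (for 2), it recovers each input exactly. A quick round-trip check on two made-up lists that share some lines:

```python
from difflib import ndiff, restore

a = ["alpha\n", "beta\n", "gamma\n"]
b = ["alpha\n", "gamma\n", "delta\n"]

# ndiff() returns a generator, so materialize it before reading it twice.
delta = list(ndiff(a, b))

print(list(restore(delta, 1)) == a)  # True
print(list(restore(delta, 2)) == b)  # True
```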

.. function:: unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\\n')

   Compare a and b (lists of strings); return a delta (a :term:`generator`
   generating the delta lines) in unified diff format.

   Unified diffs are a compact way of showing just the lines that have changed plus
   a few lines of context. The changes are shown in an inline style (instead of
   separate before/after blocks). The number of context lines is set by n, which
   defaults to three.

   By default, the diff control lines (those with ``---``, ``+++``, or ``@@``) are
   created with a trailing newline. This is helpful so that inputs created from
   :func:`io.IOBase.readlines` result in diffs that are suitable for use with
   :func:`io.IOBase.writelines` since both the inputs and outputs have trailing
   newlines.

   For inputs that do not have trailing newlines, set the lineterm argument to
   ``""`` so that the output will be uniformly newline free.

   The unified diff format normally has a header for filenames and modification
   times. Any or all of these may be specified using strings for fromfile,
   tofile, fromfiledate, and tofiledate. The modification times are normally
   expressed in the ISO 8601 format. If not specified, the
   strings default to blanks.

   >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
   >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
   >>> sys.stdout.writelines(unified_diff(s1, s2, fromfile='before.py', tofile='after.py'))
   --- before.py
   +++ after.py
   @@ -1,4 +1,4 @@
   -bacon
   -eggs
   -ham
   +python
   +eggy
   +hamster
    guido

   See :ref:`difflib-interface` for a more detailed example.

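A sketch of the n parameter and the date fields, assuming a made-up ten-line file in which only the fifth line changes and arbitrary ISO 8601 timestamps:

```python
from difflib import unified_diff

old = [f"line {i}\n" for i in range(1, 11)]
new = list(old)
new[4] = "line five\n"  # change the fifth line only

diff = list(unified_diff(old, new, fromfile="old.txt", tofile="new.txt",
                         fromfiledate="2014-01-01T10:00:00",
                         tofiledate="2014-01-02T10:00:00", n=1))
print("".join(diff), end="")
```

With ``n=1`` the single hunk covers only lines 4 to 6, so its header reads ``@@ -4,3 +4,3 @@``; each date follows its file name, separated by a tab.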

.. function:: IS_LINE_JUNK(line)

   Return true for ignorable lines. A line is ignorable if it contains only
   whitespace, optionally with a single ``'#'``; otherwise it is not ignorable.
   Used as a default for parameter linejunk in :func:`ndiff` in older versions.


.. function:: IS_CHARACTER_JUNK(ch)

   Return true for ignorable characters. The character ch is ignorable if ch
   is a space or tab; otherwise it is not ignorable. Used as a default for
   parameter charjunk in :func:`ndiff`.

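A few sample calls showing what the two predicates accept; the input strings are made up for illustration:

```python
from difflib import IS_CHARACTER_JUNK, IS_LINE_JUNK

print(IS_LINE_JUNK("\n"))         # True: blank line
print(IS_LINE_JUNK("   #  \n"))   # True: a single '#' amid whitespace
print(IS_LINE_JUNK("## note\n"))  # False: visible text

print(IS_CHARACTER_JUNK(" "), IS_CHARACTER_JUNK("\t"))  # True True
print(IS_CHARACTER_JUNK("\n"))    # False: newline is not treated as junk
```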

.. seealso::

   `Pattern Matching: The Gestalt Approach <http://www.ddj.com/184407970?pgno=5>`_
      Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener. This
      was published in `Dr. Dobb's Journal <http://www.ddj.com/>`_ in July, 1988.


.. _sequence-matcher:

SequenceMatcher Objects
-----------------------

The :class:`SequenceMatcher` class has this constructor:


.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)

   Optional argument isjunk must be ``None`` (the default) or a one-argument
   function that takes a sequence element and returns true if and only if the
   element is "junk" and should be ignored. Passing ``None`` for isjunk is
   equivalent to passing ``lambda x: 0``; in other words, no elements are ignored.
   For example, pass::

      lambda x: x in " \t"

   if you're comparing lines as sequences of characters, and don't want to sync up
   on blanks or hard tabs.

   The optional arguments a and b are sequences to be compared; both default to
   empty strings. The elements of both sequences must be :term:`hashable`.

   The optional argument autojunk can be used to disable the automatic junk
   heuristic.

   .. versionadded:: 3.2
      The autojunk parameter.

   SequenceMatcher objects have three data attributes: bjunk is the
   set of elements of b for which isjunk is ``True``; bpopular is the set of
   non-junk elements considered popular by the heuristic (if it is not
   disabled); b2j is a dict mapping the remaining elements of b to a list
   of positions where they occur. All three are reset whenever b is reset
   with :meth:`set_seqs` or :meth:`set_seq2`.

   .. versionadded:: 3.2
      The bjunk and bpopular attributes.

   :class:`SequenceMatcher` objects have the following methods:

   .. method:: set_seqs(a, b)

      Set the two sequences to be compared.

      :class:`SequenceMatcher` computes and caches detailed information about the
      second sequence, so if you want to compare one sequence against many
      sequences, use :meth:`set_seq2` to set the commonly used sequence once and
      call :meth:`set_seq1` repeatedly, once for each of the other sequences.

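   A minimal sketch of the one-against-many pattern described above; the target
   word and the candidate list are made up for illustration:

   ```python
   from difflib import SequenceMatcher

   target = "difflib"
   candidates = ["diff", "difflab", "zlib"]

   sm = SequenceMatcher()
   sm.set_seq2(target)            # analyze the shared sequence once
   scores = {}
   for cand in candidates:
       sm.set_seq1(cand)          # only the first sequence changes
       scores[cand] = sm.ratio()

   print(max(scores, key=scores.get))  # difflab
   ```

   Each score is the same as building a fresh ``SequenceMatcher(None, cand, target)``;
   the reuse only avoids re-analyzing ``target`` for every candidate.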

   .. method:: set_seq1(a)

      Set the first sequence to be compared. The second sequence to be compared
      is not changed.


   .. method:: set_seq2(b)

      Set the second sequence to be compared. The first sequence to be compared
      is not changed.

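   A small sketch of the three data attributes described with the constructor, and
   of their reset by :meth:`set_seq2`; the junk predicate and sample strings are
   assumptions chosen for illustration:

   ```python
   from difflib import SequenceMatcher

   # isjunk is applied to the elements of b ("x y z"), so the blanks are junk.
   sm = SequenceMatcher(lambda x: x == " ", "a b", "x y z")
   print(sm.bjunk)     # {' '}
   print(sm.b2j)       # {'x': [0], 'y': [2], 'z': [4]}
   print(sm.bpopular)  # set(): b is much shorter than 200 items

   # Setting a new b recomputes all three attributes.
   sm.set_seq2("zz")
   print(sm.bjunk, sm.b2j)  # set() {'z': [0, 1]}
   ```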

   .. method:: find_longest_match(alo, ahi, blo, bhi)

      Find the longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.

      If isjunk was omitted or ``None``, :meth:`find_longest_match` returns
      ``(i, j, k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where
      ``alo <= i <= i+k <= ahi`` and ``blo <= j <= j+k <= bhi``. For all
      ``(i', j', k')`` meeting those conditions, the additional conditions
      ``k >= k'``, ``i <= i'``, and if ``i == i'``, ``j <= j'`` are also met.
      In other words, of all maximal matching blocks, return one that starts
      earliest in a, and of all those maximal matching blocks that start
      earliest in a, return the one that starts earliest in b.

      >>> s = SequenceMatcher(None, " abcd", "abcd abcd")
      >>> s.find_longest_match(0, 5, 0, 9)
      Match(a=0, b=4, size=5)

      If isjunk was provided, first the longest matching block is determined
      as above, but with the additional restriction that no junk element appears
      in the block. Then that block is extended as far as possible by matching
      (only) junk elements on both sides. So the resulting block never matches
      on junk except as identical junk happens to be adjacent to an interesting
      match.

      Here's the same example as before, but considering blanks to be junk. That
      prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the
      second sequence directly. Instead only the ``'abcd'`` can match, and
      matches the leftmost ``'abcd'`` in the second sequence:

      >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
      >>> s.find_longest_match(0, 5, 0, 9)
      Match(a=1, b=0, size=4)

      If no blocks match, this returns ``(alo, blo, 0)``.

      This method returns a :term:`named tuple` ``Match(a, b, size)``.

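   A short sketch of reading the returned named tuple by field name, using the
   strings from the first example above plus a made-up pair with nothing in common:

   ```python
   from difflib import SequenceMatcher

   a, b = " abcd", "abcd abcd"
   s = SequenceMatcher(None, a, b)
   m = s.find_longest_match(0, len(a), 0, len(b))

   # The fields can be read by name as well as by position.
   print(m.a, m.b, m.size)           # 0 4 5
   print(repr(a[m.a:m.a + m.size]))  # ' abcd'

   # With nothing in common the result is (alo, blo, 0).
   empty = SequenceMatcher(None, "abc", "xyz").find_longest_match(0, 3, 0, 3)
   print(empty)                      # Match(a=0, b=0, size=0)
   ```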

   .. method:: get_matching_blocks()

      Return a list of triples describing matching subsequences. Each triple is of
      the form ``(i, j, n)``, and means that ``a[i:i+n] == b[j:j+n]``. The
      triples are monotonically increasing in i and j.

      The last triple is a dummy, and has the value ``(len(a), len(b), 0)``. It
      is the only triple with ``n == 0``. If ``(i, j, n)`` and ``(i', j', n')``
      are adjacent triples in the list, and the second is not the last triple in
      the list, then ``i+n != i'`` or ``j+n != j'``; in other words, adjacent
      triples always describe non-adjacent equal blocks.

      .. XXX Explain why a dummy is used!

      .. doctest::

         >>> s = SequenceMatcher(None, "abxcd", "abcd")
         >>> s.get_matching_blocks()
         [Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]

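   A sketch that checks the stated properties on the example above, and also
   shows how the block sizes relate to :meth:`ratio`, whose M is the total size of
   the matching blocks:

   ```python
   from difflib import SequenceMatcher

   a, b = "abxcd", "abcd"
   s = SequenceMatcher(None, a, b)
   blocks = s.get_matching_blocks()

   # Each triple (i, j, n) names equal slices of a and b.
   for i, j, n in blocks:
       assert a[i:i + n] == b[j:j + n]

   # The final triple is the (len(a), len(b), 0) dummy.
   assert blocks[-1] == (len(a), len(b), 0)

   # Summing the block sizes gives the M used by ratio(): 2.0*M / T.
   matches = sum(n for _, _, n in blocks)
   print(matches, s.ratio())  # 4 0.8888888888888888
   ```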

   .. method:: get_opcodes()

      Return a list of 5-tuples describing how to turn a into b. Each tuple is
      of the form ``(tag, i1, i2, j1, j2)``. The first tuple has
      ``i1 == j1 == 0``, and remaining tuples have i1 equal to the i2 from the
      preceding tuple, and, likewise, j1 equal to the previous j2.

      The tag values are strings, with these meanings:

      +---------------+---------------------------------------------+
      | Value         | Meaning                                     |
      +===============+=============================================+
      | ``'replace'`` | ``a[i1:i2]`` should be replaced by          |
      |               | ``b[j1:j2]``.                               |
      +---------------+---------------------------------------------+
      | ``'delete'``  | ``a[i1:i2]`` should be deleted. Note that   |
      |               | ``j1 == j2`` in this case.                  |
      +---------------+---------------------------------------------+
      | ``'insert'``  | ``b[j1:j2]`` should be inserted at          |
      |               | ``a[i1:i1]``. Note that ``i1 == i2`` in     |
      |               | this case.                                  |
      +---------------+---------------------------------------------+
      | ``'equal'``   | ``a[i1:i2] == b[j1:j2]`` (the sub-sequences |
      |               | are equal).                                 |
      +---------------+---------------------------------------------+

      For example:

      >>> a = "qabxcd"
      >>> b = "abycdf"
      >>> s = SequenceMatcher(None, a, b)
      >>> for tag, i1, i2, j1, j2 in s.get_opcodes():
      ...     print('{:7}   a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(
      ...         tag, i1, i2, j1, j2, a[i1:i2], b[j1:j2]))
      delete    a[0:1] --> b[0:0]      'q' --> ''
      equal     a[1:3] --> b[0:2]     'ab' --> 'ab'
      replace   a[3:4] --> b[2:3]      'x' --> 'y'
      equal     a[4:6] --> b[3:5]     'cd' --> 'cd'
      insert    a[6:6] --> b[5:6]       '' --> 'f'

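   Because the opcodes cover both sequences in order, replaying them rebuilds b
   from a. A minimal sketch; ``apply_opcodes`` is a hypothetical helper written for
   this illustration, not part of :mod:`difflib`, and it assumes string inputs:

   ```python
   from difflib import SequenceMatcher

   def apply_opcodes(a, b, opcodes):
       """Hypothetical helper: rebuild b from a by replaying opcodes."""
       pieces = []
       for tag, i1, i2, j1, j2 in opcodes:
           if tag == "equal":
               pieces.append(a[i1:i2])   # keep a's slice unchanged
           elif tag in ("replace", "insert"):
               pieces.append(b[j1:j2])   # take b's slice instead
           # "delete": a[i1:i2] contributes nothing
       return "".join(pieces)

   a, b = "qabxcd", "abycdf"
   ops = SequenceMatcher(None, a, b).get_opcodes()
   print(apply_opcodes(a, b, ops))  # abycdf
   ```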

   .. method:: get_grouped_opcodes(n=3)

      Return a :term:`generator` of groups with up to *n* lines of context.

      Starting with the opcodes returned by :meth:`get_opcodes`, this method
      splits out smaller change clusters and eliminates intervening ranges which
      have no changes.

      Each group is a list of 5-tuples in the same format as :meth:`get_opcodes`
      returns.

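      As a sketch of the trimming, the made-up ten-line input below has two
      isolated one-line edits.  With ``n=1`` they land in separate groups, and
      the long unchanged run between them contributes only one line of context
      to each group:

      .. code-block:: python

         from difflib import SequenceMatcher

         # Made-up input: ten lines, with lines 2 and 8 changed in the copy.
         a = ["line %d\n" % i for i in range(10)]
         b = list(a)
         b[2] = "changed 2\n"
         b[8] = "changed 8\n"

         groups = list(SequenceMatcher(None, a, b).get_grouped_opcodes(n=1))
         for group in groups:
             print("--- group")
             for tag, i1, i2, j1, j2 in group:
                 print("{:7} a[{}:{}] b[{}:{}]".format(tag, i1, i2, j1, j2))

      Here the first group is ``equal a[1:2]``, ``replace a[2:3]``,
      ``equal a[3:4]``, and the second group covers ``a[7:10]`` in the same
      pattern.
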

   .. method:: ratio()

      Return a measure of the sequences' similarity as a float in the range
      [0, 1].

      Where T is the total number of elements in both sequences, and M is the
      number of matched elements (the sum of the sizes of the blocks returned by
      :meth:`get_matching_blocks`), this is 2.0\*M / T.  Note that this is
      ``1.0`` if the sequences are identical, and ``0.0`` if they have nothing
      in common.

      This is expensive to compute if :meth:`get_matching_blocks` or
      :meth:`get_opcodes` hasn't already been called, in which case you may want
      to try :meth:`quick_ratio` or :meth:`real_quick_ratio` first to get an
      upper bound.

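      The formula can be checked against :meth:`get_matching_blocks`.  This
      sketch uses the ``"abcd"``/``"bcde"`` pair, where the only real match is
      ``"bcd"`` (so M = 3) and T = 4 + 4 = 8:

      .. code-block:: python

         from difflib import SequenceMatcher

         a, b = "abcd", "bcde"
         s = SequenceMatcher(None, a, b)

         # M: matched elements; the trailing dummy block contributes size 0.
         M = sum(size for _, _, size in s.get_matching_blocks())
         # T: total number of elements in both sequences.
         T = len(a) + len(b)

         print(M, T, 2.0 * M / T, s.ratio())  # 3 8 0.75 0.75
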

   .. method:: quick_ratio()

      Return an upper bound on :meth:`ratio` relatively quickly.


   .. method:: real_quick_ratio()

      Return an upper bound on :meth:`ratio` very quickly.


The three methods that return the ratio of matching to total elements can give
different results due to differing levels of approximation, although
:meth:`~SequenceMatcher.quick_ratio` and
:meth:`~SequenceMatcher.real_quick_ratio` are always at least as large as
:meth:`~SequenceMatcher.ratio`:

   >>> s = SequenceMatcher(None, "abcd", "bcde")
   >>> s.ratio()
   0.75
   >>> s.quick_ratio()
   0.75
   >>> s.real_quick_ratio()
   1.0

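Because the two quick methods can only overestimate, they work as cheap filters
in front of :meth:`~SequenceMatcher.ratio`, the same ordering that
:func:`get_close_matches` uses.  A minimal sketch; ``similar_enough`` and its
*cutoff* parameter are hypothetical names chosen for illustration:

.. code-block:: python

   from difflib import SequenceMatcher


   def similar_enough(word, candidate, cutoff=0.6):
       """Hypothetical helper: True if ratio() >= cutoff.

       real_quick_ratio() and quick_ratio() are upper bounds on ratio(), so
       if either is already below cutoff, the exact ratio() is never computed.
       """
       s = SequenceMatcher(None, word, candidate)
       return (s.real_quick_ratio() >= cutoff
               and s.quick_ratio() >= cutoff
               and s.ratio() >= cutoff)


   print(similar_enough("apple", "ape"))  # True: ratio() is 0.75
   print(similar_enough("apple", "xyz"))  # False: quick_ratio() is already 0.0
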

.. _sequencematcher-examples:

SequenceMatcher Examples
------------------------

This example compares two strings, considering blanks to be "junk":

   >>> s = SequenceMatcher(lambda x: x == " ",
   ...                     "private Thread currentThread;",
   ...                     "private volatile Thread currentThread;")

:meth:`~SequenceMatcher.ratio` returns a float in [0, 1], measuring the
similarity of the sequences.  As a rule of thumb, a
:meth:`~SequenceMatcher.ratio` value over 0.6 means the sequences are close
matches:

   >>> print(round(s.ratio(), 3))
   0.866

If you're only interested in where the sequences match,
:meth:`~SequenceMatcher.get_matching_blocks` is handy:

   >>> for block in s.get_matching_blocks():
   ...     print("a[%d] and b[%d] match for %d elements" % block)
   a[0] and b[0] match for 8 elements
   a[8] and b[17] match for 21 elements
   a[29] and b[38] match for 0 elements

Note that the last tuple returned by
:meth:`~SequenceMatcher.get_matching_blocks` is always a dummy,
``(len(a), len(b), 0)``, and this is the only case in which the last tuple
element (number of elements matched) is ``0``.

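Both properties can be checked directly.  This sketch re-creates the matcher
from the example above and asserts that only the final triple has size zero,
and that every other triple names equal slices of the two strings:

.. code-block:: python

   from difflib import SequenceMatcher

   a = "private Thread currentThread;"
   b = "private volatile Thread currentThread;"
   s = SequenceMatcher(lambda x: x == " ", a, b)
   blocks = s.get_matching_blocks()

   # The last triple is always the dummy (len(a), len(b), 0).
   assert tuple(blocks[-1]) == (len(a), len(b), 0)

   # Every other triple (i, j, n) has n > 0 and a[i:i+n] == b[j:j+n].
   for i, j, n in blocks[:-1]:
       assert n > 0
       assert a[i:i + n] == b[j:j + n]
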
If you want to know how to change the first sequence into the second, use
:meth:`~SequenceMatcher.get_opcodes`:

   >>> for opcode in s.get_opcodes():
   ...     print("%6s a[%d:%d] b[%d:%d]" % opcode)
    equal a[0:8] b[0:8]
   insert a[8:8] b[8:17]
    equal a[8:29] b[17:38]

.. seealso::

   * The :func:`get_close_matches` function in this module, which shows how
     simple code building on :class:`SequenceMatcher` can do useful work.

   * `Simple version control recipe
     <http://code.activestate.com/recipes/576729/>`_ for a small application
     built with :class:`SequenceMatcher`.


.. _differ-objects:

Differ Objects
--------------

Note that :class:`Differ`\ -generated deltas make no claim to be minimal
diffs.  To the contrary, minimal diffs are often counter-intuitive, because they
synchronize wherever possible, sometimes on accidental matches 100 pages apart.
Restricting synchronization points to contiguous matches preserves some notion
of locality, at the occasional cost of producing a longer diff.

The :class:`Differ` class has this constructor:


.. class:: Differ(linejunk=None, charjunk=None)

   Optional keyword parameters *linejunk* and *charjunk* are for filter
   functions (or ``None``):

   *linejunk*: A function that accepts a single string argument, and returns
   true if the string is junk.  The default is ``None``, meaning that no line is
   considered junk.

   *charjunk*: A function that accepts a single character argument (a string of
   length 1), and returns true if the character is junk.  The default is
   ``None``, meaning that no character is considered junk.

   These junk-filtering functions speed up matching to find differences and do
   not cause any differing lines or characters to be ignored.  Read the
   description of the :meth:`~SequenceMatcher.find_longest_match` method's
   *isjunk* parameter for an explanation.

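   As an illustration, passing :func:`IS_LINE_JUNK` as *linejunk* treats blank
   lines (and lines holding only ``#``) as junk for matching purposes, yet an
   inserted blank line is still reported in the delta.  The two short line
   lists below are made-up inputs:

   .. code-block:: python

      from difflib import Differ, IS_LINE_JUNK

      a = ["one\n", "two\n"]
      b = ["one\n", "\n", "two\n"]  # made-up input: an extra blank line

      # The blank line in b is junk to the matcher, but is not hidden
      # from the delta.
      delta = list(Differ(linejunk=IS_LINE_JUNK).compare(a, b))
      print(delta)  # ['  one\n', '+ \n', '  two\n']
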
   :class:`Differ` objects generate deltas through a single method:


   .. method:: Differ.compare(a, b)

      Compare two sequences of lines, and generate the delta (a sequence of
      lines).

      Each sequence must contain individual single-line strings ending with
      newlines.  Such sequences can be obtained from the
      :meth:`~io.IOBase.readlines` method of file-like objects.  The delta
      generated also consists of newline-terminated strings, ready to be
      printed as-is via the :meth:`~io.IOBase.writelines` method of a
      file-like object.

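      A minimal sketch of that round trip, assuming :class:`io.StringIO`
      objects as in-memory stand-ins for two open text files (the contents are
      made up for illustration):

      .. code-block:: python

         import io
         import sys
         from difflib import Differ

         old = io.StringIO("apple\nbanana\ncherry\n")
         new = io.StringIO("apple\nblueberry\ncherry\n")

         # readlines() gives newline-terminated lines, as compare() expects;
         # the delta lines are newline-terminated too, so writelines() can
         # print them unchanged.
         delta = list(Differ().compare(old.readlines(), new.readlines()))
         sys.stdout.writelines(delta)
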

.. _differ-examples:

Differ Example
--------------

This example compares two texts.  First we set up the texts, sequences of
individual single-line strings ending with newlines (such sequences can also be
obtained from the :meth:`~io.IOBase.readlines` method of file-like objects):

   >>> text1 = '''  1. Beautiful is better than ugly.
   ...   2. Explicit is better than implicit.
   ...   3. Simple is better than complex.
   ...   4. Complex is better than complicated.
   ... '''.splitlines(keepends=True)
   >>> len(text1)
   4
   >>> text1[0][-1]
   '\n'
   >>> text2 = '''  1. Beautiful is better than ugly.
   ...   3.   Simple is better than complex.
   ...   4. Complicated is better than complex.
   ...   5. Flat is better than nested.
   ... '''.splitlines(keepends=True)

Next we instantiate a Differ object:

   >>> d = Differ()

Note that when instantiating a :class:`Differ` object we may pass functions to
mark line and character "junk."  See the :class:`Differ` constructor for
details.

Finally, we compare the two:

   >>> result = list(d.compare(text1, text2))

``result`` is a list of strings, so let's pretty-print it:

   >>> from pprint import pprint
   >>> pprint(result)
   ['    1. Beautiful is better than ugly.\n',
    '-   2. Explicit is better than implicit.\n',
    '-   3. Simple is better than complex.\n',
    '+   3.   Simple is better than complex.\n',
    '?     ++\n',
    '-   4. Complex is better than complicated.\n',
    '?            ^                     ---- ^\n',
    '+   4. Complicated is better than complex.\n',
    '?           ++++ ^                      ^\n',
    '+   5. Flat is better than nested.\n']

As a single multi-line string it looks like this:

   >>> import sys
   >>> sys.stdout.writelines(result)
       1. Beautiful is better than ugly.
   -   2. Explicit is better than implicit.
   -   3. Simple is better than complex.
   +   3.   Simple is better than complex.
   ?     ++
   -   4. Complex is better than complicated.
   ?            ^                     ---- ^
   +   4. Complicated is better than complex.
   ?           ++++ ^                      ^
   +   5. Flat is better than nested.

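The delta keeps enough information to recover either input.  As a sketch, the
module's :func:`restore` function can pull each original back out of a
:class:`Differ` delta; here it is applied to a shortened copy of the two texts
above:

.. code-block:: python

   from difflib import Differ, restore

   text1 = ["  1. Beautiful is better than ugly.\n",
            "  2. Explicit is better than implicit.\n"]
   text2 = ["  1. Beautiful is better than ugly.\n",
            "  3.   Simple is better than complex.\n"]

   delta = list(Differ().compare(text1, text2))

   # which=1 recovers the first sequence, which=2 the second;
   # the "? " guide lines are skipped.
   assert list(restore(delta, 1)) == text1
   assert list(restore(delta, 2)) == text2
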

.. _difflib-interface:

A command-line interface to difflib
-----------------------------------

This example shows how to use difflib to create a ``diff``-like utility.
It is also contained in the Python source distribution, as
:file:`Tools/scripts/diff.py`.

.. testcode::

   """ Command line interface to difflib.py providing diffs in four formats:

   * ndiff:    lists every line and highlights intraline changes.
   * context:  highlights clusters of changes in a before/after format.
   * unified:  highlights clusters of changes in an inline format.
   * html:     generates side by side comparison with change highlights.

   """

   import sys, os, time, difflib, optparse

   def main():
       # Configure the option parser
       usage = "usage: %prog [options] fromfile tofile"
       parser = optparse.OptionParser(usage)
       parser.add_option("-c", action="store_true", default=False,
                         help='Produce a context format diff (default)')
       parser.add_option("-u", action="store_true", default=False,
                         help='Produce a unified format diff')
       hlp = 'Produce HTML side by side diff (can use -c and -l in conjunction)'
       parser.add_option("-m", action="store_true", default=False, help=hlp)
       parser.add_option("-n", action="store_true", default=False,
                         help='Produce an ndiff format diff')
       parser.add_option("-l", "--lines", type="int", default=3,
                         help='Set number of context lines (default 3)')
       (options, args) = parser.parse_args()

       if len(args) == 0:
           parser.print_help()
           sys.exit(1)
       if len(args) != 2:
           parser.error("need to specify both a fromfile and tofile")

       n = options.lines
       fromfile, tofile = args  # as specified in the usage string

       # the modification times appear in the context and unified diff headers
       fromdate = time.ctime(os.stat(fromfile).st_mtime)
       todate = time.ctime(os.stat(tofile).st_mtime)
       with open(fromfile) as fromf, open(tofile) as tof:
           fromlines, tolines = list(fromf), list(tof)

       if options.u:
           diff = difflib.unified_diff(fromlines, tolines, fromfile, tofile,
                                       fromdate, todate, n=n)
       elif options.n:
           diff = difflib.ndiff(fromlines, tolines)
       elif options.m:
           diff = difflib.HtmlDiff().make_file(fromlines, tolines, fromfile,
                                               tofile, context=options.c,
                                               numlines=n)
       else:
           diff = difflib.context_diff(fromlines, tolines, fromfile, tofile,
                                       fromdate, todate, n=n)

       # diff is a generator of lines, except for -m, where make_file()
       # returns a single string; writelines() accepts either
       sys.stdout.writelines(diff)

   if __name__ == '__main__':
       main()
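
Outside of a script, the same functions can be called directly on in-memory
lists of lines.  In this minimal sketch the names ``old.txt`` and ``new.txt``
are made-up labels that only appear in the header lines, the date arguments are
omitted, and ``n=1`` asks for one line of context:

.. code-block:: python

   import difflib

   fromlines = ["apple\n", "banana\n", "cherry\n"]
   tolines = ["apple\n", "blueberry\n", "cherry\n"]

   diff = list(difflib.unified_diff(fromlines, tolines,
                                    "old.txt", "new.txt", n=1))
   print("".join(diff), end="")

This prints a ``---``/``+++`` header, a single ``@@ -1,3 +1,3 @@`` hunk, and the
changed line as a ``-``/``+`` pair between two unchanged context lines.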