Blame - Doc/library/re.rst - platform/external/python/cpython3

blob: ac6455a22074d305e21bc1766ecc38f1f933098d

Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1	:mod:`re` --- Regular expression operations
				2	===========================================
				3
				4	.. module:: re
				5	:synopsis: Regular expression operations.
Terry Jan Reedy	fa089b9	2016-06-11 15:02:54 -0400	[diff] [blame]	6
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	7	.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>
				8	.. sectionauthor:: Andrew M. Kuchling <amk@amk.ca>
				9
Terry Jan Reedy	fa089b9	2016-06-11 15:02:54 -0400	[diff] [blame]	10	Source code: :source:`Lib/re.py`
				11
				12	--------------
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	13
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	14	This module provides regular expression matching operations similar to
Georg Brandl	ed2a1db	2009-06-08 07:48:27 +0000	[diff] [blame]	15	those found in Perl.
Antoine Pitrou	fd03645	2008-08-19 17:56:33 +0000	[diff] [blame]	16
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	17	Both patterns and strings to be searched can be Unicode strings (:class:`str`)
				18	as well as 8-bit strings (:class:`bytes`).
				19	However, Unicode strings and 8-bit strings cannot be mixed:
Martin Panter	6245cb3	2016-04-15 02:14:19 +0000	[diff] [blame]	20	that is, you cannot match a Unicode string with a byte pattern or
Georg Brandl	ae2dbe2	2009-03-13 19:04:40 +0000	[diff] [blame]	21	vice-versa; similarly, when asking for a substitution, the replacement
Antoine Pitrou	fd03645	2008-08-19 17:56:33 +0000	[diff] [blame]	22	string must be of the same type as both the pattern and the search string.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	23
				24	Regular expressions use the backslash character (``'\'``) to indicate
				25	special forms or to allow special characters to be used without invoking
				26	their special meaning. This collides with Python's usage of the same
				27	character for the same purpose in string literals; for example, to match
				28	a literal backslash, one might have to write ``'\\\\'`` as the pattern
				29	string, because the regular expression must be ``\\``, and each
				30	backslash must be expressed as ``\\`` inside a regular Python string
Pablo Galindo	e8239b8	2019-01-20 18:57:56 +0000	[diff] [blame]	31	literal. Also, please note that any invalid escape sequences in Python's
				32	usage of the backslash in string literals now generate a :exc:`DeprecationWarning`
				33	and in the future this will become a :exc:`SyntaxError`. This behaviour
				34	will happen even if it is a valid escape sequence for a regular expression.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	35
				36	The solution is to use Python's raw string notation for regular expression
				37	patterns; backslashes are not handled in any special way in a string literal
				38	prefixed with ``'r'``. So ``r"\n"`` is a two-character string containing
				39	``'\'`` and ``'n'``, while ``"\n"`` is a one-character string containing a
Georg Brandl	9afde1c	2007-11-01 20:32:30 +0000	[diff] [blame]	40	newline. Usually patterns will be expressed in Python code using this raw
				41	string notation.
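
As a small illustration of the two paragraphs above, the following sketch compares raw and regular string literals and matches a single literal backslash. The sample string ``text`` is made up for the example.

```python
import re

# A raw string keeps the backslash; a regular literal turns \n into a newline.
raw_len = len(r"\n")       # backslash + 'n'
cooked_len = len("\n")     # one newline character

# The regex for one literal backslash is two characters long. It is
# spelled r"\\" as a raw string or "\\\\" as a regular string.
text = "C:\\temp"          # the seven characters C:\temp
raw_match = re.search(r"\\", text)
plain_match = re.search("\\\\", text)

print(raw_len, cooked_len)                    # 2 1
print(raw_match.span(), plain_match.span())   # (2, 3) (2, 3)
```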
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	42
Christian Heimes	b9eccbf	2007-12-05 20:18:38 +0000	[diff] [blame]	43	It is important to note that most regular expression operations are available as
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	44	module-level functions and methods on
				45	:ref:`compiled regular expressions <re-objects>`. The functions are shortcuts
				46	that don't require you to compile a regex object first, but miss some
Christian Heimes	b9eccbf	2007-12-05 20:18:38 +0000	[diff] [blame]	47	fine-tuning parameters.
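
A minimal sketch of the difference, assuming a made-up sample string: the module-level function and the compiled pattern's method find the same match, while the optional start position used for ``later`` is one of the fine-tuning parameters that only the method accepts.

```python
import re

text = "abc 123 def 456"

# Module-level shortcut: no pattern object has to be created first.
first = re.search(r"\d+", text)

# Compiled pattern object: same search, and the method also accepts a
# start position, which the module-level function does not.
number = re.compile(r"\d+")
same = number.search(text)
later = number.search(text, 7)   # begin scanning at index 7

print(first.group(), same.group(), later.group())   # 123 123 456
```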
				48
Marco Buttu	ed6795e	2017-02-26 16:26:23 +0100	[diff] [blame]	49	.. seealso::
				50
Stéphane Wirtel	19177fb	2018-05-15 20:58:35 +0200	[diff] [blame]	51	The third-party `regex <https://pypi.org/project/regex/>`_ module,
Marco Buttu	ed6795e	2017-02-26 16:26:23 +0100	[diff] [blame]	52	which has an API compatible with the standard library :mod:`re` module,
				53	but offers additional functionality and more thorough Unicode support.
				54
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	55
				56	.. _re-syntax:
				57
				58	Regular Expression Syntax
				59	-------------------------
				60
				61	A regular expression (or RE) specifies a set of strings that matches it; the
				62	functions in this module let you check if a particular string matches a given
				63	regular expression (or if a given regular expression matches a particular
				64	string, which comes down to the same thing).
				65
				66	Regular expressions can be concatenated to form new regular expressions; if A
				67	and B are both regular expressions, then AB is also a regular expression.
				68	In general, if a string p matches A and another string q matches B, the
				69	string pq will match AB. This holds unless A or B contains low-precedence
				70	operations, there are boundary conditions between A and B, or either has numbered
				71	group references. Thus, complex expressions can easily be constructed from simpler
				72	primitive expressions like the ones described here. For details of the theory
Berker Peksag	a0a42d2	2018-03-23 16:46:52 +0300	[diff] [blame]	73	and implementation of regular expressions, consult the Friedl book [Frie09]_,
				74	or almost any textbook about compiler construction.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	75
				76	A brief explanation of the format of regular expressions follows. For further
Christian Heimes	2202f87	2008-02-06 14:31:34 +0000	[diff] [blame]	77	information and a gentler presentation, consult the :ref:`regex-howto`.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	78
				79	Regular expressions can contain both special and ordinary characters. Most
				80	ordinary characters, like ``'A'``, ``'a'``, or ``'0'``, are the simplest regular
				81	expressions; they simply match themselves. You can concatenate ordinary
				82	characters, so ``last`` matches the string ``'last'``. (In the rest of this
				83	section, we'll write RE's in ``this special style``, usually without quotes, and
				84	strings to be matched ``'in single quotes'``.)
				85
				86	Some characters, like ``'|'`` or ``'('``, are special. Special
				87	characters either stand for classes of ordinary characters, or affect
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	88	how the regular expressions around them are interpreted.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	89
Martin Panter	684340e	2016-10-15 01:18:16 +0000	[diff] [blame]	90	Repetition qualifiers (``*``, ``+``, ``?``, ``{m,n}``, etc.) cannot be
				91	directly nested. This avoids ambiguity with the non-greedy modifier suffix
				92	``?``, and with other modifiers in other implementations. To apply a second
				93	repetition to an inner repetition, parentheses may be used. For example,
				94	the expression ``(?:a{6})*`` matches any multiple of six ``'a'`` characters.
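
The following sketch checks the example above. The flag ``nested_ok`` is a hypothetical name used only to record whether the directly nested form compiles.

```python
import re

sixes = re.compile(r"(?:a{6})*")

print(sixes.fullmatch("a" * 12) is not None)   # True: two groups of six
print(sixes.fullmatch("a" * 7) is not None)    # False: seven is not a multiple of six

# Nesting the qualifiers directly is rejected when the pattern is compiled.
try:
    re.compile(r"a{6}*")
    nested_ok = True
except re.error:
    nested_ok = False
print(nested_ok)   # False
```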
				95
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	96
				97	The special characters are:
				98
Serhiy Storchaka	913876d	2018-10-28 13:41:26 +0200	[diff] [blame]	99	.. index:: single: . (dot); in regular expressions
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	100
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	101	``.``
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	102	(Dot.) In the default mode, this matches any character except a newline. If
				103	the :const:`DOTALL` flag has been specified, this matches any character
				104	including a newline.
				105
Serhiy Storchaka	913876d	2018-10-28 13:41:26 +0200	[diff] [blame]	106	.. index:: single: ^ (caret); in regular expressions
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	107
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	108	``^``
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	109	(Caret.) Matches the start of the string, and in :const:`MULTILINE` mode also
				110	matches immediately after each newline.
				111
Serhiy Storchaka	913876d	2018-10-28 13:41:26 +0200	[diff] [blame]	112	.. index:: single: $ (dollar); in regular expressions
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	113
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	114	``$``
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	115	Matches the end of the string or just before the newline at the end of the
				116	string, and in :const:`MULTILINE` mode also matches before a newline. ``foo``
				117	matches both 'foo' and 'foobar', while the regular expression ``foo$`` matches
				118	only 'foo'. More interestingly, searching for ``foo.$`` in ``'foo1\nfoo2\n'``
Christian Heimes	25bb783	2008-01-11 16:17:00 +0000	[diff] [blame]	119	matches 'foo2' normally, but 'foo1' in :const:`MULTILINE` mode; searching for
				120	a single ``$`` in ``'foo\n'`` will find two (empty) matches: one just before
				121	the newline, and one at the end of the string.
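
The cases described for ``$`` can be reproduced with a short sketch (variable names are illustrative only):

```python
import re

text = "foo1\nfoo2\n"

# Default mode: $ matches only at the very end, or just before a final newline.
default_match = re.search(r"foo.$", text)
# MULTILINE mode: $ also matches before every newline.
multiline_match = re.search(r"foo.$", text, re.MULTILINE)

print(default_match.group())     # foo2
print(multiline_match.group())   # foo1

# A bare $ finds two empty matches in 'foo\n': before the newline and at the end.
dollar_positions = [m.start() for m in re.finditer(r"$", "foo\n")]
print(dollar_positions)          # [3, 4]
```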
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	122
Serhiy Storchaka	913876d	2018-10-28 13:41:26 +0200	[diff] [blame]	123	.. index:: single: * (asterisk); in regular expressions
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	124
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	125	``*``
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	126	Causes the resulting RE to match 0 or more repetitions of the preceding RE, as
				127	many repetitions as are possible. ``ab*`` will match 'a', 'ab', or 'a' followed
				128	by any number of 'b's.
				129
Serhiy Storchaka	913876d	2018-10-28 13:41:26 +0200	[diff] [blame]	130	.. index:: single: + (plus); in regular expressions
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	131
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	132	``+``
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	133	Causes the resulting RE to match 1 or more repetitions of the preceding RE.
				134	``ab+`` will match 'a' followed by any non-zero number of 'b's; it will not
				135	match just 'a'.
				136
Serhiy Storchaka	913876d	2018-10-28 13:41:26 +0200	[diff] [blame]	137	.. index:: single: ? (question mark); in regular expressions
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	138
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	139	``?``
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	140	Causes the resulting RE to match 0 or 1 repetitions of the preceding RE.
				141	``ab?`` will match either 'a' or 'ab'.
				142
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	143	.. index::
				144	single: *?; in regular expressions
				145	single: +?; in regular expressions
				146	single: ??; in regular expressions
				147
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	148	``*?``, ``+?``, ``??``
				149	The ``'*'``, ``'+'``, and ``'?'`` qualifiers are all :dfn:`greedy`; they match
				150	as much text as possible. Sometimes this behaviour isn't desired; if the RE
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	151	``<.*>`` is matched against ``'<a> b <c>'``, it will match the entire
				152	string, and not just ``'<a>'``. Adding ``?`` after the qualifier makes it
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	153	perform the match in :dfn:`non-greedy` or :dfn:`minimal` fashion; as few
Georg Brandl	7ff033b	2016-04-12 07:51:41 +0200	[diff] [blame]	154	characters as possible will be matched. Using the RE ``<.*?>`` will match
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	155	only ``'<a>'``.
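
A brief sketch of the greedy and non-greedy behaviour described above, using the same sample string:

```python
import re

text = "<a> b <c>"

greedy = re.match(r"<.*>", text).group()    # consumes as much as possible
lazy = re.match(r"<.*?>", text).group()     # stops at the first '>'
all_tags = re.findall(r"<.*?>", text)       # non-greedy form finds each tag

print(greedy)     # <a> b <c>
print(lazy)       # <a>
print(all_tags)   # ['<a>', '<c>']
```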
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	156
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	157	.. index::
Serhiy Storchaka	913876d	2018-10-28 13:41:26 +0200	[diff] [blame]	158	single: {} (curly brackets); in regular expressions
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	159
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	160	``{m}``
				161	Specifies that exactly m copies of the previous RE should be matched; fewer
				162	matches cause the entire RE not to match. For example, ``a{6}`` will match
				163	exactly six ``'a'`` characters, but not five.
				164
				165	``{m,n}``
				166	Causes the resulting RE to match from m to n repetitions of the preceding
				167	RE, attempting to match as many repetitions as possible. For example,
				168	``a{3,5}`` will match from 3 to 5 ``'a'`` characters. Omitting m specifies a
				169	lower bound of zero, and omitting n specifies an infinite upper bound. As an
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	170	example, ``a{4,}b`` will match ``'aaaab'`` or a thousand ``'a'`` characters
				171	followed by a ``'b'``, but not ``'aaab'``. The comma may not be omitted or the
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	172	modifier would be confused with the previously described form.
				173
				174	``{m,n}?``
				175	Causes the resulting RE to match from m to n repetitions of the preceding
				176	RE, attempting to match as few repetitions as possible. This is the
				177	non-greedy version of the previous qualifier. For example, on the
				178	6-character string ``'aaaaaa'``, ``a{3,5}`` will match 5 ``'a'`` characters,
				179	while ``a{3,5}?`` will only match 3 characters.
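
The bounded qualifiers above can be tried out as follows. This is only a sketch, and ``at_least_four`` is a name chosen for the example.

```python
import re

s = "aaaaaa"
greedy = re.match(r"a{3,5}", s).group()    # as many repetitions as allowed
lazy = re.match(r"a{3,5}?", s).group()     # as few repetitions as allowed
print(greedy, lazy)   # aaaaa aaa

# Omitting n leaves the upper bound open.
at_least_four = re.compile(r"a{4,}b")
print(at_least_four.fullmatch("aaaab") is not None)            # True
print(at_least_four.fullmatch("a" * 1000 + "b") is not None)   # True
print(at_least_four.fullmatch("aaab") is not None)             # False
```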
				180
Serhiy Storchaka	913876d	2018-10-28 13:41:26 +0200	[diff] [blame]	181	.. index:: single: \ (backslash); in regular expressions
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	182
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	183	``\``
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	184	Either escapes special characters (permitting you to match characters like
				185	``'*'``, ``'?'``, and so forth), or signals a special sequence; special
				186	sequences are discussed below.
				187
				188	If you're not using a raw string to express the pattern, remember that Python
				189	also uses the backslash as an escape sequence in string literals; if the escape
				190	sequence isn't recognized by Python's parser, the backslash and subsequent
				191	character are included in the resulting string. However, if Python would
				192	recognize the resulting sequence, the backslash should be repeated twice. This
				193	is complicated and hard to understand, so it's highly recommended that you use
				194	raw strings for all but the simplest expressions.
				195
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	196	.. index::
Serhiy Storchaka	913876d	2018-10-28 13:41:26 +0200	[diff] [blame]	197	single: [] (square brackets); in regular expressions
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	198
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	199	``[]``
Ezio Melotti	81231d9	2011-10-20 19:38:04 +0300	[diff] [blame]	200	Used to indicate a set of characters. In a set:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	201
Ezio Melotti	81231d9	2011-10-20 19:38:04 +0300	[diff] [blame]	202	* Characters can be listed individually, e.g. ``[amk]`` will match ``'a'``,
				203	``'m'``, or ``'k'``.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	204
Serhiy Storchaka	913876d	2018-10-28 13:41:26 +0200	[diff] [blame]	205	.. index:: single: - (minus); in regular expressions
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	206
Ezio Melotti	81231d9	2011-10-20 19:38:04 +0300	[diff] [blame]	207	* Ranges of characters can be indicated by giving two characters and separating
				208	them by a ``'-'``, for example ``[a-z]`` will match any lowercase ASCII letter,
				209	``[0-5][0-9]`` will match all the two-digit numbers from ``00`` to ``59``, and
				210	``[0-9A-Fa-f]`` will match any hexadecimal digit. If ``-`` is escaped (e.g.
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	211	``[a\-z]``) or if it's placed as the first or last character
				212	(e.g. ``[-a]`` or ``[a-]``), it will match a literal ``'-'``.
Ezio Melotti	81231d9	2011-10-20 19:38:04 +0300	[diff] [blame]	213
				214	* Special characters lose their special meaning inside sets. For example,
				215	``[(+*)]`` will match any of the literal characters ``'('``, ``'+'``,
				216	``'*'``, or ``')'``.
				217
Serhiy Storchaka	913876d	2018-10-28 13:41:26 +0200	[diff] [blame]	218	.. index:: single: \ (backslash); in regular expressions
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	219
Ezio Melotti	81231d9	2011-10-20 19:38:04 +0300	[diff] [blame]	220	* Character classes such as ``\w`` or ``\S`` (defined below) are also accepted
				221	inside a set, although the characters they match depend on whether
				222	:const:`ASCII` or :const:`LOCALE` mode is in force.
				223
Serhiy Storchaka	913876d	2018-10-28 13:41:26 +0200	[diff] [blame]	224	.. index:: single: ^ (caret); in regular expressions
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	225
Ezio Melotti	81231d9	2011-10-20 19:38:04 +0300	[diff] [blame]	226	* Characters that are not within a range can be matched by :dfn:`complementing`
				227	the set. If the first character of the set is ``'^'``, all the characters
				228	that are not in the set will be matched. For example, ``[^5]`` will match
				229	any character except ``'5'``, and ``[^^]`` will match any character except
				230	``'^'``. ``^`` has no special meaning if it's not the first character in
				231	the set.
				232
				233	* To match a literal ``']'`` inside a set, precede it with a backslash, or
				234	place it at the beginning of the set. For example, ``[()[\]{}]`` and
				235	``[]()[{}]`` will both match a right bracket, as well as a left bracket, braces, and parentheses.
Mark Summerfield	9e670c2	2008-05-31 13:05:34 +0000	[diff] [blame]	236
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	237	.. .. index:: single: --; in regular expressions
				238	.. .. index:: single: &&; in regular expressions
				239	.. .. index:: single: ~~; in regular expressions
				240	.. .. index:: single: ||; in regular expressions
				241
Serhiy Storchaka	05cb728	2017-11-16 12:38:26 +0200	[diff] [blame]	242	* Support of nested sets and set operations as in `Unicode Technical
				243	Standard #18`_ might be added in the future. This would change the
				244	syntax, so to facilitate this change a :exc:`FutureWarning` will be raised
				245	in ambiguous cases for the time being.
Andrés Delfino	7dfbd49	2018-10-06 16:48:30 -0300	[diff] [blame]	246	That includes sets starting with a literal ``'['`` or containing literal
Serhiy Storchaka	05cb728	2017-11-16 12:38:26 +0200	[diff] [blame]	247	character sequences ``'--'``, ``'&&'``, ``'~~'``, and ``'||'``. To
				248	avoid a warning, escape them with a backslash.
				249
				250	.. _Unicode Technical Standard #18: https://unicode.org/reports/tr18/
				251
				252	.. versionchanged:: 3.7
				253	:exc:`FutureWarning` is raised if a character set contains constructs
				254	that will change semantically in the future.
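
The rules for character sets listed above can be checked with a few one-liners. The sample strings are invented for illustration.

```python
import re

listed = re.findall(r"[amk]", "monkey")            # individual characters
minutes_ok = re.fullmatch(r"[0-5][0-9]", "59")     # range 00..59
minutes_bad = re.fullmatch(r"[0-5][0-9]", "60")    # '6' is outside [0-5]
hyphen = re.findall(r"[-a]", "a-b")                # a leading '-' is literal
specials = re.findall(r"[(+*)]", "(1+2)*3")        # special characters are literal
not_five = re.findall(r"[^5]", "1525")             # complemented set
brackets = re.findall(r"[()[\]{}]", "a]b[")        # ']' escaped with a backslash

print(listed)       # ['m', 'k']
print(hyphen)       # ['a', '-']
print(specials)     # ['(', '+', ')', '*']
print(not_five)     # ['1', '2']
print(brackets)     # [']', '[']
```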
				255
Serhiy Storchaka	913876d	2018-10-28 13:41:26 +0200	[diff] [blame]	256	.. index:: single: | (vertical bar); in regular expressions
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	257
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	258	``|``
				259	``A|B``, where A and B can be arbitrary REs, creates a regular expression that
				260	will match either A or B. An arbitrary number of REs can be separated by the
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	261	``'|'`` in this way. This can be used inside groups (see below) as well. As
				262	the target string is scanned, REs separated by ``'|'`` are tried from left to
				263	right. When one pattern completely matches, that branch is accepted. This means
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	264	that once A matches, B will not be tested further, even if it would
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	265	produce a longer overall match. In other words, the ``'|'`` operator is never
				266	greedy. To match a literal ``'|'``, use ``\|``, or enclose it inside a
				267	character class, as in ``[|]``.
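
A short sketch of the left-to-right behaviour of alternation, and of the two ways to match a literal vertical bar (variable names are illustrative):

```python
import re

# The first branch that matches wins, even if a later one would match more.
first_wins = re.match(r"a|ab", "ab").group()
reordered = re.match(r"ab|a", "ab").group()
print(first_wins, reordered)   # a ab

# A literal vertical bar: escaped, or inside a character class.
escaped = re.findall(r"\|", "a|b|c")
in_class = re.findall(r"[|]", "a|b|c")
print(escaped, in_class)       # ['|', '|'] ['|', '|']
```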
				268
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	269	.. index::
Serhiy Storchaka	913876d	2018-10-28 13:41:26 +0200	[diff] [blame]	270	single: () (parentheses); in regular expressions
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	271
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	272	``(...)``
				273	Matches whatever regular expression is inside the parentheses, and indicates the
				274	start and end of a group; the contents of a group can be retrieved after a match
				275	has been performed, and can be matched later in the string with the ``\number``
				276	special sequence, described below. To match the literals ``'('`` or ``')'``,
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	277	use ``\(`` or ``\)``, or enclose them inside a character class: ``[(]``, ``[)]``.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	278
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	279	.. index:: single: (?; in regular expressions
				280
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	281	``(?...)``
				282	This is an extension notation (a ``'?'`` following a ``'('`` is not meaningful
				283	otherwise). The first character after the ``'?'`` determines what the meaning
				284	and further syntax of the construct is. Extensions usually do not create a new
				285	group; ``(?P<name>...)`` is the only exception to this rule. Following are the
				286	currently supported extensions.
				287
Antoine Pitrou	fd03645	2008-08-19 17:56:33 +0000	[diff] [blame]	288	``(?aiLmsux)``
				289	(One or more letters from the set ``'a'``, ``'i'``, ``'L'``, ``'m'``,
				290	``'s'``, ``'u'``, ``'x'``.) The group matches the empty string; the
Andrew M. Kuchling	1c50e86	2009-06-01 00:11:36 +0000	[diff] [blame]	291	letters set the corresponding flags: :const:`re.A` (ASCII-only matching),
Antoine Pitrou	fd03645	2008-08-19 17:56:33 +0000	[diff] [blame]	292	:const:`re.I` (ignore case), :const:`re.L` (locale dependent),
Georg Brandl	48310cd	2009-01-03 21:18:54 +0000	[diff] [blame]	293	:const:`re.M` (multi-line), :const:`re.S` (dot matches all),
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	294	:const:`re.U` (Unicode matching), and :const:`re.X` (verbose),
				295	for the entire regular expression.
				296	(The flags are described in :ref:`contents-of-module-re`.)
				297	This is useful if you wish to include the flags as part of the
				298	regular expression, instead of passing a flag argument to the
Serhiy Storchaka	bd48d27	2016-09-11 12:50:02 +0300	[diff] [blame]	299	:func:`re.compile` function. Flags should be used first in the
				300	expression string.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	301
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	302	.. index:: single: (?:; in regular expressions
				303
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	304	``(?:...)``
Georg Brandl	3122ce3	2010-10-29 06:17:38 +0000	[diff] [blame]	305	A non-capturing version of regular parentheses. Matches whatever regular
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	306	expression is inside the parentheses, but the substring matched by the group
				307	cannot be retrieved after performing a match or referenced later in the
				308	pattern.
				309
Serhiy Storchaka	3557b05	2017-10-24 23:31:42 +0300	[diff] [blame]	310	``(?aiLmsux-imsx:...)``
				311	(Zero or more letters from the set ``'a'``, ``'i'``, ``'L'``, ``'m'``,
				312	``'s'``, ``'u'``, ``'x'``, optionally followed by ``'-'`` followed by
				313	one or more letters from the ``'i'``, ``'m'``, ``'s'``, ``'x'``.)
				314	The letters set or remove the corresponding flags:
				315	:const:`re.A` (ASCII-only matching), :const:`re.I` (ignore case),
				316	:const:`re.L` (locale dependent), :const:`re.M` (multi-line),
				317	:const:`re.S` (dot matches all), :const:`re.U` (Unicode matching),
				318	and :const:`re.X` (verbose), for the part of the expression.
				319	(The flags are described in :ref:`contents-of-module-re`.)
				320
				321	The letters ``'a'``, ``'L'`` and ``'u'`` are mutually exclusive when used
				322	as inline flags, so they can't be combined or follow ``'-'``. Instead,
				323	when one of them appears in an inline group, it overrides the matching mode
				324	in the enclosing group. In Unicode patterns ``(?a:...)`` switches to
				325	ASCII-only matching, and ``(?u:...)`` switches to Unicode matching
				326	(default). In bytes patterns ``(?L:...)`` switches to locale-dependent
				327	matching, and ``(?a:...)`` switches to ASCII-only matching (default).
				328	This override is only in effect for the narrow inline group, and the
				329	original matching mode is restored outside of the group.
Serhiy Storchaka	be9a4e5	2016-09-10 00:57:55 +0300	[diff] [blame]	330
Zachary Ware	c307672	2016-09-09 15:47:05 -0700	[diff] [blame]	331	.. versionadded:: 3.6
Serhiy Storchaka	be9a4e5	2016-09-10 00:57:55 +0300	[diff] [blame]	332
Serhiy Storchaka	3557b05	2017-10-24 23:31:42 +0300	[diff] [blame]	333	.. versionchanged:: 3.7
				334	The letters ``'a'``, ``'L'`` and ``'u'`` also can be used in a group.
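
The following sketch contrasts a global inline flag with flags scoped to a group. It assumes Python 3.6 or later for the scoped forms, and the pattern names are chosen for the example.

```python
import re

# Global inline flag: applies to the entire expression.
whole = re.match(r"(?i)hello world", "HELLO WORLD")

# Scoped flag: only the group ignores case.
scoped = re.compile(r"(?i:hello) world")
print(scoped.match("HELLO world") is not None)   # True
print(scoped.match("HELLO WORLD") is not None)   # False

# Removing a flag inside a group while it is set for the rest of the pattern.
strict_part = re.compile(r"(?-i:hello) world", re.IGNORECASE)
print(strict_part.match("hello WORLD") is not None)   # True
print(strict_part.match("HELLO world") is not None)   # False
```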
				335
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	336	.. index:: single: (?P<; in regular expressions
				337
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	338	``(?P<name>...)``
				339	Similar to regular parentheses, but the substring matched by the group is
Georg Brandl	3c6780c6	2013-10-06 12:08:14 +0200	[diff] [blame]	340	accessible via the symbolic group name *name*. Group names must be valid
				341	Python identifiers, and each group name must be defined only once within a
				342	regular expression. A symbolic group is also a numbered group, just as if
				343	the group were not named.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	344
Georg Brandl	3c6780c6	2013-10-06 12:08:14 +0200	[diff] [blame]	345	Named groups can be referenced in three contexts. If the pattern is
				346	``(?P<quote>['"]).*?(?P=quote)`` (i.e. matching a string quoted with either
				347	single or double quotes):
				348
				349	+---------------------------------------+----------------------------------+
				350	| Context of reference to group "quote" | Ways to reference it             |
				351	+=======================================+==================================+
				352	| in the same pattern itself            | * ``(?P=quote)`` (as shown)      |
				353	|                                       | * ``\1``                         |
				354	+---------------------------------------+----------------------------------+
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	355	| when processing match object m        | * ``m.group('quote')``           |
Georg Brandl	3c6780c6	2013-10-06 12:08:14 +0200	[diff] [blame]	356	|                                       | * ``m.end('quote')`` (etc.)      |
				357	+---------------------------------------+----------------------------------+
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	358	| in a string passed to the repl        | * ``\g<quote>``                  |
Georg Brandl	3c6780c6	2013-10-06 12:08:14 +0200	[diff] [blame]	359	| argument of ``re.sub()``              | * ``\g<1>``                      |
				360	|                                       | * ``\1``                         |
				361	+---------------------------------------+----------------------------------+
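
The three reference contexts can be exercised together in one sketch. The sample ``text`` and the replacement strings are made up for illustration.

```python
import re

quoted = re.compile(r"""(?P<quote>['"]).*?(?P=quote)""")
text = "say \"hi\" and 'bye'"

# 1. Inside the pattern itself: (?P=quote) above, or \1 as the numbered form.
numbered = re.compile(r"""(?P<quote>['"]).*?\1""")

# 2. On the match object.
m = quoted.search(text)
print(m.group(0), m.group("quote"), m.end("quote"))   # "hi" " 5

# 3. In the repl string passed to sub().
by_name = quoted.sub(r"\g<quote>X\g<quote>", text)
by_number = quoted.sub(r"\g<1>X\1", text)
print(by_name)     # say "X" and 'X'
print(by_number)   # say "X" and 'X'
```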
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	362
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	363	.. index:: single: (?P=; in regular expressions
				364
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	365	``(?P=name)``
Georg Brandl	3c6780c6	2013-10-06 12:08:14 +0200	[diff] [blame]	366	A backreference to a named group; it matches whatever text was matched by the
				367	earlier group named *name*.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	368
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	369	.. index:: single: (?#; in regular expressions
				370
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	371	``(?#...)``
				372	A comment; the contents of the parentheses are simply ignored.
				373
				374	``(?=...)``
				375	Matches if ``...`` matches next, but doesn't consume any of the string. This is
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	376	called a :dfn:`lookahead assertion`. For example, ``Isaac (?=Asimov)`` will match
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	377	``'Isaac '`` only if it's followed by ``'Asimov'``.
				378
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	379	.. index:: single: (?!; in regular expressions
				380
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	381	``(?!...)``
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	382	Matches if ``...`` doesn't match next. This is a :dfn:`negative lookahead assertion`.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	383	For example, ``Isaac (?!Asimov)`` will match ``'Isaac '`` only if it's not
				384	followed by ``'Asimov'``.
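
Both lookahead forms can be compared side by side in a small sketch. The second test string is an assumption added for contrast.

```python
import re

positive = re.compile(r"Isaac (?=Asimov)")
negative = re.compile(r"Isaac (?!Asimov)")

# The lookahead is checked but not consumed, so only 'Isaac ' is returned.
print(repr(positive.search("Isaac Asimov").group()))   # 'Isaac '
print(positive.search("Isaac Newton"))                 # None

print(repr(negative.search("Isaac Newton").group()))   # 'Isaac '
print(negative.search("Isaac Asimov"))                 # None
```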
				385
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	386	.. index:: single: (?<=; in regular expressions
				387
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	388	``(?<=...)``
				389	Matches if the current position in the string is preceded by a match for ``...``
				390	that ends at the current position. This is called a :dfn:`positive lookbehind
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	391	assertion`. ``(?<=abc)def`` will find a match in ``'abcdef'``, since the
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	392	lookbehind will back up 3 characters and check if the contained pattern matches.
				393	The contained pattern must only match strings of some fixed length, meaning that
				394	``abc`` or ``a|b`` are allowed, but ``a*`` and ``a{3,4}`` are not. Note that
Ezio Melotti	0a6b541	2012-04-29 07:34:46 +0300	[diff] [blame]	395	patterns which start with positive lookbehind assertions will not match at the
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	396	beginning of the string being searched; you will most likely want to use the
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	397	:func:`search` function rather than the :func:`match` function:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	398
				399	>>> import re
				400	>>> m = re.search('(?<=abc)def', 'abcdef')
				401	>>> m.group(0)
				402	'def'
				403
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	404	This example looks for a word following a hyphen:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	405
Cheryl Sabella	6677142	2018-02-02 16:16:27 -0500	[diff] [blame]	406	>>> m = re.search(r'(?<=-)\w+', 'spam-egg')
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	407	>>> m.group(0)
				408	'egg'
				409
Georg Brandl	8c16cb9	2016-02-25 20:17:45 +0100	[diff] [blame]	410	.. versionchanged:: 3.5
Serhiy Storchaka	4eea62f	2015-02-21 10:07:35 +0200	[diff] [blame]	411	Added support for group references of fixed length.
				412
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	413	.. index:: single: (?<!; in regular expressions
				414
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	415	``(?<!...)``
				416	Matches if the current position in the string is not preceded by a match for
				417	``...``. This is called a :dfn:`negative lookbehind assertion`. Similar to
				418	positive lookbehind assertions, the contained pattern must only match strings of
				419	some fixed length. Patterns which start with negative lookbehind assertions may
				420	match at the beginning of the string being searched.
				421
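As a small illustration (the sample strings are made up for this sketch), a negative lookbehind succeeds at position 0 because nothing precedes that position, while words that directly follow a hyphen are skipped:

```python
import re

# The assertion holds at the very start of the string.
at_start = re.search(r'(?<!-)\b\w+', 'spam-egg')

# 'egg' is preceded by '-', so only 'spam' and 'ham' are reported.
words = re.findall(r'(?<!-)\b\w+', 'spam-egg ham')
```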
				422	``(?(id/name)yes-pattern|no-pattern)``
orsenthil@gmail.com	476021b	2011-03-12 10:46:25 +0800	[diff] [blame]	423	Will try to match with ``yes-pattern`` if the group with given id or
				424	name exists, and with ``no-pattern`` if it doesn't. ``no-pattern`` is
				425	optional and can be omitted. For example,
				426	``(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)`` is a poor email matching pattern, which
				427	will match with ``'<user@host.com>'`` as well as ``'user@host.com'``, but
Serhiy Storchaka	a4d170d	2013-12-23 18:20:51 +0200	[diff] [blame]	428	not with ``'<user@host.com'`` nor ``'user@host.com>'``.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	429
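The four cases listed for the conditional pattern can be reproduced with a short sketch. It assumes :func:`re.match` is used, so that the pattern is anchored at the start and an unbalanced ``<`` cannot simply be skipped.

```python
import re

# Group 1 records whether an opening '<' was seen; the conditional then
# requires either a closing '>' or the end of the string.
pattern = re.compile(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)')

candidates = ['<user@host.com>', 'user@host.com',
              '<user@host.com', 'user@host.com>']
results = {s: pattern.match(s) is not None for s in candidates}
```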
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	430
				431	The special sequences consist of ``'\'`` and a character from the list below.
Martin Panter	98e9051	2016-06-12 06:17:29 +0000	[diff] [blame]	432	If the ordinary character is not an ASCII digit or an ASCII letter, then the
Serhiy Storchaka	9bd85b8	2016-06-11 19:15:00 +0300	[diff] [blame]	433	resulting RE will match the second character. For example, ``\$`` matches the
				434	character ``'$'``.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	435
Serhiy Storchaka	913876d	2018-10-28 13:41:26 +0200	[diff] [blame]	436	.. index:: single: \ (backslash); in regular expressions
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	437
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	438	``\number``
				439	Matches the contents of the group of the same number. Groups are numbered
				440	starting from 1. For example, ``(.+) \1`` matches ``'the the'`` or ``'55 55'``,
Georg Brandl	2070e83	2013-10-06 12:58:20 +0200	[diff] [blame]	441	but not ``'thethe'`` (note the space after the group). This special sequence
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	442	can only be used to match one of the first 99 groups. If the first digit of
				443	number is 0, or number is 3 octal digits long, it will not be interpreted as
				444	a group match, but as the character with octal value number. Inside the
				445	``'['`` and ``']'`` of a character class, all numeric escapes are treated as
				446	characters.
				447
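A brief sketch of both readings of a numeric escape: as a backreference, and as an octal character code when three octal digits are given. The sample strings are illustrative.

```python
import re

doubled = re.compile(r'(.+) \1')
hits = [s for s in ('the the', '55 55', 'thethe') if doubled.fullmatch(s)]

# Three octal digits select a character, not a group: octal 101 is 'A'.
octal = re.fullmatch(r'\101', 'A')
```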
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	448	.. index:: single: \A; in regular expressions
				449
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	450	``\A``
				451	Matches only at the start of the string.
				452
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	453	.. index:: single: \b; in regular expressions
				454
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	455	``\b``
Mark Summerfield	6c4f617	2008-08-20 07:34:41 +0000	[diff] [blame]	456	Matches the empty string, but only at the beginning or end of a word.
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	457	A word is defined as a sequence of word characters. Note that formally,
Ezio Melotti	5a045b9	2012-02-29 11:48:44 +0200	[diff] [blame]	458	``\b`` is defined as the boundary between a ``\w`` and a ``\W`` character
				459	(or vice versa), or between ``\w`` and the beginning/end of the string.
				460	This means that ``r'\bfoo\b'`` matches ``'foo'``, ``'foo.'``, ``'(foo)'``,
				461	``'bar foo baz'`` but not ``'foobar'`` or ``'foo3'``.
				462
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	463	By default Unicode alphanumerics are the ones used in Unicode patterns, but
				464	this can be changed by using the :const:`ASCII` flag. Word boundaries are
				465	determined by the current locale if the :const:`LOCALE` flag is used.
				466	Inside a character range, ``\b`` represents the backspace character, for
				467	compatibility with Python's string literals.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	468
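The word-boundary examples above can be verified directly; this sketch also shows the backspace meaning of ``\b`` inside a character class.

```python
import re

word = re.compile(r'\bfoo\b')
samples = ['foo', 'foo.', '(foo)', 'bar foo baz', 'foobar', 'foo3']
matched = [s for s in samples if word.search(s)]

# Inside a character class, \b is the backspace character (U+0008).
backspace = re.search(r'[\b]', 'a\x08c')
```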
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	469	.. index:: single: \B; in regular expressions
				470
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	471	``\B``
Ezio Melotti	5a045b9	2012-02-29 11:48:44 +0200	[diff] [blame]	472	Matches the empty string, but only when it is not at the beginning or end
				473	of a word. This means that ``r'py\B'`` matches ``'python'``, ``'py3'``,
				474	``'py2'``, but not ``'py'``, ``'py.'``, or ``'py!'``.
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	475	``\B`` is just the opposite of ``\b``, so word characters in Unicode
				476	patterns are Unicode alphanumerics or the underscore, although this can
				477	be changed by using the :const:`ASCII` flag. Word boundaries are
				478	determined by the current locale if the :const:`LOCALE` flag is used.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	479
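The same kind of check works for ``\B``, using the sample strings from the description:

```python
import re

inside = re.compile(r'py\B')
samples = ['python', 'py3', 'py2', 'py', 'py.', 'py!']
# Only strings where 'py' is followed by another word character match.
matched = [s for s in samples if inside.search(s)]
```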
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	480	.. index:: single: \d; in regular expressions
				481
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	482	``\d``
Antoine Pitrou	fd03645	2008-08-19 17:56:33 +0000	[diff] [blame]	483	For Unicode (str) patterns:
Mark Dickinson	1f26828	2009-07-28 17:22:36 +0000	[diff] [blame]	484	Matches any Unicode decimal digit (that is, any character in
				485	Unicode character category [Nd]). This includes ``[0-9]``, and
				486	also many other digit characters. If the :const:`ASCII` flag is
Serhiy Storchaka	3557b05	2017-10-24 23:31:42 +0300	[diff] [blame]	487	used, only ``[0-9]`` is matched.
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	488
Antoine Pitrou	fd03645	2008-08-19 17:56:33 +0000	[diff] [blame]	489	For 8-bit (bytes) patterns:
Mark Summerfield	6c4f617	2008-08-20 07:34:41 +0000	[diff] [blame]	490	Matches any decimal digit; this is equivalent to ``[0-9]``.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	491
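A minimal sketch of the difference between Unicode, ASCII-restricted and bytes matching for ``\d``. The Arabic-Indic digit is just one convenient example of a non-ASCII character in category Nd.

```python
import re

arabic_three = '\u0663'   # ARABIC-INDIC DIGIT THREE, category Nd

unicode_digit = re.fullmatch(r'\d', arabic_three)
ascii_digit = re.fullmatch(r'\d', arabic_three, re.ASCII)
bytes_digit = re.fullmatch(rb'\d', b'7')
```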
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	492	.. index:: single: \D; in regular expressions
				493
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	494	``\D``
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	495	Matches any character which is not a decimal digit. This is
Mark Summerfield	6c4f617	2008-08-20 07:34:41 +0000	[diff] [blame]	496	the opposite of ``\d``. If the :const:`ASCII` flag is used this
Serhiy Storchaka	3557b05	2017-10-24 23:31:42 +0300	[diff] [blame]	497	becomes the equivalent of ``[^0-9]``.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	498
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	499	.. index:: single: \s; in regular expressions
				500
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	501	``\s``
Antoine Pitrou	fd03645	2008-08-19 17:56:33 +0000	[diff] [blame]	502	For Unicode (str) patterns:
Mark Summerfield	6c4f617	2008-08-20 07:34:41 +0000	[diff] [blame]	503	Matches Unicode whitespace characters (which includes
				504	``[ \t\n\r\f\v]``, and also many other characters, for example the
				505	non-breaking spaces mandated by typography rules in many
				506	languages). If the :const:`ASCII` flag is used, only
Serhiy Storchaka	3557b05	2017-10-24 23:31:42 +0300	[diff] [blame]	507	``[ \t\n\r\f\v]`` is matched.
Mark Summerfield	6c4f617	2008-08-20 07:34:41 +0000	[diff] [blame]	508
Antoine Pitrou	fd03645	2008-08-19 17:56:33 +0000	[diff] [blame]	509	For 8-bit (bytes) patterns:
				510	Matches characters considered whitespace in the ASCII character set;
Mark Summerfield	6c4f617	2008-08-20 07:34:41 +0000	[diff] [blame]	511	this is equivalent to ``[ \t\n\r\f\v]``.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	512
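For ``\s``, the no-break space is a convenient test character. This sketch assumes nothing beyond the behaviour described above.

```python
import re

nbsp = '\u00a0'   # NO-BREAK SPACE

unicode_space = re.fullmatch(r'\s', nbsp)
ascii_space = re.fullmatch(r'\s', nbsp, re.ASCII)
# Bytes patterns only know the ASCII whitespace characters.
bytes_space = re.findall(rb'\s', b'a b\tc\n')
```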
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	513	.. index:: single: \S; in regular expressions
				514
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	515	``\S``
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	516	Matches any character which is not a whitespace character. This is
Mark Summerfield	6c4f617	2008-08-20 07:34:41 +0000	[diff] [blame]	517	the opposite of ``\s``. If the :const:`ASCII` flag is used this
Serhiy Storchaka	3557b05	2017-10-24 23:31:42 +0300	[diff] [blame]	518	becomes the equivalent of ``[^ \t\n\r\f\v]``.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	519
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	520	.. index:: single: \w; in regular expressions
				521
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	522	``\w``
Antoine Pitrou	fd03645	2008-08-19 17:56:33 +0000	[diff] [blame]	523	For Unicode (str) patterns:
Mark Summerfield	6c4f617	2008-08-20 07:34:41 +0000	[diff] [blame]	524	Matches Unicode word characters; this includes most characters
				525	that can be part of a word in any language, as well as numbers and
				526	the underscore. If the :const:`ASCII` flag is used, only
Serhiy Storchaka	3557b05	2017-10-24 23:31:42 +0300	[diff] [blame]	527	``[a-zA-Z0-9_]`` is matched.
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	528
Antoine Pitrou	fd03645	2008-08-19 17:56:33 +0000	[diff] [blame]	529	For 8-bit (bytes) patterns:
				530	Matches characters considered alphanumeric in the ASCII character set;
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	531	this is equivalent to ``[a-zA-Z0-9_]``. If the :const:`LOCALE` flag is
				532	used, matches characters considered alphanumeric in the current locale
				533	and the underscore.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	534
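The following sketch contrasts the three modes of ``\w``. The sample text is illustrative, and the bytes example assumes Latin-1 encoded input with no :const:`LOCALE` flag.

```python
import re

text = 'na\u00efve_1 caf\u00e9'   # 'naïve_1 café'

unicode_words = re.findall(r'\w+', text)
ascii_words = re.findall(r'\w+', text, re.ASCII)
# In a bytes pattern the byte 0xE9 ('é' in Latin-1) is not a word character.
bytes_words = re.findall(rb'\w+', 'caf\u00e9_2'.encode('latin-1'))
```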
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	535	.. index:: single: \W; in regular expressions
				536
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	537	``\W``
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	538	Matches any character which is not a word character. This is
Mark Summerfield	6c4f617	2008-08-20 07:34:41 +0000	[diff] [blame]	539	the opposite of ``\w``. If the :const:`ASCII` flag is used this
Serhiy Storchaka	3557b05	2017-10-24 23:31:42 +0300	[diff] [blame]	540	becomes the equivalent of ``[^a-zA-Z0-9_]``. If the :const:`LOCALE` flag is
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	541	used, matches characters which are neither alphanumeric in the current locale
				542	nor the underscore.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	543
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	544	.. index:: single: \Z; in regular expressions
				545
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	546	``\Z``
				547	Matches only at the end of the string.
				548
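``\A`` and ``\Z`` differ from ``^`` and ``$`` in two situations, sketched below with illustrative strings: a trailing newline, and :const:`MULTILINE` mode.

```python
import re

text = 'foo\n'
dollar = re.search(r'foo$', text)       # '$' also matches before a final newline
strict_end = re.search(r'foo\Z', text)  # '\Z' requires the very end of the string

# With MULTILINE, '^' matches after each newline but '\A' does not.
caret = re.findall(r'^\w+', 'one\ntwo', re.MULTILINE)
strict_start = re.findall(r'\A\w+', 'one\ntwo', re.MULTILINE)
```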
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	549	.. index::
				550	single: \a; in regular expressions
				551	single: \b; in regular expressions
				552	single: \f; in regular expressions
				553	single: \n; in regular expressions
				554	single: \N; in regular expressions
				555	single: \r; in regular expressions
				556	single: \t; in regular expressions
				557	single: \u; in regular expressions
				558	single: \U; in regular expressions
				559	single: \v; in regular expressions
				560	single: \x; in regular expressions
				561	single: \\; in regular expressions
				562
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	563	Most of the standard escapes supported by Python string literals are also
				564	accepted by the regular expression parser::
				565
				566	\a \b \f \n
Serhiy Storchaka	a445feb	2018-02-10 00:08:17 +0200	[diff] [blame]	567	\N \r \t \u
				568	\U \v \x \\
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	569
Ezio Melotti	285e51b	2012-04-29 04:52:30 +0300	[diff] [blame]	570	(Note that ``\b`` is used to represent word boundaries, and means "backspace"
				571	only inside character classes.)
				572
Serhiy Storchaka	a445feb	2018-02-10 00:08:17 +0200	[diff] [blame]	573	``'\u'``, ``'\U'``, and ``'\N'`` escape sequences are only recognized in Unicode
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	574	patterns. In bytes patterns they are errors.
Antoine Pitrou	463badf	2012-06-23 13:29:19 +0200	[diff] [blame]	575
Raymond Hettinger	5768e0c	2011-10-19 14:10:07 -0700	[diff] [blame]	576	Octal escapes are included in a limited form. If the first digit is a 0, or if
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	577	there are three octal digits, it is considered an octal escape. Otherwise, it is
				578	a group reference. As for string literals, octal escapes are always at most
				579	three digits in length.
				580
Antoine Pitrou	463badf	2012-06-23 13:29:19 +0200	[diff] [blame]	581	.. versionchanged:: 3.3
				582	The ``'\u'`` and ``'\U'`` escape sequences have been added.
				583
Serhiy Storchaka	9bd85b8	2016-06-11 19:15:00 +0300	[diff] [blame]	584	.. versionchanged:: 3.6
Martin Panter	98e9051	2016-06-12 06:17:29 +0000	[diff] [blame]	585	Unknown escapes consisting of ``'\'`` and an ASCII letter are now errors.
Serhiy Storchaka	a54aae0	2015-03-24 22:58:14 +0200	[diff] [blame]	586
Serhiy Storchaka	a445feb	2018-02-10 00:08:17 +0200	[diff] [blame]	587	.. versionchanged:: 3.8
				588	The ``'\N{name}'`` escape sequence has been added. As in string literals,
				589	it expands to the named Unicode character (e.g. ``'\N{EM DASH}'``).
Antoine Pitrou	463badf	2012-06-23 13:29:19 +0200	[diff] [blame]	590
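A short sketch of the escape rules above, assuming Python 3.6 or later so that unknown letter escapes are rejected; the helper variables are illustrative.

```python
import re

em_dash = '\u2014'
by_code_point = re.fullmatch(r'\u2014', em_dash)

# '\u' is not a valid escape in a bytes pattern.
try:
    re.compile(rb'\u2014')
    bytes_ok = True
except re.error:
    bytes_ok = False

# An unknown escape made of '\' and an ASCII letter is rejected.
try:
    re.compile(r'\q')
    unknown_ok = True
except re.error:
    unknown_ok = False
```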
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	591
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	592	.. _contents-of-module-re:
				593
				594	Module Contents
				595	---------------
				596
				597	The module defines several functions, constants, and an exception. Some of the
				598	functions are simplified versions of the full featured methods for compiled
				599	regular expressions. Most non-trivial applications use the compiled
				600	form.
				601
Ethan Furman	c88c80b	2016-11-21 08:29:31 -0800	[diff] [blame]	602	.. versionchanged:: 3.6
				603	Flag constants are now instances of :class:`RegexFlag`, which is a subclass of
				604	:class:`enum.IntFlag`.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	605
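Because the flag constants are :class:`enum.IntFlag` members, they combine with ``|`` and remain inspectable. A minimal sketch, assuming Python 3.6 or later:

```python
import enum
import re

combined = re.IGNORECASE | re.MULTILINE
is_flag = isinstance(combined, re.RegexFlag)
is_int_flag = isinstance(re.IGNORECASE, enum.IntFlag)

# The flags of a compiled pattern include the ones that were passed in.
compiled_flags = re.compile('a', combined).flags
```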
Georg Brandl	1824415	2009-09-02 20:34:52 +0000	[diff] [blame]	606	.. function:: compile(pattern, flags=0)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	607
Henk-Jaap Wagenaar	ed94a8b	2017-08-28 06:41:20 +0100	[diff] [blame]	608	Compile a regular expression pattern into a :ref:`regular expression object
				609	<re-objects>`, which can be used for matching using its
Serhiy Storchaka	0b5e61d	2017-10-04 20:09:49 +0300	[diff] [blame]	610	:func:`~Pattern.match`, :func:`~Pattern.search` and other methods, described
Henk-Jaap Wagenaar	ed94a8b	2017-08-28 06:41:20 +0100	[diff] [blame]	611	below.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	612
				613	The expression's behaviour can be modified by specifying a flags value.
				614	Values can be any of the following variables, combined using bitwise OR (the
				615	``|`` operator).
				616
				617	The sequence ::
				618
Gregory P. Smith	4221c74	2009-03-02 05:04:04 +0000	[diff] [blame]	619	prog = re.compile(pattern)
				620	result = prog.match(string)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	621
				622	is equivalent to ::
				623
Gregory P. Smith	4221c74	2009-03-02 05:04:04 +0000	[diff] [blame]	624	result = re.match(pattern, string)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	625
Georg Brandl	f346ac0	2009-07-26 15:03:49 +0000	[diff] [blame]	626	but using :func:`re.compile` and saving the resulting regular expression
				627	object for reuse is more efficient when the expression will be used several
				628	times in a single program.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	629
Gregory P. Smith	4221c74	2009-03-02 05:04:04 +0000	[diff] [blame]	630	.. note::
				631
				632	The compiled versions of the most recent patterns passed to
Serhiy Storchaka	32eddc1	2013-11-23 23:20:30 +0200	[diff] [blame]	633	:func:`re.compile` and the module-level matching functions are cached, so
Gregory P. Smith	4221c74	2009-03-02 05:04:04 +0000	[diff] [blame]	634	programs that use only a few regular expressions at a time needn't worry
				635	about compiling regular expressions.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	636
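The equivalence can be seen in a concrete case. The pattern and text here are placeholders chosen for this sketch.

```python
import re

pattern = r'\d+'
text = '2024 was a leap year'

prog = re.compile(pattern)            # compile once, reuse many times
compiled_result = prog.match(text)
module_result = re.match(pattern, text)
```

Both calls return a match object covering the same span. The compiled form mainly saves the repeated pattern lookup when the expression is used in a loop.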
				637
Antoine Pitrou	fd03645	2008-08-19 17:56:33 +0000	[diff] [blame]	638	.. data:: A
				639	ASCII
				640
Georg Brandl	4049ce0	2009-06-08 07:49:54 +0000	[diff] [blame]	641	Make ``\w``, ``\W``, ``\b``, ``\B``, ``\d``, ``\D``, ``\s`` and ``\S``
				642	perform ASCII-only matching instead of full Unicode matching. This is only
				643	meaningful for Unicode patterns, and is ignored for byte patterns.
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	644	Corresponds to the inline flag ``(?a)``.
Antoine Pitrou	fd03645	2008-08-19 17:56:33 +0000	[diff] [blame]	645
Mark Summerfield	6c4f617	2008-08-20 07:34:41 +0000	[diff] [blame]	646	Note that for backward compatibility, the :const:`re.U` flag still
				647	exists (as well as its synonym :const:`re.UNICODE` and its embedded
Georg Brandl	ebeb44d	2010-07-29 11:15:36 +0000	[diff] [blame]	648	counterpart ``(?u)``), but these are redundant in Python 3 since
Mark Summerfield	6c4f617	2008-08-20 07:34:41 +0000	[diff] [blame]	649	matches are Unicode by default for strings (and Unicode matching
				650	isn't allowed for bytes).
Georg Brandl	48310cd	2009-01-03 21:18:54 +0000	[diff] [blame]	651
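A minimal sketch showing that the flag, its inline form and the redundant :const:`re.UNICODE` flag behave as described; the sample text is illustrative.

```python
import re

text = 'caf\u00e9'   # 'café'

default = re.findall(r'\w+', text)
flagged = re.findall(r'\w+', text, re.ASCII)
inline = re.findall(r'(?a)\w+', text)
# re.UNICODE changes nothing for str patterns in Python 3.
redundant = re.findall(r'\w+', text, re.UNICODE)
```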
Antoine Pitrou	fd03645	2008-08-19 17:56:33 +0000	[diff] [blame]	652
Sandro Tosi	da785fd	2012-01-01 12:55:20 +0100	[diff] [blame]	653	.. data:: DEBUG
				654
				655	Display debug information about compiled expression.
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	656	No corresponding inline flag.
Sandro Tosi	da785fd	2012-01-01 12:55:20 +0100	[diff] [blame]	657
				658
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	659	.. data:: I
				660	IGNORECASE
				661
Brian Ward	c9d6dbc	2017-05-24 00:03:38 -0700	[diff] [blame]	662	Perform case-insensitive matching; expressions like ``[A-Z]`` will also
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	663	match lowercase letters. Full Unicode matching (such as ``Ü`` matching
				664	``ü``) also works unless the :const:`re.ASCII` flag is used to disable
				665	non-ASCII matches. The current locale does not change the effect of this
				666	flag unless the :const:`re.LOCALE` flag is also used.
				667	Corresponds to the inline flag ``(?i)``.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	668
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	669	Note that when the Unicode patterns ``[a-z]`` or ``[A-Z]`` are used in
				670	combination with the :const:`IGNORECASE` flag, they will match the 52 ASCII
				671	letters and 4 additional non-ASCII letters: 'İ' (U+0130, Latin capital
				672	letter I with dot above), 'ı' (U+0131, Latin small letter dotless i),
				673	'ſ' (U+017F, Latin small letter long s) and 'K' (U+212A, Kelvin sign).
				674	If the :const:`ASCII` flag is used, only letters 'a' to 'z'
Serhiy Storchaka	3557b05	2017-10-24 23:31:42 +0300	[diff] [blame]	675	and 'A' to 'Z' are matched.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	676
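The interaction between :const:`IGNORECASE` and :const:`ASCII` can be sketched as follows. It assumes Python 3.7 or later, and the Kelvin sign is used as one of the four non-ASCII letters mentioned above.

```python
import re

# 'Ü' matches 'ü' under full Unicode case-insensitive matching only.
umlaut = re.fullmatch('\u00dc', '\u00fc', re.IGNORECASE)
umlaut_ascii = re.fullmatch('\u00dc', '\u00fc', re.IGNORECASE | re.ASCII)

kelvin = '\u212a'   # KELVIN SIGN, which lowercases to 'k'
kelvin_default = re.fullmatch('[a-z]', kelvin, re.IGNORECASE)
kelvin_ascii = re.fullmatch('[a-z]', kelvin, re.IGNORECASE | re.ASCII)
```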
				677	.. data:: L
				678	LOCALE
				679
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	680	Make ``\w``, ``\W``, ``\b``, ``\B`` and case-insensitive matching
				681	dependent on the current locale. This flag can be used only with bytes
				682	patterns. The use of this flag is discouraged as the locale mechanism
				683	is very unreliable: it only handles one "culture" at a time, and it only
				684	works with 8-bit locales. Unicode matching is already enabled by default
				685	in Python 3 for Unicode (str) patterns, and it is able to handle different
				686	locales/languages.
				687	Corresponds to the inline flag ``(?L)``.
Serhiy Storchaka	22a309a	2014-12-01 11:50:07 +0200	[diff] [blame]	688
Serhiy Storchaka	9bd85b8	2016-06-11 19:15:00 +0300	[diff] [blame]	689	.. versionchanged:: 3.6
				690	:const:`re.LOCALE` can be used only with bytes patterns and is
				691	not compatible with :const:`re.ASCII`.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	692
Serhiy Storchaka	898ff03	2017-05-05 08:53:40 +0300	[diff] [blame]	693	.. versionchanged:: 3.7
				694	Compiled regular expression objects with the :const:`re.LOCALE` flag no
				695	longer depend on the locale at compile time. Only the locale at
				696	matching time affects the result of matching.
				697
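The restrictions on :const:`re.LOCALE` are enforced when the pattern is compiled. In this sketch ``compiles`` is a hypothetical helper, and it assumes Python 3.6 or later, where the invalid combinations raise :exc:`ValueError`.

```python
import re

def compiles(pattern, flags):
    """Return True if re.compile accepts the pattern/flag combination."""
    try:
        re.compile(pattern, flags)
        return True
    except ValueError:
        return False

bytes_locale = compiles(rb'\w+', re.LOCALE)                  # allowed
str_locale = compiles(r'\w+', re.LOCALE)                     # str pattern: rejected
locale_and_ascii = compiles(rb'\w+', re.LOCALE | re.ASCII)   # incompatible flags
```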
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	698
				699	.. data:: M
				700	MULTILINE
				701
				702	When specified, the pattern character ``'^'`` matches at the beginning of the
				703	string and at the beginning of each line (immediately following each newline);
				704	and the pattern character ``'$'`` matches at the end of the string and at the
				705	end of each line (immediately preceding each newline). By default, ``'^'``
				706	matches only at the beginning of the string, and ``'$'`` only at the end of the
				707	string and immediately before the newline (if any) at the end of the string.
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	708	Corresponds to the inline flag ``(?m)``.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	709
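A small sketch of the effect on ``'^'`` and ``'$'``, using an illustrative two-line string that ends in a newline:

```python
import re

text = 'one\ntwo\n'

starts_default = re.findall(r'^\w+', text)
starts_multi = re.findall(r'^\w+', text, re.MULTILINE)

# By default '$' matches only before the final newline (and at the very end).
ends_default = re.findall(r'\w+$', text)
ends_multi = re.findall(r'(?m)\w+$', text)
```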
				710
				711	.. data:: S
				712	DOTALL
				713
				714	Make the ``'.'`` special character match any character at all, including a
				715	newline; without this flag, ``'.'`` will match anything except a newline.
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	716	Corresponds to the inline flag ``(?s)``.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	717
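For example, with an illustrative string containing a newline:

```python
import re

text = 'a\nb'

plain = re.fullmatch(r'a.b', text)              # '.' stops at the newline
dotall = re.fullmatch(r'a.b', text, re.DOTALL)
inline = re.fullmatch(r'(?s)a.b', text)
```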
				718
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	719	.. data:: X
				720	VERBOSE
				721
Serhiy Storchaka	913876d	2018-10-28 13:41:26 +0200	[diff] [blame]	722	.. index:: single: # (hash); in regular expressions
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	723
Zachary Ware	71a0b43	2015-11-11 23:32:14 -0600	[diff] [blame]	724	This flag allows you to write regular expressions that look nicer and are
				725	more readable by allowing you to visually separate logical sections of the
				726	pattern and add comments. Whitespace within the pattern is ignored, except
Serhiy Storchaka	b0b44b4	2017-11-14 17:21:26 +0200	[diff] [blame]	727	when in a character class, or when preceded by an unescaped backslash,
				728	or within tokens like ``*?``, ``(?:`` or ``(?P<...>``.
Zachary Ware	71a0b43	2015-11-11 23:32:14 -0600	[diff] [blame]	729	When a line contains a ``#`` that is not in a character class and is not
				730	preceded by an unescaped backslash, all characters from the leftmost such
				731	``#`` through the end of the line are ignored.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	732
Zachary Ware	71a0b43	2015-11-11 23:32:14 -0600	[diff] [blame]	733	This means that the two following regular expression objects that match a
Christian Heimes	b9eccbf	2007-12-05 20:18:38 +0000	[diff] [blame]	734	decimal number are functionally equal::
Georg Brandl	81ac1ce	2007-08-31 17:17:17 +0000	[diff] [blame]	735
Christian Heimes	b9eccbf	2007-12-05 20:18:38 +0000	[diff] [blame]	736	a = re.compile(r"""\d + # the integral part
				737	\. # the decimal point
				738	\d * # some fractional digits""", re.X)
				739	b = re.compile(r"\d+\.\d*")
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	740
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	741	Corresponds to the inline flag ``(?x)``.
Antoine Pitrou	fd03645	2008-08-19 17:56:33 +0000	[diff] [blame]	742
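The exceptions to whitespace and ``#`` handling can be sketched with a second pair of patterns. The pattern is invented for this illustration and is not taken from the module.

```python
import re

verbose = re.compile(r"""
    \d+      # integral part
    [ ]      # a space inside a character class is kept
    \#       # an escaped hash is a literal hash character
    \ ?      # an escaped space is kept as well (optional here)
    \d+      # trailing number
""", re.VERBOSE)
compact = re.compile(r'\d+ # ?\d+')

samples = ['12 #34', '12 # 34', '12#34']
verbose_hits = [bool(verbose.fullmatch(s)) for s in samples]
compact_hits = [bool(compact.fullmatch(s)) for s in samples]
```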
				743
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	744	.. function:: search(pattern, string, flags=0)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	745
Terry Jan Reedy	0edb5c1	2014-05-30 16:19:59 -0400	[diff] [blame]	746	Scan through string looking for the first location where the regular expression
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	747	pattern produces a match, and return a corresponding :ref:`match object
				748	<match-objects>`. Return ``None`` if no position in the string matches the
				749	pattern; note that this is different from finding a zero-length match at some
				750	point in the string.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	751
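The distinction between ``None`` and a zero-length match is easy to see in a short sketch with illustrative inputs:

```python
import re

empty = re.search(r'x*', 'abc')     # matches the empty string at position 0
missing = re.search(r'x+', 'abc')   # no position matches
later = re.search(r'b+', 'aabbcc')  # first match is found past the start
```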
				752
Georg Brandl	1824415	2009-09-02 20:34:52 +0000	[diff] [blame]	753	.. function:: match(pattern, string, flags=0)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	754
				755	If zero or more characters at the beginning of string match the regular
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	756	expression pattern, return a corresponding :ref:`match object
				757	<match-objects>`. Return ``None`` if the string does not match the pattern;
				758	note that this is different from a zero-length match.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	759
Ezio Melotti	443f000	2012-02-29 13:39:05 +0200	[diff] [blame]	760	Note that even in :const:`MULTILINE` mode, :func:`re.match` will only match
				761	at the beginning of the string and not at the beginning of each line.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	762
Ezio Melotti	443f000	2012-02-29 13:39:05 +0200	[diff] [blame]	763	If you want to locate a match anywhere in string, use :func:`search`
				764	instead (see also :ref:`search-vs-match`).
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	765
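A minimal sketch of the :const:`MULTILINE` caveat, using an illustrative two-line string:

```python
import re

text = 'first\nsecond'

# match() is tied to the start of the string, even in MULTILINE mode.
at_line_start = re.match(r'second', text, re.MULTILINE)
# search() with '^' can find the start of the second line.
searched = re.search(r'^second', text, re.MULTILINE)
```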
				766
Serhiy Storchaka	32eddc1	2013-11-23 23:20:30 +0200	[diff] [blame]	767	.. function:: fullmatch(pattern, string, flags=0)
				768
				769	If the whole string matches the regular expression pattern, return a
				770	corresponding :ref:`match object <match-objects>`. Return ``None`` if the
				771	string does not match the pattern; note that this is different from a
				772	zero-length match.
				773
				774	.. versionadded:: 3.4
				775
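Compared with :func:`match`, the whole string has to be consumed. A short sketch with illustrative inputs:

```python
import re

prefix = re.match(r'\d+', '123abc')      # a prefix is enough for match()
whole = re.fullmatch(r'\d+', '123abc')   # trailing text makes fullmatch() fail
exact = re.fullmatch(r'\d+', '123')
empty = re.fullmatch(r'\d*', '')         # a zero-length match, not None
```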
				776
Georg Brandl	1824415	2009-09-02 20:34:52 +0000	[diff] [blame]	777	.. function:: split(pattern, string, maxsplit=0, flags=0)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	778
				779	Split string by the occurrences of pattern. If capturing parentheses are
				780	used in pattern, then the text of all groups in the pattern is also returned
				781	as part of the resulting list. If maxsplit is nonzero, at most maxsplit
				782	splits occur, and the remainder of the string is returned as the final element
Georg Brandl	9647389	2008-03-06 07:09:43 +0000	[diff] [blame]	783	of the list. ::
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	784
Serhiy Storchaka	c615be5	2017-11-28 22:51:38 +0200	[diff] [blame]	785	>>> re.split(r'\W+', 'Words, words, words.')
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	786	['Words', 'words', 'words', '']
Serhiy Storchaka	c615be5	2017-11-28 22:51:38 +0200	[diff] [blame]	787	>>> re.split(r'(\W+)', 'Words, words, words.')
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	788	['Words', ', ', 'words', ', ', 'words', '.', '']
Serhiy Storchaka	c615be5	2017-11-28 22:51:38 +0200	[diff] [blame]	789	>>> re.split(r'\W+', 'Words, words, words.', 1)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	790	['Words', 'words, words.']
Gregory P. Smith	ccc5ae7	2009-03-02 05:21:55 +0000	[diff] [blame]	791	>>> re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)
				792	['0', '3', '9']
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	793
Christian Heimes	dd15f6c	2008-03-16 00:07:10 +0000	[diff] [blame]	794	If there are capturing groups in the separator and it matches at the start of
				795	the string, the result will start with an empty string. The same holds for
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	796	the end of the string::
Christian Heimes	dd15f6c	2008-03-16 00:07:10 +0000	[diff] [blame]	797
Serhiy Storchaka	c615be5	2017-11-28 22:51:38 +0200	[diff] [blame]	798	>>> re.split(r'(\W+)', '...words, words...')
Christian Heimes	dd15f6c	2008-03-16 00:07:10 +0000	[diff] [blame]	799	['', '...', 'words', ', ', 'words', '...', '']
				800
				801	That way, separator components are always found at the same relative
Raymond Hettinger	5768e0c	2011-10-19 14:10:07 -0700	[diff] [blame]	802	indices within the result list.
Christian Heimes	dd15f6c	2008-03-16 00:07:10 +0000	[diff] [blame]	803
Serhiy Storchaka	fbb490f	2018-01-04 11:06:13 +0200	[diff] [blame]	804	Empty matches for the pattern split the string only when not adjacent
				805	to a previous empty match.
Thomas Wouters	89d996e	2007-09-08 17:39:28 +0000	[diff] [blame]	806
Serhiy Storchaka	70d56fb	2017-12-04 14:29:05 +0200	[diff] [blame]	807	>>> re.split(r'\b', 'Words, words, words.')
				808	['', 'Words', ', ', 'words', ', ', 'words', '.']
Serhiy Storchaka	fbb490f	2018-01-04 11:06:13 +0200	[diff] [blame]	809	>>> re.split(r'\W*', '...words...')
				810	['', '', 'w', 'o', 'r', 'd', 's', '', '']
Serhiy Storchaka	70d56fb	2017-12-04 14:29:05 +0200	[diff] [blame]	811	>>> re.split(r'(\W*)', '...words...')
Serhiy Storchaka	fbb490f	2018-01-04 11:06:13 +0200	[diff] [blame]	812	['', '...', '', '', 'w', '', 'o', '', 'r', '', 'd', '', 's', '...', '', '', '']
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	813
Jeroen Ruigrok van der Werven	b70ccc3	2009-04-27 08:07:12 +0000	[diff] [blame]	814	.. versionchanged:: 3.1
Gregory P. Smith	ccc5ae7	2009-03-02 05:21:55 +0000	[diff] [blame]	815	Added the optional flags argument.
				816
Serhiy Storchaka	70d56fb	2017-12-04 14:29:05 +0200	[diff] [blame]	817	.. versionchanged:: 3.7
				818	Added support of splitting on a pattern that could match an empty string.
				819
Christian Heimes	dd15f6c	2008-03-16 00:07:10 +0000	[diff] [blame]	820
Georg Brandl	1824415	2009-09-02 20:34:52 +0000	[diff] [blame]	821	.. function:: findall(pattern, string, flags=0)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	822
Georg Brandl	9afde1c	2007-11-01 20:32:30 +0000	[diff] [blame]	823	Return all non-overlapping matches of pattern in string, as a list of
Georg Brandl	3dbca81	2008-07-23 16:10:53 +0000	[diff] [blame]	824	strings. The string is scanned left-to-right, and matches are returned in
				825	the order found. If one or more groups are present in the pattern, return a
				826	list of groups; this will be a list of tuples if the pattern has more than
Serhiy Storchaka	70d56fb	2017-12-04 14:29:05 +0200	[diff] [blame]	827	one group. Empty matches are included in the result.
				828
				829	.. versionchanged:: 3.7
				830	Non-empty matches can now start just after a previous empty match.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	831
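The shape of the result depends on the number of groups, as this sketch with an illustrative string shows:

```python
import re

text = 'width=10, height=20'

no_groups = re.findall(r'\w+=\d+', text)       # whole matches
one_group = re.findall(r'(\w+)=\d+', text)     # the single group's text
two_groups = re.findall(r'(\w+)=(\d+)', text)  # one tuple per match

# Empty matches are reported too: one before 'a' and one at the end.
empties = re.findall(r'\d*', 'a')
```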
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	832
Georg Brandl	1824415	2009-09-02 20:34:52 +0000	[diff] [blame]	833	.. function:: finditer(pattern, string, flags=0)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	834
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	835	Return an :term:`iterator` yielding :ref:`match objects <match-objects>` over
				836	all non-overlapping matches for the RE pattern in string. The string
				837	is scanned left-to-right, and matches are returned in the order found. Empty
Serhiy Storchaka	70d56fb	2017-12-04 14:29:05 +0200	[diff] [blame]	838	matches are included in the result.
				839
				840	.. versionchanged:: 3.7
				841	Non-empty matches can now start just after a previous empty match.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	842
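Unlike :func:`findall`, the iterator yields match objects, so positions are available as well as text. A minimal sketch with an illustrative string:

```python
import re

found = [(m.start(), m.end(), m.group(0))
         for m in re.finditer(r'\d+', 'a1b22c333')]
```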
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	843
Georg Brandl	1824415	2009-09-02 20:34:52 +0000	[diff] [blame]	844	.. function:: sub(pattern, repl, string, count=0, flags=0)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	845
				846	Return the string obtained by replacing the leftmost non-overlapping occurrences
				847	of pattern in string by the replacement repl. If the pattern isn't found,
				848	string is returned unchanged. repl can be a string or a function; if it is
				849	a string, any backslash escapes in it are processed. That is, ``\n`` is
Sandro Tosi	6a633bb	2011-08-19 22:54:50 +0200	[diff] [blame]	850	converted to a single newline character, ``\r`` is converted to a carriage return, and
Serhiy Storchaka	a54aae0	2015-03-24 22:58:14 +0200	[diff] [blame]	851	so forth. Unknown escapes such as ``\&`` are left alone. Backreferences, such
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	852	as ``\6``, are replaced with the substring matched by group 6 in the pattern.
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	853	For example::
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	854
				855	>>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
				856	... r'static PyObject*\npy_\1(void)\n{',
				857	... 'def myfunc():')
				858	'static PyObject*\npy_myfunc(void)\n{'
				859
				860	If repl is a function, it is called for every non-overlapping occurrence of
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	861	pattern. The function takes a single :ref:`match object <match-objects>`
				862	argument, and returns the replacement string. For example::
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	863
				864	>>> def dashrepl(matchobj):
				865	...     if matchobj.group(0) == '-': return ' '
				866	...     else: return '-'
				867	>>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
				868	'pro--gram files'
Gregory P. Smith	ccc5ae7	2009-03-02 05:21:55 +0000	[diff] [blame]	869	>>> re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam', flags=re.IGNORECASE)
				870	'Baked Beans & Spam'
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	871
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	872	The pattern may be a string or a :ref:`pattern object <re-objects>`.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	873
				874	The optional argument count is the maximum number of pattern occurrences to be
				875	replaced; count must be a non-negative integer. If omitted or zero, all
				876	occurrences will be replaced. Empty matches for the pattern are replaced only
Serhiy Storchaka	fbb490f	2018-01-04 11:06:13 +0200	[diff] [blame]	877	when not adjacent to a previous empty match, so ``sub('x*', '-', 'abxd')`` returns
				878	``'-a-b--d-'``.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	879
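A short sketch of the count argument and the empty-match rule, using illustrative strings. The second result assumes Python 3.7 or later, where an empty match adjacent to a previous non-empty match is also replaced:

```python
import re

# count=2 limits the work to the two leftmost occurrences.
limited = re.sub(r"-", "+", "a-b-c-d", count=2)
print(limited)   # a+b+c-d

# 'x*' matches the empty string at several positions as well as 'x'.
# Assumes Python 3.7+ behaviour.
dashed = re.sub("x*", "-", "abxd")
print(dashed)    # -a-b--d-
```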
Serhiy Storchaka	ddb961d	2018-10-26 09:00:49 +0300	[diff] [blame]	880	.. index:: single: \g; in regular expressions
				881
Georg Brandl	3c6780c6	2013-10-06 12:08:14 +0200	[diff] [blame]	882	In string-type repl arguments, in addition to the character escapes and
				883	backreferences described above,
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	884	``\g<name>`` will use the substring matched by the group named ``name``, as
				885	defined by the ``(?P<name>...)`` syntax. ``\g<number>`` uses the corresponding
				886	group number; ``\g<2>`` is therefore equivalent to ``\2``, but isn't ambiguous
				887	in a replacement such as ``\g<2>0``. ``\20`` would be interpreted as a
				888	reference to group 20, not a reference to group 2 followed by the literal
				889	character ``'0'``. The backreference ``\g<0>`` substitutes in the entire
				890	substring matched by the RE.
				891
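The three forms of ``\g<...>`` can be sketched as follows; the pattern, group names and input strings are made up for illustration:

```python
import re

date_re = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"

# \g<name> refers to a named group.
by_name = re.sub(date_re, r"\g<day>/\g<month>/\g<year>", "2018-10-26")
print(by_name)   # 26/10/2018

# \g<1>0 is group 1 followed by a literal '0'; r"\10" would mean group 10.
padded = re.sub(r"(\d)", r"\g<1>0", "a1b2")
print(padded)    # a10b20

# \g<0> is the entire match.
wrapped = re.sub(r"\w+", r"<\g<0>>", "hi there")
print(wrapped)   # <hi> <there>
```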
Jeroen Ruigrok van der Werven	b70ccc3	2009-04-27 08:07:12 +0000	[diff] [blame]	892	.. versionchanged:: 3.1
Gregory P. Smith	ccc5ae7	2009-03-02 05:21:55 +0000	[diff] [blame]	893	Added the optional flags argument.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	894
Serhiy Storchaka	7438e4b	2014-10-10 11:06:31 +0300	[diff] [blame]	895	.. versionchanged:: 3.5
				896	Unmatched groups are replaced with an empty string.
				897
Serhiy Storchaka	9bd85b8	2016-06-11 19:15:00 +0300	[diff] [blame]	898	.. versionchanged:: 3.6
Serhiy Storchaka	53c53ea	2016-12-06 19:15:29 +0200	[diff] [blame]	899	Unknown escapes in pattern consisting of ``'\'`` and an ASCII letter
				900	are now errors.
				901
Serhiy Storchaka	ff3dbe9	2016-12-06 19:25:19 +0200	[diff] [blame]	902	.. versionchanged:: 3.7
				903	Unknown escapes in repl consisting of ``'\'`` and an ASCII letter
				904	are now errors.
Serhiy Storchaka	a54aae0	2015-03-24 22:58:14 +0200	[diff] [blame]	905
Serhiy Storchaka	fbb490f	2018-01-04 11:06:13 +0200	[diff] [blame]	906	Empty matches for the pattern are replaced when adjacent to a previous
				907	non-empty match.
				908
Gregory P. Smith	ccc5ae7	2009-03-02 05:21:55 +0000	[diff] [blame]	909
Georg Brandl	1824415	2009-09-02 20:34:52 +0000	[diff] [blame]	910	.. function:: subn(pattern, repl, string, count=0, flags=0)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	911
				912	Perform the same operation as :func:`sub`, but return a tuple ``(new_string,
				913	number_of_subs_made)``.
				914
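For instance, the returned tuple can be unpacked directly. This is a small sketch with an illustrative input string:

```python
import re

# Collapse each run of whitespace to one space and count the replacements.
new_string, number_of_subs_made = re.subn(r"\s+", " ", "a  b \t c")
print(new_string)            # a b c
print(number_of_subs_made)   # 2
```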
Jeroen Ruigrok van der Werven	b70ccc3	2009-04-27 08:07:12 +0000	[diff] [blame]	915	.. versionchanged:: 3.1
Gregory P. Smith	ccc5ae7	2009-03-02 05:21:55 +0000	[diff] [blame]	916	Added the optional flags argument.
				917
Serhiy Storchaka	7438e4b	2014-10-10 11:06:31 +0300	[diff] [blame]	918	.. versionchanged:: 3.5
				919	Unmatched groups are replaced with an empty string.
				920
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	921
Serhiy Storchaka	8fc7bc2	2017-04-13 19:17:36 +0300	[diff] [blame]	922	.. function:: escape(pattern)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	923
Serhiy Storchaka	5908300	2017-04-13 21:06:43 +0300	[diff] [blame]	924	Escape special characters in pattern.
Ezio Melotti	88fdeb4	2011-04-10 12:59:16 +0300	[diff] [blame]	925	This is useful if you want to match an arbitrary literal string that may
Serhiy Storchaka	8fc7bc2	2017-04-13 19:17:36 +0300	[diff] [blame]	926	have regular expression metacharacters in it. For example::
				927
				928	>>> print(re.escape('python.exe'))
				929	python\.exe
				930
				931	>>> legal_chars = string.ascii_lowercase + string.digits + "!#$%&'*+-.^_`|~:"
				932	>>> print('[%s]+' % re.escape(legal_chars))
Serhiy Storchaka	05cb728	2017-11-16 12:38:26 +0200	[diff] [blame]	933	[abcdefghijklmnopqrstuvwxyz0123456789!\#\$%\&'\*\+\-\.\^_`\|\~:]+
Serhiy Storchaka	8fc7bc2	2017-04-13 19:17:36 +0300	[diff] [blame]	934
				935	>>> operators = ['+', '-', '*', '/', '**']
				936	>>> print('|'.join(map(re.escape, sorted(operators, reverse=True))))
Serhiy Storchaka	5908300	2017-04-13 21:06:43 +0300	[diff] [blame]	937	/|\-|\+|\*\*|\*
Ezio Melotti	88fdeb4	2011-04-10 12:59:16 +0300	[diff] [blame]	938
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	939	This function must not be used for the replacement string in :func:`sub`
				940	and :func:`subn`; only backslashes should be escaped. For example::
				941
				942	>>> digits_re = r'\d+'
				943	>>> sample = '/usr/sbin/sendmail - 0 errors, 12 warnings'
				944	>>> print(re.sub(digits_re, digits_re.replace('\\', r'\\'), sample))
				945	/usr/sbin/sendmail - \d+ errors, \d+ warnings
				946
Ezio Melotti	88fdeb4	2011-04-10 12:59:16 +0300	[diff] [blame]	947	.. versionchanged:: 3.3
				948	The ``'_'`` character is no longer escaped.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	949
Serhiy Storchaka	5908300	2017-04-13 21:06:43 +0300	[diff] [blame]	950	.. versionchanged:: 3.7
				951	Only characters that can have special meaning in a regular expression
				952	are escaped.
				953
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	954
R. David Murray	522c32a	2010-07-10 14:23:36 +0000	[diff] [blame]	955	.. function:: purge()
				956
				957	Clear the regular expression cache.
				958
				959
Serhiy Storchaka	ad446d5	2014-11-10 13:49:00 +0200	[diff] [blame]	960	.. exception:: error(msg, pattern=None, pos=None)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	961
				962	Exception raised when a string passed to one of the functions here is not a
				963	valid regular expression (for example, it might contain unmatched parentheses)
				964	or when some other error occurs during compilation or matching. It is never an
Serhiy Storchaka	ad446d5	2014-11-10 13:49:00 +0200	[diff] [blame]	965	error if a string contains no match for a pattern. The error instance has
				966	the following additional attributes:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	967
Serhiy Storchaka	ad446d5	2014-11-10 13:49:00 +0200	[diff] [blame]	968	.. attribute:: msg
				969
				970	The unformatted error message.
				971
				972	.. attribute:: pattern
				973
				974	The regular expression pattern.
				975
				976	.. attribute:: pos
				977
Serhiy Storchaka	12d6b5d	2017-05-27 16:12:48 +0300	[diff] [blame]	978	The index in pattern where compilation failed (may be ``None``).
Serhiy Storchaka	ad446d5	2014-11-10 13:49:00 +0200	[diff] [blame]	979
				980	.. attribute:: lineno
				981
Serhiy Storchaka	12d6b5d	2017-05-27 16:12:48 +0300	[diff] [blame]	982	The line corresponding to pos (may be ``None``).
Serhiy Storchaka	ad446d5	2014-11-10 13:49:00 +0200	[diff] [blame]	983
				984	.. attribute:: colno
				985
Serhiy Storchaka	12d6b5d	2017-05-27 16:12:48 +0300	[diff] [blame]	986	The column corresponding to pos (may be ``None``).
Serhiy Storchaka	ad446d5	2014-11-10 13:49:00 +0200	[diff] [blame]	987
				988	.. versionchanged:: 3.5
				989	Added additional attributes.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	990
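A sketch of inspecting these attributes after a failed compilation. The deliberately invalid pattern is illustrative, and the exact wording of ``msg`` may differ between Python versions:

```python
import re

bad_pattern = r"(unclosed"  # illustrative: the parenthesis is never closed

try:
    re.compile(bad_pattern)
except re.error as exc:
    caught = exc
    print(exc.msg)                 # human-readable description
    print(exc.pattern)             # (unclosed
    print(exc.pos)                 # index in the pattern where compilation failed
    print(exc.lineno, exc.colno)   # line and column corresponding to pos
```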
				991	.. _re-objects:
				992
				993	Regular Expression Objects
				994	--------------------------
				995
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	996	Compiled regular expression objects support the following methods and
Raymond Hettinger	5768e0c	2011-10-19 14:10:07 -0700	[diff] [blame]	997	attributes:
Brian Curtin	027e478	2010-03-26 00:39:56 +0000	[diff] [blame]	998
Serhiy Storchaka	0b5e61d	2017-10-04 20:09:49 +0300	[diff] [blame]	999	.. method:: Pattern.search(string[, pos[, endpos]])
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1000
Berker Peksag	84f387d	2016-06-08 14:56:56 +0300	[diff] [blame]	1001	Scan through string looking for the first location where this regular
				1002	expression produces a match, and return a corresponding :ref:`match object
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1003	<match-objects>`. Return ``None`` if no position in the string matches the
				1004	pattern; note that this is different from finding a zero-length match at some
				1005	point in the string.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1006
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1007	The optional second parameter pos gives an index in the string where the
				1008	search is to start; it defaults to ``0``. This is not completely equivalent to
				1009	slicing the string; the ``'^'`` pattern character matches at the real beginning
				1010	of the string and at positions just after a newline, but not necessarily at the
				1011	index where the search is to start.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1012
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1013	The optional parameter endpos limits how far the string will be searched; it
				1014	will be as if the string is endpos characters long, so only the characters
				1015	from pos to ``endpos - 1`` will be searched for a match. If endpos is less
Raymond Hettinger	5768e0c	2011-10-19 14:10:07 -0700	[diff] [blame]	1016	than pos, no match will be found; otherwise, if rx is a compiled regular
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1017	expression object, ``rx.search(string, 0, 50)`` is equivalent to
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	1018	``rx.search(string[:50], 0)``. ::
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1019
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	1020	>>> pattern = re.compile("d")
				1021	>>> pattern.search("dog") # Match at index 0
				1022	<re.Match object; span=(0, 1), match='d'>
				1023	>>> pattern.search("dog", 1) # No match; search doesn't include the "d"
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1024
				1025
Serhiy Storchaka	0b5e61d	2017-10-04 20:09:49 +0300	[diff] [blame]	1026	.. method:: Pattern.match(string[, pos[, endpos]])
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1027
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1028	If zero or more characters at the beginning of string match this regular
				1029	expression, return a corresponding :ref:`match object <match-objects>`.
				1030	Return ``None`` if the string does not match the pattern; note that this is
				1031	different from a zero-length match.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1032
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1033	The optional pos and endpos parameters have the same meaning as for the
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	1034	:meth:`~Pattern.search` method. ::
Benjamin Peterson	d7c3ed5	2010-06-27 22:32:30 +0000	[diff] [blame]	1035
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	1036	>>> pattern = re.compile("o")
				1037	>>> pattern.match("dog") # No match as "o" is not at the start of "dog".
				1038	>>> pattern.match("dog", 1) # Match as "o" is the 2nd character of "dog".
				1039	<re.Match object; span=(1, 2), match='o'>
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1040
Ezio Melotti	443f000	2012-02-29 13:39:05 +0200	[diff] [blame]	1041	If you want to locate a match anywhere in string, use
Serhiy Storchaka	0b5e61d	2017-10-04 20:09:49 +0300	[diff] [blame]	1042	:meth:`~Pattern.search` instead (see also :ref:`search-vs-match`).
Ezio Melotti	443f000	2012-02-29 13:39:05 +0200	[diff] [blame]	1043
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1044
Serhiy Storchaka	0b5e61d	2017-10-04 20:09:49 +0300	[diff] [blame]	1045	.. method:: Pattern.fullmatch(string[, pos[, endpos]])
Serhiy Storchaka	32eddc1	2013-11-23 23:20:30 +0200	[diff] [blame]	1046
				1047	If the whole string matches this regular expression, return a corresponding
				1048	:ref:`match object <match-objects>`. Return ``None`` if the string does not
				1049	match the pattern; note that this is different from a zero-length match.
				1050
				1051	The optional pos and endpos parameters have the same meaning as for the
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	1052	:meth:`~Pattern.search` method. ::
Serhiy Storchaka	32eddc1	2013-11-23 23:20:30 +0200	[diff] [blame]	1053
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	1054	>>> pattern = re.compile("o[gh]")
				1055	>>> pattern.fullmatch("dog") # No match as "o" is not at the start of "dog".
				1056	>>> pattern.fullmatch("ogre") # No match as not the full string matches.
				1057	>>> pattern.fullmatch("doggie", 1, 3) # Matches within given limits.
				1058	<re.Match object; span=(1, 3), match='og'>
Serhiy Storchaka	32eddc1	2013-11-23 23:20:30 +0200	[diff] [blame]	1059
				1060	.. versionadded:: 3.4
				1061
				1062
Serhiy Storchaka	0b5e61d	2017-10-04 20:09:49 +0300	[diff] [blame]	1063	.. method:: Pattern.split(string, maxsplit=0)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1064
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1065	Identical to the :func:`split` function, using the compiled pattern.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1066
				1067
Serhiy Storchaka	0b5e61d	2017-10-04 20:09:49 +0300	[diff] [blame]	1068	.. method:: Pattern.findall(string[, pos[, endpos]])
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1069
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1070	Similar to the :func:`findall` function, using the compiled pattern, but
				1071	also accepts optional pos and endpos parameters that limit the search
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	1072	region like for :meth:`search`.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1073
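A brief sketch of how pos and endpos narrow the region that is scanned; the input string is illustrative:

```python
import re

words = re.compile(r"\w+")
text = "alpha beta gamma"

from_six = words.findall(text, 6)         # scan from index 6 to the end
print(from_six)    # ['beta', 'gamma']

six_to_ten = words.findall(text, 6, 10)   # scan indices 6 through 9 only
print(six_to_ten)  # ['beta']
```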
				1074
Serhiy Storchaka	0b5e61d	2017-10-04 20:09:49 +0300	[diff] [blame]	1075	.. method:: Pattern.finditer(string[, pos[, endpos]])
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1076
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1077	Similar to the :func:`finditer` function, using the compiled pattern, but
				1078	also accepts optional pos and endpos parameters that limit the search
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	1079	region like for :meth:`search`.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1080
				1081
Serhiy Storchaka	0b5e61d	2017-10-04 20:09:49 +0300	[diff] [blame]	1082	.. method:: Pattern.sub(repl, string, count=0)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1083
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1084	Identical to the :func:`sub` function, using the compiled pattern.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1085
				1086
Serhiy Storchaka	0b5e61d	2017-10-04 20:09:49 +0300	[diff] [blame]	1087	.. method:: Pattern.subn(repl, string, count=0)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1088
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1089	Identical to the :func:`subn` function, using the compiled pattern.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1090
				1091
Serhiy Storchaka	0b5e61d	2017-10-04 20:09:49 +0300	[diff] [blame]	1092	.. attribute:: Pattern.flags
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1093
Georg Brandl	3a19e54	2012-03-17 17:29:27 +0100	[diff] [blame]	1094	The regex matching flags. This is a combination of the flags given to
				1095	:func:`.compile`, any ``(?...)`` inline flags in the pattern, and implicit
				1096	flags such as :data:`UNICODE` if the pattern is a Unicode string.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1097
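The combination can be checked with bitwise tests. A sketch, assuming a str pattern with one inline flag and one flag passed to the compile function:

```python
import re

# re.MULTILINE is passed explicitly, (?i) sets IGNORECASE inline,
# and UNICODE is implied because the pattern is a str.
p = re.compile(r"(?i)spam", re.MULTILINE)

has_multiline = bool(p.flags & re.MULTILINE)
has_ignorecase = bool(p.flags & re.IGNORECASE)
has_unicode = bool(p.flags & re.UNICODE)
has_dotall = bool(p.flags & re.DOTALL)
print(has_multiline, has_ignorecase, has_unicode, has_dotall)
# True True True False
```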
				1098
Serhiy Storchaka	0b5e61d	2017-10-04 20:09:49 +0300	[diff] [blame]	1099	.. attribute:: Pattern.groups
Georg Brandl	af265f4	2008-12-07 15:06:20 +0000	[diff] [blame]	1100
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1101	The number of capturing groups in the pattern.
Georg Brandl	af265f4	2008-12-07 15:06:20 +0000	[diff] [blame]	1102
				1103
Serhiy Storchaka	0b5e61d	2017-10-04 20:09:49 +0300	[diff] [blame]	1104	.. attribute:: Pattern.groupindex
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1105
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1106	A dictionary mapping any symbolic group names defined by ``(?P<id>)`` to group
				1107	numbers. The dictionary is empty if no symbolic groups were used in the
				1108	pattern.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1109
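A small sketch relating groups and groupindex; the pattern and group names are illustrative:

```python
import re

# Two named groups and one unnamed group.
p = re.compile(r"(?P<key>\w+)=(?P<value>\w+)(;)?")

group_count = p.groups
print(group_count)   # 3

# Only the named groups appear in the mapping.
names = dict(p.groupindex)
print(names)         # {'key': 1, 'value': 2}
```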
				1110
Serhiy Storchaka	0b5e61d	2017-10-04 20:09:49 +0300	[diff] [blame]	1111	.. attribute:: Pattern.pattern
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1112
Serhiy Storchaka	0b5e61d	2017-10-04 20:09:49 +0300	[diff] [blame]	1113	The pattern string from which the pattern object was compiled.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1114
				1115
Serhiy Storchaka	fdbd011	2017-04-16 10:16:03 +0300	[diff] [blame]	1116	.. versionchanged:: 3.7
				1117	Added support for :func:`copy.copy` and :func:`copy.deepcopy`. Compiled
				1118	regular expression objects are considered atomic.
				1119
				1120
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1121	.. _match-objects:
				1122
				1123	Match Objects
				1124	-------------
				1125
Ezio Melotti	b87f82f	2012-11-04 06:59:22 +0200	[diff] [blame]	1126	Match objects always have a boolean value of ``True``.
Serhiy Storchaka	0b5e61d	2017-10-04 20:09:49 +0300	[diff] [blame]	1127	Since :meth:`~Pattern.match` and :meth:`~Pattern.search` return ``None``
Ezio Melotti	b87f82f	2012-11-04 06:59:22 +0200	[diff] [blame]	1128	when there is no match, you can test whether there was a match with a simple
				1129	``if`` statement::
				1130
				1131	match = re.search(pattern, string)
				1132	if match:
				1133	    process(match)
				1134
				1135	Match objects support the following methods and attributes:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1136
				1137
Serhiy Storchaka	0b5e61d	2017-10-04 20:09:49 +0300	[diff] [blame]	1138	.. method:: Match.expand(template)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1139
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1140	Return the string obtained by doing backslash substitution on the template
Serhiy Storchaka	0b5e61d	2017-10-04 20:09:49 +0300	[diff] [blame]	1141	string template, as done by the :meth:`~Pattern.sub` method.
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1142	Escapes such as ``\n`` are converted to the appropriate characters,
				1143	and numeric backreferences (``\1``, ``\2``) and named backreferences
				1144	(``\g<1>``, ``\g<name>``) are replaced by the contents of the
				1145	corresponding group.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1146
Serhiy Storchaka	7438e4b	2014-10-10 11:06:31 +0300	[diff] [blame]	1147	.. versionchanged:: 3.5
				1148	Unmatched groups are replaced with an empty string.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1149
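A sketch of ``Match.expand`` with both kinds of backreference; the pattern, group names and template are illustrative:

```python
import re

m = re.match(r"(?P<first>\w+) (?P<last>\w+)", "Isaac Newton")

# \g<last> is a named backreference, \1 a numeric one.
expanded = m.expand(r"\g<last>, \1")
print(expanded)   # Newton, Isaac
```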
Serhiy Storchaka	0b5e61d	2017-10-04 20:09:49 +0300	[diff] [blame]	1150	.. method:: Match.group([group1, ...])
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1151
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1152	Returns one or more subgroups of the match. If there is a single argument, the
				1153	result is a single string; if there are multiple arguments, the result is a
				1154	tuple with one item per argument. Without arguments, group1 defaults to zero
				1155	(the whole match is returned). If a groupN argument is zero, the corresponding
				1156	return value is the entire matching string; if it is in the inclusive range
				1157	[1..99], it is the string matching the corresponding parenthesized group. If a
				1158	group number is negative or larger than the number of groups defined in the
				1159	pattern, an :exc:`IndexError` exception is raised. If a group is contained in a
				1160	part of the pattern that did not match, the corresponding result is ``None``.
				1161	If a group is contained in a part of the pattern that matched multiple times,
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	1162	the last match is returned. ::
Christian Heimes	b9eccbf	2007-12-05 20:18:38 +0000	[diff] [blame]	1163
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1164	>>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
				1165	>>> m.group(0) # The entire match
				1166	'Isaac Newton'
				1167	>>> m.group(1) # The first parenthesized subgroup.
				1168	'Isaac'
				1169	>>> m.group(2) # The second parenthesized subgroup.
				1170	'Newton'
				1171	>>> m.group(1, 2) # Multiple arguments give us a tuple.
				1172	('Isaac', 'Newton')
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1173
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1174	If the regular expression uses the ``(?P<name>...)`` syntax, the groupN
				1175	arguments may also be strings identifying groups by their group name. If a
				1176	string argument is not used as a group name in the pattern, an :exc:`IndexError`
				1177	exception is raised.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1178
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	1179	A moderately complicated example::
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1180
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1181	>>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
				1182	>>> m.group('first_name')
				1183	'Malcolm'
				1184	>>> m.group('last_name')
				1185	'Reynolds'
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1186
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	1187	Named groups can also be referred to by their index::
Christian Heimes	b9eccbf	2007-12-05 20:18:38 +0000	[diff] [blame]	1188
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1189	>>> m.group(1)
				1190	'Malcolm'
				1191	>>> m.group(2)
				1192	'Reynolds'
Christian Heimes	b9eccbf	2007-12-05 20:18:38 +0000	[diff] [blame]	1193
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	1194	If a group matches multiple times, only the last match is accessible::
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	1195
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1196	>>> m = re.match(r"(..)+", "a1b2c3") # Matches 3 times.
				1197	>>> m.group(1) # Returns only the last match.
				1198	'c3'
Brian Curtin	027e478	2010-03-26 00:39:56 +0000	[diff] [blame]	1199
Brian Curtin	48f16f9	2010-04-08 13:55:29 +0000	[diff] [blame]	1200
Serhiy Storchaka	0b5e61d	2017-10-04 20:09:49 +0300	[diff] [blame]	1201	.. method:: Match.__getitem__(g)
Eric V. Smith	605bdae	2016-09-11 08:55:43 -0400	[diff] [blame]	1202
				1203	This is identical to ``m.group(g)``. This allows easier access to
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	1204	an individual group from a match::
Eric V. Smith	605bdae	2016-09-11 08:55:43 -0400	[diff] [blame]	1205
				1206	>>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
				1207	>>> m[0] # The entire match
				1208	'Isaac Newton'
				1209	>>> m[1] # The first parenthesized subgroup.
				1210	'Isaac'
				1211	>>> m[2] # The second parenthesized subgroup.
				1212	'Newton'
				1213
				1214	.. versionadded:: 3.6
				1215
				1216
Serhiy Storchaka	0b5e61d	2017-10-04 20:09:49 +0300	[diff] [blame]	1217	.. method:: Match.groups(default=None)
Brian Curtin	48f16f9	2010-04-08 13:55:29 +0000	[diff] [blame]	1218
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1219	Return a tuple containing all the subgroups of the match, from 1 up to however
				1220	many groups are in the pattern. The default argument is used for groups that
				1221	did not participate in the match; it defaults to ``None``.
Brian Curtin	027e478	2010-03-26 00:39:56 +0000	[diff] [blame]	1222
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	1223	For example::
Brian Curtin	027e478	2010-03-26 00:39:56 +0000	[diff] [blame]	1224
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1225	>>> m = re.match(r"(\d+)\.(\d+)", "24.1632")
				1226	>>> m.groups()
				1227	('24', '1632')
Brian Curtin	027e478	2010-03-26 00:39:56 +0000	[diff] [blame]	1228
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1229	If we make the decimal place and everything after it optional, not all groups
				1230	might participate in the match. These groups will default to ``None`` unless
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	1231	the default argument is given::
Brian Curtin	027e478	2010-03-26 00:39:56 +0000	[diff] [blame]	1232
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1233	>>> m = re.match(r"(\d+)\.?(\d+)?", "24")
				1234	>>> m.groups() # Second group defaults to None.
				1235	('24', None)
				1236	>>> m.groups('0') # Now, the second group defaults to '0'.
				1237	('24', '0')
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1238
				1239
Serhiy Storchaka	0b5e61d	2017-10-04 20:09:49 +0300	[diff] [blame]	1240	.. method:: Match.groupdict(default=None)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1241
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1242	Return a dictionary containing all the named subgroups of the match, keyed by
				1243	the subgroup name. The default argument is used for groups that did not
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	1244	participate in the match; it defaults to ``None``. For example::
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1245
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1246	>>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
				1247	>>> m.groupdict()
				1248	{'first_name': 'Malcolm', 'last_name': 'Reynolds'}
Christian Heimes	b9eccbf	2007-12-05 20:18:38 +0000	[diff] [blame]	1249
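The effect of the default argument can be sketched with an optional named group; the group names here are illustrative:

```python
import re

# The fractional part is optional, so 'frac' may not participate in the match.
m = re.match(r"(?P<int>\d+)(?:\.(?P<frac>\d+))?", "24")

without_default = m.groupdict()
print(without_default)   # {'int': '24', 'frac': None}

with_default = m.groupdict("0")
print(with_default)      # {'int': '24', 'frac': '0'}
```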
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1250
Serhiy Storchaka	0b5e61d	2017-10-04 20:09:49 +0300	[diff] [blame]	1251	.. method:: Match.start([group])
				1252	Match.end([group])
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1253
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1254	Return the indices of the start and end of the substring matched by group;
				1255	group defaults to zero (meaning the whole matched substring). Return ``-1`` if
				1256	group exists but did not contribute to the match. For a match object m, and
				1257	a group g that did contribute to the match, the substring matched by group g
				1258	(equivalent to ``m.group(g)``) is ::
Christian Heimes	b9eccbf	2007-12-05 20:18:38 +0000	[diff] [blame]	1259
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1260	m.string[m.start(g):m.end(g)]
Brian Curtin	027e478	2010-03-26 00:39:56 +0000	[diff] [blame]	1261
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1262	Note that ``m.start(group)`` will equal ``m.end(group)`` if group matched a
				1263	null string. For example, after ``m = re.search('b(c?)', 'cba')``,
				1264	``m.start(0)`` is 1, ``m.end(0)`` is 2, ``m.start(1)`` and ``m.end(1)`` are both
				1265	2, and ``m.start(2)`` raises an :exc:`IndexError` exception.
Brian Curtin	027e478	2010-03-26 00:39:56 +0000	[diff] [blame]	1266
Serhiy Storchaka	cd195e2	2017-10-14 11:14:26 +0300	[diff] [blame]	1267	An example that will remove *remove_this* from email addresses::
Brian Curtin	027e478	2010-03-26 00:39:56 +0000	[diff] [blame]	1268
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1269	>>> email = "tony@tiremove_thisger.net"
				1270	>>> m = re.search("remove_this", email)
				1271	>>> email[:m.start()] + email[m.end():]
				1272	'tony@tiger.net'
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1273
				1274
Serhiy Storchaka	0b5e61d	2017-10-04 20:09:49 +0300	[diff] [blame]	1275	.. method:: Match.span([group])
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1276
Georg Brandl	c62a704	2010-07-29 11:49:05 +0000	[diff] [blame]	1277	For a match m, return the 2-tuple ``(m.start(group), m.end(group))``. Note
				1278	that if group did not contribute to the match, this is ``(-1, -1)``.
				1279	group defaults to zero, the entire match.
Christian Heimes	b9eccbf	2007-12-05 20:18:38 +0000	[diff] [blame]	1280
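A sketch covering the three cases (whole match, empty group, group that did not contribute), using an illustrative pattern and string:

```python
import re

m = re.search(r"b(c?)(d)?", "cba")

whole_span = m.span()      # the entire match, 'b'
empty_span = m.span(1)     # group 1 matched the empty string at index 2
missing_span = m.span(2)   # group 2 did not contribute to the match
print(whole_span, empty_span, missing_span)
# (1, 2) (2, 2) (-1, -1)
```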

.. attribute:: Match.pos

   The value of *pos* which was passed to the :meth:`~Pattern.search` or
   :meth:`~Pattern.match` method of a :ref:`regex object <re-objects>`. This is
   the index into the string at which the RE engine started looking for a match.


.. attribute:: Match.endpos

   The value of *endpos* which was passed to the :meth:`~Pattern.search` or
   :meth:`~Pattern.match` method of a :ref:`regex object <re-objects>`. This is
   the index into the string beyond which the RE engine will not go.


.. attribute:: Match.lastindex

   The integer index of the last matched capturing group, or ``None`` if no group
   was matched at all. For example, the expressions ``(a)b``, ``((a)(b))``, and
   ``((ab))`` will have ``lastindex == 1`` if applied to the string ``'ab'``, while
   the expression ``(a)(b)`` will have ``lastindex == 2``, if applied to the same
   string.


.. attribute:: Match.lastgroup

   The name of the last matched capturing group, or ``None`` if the group didn't
   have a name, or if no group was matched at all.


.. attribute:: Match.re

   The :ref:`regular expression object <re-objects>` whose :meth:`~Pattern.match` or
   :meth:`~Pattern.search` method produced this match instance.


.. attribute:: Match.string

   The string passed to :meth:`~Pattern.match` or :meth:`~Pattern.search`.


.. versionchanged:: 3.7
   Added support of :func:`copy.copy` and :func:`copy.deepcopy`. Match objects
   are considered atomic.

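The positional attributes described above can be seen together in one short
session. The following is a minimal sketch; the pattern, the group names
``word`` and ``number``, and the sample text are made up for illustration.

```python
import re

# Hypothetical pattern with two named alternatives.
pattern = re.compile(r"(?P<word>[a-z]+)|(?P<number>\d+)")
text = "id: 42 apples"

# Start scanning at index 4 and stop before index 6, so only "42" is visible.
m = pattern.search(text, 4, 6)

print(m.span())                  # span of the whole match
print(m.span("word"))            # the "word" group did not contribute
print(m.pos, m.endpos)           # the bounds that were passed to search()
print(m.lastindex, m.lastgroup)  # index and name of the last matched group
```

Here only the second alternative takes part in the match, so ``m.span("word")``
is ``(-1, -1)`` while ``m.lastgroup`` names the group that did match.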

.. _re-examples:

Regular Expression Examples
---------------------------


Checking for a Pair
^^^^^^^^^^^^^^^^^^^

In this example, we'll use the following helper function to display match
objects a little more gracefully::

   def displaymatch(match):
       if match is None:
           return None
       return '<Match: %r, groups=%r>' % (match.group(), match.groups())

Suppose you are writing a poker program where a player's hand is represented as
a 5-character string with each character representing a card, "a" for ace, "k"
for king, "q" for queen, "j" for jack, "t" for 10, and "2" through "9"
representing the card with that value.

To see if a given string is a valid hand, one could do the following::

   >>> valid = re.compile(r"^[a2-9tjqk]{5}$")
   >>> displaymatch(valid.match("akt5q"))  # Valid.
   "<Match: 'akt5q', groups=()>"
   >>> displaymatch(valid.match("akt5e"))  # Invalid.
   >>> displaymatch(valid.match("akt"))    # Invalid.
   >>> displaymatch(valid.match("727ak"))  # Valid.
   "<Match: '727ak', groups=()>"

That last hand, ``"727ak"``, contained a pair, or two of the same valued cards.
To match this with a regular expression, one could use backreferences as such::

   >>> pair = re.compile(r".*(.).*\1")
   >>> displaymatch(pair.match("717ak"))     # Pair of 7s.
   "<Match: '717', groups=('7',)>"
   >>> displaymatch(pair.match("718ak"))     # No pairs.
   >>> displaymatch(pair.match("354aa"))     # Pair of aces.
   "<Match: '354aa', groups=('a',)>"

To find out what card the pair consists of, one could use the
:meth:`~Match.group` method of the match object in the following manner::

   >>> pair = re.compile(r".*(.).*\1")
   >>> pair.match("717ak").group(1)
   '7'

   # Error because pair.match() returns None, which doesn't have a group() method:
   >>> pair.match("718ak").group(1)
   Traceback (most recent call last):
     File "<pyshell#23>", line 1, in <module>
       pair.match("718ak").group(1)
   AttributeError: 'NoneType' object has no attribute 'group'

   >>> pair.match("354aa").group(1)
   'a'


Simulating scanf()
^^^^^^^^^^^^^^^^^^

.. index:: single: scanf()

Python does not currently have an equivalent to :c:func:`scanf`. Regular
expressions are generally more powerful, though also more verbose, than
:c:func:`scanf` format strings. The table below offers some more-or-less
equivalent mappings between :c:func:`scanf` format tokens and regular
expressions.

+--------------------------------+---------------------------------------------+
| :c:func:`scanf` Token          | Regular Expression                          |
+================================+=============================================+
| ``%c``                         | ``.``                                       |
+--------------------------------+---------------------------------------------+
| ``%5c``                        | ``.{5}``                                    |
+--------------------------------+---------------------------------------------+
| ``%d``                         | ``[-+]?\d+``                                |
+--------------------------------+---------------------------------------------+
| ``%e``, ``%E``, ``%f``, ``%g`` | ``[-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?`` |
+--------------------------------+---------------------------------------------+
| ``%i``                         | ``[-+]?(0[xX][\dA-Fa-f]+|0[0-7]*|\d+)``     |
+--------------------------------+---------------------------------------------+
| ``%o``                         | ``[-+]?[0-7]+``                             |
+--------------------------------+---------------------------------------------+
| ``%s``                         | ``\S+``                                     |
+--------------------------------+---------------------------------------------+
| ``%u``                         | ``\d+``                                     |
+--------------------------------+---------------------------------------------+
| ``%x``, ``%X``                 | ``[-+]?(0[xX])?[\dA-Fa-f]+``                |
+--------------------------------+---------------------------------------------+

To extract the filename and numbers from a string like ::

   /usr/sbin/sendmail - 0 errors, 4 warnings

you would use a :c:func:`scanf` format like ::

   %s - %d errors, %d warnings

The equivalent regular expression would be ::

   (\S+) - (\d+) errors, (\d+) warnings

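Applied from Python, the regular expression above might be used as follows.
This is a minimal sketch; the variable names are illustrative, and the
conversion to :class:`int` is done by hand because, unlike ``%d``, a group
always yields a string.

```python
import re

line = "/usr/sbin/sendmail - 0 errors, 4 warnings"
m = re.search(r"(\S+) - (\d+) errors, (\d+) warnings", line)

# Groups are returned as strings, so numeric fields are converted explicitly.
filename = m.group(1)
errors = int(m.group(2))
warnings = int(m.group(3))
print(filename, errors, warnings)
```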

.. _search-vs-match:

search() vs. match()
^^^^^^^^^^^^^^^^^^^^

.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>

Python offers two different primitive operations based on regular expressions:
:func:`re.match` checks for a match only at the beginning of the string, while
:func:`re.search` checks for a match anywhere in the string (this is what Perl
does by default).

For example::

   >>> re.match("c", "abcdef")    # No match
   >>> re.search("c", "abcdef")   # Match
   <re.Match object; span=(2, 3), match='c'>

Regular expressions beginning with ``'^'`` can be used with :func:`search` to
restrict the match to the beginning of the string::

   >>> re.match("c", "abcdef")    # No match
   >>> re.search("^c", "abcdef")  # No match
   >>> re.search("^a", "abcdef")  # Match
   <re.Match object; span=(0, 1), match='a'>

Note however that in :const:`MULTILINE` mode :func:`match` only matches at the
beginning of the string, whereas using :func:`search` with a regular expression
beginning with ``'^'`` will match at the beginning of each line. ::

   >>> re.match('X', 'A\nB\nX', re.MULTILINE)  # No match
   >>> re.search('^X', 'A\nB\nX', re.MULTILINE)  # Match
   <re.Match object; span=(4, 5), match='X'>


Making a Phonebook
^^^^^^^^^^^^^^^^^^

:func:`split` splits a string into a list delimited by the passed pattern. The
method is invaluable for converting textual data into data structures that can be
easily read and modified by Python as demonstrated in the following example that
creates a phonebook.

First, here is the input. Normally it would come from a file; here we use
triple-quoted string syntax:

.. doctest::

   >>> text = """Ross McFluff: 834.345.1254 155 Elm Street
   ...
   ... Ronald Heathmore: 892.345.3428 436 Finley Avenue
   ... Frank Burger: 925.541.7625 662 South Dogwood Way
   ...
   ...
   ... Heather Albrecht: 548.326.4584 919 Park Place"""

The entries are separated by one or more newlines. Now we convert the string
into a list with each nonempty line having its own entry:

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> entries = re.split("\n+", text)
   >>> entries
   ['Ross McFluff: 834.345.1254 155 Elm Street',
   'Ronald Heathmore: 892.345.3428 436 Finley Avenue',
   'Frank Burger: 925.541.7625 662 South Dogwood Way',
   'Heather Albrecht: 548.326.4584 919 Park Place']

Finally, split each entry into a list with first name, last name, telephone
number, and address. We use the ``maxsplit`` parameter of :func:`split`
because the address contains spaces, which are also our splitting pattern:

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> [re.split(":? ", entry, maxsplit=3) for entry in entries]
   [['Ross', 'McFluff', '834.345.1254', '155 Elm Street'],
   ['Ronald', 'Heathmore', '892.345.3428', '436 Finley Avenue'],
   ['Frank', 'Burger', '925.541.7625', '662 South Dogwood Way'],
   ['Heather', 'Albrecht', '548.326.4584', '919 Park Place']]

The ``:?`` pattern matches the colon after the last name, so that it does not
occur in the result list. With a ``maxsplit`` of ``4``, we could separate the
house number from the street name:

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> [re.split(":? ", entry, maxsplit=4) for entry in entries]
   [['Ross', 'McFluff', '834.345.1254', '155', 'Elm Street'],
   ['Ronald', 'Heathmore', '892.345.3428', '436', 'Finley Avenue'],
   ['Frank', 'Burger', '925.541.7625', '662', 'South Dogwood Way'],
   ['Heather', 'Albrecht', '548.326.4584', '919', 'Park Place']]

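To turn the split entries into an actual lookup table, one could collect them
into a dictionary. The following is a minimal sketch; the ``phonebook`` name and
the choice of ``"First Last"`` as the key are assumptions made for this
illustration, not part of the example above.

```python
import re

text = """Ross McFluff: 834.345.1254 155 Elm Street

Ronald Heathmore: 892.345.3428 436 Finley Avenue
Frank Burger: 925.541.7625 662 South Dogwood Way


Heather Albrecht: 548.326.4584 919 Park Place"""

# Hypothetical structure: full name -> (phone number, address).
phonebook = {}
for entry in re.split(r"\n+", text):
    # maxsplit=3 keeps the address, which itself contains spaces, in one piece.
    first, last, phone, address = re.split(":? ", entry, maxsplit=3)
    phonebook[f"{first} {last}"] = (phone, address)

print(phonebook["Frank Burger"])
```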

Text Munging
^^^^^^^^^^^^

:func:`sub` replaces every occurrence of a pattern with a string or the
result of a function. This example demonstrates using :func:`sub` with
a function to "munge" text, or randomize the order of all the characters
in each word of a sentence except for the first and last characters::

   >>> import random
   >>> def repl(m):
   ...     inner_word = list(m.group(2))
   ...     random.shuffle(inner_word)
   ...     return m.group(1) + "".join(inner_word) + m.group(3)
   >>> text = "Professor Abdolmalek, please report your absences promptly."
   >>> re.sub(r"(\w)(\w+)(\w)", repl, text)
   'Poefsrosr Aealmlobdk, pslaee reorpt your abnseces plmrptoy.'
   >>> re.sub(r"(\w)(\w+)(\w)", repl, text)
   'Pofsroser Aodlambelk, plasee reoprt yuor asnebces potlmrpy.'

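Because the shuffle is random, the exact output differs from run to run. What
stays constant is that every word keeps its first and last letter and the same
set of letters. The sketch below checks that property; the fixed seed and the
``unchanged_ends`` name are assumptions added only to make the illustration
repeatable.

```python
import random
import re

def repl(m):
    # Shuffle only the interior letters captured by group 2.
    inner_word = list(m.group(2))
    random.shuffle(inner_word)
    return m.group(1) + "".join(inner_word) + m.group(3)

text = "Professor Abdolmalek, please report your absences promptly."
random.seed(0)  # hypothetical fixed seed, used only for repeatable runs
munged = re.sub(r"(\w)(\w+)(\w)", repl, text)

# Compare the words pairwise: same ends, same letters, possibly new order.
words = re.findall(r"\w+", text)
munged_words = re.findall(r"\w+", munged)
unchanged_ends = all(
    a[0] == b[0] and a[-1] == b[-1] and sorted(a) == sorted(b)
    for a, b in zip(words, munged_words)
)
print(unchanged_ends)
```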

Finding all Adverbs
^^^^^^^^^^^^^^^^^^^

:func:`findall` matches all occurrences of a pattern, not just the first
one as :func:`search` does. For example, if a writer wanted to
find all of the adverbs in some text, they might use :func:`findall` in
the following manner::

   >>> text = "He was carefully disguised but captured quickly by police."
   >>> re.findall(r"\w+ly", text)
   ['carefully', 'quickly']


Finding all Adverbs and their Positions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If one wants more information about all matches of a pattern than the matched
text, :func:`finditer` is useful as it provides :ref:`match objects
<match-objects>` instead of strings. Continuing with the previous example, if
a writer wanted to find all of the adverbs and their positions in
some text, they would use :func:`finditer` in the following manner::

   >>> text = "He was carefully disguised but captured quickly by police."
   >>> for m in re.finditer(r"\w+ly", text):
   ...     print('%02d-%02d: %s' % (m.start(), m.end(), m.group(0)))
   07-16: carefully
   40-47: quickly


Raw String Notation
^^^^^^^^^^^^^^^^^^^

Raw string notation (``r"text"``) keeps regular expressions sane. Without it,
every backslash (``'\'``) in a regular expression would have to be prefixed with
another one to escape it. For example, the two following lines of code are
functionally identical::

   >>> re.match(r"\W(.)\1\W", " ff ")
   <re.Match object; span=(0, 4), match=' ff '>
   >>> re.match("\\W(.)\\1\\W", " ff ")
   <re.Match object; span=(0, 4), match=' ff '>

When one wants to match a literal backslash, it must be escaped in the regular
expression. With raw string notation, this means ``r"\\"``. Without raw string
notation, one must use ``"\\\\"``, making the following lines of code
functionally identical::

   >>> re.match(r"\\", r"\\")
   <re.Match object; span=(0, 1), match='\\'>
   >>> re.match("\\\\", r"\\")
   <re.Match object; span=(0, 1), match='\\'>

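The equivalence can also be checked on the pattern strings themselves, before
the regular expression engine is involved. A minimal sketch, with variable
names chosen only for this illustration:

```python
import re

raw = r"\W(.)\1\W"
escaped = "\\W(.)\\1\\W"

# Both spellings produce the same string object contents.
same_pattern = raw == escaped

# A pattern for one literal backslash is two characters long either way.
raw_len = len(r"\\")
escaped_len = len("\\\\")

# The two-character pattern matches a single backslash in the subject string.
matched = re.match(r"\\", "\\").group()

print(same_pattern, raw_len, escaped_len, len(matched))
```

The raw form only changes how the source text is read by Python; the compiled
pattern is the same.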

Writing a Tokenizer
^^^^^^^^^^^^^^^^^^^

A `tokenizer or scanner <https://en.wikipedia.org/wiki/Lexical_analysis>`_
analyzes a string to categorize groups of characters. This is a useful first
step in writing a compiler or interpreter.

The text categories are specified with regular expressions. The technique is
to combine those into a single master regular expression and to loop over
successive matches::

    import collections
    import re

    Token = collections.namedtuple('Token', ['type', 'value', 'line', 'column'])

    def tokenize(code):
        keywords = {'IF', 'THEN', 'ENDIF', 'FOR', 'NEXT', 'GOSUB', 'RETURN'}
        token_specification = [
            ('NUMBER',   r'\d+(\.\d*)?'),  # Integer or decimal number
            ('ASSIGN',   r':='),           # Assignment operator
            ('END',      r';'),            # Statement terminator
            ('ID',       r'[A-Za-z]+'),    # Identifiers
            ('OP',       r'[+\-*/]'),      # Arithmetic operators
            ('NEWLINE',  r'\n'),           # Line endings
            ('SKIP',     r'[ \t]+'),       # Skip over spaces and tabs
            ('MISMATCH', r'.'),            # Any other character
        ]
        tok_regex = '|'.join('(?P<%s>%s)' % pair for pair in token_specification)
        line_num = 1
        line_start = 0
        for mo in re.finditer(tok_regex, code):
            kind = mo.lastgroup
            value = mo.group()
            column = mo.start() - line_start
            if kind == 'NUMBER':
                value = float(value) if '.' in value else int(value)
            elif kind == 'ID' and value in keywords:
                kind = value
            elif kind == 'NEWLINE':
                line_start = mo.end()
                line_num += 1
                continue
            elif kind == 'SKIP':
                continue
            elif kind == 'MISMATCH':
                raise RuntimeError(f'{value!r} unexpected on line {line_num}')
            yield Token(kind, value, line_num, column)

    statements = '''
        IF quantity THEN
            total := total + price * quantity;
            tax := price * 0.05;
        ENDIF;
    '''

    for token in tokenize(statements):
        print(token)

The tokenizer produces the following output::

    Token(type='IF', value='IF', line=2, column=4)
    Token(type='ID', value='quantity', line=2, column=7)
    Token(type='THEN', value='THEN', line=2, column=16)
    Token(type='ID', value='total', line=3, column=8)
    Token(type='ASSIGN', value=':=', line=3, column=14)
    Token(type='ID', value='total', line=3, column=17)
    Token(type='OP', value='+', line=3, column=23)
    Token(type='ID', value='price', line=3, column=25)
    Token(type='OP', value='*', line=3, column=31)
    Token(type='ID', value='quantity', line=3, column=33)
    Token(type='END', value=';', line=3, column=41)
    Token(type='ID', value='tax', line=4, column=8)
    Token(type='ASSIGN', value=':=', line=4, column=12)
    Token(type='ID', value='price', line=4, column=15)
    Token(type='OP', value='*', line=4, column=21)
    Token(type='NUMBER', value=0.05, line=4, column=23)
    Token(type='END', value=';', line=4, column=27)
    Token(type='ENDIF', value='ENDIF', line=5, column=4)
    Token(type='END', value=';', line=5, column=9)


.. [Frie09] Friedl, Jeffrey. Mastering Regular Expressions. 3rd ed., O'Reilly
   Media, 2009. The third edition of the book no longer covers Python at all,
   but the first edition covered writing good regular expression patterns in
   great detail.