Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	1	\section{Built-in Module \sectcode{re}}
				2	\label{module-re}
				3
				4	\bimodindex{re}
				5
				6	% XXX Remove before 1.5final release.
				7	{\large\bf The \code{re} module is still in the process of being
				8	developed, and more features will be added in future 1.5 alphas and
				9	betas. This documentation is also preliminary and incomplete. If you
				10	find a bug or documentation error, or just find something unclear,
				11	please send a message to
				12	\code{string-sig@python.org}, and we'll fix it.}
				13
				14	This module provides regular expression matching operations similar to
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	15	those found in Perl. It's 8-bit clean: both patterns and strings may
				16	contain null bytes and characters whose high bit is set. It is always
				17	available.
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	18
				19	Regular expressions use the backslash character (\code{\e}) to
				20	indicate special forms or to allow special characters to be used
				21	without invoking their special meaning. This collides with Python's
				22	usage of the same character for the same purpose in string literals;
				23	for example, to match a literal backslash, one might have to write
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	24	\code{\e\e\e\e} as the pattern string, because the regular expression
				25	must be \code{\e\e}, and each backslash must be expressed as
				26	\code{\e\e} inside a regular Python string literal.
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	27
				28	The solution is to use Python's raw string notation for regular
				29	expression patterns; backslashes are not handled in any special way in
a string literal prefixed with 'r'. So \code{r"\e n"} is a
two-character string containing a backslash and the letter 'n', while
				32	\code{"\e n"} is a one-character string containing a newline. Usually
				33	patterns will be expressed in Python code using this raw string notation.
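
The difference between the two notations can be sketched as follows.
This is only an illustration, assuming a current Python interpreter; the
variable names \code{raw}, \code{cooked} and \code{m} are not part of the module.

```python
import re

# A raw string keeps the backslash: two characters, '\' and 'n'.
raw = r"\n"
# A regular string literal turns the escape into one newline character.
cooked = "\n"
print(len(raw), len(cooked))

# To match one literal backslash the RE itself must be two backslashes,
# written with four in a regular literal or with two in a raw literal.
same_pattern = ("\\\\" == r"\\")

# The subject string 'a\b' holds a single backslash at index 1.
m = re.search(r"\\", "a\\b")
print(m.start())
```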
				34
				35	% XXX Can the following section be dropped, or should it be boiled down?
				36
				37	%\strong{Please note:} There is a little-known fact about Python string
				38	%literals which means that you don't usually have to worry about
				39	%doubling backslashes, even though they are used to escape special
				40	%characters in string literals as well as in regular expressions. This
				41	%is because Python doesn't remove backslashes from string literals if
				42	%they are followed by an unrecognized escape character.
				43	%\emph{However}, if you want to include a literal \dfn{backslash} in a
				44	%regular expression represented as a string literal, you have to
				45	%\emph{quadruple} it or enclose it in a singleton character class.
				46	%E.g.\ to extract \LaTeX\ \code{\e section\{{\rm
				47	%\ldots}\}} headers from a document, you can use this pattern:
				48	%\code{'[\e ] section\{\e (.*\e )\}'}. \emph{Another exception:}
				49	%the escape sequence \code{\e b} is significant in string literals
				50	%(where it means the ASCII bell character) as well as in Emacs regular
				51	%expressions (where it stands for a word boundary), so in order to
				52	%search for a word boundary, you should use the pattern \code{'\e \e b'}.
				53	%Similarly, a backslash followed by a digit 0-7 should be doubled to
				54	%avoid interpretation as an octal escape.
				55
				56	\subsection{Regular Expressions}
				57
				58	A regular expression (or RE) specifies a set of strings that matches
				59	it; the functions in this module let you check if a particular string
				60	matches a given regular expression (or if a given regular expression
				61	matches a particular string, which comes down to the same thing).
				62
				63	Regular expressions can be concatenated to form new regular
				64	expressions; if \emph{A} and \emph{B} are both regular expressions,
then \emph{AB} is also a regular expression. If a string \emph{p}
				66	matches A and another string \emph{q} matches B, the string \emph{pq}
				67	will match AB. Thus, complex expressions can easily be constructed
				68	from simpler primitive expressions like the ones described here. For
				69	details of the theory and implementation of regular expressions,
				70	consult the Friedl book referenced below, or almost any textbook about
				71	compiler construction.
				72
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	73	A brief explanation of the format of regular expressions follows.
				74	%For further information and a gentler presentation, consult XXX somewhere.
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	75
				76	Regular expressions can contain both special and ordinary characters.
				77	Most ordinary characters, like '\code{A}', '\code{a}', or '\code{0}',
				78	are the simplest regular expressions; they simply match themselves.
				79	You can concatenate ordinary characters, so '\code{last}' matches the
characters 'last'. (In the rest of this section, we'll write REs in
				81	\code{this special font}, usually without quotes, and strings to be
				82	matched 'in single quotes'.)
				83
				84	Some characters, like \code{\|} or \code{(}, are special. Special
				85	characters either stand for classes of ordinary characters, or affect
				86	how the regular expressions around them are interpreted.
				87
				88	The special characters are:
				89	\begin{itemize}
				90	\item[\code{.}] (Dot.) In the default mode, this matches any
				91	character except a newline. If the \code{DOTALL} flag has been
				92	specified, this matches any character including a newline.
				93	\item[\code{\^}] (Caret.) Matches the start of the string, and in
				94	\code{MULTILINE} mode also immediately after each newline.
				95	\item[\code{\$}] Matches the end of the string.
				96	\code{foo} matches both 'foo' and 'foobar', while the regular
				97	expression '\code{foo\$}' matches only 'foo'.
				98	%
				99	\item[\code{*}] Causes the resulting RE to
				100	match 0 or more repetitions of the preceding RE, as many repetitions
				101	as are possible. \code{ab*} will
				102	match 'a', 'ab', or 'a' followed by any number of 'b's.
				103	%
				104	\item[\code{+}] Causes the
				105	resulting RE to match 1 or more repetitions of the preceding RE.
				106	\code{ab+} will match 'a' followed by any non-zero number of 'b's; it
				107	will not match just 'a'.
				108	%
				109	\item[\code{?}] Causes the resulting RE to
				110	match 0 or 1 repetitions of the preceding RE. \code{ab?} will
				111	match either 'a' or 'ab'.
\item[\code{*?}, \code{+?}, \code{??}] The \code{*}, \code{+}, and
\code{?} qualifiers are all \dfn{greedy}; they match as much text as
				114	possible. Sometimes this behaviour isn't desired; if the RE
				115	\code{<.*>} is matched against \code{<H1>title</H1>}, it will match the
				116	entire string, and not just \code{<H1>}.
				117	Adding \code{?} after the qualifier makes it perform the match in
				118	\dfn{non-greedy} or \dfn{minimal} fashion; as few characters as
				119	possible will be matched. Using \code{.*?} in the previous
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	120	expression will match only \code{<H1>}.
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	121	%
				122	\item[\code{\e}] Either escapes special characters (permitting you to match
				123	characters like '*?+\&\$'), or signals a special sequence; special
				124	sequences are discussed below.
				125
				126	If you're not using a raw string to
				127	express the pattern, remember that Python also uses the
				128	backslash as an escape sequence in string literals; if the escape
				129	sequence isn't recognized by Python's parser, the backslash and
subsequent character are included in the resulting string. However,
if Python would recognize the resulting sequence, the backslash should
be doubled. This is complicated and hard to understand, so
				133	it's highly recommended that you use raw strings.
				134	%
				135	\item[\code{[]}] Used to indicate a set of characters. Characters can
				136	be listed individually, or a range is indicated by giving two
				137	characters and separating them by a '-'. Special characters are not
				138	active inside sets. For example, \code{[akm\$]} will match any of the
				139	characters 'a', 'k', 'm', or '\$'; \code{[a-z]} will match any
				140	lowercase letter and \code{[a-zA-Z0-9]} matches any letter or digit.
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	141	Character classes of the form \code{\e \var{X}} defined below are also acceptable.
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	142	If you want to include a \code{]} or a \code{-} inside a
				143	set, precede it with a backslash.
				144
				145	Characters \emph{not} within a range can be matched by including a
				146	\code{\^} as the first character of the set; \code{\^} elsewhere will
				147	simply match the '\code{\^}' character.
				148	%
				149	\item[\code{\|}]\code{A\|B}, where A and B can be arbitrary REs,
				150	creates a regular expression that will match either A or B. This can
				151	be used inside groups (see below) as well. To match a literal '\|',
				152	use \code{\e\|}, or enclose it inside a character class, like \code{[\|]}.
				153	%
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	154	\item[\code{(...)}] Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	155	contents of a group can be retrieved after a match has been performed,
				156	and can be matched later in the string with the
				157	\code{\e \var{number}} special sequence, described below. To match the
				158	literals '(' or ')',
				159	use \code{\e(} or \code{\e)}, or enclose them inside a character
				160	class: \code{[(] [)]}.
				161	%
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	162	\item[\code{(?...)}] This is an extension notation (a '?' following a
				163	'(' is not meaningful otherwise). The first character after the '?'
				164	determines what the meaning and further syntax of the construct is.
				165	Following are the currently supported extensions.
				166	%
Guido van Rossum	bd49ac4	1997-12-10 23:05:53 +0000	[diff] [blame^]	167	\item[\code{(?iLmsx)}] (One or more letters from the set 'i', 'L', 'm', 's',
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	168	'x'.) The group matches the empty string; the letters set the
				169	corresponding flags (re.I, re.L, re.M, re.S, re.X) for the entire regular
Guido van Rossum	bd49ac4	1997-12-10 23:05:53 +0000	[diff] [blame^]	170	expression. (The flag 'L' is uppercase because it is not in standard Perl.)
This is useful if you wish to include the flags as part of the regular
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	172	expression, instead of passing a \var{flag} argument to the \code{compile} function.
				173	%
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	174	\item[\code{(?:...)}] A non-grouping version of regular parentheses.
				175	Matches whatever's inside the parentheses, but the text matched by the
				176	group \emph{cannot} be retrieved after performing a match or
				177	referenced later in the pattern.
				178	%
				179	\item[\code{(?P<\var{name}>...)}] Similar to regular parentheses, but
				180	the text matched by the group is accessible via the symbolic group
				181	name \var{name}. Group names must be valid Python identifiers. A
				182	symbolic group is also a numbered group, just as if the group were not
named. So the group named 'id' in the example below can also be
				184	referenced as the numbered group 1.
				185
				186	For example, if the pattern string is
				187	\code{r'(?P<id>[a-zA-Z_]\e w*)'}, the group can be referenced by its
				188	name in arguments to methods of match objects, such as \code{m.group('id')}
				189	or \code{m.end('id')}, and also by name in pattern text (e.g. \code{(?P=id)}) and
				190	replacement text (e.g. \code{\e g<id>}).
				191	%
				192	\item[\code{(?\#...)}] A comment; the contents of the parentheses are simply ignored.
				193	%
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	194	\item[\code{(?=...)}] Matches if \code{...} matches next, but doesn't consume any of the string. This is called a lookahead assertion. For example,
				195	\code{Isaac (?=Asimov)} will match 'Isaac~' only if it's followed by 'Asimov'.
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	196	%
\item[\code{(?!...)}] Matches if \code{...} doesn't match next. This is a negative lookahead assertion. For example,
				199	\code{Isaac (?!Asimov)} will match 'Isaac~' only if it's \emph{not} followed by 'Asimov'.
				200
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	201	\end{itemize}
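
A few of the constructs listed above can be tried out as in the following
sketch. It assumes a current Python interpreter, and the variable names
are illustrative only.

```python
import re

text = "<H1>title</H1>"
# Greedy: .* runs to the last '>' in the string.
greedy = re.match(r"<.*>", text).group(0)
# Non-greedy: .*? stops at the first '>'.
minimal = re.match(r"<.*?>", text).group(0)

# A named group is also available as numbered group 1.
m = re.match(r"(?P<id>[a-zA-Z_]\w*)", "spam_1 = 2")
by_name = m.group("id")
by_number = m.group(1)

# Lookahead assertions test what follows without consuming it.
positive = re.match(r"Isaac (?=Asimov)", "Isaac Asimov")
negative = re.match(r"Isaac (?!Asimov)", "Isaac Asimov")

print(greedy, minimal, by_name, positive.group(0), negative)
```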
				202
				203	The special sequences consist of '\code{\e}' and a character from the
				204	list below. If the ordinary character is not on the list, then the
				205	resulting RE will match the second character. For example,
				206	\code{\e\$} matches the character '\$'. Ones where the backslash
				207	should be doubled are indicated.
				208
				209	\begin{itemize}
				210
				211	%
				212	\item[\code{\e \var{number}}] Matches the contents of the group of the
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	213	same number. Groups are numbered starting from 1. For example,
				214	\code{(.+) \e 1} matches 'the the' or '55 55', but not 'the end' (note
				215	the space after the group). This special sequence can only be used to
				216	match one of the first 99 groups. If the first digit of \var{number}
				217	is 0, or \var{number} is 3 octal digits long, it will not be interpreted
				218	as a group match, but as the character with octal value \var{number}.
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	219	%
				220	\item[\code{\e A}] Matches only at the start of the string.
				221	%
				222	\item[\code{\e b}] Matches the empty string, but only at the
				223	beginning or end of a word. A word is defined as a sequence of
				224	alphanumeric characters, so the end of a word is indicated by
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	225	whitespace or a non-alphanumeric character.
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	226	%
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	227	\item[\code{\e B}] Matches the empty string, but only when it is
				228	\emph{not} at the beginning or end of a word.
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	229	%
				230	\item[\code{\e d}]Matches any decimal digit; this is
				231	equivalent to the set \code{[0-9]}.
				232	%
				233	\item[\code{\e D}]Matches any non-digit character; this is
Guido van Rossum	d7dc2eb	1997-10-22 03:03:44 +0000	[diff] [blame]	234	equivalent to the set \code{[{\^}0-9]}.
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	235	%
				236	\item[\code{\e s}]Matches any whitespace character; this is
				237	equivalent to the set \code{[ \e t\e n\e r\e f\e v]}.
				238	%
				239	\item[\code{\e S}]Matches any non-whitespace character; this is
Guido van Rossum	d7dc2eb	1997-10-22 03:03:44 +0000	[diff] [blame]	240	equivalent to the set \code{[{\^} \e t\e n\e r\e f\e v]}.
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	241	%
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	242	\item[\code{\e w}]When the LOCALE flag is not specified, matches any alphanumeric character; this is
				243	equivalent to the set \code{[a-zA-Z0-9_]}. With LOCALE, it will match
				244	the set \code{[0-9_]} plus whatever characters are defined as letters
				245	for the current locale.
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	246	%
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	247	\item[\code{\e W}]When the LOCALE flag is not specified, matches any
				248	non-alphanumeric character; this is equivalent to the set
				249	\code{[{\^}a-zA-Z0-9_]}. With LOCALE, it will match any character
				250	not in the set \code{[0-9_]}, and not defined as a letter
				251	for the current locale.
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	252
				253	\item[\code{\e Z}]Matches only at the end of the string.
				254	%
				255
				256	\item[\code{\e \e}] Matches a literal backslash.
				257
				258	\end{itemize}
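
The special sequences can be exercised with a short sketch such as the
one below. It assumes a current Python interpreter; the sample strings
are arbitrary.

```python
import re

# \1 must repeat exactly the text that group 1 matched.
repeated_word = re.match(r"(.+) \1", "the the")
repeated_number = re.match(r"(.+) \1", "55 55")
not_repeated = re.match(r"(.+) \1", "the end")

# \b matches only at a word boundary.
whole_word = re.search(r"\bfoo\b", "foo bar")
inside_word = re.search(r"\bfoo\b", "foobar")

# \d matches a decimal digit; \s and \S match whitespace and non-whitespace.
digits = re.search(r"\d+", "abc 123").group(0)
space_then_char = re.search(r"\s\S", "ab cd").group(0)

print(repeated_word.group(1), digits, repr(space_then_char))
```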
				259
				260	\subsection{Module Contents}
				261
				262	The module defines the following functions and constants, and an exception:
				263
				264	\renewcommand{\indexsubitem}{(in module re)}
				265
				266	\begin{funcdesc}{compile}{pattern\optional{\, flags}}
				267	Compile a regular expression pattern into a regular expression
				268	object, which can be used for matching using its \code{match} and
				269	\code{search} methods, described below.
				270
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	271	The expression's behaviour can be modified by specifying a
				272	\var{flags} value. Values can be any of the following variables,
				273	combined using bitwise OR (the \code{\|} operator).
				274
Guido van Rossum	a42c178	1997-12-09 20:41:47 +0000	[diff] [blame]	275	\begin{itemize}
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	276
Guido van Rossum	a42c178	1997-12-09 20:41:47 +0000	[diff] [blame]	277	\item[I ] or IGNORECASE:
				278	Perform case-insensitive matching; expressions like [A-Z] will match
				279	lowercase letters, too.
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	280
Guido van Rossum	a42c178	1997-12-09 20:41:47 +0000	[diff] [blame]	281	\item[L ] or LOCALE:
Make \code{\e w}, \code{\e W}, \code{\e b}, and \code{\e B} dependent on
				283	the current locale.
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	284
Guido van Rossum	a42c178	1997-12-09 20:41:47 +0000	[diff] [blame]	285	\item[M ] or MULTILINE:
				286	When specified, the pattern character \code{\^} matches at the
				287	beginning of the string and at the beginning of each line (immediately
				288	following each newline); and the pattern character \code{\$} matches
				289	at the end of the string and at the end of each line (immediately
				290	preceding each newline).
				291
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	292	By default, \code{\^} matches only at the beginning of the string, and
				293	\code{\$} only at the end of the string and immediately before the
				294	newline (if any) at the end of the string.
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	295
Guido van Rossum	a42c178	1997-12-09 20:41:47 +0000	[diff] [blame]	296	\item[S ] or DOTALL:
				297	Make the \code{.} special character match a newline; without this
				298	flag, \code{.} will match anything \emph{except} a newline.
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	299
Guido van Rossum	a42c178	1997-12-09 20:41:47 +0000	[diff] [blame]	300	\item[X ] or VERBOSE:
				301	When specified, whitespace within the pattern string is ignored except
				302	when in a character class or preceded by an unescaped backslash, and,
				303	when a line contains a \code{\#} not in a character class or preceded
				304	by an unescaped backslash, all characters from the leftmost such
				305	\code{\#} through the end of the line are ignored.
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	306
Guido van Rossum	a42c178	1997-12-09 20:41:47 +0000	[diff] [blame]	307	\end{itemize}
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	308
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	309	The sequence
				310	%
				311	\bcode\begin{verbatim}
				312	prog = re.compile(pat)
				313	result = prog.match(str)
				314	\end{verbatim}\ecode
				315	%
				316	is equivalent to
				317	%
				318	\bcode\begin{verbatim}
				319	result = re.match(pat, str)
				320	\end{verbatim}\ecode
				321	%
				322	but the version using \code{compile()} is more efficient when multiple
				323	regular expressions are used concurrently in a single program.
				324	%(The compiled version of the last pattern passed to \code{regex.match()} or
				325	%\code{regex.search()} is cached, so programs that use only a single
				326	%regular expression at a time needn't worry about compiling regular
				327	%expressions.)
				328	\end{funcdesc}
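
As a rough illustration of the flags described for \code{compile()}, the
following sketch combines two of them with bitwise OR and shows the effect
of \code{MULTILINE} and \code{DOTALL}. It assumes a current Python
interpreter; the pattern and the sample strings are made up for the example.

```python
import re

# VERBOSE allows whitespace and comments inside the pattern;
# IGNORECASE is combined with it using the | operator.
number = re.compile(r"""
    0x          # hexadecimal prefix
    [0-9a-f]+   # one or more hexadecimal digits
    """, re.VERBOSE | re.IGNORECASE)
hex_match = number.match("0XBEEF")

lines = "one\ntwo\nthree"
# By default ^ matches only at the start of the string.
default_caret = re.search(r"^two", lines)
# With MULTILINE it also matches just after each newline.
multiline_caret = re.search(r"^two", lines, re.MULTILINE)

# Without DOTALL, '.' does not match a newline.
default_dot = re.match(r"a.b", "a\nb")
dotall_dot = re.match(r"a.b", "a\nb", re.DOTALL)

print(hex_match.group(0), multiline_caret.start())
```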
				329
				330	\begin{funcdesc}{escape}{string}
				331	Return \var{string} with all non-alphanumerics backslashed; this is
				332	useful if you want to match some variable string which may have
				333	regular expression metacharacters in it.
				334	\end{funcdesc}
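
A small sketch of why \code{escape()} is useful, assuming a current Python
interpreter. The exact form of the escaped string may differ between
versions, so the example only checks how the result behaves as a pattern.

```python
import re

literal = "1+1=2 (really?)"

# Used directly as a pattern, '+', '?' and the parentheses are special,
# so the pattern does not match its own text.
unescaped = re.match(literal, literal)

# After escaping, every character is matched literally.
escaped = re.match(re.escape(literal), literal)

print(unescaped, escaped.group(0))
```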
				335
				336	\begin{funcdesc}{match}{pattern\, string\optional{\, flags}}
				337	If zero or more characters at the beginning of \var{string} match
				338	the regular expression \var{pattern}, return a corresponding
				339	\code{Match} object. Return \code{None} if the string does not
				340	match the pattern; note that this is different from a zero-length
				341	match.
				342	\end{funcdesc}
				343
				344	\begin{funcdesc}{search}{pattern\, string\optional{\, flags}}
				345	Scan through \var{string} looking for a location where the regular
				346	expression \var{pattern} produces a match. Return \code{None} if no
				347	position in the string matches the pattern; note that this is
				348	different from finding a zero-length match at some point in the string.
				349	\end{funcdesc}
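
The difference between \code{match()} and \code{search()}, and between
\code{None} and a zero-length match, can be sketched as follows (assuming
a current Python interpreter; the variable names are illustrative).

```python
import re

subject = "aabbb"

# match() only tries at the beginning of the string.
anchored = re.match("b+", subject)
# search() scans forward until some position matches.
scanned = re.search("b+", subject)
# A pattern that can match nothing yields a zero-length match, not None.
empty = re.match("x*", subject)

print(anchored, scanned.start(), repr(empty.group(0)))
```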
				350
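\begin{funcdesc}{split}{pattern\, string\optional{, maxsplit=0}}
Split \var{string} by the occurrences of \var{pattern}. If
capturing parentheses are used in \var{pattern}, then the text matched
by the groups is also returned as part of the resulting list. If
\var{maxsplit} is nonzero, at most \var{maxsplit} splits occur, and the
remainder of the string is returned as the final element of the list.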
				355	%
				356	\bcode\begin{verbatim}
				357	>>> re.split('[\W]+', 'Words, words, words.')
				358	['Words', 'words', 'words', '']
				359	>>> re.split('([\W]+)', 'Words, words, words.')
				360	['Words', ', ', 'words', ', ', 'words', '.', '']
				361	\end{verbatim}\ecode
				362	%
				363	This function combines and extends the functionality of
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	364	the old \code{regex.split()} and \code{regex.splitx()}.
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	365	\end{funcdesc}
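
The effect of the \var{maxsplit} argument can be sketched as follows.
This assumes a current Python interpreter, where \var{maxsplit} is best
passed by keyword; the sample strings are arbitrary.

```python
import re

# With maxsplit=1 only the first separator is used for splitting.
limited = re.split(r"\W+", "Words, words, words.", maxsplit=1)

# With capturing parentheses the separators are returned as well.
with_separators = re.split(r"(\W+)", "a, b")

print(limited, with_separators)
```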
				366
				367	\begin{funcdesc}{sub}{pattern\, repl\, string\optional{, count=0}}
				368	Return the string obtained by replacing the leftmost non-overlapping
				369	occurrences of \var{pattern} in \var{string} by the replacement
Barry Warsaw	4552f3d	1997-11-20 00:15:13 +0000	[diff] [blame]	370	\var{repl}. If the pattern isn't found, \var{string} is returned
				371	unchanged. \var{repl} can be a string or a function; if a function,
it is called for every non-overlapping occurrence of \var{pattern}.
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	373	The function takes a single match object argument, and returns the
				374	replacement string. For example:
Barry Warsaw	4552f3d	1997-11-20 00:15:13 +0000	[diff] [blame]	375	%
				376	\bcode\begin{verbatim}
>>> def dashrepl(matchobj):
...     if matchobj.group(0) == '-': return ' '
...     else: return '-'
...
>>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
'pro--gram files'
				382	\end{verbatim}\ecode
				383	%
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	384	The pattern may be a string or a
				385	regexp object; if you need to specify
				386	regular expression flags, you must use a regexp object, or use
				387	embedded modifiers in a pattern string; e.g.
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	388	%
				389	\bcode\begin{verbatim}
				390	sub("(?i)b+", "x", "bbbb BBBB") returns 'x x'.
				391	\end{verbatim}\ecode
				392	%
				393	The optional argument \var{count} is the maximum number of pattern
				394	occurrences to be replaced; count must be a non-negative integer, and
				395	the default value of 0 means to replace all occurrences.
				396
				397	Empty matches for the pattern are replaced only when not adjacent to a
				398	previous match, so \code{sub('x*', '-', 'abc')} returns '-a-b-c-'.
				399	\end{funcdesc}
				400
				401	\begin{funcdesc}{subn}{pattern\, repl\, string\optional{, count=0}}
				402	Perform the same operation as \code{sub()}, but return a tuple
				403	\code{(new_string, number_of_subs_made)}.
				404	\end{funcdesc}
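
The \var{count} argument, named group references in the replacement text,
and the tuple returned by \code{subn()} can be sketched as follows. The
example assumes a current Python interpreter, and the sample strings are
made up.

```python
import re

# count=1 replaces only the leftmost occurrence.
first_only = re.sub("b+", "x", "bbbb BBBB abbb", count=1)

# \g<name> in the replacement text refers to a named group.
bracketed = re.sub(r"(?P<word>\w+)", r"[\g<word>]", "a bc")

# subn() also reports how many substitutions were made.
new_string, number_of_subs = re.subn("(?i)b+", "x", "bbbb BBBB")

print(first_only, bracketed, new_string, number_of_subs)
```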
				405
				406	\begin{excdesc}{error}
				407	Exception raised when a string passed to one of the functions here
				408	is not a valid regular expression (e.g., unmatched parentheses) or
				409	when some other error occurs during compilation or matching. (It is
				410	never an error if a string contains no match for a pattern.)
				411	\end{excdesc}
				412
				413	\subsection{Regular Expression Objects}
				414	Compiled regular expression objects support the following methods and
				415	attributes:
				416
Guido van Rossum	eb53ae4	1997-10-05 18:54:07 +0000	[diff] [blame]	417	\renewcommand{\indexsubitem}{(re method)}
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	418	\begin{funcdesc}{match}{string\optional{\, pos}\optional{\, endpos}}
Guido van Rossum	eb53ae4	1997-10-05 18:54:07 +0000	[diff] [blame]	419	If zero or more characters at the beginning of \var{string} match
				420	this regular expression, return a corresponding
				421	\code{Match} object. Return \code{None} if the string does not
				422	match the pattern; note that this is different from a zero-length
				423	match.
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	424
				425	The optional second parameter \var{pos} gives an index in the string
				426	where the search is to start; it defaults to \code{0}. This is not
				427	completely equivalent to slicing the string; the \code{'\^'} pattern
character matches at the real beginning of the string and at positions
				429	just after a newline, not necessarily at the index where the search
				430	is to start.
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	431
				432	The optional parameter \var{endpos} limits how far the string will
				433	be searched; it will be as if the string is \var{endpos} characters
				434	long, so only the characters from \var{pos} to \var{endpos} will be
				435	searched for a match.
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	436	\end{funcdesc}
				437
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	438	\begin{funcdesc}{search}{string\optional{\, pos}\optional{\, endpos}}
Guido van Rossum	eb53ae4	1997-10-05 18:54:07 +0000	[diff] [blame]	439	Scan through \var{string} looking for a location where this regular
				440	expression produces a match. Return \code{None} if no
				441	position in the string matches the pattern; note that this is
				442	different from finding a zero-length match at some point in the string.
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	443
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	444	The optional \var{pos} and \var{endpos} parameters have the same meaning as for the
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	445	\code{match} method.
				446	\end{funcdesc}
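
The \var{pos} and \var{endpos} parameters, and the way \var{pos} differs
from slicing, can be sketched as follows. This assumes a current Python
interpreter; the names \code{prog} and \code{caret} are illustrative.

```python
import re

prog = re.compile(r"\d+")
subject = "ab12cd345"

whole = prog.search(subject)            # first run of digits
from_four = prog.search(subject, 4)     # start looking at index 4
clipped = prog.search(subject, 4, 8)    # behave as if the string ended at 8

# ^ refers to the real start of the string, not to pos,
# so searching from index 4 differs from searching the slice.
caret = re.compile(r"^cd")
with_pos = caret.search(subject, 4)
with_slice = caret.search(subject[4:])

print(whole.span(), from_four.span(), clipped.span(), with_pos)
```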
				447
\begin{funcdesc}{split}{string\optional{, maxsplit=0}}
				449	Identical to the \code{split} function, using the compiled pattern.
				450	\end{funcdesc}
				451
				452	\begin{funcdesc}{sub}{repl\, string\optional{, count=0}}
				453	Identical to the \code{sub} function, using the compiled pattern.
				454	\end{funcdesc}
				455
				456	\begin{funcdesc}{subn}{repl\, string\optional{, count=0}}
				457	Identical to the \code{subn} function, using the compiled pattern.
				458	\end{funcdesc}
				459
				460	\renewcommand{\indexsubitem}{(regex attribute)}
				461
				462	\begin{datadesc}{flags}
				463	The flags argument used when the regex object was compiled, or 0 if no
				464	flags were provided.
				465	\end{datadesc}
				466
				467	\begin{datadesc}{groupindex}
				468	A dictionary mapping any symbolic group names (defined by
				469	\code{?P<\var{id}>}) to group numbers. The dictionary is empty if no
				470	symbolic groups were used in the pattern.
				471	\end{datadesc}
				472
				473	\begin{datadesc}{pattern}
				474	The pattern string from which the regex object was compiled.
				475	\end{datadesc}
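
The attributes of a compiled object can be inspected as in the following
sketch, which assumes a current Python interpreter. There the
\code{flags} value may include bits that the implementation sets on its
own, so the example tests a single bit instead of comparing the whole value.

```python
import re

source = r"(?P<key>\w+)=(?P<value>\w+)"
prog = re.compile(source, re.IGNORECASE)

# groupindex maps each symbolic group name to its group number.
key_number = prog.groupindex["key"]
value_number = prog.groupindex["value"]

# pattern is the string the object was compiled from.
same_source = (prog.pattern == source)

# Test for a particular flag with bitwise AND.
ignores_case = bool(prog.flags & re.IGNORECASE)

print(key_number, value_number, same_source, ignores_case)
```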
				476
				477	\subsection{Match Objects}
				478	Match objects support the following methods and attributes:
				479
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	480	\begin{funcdesc}{start}{group}
				481	\end{funcdesc}
				482
				483	\begin{funcdesc}{end}{group}
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	484	Return the indices of the start and end of the substring
				485	matched by \var{group}. Return \code{None} if \var{group} exists but
				486	did not contribute to the match. Note that for a match object
				487	\code{m}, and a group \code{g} that did contribute to the match, the
				488	substring matched by group \code{g} is
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	489	\bcode\begin{verbatim}
				490	m.string[m.start(g):m.end(g)]
				491	\end{verbatim}\ecode
				492	%
				493	Note too that \code{m.start(\var{group})} will equal
				494	\code{m.end(\var{group})} if \var{group} matched a null string. For example,
				495	after \code{m = re.search('b(c?)', 'cba')}, \code{m.start(0)} is 1,
				496	\code{m.end(0)} is 2, \code{m.start(1)} and \code{m.end(1)} are both
				497	2, and \code{m.start(2)} raises an
				498	\code{IndexError} exception.
				499	\end{funcdesc}
				500
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	501	\begin{funcdesc}{span}{group}
				502	Return the 2-tuple \code{(start(\var{group}), end(\var{group}))}.
				503	Note that if \var{group} did not contribute to the match, this is
				504	\code{(None, None)}.
				505	\end{funcdesc}
				506
\begin{funcdesc}{group}{\optional{g1, g2, ...}}
				508	This method is only valid when the last call to the \code{match}
				509	or \code{search} method found a match. It returns one or more
				510	groups of the match. If there is a single \var{index} argument,
				511	the result is a single string; if there are multiple arguments, the
				512	result is a tuple with one item per argument. If the \var{index} is
				513	zero, the corresponding return value is the entire matching string; if
it is in the inclusive range [1..99], it is the string matching the
corresponding parenthesized group. If no
				517	such group exists, the corresponding result is \code{None}.
				518
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	519	If the regular expression uses the \code{(?P<\var{name}>...)} syntax,
				520	the \var{index} arguments may also be strings identifying groups by
				521	their group name.
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	522	\end{funcdesc}
				523
				524	\begin{datadesc}{pos}
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	525	The value of \var{pos} which was passed to the
				526	\code{search} or \code{match} function. This is the index into the
				527	string at which the regex engine started looking for a match.
				528	\end{datadesc}
				529
				530	\begin{datadesc}{endpos}
				531	The value of \var{endpos} which was passed to the
				532	\code{search} or \code{match} function. This is the index into the
				533	string beyond which the regex engine will not go.
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	534	\end{datadesc}
				535
				536	\begin{datadesc}{re}
				537	The regular expression object whose match() or search() method
Guido van Rossum	0b33410	1997-12-08 17:33:40 +0000	[diff] [blame]	538	produced this match object.
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	539	\end{datadesc}
				540
				541	\begin{datadesc}{string}
				542	The string passed to \code{match()} or \code{search()}.
				543	\end{datadesc}
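
The methods and attributes of a match object can be tried out as in the
following sketch. It assumes a current Python interpreter and only uses
groups that took part in the match; the pattern and the sample text are
made up.

```python
import re

prog = re.compile(r"(?P<user>\w+)@(\w+)")
text = "mail bob@example now"
m = prog.search(text)

whole = m.group(0)                 # the entire match
pair = m.group("user", 2)          # several groups give a tuple
user_start = m.start("user")
whole_span = m.span(0)

# The text of a group can be recovered by slicing m.string.
host = m.string[m.start(2):m.end(2)]

print(whole, pair, user_start, whole_span, host)
```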
				544
Guido van Rossum	1acceb0	1997-08-14 23:12:18 +0000	[diff] [blame]	545	\begin{seealso}
				546	\seetext Jeffrey Friedl, \emph{Mastering Regular Expressions}.
				547	\end{seealso}
				548