Blame - Doc/lib/libregex.tex - platform/external/python/cpython3

blob: fabc182c882048b680f96df202907fcf40fe19cf [file] [log] [blame]

Fred Drake	3a0351c	1998-04-04 07:23:21 +0000	[diff] [blame]	1	\section{Built-in Module \module{regex}}
Guido van Rossum	e47da0a	1997-07-17 16:34:52 +0000	[diff] [blame]	2	\label{module-regex}
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	3	\bimodindex{regex}
Fred Drake	054f8fd	1998-01-12 18:28:20 +0000	[diff] [blame]	4
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	5	This module provides regular expression matching operations similar to
Guido van Rossum	28f9a68	1997-12-09 19:45:47 +0000	[diff] [blame]	6	those found in Emacs.
				7
				8	\strong{Obsolescence note:}
				9	This module is obsolete as of Python version 1.5; it is still being
				10	maintained because much existing code still uses it. All new code in
Fred Drake	054f8fd	1998-01-12 18:28:20 +0000	[diff] [blame]	11	need of regular expressions should use the new
				12	\code{re}\refstmodindex{re} module, which supports the more powerful
				13	and regular Perl-style regular expressions. Existing code should be
				14	converted. The standard library module
				15	\code{reconvert}\refstmodindex{reconvert} helps in converting
				16	\code{regex} style regular expressions to \code{re}\refstmodindex{re}
Fred Drake	9da3881	1998-04-09 14:06:33 +0000	[diff] [blame]	17	style regular expressions. (For more conversion help, see Andrew
				18	Kuchling's\index{Kuchling, Andrew} ``\module{regex-to-re} HOWTO'' at
				19	\url{http://www.python.org/doc/howto/regex-to-re/}.)
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	20
Guido van Rossum	6240b0b	1996-10-24 22:49:13 +0000	[diff] [blame]	21	By default the patterns are Emacs-style regular expressions
				22	(with one exception). There is
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	23	a way to change the syntax to match that of several well-known
Guido van Rossum	fe4254e	1995-08-11 00:31:57 +0000	[diff] [blame]	24	\UNIX{} utilities. The exception is that Emacs' \samp{\e s}
				25	pattern is not supported, since the original implementation references
				26	the Emacs syntax tables.
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	27
				28	This module is 8-bit clean: both patterns and strings may contain null
				29	bytes and characters whose high bit is set.
				30
Guido van Rossum	326c0bc	1994-01-03 00:00:31 +0000	[diff] [blame]	31	\strong{Please note:} There is a little-known fact about Python string
				32	literals which means that you don't usually have to worry about
				33	doubling backslashes, even though they are used to escape special
				34	characters in string literals as well as in regular expressions. This
				35	is because Python doesn't remove backslashes from string literals if
				36	they are followed by an unrecognized escape character.
				37	\emph{However}, if you want to include a literal \dfn{backslash} in a
				38	regular expression represented as a string literal, you have to
Guido van Rossum	1f8cee2	1997-03-14 04:10:13 +0000	[diff] [blame]	39	\emph{quadruple} it or enclose it in a singleton character class.
				40	E.g.\ to extract \LaTeX\ \samp{\e section\{{\rm
Guido van Rossum	326c0bc	1994-01-03 00:00:31 +0000	[diff] [blame]	41	\ldots}\}} headers from a document, you can use this pattern:
Guido van Rossum	eb0f066	1997-12-30 20:38:16 +0000	[diff] [blame]	42	\code{'[\e ]section\{\e (.*\e )\}'}. \emph{Another exception:}
Guido van Rossum	1a53560	1996-06-26 19:43:22 +0000	[diff] [blame]	43	the escape sequence \samp{\e b} is significant in string literals
				44	(where it means the ASCII backspace character) as well as in Emacs regular
				45	expressions (where it stands for a word boundary), so in order to
				46	search for a word boundary, you should use the pattern \code{'\e \e b'}.
				47	Similarly, a backslash followed by a digit 0-7 should be doubled to
				48	avoid interpretation as an octal escape.
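The interplay between string-literal escapes and regular-expression escapes can be checked in the interpreter. The obsolete \module{regex} module is not shipped with current Python releases, so the following is only a sketch that assumes the \module{re} module instead (where groups are written \code{(...)} and a literal brace is escaped as \code{\e\{}); the string-literal behaviour it shows is independent of the matching engine.

```python
import re

# '\b' in a string literal is a single character (ASCII backspace, code 8);
# '\\b' is a backslash followed by 'b', which the engine reads as a word boundary.
backspace = '\b'
word_boundary = '\\b'

# Without the doubled backslash the pattern searches for backspace characters.
wrong = re.search('\bcat\b', 'a cat sat')
right = re.search('\\bcat\\b', 'a cat sat')

# A backslash followed by an octal digit is also consumed by the string literal.
octal = '\1'

# A literal backslash in the pattern needs four backslashes in an ordinary
# string literal: two for the literal, doubled again for the engine.
section = re.search('\\\\section\\{(.*)\\}', '\\section{Introduction}')
```

Raw string literals (\code{r'...'}) avoid most of this doubling, which is why they are the usual choice with \module{re}.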
				49
				50	\subsection{Regular Expressions}
				51
				52	A regular expression (or RE) specifies a set of strings that matches
				53	it; the functions in this module let you check if a particular string
Guido van Rossum	6240b0b	1996-10-24 22:49:13 +0000	[diff] [blame]	54	matches a given regular expression (or if a given regular expression
				55	matches a particular string, which comes down to the same thing).
Guido van Rossum	1a53560	1996-06-26 19:43:22 +0000	[diff] [blame]	56
				57	Regular expressions can be concatenated to form new regular
				58	expressions; if \emph{A} and \emph{B} are both regular expressions,
				59	then \emph{AB} is also a regular expression. If a string \emph{p}
				60	matches \emph{A} and another string \emph{q} matches \emph{B}, the string \emph{pq}
				61	will match \emph{AB}. Thus, complex expressions can easily be constructed
				62	from simpler ones like the primitives described here. For details of
				63	the theory and implementation of regular expressions, consult almost
				64	any textbook about compiler construction.
				65
				66	% XXX The reference could be made more specific, say to
				67	% "Compilers: Principles, Techniques and Tools", by Alfred V. Aho,
				68	% Ravi Sethi, and Jeffrey D. Ullman, or some FA text.
				69
Guido van Rossum	6240b0b	1996-10-24 22:49:13 +0000	[diff] [blame]	70	A brief explanation of the format of regular expressions follows.
Guido van Rossum	1a53560	1996-06-26 19:43:22 +0000	[diff] [blame]	71
				72	Regular expressions can contain both special and ordinary characters.
				73	Ordinary characters, like '\code{A}', '\code{a}', or '\code{0}', are
				74	the simplest regular expressions; they simply match themselves. You
				75	can concatenate ordinary characters, so '\code{last}' matches the
Guido van Rossum	6240b0b	1996-10-24 22:49:13 +0000	[diff] [blame]	76	characters 'last'. (In the rest of this section, we'll write REs in
				77	\code{this special font}, usually without quotes, and strings to be
				78	matched 'in single quotes'.)
Guido van Rossum	1a53560	1996-06-26 19:43:22 +0000	[diff] [blame]	79
				80	Special characters either stand for classes of ordinary characters, or
				81	affect how the regular expressions around them are interpreted.
				82
				83	The special characters are:
				84	\begin{itemize}
Fred Drake	4b3f031	1996-12-13 22:04:31 +0000	[diff] [blame]	85	\item[\code{.}] (Dot.) Matches any character except a newline.
				86	\item[\code{\^}] (Caret.) Matches the start of the string.
				87	\item[\code{\$}] Matches the end of the string.
Guido van Rossum	1a53560	1996-06-26 19:43:22 +0000	[diff] [blame]	88	\code{foo} matches both 'foo' and 'foobar', while the regular
Fred Drake	4b3f031	1996-12-13 22:04:31 +0000	[diff] [blame]	89	expression '\code{foo\$}' matches only 'foo'.
Guido van Rossum	1a53560	1996-06-26 19:43:22 +0000	[diff] [blame]	90	\item[\code{*}] Causes the resulting RE to
				91	match 0 or more repetitions of the preceding RE. \code{ab*} will
				92	match 'a', 'ab', or 'a' followed by any number of 'b's.
				93	\item[\code{+}] Causes the
				94	resulting RE to match 1 or more repetitions of the preceding RE.
				95	\code{ab+} will match 'a' followed by any non-zero number of 'b's; it
				96	will not match just 'a'.
				97	\item[\code{?}] Causes the resulting RE to
				98	match 0 or 1 repetitions of the preceding RE. \code{ab?} will
				99	match either 'a' or 'ab'.
				100
				101	\item[\code{\e}] Either escapes special characters (permitting you to match
				102	characters like '*?+\&\$'), or signals a special sequence; special
				103	sequences are discussed below. Remember that Python also uses the
				104	backslash as an escape character in string literals; if the escape
				105	sequence isn't recognized by Python's parser, the backslash and
				106	subsequent character are included in the resulting string. However,
				107	if Python would recognize the resulting sequence, the backslash should
				108	be doubled.
				109
				110	\item[\code{[]}] Used to indicate a set of characters. Characters can
				111	be listed individually, or a range is indicated by giving two
				112	characters and separating them by a '-'. Special characters are
				113	not active inside sets. For example, \code{[akm\$]}
				114	will match any of the characters 'a', 'k', 'm', or '\$'; \code{[a-z]} will
				115	match any lowercase letter.
				116
				117	If you want to include a \code{]} inside a
				118	set, it must be the first character of the set; to include a \code{-},
				119	place it as the first or last character.
				120
				121	Characters \emph{not} within a range can be matched by including a
				122	\code{\^} as the first character of the set; \code{\^} elsewhere will
				123	simply match the '\code{\^}' character.
				124	\end{itemize}
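The special characters listed above behave the same way in the newer \module{re} module, so their effect can be tried there. This is a minimal sketch that assumes \module{re} as a stand-in for the obsolete \module{regex} module; the helper \code{matches()} is hypothetical and requires the whole string to match.

```python
import re

def matches(pattern, string):
    """Return True if the whole of string matches pattern."""
    return re.fullmatch(pattern, string) is not None

# '*' allows zero or more 'b's, '+' needs at least one, '?' allows at most one.
star = [matches('ab*', s) for s in ('a', 'ab', 'abbb')]
plus = [matches('ab+', s) for s in ('a', 'ab', 'abbb')]
optional = [matches('ab?', s) for s in ('a', 'ab', 'abb')]

# Inside a set, '$' loses its special meaning and matches itself.
in_set = [matches('[akm$]', c) for c in 'akm$z']

# 'foo' matches a prefix of 'foobar'; 'foo$' insists that the string end there.
prefix = re.match('foo', 'foobar') is not None
anchored = re.match('foo$', 'foobar') is not None
```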
				125
				126	The special sequences consist of '\code{\e}' and a character
				127	from the list below. If the ordinary character is not on the list,
				128	then the resulting RE will match the second character. For example,
				129	\code{\e\$} matches the character '\$'. Sequences whose backslash
Guido van Rossum	eb0f066	1997-12-30 20:38:16 +0000	[diff] [blame]	130	must be doubled in string literals are shown with a doubled backslash.
Guido van Rossum	1a53560	1996-06-26 19:43:22 +0000	[diff] [blame]	131
				132	\begin{itemize}
				133	\item[\code{\e\|}]\code{A\e\|B}, where A and B can be arbitrary REs,
Guido van Rossum	6240b0b	1996-10-24 22:49:13 +0000	[diff] [blame]	134	creates a regular expression that will match either A or B. This can
				135	be used inside groups (see below) as well.
Guido van Rossum	1a53560	1996-06-26 19:43:22 +0000	[diff] [blame]	136	%
Fred Drake	4b3f031	1996-12-13 22:04:31 +0000	[diff] [blame]	137	\item[\code{\e( \e)}] Indicates the start and end of a group; the
Guido van Rossum	1a53560	1996-06-26 19:43:22 +0000	[diff] [blame]	138	contents of a group can be matched later in the string with the
Fred Drake	4b3f031	1996-12-13 22:04:31 +0000	[diff] [blame]	139	\code{\e [1-9]} special sequence, described next.
Fred Drake	75bfb0f	1998-02-19 06:32:06 +0000	[diff] [blame]	140	\end{itemize}
				141
				142	\begin{fulllineitems}
				143	\item[\code{\e \e 1, ... \e \e 7, \e 8, \e 9}]
Fred Drake	4b3f031	1996-12-13 22:04:31 +0000	[diff] [blame]	144	Matches the contents of the group of the same
Guido van Rossum	1a53560	1996-06-26 19:43:22 +0000	[diff] [blame]	145	number. For example, \code{\e (.+\e ) \e \e 1} matches 'the the' or
				146	'55 55', but not 'the end' (note the space after the group). This
				147	special sequence can only be used to match one of the first 9 groups;
				148	groups with higher numbers can be matched using the \code{\e v}
Guido van Rossum	6240b0b	1996-10-24 22:49:13 +0000	[diff] [blame]	149	sequence. (\code{\e 8} and \code{\e 9} don't need a double backslash
Guido van Rossum	38e0df3	1998-02-11 22:55:55 +0000	[diff] [blame]	150	because they are not octal digits.)
Fred Drake	75bfb0f	1998-02-19 06:32:06 +0000	[diff] [blame]	151	\end{fulllineitems}
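The back-reference example above can be reproduced with the \module{re} module, which is assumed here because the obsolete \module{regex} module is not available in current Python releases. In \module{re} syntax the group is written \code{(.+)} rather than \code{\e(.+\e)}, and a raw string removes the need to double the backslash before the digit.

```python
import re

# The group captures some text; ' \1' then requires a space followed by
# exactly the same text again.
repeated = re.compile(r'(.+) \1')

results = {
    s: repeated.fullmatch(s) is not None
    for s in ('the the', '55 55', 'the end')
}
```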
				152
				153	\begin{itemize}
Fred Drake	4b3f031	1996-12-13 22:04:31 +0000	[diff] [blame]	154	\item[\code{\e \e b}] Matches the empty string, but only at the
Guido van Rossum	1a53560	1996-06-26 19:43:22 +0000	[diff] [blame]	155	beginning or end of a word. A word is defined as a sequence of
				156	alphanumeric characters, so the end of a word is indicated by
Fred Drake	4b3f031	1996-12-13 22:04:31 +0000	[diff] [blame]	157	whitespace or a non-alphanumeric character.
Guido van Rossum	1a53560	1996-06-26 19:43:22 +0000	[diff] [blame]	158	%
Fred Drake	4b3f031	1996-12-13 22:04:31 +0000	[diff] [blame]	159	\item[\code{\e B}] Matches the empty string, but only when it is \emph{not} at the
				160	beginning or end of a word.
Guido van Rossum	1a53560	1996-06-26 19:43:22 +0000	[diff] [blame]	161	%
Fred Drake	4b3f031	1996-12-13 22:04:31 +0000	[diff] [blame]	162	\item[\code{\e v}] Must be followed by a two digit decimal number, and
Fred Drake	054f8fd	1998-01-12 18:28:20 +0000	[diff] [blame]	163	matches the contents of the group of the same number. The group
				164	number must be between 1 and 99, inclusive.
Guido van Rossum	1a53560	1996-06-26 19:43:22 +0000	[diff] [blame]	165	%
				166	\item[\code{\e w}] Matches any alphanumeric character; this is
				167	equivalent to the set \code{[a-zA-Z0-9]}.
				168	%
Fred Drake	4b3f031	1996-12-13 22:04:31 +0000	[diff] [blame]	169	\item[\code{\e W}] Matches any non-alphanumeric character; this is
				170	equivalent to the set \code{[\^a-zA-Z0-9]}.
				171	\item[\code{\e <}] Matches the empty string, but only at the beginning of a
Guido van Rossum	1a53560	1996-06-26 19:43:22 +0000	[diff] [blame]	172	word. A word is defined as a sequence of alphanumeric characters, so
				173	the end of a word is indicated by whitespace or a non-alphanumeric
Fred Drake	4b3f031	1996-12-13 22:04:31 +0000	[diff] [blame]	174	character.
				175	\item[\code{\e >}] Matches the empty string, but only at the end of a
				176	word.
Guido van Rossum	1a53560	1996-06-26 19:43:22 +0000	[diff] [blame]	177
Fred Drake	4b3f031	1996-12-13 22:04:31 +0000	[diff] [blame]	178	\item[\code{\e \e \e \e}] Matches a literal backslash.
Guido van Rossum	6240b0b	1996-10-24 22:49:13 +0000	[diff] [blame]	179
Guido van Rossum	1a53560	1996-06-26 19:43:22 +0000	[diff] [blame]	180	% In Emacs, the following two are start of buffer/end of buffer. In
				181	% Python they seem to be synonyms for ^$.
Fred Drake	4b3f031	1996-12-13 22:04:31 +0000	[diff] [blame]	182	\item[\code{\e `}] Like \code{\^}, this only matches at the start of the
				183	string.
Fred Drake	054f8fd	1998-01-12 18:28:20 +0000	[diff] [blame]	184	\item[\code{\e \e '}] Like \code{\$}, this only matches at the end of
				185	the string.
Guido van Rossum	1a53560	1996-06-26 19:43:22 +0000	[diff] [blame]	186	% end of buffer
				187	\end{itemize}
				188
				189	\subsection{Module Contents}
Guido van Rossum	38e0df3	1998-02-11 22:55:55 +0000	[diff] [blame]	190	\nodename{Contents of Module regex}
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	191
				192	The module defines these functions, and an exception:
				193
Guido van Rossum	326c0bc	1994-01-03 00:00:31 +0000	[diff] [blame]	194
Fred Drake	cce1090	1998-03-17 06:33:25 +0000	[diff] [blame]	195	\begin{funcdesc}{match}{pattern, string}
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	196	Return how many characters at the beginning of \var{string} match
				197	the regular expression \var{pattern}. Return \code{-1} if the
				198	string does not match the pattern (this is different from a
				199	zero-length match!).
				200	\end{funcdesc}
				201
Fred Drake	cce1090	1998-03-17 06:33:25 +0000	[diff] [blame]	202	\begin{funcdesc}{search}{pattern, string}
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	203	Return the first position in \var{string} that matches the regular
Guido van Rossum	6240b0b	1996-10-24 22:49:13 +0000	[diff] [blame]	204	expression \var{pattern}. Return \code{-1} if no position in the string
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	205	matches the pattern (this is different from a zero-length match
				206	anywhere!).
				207	\end{funcdesc}
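The integer return convention of \code{match()} and \code{search()} differs from the match objects returned by the \module{re} module. The sketch below emulates the convention on top of \module{re}; the helper names \code{match_length()} and \code{search_position()} are hypothetical and not part of either module. It mainly illustrates why \code{-1} (no match) must be distinguished from \code{0} (a zero-length match).

```python
import re

def match_length(pattern, string):
    """Length of the match at the start of string, or -1 if there is none."""
    m = re.match(pattern, string)
    return m.end() if m else -1

def search_position(pattern, string):
    """Index of the first match anywhere in string, or -1 if there is none."""
    m = re.search(pattern, string)
    return m.start() if m else -1

matched_four = match_length('ab*', 'abbbc')     # 'abbb' matches at the start
zero_length = match_length('b*', 'abc')         # empty match: 0, not -1
no_match = match_length('b+', 'abc')            # no match at the start
found_at = search_position('b+', 'abc')         # first 'b' is at index 1
not_found = search_position('z', 'abc')
```

Because \code{0} is a valid result, callers should test the result with \code{>= 0} rather than relying on its truth value.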
				208
Fred Drake	cce1090	1998-03-17 06:33:25 +0000	[diff] [blame]	209	\begin{funcdesc}{compile}{pattern\optional{, translate}}
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	210	Compile a regular expression pattern into a regular expression
Fred Drake	054f8fd	1998-01-12 18:28:20 +0000	[diff] [blame]	211	object, which can be used for matching using its \code{match()} and
				212	\code{search()} methods, described below. The optional argument
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	213	\var{translate}, if present, must be a 256-character string
				214	indicating how characters (both of the pattern and of the strings to
Fred Drake	054f8fd	1998-01-12 18:28:20 +0000	[diff] [blame]	215	be matched) are translated before comparing them; the \var{i}-th
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	216	element of the string gives the translation for the character with
Fred Drake	054f8fd	1998-01-12 18:28:20 +0000	[diff] [blame]	217	\ASCII{} code \var{i}. This can be used to implement
Guido van Rossum	470be14	1995-03-17 16:07:09 +0000	[diff] [blame]	218	case-insensitive matching; see the \code{casefold} data item below.
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	219
				220	The sequence
				221
Fred Drake	1947991	1998-02-13 06:58:54 +0000	[diff] [blame]	222	\begin{verbatim}
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	223	prog = regex.compile(pat)
				224	result = prog.match(str)
Fred Drake	1947991	1998-02-13 06:58:54 +0000	[diff] [blame]	225	\end{verbatim}
Guido van Rossum	e47da0a	1997-07-17 16:34:52 +0000	[diff] [blame]	226	%
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	227	is equivalent to
				228
Fred Drake	1947991	1998-02-13 06:58:54 +0000	[diff] [blame]	229	\begin{verbatim}
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	230	result = regex.match(pat, str)
Fred Drake	1947991	1998-02-13 06:58:54 +0000	[diff] [blame]	231	\end{verbatim}
Fred Drake	054f8fd	1998-01-12 18:28:20 +0000	[diff] [blame]	232
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	233	but the version using \code{compile()} is more efficient when multiple
				234	regular expressions are used concurrently in a single program. (The
				235	compiled version of the last pattern passed to \code{regex.match()} or
				236	\code{regex.search()} is cached, so programs that use only a single
				237	regular expression at a time needn't worry about compiling regular
				238	expressions.)
				239	\end{funcdesc}
				240
				241	\begin{funcdesc}{set_syntax}{flags}
Fred Drake	054f8fd	1998-01-12 18:28:20 +0000	[diff] [blame]	242	Set the syntax to be used by future calls to \code{compile()},
				243	\code{match()} and \code{search()}. (Already compiled expression
				244	objects are not affected.) The argument is an integer which is the
				245	OR of several flag bits. The return value is the previous value of
				246	the syntax flags. Names for the flags are defined in the standard
				247	module \code{regex_syntax}\refstmodindex{regex_syntax}; read the
				248	file \file{regex_syntax.py} for more information.
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	249	\end{funcdesc}
				250
Barry Warsaw	cd77df6	1997-02-18 18:54:30 +0000	[diff] [blame]	251	\begin{funcdesc}{get_syntax}{}
				252	Returns the current value of the syntax flags as an integer.
				253	\end{funcdesc}
				254
Fred Drake	cce1090	1998-03-17 06:33:25 +0000	[diff] [blame]	255	\begin{funcdesc}{symcomp}{pattern\optional{, translate}}
Fred Drake	054f8fd	1998-01-12 18:28:20 +0000	[diff] [blame]	256	This is like \code{compile()}, but supports symbolic group names: if a
Guido van Rossum	6c4f003	1995-03-07 10:14:09 +0000	[diff] [blame]	257	parenthesis-enclosed group begins with a group name in angle
Guido van Rossum	326c0bc	1994-01-03 00:00:31 +0000	[diff] [blame]	258	brackets, e.g.\ \code{'\e(<id>[a-z][a-z0-9]*\e)'}, the group can
Fred Drake	054f8fd	1998-01-12 18:28:20 +0000	[diff] [blame]	259	be referenced by its name in arguments to the \code{group()} method of
Guido van Rossum	326c0bc	1994-01-03 00:00:31 +0000	[diff] [blame]	260	the resulting compiled regular expression object, like this:
Guido van Rossum	7defee7	1995-02-27 17:52:35 +0000	[diff] [blame]	261	\code{p.group('id')}. Group names may contain alphanumeric characters
				262	and \code{'_'} only.
Guido van Rossum	326c0bc	1994-01-03 00:00:31 +0000	[diff] [blame]	263	\end{funcdesc}
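For code being converted, the \module{re} module offers a comparable feature with a different spelling: a named group is written \code{(?P<name>...)}. The following sketch assumes \module{re} and shows the counterpart of the \code{symcomp()} example, including the name-to-number mapping.

```python
import re

# re spelling of the symcomp() pattern '\(<id>[a-z][a-z0-9]*\)'.
ident = re.compile(r'(?P<id>[a-z][a-z0-9]*)')

m = ident.match('x1 = 42')
name = m.group('id')              # look the group up by name
index = ident.groupindex['id']    # the number behind the name
same = m.group(index)             # the same group, looked up by number
```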
				264
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	265	\begin{excdesc}{error}
				266	Exception raised when a string passed to one of the functions here
				267	is not a valid regular expression (e.g., unmatched parentheses) or
				268	when some other error occurs during compilation or matching. (It is
				269	never an error if a string contains no match for a pattern.)
				270	\end{excdesc}
				271
				272	\begin{datadesc}{casefold}
Fred Drake	054f8fd	1998-01-12 18:28:20 +0000	[diff] [blame]	273	A string suitable to pass as the \var{translate} argument to
				274	\code{compile()} to map all upper case characters to their lowercase
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	275	equivalents.
				276	\end{datadesc}
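To make the shape of a \var{translate} table concrete, the sketch below builds a 256-character string that lowercases the ASCII upper case letters and applies it to both pattern and subject before matching with \module{re}. It is an illustration only: the table is a hypothetical stand-in for \code{regex.casefold}, the helpers \code{fold()} and \code{match_folded()} are invented for the example, and translating a pattern this way would also alter escapes such as \code{\e W}.

```python
import re

# One entry per character code; entry i is the translation of chr(i).
casefold = ''.join(
    chr(i).lower() if 'A' <= chr(i) <= 'Z' else chr(i)
    for i in range(256)
)

def fold(text):
    """Translate each character with a code below 256 through the table."""
    return ''.join(casefold[ord(c)] if ord(c) < 256 else c for c in text)

def match_folded(pattern, string):
    """Match after translating both arguments; return the length or -1."""
    m = re.match(fold(pattern), fold(string))
    return m.end() if m else -1

hit = match_folded('Hello', 'HELLO world')
miss = match_folded('hello', 'help')
```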
				277
				278	\noindent
				279	Compiled regular expression objects support these methods:
				280
Fred Drake	1947991	1998-02-13 06:58:54 +0000	[diff] [blame]	281	\setindexsubitem{(regex method)}
Fred Drake	cce1090	1998-03-17 06:33:25 +0000	[diff] [blame]	282	\begin{funcdesc}{match}{string\optional{, pos}}
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	283	Return how many characters at the beginning of \var{string} match
				284	the compiled regular expression. Return \code{-1} if the string
				285	does not match the pattern (this is different from a zero-length
				286	match!).
				287
Fred Drake	054f8fd	1998-01-12 18:28:20 +0000	[diff] [blame]	288	The optional second parameter, \var{pos}, gives an index in the string
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	289	where the search is to start; it defaults to \code{0}. This is not
				290	completely equivalent to slicing the string; the \code{'\^'} pattern
				291	character matches at the real beginning of the string and at positions
				292	just after a newline, not necessarily at the index where the search
				293	is to start.
				294	\end{funcdesc}
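The difference between passing \var{pos} and slicing the string can be observed with compiled patterns from the \module{re} module, whose methods take the same optional argument. This sketch assumes \module{re}; note that there \code{\^} matches after a newline only when the \code{re.MULTILINE} flag is given.

```python
import re

prog = re.compile('^b')

# With pos=1 the search starts at 'b', but '^' still refers to the real
# start of the string, so nothing matches.
with_pos = prog.search('ab', 1)

# Slicing creates a new string whose real start is the 'b'.
sliced = prog.search('ab'[1:])

# In multi-line mode '^' also matches just after a newline.
after_newline = re.compile('^b', re.MULTILINE).search('a\nb', 1)
```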
				295
Fred Drake	cce1090	1998-03-17 06:33:25 +0000	[diff] [blame]	296	\begin{funcdesc}{search}{string\optional{, pos}}
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	297	Return the first position in \var{string} that matches the compiled
				298	regular expression. Return \code{-1} if no position in the
				299	string matches the pattern (this is different from a zero-length
				300	match anywhere!).
				301
				302	The optional second parameter has the same meaning as for the
Fred Drake	054f8fd	1998-01-12 18:28:20 +0000	[diff] [blame]	303	\code{match()} method.
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	304	\end{funcdesc}
				305
Fred Drake	cce1090	1998-03-17 06:33:25 +0000	[diff] [blame]	306	\begin{funcdesc}{group}{index, index, ...}
Fred Drake	054f8fd	1998-01-12 18:28:20 +0000	[diff] [blame]	307	This method is only valid when the last call to the \code{match()}
				308	or \code{search()} method found a match. It returns one or more
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	309	groups of the match. If there is a single \var{index} argument,
				310	the result is a single string; if there are multiple arguments, the
				311	result is a tuple with one item per argument. If the \var{index} is
				312	zero, the corresponding return value is the entire matching string; if
Guido van Rossum	326c0bc	1994-01-03 00:00:31 +0000	[diff] [blame]	313	it is in the inclusive range [1..99], it is the string matching
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	314	the corresponding parenthesized group (using the default syntax,
Fred Drake	875c807	1998-01-02 02:50:13 +0000	[diff] [blame]	315	groups are parenthesized using \code{{\e}(} and \code{{\e})}). If no
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	316	such group exists, the corresponding result is \code{None}.
Guido van Rossum	326c0bc	1994-01-03 00:00:31 +0000	[diff] [blame]	317
Fred Drake	054f8fd	1998-01-12 18:28:20 +0000	[diff] [blame]	318	If the regular expression was compiled by \code{symcomp()} instead of
				319	\code{compile()}, the \var{index} arguments may also be strings
Guido van Rossum	326c0bc	1994-01-03 00:00:31 +0000	[diff] [blame]	320	identifying groups by their group name.
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	321	\end{funcdesc}
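The indexing rules for \code{group()} carry over to the \module{re} module, with the difference that there the method belongs to the match object returned by \code{search()} rather than to the compiled expression. A minimal sketch, assuming \module{re}:

```python
import re

prog = re.compile(r'(\w+)@(\w+)')
m = prog.search('mail guido@python now')

whole = m.group(0)      # index 0: the entire matching string
user = m.group(1)       # a single index gives a single string
both = m.group(1, 2)    # several indices give a tuple

# A group that took no part in the match yields None.
unmatched = re.match('(a)|(b)', 'a').group(2)
```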
				322
				323	\noindent
				324	Compiled regular expressions support these data attributes:
				325
Fred Drake	1947991	1998-02-13 06:58:54 +0000	[diff] [blame]	326	\setindexsubitem{(regex attribute)}
Guido van Rossum	326c0bc	1994-01-03 00:00:31 +0000	[diff] [blame]	327
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	328	\begin{datadesc}{regs}
Fred Drake	054f8fd	1998-01-12 18:28:20 +0000	[diff] [blame]	329	When the last call to the \code{match()} or \code{search()} method found a
				330	match, this is a tuple of pairs of indexes corresponding to the
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	331	beginning and end of all parenthesized groups in the pattern. Indices
Fred Drake	054f8fd	1998-01-12 18:28:20 +0000	[diff] [blame]	332	are relative to the string argument passed to \code{match()} or
				333	\code{search()}. The 0-th tuple gives the beginning and end of the
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	334	whole pattern. When the last match or search failed, this is
				335	\code{None}.
				336	\end{datadesc}
				337
				338	\begin{datadesc}{last}
Fred Drake	054f8fd	1998-01-12 18:28:20 +0000	[diff] [blame]	339	When the last call to the \code{match()} or \code{search()} method found a
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	340	match, this is the string argument passed to that method. When the
				341	last match or search failed, this is \code{None}.
				342	\end{datadesc}
				343
				344	\begin{datadesc}{translate}
				345	This is the value of the \var{translate} argument to
Fred Drake	054f8fd	1998-01-12 18:28:20 +0000	[diff] [blame]	346	\code{regex.compile()} that created this regular expression object. If
				347	the \var{translate} argument was omitted in the \code{regex.compile()}
Guido van Rossum	5fdeeea	1994-01-02 01:22:07 +0000	[diff] [blame]	348	call, this is \code{None}.
				349	\end{datadesc}
Guido van Rossum	326c0bc	1994-01-03 00:00:31 +0000	[diff] [blame]	350
				351	\begin{datadesc}{givenpat}
Fred Drake	054f8fd	1998-01-12 18:28:20 +0000	[diff] [blame]	352	The regular expression pattern as passed to \code{compile()} or
				353	\code{symcomp()}.
Guido van Rossum	326c0bc	1994-01-03 00:00:31 +0000	[diff] [blame]	354	\end{datadesc}
				355
				356	\begin{datadesc}{realpat}
				357	The regular expression after stripping the group names for regular
Fred Drake	054f8fd	1998-01-12 18:28:20 +0000	[diff] [blame]	358	expressions compiled with \code{symcomp()}. Same as \code{givenpat}
Guido van Rossum	326c0bc	1994-01-03 00:00:31 +0000	[diff] [blame]	359	otherwise.
				360	\end{datadesc}
				361
				362	\begin{datadesc}{groupindex}
				363	A dictionary giving the mapping from symbolic group names to numerical
Fred Drake	054f8fd	1998-01-12 18:28:20 +0000	[diff] [blame]	364	group indexes for regular expressions compiled with \code{symcomp()}.
Guido van Rossum	326c0bc	1994-01-03 00:00:31 +0000	[diff] [blame]	365	\code{None} otherwise.
				366	\end{datadesc}