blob: 35d3e3a9106d3c31cd814e4ed9d42a3054b69206 [file] [log] [blame]
Guido van Rossum2850d182000-06-30 16:25:20 +00001# module 're' -- A collection of regular expression operations
2
3"""Support for regular expressions (RE).
4
5This module provides regular expression matching operations similar to
6those found in Perl. It's 8-bit clean: the strings being processed may
7contain both null bytes and characters whose high bit is set. Regular
8expression pattern strings may not contain null bytes, but can specify
9the null byte using the \\number notation. Characters with the high
10bit set may be included.
11
12Regular expressions can contain both special and ordinary
13characters. Most ordinary characters, like "A", "a", or "0", are the
14simplest regular expressions; they simply match themselves. You can
15concatenate ordinary characters, so last matches the string 'last'.
16
17The special characters are:
18 "." Matches any character except a newline.
19 "^" Matches the start of the string.
20 "$" Matches the end of the string.
21 "*" Matches 0 or more (greedy) repetitions of the preceding RE.
22 Greedy means that it will match as many repetitions as possible.
23 "+" Matches 1 or more (greedy) repetitions of the preceding RE.
24 "?" Matches 0 or 1 (greedy) of the preceding RE.
25 *?,+?,?? Non-greedy versions of the previous three special characters.
26 {m,n} Matches from m to n repetitions of the preceding RE.
27 {m,n}? Non-greedy version of the above.
28 "\\" Either escapes special characters or signals a special sequence.
29 [] Indicates a set of characters.
30 A "^" as the first character indicates a complementing set.
31 "|" A|B, creates an RE that will match either A or B.
32 (...) Matches the RE inside the parentheses.
33 The contents can be retrieved or matched later in the string.
34 (?iLmsx) Set the I, L, M, S, or X flag for the RE.
35 (?:...) Non-grouping version of regular parentheses.
36 (?P<name>...) The substring matched by the group is accessible by name.
37 (?P=name) Matches the text matched earlier by the group named name.
38 (?#...) A comment; ignored.
39 (?=...) Matches if ... matches next, but doesn't consume the string.
40 (?!...) Matches if ... doesn't match next.
41
42The special sequences consist of "\\" and a character from the list
43below. If the ordinary character is not on the list, then the
44resulting RE will match the second character.
45 \\number Matches the contents of the group of the same number.
46 \\A Matches only at the start of the string.
Tim Peters2344fae2001-01-15 00:50:52 +000047 \\Z Matches only at the end of the string.
Guido van Rossum2850d182000-06-30 16:25:20 +000048 \\b Matches the empty string, but only at the start or end of a word.
49 \\B Matches the empty string, but not at the start or end of a word.
50 \\d Matches any decimal digit; equivalent to the set [0-9].
51 \\D Matches any non-digit character; equivalent to the set [^0-9].
52 \\s Matches any whitespace character; equivalent to [ \\t\\n\\r\\f\\v].
53 \\S Matches any non-whitespace character; equiv. to [^ \\t\\n\\r\\f\\v].
54 \\w Matches any alphanumeric character; equivalent to [a-zA-Z0-9_].
55 With LOCALE, it will match the set [0-9_] plus characters defined
56 as letters for the current locale.
57 \\W Matches the complement of \\w.
Tim Peters2344fae2001-01-15 00:50:52 +000058 \\\\ Matches a literal backslash.
Guido van Rossum2850d182000-06-30 16:25:20 +000059
60This module exports the following functions:
61 match Match a regular expression pattern to the beginning of a string.
62 search Search a string for the presence of a pattern.
63 sub Substitute occurrences of a pattern found in a string.
64 subn Same as sub, but also return the number of substitutions made.
65 split Split a string by the occurrences of a pattern.
66 findall Find all occurrences of a pattern in a string.
67 compile Compile a pattern into a RegexObject.
68 escape Backslash all non-alphanumerics in a string.
69
70This module exports the following classes:
71 RegexObject Holds a compiled regular expression pattern.
72 MatchObject Contains information about pattern matches.
73
74Some of the functions in this module takes flags as optional parameters:
75 I IGNORECASE Perform case-insensitive matching.
76 L LOCALE Make \w, \W, \b, \B, dependent on the current locale.
77 M MULTILINE "^" matches the beginning of lines as well as the string.
78 "$" matches the end of lines as well as the string.
79 S DOTALL "." matches any character at all, including the newline.
Thomas Wouters7e474022000-07-16 12:04:32 +000080 X VERBOSE Ignore whitespace and comments for nicer looking RE's.
Guido van Rossum2850d182000-06-30 16:25:20 +000081
82This module also defines an exception 'error'.
83
84"""
85
86
87import sys
Guido van Rossum2850d182000-06-30 16:25:20 +000088from pcre import *
89
90#
91# First, the public part of the interface:
92#
93
94# pcre.error and re.error should be the same, since exceptions can be
95# raised from either module.
96
97# compilation flags
98
99I = IGNORECASE
100L = LOCALE
101M = MULTILINE
Tim Peters2344fae2001-01-15 00:50:52 +0000102S = DOTALL
103X = VERBOSE
Guido van Rossum2850d182000-06-30 16:25:20 +0000104
105
106#
107#
108#
109
110_cache = {}
111_MAXCACHE = 20
112
113def _cachecompile(pattern, flags=0):
114 key = (pattern, flags)
115 try:
116 return _cache[key]
117 except KeyError:
118 pass
119 value = compile(pattern, flags)
120 if len(_cache) >= _MAXCACHE:
121 _cache.clear()
122 _cache[key] = value
123 return value
124
125def match(pattern, string, flags=0):
126 """match (pattern, string[, flags]) -> MatchObject or None
Tim Peters2344fae2001-01-15 00:50:52 +0000127
Guido van Rossum2850d182000-06-30 16:25:20 +0000128 If zero or more characters at the beginning of string match the
129 regular expression pattern, return a corresponding MatchObject
130 instance. Return None if the string does not match the pattern;
131 note that this is different from a zero-length match.
132
133 Note: If you want to locate a match anywhere in string, use
134 search() instead.
135
136 """
Tim Peters2344fae2001-01-15 00:50:52 +0000137
Guido van Rossum2850d182000-06-30 16:25:20 +0000138 return _cachecompile(pattern, flags).match(string)
Tim Peters2344fae2001-01-15 00:50:52 +0000139
Guido van Rossum2850d182000-06-30 16:25:20 +0000140def search(pattern, string, flags=0):
141 """search (pattern, string[, flags]) -> MatchObject or None
Tim Peters2344fae2001-01-15 00:50:52 +0000142
Guido van Rossum2850d182000-06-30 16:25:20 +0000143 Scan through string looking for a location where the regular
144 expression pattern produces a match, and return a corresponding
145 MatchObject instance. Return None if no position in the string
146 matches the pattern; note that this is different from finding a
147 zero-length match at some point in the string.
148
149 """
150 return _cachecompile(pattern, flags).search(string)
Tim Peters2344fae2001-01-15 00:50:52 +0000151
Guido van Rossum2850d182000-06-30 16:25:20 +0000152def sub(pattern, repl, string, count=0):
153 """sub(pattern, repl, string[, count=0]) -> string
Tim Peters2344fae2001-01-15 00:50:52 +0000154
Guido van Rossum2850d182000-06-30 16:25:20 +0000155 Return the string obtained by replacing the leftmost
156 non-overlapping occurrences of pattern in string by the
157 replacement repl. If the pattern isn't found, string is returned
158 unchanged. repl can be a string or a function; if a function, it
159 is called for every non-overlapping occurrence of pattern. The
160 function takes a single match object argument, and returns the
161 replacement string.
162
163 The pattern may be a string or a regex object; if you need to
164 specify regular expression flags, you must use a regex object, or
165 use embedded modifiers in a pattern; e.g.
166 sub("(?i)b+", "x", "bbbb BBBB") returns 'x x'.
167
168 The optional argument count is the maximum number of pattern
169 occurrences to be replaced; count must be a non-negative integer,
170 and the default value of 0 means to replace all occurrences.
171
172 """
173 if type(pattern) == type(''):
174 pattern = _cachecompile(pattern)
175 return pattern.sub(repl, string, count)
176
177def subn(pattern, repl, string, count=0):
178 """subn(pattern, repl, string[, count=0]) -> (string, num substitutions)
Tim Peters2344fae2001-01-15 00:50:52 +0000179
Guido van Rossum2850d182000-06-30 16:25:20 +0000180 Perform the same operation as sub(), but return a tuple
181 (new_string, number_of_subs_made).
182
183 """
184 if type(pattern) == type(''):
185 pattern = _cachecompile(pattern)
186 return pattern.subn(repl, string, count)
Tim Peters2344fae2001-01-15 00:50:52 +0000187
Guido van Rossum2850d182000-06-30 16:25:20 +0000188def split(pattern, string, maxsplit=0):
189 """split(pattern, string[, maxsplit=0]) -> list of strings
Tim Peters2344fae2001-01-15 00:50:52 +0000190
Guido van Rossum2850d182000-06-30 16:25:20 +0000191 Split string by the occurrences of pattern. If capturing
192 parentheses are used in pattern, then the text of all groups in
193 the pattern are also returned as part of the resulting list. If
194 maxsplit is nonzero, at most maxsplit splits occur, and the
195 remainder of the string is returned as the final element of the
196 list.
197
198 """
199 if type(pattern) == type(''):
200 pattern = _cachecompile(pattern)
201 return pattern.split(string, maxsplit)
202
203def findall(pattern, string):
204 """findall(pattern, string) -> list
Tim Peters2344fae2001-01-15 00:50:52 +0000205
Guido van Rossum2850d182000-06-30 16:25:20 +0000206 Return a list of all non-overlapping matches of pattern in
207 string. If one or more groups are present in the pattern, return a
208 list of groups; this will be a list of tuples if the pattern has
209 more than one group. Empty matches are included in the result.
210
211 """
212 if type(pattern) == type(''):
213 pattern = _cachecompile(pattern)
214 return pattern.findall(string)
215
216def escape(pattern):
217 """escape(string) -> string
Tim Peters2344fae2001-01-15 00:50:52 +0000218
Guido van Rossum2850d182000-06-30 16:25:20 +0000219 Return string with all non-alphanumerics backslashed; this is
220 useful if you want to match an arbitrary literal string that may
221 have regular expression metacharacters in it.
222
223 """
224 result = list(pattern)
Guido van Rossum2850d182000-06-30 16:25:20 +0000225 for i in range(len(pattern)):
226 char = pattern[i]
Eric S. Raymond6e025bc2001-02-10 00:22:33 +0000227 if not char.isalnum():
Guido van Rossum2850d182000-06-30 16:25:20 +0000228 if char=='\000': result[i] = '\\000'
229 else: result[i] = '\\'+char
Eric S. Raymondec3bbde2001-02-09 09:39:08 +0000230 return ''.join(result)
Guido van Rossum2850d182000-06-30 16:25:20 +0000231
232def compile(pattern, flags=0):
233 """compile(pattern[, flags]) -> RegexObject
234
235 Compile a regular expression pattern into a regular expression
236 object, which can be used for matching using its match() and
237 search() methods.
238
239 """
240 groupindex={}
241 code=pcre_compile(pattern, flags, groupindex)
242 return RegexObject(pattern, flags, code, groupindex)
Tim Peters2344fae2001-01-15 00:50:52 +0000243
Guido van Rossum2850d182000-06-30 16:25:20 +0000244
245#
246# Class definitions
247#
248
249class RegexObject:
250 """Holds a compiled regular expression pattern.
251
252 Methods:
253 match Match the pattern to the beginning of a string.
254 search Search a string for the presence of the pattern.
255 sub Substitute occurrences of the pattern found in a string.
256 subn Same as sub, but also return the number of substitutions made.
257 split Split a string by the occurrences of the pattern.
258 findall Find all occurrences of the pattern in a string.
Tim Peters2344fae2001-01-15 00:50:52 +0000259
Guido van Rossum2850d182000-06-30 16:25:20 +0000260 """
261
262 def __init__(self, pattern, flags, code, groupindex):
Tim Peters2344fae2001-01-15 00:50:52 +0000263 self.code = code
Guido van Rossum2850d182000-06-30 16:25:20 +0000264 self.flags = flags
265 self.pattern = pattern
266 self.groupindex = groupindex
267
268 def search(self, string, pos=0, endpos=None):
269 """search(string[, pos][, endpos]) -> MatchObject or None
Tim Peters2344fae2001-01-15 00:50:52 +0000270
Guido van Rossum2850d182000-06-30 16:25:20 +0000271 Scan through string looking for a location where this regular
272 expression produces a match, and return a corresponding
273 MatchObject instance. Return None if no position in the string
274 matches the pattern; note that this is different from finding
275 a zero-length match at some point in the string. The optional
276 pos and endpos parameters have the same meaning as for the
277 match() method.
Tim Peters2344fae2001-01-15 00:50:52 +0000278
Guido van Rossum2850d182000-06-30 16:25:20 +0000279 """
Tim Peters2344fae2001-01-15 00:50:52 +0000280 if endpos is None or endpos>len(string):
Guido van Rossum2850d182000-06-30 16:25:20 +0000281 endpos=len(string)
282 if endpos<pos: endpos=pos
283 regs = self.code.match(string, pos, endpos, 0)
284 if regs is None:
285 return None
286 self._num_regs=len(regs)
Tim Peters2344fae2001-01-15 00:50:52 +0000287
Guido van Rossum2850d182000-06-30 16:25:20 +0000288 return MatchObject(self,
289 string,
290 pos, endpos,
291 regs)
Tim Peters2344fae2001-01-15 00:50:52 +0000292
Guido van Rossum2850d182000-06-30 16:25:20 +0000293 def match(self, string, pos=0, endpos=None):
294 """match(string[, pos][, endpos]) -> MatchObject or None
Tim Peters2344fae2001-01-15 00:50:52 +0000295
Guido van Rossum2850d182000-06-30 16:25:20 +0000296 If zero or more characters at the beginning of string match
297 this regular expression, return a corresponding MatchObject
298 instance. Return None if the string does not match the
299 pattern; note that this is different from a zero-length match.
300
301 Note: If you want to locate a match anywhere in string, use
302 search() instead.
303
304 The optional second parameter pos gives an index in the string
305 where the search is to start; it defaults to 0. This is not
306 completely equivalent to slicing the string; the '' pattern
307 character matches at the real beginning of the string and at
308 positions just after a newline, but not necessarily at the
309 index where the search is to start.
310
311 The optional parameter endpos limits how far the string will
312 be searched; it will be as if the string is endpos characters
313 long, so only the characters from pos to endpos will be
314 searched for a match.
315
316 """
Tim Peters2344fae2001-01-15 00:50:52 +0000317 if endpos is None or endpos>len(string):
Guido van Rossum2850d182000-06-30 16:25:20 +0000318 endpos=len(string)
319 if endpos<pos: endpos=pos
320 regs = self.code.match(string, pos, endpos, ANCHORED)
321 if regs is None:
322 return None
323 self._num_regs=len(regs)
324 return MatchObject(self,
325 string,
326 pos, endpos,
327 regs)
Tim Peters2344fae2001-01-15 00:50:52 +0000328
Guido van Rossum2850d182000-06-30 16:25:20 +0000329 def sub(self, repl, string, count=0):
330 """sub(repl, string[, count=0]) -> string
Tim Peters2344fae2001-01-15 00:50:52 +0000331
Guido van Rossum2850d182000-06-30 16:25:20 +0000332 Return the string obtained by replacing the leftmost
333 non-overlapping occurrences of the compiled pattern in string
334 by the replacement repl. If the pattern isn't found, string is
335 returned unchanged.
336
337 Identical to the sub() function, using the compiled pattern.
Tim Peters2344fae2001-01-15 00:50:52 +0000338
Guido van Rossum2850d182000-06-30 16:25:20 +0000339 """
340 return self.subn(repl, string, count)[0]
Tim Peters2344fae2001-01-15 00:50:52 +0000341
342 def subn(self, repl, source, count=0):
Guido van Rossum2850d182000-06-30 16:25:20 +0000343 """subn(repl, string[, count=0]) -> tuple
Tim Peters2344fae2001-01-15 00:50:52 +0000344
Guido van Rossum2850d182000-06-30 16:25:20 +0000345 Perform the same operation as sub(), but return a tuple
346 (new_string, number_of_subs_made).
347
348 """
349 if count < 0:
350 raise error, "negative substitution count"
351 if count == 0:
352 count = sys.maxint
353 n = 0 # Number of matches
354 pos = 0 # Where to start searching
355 lastmatch = -1 # End of last match
356 results = [] # Substrings making up the result
357 end = len(source)
358
359 if type(repl) is type(''):
360 # See if repl contains group references
361 try:
362 repl = pcre_expand(_Dummy, repl)
363 except:
364 m = MatchObject(self, source, 0, end, [])
365 repl = lambda m, repl=repl, expand=pcre_expand: expand(m, repl)
366 else:
367 m = None
368 else:
369 m = MatchObject(self, source, 0, end, [])
370
371 match = self.code.match
372 append = results.append
373 while n < count and pos <= end:
374 regs = match(source, pos, end, 0)
375 if not regs:
376 break
377 self._num_regs = len(regs)
378 i, j = regs[0]
379 if i == j == lastmatch:
380 # Empty match adjacent to previous match
381 pos = pos + 1
382 append(source[lastmatch:pos])
383 continue
384 if pos < i:
385 append(source[pos:i])
386 if m:
387 m.pos = pos
388 m.regs = regs
389 append(repl(m))
390 else:
391 append(repl)
392 pos = lastmatch = j
393 if i == j:
394 # Last match was empty; don't try here again
395 pos = pos + 1
396 append(source[lastmatch:pos])
397 n = n + 1
398 append(source[pos:])
Eric S. Raymondec3bbde2001-02-09 09:39:08 +0000399 return (''.join(results), n)
Tim Peters2344fae2001-01-15 00:50:52 +0000400
Guido van Rossum2850d182000-06-30 16:25:20 +0000401 def split(self, source, maxsplit=0):
402 """split(source[, maxsplit=0]) -> list of strings
Tim Peters2344fae2001-01-15 00:50:52 +0000403
Guido van Rossum2850d182000-06-30 16:25:20 +0000404 Split string by the occurrences of the compiled pattern. If
405 capturing parentheses are used in the pattern, then the text
406 of all groups in the pattern are also returned as part of the
407 resulting list. If maxsplit is nonzero, at most maxsplit
408 splits occur, and the remainder of the string is returned as
409 the final element of the list.
Tim Peters2344fae2001-01-15 00:50:52 +0000410
Guido van Rossum2850d182000-06-30 16:25:20 +0000411 """
412 if maxsplit < 0:
413 raise error, "negative split count"
414 if maxsplit == 0:
415 maxsplit = sys.maxint
416 n = 0
417 pos = 0
418 lastmatch = 0
419 results = []
420 end = len(source)
421 match = self.code.match
422 append = results.append
423 while n < maxsplit:
424 regs = match(source, pos, end, 0)
425 if not regs:
426 break
427 i, j = regs[0]
428 if i == j:
429 # Empty match
430 if pos >= end:
431 break
432 pos = pos+1
433 continue
434 append(source[lastmatch:i])
435 rest = regs[1:]
436 if rest:
437 for a, b in rest:
438 if a == -1 or b == -1:
439 group = None
440 else:
441 group = source[a:b]
442 append(group)
443 pos = lastmatch = j
444 n = n + 1
445 append(source[lastmatch:])
446 return results
447
448 def findall(self, source):
449 """findall(source) -> list
Tim Peters2344fae2001-01-15 00:50:52 +0000450
Guido van Rossum2850d182000-06-30 16:25:20 +0000451 Return a list of all non-overlapping matches of the compiled
452 pattern in string. If one or more groups are present in the
453 pattern, return a list of groups; this will be a list of
454 tuples if the pattern has more than one group. Empty matches
455 are included in the result.
456
457 """
458 pos = 0
459 end = len(source)
460 results = []
461 match = self.code.match
462 append = results.append
463 while pos <= end:
464 regs = match(source, pos, end, 0)
465 if not regs:
466 break
467 i, j = regs[0]
468 rest = regs[1:]
469 if not rest:
470 gr = source[i:j]
471 elif len(rest) == 1:
472 a, b = rest[0]
473 gr = source[a:b]
474 else:
475 gr = []
476 for (a, b) in rest:
477 gr.append(source[a:b])
478 gr = tuple(gr)
479 append(gr)
480 pos = max(j, pos+1)
481 return results
482
483 # The following 3 functions were contributed by Mike Fletcher, and
484 # allow pickling and unpickling of RegexObject instances.
485 def __getinitargs__(self):
486 return (None,None,None,None) # any 4 elements, to work around
487 # problems with the
Tim Peters2344fae2001-01-15 00:50:52 +0000488 # pickle/cPickle modules not yet
Guido van Rossum2850d182000-06-30 16:25:20 +0000489 # ignoring the __init__ function
490 def __getstate__(self):
491 return self.pattern, self.flags, self.groupindex
492 def __setstate__(self, statetuple):
493 self.pattern = statetuple[0]
494 self.flags = statetuple[1]
495 self.groupindex = statetuple[2]
496 self.code = apply(pcre_compile, statetuple)
497
498class _Dummy:
499 # Dummy class used by _subn_string(). Has 'group' to avoid core dump.
500 group = None
501
502class MatchObject:
503 """Holds a compiled regular expression pattern.
504
505 Methods:
506 start Return the index of the start of a matched substring.
507 end Return the index of the end of a matched substring.
508 span Return a tuple of (start, end) of a matched substring.
509 groups Return a tuple of all the subgroups of the match.
510 group Return one or more subgroups of the match.
511 groupdict Return a dictionary of all the named subgroups of the match.
512
513 """
514
515 def __init__(self, re, string, pos, endpos, regs):
516 self.re = re
517 self.string = string
Tim Peters2344fae2001-01-15 00:50:52 +0000518 self.pos = pos
Guido van Rossum2850d182000-06-30 16:25:20 +0000519 self.endpos = endpos
520 self.regs = regs
Tim Peters2344fae2001-01-15 00:50:52 +0000521
Guido van Rossum2850d182000-06-30 16:25:20 +0000522 def start(self, g = 0):
523 """start([group=0]) -> int or None
Tim Peters2344fae2001-01-15 00:50:52 +0000524
Guido van Rossum2850d182000-06-30 16:25:20 +0000525 Return the index of the start of the substring matched by
526 group; group defaults to zero (meaning the whole matched
Andrew M. Kuchling2cb176f2000-09-04 03:19:48 +0000527 substring). Return -1 if group exists but did not contribute
Guido van Rossum2850d182000-06-30 16:25:20 +0000528 to the match.
529
530 """
531 if type(g) == type(''):
532 try:
533 g = self.re.groupindex[g]
534 except (KeyError, TypeError):
535 raise IndexError, 'group %s is undefined' % `g`
536 return self.regs[g][0]
Tim Peters2344fae2001-01-15 00:50:52 +0000537
Guido van Rossum2850d182000-06-30 16:25:20 +0000538 def end(self, g = 0):
539 """end([group=0]) -> int or None
Tim Peters2344fae2001-01-15 00:50:52 +0000540
Guido van Rossum2850d182000-06-30 16:25:20 +0000541 Return the indices of the end of the substring matched by
542 group; group defaults to zero (meaning the whole matched
Andrew M. Kuchling2cb176f2000-09-04 03:19:48 +0000543 substring). Return -1 if group exists but did not contribute
Guido van Rossum2850d182000-06-30 16:25:20 +0000544 to the match.
545
546 """
547 if type(g) == type(''):
548 try:
549 g = self.re.groupindex[g]
550 except (KeyError, TypeError):
551 raise IndexError, 'group %s is undefined' % `g`
552 return self.regs[g][1]
Tim Peters2344fae2001-01-15 00:50:52 +0000553
Guido van Rossum2850d182000-06-30 16:25:20 +0000554 def span(self, g = 0):
555 """span([group=0]) -> tuple
Tim Peters2344fae2001-01-15 00:50:52 +0000556
Guido van Rossum2850d182000-06-30 16:25:20 +0000557 Return the 2-tuple (m.start(group), m.end(group)). Note that
Andrew M. Kuchling2cb176f2000-09-04 03:19:48 +0000558 if group did not contribute to the match, this is (-1,
559 -1). Group defaults to zero (meaning the whole matched
Guido van Rossum2850d182000-06-30 16:25:20 +0000560 substring).
561
562 """
563 if type(g) == type(''):
564 try:
565 g = self.re.groupindex[g]
566 except (KeyError, TypeError):
567 raise IndexError, 'group %s is undefined' % `g`
568 return self.regs[g]
Tim Peters2344fae2001-01-15 00:50:52 +0000569
Guido van Rossum2850d182000-06-30 16:25:20 +0000570 def groups(self, default=None):
571 """groups([default=None]) -> tuple
Tim Peters2344fae2001-01-15 00:50:52 +0000572
Guido van Rossum2850d182000-06-30 16:25:20 +0000573 Return a tuple containing all the subgroups of the match, from
574 1 up to however many groups are in the pattern. The default
575 argument is used for groups that did not participate in the
576 match.
577
578 """
579 result = []
580 for g in range(1, self.re._num_regs):
581 a, b = self.regs[g]
582 if a == -1 or b == -1:
583 result.append(default)
584 else:
585 result.append(self.string[a:b])
586 return tuple(result)
587
588 def group(self, *groups):
589 """group([group1, group2, ...]) -> string or tuple
Tim Peters2344fae2001-01-15 00:50:52 +0000590
Guido van Rossum2850d182000-06-30 16:25:20 +0000591 Return one or more subgroups of the match. If there is a
592 single argument, the result is a single string; if there are
593 multiple arguments, the result is a tuple with one item per
594 argument. Without arguments, group1 defaults to zero (i.e. the
595 whole match is returned). If a groupN argument is zero, the
596 corresponding return value is the entire matching string; if
597 it is in the inclusive range [1..99], it is the string
598 matching the the corresponding parenthesized group. If a group
599 number is negative or larger than the number of groups defined
600 in the pattern, an IndexError exception is raised. If a group
601 is contained in a part of the pattern that did not match, the
602 corresponding result is None. If a group is contained in a
603 part of the pattern that matched multiple times, the last
604 match is returned.
605
606 If the regular expression uses the (?P<name>...) syntax, the
607 groupN arguments may also be strings identifying groups by
608 their group name. If a string argument is not used as a group
609 name in the pattern, an IndexError exception is raised.
610
611 """
612 if len(groups) == 0:
613 groups = (0,)
614 result = []
615 for g in groups:
616 if type(g) == type(''):
617 try:
618 g = self.re.groupindex[g]
619 except (KeyError, TypeError):
620 raise IndexError, 'group %s is undefined' % `g`
621 if g >= len(self.regs):
622 raise IndexError, 'group %s is undefined' % `g`
623 a, b = self.regs[g]
624 if a == -1 or b == -1:
625 result.append(None)
626 else:
627 result.append(self.string[a:b])
628 if len(result) > 1:
629 return tuple(result)
630 elif len(result) == 1:
631 return result[0]
632 else:
633 return ()
634
635 def groupdict(self, default=None):
636 """groupdict([default=None]) -> dictionary
Tim Peters2344fae2001-01-15 00:50:52 +0000637
Guido van Rossum2850d182000-06-30 16:25:20 +0000638 Return a dictionary containing all the named subgroups of the
639 match, keyed by the subgroup name. The default argument is
640 used for groups that did not participate in the match.
641
642 """
643 dict = {}
644 for name, index in self.re.groupindex.items():
645 a, b = self.regs[index]
646 if a == -1 or b == -1:
647 dict[name] = default
648 else:
649 dict[name] = self.string[a:b]
650 return dict