\section{\module{regsub} ---
         Substitution and splitting operations that use regular expressions.}
\declaremodule{standard}{regsub}

\modulesynopsis{Substitution and splitting operations that use regular expressions.}

This module defines a number of functions useful for working with
regular expressions (see built-in module \module{regex}).

\strong{Warning:} these functions are not thread-safe.

\strong{Obsolescence note:}
This module is obsolete as of Python version 1.5; it is still being
maintained because much existing code still uses it.  All new code in
need of regular expressions should use the new \module{re} module, which
supports the more powerful and more consistent Perl-style regular
expressions.  Existing code should be converted.  The standard library
module \module{reconvert} helps in converting \module{regex}-style regular
expressions to \module{re}-style regular expressions.  (For more
conversion help, see Andrew Kuchling's\index{Kuchling, Andrew}
``regex-to-re HOWTO'' at
\url{http://www.python.org/doc/howto/regex-to-re/}.)

\begin{funcdesc}{sub}{pat, repl, str}
Replace the first occurrence of pattern \var{pat} in string
\var{str} with replacement \var{repl}.  If the pattern isn't found,
the string is returned unchanged.  The pattern may be a string or an
already compiled pattern.  The replacement may contain references of
the form \samp{\e \var{digit}} to subpatterns of \var{pat}, as well as
escaped backslashes (\samp{\e\e}).
\end{funcdesc}

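The following is a minimal sketch, not the module's own implementation, of how \function{sub()} can be expressed with the \module{re} module recommended for new code.  It assumes \module{re} (Perl-style) pattern syntax rather than \module{regex} syntax, and it renames the \var{str} argument to \code{s} so the built-in \function{str()} is not shadowed.  Passing \code{count=1} to \function{re.sub()} limits the replacement to the first match; \module{re} replacement strings likewise understand \samp{\e\var{digit}} group references and \samp{\e\e}.

```python
import re

def sub(pat, repl, s):
    # Replace only the first match of pat (re syntax, or a compiled
    # re pattern) in s; count=1 stops re.sub() after one replacement.
    # If nothing matches, s is returned unchanged.
    return re.sub(pat, repl, s, count=1)
```

For example, \code{sub(r'(\e w+) (\e w+)', r'\e 2 \e 1', 'hello world again')} gives \code{'world hello again'} with this sketch.
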
\begin{funcdesc}{gsub}{pat, repl, str}
Replace all (non-overlapping) occurrences of pattern \var{pat} in
string \var{str} with replacement \var{repl}.  The same rules as for
\code{sub()} apply.  Empty matches for the pattern are replaced only
when they are not adjacent to a previous match; for example,
\code{gsub('', '-', 'abc')} returns \code{'-a-b-c-'}.
\end{funcdesc}

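\function{re.sub()} with its default \code{count=0} replaces every match, but in Python 3.7 and later it also replaces empty matches adjacent to a previous non-empty match (the \module{re} documentation notes that \code{re.sub('x*', '-', 'abxd')} returns \code{'-a-b--d-'}).  The sketch below, again assuming \module{re} syntax and \code{s} in place of \var{str}, walks the matches with \method{finditer()} and skips such empty matches so that it follows the rule stated above.  It is an illustration of that rule, not the module's code.

```python
import re

def gsub(pat, repl, s):
    # pat may be a pattern string (re syntax) or a compiled re pattern.
    prog = re.compile(pat) if isinstance(pat, str) else pat
    out = []
    pos = 0        # end of the text already copied into out
    last_end = -1  # end of the previous match that was replaced
    for m in prog.finditer(s):
        if m.start() == m.end() == last_end:
            continue  # empty match adjacent to the previous match: skip it
        out.append(s[pos:m.start()])
        out.append(m.expand(repl))  # expands \digit references and \\
        pos = last_end = m.end()
    out.append(s[pos:])
    return ''.join(out)
```

With this sketch, \code{gsub('', '-', 'abc')} gives \code{'-a-b-c-'} and \code{gsub('x*', '-', 'abxd')} gives \code{'-a-b-d-'}.
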
\begin{funcdesc}{split}{str, pat\optional{, maxsplit}}
Split the string \var{str} into fields separated by delimiters matching
the pattern \var{pat}, and return a list containing the fields.  Only
non-empty matches for the pattern are considered, so, for example,
\code{split('a:b', ':*')} returns \code{['a', 'b']} and
\code{split('abc', '')} returns \code{['abc']}.  The optional
\var{maxsplit} argument defaults to 0, meaning no limit.  If it is
nonzero, at most \var{maxsplit} splits occur, and the remainder of the
string is returned as the final element of the list.
\end{funcdesc}

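\function{re.split()} is not a drop-in replacement here: in Python 3.7 and later it also splits on empty matches, whereas \function{split()} ignores them.  A minimal sketch of the documented behavior, assuming \module{re} syntax and \code{s} in place of \var{str}, searches repeatedly with the compiled pattern's \method{search()} method and steps past empty matches.

```python
import re

def split(s, pat, maxsplit=0):
    # pat may be a pattern string (re syntax) or a compiled re pattern.
    prog = re.compile(pat) if isinstance(pat, str) else pat
    fields = []
    start = pos = 0  # start of the current field / next search position
    while pos <= len(s):
        m = prog.search(s, pos)
        if m is None:
            break
        a, b = m.span()
        if a == b:
            pos = a + 1  # empty match: never a delimiter, keep scanning
            continue
        fields.append(s[start:a])
        start = pos = b
        if maxsplit and len(fields) >= maxsplit:
            break  # the rest of s becomes the final field
    fields.append(s[start:])
    return fields
```

With this sketch, \code{split('a:b:c', ':', 1)} gives \code{['a', 'b:c']}.
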
\begin{funcdesc}{splitx}{str, pat\optional{, maxsplit}}
Split the string \var{str} into fields separated by delimiters matching
the pattern \var{pat}, and return a list containing the fields as well
as the separators.  For example, \code{splitx('a:::b', ':*')} returns
\code{['a', ':::', 'b']}.  Otherwise, this function behaves the same
as \code{split()}.
\end{funcdesc}

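A corresponding sketch of \function{splitx()}, under the same assumptions (\module{re} syntax, \code{s} in place of \var{str}, empty matches ignored), appends each matched separator right after the field it ends.  Because no text is dropped, joining the result with \code{''.join()} gives back the original string.

```python
import re

def splitx(s, pat, maxsplit=0):
    # pat may be a pattern string (re syntax) or a compiled re pattern.
    prog = re.compile(pat) if isinstance(pat, str) else pat
    parts = []
    start = pos = 0  # start of the current field / next search position
    splits = 0
    while pos <= len(s):
        m = prog.search(s, pos)
        if m is None:
            break
        a, b = m.span()
        if a == b:
            pos = a + 1  # empty match: never a separator
            continue
        parts.append(s[start:a])  # the field
        parts.append(s[a:b])      # the separator, kept by splitx()
        start = pos = b
        splits += 1
        if maxsplit and splits >= maxsplit:
            break
    parts.append(s[start:])
    return parts
```
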
\begin{funcdesc}{capwords}{s\optional{, pat}}
Capitalize the words in string \var{s}, where words are separated by
delimiters matching the optional pattern \var{pat}.  The default
pattern treats any characters except letters, digits and underscores as
word delimiters.  Capitalization is done by changing only the first
character of each word to uppercase.
\end{funcdesc}

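A minimal sketch of \function{capwords()} with \module{re}, assuming the default delimiter can be written as the \module{re} pattern \code{[\^{}a-zA-Z0-9\_]+} (that is, assuming ``letters'' means ASCII letters).  Each word has only its first character uppercased, and the delimiters between words are copied through unchanged.

```python
import re

def capwords(s, pat=r'[^a-zA-Z0-9_]+'):
    # Default delimiter (assumed): runs of anything but ASCII letters,
    # digits and underscores.
    out = []
    pos = 0  # start of the current word
    for m in re.finditer(pat, s):
        if m.start() == m.end():
            continue  # an empty match cannot act as a delimiter
        word = s[pos:m.start()]
        out.append(word[:1].upper() + word[1:])  # uppercase first char only
        out.append(m.group())                    # copy the delimiter as-is
        pos = m.end()
    word = s[pos:]
    out.append(word[:1].upper() + word[1:])
    return ''.join(out)
```

With the default pattern, \code{capwords('the quick brown-fox')} gives \code{'The Quick Brown-Fox'} in this sketch.
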
\begin{funcdesc}{clear_cache}{}
The \module{regsub} module maintains a cache of compiled regular
expressions, keyed on the regular expression string and the syntax of
the \module{regex} module at the time the expression was compiled.
This function clears that cache.
\end{funcdesc}