\section{\module{regsub} ---
         String operations using regular expressions}

\declaremodule{standard}{regsub}
\modulesynopsis{Substitution and splitting operations that use
                regular expressions.}


This module defines a number of functions useful for working with
regular expressions (see built-in module \refmodule{regex}).

Warning: these functions are not thread-safe.

\strong{Obsolescence note:}
This module is obsolete as of Python version 1.5; it is still being
maintained because much existing code still uses it.  All new code in
need of regular expressions should use the new \refmodule{re} module, which
supports the more powerful and regular Perl-style regular expressions.
Existing code should be converted.  The standard library module
\module{reconvert} helps in converting \refmodule{regex} style regular
expressions to \refmodule{re} style regular expressions.  (For more
conversion help, see Andrew Kuchling's\index{Kuchling, Andrew}
``regex-to-re HOWTO'' at
\url{http://www.python.org/doc/howto/regex-to-re/}.)


\begin{funcdesc}{sub}{pat, repl, str}
Replace the first occurrence of pattern \var{pat} in string
\var{str} by replacement \var{repl}.  If the pattern isn't found,
the string is returned unchanged.  The pattern may be a string or an
already compiled pattern.  The replacement may contain references
\samp{\e \var{digit}} to subpatterns and escaped backslashes.
\end{funcdesc}
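
The behaviour described for \function{sub()} can be approximated with
the \refmodule{re} module.  The following is only a sketch under stated
assumptions: \code{sub\_first()} is a hypothetical helper name, not part
of either module, and the pattern is written in \refmodule{re} syntax,
which differs from \refmodule{regex} syntax.

```python
import re


def sub_first(pat, repl, s):
    """Replace only the first match of pat in s (hypothetical helper)."""
    # count=1 limits re.sub() to a single replacement.
    return re.sub(pat, repl, s, count=1)


# \1 and \2 in the replacement refer to the parenthesized subpatterns.
result = sub_first(r"(\w+)@(\w+)", r"\2 at \1", "joe@example and ann@host")

# No match: the input string comes back unchanged.
unchanged = sub_first(r"xyz", "-", "abc")

print(result)
print(unchanged)
```

Only the first address is rewritten; the second one is left alone
because the replacement count is limited to one.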

\begin{funcdesc}{gsub}{pat, repl, str}
Replace all (non-overlapping) occurrences of pattern \var{pat} in
string \var{str} by replacement \var{repl}.  The same rules as for
\code{sub()} apply.  Empty matches for the pattern are replaced only
when not adjacent to a previous match, so e.g.
\code{gsub('', '-', 'abc')} returns \code{'-a-b-c-'}.
\end{funcdesc}
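
As a rough illustration of global replacement, the sketch below uses
\code{re.sub()} from the \refmodule{re} module.  \code{gsub\_sketch()}
is a hypothetical name, and this is an approximation only: the way
\refmodule{re} treats empty matches adjacent to a previous match has
varied between Python versions and may not agree with the rule stated
above in every case.

```python
import re


def gsub_sketch(pat, repl, s):
    """Replace every non-overlapping match of pat in s (hypothetical helper)."""
    # Without a count argument, re.sub() replaces all matches.
    return re.sub(pat, repl, s)


# An empty pattern matches between every pair of characters and at both ends.
dashed = gsub_sketch("", "-", "abc")

# Matches do not overlap: "aaaaa" holds two "aa" matches plus one leftover "a".
pairs = gsub_sketch("aa", "X", "aaaaa")

# Subpattern references work as they do for a single replacement.
backref = gsub_sketch(r"(\d+)", r"<\1>", "a1b22")

print(dashed, pairs, backref)
```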

\begin{funcdesc}{split}{str, pat\optional{, maxsplit}}
Split the string \var{str} into fields separated by delimiters matching
the pattern \var{pat}, and return a list containing the fields.  Only
non-empty matches for the pattern are considered, so e.g.
\code{split('a:b', ':*')} returns \code{['a', 'b']} and
\code{split('abc', '')} returns \code{['abc']}.  \var{maxsplit}
defaults to 0.  If it is nonzero, at most \var{maxsplit} splits
occur, and the remainder of the string is returned as the final
element of the list.
\end{funcdesc}
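
The rule that only non-empty matches act as delimiters can be sketched
with \code{re.finditer()} from the \refmodule{re} module.
\code{split\_nonempty()} is a hypothetical helper written for this
illustration; it follows the description above and is not the
module's actual implementation.

```python
import re


def split_nonempty(s, pat, maxsplit=0):
    """Split s on non-empty matches of pat (hypothetical helper)."""
    fields = []
    start = 0
    for m in re.finditer(pat, s):
        if m.end() == m.start():
            continue  # empty matches never act as delimiters
        if maxsplit and len(fields) >= maxsplit:
            break  # split limit reached; the rest stays in one piece
        fields.append(s[start:m.start()])
        start = m.end()
    # Whatever follows the last delimiter is the final field.
    fields.append(s[start:])
    return fields


print(split_nonempty("a:b", ":*"))
print(split_nonempty("abc", ""))
print(split_nonempty("a:b:c:d", ":", 2))
```

Note that the string comes first and the pattern second, as in the
signature above; \code{re.split()} takes them in the opposite order.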

\begin{funcdesc}{splitx}{str, pat\optional{, maxsplit}}
Split the string \var{str} into fields separated by delimiters matching
the pattern \var{pat}, and return a list containing the fields as well
as the separators.  For example, \code{splitx('a:::b', ':*')} returns
\code{['a', ':::', 'b']}.  Otherwise, this function behaves the same
as \code{split()}.
\end{funcdesc}
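
A comparable sketch that also keeps the separators, again built on
\code{re.finditer()} from the \refmodule{re} module.  The name
\code{splitx\_sketch()} is hypothetical, and the handling of
\var{maxsplit} is an assumption based on the description above.

```python
import re


def splitx_sketch(s, pat, maxsplit=0):
    """Split s on non-empty matches of pat, keeping the separators."""
    pieces = []
    start = 0
    splits = 0
    for m in re.finditer(pat, s):
        if m.end() == m.start():
            continue  # empty matches are not treated as separators
        if maxsplit and splits >= maxsplit:
            break
        pieces.append(s[start:m.start()])  # the field before the separator
        pieces.append(m.group(0))          # the separator text itself
        start = m.end()
        splits += 1
    pieces.append(s[start:])
    return pieces


print(splitx_sketch("a:::b", ":*"))
print(splitx_sketch("a:b:c", ":", 1))
```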

\begin{funcdesc}{capwords}{s\optional{, pat}}
Capitalize words separated by optional pattern \var{pat}.  The default
pattern uses any characters except letters, digits and underscores as
word delimiters.  Capitalization is done by changing the first
character of each word to upper case.
\end{funcdesc}
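
The capitalization rule can be sketched as follows.  Both the helper
name \code{capwords\_sketch()} and the default pattern
\code{[\textasciicircum a-zA-Z0-9\_]+} are assumptions: the pattern is
one way of writing ``anything except letters, digits and underscores''
in \refmodule{re} syntax, not the module's own default.

```python
import re


def _cap(word):
    """Upper-case the first character of word, leaving the rest alone."""
    return word[:1].upper() + word[1:]


def capwords_sketch(s, pat=r"[^a-zA-Z0-9_]+"):
    """Capitalize the words of s that are separated by matches of pat."""
    out = []
    start = 0
    for m in re.finditer(pat, s):
        if m.end() == m.start():
            continue  # skip empty matches
        out.append(_cap(s[start:m.start()]))  # the word before the delimiter
        out.append(m.group(0))                # the delimiter, kept verbatim
        start = m.end()
    out.append(_cap(s[start:]))
    return "".join(out)


print(capwords_sketch("hello, big_world 42x"))
print(capwords_sketch("a:b", ":"))
```

Only the first character of each word changes, so \code{big\_world}
becomes \code{Big\_world} and a word starting with a digit is left as
it is.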

\begin{funcdesc}{clear_cache}{}
The \module{regsub} module maintains a cache of compiled regular
expressions, keyed on the regular expression string and the syntax of
the \refmodule{regex} module at the time the expression was compiled.
This function clears that cache.
\end{funcdesc}
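
The caching scheme described above might look roughly like this.  It is
an illustrative sketch only: \code{\_cache} and \code{cached\_compile()}
are hypothetical names, and the \var{flags} argument of the
\refmodule{re} module stands in for the \refmodule{regex} syntax setting
as the second part of the key.

```python
import re

# Hypothetical cache: (pattern string, flags) -> compiled pattern object.
_cache = {}


def cached_compile(pat, flags=0):
    """Return a compiled pattern, compiling only on the first request."""
    key = (pat, flags)
    if key not in _cache:
        _cache[key] = re.compile(pat, flags)
    return _cache[key]


def clear_cache():
    """Forget every compiled pattern held in the cache."""
    _cache.clear()


first = cached_compile("a+")
second = cached_compile("a+")                 # served from the cache
other = cached_compile("a+", re.IGNORECASE)   # different key, new entry
size_before = len(_cache)
clear_cache()
size_after = len(_cache)
print(first is second, size_before, size_after)
```

Including the second component in the key keeps a pattern compiled
under one setting from being reused after the setting changes.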