% Skip Montanaro, 0588581, 2005-01-16 20:48:27 +0000
\section{\module{reconvert} ---
         Convert regular expressions from regex to re form}
\declaremodule{standard}{reconvert}
\moduleauthor{Andrew M. Kuchling}{amk@amk.ca}
\sectionauthor{Skip Montanaro}{skip@pobox.com}


\modulesynopsis{Convert regex-, emacs- or sed-style regular expressions
                to re-style syntax.}


This module provides a facility to convert regular expressions from the
syntax used by the deprecated \module{regex} module to that used by the
newer \module{re} module.  Because the regular expression syntaxes of
\code{sed(1)} and \code{emacs(1)} are similar to that of the
\module{regex} module, it can also be used to convert patterns written for
those tools to \module{re} patterns.

When used as a script, a Python string literal (or any other expression
evaluating to a string) is read from stdin, and the translated expression is
written to stdout as a string literal.  Unless stdout is a tty, no trailing
newline is written to stdout.  This is done so that it can be used with
Emacs \code{C-U M-|} (shell-command-on-region), which filters the region
through the shell command.
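
The script behaviour described above can be sketched roughly as follows.
This is an illustration under stated assumptions, not the module's own code:
\function{run_filter()} and its \var{translate} argument are hypothetical
names, \function{ast.literal_eval()} (which accepts only literals) stands in
for evaluating an arbitrary expression, and \function{repr()} stands in for
the module's own quoting.

```python
import ast
import io


def run_filter(stdin, stdout, translate):
    """Read a string literal, translate it, and write the result as a literal.

    `translate` is a hypothetical stand-in for the real conversion function.
    """
    text = ast.literal_eval(stdin.read().strip())
    stdout.write(repr(translate(text)))
    # Only add the newline for interactive use, so an editor that replaces
    # a region with the command's output does not gain a stray line break.
    if stdout.isatty():
        stdout.write("\n")


# Demonstration with in-memory streams; a StringIO is never a tty.
source = io.StringIO(r"'a\\(b\\)'" + "\n")
sink = io.StringIO()
run_filter(source, sink, lambda pattern: pattern.upper())
```

Because the in-memory output stream is not a tty, the value collected in
\code{sink} carries no trailing newline, which is the property that makes the
filter safe to use on an editor region.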

\begin{seealso}
  \seetitle{Mastering Regular Expressions}{Book on regular expressions
            by Jeffrey Friedl, published by O'Reilly.  The second
            edition of the book no longer covers Python at all,
            but the first edition covered writing good regular expression
            patterns in great detail.}
\end{seealso}

\subsection{Module Contents}
\nodename{Contents of Module reconvert}

The module defines two functions and a handful of constants.

\begin{funcdesc}{convert}{pattern\optional{, syntax=None}}
Convert a \var{pattern} representing a \module{regex}-style regular
expression into a \module{re}-style regular expression.  The optional
\var{syntax} parameter is a bitwise-or'd set of flags that controls which
constructs are converted.  See below for a description of the various
constants.
\end{funcdesc}
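
To give a feel for the kind of rewriting involved, the following sketch swaps
the escaped and unescaped forms of the grouping and alternation operators, as
one would for an \code{emacs}-style pattern.  It is not the implementation of
\function{convert()}: the \code{SWAPS} table and the
\function{swap_operators()} helper are hypothetical, and only these three
operators are handled.

```python
import re

# Illustrative mapping for emacs-style input: backslashed grouping and
# alternation operators become bare, and bare ones become literals.
SWAPS = {
    r"\(": "(",
    r"\)": ")",
    r"\|": "|",
    "(": r"\(",
    ")": r"\)",
    "|": r"\|",
}


def swap_operators(pattern):
    """Hypothetical helper: swap escaped and unescaped (, ) and |."""
    out = []
    i = 0
    while i < len(pattern):
        # Take a backslash together with the character it escapes.
        width = 2 if pattern[i] == "\\" and i + 1 < len(pattern) else 1
        token = pattern[i:i + width]
        out.append(SWAPS.get(token, token))
        i += width
    return "".join(out)


converted = swap_operators(r"\(foo\|bar\)(x)")
match = re.match(converted, "bar(x)")
```

Here the group \code{\e(foo\e|bar\e)} becomes a real \module{re} group, while
the bare parentheses around \code{x}, which are literal characters in the
source syntax, are escaped so that they stay literal.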

\begin{funcdesc}{quote}{s\optional{, quote=None}}
Convert a string object to a quoted string literal.

This is similar to \function{repr()} but will return a ``raw'' string
(\code{r'...'} or \code{r"..."}) when the string contains backslashes,
instead of doubling all backslashes.  The resulting string does not always
evaluate to the same string as the original; however, it behaves as
intended when passed to \function{re.compile()}.

The optional second argument forces the string quote; it must be a valid
Python string quote.  Note that prior to Python 2.5 this would not accept
triple-quoted string delimiters.
\end{funcdesc}
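
The idea behind the raw-string quoting can be sketched as follows.  The
\function{raw_quote()} helper is hypothetical and much simpler than
\function{quote()}: it assumes printable input and always uses the quote
character it is given.

```python
import ast
import re


def raw_quote(s, quote="'"):
    """Hypothetical helper: quote `s`, preferring a raw literal."""
    body = s.replace(quote, "\\" + quote)
    literal = quote + body + quote
    # Backslashes are left single; the r prefix keeps Python from
    # interpreting them when the literal is read back.
    if "\\" in literal:
        literal = "r" + literal
    return literal


plain = raw_quote("abc")
raw = raw_quote(r"\d+\.\d*")
tricky = raw_quote("it's")

# Reading the raw literal back keeps the backslash before the quote, so
# the round trip is not exact, yet the pattern still matches the original.
recovered = ast.literal_eval(tricky)
still_matches = re.fullmatch(recovered, "it's") is not None
```

The last two lines show why the result need not evaluate to the original
string: the escaped quote keeps its backslash inside a raw literal, which
\module{re} then reads as an ordinary quote character.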

\begin{datadesc}{RE_NO_BK_PARENS}
Suppress parenthesis conversion.  This should be omitted when converting
\code{sed}-style or \code{emacs}-style regular expressions.
\end{datadesc}

\begin{datadesc}{RE_NO_BK_VBAR}
Suppress vertical bar conversion.  This should be omitted when converting
\code{sed}-style or \code{emacs}-style regular expressions.
\end{datadesc}

\begin{datadesc}{RE_BK_PLUS_QM}
Enable conversion of \code{+} and \code{?} characters.  This should be
added to the \var{syntax} argument of \function{convert()} when converting
\code{sed}-style regular expressions and omitted when converting
\code{emacs}-style regular expressions.
\end{datadesc}

\begin{datadesc}{RE_NEWLINE_OR}
When set, newline characters are replaced by \code{|}.
\end{datadesc}
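
Since the flags are combined with bitwise or and tested with bitwise and,
choosing a \var{syntax} value for a given source dialect might look like the
sketch below.  The numeric values are placeholders chosen only so that the
flags can be combined, and \function{planned_conversions()} is a hypothetical
helper; in real code the constants should be taken from the module itself.

```python
# Placeholder bit values; these are assumptions and may differ from the
# constants the module actually defines.
RE_NO_BK_PARENS = 1
RE_NO_BK_VBAR = 2
RE_BK_PLUS_QM = 4
RE_NEWLINE_OR = 16

# Per the descriptions above: sed-style input adds RE_BK_PLUS_QM and
# omits the two suppressing flags; emacs-style input omits all three.
SED_SYNTAX = RE_BK_PLUS_QM
EMACS_SYNTAX = 0


def planned_conversions(syntax):
    """Hypothetical helper: name the conversions a syntax value selects."""
    plan = []
    if not syntax & RE_NO_BK_PARENS:   # suppressing flag is absent
        plan.append("parens")
    if not syntax & RE_NO_BK_VBAR:
        plan.append("vertical bar")
    if syntax & RE_BK_PLUS_QM:         # enabling flag is present
        plan.append("+ and ?")
    if syntax & RE_NEWLINE_OR:
        plan.append("newline to |")
    return plan


sed_plan = planned_conversions(SED_SYNTAX)
emacs_plan = planned_conversions(EMACS_SYNTAX)
combined = planned_conversions(RE_NO_BK_PARENS | RE_NEWLINE_OR)
```

Note the asymmetry the sketch makes explicit: the first two flags switch a
conversion off when present, whereas the last two switch one on.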