% Module and documentation by Eric S. Raymond, 21 Dec 1998

\section{\module{shlex} ---
         Simple lexical analysis}

\declaremodule{standard}{shlex}
\modulesynopsis{Simple lexical analysis for \UNIX{} shell-like languages.}
\moduleauthor{Eric S. Raymond}{esr@snark.thyrsus.com}
\sectionauthor{Eric S. Raymond}{esr@snark.thyrsus.com}

\versionadded{1.5.2}

The \class{shlex} class makes it easy to write lexical analyzers for
simple syntaxes resembling that of the \UNIX{} shell.  This will often
be useful for writing minilanguages, e.g.\ in run control files for
Python applications.

\begin{classdesc}{shlex}{\optional{stream}}
A \class{shlex} instance or subclass instance is a lexical analyzer
object.  The initialization argument, if present, specifies where to
read characters from.  It must be a file- or stream-like object with
\method{read()} and \method{readline()} methods.  If no argument is given,
input will be taken from \code{sys.stdin}.
\end{classdesc}
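
As a minimal sketch of constructing a lexer, the following passes an
in-memory stream to \class{shlex}.  It assumes a current Python 3
interpreter; the use of \code{io.StringIO} and the sample input are
illustrative choices, not something this section prescribes.

```python
import io
import shlex

# Any object with read() and readline() methods will do; io.StringIO is
# used here so the sketch does not depend on a real file or on sys.stdin.
stream = io.StringIO("set verbose on\n")
lexer = shlex.shlex(stream)

first = lexer.get_token()    # first word of the input
second = lexer.get_token()   # second word of the input
print(first, second)
```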


\begin{seealso}
  \seemodule{ConfigParser}{Parser for configuration files similar to the
                           Windows \file{.ini} files.}
\end{seealso}


\subsection{shlex Objects \label{shlex-objects}}

A \class{shlex} instance has the following methods:

\begin{methoddesc}{get_token}{}
Return a token.  If tokens have been stacked using
\method{push_token()}, pop a token off the stack.  Otherwise, read one
from the input stream.  If reading encounters an immediate
end-of-file, an empty string is returned.
\end{methoddesc}
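
Because \method{get_token()} signals end-of-file with an empty string,
a read loop can stop on that value.  The helper name
\code{read_all_tokens} below is hypothetical and the input is made up;
the sketch assumes a current Python 3 interpreter.

```python
import io
import shlex


def read_all_tokens(stream):
    """Collect tokens from stream until get_token() returns ''."""
    lexer = shlex.shlex(stream)
    tokens = []
    while True:
        token = lexer.get_token()
        if token == "":          # empty string marks end-of-file
            break
        tokens.append(token)
    return tokens


tokens = read_all_tokens(io.StringIO("alpha beta gamma\n"))
print(tokens)
```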

\begin{methoddesc}{push_token}{str}
Push the argument onto the token stack.
\end{methoddesc}
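
The interaction between \method{push_token()} and \method{get_token()}
can be sketched as follows.  The token strings are arbitrary examples,
and the sketch assumes a current Python 3 interpreter.

```python
import io
import shlex

lexer = shlex.shlex(io.StringIO("beta gamma\n"))

# A pushed token is returned before anything is read from the stream.
lexer.push_token("alpha")
first = lexer.get_token()    # comes from the token stack
second = lexer.get_token()   # comes from the input stream

# The stack is last-in, first-out.
lexer.push_token("x")
lexer.push_token("y")
popped = [lexer.get_token(), lexer.get_token()]
print(first, second, popped)
```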

Instances of \class{shlex} and its subclasses have some public instance
variables which either control lexical analysis or can be used
for debugging:

\begin{memberdesc}{commenters}
The string of characters that are recognized as comment beginners.
All characters from the comment beginner to end of line are ignored.
Includes just \character{\#} by default.
\end{memberdesc}
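
As an illustration, \member{commenters} can be extended so that a
second character also starts a comment.  Treating \character{;} as a
comment beginner is an assumption made for this sketch, as is the
sample input; a current Python 3 interpreter is assumed.

```python
import io
import shlex

text = "color red  # trailing note\n; legacy comment line\nsize 10\n"
lexer = shlex.shlex(io.StringIO(text))

# Recognize both '#' (the default) and ';' as comment beginners.
lexer.commenters = "#;"

# iter(callable, sentinel) calls get_token() until it returns "".
tokens = list(iter(lexer.get_token, ""))
print(tokens)
```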

\begin{memberdesc}{wordchars}
The string of characters that will accumulate into multi-character
tokens.  By default, includes all \ASCII{} alphanumerics and
underscore.
\end{memberdesc}
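
The effect of \member{wordchars} can be seen by lexing the same input
twice, once with the defaults and once with extra characters added.
The file name and the helper \code{tokenize} are hypothetical, and the
sketch assumes a current Python 3 interpreter.

```python
import io
import shlex


def tokenize(text, extra_wordchars=""):
    """Lex text, optionally widening the set of word characters."""
    lexer = shlex.shlex(io.StringIO(text))
    lexer.wordchars += extra_wordchars
    return list(iter(lexer.get_token, ""))


# With the defaults, '-' and '.' are not word characters, so each one
# splits the name and comes back as its own token.
default_tokens = tokenize("log-file.txt\n")

# Adding them to wordchars keeps the name together.
extended_tokens = tokenize("log-file.txt\n", extra_wordchars="-.")
print(default_tokens, extended_tokens)
```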

\begin{memberdesc}{whitespace}
Characters that will be considered whitespace and skipped.  Whitespace
bounds tokens.  By default, includes space, tab, linefeed and
carriage-return.
\end{memberdesc}
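
A short sketch of changing \member{whitespace}: here a comma is added
so that it separates tokens and is skipped like a space.  The choice
of comma and the sample input are assumptions for illustration, and a
current Python 3 interpreter is assumed.

```python
import io
import shlex

lexer = shlex.shlex(io.StringIO("a,b, c\n"))

# Treat ',' as whitespace in addition to the default characters.
lexer.whitespace += ","

tokens = list(iter(lexer.get_token, ""))
print(tokens)
```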

\begin{memberdesc}{quotes}
Characters that will be considered string quotes.  The token
accumulates until the same quote is encountered again (thus, different
quote types protect each other, as in the shell).  By default, includes
\ASCII{} single and double quotes.
\end{memberdesc}
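
The following sketch shows one quote type protecting the other.  It
assumes a current Python 3 interpreter with the default lexer
settings, where the quote characters stay part of the returned token;
the input text is made up.

```python
import io
import shlex

# The single quote inside the double-quoted string does not end it.
lexer = shlex.shlex(io.StringIO('say "it\'s here" now\n'))
tokens = list(iter(lexer.get_token, ""))
print(tokens)
```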

Note that any character not declared to be a word character,
whitespace, a quote, or a comment beginner will be returned as a
single-character token.
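
For example, with the default settings punctuation that belongs to
none of the character classes comes back one character at a time.
The input below is arbitrary, and a current Python 3 interpreter is
assumed.

```python
import io
import shlex

lexer = shlex.shlex(io.StringIO("x=1;\n"))

# '=' and ';' are not word characters, whitespace, quotes or comment
# beginners, so each one is returned as its own token.
tokens = list(iter(lexer.get_token, ""))
print(tokens)
```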

Quote and comment characters are not recognized within words.  Thus,
the bare words \samp{ain't} and \samp{ain\#t} would be returned as single
tokens by the default parser.

\begin{memberdesc}{lineno}
Source line number (count of newlines seen so far plus one).
\end{memberdesc}
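
A small sketch of how \member{lineno} follows the input, assuming a
current Python 3 interpreter.  Only the values before any reading and
after the whole input has been consumed are inspected, since the
counter advances as soon as the lexer reads a newline; the two-line
input is made up.

```python
import io
import shlex

lexer = shlex.shlex(io.StringIO("alpha\nbeta\n"))
start_line = lexer.lineno     # nothing has been read yet

# Drain the input; get_token() returns "" at end-of-file.
tokens = list(iter(lexer.get_token, ""))
end_line = lexer.lineno       # two newlines seen, plus one
print(start_line, end_line)
```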

\begin{memberdesc}{token}
The token buffer.  It may be useful to examine this when catching
exceptions.
\end{memberdesc}
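
As a sketch of examining \member{token} after a failure: in a current
Python 3 interpreter an unterminated quoted string makes
\method{get_token()} raise \exception{ValueError}.  Both that
exception and the sample input are assumptions of this example; the
text above does not say which exceptions the lexer raises.

```python
import io
import shlex

lexer = shlex.shlex(io.StringIO('name "unterminated value\n'))
first = lexer.get_token()

try:
    lexer.get_token()
    partial = None
except ValueError:
    # The buffer still holds what had been accumulated for the token
    # that could not be completed.
    partial = lexer.token

print(first, repr(partial))
```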