\section{\module{shlex} ---
         Simple lexical analysis}

\declaremodule{standard}{shlex}
\modulesynopsis{Simple lexical analysis for \UNIX{} shell-like languages.}
\moduleauthor{Eric S. Raymond}{esr@snark.thyrsus.com}
\sectionauthor{Eric S. Raymond}{esr@snark.thyrsus.com}

\versionadded{1.5.2}

The \class{shlex} class makes it easy to write lexical analyzers for
simple syntaxes resembling that of the \UNIX{} shell. This will often
be useful for writing minilanguages, e.g.\ in run control files for
Python applications.

\begin{classdesc}{shlex}{\optional{stream}, \optional{file}}
A \class{shlex} instance or subclass instance is a lexical analyzer
object. The first initialization argument, if present, specifies where to
read characters from. It must be a file- or stream-like object with
\method{read()} and \method{readline()} methods. If no argument is given,
input will be taken from \code{sys.stdin}. The second optional
argument is a filename string, which sets the initial value of the
\member{infile} member. If the stream argument is omitted or
equal to \code{sys.stdin}, this second argument defaults to ``stdin''.
\end{classdesc}
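
As a minimal sketch, assuming a current Python in which the constructor
still accepts a stream followed by a file name, a lexer can be built over
an in-memory stream. The name \file{demo.rc} and the input text are
made up for illustration.

```python
import io
import shlex

# An in-memory stream stands in for an open file; it provides the
# read() and readline() methods the lexer needs.
stream = io.StringIO("set color blue\n")

# The second argument only labels the input; "demo.rc" is a made-up name.
lexer = shlex.shlex(stream, "demo.rc")

name = lexer.infile          # the label passed as the second argument
first = lexer.get_token()    # first token read from the stream

print(name, first)
```

The stream argument is what the lexer reads from; the file name is only
recorded for later use, for example in error messages.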


\begin{seealso}
  \seemodule{ConfigParser}{Parser for configuration files similar to the
                           Windows \file{.ini} files.}
\end{seealso}


\subsection{shlex Objects \label{shlex-objects}}

A \class{shlex} instance has the following methods:

\begin{methoddesc}{get_token}{}
Return a token. If tokens have been stacked using
\method{push_token()}, pop a token off the stack. Otherwise, read one
from the input stream. If reading encounters an immediate
end-of-file, an empty string is returned.
\end{methoddesc}

\begin{methoddesc}{push_token}{str}
Push the argument onto the token stack.
\end{methoddesc}
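
The interplay of \method{get_token()} and \method{push_token()} can be
sketched as follows. This is an illustration only, assuming a current
Python and an in-memory stream; the input text is arbitrary.

```python
import io
import shlex

lexer = shlex.shlex(io.StringIO("alpha beta"))

first = lexer.get_token()    # read from the input stream
lexer.push_token(first)      # put it back on the token stack
again = lexer.get_token()    # popped from the stack, not re-read

# Drain the remaining input; an empty string marks end-of-file.
rest = []
while True:
    tok = lexer.get_token()
    if tok == "":
        break
    rest.append(tok)

print(first, again, rest)
```

Pushing a token back is a simple way to get one token of lookahead in a
hand-written parser.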

\begin{methoddesc}{read_token}{}
Read a raw token. Ignore the pushback stack, and do not interpret source
requests. (This is not ordinarily a useful entry point, and is
documented here only for the sake of completeness.)
\end{methoddesc}

\begin{methoddesc}{openhook}{filename}
When \class{shlex} detects a source request (see \member{source} below),
this method is given the following token as its argument, and is expected
to return a tuple consisting of a filename and an opened stream object.

Normally, this method just strips any quotes off the argument and
treats it as a filename, calling \code{open()} on it. It is exposed so that
you can use it to implement directory search paths, addition of
file extensions, and other namespace hacks.

There is no corresponding `close' hook, but a \class{shlex} instance will
call the \code{close()} method of the sourced input stream when it returns
EOF.
\end{methoddesc}

\begin{methoddesc}{error_leader}{\optional{file}, \optional{line}}
This method generates an error message leader in the format of a
\UNIX{} C compiler error label; the format is \code{'"\%s", line \%d: '},
where the \code{\%s} is replaced with the name of the current source file
and the \code{\%d} with the current input line number (the optional
arguments can be used to override these).

This convenience is provided to encourage \class{shlex} users to generate
error messages in the standard, parseable format understood by Emacs
and other \UNIX{} tools.
\end{methoddesc}
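
A short sketch of building error messages with \method{error_leader()},
assuming a current Python; the file names \file{demo.rc} and
\file{other.rc} and the message text are invented for the example.

```python
import io
import shlex

lexer = shlex.shlex(io.StringIO("good bad\n"), "demo.rc")

# With no arguments, the leader uses the lexer's current file name
# and line number (line 1 here, since nothing has been read yet).
default_leader = lexer.error_leader()

# Explicit values override the current file name and line number.
custom_leader = lexer.error_leader("other.rc", 12)

print(default_leader + "unexpected token")
print(custom_leader + "unexpected token")
```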

Instances of \class{shlex} subclasses have some public instance
variables which either control lexical analysis or can be used
for debugging:

\begin{memberdesc}{commenters}
The string of characters that are recognized as comment beginners.
All characters from the comment beginner to end of line are ignored.
Includes just \character{\#} by default.
\end{memberdesc}

\begin{memberdesc}{wordchars}
The string of characters that will accumulate into multi-character
tokens. By default, includes all \ASCII{} alphanumerics and
underscore.
\end{memberdesc}

\begin{memberdesc}{whitespace}
Characters that will be considered whitespace and skipped. Whitespace
bounds tokens. By default, includes space, tab, linefeed and
carriage-return.
\end{memberdesc}

\begin{memberdesc}{quotes}
Characters that will be considered string quotes. The token
accumulates until the same quote is encountered again (thus, different
quote types protect each other as in the shell). By default, includes
\ASCII{} single and double quotes.
\end{memberdesc}
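
The effect of these members can be sketched by lexing the same text
twice, once with the defaults and once with adjusted character classes.
This assumes a current Python in its default (non-POSIX) mode; the
helper \function{tokens()} and the sample text are hypothetical.

```python
import io
import shlex

def tokens(lexer):
    """Collect tokens until get_token() returns the empty string."""
    out = []
    while True:
        tok = lexer.get_token()
        if tok == "":
            return out
        out.append(tok)

text = 'path=/usr/local-bin "two words" ; trailing\n'

# Defaults: '/', '-' and ';' are not word characters, so each one
# comes back as its own token; the quoted string stays together.
default_tokens = tokens(shlex.shlex(io.StringIO(text)))

custom_lexer = shlex.shlex(io.StringIO(text))
custom_lexer.wordchars += "/-"   # let paths accumulate into one token
custom_lexer.commenters = ";"    # treat ';' as the comment beginner
custom_tokens = tokens(custom_lexer)

print(default_tokens)
print(custom_tokens)
```

Note that in this mode the quote characters remain part of the returned
token.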

\begin{memberdesc}{infile}
The name of the current input file, as initially set at class
instantiation time or stacked by later source requests. It may
be useful to examine this when constructing error messages.
\end{memberdesc}

\begin{memberdesc}{instream}
The input stream from which this \class{shlex} instance is reading
characters.
\end{memberdesc}

\begin{memberdesc}{source}
This member is \code{None} by default. If you assign a string to it, that
string will be recognized as a lexical-level inclusion request similar
to the `source' keyword in various shells. That is, the immediately
following token will be opened as a filename and input taken from that
stream until EOF, at which point the \code{close()} method of that
stream will be called and the input source will again become the
original input stream. Source requests may be stacked any number of
levels deep.
\end{memberdesc}
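
A sketch of a source request, assuming a current Python and the default
file-opening behavior. The keyword \code{include} and the temporary
files \file{main.rc} and \file{included.rc} are assumptions made for
the example.

```python
import os
import shlex
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # The file that will be pulled in by the source request.
    included = os.path.join(tmpdir, "included.rc")
    with open(included, "w") as f:
        f.write("inner1 inner2\n")

    # The main file names the included file in a quoted token.
    main_path = os.path.join(tmpdir, "main.rc")
    with open(main_path, "w") as f:
        f.write('before include "%s" after\n' % included)

    with open(main_path) as stream:
        lexer = shlex.shlex(stream, main_path)
        lexer.source = "include"   # token that triggers inclusion

        result = []
        while True:
            tok = lexer.get_token()
            if tok == "":
                break
            result.append(tok)

print(result)
```

The tokens of the included file appear in place of the request, and
lexing then resumes in the original stream.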

\begin{memberdesc}{debug}
If this member is numeric and 1 or more, a \class{shlex} instance will
print verbose progress output on its behavior. If you need to use this,
you can read the module source code to learn the details.
\end{memberdesc}

Note that any character not declared to be a word character,
whitespace, or a quote will be returned as a single-character token.

Quote and comment characters are not recognized within words. Thus,
the bare words \samp{ain't} and \samp{ain\#t} would be returned as single
tokens by the default parser.
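
The two rules above can be checked with a small sketch, assuming a
current Python in its default mode. The helper \function{tokens()} is
hypothetical, and the sketch covers only the quote-within-a-word case.

```python
import io
import shlex

def tokens(text):
    """Lex text with a default shlex instance and return its tokens."""
    lexer = shlex.shlex(io.StringIO(text))
    out = []
    while True:
        tok = lexer.get_token()
        if tok == "":
            return out
        out.append(tok)

punct = tokens("x=1;y")   # '=' and ';' are not word characters
word = tokens("ain't")    # the quote sits inside a word

print(punct)
print(word)
```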

\begin{memberdesc}{lineno}
Source line number (count of newlines seen so far plus one).
\end{memberdesc}

\begin{memberdesc}{token}
The token buffer. It may be useful to examine this when catching
exceptions.
\end{memberdesc}