| \section{\module{shlex} --- |
| Simple lexical analysis} |
| |
| \declaremodule{standard}{shlex} |
| \modulesynopsis{Simple lexical analysis for \UNIX{} shell-like languages.} |
| \moduleauthor{Eric S. Raymond}{esr@snark.thyrsus.com} |
| \sectionauthor{Eric S. Raymond}{esr@snark.thyrsus.com} |
| |
| \versionadded{1.5.2} |
| |
| The \class{shlex} class makes it easy to write lexical analyzers for |
| simple syntaxes resembling that of the \UNIX{} shell. This will often |
| be useful for writing minilanguages, e.g.\ in run control files for |
| Python applications. |
| |
| \begin{classdesc}{shlex}{\optional{stream}, \optional{file}} |
| A \class{shlex} instance or subclass instance is a lexical analyzer |
| object. The initialization argument, if present, specifies where to |
| read characters from. It must be a file- or stream-like object with |
| \method{read()} and \method{readline()} methods. If no argument is given, |
| input will be taken from \code{sys.stdin}. The second optional |
| argument is a filename string, which sets the initial value of the |
| \member{infile} member. If the stream argument is omitted or |
| equal to \code{sys.stdin}, this second argument defauilts to ``stdin''. |
| \end{classdesc} |
| |
| |
| \begin{seealso} |
| \seemodule{ConfigParser}{Parser for configuration files similar to the |
| Windows \file{.ini} files.} |
| \end{seealso} |
| |
| |
| \subsection{shlex Objects \label{shlex-objects}} |
| |
| A \class{shlex} instance has the following methods: |
| |
| \begin{methoddesc}{get_token}{} |
| Return a token. If tokens have been stacked using |
| \method{push_token()}, pop a token off the stack. Otherwise, read one |
| from the input stream. If reading encounters an immediate |
| end-of-file, an empty string is returned. |
| \end{methoddesc} |
| |
| \begin{methoddesc}{push_token}{str} |
| Push the argument onto the token stack. |
| \end{methoddesc} |
| |
| \begin{methoddesc}{read_token}{} |
| Read a raw token. Ignore the pushback stack, and do not interpret source |
| requests. (This is not ordinarily a useful entry point, and is |
| documented here only for the sake of completeness.) |
| \end{methoddesc} |
| |
| \begin{methoddesc}{openhook}{filename} |
| When shlex detects a source request (see \member{source} below) |
| this method is given the following token as argument, and expected to |
| return a tuple consisting of a filename and an opened stream object. |
| |
| Normally, this method just strips any quotes off the argument and |
| treats it as a filename, calling \code{open()} on it. It is exposed so that |
| you can use it to implement directory search paths, addition of |
| file extensions, and other namespace hacks. |
| |
| There is no corresponding `close' hook, but a shlex instance will call |
| the \code{close()} method of the sourced input stream when it returns EOF. |
| \end{methoddesc} |
| |
| \begin{methoddesc}{error_leader}{\optional{file}, \optional{line}} |
| This method generates an error message leader in the format of a |
| Unix C compiler error label; the format is '"\%s", line \%d: ', |
| where the \%s is replaced with the name of the current source file and |
| the \%d with the current input line number (the optional arguments |
| can be used to override these). |
| |
| This convenience is provided to encourage shlex users to generate |
| error messages in the standard, parseable format understood by Emacs |
| and other Unix tools. |
| \end{methoddesc} |
| |
| Instances of \class{shlex} subclasses have some public instance |
| variables which either control lexical analysis or can be used |
| for debugging: |
| |
| \begin{memberdesc}{commenters} |
| The string of characters that are recognized as comment beginners. |
| All characters from the comment beginner to end of line are ignored. |
| Includes just \character{\#} by default. |
| \end{memberdesc} |
| |
| \begin{memberdesc}{wordchars} |
| The string of characters that will accumulate into multi-character |
| tokens. By default, includes all \ASCII{} alphanumerics and |
| underscore. |
| \end{memberdesc} |
| |
| \begin{memberdesc}{whitespace} |
| Characters that will be considered whitespace and skipped. Whitespace |
| bounds tokens. By default, includes space, tab, linefeed and |
| carriage-return. |
| \end{memberdesc} |
| |
| \begin{memberdesc}{quotes} |
| Characters that will be considered string quotes. The token |
| accumulates until the same quote is encountered again (thus, different |
| quote types protect each other as in the shell.) By default, includes |
| \ASCII{} single and double quotes. |
| \end{memberdesc} |
| |
| \begin{memberdesc}{infile} |
| The name of the current input file, as initially set at class |
| instantiation time or stacked by later source requests. It may |
| be useful to examine this when constructing error messages. |
| \end{memberdesc} |
| |
| \begin{memberdesc}{instream} |
| The input stream from which this shlex instance is reading characters. |
| \end{memberdesc} |
| |
| \begin{memberdesc}{source} |
| This member is None by default. If you assign a string to it, that |
| string will be recognized as a lexical-level inclusion request similar |
| to the `source' keyword in various shells. That is, the immediately |
| following token will opened as a filename and input taken from that |
| stream until EOF, at which point the \code{close()} method of that |
| stream will be called and the input source will again become the |
| original input stream. Source requests may be stacked any number of |
| levels deep. |
| \end{memberdesc} |
| |
| \begin{memberdesc}{debug} |
| If this member is numeric and 1 or more, a shlex instance will print |
| verbose progress output on its behavior. If you need to use this, |
| you can read the module source code to learn the details. |
| \end{memberdesc} |
| |
| Note that any character not declared to be a word character, |
| whitespace, or a quote will be returned as a single-character token. |
| |
| Quote and comment characters are not recognized within words. Thus, |
| the bare words \samp{ain't} and \samp{ain\#t} would be returned as single |
| tokens by the default parser. |
| |
| \begin{memberdesc}{lineno} |
| Source line number (count of newlines seen so far plus one). |
| \end{memberdesc} |
| |
| \begin{memberdesc}{token} |
| The token buffer. It may be useful to examine this when catching |
| exceptions. |
| \end{memberdesc} |