\section{\module{shlex} ---
         Simple lexical analysis}

\declaremodule{standard}{shlex}
\modulesynopsis{Simple lexical analysis for \UNIX{} shell-like languages.}
\moduleauthor{Eric S. Raymond}{esr@snark.thyrsus.com}
\sectionauthor{Eric S. Raymond}{esr@snark.thyrsus.com}

\versionadded{1.5.2}

The \class{shlex} class makes it easy to write lexical analyzers for
simple syntaxes resembling that of the \UNIX{} shell. This will often
be useful for writing minilanguages, e.g.\ in run control files for
Python applications.

\begin{classdesc}{shlex}{\optional{stream}, \optional{file}}
A \class{shlex} instance or subclass instance is a lexical analyzer
object. The first initialization argument, if present, specifies where to
read characters from. It must be a file- or stream-like object with
\method{read()} and \method{readline()} methods. If no argument is given,
input will be taken from \code{sys.stdin}. The second optional
argument is a filename string, which sets the initial value of the
\member{infile} member. If the stream argument is omitted or
equal to \code{sys.stdin}, this second argument defaults to ``stdin''.
\end{classdesc}
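
The constructor arguments can be illustrated with a minimal sketch. It
assumes a current Python 3 interpreter, where an \code{io.StringIO} object
serves as the stream; the filename \code{demo.rc} is made up and is only
stored in the \member{infile} member.

```python
import io
import shlex

# Any object with read() and readline() will do; StringIO keeps the
# example self-contained.  "demo.rc" is a hypothetical file name.
stream = io.StringIO("set color = red")
lexer = shlex.shlex(stream, "demo.rc")

tokens = []
while True:
    token = lexer.get_token()
    if token == "":          # an empty string signals end of input
        break
    tokens.append(token)

print(lexer.infile)   # demo.rc
print(tokens)         # ['set', 'color', '=', 'red']
```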


\begin{seealso}
  \seemodule{ConfigParser}{Parser for configuration files similar to the
                           Windows \file{.ini} files.}
\end{seealso}


\subsection{shlex Objects \label{shlex-objects}}

A \class{shlex} instance has the following methods:

\begin{methoddesc}{get_token}{}
Return a token. If tokens have been stacked using
\method{push_token()}, pop a token off the stack. Otherwise, read one
from the input stream. If reading encounters an immediate
end-of-file, an empty string is returned.
\end{methoddesc}

\begin{methoddesc}{push_token}{str}
Push the argument onto the token stack.
\end{methoddesc}
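
Together, \method{get_token()} and \method{push_token()} allow one token
of lookahead. The following sketch, which assumes a current Python 3
interpreter and an arbitrary two-word input, reads a token, pushes it
back, and reads it again:

```python
import io
import shlex

lexer = shlex.shlex(io.StringIO("alpha beta"))

first = lexer.get_token()    # read from the input stream
lexer.push_token(first)      # put it back on the token stack
again = lexer.get_token()    # popped from the stack, not re-read
rest = lexer.get_token()     # next token from the stream
end = lexer.get_token()      # empty string at end of input

print(first, again, rest, repr(end))   # alpha alpha beta ''
```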

\begin{methoddesc}{read_token}{}
Read a raw token. Ignore the pushback stack, and do not interpret source
requests. (This is not ordinarily a useful entry point, and is
documented here only for the sake of completeness.)
\end{methoddesc}

\begin{methoddesc}{openhook}{filename}
When \class{shlex} detects a source request (see \member{source} below),
this method is given the following token as argument, and is expected to
return a tuple consisting of a filename and an opened stream object.

Normally, this method just strips any quotes off the argument and
treats it as a filename, calling \code{open()} on it. It is exposed so that
you can use it to implement directory search paths, addition of
file extensions, and other namespace hacks.

There is no corresponding `close' hook, but a \class{shlex} instance will call
the \code{close()} method of the sourced input stream when it returns EOF.
\end{methoddesc}

\begin{methoddesc}{error_leader}{\optional{file}, \optional{line}}
This method generates an error message leader in the format of a
\UNIX{} C compiler error label; the format is \code{'"\%s", line \%d: '},
where the \code{\%s} is replaced with the name of the current source file and
the \code{\%d} with the current input line number (the optional arguments
can be used to override these).

This convenience is provided to encourage \class{shlex} users to generate
error messages in the standard, parseable format understood by Emacs
and other \UNIX{} tools.
\end{methoddesc}
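
As a minimal sketch of the leader format, assuming a current Python 3
interpreter (the filenames \code{demo.rc} and \code{other.rc} and the
error text are made up for illustration):

```python
import io
import shlex

lexer = shlex.shlex(io.StringIO("colour = red"), "demo.rc")

# Without arguments, the current file name and line number are used.
default_leader = lexer.error_leader()

# Both values can be overridden explicitly.
custom_leader = lexer.error_leader("other.rc", 7)

print(default_leader + "unknown keyword")
print(custom_leader + "unknown keyword")
```

The leader is meant to be prepended to the application's own message, as
the two \code{print} calls do.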

Instances of \class{shlex} subclasses have some public instance
variables which either control lexical analysis or can be used
for debugging:

\begin{memberdesc}{commenters}
The string of characters that are recognized as comment beginners.
All characters from the comment beginner to end of line are ignored.
Includes just \character{\#} by default.
\end{memberdesc}
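
The effect of \member{commenters} can be sketched as follows, assuming a
current Python 3 interpreter. The helper \code{tokens_of()} is not part of
the module; it is defined here only to keep the example short.

```python
import io
import shlex

def tokens_of(text, commenters=None):
    """Hypothetical helper: return all tokens found in text."""
    lexer = shlex.shlex(io.StringIO(text))
    if commenters is not None:
        lexer.commenters = commenters
    # get_token() returns an empty string at end of input.
    return list(iter(lexer.get_token, ""))

# With the default setting, '#' starts a comment.
default_tokens = tokens_of("host = example  # trailing note\nport = 80\n")

# With ';' as the only comment character, '#' becomes an ordinary token.
custom_tokens = tokens_of("a ; note\nb # c", commenters=";")

print(default_tokens)  # ['host', '=', 'example', 'port', '=', '80']
print(custom_tokens)   # ['a', 'b', '#', 'c']
```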

\begin{memberdesc}{wordchars}
The string of characters that will accumulate into multi-character
tokens. By default, includes all \ASCII{} alphanumerics and
underscore.
\end{memberdesc}
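
Extending \member{wordchars} is a common way to accept identifiers that
contain punctuation. A minimal sketch, assuming a current Python 3
interpreter and an arbitrary sample line:

```python
import io
import shlex

text = "log-level = debug.v2"

# Default word characters: '-' and '.' split the words.
plain = shlex.shlex(io.StringIO(text))
plain_tokens = list(iter(plain.get_token, ""))

# Adding '-' and '.' lets them accumulate into the surrounding word.
extended = shlex.shlex(io.StringIO(text))
extended.wordchars += "-."
extended_tokens = list(iter(extended.get_token, ""))

print(plain_tokens)     # ['log', '-', 'level', '=', 'debug', '.', 'v2']
print(extended_tokens)  # ['log-level', '=', 'debug.v2']
```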

\begin{memberdesc}{whitespace}
Characters that will be considered whitespace and skipped. Whitespace
bounds tokens. By default, includes space, tab, linefeed and
carriage-return.
\end{memberdesc}
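
Because whitespace bounds tokens, adding a character to
\member{whitespace} turns it into a separator that is itself discarded.
A minimal sketch, assuming a current Python 3 interpreter, that treats
commas this way:

```python
import io
import shlex

text = "a,b, c"

# By default the comma is returned as a token of its own.
plain = shlex.shlex(io.StringIO(text))
plain_tokens = list(iter(plain.get_token, ""))

# Declared as whitespace, the comma only separates tokens.
comma_sep = shlex.shlex(io.StringIO(text))
comma_sep.whitespace += ","
comma_tokens = list(iter(comma_sep.get_token, ""))

print(plain_tokens)  # ['a', ',', 'b', ',', 'c']
print(comma_tokens)  # ['a', 'b', 'c']
```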

\begin{memberdesc}{quotes}
Characters that will be considered string quotes. The token
accumulates until the same quote is encountered again (thus, different
quote types protect each other, as in the shell). By default, includes
\ASCII{} single and double quotes.
\end{memberdesc}
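
The way different quote types protect each other can be seen in a short
sketch. It assumes a current Python 3 interpreter in its default
(non-POSIX) mode, where the quote characters stay part of the returned
token; the input line is arbitrary.

```python
import io
import shlex

text = "say \"hello world\" 'it \"is\" fine'"
lexer = shlex.shlex(io.StringIO(text))
tokens = list(iter(lexer.get_token, ""))

for token in tokens:
    print(token)
# say
# "hello world"
# 'it "is" fine'
```

The double quotes inside the single-quoted string do not end the token,
because only a matching single quote can close it.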

\begin{memberdesc}{infile}
The name of the current input file, as initially set at class
instantiation time or stacked by later source requests. It may
be useful to examine this when constructing error messages.
\end{memberdesc}

\begin{memberdesc}{instream}
The input stream from which this \class{shlex} instance is reading characters.
\end{memberdesc}

\begin{memberdesc}{source}
This member is \code{None} by default. If you assign a string to it, that
string will be recognized as a lexical-level inclusion request similar
to the `source' keyword in various shells. That is, the immediately
following token will be opened as a filename and input taken from that
stream until EOF, at which point the \code{close()} method of that
stream will be called and the input source will again become the
original input stream. Source requests may be stacked any number of
levels deep.
\end{memberdesc}
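
A source request can be sketched with a temporary file. The example
assumes a current Python 3 interpreter; the keyword \code{include} and the
names \code{main.rc} and \code{included.rc} are chosen for illustration
only. The path is quoted in the input because it contains characters
that are not word characters.

```python
import io
import os
import shlex
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical file to be pulled in by the source request.
    included = os.path.join(tmpdir, "included.rc")
    with open(included, "w") as f:
        f.write("inner1 inner2\n")

    main_text = 'first include "%s" last' % included
    lexer = shlex.shlex(io.StringIO(main_text), "main.rc")
    lexer.source = "include"      # the token that triggers inclusion

    # The lexer closes the included stream itself when it reaches EOF.
    tokens = list(iter(lexer.get_token, ""))

print(tokens)  # ['first', 'inner1', 'inner2', 'last']
```

Neither the \code{include} keyword nor the filename appears in the
result: both are consumed by the lexer, and the tokens of the included
file take their place.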

\begin{memberdesc}{debug}
If this member is numeric and 1 or more, a \class{shlex} instance will print
verbose progress output on its behavior. If you need to use this,
you can read the module source code to learn the details.
\end{memberdesc}

Note that any character not declared to be a word character,
whitespace, or a quote will be returned as a single-character token.

Quote and comment characters are not recognized within words. Thus,
the bare words \samp{ain't} and \samp{ain\#t} would be returned as single
tokens by the default parser.
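
Both points can be checked with a short sketch, assuming a current
Python 3 interpreter and an arbitrary input line. Only the quote case is
exercised here; the handling of a comment character inside a word is not
shown and may differ between versions.

```python
import io
import shlex

lexer = shlex.shlex(io.StringIO("x=(1,2) ain't"))
tokens = list(iter(lexer.get_token, ""))

# '=', '(', ',' and ')' each come back as single-character tokens,
# while the quote inside "ain't" does not start a quoted string.
print(tokens)  # ['x', '=', '(', '1', ',', '2', ')', "ain't"]
```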

\begin{memberdesc}{lineno}
Source line number (count of newlines seen so far plus one).
\end{memberdesc}

\begin{memberdesc}{token}
The token buffer. It may be useful to examine this when catching
exceptions.
\end{memberdesc}