\section{\module{shlex} ---
         Simple lexical analysis}

\declaremodule{standard}{shlex}
\modulesynopsis{Simple lexical analysis for \UNIX{} shell-like languages.}
\moduleauthor{Eric S. Raymond}{esr@snark.thyrsus.com}
\sectionauthor{Eric S. Raymond}{esr@snark.thyrsus.com}

\versionadded{1.5.2}

The \class{shlex} class makes it easy to write lexical analyzers for
simple syntaxes resembling that of the \UNIX{} shell. This will often
be useful for writing minilanguages, e.g.\ in run control files for
Python applications.

\begin{classdesc}{shlex}{\optional{stream\optional{, file}}}
A \class{shlex} instance or subclass instance is a lexical analyzer
object. The initialization argument, if present, specifies where to
read characters from. It must be a file- or stream-like object with
\method{read()} and \method{readline()} methods. If no argument is given,
input will be taken from \code{sys.stdin}. The second optional
argument is a filename string, which sets the initial value of the
\member{infile} member. If the stream argument is omitted or
equal to \code{sys.stdin}, this second argument defaults to ``stdin''.
\end{classdesc}
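
As a minimal sketch of the constructor described above, the following
feeds a lexer from an in-memory stream instead of \code{sys.stdin}. It
assumes a current Python 3 interpreter, where \code{io.StringIO}
supplies the required \method{read()} and \method{readline()} methods;
the helper name \code{tokenize} and the file name \code{example.rc} are
illustrative only.

```python
import io
import shlex


def tokenize(text):
    """Return every token in text, using a default shlex instance."""
    # The second argument only sets lexer.infile; no file is opened.
    lexer = shlex.shlex(io.StringIO(text), "example.rc")
    tokens = []
    while True:
        token = lexer.get_token()
        if not token:  # an empty string signals end-of-file
            break
        tokens.append(token)
    return tokens


tokens = tokenize("set color = 'dark blue'  # trailing comment")
```

In current CPython the default (non-POSIX) mode keeps the quote
characters as part of a quoted token.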


\begin{seealso}
  \seemodule{ConfigParser}{Parser for configuration files similar to the
                           Windows \file{.ini} files.}
\end{seealso}


\subsection{shlex Objects \label{shlex-objects}}

A \class{shlex} instance has the following methods:


\begin{methoddesc}{get_token}{}
Return a token. If tokens have been stacked using
\method{push_token()}, pop a token off the stack. Otherwise, read one
from the input stream. If reading encounters an immediate
end-of-file, an empty string is returned.
\end{methoddesc}

\begin{methoddesc}{push_token}{str}
Push the argument onto the token stack.
\end{methoddesc}
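
One common use of \method{get_token()} together with
\method{push_token()} is one-token lookahead. The sketch below is an
illustration, not part of the module: \code{peek} is a hypothetical
helper, and the input is an in-memory \code{io.StringIO} stream
(assuming a current Python 3 interpreter).

```python
import io
import shlex


def peek(lexer):
    """Return the next token without consuming it (hypothetical helper)."""
    token = lexer.get_token()
    lexer.push_token(token)  # stacked tokens are returned before stream input
    return token


lexer = shlex.shlex(io.StringIO("alpha beta"))
lookahead = peek(lexer)      # inspects the first token but leaves it available
first = lexer.get_token()    # popped from the token stack
second = lexer.get_token()   # read from the input stream
at_end = lexer.get_token()   # empty string at end-of-file
```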

\begin{methoddesc}{read_token}{}
Read a raw token. Ignore the pushback stack, and do not interpret source
requests. (This is not ordinarily a useful entry point, and is
documented here only for the sake of completeness.)
\end{methoddesc}

\begin{methoddesc}{sourcehook}{filename}
When \class{shlex} detects a source request (see
\member{source} below), this method is given the following token as
its argument, and is expected to return a tuple consisting of a filename
and an open file-like object.

Normally, this method first strips any quotes off the argument. If
the result is an absolute pathname, or there was no previous source
request in effect, or the previous source was a stream
(e.g.\ \code{sys.stdin}), the result is left alone. Otherwise, if the
result is a relative pathname, the directory part of the name of the
file immediately before it on the source inclusion stack is prepended
(this behavior is like the way the C preprocessor handles
\code{\#include "file.h"}).

The result of the manipulations is treated as a filename, and returned
as the first component of the tuple, with
\function{open()} called on it to yield the second component. (Note:
this is the reverse of the order of arguments in instance initialization!)

This hook is exposed so that you can use it to implement directory
search paths, addition of file extensions, and other namespace hacks.
There is no corresponding `close' hook, but a \class{shlex} instance will
call the \method{close()} method of the sourced input stream when it
returns \EOF.

For more explicit control of source stacking, use the
\method{push_source()} and \method{pop_source()} methods.
\end{methoddesc}
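
The following sketch shows one way to override
\method{sourcehook()}. Instead of opening files, it resolves source
requests against an in-memory table. The class name
\code{MemoryLexer}, the keyword \code{include}, and the
\code{sources} mapping are all assumptions made for the example; the
only contract taken from the description above is that the hook
returns a (filename, stream) tuple.

```python
import io
import shlex


class MemoryLexer(shlex.shlex):
    """Lexer whose source requests are looked up in a dict (illustrative)."""

    def __init__(self, text, name, sources):
        super().__init__(io.StringIO(text), name)
        self.source = "include"  # token that triggers a source request
        self.sources = sources   # hypothetical name -> text mapping

    def sourcehook(self, newfile):
        # Strip surrounding quotes, as the default hook also does.
        name = newfile.strip("\"'")
        # Note the order: filename first, then the open stream.
        return name, io.StringIO(self.sources[name])


lexer = MemoryLexer('first include "extra" last', "main", {"extra": "middle"})
tokens = list(iter(lexer.get_token, ""))
```

The tokens of the included text appear in place of the request, after
which reading resumes from the original stream.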

\begin{methoddesc}{push_source}{stream\optional{, filename}}
Push an input source stream onto the input stack. If the filename
argument is specified it will later be available for use in error
messages. This is the same method used internally by the
\method{sourcehook()} method. (New in 2.1)
\end{methoddesc}

\begin{methoddesc}{pop_source}{}
Pop the last-pushed input source from the input stack.
This is the same method used internally when the lexer reaches
\EOF{} on a stacked input stream. (New in 2.1)
\end{methoddesc}
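
A short sketch of explicit source stacking with
\method{push_source()}, assuming in-memory \code{io.StringIO} streams
and illustrative names (\code{outer}, \code{inner}). The stacked
stream is popped automatically when it reaches end-of-file, so no
explicit \method{pop_source()} call is needed here.

```python
import io
import shlex

lexer = shlex.shlex(io.StringIO("outer1 outer2"), "outer")
before = lexer.get_token()            # read from the original stream

# Stack a second stream; its name becomes the current infile.
lexer.push_source(io.StringIO("inner"), "inner")
during = lexer.get_token()            # read from the stacked stream
name_during = lexer.infile

# The stacked stream is now exhausted, so the lexer pops it and
# continues with the original stream.
after = lexer.get_token()
name_after = lexer.infile
```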

\begin{methoddesc}{error_leader}{\optional{file\optional{, line}}}
This method generates an error message leader in the format of a
\UNIX{} C compiler error label; the format is \code{'"\%s", line \%d: '},
where the \samp{\%s} is replaced with the name of the current source
file and the \samp{\%d} with the current input line number (the
optional arguments can be used to override these).

This convenience is provided to encourage \module{shlex} users to
generate error messages in the standard, parseable format understood
by Emacs and other \UNIX{} tools.
\end{methoddesc}
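
The sketch below shows how \method{error_leader()} might be used to
report an unexpected token. The input text, the file name
\code{demo.rc}, and the message wording are assumptions made for the
example.

```python
import io
import shlex

lexer = shlex.shlex(io.StringIO("one\ntwo ?\n"), "demo.rc")
messages = []
for token in iter(lexer.get_token, ""):
    if token == "?":
        # With no arguments, the current infile and lineno are used.
        messages.append(lexer.error_leader() + "unexpected token " + repr(token))

# Both parts of the leader can be overridden explicitly.
overridden = lexer.error_leader("other.rc", 7)
```

Because the line counter advances as soon as a newline is read, the
reported line is the one the lexer has reached, which is the line
containing the offending token in this example.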

Instances of \class{shlex} subclasses have some public instance
variables which either control lexical analysis or can be used for
debugging:

\begin{memberdesc}{commenters}
The string of characters that are recognized as comment beginners.
All characters from the comment beginner to end of line are ignored.
Includes just \character{\#} by default.
\end{memberdesc}

\begin{memberdesc}{wordchars}
The string of characters that will accumulate into multi-character
tokens. By default, includes all \ASCII{} alphanumerics and
underscore.
\end{memberdesc}
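
To illustrate the effect of \member{wordchars}, the following sketch
tokenizes the same text twice, once with the defaults and once with
hyphen and period added. The helper \code{split_words} and its
parameter are hypothetical names introduced for the example.

```python
import io
import shlex


def split_words(text, extra_wordchars=""):
    """Tokenize text, optionally extending the set of word characters."""
    lexer = shlex.shlex(io.StringIO(text))
    lexer.wordchars += extra_wordchars
    return list(iter(lexer.get_token, ""))


# With the defaults, '-' and '.' come back as single-character tokens.
default_tokens = split_words("file-name.txt")

# With '-' and '.' added, the whole name accumulates into one token.
extended_tokens = split_words("file-name.txt", "-.")
```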

\begin{memberdesc}{whitespace}
Characters that will be considered whitespace and skipped. Whitespace
bounds tokens. By default, includes space, tab, linefeed and
carriage-return.
\end{memberdesc}
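
The \member{commenters} and \member{whitespace} members can be
reassigned on an instance. As a sketch under assumed settings, the
lexer below treats \character{;} as the only comment character and
treats commas as additional whitespace; the input text is made up for
the example.

```python
import io
import shlex

lexer = shlex.shlex(io.StringIO("a,b ; note\nc # d"))
lexer.commenters = ";"    # '#' is no longer a comment beginner
lexer.whitespace += ","   # commas now separate tokens and are skipped

tokens = list(iter(lexer.get_token, ""))
```

Since \character{\#} is no longer in \member{commenters} and is not a
word character, it is returned as a single-character token.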

\begin{memberdesc}{quotes}
Characters that will be considered string quotes. The token
accumulates until the same quote is encountered again (thus, different
quote types protect each other as in the shell). By default, includes
\ASCII{} single and double quotes.
\end{memberdesc}
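
A brief sketch of how the two default quote types protect each other,
assuming a current Python 3 interpreter in its default (non-POSIX)
mode, where the quote characters remain part of the returned token:

```python
import io
import shlex

# A double-quoted string containing an apostrophe, followed by a
# single-quoted string containing double quotes.
text = '''"it's" 'say "hi"' '''

lexer = shlex.shlex(io.StringIO(text))
tokens = list(iter(lexer.get_token, ""))

# Each quoted string is one token, delimited by its own quote type.
double_quoted, single_quoted = tokens
```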

\begin{memberdesc}{infile}
The name of the current input file, as initially set at class
instantiation time or stacked by later source requests. It may
be useful to examine this when constructing error messages.
\end{memberdesc}

\begin{memberdesc}{instream}
The input stream from which this \class{shlex} instance is reading
characters.
\end{memberdesc}

\begin{memberdesc}{source}
This member is \code{None} by default. If you assign a string to it,
that string will be recognized as a lexical-level inclusion request
similar to the \samp{source} keyword in various shells. That is, the
immediately following token will be opened as a filename and input taken
from that stream until \EOF, at which point the \method{close()}
method of that stream will be called and the input source will again
become the original input stream. Source requests may be stacked any
number of levels deep.
\end{memberdesc}
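
The sketch below exercises \member{source} with the default
\method{sourcehook()}, using real files in a temporary directory. The
helper \code{read_with_includes}, the file names \code{main.rc} and
\code{sub.rc}, and the choice of \code{source} as the keyword are
assumptions made for the example. The relative name \code{sub.rc} is
resolved against the directory of the including file.

```python
import os
import shlex
import tempfile


def read_with_includes(main_text, included):
    """Tokenize main_text, resolving source requests against temp files.

    included maps hypothetical file names to their contents.
    """
    with tempfile.TemporaryDirectory() as directory:
        for name, text in included.items():
            with open(os.path.join(directory, name), "w") as handle:
                handle.write(text)
        main_path = os.path.join(directory, "main.rc")
        with open(main_path, "w") as handle:
            handle.write(main_text)
        with open(main_path) as stream:
            # Passing the path as the second argument sets lexer.infile,
            # which the default hook uses to resolve relative names.
            lexer = shlex.shlex(stream, main_path)
            lexer.source = "source"
            return list(iter(lexer.get_token, ""))


tokens = read_with_includes('a source "sub.rc" c\n', {"sub.rc": "b\n"})
```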

\begin{memberdesc}{debug}
If this member is numeric and \code{1} or more, a \class{shlex}
instance will print verbose progress output on its behavior. If you
need to use this, you can read the module source code to learn the
details.
\end{memberdesc}

Note that any character not declared to be a word character,
whitespace, or a quote will be returned as a single-character token.

Quote and comment characters are not recognized within words. Thus,
the bare words \samp{ain't} and \samp{ain\#t} would be returned as single
tokens by the default parser.

\begin{memberdesc}{lineno}
Source line number (count of newlines seen so far plus one).
\end{memberdesc}

\begin{memberdesc}{token}
The token buffer. It may be useful to examine this when catching
exceptions.
\end{memberdesc}