\section{\module{shlex} ---
         Simple lexical analysis}

\declaremodule{standard}{shlex}
\modulesynopsis{Simple lexical analysis for \UNIX{} shell-like languages.}
\moduleauthor{Eric S. Raymond}{esr@snark.thyrsus.com}
\sectionauthor{Eric S. Raymond}{esr@snark.thyrsus.com}

\versionadded{1.5.2}

The \class{shlex} class makes it easy to write lexical analyzers for
simple syntaxes resembling that of the \UNIX{} shell.  This will often
be useful for writing minilanguages, e.g.\ in run control files for
Python applications.

\begin{classdesc}{shlex}{\optional{stream\optional{, file}}}
A \class{shlex} instance or subclass instance is a lexical analyzer
object.  The first initialization argument, \var{stream}, if present,
specifies where to read characters from.  It must be a file- or
stream-like object with \method{read()} and \method{readline()}
methods.  If no argument is given, input will be taken from
\code{sys.stdin}.  The second optional argument, \var{file}, is a
filename string, which sets the initial value of the \member{infile}
member.  If the stream argument is omitted or equal to
\code{sys.stdin}, this second argument defaults to ``stdin''.
\end{classdesc}
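
The following sketch illustrates the basic read loop.  It assumes a
modern Python~3 \module{shlex} in its default (non-POSIX) mode, with an
\code{io.StringIO} object standing in for a file; the input text and
the file name \file{example.rc} are invented for the example.  In this
mode a quoted token keeps its quote characters, and the comment is
discarded.

\begin{verbatim}
import io
import shlex

# Read from an in-memory stream; 'example.rc' only sets lexer.infile.
lexer = shlex.shlex(io.StringIO('set name "hello world"  # a comment\n'),
                    'example.rc')

tokens = []
while True:
    tok = lexer.get_token()
    if tok == '':          # the empty string signals end of input
        break
    tokens.append(tok)

print(tokens)              # ['set', 'name', '"hello world"']
print(lexer.infile)        # example.rc
\end{verbatim}

Because \method{get_token()} returns an empty string at end of input,
the same loop can also be written with the two-argument form of
\function{iter()}, as \code{list(iter(lexer.get_token, ''))}.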

\begin{seealso}
  \seemodule{ConfigParser}{Parser for configuration files similar to the
                           Windows \file{.ini} files.}
\end{seealso}

\subsection{shlex Objects \label{shlex-objects}}

A \class{shlex} instance has the following methods:

\begin{methoddesc}{get_token}{}
Return a token.  If tokens have been stacked using
\method{push_token()}, pop a token off the stack.  Otherwise, read one
from the input stream.  If reading encounters an immediate
end-of-file, an empty string is returned.
\end{methoddesc}

\begin{methoddesc}{push_token}{str}
Push the argument onto the token stack.
\end{methoddesc}
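
Since \method{get_token()} always consults the token stack first,
\method{push_token()} gives a parser cheap lookahead.  In this Python~3
sketch the helper \function{peek_token()} is hypothetical (it is not
part of \module{shlex}), and the input text is invented.  The last
lines show that the stack is last-in, first-out.

\begin{verbatim}
import io
import shlex

def peek_token(lexer):
    """Hypothetical helper: return the next token without consuming it."""
    tok = lexer.get_token()
    lexer.push_token(tok)      # get_token() will return it again
    return tok

lexer = shlex.shlex(io.StringIO('width = 80'))
name = lexer.get_token()       # 'width'
ahead = peek_token(lexer)      # '=' (still on the token stack)
equals = lexer.get_token()     # '=' popped off the token stack
value = lexer.get_token()      # '80'

# Tokens pushed later come back first.
lexer.push_token('b')
lexer.push_token('a')
order = [lexer.get_token(), lexer.get_token()]   # ['a', 'b']
\end{verbatim}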

\begin{methoddesc}{read_token}{}
Read a raw token, ignoring the pushback stack and without interpreting
source requests.  (This is not ordinarily a useful entry point, and is
documented here only for the sake of completeness.)
\end{methoddesc}

\begin{methoddesc}{sourcehook}{filename}
When \class{shlex} detects a source request (see
\member{source} below), this method is given the following token as
its argument and is expected to return a tuple consisting of a
filename and an open file-like object.

Normally, this method first strips any quotes off the argument.  If
the result is an absolute pathname, or there was no previous source
request in effect, or the previous source was a stream
(e.g.\ \code{sys.stdin}), the result is left alone.  Otherwise, if the
result is a relative pathname, the directory part of the name of the
file immediately before it on the source inclusion stack is prepended
(this behavior is like the way the C preprocessor handles
\code{\#include "file.h"}).

The result of the manipulations is treated as a filename and returned
as the first component of the tuple, with \function{open()} called on
it to yield the second component.  (Note: this is the reverse of the
order of arguments in instance initialization!)

This hook is exposed so that you can use it to implement directory
search paths, addition of file extensions, and other namespace hacks.
There is no corresponding `close' hook, but a \class{shlex} instance
will call the \method{close()} method of the sourced input stream when
it returns \EOF.

For more explicit control of source stacking, use the
\method{push_source()} and \method{pop_source()} methods.
\end{methoddesc}
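
As a sketch of such a namespace hack, the subclass below serves sourced
files from an in-memory dictionary and appends a default \file{.rc}
extension.  The class name \class{VirtualLexer}, the
\code{VIRTUAL_FILES} table and the extension rule are all hypothetical;
the code assumes Python~3, with \code{io.StringIO} objects standing in
for real files.

\begin{verbatim}
import io
import shlex

# Hypothetical in-memory "filesystem" standing in for the disk.
VIRTUAL_FILES = {'colors.rc': 'color blue\n'}

class VirtualLexer(shlex.shlex):
    """Sketch: a shlex subclass that overrides sourcehook()."""

    def sourcehook(self, filename):
        # Strip surrounding quotes, as the default hook does.
        if filename[:1] == '"':
            filename = filename[1:-1]
        # Illustrative namespace hack: supply a default extension.
        if not filename.endswith('.rc'):
            filename = filename + '.rc'
        # Return (filename, stream); shlex pushes it onto its input stack.
        return (filename, io.StringIO(VIRTUAL_FILES[filename]))

lexer = VirtualLexer(io.StringIO('source colors\nsize 10\n'), 'main.rc')
lexer.source = 'source'
tokens = list(iter(lexer.get_token, ''))
print(tokens)    # ['color', 'blue', 'size', '10']
\end{verbatim}

When the sourced stream is exhausted, the lexer closes it and resumes
reading \file{main.rc} after the source request.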

\begin{methoddesc}{push_source}{stream\optional{, filename}}
Push an input source stream onto the input stack.  If the filename
argument is specified it will later be available for use in error
messages.  This is the same method used internally to process source
requests (see \method{sourcehook()}).
\versionadded{2.1}
\end{methoddesc}

\begin{methoddesc}{pop_source}{}
Pop the last-pushed input source from the input stack.
This is the same method used internally when the lexer reaches
\EOF{} on a stacked input stream.
\versionadded{2.1}
\end{methoddesc}
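
A sketch of explicit source stacking, assuming Python~3; the stream
contents and the names \file{outer.rc} and \file{inner.rc} are
invented.  Calling \method{pop_source()} before the pushed stream is
exhausted abandons the rest of it, and reading resumes where the outer
stream left off.

\begin{verbatim}
import io
import shlex

lexer = shlex.shlex(io.StringIO('outer1 outer2'), 'outer.rc')
first = lexer.get_token()                  # 'outer1'

# Divert input to a second stream; the filename is kept for error messages.
lexer.push_source(io.StringIO('inner1 inner2'), 'inner.rc')
second = lexer.get_token()                 # 'inner1'
inner_name = lexer.infile                  # 'inner.rc'

# Discard the rest of the inner stream and return to the outer one.
lexer.pop_source()
third = lexer.get_token()                  # 'outer2'
outer_name = lexer.infile                  # 'outer.rc'
\end{verbatim}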

\begin{methoddesc}{error_leader}{\optional{file\optional{, line}}}
This method generates an error message leader in the format of a
\UNIX{} C compiler error label; the format is \code{'"\%s", line \%d: '},
where the \samp{\%s} is replaced with the name of the current source
file and the \samp{\%d} with the current input line number (the
optional arguments can be used to override these).

This convenience is provided to encourage \module{shlex} users to
generate error messages in the standard, parseable format understood
by Emacs and other \UNIX{} tools.
\end{methoddesc}
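
A sketch of a minimal validator built on \method{error_leader()},
assuming Python~3; the \code{port} keyword, the sample input and the
file names \file{server.rc} and \file{other.rc} are hypothetical.

\begin{verbatim}
import io
import shlex

lexer = shlex.shlex(io.StringIO('port 8080\nport eighty'), 'server.rc')
errors = []
while True:
    keyword = lexer.get_token()
    if keyword == '':                  # end of input
        break
    value = lexer.get_token()
    if not value.isdigit():
        # With no arguments, the leader uses lexer.infile and lexer.lineno.
        errors.append(lexer.error_leader() + 'expected a number, got %r' % value)

for message in errors:
    print(message)    # "server.rc", line 2: expected a number, got 'eighty'

# Both parts of the leader can be overridden explicitly.
override = lexer.error_leader('other.rc', 7)   # '"other.rc", line 7: '
\end{verbatim}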

\class{shlex} instances, including instances of subclasses, have some
public instance variables which either control lexical analysis or can
be used for debugging:

\begin{memberdesc}{commenters}
The string of characters that are recognized as comment beginners.
All characters from the comment beginner to end of line are ignored.
Includes just \character{\#} by default.
\end{memberdesc}
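
For instance, a minilanguage that uses semicolons for comments could
reassign \member{commenters}.  In this Python~3 sketch (the input text
is invented), \character{\#} then comes back as an ordinary
single-character token.

\begin{verbatim}
import io
import shlex

text = 'alpha ; ignored to end of line\nbeta #gamma\n'
lexer = shlex.shlex(io.StringIO(text))
lexer.commenters = ';'        # ';' starts a comment; '#' no longer does
tokens = list(iter(lexer.get_token, ''))
print(tokens)                 # ['alpha', 'beta', '#', 'gamma']
\end{verbatim}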

\begin{memberdesc}{wordchars}
The string of characters that will accumulate into multi-character
tokens.  By default, includes all \ASCII{} alphanumerics and
underscore.
\end{memberdesc}
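
Extending \member{wordchars} changes what counts as a single token.  In
this Python~3 sketch, with an invented input line, adding
\character{/}, \character{.} and \character{-} keeps a path together
that the default settings would split into single-character tokens.

\begin{verbatim}
import io
import shlex

line = 'include /etc/app.conf\n'

default = shlex.shlex(io.StringIO(line))
default_tokens = list(iter(default.get_token, ''))
print(default_tokens)   # ['include', '/', 'etc', '/', 'app', '.', 'conf']

custom = shlex.shlex(io.StringIO(line))
custom.wordchars += '/.-'      # let path characters accumulate into words
custom_tokens = list(iter(custom.get_token, ''))
print(custom_tokens)    # ['include', '/etc/app.conf']
\end{verbatim}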

\begin{memberdesc}{whitespace}
Characters that will be considered whitespace and skipped.  Whitespace
bounds tokens.  By default, includes space, tab, linefeed and
carriage-return.
\end{memberdesc}

\begin{memberdesc}{quotes}
Characters that will be considered string quotes.  The token
accumulates until the same quote is encountered again (thus, different
quote types protect each other as in the shell).  By default, includes
\ASCII{} single and double quotes.
\end{memberdesc}
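
In the default (non-POSIX) mode assumed by this Python~3 sketch, a
quoted token keeps its surrounding quote characters, and a quote of the
other type inside it is taken literally.  The input text is invented.

\begin{verbatim}
import io
import shlex

lexer = shlex.shlex(io.StringIO('say "it\'s fine" \'a "b" c\''))
tokens = list(iter(lexer.get_token, ''))
for tok in tokens:
    print(tok)
# say
# "it's fine"
# 'a "b" c'
\end{verbatim}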

\begin{memberdesc}{infile}
The name of the current input file, as initially set at class
instantiation time or stacked by later source requests.  It may
be useful to examine this when constructing error messages.
\end{memberdesc}

\begin{memberdesc}{instream}
The input stream from which this \class{shlex} instance is reading
characters.
\end{memberdesc}

\begin{memberdesc}{source}
This member is \code{None} by default.  If you assign a string to it,
that string will be recognized as a lexical-level inclusion request
similar to the \samp{source} keyword in various shells.  That is, the
immediately following token will be opened as a filename and input
taken from that stream until \EOF, at which point the \method{close()}
method of that stream will be called and input will resume from the
stream that was active before the request.  Source requests may be
stacked any number of levels deep.
\end{memberdesc}
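
A sketch of a source request handled by the default
\method{sourcehook()}, assuming Python~3.  The files are created in a
temporary directory purely for the demonstration, and the inclusion
keyword \code{include} is an arbitrary choice.  The quoted relative
name \file{extra.rc} is unquoted and resolved against the directory of
the including file, as described for \method{sourcehook()}.

\begin{verbatim}
import os
import shlex
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    conf = os.path.join(tmp, 'conf')
    os.mkdir(conf)
    with open(os.path.join(conf, 'extra.rc'), 'w') as f:
        f.write('beta\n')
    main_path = os.path.join(conf, 'main.rc')
    with open(main_path, 'w') as f:
        f.write('alpha\ninclude "extra.rc"\ngamma\n')

    with open(main_path) as stream:
        lexer = shlex.shlex(stream, main_path)
        lexer.source = 'include'      # the keyword that triggers inclusion
        tokens = list(iter(lexer.get_token, ''))

print(tokens)    # ['alpha', 'beta', 'gamma']
\end{verbatim}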

\begin{memberdesc}{debug}
If this member is numeric and \code{1} or more, a \class{shlex}
instance will print verbose progress output on its behavior.  If you
need to use this, you can read the module source code to learn the
details.
\end{memberdesc}

Note that any character not declared to be a word character,
whitespace, or a quote will be returned as a single-character token.

Quote and comment characters are not recognized within words.  Thus,
the bare words \samp{ain't} and \samp{ain\#t} would be returned as single
tokens by the default parser.
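
A short Python~3 illustration with the default settings and an invented
input: characters such as \character{+}, \character{=} and
\character{;} come back one at a time, while the apostrophe in
\samp{ain't} stays inside the word.

\begin{verbatim}
import io
import shlex

lexer = shlex.shlex(io.StringIO("x+=1; ain't"))
tokens = list(iter(lexer.get_token, ''))
print(tokens)    # ['x', '+', '=', '1', ';', "ain't"]
\end{verbatim}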

\begin{memberdesc}{lineno}
Source line number (count of newlines seen so far plus one).
\end{memberdesc}
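
Because \member{lineno} counts every newline read so far, a word that
is terminated by a newline already reports the following line.  This
Python~3 sketch, with invented input, records the value right after
each token is returned.

\begin{verbatim}
import io
import shlex

lexer = shlex.shlex(io.StringIO('one\ntwo three\n'))
seen = [(tok, lexer.lineno) for tok in iter(lexer.get_token, '')]
print(seen)    # [('one', 2), ('two', 2), ('three', 3)]
\end{verbatim}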

\begin{memberdesc}{token}
The token buffer.  It may be useful to examine this when catching
exceptions.
\end{memberdesc}