\section{\module{shlex} ---
         Simple lexical analysis}

\declaremodule{standard}{shlex}
\modulesynopsis{Simple lexical analysis for \UNIX{} shell-like languages.}
\moduleauthor{Eric S. Raymond}{esr@snark.thyrsus.com}
\sectionauthor{Eric S. Raymond}{esr@snark.thyrsus.com}

\versionadded{1.5.2}

The \class{shlex} class makes it easy to write lexical analyzers for
simple syntaxes resembling that of the \UNIX{} shell. This will often
be useful for writing minilanguages, e.g.\ in run control files for
Python applications.

\begin{classdesc}{shlex}{\optional{stream\optional{, file}}}
A \class{shlex} instance or subclass instance is a lexical analyzer
object. The initialization argument, if present, specifies where to
read characters from. It must be a file- or stream-like object with
\method{read()} and \method{readline()} methods. If no argument is given,
input will be taken from \code{sys.stdin}. The second optional
argument is a filename string, which sets the initial value of the
\member{infile} member. If the stream argument is omitted or
equal to \code{sys.stdin}, this second argument defaults to ``stdin''.
\end{classdesc}
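The constructor described above can be tried without touching
\code{sys.stdin}. The following is a minimal sketch, assuming a current
Python 3 interpreter in which \code{io.StringIO} supplies the required
\method{read()} and \method{readline()} methods; the text, the file name
\file{example.rc} and the helper \function{make_lexer()} are hypothetical.

```python
import io
import shlex

# Hypothetical run control text and file name, chosen for illustration.
RC_TEXT = "color red\nwidth 80\n"
RC_NAME = "example.rc"

def make_lexer(text=RC_TEXT, name=RC_NAME):
    """Build a lexer that reads from an in-memory stream."""
    stream = io.StringIO(text)      # provides read() and readline()
    return shlex.shlex(stream, name)

lexer = make_lexer()
print(lexer.infile)        # the file name passed as the second argument
print(lexer.get_token())   # the first token read from the stream
```

Passing a file name even for in-memory input keeps later error messages
meaningful, since the name ends up in the \member{infile} member.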


\begin{seealso}
  \seemodule{ConfigParser}{Parser for configuration files similar to the
                           Windows \file{.ini} files.}
\end{seealso}


\subsection{shlex Objects \label{shlex-objects}}

A \class{shlex} instance has the following methods:


\begin{methoddesc}{get_token}{}
Return a token. If tokens have been stacked using
\method{push_token()}, pop a token off the stack. Otherwise, read one
from the input stream. If reading encounters an immediate
end-of-file, an empty string is returned.
\end{methoddesc}
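A typical read loop calls \method{get_token()} until the empty string
comes back. This is a minimal sketch under the assumption of a current
Python 3 interpreter and default lexer settings; the helper
\function{all_tokens()} is a hypothetical name, not part of the module.

```python
import io
import shlex

def all_tokens(lexer):
    """Collect tokens until get_token() reports end-of-file."""
    result = []
    while True:
        tok = lexer.get_token()
        if tok == "":          # the empty string signals end-of-file
            break
        result.append(tok)
    return result

print(all_tokens(shlex.shlex(io.StringIO("set width 80"))))
```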

\begin{methoddesc}{push_token}{str}
Push the argument onto the token stack.
\end{methoddesc}
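One common use of \method{push_token()} is one-token lookahead: read a
token, inspect it, and push it back so the next
\method{get_token()} call returns it again. The sketch below assumes a
current Python 3 interpreter; \function{peek_token()} is a hypothetical
helper.

```python
import io
import shlex

def peek_token(lexer):
    """Return the next token without consuming it."""
    tok = lexer.get_token()
    lexer.push_token(tok)      # put it back on the token stack
    return tok

lexer = shlex.shlex(io.StringIO("alpha beta"))
print(peek_token(lexer))       # look at the next token
print(lexer.get_token())       # the same token is returned again
```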

\begin{methoddesc}{read_token}{}
Read a raw token. Ignore the pushback stack, and do not interpret source
requests. (This is not ordinarily a useful entry point, and is
documented here only for the sake of completeness.)
\end{methoddesc}

\begin{methoddesc}{sourcehook}{filename}
When \class{shlex} detects a source request (see
\member{source} below), this method is given the following token as
argument, and is expected to return a tuple consisting of a filename and
an open file-like object.

Normally, this method first strips any quotes off the argument. If
the result is an absolute pathname, or there was no previous source
request in effect, or the previous source was a stream
(e.g.\ \code{sys.stdin}), the result is left alone. Otherwise, if the
result is a relative pathname, the directory part of the name of the
file immediately before it on the source inclusion stack is prepended
(this behavior is like the way the C preprocessor handles
\code{\#include "file.h"}). The result of the manipulations is treated
as a filename, and returned as the first component of the tuple, with
\function{open()} called on it to yield the second component.

This hook is exposed so that you can use it to implement directory
search paths, addition of file extensions, and other namespace hacks.
There is no corresponding `close' hook, but a \class{shlex} instance will
call the \method{close()} method of the sourced input stream when it
returns \EOF.
\end{methoddesc}
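As an illustration of overriding the hook, the sketch below resolves
source requests against an in-memory table and appends a file extension
when it is missing. It assumes a current Python 3 interpreter; the
class \class{SnippetLexer}, the table \code{SNIPPETS}, the keyword
\samp{include} and the \file{.rc} extension are all hypothetical choices
made for the example.

```python
import io
import shlex

# Hypothetical in-memory "files"; a real hook would usually open files on disk.
SNIPPETS = {"colors.rc": "red green"}

class SnippetLexer(shlex.shlex):
    """Resolve source requests against SNIPPETS, adding a '.rc' extension."""

    def sourcehook(self, filename):
        name = filename.strip('"')       # drop surrounding double quotes
        if not name.endswith(".rc"):
            name += ".rc"                # file-extension convenience
        # Return the (filename, open file-like object) pair the lexer expects.
        return name, io.StringIO(SNIPPETS[name])

def expand(text):
    """Tokenize text, following 'include' requests through the hook."""
    lexer = SnippetLexer(io.StringIO(text), "main.rc")
    lexer.source = "include"             # keyword that triggers the hook
    return list(iter(lexer.get_token, ""))

print(expand("first include colors last"))
```

Because the lexer closes the returned stream itself at end-of-file, the
hook does not need to keep a reference to it.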

\begin{methoddesc}{error_leader}{\optional{file\optional{, line}}}
This method generates an error message leader in the format of a
\UNIX{} C compiler error label; the format is \samp{"\%s", line \%d: },
where the \samp{\%s} is replaced with the name of the current source
file and the \samp{\%d} with the current input line number (the
optional arguments can be used to override these).

This convenience is provided to encourage \module{shlex} users to
generate error messages in the standard, parseable format understood
by Emacs and other \UNIX{} tools.
\end{methoddesc}
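The leader is meant to be prepended to an application's own message.
A minimal sketch, assuming a current Python 3 interpreter; the helper
\function{complain()} and the file name \file{example.rc} are
hypothetical.

```python
import io
import shlex

def complain(lexer, message):
    """Prefix a message with the lexer's current file name and line number."""
    return lexer.error_leader() + message

lexer = shlex.shlex(io.StringIO("width = \n"), "example.rc")
print(complain(lexer, "missing value"))
```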

Instances of \class{shlex} subclasses have some public instance
variables which either control lexical analysis or can be used for
debugging:

\begin{memberdesc}{commenters}
The string of characters that are recognized as comment beginners.
All characters from the comment beginner to end of line are ignored.
Includes just \character{\#} by default.
\end{memberdesc}
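Assigning to \member{commenters} changes which characters start a
comment. The sketch below, which assumes a current Python 3
interpreter, switches to \character{;} as used by some configuration
formats; \function{tokens_with_commenters()} is a hypothetical helper.

```python
import io
import shlex

def tokens_with_commenters(text, commenters="#"):
    """Tokenize text using the given comment-start characters."""
    lexer = shlex.shlex(io.StringIO(text))
    lexer.commenters = commenters
    return list(iter(lexer.get_token, ""))

print(tokens_with_commenters("a ; note\nb", commenters=";"))
```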

\begin{memberdesc}{wordchars}
The string of characters that will accumulate into multi-character
tokens. By default, includes all \ASCII{} alphanumerics and
underscore.
\end{memberdesc}
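Extending \member{wordchars} lets names that contain punctuation come
back as one token. A minimal sketch, assuming a current Python 3
interpreter and the default (non-POSIX) mode; the helper
\function{tokens_with_extra_wordchars()} is hypothetical.

```python
import io
import shlex

def tokens_with_extra_wordchars(text, extra=""):
    """Tokenize text after adding the given characters to wordchars."""
    lexer = shlex.shlex(io.StringIO(text))
    lexer.wordchars += extra
    return list(iter(lexer.get_token, ""))

print(tokens_with_extra_wordchars("foo-bar"))             # hyphen splits the word
print(tokens_with_extra_wordchars("foo-bar", extra="-"))  # hyphen is part of it
```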

\begin{memberdesc}{whitespace}
Characters that will be considered whitespace and skipped. Whitespace
bounds tokens. By default, includes space, tab, linefeed and
carriage-return.
\end{memberdesc}
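Adding a character to \member{whitespace} turns it into a separator that
is skipped rather than returned. The sketch below assumes a current
Python 3 interpreter; \function{split_on()} is a hypothetical helper
that treats commas as whitespace.

```python
import io
import shlex

def split_on(text, extra_whitespace=""):
    """Tokenize text, treating the extra characters as whitespace."""
    lexer = shlex.shlex(io.StringIO(text))
    lexer.whitespace += extra_whitespace
    return list(iter(lexer.get_token, ""))

print(split_on("a,b, c", extra_whitespace=","))
```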

\begin{memberdesc}{quotes}
Characters that will be considered string quotes. The token
accumulates until the same quote is encountered again (thus, different
quote types protect each other as in the shell). By default, includes
\ASCII{} single and double quotes.
\end{memberdesc}
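The way quote types protect each other can be seen in a short
experiment. This sketch assumes a current Python 3 interpreter in its
default (non-POSIX) mode, where the returned token still carries its
surrounding quotes; \function{quoted_tokens()} is a hypothetical helper.

```python
import io
import shlex

def quoted_tokens(text):
    """Tokenize text with the default quote characters."""
    lexer = shlex.shlex(io.StringIO(text))
    return list(iter(lexer.get_token, ""))

# The apostrophe inside double quotes does not end the string,
# and the double quotes inside single quotes are kept as-is.
for tok in quoted_tokens("say \"it's here\" 'a \"b\" c'"):
    print(tok)
```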

\begin{memberdesc}{infile}
The name of the current input file, as initially set at class
instantiation time or stacked by later source requests. It may
be useful to examine this when constructing error messages.
\end{memberdesc}

\begin{memberdesc}{instream}
The input stream from which this \class{shlex} instance is reading
characters.
\end{memberdesc}

\begin{memberdesc}{source}
This member is \code{None} by default. If you assign a string to it,
that string will be recognized as a lexical-level inclusion request
similar to the \samp{source} keyword in various shells. That is, the
immediately following token will be opened as a filename and input taken
from that stream until \EOF, at which point the \method{close()}
method of that stream will be called and the input source will again
become the original input stream. Source requests may be stacked any
number of levels deep.
\end{memberdesc}
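The inclusion mechanism can be exercised with a temporary file. This is
a minimal sketch, assuming a current Python 3 interpreter and the
default \method{sourcehook()}; the file name \file{extra.rc} and the
helper \function{read_with_source()} are hypothetical. The path is
quoted in the input so that it is read as a single token.

```python
import io
import os
import shlex
import tempfile

def read_with_source(included_text):
    """Tokenize a stream that pulls in a temporary file via a source request."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "extra.rc")     # hypothetical file name
        with open(path, "w") as f:
            f.write(included_text)
        # Quote the path so the lexer returns it as one token.
        lexer = shlex.shlex(io.StringIO('start source "%s" end' % path))
        lexer.source = "source"                  # enable inclusion requests
        return list(iter(lexer.get_token, ""))

print(read_with_source("x y\n"))
```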

\begin{memberdesc}{debug}
If this member is numeric and \code{1} or more, a \class{shlex}
instance will print verbose progress output on its behavior. If you
need to use this, you can read the module source code to learn the
details.
\end{memberdesc}

Note that any character not declared to be a word character,
whitespace, or a quote will be returned as a single-character token.

Quote and comment characters are not recognized within words. Thus,
the bare words \samp{ain't} and \samp{ain\#t} would be returned as single
tokens by the default parser.
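Two of these rules, the single-character token and the quote inside a
word, can be checked with a short experiment. The sketch assumes a
current Python 3 interpreter with default settings and deliberately
covers only the \samp{ain't} case; \function{default_tokens()} is a
hypothetical helper.

```python
import io
import shlex

def default_tokens(text):
    """Tokenize text with an unmodified lexer."""
    lexer = shlex.shlex(io.StringIO(text))
    return list(iter(lexer.get_token, ""))

# '=' is neither a word character, whitespace, nor a quote, so it is
# returned alone; the apostrophe inside the word does not start a string.
print(default_tokens("name = ain't"))
```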

\begin{memberdesc}{lineno}
Source line number (count of newlines seen so far plus one).
\end{memberdesc}
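The counting rule can be observed by comparing \member{lineno} before
and after the input is consumed. A minimal sketch, assuming a current
Python 3 interpreter; \function{line_numbers()} is a hypothetical helper.

```python
import io
import shlex

def line_numbers(text):
    """Return lineno before reading and after the whole input is consumed."""
    lexer = shlex.shlex(io.StringIO(text))
    before = lexer.lineno
    while lexer.get_token() != "":
        pass                    # discard tokens; only the count matters here
    return before, lexer.lineno

# Three newlines in the input, so the final value is three plus one.
print(line_numbers("a\nb\nc\n"))
```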

\begin{memberdesc}{token}
The token buffer. It may be useful to examine this when catching
exceptions.
\end{memberdesc}