% Module and documentation by Eric S. Raymond, 21 Dec 1998

\section{\module{shlex} ---
         Simple lexical analysis}

\declaremodule{standard}{shlex}
\moduleauthor{Eric S. Raymond}{esr@snark.thyrsus.com}
\sectionauthor{Eric S. Raymond}{esr@snark.thyrsus.com}


The \class{shlex} class makes it easy to write lexical analyzers for
simple syntaxes resembling that of the \UNIX{} shell.  This will often
be useful for writing minilanguages, e.g.\ in run control files for
Python applications.

\begin{classdesc}{shlex}{\optional{stream}}
A \class{shlex} instance or subclass instance is a lexical analyzer
object.  The initialization argument, if present, specifies where to
read characters from.  It must be a file- or stream-like object with
\method{read()} and \method{readline()} methods.  If no argument is given,
input will be taken from \code{sys.stdin}.
\end{classdesc}
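
As a minimal sketch of the constructor in use, the following feeds the
lexer from an in-memory stream instead of \code{sys.stdin}.  It assumes
a Python 3 interpreter, where \code{io.StringIO} supplies the required
\method{read()} and \method{readline()} methods; the sample input line
is invented for illustration.

```python
import io
import shlex

# Hypothetical run-control line; any object with read() and readline() works.
stream = io.StringIO("set color = 'dark blue'  # trailing comment\n")
lexer = shlex.shlex(stream)

tokens = []
while True:
    tok = lexer.get_token()
    if not tok:          # an empty string signals end-of-file
        break
    tokens.append(tok)

print(tokens)
```

With the default settings the comment is skipped, and the quoted string
is returned as one token with its quote characters still attached.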

\subsection{shlex Objects \label{shlex-objects}}

A \class{shlex} instance has the following methods:

\begin{methoddesc}{get_token}{}
Return a token.  If tokens have been stacked using
\method{push_token()}, pop a token off the stack.  Otherwise, read one
from the input stream.  If reading encounters an immediate
end-of-file, an empty string is returned.
\end{methoddesc}

\begin{methoddesc}{push_token}{str}
Push the argument onto the token stack.
\end{methoddesc}
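
The interplay of the two methods can be sketched as a one-token
lookahead: read a token, push it back, and read it again.  This is an
illustrative sketch only; it assumes Python 3 and \code{io.StringIO},
and the variable names are arbitrary.

```python
import io
import shlex

lexer = shlex.shlex(io.StringIO("alpha beta"))

first = lexer.get_token()      # read from the stream
lexer.push_token(first)        # put it back for a later caller
replayed = lexer.get_token()   # comes off the token stack, not the stream
following = lexer.get_token()  # the stream resumes where it left off
at_eof = lexer.get_token()     # nothing left to read

print(first, replayed, following, repr(at_eof))
```

A parser that needs to peek at the next token before deciding how to
proceed can use this pattern without touching the underlying stream.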

Instances of \class{shlex} and its subclasses have public instance
variables that either control lexical analysis or can be used
for debugging:

\begin{memberdesc}{commenters}
The string of characters that are recognized as comment beginners.
All characters from the comment beginner to end of line are ignored.
Includes just \character{\#} by default.
\end{memberdesc}

\begin{memberdesc}{wordchars}
The string of characters that will accumulate into multi-character
tokens.  By default, includes all \ASCII{} alphanumerics and
underscore.
\end{memberdesc}

\begin{memberdesc}{whitespace}
Characters that will be considered whitespace and skipped.  Whitespace
bounds tokens.  By default, includes space, tab, linefeed and
carriage-return.
\end{memberdesc}

\begin{memberdesc}{quotes}
Characters that will be considered string quotes.  The token
accumulates until the same quote is encountered again (thus, different
quote types protect each other, as in the shell).  By default, includes
\ASCII{} single and double quotes.
\end{memberdesc}

Note that any character not declared to be a word character,
whitespace, or a quote will be returned as a single-character token.
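
To illustrate the effect of \member{wordchars}, the sketch below
tokenizes the same text twice: once with the defaults, and once after
adding \character{-} and \character{.} to the word characters.  The
helper \code{tokenize()} and its \code{extra_wordchars} parameter are
hypothetical names introduced for this example, not part of the module;
Python 3 and \code{io.StringIO} are assumed.

```python
import io
import shlex

def tokenize(text, extra_wordchars=""):
    """Return all tokens in text, optionally widening wordchars first."""
    lexer = shlex.shlex(io.StringIO(text))
    lexer.wordchars += extra_wordchars
    tokens = []
    while True:
        tok = lexer.get_token()
        if not tok:      # empty string: end of input
            break
        tokens.append(tok)
    return tokens

default_tokens = tokenize("log-file.txt")
widened_tokens = tokenize("log-file.txt", extra_wordchars="-.")

print(default_tokens)   # punctuation comes back as one-character tokens
print(widened_tokens)   # the whole name accumulates into a single token
```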

Quote and comment characters are not recognized within words.  Thus,
the bare words \samp{ain't} and \samp{ain\#t} would be returned as single
tokens by the default parser.
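
The quote case can be checked with a short sketch, assuming Python 3
and the lexer's default (non-POSIX) mode.  Only the embedded quote is
exercised here; the sketch makes no claim about how a given release
treats a comment character inside a word.

```python
import io
import shlex

lexer = shlex.shlex(io.StringIO("ain't x=1"))

tokens = []
while True:
    tok = lexer.get_token()
    if not tok:          # empty string: end of input
        break
    tokens.append(tok)

# The apostrophe stays inside the word, while '=' (neither a word
# character, whitespace, nor a quote) is split off on its own.
print(tokens)
```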

\begin{memberdesc}{lineno}
Source line number (count of newlines seen so far plus one).
\end{memberdesc}

\begin{memberdesc}{token}
The token buffer.  It may be useful to examine this when catching
exceptions.
\end{memberdesc}
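
Because \member{lineno} counts the newlines consumed so far, its value
after a call to \method{get_token()} reflects how far the lexer has
read, which may already be past the newline that ended the token.  The
following sketch, assuming Python 3 and \code{io.StringIO}, records the
counter after each token; the behavior noted is what current CPython
releases are expected to show.

```python
import io
import shlex

lexer = shlex.shlex(io.StringIO("first\nsecond third\n"))
start_line = lexer.lineno      # no newline has been read yet

seen = []
while True:
    tok = lexer.get_token()
    if not tok:                # empty string: end of input
        break
    # Pair each token with the counter as it stands after the read.
    seen.append((tok, lexer.lineno))

print(start_line)
print(seen)
```

A token that ends a line is terminated by the newline itself, so the
counter has already advanced by the time the token is returned.  Error
messages that cite \member{lineno} may need to allow for this.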