% Module and documentation by Eric S. Raymond, 21 Dec 1998

\section{\module{shlex} ---
         Simple lexical analysis}

\declaremodule{standard}{shlex}
\modulesynopsis{Simple lexical analysis for \UNIX{} shell-like languages.}
\moduleauthor{Eric S. Raymond}{esr@snark.thyrsus.com}
\sectionauthor{Eric S. Raymond}{esr@snark.thyrsus.com}

\versionadded{1.5.2}

The \class{shlex} class makes it easy to write lexical analyzers for
simple syntaxes resembling that of the \UNIX{} shell.  This will often
be useful for writing minilanguages, e.g.\ in run control files for
Python applications.

\begin{classdesc}{shlex}{\optional{stream}}
A \class{shlex} instance or subclass instance is a lexical analyzer
object.  The initialization argument, if present, specifies where to
read characters from.  It must be a file- or stream-like object with
\method{read()} and \method{readline()} methods.  If no argument is given,
input will be taken from \code{sys.stdin}.
\end{classdesc}
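
As a minimal sketch of the constructor described above, the following
reads tokens from an in-memory stream.  The use of
\class{io.StringIO} and a current Python 3 interpreter are assumptions
of this example; any object with \method{read()} and
\method{readline()} methods should serve equally well.

```python
import io
import shlex

# io.StringIO provides read() and readline(), so the example does not
# depend on sys.stdin.
stream = io.StringIO("name = value")
lexer = shlex.shlex(stream)

# get_token() returns an empty string at end of input, so the empty
# string can act as the sentinel for iter().
tokens = list(iter(lexer.get_token, ""))
print(tokens)
```

With the default settings, \samp{=} is neither a word character nor
whitespace, so it comes back as a token of its own.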


\begin{seealso}
  \seemodule{ConfigParser}{Parser for configuration files similar to the
                           Windows \file{.ini} files.}
\end{seealso}


\subsection{shlex Objects \label{shlex-objects}}

A \class{shlex} instance has the following methods:

\begin{methoddesc}{get_token}{}
Return a token.  If tokens have been stacked using
\method{push_token()}, pop a token off the stack.  Otherwise, read one
from the input stream.  If reading encounters an immediate
end-of-file, an empty string is returned.
\end{methoddesc}

\begin{methoddesc}{push_token}{str}
Push the argument onto the token stack.
\end{methoddesc}
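
The interplay of \method{get_token()} and \method{push_token()} can be
sketched as follows.  The input text and the extra token
\code{"alpha"} are made up for illustration, and a current Python 3
interpreter is assumed.

```python
import io
import shlex

lexer = shlex.shlex(io.StringIO("beta gamma"))

# Read one token, then decide to "un-read" it.
first = lexer.get_token()
lexer.push_token(first)

# Pushed tokens form a stack: the most recently pushed one is
# returned first.
lexer.push_token("alpha")

replay = list(iter(lexer.get_token, ""))
print(replay)
```

This is one way to get a single token of lookahead: read a token,
inspect it, and push it back if it belongs to the next construct.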

Instances of \class{shlex} subclasses have some public instance
variables which either control lexical analysis or can be used
for debugging:

\begin{memberdesc}{commenters}
The string of characters that are recognized as comment beginners.
All characters from the comment beginner to end of line are ignored.
Includes just \character{\#} by default.
\end{memberdesc}
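
A short sketch of changing \member{commenters}, assuming a current
Python 3 interpreter.  The choice of \character{;} as an additional
comment character and the sample input are assumptions of this
example.

```python
import io
import shlex

text = "a = 1 ; note\nb = 2 # other"
lexer = shlex.shlex(io.StringIO(text))

# Treat ';' as a comment beginner in addition to the default '#'.
lexer.commenters = "#;"

tokens = list(iter(lexer.get_token, ""))
print(tokens)
```

Everything from either comment character to the end of its line is
discarded before tokenizing continues.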

\begin{memberdesc}{wordchars}
The string of characters that will accumulate into multi-character
tokens.  By default, includes all \ASCII{} alphanumerics and
underscore.
\end{memberdesc}
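
To illustrate \member{wordchars}, the sketch below tokenizes the same
input twice, once with the defaults and once with \character{-} and
\character{.} added as word characters.  The sample input is
hypothetical and a current Python 3 interpreter is assumed.

```python
import io
import shlex

text = "my-file.txt other"

# Default word characters: '-' and '.' split the word apart.
default_lexer = shlex.shlex(io.StringIO(text))
default_tokens = list(iter(default_lexer.get_token, ""))

# Extended word characters: the file name stays in one piece.
custom_lexer = shlex.shlex(io.StringIO(text))
custom_lexer.wordchars += "-."
custom_tokens = list(iter(custom_lexer.get_token, ""))

print(default_tokens)
print(custom_tokens)
```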

\begin{memberdesc}{whitespace}
Characters that will be considered whitespace and skipped.  Whitespace
bounds tokens.  By default, includes space, tab, linefeed and
carriage-return.
\end{memberdesc}
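
A possible use of \member{whitespace} is to treat a separator such as
the comma as token-bounding whitespace.  This is only a sketch under
the assumption of a current Python 3 interpreter; the comma-separated
input is invented for the example.

```python
import io
import shlex

lexer = shlex.shlex(io.StringIO("a,b, c"))

# Commas now separate tokens and are skipped like spaces.
lexer.whitespace += ","

tokens = list(iter(lexer.get_token, ""))
print(tokens)
```

Without the change, each comma would be returned as a
single-character token.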

\begin{memberdesc}{quotes}
Characters that will be considered string quotes.  The token
accumulates until the same quote is encountered again (thus, different
quote types protect each other, as in the shell).  By default, includes
\ASCII{} single and double quotes.
\end{memberdesc}
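
The way different quote types protect each other can be seen in the
following sketch.  It assumes a current Python 3 interpreter with the
default (non-POSIX) settings, in which the enclosing quote characters
remain part of the returned token; the input is made up.

```python
import io
import shlex

text = """say "it's here" 'a "b" c'"""
lexer = shlex.shlex(io.StringIO(text))

tokens = list(iter(lexer.get_token, ""))
for tok in tokens:
    print(tok)
```

The single quote inside the double-quoted string does not end the
token, and the double quotes inside the single-quoted string are
likewise kept as ordinary characters.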

Note that any character not declared to be a word character,
whitespace, or a quote will be returned as a single-character token.

Quote and comment characters are not recognized within words.  Thus,
the bare words \samp{ain't} and \samp{ain\#t} would be returned as single
tokens by the default parser.
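
The quote case of the rule above can be checked with a small sketch.
It assumes a current Python 3 interpreter and exercises only the
embedded quote; the comment-character case is left out of this
example.

```python
import io
import shlex

lexer = shlex.shlex(io.StringIO("ain't it"))

# The apostrophe appears inside a word, so it does not open a
# quoted string.
tokens = list(iter(lexer.get_token, ""))
print(tokens)
```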

\begin{memberdesc}{lineno}
Source line number (count of newlines seen so far plus one).
\end{memberdesc}
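
A brief sketch of how \member{lineno} advances, assuming a current
Python 3 interpreter.  Because the counter reflects newlines already
consumed by the lexer, it may run one line ahead of the token just
returned; the three-line input is invented for the example.

```python
import io
import shlex

lexer = shlex.shlex(io.StringIO("one\ntwo\nthree"))

# No newline has been read yet.
start_line = lexer.lineno

tokens = list(iter(lexer.get_token, ""))

# Two newlines have been read by the time the input is exhausted.
end_line = lexer.lineno
print(start_line, end_line)
```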

\begin{memberdesc}{token}
The token buffer.  It may be useful to examine this when catching
exceptions.
\end{memberdesc}
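
One situation where the token buffer may help is an unterminated
quoted string.  The sketch below assumes a current Python 3
interpreter, where this condition is reported as a
\exception{ValueError}; other versions may signal it differently.
The input is made up for illustration.

```python
import io
import shlex

lexer = shlex.shlex(io.StringIO('ok "unterminated'))

first = lexer.get_token()
partial = None
try:
    lexer.get_token()
except ValueError:
    # The buffer still holds what was read before the error.
    partial = lexer.token

print(first, partial)
```

Reporting \code{partial} together with \member{lineno} gives the user
a reasonable hint about where the input went wrong.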