% Module and documentation by Eric S. Raymond, 21 Dec 1998

\section{\module{shlex} ---
         Simple lexical analysis.}

\declaremodule{standard}{shlex}
\modulesynopsis{Simple lexical analysis for \UNIX{} shell-like languages.}
\moduleauthor{Eric S. Raymond}{esr@snark.thyrsus.com}
\sectionauthor{Eric S. Raymond}{esr@snark.thyrsus.com}

\versionadded{1.5.2}

The \class{shlex} class makes it easy to write lexical analyzers for
simple syntaxes resembling that of the \UNIX{} shell. This will often
be useful for writing minilanguages, e.g.\ in run control files for
Python applications.

\begin{classdesc}{shlex}{\optional{stream}}
A \class{shlex} instance or subclass instance is a lexical analyzer
object. The initialization argument, if present, specifies where to
read characters from. It must be a file- or stream-like object with
\method{read()} and \method{readline()} methods. If no argument is given,
input will be taken from \code{sys.stdin}.
\end{classdesc}
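As a minimal sketch of constructing a lexer, the following assumes a
current Python 3 interpreter, where \class{io.StringIO} supplies the
required \method{read()} and \method{readline()} methods. The helper
name \code{tokenize} is purely illustrative and is not part of the
module.

```python
import io
import shlex


def tokenize(text):
    """Collect every token from text (illustrative helper, not in shlex)."""
    lexer = shlex.shlex(io.StringIO(text))
    tokens = []
    while True:
        tok = lexer.get_token()
        if tok == "":          # an empty string signals end-of-file
            break
        tokens.append(tok)
    return tokens


result = tokenize("set color blue  # a comment\nset size 10")
print(result)
```

The loop stops on the empty string, so the comment text never shows up
among the collected tokens.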

\subsection{shlex Objects \label{shlex-objects}}

A \class{shlex} instance has the following methods:

\begin{methoddesc}{get_token}{}
Return a token. If tokens have been stacked using
\method{push_token()}, pop a token off the stack. Otherwise, read one
from the input stream. If reading encounters an immediate
end-of-file, an empty string is returned.
\end{methoddesc}

\begin{methoddesc}{push_token}{str}
Push the argument onto the token stack.
\end{methoddesc}
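One plausible use of the token stack is one-token lookahead. The
sketch below assumes a current Python 3 interpreter; \code{peek_token}
is a hypothetical helper written for this example, not something the
module provides.

```python
import io
import shlex


def peek_token(lexer):
    """Return the next token without consuming it (hypothetical helper)."""
    tok = lexer.get_token()
    lexer.push_token(tok)      # put it back so the next read sees it again
    return tok


lexer = shlex.shlex(io.StringIO("alpha beta"))
upcoming = peek_token(lexer)   # look ahead ...
first = lexer.get_token()      # ... then read the same token for real
second = lexer.get_token()

# Tokens pushed one after another come back in reverse (stack) order.
lexer.push_token("x")
lexer.push_token("y")
popped = [lexer.get_token(), lexer.get_token()]
print(upcoming, first, second, popped)
```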

Instances of \class{shlex} subclasses have some public instance
variables which either control lexical analysis or can be used
for debugging:

\begin{memberdesc}{commenters}
The string of characters that are recognized as comment beginners.
All characters from the comment beginner to end of line are ignored.
Includes just \character{\#} by default.
\end{memberdesc}

\begin{memberdesc}{wordchars}
The string of characters that will accumulate into multi-character
tokens. By default, includes all \ASCII{} alphanumerics and
underscore.
\end{memberdesc}

\begin{memberdesc}{whitespace}
Characters that will be considered whitespace and skipped. Whitespace
bounds tokens. By default, includes space, tab, linefeed and
carriage-return.
\end{memberdesc}
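These three variables can be adjusted on an instance before the first
token is read. The following is a sketch under the assumption of a
current Python 3 interpreter; the input line and the helper name
\code{tokenize_custom} are invented for illustration.

```python
import io
import shlex


def tokenize_custom(text):
    """Tokenize text with adjusted lexer tables (illustrative helper)."""
    lexer = shlex.shlex(io.StringIO(text))
    lexer.commenters = ";"     # ';' starts a comment instead of '#'
    lexer.wordchars += "-./"   # keep paths and dashed names in one token
    lexer.whitespace += ","    # treat commas as token separators
    tokens = []
    while True:
        tok = lexer.get_token()
        if tok == "":
            break
        tokens.append(tok)
    return tokens


custom = tokenize_custom("include /etc/app.d/extra-rules, ./local ; note\n")
print(custom)
```

Without the change to \member{wordchars}, each \character{/},
\character{.} and \character{-} would come back as a separate
single-character token.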

\begin{memberdesc}{quotes}
Characters that will be considered string quotes. The token
accumulates until the same quote is encountered again (thus, different
quote types protect each other as in the shell). By default, includes
\ASCII{} single and double quotes.
\end{memberdesc}
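The way one quote type protects the other can be seen in a short
sketch. It assumes a current Python 3 interpreter with the lexer in its
default (non-POSIX) mode, where the delimiting quote characters are
kept as part of the returned token; the sample text is made up.

```python
import io
import shlex

text = """say "it's fine" 'a "b" c'"""
lexer = shlex.shlex(io.StringIO(text))

quoted_tokens = []
while True:
    tok = lexer.get_token()
    if tok == "":
        break
    quoted_tokens.append(tok)

# Each quoted run is one token; the inner quote of the other kind is inert.
for tok in quoted_tokens:
    print(repr(tok))
```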

Note that any character not declared to be a word character,
whitespace, or a quote will be returned as a single-character token.

Quote and comment characters are not recognized within words. Thus,
the bare words \samp{ain't} and \samp{ain\#t} would be returned as single
tokens by the default parser.
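
A brief sketch of two of these rules, assuming a current Python 3
interpreter in its default mode: a quote inside a word stays part of
the word, and characters outside every table come back one at a time.
The comment-character case is deliberately left out of the sketch, and
the input string is invented.

```python
import io
import shlex

lexer = shlex.shlex(io.StringIO("ain't x=1;"))

word_tokens = []
while True:
    tok = lexer.get_token()
    if tok == "":
        break
    word_tokens.append(tok)

# The apostrophe stays inside the word; '=' and ';' are separate tokens.
print(word_tokens)
```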

\begin{memberdesc}{lineno}
Source line number (count of newlines seen so far plus one).
\end{memberdesc}

\begin{memberdesc}{token}
The token buffer. It may be useful to examine this when catching
exceptions.
\end{memberdesc}
Guido van Rossum5e97c9d1998-12-22 05:18:24 +000085\end{memberdesc}