\section{\module{shlex} ---
         Simple lexical analysis}

\declaremodule{standard}{shlex}
\modulesynopsis{Simple lexical analysis for \UNIX\ shell-like languages.}
\moduleauthor{Eric S. Raymond}{esr@snark.thyrsus.com}
\moduleauthor{Gustavo Niemeyer}{niemeyer@conectiva.com}
\sectionauthor{Eric S. Raymond}{esr@snark.thyrsus.com}
\sectionauthor{Gustavo Niemeyer}{niemeyer@conectiva.com}

\versionadded{1.5.2}

The \class{shlex} class makes it easy to write lexical analyzers for
simple syntaxes resembling that of the \UNIX{} shell. This will often
be useful for writing minilanguages (for example, in run control
files for Python applications) or for parsing quoted strings.

\note{The \module{shlex} module currently does not support Unicode input.}

The \module{shlex} module defines the following function:

\begin{funcdesc}{split}{s\optional{, comments\optional{, posix}}}
Split the string \var{s} using shell-like syntax. If \var{comments} is
\constant{False} (the default), the parsing of comments in the given
string will be disabled (setting the \member{commenters} member of the
\class{shlex} instance to the empty string). This function operates
in \POSIX{} mode by default, but uses non-\POSIX{} mode if the
\var{posix} argument is false.
\versionadded{2.3}
\versionchanged[Added the \var{posix} parameter]{2.6}
\end{funcdesc}
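As a rough illustration of \function{split()}, the following sketch
(assuming a current Python~3 interpreter; the sample command strings are
made up for the example) splits a line in the default \POSIX{} mode, then
with comment parsing enabled, and finally in non-\POSIX{} mode:

```python
import shlex

# Default: POSIX mode, comment parsing disabled. Quotes group words
# and are removed from the resulting tokens.
tokens = shlex.split('cp "my file.txt" /tmp/backup')
print(tokens)

# comments=True drops everything from '#' to the end of the line.
no_comment = shlex.split('ls -l  # list files', comments=True)
print(no_comment)

# posix=False selects the compatibility rules, which keep the quotes.
compat = shlex.split('cp "my file.txt" /tmp/backup', posix=False)
print(compat)
```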

The \module{shlex} module defines the following class:

\begin{classdesc}{shlex}{\optional{instream\optional{,
                         infile\optional{, posix}}}}
A \class{shlex} instance or subclass instance is a lexical analyzer
object. The initialization argument, if present, specifies where to
read characters from. It must be a file-/stream-like object with
\method{read()} and \method{readline()} methods, or a string (strings
are accepted since Python 2.3). If no argument is given, input will be
taken from \code{sys.stdin}. The second optional argument is a filename
string, which sets the initial value of the \member{infile} member. If
the \var{instream} argument is omitted or equal to \code{sys.stdin},
this second argument defaults to ``stdin''. The \var{posix} argument
was introduced in Python 2.3, and defines the operational mode. When
\var{posix} is not true (the default), the \class{shlex} instance will
operate in compatibility mode. When operating in \POSIX{} mode,
\class{shlex} will try to be as close as possible to the \POSIX{} shell
parsing rules. See section~\ref{shlex-parsing-rules}.
\end{classdesc}
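The difference between the two operational modes can be sketched as
follows. This is only an illustration, assuming a current Python~3
interpreter; the helper \function{all_tokens()} and the sample text are
hypothetical and not part of the module:

```python
import shlex

def all_tokens(lexer):
    """Collect tokens until the lexer reports end of file."""
    result = []
    while True:
        token = lexer.get_token()
        if token == lexer.eof:
            return result
        result.append(token)

text = 'name = "John Smith"'

# Compatibility mode (the constructor default) keeps the quotes.
compat_tokens = all_tokens(shlex.shlex(text))

# POSIX mode strips the quotes from the token.
posix_tokens = all_tokens(shlex.shlex(text, posix=True))

print(compat_tokens)
print(posix_tokens)
```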

\begin{seealso}
  \seemodule{ConfigParser}{Parser for configuration files similar to the
                           Windows \file{.ini} files.}
\end{seealso}


\subsection{shlex Objects \label{shlex-objects}}

A \class{shlex} instance has the following methods:

\begin{methoddesc}[shlex]{get_token}{}
Return a token. If tokens have been stacked using
\method{push_token()}, pop a token off the stack. Otherwise, read one
from the input stream. If reading encounters an immediate
end-of-file, \member{self.eof} is returned (the empty string (\code{''})
in non-\POSIX{} mode, and \code{None} in \POSIX{} mode).
\end{methoddesc}

\begin{methoddesc}[shlex]{push_token}{str}
Push the argument onto the token stack.
\end{methoddesc}
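One common use of \method{get_token()} together with
\method{push_token()} is one-token lookahead. The sketch below assumes a
current Python~3 interpreter; the input string and variable names are
illustrative only:

```python
import shlex

lexer = shlex.shlex("alpha beta gamma")

first = lexer.get_token()      # consume the first token
lookahead = lexer.get_token()  # peek at the next one...
lexer.push_token(lookahead)    # ...and put it back on the token stack

again = lexer.get_token()      # the pushed-back token is returned first

# Drain the remaining tokens until end of file is reported.
rest = []
while True:
    token = lexer.get_token()
    if token == lexer.eof:
        break
    rest.append(token)

print(first, again, rest)
```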

\begin{methoddesc}[shlex]{read_token}{}
Read a raw token. Ignore the pushback stack, and do not interpret source
requests. (This is not ordinarily a useful entry point, and is
documented here only for the sake of completeness.)
\end{methoddesc}

\begin{methoddesc}[shlex]{sourcehook}{filename}
When \class{shlex} detects a source request (see
\member{source} below) this method is given the following token as
argument, and expected to return a tuple consisting of a filename and
an open file-like object.

Normally, this method first strips any quotes off the argument. If
the result is an absolute pathname, or there was no previous source
request in effect, or the previous source was a stream
(such as \code{sys.stdin}), the result is left alone. Otherwise, if the
result is a relative pathname, the directory part of the name of the
file immediately before it on the source inclusion stack is prepended
(this behavior is like the way the C preprocessor handles
\code{\#include "file.h"}).

The result of the manipulations is treated as a filename, and returned
as the first component of the tuple, with
\function{open()} called on it to yield the second component. (Note:
this is the reverse of the order of arguments in instance initialization!)

This hook is exposed so that you can use it to implement directory
search paths, addition of file extensions, and other namespace hacks.
There is no corresponding `close' hook, but a \class{shlex} instance will
call the \method{close()} method of the sourced input stream when it
returns \EOF.

For more explicit control of source stacking, use the
\method{push_source()} and \method{pop_source()} methods.
\end{methoddesc}
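A subclass can override \method{sourcehook()} to change how included
names are resolved. The following is a minimal sketch, assuming a current
Python~3 interpreter; the class \class{DictSourceLexer} and the
\code{FAKE_FILES} mapping are hypothetical stand-ins for a real search
path, and the ``files'' live in memory so the example has no external
dependencies:

```python
import io
import shlex

# Hypothetical in-memory "files" standing in for a real search path.
FAKE_FILES = {"extra.conf": "color blue"}

class DictSourceLexer(shlex.shlex):
    def sourcehook(self, newfile):
        # The raw token still carries its quotes in non-POSIX mode.
        name = newfile.strip('"')
        # Return (filename, open stream), in that order.
        return name, io.StringIO(FAKE_FILES[name])

lexer = DictSourceLexer('size 10 source "extra.conf" shape round')
lexer.source = "source"   # enable inclusion requests

tokens = []
while True:
    token = lexer.get_token()
    if token == lexer.eof:
        break
    tokens.append(token)

print(tokens)
```

The filename is quoted in the input because \character{.} is not a word
character by default and would otherwise split the name into several tokens.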

\begin{methoddesc}[shlex]{push_source}{stream\optional{, filename}}
Push an input source stream onto the input stack. If the filename
argument is specified it will later be available for use in error
messages. This is the same method used internally by the
\method{sourcehook} method.
\versionadded{2.1}
\end{methoddesc}
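Explicit source stacking might look like the sketch below (assuming a
current Python~3 interpreter, where \code{io.StringIO} provides the
stream; the file names are invented labels, no files are opened). Each
token is recorded together with the value of \member{infile} at the time
it was returned:

```python
import io
import shlex

lexer = shlex.shlex("outer1 outer2", infile="main.conf")
seen = [(lexer.get_token(), lexer.infile)]

# Stack a second stream; it is read until EOF and then popped
# automatically, after which the original input resumes.
lexer.push_source(io.StringIO("inner1 inner2"), "included.conf")

while True:
    token = lexer.get_token()
    if token == lexer.eof:
        break
    seen.append((token, lexer.infile))

for token, name in seen:
    print(token, name)
```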

\begin{methoddesc}[shlex]{pop_source}{}
Pop the last-pushed input source from the input stack.
This is the same method used internally when the lexer reaches
\EOF{} on a stacked input stream.
\versionadded{2.1}
\end{methoddesc}

\begin{methoddesc}[shlex]{error_leader}{\optional{file\optional{, line}}}
This method generates an error message leader in the format of a
\UNIX{} C compiler error label; the format is \code{'"\%s", line \%d: '},
where the \samp{\%s} is replaced with the name of the current source
file and the \samp{\%d} with the current input line number (the
optional arguments can be used to override these).

This convenience is provided to encourage \module{shlex} users to
generate error messages in the standard, parseable format understood
by Emacs and other \UNIX{} tools.
\end{methoddesc}
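For instance, a parser might prefix its diagnostics as in this sketch
(assuming a current Python~3 interpreter; the file name
\file{settings.rc} and the input text are made up):

```python
import shlex

lexer = shlex.shlex("first\nsecond oops", infile="settings.rc")
lexer.get_token()   # 'first'  (the newline after it has been read)
lexer.get_token()   # 'second' (on the second input line)

# Uses the lexer's current file name and line number.
leader = lexer.error_leader()
print(leader + "unexpected token")

# Both values can be overridden explicitly.
custom = lexer.error_leader("other.rc", 7)
print(custom + "unexpected token")
```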

Instances of \class{shlex} subclasses have some public instance
variables which either control lexical analysis or can be used for
debugging:

\begin{memberdesc}[shlex]{commenters}
The string of characters that are recognized as comment beginners.
All characters from the comment beginner to end of line are ignored.
Includes just \character{\#} by default.
\end{memberdesc}
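A minilanguage that uses \character{;} for comments could be handled by
reassigning \member{commenters}, as in this sketch (assuming a current
Python~3 interpreter; the input text is invented):

```python
import shlex

lexer = shlex.shlex("host example ; trailing note\nport 80")
lexer.commenters = ";"   # ';' now starts a comment instead of '#'

tokens = []
while True:
    token = lexer.get_token()
    if token == lexer.eof:
        break
    tokens.append(token)

print(tokens)
```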

\begin{memberdesc}[shlex]{wordchars}
The string of characters that will accumulate into multi-character
tokens. By default, includes all \ASCII{} alphanumerics and
underscore.
\end{memberdesc}
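Extending \member{wordchars} keeps characters such as \character{/}
inside a token. The sketch below assumes a current Python~3 interpreter;
the helper \function{tokenize()} is hypothetical:

```python
import shlex

def tokenize(text, extra_wordchars=""):
    """Tokenize in non-POSIX mode with optional extra word characters."""
    lexer = shlex.shlex(text)
    lexer.wordchars += extra_wordchars
    result = []
    while True:
        token = lexer.get_token()
        if token == lexer.eof:
            return result
        result.append(token)

# With the defaults, '/' and '=' come back as single-character tokens.
default_tokens = tokenize("path=/usr/local/bin")

# Treating '/' as a word character keeps the path in one piece.
path_tokens = tokenize("path=/usr/local/bin", extra_wordchars="/")

print(default_tokens)
print(path_tokens)
```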

\begin{memberdesc}[shlex]{whitespace}
Characters that will be considered whitespace and skipped. Whitespace
bounds tokens. By default, includes space, tab, linefeed and
carriage-return.
\end{memberdesc}

\begin{memberdesc}[shlex]{escape}
Characters that will be considered escape characters. These are only
used in \POSIX{} mode; by default, just \character{\textbackslash}.
\versionadded{2.3}
\end{memberdesc}

\begin{memberdesc}[shlex]{quotes}
Characters that will be considered string quotes. The token
accumulates until the same quote is encountered again (thus, different
quote types protect each other as in the shell). By default, includes
\ASCII{} single and double quotes.
\end{memberdesc}

\begin{memberdesc}[shlex]{escapedquotes}
Characters in \member{quotes} that will interpret escape characters
defined in \member{escape}. This is only used in \POSIX{} mode, and
includes just \character{"} by default.
\versionadded{2.3}
\end{memberdesc}
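As a sketch of how \member{escapedquotes} interacts with
\member{escape} (assuming a current Python~3 interpreter and an invented
input string), adding the single quote to \member{escapedquotes} lets a
backslash escape a quote inside a single-quoted string:

```python
import shlex

lexer = shlex.shlex(r"'it\'s' done", posix=True)

# By default only '"' honors escapes; also allow them inside '...'.
lexer.escapedquotes = "\"'"

first = lexer.get_token()
second = lexer.get_token()

print(first)
print(second)
```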

\begin{memberdesc}[shlex]{whitespace_split}
If \code{True}, tokens will only be split on whitespace. This is useful, for
example, for parsing command lines with \class{shlex}, getting tokens
in a similar way to shell arguments.
\versionadded{2.3}
\end{memberdesc}
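The effect on a command line can be sketched like this (assuming a
current Python~3 interpreter; the helper \function{tokenize()} and the
sample arguments are illustrative only):

```python
import shlex

def tokenize(text, whitespace_split):
    """Tokenize in POSIX mode with the given whitespace_split setting."""
    lexer = shlex.shlex(text, posix=True)
    lexer.whitespace_split = whitespace_split
    result = []
    while True:
        token = lexer.get_token()
        if token == lexer.eof:
            return result
        result.append(token)

line = "--output=result.txt -v"

# Punctuation is returned as separate single-character tokens.
fine_grained = tokenize(line, whitespace_split=False)

# Only whitespace separates tokens, as with shell arguments.
shell_like = tokenize(line, whitespace_split=True)

print(fine_grained)
print(shell_like)
```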

\begin{memberdesc}[shlex]{infile}
The name of the current input file, as initially set at class
instantiation time or stacked by later source requests. It may
be useful to examine this when constructing error messages.
\end{memberdesc}

\begin{memberdesc}[shlex]{instream}
The input stream from which this \class{shlex} instance is reading
characters.
\end{memberdesc}

\begin{memberdesc}[shlex]{source}
This member is \code{None} by default. If you assign a string to it,
that string will be recognized as a lexical-level inclusion request
similar to the \samp{source} keyword in various shells. That is, the
immediately following token will be opened as a filename and input taken
from that stream until \EOF, at which point the \method{close()}
method of that stream will be called and the input source will again
become the original input stream. Source requests may be stacked any
number of levels deep.
\end{memberdesc}

\begin{memberdesc}[shlex]{debug}
If this member is numeric and \code{1} or more, a \class{shlex}
instance will print verbose progress output on its behavior. If you
need to use this, you can read the module source code to learn the
details.
\end{memberdesc}

\begin{memberdesc}[shlex]{lineno}
Source line number (count of newlines seen so far plus one).
\end{memberdesc}

\begin{memberdesc}[shlex]{token}
The token buffer. It may be useful to examine this when catching
exceptions.
\end{memberdesc}

\begin{memberdesc}[shlex]{eof}
Token used to determine end of file. This will be set to the empty
string (\code{''}) in non-\POSIX{} mode, and to \code{None} in
\POSIX{} mode.
\versionadded{2.3}
\end{memberdesc}
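Comparing against \member{eof} rather than a hard-coded value keeps a
token loop correct in either mode. A small sketch, assuming a current
Python~3 interpreter and empty input:

```python
import shlex

compat = shlex.shlex("")              # non-POSIX mode
posix = shlex.shlex("", posix=True)   # POSIX mode

# The sentinel differs between the two modes...
compat_eof = compat.eof
posix_eof = posix.eof

# ...and get_token() returns exactly that sentinel at end of input.
compat_result = compat.get_token()
posix_result = posix.get_token()

print(repr(compat_eof), repr(posix_eof))
```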

\subsection{Parsing Rules\label{shlex-parsing-rules}}

When operating in non-\POSIX{} mode, \class{shlex} will try to obey
the following rules.

\begin{itemize}
\item Quote characters are not recognized within words
      (\code{Do"Not"Separate} is parsed as the single word
      \code{Do"Not"Separate});
\item Escape characters are not recognized;
\item Enclosing characters in quotes preserve the literal value of
      all characters within the quotes;
\item Closing quotes separate words (\code{"Do"Separate} is parsed
      as \code{"Do"} and \code{Separate});
\item If \member{whitespace_split} is \code{False}, any character not
      declared to be a word character, whitespace, or a quote will be
      returned as a single-character token. If it is \code{True},
      \class{shlex} will only split words on whitespace;
\item EOF is signaled with an empty string (\code{''});
\item It's not possible to parse empty strings, even if quoted.
\end{itemize}
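Several of the non-\POSIX{} rules above can be observed with a short
sketch (assuming a current Python~3 interpreter; the helper
\function{compat_tokens()} is hypothetical):

```python
import shlex

def compat_tokens(text):
    """Tokenize text in non-POSIX (compatibility) mode."""
    lexer = shlex.shlex(text)   # posix is false by default here
    result = []
    while True:
        token = lexer.get_token()
        if token == lexer.eof:
            return result
        result.append(token)

# Quotes inside a word do not split it and are kept.
inside_word = compat_tokens('Do"Not"Separate')

# A closing quote ends the token.
closing_quote = compat_tokens('"Do"Separate')

# A backslash is not an escape; it is an ordinary one-character token.
no_escape = compat_tokens(r'a\b')

# Other punctuation is returned one character at a time.
punctuation = compat_tokens('x+y')

print(inside_word, closing_quote, no_escape, punctuation)
```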

When operating in \POSIX{} mode, \class{shlex} will try to obey the
following parsing rules.

\begin{itemize}
\item Quotes are stripped out, and do not separate words
      (\code{"Do"Not"Separate"} is parsed as the single word
      \code{DoNotSeparate});
\item Non-quoted escape characters (e.g. \character{\textbackslash})
      preserve the literal value of the next character that follows;
\item Enclosing characters in quotes which are not part of
      \member{escapedquotes} (e.g. \character{'}) preserve the literal
      value of all characters within the quotes;
\item Enclosing characters in quotes which are part of
      \member{escapedquotes} (e.g. \character{"}) preserve the literal
      value of all characters within the quotes, with the exception of
      the characters mentioned in \member{escape}. The escape characters
      retain their special meaning only when followed by the quote in use,
      or the escape character itself. Otherwise the escape character
      will be considered a normal character;
\item EOF is signaled with a \constant{None} value;
\item Quoted empty strings (\code{''}) are allowed.
\end{itemize}
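The \POSIX{} rules above can be checked with a similar sketch (assuming a
current Python~3 interpreter; the helper \function{posix_tokens()} is
hypothetical and the inputs are chosen only to exercise each rule):

```python
import shlex

def posix_tokens(text):
    """Tokenize text in POSIX mode."""
    lexer = shlex.shlex(text, posix=True)
    result = []
    while True:
        token = lexer.get_token()
        if token == lexer.eof:   # eof is None in POSIX mode
            return result
        result.append(token)

# Quotes are stripped and do not separate words.
stripped = posix_tokens('"Do"Not"Separate"')

# An unquoted backslash preserves the following character.
escaped = posix_tokens(r'one\ word')

# Inside single quotes the backslash is an ordinary character.
single = posix_tokens(r"'a\"b'")

# Inside double quotes it only escapes the quote or itself.
double = posix_tokens(r'"a\"b \n"')

# Quoted empty strings yield empty tokens.
empty = posix_tokens("a '' b")

print(stripped, escaped, single, double, empty)
```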