\section{\module{shlex} ---
         Simple lexical analysis}

\declaremodule{standard}{shlex}
\modulesynopsis{Simple lexical analysis for \UNIX\ shell-like languages.}
\moduleauthor{Eric S. Raymond}{esr@snark.thyrsus.com}
\moduleauthor{Gustavo Niemeyer}{niemeyer@conectiva.com}
\sectionauthor{Eric S. Raymond}{esr@snark.thyrsus.com}
\sectionauthor{Gustavo Niemeyer}{niemeyer@conectiva.com}

\versionadded{1.5.2}

The \class{shlex} class makes it easy to write lexical analyzers for
simple syntaxes resembling that of the \UNIX{} shell. This will often
be useful for writing minilanguages (for example, in run control
files for Python applications) or for parsing quoted strings.

\note{The \module{shlex} module currently does not support Unicode input.}

The \module{shlex} module defines the following function:

\begin{funcdesc}{split}{s\optional{, comments\optional{, posix}}}
Split the string \var{s} using shell-like syntax. If \var{comments} is
\constant{False} (the default), the parsing of comments in the given
string will be disabled (by setting the \member{commenters} member of
the \class{shlex} instance to the empty string). This function operates
in \POSIX{} mode by default, but uses non-\POSIX{} mode if the
\var{posix} argument is false.
\versionadded{2.3}
\versionchanged[Added the \var{posix} parameter]{2.6}
\note{Since the \function{split()} function instantiates a \class{shlex}
      instance, passing \code{None} for \var{s} will read the string
      to split from standard input.}
\end{funcdesc}
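
The sketch below contrasts the default behavior of \function{split()}
with the effect of the \var{comments} and \var{posix} arguments on one
sample command line. The sample string is arbitrary, and the results
noted in the comments assume a version where \POSIX{} mode is the
default, as described above (Python 2.6 or later).

\begin{verbatim}
import shlex

line = 'ls -l "My Documents" # list files'

# Defaults: POSIX mode, comments disabled, so '#' is an ordinary character.
default_args = shlex.split(line)
# ['ls', '-l', 'My Documents', '#', 'list', 'files']

# comments=True: everything from '#' to the end of the line is dropped.
no_comment_args = shlex.split(line, comments=True)
# ['ls', '-l', 'My Documents']

# posix=False: compatibility mode keeps the quote characters.
compat_args = shlex.split(line, posix=False)
# ['ls', '-l', '"My Documents"', '#', 'list', 'files']
\end{verbatim}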

The \module{shlex} module defines the following class:

\begin{classdesc}{shlex}{\optional{instream\optional{,
                         infile\optional{, posix}}}}
A \class{shlex} instance or subclass instance is a lexical analyzer
object. The initialization argument, if present, specifies where to
read characters from. It must be a file- or stream-like object with
\method{read()} and \method{readline()} methods, or a string (strings
are accepted since Python 2.3). If no argument is given, input will be
taken from \code{sys.stdin}. The second optional argument is a filename
string, which sets the initial value of the \member{infile} member. If
the \var{instream} argument is omitted or equal to \code{sys.stdin},
this second argument defaults to ``stdin''. The \var{posix} argument
was introduced in Python 2.3 and defines the operational mode. When
\var{posix} is not true (the default), the \class{shlex} instance will
operate in compatibility mode. When operating in \POSIX{} mode,
\class{shlex} will try to follow the \POSIX{} shell parsing rules as
closely as possible. See section~\ref{shlex-parsing-rules}.
\end{classdesc}
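
As a rough illustration of the constructor arguments, the following
sketch builds two lexers over the same string, one in compatibility
mode and one in \POSIX{} mode, and passes a second argument to label
the input. The label \file{demo.rc} is hypothetical; no file is opened
because the input is given as a string.

\begin{verbatim}
import shlex

text = 'name = "two words"'

# Compatibility mode (the default); 'demo.rc' only labels the input.
compat_lexer = shlex.shlex(text, 'demo.rc')
# POSIX mode must be requested explicitly.
posix_lexer = shlex.shlex(text, 'demo.rc', posix=True)

compat_tokens = [compat_lexer.get_token() for _ in range(3)]
# ['name', '=', '"two words"']
posix_tokens = [posix_lexer.get_token() for _ in range(3)]
# ['name', '=', 'two words']

# compat_lexer.infile == 'demo.rc'
# compat_lexer.eof == '' and posix_lexer.eof is None
\end{verbatim}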

\begin{seealso}
  \seemodule{ConfigParser}{Parser for configuration files similar to the
                           Windows \file{.ini} files.}
\end{seealso}


\subsection{shlex Objects \label{shlex-objects}}

A \class{shlex} instance has the following methods:

\begin{methoddesc}[shlex]{get_token}{}
Return a token. If tokens have been stacked using
\method{push_token()}, pop a token off the stack. Otherwise, read one
from the input stream. If reading encounters an immediate
end-of-file, \member{eof} is returned: the empty string (\code{''})
in non-\POSIX{} mode, and \code{None} in \POSIX{} mode.
\end{methoddesc}
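
Because the end-of-file value differs between the two modes, a loop
that drains a lexer can compare each token with the instance's
\member{eof} member instead of hard-coding \code{''} or \code{None}.
The helper \function{all_tokens()} below is a hypothetical name, not
part of the module. Note that in \POSIX{} mode a quoted empty string is
returned as a real token rather than being mistaken for end-of-file.

\begin{verbatim}
import shlex

def all_tokens(lexer):
    """Collect tokens until get_token() returns lexer.eof."""
    tokens = []
    while True:
        tok = lexer.get_token()
        if tok == lexer.eof:    # '' in compatibility mode, None in POSIX mode
            break
        tokens.append(tok)
    return tokens

compat = all_tokens(shlex.shlex("a 'b c'"))
# ['a', "'b c'"]
posix = all_tokens(shlex.shlex("a 'b c' ''", posix=True))
# ['a', 'b c', '']  (the quoted empty string is a token, not EOF)
\end{verbatim}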

\begin{methoddesc}[shlex]{push_token}{str}
Push the argument onto the token stack.
\end{methoddesc}
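
One common use of \method{push_token()} is single-token lookahead. In
this minimal sketch the helper \function{peek()} is hypothetical: it
reads a token and immediately pushes it back, so that the next
\method{get_token()} call returns the same token again.

\begin{verbatim}
import shlex

def peek(lexer):
    """Return the next token without consuming it."""
    tok = lexer.get_token()
    lexer.push_token(tok)
    return tok

lexer = shlex.shlex('key = value')
first = lexer.get_token()    # 'key'
upcoming = peek(lexer)       # '=' is read, then pushed back
second = lexer.get_token()   # '=' again, popped from the token stack
third = lexer.get_token()    # 'value'
\end{verbatim}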

\begin{methoddesc}[shlex]{read_token}{}
Read a raw token. Ignore the pushback stack, and do not interpret source
requests. (This is not ordinarily a useful entry point, and is
documented here only for the sake of completeness.)
\end{methoddesc}

\begin{methoddesc}[shlex]{sourcehook}{filename}
When \class{shlex} detects a source request (see
\member{source} below), this method is given the following token as
argument and is expected to return a tuple consisting of a filename and
an open file-like object.

Normally, this method first strips any quotes off the argument. If
the result is an absolute pathname, or there was no previous source
request in effect, or the previous source was a stream
(such as \code{sys.stdin}), the result is left alone. Otherwise, if the
result is a relative pathname, the directory part of the name of the
file immediately before it on the source inclusion stack is prepended
(this behavior is like the way the C preprocessor handles
\code{\#include "file.h"}).

The result of the manipulations is treated as a filename, and returned
as the first component of the tuple, with
\function{open()} called on it to yield the second component. (Note:
this is the reverse of the order of arguments in instance initialization!)

This hook is exposed so that you can use it to implement directory
search paths, addition of file extensions, and other namespace hacks.
There is no corresponding `close' hook, but a \class{shlex} instance will
call the \method{close()} method of the sourced input stream when it
returns \EOF.

For more explicit control of source stacking, use the
\method{push_source()} and \method{pop_source()} methods.
\end{methoddesc}
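
Since \method{sourcehook()} only has to return a filename and an open
file-like object, a subclass can resolve source requests without
touching the filesystem. The sketch below is one possible namespace
hack of the kind mentioned above; the class \class{DictSourceLexer},
its \var{files} mapping, and the choice of \code{include} as the
\member{source} keyword are all hypothetical. The import fallback is
there only so that the sketch runs under both Python 2 and Python 3.

\begin{verbatim}
import shlex
try:
    from StringIO import StringIO   # Python 2
except ImportError:
    from io import StringIO         # Python 3

class DictSourceLexer(shlex.shlex):
    """Hypothetical subclass: resolve source requests from a dict."""

    def __init__(self, text, files):
        shlex.shlex.__init__(self, text)
        self.files = files          # maps names to file contents
        self.source = 'include'     # "include NAME" pulls in files[NAME]

    def sourcehook(self, newfile):
        # Strip surrounding quotes, much as the default hook does.
        name = newfile.strip('"\'')
        # Return (filename, open stream); shlex closes the stream
        # when it reaches end-of-file.
        return name, StringIO(self.files[name])

lexer = DictSourceLexer('a include b c', {'b': 'x y'})
tokens = []
while True:
    tok = lexer.get_token()
    if tok == lexer.eof:
        break
    tokens.append(tok)
# tokens == ['a', 'x', 'y', 'c']
\end{verbatim}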

\begin{methoddesc}[shlex]{push_source}{stream\optional{, filename}}
Push an input source stream onto the input stack. If the filename
argument is specified, it will later be available for use in error
messages. This is the same method used internally by the
\method{sourcehook()} method.
\versionadded{2.1}
\end{methoddesc}
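
In the following sketch, a stream pushed with \method{push_source()}
is read before the remaining original input, and its filename
temporarily becomes the value of \member{infile}. The names
\file{main.rc} and \file{extra.rc} are hypothetical labels; nothing is
opened from disk.

\begin{verbatim}
import shlex
try:
    from StringIO import StringIO   # Python 2
except ImportError:
    from io import StringIO         # Python 3

lexer = shlex.shlex('outer', 'main.rc')
lexer.push_source(StringIO('inner'), 'extra.rc')

label_while_inner = lexer.infile    # 'extra.rc'
first = lexer.get_token()           # 'inner', from the pushed stream
second = lexer.get_token()          # 'outer', after the stack unwinds
label_after = lexer.infile          # 'main.rc' again
\end{verbatim}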

\begin{methoddesc}[shlex]{pop_source}{}
Pop the last-pushed input source from the input stack.
This is the same method used internally when the lexer reaches
\EOF{} on a stacked input stream.
\versionadded{2.1}
\end{methoddesc}
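
Calling \method{pop_source()} directly closes the current stream and
resumes the previous one, which could be used to abandon an included
source early. A minimal sketch with made-up stream contents:

\begin{verbatim}
import shlex
try:
    from StringIO import StringIO   # Python 2
except ImportError:
    from io import StringIO         # Python 3

lexer = shlex.shlex('resume here')
extra = StringIO('skip these tokens')
lexer.push_source(extra, 'extra.rc')

first = lexer.get_token()    # 'skip', read from the pushed stream
lexer.pop_source()           # closes extra and returns to the outer input
rest = [lexer.get_token(), lexer.get_token()]   # ['resume', 'here']
\end{verbatim}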

\begin{methoddesc}[shlex]{error_leader}{\optional{file\optional{, line}}}
This method generates an error message leader in the format of a
\UNIX{} C compiler error label; the format is \code{'"\%s", line \%d: '},
where the \samp{\%s} is replaced with the name of the current source
file and the \samp{\%d} with the current input line number (the
optional arguments can be used to override these).

This convenience is provided to encourage \module{shlex} users to
generate error messages in the standard, parseable format understood
by Emacs and other \UNIX{} tools.
\end{methoddesc}
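
The sketch below shows the leader produced after reading two lines of
input whose \member{infile} was set at construction time, and the same
call with both optional arguments overridden. The file names and the
error text are hypothetical.

\begin{verbatim}
import shlex

lexer = shlex.shlex('alpha\nbeta', 'settings.rc')
lexer.get_token()          # 'alpha'; reading past the newline bumps lineno to 2
tok = lexer.get_token()    # 'beta'

leader = lexer.error_leader()
# '"settings.rc", line 2: '
message = leader + 'unexpected token %r' % tok

override = lexer.error_leader('other.rc', 7)
# '"other.rc", line 7: '
\end{verbatim}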

Instances of \class{shlex} and its subclasses have some public instance
variables which either control lexical analysis or can be used for
debugging:

\begin{memberdesc}[shlex]{commenters}
The string of characters that are recognized as comment beginners.
All characters from the comment beginner to end of line are ignored.
Includes just \character{\#} by default.
\end{memberdesc}
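
For a configuration language that starts comments with
\character{;}, \member{commenters} can simply be reassigned. This
sketch uses compatibility mode and an arbitrary two-line input, and
collects the tokens with \function{list()}, assuming the instance
supports iteration (which stops when \member{eof} is reached).

\begin{verbatim}
import shlex

lexer = shlex.shlex('name = demo ; trailing note\nsize = 3')
lexer.commenters = ';'     # ';' now starts a comment; '#' no longer does

tokens = list(lexer)
# ['name', '=', 'demo', 'size', '=', '3']
\end{verbatim}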

\begin{memberdesc}[shlex]{wordchars}
The string of characters that will accumulate into multi-character
tokens. By default, includes all \ASCII{} alphanumerics and
underscore.
\end{memberdesc}
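
Adding characters to \member{wordchars} changes where tokens end. In
this sketch, assuming compatibility mode, a filename-like string is
split at \character{-} and \character{.} by default, but becomes a
single token once those two characters are treated as word characters.

\begin{verbatim}
import shlex

default_tokens = list(shlex.shlex('file-name.txt'))
# ['file', '-', 'name', '.', 'txt']

lexer = shlex.shlex('file-name.txt')
lexer.wordchars += '-.'    # treat '-' and '.' as part of words
custom_tokens = list(lexer)
# ['file-name.txt']
\end{verbatim}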

\begin{memberdesc}[shlex]{whitespace}
Characters that will be considered whitespace and skipped. Whitespace
bounds tokens. By default, includes space, tab, linefeed and
carriage-return.
\end{memberdesc}
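
Likewise, extending \member{whitespace} makes additional separators
disappear from the output. The comma-separated input here is only an
example.

\begin{verbatim}
import shlex

lexer = shlex.shlex('red,green, blue')
lexer.whitespace += ','    # commas now separate tokens and are skipped
colors = list(lexer)
# ['red', 'green', 'blue']
\end{verbatim}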

\begin{memberdesc}[shlex]{escape}
Characters that will be considered escape characters. This is only
used in \POSIX{} mode, and includes just \character{\textbackslash} by
default.
\versionadded{2.3}
\end{memberdesc}
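
The effect of \member{escape} can be seen by splitting the same string
in both modes; the input is an arbitrary example, and the
\var{posix} argument of \function{split()} assumes Python 2.6 or later.

\begin{verbatim}
import shlex

posix_args = shlex.split(r'a\ b c')
# ['a b', 'c']       the escaped space stays inside the token
compat_args = shlex.split(r'a\ b c', posix=False)
# ['a\\', 'b', 'c']  the backslash is an ordinary character
\end{verbatim}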

\begin{memberdesc}[shlex]{quotes}
Characters that will be considered string quotes. The token
accumulates until the same quote is encountered again (thus, different
quote types protect each other as in the shell). By default, includes
\ASCII{} single and double quotes.
\end{memberdesc}
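
In the sketch below, the single quote inside a double-quoted string
does not end the token. Compatibility mode keeps the surrounding
quotes in the token, while \POSIX{} mode removes them; the input string
is an arbitrary example.

\begin{verbatim}
import shlex

text = 'say "it\'s fine"'
compat_tokens = list(shlex.shlex(text))
# ['say', '"it\'s fine"']
posix_tokens = list(shlex.shlex(text, posix=True))
# ['say', "it's fine"]
\end{verbatim}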

\begin{memberdesc}[shlex]{escapedquotes}
Characters in \member{quotes} that will interpret escape characters
defined in \member{escape}. This is only used in \POSIX{} mode, and
includes just \character{"} by default.
\versionadded{2.3}
\end{memberdesc}
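
The next sketch shows the difference between a quote listed in
\member{escapedquotes} (the default \character{"}) and one that is not
(\character{'}), and what happens to an escape character followed by an
ordinary character. It assumes the default \POSIX{}-mode
\function{split()}; the input is an arbitrary example.

\begin{verbatim}
import shlex

# Three tokens: "x \"y\""   'x \"y\"'   "a\b"
text = r'''"x \"y\"" 'x \"y\"' "a\b"'''
args = shlex.split(text)
# ['x "y"', 'x \\"y\\"', 'a\\b']
# 1st: \" inside double quotes yields a literal quote.
# 2nd: inside single quotes the backslash has no special meaning.
# 3rd: \b is not \" or \\, so the backslash is kept.
\end{verbatim}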

\begin{memberdesc}[shlex]{whitespace_split}
If \code{True}, tokens will only be split on whitespace. This is useful,
for example, for parsing command lines with \class{shlex}, getting tokens
in a similar way to shell arguments.
\versionadded{2.3}
\end{memberdesc}
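
The following sketch compares the default compatibility-mode
tokenization of a short command line with the result after setting
\member{whitespace_split}; the command line is just an example.

\begin{verbatim}
import shlex

default_tokens = list(shlex.shlex('ls -la /tmp'))
# ['ls', '-', 'la', '/', 'tmp']

lexer = shlex.shlex('ls -la /tmp')
lexer.whitespace_split = True    # split only on whitespace
split_tokens = list(lexer)
# ['ls', '-la', '/tmp']
\end{verbatim}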

\begin{memberdesc}[shlex]{infile}
The name of the current input file, as initially set at class
instantiation time or stacked by later source requests. It may
be useful to examine this when constructing error messages.
\end{memberdesc}

\begin{memberdesc}[shlex]{instream}
The input stream from which this \class{shlex} instance is reading
characters.
\end{memberdesc}

\begin{memberdesc}[shlex]{source}
This member is \code{None} by default. If you assign a string to it,
that string will be recognized as a lexical-level inclusion request
similar to the \samp{source} keyword in various shells. That is, the
immediately following token will be opened as a filename and input taken
from that stream until \EOF, at which point the \method{close()}
method of that stream will be called and the input source will again
become the original input stream. Source requests may be stacked any
number of levels deep.
\end{memberdesc}
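
The sketch below exercises \member{source} with the default
\method{sourcehook()}. It writes a temporary file, names that file in a
quoted token after the keyword \code{source}, and lets the lexer splice
the file's tokens into the stream. The keyword and file contents are
arbitrary; the path is quoted because \character{/} and \character{.}
are not word characters in compatibility mode, and the default
\method{sourcehook()} strips the double quotes again.

\begin{verbatim}
import os
import shlex
import tempfile

fd, path = tempfile.mkstemp(suffix='.rc')
try:
    with os.fdopen(fd, 'w') as f:
        f.write('inner words\n')

    lexer = shlex.shlex('before source "%s" after' % path)
    lexer.source = 'source'    # "source FILE" is now an inclusion request
    tokens = list(lexer)
    # ['before', 'inner', 'words', 'after']
finally:
    os.remove(path)            # the lexer already closed the file at EOF
\end{verbatim}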

\begin{memberdesc}[shlex]{debug}
If this member is numeric and \code{1} or more, a \class{shlex}
instance will print verbose progress output on its behavior. If you
need to use this, you can read the module source code to learn the
details.
\end{memberdesc}

\begin{memberdesc}[shlex]{lineno}
Source line number (count of newlines seen so far plus one).
\end{memberdesc}

\begin{memberdesc}[shlex]{token}
The token buffer. It may be useful to examine this when catching
exceptions.
\end{memberdesc}

\begin{memberdesc}[shlex]{eof}
Token used to determine end of file. This will be set to the empty
string (\code{''}) in non-\POSIX{} mode, and to \code{None} in
\POSIX{} mode.
\versionadded{2.3}
\end{memberdesc}

\subsection{Parsing Rules\label{shlex-parsing-rules}}

When operating in non-\POSIX{} mode, \class{shlex} will try to obey
the following rules.

\begin{itemize}
\item Quote characters are not recognized within words
      (\code{Do"Not"Separate} is parsed as the single word
      \code{Do"Not"Separate});
\item Escape characters are not recognized;
\item Enclosing characters in quotes preserve the literal value of
      all characters within the quotes;
\item Closing quotes separate words (\code{"Do"Separate} is parsed
      as \code{"Do"} and \code{Separate});
\item If \member{whitespace_split} is \code{False}, any character not
      declared to be a word character, whitespace, or a quote will be
      returned as a single-character token. If it is \code{True},
      \class{shlex} will only split words on whitespace;
\item EOF is signaled with an empty string (\code{''});
\item It's not possible to parse empty strings, even if quoted.
\end{itemize}
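
As an informal check of the compatibility-mode rules above, the
following sketch tokenizes a string containing both quoting examples
and a backslash, with \member{whitespace_split} left at its default of
\code{False}. The input string is an arbitrary example.

\begin{verbatim}
import shlex

tokens = list(shlex.shlex(r'Do"Not"Separate "Do"Separate a\b'))
# ['Do"Not"Separate', '"Do"', 'Separate', 'a', '\\', 'b']
# The backslash is not an escape here; it is returned as a
# single-character token because it is not a word character.
\end{verbatim}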

When operating in \POSIX{} mode, \class{shlex} will try to obey the
following parsing rules.

\begin{itemize}
\item Quotes are stripped out, and do not separate words
      (\code{"Do"Not"Separate"} is parsed as the single word
      \code{DoNotSeparate});
\item Non-quoted escape characters (e.g. \character{\textbackslash})
      preserve the literal value of the next character that follows;
\item Enclosing characters in quotes which are not part of
      \member{escapedquotes} (e.g. \character{'}) preserve the literal
      value of all characters within the quotes;
\item Enclosing characters in quotes which are part of
      \member{escapedquotes} (e.g. \character{"}) preserve the literal
      value of all characters within the quotes, with the exception of
      the characters mentioned in \member{escape}. The escape characters
      retain their special meaning only when followed by the quote in use,
      or the escape character itself. Otherwise the escape character
      will be considered a normal character;
\item EOF is signaled with a \constant{None} value;
\item Quoted empty strings (\code{''}) are allowed.
\end{itemize}
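
A corresponding sketch for \POSIX{} mode, using \function{split()}
with its defaults, shows quote removal, a backslash-escaped space, a
backslash inside single quotes, and a quoted empty string. The input
string is an arbitrary example.

\begin{verbatim}
import shlex

args = shlex.split(r'''"Do"Not"Separate" a\ b 'single \ quoted' "" end''')
# ['DoNotSeparate', 'a b', 'single \\ quoted', '', 'end']
# The empty string comes from the quoted "" and is kept as an argument.
\end{verbatim}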