:mod:`shlex` --- Simple lexical analysis
========================================

.. module:: shlex
   :synopsis: Simple lexical analysis for Unix shell-like languages.
.. moduleauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. moduleauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>
.. sectionauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. sectionauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>


.. versionadded:: 1.5.2

**Source code:** :source:`Lib/shlex.py`

--------------


The :class:`~shlex.shlex` class makes it easy to write lexical analyzers for
simple syntaxes resembling that of the Unix shell. This will often be useful
for writing minilanguages (for example, in run control files for Python
applications) or for parsing quoted strings.

Prior to Python 2.7.3, this module did not support Unicode input.

The :mod:`shlex` module defines the following function:


.. function:: split(s[, comments[, posix]])

   Split the string *s* using shell-like syntax. If *comments* is :const:`False`
   (the default), the parsing of comments in the given string will be disabled
   (setting the :attr:`~shlex.commenters` attribute of the
   :class:`~shlex.shlex` instance to the empty string). This function operates
   in POSIX mode by default, but uses non-POSIX mode if the *posix* argument is
   false.

   .. versionadded:: 2.3

   .. versionchanged:: 2.6
      Added the *posix* parameter.

   .. note::

      Since the :func:`split` function instantiates a :class:`~shlex.shlex`
      instance, passing ``None`` for *s* will read the string to split from
      standard input.

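As a minimal sketch of how the *comments* and *posix* arguments of ``split()``
interact, the following splits one illustrative command line (the command
string itself is made up for the example) in three ways:

```python
import shlex

command = 'cp "my file.txt" /tmp/backup  # keep a copy'

# POSIX mode (the default): quotes are stripped, and because comment
# parsing is disabled, '#' is returned as an ordinary token.
default_tokens = shlex.split(command)
# ['cp', 'my file.txt', '/tmp/backup', '#', 'keep', 'a', 'copy']

# comments=True drops everything from '#' to the end of the line.
without_comment = shlex.split(command, comments=True)
# ['cp', 'my file.txt', '/tmp/backup']

# posix=False keeps the quote characters as part of the token.
non_posix_tokens = shlex.split(command, comments=True, posix=False)
# ['cp', '"my file.txt"', '/tmp/backup']
```
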
The :mod:`shlex` module defines the following class:


.. class:: shlex([instream[, infile[, posix]]])

   A :class:`~shlex.shlex` instance or subclass instance is a lexical analyzer
   object. The first argument, *instream*, if present, specifies where to read
   characters from. It must be a file-/stream-like object with
   :meth:`~io.TextIOBase.read` and :meth:`~io.TextIOBase.readline` methods, or
   a string (strings are accepted since Python 2.3). If no argument is given,
   input will be taken from ``sys.stdin``. The second optional argument is a
   filename string, which sets the initial value of the :attr:`~shlex.infile`
   attribute. If the *instream* argument is omitted or equal to ``sys.stdin``,
   this second argument defaults to "stdin". The *posix* argument was
   introduced in Python 2.3, and defines the operational mode. When *posix* is
   false (the default), the :class:`~shlex.shlex` instance will operate in
   compatibility mode. When operating in POSIX mode, :class:`~shlex.shlex`
   will try to be as close as possible to the POSIX shell parsing rules.


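The sketch below constructs two lexers over the same string, one in the default
compatibility mode and one in POSIX mode, and collects their tokens by
iterating over the instances. The input text and the file name ``demo.cfg``
are placeholders chosen for illustration.

```python
import shlex

text = "name = 'some value'"

# Compatibility (non-POSIX) mode is the default for the class.
compat = shlex.shlex(text, infile="demo.cfg")
compat_tokens = list(compat)
# ['name', '=', "'some value'"]

# POSIX mode strips the quotes from the quoted string.
posix_tokens = list(shlex.shlex(text, posix=True))
# ['name', '=', 'some value']

# The second argument only sets the name reported in the infile attribute.
source_name = compat.infile
```
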
.. seealso::

   Module :mod:`ConfigParser`
      Parser for configuration files similar to the Windows :file:`.ini` files.


.. _shlex-objects:

shlex Objects
-------------

A :class:`~shlex.shlex` instance has the following methods:


.. method:: shlex.get_token()

   Return a token. If tokens have been stacked using :meth:`push_token`, pop a
   token off the stack. Otherwise, read one from the input stream. If reading
   encounters an immediate end-of-file, :attr:`eof` is returned (the empty
   string (``''``) in non-POSIX mode, and ``None`` in POSIX mode).


.. method:: shlex.push_token(str)

   Push the argument onto the token stack.


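A short sketch of how ``get_token()`` and ``push_token()`` work together: a
token that has been pushed back is returned again before any further input is
read, and the end of input is signalled by the ``eof`` value (here the empty
string, because the lexer is in non-POSIX mode).

```python
import shlex

lexer = shlex.shlex("alpha beta")

first = lexer.get_token()    # read from the input stream: 'alpha'
lexer.push_token(first)      # put it back on the token stack
again = lexer.get_token()    # popped from the stack: 'alpha' again
second = lexer.get_token()   # next token from the stream: 'beta'
at_end = lexer.get_token()   # input exhausted: returns lexer.eof, here ''
```

Pushing a token back like this is a simple way to implement one-token
lookahead in a hand-written parser.
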
.. method:: shlex.read_token()

   Read a raw token. Ignore the pushback stack, and do not interpret source
   requests. (This is not ordinarily a useful entry point, and is documented here
   only for the sake of completeness.)


.. method:: shlex.sourcehook(filename)

   When :class:`~shlex.shlex` detects a source request (see :attr:`source`
   below), this method is given the following token as its argument, and is
   expected to return a tuple consisting of a filename and an open file-like
   object.

   Normally, this method first strips any quotes off the argument. If the result
   is an absolute pathname, or there was no previous source request in effect, or
   the previous source was a stream (such as ``sys.stdin``), the result is left
   alone. Otherwise, if the result is a relative pathname, the directory part of
   the name of the file immediately before it on the source inclusion stack is
   prepended (this behavior is like the way the C preprocessor handles
   ``#include "file.h"``).

   The result of the manipulations is treated as a filename, and returned as the
   first component of the tuple, with :func:`open` called on it to yield the second
   component. (Note: this is the reverse of the order of arguments in instance
   initialization!)

   This hook is exposed so that you can use it to implement directory search paths,
   addition of file extensions, and other namespace hacks. There is no
   corresponding 'close' hook, but a shlex instance will call the
   :meth:`~io.IOBase.close` method of the sourced input stream when it returns
   EOF.

   For more explicit control of source stacking, use the :meth:`push_source` and
   :meth:`pop_source` methods.


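One possible way to customize ``sourcehook()`` is sketched below. Instead of
opening files on disk, the hypothetical subclass ``InMemoryLexer`` looks up
the requested name in a dictionary (``FAKE_FILES``, also invented for this
example) and returns an in-memory stream. The file name is quoted in the input
because, with the default ``wordchars``, an unquoted ``extra.cfg`` would be
split at the dot.

```python
import io
import shlex

# Hypothetical in-memory "files" standing in for real files on disk.
FAKE_FILES = {"extra.cfg": u"gamma delta"}


class InMemoryLexer(shlex.shlex):
    """Illustrative lexer that resolves source requests from FAKE_FILES."""

    def sourcehook(self, filename):
        # In non-POSIX mode the token still carries its quotes.
        name = filename.strip('"')
        # Return (filename, stream), as the default hook does.
        return name, io.StringIO(FAKE_FILES[name])


lexer = InMemoryLexer('alpha source "extra.cfg" beta')
lexer.source = "source"   # the keyword that triggers an inclusion

tokens = list(lexer)
# ['alpha', 'gamma', 'delta', 'beta']
```
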
.. method:: shlex.push_source(stream[, filename])

   Push an input source stream onto the input stack. If the filename argument is
   specified it will later be available for use in error messages. This is the
   same method used internally by the :meth:`sourcehook` method.

   .. versionadded:: 2.1


.. method:: shlex.pop_source()

   Pop the last-pushed input source from the input stack. This is the same method
   used internally when the lexer reaches EOF on a stacked input stream.

   .. versionadded:: 2.1


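The following sketch stacks a second input stream in the middle of tokenizing.
Tokens from the pushed stream are returned first; when it reaches EOF the lexer
pops it and resumes the original input. The stream contents and the name
``snippet`` are placeholders.

```python
import io
import shlex

lexer = shlex.shlex("one two")
first = lexer.get_token()            # 'one'

# Stack a second stream; its tokens are read before the rest of the original.
lexer.push_source(io.StringIO(u"inserted words"), "snippet")
name_while_stacked = lexer.infile    # 'snippet'

remaining = list(lexer)
# ['inserted', 'words', 'two']
```
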
.. method:: shlex.error_leader([file[, line]])

   This method generates an error message leader in the format of a Unix C compiler
   error label; the format is ``'"%s", line %d: '``, where the ``%s`` is replaced
   with the name of the current source file and the ``%d`` with the current input
   line number (the optional arguments can be used to override these).

   This convenience is provided to encourage :mod:`shlex` users to generate error
   messages in the standard, parseable format understood by Emacs and other Unix
   tools.

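A brief sketch of ``error_leader()`` in use. The file names and the message
text are illustrative only.

```python
import shlex

lexer = shlex.shlex("first line\nsecond line", infile="demo.cfg")

# Uses the lexer's current infile and lineno.
leader = lexer.error_leader()
# '"demo.cfg", line 1: '

# Both values can be overridden explicitly.
custom = lexer.error_leader("other.cfg", 7)
# '"other.cfg", line 7: '

message = leader + "unexpected token"
```
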
Instances of :class:`~shlex.shlex` subclasses have some public instance
variables which either control lexical analysis or can be used for debugging:


.. attribute:: shlex.commenters

   The string of characters that are recognized as comment beginners. All
   characters from the comment beginner to end of line are ignored. Includes just
   ``'#'`` by default.


.. attribute:: shlex.wordchars

   The string of characters that will accumulate into multi-character tokens. By
   default, includes all ASCII alphanumerics and underscore.


.. attribute:: shlex.whitespace

   Characters that will be considered whitespace and skipped. Whitespace bounds
   tokens. By default, includes space, tab, linefeed and carriage-return.


.. attribute:: shlex.escape

   Characters that will be considered escape characters. This is only used in
   POSIX mode, and includes just ``'\'`` by default.

   .. versionadded:: 2.3


.. attribute:: shlex.quotes

   Characters that will be considered string quotes. The token accumulates until
   the same quote is encountered again (thus, different quote types protect each
   other as in the shell). By default, includes ASCII single and double quotes.


.. attribute:: shlex.escapedquotes

   Characters in :attr:`quotes` that will interpret escape characters defined in
   :attr:`escape`. This is only used in POSIX mode, and includes just ``'"'`` by
   default.

   .. versionadded:: 2.3


.. attribute:: shlex.whitespace_split

   If ``True``, tokens will only be split on whitespace. This is useful, for
   example, for parsing command lines with :class:`~shlex.shlex`, getting
   tokens in a similar way to shell arguments.

   .. versionadded:: 2.3


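To illustrate how ``wordchars``, ``commenters`` and ``whitespace_split`` shape
the token stream, the sketch below tokenizes one made-up command line with
three differently configured non-POSIX lexers.

```python
import shlex

line = "ls -l /tmp/my-dir # list it"

# Default settings: '-' and '/' are not word characters, so each one
# becomes its own single-character token; the '#' comment is dropped.
default_tokens = list(shlex.shlex(line))
# ['ls', '-', 'l', '/', 'tmp', '/', 'my', '-', 'dir']

# Split on whitespace only, which is closer to shell argument splitting.
by_whitespace = shlex.shlex(line)
by_whitespace.whitespace_split = True
whitespace_tokens = list(by_whitespace)
# ['ls', '-l', '/tmp/my-dir']

# Alternatively, extend wordchars with the extra characters to accept.
extended = shlex.shlex(line)
extended.wordchars += "-/"
extended_tokens = list(extended)
# ['ls', '-l', '/tmp/my-dir']
```
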
.. attribute:: shlex.infile

   The name of the current input file, as initially set at class instantiation time
   or stacked by later source requests. It may be useful to examine this when
   constructing error messages.


.. attribute:: shlex.instream

   The input stream from which this :class:`~shlex.shlex` instance is reading
   characters.


.. attribute:: shlex.source

   This attribute is ``None`` by default. If you assign a string to it, that
   string will be recognized as a lexical-level inclusion request similar to the
   ``source`` keyword in various shells. That is, the immediately following token
   will be opened as a filename and input taken from that stream until EOF, at
   which point the :meth:`~io.IOBase.close` method of that stream will be called
   and the input source will again become the original input stream. Source
   requests may be stacked any number of levels deep.


.. attribute:: shlex.debug

   If this attribute is numeric and ``1`` or more, a :class:`~shlex.shlex`
   instance will print verbose progress output on its behavior. If you need
   to use this, you can read the module source code to learn the details.


.. attribute:: shlex.lineno

   Source line number (count of newlines seen so far plus one).


.. attribute:: shlex.token

   The token buffer. It may be useful to examine this when catching exceptions.


.. attribute:: shlex.eof

   Token used to determine end of file. This will be set to the empty string
   (``''``) in non-POSIX mode, and to ``None`` in POSIX mode.

   .. versionadded:: 2.3


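A small sketch showing how ``lineno`` advances as newlines are consumed, and
how the ``eof`` marker differs between the two modes:

```python
import shlex

lexer = shlex.shlex("a\nb\nc")
line_at_start = lexer.lineno   # 1: no newline seen yet
tokens = list(lexer)           # ['a', 'b', 'c']
line_at_end = lexer.lineno     # 3: two newlines seen, plus one

compat_eof = shlex.shlex("").eof             # '' in non-POSIX mode
posix_eof = shlex.shlex("", posix=True).eof  # None in POSIX mode
```
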
.. _shlex-parsing-rules:

Parsing Rules
-------------

When operating in non-POSIX mode, :class:`~shlex.shlex` will try to obey the
following rules:

* Quote characters are not recognized within words (``Do"Not"Separate`` is
  parsed as the single word ``Do"Not"Separate``);

* Escape characters are not recognized;

* Enclosing characters in quotes preserves the literal value of all characters
  within the quotes;

* Closing quotes separate words (``"Do"Separate`` is parsed as ``"Do"`` and
  ``Separate``);

* If :attr:`~shlex.whitespace_split` is ``False``, any character not
  declared to be a word character, whitespace, or a quote will be returned as
  a single-character token. If it is ``True``, :class:`~shlex.shlex` will only
  split words on whitespace;

* EOF is signaled with an empty string (``''``);

* It's not possible to parse empty strings, even if quoted.

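The non-POSIX rules above can be checked with a short sketch. The helper
``compat_split`` is a hypothetical convenience defined only for this example;
it tokenizes a string with a default non-POSIX lexer.

```python
import shlex


def compat_split(text, whitespace_split=False):
    """Tokenize *text* with a non-POSIX lexer (illustrative helper)."""
    lexer = shlex.shlex(text, posix=False)
    lexer.whitespace_split = whitespace_split
    return list(lexer)


inside_word = compat_split('Do"Not"Separate')   # ['Do"Not"Separate']
after_quote = compat_split('"Do"Separate')      # ['"Do"', 'Separate']
punctuation = compat_split("a=b")               # ['a', '=', 'b']
kept_whole = compat_split("a=b", True)          # ['a=b']

# The backslash is not an escape here, just a single-character token.
backslash = compat_split(r"a\b")                # ['a', '\\', 'b']

# The quote characters stay in the token, so no empty string is produced.
empty_quotes = compat_split("''")               # ["''"]
```
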
When operating in POSIX mode, :class:`~shlex.shlex` will try to obey the
following parsing rules:

* Quotes are stripped out, and do not separate words (``"Do"Not"Separate"`` is
  parsed as the single word ``DoNotSeparate``);

* Non-quoted escape characters (e.g. ``'\'``) preserve the literal value of the
  next character that follows;

* Enclosing characters in quotes which are not part of
  :attr:`~shlex.escapedquotes` (e.g. ``"'"``) preserves the literal value
  of all characters within the quotes;

* Enclosing characters in quotes which are part of
  :attr:`~shlex.escapedquotes` (e.g. ``'"'``) preserves the literal value
  of all characters within the quotes, with the exception of the characters
  mentioned in :attr:`~shlex.escape`. An escape character retains its
  special meaning only when followed by the quote in use, or the escape
  character itself. Otherwise the escape character will be considered a
  normal character;

* EOF is signaled with a :const:`None` value;

* Quoted empty strings (``''``) are allowed.

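The POSIX-mode rules can be illustrated in the same spirit. This sketch relies
on ``shlex.split()``, which uses POSIX mode by default; the input strings are
made up for the example.

```python
import shlex

# Quotes are stripped and do not separate words.
joined = shlex.split('"Do"Not"Separate"')       # ['DoNotSeparate']

# An unquoted backslash preserves the next character, here a space.
escaped_space = shlex.split(r"a\ b")            # ['a b']

# Inside single quotes the backslash is an ordinary character.
single_quoted = shlex.split(r"'a \" b'")        # ['a \\" b']

# Inside double quotes the backslash escapes only the quote or itself,
# so \" becomes " while \n stays as a backslash followed by n.
double_quoted = shlex.split(r'"a \"b\" \n"')    # ['a "b" \\n']

# Quoted empty strings are allowed.
empty = shlex.split("''")                       # ['']

# EOF is signaled with None.
at_eof = shlex.shlex("", posix=True).get_token()
```
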