:mod:`shlex` --- Simple lexical analysis
========================================

.. module:: shlex
   :synopsis: Simple lexical analysis for Unix shell-like languages.
.. moduleauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. moduleauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>
.. sectionauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. sectionauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>

**Source code:** :source:`Lib/shlex.py`

--------------

The :class:`~shlex.shlex` class makes it easy to write lexical analyzers for
simple syntaxes resembling that of the Unix shell. This will often be useful
for writing minilanguages (for example, in run control files for Python
applications) or for parsing quoted strings.

The :mod:`shlex` module defines the following functions:


.. function:: split(s, comments=False, posix=True)

   Split the string *s* using shell-like syntax. If *comments* is :const:`False`
   (the default), the parsing of comments in the given string will be disabled
   (setting the :attr:`~shlex.commenters` attribute of the
   :class:`~shlex.shlex` instance to the empty string). This function operates
   in POSIX mode by default, but uses non-POSIX mode if the *posix* argument is
   false.

   .. note::

      Since the :func:`split` function instantiates a :class:`~shlex.shlex`
      instance, passing ``None`` for *s* will read the string to split from
      standard input.

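As a brief illustration of the *comments* and *posix* arguments of
:func:`split`, the following sketch splits one sample line three ways. The
sample string and the variable names are invented for this example.

```python
import shlex

# Hypothetical input line; the trailing '#' text looks like a shell comment.
line = 'cp "my file.txt" backup/ # nightly copy'

# Default: POSIX mode with comment parsing disabled, so '#' is an ordinary token.
default_tokens = shlex.split(line)

# comments=True discards everything from '#' to the end of the line.
no_comment_tokens = shlex.split(line, comments=True)

# posix=False keeps the quote characters as part of the token.
compat_tokens = shlex.split(line, comments=True, posix=False)

print(default_tokens)     # ['cp', 'my file.txt', 'backup/', '#', 'nightly', 'copy']
print(no_comment_tokens)  # ['cp', 'my file.txt', 'backup/']
print(compat_tokens)      # ['cp', '"my file.txt"', 'backup/']
```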

.. function:: quote(s)

   Return a shell-escaped version of the string *s*. The returned value is a
   string that can safely be used as one token in a shell command line, for
   cases where you cannot use a list.

   This idiom would be unsafe::

      >>> filename = 'somefile; rm -rf ~'
      >>> command = 'ls -l {}'.format(filename)
      >>> print(command)  # executed by a shell: boom!
      ls -l somefile; rm -rf ~

   :func:`quote` lets you plug the security hole::

      >>> from shlex import quote
      >>> command = 'ls -l {}'.format(quote(filename))
      >>> print(command)
      ls -l 'somefile; rm -rf ~'
      >>> remote_command = 'ssh home {}'.format(quote(command))
      >>> print(remote_command)
      ssh home 'ls -l '"'"'somefile; rm -rf ~'"'"''

   The quoting is compatible with UNIX shells and with :func:`split`:

      >>> from shlex import split
      >>> remote_command = split(remote_command)
      >>> remote_command
      ['ssh', 'home', "ls -l 'somefile; rm -rf ~'"]
      >>> command = split(remote_command[-1])
      >>> command
      ['ls', '-l', 'somefile; rm -rf ~']

   .. versionadded:: 3.3

The :mod:`shlex` module defines the following class:


.. class:: shlex(instream=None, infile=None, posix=False)

   A :class:`~shlex.shlex` instance or subclass instance is a lexical analyzer
   object. The first argument, *instream*, if present, specifies where to read
   characters from. It must be a file-/stream-like object with
   :meth:`~io.TextIOBase.read` and :meth:`~io.TextIOBase.readline` methods, or
   a string. If no argument is given, input will be taken from ``sys.stdin``.
   The second optional argument, *infile*, is a filename string, which sets the
   initial value of the :attr:`~shlex.infile` attribute. If the *instream*
   argument is omitted or equal to ``sys.stdin``, this second argument
   defaults to "stdin". The *posix* argument defines the operational mode:
   when *posix* is not true (default), the :class:`~shlex.shlex` instance will
   operate in compatibility mode. When operating in POSIX mode,
   :class:`~shlex.shlex` will try to be as close as possible to the POSIX shell
   parsing rules.

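A minimal sketch of the two operational modes, assuming a string is passed as
*instream*. The ``tokenize`` helper and the sample text are hypothetical names
used only for this illustration.

```python
import shlex

# Hypothetical sample input: a simple key/value assignment.
text = 'name = "Ada Lovelace"'

def tokenize(source, posix):
    """Collect tokens from a shlex instance until its eof marker appears."""
    lexer = shlex.shlex(source, posix=posix)
    return list(iter(lexer.get_token, lexer.eof))

compat_tokens = tokenize(text, posix=False)  # compatibility mode keeps the quotes
posix_tokens = tokenize(text, posix=True)    # POSIX mode strips the quotes

print(compat_tokens)  # ['name', '=', '"Ada Lovelace"']
print(posix_tokens)   # ['name', '=', 'Ada Lovelace']
```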

.. seealso::

   Module :mod:`configparser`
      Parser for configuration files similar to the Windows :file:`.ini` files.


.. _shlex-objects:

shlex Objects
-------------

A :class:`~shlex.shlex` instance has the following methods:


.. method:: shlex.get_token()

   Return a token. If tokens have been stacked using :meth:`push_token`, pop a
   token off the stack. Otherwise, read one from the input stream. If reading
   encounters an immediate end-of-file, :attr:`eof` is returned (the empty
   string (``''``) in non-POSIX mode, and ``None`` in POSIX mode).


.. method:: shlex.push_token(str)

   Push the argument onto the token stack.

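One possible use of :meth:`push_token` together with :meth:`get_token` is
one-token lookahead. The ``peek`` helper below is a hypothetical function
written for this sketch; it is not part of the module.

```python
import shlex

def peek(lexer):
    """Return the next token without consuming it (hypothetical helper)."""
    token = lexer.get_token()
    if token != lexer.eof:
        # Put the token back so the next get_token() call returns it again.
        lexer.push_token(token)
    return token

lexer = shlex.shlex('alpha beta', posix=True)
peeked = peek(lexer)        # looks at 'alpha' without consuming it
first = lexer.get_token()   # popped from the token stack
second = lexer.get_token()  # read from the input stream
end = lexer.get_token()     # eof, which is None in POSIX mode

print(peeked, first, second, end)
```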

.. method:: shlex.read_token()

   Read a raw token. Ignore the pushback stack, and do not interpret source
   requests. (This is not ordinarily a useful entry point, and is documented here
   only for the sake of completeness.)


.. method:: shlex.sourcehook(filename)

   When :class:`~shlex.shlex` detects a source request (see :attr:`source`
   below), this method is given the following token as argument, and is expected
   to return a tuple consisting of a filename and an open file-like object.

   Normally, this method first strips any quotes off the argument. If the result
   is an absolute pathname, or there was no previous source request in effect, or
   the previous source was a stream (such as ``sys.stdin``), the result is left
   alone. Otherwise, if the result is a relative pathname, the directory part of
   the name of the file immediately before it on the source inclusion stack is
   prepended (this behavior is like the way the C preprocessor handles ``#include
   "file.h"``).

   The result of the manipulations is treated as a filename, and returned as the
   first component of the tuple, with :func:`open` called on it to yield the second
   component. (Note: this is the reverse of the order of arguments in instance
   initialization!)

   This hook is exposed so that you can use it to implement directory search paths,
   addition of file extensions, and other namespace hacks. There is no
   corresponding 'close' hook, but a shlex instance will call the
   :meth:`~io.IOBase.close` method of the sourced input stream when it returns
   EOF.

   For more explicit control of source stacking, use the :meth:`push_source` and
   :meth:`pop_source` methods.

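The following sketch shows one way :meth:`sourcehook` might be overridden. It
assumes an in-memory mapping in place of real files so that it stays
self-contained; ``FILES`` and ``MemoryLexer`` are hypothetical names, and a
real override would more likely search directories and call :func:`open`.

```python
import io
import shlex

# Hypothetical in-memory "files", standing in for real files on disk.
FILES = {'defs.rc': 'alpha beta'}

class MemoryLexer(shlex.shlex):
    """Illustrative subclass that resolves source requests from FILES."""

    def sourcehook(self, newfile):
        # Return (filename, open stream), the tuple shape described above.
        return newfile, io.StringIO(FILES[newfile])

lexer = MemoryLexer('start source defs.rc end', posix=True)
lexer.whitespace_split = True  # keep 'defs.rc' together as one token
lexer.source = 'source'        # the token that triggers inclusion

# Tokens of the included stream appear in place of the source request.
included_tokens = list(iter(lexer.get_token, lexer.eof))
print(included_tokens)  # ['start', 'alpha', 'beta', 'end']
```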

.. method:: shlex.push_source(newstream, newfile=None)

   Push an input source stream onto the input stack. If the filename argument is
   specified it will later be available for use in error messages. This is the
   same method used internally by the :meth:`sourcehook` method.


.. method:: shlex.pop_source()

   Pop the last-pushed input source from the input stack. This is the same method
   used internally when the lexer reaches EOF on a stacked input stream.


.. method:: shlex.error_leader(infile=None, lineno=None)

   This method generates an error message leader in the format of a Unix C compiler
   error label; the format is ``'"%s", line %d: '``, where the ``%s`` is replaced
   with the name of the current source file and the ``%d`` with the current input
   line number (the optional arguments can be used to override these).

   This convenience is provided to encourage :mod:`shlex` users to generate error
   messages in the standard, parseable format understood by Emacs and other Unix
   tools.

Instances of :class:`~shlex.shlex` subclasses have some public instance
variables which either control lexical analysis or can be used for debugging:


.. attribute:: shlex.commenters

   The string of characters that are recognized as comment beginners. All
   characters from the comment beginner to end of line are ignored. Includes just
   ``'#'`` by default.


.. attribute:: shlex.wordchars

   The string of characters that will accumulate into multi-character tokens. By
   default, includes all ASCII alphanumerics and underscore.


.. attribute:: shlex.whitespace

   Characters that will be considered whitespace and skipped. Whitespace bounds
   tokens. By default, includes space, tab, linefeed and carriage-return.


.. attribute:: shlex.escape

   Characters that will be considered escape characters. This is only used in
   POSIX mode, and includes just ``'\'`` by default.


.. attribute:: shlex.quotes

   Characters that will be considered string quotes. The token accumulates until
   the same quote is encountered again (thus, different quote types protect each
   other as in the shell). By default, includes ASCII single and double quotes.


.. attribute:: shlex.escapedquotes

   Characters in :attr:`quotes` that will interpret escape characters defined in
   :attr:`escape`. This is only used in POSIX mode, and includes just ``'"'`` by
   default.


.. attribute:: shlex.whitespace_split

   If ``True``, tokens will only be split on whitespace. This is useful, for
   example, for parsing command lines with :class:`~shlex.shlex`, getting
   tokens in a similar way to shell arguments.

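To make the effect of these attributes concrete, here is a small sketch that
tokenizes sample strings with the defaults and then with
:attr:`whitespace_split` and :attr:`commenters` changed. The ``tokens`` helper
and the sample strings are assumptions made for the illustration.

```python
import shlex

def tokens(text, **attributes):
    """Tokenize text in POSIX mode after setting the given shlex attributes."""
    lexer = shlex.shlex(text, posix=True)
    for name, value in attributes.items():
        setattr(lexer, name, value)
    return list(iter(lexer.get_token, lexer.eof))

option = '--output=build/out.txt -v'

# Defaults: characters outside wordchars come back as one-character tokens.
default_tokens = tokens(option)

# whitespace_split=True splits on whitespace only, like shell arguments.
split_tokens = tokens(option, whitespace_split=True)

# A different comment character: everything after ';' is ignored.
semicolon_tokens = tokens('key value ; trailing note', commenters=';')

print(default_tokens)
print(split_tokens)      # ['--output=build/out.txt', '-v']
print(semicolon_tokens)  # ['key', 'value']
```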

.. attribute:: shlex.infile

   The name of the current input file, as initially set at class instantiation time
   or stacked by later source requests. It may be useful to examine this when
   constructing error messages.


.. attribute:: shlex.instream

   The input stream from which this :class:`~shlex.shlex` instance is reading
   characters.


.. attribute:: shlex.source

   This attribute is ``None`` by default. If you assign a string to it, that
   string will be recognized as a lexical-level inclusion request similar to the
   ``source`` keyword in various shells. That is, the immediately following token
   will be opened as a filename and input will be taken from that stream until
   EOF, at which point the :meth:`~io.IOBase.close` method of that stream will be
   called and the input source will again become the original input stream. Source
   requests may be stacked any number of levels deep.


.. attribute:: shlex.debug

   If this attribute is numeric and ``1`` or more, a :class:`~shlex.shlex`
   instance will print verbose progress output on its behavior. If you need
   to use this, you can read the module source code to learn the details.


.. attribute:: shlex.lineno

   Source line number (count of newlines seen so far plus one).


.. attribute:: shlex.token

   The token buffer. It may be useful to examine this when catching exceptions.


.. attribute:: shlex.eof

   Token used to determine end of file. This will be set to the empty string
   (``''``) in non-POSIX mode, and to ``None`` in POSIX mode.

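The :attr:`infile` and :attr:`lineno` attributes feed the
:meth:`error_leader` method described earlier. A rough sketch of how they
might be combined when reporting a bad token follows; the script text, the
file name ``demo.rc`` and the message wording are all invented for this
example.

```python
import shlex

# Hypothetical two-line script; 'bogus' stands for an unknown command.
script = 'set x\nbogus y\n'
lexer = shlex.shlex(script, infile='demo.rc', posix=True)

messages = []
for token in iter(lexer.get_token, lexer.eof):
    if token == 'bogus':
        # error_leader() uses the lexer's current infile and lineno.
        messages.append(lexer.error_leader() + 'unexpected token ' + repr(token))

# Both values can be overridden explicitly.
override = lexer.error_leader('other.rc', 7)

print(messages[0])  # "demo.rc", line 2: unexpected token 'bogus'
print(override)     # "other.rc", line 7: 
```

Note that :attr:`lineno` counts newlines already consumed, so a token that is
terminated by a newline is reported with the number of the following line.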

.. _shlex-parsing-rules:

Parsing Rules
-------------

When operating in non-POSIX mode, :class:`~shlex.shlex` will try to obey the
following rules.

* Quote characters are not recognized within words (``Do"Not"Separate`` is
  parsed as the single word ``Do"Not"Separate``);

* Escape characters are not recognized;

* Enclosing characters in quotes preserve the literal value of all characters
  within the quotes;

* Closing quotes separate words (``"Do"Separate`` is parsed as ``"Do"`` and
  ``Separate``);

* If :attr:`~shlex.whitespace_split` is ``False``, any character not
  declared to be a word character, whitespace, or a quote will be returned as
  a single-character token. If it is ``True``, :class:`~shlex.shlex` will only
  split words on whitespace;

* EOF is signaled with an empty string (``''``);

* It's not possible to parse empty strings, even if quoted.

When operating in POSIX mode, :class:`~shlex.shlex` will try to obey the
following parsing rules.

* Quotes are stripped out, and do not separate words (``"Do"Not"Separate"`` is
  parsed as the single word ``DoNotSeparate``);

* Non-quoted escape characters (e.g. ``'\'``) preserve the literal value of the
  next character that follows;

* Enclosing characters in quotes which are not part of
  :attr:`~shlex.escapedquotes` (e.g. ``"'"``) preserve the literal value
  of all characters within the quotes;

* Enclosing characters in quotes which are part of
  :attr:`~shlex.escapedquotes` (e.g. ``'"'``) preserve the literal value
  of all characters within the quotes, with the exception of the characters
  mentioned in :attr:`~shlex.escape`. The escape characters retain their
  special meaning only when followed by the quote in use, or the escape
  character itself. Otherwise the escape character will be considered a
  normal character.

* EOF is signaled with a :const:`None` value;

* Quoted empty strings (``''``) are allowed.
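
The rules listed above can be checked with a short sketch that uses
:func:`split` in both modes. Note that :func:`split` sets
:attr:`~shlex.whitespace_split`, so punctuation is not returned as separate
tokens here; the sample inputs and variable names are chosen only for
illustration.

```python
import shlex

# Non-POSIX mode: quotes stay in the token and escapes are not recognized.
compat_inner = shlex.split('Do"Not"Separate', posix=False)  # ['Do"Not"Separate']
compat_closing = shlex.split('"Do"Separate', posix=False)   # ['"Do"', 'Separate']
compat_escape = shlex.split(r'a\ b', posix=False)           # ['a\\', 'b']

# POSIX mode: quotes are stripped and a backslash escapes the next character.
posix_inner = shlex.split('"Do"Not"Separate"')              # ['DoNotSeparate']
posix_escape = shlex.split(r'a\ b')                         # ['a b']

# Single quotes keep everything literal; inside double quotes the backslash
# is special only before the quote character or another backslash.
posix_single = shlex.split(r"'a\b'")                        # ['a\\b']
posix_double = shlex.split(r'"a\"b \n"')                    # ['a"b \\n']

# Quoted empty strings survive in POSIX mode.
posix_empty = shlex.split("a '' b")                         # ['a', '', 'b']

# EOF markers differ between the two modes.
compat_eof = shlex.shlex('', posix=False).get_token()       # ''
posix_eof = shlex.shlex('', posix=True).get_token()         # None
```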