:mod:`shlex` --- Simple lexical analysis
========================================

.. module:: shlex
   :synopsis: Simple lexical analysis for Unix shell-like languages.

.. moduleauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. moduleauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>
.. sectionauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. sectionauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>

**Source code:** :source:`Lib/shlex.py`

--------------

The :class:`~shlex.shlex` class makes it easy to write lexical analyzers for
simple syntaxes resembling that of the Unix shell. This will often be useful
for writing minilanguages (for example, in run control files for Python
applications) or for parsing quoted strings.

The :mod:`shlex` module defines the following functions:


.. function:: split(s, comments=False, posix=True)

   Split the string *s* using shell-like syntax. If *comments* is :const:`False`
   (the default), the parsing of comments in the given string will be disabled
   (setting the :attr:`~shlex.commenters` attribute of the
   :class:`~shlex.shlex` instance to the empty string). This function operates
   in POSIX mode by default, but uses non-POSIX mode if the *posix* argument is
   false.

   .. note::

      Since the :func:`split` function instantiates a :class:`~shlex.shlex`
      instance, passing ``None`` for *s* will read the string to split from
      standard input.


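As a rough illustration of how the *comments* and *posix* arguments of
:func:`split` interact, the sketch below splits one line three ways. The sample
command line is made up for the example.

```python
import shlex

line = 'cp "my file.txt" backup/  # nightly copy'

# Default: POSIX mode with comment parsing disabled, so '#' is ordinary text.
default_tokens = shlex.split(line)

# comments=True drops everything from '#' to the end of the line.
no_comment_tokens = shlex.split(line, comments=True)

# posix=False keeps the quote characters as part of the token.
compat_tokens = shlex.split(line, comments=True, posix=False)

print(default_tokens)     # ['cp', 'my file.txt', 'backup/', '#', 'nightly', 'copy']
print(no_comment_tokens)  # ['cp', 'my file.txt', 'backup/']
print(compat_tokens)      # ['cp', '"my file.txt"', 'backup/']
```
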
.. function:: quote(s)

   Return a shell-escaped version of the string *s*. The returned value is a
   string that can safely be used as one token in a shell command line, for
   cases where you cannot use a list.

   This idiom would be unsafe::

      >>> filename = 'somefile; rm -rf ~'
      >>> command = 'ls -l {}'.format(filename)
      >>> print(command)  # executed by a shell: boom!
      ls -l somefile; rm -rf ~

   :func:`quote` lets you plug the security hole::

      >>> from shlex import quote
      >>> command = 'ls -l {}'.format(quote(filename))
      >>> print(command)
      ls -l 'somefile; rm -rf ~'
      >>> remote_command = 'ssh home {}'.format(quote(command))
      >>> print(remote_command)
      ssh home 'ls -l '"'"'somefile; rm -rf ~'"'"''

   The quoting is compatible with UNIX shells and with :func:`split`:

      >>> from shlex import split
      >>> remote_command = split(remote_command)
      >>> remote_command
      ['ssh', 'home', "ls -l 'somefile; rm -rf ~'"]
      >>> command = split(remote_command[-1])
      >>> command
      ['ls', '-l', 'somefile; rm -rf ~']

   .. versionadded:: 3.3

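When a command takes several arguments, one possible approach is to quote each
argument separately and join the results with spaces. The file names below are
invented for the example.

```python
import shlex

filenames = ['report.txt', 'my file.txt', "it's here.txt"]

# Quote each argument on its own, then join the pieces with spaces.
command = 'ls -l ' + ' '.join(shlex.quote(name) for name in filenames)
print(command)

# Splitting the command line recovers the original arguments.
round_trip = shlex.split(command)
print(round_trip[2:] == filenames)  # True
```

Strings made only of safe characters, such as ``report.txt``, are returned
unchanged.
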
The :mod:`shlex` module defines the following class:


.. class:: shlex(instream=None, infile=None, posix=False)

   A :class:`~shlex.shlex` instance or subclass instance is a lexical analyzer
   object. The initialization argument, if present, specifies where to read
   characters from. It must be a file-/stream-like object with
   :meth:`~io.TextIOBase.read` and :meth:`~io.TextIOBase.readline` methods, or
   a string. If no argument is given, input will be taken from ``sys.stdin``.
   The second optional argument is a filename string, which sets the initial
   value of the :attr:`~shlex.infile` attribute. If the *instream*
   argument is omitted or equal to ``sys.stdin``, this second argument
   defaults to "stdin". The *posix* argument defines the operational mode:
   when *posix* is not true (default), the :class:`~shlex.shlex` instance will
   operate in compatibility mode. When operating in POSIX mode,
   :class:`~shlex.shlex` will try to be as close as possible to the POSIX shell
   parsing rules.


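The following sketch constructs two lexers over the same text, one from a
string in the default compatibility mode and one from a file-like object in
POSIX mode. It relies on the fact that :class:`~shlex.shlex` instances can be
iterated over; the ``settings.rc`` name is only a placeholder.

```python
import io
import shlex

text = 'name = "John Smith"'

# A plain string is accepted as the input source; compatibility mode is
# the default.
compat = shlex.shlex(text)

# A file-like object, a filename for error messages, and POSIX mode.
posix = shlex.shlex(io.StringIO(text), 'settings.rc', posix=True)

compat_tokens = list(compat)
posix_tokens = list(posix)

print(compat_tokens)  # ['name', '=', '"John Smith"']
print(posix_tokens)   # ['name', '=', 'John Smith']
print(posix.infile)   # settings.rc
```
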
.. seealso::

   Module :mod:`configparser`
      Parser for configuration files similar to the Windows :file:`.ini` files.


.. _shlex-objects:

shlex Objects
-------------

A :class:`~shlex.shlex` instance has the following methods:


.. method:: shlex.get_token()

   Return a token. If tokens have been stacked using :meth:`push_token`, pop a
   token off the stack. Otherwise, read one from the input stream. If reading
   encounters an immediate end-of-file, :attr:`eof` is returned (the empty
   string (``''``) in non-POSIX mode, and ``None`` in POSIX mode).


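A typical reading loop calls :meth:`get_token` until it returns :attr:`eof`.
A minimal sketch, using a made-up input string:

```python
import shlex

lexer = shlex.shlex('x = 42')

tokens = []
while True:
    token = lexer.get_token()
    if token == lexer.eof:  # '' here, since the lexer is in non-POSIX mode
        break
    tokens.append(token)

print(tokens)  # ['x', '=', '42']
```
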
.. method:: shlex.push_token(str)

   Push the argument onto the token stack.


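Pushing a token back is one way to implement lookahead. The sketch below, with
an invented input string, also shows that pushed tokens come back in
last-in, first-out order.

```python
import shlex

lexer = shlex.shlex('if ready then go')

first = lexer.get_token()      # 'if'
lookahead = lexer.get_token()  # 'ready'
lexer.push_token(lookahead)    # hand the token back to the lexer
again = lexer.get_token()      # 'ready' once more

# Tokens pushed later are returned first.
lexer.push_token('b')
lexer.push_token('a')
stacked = [lexer.get_token(), lexer.get_token()]  # ['a', 'b']

following = lexer.get_token()  # reading resumes from the input: 'then'
```
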
.. method:: shlex.read_token()

   Read a raw token. Ignore the pushback stack, and do not interpret source
   requests. (This is not ordinarily a useful entry point, and is documented here
   only for the sake of completeness.)


.. method:: shlex.sourcehook(filename)

   When :class:`~shlex.shlex` detects a source request (see :attr:`source`
   below), this method is given the following token as argument, and is expected
   to return a tuple consisting of a filename and an open file-like object.

   Normally, this method first strips any quotes off the argument. If the result
   is an absolute pathname, or there was no previous source request in effect, or
   the previous source was a stream (such as ``sys.stdin``), the result is left
   alone. Otherwise, if the result is a relative pathname, the directory part of
   the name of the file immediately before it on the source inclusion stack is
   prepended (this behavior is like the way the C preprocessor handles ``#include
   "file.h"``).

   The result of the manipulations is treated as a filename, and returned as the
   first component of the tuple, with :func:`open` called on it to yield the second
   component. (Note: this is the reverse of the order of arguments in instance
   initialization!)

   This hook is exposed so that you can use it to implement directory search paths,
   addition of file extensions, and other namespace hacks. There is no
   corresponding 'close' hook, but a shlex instance will call the
   :meth:`~io.IOBase.close` method of the sourced input stream when it returns
   EOF.

   For more explicit control of source stacking, use the :meth:`push_source` and
   :meth:`pop_source` methods.


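As one example of overriding the hook, the sketch below resolves source
requests against an in-memory dictionary instead of the file system. The
``DictSourceLexer`` class, its ``sources`` attribute and the ``include``
keyword are all hypothetical names chosen for the illustration.

```python
import io
import shlex


class DictSourceLexer(shlex.shlex):
    """Hypothetical lexer that looks up included sources in a dict."""

    def __init__(self, text, sources):
        super().__init__(text)
        self.source = 'include'  # the token that triggers a source request
        self.sources = sources   # maps a source name to its text

    def sourcehook(self, filename):
        name = filename.strip('"')
        # Return (filename, open file-like object), as the hook requires.
        return name, io.StringIO(self.sources[name])


lexer = DictSourceLexer('a include extra b', {'extra': 'x y'})
tokens = list(iter(lexer.get_token, lexer.eof))
print(tokens)  # ['a', 'x', 'y', 'b']
```
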
.. method:: shlex.push_source(newstream, newfile=None)

   Push an input source stream onto the input stack. If the filename argument is
   specified it will later be available for use in error messages. This is the
   same method used internally by the :meth:`sourcehook` method.


.. method:: shlex.pop_source()

   Pop the last-pushed input source from the input stack. This is the same method
   used internally when the lexer reaches EOF on a stacked input stream.


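A small sketch of explicit source stacking: a second stream is pushed halfway
through the input, and the lexer pops it again by itself when that stream is
exhausted. The stream contents and the ``inner`` name are placeholders.

```python
import io
import shlex

lexer = shlex.shlex('a b')
first = lexer.get_token()  # 'a'

# Read from another stream before continuing with the original input.
lexer.push_source(io.StringIO('x y'), 'inner')
name_while_stacked = lexer.infile  # 'inner'

# At EOF on the stacked stream the lexer pops it and carries on with 'b'.
rest = list(iter(lexer.get_token, lexer.eof))
name_after = lexer.infile  # back to the original value, None here

print(first, rest)  # a ['x', 'y', 'b']
```
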
.. method:: shlex.error_leader(infile=None, lineno=None)

   This method generates an error message leader in the format of a Unix C compiler
   error label; the format is ``'"%s", line %d: '``, where the ``%s`` is replaced
   with the name of the current source file and the ``%d`` with the current input
   line number (the optional arguments can be used to override these).

   This convenience is provided to encourage :mod:`shlex` users to generate error
   messages in the standard, parseable format understood by Emacs and other Unix
   tools.

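For illustration, the leader can be prepended to an application's own message.
The file names and the message text below are made up.

```python
import io
import shlex

lexer = shlex.shlex(io.StringIO('color = ?'), 'theme.rc')

# Uses the lexer's current infile and lineno.
leader = lexer.error_leader()
message = leader + 'unexpected token'

# Both values can be overridden explicitly.
override = lexer.error_leader('other.rc', 12)

print(message)   # "theme.rc", line 1: unexpected token
print(override)  # "other.rc", line 12: 
```
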
Instances of :class:`~shlex.shlex` subclasses have some public instance
variables which either control lexical analysis or can be used for debugging:


.. attribute:: shlex.commenters

   The string of characters that are recognized as comment beginners. All
   characters from the comment beginner to end of line are ignored. Includes just
   ``'#'`` by default.


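The sketch below tokenizes the same made-up text twice, first with the default
comment character and then with ``';'`` as the only comment beginner.

```python
import shlex

text = 'a ; note\nb # c'

# Default: '#' starts a comment, ';' is an ordinary one-character token.
default_tokens = list(shlex.shlex(text))

# Treat ';' as the comment beginner instead; '#' becomes an ordinary token.
lexer = shlex.shlex(text)
lexer.commenters = ';'
custom_tokens = list(lexer)

print(default_tokens)  # ['a', ';', 'note', 'b']
print(custom_tokens)   # ['a', 'b', '#', 'c']
```
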
.. attribute:: shlex.wordchars

   The string of characters that will accumulate into multi-character tokens. By
   default, includes all ASCII alphanumerics and underscore.


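Extending :attr:`wordchars` is one way to keep names containing punctuation in
a single token. A short sketch with an invented file name:

```python
import shlex

text = 'my-file.txt'

# By default '-' and '.' are not word characters, so they split the name.
default_tokens = list(shlex.shlex(text))

# Adding them to wordchars keeps the name together.
lexer = shlex.shlex(text)
lexer.wordchars += '-.'
custom_tokens = list(lexer)

print(default_tokens)  # ['my', '-', 'file', '.', 'txt']
print(custom_tokens)   # ['my-file.txt']
```
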
.. attribute:: shlex.whitespace

   Characters that will be considered whitespace and skipped. Whitespace bounds
   tokens. By default, includes space, tab, linefeed and carriage-return.


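Because whitespace bounds tokens, adding a character to :attr:`whitespace`
turns it into a separator. The sketch below does this for commas in a made-up
list.

```python
import shlex

lexer = shlex.shlex('a,b, c')
lexer.whitespace += ','  # commas now separate tokens and are skipped

comma_tokens = list(lexer)
print(comma_tokens)  # ['a', 'b', 'c']
```
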
.. attribute:: shlex.escape

   Characters that will be considered as escape. This will be only used in POSIX
   mode, and includes just ``'\'`` by default.


.. attribute:: shlex.quotes

   Characters that will be considered string quotes. The token accumulates until
   the same quote is encountered again (thus, different quote types protect each
   other as in the shell). By default, includes ASCII single and double quotes.


.. attribute:: shlex.escapedquotes

   Characters in :attr:`quotes` that will interpret escape characters defined in
   :attr:`escape`. This is only used in POSIX mode, and includes just ``'"'`` by
   default.


.. attribute:: shlex.whitespace_split

   If ``True``, tokens will only be split on whitespace. This is useful, for
   example, for parsing command lines with :class:`~shlex.shlex`, getting
   tokens in a similar way to shell arguments.


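The difference is easiest to see on text that looks like a command line. In
this sketch the option and path are invented.

```python
import shlex

text = '--flag=value path/to/file'

# Default: every non-word character becomes its own token.
default_tokens = list(shlex.shlex(text))

# With whitespace_split, only whitespace ends a token.
lexer = shlex.shlex(text)
lexer.whitespace_split = True
split_tokens = list(lexer)

print(default_tokens)
print(split_tokens)  # ['--flag=value', 'path/to/file']
```
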
.. attribute:: shlex.infile

   The name of the current input file, as initially set at class instantiation time
   or stacked by later source requests. It may be useful to examine this when
   constructing error messages.


.. attribute:: shlex.instream

   The input stream from which this :class:`~shlex.shlex` instance is reading
   characters.


.. attribute:: shlex.source

   This attribute is ``None`` by default. If you assign a string to it, that
   string will be recognized as a lexical-level inclusion request similar to the
   ``source`` keyword in various shells. That is, the immediately following token
   will be opened as a filename and input will be taken from that stream until
   EOF, at which point the :meth:`~io.IOBase.close` method of that stream will be
   called and the input source will again become the original input stream. Source
   requests may be stacked any number of levels deep.


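The sketch below exercises a source request with the default
:meth:`sourcehook`. It writes a small file into a temporary directory first;
the ``extra.rc`` name and the choice of ``source`` as the keyword are
assumptions made for the example.

```python
import os
import shlex
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    included = os.path.join(tmp, 'extra.rc')
    with open(included, 'w') as f:
        f.write('x y\n')

    # The token after 'source' names the file to read from.
    lexer = shlex.shlex('a source "{}" b'.format(included))
    lexer.source = 'source'

    # The lexer closes the included file itself when it reaches EOF on it.
    tokens = list(iter(lexer.get_token, lexer.eof))

print(tokens)  # ['a', 'x', 'y', 'b']
```
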
.. attribute:: shlex.debug

   If this attribute is numeric and ``1`` or more, a :class:`~shlex.shlex`
   instance will print verbose progress output on its behavior. If you need
   to use this, you can read the module source code to learn the details.


.. attribute:: shlex.lineno

   Source line number (count of newlines seen so far plus one).


.. attribute:: shlex.token

   The token buffer. It may be useful to examine this when catching exceptions.


.. attribute:: shlex.eof

   Token used to determine end of file. This will be set to the empty string
   (``''``) in non-POSIX mode, and to ``None`` in POSIX mode.


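A quick check of the two end-of-file markers, together with the reason POSIX
mode uses ``None``: there, a quoted empty string is a legitimate token.

```python
import shlex

compat = shlex.shlex('')
posix = shlex.shlex('', posix=True)

compat_eof = compat.get_token()  # '' (equal to compat.eof)
posix_eof = posix.get_token()    # None (equal to posix.eof)

# In POSIX mode '' can be a real token, so it cannot also mean EOF.
empty_token = shlex.shlex("''", posix=True).get_token()  # ''

print(repr(compat_eof), repr(posix_eof), repr(empty_token))
```
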
.. _shlex-parsing-rules:

Parsing Rules
-------------

When operating in non-POSIX mode, :class:`~shlex.shlex` will try to obey the
following rules.

* Quote characters are not recognized within words (``Do"Not"Separate`` is
  parsed as the single word ``Do"Not"Separate``);

* Escape characters are not recognized;

* Enclosing characters in quotes preserve the literal value of all characters
  within the quotes;

* Closing quotes separate words (``"Do"Separate`` is parsed as ``"Do"`` and
  ``Separate``);

* If :attr:`~shlex.whitespace_split` is ``False``, any character not
  declared to be a word character, whitespace, or a quote will be returned as
  a single-character token. If it is ``True``, :class:`~shlex.shlex` will only
  split words on whitespace;

* EOF is signaled with an empty string (``''``);

* It's not possible to parse empty strings, even if quoted.

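Several of the non-POSIX rules above can be checked directly. In this sketch
the ``tokens`` helper is a hypothetical convenience function, not part of the
module.

```python
import shlex


def tokens(text):
    """Tokenize *text* with a default (non-POSIX) lexer."""
    return list(shlex.shlex(text))


# Quote characters inside a word do not split it.
in_word = tokens('Do"Not"Separate')   # ['Do"Not"Separate']

# A closing quote ends the token, and the quotes are kept.
closing = tokens('"Do"Separate')      # ['"Do"', 'Separate']

# The backslash is not an escape; it is returned as its own token.
backslash = tokens(r'a\b')            # ['a', '\\', 'b']

# Other punctuation becomes single-character tokens.
punctuation = tokens('a+b')           # ['a', '+', 'b']
```
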
When operating in POSIX mode, :class:`~shlex.shlex` will try to obey the
following parsing rules.

* Quotes are stripped out, and do not separate words (``"Do"Not"Separate"`` is
  parsed as the single word ``DoNotSeparate``);

* Non-quoted escape characters (e.g. ``'\'``) preserve the literal value of the
  next character that follows;

* Enclosing characters in quotes which are not part of
  :attr:`~shlex.escapedquotes` (e.g. ``"'"``) preserve the literal value
  of all characters within the quotes;

* Enclosing characters in quotes which are part of
  :attr:`~shlex.escapedquotes` (e.g. ``'"'``) preserve the literal value
  of all characters within the quotes, with the exception of the characters
  mentioned in :attr:`~shlex.escape`. An escape character retains its
  special meaning only when followed by the quote in use, or the escape
  character itself. Otherwise the escape character will be considered a
  normal character.

* EOF is signaled with a :const:`None` value;

* Quoted empty strings (``''``) are allowed.
Éric Araujo9bce3112011-07-27 18:29:31 +0200329* Quoted empty strings (``''``) are allowed.