:mod:`shlex` --- Simple lexical analysis
========================================

.. module:: shlex
   :synopsis: Simple lexical analysis for Unix shell-like languages.
.. moduleauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. moduleauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>
.. sectionauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. sectionauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>

**Source code:** :source:`Lib/shlex.py`

--------------

The :class:`~shlex.shlex` class makes it easy to write lexical analyzers for
simple syntaxes resembling that of the Unix shell. This is often useful
for writing minilanguages (for example, in run control files for Python
applications) or for parsing quoted strings.

The :mod:`shlex` module defines the following functions:


.. function:: split(s, comments=False, posix=True)

   Split the string *s* using shell-like syntax. If *comments* is :const:`False`
   (the default), comment parsing is disabled for the given string
   (the :attr:`~shlex.commenters` attribute of the
   :class:`~shlex.shlex` instance is set to the empty string). This function
   operates in POSIX mode by default, but uses non-POSIX mode if the *posix*
   argument is false.

   .. note::

      Since the :func:`split` function instantiates a :class:`~shlex.shlex`
      instance, passing ``None`` for *s* will read the string to split from
      standard input.


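As a rough illustration of the two *comments* settings, the sketch below
splits two made-up command strings. The strings are only examples chosen
for this illustration.

```python
import shlex

# With comments=True, '#' starts a comment that runs to the end of the line.
# The quoted file name stays together as one token (POSIX mode is the default).
args = shlex.split('cp "my file.txt" /tmp/backup  # copy it', comments=True)
print(args)

# With comments disabled (the default), '#' is treated as an ordinary character.
raw = shlex.split('echo hello # not a comment')
print(raw)
```
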
.. function:: quote(s)

   Return a shell-escaped version of the string *s*. The returned value is a
   string that can safely be used as one token in a shell command line, for
   cases where you cannot use a list.

   This idiom would be unsafe::

      >>> filename = 'somefile; rm -rf ~'
      >>> command = 'ls -l {}'.format(filename)
      >>> print(command)  # executed by a shell: boom!
      ls -l somefile; rm -rf ~

   :func:`quote` lets you plug the security hole::

      >>> from shlex import quote
      >>> command = 'ls -l {}'.format(quote(filename))
      >>> print(command)
      ls -l 'somefile; rm -rf ~'
      >>> remote_command = 'ssh home {}'.format(quote(command))
      >>> print(remote_command)
      ssh home 'ls -l '"'"'somefile; rm -rf ~'"'"''

   The quoting is compatible with UNIX shells and with :func:`split`:

      >>> from shlex import split
      >>> remote_command = split(remote_command)
      >>> remote_command
      ['ssh', 'home', "ls -l 'somefile; rm -rf ~'"]
      >>> command = split(remote_command[-1])
      >>> command
      ['ls', '-l', 'somefile; rm -rf ~']

   .. versionadded:: 3.3

The :mod:`shlex` module defines the following class:


.. class:: shlex(instream=None, infile=None, posix=False)

   A :class:`~shlex.shlex` instance or subclass instance is a lexical analyzer
   object. The first argument, *instream*, if present, specifies where to read
   characters from. It must be a file-/stream-like object with
   :meth:`~io.TextIOBase.read` and :meth:`~io.TextIOBase.readline` methods, or
   a string. If no argument is given, input will be taken from ``sys.stdin``.
   The second optional argument is a filename string, which sets the initial
   value of the :attr:`~shlex.infile` attribute. If the *instream*
   argument is omitted or equal to ``sys.stdin``, this second argument
   defaults to "stdin". The *posix* argument defines the operational mode:
   when *posix* is not true (the default), the :class:`~shlex.shlex` instance
   will operate in compatibility mode. When operating in POSIX mode,
   :class:`~shlex.shlex` will try to be as close as possible to the POSIX shell
   parsing rules.


.. seealso::

   Module :mod:`configparser`
      Parser for configuration files similar to the Windows :file:`.ini` files.


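The difference between the two operational modes can be sketched as
follows. The input string and the ``all_tokens`` helper are hypothetical
and exist only for this illustration; the helper simply calls
``get_token()`` until the lexer's ``eof`` value comes back.

```python
import shlex

def all_tokens(lexer):
    """Hypothetical helper: drain a lexer by calling get_token() until EOF."""
    result = []
    while True:
        token = lexer.get_token()
        if token == lexer.eof:   # '' in non-POSIX mode, None in POSIX mode
            break
        result.append(token)
    return result

text = 'name = "John Smith"'

# Compatibility (non-POSIX) mode is the default: quotes stay in the token.
compat_tokens = all_tokens(shlex.shlex(text))

# POSIX mode strips the quotes from the token.
posix_tokens = all_tokens(shlex.shlex(text, posix=True))

print(compat_tokens)
print(posix_tokens)
```
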
.. _shlex-objects:

shlex Objects
-------------

A :class:`~shlex.shlex` instance has the following methods:


.. method:: shlex.get_token()

   Return a token. If tokens have been stacked using :meth:`push_token`, pop a
   token off the stack. Otherwise, read one from the input stream. If reading
   encounters an immediate end-of-file, :attr:`eof` is returned (the empty
   string (``''``) in non-POSIX mode, and ``None`` in POSIX mode).


.. method:: shlex.push_token(str)

   Push the argument onto the token stack.


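A minimal sketch of how :meth:`get_token` and :meth:`push_token` interact,
using a made-up two-word input. Pushing a token back is one way to "peek"
at the next token without consuming it.

```python
import shlex

lexer = shlex.shlex("alpha beta")

first = lexer.get_token()    # read from the input stream
lexer.push_token(first)      # put it back on the token stack
again = lexer.get_token()    # popped from the stack, not re-read
rest = lexer.get_token()     # next token from the input stream
end = lexer.get_token()      # input exhausted: the eof value is returned

print(first, again, rest, repr(end))
```
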
.. method:: shlex.read_token()

   Read a raw token. Ignore the pushback stack, and do not interpret source
   requests. (This is not ordinarily a useful entry point, and is documented here
   only for the sake of completeness.)


.. method:: shlex.sourcehook(filename)

   When :class:`~shlex.shlex` detects a source request (see :attr:`source`
   below), this method is given the following token as its argument, and is
   expected to return a tuple consisting of a filename and an open file-like
   object.

   Normally, this method first strips any quotes off the argument. If the result
   is an absolute pathname, or there was no previous source request in effect, or
   the previous source was a stream (such as ``sys.stdin``), the result is left
   alone. Otherwise, if the result is a relative pathname, the directory part of
   the name of the file immediately before it on the source inclusion stack is
   prepended (this behavior is like the way the C preprocessor handles
   ``#include "file.h"``).

   The result of the manipulations is treated as a filename, and returned as the
   first component of the tuple, with :func:`open` called on it to yield the second
   component. (Note: this is the reverse of the order of arguments in instance
   initialization!)

   This hook is exposed so that you can use it to implement directory search paths,
   addition of file extensions, and other namespace hacks. There is no
   corresponding 'close' hook, but a :class:`~shlex.shlex` instance will call the
   :meth:`~io.IOBase.close` method of the sourced input stream when it returns
   EOF.

   For more explicit control of source stacking, use the :meth:`push_source` and
   :meth:`pop_source` methods.


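One possible way to use this hook is sketched below: a subclass overrides
``sourcehook()`` so that inclusion requests are served from an in-memory
table instead of the filesystem. The ``MemoryLexer`` class, the
``FAKE_FILES`` table and the ``include`` keyword are all assumptions made
for this example.

```python
import io
import shlex

# Hypothetical in-memory "files", used in place of the real filesystem.
FAKE_FILES = {"extra.cfg": "two three"}

class MemoryLexer(shlex.shlex):
    """Illustrative subclass that resolves source requests from FAKE_FILES."""

    def sourcehook(self, newfile):
        # In non-POSIX mode the token still carries its quotes, so strip them.
        name = newfile.strip('"')
        # Return (filename, open stream), in that order.
        return name, io.StringIO(FAKE_FILES[name])

lexer = MemoryLexer('one include "extra.cfg" four')
lexer.source = "include"   # the token that triggers an inclusion request

# Call get_token() repeatedly until it returns the eof value.
tokens = list(iter(lexer.get_token, lexer.eof))
print(tokens)
```

The included stream is read to its end and closed by the lexer, after
which tokens continue to come from the original input.
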
.. method:: shlex.push_source(newstream, newfile=None)

   Push an input source stream onto the input stack. If the filename argument is
   specified it will later be available for use in error messages. This is the
   same method used internally by the :meth:`sourcehook` method.


.. method:: shlex.pop_source()

   Pop the last-pushed input source from the input stack. This is the same method
   used internally when the lexer reaches EOF on a stacked input stream.


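A small sketch of explicit source stacking, assuming two made-up inputs
named ``outer`` and ``inner``. The stacked stream is read first; when it
is exhausted the lexer pops it and resumes the original input.

```python
import io
import shlex

lexer = shlex.shlex("outer1 outer2", infile="outer")
first = lexer.get_token()

# Stack a second stream on top of the current one.
lexer.push_source(io.StringIO("inner1 inner2"), "inner")
name_while_stacked = lexer.infile

# Drain the lexer: the inner stream is consumed, then popped automatically.
rest = list(iter(lexer.get_token, lexer.eof))
name_after = lexer.infile

print(first, rest, name_while_stacked, name_after)
```
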
.. method:: shlex.error_leader(infile=None, lineno=None)

   This method generates an error message leader in the format of a Unix C compiler
   error label; the format is ``'"%s", line %d: '``, where the ``%s`` is replaced
   with the name of the current source file and the ``%d`` with the current input
   line number (the optional arguments can be used to override these).

   This convenience is provided to encourage :mod:`shlex` users to generate error
   messages in the standard, parseable format understood by Emacs and other Unix
   tools.

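For instance, the leader format described above can be seen with a lexer
created on a hypothetical file name (``demo.conf`` and ``other.conf`` are
placeholders, and no file is actually opened):

```python
import shlex

lexer = shlex.shlex("key value", infile="demo.conf")

# Nothing has been read yet, so the current line number is 1.
default_leader = lexer.error_leader()

# Both the file name and the line number can be overridden.
custom_leader = lexer.error_leader("other.conf", 42)

print(default_leader + "unexpected token")
print(custom_leader + "unexpected token")
```
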
Instances of :class:`~shlex.shlex` subclasses have some public instance
variables which either control lexical analysis or can be used for debugging:


.. attribute:: shlex.commenters

   The string of characters that are recognized as comment beginners. All
   characters from the comment beginner to end of line are ignored. Includes just
   ``'#'`` by default.


.. attribute:: shlex.wordchars

   The string of characters that will accumulate into multi-character tokens. By
   default, includes all ASCII alphanumerics and underscore.


.. attribute:: shlex.whitespace

   Characters that will be considered whitespace and skipped. Whitespace bounds
   tokens. By default, includes space, tab, linefeed and carriage-return.


.. attribute:: shlex.escape

   Characters that will be considered escape characters. This is only used in
   POSIX mode, and includes just ``'\'`` by default.


.. attribute:: shlex.quotes

   Characters that will be considered string quotes. The token accumulates until
   the same quote is encountered again (thus, different quote types protect each
   other as in the shell). By default, includes ASCII single and double quotes.


.. attribute:: shlex.escapedquotes

   Characters in :attr:`quotes` that will interpret escape characters defined in
   :attr:`escape`. This is only used in POSIX mode, and includes just ``'"'`` by
   default.


.. attribute:: shlex.whitespace_split

   If ``True``, tokens will only be split on whitespace. This is useful, for
   example, for parsing command lines with :class:`~shlex.shlex`, getting
   tokens in a similar way to shell arguments.


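The effect of these attributes can be sketched by lexing the same
made-up line twice, once with the defaults and once with
``whitespace_split`` enabled and ``';'`` chosen (arbitrarily, for this
example) as the comment character:

```python
import shlex

line = "ls -l /usr/local/bin  ; list it"

# Defaults: '-', '/' and ';' are not word characters, so each one
# comes back as a single-character token.
default = shlex.shlex(line)
default_tokens = list(iter(default.get_token, default.eof))

# Tuned: split only on whitespace and treat ';' as a comment beginner.
tuned = shlex.shlex(line)
tuned.whitespace_split = True
tuned.commenters = ";"
tuned_tokens = list(iter(tuned.get_token, tuned.eof))

print(default_tokens)
print(tuned_tokens)
```
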
.. attribute:: shlex.infile

   The name of the current input file, as initially set at class instantiation time
   or stacked by later source requests. It may be useful to examine this when
   constructing error messages.


.. attribute:: shlex.instream

   The input stream from which this :class:`~shlex.shlex` instance is reading
   characters.


.. attribute:: shlex.source

   This attribute is ``None`` by default. If you assign a string to it, that
   string will be recognized as a lexical-level inclusion request similar to the
   ``source`` keyword in various shells. That is, the immediately following token
   will be opened as a filename and input taken from that stream until EOF, at
   which point the :meth:`~io.IOBase.close` method of that stream will be called
   and the input source will again become the original input stream. Source
   requests may be stacked any number of levels deep.


.. attribute:: shlex.debug

   If this attribute is numeric and ``1`` or more, a :class:`~shlex.shlex`
   instance will print verbose progress output on its behavior. If you need
   to use this, you can read the module source code to learn the details.


.. attribute:: shlex.lineno

   Source line number (count of newlines seen so far plus one).


.. attribute:: shlex.token

   The token buffer. It may be useful to examine this when catching exceptions.


.. attribute:: shlex.eof

   Token used to determine end of file. This will be set to the empty string
   (``''``) in non-POSIX mode, and to ``None`` in POSIX mode.


.. _shlex-parsing-rules:

Parsing Rules
-------------

When operating in non-POSIX mode, :class:`~shlex.shlex` will try to obey the
following rules.

* Quote characters are not recognized within words (``Do"Not"Separate`` is
  parsed as the single word ``Do"Not"Separate``);

* Escape characters are not recognized;

* Enclosing characters in quotes preserve the literal value of all characters
  within the quotes;

* Closing quotes separate words (``"Do"Separate`` is parsed as ``"Do"`` and
  ``Separate``);

* If :attr:`~shlex.whitespace_split` is ``False``, any character not
  declared to be a word character, whitespace, or a quote will be returned as
  a single-character token. If it is ``True``, :class:`~shlex.shlex` will only
  split words on whitespace;

* EOF is signaled with an empty string (``''``);

* It's not possible to parse empty strings, even if quoted.

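A few of the non-POSIX rules above can be checked with a short sketch.
The ``compat_tokens`` helper is hypothetical and only collects tokens
until the empty-string EOF marker is returned.

```python
import shlex

def compat_tokens(text):
    """Hypothetical helper: all tokens from a non-POSIX (default mode) lexer."""
    lexer = shlex.shlex(text)   # posix=False is the default
    return list(iter(lexer.get_token, lexer.eof))

# A quote inside a word does not split the word.
inside_word = compat_tokens('Do"Not"Separate')

# A closing quote ends the token, and the quotes are kept.
after_quote = compat_tokens('"Do"Separate')

# '=' is neither a word character, whitespace, nor a quote,
# so it is returned as a single-character token.
punctuation = compat_tokens('a=b')

print(inside_word, after_quote, punctuation)
```
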
When operating in POSIX mode, :class:`~shlex.shlex` will try to obey the
following parsing rules.

* Quotes are stripped out, and do not separate words (``"Do"Not"Separate"`` is
  parsed as the single word ``DoNotSeparate``);

* Non-quoted escape characters (e.g. ``'\'``) preserve the literal value of the
  next character that follows;

* Enclosing characters in quotes which are not part of
  :attr:`~shlex.escapedquotes` (e.g. ``"'"``) preserves the literal value
  of all characters within the quotes;

* Enclosing characters in quotes which are part of
  :attr:`~shlex.escapedquotes` (e.g. ``'"'``) preserves the literal value
  of all characters within the quotes, with the exception of the characters
  mentioned in :attr:`~shlex.escape`. The escape characters retain their
  special meaning only when followed by the quote in use, or the escape
  character itself. Otherwise the escape character will be considered a
  normal character.

* EOF is signaled with a :const:`None` value;

* Quoted empty strings (``''``) are allowed.
Éric Araujo9bce3112011-07-27 18:29:31 +0200327* Quoted empty strings (``''``) are allowed.