:mod:`shlex` --- Simple lexical analysis
========================================

.. module:: shlex
   :synopsis: Simple lexical analysis for Unix shell-like languages.
.. moduleauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. moduleauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>
.. sectionauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. sectionauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>


The :class:`shlex` class makes it easy to write lexical analyzers for simple
syntaxes resembling that of the Unix shell. This will often be useful for
writing minilanguages (for example, in run control files for Python
applications) or for parsing quoted strings.

The :mod:`shlex` module defines the following function:


.. function:: split(s[, comments[, posix]])

   Split the string *s* using shell-like syntax. If *comments* is :const:`False`
   (the default), the parsing of comments in the given string will be disabled
   (setting the :attr:`commenters` member of the :class:`shlex` instance to the
   empty string). This function operates in POSIX mode by default, but uses
   non-POSIX mode if the *posix* argument is false.

   .. note::

      Since the :func:`split` function instantiates a :class:`shlex` instance, passing
      ``None`` for *s* will read the string to split from standard input.

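As a brief illustration of the behavior described above, the following sketch
calls :func:`split` in both modes and with comment parsing enabled. The command
strings are made up for the example.

```python
import shlex

# POSIX mode (the default): quoted text stays together and the quotes are removed.
posix_tokens = shlex.split('cp "my file.txt" /tmp/backup')
print(posix_tokens)    # ['cp', 'my file.txt', '/tmp/backup']

# Non-POSIX mode: the quote characters remain part of the token.
compat_tokens = shlex.split('cp "my file.txt" /tmp/backup', posix=False)
print(compat_tokens)   # ['cp', '"my file.txt"', '/tmp/backup']

# With comments enabled, everything from '#' to the end of the line is dropped.
with_comments = shlex.split('ls -l  # list files', comments=True)
print(with_comments)   # ['ls', '-l']
```
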
The :mod:`shlex` module defines the following class:


.. class:: shlex([instream[, infile[, posix]]])

   A :class:`shlex` instance or subclass instance is a lexical analyzer object.
   The initialization argument, if present, specifies where to read characters
   from. It must be a file-/stream-like object with :meth:`read` and
   :meth:`readline` methods, or a string. If no argument is given, input will
   be taken from ``sys.stdin``. The second optional argument is a filename
   string, which sets the initial value of the :attr:`infile` member. If the
   *instream* argument is omitted or equal to ``sys.stdin``, this second
   argument defaults to "stdin". The *posix* argument defines the operational
   mode: when *posix* is not true (default), the :class:`shlex` instance will
   operate in compatibility mode. When operating in POSIX mode, :class:`shlex`
   will try to be as close as possible to the POSIX shell parsing rules.

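The constructor arguments can be tried out with a short sketch such as the
following. The input text and the name ``example.conf`` are placeholders chosen
for the illustration; tokens are collected by calling :meth:`get_token` until it
returns :attr:`eof`.

```python
import io
import shlex

# A string is accepted directly; the class defaults to non-POSIX mode.
compat_lexer = shlex.shlex('alpha "beta gamma" delta')
compat_tokens = list(iter(compat_lexer.get_token, compat_lexer.eof))
print(compat_tokens)   # ['alpha', '"beta gamma"', 'delta']

# A file-like object works too; the second argument sets the infile member.
stream = io.StringIO('alpha "beta gamma" delta')
posix_lexer = shlex.shlex(stream, 'example.conf', posix=True)
print(posix_lexer.infile)   # example.conf
posix_tokens = list(iter(posix_lexer.get_token, posix_lexer.eof))
print(posix_tokens)    # ['alpha', 'beta gamma', 'delta']
```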

.. seealso::

   Module :mod:`configparser`
      Parser for configuration files similar to the Windows :file:`.ini` files.


.. _shlex-objects:

shlex Objects
-------------

A :class:`shlex` instance has the following methods:


.. method:: shlex.get_token()

   Return a token. If tokens have been stacked using :meth:`push_token`, pop a
   token off the stack. Otherwise, read one from the input stream. If reading
   encounters an immediate end-of-file, :attr:`self.eof` is returned (the empty
   string (``''``) in non-POSIX mode, and ``None`` in POSIX mode).


.. method:: shlex.push_token(str)

   Push the argument onto the token stack.

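The interplay of :meth:`get_token` and :meth:`push_token` might look like this
in practice. This is only a small sketch with an invented input line; it shows a
pushed-back token being returned again before any further input is read.

```python
import shlex

lexer = shlex.shlex('set x 10', posix=True)

first = lexer.get_token()     # 'set'
second = lexer.get_token()    # 'x'

# Push 'x' back; the next get_token() call pops it off the token stack.
lexer.push_token(second)
again = lexer.get_token()     # 'x'
value = lexer.get_token()     # '10'

# The input is exhausted, so the eof marker (None in POSIX mode) is returned.
end = lexer.get_token()
print(first, second, again, value, end)
```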

.. method:: shlex.read_token()

   Read a raw token. Ignore the pushback stack, and do not interpret source
   requests. (This is not ordinarily a useful entry point, and is documented here
   only for the sake of completeness.)


.. method:: shlex.sourcehook(filename)

   When :class:`shlex` detects a source request (see :attr:`source` below) this
   method is given the following token as argument, and expected to return a tuple
   consisting of a filename and an open file-like object.

   Normally, this method first strips any quotes off the argument. If the result
   is an absolute pathname, or there was no previous source request in effect, or
   the previous source was a stream (such as ``sys.stdin``), the result is left
   alone. Otherwise, if the result is a relative pathname, the directory part of
   the name of the file immediately before it on the source inclusion stack is
   prepended (this behavior is like the way the C preprocessor handles ``#include
   "file.h"``).

   The result of the manipulations is treated as a filename, and returned as the
   first component of the tuple, with :func:`open` called on it to yield the second
   component. (Note: this is the reverse of the order of arguments in instance
   initialization!)

   This hook is exposed so that you can use it to implement directory search paths,
   addition of file extensions, and other namespace hacks. There is no
   corresponding 'close' hook, but a shlex instance will call the :meth:`close`
   method of the sourced input stream when it returns EOF.

   For more explicit control of source stacking, use the :meth:`push_source` and
   :meth:`pop_source` methods.

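One way to use the hook for a directory search path is sketched below. The
``SearchPathLexer`` class, its ``search_dir`` argument and the file name
``extra.rc`` are hypothetical names invented for this example; only the
overridden :meth:`sourcehook` and the :attr:`source` attribute come from the
module itself.

```python
import os
import shlex
import tempfile


class SearchPathLexer(shlex.shlex):
    """Hypothetical subclass that looks up sourced files in one directory."""

    def __init__(self, instream, search_dir, **kwargs):
        super().__init__(instream, **kwargs)
        self.search_dir = search_dir
        self.source = 'source'    # enable inclusion requests

    def sourcehook(self, newfile):
        # Strip the surrounding quotes, then resolve the name inside
        # search_dir and return (filename, open file object).
        name = newfile.strip('"')
        path = os.path.join(self.search_dir, name)
        return path, open(path)


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, 'extra.rc'), 'w') as f:
        f.write('b c\n')
    # The file name is quoted so that non-POSIX mode reads it as one token.
    lexer = SearchPathLexer('a source "extra.rc" d', tmp)
    tokens = list(iter(lexer.get_token, lexer.eof))

print(tokens)   # ['a', 'b', 'c', 'd']
```

The sourced stream is closed by the lexer when it reaches EOF, so the example
does not close it explicitly.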

.. method:: shlex.push_source(stream[, filename])

   Push an input source stream onto the input stack. If the filename argument is
   specified it will later be available for use in error messages. This is the
   same method used internally by the :meth:`sourcehook` method.


.. method:: shlex.pop_source()

   Pop the last-pushed input source from the input stack. This is the same method
   used internally when the lexer reaches EOF on a stacked input stream.

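Explicit source stacking could be exercised as in the following sketch. The
token text and the label ``inner.rc`` are made up; the pushed stream is
abandoned early with :meth:`pop_source` to show that reading resumes from the
original input.

```python
import io
import shlex

lexer = shlex.shlex('outer1 outer2')
first = lexer.get_token()             # 'outer1'

# Temporarily switch to another stream; 'inner.rc' is only a label here.
lexer.push_source(io.StringIO('inner1 inner2'), 'inner.rc')
pushed_name = lexer.infile            # 'inner.rc'
inner = lexer.get_token()             # 'inner1'

# Drop the pushed stream (it is closed) and resume the original input.
lexer.pop_source()
resumed = lexer.get_token()           # 'outer2'
print(first, inner, resumed)
```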

.. method:: shlex.error_leader([file[, line]])

   This method generates an error message leader in the format of a Unix C compiler
   error label; the format is ``'"%s", line %d: '``, where the ``%s`` is replaced
   with the name of the current source file and the ``%d`` with the current input
   line number (the optional arguments can be used to override these).

   This convenience is provided to encourage :mod:`shlex` users to generate error
   messages in the standard, parseable format understood by Emacs and other Unix
   tools.

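A minimal sketch of building an error message with :meth:`error_leader`
follows. The file names ``demo.rc`` and ``other.rc`` and the message text are
placeholders for the example.

```python
import io
import shlex

lexer = shlex.shlex(io.StringIO('alpha ? beta'), 'demo.rc')
lexer.get_token()                     # 'alpha'

# Without arguments, the lexer's current infile and lineno are used.
leader = lexer.error_leader()
print(leader + 'unexpected token')    # "demo.rc", line 1: unexpected token

# Both values can be overridden explicitly.
custom = lexer.error_leader('other.rc', 7)
print(custom + 'unexpected token')    # "other.rc", line 7: unexpected token
```
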
Instances of :class:`shlex` subclasses have some public instance variables which
either control lexical analysis or can be used for debugging:


.. attribute:: shlex.commenters

   The string of characters that are recognized as comment beginners. All
   characters from the comment beginner to end of line are ignored. Includes just
   ``'#'`` by default.


.. attribute:: shlex.wordchars

   The string of characters that will accumulate into multi-character tokens. By
   default, includes all ASCII alphanumerics and underscore.


.. attribute:: shlex.whitespace

   Characters that will be considered whitespace and skipped. Whitespace bounds
   tokens. By default, includes space, tab, linefeed and carriage-return.

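The effect of adjusting :attr:`wordchars` and :attr:`commenters` can be seen
in a sketch like this one. The input line is invented, and treating ``';'`` as a
comment starter is just a choice made for the illustration.

```python
import shlex

text = 'log-level=debug ; verbose output'

# Default settings: '-', '=' and ';' are not word characters, so each one
# comes back as a single-character token.
default_lexer = shlex.shlex(text)
default_tokens = list(iter(default_lexer.get_token, default_lexer.eof))

# Customized settings: hyphens join words and ';' starts a comment.
custom_lexer = shlex.shlex(text)
custom_lexer.wordchars += '-'
custom_lexer.commenters = ';'
custom_tokens = list(iter(custom_lexer.get_token, custom_lexer.eof))

print(default_tokens)
print(custom_tokens)   # ['log-level', '=', 'debug']
```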

.. attribute:: shlex.escape

   Characters that will be considered as escape. This will be only used in POSIX
   mode, and includes just ``'\'`` by default.


.. attribute:: shlex.quotes

   Characters that will be considered string quotes. The token accumulates until
   the same quote is encountered again (thus, different quote types protect each
   other as in the shell). By default, includes ASCII single and double quotes.


.. attribute:: shlex.escapedquotes

   Characters in :attr:`quotes` that will interpret escape characters defined in
   :attr:`escape`. This is only used in POSIX mode, and includes just ``'"'`` by
   default.


.. attribute:: shlex.whitespace_split

   If ``True``, tokens will only be split on whitespace. This is useful, for
   example, for parsing command lines with :class:`shlex`, getting tokens in a
   similar way to shell arguments.

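To make the difference concrete, the following sketch tokenizes the same
made-up command line with and without :attr:`whitespace_split` in POSIX mode.

```python
import shlex

command = 'tar --exclude=*.pyc -czf "my backup.tgz" src/'

# Without whitespace_split, punctuation is returned one character at a time.
plain = shlex.shlex(command, posix=True)
plain_tokens = list(iter(plain.get_token, plain.eof))
print(plain_tokens[:4])   # ['tar', '-', '-', 'exclude']

# With whitespace_split, tokens resemble shell arguments.
lexer = shlex.shlex(command, posix=True)
lexer.whitespace_split = True
arg_tokens = list(iter(lexer.get_token, lexer.eof))
print(arg_tokens)   # ['tar', '--exclude=*.pyc', '-czf', 'my backup.tgz', 'src/']
```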

.. attribute:: shlex.infile

   The name of the current input file, as initially set at class instantiation time
   or stacked by later source requests. It may be useful to examine this when
   constructing error messages.


.. attribute:: shlex.instream

   The input stream from which this :class:`shlex` instance is reading characters.


.. attribute:: shlex.source

   This member is ``None`` by default. If you assign a string to it, that string
   will be recognized as a lexical-level inclusion request similar to the
   ``source`` keyword in various shells. That is, the immediately following token
   will be opened as a filename and input will be taken from that stream until
   EOF, at which point the :meth:`close` method of that stream will be called and
   the input source will again become the original input stream. Source requests
   may be stacked any number of levels deep.

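An inclusion request handled by the default :meth:`sourcehook` might be set up
as follows. The keyword ``include`` and the file ``colors.rc`` are arbitrary
choices for this sketch; a temporary directory is used so that the example is
self-contained.

```python
import os
import shlex
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    included = os.path.join(tmp, 'colors.rc')
    with open(included, 'w') as f:
        f.write('red green\n')

    # Quote the path so that non-POSIX mode reads it as a single token.
    lexer = shlex.shlex('start include "%s" end' % included)
    lexer.source = 'include'
    tokens = list(iter(lexer.get_token, lexer.eof))

print(tokens)   # ['start', 'red', 'green', 'end']
```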

.. attribute:: shlex.debug

   If this member is numeric and ``1`` or more, a :class:`shlex` instance will
   print verbose progress output on its behavior. If you need to use this, you can
   read the module source code to learn the details.


.. attribute:: shlex.lineno

   Source line number (count of newlines seen so far plus one).


.. attribute:: shlex.token

   The token buffer. It may be useful to examine this when catching exceptions.


.. attribute:: shlex.eof

   Token used to determine end of file. This will be set to the empty string
   (``''``) in non-POSIX mode, and to ``None`` in POSIX mode.

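The :attr:`eof` and :attr:`lineno` members can be inspected with a short
sketch such as this one, which uses trivial made-up input.

```python
import shlex

compat = shlex.shlex('')
posix = shlex.shlex('', posix=True)
eof_values = (compat.eof, posix.eof)
print(eof_values)              # ('', None)

# get_token() hands back the eof marker once the input is exhausted.
at_end = compat.get_token() == compat.eof and posix.get_token() is posix.eof

# lineno advances as newlines are consumed.
lexer = shlex.shlex('a\nb\nc')
while lexer.get_token() != lexer.eof:
    pass
print(lexer.lineno)            # 3
```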

.. _shlex-parsing-rules:

Parsing Rules
-------------

When operating in non-POSIX mode, :class:`shlex` will try to obey the
following rules.

* Quote characters are not recognized within words (``Do"Not"Separate`` is
  parsed as the single word ``Do"Not"Separate``);

* Escape characters are not recognized;

* Enclosing characters in quotes preserve the literal value of all characters
  within the quotes;

* Closing quotes separate words (``"Do"Separate`` is parsed as ``"Do"`` and
  ``Separate``);

* If :attr:`whitespace_split` is ``False``, any character not declared to be a
  word character, whitespace, or a quote will be returned as a single-character
  token. If it is ``True``, :class:`shlex` will only split words on whitespace;

* EOF is signaled with an empty string (``''``);

* It's not possible to parse empty strings, even if quoted.

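Several of the non-POSIX rules above can be checked with a small sketch. The
helper name ``compat_tokens`` is introduced only for this example.

```python
import shlex


def compat_tokens(text):
    """Tokenize text with a default (non-POSIX) shlex instance."""
    lexer = shlex.shlex(text)
    return list(iter(lexer.get_token, lexer.eof))


inside_word = compat_tokens('Do"Not"Separate')
closing_quote = compat_tokens('"Do"Separate')
punctuation = compat_tokens('a+b')

print(inside_word)     # ['Do"Not"Separate']
print(closing_quote)   # ['"Do"', 'Separate']
print(punctuation)     # ['a', '+', 'b']
```
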
When operating in POSIX mode, :class:`shlex` will try to obey the following
parsing rules.

* Quotes are stripped out, and do not separate words (``"Do"Not"Separate"`` is
  parsed as the single word ``DoNotSeparate``);

* Non-quoted escape characters (e.g. ``'\'``) preserve the literal value of the
  next character that follows;

* Enclosing characters in quotes which are not part of :attr:`escapedquotes`
  (e.g. ``"'"``) preserve the literal value of all characters within the quotes;

* Enclosing characters in quotes which are part of :attr:`escapedquotes` (e.g.
  ``'"'``) preserve the literal value of all characters within the quotes, with
  the exception of the characters mentioned in :attr:`escape`. The escape
  characters retain their special meaning only when followed by the quote in
  use, or the escape character itself. Otherwise the escape character will be
  considered a normal character;

* EOF is signaled with a :const:`None` value;

* Quoted empty strings (``''``) are allowed.

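The POSIX rules above can be illustrated in the same way. The helper name
``posix_tokens`` is an invention of this sketch, and :attr:`whitespace_split`
is enabled so that only whitespace separates tokens.

```python
import shlex


def posix_tokens(text):
    """Tokenize text in POSIX mode, splitting on whitespace only."""
    lexer = shlex.shlex(text, posix=True)
    lexer.whitespace_split = True
    return list(iter(lexer.get_token, lexer.eof))


joined = posix_tokens('"Do"Not"Separate"')
escaped_space = posix_tokens(r'one\ word')
single_quoted = posix_tokens(r"'kept\n as-is'")
double_quoted = posix_tokens(r'"say \"hi\" \n"')
empty = posix_tokens("''")

print(joined)          # ['DoNotSeparate']
print(escaped_space)   # ['one word']
print(single_quoted)   # ['kept\\n as-is']  (backslash kept literally)
print(double_quoted)   # ['say "hi" \\n']   (only \" is treated as an escape)
print(empty)           # ['']
```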