\section{\module{shlex} ---
         Simple lexical analysis}

\declaremodule{standard}{shlex}
\modulesynopsis{Simple lexical analysis for \UNIX\ shell-like languages.}
\moduleauthor{Eric S. Raymond}{esr@snark.thyrsus.com}
\moduleauthor{Gustavo Niemeyer}{niemeyer@conectiva.com}
\sectionauthor{Eric S. Raymond}{esr@snark.thyrsus.com}
\sectionauthor{Gustavo Niemeyer}{niemeyer@conectiva.com}

\versionadded{1.5.2}

The \class{shlex} class makes it easy to write lexical analyzers for
simple syntaxes resembling that of the \UNIX{} shell.  This will often
be useful for writing minilanguages (for example, in run control
files for Python applications) or for parsing quoted strings.

\note{The \module{shlex} module currently does not support Unicode input.}

The \module{shlex} module defines the following function:

\begin{funcdesc}{split}{s\optional{, comments\optional{, posix}}}
Split the string \var{s} using shell-like syntax.  If \var{comments} is
\constant{False} (the default), the parsing of comments in the given
string is disabled (the \member{commenters} member of the
\class{shlex} instance is set to the empty string).  This function operates
in \POSIX{} mode by default, but uses non-\POSIX{} mode if the
\var{posix} argument is false.
\versionadded{2.3}
\versionchanged[Added the \var{posix} parameter]{2.6}
\note{Since the \function{split()} function instantiates a \class{shlex}
      instance, passing \code{None} for \var{s} will read the string
      to split from standard input.}
\end{funcdesc}
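
As a rough illustration of the three argument combinations described
above, the following sketch splits one command line in each way.  The
command line itself is made up for the example.

```python
import shlex

line = 'cp "my file.txt" /tmp # copy'

# Default: POSIX mode with comment parsing disabled, so '#' is an
# ordinary token and the quotes are stripped.
default_tokens = shlex.split(line)

# With comments enabled, everything from '#' to end of line is dropped.
no_comment_tokens = shlex.split(line, comments=True)

# Non-POSIX mode keeps the quote characters as part of the token.
compat_tokens = shlex.split(line, posix=False)

print(default_tokens)
print(no_comment_tokens)
print(compat_tokens)
```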

The \module{shlex} module defines the following class:

\begin{classdesc}{shlex}{\optional{instream\optional{,
                         infile\optional{, posix}}}}
A \class{shlex} instance or subclass instance is a lexical analyzer
object.  The \var{instream} argument, if present, specifies where to
read characters from.  It must be a file-like or stream-like object with
\method{read()} and \method{readline()} methods, or a string (strings
are accepted since Python 2.3).  If no argument is given, input will be
taken from \code{sys.stdin}.  The second optional argument, \var{infile},
is a filename string, which sets the initial value of the \member{infile}
member.  If the \var{instream} argument is omitted or equal to
\code{sys.stdin}, this second argument defaults to ``stdin''.  The
\var{posix} argument was introduced in Python 2.3, and defines the
operational mode.  When \var{posix} is not true (the default), the
\class{shlex} instance will operate in compatibility mode.  When
operating in \POSIX{} mode, \class{shlex} will try to be as close as
possible to the \POSIX{} shell parsing rules.
See section~\ref{shlex-objects}.
\end{classdesc}
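
A minimal sketch of constructing a lexer from a string and from a
file-like object follows.  The input text and the \code{demo.rc} label
are invented for the example; \var{infile} is only a name and no such
file is opened.

```python
import io
import shlex

text = 'name = "Jane Doe"'

# A string is accepted directly; the lexer runs in compatibility
# (non-POSIX) mode unless posix is true, so the quotes are kept.
compat = shlex.shlex(text, infile='demo.rc')
compat_tokens = list(compat)

# Any object with read() and readline() works as well; in POSIX mode
# the quotes are stripped from the token.
posix_lexer = shlex.shlex(io.StringIO(text), infile='demo.rc', posix=True)
posix_tokens = list(posix_lexer)

print(compat_tokens)
print(posix_tokens)
```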

\begin{seealso}
  \seemodule{ConfigParser}{Parser for configuration files similar to the
                           Windows \file{.ini} files.}
\end{seealso}


\subsection{shlex Objects \label{shlex-objects}}

A \class{shlex} instance has the following methods:

\begin{methoddesc}[shlex]{get_token}{}
Return a token.  If tokens have been stacked using
\method{push_token()}, pop a token off the stack.  Otherwise, read one
from the input stream.  If reading encounters an immediate
end-of-file, \member{self.eof} is returned: the empty string (\code{''})
in non-\POSIX{} mode, and \code{None} in \POSIX{} mode.
\end{methoddesc}

\begin{methoddesc}[shlex]{push_token}{str}
Push the argument onto the token stack.
\end{methoddesc}
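
The interplay of \method{get_token()} and \method{push_token()} can be
sketched as a one-token lookahead.  The variable names below are
illustrative only.

```python
import shlex

lexer = shlex.shlex('alpha beta')

first = lexer.get_token()    # read from the input stream
lexer.push_token(first)      # put it back on the token stack
again = lexer.get_token()    # popped from the stack, not re-read
second = lexer.get_token()
at_end = lexer.get_token()   # equals lexer.eof once input is exhausted

print(first, again, second, repr(at_end))
```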

\begin{methoddesc}[shlex]{read_token}{}
Read a raw token.  Ignore the pushback stack, and do not interpret source
requests.  (This is not ordinarily a useful entry point, and is
documented here only for the sake of completeness.)
\end{methoddesc}

\begin{methoddesc}[shlex]{sourcehook}{filename}
When \class{shlex} detects a source request (see
\member{source} below), this method is given the following token as
its argument, and is expected to return a tuple consisting of a filename
and an open file-like object.

Normally, this method first strips any quotes off the argument.  If
the result is an absolute pathname, or there was no previous source
request in effect, or the previous source was a stream
(such as \code{sys.stdin}), the result is left alone.  Otherwise, if the
result is a relative pathname, the directory part of the name of the
file immediately before it on the source inclusion stack is prepended
(this behavior is like the way the C preprocessor handles
\code{\#include "file.h"}).

The result of the manipulations is treated as a filename, and returned
as the first component of the tuple, with
\function{open()} called on it to yield the second component.  (Note:
this is the reverse of the order of arguments in instance initialization!)

This hook is exposed so that you can use it to implement directory
search paths, addition of file extensions, and other namespace hacks.
There is no corresponding `close' hook, but a \class{shlex} instance
will call the \method{close()} method of the sourced input stream when it
returns \EOF.

For more explicit control of source stacking, use the
\method{push_source()} and \method{pop_source()} methods.
\end{methoddesc}
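
One possible way to override \method{sourcehook()} is sketched below.
The \code{DictSourceLexer} class and the \code{INCLUDES} table are
hypothetical; they stand in for a real search-path lookup so that the
example needs no files on disk.

```python
import io
import shlex

# Hypothetical in-memory "files", standing in for a search path lookup.
INCLUDES = {'colors': 'red green'}

class DictSourceLexer(shlex.shlex):
    def sourcehook(self, name):
        # The token arrives as read, so strip surrounding quotes here.
        name = name.strip('"')
        # Return (filename, stream): the reverse of the constructor order.
        return name, io.StringIO(INCLUDES[name])

lexer = DictSourceLexer('start source "colors" end')
lexer.source = 'source'   # the keyword that triggers a source request
tokens = list(lexer)

print(tokens)
```

The stream returned by the hook is closed by the lexer when it is
exhausted, after which reading resumes from the original input.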

\begin{methoddesc}[shlex]{push_source}{stream\optional{, filename}}
Push an input source stream onto the input stack.  If the \var{filename}
argument is specified, it will later be available for use in error
messages.  This is the same method used internally by the
\method{sourcehook()} method.
\versionadded{2.1}
\end{methoddesc}

\begin{methoddesc}[shlex]{pop_source}{}
Pop the last-pushed input source from the input stack.
This is the same method used internally when the lexer reaches
\EOF{} on a stacked input stream.
\versionadded{2.1}
\end{methoddesc}
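
The following sketch stacks a second stream by hand.  The first lexer
lets the stacked stream run to \EOF{}; the second abandons it early
with \method{pop_source()}.  The label \code{'extra'} is arbitrary.

```python
import io
import shlex

# Case 1: the stacked stream is popped automatically at its end.
lexer = shlex.shlex('a b')
first = lexer.get_token()
lexer.push_source(io.StringIO('x y'), 'extra')
name_while_stacked = lexer.infile
rest = list(lexer)

# Case 2: pop explicitly, discarding what is left of the stacked stream.
lexer2 = shlex.shlex('a b')
lexer2.get_token()
lexer2.push_source(io.StringIO('x y'), 'extra')
from_stacked = lexer2.get_token()
lexer2.pop_source()
after_pop = list(lexer2)

print(first, rest)
print(from_stacked, after_pop)
```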

\begin{methoddesc}[shlex]{error_leader}{\optional{file\optional{, line}}}
This method generates an error message leader in the format of a
\UNIX{} C compiler error label; the format is \code{'"\%s", line \%d: '},
where the \samp{\%s} is replaced with the name of the current source
file and the \samp{\%d} with the current input line number (the
optional arguments can be used to override these).

This convenience is provided to encourage \module{shlex} users to
generate error messages in the standard, parseable format understood
by Emacs and other \UNIX{} tools.
\end{methoddesc}
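
A short sketch of building a diagnostic with \method{error_leader()}
follows.  The file names and the message text are invented for the
example.

```python
import shlex

lexer = shlex.shlex('set colour', infile='app.rc')

# Uses the lexer's current infile and lineno.
leader = lexer.error_leader()

# Both values can be overridden explicitly.
custom = lexer.error_leader('other.rc', 12)

message = leader + 'unknown option'
print(message)
print(custom)
```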

Instances of \class{shlex} and its subclasses have some public instance
variables that either control lexical analysis or can be used for
debugging:

\begin{memberdesc}[shlex]{commenters}
The string of characters that are recognized as comment beginners.
All characters from the comment beginner to end of line are ignored.
Includes just \character{\#} by default.
\end{memberdesc}

\begin{memberdesc}[shlex]{wordchars}
The string of characters that will accumulate into multi-character
tokens.  By default, includes all \ASCII{} alphanumerics and
underscore.
\end{memberdesc}
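
The effect of \member{wordchars} can be sketched by tokenizing the same
text twice, once with the defaults and once after adding a hyphen.  The
input string is arbitrary.

```python
import shlex

text = 'key-name=value'

# With the default wordchars, '-' and '=' are single-character tokens.
default_tokens = list(shlex.shlex(text))

# Adding '-' lets hyphenated names accumulate into one token.
lexer = shlex.shlex(text)
lexer.wordchars += '-'
hyphen_tokens = list(lexer)

print(default_tokens)
print(hyphen_tokens)
```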

\begin{memberdesc}[shlex]{whitespace}
Characters that will be considered whitespace and skipped.  Whitespace
bounds tokens.  By default, includes space, tab, linefeed and
carriage-return.
\end{memberdesc}

\begin{memberdesc}[shlex]{escape}
Characters that will be considered escape characters.  This is only used
in \POSIX{} mode, and includes just \character{\textbackslash} by default.
\versionadded{2.3}
\end{memberdesc}

\begin{memberdesc}[shlex]{quotes}
Characters that will be considered string quotes.  The token
accumulates until the same quote is encountered again (thus, different
quote types protect each other as in the shell).  By default, includes
\ASCII{} single and double quotes.
\end{memberdesc}

\begin{memberdesc}[shlex]{escapedquotes}
Characters in \member{quotes} that will interpret escape characters
defined in \member{escape}.  This is only used in \POSIX{} mode, and
includes just \character{"} by default.
\versionadded{2.3}
\end{memberdesc}

\begin{memberdesc}[shlex]{whitespace_split}
If \code{True}, tokens will only be split on whitespace.  This is useful,
for example, for parsing command lines with \class{shlex}, getting tokens
in a similar way to shell arguments.
\versionadded{2.3}
\end{memberdesc}
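
As an illustration of \member{whitespace_split}, the sketch below
tokenizes a made-up command line with the flag off and then on.

```python
import shlex

command = 'ls -l /tmp/foo.txt'

# Default: every non-word character becomes its own token.
fine_grained = list(shlex.shlex(command))

# With whitespace_split set, only whitespace separates tokens.
lexer = shlex.shlex(command)
lexer.whitespace_split = True
shell_like = list(lexer)

print(fine_grained)
print(shell_like)
```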

\begin{memberdesc}[shlex]{infile}
The name of the current input file, as initially set at class
instantiation time or stacked by later source requests.  It may
be useful to examine this when constructing error messages.
\end{memberdesc}

\begin{memberdesc}[shlex]{instream}
The input stream from which this \class{shlex} instance is reading
characters.
\end{memberdesc}

\begin{memberdesc}[shlex]{source}
This member is \code{None} by default.  If you assign a string to it,
that string will be recognized as a lexical-level inclusion request
similar to the \samp{source} keyword in various shells.  That is, the
immediately following token will be opened as a filename and input taken
from that stream until \EOF, at which point the \method{close()}
method of that stream will be called and the input source will again
become the original input stream.  Source requests may be stacked any
number of levels deep.
\end{memberdesc}
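
A sketch of a source request handled by the default
\method{sourcehook()} follows.  It assumes a temporary directory is
writable; the keyword \code{include} and the file name
\code{extra.rc} are arbitrary choices for the example.

```python
import os
import shlex
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    included = os.path.join(tmpdir, 'extra.rc')
    with open(included, 'w') as f:
        f.write('x y\n')

    # Quote the path so that it is read as a single token.
    lexer = shlex.shlex('a include "%s" b' % included)
    lexer.source = 'include'

    # Tokens from the included file appear in place of the request;
    # the lexer closes the file when it reaches its end.
    tokens = list(lexer)

print(tokens)
```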

\begin{memberdesc}[shlex]{debug}
If this member is numeric and \code{1} or more, a \class{shlex}
instance will print verbose progress output on its behavior.  If you
need to use this, you can read the module source code to learn the
details.
\end{memberdesc}

\begin{memberdesc}[shlex]{lineno}
Source line number (count of newlines seen so far plus one).
\end{memberdesc}

\begin{memberdesc}[shlex]{token}
The token buffer.  It may be useful to examine this when catching
exceptions.
\end{memberdesc}

\begin{memberdesc}[shlex]{eof}
Token used to determine end of file.  This will be set to the empty
string (\code{''}) in non-\POSIX{} mode, and to \code{None} in
\POSIX{} mode.
\versionadded{2.3}
\end{memberdesc}
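
Because the end-of-file token differs between modes, a hand-written
read loop is probably best written against \member{eof} rather than a
literal.  The helper \code{read_all()} below is hypothetical and not
part of the module.

```python
import shlex

compat_end = shlex.shlex('').get_token()
posix_end = shlex.shlex('', posix=True).get_token()

def read_all(lexer):
    """Collect tokens until the lexer reports its own eof value."""
    tokens = []
    while True:
        tok = lexer.get_token()
        if tok == lexer.eof:
            return tokens
        tokens.append(tok)

# In POSIX mode a quoted empty string is a real token, distinct from eof.
posix_tokens = read_all(shlex.shlex("a '' b", posix=True))

print(repr(compat_end), repr(posix_end))
print(posix_tokens)
```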

\subsection{Parsing Rules\label{shlex-parsing-rules}}

When operating in non-\POSIX{} mode, \class{shlex} tries to obey
the following rules:

\begin{itemize}
  \item Quote characters are not recognized within words
        (\code{Do"Not"Separate} is parsed as the single word
        \code{Do"Not"Separate});
  \item Escape characters are not recognized;
  \item Enclosing characters in quotes preserves the literal value of
        all characters within the quotes;
  \item Closing quotes separate words (\code{"Do"Separate} is parsed
        as \code{"Do"} and \code{Separate});
  \item If \member{whitespace_split} is \code{False}, any character not
        declared to be a word character, whitespace, or a quote will be
        returned as a single-character token.  If it is \code{True},
        \class{shlex} will only split words on whitespace;
  \item EOF is signaled with an empty string (\code{''});
  \item It's not possible to parse empty strings, even if quoted.
\end{itemize}
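
Several of the non-\POSIX{} rules above can be checked with a small
sketch.  The helper \code{tokens()} is hypothetical; it builds a
compatibility-mode lexer and optionally sets members on it.

```python
import shlex

def tokens(text, **members):
    """Tokenize text in non-POSIX mode, setting any given members."""
    lexer = shlex.shlex(text)   # posix is false unless requested
    for name, value in members.items():
        setattr(lexer, name, value)
    return list(lexer)

inside_word = tokens('Do"Not"Separate')   # quotes inside a word are kept
closing = tokens('"Do"Separate')          # a closing quote ends the word
punct = tokens('a+b')                     # '+' is a one-character token
ws_only = tokens('a+b c', whitespace_split=True)
no_escape = tokens(r'a\b')                # backslash is not an escape

for result in (inside_word, closing, punct, ws_only, no_escape):
    print(result)
```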

When operating in \POSIX{} mode, \class{shlex} tries to obey the
following parsing rules:

\begin{itemize}
  \item Quotes are stripped out, and do not separate words
        (\code{"Do"Not"Separate"} is parsed as the single word
        \code{DoNotSeparate});
  \item Non-quoted escape characters (e.g. \character{\textbackslash})
        preserve the literal value of the next character that follows;
  \item Enclosing characters in quotes which are not part of
        \member{escapedquotes} (e.g. \character{'}) preserves the literal
        value of all characters within the quotes;
  \item Enclosing characters in quotes which are part of
        \member{escapedquotes} (e.g. \character{"}) preserves the literal
        value of all characters within the quotes, with the exception of
        the characters mentioned in \member{escape}.  The escape characters
        retain their special meaning only when followed by the quote in use,
        or the escape character itself.  Otherwise the escape character
        is considered a normal character;
  \item EOF is signaled with a \constant{None} value;
  \item Quoted empty strings (\code{''}) are allowed.
\end{itemize}

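
The \POSIX{} rules above can be exercised through
\function{split()}, which uses \POSIX{} mode by default.  The input
strings are made up for the example.

```python
import shlex

joined = shlex.split('"Do"Not"Separate"')   # quotes stripped, one word
escaped_space = shlex.split(r'a\ b')        # backslash protects the space
single = shlex.split(r"'a\b'")              # no escapes inside '...'
double = shlex.split(r'"a\"b" "c\d"')       # only \" and \\ are special
empty = shlex.split("a '' b")               # quoted empty string survives

for result in (joined, escaped_space, single, double, empty):
    print(result)
```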