\section{\module{csv} --- CSV File Reading and Writing}

\declaremodule{standard}{csv}
\modulesynopsis{Write and read tabular data to and from delimited files.}
\sectionauthor{Skip Montanaro}{skip@pobox.com}

\versionadded{2.3}
\index{csv}
\indexii{data}{tabular}

The so-called CSV (Comma Separated Values) format is the most common import
and export format for spreadsheets and databases. There is no ``CSV
standard'', so the format is operationally defined by the many applications
which read and write it. The lack of a standard means that subtle
differences often exist in the data produced and consumed by different
applications. These differences can make it annoying to process CSV files
from multiple sources. Still, while the delimiters and quoting characters
vary, the overall format is similar enough that it is possible to write a
single module which can efficiently manipulate such data, hiding the details
of reading and writing the data from the programmer.

The \module{csv} module implements classes to read and write tabular data in
CSV format. It allows programmers to say, ``write this data in the format
preferred by Excel,'' or ``read data from this file which was generated by
Excel,'' without knowing the precise details of the CSV format used by
Excel. Programmers can also describe the CSV formats understood by other
applications or define their own special-purpose CSV formats.

The \module{csv} module's \class{reader} and \class{writer} objects read and
write sequences. Programmers can also read and write data in dictionary
form using the \class{DictReader} and \class{DictWriter} classes.

\begin{notice}
  This version of the \module{csv} module doesn't support Unicode
  input. Also, there are currently some issues regarding \ASCII{} NUL
  characters. Accordingly, all input should be UTF-8 or printable
  \ASCII{} to be safe; see the examples in section~\ref{csv-examples}.
  These restrictions will be removed in the future.
\end{notice}

\begin{seealso}
%  \seemodule{array}{Arrays of uniformly typed numeric values.}
  \seepep{305}{CSV File API}
         {The Python Enhancement Proposal which proposed this addition
          to Python.}
\end{seealso}


\subsection{Module Contents \label{csv-contents}}

The \module{csv} module defines the following functions:

\begin{funcdesc}{reader}{csvfile\optional{,
                         dialect=\code{'excel'}}\optional{, fmtparam}}
Return a reader object which will iterate over lines in the given
{}\var{csvfile}. \var{csvfile} can be any object which supports the
iterator protocol and returns a string each time its \method{next}
method is called; file objects and list objects are both suitable.
If \var{csvfile} is a file object, it must be opened with
the 'b' flag on platforms where that makes a difference. An optional
{}\var{dialect} parameter can be given
which is used to define a set of parameters specific to a particular CSV
dialect. It may be an instance of a subclass of the \class{Dialect}
class or one of the strings returned by the \function{list_dialects}
function. The other optional {}\var{fmtparam} keyword arguments can be
given to override individual formatting parameters in the current
dialect. For full details about the dialect and formatting
parameters, see section~\ref{csv-fmt-params}, ``Dialects and Formatting
Parameters''.

All data read are returned as strings. No automatic data type
conversion is performed.

\versionchanged[
The parser is now stricter with respect to multi-line quoted
fields. Previously, if a line ended within a quoted field without a
terminating newline character, a newline would be inserted into the
returned field. This behavior caused problems when reading files
which contained carriage return characters within fields. The
behavior was changed to return the field without inserting newlines. As
a consequence, if newlines embedded within fields are important, the
input should be split into lines in a manner which preserves the newline
characters]{2.5}

\end{funcdesc}
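As a minimal sketch of the call described above, the following passes a
plain list of strings as \var{csvfile} and overrides a single formatting
parameter by keyword. The sample data and variable names are invented for
illustration, and the snippet assumes an interpreter in which ordinary
string literals can be handed to the parser directly.

```python
import csv

# Any iterable that yields one string per line can serve as csvfile;
# a plain list stands in for an open file here (sample data is made up).
lines = ['name|city\n', 'Ada|London\n', 'Linus|"Helsinki|FI"\n']

# Start from the default 'excel' dialect and override only the delimiter.
rows = list(csv.reader(lines, dialect='excel', delimiter='|'))

# Every field comes back as a string; no type conversion is performed.
# rows == [['name', 'city'], ['Ada', 'London'], ['Linus', 'Helsinki|FI']]
```

The quoted field keeps its embedded delimiter because quoting is still
governed by the \code{'excel'} dialect; only \var{delimiter} was replaced.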

\begin{funcdesc}{writer}{csvfile\optional{,
                         dialect=\code{'excel'}}\optional{, fmtparam}}
Return a writer object responsible for converting the user's data into
delimited strings on the given file-like object. \var{csvfile} can be any
object with a \function{write} method. If \var{csvfile} is a file object,
it must be opened with the 'b' flag on platforms where that makes a
difference. An optional
{}\var{dialect} parameter can be given which is used to define a set of
parameters specific to a particular CSV dialect. It may be an instance
of a subclass of the \class{Dialect} class or one of the strings
returned by the \function{list_dialects} function. The other optional
{}\var{fmtparam} keyword arguments can be given to override individual
formatting parameters in the current dialect. For full details
about the dialect and formatting parameters, see
section~\ref{csv-fmt-params}, ``Dialects and Formatting Parameters''.
To make it as easy as possible to
interface with modules which implement the DB API, the value
\constant{None} is written as the empty string. While this isn't a
reversible transformation, it makes it easier to dump SQL NULL data values
to CSV files without preprocessing the data returned from a
\code{cursor.fetch*()} call. All other non-string data are stringified
with \function{str()} before being written.
\end{funcdesc}
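The next sketch illustrates two points from the description above: any
object with a \method{write} method is acceptable as \var{csvfile}, and
\constant{None} is written as an empty field. The \class{ListSink} class is
a hypothetical helper written for this example; it is not part of the
\module{csv} module.

```python
import csv

class ListSink(object):
    """Hypothetical stand-in for a file: it only provides write()."""
    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

sink = ListSink()
writer = csv.writer(sink)  # default 'excel' dialect

# None becomes an empty field; other non-string values go through str().
writer.writerow(['widget', None, 3, 2.5])

output = ''.join(sink.chunks)
# output == 'widget,,3,2.5\r\n'
```

Reading this output back would yield an empty string rather than
\constant{None}, which is why the transformation is described as not
reversible.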

\begin{funcdesc}{register_dialect}{name\optional{, dialect}\optional{, fmtparam}}
Associate \var{dialect} with \var{name}. \var{name} must be a string
or Unicode object. The dialect can be specified either by passing a
subclass of \class{Dialect}, or by \var{fmtparam} keyword arguments,
or both, with keyword arguments overriding parameters of the dialect.
For full details about the dialect and formatting parameters, see
section~\ref{csv-fmt-params}, ``Dialects and Formatting Parameters''.
\end{funcdesc}

\begin{funcdesc}{unregister_dialect}{name}
Delete the dialect associated with \var{name} from the dialect registry. An
\exception{Error} is raised if \var{name} is not a registered dialect
name.
\end{funcdesc}

\begin{funcdesc}{get_dialect}{name}
Return the dialect associated with \var{name}. An \exception{Error} is
raised if \var{name} is not a registered dialect name.
\end{funcdesc}

\begin{funcdesc}{list_dialects}{}
Return the names of all registered dialects.
\end{funcdesc}
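The four registry functions can be exercised together. The following is a
small sketch rather than a recommended pattern; the dialect name
\code{'pipes'} is made up for the example.

```python
import csv

# 'pipes' is a made-up dialect name used only for this sketch.
csv.register_dialect('pipes', delimiter='|', quoting=csv.QUOTE_NONE)

registered = 'pipes' in csv.list_dialects()
delimiter = csv.get_dialect('pipes').delimiter  # '|'

# A registered name can be passed wherever a dialect is expected.
rows = list(csv.reader(['a|b|c'], 'pipes'))  # [['a', 'b', 'c']]

# Remove the dialect again; a second removal raises csv.Error
# because the name is no longer registered.
csv.unregister_dialect('pipes')
try:
    csv.unregister_dialect('pipes')
    second_removal_failed = False
except csv.Error:
    second_removal_failed = True
```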

\begin{funcdesc}{field_size_limit}{\optional{new_limit}}
Return the current maximum field size allowed by the parser. If
\var{new_limit} is given, this becomes the new limit.
\versionadded{2.5}
\end{funcdesc}
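A short sketch of how the limit might be queried, lowered and restored. The
limit of 10 characters is deliberately tiny and chosen only so that the
sample input exceeds it.

```python
import csv

old_limit = csv.field_size_limit()   # query only, nothing changes
csv.field_size_limit(10)             # deliberately tiny limit for the demo

try:
    # The second field is 50 characters long and exceeds the limit.
    list(csv.reader(['short,' + 'x' * 50]))
    too_long_rejected = False
except csv.Error:
    too_long_rejected = True

csv.field_size_limit(old_limit)      # restore the original setting
```

Because the limit is a module-wide setting, restoring the previous value
after a temporary change avoids surprising other users of the parser.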


The \module{csv} module defines the following classes:

\begin{classdesc}{DictReader}{csvfile\optional{,
			      fieldnames=\constant{None}\optional{,
                              restkey=\constant{None}\optional{,
			      restval=\constant{None}\optional{,
                              dialect=\code{'excel'}\optional{,
			      *args, **kwds}}}}}}
Create an object which operates like a regular reader but maps the
information read into a dict whose keys are given by the optional
{}\var{fieldnames} parameter. If the \var{fieldnames} parameter is omitted,
the values in the first row of the \var{csvfile} will be used as the
fieldnames. If the row read has more fields than the fieldnames sequence,
the remaining data is added as a sequence keyed by the value of
\var{restkey}. If the row read has fewer fields than the fieldnames
sequence, the remaining keys take the value of the optional \var{restval}
parameter. Any other optional or keyword arguments are passed to the
underlying \class{reader} instance.
\end{classdesc}
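The handling of long and short rows can be sketched as follows. The sample
rows and the \code{'overflow'} and \code{'n/a'} values are invented for the
example; the field names are taken from the first row because
\var{fieldnames} is omitted.

```python
import csv

lines = [
    'id,name\n',             # first row supplies the field names
    '1,Ada,extra1,extra2\n', # more fields than names
    '2\n',                   # fewer fields than names
]

reader = csv.DictReader(lines, restkey='overflow', restval='n/a')
records = [dict(row) for row in reader]

# records[0] == {'id': '1', 'name': 'Ada', 'overflow': ['extra1', 'extra2']}
# records[1] == {'id': '2', 'name': 'n/a'}
```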


\begin{classdesc}{DictWriter}{csvfile, fieldnames\optional{,
                              restval=""\optional{,
                              extrasaction=\code{'raise'}\optional{,
                              dialect=\code{'excel'}\optional{,
			      *args, **kwds}}}}}
Create an object which operates like a regular writer but maps dictionaries
onto output rows. The \var{fieldnames} parameter identifies the order in
which values in the dictionary passed to the \method{writerow()} method are
written to the \var{csvfile}. The optional \var{restval} parameter
specifies the value to be written if the dictionary is missing a key in
\var{fieldnames}. If the dictionary passed to the \method{writerow()}
method contains a key not found in \var{fieldnames}, the optional
\var{extrasaction} parameter indicates what action to take. If it is set
to \code{'raise'} a \exception{ValueError} is raised. If it is set to
\code{'ignore'}, extra values in the dictionary are ignored. Any other
optional or keyword arguments are passed to the underlying \class{writer}
instance.

Note that unlike the \class{DictReader} class, the \var{fieldnames}
parameter of the \class{DictWriter} is not optional. Since Python's
\class{dict} objects are not ordered, there is not enough information
available to deduce the order in which the row should be written to the
\var{csvfile}.

\end{classdesc}
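The roles of \var{restval} and \var{extrasaction} can be sketched like
this. \class{ListSink} is a hypothetical helper that merely collects
whatever is written to it; the field names and values are made up.

```python
import csv

class ListSink(object):
    """Hypothetical stand-in for a file: it only provides write()."""
    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

sink = ListSink()
writer = csv.DictWriter(sink, fieldnames=['id', 'name'],
                        restval='?', extrasaction='ignore')
writer.writerow({'id': 1, 'name': 'Ada', 'unused': 'dropped'})
writer.writerow({'id': 2})  # 'name' is missing, so restval is written
output = ''.join(sink.chunks)
# output == '1,Ada\r\n2,?\r\n'

# With the default extrasaction='raise', an unknown key is an error.
strict = csv.DictWriter(ListSink(), fieldnames=['id'])
try:
    strict.writerow({'id': 3, 'oops': 'x'})
    extra_key_rejected = False
except ValueError:
    extra_key_rejected = True
```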

\begin{classdesc*}{Dialect}{}
The \class{Dialect} class is a container class relied on primarily for its
attributes, which are used to define the parameters for a specific
\class{reader} or \class{writer} instance.
\end{classdesc*}

\begin{classdesc}{excel}{}
The \class{excel} class defines the usual properties of an Excel-generated
CSV file. It is registered with the dialect name \code{'excel'}.
\end{classdesc}

\begin{classdesc}{excel_tab}{}
The \class{excel_tab} class defines the usual properties of an
Excel-generated TAB-delimited file. It is registered with the dialect name
\code{'excel-tab'}.
\end{classdesc}

\begin{classdesc}{Sniffer}{}
The \class{Sniffer} class is used to deduce the format of a CSV file.
\end{classdesc}

The \class{Sniffer} class provides two methods:

\begin{methoddesc}{sniff}{sample\optional{,delimiters=None}}
Analyze the given \var{sample} and return a \class{Dialect} subclass
reflecting the parameters found. If the optional \var{delimiters} parameter
is given, it is interpreted as a string containing possible valid delimiter
characters.
\end{methoddesc}

\begin{methoddesc}{has_header}{sample}
Analyze the sample text (presumed to be in CSV format) and return
\constant{True} if the first row appears to be a series of column
headers.
\end{methoddesc}
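A possible way to combine the two \class{Sniffer} methods is sketched
below. The sample text is invented, and since both methods rely on
heuristics, their results on real data should be treated as a guess rather
than a guarantee.

```python
import csv

# Made-up sample: semicolon-delimited, with a header row.
sample = 'name;age\nalice;30\nbob;25\ncarol;41\n'

sniffer = csv.Sniffer()

# Restrict the candidates to ';' and ',' to help the heuristic.
dialect = sniffer.sniff(sample, delimiters=';,')

# The returned Dialect subclass can be passed straight to reader();
# splitlines(True) keeps the newline at the end of each line.
rows = list(csv.reader(sample.splitlines(True), dialect))

# Heuristic guess as to whether the first row holds column names.
looks_like_header = sniffer.has_header(sample)
```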


The \module{csv} module defines the following constants:

\begin{datadesc}{QUOTE_ALL}
Instructs \class{writer} objects to quote all fields.
\end{datadesc}

\begin{datadesc}{QUOTE_MINIMAL}
Instructs \class{writer} objects to only quote those fields which contain
special characters such as \var{delimiter}, \var{quotechar} or any of the
characters in \var{lineterminator}.
\end{datadesc}

\begin{datadesc}{QUOTE_NONNUMERIC}
Instructs \class{writer} objects to quote all non-numeric
fields.

Instructs the reader to convert all non-quoted fields to type \var{float}.
\end{datadesc}

\begin{datadesc}{QUOTE_NONE}
Instructs \class{writer} objects to never quote fields. When the current
\var{delimiter} occurs in output data it is preceded by the current
\var{escapechar} character. If \var{escapechar} is not set, the writer
will raise \exception{Error} if any characters that require escaping
are encountered.

Instructs \class{reader} to perform no special processing of quote characters.
\end{datadesc}
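The effect of the four constants on one and the same row can be compared
side by side. In this sketch \class{ListSink} and \function{render} are
hypothetical helpers defined only for the example.

```python
import csv

class ListSink(object):
    """Hypothetical stand-in for a file: it only provides write()."""
    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

def render(row, **fmtparams):
    """Hypothetical helper: write one row and return the resulting text."""
    sink = ListSink()
    csv.writer(sink, **fmtparams).writerow(row)
    return ''.join(sink.chunks)

row = ['a,b', 1, 'c']
minimal = render(row, quoting=csv.QUOTE_MINIMAL)        # '"a,b",1,c\r\n'
everything = render(row, quoting=csv.QUOTE_ALL)         # '"a,b","1","c"\r\n'
nonnumeric = render(row, quoting=csv.QUOTE_NONNUMERIC)  # '"a,b",1,"c"\r\n'
escaped = render(row, quoting=csv.QUOTE_NONE,
                 escapechar='\\')                       # 'a\\,b,1,c\r\n'

# Without an escapechar, QUOTE_NONE cannot represent the embedded comma.
try:
    render(row, quoting=csv.QUOTE_NONE)
    unescaped_failed = False
except csv.Error:
    unescaped_failed = True

# On input, QUOTE_NONNUMERIC turns unquoted fields into floats.
parsed = list(csv.reader(['"x",1.5'], quoting=csv.QUOTE_NONNUMERIC))
# parsed == [['x', 1.5]]
```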


The \module{csv} module defines the following exception:

\begin{excdesc}{Error}
Raised by any of the functions when an error is detected.
\end{excdesc}


\subsection{Dialects and Formatting Parameters\label{csv-fmt-params}}

To make it easier to specify the format of input and output records,
specific formatting parameters are grouped together into dialects. A
dialect is a subclass of the \class{Dialect} class having a set of specific
attributes and a single \method{validate()} method. When creating \class{reader}
or \class{writer} objects, the programmer can specify a string or a subclass
of the \class{Dialect} class as the dialect parameter. In addition to, or
instead of, the \var{dialect} parameter, the programmer can also specify
individual formatting parameters, which have the same names as the
attributes defined below for the \class{Dialect} class.

Dialects support the following attributes:

\begin{memberdesc}[Dialect]{delimiter}
A one-character string used to separate fields. It defaults to \code{','}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{doublequote}
Controls how instances of \var{quotechar} appearing inside a field should
themselves be quoted. When \constant{True}, the character is doubled.
When \constant{False}, the \var{escapechar} is used as a prefix to the
\var{quotechar}. It defaults to \constant{True}.

On output, if \var{doublequote} is \constant{False} and no
\var{escapechar} is set, \exception{Error} is raised if a \var{quotechar}
is found in a field.
\end{memberdesc}

\begin{memberdesc}[Dialect]{escapechar}
A one-character string used by the writer to escape the \var{delimiter} if
\var{quoting} is set to \constant{QUOTE_NONE} and the \var{quotechar}
if \var{doublequote} is \constant{False}. On reading, the \var{escapechar}
removes any special meaning from the following character. It defaults
to \constant{None}, which disables escaping.
\end{memberdesc}
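The interplay of \var{doublequote} and \var{escapechar} might be
illustrated as follows. \class{ListSink} and \function{render} are
hypothetical helpers for this sketch; the backslash is an arbitrary choice
of escape character.

```python
import csv

class ListSink(object):
    """Hypothetical stand-in for a file: it only provides write()."""
    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

def render(row, **fmtparams):
    """Hypothetical helper: write one row and return the resulting text."""
    sink = ListSink()
    csv.writer(sink, **fmtparams).writerow(row)
    return ''.join(sink.chunks)

row = ['say "hi"']

# Default: embedded quote characters are doubled inside a quoted field.
doubled = render(row)  # '"say ""hi"""\r\n'

# Alternative: prefix each embedded quote with the escape character.
escaped = render(row, doublequote=False, escapechar='\\')

# Reading with the same parameters recovers the original field.
recovered = list(csv.reader([escaped], doublequote=False, escapechar='\\'))

# Neither doubling nor escaping available: the writer gives up.
try:
    render(row, doublequote=False)
    unescaped_failed = False
except csv.Error:
    unescaped_failed = True
```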

\begin{memberdesc}[Dialect]{lineterminator}
The string used to terminate lines produced by the \class{writer}.
It defaults to \code{'\e r\e n'}.

\note{The \class{reader} is hard-coded to recognise either \code{'\e r'}
or \code{'\e n'} as end-of-line, and ignores \var{lineterminator}. This
behavior may change in the future.}
\end{memberdesc}
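A brief sketch of the asymmetry noted above: the writer honours
\var{lineterminator}, while the reader accepts either line ending
regardless of the setting. \class{ListSink} is a hypothetical helper used
only to capture the output.

```python
import csv

class ListSink(object):
    """Hypothetical stand-in for a file: it only provides write()."""
    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

# The writer ends each row with the configured terminator.
sink = ListSink()
csv.writer(sink, lineterminator='\n').writerow(['a', 'b'])
unix_style = ''.join(sink.chunks)  # 'a,b\n'

# The reader ignores lineterminator and accepts '\r\n' and '\r' alike.
rows = list(csv.reader(['a,b\r\n', 'c,d\r'], lineterminator='\n'))
# rows == [['a', 'b'], ['c', 'd']]
```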

\begin{memberdesc}[Dialect]{quotechar}
A one-character string used to quote fields containing special characters,
such as the \var{delimiter} or \var{quotechar}, or which contain new-line
characters. It defaults to \code{'"'}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{quoting}
Controls when quotes should be generated by the writer and recognised
by the reader. It can take on any of the \constant{QUOTE_*} constants
(see section~\ref{csv-contents}) and defaults to \constant{QUOTE_MINIMAL}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{skipinitialspace}
When \constant{True}, whitespace immediately following the \var{delimiter}
is ignored. The default is \constant{False}.
\end{memberdesc}
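The effect of \var{skipinitialspace} on input can be seen in a two-line
comparison; the sample line is made up.

```python
import csv

line = ['a, b,  c']

# Default: spaces after the delimiter are part of the field.
kept = list(csv.reader(line))
# kept == [['a', ' b', '  c']]

# With skipinitialspace, leading spaces after each delimiter are dropped.
skipped = list(csv.reader(line, skipinitialspace=True))
# skipped == [['a', 'b', 'c']]
```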


\subsection{Reader Objects}

Reader objects (\class{DictReader} instances and objects returned by
the \function{reader()} function) have the following public methods:

\begin{methoddesc}[csv reader]{next}{}
Return the next row of the reader's iterable object as a list, parsed
according to the current dialect.
\end{methoddesc}

Reader objects have the following public attributes:

\begin{memberdesc}[csv reader]{dialect}
A read-only description of the dialect in use by the parser.
\end{memberdesc}

\begin{memberdesc}[csv reader]{line_num}
 The number of lines read from the source iterator. This is not the same
 as the number of records returned, as records can span multiple lines.
 \versionadded{2.5}
\end{memberdesc}
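The difference between \member{line_num} and the number of records shows
up as soon as a quoted field spans more than one line. The input below is
invented for this sketch.

```python
import csv

# Four physical lines, but the quoted field joins lines two and three.
lines = ['id,note\n', '1,"spans\n', 'two lines"\n', '2,short\n']

reader = csv.reader(lines)
records = list(reader)

# records == [['id', 'note'], ['1', 'spans\ntwo lines'], ['2', 'short']]
record_count = len(records)    # 3
lines_consumed = reader.line_num  # 4
```

This is why \member{line_num}, rather than a running record counter, is
the better value to show in an error message that refers to the source
file.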


\subsection{Writer Objects}

\class{Writer} objects (\class{DictWriter} instances and objects returned by
the \function{writer()} function) have the following public methods. A
{}\var{row} must be a sequence of strings or numbers for \class{Writer}
objects and a dictionary mapping fieldnames to strings or numbers for
{}\class{DictWriter} objects; values that are not strings are passed
through \function{str()} before being written. Note that complex numbers
are written out surrounded by parentheses.
This may cause some problems for other programs which read CSV files
(assuming they support complex numbers at all).

\begin{methoddesc}[csv writer]{writerow}{row}
Write the \var{row} parameter to the writer's file object, formatted
according to the current dialect.
\end{methoddesc}

\begin{methoddesc}[csv writer]{writerows}{rows}
Write every row in the \var{rows} parameter (a list of \var{row} objects as
described above) to the writer's file object, formatted
according to the current dialect.
\end{methoddesc}

Writer objects have the following public attribute:

\begin{memberdesc}[csv writer]{dialect}
A read-only description of the dialect in use by the writer.
\end{memberdesc}
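Both methods, and the treatment of complex numbers mentioned above, can be
sketched in a few lines. \class{ListSink} is again a hypothetical helper
that only collects the text written to it.

```python
import csv

class ListSink(object):
    """Hypothetical stand-in for a file: it only provides write()."""
    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

sink = ListSink()
writer = csv.writer(sink)

writer.writerow(['x', 'y'])                 # a single row
writer.writerows([[1, 2.5], ['z', 1 + 2j]])  # several rows at once

output = ''.join(sink.chunks)
# The complex number is written as its str() form, parentheses included:
# output == 'x,y\r\n1,2.5\r\nz,(1+2j)\r\n'
```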



\subsection{Examples\label{csv-examples}}

The simplest example of reading a CSV file:

\begin{verbatim}
import csv
reader = csv.reader(open("some.csv", "rb"))
for row in reader:
    print row
\end{verbatim}

Reading a file with an alternate format:

\begin{verbatim}
import csv
reader = csv.reader(open("passwd", "rb"), delimiter=':', quoting=csv.QUOTE_NONE)
for row in reader:
    print row
\end{verbatim}

The corresponding simplest possible writing example is:

\begin{verbatim}
import csv
writer = csv.writer(open("some.csv", "wb"))
writer.writerows(someiterable)
\end{verbatim}

Registering a new dialect:

\begin{verbatim}
import csv

csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)

reader = csv.reader(open("passwd", "rb"), 'unixpwd')
\end{verbatim}

A slightly more advanced use of the reader, catching and reporting errors:

\begin{verbatim}
import csv, sys
filename = "some.csv"
reader = csv.reader(open(filename, "rb"))
try:
    for row in reader:
        print row
except csv.Error as e:
    sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))
\end{verbatim}

And while the module doesn't directly support parsing strings, it can
easily be done:

\begin{verbatim}
import csv
for row in csv.reader(['one,two,three']):
    print row
\end{verbatim}
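When the string to be parsed contains several lines, it has to be split
before it is handed to the reader. The sketch below uses
\method{splitlines()} with its keep-ends argument so that newlines inside
quoted fields are preserved, in line with the note in the description of
\function{reader()}. The sample text is made up.

```python
import csv

text = 'id,note\n1,"first\nsecond"\n'

# splitlines(True) keeps the line endings, so the newline embedded in
# the quoted field is still present when the parser sees it.
rows = list(csv.reader(text.splitlines(True)))
# rows == [['id', 'note'], ['1', 'first\nsecond']]
```

Splitting without the keep-ends argument would still parse, but the
newline inside the quoted field would no longer appear in the result.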
| 441 | |
Skip Montanaro | 5011c3f | 2005-03-18 16:56:37 +0000 | [diff] [blame] | 442 | The \module{csv} module doesn't directly support reading and writing |
Thomas Wouters | 49fd7fa | 2006-04-21 10:40:58 +0000 | [diff] [blame] | 443 | Unicode, but it is 8-bit-clean save for some problems with \ASCII{} NUL |
| 444 | characters. So you can write functions or classes that handle the |
| 445 | encoding and decoding for you as long as you avoid encodings like |
| 446 | UTF-16 that use NULs. UTF-8 is recommended. |

\function{unicode_csv_reader} below is a generator that wraps
\class{csv.reader} to handle Unicode CSV data (a list of Unicode
strings). \function{utf_8_encoder} is a generator that encodes the
Unicode strings as UTF-8, one string (or row) at a time. The encoded
strings are parsed by the CSV reader, and
\function{unicode_csv_reader} decodes the UTF-8-encoded cells back
into Unicode:

\begin{verbatim}
import csv

def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
    # csv.py doesn't do Unicode; encode temporarily as UTF-8:
    csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),
                            dialect=dialect, **kwargs)
    for row in csv_reader:
        # decode UTF-8 back to Unicode, cell by cell:
        yield [unicode(cell, 'utf-8') for cell in row]

def utf_8_encoder(unicode_csv_data):
    for line in unicode_csv_data:
        yield line.encode('utf-8')
\end{verbatim}

For all other encodings the following \class{UnicodeReader} and
\class{UnicodeWriter} classes can be used. They take an additional
\var{encoding} parameter in their constructor and make sure that the data
passes through the real reader or writer encoded as UTF-8:

\begin{verbatim}
import csv, codecs, cStringIO

class UTF8Recoder:
    """
    Iterator that reads an encoded stream and reencodes the input to UTF-8
    """
    def __init__(self, f, encoding):
        self.reader = codecs.getreader(encoding)(f)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self.reader).encode("utf-8")

    next = __next__  # alias: the Python 2 iterator protocol calls next()

class UnicodeReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)

    def __next__(self):
        row = next(self.reader)
        return [unicode(s, "utf-8") for s in row]

    next = __next__  # alias: the Python 2 iterator protocol calls next()

    def __iter__(self):
        return self

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)
\end{verbatim}
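
The two \module{codecs} helpers used by these classes can be tried on their
own. The sketch below is an illustration under stated assumptions, not part
of the classes above: the sample bytes, the Latin-1 source encoding and the
UTF-16 target encoding are all chosen for the example. The first half
recodes a byte stream to UTF-8 line by line, as \class{UTF8Recoder} does.
The second half shows one observable effect of reusing a single incremental
encoder across calls, namely that the UTF-16 byte order mark is emitted only
once.

```python
import codecs
import io

# Hypothetical Latin-1 encoded CSV data, held in memory.
raw = u'caf\u00e9,1\nna\u00efve,2\n'.encode('latin-1')

# Decode the stream with a codecs reader, then re-encode each line as UTF-8.
stream = codecs.getreader('latin-1')(io.BytesIO(raw))
utf8_lines = [line.encode('utf-8') for line in stream]
print(utf8_lines)

# One incremental encoder keeps its state between calls, so the byte order
# mark appears at the start of the first chunk only.
encoder = codecs.getincrementalencoder('utf-16')()
first = encoder.encode(u'a,b\r\n')
second = encoder.encode(u'c,d\r\n')
print(len(first), len(second))
```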