\section{\module{csv} --- CSV File Reading and Writing}

\declaremodule{standard}{csv}
\modulesynopsis{Write and read tabular data to and from delimited files.}
\sectionauthor{Skip Montanaro}{skip@pobox.com}

\versionadded{2.3}
\index{csv}
\indexii{data}{tabular}

The so-called CSV (Comma Separated Values) format is the most common import
and export format for spreadsheets and databases.  There is no ``CSV
standard'', so the format is operationally defined by the many applications
which read and write it.  The lack of a standard means that subtle
differences often exist in the data produced and consumed by different
applications.  These differences can make it annoying to process CSV files
from multiple sources.  Still, while the delimiters and quoting characters
vary, the overall format is similar enough that it is possible to write a
single module which can efficiently manipulate such data, hiding the details
of reading and writing the data from the programmer.

The \module{csv} module implements classes to read and write tabular data in
CSV format.  It allows programmers to say, ``write this data in the format
preferred by Excel,'' or ``read data from this file which was generated by
Excel,'' without knowing the precise details of the CSV format used by
Excel.  Programmers can also describe the CSV formats understood by other
applications or define their own special-purpose CSV formats.

The \module{csv} module's \class{reader} and \class{writer} objects read and
write sequences.  Programmers can also read and write data in dictionary
form using the \class{DictReader} and \class{DictWriter} classes.

\begin{notice}
  This version of the \module{csv} module doesn't support Unicode
  input.  Also, there are currently some issues regarding \ASCII{} NUL
  characters.  Accordingly, all input should generally be printable
  \ASCII{} to be safe.  These restrictions will be removed in the future.
\end{notice}

\begin{seealso}
%  \seemodule{array}{Arrays of uniformly typed numeric values.}
  \seepep{305}{CSV File API}
         {The Python Enhancement Proposal which proposed this addition
          to Python.}
\end{seealso}


\subsection{Module Contents \label{csv-contents}}

The \module{csv} module defines the following functions:

\begin{funcdesc}{reader}{csvfile\optional{,
                         dialect=\code{'excel'}}\optional{, fmtparam}}
Return a reader object which will iterate over lines in the given
{}\var{csvfile}.  \var{csvfile} can be any object which supports the
iterator protocol and returns a string each time its \method{next}
method is called; file objects and list objects are both suitable.
If \var{csvfile} is a file object, it must be opened with
the 'b' flag on platforms where that makes a difference.  An optional
{}\var{dialect} parameter can be given
which is used to define a set of parameters specific to a particular CSV
dialect.  It may be an instance of a subclass of the \class{Dialect}
class or one of the strings returned by the \function{list_dialects}
function.  The other optional {}\var{fmtparam} keyword arguments can be
given to override individual formatting parameters in the current
dialect.  For details of the dialect and formatting parameters, see
section~\ref{csv-fmt-params}, ``Dialects and Formatting Parameters''.

All data read are returned as strings.  No automatic data type
conversion is performed.
\end{funcdesc}
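
As a minimal sketch of the behaviour described above, the following passes
a plain list of strings (one of the suitable iterables) to
\function{reader()}.  The sample data is made up for illustration; note that
the numeric-looking fields come back as strings.

```python
import csv

# Hypothetical sample data: any iterable that yields one string per line works.
lines = ['name,qty', 'apple,3', '"pear, ripe",12']

rows = list(csv.reader(lines))

# Every field is returned as a string; no type conversion is performed.
for row in rows:
    print(row)
```

The quoted field keeps its embedded comma because the default
\code{'excel'} dialect treats the double quote as the quote character.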

\begin{funcdesc}{writer}{csvfile\optional{,
                         dialect=\code{'excel'}}\optional{, fmtparam}}
Return a writer object responsible for converting the user's data into
delimited strings on the given file-like object.  \var{csvfile} can be any
object with a \method{write} method.  If \var{csvfile} is a file object,
it must be opened with the 'b' flag on platforms where that makes a
difference.  An optional
{}\var{dialect} parameter can be given which is used to define a set of
parameters specific to a particular CSV dialect.  It may be an instance
of a subclass of the \class{Dialect} class or one of the strings
returned by the \function{list_dialects} function.  The other optional
{}\var{fmtparam} keyword arguments can be given to override individual
formatting parameters in the current dialect.  For details of the dialect
and formatting parameters, see section~\ref{csv-fmt-params}, ``Dialects and
Formatting Parameters''.  To make it as easy as possible to
interface with modules which implement the DB API, the value
\constant{None} is written as the empty string.  While this isn't a
reversible transformation, it makes it easier to dump SQL NULL data values
to CSV files without preprocessing the data returned from a
\code{cursor.fetch*()} call.  All other non-string data are stringified
with \function{str()} before being written.
\end{funcdesc}
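
The treatment of \constant{None} and of non-string values can be seen in a
short sketch.  \class{ListSink} is a hypothetical helper, not part of the
module; it only relies on the statement above that any object with a
\method{write} method is acceptable.

```python
import csv

class ListSink:
    """Hypothetical stand-in for a file: records whatever the writer writes."""
    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

sink = ListSink()
writer = csv.writer(sink)

# None becomes the empty string; the integer and the float go through str().
writer.writerow(['widget', None, 3, 2.5])

output = ''.join(sink.chunks)
print(repr(output))
```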

\begin{funcdesc}{register_dialect}{name\optional{, dialect}\optional{, fmtparam}}
Associate \var{dialect} with \var{name}.  \var{name} must be a string
or Unicode object.  The dialect can be specified by passing a
subclass of \class{Dialect}, by \var{fmtparam} keyword arguments,
or both, with keyword arguments overriding parameters of the dialect.
For details of the dialect and formatting parameters, see
section~\ref{csv-fmt-params}, ``Dialects and Formatting Parameters''.
\end{funcdesc}

\begin{funcdesc}{unregister_dialect}{name}
Delete the dialect associated with \var{name} from the dialect registry.  An
\exception{Error} is raised if \var{name} is not a registered dialect
name.
\end{funcdesc}

\begin{funcdesc}{get_dialect}{name}
Return the dialect associated with \var{name}.  An \exception{Error} is
raised if \var{name} is not a registered dialect name.
\end{funcdesc}

\begin{funcdesc}{list_dialects}{}
Return the names of all registered dialects.
\end{funcdesc}
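
The four registry functions fit together roughly as follows.  This is only
a sketch; the dialect name \code{'pipes'} and the sample line are invented
for the illustration.

```python
import csv

# 'pipes' is a made-up dialect name used only for this illustration.
csv.register_dialect('pipes', delimiter='|')
registered = 'pipes' in csv.list_dialects()

dialect = csv.get_dialect('pipes')
rows = list(csv.reader(['a|b|c'], dialect='pipes'))

csv.unregister_dialect('pipes')

# Looking up a name that is no longer registered raises csv.Error.
try:
    csv.get_dialect('pipes')
    lookup_failed = False
except csv.Error:
    lookup_failed = True
```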

\begin{funcdesc}{field_size_limit}{\optional{new_limit}}
Return the current maximum field size allowed by the parser.  If
\var{new_limit} is given, this becomes the new limit.
\versionadded{2.5}
\end{funcdesc}
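
A possible way to use the limit is sketched below.  The limit of 10 and the
sample line are arbitrary, and the sketch assumes that the parser reports an
over-long field by raising \exception{Error}, as the CPython implementation
does.

```python
import csv

old_limit = csv.field_size_limit()      # no argument: query only
csv.field_size_limit(10)                # lower the limit for the demonstration
try:
    try:
        list(csv.reader(['short,' + 'x' * 50]))
        too_long_rejected = False
    except csv.Error:
        # Assumption: an over-long field is reported as csv.Error.
        too_long_rejected = True
finally:
    csv.field_size_limit(old_limit)     # restore the previous limit
```

Restoring the previous value matters because the limit is module-wide, not
a property of a single reader.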


The \module{csv} module defines the following classes:

\begin{classdesc}{DictReader}{csvfile\optional{,
                              fieldnames=\constant{None}\optional{,
                              restkey=\constant{None}\optional{,
                              restval=\constant{None}\optional{,
                              dialect=\code{'excel'}\optional{,
                              *args, **kwds}}}}}}
Create an object which operates like a regular reader but maps the
information read into a dict whose keys are given by the optional
{}\var{fieldnames} parameter.  If the \var{fieldnames} parameter is omitted,
the values in the first row of the \var{csvfile} will be used as the
fieldnames.  If the row read has more fields than the fieldnames sequence,
the remaining data is added as a sequence keyed by the value of
\var{restkey}.  If the row read has fewer fields than the fieldnames
sequence, the remaining keys take the value of the optional \var{restval}
parameter.  Any other optional or keyword arguments are passed to the
underlying \class{reader} instance.
\end{classdesc}
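
The roles of \var{restkey} and \var{restval} may be easier to see with
rows of uneven length.  In this sketch the data and the values
\code{'extra'} and \code{'n/a'} are invented; the field names come from the
first line because \var{fieldnames} is omitted.

```python
import csv

# Hypothetical input: a header row, one over-long row and one short row.
lines = ['name,qty', 'apple,3,red,large', 'pear']

reader = csv.DictReader(lines, restkey='extra', restval='n/a')
records = [dict(row) for row in reader]

for record in records:
    print(record)
```

The surplus fields of the second line are collected in a list under
\code{'extra'}, while the missing \code{qty} of the third line is filled
with \code{'n/a'}.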


\begin{classdesc}{DictWriter}{csvfile, fieldnames\optional{,
                              restval=""\optional{,
                              extrasaction=\code{'raise'}\optional{,
                              dialect=\code{'excel'}\optional{,
                              *args, **kwds}}}}}
Create an object which operates like a regular writer but maps dictionaries
onto output rows.  The \var{fieldnames} parameter identifies the order in
which values in the dictionary passed to the \method{writerow()} method are
written to the \var{csvfile}.  The optional \var{restval} parameter
specifies the value to be written if the dictionary is missing a key in
\var{fieldnames}.  If the dictionary passed to the \method{writerow()}
method contains a key not found in \var{fieldnames}, the optional
\var{extrasaction} parameter indicates what action to take.  If it is set
to \code{'raise'} a \exception{ValueError} is raised.  If it is set to
\code{'ignore'}, extra values in the dictionary are ignored.  Any other
optional or keyword arguments are passed to the underlying \class{writer}
instance.

Note that unlike the \class{DictReader} class, the \var{fieldnames}
parameter of the \class{DictWriter} is not optional.  Since Python's
\class{dict} objects are not ordered, there is not enough information
available to deduce the order in which the row should be written to the
\var{csvfile}.

\end{classdesc}
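
A sketch of how \var{restval} and \var{extrasaction} interact is given
below.  \class{ListSink} is a hypothetical helper that stands in for a file,
and the dictionaries are made-up data.

```python
import csv

class ListSink:
    """Hypothetical stand-in for a file: records whatever the writer writes."""
    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

sink = ListSink()
writer = csv.DictWriter(sink, fieldnames=['name', 'qty'], restval='?',
                        extrasaction='ignore')
writer.writerow({'name': 'apple', 'qty': 3})
writer.writerow({'name': 'pear'})                   # 'qty' missing: restval is written
writer.writerow({'name': 'fig', 'colour': 'red'})   # 'colour' is dropped
output = ''.join(sink.chunks)

# With the default extrasaction='raise', the unknown key is an error.
strict = csv.DictWriter(ListSink(), fieldnames=['name', 'qty'])
try:
    strict.writerow({'name': 'fig', 'colour': 'red'})
    raised = False
except ValueError:
    raised = True
```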

\begin{classdesc*}{Dialect}{}
The \class{Dialect} class is a container class relied on primarily for its
attributes, which are used to define the parameters for a specific
\class{reader} or \class{writer} instance.
\end{classdesc*}

\begin{classdesc}{excel}{}
The \class{excel} class defines the usual properties of an Excel-generated
CSV file.
\end{classdesc}

\begin{classdesc}{excel_tab}{}
The \class{excel_tab} class defines the usual properties of an
Excel-generated TAB-delimited file.
\end{classdesc}

\begin{classdesc}{Sniffer}{}
The \class{Sniffer} class is used to deduce the format of a CSV file.
\end{classdesc}

The \class{Sniffer} class provides two methods:

\begin{methoddesc}{sniff}{sample\optional{, delimiters=None}}
Analyze the given \var{sample} and return a \class{Dialect} subclass
reflecting the parameters found.  If the optional \var{delimiters} parameter
is given, it is interpreted as a string containing possible valid delimiter
characters.
\end{methoddesc}

\begin{methoddesc}{has_header}{sample}
Analyze the sample text (presumed to be in CSV format) and return
\constant{True} if the first row appears to be a series of column
headers.
\end{methoddesc}
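
One plausible use of \method{sniff()} is to detect the delimiter of a
sample and then hand the resulting dialect to \function{reader()}.  The
sample text below is invented, and the candidate delimiters are restricted
to \code{';'} and \code{','} to keep the guess unambiguous.

```python
import csv

# Hypothetical sample using semicolons as the field separator.
sample = 'name;qty;price\napple;3;0.5\npear;12;0.25\n'

# Only ';' and ',' are offered as candidate delimiters.
dialect = csv.Sniffer().sniff(sample, delimiters=';,')

rows = list(csv.reader(sample.splitlines(), dialect))
print(dialect.delimiter)
```

Since the sniffer works heuristically, results on short or irregular
samples should be checked rather than trusted blindly.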


The \module{csv} module defines the following constants:

\begin{datadesc}{QUOTE_ALL}
Instructs \class{writer} objects to quote all fields.
\end{datadesc}

\begin{datadesc}{QUOTE_MINIMAL}
Instructs \class{writer} objects to only quote those fields which contain
special characters such as \var{delimiter}, \var{quotechar} or any of the
characters in \var{lineterminator}.
\end{datadesc}

\begin{datadesc}{QUOTE_NONNUMERIC}
Instructs \class{writer} objects to quote all non-numeric
fields.

Instructs \class{reader} objects to convert all non-quoted fields to type
\var{float}.
\end{datadesc}

\begin{datadesc}{QUOTE_NONE}
Instructs \class{writer} objects to never quote fields.  When the current
\var{delimiter} occurs in output data it is preceded by the current
\var{escapechar} character.  If \var{escapechar} is not set, the writer
will raise \exception{Error} if any characters that require escaping
are encountered.

Instructs \class{reader} objects to perform no special processing of quote
characters.
\end{datadesc}
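
The effect of the four constants on a writer can be compared side by side.
In this sketch \class{ListSink} and \function{render()} are hypothetical
helpers, and the row is made-up data chosen so that one field contains the
delimiter.

```python
import csv

class ListSink:
    """Hypothetical stand-in for a file: records whatever the writer writes."""
    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

def render(row, **fmtparams):
    """Write one row with the given formatting parameters; return the text."""
    sink = ListSink()
    csv.writer(sink, **fmtparams).writerow(row)
    return ''.join(sink.chunks)

row = ['abc', 42, 'x,y']

quoted_all = render(row, quoting=csv.QUOTE_ALL)
quoted_minimal = render(row, quoting=csv.QUOTE_MINIMAL)
quoted_nonnumeric = render(row, quoting=csv.QUOTE_NONNUMERIC)
# QUOTE_NONE needs an escapechar here because 'x,y' contains the delimiter.
quoted_none = render(row, quoting=csv.QUOTE_NONE, escapechar='\\')

# Without an escapechar the same row cannot be written under QUOTE_NONE.
try:
    render(row, quoting=csv.QUOTE_NONE)
    raised = False
except csv.Error:
    raised = True
```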


The \module{csv} module defines the following exception:

\begin{excdesc}{Error}
Raised by any of the functions when an error is detected.
\end{excdesc}


\subsection{Dialects and Formatting Parameters\label{csv-fmt-params}}

To make it easier to specify the format of input and output records,
specific formatting parameters are grouped together into dialects.  A
dialect is a subclass of the \class{Dialect} class having a set of specific
attributes and a single \method{validate()} method.  When creating
\class{reader} or \class{writer} objects, the programmer can specify a
string or a subclass of the \class{Dialect} class as the dialect parameter.
In addition to, or instead of, the \var{dialect} parameter, the programmer
can also specify individual formatting parameters, which have the same names
as the attributes defined below for the \class{Dialect} class.
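
The three ways of supplying formatting parameters can be sketched as
follows.  \class{PipeDialect} is a hypothetical dialect defined only for
this illustration; it derives from \class{excel} so that all other
attributes keep their Excel values.

```python
import csv

class PipeDialect(csv.excel):
    """Hypothetical dialect: Excel conventions, but '|' as the delimiter."""
    delimiter = '|'

# A Dialect subclass passed as the dialect parameter.
by_class = list(csv.reader(['a|b|c'], dialect=PipeDialect))

# A keyword argument overrides the corresponding attribute of the dialect.
overridden = list(csv.reader(['a;b;c'], dialect=PipeDialect, delimiter=';'))

# Formatting parameters can also be given without naming a dialect at all.
no_dialect = list(csv.reader(['a:b:c'], delimiter=':'))
```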

Dialects support the following attributes:

\begin{memberdesc}[Dialect]{delimiter}
A one-character string used to separate fields.  It defaults to \code{','}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{doublequote}
Controls how instances of \var{quotechar} appearing inside a field should
themselves be quoted.  When \constant{True}, the character is doubled.
When \constant{False}, the \var{escapechar} is used as a prefix to the
\var{quotechar}.  It defaults to \constant{True}.

On output, if \var{doublequote} is \constant{False} and no
\var{escapechar} is set, \exception{Error} is raised if a \var{quotechar}
is found in a field.
\end{memberdesc}
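
The following sketch writes a field containing the quote character under
both settings of \var{doublequote}.  \class{ListSink} and
\function{render()} are hypothetical helpers.  The exact text produced in
the escaped case is deliberately not asserted here; the sketch only checks
that it reads back to the original field.

```python
import csv

class ListSink:
    """Hypothetical stand-in for a file: records whatever the writer writes."""
    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

def render(row, **fmtparams):
    """Write one row with the given formatting parameters; return the text."""
    sink = ListSink()
    csv.writer(sink, **fmtparams).writerow(row)
    return ''.join(sink.chunks)

row = ['a"b']

doubled = render(row)                      # doublequote defaults to True
escaped = render(row, doublequote=False, escapechar='\\')

# Reading the escaped form back with the same parameters recovers the field.
round_trip = list(csv.reader([escaped], doublequote=False, escapechar='\\'))

# doublequote=False without an escapechar cannot represent the quote.
try:
    render(row, doublequote=False)
    raised = False
except csv.Error:
    raised = True
```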

\begin{memberdesc}[Dialect]{escapechar}
A one-character string used by the writer to escape the \var{delimiter} if
\var{quoting} is set to \constant{QUOTE_NONE} and the \var{quotechar}
if \var{doublequote} is \constant{False}.  On reading, the \var{escapechar}
removes any special meaning from the following character.  It defaults
to \constant{None}, which disables escaping.
\end{memberdesc}

\begin{memberdesc}[Dialect]{lineterminator}
The string used to terminate lines produced by the \class{writer}.
It defaults to \code{'\e r\e n'}.

\note{The \class{reader} is hard-coded to recognise either \code{'\e r'}
or \code{'\e n'} as end-of-line, and ignores \var{lineterminator}.  This
behavior may change in the future.}
\end{memberdesc}

\begin{memberdesc}[Dialect]{quotechar}
A one-character string used to quote fields containing special characters,
such as the \var{delimiter} or \var{quotechar}, or which contain new-line
characters.  It defaults to \code{'"'}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{quoting}
Controls when quotes should be generated by the writer and recognised
by the reader.  It can take on any of the \constant{QUOTE_*} constants
(see section~\ref{csv-contents}) and defaults to \constant{QUOTE_MINIMAL}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{skipinitialspace}
When \constant{True}, whitespace immediately following the \var{delimiter}
is ignored.  The default is \constant{False}.
\end{memberdesc}
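
The effect of \var{skipinitialspace} on reading can be illustrated with a
single made-up line in which each delimiter is followed by a space:

```python
import csv

line = ['a, b, c']

# By default the space after each delimiter is part of the field.
default_rows = list(csv.reader(line))

# With skipinitialspace=True that leading whitespace is dropped.
stripped_rows = list(csv.reader(line, skipinitialspace=True))
```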


\subsection{Reader Objects}

Reader objects (\class{DictReader} instances and objects returned by
the \function{reader()} function) have the following public methods:

\begin{methoddesc}[csv reader]{next}{}
Return the next row of the reader's iterable object as a list, parsed
according to the current dialect.
\end{methoddesc}

Reader objects have the following public attributes:

\begin{memberdesc}[csv reader]{dialect}
A read-only description of the dialect in use by the parser.
\end{memberdesc}

\begin{memberdesc}[csv reader]{line_num}
The number of lines read from the source iterator.  This is not the same
as the number of records returned, as records can span multiple lines.
\end{memberdesc}
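
The difference between lines and records shows up as soon as a quoted field
contains a line break.  In this sketch the input lines are made up; the
second record spans two source lines, so \member{line_num} advances by two
while only one record is returned.

```python
import csv

# Hypothetical input: the quoted field in the second record contains a newline.
lines = ['id,comment\n', '1,"first\n', 'second"\n', '2,plain\n']

reader = csv.reader(lines)
seen = []
for row in reader:
    # Record how many source lines had been consumed when this row was returned.
    seen.append((reader.line_num, row))
```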


\subsection{Writer Objects}

\class{Writer} objects (\class{DictWriter} instances and objects returned by
the \function{writer()} function) have the following public methods.  A
{}\var{row} must be a sequence of strings or numbers for \class{Writer}
objects and a dictionary mapping fieldnames to strings or numbers for
{}\class{DictWriter} objects; numbers are passed through \function{str()}
before being written.  Note that complex numbers are written out surrounded
by parentheses.  This may cause some problems for other programs which read
CSV files (assuming they support complex numbers at all).

\begin{methoddesc}[csv writer]{writerow}{row}
Write the \var{row} parameter to the writer's file object, formatted
according to the current dialect.
\end{methoddesc}

\begin{methoddesc}[csv writer]{writerows}{rows}
Write every row in the \var{rows} parameter (a list of \var{row} objects as
described above) to the writer's file object, formatted
according to the current dialect.
\end{methoddesc}
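
A short sketch of both methods follows, including the complex-number case
mentioned above.  \class{ListSink} is a hypothetical helper standing in for
a file, and the rows are made-up data.

```python
import csv

class ListSink:
    """Hypothetical stand-in for a file: records whatever the writer writes."""
    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

sink = ListSink()
writer = csv.writer(sink)

writer.writerow(['x', 'y'])                    # a single row
writer.writerows([[1, 2.5], [1+2j, 'z']])      # several rows at once

output = ''.join(sink.chunks)
print(output)
```

The complex value appears as \code{(1+2j)}, parentheses included, because
that is what \function{str()} returns for it.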

Writer objects have the following public attribute:

\begin{memberdesc}[csv writer]{dialect}
A read-only description of the dialect in use by the writer.
\end{memberdesc}



\subsection{Examples}

The simplest example of reading a CSV file:

\begin{verbatim}
import csv
reader = csv.reader(open("some.csv", "rb"))
for row in reader:
    print row
\end{verbatim}

Reading a file with an alternate format:

\begin{verbatim}
import csv
reader = csv.reader(open("passwd", "rb"), delimiter=':', quoting=csv.QUOTE_NONE)
for row in reader:
    print row
\end{verbatim}

The corresponding simplest possible writing example is:

\begin{verbatim}
import csv
writer = csv.writer(open("some.csv", "wb"))
writer.writerows(someiterable)
\end{verbatim}

Registering a new dialect:

\begin{verbatim}
import csv

csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)

reader = csv.reader(open("passwd", "rb"), 'unixpwd')
\end{verbatim}

A slightly more advanced use of the reader, catching and reporting errors:

\begin{verbatim}
import csv, sys
filename = "some.csv"
reader = csv.reader(open(filename, "rb"))
try:
    for row in reader:
        print row
except csv.Error, e:
    sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))
\end{verbatim}

And while the module doesn't directly support parsing strings, it can
easily be done by wrapping the string in a list:

\begin{verbatim}
import csv
for row in csv.reader(['one,two,three']):
    print row
\end{verbatim}

The \module{csv} module doesn't directly support reading and writing
Unicode, but it is 8-bit clean save for some problems with \ASCII{} NUL
characters.  You can therefore write classes that handle the encoding and
decoding for you, as long as you avoid encodings like utf-16 that use NULs:

\begin{verbatim}
import csv

class UnicodeReader:
    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        self.reader = csv.reader(f, dialect=dialect, **kwds)
        self.encoding = encoding

    def next(self):
        # Decode each 8-bit field into a Unicode object.
        row = self.reader.next()
        return [unicode(s, self.encoding) for s in row]

    def __iter__(self):
        return self

class UnicodeWriter:
    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        self.writer = csv.writer(f, dialect=dialect, **kwds)
        self.encoding = encoding

    def writerow(self, row):
        # Encode each Unicode field with the configured encoding.
        self.writer.writerow([s.encode(self.encoding) for s in row])

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)
\end{verbatim}

They should work just like the \class{csv.reader} and \class{csv.writer}
classes but add an \var{encoding} parameter.