Skip Montanaro | b4a0417 | 2003-03-20 23:29:12 +0000 | [diff] [blame] | 1 | \section{\module{csv} --- CSV File Reading and Writing} |
| 2 | |
| 3 | \declaremodule{standard}{csv} |
| 4 | \modulesynopsis{Write and read tabular data to and from delimited files.} |
Skip Montanaro | 3bd3c84 | 2003-04-24 18:47:31 +0000 | [diff] [blame] | 5 | \sectionauthor{Skip Montanaro}{skip@pobox.com} |
Skip Montanaro | b4a0417 | 2003-03-20 23:29:12 +0000 | [diff] [blame] | 6 | |
| 7 | \versionadded{2.3} |
| 8 | \index{csv} |
| 9 | \indexii{data}{tabular} |
| 10 | |
| 11 | The so-called CSV (Comma Separated Values) format is the most common import |
| 12 | and export format for spreadsheets and databases. There is no ``CSV |
| 13 | standard'', so the format is operationally defined by the many applications |
| 14 | which read and write it. The lack of a standard means that subtle |
| 15 | differences often exist in the data produced and consumed by different |
| 16 | applications. These differences can make it annoying to process CSV files |
| 17 | from multiple sources. Still, while the delimiters and quoting characters |
| 18 | vary, the overall format is similar enough that it is possible to write a |
| 19 | single module which can efficiently manipulate such data, hiding the details |
| 20 | of reading and writing the data from the programmer. |
| 21 | |
Skip Montanaro | 5d0136e | 2003-04-25 15:14:49 +0000 | [diff] [blame] | 22 | The \module{csv} module implements classes to read and write tabular data in |
Skip Montanaro | b4a0417 | 2003-03-20 23:29:12 +0000 | [diff] [blame] | 23 | CSV format. It allows programmers to say, ``write this data in the format |
| 24 | preferred by Excel,'' or ``read data from this file which was generated by |
| 25 | Excel,'' without knowing the precise details of the CSV format used by |
| 26 | Excel. Programmers can also describe the CSV formats understood by other |
| 27 | applications or define their own special-purpose CSV formats. |
| 28 | |
Skip Montanaro | 5d0136e | 2003-04-25 15:14:49 +0000 | [diff] [blame] | 29 | The \module{csv} module's \class{reader} and \class{writer} objects read and |
Skip Montanaro | b4a0417 | 2003-03-20 23:29:12 +0000 | [diff] [blame] | 30 | write sequences. Programmers can also read and write data in dictionary |
| 31 | form using the \class{DictReader} and \class{DictWriter} classes. |
| 32 | |
Fred Drake | 9635268 | 2003-04-25 18:02:34 +0000 | [diff] [blame] | 33 | \begin{notice} |
| 34 | This version of the \module{csv} module doesn't support Unicode |
| 35 | input. Also, there are currently some issues regarding \ASCII{} NUL |
| 36 | characters. Accordingly, all input should generally be printable |
| 37 | \ASCII{} to be safe. These restrictions will be removed in the future. |
| 38 | \end{notice} |
Skip Montanaro | b4a0417 | 2003-03-20 23:29:12 +0000 | [diff] [blame] | 39 | |
| 40 | \begin{seealso} |
| 41 | % \seemodule{array}{Arrays of uniformly types numeric values.} |
| 42 | \seepep{305}{CSV File API} |
| 43 | {The Python Enhancement Proposal which proposed this addition |
| 44 | to Python.} |
| 45 | \end{seealso} |
| 46 | |
| 47 | |
Skip Montanaro | 5d0136e | 2003-04-25 15:14:49 +0000 | [diff] [blame] | 48 | \subsection{Module Contents} |
Skip Montanaro | b4a0417 | 2003-03-20 23:29:12 +0000 | [diff] [blame] | 49 | |
Skip Montanaro | 5d0136e | 2003-04-25 15:14:49 +0000 | [diff] [blame] | 50 | The \module{csv} module defines the following functions: |
Skip Montanaro | b4a0417 | 2003-03-20 23:29:12 +0000 | [diff] [blame] | 51 | |
| 52 | \begin{funcdesc}{reader}{csvfile\optional{, |
| 53 | dialect=\code{'excel'}\optional{, fmtparam}}} |
| 54 | Return a reader object which will iterate over lines in the given |
| 55 | {}\var{csvfile}. \var{csvfile} can be any object which supports the |
| 56 | iterator protocol and returns a string each time its \method{next} |
| 57 | method is called. An optional \var{dialect} parameter can be given |
| 58 | which is used to define a set of parameters specific to a particular CSV |
| 59 | dialect. It may be an instance of a subclass of the \class{Dialect} |
| 60 | class or one of the strings returned by the \function{list_dialects} |
| 61 | function. The other optional {}\var{fmtparam} keyword arguments can be |
| 62 | given to override individual formatting parameters in the current |
| 63 | dialect. For more information about the dialect and formatting |
| 64 | parameters, see section~\ref{fmt-params}, ``Dialects and Formatting |
| 65 | Parameters'' for details of these parameters. |
| 66 | |
| 67 | All data read are returned as strings. No automatic data type |
| 68 | conversion is performed. |
| 69 | \end{funcdesc} |
| 70 | |
| 71 | \begin{funcdesc}{writer}{csvfile\optional{, |
| 72 | dialect=\code{'excel'}\optional{, fmtparam}}} |
| 73 | Return a writer object responsible for converting the user's data into |
| 74 | delimited strings on the given file-like object. An optional |
| 75 | {}\var{dialect} parameter can be given which is used to define a set of |
| 76 | parameters specific to a particular CSV dialect. It may be an instance |
| 77 | of a subclass of the \class{Dialect} class or one of the strings |
| 78 | returned by the \function{list_dialects} function. The other optional |
| 79 | {}\var{fmtparam} keyword arguments can be given to override individual |
| 80 | formatting parameters in the current dialect. For more information |
| 81 | about the dialect and formatting parameters, see |
| 82 | section~\ref{fmt-params}, ``Dialects and Formatting Parameters'' for |
| 83 | details of these parameters. To make it as easy as possible to |
| 84 | interface with modules which implement the DB API, the value |
| 85 | \constant{None} is written as the empty string. While this isn't a |
| 86 | reversible transformation, it makes it easier to dump SQL NULL data values |
| 87 | to CSV files without preprocessing the data returned from a |
| 88 | \code{cursor.fetch*()} call. All other non-string data are stringified |
| 89 | with \function{str()} before being written. |
| 90 | \end{funcdesc} |
| 91 | |
| 92 | \begin{funcdesc}{register_dialect}{name, dialect} |
| 93 | Associate \var{dialect} with \var{name}. \var{dialect} must be a subclass |
| 94 | of \class{csv.Dialect}. \var{name} must be a string or Unicode object. |
| 95 | \end{funcdesc} |
| 96 | |
| 97 | \begin{funcdesc}{unregister_dialect}{name} |
| 98 | Delete the dialect associated with \var{name} from the dialect registry. An |
| 99 | \exception{Error} is raised if \var{name} is not a registered dialect |
| 100 | name. |
| 101 | \end{funcdesc} |
| 102 | |
| 103 | \begin{funcdesc}{get_dialect}{name} |
| 104 | Return the dialect associated with \var{name}. An \exception{Error} is |
| 105 | raised if \var{name} is not a registered dialect name. |
| 106 | \end{funcdesc} |
| 107 | |
| 108 | \begin{funcdesc}{list_dialects}{} |
| 109 | Return the names of all registered dialects. |
| 110 | \end{funcdesc} |
| 111 | |
| 112 | |
Skip Montanaro | 5d0136e | 2003-04-25 15:14:49 +0000 | [diff] [blame] | 113 | The \module{csv} module defines the following classes: |
Skip Montanaro | b4a0417 | 2003-03-20 23:29:12 +0000 | [diff] [blame] | 114 | |
| 115 | \begin{classdesc}{DictReader}{csvfile, fieldnames\optional{, |
Fred Drake | 9635268 | 2003-04-25 18:02:34 +0000 | [diff] [blame] | 116 | restkey=\constant{None}\optional{, |
| 117 | restval=\constant{None}\optional{, |
Skip Montanaro | b4a0417 | 2003-03-20 23:29:12 +0000 | [diff] [blame] | 118 | dialect=\code{'excel'}\optional{, |
| 119 | fmtparam}}}}} |
| 120 | Create an object which operates like a regular reader but maps the |
| 121 | information read into a dict whose keys are given by the \var{fieldnames} |
| 122 | parameter. If the row read has fewer fields than the fieldnames sequence, |
| 123 | the value of \var{restval} will be used as the default value. If the row |
| 124 | read has more fields than the fieldnames sequence, the remaining data is |
| 125 | added as a sequence keyed by the value of \var{restkey}. If the row read |
| 126 | has fewer fields than the fieldnames sequence, the remaining keys take the |
| 127 | value of the optiona \var{restval} parameter. All other parameters are |
| 128 | interpreted as for regular readers. |
| 129 | \end{classdesc} |
| 130 | |
| 131 | |
| 132 | \begin{classdesc}{DictWriter}{csvfile, fieldnames\optional{, |
| 133 | restval=""\optional{, |
| 134 | extrasaction=\code{'raise'}\optional{, |
| 135 | dialect=\code{'excel'}\optional{, fmtparam}}}}} |
| 136 | Create an object which operates like a regular writer but maps dictionaries |
| 137 | onto output rows. The \var{fieldnames} parameter identifies the order in |
| 138 | which values in the dictionary passed to the \method{writerow()} method are |
| 139 | written to the \var{csvfile}. The optional \var{restval} parameter |
| 140 | specifies the value to be written if the dictionary is missing a key in |
| 141 | \var{fieldnames}. If the dictionary passed to the \method{writerow()} |
| 142 | method contains a key not found in \var{fieldnames}, the optional |
| 143 | \var{extrasaction} parameter indicates what action to take. If it is set |
| 144 | to \code{'raise'} a \exception{ValueError} is raised. If it is set to |
| 145 | \code{'ignore'}, extra values in the dictionary are ignored. All other |
| 146 | parameters are interpreted as for regular writers. |
| 147 | \end{classdesc} |
| 148 | |
Skip Montanaro | b4a0417 | 2003-03-20 23:29:12 +0000 | [diff] [blame] | 149 | \begin{classdesc*}{Dialect}{} |
| 150 | The \class{Dialect} class is a container class relied on primarily for its |
| 151 | attributes, which are used to define the parameters for a specific |
Fred Drake | 9635268 | 2003-04-25 18:02:34 +0000 | [diff] [blame] | 152 | \class{reader} or \class{writer} instance. |
Skip Montanaro | b4a0417 | 2003-03-20 23:29:12 +0000 | [diff] [blame] | 153 | \end{classdesc*} |
| 154 | |
Skip Montanaro | 7789237 | 2003-05-19 15:33:36 +0000 | [diff] [blame] | 155 | \begin{classdesc}{Sniffer}{} |
| 156 | The \class{Sniffer} class is used to deduce the format of a CSV file. |
Fred Drake | 9635268 | 2003-04-25 18:02:34 +0000 | [diff] [blame] | 157 | \end{classdesc} |
| 158 | |
| 159 | The \class{Sniffer} class provides a single method: |
| 160 | |
Skip Montanaro | 7789237 | 2003-05-19 15:33:36 +0000 | [diff] [blame] | 161 | \begin{methoddesc}{sniff}{sample\optional{,delimiters=None}} |
| 162 | Analyze the given \var{sample} and return a \class{Dialect} subclass |
| 163 | reflecting the parameters found. If the optional \var{delimiters} parameter |
| 164 | is given, it is interpreted as a string containing possible valid delimiter |
| 165 | characters. |
Fred Drake | 9635268 | 2003-04-25 18:02:34 +0000 | [diff] [blame] | 166 | \end{methoddesc} |
| 167 | |
| 168 | \begin{methoddesc}{has_header}{sample} |
| 169 | Analyze the sample text (presumed to be in CSV format) and return |
| 170 | \constant{True} if the first row appears to be a series of column |
| 171 | headers. |
| 172 | \end{methoddesc} |
| 173 | |
| 174 | |
Skip Montanaro | b4a0417 | 2003-03-20 23:29:12 +0000 | [diff] [blame] | 175 | The \module{csv} module defines the following constants: |
| 176 | |
Skip Montanaro | a104556 | 2003-06-04 15:30:13 +0000 | [diff] [blame^] | 177 | \begin{datadesc}{QUOTE_ALL} |
Skip Montanaro | b4a0417 | 2003-03-20 23:29:12 +0000 | [diff] [blame] | 178 | Instructs \class{writer} objects to quote all fields. |
| 179 | \end{datadesc} |
| 180 | |
| 181 | \begin{datadesc}{QUOTE_MINIMAL} |
| 182 | Instructs \class{writer} objects to only quote those fields which contain |
| 183 | the current \var{delimiter} or begin with the current \var{quotechar}. |
| 184 | \end{datadesc} |
| 185 | |
| 186 | \begin{datadesc}{QUOTE_NONNUMERIC} |
| 187 | Instructs \class{writer} objects to quote all non-numeric fields. |
| 188 | \end{datadesc} |
| 189 | |
| 190 | \begin{datadesc}{QUOTE_NONE} |
| 191 | Instructs \class{writer} objects to never quote fields. When the current |
| 192 | \var{delimiter} occurs in output data it is preceded by the current |
| 193 | \var{escapechar} character. When \constant{QUOTE_NONE} is in effect, it |
| 194 | is an error not to have a single-character \var{escapechar} defined, even if |
| 195 | no data to be written contains the \var{delimiter} character. |
| 196 | \end{datadesc} |
| 197 | |
| 198 | |
| 199 | The \module{csv} module defines the following exception: |
| 200 | |
| 201 | \begin{excdesc}{Error} |
| 202 | Raised by any of the functions when an error is detected. |
| 203 | \end{excdesc} |
| 204 | |
| 205 | |
Fred Drake | 9635268 | 2003-04-25 18:02:34 +0000 | [diff] [blame] | 206 | \subsection{Dialects and Formatting Parameters\label{csv-fmt-params}} |
Skip Montanaro | b4a0417 | 2003-03-20 23:29:12 +0000 | [diff] [blame] | 207 | |
| 208 | To make it easier to specify the format of input and output records, |
| 209 | specific formatting parameters are grouped together into dialects. A |
| 210 | dialect is a subclass of the \class{Dialect} class having a set of specific |
| 211 | methods and a single \method{validate()} method. When creating \class{reader} |
| 212 | or \class{writer} objects, the programmer can specify a string or a subclass |
| 213 | of the \class{Dialect} class as the dialect parameter. In addition to, or |
| 214 | instead of, the \var{dialect} parameter, the programmer can also specify |
| 215 | individual formatting parameters, which have the same names as the |
| 216 | attributes defined above for the \class{Dialect} class. |
| 217 | |
Fred Drake | 9635268 | 2003-04-25 18:02:34 +0000 | [diff] [blame] | 218 | Dialects support the following attributes: |
| 219 | |
| 220 | \begin{memberdesc}[Dialect]{delimiter} |
| 221 | A one-character string used to separate fields. It defaults to \code{','}. |
| 222 | \end{memberdesc} |
| 223 | |
| 224 | \begin{memberdesc}[Dialect]{doublequote} |
| 225 | Controls how instances of \var{quotechar} appearing inside a field should be |
| 226 | themselves be quoted. When \constant{True}, the character is doubledd. |
| 227 | When \constant{False}, the \var{escapechar} must be a one-character string |
| 228 | which is used as a prefix to the \var{quotechar}. It defaults to |
| 229 | \constant{True}. |
| 230 | \end{memberdesc} |
| 231 | |
| 232 | \begin{memberdesc}[Dialect]{escapechar} |
| 233 | A one-character string used to escape the \var{delimiter} if \var{quoting} |
| 234 | is set to \constant{QUOTE_NONE}. It defaults to \constant{None}. |
| 235 | \end{memberdesc} |
| 236 | |
| 237 | \begin{memberdesc}[Dialect]{lineterminator} |
| 238 | The string used to terminate lines in the CSV file. It defaults to |
| 239 | \code{'\e r\e n'}. |
| 240 | \end{memberdesc} |
| 241 | |
| 242 | \begin{memberdesc}[Dialect]{quotechar} |
| 243 | A one-character string used to quote elements containing the \var{delimiter} |
| 244 | or which start with the \var{quotechar}. It defaults to \code{'"'}. |
| 245 | \end{memberdesc} |
| 246 | |
| 247 | \begin{memberdesc}[Dialect]{quoting} |
| 248 | Controls when quotes should be generated by the writer. It can take on any |
| 249 | of the \constant{QUOTE_*} constants defined below and defaults to |
| 250 | \constant{QUOTE_MINIMAL}. |
| 251 | \end{memberdesc} |
| 252 | |
| 253 | \begin{memberdesc}[Dialect]{skipinitialspace} |
| 254 | When \constant{True}, whitespace immediately following the \var{delimiter} |
| 255 | is ignored. The default is \constant{False}. |
| 256 | \end{memberdesc} |
| 257 | |
Skip Montanaro | b4a0417 | 2003-03-20 23:29:12 +0000 | [diff] [blame] | 258 | |
| 259 | \subsection{Reader Objects} |
| 260 | |
Fred Drake | 9635268 | 2003-04-25 18:02:34 +0000 | [diff] [blame] | 261 | Reader objects (\class{DictReader} instances and objects returned by |
| 262 | the \function{reader()}function) have the following public methods: |
Skip Montanaro | b4a0417 | 2003-03-20 23:29:12 +0000 | [diff] [blame] | 263 | |
Fred Drake | 9635268 | 2003-04-25 18:02:34 +0000 | [diff] [blame] | 264 | \begin{methoddesc}[csv reader]{next}{} |
Skip Montanaro | b4a0417 | 2003-03-20 23:29:12 +0000 | [diff] [blame] | 265 | Return the next row of the reader's iterable object as a list, parsed |
| 266 | according to the current dialect. |
| 267 | \end{methoddesc} |
| 268 | |
| 269 | |
| 270 | \subsection{Writer Objects} |
| 271 | |
Fred Drake | 9635268 | 2003-04-25 18:02:34 +0000 | [diff] [blame] | 272 | Writer objects (\class{DictWriter} instances and objects returned by |
| 273 | the \function{writer()} function) have the following public methods: |
Skip Montanaro | b4a0417 | 2003-03-20 23:29:12 +0000 | [diff] [blame] | 274 | |
Fred Drake | 9635268 | 2003-04-25 18:02:34 +0000 | [diff] [blame] | 275 | \begin{methoddesc}[csv writer]{writerow}{row} |
Skip Montanaro | b4a0417 | 2003-03-20 23:29:12 +0000 | [diff] [blame] | 276 | Write the \var{row} parameter to the writer's file object, formatted |
| 277 | according to the current dialect. |
| 278 | \end{methoddesc} |
| 279 | |
Fred Drake | 9635268 | 2003-04-25 18:02:34 +0000 | [diff] [blame] | 280 | \begin{methoddesc}[csv writer]{writerows}{rows} |
Skip Montanaro | b4a0417 | 2003-03-20 23:29:12 +0000 | [diff] [blame] | 281 | Write all the \var{rows} parameters to the writer's file object, formatted |
| 282 | according to the current dialect. |
| 283 | \end{methoddesc} |
| 284 | |
| 285 | |
| 286 | \subsection{Examples} |
| 287 | |
| 288 | The ``Hello, world'' of csv reading is |
| 289 | |
| 290 | \begin{verbatim} |
Fred Drake | 9635268 | 2003-04-25 18:02:34 +0000 | [diff] [blame] | 291 | import csv |
| 292 | reader = csv.reader(file("some.csv")) |
| 293 | for row in reader: |
| 294 | print row |
Skip Montanaro | b4a0417 | 2003-03-20 23:29:12 +0000 | [diff] [blame] | 295 | \end{verbatim} |
| 296 | |
| 297 | The corresponding simplest possible writing example is |
| 298 | |
| 299 | \begin{verbatim} |
Fred Drake | 9635268 | 2003-04-25 18:02:34 +0000 | [diff] [blame] | 300 | import csv |
| 301 | writer = csv.writer(file("some.csv", "w")) |
| 302 | for row in someiterable: |
| 303 | writer.writerow(row) |
Skip Montanaro | b4a0417 | 2003-03-20 23:29:12 +0000 | [diff] [blame] | 304 | \end{verbatim} |