Skip Montanaro | b4a0417 | 2003-03-20 23:29:12 +0000 | [diff] [blame] | 1 | \section{\module{csv} --- CSV File Reading and Writing} |
| 2 | |
| 3 | \declaremodule{standard}{csv} |
| 4 | \modulesynopsis{Write and read tabular data to and from delimited files.} |
| 5 | |
| 6 | \versionadded{2.3} |
| 7 | \index{csv} |
| 8 | \indexii{data}{tabular} |
| 9 | |
| 10 | The so-called CSV (Comma Separated Values) format is the most common import |
| 11 | and export format for spreadsheets and databases. There is no ``CSV |
| 12 | standard'', so the format is operationally defined by the many applications |
| 13 | which read and write it. The lack of a standard means that subtle |
| 14 | differences often exist in the data produced and consumed by different |
| 15 | applications. These differences can make it annoying to process CSV files |
| 16 | from multiple sources. Still, while the delimiters and quoting characters |
| 17 | vary, the overall format is similar enough that it is possible to write a |
| 18 | single module which can efficiently manipulate such data, hiding the details |
| 19 | of reading and writing the data from the programmer. |
| 20 | |
| 21 | The \module{csv} module implements classes to read and write tabular data in |
| 22 | CSV format. It allows programmers to say, ``write this data in the format |
| 23 | preferred by Excel,'' or ``read data from this file which was generated by |
| 24 | Excel,'' without knowing the precise details of the CSV format used by |
| 25 | Excel. Programmers can also describe the CSV formats understood by other |
| 26 | applications or define their own special-purpose CSV formats. |
| 27 | |
| 28 | The \module{csv} module's \class{reader} and \class{writer} objects read and |
| 29 | write sequences. Programmers can also read and write data in dictionary |
| 30 | form using the \class{DictReader} and \class{DictWriter} classes. |
| 31 | |
| 32 | \note{The first version of the \module{csv} module doesn't support Unicode |
| 33 | input. Also, there are currently some issues regarding \ASCII{} NUL |
| 34 | characters. Accordingly, all input should generally be plain \ASCII{} to be |
| 35 | safe. These restrictions will be removed in the future.} |
| 36 | |
| 37 | \begin{seealso} |
| 38 | % \seemodule{array}{Arrays of uniformly types numeric values.} |
| 39 | \seepep{305}{CSV File API} |
| 40 | {The Python Enhancement Proposal which proposed this addition |
| 41 | to Python.} |
| 42 | \end{seealso} |
| 43 | |
| 44 | |
| 45 | \subsection{Module Contents} |
| 46 | |
| 47 | |
| 48 | The \module{csv} module defines the following functions: |
| 49 | |
| 50 | \begin{funcdesc}{reader}{csvfile\optional{, |
| 51 | dialect=\code{'excel'}\optional{, fmtparam}}} |
| 52 | Return a reader object which will iterate over lines in the given |
| 53 | {}\var{csvfile}. \var{csvfile} can be any object which supports the |
| 54 | iterator protocol and returns a string each time its \method{next} |
| 55 | method is called. An optional \var{dialect} parameter can be given |
| 56 | which is used to define a set of parameters specific to a particular CSV |
| 57 | dialect. It may be an instance of a subclass of the \class{Dialect} |
| 58 | class or one of the strings returned by the \function{list_dialects} |
| 59 | function. The other optional {}\var{fmtparam} keyword arguments can be |
| 60 | given to override individual formatting parameters in the current |
| 61 | dialect. For more information about the dialect and formatting |
| 62 | parameters, see section~\ref{fmt-params}, ``Dialects and Formatting |
| 63 | Parameters'' for details of these parameters. |
| 64 | |
| 65 | All data read are returned as strings. No automatic data type |
| 66 | conversion is performed. |
| 67 | \end{funcdesc} |
| 68 | |
| 69 | \begin{funcdesc}{writer}{csvfile\optional{, |
| 70 | dialect=\code{'excel'}\optional{, fmtparam}}} |
| 71 | Return a writer object responsible for converting the user's data into |
| 72 | delimited strings on the given file-like object. An optional |
| 73 | {}\var{dialect} parameter can be given which is used to define a set of |
| 74 | parameters specific to a particular CSV dialect. It may be an instance |
| 75 | of a subclass of the \class{Dialect} class or one of the strings |
| 76 | returned by the \function{list_dialects} function. The other optional |
| 77 | {}\var{fmtparam} keyword arguments can be given to override individual |
| 78 | formatting parameters in the current dialect. For more information |
| 79 | about the dialect and formatting parameters, see |
| 80 | section~\ref{fmt-params}, ``Dialects and Formatting Parameters'' for |
| 81 | details of these parameters. To make it as easy as possible to |
| 82 | interface with modules which implement the DB API, the value |
| 83 | \constant{None} is written as the empty string. While this isn't a |
| 84 | reversible transformation, it makes it easier to dump SQL NULL data values |
| 85 | to CSV files without preprocessing the data returned from a |
| 86 | \code{cursor.fetch*()} call. All other non-string data are stringified |
| 87 | with \function{str()} before being written. |
| 88 | \end{funcdesc} |
| 89 | |
| 90 | \begin{funcdesc}{register_dialect}{name, dialect} |
| 91 | Associate \var{dialect} with \var{name}. \var{dialect} must be a subclass |
| 92 | of \class{csv.Dialect}. \var{name} must be a string or Unicode object. |
| 93 | \end{funcdesc} |
| 94 | |
| 95 | \begin{funcdesc}{unregister_dialect}{name} |
| 96 | Delete the dialect associated with \var{name} from the dialect registry. An |
| 97 | \exception{Error} is raised if \var{name} is not a registered dialect |
| 98 | name. |
| 99 | \end{funcdesc} |
| 100 | |
| 101 | \begin{funcdesc}{get_dialect}{name} |
| 102 | Return the dialect associated with \var{name}. An \exception{Error} is |
| 103 | raised if \var{name} is not a registered dialect name. |
| 104 | \end{funcdesc} |
| 105 | |
| 106 | \begin{funcdesc}{list_dialects}{} |
| 107 | Return the names of all registered dialects. |
| 108 | \end{funcdesc} |
| 109 | |
| 110 | |
| 111 | The \module{csv} module defines the following classes: |
| 112 | |
| 113 | \begin{classdesc}{DictReader}{csvfile, fieldnames\optional{, |
| 114 | restkey=\code{None}\optional{, |
| 115 | restval=\code{None}\optional{, |
| 116 | dialect=\code{'excel'}\optional{, |
| 117 | fmtparam}}}}} |
| 118 | Create an object which operates like a regular reader but maps the |
| 119 | information read into a dict whose keys are given by the \var{fieldnames} |
| 120 | parameter. If the row read has fewer fields than the fieldnames sequence, |
| 121 | the value of \var{restval} will be used as the default value. If the row |
| 122 | read has more fields than the fieldnames sequence, the remaining data is |
| 123 | added as a sequence keyed by the value of \var{restkey}. If the row read |
| 124 | has fewer fields than the fieldnames sequence, the remaining keys take the |
| 125 | value of the optiona \var{restval} parameter. All other parameters are |
| 126 | interpreted as for regular readers. |
| 127 | \end{classdesc} |
| 128 | |
| 129 | |
| 130 | \begin{classdesc}{DictWriter}{csvfile, fieldnames\optional{, |
| 131 | restval=""\optional{, |
| 132 | extrasaction=\code{'raise'}\optional{, |
| 133 | dialect=\code{'excel'}\optional{, fmtparam}}}}} |
| 134 | Create an object which operates like a regular writer but maps dictionaries |
| 135 | onto output rows. The \var{fieldnames} parameter identifies the order in |
| 136 | which values in the dictionary passed to the \method{writerow()} method are |
| 137 | written to the \var{csvfile}. The optional \var{restval} parameter |
| 138 | specifies the value to be written if the dictionary is missing a key in |
| 139 | \var{fieldnames}. If the dictionary passed to the \method{writerow()} |
| 140 | method contains a key not found in \var{fieldnames}, the optional |
| 141 | \var{extrasaction} parameter indicates what action to take. If it is set |
| 142 | to \code{'raise'} a \exception{ValueError} is raised. If it is set to |
| 143 | \code{'ignore'}, extra values in the dictionary are ignored. All other |
| 144 | parameters are interpreted as for regular writers. |
| 145 | \end{classdesc} |
| 146 | |
| 147 | |
| 148 | \begin{classdesc*}{Dialect}{} |
| 149 | The \class{Dialect} class is a container class relied on primarily for its |
| 150 | attributes, which are used to define the parameters for a specific |
| 151 | \class{reader} or \class{writer} instance. Dialect objects support the |
| 152 | following data attributes: |
| 153 | |
| 154 | \begin{memberdesc}[string]{delimiter} |
| 155 | A one-character string used to separate fields. It defaults to \code{","}. |
| 156 | \end{memberdesc} |
| 157 | |
| 158 | \begin{memberdesc}[boolean]{doublequote} |
| 159 | Controls how instances of \var{quotechar} appearing inside a field should be |
| 160 | themselves be quoted. When \constant{True}, the character is doubledd. |
| 161 | When \constant{False}, the \var{escapechar} must be a one-character string |
| 162 | which is used as a prefix to the \var{quotechar}. It defaults to |
| 163 | \constant{True}. |
| 164 | \end{memberdesc} |
| 165 | |
| 166 | \begin{memberdesc}{escapechar} |
| 167 | A one-character string used to escape the \var{delimiter} if \var{quoting} |
| 168 | is set to \constant{QUOTE_NONE}. It defaults to \constant{None}. |
| 169 | \end{memberdesc} |
| 170 | |
| 171 | \begin{memberdesc}[string]{lineterminator} |
| 172 | The string used to terminate lines in the CSV file. It defaults to |
| 173 | \code{"\e r\e n"}. |
| 174 | \end{memberdesc} |
| 175 | |
| 176 | \begin{memberdesc}[string]{quotechar} |
| 177 | A one-character string used to quote elements containing the \var{delimiter} |
| 178 | or which start with the \var{quotechar}. It defaults to \code{'"'}. |
| 179 | \end{memberdesc} |
| 180 | |
| 181 | \begin{memberdesc}[integer]{quoting} |
| 182 | Controls when quotes should be generated by the writer. It can take on any |
| 183 | of the \code{QUOTE_*} constants defined below and defaults to |
| 184 | \constant{QUOTE_MINIMAL}. |
| 185 | \end{memberdesc} |
| 186 | |
| 187 | \begin{memberdesc}[boolean]{skipinitialspace} |
| 188 | When \constant{True}, whitespace immediately following the \var{delimiter} |
| 189 | is ignored. The default is \constant{False}. |
| 190 | \end{memberdesc} |
| 191 | |
| 192 | \end{classdesc*} |
| 193 | |
| 194 | The \module{csv} module defines the following constants: |
| 195 | |
| 196 | \begin{datadesc}{QUOTE_ALWAYS} |
| 197 | Instructs \class{writer} objects to quote all fields. |
| 198 | \end{datadesc} |
| 199 | |
| 200 | \begin{datadesc}{QUOTE_MINIMAL} |
| 201 | Instructs \class{writer} objects to only quote those fields which contain |
| 202 | the current \var{delimiter} or begin with the current \var{quotechar}. |
| 203 | \end{datadesc} |
| 204 | |
| 205 | \begin{datadesc}{QUOTE_NONNUMERIC} |
| 206 | Instructs \class{writer} objects to quote all non-numeric fields. |
| 207 | \end{datadesc} |
| 208 | |
| 209 | \begin{datadesc}{QUOTE_NONE} |
| 210 | Instructs \class{writer} objects to never quote fields. When the current |
| 211 | \var{delimiter} occurs in output data it is preceded by the current |
| 212 | \var{escapechar} character. When \constant{QUOTE_NONE} is in effect, it |
| 213 | is an error not to have a single-character \var{escapechar} defined, even if |
| 214 | no data to be written contains the \var{delimiter} character. |
| 215 | \end{datadesc} |
| 216 | |
| 217 | |
| 218 | The \module{csv} module defines the following exception: |
| 219 | |
| 220 | \begin{excdesc}{Error} |
| 221 | Raised by any of the functions when an error is detected. |
| 222 | \end{excdesc} |
| 223 | |
| 224 | |
| 225 | \subsection{Dialects and Formatting Parameters\label{fmt-params}} |
| 226 | |
| 227 | To make it easier to specify the format of input and output records, |
| 228 | specific formatting parameters are grouped together into dialects. A |
| 229 | dialect is a subclass of the \class{Dialect} class having a set of specific |
| 230 | methods and a single \method{validate()} method. When creating \class{reader} |
| 231 | or \class{writer} objects, the programmer can specify a string or a subclass |
| 232 | of the \class{Dialect} class as the dialect parameter. In addition to, or |
| 233 | instead of, the \var{dialect} parameter, the programmer can also specify |
| 234 | individual formatting parameters, which have the same names as the |
| 235 | attributes defined above for the \class{Dialect} class. |
| 236 | |
| 237 | |
| 238 | \subsection{Reader Objects} |
| 239 | |
| 240 | \class{DictReader} and \var{reader} objects have the following public |
| 241 | methods: |
| 242 | |
| 243 | \begin{methoddesc}{next}{} |
| 244 | Return the next row of the reader's iterable object as a list, parsed |
| 245 | according to the current dialect. |
| 246 | \end{methoddesc} |
| 247 | |
| 248 | |
| 249 | \subsection{Writer Objects} |
| 250 | |
| 251 | \class{DictWriter} and \var{writer} objects have the following public |
| 252 | methods: |
| 253 | |
| 254 | \begin{methoddesc}{writerow}{row} |
| 255 | Write the \var{row} parameter to the writer's file object, formatted |
| 256 | according to the current dialect. |
| 257 | \end{methoddesc} |
| 258 | |
| 259 | \begin{methoddesc}{writerows}{rows} |
| 260 | Write all the \var{rows} parameters to the writer's file object, formatted |
| 261 | according to the current dialect. |
| 262 | \end{methoddesc} |
| 263 | |
| 264 | |
| 265 | \subsection{Examples} |
| 266 | |
| 267 | The ``Hello, world'' of csv reading is |
| 268 | |
| 269 | \begin{verbatim} |
| 270 | reader = csv.reader(file("some.csv")) |
| 271 | for row in reader: |
| 272 | print row |
| 273 | \end{verbatim} |
| 274 | |
| 275 | The corresponding simplest possible writing example is |
| 276 | |
| 277 | \begin{verbatim} |
| 278 | writer = csv.writer(file("some.csv", "w")) |
| 279 | for row in someiterable: |
| 280 | writer.writerow(row) |
| 281 | \end{verbatim} |