blob: 7bbb7c1ccd085e8ccb7d29f597668c63aa94b942 [file] [log] [blame]
Georg Brandl116aa622007-08-15 14:28:22 +00001:mod:`csv` --- CSV File Reading and Writing
2===========================================
3
4.. module:: csv
5 :synopsis: Write and read tabular data to and from delimited files.
6.. sectionauthor:: Skip Montanaro <skip@pobox.com>
7
8
Georg Brandl116aa622007-08-15 14:28:22 +00009.. index::
10 single: csv
11 pair: data; tabular
12
13The so-called CSV (Comma Separated Values) format is the most common import and
14export format for spreadsheets and databases. There is no "CSV standard", so
15the format is operationally defined by the many applications which read and
16write it. The lack of a standard means that subtle differences often exist in
17the data produced and consumed by different applications. These differences can
18make it annoying to process CSV files from multiple sources. Still, while the
19delimiters and quoting characters vary, the overall format is similar enough
20that it is possible to write a single module which can efficiently manipulate
21such data, hiding the details of reading and writing the data from the
22programmer.
23
24The :mod:`csv` module implements classes to read and write tabular data in CSV
25format. It allows programmers to say, "write this data in the format preferred
26by Excel," or "read data from this file which was generated by Excel," without
27knowing the precise details of the CSV format used by Excel. Programmers can
28also describe the CSV formats understood by other applications or define their
29own special-purpose CSV formats.
30
31The :mod:`csv` module's :class:`reader` and :class:`writer` objects read and
32write sequences. Programmers can also read and write data in dictionary form
33using the :class:`DictReader` and :class:`DictWriter` classes.
34
Georg Brandl116aa622007-08-15 14:28:22 +000035.. seealso::
36
Georg Brandl116aa622007-08-15 14:28:22 +000037 :pep:`305` - CSV File API
38 The Python Enhancement Proposal which proposed this addition to Python.
39
40
41.. _csv-contents:
42
43Module Contents
44---------------
45
46The :mod:`csv` module defines the following functions:
47
48
Georg Brandlc2a4f4f2009-04-10 09:03:43 +000049.. function:: reader(csvfile, dialect='excel', **fmtparams)
Georg Brandl116aa622007-08-15 14:28:22 +000050
51 Return a reader object which will iterate over lines in the given *csvfile*.
Georg Brandl9afde1c2007-11-01 20:32:30 +000052 *csvfile* can be any object which supports the :term:`iterator` protocol and returns a
Georg Brandl502d9a52009-07-26 15:02:41 +000053 string each time its :meth:`!next` method is called --- file objects and list
R. David Murray8b7d4aa2009-04-04 01:38:38 +000054 objects are both suitable. If *csvfile* is a file object, it should be opened
55 with ``newline=''``. [#]_ An optional
Georg Brandl116aa622007-08-15 14:28:22 +000056 *dialect* parameter can be given which is used to define a set of parameters
57 specific to a particular CSV dialect. It may be an instance of a subclass of
58 the :class:`Dialect` class or one of the strings returned by the
Georg Brandlc2a4f4f2009-04-10 09:03:43 +000059 :func:`list_dialects` function. The other optional *fmtparams* keyword arguments
Georg Brandl116aa622007-08-15 14:28:22 +000060 can be given to override individual formatting parameters in the current
61 dialect. For full details about the dialect and formatting parameters, see
62 section :ref:`csv-fmt-params`.
63
Skip Montanaro0468df32009-03-25 00:52:18 +000064 Each row read from the csv file is returned as a list of strings. No
R. David Murray8b7d4aa2009-04-04 01:38:38 +000065 automatic data type conversion is performed unless the ``QUOTE_NONNUMERIC`` format
66 option is specified (in which case unquoted fields are transformed into floats).
Georg Brandl116aa622007-08-15 14:28:22 +000067
Christian Heimesb9eccbf2007-12-05 20:18:38 +000068 A short usage example::
Georg Brandl48310cd2009-01-03 21:18:54 +000069
Christian Heimesb9eccbf2007-12-05 20:18:38 +000070 >>> import csv
R. David Murray8b7d4aa2009-04-04 01:38:38 +000071 >>> spamReader = csv.reader(open('eggs.csv', newline=''), delimiter=' ', quotechar='|')
Christian Heimesb9eccbf2007-12-05 20:18:38 +000072 >>> for row in spamReader:
Georg Brandlf6945182008-02-01 11:56:49 +000073 ... print(', '.join(row))
Christian Heimesb9eccbf2007-12-05 20:18:38 +000074 Spam, Spam, Spam, Spam, Spam, Baked Beans
75 Spam, Lovely Spam, Wonderful Spam
76
Georg Brandl116aa622007-08-15 14:28:22 +000077
Georg Brandlc2a4f4f2009-04-10 09:03:43 +000078.. function:: writer(csvfile, dialect='excel', **fmtparams)
Georg Brandl116aa622007-08-15 14:28:22 +000079
80 Return a writer object responsible for converting the user's data into delimited
81 strings on the given file-like object. *csvfile* can be any object with a
R. David Murray8b7d4aa2009-04-04 01:38:38 +000082 :func:`write` method. An optional *dialect*
Georg Brandl116aa622007-08-15 14:28:22 +000083 parameter can be given which is used to define a set of parameters specific to a
84 particular CSV dialect. It may be an instance of a subclass of the
85 :class:`Dialect` class or one of the strings returned by the
Georg Brandlc2a4f4f2009-04-10 09:03:43 +000086 :func:`list_dialects` function. The other optional *fmtparams* keyword arguments
Georg Brandl116aa622007-08-15 14:28:22 +000087 can be given to override individual formatting parameters in the current
88 dialect. For full details about the dialect and formatting parameters, see
89 section :ref:`csv-fmt-params`. To make it
90 as easy as possible to interface with modules which implement the DB API, the
91 value :const:`None` is written as the empty string. While this isn't a
92 reversible transformation, it makes it easier to dump SQL NULL data values to
93 CSV files without preprocessing the data returned from a ``cursor.fetch*`` call.
94 All other non-string data are stringified with :func:`str` before being written.
95
Christian Heimesb9eccbf2007-12-05 20:18:38 +000096 A short usage example::
97
98 >>> import csv
99 >>> spamWriter = csv.writer(open('eggs.csv', 'w'), delimiter=' ',
Skip Montanarof290cb52009-08-18 14:37:42 +0000100 ... quotechar='|', quoting=csv.QUOTE_MINIMAL)
Christian Heimesb9eccbf2007-12-05 20:18:38 +0000101 >>> spamWriter.writerow(['Spam'] * 5 + ['Baked Beans'])
102 >>> spamWriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
103
Georg Brandl116aa622007-08-15 14:28:22 +0000104
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000105.. function:: register_dialect(name[, dialect], **fmtparams)
Georg Brandl116aa622007-08-15 14:28:22 +0000106
Georg Brandlf6945182008-02-01 11:56:49 +0000107 Associate *dialect* with *name*. *name* must be a string. The
Georg Brandl116aa622007-08-15 14:28:22 +0000108 dialect can be specified either by passing a sub-class of :class:`Dialect`, or
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000109 by *fmtparams* keyword arguments, or both, with keyword arguments overriding
Georg Brandl116aa622007-08-15 14:28:22 +0000110 parameters of the dialect. For full details about the dialect and formatting
111 parameters, see section :ref:`csv-fmt-params`.
112
113
114.. function:: unregister_dialect(name)
115
116 Delete the dialect associated with *name* from the dialect registry. An
117 :exc:`Error` is raised if *name* is not a registered dialect name.
118
119
120.. function:: get_dialect(name)
121
Georg Brandl6554cb92007-12-02 23:15:43 +0000122 Return the dialect associated with *name*. An :exc:`Error` is raised if
123 *name* is not a registered dialect name. This function returns an immutable
124 :class:`Dialect`.
Georg Brandl116aa622007-08-15 14:28:22 +0000125
126.. function:: list_dialects()
127
128 Return the names of all registered dialects.
129
130
131.. function:: field_size_limit([new_limit])
132
133 Returns the current maximum field size allowed by the parser. If *new_limit* is
134 given, this becomes the new limit.
135
Georg Brandl116aa622007-08-15 14:28:22 +0000136
137The :mod:`csv` module defines the following classes:
138
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000139.. class:: DictReader(csvfile, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)
Georg Brandl116aa622007-08-15 14:28:22 +0000140
141 Create an object which operates like a regular reader but maps the information
142 read into a dict whose keys are given by the optional *fieldnames* parameter.
143 If the *fieldnames* parameter is omitted, the values in the first row of the
144 *csvfile* will be used as the fieldnames. If the row read has fewer fields than
145 the fieldnames sequence, the value of *restval* will be used as the default
146 value. If the row read has more fields than the fieldnames sequence, the
147 remaining data is added as a sequence keyed by the value of *restkey*. If the
148 row read has fewer fields than the fieldnames sequence, the remaining keys take
149 the value of the optional *restval* parameter. Any other optional or keyword
150 arguments are passed to the underlying :class:`reader` instance.
151
152
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000153.. class:: DictWriter(csvfile, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)
Georg Brandl116aa622007-08-15 14:28:22 +0000154
155 Create an object which operates like a regular writer but maps dictionaries onto
156 output rows. The *fieldnames* parameter identifies the order in which values in
157 the dictionary passed to the :meth:`writerow` method are written to the
158 *csvfile*. The optional *restval* parameter specifies the value to be written
159 if the dictionary is missing a key in *fieldnames*. If the dictionary passed to
160 the :meth:`writerow` method contains a key not found in *fieldnames*, the
161 optional *extrasaction* parameter indicates what action to take. If it is set
162 to ``'raise'`` a :exc:`ValueError` is raised. If it is set to ``'ignore'``,
163 extra values in the dictionary are ignored. Any other optional or keyword
164 arguments are passed to the underlying :class:`writer` instance.
165
166 Note that unlike the :class:`DictReader` class, the *fieldnames* parameter of
167 the :class:`DictWriter` is not optional. Since Python's :class:`dict` objects
168 are not ordered, there is not enough information available to deduce the order
169 in which the row should be written to the *csvfile*.
170
171
172.. class:: Dialect
173
174 The :class:`Dialect` class is a container class relied on primarily for its
175 attributes, which are used to define the parameters for a specific
176 :class:`reader` or :class:`writer` instance.
177
178
179.. class:: excel()
180
181 The :class:`excel` class defines the usual properties of an Excel-generated CSV
182 file. It is registered with the dialect name ``'excel'``.
183
184
185.. class:: excel_tab()
186
187 The :class:`excel_tab` class defines the usual properties of an Excel-generated
188 TAB-delimited file. It is registered with the dialect name ``'excel-tab'``.
189
190
191.. class:: Sniffer()
192
193 The :class:`Sniffer` class is used to deduce the format of a CSV file.
194
Benjamin Petersone41251e2008-04-25 01:59:09 +0000195 The :class:`Sniffer` class provides two methods:
Georg Brandl116aa622007-08-15 14:28:22 +0000196
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000197 .. method:: sniff(sample, delimiters=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000198
Benjamin Petersone41251e2008-04-25 01:59:09 +0000199 Analyze the given *sample* and return a :class:`Dialect` subclass
200 reflecting the parameters found. If the optional *delimiters* parameter
201 is given, it is interpreted as a string containing possible valid
202 delimiter characters.
Georg Brandl116aa622007-08-15 14:28:22 +0000203
204
Benjamin Petersone41251e2008-04-25 01:59:09 +0000205 .. method:: has_header(sample)
Georg Brandl116aa622007-08-15 14:28:22 +0000206
Benjamin Petersone41251e2008-04-25 01:59:09 +0000207 Analyze the sample text (presumed to be in CSV format) and return
208 :const:`True` if the first row appears to be a series of column headers.
Georg Brandl116aa622007-08-15 14:28:22 +0000209
Christian Heimes7f044312008-01-06 17:05:40 +0000210An example for :class:`Sniffer` use::
Georg Brandl116aa622007-08-15 14:28:22 +0000211
Christian Heimes7f044312008-01-06 17:05:40 +0000212 csvfile = open("example.csv")
213 dialect = csv.Sniffer().sniff(csvfile.read(1024))
214 csvfile.seek(0)
215 reader = csv.reader(csvfile, dialect)
216 # ... process CSV file contents here ...
217
218
219The :mod:`csv` module defines the following constants:
Georg Brandl116aa622007-08-15 14:28:22 +0000220
221.. data:: QUOTE_ALL
222
223 Instructs :class:`writer` objects to quote all fields.
224
225
226.. data:: QUOTE_MINIMAL
227
228 Instructs :class:`writer` objects to only quote those fields which contain
229 special characters such as *delimiter*, *quotechar* or any of the characters in
230 *lineterminator*.
231
232
233.. data:: QUOTE_NONNUMERIC
234
235 Instructs :class:`writer` objects to quote all non-numeric fields.
236
237 Instructs the reader to convert all non-quoted fields to type *float*.
238
239
240.. data:: QUOTE_NONE
241
242 Instructs :class:`writer` objects to never quote fields. When the current
243 *delimiter* occurs in output data it is preceded by the current *escapechar*
244 character. If *escapechar* is not set, the writer will raise :exc:`Error` if
245 any characters that require escaping are encountered.
246
247 Instructs :class:`reader` to perform no special processing of quote characters.
248
249The :mod:`csv` module defines the following exception:
250
251
252.. exception:: Error
253
254 Raised by any of the functions when an error is detected.
255
Georg Brandl116aa622007-08-15 14:28:22 +0000256.. _csv-fmt-params:
257
258Dialects and Formatting Parameters
259----------------------------------
260
261To make it easier to specify the format of input and output records, specific
262formatting parameters are grouped together into dialects. A dialect is a
263subclass of the :class:`Dialect` class having a set of specific methods and a
264single :meth:`validate` method. When creating :class:`reader` or
265:class:`writer` objects, the programmer can specify a string or a subclass of
266the :class:`Dialect` class as the dialect parameter. In addition to, or instead
267of, the *dialect* parameter, the programmer can also specify individual
268formatting parameters, which have the same names as the attributes defined below
269for the :class:`Dialect` class.
270
271Dialects support the following attributes:
272
273
274.. attribute:: Dialect.delimiter
275
276 A one-character string used to separate fields. It defaults to ``','``.
277
278
279.. attribute:: Dialect.doublequote
280
281 Controls how instances of *quotechar* appearing inside a field should be
282 themselves be quoted. When :const:`True`, the character is doubled. When
283 :const:`False`, the *escapechar* is used as a prefix to the *quotechar*. It
284 defaults to :const:`True`.
285
286 On output, if *doublequote* is :const:`False` and no *escapechar* is set,
287 :exc:`Error` is raised if a *quotechar* is found in a field.
288
289
290.. attribute:: Dialect.escapechar
291
292 A one-character string used by the writer to escape the *delimiter* if *quoting*
293 is set to :const:`QUOTE_NONE` and the *quotechar* if *doublequote* is
294 :const:`False`. On reading, the *escapechar* removes any special meaning from
295 the following character. It defaults to :const:`None`, which disables escaping.
296
297
298.. attribute:: Dialect.lineterminator
299
300 The string used to terminate lines produced by the :class:`writer`. It defaults
301 to ``'\r\n'``.
302
303 .. note::
304
305 The :class:`reader` is hard-coded to recognise either ``'\r'`` or ``'\n'`` as
306 end-of-line, and ignores *lineterminator*. This behavior may change in the
307 future.
308
309
310.. attribute:: Dialect.quotechar
311
312 A one-character string used to quote fields containing special characters, such
313 as the *delimiter* or *quotechar*, or which contain new-line characters. It
314 defaults to ``'"'``.
315
316
317.. attribute:: Dialect.quoting
318
319 Controls when quotes should be generated by the writer and recognised by the
320 reader. It can take on any of the :const:`QUOTE_\*` constants (see section
321 :ref:`csv-contents`) and defaults to :const:`QUOTE_MINIMAL`.
322
323
324.. attribute:: Dialect.skipinitialspace
325
326 When :const:`True`, whitespace immediately following the *delimiter* is ignored.
327 The default is :const:`False`.
328
329
330Reader Objects
331--------------
332
333Reader objects (:class:`DictReader` instances and objects returned by the
334:func:`reader` function) have the following public methods:
335
Georg Brandlc7485062009-04-01 15:53:15 +0000336.. method:: csvreader.__next__()
Georg Brandl116aa622007-08-15 14:28:22 +0000337
338 Return the next row of the reader's iterable object as a list, parsed according
Georg Brandlc7485062009-04-01 15:53:15 +0000339 to the current dialect. Usually you should call this as ``next(reader)``.
340
Georg Brandl116aa622007-08-15 14:28:22 +0000341
342Reader objects have the following public attributes:
343
Georg Brandl116aa622007-08-15 14:28:22 +0000344.. attribute:: csvreader.dialect
345
346 A read-only description of the dialect in use by the parser.
347
348
349.. attribute:: csvreader.line_num
350
351 The number of lines read from the source iterator. This is not the same as the
352 number of records returned, as records can span multiple lines.
353
Georg Brandl116aa622007-08-15 14:28:22 +0000354
Skip Montanaroaf8fcfa2008-08-09 19:44:22 +0000355DictReader objects have the following public attribute:
356
Skip Montanaroaf8fcfa2008-08-09 19:44:22 +0000357.. attribute:: csvreader.fieldnames
358
359 If not passed as a parameter when creating the object, this attribute is
360 initialized upon first access or when the first record is read from the
361 file.
362
Skip Montanaroaf8fcfa2008-08-09 19:44:22 +0000363
364
Georg Brandl116aa622007-08-15 14:28:22 +0000365Writer Objects
366--------------
367
368:class:`Writer` objects (:class:`DictWriter` instances and objects returned by
369the :func:`writer` function) have the following public methods. A *row* must be
370a sequence of strings or numbers for :class:`Writer` objects and a dictionary
371mapping fieldnames to strings or numbers (by passing them through :func:`str`
372first) for :class:`DictWriter` objects. Note that complex numbers are written
373out surrounded by parens. This may cause some problems for other programs which
374read CSV files (assuming they support complex numbers at all).
375
376
377.. method:: csvwriter.writerow(row)
378
379 Write the *row* parameter to the writer's file object, formatted according to
380 the current dialect.
381
382
383.. method:: csvwriter.writerows(rows)
384
385 Write all the *rows* parameters (a list of *row* objects as described above) to
386 the writer's file object, formatted according to the current dialect.
387
388Writer objects have the following public attribute:
389
390
391.. attribute:: csvwriter.dialect
392
393 A read-only description of the dialect in use by the writer.
394
395
396.. _csv-examples:
397
398Examples
399--------
400
401The simplest example of reading a CSV file::
402
403 import csv
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000404 reader = csv.reader(open("some.csv", newline=''))
Georg Brandl116aa622007-08-15 14:28:22 +0000405 for row in reader:
Georg Brandl6911e3c2007-09-04 07:15:32 +0000406 print(row)
Georg Brandl116aa622007-08-15 14:28:22 +0000407
408Reading a file with an alternate format::
409
410 import csv
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000411 reader = csv.reader(open("passwd"), delimiter=':', quoting=csv.QUOTE_NONE)
Georg Brandl116aa622007-08-15 14:28:22 +0000412 for row in reader:
Georg Brandl6911e3c2007-09-04 07:15:32 +0000413 print(row)
Georg Brandl116aa622007-08-15 14:28:22 +0000414
415The corresponding simplest possible writing example is::
416
417 import csv
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000418 writer = csv.writer(open("some.csv", "w"))
Georg Brandl116aa622007-08-15 14:28:22 +0000419 writer.writerows(someiterable)
420
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000421Since :func:`open` is used to open a CSV file for reading, the file
422will by default be decoded into unicode using the system default
423encoding (see :func:`locale.getpreferredencoding`). To decode a file
424using a different encoding, use the ``encoding`` argument of open::
425
426 import csv
427 reader = csv.reader(open("some.csv", newline='', encoding='utf-8'))
428 for row in reader:
429 print(row)
430
431The same applies to writing in something other than the system default
432encoding: specify the encoding argument when opening the output file.
433
Georg Brandl116aa622007-08-15 14:28:22 +0000434Registering a new dialect::
435
436 import csv
Georg Brandl116aa622007-08-15 14:28:22 +0000437 csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000438 reader = csv.reader(open("passwd"), 'unixpwd')
Georg Brandl116aa622007-08-15 14:28:22 +0000439
440A slightly more advanced use of the reader --- catching and reporting errors::
441
442 import csv, sys
443 filename = "some.csv"
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000444 reader = csv.reader(open(filename, newline=''))
Georg Brandl116aa622007-08-15 14:28:22 +0000445 try:
446 for row in reader:
Georg Brandl6911e3c2007-09-04 07:15:32 +0000447 print(row)
Georg Brandl116aa622007-08-15 14:28:22 +0000448 except csv.Error as e:
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000449 sys.exit('file {}, line {}: {}'.format(filename, reader.line_num, e))
Georg Brandl116aa622007-08-15 14:28:22 +0000450
451And while the module doesn't directly support parsing strings, it can easily be
452done::
453
454 import csv
455 for row in csv.reader(['one,two,three']):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000456 print(row)
Georg Brandl116aa622007-08-15 14:28:22 +0000457
Georg Brandl116aa622007-08-15 14:28:22 +0000458
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000459.. rubric:: Footnotes
Georg Brandl116aa622007-08-15 14:28:22 +0000460
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000461.. [#] If ``newline=''`` is not specified, newlines embedded inside quoted fields
462 will not be interpreted correctly. It should always be safe to specify
463 ``newline=''``, since the csv module does its own universal newline handling
464 on input.