blob: a20c4be901b043b1697616b784f22756b2c20d71 [file] [log] [blame]
Georg Brandl116aa622007-08-15 14:28:22 +00001:mod:`csv` --- CSV File Reading and Writing
2===========================================
3
4.. module:: csv
5 :synopsis: Write and read tabular data to and from delimited files.
6.. sectionauthor:: Skip Montanaro <skip@pobox.com>
7
8
Georg Brandl116aa622007-08-15 14:28:22 +00009.. index::
10 single: csv
11 pair: data; tabular
12
13The so-called CSV (Comma Separated Values) format is the most common import and
Skip Montanarob40dea72011-03-19 09:09:30 -050014export format for spreadsheets and databases. CSV format was used for many
15years prior to attempts to describe the format in a standardized way in
16:rfc:`4180`. The lack of a well-defined standard means that subtle differences
17often exist in the data produced and consumed by different applications. These
18differences can make it annoying to process CSV files from multiple sources.
19Still, while the delimiters and quoting characters vary, the overall format is
20similar enough that it is possible to write a single module which can
21efficiently manipulate such data, hiding the details of reading and writing the
22data from the programmer.
Georg Brandl116aa622007-08-15 14:28:22 +000023
24The :mod:`csv` module implements classes to read and write tabular data in CSV
25format. It allows programmers to say, "write this data in the format preferred
26by Excel," or "read data from this file which was generated by Excel," without
27knowing the precise details of the CSV format used by Excel. Programmers can
28also describe the CSV formats understood by other applications or define their
29own special-purpose CSV formats.
30
31The :mod:`csv` module's :class:`reader` and :class:`writer` objects read and
32write sequences. Programmers can also read and write data in dictionary form
33using the :class:`DictReader` and :class:`DictWriter` classes.
34
Georg Brandl116aa622007-08-15 14:28:22 +000035.. seealso::
36
Georg Brandl116aa622007-08-15 14:28:22 +000037 :pep:`305` - CSV File API
38 The Python Enhancement Proposal which proposed this addition to Python.
39
40
41.. _csv-contents:
42
43Module Contents
44---------------
45
46The :mod:`csv` module defines the following functions:
47
48
R David Murray1b00f252012-08-15 10:43:58 -040049.. index::
50 single: universal newlines; csv.reader function
51
Georg Brandlc2a4f4f2009-04-10 09:03:43 +000052.. function:: reader(csvfile, dialect='excel', **fmtparams)
Georg Brandl116aa622007-08-15 14:28:22 +000053
54 Return a reader object which will iterate over lines in the given *csvfile*.
Georg Brandl9afde1c2007-11-01 20:32:30 +000055 *csvfile* can be any object which supports the :term:`iterator` protocol and returns a
Georg Brandlb30f3302011-01-06 09:23:56 +000056 string each time its :meth:`!__next__` method is called --- :term:`file objects
Antoine Pitrou11cb9612010-09-15 11:11:28 +000057 <file object>` and list objects are both suitable. If *csvfile* is a file object,
R David Murray91887022011-03-19 22:30:14 -040058 it should be opened with ``newline=''``. [1]_ An optional
Georg Brandl116aa622007-08-15 14:28:22 +000059 *dialect* parameter can be given which is used to define a set of parameters
60 specific to a particular CSV dialect. It may be an instance of a subclass of
61 the :class:`Dialect` class or one of the strings returned by the
Georg Brandlc2a4f4f2009-04-10 09:03:43 +000062 :func:`list_dialects` function. The other optional *fmtparams* keyword arguments
Georg Brandl116aa622007-08-15 14:28:22 +000063 can be given to override individual formatting parameters in the current
64 dialect. For full details about the dialect and formatting parameters, see
65 section :ref:`csv-fmt-params`.
66
Skip Montanaro0468df32009-03-25 00:52:18 +000067 Each row read from the csv file is returned as a list of strings. No
R. David Murray8b7d4aa2009-04-04 01:38:38 +000068 automatic data type conversion is performed unless the ``QUOTE_NONNUMERIC`` format
69 option is specified (in which case unquoted fields are transformed into floats).
Georg Brandl116aa622007-08-15 14:28:22 +000070
Christian Heimesb9eccbf2007-12-05 20:18:38 +000071 A short usage example::
Georg Brandl48310cd2009-01-03 21:18:54 +000072
Christian Heimesb9eccbf2007-12-05 20:18:38 +000073 >>> import csv
Ezio Melottie34f8a92012-09-15 05:51:45 +030074 >>> with open('eggs.csv', newline='') as csvfile:
75 ... spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
76 ... for row in spamreader:
77 ... print(', '.join(row))
Christian Heimesb9eccbf2007-12-05 20:18:38 +000078 Spam, Spam, Spam, Spam, Spam, Baked Beans
79 Spam, Lovely Spam, Wonderful Spam
80
Georg Brandl116aa622007-08-15 14:28:22 +000081
Georg Brandlc2a4f4f2009-04-10 09:03:43 +000082.. function:: writer(csvfile, dialect='excel', **fmtparams)
Georg Brandl116aa622007-08-15 14:28:22 +000083
84 Return a writer object responsible for converting the user's data into delimited
85 strings on the given file-like object. *csvfile* can be any object with a
R David Murray9c0d5ea2011-03-20 11:18:21 -040086 :func:`write` method. If *csvfile* is a file object, it should be opened with
87 ``newline=''`` [1]_. An optional *dialect*
Georg Brandl116aa622007-08-15 14:28:22 +000088 parameter can be given which is used to define a set of parameters specific to a
89 particular CSV dialect. It may be an instance of a subclass of the
90 :class:`Dialect` class or one of the strings returned by the
Georg Brandlc2a4f4f2009-04-10 09:03:43 +000091 :func:`list_dialects` function. The other optional *fmtparams* keyword arguments
Georg Brandl116aa622007-08-15 14:28:22 +000092 can be given to override individual formatting parameters in the current
93 dialect. For full details about the dialect and formatting parameters, see
94 section :ref:`csv-fmt-params`. To make it
95 as easy as possible to interface with modules which implement the DB API, the
96 value :const:`None` is written as the empty string. While this isn't a
97 reversible transformation, it makes it easier to dump SQL NULL data values to
98 CSV files without preprocessing the data returned from a ``cursor.fetch*`` call.
99 All other non-string data are stringified with :func:`str` before being written.
100
Christian Heimesb9eccbf2007-12-05 20:18:38 +0000101 A short usage example::
102
Ezio Melottie34f8a92012-09-15 05:51:45 +0300103 import csv
104 with open('eggs.csv', 'w', newline='') as csvfile:
105 spamwriter = csv.writer(csvfile, delimiter=' ',
106 quotechar='|', quoting=csv.QUOTE_MINIMAL)
107 spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
108 spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
Christian Heimesb9eccbf2007-12-05 20:18:38 +0000109
Georg Brandl116aa622007-08-15 14:28:22 +0000110
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000111.. function:: register_dialect(name[, dialect], **fmtparams)
Georg Brandl116aa622007-08-15 14:28:22 +0000112
Georg Brandlf6945182008-02-01 11:56:49 +0000113 Associate *dialect* with *name*. *name* must be a string. The
Georg Brandl116aa622007-08-15 14:28:22 +0000114 dialect can be specified either by passing a sub-class of :class:`Dialect`, or
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000115 by *fmtparams* keyword arguments, or both, with keyword arguments overriding
Georg Brandl116aa622007-08-15 14:28:22 +0000116 parameters of the dialect. For full details about the dialect and formatting
117 parameters, see section :ref:`csv-fmt-params`.
118
119
120.. function:: unregister_dialect(name)
121
122 Delete the dialect associated with *name* from the dialect registry. An
123 :exc:`Error` is raised if *name* is not a registered dialect name.
124
125
126.. function:: get_dialect(name)
127
Georg Brandl6554cb92007-12-02 23:15:43 +0000128 Return the dialect associated with *name*. An :exc:`Error` is raised if
129 *name* is not a registered dialect name. This function returns an immutable
130 :class:`Dialect`.
Georg Brandl116aa622007-08-15 14:28:22 +0000131
132.. function:: list_dialects()
133
134 Return the names of all registered dialects.
135
136
137.. function:: field_size_limit([new_limit])
138
139 Returns the current maximum field size allowed by the parser. If *new_limit* is
140 given, this becomes the new limit.
141
Georg Brandl116aa622007-08-15 14:28:22 +0000142
143The :mod:`csv` module defines the following classes:
144
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000145.. class:: DictReader(csvfile, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)
Georg Brandl116aa622007-08-15 14:28:22 +0000146
147 Create an object which operates like a regular reader but maps the information
148 read into a dict whose keys are given by the optional *fieldnames* parameter.
149 If the *fieldnames* parameter is omitted, the values in the first row of the
R. David Murray578ec902009-11-09 14:21:38 +0000150 *csvfile* will be used as the fieldnames. If the row read has more fields
151 than the fieldnames sequence, the remaining data is added as a sequence
152 keyed by the value of *restkey*. If the row read has fewer fields than the
153 fieldnames sequence, the remaining keys take the value of the optional
154 *restval* parameter. Any other optional or keyword arguments are passed to
155 the underlying :class:`reader` instance.
Georg Brandl116aa622007-08-15 14:28:22 +0000156
157
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000158.. class:: DictWriter(csvfile, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)
Georg Brandl116aa622007-08-15 14:28:22 +0000159
160 Create an object which operates like a regular writer but maps dictionaries onto
161 output rows. The *fieldnames* parameter identifies the order in which values in
162 the dictionary passed to the :meth:`writerow` method are written to the
163 *csvfile*. The optional *restval* parameter specifies the value to be written
164 if the dictionary is missing a key in *fieldnames*. If the dictionary passed to
165 the :meth:`writerow` method contains a key not found in *fieldnames*, the
166 optional *extrasaction* parameter indicates what action to take. If it is set
167 to ``'raise'`` a :exc:`ValueError` is raised. If it is set to ``'ignore'``,
168 extra values in the dictionary are ignored. Any other optional or keyword
169 arguments are passed to the underlying :class:`writer` instance.
170
171 Note that unlike the :class:`DictReader` class, the *fieldnames* parameter of
172 the :class:`DictWriter` is not optional. Since Python's :class:`dict` objects
173 are not ordered, there is not enough information available to deduce the order
174 in which the row should be written to the *csvfile*.
175
176
177.. class:: Dialect
178
179 The :class:`Dialect` class is a container class relied on primarily for its
180 attributes, which are used to define the parameters for a specific
181 :class:`reader` or :class:`writer` instance.
182
183
184.. class:: excel()
185
186 The :class:`excel` class defines the usual properties of an Excel-generated CSV
187 file. It is registered with the dialect name ``'excel'``.
188
189
190.. class:: excel_tab()
191
192 The :class:`excel_tab` class defines the usual properties of an Excel-generated
193 TAB-delimited file. It is registered with the dialect name ``'excel-tab'``.
194
195
Georg Brandl7424dd32010-10-27 07:27:06 +0000196.. class:: unix_dialect()
197
198 The :class:`unix_dialect` class defines the usual properties of a CSV file
199 generated on UNIX systems, i.e. using ``'\n'`` as line terminator and quoting
200 all fields. It is registered with the dialect name ``'unix'``.
201
202 .. versionadded:: 3.2
203
204
Georg Brandl116aa622007-08-15 14:28:22 +0000205.. class:: Sniffer()
206
207 The :class:`Sniffer` class is used to deduce the format of a CSV file.
208
Benjamin Petersone41251e2008-04-25 01:59:09 +0000209 The :class:`Sniffer` class provides two methods:
Georg Brandl116aa622007-08-15 14:28:22 +0000210
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000211 .. method:: sniff(sample, delimiters=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000212
Benjamin Petersone41251e2008-04-25 01:59:09 +0000213 Analyze the given *sample* and return a :class:`Dialect` subclass
214 reflecting the parameters found. If the optional *delimiters* parameter
215 is given, it is interpreted as a string containing possible valid
216 delimiter characters.
Georg Brandl116aa622007-08-15 14:28:22 +0000217
218
Benjamin Petersone41251e2008-04-25 01:59:09 +0000219 .. method:: has_header(sample)
Georg Brandl116aa622007-08-15 14:28:22 +0000220
Benjamin Petersone41251e2008-04-25 01:59:09 +0000221 Analyze the sample text (presumed to be in CSV format) and return
222 :const:`True` if the first row appears to be a series of column headers.
Georg Brandl116aa622007-08-15 14:28:22 +0000223
Christian Heimes7f044312008-01-06 17:05:40 +0000224An example for :class:`Sniffer` use::
Georg Brandl116aa622007-08-15 14:28:22 +0000225
Ezio Melottie34f8a92012-09-15 05:51:45 +0300226 with open('example.csv') as csvfile:
227 dialect = csv.Sniffer().sniff(csvfile.read(1024))
228 csvfile.seek(0)
229 reader = csv.reader(csvfile, dialect)
230 # ... process CSV file contents here ...
Christian Heimes7f044312008-01-06 17:05:40 +0000231
232
233The :mod:`csv` module defines the following constants:
Georg Brandl116aa622007-08-15 14:28:22 +0000234
235.. data:: QUOTE_ALL
236
237 Instructs :class:`writer` objects to quote all fields.
238
239
240.. data:: QUOTE_MINIMAL
241
242 Instructs :class:`writer` objects to only quote those fields which contain
243 special characters such as *delimiter*, *quotechar* or any of the characters in
244 *lineterminator*.
245
246
247.. data:: QUOTE_NONNUMERIC
248
249 Instructs :class:`writer` objects to quote all non-numeric fields.
250
251 Instructs the reader to convert all non-quoted fields to type *float*.
252
253
254.. data:: QUOTE_NONE
255
256 Instructs :class:`writer` objects to never quote fields. When the current
257 *delimiter* occurs in output data it is preceded by the current *escapechar*
258 character. If *escapechar* is not set, the writer will raise :exc:`Error` if
259 any characters that require escaping are encountered.
260
261 Instructs :class:`reader` to perform no special processing of quote characters.
262
263The :mod:`csv` module defines the following exception:
264
265
266.. exception:: Error
267
268 Raised by any of the functions when an error is detected.
269
Georg Brandl116aa622007-08-15 14:28:22 +0000270.. _csv-fmt-params:
271
272Dialects and Formatting Parameters
273----------------------------------
274
275To make it easier to specify the format of input and output records, specific
276formatting parameters are grouped together into dialects. A dialect is a
277subclass of the :class:`Dialect` class having a set of specific methods and a
278single :meth:`validate` method. When creating :class:`reader` or
279:class:`writer` objects, the programmer can specify a string or a subclass of
280the :class:`Dialect` class as the dialect parameter. In addition to, or instead
281of, the *dialect* parameter, the programmer can also specify individual
282formatting parameters, which have the same names as the attributes defined below
283for the :class:`Dialect` class.
284
285Dialects support the following attributes:
286
287
288.. attribute:: Dialect.delimiter
289
290 A one-character string used to separate fields. It defaults to ``','``.
291
292
293.. attribute:: Dialect.doublequote
294
295 Controls how instances of *quotechar* appearing inside a field should be
296 themselves be quoted. When :const:`True`, the character is doubled. When
297 :const:`False`, the *escapechar* is used as a prefix to the *quotechar*. It
298 defaults to :const:`True`.
299
300 On output, if *doublequote* is :const:`False` and no *escapechar* is set,
301 :exc:`Error` is raised if a *quotechar* is found in a field.
302
303
304.. attribute:: Dialect.escapechar
305
306 A one-character string used by the writer to escape the *delimiter* if *quoting*
307 is set to :const:`QUOTE_NONE` and the *quotechar* if *doublequote* is
308 :const:`False`. On reading, the *escapechar* removes any special meaning from
309 the following character. It defaults to :const:`None`, which disables escaping.
310
311
312.. attribute:: Dialect.lineterminator
313
314 The string used to terminate lines produced by the :class:`writer`. It defaults
315 to ``'\r\n'``.
316
317 .. note::
318
319 The :class:`reader` is hard-coded to recognise either ``'\r'`` or ``'\n'`` as
320 end-of-line, and ignores *lineterminator*. This behavior may change in the
321 future.
322
323
324.. attribute:: Dialect.quotechar
325
326 A one-character string used to quote fields containing special characters, such
327 as the *delimiter* or *quotechar*, or which contain new-line characters. It
328 defaults to ``'"'``.
329
330
331.. attribute:: Dialect.quoting
332
333 Controls when quotes should be generated by the writer and recognised by the
334 reader. It can take on any of the :const:`QUOTE_\*` constants (see section
335 :ref:`csv-contents`) and defaults to :const:`QUOTE_MINIMAL`.
336
337
338.. attribute:: Dialect.skipinitialspace
339
340 When :const:`True`, whitespace immediately following the *delimiter* is ignored.
341 The default is :const:`False`.
342
343
Ezio Melottia69be282012-11-18 12:55:35 +0200344.. attribute:: Dialect.strict
345
346 When ``True``, raise exception :exc:`Error` on bad CSV input.
347 The default is ``False``.
348
Georg Brandl116aa622007-08-15 14:28:22 +0000349Reader Objects
350--------------
351
352Reader objects (:class:`DictReader` instances and objects returned by the
353:func:`reader` function) have the following public methods:
354
Georg Brandlc7485062009-04-01 15:53:15 +0000355.. method:: csvreader.__next__()
Georg Brandl116aa622007-08-15 14:28:22 +0000356
357 Return the next row of the reader's iterable object as a list, parsed according
Georg Brandlc7485062009-04-01 15:53:15 +0000358 to the current dialect. Usually you should call this as ``next(reader)``.
359
Georg Brandl116aa622007-08-15 14:28:22 +0000360
361Reader objects have the following public attributes:
362
Georg Brandl116aa622007-08-15 14:28:22 +0000363.. attribute:: csvreader.dialect
364
365 A read-only description of the dialect in use by the parser.
366
367
368.. attribute:: csvreader.line_num
369
370 The number of lines read from the source iterator. This is not the same as the
371 number of records returned, as records can span multiple lines.
372
Georg Brandl116aa622007-08-15 14:28:22 +0000373
Skip Montanaroaf8fcfa2008-08-09 19:44:22 +0000374DictReader objects have the following public attribute:
375
Skip Montanaroaf8fcfa2008-08-09 19:44:22 +0000376.. attribute:: csvreader.fieldnames
377
378 If not passed as a parameter when creating the object, this attribute is
379 initialized upon first access or when the first record is read from the
380 file.
381
Skip Montanaroaf8fcfa2008-08-09 19:44:22 +0000382
383
Georg Brandl116aa622007-08-15 14:28:22 +0000384Writer Objects
385--------------
386
387:class:`Writer` objects (:class:`DictWriter` instances and objects returned by
388the :func:`writer` function) have the following public methods. A *row* must be
389a sequence of strings or numbers for :class:`Writer` objects and a dictionary
390mapping fieldnames to strings or numbers (by passing them through :func:`str`
391first) for :class:`DictWriter` objects. Note that complex numbers are written
392out surrounded by parens. This may cause some problems for other programs which
393read CSV files (assuming they support complex numbers at all).
394
395
396.. method:: csvwriter.writerow(row)
397
398 Write the *row* parameter to the writer's file object, formatted according to
399 the current dialect.
400
401
402.. method:: csvwriter.writerows(rows)
403
404 Write all the *rows* parameters (a list of *row* objects as described above) to
405 the writer's file object, formatted according to the current dialect.
406
407Writer objects have the following public attribute:
408
409
410.. attribute:: csvwriter.dialect
411
412 A read-only description of the dialect in use by the writer.
413
414
R. David Murraybe0698b2010-02-23 22:57:58 +0000415DictWriter objects have the following public method:
416
417
418.. method:: DictWriter.writeheader()
419
420 Write a row with the field names (as specified in the constructor).
421
R. David Murray19e45482010-02-23 23:00:34 +0000422 .. versionadded:: 3.2
R. David Murraybe0698b2010-02-23 22:57:58 +0000423
424
Georg Brandl116aa622007-08-15 14:28:22 +0000425.. _csv-examples:
426
427Examples
428--------
429
430The simplest example of reading a CSV file::
431
432 import csv
Eli Bendersky6860a922011-03-11 15:47:36 +0200433 with open('some.csv', newline='') as f:
434 reader = csv.reader(f)
435 for row in reader:
436 print(row)
Georg Brandl116aa622007-08-15 14:28:22 +0000437
438Reading a file with an alternate format::
439
440 import csv
R David Murray91887022011-03-19 22:30:14 -0400441 with open('passwd', newline='') as f:
Eli Bendersky6860a922011-03-11 15:47:36 +0200442 reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
443 for row in reader:
444 print(row)
Georg Brandl116aa622007-08-15 14:28:22 +0000445
446The corresponding simplest possible writing example is::
447
448 import csv
R David Murray91887022011-03-19 22:30:14 -0400449 with open('some.csv', 'w', newline='') as f:
Eli Bendersky6860a922011-03-11 15:47:36 +0200450 writer = csv.writer(f)
451 writer.writerows(someiterable)
Georg Brandl116aa622007-08-15 14:28:22 +0000452
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000453Since :func:`open` is used to open a CSV file for reading, the file
454will by default be decoded into unicode using the system default
455encoding (see :func:`locale.getpreferredencoding`). To decode a file
456using a different encoding, use the ``encoding`` argument of open::
457
Eli Bendersky6860a922011-03-11 15:47:36 +0200458 import csv
459 with open('some.csv', newline='', encoding='utf-8') as f:
460 reader = csv.reader(f)
461 for row in reader:
462 print(row)
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000463
464The same applies to writing in something other than the system default
465encoding: specify the encoding argument when opening the output file.
466
Georg Brandl116aa622007-08-15 14:28:22 +0000467Registering a new dialect::
468
469 import csv
Georg Brandl116aa622007-08-15 14:28:22 +0000470 csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
R David Murray91887022011-03-19 22:30:14 -0400471 with open('passwd', newline='') as f:
Eli Bendersky6860a922011-03-11 15:47:36 +0200472 reader = csv.reader(f, 'unixpwd')
Georg Brandl116aa622007-08-15 14:28:22 +0000473
474A slightly more advanced use of the reader --- catching and reporting errors::
475
476 import csv, sys
Eli Bendersky6860a922011-03-11 15:47:36 +0200477 filename = 'some.csv'
478 with open(filename, newline='') as f:
479 reader = csv.reader(f)
480 try:
481 for row in reader:
482 print(row)
483 except csv.Error as e:
484 sys.exit('file {}, line {}: {}'.format(filename, reader.line_num, e))
Georg Brandl116aa622007-08-15 14:28:22 +0000485
486And while the module doesn't directly support parsing strings, it can easily be
487done::
488
489 import csv
490 for row in csv.reader(['one,two,three']):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000491 print(row)
Georg Brandl116aa622007-08-15 14:28:22 +0000492
Georg Brandl116aa622007-08-15 14:28:22 +0000493
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000494.. rubric:: Footnotes
Georg Brandl116aa622007-08-15 14:28:22 +0000495
R David Murray91887022011-03-19 22:30:14 -0400496.. [1] If ``newline=''`` is not specified, newlines embedded inside quoted fields
497 will not be interpreted correctly, and on platforms that use ``\r\n`` linendings
R David Murray9c0d5ea2011-03-20 11:18:21 -0400498 on write an extra ``\r`` will be added. It should always be safe to specify
R David Murray1b00f252012-08-15 10:43:58 -0400499 ``newline=''``, since the csv module does its own
500 (:term:`universal <universal newlines>`) newline handling.