blob: ec0dfccfab7b9f34934969fa5c7aee437e7b1752 [file] [log] [blame]
Georg Brandl116aa622007-08-15 14:28:22 +00001:mod:`csv` --- CSV File Reading and Writing
2===========================================
3
4.. module:: csv
5 :synopsis: Write and read tabular data to and from delimited files.
6.. sectionauthor:: Skip Montanaro <skip@pobox.com>
7
8
Georg Brandl116aa622007-08-15 14:28:22 +00009.. index::
10 single: csv
11 pair: data; tabular
12
13The so-called CSV (Comma Separated Values) format is the most common import and
Skip Montanarob40dea72011-03-19 09:09:30 -050014export format for spreadsheets and databases. CSV format was used for many
15years prior to attempts to describe the format in a standardized way in
16:rfc:`4180`. The lack of a well-defined standard means that subtle differences
17often exist in the data produced and consumed by different applications. These
18differences can make it annoying to process CSV files from multiple sources.
19Still, while the delimiters and quoting characters vary, the overall format is
20similar enough that it is possible to write a single module which can
21efficiently manipulate such data, hiding the details of reading and writing the
22data from the programmer.
Georg Brandl116aa622007-08-15 14:28:22 +000023
24The :mod:`csv` module implements classes to read and write tabular data in CSV
25format. It allows programmers to say, "write this data in the format preferred
26by Excel," or "read data from this file which was generated by Excel," without
27knowing the precise details of the CSV format used by Excel. Programmers can
28also describe the CSV formats understood by other applications or define their
29own special-purpose CSV formats.
30
31The :mod:`csv` module's :class:`reader` and :class:`writer` objects read and
32write sequences. Programmers can also read and write data in dictionary form
33using the :class:`DictReader` and :class:`DictWriter` classes.
34
Georg Brandl116aa622007-08-15 14:28:22 +000035.. seealso::
36
Georg Brandl116aa622007-08-15 14:28:22 +000037 :pep:`305` - CSV File API
38 The Python Enhancement Proposal which proposed this addition to Python.
39
40
41.. _csv-contents:
42
43Module Contents
44---------------
45
46The :mod:`csv` module defines the following functions:
47
48
Georg Brandlc2a4f4f2009-04-10 09:03:43 +000049.. function:: reader(csvfile, dialect='excel', **fmtparams)
Georg Brandl116aa622007-08-15 14:28:22 +000050
51 Return a reader object which will iterate over lines in the given *csvfile*.
Georg Brandl9afde1c2007-11-01 20:32:30 +000052 *csvfile* can be any object which supports the :term:`iterator` protocol and returns a
Georg Brandlb30f3302011-01-06 09:23:56 +000053 string each time its :meth:`!__next__` method is called --- :term:`file objects
Antoine Pitrou11cb9612010-09-15 11:11:28 +000054 <file object>` and list objects are both suitable. If *csvfile* is a file object,
R David Murray91887022011-03-19 22:30:14 -040055 it should be opened with ``newline=''``. [1]_ An optional
Georg Brandl116aa622007-08-15 14:28:22 +000056 *dialect* parameter can be given which is used to define a set of parameters
57 specific to a particular CSV dialect. It may be an instance of a subclass of
58 the :class:`Dialect` class or one of the strings returned by the
Georg Brandlc2a4f4f2009-04-10 09:03:43 +000059 :func:`list_dialects` function. The other optional *fmtparams* keyword arguments
Georg Brandl116aa622007-08-15 14:28:22 +000060 can be given to override individual formatting parameters in the current
61 dialect. For full details about the dialect and formatting parameters, see
62 section :ref:`csv-fmt-params`.
63
Skip Montanaro0468df32009-03-25 00:52:18 +000064 Each row read from the csv file is returned as a list of strings. No
R. David Murray8b7d4aa2009-04-04 01:38:38 +000065 automatic data type conversion is performed unless the ``QUOTE_NONNUMERIC`` format
66 option is specified (in which case unquoted fields are transformed into floats).
Georg Brandl116aa622007-08-15 14:28:22 +000067
Christian Heimesb9eccbf2007-12-05 20:18:38 +000068 A short usage example::
Georg Brandl48310cd2009-01-03 21:18:54 +000069
Christian Heimesb9eccbf2007-12-05 20:18:38 +000070 >>> import csv
R. David Murray8b7d4aa2009-04-04 01:38:38 +000071 >>> spamReader = csv.reader(open('eggs.csv', newline=''), delimiter=' ', quotechar='|')
Christian Heimesb9eccbf2007-12-05 20:18:38 +000072 >>> for row in spamReader:
Georg Brandlf6945182008-02-01 11:56:49 +000073 ... print(', '.join(row))
Christian Heimesb9eccbf2007-12-05 20:18:38 +000074 Spam, Spam, Spam, Spam, Spam, Baked Beans
75 Spam, Lovely Spam, Wonderful Spam
76
Georg Brandl116aa622007-08-15 14:28:22 +000077
Georg Brandlc2a4f4f2009-04-10 09:03:43 +000078.. function:: writer(csvfile, dialect='excel', **fmtparams)
Georg Brandl116aa622007-08-15 14:28:22 +000079
80 Return a writer object responsible for converting the user's data into delimited
81 strings on the given file-like object. *csvfile* can be any object with a
R David Murray9c0d5ea2011-03-20 11:18:21 -040082 :func:`write` method. If *csvfile* is a file object, it should be opened with
83 ``newline=''`` [1]_. An optional *dialect*
Georg Brandl116aa622007-08-15 14:28:22 +000084 parameter can be given which is used to define a set of parameters specific to a
85 particular CSV dialect. It may be an instance of a subclass of the
86 :class:`Dialect` class or one of the strings returned by the
Georg Brandlc2a4f4f2009-04-10 09:03:43 +000087 :func:`list_dialects` function. The other optional *fmtparams* keyword arguments
Georg Brandl116aa622007-08-15 14:28:22 +000088 can be given to override individual formatting parameters in the current
89 dialect. For full details about the dialect and formatting parameters, see
90 section :ref:`csv-fmt-params`. To make it
91 as easy as possible to interface with modules which implement the DB API, the
92 value :const:`None` is written as the empty string. While this isn't a
93 reversible transformation, it makes it easier to dump SQL NULL data values to
94 CSV files without preprocessing the data returned from a ``cursor.fetch*`` call.
95 All other non-string data are stringified with :func:`str` before being written.
96
Christian Heimesb9eccbf2007-12-05 20:18:38 +000097 A short usage example::
98
99 >>> import csv
R David Murray91887022011-03-19 22:30:14 -0400100 >>> spamWriter = csv.writer(open('eggs.csv', 'w', newline=''), delimiter=' ',
Skip Montanarof290cb52009-08-18 14:37:42 +0000101 ... quotechar='|', quoting=csv.QUOTE_MINIMAL)
Christian Heimesb9eccbf2007-12-05 20:18:38 +0000102 >>> spamWriter.writerow(['Spam'] * 5 + ['Baked Beans'])
103 >>> spamWriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
104
Georg Brandl116aa622007-08-15 14:28:22 +0000105
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000106.. function:: register_dialect(name[, dialect], **fmtparams)
Georg Brandl116aa622007-08-15 14:28:22 +0000107
Georg Brandlf6945182008-02-01 11:56:49 +0000108 Associate *dialect* with *name*. *name* must be a string. The
Georg Brandl116aa622007-08-15 14:28:22 +0000109 dialect can be specified either by passing a sub-class of :class:`Dialect`, or
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000110 by *fmtparams* keyword arguments, or both, with keyword arguments overriding
Georg Brandl116aa622007-08-15 14:28:22 +0000111 parameters of the dialect. For full details about the dialect and formatting
112 parameters, see section :ref:`csv-fmt-params`.
113
114
115.. function:: unregister_dialect(name)
116
117 Delete the dialect associated with *name* from the dialect registry. An
118 :exc:`Error` is raised if *name* is not a registered dialect name.
119
120
121.. function:: get_dialect(name)
122
Georg Brandl6554cb92007-12-02 23:15:43 +0000123 Return the dialect associated with *name*. An :exc:`Error` is raised if
124 *name* is not a registered dialect name. This function returns an immutable
125 :class:`Dialect`.
Georg Brandl116aa622007-08-15 14:28:22 +0000126
127.. function:: list_dialects()
128
129 Return the names of all registered dialects.
130
131
132.. function:: field_size_limit([new_limit])
133
134 Returns the current maximum field size allowed by the parser. If *new_limit* is
135 given, this becomes the new limit.
136
Georg Brandl116aa622007-08-15 14:28:22 +0000137
138The :mod:`csv` module defines the following classes:
139
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000140.. class:: DictReader(csvfile, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)
Georg Brandl116aa622007-08-15 14:28:22 +0000141
142 Create an object which operates like a regular reader but maps the information
143 read into a dict whose keys are given by the optional *fieldnames* parameter.
144 If the *fieldnames* parameter is omitted, the values in the first row of the
R. David Murray578ec902009-11-09 14:21:38 +0000145 *csvfile* will be used as the fieldnames. If the row read has more fields
146 than the fieldnames sequence, the remaining data is added as a sequence
147 keyed by the value of *restkey*. If the row read has fewer fields than the
148 fieldnames sequence, the remaining keys take the value of the optional
149 *restval* parameter. Any other optional or keyword arguments are passed to
150 the underlying :class:`reader` instance.
Georg Brandl116aa622007-08-15 14:28:22 +0000151
152
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000153.. class:: DictWriter(csvfile, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)
Georg Brandl116aa622007-08-15 14:28:22 +0000154
155 Create an object which operates like a regular writer but maps dictionaries onto
156 output rows. The *fieldnames* parameter identifies the order in which values in
157 the dictionary passed to the :meth:`writerow` method are written to the
158 *csvfile*. The optional *restval* parameter specifies the value to be written
159 if the dictionary is missing a key in *fieldnames*. If the dictionary passed to
160 the :meth:`writerow` method contains a key not found in *fieldnames*, the
161 optional *extrasaction* parameter indicates what action to take. If it is set
162 to ``'raise'`` a :exc:`ValueError` is raised. If it is set to ``'ignore'``,
163 extra values in the dictionary are ignored. Any other optional or keyword
164 arguments are passed to the underlying :class:`writer` instance.
165
166 Note that unlike the :class:`DictReader` class, the *fieldnames* parameter of
167 the :class:`DictWriter` is not optional. Since Python's :class:`dict` objects
168 are not ordered, there is not enough information available to deduce the order
169 in which the row should be written to the *csvfile*.
170
171
172.. class:: Dialect
173
174 The :class:`Dialect` class is a container class relied on primarily for its
175 attributes, which are used to define the parameters for a specific
176 :class:`reader` or :class:`writer` instance.
177
178
179.. class:: excel()
180
181 The :class:`excel` class defines the usual properties of an Excel-generated CSV
182 file. It is registered with the dialect name ``'excel'``.
183
184
185.. class:: excel_tab()
186
187 The :class:`excel_tab` class defines the usual properties of an Excel-generated
188 TAB-delimited file. It is registered with the dialect name ``'excel-tab'``.
189
190
Georg Brandl7424dd32010-10-27 07:27:06 +0000191.. class:: unix_dialect()
192
193 The :class:`unix_dialect` class defines the usual properties of a CSV file
194 generated on UNIX systems, i.e. using ``'\n'`` as line terminator and quoting
195 all fields. It is registered with the dialect name ``'unix'``.
196
197 .. versionadded:: 3.2
198
199
Georg Brandl116aa622007-08-15 14:28:22 +0000200.. class:: Sniffer()
201
202 The :class:`Sniffer` class is used to deduce the format of a CSV file.
203
Benjamin Petersone41251e2008-04-25 01:59:09 +0000204 The :class:`Sniffer` class provides two methods:
Georg Brandl116aa622007-08-15 14:28:22 +0000205
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000206 .. method:: sniff(sample, delimiters=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000207
Benjamin Petersone41251e2008-04-25 01:59:09 +0000208 Analyze the given *sample* and return a :class:`Dialect` subclass
209 reflecting the parameters found. If the optional *delimiters* parameter
210 is given, it is interpreted as a string containing possible valid
211 delimiter characters.
Georg Brandl116aa622007-08-15 14:28:22 +0000212
213
Benjamin Petersone41251e2008-04-25 01:59:09 +0000214 .. method:: has_header(sample)
Georg Brandl116aa622007-08-15 14:28:22 +0000215
Benjamin Petersone41251e2008-04-25 01:59:09 +0000216 Analyze the sample text (presumed to be in CSV format) and return
217 :const:`True` if the first row appears to be a series of column headers.
Georg Brandl116aa622007-08-15 14:28:22 +0000218
Christian Heimes7f044312008-01-06 17:05:40 +0000219An example for :class:`Sniffer` use::
Georg Brandl116aa622007-08-15 14:28:22 +0000220
Christian Heimes7f044312008-01-06 17:05:40 +0000221 csvfile = open("example.csv")
222 dialect = csv.Sniffer().sniff(csvfile.read(1024))
223 csvfile.seek(0)
224 reader = csv.reader(csvfile, dialect)
225 # ... process CSV file contents here ...
226
227
228The :mod:`csv` module defines the following constants:
Georg Brandl116aa622007-08-15 14:28:22 +0000229
230.. data:: QUOTE_ALL
231
232 Instructs :class:`writer` objects to quote all fields.
233
234
235.. data:: QUOTE_MINIMAL
236
237 Instructs :class:`writer` objects to only quote those fields which contain
238 special characters such as *delimiter*, *quotechar* or any of the characters in
239 *lineterminator*.
240
241
242.. data:: QUOTE_NONNUMERIC
243
244 Instructs :class:`writer` objects to quote all non-numeric fields.
245
246 Instructs the reader to convert all non-quoted fields to type *float*.
247
248
249.. data:: QUOTE_NONE
250
251 Instructs :class:`writer` objects to never quote fields. When the current
252 *delimiter* occurs in output data it is preceded by the current *escapechar*
253 character. If *escapechar* is not set, the writer will raise :exc:`Error` if
254 any characters that require escaping are encountered.
255
256 Instructs :class:`reader` to perform no special processing of quote characters.
257
258The :mod:`csv` module defines the following exception:
259
260
261.. exception:: Error
262
263 Raised by any of the functions when an error is detected.
264
Georg Brandl116aa622007-08-15 14:28:22 +0000265.. _csv-fmt-params:
266
267Dialects and Formatting Parameters
268----------------------------------
269
270To make it easier to specify the format of input and output records, specific
271formatting parameters are grouped together into dialects. A dialect is a
272subclass of the :class:`Dialect` class having a set of specific methods and a
273single :meth:`validate` method. When creating :class:`reader` or
274:class:`writer` objects, the programmer can specify a string or a subclass of
275the :class:`Dialect` class as the dialect parameter. In addition to, or instead
276of, the *dialect* parameter, the programmer can also specify individual
277formatting parameters, which have the same names as the attributes defined below
278for the :class:`Dialect` class.
279
280Dialects support the following attributes:
281
282
283.. attribute:: Dialect.delimiter
284
285 A one-character string used to separate fields. It defaults to ``','``.
286
287
288.. attribute:: Dialect.doublequote
289
290 Controls how instances of *quotechar* appearing inside a field should be
291 themselves be quoted. When :const:`True`, the character is doubled. When
292 :const:`False`, the *escapechar* is used as a prefix to the *quotechar*. It
293 defaults to :const:`True`.
294
295 On output, if *doublequote* is :const:`False` and no *escapechar* is set,
296 :exc:`Error` is raised if a *quotechar* is found in a field.
297
298
299.. attribute:: Dialect.escapechar
300
301 A one-character string used by the writer to escape the *delimiter* if *quoting*
302 is set to :const:`QUOTE_NONE` and the *quotechar* if *doublequote* is
303 :const:`False`. On reading, the *escapechar* removes any special meaning from
304 the following character. It defaults to :const:`None`, which disables escaping.
305
306
307.. attribute:: Dialect.lineterminator
308
309 The string used to terminate lines produced by the :class:`writer`. It defaults
310 to ``'\r\n'``.
311
312 .. note::
313
314 The :class:`reader` is hard-coded to recognise either ``'\r'`` or ``'\n'`` as
315 end-of-line, and ignores *lineterminator*. This behavior may change in the
316 future.
317
318
319.. attribute:: Dialect.quotechar
320
321 A one-character string used to quote fields containing special characters, such
322 as the *delimiter* or *quotechar*, or which contain new-line characters. It
323 defaults to ``'"'``.
324
325
326.. attribute:: Dialect.quoting
327
328 Controls when quotes should be generated by the writer and recognised by the
329 reader. It can take on any of the :const:`QUOTE_\*` constants (see section
330 :ref:`csv-contents`) and defaults to :const:`QUOTE_MINIMAL`.
331
332
333.. attribute:: Dialect.skipinitialspace
334
335 When :const:`True`, whitespace immediately following the *delimiter* is ignored.
336 The default is :const:`False`.
337
338
339Reader Objects
340--------------
341
342Reader objects (:class:`DictReader` instances and objects returned by the
343:func:`reader` function) have the following public methods:
344
Georg Brandlc7485062009-04-01 15:53:15 +0000345.. method:: csvreader.__next__()
Georg Brandl116aa622007-08-15 14:28:22 +0000346
347 Return the next row of the reader's iterable object as a list, parsed according
Georg Brandlc7485062009-04-01 15:53:15 +0000348 to the current dialect. Usually you should call this as ``next(reader)``.
349
Georg Brandl116aa622007-08-15 14:28:22 +0000350
351Reader objects have the following public attributes:
352
Georg Brandl116aa622007-08-15 14:28:22 +0000353.. attribute:: csvreader.dialect
354
355 A read-only description of the dialect in use by the parser.
356
357
358.. attribute:: csvreader.line_num
359
360 The number of lines read from the source iterator. This is not the same as the
361 number of records returned, as records can span multiple lines.
362
Georg Brandl116aa622007-08-15 14:28:22 +0000363
Skip Montanaroaf8fcfa2008-08-09 19:44:22 +0000364DictReader objects have the following public attribute:
365
Skip Montanaroaf8fcfa2008-08-09 19:44:22 +0000366.. attribute:: csvreader.fieldnames
367
368 If not passed as a parameter when creating the object, this attribute is
369 initialized upon first access or when the first record is read from the
370 file.
371
Skip Montanaroaf8fcfa2008-08-09 19:44:22 +0000372
373
Georg Brandl116aa622007-08-15 14:28:22 +0000374Writer Objects
375--------------
376
377:class:`Writer` objects (:class:`DictWriter` instances and objects returned by
378the :func:`writer` function) have the following public methods. A *row* must be
379a sequence of strings or numbers for :class:`Writer` objects and a dictionary
380mapping fieldnames to strings or numbers (by passing them through :func:`str`
381first) for :class:`DictWriter` objects. Note that complex numbers are written
382out surrounded by parens. This may cause some problems for other programs which
383read CSV files (assuming they support complex numbers at all).
384
385
386.. method:: csvwriter.writerow(row)
387
388 Write the *row* parameter to the writer's file object, formatted according to
389 the current dialect.
390
391
392.. method:: csvwriter.writerows(rows)
393
394 Write all the *rows* parameters (a list of *row* objects as described above) to
395 the writer's file object, formatted according to the current dialect.
396
397Writer objects have the following public attribute:
398
399
400.. attribute:: csvwriter.dialect
401
402 A read-only description of the dialect in use by the writer.
403
404
R. David Murraybe0698b2010-02-23 22:57:58 +0000405DictWriter objects have the following public method:
406
407
408.. method:: DictWriter.writeheader()
409
410 Write a row with the field names (as specified in the constructor).
411
R. David Murray19e45482010-02-23 23:00:34 +0000412 .. versionadded:: 3.2
R. David Murraybe0698b2010-02-23 22:57:58 +0000413
414
Georg Brandl116aa622007-08-15 14:28:22 +0000415.. _csv-examples:
416
417Examples
418--------
419
420The simplest example of reading a CSV file::
421
422 import csv
Eli Bendersky6860a922011-03-11 15:47:36 +0200423 with open('some.csv', newline='') as f:
424 reader = csv.reader(f)
425 for row in reader:
426 print(row)
Georg Brandl116aa622007-08-15 14:28:22 +0000427
428Reading a file with an alternate format::
429
430 import csv
R David Murray91887022011-03-19 22:30:14 -0400431 with open('passwd', newline='') as f:
Eli Bendersky6860a922011-03-11 15:47:36 +0200432 reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
433 for row in reader:
434 print(row)
Georg Brandl116aa622007-08-15 14:28:22 +0000435
436The corresponding simplest possible writing example is::
437
438 import csv
R David Murray91887022011-03-19 22:30:14 -0400439 with open('some.csv', 'w', newline='') as f:
Eli Bendersky6860a922011-03-11 15:47:36 +0200440 writer = csv.writer(f)
441 writer.writerows(someiterable)
Georg Brandl116aa622007-08-15 14:28:22 +0000442
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000443Since :func:`open` is used to open a CSV file for reading, the file
444will by default be decoded into unicode using the system default
445encoding (see :func:`locale.getpreferredencoding`). To decode a file
446using a different encoding, use the ``encoding`` argument of open::
447
Eli Bendersky6860a922011-03-11 15:47:36 +0200448 import csv
449 with open('some.csv', newline='', encoding='utf-8') as f:
450 reader = csv.reader(f)
451 for row in reader:
452 print(row)
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000453
454The same applies to writing in something other than the system default
455encoding: specify the encoding argument when opening the output file.
456
Georg Brandl116aa622007-08-15 14:28:22 +0000457Registering a new dialect::
458
459 import csv
Georg Brandl116aa622007-08-15 14:28:22 +0000460 csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
R David Murray91887022011-03-19 22:30:14 -0400461 with open('passwd', newline='') as f:
Eli Bendersky6860a922011-03-11 15:47:36 +0200462 reader = csv.reader(f, 'unixpwd')
Georg Brandl116aa622007-08-15 14:28:22 +0000463
464A slightly more advanced use of the reader --- catching and reporting errors::
465
466 import csv, sys
Eli Bendersky6860a922011-03-11 15:47:36 +0200467 filename = 'some.csv'
468 with open(filename, newline='') as f:
469 reader = csv.reader(f)
470 try:
471 for row in reader:
472 print(row)
473 except csv.Error as e:
474 sys.exit('file {}, line {}: {}'.format(filename, reader.line_num, e))
Georg Brandl116aa622007-08-15 14:28:22 +0000475
476And while the module doesn't directly support parsing strings, it can easily be
477done::
478
479 import csv
480 for row in csv.reader(['one,two,three']):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000481 print(row)
Georg Brandl116aa622007-08-15 14:28:22 +0000482
Georg Brandl116aa622007-08-15 14:28:22 +0000483
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000484.. rubric:: Footnotes
Georg Brandl116aa622007-08-15 14:28:22 +0000485
R David Murray91887022011-03-19 22:30:14 -0400486.. [1] If ``newline=''`` is not specified, newlines embedded inside quoted fields
487 will not be interpreted correctly, and on platforms that use ``\r\n`` linendings
R David Murray9c0d5ea2011-03-20 11:18:21 -0400488 on write an extra ``\r`` will be added. It should always be safe to specify
R David Murray91887022011-03-19 22:30:14 -0400489 ``newline=''``, since the csv module does its own (universal) newline handling.