blob: ea1834972173108532111e33ef4041f6b9745f58 [file] [log] [blame]
Georg Brandl116aa622007-08-15 14:28:22 +00001:mod:`csv` --- CSV File Reading and Writing
2===========================================
3
4.. module:: csv
5 :synopsis: Write and read tabular data to and from delimited files.
6.. sectionauthor:: Skip Montanaro <skip@pobox.com>
7
8
Georg Brandl116aa622007-08-15 14:28:22 +00009.. index::
10 single: csv
11 pair: data; tabular
12
13The so-called CSV (Comma Separated Values) format is the most common import and
14export format for spreadsheets and databases. There is no "CSV standard", so
15the format is operationally defined by the many applications which read and
16write it. The lack of a standard means that subtle differences often exist in
17the data produced and consumed by different applications. These differences can
18make it annoying to process CSV files from multiple sources. Still, while the
19delimiters and quoting characters vary, the overall format is similar enough
20that it is possible to write a single module which can efficiently manipulate
21such data, hiding the details of reading and writing the data from the
22programmer.
23
24The :mod:`csv` module implements classes to read and write tabular data in CSV
25format. It allows programmers to say, "write this data in the format preferred
26by Excel," or "read data from this file which was generated by Excel," without
27knowing the precise details of the CSV format used by Excel. Programmers can
28also describe the CSV formats understood by other applications or define their
29own special-purpose CSV formats.
30
31The :mod:`csv` module's :class:`reader` and :class:`writer` objects read and
32write sequences. Programmers can also read and write data in dictionary form
33using the :class:`DictReader` and :class:`DictWriter` classes.
34
Georg Brandl116aa622007-08-15 14:28:22 +000035.. seealso::
36
Georg Brandl116aa622007-08-15 14:28:22 +000037 :pep:`305` - CSV File API
38 The Python Enhancement Proposal which proposed this addition to Python.
39
40
41.. _csv-contents:
42
43Module Contents
44---------------
45
46The :mod:`csv` module defines the following functions:
47
48
Georg Brandlc2a4f4f2009-04-10 09:03:43 +000049.. function:: reader(csvfile, dialect='excel', **fmtparams)
Georg Brandl116aa622007-08-15 14:28:22 +000050
51 Return a reader object which will iterate over lines in the given *csvfile*.
Georg Brandl9afde1c2007-11-01 20:32:30 +000052 *csvfile* can be any object which supports the :term:`iterator` protocol and returns a
Antoine Pitrou11cb9612010-09-15 11:11:28 +000053 string each time its :meth:`!next` method is called --- :term:`file objects
54 <file object>` and list objects are both suitable. If *csvfile* is a file object,
55 it should be opened with ``newline=''``. [#]_ An optional
Georg Brandl116aa622007-08-15 14:28:22 +000056 *dialect* parameter can be given which is used to define a set of parameters
57 specific to a particular CSV dialect. It may be an instance of a subclass of
58 the :class:`Dialect` class or one of the strings returned by the
Georg Brandlc2a4f4f2009-04-10 09:03:43 +000059 :func:`list_dialects` function. The other optional *fmtparams* keyword arguments
Georg Brandl116aa622007-08-15 14:28:22 +000060 can be given to override individual formatting parameters in the current
61 dialect. For full details about the dialect and formatting parameters, see
62 section :ref:`csv-fmt-params`.
63
Skip Montanaro0468df32009-03-25 00:52:18 +000064 Each row read from the csv file is returned as a list of strings. No
R. David Murray8b7d4aa2009-04-04 01:38:38 +000065 automatic data type conversion is performed unless the ``QUOTE_NONNUMERIC`` format
66 option is specified (in which case unquoted fields are transformed into floats).
Georg Brandl116aa622007-08-15 14:28:22 +000067
Christian Heimesb9eccbf2007-12-05 20:18:38 +000068 A short usage example::
Georg Brandl48310cd2009-01-03 21:18:54 +000069
Christian Heimesb9eccbf2007-12-05 20:18:38 +000070 >>> import csv
R. David Murray8b7d4aa2009-04-04 01:38:38 +000071 >>> spamReader = csv.reader(open('eggs.csv', newline=''), delimiter=' ', quotechar='|')
Christian Heimesb9eccbf2007-12-05 20:18:38 +000072 >>> for row in spamReader:
Georg Brandlf6945182008-02-01 11:56:49 +000073 ... print(', '.join(row))
Christian Heimesb9eccbf2007-12-05 20:18:38 +000074 Spam, Spam, Spam, Spam, Spam, Baked Beans
75 Spam, Lovely Spam, Wonderful Spam
76
Georg Brandl116aa622007-08-15 14:28:22 +000077
Georg Brandlc2a4f4f2009-04-10 09:03:43 +000078.. function:: writer(csvfile, dialect='excel', **fmtparams)
Georg Brandl116aa622007-08-15 14:28:22 +000079
80 Return a writer object responsible for converting the user's data into delimited
81 strings on the given file-like object. *csvfile* can be any object with a
R. David Murray8b7d4aa2009-04-04 01:38:38 +000082 :func:`write` method. An optional *dialect*
Georg Brandl116aa622007-08-15 14:28:22 +000083 parameter can be given which is used to define a set of parameters specific to a
84 particular CSV dialect. It may be an instance of a subclass of the
85 :class:`Dialect` class or one of the strings returned by the
Georg Brandlc2a4f4f2009-04-10 09:03:43 +000086 :func:`list_dialects` function. The other optional *fmtparams* keyword arguments
Georg Brandl116aa622007-08-15 14:28:22 +000087 can be given to override individual formatting parameters in the current
88 dialect. For full details about the dialect and formatting parameters, see
89 section :ref:`csv-fmt-params`. To make it
90 as easy as possible to interface with modules which implement the DB API, the
91 value :const:`None` is written as the empty string. While this isn't a
92 reversible transformation, it makes it easier to dump SQL NULL data values to
93 CSV files without preprocessing the data returned from a ``cursor.fetch*`` call.
94 All other non-string data are stringified with :func:`str` before being written.
95
Christian Heimesb9eccbf2007-12-05 20:18:38 +000096 A short usage example::
97
98 >>> import csv
99 >>> spamWriter = csv.writer(open('eggs.csv', 'w'), delimiter=' ',
Skip Montanarof290cb52009-08-18 14:37:42 +0000100 ... quotechar='|', quoting=csv.QUOTE_MINIMAL)
Christian Heimesb9eccbf2007-12-05 20:18:38 +0000101 >>> spamWriter.writerow(['Spam'] * 5 + ['Baked Beans'])
102 >>> spamWriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
103
Georg Brandl116aa622007-08-15 14:28:22 +0000104
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000105.. function:: register_dialect(name[, dialect], **fmtparams)
Georg Brandl116aa622007-08-15 14:28:22 +0000106
Georg Brandlf6945182008-02-01 11:56:49 +0000107 Associate *dialect* with *name*. *name* must be a string. The
Georg Brandl116aa622007-08-15 14:28:22 +0000108 dialect can be specified either by passing a sub-class of :class:`Dialect`, or
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000109 by *fmtparams* keyword arguments, or both, with keyword arguments overriding
Georg Brandl116aa622007-08-15 14:28:22 +0000110 parameters of the dialect. For full details about the dialect and formatting
111 parameters, see section :ref:`csv-fmt-params`.
112
113
114.. function:: unregister_dialect(name)
115
116 Delete the dialect associated with *name* from the dialect registry. An
117 :exc:`Error` is raised if *name* is not a registered dialect name.
118
119
120.. function:: get_dialect(name)
121
Georg Brandl6554cb92007-12-02 23:15:43 +0000122 Return the dialect associated with *name*. An :exc:`Error` is raised if
123 *name* is not a registered dialect name. This function returns an immutable
124 :class:`Dialect`.
Georg Brandl116aa622007-08-15 14:28:22 +0000125
126.. function:: list_dialects()
127
128 Return the names of all registered dialects.
129
130
131.. function:: field_size_limit([new_limit])
132
133 Returns the current maximum field size allowed by the parser. If *new_limit* is
134 given, this becomes the new limit.
135
Georg Brandl116aa622007-08-15 14:28:22 +0000136
137The :mod:`csv` module defines the following classes:
138
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000139.. class:: DictReader(csvfile, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)
Georg Brandl116aa622007-08-15 14:28:22 +0000140
141 Create an object which operates like a regular reader but maps the information
142 read into a dict whose keys are given by the optional *fieldnames* parameter.
143 If the *fieldnames* parameter is omitted, the values in the first row of the
R. David Murray578ec902009-11-09 14:21:38 +0000144 *csvfile* will be used as the fieldnames. If the row read has more fields
145 than the fieldnames sequence, the remaining data is added as a sequence
146 keyed by the value of *restkey*. If the row read has fewer fields than the
147 fieldnames sequence, the remaining keys take the value of the optional
148 *restval* parameter. Any other optional or keyword arguments are passed to
149 the underlying :class:`reader` instance.
Georg Brandl116aa622007-08-15 14:28:22 +0000150
151
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000152.. class:: DictWriter(csvfile, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)
Georg Brandl116aa622007-08-15 14:28:22 +0000153
154 Create an object which operates like a regular writer but maps dictionaries onto
155 output rows. The *fieldnames* parameter identifies the order in which values in
156 the dictionary passed to the :meth:`writerow` method are written to the
157 *csvfile*. The optional *restval* parameter specifies the value to be written
158 if the dictionary is missing a key in *fieldnames*. If the dictionary passed to
159 the :meth:`writerow` method contains a key not found in *fieldnames*, the
160 optional *extrasaction* parameter indicates what action to take. If it is set
161 to ``'raise'`` a :exc:`ValueError` is raised. If it is set to ``'ignore'``,
162 extra values in the dictionary are ignored. Any other optional or keyword
163 arguments are passed to the underlying :class:`writer` instance.
164
165 Note that unlike the :class:`DictReader` class, the *fieldnames* parameter of
166 the :class:`DictWriter` is not optional. Since Python's :class:`dict` objects
167 are not ordered, there is not enough information available to deduce the order
168 in which the row should be written to the *csvfile*.
169
170
171.. class:: Dialect
172
173 The :class:`Dialect` class is a container class relied on primarily for its
174 attributes, which are used to define the parameters for a specific
175 :class:`reader` or :class:`writer` instance.
176
177
178.. class:: excel()
179
180 The :class:`excel` class defines the usual properties of an Excel-generated CSV
181 file. It is registered with the dialect name ``'excel'``.
182
183
184.. class:: excel_tab()
185
186 The :class:`excel_tab` class defines the usual properties of an Excel-generated
187 TAB-delimited file. It is registered with the dialect name ``'excel-tab'``.
188
189
Georg Brandl7424dd32010-10-27 07:27:06 +0000190.. class:: unix_dialect()
191
192 The :class:`unix_dialect` class defines the usual properties of a CSV file
193 generated on UNIX systems, i.e. using ``'\n'`` as line terminator and quoting
194 all fields. It is registered with the dialect name ``'unix'``.
195
196 .. versionadded:: 3.2
197
198
Georg Brandl116aa622007-08-15 14:28:22 +0000199.. class:: Sniffer()
200
201 The :class:`Sniffer` class is used to deduce the format of a CSV file.
202
Benjamin Petersone41251e2008-04-25 01:59:09 +0000203 The :class:`Sniffer` class provides two methods:
Georg Brandl116aa622007-08-15 14:28:22 +0000204
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000205 .. method:: sniff(sample, delimiters=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000206
Benjamin Petersone41251e2008-04-25 01:59:09 +0000207 Analyze the given *sample* and return a :class:`Dialect` subclass
208 reflecting the parameters found. If the optional *delimiters* parameter
209 is given, it is interpreted as a string containing possible valid
210 delimiter characters.
Georg Brandl116aa622007-08-15 14:28:22 +0000211
212
Benjamin Petersone41251e2008-04-25 01:59:09 +0000213 .. method:: has_header(sample)
Georg Brandl116aa622007-08-15 14:28:22 +0000214
Benjamin Petersone41251e2008-04-25 01:59:09 +0000215 Analyze the sample text (presumed to be in CSV format) and return
216 :const:`True` if the first row appears to be a series of column headers.
Georg Brandl116aa622007-08-15 14:28:22 +0000217
Christian Heimes7f044312008-01-06 17:05:40 +0000218An example for :class:`Sniffer` use::
Georg Brandl116aa622007-08-15 14:28:22 +0000219
Christian Heimes7f044312008-01-06 17:05:40 +0000220 csvfile = open("example.csv")
221 dialect = csv.Sniffer().sniff(csvfile.read(1024))
222 csvfile.seek(0)
223 reader = csv.reader(csvfile, dialect)
224 # ... process CSV file contents here ...
225
226
227The :mod:`csv` module defines the following constants:
Georg Brandl116aa622007-08-15 14:28:22 +0000228
229.. data:: QUOTE_ALL
230
231 Instructs :class:`writer` objects to quote all fields.
232
233
234.. data:: QUOTE_MINIMAL
235
236 Instructs :class:`writer` objects to only quote those fields which contain
237 special characters such as *delimiter*, *quotechar* or any of the characters in
238 *lineterminator*.
239
240
241.. data:: QUOTE_NONNUMERIC
242
243 Instructs :class:`writer` objects to quote all non-numeric fields.
244
245 Instructs the reader to convert all non-quoted fields to type *float*.
246
247
248.. data:: QUOTE_NONE
249
250 Instructs :class:`writer` objects to never quote fields. When the current
251 *delimiter* occurs in output data it is preceded by the current *escapechar*
252 character. If *escapechar* is not set, the writer will raise :exc:`Error` if
253 any characters that require escaping are encountered.
254
255 Instructs :class:`reader` to perform no special processing of quote characters.
256
257The :mod:`csv` module defines the following exception:
258
259
260.. exception:: Error
261
262 Raised by any of the functions when an error is detected.
263
Georg Brandl116aa622007-08-15 14:28:22 +0000264.. _csv-fmt-params:
265
266Dialects and Formatting Parameters
267----------------------------------
268
269To make it easier to specify the format of input and output records, specific
270formatting parameters are grouped together into dialects. A dialect is a
271subclass of the :class:`Dialect` class having a set of specific methods and a
272single :meth:`validate` method. When creating :class:`reader` or
273:class:`writer` objects, the programmer can specify a string or a subclass of
274the :class:`Dialect` class as the dialect parameter. In addition to, or instead
275of, the *dialect* parameter, the programmer can also specify individual
276formatting parameters, which have the same names as the attributes defined below
277for the :class:`Dialect` class.
278
279Dialects support the following attributes:
280
281
282.. attribute:: Dialect.delimiter
283
284 A one-character string used to separate fields. It defaults to ``','``.
285
286
287.. attribute:: Dialect.doublequote
288
289 Controls how instances of *quotechar* appearing inside a field should be
290 themselves be quoted. When :const:`True`, the character is doubled. When
291 :const:`False`, the *escapechar* is used as a prefix to the *quotechar*. It
292 defaults to :const:`True`.
293
294 On output, if *doublequote* is :const:`False` and no *escapechar* is set,
295 :exc:`Error` is raised if a *quotechar* is found in a field.
296
297
298.. attribute:: Dialect.escapechar
299
300 A one-character string used by the writer to escape the *delimiter* if *quoting*
301 is set to :const:`QUOTE_NONE` and the *quotechar* if *doublequote* is
302 :const:`False`. On reading, the *escapechar* removes any special meaning from
303 the following character. It defaults to :const:`None`, which disables escaping.
304
305
306.. attribute:: Dialect.lineterminator
307
308 The string used to terminate lines produced by the :class:`writer`. It defaults
309 to ``'\r\n'``.
310
311 .. note::
312
313 The :class:`reader` is hard-coded to recognise either ``'\r'`` or ``'\n'`` as
314 end-of-line, and ignores *lineterminator*. This behavior may change in the
315 future.
316
317
318.. attribute:: Dialect.quotechar
319
320 A one-character string used to quote fields containing special characters, such
321 as the *delimiter* or *quotechar*, or which contain new-line characters. It
322 defaults to ``'"'``.
323
324
325.. attribute:: Dialect.quoting
326
327 Controls when quotes should be generated by the writer and recognised by the
328 reader. It can take on any of the :const:`QUOTE_\*` constants (see section
329 :ref:`csv-contents`) and defaults to :const:`QUOTE_MINIMAL`.
330
331
332.. attribute:: Dialect.skipinitialspace
333
334 When :const:`True`, whitespace immediately following the *delimiter* is ignored.
335 The default is :const:`False`.
336
337
338Reader Objects
339--------------
340
341Reader objects (:class:`DictReader` instances and objects returned by the
342:func:`reader` function) have the following public methods:
343
Georg Brandlc7485062009-04-01 15:53:15 +0000344.. method:: csvreader.__next__()
Georg Brandl116aa622007-08-15 14:28:22 +0000345
346 Return the next row of the reader's iterable object as a list, parsed according
Georg Brandlc7485062009-04-01 15:53:15 +0000347 to the current dialect. Usually you should call this as ``next(reader)``.
348
Georg Brandl116aa622007-08-15 14:28:22 +0000349
350Reader objects have the following public attributes:
351
Georg Brandl116aa622007-08-15 14:28:22 +0000352.. attribute:: csvreader.dialect
353
354 A read-only description of the dialect in use by the parser.
355
356
357.. attribute:: csvreader.line_num
358
359 The number of lines read from the source iterator. This is not the same as the
360 number of records returned, as records can span multiple lines.
361
Georg Brandl116aa622007-08-15 14:28:22 +0000362
Skip Montanaroaf8fcfa2008-08-09 19:44:22 +0000363DictReader objects have the following public attribute:
364
Skip Montanaroaf8fcfa2008-08-09 19:44:22 +0000365.. attribute:: csvreader.fieldnames
366
367 If not passed as a parameter when creating the object, this attribute is
368 initialized upon first access or when the first record is read from the
369 file.
370
Skip Montanaroaf8fcfa2008-08-09 19:44:22 +0000371
372
Georg Brandl116aa622007-08-15 14:28:22 +0000373Writer Objects
374--------------
375
376:class:`Writer` objects (:class:`DictWriter` instances and objects returned by
377the :func:`writer` function) have the following public methods. A *row* must be
378a sequence of strings or numbers for :class:`Writer` objects and a dictionary
379mapping fieldnames to strings or numbers (by passing them through :func:`str`
380first) for :class:`DictWriter` objects. Note that complex numbers are written
381out surrounded by parens. This may cause some problems for other programs which
382read CSV files (assuming they support complex numbers at all).
383
384
385.. method:: csvwriter.writerow(row)
386
387 Write the *row* parameter to the writer's file object, formatted according to
388 the current dialect.
389
390
391.. method:: csvwriter.writerows(rows)
392
393 Write all the *rows* parameters (a list of *row* objects as described above) to
394 the writer's file object, formatted according to the current dialect.
395
396Writer objects have the following public attribute:
397
398
399.. attribute:: csvwriter.dialect
400
401 A read-only description of the dialect in use by the writer.
402
403
R. David Murraybe0698b2010-02-23 22:57:58 +0000404DictWriter objects have the following public method:
405
406
407.. method:: DictWriter.writeheader()
408
409 Write a row with the field names (as specified in the constructor).
410
R. David Murray19e45482010-02-23 23:00:34 +0000411 .. versionadded:: 3.2
R. David Murraybe0698b2010-02-23 22:57:58 +0000412
413
Georg Brandl116aa622007-08-15 14:28:22 +0000414.. _csv-examples:
415
416Examples
417--------
418
419The simplest example of reading a CSV file::
420
421 import csv
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000422 reader = csv.reader(open("some.csv", newline=''))
Georg Brandl116aa622007-08-15 14:28:22 +0000423 for row in reader:
Georg Brandl6911e3c2007-09-04 07:15:32 +0000424 print(row)
Georg Brandl116aa622007-08-15 14:28:22 +0000425
426Reading a file with an alternate format::
427
428 import csv
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000429 reader = csv.reader(open("passwd"), delimiter=':', quoting=csv.QUOTE_NONE)
Georg Brandl116aa622007-08-15 14:28:22 +0000430 for row in reader:
Georg Brandl6911e3c2007-09-04 07:15:32 +0000431 print(row)
Georg Brandl116aa622007-08-15 14:28:22 +0000432
433The corresponding simplest possible writing example is::
434
435 import csv
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000436 writer = csv.writer(open("some.csv", "w"))
Georg Brandl116aa622007-08-15 14:28:22 +0000437 writer.writerows(someiterable)
438
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000439Since :func:`open` is used to open a CSV file for reading, the file
440will by default be decoded into unicode using the system default
441encoding (see :func:`locale.getpreferredencoding`). To decode a file
442using a different encoding, use the ``encoding`` argument of open::
443
444 import csv
445 reader = csv.reader(open("some.csv", newline='', encoding='utf-8'))
446 for row in reader:
447 print(row)
448
449The same applies to writing in something other than the system default
450encoding: specify the encoding argument when opening the output file.
451
Georg Brandl116aa622007-08-15 14:28:22 +0000452Registering a new dialect::
453
454 import csv
Georg Brandl116aa622007-08-15 14:28:22 +0000455 csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000456 reader = csv.reader(open("passwd"), 'unixpwd')
Georg Brandl116aa622007-08-15 14:28:22 +0000457
458A slightly more advanced use of the reader --- catching and reporting errors::
459
460 import csv, sys
461 filename = "some.csv"
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000462 reader = csv.reader(open(filename, newline=''))
Georg Brandl116aa622007-08-15 14:28:22 +0000463 try:
464 for row in reader:
Georg Brandl6911e3c2007-09-04 07:15:32 +0000465 print(row)
Georg Brandl116aa622007-08-15 14:28:22 +0000466 except csv.Error as e:
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000467 sys.exit('file {}, line {}: {}'.format(filename, reader.line_num, e))
Georg Brandl116aa622007-08-15 14:28:22 +0000468
469And while the module doesn't directly support parsing strings, it can easily be
470done::
471
472 import csv
473 for row in csv.reader(['one,two,three']):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000474 print(row)
Georg Brandl116aa622007-08-15 14:28:22 +0000475
Georg Brandl116aa622007-08-15 14:28:22 +0000476
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000477.. rubric:: Footnotes
Georg Brandl116aa622007-08-15 14:28:22 +0000478
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000479.. [#] If ``newline=''`` is not specified, newlines embedded inside quoted fields
480 will not be interpreted correctly. It should always be safe to specify
481 ``newline=''``, since the csv module does its own universal newline handling
482 on input.