:mod:`csv` --- CSV File Reading and Writing
===========================================

.. module:: csv
   :synopsis: Write and read tabular data to and from delimited files.
.. sectionauthor:: Skip Montanaro <skip@pobox.com>


.. versionadded:: 2.3

.. index::
   single: csv
   pair: data; tabular

The so-called CSV (Comma Separated Values) format is the most common import and
export format for spreadsheets and databases. There is no "CSV standard", so
the format is operationally defined by the many applications which read and
write it. The lack of a standard means that subtle differences often exist in
the data produced and consumed by different applications. These differences can
make it annoying to process CSV files from multiple sources. Still, while the
delimiters and quoting characters vary, the overall format is similar enough
that it is possible to write a single module which can efficiently manipulate
such data, hiding the details of reading and writing the data from the
programmer.

The :mod:`csv` module implements classes to read and write tabular data in CSV
format. It allows programmers to say, "write this data in the format preferred
by Excel," or "read data from this file which was generated by Excel," without
knowing the precise details of the CSV format used by Excel. Programmers can
also describe the CSV formats understood by other applications or define their
own special-purpose CSV formats.

The :mod:`csv` module's :class:`reader` and :class:`writer` objects read and
write sequences. Programmers can also read and write data in dictionary form
using the :class:`DictReader` and :class:`DictWriter` classes.

.. note::

   This version of the :mod:`csv` module doesn't support Unicode input. Also,
   there are currently some issues regarding ASCII NUL characters. Accordingly,
   all input should be UTF-8 or printable ASCII to be safe; see the examples in
   section :ref:`csv-examples`. These restrictions will be removed in the future.


.. seealso::

   :pep:`305` - CSV File API
      The Python Enhancement Proposal which proposed this addition to Python.


.. _csv-contents:

Module Contents
---------------

The :mod:`csv` module defines the following functions:


.. function:: reader(csvfile[, dialect='excel'][, fmtparam])

   Return a reader object which will iterate over lines in the given *csvfile*.
   *csvfile* can be any object which supports the :term:`iterator` protocol and returns a
   string each time its :meth:`next` method is called --- file objects and list
   objects are both suitable. If *csvfile* is a file object, it must be opened
   with the 'b' flag on platforms where that makes a difference. An optional
   *dialect* parameter can be given which is used to define a set of parameters
   specific to a particular CSV dialect. It may be an instance of a subclass of
   the :class:`Dialect` class or one of the strings returned by the
   :func:`list_dialects` function. The other optional *fmtparam* keyword arguments
   can be given to override individual formatting parameters in the current
   dialect. For full details about the dialect and formatting parameters, see
   section :ref:`csv-fmt-params`.

   Each row read from the csv file is returned as a list of strings. No
   automatic data type conversion is performed.

   A short usage example::

      >>> import csv
      >>> spamReader = csv.reader(open('eggs.csv', 'rb'), delimiter=' ', quotechar='|')
      >>> for row in spamReader:
      ...     print ', '.join(row)
      Spam, Spam, Spam, Spam, Spam, Baked Beans
      Spam, Lovely Spam, Wonderful Spam

   .. versionchanged:: 2.5
      The parser is now stricter with respect to multi-line quoted fields. Previously,
      if a line ended within a quoted field without a terminating newline character, a
      newline would be inserted into the returned field. This behavior caused problems
      when reading files which contained carriage return characters within fields.
      The behavior was changed to return the field without inserting newlines. As a
      consequence, if newlines embedded within fields are important, the input should
      be split into lines in a manner which preserves the newline characters.

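As a rough illustration of the multi-line handling described for
:func:`reader`, the sketch below parses the same quoted field twice: once from
lines that keep their trailing newline and once from lines that have lost it.
The input lists and variable names are invented for the example, and the code
is written so that it should also run unchanged on Python 3::

   import csv

   # A quoted field spanning two physical lines, newline characters preserved.
   with_newlines = ['a,"b\n', 'c",d\n']
   # The same data after the line endings were stripped before parsing.
   without_newlines = ['a,"b', 'c",d']

   kept = list(csv.reader(with_newlines))
   lost = list(csv.reader(without_newlines))

   # kept == [['a', 'b\nc', 'd']]   the embedded newline survives
   # lost == [['a', 'bc', 'd']]     nothing is inserted between the two halves

If embedded newlines matter, feeding the reader an open file object, or
``text.splitlines(True)``, keeps the line endings intact.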

.. function:: writer(csvfile[, dialect='excel'][, fmtparam])

   Return a writer object responsible for converting the user's data into delimited
   strings on the given file-like object. *csvfile* can be any object with a
   :func:`write` method. If *csvfile* is a file object, it must be opened with the
   'b' flag on platforms where that makes a difference. An optional *dialect*
   parameter can be given which is used to define a set of parameters specific to a
   particular CSV dialect. It may be an instance of a subclass of the
   :class:`Dialect` class or one of the strings returned by the
   :func:`list_dialects` function. The other optional *fmtparam* keyword arguments
   can be given to override individual formatting parameters in the current
   dialect. For full details about the dialect and formatting parameters, see
   section :ref:`csv-fmt-params`. To make it
   as easy as possible to interface with modules which implement the DB API, the
   value :const:`None` is written as the empty string. While this isn't a
   reversible transformation, it makes it easier to dump SQL NULL data values to
   CSV files without preprocessing the data returned from a ``cursor.fetch*`` call.
   All other non-string data are stringified with :func:`str` before being written.

   A short usage example::

      >>> import csv
      >>> spamWriter = csv.writer(open('eggs.csv', 'wb'), delimiter=' ',
      ...                         quotechar='|', quoting=csv.QUOTE_MINIMAL)
      >>> spamWriter.writerow(['Spam'] * 5 + ['Baked Beans'])
      >>> spamWriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])

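The treatment of :const:`None` and other non-string values by :func:`writer`
can be sketched without touching the file system. ``LineCollector`` is a
hypothetical helper, not part of the module; it relies only on the stated
requirement that *csvfile* have a :func:`write` method::

   import csv

   class LineCollector(object):
       """Hypothetical file-like object: records each string passed to write()."""
       def __init__(self):
           self.lines = []

       def write(self, data):
           self.lines.append(data)

   sink = LineCollector()
   writer = csv.writer(sink)
   # None becomes an empty field; the int and the float are stringified.
   writer.writerow(['spam', None, 3, 2.5])

   # sink.lines == ['spam,,3,2.5\r\n']

Reading that line back yields ``''`` rather than :const:`None` for the second
field, which is why the transformation is described as not reversible.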

.. function:: register_dialect(name[, dialect][, fmtparam])

   Associate *dialect* with *name*. *name* must be a string or Unicode object. The
   dialect can be specified either by passing a sub-class of :class:`Dialect`, or
   by *fmtparam* keyword arguments, or both, with keyword arguments overriding
   parameters of the dialect. For full details about the dialect and formatting
   parameters, see section :ref:`csv-fmt-params`.


.. function:: unregister_dialect(name)

   Delete the dialect associated with *name* from the dialect registry. An
   :exc:`Error` is raised if *name* is not a registered dialect name.


.. function:: get_dialect(name)

   Return the dialect associated with *name*. An :exc:`Error` is raised if *name*
   is not a registered dialect name.

   .. versionchanged:: 2.5
      This function now returns an immutable :class:`Dialect`. Previously an
      instance of the requested dialect was returned. Users could modify the
      underlying class, changing the behavior of active readers and writers.

.. function:: list_dialects()

   Return the names of all registered dialects.

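The four registry functions fit together roughly as follows. This is a
minimal sketch: the dialect name ``'unixpwd'`` and the sample line are
illustrative, and the built-in :func:`next` is assumed to be available
(Python 2.6 or later)::

   import csv

   # Register a colon-delimited dialect purely from keyword arguments.
   csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
   registered = 'unixpwd' in csv.list_dialects()       # True
   delimiter = csv.get_dialect('unixpwd').delimiter    # ':'

   # The registered name can now be passed wherever a dialect is expected.
   row = next(csv.reader(['root:x:0:0'], 'unixpwd'))   # ['root', 'x', '0', '0']

   csv.unregister_dialect('unixpwd')
   try:
       csv.get_dialect('unixpwd')
       lookup_failed = False
   except csv.Error:
       # Looking up a name that is no longer registered raises csv.Error.
       lookup_failed = True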

.. function:: field_size_limit([new_limit])

   Return the current maximum field size allowed by the parser. If *new_limit* is
   given, this becomes the new limit.

   .. versionadded:: 2.5

The :mod:`csv` module defines the following classes:


.. class:: DictReader(csvfile[, fieldnames=None[, restkey=None[, restval=None[, dialect='excel'[, *args, **kwds]]]]])

   Create an object which operates like a regular reader but maps the information
   read into a dict whose keys are given by the optional *fieldnames* parameter.
   If the *fieldnames* parameter is omitted, the values in the first row of the
   *csvfile* will be used as the fieldnames. If the row read has more fields than
   the fieldnames sequence, the remaining data is added as a sequence keyed by the
   value of *restkey*. If the row read has fewer fields than the fieldnames
   sequence, the remaining keys take the value of the optional *restval*
   parameter. Any other optional or keyword arguments are passed to the
   underlying :class:`reader` instance.

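A small sketch of how :class:`DictReader` applies *restkey* and *restval* to
rows that are longer or shorter than the header. The sample lines and the
values ``'rest'`` and ``'n/a'`` are made up for illustration::

   import csv

   lines = ['name,qty', 'spam,3,extra1,extra2', 'eggs']
   # No fieldnames are given, so the first row ('name', 'qty') supplies the keys.
   reader = csv.DictReader(lines, restkey='rest', restval='n/a')
   rows = [dict(row) for row in reader]

   # rows[0] == {'name': 'spam', 'qty': '3', 'rest': ['extra1', 'extra2']}
   # rows[1] == {'name': 'eggs', 'qty': 'n/a'}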

.. class:: DictWriter(csvfile, fieldnames[, restval=''[, extrasaction='raise'[, dialect='excel'[, *args, **kwds]]]])

   Create an object which operates like a regular writer but maps dictionaries onto
   output rows. The *fieldnames* parameter identifies the order in which values in
   the dictionary passed to the :meth:`writerow` method are written to the
   *csvfile*. The optional *restval* parameter specifies the value to be written
   if the dictionary is missing a key in *fieldnames*. If the dictionary passed to
   the :meth:`writerow` method contains a key not found in *fieldnames*, the
   optional *extrasaction* parameter indicates what action to take. If it is set
   to ``'raise'`` a :exc:`ValueError` is raised. If it is set to ``'ignore'``,
   extra values in the dictionary are ignored. Any other optional or keyword
   arguments are passed to the underlying :class:`writer` instance.

   Note that unlike the :class:`DictReader` class, the *fieldnames* parameter of
   the :class:`DictWriter` is not optional. Since Python's :class:`dict` objects
   are not ordered, there is not enough information available to deduce the order
   in which the row should be written to the *csvfile*.

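The effect of *restval* and *extrasaction* on :class:`DictWriter` can be
sketched as below. ``LineCollector`` is a hypothetical stand-in for a file
opened for writing, and the field names are invented for the example::

   import csv

   class LineCollector(object):
       """Hypothetical file-like object: records each string passed to write()."""
       def __init__(self):
           self.lines = []

       def write(self, data):
           self.lines.append(data)

   sink = LineCollector()
   fieldnames = ['name', 'qty']
   record = {'name': 'spam', 'color': 'pink'}

   # 'qty' is missing, so restval is written; 'color' is extra and is dropped.
   lenient = csv.DictWriter(sink, fieldnames, restval='0', extrasaction='ignore')
   lenient.writerow(record)

   # With the default extrasaction='raise', the same record is rejected.
   strict = csv.DictWriter(sink, fieldnames)
   try:
       strict.writerow(record)
       rejected = False
   except ValueError:
       rejected = True

   # sink.lines == ['spam,0\r\n'] and rejected is True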

.. class:: Dialect

   The :class:`Dialect` class is a container class relied on primarily for its
   attributes, which are used to define the parameters for a specific
   :class:`reader` or :class:`writer` instance.


.. class:: excel()

   The :class:`excel` class defines the usual properties of an Excel-generated CSV
   file. It is registered with the dialect name ``'excel'``.


.. class:: excel_tab()

   The :class:`excel_tab` class defines the usual properties of an Excel-generated
   TAB-delimited file. It is registered with the dialect name ``'excel-tab'``.


.. class:: Sniffer()

   The :class:`Sniffer` class is used to deduce the format of a CSV file.

   The :class:`Sniffer` class provides two methods:

   .. method:: sniff(sample[, delimiters=None])

      Analyze the given *sample* and return a :class:`Dialect` subclass
      reflecting the parameters found. If the optional *delimiters* parameter
      is given, it is interpreted as a string containing possible valid
      delimiter characters.


   .. method:: has_header(sample)

      Analyze the sample text (presumed to be in CSV format) and return
      :const:`True` if the first row appears to be a series of column headers.

An example for :class:`Sniffer` use::

   import csv

   csvfile = open("example.csv", "rb")
   dialect = csv.Sniffer().sniff(csvfile.read(1024))
   csvfile.seek(0)
   reader = csv.reader(csvfile, dialect)
   # ... process CSV file contents here ...

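The *delimiters* argument of :meth:`sniff` can be tried on an in-memory
sample as well. The sketch below assumes a small semicolon-separated sample,
invented for illustration, and restricts the candidates to ``';'`` and
``','``. Since the sniffer is heuristic, results on less regular data may
differ::

   import csv

   sample = 'name;qty\nspam;3\neggs;5\n'
   # Only ';' and ',' are considered as possible delimiters.
   dialect = csv.Sniffer().sniff(sample, delimiters=';,')
   # splitlines(True) keeps the line endings, as the reader expects.
   rows = list(csv.reader(sample.splitlines(True), dialect))

   # dialect.delimiter == ';'
   # rows == [['name', 'qty'], ['spam', '3'], ['eggs', '5']]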

The :mod:`csv` module defines the following constants:

.. data:: QUOTE_ALL

   Instructs :class:`writer` objects to quote all fields.


.. data:: QUOTE_MINIMAL

   Instructs :class:`writer` objects to only quote those fields which contain
   special characters such as *delimiter*, *quotechar* or any of the characters in
   *lineterminator*.


.. data:: QUOTE_NONNUMERIC

   Instructs :class:`writer` objects to quote all non-numeric fields.

   Instructs the reader to convert all non-quoted fields to type *float*.


.. data:: QUOTE_NONE

   Instructs :class:`writer` objects to never quote fields. When the current
   *delimiter* occurs in output data it is preceded by the current *escapechar*
   character. If *escapechar* is not set, the writer will raise :exc:`Error` if
   any characters that require escaping are encountered.

   Instructs :class:`reader` to perform no special processing of quote characters.

The :mod:`csv` module defines the following exception:


.. exception:: Error

   Raised by any of the functions when an error is detected.

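To compare the four quoting constants side by side, the sketch below formats
one row under each of them and also triggers the :exc:`Error` described for
:const:`QUOTE_NONE`. ``LineCollector`` and ``render`` are hypothetical
helpers written for this example, not part of the module::

   import csv

   class LineCollector(object):
       """Hypothetical file-like object: records each string passed to write()."""
       def __init__(self):
           self.lines = []

       def write(self, data):
           self.lines.append(data)

   def render(row, **fmtparams):
       """Format a single row with csv.writer and return the resulting line."""
       sink = LineCollector()
       csv.writer(sink, **fmtparams).writerow(row)
       return sink.lines[0]

   row = ['spam', 1, 'a,b']
   quote_all = render(row, quoting=csv.QUOTE_ALL)                # "spam","1","a,b"
   quote_minimal = render(row, quoting=csv.QUOTE_MINIMAL)        # spam,1,"a,b"
   quote_nonnumeric = render(row, quoting=csv.QUOTE_NONNUMERIC)  # "spam",1,"a,b"
   # With QUOTE_NONE the delimiter inside 'a,b' is escaped instead of quoted.
   quote_none = render(row, quoting=csv.QUOTE_NONE, escapechar='\\')

   try:
       render(row, quoting=csv.QUOTE_NONE)   # delimiter in data, no escapechar
       needs_escapechar = False
   except csv.Error:
       needs_escapechar = True

   # On input, QUOTE_NONNUMERIC converts unquoted fields to float.
   parsed = list(csv.reader(['"spam",1'], quoting=csv.QUOTE_NONNUMERIC))
   # parsed == [['spam', 1.0]]

Each rendered line ends with the default line terminator ``'\r\n'``, which is
omitted from the comments above.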

.. _csv-fmt-params:

Dialects and Formatting Parameters
----------------------------------

To make it easier to specify the format of input and output records, specific
formatting parameters are grouped together into dialects. A dialect is a
subclass of the :class:`Dialect` class having a set of specific methods and a
single :meth:`validate` method. When creating :class:`reader` or
:class:`writer` objects, the programmer can specify a string or a subclass of
the :class:`Dialect` class as the dialect parameter. In addition to, or instead
of, the *dialect* parameter, the programmer can also specify individual
formatting parameters, which have the same names as the attributes defined below
for the :class:`Dialect` class.

Dialects support the following attributes:


.. attribute:: Dialect.delimiter

   A one-character string used to separate fields. It defaults to ``','``.


.. attribute:: Dialect.doublequote

   Controls how instances of *quotechar* appearing inside a field should
   themselves be quoted. When :const:`True`, the character is doubled. When
   :const:`False`, the *escapechar* is used as a prefix to the *quotechar*. It
   defaults to :const:`True`.

   On output, if *doublequote* is :const:`False` and no *escapechar* is set,
   :exc:`Error` is raised if a *quotechar* is found in a field.


.. attribute:: Dialect.escapechar

   A one-character string used by the writer to escape the *delimiter* if *quoting*
   is set to :const:`QUOTE_NONE` and the *quotechar* if *doublequote* is
   :const:`False`. On reading, the *escapechar* removes any special meaning from
   the following character. It defaults to :const:`None`, which disables escaping.


.. attribute:: Dialect.lineterminator

   The string used to terminate lines produced by the :class:`writer`. It defaults
   to ``'\r\n'``.

   .. note::

      The :class:`reader` is hard-coded to recognise either ``'\r'`` or ``'\n'`` as
      end-of-line, and ignores *lineterminator*. This behavior may change in the
      future.


.. attribute:: Dialect.quotechar

   A one-character string used to quote fields containing special characters, such
   as the *delimiter* or *quotechar*, or which contain new-line characters. It
   defaults to ``'"'``.


.. attribute:: Dialect.quoting

   Controls when quotes should be generated by the writer and recognised by the
   reader. It can take on any of the :const:`QUOTE_\*` constants (see section
   :ref:`csv-contents`) and defaults to :const:`QUOTE_MINIMAL`.


.. attribute:: Dialect.skipinitialspace

   When :const:`True`, whitespace immediately following the *delimiter* is ignored.
   The default is :const:`False`.

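A few of the formatting parameters above can be exercised directly as keyword
arguments. This is only a sketch: ``LineCollector`` is a hypothetical
file-like helper and the sample fields are invented::

   import csv

   class LineCollector(object):
       """Hypothetical file-like object: records each string passed to write()."""
       def __init__(self):
           self.lines = []

       def write(self, data):
           self.lines.append(data)

   # doublequote=True (the default): an embedded quotechar is doubled.
   sink = LineCollector()
   csv.writer(sink).writerow(['a"b'])
   doubled = sink.lines[0]                  # '"a""b"\r\n'

   # doublequote=False: the embedded quotechar is prefixed with escapechar
   # instead, and a reader given the same parameters recovers the field.
   params = dict(doublequote=False, escapechar='\\')
   sink = LineCollector()
   csv.writer(sink, **params).writerow(['a"b'])
   round_trip = list(csv.reader(sink.lines, **params))   # [['a"b']]

   # skipinitialspace drops whitespace that directly follows a delimiter.
   default_split = list(csv.reader(['a, b, c']))
   trimmed_split = list(csv.reader(['a, b, c'], skipinitialspace=True))
   # default_split == [['a', ' b', ' c']]
   # trimmed_split == [['a', 'b', 'c']]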

Reader Objects
--------------

Reader objects (:class:`DictReader` instances and objects returned by the
:func:`reader` function) have the following public methods:


.. method:: csvreader.next()

   Return the next row of the reader's iterable object as a list, parsed according
   to the current dialect.

Reader objects have the following public attributes:


.. attribute:: csvreader.dialect

   A read-only description of the dialect in use by the parser.


.. attribute:: csvreader.line_num

   The number of lines read from the source iterator. This is not the same as the
   number of records returned, as records can span multiple lines.

   .. versionadded:: 2.5

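The difference between lines read and records returned shows up as soon as a
quoted field contains a newline. A minimal sketch, with made-up input and the
built-in :func:`next` (Python 2.6 or later) standing in for a direct call to
the reader's :meth:`next` method::

   import csv

   # Three physical lines, but the first record spans two of them.
   lines = ['a,"b\n', 'c"\n', 'd,e\n']
   reader = csv.reader(lines)

   first = next(reader)             # ['a', 'b\nc']
   after_first = reader.line_num    # 2: two lines consumed for one record
   second = next(reader)            # ['d', 'e']
   after_second = reader.line_num   # 3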

DictReader objects have the following public attribute:


.. attribute:: csvreader.fieldnames

   If not passed as a parameter when creating the object, this attribute is
   initialized upon first access or when the first record is read from the
   file.

   .. versionchanged:: 2.6


Writer Objects
--------------

:class:`Writer` objects (:class:`DictWriter` instances and objects returned by
the :func:`writer` function) have the following public methods. A *row* must be
a sequence of strings or numbers for :class:`Writer` objects and a dictionary
mapping fieldnames to strings or numbers for :class:`DictWriter` objects;
numbers are passed through :func:`str` before being written. Note that complex
numbers are written out surrounded by parentheses. This may cause some problems
for other programs which read CSV files (assuming they support complex numbers
at all).


.. method:: csvwriter.writerow(row)

   Write the *row* parameter to the writer's file object, formatted according to
   the current dialect.


.. method:: csvwriter.writerows(rows)

   Write every row in *rows* (a list of *row* objects as described above) to
   the writer's file object, formatted according to the current dialect.

Writer objects have the following public attribute:


.. attribute:: csvwriter.dialect

   A read-only description of the dialect in use by the writer.

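A short sketch of the two writer methods, including the parenthesised form of
complex numbers noted above. ``LineCollector`` is again a hypothetical
stand-in for a file opened for writing::

   import csv

   class LineCollector(object):
       """Hypothetical file-like object: records each string passed to write()."""
       def __init__(self):
           self.lines = []

       def write(self, data):
           self.lines.append(data)

   sink = LineCollector()
   writer = csv.writer(sink)
   # writerows() writes each row of the list in turn.
   writer.writerows([['spam', 1], ['eggs', 2.5]])
   # A complex number is stringified with its surrounding parentheses.
   writer.writerow([1+2j])

   # sink.lines == ['spam,1\r\n', 'eggs,2.5\r\n', '(1+2j)\r\n']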

.. _csv-examples:

Examples
--------

The simplest example of reading a CSV file::

   import csv
   reader = csv.reader(open("some.csv", "rb"))
   for row in reader:
       print row

Reading a file with an alternate format::

   import csv
   reader = csv.reader(open("passwd", "rb"), delimiter=':', quoting=csv.QUOTE_NONE)
   for row in reader:
       print row

The corresponding simplest possible writing example is::

   import csv
   writer = csv.writer(open("some.csv", "wb"))
   writer.writerows(someiterable)

Registering a new dialect::

   import csv

   csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)

   reader = csv.reader(open("passwd", "rb"), 'unixpwd')

A slightly more advanced use of the reader --- catching and reporting errors::

   import csv, sys
   filename = "some.csv"
   reader = csv.reader(open(filename, "rb"))
   try:
       for row in reader:
           print row
   except csv.Error as e:
       sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))

And while the module doesn't directly support parsing strings, it can easily be
done::

   import csv
   for row in csv.reader(['one,two,three']):
       print row

The :mod:`csv` module doesn't directly support reading and writing Unicode, but
it is 8-bit-clean save for some problems with ASCII NUL characters. So you can
write functions or classes that handle the encoding and decoding for you as long
as you avoid encodings like UTF-16 that use NULs. UTF-8 is recommended.

:func:`unicode_csv_reader` below is a :term:`generator` that wraps :class:`csv.reader`
to handle Unicode CSV data (a list of Unicode strings). :func:`utf_8_encoder`
is a :term:`generator` that encodes the Unicode strings as UTF-8, one string (or row) at
a time. The encoded strings are parsed by the CSV reader, and
:func:`unicode_csv_reader` decodes the UTF-8-encoded cells back into Unicode::

   import csv

   def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
       # csv.py doesn't do Unicode; encode temporarily as UTF-8:
       csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),
                               dialect=dialect, **kwargs)
       for row in csv_reader:
           # decode UTF-8 back to Unicode, cell by cell:
           yield [unicode(cell, 'utf-8') for cell in row]

   def utf_8_encoder(unicode_csv_data):
       for line in unicode_csv_data:
           yield line.encode('utf-8')

For all other encodings the following :class:`UnicodeReader` and
:class:`UnicodeWriter` classes can be used. They take an additional *encoding*
parameter in their constructor and make sure that the data passes through the
real reader or writer encoded as UTF-8::

   import csv, codecs, cStringIO

   class UTF8Recoder:
       """
       Iterator that reads an encoded stream and reencodes the input to UTF-8
       """
       def __init__(self, f, encoding):
           self.reader = codecs.getreader(encoding)(f)

       def __iter__(self):
           return self

       def next(self):
           return self.reader.next().encode("utf-8")

   class UnicodeReader:
       """
       A CSV reader which will iterate over lines in the CSV file "f",
       which is encoded in the given encoding.
       """

       def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
           f = UTF8Recoder(f, encoding)
           self.reader = csv.reader(f, dialect=dialect, **kwds)

       def next(self):
           row = self.reader.next()
           return [unicode(s, "utf-8") for s in row]

       def __iter__(self):
           return self

   class UnicodeWriter:
       """
       A CSV writer which will write rows to CSV file "f",
       which is encoded in the given encoding.
       """

       def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
           # Redirect output to a queue
           self.queue = cStringIO.StringIO()
           self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
           self.stream = f
           self.encoder = codecs.getincrementalencoder(encoding)()

       def writerow(self, row):
           self.writer.writerow([s.encode("utf-8") for s in row])
           # Fetch UTF-8 output from the queue ...
           data = self.queue.getvalue()
           data = data.decode("utf-8")
           # ... and reencode it into the target encoding
           data = self.encoder.encode(data)
           # write to the target stream
           self.stream.write(data)
           # empty queue
           self.queue.truncate(0)

       def writerows(self, rows):
           for row in rows:
               self.writerow(row)
