:mod:`csv` --- CSV File Reading and Writing
===========================================

.. module:: csv
   :synopsis: Write and read tabular data to and from delimited files.
.. sectionauthor:: Skip Montanaro <skip@pobox.com>


.. versionadded:: 2.3

.. index::
   single: csv
   pair: data; tabular

The so-called CSV (Comma Separated Values) format is the most common import and
export format for spreadsheets and databases. There is no "CSV standard", so
the format is operationally defined by the many applications which read and
write it. The lack of a standard means that subtle differences often exist in
the data produced and consumed by different applications. These differences can
make it annoying to process CSV files from multiple sources. Still, while the
delimiters and quoting characters vary, the overall format is similar enough
that it is possible to write a single module which can efficiently manipulate
such data, hiding the details of reading and writing the data from the
programmer.

The :mod:`csv` module implements classes to read and write tabular data in CSV
format. It allows programmers to say, "write this data in the format preferred
by Excel," or "read data from this file which was generated by Excel," without
knowing the precise details of the CSV format used by Excel. Programmers can
also describe the CSV formats understood by other applications or define their
own special-purpose CSV formats.

The :mod:`csv` module's :class:`reader` and :class:`writer` objects read and
write sequences. Programmers can also read and write data in dictionary form
using the :class:`DictReader` and :class:`DictWriter` classes.

.. note::

   This version of the :mod:`csv` module doesn't support Unicode input. Also,
   there are currently some issues regarding ASCII NUL characters. Accordingly,
   all input should be UTF-8 or printable ASCII to be safe; see the examples in
   section :ref:`csv-examples`.


.. seealso::

   :pep:`305` - CSV File API
      The Python Enhancement Proposal which proposed this addition to Python.


.. _csv-contents:

Module Contents
---------------

The :mod:`csv` module defines the following functions:


.. function:: reader(csvfile, dialect='excel', **fmtparams)

   Return a reader object which will iterate over lines in the given *csvfile*.
   *csvfile* can be any object which supports the :term:`iterator` protocol and returns a
   string each time its :meth:`!next` method is called --- file objects and list
   objects are both suitable. If *csvfile* is a file object, it must be opened
   with the 'b' flag on platforms where that makes a difference. An optional
   *dialect* parameter can be given which is used to define a set of parameters
   specific to a particular CSV dialect. It may be an instance of a subclass of
   the :class:`Dialect` class or one of the strings returned by the
   :func:`list_dialects` function. The other optional *fmtparams* keyword arguments
   can be given to override individual formatting parameters in the current
   dialect. For full details about the dialect and formatting parameters, see
   section :ref:`csv-fmt-params`.

   Each row read from the csv file is returned as a list of strings. No
   automatic data type conversion is performed.

   A short usage example::

      >>> import csv
      >>> with open('eggs.csv', 'rb') as csvfile:
      ...     spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
      ...     for row in spamreader:
      ...         print ', '.join(row)
      Spam, Spam, Spam, Spam, Spam, Baked Beans
      Spam, Lovely Spam, Wonderful Spam

   .. versionchanged:: 2.5
      The parser is now stricter with respect to multi-line quoted fields. Previously,
      if a line ended within a quoted field without a terminating newline character, a
      newline would be inserted into the returned field. This behavior caused problems
      when reading files which contained carriage return characters within fields.
      The behavior was changed to return the field without inserting newlines. As a
      consequence, if newlines embedded within fields are important, the input should
      be split into lines in a manner which preserves the newline characters.

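
Because *csvfile* only has to produce strings, a list of lines is enough to try the
reader without creating a file. The following is a minimal sketch; the sample lines
are invented for illustration and reuse the *delimiter* and *quotechar* of the
example above. Every field comes back as a string, so any numeric conversion has to
be done explicitly::

   import csv

   # Each list element stands in for one line read from a file.
   lines = ['Spam |Lovely Spam| 42\r\n',
            'Eggs |Wonderful Spam| 7\r\n']

   rows = list(csv.reader(lines, delimiter=' ', quotechar='|'))
   # rows == [['Spam', 'Lovely Spam', '42'], ['Eggs', 'Wonderful Spam', '7']]

   # The reader does no type conversion, so convert the third column by hand.
   counts = [int(row[2]) for row in rows]
   # counts == [42, 7]
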


.. function:: writer(csvfile, dialect='excel', **fmtparams)

   Return a writer object responsible for converting the user's data into delimited
   strings on the given file-like object. *csvfile* can be any object with a
   :func:`write` method. If *csvfile* is a file object, it must be opened with the
   'b' flag on platforms where that makes a difference. An optional *dialect*
   parameter can be given which is used to define a set of parameters specific to a
   particular CSV dialect. It may be an instance of a subclass of the
   :class:`Dialect` class or one of the strings returned by the
   :func:`list_dialects` function. The other optional *fmtparams* keyword arguments
   can be given to override individual formatting parameters in the current
   dialect. For full details about the dialect and formatting parameters, see
   section :ref:`csv-fmt-params`. To make it as easy as possible to interface with
   modules which implement the DB API, the value :const:`None` is written as the
   empty string. While this isn't a reversible transformation, it makes it easier
   to dump SQL NULL data values to CSV files without preprocessing the data
   returned from a ``cursor.fetch*`` call. All other non-string data are
   stringified with :func:`str` before being written.

   A short usage example::

      import csv
      with open('eggs.csv', 'wb') as csvfile:
          spamwriter = csv.writer(csvfile, delimiter=' ',
                                  quotechar='|', quoting=csv.QUOTE_MINIMAL)
          spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
          spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])

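
The :const:`None` handling and the :func:`str` conversion described above can be
seen without touching the file system. This sketch writes one invented row, shaped
like a result of a hypothetical ``cursor.fetchone()`` call, into an in-memory
buffer; :mod:`StringIO` is what Python 2 provides, and the :mod:`io` fallback is
only there so the snippet also runs on Python 3::

   import csv
   try:
       from StringIO import StringIO   # Python 2
   except ImportError:
       from io import StringIO         # Python 3 fallback

   buf = StringIO()
   writer = csv.writer(buf)
   # None becomes an empty field, 1 and 2.5 are converted to strings, and the
   # field containing the delimiter is quoted (QUOTE_MINIMAL is the default).
   writer.writerow([1, None, 'Spam, Spam', 2.5])
   output = buf.getvalue()
   # output == '1,,"Spam, Spam",2.5\r\n'
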

.. function:: register_dialect(name[, dialect], **fmtparams)

   Associate *dialect* with *name*. *name* must be a string or Unicode object. The
   dialect can be specified either by passing a subclass of :class:`Dialect`, or
   by *fmtparams* keyword arguments, or both, with keyword arguments overriding
   parameters of the dialect. For full details about the dialect and formatting
   parameters, see section :ref:`csv-fmt-params`.


.. function:: unregister_dialect(name)

   Delete the dialect associated with *name* from the dialect registry. An
   :exc:`Error` is raised if *name* is not a registered dialect name.


.. function:: get_dialect(name)

   Return the dialect associated with *name*. An :exc:`Error` is raised if *name*
   is not a registered dialect name.

   .. versionchanged:: 2.5
      This function now returns an immutable :class:`Dialect`. Previously an
      instance of the requested dialect was returned. Users could modify the
      underlying class, changing the behavior of active readers and writers.

.. function:: list_dialects()

   Return the names of all registered dialects.

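
A short sketch of the dialect registry functions working together. The dialect name
``'pipes'`` is made up for this example; the dialect is built from *fmtparams*
alone, used by name, and then removed again::

   import csv

   # Register a dialect from keyword arguments only (no Dialect subclass).
   csv.register_dialect('pipes', delimiter='|', quoting=csv.QUOTE_NONE)

   names = csv.list_dialects()            # now includes 'pipes'
   dialect = csv.get_dialect('pipes')     # dialect.delimiter == '|'
   rows = list(csv.reader(['a|b|c'], 'pipes'))
   # rows == [['a', 'b', 'c']]

   csv.unregister_dialect('pipes')
   try:
       csv.get_dialect('pipes')
   except csv.Error:
       still_registered = False           # lookups fail once unregistered
   else:
       still_registered = True
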

.. function:: field_size_limit([new_limit])

   Return the current maximum field size allowed by the parser. If *new_limit* is
   given, it becomes the new limit; the value returned is the limit that was in
   effect before the call.

   .. versionadded:: 2.5

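
Because the previous limit is returned, a temporary change can be undone afterwards.
In this sketch the deliberately tiny limit of 10 characters is only for
demonstration; a field longer than the limit makes the reader raise :exc:`Error`::

   import csv

   old_limit = csv.field_size_limit(10)   # set a tiny limit, remember the old one
   error = None
   try:
       list(csv.reader(['short,' + 'x' * 20]))
   except csv.Error as exc:
       error = str(exc)                   # the 20-character field is too large
   finally:
       csv.field_size_limit(old_limit)    # restore the previous limit
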
The :mod:`csv` module defines the following classes:


.. class:: DictReader(csvfile, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)

   Create an object which operates like a regular reader but maps the information
   read into a dict whose keys are given by the optional *fieldnames*
   parameter. The *fieldnames* parameter is a :ref:`sequence
   <collections-abstract-base-classes>` whose elements are associated with the
   fields of the input data in order. These elements become the keys of the
   resulting dictionary.
   If the *fieldnames* parameter is omitted, the values in the first row of the
   *csvfile* will be used as the fieldnames. If the row read has more fields
   than the fieldnames sequence, the remaining data is added as a sequence
   keyed by the value of *restkey*. If the row read has fewer fields than the
   fieldnames sequence, the remaining keys take the value of the optional
   *restval* parameter. Any other optional or keyword arguments are passed to
   the underlying :class:`reader` instance.

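
To illustrate *restkey* and *restval*, here is a small sketch with invented data:
the header row supplies the fieldnames, the second row has one field too many and
the third one field too few. ``'others'`` and ``'0'`` are arbitrary values chosen
for the example::

   import csv

   lines = ['name,qty\r\n',
            'spam,3,extra\r\n',
            'eggs\r\n']

   # fieldnames is omitted, so the first row ('name,qty') is used.
   reader = csv.DictReader(lines, restkey='others', restval='0')
   rows = list(reader)
   # reader.fieldnames == ['name', 'qty']
   # rows[0] == {'name': 'spam', 'qty': '3', 'others': ['extra']}
   # rows[1] == {'name': 'eggs', 'qty': '0'}
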

.. class:: DictWriter(csvfile, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)

   Create an object which operates like a regular writer but maps dictionaries onto
   output rows. The *fieldnames* parameter is a :ref:`sequence
   <collections-abstract-base-classes>` of keys that identify the order in
   which values in the dictionary passed to the :meth:`writerow` method are
   written to the *csvfile*. The optional *restval* parameter specifies the
   value to be written if the dictionary is missing a key in *fieldnames*. If
   the dictionary passed to the :meth:`writerow` method contains a key not found
   in *fieldnames*, the optional *extrasaction* parameter indicates what action
   to take. If it is set to ``'raise'``, a :exc:`ValueError` is raised. If it
   is set to ``'ignore'``, extra values in the dictionary are ignored. Any
   other optional or keyword arguments are passed to the underlying
   :class:`writer` instance.

   Note that unlike the :class:`DictReader` class, the *fieldnames* parameter of
   the :class:`DictWriter` is not optional. Since Python's :class:`dict` objects
   are not ordered, there is not enough information available to deduce the order
   in which the row should be written to the *csvfile*.

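
The following sketch shows *restval* filling in a missing key and the default
``extrasaction='raise'`` rejecting an unknown one; the field names and rows are
invented. It also uses :meth:`DictWriter.writeheader`, described below, and writes
to an in-memory buffer (the :mod:`io` fallback in the import is only there so the
snippet also runs on Python 3)::

   import csv
   try:
       from StringIO import StringIO   # Python 2
   except ImportError:
       from io import StringIO         # Python 3 fallback

   buf = StringIO()
   writer = csv.DictWriter(buf, fieldnames=['name', 'qty'], restval='0')
   writer.writeheader()
   writer.writerow({'name': 'spam', 'qty': 3})
   writer.writerow({'name': 'eggs'})                       # 'qty' missing: restval used

   rejected = False
   try:
       writer.writerow({'name': 'ham', 'colour': 'pink'})  # 'colour' is not a fieldname
   except ValueError:
       rejected = True                                     # nothing is written for this row

   output = buf.getvalue()
   # output == 'name,qty\r\nspam,3\r\neggs,0\r\n'
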

.. class:: Dialect

   The :class:`Dialect` class is a container class relied on primarily for its
   attributes, which are used to define the parameters for a specific
   :class:`reader` or :class:`writer` instance.


.. class:: excel()

   The :class:`excel` class defines the usual properties of an Excel-generated CSV
   file. It is registered with the dialect name ``'excel'``.


.. class:: excel_tab()

   The :class:`excel_tab` class defines the usual properties of an Excel-generated
   TAB-delimited file. It is registered with the dialect name ``'excel-tab'``.


.. class:: Sniffer()

   The :class:`Sniffer` class is used to deduce the format of a CSV file.

   The :class:`Sniffer` class provides two methods:

   .. method:: sniff(sample, delimiters=None)

      Analyze the given *sample* and return a :class:`Dialect` subclass
      reflecting the parameters found. If the optional *delimiters* parameter
      is given, it is interpreted as a string containing possible valid
      delimiter characters.


   .. method:: has_header(sample)

      Analyze the sample text (presumed to be in CSV format) and return
      :const:`True` if the first row appears to be a series of column headers.

An example of :class:`Sniffer` use::

   import csv
   with open('example.csv', 'rb') as csvfile:
       dialect = csv.Sniffer().sniff(csvfile.read(1024))
       csvfile.seek(0)
       reader = csv.reader(csvfile, dialect)
       # ... process CSV file contents here ...

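
The same steps can be tried on an in-memory sample. In this sketch the sample text
is invented, and passing ``delimiters=';,'`` is an optional hint that narrows the
candidates. :meth:`Sniffer.has_header` is heuristic (it compares the first row
against the types and lengths of the values below it), so its answer is a guess
rather than a guarantee::

   import csv

   sample = 'name;qty\nham;3\neggs;12\n'

   sniffer = csv.Sniffer()
   dialect = sniffer.sniff(sample, delimiters=';,')   # dialect.delimiter == ';'
   header = sniffer.has_header(sample)                # True for this sample

   rows = list(csv.reader(sample.splitlines(), dialect))
   # rows == [['name', 'qty'], ['ham', '3'], ['eggs', '12']]
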

The :mod:`csv` module defines the following constants:

.. data:: QUOTE_ALL

   Instructs :class:`writer` objects to quote all fields.


.. data:: QUOTE_MINIMAL

   Instructs :class:`writer` objects to only quote those fields which contain
   special characters such as *delimiter*, *quotechar* or any of the characters in
   *lineterminator*.


.. data:: QUOTE_NONNUMERIC

   Instructs :class:`writer` objects to quote all non-numeric fields.

   Instructs the reader to convert all non-quoted fields to type *float*.


.. data:: QUOTE_NONE

   Instructs :class:`writer` objects to never quote fields. When the current
   *delimiter* occurs in output data it is preceded by the current *escapechar*
   character. If *escapechar* is not set, the writer will raise :exc:`Error` if
   any characters that require escaping are encountered.

   Instructs :class:`reader` to perform no special processing of quote characters.

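
A sketch comparing the constants on one invented row: it renders the row to an
in-memory buffer with each quoting mode and then reads the :const:`QUOTE_NONNUMERIC`
output back. The helper name ``render`` is hypothetical, and the :mod:`io` fallback
in the import only exists so the snippet also runs on Python 3::

   import csv
   try:
       from StringIO import StringIO   # Python 2
   except ImportError:
       from io import StringIO         # Python 3 fallback

   row = ['spam', 3, 4.5, 'a,b']

   def render(**fmtparams):
       """Hypothetical helper: format *row* as one CSV line."""
       buf = StringIO()
       csv.writer(buf, **fmtparams).writerow(row)
       return buf.getvalue()

   minimal = render(quoting=csv.QUOTE_MINIMAL)        # 'spam,3,4.5,"a,b"\r\n'
   quote_all = render(quoting=csv.QUOTE_ALL)          # '"spam","3","4.5","a,b"\r\n'
   nonnumeric = render(quoting=csv.QUOTE_NONNUMERIC)  # '"spam",3,4.5,"a,b"\r\n'
   # QUOTE_NONE needs an escapechar here, because 'a,b' contains the delimiter.
   no_quotes = render(quoting=csv.QUOTE_NONE, escapechar='\\')  # 'spam,3,4.5,a\\,b\r\n'

   # Reading with QUOTE_NONNUMERIC turns every unquoted field into a float.
   parsed = next(csv.reader([nonnumeric], quoting=csv.QUOTE_NONNUMERIC))
   # parsed == ['spam', 3.0, 4.5, 'a,b']
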
The :mod:`csv` module defines the following exception:


.. exception:: Error

   Raised by any of the functions when an error is detected.


.. _csv-fmt-params:

Dialects and Formatting Parameters
----------------------------------

To make it easier to specify the format of input and output records, specific
formatting parameters are grouped together into dialects. A dialect is a
subclass of the :class:`Dialect` class having a set of specific attributes and a
single :meth:`validate` method. When creating :class:`reader` or
:class:`writer` objects, the programmer can specify a string or a subclass of
the :class:`Dialect` class as the dialect parameter. In addition to, or instead
of, the *dialect* parameter, the programmer can also specify individual
formatting parameters, which have the same names as the attributes defined below
for the :class:`Dialect` class.

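
As a sketch of both styles, the class below is a hypothetical :class:`Dialect`
subclass (the name ``SemicolonDialect`` and its settings are invented for
illustration). The class itself is passed as the *dialect* argument, and a keyword
argument then overrides a single attribute for one reader::

   import csv

   class SemicolonDialect(csv.Dialect):
       """Hypothetical dialect: ';'-separated, spaces after delimiters skipped."""
       delimiter = ';'
       quotechar = '"'
       doublequote = True
       skipinitialspace = True
       lineterminator = '\n'
       quoting = csv.QUOTE_MINIMAL

   rows = list(csv.reader(['a; "b;c"; d\n'], SemicolonDialect))
   # rows == [['a', 'b;c', 'd']]

   # Formatting parameters given as keywords override the dialect's attributes.
   tabbed = list(csv.reader(['a\tb\n'], SemicolonDialect, delimiter='\t'))
   # tabbed == [['a', 'b']]
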
Dialects support the following attributes:


.. attribute:: Dialect.delimiter

   A one-character string used to separate fields. It defaults to ``','``.


.. attribute:: Dialect.doublequote

   Controls how instances of *quotechar* appearing inside a field should
   themselves be quoted. When :const:`True`, the character is doubled. When
   :const:`False`, the *escapechar* is used as a prefix to the *quotechar*. It
   defaults to :const:`True`.

   On output, if *doublequote* is :const:`False` and no *escapechar* is set,
   :exc:`Error` is raised if a *quotechar* is found in a field.


.. attribute:: Dialect.escapechar

   A one-character string used by the writer to escape the *delimiter* if *quoting*
   is set to :const:`QUOTE_NONE` and the *quotechar* if *doublequote* is
   :const:`False`. On reading, the *escapechar* removes any special meaning from
   the following character. It defaults to :const:`None`, which disables escaping.

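
The interaction between *doublequote* and *escapechar* on output can be sketched
as follows. The row is invented, a backslash is an arbitrary choice of
*escapechar*, and the :mod:`io` import fallback only exists so the snippet also runs
on Python 3. Rather than relying on the exact escaped text, the sketch checks that
reading with the same parameters gives the original row back::

   import csv
   try:
       from StringIO import StringIO   # Python 2
   except ImportError:
       from io import StringIO         # Python 3 fallback

   row = ['say "hi"', 'plain']

   doubled = StringIO()
   csv.writer(doubled).writerow(row)          # doublequote=True is the default
   # doubled.getvalue() == '"say ""hi""",plain\r\n'

   params = dict(doublequote=False, escapechar='\\')
   escaped = StringIO()
   csv.writer(escaped, **params).writerow(row)
   round_trip = next(csv.reader([escaped.getvalue()], **params))
   # round_trip == row

   # With doublequote=False and no escapechar, the quote cannot be written.
   try:
       csv.writer(StringIO(), doublequote=False).writerow(row)
   except csv.Error:
       no_escape_failed = True
   else:
       no_escape_failed = False
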

.. attribute:: Dialect.lineterminator

   The string used to terminate lines produced by the :class:`writer`. It defaults
   to ``'\r\n'``.

   .. note::

      The :class:`reader` is hard-coded to recognise either ``'\r'`` or ``'\n'`` as
      end-of-line, and ignores *lineterminator*. This behavior may change in the
      future.


.. attribute:: Dialect.quotechar

   A one-character string used to quote fields containing special characters, such
   as the *delimiter* or *quotechar*, or which contain new-line characters. It
   defaults to ``'"'``.


.. attribute:: Dialect.quoting

   Controls when quotes should be generated by the writer and recognised by the
   reader. It can take on any of the :const:`QUOTE_\*` constants (see section
   :ref:`csv-contents`) and defaults to :const:`QUOTE_MINIMAL`.


.. attribute:: Dialect.skipinitialspace

   When :const:`True`, whitespace immediately following the *delimiter* is ignored.
   The default is :const:`False`.


.. attribute:: Dialect.strict

   When ``True``, raise exception :exc:`Error` on bad CSV input.
   The default is ``False``.

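
To see what *strict* changes, consider a line with stray text after a closing
quote. In the invented example below, the default non-strict reader keeps the stray
character as part of the field, while ``strict=True`` turns the same input into an
:exc:`Error`::

   import csv

   bad = ['a,"b"c\r\n']                 # 'c' follows the closing quote

   lenient = list(csv.reader(bad))     # strict=False (the default)
   # lenient == [['a', 'bc']]

   try:
       list(csv.reader(bad, strict=True))
   except csv.Error:
       rejected = True
   else:
       rejected = False
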
Reader Objects
--------------

Reader objects (:class:`DictReader` instances and objects returned by the
:func:`reader` function) have the following public methods:


.. method:: csvreader.next()

   Return the next row of the reader's iterable object as a list, parsed according
   to the current dialect.

Reader objects have the following public attributes:


.. attribute:: csvreader.dialect

   A read-only description of the dialect in use by the parser.


.. attribute:: csvreader.line_num

   The number of lines read from the source iterator. This is not the same as the
   number of records returned, as records can span multiple lines.

   .. versionadded:: 2.5

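
A minimal sketch of the difference between :attr:`csvreader.line_num` and the
number of records, using invented input in which a quoted field spans two physical
lines::

   import csv

   lines = ['1,"first\n',
            'second"\n',
            '2,third\n']

   reader = csv.reader(lines)
   seen = []
   for row in reader:
       # line_num counts lines consumed so far, not records returned.
       seen.append((reader.line_num, row))
   # seen == [(2, ['1', 'first\nsecond']), (3, ['2', 'third'])]
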

DictReader objects have the following public attribute:


.. attribute:: csvreader.fieldnames

   If not passed as a parameter when creating the object, this attribute is
   initialized upon first access or when the first record is read from the
   file.

   .. versionchanged:: 2.6


Writer Objects
--------------

:class:`Writer` objects (:class:`DictWriter` instances and objects returned by
the :func:`writer` function) have the following public methods. A *row* must be
a sequence of strings or numbers for :class:`Writer` objects, and a dictionary
mapping fieldnames to strings or numbers for :class:`DictWriter` objects; in
both cases, numbers are passed through :func:`str` first. Note that complex
numbers are written out surrounded by parentheses. This may cause some problems
for other programs which read CSV files (assuming they support complex numbers
at all).


.. method:: csvwriter.writerow(row)

   Write the *row* parameter to the writer's file object, formatted according to
   the current dialect.


.. method:: csvwriter.writerows(rows)

   Write all elements in *rows* (an iterable of *row* objects as described above)
   to the writer's file object, formatted according to the current dialect.

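
Here is a sketch of :meth:`csvwriter.writerows` with mixed value types, writing to
an in-memory buffer (the :mod:`io` import fallback only exists so the snippet also
runs on Python 3). It shows a complex number coming out in parentheses, as noted
above, and :const:`None` becoming an empty field::

   import csv
   try:
       from StringIO import StringIO   # Python 2
   except ImportError:
       from io import StringIO         # Python 3 fallback

   buf = StringIO()
   writer = csv.writer(buf)
   writer.writerows([
       ['spam', 3, 2.5],
       ['complex', 1 + 2j, None],      # str(1 + 2j) == '(1+2j)'
   ])
   output = buf.getvalue()
   # output == 'spam,3,2.5\r\ncomplex,(1+2j),\r\n'
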
Writer objects have the following public attribute:


.. attribute:: csvwriter.dialect

   A read-only description of the dialect in use by the writer.


DictWriter objects have the following public method:


.. method:: DictWriter.writeheader()

   Write a row with the field names (as specified in the constructor).

   .. versionadded:: 2.7


.. _csv-examples:

Examples
--------

The simplest example of reading a CSV file::

   import csv
   with open('some.csv', 'rb') as f:
       reader = csv.reader(f)
       for row in reader:
           print row

Reading a file with an alternate format::

   import csv
   with open('passwd', 'rb') as f:
       reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
       for row in reader:
           print row

The corresponding simplest possible writing example is::

   import csv
   with open('some.csv', 'wb') as f:
       writer = csv.writer(f)
       writer.writerows(someiterable)  # someiterable: any iterable of row sequences

Registering a new dialect::

   import csv
   csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
   with open('passwd', 'rb') as f:
       reader = csv.reader(f, 'unixpwd')

A slightly more advanced use of the reader --- catching and reporting errors::

   import csv, sys
   filename = 'some.csv'
   with open(filename, 'rb') as f:
       reader = csv.reader(f)
       try:
           for row in reader:
               print row
       except csv.Error as e:
           sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))

And while the module doesn't directly support parsing strings, it can easily be
done::

   import csv
   for row in csv.reader(['one,two,three']):
       print row

The :mod:`csv` module doesn't directly support reading and writing Unicode, but
it is 8-bit-clean save for some problems with ASCII NUL characters. So you can
write functions or classes that handle the encoding and decoding for you as long
as you avoid encodings like UTF-16 that use NULs. UTF-8 is recommended.

:func:`unicode_csv_reader` below is a :term:`generator` that wraps :class:`csv.reader`
to handle Unicode CSV data (a list of Unicode strings). :func:`utf_8_encoder`
is a :term:`generator` that encodes the Unicode strings as UTF-8, one string (or row) at
a time. The encoded strings are parsed by the CSV reader, and
:func:`unicode_csv_reader` decodes the UTF-8-encoded cells back into Unicode::

   import csv

   def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
       # csv.py doesn't do Unicode; encode temporarily as UTF-8:
       csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),
                               dialect=dialect, **kwargs)
       for row in csv_reader:
           # decode UTF-8 back to Unicode, cell by cell:
           yield [unicode(cell, 'utf-8') for cell in row]

   def utf_8_encoder(unicode_csv_data):
       for line in unicode_csv_data:
           yield line.encode('utf-8')

For all other encodings the following :class:`UnicodeReader` and
:class:`UnicodeWriter` classes can be used. They take an additional *encoding*
parameter in their constructor and make sure that the data passes the real
reader or writer encoded as UTF-8::

   import csv, codecs, cStringIO

   class UTF8Recoder:
       """
       Iterator that reads an encoded stream and reencodes the input to UTF-8
       """
       def __init__(self, f, encoding):
           self.reader = codecs.getreader(encoding)(f)

       def __iter__(self):
           return self

       def next(self):
           return self.reader.next().encode("utf-8")

   class UnicodeReader:
       """
       A CSV reader which will iterate over lines in the CSV file "f",
       which is encoded in the given encoding.
       """

       def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
           f = UTF8Recoder(f, encoding)
           self.reader = csv.reader(f, dialect=dialect, **kwds)

       def next(self):
           row = self.reader.next()
           return [unicode(s, "utf-8") for s in row]

       def __iter__(self):
           return self

   class UnicodeWriter:
       """
       A CSV writer which will write rows to CSV file "f",
       which is encoded in the given encoding.
       """

       def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
           # Redirect output to a queue
           self.queue = cStringIO.StringIO()
           self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
           self.stream = f
           self.encoder = codecs.getincrementalencoder(encoding)()

       def writerow(self, row):
           self.writer.writerow([s.encode("utf-8") for s in row])
           # Fetch UTF-8 output from the queue ...
           data = self.queue.getvalue()
           data = data.decode("utf-8")
           # ... and reencode it into the target encoding
           data = self.encoder.encode(data)
           # write to the target stream
           self.stream.write(data)
           # empty queue
           self.queue.truncate(0)

       def writerows(self, rows):
           for row in rows:
               self.writerow(row)
594