#! /usr/bin/env python
# Originally written by Barry Warsaw <barry@digicool.com>
#
# Minimally patched to make it even more xgettext compatible
# by Peter Funk <pf@artcom-gmbh.de>

"""pygettext -- Python equivalent of xgettext(1)

Many systems (Solaris, Linux, Gnu) provide extensive tools that ease the
internationalization of C programs. Most of these tools are independent of
the programming language and can be used from within Python programs. Martin
von Loewis' work[1] helps considerably in this regard.

There's one problem though; xgettext is the program that scans source code
looking for message strings, but it groks only C (or C++). Python introduces
a few wrinkles, such as dual quoting characters, triple quoted strings, and
raw strings. xgettext understands none of this.

Enter pygettext, which uses Python's standard tokenize module to scan Python
source code, generating .pot files identical to what GNU xgettext[2] generates
for C and C++ code. From there, the standard GNU tools can be used.
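
As a rough illustration of the tokenize-based approach (a sketch only, not
the code in this script: it is written for current Python 3, and the name
find_marked_strings is made up for the example), a scanner for _() calls
might look like this:

```python
import ast
import io
import tokenize

def find_marked_strings(source, keyword="_"):
    # Return (lineno, text) pairs for keyword("...") calls found in source.
    found = []
    state = "waiting"
    lineno = None
    parts = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if state == "waiting":
            # Look for the marker name, e.g. _
            if tok.type == tokenize.NAME and tok.string == keyword:
                state = "keyword"
        elif state == "keyword":
            # The marker only counts if an open paren follows directly.
            if tok.type == tokenize.OP and tok.string == "(":
                state, lineno, parts = "open", tok.start[0], []
            else:
                state = "waiting"
        elif state == "open":
            if tok.type == tokenize.OP and tok.string == ")":
                if parts:
                    found.append((lineno, "".join(parts)))
                state = "waiting"
            elif tok.type == tokenize.STRING:
                # Turn the quoted source text into the actual string value.
                parts.append(ast.literal_eval(tok.string))
    return found
```

Because the tokenizer reports each literal as a single STRING token, the
quoting style does not matter, and adjacent literals inside the parentheses
are joined into one message.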

A word about marking Python strings as candidates for translation. GNU
xgettext recognizes the following keywords: gettext, dgettext, dcgettext, and
gettext_noop. But those can be a lot of text to include all over your code.
C and C++ have a trick: they use the C preprocessor. Most internationalized C
source includes a #define for gettext() to _() so that what has to be written
in the source is much less. Thus these are both translatable strings:

    gettext("Translatable String")
    _("Translatable String")

Python of course has no preprocessor so this doesn't work so well. Thus,
pygettext searches only for _() by default, but see the -k/--keyword flag
below for how to augment this.

 [1] http://www.python.org/workshops/1997-10/proceedings/loewis.html
 [2] http://www.gnu.org/software/gettext/gettext.html

NOTE: pygettext attempts to be option and feature compatible with GNU xgettext
wherever possible. However, some options are still missing or are not fully
implemented. Also, xgettext's use of command line switches with option
arguments is broken, and in these cases, pygettext just defines additional
switches.

Usage: pygettext [options] inputfile ...

Options:

    -a
    --extract-all
        Extract all strings. (This option is accepted but currently has no
        effect.)

    -d name
    --default-domain=name
        Rename the default output file from messages.pot to name.pot.

    -E
    --escape
        Replace non-ASCII characters with octal escape sequences.

    -D
    --docstrings
        Extract module, class, method, and function docstrings. These do not
        need to be wrapped in _() markers, and in fact cannot be for Python to
        consider them docstrings.

    -h
    --help
        Print this help message and exit.

    -k word
    --keyword=word
        Keywords to look for in addition to the default set, which are:
        %(DEFAULTKEYWORDS)s

        You can have multiple -k flags on the command line.

    -K
    --no-default-keywords
        Disable the default set of keywords (see above). Any keywords
        explicitly added with the -k/--keyword option are still recognized.

    --no-location
        Do not write filename/lineno location comments.

    -n
    --add-location
        Write filename/lineno location comments indicating where each
        extracted string is found in the source. These lines appear before
        each msgid. The style of comments is controlled by the -S/--style
        option. This is the default.

    -o filename
    --output=filename
        Rename the default output file from messages.pot to filename. If
        filename is `-' then the output is sent to standard out.

    -p dir
    --output-dir=dir
        Output files will be placed in directory dir.

    -S stylename
    --style stylename
        Specify which style to use for location comments. Two styles are
        supported:

        Solaris  # File: filename, line: line-number
        GNU      #: filename:line

        The style name is case insensitive. GNU style is the default.

    -v
    --verbose
        Print the names of the files being processed.

    -V
    --version
        Print the version of pygettext and exit.

    -w columns
    --width=columns
        Set width of output to columns.

    -x filename
    --exclude-file=filename
        Specify a file that contains a list of strings that are not to be
        extracted from the input files. Each string to be excluded must
        appear on a line by itself in the file.

If `inputfile' is -, standard input is read.
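
To see why docstrings cannot carry _() markers (the point made under
-D/--docstrings above), compare the two functions in this small sketch. The
function names are invented for the example, and _ is a stand-in that returns
its argument unchanged:

```python
def _(s):
    # Stand-in for a real translation function.
    return s

def plain():
    "A real docstring"

def wrapped():
    _("Not a docstring")

# Only a bare string literal in first position becomes __doc__; a call
# expression such as _(...) is just an ordinary statement.
plain_has_doc = plain.__doc__ is not None
wrapped_has_doc = wrapped.__doc__ is not None
```

This is why -D picks up the bare literal directly instead of looking for a
marker around it.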

"""

import os
import sys
import time
import getopt
import tokenize
import operator

# for selftesting
try:
    import fintl
    _ = fintl.gettext
except ImportError:
    def _(s): return s

__version__ = '1.3'

default_keywords = ['_']
DEFAULTKEYWORDS = ', '.join(default_keywords)

EMPTYSTRING = ''



# The normal pot-file header. msgmerge and EMACS' po-mode work better if
# it's there.
pot_header = _('''\
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\\n"
"POT-Creation-Date: %(time)s\\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\\n"
"Language-Team: LANGUAGE <LL@li.org>\\n"
"MIME-Version: 1.0\\n"
"Content-Type: text/plain; charset=CHARSET\\n"
"Content-Transfer-Encoding: ENCODING\\n"
"Generated-By: pygettext.py %(version)s\\n"

''')


def usage(code, msg=''):
    print >> sys.stderr, _(__doc__) % globals()
    if msg:
        print >> sys.stderr, msg
    sys.exit(code)



escapes = []

def make_escapes(pass_iso8859):
    global escapes
    if pass_iso8859:
        # Allow iso-8859 characters to pass through so that e.g. 'msgid
        # "Höhe"' would not result in 'msgid "H\366he"'. Otherwise we
        # escape any character outside the 32..126 range.
        mod = 128
    else:
        mod = 256
    for i in range(256):
        if 32 <= (i % mod) <= 126:
            escapes.append(chr(i))
        else:
            escapes.append("\\%03o" % i)
    escapes[ord('\\')] = '\\\\'
    escapes[ord('\t')] = '\\t'
    escapes[ord('\r')] = '\\r'
    escapes[ord('\n')] = '\\n'
    escapes[ord('\"')] = '\\"'


def escape(s):
    global escapes
    s = list(s)
    for i in range(len(s)):
        s[i] = escapes[ord(s[i])]
    return EMPTYSTRING.join(s)


def safe_eval(s):
    # unwrap quotes, safely
    return eval(s, {'__builtins__':{}}, {})


def normalize(s):
    # This converts the various Python string types into a format that is
    # appropriate for .po files, namely much closer to C style.
    lines = s.split('\n')
    if len(lines) == 1:
        s = '"' + escape(s) + '"'
    else:
        if not lines[-1]:
            del lines[-1]
            lines[-1] = lines[-1] + '\n'
        for i in range(len(lines)):
            lines[i] = escape(lines[i])
        lineterm = '\\n"\n"'
        s = '""\n"' + lineterm.join(lines) + '"'
    return s



class TokenEater:
    def __init__(self, options):
        self.__options = options
        self.__messages = {}
        self.__state = self.__waiting
        self.__data = []
        self.__lineno = -1
        self.__freshmodule = 1

    def __call__(self, ttype, tstring, stup, etup, line):
        # dispatch
##        import token
##        print >> sys.stderr, 'ttype:', token.tok_name[ttype], \
##              'tstring:', tstring
        self.__state(ttype, tstring, stup[0])

    def __waiting(self, ttype, tstring, lineno):
        # Do docstring extractions, if enabled
        if self.__options.docstrings:
            # module docstring?
            if self.__freshmodule:
                if ttype == tokenize.STRING:
                    self.__addentry(safe_eval(tstring), lineno, isdocstring=1)
                    self.__freshmodule = 0
                elif ttype not in (tokenize.COMMENT, tokenize.NL):
                    self.__freshmodule = 0
                return
            # class docstring?
            if ttype == tokenize.NAME and tstring in ('class', 'def'):
                self.__state = self.__suiteseen
                return
        if ttype == tokenize.NAME and tstring in self.__options.keywords:
            self.__state = self.__keywordseen

    def __suiteseen(self, ttype, tstring, lineno):
        # ignore anything until we see the colon
        if ttype == tokenize.OP and tstring == ':':
            self.__state = self.__suitedocstring

    def __suitedocstring(self, ttype, tstring, lineno):
        # ignore any intervening noise
        if ttype == tokenize.STRING:
            self.__addentry(safe_eval(tstring), lineno, isdocstring=1)
            self.__state = self.__waiting
        elif ttype not in (tokenize.NEWLINE, tokenize.INDENT,
                           tokenize.COMMENT):
            # there was no class docstring
            self.__state = self.__waiting

    def __keywordseen(self, ttype, tstring, lineno):
        if ttype == tokenize.OP and tstring == '(':
            self.__data = []
            self.__lineno = lineno
            self.__state = self.__openseen
        else:
            self.__state = self.__waiting

    def __openseen(self, ttype, tstring, lineno):
        if ttype == tokenize.OP and tstring == ')':
            # We've seen the last of the translatable strings. Record the
            # line number of the first line of the strings and update the list
            # of messages seen. Reset state for the next batch. If there
            # were no strings inside _(), then just ignore this entry.
            if self.__data:
                self.__addentry(EMPTYSTRING.join(self.__data))
            self.__state = self.__waiting
        elif ttype == tokenize.STRING:
            self.__data.append(safe_eval(tstring))
        # TBD: should we warn if we see anything else?

    def __addentry(self, msg, lineno=None, isdocstring=0):
        if lineno is None:
            lineno = self.__lineno
        if not msg in self.__options.toexclude:
            entry = (self.__curfile, lineno)
            self.__messages.setdefault(msg, {})[entry] = isdocstring

    def set_filename(self, filename):
        self.__curfile = filename

    def write(self, fp):
        options = self.__options
        timestamp = time.ctime(time.time())
        # The time stamp in the header doesn't have the same format as that
        # generated by xgettext...
        print >> fp, pot_header % {'time': timestamp, 'version': __version__}
        # Sort the entries. First sort each particular entry's keys, then
        # sort all the entries by their first item.
        reverse = {}
        for k, v in self.__messages.items():
            keys = v.keys()
            keys.sort()
            reverse[tuple(keys)] = (k, v)
        rkeys = reverse.keys()
        rkeys.sort()
        for rkey in rkeys:
            k, v = reverse[rkey]
            # If the entry was gleaned out of a docstring, then add a comment
            # stating so. This is to aid translators who may wish to skip
            # translating some unimportant docstrings.
            if reduce(operator.__add__, v.values()):
                print >> fp, '#. docstring'
            # k is the message string, v is a dictionary-set of (filename,
            # lineno) tuples. We want to sort the entries in v first by file
            # name and then by line number.
            v = v.keys()
            v.sort()
            if not options.writelocations:
                pass
            # location comments are different b/w Solaris and GNU:
            elif options.locationstyle == options.SOLARIS:
                for filename, lineno in v:
                    d = {'filename': filename, 'lineno': lineno}
                    print >> fp, _('# File: %(filename)s, line: %(lineno)d') % d
            elif options.locationstyle == options.GNU:
                # fit as many locations on one line as possible, as long as
                # the resulting line length doesn't exceed 'options.width'
                locline = '#:'
                for filename, lineno in v:
                    d = {'filename': filename, 'lineno': lineno}
                    s = _(' %(filename)s:%(lineno)d') % d
                    if len(locline) + len(s) <= options.width:
                        locline = locline + s
                    else:
                        print >> fp, locline
                        locline = "#:" + s
                if len(locline) > 2:
                    print >> fp, locline
            # TBD: sorting, normalizing
            print >> fp, 'msgid', normalize(k)
            print >> fp, 'msgstr ""\n'



def main():
    global default_keywords
    try:
        opts, args = getopt.getopt(
            sys.argv[1:],
            'ad:DEhk:Kno:p:S:Vvw:x:',
            ['extract-all', 'default-domain=', 'escape', 'help',
             'keyword=', 'no-default-keywords',
             'add-location', 'no-location', 'output=', 'output-dir=',
             'style=', 'verbose', 'version', 'width=', 'exclude-file=',
             'docstrings',
             ])
    except getopt.error, msg:
        usage(1, msg)

    # for holding option values
    class Options:
        # constants
        GNU = 1
        SOLARIS = 2
        # defaults
        extractall = 0 # FIXME: currently this option has no effect at all.
        escape = 0
        keywords = []
        outpath = ''
        outfile = 'messages.pot'
        writelocations = 1
        locationstyle = GNU
        verbose = 0
        width = 78
        excludefilename = ''
        docstrings = 0

    options = Options()
    locations = {'gnu' : options.GNU,
                 'solaris' : options.SOLARIS,
                 }

    # parse options
    for opt, arg in opts:
        if opt in ('-h', '--help'):
            usage(0)
        elif opt in ('-a', '--extract-all'):
            options.extractall = 1
        elif opt in ('-d', '--default-domain'):
            options.outfile = arg + '.pot'
        elif opt in ('-E', '--escape'):
            options.escape = 1
        elif opt in ('-D', '--docstrings'):
            options.docstrings = 1
        elif opt in ('-k', '--keyword'):
            options.keywords.append(arg)
        elif opt in ('-K', '--no-default-keywords'):
            default_keywords = []
        elif opt in ('-n', '--add-location'):
            options.writelocations = 1
        elif opt in ('--no-location',):
            options.writelocations = 0
        elif opt in ('-S', '--style'):
            options.locationstyle = locations.get(arg.lower())
            if options.locationstyle is None:
                usage(1, _('Invalid value for --style: %s') % arg)
        elif opt in ('-o', '--output'):
            options.outfile = arg
        elif opt in ('-p', '--output-dir'):
            options.outpath = arg
        elif opt in ('-v', '--verbose'):
            options.verbose = 1
        elif opt in ('-V', '--version'):
            print _('pygettext.py (xgettext for Python) %s') % __version__
            sys.exit(0)
        elif opt in ('-w', '--width'):
            try:
                options.width = int(arg)
            except ValueError:
                usage(1, _('--width argument must be an integer: %s') % arg)
        elif opt in ('-x', '--exclude-file'):
            options.excludefilename = arg

    # calculate escapes. make_escapes() takes a pass-through flag, which is
    # the opposite of -E/--escape: when escaping is requested, iso-8859
    # characters must not be passed through unchanged.
    make_escapes(not options.escape)

    # calculate all keywords
    options.keywords.extend(default_keywords)

    # initialize list of strings to exclude
    if options.excludefilename:
        try:
            fp = open(options.excludefilename)
            options.toexclude = fp.readlines()
            fp.close()
        except IOError:
            print >> sys.stderr, _(
                "Can't read --exclude-file: %s") % options.excludefilename
            sys.exit(1)
    else:
        options.toexclude = []

    # slurp through all the files
    eater = TokenEater(options)
    for filename in args:
        if filename == '-':
            if options.verbose:
                print _('Reading standard input')
            fp = sys.stdin
            closep = 0
        else:
            if options.verbose:
                print _('Working on %s') % filename
            fp = open(filename)
            closep = 1
        try:
            eater.set_filename(filename)
            try:
                tokenize.tokenize(fp.readline, eater)
            except tokenize.TokenError, e:
                print >> sys.stderr, '%s: %s, line %d, column %d' % (
                    e[0], filename, e[1][0], e[1][1])
        finally:
            if closep:
                fp.close()

    # write the output
    if options.outfile == '-':
        fp = sys.stdout
        closep = 0
    else:
        if options.outpath:
            options.outfile = os.path.join(options.outpath, options.outfile)
        fp = open(options.outfile, 'w')
        closep = 1
    try:
        eater.write(fp)
    finally:
        if closep:
            fp.close()


if __name__ == '__main__':
    main()
    # some more test strings
    _(u'a unicode string')