blob: a4bf01ff92a84dcd52c48ede891bd71347e7b1cf [file] [log] [blame]
Barry Warsawaf572511999-08-11 21:40:38 +00001#! /usr/bin/env python
Barry Warsaw6e972412001-05-21 19:35:20 +00002# Originally written by Barry Warsaw <barry@digicool.com>
Barry Warsawc8f08922000-02-26 20:56:47 +00003#
Barry Warsaw6e972412001-05-21 19:35:20 +00004# Minimally patched to make it even more xgettext compatible
Barry Warsawc8f08922000-02-26 20:56:47 +00005# by Peter Funk <pf@artcom-gmbh.de>
Barry Warsawe27db5a1999-08-13 20:59:48 +00006
Barry Warsaw08a8a352000-10-27 04:56:28 +00007"""pygettext -- Python equivalent of xgettext(1)
Barry Warsawe27db5a1999-08-13 20:59:48 +00008
9Many systems (Solaris, Linux, Gnu) provide extensive tools that ease the
10internationalization of C programs. Most of these tools are independent of
11the programming language and can be used from within Python programs. Martin
12von Loewis' work[1] helps considerably in this regard.
13
Barry Warsaw5dbf5261999-11-03 18:47:52 +000014There's one problem though; xgettext is the program that scans source code
Barry Warsawe27db5a1999-08-13 20:59:48 +000015looking for message strings, but it groks only C (or C++). Python introduces
16a few wrinkles, such as dual quoting characters, triple quoted strings, and
17raw strings. xgettext understands none of this.
18
19Enter pygettext, which uses Python's standard tokenize module to scan Python
20source code, generating .pot files identical to what GNU xgettext[2] generates
Barry Warsaw5dbf5261999-11-03 18:47:52 +000021for C and C++ code. From there, the standard GNU tools can be used.
Barry Warsawe27db5a1999-08-13 20:59:48 +000022
23A word about marking Python strings as candidates for translation. GNU
24xgettext recognizes the following keywords: gettext, dgettext, dcgettext, and
25gettext_noop. But those can be a lot of text to include all over your code.
Barry Warsaw5dbf5261999-11-03 18:47:52 +000026C and C++ have a trick: they use the C preprocessor. Most internationalized C
Barry Warsawe27db5a1999-08-13 20:59:48 +000027source includes a #define for gettext() to _() so that what has to be written
28in the source is much less. Thus these are both translatable strings:
29
30 gettext("Translatable String")
31 _("Translatable String")
32
33Python of course has no preprocessor so this doesn't work so well. Thus,
34pygettext searches only for _() by default, but see the -k/--keyword flag
35below for how to augment this.
36
37 [1] http://www.python.org/workshops/1997-10/proceedings/loewis.html
38 [2] http://www.gnu.org/software/gettext/gettext.html
39
Barry Warsawe27db5a1999-08-13 20:59:48 +000040NOTE: pygettext attempts to be option and feature compatible with GNU xgettext
Barry Warsawc8f08922000-02-26 20:56:47 +000041where ever possible. However some options are still missing or are not fully
Barry Warsawa17e0f12000-03-08 15:18:35 +000042implemented. Also, xgettext's use of command line switches with option
43arguments is broken, and in these cases, pygettext just defines additional
44switches.
Barry Warsawe27db5a1999-08-13 20:59:48 +000045
Barry Warsawa17e0f12000-03-08 15:18:35 +000046Usage: pygettext [options] inputfile ...
Barry Warsawe27db5a1999-08-13 20:59:48 +000047
48Options:
49
50 -a
51 --extract-all
52 Extract all strings
53
Barry Warsawc8f08922000-02-26 20:56:47 +000054 -d name
55 --default-domain=name
56 Rename the default output file from messages.pot to name.pot
57
58 -E
59 --escape
Barry Warsaw08a8a352000-10-27 04:56:28 +000060 Replace non-ASCII characters with octal escape sequences.
61
62 -D
63 --docstrings
64 Extract module, class, method, and function docstrings. These do not
65 need to be wrapped in _() markers, and in fact cannot be for Python to
66 consider them docstrings.
Barry Warsawc8f08922000-02-26 20:56:47 +000067
68 -h
69 --help
70 print this help message and exit
Barry Warsawe27db5a1999-08-13 20:59:48 +000071
Barry Warsawa17e0f12000-03-08 15:18:35 +000072 -k word
73 --keyword=word
74 Keywords to look for in addition to the default set, which are:
75 %(DEFAULTKEYWORDS)s
Barry Warsawe27db5a1999-08-13 20:59:48 +000076
Barry Warsawa17e0f12000-03-08 15:18:35 +000077 You can have multiple -k flags on the command line.
78
79 -K
80 --no-default-keywords
81 Disable the default set of keywords (see above). Any keywords
82 explicitly added with the -k/--keyword option are still recognized.
Barry Warsawe27db5a1999-08-13 20:59:48 +000083
84 --no-location
Barry Warsawa17e0f12000-03-08 15:18:35 +000085 Do not write filename/lineno location comments.
Barry Warsawe27db5a1999-08-13 20:59:48 +000086
Barry Warsawa17e0f12000-03-08 15:18:35 +000087 -n
88 --add-location
Barry Warsawe27db5a1999-08-13 20:59:48 +000089 Write filename/lineno location comments indicating where each
90 extracted string is found in the source. These lines appear before
Barry Warsawa17e0f12000-03-08 15:18:35 +000091 each msgid. The style of comments is controlled by the -S/--style
92 option. This is the default.
93
Barry Warsaw08a8a352000-10-27 04:56:28 +000094 -o filename
95 --output=filename
96 Rename the default output file from messages.pot to filename. If
97 filename is `-' then the output is sent to standard out.
98
99 -p dir
100 --output-dir=dir
101 Output files will be placed in directory dir.
102
Barry Warsawa17e0f12000-03-08 15:18:35 +0000103 -S stylename
104 --style stylename
105 Specify which style to use for location comments. Two styles are
106 supported:
Barry Warsawe27db5a1999-08-13 20:59:48 +0000107
108 Solaris # File: filename, line: line-number
Barry Warsawa17e0f12000-03-08 15:18:35 +0000109 GNU #: filename:line
Barry Warsawe27db5a1999-08-13 20:59:48 +0000110
Barry Warsawa17e0f12000-03-08 15:18:35 +0000111 The style name is case insensitive. GNU style is the default.
Barry Warsawe27db5a1999-08-13 20:59:48 +0000112
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000113 -v
114 --verbose
115 Print the names of the files being processed.
116
Barry Warsawc8f08922000-02-26 20:56:47 +0000117 -V
118 --version
119 Print the version of pygettext and exit.
120
121 -w columns
122 --width=columns
123 Set width of output to columns.
124
125 -x filename
126 --exclude-file=filename
127 Specify a file that contains a list of strings that are not be
128 extracted from the input files. Each string to be excluded must
129 appear on a line by itself in the file.
Barry Warsawe27db5a1999-08-13 20:59:48 +0000130
Barry Warsawa17e0f12000-03-08 15:18:35 +0000131If `inputfile' is -, standard input is read.
132
Barry Warsaw08a8a352000-10-27 04:56:28 +0000133"""
Barry Warsawe27db5a1999-08-13 20:59:48 +0000134
135import os
136import sys
Barry Warsawe27db5a1999-08-13 20:59:48 +0000137import time
138import getopt
139import tokenize
Barry Warsaw16b62c12001-05-21 19:51:26 +0000140import operator
Barry Warsawe27db5a1999-08-13 20:59:48 +0000141
Barry Warsaw08a8a352000-10-27 04:56:28 +0000142# for selftesting
143try:
144 import fintl
145 _ = fintl.gettext
146except ImportError:
147 def _(s): return s
148
Martin v. Löwis0f6b3832001-03-01 22:56:17 +0000149__version__ = '1.3'
Barry Warsawa17e0f12000-03-08 15:18:35 +0000150
151default_keywords = ['_']
152DEFAULTKEYWORDS = ', '.join(default_keywords)
153
154EMPTYSTRING = ''
Barry Warsawe27db5a1999-08-13 20:59:48 +0000155
156
157
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000158# The normal pot-file header. msgmerge and EMACS' po-mode work better if
159# it's there.
160pot_header = _('''\
161# SOME DESCRIPTIVE TITLE.
162# Copyright (C) YEAR ORGANIZATION
163# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
164#
165msgid ""
166msgstr ""
167"Project-Id-Version: PACKAGE VERSION\\n"
Martin v. Löwis0f6b3832001-03-01 22:56:17 +0000168"POT-Creation-Date: %(time)s\\n"
169"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\\n"
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000170"Last-Translator: FULL NAME <EMAIL@ADDRESS>\\n"
171"Language-Team: LANGUAGE <LL@li.org>\\n"
172"MIME-Version: 1.0\\n"
173"Content-Type: text/plain; charset=CHARSET\\n"
174"Content-Transfer-Encoding: ENCODING\\n"
175"Generated-By: pygettext.py %(version)s\\n"
176
177''')
178
179
Barry Warsawe27db5a1999-08-13 20:59:48 +0000180def usage(code, msg=''):
Barry Warsaw08a8a352000-10-27 04:56:28 +0000181 print >> sys.stderr, _(__doc__) % globals()
Barry Warsawe27db5a1999-08-13 20:59:48 +0000182 if msg:
Barry Warsaw08a8a352000-10-27 04:56:28 +0000183 print >> sys.stderr, msg
Barry Warsawe27db5a1999-08-13 20:59:48 +0000184 sys.exit(code)
185
Barry Warsawc8f08922000-02-26 20:56:47 +0000186
Barry Warsawe27db5a1999-08-13 20:59:48 +0000187
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000188escapes = []
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000189
Barry Warsawc8f08922000-02-26 20:56:47 +0000190def make_escapes(pass_iso8859):
191 global escapes
Barry Warsaw7733e122000-02-27 14:30:48 +0000192 if pass_iso8859:
193 # Allow iso-8859 characters to pass through so that e.g. 'msgid
194 # "Höhe"' would result not result in 'msgid "H\366he"'. Otherwise we
195 # escape any character outside the 32..126 range.
196 mod = 128
197 else:
198 mod = 256
Barry Warsawc8f08922000-02-26 20:56:47 +0000199 for i in range(256):
Barry Warsaw7733e122000-02-27 14:30:48 +0000200 if 32 <= (i % mod) <= 126:
Barry Warsawc8f08922000-02-26 20:56:47 +0000201 escapes.append(chr(i))
202 else:
203 escapes.append("\\%03o" % i)
204 escapes[ord('\\')] = '\\\\'
205 escapes[ord('\t')] = '\\t'
206 escapes[ord('\r')] = '\\r'
207 escapes[ord('\n')] = '\\n'
208 escapes[ord('\"')] = '\\"'
209
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000210
211def escape(s):
Barry Warsawc8f08922000-02-26 20:56:47 +0000212 global escapes
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000213 s = list(s)
214 for i in range(len(s)):
215 s[i] = escapes[ord(s[i])]
Barry Warsawa17e0f12000-03-08 15:18:35 +0000216 return EMPTYSTRING.join(s)
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000217
218
219def safe_eval(s):
220 # unwrap quotes, safely
221 return eval(s, {'__builtins__':{}}, {})
222
223
Barry Warsawe27db5a1999-08-13 20:59:48 +0000224def normalize(s):
225 # This converts the various Python string types into a format that is
226 # appropriate for .po files, namely much closer to C style.
Barry Warsawa17e0f12000-03-08 15:18:35 +0000227 lines = s.split('\n')
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000228 if len(lines) == 1:
229 s = '"' + escape(s) + '"'
Barry Warsawe27db5a1999-08-13 20:59:48 +0000230 else:
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000231 if not lines[-1]:
232 del lines[-1]
233 lines[-1] = lines[-1] + '\n'
234 for i in range(len(lines)):
235 lines[i] = escape(lines[i])
Barry Warsawa17e0f12000-03-08 15:18:35 +0000236 lineterm = '\\n"\n"'
237 s = '""\n"' + lineterm.join(lines) + '"'
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000238 return s
Barry Warsawe27db5a1999-08-13 20:59:48 +0000239
240
241
242class TokenEater:
243 def __init__(self, options):
244 self.__options = options
245 self.__messages = {}
246 self.__state = self.__waiting
247 self.__data = []
248 self.__lineno = -1
Barry Warsaw08a8a352000-10-27 04:56:28 +0000249 self.__freshmodule = 1
Barry Warsawe27db5a1999-08-13 20:59:48 +0000250
251 def __call__(self, ttype, tstring, stup, etup, line):
252 # dispatch
Barry Warsaw08a8a352000-10-27 04:56:28 +0000253## import token
254## print >> sys.stderr, 'ttype:', token.tok_name[ttype], \
255## 'tstring:', tstring
Barry Warsawe27db5a1999-08-13 20:59:48 +0000256 self.__state(ttype, tstring, stup[0])
257
258 def __waiting(self, ttype, tstring, lineno):
Barry Warsaw08a8a352000-10-27 04:56:28 +0000259 # Do docstring extractions, if enabled
260 if self.__options.docstrings:
261 # module docstring?
262 if self.__freshmodule:
263 if ttype == tokenize.STRING:
Barry Warsaw16b62c12001-05-21 19:51:26 +0000264 self.__addentry(safe_eval(tstring), lineno, isdocstring=1)
Barry Warsaw08a8a352000-10-27 04:56:28 +0000265 self.__freshmodule = 0
266 elif ttype not in (tokenize.COMMENT, tokenize.NL):
267 self.__freshmodule = 0
268 return
269 # class docstring?
270 if ttype == tokenize.NAME and tstring in ('class', 'def'):
271 self.__state = self.__suiteseen
272 return
Barry Warsawe27db5a1999-08-13 20:59:48 +0000273 if ttype == tokenize.NAME and tstring in self.__options.keywords:
274 self.__state = self.__keywordseen
275
Barry Warsaw08a8a352000-10-27 04:56:28 +0000276 def __suiteseen(self, ttype, tstring, lineno):
277 # ignore anything until we see the colon
278 if ttype == tokenize.OP and tstring == ':':
279 self.__state = self.__suitedocstring
280
281 def __suitedocstring(self, ttype, tstring, lineno):
282 # ignore any intervening noise
283 if ttype == tokenize.STRING:
Barry Warsaw16b62c12001-05-21 19:51:26 +0000284 self.__addentry(safe_eval(tstring), lineno, isdocstring=1)
Barry Warsaw08a8a352000-10-27 04:56:28 +0000285 self.__state = self.__waiting
286 elif ttype not in (tokenize.NEWLINE, tokenize.INDENT,
287 tokenize.COMMENT):
288 # there was no class docstring
289 self.__state = self.__waiting
290
Barry Warsawe27db5a1999-08-13 20:59:48 +0000291 def __keywordseen(self, ttype, tstring, lineno):
292 if ttype == tokenize.OP and tstring == '(':
293 self.__data = []
294 self.__lineno = lineno
295 self.__state = self.__openseen
296 else:
297 self.__state = self.__waiting
298
299 def __openseen(self, ttype, tstring, lineno):
300 if ttype == tokenize.OP and tstring == ')':
301 # We've seen the last of the translatable strings. Record the
302 # line number of the first line of the strings and update the list
303 # of messages seen. Reset state for the next batch. If there
304 # were no strings inside _(), then just ignore this entry.
305 if self.__data:
Barry Warsaw08a8a352000-10-27 04:56:28 +0000306 self.__addentry(EMPTYSTRING.join(self.__data))
Barry Warsawe27db5a1999-08-13 20:59:48 +0000307 self.__state = self.__waiting
308 elif ttype == tokenize.STRING:
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000309 self.__data.append(safe_eval(tstring))
Barry Warsawe27db5a1999-08-13 20:59:48 +0000310 # TBD: should we warn if we seen anything else?
311
Barry Warsaw16b62c12001-05-21 19:51:26 +0000312 def __addentry(self, msg, lineno=None, isdocstring=0):
Barry Warsaw08a8a352000-10-27 04:56:28 +0000313 if lineno is None:
314 lineno = self.__lineno
315 if not msg in self.__options.toexclude:
316 entry = (self.__curfile, lineno)
Barry Warsaw16b62c12001-05-21 19:51:26 +0000317 self.__messages.setdefault(msg, {})[entry] = isdocstring
Barry Warsaw08a8a352000-10-27 04:56:28 +0000318
Barry Warsawe27db5a1999-08-13 20:59:48 +0000319 def set_filename(self, filename):
320 self.__curfile = filename
321
322 def write(self, fp):
323 options = self.__options
324 timestamp = time.ctime(time.time())
Barry Warsaw08a8a352000-10-27 04:56:28 +0000325 # The time stamp in the header doesn't have the same format as that
326 # generated by xgettext...
327 print >> fp, pot_header % {'time': timestamp, 'version': __version__}
Barry Warsaw128c77d2001-05-23 16:59:45 +0000328 # Sort the entries. First sort each particular entry's keys, then
329 # sort all the entries by their first item.
330 reverse = {}
Fred Drake33e2c3e2000-10-26 03:49:15 +0000331 for k, v in self.__messages.items():
Barry Warsaw128c77d2001-05-23 16:59:45 +0000332 keys = v.keys()
333 keys.sort()
Barry Warsaw50cf7062001-05-24 23:06:13 +0000334 reverse.setdefault(tuple(keys), []).append((k, v))
Barry Warsaw128c77d2001-05-23 16:59:45 +0000335 rkeys = reverse.keys()
336 rkeys.sort()
337 for rkey in rkeys:
Barry Warsaw50cf7062001-05-24 23:06:13 +0000338 rentries = reverse[rkey]
339 rentries.sort()
340 for k, v in rentries:
341 # If the entry was gleaned out of a docstring, then add a
342 # comment stating so. This is to aid translators who may wish
343 # to skip translating some unimportant docstrings.
344 if reduce(operator.__add__, v.values()):
345 print >> fp, '#. docstring'
346 # k is the message string, v is a dictionary-set of (filename,
347 # lineno) tuples. We want to sort the entries in v first by
348 # file name and then by line number.
349 v = v.keys()
350 v.sort()
351 if not options.writelocations:
352 pass
353 # location comments are different b/w Solaris and GNU:
354 elif options.locationstyle == options.SOLARIS:
355 for filename, lineno in v:
356 d = {'filename': filename, 'lineno': lineno}
357 print >>fp, _(
358 '# File: %(filename)s, line: %(lineno)d') % d
359 elif options.locationstyle == options.GNU:
360 # fit as many locations on one line, as long as the
361 # resulting line length doesn't exceeds 'options.width'
362 locline = '#:'
363 for filename, lineno in v:
364 d = {'filename': filename, 'lineno': lineno}
365 s = _(' %(filename)s:%(lineno)d') % d
366 if len(locline) + len(s) <= options.width:
367 locline = locline + s
368 else:
369 print >> fp, locline
370 locline = "#:" + s
371 if len(locline) > 2:
Barry Warsaw08a8a352000-10-27 04:56:28 +0000372 print >> fp, locline
Barry Warsaw50cf7062001-05-24 23:06:13 +0000373 print >> fp, 'msgid', normalize(k)
374 print >> fp, 'msgstr ""\n'
Barry Warsaw08a8a352000-10-27 04:56:28 +0000375
Barry Warsawe27db5a1999-08-13 20:59:48 +0000376
377
378def main():
Barry Warsawa17e0f12000-03-08 15:18:35 +0000379 global default_keywords
Barry Warsawe27db5a1999-08-13 20:59:48 +0000380 try:
381 opts, args = getopt.getopt(
382 sys.argv[1:],
Barry Warsaw08a8a352000-10-27 04:56:28 +0000383 'ad:DEhk:Kno:p:S:Vvw:x:',
Barry Warsaw2b639692001-05-21 19:58:23 +0000384 ['extract-all', 'default-domain=', 'escape', 'help',
Barry Warsawa17e0f12000-03-08 15:18:35 +0000385 'keyword=', 'no-default-keywords',
Barry Warsawc8f08922000-02-26 20:56:47 +0000386 'add-location', 'no-location', 'output=', 'output-dir=',
Barry Warsawa17e0f12000-03-08 15:18:35 +0000387 'style=', 'verbose', 'version', 'width=', 'exclude-file=',
Barry Warsaw08a8a352000-10-27 04:56:28 +0000388 'docstrings',
Barry Warsawc8f08922000-02-26 20:56:47 +0000389 ])
Barry Warsawe27db5a1999-08-13 20:59:48 +0000390 except getopt.error, msg:
391 usage(1, msg)
392
393 # for holding option values
394 class Options:
395 # constants
396 GNU = 1
397 SOLARIS = 2
398 # defaults
Barry Warsawc8f08922000-02-26 20:56:47 +0000399 extractall = 0 # FIXME: currently this option has no effect at all.
400 escape = 0
Barry Warsawe27db5a1999-08-13 20:59:48 +0000401 keywords = []
Barry Warsawc8f08922000-02-26 20:56:47 +0000402 outpath = ''
Barry Warsawe27db5a1999-08-13 20:59:48 +0000403 outfile = 'messages.pot'
Barry Warsawa17e0f12000-03-08 15:18:35 +0000404 writelocations = 1
405 locationstyle = GNU
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000406 verbose = 0
Barry Warsawc8f08922000-02-26 20:56:47 +0000407 width = 78
408 excludefilename = ''
Barry Warsaw08a8a352000-10-27 04:56:28 +0000409 docstrings = 0
Barry Warsawe27db5a1999-08-13 20:59:48 +0000410
411 options = Options()
412 locations = {'gnu' : options.GNU,
413 'solaris' : options.SOLARIS,
414 }
415
416 # parse options
417 for opt, arg in opts:
418 if opt in ('-h', '--help'):
419 usage(0)
Barry Warsawc8f08922000-02-26 20:56:47 +0000420 elif opt in ('-a', '--extract-all'):
421 options.extractall = 1
422 elif opt in ('-d', '--default-domain'):
423 options.outfile = arg + '.pot'
424 elif opt in ('-E', '--escape'):
425 options.escape = 1
Barry Warsaw08a8a352000-10-27 04:56:28 +0000426 elif opt in ('-D', '--docstrings'):
427 options.docstrings = 1
Barry Warsawe27db5a1999-08-13 20:59:48 +0000428 elif opt in ('-k', '--keyword'):
Barry Warsawe27db5a1999-08-13 20:59:48 +0000429 options.keywords.append(arg)
Barry Warsawa17e0f12000-03-08 15:18:35 +0000430 elif opt in ('-K', '--no-default-keywords'):
431 default_keywords = []
Barry Warsawe27db5a1999-08-13 20:59:48 +0000432 elif opt in ('-n', '--add-location'):
Barry Warsawa17e0f12000-03-08 15:18:35 +0000433 options.writelocations = 1
Barry Warsawe27db5a1999-08-13 20:59:48 +0000434 elif opt in ('--no-location',):
Barry Warsawa17e0f12000-03-08 15:18:35 +0000435 options.writelocations = 0
436 elif opt in ('-S', '--style'):
437 options.locationstyle = locations.get(arg.lower())
438 if options.locationstyle is None:
439 usage(1, _('Invalid value for --style: %s') % arg)
Barry Warsawc8f08922000-02-26 20:56:47 +0000440 elif opt in ('-o', '--output'):
441 options.outfile = arg
442 elif opt in ('-p', '--output-dir'):
443 options.outpath = arg
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000444 elif opt in ('-v', '--verbose'):
445 options.verbose = 1
Barry Warsawc8f08922000-02-26 20:56:47 +0000446 elif opt in ('-V', '--version'):
447 print _('pygettext.py (xgettext for Python) %s') % __version__
448 sys.exit(0)
449 elif opt in ('-w', '--width'):
450 try:
451 options.width = int(arg)
452 except ValueError:
Barry Warsawa17e0f12000-03-08 15:18:35 +0000453 usage(1, _('--width argument must be an integer: %s') % arg)
Barry Warsawc8f08922000-02-26 20:56:47 +0000454 elif opt in ('-x', '--exclude-file'):
455 options.excludefilename = arg
456
457 # calculate escapes
Barry Warsaw7733e122000-02-27 14:30:48 +0000458 make_escapes(options.escape)
Barry Warsawe27db5a1999-08-13 20:59:48 +0000459
460 # calculate all keywords
461 options.keywords.extend(default_keywords)
462
Barry Warsawc8f08922000-02-26 20:56:47 +0000463 # initialize list of strings to exclude
464 if options.excludefilename:
465 try:
466 fp = open(options.excludefilename)
467 options.toexclude = fp.readlines()
468 fp.close()
469 except IOError:
Barry Warsaw6e972412001-05-21 19:35:20 +0000470 print >> sys.stderr, _(
471 "Can't read --exclude-file: %s") % options.excludefilename
Barry Warsawc8f08922000-02-26 20:56:47 +0000472 sys.exit(1)
473 else:
474 options.toexclude = []
475
Barry Warsawe27db5a1999-08-13 20:59:48 +0000476 # slurp through all the files
477 eater = TokenEater(options)
478 for filename in args:
Barry Warsawa17e0f12000-03-08 15:18:35 +0000479 if filename == '-':
480 if options.verbose:
481 print _('Reading standard input')
482 fp = sys.stdin
483 closep = 0
484 else:
485 if options.verbose:
486 print _('Working on %s') % filename
487 fp = open(filename)
488 closep = 1
489 try:
490 eater.set_filename(filename)
Barry Warsaw75ee8f52001-02-26 04:46:53 +0000491 try:
492 tokenize.tokenize(fp.readline, eater)
493 except tokenize.TokenError, e:
Barry Warsaw6e972412001-05-21 19:35:20 +0000494 print >> sys.stderr, '%s: %s, line %d, column %d' % (
495 e[0], filename, e[1][0], e[1][1])
Barry Warsawa17e0f12000-03-08 15:18:35 +0000496 finally:
497 if closep:
498 fp.close()
Barry Warsawe27db5a1999-08-13 20:59:48 +0000499
Barry Warsawa17e0f12000-03-08 15:18:35 +0000500 # write the output
501 if options.outfile == '-':
502 fp = sys.stdout
503 closep = 0
504 else:
505 if options.outpath:
506 options.outfile = os.path.join(options.outpath, options.outfile)
507 fp = open(options.outfile, 'w')
508 closep = 1
509 try:
510 eater.write(fp)
511 finally:
512 if closep:
513 fp.close()
Barry Warsawe27db5a1999-08-13 20:59:48 +0000514
515
516if __name__ == '__main__':
517 main()
Barry Warsaw75a6e672000-05-02 19:28:30 +0000518 # some more test strings
519 _(u'a unicode string')