#! /usr/bin/env python
# Originally written by Barry Warsaw <barry@digicool.com>
#
# Minimally patched to make it even more xgettext compatible
# by Peter Funk <pf@artcom-gmbh.de>

"""pygettext -- Python equivalent of xgettext(1)

Many systems (Solaris, Linux, GNU) provide extensive tools that ease the
internationalization of C programs. Most of these tools are independent of
the programming language and can be used from within Python programs. Martin
von Loewis' work[1] helps considerably in this regard.

There's one problem though; xgettext is the program that scans source code
looking for message strings, but it groks only C (or C++). Python introduces
a few wrinkles, such as dual quoting characters, triple quoted strings, and
raw strings. xgettext understands none of this.

Enter pygettext, which uses Python's standard tokenize module to scan Python
source code, generating .pot files identical to what GNU xgettext[2] generates
for C and C++ code. From there, the standard GNU tools can be used.

A word about marking Python strings as candidates for translation. GNU
xgettext recognizes the following keywords: gettext, dgettext, dcgettext, and
gettext_noop. But those can be a lot of text to include all over your code.
C and C++ have a trick: they use the C preprocessor. Most internationalized C
source includes a #define for gettext() to _() so that what has to be written
in the source is much less. Thus these are both translatable strings:

    gettext("Translatable String")
    _("Translatable String")

Python of course has no preprocessor so this doesn't work so well. Thus,
pygettext searches only for _() by default, but see the -k/--keyword flag
below for how to augment this.

 [1] http://www.python.org/workshops/1997-10/proceedings/loewis.html
 [2] http://www.gnu.org/software/gettext/gettext.html

NOTE: pygettext attempts to be option and feature compatible with GNU xgettext
wherever possible. However, some options are still missing or are not fully
implemented. Also, xgettext's use of command line switches with option
arguments is broken, and in these cases, pygettext just defines additional
switches.

Usage: pygettext [options] inputfile ...

Options:

    -a
    --extract-all
        Extract all strings.

    -d name
    --default-domain=name
        Rename the default output file from messages.pot to name.pot.

    -E
    --escape
        Replace non-ASCII characters with octal escape sequences.

    -D
    --docstrings
        Extract module, class, method, and function docstrings. These do not
        need to be wrapped in _() markers, and in fact cannot be for Python to
        consider them docstrings.

    -h
    --help
        Print this help message and exit.

    -k word
    --keyword=word
        Keywords to look for in addition to the default set, which are:
        %(DEFAULTKEYWORDS)s

        You can have multiple -k flags on the command line.

    -K
    --no-default-keywords
        Disable the default set of keywords (see above). Any keywords
        explicitly added with the -k/--keyword option are still recognized.

    --no-location
        Do not write filename/lineno location comments.

    -n
    --add-location
        Write filename/lineno location comments indicating where each
        extracted string is found in the source. These lines appear before
        each msgid. The style of comments is controlled by the -S/--style
        option. This is the default.

    -o filename
    --output=filename
        Rename the default output file from messages.pot to filename. If
        filename is `-' then the output is sent to standard out.

    -p dir
    --output-dir=dir
        Output files will be placed in directory dir.

    -S stylename
    --style=stylename
        Specify which style to use for location comments. Two styles are
        supported:

        Solaris  # File: filename, line: line-number
        GNU      #: filename:line

        The style name is case insensitive. GNU style is the default.

    -v
    --verbose
        Print the names of the files being processed.

    -V
    --version
        Print the version of pygettext and exit.

    -w columns
    --width=columns
        Set width of output to columns.

    -x filename
    --exclude-file=filename
        Specify a file that contains a list of strings that are not to be
        extracted from the input files. Each string to be excluded must
        appear on a line by itself in the file.

If `inputfile' is -, standard input is read.

"""

import os
import sys
import time
import getopt
import tokenize
import operator

# for selftesting
try:
    import fintl
    _ = fintl.gettext
except ImportError:
    def _(s): return s

__version__ = '1.3'

default_keywords = ['_']
DEFAULTKEYWORDS = ', '.join(default_keywords)

EMPTYSTRING = ''



# The normal pot-file header. msgmerge and Emacs' po-mode work better if
# it's there.
pot_header = _('''\
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\\n"
"POT-Creation-Date: %(time)s\\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\\n"
"Language-Team: LANGUAGE <LL@li.org>\\n"
"MIME-Version: 1.0\\n"
"Content-Type: text/plain; charset=CHARSET\\n"
"Content-Transfer-Encoding: ENCODING\\n"
"Generated-By: pygettext.py %(version)s\\n"

''')


def usage(code, msg=''):
    print >> sys.stderr, _(__doc__) % globals()
    if msg:
        print >> sys.stderr, msg
    sys.exit(code)



escapes = []

def make_escapes(pass_iso8859):
    global escapes
    if pass_iso8859:
        # Allow iso-8859 characters to pass through so that e.g. 'msgid
        # "Höhe"' would not result in 'msgid "H\366he"'. Otherwise we
        # escape any character outside the 32..126 range.
        mod = 128
    else:
        mod = 256
    for i in range(256):
        if 32 <= (i % mod) <= 126:
            escapes.append(chr(i))
        else:
            escapes.append("\\%03o" % i)
    escapes[ord('\\')] = '\\\\'
    escapes[ord('\t')] = '\\t'
    escapes[ord('\r')] = '\\r'
    escapes[ord('\n')] = '\\n'
    escapes[ord('\"')] = '\\"'


def escape(s):
    global escapes
    s = list(s)
    for i in range(len(s)):
        s[i] = escapes[ord(s[i])]
    return EMPTYSTRING.join(s)


def safe_eval(s):
    # unwrap quotes, safely
    return eval(s, {'__builtins__':{}}, {})


def normalize(s):
    # This converts the various Python string types into a format that is
    # appropriate for .po files, namely much closer to C style.
    lines = s.split('\n')
    if len(lines) == 1:
        s = '"' + escape(s) + '"'
    else:
        if not lines[-1]:
            del lines[-1]
            lines[-1] = lines[-1] + '\n'
        for i in range(len(lines)):
            lines[i] = escape(lines[i])
        lineterm = '\\n"\n"'
        s = '""\n"' + lineterm.join(lines) + '"'
    return s



class TokenEater:
    def __init__(self, options):
        self.__options = options
        self.__messages = {}
        self.__state = self.__waiting
        self.__data = []
        self.__lineno = -1
        self.__freshmodule = 1

    def __call__(self, ttype, tstring, stup, etup, line):
        # dispatch
##        import token
##        print >> sys.stderr, 'ttype:', token.tok_name[ttype], \
##              'tstring:', tstring
        self.__state(ttype, tstring, stup[0])

    def __waiting(self, ttype, tstring, lineno):
        # Do docstring extractions, if enabled
        if self.__options.docstrings:
            # module docstring?
            if self.__freshmodule:
                if ttype == tokenize.STRING:
                    self.__addentry(safe_eval(tstring), lineno, isdocstring=1)
                    self.__freshmodule = 0
                elif ttype not in (tokenize.COMMENT, tokenize.NL):
                    self.__freshmodule = 0
                return
            # class or function docstring?
            if ttype == tokenize.NAME and tstring in ('class', 'def'):
                self.__state = self.__suiteseen
                return
        if ttype == tokenize.NAME and tstring in self.__options.keywords:
            self.__state = self.__keywordseen

    def __suiteseen(self, ttype, tstring, lineno):
        # ignore anything until we see the colon
        if ttype == tokenize.OP and tstring == ':':
            self.__state = self.__suitedocstring

    def __suitedocstring(self, ttype, tstring, lineno):
        # ignore any intervening noise
        if ttype == tokenize.STRING:
            self.__addentry(safe_eval(tstring), lineno, isdocstring=1)
            self.__state = self.__waiting
        elif ttype not in (tokenize.NEWLINE, tokenize.INDENT,
                           tokenize.COMMENT):
            # there was no class or function docstring
            self.__state = self.__waiting

    def __keywordseen(self, ttype, tstring, lineno):
        if ttype == tokenize.OP and tstring == '(':
            self.__data = []
            self.__lineno = lineno
            self.__state = self.__openseen
        else:
            self.__state = self.__waiting

    def __openseen(self, ttype, tstring, lineno):
        if ttype == tokenize.OP and tstring == ')':
            # We've seen the last of the translatable strings. Record the
            # line number of the first line of the strings and update the list
            # of messages seen. Reset state for the next batch. If there
            # were no strings inside _(), then just ignore this entry.
            if self.__data:
                self.__addentry(EMPTYSTRING.join(self.__data))
            self.__state = self.__waiting
        elif ttype == tokenize.STRING:
            self.__data.append(safe_eval(tstring))
        # TBD: should we warn if we see anything else?

    def __addentry(self, msg, lineno=None, isdocstring=0):
        if lineno is None:
            lineno = self.__lineno
        if msg not in self.__options.toexclude:
            entry = (self.__curfile, lineno)
            self.__messages.setdefault(msg, {})[entry] = isdocstring

    def set_filename(self, filename):
        self.__curfile = filename

    def write(self, fp):
        options = self.__options
        timestamp = time.ctime(time.time())
        # The time stamp in the header doesn't have the same format as that
        # generated by xgettext...
        print >> fp, pot_header % {'time': timestamp, 'version': __version__}
        for k, v in self.__messages.items():
            # If the entry was gleaned out of a docstring, then add a comment
            # stating so. This is to aid translators who may wish to skip
            # translating some unimportant docstrings.
            if reduce(operator.__add__, v.values()):
                print >> fp, '#. docstring'
            # k is the message string, v is a dictionary-set of (filename,
            # lineno) tuples. We want to sort the entries in v first by file
            # name and then by line number.
            v = v.keys()
            v.sort()
            if not options.writelocations:
                pass
            # location comments differ between Solaris and GNU:
            elif options.locationstyle == options.SOLARIS:
                for filename, lineno in v:
                    d = {'filename': filename, 'lineno': lineno}
                    print >>fp, _('# File: %(filename)s, line: %(lineno)d') % d
            elif options.locationstyle == options.GNU:
                # fit as many locations on one line as possible, as long as
                # the resulting line length doesn't exceed 'options.width'
                locline = '#:'
                for filename, lineno in v:
                    d = {'filename': filename, 'lineno': lineno}
                    s = _(' %(filename)s:%(lineno)d') % d
                    if len(locline) + len(s) <= options.width:
                        locline = locline + s
                    else:
                        print >> fp, locline
                        locline = "#:" + s
                if len(locline) > 2:
                    print >> fp, locline
            # TBD: sorting, normalizing
            print >> fp, 'msgid', normalize(k)
            print >> fp, 'msgstr ""\n'



def main():
    global default_keywords
    try:
        # 'default-domain=' needs the trailing '=' because -d/--default-domain
        # takes an argument.
        opts, args = getopt.getopt(
            sys.argv[1:],
            'ad:DEhk:Kno:p:S:Vvw:x:',
            ['extract-all', 'default-domain=', 'escape', 'help',
             'keyword=', 'no-default-keywords',
             'add-location', 'no-location', 'output=', 'output-dir=',
             'style=', 'verbose', 'version', 'width=', 'exclude-file=',
             'docstrings',
             ])
    except getopt.error, msg:
        usage(1, msg)

    # for holding option values
    class Options:
        # constants
        GNU = 1
        SOLARIS = 2
        # defaults
        extractall = 0 # FIXME: currently this option has no effect at all.
        escape = 0
        keywords = []
        outpath = ''
        outfile = 'messages.pot'
        writelocations = 1
        locationstyle = GNU
        verbose = 0
        width = 78
        excludefilename = ''
        docstrings = 0

    options = Options()
    locations = {'gnu' : options.GNU,
                 'solaris' : options.SOLARIS,
                 }

    # parse options
    for opt, arg in opts:
        if opt in ('-h', '--help'):
            usage(0)
        elif opt in ('-a', '--extract-all'):
            options.extractall = 1
        elif opt in ('-d', '--default-domain'):
            options.outfile = arg + '.pot'
        elif opt in ('-E', '--escape'):
            options.escape = 1
        elif opt in ('-D', '--docstrings'):
            options.docstrings = 1
        elif opt in ('-k', '--keyword'):
            options.keywords.append(arg)
        elif opt in ('-K', '--no-default-keywords'):
            default_keywords = []
        elif opt in ('-n', '--add-location'):
            options.writelocations = 1
        elif opt in ('--no-location',):
            options.writelocations = 0
        elif opt in ('-S', '--style'):
            options.locationstyle = locations.get(arg.lower())
            if options.locationstyle is None:
                usage(1, _('Invalid value for --style: %s') % arg)
        elif opt in ('-o', '--output'):
            options.outfile = arg
        elif opt in ('-p', '--output-dir'):
            options.outpath = arg
        elif opt in ('-v', '--verbose'):
            options.verbose = 1
        elif opt in ('-V', '--version'):
            print _('pygettext.py (xgettext for Python) %s') % __version__
            sys.exit(0)
        elif opt in ('-w', '--width'):
            try:
                options.width = int(arg)
            except ValueError:
                usage(1, _('--width argument must be an integer: %s') % arg)
        elif opt in ('-x', '--exclude-file'):
            options.excludefilename = arg

    # calculate escapes; make_escapes() takes a "pass ISO-8859 through" flag,
    # so -E/--escape (escape non-ASCII characters) must be negated here
    make_escapes(not options.escape)

    # calculate all keywords
    options.keywords.extend(default_keywords)

    # initialize list of strings to exclude
    if options.excludefilename:
        try:
            fp = open(options.excludefilename)
            options.toexclude = fp.readlines()
            fp.close()
        except IOError:
            print >> sys.stderr, _(
                "Can't read --exclude-file: %s") % options.excludefilename
            sys.exit(1)
    else:
        options.toexclude = []

    # slurp through all the files
    eater = TokenEater(options)
    for filename in args:
        if filename == '-':
            if options.verbose:
                print _('Reading standard input')
            fp = sys.stdin
            closep = 0
        else:
            if options.verbose:
                print _('Working on %s') % filename
            fp = open(filename)
            closep = 1
        try:
            eater.set_filename(filename)
            try:
                tokenize.tokenize(fp.readline, eater)
            except tokenize.TokenError, e:
                print >> sys.stderr, '%s: %s, line %d, column %d' % (
                    e[0], filename, e[1][0], e[1][1])
        finally:
            if closep:
                fp.close()

    # write the output
    if options.outfile == '-':
        fp = sys.stdout
        closep = 0
    else:
        if options.outpath:
            options.outfile = os.path.join(options.outpath, options.outfile)
        fp = open(options.outfile, 'w')
        closep = 1
    try:
        eater.write(fp)
    finally:
        if closep:
            fp.close()


if __name__ == '__main__':
    main()
    # some more test strings
    _(u'a unicode string')