#! /usr/bin/env python
# Originally written by Barry Warsaw <barry@zope.com>
#
# Minimally patched to make it even more xgettext compatible
# by Peter Funk <pf@artcom-gmbh.de>

"""pygettext -- Python equivalent of xgettext(1)

Many systems (Solaris, Linux, GNU) provide extensive tools that ease the
internationalization of C programs. Most of these tools are independent of
the programming language and can be used from within Python programs. Martin
von Loewis' work[1] helps considerably in this regard.

There's one problem though; xgettext is the program that scans source code
looking for message strings, but it groks only C (or C++). Python introduces
a few wrinkles, such as dual quoting characters, triple-quoted strings, and
raw strings. xgettext understands none of this.

Enter pygettext, which uses Python's standard tokenize module to scan Python
source code, generating .pot files identical to what GNU xgettext[2] generates
for C and C++ code. From there, the standard GNU tools can be used.

A word about marking Python strings as candidates for translation. GNU
xgettext recognizes the following keywords: gettext, dgettext, dcgettext, and
gettext_noop. But those can be a lot of text to include all over your code.
C and C++ have a trick: they use the C preprocessor. Most internationalized C
source includes a #define for gettext() to _() so that what has to be written
in the source is much less. Thus these are both translatable strings:

    gettext("Translatable String")
    _("Translatable String")

Python, of course, has no preprocessor, so this doesn't work so well. Thus,
pygettext searches only for _() by default, but see the -k/--keyword flag
below for how to augment this.

 [1] http://www.python.org/workshops/1997-10/proceedings/loewis.html
 [2] http://www.gnu.org/software/gettext/gettext.html

NOTE: pygettext attempts to be option and feature compatible with GNU xgettext
wherever possible. However, some options are still missing or are not fully
implemented. Also, xgettext's use of command line switches with option
arguments is broken, and in these cases, pygettext just defines additional
switches.

Usage: pygettext [options] inputfile ...

Options:

    -a
    --extract-all
        Extract all strings. (Not yet implemented: this option currently
        has no effect.)

    -d name
    --default-domain=name
        Rename the default output file from messages.pot to name.pot.

    -E
    --escape
        Replace non-ASCII characters with octal escape sequences.

    -D
    --docstrings
        Extract module, class, method, and function docstrings. These do not
        need to be wrapped in _() markers, and in fact cannot be for Python to
        consider them docstrings. (See also the -X option).

    -h
    --help
        Print this help message and exit.

    -k word
    --keyword=word
        Keywords to look for in addition to the default set, which are:
        %(DEFAULTKEYWORDS)s

        You can have multiple -k flags on the command line.

    -K
    --no-default-keywords
        Disable the default set of keywords (see above). Any keywords
        explicitly added with the -k/--keyword option are still recognized.

    --no-location
        Do not write filename/lineno location comments.

    -n
    --add-location
        Write filename/lineno location comments indicating where each
        extracted string is found in the source. These lines appear before
        each msgid. The style of comments is controlled by the -S/--style
        option. This is the default.

    -o filename
    --output=filename
        Rename the default output file from messages.pot to filename. If
        filename is `-' then the output is sent to standard output.

    -p dir
    --output-dir=dir
        Output files will be placed in directory dir.

    -S stylename
    --style=stylename
        Specify which style to use for location comments. Two styles are
        supported:

        Solaris  # File: filename, line: line-number
        GNU      #: filename:line

        The style name is case insensitive. GNU style is the default.

    -v
    --verbose
        Print the names of the files being processed.

    -V
    --version
        Print the version of pygettext and exit.

    -w columns
    --width=columns
        Set width of output to columns.

    -x filename
    --exclude-file=filename
        Specify a file that contains a list of strings that are not to be
        extracted from the input files. Each string to be excluded must
        appear on a line by itself in the file.

    -X filename
    --no-docstrings=filename
        Specify a file that contains a list of files (one per line) that
        should not have their docstrings extracted. This is only useful in
        conjunction with the -D option above.

If `inputfile' is -, standard input is read.
"""

import os
import sys
import time
import getopt
import tokenize
import operator

# for selftesting
try:
    import fintl
    _ = fintl.gettext
except ImportError:
    def _(s): return s

__version__ = '1.4'

default_keywords = ['_']
DEFAULTKEYWORDS = ', '.join(default_keywords)

EMPTYSTRING = ''



# The normal pot-file header. msgmerge and Emacs's po-mode work better if it's
# there.
pot_header = _('''\
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\\n"
"POT-Creation-Date: %(time)s\\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\\n"
"Language-Team: LANGUAGE <LL@li.org>\\n"
"MIME-Version: 1.0\\n"
"Content-Type: text/plain; charset=CHARSET\\n"
"Content-Transfer-Encoding: ENCODING\\n"
"Generated-By: pygettext.py %(version)s\\n"

''')


def usage(code, msg=''):
    print >> sys.stderr, _(__doc__) % globals()
    if msg:
        print >> sys.stderr, msg
    sys.exit(code)



escapes = []

def make_escapes(pass_iso8859):
    global escapes
    if pass_iso8859:
        # Allow iso-8859 characters to pass through so that e.g. 'msgid
        # "Höhe"' would not result in 'msgid "H\366he"'. Otherwise we
        # escape any character outside the 32..126 range.
        mod = 128
    else:
        mod = 256
    for i in range(256):
        if 32 <= (i % mod) <= 126:
            escapes.append(chr(i))
        else:
            escapes.append("\\%03o" % i)
    escapes[ord('\\')] = '\\\\'
    escapes[ord('\t')] = '\\t'
    escapes[ord('\r')] = '\\r'
    escapes[ord('\n')] = '\\n'
    escapes[ord('\"')] = '\\"'


def escape(s):
    global escapes
    s = list(s)
    for i in range(len(s)):
        s[i] = escapes[ord(s[i])]
    return EMPTYSTRING.join(s)


def safe_eval(s):
    # unwrap quotes, safely
    return eval(s, {'__builtins__':{}}, {})


def normalize(s):
    # This converts the various Python string types into a format that is
    # appropriate for .po files, namely much closer to C style.
    lines = s.split('\n')
    if len(lines) == 1:
        s = '"' + escape(s) + '"'
    else:
        if not lines[-1]:
            del lines[-1]
            lines[-1] = lines[-1] + '\n'
        for i in range(len(lines)):
            lines[i] = escape(lines[i])
        lineterm = '\\n"\n"'
        s = '""\n"' + lineterm.join(lines) + '"'
    return s



class TokenEater:
    def __init__(self, options):
        self.__options = options
        self.__messages = {}
        self.__state = self.__waiting
        self.__data = []
        self.__lineno = -1
        self.__freshmodule = 1
        self.__curfile = None

    def __call__(self, ttype, tstring, stup, etup, line):
        # dispatch
##        import token
##        print >> sys.stderr, 'ttype:', token.tok_name[ttype], \
##              'tstring:', tstring
        self.__state(ttype, tstring, stup[0])

    def __waiting(self, ttype, tstring, lineno):
        opts = self.__options
        # Do docstring extractions, if enabled
        if opts.docstrings and not opts.nodocstrings.get(self.__curfile):
            # module docstring?
            if self.__freshmodule:
                if ttype == tokenize.STRING:
                    self.__addentry(safe_eval(tstring), lineno, isdocstring=1)
                    self.__freshmodule = 0
                elif ttype not in (tokenize.COMMENT, tokenize.NL):
                    self.__freshmodule = 0
                return
            # class or function docstring?
            if ttype == tokenize.NAME and tstring in ('class', 'def'):
                self.__state = self.__suiteseen
                return
        if ttype == tokenize.NAME and tstring in opts.keywords:
            self.__state = self.__keywordseen

    def __suiteseen(self, ttype, tstring, lineno):
        # ignore anything until we see the colon
        if ttype == tokenize.OP and tstring == ':':
            self.__state = self.__suitedocstring

    def __suitedocstring(self, ttype, tstring, lineno):
        # ignore any intervening noise
        if ttype == tokenize.STRING:
            self.__addentry(safe_eval(tstring), lineno, isdocstring=1)
            self.__state = self.__waiting
        elif ttype not in (tokenize.NEWLINE, tokenize.INDENT,
                           tokenize.COMMENT):
            # there was no class or function docstring
            self.__state = self.__waiting

    def __keywordseen(self, ttype, tstring, lineno):
        if ttype == tokenize.OP and tstring == '(':
            self.__data = []
            self.__lineno = lineno
            self.__state = self.__openseen
        else:
            self.__state = self.__waiting

    def __openseen(self, ttype, tstring, lineno):
        if ttype == tokenize.OP and tstring == ')':
            # We've seen the last of the translatable strings. Record the
            # line number of the first line of the strings and update the list
            # of messages seen. Reset state for the next batch. If there
            # were no strings inside _(), then just ignore this entry.
            if self.__data:
                self.__addentry(EMPTYSTRING.join(self.__data))
            self.__state = self.__waiting
        elif ttype == tokenize.STRING:
            self.__data.append(safe_eval(tstring))
        # TBD: should we warn if we see anything else?

    def __addentry(self, msg, lineno=None, isdocstring=0):
        if lineno is None:
            lineno = self.__lineno
        if not msg in self.__options.toexclude:
            entry = (self.__curfile, lineno)
            self.__messages.setdefault(msg, {})[entry] = isdocstring

    def set_filename(self, filename):
        self.__curfile = filename
        self.__freshmodule = 1

    def write(self, fp):
        options = self.__options
        timestamp = time.ctime(time.time())
        # The time stamp in the header doesn't have the same format as that
        # generated by xgettext...
        print >> fp, pot_header % {'time': timestamp, 'version': __version__}
        # Sort the entries. First sort each particular entry's keys, then
        # sort all the entries by their first item.
        reverse = {}
        for k, v in self.__messages.items():
            keys = v.keys()
            keys.sort()
            reverse.setdefault(tuple(keys), []).append((k, v))
        rkeys = reverse.keys()
        rkeys.sort()
        for rkey in rkeys:
            rentries = reverse[rkey]
            rentries.sort()
            for k, v in rentries:
                isdocstring = 0
                # If the entry was gleaned out of a docstring, then add a
                # comment stating so. This is to aid translators who may wish
                # to skip translating some unimportant docstrings.
                if reduce(operator.__add__, v.values()):
                    isdocstring = 1
                # k is the message string, v is a dictionary-set of (filename,
                # lineno) tuples. We want to sort the entries in v first by
                # file name and then by line number.
                v = v.keys()
                v.sort()
                if not options.writelocations:
                    pass
                # location comments are different b/w Solaris and GNU:
                elif options.locationstyle == options.SOLARIS:
                    for filename, lineno in v:
                        d = {'filename': filename, 'lineno': lineno}
                        print >>fp, _(
                            '# File: %(filename)s, line: %(lineno)d') % d
                elif options.locationstyle == options.GNU:
                    # fit as many locations on one line, as long as the
                    # resulting line length doesn't exceed 'options.width'
                    locline = '#:'
                    for filename, lineno in v:
                        d = {'filename': filename, 'lineno': lineno}
                        s = _(' %(filename)s:%(lineno)d') % d
                        if len(locline) + len(s) <= options.width:
                            locline = locline + s
                        else:
                            print >> fp, locline
                            locline = "#:" + s
                    if len(locline) > 2:
                        print >> fp, locline
                if isdocstring:
                    print >> fp, '#, docstring'
                print >> fp, 'msgid', normalize(k)
                print >> fp, 'msgstr ""\n'



def main():
    global default_keywords
    try:
        # Note: 'no-docstrings=' needs the trailing '=' because -X and
        # --no-docstrings take a filename argument.
        opts, args = getopt.getopt(
            sys.argv[1:],
            'ad:DEhk:Kno:p:S:Vvw:x:X:',
            ['extract-all', 'default-domain=', 'escape', 'help',
             'keyword=', 'no-default-keywords',
             'add-location', 'no-location', 'output=', 'output-dir=',
             'style=', 'verbose', 'version', 'width=', 'exclude-file=',
             'docstrings', 'no-docstrings=',
             ])
    except getopt.error, msg:
        usage(1, msg)

    # for holding option values
    class Options:
        # constants
        GNU = 1
        SOLARIS = 2
        # defaults
        extractall = 0 # FIXME: currently this option has no effect at all.
        escape = 0
        keywords = []
        outpath = ''
        outfile = 'messages.pot'
        writelocations = 1
        locationstyle = GNU
        verbose = 0
        width = 78
        excludefilename = ''
        docstrings = 0
        nodocstrings = {}

    options = Options()
    locations = {'gnu' : options.GNU,
                 'solaris' : options.SOLARIS,
                 }

    # parse options
    for opt, arg in opts:
        if opt in ('-h', '--help'):
            usage(0)
        elif opt in ('-a', '--extract-all'):
            options.extractall = 1
        elif opt in ('-d', '--default-domain'):
            options.outfile = arg + '.pot'
        elif opt in ('-E', '--escape'):
            options.escape = 1
        elif opt in ('-D', '--docstrings'):
            options.docstrings = 1
        elif opt in ('-k', '--keyword'):
            options.keywords.append(arg)
        elif opt in ('-K', '--no-default-keywords'):
            default_keywords = []
        elif opt in ('-n', '--add-location'):
            options.writelocations = 1
        elif opt in ('--no-location',):
            options.writelocations = 0
        elif opt in ('-S', '--style'):
            options.locationstyle = locations.get(arg.lower())
            if options.locationstyle is None:
                usage(1, _('Invalid value for --style: %s') % arg)
        elif opt in ('-o', '--output'):
            options.outfile = arg
        elif opt in ('-p', '--output-dir'):
            options.outpath = arg
        elif opt in ('-v', '--verbose'):
            options.verbose = 1
        elif opt in ('-V', '--version'):
            print _('pygettext.py (xgettext for Python) %s') % __version__
            sys.exit(0)
        elif opt in ('-w', '--width'):
            try:
                options.width = int(arg)
            except ValueError:
                usage(1, _('--width argument must be an integer: %s') % arg)
        elif opt in ('-x', '--exclude-file'):
            options.excludefilename = arg
        elif opt in ('-X', '--no-docstrings'):
            fp = open(arg)
            try:
                while 1:
                    line = fp.readline()
                    if not line:
                        break
                    options.nodocstrings[line[:-1]] = 1
            finally:
                fp.close()

    # calculate escapes: iso-8859 characters pass through unless -E is given
    make_escapes(not options.escape)

    # calculate all keywords
    options.keywords.extend(default_keywords)

    # initialize list of strings to exclude
    if options.excludefilename:
        try:
            fp = open(options.excludefilename)
            options.toexclude = []
            for line in fp.readlines():
                # strip the line terminator so entries can match messages
                if line[-1:] == '\n':
                    line = line[:-1]
                options.toexclude.append(line)
            fp.close()
        except IOError:
            print >> sys.stderr, _(
                "Can't read --exclude-file: %s") % options.excludefilename
            sys.exit(1)
    else:
        options.toexclude = []

    # slurp through all the files
    eater = TokenEater(options)
    for filename in args:
        if filename == '-':
            if options.verbose:
                print _('Reading standard input')
            fp = sys.stdin
            closep = 0
        else:
            if options.verbose:
                print _('Working on %s') % filename
            fp = open(filename)
            closep = 1
        try:
            eater.set_filename(filename)
            try:
                tokenize.tokenize(fp.readline, eater)
            except tokenize.TokenError, e:
                print >> sys.stderr, '%s: %s, line %d, column %d' % (
                    e[0], filename, e[1][0], e[1][1])
        finally:
            if closep:
                fp.close()

    # write the output
    if options.outfile == '-':
        fp = sys.stdout
        closep = 0
    else:
        if options.outpath:
            options.outfile = os.path.join(options.outpath, options.outfile)
        fp = open(options.outfile, 'w')
        closep = 1
    try:
        eater.write(fp)
    finally:
        if closep:
            fp.close()


if __name__ == '__main__':
    main()
    # some more test strings
    _(u'a unicode string')