#! /usr/bin/env python
# -*- coding: iso-8859-1 -*-
# Originally written by Barry Warsaw <barry@zope.com>
#
# Minimally patched to make it even more xgettext compatible
# by Peter Funk <pf@artcom-gmbh.de>
#
# 2002-11-22 Jürgen Hermann <jh@web.de>
# Added checks that _() only contains string literals, and
# command line args are resolved to module lists, i.e. you
# can now pass a filename, a module or package name, or a
# directory (including globbing chars, important for Win32).
# Made docstring fit in 80 chars wide displays using pydoc.
#

# for selftesting
try:
    import fintl
    _ = fintl.gettext
except ImportError:
    _ = lambda s: s

__doc__ = _("""pygettext -- Python equivalent of xgettext(1)

Many systems (Solaris, Linux, Gnu) provide extensive tools that ease the
internationalization of C programs. Most of these tools are independent of
the programming language and can be used from within Python programs.
Martin von Loewis' work[1] helps considerably in this regard.

There's one problem though; xgettext is the program that scans source code
looking for message strings, but it groks only C (or C++). Python
introduces a few wrinkles, such as dual quoting characters, triple quoted
strings, and raw strings. xgettext understands none of this.

Enter pygettext, which uses Python's standard tokenize module to scan
Python source code, generating .pot files identical to what GNU xgettext[2]
generates for C and C++ code. From there, the standard GNU tools can be
used.

A word about marking Python strings as candidates for translation. GNU
xgettext recognizes the following keywords: gettext, dgettext, dcgettext,
and gettext_noop. But those can be a lot of text to include all over your
code. C and C++ have a trick: they use the C preprocessor. Most
internationalized C source includes a #define for gettext() to _() so that
what has to be written in the source is much less. Thus these are both
translatable strings:

    gettext("Translatable String")
    _("Translatable String")

Python of course has no preprocessor so this doesn't work so well. Thus,
pygettext searches only for _() by default, but see the -k/--keyword flag
below for how to augment this.

 [1] http://www.python.org/workshops/1997-10/proceedings/loewis.html
 [2] http://www.gnu.org/software/gettext/gettext.html

NOTE: pygettext attempts to be option and feature compatible with GNU
xgettext wherever possible. However, some options are still missing or are
not fully implemented. Also, xgettext's use of command line switches with
option arguments is broken, and in these cases, pygettext just defines
additional switches.

Usage: pygettext [options] inputfile ...

Options:

    -a
    --extract-all
        Extract all strings.

    -d name
    --default-domain=name
        Rename the default output file from messages.pot to name.pot.

    -E
    --escape
        Replace non-ASCII characters with octal escape sequences.

    -D
    --docstrings
        Extract module, class, method, and function docstrings. These do
        not need to be wrapped in _() markers, and in fact cannot be for
        Python to consider them docstrings. (See also the -X option).

    -h
    --help
        Print this help message and exit.

    -k word
    --keyword=word
        Keywords to look for in addition to the default set, which are:
        %(DEFAULTKEYWORDS)s

        You can have multiple -k flags on the command line.

    -K
    --no-default-keywords
        Disable the default set of keywords (see above). Any keywords
        explicitly added with the -k/--keyword option are still recognized.

    --no-location
        Do not write filename/lineno location comments.

    -n
    --add-location
        Write filename/lineno location comments indicating where each
        extracted string is found in the source. These lines appear before
        each msgid. The style of comments is controlled by the -S/--style
        option. This is the default.

    -o filename
    --output=filename
        Rename the default output file from messages.pot to filename. If
        filename is `-' then the output is sent to standard out.

    -p dir
    --output-dir=dir
        Output files will be placed in directory dir.

    -S stylename
    --style stylename
        Specify which style to use for location comments. Two styles are
        supported:

        Solaris # File: filename, line: line-number
        GNU     #: filename:line

        The style name is case insensitive. GNU style is the default.

    -v
    --verbose
        Print the names of the files being processed.

    -V
    --version
        Print the version of pygettext and exit.

    -w columns
    --width=columns
        Set width of output to columns.

    -x filename
    --exclude-file=filename
        Specify a file that contains a list of strings that are not to be
        extracted from the input files. Each string to be excluded must
        appear on a line by itself in the file.

    -X filename
    --no-docstrings=filename
        Specify a file that contains a list of files (one per line) that
        should not have their docstrings extracted. This is only useful in
        conjunction with the -D option above.

If `inputfile' is -, standard input is read.
""")

import os
import sys
import time
import getopt
import token
import tokenize
import operator

__version__ = '1.5'

default_keywords = ['_']
DEFAULTKEYWORDS = ', '.join(default_keywords)

EMPTYSTRING = ''



# The normal pot-file header. msgmerge and Emacs's po-mode work better if it's
# there.
pot_header = _('''\
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\\n"
"POT-Creation-Date: %(time)s\\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\\n"
"Language-Team: LANGUAGE <LL@li.org>\\n"
"MIME-Version: 1.0\\n"
"Content-Type: text/plain; charset=CHARSET\\n"
"Content-Transfer-Encoding: ENCODING\\n"
"Generated-By: pygettext.py %(version)s\\n"

''')


def usage(code, msg=''):
    print >> sys.stderr, __doc__ % globals()
    if msg:
        print >> sys.stderr, msg
    sys.exit(code)



escapes = []

def make_escapes(pass_iso8859):
    global escapes
    if pass_iso8859:
        # Allow iso-8859 characters to pass through so that e.g. 'msgid
        # "Höhe"' would not result in 'msgid "H\366he"'. Otherwise we
        # escape any character outside the 32..126 range.
        mod = 128
    else:
        mod = 256
    for i in range(256):
        if 32 <= (i % mod) <= 126:
            escapes.append(chr(i))
        else:
            escapes.append("\\%03o" % i)
    escapes[ord('\\')] = '\\\\'
    escapes[ord('\t')] = '\\t'
    escapes[ord('\r')] = '\\r'
    escapes[ord('\n')] = '\\n'
    escapes[ord('\"')] = '\\"'


def escape(s):
    global escapes
    s = list(s)
    for i in range(len(s)):
        s[i] = escapes[ord(s[i])]
    return EMPTYSTRING.join(s)


def safe_eval(s):
    # unwrap quotes, safely
    return eval(s, {'__builtins__':{}}, {})


def normalize(s):
    # This converts the various Python string types into a format that is
    # appropriate for .po files, namely much closer to C style.
    lines = s.split('\n')
    if len(lines) == 1:
        s = '"' + escape(s) + '"'
    else:
        if not lines[-1]:
            del lines[-1]
            lines[-1] = lines[-1] + '\n'
        for i in range(len(lines)):
            lines[i] = escape(lines[i])
        lineterm = '\\n"\n"'
        s = '""\n"' + lineterm.join(lines) + '"'
    return s


def containsAny(str, set):
    """ Check whether 'str' contains ANY of the chars in 'set'
    """
    return 1 in [c in str for c in set]


def _visit_pyfiles(list, dirname, names):
    """ Helper for getFilesForName().
    """
    # get extension for python source files
    if not globals().has_key('_py_ext'):
        import imp
        global _py_ext
        _py_ext = [triple[0] for triple in imp.get_suffixes() if triple[2] == imp.PY_SOURCE][0]

    # don't recurse into CVS directories
    if 'CVS' in names:
        names.remove('CVS')

    # add all *.py files to list
    list.extend(
        [os.path.join(dirname, file)
         for file in names
         if os.path.splitext(file)[1] == _py_ext])


def _get_modpkg_path(dotted_name, pathlist=None):
    """ Get the filesystem path for a module or a package.

        Return the file system path to a file for a module,
        and to a directory for a package. Return None if
        the name is not found, or is a builtin or extension module.
    """
    import imp

    # split off top-most name
    parts = dotted_name.split('.', 1)

    if len(parts) > 1:
        # we have a dotted path, import top-level package
        try:
            file, pathname, description = imp.find_module(parts[0], pathlist)
            if file: file.close()
        except ImportError:
            return None

        # check if it's indeed a package
        if description[2] == imp.PKG_DIRECTORY:
            # recursively handle the remaining name parts
            pathname = _get_modpkg_path(parts[1], [pathname])
        else:
            pathname = None
    else:
        # plain name
        try:
            file, pathname, description = imp.find_module(dotted_name, pathlist)
            if file: file.close()
            if description[2] not in [imp.PY_SOURCE, imp.PKG_DIRECTORY]:
                pathname = None
        except ImportError:
            pathname = None

    return pathname


def getFilesForName(name):
    """ Get a list of module files for a filename, a module or package name,
        or a directory.
    """
    import imp

    if not os.path.exists(name):
        # check for glob chars
        if containsAny(name, "*?[]"):
            import glob
            files = glob.glob(name)
            list = []
            for file in files:
                list.extend(getFilesForName(file))
            return list

        # try to find module or package
        name = _get_modpkg_path(name)
        if not name:
            return []

    if os.path.isdir(name):
        # find all python files in directory
        list = []
        os.path.walk(name, _visit_pyfiles, list)
        return list
    elif os.path.exists(name):
        # a single file
        return [name]

    return []


class TokenEater:
    def __init__(self, options):
        self.__options = options
        self.__messages = {}
        self.__state = self.__waiting
        self.__data = []
        self.__lineno = -1
        self.__freshmodule = 1
        self.__curfile = None

    def __call__(self, ttype, tstring, stup, etup, line):
        # dispatch
##        import token
##        print >> sys.stderr, 'ttype:', token.tok_name[ttype], \
##              'tstring:', tstring
        self.__state(ttype, tstring, stup[0])

    def __waiting(self, ttype, tstring, lineno):
        opts = self.__options
        # Do docstring extractions, if enabled
        if opts.docstrings and not opts.nodocstrings.get(self.__curfile):
            # module docstring?
            if self.__freshmodule:
                if ttype == tokenize.STRING:
                    self.__addentry(safe_eval(tstring), lineno, isdocstring=1)
                    self.__freshmodule = 0
                elif ttype not in (tokenize.COMMENT, tokenize.NL):
                    self.__freshmodule = 0
                return
            # class or function docstring?
            if ttype == tokenize.NAME and tstring in ('class', 'def'):
                self.__state = self.__suiteseen
                return
        if ttype == tokenize.NAME and tstring in opts.keywords:
            self.__state = self.__keywordseen

    def __suiteseen(self, ttype, tstring, lineno):
        # ignore anything until we see the colon
        if ttype == tokenize.OP and tstring == ':':
            self.__state = self.__suitedocstring

    def __suitedocstring(self, ttype, tstring, lineno):
        # ignore any intervening noise
        if ttype == tokenize.STRING:
            self.__addentry(safe_eval(tstring), lineno, isdocstring=1)
            self.__state = self.__waiting
        elif ttype not in (tokenize.NEWLINE, tokenize.INDENT,
                           tokenize.COMMENT):
            # there was no class or function docstring
            self.__state = self.__waiting

    def __keywordseen(self, ttype, tstring, lineno):
        if ttype == tokenize.OP and tstring == '(':
            self.__data = []
            self.__lineno = lineno
            self.__state = self.__openseen
        else:
            self.__state = self.__waiting

    def __openseen(self, ttype, tstring, lineno):
        if ttype == tokenize.OP and tstring == ')':
            # We've seen the last of the translatable strings. Record the
            # line number of the first line of the strings and update the list
            # of messages seen. Reset state for the next batch. If there
            # were no strings inside _(), then just ignore this entry.
            if self.__data:
                self.__addentry(EMPTYSTRING.join(self.__data))
            self.__state = self.__waiting
        elif ttype == tokenize.STRING:
            self.__data.append(safe_eval(tstring))
        elif ttype not in [tokenize.COMMENT, token.INDENT, token.DEDENT,
                           token.NEWLINE, tokenize.NL]:
            # warn if we see anything other than STRING or whitespace
            print >>sys.stderr, _('*** %(file)s:%(lineno)s: Seen unexpected token "%(token)s"') % {
                'token': tstring, 'file': self.__curfile, 'lineno': self.__lineno}
            self.__state = self.__waiting

    def __addentry(self, msg, lineno=None, isdocstring=0):
        if lineno is None:
            lineno = self.__lineno
        if msg not in self.__options.toexclude:
            entry = (self.__curfile, lineno)
            self.__messages.setdefault(msg, {})[entry] = isdocstring

    def set_filename(self, filename):
        self.__curfile = filename
        self.__freshmodule = 1

    def write(self, fp):
        options = self.__options
        timestamp = time.ctime(time.time())
        # The time stamp in the header doesn't have the same format as that
        # generated by xgettext...
        print >> fp, pot_header % {'time': timestamp, 'version': __version__}
        # Sort the entries. First sort each particular entry's keys, then
        # sort all the entries by their first item.
        reverse = {}
        for k, v in self.__messages.items():
            keys = v.keys()
            keys.sort()
            reverse.setdefault(tuple(keys), []).append((k, v))
        rkeys = reverse.keys()
        rkeys.sort()
        for rkey in rkeys:
            rentries = reverse[rkey]
            rentries.sort()
            for k, v in rentries:
                isdocstring = 0
                # If the entry was gleaned out of a docstring, then add a
                # comment stating so. This is to aid translators who may wish
                # to skip translating some unimportant docstrings.
                if reduce(operator.__add__, v.values()):
                    isdocstring = 1
                # k is the message string, v is a dictionary-set of (filename,
                # lineno) tuples. We want to sort the entries in v first by
                # file name and then by line number.
                v = v.keys()
                v.sort()
                if not options.writelocations:
                    pass
                # location comments are different b/w Solaris and GNU:
                elif options.locationstyle == options.SOLARIS:
                    for filename, lineno in v:
                        d = {'filename': filename, 'lineno': lineno}
                        print >>fp, _(
                            '# File: %(filename)s, line: %(lineno)d') % d
                elif options.locationstyle == options.GNU:
                    # fit as many locations on one line as possible, as long
                    # as the resulting line length doesn't exceed
                    # 'options.width'
                    locline = '#:'
                    for filename, lineno in v:
                        d = {'filename': filename, 'lineno': lineno}
                        s = _(' %(filename)s:%(lineno)d') % d
                        if len(locline) + len(s) <= options.width:
                            locline = locline + s
                        else:
                            print >> fp, locline
                            locline = "#:" + s
                    if len(locline) > 2:
                        print >> fp, locline
                if isdocstring:
                    print >> fp, '#, docstring'
                print >> fp, 'msgid', normalize(k)
                print >> fp, 'msgstr ""\n'



def main():
    global default_keywords
    try:
        # Note: 'no-docstrings=' takes an argument, matching the short
        # option 'X:' and the documented --no-docstrings=filename form.
        opts, args = getopt.getopt(
            sys.argv[1:],
            'ad:DEhk:Kno:p:S:Vvw:x:X:',
            ['extract-all', 'default-domain=', 'escape', 'help',
             'keyword=', 'no-default-keywords',
             'add-location', 'no-location', 'output=', 'output-dir=',
             'style=', 'verbose', 'version', 'width=', 'exclude-file=',
             'docstrings', 'no-docstrings=',
             ])
    except getopt.error, msg:
        usage(1, msg)

    # for holding option values
    class Options:
        # constants
        GNU = 1
        SOLARIS = 2
        # defaults
        extractall = 0 # FIXME: currently this option has no effect at all.
        escape = 0
        keywords = []
        outpath = ''
        outfile = 'messages.pot'
        writelocations = 1
        locationstyle = GNU
        verbose = 0
        width = 78
        excludefilename = ''
        docstrings = 0
        nodocstrings = {}

    options = Options()
    locations = {'gnu' : options.GNU,
                 'solaris' : options.SOLARIS,
                 }

    # parse options
    for opt, arg in opts:
        if opt in ('-h', '--help'):
            usage(0)
        elif opt in ('-a', '--extract-all'):
            options.extractall = 1
        elif opt in ('-d', '--default-domain'):
            options.outfile = arg + '.pot'
        elif opt in ('-E', '--escape'):
            options.escape = 1
        elif opt in ('-D', '--docstrings'):
            options.docstrings = 1
        elif opt in ('-k', '--keyword'):
            options.keywords.append(arg)
        elif opt in ('-K', '--no-default-keywords'):
            default_keywords = []
        elif opt in ('-n', '--add-location'):
            options.writelocations = 1
        elif opt in ('--no-location',):
            options.writelocations = 0
        elif opt in ('-S', '--style'):
            options.locationstyle = locations.get(arg.lower())
            if options.locationstyle is None:
                usage(1, _('Invalid value for --style: %s') % arg)
        elif opt in ('-o', '--output'):
            options.outfile = arg
        elif opt in ('-p', '--output-dir'):
            options.outpath = arg
        elif opt in ('-v', '--verbose'):
            options.verbose = 1
        elif opt in ('-V', '--version'):
            print _('pygettext.py (xgettext for Python) %s') % __version__
            sys.exit(0)
        elif opt in ('-w', '--width'):
            try:
                options.width = int(arg)
            except ValueError:
                usage(1, _('--width argument must be an integer: %s') % arg)
        elif opt in ('-x', '--exclude-file'):
            options.excludefilename = arg
        elif opt in ('-X', '--no-docstrings'):
            fp = open(arg)
            try:
                while 1:
                    line = fp.readline()
                    if not line:
                        break
                    options.nodocstrings[line[:-1]] = 1
            finally:
                fp.close()

    # calculate escapes: with -E/--escape, non-ASCII characters must be
    # escaped, so iso-8859 characters pass through only when it is not set
    make_escapes(not options.escape)

    # calculate all keywords
    options.keywords.extend(default_keywords)

    # initialize list of strings to exclude
    if options.excludefilename:
        try:
            fp = open(options.excludefilename)
            options.toexclude = fp.readlines()
            fp.close()
        except IOError:
            print >> sys.stderr, _(
                "Can't read --exclude-file: %s") % options.excludefilename
            sys.exit(1)
    else:
        options.toexclude = []

    # resolve args to module lists
    expanded = []
    for arg in args:
        if arg == '-':
            expanded.append(arg)
        else:
            expanded.extend(getFilesForName(arg))
    args = expanded

    # slurp through all the files
    eater = TokenEater(options)
    for filename in args:
        if filename == '-':
            if options.verbose:
                print _('Reading standard input')
            fp = sys.stdin
            closep = 0
        else:
            if options.verbose:
                print _('Working on %s') % filename
            fp = open(filename)
            closep = 1
        try:
            eater.set_filename(filename)
            try:
                tokenize.tokenize(fp.readline, eater)
            except tokenize.TokenError, e:
                print >> sys.stderr, '%s: %s, line %d, column %d' % (
                    e[0], filename, e[1][0], e[1][1])
        finally:
            if closep:
                fp.close()

    # write the output
    if options.outfile == '-':
        fp = sys.stdout
        closep = 0
    else:
        if options.outpath:
            options.outfile = os.path.join(options.outpath, options.outfile)
        fp = open(options.outfile, 'w')
        closep = 1
    try:
        eater.write(fp)
    finally:
        if closep:
            fp.close()


if __name__ == '__main__':
    main()
    # some more test strings
    _(u'a unicode string')
    _('*** Seen unexpected token "%(token)s"' % {'token': 'test'}) # this one creates a warning
    _('more' 'than' 'one' 'string')
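The keyword scanning done by TokenEater (waiting, keyword seen, open parenthesis seen) can be tried in isolation. The following is only a minimal sketch, not part of pygettext: it assumes Python 3, the function name extract_messages is hypothetical, ast.literal_eval stands in for safe_eval, and a call with a non-literal argument is skipped silently rather than reported on stderr.

```python
import ast
import io
import tokenize


def extract_messages(source, keywords=("_",)):
    """Return (lineno, message) pairs for keyword("...") calls in source.

    Hypothetical helper for illustration only; it mirrors the
    waiting -> keyword seen -> open seen states of TokenEater.
    """
    messages = []
    state = "waiting"
    parts, lineno = [], -1
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if state == "waiting":
            if tok.type == tokenize.NAME and tok.string in keywords:
                state = "keyword"
        elif state == "keyword":
            if tok.type == tokenize.OP and tok.string == "(":
                # remember where the call starts and collect its literals
                parts, lineno = [], tok.start[0]
                state = "open"
            else:
                state = "waiting"
        elif state == "open":
            if tok.type == tokenize.OP and tok.string == ")":
                # adjacent string literals are joined into one message
                if parts:
                    messages.append((lineno, "".join(parts)))
                state = "waiting"
            elif tok.type == tokenize.STRING:
                value = ast.literal_eval(tok.string)
                if isinstance(value, str):
                    parts.append(value)
            elif tok.type not in (tokenize.COMMENT, tokenize.NL,
                                  tokenize.NEWLINE, tokenize.INDENT,
                                  tokenize.DEDENT):
                # anything other than a string literal: drop this call
                state = "waiting"
    return messages


sample = 'print(_("hello"))\nx = _("more" "than" "one")\ny = _(name)\n'
print(extract_messages(sample))  # [(1, 'hello'), (2, 'morethanone')]
```

The third call in the sample, _(name), is dropped because its argument is not a string literal, which corresponds to the case where pygettext prints its "Seen unexpected token" warning.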