blob: 67a960f4f8253e8309640e671c78de397728faab [file] [log] [blame]
Benjamin Peterson90f5ba52010-03-11 22:53:45 +00001#! /usr/bin/env python3
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +00002# -*- coding: iso-8859-1 -*-
Barry Warsaw63ce5af2001-07-27 16:47:18 +00003# Originally written by Barry Warsaw <barry@zope.com>
Barry Warsawc8f08922000-02-26 20:56:47 +00004#
Barry Warsawe04ee702003-04-16 18:08:23 +00005# Minimally patched to make it even more xgettext compatible
Barry Warsawc8f08922000-02-26 20:56:47 +00006# by Peter Funk <pf@artcom-gmbh.de>
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +00007#
8# 2002-11-22 Jürgen Hermann <jh@web.de>
9# Added checks that _() only contains string literals, and
10# command line args are resolved to module lists, i.e. you
11# can now pass a filename, a module or package name, or a
12# directory (including globbing chars, important for Win32).
13# Made docstring fit in 80 chars wide displays using pydoc.
14#
Barry Warsawe27db5a1999-08-13 20:59:48 +000015
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +000016# for selftesting
17try:
18 import fintl
19 _ = fintl.gettext
20except ImportError:
21 _ = lambda s: s
22
23__doc__ = _("""pygettext -- Python equivalent of xgettext(1)
Barry Warsawe27db5a1999-08-13 20:59:48 +000024
25Many systems (Solaris, Linux, Gnu) provide extensive tools that ease the
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +000026internationalization of C programs. Most of these tools are independent of
27the programming language and can be used from within Python programs.
Barry Warsawe04ee702003-04-16 18:08:23 +000028Martin von Loewis' work[1] helps considerably in this regard.
Barry Warsawe27db5a1999-08-13 20:59:48 +000029
Barry Warsaw5dbf5261999-11-03 18:47:52 +000030There's one problem though; xgettext is the program that scans source code
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +000031looking for message strings, but it groks only C (or C++). Python
32introduces a few wrinkles, such as dual quoting characters, triple quoted
Barry Warsawe04ee702003-04-16 18:08:23 +000033strings, and raw strings. xgettext understands none of this.
Barry Warsawe27db5a1999-08-13 20:59:48 +000034
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +000035Enter pygettext, which uses Python's standard tokenize module to scan
36Python source code, generating .pot files identical to what GNU xgettext[2]
37generates for C and C++ code. From there, the standard GNU tools can be
Barry Warsawe04ee702003-04-16 18:08:23 +000038used.
Barry Warsawe27db5a1999-08-13 20:59:48 +000039
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +000040A word about marking Python strings as candidates for translation. GNU
41xgettext recognizes the following keywords: gettext, dgettext, dcgettext,
42and gettext_noop. But those can be a lot of text to include all over your
43code. C and C++ have a trick: they use the C preprocessor. Most
44internationalized C source includes a #define for gettext() to _() so that
45what has to be written in the source is much less. Thus these are both
Barry Warsawe04ee702003-04-16 18:08:23 +000046translatable strings:
Barry Warsawe27db5a1999-08-13 20:59:48 +000047
48 gettext("Translatable String")
49 _("Translatable String")
50
51Python of course has no preprocessor so this doesn't work so well. Thus,
52pygettext searches only for _() by default, but see the -k/--keyword flag
53below for how to augment this.
54
55 [1] http://www.python.org/workshops/1997-10/proceedings/loewis.html
56 [2] http://www.gnu.org/software/gettext/gettext.html
57
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +000058NOTE: pygettext attempts to be option and feature compatible with GNU
59xgettext where ever possible. However some options are still missing or are
60not fully implemented. Also, xgettext's use of command line switches with
61option arguments is broken, and in these cases, pygettext just defines
Barry Warsawe04ee702003-04-16 18:08:23 +000062additional switches.
Barry Warsawe27db5a1999-08-13 20:59:48 +000063
Barry Warsawa17e0f12000-03-08 15:18:35 +000064Usage: pygettext [options] inputfile ...
Barry Warsawe27db5a1999-08-13 20:59:48 +000065
66Options:
67
68 -a
69 --extract-all
Barry Warsaw63ce5af2001-07-27 16:47:18 +000070 Extract all strings.
Barry Warsawe27db5a1999-08-13 20:59:48 +000071
Barry Warsawc8f08922000-02-26 20:56:47 +000072 -d name
73 --default-domain=name
Barry Warsaw63ce5af2001-07-27 16:47:18 +000074 Rename the default output file from messages.pot to name.pot.
Barry Warsawc8f08922000-02-26 20:56:47 +000075
76 -E
77 --escape
Barry Warsaw08a8a352000-10-27 04:56:28 +000078 Replace non-ASCII characters with octal escape sequences.
79
80 -D
81 --docstrings
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +000082 Extract module, class, method, and function docstrings. These do
83 not need to be wrapped in _() markers, and in fact cannot be for
84 Python to consider them docstrings. (See also the -X option).
Barry Warsawc8f08922000-02-26 20:56:47 +000085
86 -h
87 --help
Barry Warsaw63ce5af2001-07-27 16:47:18 +000088 Print this help message and exit.
Barry Warsawe27db5a1999-08-13 20:59:48 +000089
Barry Warsawa17e0f12000-03-08 15:18:35 +000090 -k word
91 --keyword=word
92 Keywords to look for in addition to the default set, which are:
93 %(DEFAULTKEYWORDS)s
Barry Warsawe27db5a1999-08-13 20:59:48 +000094
Barry Warsawa17e0f12000-03-08 15:18:35 +000095 You can have multiple -k flags on the command line.
96
97 -K
98 --no-default-keywords
99 Disable the default set of keywords (see above). Any keywords
100 explicitly added with the -k/--keyword option are still recognized.
Barry Warsawe27db5a1999-08-13 20:59:48 +0000101
102 --no-location
Barry Warsawa17e0f12000-03-08 15:18:35 +0000103 Do not write filename/lineno location comments.
Barry Warsawe27db5a1999-08-13 20:59:48 +0000104
Barry Warsawa17e0f12000-03-08 15:18:35 +0000105 -n
106 --add-location
Barry Warsawe27db5a1999-08-13 20:59:48 +0000107 Write filename/lineno location comments indicating where each
108 extracted string is found in the source. These lines appear before
Barry Warsawa17e0f12000-03-08 15:18:35 +0000109 each msgid. The style of comments is controlled by the -S/--style
110 option. This is the default.
111
Barry Warsaw08a8a352000-10-27 04:56:28 +0000112 -o filename
113 --output=filename
114 Rename the default output file from messages.pot to filename. If
115 filename is `-' then the output is sent to standard out.
116
117 -p dir
118 --output-dir=dir
119 Output files will be placed in directory dir.
120
Barry Warsawa17e0f12000-03-08 15:18:35 +0000121 -S stylename
122 --style stylename
123 Specify which style to use for location comments. Two styles are
124 supported:
Barry Warsawe27db5a1999-08-13 20:59:48 +0000125
126 Solaris # File: filename, line: line-number
Barry Warsawa17e0f12000-03-08 15:18:35 +0000127 GNU #: filename:line
Barry Warsawe27db5a1999-08-13 20:59:48 +0000128
Barry Warsawa17e0f12000-03-08 15:18:35 +0000129 The style name is case insensitive. GNU style is the default.
Barry Warsawe27db5a1999-08-13 20:59:48 +0000130
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000131 -v
132 --verbose
133 Print the names of the files being processed.
134
Barry Warsawc8f08922000-02-26 20:56:47 +0000135 -V
136 --version
137 Print the version of pygettext and exit.
138
139 -w columns
140 --width=columns
141 Set width of output to columns.
142
143 -x filename
144 --exclude-file=filename
145 Specify a file that contains a list of strings that are not be
146 extracted from the input files. Each string to be excluded must
147 appear on a line by itself in the file.
Barry Warsawe27db5a1999-08-13 20:59:48 +0000148
Barry Warsaw63ce5af2001-07-27 16:47:18 +0000149 -X filename
150 --no-docstrings=filename
151 Specify a file that contains a list of files (one per line) that
152 should not have their docstrings extracted. This is only useful in
153 conjunction with the -D option above.
Barry Warsawa17e0f12000-03-08 15:18:35 +0000154
Barry Warsaw63ce5af2001-07-27 16:47:18 +0000155If `inputfile' is -, standard input is read.
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +0000156""")
Barry Warsawe27db5a1999-08-13 20:59:48 +0000157
158import os
Barry Warsawe04ee702003-04-16 18:08:23 +0000159import imp
Barry Warsawe27db5a1999-08-13 20:59:48 +0000160import sys
Barry Warsawe04ee702003-04-16 18:08:23 +0000161import glob
Barry Warsawe27db5a1999-08-13 20:59:48 +0000162import time
163import getopt
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +0000164import token
Barry Warsawe27db5a1999-08-13 20:59:48 +0000165import tokenize
Barry Warsaw16b62c12001-05-21 19:51:26 +0000166import operator
Barry Warsawe27db5a1999-08-13 20:59:48 +0000167
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +0000168__version__ = '1.5'
Barry Warsawa17e0f12000-03-08 15:18:35 +0000169
170default_keywords = ['_']
171DEFAULTKEYWORDS = ', '.join(default_keywords)
172
173EMPTYSTRING = ''
Barry Warsawe27db5a1999-08-13 20:59:48 +0000174
175
176
Barry Warsaw63ce5af2001-07-27 16:47:18 +0000177# The normal pot-file header. msgmerge and Emacs's po-mode work better if it's
178# there.
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000179pot_header = _('''\
180# SOME DESCRIPTIVE TITLE.
181# Copyright (C) YEAR ORGANIZATION
182# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
183#
184msgid ""
185msgstr ""
186"Project-Id-Version: PACKAGE VERSION\\n"
Martin v. Löwis0f6b3832001-03-01 22:56:17 +0000187"POT-Creation-Date: %(time)s\\n"
188"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\\n"
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000189"Last-Translator: FULL NAME <EMAIL@ADDRESS>\\n"
190"Language-Team: LANGUAGE <LL@li.org>\\n"
191"MIME-Version: 1.0\\n"
192"Content-Type: text/plain; charset=CHARSET\\n"
193"Content-Transfer-Encoding: ENCODING\\n"
194"Generated-By: pygettext.py %(version)s\\n"
195
196''')
197
198
Barry Warsawe27db5a1999-08-13 20:59:48 +0000199def usage(code, msg=''):
Collin Winter6afaeb72007-08-03 17:06:41 +0000200 print(__doc__ % globals(), file=sys.stderr)
Barry Warsawe27db5a1999-08-13 20:59:48 +0000201 if msg:
Collin Winter6afaeb72007-08-03 17:06:41 +0000202 print(msg, file=sys.stderr)
Barry Warsawe27db5a1999-08-13 20:59:48 +0000203 sys.exit(code)
204
Barry Warsawc8f08922000-02-26 20:56:47 +0000205
Barry Warsawe27db5a1999-08-13 20:59:48 +0000206
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000207escapes = []
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000208
Barry Warsawc8f08922000-02-26 20:56:47 +0000209def make_escapes(pass_iso8859):
210 global escapes
Barry Warsaw7733e122000-02-27 14:30:48 +0000211 if pass_iso8859:
212 # Allow iso-8859 characters to pass through so that e.g. 'msgid
213 # "Höhe"' would result not result in 'msgid "H\366he"'. Otherwise we
214 # escape any character outside the 32..126 range.
215 mod = 128
216 else:
217 mod = 256
Barry Warsawc8f08922000-02-26 20:56:47 +0000218 for i in range(256):
Barry Warsaw7733e122000-02-27 14:30:48 +0000219 if 32 <= (i % mod) <= 126:
Barry Warsawc8f08922000-02-26 20:56:47 +0000220 escapes.append(chr(i))
221 else:
222 escapes.append("\\%03o" % i)
223 escapes[ord('\\')] = '\\\\'
224 escapes[ord('\t')] = '\\t'
225 escapes[ord('\r')] = '\\r'
226 escapes[ord('\n')] = '\\n'
227 escapes[ord('\"')] = '\\"'
228
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000229
230def escape(s):
Barry Warsawc8f08922000-02-26 20:56:47 +0000231 global escapes
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000232 s = list(s)
233 for i in range(len(s)):
234 s[i] = escapes[ord(s[i])]
Barry Warsawa17e0f12000-03-08 15:18:35 +0000235 return EMPTYSTRING.join(s)
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000236
237
238def safe_eval(s):
239 # unwrap quotes, safely
240 return eval(s, {'__builtins__':{}}, {})
241
242
Barry Warsawe27db5a1999-08-13 20:59:48 +0000243def normalize(s):
244 # This converts the various Python string types into a format that is
245 # appropriate for .po files, namely much closer to C style.
Barry Warsawa17e0f12000-03-08 15:18:35 +0000246 lines = s.split('\n')
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000247 if len(lines) == 1:
248 s = '"' + escape(s) + '"'
Barry Warsawe27db5a1999-08-13 20:59:48 +0000249 else:
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000250 if not lines[-1]:
251 del lines[-1]
252 lines[-1] = lines[-1] + '\n'
253 for i in range(len(lines)):
254 lines[i] = escape(lines[i])
Barry Warsawa17e0f12000-03-08 15:18:35 +0000255 lineterm = '\\n"\n"'
256 s = '""\n"' + lineterm.join(lines) + '"'
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000257 return s
Barry Warsawe27db5a1999-08-13 20:59:48 +0000258
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +0000259
260def containsAny(str, set):
Barry Warsawe04ee702003-04-16 18:08:23 +0000261 """Check whether 'str' contains ANY of the chars in 'set'"""
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +0000262 return 1 in [c in str for c in set]
263
264
265def _visit_pyfiles(list, dirname, names):
Barry Warsawe04ee702003-04-16 18:08:23 +0000266 """Helper for getFilesForName()."""
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +0000267 # get extension for python source files
Georg Brandlbf82e372008-05-16 17:02:34 +0000268 if '_py_ext' not in globals():
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +0000269 global _py_ext
Barry Warsawe04ee702003-04-16 18:08:23 +0000270 _py_ext = [triple[0] for triple in imp.get_suffixes()
271 if triple[2] == imp.PY_SOURCE][0]
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +0000272
273 # don't recurse into CVS directories
274 if 'CVS' in names:
275 names.remove('CVS')
276
277 # add all *.py files to list
278 list.extend(
Barry Warsawe04ee702003-04-16 18:08:23 +0000279 [os.path.join(dirname, file) for file in names
280 if os.path.splitext(file)[1] == _py_ext]
281 )
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +0000282
283
284def _get_modpkg_path(dotted_name, pathlist=None):
Barry Warsawe04ee702003-04-16 18:08:23 +0000285 """Get the filesystem path for a module or a package.
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +0000286
Barry Warsawe04ee702003-04-16 18:08:23 +0000287 Return the file system path to a file for a module, and to a directory for
288 a package. Return None if the name is not found, or is a builtin or
289 extension module.
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +0000290 """
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +0000291 # split off top-most name
292 parts = dotted_name.split('.', 1)
293
294 if len(parts) > 1:
295 # we have a dotted path, import top-level package
296 try:
297 file, pathname, description = imp.find_module(parts[0], pathlist)
298 if file: file.close()
299 except ImportError:
300 return None
301
302 # check if it's indeed a package
303 if description[2] == imp.PKG_DIRECTORY:
304 # recursively handle the remaining name parts
305 pathname = _get_modpkg_path(parts[1], [pathname])
306 else:
307 pathname = None
308 else:
309 # plain name
310 try:
Barry Warsawe04ee702003-04-16 18:08:23 +0000311 file, pathname, description = imp.find_module(
312 dotted_name, pathlist)
313 if file:
314 file.close()
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +0000315 if description[2] not in [imp.PY_SOURCE, imp.PKG_DIRECTORY]:
316 pathname = None
317 except ImportError:
318 pathname = None
319
320 return pathname
321
322
323def getFilesForName(name):
Barry Warsawe04ee702003-04-16 18:08:23 +0000324 """Get a list of module files for a filename, a module or package name,
325 or a directory.
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +0000326 """
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +0000327 if not os.path.exists(name):
328 # check for glob chars
329 if containsAny(name, "*?[]"):
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +0000330 files = glob.glob(name)
331 list = []
332 for file in files:
333 list.extend(getFilesForName(file))
334 return list
335
336 # try to find module or package
337 name = _get_modpkg_path(name)
338 if not name:
339 return []
340
341 if os.path.isdir(name):
342 # find all python files in directory
343 list = []
Alexandre Vassalotti4e6531e2008-05-09 20:00:17 +0000344 os.walk(name, _visit_pyfiles, list)
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +0000345 return list
346 elif os.path.exists(name):
347 # a single file
348 return [name]
349
350 return []
Barry Warsawe27db5a1999-08-13 20:59:48 +0000351
352
353class TokenEater:
354 def __init__(self, options):
355 self.__options = options
356 self.__messages = {}
357 self.__state = self.__waiting
358 self.__data = []
359 self.__lineno = -1
Barry Warsaw08a8a352000-10-27 04:56:28 +0000360 self.__freshmodule = 1
Barry Warsaw63ce5af2001-07-27 16:47:18 +0000361 self.__curfile = None
Barry Warsawe27db5a1999-08-13 20:59:48 +0000362
363 def __call__(self, ttype, tstring, stup, etup, line):
364 # dispatch
Barry Warsaw08a8a352000-10-27 04:56:28 +0000365## import token
366## print >> sys.stderr, 'ttype:', token.tok_name[ttype], \
367## 'tstring:', tstring
Barry Warsawe27db5a1999-08-13 20:59:48 +0000368 self.__state(ttype, tstring, stup[0])
369
370 def __waiting(self, ttype, tstring, lineno):
Barry Warsaw63ce5af2001-07-27 16:47:18 +0000371 opts = self.__options
Barry Warsaw08a8a352000-10-27 04:56:28 +0000372 # Do docstring extractions, if enabled
Barry Warsaw63ce5af2001-07-27 16:47:18 +0000373 if opts.docstrings and not opts.nodocstrings.get(self.__curfile):
Barry Warsaw08a8a352000-10-27 04:56:28 +0000374 # module docstring?
375 if self.__freshmodule:
376 if ttype == tokenize.STRING:
Barry Warsaw16b62c12001-05-21 19:51:26 +0000377 self.__addentry(safe_eval(tstring), lineno, isdocstring=1)
Barry Warsaw08a8a352000-10-27 04:56:28 +0000378 self.__freshmodule = 0
379 elif ttype not in (tokenize.COMMENT, tokenize.NL):
380 self.__freshmodule = 0
381 return
382 # class docstring?
383 if ttype == tokenize.NAME and tstring in ('class', 'def'):
384 self.__state = self.__suiteseen
385 return
Barry Warsaw63ce5af2001-07-27 16:47:18 +0000386 if ttype == tokenize.NAME and tstring in opts.keywords:
Barry Warsawe27db5a1999-08-13 20:59:48 +0000387 self.__state = self.__keywordseen
388
Barry Warsaw08a8a352000-10-27 04:56:28 +0000389 def __suiteseen(self, ttype, tstring, lineno):
390 # ignore anything until we see the colon
391 if ttype == tokenize.OP and tstring == ':':
392 self.__state = self.__suitedocstring
393
394 def __suitedocstring(self, ttype, tstring, lineno):
395 # ignore any intervening noise
396 if ttype == tokenize.STRING:
Barry Warsaw16b62c12001-05-21 19:51:26 +0000397 self.__addentry(safe_eval(tstring), lineno, isdocstring=1)
Barry Warsaw08a8a352000-10-27 04:56:28 +0000398 self.__state = self.__waiting
399 elif ttype not in (tokenize.NEWLINE, tokenize.INDENT,
400 tokenize.COMMENT):
401 # there was no class docstring
402 self.__state = self.__waiting
403
Barry Warsawe27db5a1999-08-13 20:59:48 +0000404 def __keywordseen(self, ttype, tstring, lineno):
405 if ttype == tokenize.OP and tstring == '(':
406 self.__data = []
407 self.__lineno = lineno
408 self.__state = self.__openseen
409 else:
410 self.__state = self.__waiting
411
412 def __openseen(self, ttype, tstring, lineno):
413 if ttype == tokenize.OP and tstring == ')':
414 # We've seen the last of the translatable strings. Record the
Barry Warsawe04ee702003-04-16 18:08:23 +0000415 # line number of the first line of the strings and update the list
Barry Warsawe27db5a1999-08-13 20:59:48 +0000416 # of messages seen. Reset state for the next batch. If there
417 # were no strings inside _(), then just ignore this entry.
418 if self.__data:
Barry Warsaw08a8a352000-10-27 04:56:28 +0000419 self.__addentry(EMPTYSTRING.join(self.__data))
Barry Warsawe27db5a1999-08-13 20:59:48 +0000420 self.__state = self.__waiting
421 elif ttype == tokenize.STRING:
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000422 self.__data.append(safe_eval(tstring))
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +0000423 elif ttype not in [tokenize.COMMENT, token.INDENT, token.DEDENT,
424 token.NEWLINE, tokenize.NL]:
425 # warn if we see anything else than STRING or whitespace
Collin Winter6afaeb72007-08-03 17:06:41 +0000426 print(_(
Barry Warsawe04ee702003-04-16 18:08:23 +0000427 '*** %(file)s:%(lineno)s: Seen unexpected token "%(token)s"'
428 ) % {
429 'token': tstring,
430 'file': self.__curfile,
431 'lineno': self.__lineno
Collin Winter6afaeb72007-08-03 17:06:41 +0000432 }, file=sys.stderr)
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +0000433 self.__state = self.__waiting
Barry Warsawe27db5a1999-08-13 20:59:48 +0000434
Barry Warsaw16b62c12001-05-21 19:51:26 +0000435 def __addentry(self, msg, lineno=None, isdocstring=0):
Barry Warsaw08a8a352000-10-27 04:56:28 +0000436 if lineno is None:
437 lineno = self.__lineno
438 if not msg in self.__options.toexclude:
439 entry = (self.__curfile, lineno)
Barry Warsaw16b62c12001-05-21 19:51:26 +0000440 self.__messages.setdefault(msg, {})[entry] = isdocstring
Barry Warsaw08a8a352000-10-27 04:56:28 +0000441
Barry Warsawe27db5a1999-08-13 20:59:48 +0000442 def set_filename(self, filename):
443 self.__curfile = filename
Barry Warsaw63ce5af2001-07-27 16:47:18 +0000444 self.__freshmodule = 1
Barry Warsawe27db5a1999-08-13 20:59:48 +0000445
446 def write(self, fp):
447 options = self.__options
Matthias Klose2443d4a2004-08-16 12:10:12 +0000448 timestamp = time.strftime('%Y-%m-%d %H:%M+%Z')
Barry Warsaw08a8a352000-10-27 04:56:28 +0000449 # The time stamp in the header doesn't have the same format as that
450 # generated by xgettext...
Collin Winter6afaeb72007-08-03 17:06:41 +0000451 print(pot_header % {'time': timestamp, 'version': __version__}, file=fp)
Barry Warsaw128c77d2001-05-23 16:59:45 +0000452 # Sort the entries. First sort each particular entry's keys, then
453 # sort all the entries by their first item.
454 reverse = {}
Fred Drake33e2c3e2000-10-26 03:49:15 +0000455 for k, v in self.__messages.items():
Guido van Rossumf7bd9642008-01-15 17:41:38 +0000456 keys = sorted(v.keys())
Barry Warsaw50cf7062001-05-24 23:06:13 +0000457 reverse.setdefault(tuple(keys), []).append((k, v))
Guido van Rossumf7bd9642008-01-15 17:41:38 +0000458 rkeys = sorted(reverse.keys())
Barry Warsaw128c77d2001-05-23 16:59:45 +0000459 for rkey in rkeys:
Barry Warsaw50cf7062001-05-24 23:06:13 +0000460 rentries = reverse[rkey]
461 rentries.sort()
462 for k, v in rentries:
463 # If the entry was gleaned out of a docstring, then add a
464 # comment stating so. This is to aid translators who may wish
465 # to skip translating some unimportant docstrings.
Guido van Rossum89da5d72006-08-22 00:21:25 +0000466 isdocstring = any(v.values())
Barry Warsaw50cf7062001-05-24 23:06:13 +0000467 # k is the message string, v is a dictionary-set of (filename,
468 # lineno) tuples. We want to sort the entries in v first by
469 # file name and then by line number.
Guido van Rossumf7bd9642008-01-15 17:41:38 +0000470 v = sorted(v.keys())
Barry Warsaw50cf7062001-05-24 23:06:13 +0000471 if not options.writelocations:
472 pass
473 # location comments are different b/w Solaris and GNU:
474 elif options.locationstyle == options.SOLARIS:
475 for filename, lineno in v:
476 d = {'filename': filename, 'lineno': lineno}
Collin Winter6afaeb72007-08-03 17:06:41 +0000477 print(_(
478 '# File: %(filename)s, line: %(lineno)d') % d, file=fp)
Barry Warsaw50cf7062001-05-24 23:06:13 +0000479 elif options.locationstyle == options.GNU:
480 # fit as many locations on one line, as long as the
481 # resulting line length doesn't exceeds 'options.width'
482 locline = '#:'
483 for filename, lineno in v:
484 d = {'filename': filename, 'lineno': lineno}
485 s = _(' %(filename)s:%(lineno)d') % d
486 if len(locline) + len(s) <= options.width:
487 locline = locline + s
488 else:
Collin Winter6afaeb72007-08-03 17:06:41 +0000489 print(locline, file=fp)
Barry Warsaw50cf7062001-05-24 23:06:13 +0000490 locline = "#:" + s
491 if len(locline) > 2:
Collin Winter6afaeb72007-08-03 17:06:41 +0000492 print(locline, file=fp)
Barry Warsaw5c94ce52001-06-20 19:41:40 +0000493 if isdocstring:
Collin Winter6afaeb72007-08-03 17:06:41 +0000494 print('#, docstring', file=fp)
495 print('msgid', normalize(k), file=fp)
496 print('msgstr ""\n', file=fp)
Barry Warsaw08a8a352000-10-27 04:56:28 +0000497
Barry Warsawe27db5a1999-08-13 20:59:48 +0000498
499
500def main():
Barry Warsawa17e0f12000-03-08 15:18:35 +0000501 global default_keywords
Barry Warsawe27db5a1999-08-13 20:59:48 +0000502 try:
503 opts, args = getopt.getopt(
504 sys.argv[1:],
Barry Warsaw63ce5af2001-07-27 16:47:18 +0000505 'ad:DEhk:Kno:p:S:Vvw:x:X:',
Barry Warsaw2b639692001-05-21 19:58:23 +0000506 ['extract-all', 'default-domain=', 'escape', 'help',
Barry Warsawa17e0f12000-03-08 15:18:35 +0000507 'keyword=', 'no-default-keywords',
Barry Warsawc8f08922000-02-26 20:56:47 +0000508 'add-location', 'no-location', 'output=', 'output-dir=',
Barry Warsawa17e0f12000-03-08 15:18:35 +0000509 'style=', 'verbose', 'version', 'width=', 'exclude-file=',
Barry Warsaw63ce5af2001-07-27 16:47:18 +0000510 'docstrings', 'no-docstrings',
Barry Warsawc8f08922000-02-26 20:56:47 +0000511 ])
Guido van Rossumb940e112007-01-10 16:19:56 +0000512 except getopt.error as msg:
Barry Warsawe27db5a1999-08-13 20:59:48 +0000513 usage(1, msg)
514
515 # for holding option values
516 class Options:
517 # constants
518 GNU = 1
519 SOLARIS = 2
520 # defaults
Barry Warsawc8f08922000-02-26 20:56:47 +0000521 extractall = 0 # FIXME: currently this option has no effect at all.
522 escape = 0
Barry Warsawe27db5a1999-08-13 20:59:48 +0000523 keywords = []
Barry Warsawc8f08922000-02-26 20:56:47 +0000524 outpath = ''
Barry Warsawe27db5a1999-08-13 20:59:48 +0000525 outfile = 'messages.pot'
Barry Warsawa17e0f12000-03-08 15:18:35 +0000526 writelocations = 1
527 locationstyle = GNU
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000528 verbose = 0
Barry Warsawc8f08922000-02-26 20:56:47 +0000529 width = 78
530 excludefilename = ''
Barry Warsaw08a8a352000-10-27 04:56:28 +0000531 docstrings = 0
Barry Warsaw63ce5af2001-07-27 16:47:18 +0000532 nodocstrings = {}
Barry Warsawe27db5a1999-08-13 20:59:48 +0000533
534 options = Options()
535 locations = {'gnu' : options.GNU,
536 'solaris' : options.SOLARIS,
537 }
538
539 # parse options
540 for opt, arg in opts:
541 if opt in ('-h', '--help'):
542 usage(0)
Barry Warsawc8f08922000-02-26 20:56:47 +0000543 elif opt in ('-a', '--extract-all'):
544 options.extractall = 1
545 elif opt in ('-d', '--default-domain'):
546 options.outfile = arg + '.pot'
547 elif opt in ('-E', '--escape'):
548 options.escape = 1
Barry Warsaw08a8a352000-10-27 04:56:28 +0000549 elif opt in ('-D', '--docstrings'):
550 options.docstrings = 1
Barry Warsawe27db5a1999-08-13 20:59:48 +0000551 elif opt in ('-k', '--keyword'):
Barry Warsawe27db5a1999-08-13 20:59:48 +0000552 options.keywords.append(arg)
Barry Warsawa17e0f12000-03-08 15:18:35 +0000553 elif opt in ('-K', '--no-default-keywords'):
554 default_keywords = []
Barry Warsawe27db5a1999-08-13 20:59:48 +0000555 elif opt in ('-n', '--add-location'):
Barry Warsawa17e0f12000-03-08 15:18:35 +0000556 options.writelocations = 1
Barry Warsawe27db5a1999-08-13 20:59:48 +0000557 elif opt in ('--no-location',):
Barry Warsawa17e0f12000-03-08 15:18:35 +0000558 options.writelocations = 0
559 elif opt in ('-S', '--style'):
560 options.locationstyle = locations.get(arg.lower())
561 if options.locationstyle is None:
562 usage(1, _('Invalid value for --style: %s') % arg)
Barry Warsawc8f08922000-02-26 20:56:47 +0000563 elif opt in ('-o', '--output'):
564 options.outfile = arg
565 elif opt in ('-p', '--output-dir'):
566 options.outpath = arg
Barry Warsaw5dbf5261999-11-03 18:47:52 +0000567 elif opt in ('-v', '--verbose'):
568 options.verbose = 1
Barry Warsawc8f08922000-02-26 20:56:47 +0000569 elif opt in ('-V', '--version'):
Collin Winter6afaeb72007-08-03 17:06:41 +0000570 print(_('pygettext.py (xgettext for Python) %s') % __version__)
Barry Warsawc8f08922000-02-26 20:56:47 +0000571 sys.exit(0)
572 elif opt in ('-w', '--width'):
573 try:
574 options.width = int(arg)
575 except ValueError:
Barry Warsawa17e0f12000-03-08 15:18:35 +0000576 usage(1, _('--width argument must be an integer: %s') % arg)
Barry Warsawc8f08922000-02-26 20:56:47 +0000577 elif opt in ('-x', '--exclude-file'):
578 options.excludefilename = arg
Barry Warsaw63ce5af2001-07-27 16:47:18 +0000579 elif opt in ('-X', '--no-docstrings'):
580 fp = open(arg)
581 try:
582 while 1:
583 line = fp.readline()
584 if not line:
585 break
586 options.nodocstrings[line[:-1]] = 1
587 finally:
588 fp.close()
Barry Warsawc8f08922000-02-26 20:56:47 +0000589
590 # calculate escapes
Barry Warsaw7733e122000-02-27 14:30:48 +0000591 make_escapes(options.escape)
Barry Warsawe27db5a1999-08-13 20:59:48 +0000592
593 # calculate all keywords
594 options.keywords.extend(default_keywords)
595
Barry Warsawc8f08922000-02-26 20:56:47 +0000596 # initialize list of strings to exclude
597 if options.excludefilename:
598 try:
599 fp = open(options.excludefilename)
600 options.toexclude = fp.readlines()
601 fp.close()
602 except IOError:
Collin Winter6afaeb72007-08-03 17:06:41 +0000603 print(_(
604 "Can't read --exclude-file: %s") % options.excludefilename, file=sys.stderr)
Barry Warsawc8f08922000-02-26 20:56:47 +0000605 sys.exit(1)
606 else:
607 options.toexclude = []
608
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +0000609 # resolve args to module lists
610 expanded = []
611 for arg in args:
612 if arg == '-':
613 expanded.append(arg)
614 else:
615 expanded.extend(getFilesForName(arg))
616 args = expanded
617
Barry Warsawe27db5a1999-08-13 20:59:48 +0000618 # slurp through all the files
619 eater = TokenEater(options)
620 for filename in args:
Barry Warsawa17e0f12000-03-08 15:18:35 +0000621 if filename == '-':
622 if options.verbose:
Collin Winter6afaeb72007-08-03 17:06:41 +0000623 print(_('Reading standard input'))
Barry Warsawa17e0f12000-03-08 15:18:35 +0000624 fp = sys.stdin
625 closep = 0
626 else:
627 if options.verbose:
Collin Winter6afaeb72007-08-03 17:06:41 +0000628 print(_('Working on %s') % filename)
Barry Warsawa17e0f12000-03-08 15:18:35 +0000629 fp = open(filename)
630 closep = 1
631 try:
632 eater.set_filename(filename)
Barry Warsaw75ee8f52001-02-26 04:46:53 +0000633 try:
Trent Nelson428de652008-03-18 22:41:35 +0000634 tokens = tokenize.generate_tokens(fp.readline)
635 for _token in tokens:
636 eater(*_token)
Guido van Rossumb940e112007-01-10 16:19:56 +0000637 except tokenize.TokenError as e:
Collin Winter6afaeb72007-08-03 17:06:41 +0000638 print('%s: %s, line %d, column %d' % (
Georg Brandl6464d472007-10-22 16:16:13 +0000639 e.args[0], filename, e.args[1][0], e.args[1][1]),
640 file=sys.stderr)
Barry Warsawa17e0f12000-03-08 15:18:35 +0000641 finally:
642 if closep:
643 fp.close()
Barry Warsawe27db5a1999-08-13 20:59:48 +0000644
Barry Warsawa17e0f12000-03-08 15:18:35 +0000645 # write the output
646 if options.outfile == '-':
647 fp = sys.stdout
648 closep = 0
649 else:
650 if options.outpath:
651 options.outfile = os.path.join(options.outpath, options.outfile)
652 fp = open(options.outfile, 'w')
653 closep = 1
654 try:
655 eater.write(fp)
656 finally:
657 if closep:
658 fp.close()
Barry Warsawe27db5a1999-08-13 20:59:48 +0000659
660
661if __name__ == '__main__':
662 main()
Barry Warsaw75a6e672000-05-02 19:28:30 +0000663 # some more test strings
Barry Warsawe04ee702003-04-16 18:08:23 +0000664 # this one creates a warning
665 _('*** Seen unexpected token "%(token)s"') % {'token': 'test'}
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +0000666 _('more' 'than' 'one' 'string')