#! /usr/bin/env python3
# -*- coding: iso-8859-1 -*-
# Originally written by Barry Warsaw <barry@python.org>
#
# Minimally patched to make it even more xgettext compatible
# by Peter Funk <pf@artcom-gmbh.de>
#
# 2002-11-22 Jürgen Hermann <jh@web.de>
# Added checks that _() only contains string literals, and
#   command line args are resolved to module lists, i.e. you
#   can now pass a filename, a module or package name, or a
#   directory (including globbing chars, important for Win32).
# Made docstring fit in 80 chars wide displays using pydoc.
#

# for selftesting
try:
    import fintl
    _ = fintl.gettext
except ImportError:
    _ = lambda s: s

__doc__ = _("""pygettext -- Python equivalent of xgettext(1)

Many systems (Solaris, Linux, GNU) provide extensive tools that ease the
internationalization of C programs. Most of these tools are independent of
the programming language and can be used from within Python programs.
Martin von Loewis' work[1] helps considerably in this regard.

There's one problem though; xgettext is the program that scans source code
looking for message strings, but it groks only C (or C++). Python
introduces a few wrinkles, such as dual quoting characters, triple quoted
strings, and raw strings. xgettext understands none of this.

Enter pygettext, which uses Python's standard tokenize module to scan
Python source code, generating .pot files identical to what GNU xgettext[2]
generates for C and C++ code. From there, the standard GNU tools can be
used.

A word about marking Python strings as candidates for translation. GNU
xgettext recognizes the following keywords: gettext, dgettext, dcgettext,
and gettext_noop. But those can be a lot of text to include all over your
code. C and C++ have a trick: they use the C preprocessor. Most
internationalized C source includes a #define for gettext() to _() so that
what has to be written in the source is much less. Thus these are both
translatable strings:

    gettext("Translatable String")
    _("Translatable String")

Python of course has no preprocessor so this doesn't work so well. Thus,
pygettext searches only for _() by default, but see the -k/--keyword flag
below for how to augment this.

 [1] http://www.python.org/workshops/1997-10/proceedings/loewis.html
 [2] http://www.gnu.org/software/gettext/gettext.html

NOTE: pygettext attempts to be option and feature compatible with GNU
xgettext wherever possible. However some options are still missing or are
not fully implemented. Also, xgettext's use of command line switches with
option arguments is broken, and in these cases, pygettext just defines
additional switches.

Usage: pygettext [options] inputfile ...

Options:

    -a
    --extract-all
        Extract all strings.

    -d name
    --default-domain=name
        Rename the default output file from messages.pot to name.pot.

    -E
    --escape
        Replace non-ASCII characters with octal escape sequences.

    -D
    --docstrings
        Extract module, class, method, and function docstrings. These do
        not need to be wrapped in _() markers, and in fact cannot be for
        Python to consider them docstrings. (See also the -X option).

    -h
    --help
        Print this help message and exit.

    -k word
    --keyword=word
        Keywords to look for in addition to the default set, which are:
        %(DEFAULTKEYWORDS)s

        You can have multiple -k flags on the command line.

    -K
    --no-default-keywords
        Disable the default set of keywords (see above). Any keywords
        explicitly added with the -k/--keyword option are still recognized.

    --no-location
        Do not write filename/lineno location comments.

    -n
    --add-location
        Write filename/lineno location comments indicating where each
        extracted string is found in the source. These lines appear before
        each msgid. The style of comments is controlled by the -S/--style
        option. This is the default.

    -o filename
    --output=filename
        Rename the default output file from messages.pot to filename. If
        filename is `-' then the output is sent to standard out.

    -p dir
    --output-dir=dir
        Output files will be placed in directory dir.

    -S stylename
    --style=stylename
        Specify which style to use for location comments. Two styles are
        supported:

        Solaris  # File: filename, line: line-number
        GNU      #: filename:line

        The style name is case insensitive. GNU style is the default.

    -v
    --verbose
        Print the names of the files being processed.

    -V
    --version
        Print the version of pygettext and exit.

    -w columns
    --width=columns
        Set width of output to columns.

    -x filename
    --exclude-file=filename
        Specify a file that contains a list of strings that are not to be
        extracted from the input files. Each string to be excluded must
        appear on a line by itself in the file.

    -X filename
    --no-docstrings=filename
        Specify a file that contains a list of files (one per line) that
        should not have their docstrings extracted. This is only useful in
        conjunction with the -D option above.

If `inputfile' is -, standard input is read.
""")

import os
import imp
import sys
import glob
import time
import getopt
import token
import tokenize

__version__ = '1.5'

default_keywords = ['_']
DEFAULTKEYWORDS = ', '.join(default_keywords)

EMPTYSTRING = ''


# The normal pot-file header. msgmerge and Emacs's po-mode work better if it's
# there.
pot_header = _('''\
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\\n"
"POT-Creation-Date: %(time)s\\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\\n"
"Language-Team: LANGUAGE <LL@li.org>\\n"
"MIME-Version: 1.0\\n"
"Content-Type: text/plain; charset=%(charset)s\\n"
"Content-Transfer-Encoding: %(encoding)s\\n"
"Generated-By: pygettext.py %(version)s\\n"

''')


def usage(code, msg=''):
    print(__doc__ % globals(), file=sys.stderr)
    if msg:
        print(msg, file=sys.stderr)
    sys.exit(code)


def make_escapes(pass_nonascii):
    global escapes, escape
    if pass_nonascii:
        # Allow non-ascii characters to pass through so that e.g. 'msgid
        # "Höhe"' would not result in 'msgid "H\366he"'. Otherwise we
        # escape any character outside the 32..126 range.
        mod = 128
        escape = escape_ascii
    else:
        mod = 256
        escape = escape_nonascii
    escapes = [r"\%03o" % i for i in range(mod)]
    for i in range(32, 127):
        escapes[i] = chr(i)
    escapes[ord('\\')] = r'\\'
    escapes[ord('\t')] = r'\t'
    escapes[ord('\r')] = r'\r'
    escapes[ord('\n')] = r'\n'
    escapes[ord('\"')] = r'\"'


def escape_ascii(s, encoding):
    return ''.join(escapes[ord(c)] if ord(c) < 128 else c for c in s)

def escape_nonascii(s, encoding):
    return ''.join(escapes[b] for b in s.encode(encoding))


def safe_eval(s):
    # unwrap quotes, safely
    return eval(s, {'__builtins__':{}}, {})


def normalize(s, encoding):
    # This converts the various Python string types into a format that is
    # appropriate for .po files, namely much closer to C style.
    lines = s.split('\n')
    if len(lines) == 1:
        s = '"' + escape(s, encoding) + '"'
    else:
        if not lines[-1]:
            del lines[-1]
            lines[-1] = lines[-1] + '\n'
        for i in range(len(lines)):
            lines[i] = escape(lines[i], encoding)
        lineterm = '\\n"\n"'
        s = '""\n"' + lineterm.join(lines) + '"'
    return s


def containsAny(str, set):
    """Check whether 'str' contains ANY of the chars in 'set'"""
    return any(c in str for c in set)


def _visit_pyfiles(list, dirname, names):
    """Helper for getFilesForName()."""
    # get extension for python source files
    if '_py_ext' not in globals():
        global _py_ext
        _py_ext = [triple[0] for triple in imp.get_suffixes()
                   if triple[2] == imp.PY_SOURCE][0]

    # don't recurse into CVS directories
    if 'CVS' in names:
        names.remove('CVS')

    # add all *.py files to list
    list.extend(
        [os.path.join(dirname, file) for file in names
         if os.path.splitext(file)[1] == _py_ext]
        )


def _get_modpkg_path(dotted_name, pathlist=None):
    """Get the filesystem path for a module or a package.

    Return the file system path to a file for a module, and to a directory for
    a package. Return None if the name is not found, or is a builtin or
    extension module.
    """
    # split off top-most name
    parts = dotted_name.split('.', 1)

    if len(parts) > 1:
        # we have a dotted path, import top-level package
        try:
            file, pathname, description = imp.find_module(parts[0], pathlist)
            if file: file.close()
        except ImportError:
            return None

        # check if it's indeed a package
        if description[2] == imp.PKG_DIRECTORY:
            # recursively handle the remaining name parts
            pathname = _get_modpkg_path(parts[1], [pathname])
        else:
            pathname = None
    else:
        # plain name
        try:
            file, pathname, description = imp.find_module(
                dotted_name, pathlist)
            if file:
                file.close()
            if description[2] not in [imp.PY_SOURCE, imp.PKG_DIRECTORY]:
                pathname = None
        except ImportError:
            pathname = None

    return pathname


def getFilesForName(name):
    """Get a list of module files for a filename, a module or package name,
    or a directory.
    """
    if not os.path.exists(name):
        # check for glob chars
        if containsAny(name, "*?[]"):
            files = glob.glob(name)
            list = []
            for file in files:
                list.extend(getFilesForName(file))
            return list

        # try to find module or package
        name = _get_modpkg_path(name)
        if not name:
            return []

    if os.path.isdir(name):
        # find all python files in directory
        list = []
        # os.walk() is a generator and takes no visitor callback (unlike the
        # old os.path.walk()), so call the helper for each directory.
        for dirname, dirs, names in os.walk(name):
            # don't recurse into CVS directories
            if 'CVS' in dirs:
                dirs.remove('CVS')
            _visit_pyfiles(list, dirname, names)
        return list
    elif os.path.exists(name):
        # a single file
        return [name]

    return []


class TokenEater:
    def __init__(self, options):
        self.__options = options
        self.__messages = {}
        self.__state = self.__waiting
        self.__data = []
        self.__lineno = -1
        self.__freshmodule = 1
        self.__curfile = None

    def __call__(self, ttype, tstring, stup, etup, line):
        # dispatch
##        import token
##        print('ttype:', token.tok_name[ttype],
##              'tstring:', tstring, file=sys.stderr)
        self.__state(ttype, tstring, stup[0])

    def __waiting(self, ttype, tstring, lineno):
        opts = self.__options
        # Do docstring extractions, if enabled
        if opts.docstrings and not opts.nodocstrings.get(self.__curfile):
            # module docstring?
            if self.__freshmodule:
                if ttype == tokenize.STRING:
                    self.__addentry(safe_eval(tstring), lineno, isdocstring=1)
                    self.__freshmodule = 0
                # tokenize.tokenize() emits an ENCODING token first; skip it
                # so that it does not hide the module docstring.
                elif ttype not in (tokenize.ENCODING, tokenize.COMMENT,
                                   tokenize.NL):
                    self.__freshmodule = 0
                return
            # class or function docstring?
            if ttype == tokenize.NAME and tstring in ('class', 'def'):
                self.__state = self.__suiteseen
                return
        if ttype == tokenize.NAME and tstring in opts.keywords:
            self.__state = self.__keywordseen

    def __suiteseen(self, ttype, tstring, lineno):
        # ignore anything until we see the colon
        if ttype == tokenize.OP and tstring == ':':
            self.__state = self.__suitedocstring

    def __suitedocstring(self, ttype, tstring, lineno):
        # ignore any intervening noise
        if ttype == tokenize.STRING:
            self.__addentry(safe_eval(tstring), lineno, isdocstring=1)
            self.__state = self.__waiting
        elif ttype not in (tokenize.NEWLINE, tokenize.INDENT,
                           tokenize.COMMENT):
            # there was no class or function docstring
            self.__state = self.__waiting

    def __keywordseen(self, ttype, tstring, lineno):
        if ttype == tokenize.OP and tstring == '(':
            self.__data = []
            self.__lineno = lineno
            self.__state = self.__openseen
        else:
            self.__state = self.__waiting

    def __openseen(self, ttype, tstring, lineno):
        if ttype == tokenize.OP and tstring == ')':
            # We've seen the last of the translatable strings. Record the
            # line number of the first line of the strings and update the list
            # of messages seen. Reset state for the next batch. If there
            # were no strings inside _(), then just ignore this entry.
            if self.__data:
                self.__addentry(EMPTYSTRING.join(self.__data))
            self.__state = self.__waiting
        elif ttype == tokenize.STRING:
            self.__data.append(safe_eval(tstring))
        elif ttype not in [tokenize.COMMENT, token.INDENT, token.DEDENT,
                           token.NEWLINE, tokenize.NL]:
            # warn if we see anything other than STRING or whitespace
            print(_(
                '*** %(file)s:%(lineno)s: Seen unexpected token "%(token)s"'
                ) % {
                'token': tstring,
                'file': self.__curfile,
                'lineno': self.__lineno
                }, file=sys.stderr)
            self.__state = self.__waiting

    def __addentry(self, msg, lineno=None, isdocstring=0):
        if lineno is None:
            lineno = self.__lineno
        if msg not in self.__options.toexclude:
            entry = (self.__curfile, lineno)
            self.__messages.setdefault(msg, {})[entry] = isdocstring

    def set_filename(self, filename):
        self.__curfile = filename
        self.__freshmodule = 1

    def write(self, fp):
        options = self.__options
        timestamp = time.strftime('%Y-%m-%d %H:%M%z')
        encoding = fp.encoding if fp.encoding else 'UTF-8'
        print(pot_header % {'time': timestamp, 'version': __version__,
                            'charset': encoding,
                            'encoding': '8bit'}, file=fp)
        # Sort the entries. First sort each particular entry's keys, then
        # sort all the entries by their first item.
        reverse = {}
        for k, v in self.__messages.items():
            keys = sorted(v.keys())
            reverse.setdefault(tuple(keys), []).append((k, v))
        rkeys = sorted(reverse.keys())
        for rkey in rkeys:
            rentries = reverse[rkey]
            rentries.sort()
            for k, v in rentries:
                # If the entry was gleaned out of a docstring, then add a
                # comment stating so. This is to aid translators who may wish
                # to skip translating some unimportant docstrings.
                isdocstring = any(v.values())
                # k is the message string, v is a dictionary-set of (filename,
                # lineno) tuples. We want to sort the entries in v first by
                # file name and then by line number.
                v = sorted(v.keys())
                if not options.writelocations:
                    pass
                # location comments are different b/w Solaris and GNU:
                elif options.locationstyle == options.SOLARIS:
                    for filename, lineno in v:
                        d = {'filename': filename, 'lineno': lineno}
                        print(_(
                            '# File: %(filename)s, line: %(lineno)d') % d, file=fp)
                elif options.locationstyle == options.GNU:
                    # fit as many locations on one line, as long as the
                    # resulting line length doesn't exceed 'options.width'
                    locline = '#:'
                    for filename, lineno in v:
                        d = {'filename': filename, 'lineno': lineno}
                        s = _(' %(filename)s:%(lineno)d') % d
                        if len(locline) + len(s) <= options.width:
                            locline = locline + s
                        else:
                            print(locline, file=fp)
                            locline = "#:" + s
                    if len(locline) > 2:
                        print(locline, file=fp)
                if isdocstring:
                    print('#, docstring', file=fp)
                print('msgid', normalize(k, encoding), file=fp)
                print('msgstr ""\n', file=fp)


def main():
    global default_keywords
    try:
        # 'no-docstrings=' needs the trailing '=' because -X/--no-docstrings
        # takes a filename argument.
        opts, args = getopt.getopt(
            sys.argv[1:],
            'ad:DEhk:Kno:p:S:Vvw:x:X:',
            ['extract-all', 'default-domain=', 'escape', 'help',
             'keyword=', 'no-default-keywords',
             'add-location', 'no-location', 'output=', 'output-dir=',
             'style=', 'verbose', 'version', 'width=', 'exclude-file=',
             'docstrings', 'no-docstrings=',
             ])
    except getopt.error as msg:
        usage(1, msg)

    # for holding option values
    class Options:
        # constants
        GNU = 1
        SOLARIS = 2
        # defaults
        extractall = 0 # FIXME: currently this option has no effect at all.
        escape = 0
        keywords = []
        outpath = ''
        outfile = 'messages.pot'
        writelocations = 1
        locationstyle = GNU
        verbose = 0
        width = 78
        excludefilename = ''
        docstrings = 0
        nodocstrings = {}

    options = Options()
    locations = {'gnu' : options.GNU,
                 'solaris' : options.SOLARIS,
                 }

    # parse options
    for opt, arg in opts:
        if opt in ('-h', '--help'):
            usage(0)
        elif opt in ('-a', '--extract-all'):
            options.extractall = 1
        elif opt in ('-d', '--default-domain'):
            options.outfile = arg + '.pot'
        elif opt in ('-E', '--escape'):
            options.escape = 1
        elif opt in ('-D', '--docstrings'):
            options.docstrings = 1
        elif opt in ('-k', '--keyword'):
            options.keywords.append(arg)
        elif opt in ('-K', '--no-default-keywords'):
            default_keywords = []
        elif opt in ('-n', '--add-location'):
            options.writelocations = 1
        elif opt in ('--no-location',):
            options.writelocations = 0
        elif opt in ('-S', '--style'):
            options.locationstyle = locations.get(arg.lower())
            if options.locationstyle is None:
                usage(1, _('Invalid value for --style: %s') % arg)
        elif opt in ('-o', '--output'):
            options.outfile = arg
        elif opt in ('-p', '--output-dir'):
            options.outpath = arg
        elif opt in ('-v', '--verbose'):
            options.verbose = 1
        elif opt in ('-V', '--version'):
            print(_('pygettext.py (xgettext for Python) %s') % __version__)
            sys.exit(0)
        elif opt in ('-w', '--width'):
            try:
                options.width = int(arg)
            except ValueError:
                usage(1, _('--width argument must be an integer: %s') % arg)
        elif opt in ('-x', '--exclude-file'):
            options.excludefilename = arg
        elif opt in ('-X', '--no-docstrings'):
            # one filename per line; strip only the line terminator so that a
            # final line without a newline is not truncated
            with open(arg) as fp:
                for line in fp:
                    options.nodocstrings[line.rstrip('\n')] = 1

    # calculate escapes
    make_escapes(not options.escape)

    # calculate all keywords
    options.keywords.extend(default_keywords)

    # initialize list of strings to exclude
    if options.excludefilename:
        try:
            # strip the line terminators, otherwise no extracted message
            # would ever compare equal to an entry in the exclude list
            with open(options.excludefilename) as fp:
                options.toexclude = [line.rstrip('\n') for line in fp]
        except IOError:
            print(_(
                "Can't read --exclude-file: %s") % options.excludefilename,
                file=sys.stderr)
            sys.exit(1)
    else:
        options.toexclude = []

    # resolve args to module lists
    expanded = []
    for arg in args:
        if arg == '-':
            expanded.append(arg)
        else:
            expanded.extend(getFilesForName(arg))
    args = expanded

    # slurp through all the files
    eater = TokenEater(options)
    for filename in args:
        if filename == '-':
            if options.verbose:
                print(_('Reading standard input'))
            fp = sys.stdin.buffer
            closep = 0
        else:
            if options.verbose:
                print(_('Working on %s') % filename)
            fp = open(filename, 'rb')
            closep = 1
        try:
            eater.set_filename(filename)
            try:
                tokens = tokenize.tokenize(fp.readline)
                for _token in tokens:
                    eater(*_token)
            except tokenize.TokenError as e:
                print('%s: %s, line %d, column %d' % (
                    e.args[0], filename, e.args[1][0], e.args[1][1]),
                    file=sys.stderr)
        finally:
            if closep:
                fp.close()

    # write the output
    if options.outfile == '-':
        fp = sys.stdout
        closep = 0
    else:
        if options.outpath:
            options.outfile = os.path.join(options.outpath, options.outfile)
        fp = open(options.outfile, 'w')
        closep = 1
    try:
        eater.write(fp)
    finally:
        if closep:
            fp.close()


if __name__ == '__main__':
    main()
    # some more test strings
    # this one creates a warning
    _('*** Seen unexpected token "%(token)s"') % {'token': 'test'}
    _('more' 'than' 'one' 'string')
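

# The token-driven scan performed by TokenEater above can be tried in
# isolation. What follows is a minimal sketch, not part of pygettext: the
# function name extract_marked_strings is hypothetical, it handles only plain
# keyword("...") calls (no docstrings, no exclude list), and it assumes
# ast.literal_eval() as a stand-in for safe_eval().

```python
import ast
import io
import tokenize


def extract_marked_strings(source, keyword='_'):
    """Return (lineno, text) pairs for keyword("...") calls in source."""
    found = []
    state = 'waiting'           # waiting -> keyword -> open -> waiting
    parts, start = [], -1
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if state == 'waiting':
            if tok.type == tokenize.NAME and tok.string == keyword:
                state = 'keyword'
        elif state == 'keyword':
            if tok.type == tokenize.OP and tok.string == '(':
                # remember the line of the opening parenthesis
                parts, start = [], tok.start[0]
                state = 'open'
            else:
                state = 'waiting'
        elif state == 'open':
            if tok.type == tokenize.OP and tok.string == ')':
                if parts:
                    # adjacent literals are concatenated, as Python does
                    found.append((start, ''.join(parts)))
                state = 'waiting'
            elif tok.type == tokenize.STRING:
                parts.append(ast.literal_eval(tok.string))
            elif tok.type not in (tokenize.COMMENT, tokenize.NL,
                                  tokenize.NEWLINE, tokenize.INDENT,
                                  tokenize.DEDENT):
                # anything other than string literals: drop this call
                state = 'waiting'
    return found


if __name__ == '__main__':
    sample = "x = _('more' 'than')\nprint(_(name))\n"
    print(extract_marked_strings(sample))
```

# The second call in the sample, _(name), is dropped because its argument is
# not a string literal; this mirrors the "unexpected token" case that
# TokenEater reports as a warning.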