#! /usr/bin/env python3
# -*- coding: iso-8859-1 -*-
# Originally written by Barry Warsaw <barry@python.org>
#
# Minimally patched to make it even more xgettext compatible
# by Peter Funk <pf@artcom-gmbh.de>
#
# 2002-11-22 Jürgen Hermann <jh@web.de>
# Added checks that _() only contains string literals, and
# command line args are resolved to module lists, i.e. you
# can now pass a filename, a module or package name, or a
# directory (including globbing chars, important for Win32).
# Made docstring fit in 80 chars wide displays using pydoc.
#

# for selftesting
try:
    import fintl
    _ = fintl.gettext
except ImportError:
    _ = lambda s: s

__doc__ = _("""pygettext -- Python equivalent of xgettext(1)

Many systems (Solaris, Linux, GNU) provide extensive tools that ease the
internationalization of C programs. Most of these tools are independent of
the programming language and can be used from within Python programs.
Martin von Loewis' work[1] helps considerably in this regard.

There's one problem though; xgettext is the program that scans source code
looking for message strings, but it groks only C (or C++). Python
introduces a few wrinkles, such as dual quoting characters, triple quoted
strings, and raw strings. xgettext understands none of this.

Enter pygettext, which uses Python's standard tokenize module to scan
Python source code, generating .pot files identical to what GNU xgettext[2]
generates for C and C++ code. From there, the standard GNU tools can be
used.

A word about marking Python strings as candidates for translation. GNU
xgettext recognizes the following keywords: gettext, dgettext, dcgettext,
and gettext_noop. But those can be a lot of text to include all over your
code. C and C++ have a trick: they use the C preprocessor. Most
internationalized C source includes a #define for gettext() to _() so that
what has to be written in the source is much less. Thus these are both
translatable strings:

    gettext("Translatable String")
    _("Translatable String")

Python of course has no preprocessor so this doesn't work so well. Thus,
pygettext searches only for _() by default, but see the -k/--keyword flag
below for how to augment this.

 [1] http://www.python.org/workshops/1997-10/proceedings/loewis.html
 [2] http://www.gnu.org/software/gettext/gettext.html

NOTE: pygettext attempts to be option and feature compatible with GNU
xgettext wherever possible. However, some options are still missing or are
not fully implemented. Also, xgettext's use of command line switches with
option arguments is broken, and in these cases, pygettext just defines
additional switches.

Usage: pygettext [options] inputfile ...

Options:

    -a
    --extract-all
        Extract all strings. Note: this option is currently accepted but
        has no effect.

    -d name
    --default-domain=name
        Rename the default output file from messages.pot to name.pot.

    -E
    --escape
        Replace non-ASCII characters with octal escape sequences.

    -D
    --docstrings
        Extract module, class, method, and function docstrings. These do
        not need to be wrapped in _() markers, and in fact cannot be for
        Python to consider them docstrings. (See also the -X option).

    -h
    --help
        Print this help message and exit.

    -k word
    --keyword=word
        Keywords to look for in addition to the default set, which are:
        %(DEFAULTKEYWORDS)s

        You can have multiple -k flags on the command line.

    -K
    --no-default-keywords
        Disable the default set of keywords (see above). Any keywords
        explicitly added with the -k/--keyword option are still recognized.

    --no-location
        Do not write filename/lineno location comments.

    -n
    --add-location
        Write filename/lineno location comments indicating where each
        extracted string is found in the source. These lines appear before
        each msgid. The style of comments is controlled by the -S/--style
        option. This is the default.

    -o filename
    --output=filename
        Rename the default output file from messages.pot to filename. If
        filename is `-' then the output is sent to standard out.

    -p dir
    --output-dir=dir
        Output files will be placed in directory dir.

    -S stylename
    --style=stylename
        Specify which style to use for location comments. Two styles are
        supported:

        Solaris  # File: filename, line: line-number
        GNU      #: filename:line

        The style name is case insensitive. GNU style is the default.

    -v
    --verbose
        Print the names of the files being processed.

    -V
    --version
        Print the version of pygettext and exit.

    -w columns
    --width=columns
        Set width of output to columns.

    -x filename
    --exclude-file=filename
        Specify a file that contains a list of strings that are not to be
        extracted from the input files. Each string to be excluded must
        appear on a line by itself in the file.

    -X filename
    --no-docstrings=filename
        Specify a file that contains a list of files (one per line) that
        should not have their docstrings extracted. This is only useful in
        conjunction with the -D option above.

If `inputfile' is -, standard input is read.
""")

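# Illustrative sketch (not part of the original script): the marking
# convention described in the docstring above. Python has no preprocessor,
# so the short name _ is simply bound to a translation function. The helper
# name and its 'translate' parameter are assumptions made for this example.
import gettext


def _sketch_marked_greeting(name, translate=gettext.gettext):
    # Binding the alias locally keeps the example self-contained; real
    # programs usually bind _ once per module.
    _ = translate
    # pygettext would pick up the literal inside _() as a msgid.
    return _('Hello, %(name)s!') % {'name': name}
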
import os
import importlib.machinery
import importlib.util
import sys
import glob
import time
import getopt
import ast
import token
import tokenize

__version__ = '1.5'

default_keywords = ['_']
DEFAULTKEYWORDS = ', '.join(default_keywords)

EMPTYSTRING = ''



# The normal pot-file header. msgmerge and Emacs's po-mode work better if it's
# there.
pot_header = _('''\
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\\n"
"POT-Creation-Date: %(time)s\\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\\n"
"Language-Team: LANGUAGE <LL@li.org>\\n"
"MIME-Version: 1.0\\n"
"Content-Type: text/plain; charset=%(charset)s\\n"
"Content-Transfer-Encoding: %(encoding)s\\n"
"Generated-By: pygettext.py %(version)s\\n"

''')


def usage(code, msg=''):
    print(__doc__ % globals(), file=sys.stderr)
    if msg:
        print(msg, file=sys.stderr)
    sys.exit(code)



def make_escapes(pass_nonascii):
    global escapes, escape
    if pass_nonascii:
        # Allow non-ascii characters to pass through so that e.g. 'msgid
        # "Höhe"' would not result in 'msgid "H\366he"'. Otherwise we
        # escape any character outside the 32..126 range.
        mod = 128
        escape = escape_ascii
    else:
        mod = 256
        escape = escape_nonascii
    # Start with an octal escape for every code, then let printable ASCII
    # through unchanged and use C-style escapes for the special characters.
    escapes = [r"\%03o" % i for i in range(mod)]
    for i in range(32, 127):
        escapes[i] = chr(i)
    escapes[ord('\\')] = r'\\'
    escapes[ord('\t')] = r'\t'
    escapes[ord('\r')] = r'\r'
    escapes[ord('\n')] = r'\n'
    escapes[ord('\"')] = r'\"'


def escape_ascii(s, encoding):
    return ''.join(escapes[ord(c)] if ord(c) < 128 else c for c in s)

def escape_nonascii(s, encoding):
    return ''.join(escapes[b] for b in s.encode(encoding))


def is_literal_string(s):
    return s[0] in '\'"' or (s[0] in 'rRuU' and s[1] in '\'"')


def safe_eval(s):
    # unwrap quotes, safely
    return eval(s, {'__builtins__': {}}, {})


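# Illustrative sketch (not part of the original script): ast.literal_eval()
# is one stricter way to unwrap a string token, because it refuses anything
# that is not a literal. The helper name is an assumption made for this
# example; the script itself keeps using safe_eval() above.
import ast


def _sketch_unquote(token_text):
    value = ast.literal_eval(token_text)
    # Bytes, numbers and containers are literals too, so check the type.
    if not isinstance(value, str):
        raise ValueError('not a text string literal: %r' % (token_text,))
    return value
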
def normalize(s, encoding):
    # This converts the various Python string types into a format that is
    # appropriate for .po files, namely much closer to C style.
    lines = s.split('\n')
    if len(lines) == 1:
        s = '"' + escape(s, encoding) + '"'
    else:
        if not lines[-1]:
            del lines[-1]
            lines[-1] = lines[-1] + '\n'
        for i in range(len(lines)):
            lines[i] = escape(lines[i], encoding)
        lineterm = '\\n"\n"'
        s = '""\n"' + lineterm.join(lines) + '"'
    return s


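# Illustrative sketch (not part of the original script): how a message ends
# up quoted in a .po file. The helper name and its private escape table are
# assumptions made for this example; it mirrors the idea behind
# make_escapes() and normalize() for the pass-through (non -E) case without
# touching their module-level globals.
def _sketch_po_quote(text):
    # Octal escapes for control characters, printable ASCII passed through,
    # and C-style escapes for the characters .po files treat specially.
    table = [r"\%03o" % i for i in range(128)]
    for i in range(32, 127):
        table[i] = chr(i)
    table[ord('\\')] = r'\\'
    table[ord('\t')] = r'\t'
    table[ord('\r')] = r'\r'
    table[ord('\n')] = r'\n'
    table[ord('"')] = r'\"'

    def esc(chunk):
        return ''.join(table[ord(c)] if ord(c) < 128 else c for c in chunk)

    lines = text.split('\n')
    if len(lines) == 1:
        return '"' + esc(text) + '"'
    # A trailing newline belongs to the last real line, not to an extra
    # empty one.
    if not lines[-1]:
        del lines[-1]
        lines[-1] += '\n'
    # Multi-line messages start with an empty string, followed by one
    # quoted line per source line.
    return '""\n"' + '\\n"\n"'.join(esc(line) for line in lines) + '"'
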
def containsAny(str, set):
    """Check whether 'str' contains ANY of the chars in 'set'"""
    return any(c in str for c in set)


def getFilesForName(name):
    """Get a list of module files for a filename, a module or package name,
    or a directory.
    """
    if not os.path.exists(name):
        # check for glob chars
        if containsAny(name, "*?[]"):
            files = glob.glob(name)
            list = []
            for file in files:
                list.extend(getFilesForName(file))
            return list

        # try to find module or package
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        # find_spec() returns None (rather than raising) for an unknown
        # top-level module, so guard before reading .origin
        name = spec.origin if spec is not None else None
        if not name:
            return []

    if os.path.isdir(name):
        # find all python files in directory
        list = []
        # get extension for python source files
        _py_ext = importlib.machinery.SOURCE_SUFFIXES[0]
        for root, dirs, files in os.walk(name):
            # don't recurse into CVS directories
            if 'CVS' in dirs:
                dirs.remove('CVS')
            # add all *.py files to list
            list.extend(
                [os.path.join(root, file) for file in files
                 if os.path.splitext(file)[1] == _py_ext]
                )
        return list
    elif os.path.exists(name):
        # a single file
        return [name]

    return []


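# Illustrative sketch (not part of the original script): pruning a directory
# walk in place, the same idea getFilesForName() uses to skip CVS
# directories. The helper name and the 'skip' and 'ext' parameters are
# assumptions made for this example.
import os


def _sketch_walk_sources(top, skip=('CVS',), ext='.py'):
    found = []
    for root, dirs, files in os.walk(top):
        # Assigning to dirs[:] edits the very list os.walk() recurses into,
        # so skipped directories are never visited (top-down walks only).
        dirs[:] = [d for d in dirs if d not in skip]
        found.extend(os.path.join(root, name) for name in sorted(files)
                     if os.path.splitext(name)[1] == ext)
    return found
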
class TokenEater:
    def __init__(self, options):
        self.__options = options
        self.__messages = {}
        self.__state = self.__waiting
        self.__data = []
        self.__lineno = -1
        self.__freshmodule = 1
        self.__curfile = None
        self.__enclosurecount = 0

    def __call__(self, ttype, tstring, stup, etup, line):
        # dispatch
##        import token
##        print('ttype:', token.tok_name[ttype], 'tstring:', tstring,
##              file=sys.stderr)
        self.__state(ttype, tstring, stup[0])

    def __waiting(self, ttype, tstring, lineno):
        opts = self.__options
        # Do docstring extractions, if enabled
        if opts.docstrings and not opts.nodocstrings.get(self.__curfile):
            # module docstring?  tokenize.tokenize() emits an ENCODING token
            # first; like comments and blank lines, it must not count as
            # code preceding the docstring.
            if self.__freshmodule:
                if ttype == tokenize.STRING and is_literal_string(tstring):
                    self.__addentry(safe_eval(tstring), lineno, isdocstring=1)
                    self.__freshmodule = 0
                elif ttype not in (tokenize.COMMENT, tokenize.NL,
                                   tokenize.ENCODING):
                    self.__freshmodule = 0
                return
            # class or func/method docstring?
            if ttype == tokenize.NAME and tstring in ('class', 'def'):
                self.__state = self.__suiteseen
                return
        if ttype == tokenize.NAME and tstring in opts.keywords:
            self.__state = self.__keywordseen
            return
        if ttype == tokenize.STRING:
            maybe_fstring = ast.parse(tstring, mode='eval').body
            if not isinstance(maybe_fstring, ast.JoinedStr):
                return
            for value in filter(lambda node: isinstance(node, ast.FormattedValue),
                                maybe_fstring.values):
                for call in filter(lambda node: isinstance(node, ast.Call),
                                   ast.walk(value)):
                    func = call.func
                    if isinstance(func, ast.Name):
                        func_name = func.id
                    elif isinstance(func, ast.Attribute):
                        func_name = func.attr
                    else:
                        continue

                    if func_name not in opts.keywords:
                        continue
                    if len(call.args) != 1:
                        print(_(
                            '*** %(file)s:%(lineno)s: Seen unexpected amount of'
                            ' positional arguments in gettext call: %(source_segment)s'
                            ) % {
                            'source_segment': ast.get_source_segment(tstring, call) or tstring,
                            'file': self.__curfile,
                            'lineno': lineno
                            }, file=sys.stderr)
                        continue
                    if call.keywords:
                        print(_(
                            '*** %(file)s:%(lineno)s: Seen unexpected keyword arguments'
                            ' in gettext call: %(source_segment)s'
                            ) % {
                            'source_segment': ast.get_source_segment(tstring, call) or tstring,
                            'file': self.__curfile,
                            'lineno': lineno
                            }, file=sys.stderr)
                        continue
                    arg = call.args[0]
                    if not isinstance(arg, ast.Constant):
                        print(_(
                            '*** %(file)s:%(lineno)s: Seen unexpected argument type'
                            ' in gettext call: %(source_segment)s'
                            ) % {
                            'source_segment': ast.get_source_segment(tstring, call) or tstring,
                            'file': self.__curfile,
                            'lineno': lineno
                            }, file=sys.stderr)
                        continue
                    if isinstance(arg.value, str):
                        self.__addentry(arg.value, lineno)

    def __suiteseen(self, ttype, tstring, lineno):
        # skip over any enclosure pairs until we see the colon
        if ttype == tokenize.OP:
            if tstring == ':' and self.__enclosurecount == 0:
                # we see a colon and we're not in an enclosure: end of def
                self.__state = self.__suitedocstring
            elif tstring in '([{':
                self.__enclosurecount += 1
            elif tstring in ')]}':
                self.__enclosurecount -= 1

    def __suitedocstring(self, ttype, tstring, lineno):
        # ignore any intervening noise
        if ttype == tokenize.STRING and is_literal_string(tstring):
            self.__addentry(safe_eval(tstring), lineno, isdocstring=1)
            self.__state = self.__waiting
        elif ttype not in (tokenize.NEWLINE, tokenize.INDENT,
                           tokenize.COMMENT):
            # there was no class docstring
            self.__state = self.__waiting

    def __keywordseen(self, ttype, tstring, lineno):
        if ttype == tokenize.OP and tstring == '(':
            self.__data = []
            self.__lineno = lineno
            self.__state = self.__openseen
        else:
            self.__state = self.__waiting

    def __openseen(self, ttype, tstring, lineno):
        if ttype == tokenize.OP and tstring == ')':
            # We've seen the last of the translatable strings. Record the
            # line number of the first line of the strings and update the list
            # of messages seen. Reset state for the next batch. If there
            # were no strings inside _(), then just ignore this entry.
            if self.__data:
                self.__addentry(EMPTYSTRING.join(self.__data))
            self.__state = self.__waiting
        elif ttype == tokenize.STRING and is_literal_string(tstring):
            self.__data.append(safe_eval(tstring))
        elif ttype not in [tokenize.COMMENT, token.INDENT, token.DEDENT,
                           token.NEWLINE, tokenize.NL]:
            # warn if we see anything other than STRING or whitespace
            print(_(
                '*** %(file)s:%(lineno)s: Seen unexpected token "%(token)s"'
                ) % {
                'token': tstring,
                'file': self.__curfile,
                'lineno': self.__lineno
                }, file=sys.stderr)
            self.__state = self.__waiting

    def __addentry(self, msg, lineno=None, isdocstring=0):
        if lineno is None:
            lineno = self.__lineno
        if msg not in self.__options.toexclude:
            entry = (self.__curfile, lineno)
            self.__messages.setdefault(msg, {})[entry] = isdocstring

    def set_filename(self, filename):
        self.__curfile = filename
        self.__freshmodule = 1

    def write(self, fp):
        options = self.__options
        timestamp = time.strftime('%Y-%m-%d %H:%M%z')
        encoding = fp.encoding if fp.encoding else 'UTF-8'
        print(pot_header % {'time': timestamp, 'version': __version__,
                            'charset': encoding,
                            'encoding': '8bit'}, file=fp)
        # Sort the entries. First sort each particular entry's keys, then
        # sort all the entries by their first item.
        reverse = {}
        for k, v in self.__messages.items():
            keys = sorted(v.keys())
            reverse.setdefault(tuple(keys), []).append((k, v))
        rkeys = sorted(reverse.keys())
        for rkey in rkeys:
            rentries = reverse[rkey]
            rentries.sort()
            for k, v in rentries:
                # If the entry was gleaned out of a docstring, then add a
                # comment stating so. This is to aid translators who may wish
                # to skip translating some unimportant docstrings.
                isdocstring = any(v.values())
                # k is the message string, v is a dictionary-set of (filename,
                # lineno) tuples. We want to sort the entries in v first by
                # file name and then by line number.
                v = sorted(v.keys())
                if not options.writelocations:
                    pass
                # location comments differ between Solaris and GNU:
                elif options.locationstyle == options.SOLARIS:
                    for filename, lineno in v:
                        d = {'filename': filename, 'lineno': lineno}
                        print(_(
                            '# File: %(filename)s, line: %(lineno)d') % d, file=fp)
                elif options.locationstyle == options.GNU:
                    # fit as many locations on one line as possible, as long
                    # as the resulting line length doesn't exceed
                    # 'options.width'
                    locline = '#:'
                    for filename, lineno in v:
                        d = {'filename': filename, 'lineno': lineno}
                        s = _(' %(filename)s:%(lineno)d') % d
                        if len(locline) + len(s) <= options.width:
                            locline = locline + s
                        else:
                            print(locline, file=fp)
                            locline = "#:" + s
                    if len(locline) > 2:
                        print(locline, file=fp)
                if isdocstring:
                    print('#, docstring', file=fp)
                print('msgid', normalize(k, encoding), file=fp)
                print('msgstr ""\n', file=fp)



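# Illustrative sketch (not part of the original script): the token-driven
# state machine behind TokenEater, reduced to the keyword -> '(' -> string
# literals -> ')' path and run on source held in memory. The helper name,
# the string state labels and the returned (lineno, text) pairs are
# assumptions made for this example; warnings, docstrings and f-strings are
# left out.
import ast
import io
import tokenize


def _sketch_find_marked_strings(source, keyword='_'):
    found = []
    state = 'waiting'
    parts, start = [], None
    readline = io.BytesIO(source.encode('utf-8')).readline
    for tok in tokenize.tokenize(readline):
        if state == 'waiting':
            if tok.type == tokenize.NAME and tok.string == keyword:
                state = 'keyword'
        elif state == 'keyword':
            if tok.type == tokenize.OP and tok.string == '(':
                # Remember the line of the opening parenthesis.
                state, parts, start = 'open', [], tok.start[0]
            else:
                state = 'waiting'
        else:  # state == 'open'
            text = tok.string
            if tok.type == tokenize.OP and text == ')':
                if parts:
                    # Adjacent literals are concatenated, as Python does.
                    found.append((start, ''.join(parts)))
                state = 'waiting'
            elif tok.type == tokenize.STRING and (
                    text[0] in '\'"' or
                    (text[0] in 'rRuU' and text[1] in '\'"')):
                parts.append(ast.literal_eval(text))
            elif tok.type not in (tokenize.COMMENT, tokenize.NL):
                # Anything else means this is not a plain literal call.
                state = 'waiting'
    return found
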
def main():
    global default_keywords
    try:
        # Long options that take a value end with '='; --no-docstrings
        # takes a filename, matching the short form 'X:'.
        opts, args = getopt.getopt(
            sys.argv[1:],
            'ad:DEhk:Kno:p:S:Vvw:x:X:',
            ['extract-all', 'default-domain=', 'escape', 'help',
             'keyword=', 'no-default-keywords',
             'add-location', 'no-location', 'output=', 'output-dir=',
             'style=', 'verbose', 'version', 'width=', 'exclude-file=',
             'docstrings', 'no-docstrings=',
             ])
    except getopt.error as msg:
        usage(1, msg)

    # for holding option values
    class Options:
        # constants
        GNU = 1
        SOLARIS = 2
        # defaults
        extractall = 0 # FIXME: currently this option has no effect at all.
        escape = 0
        keywords = []
        outpath = ''
        outfile = 'messages.pot'
        writelocations = 1
        locationstyle = GNU
        verbose = 0
        width = 78
        excludefilename = ''
        docstrings = 0
        nodocstrings = {}

    options = Options()
    locations = {'gnu' : options.GNU,
                 'solaris' : options.SOLARIS,
                 }

    # parse options
    for opt, arg in opts:
        if opt in ('-h', '--help'):
            usage(0)
        elif opt in ('-a', '--extract-all'):
            options.extractall = 1
        elif opt in ('-d', '--default-domain'):
            options.outfile = arg + '.pot'
        elif opt in ('-E', '--escape'):
            options.escape = 1
        elif opt in ('-D', '--docstrings'):
            options.docstrings = 1
        elif opt in ('-k', '--keyword'):
            options.keywords.append(arg)
        elif opt in ('-K', '--no-default-keywords'):
            default_keywords = []
        elif opt in ('-n', '--add-location'):
            options.writelocations = 1
        elif opt in ('--no-location',):
            options.writelocations = 0
        elif opt in ('-S', '--style'):
            options.locationstyle = locations.get(arg.lower())
            if options.locationstyle is None:
                usage(1, _('Invalid value for --style: %s') % arg)
        elif opt in ('-o', '--output'):
            options.outfile = arg
        elif opt in ('-p', '--output-dir'):
            options.outpath = arg
        elif opt in ('-v', '--verbose'):
            options.verbose = 1
        elif opt in ('-V', '--version'):
            print(_('pygettext.py (xgettext for Python) %s') % __version__)
            sys.exit(0)
        elif opt in ('-w', '--width'):
            try:
                options.width = int(arg)
            except ValueError:
                usage(1, _('--width argument must be an integer: %s') % arg)
        elif opt in ('-x', '--exclude-file'):
            options.excludefilename = arg
        elif opt in ('-X', '--no-docstrings'):
            # one filename per line; strip only the line terminator so a
            # final line without a trailing newline is kept intact
            with open(arg) as fp:
                for line in fp:
                    options.nodocstrings[line.rstrip('\n')] = 1

    # calculate escapes
    make_escapes(not options.escape)

    # calculate all keywords
    options.keywords.extend(default_keywords)

    # initialize list of strings to exclude
    if options.excludefilename:
        try:
            with open(options.excludefilename) as fp:
                # one string per line; drop the line terminator so entries
                # compare equal to the extracted messages
                options.toexclude = [line.rstrip('\n') for line in fp]
        except IOError:
            print(_(
                "Can't read --exclude-file: %s") % options.excludefilename, file=sys.stderr)
            sys.exit(1)
    else:
        options.toexclude = []

    # resolve args to module lists
    expanded = []
    for arg in args:
        if arg == '-':
            expanded.append(arg)
        else:
            expanded.extend(getFilesForName(arg))
    args = expanded

    # slurp through all the files
    eater = TokenEater(options)
    for filename in args:
        if filename == '-':
            if options.verbose:
                print(_('Reading standard input'))
            fp = sys.stdin.buffer
            closep = 0
        else:
            if options.verbose:
                print(_('Working on %s') % filename)
            fp = open(filename, 'rb')
            closep = 1
        try:
            eater.set_filename(filename)
            try:
                tokens = tokenize.tokenize(fp.readline)
                for _token in tokens:
                    eater(*_token)
            except tokenize.TokenError as e:
                print('%s: %s, line %d, column %d' % (
                    e.args[0], filename, e.args[1][0], e.args[1][1]),
                    file=sys.stderr)
        finally:
            if closep:
                fp.close()

    # write the output
    if options.outfile == '-':
        fp = sys.stdout
        closep = 0
    else:
        if options.outpath:
            options.outfile = os.path.join(options.outpath, options.outfile)
        fp = open(options.outfile, 'w')
        closep = 1
    try:
        eater.write(fp)
    finally:
        if closep:
            fp.close()


if __name__ == '__main__':
    main()
    # some more test strings
    # this one creates a warning
    _('*** Seen unexpected token "%(token)s"') % {'token': 'test'}
    _('more' 'than' 'one' 'string')