#! /usr/bin/env python3
# -*- coding: iso-8859-1 -*-
# Originally written by Barry Warsaw <barry@python.org>
#
# Minimally patched to make it even more xgettext compatible
# by Peter Funk <pf@artcom-gmbh.de>
#
# 2002-11-22 Jürgen Hermann <jh@web.de>
# Added checks that _() only contains string literals, and
# command line args are resolved to module lists, i.e. you
# can now pass a filename, a module or package name, or a
# directory (including globbing chars, important for Win32).
# Made docstring fit in 80 chars wide displays using pydoc.
#

# for selftesting
try:
    import fintl
    _ = fintl.gettext
except ImportError:
    _ = lambda s: s

__doc__ = _("""pygettext -- Python equivalent of xgettext(1)

Many systems (Solaris, Linux, Gnu) provide extensive tools that ease the
internationalization of C programs. Most of these tools are independent of
the programming language and can be used from within Python programs.
Martin von Loewis' work[1] helps considerably in this regard.

There's one problem though; xgettext is the program that scans source code
looking for message strings, but it groks only C (or C++). Python
introduces a few wrinkles, such as dual quoting characters, triple quoted
strings, and raw strings. xgettext understands none of this.

Enter pygettext, which uses Python's standard tokenize module to scan
Python source code, generating .pot files identical to what GNU xgettext[2]
generates for C and C++ code. From there, the standard GNU tools can be
used.

A word about marking Python strings as candidates for translation. GNU
xgettext recognizes the following keywords: gettext, dgettext, dcgettext,
and gettext_noop. But those can be a lot of text to include all over your
code. C and C++ have a trick: they use the C preprocessor. Most
internationalized C source includes a #define for gettext() to _() so that
what has to be written in the source is much less. Thus these are both
translatable strings:

    gettext("Translatable String")
    _("Translatable String")

Python of course has no preprocessor so this doesn't work so well. Thus,
pygettext searches only for _() by default, but see the -k/--keyword flag
below for how to augment this.

 [1] http://www.python.org/workshops/1997-10/proceedings/loewis.html
 [2] http://www.gnu.org/software/gettext/gettext.html

NOTE: pygettext attempts to be option and feature compatible with GNU
xgettext wherever possible. However, some options are still missing or are
not fully implemented. Also, xgettext's use of command line switches with
option arguments is broken, and in these cases, pygettext just defines
additional switches.

Usage: pygettext [options] inputfile ...

Options:

    -a
    --extract-all
        Extract all strings.

    -d name
    --default-domain=name
        Rename the default output file from messages.pot to name.pot.

    -E
    --escape
        Replace non-ASCII characters with octal escape sequences.

    -D
    --docstrings
        Extract module, class, method, and function docstrings. These do
        not need to be wrapped in _() markers, and in fact cannot be for
        Python to consider them docstrings. (See also the -X option).

    -h
    --help
        Print this help message and exit.

    -k word
    --keyword=word
        Keywords to look for in addition to the default set, which are:
        %(DEFAULTKEYWORDS)s

        You can have multiple -k flags on the command line.

    -K
    --no-default-keywords
        Disable the default set of keywords (see above). Any keywords
        explicitly added with the -k/--keyword option are still recognized.

    --no-location
        Do not write filename/lineno location comments.

    -n
    --add-location
        Write filename/lineno location comments indicating where each
        extracted string is found in the source. These lines appear before
        each msgid. The style of comments is controlled by the -S/--style
        option. This is the default.

    -o filename
    --output=filename
        Rename the default output file from messages.pot to filename. If
        filename is `-' then the output is sent to standard out.

    -p dir
    --output-dir=dir
        Output files will be placed in directory dir.

    -S stylename
    --style stylename
        Specify which style to use for location comments. Two styles are
        supported:

        Solaris  # File: filename, line: line-number
        GNU      #: filename:line

        The style name is case insensitive. GNU style is the default.

    -v
    --verbose
        Print the names of the files being processed.

    -V
    --version
        Print the version of pygettext and exit.

    -w columns
    --width=columns
        Set width of output to columns.

    -x filename
    --exclude-file=filename
        Specify a file that contains a list of strings that are not to be
        extracted from the input files. Each string to be excluded must
        appear on a line by itself in the file.

    -X filename
    --no-docstrings=filename
        Specify a file that contains a list of files (one per line) that
        should not have their docstrings extracted. This is only useful in
        conjunction with the -D option above.

If `inputfile' is -, standard input is read.
""")

import os
import importlib.machinery
import importlib.util
import sys
import glob
import time
import getopt
import token
import tokenize

__version__ = '1.5'

default_keywords = ['_']
DEFAULTKEYWORDS = ', '.join(default_keywords)

EMPTYSTRING = ''


# The normal pot-file header. msgmerge and Emacs's po-mode work better if it's
# there.
pot_header = _('''\
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\\n"
"POT-Creation-Date: %(time)s\\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\\n"
"Language-Team: LANGUAGE <LL@li.org>\\n"
"MIME-Version: 1.0\\n"
"Content-Type: text/plain; charset=%(charset)s\\n"
"Content-Transfer-Encoding: %(encoding)s\\n"
"Generated-By: pygettext.py %(version)s\\n"

''')


def usage(code, msg=''):
    print(__doc__ % globals(), file=sys.stderr)
    if msg:
        print(msg, file=sys.stderr)
    sys.exit(code)


def make_escapes(pass_nonascii):
    global escapes, escape
    if pass_nonascii:
        # Allow non-ascii characters to pass through so that e.g. 'msgid
        # "Höhe"' would not result in 'msgid "H\366he"'. Otherwise we
        # escape any character outside the 32..126 range.
        mod = 128
        escape = escape_ascii
    else:
        mod = 256
        escape = escape_nonascii
    escapes = [r"\%03o" % i for i in range(mod)]
    for i in range(32, 127):
        escapes[i] = chr(i)
    escapes[ord('\\')] = r'\\'
    escapes[ord('\t')] = r'\t'
    escapes[ord('\r')] = r'\r'
    escapes[ord('\n')] = r'\n'
    escapes[ord('\"')] = r'\"'


def escape_ascii(s, encoding):
    return ''.join(escapes[ord(c)] if ord(c) < 128 else c for c in s)

def escape_nonascii(s, encoding):
    return ''.join(escapes[b] for b in s.encode(encoding))


def is_literal_string(s):
    return s[0] in '\'"' or (s[0] in 'rRuU' and s[1] in '\'"')


def safe_eval(s):
    # unwrap quotes, safely
    return eval(s, {'__builtins__':{}}, {})


def normalize(s, encoding):
    # This converts the various Python string types into a format that is
    # appropriate for .po files, namely much closer to C style.
    lines = s.split('\n')
    if len(lines) == 1:
        s = '"' + escape(s, encoding) + '"'
    else:
        if not lines[-1]:
            del lines[-1]
            lines[-1] = lines[-1] + '\n'
        for i in range(len(lines)):
            lines[i] = escape(lines[i], encoding)
        lineterm = '\\n"\n"'
        s = '""\n"' + lineterm.join(lines) + '"'
    return s


def containsAny(str, set):
    """Check whether 'str' contains ANY of the chars in 'set'"""
    return 1 in [c in str for c in set]


def getFilesForName(name):
    """Get a list of module files for a filename, a module or package name,
    or a directory.
    """
    if not os.path.exists(name):
        # check for glob chars
        if containsAny(name, "*?[]"):
            files = glob.glob(name)
            list = []
            for file in files:
                list.extend(getFilesForName(file))
            return list

        # try to find module or package
        try:
            spec = importlib.util.find_spec(name)
            name = spec.origin
        except ImportError:
            name = None
        if not name:
            return []

    if os.path.isdir(name):
        # find all python files in directory
        list = []
        # get extension for python source files
        _py_ext = importlib.machinery.SOURCE_SUFFIXES[0]
        for root, dirs, files in os.walk(name):
            # don't recurse into CVS directories
            if 'CVS' in dirs:
                dirs.remove('CVS')
            # add all *.py files to list
            list.extend(
                [os.path.join(root, file) for file in files
                 if os.path.splitext(file)[1] == _py_ext]
                )
        return list
    elif os.path.exists(name):
        # a single file
        return [name]

    return []


class TokenEater:
    def __init__(self, options):
        self.__options = options
        self.__messages = {}
        self.__state = self.__waiting
        self.__data = []
        self.__lineno = -1
        self.__freshmodule = 1
        self.__curfile = None
        self.__enclosurecount = 0

    def __call__(self, ttype, tstring, stup, etup, line):
        # dispatch
##        import token
##        print('ttype:', token.tok_name[ttype], 'tstring:', tstring,
##              file=sys.stderr)
        self.__state(ttype, tstring, stup[0])

    def __waiting(self, ttype, tstring, lineno):
        opts = self.__options
        # Do docstring extractions, if enabled
        if opts.docstrings and not opts.nodocstrings.get(self.__curfile):
            # module docstring?
            if self.__freshmodule:
                if ttype == tokenize.STRING and is_literal_string(tstring):
                    self.__addentry(safe_eval(tstring), lineno, isdocstring=1)
                    self.__freshmodule = 0
                elif ttype not in (tokenize.COMMENT, tokenize.NL):
                    self.__freshmodule = 0
                return
            # class or func/method docstring?
            if ttype == tokenize.NAME and tstring in ('class', 'def'):
                self.__state = self.__suiteseen
                return
        if ttype == tokenize.NAME and tstring in opts.keywords:
            self.__state = self.__keywordseen

    def __suiteseen(self, ttype, tstring, lineno):
        # skip over any enclosure pairs until we see the colon
        if ttype == tokenize.OP:
            if tstring == ':' and self.__enclosurecount == 0:
                # we see a colon and we're not in an enclosure: end of def
                self.__state = self.__suitedocstring
            elif tstring in '([{':
                self.__enclosurecount += 1
            elif tstring in ')]}':
                self.__enclosurecount -= 1

    def __suitedocstring(self, ttype, tstring, lineno):
        # ignore any intervening noise
        if ttype == tokenize.STRING and is_literal_string(tstring):
            self.__addentry(safe_eval(tstring), lineno, isdocstring=1)
            self.__state = self.__waiting
        elif ttype not in (tokenize.NEWLINE, tokenize.INDENT,
                           tokenize.COMMENT):
            # there was no class docstring
            self.__state = self.__waiting

    def __keywordseen(self, ttype, tstring, lineno):
        if ttype == tokenize.OP and tstring == '(':
            self.__data = []
            self.__lineno = lineno
            self.__state = self.__openseen
        else:
            self.__state = self.__waiting

    def __openseen(self, ttype, tstring, lineno):
        if ttype == tokenize.OP and tstring == ')':
            # We've seen the last of the translatable strings. Record the
            # line number of the first line of the strings and update the list
            # of messages seen. Reset state for the next batch. If there
            # were no strings inside _(), then just ignore this entry.
            if self.__data:
                self.__addentry(EMPTYSTRING.join(self.__data))
            self.__state = self.__waiting
        elif ttype == tokenize.STRING and is_literal_string(tstring):
            self.__data.append(safe_eval(tstring))
        elif ttype not in [tokenize.COMMENT, token.INDENT, token.DEDENT,
                           token.NEWLINE, tokenize.NL]:
            # warn if we see anything else than STRING or whitespace
            print(_(
                '*** %(file)s:%(lineno)s: Seen unexpected token "%(token)s"'
                ) % {
                'token': tstring,
                'file': self.__curfile,
                'lineno': self.__lineno
                }, file=sys.stderr)
            self.__state = self.__waiting

    def __addentry(self, msg, lineno=None, isdocstring=0):
        if lineno is None:
            lineno = self.__lineno
        if not msg in self.__options.toexclude:
            entry = (self.__curfile, lineno)
            self.__messages.setdefault(msg, {})[entry] = isdocstring

    def set_filename(self, filename):
        self.__curfile = filename
        self.__freshmodule = 1

    def write(self, fp):
        options = self.__options
        timestamp = time.strftime('%Y-%m-%d %H:%M%z')
        encoding = fp.encoding if fp.encoding else 'UTF-8'
        print(pot_header % {'time': timestamp, 'version': __version__,
                            'charset': encoding,
                            'encoding': '8bit'}, file=fp)
        # Sort the entries. First sort each particular entry's keys, then
        # sort all the entries by their first item.
        reverse = {}
        for k, v in self.__messages.items():
            keys = sorted(v.keys())
            reverse.setdefault(tuple(keys), []).append((k, v))
        rkeys = sorted(reverse.keys())
        for rkey in rkeys:
            rentries = reverse[rkey]
            rentries.sort()
            for k, v in rentries:
                # If the entry was gleaned out of a docstring, then add a
                # comment stating so. This is to aid translators who may wish
                # to skip translating some unimportant docstrings.
                isdocstring = any(v.values())
                # k is the message string, v is a dictionary-set of (filename,
                # lineno) tuples. We want to sort the entries in v first by
                # file name and then by line number.
                v = sorted(v.keys())
                if not options.writelocations:
                    pass
                # location comments are different b/w Solaris and GNU:
                elif options.locationstyle == options.SOLARIS:
                    for filename, lineno in v:
                        d = {'filename': filename, 'lineno': lineno}
                        print(_(
                            '# File: %(filename)s, line: %(lineno)d') % d, file=fp)
                elif options.locationstyle == options.GNU:
                    # fit as many locations on one line, as long as the
                    # resulting line length doesn't exceed 'options.width'
                    locline = '#:'
                    for filename, lineno in v:
                        d = {'filename': filename, 'lineno': lineno}
                        s = _(' %(filename)s:%(lineno)d') % d
                        if len(locline) + len(s) <= options.width:
                            locline = locline + s
                        else:
                            print(locline, file=fp)
                            locline = "#:" + s
                    if len(locline) > 2:
                        print(locline, file=fp)
                if isdocstring:
                    print('#, docstring', file=fp)
                print('msgid', normalize(k, encoding), file=fp)
                print('msgstr ""\n', file=fp)


def main():
    global default_keywords
    try:
        opts, args = getopt.getopt(
            sys.argv[1:],
            'ad:DEhk:Kno:p:S:Vvw:x:X:',
            ['extract-all', 'default-domain=', 'escape', 'help',
             'keyword=', 'no-default-keywords',
             'add-location', 'no-location', 'output=', 'output-dir=',
             'style=', 'verbose', 'version', 'width=', 'exclude-file=',
             # --no-docstrings takes a filename, hence the trailing '='
             'docstrings', 'no-docstrings=',
             ])
    except getopt.error as msg:
        usage(1, msg)

    # for holding option values
    class Options:
        # constants
        GNU = 1
        SOLARIS = 2
        # defaults
        extractall = 0 # FIXME: currently this option has no effect at all.
        escape = 0
        keywords = []
        outpath = ''
        outfile = 'messages.pot'
        writelocations = 1
        locationstyle = GNU
        verbose = 0
        width = 78
        excludefilename = ''
        docstrings = 0
        nodocstrings = {}

    options = Options()
    locations = {'gnu' : options.GNU,
                 'solaris' : options.SOLARIS,
                 }

    # parse options
    for opt, arg in opts:
        if opt in ('-h', '--help'):
            usage(0)
        elif opt in ('-a', '--extract-all'):
            options.extractall = 1
        elif opt in ('-d', '--default-domain'):
            options.outfile = arg + '.pot'
        elif opt in ('-E', '--escape'):
            options.escape = 1
        elif opt in ('-D', '--docstrings'):
            options.docstrings = 1
        elif opt in ('-k', '--keyword'):
            options.keywords.append(arg)
        elif opt in ('-K', '--no-default-keywords'):
            default_keywords = []
        elif opt in ('-n', '--add-location'):
            options.writelocations = 1
        elif opt in ('--no-location',):
            options.writelocations = 0
        elif opt in ('-S', '--style'):
            options.locationstyle = locations.get(arg.lower())
            if options.locationstyle is None:
                usage(1, _('Invalid value for --style: %s') % arg)
        elif opt in ('-o', '--output'):
            options.outfile = arg
        elif opt in ('-p', '--output-dir'):
            options.outpath = arg
        elif opt in ('-v', '--verbose'):
            options.verbose = 1
        elif opt in ('-V', '--version'):
            print(_('pygettext.py (xgettext for Python) %s') % __version__)
            sys.exit(0)
        elif opt in ('-w', '--width'):
            try:
                options.width = int(arg)
            except ValueError:
                usage(1, _('--width argument must be an integer: %s') % arg)
        elif opt in ('-x', '--exclude-file'):
            options.excludefilename = arg
        elif opt in ('-X', '--no-docstrings'):
            fp = open(arg)
            try:
                while 1:
                    line = fp.readline()
                    if not line:
                        break
                    options.nodocstrings[line[:-1]] = 1
            finally:
                fp.close()

    # calculate escapes
    make_escapes(not options.escape)

    # calculate all keywords
    options.keywords.extend(default_keywords)

    # initialize list of strings to exclude
    if options.excludefilename:
        try:
            with open(options.excludefilename) as fp:
                # drop line endings so entries can match extracted strings
                options.toexclude = fp.read().splitlines()
        except IOError:
            print(_(
                "Can't read --exclude-file: %s") % options.excludefilename, file=sys.stderr)
            sys.exit(1)
    else:
        options.toexclude = []

    # resolve args to module lists
    expanded = []
    for arg in args:
        if arg == '-':
            expanded.append(arg)
        else:
            expanded.extend(getFilesForName(arg))
    args = expanded

    # slurp through all the files
    eater = TokenEater(options)
    for filename in args:
        if filename == '-':
            if options.verbose:
                print(_('Reading standard input'))
            fp = sys.stdin.buffer
            closep = 0
        else:
            if options.verbose:
                print(_('Working on %s') % filename)
            fp = open(filename, 'rb')
            closep = 1
        try:
            eater.set_filename(filename)
            try:
                tokens = tokenize.tokenize(fp.readline)
                for _token in tokens:
                    eater(*_token)
            except tokenize.TokenError as e:
                print('%s: %s, line %d, column %d' % (
                    e.args[0], filename, e.args[1][0], e.args[1][1]),
                    file=sys.stderr)
        finally:
            if closep:
                fp.close()

    # write the output
    if options.outfile == '-':
        fp = sys.stdout
        closep = 0
    else:
        if options.outpath:
            options.outfile = os.path.join(options.outpath, options.outfile)
        fp = open(options.outfile, 'w')
        closep = 1
    try:
        eater.write(fp)
    finally:
        if closep:
            fp.close()


if __name__ == '__main__':
    main()
    # some more test strings
    # this one creates a warning
    _('*** Seen unexpected token "%(token)s"') % {'token': 'test'}
    _('more' 'than' 'one' 'string')
Martin v. Löwis0d1fdea2002-11-22 08:36:54 +0000631 _('more' 'than' 'one' 'string')