#! /usr/bin/env python
# Originally written by Barry Warsaw <barry@digicool.com>
#
# Minimally patched to make it even more xgettext compatible
# by Peter Funk <pf@artcom-gmbh.de>

"""pygettext -- Python equivalent of xgettext(1)

Many systems (Solaris, Linux, GNU) provide extensive tools that ease the
internationalization of C programs. Most of these tools are independent of
the programming language and can be used from within Python programs. Martin
von Loewis' work[1] helps considerably in this regard.

There's one problem though; xgettext is the program that scans source code
looking for message strings, but it groks only C (or C++). Python introduces
a few wrinkles, such as dual quoting characters, triple quoted strings, and
raw strings. xgettext understands none of this.

Enter pygettext, which uses Python's standard tokenize module to scan Python
source code, generating .pot files identical to what GNU xgettext[2] generates
for C and C++ code. From there, the standard GNU tools can be used.

A word about marking Python strings as candidates for translation. GNU
xgettext recognizes the following keywords: gettext, dgettext, dcgettext, and
gettext_noop. But those can be a lot of text to include all over your code.
C and C++ have a trick: they use the C preprocessor. Most internationalized C
source includes a #define for gettext() to _() so that what has to be written
in the source is much less. Thus these are both translatable strings:

    gettext("Translatable String")
    _("Translatable String")

Python of course has no preprocessor so this doesn't work so well. Thus,
pygettext searches only for _() by default, but see the -k/--keyword flag
below for how to augment this.

 [1] http://www.python.org/workshops/1997-10/proceedings/loewis.html
 [2] http://www.gnu.org/software/gettext/gettext.html

NOTE: pygettext attempts to be option and feature compatible with GNU xgettext
wherever possible. However, some options are still missing or are not fully
implemented. Also, xgettext's use of command line switches with option
arguments is broken, and in these cases, pygettext just defines additional
switches.

Usage: pygettext [options] inputfile ...

Options:

    -a
    --extract-all
        Extract all strings. (Currently this option has no effect.)

    -d name
    --default-domain=name
        Rename the default output file from messages.pot to name.pot.

    -E
    --escape
        Replace non-ASCII characters with octal escape sequences.

    -D
    --docstrings
        Extract module, class, method, and function docstrings. These do not
        need to be wrapped in _() markers, and in fact cannot be for Python to
        consider them docstrings.

    -h
    --help
        Print this help message and exit.

    -k word
    --keyword=word
        Keywords to look for in addition to the default set, which are:
        %(DEFAULTKEYWORDS)s

        You can have multiple -k flags on the command line.

    -K
    --no-default-keywords
        Disable the default set of keywords (see above). Any keywords
        explicitly added with the -k/--keyword option are still recognized.

    --no-location
        Do not write filename/lineno location comments.

    -n
    --add-location
        Write filename/lineno location comments indicating where each
        extracted string is found in the source. These lines appear before
        each msgid. The style of comments is controlled by the -S/--style
        option. This is the default.

    -o filename
    --output=filename
        Rename the default output file from messages.pot to filename. If
        filename is `-' then the output is sent to standard output.

    -p dir
    --output-dir=dir
        Output files will be placed in directory dir.

    -S stylename
    --style=stylename
        Specify which style to use for location comments. Two styles are
        supported:

        Solaris  # File: filename, line: line-number
        GNU      #: filename:line

        The style name is case insensitive. GNU style is the default.

    -v
    --verbose
        Print the names of the files being processed.

    -V
    --version
        Print the version of pygettext and exit.

    -w columns
    --width=columns
        Set width of output to columns.

    -x filename
    --exclude-file=filename
        Specify a file that contains a list of strings that are not to be
        extracted from the input files. Each string to be excluded must
        appear on a line by itself in the file.

If `inputfile' is -, standard input is read.

"""

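As a hedged illustration of the _() convention described in the module docstring, the sketch below binds _ to the standard library's gettext.gettext. It is written for modern Python 3 and is not part of pygettext; the variable names are illustrative only. It assumes no message catalog is installed for the default domain, in which case gettext.gettext returns its argument unchanged, so both spellings yield the same string.

```python
import gettext

# Bind the conventional short alias; pygettext looks for calls to _() by default.
_ = gettext.gettext

# Both calls mark the same literal as translatable.
long_form = gettext.gettext("Translatable String")
short_form = _("Translatable String")

print(long_form == short_form)
```

Only the short form would be picked up by pygettext unless gettext is added with the -k/--keyword flag.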
import os
import sys
import time
import getopt
import tokenize

# for selftesting
try:
    import fintl
    _ = fintl.gettext
except ImportError:
    def _(s): return s

__version__ = '1.3'

default_keywords = ['_']
DEFAULTKEYWORDS = ', '.join(default_keywords)

EMPTYSTRING = ''



# The normal pot-file header. msgmerge and EMACS' po-mode work better if
# it's there.
pot_header = _('''\
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\\n"
"POT-Creation-Date: %(time)s\\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\\n"
"Language-Team: LANGUAGE <LL@li.org>\\n"
"MIME-Version: 1.0\\n"
"Content-Type: text/plain; charset=CHARSET\\n"
"Content-Transfer-Encoding: ENCODING\\n"
"Generated-By: pygettext.py %(version)s\\n"

''')


def usage(code, msg=''):
    print >> sys.stderr, _(__doc__) % globals()
    if msg:
        print >> sys.stderr, msg
    sys.exit(code)



escapes = []

def make_escapes(pass_iso8859):
    global escapes
    if pass_iso8859:
        # Allow iso-8859 characters to pass through so that e.g. 'msgid
        # "Höhe"' would not result in 'msgid "H\366he"'. Otherwise we
        # escape any character outside the 32..126 range.
        mod = 128
    else:
        mod = 256
    for i in range(256):
        if 32 <= (i % mod) <= 126:
            escapes.append(chr(i))
        else:
            escapes.append("\\%03o" % i)
    escapes[ord('\\')] = '\\\\'
    escapes[ord('\t')] = '\\t'
    escapes[ord('\r')] = '\\r'
    escapes[ord('\n')] = '\\n'
    escapes[ord('\"')] = '\\"'


def escape(s):
    global escapes
    s = list(s)
    for i in range(len(s)):
        s[i] = escapes[ord(s[i])]
    return EMPTYSTRING.join(s)


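The effect of the escape table is easiest to see on a small input. The following is a minimal sketch in modern Python 3, not the script's own code: it returns the table instead of filling a module-level list, takes the table as an argument, and passes characters above code 255 through unchanged. Those three choices, and the name po_escape, are assumptions of the sketch.

```python
def make_escapes(pass_iso8859):
    """Return a 256-entry list mapping a character code to its .po-file form."""
    # With mod == 128 the codes 160..254 fold onto the printable ASCII range
    # and therefore pass through; with mod == 256 they are octal-escaped.
    mod = 128 if pass_iso8859 else 256
    table = []
    for i in range(256):
        if 32 <= (i % mod) <= 126:
            table.append(chr(i))
        else:
            table.append("\\%03o" % i)
    # A few characters get C-style escapes instead of octal ones.
    table[ord("\\")] = "\\\\"
    table[ord("\t")] = "\\t"
    table[ord("\r")] = "\\r"
    table[ord("\n")] = "\\n"
    table[ord('"')] = '\\"'
    return table


def po_escape(s, table):
    """Escape s character by character; codes above 255 are left as-is."""
    return "".join(table[ord(c)] if ord(c) < 256 else c for c in s)


strict = make_escapes(False)
lenient = make_escapes(True)
print(po_escape("H\xf6he", strict))    # octal escape for the o-umlaut
print(po_escape("H\xf6he", lenient))   # o-umlaut passes through
print(po_escape('say "hi"\t', strict))
```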
def safe_eval(s):
    # unwrap quotes, safely
    return eval(s, {'__builtins__':{}}, {})


def normalize(s):
    # This converts the various Python string types into a format that is
    # appropriate for .po files, namely much closer to C style.
    lines = s.split('\n')
    if len(lines) == 1:
        s = '"' + escape(s) + '"'
    else:
        if not lines[-1]:
            del lines[-1]
            lines[-1] = lines[-1] + '\n'
        for i in range(len(lines)):
            lines[i] = escape(lines[i])
        lineterm = '\\n"\n"'
        s = '""\n"' + lineterm.join(lines) + '"'
    return s



class TokenEater:
    def __init__(self, options):
        self.__options = options
        self.__messages = {}
        self.__state = self.__waiting
        self.__data = []
        self.__lineno = -1
        self.__freshmodule = 1

    def __call__(self, ttype, tstring, stup, etup, line):
        # dispatch
##        import token
##        print >> sys.stderr, 'ttype:', token.tok_name[ttype], \
##              'tstring:', tstring
        self.__state(ttype, tstring, stup[0])

    def __waiting(self, ttype, tstring, lineno):
        # Do docstring extractions, if enabled
        if self.__options.docstrings:
            # module docstring?
            if self.__freshmodule:
                if ttype == tokenize.STRING:
                    self.__addentry(safe_eval(tstring), lineno)
                    self.__freshmodule = 0
                elif ttype not in (tokenize.COMMENT, tokenize.NL):
                    self.__freshmodule = 0
                return
            # class or function docstring?
            if ttype == tokenize.NAME and tstring in ('class', 'def'):
                self.__state = self.__suiteseen
                return
        if ttype == tokenize.NAME and tstring in self.__options.keywords:
            self.__state = self.__keywordseen

    def __suiteseen(self, ttype, tstring, lineno):
        # ignore anything until we see the colon
        if ttype == tokenize.OP and tstring == ':':
            self.__state = self.__suitedocstring

    def __suitedocstring(self, ttype, tstring, lineno):
        # ignore any intervening noise
        if ttype == tokenize.STRING:
            self.__addentry(safe_eval(tstring), lineno)
            self.__state = self.__waiting
        elif ttype not in (tokenize.NEWLINE, tokenize.INDENT,
                           tokenize.COMMENT):
            # there was no class or function docstring
            self.__state = self.__waiting

    def __keywordseen(self, ttype, tstring, lineno):
        if ttype == tokenize.OP and tstring == '(':
            self.__data = []
            self.__lineno = lineno
            self.__state = self.__openseen
        else:
            self.__state = self.__waiting

    def __openseen(self, ttype, tstring, lineno):
        if ttype == tokenize.OP and tstring == ')':
            # We've seen the last of the translatable strings. Record the
            # line number of the first line of the strings and update the list
            # of messages seen. Reset state for the next batch. If there
            # were no strings inside _(), then just ignore this entry.
            if self.__data:
                self.__addentry(EMPTYSTRING.join(self.__data))
            self.__state = self.__waiting
        elif ttype == tokenize.STRING:
            self.__data.append(safe_eval(tstring))
        # TBD: should we warn if we see anything else?

    def __addentry(self, msg, lineno=None):
        if lineno is None:
            lineno = self.__lineno
        if not msg in self.__options.toexclude:
            entry = (self.__curfile, lineno)
            self.__messages.setdefault(msg, {})[entry] = 1

    def set_filename(self, filename):
        self.__curfile = filename

    def write(self, fp):
        options = self.__options
        timestamp = time.ctime(time.time())
        # The time stamp in the header doesn't have the same format as that
        # generated by xgettext...
        print >> fp, pot_header % {'time': timestamp, 'version': __version__}
        for k, v in self.__messages.items():
            # k is the message string, v is a dictionary-set of (filename,
            # lineno) tuples. We want to sort the entries in v first by file
            # name and then by line number.
            v = v.keys()
            v.sort()
            if not options.writelocations:
                pass
            # location comments are different b/w Solaris and GNU:
            elif options.locationstyle == options.SOLARIS:
                for filename, lineno in v:
                    d = {'filename': filename, 'lineno': lineno}
                    print >>fp, _('# File: %(filename)s, line: %(lineno)d') % d
            elif options.locationstyle == options.GNU:
                # fit as many locations on one line as possible, as long as
                # the resulting line length doesn't exceed 'options.width'
                locline = '#:'
                for filename, lineno in v:
                    d = {'filename': filename, 'lineno': lineno}
                    s = _(' %(filename)s:%(lineno)d') % d
                    if len(locline) + len(s) <= options.width:
                        locline = locline + s
                    else:
                        print >> fp, locline
                        locline = "#:" + s
                if len(locline) > 2:
                    print >> fp, locline
            # TBD: sorting, normalizing
            print >> fp, 'msgid', normalize(k)
            print >> fp, 'msgstr ""\n'



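The keyword-scanning part of the TokenEater state machine (waiting, keyword seen, open parenthesis seen) can be sketched for modern Python 3, where tokenize no longer accepts a callback and instead yields tokens from tokenize.generate_tokens. This is a simplified illustration under stated assumptions, not the script's implementation: the function name find_marked_strings is hypothetical, docstring extraction and the exclude list are left out, and ast.literal_eval is swapped in for the restricted eval used by safe_eval. It handles plain string literals only.

```python
import ast
import io
import tokenize


def find_marked_strings(source, keywords=("_",)):
    """Return (lineno, text) pairs for each keyword("...") call in source."""
    results = []
    state = "waiting"
    data, lineno = [], -1
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        ttype, tstring, start = tok.type, tok.string, tok.start
        if state == "waiting":
            if ttype == tokenize.NAME and tstring in keywords:
                state = "keywordseen"
        elif state == "keywordseen":
            if ttype == tokenize.OP and tstring == "(":
                # Remember the line of the opening parenthesis.
                data, lineno = [], start[0]
                state = "openseen"
            else:
                state = "waiting"
        elif state == "openseen":
            if ttype == tokenize.OP and tstring == ")":
                # Adjacent literals are concatenated into one message.
                if data:
                    results.append((lineno, "".join(data)))
                state = "waiting"
            elif ttype == tokenize.STRING:
                data.append(ast.literal_eval(tstring))
    return results


sample = 'print(_("Hello"))\nx = _("a" "b")\ny = len("not marked")\n'
found = find_marked_strings(sample)
print(found)
```

Storing the state as a string keeps the sketch short; the class above achieves the same dispatch by rebinding self.__state to a bound method.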
def main():
    global default_keywords
    try:
        opts, args = getopt.getopt(
            sys.argv[1:],
            'ad:DEhk:Kno:p:S:Vvw:x:',
            # long options that take an argument need a trailing '='
            ['extract-all', 'default-domain=', 'escape', 'help',
             'keyword=', 'no-default-keywords',
             'add-location', 'no-location', 'output=', 'output-dir=',
             'style=', 'verbose', 'version', 'width=', 'exclude-file=',
             'docstrings',
             ])
    except getopt.error, msg:
        usage(1, msg)

    # for holding option values
    class Options:
        # constants
        GNU = 1
        SOLARIS = 2
        # defaults
        extractall = 0 # FIXME: currently this option has no effect at all.
        escape = 0
        keywords = []
        outpath = ''
        outfile = 'messages.pot'
        writelocations = 1
        locationstyle = GNU
        verbose = 0
        width = 78
        excludefilename = ''
        docstrings = 0

    options = Options()
    locations = {'gnu' : options.GNU,
                 'solaris' : options.SOLARIS,
                 }

    # parse options
    for opt, arg in opts:
        if opt in ('-h', '--help'):
            usage(0)
        elif opt in ('-a', '--extract-all'):
            options.extractall = 1
        elif opt in ('-d', '--default-domain'):
            options.outfile = arg + '.pot'
        elif opt in ('-E', '--escape'):
            options.escape = 1
        elif opt in ('-D', '--docstrings'):
            options.docstrings = 1
        elif opt in ('-k', '--keyword'):
            options.keywords.append(arg)
        elif opt in ('-K', '--no-default-keywords'):
            default_keywords = []
        elif opt in ('-n', '--add-location'):
            options.writelocations = 1
        elif opt in ('--no-location',):
            options.writelocations = 0
        elif opt in ('-S', '--style'):
            options.locationstyle = locations.get(arg.lower())
            if options.locationstyle is None:
                usage(1, _('Invalid value for --style: %s') % arg)
        elif opt in ('-o', '--output'):
            options.outfile = arg
        elif opt in ('-p', '--output-dir'):
            options.outpath = arg
        elif opt in ('-v', '--verbose'):
            options.verbose = 1
        elif opt in ('-V', '--version'):
            print _('pygettext.py (xgettext for Python) %s') % __version__
            sys.exit(0)
        elif opt in ('-w', '--width'):
            try:
                options.width = int(arg)
            except ValueError:
                usage(1, _('--width argument must be an integer: %s') % arg)
        elif opt in ('-x', '--exclude-file'):
            options.excludefilename = arg

    # calculate escapes; -E/--escape means non-ASCII characters must NOT
    # pass through unescaped
    make_escapes(not options.escape)

    # calculate all keywords
    options.keywords.extend(default_keywords)

    # initialize list of strings to exclude
    if options.excludefilename:
        try:
            fp = open(options.excludefilename)
            # drop the line terminators so that entries compare equal to
            # the extracted messages
            options.toexclude = fp.read().splitlines()
            fp.close()
        except IOError:
            print >> sys.stderr, _(
                "Can't read --exclude-file: %s") % options.excludefilename
            sys.exit(1)
    else:
        options.toexclude = []

    # slurp through all the files
    eater = TokenEater(options)
    for filename in args:
        if filename == '-':
            if options.verbose:
                print _('Reading standard input')
            fp = sys.stdin
            closep = 0
        else:
            if options.verbose:
                print _('Working on %s') % filename
            fp = open(filename)
            closep = 1
        try:
            eater.set_filename(filename)
            try:
                tokenize.tokenize(fp.readline, eater)
            except tokenize.TokenError, e:
                print >> sys.stderr, '%s: %s, line %d, column %d' % (
                    e[0], filename, e[1][0], e[1][1])
        finally:
            if closep:
                fp.close()

    # write the output
    if options.outfile == '-':
        fp = sys.stdout
        closep = 0
    else:
        if options.outpath:
            options.outfile = os.path.join(options.outpath, options.outfile)
        fp = open(options.outfile, 'w')
        closep = 1
    try:
        eater.write(fp)
    finally:
        if closep:
            fp.close()


if __name__ == '__main__':
    main()
    # some more test strings
    _(u'a unicode string')
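One detail of the getopt call in main() is worth a small demonstration: in getopt's long-option list, an option that takes a value is written with a trailing "=", and an option listed without it is treated as a plain flag. The sketch below, in modern Python 3, shows the difference for an option like --default-domain. The helper name parse and the sample arguments are hypothetical.

```python
import getopt


def parse(argv, longopts):
    """Parse argv with a fixed short-option string and the given long options."""
    return getopt.getopt(argv, "d:", longopts)


# With the trailing "=", the long option accepts a value.
opts, args = parse(["--default-domain=myapp", "input.py"], ["default-domain="])
print(opts, args)

# Without it, getopt treats the option as a flag and rejects the value.
try:
    parse(["--default-domain=myapp"], ["default-domain"])
    rejected = False
except getopt.GetoptError:
    rejected = True
print(rejected)
```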