Blame - Tools/i18n/pygettext.py - platform/external/python/cpython3

# -*- coding: iso-8859-1 -*-
# Originally written by Barry Warsaw <barry@python.org>
#
# Minimally patched to make it even more xgettext compatible
# by Peter Funk <pf@artcom-gmbh.de>
#
# 2002-11-22 Jürgen Hermann <jh@web.de>
# Added checks that _() only contains string literals, and
# command line args are resolved to module lists, i.e. you
# can now pass a filename, a module or package name, or a
# directory (including globbing chars, important for Win32).
# Made docstring fit in 80 chars wide displays using pydoc.
#

# for selftesting
try:
    import fintl
    _ = fintl.gettext
except ImportError:
    _ = lambda s: s

__doc__ = _("""pygettext -- Python equivalent of xgettext(1)

Many systems (Solaris, Linux, Gnu) provide extensive tools that ease the
internationalization of C programs. Most of these tools are independent of
the programming language and can be used from within Python programs.
Martin von Loewis' work[1] helps considerably in this regard.

There's one problem though; xgettext is the program that scans source code
looking for message strings, but it groks only C (or C++). Python
introduces a few wrinkles, such as dual quoting characters, triple quoted
strings, and raw strings. xgettext understands none of this.

Enter pygettext, which uses Python's standard tokenize module to scan
Python source code, generating .pot files identical to what GNU xgettext[2]
generates for C and C++ code. From there, the standard GNU tools can be
used.

A word about marking Python strings as candidates for translation. GNU
xgettext recognizes the following keywords: gettext, dgettext, dcgettext,
and gettext_noop. But those can be a lot of text to include all over your
code. C and C++ have a trick: they use the C preprocessor. Most
internationalized C source includes a #define for gettext() to _() so that
what has to be written in the source is much less. Thus these are both
translatable strings:

    gettext("Translatable String")
    _("Translatable String")

Python of course has no preprocessor so this doesn't work so well. Thus,
pygettext searches only for _() by default, but see the -k/--keyword flag
below for how to augment this.

[1] http://www.python.org/workshops/1997-10/proceedings/loewis.html
[2] http://www.gnu.org/software/gettext/gettext.html

NOTE: pygettext attempts to be option and feature compatible with GNU
xgettext wherever possible. However, some options are still missing or are
not fully implemented. Also, xgettext's use of command line switches with
option arguments is broken, and in these cases, pygettext just defines
additional switches.

Usage: pygettext [options] inputfile ...

Options:

    -a
    --extract-all
        Extract all strings.

    -d name
    --default-domain=name
        Rename the default output file from messages.pot to name.pot.

    -E
    --escape
        Replace non-ASCII characters with octal escape sequences.

    -D
    --docstrings
        Extract module, class, method, and function docstrings. These do
        not need to be wrapped in _() markers, and in fact cannot be for
        Python to consider them docstrings. (See also the -X option).

    -h
    --help
        Print this help message and exit.

    -k word
    --keyword=word
        Keywords to look for in addition to the default set, which are:
        %(DEFAULTKEYWORDS)s

        You can have multiple -k flags on the command line.

    -K
    --no-default-keywords
        Disable the default set of keywords (see above). Any keywords
        explicitly added with the -k/--keyword option are still recognized.

    --no-location
        Do not write filename/lineno location comments.

    -n
    --add-location
        Write filename/lineno location comments indicating where each
        extracted string is found in the source. These lines appear before
        each msgid. The style of comments is controlled by the -S/--style
        option. This is the default.

    -o filename
    --output=filename
        Rename the default output file from messages.pot to filename. If
        filename is `-' then the output is sent to standard out.

    -p dir
    --output-dir=dir
        Output files will be placed in directory dir.

    -S stylename
    --style stylename
        Specify which style to use for location comments. Two styles are
        supported:

        Solaris  # File: filename, line: line-number
        GNU      #: filename:line

        The style name is case insensitive. GNU style is the default.

    -v
    --verbose
        Print the names of the files being processed.

    -V
    --version
        Print the version of pygettext and exit.

    -w columns
    --width=columns
        Set width of output to columns.

    -x filename
    --exclude-file=filename
        Specify a file that contains a list of strings that are not to be
        extracted from the input files. Each string to be excluded must
        appear on a line by itself in the file.

    -X filename
    --no-docstrings=filename
        Specify a file that contains a list of files (one per line) that
        should not have their docstrings extracted. This is only useful in
        conjunction with the -D option above.

If `inputfile' is -, standard input is read.
""")
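

# Illustrative sketch only; pygettext itself does not use this helper.  The
# docstring above says pygettext scans source with the tokenize module and
# collects the string literals passed to _().  The hypothetical function
# below shows the core of that idea under simplifying assumptions: one
# keyword, exactly one plain string-literal argument, and no handling of
# implicit concatenation, docstrings or f-strings (TokenEater handles those).
import ast
import io
import tokenize

def _sketch_find_keyword_strings(source, keyword='_'):
    """Return (lineno, text) pairs for calls of the form keyword('literal')."""
    skip = (tokenize.NL, tokenize.NEWLINE, tokenize.COMMENT)
    toks = [t for t in tokenize.generate_tokens(io.StringIO(source).readline)
            if t.type not in skip]
    found = []
    for i in range(len(toks) - 3):
        name, lpar, lit, rpar = toks[i:i + 4]
        if (name.type == tokenize.NAME and name.string == keyword
                and lpar.string == '(' and lit.type == tokenize.STRING
                and rpar.string == ')'):
            # literal_eval turns the token text into the actual str value
            found.append((lit.start[0], ast.literal_eval(lit.string)))
    return found

# For example, _sketch_find_keyword_strings('x = _("Hello")\n') would give
# [(1, 'Hello')].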

import os
import importlib.machinery
import importlib.util
import sys
import glob
import time
import getopt
import ast
import token
import tokenize

__version__ = '1.5'

default_keywords = ['_']
DEFAULTKEYWORDS = ', '.join(default_keywords)

EMPTYSTRING = ''


# The normal pot-file header. msgmerge and Emacs's po-mode work better if it's
# there.
pot_header = _('''\
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\\n"
"POT-Creation-Date: %(time)s\\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\\n"
"Language-Team: LANGUAGE <LL@li.org>\\n"
"MIME-Version: 1.0\\n"
"Content-Type: text/plain; charset=%(charset)s\\n"
"Content-Transfer-Encoding: %(encoding)s\\n"
"Generated-By: pygettext.py %(version)s\\n"

''')


def usage(code, msg=''):
    print(__doc__ % globals(), file=sys.stderr)
    if msg:
        print(msg, file=sys.stderr)
    sys.exit(code)


def make_escapes(pass_nonascii):
    global escapes, escape
    if pass_nonascii:
        # Allow non-ascii characters to pass through so that e.g. 'msgid
        # "Höhe"' would not result in 'msgid "H\366he"'. Otherwise we
        # escape any character outside the 32..126 range.
        mod = 128
        escape = escape_ascii
    else:
        mod = 256
        escape = escape_nonascii
    escapes = [r"\%03o" % i for i in range(mod)]
    for i in range(32, 127):
        escapes[i] = chr(i)
    escapes[ord('\\')] = r'\\'
    escapes[ord('\t')] = r'\t'
    escapes[ord('\r')] = r'\r'
    escapes[ord('\n')] = r'\n'
    escapes[ord('\"')] = r'\"'


def escape_ascii(s, encoding):
    return ''.join(escapes[ord(c)] if ord(c) < 128 else c for c in s)

def escape_nonascii(s, encoding):
    return ''.join(escapes[b] for b in s.encode(encoding))
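

# Illustrative sketch only; pygettext itself does not use this helper.
# make_escapes() builds a lookup table that maps byte or code-point values to
# their .po spelling. escape_nonascii() (selected by -E/--escape) encodes the
# text and maps every byte through that table. The hypothetical function
# below rebuilds the same 256-entry table locally so the effect can be seen in
# isolation: 'H\u00f6he' encoded as UTF-8 becomes 'H\303\266he', because
# U+00F6 is the two bytes 0xC3 0xB6 (octal 303 and 266).
def _sketch_octal_escape(s, encoding='utf-8'):
    table = [r"\%03o" % i for i in range(256)]   # default: 3-digit octal
    for i in range(32, 127):                     # printable ASCII as-is
        table[i] = chr(i)
    for ch, rep in (('\\', r'\\'), ('\t', r'\t'), ('\r', r'\r'),
                    ('\n', r'\n'), ('"', r'\"')):
        table[ord(ch)] = rep
    return ''.join(table[b] for b in s.encode(encoding))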


def is_literal_string(s):
    return s[0] in '\'"' or (s[0] in 'rRuU' and s[1] in '\'"')


def safe_eval(s):
    # unwrap quotes, safely
    return eval(s, {'__builtins__':{}}, {})
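

# Illustrative sketch only; pygettext itself does not use this helper.
# is_literal_string() accepts plain literals and those with a single r/R/u/U
# prefix, so f-strings and bytes literals are passed over. safe_eval() then
# evaluates the token text with no builtins available. The hypothetical
# function below performs the same prefix check but, as a swapped-in
# alternative to eval(), uses ast.literal_eval(), which accepts only literal
# syntax. It returns None for token text the check rejects.
import ast

def _sketch_unquote_token(tok):
    if tok[0] in '\'"' or (tok[0] in 'rRuU' and tok[1] in '\'"'):
        return ast.literal_eval(tok)
    return None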


def normalize(s, encoding):
    # This converts the various Python string types into a format that is
    # appropriate for .po files, namely much closer to C style.
    lines = s.split('\n')
    if len(lines) == 1:
        s = '"' + escape(s, encoding) + '"'
    else:
        if not lines[-1]:
            del lines[-1]
            lines[-1] = lines[-1] + '\n'
        for i in range(len(lines)):
            lines[i] = escape(lines[i], encoding)
        lineterm = '\\n"\n"'
        s = '""\n"' + lineterm.join(lines) + '"'
    return s
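

# Illustrative sketch only; pygettext itself does not use this helper.
# normalize() writes a single-line message as one quoted string. A
# multi-line message becomes the PO continuation form: an empty "" first
# line, then one quoted line per source line, each ending in an escaped \n.
# The hypothetical function below reproduces that layout with a deliberately
# simplified escaper (backslash, double quote, tab and newline only). It
# stands in for the escapes table that make_escapes() builds.
def _sketch_po_quote(s):
    def esc(text):
        # backslashes first, so the escapes added afterwards are not doubled
        return (text.replace('\\', r'\\').replace('"', r'\"')
                    .replace('\t', r'\t').replace('\n', r'\n'))
    lines = s.split('\n')
    if len(lines) == 1:
        return '"' + esc(s) + '"'
    if not lines[-1]:
        # a trailing newline belongs to the last real line
        del lines[-1]
        lines[-1] += '\n'
    return '""\n"' + '\\n"\n"'.join(esc(line) for line in lines) + '"'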


def containsAny(str, set):
    """Check whether 'str' contains ANY of the chars in 'set'"""
    return 1 in [c in str for c in set]


def getFilesForName(name):
    """Get a list of module files for a filename, a module or package name,
    or a directory.
    """
    if not os.path.exists(name):
        # check for glob chars
        if containsAny(name, "*?[]"):
            files = glob.glob(name)
            list = []
            for file in files:
                list.extend(getFilesForName(file))
            return list

        # try to find module or package
        try:
            spec = importlib.util.find_spec(name)
            # find_spec() returns None, rather than raising, when a
            # top-level module cannot be found
            name = spec.origin if spec is not None else None
        except ImportError:
            name = None
        if not name:
            return []

    if os.path.isdir(name):
        # find all python files in directory
        list = []
        # get extension for python source files
        _py_ext = importlib.machinery.SOURCE_SUFFIXES[0]
        for root, dirs, files in os.walk(name):
            # don't recurse into CVS directories
            if 'CVS' in dirs:
                dirs.remove('CVS')
            # add all *.py files to list
            list.extend(
                [os.path.join(root, file) for file in files
                 if os.path.splitext(file)[1] == _py_ext]
                )
        return list
    elif os.path.exists(name):
        # a single file
        return [name]

    return []
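

# Illustrative sketch only; pygettext itself does not use this helper. For
# a directory argument, getFilesForName() walks the tree with os.walk(). It
# removes 'CVS' from the directory list in place, so os.walk() never descends
# into it, and it keeps files whose extension equals
# importlib.machinery.SOURCE_SUFFIXES[0] ('.py'). The hypothetical function
# below isolates that walk. It prunes with a slice assignment rather than
# dirs.remove() and sorts the result so the output order is stable.
import importlib.machinery
import os

def _sketch_python_files(top, skip=('CVS',)):
    ext = importlib.machinery.SOURCE_SUFFIXES[0]
    found = []
    for root, dirs, files in os.walk(top):
        dirs[:] = [d for d in dirs if d not in skip]   # prune in place
        found.extend(os.path.join(root, f) for f in files
                     if os.path.splitext(f)[1] == ext)
    return sorted(found)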


class TokenEater:
    def __init__(self, options):
        self.__options = options
        self.__messages = {}
        self.__state = self.__waiting
        self.__data = []
        self.__lineno = -1
        self.__freshmodule = 1
        self.__curfile = None
        self.__enclosurecount = 0

    def __call__(self, ttype, tstring, stup, etup, line):
        # dispatch
##        import token
##        print('ttype:', token.tok_name[ttype], 'tstring:', tstring,
##              file=sys.stderr)
        self.__state(ttype, tstring, stup[0])

    def __waiting(self, ttype, tstring, lineno):
        opts = self.__options
        # Do docstring extractions, if enabled
        if opts.docstrings and not opts.nodocstrings.get(self.__curfile):
            # module docstring?
            if self.__freshmodule:
                if ttype == tokenize.STRING and is_literal_string(tstring):
                    self.__addentry(safe_eval(tstring), lineno, isdocstring=1)
                    self.__freshmodule = 0
                elif ttype not in (tokenize.COMMENT, tokenize.NL):
                    self.__freshmodule = 0
                return
            # class or func/method docstring?
            if ttype == tokenize.NAME and tstring in ('class', 'def'):
                self.__state = self.__suiteseen
                return
        if ttype == tokenize.NAME and tstring in opts.keywords:
            self.__state = self.__keywordseen
            return
        if ttype == tokenize.STRING:
            maybe_fstring = ast.parse(tstring, mode='eval').body
            if not isinstance(maybe_fstring, ast.JoinedStr):
                return
            for value in filter(lambda node: isinstance(node, ast.FormattedValue),
                                maybe_fstring.values):
                for call in filter(lambda node: isinstance(node, ast.Call),
                                   ast.walk(value)):
                    func = call.func
                    if isinstance(func, ast.Name):
                        func_name = func.id
                    elif isinstance(func, ast.Attribute):
                        func_name = func.attr
                    else:
                        continue

                    if func_name not in opts.keywords:
                        continue
                    if len(call.args) != 1:
                        print(_(
                            '*** %(file)s:%(lineno)s: Seen unexpected amount of'
                            ' positional arguments in gettext call: %(source_segment)s'
                            ) % {
                            'source_segment': ast.get_source_segment(tstring, call) or tstring,
                            'file': self.__curfile,
                            'lineno': lineno
                            }, file=sys.stderr)
                        continue
                    if call.keywords:
                        print(_(
                            '*** %(file)s:%(lineno)s: Seen unexpected keyword arguments'
                            ' in gettext call: %(source_segment)s'
                            ) % {
                            'source_segment': ast.get_source_segment(tstring, call) or tstring,
                            'file': self.__curfile,
                            'lineno': lineno
                            }, file=sys.stderr)
                        continue
                    arg = call.args[0]
                    if not isinstance(arg, ast.Constant):
                        print(_(
                            '*** %(file)s:%(lineno)s: Seen unexpected argument type'
                            ' in gettext call: %(source_segment)s'
                            ) % {
                            'source_segment': ast.get_source_segment(tstring, call) or tstring,
                            'file': self.__curfile,
                            'lineno': lineno
                            }, file=sys.stderr)
                        continue
                    if isinstance(arg.value, str):
                        self.__addentry(arg.value, lineno)

    def __suiteseen(self, ttype, tstring, lineno):
        # skip over any enclosure pairs until we see the colon
        if ttype == tokenize.OP:
            if tstring == ':' and self.__enclosurecount == 0:
                # we see a colon and we're not in an enclosure: end of def
                self.__state = self.__suitedocstring
            elif tstring in '([{':
                self.__enclosurecount += 1
            elif tstring in ')]}':
                self.__enclosurecount -= 1

    def __suitedocstring(self, ttype, tstring, lineno):
        # ignore any intervening noise
        if ttype == tokenize.STRING and is_literal_string(tstring):
            self.__addentry(safe_eval(tstring), lineno, isdocstring=1)
            self.__state = self.__waiting
        elif ttype not in (tokenize.NEWLINE, tokenize.INDENT,
                           tokenize.COMMENT):
            # there was no class docstring
            self.__state = self.__waiting

    def __keywordseen(self, ttype, tstring, lineno):
        if ttype == tokenize.OP and tstring == '(':
            self.__data = []
            self.__lineno = lineno
            self.__state = self.__openseen
        else:
            self.__state = self.__waiting

    def __openseen(self, ttype, tstring, lineno):
        if ttype == tokenize.OP and tstring == ')':
            # We've seen the last of the translatable strings. Record the
            # line number of the first line of the strings and update the list
            # of messages seen. Reset state for the next batch. If there
            # were no strings inside _(), then just ignore this entry.
            if self.__data:
                self.__addentry(EMPTYSTRING.join(self.__data))
            self.__state = self.__waiting
        elif ttype == tokenize.STRING and is_literal_string(tstring):
            self.__data.append(safe_eval(tstring))
        elif ttype not in [tokenize.COMMENT, token.INDENT, token.DEDENT,
                           token.NEWLINE, tokenize.NL]:
            # warn if we see anything other than STRING or whitespace
            print(_(
                '*** %(file)s:%(lineno)s: Seen unexpected token "%(token)s"'
                ) % {
                'token': tstring,
                'file': self.__curfile,
                'lineno': self.__lineno
                }, file=sys.stderr)
            self.__state = self.__waiting

    def __addentry(self, msg, lineno=None, isdocstring=0):
        if lineno is None:
            lineno = self.__lineno
        if msg not in self.__options.toexclude:
            entry = (self.__curfile, lineno)
            self.__messages.setdefault(msg, {})[entry] = isdocstring

    def set_filename(self, filename):
        self.__curfile = filename
        self.__freshmodule = 1

    def write(self, fp):
        options = self.__options
        timestamp = time.strftime('%Y-%m-%d %H:%M%z')
        encoding = fp.encoding if fp.encoding else 'UTF-8'
        print(pot_header % {'time': timestamp, 'version': __version__,
                            'charset': encoding,
                            'encoding': '8bit'}, file=fp)
        # Sort the entries. First sort each particular entry's keys, then
        # sort all the entries by their first item.
        reverse = {}
        for k, v in self.__messages.items():
            keys = sorted(v.keys())
            reverse.setdefault(tuple(keys), []).append((k, v))
        rkeys = sorted(reverse.keys())
        for rkey in rkeys:
            rentries = reverse[rkey]
            rentries.sort()
            for k, v in rentries:
                # If the entry was gleaned out of a docstring, then add a
                # comment stating so. This is to aid translators who may wish
                # to skip translating some unimportant docstrings.
                isdocstring = any(v.values())
                # k is the message string, v is a dictionary-set of (filename,
                # lineno) tuples. We want to sort the entries in v first by
                # file name and then by line number.
                v = sorted(v.keys())
                if not options.writelocations:
                    pass
                # location comments are different b/w Solaris and GNU:
                elif options.locationstyle == options.SOLARIS:
                    for filename, lineno in v:
                        d = {'filename': filename, 'lineno': lineno}
                        print(_(
                            '# File: %(filename)s, line: %(lineno)d') % d, file=fp)
                elif options.locationstyle == options.GNU:
                    # fit as many locations on one line, as long as the
                    # resulting line length doesn't exceed 'options.width'
                    locline = '#:'
                    for filename, lineno in v:
                        d = {'filename': filename, 'lineno': lineno}
                        s = _(' %(filename)s:%(lineno)d') % d
                        if len(locline) + len(s) <= options.width:
                            locline = locline + s
                        else:
                            print(locline, file=fp)
                            locline = "#:" + s
                    if len(locline) > 2:
                        print(locline, file=fp)
                if isdocstring:
                    print('#, docstring', file=fp)
                print('msgid', normalize(k, encoding), file=fp)
                print('msgstr ""\n', file=fp)
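

# Illustrative sketch only; pygettext itself does not use this helper. For
# STRING tokens, TokenEater.__waiting() parses the token text with
# ast.parse(..., mode='eval'). If the result is an f-string (ast.JoinedStr),
# it walks each replacement field for calls to a keyword whose single
# positional argument is a str constant. The hypothetical function below
# repeats that walk on a literal passed in directly. It silently skips the
# calls the real code reports as errors (wrong number of arguments, keyword
# arguments, non-constant argument).
import ast

def _sketch_fstring_messages(literal, keywords=('_',)):
    node = ast.parse(literal, mode='eval').body
    if not isinstance(node, ast.JoinedStr):
        return []
    found = []
    for value in node.values:
        if not isinstance(value, ast.FormattedValue):
            continue
        for call in ast.walk(value):
            if not isinstance(call, ast.Call):
                continue
            func = call.func
            name = (func.id if isinstance(func, ast.Name) else
                    func.attr if isinstance(func, ast.Attribute) else None)
            if (name in keywords and len(call.args) == 1 and not call.keywords
                    and isinstance(call.args[0], ast.Constant)
                    and isinstance(call.args[0].value, str)):
                found.append(call.args[0].value)
    return found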

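
# Illustrative sketch only; pygettext itself does not use this helper. With
# -D, once TokenEater sees 'class' or 'def', __suiteseen() skips tokens until
# the colon that ends the header. It counts (, [ and { so that colons inside
# annotations, default values or slices are not mistaken for that colon.
# __suitedocstring() then takes the first STRING token, if any, as the
# docstring. The hypothetical function below applies the same bracket
# counting to source text and returns the first docstring found after a
# def/class header, or None.
import ast
import io
import tokenize

def _sketch_first_suite_docstring(source):
    state, depth = 'waiting', 0
    noise = (tokenize.NEWLINE, tokenize.INDENT, tokenize.COMMENT)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if state == 'waiting':
            if tok.type == tokenize.NAME and tok.string in ('class', 'def'):
                state, depth = 'header', 0
        elif state == 'header':
            if tok.type == tokenize.OP:
                if tok.string == ':' and depth == 0:
                    state = 'suite'          # end of the def/class header
                elif tok.string in '([{':
                    depth += 1
                elif tok.string in ')]}':
                    depth -= 1
        elif tok.type == tokenize.STRING:    # first token of the suite
            return ast.literal_eval(tok.string)
        elif tok.type not in noise:          # suite has no docstring
            state = 'waiting'
    return None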

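
# Illustrative sketch only; pygettext itself does not use this helper. For
# the GNU location style, TokenEater.write() packs as many
# " filename:lineno" entries onto each "#:" comment line as fit within
# options.width. It starts a new "#:" line when the next entry would
# overflow. The hypothetical function below returns those lines as a list
# instead of printing them to the output file.
def _sketch_gnu_location_lines(locations, width=78):
    lines = []
    locline = '#:'
    for filename, lineno in sorted(locations):
        s = ' %s:%d' % (filename, lineno)
        if len(locline) + len(s) <= width:
            locline += s
        else:
            lines.append(locline)
            locline = '#:' + s
    if len(locline) > 2:
        lines.append(locline)
    return lines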


def main():
    global default_keywords
    try:
        opts, args = getopt.getopt(
            sys.argv[1:],
            'ad:DEhk:Kno:p:S:Vvw:x:X:',
            ['extract-all', 'default-domain=', 'escape', 'help',
             'keyword=', 'no-default-keywords',
             'add-location', 'no-location', 'output=', 'output-dir=',
             'style=', 'verbose', 'version', 'width=', 'exclude-file=',
             'docstrings', 'no-docstrings=',
             ])
    except getopt.error as msg:
        usage(1, msg)

    # for holding option values
    class Options:
        # constants
        GNU = 1
        SOLARIS = 2
        # defaults
        extractall = 0 # FIXME: currently this option has no effect at all.
        escape = 0
        keywords = []
        outpath = ''
        outfile = 'messages.pot'
        writelocations = 1
        locationstyle = GNU
        verbose = 0
        width = 78
        excludefilename = ''
        docstrings = 0
        nodocstrings = {}

    options = Options()
    locations = {'gnu' : options.GNU,
                 'solaris' : options.SOLARIS,
                 }

# parse options

for opt, arg in opts:

559

if opt in ('-h', '--help'):

560

usage(0)

Barry Warsaw

2000-02-26 20:56:47 +0000

[diff] [blame]

561

elif opt in ('-a', '--extract-all'):

562

options.extractall = 1

563

elif opt in ('-d', '--default-domain'):

564

options.outfile = arg + '.pot'

565

elif opt in ('-E', '--escape'):

566

options.escape = 1

Barry Warsaw

2000-10-27 04:56:28 +0000

[diff] [blame]

567

elif opt in ('-D', '--docstrings'):

568

options.docstrings = 1

Barry Warsaw

1999-08-13 20:59:48 +0000

[diff] [blame]

569

elif opt in ('-k', '--keyword'):

Barry Warsaw

1999-08-13 20:59:48 +0000

[diff] [blame]

570

options.keywords.append(arg)

Barry Warsaw

2000-03-08 15:18:35 +0000

[diff] [blame]

571

elif opt in ('-K', '--no-default-keywords'):

572

default_keywords = []

Barry Warsaw

1999-08-13 20:59:48 +0000

[diff] [blame]

573

elif opt in ('-n', '--add-location'):

Barry Warsaw

2000-03-08 15:18:35 +0000

[diff] [blame]

574

options.writelocations = 1

Barry Warsaw

1999-08-13 20:59:48 +0000

[diff] [blame]

575

elif opt in ('--no-location',):

Barry Warsaw

2000-03-08 15:18:35 +0000

[diff] [blame]

576

options.writelocations = 0

577

elif opt in ('-S', '--style'):

578

options.locationstyle = locations.get(arg.lower())

579

if options.locationstyle is None:

580

usage(1, _('Invalid value for --style: %s') % arg)

Barry Warsaw

2000-02-26 20:56:47 +0000

[diff] [blame]

581

elif opt in ('-o', '--output'):

582

options.outfile = arg

583

elif opt in ('-p', '--output-dir'):

584

options.outpath = arg

Barry Warsaw

1999-11-03 18:47:52 +0000

[diff] [blame]

585

elif opt in ('-v', '--verbose'):

586

options.verbose = 1

Barry Warsaw

2000-02-26 20:56:47 +0000

[diff] [blame]

587

elif opt in ('-V', '--version'):

Collin Winter

2007-08-03 17:06:41 +0000

[diff] [blame]

588

print(_('pygettext.py (xgettext for Python) %s') % __version__)

Barry Warsaw

2000-02-26 20:56:47 +0000

[diff] [blame]

589

sys.exit(0)

590

elif opt in ('-w', '--width'):

591

try:

592

options.width = int(arg)

593

except ValueError:

Barry Warsaw

2000-03-08 15:18:35 +0000

[diff] [blame]

594

usage(1, _('--width argument must be an integer: %s') % arg)

Barry Warsaw

2000-02-26 20:56:47 +0000

[diff] [blame]

595

elif opt in ('-x', '--exclude-file'):

596

options.excludefilename = arg

Barry Warsaw

2001-07-27 16:47:18 +0000

[diff] [blame]

597

elif opt in ('-X', '--no-docstrings'):

fp = open(arg)

try:

while 1:

line = fp.readline()

if not line:

break

options.nodocstrings[line[:-1]] = 1

605

finally:

606

fp.close()
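The option loop above is a plain `getopt` dispatch: each `(opt, arg)` pair updates one attribute of the `Options` instance, `--style` values are looked up case-insensitively in `locations`, and invalid input ends in `usage(1, ...)`. The sketch below reproduces that pattern for a subset of the options. It is a hypothetical, self-contained reduction, not pygettext's own code: `parse_args`, the settings dict, and the option strings passed to `getopt.getopt` are assumptions made for the example, and errors raise `ValueError` instead of calling `usage()`.

```python
import getopt

# Location-style constants, mirroring the Options.GNU / Options.SOLARIS idea.
GNU, SOLARIS = 1, 2
LOCATIONS = {'gnu': GNU, 'solaris': SOLARIS}


def parse_args(argv):
    """Hypothetical, reduced version of the option loop in main()."""
    # Option strings are chosen for this sketch; they are not pygettext's own.
    opts, args = getopt.getopt(
        argv, 'd:o:p:S:w:n',
        ['default-domain=', 'output=', 'output-dir=', 'style=', 'width=',
         'add-location', 'no-location'])
    settings = {'outfile': 'messages.pot', 'outpath': '',
                'locationstyle': GNU, 'width': 78, 'writelocations': 1}
    for opt, arg in opts:
        if opt in ('-d', '--default-domain'):
            settings['outfile'] = arg + '.pot'
        elif opt in ('-o', '--output'):
            settings['outfile'] = arg
        elif opt in ('-p', '--output-dir'):
            settings['outpath'] = arg
        elif opt in ('-S', '--style'):
            # Case-insensitive lookup; unknown styles are rejected.
            style = LOCATIONS.get(arg.lower())
            if style is None:
                raise ValueError('Invalid value for --style: %s' % arg)
            settings['locationstyle'] = style
        elif opt in ('-w', '--width'):
            settings['width'] = int(arg)  # raises ValueError for non-integers
        elif opt in ('-n', '--add-location'):
            settings['writelocations'] = 1
        elif opt == '--no-location':
            settings['writelocations'] = 0
    return settings, args


settings, files = parse_args(
    ['-o', 'app.pot', '--style=Solaris', '-w', '60', '--no-location', 'app.py'])
```

With these sample arguments, `settings` ends up with `outfile='app.pot'`, `locationstyle=SOLARIS`, `width=60` and `writelocations=0`, and the remaining positional arguments are `['app.py']`. As in the original loop, later options override earlier ones because each branch simply reassigns a value.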

Barry Warsaw
2000-02-26 20:56:47 +0000

    # calculate escapes
Serhiy Storchaka
2013-02-09 22:37:22 +0200
    make_escapes(not options.escape)
Barry Warsaw
1999-08-13 20:59:48 +0000

    # calculate all keywords
    options.keywords.extend(default_keywords)

Barry Warsaw
2000-02-26 20:56:47 +0000
    # initialize list of strings to exclude
    if options.excludefilename:
        try:
Serhiy Storchaka
172bb39
2019-03-30 08:33:02 +0200
            with open(options.excludefilename) as fp:
                options.toexclude = fp.readlines()
Barry Warsaw
2000-02-26 20:56:47 +0000
        except IOError:
Collin Winter
2007-08-03 17:06:41 +0000
            print(_(
                "Can't read --exclude-file: %s") % options.excludefilename, file=sys.stderr)
Barry Warsaw
2000-02-26 20:56:47 +0000
            sys.exit(1)
    else:
        options.toexclude = []
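Both `--exclude-file` and `--no-docstrings` read a file with one entry per line, but they treat line endings differently: `fp.readlines()` keeps each trailing `'\n'`, while the `-X` branch slices with `line[:-1]`, which silently drops a real character when the last line has no trailing newline. (In Python 3, `IOError` is an alias of `OSError`.) The sketch below shows a newline-safe way to load such a list; `read_name_list` is a hypothetical helper, not part of pygettext, and the file contents are made up for the demonstration.

```python
import os
import tempfile


def read_name_list(path):
    """Hypothetical helper: one entry per line, trailing newline removed."""
    with open(path) as fp:
        # rstrip('\n') only removes a newline that is actually present, unlike
        # line[:-1], which also chops the last character of an unterminated line.
        return [line.rstrip('\n') for line in fp]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'names.txt')
    with open(path, 'w') as fp:
        fp.write('Options\nTokenEater')  # no newline after the last entry
    names = read_name_list(path)
    nodocstrings = dict.fromkeys(names, 1)  # same shape as options.nodocstrings
    with open(path) as fp:
        raw = fp.readlines()  # what the --exclude-file branch stores
```

Here `raw` still carries the newline on its first entry, and slicing the unterminated last entry with `[:-1]` would turn `'TokenEater'` into `'TokenEate'`; the `rstrip('\n')` version keeps both names intact.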


Martin v. Löwis
2002-11-22 08:36:54 +0000
    # resolve args to module lists
    expanded = []
    for arg in args:
        if arg == '-':
            expanded.append(arg)
        else:
            expanded.extend(getFilesForName(arg))
    args = expanded
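`getFilesForName()` is defined elsewhere in pygettext and is not shown in this excerpt; per the file header, it accepts a filename, a module or package name, or a directory, including glob characters. The hypothetical `expand_args` below sketches only the filesystem part of that idea (glob expansion, and walking directories for `*.py` files). It does not resolve module names, and it passes `'-'` through untouched, as the loop above does.

```python
import glob
import os


def expand_args(args):
    """Hypothetical stand-in for getFilesForName(): '-' is kept as is,
    glob patterns are expanded, and directories are walked for *.py files."""
    expanded = []
    for arg in args:
        if arg == '-':
            expanded.append(arg)  # standard-input marker
            continue
        # A pattern with no matches falls back to the literal argument, so a
        # missing file still fails later when it is opened.
        for path in sorted(glob.glob(arg)) or [arg]:
            if os.path.isdir(path):
                for root, dirs, files in os.walk(path):
                    dirs.sort()  # walk subdirectories in a deterministic order
                    expanded.extend(os.path.join(root, name)
                                    for name in sorted(files)
                                    if name.endswith('.py'))
            else:
                expanded.append(path)
    return expanded
```

Sorting both the glob results and the directory listings is a design choice for reproducible output order; neither `glob.glob()` nor `os.walk()` guarantees an order on its own.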


Barry Warsaw
1999-08-13 20:59:48 +0000
    # slurp through all the files
    eater = TokenEater(options)
    for filename in args:
Barry Warsaw
2000-03-08 15:18:35 +0000
        if filename == '-':
            if options.verbose:
Collin Winter
2007-08-03 17:06:41 +0000
                print(_('Reading standard input'))
Serhiy Storchaka
2013-02-09 22:37:22 +0200
            fp = sys.stdin.buffer
Barry Warsaw
2000-03-08 15:18:35 +0000
            closep = 0
        else:
            if options.verbose:
Collin Winter
2007-08-03 17:06:41 +0000
                print(_('Working on %s') % filename)
Serhiy Storchaka
2013-02-09 22:37:22 +0200
            fp = open(filename, 'rb')
Barry Warsaw
2000-03-08 15:18:35 +0000
            closep = 1
        try:
            eater.set_filename(filename)
Barry Warsaw
75ee8f5
2001-02-26 04:46:53 +0000
            try:
Serhiy Storchaka
2013-02-09 22:37:22 +0200
                tokens = tokenize.tokenize(fp.readline)
Trent Nelson
428de65
2008-03-18 22:41:35 +0000
                for _token in tokens:
                    eater(*_token)
Guido van Rossum
b940e11
2007-01-10 16:19:56 +0000
            except tokenize.TokenError as e:
Collin Winter
2007-08-03 17:06:41 +0000
                print('%s: %s, line %d, column %d' % (
Georg Brandl
6464d47
2007-10-22 16:16:13 +0000
                    e.args[0], filename, e.args[1][0], e.args[1][1]),
                    file=sys.stderr)
Barry Warsaw
2000-03-08 15:18:35 +0000
        finally:
            if closep:
                fp.close()
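Each input is read in binary mode because `tokenize.tokenize()` expects a `readline` callable that returns bytes and detects the source encoding itself; every token it yields (a 5-tuple) is passed to `TokenEater`, whose state machine is not part of this excerpt. The toy extractor below illustrates the token-stream approach under simplifying assumptions: it recognizes only `keyword(` followed by one or more adjacent plain string literals and `)`, and abandons the call on anything else. It is not TokenEater's actual logic, and `extract_messages` is a hypothetical name.

```python
import ast
import io
import tokenize


def extract_messages(source, keywords=('_',)):
    """Toy extractor: return (lineno, text) for keyword('literal' ...) calls."""
    results = []
    state = 'idle'
    pieces = []
    lineno = 0
    readline = io.BytesIO(source.encode('utf-8')).readline
    for tok in tokenize.tokenize(readline):
        if state == 'idle':
            if tok.type == tokenize.NAME and tok.string in keywords:
                state = 'keyword'
        elif state == 'keyword':
            if tok.type == tokenize.OP and tok.string == '(':
                state, pieces, lineno = 'open', [], tok.start[0]
            else:
                state = 'idle'
        elif state == 'open':
            if tok.type == tokenize.STRING:
                value = ast.literal_eval(tok.string)
                if isinstance(value, str):
                    pieces.append(value)  # adjacent literals are concatenated
                else:
                    state = 'idle'  # bytes literal: not a message
            elif tok.type == tokenize.OP and tok.string == ')' and pieces:
                results.append((lineno, ''.join(pieces)))
                state = 'idle'
            elif tok.type in (tokenize.NL, tokenize.COMMENT):
                pass  # line breaks and comments inside the parentheses
            else:
                state = 'idle'  # a non-literal argument such as _(name)
    return results
```

For example, `_('multi' 'part')` yields one message, `'multipart'`, while `_(name)` yields nothing because its argument is not a string literal.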

Barry Warsaw
1999-08-13 20:59:48 +0000

Barry Warsaw
2000-03-08 15:18:35 +0000
    # write the output
    if options.outfile == '-':
        fp = sys.stdout
        closep = 0
    else:
        if options.outpath:
            options.outfile = os.path.join(options.outpath, options.outfile)
        fp = open(options.outfile, 'w')
        closep = 1
    try:
        eater.write(fp)
    finally:
        if closep:
            fp.close()
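Both the input loop and the output block choose between a standard stream (`'-'`) and a real file, and use a `closep` flag so that only files the script itself opened get closed. A hedged alternative sketch expresses the same rule with `contextlib.nullcontext` (Python 3.7+), which wraps an existing object in a context manager whose exit does nothing. `open_source` and `open_output` are hypothetical helper names, not pygettext functions.

```python
import contextlib
import os
import sys


def open_source(filename):
    """Bytes input suitable for tokenize; '-' means stdin, which is left open."""
    if filename == '-':
        return contextlib.nullcontext(sys.stdin.buffer)
    return open(filename, 'rb')


def open_output(outfile, outpath=''):
    """Text output; '-' means stdout, which is left open."""
    if outfile == '-':
        return contextlib.nullcontext(sys.stdout)
    if outpath:
        outfile = os.path.join(outpath, outfile)
    return open(outfile, 'w')
```

A caller would then write `with open_output(options.outfile, options.outpath) as fp: eater.write(fp)`, and the `with` statement closes a real file on exit while leaving `sys.stdout` untouched, which is what the `closep` flag achieves in the original.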

Barry Warsaw
1999-08-13 20:59:48 +0000


if __name__ == '__main__':
    main()
Barry Warsaw
75a6e67
2000-05-02 19:28:30 +0000
    # some more test strings
Barry Warsaw
2003-04-16 18:08:23 +0000
    # this one creates a warning
    _('*** Seen unexpected token "%(token)s"') % {'token': 'test'}

Martin v. Löwis