\section{\module{gettext} ---
         Multilingual internationalization services}

\declaremodule{standard}{gettext}
\modulesynopsis{Multilingual internationalization services.}
\moduleauthor{Barry A. Warsaw}{bwarsaw@beopen.com}
\sectionauthor{Barry A. Warsaw}{bwarsaw@beopen.com}


The \module{gettext} module provides internationalization (I18N) and
localization (L10N) services for your Python modules and applications.
It supports both the GNU \program{gettext} message catalog API and a
higher level, class-based API that may be more appropriate for Python
files.  The interface described below allows you to write your
module and application messages in one natural language, and provide a
catalog of translated messages for running under different natural
languages.

Some hints on localizing your Python modules and applications are also
given.

\subsection{GNU \program{gettext} API}

The \module{gettext} module defines the following API, which is very
similar to the GNU \program{gettext} API.  If you use this API you
will affect the translation of your entire application globally.  Often
this is what you want if your application is monolingual, with the choice
of language dependent on the locale of your user.  If you are
localizing a Python module, or if your application needs to switch
languages on the fly, you probably want to use the class-based API
instead.

\begin{funcdesc}{bindtextdomain}{domain, localedir\code{=None}}
Bind the \var{domain} to the locale directory
\var{localedir}.  More concretely, \module{gettext} will look for
binary \file{.mo} files for the given domain using the path (on Unix):
\file{\var{localedir}/\var{language}/LC_MESSAGES/\var{domain}.mo},
where \var{language} is searched for in the environment variables
\code{LANGUAGE}, \code{LC_ALL}, \code{LC_MESSAGES}, and \code{LANG},
in that order.

If \var{localedir} is \code{None}, then the current binding for
\var{domain} is returned\footnote{The default locale directory is system
dependent; e.g. on standard RedHat Linux it is
\file{/usr/share/locale}, but on Solaris it is
\file{/usr/lib/locale}.  The \module{gettext} module does not try to
support these system dependent defaults; instead its default is
\file{\code{sys.prefix}/share/locale}.  For this reason, it is always
best to call \code{gettext.bindtextdomain()} with an explicit absolute
path at the start of your application.}.
\end{funcdesc}
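
As an illustrative sketch of the binding and search path described
above, the following sets a binding, queries it back, and spells out
the file name that would be searched for one language.  The domain
name, the language code, and the temporary directory are placeholders
chosen for this example; they are not values required by the module.

\begin{verbatim}
import gettext
import os
import tempfile

# Hypothetical locale directory; a real application would ship its own.
localedir = os.path.join(tempfile.gettempdir(), 'locale')

# Setting a binding returns the directory that was bound.
bound = gettext.bindtextdomain('myapplication', localedir)

# With localedir omitted (None), the current binding is returned.
current = gettext.bindtextdomain('myapplication')

# The French catalog for this domain would be looked up at this path.
mo_path = os.path.join(localedir, 'fr', 'LC_MESSAGES', 'myapplication.mo')
\end{verbatim}

Passing an absolute path, as here, follows the advice in the footnote
about not relying on the default locale directory.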

\begin{funcdesc}{textdomain}{domain\code{=None}}
Change or query the current global domain.  If \var{domain} is
\code{None}, then the current global domain is returned, otherwise the
global domain is set to \var{domain}, which is returned.
\end{funcdesc}
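
A minimal sketch of the query and set forms of
\function{textdomain()} follows.  The domain name is a placeholder,
and restoring the previous domain at the end is a courtesy of this
example rather than something the module requires.

\begin{verbatim}
import gettext

# Query the current global domain without changing it.
previous = gettext.textdomain()

# Set the global domain; the new domain is returned.
selected = gettext.textdomain('myapplication')

# Querying again now reports the new domain.
current = gettext.textdomain()

# Restore the earlier setting so that other code is unaffected.
gettext.textdomain(previous)
\end{verbatim}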

\begin{funcdesc}{gettext}{message}
Return the localized translation of \var{message}, based on the
current global domain, language, and locale directory.  This function
is usually aliased as \function{_} in the local namespace (see
examples below).
\end{funcdesc}

\begin{funcdesc}{dgettext}{domain, message}
Like \function{gettext()}, but look the message up in the specified
\var{domain}.
\end{funcdesc}
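
The difference between the two lookup functions can be sketched as
follows.  Both domain names are placeholders, and the sketch assumes
that no catalog is installed for them; in that situation the current
implementation is expected to hand the message back unchanged.

\begin{verbatim}
import gettext

gettext.textdomain('myapplication')      # placeholder global domain
_ = gettext.gettext

# Looked up in the current global domain.
greeting = _('Hello')

# Looked up in an explicitly named domain instead.
other = gettext.dgettext('some-other-domain', 'Hello')

# The explicit lookup did not change the global domain.
still_global = gettext.textdomain()
\end{verbatim}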

Note that GNU \program{gettext} also defines a \function{dcgettext()}
function, but this was deemed not useful and so it is currently
unimplemented.

Here's an example of typical usage for this API:

\begin{verbatim}
import gettext
gettext.bindtextdomain('myapplication', '/path/to/my/language/directory')
gettext.textdomain('myapplication')
_ = gettext.gettext
# ...
print _('This is a translatable string.')
\end{verbatim}

\subsection{Class-based API}

The class-based API of the \module{gettext} module gives you more
flexibility and greater convenience than the GNU \program{gettext}
API.  It is the recommended way of localizing your Python applications and
modules.  \module{gettext} defines a ``translations'' class which
implements the parsing of GNU \file{.mo} format files, and has methods
for returning either standard 8-bit strings or Unicode strings.
Translations instances can also install themselves in the built-in
namespace as the function \function{_()}.

\begin{funcdesc}{find}{domain, localedir\code{=None}, languages\code{=None}}
This function implements the standard \file{.mo} file search
algorithm.  It takes a \var{domain}, identical to what
\function{textdomain()} takes, and optionally a \var{localedir} (as in
\function{bindtextdomain()}), and a list of languages.  \var{domain}
and \var{localedir} are strings; \var{languages} is a list of strings.

If \var{localedir} is not given, then the default system locale
directory is used\footnote{See the footnote for
\function{bindtextdomain()} above.}.  If \var{languages} is not given,
then the following environment variables are searched: \code{LANGUAGE},
\code{LC_ALL}, \code{LC_MESSAGES}, and \code{LANG}.  The first one
returning a non-empty value is used for the \var{languages} variable.
The environment variables can contain a colon-separated list of
languages, which will be split.

\function{find()} then expands and normalizes the languages, and then
iterates through them, searching for an existing file built of these
components:

\file{\var{localedir}/\var{language}/LC_MESSAGES/\var{domain}.mo}

The first such file name that exists is returned by \function{find()}.
If no such file is found, then \code{None} is returned.
\end{funcdesc}
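
The search can be tried out against a throwaway directory tree, as in
this sketch.  The domain \code{spam}, the language codes, and the
temporary directory are assumptions made for the example; the
\file{.mo} file is an empty placeholder, which is enough here because
only the existence of the file is being checked.

\begin{verbatim}
import gettext
import os
import tempfile

# Build a throwaway tree: <localedir>/fr/LC_MESSAGES/spam.mo
localedir = tempfile.mkdtemp()
target_dir = os.path.join(localedir, 'fr', 'LC_MESSAGES')
os.makedirs(target_dir)
mo_file = os.path.join(target_dir, 'spam.mo')
open(mo_file, 'wb').close()      # empty placeholder file

# French is requested and a matching file exists, so its path is returned.
found = gettext.find('spam', localedir, ['fr'])

# No German catalog exists in this tree, so the result is None.
missing = gettext.find('spam', localedir, ['de'])
\end{verbatim}

Passing \var{languages} explicitly keeps the result independent of the
environment variables listed above.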

\begin{funcdesc}{translation}{domain, localedir\code{=None},
                              languages\code{=None}, class_\code{=None}}
Return a \class{Translations} instance based on the \var{domain},
\var{localedir}, and \var{languages}, which are first passed to
\function{find()} to get the
associated \file{.mo} file path.  Instances with
identical \file{.mo} file names are cached.  The actual class instantiated
is \var{class_} if provided, otherwise
\class{GNUTranslations}.  The class's constructor must take a single
file object argument.  If no \file{.mo} file is found, this
function raises \exception{IOError}.
\end{funcdesc}

\begin{funcdesc}{install}{domain, localedir\code{=None}, unicode\code{=0}}
This installs the function \function{_} in Python's builtin namespace,
based on \var{domain} and \var{localedir}, which are passed to the
function \function{translation()}.  The \var{unicode} flag is passed to
the resulting translation object's \method{install} method.

As seen below, you usually mark the strings in your application that are
candidates for translation by wrapping them in a call to the function
\function{_()}, e.g.

\begin{verbatim}
print _('This string will be translated.')
\end{verbatim}

For convenience, you want the \function{_()} function to be installed in
Python's builtin namespace, so it is easily accessible in all modules
of your application.
\end{funcdesc}

\subsubsection{The \class{NullTranslations} class}
Translation classes are what actually implement the translation of
original source file message strings to translated message strings.
The base class used by all translation classes is
\class{NullTranslations}; this provides the basic interface you can use
to write your own specialized translation classes.  Here are the
methods of \class{NullTranslations}:

\begin{methoddesc}[NullTranslations]{__init__}{fp\code{=None}}
Takes an optional file object \var{fp}, which is ignored by the base
class.  Initializes ``protected'' instance variables \var{_info} and
\var{_charset} which are set by derived classes.  It then calls
\code{self._parse(fp)} if \var{fp} is not \code{None}.
\end{methoddesc}

\begin{methoddesc}[NullTranslations]{_parse}{fp}
This method is a no-op in the base class.  In derived classes it takes
the file object \var{fp} and reads the data from the file, initializing
the message catalog.  If you have an unsupported message catalog file
format, you should override this method to parse your format.
\end{methoddesc}
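
As a hedged sketch of such an override, the class below parses an
invented text format with one \code{original=translation} pair per
line in place of GNU \file{.mo} data.  The class name
\class{KeyValueTranslations}, the file format, the attribute holding
the catalog, and the temporary directory layout are all assumptions
made for this illustration.  The file still has to carry the
\file{.mo} name because that is what \function{find()} searches for.

\begin{verbatim}
import gettext
import os
import tempfile

class KeyValueTranslations(gettext.NullTranslations):
    """Hypothetical catalog format: one 'original=translation' per line."""

    def _parse(self, fp):
        # fp is the file object that was handed to the constructor.
        self._messages = {}
        for line in fp.read().decode('utf-8').splitlines():
            if '=' in line:
                original, translated = line.split('=', 1)
                self._messages[original.strip()] = translated.strip()

    def gettext(self, message):
        # Fall back to the original string when no translation is known.
        return self._messages.get(message, message)

# Create <localedir>/fr/LC_MESSAGES/spam.mo holding the invented format.
localedir = tempfile.mkdtemp()
catalog_dir = os.path.join(localedir, 'fr', 'LC_MESSAGES')
os.makedirs(catalog_dir)
with open(os.path.join(catalog_dir, 'spam.mo'), 'wb') as fp:
    fp.write(b'hello=bonjour\n')

# translation() locates the file and passes it to the custom class.
t = gettext.translation('spam', localedir, languages=['fr'],
                        class_=KeyValueTranslations)
_ = t.gettext
\end{verbatim}

With this in place, \code{_('hello')} is looked up in the parsed
dictionary, while strings missing from the file are returned as they
were passed in.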

\begin{methoddesc}[NullTranslations]{gettext}{message}
Return the translated message.  Overridden in derived classes.
\end{methoddesc}

\begin{methoddesc}[NullTranslations]{ugettext}{message}
Return the translated message as a Unicode string.  Overridden in
derived classes.
\end{methoddesc}

\begin{methoddesc}[NullTranslations]{info}{}
Return the ``protected'' \var{_info} variable.
\end{methoddesc}

\begin{methoddesc}[NullTranslations]{charset}{}
Return the ``protected'' \var{_charset} variable.
\end{methoddesc}

\begin{methoddesc}[NullTranslations]{install}{unicode\code{=0}}
If the \var{unicode} flag is false, this method installs
\code{self.gettext} into the built-in namespace, binding it to
\function{_}.  If \var{unicode} is true, it binds \code{self.ugettext}
instead.

Note that this is only one way, albeit the most convenient way, to
make the \function{_} function available to your application.  Because it
affects the entire application globally, and specifically the built-in
namespace, localized modules should never install \function{_}.
Instead, they should use this code to make \function{_} available to
their module:

\begin{verbatim}
import gettext
t = gettext.translation('mymodule', ...)
_ = t.gettext
\end{verbatim}

This puts \function{_} only in the module's global namespace and so
only affects calls within this module.
\end{methoddesc}

\subsubsection{The \class{GNUTranslations} class}

The \module{gettext} module provides one additional class derived from
\class{NullTranslations}: \class{GNUTranslations}.  This class
overrides \method{_parse()} to enable reading GNU \program{gettext}
format \file{.mo} files in both big-endian and little-endian format.

It also parses optional meta-data out of the translation catalog.  It
is a GNU \program{gettext} convention to include meta-data as the
translation for the empty string.  This meta-data is in RFC822-style
\code{key: value} pairs.  If the key \code{Content-Type} is found,
then its \code{charset} property is used to initialize the
``protected'' \code{_charset} instance variable.  The entire set of
key/value pairs is placed into a dictionary and set as the
``protected'' \code{_info} instance variable.

If the \file{.mo} file's magic number is invalid, or if other problems
occur while reading the file, instantiating the \class{GNUTranslations}
class can raise \exception{IOError}.

The other usefully overridden method is \method{ugettext()}, which
returns a Unicode string by passing both the translated message string
and the value of the ``protected'' \code{_charset} variable to the
builtin \function{unicode()} function.
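
To make the parsing concrete, the sketch below assembles a tiny
little-endian \file{.mo} image in memory and feeds it to
\class{GNUTranslations}.  The helper \function{build_mo()} is a
hypothetical function written for this example and is not part of the
module; it emits only the header, the two string tables, and the
string data, with no hash table.  The sample messages are likewise
invented.

\begin{verbatim}
import gettext
import io
import struct

def build_mo(catalog):
    """Return the bytes of a minimal little-endian .mo file (sketch only)."""
    # Entries are sorted by original string; '' (the meta-data) sorts first.
    items = sorted((k.encode('utf-8'), v.encode('utf-8'))
                   for k, v in catalog.items())
    count = len(items)
    # Layout: 7-word header, originals table, translations table, strings.
    originals_at = 7 * 4
    translations_at = originals_at + 8 * count
    offset = translations_at + 8 * count
    out = struct.pack('<Iiiiiii', 0x950412de, 0, count,
                      originals_at, translations_at, 0, 0)
    for k, v in items:               # (length, offset) of each original
        out += struct.pack('<ii', len(k), offset)
        offset += len(k) + 1         # +1 for the terminating NUL byte
    for k, v in items:               # (length, offset) of each translation
        out += struct.pack('<ii', len(v), offset)
        offset += len(v) + 1
    out += b''.join(k + b'\0' for k, v in items)
    out += b''.join(v + b'\0' for k, v in items)
    return out

data = build_mo({
    '': 'Content-Type: text/plain; charset=UTF-8\n',   # meta-data entry
    'hello': 'bonjour',
})
t = gettext.GNUTranslations(io.BytesIO(data))
charset = t.charset()        # taken from the Content-Type meta-data
info = t.info()              # dictionary of the meta-data pairs
translated = t.gettext('hello')

# Data that does not start with the magic number is rejected.
try:
    gettext.GNUTranslations(io.BytesIO(b'this is not a .mo file'))
    rejected = False
except IOError:
    rejected = True
\end{verbatim}

In practice the \file{.mo} file would come from the GNU
\program{gettext} tools rather than from hand-written code like this.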

\subsubsection{Solaris \file{.mo} file support}

The Solaris operating system defines its own binary
\file{.mo} file format, but since no documentation can be found on
this format, it is not supported at this time.

\subsubsection{The Catalog constructor}

GNOME uses a version of the \module{gettext} module by James
Henstridge, but this version has a slightly different API.  Its
documented usage was:

\begin{verbatim}
import gettext
cat = gettext.Catalog(domain, localedir)
_ = cat.gettext
print _('hello world')
\end{verbatim}

For compatibility with this older module, the function
\function{Catalog()} is an alias for the \function{translation()}
function described above.

One difference between this module and Henstridge's: his catalog
objects supported access through a mapping API, but this appears to be
unused and so is not currently supported.

\subsection{Internationalizing your programs and modules}
Internationalization (I18N) refers to the operation by which a program
is made aware of multiple languages.  Localization (L10N) refers to
the adaptation of your program, once internationalized, to the local
language and cultural habits.  In order to provide multilingual
messages for your Python programs, you need to take the following
steps:

\begin{enumerate}
    \item prepare your program or module by specially marking
          translatable strings
    \item run a suite of tools over your marked files to generate raw
          message catalogs
    \item create language-specific translations of the message catalogs
    \item use the \module{gettext} module so that message strings are
          properly translated
\end{enumerate}

In order to prepare your code for I18N, you need to look at all the
strings in your files.  Any string that needs to be translated
should be marked by wrapping it in \code{_('...')} -- i.e. a call to
the function \function{_()}.  For example:

\begin{verbatim}
filename = 'mylog.txt'
message = _('writing a log message')
fp = open(filename, 'w')
fp.write(message)
fp.close()
\end{verbatim}

In this example, the string ``\code{writing a log message}'' is marked as
a candidate for translation, while the strings ``\code{mylog.txt}'' and
``\code{w}'' are not.

The GNU \program{gettext} package provides a tool, called
\program{xgettext}, that scans C and C++ source code looking for these
specially marked strings.  \program{xgettext} generates what are
called \file{.pot} files, essentially structured human-readable files
which contain every marked string in the source code.  These
\file{.pot} files are copied and handed over to human translators who write
language-specific versions for every supported natural language.

For I18N Python programs, however, \program{xgettext} won't work; it
doesn't understand the myriad of string types supported by Python.  The
standard Python distribution provides a tool called
\program{pygettext} that does, though (found in the \file{Tools/i18n}
directory)\footnote{Fran\c cois Pinard has written a program called
\program{xpot} which does a similar job.  It is distributed separately
from the Python distribution.}.  This is a command line script that
supports an interface similar to that of \program{xgettext}; see its
documentation for details.  Once you've used \program{pygettext} to
create your \file{.pot} files, you can use the standard GNU
\program{gettext} tools to generate your machine-readable \file{.mo}
files, which are readable by the \class{GNUTranslations} class.

How you use the \module{gettext} module in your code depends on
whether you are internationalizing your entire application or a single
module.

\subsubsection{Localizing your module}

If you are localizing your module, you must take care not to make
global changes, e.g. to the built-in namespace.  You should not use
the GNU \program{gettext} API but instead the class-based API.

Let's say your module is called ``spam'' and the module's various
natural language translation \file{.mo} files reside in
\file{/usr/share/locale} in GNU
\program{gettext} format.  Here's what you would put at the top of
your module:

\begin{verbatim}
import gettext
t = gettext.translation('spam', '/usr/share/locale')
_ = t.gettext
\end{verbatim}

If your translators were providing you with Unicode strings in their
\file{.po} files, you'd instead do:

\begin{verbatim}
import gettext
t = gettext.translation('spam', '/usr/share/locale')
_ = t.ugettext
\end{verbatim}
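
Since \function{translation()} raises \exception{IOError} when no
catalog is found, a module written as above cannot be imported on a
system that lacks a matching \file{.mo} file.  One possible way around
this, shown here only as a sketch and not as a prescribed idiom, is to
fall back to the \class{NullTranslations} base class, whose
\method{gettext()} hands the message back untranslated:

\begin{verbatim}
import gettext

try:
    t = gettext.translation('spam', '/usr/share/locale')
except IOError:
    # No catalog was found for the user's languages, so use the base
    # class, which returns every message unchanged.
    t = gettext.NullTranslations()
_ = t.gettext

message = _('writing a log message')
\end{verbatim}

The module then keeps working in its source language wherever
translations are missing.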

\subsubsection{Localizing your application}

If you are localizing your application, you can install the \function{_()}
function globally into the built-in namespace, usually in the main driver file
of your application.  This will let all your application-specific
files just use \code{_('...')} without having to explicitly install it in
each file.

In the simple case then, you need only add the following bit of code
to the main driver file of your application:

\begin{verbatim}
import gettext
gettext.install('myapplication')
\end{verbatim}

If you need to set the locale directory or the \code{unicode} flag,
you can pass these into the \function{install()} function:

\begin{verbatim}
import gettext
gettext.install('myapplication', '/usr/share/locale', unicode=1)
\end{verbatim}

\subsubsection{Changing languages on the fly}

If your program needs to support many languages at the same time, you
may want to create multiple translation instances and then switch
between them explicitly, like so:

\begin{verbatim}
import gettext

# translation() requires the domain as its first argument
lang1 = gettext.translation('myapplication', languages=['en'])
lang2 = gettext.translation('myapplication', languages=['fr'])
lang3 = gettext.translation('myapplication', languages=['de'])

# start by using language1
lang1.install()

# ... time goes by, user selects language 2
lang2.install()

# ... more time goes by, user selects language 3
lang3.install()
\end{verbatim}
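
Because the example above depends on installed catalogs, here is a
self-contained sketch of the same switching pattern.
\class{DictTranslations} is a hypothetical in-memory stand-in for
catalog-backed translation objects, and the sample messages are
invented; only the \method{install()} calls mirror the text.

\begin{verbatim}
import gettext

class DictTranslations(gettext.NullTranslations):
    """Hypothetical stand-in that translates from an in-memory dictionary."""

    def __init__(self, messages):
        gettext.NullTranslations.__init__(self)
        self._messages = messages

    def gettext(self, message):
        return self._messages.get(message, message)

lang1 = DictTranslations({})                      # source language
lang2 = DictTranslations({'Hello': 'Bonjour'})
lang3 = DictTranslations({'Hello': 'Hallo'})

seen = []
for lang in (lang1, lang2, lang3):
    lang.install()             # rebinds _ in the built-in namespace
    seen.append(_('Hello'))    # _ is resolved through the built-ins
\end{verbatim}

Each call to \method{install()} replaces the previous binding of
\function{_}, so later lookups use the most recently installed
language.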

\subsubsection{Deferred translations}

In most coding situations, strings are translated where they are coded.
Occasionally, however, you need to mark strings for translation, but
defer actual translation until later.  A classic example is:

\begin{verbatim}
animals = ['mollusk',
           'albatross',
           'rat',
           'penguin',
           'python',
           ]
# ...
for a in animals:
    print a
\end{verbatim}

Here, you want to mark the strings in the \code{animals} list as being
translatable, but you don't actually want to translate them until they
are printed.

Here is one way you can handle this situation:

\begin{verbatim}
def _(message): return message

animals = [_('mollusk'),
           _('albatross'),
           _('rat'),
           _('penguin'),
           _('python'),
           ]

del _

# ...
for a in animals:
    print _(a)
\end{verbatim}

This works because the dummy definition of \function{_()} simply returns
the string unchanged.  And this dummy definition will temporarily
override any definition of \function{_()} in the built-in namespace
(until the \code{del} command).
Take care, though, if you have a previous definition of \function{_} in
the local namespace.

Note that the second use of \function{_()} will not identify ``a'' as
being translatable to the \program{pygettext} program, since it is not
a string literal.

Another way to handle this is with the following example:

\begin{verbatim}
def N_(message): return message

animals = [N_('mollusk'),
           N_('albatross'),
           N_('rat'),
           N_('penguin'),
           N_('python'),
           ]

# ...
for a in animals:
    print _(a)
\end{verbatim}

In this case, you are marking translatable strings with the function
\function{N_()}\footnote{The choice of \function{N_()} here is totally
arbitrary; it could have just as easily been
\function{MarkThisStringForTranslation()}.},
which won't conflict with any definition of
\function{_()}.  However, you will need to teach your message extraction
program to look for translatable strings marked with \function{N_()}.
\program{pygettext} and \program{xpot} both support this through the
use of command line switches.

\subsection{Acknowledgements}

The following people contributed code, feedback, design suggestions,
previous implementations, and valuable experience to the creation of
this module:

\begin{itemize}
    \item Peter Funk
    \item James Henstridge
    \item Marc-Andre Lemburg
    \item Martin von L\"owis
    \item Fran\c cois Pinard
    \item Barry Warsaw
\end{itemize}