\section{\module{locale} ---
         Internationalization services}

\declaremodule{standard}{locale}
\modulesynopsis{Internationalization services.}
\moduleauthor{Martin von L\"owis}{loewis@informatik.hu-berlin.de}
\sectionauthor{Martin von L\"owis}{loewis@informatik.hu-berlin.de}


The \module{locale} module opens access to the \POSIX{} locale
database and functionality. The \POSIX{} locale mechanism allows
programmers to deal with certain cultural issues in an application,
without requiring the programmer to know all the specifics of each
country where the software is executed.

The \module{locale} module is implemented on top of the
\module{_locale}\refbimodindex{_locale} module, which in turn uses an
ANSI C locale implementation if available.

The \module{locale} module defines the following exception and
functions:


\begin{excdesc}{Error}
  Exception raised when \function{setlocale()} fails.
\end{excdesc}

\begin{funcdesc}{setlocale}{category\optional{, locale}}
  If \var{locale} is specified, it may be a string, a tuple of the
  form \code{(\var{language code}, \var{encoding})}, or \code{None}.
  If it is a tuple, it is converted to a string using the locale
  aliasing engine. If \var{locale} is given and not \code{None},
  \function{setlocale()} modifies the locale setting for the
  \var{category}. The available categories are listed in the data
  description below. A string value names a locale; an empty
  string specifies the user's default settings. If the modification of
  the locale fails, the exception \exception{Error} is raised. If
  successful, the new locale setting is returned.

  If \var{locale} is omitted or \code{None}, the current setting for
  \var{category} is returned.

  \function{setlocale()} is not thread-safe on most systems.
  Applications typically start with a call of

\begin{verbatim}
import locale
locale.setlocale(locale.LC_ALL, '')
\end{verbatim}

  This sets the locale for all categories to the user's default
  setting (typically specified in the \envvar{LANG} environment
  variable). If the locale is not changed thereafter, using
  multithreading should not cause problems.

  \versionchanged[Added support for tuple values of the \var{locale}
  parameter]{2.0}
\end{funcdesc}
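
The following sketch, written for a Python~3 interpreter, shows one way
to handle the \exception{Error} exception when a requested locale is not
installed. The helper \code{try_setlocale()} and the locale name
\code{'xx_NOT_A_LOCALE'} are hypothetical; only the \code{'C'} locale is
assumed to exist on every system.

```python
import locale


def try_setlocale(category, name):
    """Switch `category` to the locale `name` (hypothetical helper).

    Returns the new setting on success, or None when the C library
    rejects the name, in which case setlocale() raises locale.Error.
    """
    try:
        return locale.setlocale(category, name)
    except locale.Error:
        return None


current = locale.setlocale(locale.LC_NUMERIC)               # query only, no change
print(try_setlocale(locale.LC_NUMERIC, 'C'))                 # 'C' is always available
print(try_setlocale(locale.LC_NUMERIC, 'xx_NOT_A_LOCALE'))   # None
locale.setlocale(locale.LC_NUMERIC, current)                 # put the old value back
```

Passing \code{None} (or omitting the argument) only queries the setting,
so saving \code{current} has no side effect.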

\begin{funcdesc}{localeconv}{}
  Returns the database of the local conventions as a dictionary.
  This dictionary has the following strings as keys:

  \begin{tableiii}{l|l|p{3in}}{constant}{Category}{Key}{Meaning}
    \lineiii{LC_NUMERIC}{\code{'decimal_point'}}
            {Decimal point character.}
    \lineiii{}{\code{'grouping'}}
            {Sequence of numbers specifying at which relative positions
             the \code{'thousands_sep'} is expected. If the sequence is
             terminated with \constant{CHAR_MAX}, no further grouping
             is performed. If the sequence terminates with a \code{0},
             the last group size is repeatedly used.}
    \lineiii{}{\code{'thousands_sep'}}
            {Character used between groups.}\hline
    \lineiii{LC_MONETARY}{\code{'int_curr_symbol'}}
            {International currency symbol.}
    \lineiii{}{\code{'currency_symbol'}}
            {Local currency symbol.}
    \lineiii{}{\code{'mon_decimal_point'}}
            {Decimal point used for monetary values.}
    \lineiii{}{\code{'mon_thousands_sep'}}
            {Group separator used for monetary values.}
    \lineiii{}{\code{'mon_grouping'}}
            {Equivalent to \code{'grouping'}, used for monetary
             values.}
    \lineiii{}{\code{'positive_sign'}}
            {Symbol used to annotate a positive monetary value.}
    \lineiii{}{\code{'negative_sign'}}
            {Symbol used to annotate a negative monetary value.}
    \lineiii{}{\code{'frac_digits'}}
            {Number of fractional digits used in local formatting
             of monetary values.}
    \lineiii{}{\code{'int_frac_digits'}}
            {Number of fractional digits used in international
             formatting of monetary values.}
    \lineiii{}{\code{'p_sign_posn'}}
            {Position of the sign for positive monetary values.}
    \lineiii{}{\code{'n_sign_posn'}}
            {Position of the sign for negative monetary values.}
  \end{tableiii}

  The possible values for \code{'p_sign_posn'} and
  \code{'n_sign_posn'} are given below.

  \begin{tableii}{c|l}{code}{Value}{Explanation}
    \lineii{0}{Currency and value are surrounded by parentheses.}
    \lineii{1}{The sign should precede the value and currency symbol.}
    \lineii{2}{The sign should follow the value and currency symbol.}
    \lineii{3}{The sign should immediately precede the value.}
    \lineii{4}{The sign should immediately follow the value.}
    \lineii{\constant{CHAR_MAX}}{Nothing is specified in this locale.}
  \end{tableii}
\end{funcdesc}
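
To illustrate how a \code{'grouping'} sequence is interpreted, here is
a minimal Python~3 sketch that inserts a separator into a string of
digits following the rules in the table above. The helper
\code{group_digits()} is hypothetical and not part of the
\module{locale} module; it only reads \constant{CHAR_MAX} from it.

```python
import locale


def group_digits(digits, grouping, sep):
    """Insert `sep` into the digit string `digits` according to a
    localeconv()-style `grouping` sequence (hypothetical helper).

    Groups are counted from the right.  An entry of CHAR_MAX stops
    grouping; an entry of 0 repeats the previous group size.
    """
    groups = []
    rest = digits
    size = None
    for entry in grouping:
        if entry == locale.CHAR_MAX:
            break                      # no further grouping
        if entry == 0:
            # Reuse the last group size for all remaining digits.
            while size and len(rest) > size:
                groups.append(rest[-size:])
                rest = rest[:-size]
            break
        size = entry
        if len(rest) <= size:
            break                      # too few digits left to split
        groups.append(rest[-size:])
        rest = rest[:-size]
    groups.append(rest)
    return sep.join(reversed(groups))


conv = locale.localeconv()
print(group_digits('1234567', conv['grouping'], conv['thousands_sep']))
print(group_digits('1234567', [3, 0], ','))          # 1,234,567
print(group_digits('123456789', [3, 2, 0], ','))     # 12,34,56,789
```

The last call shows why the terminating \code{0} matters: after the
first group of three digits, the group size of two is repeated.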

\begin{funcdesc}{getdefaultlocale}{\optional{envvars}}
  Tries to determine the default locale settings and returns
  them as a tuple of the form \code{(\var{language code},
  \var{encoding})}.

  According to \POSIX, a program which has not called
  \code{setlocale(LC_ALL, '')} runs using the portable \code{'C'}
  locale. Calling \code{setlocale(LC_ALL, '')} lets it use the
  default locale as defined by the \envvar{LANG} variable. Since
  this function must not interfere with the current locale setting,
  it emulates this behavior without changing the locale.

  To maintain compatibility with other platforms, not only the
  \envvar{LANG} variable is tested, but a list of variables given as
  the \var{envvars} parameter. The first one found to be defined is
  used. \var{envvars} defaults to the search path used in GNU gettext;
  it must always contain the variable name \samp{LANG}. The GNU
  gettext search path contains \code{'LANGUAGE'}, \code{'LC_ALL'},
  \code{'LC_CTYPE'}, and \code{'LANG'}, in that order.

  Except for the code \code{'C'}, the language code corresponds to
  \rfc{1766}. \var{language code} and \var{encoding} may be
  \code{None} if their values cannot be determined.
  \versionadded{2.0}
\end{funcdesc}
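
The environment-variable search described above can be sketched as
follows. This is a simplified, hypothetical Python~3 emulation
(\code{find_locale_name()} and \code{GETTEXT_SEARCH_PATH} are
illustrative names). It takes the environment as a dictionary so that it
can be tried without touching \code{os.environ}, it does not split the
result into a \code{(\var{language code}, \var{encoding})} tuple, and it
does not call \function{getdefaultlocale()} itself.

```python
import os

# Search order listed above for GNU gettext (illustrative constant).
GETTEXT_SEARCH_PATH = ('LANGUAGE', 'LC_ALL', 'LC_CTYPE', 'LANG')


def find_locale_name(environ, envvars=GETTEXT_SEARCH_PATH):
    """Return (variable, locale name) for the first variable in `envvars`
    that is set to a non-empty value in `environ` (hypothetical helper).

    Falls back to the portable 'C' locale when none of them is set.
    """
    for variable in envvars:
        value = environ.get(variable)
        if value:                      # unset or empty variables are skipped
            return variable, value
    return None, 'C'


print(find_locale_name(os.environ))
print(find_locale_name({'LANG': 'de_DE.ISO8859-1', 'LC_ALL': 'fr_FR'}))
```

In the second call \envvar{LC_ALL} wins because it appears before
\envvar{LANG} in the search path, regardless of dictionary order.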

\begin{funcdesc}{getlocale}{\optional{category}}
  Returns the current setting for the given locale category as a
  tuple \code{(\var{language code}, \var{encoding})}. \var{category}
  may be one of the \constant{LC_*} values except \constant{LC_ALL}.
  It defaults to \constant{LC_CTYPE}.

  Except for the code \code{'C'}, the language code corresponds to
  \rfc{1766}. \var{language code} and \var{encoding} may be
  \code{None} if their values cannot be determined.
  \versionadded{2.0}
\end{funcdesc}
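
A short Python~3 sketch of \function{getlocale()} for a single
category. It assumes only that the \code{'C'} locale is available, for
which both tuple entries are reported as \code{None}; the helper name
\code{numeric_locale_tuple()} is hypothetical. The previous setting is
saved and restored, which, as discussed in the caveats below, is only
safe in single-threaded code.

```python
import locale


def numeric_locale_tuple(name):
    """Temporarily switch LC_NUMERIC to `name` and report getlocale()
    for that category (hypothetical helper)."""
    saved = locale.setlocale(locale.LC_NUMERIC)
    try:
        locale.setlocale(locale.LC_NUMERIC, name)
        return locale.getlocale(locale.LC_NUMERIC)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)


print(numeric_locale_tuple('C'))   # (None, None)
```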

\begin{funcdesc}{normalize}{localename}
  Returns a normalized locale code for the given locale name. The
  returned locale code is formatted for use with
  \function{setlocale()}. If normalization fails, the original name
  is returned unchanged.

  If the given encoding is not known, the function falls back to
  the default encoding for the locale code, just like
  \function{setlocale()}.
  \versionadded{2.0}
\end{funcdesc}
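
For example, \function{normalize()} expands a bare language and
territory code into a full locale name using the aliasing table, and
hands back names it does not recognize unchanged. The names below are
assumed to match entries in the aliasing table shipped with current
Python~3 releases; results for other names depend on the installed
Python version.

```python
import locale

# A bare code is expanded with the default encoding from the alias table.
print(locale.normalize('en_US'))             # en_US.ISO8859-1
# Names the alias table does not know are returned as given.
print(locale.normalize('xx_NOT_A_LOCALE'))   # xx_NOT_A_LOCALE
```

Note that \function{normalize()} only rewrites the name; it does not
check whether the resulting locale is installed.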

\begin{funcdesc}{resetlocale}{\optional{category}}
  Sets the locale for \var{category} to the default setting.

  The default setting is determined by calling
  \function{getdefaultlocale()}. \var{category} defaults to
  \constant{LC_ALL}.
  \versionadded{2.0}
\end{funcdesc}

\begin{funcdesc}{strcoll}{string1, string2}
  Compares two strings according to the current
  \constant{LC_COLLATE} setting. Like any other comparison function,
  it returns a negative value, a positive value, or \code{0},
  depending on whether \var{string1} collates before or after
  \var{string2} or is equal to it.
\end{funcdesc}
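
In Python~3, where the built-in \function{sorted()} no longer accepts a
comparison function directly, \function{strcoll()} can be adapted with
\function{functools.cmp_to_key()}. The sketch below assumes only the
\code{'C'} collation, which compares characters by their numeric codes
(so upper case sorts before lower case); the helper
\code{collate_sorted()} is hypothetical, and its temporary change of
\constant{LC_COLLATE} is only appropriate in single-threaded code.

```python
import functools
import locale


def collate_sorted(words, collation='C'):
    """Sort `words` with strcoll() under the LC_COLLATE locale `collation`,
    restoring the previous LC_COLLATE setting afterwards (hypothetical)."""
    saved = locale.setlocale(locale.LC_COLLATE)
    locale.setlocale(locale.LC_COLLATE, collation)
    try:
        return sorted(words, key=functools.cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


print(collate_sorted(['pear', 'apple', 'Apple']))   # ['Apple', 'apple', 'pear']
```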

\begin{funcdesc}{strxfrm}{string}
  Transforms a string into one that can be compared with the built-in
  function \function{cmp()}\bifuncindex{cmp} while still yielding
  locale-aware results. This function can be used when the same
  string is compared repeatedly, e.g. when collating a sequence of
  strings.
\end{funcdesc}
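
Since the built-in \function{cmp()} does not exist in Python~3, the
transformed strings are used as sort keys there instead. A minimal
sketch, again assuming only the \code{'C'} locale; \code{xfrm_sorted()}
is a hypothetical name. Each string is transformed once, which is the
saving \function{strxfrm()} offers over calling \function{strcoll()}
for every comparison.

```python
import locale


def xfrm_sorted(words, collation='C'):
    """Sort `words` using strxfrm() keys under the LC_COLLATE locale
    `collation`, then restore the previous setting (hypothetical helper)."""
    saved = locale.setlocale(locale.LC_COLLATE)
    locale.setlocale(locale.LC_COLLATE, collation)
    try:
        # sorted() calls the key function once per element.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


print(xfrm_sorted(['pear', 'apple', 'Apple']))   # ['Apple', 'apple', 'pear']
```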

\begin{funcdesc}{format}{format, val\optional{, grouping}}
  Formats a number \var{val} according to the current
  \constant{LC_NUMERIC} setting. The format follows the conventions
  of the \code{\%} operator. For floating point values, the decimal
  point is modified if appropriate. If \var{grouping} is true, the
  locale's \code{'grouping'} and \code{'thousands_sep'} settings are
  also applied.
\end{funcdesc}
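
Note that \function{format()} was deprecated in later Python~3 releases
and removed in Python~3.12; there, \function{format_string()}, which
takes the same \code{\%}-style format and a \var{grouping} flag, is used
instead. The sketch below applies it under the \code{'C'} numeric
locale, whose decimal point is \code{'.'} and which defines no
grouping, so \var{grouping} has no visible effect; with a locale such as
\code{'de_DE'} installed, both would change. The helper
\code{format_number()} is hypothetical.

```python
import locale


def format_number(fmt, value, numeric='C', grouping=False):
    """Format `value` with the %-style `fmt` under the LC_NUMERIC locale
    `numeric`, restoring the previous setting (hypothetical helper)."""
    saved = locale.setlocale(locale.LC_NUMERIC)
    locale.setlocale(locale.LC_NUMERIC, numeric)
    try:
        return locale.format_string(fmt, value, grouping=grouping)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)


print(format_number('%.2f', 1234567.891))                  # 1234567.89
print(format_number('%.2f', 1234567.891, grouping=True))   # 1234567.89 (no grouping in C)
```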

\begin{funcdesc}{str}{float}
  Formats a floating point number using the same format as the
  built-in function \code{str(\var{float})}, but takes the decimal
  point into account.
\end{funcdesc}
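
A small Python~3 check of \function{str()} together with
\function{atof()}: under the \code{'C'} numeric locale, a value
formatted by \function{str()} can be read back unchanged. Current
Python~3 releases format with twelve significant digits here, so the
example uses values with few digits; the helper \code{roundtrip()} is
hypothetical.

```python
import locale


def roundtrip(value, numeric='C'):
    """Format `value` with locale.str() and parse it back with
    locale.atof() under the LC_NUMERIC locale `numeric` (hypothetical)."""
    saved = locale.setlocale(locale.LC_NUMERIC)
    locale.setlocale(locale.LC_NUMERIC, numeric)
    try:
        text = locale.str(value)
        return text, locale.atof(text)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)


print(roundtrip(3.25))   # ('3.25', 3.25)
```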

\begin{funcdesc}{atof}{string}
  Converts a string to a floating point number, following the
  \constant{LC_NUMERIC} settings.
\end{funcdesc}

\begin{funcdesc}{atoi}{string}
  Converts a string to an integer, following the
  \constant{LC_NUMERIC} conventions.
\end{funcdesc}
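
Locale-aware parsing essentially removes the locale's thousands
separator and replaces its decimal point with a period before
converting. The hypothetical helper \code{parse_number()} below sketches
that idea in Python~3 with the conventions passed in explicitly, so
that German-style input can be tried even where no German locale is
installed. It is a simplified illustration, not the module's own code.

```python
import locale


def parse_number(text, decimal_point='.', thousands_sep='', func=float):
    """Convert `text` written with the given separators (hypothetical helper).

    Follows the same idea as locale.atof(): drop the thousands separator,
    turn the decimal point into '.', then call `func`.
    """
    if thousands_sep:
        text = text.replace(thousands_sep, '')
    if decimal_point:
        text = text.replace(decimal_point, '.')
    return func(text)


# German-style conventions supplied by hand (assumed, not queried):
print(parse_number('1.234.567,5', decimal_point=',', thousands_sep='.'))      # 1234567.5
print(parse_number('1.234', decimal_point=',', thousands_sep='.', func=int))  # 1234

# The module's own functions read the current LC_NUMERIC settings instead.
saved = locale.setlocale(locale.LC_NUMERIC)
locale.setlocale(locale.LC_NUMERIC, 'C')
try:
    print(locale.atof('2.5'), locale.atoi('42'))   # 2.5 42
finally:
    locale.setlocale(locale.LC_NUMERIC, saved)
```

The separator is removed before the decimal point is replaced; doing it
the other way round would turn \code{'1.234.567,5'} into an invalid
number.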

\begin{datadesc}{LC_CTYPE}
\refstmodindex{string}
  Locale category for the character type functions. Depending on the
  settings of this category, the functions of the \refmodule{string}
  module that deal with case change their behaviour.
\end{datadesc}

\begin{datadesc}{LC_COLLATE}
  Locale category for sorting strings. The functions
  \function{strcoll()} and \function{strxfrm()} of the
  \module{locale} module are affected.
\end{datadesc}

\begin{datadesc}{LC_TIME}
  Locale category for the formatting of time. The function
  \function{time.strftime()} follows these conventions.
\end{datadesc}
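
For example, in the \code{'C'} locale \function{time.strftime()}
produces English weekday and month names; under another
\constant{LC_TIME} setting the same format would yield localized names.
The Python~3 sketch below fixes a date (Monday, 3 January 2000) as a
time tuple so the result does not depend on the clock; the helper
\code{weekday_and_month()} is hypothetical.

```python
import locale
import time

# Monday, 3 January 2000, as a struct_time-compatible 9-tuple:
# (year, month, day, hour, minute, second, weekday (0 = Monday),
#  day of year, DST flag (-1 = unknown)).
MONDAY_2000_01_03 = (2000, 1, 3, 0, 0, 0, 0, 3, -1)


def weekday_and_month(time_locale='C'):
    """Return '%A %B' for the fixed date under the LC_TIME locale
    `time_locale`, restoring the previous setting (hypothetical helper)."""
    saved = locale.setlocale(locale.LC_TIME)
    locale.setlocale(locale.LC_TIME, time_locale)
    try:
        return time.strftime('%A %B', MONDAY_2000_01_03)
    finally:
        locale.setlocale(locale.LC_TIME, saved)


print(weekday_and_month())   # Monday January
```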

\begin{datadesc}{LC_MONETARY}
  Locale category for formatting of monetary values. The available
  options can be obtained from the \function{localeconv()} function.
\end{datadesc}
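
The monetary conventions are read from \function{localeconv()} like
any other. The Python~3 sketch below translates the
\code{'p_sign_posn'} and \code{'n_sign_posn'} codes using the table of
values given for \function{localeconv()}; the helper
\code{describe_monetary()} and the \code{SIGN_POSITIONS} table are
hypothetical names. In the \code{'C'} locale no currency symbol is
defined and both sign positions are reported as \constant{CHAR_MAX}.

```python
import locale

# Meanings of the 'p_sign_posn' / 'n_sign_posn' codes, following the
# localeconv() table in this section.
SIGN_POSITIONS = {
    0: 'parentheses around value and currency symbol',
    1: 'sign precedes value and currency symbol',
    2: 'sign follows value and currency symbol',
    3: 'sign immediately precedes value',
    4: 'sign immediately follows value',
    locale.CHAR_MAX: 'not specified',
}


def describe_monetary(monetary='C'):
    """Summarize the LC_MONETARY conventions of the locale `monetary`,
    restoring the previous setting afterwards (hypothetical helper)."""
    saved = locale.setlocale(locale.LC_MONETARY)
    locale.setlocale(locale.LC_MONETARY, monetary)
    try:
        conv = locale.localeconv()
    finally:
        locale.setlocale(locale.LC_MONETARY, saved)
    return {
        'currency_symbol': conv['currency_symbol'],
        'int_curr_symbol': conv['int_curr_symbol'],
        'positive': SIGN_POSITIONS.get(conv['p_sign_posn'], 'unknown'),
        'negative': SIGN_POSITIONS.get(conv['n_sign_posn'], 'unknown'),
    }


print(describe_monetary())
```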

\begin{datadesc}{LC_MESSAGES}
  Locale category for message display. Python currently does not
  support application-specific locale-aware messages. Messages
  displayed by the operating system, such as those returned by
  \function{os.strerror()}, might be affected by this category.
\end{datadesc}

\begin{datadesc}{LC_NUMERIC}
  Locale category for formatting numbers. The functions
  \function{format()}, \function{atoi()}, \function{atof()} and
  \function{str()} of the \module{locale} module are affected by that
  category. All other numeric formatting operations are not
  affected.
\end{datadesc}

\begin{datadesc}{LC_ALL}
  Combination of all locale settings. If this flag is used when the
  locale is changed, setting the locale for all categories is
  attempted. If that fails for any category, no category is changed at
  all. When the locale is retrieved using this flag, a string
  indicating the setting for all categories is returned. This string
  can later be used to restore the settings.
\end{datadesc}
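
The string returned when querying with \constant{LC_ALL} can be stored
and passed back to \function{setlocale()} later. The following
Python~3 sketch wraps that save-and-restore pattern in a context
manager; \code{temporary_locale()} is a hypothetical name. Because it
changes a process-wide setting, it is only appropriate for
single-threaded scripts, for the reasons given in the caveats section.

```python
import contextlib
import locale


@contextlib.contextmanager
def temporary_locale(category, name):
    """Switch `category` to locale `name` for the duration of a `with`
    block, then restore the saved setting (hypothetical helper).

    Not thread-safe: the locale is a process-wide setting.
    """
    saved = locale.setlocale(category)      # query; may be a composite string
    locale.setlocale(category, name)
    try:
        yield saved
    finally:
        locale.setlocale(category, saved)   # the saved string restores it


before = locale.setlocale(locale.LC_ALL)
with temporary_locale(locale.LC_ALL, 'C'):
    print(locale.setlocale(locale.LC_NUMERIC))    # C
print(locale.setlocale(locale.LC_ALL) == before)  # True
```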

\begin{datadesc}{CHAR_MAX}
  A symbolic constant used in several values returned by
  \function{localeconv()}. For example, it terminates a
  \code{'grouping'} sequence after which no further grouping is
  performed, and it marks a sign position that is not specified.
\end{datadesc}

Example:

\begin{verbatim}
>>> import locale
>>> loc = locale.setlocale(locale.LC_ALL) # get current locale
>>> locale.setlocale(locale.LC_ALL, 'de_DE') # use German locale; name varies by platform
>>> locale.strcoll('f\xe4n', 'foo') # compare a string containing an umlaut
>>> locale.setlocale(locale.LC_ALL, '') # use user's preferred locale
>>> locale.setlocale(locale.LC_ALL, 'C') # use default (C) locale
>>> locale.setlocale(locale.LC_ALL, loc) # restore saved locale
\end{verbatim}


\subsection{Background, details, hints, tips and caveats}

The C standard defines the locale as a program-wide property that may
be relatively expensive to change. On top of that, some
implementations are broken in such a way that frequent locale changes
may cause core dumps. This makes the locale somewhat painful to use
correctly.

Initially, when a program is started, the locale is the \samp{C} locale, no
matter what the user's preferred locale is. The program must
explicitly say that it wants the user's preferred locale settings by
calling \code{setlocale(LC_ALL, '')}.

It is generally a bad idea to call \function{setlocale()} in some library
routine, since as a side effect it affects the entire program. Saving
and restoring it is almost as bad: it is expensive and affects other
threads that happen to run before the settings have been restored.

If, when coding a module for general use, you need a locale-independent
version of an operation that is affected by the locale
(e.g. \function{string.lower()}, or certain formats used with
\function{time.strftime()}), you will have to find a way to do it
without using the standard library routine. Even better is convincing
yourself that using locale settings is okay. Only as a last resort
should you document that your module is not compatible with
non-\samp{C} locale settings.

The case conversion functions in the
\refmodule{string}\refstmodindex{string} and
\module{strop}\refbimodindex{strop} modules are affected by the locale
settings. When a call to the \function{setlocale()} function changes
the \constant{LC_CTYPE} settings, the variables
\code{string.lowercase}, \code{string.uppercase} and
\code{string.letters} (and their counterparts in \module{strop}) are
recalculated. Note that code that uses these variables through
`\keyword{from} ... \keyword{import} ...', e.g. \code{from string
import letters}, is not affected by subsequent \function{setlocale()}
calls.

The only way to perform numeric operations according to the locale
is to use the special functions defined by this module:
\function{atof()}, \function{atoi()}, \function{format()},
\function{str()}.

\subsection{For extension writers and programs that embed Python
            \label{embedding-locale}}

Extension modules should never call \function{setlocale()}, except to
find out what the current locale is. But since the return value can
only be used portably to restore it, that is not very useful (except
perhaps to find out whether or not the locale is \samp{C}).

When Python is embedded in an application, if the application sets the
locale to something specific before initializing Python, that is
generally okay, and Python will use whatever locale is set,
\emph{except} that the \constant{LC_NUMERIC} locale should always be
\samp{C}.

The \function{setlocale()} function in the \module{locale} module
gives the Python programmer the impression that the
\constant{LC_NUMERIC} locale setting can be manipulated, but this is
not the case at the C level: C code will always find that the
\constant{LC_NUMERIC} locale setting is \samp{C}. This is because too
much would break if the decimal point character were set to something
other than a period (e.g. the Python parser would break). Caveat:
threads that run without holding Python's global interpreter lock may
occasionally find that the numeric locale setting differs; this is
because the only portable way to implement this feature is to set the
numeric locale settings to what the user requests, extract the
relevant characteristics, and then restore the \samp{C} numeric
locale.

When Python code uses the \module{locale} module to change the locale,
this also affects the embedding application. If the embedding
application doesn't want this to happen, it should remove the
\module{_locale} extension module (which does all the work) from the
table of built-in modules in the \file{config.c} file, and make sure
that the \module{_locale} module is not accessible as a shared library.