\section{\module{locale} ---
         Internationalization services}

\declaremodule{standard}{locale}
\modulesynopsis{Internationalization services.}
\moduleauthor{Martin von L\"owis}{loewis@informatik.hu-berlin.de}
\sectionauthor{Martin von L\"owis}{loewis@informatik.hu-berlin.de}


The \module{locale} module opens access to the \POSIX{} locale
database and functionality.  The \POSIX{} locale mechanism allows
programmers to deal with certain cultural issues in an application,
without requiring the programmer to know all the specifics of each
country where the software is executed.

The \module{locale} module is implemented on top of the
\module{_locale}\refbimodindex{_locale} module, which in turn uses an
ANSI C locale implementation if available.

The \module{locale} module defines the following exception and
functions:


\begin{excdesc}{Error}
  Exception raised when \function{setlocale()} fails.
\end{excdesc}

\begin{funcdesc}{setlocale}{category\optional{, locale}}
  If \var{locale} is specified, it may be a string, a tuple of the
  form \code{(\var{language code}, \var{encoding})}, or \code{None}.
  If it is a tuple, it is converted to a string using the locale
  aliasing engine.  If \var{locale} is given and not \code{None},
  \function{setlocale()} modifies the locale setting for the
  \var{category}.  The available categories are listed in the data
  description below.  The value is the name of a locale.  An empty
  string specifies the user's default settings.  If the modification of
  the locale fails, the exception \exception{Error} is raised.  If
  successful, the new locale setting is returned.

  If \var{locale} is omitted or \code{None}, the current setting for
  \var{category} is returned.

  \function{setlocale()} is not thread safe on most systems.
  Applications typically start with a call of

\begin{verbatim}
import locale
locale.setlocale(locale.LC_ALL, "")
\end{verbatim}

  This sets the locale for all categories to the user's default
  setting (typically specified in the \envvar{LANG} environment
  variable).  If the locale is not changed thereafter, using
  multithreading should not cause problems.

  \versionchanged[Added support for tuple values of the \var{locale}
                  parameter]{2.0}
\end{funcdesc}
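As a minimal sketch of the \function{setlocale()} calls described
above, the following queries the current setting, tries the user's
default locale, and then restores the saved value.  The helper name
\code{try_setlocale} is hypothetical, and returning \code{None} on
failure is only one possible policy, not something the module does by
itself.

```python
import locale

def try_setlocale(category, name):
    """Hypothetical helper: set the locale, return None if it is unavailable."""
    try:
        return locale.setlocale(category, name)
    except locale.Error:
        # The requested locale is not installed on this system.
        return None

saved = locale.setlocale(locale.LC_ALL)           # query only, nothing changes
user_default = try_setlocale(locale.LC_ALL, "")   # user's default, may fail
portable = try_setlocale(locale.LC_ALL, "C")      # the portable "C" locale
locale.setlocale(locale.LC_ALL, saved)            # restore the saved setting
```

Because the locale is a program-wide setting, a sequence like this is
best kept to application start-up code.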

\begin{funcdesc}{localeconv}{}
  Returns the database of the local conventions as a dictionary.
  This dictionary has the following strings as keys:

  \begin{itemize}
  \item
    \code{'decimal_point'} specifies the decimal point used in floating
    point number representations for the \constant{LC_NUMERIC}
    category.

  \item
    \code{'grouping'} is a sequence of numbers specifying at which
    relative positions the \code{'thousands_sep'} is expected.  If the
    sequence is terminated with \constant{CHAR_MAX}, no further
    grouping is performed.  If the sequence terminates with a \code{0},
    the last group size is repeatedly used.

  \item
    \code{'thousands_sep'} is the character used between groups.

  \item
    \code{'int_curr_symbol'} specifies the international currency
    symbol from the \constant{LC_MONETARY} category.

  \item
    \code{'currency_symbol'} is the local currency symbol.

  \item
    \code{'mon_decimal_point'} is the decimal point used in monetary
    values.

  \item
    \code{'mon_thousands_sep'} is the separator for grouping of
    monetary values.

  \item
    \code{'mon_grouping'} has the same format as the \code{'grouping'}
    key; it is used for monetary values.

  \item
    \code{'positive_sign'} and \code{'negative_sign'} give the signs
    used for positive and negative monetary quantities.

  \item
    \code{'int_frac_digits'} and \code{'frac_digits'} specify the number
    of fractional digits used in the international and local
    formatting of monetary values.

  \item
    \code{'p_cs_precedes'} and \code{'n_cs_precedes'} specify whether
    the currency symbol precedes the value for positive or negative
    values.

  \item
    \code{'p_sep_by_space'} and \code{'n_sep_by_space'} specify
    whether there is a space between the positive or negative value
    and the currency symbol.

  \item
    \code{'p_sign_posn'} and \code{'n_sign_posn'} indicate how the
    sign should be placed for positive and negative monetary values.
  \end{itemize}

  The possible values for \code{p_sign_posn} and
  \code{n_sign_posn} are given below.

  \begin{tableii}{c|l}{code}{Value}{Explanation}
    \lineii{0}{Currency and value are surrounded by parentheses.}
    \lineii{1}{The sign should precede the value and currency symbol.}
    \lineii{2}{The sign should follow the value and currency symbol.}
    \lineii{3}{The sign should immediately precede the value.}
    \lineii{4}{The sign should immediately follow the value.}
    \lineii{CHAR_MAX}{Nothing is specified in this locale.}
  \end{tableii}
\end{funcdesc}
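The reading of the grouping sequence described for
\function{localeconv()} can be sketched in plain Python.  The function
\code{group_digits} below is a hypothetical illustration, not the
module's own implementation.  It assumes the sequence is read from the
rightmost group leftwards and that a sequence which simply ends stops
further grouping, just as \constant{CHAR_MAX} does.

```python
import locale

def group_digits(digits, grouping, sep):
    """Hypothetical helper: insert sep into a digit string per a grouping list."""
    groups = []
    size = None
    for entry in grouping:
        if entry == locale.CHAR_MAX:
            break                    # no further grouping is performed
        repeat = (entry == 0)        # 0: reuse the last group size repeatedly
        if not repeat:
            size = entry
        if size is None:
            break                    # a leading 0 has no size to repeat
        while len(digits) > size:
            groups.insert(0, digits[-size:])
            digits = digits[:-size]
            if not repeat:
                break                # a plain entry describes one group only
        if repeat:
            break
    groups.insert(0, digits)         # whatever is left forms the first group
    return sep.join(groups)

print(group_digits("1234567", [3, 0], ","))                # 1,234,567
print(group_digits("1234567", [3, 2, 0], ","))             # 12,34,567
print(group_digits("1234567", [3, locale.CHAR_MAX], ","))  # 1234,567
```

In real use the \code{grouping} and \code{sep} arguments would come from
the \code{'grouping'} and \code{'thousands_sep'} entries of the
dictionary returned by \function{localeconv()}.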

\begin{funcdesc}{getdefaultlocale}{\optional{envvars}}
  Tries to determine the default locale settings and returns
  them as a tuple of the form \code{(\var{language code},
  \var{encoding})}.

  According to \POSIX, a program which has not called
  \code{setlocale(LC_ALL, '')} runs using the portable \code{'C'}
  locale.  Calling \code{setlocale(LC_ALL, '')} lets it use the
  default locale as defined by the \envvar{LANG} variable.  Since we
  do not want to interfere with the current locale setting, we thus
  emulate the behavior in the way described above.

  To maintain compatibility with other platforms, not only the
  \envvar{LANG} variable is tested, but a list of variables given as
  the \var{envvars} parameter.  The first found to be defined will be
  used.  \var{envvars} defaults to the search path used in GNU gettext;
  it must always contain the variable name \samp{LANG}.  The GNU
  gettext search path contains \code{'LANGUAGE'}, \code{'LC_ALL'},
  \code{'LC_CTYPE'}, and \code{'LANG'}, in that order.

  Except for the code \code{'C'}, the language code corresponds to
  \rfc{1766}.  \var{language code} and \var{encoding} may be
  \code{None} if their values cannot be determined.
  \versionadded{2.0}
\end{funcdesc}
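The variable search described for \function{getdefaultlocale()} can be
illustrated with a small stand-alone sketch.  The names
\code{GETTEXT_ENVVARS} and \code{first_defined} are hypothetical, and
treating an empty value as ``not defined'' is an assumption.  The sketch
only finds the variable; it does not parse the value into a language
code and an encoding.

```python
import os

# Search order given above (the GNU gettext search path).
GETTEXT_ENVVARS = ("LANGUAGE", "LC_ALL", "LC_CTYPE", "LANG")

def first_defined(envvars=GETTEXT_ENVVARS, environ=None):
    """Hypothetical helper: return (name, value) of the first variable that is set."""
    if environ is None:
        environ = os.environ
    for name in envvars:
        value = environ.get(name)
        if value:                    # assumption: empty counts as undefined
            return name, value
    return None

# LC_CTYPE comes before LANG in the search path, so it wins here.
print(first_defined(environ={"LANG": "de_DE", "LC_CTYPE": "fr_FR"}))
```

Passing an explicit mapping as \code{environ} keeps the example
independent of the environment it happens to run in.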

\begin{funcdesc}{getlocale}{\optional{category}}
  Returns the current setting for the given locale category as a
  tuple of the form \code{(\var{language code}, \var{encoding})}.
  \var{category} may be one of the \constant{LC_*} values except
  \constant{LC_ALL}.  It defaults to \constant{LC_CTYPE}.

  Except for the code \code{'C'}, the language code corresponds to
  \rfc{1766}.  \var{language code} and \var{encoding} may be
  \code{None} if their values cannot be determined.
  \versionadded{2.0}
\end{funcdesc}
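A short usage sketch for \function{getlocale()} follows.  It switches
\constant{LC_CTYPE} to the \samp{C} locale only so that the example does
not depend on which locales are installed; what is printed may differ
between Python versions and platforms, so no output is shown.

```python
import locale

saved = locale.setlocale(locale.LC_CTYPE)     # remember the current setting
locale.setlocale(locale.LC_CTYPE, "C")

# getlocale() reports the setting as a two-element tuple.
language, encoding = locale.getlocale(locale.LC_CTYPE)

locale.setlocale(locale.LC_CTYPE, saved)      # put the old setting back
print(language, encoding)
```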

\begin{funcdesc}{normalize}{localename}
  Returns a normalized locale code for the given locale name.  The
  returned locale code is formatted for use with
  \function{setlocale()}.  If normalization fails, the original name
  is returned unchanged.

  If the given encoding is not known, the function defaults to
  the default encoding for the locale code just like
  \function{setlocale()}.
  \versionadded{2.0}
\end{funcdesc}
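The following sketch passes a few names through
\function{normalize()}.  The exact strings returned for known names
depend on the alias table shipped with the Python version in use, so
they are not listed here; the last name is made up and is expected to
come back unchanged, as described above.

```python
import locale

names = ("de_DE", "en_US", "C", "no_such_locale_name")
normalized = {name: locale.normalize(name) for name in names}

for name in names:
    print(name, "->", normalized[name])
```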

\begin{funcdesc}{resetlocale}{\optional{category}}
  Sets the locale for \var{category} to the default setting.

  The default setting is determined by calling
  \function{getdefaultlocale()}.  \var{category} defaults to
  \constant{LC_ALL}.
  \versionadded{2.0}
\end{funcdesc}

\begin{funcdesc}{strcoll}{string1, string2}
  Compares two strings according to the current
  \constant{LC_COLLATE} setting.  Like any other comparison function,
  it returns a negative value, a positive value, or \code{0},
  depending on whether \var{string1} collates before or after
  \var{string2} or is equal to it.
\end{funcdesc}

\begin{funcdesc}{strxfrm}{string}
  Transforms a string to one that can be compared with the built-in
  function \function{cmp()}\bifuncindex{cmp} while still giving
  locale-aware results.  This function can be used when the same
  string is compared repeatedly, e.g. when collating a sequence of
  strings.
\end{funcdesc}
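The difference between \function{strcoll()} and \function{strxfrm()}
when collating a sequence can be sketched as follows.  The example
assumes a current Python in which sorting takes a \code{key} argument
and \code{functools.cmp_to_key()} wraps a comparison function; neither
is described in this section.  The \samp{C} locale is used only because
it is always available.

```python
import functools
import locale

saved = locale.setlocale(locale.LC_COLLATE)
locale.setlocale(locale.LC_COLLATE, "C")    # always available

words = ["pear", "Zebra", "apple", "Mango"]

# strcoll() is called once for every comparison the sort makes.
by_strcoll = sorted(words, key=functools.cmp_to_key(locale.strcoll))

# strxfrm() is called once per string; the transformed strings are
# then compared with ordinary string comparison.
by_strxfrm = sorted(words, key=locale.strxfrm)

locale.setlocale(locale.LC_COLLATE, saved)  # restore the old setting
print(by_strcoll == by_strxfrm)
```

Both approaches should produce the same order; the second does less
locale-aware work when the list is long.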

\begin{funcdesc}{format}{format, val\optional{, grouping}}
  Formats a number \var{val} according to the current
  \constant{LC_NUMERIC} setting.  The format follows the conventions
  of the \code{\%} operator.  For floating point values, the decimal
  point is modified if appropriate.  If \var{grouping} is true, also
  takes the grouping into account.
\end{funcdesc}

\begin{funcdesc}{str}{float}
  Formats a floating point number using the same format as the
  built-in function \code{str(\var{float})}, but takes the decimal
  point into account.
\end{funcdesc}

\begin{funcdesc}{atof}{string}
  Converts a string to a floating point number, following the
  \constant{LC_NUMERIC} settings.
\end{funcdesc}

\begin{funcdesc}{atoi}{string}
  Converts a string to an integer, following the
  \constant{LC_NUMERIC} conventions.
\end{funcdesc}
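A minimal round trip through \function{str()}, \function{atof()} and
\function{atoi()} might look like this.  The \constant{LC_NUMERIC}
category is set to the \samp{C} locale here so that the decimal point is
a period regardless of the user's settings; under another locale the
intermediate string may look different.

```python
import locale

saved = locale.setlocale(locale.LC_NUMERIC)
locale.setlocale(locale.LC_NUMERIC, "C")    # decimal point is "." here

text = locale.str(1234.5)      # float -> locale-aware string
number = locale.atof(text)     # locale-aware string -> float
count = locale.atoi("42")      # locale-aware string -> integer

locale.setlocale(locale.LC_NUMERIC, saved)  # restore the old setting
print(text, number, count)
```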

\begin{datadesc}{LC_CTYPE}
\refstmodindex{string}
  Locale category for the character type functions.  Depending on the
  settings of this category, the functions of module
  \refmodule{string} dealing with case change their behaviour.
\end{datadesc}

\begin{datadesc}{LC_COLLATE}
  Locale category for sorting strings.  The functions
  \function{strcoll()} and \function{strxfrm()} of the
  \module{locale} module are affected.
\end{datadesc}

\begin{datadesc}{LC_TIME}
  Locale category for the formatting of time.  The function
  \function{time.strftime()} follows these conventions.
\end{datadesc}

\begin{datadesc}{LC_MONETARY}
  Locale category for formatting of monetary values.  The available
  options can be obtained from the \function{localeconv()} function.
\end{datadesc}

\begin{datadesc}{LC_MESSAGES}
  Locale category for message display.  Python currently does not
  support application-specific locale-aware messages.  Messages
  displayed by the operating system, like those returned by
  \function{os.strerror()}, might be affected by this category.
\end{datadesc}

\begin{datadesc}{LC_NUMERIC}
  Locale category for formatting numbers.  The functions
  \function{format()}, \function{atoi()}, \function{atof()} and
  \function{str()} of the \module{locale} module are affected by that
  category.  All other numeric formatting operations are not
  affected.
\end{datadesc}

\begin{datadesc}{LC_ALL}
  Combination of all locale settings.  If this flag is used when the
  locale is changed, setting the locale for all categories is
  attempted.  If that fails for any category, no category is changed at
  all.  When the locale is retrieved using this flag, a string
  indicating the setting for all categories is returned.  This string
  can be later used to restore the settings.
\end{datadesc}

\begin{datadesc}{CHAR_MAX}
  This is a symbolic constant used for different values returned by
  \function{localeconv()}.
\end{datadesc}

Example:

\begin{verbatim}
>>> import locale
>>> loc = locale.setlocale(locale.LC_ALL) # get current locale
>>> locale.setlocale(locale.LC_ALL, "de") # use German locale
>>> locale.strcoll("f\344n", "foo") # compare a string containing an umlaut
>>> locale.setlocale(locale.LC_ALL, "") # use user's preferred locale
>>> locale.setlocale(locale.LC_ALL, "C") # use default (C) locale
>>> locale.setlocale(locale.LC_ALL, loc) # restore saved locale
\end{verbatim}


\subsection{Background, details, hints, tips and caveats}

The C standard defines the locale as a program-wide property that may
be relatively expensive to change.  On top of that, some
implementations are broken in such a way that frequent locale changes
may cause core dumps.  This makes the locale somewhat painful to use
correctly.

Initially, when a program is started, the locale is the \samp{C} locale, no
matter what the user's preferred locale is.  The program must
explicitly say that it wants the user's preferred locale settings by
calling \code{setlocale(LC_ALL, "")}.

It is generally a bad idea to call \function{setlocale()} in some library
routine, since as a side effect it affects the entire program.  Saving
and restoring it is almost as bad: it is expensive and affects other
threads that happen to run before the settings have been restored.

If, when coding a module for general use, you need a
locale-independent version of an operation that is affected by the
locale (e.g. \function{string.lower()}, or certain formats used with
\function{time.strftime()}), you will have to find a way to do it
without using the standard library routine.  Even better is convincing
yourself that using locale settings is okay.  Only as a last resort
should you document that your module is not compatible with
non-\samp{C} locale settings.

The case conversion functions in the
\refmodule{string}\refstmodindex{string} and
\module{strop}\refbimodindex{strop} modules are affected by the locale
settings.  When a call to the \function{setlocale()} function changes
the \constant{LC_CTYPE} settings, the variables
\code{string.lowercase}, \code{string.uppercase} and
\code{string.letters} (and their counterparts in \module{strop}) are
recalculated.  Note that code that uses these variables through
`\keyword{from} ... \keyword{import} ...', e.g. \code{from string
import letters}, is not affected by subsequent \function{setlocale()}
calls.

The only way to perform numeric operations according to the locale
is to use the special functions defined by this module:
\function{atof()}, \function{atoi()}, \function{format()},
\function{str()}.

\subsection{For extension writers and programs that embed Python
            \label{embedding-locale}}

Extension modules should never call \function{setlocale()}, except to
find out what the current locale is.  But since the return value can
only be used portably to restore it, that is not very useful (except
perhaps to find out whether or not the locale is \samp{C}).

When Python is embedded in an application, if the application sets the
locale to something specific before initializing Python, that is
generally okay, and Python will use whatever locale is set,
\emph{except} that the \constant{LC_NUMERIC} locale should always be
\samp{C}.

The \function{setlocale()} function in the \module{locale} module
gives the Python programmer the impression that you can manipulate the
\constant{LC_NUMERIC} locale setting, but this is not the case at the C
level: C code will always find that the \constant{LC_NUMERIC} locale
setting is \samp{C}.  This is because too much would break when the
decimal point character is set to something other than a period
(e.g. the Python parser would break).  Caveat: threads that run
without holding Python's global interpreter lock may occasionally find
that the numeric locale setting differs; this is because the only
portable way to implement this feature is to set the numeric locale
settings to what the user requests, extract the relevant
characteristics, and then restore the \samp{C} numeric locale.

When Python code uses the \module{locale} module to change the locale,
this also affects the embedding application.  If the embedding
application doesn't want this to happen, it should remove the
\module{_locale} extension module (which does all the work) from the
table of built-in modules in the \file{config.c} file, and make sure
that the \module{_locale} module is not accessible as a shared library.