\section{Standard Module \sectcode{locale}}
\stmodindex{locale}

\label{module-locale}

The \code{locale} module opens access to the \POSIX{} locale database
and functionality. The \POSIX{} locale mechanism allows programs
to handle certain cultural conventions in an application, without
requiring the programmer to know all the specifics of each country
where the software is executed.

The \code{locale} module is implemented on top of the \code{_locale}
module, which in turn uses an ANSI \C{} locale implementation if
available.
\refbimodindex{_locale}

The \code{locale} module defines the following functions:

\setindexsubitem{(in module locale)}

\begin{funcdesc}{setlocale}{category\optional{\, value}}
If \var{value} is specified, modifies the locale setting for the
\var{category}. The available categories are listed in the data
description below. The value is the name of a locale. An empty string
specifies the user's default settings. If the modification of the
locale fails, the exception \code{locale.Error} is
raised. If successful, the new locale setting is returned.

If no \var{value} is specified, the current setting for the
\var{category} is returned.

\code{setlocale()} is not thread safe on most systems. Applications
typically start with a call to
\begin{verbatim}
import locale
locale.setlocale(locale.LC_ALL, "")
\end{verbatim}
This sets the locale for all categories to the user's default setting
(typically specified in the \code{LANG} environment variable). If the
locale is not changed thereafter, using multithreading should not
cause problems.
\end{funcdesc}
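The two calling forms of \code{setlocale()} described above can be
sketched as follows. This is only a minimal illustration written for a
present-day Python interpreter; the helper name
\code{use_default_locale} is hypothetical and not part of the module.

```python
import locale


def use_default_locale():
    """Try to switch every category to the user's default locale.

    Returns the name of the locale in effect for LC_ALL afterwards.
    (Hypothetical helper, for illustration only.)
    """
    try:
        # An empty string asks for the user's default settings.
        return locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The requested locale is not available; leave things alone and
        # just query the current setting (no second argument).
        return locale.setlocale(locale.LC_ALL)


current = use_default_locale()
```

Calling such a helper once at program start, before any threads are
created, is in line with the thread-safety remark above.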

\begin{funcdesc}{localeconv}{}
Returns the database of the local conventions as a dictionary. This
dictionary has the following strings as keys:
\begin{itemize}
\item \code{decimal_point} specifies the decimal point used in
floating point number representations for the \code{LC_NUMERIC}
category.
\item \code{grouping} is a sequence of numbers specifying at which
relative positions the \code{thousands_sep} is expected. If the
sequence is terminated with \code{locale.CHAR_MAX}, no further
grouping is performed. If the sequence terminates with a \code{0}, the last
group size is repeatedly used.
\item \code{thousands_sep} is the character used between groups.
\item \code{int_curr_symbol} specifies the international currency
symbol from the \code{LC_MONETARY} category.
\item \code{currency_symbol} is the local currency symbol.
\item \code{mon_decimal_point} is the decimal point used in monetary
values.
\item \code{mon_thousands_sep} is the separator for grouping of
monetary values.
\item \code{mon_grouping} has the same format as the \code{grouping}
key; it is used for monetary values.
\item \code{positive_sign} and \code{negative_sign} give the signs
used for positive and negative monetary quantities.
\item \code{int_frac_digits} and \code{frac_digits} specify the number
of fractional digits used in the international and local formatting
of monetary values.
\item \code{p_cs_precedes} and \code{n_cs_precedes} specify whether
the currency symbol precedes the value for positive or negative
values.
\item \code{p_sep_by_space} and \code{n_sep_by_space} specify
whether there is a space between the positive or negative value and
the currency symbol.
\item \code{p_sign_posn} and \code{n_sign_posn} indicate how the
sign should be placed for positive and negative monetary values.
\end{itemize}
The possible values for \code{p_sign_posn} and \code{n_sign_posn}
are given below.
\begin{itemize}
\item 0 - Currency and value are surrounded by parentheses.
\item 1 - The sign should precede the value and currency symbol.
\item 2 - The sign should follow the value and currency symbol.
\item 3 - The sign should immediately precede the value.
\item 4 - The sign should immediately follow the value.
\item \code{CHAR_MAX} - Nothing is specified in this locale.
\end{itemize}
\end{funcdesc}
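The \code{grouping} format described above can be illustrated with a
small sketch. The function \code{group_digits} is a hypothetical helper
written for this illustration (it is not provided by the module), and it
assumes the digits are passed as a plain string without sign or
fraction.

```python
import locale


def group_digits(digits, grouping, sep):
    """Insert `sep` into a string of digits according to `grouping`.

    `grouping` lists group sizes counted from the right. A 0 means
    "keep repeating the last size"; locale.CHAR_MAX means "stop".
    (Hypothetical helper, for illustration only.)
    """
    groups = []
    size = 0
    for entry in grouping:
        if entry == locale.CHAR_MAX:
            break                       # no further grouping
        if entry == 0:
            # Repeat the last group size until the digits run out.
            while size and len(digits) > size:
                groups.append(digits[-size:])
                digits = digits[:-size]
            break
        size = entry
        if len(digits) <= size:
            break                       # nothing left to split off
        groups.append(digits[-size:])
        digits = digits[:-size]
    groups.append(digits)               # the leading, ungrouped part
    return sep.join(reversed(groups))
```

With \code{grouping} set to \code{[3, 0]} and a comma as separator, the
digits \code{1234567} come out as \code{1,234,567}; with
\code{[3, locale.CHAR_MAX]} only the last three digits are split off. In
a real program the arguments would come from the \code{grouping} and
\code{thousands_sep} entries of \code{localeconv()}.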

\begin{funcdesc}{strcoll}{string1,string2}
Compares two strings according to the current \code{LC_COLLATE}
setting. Like any other comparison function, returns a negative
value, a positive value, or \code{0}, depending on whether \var{string1}
collates before or after \var{string2} or is equal to it.
\end{funcdesc}

\begin{funcdesc}{strxfrm}{string}
Transforms a string to one that can be used with the built-in function
\code{cmp()} while still yielding locale-aware results. This function can be
used when the same string is compared repeatedly, e.g. when collating
a sequence of strings.
\end{funcdesc}
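A short sketch of how \code{strcoll()} and \code{strxfrm()} might be
used for collating a sequence of strings. It is written for a
present-day Python interpreter, where sorting takes a \code{key}
function rather than \code{cmp()}, and it assumes the ``C'' locale so
that the result does not depend on the machine it runs on.

```python
import locale

# Assumption: the "C" locale is selected to keep the result predictable;
# a real program would normally use the user's own setting instead.
locale.setlocale(locale.LC_COLLATE, "C")

words = ["pear", "Apple", "fig"]

# Transform each string once; ordinary comparison of the transformed
# keys then gives the order defined by LC_COLLATE.
by_transform = sorted(words, key=locale.strxfrm)

# The sign of strcoll() gives the order of two individual strings.
order = locale.strcoll("fig", "pear")
```

Because every string is transformed only once, this tends to be cheaper
than calling \code{strcoll()} for each pair when the sequence is long.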

\begin{funcdesc}{format}{format,val\optional{\, grouping=0}}
Formats a number \var{val} according to the current \code{LC_NUMERIC}
setting. The format follows the conventions of the \code{\%} operator. For
floating point values, the decimal point is modified if
appropriate. If \var{grouping} is true, also takes the grouping into
account.
\end{funcdesc}

\begin{funcdesc}{str}{float}
Formats a floating point number using the same format as the built-in
function \code{str(\var{float})}, but takes the locale's decimal point into
account.
\end{funcdesc}

\begin{funcdesc}{atof}{string}
Converts a string to a floating point number, following the \code{LC_NUMERIC}
settings.
\end{funcdesc}

\begin{funcdesc}{atoi}{string}
Converts a string to an integer, following the \code{LC_NUMERIC} conventions.
\end{funcdesc}
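The numeric helpers above can be combined as in the following sketch.
It assumes a present-day Python interpreter, where the formatting
function is spelled \code{format_string()} instead of \code{format()},
and it selects the ``C'' locale so that the values are the same
everywhere.

```python
import locale

# Assumption: the "C" locale keeps the output predictable; under another
# LC_NUMERIC setting the decimal point in `text` may differ.
locale.setlocale(locale.LC_NUMERIC, "C")

value = 1234.5

text = locale.str(value)        # float -> string in locale notation
number = locale.atof(text)      # string in locale notation -> float
count = locale.atoi("42")       # string in locale notation -> int

# format_string() takes a % style format, the value, and a grouping flag.
fixed = locale.format_string("%.2f", value, grouping=False)
```

Converting with \code{str()} and back with \code{atof()} under the same
locale setting returns the original value in this example.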

\begin{datadesc}{LC_CTYPE}
\refstmodindex{string}
Locale category for the character type functions. Depending on the
settings of this category, the functions of module \code{string}
dealing with case change their behaviour.
\end{datadesc}

\begin{datadesc}{LC_COLLATE}
Locale category for sorting strings. The functions \code{strcoll()} and
\code{strxfrm()} of the \code{locale} module are affected.
\end{datadesc}

\begin{datadesc}{LC_TIME}
Locale category for the formatting of time. The function
\code{time.strftime()} follows these conventions.
\end{datadesc}

\begin{datadesc}{LC_MONETARY}
Locale category for formatting of monetary values. The available
options can be retrieved with the \code{localeconv()} function.
\end{datadesc}

\begin{datadesc}{LC_MESSAGES}
Locale category for message display. Python currently does not support
application-specific locale-aware messages. Messages displayed by the
operating system, like those returned by \code{posix.strerror()}, might
be affected by this category.
\end{datadesc}

\begin{datadesc}{LC_NUMERIC}
Locale category for formatting numbers. The functions
\code{format()}, \code{atoi()}, \code{atof()} and \code{str()} of the
\code{locale} module are affected by this category. All other numeric
formatting operations are not affected.
\end{datadesc}

\begin{datadesc}{LC_ALL}
Combination of all locale settings. If this flag is used when the
locale is changed, setting the locale for all categories is
attempted. If that fails for any category, no category is changed at
all. When the locale is retrieved using this flag, a string indicating
the setting for all categories is returned. This string can later be
used to restore the settings.
\end{datadesc}
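Retrieving the \code{LC_ALL} string and using it later to restore the
settings might look like the sketch below. It is only an illustration
for a present-day Python interpreter; the temporary switch to the ``C''
locale is an arbitrary choice made because that locale is always
available.

```python
import locale

# Query without a second argument: the result describes the setting of
# every category.
saved = locale.setlocale(locale.LC_ALL)

try:
    # Temporarily switch all categories at once.
    locale.setlocale(locale.LC_ALL, "C")
    inside = locale.setlocale(locale.LC_ALL)
finally:
    # Hand the saved string back to restore the earlier settings.
    locale.setlocale(locale.LC_ALL, saved)

restored = locale.setlocale(locale.LC_ALL)
```

Since the locale is a program-wide property, a switch like this is also
seen by other threads while it is in effect.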

\begin{datadesc}{CHAR_MAX}
This is a symbolic constant used for different values returned by
\code{localeconv()}.
\end{datadesc}

\begin{excdesc}{Error}
Exception raised when \code{setlocale()} fails.
\end{excdesc}

Example:

\begin{verbatim}
>>> import locale
>>> locale.setlocale(locale.LC_ALL,"de") #setting locale to German
>>> locale.strcoll("f\344n","foo") #comparing a string containing an umlaut
>>> locale.setlocale(locale.LC_ALL,"C") #returning to the default C locale
\end{verbatim}

\subsection{Background, details, hints, tips and caveats}

The C standard defines the locale as a program-wide property that may
be relatively expensive to change. On top of that, some
implementations are broken in such a way that frequent locale changes
may cause core dumps. This makes the locale somewhat painful to use
correctly.

Initially, when a program is started, the locale is the ``C'' locale, no
matter what the user's preferred locale is. The program must
explicitly say that it wants the user's preferred locale settings by
calling \code{setlocale(LC_ALL, "")}.

It is generally a bad idea to call \code{setlocale()} in some library
routine, since as a side effect it affects the entire program. Saving
and restoring it is almost as bad: it is expensive and affects other
threads that happen to run before the settings have been restored.

If, when coding a module for general use, you need a locale
independent version of an operation that is affected by the locale
(e.g. \code{string.lower()}, or certain formats used with
\code{time.strftime()}), you will have to find a way to do it without
using the standard library routine. Even better is convincing
yourself that using locale settings is okay. Only as a last resort should
you document that your module is not compatible with non-C locale
settings.

The case conversion functions in the \code{string} and \code{strop}
modules are affected by the locale settings. When a call to the
\code{setlocale()} function changes the \code{LC_CTYPE} settings, the
variables \code{string.lowercase}, \code{string.uppercase} and
\code{string.letters} (and their counterparts in \code{strop}) are
recalculated. Note that code that uses these variables through
\code{from ... import ...}, e.g. \code{from string import letters}, is
not affected by subsequent \code{setlocale()} calls.
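The \code{from ... import ...} caveat follows from how names are bound,
which the sketch below tries to show. The module object \code{demo} and
its \code{letters} variable are hypothetical stand-ins for
\code{string} and \code{string.letters}; the rebinding step imitates the
recalculation described above and does not involve the real modules.

```python
import types

# Hypothetical stand-in for a module whose variable gets recalculated.
demo = types.ModuleType("demo")
demo.letters = "abc"

# This is what `from demo import letters` does: it binds the value the
# module holds at that moment to a separate name.
letters = demo.letters

# Later the module recalculates (rebinds) its own variable ...
demo.letters = "abc\344\366\374"

# ... but the name bound earlier still refers to the old value, while
# attribute access through the module sees the new one.
stale = letters
fresh = demo.letters
```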

The only way to perform numeric operations according to the locale
is to use the special functions defined by this module:
\code{atof()}, \code{atoi()}, \code{format()}, \code{str()}.

\subsection{For extension writers and programs that embed Python}

Extension modules should never call \code{setlocale()}, except to find
out what the current locale is. But since the return value can only
be used portably to restore it, that is not very useful (except
perhaps to find out whether or not the locale is ``C'').

When Python is embedded in an application, if the application sets the
locale to something specific before initializing Python, that is
generally okay, and Python will use whatever locale is set,
\strong{except} that the \code{LC_NUMERIC} locale should always be
``C''.

The \code{setlocale()} function in the \code{locale} module
gives the Python programmer the impression that you can manipulate the
\code{LC_NUMERIC} locale setting, but this is not the case at the C
level: C code will always find that the \code{LC_NUMERIC} locale
setting is ``C''. This is because too much would break when the
decimal point character is set to something other than a period
(e.g. the Python parser would break). Caveat: threads that run
without holding Python's global interpreter lock may occasionally find
that the numeric locale setting differs; this is because the only
portable way to implement this feature is to set the numeric locale
settings to what the user requests, extract the relevant
characteristics, and then restore the ``C'' numeric locale.

When Python code uses the \code{locale} module to change the locale,
this also affects the embedding application. If the embedding
application doesn't want this to happen, it should remove the
\code{_locale} extension module (which does all the work) from the
table of built-in modules in the \code{config.c} file, and make sure
that the \code{_locale} module is not accessible as a shared library.
263that the \code{_locale} module is not accessible as a shared library.