\section{Standard Module \sectcode{locale}}
\stmodindex{locale}

\label{module-locale}

The \module{locale} module opens access to the \POSIX{} locale database
and functionality. The \POSIX{} locale mechanism allows programmers to
deal with certain cultural issues in an application, without
requiring them to know all the specifics of each country where the
software is executed.

The \module{locale} module is implemented on top of the
\module{_locale}\refbimodindex{_locale} module, which in turn uses an
ANSI \C{} locale implementation if available.

The \module{locale} module defines the following exception and
functions:

\begin{funcdesc}{setlocale}{category\optional{, value}}
If \var{value} is specified, modifies the locale setting for the
\var{category}. The available categories are listed in the data
description below. The value is the name of a locale. An empty string
specifies the user's default settings. If the modification of the
locale fails, the exception \exception{Error} is raised. If
successful, the new locale setting is returned.

If no \var{value} is specified, the current setting for the
\var{category} is returned.

\function{setlocale()} is not thread safe on most systems. Applications
typically start with a call to
\begin{verbatim}
import locale
locale.setlocale(locale.LC_ALL, "")
\end{verbatim}
This sets the locale for all categories to the user's default setting
(typically specified in the \code{LANG} environment variable). If the
locale is not changed thereafter, using multithreading should not
cause problems.
\end{funcdesc}

\begin{excdesc}{Error}
Exception raised when \function{setlocale()} fails.
\end{excdesc}
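
The following minimal sketch, which assumes a current Python 3
interpreter, tries several locale names in turn, catches
\exception{Error} for each name the system rejects, and falls back to
the ``C'' locale, which every conforming C library provides. The helper
name \function{set_first_available()} and the candidate names are
hypothetical; real locale names are system-dependent.
\begin{verbatim}
import locale

def set_first_available(category, candidates):
    """Hypothetical helper: set the first locale name in `candidates`
    that the system accepts and return the new setting, falling back
    to "C" if none is accepted."""
    for name in candidates:
        try:
            # On success, setlocale() returns the new setting.
            return locale.setlocale(category, name)
        except locale.Error:
            # Raised when the system does not support this locale name.
            continue
    # The C standard requires the "C" locale to exist.
    return locale.setlocale(category, "C")

# Omitting the value only queries the current setting; save it first.
saved = locale.setlocale(locale.LC_COLLATE)
print(set_first_available(locale.LC_COLLATE, ["xx_NOT_A_LOCALE", "de_DE.UTF-8"]))
locale.setlocale(locale.LC_COLLATE, saved)   # restore the saved setting
\end{verbatim}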

\begin{funcdesc}{localeconv}{}
Returns the database of the local conventions as a dictionary. This
dictionary has the following strings as keys:
\begin{itemize}
\item \code{decimal_point} specifies the decimal point used in
floating point number representations for the \code{LC_NUMERIC}
category.
\item \code{grouping} is a sequence of numbers giving the sizes of
successive digit groups, counted leftward from the decimal point,
between which \code{thousands_sep} is inserted. If the
sequence is terminated with \code{locale.CHAR_MAX}, no further
grouping is performed. If the sequence terminates with a \code{0}, the last
group size is used repeatedly for the remaining digits.
\item \code{thousands_sep} is the character used between groups.
\item \code{int_curr_symbol} specifies the international currency
symbol from the \code{LC_MONETARY} category.
\item \code{currency_symbol} is the local currency symbol.
\item \code{mon_decimal_point} is the decimal point used in monetary
values.
\item \code{mon_thousands_sep} is the separator for grouping of
monetary values.
\item \code{mon_grouping} has the same format as the \code{grouping}
key; it is used for monetary values.
\item \code{positive_sign} and \code{negative_sign} give the signs
used for positive and negative monetary quantities.
\item \code{int_frac_digits} and \code{frac_digits} specify the number
of fractional digits used in the international and local formatting
of monetary values.
\item \code{p_cs_precedes} and \code{n_cs_precedes} specify whether
the currency symbol precedes the value for positive or negative
values.
\item \code{p_sep_by_space} and \code{n_sep_by_space} specify
whether there is a space between the positive or negative value and
the currency symbol.
\item \code{p_sign_posn} and \code{n_sign_posn} indicate how the
sign should be placed for positive and negative monetary values.
\end{itemize}

The possible values for \code{p_sign_posn} and \code{n_sign_posn}
are given below.

\begin{tableii}{|c|l|}{code}{Value}{Explanation}
\lineii{0}{Currency and value are surrounded by parentheses.}
\lineii{1}{The sign should precede the value and currency symbol.}
\lineii{2}{The sign should follow the value and currency symbol.}
\lineii{3}{The sign should immediately precede the currency symbol.}
\lineii{4}{The sign should immediately follow the currency symbol.}
\lineii{CHAR_MAX}{Nothing is specified in this locale.}
\end{tableii}
\end{funcdesc}
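
The rules for \code{grouping} are easiest to follow in code. The sketch
below is a hypothetical helper, \function{group_digits()}, which is not
part of the \module{locale} module. It assumes Python 3, whose
\function{localeconv()} returns \code{grouping} as a list that includes
the terminating \code{0} or \constant{CHAR_MAX} (an empty list means no
grouping at all). Treating a list that has neither terminator as ``no
further grouping'' is an additional assumption.
\begin{verbatim}
import locale

def group_digits(digits, grouping, sep):
    """Hypothetical helper: insert `sep` into a string of digits according
    to a localeconv()-style `grouping` sequence, working from the right."""
    groups = []          # digit groups collected right to left
    rest = digits        # digits not yet assigned to a group
    last = None          # most recent group size, reused after a 0
    for size in grouping:
        if size == locale.CHAR_MAX:
            break                      # CHAR_MAX: no further grouping
        if size == 0:
            # 0: repeat the last group size for all remaining digits
            while last and len(rest) > last:
                groups.append(rest[-last:])
                rest = rest[:-last]
            break
        if len(rest) <= size:
            break                      # too few digits left for a full group
        groups.append(rest[-size:])
        rest = rest[:-size]
        last = size
    groups.append(rest)                # leftmost, possibly shorter, group
    return sep.join(reversed(groups))

print(group_digits("1234567", [3, 0], ","))                # 1,234,567
print(group_digits("123456789", [3, 2, 0], ","))           # 12,34,56,789
print(group_digits("1234567", [3, locale.CHAR_MAX], ","))  # 1234,567

# With the current locale's own conventions (no grouping in "C"):
conv = locale.localeconv()
print(group_digits("1234567", conv["grouping"], conv["thousands_sep"]))
\end{verbatim}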

\begin{funcdesc}{strcoll}{string1, string2}
Compares two strings according to the current \constant{LC_COLLATE}
setting. As with any other comparison function, returns a negative
value, a positive value, or \code{0}, depending on whether
\var{string1} collates before or after \var{string2} or is equal to it.
\end{funcdesc}
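
Because \function{strcoll()} follows the sign convention of a three-way
comparison, it can drive a sort directly. The sketch below assumes
Python 3, where the built-in \function{cmp()} no longer exists and
\function{functools.cmp_to_key()} turns such a comparison function into a
sort key. It pins \constant{LC_COLLATE} to ``C'' so the result is
predictable: in that locale collation follows character codes, so
upper-case letters sort before lower-case ones.
\begin{verbatim}
import functools
import locale

# Pin LC_COLLATE to "C" so the order is predictable; in the "C" locale,
# strcoll() compares character codes like an ordinary string comparison.
saved = locale.setlocale(locale.LC_COLLATE)
locale.setlocale(locale.LC_COLLATE, "C")

def describe(a, b):
    """Spell out what the sign of strcoll(a, b) means."""
    result = locale.strcoll(a, b)
    if result < 0:
        return "%r collates before %r" % (a, b)
    if result > 0:
        return "%r collates after %r" % (a, b)
    return "%r and %r collate equal" % (a, b)

print(describe("apple", "banana"))    # 'apple' collates before 'banana'

words = ["pear", "Banana", "apple"]
print(sorted(words, key=functools.cmp_to_key(locale.strcoll)))
# ['Banana', 'apple', 'pear']

locale.setlocale(locale.LC_COLLATE, saved)   # restore the previous setting
\end{verbatim}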

\begin{funcdesc}{strxfrm}{string}
Transforms a string into one that can be used with the built-in function
\function{cmp()}\bifuncindex{cmp} and still produces locale-aware
results: comparing two transformed strings gives the same result as
calling \function{strcoll()} on the originals. This function can be
used when the same string is compared repeatedly, e.g. when collating
a sequence of strings.
\end{funcdesc}
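
In Python 3, where \function{cmp()} is gone, the natural use of
\function{strxfrm()} is as a \code{key} function: each string is
transformed once, and the sort then compares the cached results with
ordinary string comparison instead of calling \function{strcoll()} for
every pair. A minimal sketch under that assumption:
\begin{verbatim}
import functools
import locale

words = ["pear", "Banana", "apple", "banana"]

# strxfrm() runs once per word; the sort compares the transformed keys.
by_key = sorted(words, key=locale.strxfrm)

# strcoll() performs the locale-aware comparison again for every pair.
by_cmp = sorted(words, key=functools.cmp_to_key(locale.strcoll))

print(by_key)
# The C standard requires comparisons of strxfrm() results to agree
# with strcoll(), so both orderings are expected to match.
print(by_key == by_cmp)
\end{verbatim}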

\begin{funcdesc}{format}{format, val\optional{, grouping\code{ = 0}}}
Formats a number \var{val} according to the current
\constant{LC_NUMERIC} setting. The format follows the conventions of
the \code{\%} operator. For floating point values, the decimal point
is modified if appropriate. If \var{grouping} is true, the
\code{thousands_sep} separator is also inserted as described by the
\code{grouping} entry of \function{localeconv()}.
\end{funcdesc}
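
On current Python 3 versions \function{format()} is no longer available
(it was deprecated in 3.7 and removed in 3.12); \function{format_string()},
which takes the same \code{\%}-style format and \var{grouping} flag, is
its replacement. The sketch below assumes that function and pins
\constant{LC_NUMERIC} to ``C'' so the output is predictable: in that
locale the decimal point is a period and no grouping is performed.
\begin{verbatim}
import locale

saved = locale.setlocale(locale.LC_NUMERIC)
locale.setlocale(locale.LC_NUMERIC, "C")   # "." and no grouping

# Same %-style conversions as the % operator; grouping=True applies the
# locale's thousands separator (a no-op in the "C" locale).
print(locale.format_string("%.2f", 1234567.891, grouping=True))  # 1234567.89
print(locale.format_string("%d items", 1500, grouping=True))     # 1500 items

locale.setlocale(locale.LC_NUMERIC, saved)  # restore the previous setting
\end{verbatim}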

\begin{funcdesc}{str}{float}
Formats a floating point number using the same format as the built-in
function \code{str(\var{float})}, but takes the decimal point into
account.
\end{funcdesc}

\begin{funcdesc}{atof}{string}
Converts a string to a floating point number, following the
\constant{LC_NUMERIC} settings.
\end{funcdesc}

\begin{funcdesc}{atoi}{string}
Converts a string to an integer, following the \constant{LC_NUMERIC}
conventions.
\end{funcdesc}
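
A round trip through \function{str()} and \function{atof()}, together
with \function{atoi()}, can be sketched as follows, assuming Python 3.
Pinning \constant{LC_NUMERIC} to ``C'' keeps the output predictable;
under another locale, the decimal point in both the produced and the
accepted strings would follow that locale instead.
\begin{verbatim}
import locale

saved = locale.setlocale(locale.LC_NUMERIC)
locale.setlocale(locale.LC_NUMERIC, "C")

text = locale.str(3.25)       # formatted with the locale's decimal point
value = locale.atof(text)     # parsed back using the same conventions
count = locale.atoi("42")     # integer conversion following LC_NUMERIC
print(text, value, count)     # 3.25 3.25 42

locale.setlocale(locale.LC_NUMERIC, saved)  # restore the previous setting
\end{verbatim}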

\begin{datadesc}{LC_CTYPE}
\refstmodindex{string}
Locale category for the character type functions. Depending on the
settings of this category, the functions of module \module{string}
dealing with case change their behaviour.
\end{datadesc}

\begin{datadesc}{LC_COLLATE}
Locale category for sorting strings. The functions
\function{strcoll()} and \function{strxfrm()} of the \module{locale}
module are affected.
\end{datadesc}

\begin{datadesc}{LC_TIME}
Locale category for the formatting of time. The function
\function{time.strftime()} follows these conventions.
\end{datadesc}
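
The effect of \constant{LC_TIME} can be observed with
\function{time.strftime()}. The sketch assumes Python 3 and pins the
category to ``C'', whose weekday and month names are the English ones;
under another locale, the same format would produce that locale's names.
\begin{verbatim}
import locale
import time

saved = locale.setlocale(locale.LC_TIME)
locale.setlocale(locale.LC_TIME, "C")

# A fixed date, 1998-03-10 (a Tuesday), as a struct_time.
t = time.strptime("1998-03-10", "%Y-%m-%d")
print(time.strftime("%A, %d %B %Y", t))   # Tuesday, 10 March 1998

locale.setlocale(locale.LC_TIME, saved)   # restore the previous setting
\end{verbatim}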

\begin{datadesc}{LC_MONETARY}
Locale category for formatting of monetary values. The available
options can be obtained from the \function{localeconv()} function.
\end{datadesc}

\begin{datadesc}{LC_MESSAGES}
Locale category for message display. Python currently does not support
application-specific locale-aware messages. Messages displayed by the
operating system, like those returned by \function{os.strerror()},
might be affected by this category.
\end{datadesc}

\begin{datadesc}{LC_NUMERIC}
Locale category for formatting numbers. The functions
\function{format()}, \function{atoi()}, \function{atof()} and
\function{str()} of the \module{locale} module are affected by that
category. No other numeric formatting operations are affected.
\end{datadesc}

\begin{datadesc}{LC_ALL}
Combination of all locale settings. If this flag is used when the
locale is changed, setting the locale for all categories is
attempted. If that fails for any category, no category is changed at
all. When the locale is retrieved using this flag, a string indicating
the setting for all categories is returned. This string can later be
used to restore the settings.
\end{datadesc}
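
Because the string returned for \constant{LC_ALL} describes every
category, it can be saved and later passed back to
\function{setlocale()}. The sketch below wraps that pattern in a
hypothetical context manager, \function{temporary_locale()}, assuming
Python 3. Like any call to \function{setlocale()}, it changes
process-wide state and is not thread safe.
\begin{verbatim}
import contextlib
import locale

@contextlib.contextmanager
def temporary_locale(name):
    """Hypothetical helper: switch every category to `name`, then restore.
    Not thread safe, because setlocale() changes process-wide state."""
    saved = locale.setlocale(locale.LC_ALL)   # string describing all categories
    try:
        yield locale.setlocale(locale.LC_ALL, name)
    finally:
        locale.setlocale(locale.LC_ALL, saved)  # restore the saved settings

with temporary_locale("C") as current:
    print(current)                               # C
    print(locale.localeconv()["decimal_point"])  # .
\end{verbatim}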

\begin{datadesc}{CHAR_MAX}
This is a symbolic constant used in several of the values returned by
\function{localeconv()}, such as the \code{grouping} sequences and the
\code{p_sign_posn} and \code{n_sign_posn} entries.
\end{datadesc}

Example:

\begin{verbatim}
>>> import locale
>>> loc = locale.setlocale(locale.LC_ALL) # get current locale
>>> locale.setlocale(locale.LC_ALL, "de") # use German locale
>>> locale.strcoll("f\344n", "foo") # compare a string containing an umlaut
>>> locale.setlocale(locale.LC_ALL, "") # use user's preferred locale
>>> locale.setlocale(locale.LC_ALL, "C") # use default (C) locale
>>> locale.setlocale(locale.LC_ALL, loc) # restore saved locale
\end{verbatim}

\subsection{Background, details, hints, tips and caveats}

The C standard defines the locale as a program-wide property that may
be relatively expensive to change. On top of that, some
implementations are broken in such a way that frequent locale changes
may cause core dumps. This makes the locale somewhat painful to use
correctly.

Initially, when a program is started, the locale is the ``C'' locale, no
matter what the user's preferred locale is. The program must
explicitly say that it wants the user's preferred locale settings by
calling \code{setlocale(LC_ALL, "")}.

It is generally a bad idea to call \function{setlocale()} in some library
routine, since as a side effect it affects the entire program. Saving
and restoring it is almost as bad: it is expensive and affects other
threads that happen to run before the settings have been restored.

If, when coding a module for general use, you need a locale-independent
version of an operation that is affected by the locale
(e.g. \function{string.lower()}, or certain formats used with
\function{time.strftime()}), you will have to find a way to do it
without using the standard library routine. Even better is convincing
yourself that using locale settings is okay. Only as a last resort
should you document that your module is not compatible with non-C
locale settings.
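
As an illustration of such a workaround, a module that needs English
weekday and month abbreviations regardless of the user's locale (for
instance, to produce protocol headers) could carry its own name tables
instead of relying on the \code{\%a} and \code{\%b} formats of
\function{time.strftime()}. The function name \function{http_style_date()}
and the output format are hypothetical choices for this sketch, which
assumes Python 3.
\begin{verbatim}
import time

# Fixed English names, independent of the LC_TIME setting.
_DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")  # tm_wday 0 = Monday
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")       # tm_mon 1 = January

def http_style_date(t):
    """Hypothetical helper: format a UTC struct_time such as
    'Tue, 10 Mar 1998 04:23:12 GMT' without consulting the locale."""
    return "%s, %02d %s %04d %02d:%02d:%02d GMT" % (
        _DAYS[t.tm_wday], t.tm_mday, _MONTHS[t.tm_mon - 1], t.tm_year,
        t.tm_hour, t.tm_min, t.tm_sec)

print(http_style_date(time.gmtime(0)))   # Thu, 01 Jan 1970 00:00:00 GMT
\end{verbatim}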

The case conversion functions in the
\module{string}\refstmodindex{string} and
\module{strop}\refbimodindex{strop} modules are affected by the locale
settings. When a call to the \function{setlocale()} function changes
the \constant{LC_CTYPE} settings, the variables
\code{string.lowercase}, \code{string.uppercase} and
\code{string.letters} (and their counterparts in \module{strop}) are
recalculated. Note that code that uses these variables through
`\keyword{from} ... \keyword{import} ...', e.g. \code{from string
import letters}, does not see the recalculated values after subsequent
\function{setlocale()} calls, because such an import binds the value
that was current at import time.

The only way to perform numeric operations according to the locale
is to use the special functions defined by this module:
\function{atof()}, \function{atoi()}, \function{format()},
\function{str()}.
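
To see the difference, compare the built-in \function{float()}, which
always expects a period, with \function{atof()}, which follows
\constant{LC_NUMERIC}. The sketch assumes Python 3; the helper name
\function{parse_both()} is hypothetical, and the German locale name
\code{de_DE.UTF-8} is system-dependent and may not be installed, in
which case the sketch only shows the ``C'' locale behaviour.
\begin{verbatim}
import locale

def parse_both(text):
    """Hypothetical helper: parse `text` with float() and locale.atof()."""
    try:
        builtin = float(text)          # ignores the locale entirely
    except ValueError:
        builtin = None
    return builtin, locale.atof(text)  # follows LC_NUMERIC

saved = locale.setlocale(locale.LC_NUMERIC)
try:
    locale.setlocale(locale.LC_NUMERIC, "de_DE.UTF-8")
    # float() rejects "3,5"; atof() reads it using the German decimal comma.
    print(parse_both("3,5"))
except locale.Error:
    print("de_DE.UTF-8 not available; in the C locale:", parse_both("3.5"))
finally:
    locale.setlocale(locale.LC_NUMERIC, saved)  # restore the previous setting
\end{verbatim}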

\subsection{For extension writers and programs that embed Python}

Extension modules should never call \function{setlocale()}, except to
find out what the current locale is. But since the return value can
only be used portably to restore it, that is not very useful (except
perhaps to find out whether or not the locale is ``C'').

When Python is embedded in an application, if the application sets the
locale to something specific before initializing Python, that is
generally okay, and Python will use whatever locale is set,
\strong{except} that the \constant{LC_NUMERIC} locale should always be
``C''.

The \function{setlocale()} function in the \module{locale} module
gives the Python programmer the impression that the
\constant{LC_NUMERIC} locale setting can be manipulated, but this is not
the case at the C level: C code will always find that the
\constant{LC_NUMERIC} locale setting is ``C''. This is because too much
would break when the decimal point character is set to something other
than a period (e.g. the Python parser would break). Caveat: threads
that run without holding Python's global interpreter lock may
occasionally find that the numeric locale setting differs; this is
because the only portable way to implement this feature is to set the
numeric locale settings to what the user requests, extract the relevant
characteristics, and then restore the ``C'' numeric locale.

When Python code uses the \module{locale} module to change the locale,
this also affects the embedding application. If the embedding
application doesn't want this to happen, it should remove the
\module{_locale} extension module (which does all the work) from the
table of built-in modules in the \file{config.c} file, and make sure
that the \module{_locale} module is not accessible as a shared library.