\section{Standard Module \sectcode{locale}}
\stmodindex{locale}

\label{module-locale}

The \code{locale} module opens access to the \POSIX{} locale database
and functionality. The \POSIX{} locale mechanism allows an application
to take certain cultural conventions into account, without
requiring the programmer to know all the specifics of each country
where the software is executed.

The \code{locale} module is implemented on top of the \code{_locale}
module, which in turn uses an ANSI \C{} locale implementation if
available.
\refbimodindex{_locale}

The \code{locale} module defines the following functions:

\setindexsubitem{(in module locale)}

\begin{funcdesc}{setlocale}{category\optional{\, value}}
If \var{value} is specified, modifies the locale setting for the
\var{category}. The available categories are listed in the data
description below. The value is the name of a locale. An empty string
specifies the user's default settings. If the modification of the
locale fails, the exception \code{locale.Error} is
raised. If successful, the new locale setting is returned.

If no \var{value} is specified, the current setting for the
\var{category} is returned.

\code{setlocale()} is not thread safe on most systems. Applications
typically start with a call of
\begin{verbatim}
import locale
locale.setlocale(locale.LC_ALL,"")
\end{verbatim}
This sets the locale for all categories to the user's default setting
(typically specified in the \code{LANG} environment variable). If the
locale is not changed thereafter, using multithreading should not
cause problems.
\end{funcdesc}
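
The calling pattern described for \code{setlocale()} can be sketched as
follows. This is only an illustration, written on the assumption of a
current Python 3 interpreter; the helper name
\code{use_default_locale()} is hypothetical and not part of the module.

```python
import locale

def use_default_locale():
    """Switch every category to the user's default locale.

    If the environment names a locale the C library does not know,
    the locale is left untouched and the current setting is returned.
    """
    try:
        # An empty string selects the user's default settings.
        return locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The modification failed; report the setting still in effect.
        return locale.setlocale(locale.LC_ALL)

current = use_default_locale()

# Without a value argument, setlocale() only queries the category.
numeric = locale.setlocale(locale.LC_NUMERIC)
```

Catching \code{locale.Error} is a design choice of this sketch; an
application may prefer to let the exception propagate.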

\begin{funcdesc}{localeconv}{}
Returns the database of the local conventions as a dictionary. This
dictionary has the following strings as keys:
\begin{itemize}
\item \code{decimal_point} specifies the decimal point used in
floating point number representations for the \code{LC_NUMERIC}
category.
\item \code{grouping} is a sequence of numbers specifying at which
relative positions the \code{thousands_sep} is expected. If the
sequence is terminated with \code{locale.CHAR_MAX}, no further
grouping is performed. If the sequence terminates with a \code{0}, the last
group size is repeatedly used.
\item \code{thousands_sep} is the character used between groups.
\item \code{int_curr_symbol} specifies the international currency
symbol from the \code{LC_MONETARY} category.
\item \code{currency_symbol} is the local currency symbol.
\item \code{mon_decimal_point} is the decimal point used in monetary
values.
\item \code{mon_thousands_sep} is the separator for grouping of
monetary values.
\item \code{mon_grouping} has the same format as the \code{grouping}
key; it is used for monetary values.
\item \code{positive_sign} and \code{negative_sign} give the signs
used for positive and negative monetary quantities.
\item \code{int_frac_digits} and \code{frac_digits} specify the number
of fractional digits used in the international and local formatting
of monetary values.
\item \code{p_cs_precedes} and \code{n_cs_precedes} specify whether
the currency symbol precedes the value for positive or negative
values.
\item \code{p_sep_by_space} and \code{n_sep_by_space} specify
whether there is a space between the positive or negative value and
the currency symbol.
\item \code{p_sign_posn} and \code{n_sign_posn} indicate how the
sign should be placed for positive and negative monetary values.
\end{itemize}
The possible values for \code{p_sign_posn} and \code{n_sign_posn}
are given below.
\begin{itemize}
\item 0 - Currency and value are surrounded by parentheses.
\item 1 - The sign should precede the value and currency symbol.
\item 2 - The sign should follow the value and currency symbol.
\item 3 - The sign should immediately precede the value.
\item 4 - The sign should immediately follow the value.
\item \code{CHAR_MAX} - Nothing is specified in this locale.
\end{itemize}
\end{funcdesc}
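
The rules for the \code{grouping} key can be illustrated with a small
pure-Python function. This is a sketch rather than the module's own
implementation: the name \code{group_digits()} is hypothetical, and it
assumes that a list with no terminating \code{0} or \code{CHAR_MAX}
simply stops grouping after its last entry.

```python
import locale

def group_digits(digits, grouping, separator):
    """Insert separator into a string of digits according to grouping.

    Each entry of grouping is the size of the next group, counted from
    the right. locale.CHAR_MAX stops grouping; a 0 repeats the last size.
    """
    groups = []     # finished groups, rightmost group first
    size = None     # most recent group size seen
    for entry in grouping:
        if entry == locale.CHAR_MAX:
            break   # no further grouping is performed
        if entry == 0:
            # Repeat the last group size for all remaining digits.
            while size and len(digits) > size:
                groups.append(digits[-size:])
                digits = digits[:-size]
            break
        size = entry
        if len(digits) <= size:
            break   # too few digits left to split off another group
        groups.append(digits[-size:])
        digits = digits[:-size]
    groups.append(digits)   # whatever is left forms the leftmost group
    return separator.join(reversed(groups))
```

For instance, \code{group_digits("1234567", [3, 0], ",")} yields
\code{"1,234,567"}, while a grouping of \code{[3, locale.CHAR_MAX]}
yields \code{"1234,567"}.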

\begin{funcdesc}{strcoll}{string1,string2}
Compares two strings according to the current \code{LC_COLLATE}
setting. Like any other compare function, returns a negative value, a
positive value, or \code{0}, depending on whether \var{string1}
collates before \var{string2}, after it, or is equal to it.
\end{funcdesc}

\begin{funcdesc}{strxfrm}{string}
Transforms a string to one that can be used for the builtin function
\code{cmp()} and still return locale-aware results. This function can be
used when the same string is compared repeatedly, e.g. when collating
a sequence of strings.
\end{funcdesc}
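
The difference between \code{strcoll()} and \code{strxfrm()} shows up
when sorting. The sketch below assumes a current Python 3 interpreter,
where sort keys take the place of \code{cmp()}, and it selects the
portable ``C'' locale only to keep the result predictable; the word list
is made up for the illustration.

```python
import functools
import locale

# Assumption: use the portable "C" locale so the ordering is predictable.
# An application would normally rely on setlocale(LC_ALL, "") instead.
locale.setlocale(locale.LC_COLLATE, "C")

words = ["pear", "Apple", "fig", "apple"]

# strcoll() is a comparison function, so it is wrapped to act as a sort key.
# It is called again for every pair of strings the sort compares.
by_strcoll = sorted(words, key=functools.cmp_to_key(locale.strcoll))

# strxfrm() transforms each string once; the transformed strings are then
# compared with the ordinary string comparison.
by_strxfrm = sorted(words, key=locale.strxfrm)
```

Both calls are expected to produce the same order; the
\code{strxfrm()} variant does the locale-dependent work only once per
string.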

\begin{funcdesc}{format}{format,val\optional{\, grouping=0}}
Formats a number \var{val} according to the current \code{LC_NUMERIC}
setting. The format follows the conventions of the \code{\%} operator. For
floating point values, the decimal point is modified if
appropriate. If \var{grouping} is true, the grouping is also taken into
account.
\end{funcdesc}

\begin{funcdesc}{str}{float}
Formats a floating point number using the same format as the built-in
function \code{str(\var{float})}, but takes the decimal point into
account.
\end{funcdesc}

\begin{funcdesc}{atof}{string}
Converts a string to a floating point number, following the \code{LC_NUMERIC}
settings.
\end{funcdesc}

\begin{funcdesc}{atoi}{string}
Converts a string to an integer, following the \code{LC_NUMERIC} conventions.
\end{funcdesc}
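
The numeric functions can be combined into a round trip from number to
text and back. The sketch assumes a current Python 3 interpreter, where
the formatting function is spelled \code{format_string()} rather than
\code{format()}, and it selects the ``C'' locale only so that the
result does not depend on the machine it runs on.

```python
import locale

# Assumption: the "C" locale keeps the output predictable.
locale.setlocale(locale.LC_NUMERIC, "C")

# Format a float with the % conventions; grouping is applied when the
# locale defines any (the "C" locale defines none).
text = locale.format_string("%.2f", 1234.5, grouping=True)

# Parse the text back using the same LC_NUMERIC conventions.
value = locale.atof(text)

count = locale.atoi("42")     # integer parsing
shown = locale.str(0.5)       # like str(), but with the locale's decimal point
```

Under a locale with a different decimal point, \code{text} and
\code{shown} would change accordingly, while \code{atof()} would still
recover the original number from \code{text}.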

\begin{datadesc}{LC_CTYPE}
\refstmodindex{string}
Locale category for the character type functions. Depending on the
settings of this category, the functions of module \code{string}
dealing with case change their behaviour.
\end{datadesc}

\begin{datadesc}{LC_COLLATE}
Locale category for sorting strings. The functions \code{strcoll()} and
\code{strxfrm()} of the \code{locale} module are affected.
\end{datadesc}

\begin{datadesc}{LC_TIME}
Locale category for the formatting of time. The function
\code{time.strftime()} follows these conventions.
\end{datadesc}

\begin{datadesc}{LC_MONETARY}
Locale category for formatting of monetary values. The available
options can be obtained from the \code{localeconv()} function.
\end{datadesc}

\begin{datadesc}{LC_MESSAGES}
Locale category for message display. Python currently does not support
application-specific locale-aware messages. Messages displayed by the
operating system, like those returned by \code{posix.strerror()}, might
be affected by this category.
\end{datadesc}

\begin{datadesc}{LC_NUMERIC}
Locale category for formatting numbers. The functions
\code{format()}, \code{atoi()}, \code{atof()} and \code{str()} of the
\code{locale} module are affected by that category. No other numeric
formatting operations are affected.
\end{datadesc}

\begin{datadesc}{LC_ALL}
Combination of all locale settings. If this flag is used when the
locale is changed, setting the locale for all categories is
attempted. If that fails for any category, no category is changed at
all. When the locale is retrieved using this flag, a string indicating
the setting for all categories is returned. This string can be later
used to restore the settings.
\end{datadesc}
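
Retrieving the \code{LC_ALL} setting and using it later to restore the
locale might look like the sketch below. The context manager
\code{temporary_locale()} is a hypothetical helper, not part of the
module, and a current Python 3 interpreter is assumed.

```python
import contextlib
import locale

@contextlib.contextmanager
def temporary_locale(category, value):
    """Set a locale for the duration of a with block, then restore it."""
    saved = locale.setlocale(category)          # query only, no change
    try:
        yield locale.setlocale(category, value)
    finally:
        locale.setlocale(category, saved)       # restore the saved setting

before = locale.setlocale(locale.LC_ALL)
with temporary_locale(locale.LC_ALL, "C") as active:
    inside = locale.setlocale(locale.LC_ALL)
after = locale.setlocale(locale.LC_ALL)
```

The change is still program-wide while the block runs, so the sketch is
only suitable for code that controls the whole program and does not run
other threads at the same time.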

\begin{datadesc}{CHAR_MAX}
This is a symbolic constant used for different values returned by
\code{localeconv()}.
\end{datadesc}

\begin{excdesc}{Error}
Exception raised when \code{setlocale()} fails.
\end{excdesc}

Example:

\begin{verbatim}
>>> import locale
>>> locale.setlocale(locale.LC_ALL,"de") #setting locale to German
>>> locale.strcoll("f\344n","foo") #comparing a string containing an umlaut
>>> locale.setlocale(locale.LC_ALL,"C") #restoring the default "C" locale
\end{verbatim}

\subsection{Background, details, hints, tips and caveats}

The C standard defines the locale as a program-wide property that may
be relatively expensive to change. On top of that, some
implementations are broken in such a way that frequent locale changes
may cause core dumps. This makes the locale somewhat painful to use
correctly.

Initially, when a program is started, the locale is the ``C'' locale, no
matter what the user's preferred locale is. The program must
explicitly say that it wants the user's preferred locale settings by
calling \code{setlocale(LC_ALL, "")}.

It is generally a bad idea to call \code{setlocale()} in some library
routine, since as a side effect it affects the entire program. Saving
and restoring it is almost as bad: it is expensive and affects other
threads that happen to run before the settings have been restored.

If, when coding a module for general use, you need a
locale-independent version of an operation that is affected by the locale
(e.g. \code{string.lower()}, or certain formats used with
\code{time.strftime()}), you will have to find a way to do it without
using the standard library routine. Even better is convincing
yourself that using locale settings is okay. Only as a last resort should
you document that your module is not compatible with non-C locale
settings.
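
One way to obtain a locale-independent case conversion is to build the
mapping from fixed ASCII constants. The sketch assumes a current
Python 3 interpreter; the function name \code{ascii_lower()} is
hypothetical.

```python
import string

# Translation table built from fixed ASCII constants, never from the locale.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

def ascii_lower(text):
    """Lower-case only the letters A-Z, whatever the current locale is."""
    return text.translate(_ASCII_LOWER)
```

Characters outside the range A-Z pass through unchanged, which is
usually what is wanted for protocol keywords or header names.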

The case conversion functions in the \code{string} and \code{strop}
modules are affected by the locale settings. When a call to the
\code{setlocale()} function changes the \code{LC_CTYPE} settings, the
variables \code{string.lowercase}, \code{string.uppercase} and
\code{string.letters} (and their counterparts in \code{strop}) are
recalculated. Note that code that uses these variables through
\code{from ... import ...}, e.g. \code{from string import letters}, is
not affected by subsequent \code{setlocale()} calls.
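
The reason is that \code{from ... import ...} copies the value bound at
import time. The sketch below shows the mechanism with a hypothetical
stand-in module, \code{fake_string}, since it assumes a current Python 3
interpreter in which the variables named above are not available.

```python
import types

# Hypothetical stand-in for a module whose attribute gets recalculated.
fake_string = types.ModuleType("fake_string")
fake_string.letters = "abc"

# Has the same effect as: from fake_string import letters
letters = fake_string.letters

# The module later rebinds its attribute, as a locale change would.
fake_string.letters = "abc\u00e4\u00f6\u00fc"

stale = letters                # still the value copied at import time
fresh = fake_string.letters    # attribute lookup sees the new value
```

Code that needs the recalculated value should therefore look the
attribute up through the module each time.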

The only way to perform numeric operations according to the locale
is to use the special functions defined by this module:
\code{atof()}, \code{atoi()}, \code{format()}, \code{str()}.

\strong{For extension writers and programs that embed Python}

Extension modules should never call \code{setlocale()}, except to find
out what the current locale is. But since the return value can only
be used portably to restore it, that is not very useful (except
perhaps to find out whether or not the locale is ``C'').

When Python is embedded in an application, if the application sets the
locale to something specific before initializing Python, that is
generally okay, and Python will use whatever locale is set,
\strong{except} that the \code{LC_NUMERIC} locale should always be
``C''.

The \code{setlocale()} function in the \code{locale} module
gives the Python programmer the impression that you can manipulate the
\code{LC_NUMERIC} locale setting, but this is not the case at the C
level: C code will always find that the \code{LC_NUMERIC} locale
setting is ``C''. This is because too much would break when the
decimal point character is set to something other than a period
(e.g. the Python parser would break). Caveat: threads that run
without holding Python's global interpreter lock may occasionally find
that the numeric locale setting differs; this is because the only
portable way to implement this feature is to set the numeric locale
settings to what the user requests, extract the relevant
characteristics, and then restore the ``C'' numeric locale.
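
The set, extract and restore sequence can be mimicked at the Python
level. This is only an illustration of the idea, not the C code used by
the interpreter; \code{numeric_conventions()} is a hypothetical name and
a current Python 3 interpreter is assumed.

```python
import locale

def numeric_conventions(name):
    """Return (decimal point, thousands separator) of the named locale."""
    saved = locale.setlocale(locale.LC_NUMERIC)      # remember the setting
    try:
        locale.setlocale(locale.LC_NUMERIC, name)    # what the user requests
        conv = locale.localeconv()                   # extract characteristics
        return conv["decimal_point"], conv["thousands_sep"]
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)   # restore the setting

point, separator = numeric_conventions("C")
```

Between the two \code{setlocale()} calls the numeric locale is visibly
different to any other thread, which is exactly the window the caveat
above refers to.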

When Python code uses the \code{locale} module to change the locale,
this also affects the embedding application. If the embedding
application doesn't want this to happen, it should remove the
\code{_locale} extension module (which does all the work) from the
table of built-in modules in the \code{config.c} file, and make sure
that the \code{_locale} module is not accessible as a shared library.