| \section{\module{locale} --- |
| Internationalization services} |
| |
| \declaremodule{standard}{locale} |
| \modulesynopsis{Internationalization services.} |
| \moduleauthor{Martin von Loewis}{loewis@informatik.hu-berlin.de} |
| \sectionauthor{Martin von Loewis}{loewis@informatik.hu-berlin.de} |
| |
| |
| The \module{locale} module opens access to the \POSIX{} locale database |
| and functionality. The \POSIX{} locale mechanism allows programmers |
| to deal with certain cultural issues in an application, without |
| requiring the programmer to know all the specifics of each country |
| where the software is executed. |
| |
| The \module{locale} module is implemented on top of the |
| \module{_locale}\refbimodindex{_locale} module, which in turn uses an |
| ANSI C locale implementation if available. |
| |
| The \module{locale} module defines the following exception and |
| functions: |
| |
| |
| \begin{funcdesc}{setlocale}{category\optional{, value}} |
| If \var{value} is specified, modifies the locale setting for the |
| \var{category}. The available categories are listed in the data |
| description below. The value is the name of a locale. An empty string |
| specifies the user's default settings. If the modification of the |
| locale fails, the exception \exception{Error} is |
| raised. If successful, the new locale setting is returned. |
| |
| If no \var{value} is specified, the current setting for the |
| \var{category} is returned. |
| |
| \function{setlocale()} is not thread safe on most systems. Applications |
| typically start with a call of |
| \begin{verbatim} |
| import locale |
| locale.setlocale(locale.LC_ALL,"") |
| \end{verbatim} |
| This sets the locale for all categories to the user's default setting |
| (typically specified in the \envvar{LANG} environment variable). If |
| the locale is not changed thereafter, using multithreading should not |
| cause problems. |
| \end{funcdesc} |
| |
| \begin{excdesc}{Error} |
| Exception raised when \function{setlocale()} fails. |
| \end{excdesc} |
| |
| \begin{funcdesc}{localeconv}{} |
| Returns the database of of the local conventions as a dictionary. This |
| dictionary has the following strings as keys: |
| \begin{itemize} |
| \item \code{decimal_point} specifies the decimal point used in |
| floating point number representations for the \constant{LC_NUMERIC} |
| category. |
| \item \code{grouping} is a sequence of numbers specifying at which |
| relative positions the \code{thousands_sep} is expected. If the |
| sequence is terminated with \constant{CHAR_MAX}, no further |
| grouping is performed. If the sequence terminates with a \code{0}, the last |
| group size is repeatedly used. |
| \item \code{thousands_sep} is the character used between groups. |
| \item \code{int_curr_symbol} specifies the international currency |
| symbol from the \constant{LC_MONETARY} category. |
| \item \code{currency_symbol} is the local currency symbol. |
| \item \code{mon_decimal_point} is the decimal point used in monetary |
| values. |
| \item \code{mon_thousands_sep} is the separator for grouping of |
| monetary values. |
| \item \code{mon_grouping} has the same format as the \code{grouping} |
| key; it is used for monetary values. |
| \item \code{positive_sign} and \code{negative_sign} gives the sign |
| used for positive and negative monetary quantities. |
| \item \code{int_frac_digits} and \code{frac_digits} specify the number |
| of fractional digits used in the international and local formatting |
| of monetary values. |
| \item \code{p_cs_precedes} and \code{n_cs_precedes} specifies whether |
| the currency symbol precedes the value for positive or negative |
| values. |
| \item \code{p_sep_by_space} and \code{n_sep_by_space} specifies |
| whether there is a space between the positive or negative value and |
| the currency symbol. |
| \item \code{p_sign_posn} and \code{n_sign_posn} indicate how the |
| sign should be placed for positive and negative monetary values. |
| \end{itemize} |
| |
| The possible values for \code{p_sign_posn} and |
| \code{n_sign_posn} are given below. |
| |
| \begin{tableii}{c|l}{code}{Value}{Explanation} |
| \lineii{0}{Currency and value are surrounded by parentheses.} |
| \lineii{1}{The sign should precede the value and currency symbol.} |
| \lineii{2}{The sign should follow the value and currency symbol.} |
| \lineii{3}{The sign should immediately precede the value.} |
| \lineii{4}{The sign should immediately follow the value.} |
| \lineii{LC_MAX}{Nothing is specified in this locale.} |
| \end{tableii} |
| \end{funcdesc} |
| |
| \begin{funcdesc}{strcoll}{string1,string2} |
| Compares two strings according to the current \constant{LC_COLLATE} |
| setting. As any other compare function, returns a negative, or a |
| positive value, or \code{0}, depending on whether \var{string1} |
| collates before or after \var{string2} or is equal to it. |
| \end{funcdesc} |
| |
| \begin{funcdesc}{strxfrm}{string} |
| Transforms a string to one that can be used for the built-in function |
| \function{cmp()}\bifuncindex{cmp}, and still returns locale-aware |
| results. This function can be used when the same string is compared |
| repeatedly, e.g. when collating a sequence of strings. |
| \end{funcdesc} |
| |
| \begin{funcdesc}{format}{format, val, \optional{grouping\code{ = 0}}} |
| Formats a number \var{val} according to the current |
| \constant{LC_NUMERIC} setting. The format follows the conventions of |
| the \code{\%} operator. For floating point values, the decimal point |
| is modified if appropriate. If \var{grouping} is true, also takes the |
| grouping into account. |
| \end{funcdesc} |
| |
| \begin{funcdesc}{str}{float} |
| Formats a floating point number using the same format as the built-in |
| function \code{str(\var{float})}, but takes the decimal point into |
| account. |
| \end{funcdesc} |
| |
| \begin{funcdesc}{atof}{string} |
| Converts a string to a floating point number, following the |
| \constant{LC_NUMERIC} settings. |
| \end{funcdesc} |
| |
| \begin{funcdesc}{atoi}{string} |
| Converts a string to an integer, following the \constant{LC_NUMERIC} |
| conventions. |
| \end{funcdesc} |
| |
| \begin{datadesc}{LC_CTYPE} |
| \refstmodindex{string} |
| Locale category for the character type functions. Depending on the |
| settings of this category, the functions of module \refmodule{string} |
| dealing with case change their behaviour. |
| \end{datadesc} |
| |
| \begin{datadesc}{LC_COLLATE} |
| Locale category for sorting strings. The functions |
| \function{strcoll()} and \function{strxfrm()} of the \module{locale} |
| module are affected. |
| \end{datadesc} |
| |
| \begin{datadesc}{LC_TIME} |
| Locale category for the formatting of time. The function |
| \function{time.strftime()} follows these conventions. |
| \end{datadesc} |
| |
| \begin{datadesc}{LC_MONETARY} |
| Locale category for formatting of monetary values. The available |
| options are available from the \function{localeconv()} function. |
| \end{datadesc} |
| |
| \begin{datadesc}{LC_MESSAGES} |
| Locale category for message display. Python currently does not support |
| application specific locale-aware messages. Messages displayed by the |
| operating system, like those returned by \function{os.strerror()} |
| might be affected by this category. |
| \end{datadesc} |
| |
| \begin{datadesc}{LC_NUMERIC} |
| Locale category for formatting numbers. The functions |
| \function{format()}, \function{atoi()}, \function{atof()} and |
| \function{str()} of the \module{locale} module are affected by that |
| category. All other numeric formatting operations are not affected. |
| \end{datadesc} |
| |
| \begin{datadesc}{LC_ALL} |
| Combination of all locale settings. If this flag is used when the |
| locale is changed, setting the locale for all categories is |
| attempted. If that fails for any category, no category is changed at |
| all. When the locale is retrieved using this flag, a string indicating |
| the setting for all categories is returned. This string can be later |
| used to restore the settings. |
| \end{datadesc} |
| |
| \begin{datadesc}{CHAR_MAX} |
| This is a symbolic constant used for different values returned by |
| \function{localeconv()}. |
| \end{datadesc} |
| |
| Example: |
| |
| \begin{verbatim} |
| >>> import locale |
| >>> loc = locale.setlocale(locale.LC_ALL) # get current locale |
| >>> locale.setlocale(locale.LC_ALL, "de") # use German locale |
| >>> locale.strcoll("f\344n", "foo") # compare a string containing an umlaut |
| >>> locale.setlocale(locale.LC_ALL, "") # use user's preferred locale |
| >>> locale.setlocale(locale.LC_ALL, "C") # use default (C) locale |
| >>> locale.setlocale(locale.LC_ALL, loc) # restore saved locale |
| \end{verbatim} |
| |
| \subsection{Background, details, hints, tips and caveats} |
| |
| The C standard defines the locale as a program-wide property that may |
| be relatively expensive to change. On top of that, some |
| implementation are broken in such a way that frequent locale changes |
| may cause core dumps. This makes the locale somewhat painful to use |
| correctly. |
| |
| Initially, when a program is started, the locale is the \samp{C} locale, no |
| matter what the user's preferred locale is. The program must |
| explicitly say that it wants the user's preferred locale settings by |
| calling \code{setlocale(LC_ALL, "")}. |
| |
| It is generally a bad idea to call \function{setlocale()} in some library |
| routine, since as a side effect it affects the entire program. Saving |
| and restoring it is almost as bad: it is expensive and affects other |
| threads that happen to run before the settings have been restored. |
| |
| If, when coding a module for general use, you need a locale |
| independent version of an operation that is affected by the locale |
| (e.g. \function{string.lower()}, or certain formats used with |
| \function{time.strftime()})), you will have to find a way to do it |
| without using the standard library routine. Even better is convincing |
| yourself that using locale settings is okay. Only as a last resort |
| should you document that your module is not compatible with |
| non-\samp{C} locale settings. |
| |
| The case conversion functions in the |
| \refmodule{string}\refstmodindex{string} and |
| \module{strop}\refbimodindex{strop} modules are affected by the locale |
| settings. When a call to the \function{setlocale()} function changes |
| the \constant{LC_CTYPE} settings, the variables |
| \code{string.lowercase}, \code{string.uppercase} and |
| \code{string.letters} (and their counterparts in \module{strop}) are |
| recalculated. Note that this code that uses these variable through |
| `\keyword{from} ... \keyword{import} ...', e.g. \code{from string |
| import letters}, is not affected by subsequent \function{setlocale()} |
| calls. |
| |
| The only way to perform numeric operations according to the locale |
| is to use the special functions defined by this module: |
| \function{atof()}, \function{atoi()}, \function{format()}, |
| \function{str()}. |
| |
| \subsection{For extension writers and programs that embed Python} |
| \label{embedding-locale} |
| |
| Extension modules should never call \function{setlocale()}, except to |
| find out what the current locale is. But since the return value can |
| only be used portably to restore it, that is not very useful (except |
| perhaps to find out whether or not the locale is \samp{C}). |
| |
| When Python is embedded in an application, if the application sets the |
| locale to something specific before initializing Python, that is |
| generally okay, and Python will use whatever locale is set, |
| \emph{except} that the \constant{LC_NUMERIC} locale should always be |
| \samp{C}. |
| |
| The \function{setlocale()} function in the \module{locale} module |
| gives the Python progammer the impression that you can manipulate the |
| \constant{LC_NUMERIC} locale setting, but this not the case at the C |
| level: C code will always find that the \constant{LC_NUMERIC} locale |
| setting is \samp{C}. This is because too much would break when the |
| decimal point character is set to something else than a period |
| (e.g. the Python parser would break). Caveat: threads that run |
| without holding Python's global interpreter lock may occasionally find |
| that the numeric locale setting differs; this is because the only |
| portable way to implement this feature is to set the numeric locale |
| settings to what the user requests, extract the relevant |
| characteristics, and then restore the \samp{C} numeric locale. |
| |
| When Python code uses the \module{locale} module to change the locale, |
| this also affects the embedding application. If the embedding |
| application doesn't want this to happen, it should remove the |
| \module{_locale} extension module (which does all the work) from the |
| table of built-in modules in the \file{config.c} file, and make sure |
| that the \module{_locale} module is not accessible as a shared library. |