\section{\module{string} ---
         Common string operations}

\declaremodule{standard}{string}
\modulesynopsis{Common string operations.}


This module defines some constants useful for checking character
classes and some useful string functions. See the module
\refmodule{re}\refstmodindex{re} for string functions based on regular
expressions.

The constants defined in this module are:

\begin{datadesc}{digits}
  The string \code{'0123456789'}.
\end{datadesc}

\begin{datadesc}{hexdigits}
  The string \code{'0123456789abcdefABCDEF'}.
\end{datadesc}

\begin{datadesc}{letters}
  The concatenation of the strings \constant{lowercase} and
  \constant{uppercase} described below.
\end{datadesc}

\begin{datadesc}{lowercase}
  A string containing all the characters that are considered lowercase
  letters. On most systems this is the string
  \code{'abcdefghijklmnopqrstuvwxyz'}. Do not change its definition ---
  the effect on the routines \function{upper()} and
  \function{swapcase()} is undefined.
\end{datadesc}

\begin{datadesc}{octdigits}
  The string \code{'01234567'}.
\end{datadesc}

\begin{datadesc}{punctuation}
  String of \ASCII{} characters which are considered punctuation
  characters in the \samp{C} locale.
\end{datadesc}

\begin{datadesc}{printable}
  String of characters which are considered printable. This is a
  combination of \constant{digits}, \constant{letters},
  \constant{punctuation}, and \constant{whitespace}.
\end{datadesc}

\begin{datadesc}{uppercase}
  A string containing all the characters that are considered uppercase
  letters. On most systems this is the string
  \code{'ABCDEFGHIJKLMNOPQRSTUVWXYZ'}. Do not change its definition ---
  the effect on the routines \function{lower()} and
  \function{swapcase()} is undefined.
\end{datadesc}

\begin{datadesc}{whitespace}
  A string containing all characters that are considered whitespace.
  On most systems this includes the characters space, tab, linefeed,
  return, formfeed, and vertical tab. Do not change its definition ---
  the effect on the routines \function{strip()} and \function{split()}
  is undefined.
\end{datadesc}
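
As a brief illustration of checking character classes with these
constants, the sketch below tests whether every character of a string
belongs to \constant{hexdigits} or \constant{octdigits}. The helper
name \code{onlychars} is hypothetical and not part of the module, and
the built-in \function{all()} is assumed to be available.

```python
import string

def onlychars(s, alphabet):
    """Return True if s is non-empty and every character is in alphabet."""
    return len(s) > 0 and all(c in alphabet for c in s)

is_hex = onlychars("1aF", string.hexdigits)      # only hexadecimal digits
is_octal = onlychars("1289", string.octdigits)   # '8' and '9' are not octal digits
```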


Many of the functions provided by this module are also defined as
methods of string and Unicode objects; see ``String Methods'' (section
\ref{string-methods}) for more information on those.
The functions defined in this module are:

\begin{funcdesc}{atof}{s}
  Convert a string to a floating point number. The string must have
  the standard syntax for a floating point literal in Python,
  optionally preceded by a sign (\samp{+} or \samp{-}). Note that
  this behaves identically to the built-in function
  \function{float()}\bifuncindex{float} when passed a string.

  \strong{Note:} When passing in a string, values for NaN\index{NaN}
  and Infinity\index{Infinity} may be returned, depending on the
  underlying C library. The specific set of strings accepted which
  cause these values to be returned depends entirely on the C library
  and is known to vary.
\end{funcdesc}

\begin{funcdesc}{atoi}{s\optional{, base}}
  Convert string \var{s} to an integer in the given \var{base}. The
  string must consist of one or more digits, optionally preceded by a
  sign (\samp{+} or \samp{-}). The \var{base} defaults to 10. If it
  is 0, a default base is chosen depending on the leading characters
  of the string (after stripping the sign): \samp{0x} or \samp{0X}
  means 16, \samp{0} means 8, anything else means 10. If \var{base}
  is 16, a leading \samp{0x} or \samp{0X} is always accepted, though
  not required. This behaves identically to the built-in function
  \function{int()} when passed a string. (Also note: for a more
  flexible interpretation of numeric literals, use the built-in
  function \function{eval()}\bifuncindex{eval}.)
\end{funcdesc}
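
The base rules can be illustrated with the built-in \function{int()},
which the description above says behaves identically for string
arguments. This is only a minimal sketch; the octal \samp{0} prefix is
left out on purpose, since newer Python releases are known to treat it
differently.

```python
dec = int("-42")            # base defaults to 10
hexa = int("ff", 16)        # explicit base 16, no prefix required
prefixed = int("0xff", 16)  # a leading 0x is accepted with base 16
guessed = int("0x1f", 0)    # base 0: the 0x prefix selects base 16
plain = int("123", 0)       # base 0 without a prefix means base 10
```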

\begin{funcdesc}{atol}{s\optional{, base}}
  Convert string \var{s} to a long integer in the given \var{base}.
  The string must consist of one or more digits, optionally preceded
  by a sign (\samp{+} or \samp{-}). The \var{base} argument has the
  same meaning as for \function{atoi()}. A trailing \samp{l} or
  \samp{L} is not allowed, except if the base is 0. Note that when
  invoked without \var{base} or with \var{base} set to 10, this
  behaves identically to the built-in function
  \function{long()}\bifuncindex{long} when passed a string.
\end{funcdesc}

\begin{funcdesc}{capitalize}{word}
  Capitalize the first character of the argument.
\end{funcdesc}

\begin{funcdesc}{capwords}{s}
  Split the argument into words using \function{split()}, capitalize
  each word using \function{capitalize()}, and join the capitalized
  words using \function{join()}. Note that this replaces runs of
  whitespace characters by a single space, and removes leading and
  trailing whitespace.
\end{funcdesc}
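
The whitespace handling of \function{capwords()} is easiest to see on
an input with irregular spacing. The sketch below also spells out the
same split, capitalize and join steps by hand using string methods;
the variable names are illustrative only.

```python
import string

text = "  hello   wide \t world  "
result = string.capwords(text)

# The same three steps written out with string methods.
by_hand = " ".join([w.capitalize() for w in text.split()])
```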

\begin{funcdesc}{expandtabs}{s\optional{, tabsize}}
  Expand tabs in a string, i.e.\ replace them by one or more spaces,
  depending on the current column and the given tab size. The column
  number is reset to zero after each newline occurring in the string.
  This doesn't understand other non-printing characters or escape
  sequences. The tab size defaults to 8.
\end{funcdesc}
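
A short sketch of the column arithmetic, written with the equivalent
string method rather than the module function (an assumption made so
the example does not depend on the module-level function being
present):

```python
line = "a\tbc\td"
expanded = line.expandtabs(4)          # tab stops fall at columns 4 and 8

two_lines = "ab\n\tc".expandtabs(4)    # column resets to zero after the newline
default = "x\ty".expandtabs()          # tab size defaults to 8
```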

\begin{funcdesc}{find}{s, sub\optional{, start\optional{, end}}}
  Return the lowest index in \var{s} where the substring \var{sub} is
  found such that \var{sub} is wholly contained in
  \code{\var{s}[\var{start}:\var{end}]}. Return \code{-1} on failure.
  Defaults for \var{start} and \var{end} and interpretation of
  negative values are the same as for slices.
\end{funcdesc}

\begin{funcdesc}{rfind}{s, sub\optional{, start\optional{, end}}}
  Like \function{find()} but find the highest index.
\end{funcdesc}

\begin{funcdesc}{index}{s, sub\optional{, start\optional{, end}}}
  Like \function{find()} but raise \exception{ValueError} when the
  substring is not found.
\end{funcdesc}

\begin{funcdesc}{rindex}{s, sub\optional{, start\optional{, end}}}
  Like \function{rfind()} but raise \exception{ValueError} when the
  substring is not found.
\end{funcdesc}
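
The difference between the four search functions can be sketched as
follows, using the corresponding string methods (assumed here to
mirror the module functions, as the text on string methods suggests):

```python
s = "hello world"

first = s.find("o")        # lowest index
last = s.rfind("o")        # highest index
missing = s.find("z")      # -1 signals failure
in_slice = s.find("o", 5)  # search restricted to s[5:]

# index() reports failure with an exception instead of -1.
try:
    s.index("z")
    raised = False
except ValueError:
    raised = True
```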

\begin{funcdesc}{count}{s, sub\optional{, start\optional{, end}}}
  Return the number of (non-overlapping) occurrences of substring
  \var{sub} in string \code{\var{s}[\var{start}:\var{end}]}.
  Defaults for \var{start} and \var{end} and interpretation of
  negative values are the same as for slices.
\end{funcdesc}
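
What ``non-overlapping'' means in practice is shown by the sketch
below, written with the equivalent string method:

```python
# 'ana' matches at index 1; the match at index 3 overlaps it and is not counted.
overlapping = "banana".count("ana")
simple = "banana".count("a")
sliced = "banana".count("a", 2)   # counts within "banana"[2:], i.e. "nana"
```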

\begin{funcdesc}{lower}{s}
  Return a copy of \var{s}, but with upper case letters converted to
  lower case.
\end{funcdesc}

\begin{funcdesc}{maketrans}{from, to}
  Return a translation table suitable for passing to
  \function{translate()} or \function{regex.compile()}, that will map
  each character in \var{from} into the character at the same position
  in \var{to}; \var{from} and \var{to} must have the same length.

  \strong{Warning:} don't use strings derived from \constant{lowercase}
  and \constant{uppercase} as arguments; in some locales, these don't have
  the same length. For case conversions, always use
  \function{lower()} and \function{upper()}.
\end{funcdesc}

\begin{funcdesc}{split}{s\optional{, sep\optional{, maxsplit}}}
  Return a list of the words of the string \var{s}. If the optional
  second argument \var{sep} is absent or \code{None}, the words are
  separated by arbitrary strings of whitespace characters (space, tab,
  newline, return, formfeed). If the second argument \var{sep} is
  present and not \code{None}, it specifies a string to be used as the
  word separator. The returned list will then have one more item
  than the number of non-overlapping occurrences of the separator in
  the string. The optional third argument \var{maxsplit} defaults to
  0. If it is nonzero, at most \var{maxsplit} number of splits occur,
  and the remainder of the string is returned as the final element of
  the list (thus, the list will have at most \code{\var{maxsplit}+1}
  elements).
\end{funcdesc}
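
The three calling styles can be sketched with the equivalent string
method. Note that this is an illustration only: the method form is
assumed here, and its default for \var{maxsplit} may be spelled
differently from the module function's 0.

```python
words = "  a b \t c\n".split()      # whitespace runs; no empty strings result
fields = "a,b,,c".split(",")        # explicit separator keeps empty fields
limited = "a,b,c,d".split(",", 2)   # at most 2 splits, remainder kept whole
```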

\begin{funcdesc}{splitfields}{s\optional{, sep\optional{, maxsplit}}}
  This function behaves identically to \function{split()}. (In the
  past, \function{split()} was only used with one argument, while
  \function{splitfields()} was only used with two arguments.)
\end{funcdesc}

\begin{funcdesc}{join}{words\optional{, sep}}
  Concatenate a list or tuple of words with intervening occurrences of
  \var{sep}. The default value for \var{sep} is a single space
  character. It is always true that
  \samp{string.join(string.split(\var{s}, \var{sep}), \var{sep})}
  equals \var{s}.
\end{funcdesc}
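
The round-trip property can be checked with a small sketch, written
with string methods and an explicit separator. The last line is an
added observation rather than something stated above: when splitting
on whitespace instead, runs of whitespace collapse and the original
string is not recovered.

```python
s = "a,,b,"
sep = ","

parts = s.split(sep)       # empty fields are preserved
rebuilt = sep.join(parts)  # joining with the same separator restores s

# Whitespace splitting discards the exact spacing, so no round trip here.
collapsed = " ".join("a   b".split())
```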

\begin{funcdesc}{joinfields}{words\optional{, sep}}
  This function behaves identically to \function{join()}. (In the past,
  \function{join()} was only used with one argument, while
  \function{joinfields()} was only used with two arguments.)
\end{funcdesc}

\begin{funcdesc}{lstrip}{s}
  Return a copy of \var{s} but without leading whitespace characters.
\end{funcdesc}

\begin{funcdesc}{rstrip}{s}
  Return a copy of \var{s} but without trailing whitespace
  characters.
\end{funcdesc}

\begin{funcdesc}{strip}{s}
  Return a copy of \var{s} without leading or trailing whitespace.
\end{funcdesc}
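
A minimal sketch comparing the three stripping functions on the same
input, using the equivalent string methods:

```python
s = " \t padded text \n"

left = s.lstrip()    # leading whitespace removed, trailing kept
right = s.rstrip()   # trailing whitespace removed, leading kept
both = s.strip()     # both ends removed; interior spaces untouched
```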

\begin{funcdesc}{swapcase}{s}
  Return a copy of \var{s}, but with lower case letters
  converted to upper case and vice versa.
\end{funcdesc}

\begin{funcdesc}{translate}{s, table\optional{, deletechars}}
  Delete all characters from \var{s} that are in \var{deletechars} (if
  present), and then translate the characters using \var{table}, which
  must be a 256-character string giving the translation for each
  character value, indexed by its ordinal.
\end{funcdesc}
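
The table-driven behaviour described for \function{maketrans()} and
\function{translate()} can be sketched in pure Python. The helpers
\code{maketable} and \code{translated} below are hypothetical
stand-ins written for illustration, not the module's implementation,
and they assume every character has an ordinal below 256.

```python
def maketable(src, dst):
    """Build a 256-character table mapping src[i] to dst[i]."""
    if len(src) != len(dst):
        raise ValueError("src and dst must have the same length")
    table = [chr(i) for i in range(256)]   # start from the identity mapping
    for a, b in zip(src, dst):
        table[ord(a)] = b
    return "".join(table)

def translated(s, table, deletechars=""):
    """Drop characters in deletechars, then map the rest through table."""
    return "".join([table[ord(c)] for c in s if c not in deletechars])

table = maketable("abc", "xyz")
result = translated("aabbcc-d", table, "-")
```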

\begin{funcdesc}{upper}{s}
  Return a copy of \var{s}, but with lower case letters converted to
  upper case.
\end{funcdesc}

\begin{funcdesc}{ljust}{s, width}
\funcline{rjust}{s, width}
\funcline{center}{s, width}
  These functions respectively left-justify, right-justify and center
  a string in a field of given width. They return a string that is at
  least \var{width} characters wide, created by padding the string
  \var{s} with spaces until the given width on the right, left or both
  sides. The string is never truncated.
\end{funcdesc}
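
The padding behaviour can be sketched with the equivalent string
methods; the last line shows that a string longer than the requested
width comes back unchanged.

```python
left = "ab".ljust(5)       # padded on the right
right = "ab".rjust(5)      # padded on the left
mid = "ab".center(6)       # padded evenly on both sides
kept = "abcdef".ljust(3)   # already wider than 3: never truncated
```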

\begin{funcdesc}{zfill}{s, width}
  Pad a numeric string on the left with zero digits until the given
  width is reached. Strings starting with a sign are handled
  correctly.
\end{funcdesc}
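
A short sketch of the sign handling, using the string method of the
same name (assumed to behave like the module function):

```python
plain = "42".zfill(5)      # zeros are added on the left
signed = "-42".zfill(5)    # the sign stays in front of the zeros
wide = "12345".zfill(3)    # already wide enough: returned unchanged
```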

\begin{funcdesc}{replace}{str, old, new\optional{, maxsplit}}
  Return a copy of string \var{str} with all occurrences of substring
  \var{old} replaced by \var{new}. If the optional argument
  \var{maxsplit} is given, the first \var{maxsplit} occurrences are
  replaced.
\end{funcdesc}
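
The effect of the optional count can be sketched with the equivalent
string method:

```python
everything = "a-b-c-d".replace("-", "+")      # all occurrences replaced
first_two = "a-b-c-d".replace("-", "+", 2)    # only the first two replaced
```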

This module is implemented in Python. Much of its functionality has
been reimplemented in the built-in module
\module{strop}\refbimodindex{strop}. However, you
should \emph{never} import the latter module directly. When
\module{string} discovers that \module{strop} exists, it transparently
replaces parts of itself with the implementation from \module{strop}.
After initialization, there is \emph{no} overhead in using
\module{string} instead of \module{strop}.