\section{\module{string} ---
         Common string operations.}
\declaremodule{standard}{string}

\modulesynopsis{Common string operations.}


This module defines some constants useful for checking character
classes and some useful string functions.  See the module
\module{re}\refstmodindex{re} for string functions based on regular
expressions.

The constants defined in this module are:

\begin{datadesc}{digits}
  The string \code{'0123456789'}.
\end{datadesc}

\begin{datadesc}{hexdigits}
  The string \code{'0123456789abcdefABCDEF'}.
\end{datadesc}

\begin{datadesc}{letters}
  The concatenation of the strings \code{lowercase} and
  \code{uppercase} described below.
\end{datadesc}

\begin{datadesc}{lowercase}
  A string containing all the characters that are considered lowercase
  letters.  On most systems this is the string
  \code{'abcdefghijklmnopqrstuvwxyz'}.  Do not change its definition;
  the effect on the routines \function{upper()} and
  \function{swapcase()} is undefined.
\end{datadesc}

\begin{datadesc}{octdigits}
  The string \code{'01234567'}.
\end{datadesc}

\begin{datadesc}{uppercase}
  A string containing all the characters that are considered uppercase
  letters.  On most systems this is the string
  \code{'ABCDEFGHIJKLMNOPQRSTUVWXYZ'}.  Do not change its definition;
  the effect on the routines \function{lower()} and
  \function{swapcase()} is undefined.
\end{datadesc}

\begin{datadesc}{whitespace}
  A string containing all characters that are considered whitespace.
  On most systems this includes the characters space, tab, linefeed,
  return, formfeed, and vertical tab.  Do not change its definition;
  the effect on the routines \function{strip()} and \function{split()}
  is undefined.
\end{datadesc}

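As a rough illustration of how the constants can be used to check
character classes, the sketch below assumes a current Python 3
interpreter, where \code{digits}, \code{hexdigits}, \code{octdigits}
and \code{whitespace} are still available under these names.  The
helper \code{is_hex()} is hypothetical and not part of the module.

```python
import string

def is_hex(s):
    # Hypothetical helper: true if s is non-empty and every character
    # is a hexadecimal digit according to string.hexdigits.
    return len(s) > 0 and all(c in string.hexdigits for c in s)

print(is_hex('DeadBeef'))
print(is_hex('xyz'))
print('8' in string.octdigits)
print('\t' in string.whitespace)
```
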
The functions defined in this module are:


\begin{funcdesc}{atof}{s}
  Convert a string to a floating point number.  The string must have
  the standard syntax for a floating point literal in Python,
  optionally preceded by a sign (\samp{+} or \samp{-}).  Note that
  this behaves identically to the built-in function
  \function{float()}\bifuncindex{float} when passed a string.
\end{funcdesc}

\begin{funcdesc}{atoi}{s\optional{, base}}
  Convert string \var{s} to an integer in the given \var{base}.  The
  string must consist of one or more digits, optionally preceded by a
  sign (\samp{+} or \samp{-}).  The \var{base} defaults to 10.  If it
  is 0, a default base is chosen depending on the leading characters
  of the string (after stripping the sign): \samp{0x} or \samp{0X}
  means 16, \samp{0} means 8, anything else means 10.  If \var{base}
  is 16, a leading \samp{0x} or \samp{0X} is always accepted.  Note
  that when invoked without \var{base} or with \var{base} set to 10,
  this behaves identically to the built-in function \function{int()}
  when passed a string.  (Also note: for a more flexible
  interpretation of numeric literals, use the built-in function
  \function{eval()}\bifuncindex{eval}.)
\end{funcdesc}

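The conversion rules above can be tried out with the built-in
functions that the text compares against.  This is only a sketch: it
assumes a Python 3 interpreter, where \function{atof()} and
\function{atoi()} are no longer provided by this module and
\function{float()} and \function{int()} are used instead.  The
variable names are illustrative.

```python
# Assumption: Python 3 built-ins stand in for string.atof()/string.atoi().
f = float('-1.5e3')               # signed floating point literal
n_default = int('-42')            # base defaults to 10
n_hex = int('ff', 16)             # explicit base 16
n_hex_prefixed = int('0x1f', 16)  # a leading 0x is accepted for base 16
n_guessed = int('0X1F', 0)        # base 0: the prefix selects base 16

print(f, n_default, n_hex, n_hex_prefixed, n_guessed)
```
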
\begin{funcdesc}{atol}{s\optional{, base}}
  Convert string \var{s} to a long integer in the given \var{base}.
  The string must consist of one or more digits, optionally preceded
  by a sign (\samp{+} or \samp{-}).  The \var{base} argument has the
  same meaning as for \function{atoi()}.  A trailing \samp{l} or
  \samp{L} is not allowed, except if the base is 0.  Note that when
  invoked without \var{base} or with \var{base} set to 10, this
  behaves identically to the built-in function
  \function{long()}\bifuncindex{long} when passed a string.
\end{funcdesc}

\begin{funcdesc}{capitalize}{word}
  Capitalize the first character of the argument.
\end{funcdesc}

\begin{funcdesc}{capwords}{s}
  Split the argument into words using \function{split()}, capitalize
  each word using \function{capitalize()}, and join the capitalized
  words using \function{join()}.  Note that this replaces runs of
  whitespace characters by a single space, and removes leading and
  trailing whitespace.
\end{funcdesc}

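The whitespace handling of \function{capwords()} is easiest to see on
a small input.  The sketch assumes Python 3, where
\function{capwords()} is still part of this module; the sample text
is arbitrary.

```python
import string

text = '  hello   wide  world '
result = string.capwords(text)

# Runs of whitespace collapse to single spaces, and the leading and
# trailing whitespace is dropped.
print(repr(result))
```
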
\begin{funcdesc}{expandtabs}{s\optional{, tabsize}}
  Expand tabs in a string, i.e.\ replace them by one or more spaces,
  depending on the current column and the given tab size.  The column
  number is reset to zero after each newline occurring in the string.
  This doesn't understand other non-printing characters or escape
  sequences.  The tab size defaults to 8.
\end{funcdesc}

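A short sketch of the column arithmetic, assuming Python 3 and using
the string method \code{expandtabs()} in place of the module
function:

```python
# Assumption: the str.expandtabs() method stands in for string.expandtabs().
line = 'ab\tc\n\td'

# With a tab size of 4, the first tab starts at column 2 and pads to
# column 4; after the newline the column is reset, so the second tab
# pads from column 0 to column 4.
expanded = line.expandtabs(4)

# Without an argument the tab size is 8.
default = 'a\tb'.expandtabs()

print(repr(expanded))
print(repr(default))
```
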
\begin{funcdesc}{find}{s, sub\optional{, start\optional{, end}}}
  Return the lowest index in \var{s} where the substring \var{sub} is
  found such that \var{sub} is wholly contained in
  \code{\var{s}[\var{start}:\var{end}]}.  Return \code{-1} on failure.
  The defaults for \var{start} and \var{end} and the interpretation of
  negative values are the same as for slices.
\end{funcdesc}

\begin{funcdesc}{rfind}{s, sub\optional{, start\optional{, end}}}
  Like \function{find()} but find the highest index.
\end{funcdesc}

\begin{funcdesc}{index}{s, sub\optional{, start\optional{, end}}}
  Like \function{find()} but raise \exception{ValueError} when the
  substring is not found.
\end{funcdesc}

\begin{funcdesc}{rindex}{s, sub\optional{, start\optional{, end}}}
  Like \function{rfind()} but raise \exception{ValueError} when the
  substring is not found.
\end{funcdesc}

\begin{funcdesc}{count}{s, sub\optional{, start\optional{, end}}}
  Return the number of (non-overlapping) occurrences of substring
  \var{sub} in string \code{\var{s}[\var{start}:\var{end}]}.
  The defaults for \var{start} and \var{end} and the interpretation of
  negative values are the same as for slices.
\end{funcdesc}

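The differences between the searching functions can be sketched as
follows.  The example assumes Python 3 and uses the string methods of
the same names in place of the module functions; the sample string is
arbitrary.

```python
# Assumption: str methods stand in for the string module functions.
s = 'spam and spam'

first = s.find('spam')            # lowest index
last = s.rfind('spam')            # highest index
missing = s.find('eggs')          # -1 on failure
# 'spam' is not wholly contained in s[1:12], so this also fails.
clipped = s.find('spam', 1, 12)

try:
    s.index('eggs')
    raised = False
except ValueError:
    # index() reports failure with an exception instead of -1.
    raised = True

# Occurrences are counted without overlap.
pairs = 'aaaa'.count('aa')

print(first, last, missing, clipped, raised, pairs)
```
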
\begin{funcdesc}{lower}{s}
  Return a copy of \var{s}, but with upper case letters converted to
  lower case.
\end{funcdesc}

\begin{funcdesc}{maketrans}{from, to}
  Return a translation table suitable for passing to
  \function{translate()} or \function{regex.compile()}, that will map
  each character in \var{from} into the character at the same position
  in \var{to}; \var{from} and \var{to} must have the same length.

  \strong{Warning:} don't use strings derived from \code{lowercase}
  and \code{uppercase} as arguments; in some locales, these don't have
  the same length.  For case conversions, always use
  \function{lower()} and \function{upper()}.
\end{funcdesc}

\begin{funcdesc}{split}{s\optional{, sep\optional{, maxsplit}}}
  Return a list of the words of the string \var{s}.  If the optional
  second argument \var{sep} is absent or \code{None}, the words are
  separated by arbitrary strings of whitespace characters (space, tab,
  newline, return, formfeed).  If the second argument \var{sep} is
  present and not \code{None}, it specifies a string to be used as the
  word separator.  The returned list will then have one more item
  than the number of non-overlapping occurrences of the separator in
  the string.  The optional third argument \var{maxsplit} defaults to
  0.  If it is nonzero, at most \var{maxsplit} splits occur,
  and the remainder of the string is returned as the final element of
  the list (thus, the list will have at most \code{\var{maxsplit}+1}
  elements).
\end{funcdesc}

\begin{funcdesc}{splitfields}{s\optional{, sep\optional{, maxsplit}}}
  This function behaves identically to \function{split()}.  (In the
  past, \function{split()} was only used with one argument, while
  \function{splitfields()} was only used with two arguments.)
\end{funcdesc}

\begin{funcdesc}{join}{words\optional{, sep}}
  Concatenate a list or tuple of words with intervening occurrences of
  \var{sep}.  The default value for \var{sep} is a single space
  character.  It is always true that
  \samp{string.join(string.split(\var{s}, \var{sep}), \var{sep})}
  equals \var{s}.
\end{funcdesc}

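The relationship between splitting and joining can be sketched as
follows.  The example assumes Python 3 and uses the string methods
\code{split()} and \code{join()} in place of the module functions.
Note that in Python 3 the default for \var{maxsplit} is \code{-1}
rather than 0, so only an explicit nonzero limit is shown.

```python
# Assumption: str methods stand in for string.split()/string.join().
s = 'a,b,,c'

# Three separators give four items, including an empty one.
fields = s.split(',')

# Joining with the same explicit separator restores the original.
roundtrip = ','.join(fields)

# With no separator, runs of whitespace delimit the words.
words = '  spam   eggs '.split()

# At most two splits: the remainder becomes the final element.
limited = 'a b c d'.split(None, 2)

print(fields, roundtrip, words, limited)
```
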
\begin{funcdesc}{joinfields}{words\optional{, sep}}
  This function behaves identically to \function{join()}.  (In the
  past, \function{join()} was only used with one argument, while
  \function{joinfields()} was only used with two arguments.)
\end{funcdesc}

\begin{funcdesc}{lstrip}{s}
  Return a copy of \var{s} but without leading whitespace characters.
\end{funcdesc}

\begin{funcdesc}{rstrip}{s}
  Return a copy of \var{s} but without trailing whitespace
  characters.
\end{funcdesc}

\begin{funcdesc}{strip}{s}
  Return a copy of \var{s} without leading or trailing whitespace.
\end{funcdesc}

\begin{funcdesc}{swapcase}{s}
  Return a copy of \var{s}, but with lower case letters
  converted to upper case and vice versa.
\end{funcdesc}

\begin{funcdesc}{translate}{s, table\optional{, deletechars}}
  Delete all characters from \var{s} that are in \var{deletechars} (if
  present), and then translate the characters using \var{table}, which
  must be a 256-character string giving the translation for each
  character value, indexed by its ordinal.
\end{funcdesc}

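The delete-then-translate order can be sketched with a small table.
This example assumes Python 3, where the closest equivalent of the
256-character table is a 256-byte \code{bytes} object built by
\code{bytes.maketrans()} and applied with \code{bytes.translate()};
it is an approximation of the functions described here, not the same
API.

```python
# Assumption: bytes.maketrans()/bytes.translate() stand in for
# string.maketrans()/string.translate().
table = bytes.maketrans(b'ab', b'xy')   # map a -> x and b -> y

# The table has one entry per byte value, indexed by ordinal.
table_size = len(table)

# Every b'c' is deleted first, then the rest is translated.
result = b'abcabc'.translate(table, b'c')

print(table_size, result)
```
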
\begin{funcdesc}{upper}{s}
  Return a copy of \var{s}, but with lower case letters converted to
  upper case.
\end{funcdesc}

\begin{funcdesc}{ljust}{s, width}
\funcline{rjust}{s, width}
\funcline{center}{s, width}
  These functions respectively left-justify, right-justify and center
  a string in a field of given width.  They return a string that is at
  least \var{width} characters wide, created by padding the string
  \var{s} with spaces on the right, left or both sides until the given
  width is reached.  The string is never truncated.
\end{funcdesc}

\begin{funcdesc}{zfill}{s, width}
  Pad a numeric string on the left with zero digits until the given
  width is reached.  Strings starting with a sign are handled
  correctly.
\end{funcdesc}

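A short sketch of the padding behaviour, assuming Python 3 and using
the string methods of the same names in place of the module
functions:

```python
# Assumption: str methods stand in for the string module functions.
left = 'ab'.ljust(5)        # padded on the right
right = 'ab'.rjust(5)       # padded on the left
middle = 'ab'.center(6)     # padded on both sides
intact = 'abcdef'.ljust(3)  # longer than the width: never truncated
signed = '-42'.zfill(6)     # zeros are inserted after the sign

print(repr(left), repr(right), repr(middle), repr(intact), repr(signed))
```
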
\begin{funcdesc}{replace}{str, old, new\optional{, maxsplit}}
  Return a copy of string \var{str} with all occurrences of substring
  \var{old} replaced by \var{new}.  If the optional argument
  \var{maxsplit} is given, only the first \var{maxsplit} occurrences
  are replaced.
\end{funcdesc}

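The effect of the optional limit can be sketched as follows, assuming
Python 3 and the string method \code{replace()}, whose third argument
plays the role of \var{maxsplit}:

```python
# Assumption: str.replace() stands in for string.replace().
s = 'a-b-c-d'

everything = s.replace('-', '+')    # all occurrences
first_two = s.replace('-', '+', 2)  # only the first two occurrences

print(everything, first_two)
```
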
This module is implemented in Python.  Much of its functionality has
been reimplemented in the built-in module
\module{strop}\refbimodindex{strop}.  However, you
should \emph{never} import the latter module directly.  When
\module{string} discovers that \module{strop} exists, it transparently
replaces parts of itself with the implementation from \module{strop}.
After initialization, there is \emph{no} overhead in using
\module{string} instead of \module{strop}.