\section{\module{string} ---
         Common string operations}

\declaremodule{standard}{string}
\modulesynopsis{Common string operations.}


This module defines some constants useful for checking character
classes and some useful string functions.  See the module
\refmodule{re}\refstmodindex{re} for string functions based on regular
expressions.

The constants defined in this module are:

\begin{datadesc}{digits}
  The string \code{'0123456789'}.
\end{datadesc}

\begin{datadesc}{hexdigits}
  The string \code{'0123456789abcdefABCDEF'}.
\end{datadesc}

\begin{datadesc}{letters}
  The concatenation of the strings \constant{lowercase} and
  \constant{uppercase} described below.
\end{datadesc}

\begin{datadesc}{lowercase}
  A string containing all the characters that are considered lowercase
  letters.  On most systems this is the string
  \code{'abcdefghijklmnopqrstuvwxyz'}.  Do not change its definition;
  the effect on the routines \function{upper()} and
  \function{swapcase()} is undefined.
\end{datadesc}

\begin{datadesc}{octdigits}
  The string \code{'01234567'}.
\end{datadesc}

\begin{datadesc}{uppercase}
  A string containing all the characters that are considered uppercase
  letters.  On most systems this is the string
  \code{'ABCDEFGHIJKLMNOPQRSTUVWXYZ'}.  Do not change its definition;
  the effect on the routines \function{lower()} and
  \function{swapcase()} is undefined.
\end{datadesc}

\begin{datadesc}{whitespace}
  A string containing all characters that are considered whitespace.
  On most systems this includes the characters space, tab, linefeed,
  return, formfeed, and vertical tab.  Do not change its definition;
  the effect on the routines \function{strip()} and \function{split()}
  is undefined.
\end{datadesc}
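As a minimal sketch of how these constants can be used to check
character classes, the two helper functions below (hypothetical names,
not part of the module) test membership in \constant{hexdigits} and
\constant{whitespace}.

```python
import string

def is_hex_string(text):
    """Return True if text is non-empty and contains only hexadecimal digits."""
    if not text:
        return False
    for ch in text:
        if ch not in string.hexdigits:
            return False
    return True

def count_whitespace(text):
    """Count the characters of text that appear in string.whitespace."""
    total = 0
    for ch in text:
        if ch in string.whitespace:
            total = total + 1
    return total

hex_ok = is_hex_string("DeadBeef")        # only hexadecimal digits
hex_bad = is_hex_string("0xFF")           # 'x' is not a hexadecimal digit
blanks = count_whitespace("a b\tc\n")     # one space, one tab, one linefeed
```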


Many of the functions provided by this module are also defined as
methods of string and Unicode objects; see ``String Methods'' (section
\ref{string-methods}) for more information on those.
The functions defined in this module are:

\begin{funcdesc}{atof}{s}
  Convert a string to a floating point number.  The string must have
  the standard syntax for a floating point literal in Python,
  optionally preceded by a sign (\samp{+} or \samp{-}).  Note that
  this behaves identically to the built-in function
  \function{float()}\bifuncindex{float} when passed a string.

  \strong{Note:} When passing in a string, values for NaN\index{NaN}
  and Infinity\index{Infinity} may be returned, depending on the
  underlying C library.  The specific set of strings that cause these
  values to be returned depends entirely on the C library and is known
  to vary.
\end{funcdesc}
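Since \function{atof()} is described as behaving like the built-in
\function{float()} for string arguments, the conversion can be
sketched with \function{float()} itself.  The sample strings are
illustrative only.

```python
# Signed literals in the usual floating point syntax.
positive = float("+1.5e3")
negative = float("-0.25")

# A string that is not a floating point literal is rejected.
try:
    float("one point five")
    rejected = False
except ValueError:
    rejected = True
```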

\begin{funcdesc}{atoi}{s\optional{, base}}
  Convert string \var{s} to an integer in the given \var{base}.  The
  string must consist of one or more digits, optionally preceded by a
  sign (\samp{+} or \samp{-}).  The \var{base} defaults to 10.  If it
  is 0, a default base is chosen depending on the leading characters
  of the string (after stripping the sign): \samp{0x} or \samp{0X}
  means 16, \samp{0} means 8, anything else means 10.  If \var{base}
  is 16, a leading \samp{0x} or \samp{0X} is always accepted.  Note
  that when invoked without \var{base} or with \var{base} set to 10,
  this behaves identically to the built-in function \function{int()}
  when passed a string.  (Also note: for a more flexible
  interpretation of numeric literals, use the built-in function
  \function{eval()}\bifuncindex{eval}.)
\end{funcdesc}
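The base handling described above can be sketched with the built-in
\function{int()}, assuming an interpreter in which \function{int()}
accepts a base argument.  The octal case for base 0 is left out
because the accepted prefix differs between Python versions.

```python
decimal = int("-42")            # base defaults to 10; a sign is allowed
hex_plain = int("ff", 16)       # explicit base 16, no prefix
hex_prefixed = int("0xFF", 16)  # a leading 0x or 0X is accepted for base 16
guessed = int("0x1f", 0)        # base 0: the 0x prefix selects base 16
binary = int("101", 2)          # any other explicit base works the same way
```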

\begin{funcdesc}{atol}{s\optional{, base}}
  Convert string \var{s} to a long integer in the given \var{base}.
  The string must consist of one or more digits, optionally preceded
  by a sign (\samp{+} or \samp{-}).  The \var{base} argument has the
  same meaning as for \function{atoi()}.  A trailing \samp{l} or
  \samp{L} is not allowed, except if the base is 0.  Note that when
  invoked without \var{base} or with \var{base} set to 10, this
  behaves identically to the built-in function
  \function{long()}\bifuncindex{long} when passed a string.
\end{funcdesc}

\begin{funcdesc}{capitalize}{word}
  Capitalize the first character of the argument.
\end{funcdesc}

\begin{funcdesc}{capwords}{s}
  Split the argument into words using \function{split()}, capitalize
  each word using \function{capitalize()}, and join the capitalized
  words using \function{join()}.  Note that this replaces runs of
  whitespace characters by a single space, and removes leading and
  trailing whitespace.
\end{funcdesc}
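A short sketch of the split, capitalize and join steps that
\function{capwords()} is described as performing.  The manual version
uses the equivalent string methods and is an illustration, not the
module's source.

```python
import string

text = "  the quick\tbrown   fox "

# Library call: runs of whitespace collapse to single spaces.
titled = string.capwords(text)

# The same three steps spelled out with string methods.
words = text.split()                          # split on runs of whitespace
capitalized = [w.capitalize() for w in words] # capitalize each word
manual = " ".join(capitalized)                # join with single spaces
```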

\begin{funcdesc}{expandtabs}{s\optional{, tabsize}}
  Expand tabs in a string, i.e.\ replace them by one or more spaces,
  depending on the current column and the given tab size.  The column
  number is reset to zero after each newline occurring in the string.
  This doesn't understand other non-printing characters or escape
  sequences.  The tab size defaults to 8.
\end{funcdesc}
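The column arithmetic can be illustrated with the equivalent string
method, assuming it follows the same rules as the module function.
Each tab pads with spaces up to the next multiple of the tab size, and
the column count restarts after a newline.

```python
# Tab size 4: "a" ends at column 1, so the tab adds 3 spaces;
# "bc" ends at column 6, so the next tab adds 2 spaces.
narrow = "a\tbc\td".expandtabs(4)

# Default tab size of 8: one character followed by 7 spaces.
default = "x\ty".expandtabs()

# The column is reset to zero after the newline.
two_lines = "ab\tc\nd\te".expandtabs(4)
```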

\begin{funcdesc}{find}{s, sub\optional{, start\optional{, end}}}
  Return the lowest index in \var{s} where the substring \var{sub} is
  found such that \var{sub} is wholly contained in
  \code{\var{s}[\var{start}:\var{end}]}.  Return \code{-1} on failure.
  Defaults for \var{start} and \var{end} and interpretation of
  negative values are the same as for slices.
\end{funcdesc}

\begin{funcdesc}{rfind}{s, sub\optional{, start\optional{, end}}}
  Like \function{find()} but find the highest index.
\end{funcdesc}

\begin{funcdesc}{index}{s, sub\optional{, start\optional{, end}}}
  Like \function{find()} but raise \exception{ValueError} when the
  substring is not found.
\end{funcdesc}

\begin{funcdesc}{rindex}{s, sub\optional{, start\optional{, end}}}
  Like \function{rfind()} but raise \exception{ValueError} when the
  substring is not found.
\end{funcdesc}
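The difference between the four search functions can be sketched with
the equivalent string methods, assuming they share the semantics
described above.  The sample text is arbitrary.

```python
text = "spam and spam"

first = text.find("spam")            # lowest index of a match
last = text.rfind("spam")            # highest index of a match
from_one = text.find("spam", 1)      # search restricted to text[1:]
clipped = text.find("spam", 1, 12)   # text[1:12] holds no complete "spam"
missing = text.find("eggs")          # failure is reported as -1

# index() reports failure with an exception instead of -1.
try:
    text.index("eggs")
    raised = False
except ValueError:
    raised = True
```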

\begin{funcdesc}{count}{s, sub\optional{, start\optional{, end}}}
  Return the number of (non-overlapping) occurrences of substring
  \var{sub} in string \code{\var{s}[\var{start}:\var{end}]}.
  Defaults for \var{start} and \var{end} and interpretation of
  negative values are the same as for slices.
\end{funcdesc}
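A small illustration of what ``non-overlapping'' means here, using the
equivalent string method.  Once a match is counted, the search resumes
after the end of that match.

```python
pairs = "aaaa".count("aa")          # matches at 0 and 2, not at 1
single = "banana".count("ana")      # the second "ana" overlaps the first
sliced = "abcabcabc".count("abc", 3)  # only "abcabc" is searched
```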

\begin{funcdesc}{lower}{s}
  Return a copy of \var{s}, but with upper case letters converted to
  lower case.
\end{funcdesc}

\begin{funcdesc}{maketrans}{from, to}
  Return a translation table suitable for passing to
  \function{translate()} or \function{regex.compile()}, that will map
  each character in \var{from} into the character at the same position
  in \var{to}; \var{from} and \var{to} must have the same length.

  \strong{Warning:} don't use strings derived from \constant{lowercase}
  and \constant{uppercase} as arguments; in some locales, these don't have
  the same length.  For case conversions, always use
  \function{lower()} and \function{upper()}.
\end{funcdesc}

\begin{funcdesc}{split}{s\optional{, sep\optional{, maxsplit}}}
  Return a list of the words of the string \var{s}.  If the optional
  second argument \var{sep} is absent or \code{None}, the words are
  separated by arbitrary strings of whitespace characters (space, tab,
  newline, return, formfeed).  If the second argument \var{sep} is
  present and not \code{None}, it specifies a string to be used as the
  word separator.  The returned list will then have one more item
  than the number of non-overlapping occurrences of the separator in
  the string.  The optional third argument \var{maxsplit} defaults to
  0.  If it is nonzero, at most \var{maxsplit} splits occur,
  and the remainder of the string is returned as the final element of
  the list (thus, the list will have at most \code{\var{maxsplit}+1}
  elements).
\end{funcdesc}
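The three cases can be sketched with the equivalent string method.
This is an illustration under the assumption that the method mirrors
the function; \var{maxsplit} is either omitted or given a positive
value here.

```python
# No separator: split on runs of whitespace, ignoring the ends.
words = "  one two\tthree\n".split()

# Explicit separator: three commas give four items, one of them empty.
fields = "a,b,,c".split(",")

# At most two splits; the remainder becomes the final element.
limited = "a b c d".split(None, 2)
```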

\begin{funcdesc}{splitfields}{s\optional{, sep\optional{, maxsplit}}}
  This function behaves identically to \function{split()}.  (In the
  past, \function{split()} was only used with one argument, while
  \function{splitfields()} was only used with two arguments.)
\end{funcdesc}

\begin{funcdesc}{join}{words\optional{, sep}}
  Concatenate a list or tuple of words with intervening occurrences of
  \var{sep}.  The default value for \var{sep} is a single space
  character.  It is always true that
  \samp{string.join(string.split(\var{s}, \var{sep}), \var{sep})}
  equals \var{s}.
\end{funcdesc}
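The round-trip property can be checked with the equivalent string
methods, where the separator is the object the method is called on.
The sample values are arbitrary.

```python
sep = ","
original = "a,b,,c"

# Splitting on an explicit separator and joining with the same
# separator gives back the original string, empty fields included.
pieces = original.split(sep)
rebuilt = sep.join(pieces)

# Joining a tuple of words with a single space.
sentence = " ".join(("common", "string", "operations"))
```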

\begin{funcdesc}{joinfields}{words\optional{, sep}}
  This function behaves identically to \function{join()}.  (In the past,
  \function{join()} was only used with one argument, while
  \function{joinfields()} was only used with two arguments.)
\end{funcdesc}

\begin{funcdesc}{lstrip}{s}
  Return a copy of \var{s} but without leading whitespace characters.
\end{funcdesc}

\begin{funcdesc}{rstrip}{s}
  Return a copy of \var{s} but without trailing whitespace
  characters.
\end{funcdesc}

\begin{funcdesc}{strip}{s}
  Return a copy of \var{s} without leading or trailing whitespace.
\end{funcdesc}
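A brief comparison of the three stripping functions, sketched with the
equivalent string methods.  Whitespace inside the text is left alone
in every case.

```python
raw = "  \tpadded text \n"

left = raw.lstrip()    # leading whitespace removed
right = raw.rstrip()   # trailing whitespace removed
both = raw.strip()     # both ends removed
```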

\begin{funcdesc}{swapcase}{s}
  Return a copy of \var{s}, but with lower case letters
  converted to upper case and vice versa.
\end{funcdesc}

\begin{funcdesc}{translate}{s, table\optional{, deletechars}}
  Delete all characters from \var{s} that are in \var{deletechars} (if
  present), and then translate the characters using \var{table}, which
  must be a 256-character string giving the translation for each
  character value, indexed by its ordinal.
\end{funcdesc}
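To make the table format concrete, here is a pure-Python sketch of the
semantics described for \function{maketrans()} and
\function{translate()}.  The helper names are hypothetical and this is
not the module's implementation; it assumes every character has an
ordinal below 256.

```python
def make_table(src, dst):
    """Build a 256-character table mapping src[i] to dst[i]."""
    if len(src) != len(dst):
        raise ValueError("src and dst must have the same length")
    table = [chr(i) for i in range(256)]   # identity mapping by default
    for s_ch, d_ch in zip(src, dst):
        table[ord(s_ch)] = d_ch
    return "".join(table)

def apply_table(text, table, deletechars=""):
    """Drop characters found in deletechars, then map the rest via table."""
    kept = [ch for ch in text if ch not in deletechars]
    return "".join([table[ord(ch)] for ch in kept])

table = make_table("abc", "xyz")
result = apply_table("aabbcc-dd", table, "-")
```

Characters not named in the first argument map to themselves, which is
why \samp{dd} passes through unchanged while the hyphen is deleted
before translation.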

\begin{funcdesc}{upper}{s}
  Return a copy of \var{s}, but with lower case letters converted to
  upper case.
\end{funcdesc}

\begin{funcdesc}{ljust}{s, width}
\funcline{rjust}{s, width}
\funcline{center}{s, width}
  These functions respectively left-justify, right-justify and center
  a string in a field of given width.  They return a string that is at
  least \var{width} characters wide, created by padding the string
  \var{s} with spaces until the given width on the right, left or both
  sides.  The string is never truncated.
\end{funcdesc}
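The padding behaviour can be sketched with the equivalent string
methods.  The last line shows that a string longer than the requested
width comes back unchanged.

```python
left = "ab".ljust(5)          # padded on the right
right = "ab".rjust(5)         # padded on the left
middle = "ab".center(6)       # padded evenly on both sides
too_long = "abcdef".ljust(3)  # never truncated
```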

\begin{funcdesc}{zfill}{s, width}
  Pad a numeric string on the left with zero digits until the given
  width is reached.  Strings starting with a sign are handled
  correctly.
\end{funcdesc}
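A sketch of the sign handling, using the string method of the same
name on the assumption that it pads the same way.  The zeros are
inserted after the sign rather than before it.

```python
plain = "42".zfill(5)
negative = "-42".zfill(6)      # zeros go between the sign and the digits
positive = "+3.14".zfill(8)
unchanged = "12345".zfill(3)   # already wider than the requested width
```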

\begin{funcdesc}{replace}{str, old, new\optional{, maxsplit}}
  Return a copy of string \var{str} with all occurrences of substring
  \var{old} replaced by \var{new}.  If the optional argument
  \var{maxsplit} is given, the first \var{maxsplit} occurrences are
  replaced.
\end{funcdesc}
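The effect of the optional count can be sketched with the equivalent
string method, which takes the limit as its third positional argument.

```python
text = "a-b-c-d"

everything = text.replace("-", "+")     # all occurrences replaced
first_two = text.replace("-", "+", 2)   # only the first two replaced
```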

This module is implemented in Python.  Much of its functionality has
been reimplemented in the built-in module
\module{strop}\refbimodindex{strop}.  However, you
should \emph{never} import the latter module directly.  When
\module{string} discovers that \module{strop} exists, it transparently
replaces parts of itself with the implementation from \module{strop}.
After initialization, there is \emph{no} overhead in using
\module{string} instead of \module{strop}.