:mod:`string` --- Common string operations
==========================================

.. module:: string
   :synopsis: Common string operations.


.. index:: module: re

The :mod:`string` module contains a number of useful constants and
classes, as well as some deprecated legacy functions that are also
available as methods on strings. In addition, Python's built-in string
classes support the sequence type methods described in the
:ref:`typesseq` section, and also the string-specific methods described
in the :ref:`string-methods` section. To output formatted strings use
template strings or the ``%`` operator described in the
:ref:`string-formatting` section. Also, see the :mod:`re` module for
string functions based on regular expressions.


String constants
----------------

The constants defined in this module are:


.. data:: ascii_letters

   The concatenation of the :const:`ascii_lowercase` and :const:`ascii_uppercase`
   constants described below. This value is not locale-dependent.


.. data:: ascii_lowercase

   The lowercase letters ``'abcdefghijklmnopqrstuvwxyz'``. This value is not
   locale-dependent and will not change.


.. data:: ascii_uppercase

   The uppercase letters ``'ABCDEFGHIJKLMNOPQRSTUVWXYZ'``. This value is not
   locale-dependent and will not change.


.. data:: digits

   The string ``'0123456789'``.


.. data:: hexdigits

   The string ``'0123456789abcdefABCDEF'``.


.. data:: octdigits

   The string ``'01234567'``.


.. data:: punctuation

   String of ASCII characters which are considered punctuation characters
   in the ``C`` locale.


.. data:: printable

   String of ASCII characters which are considered printable. This is a
   combination of :const:`digits`, :const:`ascii_letters`, :const:`punctuation`,
   and :const:`whitespace`.


.. data:: whitespace

   A string containing all characters that are considered whitespace.
   This includes the characters space, tab, linefeed, return, formfeed, and
   vertical tab.
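
As a rough illustration of how these constants tend to be used, the sketch
below checks that every character of a string belongs to :const:`hexdigits`.
The helper name ``is_hex`` is hypothetical and is not part of the module.

```python
import string


def is_hex(text):
    """Return True if text is non-empty and holds only hexadecimal digits."""
    return bool(text) and all(ch in string.hexdigits for ch in text)


# ascii_letters is the concatenation of the two per-case constants.
both_cases = string.ascii_lowercase + string.ascii_uppercase

print(is_hex('DeadBeef'))   # True
print(is_hex('xyz'))        # False
```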


Template strings
----------------

Templates provide simpler string substitutions as described in :pep:`292`.
Instead of the normal ``%``\ -based substitutions, Templates support ``$``\
-based substitutions, using the following rules:

* ``$$`` is an escape; it is replaced with a single ``$``.

* ``$identifier`` names a substitution placeholder matching a mapping key of
  ``"identifier"``. By default, ``"identifier"`` must spell a Python
  identifier. The first non-identifier character after the ``$`` character
  terminates this placeholder specification.

* ``${identifier}`` is equivalent to ``$identifier``. It is required when valid
  identifier characters follow the placeholder but are not part of the
  placeholder, such as ``"${noun}ification"``.

Any other appearance of ``$`` in the string will result in a :exc:`ValueError`
being raised.

.. versionadded:: 2.4

The :mod:`string` module provides a :class:`Template` class that implements
these rules. The methods of :class:`Template` are:


.. class:: Template(template)

   The constructor takes a single argument which is the template string.


.. method:: Template.substitute(mapping[, **kws])

   Performs the template substitution, returning a new string. *mapping* is any
   dictionary-like object with keys that match the placeholders in the template.
   Alternatively, you can provide keyword arguments, where the keywords are the
   placeholders. When both *mapping* and *kws* are given and there are duplicates,
   the placeholders from *kws* take precedence.
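
The precedence rule can be seen in a short sketch. The template text and the
names ``greeting`` and ``defaults`` are made up for illustration.

```python
from string import Template

greeting = Template('$who likes $what')
defaults = {'who': 'tim', 'what': 'kung pao'}

# 'what' appears in both the mapping and the keywords; the keyword wins.
result = greeting.substitute(defaults, what='dim sum')
print(result)   # tim likes dim sum
```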


.. method:: Template.safe_substitute(mapping[, **kws])

   Like :meth:`substitute`, except that if placeholders are missing from *mapping*
   and *kws*, instead of raising a :exc:`KeyError` exception, the original
   placeholder will appear in the resulting string intact. Also, unlike with
   :meth:`substitute`, any other appearances of the ``$`` will simply return ``$``
   instead of raising :exc:`ValueError`.

   While other exceptions may still occur, this method is called "safe" because
   it always tries to return a usable string instead of raising an
   exception. In another sense, :meth:`safe_substitute` may be anything other than
   safe, since it will silently ignore malformed templates containing dangling
   delimiters, unmatched braces, or placeholders that are not valid Python
   identifiers.

:class:`Template` instances also provide one public data attribute:


.. attribute:: string.template

   This is the object passed to the constructor's *template* argument. In general,
   you shouldn't change it, but read-only access is not enforced.

Here is an example of how to use a Template::

   >>> from string import Template
   >>> s = Template('$who likes $what')
   >>> s.substitute(who='tim', what='kung pao')
   'tim likes kung pao'
   >>> d = dict(who='tim')
   >>> Template('Give $who $100').substitute(d)
   Traceback (most recent call last):
   [...]
   ValueError: Invalid placeholder in string: line 1, col 10
   >>> Template('$who likes $what').substitute(d)
   Traceback (most recent call last):
   [...]
   KeyError: 'what'
   >>> Template('$who likes $what').safe_substitute(d)
   'tim likes $what'
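
The ``$$`` escape and the braced form from the rules listed earlier can be
sketched the same way. The template text here is invented for illustration.

```python
from string import Template

# ${noun} needs braces because "ification" would otherwise be read as part
# of the placeholder name; $$ produces a literal dollar sign.
t = Template('${noun}ification costs $$5')
result = t.substitute(noun='Python')
print(result)   # Pythonification costs $5
```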

Advanced usage: you can derive subclasses of :class:`Template` to customize the
placeholder syntax, delimiter character, or the entire regular expression used
to parse template strings. To do this, you can override these class attributes:

* *delimiter* -- This is the literal string that introduces a placeholder.
  The default value is ``$``. Note that this should *not* be a regular
  expression, as the implementation will call :func:`re.escape` on this string as
  needed.

* *idpattern* -- This is the regular expression describing the pattern for
  non-braced placeholders (the braces will be added automatically as
  appropriate). The default value is the regular expression
  ``[_a-z][_a-z0-9]*``.

Alternatively, you can provide the entire regular expression pattern by
overriding the class attribute *pattern*. If you do this, the value must be a
regular expression object with four named capturing groups. The capturing
groups correspond to the rules given above, along with the invalid placeholder
rule:

* *escaped* -- This group matches the escape sequence, e.g. ``$$``, in the
  default pattern.

* *named* -- This group matches the unbraced placeholder name; it should not
  include the delimiter in the capturing group.

* *braced* -- This group matches the brace-enclosed placeholder name; it should
  not include either the delimiter or braces in the capturing group.

* *invalid* -- This group matches any other delimiter pattern (usually a single
  delimiter), and it should appear last in the regular expression.
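
A minimal sketch of the simplest customization, overriding *delimiter* only.
The subclass name ``PercentTemplate`` and the template text are hypothetical.

```python
from string import Template


class PercentTemplate(Template):
    # Hypothetical subclass: placeholders start with '%' instead of '$'.
    delimiter = '%'


# '$' is now an ordinary character, and '%%' is the escape sequence.
t = PercentTemplate('%who owes $10 at 5%% interest')
result = t.substitute(who='tim')
print(result)   # tim owes $10 at 5% interest
```

Overriding *pattern* instead gives full control, but the replacement must
still supply all four named groups.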


String functions
----------------

The following functions are available to operate on string and Unicode objects.
They are not available as string methods.


.. function:: capwords(s)

   Split the argument into words using :func:`split`, capitalize each word using
   :func:`capitalize`, and join the capitalized words using :func:`join`. Note
   that this replaces runs of whitespace characters by a single space, and removes
   leading and trailing whitespace.
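
A short sketch of the effect on irregular whitespace, with the same three
steps also spelled out using string methods. The sample text is made up.

```python
import string

messy = '  hello   wide \t world  '
tidy = string.capwords(messy)
print(repr(tidy))   # 'Hello Wide World'

# The same split / capitalize / join steps written out by hand.
by_hand = ' '.join(word.capitalize() for word in messy.split())
```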


.. function:: maketrans(from, to)

   Return a translation table suitable for passing to :func:`translate`, that will
   map each character in *from* into the character at the same position in *to*;
   *from* and *to* must have the same length.

   .. warning::

      Don't use strings derived from :const:`lowercase` and :const:`uppercase` as
      arguments; in some locales, these don't have the same length. For case
      conversions, always use :func:`lower` and :func:`upper`.
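
A minimal sketch of building and applying a translation table. It assumes a
Python 3 interpreter, where the table is built with the ``str.maketrans``
static method and applied with the ``translate`` string method rather than
with the module-level functions described in this section.

```python
# Map each vowel to its upper-case form; both arguments have the same length.
table = str.maketrans('aeiou', 'AEIOU')
shouted = 'translation table'.translate(table)
print(shouted)   # trAnslAtIOn tAblE
```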


Deprecated string functions
---------------------------

The following functions are also defined as methods of string and
Unicode objects; see section :ref:`string-methods` for more information on
those. You should consider these functions as deprecated, although they will
not be removed until Python 3.0. The functions defined in this module are:


.. function:: atof(s)

   .. deprecated:: 2.0
      Use the :func:`float` built-in function.

   .. index:: builtin: float

   Convert a string to a floating point number. The string must have the standard
   syntax for a floating point literal in Python, optionally preceded by a sign
   (``+`` or ``-``). Note that this behaves identically to the built-in function
   :func:`float` when passed a string.

   .. note::

      .. index::
         single: NaN
         single: Infinity

      When passing in a string, values for NaN and Infinity may be returned, depending
      on the underlying C library. The specific set of strings accepted which cause
      these values to be returned depends entirely on the C library and is known to
      vary.


.. function:: atoi(s[, base])

   .. deprecated:: 2.0
      Use the :func:`int` built-in function.

   .. index:: builtin: eval

   Convert string *s* to an integer in the given *base*. The string must consist
   of one or more digits, optionally preceded by a sign (``+`` or ``-``). The
   *base* defaults to 10. If it is 0, a default base is chosen depending on the
   leading characters of the string (after stripping the sign): ``0x`` or ``0X``
   means 16, ``0`` means 8, anything else means 10. If *base* is 16, a leading
   ``0x`` or ``0X`` is always accepted, though not required. This behaves
   identically to the built-in function :func:`int` when passed a string. (Also
   note: for a more flexible interpretation of numeric literals, use the built-in
   function :func:`eval`.)
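
The recommended built-ins can be sketched as follows. Only the hexadecimal
prefix is shown for base 0, because the handling of a bare leading ``0``
differs between Python versions.

```python
# Plain decimal conversion, with an optional sign.
negative = int('-42')            # -42

# Base 0 picks the base from the prefix; base 16 accepts an optional prefix.
from_prefix = int('0x1f', 0)     # 31
bare_hex = int('1f', 16)         # 31
prefixed_hex = int('0X1F', 16)   # 31

# float() is the replacement for atof().
as_float = float('+1.5')         # 1.5
```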


.. function:: atol(s[, base])

   .. deprecated:: 2.0
      Use the :func:`long` built-in function.

   .. index:: builtin: long

   Convert string *s* to a long integer in the given *base*. The string must
   consist of one or more digits, optionally preceded by a sign (``+`` or ``-``).
   The *base* argument has the same meaning as for :func:`atoi`. A trailing ``l``
   or ``L`` is not allowed, except if the base is 0. Note that when invoked
   without *base* or with *base* set to 10, this behaves identically to the built-in
   function :func:`long` when passed a string.


.. function:: capitalize(word)

   Return a copy of *word* with only its first character capitalized.


.. function:: expandtabs(s[, tabsize])

   Expand tabs in a string replacing them by one or more spaces, depending on the
   current column and the given tab size. The column number is reset to zero after
   each newline occurring in the string. This doesn't understand other non-printing
   characters or escape sequences. The tab size defaults to 8.
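
A small sketch of the column tracking, using the equivalent ``expandtabs``
string method because the module-level function is deprecated.

```python
text = 'ab\tc\n\td'

# With a tab size of 4, the first tab pads from column 2 to column 4.
# After the newline the column restarts at 0, so the second tab is 4 spaces.
expanded = text.expandtabs(4)
print(repr(expanded))   # 'ab  c\n    d'

# Without an argument the tab size is 8.
default = 'a\tb'.expandtabs()
```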


.. function:: find(s, sub[, start[, end]])

   Return the lowest index in *s* where the substring *sub* is found such that
   *sub* is wholly contained in ``s[start:end]``. Return ``-1`` on failure.
   Defaults for *start* and *end* and interpretation of negative values are the same
   as for slices.


.. function:: rfind(s, sub[, start[, end]])

   Like :func:`find` but find the highest index.


.. function:: index(s, sub[, start[, end]])

   Like :func:`find` but raise :exc:`ValueError` when the substring is not found.


.. function:: rindex(s, sub[, start[, end]])

   Like :func:`rfind` but raise :exc:`ValueError` when the substring is not found.


.. function:: count(s, sub[, start[, end]])

   Return the number of (non-overlapping) occurrences of substring *sub* in string
   ``s[start:end]``. Defaults for *start* and *end* and interpretation of negative
   values are the same as for slices.
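
The search functions differ mainly in direction and in how they report a
miss. A short sketch using the equivalent string methods, with made-up
sample text:

```python
s = 'spam and spam'

first = s.find('spam')      # 0, the lowest index
last = s.rfind('spam')      # 9, the highest index
missing = s.find('eggs')    # -1 signals failure
total = s.count('spam')     # 2

# Occurrences are counted without overlap, so this is 2 rather than 3.
overlapping = 'aaaa'.count('aa')

# index() reports a miss with an exception instead of -1.
index_raised = False
try:
    s.index('eggs')
except ValueError:
    index_raised = True
```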


.. function:: lower(s)

   Return a copy of *s*, but with upper case letters converted to lower case.


.. function:: split(s[, sep[, maxsplit]])

   Return a list of the words of the string *s*. If the optional second argument
   *sep* is absent or ``None``, the words are separated by arbitrary strings of
   whitespace characters (space, tab, newline, return, formfeed). If the second
   argument *sep* is present and not ``None``, it specifies a string to be used as
   the word separator. The returned list will then have one more item than the
   number of non-overlapping occurrences of the separator in the string. The
   optional third argument *maxsplit* defaults to 0. If it is nonzero, at most
   *maxsplit* number of splits occur, and the remainder of the string is returned
   as the final element of the list (thus, the list will have at most
   ``maxsplit+1`` elements).

   The behavior of split on an empty string depends on the value of *sep*. If *sep*
   is not specified, or specified as ``None``, the result will be an empty list.
   If *sep* is specified as any string, the result will be a list containing one
   element which is an empty string.
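
The cases described above, sketched with the equivalent ``split`` string
method and an explicit *maxsplit* where one is used:

```python
# No separator: runs of whitespace delimit the words.
words = 'a  b \t c'.split()            # ['a', 'b', 'c']

# Explicit separator: one more item than the number of separators.
fields = 'a,b,,c'.split(',')           # ['a', 'b', '', 'c']

# At most one split; the remainder is the final element.
limited = 'a,b,c'.split(',', 1)        # ['a', 'b,c']

# Empty input: the result depends on whether sep is given.
empty_default = ''.split()             # []
empty_with_sep = ''.split(',')         # ['']
```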


.. function:: rsplit(s[, sep[, maxsplit]])

   Return a list of the words of the string *s*, scanning *s* from the end. To all
   intents and purposes, the resulting list of words is the same as returned by
   :func:`split`, except when the optional third argument *maxsplit* is explicitly
   specified and nonzero. When *maxsplit* is nonzero, at most *maxsplit* number of
   splits -- the *rightmost* ones -- occur, and the remainder of the string is
   returned as the first element of the list (thus, the list will have at most
   ``maxsplit+1`` elements).

   .. versionadded:: 2.4
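
The difference only shows up when *maxsplit* limits the number of splits.
A sketch with the equivalent string methods and a made-up path:

```python
path = 'usr/local/lib/python'

# One split from the left keeps the remainder as the last element.
from_left = path.split('/', 1)     # ['usr', 'local/lib/python']

# One split from the right keeps the remainder as the first element.
from_right = path.rsplit('/', 1)   # ['usr/local/lib', 'python']
```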


.. function:: splitfields(s[, sep[, maxsplit]])

   This function behaves identically to :func:`split`. (In the past, :func:`split`
   was only used with one argument, while :func:`splitfields` was only used with
   two arguments.)


.. function:: join(words[, sep])

   Concatenate a list or tuple of words with intervening occurrences of *sep*.
   The default value for *sep* is a single space character. It is always true that
   ``string.join(string.split(s, sep), sep)`` equals *s*.
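
The round-trip property can be checked with the equivalent string methods,
assuming an explicit separator is used for both steps:

```python
s = 'a,b,,c'
sep = ','

# Splitting and rejoining on the same separator gives back the original.
words = s.split(sep)          # ['a', 'b', '', 'c']
rebuilt = sep.join(words)     # 'a,b,,c'

# A single space is the usual choice when joining plain words.
spaced = ' '.join(['to', 'be', 'or', 'not'])
```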


.. function:: joinfields(words[, sep])

   This function behaves identically to :func:`join`. (In the past, :func:`join`
   was only used with one argument, while :func:`joinfields` was only used with two
   arguments.) Note that there is no :meth:`joinfields` method on string objects;
   use the :meth:`join` method instead.


.. function:: lstrip(s[, chars])

   Return a copy of the string with leading characters removed. If *chars* is
   omitted or ``None``, whitespace characters are removed. If given and not
   ``None``, *chars* must be a string; the characters in the string will be
   stripped from the beginning of *s*.

   .. versionchanged:: 2.2.3
      The *chars* parameter was added. The *chars* parameter cannot be passed in
      earlier 2.2 versions.


.. function:: rstrip(s[, chars])

   Return a copy of the string with trailing characters removed. If *chars* is
   omitted or ``None``, whitespace characters are removed. If given and not
   ``None``, *chars* must be a string; the characters in the string will be
   stripped from the end of *s*.

   .. versionchanged:: 2.2.3
      The *chars* parameter was added. The *chars* parameter cannot be passed in
      earlier 2.2 versions.


.. function:: strip(s[, chars])

   Return a copy of the string with leading and trailing characters removed. If
   *chars* is omitted or ``None``, whitespace characters are removed. If given and
   not ``None``, *chars* must be a string; the characters in the string will be
   stripped from both ends of *s*.

   .. versionchanged:: 2.2.3
      The *chars* parameter was added. The *chars* parameter cannot be passed in
      earlier 2.2 versions.
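
Note that *chars* is treated as a set of characters, not as a prefix or
suffix. A sketch using the equivalent string methods:

```python
padded = '  --spam--  '

# Without chars, only whitespace is removed.
no_space = padded.strip()              # '--spam--'

# With chars, the listed characters are removed from the chosen end.
left_only = no_space.lstrip('-')       # 'spam--'
right_only = no_space.rstrip('-')      # '--spam'

# Any mix of the listed characters is stripped, in any order.
mixed = 'xyxspamyx'.strip('xy')        # 'spam'
```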


.. function:: swapcase(s)

   Return a copy of *s*, but with lower case letters converted to upper case and
   vice versa.


.. function:: translate(s, table[, deletechars])

   Delete all characters from *s* that are in *deletechars* (if present), and then
   translate the characters using *table*, which must be a 256-character string
   giving the translation for each character value, indexed by its ordinal. If
   *table* is ``None``, then only the character deletion step is performed.


.. function:: upper(s)

   Return a copy of *s*, but with lower case letters converted to upper case.


.. function:: ljust(s, width)
              rjust(s, width)
              center(s, width)

   These functions respectively left-justify, right-justify and center a string in
   a field of given width. They return a string that is at least *width*
   characters wide, created by padding the string *s* with spaces until the given
   width on the right, left or both sides. The string is never truncated.


.. function:: zfill(s, width)

   Pad a numeric string on the left with zero digits until the given width is
   reached. Strings starting with a sign are handled correctly.
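
The padding functions can be sketched with the equivalent string methods:

```python
# Padding goes on the right, on the left, or on both sides.
left = 'ab'.ljust(5)        # 'ab   '
right = 'ab'.rjust(5)       # '   ab'
middle = 'ab'.center(6)     # '  ab  '

# A string longer than the width is returned unchanged, never truncated.
unchanged = 'toolong'.ljust(3)

# zfill keeps a leading sign in front of the inserted zeros.
signed = '-42'.zfill(6)     # '-00042'
```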


.. function:: replace(str, old, new[, maxreplace])

   Return a copy of string *str* with all occurrences of substring *old* replaced
   by *new*. If the optional argument *maxreplace* is given, the first
   *maxreplace* occurrences are replaced.
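
A brief sketch of the effect of *maxreplace*, using the equivalent
``replace`` string method:

```python
text = 'a-b-c-d'

# Every occurrence is replaced by default.
everything = text.replace('-', '+')      # 'a+b+c+d'

# Only the first two occurrences are replaced.
first_two = text.replace('-', '+', 2)    # 'a+b+c-d'
```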
468