:mod:`string` --- Common string operations
==========================================

.. module:: string
   :synopsis: Common string operations.


.. index:: module: re

The :mod:`string` module contains a number of useful constants and
classes, as well as some deprecated legacy functions that are also
available as methods on strings. In addition, Python's built-in string
classes support the sequence type methods described in the
:ref:`typesseq` section, and also the string-specific methods described
in the :ref:`string-methods` section. To output formatted strings, use
template strings or the ``%`` operator described in the
:ref:`string-formatting` section. Also, see the :mod:`re` module for
string functions based on regular expressions.


String constants
----------------

The constants defined in this module are:


.. data:: ascii_letters

   The concatenation of the :const:`ascii_lowercase` and :const:`ascii_uppercase`
   constants described below. This value is not locale-dependent.


.. data:: ascii_lowercase

   The lowercase letters ``'abcdefghijklmnopqrstuvwxyz'``. This value is not
   locale-dependent and will not change.


.. data:: ascii_uppercase

   The uppercase letters ``'ABCDEFGHIJKLMNOPQRSTUVWXYZ'``. This value is not
   locale-dependent and will not change.


.. data:: digits

   The string ``'0123456789'``.


.. data:: hexdigits

   The string ``'0123456789abcdefABCDEF'``.


.. data:: letters

   The concatenation of the strings :const:`lowercase` and :const:`uppercase`
   described below. The specific value is locale-dependent, and will be updated
   when :func:`locale.setlocale` is called.


.. data:: lowercase

   A string containing all the characters that are considered lowercase letters.
   On most systems this is the string ``'abcdefghijklmnopqrstuvwxyz'``. Do not
   change its definition --- the effect on the routines :func:`upper` and
   :func:`swapcase` is undefined. The specific value is locale-dependent, and will
   be updated when :func:`locale.setlocale` is called.


.. data:: octdigits

   The string ``'01234567'``.


.. data:: punctuation

   String of ASCII characters which are considered punctuation characters in the
   ``C`` locale.


.. data:: printable

   String of characters which are considered printable. This is a combination of
   :const:`digits`, :const:`letters`, :const:`punctuation`, and
   :const:`whitespace`.


.. data:: uppercase

   A string containing all the characters that are considered uppercase letters.
   On most systems this is the string ``'ABCDEFGHIJKLMNOPQRSTUVWXYZ'``. Do not
   change its definition --- the effect on the routines :func:`lower` and
   :func:`swapcase` is undefined. The specific value is locale-dependent, and will
   be updated when :func:`locale.setlocale` is called.


.. data:: whitespace

   A string containing all characters that are considered whitespace. On most
   systems this includes the characters space, tab, linefeed, return, formfeed, and
   vertical tab. Do not change its definition --- the effect on the routines
   :func:`strip` and :func:`split` is undefined.
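
The locale-independent ASCII constants are convenient for simple character-class tests. The sketch below confirms how :const:`ascii_letters` is built and uses :const:`hexdigits` to check whether a string contains only hexadecimal digits. The helper name ``is_hex`` is hypothetical.

```python
import string

# ascii_letters is simply the two ASCII case constants joined together.
assert string.ascii_letters == string.ascii_lowercase + string.ascii_uppercase

# Build the set once; membership tests against a set are fast.
_HEX_CHARS = frozenset(string.hexdigits)


def is_hex(text):
    """Hypothetical helper: True if *text* is non-empty and only hex digits."""
    return bool(text) and all(ch in _HEX_CHARS for ch in text)


print(is_hex('DeadBeef'))  # True
print(is_hex('0x1F'))      # False: 'x' is not a hex digit
```

The example sticks to the ``ascii_*``, ``digits`` and ``hexdigits`` constants. Because of that, the result does not change when :func:`locale.setlocale` is called.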


Template strings
----------------

Templates provide simpler string substitutions as described in :pep:`292`.
Instead of the normal ``%``\ -based substitutions, Templates support ``$``\
-based substitutions, using the following rules:

* ``$$`` is an escape; it is replaced with a single ``$``.

* ``$identifier`` names a substitution placeholder matching a mapping key of
  ``"identifier"``. By default, ``"identifier"`` must spell a Python
  identifier. The first non-identifier character after the ``$`` character
  terminates this placeholder specification.

* ``${identifier}`` is equivalent to ``$identifier``. It is required when valid
  identifier characters follow the placeholder but are not part of the
  placeholder, such as ``"${noun}ification"``.

Any other appearance of ``$`` in the string will result in a :exc:`ValueError`
being raised.

.. versionadded:: 2.4
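
The sketch below exercises the rules with the :class:`Template` class described next. It shows the ``$$`` escape, a braced placeholder followed by identifier characters, and the :exc:`ValueError` raised for a stray ``$``. The placeholder names ``price`` and ``noun`` are arbitrary.

```python
from string import Template

# "$$" collapses to a literal "$", and "${noun}" is required because the
# letters in "ification" would otherwise be read as part of the name.
t = Template('$$${price} for ${noun}ification')
result = t.substitute(price='9.99', noun='gamer')
print(result)  # $9.99 for gamerification

# A "$" that starts no valid placeholder makes substitute() raise ValueError.
try:
    Template('Save 50$ today').substitute()
    error = None
except ValueError as exc:
    error = exc
```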

The :mod:`string` module provides a :class:`Template` class that implements
these rules. The methods of :class:`Template` are:


.. class:: Template(template)

   The constructor takes a single argument which is the template string.


.. method:: Template.substitute(mapping[, **kws])

   Performs the template substitution, returning a new string. *mapping* is any
   dictionary-like object with keys that match the placeholders in the template.
   Alternatively, you can provide keyword arguments, where the keywords are the
   placeholders. When both *mapping* and *kws* are given and there are duplicates,
   the placeholders from *kws* take precedence.
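
This sketch shows the precedence rule. Both the mapping and a keyword argument supply ``b``, and the keyword value is the one used. The names and values are illustrative.

```python
from string import Template

t = Template('$a and $b')
# "b" appears in the mapping and as a keyword; the keyword takes precedence.
result = t.substitute({'a': 'spam', 'b': 'eggs'}, b='ham')
print(result)  # spam and ham
```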


.. method:: Template.safe_substitute(mapping[, **kws])

   Like :meth:`substitute`, except that if placeholders are missing from *mapping*
   and *kws*, instead of raising a :exc:`KeyError` exception, the original
   placeholder will appear in the resulting string intact. Also, unlike with
   :meth:`substitute`, any other appearances of ``$`` are simply left as ``$``
   instead of raising :exc:`ValueError`.

   While other exceptions may still occur, this method is called "safe" because
   substitution always tries to return a usable string instead of raising an
   exception. In another sense, :meth:`safe_substitute` may be anything other than
   safe, since it will silently ignore malformed templates containing dangling
   delimiters, unmatched braces, or placeholders that are not valid Python
   identifiers.
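
The following sketch contrasts the two methods on one template that has both a dangling delimiter (``$5``) and a missing key (``item``). :meth:`safe_substitute` leaves both untouched. :meth:`substitute` rejects the template at the first invalid ``$``. The template text is illustrative.

```python
from string import Template

t = Template('Pay $5 to $payee for $item')

# The dangling "$" before "5" and the missing "item" key are both left as-is.
safe = t.safe_substitute(payee='Ann')
print(safe)  # Pay $5 to Ann for $item

# substitute() is strict: the same template raises ValueError.
try:
    t.substitute(payee='Ann', item='tea')
    strict_error = None
except ValueError as exc:
    strict_error = exc
```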

:class:`Template` instances also provide one public data attribute:


.. attribute:: Template.template

   This is the object passed to the constructor's *template* argument. In general,
   you shouldn't change it, but read-only access is not enforced.

Here is an example of how to use a Template::

   >>> from string import Template
   >>> s = Template('$who likes $what')
   >>> s.substitute(who='tim', what='kung pao')
   'tim likes kung pao'
   >>> d = dict(who='tim')
   >>> Template('Give $who $100').substitute(d)
   Traceback (most recent call last):
   [...]
   ValueError: Invalid placeholder in string: line 1, col 10
   >>> Template('$who likes $what').substitute(d)
   Traceback (most recent call last):
   [...]
   KeyError: 'what'
   >>> Template('$who likes $what').safe_substitute(d)
   'tim likes $what'

Advanced usage: you can derive subclasses of :class:`Template` to customize the
placeholder syntax, delimiter character, or the entire regular expression used
to parse template strings. To do this, you can override these class attributes:

* *delimiter* -- This is the literal string describing a placeholder-introducing
  delimiter. The default value is ``$``. Note that this should *not* be a regular
  expression, as the implementation will call :func:`re.escape` on this string as
  needed.

* *idpattern* -- This is the regular expression describing the pattern for
  non-braced placeholders (the braces will be added automatically as
  appropriate). The default value is the regular expression
  ``[_a-z][_a-z0-9]*``, which is matched case-insensitively.
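
Here is a minimal subclass that overrides both attributes. It uses ``%`` as the delimiter and narrows placeholder names so they cannot start with an underscore. The class name ``PercentTemplate`` and the chosen *idpattern* are illustrative only.

```python
from string import Template


class PercentTemplate(Template):
    """Hypothetical Template subclass using "%" as the delimiter."""

    delimiter = '%'                # plain text; passed through re.escape internally
    idpattern = '[a-z][_a-z0-9]*'  # placeholder names may not start with "_"


t = PercentTemplate('%name scored %pts%% on both %{unit}s')
result = t.substitute(name='ann', pts=93, unit='test')
print(result)  # ann scored 93% on both tests

# A leading underscore no longer starts a placeholder, so this is invalid.
try:
    PercentTemplate('%_secret').substitute(_secret='x')
    underscore_error = None
except ValueError as exc:
    underscore_error = exc
```

The escape sequence follows the delimiter automatically, so ``%%`` becomes a literal ``%``. The braced form ``%{unit}`` picks up the same *idpattern*.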

Alternatively, you can provide the entire regular expression pattern by
overriding the class attribute *pattern*. If you do this, the value must be a
regular expression object with four named capturing groups. The capturing
groups correspond to the rules given above, along with the invalid placeholder
rule:

* *escaped* -- This group matches the escape sequence, e.g. ``$$``, in the
  default pattern.

* *named* -- This group matches the unbraced placeholder name; it should not
  include the delimiter in the capturing group.

* *braced* -- This group matches the brace-enclosed placeholder name; it should
  not include either the delimiter or braces in the capturing group.

* *invalid* -- This group matches any other delimiter pattern (usually a single
  delimiter), and it should appear last in the regular expression.
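
The sketch below supplies a complete *pattern* with all four groups. It keeps the ``$`` delimiter but allows dots inside braced names, so that ``${user.name}`` looks up the key ``'user.name'``. The pattern is written as a string, which the class compiles in verbose mode when the subclass is defined. The class name ``DottedTemplate`` and the dotted-name syntax are hypothetical.

```python
from string import Template


class DottedTemplate(Template):
    """Hypothetical subclass: braced names may contain dots, e.g. ${user.name}."""

    # All four named groups must be present; "invalid" comes last.
    pattern = r'''
    \$(?:
      (?P<escaped>\$)                    |  # "$$" becomes "$"
      (?P<named>[_a-z][_a-z0-9]*)        |  # $name
      \{(?P<braced>[_a-z][_a-z0-9.]*)\}  |  # ${dotted.name}
      (?P<invalid>)                         # any other "$"
    )
    '''


t = DottedTemplate('Hi ${user.name}, you owe $$5')
result = t.substitute({'user.name': 'Ann'})
print(result)  # Hi Ann, you owe $5
```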


String functions
----------------

The following functions are available to operate on string and Unicode objects.
They are not available as string methods.


.. function:: capwords(s)

   Split the argument into words using :func:`split`, capitalize each word using
   :func:`capitalize`, and join the capitalized words using :func:`join`. Note
   that this replaces runs of whitespace characters by a single space, and removes
   leading and trailing whitespace.
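
This sketch shows :func:`capwords` collapsing whitespace runs and trimming the ends. It also spells out the same result with the split, capitalize and join steps from the description. The sample text is arbitrary.

```python
import string

s = '  hello   wide\tworld  '
print(string.capwords(s))  # Hello Wide World

# The same result, written out with the equivalent string methods.
manual = ' '.join(word.capitalize() for word in s.split())
```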


.. function:: maketrans(from, to)

   Return a translation table suitable for passing to :func:`translate`, that will
   map each character in *from* into the character at the same position in *to*;
   *from* and *to* must have the same length.

   .. warning::

      Don't use strings derived from :const:`lowercase` and :const:`uppercase` as
      arguments; in some locales, these don't have the same length. For case
      conversions, always use :func:`lower` and :func:`upper`.
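
Here is a sketch of building and applying a table. :func:`maketrans` exists only in Python 2. So that the snippet also runs under Python 3, it falls back to ``str.maketrans``, which returns a mapping suitable for ``str.translate`` rather than a 256-character string.

```python
try:
    from string import maketrans   # Python 2: returns a 256-character table
except ImportError:
    maketrans = str.maketrans       # Python 3: returns a dict for str.translate

# Each character of the first argument maps to the one at the same position
# in the second; both arguments have the same length.
table = maketrans('aeio', '4310')
print('hello world'.translate(table))  # h3ll0 w0rld
```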


Deprecated string functions
---------------------------

The following functions are also defined as methods of string and
Unicode objects; see section :ref:`string-methods` for more information on
those. You should consider these functions as deprecated, although they will
not be removed until Python 3.0. The functions defined in this module are:


.. function:: atof(s)

   .. deprecated:: 2.0
      Use the :func:`float` built-in function.

   .. index:: builtin: float

   Convert a string to a floating point number. The string must have the standard
   syntax for a floating point literal in Python, optionally preceded by a sign
   (``+`` or ``-``). Note that this behaves identically to the built-in function
   :func:`float` when passed a string.

   .. note::

      .. index::
         single: NaN
         single: Infinity

      When passing in a string, values for NaN and Infinity may be returned, depending
      on the underlying C library. The specific set of strings accepted which cause
      these values to be returned depends entirely on the C library and is known to
      vary.


.. function:: atoi(s[, base])

   .. deprecated:: 2.0
      Use the :func:`int` built-in function.

   .. index:: builtin: eval

   Convert string *s* to an integer in the given *base*. The string must consist
   of one or more digits, optionally preceded by a sign (``+`` or ``-``). The
   *base* defaults to 10. If it is 0, a default base is chosen depending on the
   leading characters of the string (after stripping the sign): ``0x`` or ``0X``
   means 16, ``0`` means 8, anything else means 10. If *base* is 16, a leading
   ``0x`` or ``0X`` is always accepted, though not required. This behaves
   identically to the built-in function :func:`int` when passed a string. (Also
   note: for a more flexible interpretation of numeric literals, use the built-in
   function :func:`eval`.)
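
The sketch below shows the built-in replacements recommended by the deprecation notes. It parses a float, a signed decimal integer, and hexadecimal strings with an explicit base and with base 0. The sample values are arbitrary.

```python
as_float = float('  -1.5e3 ')  # -1500.0, replacing string.atof
as_int = int('-42')            # -42, replacing string.atoi with base 10
from_hex = int('ff', 16)       # 255
prefixed = int('0x1F', 16)     # 31: a "0x" prefix is accepted when the base is 16
inferred = int('0x1F', 0)      # 31: base 0 picks the base from the prefix
```

Leading-zero octal strings with base 0 are not portable: ``int('017', 0)`` gives 15 in Python 2 but is rejected in Python 3.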


.. function:: atol(s[, base])

   .. deprecated:: 2.0
      Use the :func:`long` built-in function.

   .. index:: builtin: long

   Convert string *s* to a long integer in the given *base*. The string must
   consist of one or more digits, optionally preceded by a sign (``+`` or ``-``).
   The *base* argument has the same meaning as for :func:`atoi`. A trailing ``l``
   or ``L`` is not allowed, except if the base is 0. Note that when invoked
   without *base* or with *base* set to 10, this behaves identically to the built-in
   function :func:`long` when passed a string.


.. function:: capitalize(word)

   Return a copy of *word* with only its first character capitalized.


.. function:: expandtabs(s[, tabsize])

   Expand tabs in a string replacing them by one or more spaces, depending on the
   current column and the given tab size. The column number is reset to zero after
   each newline occurring in the string. This doesn't understand other non-printing
   characters or escape sequences. The tab size defaults to 8.
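
This sketch uses the equivalent string method. It shows that a tab advances to the next multiple of the tab size, and that the column count restarts after a newline.

```python
text = 'ab\tc\n\td'
expanded = text.expandtabs(4)
print(repr(expanded))  # 'ab  c\n    d'

# With the default tab size of 8, "a" is followed by 7 spaces.
default = 'a\tb'.expandtabs()
```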


.. function:: find(s, sub[, start[, end]])

   Return the lowest index in *s* where the substring *sub* is found such that
   *sub* is wholly contained in ``s[start:end]``. Return ``-1`` on failure.
   Defaults for *start* and *end* and interpretation of negative values are the same
   as for slices.


.. function:: rfind(s, sub[, start[, end]])

   Like :func:`find` but find the highest index.


.. function:: index(s, sub[, start[, end]])

   Like :func:`find` but raise :exc:`ValueError` when the substring is not found.


.. function:: rindex(s, sub[, start[, end]])

   Like :func:`rfind` but raise :exc:`ValueError` when the substring is not found.


.. function:: count(s, sub[, start[, end]])

   Return the number of (non-overlapping) occurrences of substring *sub* in string
   ``s[start:end]``. Defaults for *start* and *end* and interpretation of negative
   values are the same as for slices.
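
This sketch runs the searching functions through their equivalent string methods on one sample string. It shows slice-style *start* values, the ``-1`` failure result, the non-overlapping count, and the exception raised by :func:`index`.

```python
s = 'banana'

first = s.find('an')          # 1
last = s.rfind('an')          # 3
from_two = s.find('an', 2)    # 3: the search starts at index 2
tail = s.find('na', -2)       # 4: a negative start counts from the end
missing = s.find('x')         # -1
occurrences = s.count('ana')  # 1: matches do not overlap

try:
    s.index('x')              # index() raises instead of returning -1
    index_error = None
except ValueError as exc:
    index_error = exc
```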


.. function:: lower(s)

   Return a copy of *s*, but with upper case letters converted to lower case.


.. function:: split(s[, sep[, maxsplit]])

   Return a list of the words of the string *s*. If the optional second argument
   *sep* is absent or ``None``, the words are separated by arbitrary strings of
   whitespace characters (space, tab, newline, return, formfeed). If the second
   argument *sep* is present and not ``None``, it specifies a string to be used as
   the word separator. The returned list will then have one more item than the
   number of non-overlapping occurrences of the separator in the string. The
   optional third argument *maxsplit* defaults to 0. If it is nonzero, at most
   *maxsplit* number of splits occur, and the remainder of the string is returned
   as the final element of the list (thus, the list will have at most
   ``maxsplit+1`` elements).

   The behavior of :func:`split` on an empty string depends on the value of *sep*.
   If *sep* is not specified, or specified as ``None``, the result will be an empty
   list. If *sep* is specified as any string, the result will be a list containing
   one element which is an empty string.


.. function:: rsplit(s[, sep[, maxsplit]])

   Return a list of the words of the string *s*, scanning *s* from the end. To all
   intents and purposes, the resulting list of words is the same as returned by
   :func:`split`, except when the optional third argument *maxsplit* is explicitly
   specified and nonzero. When *maxsplit* is nonzero, at most *maxsplit* number of
   splits -- the *rightmost* ones -- occur, and the remainder of the string is
   returned as the first element of the list (thus, the list will have at most
   ``maxsplit+1`` elements).

   .. versionadded:: 2.4
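
This sketch shows the splitting rules with the equivalent string methods. It covers whitespace versus explicit separators, a nonzero *maxsplit* from each end, and the two empty-string cases.

```python
words = ' a  b\tc '.split()        # ['a', 'b', 'c']: whitespace runs collapse
fields = 'a,b,,c'.split(',')       # ['a', 'b', '', 'c']: 3 separators, 4 items
left = 'a b c d'.split(None, 2)    # ['a', 'b', 'c d']
right = 'a b c d'.rsplit(None, 2)  # ['a b', 'c', 'd']
empty_ws = ''.split()              # []
empty_sep = ''.split(',')          # ['']
```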


.. function:: splitfields(s[, sep[, maxsplit]])

   This function behaves identically to :func:`split`. (In the past, :func:`split`
   was only used with one argument, while :func:`splitfields` was only used with
   two arguments.)


.. function:: join(words[, sep])

   Concatenate a list or tuple of words with intervening occurrences of *sep*.
   The default value for *sep* is a single space character. It is always true that
   ``string.join(string.split(s, sep), sep)`` equals *s*.
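
This sketch checks the round-trip identity from the description using the method form. The method is called on the separator, so the argument order is reversed relative to :func:`join`.

```python
s = 'one,two,,three'
sep = ','
parts = s.split(sep)
rebuilt = sep.join(parts)      # method form of string.join(parts, sep)
spaced = ' '.join(['a', 'b'])  # 'a b': a single space, like the function's default
```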


.. function:: joinfields(words[, sep])

   This function behaves identically to :func:`join`. (In the past, :func:`join`
   was only used with one argument, while :func:`joinfields` was only used with two
   arguments.) Note that there is no :meth:`joinfields` method on string objects;
   use the :meth:`join` method instead.


.. function:: lstrip(s[, chars])

   Return a copy of *s* with leading characters removed. If *chars* is
   omitted or ``None``, whitespace characters are removed. If given and not
   ``None``, *chars* must be a string; the characters in the string will be
   stripped from the beginning of *s*.

   .. versionchanged:: 2.2.3
      The *chars* parameter was added. The *chars* parameter cannot be passed in
      earlier 2.2 versions.


.. function:: rstrip(s[, chars])

   Return a copy of *s* with trailing characters removed. If *chars* is
   omitted or ``None``, whitespace characters are removed. If given and not
   ``None``, *chars* must be a string; the characters in the string will be
   stripped from the end of *s*.

   .. versionchanged:: 2.2.3
      The *chars* parameter was added. The *chars* parameter cannot be passed in
      earlier 2.2 versions.


.. function:: strip(s[, chars])

   Return a copy of *s* with leading and trailing characters removed. If
   *chars* is omitted or ``None``, whitespace characters are removed. If given and
   not ``None``, *chars* must be a string; the characters in the string will be
   stripped from both ends of *s*.

   .. versionchanged:: 2.2.3
      The *chars* parameter was added. The *chars* parameter cannot be passed in
      earlier 2.2 versions.
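
This sketch uses the equivalent string methods. It shows that *chars* is treated as a set of characters to remove from the ends, not as a prefix or suffix string. The sample strings are arbitrary.

```python
padded = '  spam  '
both = padded.strip()    # 'spam'
left = padded.lstrip()   # 'spam  '
right = padded.rstrip()  # '  spam'

# Any of the characters c, m, o, w, z or "." are stripped, in any order.
host = 'www.example.com'.strip('cmowz.')  # 'example'
```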


.. function:: swapcase(s)

   Return a copy of *s*, but with lower case letters converted to upper case and
   vice versa.


.. function:: translate(s, table[, deletechars])

   Delete all characters from *s* that are in *deletechars* (if present), and then
   translate the characters using *table*, which must be a 256-character string
   giving the translation for each character value, indexed by its ordinal. If
   *table* is ``None``, then only the character deletion step is performed.
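
Python 3 keeps these semantics (a 256-entry table plus optional characters to delete) on byte strings. The sketch below therefore uses byte literals so that it runs on either version. It builds an identity table, remaps ``e`` to ``3``, and shows that deletion happens before translation.

```python
# Identity table: entry i maps byte value i to itself.
table = bytearray(range(256))
table[ord('e')] = ord('3')  # remap "e" to "3"
table = bytes(table)        # 256-byte table (a 256-character str in Python 2)

# Deletion happens first, then the remaining characters are translated.
result = b'hello'.translate(table, b'l')  # b'h3o'

# With table=None only the deletion step is performed.
deleted_only = b'hello'.translate(None, b'l')  # b'heo'
```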


.. function:: upper(s)

   Return a copy of *s*, but with lower case letters converted to upper case.


.. function:: ljust(s, width)
              rjust(s, width)
              center(s, width)

   These functions respectively left-justify, right-justify and center a string in
   a field of given width. They return a string that is at least *width*
   characters wide, created by padding *s* with spaces on the right, left, or
   both sides, respectively, until it reaches the given width. The string is never
   truncated.
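
This sketch uses the equivalent string methods to show where the padding goes for each function. It also shows that a string longer than *width* is returned unchanged.

```python
s = 'ab'
left = s.ljust(6)                  # 'ab    '
right = s.rjust(6)                 # '    ab'
centered = s.center(6)             # '  ab  '
long_value = 'abcdefgh'.center(4)  # 'abcdefgh': never truncated
```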


.. function:: zfill(s, width)

   Pad a numeric string on the left with zero digits until the given width is
   reached. Strings starting with a sign are handled correctly.
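
This sketch uses the equivalent string method. It shows what "handled correctly" means for signed strings: the zeros are inserted after the sign.

```python
plain = '42'.zfill(5)      # '00042'
negative = '-42'.zfill(5)  # '-0042': zeros go after the sign
decimal = '+3.5'.zfill(6)  # '+003.5'
wide = '12345'.zfill(3)    # '12345': already at least as wide as requested
```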


.. function:: replace(str, old, new[, maxreplace])

   Return a copy of string *str* with all occurrences of substring *old* replaced
   by *new*. If the optional argument *maxreplace* is given, only the first
   *maxreplace* occurrences are replaced.

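
This sketch uses the equivalent string method. It shows the effect of *maxreplace*: only the leftmost matches are replaced.

```python
s = 'a-b-c-d'
all_replaced = s.replace('-', '+')  # 'a+b+c+d'
first_two = s.replace('-', '+', 2)  # 'a+b+c-d'
```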