Blame - Doc/library/string.rst - platform/external/python/cpython2


Georg Brandl 8ec7f65 2007-08-15 14:28:01 +0000
				2	:mod:`string` --- Common string operations
				3	==========================================
				4
				5	.. module:: string
				6	:synopsis: Common string operations.
				7
				8
				9	.. index:: module: re
				10
				11	The :mod:`string` module contains a number of useful constants and
				12	classes, as well as some deprecated legacy functions that are also
				13	available as methods on strings. In addition, Python's built-in string
				14	classes support the sequence type methods described in the
				15	:ref:`typesseq` section, and also the string-specific methods described
				16	in the :ref:`string-methods` section. To output formatted strings use
				17	template strings or the ``%`` operator described in the
				18	:ref:`string-formatting` section. Also, see the :mod:`re` module for
				19	string functions based on regular expressions.
				20
				21
				22	String constants
				23	----------------
				24
				25	The constants defined in this module are:
				26
				27
				28	.. data:: ascii_letters
				29
				30	The concatenation of the :const:`ascii_lowercase` and :const:`ascii_uppercase`
				31	constants described below. This value is not locale-dependent.
				32
				33
				34	.. data:: ascii_lowercase
				35
				36	The lowercase letters ``'abcdefghijklmnopqrstuvwxyz'``. This value is not
				37	locale-dependent and will not change.
				38
				39
				40	.. data:: ascii_uppercase
				41
				42	The uppercase letters ``'ABCDEFGHIJKLMNOPQRSTUVWXYZ'``. This value is not
				43	locale-dependent and will not change.
				44
				45
				46	.. data:: digits
				47
				48	The string ``'0123456789'``.
				49
				50
				51	.. data:: hexdigits
				52
				53	The string ``'0123456789abcdefABCDEF'``.
				54
				55
				56	.. data:: letters
				57
				58	The concatenation of the strings :const:`lowercase` and :const:`uppercase`
				59	described below. The specific value is locale-dependent, and will be updated
				60	when :func:`locale.setlocale` is called.
				61
				62
				63	.. data:: lowercase
				64
				65	A string containing all the characters that are considered lowercase letters.
				66	On most systems this is the string ``'abcdefghijklmnopqrstuvwxyz'``. Do not
				67	change its definition --- the effect on the routines :func:`upper` and
				68	:func:`swapcase` is undefined. The specific value is locale-dependent, and will
				69	be updated when :func:`locale.setlocale` is called.
				70
				71
				72	.. data:: octdigits
				73
				74	The string ``'01234567'``.
				75
				76
				77	.. data:: punctuation
				78
				79	String of ASCII characters which are considered punctuation characters in the
				80	``C`` locale.
				81
				82
				83	.. data:: printable
				84
				85	String of characters which are considered printable. This is a combination of
				86	:const:`digits`, :const:`letters`, :const:`punctuation`, and
				87	:const:`whitespace`.
				88
				89
				90	.. data:: uppercase
				91
				92	A string containing all the characters that are considered uppercase letters.
				93	On most systems this is the string ``'ABCDEFGHIJKLMNOPQRSTUVWXYZ'``. Do not
				94	change its definition --- the effect on the routines :func:`lower` and
				95	:func:`swapcase` is undefined. The specific value is locale-dependent, and will
				96	be updated when :func:`locale.setlocale` is called.
				97
				98
				99	.. data:: whitespace
				100
				101	A string containing all characters that are considered whitespace. On most
				102	systems this includes the characters space, tab, linefeed, return, formfeed, and
				103	vertical tab. Do not change its definition --- the effect on the routines
				104	:func:`strip` and :func:`split` is undefined.
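
As a small illustration of how these constants tend to be used, the sketch
below tests membership against :const:`hexdigits` and checks how
:const:`ascii_letters` is composed. The helper name ``is_hex`` is made up for
this example. Only the locale-independent constants are used, so the results
should not depend on the current locale.

```python
import string

def is_hex(text):
    """Return True if text is non-empty and made only of hexadecimal digits."""
    return bool(text) and all(ch in string.hexdigits for ch in text)

# ascii_letters is ascii_lowercase followed by ascii_uppercase.
letters_ok = string.ascii_letters == string.ascii_lowercase + string.ascii_uppercase

hex_ok = is_hex('DeadBeef')   # every character is in hexdigits
hex_bad = is_hex('0x1F')      # 'x' is not a hexadecimal digit
```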
				105
				106
				107	Template strings
				108	----------------
				109
				110	Templates provide simpler string substitutions as described in :pep:`292`.
				111	Instead of the normal ``%``\ -based substitutions, Templates support ``$``\
				112	-based substitutions, using the following rules:
				113
				114	* ``$$`` is an escape; it is replaced with a single ``$``.
				115
				116	* ``$identifier`` names a substitution placeholder matching a mapping key of
				117	``"identifier"``. By default, ``"identifier"`` must spell a Python
				118	identifier. The first non-identifier character after the ``$`` character
				119	terminates this placeholder specification.
				120
				121	* ``${identifier}`` is equivalent to ``$identifier``. It is required when valid
				122	identifier characters follow the placeholder but are not part of the
				123	placeholder, such as ``"${noun}ification"``.
				124
				125	Any other appearance of ``$`` in the string will result in a :exc:`ValueError`
				126	being raised.
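
The three rules can be seen together in one template. This is a minimal
sketch; the template text and the ``noun`` value are invented for the example.

```python
from string import Template

# $$ is an escape, $noun is a plain placeholder, and ${noun} is needed
# because identifier characters ("ification") follow it immediately.
t = Template('$$5 reward for the ${noun}ification of $noun')
result = t.substitute(noun='test')

# A "$" that matches none of the rules makes substitute() raise ValueError.
try:
    Template('50$ off').substitute()
    dangling_rejected = False
except ValueError:
    dangling_rejected = True
```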
				127
				128	.. versionadded:: 2.4
				129
				130	The :mod:`string` module provides a :class:`Template` class that implements
				131	these rules. The methods of :class:`Template` are:
				132
				133
				134	.. class:: Template(template)
				135
				136	The constructor takes a single argument which is the template string.
				137
				138
				139	.. method:: Template.substitute(mapping[, **kws])
				140
Performs the template substitution, returning a new string. *mapping* is any
dictionary-like object with keys that match the placeholders in the template.
Alternatively, you can provide keyword arguments, where the keywords are the
placeholders. When both *mapping* and *kws* are given and there are duplicates,
the placeholders from *kws* take precedence.
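
A short sketch of the precedence rule, using an invented ``defaults``
dictionary: the same template is filled from a mapping alone, from a mapping
plus an overriding keyword, and from keywords alone.

```python
from string import Template

t = Template('$who likes $what')
defaults = {'who': 'tim', 'what': 'tea'}

from_mapping = t.substitute(defaults)
# 'what' appears in both places; the keyword argument takes precedence.
overridden = t.substitute(defaults, what='coffee')
keywords_only = t.substitute(who='ann', what='cake')
```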
				146
				147
				148	.. method:: Template.safe_substitute(mapping[, **kws])
				149
				150	Like :meth:`substitute`, except that if placeholders are missing from mapping
				151	and kws, instead of raising a :exc:`KeyError` exception, the original
				152	placeholder will appear in the resulting string intact. Also, unlike with
				153	:meth:`substitute`, any other appearances of the ``$`` will simply return ``$``
				154	instead of raising :exc:`ValueError`.
				155
				156	While other exceptions may still occur, this method is called "safe" because
substitution always tries to return a usable string instead of raising an
				158	exception. In another sense, :meth:`safe_substitute` may be anything other than
				159	safe, since it will silently ignore malformed templates containing dangling
				160	delimiters, unmatched braces, or placeholders that are not valid Python
				161	identifiers.
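
The difference between the two methods on a malformed template can be
sketched as follows. The template text is invented; it contains an invalid
placeholder (``$100``) and an unmatched brace (``${item``).

```python
from string import Template

t = Template('Give $who $100 for ${item')

# substitute() rejects the malformed template.
try:
    t.substitute(who='tim')
    strict_error = None
except ValueError as exc:
    strict_error = exc

# safe_substitute() fills in what it can and leaves the rest untouched,
# which also means the mistakes in the template go unreported.
lenient = t.safe_substitute(who='tim')
```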
				162
				163	:class:`Template` instances also provide one public data attribute:
				164
				165
				166	.. attribute:: string.template
				167
				168	This is the object passed to the constructor's template argument. In general,
				169	you shouldn't change it, but read-only access is not enforced.
				170
				171	Here is an example of how to use a Template::
				172
				173	>>> from string import Template
				174	>>> s = Template('$who likes $what')
				175	>>> s.substitute(who='tim', what='kung pao')
				176	'tim likes kung pao'
				177	>>> d = dict(who='tim')
				178	>>> Template('Give $who $100').substitute(d)
				179	Traceback (most recent call last):
				180	[...]
				181	ValueError: Invalid placeholder in string: line 1, col 10
				182	>>> Template('$who likes $what').substitute(d)
				183	Traceback (most recent call last):
				184	[...]
				185	KeyError: 'what'
				186	>>> Template('$who likes $what').safe_substitute(d)
				187	'tim likes $what'
				188
				189	Advanced usage: you can derive subclasses of :class:`Template` to customize the
				190	placeholder syntax, delimiter character, or the entire regular expression used
				191	to parse template strings. To do this, you can override these class attributes:
				192
* *delimiter* -- This is the literal string that introduces a placeholder.
The default value is ``$``. Note that this should not be a regular
expression, as the implementation will call :meth:`re.escape` on this string as
needed.
				197
				198	* idpattern -- This is the regular expression describing the pattern for
				199	non-braced placeholders (the braces will be added automatically as
				200	appropriate). The default value is the regular expression
				201	``[_a-z][_a-z0-9]*``.
				202
				203	Alternatively, you can provide the entire regular expression pattern by
				204	overriding the class attribute pattern. If you do this, the value must be a
				205	regular expression object with four named capturing groups. The capturing
				206	groups correspond to the rules given above, along with the invalid placeholder
				207	rule:
				208
				209	* escaped -- This group matches the escape sequence, e.g. ``$$``, in the
				210	default pattern.
				211
* named -- This group matches the unbraced placeholder name; it should not
include the delimiter in the capturing group.

* braced -- This group matches the brace-enclosed placeholder name; it should
not include either the delimiter or braces in the capturing group.
				217
				218	* invalid -- This group matches any other delimiter pattern (usually a single
				219	delimiter), and it should appear last in the regular expression.
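
As a sketch of the simpler customizations, the two subclasses below override
``delimiter`` and ``idpattern`` respectively. Both class names and the dotted
identifier pattern are hypothetical choices made for this example, not part of
the module.

```python
from string import Template

class PercentTemplate(Template):
    # Hypothetical subclass: "%" introduces placeholders instead of "$".
    delimiter = '%'

class DottedTemplate(Template):
    # Hypothetical subclass: placeholder names may contain dotted parts.
    idpattern = r'[_a-z][_a-z0-9]*(?:\.[_a-z][_a-z0-9]*)*'

# With "%" as the delimiter, "$10" is ordinary text.
percent = PercentTemplate('%who owes $10').substitute(who='tim')

# The mapping key is the whole dotted name, looked up as one string.
dotted = DottedTemplate('Hello, ${user.name}!').substitute({'user.name': 'tim'})
```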
				220
				221
				222	String functions
				223	----------------
				224
				225	The following functions are available to operate on string and Unicode objects.
				226	They are not available as string methods.
				227
				228
				229	.. function:: capwords(s)
				230
				231	Split the argument into words using :func:`split`, capitalize each word using
				232	:func:`capitalize`, and join the capitalized words using :func:`join`. Note
				233	that this replaces runs of whitespace characters by a single space, and removes
				234	leading and trailing whitespace.
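
For instance, with an invented input that has irregular spacing:

```python
import string

messy = '  the   quick\tbrown fox  '
# Runs of whitespace collapse to one space; the ends are trimmed.
tidy = string.capwords(messy)
```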
				235
				236
				237	.. function:: maketrans(from, to)
				238
Return a translation table suitable for passing to :func:`translate`, that will
map each character in *from* into the character at the same position in *to*;
*from* and *to* must have the same length.
				242
				243	.. warning::
				244
				245	Don't use strings derived from :const:`lowercase` and :const:`uppercase` as
				246	arguments; in some locales, these don't have the same length. For case
				247	conversions, always use :func:`lower` and :func:`upper`.
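
A minimal sketch of building and applying a table. The ``try``/``except``
import is an assumption added so the example also runs on Python 3, where the
corresponding function is ``bytes.maketrans``; on Python 2 the first branch is
the function documented here.

```python
try:
    from string import maketrans      # Python 2: the function described above
except ImportError:
    maketrans = bytes.maketrans       # Python 3 counterpart (assumption)

# Map a->x, b->y, c->z; every other character maps to itself.
table = maketrans(b'abc', b'xyz')
translated = b'aabbcc cab'.translate(table)

# Arguments of different lengths are rejected.
try:
    maketrans(b'abc', b'xy')
    length_checked = False
except ValueError:
    length_checked = True
```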
				248
				249
				250	Deprecated string functions
				251	---------------------------
				252
The following functions are also defined as methods of string and
Unicode objects; see section :ref:`string-methods` for more information on
those. You should consider these functions as deprecated, although they will
not be removed until Python 3.0. The functions defined in this module are:
				257
				258
				259	.. function:: atof(s)
				260
				261	.. deprecated:: 2.0
				262	Use the :func:`float` built-in function.
				263
				264	.. index:: builtin: float
				265
Convert a string to a floating point number. The string must have the standard
syntax for a floating point literal in Python, optionally preceded by a sign
(``+`` or ``-``). Note that this behaves identically to the built-in function
:func:`float` when passed a string.
				270
				271	.. note::
				272
				273	.. index::
				274	single: NaN
				275	single: Infinity
				276
				277	When passing in a string, values for NaN and Infinity may be returned, depending
				278	on the underlying C library. The specific set of strings accepted which cause
				279	these values to be returned depends entirely on the C library and is known to
				280	vary.
				281
				282
				283	.. function:: atoi(s[, base])
				284
				285	.. deprecated:: 2.0
				286	Use the :func:`int` built-in function.
				287
				288	.. index:: builtin: eval
				289
				290	Convert string s to an integer in the given base. The string must consist
				291	of one or more digits, optionally preceded by a sign (``+`` or ``-``). The
				292	base defaults to 10. If it is 0, a default base is chosen depending on the
				293	leading characters of the string (after stripping the sign): ``0x`` or ``0X``
				294	means 16, ``0`` means 8, anything else means 10. If base is 16, a leading
				295	``0x`` or ``0X`` is always accepted, though not required. This behaves
				296	identically to the built-in function :func:`int` when passed a string. (Also
				297	note: for a more flexible interpretation of numeric literals, use the built-in
				298	function :func:`eval`.)
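
Since the text above says the function behaves identically to :func:`int` for
string arguments, the base rules can be sketched with :func:`int` itself, which
keeps the example runnable on Python 3 as well. The leading-``0`` octal case is
left out because newer Python versions spell octal literals differently.

```python
decimal = int('-42')             # base defaults to 10; a sign is allowed
explicit_hex = int('ff', 16)     # the 0x prefix is optional when base is 16
prefixed_hex = int('0X1F', 16)   # but a leading 0x or 0X is accepted
guessed_hex = int('0x1f', 0)     # base 0: the prefix selects base 16
guessed_dec = int('31', 0)       # base 0 with no prefix means base 10
```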
				299
				300
				301	.. function:: atol(s[, base])
				302
				303	.. deprecated:: 2.0
				304	Use the :func:`long` built-in function.
				305
				306	.. index:: builtin: long
				307
				308	Convert string s to a long integer in the given base. The string must
				309	consist of one or more digits, optionally preceded by a sign (``+`` or ``-``).
The *base* argument has the same meaning as for :func:`atoi`. A trailing ``l``
or ``L`` is not allowed, except if the base is 0. Note that when invoked
without *base* or with *base* set to 10, this behaves identically to the built-in
function :func:`long` when passed a string.
				314
				315
				316	.. function:: capitalize(word)
				317
				318	Return a copy of word with only its first character capitalized.
				319
				320
				321	.. function:: expandtabs(s[, tabsize])
				322
				323	Expand tabs in a string replacing them by one or more spaces, depending on the
				324	current column and the given tab size. The column number is reset to zero after
				325	each newline occurring in the string. This doesn't understand other non-printing
				326	characters or escape sequences. The tab size defaults to 8.
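
The column tracking can be sketched with the equivalent string method on an
invented two-line string:

```python
text = 'a\tbc\td\nxy\tz'

# With a tab size of 4, each tab pads to the next multiple of 4, and the
# column count starts again at zero after the newline.
expanded = text.expandtabs(4)

# Without an argument the tab size is 8.
default_width = 'a\tb'.expandtabs()
```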
				327
				328
				329	.. function:: find(s, sub[, start[,end]])
				330
Return the lowest index in *s* where the substring *sub* is found such that
*sub* is wholly contained in ``s[start:end]``. Return ``-1`` on failure.
Defaults for *start* and *end* and interpretation of negative values are the
same as for slices.
				335
				336
				337	.. function:: rfind(s, sub[, start[, end]])
				338
				339	Like :func:`find` but find the highest index.
				340
				341
				342	.. function:: index(s, sub[, start[, end]])
				343
				344	Like :func:`find` but raise :exc:`ValueError` when the substring is not found.
				345
				346
				347	.. function:: rindex(s, sub[, start[, end]])
				348
				349	Like :func:`rfind` but raise :exc:`ValueError` when the substring is not found.
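
The four search functions differ only in direction and in how a failed search
is reported. A short sketch using the equivalent string methods:

```python
s = 'spam and spam'

first = s.find('spam')          # lowest index
last = s.rfind('spam')          # highest index
from_one = s.find('spam', 1)    # search restricted to s[1:]
cut_short = s.find('spam', 0, 3)  # 'spam' does not fit inside s[0:3]
missing = s.find('eggs')        # failure is reported as -1

# index() and rindex() raise ValueError instead of returning -1.
try:
    s.index('eggs')
    index_raised = False
except ValueError:
    index_raised = True
```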
				350
				351
				352	.. function:: count(s, sub[, start[, end]])
				353
				354	Return the number of (non-overlapping) occurrences of substring sub in string
				355	``s[start:end]``. Defaults for start and end and interpretation of negative
				356	values are the same as for slices.
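
"Non-overlapping" means that a match consumes its characters before the search
continues, as this sketch with the equivalent string method shows:

```python
# 'aa' fits into 'aaaa' at indexes 0 and 2; the overlapping match at 1 is skipped.
pairs = 'aaaa'.count('aa')

whole = 'banana'.count('an')       # matches at indexes 1 and 3
tail = 'banana'.count('an', 2)     # only s[2:] == 'nana' is searched
```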
				357
				358
				359	.. function:: lower(s)
				360
				361	Return a copy of s, but with upper case letters converted to lower case.
				362
				363
				364	.. function:: split(s[, sep[, maxsplit]])
				365
				366	Return a list of the words of the string s. If the optional second argument
				367	sep is absent or ``None``, the words are separated by arbitrary strings of
				368	whitespace characters (space, tab, newline, return, formfeed). If the second
				369	argument sep is present and not ``None``, it specifies a string to be used as
				370	the word separator. The returned list will then have one more item than the
				371	number of non-overlapping occurrences of the separator in the string. The
				372	optional third argument maxsplit defaults to 0. If it is nonzero, at most
				373	maxsplit number of splits occur, and the remainder of the string is returned
				374	as the final element of the list (thus, the list will have at most
				375	``maxsplit+1`` elements).
				376
				377	The behavior of split on an empty string depends on the value of sep. If sep
				378	is not specified, or specified as ``None``, the result will be an empty list.
				379	If sep is specified as any string, the result will be a list containing one
				380	element which is an empty string.
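
The cases described above can be sketched with the equivalent string method.
Note that the method takes the same *sep* and *maxsplit* arguments but is
called on the string itself.

```python
# No separator: split on runs of whitespace, ignoring leading/trailing runs.
words = 'a  b \t c\n'.split()

# Explicit separator: adjacent separators produce empty strings.
fields = 'a,b,,c'.split(',')

# At most two splits; the rest of the string is the last element.
limited = 'a,b,c,d'.split(',', 2)

# The empty string behaves differently depending on sep.
empty_default = ''.split()
empty_sep = ''.split(',')
```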
				381
				382
				383	.. function:: rsplit(s[, sep[, maxsplit]])
				384
				385	Return a list of the words of the string s, scanning s from the end. To all
				386	intents and purposes, the resulting list of words is the same as returned by
				387	:func:`split`, except when the optional third argument maxsplit is explicitly
				388	specified and nonzero. When maxsplit is nonzero, at most maxsplit number of
				389	splits -- the rightmost ones -- occur, and the remainder of the string is
				390	returned as the first element of the list (thus, the list will have at most
				391	``maxsplit+1`` elements).
				392
				393	.. versionadded:: 2.4
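
A sketch contrasting the two directions with the equivalent string methods;
the path string is just an illustrative value.

```python
s = 'a,b,c,d'

from_left = s.split(',', 2)     # remainder ends up last
from_right = s.rsplit(',', 2)   # remainder ends up first

# A common use: separate the last component from everything before it.
head, tail = '/usr/local/lib'.rsplit('/', 1)
```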
				394
				395
				396	.. function:: splitfields(s[, sep[, maxsplit]])
				397
				398	This function behaves identically to :func:`split`. (In the past, :func:`split`
				399	was only used with one argument, while :func:`splitfields` was only used with
				400	two arguments.)
				401
				402
				403	.. function:: join(words[, sep])
				404
				405	Concatenate a list or tuple of words with intervening occurrences of sep.
				406	The default value for sep is a single space character. It is always true that
				407	``string.join(string.split(s, sep), sep)`` equals s.
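
The round trip can be sketched with the equivalent string methods. The last
line shows that the identity relies on passing the same explicit separator to
both calls; splitting on whitespace and joining with a space normalizes the
spacing instead.

```python
words = ['spam', 'and', 'eggs']
sentence = ' '.join(words)

# Splitting and joining with the same explicit separator restores the input.
s = 'a,b,,c'
round_trip = ','.join(s.split(','))

# Whitespace splitting discards the original spacing.
normalized = ' '.join('a   b'.split())
```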
				408
				409
				410	.. function:: joinfields(words[, sep])
				411
				412	This function behaves identically to :func:`join`. (In the past, :func:`join`
				413	was only used with one argument, while :func:`joinfields` was only used with two
				414	arguments.) Note that there is no :meth:`joinfields` method on string objects;
				415	use the :meth:`join` method instead.
				416
				417
				418	.. function:: lstrip(s[, chars])
				419
				420	Return a copy of the string with leading characters removed. If chars is
				421	omitted or ``None``, whitespace characters are removed. If given and not
				422	``None``, chars must be a string; the characters in the string will be
				423	stripped from the beginning of the string this method is called on.
				424
				425	.. versionchanged:: 2.2.3
				426	The chars parameter was added. The chars parameter cannot be passed in
				427	earlier 2.2 versions.
				428
				429
				430	.. function:: rstrip(s[, chars])
				431
				432	Return a copy of the string with trailing characters removed. If chars is
				433	omitted or ``None``, whitespace characters are removed. If given and not
				434	``None``, chars must be a string; the characters in the string will be
				435	stripped from the end of the string this method is called on.
				436
				437	.. versionchanged:: 2.2.3
				438	The chars parameter was added. The chars parameter cannot be passed in
				439	earlier 2.2 versions.
				440
				441
				442	.. function:: strip(s[, chars])
				443
Return a copy of the string with leading and trailing characters removed. If
*chars* is omitted or ``None``, whitespace characters are removed. If given and
not ``None``, *chars* must be a string; the characters in the string will be
stripped from both ends of the string this method is called on.
				448
				449	.. versionchanged:: 2.2.3
				450	The chars parameter was added. The chars parameter cannot be passed in
				451	earlier 2.2 versions.
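
A short sketch of the three stripping functions, using the equivalent string
methods. The last line illustrates that *chars* is treated as a set of
characters to remove, not as a prefix or suffix.

```python
padded = '  spam  '

both = padded.strip()
left = padded.lstrip()
right = padded.rstrip()

# Any of the characters c, m, o, w, z and "." is removed from either end
# until a character outside that set is reached.
site = 'www.example.com'.strip('cmowz.')
```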
				452
				453
				454	.. function:: swapcase(s)
				455
				456	Return a copy of s, but with lower case letters converted to upper case and
				457	vice versa.
				458
				459
				460	.. function:: translate(s, table[, deletechars])
				461
				462	Delete all characters from s that are in deletechars (if present), and then
				463	translate the characters using table, which must be a 256-character string
				464	giving the translation for each character value, indexed by its ordinal. If
				465	table is ``None``, then only the character deletion step is performed.
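
The order of operations (delete first, then translate) can be sketched as
follows. As an assumption made so the example also runs on Python 3, the table
is built with ``bytes.maketrans`` when :func:`string.maketrans` is not
available, and the translation is applied through the ``translate`` method on
byte strings.

```python
try:
    from string import maketrans      # Python 2
except ImportError:
    maketrans = bytes.maketrans       # Python 3 counterpart (assumption)

table = maketrans(b'lo', b'01')       # l -> 0, o -> 1

# Spaces are deleted first, then the remaining characters are mapped.
deleted_then_mapped = b'hello world'.translate(table, b' ')

# With None as the table, only the deletion step happens.
only_deleted = b'hello world'.translate(None, b'lo')
```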
				466
				467
				468	.. function:: upper(s)
				469
				470	Return a copy of s, but with lower case letters converted to upper case.
				471
				472
				473	.. function:: ljust(s, width)
				474	rjust(s, width)
				475	center(s, width)
				476
				477	These functions respectively left-justify, right-justify and center a string in
				478	a field of given width. They return a string that is at least width
				479	characters wide, created by padding the string s with spaces until the given
				480	width on the right, left or both sides. The string is never truncated.
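
For example, using the equivalent string methods:

```python
left = 'ab'.ljust(5)        # padding goes on the right
right = 'ab'.rjust(5)       # padding goes on the left
middle = 'ab'.center(6)     # padding is split between both sides

# A string longer than the requested width is returned unchanged.
untouched = 'abcdef'.ljust(3)
```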
				481
				482
				483	.. function:: zfill(s, width)
				484
				485	Pad a numeric string on the left with zero digits until the given width is
				486	reached. Strings starting with a sign are handled correctly.
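
A sketch of the sign handling, using the equivalent string method:

```python
plain = '42'.zfill(5)
negative = '-42'.zfill(5)     # zeros are inserted after the sign
positive = '+3.5'.zfill(6)
too_long = '12345'.zfill(3)   # already wide enough, returned unchanged
```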
				487
				488
				489	.. function:: replace(str, old, new[, maxreplace])
				490
				491	Return a copy of string str with all occurrences of substring old replaced
				492	by new. If the optional argument maxreplace is given, the first
				493	maxreplace occurrences are replaced.
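
For example, using the equivalent string method:

```python
s = 'spam spam spam'

everywhere = s.replace('spam', 'eggs')
# With a limit of 2, only the first two occurrences are replaced.
first_two = s.replace('spam', 'eggs', 2)
```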
				494