Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 1 | |
| 2 | :mod:`string` --- Common string operations |
| 3 | ========================================== |
| 4 | |
| 5 | .. module:: string |
| 6 | :synopsis: Common string operations. |
| 7 | |
| 8 | |
| 9 | .. index:: module: re |
| 10 | |
Georg Brandl | 4b49131 | 2007-08-31 09:22:56 +0000 | [diff] [blame] | 11 | The :mod:`string` module contains a number of useful constants and classes, as |
| 12 | well as some deprecated legacy functions that are also available as methods on |
| 13 | strings. In addition, Python's built-in string classes support the sequence type |
| 14 | methods described in the :ref:`typesseq` section, and also the string-specific |
| 15 | methods described in the :ref:`string-methods` section. To output formatted |
| 16 | strings, see the :ref:`string-formatting` section. Also, see the :mod:`re` |
| 17 | module for string functions based on regular expressions. |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 18 | |
| 19 | |
| 20 | String constants |
| 21 | ---------------- |
| 22 | |
| 23 | The constants defined in this module are: |
| 24 | |
| 25 | |
| 26 | .. data:: ascii_letters |
| 27 | |
| 28 | The concatenation of the :const:`ascii_lowercase` and :const:`ascii_uppercase` |
| 29 | constants described below. This value is not locale-dependent. |
| 30 | |
| 31 | |
| 32 | .. data:: ascii_lowercase |
| 33 | |
| 34 | The lowercase letters ``'abcdefghijklmnopqrstuvwxyz'``. This value is not |
| 35 | locale-dependent and will not change. |
| 36 | |
| 37 | |
| 38 | .. data:: ascii_uppercase |
| 39 | |
| 40 | The uppercase letters ``'ABCDEFGHIJKLMNOPQRSTUVWXYZ'``. This value is not |
| 41 | locale-dependent and will not change. |
| 42 | |
| 43 | |
| 44 | .. data:: digits |
| 45 | |
| 46 | The string ``'0123456789'``. |
| 47 | |
| 48 | |
| 49 | .. data:: hexdigits |
| 50 | |
| 51 | The string ``'0123456789abcdefABCDEF'``. |
| 52 | |
| 53 | |
| 54 | .. data:: octdigits |
| 55 | |
| 56 | The string ``'01234567'``. |
| 57 | |
| 58 | |
| 59 | .. data:: punctuation |
| 60 | |
| 61 | String of ASCII characters which are considered punctuation characters |
| 62 | in the ``C`` locale. |
| 63 | |
| 64 | |
| 65 | .. data:: printable |
| 66 | |
| 67 | String of ASCII characters which are considered printable. This is a |
| 68 | combination of :const:`digits`, :const:`ascii_letters`, :const:`punctuation`, |
| 69 | and :const:`whitespace`. |
| 70 | |
| 71 | |
| 72 | .. data:: whitespace |
| 73 | |
Georg Brandl | 5076740 | 2008-11-22 08:31:09 +0000 | [diff] [blame] | 74 | A string containing all ASCII characters that are considered whitespace. |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 75 | This includes the characters space, tab, linefeed, return, formfeed, and |
| 76 | vertical tab. |
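
As a rough illustration of how these constants are typically used, the sketch
below checks the documented relationship between the letter constants and
defines two small helpers. The helper names ``is_hex`` and
``strip_punctuation`` are hypothetical and not part of the module.

.. code-block:: python

   import string

   # ascii_letters is documented as lowercase followed by uppercase.
   letters_match = (
       string.ascii_letters == string.ascii_lowercase + string.ascii_uppercase
   )

   def is_hex(text):
       """Return True if text is non-empty and uses only string.hexdigits."""
       return bool(text) and all(ch in string.hexdigits for ch in text)

   def strip_punctuation(text):
       """Drop every character that appears in string.punctuation."""
       return "".join(ch for ch in text if ch not in string.punctuation)
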
| 77 | |
| 78 | |
Georg Brandl | 4b49131 | 2007-08-31 09:22:56 +0000 | [diff] [blame] | 79 | .. _string-formatting: |
| 80 | |
| 81 | String Formatting |
| 82 | ----------------- |
| 83 | |
Benjamin Peterson | 50923f9 | 2008-05-25 19:45:17 +0000 | [diff] [blame] | 84 | The built-in string class provides the ability to do complex variable |
| 85 | substitutions and value formatting via the :meth:`str.format` method described in |
| 86 | :pep:`3101`. The :class:`Formatter` class in the :mod:`string` module allows |
| 87 | you to create and customize your own string formatting behaviors using the same |
| 88 | implementation as the built-in :meth:`str.format` method. |
Georg Brandl | 4b49131 | 2007-08-31 09:22:56 +0000 | [diff] [blame] | 89 | |
| 90 | .. class:: Formatter |
| 91 | |
| 92 | The :class:`Formatter` class has the following public methods: |
| 93 | |
| 94 | .. method:: format(format_string, *args, **kwargs) |
| 95 | |
| 96 | :meth:`format` is the primary API method. It takes a format template |
| 97 | string, and an arbitrary set of positional and keyword arguments. |
| 98 | :meth:`format` is just a wrapper that calls :meth:`vformat`. |
| 99 | |
| 100 | .. method:: vformat(format_string, args, kwargs) |
| 101 | |
| 102 | This function does the actual work of formatting. It is exposed as a |
| 103 | separate function for cases where you want to pass in a predefined |
| 104 | dictionary of arguments, rather than unpacking and repacking the |
| 105 | dictionary as individual arguments using the ``*args`` and ``**kwargs`` |
| 106 | syntax. :meth:`vformat` does the work of breaking up the format template |
| 107 | string into character data and replacement fields. It calls the various |
| 108 | methods described below. |
| 109 | |
| 110 | In addition, the :class:`Formatter` defines a number of methods that are |
| 111 | intended to be replaced by subclasses: |
| 112 | |
| 113 | .. method:: parse(format_string) |
| 114 | |
| 115 | Loop over the *format_string* and return an iterable of tuples |
| 116 | (*literal_text*, *field_name*, *format_spec*, *conversion*). This is used |
| 117 | by :meth:`vformat` to break the string into literal text and |
| 118 | replacement fields. |
| 119 | |
| 120 | The values in the tuple conceptually represent a span of literal text |
| 121 | followed by a single replacement field. If there is no literal text |
| 122 | (which can happen if two replacement fields occur consecutively), then |
| 123 | *literal_text* will be a zero-length string. If there is no replacement |
| 124 | field, then the values of *field_name*, *format_spec* and *conversion* |
| 125 | will be ``None``. |
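
A minimal sketch of what :meth:`parse` yields for a short template; the
template text itself is only an illustration.

.. code-block:: python

   from string import Formatter

   # One tuple per span of literal text plus the replacement field after it.
   fields = list(Formatter().parse("Name: {0!r:>10}, done"))

   # The first tuple carries the field; the trailing literal has no field,
   # so its last three items are None.
   expected = [
       ("Name: ", "0", ">10", "r"),
       (", done", None, None, None),
   ]
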
| 126 | |
Eric Smith | 9d4ba39 | 2007-09-02 15:33:26 +0000 | [diff] [blame] | 127 | .. method:: get_field(field_name, args, kwargs) |
Georg Brandl | 4b49131 | 2007-08-31 09:22:56 +0000 | [diff] [blame] | 128 | |
| 129 | Given *field_name* as returned by :meth:`parse` (see above), convert it to |
Georg Brandl | 7f13e6b | 2007-08-31 10:37:15 +0000 | [diff] [blame] | 130 | an object to be formatted. Returns a tuple (obj, used_key). The default |
| 131 | version takes strings of the form defined in :pep:`3101`, such as |
| 132 | "0[name]" or "label.title". *args* and *kwargs* are as passed in to |
| 133 | :meth:`vformat`. The return value *used_key* has the same meaning as the |
| 134 | *key* parameter to :meth:`get_value`. |
Georg Brandl | 4b49131 | 2007-08-31 09:22:56 +0000 | [diff] [blame] | 135 | |
| 136 | .. method:: get_value(key, args, kwargs) |
| 137 | |
| 138 | Retrieve a given field value. The *key* argument will be either an |
| 139 | integer or a string. If it is an integer, it represents the index of the |
| 140 | positional argument in *args*; if it is a string, then it represents a |
| 141 | named argument in *kwargs*. |
| 142 | |
| 143 | The *args* parameter is set to the list of positional arguments to |
| 144 | :meth:`vformat`, and the *kwargs* parameter is set to the dictionary of |
| 145 | keyword arguments. |
| 146 | |
| 147 | For compound field names, these functions are only called for the first |
| 148 | component of the field name; subsequent components are handled through |
| 149 | normal attribute and indexing operations. |
| 150 | |
| 151 | So for example, the field expression '0.name' would cause |
| 152 | :meth:`get_value` to be called with a *key* argument of 0. The ``name`` |
| 153 | attribute will be looked up after :meth:`get_value` returns by calling the |
| 154 | built-in :func:`getattr` function. |
| 155 | |
| 156 | If the index or keyword refers to an item that does not exist, then an |
| 157 | :exc:`IndexError` or :exc:`KeyError` should be raised. |
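
One possible use of this hook is sketched below: a subclass that substitutes a
default for missing keyword arguments instead of raising :exc:`KeyError`. The
class name ``DefaultingFormatter`` and its ``default`` parameter are
hypothetical, chosen only for this example.

.. code-block:: python

   from string import Formatter

   class DefaultingFormatter(Formatter):
       """Hypothetical subclass: missing named fields render as a default."""

       def __init__(self, default="<missing>"):
           super().__init__()
           self.default = default

       def get_value(self, key, args, kwargs):
           # String keys are named arguments; fall back instead of raising.
           if isinstance(key, str):
               return kwargs.get(key, self.default)
           # Integer keys index the positional arguments as usual.
           return super().get_value(key, args, kwargs)

   result = DefaultingFormatter().format("{greeting}, {name}!", greeting="Hello")
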
| 158 | |
| 159 | .. method:: check_unused_args(used_args, args, kwargs) |
| 160 | |
| 161 | Implement checking for unused arguments if desired. The arguments to this |
| 162 | function are the set of all argument keys that were actually referred to in |
| 163 | the format string (integers for positional arguments, and strings for |
| 164 | named arguments), and references to the *args* and *kwargs* that were |
| 165 | passed to :meth:`vformat`. The set of unused args can be calculated from these |
| 166 | parameters. :meth:`check_unused_args` is expected to raise an exception if |
| 167 | the check fails. |
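
A sketch of one way to implement such a check, assuming that unused arguments
should be reported with a :exc:`ValueError`. The class name
``StrictFormatter`` and the choice of exception are illustrative only.

.. code-block:: python

   from string import Formatter

   class StrictFormatter(Formatter):
       """Hypothetical subclass that rejects arguments the template ignores."""

       def check_unused_args(self, used_args, args, kwargs):
           # used_args holds integers (positions) and strings (names).
           unused = [i for i in range(len(args)) if i not in used_args]
           unused += [name for name in kwargs if name not in used_args]
           if unused:
               raise ValueError("unused arguments: {!r}".format(unused))

   ok = StrictFormatter().format("{0} and {name}", "spam", name="eggs")

Calling ``StrictFormatter().format("{0}", "spam", "extra")`` would raise
:exc:`ValueError` under this sketch, because position 1 is never referenced.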
| 168 | |
| 169 | .. method:: format_field(value, format_spec) |
| 170 | |
| 171 | :meth:`format_field` simply calls the global :func:`format` built-in. The |
| 172 | method is provided so that subclasses can override it. |
| 173 | |
| 174 | .. method:: convert_field(value, conversion) |
| 175 | |
| 176 | Converts the value (returned by :meth:`get_field`) given a conversion type |
| 177 | (as in the tuple returned by the :meth:`parse` method). The default |
| 178 | version understands the 's' (str), 'r' (repr) and 'a' (ascii) conversion types. |
| 179 | |
Georg Brandl | 4b49131 | 2007-08-31 09:22:56 +0000 | [diff] [blame] | 180 | |
| 181 | .. _formatstrings: |
| 182 | |
| 183 | Format String Syntax |
| 184 | -------------------- |
| 185 | |
| 186 | The :meth:`str.format` method and the :class:`Formatter` class share the same |
| 187 | syntax for format strings (although in the case of :class:`Formatter`, |
| 188 | subclasses can define their own format string syntax). |
| 189 | |
| 190 | Format strings contain "replacement fields" surrounded by curly braces ``{}``. |
| 191 | Anything that is not contained in braces is considered literal text, which is |
| 192 | copied unchanged to the output. If you need to include a brace character in the |
| 193 | literal text, it can be escaped by doubling: ``{{`` and ``}}``. |
| 194 | |
| 195 | The grammar for a replacement field is as follows: |
| 196 | |
| 197 | .. productionlist:: sf |
| 198 | replacement_field: "{" `field_name` ["!" `conversion`] [":" `format_spec`] "}" |
| 199 | field_name: (`identifier` | `integer`) ("." `attribute_name` | "[" element_index "]")* |
| 200 | attribute_name: `identifier` |
| 201 | element_index: `integer` |
Benjamin Peterson | 065ba70 | 2008-11-09 01:43:02 +0000 | [diff] [blame] | 202 | conversion: "r" | "s" | "a" |
Georg Brandl | 4b49131 | 2007-08-31 09:22:56 +0000 | [diff] [blame] | 203 | format_spec: <described in the next section> |
| 204 | |
| 205 | In less formal terms, the replacement field starts with a *field_name*, which |
| 206 | can either be a number (for a positional argument), or an identifier (for |
| 207 | keyword arguments). Following this is an optional *conversion* field, which is |
| 208 | preceded by an exclamation point ``'!'``, and a *format_spec*, which is preceded |
| 209 | by a colon ``':'``. |
| 210 | |
| 211 | The *field_name* itself begins with either a number or a keyword. If it's a |
| 212 | number, it refers to a positional argument, and if it's a keyword it refers to a |
| 213 | named keyword argument. This can be followed by any number of index or |
| 214 | attribute expressions. An expression of the form ``'.name'`` selects the named |
| 215 | attribute using :func:`getattr`, while an expression of the form ``'[index]'`` |
| 216 | does an index lookup using :meth:`__getitem__`. |
| 217 | |
| 218 | Some simple format string examples:: |
| 219 | |
| 220 | "First, thou shalt count to {0}" # References first positional argument |
| 221 | "My quest is {name}" # References keyword argument 'name' |
| 222 | "Weight in tons {0.weight}" # 'weight' attribute of first positional arg |
| 223 | "Units destroyed: {players[0]}" # First element of keyword argument 'players'. |
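
The format strings above can be exercised as follows. The ``Swallow`` named
tuple and the argument values are made up for this sketch.

.. code-block:: python

   from collections import namedtuple

   # Hypothetical object with a 'weight' attribute.
   Swallow = namedtuple("Swallow", ["weight"])

   line1 = "First, thou shalt count to {0}".format(3)
   line2 = "My quest is {name}".format(name="the Grail")
   line3 = "Weight in tons {0.weight}".format(Swallow(weight=0.02))
   line4 = "Units destroyed: {players[0]}".format(players=[7, 2])
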
| 224 | |
| 225 | The *conversion* field causes a type coercion before formatting. Normally, the |
| 226 | job of formatting a value is done by the :meth:`__format__` method of the value |
| 227 | itself. However, in some cases it is desirable to force a type to be formatted |
| 228 | as a string, overriding its own definition of formatting. By converting the |
| 229 | value to a string before calling :meth:`__format__`, the normal formatting logic |
| 230 | is bypassed. |
| 231 | |
Georg Brandl | 559e5d7 | 2008-06-11 18:37:52 +0000 | [diff] [blame] | 232 | Three conversion flags are currently supported: ``'!s'``, which calls :func:`str` |
| 233 | on the value, ``'!r'`` which calls :func:`repr` and ``'!a'`` which calls |
| 234 | :func:`ascii`. |
Georg Brandl | 4b49131 | 2007-08-31 09:22:56 +0000 | [diff] [blame] | 235 | |
| 236 | Some examples:: |
| 237 | |
| 238 | "Harold's a clever {0!s}" # Calls str() on the argument first |
| 239 | "Bring out the holy {name!r}" # Calls repr() on the argument first |
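
A short sketch of the three conversion flags in use; the argument values are
arbitrary.

.. code-block:: python

   # '!s' applies str(), '!r' applies repr(), '!a' applies ascii().
   with_str = "Harold's a clever {0!s}".format("sparrow")
   with_repr = "Bring out the holy {name!r}".format(name="hand grenade")
   with_ascii = "{0!a}".format("caf\u00e9")

   # repr() adds quotes; ascii() also escapes the non-ASCII character.
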
| 240 | |
| 241 | The *format_spec* field contains a specification of how the value should be |
| 242 | presented, including such details as field width, alignment, padding, decimal |
| 243 | precision and so on. Each value type can define its own "formatting |
| 244 | mini-language" or interpretation of the *format_spec*. |
| 245 | |
| 246 | Most built-in types support a common formatting mini-language, which is |
| 247 | described in the next section. |
| 248 | |
| 249 | A *format_spec* field can also include nested replacement fields within it. |
| 250 | These nested replacement fields can contain only a field name; conversion flags |
| 251 | and format specifications are not allowed. The replacement fields within the |
| 252 | format_spec are substituted before the *format_spec* string is interpreted. |
| 253 | This allows the formatting of a value to be dynamically specified. |
| 254 | |
| 255 | For example, suppose you wanted to have a replacement field whose field width is |
| 256 | determined by another variable:: |
| 257 | |
| 258 | "A man with two {0:{1}}".format("noses", 10) |
| 259 | |
| 260 | This would first evaluate the inner replacement field, making the format string |
| 261 | effectively:: |
| 262 | |
| 263 | "A man with two {0:10}" |
| 264 | |
| 265 | Then the outer replacement field would be evaluated, producing:: |
| 266 | |
| 267 | "noses     " |
| 268 | |
Georg Brandl | 2ee470f | 2008-07-16 12:55:28 +0000 | [diff] [blame] | 269 | This is then substituted into the string, yielding:: |
Georg Brandl | 4b49131 | 2007-08-31 09:22:56 +0000 | [diff] [blame] | 270 | |
| 271 | "A man with two noses     " |
| 272 | |
| 273 | (The trailing spaces appear because we specified a field width of 10, and because left |
| 274 | alignment is the default for strings.) |
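
The nested-field behaviour can be checked directly. The second call is an
additional illustration that uses keyword arguments and an explicit alignment;
the names ``value`` and ``width`` are arbitrary.

.. code-block:: python

   # The inner field {1} is replaced first, giving the spec "10".
   padded = "A man with two {0:{1}}".format("noses", 10)

   # The same idea with named fields and right alignment.
   right = "{value:>{width}}".format(value="noses", width=10)
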
| 275 | |
Georg Brandl | 4b49131 | 2007-08-31 09:22:56 +0000 | [diff] [blame] | 276 | |
| 277 | .. _formatspec: |
| 278 | |
| 279 | Format Specification Mini-Language |
| 280 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 281 | |
| 282 | "Format specifications" are used within replacement fields contained within a |
| 283 | format string to define how individual values are presented (see |
| 284 | :ref:`formatstrings`). They can also be passed directly to the built-in |
| 285 | :func:`format` function. Each formattable type may define how the format |
| 286 | specification is to be interpreted. |
| 287 | |
| 288 | Most built-in types implement the following options for format specifications, |
| 289 | although some of the formatting options are only supported by the numeric types. |
| 290 | |
| 291 | A general convention is that an empty format string (``""``) produces the same |
Georg Brandl | 222e127 | 2008-01-11 12:58:40 +0000 | [diff] [blame] | 292 | result as if you had called :func:`str` on the value. |
Georg Brandl | 4b49131 | 2007-08-31 09:22:56 +0000 | [diff] [blame] | 293 | |
| 294 | The general form of a *standard format specifier* is: |
| 295 | |
| 296 | .. productionlist:: sf |
Eric Smith | d68af8f | 2008-07-16 00:15:35 +0000 | [diff] [blame] | 297 | format_spec: [[`fill`]`align`][`sign`][#][0][`width`][.`precision`][`type`] |
Georg Brandl | 4b49131 | 2007-08-31 09:22:56 +0000 | [diff] [blame] | 298 | fill: <a character other than '}'> |
| 299 | align: "<" | ">" | "=" | "^" |
| 300 | sign: "+" | "-" | " " |
| 301 | width: `integer` |
| 302 | precision: `integer` |
| 303 | type: "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "x" | "X" | "%" |
| 304 | |
| 305 | The *fill* character can be any character other than '}' (which signifies the |
| 306 | end of the field). The presence of a fill character is signaled by the *next* |
| 307 | character, which must be one of the alignment options. If the second character |
| 308 | of *format_spec* is not a valid alignment option, then it is assumed that both |
| 309 | the fill character and the alignment option are absent. |
| 310 | |
| 311 | The meaning of the various alignment options is as follows: |
| 312 | |
| 313 | +---------+----------------------------------------------------------+ |
| 314 | | Option | Meaning | |
| 315 | +=========+==========================================================+ |
| 316 | | ``'<'`` | Forces the field to be left-aligned within the available | |
| 317 | | | space (this is the default for most objects). | |
| 318 | +---------+----------------------------------------------------------+ |
| 319 | | ``'>'`` | Forces the field to be right-aligned within the | |
| 320 | | | available space. | |
| 321 | +---------+----------------------------------------------------------+ |
| 322 | | ``'='`` | Forces the padding to be placed after the sign (if any) | |
| 323 | | | but before the digits. This is used for printing fields | |
| 324 | | | in the form '+000000120'. This alignment option is only | |
| 325 | | | valid for numeric types. | |
| 326 | +---------+----------------------------------------------------------+ |
| 327 | | ``'^'`` | Forces the field to be centered within the available | |
| 328 | | | space. | |
| 329 | +---------+----------------------------------------------------------+ |
| 330 | |
| 331 | Note that unless a minimum field width is defined, the field width will always |
| 332 | be the same size as the data to fill it, so that the alignment option has no |
| 333 | meaning in this case. |
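
A brief sketch of the four alignment options using the built-in
:func:`format`; the values and fill characters are arbitrary.

.. code-block:: python

   left = format("spam", "<10")        # padded on the right
   right = format("spam", ">10")       # padded on the left
   centered = format("spam", "*^10")   # '*' is the fill character
   sign_aware = format(120, "0=+10")   # padding goes after the sign

   # Without a width, alignment has no visible effect.
   no_width = format("spam", "<")
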
| 334 | |
| 335 | The *sign* option is only valid for number types, and can be one of the |
| 336 | following: |
| 337 | |
| 338 | +---------+----------------------------------------------------------+ |
| 339 | | Option | Meaning | |
| 340 | +=========+==========================================================+ |
| 341 | | ``'+'`` | indicates that a sign should be used for both | |
| 342 | | | positive as well as negative numbers. | |
| 343 | +---------+----------------------------------------------------------+ |
| 344 | | ``'-'`` | indicates that a sign should be used only for negative | |
| 345 | | | numbers (this is the default behavior). | |
| 346 | +---------+----------------------------------------------------------+ |
| 347 | | space | indicates that a leading space should be used on | |
| 348 | | | positive numbers, and a minus sign on negative numbers. | |
| 349 | +---------+----------------------------------------------------------+ |
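
The three sign options can be compared side by side; this sketch uses the
integer 42 purely as an example value.

.. code-block:: python

   plus_pos = format(42, "+d")     # sign shown for positive numbers
   plus_neg = format(-42, "+d")
   minus_pos = format(42, "-d")    # default: sign only when negative
   space_pos = format(42, " d")    # leading space for positive numbers
   space_neg = format(-42, " d")
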
| 350 | |
Benjamin Peterson | d7b0328 | 2008-09-13 15:58:53 +0000 | [diff] [blame] | 351 | The ``'#'`` option is only valid for integers, and only for binary, octal, or |
| 352 | hexadecimal output. If present, it specifies that the output will be prefixed |
| 353 | by ``'0b'``, ``'0o'``, or ``'0x'``, respectively. |
Eric Smith | d68af8f | 2008-07-16 00:15:35 +0000 | [diff] [blame] | 354 | |
Georg Brandl | 4b49131 | 2007-08-31 09:22:56 +0000 | [diff] [blame] | 355 | *width* is a decimal integer defining the minimum field width. If not |
| 356 | specified, then the field width will be determined by the content. |
| 357 | |
| 358 | If the *width* field is preceded by a zero (``'0'``) character, this enables |
| 359 | zero-padding. This is equivalent to an *alignment* type of ``'='`` and a *fill* |
| 360 | character of ``'0'``. |
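
A small sketch of the ``'#'`` option and of zero-padding, with arbitrary
example values.

.. code-block:: python

   binary = format(255, "#b")    # prefixed with 0b
   octal = format(255, "#o")     # prefixed with 0o
   hexa = format(255, "#x")      # prefixed with 0x

   # A leading '0' before the width behaves like fill '0' with align '='.
   zero_padded = format(-42, "05d")
   explicit = format(-42, "0=5d")
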
| 361 | |
| 362 | The *precision* is a decimal number indicating how many digits should be |
Georg Brandl | 3dbca81 | 2008-07-23 16:10:53 +0000 | [diff] [blame] | 363 | displayed after the decimal point for a floating point value formatted with |
| 364 | ``'f'`` and ``'F'``, or before and after the decimal point for a floating point |
| 365 | value formatted with ``'g'`` or ``'G'``. For non-number types the field |
| 366 | indicates the maximum field size - in other words, how many characters will be |
| 367 | used from the field content. The *precision* is ignored for integer values. |
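
The different meanings of *precision* can be seen in a short sketch; the
numbers and the string are arbitrary.

.. code-block:: python

   fixed = format(3.14159, ".2f")         # digits after the decimal point
   general = format(3.14159, ".3g")       # significant digits overall
   large = format(1234.5678, ".3g")       # switches to exponent notation
   truncated = format("formatting", ".6")  # maximum characters for a string
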
Georg Brandl | 4b49131 | 2007-08-31 09:22:56 +0000 | [diff] [blame] | 368 | |
| 369 | Finally, the *type* determines how the data should be presented. |
| 370 | |
| 371 | The available integer presentation types are: |
| 372 | |
| 373 | +---------+----------------------------------------------------------+ |
| 374 | | Type | Meaning | |
| 375 | +=========+==========================================================+ |
Eric Smith | d68af8f | 2008-07-16 00:15:35 +0000 | [diff] [blame] | 376 | | ``'b'`` | Binary format. Outputs the number in base 2. | |
Georg Brandl | 4b49131 | 2007-08-31 09:22:56 +0000 | [diff] [blame] | 377 | +---------+----------------------------------------------------------+ |
| 378 | | ``'c'`` | Character. Converts the integer to the corresponding | |
| 379 | | | Unicode character before printing. | |
| 380 | +---------+----------------------------------------------------------+ |
| 381 | | ``'d'`` | Decimal Integer. Outputs the number in base 10. | |
| 382 | +---------+----------------------------------------------------------+ |
| 383 | | ``'o'`` | Octal format. Outputs the number in base 8. | |
| 384 | +---------+----------------------------------------------------------+ |
| 385 | | ``'x'`` | Hex format. Outputs the number in base 16, using lower- | |
| 386 | | | case letters for the digits above 9. | |
| 387 | +---------+----------------------------------------------------------+ |
| 388 | | ``'X'`` | Hex format. Outputs the number in base 16, using upper- | |
| 389 | | | case letters for the digits above 9. | |
| 390 | +---------+----------------------------------------------------------+ |
Eric Smith | 5e18a20 | 2008-05-12 10:01:24 +0000 | [diff] [blame] | 391 | | ``'n'`` | Number. This is the same as ``'d'``, except that it uses | |
| 392 | | | the current locale setting to insert the appropriate | |
| 393 | | | number separator characters. | |
| 394 | +---------+----------------------------------------------------------+ |
Georg Brandl | 3dbca81 | 2008-07-23 16:10:53 +0000 | [diff] [blame] | 395 | | None | The same as ``'d'``. | |
Georg Brandl | 4b49131 | 2007-08-31 09:22:56 +0000 | [diff] [blame] | 396 | +---------+----------------------------------------------------------+ |
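
A sketch of the integer presentation types applied to sample values. The
``'n'`` type is left out of the checks because its output depends on the
current locale.

.. code-block:: python

   as_binary = format(65, "b")
   as_char = format(65, "c")      # code point 65 is 'A'
   as_decimal = format(65, "d")
   as_octal = format(65, "o")
   as_hex_lower = format(255, "x")
   as_hex_upper = format(255, "X")
   as_default = format(65, "")    # no type behaves like 'd'
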
| 397 | |
| 398 | The available presentation types for floating point and decimal values are: |
| 399 | |
| 400 | +---------+----------------------------------------------------------+ |
| 401 | | Type | Meaning | |
| 402 | +=========+==========================================================+ |
| 403 | | ``'e'`` | Exponent notation. Prints the number in scientific | |
| 404 | | | notation using the letter 'e' to indicate the exponent. | |
| 405 | +---------+----------------------------------------------------------+ |
Eric Smith | 22b85b3 | 2008-07-17 19:18:29 +0000 | [diff] [blame] | 406 | | ``'E'`` | Exponent notation. Same as ``'e'`` except it uses an | |
| 407 | | | upper case 'E' as the separator character. | |
Georg Brandl | 4b49131 | 2007-08-31 09:22:56 +0000 | [diff] [blame] | 408 | +---------+----------------------------------------------------------+ |
| 409 | | ``'f'`` | Fixed point. Displays the number as a fixed-point | |
| 410 | | | number. | |
| 411 | +---------+----------------------------------------------------------+ |
Eric Smith | 22b85b3 | 2008-07-17 19:18:29 +0000 | [diff] [blame] | 412 | | ``'F'`` | Fixed point. Same as ``'f'``. | |
Georg Brandl | 4b49131 | 2007-08-31 09:22:56 +0000 | [diff] [blame] | 413 | +---------+----------------------------------------------------------+ |
| 414 | | ``'g'`` | General format. This prints the number as a fixed-point | |
| 415 | | | number, unless the number is too large, in which case | |
Georg Brandl | 3dbca81 | 2008-07-23 16:10:53 +0000 | [diff] [blame] | 416 | | | it switches to ``'e'`` exponent notation. Infinity and | |
| 417 | | | NaN values are formatted as ``inf``, ``-inf`` and | |
| 418 | | | ``nan``, respectively. | |
Georg Brandl | 4b49131 | 2007-08-31 09:22:56 +0000 | [diff] [blame] | 419 | +---------+----------------------------------------------------------+ |
| 420 | | ``'G'`` | General format. Same as ``'g'`` except switches to | |
Georg Brandl | 3dbca81 | 2008-07-23 16:10:53 +0000 | [diff] [blame] | 421 | | | ``'E'`` if the number gets too large. The representations | |
| 422 | | | of infinity and NaN are uppercased, too. | |
Georg Brandl | 4b49131 | 2007-08-31 09:22:56 +0000 | [diff] [blame] | 423 | +---------+----------------------------------------------------------+ |
| 424 | | ``'n'`` | Number. This is the same as ``'g'``, except that it uses | |
| 425 | | | the current locale setting to insert the appropriate | |
| 426 | | | number separator characters. | |
| 427 | +---------+----------------------------------------------------------+ |
| 428 | | ``'%'`` | Percentage. Multiplies the number by 100 and displays | |
| 429 | | | in fixed (``'f'``) format, followed by a percent sign. | |
| 430 | +---------+----------------------------------------------------------+ |
Georg Brandl | 3dbca81 | 2008-07-23 16:10:53 +0000 | [diff] [blame] | 431 | | None | The same as ``'g'``. | |
Georg Brandl | 4b49131 | 2007-08-31 09:22:56 +0000 | [diff] [blame] | 432 | +---------+----------------------------------------------------------+ |
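
A sketch of the floating point presentation types with their default
precision. The sample values are arbitrary, and ``'n'`` is omitted because it
is locale-dependent.

.. code-block:: python

   value = 1234.5678

   exponent = format(value, "e")
   exponent_upper = format(value, "E")
   fixed = format(value, "f")
   general = format(value, "g")
   percent = format(0.25, "%")

   # Infinity follows the case of the presentation type.
   inf_lower = format(float("inf"), "g")
   inf_upper = format(float("inf"), "G")
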
| 433 | |
| 434 | |
| 435 | .. _template-strings: |
| 436 | |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 437 | Template strings |
| 438 | ---------------- |
| 439 | |
| 440 | Templates provide simpler string substitutions as described in :pep:`292`. |
| 441 | Instead of the normal ``%``\ -based substitutions, Templates support ``$``\ |
| 442 | -based substitutions, using the following rules: |
| 443 | |
| 444 | * ``$$`` is an escape; it is replaced with a single ``$``. |
| 445 | |
| 446 | * ``$identifier`` names a substitution placeholder matching a mapping key of |
| 447 | ``"identifier"``. By default, ``"identifier"`` must spell a Python |
| 448 | identifier. The first non-identifier character after the ``$`` character |
| 449 | terminates this placeholder specification. |
| 450 | |
| 451 | * ``${identifier}`` is equivalent to ``$identifier``. It is required when valid |
| 452 | identifier characters follow the placeholder but are not part of the |
| 453 | placeholder, such as ``"${noun}ification"``. |
| 454 | |
| 455 | Any other appearance of ``$`` in the string will result in a :exc:`ValueError` |
| 456 | being raised. |
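
The three substitution rules can be sketched with the :class:`Template` class
described below; the template texts and values are arbitrary.

.. code-block:: python

   from string import Template

   # '$$' is an escape for a single '$'.
   escaped = Template("Cost: $$5").substitute()

   # '$identifier' ends at the first non-identifier character.
   plain = Template("$noun is here").substitute(noun="Tim")

   # Braces are needed when identifier characters follow the placeholder.
   braced = Template("${noun}ification").substitute(noun="Ident")
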
| 457 | |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 458 | The :mod:`string` module provides a :class:`Template` class that implements |
| 459 | these rules. The methods of :class:`Template` are: |
| 460 | |
| 461 | |
| 462 | .. class:: Template(template) |
| 463 | |
| 464 | The constructor takes a single argument which is the template string. |
| 465 | |
| 466 | |
Benjamin Peterson | e41251e | 2008-04-25 01:59:09 +0000 | [diff] [blame] | 467 | .. method:: substitute(mapping[, **kws]) |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 468 | |
Benjamin Peterson | e41251e | 2008-04-25 01:59:09 +0000 | [diff] [blame] | 469 | Performs the template substitution, returning a new string. *mapping* is |
| 470 | any dictionary-like object with keys that match the placeholders in the |
| 471 | template. Alternatively, you can provide keyword arguments, where the |
| 472 | keywords are the placeholders. When both *mapping* and *kws* are given |
| 473 | and there are duplicates, the placeholders from *kws* take precedence. |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 474 | |
| 475 | |
Benjamin Peterson | e41251e | 2008-04-25 01:59:09 +0000 | [diff] [blame] | 476 | .. method:: safe_substitute(mapping[, **kws]) |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 477 | |
Benjamin Peterson | e41251e | 2008-04-25 01:59:09 +0000 | [diff] [blame] | 478 | Like :meth:`substitute`, except that if placeholders are missing from |
| 479 | *mapping* and *kws*, instead of raising a :exc:`KeyError` exception, the |
| 480 | original placeholder will appear in the resulting string intact. Also, |
| 481 | unlike with :meth:`substitute`, any other appearances of the ``$`` will |
| 482 | simply return ``$`` instead of raising :exc:`ValueError`. |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 483 | |
Benjamin Peterson | e41251e | 2008-04-25 01:59:09 +0000 | [diff] [blame] | 484 | While other exceptions may still occur, this method is called "safe" |
| 485 | because it always tries to return a usable string instead of |
| 486 | raising an exception. In another sense, :meth:`safe_substitute` may be |
| 487 | anything other than safe, since it will silently ignore malformed |
| 488 | templates containing dangling delimiters, unmatched braces, or |
| 489 | placeholders that are not valid Python identifiers. |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 490 | |
| 491 | :class:`Template` instances also provide one public data attribute: |
| 492 | |
| 493 | |
| 494 | .. attribute:: template |
| 495 | |
| 496 | This is the object passed to the constructor's *template* argument. In general, |
| 497 | you shouldn't change it, but read-only access is not enforced. |
| 498 | |
Christian Heimes | fe337bf | 2008-03-23 21:54:12 +0000 | [diff] [blame] | 499 | Here is an example of how to use a Template: |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 500 | |
| 501 | >>> from string import Template |
| 502 | >>> s = Template('$who likes $what') |
| 503 | >>> s.substitute(who='tim', what='kung pao') |
| 504 | 'tim likes kung pao' |
| 505 | >>> d = dict(who='tim') |
| 506 | >>> Template('Give $who $100').substitute(d) |
| 507 | Traceback (most recent call last): |
| 508 | [...] |
| 509 | ValueError: Invalid placeholder in string: line 1, col 10 |
| 510 | >>> Template('$who likes $what').substitute(d) |
| 511 | Traceback (most recent call last): |
| 512 | [...] |
| 513 | KeyError: 'what' |
| 514 | >>> Template('$who likes $what').safe_substitute(d) |
| 515 | 'tim likes $what' |
| 516 | |
| 517 | Advanced usage: you can derive subclasses of :class:`Template` to customize the |
| 518 | placeholder syntax, delimiter character, or the entire regular expression used |
| 519 | to parse template strings. To do this, you can override these class attributes: |
| 520 | |
| 521 | * *delimiter* -- This is the literal string describing a placeholder introducing |
| 522 | delimiter. The default value is ``$``. Note that this should *not* be a regular |
| 523 | expression, as the implementation will call :func:`re.escape` on this string as |
| 524 | needed. |
| 525 | |
| 526 | * *idpattern* -- This is the regular expression describing the pattern for |
| 527 | non-braced placeholders (the braces will be added automatically as |
| 528 | appropriate). The default value is the regular expression |
| 529 | ``[_a-z][_a-z0-9]*``. |
| 530 | |
| 531 | Alternatively, you can provide the entire regular expression pattern by |
| 532 | overriding the class attribute *pattern*. If you do this, the value must be a |
| 533 | regular expression object with four named capturing groups. The capturing |
| 534 | groups correspond to the rules given above, along with the invalid placeholder |
| 535 | rule: |
| 536 | |
| 537 | * *escaped* -- This group matches the escape sequence, e.g. ``$$``, in the |
| 538 | default pattern. |
| 539 | |
| 540 | * *named* -- This group matches the unbraced placeholder name; it should not |
| 541 | include the delimiter in the capturing group. |
| 542 | |
| 543 | * *braced* -- This group matches the brace-enclosed placeholder name; it should |
| 544 | not include either the delimiter or braces in the capturing group. |
| 545 | |
| 546 | * *invalid* -- This group matches any other delimiter pattern (usually a single |
| 547 | delimiter), and it should appear last in the regular expression. |
| 548 | |
| 549 | |
| 550 | String functions |
| 551 | ---------------- |
| 552 | |
Georg Brandl | f694518 | 2008-02-01 11:56:49 +0000 | [diff] [blame] | 553 | The following functions are available to operate on string objects. |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 554 | They are not available as string methods. |
| 555 | |
| 556 | |
| 557 | .. function:: capwords(s) |
| 558 | |
| 559 | Split the argument into words using :meth:`str.split`, capitalize each word using |
| 560 | :meth:`str.capitalize`, and join the capitalized words using :meth:`str.join`. Note |
| 561 | that this replaces runs of whitespace characters by a single space, and removes |
| 562 | leading and trailing whitespace. |
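
The description above corresponds roughly to the following sketch, which
compares :func:`capwords` with a hand-written equivalent for the default
whitespace splitting. The sample text is arbitrary.

.. code-block:: python

   import string

   raw = "  the   holy  hand grenade "

   capitalized = string.capwords(raw)

   # Split on runs of whitespace, capitalize each word, join with one space.
   by_hand = " ".join(word.capitalize() for word in raw.split())
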
| 563 | |
| 564 | |
Georg Brandl | 7f13e6b | 2007-08-31 10:37:15 +0000 | [diff] [blame] | 565 | .. function:: maketrans(frm, to) |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 566 | |
Georg Brandl | 7f13e6b | 2007-08-31 10:37:15 +0000 | [diff] [blame] | 567 | Return a translation table suitable for passing to :meth:`bytes.translate` |
| 568 | that will map each character in *frm* into the character at the same |
| 569 | position in *to*; *frm* and *to* must have the same length. |
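
The sketch below shows the kind of table this describes. It uses
:meth:`bytes.maketrans` rather than the module-level function, on the
assumption that the example is run on a Python 3 release where the static
method is the available way to build such a table.

.. code-block:: python

   # Assumption: bytes.maketrans stands in for string.maketrans here.
   table = bytes.maketrans(b"abc", b"xyz")

   # Each byte in the first argument maps to the byte at the same position
   # in the second; all other bytes are left unchanged.
   translated = b"aabbcc-d".translate(table)
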