Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 1 | |
:mod:`struct` --- Interpret strings as packed binary data
=========================================================

.. module:: struct
   :synopsis: Interpret strings as packed binary data.

.. index::
   pair: C; structures
   triple: packing; binary; data

This module performs conversions between Python values and C structs represented
as Python strings.  It uses :dfn:`format strings` (explained below) as compact
descriptions of the layout of the C structs and the intended conversion to/from
Python values.  This can be used in handling binary data stored in files or from
network connections, among other sources.

The module defines the following exception and functions:


.. exception:: error

   Exception raised on various occasions; argument is a string describing what is
   wrong.


.. function:: pack(fmt, v1, v2, ...)

   Return a string containing the values ``v1, v2, ...`` packed according to the
   given format.  The arguments must match the values required by the format
   exactly.


.. function:: pack_into(fmt, buffer, offset, v1, v2, ...)

   Pack the values ``v1, v2, ...`` according to the given format and write the
   packed bytes into the writable *buffer* starting at *offset*.  Note that the
   offset is a required argument.


.. function:: unpack(fmt, string)

   Unpack the string (presumably packed by ``pack(fmt, ...)``) according to the
   given format.  The result is a tuple even if it contains exactly one item.  The
   string must contain exactly the amount of data required by the format
   (``len(string)`` must equal ``calcsize(fmt)``).


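A minimal round trip through :func:`pack` and :func:`unpack` might look like the
sketch below.  It assumes a current Python 3 interpreter, where the packed data
is a ``bytes`` object rather than the string type described here, and it uses an
explicit ``'<'`` prefix (described later in this section) so that the result
does not depend on the host platform.  The values are arbitrary.

```python
import struct

# '<' selects little-endian byte order with standard sizes:
# 'h' (short) is 2 bytes and 'l' (long) is 4 bytes.
fmt = '<hhl'
packed = struct.pack(fmt, 1, 2, 3)

# The packed data is exactly calcsize(fmt) bytes long.
size = struct.calcsize(fmt)
assert len(packed) == size

# unpack() always returns a tuple, even for a single value.
values = struct.unpack(fmt, packed)
single = struct.unpack('<h', packed[:2])
print(values, single)
```
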
.. function:: unpack_from(fmt, buffer[, offset=0])

   Unpack the *buffer* according to the given format.  The result is a tuple even
   if it contains exactly one item.  The *buffer* must contain at least the amount
   of data required by the format (``len(buffer[offset:])`` must be at least
   ``calcsize(fmt)``).


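The offset-based functions can be sketched as follows.  This assumes Python 3
and uses a ``bytearray`` as the writable buffer; the buffer size, the offset and
the packed values are made up for illustration.

```python
import struct

fmt = '>HI'            # big-endian: 2-byte and 4-byte unsigned integers
buf = bytearray(16)    # writable, zero-filled buffer
offset = 4             # hypothetical position chosen for this sketch

struct.pack_into(fmt, buf, offset, 0xCAFE, 0xDEADBEEF)

# Only calcsize(fmt) bytes starting at the offset are overwritten.
size = struct.calcsize(fmt)
untouched = bytes(buf[:offset]) + bytes(buf[offset + size:])

# unpack_from() reads from the same offset; extra trailing data is allowed.
record = struct.unpack_from(fmt, buf, offset)
print(record)
```
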
.. function:: calcsize(fmt)

   Return the size of the struct (and hence of the string) corresponding to the
   given format.

Format characters have the following meaning; the conversion between C and
Python values should be obvious given their types:

+--------+-------------------------+--------------------+-------+
| Format | C Type                  | Python             | Notes |
+========+=========================+====================+=======+
| ``x``  | pad byte                | no value           |       |
+--------+-------------------------+--------------------+-------+
| ``c``  | :ctype:`char`           | string of length 1 |       |
+--------+-------------------------+--------------------+-------+
| ``b``  | :ctype:`signed char`    | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``B``  | :ctype:`unsigned char`  | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``t``  | :ctype:`_Bool`          | bool               | \(1)  |
+--------+-------------------------+--------------------+-------+
| ``h``  | :ctype:`short`          | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``H``  | :ctype:`unsigned short` | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``i``  | :ctype:`int`            | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``I``  | :ctype:`unsigned int`   | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``l``  | :ctype:`long`           | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``L``  | :ctype:`unsigned long`  | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``q``  | :ctype:`long long`      | integer            | \(2)  |
+--------+-------------------------+--------------------+-------+
| ``Q``  | :ctype:`unsigned long   | integer            | \(2)  |
|        | long`                   |                    |       |
+--------+-------------------------+--------------------+-------+
| ``f``  | :ctype:`float`          | float              |       |
+--------+-------------------------+--------------------+-------+
| ``d``  | :ctype:`double`         | float              |       |
+--------+-------------------------+--------------------+-------+
| ``s``  | :ctype:`char[]`         | string             |       |
+--------+-------------------------+--------------------+-------+
| ``p``  | :ctype:`char[]`         | string             |       |
+--------+-------------------------+--------------------+-------+
| ``P``  | :ctype:`void \*`        | integer            |       |
+--------+-------------------------+--------------------+-------+

Notes:

(1)
   The ``'t'`` conversion code corresponds to the :ctype:`_Bool` type defined by
   C99.  If this type is not available, it is simulated using a :ctype:`char`.  In
   standard mode, it is always represented by one byte.

(2)
   The ``'q'`` and ``'Q'`` conversion codes are available in native mode only if
   the platform C compiler supports C :ctype:`long long`, or, on Windows,
   :ctype:`__int64`.  They are always available in standard modes.

A format character may be preceded by an integral repeat count.  For example,
the format string ``'4h'`` means exactly the same as ``'hhhh'``.

Whitespace characters between formats are ignored; however, no whitespace is
allowed between a count and its format character.

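The repeat-count and whitespace rules can be checked with a short sketch such as
the one below.  It assumes Python 3 and uses the ``'<'`` prefix only to make the
sizes platform independent.

```python
import struct

# A repeat count is shorthand for repeating the format character.
counted = struct.pack('<4h', 1, 2, 3, 4)
spelled = struct.pack('<hhhh', 1, 2, 3, 4)

# Whitespace between format characters is ignored.
spaced = struct.pack('< h h  h h', 1, 2, 3, 4)

# Whitespace between a count and its format character is rejected.
try:
    struct.pack('<4 h', 1, 2, 3, 4)
    split_count_ok = True
except struct.error:
    split_count_ok = False

print(counted == spelled == spaced, split_count_ok)
```
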
For the ``'s'`` format character, the count is interpreted as the size of the
string, not as a repeat count as it is for the other format characters; for
example, ``'10s'`` means a single 10-byte string, while ``'10c'`` means 10
characters.  For packing, the string is truncated or padded with null bytes as
appropriate to make it fit.  For unpacking, the resulting string always has
exactly the specified number of bytes.  As a special case, ``'0s'`` means a
single, empty string (while ``'0c'`` means 0 characters).

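The difference between ``'s'`` and ``'c'`` counts is easiest to see side by
side.  The sketch below assumes Python 3, where both codes work with ``bytes``
objects; the sample data is arbitrary.

```python
import struct

# For 's' the count is the field width: short input is padded with
# null bytes, long input is truncated.
padded = struct.pack('5s', b'ab')
truncated = struct.pack('5s', b'abcdefgh')

# Unpacking returns exactly the specified number of bytes, padding included.
(field,) = struct.unpack('5s', padded)

# For 'c' the count is an ordinary repeat count: three separate items.
chars = struct.unpack('3c', b'abc')

# Special cases: '0s' yields one empty string, '0c' yields no items at all.
empty_string = struct.unpack('0s', b'')
no_chars = struct.unpack('0c', b'')

print(padded, truncated, chars, empty_string, no_chars)
```
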
The ``'p'`` format character encodes a "Pascal string", meaning a short
variable-length string stored in a fixed number of bytes.  The count is the total
number of bytes stored.  The first byte stored is the length of the string, or
255, whichever is smaller.  The bytes of the string follow.  If the string
passed in to :func:`pack` is too long (longer than the count minus 1), only the
leading count-1 bytes of the string are stored.  If the string is shorter than
count-1, it is padded with null bytes so that exactly count bytes in all are
used.  Note that for :func:`unpack`, the ``'p'`` format character consumes count
bytes, but that the string returned can never contain more than 255 characters.

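As an illustration of the Pascal-string layout, the following sketch (assuming
Python 3 and arbitrary sample data) packs one string that fits in the field and
one that does not.

```python
import struct

# Count 6: one length byte, the 3 bytes of data, then 2 null pad bytes.
fits = struct.pack('6p', b'abc')

# Count 4: only count - 1 = 3 bytes of the data are stored.
too_long = struct.pack('4p', b'abcdef')

# Unpacking consumes all 6 bytes but returns only the stored string.
(text,) = struct.unpack('6p', fits)

print(fits, too_long, text)
```
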
For the ``'t'`` format character, the return value is either :const:`True` or
:const:`False`.  When packing, the truth value of the argument object is used.
Either 0 or 1 in the native or standard bool representation will be packed, and
any non-zero value will be True when unpacking.

By default, C numbers are represented in the machine's native format and byte
order, and properly aligned by skipping pad bytes if necessary (according to the
rules used by the C compiler).

Alternatively, the first character of the format string can be used to indicate
the byte order, size and alignment of the packed data, according to the
following table:

+-----------+------------------------+--------------------+
| Character | Byte order             | Size and alignment |
+===========+========================+====================+
| ``@``     | native                 | native             |
+-----------+------------------------+--------------------+
| ``=``     | native                 | standard           |
+-----------+------------------------+--------------------+
| ``<``     | little-endian          | standard           |
+-----------+------------------------+--------------------+
| ``>``     | big-endian             | standard           |
+-----------+------------------------+--------------------+
| ``!``     | network (= big-endian) | standard           |
+-----------+------------------------+--------------------+

If the first character is not one of these, ``'@'`` is assumed.

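The effect of the byte order character can be sketched as follows, assuming
Python 3.  The value ``1`` is arbitrary; it simply makes the position of the
low-order byte easy to spot.

```python
import struct

little = struct.pack('<I', 1)    # low-order byte first
big = struct.pack('>I', 1)       # high-order byte first
network = struct.pack('!I', 1)   # network order is big-endian

# With no prefix, '@' is assumed.
default = struct.pack('i', 1)
explicit_native = struct.pack('@i', 1)

print(little, big, network)
```
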
Native byte order is big-endian or little-endian, depending on the host system.
For example, Motorola and Sun processors are big-endian; Intel and DEC
processors are little-endian.

Native size and alignment are determined using the C compiler's
``sizeof`` expression.  This is always combined with native byte order.

Standard size and alignment are as follows: no alignment is required for any
type (so you have to use pad bytes); :ctype:`short` is 2 bytes; :ctype:`int` and
:ctype:`long` are 4 bytes; :ctype:`long long` (:ctype:`__int64` on Windows) is 8
bytes; :ctype:`float` and :ctype:`double` are 32-bit and 64-bit IEEE floating
point numbers, respectively.  :ctype:`_Bool` is 1 byte.

Note the difference between ``'@'`` and ``'='``: both use native byte order, but
the size and alignment of the latter is standardized.

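The standard sizes listed above, and the difference between ``'@'`` and
``'='``, can be checked with :func:`calcsize`.  The sketch below assumes
Python 3 and uses ``sys.byteorder`` (not part of this module) to find out the
host's byte order; the native sizes it reports vary between platforms, so only
the standard ones are stated in comments.

```python
import struct
import sys

# Standard sizes are fixed regardless of the platform.
standard_sizes = {code: struct.calcsize('=' + code) for code in 'hilqfd'}
# h=2, i=4, l=4, q=8, f=4, d=8

# '=' uses the host's byte order, like '@'.
native_order = struct.pack('=H', 1)
expected = b'\x01\x00' if sys.byteorder == 'little' else b'\x00\x01'

# Standard mode never inserts padding: 1 byte + 4 bytes.
standard_size = struct.calcsize('=bi')

# Native mode may insert pad bytes before the int to align it.
native_size = struct.calcsize('@bi')

print(standard_sizes, standard_size, native_size)
```
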
The form ``'!'`` is available for those poor souls who claim they can't remember
whether network byte order is big-endian or little-endian.

There is no way to indicate non-native byte order (force byte-swapping); use the
appropriate choice of ``'<'`` or ``'>'``.

The ``'P'`` format character is only available for the native byte ordering
(selected as the default or with the ``'@'`` byte order character).  The byte
order character ``'='`` chooses to use little- or big-endian ordering based on
the host system.  The struct module does not interpret this as native ordering,
so the ``'P'`` format is not available.

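The restriction on ``'P'`` can be observed directly.  This is a small sketch
assuming Python 3; the pointer size it computes depends on the host platform.

```python
import struct

# In native mode 'P' is accepted, with or without the explicit '@'.
pointer_size = struct.calcsize('P')
explicit_size = struct.calcsize('@P')

# With '=' (or '<', '>', '!') the format is rejected.
try:
    struct.calcsize('=P')
    standard_accepts_p = True
except struct.error:
    standard_accepts_p = False

print(pointer_size, standard_accepts_p)
```
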
Examples (all using native byte order, size and alignment, on a big-endian
machine)::

   >>> from struct import *
   >>> pack('hhl', 1, 2, 3)
   '\x00\x01\x00\x02\x00\x00\x00\x03'
   >>> unpack('hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
   (1, 2, 3)
   >>> calcsize('hhl')
   8

Hint: to align the end of a structure to the alignment requirement of a
particular type, end the format with the code for that type with a repeat count
of zero.  For example, the format ``'llh0l'`` specifies two pad bytes at the
end, assuming longs are aligned on 4-byte boundaries.  This only works when
native size and alignment are in effect; standard size and alignment do not
enforce any alignment.


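The zero-count trick can be explored with :func:`calcsize`.  The sketch below
assumes Python 3.  The native sizes depend on the platform's C compiler, so no
specific native values are claimed; only the standard-mode size is fixed.

```python
import struct

# Native mode: the trailing '0l' adds no item, only the pad bytes needed
# to bring the end of the struct up to the alignment of a C long.
unaligned = struct.calcsize('@llh')
aligned = struct.calcsize('@llh0l')

# Standard mode enforces no alignment, so '0l' changes nothing:
# 4 + 4 + 2 bytes either way.
standard_plain = struct.calcsize('=llh')
standard_zero = struct.calcsize('=llh0l')

print(unaligned, aligned, standard_plain, standard_zero)
```
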
.. seealso::

   Module :mod:`array`
      Packed binary storage of homogeneous data.

   Module :mod:`xdrlib`
      Packing and unpacking of XDR data.


.. _struct-objects:

Struct Objects
--------------

The :mod:`struct` module also defines the following type:


.. class:: Struct(format)

   Return a new Struct object which writes and reads binary data according to the
   format string *format*.  Creating a Struct object once and calling its methods
   is more efficient than calling the :mod:`struct` functions with the same format
   since the format string only needs to be compiled once.


Compiled Struct objects support the following methods and attributes:

.. method:: Struct.pack(v1, v2, ...)

   Identical to the :func:`pack` function, using the compiled format.
   (``len(result)`` will equal :attr:`self.size`.)


.. method:: Struct.pack_into(buffer, offset, v1, v2, ...)

   Identical to the :func:`pack_into` function, using the compiled format.


.. method:: Struct.unpack(string)

   Identical to the :func:`unpack` function, using the compiled format.
   (``len(string)`` must equal :attr:`self.size`).


.. method:: Struct.unpack_from(buffer[, offset=0])

   Identical to the :func:`unpack_from` function, using the compiled format.
   (``len(buffer[offset:])`` must be at least :attr:`self.size`).


.. attribute:: Struct.format

   The format string used to construct this Struct object.

.. attribute:: Struct.size

   The calculated size of the struct (and hence of the string) corresponding
   to :attr:`format`.

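A compiled Struct object might be used as in the sketch below, which assumes
Python 3.  The record layout (an unsigned id, an 8-byte name and a double) and
the sample values are hypothetical, chosen only to exercise each method.

```python
import struct

# Hypothetical record layout: unsigned int id, 8-byte name, double score.
record = struct.Struct('<I8sd')

# size matches calcsize() for the same format: 4 + 8 + 8 bytes.
size = record.size

packed = record.pack(7, b'alice', 9.5)
first = record.unpack(packed)

# The offset-based methods mirror pack_into() and unpack_from().
buf = bytearray(record.size * 2)
record.pack_into(buf, record.size, 8, b'bob', 1.25)
second = record.unpack_from(buf, record.size)

print(first, second)
```

Reusing ``record`` for every call avoids recompiling the format string, which is
the efficiency gain described for :class:`Struct` above.
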