:mod:`struct` --- Interpret strings as packed binary data
=========================================================

.. module:: struct
   :synopsis: Interpret strings as packed binary data.

.. index::
   pair: C; structures
   triple: packing; binary; data

This module performs conversions between Python values and C structs represented
as Python strings.  This can be used in handling binary data stored in files or
from network connections, among other sources.  It uses
:ref:`struct-format-strings` as compact descriptions of the layout of the C
structs and the intended conversion to/from Python values.

.. note::

   By default, the result of packing a given C struct includes pad bytes in
   order to maintain proper alignment for the C types involved; similarly,
   alignment is taken into account when unpacking.  This behavior is chosen so
   that the bytes of a packed struct correspond exactly to the layout in memory
   of the corresponding C struct.  To handle platform-independent data formats
   or omit implicit pad bytes, use `standard` size and alignment instead of
   `native` size and alignment: see :ref:`struct-alignment` for details.

Functions and Exceptions
------------------------

The module defines the following exception and functions:


.. exception:: error

   Exception raised on various occasions; argument is a string describing what
   is wrong.


.. function:: pack(fmt, v1, v2, ...)

   Return a string containing the values ``v1, v2, ...`` packed according to the
   given format.  The arguments must match the values required by the format
   exactly.

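As a minimal sketch of :func:`pack`, the following packs two signed shorts and one unsigned 32-bit integer. The format ``'>hhI'`` and the values are chosen only for illustration; the ``'>'`` prefix selects big-endian standard sizes so the result does not depend on the host platform.

```python
import struct

# Two signed shorts and one unsigned int, big-endian, no implicit padding.
packed = struct.pack('>hhI', 1, 2, 3)
# packed is '\x00\x01\x00\x02\x00\x00\x00\x03' (8 bytes)

# The number of values must match the format exactly; otherwise
# struct.error is raised.
try:
    struct.pack('>hhI', 1, 2)
except struct.error:
    mismatch_detected = True
else:
    mismatch_detected = False
```

Catching :exc:`struct.error` here is only meant to show when the exception occurs; real code would normally let it propagate.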

.. function:: pack_into(fmt, buffer, offset, v1, v2, ...)

   Pack the values ``v1, v2, ...`` according to the given format and write the
   packed bytes into the writable *buffer* starting at *offset*.  Note that the
   offset is a required argument.

   .. versionadded:: 2.5

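The sketch below shows one way to use :func:`pack_into`. It assumes a ``bytearray`` as the writable buffer (available from Python 2.6); the buffer size and the offset are arbitrary choices for the example.

```python
import struct

# A zero-filled, writable 8-byte buffer.
buf = bytearray(8)

# Write two big-endian unsigned shorts starting at byte offset 2.
# Bytes outside the written range are left untouched.
struct.pack_into('>HH', buf, 2, 0x0102, 0x0304)
# buf now holds 00 00 01 02 03 04 00 00
```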

.. function:: unpack(fmt, string)

   Unpack the string (presumably packed by ``pack(fmt, ...)``) according to the
   given format.  The result is a tuple even if it contains exactly one item.
   The string must contain exactly the amount of data required by the format
   (``len(string)`` must equal ``calcsize(fmt)``).

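A short illustration of the two points above: the result of :func:`unpack` is always a tuple, and the input length must match the format exactly. The ``b'...'`` literals are used so the snippet also runs on Python 3; on Python 2.6 and later they are ordinary strings.

```python
import struct

data = struct.pack('<H', 513)          # two bytes: 0x01, 0x02

# Even a single field comes back as a one-item tuple.
(value,) = struct.unpack('<H', data)   # value is 513

# One extra byte makes the length differ from calcsize('<H').
try:
    struct.unpack('<H', data + b'\x00')
except struct.error:
    length_checked = True
else:
    length_checked = False
```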

.. function:: unpack_from(fmt, buffer[, offset=0])

   Unpack the *buffer* according to the given format.  The result is a tuple even
   if it contains exactly one item.  The *buffer* must contain at least the
   amount of data required by the format (``len(buffer[offset:])`` must be at
   least ``calcsize(fmt)``).

   .. versionadded:: 2.5

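Because :func:`unpack_from` only needs *at least* enough data, it is convenient for reading one field out of a larger buffer. The message layout below (a 4-byte tag, a 16-bit length, then a payload) is hypothetical and exists only for this sketch.

```python
import struct

# Hypothetical message: 4-byte tag, big-endian 16-bit length, payload.
message = b'DATA' + struct.pack('>H', 5) + b'hello'

# Skip the 4-byte tag and read only the length field; the trailing
# payload bytes are ignored by unpack_from.
(length,) = struct.unpack_from('>H', message, 4)
payload = message[6:6 + length]
```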

.. function:: calcsize(fmt)

   Return the size of the struct (and hence of the string) corresponding to the
   given format.

.. _struct-format-strings:

Format Strings
--------------

Format strings are the mechanism used to specify the expected layout when
packing and unpacking data.  They are built up from :ref:`format-characters`,
which specify the type of data being packed/unpacked.  In addition, there are
special characters for controlling the :ref:`struct-alignment`.


.. _struct-alignment:

Byte Order, Size, and Alignment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, C types are represented in the machine's native format and byte
order, and properly aligned by skipping pad bytes if necessary (according to the
rules used by the C compiler).

Alternatively, the first character of the format string can be used to indicate
the byte order, size and alignment of the packed data, according to the
following table:

+-----------+------------------------+----------+-----------+
| Character | Byte order             | Size     | Alignment |
+===========+========================+==========+===========+
| ``@``     | native                 | native   | native    |
+-----------+------------------------+----------+-----------+
| ``=``     | native                 | standard | none      |
+-----------+------------------------+----------+-----------+
| ``<``     | little-endian          | standard | none      |
+-----------+------------------------+----------+-----------+
| ``>``     | big-endian             | standard | none      |
+-----------+------------------------+----------+-----------+
| ``!``     | network (= big-endian) | standard | none      |
+-----------+------------------------+----------+-----------+

If the first character is not one of these, ``'@'`` is assumed.

Native byte order is big-endian or little-endian, depending on the host
system.  For example, Intel x86 and AMD64 (x86-64) are little-endian;
Motorola 68000 and PowerPC G5 are big-endian; ARM and Intel Itanium feature
switchable endianness (bi-endian).  Use ``sys.byteorder`` to check the
endianness of your system.

Native size and alignment are determined using the C compiler's
``sizeof`` expression.  This is always combined with native byte order.

Standard size depends only on the format character; see the table in
the :ref:`format-characters` section.

Note the difference between ``'@'`` and ``'='``: both use native byte order, but
the size and alignment of the latter are standardized.

The form ``'!'`` is available for those poor souls who claim they can't remember
whether network byte order is big-endian or little-endian.

There is no way to indicate non-native byte order (force byte-swapping); use the
appropriate choice of ``'<'`` or ``'>'``.

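The effect of the byte order characters can be seen by packing the same value with each prefix. This is an illustrative sketch; the value ``0x01020304`` is arbitrary, and ``expected_native`` is a helper name introduced only for the comparison.

```python
import struct
import sys

value = 0x01020304

little = struct.pack('<I', value)    # least significant byte first
big = struct.pack('>I', value)       # most significant byte first
network = struct.pack('!I', value)   # same layout as '>'

# '=' follows the host byte order but keeps the standard 4-byte size.
native = struct.pack('=I', value)
expected_native = little if sys.byteorder == 'little' else big
```

Here ``little`` is ``'\x04\x03\x02\x01'`` and ``big`` is ``'\x01\x02\x03\x04'`` on every platform, while ``native`` equals one or the other depending on ``sys.byteorder``.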
Notes:

(1) Padding is only automatically added between successive structure members.
    No padding is added at the beginning or the end of the encoded struct.

(2) No padding is added when using non-native size and alignment, e.g.
    with ``'<'``, ``'>'``, ``'='``, and ``'!'``.

(3) To align the end of a structure to the alignment requirement of a
    particular type, end the format with the code for that type with a repeat
    count of zero.  See :ref:`struct-examples`.

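The three notes above can be checked with :func:`calcsize`. In this sketch only the standard-mode sizes are fixed; the native sizes depend on the platform's C compiler, so the comments mention typical values rather than guaranteed ones.

```python
import struct

# Standard size and alignment: no pad bytes, so field order is irrelevant.
std_ci = struct.calcsize('<ci')   # 1 + 4 = 5
std_ic = struct.calcsize('<ic')   # 4 + 1 = 5

# Native alignment: padding may appear between members (typically 3 bytes
# after the char in 'ci'), but never after the last member.
nat_ci = struct.calcsize('@ci')   # often 8
nat_ic = struct.calcsize('@ic')   # often 5

# A trailing zero-count code pads the end to that type's alignment.
nat_ic_padded = struct.calcsize('@ic0i')   # often 8
```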

.. _format-characters:

Format Characters
^^^^^^^^^^^^^^^^^

Format characters have the following meaning; the conversion between C and
Python values should be obvious given their types.  The 'Standard size' column
refers to the size of the packed value in bytes when using standard size; that
is, when the format string starts with one of ``'<'``, ``'>'``, ``'!'`` or
``'='``.  When using native size, the size of the packed value is
platform-dependent.

+--------+-------------------------+--------------------+----------------+------------+
| Format | C Type                  | Python type        | Standard size  | Notes      |
+========+=========================+====================+================+============+
| ``x``  | pad byte                | no value           |                |            |
+--------+-------------------------+--------------------+----------------+------------+
| ``c``  | :ctype:`char`           | string of length 1 | 1              |            |
+--------+-------------------------+--------------------+----------------+------------+
| ``b``  | :ctype:`signed char`    | integer            | 1              | \(3)       |
+--------+-------------------------+--------------------+----------------+------------+
| ``B``  | :ctype:`unsigned char`  | integer            | 1              | \(3)       |
+--------+-------------------------+--------------------+----------------+------------+
| ``?``  | :ctype:`_Bool`          | bool               | 1              | \(1)       |
+--------+-------------------------+--------------------+----------------+------------+
| ``h``  | :ctype:`short`          | integer            | 2              | \(3)       |
+--------+-------------------------+--------------------+----------------+------------+
| ``H``  | :ctype:`unsigned short` | integer            | 2              | \(3)       |
+--------+-------------------------+--------------------+----------------+------------+
| ``i``  | :ctype:`int`            | integer            | 4              | \(3)       |
+--------+-------------------------+--------------------+----------------+------------+
| ``I``  | :ctype:`unsigned int`   | integer            | 4              | \(3)       |
+--------+-------------------------+--------------------+----------------+------------+
| ``l``  | :ctype:`long`           | integer            | 4              | \(3)       |
+--------+-------------------------+--------------------+----------------+------------+
| ``L``  | :ctype:`unsigned long`  | integer            | 4              | \(3)       |
+--------+-------------------------+--------------------+----------------+------------+
| ``q``  | :ctype:`long long`      | integer            | 8              | \(2), \(3) |
+--------+-------------------------+--------------------+----------------+------------+
| ``Q``  | :ctype:`unsigned long   | integer            | 8              | \(2), \(3) |
|        | long`                   |                    |                |            |
+--------+-------------------------+--------------------+----------------+------------+
| ``f``  | :ctype:`float`          | float              | 4              | \(4)       |
+--------+-------------------------+--------------------+----------------+------------+
| ``d``  | :ctype:`double`         | float              | 8              | \(4)       |
+--------+-------------------------+--------------------+----------------+------------+
| ``s``  | :ctype:`char[]`         | string             |                |            |
+--------+-------------------------+--------------------+----------------+------------+
| ``p``  | :ctype:`char[]`         | string             |                |            |
+--------+-------------------------+--------------------+----------------+------------+
| ``P``  | :ctype:`void \*`        | integer            |                | \(5), \(3) |
+--------+-------------------------+--------------------+----------------+------------+

Notes:

(1)
   The ``'?'`` conversion code corresponds to the :ctype:`_Bool` type defined by
   C99.  If this type is not available, it is simulated using a :ctype:`char`.  In
   standard mode, it is always represented by one byte.

   .. versionadded:: 2.6

(2)
   The ``'q'`` and ``'Q'`` conversion codes are available in native mode only if
   the platform C compiler supports C :ctype:`long long`, or, on Windows,
   :ctype:`__int64`.  They are always available in standard modes.

   .. versionadded:: 2.2

(3)
   When attempting to pack a non-integer using any of the integer conversion
   codes, if the non-integer has a :meth:`__index__` method then that method is
   called to convert the argument to an integer before packing.  If no
   :meth:`__index__` method exists, or the call to :meth:`__index__` raises
   :exc:`TypeError`, then the :meth:`__int__` method is tried.  However, the use
   of :meth:`__int__` is deprecated, and will raise :exc:`DeprecationWarning`.

   .. versionchanged:: 2.7
      Use of the :meth:`__index__` method for non-integers is new in 2.7.

   .. versionchanged:: 2.7
      Prior to version 2.7, not all integer conversion codes would use the
      :meth:`__int__` method to convert, and :exc:`DeprecationWarning` was
      raised only for float arguments.

(4)
   For the ``'f'`` and ``'d'`` conversion codes, the packed representation uses
   the IEEE 754 binary32 (for ``'f'``) or binary64 (for ``'d'``) format,
   regardless of the floating-point format used by the platform.

(5)
   The ``'P'`` format character is only available for the native byte ordering
   (selected as the default or with the ``'@'`` byte order character).  The byte
   order character ``'='`` chooses to use little- or big-endian ordering based
   on the host system.  The struct module does not interpret this as native
   ordering, so the ``'P'`` format is not available.

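Notes (3) and (4) can be illustrated with a small sketch. The ``Offset`` class is hypothetical and exists only to provide an :meth:`__index__` method; the byte strings in the comments follow from the IEEE 754 encodings of ``1.0``.

```python
import struct


class Offset(object):
    """Hypothetical integer-like type, used only for this illustration."""

    def __init__(self, n):
        self.n = n

    def __index__(self):
        return self.n


# Note (3): __index__ is used to convert the argument before packing.
from_index = struct.pack('<h', Offset(7))   # '\x07\x00'

# Note (4): 'f' is IEEE 754 binary32 and 'd' is binary64.
single = struct.pack('>f', 1.0)   # '\x3f\x80\x00\x00'
double = struct.pack('>d', 1.0)   # '\x3f\xf0' followed by six zero bytes

# 0.1 is not exactly representable in binary32, so a round trip through
# 'f' changes the value, while a round trip through 'd' preserves it.
(via_single,) = struct.unpack('>f', struct.pack('>f', 0.1))
(via_double,) = struct.unpack('>d', struct.pack('>d', 0.1))
```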

A format character may be preceded by an integral repeat count.  For example,
the format string ``'4h'`` means exactly the same as ``'hhhh'``.

Whitespace characters between formats are ignored; however, no whitespace is
allowed between a count and its format.

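A quick check of the repeat count and whitespace rules, as a sketch. The variable names are arbitrary; the last case puts a space between a count and its format character, which is rejected with :exc:`struct.error`.

```python
import struct

with_count = struct.pack('<4h', 1, 2, 3, 4)
spelled_out = struct.pack('<hhhh', 1, 2, 3, 4)

# Whitespace between complete format items is ignored.
with_space = struct.pack('<2h 2h', 1, 2, 3, 4)

# Whitespace between a count and its format character is an error.
try:
    struct.pack('2 h', 1, 2)
except struct.error:
    count_space_rejected = True
else:
    count_space_rejected = False
```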
For the ``'s'`` format character, the count is interpreted as the size of the
string, not a repeat count as for the other format characters; for example,
``'10s'`` means a single 10-byte string, while ``'10c'`` means 10 characters.
For packing, the string is truncated or padded with null bytes as appropriate to
make it fit.  For unpacking, the resulting string always has exactly the
specified number of bytes.  As a special case, ``'0s'`` means a single, empty
string (while ``'0c'`` means 0 characters).

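The following sketch shows the padding and truncation behavior of ``'s'`` and contrasts it with ``'c'``. The ``b'...'`` literals keep the snippet runnable on Python 3 as well; on Python 2.6 and later they are ordinary strings.

```python
import struct

exact = struct.pack('5s', b'hello')       # 'hello'
padded = struct.pack('8s', b'hello')      # 'hello\x00\x00\x00'
truncated = struct.pack('3s', b'hello')   # 'hel'

# Unpacking returns exactly the counted number of bytes, padding included.
(unpacked,) = struct.unpack('8s', padded)

# With 'c' the count is a repeat count: three separate 1-byte items.
chars = struct.unpack('3c', b'abc')
```

Stripping the trailing null bytes after unpacking (for example with ``unpacked.rstrip(b'\x00')``) is left to the caller.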
The ``'p'`` format character encodes a "Pascal string", meaning a short
variable-length string stored in a *fixed number of bytes*, given by the count.
The first byte stored is the length of the string, or 255, whichever is smaller.
The bytes of the string follow.  If the string passed in to :func:`pack` is too
long (longer than the count minus 1), only the leading ``count-1`` bytes of the
string are stored.  If the string is shorter than ``count-1``, it is padded with
null bytes so that exactly count bytes in all are used.  Note that for
:func:`unpack`, the ``'p'`` format character consumes count bytes, but that the
string returned can never contain more than 255 characters.

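As a sketch of the ``'p'`` layout described above, the following packs a short and an over-long string. The counts 8 and 4 are arbitrary.

```python
import struct

# Length byte (3), the three data bytes, then null padding up to 8 bytes.
short = struct.pack('8p', b'abc')        # '\x03abc\x00\x00\x00\x00'

# Only count-1 = 3 bytes fit, so the string is cut to 'abc'.
too_long = struct.pack('4p', b'abcdef')  # '\x03abc'

# Unpacking consumes all 8 bytes but returns only the counted data.
(recovered,) = struct.unpack('8p', short)
```

Unlike ``'s'``, the unpacked value carries no padding, because the stored length byte says how many bytes are meaningful.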
For the ``'P'`` format character, the return value is a Python integer or long
integer, depending on the size needed to hold a pointer when it has been cast to
an integer type.  A *NULL* pointer will always be returned as the Python integer
``0``.  When packing pointer-sized values, Python integer or long integer objects
may be used.  For example, the Alpha and Merced processors use 64-bit pointer
values, meaning a Python long integer will be used to hold the pointer; other
platforms use 32-bit pointers and will use a Python integer.

For the ``'?'`` format character, the return value is either :const:`True` or
:const:`False`.  When packing, the truth value of the argument object is used.
Either 0 or 1 in the native or standard bool representation will be packed, and
any non-zero value will be True when unpacking.

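A small sketch of the ``'?'`` rules: any object may be packed and only its truth value matters, and any non-zero byte unpacks as :const:`True`. The arguments used here are arbitrary.

```python
import struct

# A non-empty list is true, an empty string is false.
packed_true = struct.pack('<?', [0])    # '\x01'
packed_false = struct.pack('<?', '')    # '\x00'

# Any non-zero byte is interpreted as True when unpacking.
(flag,) = struct.unpack('<?', b'\x7f')
```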


.. _struct-examples:

Examples
^^^^^^^^

.. note::
   All examples assume a native byte order, size, and alignment with a
   big-endian machine.

A basic example of packing/unpacking three integers::

   >>> from struct import *
   >>> pack('hhl', 1, 2, 3)
   '\x00\x01\x00\x02\x00\x00\x00\x03'
   >>> unpack('hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
   (1, 2, 3)
   >>> calcsize('hhl')
   8

Unpacked fields can be named by assigning them to variables or by wrapping
the result in a named tuple::

   >>> record = 'raymond   \x32\x12\x08\x01\x08'
   >>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)

   >>> from collections import namedtuple
   >>> Student = namedtuple('Student', 'name serialnum school gradelevel')
   >>> Student._make(unpack('<10sHHb', record))
   Student(name='raymond   ', serialnum=4658, school=264, gradelevel=8)

The ordering of format characters may have an impact on size since the padding
needed to satisfy alignment requirements is different::

   >>> pack('ci', '*', 0x12131415)
   '*\x00\x00\x00\x12\x13\x14\x15'
   >>> pack('ic', 0x12131415, '*')
   '\x12\x13\x14\x15*'
   >>> calcsize('ci')
   8
   >>> calcsize('ic')
   5

The following format ``'llh0l'`` specifies two pad bytes at the end, assuming
longs are aligned on 4-byte boundaries::

   >>> pack('llh0l', 1, 2, 3)
   '\x00\x00\x00\x01\x00\x00\x00\x02\x00\x03\x00\x00'

This only works when native size and alignment are in effect; standard size and
alignment do not enforce any alignment.


.. seealso::

   Module :mod:`array`
      Packed binary storage of homogeneous data.

   Module :mod:`xdrlib`
      Packing and unpacking of XDR data.


.. _struct-objects:

Classes
-------

The :mod:`struct` module also defines the following type:


.. class:: Struct(format)

   Return a new Struct object which writes and reads binary data according to
   the format string *format*.  Creating a Struct object once and calling its
   methods is more efficient than calling the :mod:`struct` functions with the
   same format since the format string only needs to be compiled once.

   .. versionadded:: 2.5

   Compiled Struct objects support the following methods and attributes:


   .. method:: pack(v1, v2, ...)

      Identical to the :func:`pack` function, using the compiled format.
      (``len(result)`` will equal :attr:`self.size`.)


   .. method:: pack_into(buffer, offset, v1, v2, ...)

      Identical to the :func:`pack_into` function, using the compiled format.


   .. method:: unpack(string)

      Identical to the :func:`unpack` function, using the compiled format.
      (``len(string)`` must equal :attr:`self.size`).


   .. method:: unpack_from(buffer[, offset=0])

      Identical to the :func:`unpack_from` function, using the compiled format.
      (``len(buffer[offset:])`` must be at least :attr:`self.size`).


   .. attribute:: format

      The format string used to construct this Struct object.

   .. attribute:: size

      The calculated size of the struct (and hence of the string) corresponding
      to :attr:`format`.

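A compiled :class:`Struct` is most useful when the same format is applied many times. The sketch below assumes a hypothetical fixed-size record (a 4-byte id, a 2-byte flags field and a 10-byte name); the names ``record``, ``rows`` and ``blob`` are illustrative only.

```python
import struct

# Hypothetical record layout, compiled once and reused for every row.
record = struct.Struct('<IH10s')

rows = [(1, 0, b'alpha'), (2, 3, b'beta')]
blob = b''.join(record.pack(*row) for row in rows)

# record.size gives the stride between consecutive records in the blob.
decoded = [record.unpack_from(blob, i * record.size)
           for i in range(len(rows))]
```

Each decoded name is padded to 10 bytes with null bytes, as described for the ``'s'`` format character.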