:mod:`struct` --- Interpret strings as packed binary data
=========================================================

.. module:: struct
   :synopsis: Interpret strings as packed binary data.

.. index::
   pair: C; structures
   triple: packing; binary; data

This module performs conversions between Python values and C structs represented
as Python strings. This can be used in handling binary data stored in files or
from network connections, among other sources. It uses
:ref:`struct-format-strings` as compact descriptions of the layout of the C
structs and the intended conversion to/from Python values.

.. note::

   By default, the result of packing a given C struct includes pad bytes in
   order to maintain proper alignment for the C types involved; similarly,
   alignment is taken into account when unpacking. This behavior is chosen so
   that the bytes of a packed struct correspond exactly to the layout in memory
   of the corresponding C struct. To omit pad bytes, use `standard` size and
   alignment instead of `native` size and alignment: see :ref:`struct-alignment`
   for details.
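
A minimal sketch of the difference described in the note, using a format of one
:ctype:`char` followed by one :ctype:`int` chosen purely for illustration. The
native result depends on the platform, so no particular value is claimed for
it::

   import struct

   # Standard size and alignment: no pad bytes, so 1 + 4 bytes.
   standard_size = struct.calcsize('=ci')

   # Native size and alignment: whatever the platform's C compiler would
   # produce, usually with pad bytes inserted before the int.
   native_size = struct.calcsize('@ci')

   assert standard_size == 5
   assert native_size >= standard_size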

Functions and Exceptions
------------------------

The module defines the following exception and functions:


.. exception:: error

   Exception raised on various occasions; argument is a string describing what
   is wrong.


.. function:: pack(fmt, v1, v2, ...)

   Return a string containing the values ``v1, v2, ...`` packed according to the
   given format. The arguments must match the values required by the format
   exactly.


.. function:: pack_into(fmt, buffer, offset, v1, v2, ...)

   Pack the values ``v1, v2, ...`` according to the given format and write the
   packed bytes into the writable *buffer* starting at *offset*. Note that the
   offset is a required argument.

   .. versionadded:: 2.5


.. function:: unpack(fmt, string)

   Unpack the string (presumably packed by ``pack(fmt, ...)``) according to the
   given format. The result is a tuple even if it contains exactly one item.
   The string must contain exactly the amount of data required by the format
   (``len(string)`` must equal ``calcsize(fmt)``).
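
   A short round-trip sketch of :func:`pack` and :func:`unpack`. The format
   ``'>HI'`` and the values are arbitrary choices for illustration, and the
   ``b''`` literals assume Python 2.6 or later, where they are ordinary
   strings::

      import struct

      # Pack an unsigned short and an unsigned int, big-endian, standard sizes.
      packed = struct.pack('>HI', 1, 2)
      assert packed == b'\x00\x01\x00\x00\x00\x02'

      # unpack() returns a tuple, even for a single item.
      values = struct.unpack('>HI', packed)       # (1, 2)
      single = struct.unpack('>H', packed[:2])    # (1,)

      # The data length must equal calcsize(fmt); one extra byte is an error.
      try:
          struct.unpack('>HI', packed + b'\x00')
      except struct.error as exc:
          length_error = exc
      else:
          length_error = None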


.. function:: unpack_from(fmt, buffer[, offset=0])

   Unpack the *buffer* according to the given format. The result is a tuple even
   if it contains exactly one item. The *buffer* must contain at least the
   amount of data required by the format (``len(buffer[offset:])`` must be at
   least ``calcsize(fmt)``).

   .. versionadded:: 2.5
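
   A sketch of :func:`pack_into` and :func:`unpack_from` working on one shared
   buffer. Using a ``bytearray`` as the writable buffer is an assumption made
   for this example; older interpreters may need a different writable buffer
   type::

      import struct

      fmt = '<HH'
      buf = bytearray(8)          # writable buffer holding 8 zero bytes

      # Write two unsigned shorts starting at byte 2; other bytes stay zero.
      struct.pack_into(fmt, buf, 2, 0x0102, 0x0304)

      # Read them back from the same offset. The buffer may be longer than
      # calcsize(fmt), as long as enough data follows the offset.
      values = struct.unpack_from(fmt, buf, 2)

      assert values == (0x0102, 0x0304)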


.. function:: calcsize(fmt)

   Return the size of the struct (and hence of the string) corresponding to the
   given format.

.. _struct-format-strings:

Format Strings
--------------

Format strings are the mechanism used to specify the expected layout when
packing and unpacking data. They are built up from :ref:`format-characters`,
which specify the type of data being packed/unpacked. In addition, there are
special characters for controlling the :ref:`struct-alignment`.


.. _struct-alignment:

Byte Order, Size, and Alignment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, C types are represented in the machine's native format and byte
order, and properly aligned by skipping pad bytes if necessary (according to the
rules used by the C compiler).

Alternatively, the first character of the format string can be used to indicate
the byte order, size and alignment of the packed data, according to the
following table:

+-----------+------------------------+--------------------+
| Character | Byte order             | Size and alignment |
+===========+========================+====================+
| ``@``     | native                 | native             |
+-----------+------------------------+--------------------+
| ``=``     | native                 | standard           |
+-----------+------------------------+--------------------+
| ``<``     | little-endian          | standard           |
+-----------+------------------------+--------------------+
| ``>``     | big-endian             | standard           |
+-----------+------------------------+--------------------+
| ``!``     | network (= big-endian) | standard           |
+-----------+------------------------+--------------------+

If the first character is not one of these, ``'@'`` is assumed.

Native byte order is big-endian or little-endian, depending on the host
system. For example, Intel x86 and AMD64 (x86-64) are little-endian;
Motorola 68000 and PowerPC G5 are big-endian; ARM and Intel Itanium feature
switchable endianness (bi-endian). Use ``sys.byteorder`` to check the
endianness of your system.
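
The byte order characters from the table can be compared directly. This is an
illustrative sketch: the value ``1`` and the ``'I'`` format are arbitrary
choices, and the ``b''`` literals assume Python 2.6 or later::

   import struct
   import sys

   little = struct.pack('<I', 1)     # b'\x01\x00\x00\x00'
   big = struct.pack('>I', 1)        # b'\x00\x00\x00\x01'
   network = struct.pack('!I', 1)    # same bytes as '>'
   native = struct.pack('=I', 1)     # follows sys.byteorder, standard size

   # '=' picks one of the two explicit orders depending on the host.
   expected_native = little if sys.byteorder == 'little' else big
   assert native == expected_native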

Native size and alignment are determined using the C compiler's
``sizeof`` expression. This is always combined with native byte order.

Standard size and alignment are as follows: no alignment is required for any
type (so you have to use pad bytes); :ctype:`short` is 2 bytes; :ctype:`int` and
:ctype:`long` are 4 bytes; :ctype:`long long` (:ctype:`__int64` on Windows) is 8
bytes; :ctype:`float` and :ctype:`double` are 32-bit and 64-bit IEEE floating
point numbers, respectively. :ctype:`_Bool` is 1 byte.

Note the difference between ``'@'`` and ``'='``: both use native byte order, but
the size and alignment of the latter is standardized.

The form ``'!'`` is available for those poor souls who claim they can't remember
whether network byte order is big-endian or little-endian.

There is no way to indicate non-native byte order (force byte-swapping); use the
appropriate choice of ``'<'`` or ``'>'``.

The ``'P'`` format character is only available for the native byte ordering
(selected as the default or with the ``'@'`` byte order character). The byte
order character ``'='`` chooses to use little- or big-endian ordering based on
the host system. The struct module does not interpret this as native ordering,
so the ``'P'`` format is not available.

Notes:

(1) Padding is only automatically added between successive structure members.
    No padding is added at the beginning or the end of the encoded struct.

(2) No padding is added when using non-native size and alignment, e.g.
    with ``'<'``, ``'>'``, ``'='``, and ``'!'``.

(3) To align the end of a structure to the alignment requirement of a
    particular type, end the format with the code for that type with a repeat
    count of zero. See :ref:`struct-examples`.
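
The notes above can be checked with :func:`calcsize`. This is a sketch only:
native sizes vary by platform, so the code compares sizes against each other
and asserts a fixed number only for the standard-mode case::

   import struct

   int_size = struct.calcsize('@i')

   # Note (1): nothing is added after the last member, so the trailing char
   # costs exactly one byte.
   native_ic = struct.calcsize('@ic')
   assert native_ic == int_size + 1

   # Note (3): a trailing zero-count 'i' pads the end up to int alignment.
   aligned = struct.calcsize('@ic0i')
   assert native_ic <= aligned < native_ic + int_size

   # Note (2): in standard mode the same trick adds nothing.
   standard = struct.calcsize('<ic0i')
   assert standard == 5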


.. _format-characters:

Format Characters
^^^^^^^^^^^^^^^^^

Format characters have the following meaning; the conversion between C and
Python values should be obvious given their types:

+--------+-------------------------+--------------------+----------------+------------+
| Format | C Type                  | Python type        | Standard size  | Notes      |
+========+=========================+====================+================+============+
| ``x``  | pad byte                | no value           |                |            |
+--------+-------------------------+--------------------+----------------+------------+
| ``c``  | :ctype:`char`           | string of length 1 | 1              |            |
+--------+-------------------------+--------------------+----------------+------------+
| ``b``  | :ctype:`signed char`    | integer            | 1              | \(3)       |
+--------+-------------------------+--------------------+----------------+------------+
| ``B``  | :ctype:`unsigned char`  | integer            | 1              | \(3)       |
+--------+-------------------------+--------------------+----------------+------------+
| ``?``  | :ctype:`_Bool`          | bool               | 1              | \(1)       |
+--------+-------------------------+--------------------+----------------+------------+
| ``h``  | :ctype:`short`          | integer            | 2              | \(3)       |
+--------+-------------------------+--------------------+----------------+------------+
| ``H``  | :ctype:`unsigned short` | integer            | 2              | \(3)       |
+--------+-------------------------+--------------------+----------------+------------+
| ``i``  | :ctype:`int`            | integer            | 4              | \(3)       |
+--------+-------------------------+--------------------+----------------+------------+
| ``I``  | :ctype:`unsigned int`   | integer            | 4              | \(3)       |
+--------+-------------------------+--------------------+----------------+------------+
| ``l``  | :ctype:`long`           | integer            | 4              | \(3)       |
+--------+-------------------------+--------------------+----------------+------------+
| ``L``  | :ctype:`unsigned long`  | integer            | 4              | \(3)       |
+--------+-------------------------+--------------------+----------------+------------+
| ``q``  | :ctype:`long long`      | integer            | 8              | \(2), \(3) |
+--------+-------------------------+--------------------+----------------+------------+
| ``Q``  | :ctype:`unsigned long   | integer            | 8              | \(2), \(3) |
|        | long`                   |                    |                |            |
+--------+-------------------------+--------------------+----------------+------------+
| ``f``  | :ctype:`float`          | float              | 4              |            |
+--------+-------------------------+--------------------+----------------+------------+
| ``d``  | :ctype:`double`         | float              | 8              |            |
+--------+-------------------------+--------------------+----------------+------------+
| ``s``  | :ctype:`char[]`         | string             |                |            |
+--------+-------------------------+--------------------+----------------+------------+
| ``p``  | :ctype:`char[]`         | string             |                |            |
+--------+-------------------------+--------------------+----------------+------------+
| ``P``  | :ctype:`void \*`        | integer            |                | \(3)       |
+--------+-------------------------+--------------------+----------------+------------+

Notes:

(1)
   The ``'?'`` conversion code corresponds to the :ctype:`_Bool` type defined by
   C99. If this type is not available, it is simulated using a :ctype:`char`. In
   standard mode, it is always represented by one byte.

   .. versionadded:: 2.6

(2)
   The ``'q'`` and ``'Q'`` conversion codes are available in native mode only if
   the platform C compiler supports C :ctype:`long long`, or, on Windows,
   :ctype:`__int64`. They are always available in standard modes.

   .. versionadded:: 2.2

A format character may be preceded by an integral repeat count. For example,
the format string ``'4h'`` means exactly the same as ``'hhhh'``.

Whitespace characters between formats are ignored; a count and its format must
not contain whitespace though.
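
A brief sketch of repeat counts and whitespace handling. The specific formats
are illustrative, and the standard-size prefix ``'<'`` is used only so that the
asserted sizes do not depend on the platform::

   import struct

   # A repeat count is shorthand for repeating the format character.
   assert struct.calcsize('<4h') == struct.calcsize('<hhhh') == 8
   counted = struct.pack('<4h', 1, 2, 3, 4)
   spelled = struct.pack('<hhhh', 1, 2, 3, 4)

   # Whitespace between formats is ignored ...
   spaced = struct.pack('<2h 2h', 1, 2, 3, 4)
   assert counted == spelled == spaced

   # ... but whitespace between a count and its format character is an error.
   try:
       struct.calcsize('<4 h')
   except struct.error:
       count_split_rejected = True
   else:
       count_split_rejected = False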

For the ``'s'`` format character, the count is interpreted as the size of the
string, not a repeat count like for the other format characters; for example,
``'10s'`` means a single 10-byte string, while ``'10c'`` means 10 characters.
For packing, the string is truncated or padded with null bytes as appropriate to
make it fit. For unpacking, the resulting string always has exactly the
specified number of bytes. As a special case, ``'0s'`` means a single, empty
string (while ``'0c'`` means 0 characters).
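
The ``'s'`` rules can be illustrated as follows. This is a sketch with
arbitrary sample strings, and the ``b''`` literals assume Python 2.6 or
later::

   import struct

   padded = struct.pack('<5s', b'ab')           # padded with null bytes
   truncated = struct.pack('<2s', b'abcdef')    # cut down to 2 bytes
   assert padded == b'ab\x00\x00\x00'
   assert truncated == b'ab'

   # '10s' is one 10-byte string; '10c' is ten separate 1-byte items.
   one_item = struct.unpack('<10s', b'0123456789')
   ten_items = struct.unpack('<10c', b'0123456789')
   assert len(one_item) == 1 and len(ten_items) == 10

   # '0s' yields a single empty string; '0c' yields no items at all.
   empty_string = struct.unpack('<0s', b'')
   no_items = struct.unpack('<0c', b'')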

The ``'p'`` format character encodes a "Pascal string", meaning a short
variable-length string stored in a fixed number of bytes. The count is the total
number of bytes stored. The first byte stored is the length of the string, or
255, whichever is smaller. The bytes of the string follow. If the string
passed in to :func:`pack` is too long (longer than the count minus 1), only the
leading count-1 bytes of the string are stored. If the string is shorter than
count-1, it is padded with null bytes so that exactly count bytes in all are
used. Note that for :func:`unpack`, the ``'p'`` format character consumes count
bytes, but that the string returned can never contain more than 255 characters.
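
A sketch of the Pascal-string layout just described, using short sample strings
chosen for illustration (``b''`` literals assume Python 2.6 or later)::

   import struct

   # Count 5: one length byte, the 2-byte string, then 2 null pad bytes.
   short = struct.pack('<5p', b'ab')
   assert short == b'\x02ab\x00\x00'

   # Count 3: only the leading count-1 == 2 bytes of the string are stored.
   long_ = struct.pack('<3p', b'abcdef')
   assert long_ == b'\x02ab'

   # Unpacking consumes all 5 bytes but returns only the stored string.
   recovered = struct.unpack('<5p', short)
   assert recovered == (b'ab',)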

For the ``'P'`` format character, the return value is a Python integer or long
integer, depending on the size needed to hold a pointer when it has been cast to
an integer type. A *NULL* pointer will always be returned as the Python integer
``0``. When packing pointer-sized values, Python integer or long integer objects
may be used. For example, the Alpha and Merced processors use 64-bit pointer
values, meaning a Python long integer will be used to hold the pointer; other
platforms use 32-bit pointers and will use a Python integer.
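
A small sketch of ``'P'`` in practice. The pointer size is platform-dependent,
so the code only relates the values to each other instead of asserting a fixed
size::

   import struct

   pointer_size = struct.calcsize('P')      # platform-dependent

   # A NULL pointer packs to pointer_size bytes and unpacks as 0.
   null = struct.pack('P', 0)
   assert len(null) == pointer_size
   assert struct.unpack('P', null) == (0,)

   # 'P' is rejected outside native mode, for example with '='.
   try:
       struct.calcsize('=P')
   except struct.error:
       p_is_native_only = True
   else:
       p_is_native_only = False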

For the ``'?'`` format character, the return value is either :const:`True` or
:const:`False`. When packing, the truth value of the argument object is used.
Either 0 or 1 in the native or standard bool representation will be packed, and
any non-zero value will be True when unpacking.
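
For illustration, a sketch of ``'?'`` with arbitrary truthy and falsy
arguments (``b''`` literals assume Python 2.6 or later)::

   import struct

   # Packing uses the truth value of the argument.
   packed_false = struct.pack('<?', [])         # an empty list is false
   packed_true = struct.pack('<?', 'text')      # a non-empty string is true
   assert packed_false == b'\x00'
   assert packed_true == b'\x01'

   # Any non-zero byte unpacks as True.
   unpacked = struct.unpack('<??', b'\x00\x05')
   assert unpacked == (False, True)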


.. _struct-examples:

Examples
^^^^^^^^

.. note::

   All examples assume a native byte order, size, and alignment with a
   big-endian machine.

A basic example of packing/unpacking three integers::

   >>> from struct import *
   >>> pack('hhl', 1, 2, 3)
   '\x00\x01\x00\x02\x00\x00\x00\x03'
   >>> unpack('hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
   (1, 2, 3)
   >>> calcsize('hhl')
   8

Unpacked fields can be named by assigning them to variables or by wrapping
the result in a named tuple::

   >>> record = 'raymond   \x32\x12\x08\x01\x08'
   >>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)

   >>> from collections import namedtuple
   >>> Student = namedtuple('Student', 'name serialnum school gradelevel')
   >>> Student._make(unpack('<10sHHb', record))
   Student(name='raymond   ', serialnum=4658, school=264, gradelevel=8)

The ordering of format characters may have an impact on size since the padding
needed to satisfy alignment requirements is different::

   >>> pack('ci', '*', 0x12131415)
   '*\x00\x00\x00\x12\x13\x14\x15'
   >>> pack('ic', 0x12131415, '*')
   '\x12\x13\x14\x15*'
   >>> calcsize('ci')
   8
   >>> calcsize('ic')
   5

The following format ``'llh0l'`` specifies two pad bytes at the end, assuming
longs are aligned on 4-byte boundaries::

   >>> pack('llh0l', 1, 2, 3)
   '\x00\x00\x00\x01\x00\x00\x00\x02\x00\x03\x00\x00'

This only works when native size and alignment are in effect; standard size and
alignment does not enforce any alignment.


.. seealso::

   Module :mod:`array`
      Packed binary storage of homogeneous data.

   Module :mod:`xdrlib`
      Packing and unpacking of XDR data.


.. _struct-objects:

Classes
-------

The :mod:`struct` module also defines the following type:


.. class:: Struct(format)

   Return a new Struct object which writes and reads binary data according to
   the format string *format*. Creating a Struct object once and calling its
   methods is more efficient than calling the :mod:`struct` functions with the
   same format since the format string only needs to be compiled once.

   .. versionadded:: 2.5

   Compiled Struct objects support the following methods and attributes:


   .. method:: pack(v1, v2, ...)

      Identical to the :func:`pack` function, using the compiled format.
      (``len(result)`` will equal :attr:`self.size`.)


   .. method:: pack_into(buffer, offset, v1, v2, ...)

      Identical to the :func:`pack_into` function, using the compiled format.


   .. method:: unpack(string)

      Identical to the :func:`unpack` function, using the compiled format.
      (``len(string)`` must equal :attr:`self.size`).


   .. method:: unpack_from(buffer[, offset=0])

      Identical to the :func:`unpack_from` function, using the compiled format.
      (``len(buffer[offset:])`` must be at least :attr:`self.size`).


   .. attribute:: format

      The format string used to construct this Struct object.

   .. attribute:: size

      The calculated size of the struct (and hence of the string) corresponding
      to :attr:`format`.
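
A sketch of reusing one compiled :class:`Struct` object. The record layout is
borrowed from the student example above, the name ``record_struct`` is
hypothetical, and the ``b''`` literals assume Python 2.6 or later::

   import struct

   # Compile the format once, then reuse it for every record.
   record_struct = struct.Struct('<10sHHb')

   packed = record_struct.pack(b'raymond   ', 4658, 264, 8)
   assert len(packed) == record_struct.size == 15

   fields = record_struct.unpack(packed)
   assert fields == (b'raymond   ', 4658, 264, 8)

This pays off mainly in loops that pack or unpack many records with the same
layout, since the format string is parsed only once.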