:mod:`struct` --- Interpret strings as packed binary data
=========================================================

.. module:: struct
   :synopsis: Interpret strings as packed binary data.

.. index::
   pair: C; structures
   triple: packing; binary; data

This module performs conversions between Python values and C structs represented
as Python strings. This can be used in handling binary data stored in files or
from network connections, among other sources. It uses
:ref:`struct-format-strings` as compact descriptions of the layout of the C
structs and the intended conversion to/from Python values.

.. note::

   By default, the result of packing a given C struct includes pad bytes in
   order to maintain proper alignment for the C types involved; similarly,
   alignment is taken into account when unpacking. This behavior is chosen so
   that the bytes of a packed struct correspond exactly to the layout in memory
   of the corresponding C struct. To omit pad bytes, use `standard` size and
   alignment instead of `native` size and alignment: see :ref:`struct-alignment`
   for details.

Functions and Exceptions
------------------------

The module defines the following exception and functions:


.. exception:: error

   Exception raised on various occasions; argument is a string describing what
   is wrong.


.. function:: pack(fmt, v1, v2, ...)

   Return a string containing the values ``v1, v2, ...`` packed according to the
   given format. The arguments must match the values required by the format
   exactly.


.. function:: pack_into(fmt, buffer, offset, v1, v2, ...)

   Pack the values ``v1, v2, ...`` according to the given format and write the
   packed bytes into the writable *buffer* starting at *offset*. Note that the
   offset is a required argument.

   .. versionadded:: 2.5


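As a minimal sketch of ``pack_into``, the snippet below writes two values into
the middle of a preallocated buffer. The ``bytearray`` buffer, the explicit
``'<'`` byte order, and the chosen values are assumptions made for
illustration; any writable buffer object should behave the same way.

```python
import struct

# A writable buffer of 8 zero bytes; bytearray is one writable buffer type.
buf = bytearray(8)

# Write two little-endian unsigned shorts starting at byte offset 2.
# The offset argument is required, even when it is 0.
struct.pack_into('<HH', buf, 2, 0x0102, 0x0304)

# Bytes before and after the written region are left untouched.
result = bytes(buf)
```

Only the four bytes at offsets 2 through 5 change; the rest of the buffer
keeps its previous contents.
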
.. function:: unpack(fmt, string)

   Unpack the string (presumably packed by ``pack(fmt, ...)``) according to the
   given format. The result is a tuple even if it contains exactly one item.
   The string must contain exactly the amount of data required by the format
   (``len(string)`` must equal ``calcsize(fmt)``).


.. function:: unpack_from(fmt, buffer[, offset=0])

   Unpack the *buffer* according to the given format. The result is a tuple even
   if it contains exactly one item. The *buffer* must contain at least the
   amount of data required by the format (``len(buffer[offset:])`` must be at
   least ``calcsize(fmt)``).

   .. versionadded:: 2.5


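The following sketch shows how ``unpack_from`` can read fields from the middle
of a larger buffer without slicing it first. The message layout (a two-byte tag
followed by two big-endian unsigned shorts) is hypothetical and exists only for
this example.

```python
import struct

# Hypothetical message: a 2-byte tag followed by two big-endian unsigned shorts.
message = b'\xff\xfe' + struct.pack('>HH', 7, 9)

# Skip the tag by starting at offset 2; trailing data would be ignored.
fields = struct.unpack_from('>HH', message, 2)

# The result is a tuple even when the format describes a single item.
single = struct.unpack_from('>H', message, 2)
```
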
.. function:: calcsize(fmt)

   Return the size of the struct (and hence of the string) corresponding to the
   given format.

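How ``pack``, ``unpack`` and ``calcsize`` relate can be sketched as a round
trip. The format ``'<hhl'`` and the values are arbitrary choices for
illustration; the ``'<'`` prefix is used so that the sizes do not depend on
the host platform.

```python
import struct

fmt = '<hhl'  # standard sizes: 2 + 2 + 4 bytes, no padding

packed = struct.pack(fmt, 1, 2, 3)
size = struct.calcsize(fmt)

# unpack() requires exactly calcsize(fmt) bytes and returns a tuple.
values = struct.unpack(fmt, packed)

# Passing too little data raises struct.error.
short_error = None
try:
    struct.unpack(fmt, packed[:-1])
except struct.error as exc:
    short_error = exc
```
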
.. _struct-format-strings:

Format Strings
--------------

Format strings are the mechanism used to specify the expected layout when
packing and unpacking data. They are built up from format characters, which
specify the type of data being packed/unpacked. In addition, there are
special characters for controlling the byte order, size, and alignment.

Format Characters
^^^^^^^^^^^^^^^^^

Format characters have the following meaning; the conversion between C and
Python values should be obvious given their types:

+--------+-------------------------+--------------------+------------+
| Format | C Type                  | Python             | Notes      |
+========+=========================+====================+============+
| ``x``  | pad byte                | no value           |            |
+--------+-------------------------+--------------------+------------+
| ``c``  | :ctype:`char`           | string of length 1 |            |
+--------+-------------------------+--------------------+------------+
| ``b``  | :ctype:`signed char`    | integer            | \(3)       |
+--------+-------------------------+--------------------+------------+
| ``B``  | :ctype:`unsigned char`  | integer            | \(3)       |
+--------+-------------------------+--------------------+------------+
| ``?``  | :ctype:`_Bool`          | bool               | \(1)       |
+--------+-------------------------+--------------------+------------+
| ``h``  | :ctype:`short`          | integer            | \(3)       |
+--------+-------------------------+--------------------+------------+
| ``H``  | :ctype:`unsigned short` | integer            | \(3)       |
+--------+-------------------------+--------------------+------------+
| ``i``  | :ctype:`int`            | integer            | \(3)       |
+--------+-------------------------+--------------------+------------+
| ``I``  | :ctype:`unsigned int`   | integer or long    | \(3)       |
+--------+-------------------------+--------------------+------------+
| ``l``  | :ctype:`long`           | integer            | \(3)       |
+--------+-------------------------+--------------------+------------+
| ``L``  | :ctype:`unsigned long`  | long               | \(3)       |
+--------+-------------------------+--------------------+------------+
| ``q``  | :ctype:`long long`      | long               | \(2),\(3)  |
+--------+-------------------------+--------------------+------------+
| ``Q``  | :ctype:`unsigned long   | long               | \(2),\(3)  |
|        | long`                   |                    |            |
+--------+-------------------------+--------------------+------------+
| ``f``  | :ctype:`float`          | float              |            |
+--------+-------------------------+--------------------+------------+
| ``d``  | :ctype:`double`         | float              |            |
+--------+-------------------------+--------------------+------------+
| ``s``  | :ctype:`char[]`         | string             |            |
+--------+-------------------------+--------------------+------------+
| ``p``  | :ctype:`char[]`         | string             |            |
+--------+-------------------------+--------------------+------------+
| ``P``  | :ctype:`void \*`        | long               | \(3)       |
+--------+-------------------------+--------------------+------------+

Notes:

(1)
   The ``'?'`` conversion code corresponds to the :ctype:`_Bool` type defined by
   C99. If this type is not available, it is simulated using a :ctype:`char`. In
   standard mode, it is always represented by one byte.

   .. versionadded:: 2.6

(2)
   The ``'q'`` and ``'Q'`` conversion codes are available in native mode only if
   the platform C compiler supports C :ctype:`long long`, or, on Windows,
   :ctype:`__int64`. They are always available in standard modes.

   .. versionadded:: 2.2

(3)
   When attempting to pack a non-integer using any of the integer conversion
   codes, if the non-integer has a :meth:`__index__` method then that method is
   called to convert the argument to an integer before packing. If no
   :meth:`__index__` method exists, or the call to :meth:`__index__` raises
   :exc:`TypeError`, then the :meth:`__int__` method is tried. However, the use
   of :meth:`__int__` is deprecated, and will raise :exc:`DeprecationWarning`.

   .. versionchanged:: 2.7
      Use of the :meth:`__index__` method for non-integers is new in 2.7.

   .. versionchanged:: 2.7
      Prior to version 2.7, not all integer conversion codes would use the
      :meth:`__int__` method to convert, and :exc:`DeprecationWarning` was
      raised only for float arguments.


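Note (3) can be illustrated with a small sketch. The ``Offset`` class is
hypothetical and exists only to supply an ``__index__`` method; as described
above, this conversion applies from version 2.7 onwards.

```python
import struct

class Offset(object):
    """Hypothetical non-integer type that can be converted via __index__."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        return self._value

# The integer code 'H' calls __index__() on the argument before packing.
packed = struct.pack('<H', Offset(258))   # 258 == 0x0102
unpacked = struct.unpack('<H', packed)
```
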
A format character may be preceded by an integral repeat count. For example,
the format string ``'4h'`` means exactly the same as ``'hhhh'``.

Whitespace characters between formats are ignored; a count and its format must
not contain whitespace though.

For the ``'s'`` format character, the count is interpreted as the size of the
string, not a repeat count like for the other format characters; for example,
``'10s'`` means a single 10-byte string, while ``'10c'`` means 10 characters.
For packing, the string is truncated or padded with null bytes as appropriate to
make it fit. For unpacking, the resulting string always has exactly the
specified number of bytes. As a special case, ``'0s'`` means a single, empty
string (while ``'0c'`` means 0 characters).

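The difference between a repeat count and the size count of ``'s'`` can be
sketched as follows. The values are arbitrary; the ``b`` prefix on the
literals is an assumption that keeps them byte strings on interpreters where
plain string literals are text.

```python
import struct

# A repeat count is shorthand: '4h' and 'hhhh' describe the same layout.
same = struct.pack('<4h', 1, 2, 3, 4) == struct.pack('<hhhh', 1, 2, 3, 4)

# For 's' the count is a byte length, so exactly one value is packed.
padded = struct.pack('<5s', b'abc')          # padded with null bytes
truncated = struct.pack('<5s', b'abcdefgh')  # cut to the first 5 bytes

# Unpacking always yields exactly the specified number of bytes.
unpacked = struct.unpack('<5s', padded)

# '0s' consumes no data and unpacks to one empty string.
empty = struct.unpack('<0s', b'')
```
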
The ``'p'`` format character encodes a "Pascal string", meaning a short
variable-length string stored in a fixed number of bytes. The count is the total
number of bytes stored. The first byte stored is the length of the string, or
255, whichever is smaller. The bytes of the string follow. If the string
passed in to :func:`pack` is too long (longer than the count minus 1), only the
leading count-1 bytes of the string are stored. If the string is shorter than
count-1, it is padded with null bytes so that exactly count bytes in all are
used. Note that for :func:`unpack`, the ``'p'`` format character consumes count
bytes, but that the string returned can never contain more than 255 characters.

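A short sketch of the Pascal-string layout described above, using a count of 6
(one length byte plus up to five data bytes). The strings are arbitrary example
values.

```python
import struct

# Shorter than count-1: length byte 3, the data, then null padding.
short = struct.pack('<6p', b'abc')

# Longer than count-1: only the leading 5 bytes are stored.
clipped = struct.pack('<6p', b'abcdefgh')

# Unpacking consumes all 6 bytes but returns only the stored string.
restored = struct.unpack('<6p', short)
```
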
For the ``'I'``, ``'L'``, ``'q'`` and ``'Q'`` format characters, the return
value is a Python long integer.

For the ``'P'`` format character, the return value is a Python integer or long
integer, depending on the size needed to hold a pointer when it has been cast to
an integer type. A *NULL* pointer will always be returned as the Python integer
``0``. When packing pointer-sized values, Python integer or long integer objects
may be used. For example, the Alpha and Merced processors use 64-bit pointer
values, meaning a Python long integer will be used to hold the pointer; other
platforms use 32-bit pointers and will use a Python integer.

For the ``'?'`` format character, the return value is either :const:`True` or
:const:`False`. When packing, the truth value of the argument object is used.
Either 0 or 1 in the native or standard bool representation will be packed, and
any non-zero value will be True when unpacking.


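The behaviour of ``'?'`` can be sketched with a few arbitrary arguments; the
standard-mode prefix ``'<'`` is used so that the packed bool is one byte.

```python
import struct

# Packing uses the truth value of the argument, whatever its type.
packed_true = struct.pack('<?', [0])   # a non-empty list is true
packed_false = struct.pack('<?', '')   # an empty string is false

# Any non-zero byte unpacks as True.
unpacked = struct.unpack('<?', b'\x07')
```
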
.. _struct-alignment:

Byte Order, Size, and Alignment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, C types are represented in the machine's native format and byte
order, and properly aligned by skipping pad bytes if necessary (according to the
rules used by the C compiler).

Alternatively, the first character of the format string can be used to indicate
the byte order, size and alignment of the packed data, according to the
following table:

+-----------+------------------------+--------------------+
| Character | Byte order             | Size and alignment |
+===========+========================+====================+
| ``@``     | native                 | native             |
+-----------+------------------------+--------------------+
| ``=``     | native                 | standard           |
+-----------+------------------------+--------------------+
| ``<``     | little-endian          | standard           |
+-----------+------------------------+--------------------+
| ``>``     | big-endian             | standard           |
+-----------+------------------------+--------------------+
| ``!``     | network (= big-endian) | standard           |
+-----------+------------------------+--------------------+

If the first character is not one of these, ``'@'`` is assumed.

Native byte order is big-endian or little-endian, depending on the host
system. For example, Intel x86 and AMD64 (x86-64) are little-endian;
Motorola 68000 and PowerPC G5 are big-endian; ARM and Intel Itanium feature
switchable endianness (bi-endian). Use ``sys.byteorder`` to check the
endianness of your system.

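The effect of the byte order characters can be sketched by packing the same
value with each prefix. The value ``1`` and the ``'I'`` code are arbitrary
choices for illustration.

```python
import struct
import sys

little = struct.pack('<I', 1)    # least significant byte first
big = struct.pack('>I', 1)       # most significant byte first
network = struct.pack('!I', 1)   # network order is big-endian

# '=' uses the host's byte order with standard sizes, so it matches
# whichever explicit order sys.byteorder reports.
native = struct.pack('=I', 1)
expected = little if sys.byteorder == 'little' else big
```
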
Native size and alignment are determined using the C compiler's
``sizeof`` expression. This is always combined with native byte order.

Standard size and alignment are as follows: no alignment is required for any
type (so you have to use pad bytes); :ctype:`short` is 2 bytes; :ctype:`int` and
:ctype:`long` are 4 bytes; :ctype:`long long` (:ctype:`__int64` on Windows) is 8
bytes; :ctype:`float` and :ctype:`double` are 32-bit and 64-bit IEEE floating
point numbers, respectively. :ctype:`_Bool` is 1 byte.

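The standard sizes listed above can be checked with ``calcsize``. This is a
small sketch; the variable names are illustrative only.

```python
import struct

# Format codes for short, int, long, long long, float, double and _Bool.
codes = 'hilqfd?'

# With a standard-mode prefix the sizes are fixed on every platform.
standard_sizes = dict((code, struct.calcsize('<' + code)) for code in codes)
```

Replacing ``'<'`` with ``'@'`` reports the native sizes instead, which may
differ between platforms.
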
Note the difference between ``'@'`` and ``'='``: both use native byte order, but
the size and alignment of the latter is standardized.

The form ``'!'`` is available for those poor souls who claim they can't remember
whether network byte order is big-endian or little-endian.

There is no way to indicate non-native byte order (force byte-swapping); use the
appropriate choice of ``'<'`` or ``'>'``.

The ``'P'`` format character is only available for the native byte ordering
(selected as the default or with the ``'@'`` byte order character). The byte
order character ``'='`` chooses to use little- or big-endian ordering based on
the host system. The struct module does not interpret this as native ordering,
so the ``'P'`` format is not available.

Notes:

(1) Padding is only automatically added between successive structure members.
    No padding is added at the beginning or the end of the encoded struct.

(2) No padding is added when using non-native size and alignment, e.g.
    with ``'<'``, ``'>'``, ``'='``, and ``'!'``.

(3) To align the end of a structure to the alignment requirement of a
    particular type, end the format with the code for that type with a repeat
    count of zero. See :ref:`struct-examples`.


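The padding rules in the notes above can be sketched with ``calcsize``. Native
results depend on the platform, so only the standard-mode values are stated in
the comments; the variable names are illustrative.

```python
import struct

# Standard modes never pad: a 1-byte char followed by a 4-byte int.
standard_ci = struct.calcsize('=ci')          # 5

# Native mode may insert pad bytes between members, so the total is at
# least the sum of the member sizes.
native_ci = struct.calcsize('@ci')
native_members = struct.calcsize('@c') + struct.calcsize('@i')

# A trailing zero-count 'l' rounds the native size up to the alignment
# of a long; in standard mode it changes nothing.
native_padded = struct.calcsize('@llh0l')
native_unpadded = struct.calcsize('@llh')
standard_padded = struct.calcsize('<llh0l')   # 4 + 4 + 2
```
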
.. _struct-examples:

Examples
^^^^^^^^

.. note::
   All examples assume a native byte order, size, and alignment with a
   big-endian machine.

A basic example of packing/unpacking three integers::

    >>> from struct import *
    >>> pack('hhl', 1, 2, 3)
    '\x00\x01\x00\x02\x00\x00\x00\x03'
    >>> unpack('hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
    (1, 2, 3)
    >>> calcsize('hhl')
    8

Unpacked fields can be named by assigning them to variables or by wrapping
the result in a named tuple::

    >>> record = 'raymond   \x32\x12\x08\x01\x08'
    >>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)

    >>> from collections import namedtuple
    >>> Student = namedtuple('Student', 'name serialnum school gradelevel')
    >>> Student._make(unpack('<10sHHb', record))
    Student(name='raymond   ', serialnum=4658, school=264, gradelevel=8)

The ordering of format characters may have an impact on size since the padding
needed to satisfy alignment requirements is different::

    >>> pack('ci', '*', 0x12131415)
    '*\x00\x00\x00\x12\x13\x14\x15'
    >>> pack('ic', 0x12131415, '*')
    '\x12\x13\x14\x15*'
    >>> calcsize('ci')
    8
    >>> calcsize('ic')
    5

The following format ``'llh0l'`` specifies two pad bytes at the end, assuming
longs are aligned on 4-byte boundaries::

    >>> pack('llh0l', 1, 2, 3)
    '\x00\x00\x00\x01\x00\x00\x00\x02\x00\x03\x00\x00'

This only works when native size and alignment are in effect; standard size and
alignment does not enforce any alignment.


.. seealso::

   Module :mod:`array`
      Packed binary storage of homogeneous data.

   Module :mod:`xdrlib`
      Packing and unpacking of XDR data.


.. _struct-objects:

Objects
-------

The :mod:`struct` module also defines the following type:


.. class:: Struct(format)

   Return a new Struct object which writes and reads binary data according to
   the format string *format*. Creating a Struct object once and calling its
   methods is more efficient than calling the :mod:`struct` functions with the
   same format since the format string only needs to be compiled once.

   .. versionadded:: 2.5

   Compiled Struct objects support the following methods and attributes:


   .. method:: pack(v1, v2, ...)

      Identical to the :func:`pack` function, using the compiled format.
      (``len(result)`` will equal :attr:`self.size`.)


   .. method:: pack_into(buffer, offset, v1, v2, ...)

      Identical to the :func:`pack_into` function, using the compiled format.


   .. method:: unpack(string)

      Identical to the :func:`unpack` function, using the compiled format.
      (``len(string)`` must equal :attr:`self.size`).


   .. method:: unpack_from(buffer[, offset=0])

      Identical to the :func:`unpack_from` function, using the compiled format.
      (``len(buffer[offset:])`` must be at least :attr:`self.size`).


   .. attribute:: format

      The format string used to construct this Struct object.

   .. attribute:: size

      The calculated size of the struct (and hence of the string) corresponding
      to :attr:`format`.

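A minimal sketch of reusing a compiled ``Struct`` object follows. The record
layout mirrors the student record from the examples section; the variable
names are illustrative.

```python
import struct

# Compile the format once and reuse it for every record.
student = struct.Struct('<10sHHb')

# The 7-byte name is padded with null bytes to fill the 10-byte field.
packed = student.pack(b'raymond', 4658, 264, 8)

# The packed length always equals the object's size attribute.
size = student.size

name, serialnum, school, gradelevel = student.unpack(packed)
```
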