:mod:`struct` --- Interpret strings as packed binary data
=========================================================

.. module:: struct
   :synopsis: Interpret strings as packed binary data.

.. index::
   pair: C; structures
   triple: packing; binary; data

This module performs conversions between Python values and C structs represented
as Python strings. This can be used in handling binary data stored in files or
from network connections, among other sources. It uses
:ref:`struct-format-strings` as compact descriptions of the layout of the C
structs and the intended conversion to/from Python values.

.. note::

   By default, the result of packing a given C struct includes pad bytes in
   order to maintain proper alignment for the C types involved; similarly,
   alignment is taken into account when unpacking. This behavior is chosen so
   that the bytes of a packed struct correspond exactly to the layout in memory
   of the corresponding C struct. To omit pad bytes, use `standard` size and
   alignment instead of `native` size and alignment: see :ref:`struct-alignment`
   for details.
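
As a rough illustration of the note above, the following sketch compares the
size of one format under native and standard alignment. The format ``'ci'`` is
an arbitrary choice for the illustration, and the native figure depends on the
platform and compiler, so only the standard size is stated with certainty.

```python
import struct

# Native mode ('@', also the default): pad bytes may be inserted so that the
# int is aligned the way the platform's C compiler would align it.
native_size = struct.calcsize('@ci')

# Standard mode ('='): no pad bytes, so the size is always 1 + 4 bytes.
standard_size = struct.calcsize('=ci')
```

On many common platforms ``native_size`` is 8, because three pad bytes follow
the ``char``; ``standard_size`` is 5 everywhere.
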

Functions and Exceptions
------------------------

The module defines the following exception and functions:


.. exception:: error

   Exception raised on various occasions; argument is a string describing what
   is wrong.


.. function:: pack(fmt, v1, v2, ...)

   Return a string containing the values ``v1, v2, ...`` packed according to the
   given format. The arguments must match the values required by the format
   exactly.

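
A minimal sketch of :func:`pack`, assuming the explicit ``'<'`` prefix
(described under :ref:`struct-alignment`) so that the output does not depend
on the host platform. The ``b`` prefix on byte strings is an assumption that
keeps the sketch runnable on Python 2.6+ and Python 3 alike.

```python
import struct

# '<' selects little-endian byte order with standard sizes:
# two 2-byte shorts followed by one 4-byte long.
packed = struct.pack('<hhl', 1, 2, 3)

# The number of values must match the format exactly;
# passing too few (or too many) raises struct.error.
try:
    struct.pack('<hhl', 1, 2)
    mismatch_detected = False
except struct.error:
    mismatch_detected = True
```

Here ``packed`` is the 8-byte string ``'\x01\x00\x02\x00\x03\x00\x00\x00'``.
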

.. function:: pack_into(fmt, buffer, offset, v1, v2, ...)

   Pack the values ``v1, v2, ...`` according to the given format and write the
   packed bytes into the writable *buffer* starting at *offset*. Note that the
   offset is a required argument.

   .. versionadded:: 2.5

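
The following sketch shows one way to use :func:`pack_into`. It assumes a
``bytearray`` as the writable buffer, which is only one of several possible
buffer types; the offset of 2 and the packed values are arbitrary.

```python
import struct

# A pre-allocated, writable 8-byte buffer filled with zero bytes.
buf = bytearray(8)

# Write two big-endian unsigned shorts starting at byte offset 2.
# Bytes outside the written range are left untouched.
struct.pack_into('>HH', buf, 2, 0x0102, 0x0304)
```

After the call, the buffer holds two zero bytes, the four packed bytes
``01 02 03 04``, and two more zero bytes.
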

.. function:: unpack(fmt, string)

   Unpack the string (presumably packed by ``pack(fmt, ...)``) according to the
   given format. The result is a tuple even if it contains exactly one item.
   The string must contain exactly the amount of data required by the format
   (``len(string)`` must equal ``calcsize(fmt)``).

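
A short sketch of :func:`unpack`, again assuming standard little-endian layout
(``'<'``) so that the results are platform-independent. The variable names are
illustrative only.

```python
import struct

data = struct.pack('<hh', 1, 2)

# The result is always a tuple, even for a single-item format.
pair = struct.unpack('<hh', data)
single = struct.unpack('<h', data[:2])

# A string whose length differs from calcsize(fmt) is rejected.
try:
    struct.unpack('<hh', data + b'\x00')
    length_checked = False
except struct.error:
    length_checked = True
```
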

.. function:: unpack_from(fmt, buffer[, offset=0])

   Unpack the *buffer* according to the given format. The result is a tuple even
   if it contains exactly one item. The *buffer* must contain at least the
   amount of data required by the format (``len(buffer[offset:])`` must be at
   least ``calcsize(fmt)``).

   .. versionadded:: 2.5

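
As an illustration of the *offset* argument, the sketch below reads two fields
that follow a 4-byte tag. The message layout (a ``HEAD`` tag followed by a
width and a height) is hypothetical and chosen only for the example.

```python
import struct

# Hypothetical message: a 4-byte tag, then two little-endian unsigned shorts.
message = b'HEAD' + struct.pack('<HH', 640, 480)

# Skip the tag by starting at offset 4.
width, height = struct.unpack_from('<HH', message, 4)

# Unlike unpack(), trailing data after the requested fields is allowed.
(width_only,) = struct.unpack_from('<H', message, 4)
```
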

.. function:: calcsize(fmt)

   Return the size of the struct (and hence of the string) corresponding to the
   given format.
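
A small sketch relating :func:`calcsize` to the length of a packed string. The
format and the values are borrowed from the named-tuple example in
:ref:`struct-examples`; the ``b`` prefix is an assumption that keeps the code
runnable on both Python 2.6+ and Python 3.

```python
import struct

fmt = '<10sHHb'

# Standard sizes: 10 (string) + 2 + 2 (unsigned shorts) + 1 (signed char).
size = struct.calcsize(fmt)

# The packed string has exactly that length; the 7-byte name is padded
# with null bytes to fill the 10-byte field.
packed = struct.pack(fmt, b'raymond', 4658, 264, 8)
```
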

.. _struct-format-strings:

Format Strings
--------------

Format strings are the mechanism used to specify the expected layout when
packing and unpacking data. They are built up from format characters, which
specify the type of data being packed/unpacked. In addition, there are
special characters for controlling the byte order, size, and alignment.

Format Characters
^^^^^^^^^^^^^^^^^

Format characters have the following meaning; the conversion between C and
Python values should be obvious given their types:

+--------+-------------------------+--------------------+------------+
| Format | C Type                  | Python             | Notes      |
+========+=========================+====================+============+
| ``x``  | pad byte                | no value           |            |
+--------+-------------------------+--------------------+------------+
| ``c``  | :ctype:`char`           | string of length 1 |            |
+--------+-------------------------+--------------------+------------+
| ``b``  | :ctype:`signed char`    | integer            | \(3)       |
+--------+-------------------------+--------------------+------------+
| ``B``  | :ctype:`unsigned char`  | integer            | \(3)       |
+--------+-------------------------+--------------------+------------+
| ``?``  | :ctype:`_Bool`          | bool               | \(1)       |
+--------+-------------------------+--------------------+------------+
| ``h``  | :ctype:`short`          | integer            | \(3)       |
+--------+-------------------------+--------------------+------------+
| ``H``  | :ctype:`unsigned short` | integer            | \(3)       |
+--------+-------------------------+--------------------+------------+
| ``i``  | :ctype:`int`            | integer            | \(3)       |
+--------+-------------------------+--------------------+------------+
| ``I``  | :ctype:`unsigned int`   | integer or long    | \(3)       |
+--------+-------------------------+--------------------+------------+
| ``l``  | :ctype:`long`           | integer            | \(3)       |
+--------+-------------------------+--------------------+------------+
| ``L``  | :ctype:`unsigned long`  | long               | \(3)       |
+--------+-------------------------+--------------------+------------+
| ``q``  | :ctype:`long long`      | long               | \(2),\(3)  |
+--------+-------------------------+--------------------+------------+
| ``Q``  | :ctype:`unsigned long   | long               | \(2),\(3)  |
|        | long`                   |                    |            |
+--------+-------------------------+--------------------+------------+
| ``f``  | :ctype:`float`          | float              |            |
+--------+-------------------------+--------------------+------------+
| ``d``  | :ctype:`double`         | float              |            |
+--------+-------------------------+--------------------+------------+
| ``s``  | :ctype:`char[]`         | string             |            |
+--------+-------------------------+--------------------+------------+
| ``p``  | :ctype:`char[]`         | string             |            |
+--------+-------------------------+--------------------+------------+
| ``P``  | :ctype:`void \*`        | long               | \(3)       |
+--------+-------------------------+--------------------+------------+

Notes:

(1)
   The ``'?'`` conversion code corresponds to the :ctype:`_Bool` type defined by
   C99. If this type is not available, it is simulated using a :ctype:`char`. In
   standard mode, it is always represented by one byte.

   .. versionadded:: 2.6

(2)
   The ``'q'`` and ``'Q'`` conversion codes are available in native mode only if
   the platform C compiler supports C :ctype:`long long`, or, on Windows,
   :ctype:`__int64`. They are always available in standard modes.

   .. versionadded:: 2.2

A format character may be preceded by an integral repeat count. For example,
the format string ``'4h'`` means exactly the same as ``'hhhh'``.

Whitespace characters between formats are ignored; a count and its format must
not be separated by whitespace, though.
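
The repeat-count and whitespace rules above can be checked with a sketch like
the following. The values and the ``'<'`` prefix are arbitrary choices made so
that the comparison is platform-independent.

```python
import struct

values = (1, 2, 3, 4)

with_count = struct.pack('<4h', *values)        # repeat count
spelled_out = struct.pack('<hhhh', *values)     # same format, written out
with_spaces = struct.pack('<h h h h', *values)  # whitespace between formats

# Whitespace between a count and its format character is rejected.
try:
    struct.calcsize('<4 h')
    split_count_accepted = True
except struct.error:
    split_count_accepted = False
```

All three ``pack`` calls produce the same 8-byte string.
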

For the ``'s'`` format character, the count is interpreted as the size of the
string, not a repeat count as for the other format characters; for example,
``'10s'`` means a single 10-byte string, while ``'10c'`` means 10 characters.
For packing, the string is truncated or padded with null bytes as appropriate to
make it fit. For unpacking, the resulting string always has exactly the
specified number of bytes. As a special case, ``'0s'`` means a single, empty
string (while ``'0c'`` means 0 characters).
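
A brief sketch of the ``'s'`` behavior described above. The field width of 5
and the sample strings are arbitrary; the ``b`` prefix is an assumption that
keeps the code runnable on both Python 2.6+ and Python 3.

```python
import struct

padded = struct.pack('5s', b'abc')          # shorter: padded with null bytes
truncated = struct.pack('5s', b'abcdefgh')  # longer: truncated to 5 bytes

# Unpacking always yields exactly 5 bytes, padding included.
(unpacked,) = struct.unpack('5s', padded)

# '0s' is a single, empty string; '0c' is zero characters (no values at all).
empty_string = struct.unpack('0s', b'')
no_chars = struct.unpack('0c', b'')
```

Note that the null padding is not stripped on unpacking; callers that want the
original string back have to remove it themselves.
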

The ``'p'`` format character encodes a "Pascal string", meaning a short
variable-length string stored in a fixed number of bytes. The count is the total
number of bytes stored. The first byte stored is the length of the string, or
255, whichever is smaller. The bytes of the string follow. If the string
passed in to :func:`pack` is too long (longer than the count minus 1), only the
leading count-1 bytes of the string are stored. If the string is shorter than
count-1, it is padded with null bytes so that exactly count bytes in all are
used. Note that for :func:`unpack`, the ``'p'`` format character consumes count
bytes, but that the string returned can never contain more than 255 characters.
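
The Pascal-string layout can be seen in a sketch such as this one. The count of
6 and the sample strings are arbitrary; the ``b`` prefix is an assumption that
keeps the code runnable on both Python 2.6+ and Python 3.

```python
import struct

# count = 6: one length byte plus up to five bytes of string data.
short = struct.pack('6p', b'abc')        # length byte 3, then 'abc', then padding
long_ = struct.pack('6p', b'abcdefgh')   # only the leading 5 bytes are stored

# Unpacking uses the length byte, so the padding is not returned.
recovered = struct.unpack('6p', short)
```

Unlike ``'s'``, the ``'p'`` format returns the string without its null padding,
because the stored length byte says where the data ends.
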

For the ``'I'``, ``'L'``, ``'q'`` and ``'Q'`` format characters, the return
value is a Python long integer.

For the ``'P'`` format character, the return value is a Python integer or long
integer, depending on the size needed to hold a pointer when it has been cast to
an integer type. A *NULL* pointer will always be returned as the Python integer
``0``. When packing pointer-sized values, Python integer or long integer objects
may be used. For example, the Alpha and Merced processors use 64-bit pointer
values, meaning a Python long integer will be used to hold the pointer; other
platforms use 32-bit pointers and will use a Python integer.

For the ``'?'`` format character, the return value is either :const:`True` or
:const:`False`. When packing, the truth value of the argument object is used.
Either 0 or 1 in the native or standard bool representation will be packed, and
any non-zero value will be True when unpacking.
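
A small sketch of the ``'?'`` rules, assuming standard mode (``'<'``) so that
each bool occupies exactly one byte. The packed objects are arbitrary examples
of false and true values.

```python
import struct

# Any object may be packed; only its truth value is stored.
packed = struct.pack('<???', [], 'text', 0)

# Any non-zero byte unpacks as True.
flags = struct.unpack('<???', b'\x00\x01\x7f')
```
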


.. _struct-alignment:

Byte Order, Size, and Alignment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, C types are represented in the machine's native format and byte
order, and properly aligned by skipping pad bytes if necessary (according to the
rules used by the C compiler).

Alternatively, the first character of the format string can be used to indicate
the byte order, size and alignment of the packed data, according to the
following table:

+-----------+------------------------+--------------------+
| Character | Byte order             | Size and alignment |
+===========+========================+====================+
| ``@``     | native                 | native             |
+-----------+------------------------+--------------------+
| ``=``     | native                 | standard           |
+-----------+------------------------+--------------------+
| ``<``     | little-endian          | standard           |
+-----------+------------------------+--------------------+
| ``>``     | big-endian             | standard           |
+-----------+------------------------+--------------------+
| ``!``     | network (= big-endian) | standard           |
+-----------+------------------------+--------------------+

If the first character is not one of these, ``'@'`` is assumed.

Native byte order is big-endian or little-endian, depending on the host
system. For example, Intel x86 and AMD64 (x86-64) are little-endian;
Motorola 68000 and PowerPC G5 are big-endian; ARM and Intel Itanium feature
switchable endianness (bi-endian). Use ``sys.byteorder`` to check the
endianness of your system.
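
The effect of the byte order characters can be sketched as follows. The value
``0x01020304`` is arbitrary, and ``expected_native`` is a helper name invented
for this illustration.

```python
import struct
import sys

value = 0x01020304

little = struct.pack('<I', value)   # least significant byte first
big = struct.pack('>I', value)      # most significant byte first
network = struct.pack('!I', value)  # network order, same as big-endian
native = struct.pack('=I', value)   # host byte order, standard size

# sys.byteorder tells which of the two the host uses.
expected_native = little if sys.byteorder == 'little' else big
```
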

Native size and alignment are determined using the C compiler's
``sizeof`` expression. This is always combined with native byte order.

Standard size and alignment are as follows: no alignment is required for any
type (so you have to use pad bytes); :ctype:`short` is 2 bytes; :ctype:`int` and
:ctype:`long` are 4 bytes; :ctype:`long long` (:ctype:`__int64` on Windows) is 8
bytes; :ctype:`float` and :ctype:`double` are 32-bit and 64-bit IEEE floating
point numbers, respectively. :ctype:`_Bool` is 1 byte.
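
The standard sizes listed above can be confirmed with :func:`calcsize`, as in
this sketch. The dictionary name and the ``'hl'`` layout are illustrative
choices, not part of the module.

```python
import struct

# Standard sizes are fixed, whatever the platform.
standard_sizes = dict(
    (code, struct.calcsize('<' + code))
    for code in 'hilqfd?'
)

# No alignment is applied in standard mode: a short followed by a long
# occupies 2 + 4 bytes. Explicit 'x' pad bytes reproduce an aligned layout.
unpadded = struct.calcsize('<hl')
padded = struct.calcsize('<h2xl')
```
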

Note the difference between ``'@'`` and ``'='``: both use native byte order, but
the size and alignment of the latter are standardized.

The form ``'!'`` is available for those poor souls who claim they can't remember
whether network byte order is big-endian or little-endian.

There is no way to indicate non-native byte order (force byte-swapping); use the
appropriate choice of ``'<'`` or ``'>'``.

The ``'P'`` format character is only available for the native byte ordering
(selected as the default or with the ``'@'`` byte order character). The byte
order character ``'='`` chooses to use little- or big-endian ordering based on
the host system. The struct module does not interpret this as native ordering,
so the ``'P'`` format is not available.
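
The restriction on ``'P'`` can be observed with a sketch like this one. The
pointer size itself depends on the platform, so it is not stated here; the
variable names are illustrative.

```python
import struct

# Native mode (the default): the size of a C pointer on this platform.
pointer_size = struct.calcsize('P')

# With '=' the format is rejected, even though the byte order is the host's.
try:
    struct.calcsize('=P')
    available_in_standard_mode = True
except struct.error:
    available_in_standard_mode = False
```
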

Notes:

(1) Padding is only automatically added between successive structure members.
    No padding is added at the beginning or the end of the encoded struct.

(2) No padding is added when using non-native size and alignment, e.g.
    with ``'<'``, ``'>'``, ``'='``, and ``'!'``.

(3) To align the end of a structure to the alignment requirement of a
    particular type, end the format with the code for that type with a repeat
    count of zero. See :ref:`struct-examples`.


.. _struct-examples:

Examples
^^^^^^^^

.. note::
   All examples assume a native byte order, size, and alignment with a
   big-endian machine.

A basic example of packing/unpacking three integers::

   >>> from struct import *
   >>> pack('hhl', 1, 2, 3)
   '\x00\x01\x00\x02\x00\x00\x00\x03'
   >>> unpack('hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
   (1, 2, 3)
   >>> calcsize('hhl')
   8

Unpacked fields can be named by assigning them to variables or by wrapping
the result in a named tuple::

   >>> record = 'raymond   \x32\x12\x08\x01\x08'
   >>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)

   >>> from collections import namedtuple
   >>> Student = namedtuple('Student', 'name serialnum school gradelevel')
   >>> Student._make(unpack('<10sHHb', record))
   Student(name='raymond   ', serialnum=4658, school=264, gradelevel=8)

The ordering of format characters may have an impact on size since the padding
needed to satisfy alignment requirements is different::

   >>> pack('ci', '*', 0x12131415)
   '*\x00\x00\x00\x12\x13\x14\x15'
   >>> pack('ic', 0x12131415, '*')
   '\x12\x13\x14\x15*'
   >>> calcsize('ci')
   8
   >>> calcsize('ic')
   5

The following format ``'llh0l'`` specifies two pad bytes at the end, assuming
longs are aligned on 4-byte boundaries::

   >>> pack('llh0l', 1, 2, 3)
   '\x00\x00\x00\x01\x00\x00\x00\x02\x00\x03\x00\x00'

This only works when native size and alignment are in effect; standard size and
alignment does not enforce any alignment.


.. seealso::

   Module :mod:`array`
      Packed binary storage of homogeneous data.

   Module :mod:`xdrlib`
      Packing and unpacking of XDR data.


.. _struct-objects:

Objects
-------

The :mod:`struct` module also defines the following type:


.. class:: Struct(format)

   Return a new Struct object which writes and reads binary data according to
   the format string *format*. Creating a Struct object once and calling its
   methods is more efficient than calling the :mod:`struct` functions with the
   same format since the format string only needs to be compiled once.

   .. versionadded:: 2.5

   Compiled Struct objects support the following methods and attributes:


   .. method:: pack(v1, v2, ...)

      Identical to the :func:`pack` function, using the compiled format.
      (``len(result)`` will equal :attr:`self.size`.)


   .. method:: pack_into(buffer, offset, v1, v2, ...)

      Identical to the :func:`pack_into` function, using the compiled format.


   .. method:: unpack(string)

      Identical to the :func:`unpack` function, using the compiled format.
      (``len(string)`` must equal :attr:`self.size`).


   .. method:: unpack_from(buffer[, offset=0])

      Identical to the :func:`unpack_from` function, using the compiled format.
      (``len(buffer[offset:])`` must be at least :attr:`self.size`).


   .. attribute:: format

      The format string used to construct this Struct object.

   .. attribute:: size

      The calculated size of the struct (and hence of the string) corresponding
      to :attr:`format`.

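
A minimal sketch of working with a compiled :class:`Struct` object. The record
layout reuses the ``'<10sHHb'`` format from :ref:`struct-examples`; the name
``record_struct``, the 4-byte header gap, and the use of a ``bytearray`` as the
writable buffer are assumptions made for the illustration.

```python
import struct

# Compile the format once, then reuse it for every record.
record_struct = struct.Struct('<10sHHb')

packed = record_struct.pack(b'raymond', 4658, 264, 8)
fields = record_struct.unpack(packed)

# pack_into and unpack_from operate on a larger buffer at an offset;
# here the first 4 bytes are left free for a hypothetical header.
buf = bytearray(4 + record_struct.size)
record_struct.pack_into(buf, 4, b'raymond', 4658, 264, 8)
same_fields = record_struct.unpack_from(buf, 4)
```

Because the format is parsed only when the object is created, this pattern
suits loops that pack or unpack many records with the same layout.
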