:mod:`struct` --- Interpret strings as packed binary data
=========================================================

.. module:: struct
   :synopsis: Interpret strings as packed binary data.

.. index::
   pair: C; structures
   triple: packing; binary; data

This module performs conversions between Python values and C structs
represented as Python strings. It uses :dfn:`format strings` (explained below)
as compact descriptions of the layout of the C structs and the intended
conversion to and from Python values. This can be used to handle binary data
stored in files or received from network connections, among other sources.

The module defines the following exception and functions:


.. exception:: error

   Exception raised on various occasions; its argument is a string describing
   what is wrong.
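
   A minimal sketch of catching :exc:`error`. It assumes a Python 3 interpreter,
   where packed data is a :class:`bytes` object. The helper name ``try_pack`` is
   hypothetical. The ``'>'`` prefix selects big-endian standard sizes, as
   described in the byte order table later in this section::

      import struct


      def try_pack(fmt, *values):
          """Hypothetical helper: packed bytes, or None if struct rejects the input."""
          try:
              return struct.pack(fmt, *values)
          except struct.error as exc:
              # The exception's argument describes what went wrong.
              print("struct.error:", exc)
              return None


      ok = try_pack(">h", 1)          # one value for one 'h' field
      too_few = try_pack(">hh", 1)    # two fields but only one value
      too_big = try_pack(">b", 300)   # 300 does not fit in a signed char
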


.. function:: pack(fmt, v1, v2, ...)

   Return a string containing the values ``v1, v2, ...`` packed according to the
   given format. The arguments must match the values required by the format
   exactly.


.. function:: pack_into(fmt, buffer, offset, v1, v2, ...)

   Pack the values ``v1, v2, ...`` according to the given format and write the
   packed bytes into the writable *buffer*, starting at *offset*. Note that
   *offset* is a required argument.
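
   A sketch of :func:`pack_into`, assuming Python 3. Here a :class:`bytearray`
   serves as the writable buffer::

      import struct

      # A writable buffer of 8 zero bytes.
      buf = bytearray(8)

      # Write two big-endian unsigned shorts starting at byte offset 2;
      # bytes outside that range are left untouched.
      struct.pack_into(">HH", buf, 2, 0x0102, 0x0304)

      print(buf)  # bytearray(b'\x00\x00\x01\x02\x03\x04\x00\x00')
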


.. function:: unpack(fmt, string)

   Unpack the string (presumably packed by ``pack(fmt, ...)``) according to the
   given format. The result is a tuple even if it contains exactly one item. The
   string must contain exactly the amount of data required by the format
   (``len(string)`` must equal ``calcsize(fmt)``).
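
   A round-trip sketch, assuming Python 3, so the packed value is
   :class:`bytes`. It uses the ``'>'`` prefix so that the byte layout does not
   depend on the host::

      import struct

      # Two big-endian shorts followed by a 4-byte int.
      data = struct.pack(">hhi", 1, 2, 3)

      values = struct.unpack(">hhi", data)        # (1, 2, 3)
      single = struct.unpack(">h", b"\x00\x07")   # (7,): still a tuple

      # Seven bytes is one short of calcsize(">hhi") == 8, so this fails.
      try:
          struct.unpack(">hhi", data[:7])
          short_input_rejected = False
      except struct.error:
          short_input_rejected = True
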


.. function:: unpack_from(fmt, buffer[, offset=0])

   Unpack the *buffer* according to the given format, starting at *offset*. The
   result is a tuple even if it contains exactly one item. The *buffer* must
   contain at least the amount of data required by the format
   (``len(buffer[offset:])`` must be at least ``calcsize(fmt)``).
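
   A sketch of reading a field out of a larger buffer with :func:`unpack_from`.
   The packet layout is a hypothetical example: a 2-byte header followed by a
   big-endian 32-bit value. Python 3 :class:`bytes` are assumed::

      import struct

      # Hypothetical packet: 2-byte header, 4-byte value, then unrelated data.
      packet = b"\xab\xcd" + struct.pack(">I", 1000) + b"trailing bytes"

      # Read the 32-bit value at offset 2; the extra trailing data is allowed.
      (value,) = struct.unpack_from(">I", packet, 2)

      # With the default offset of 0, the header is read from the start.
      (header,) = struct.unpack_from(">H", packet)
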


.. function:: calcsize(fmt)

   Return the size of the struct (and hence of the string) corresponding to the
   given format.
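
   A short sketch comparing :func:`calcsize` with the length of packed data,
   assuming Python 3. The ``'<'`` prefix fixes standard sizes. The default
   native layout may be larger because of platform sizes and padding::

      import struct

      # Standard sizes, no alignment: 2 + 2 + 4 bytes.
      std = struct.calcsize("<hhl")
      packed = struct.pack("<hhl", 1, 2, 3)
      assert len(packed) == std

      # Native sizes and alignment vary by platform (for example, the size of long).
      native = struct.calcsize("hhl")
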

Format characters have the following meaning; the conversion between C and
Python values should be obvious given their types:

+--------+-------------------------+--------------------+-------+
| Format | C Type                  | Python             | Notes |
+========+=========================+====================+=======+
| ``x``  | pad byte                | no value           |       |
+--------+-------------------------+--------------------+-------+
| ``c``  | :ctype:`char`           | string of length 1 |       |
+--------+-------------------------+--------------------+-------+
| ``b``  | :ctype:`signed char`    | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``B``  | :ctype:`unsigned char`  | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``t``  | :ctype:`_Bool`          | bool               | \(1)  |
+--------+-------------------------+--------------------+-------+
| ``h``  | :ctype:`short`          | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``H``  | :ctype:`unsigned short` | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``i``  | :ctype:`int`            | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``I``  | :ctype:`unsigned int`   | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``l``  | :ctype:`long`           | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``L``  | :ctype:`unsigned long`  | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``q``  | :ctype:`long long`      | integer            | \(2)  |
+--------+-------------------------+--------------------+-------+
| ``Q``  | :ctype:`unsigned long   | integer            | \(2)  |
|        | long`                   |                    |       |
+--------+-------------------------+--------------------+-------+
| ``f``  | :ctype:`float`          | float              |       |
+--------+-------------------------+--------------------+-------+
| ``d``  | :ctype:`double`         | float              |       |
+--------+-------------------------+--------------------+-------+
| ``s``  | :ctype:`char[]`         | string             |       |
+--------+-------------------------+--------------------+-------+
| ``p``  | :ctype:`char[]`         | string             |       |
+--------+-------------------------+--------------------+-------+
| ``P``  | :ctype:`void \*`        | integer            |       |
+--------+-------------------------+--------------------+-------+

Notes:

(1)
   The ``'t'`` conversion code corresponds to the :ctype:`_Bool` type defined by
   C99. If this type is not available, it is simulated using a :ctype:`char`. In
   standard mode, it is always represented by one byte.

(2)
   The ``'q'`` and ``'Q'`` conversion codes are available in native mode only if
   the platform C compiler supports C :ctype:`long long`, or, on Windows,
   :ctype:`__int64`. They are always available in standard modes.

A format character may be preceded by an integral repeat count. For example,
the format string ``'4h'`` means exactly the same as ``'hhhh'``.

Whitespace characters between formats are ignored; a count and its format
character must not be separated by whitespace, though.
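
A quick check of both rules, as a sketch assuming Python 3. The ``'<'`` prefix
is used only so that the results do not depend on native alignment::

   import struct

   # A repeat count is shorthand for writing the format character repeatedly.
   same_size = struct.calcsize("<4h") == struct.calcsize("<hhhh")
   same_bytes = struct.pack("<4h", 1, 2, 3, 4) == struct.pack("<hhhh", 1, 2, 3, 4)

   # Whitespace between format items is ignored...
   spaced_ok = struct.pack("<2h i", 1, 2, 3) == struct.pack("<2hi", 1, 2, 3)

   # ...but a count separated from its format character is rejected.
   try:
       struct.calcsize("<4 h")
       split_count_rejected = False
   except struct.error:
       split_count_rejected = True
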

For the ``'s'`` format character, the count is interpreted as the size of the
string, not a repeat count as for the other format characters; for example,
``'10s'`` means a single 10-byte string, while ``'10c'`` means 10 characters.
For packing, the string is truncated or padded with null bytes as appropriate to
make it fit. For unpacking, the resulting string always has exactly the
specified number of bytes. As a special case, ``'0s'`` means a single, empty
string (while ``'0c'`` means 0 characters).
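
A sketch of the ``'s'`` rules, assuming Python 3, where the values involved are
:class:`bytes` objects::

   import struct

   # Short values are padded with null bytes; long ones are truncated.
   padded = struct.pack("5s", b"ab")          # b'ab\x00\x00\x00'
   truncated = struct.pack("5s", b"abcdefg")  # b'abcde'

   # '10s' is a single 10-byte field; '10c' is ten 1-byte fields.
   one_field = struct.unpack("10s", b"0123456789")
   ten_fields = struct.unpack("10c", b"0123456789")

   # '0s' packs to zero bytes.
   empty = struct.pack("0s", b"")
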

The ``'p'`` format character encodes a "Pascal string", meaning a short
variable-length string stored in a fixed number of bytes. The count is the total
number of bytes stored. The first byte stored is the length of the string, or
255, whichever is smaller. The bytes of the string follow. If the string
passed in to :func:`pack` is too long (longer than the count minus 1), only the
leading ``count-1`` bytes of the string are stored. If the string is shorter than
``count-1``, it is padded with null bytes so that exactly *count* bytes in all
are used. Note that for :func:`unpack`, the ``'p'`` format character consumes
*count* bytes, but the string returned can never contain more than 255 characters.
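
A sketch of Pascal strings with a count of 6 (one length byte plus up to five
data bytes), assuming Python 3 :class:`bytes`::

   import struct

   short = struct.pack("6p", b"hi")                # b'\x02hi\x00\x00\x00'
   truncated = struct.pack("6p", b"hello world")   # b'\x05hello'

   # Unpacking consumes all 6 bytes but returns only the stored length.
   (text,) = struct.unpack("6p", short)            # b'hi'
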

For the ``'t'`` format character, the return value is either :const:`True` or
:const:`False`. When packing, the truth value of the argument object is used.
Either 0 or 1 in the native or standard bool representation will be packed, and
any non-zero value will be ``True`` when unpacking.
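
Released Python versions use ``'?'`` rather than ``'t'`` as the format
character for :ctype:`_Bool`. The following sketch assumes such a version
(Python 3). It uses the ``'<'`` prefix so that each value occupies one byte::

   import struct

   # Any truthy object packs as 1 and any falsy object as 0.
   packed = struct.pack("<???", [1], 0, "x")   # b'\x01\x00\x01'
   flags = struct.unpack("<???", packed)       # (True, False, True)

   # When unpacking, any non-zero byte reads back as True.
   (nonzero,) = struct.unpack("<?", b"\x07")
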

By default, C numbers are represented in the machine's native format and byte
order, and properly aligned by skipping pad bytes if necessary (according to the
rules used by the C compiler).

Alternatively, the first character of the format string can be used to indicate
the byte order, size and alignment of the packed data, according to the
following table:

+-----------+------------------------+--------------------+
| Character | Byte order             | Size and alignment |
+===========+========================+====================+
| ``@``     | native                 | native             |
+-----------+------------------------+--------------------+
| ``=``     | native                 | standard           |
+-----------+------------------------+--------------------+
| ``<``     | little-endian          | standard           |
+-----------+------------------------+--------------------+
| ``>``     | big-endian             | standard           |
+-----------+------------------------+--------------------+
| ``!``     | network (= big-endian) | standard           |
+-----------+------------------------+--------------------+

If the first character is not one of these, ``'@'`` is assumed.
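
A sketch showing how the prefix changes the bytes produced for the same value,
assuming Python 3::

   import struct

   value = 0x01020304
   little = struct.pack("<I", value)    # b'\x04\x03\x02\x01'
   big = struct.pack(">I", value)       # b'\x01\x02\x03\x04'
   network = struct.pack("!I", value)   # identical to big-endian

   # Every standard-size prefix gives 'I' a size of 4 bytes.
   sizes = {prefix: struct.calcsize(prefix + "I") for prefix in "=<>!"}
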

Native byte order is big-endian or little-endian, depending on the host system.
For example, Motorola and Sun processors are big-endian; Intel and DEC
processors are little-endian.
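
A sketch for checking the host's native order from Python, assuming Python 3.
:data:`sys.byteorder` reports ``'little'`` or ``'big'``::

   import struct
   import sys

   # '=' uses native byte order with standard sizes, so 'H' is always 2 bytes.
   native = struct.pack("=H", 1)
   expected = b"\x01\x00" if sys.byteorder == "little" else b"\x00\x01"
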

Native size and alignment are determined using the C compiler's
``sizeof`` expression. This is always combined with native byte order.

Standard size and alignment are as follows: no alignment is required for any
type (so you must insert pad bytes explicitly where needed); :ctype:`short` is 2
bytes; :ctype:`int` and :ctype:`long` are 4 bytes; :ctype:`long long`
(:ctype:`__int64` on Windows) is 8 bytes; :ctype:`float` and :ctype:`double` are
32-bit and 64-bit IEEE floating point numbers, respectively. :ctype:`_Bool` is 1
byte.
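
A sketch confirming the standard sizes listed above, assuming Python 3. The
``'<'`` prefix selects standard size and alignment::

   import struct

   sizes = {code: struct.calcsize("<" + code) for code in "hilqfd"}
   # {'h': 2, 'i': 4, 'l': 4, 'q': 8, 'f': 4, 'd': 8}

   # No alignment: a 1-byte char followed by a 4-byte int takes 5 bytes,
   # so any padding has to be spelled out with 'x'.
   unaligned = struct.calcsize("<ci")
   padded = struct.calcsize("<c3xi")
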

Note the difference between ``'@'`` and ``'='``: both use native byte order, but
the size and alignment of the latter are standardized.
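
A sketch of the difference, assuming Python 3. The native result depends on
the platform. On common platforms with a 4-byte, 4-byte-aligned :ctype:`int`,
it is 8, because three pad bytes follow the ``char``::

   import struct

   standard = struct.calcsize("=ci")   # always 5: standard sizes, no padding
   native = struct.calcsize("@ci")     # platform-dependent, includes alignment
   default = struct.calcsize("ci")     # no prefix is the same as '@'
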

The form ``'!'`` is available for those poor souls who claim they can't remember
whether network byte order is big-endian or little-endian.

There is no way to indicate non-native byte order (force byte-swapping); use the
appropriate choice of ``'<'`` or ``'>'``.

The ``'P'`` format character is only available for the native byte ordering
(selected as the default or with the ``'@'`` byte order character). The byte
order character ``'='`` chooses little- or big-endian ordering based on the
host system, but the struct module does not treat this as native ordering, so
the ``'P'`` format is not available with it.
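
A sketch of this restriction, assuming Python 3::

   import struct

   # 'P' is accepted with native ordering ('@', or no prefix at all).
   pointer_size = struct.calcsize("P")

   # Every other prefix, including '=', rejects it.
   rejected = []
   for prefix in "=<>!":
       try:
           struct.calcsize(prefix + "P")
       except struct.error:
           rejected.append(prefix)
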

Examples (all using native byte order, size and alignment, on a big-endian
machine with a 4-byte :ctype:`long`)::

   >>> from struct import *
   >>> pack('hhl', 1, 2, 3)
   '\x00\x01\x00\x02\x00\x00\x00\x03'
   >>> unpack('hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
   (1, 2, 3)
   >>> calcsize('hhl')
   8

Hint: to align the end of a structure to the alignment requirement of a
particular type, end the format with the code for that type with a repeat count
of zero. For example, the format ``'llh0l'`` specifies two pad bytes at the
end, assuming longs are aligned on 4-byte boundaries. This only works when
native size and alignment are in effect; standard size and alignment do not
enforce any alignment.
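
A sketch of the hint, assuming Python 3. Because native sizes vary, it checks
the alignment property instead of exact byte counts. On many 64-bit Unix-like
systems a :ctype:`long` is 8 bytes, so ``'llh0l'`` gives 24 rather than 12::

   import struct

   long_size = struct.calcsize("l")
   plain = struct.calcsize("llh")     # native: no padding after the final short
   padded = struct.calcsize("llh0l")  # native: padded to a long boundary

   # Standard mode enforces no alignment, so '0l' adds nothing.
   std_plain = struct.calcsize("<llh")
   std_padded = struct.calcsize("<llh0l")
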


.. seealso::

   Module :mod:`array`
      Packed binary storage of homogeneous data.

   Module :mod:`xdrlib`
      Packing and unpacking of XDR data.


.. _struct-objects:

Struct Objects
--------------

The :mod:`struct` module also defines the following type:


.. class:: Struct(format)

   Return a new Struct object which writes and reads binary data according to the
   format string *format*. Creating a Struct object once and calling its methods
   is more efficient than calling the :mod:`struct` functions with the same format
   since the format string only needs to be compiled once.
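
   A sketch of reusing one compiled format, assuming Python 3::

      import struct

      # Compile the format once; '<' selects little-endian standard sizes.
      record = struct.Struct("<hhl")

      data = record.pack(1, 2, 3)    # b'\x01\x00\x02\x00\x03\x00\x00\x00'
      values = record.unpack(data)   # (1, 2, 3)
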


Compiled Struct objects support the following methods and attributes:

.. method:: Struct.pack(v1, v2, ...)

   Identical to the :func:`pack` function, using the compiled format.
   (``len(result)`` will equal :attr:`self.size`.)


.. method:: Struct.pack_into(buffer, offset, v1, v2, ...)

   Identical to the :func:`pack_into` function, using the compiled format.


.. method:: Struct.unpack(string)

   Identical to the :func:`unpack` function, using the compiled format.
   (``len(string)`` must equal :attr:`self.size`.)


.. method:: Struct.unpack_from(buffer[, offset=0])

   Identical to the :func:`unpack_from` function, using the compiled format.
   (``len(buffer[offset:])`` must be at least :attr:`self.size`.)
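
   A sketch of reading and writing fixed-size records in one shared buffer with
   the compiled methods, assuming Python 3. The record layout (a 16-bit id and
   a 32-bit value) is a hypothetical example::

      import struct

      # Hypothetical record: big-endian unsigned short id, unsigned int value.
      record = struct.Struct(">HI")

      # Preallocate room for three records and write them back to back.
      buf = bytearray(record.size * 3)
      for i in range(3):
          record.pack_into(buf, i * record.size, i, i * 100)

      # Read them back by advancing the offset one record at a time.
      records = [record.unpack_from(buf, i * record.size) for i in range(3)]
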


.. attribute:: Struct.format

   The format string used to construct this Struct object.

.. attribute:: Struct.size

   The calculated size of the struct (and hence of the string) corresponding
   to :attr:`format`.
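
   A sketch of the two attributes, assuming Python 3. The type of
   :attr:`format` has varied between releases: Python 3.7 and later return a
   :class:`str`, and earlier Python 3 releases returned :class:`bytes`. The
   check below therefore accepts either::

      import struct

      s = struct.Struct("<iq")
      size_matches = s.size == struct.calcsize("<iq")   # 4 + 8 = 12 bytes
      format_matches = s.format in ("<iq", b"<iq")
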
278