\section{\module{struct} ---
         Interpret strings as packed binary data}
\declaremodule{builtin}{struct}

\modulesynopsis{Interpret strings as packed binary data.}

\indexii{C}{structures}
\indexiii{packing}{binary}{data}

This module performs conversions between Python values and C
structs represented as Python strings.  It uses \dfn{format strings}
(explained below) as compact descriptions of the layout of the C
structs and the intended conversion to/from Python values.  This can
be used in handling binary data stored in files or from network
connections, among other sources.

The module defines the following exception and functions:


\begin{excdesc}{error}
Exception raised on various occasions; argument is a string
describing what is wrong.
\end{excdesc}

\begin{funcdesc}{pack}{fmt, v1, v2, \textrm{\ldots}}
Return a string containing the values
\code{\var{v1}, \var{v2}, \textrm{\ldots}} packed according to the given
format.  The arguments must match the values required by the format
exactly.
\end{funcdesc}

\begin{funcdesc}{pack_into}{fmt, buffer, offset, v1, v2, \moreargs}
Pack the values \code{\var{v1}, \var{v2}, \textrm{\ldots}} according to the given
format and write the packed bytes into the writable \var{buffer} starting at
\var{offset}.
Note that \var{offset} is a required argument.

\versionadded{2.5}
\end{funcdesc}

\begin{funcdesc}{unpack}{fmt, string}
Unpack the string (presumably packed by \code{pack(\var{fmt},
\textrm{\ldots})}) according to the given format.  The result is a
tuple even if it contains exactly one item.  The string must contain
exactly the amount of data required by the format
(\code{len(\var{string})} must equal \code{calcsize(\var{fmt})}).
\end{funcdesc}
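
The round trip between \function{pack()} and \function{unpack()} can be
sketched as follows.  This is an illustration only: it assumes a Python 3
interpreter, where the packed data is a \code{bytes} object rather than a
string, and it uses the \character{<} prefix (described later in this
section) so that the byte layout does not depend on the host.

```python
import struct

# Assumption: Python 3, so packed data is bytes rather than a str.
fmt = "<hh"                          # two little-endian 16-bit integers
packed = struct.pack(fmt, 1, 2)      # exactly calcsize(fmt) == 4 bytes
values = struct.unpack(fmt, packed)  # the result is always a tuple

# A format with a single item still yields a tuple.
single = struct.unpack("<h", packed[:2])

# Data that is one byte too long is rejected with struct.error.
try:
    struct.unpack(fmt, packed + b"\x00")
    length_checked = False
except struct.error:
    length_checked = True
```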

\begin{funcdesc}{unpack_from}{fmt, buffer\optional{,offset \code{= 0}}}
Unpack the \var{buffer} according to the given format.
The result is a tuple even if it contains exactly one item.  The
\var{buffer} must contain at least the amount of data required by the
format (\code{len(buffer[offset:])} must be at least
\code{calcsize(\var{fmt})}).

\versionadded{2.5}
\end{funcdesc}

\begin{funcdesc}{calcsize}{fmt}
Return the size of the struct (and hence of the string)
corresponding to the given format.
\end{funcdesc}
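
As a hedged illustration of \function{pack_into()},
\function{unpack_from()} and \function{calcsize()} working together, the
sketch below writes one record into a preallocated buffer at a non-zero
offset.  It assumes Python 3 and uses a \code{bytearray} as the writable
buffer; the format and the values are made up for the example.

```python
import struct

fmt = "<HI"                      # unsigned short + unsigned int, little-endian
size = struct.calcsize(fmt)      # 2 + 4 bytes, no padding in standard mode

# Assumption: a bytearray serves as the writable buffer (Python 3).
buf = bytearray(2 + size)        # two spare bytes before the record
struct.pack_into(fmt, buf, 2, 0x0102, 0x03040506)

# unpack_from() needs at least `size` bytes from the offset onward.
values = struct.unpack_from(fmt, buf, 2)
```

Because the offset is given explicitly, the two leading bytes of the buffer
are left untouched.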

Format characters have the following meaning; the conversion between
C and Python values should be obvious given their types:

\begin{tableiv}{c|l|l|c}{samp}{Format}{C Type}{Python}{Notes}
  \lineiv{x}{pad byte}{no value}{}
  \lineiv{c}{\ctype{char}}{string of length 1}{}
  \lineiv{b}{\ctype{signed char}}{integer}{}
  \lineiv{B}{\ctype{unsigned char}}{integer}{}
  \lineiv{h}{\ctype{short}}{integer}{}
  \lineiv{H}{\ctype{unsigned short}}{integer}{}
  \lineiv{i}{\ctype{int}}{integer}{}
  \lineiv{I}{\ctype{unsigned int}}{long}{}
  \lineiv{l}{\ctype{long}}{integer}{}
  \lineiv{L}{\ctype{unsigned long}}{long}{}
  \lineiv{q}{\ctype{long long}}{long}{(1)}
  \lineiv{Q}{\ctype{unsigned long long}}{long}{(1)}
  \lineiv{f}{\ctype{float}}{float}{}
  \lineiv{d}{\ctype{double}}{float}{}
  \lineiv{s}{\ctype{char[]}}{string}{}
  \lineiv{p}{\ctype{char[]}}{string}{}
  \lineiv{P}{\ctype{void *}}{integer}{}
\end{tableiv}

\noindent
Notes:

\begin{description}
\item[(1)]
  The \character{q} and \character{Q} conversion codes are available in
  native mode only if the platform C compiler supports C \ctype{long long},
  or, on Windows, \ctype{__int64}.  They are always available in standard
  modes.
  \versionadded{2.2}
\end{description}


A format character may be preceded by an integral repeat count.  For
example, the format string \code{'4h'} means exactly the same as
\code{'hhhh'}.

Whitespace characters between formats are ignored; a count and its
format must not be separated by whitespace, though.
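
The repeat-count and whitespace rules can be checked with a short sketch
(assuming Python 3; the \character{<} prefix is used only to make the sizes
deterministic):

```python
import struct

# A repeat count is shorthand for repeating the format character.
counted = struct.pack("<4h", 1, 2, 3, 4)
spelled = struct.pack("<hhhh", 1, 2, 3, 4)

# Whitespace between formats is ignored.
spaced = struct.pack("<h h h h", 1, 2, 3, 4)

# Whitespace between a count and its format character is an error.
try:
    struct.calcsize("<4 h")
    split_count_rejected = False
except struct.error:
    split_count_rejected = True
```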

For the \character{s} format character, the count is interpreted as the
size of the string, not a repeat count as for the other format
characters; for example, \code{'10s'} means a single 10-byte string, while
\code{'10c'} means 10 characters.  For packing, the string is
truncated or padded with null bytes as appropriate to make it fit.
For unpacking, the resulting string always has exactly the specified
number of bytes.  As a special case, \code{'0s'} means a single, empty
string (while \code{'0c'} means 0 characters).
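
A small sketch of the \character{s} rules, assuming Python 3 (where the
packed and unpacked strings are \code{bytes} objects):

```python
import struct

# Packing pads with null bytes or truncates to the given size.
padded = struct.pack("5s", b"abc")
truncated = struct.pack("5s", b"abcdefgh")

# Unpacking always returns exactly the specified number of bytes.
(text,) = struct.unpack("5s", padded)

# '3c' is three separate one-byte items, unlike '3s'.
chars = struct.unpack("3c", b"abc")

# '0s' yields one empty string; '0c' yields no items at all.
empty_string = struct.unpack("0s", b"")
no_chars = struct.unpack("0c", b"")
```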

The \character{p} format character encodes a ``Pascal string'', meaning
a short variable-length string stored in a fixed number of bytes.
The count is the total number of bytes stored.  The first byte stored is
the length of the string, or 255, whichever is smaller.  The bytes
of the string follow.  If the string passed in to \function{pack()} is too
long (longer than the count minus 1), only the leading count-1 bytes of the
string are stored.  If the string is shorter than count-1, it is padded
with null bytes so that exactly count bytes in all are used.  Note that
for \function{unpack()}, the \character{p} format character consumes count
bytes, but that the string returned can never contain more than 255
characters.
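
The Pascal-string layout described above can be illustrated with a count of
5, which leaves one length byte and four data bytes (a sketch assuming
Python 3):

```python
import struct

# Short string: length byte, the data, then null padding up to 5 bytes.
short = struct.pack("5p", b"ab")

# Long string: only the leading count-1 == 4 bytes are stored.
clipped = struct.pack("5p", b"abcdefgh")

# Unpacking consumes all 5 bytes but returns only the stored string.
(recovered,) = struct.unpack("5p", short)
```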

For the \character{I}, \character{L}, \character{q} and \character{Q}
format characters, the return value is a Python long integer.

For the \character{P} format character, the return value is a Python
integer or long integer, depending on the size needed to hold a
pointer when it has been cast to an integer type.  A \NULL{} pointer will
always be returned as the Python integer \code{0}.  When packing pointer-sized
values, Python integer or long integer objects may be used.  For
example, the Alpha and Merced processors use 64-bit pointer values,
meaning a Python long integer will be used to hold the pointer; other
platforms use 32-bit pointers and will use a Python integer.

By default, C numbers are represented in the machine's native format
and byte order, and properly aligned by skipping pad bytes if
necessary (according to the rules used by the C compiler).

Alternatively, the first character of the format string can be used to
indicate the byte order, size and alignment of the packed data,
according to the following table:

\begin{tableiii}{c|l|l}{samp}{Character}{Byte order}{Size and alignment}
  \lineiii{@}{native}{native}
  \lineiii{=}{native}{standard}
  \lineiii{<}{little-endian}{standard}
  \lineiii{>}{big-endian}{standard}
  \lineiii{!}{network (= big-endian)}{standard}
\end{tableiii}

If the first character is not one of these, \character{@} is assumed.

Native byte order is big-endian or little-endian, depending on the
host system.  For example, Motorola and Sun processors are big-endian;
Intel and DEC processors are little-endian.

Native size and alignment are determined using the C compiler's
\keyword{sizeof} expression.  This is always combined with native byte
order.

Standard size and alignment are as follows: no alignment is required
for any type (so you have to use pad bytes);
\ctype{short} is 2 bytes;
\ctype{int} and \ctype{long} are 4 bytes;
\ctype{long long} (\ctype{__int64} on Windows) is 8 bytes;
\ctype{float} and \ctype{double} are 32-bit and 64-bit
IEEE floating point numbers, respectively.
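
The standard sizes listed above can be confirmed with
\function{calcsize()}.  The sketch below (assuming Python 3) also shows that
standard mode inserts no alignment padding, so any padding has to be spelled
out with \character{x}:

```python
import struct

# Standard sizes are the same on every platform.
standard_sizes = {code: struct.calcsize("=" + code) for code in "hilqfd"}

# No implicit alignment in standard mode: 1 + 4 bytes.
unaligned = struct.calcsize("=bi")

# Explicit pad bytes reproduce a 4-byte-aligned layout.
explicit_pad = struct.calcsize("=b3xi")

# Native mode may add padding on its own; the exact size is platform-dependent.
native = struct.calcsize("@bi")
```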

Note the difference between \character{@} and \character{=}: both use
native byte order, but the size and alignment of the latter are
standardized.

The form \character{!} is available for those poor souls who claim they
can't remember whether network byte order is big-endian or
little-endian.

There is no way to indicate non-native byte order (force
byte-swapping); use the appropriate choice of \character{<} or
\character{>}.
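
The effect of each byte order character on a single \ctype{unsigned int} can
be sketched as follows (assuming Python 3; \code{sys.byteorder} is used here
only to work out what the host's native order is):

```python
import struct
import sys

little = struct.pack("<I", 1)    # least significant byte first
big = struct.pack(">I", 1)       # most significant byte first
network = struct.pack("!I", 1)   # network order is big-endian

# '=' follows the host's byte order but keeps the standard 4-byte size.
native = struct.pack("=I", 1)
expected_native = little if sys.byteorder == "little" else big
```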

The \character{P} format character is only available for the native
byte ordering (selected as the default or with the \character{@} byte
order character).  The byte order character \character{=} chooses to
use little- or big-endian ordering based on the host system.  The
struct module does not interpret this as native ordering, so the
\character{P} format is not available.
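
A brief sketch of the \character{P} restriction, assuming Python 3 (where
there is a single integer type, so the integer/long distinction above does
not show up in the result):

```python
import struct

# Pointer size depends on the platform, commonly 4 or 8 bytes.
pointer_size = struct.calcsize("P")

# A NULL pointer round-trips as the integer 0.
null = struct.unpack("P", struct.pack("P", 0))

# 'P' is rejected as soon as a non-native mode such as '=' is selected.
try:
    struct.calcsize("=P")
    standard_p_rejected = False
except struct.error:
    standard_p_rejected = True
```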

Examples (all using native byte order, size and alignment, on a
big-endian machine):

\begin{verbatim}
>>> from struct import *
>>> pack('hhl', 1, 2, 3)
'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)
>>> calcsize('hhl')
8
\end{verbatim}

Hint: to align the end of a structure to the alignment requirement of
a particular type, end the format with the code for that type with a
repeat count of zero.  For example, the format \code{'llh0l'}
specifies two pad bytes at the end, assuming longs are aligned on
4-byte boundaries.  This only works when native size and alignment are
in effect; standard size and alignment do not enforce any alignment.
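
The trailing-alignment hint can be observed with \function{calcsize()}.
The native sizes in this sketch depend on the platform's \ctype{long}, so
only the standard-mode results are fixed numbers:

```python
import struct

# Native mode: a zero-count 'l' pads the end up to the alignment of a long.
without_pad = struct.calcsize("llh")
with_pad = struct.calcsize("llh0l")   # e.g. 12 where longs are 4 bytes

# Standard mode: the zero-count code adds nothing.
standard_plain = struct.calcsize("=llh")
standard_zero = struct.calcsize("=llh0l")
```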

\begin{seealso}
  \seemodule{array}{Packed binary storage of homogeneous data.}
  \seemodule{xdrlib}{Packing and unpacking of XDR data.}
\end{seealso}

\subsection{Struct Objects \label{struct-objects}}

The \module{struct} module also defines the following type:

\begin{classdesc}{Struct}{format}
Return a new Struct object which writes and reads binary data according to
the format string \var{format}.  Creating a Struct object once and calling
its methods is more efficient than calling the \module{struct} functions
with the same format since the format string only needs to be compiled once.

\versionadded{2.5}
\end{classdesc}

Compiled Struct objects support the following methods and attributes:

\begin{methoddesc}[Struct]{pack}{v1, v2, \moreargs}
Identical to the \function{pack()} function, using the compiled format.
(\code{len(result)} will equal \member{self.size}.)
\end{methoddesc}

\begin{methoddesc}[Struct]{pack_into}{buffer, offset, v1, v2, \moreargs}
Identical to the \function{pack_into()} function, using the compiled format.
\end{methoddesc}

\begin{methoddesc}[Struct]{unpack}{string}
Identical to the \function{unpack()} function, using the compiled format.
(\code{len(string)} must equal \member{self.size}.)
\end{methoddesc}

\begin{methoddesc}[Struct]{unpack_from}{buffer\optional{,offset
                                          \code{= 0}}}
Identical to the \function{unpack_from()} function, using the compiled format.
(\code{len(buffer[offset:])} must be at least \member{self.size}.)
\end{methoddesc}

\begin{memberdesc}[Struct]{format}
The format string used to construct this Struct object.
\end{memberdesc}
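
A minimal sketch of reusing one compiled Struct object for several
operations, assuming Python 3; the name \code{record} and the format are
chosen only for illustration:

```python
import struct

# Compile the format once and reuse it for every record.
record = struct.Struct("<HI")

packed = record.pack(7, 9)            # len(packed) == record.size
values = record.unpack(packed)

# The buffer-based methods take the same arguments minus the format.
buf = bytearray(record.size + 1)
record.pack_into(buf, 1, 7, 9)
from_buf = record.unpack_from(buf, 1)
```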