\section{\module{struct} ---
         Interpret strings as packed binary data.}
\declaremodule{builtin}{struct}

\modulesynopsis{Interpret strings as packed binary data.}

\indexii{C@\C{}}{structures}

This module performs conversions between Python values and \C{}
structs represented as Python strings. It uses \dfn{format strings}
(explained below) as compact descriptions of the layout of the \C{}
structs and the intended conversion to/from Python values.

The module defines the following exception and functions:


\begin{excdesc}{error}
Exception raised on various occasions; argument is a string
describing what is wrong.
\end{excdesc}

\begin{funcdesc}{pack}{fmt, v1, v2, \textrm{\ldots}}
Return a string containing the values
\code{\var{v1}, \var{v2}, \textrm{\ldots}} packed according to the given
format. The arguments must match the values required by the format
exactly.
\end{funcdesc}

\begin{funcdesc}{unpack}{fmt, string}
Unpack the string (presumably packed by \code{pack(\var{fmt},
\textrm{\ldots})}) according to the given format. The result is a
tuple even if it contains exactly one item. The string must contain
exactly the amount of data required by the format (i.e.\
\code{len(\var{string})} must equal \code{calcsize(\var{fmt})}).
\end{funcdesc}

\begin{funcdesc}{calcsize}{fmt}
Return the size of the struct (and hence of the string)
corresponding to the given format.
\end{funcdesc}
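
As a minimal sketch of how the three functions fit together, the
following assumes a current Python~3 interpreter, where the packed data
is a \code{bytes} object rather than a string. The names \code{fmt},
\code{packed} and \code{values} are illustrative only.

```python
import struct

fmt = 'hhl'                      # two shorts and a long, native layout
packed = struct.pack(fmt, 1, 2, 3)

# The packed data is exactly as long as calcsize() reports.
size_matches = len(packed) == struct.calcsize(fmt)

# unpack() always returns a tuple.
values = struct.unpack(fmt, packed)

# Data of the wrong length is rejected with struct.error.
try:
    struct.unpack(fmt, packed + b'\x00')
    length_checked = False
except struct.error:
    length_checked = True

# So is a call whose arguments do not match the format.
try:
    struct.pack(fmt, 1, 2)
    args_checked = False
except struct.error:
    args_checked = True
```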

Format characters have the following meaning; the conversion between
\C{} and Python values should be obvious given their types:

\begin{tableiii}{c|l|l}{samp}{Format}{C Type}{Python}
\lineiii{x}{pad byte}{no value}
\lineiii{c}{\ctype{char}}{string of length 1}
\lineiii{b}{\ctype{signed char}}{integer}
\lineiii{B}{\ctype{unsigned char}}{integer}
\lineiii{h}{\ctype{short}}{integer}
\lineiii{H}{\ctype{unsigned short}}{integer}
\lineiii{i}{\ctype{int}}{integer}
\lineiii{I}{\ctype{unsigned int}}{integer}
\lineiii{l}{\ctype{long}}{integer}
\lineiii{L}{\ctype{unsigned long}}{integer}
\lineiii{f}{\ctype{float}}{float}
\lineiii{d}{\ctype{double}}{float}
\lineiii{s}{\ctype{char[]}}{string}
\lineiii{p}{\ctype{char[]}}{string}
\lineiii{P}{\ctype{void *}}{integer}
\end{tableiii}

A format character may be preceded by an integral repeat count;
e.g.\ the format string \code{'4h'} means exactly the same as
\code{'hhhh'}.

Whitespace characters between formats are ignored; however, a count and
its format must not be separated by whitespace.
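
The two rules above can be checked with a short sketch. It assumes a
current Python~3 interpreter; the variable names are illustrative only.

```python
import struct

# A repeat count is shorthand for repeating the format character.
same_data = struct.pack('4h', 1, 2, 3, 4) == struct.pack('hhhh', 1, 2, 3, 4)

# Whitespace between format codes is ignored ...
same_size = struct.calcsize('2h l') == struct.calcsize('2hl')

# ... but not between a count and its format character.
try:
    struct.calcsize('2 h')
    rejected = False
except struct.error:
    rejected = True
```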

For the \character{s} format character, the count is interpreted as the
size of the string, not as a repeat count as for the other format
characters; e.g.\ \code{'10s'} means a single 10-byte string, while
\code{'10c'} means 10 characters. For packing, the string is
truncated or padded with null bytes as appropriate to make it fit.
For unpacking, the resulting string always has exactly the specified
number of bytes. As a special case, \code{'0s'} means a single, empty
string (while \code{'0c'} means 0 characters).
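
A brief sketch of the difference between \character{s} and
\character{c}, assuming a current Python~3 interpreter in which string
data is passed as \code{bytes}; the variable names are illustrative
only.

```python
import struct

data = b'0123456789'

# '10s' yields one 10-byte item; '10c' yields ten 1-byte items.
one_item = struct.unpack('10s', data)
ten_items = struct.unpack('10c', data)

# Packing pads short values with null bytes and truncates long ones.
padded = struct.pack('10s', b'abc')
truncated = struct.pack('3s', b'abcdef')

# '0s' is a single empty string, while '0c' is no items at all.
empty_string = struct.unpack('0s', b'')
no_items = struct.unpack('0c', b'')
```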

The \character{p} format character can be used to encode a Pascal
string. The first byte is the length of the stored string, with the
bytes of the string following. If a count is given, it is used as the
total number of bytes used, including the length byte. If the string
passed in to \function{pack()} is too long, the stored representation
is truncated. If the string is too short, padding is used to ensure
that exactly enough bytes are used to satisfy the count.
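
The Pascal string layout can be seen in a small sketch, again assuming
a current Python~3 interpreter with \code{bytes} values; the variable
names are illustrative only.

```python
import struct

# Count of 5: one length byte, the string, then padding up to 5 bytes.
short_value = struct.pack('5p', b'ab')

# Count of 3: one length byte plus room for only two bytes of the string.
long_value = struct.pack('3p', b'abcdef')

# Unpacking uses the length byte to recover just the stored string.
recovered = struct.unpack('5p', short_value)
```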

For the \character{I} and \character{L} format characters, the return
value is a Python long integer.

For the \character{P} format character, the return value is a Python
integer or long integer, depending on the size needed to hold a
pointer when it has been cast to an integer type. A \NULL{} pointer will
always be returned as the Python integer \code{0}. When packing pointer-sized
values, Python integer or long integer objects may be used. For
example, the Alpha and Merced processors use 64-bit pointer values,
meaning a Python long integer will be used to hold the pointer; other
platforms use 32-bit pointers and will use a Python integer.

By default, \C{} numbers are represented in the machine's native format
and byte order, and properly aligned by skipping pad bytes if
necessary (according to the rules used by the \C{} compiler).

Alternatively, the first character of the format string can be used to
indicate the byte order, size and alignment of the packed data,
according to the following table:

\begin{tableiii}{c|l|l}{samp}{Character}{Byte order}{Size and alignment}
\lineiii{@}{native}{native}
\lineiii{=}{native}{standard}
\lineiii{<}{little-endian}{standard}
\lineiii{>}{big-endian}{standard}
\lineiii{!}{network (= big-endian)}{standard}
\end{tableiii}

If the first character is not one of these, \character{@} is assumed.
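
The effect of the byte order character can be sketched as follows,
assuming a current Python~3 interpreter. The use of
\code{sys.byteorder} to find the host byte order is an addition for
illustration and is not part of this module.

```python
import struct
import sys

little = struct.pack('<h', 1)    # least significant byte first
big = struct.pack('>h', 1)       # most significant byte first
network = struct.pack('!h', 1)   # network order is big-endian

# '=' follows the host byte order but keeps the standard sizes.
host = struct.pack('=h', 1)
host_matches = host == (little if sys.byteorder == 'little' else big)
```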

Native byte order is big-endian or little-endian, depending on the
host system (e.g.\ Motorola and Sun are big-endian; Intel and DEC are
little-endian).

Native size and alignment are determined using the \C{} compiler's
\keyword{sizeof} expression. This is always combined with native byte
order.

Standard size and alignment are as follows: no alignment is required
for any type (so you have to use pad bytes); \ctype{short} is 2 bytes;
\ctype{int} and \ctype{long} are 4 bytes. \ctype{float} and
\ctype{double} are 32-bit and 64-bit IEEE floating point numbers,
respectively.

Note the difference between \character{@} and \character{=}: both use
native byte order, but the size and alignment of the latter are
standardized.
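
The difference between native and standard sizes can be observed with
\function{calcsize()}. This is a sketch assuming a current Python~3
interpreter; the native result is deliberately left unstated because it
depends on the platform and \C{} compiler.

```python
import struct

# Standard sizes are fixed, whatever the host platform.
standard_sizes = {code: struct.calcsize('=' + code) for code in 'hilfd'}

# No alignment in standard mode: a short followed by a long is 2 + 4 bytes.
standard_total = struct.calcsize('=hl')

# Native mode may insert pad bytes and use a different size for long,
# so this value depends on the platform and C compiler.
native_total = struct.calcsize('@hl')

# Omitting the first character is the same as using '@'.
default_is_native = struct.calcsize('hl') == native_total
```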

The form \character{!} is available for those poor souls who claim they
can't remember whether network byte order is big-endian or
little-endian.

There is no way to indicate non-native byte order (i.e.\ force
byte-swapping); use the appropriate choice of \character{<} or
\character{>}.

The \character{P} format character is only available for the native
byte ordering (selected as the default or with the \character{@} byte
order character). The byte order character \character{=} chooses to
use little- or big-endian ordering based on the host system. The
struct module does not interpret this as native ordering, so the
\character{P} format is not available.
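
A short sketch of the restriction on \character{P}, assuming a current
Python~3 interpreter; the variable names are illustrative only.

```python
import struct

# In native mode, a NULL pointer packs to calcsize('P') bytes and
# unpacks as the integer 0.
packed_null = struct.pack('P', 0)
pointer_size_matches = len(packed_null) == struct.calcsize('P')
unpacked_null = struct.unpack('P', packed_null)

# With '=' (or any other standard mode) the format is rejected.
try:
    struct.calcsize('=P')
    standard_rejected = False
except struct.error:
    standard_rejected = True
```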

Examples (all using native byte order, size and alignment, on a
big-endian machine):

\begin{verbatim}
>>> from struct import *
>>> pack('hhl', 1, 2, 3)
'\000\001\000\002\000\000\000\003'
>>> unpack('hhl', '\000\001\000\002\000\000\000\003')
(1, 2, 3)
>>> calcsize('hhl')
8
>>>
\end{verbatim}

Hint: to align the end of a structure to the alignment requirement of
a particular type, end the format with the code for that type with a
repeat count of zero, e.g.\ the format \code{'llh0l'} specifies two
pad bytes at the end, assuming longs are aligned on 4-byte boundaries.
This only works when native size and alignment are in effect;
standard size and alignment do not enforce any alignment.
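
The hint can be tried out with \function{calcsize()}. This sketch
assumes a current Python~3 interpreter; the number of pad bytes is not
stated as a fixed value because it depends on the native size and
alignment of \ctype{long}.

```python
import struct

# Native mode: the trailing '0l' pads the struct to the alignment of a long.
unpadded = struct.calcsize('@llh')
padded = struct.calcsize('@llh0l')
pad_bytes = padded - unpadded    # 2 where longs are 4 bytes, 4-byte aligned

# Standard mode performs no alignment, so '0l' adds nothing.
standard_unchanged = struct.calcsize('=llh0l') == struct.calcsize('=llh')
```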

\begin{seealso}
\seemodule{array}{packed binary storage of homogeneous data}
\seemodule{xdrlib}{packing and unpacking of XDR data}
\end{seealso}