\section{Built-in Module \sectcode{struct}}
\label{module-struct}
\bimodindex{struct}
\indexii{C@\C{}}{structures}

This module performs conversions between Python values and C
structs represented as Python strings.  It uses \dfn{format strings}
(explained below) as compact descriptions of the layout of the C
structs and the intended conversion to/from Python values.

The module defines the following exception and functions:

\begin{excdesc}{error}
Exception raised on various occasions, for example when a format
string is invalid or the arguments do not match it; the argument is a
string describing what is wrong.
\end{excdesc}

\begin{funcdesc}{pack}{fmt\, v1\, v2\, {\rm \ldots}}
Return a string containing the values
\code{\var{v1}, \var{v2}, {\rm \ldots}} packed according to the given
format.  The arguments must match the values required by the format
exactly.
\end{funcdesc}

\begin{funcdesc}{unpack}{fmt\, string}
Unpack the string (presumably packed by \code{pack(\var{fmt}, {\rm \ldots})})
according to the given format.  The result is a tuple even if it
contains exactly one item.  The string must contain exactly the
amount of data required by the format (i.e.\ \code{len(\var{string})} must
equal \code{calcsize(\var{fmt})}).
\end{funcdesc}

\begin{funcdesc}{calcsize}{fmt}
Return the size of the struct (and hence of the string)
corresponding to the given format.
\end{funcdesc}
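
The following sketch, written for a modern Python 3 interpreter (where
packed data is a \code{bytes} object rather than a string), shows the
three functions used together.  The format \code{'<hhl'} (explained
below) and the name \code{FMT} are arbitrary choices for illustration;
the \code{'<'} prefix fixes the sizes so that the result does not
depend on the machine:

\begin{verbatim}
import struct

FMT = '<hhl'   # hypothetical example format: two shorts and a long

data = struct.pack(FMT, 1, 2, 3)
assert len(data) == struct.calcsize(FMT)   # 2 + 2 + 4 = 8 bytes

values = struct.unpack(FMT, data)          # the result is always a tuple
(only,) = struct.unpack('<h', struct.pack('<h', 7))  # even for one item

# Too few values for pack(), or too little data for unpack(),
# both raise struct.error.
errors = []
for bad_call in (lambda: struct.pack(FMT, 1, 2),
                 lambda: struct.unpack(FMT, data[:-1])):
    try:
        bad_call()
    except struct.error as exc:
        errors.append(str(exc))

print(values, only, len(errors))   # (1, 2, 3) 7 2
\end{verbatim}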

Format characters have the following meaning; the conversion between C
and Python values should be obvious given their types:

\begin{tableiii}{|c|l|l|}{samp}{Format}{C}{Python}
\lineiii{x}{pad byte}{no value}
\lineiii{c}{char}{string of length 1}
\lineiii{b}{signed char}{integer}
\lineiii{B}{unsigned char}{integer}
\lineiii{h}{short}{integer}
\lineiii{H}{unsigned short}{integer}
\lineiii{i}{int}{integer}
\lineiii{I}{unsigned int}{integer}
\lineiii{l}{long}{integer}
\lineiii{L}{unsigned long}{integer}
\lineiii{f}{float}{float}
\lineiii{d}{double}{float}
\lineiii{s}{char[]}{string}
\end{tableiii}

A format character may be preceded by an integral repeat count; e.g.\
the format string \code{'4h'} means exactly the same as \code{'hhhh'}.
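
A quick check of the repeat-count rule (a Python 3 sketch, using
native mode as in the text):

\begin{verbatim}
import struct

# A repeat count is shorthand for writing the format character n times.
assert struct.calcsize('4h') == struct.calcsize('hhhh')

counted = struct.pack('4h', 1, 2, 3, 4)
spelled = struct.pack('hhhh', 1, 2, 3, 4)
assert counted == spelled
print(struct.unpack('4h', counted))   # (1, 2, 3, 4)
\end{verbatim}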

Whitespace characters between formats are ignored; a count and its
format must not contain whitespace though.
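
The whitespace rule can be checked the same way; this Python 3 sketch
uses the \code{'<'} prefix only to make the sizes fixed:

\begin{verbatim}
import struct

# Whitespace between format characters is ignored...
assert struct.calcsize('<h l') == struct.calcsize('<hl') == 6

# ...but a count must be immediately followed by its format character.
try:
    struct.calcsize('<4 h')
    rejected = False
except struct.error:
    rejected = True
print(rejected)   # True
\end{verbatim}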

For the \code{'s'} format character, the count is interpreted as the
size of the string, not as a repeat count as it is for the other format
characters; e.g.\ \code{'10s'} means a single 10-byte string, while
\code{'10c'} means 10 characters.  For packing, the string is
truncated or padded with null bytes as appropriate to make it fit.
For unpacking, the resulting string always has exactly the specified
number of bytes.  As a special case, \code{'0s'} means a single, empty
string (while \code{'0c'} means 0 characters).
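
A Python 3 sketch of these rules; in Python 3 the \code{'s'} and
\code{'c'} codes operate on \code{bytes} objects rather than strings:

\begin{verbatim}
import struct

# '5s' is one 5-byte string: shorter input is padded with null bytes...
assert struct.pack('5s', b'ab') == b'ab\x00\x00\x00'
# ...and longer input is truncated to fit.
assert struct.pack('2s', b'abcdef') == b'ab'
# Unpacking always yields exactly the specified number of bytes.
assert struct.unpack('5s', b'ab\x00\x00\x00') == (b'ab\x00\x00\x00',)
# '5c' instead means five separate one-byte items.
assert struct.unpack('5c', b'abcde') == (b'a', b'b', b'c', b'd', b'e')
# '0s' is a single empty string; '0c' is no item at all.
assert struct.unpack('0s', b'') == (b'',)
assert struct.unpack('0c', b'') == ()
\end{verbatim}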

For the \code{'I'} and \code{'L'} format characters, the value
returned by \code{unpack()} is a Python long integer.
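
The reason for the long integer is that an unsigned 32-bit value may
not fit in a plain (signed) integer.  In Python 3, plain and long
integers have been merged into a single \code{int} type, so the
following sketch simply reads the same four bytes as signed and as
unsigned:

\begin{verbatim}
import struct

raw = b'\xff\xff\xff\xff'   # four bytes, all bits set

(as_unsigned,) = struct.unpack('<I', raw)   # 'I': unsigned int
(as_signed,) = struct.unpack('<i', raw)     # 'i': signed int
print(as_unsigned, as_signed)               # 4294967295 -1
\end{verbatim}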

By default, C numbers are represented in the machine's native format
and byte order, and properly aligned by skipping pad bytes if
necessary (according to the rules used by the C compiler).

Alternatively, the first character of the format string can be used to
indicate the byte order, size and alignment of the packed data,
according to the following table:

\begin{tableiii}{|c|l|l|}{samp}{Character}{Byte order}{Size and alignment}
\lineiii{@}{native}{native}
\lineiii{=}{native}{standard}
\lineiii{<}{little-endian}{standard}
\lineiii{>}{big-endian}{standard}
\lineiii{!}{network (= big-endian)}{standard}
\end{tableiii}

If the first character is not one of these, \code{'@'} is assumed.
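
For instance, in Python 3 the short 258 (\code{0x0102}) packs as
follows under each explicit prefix:

\begin{verbatim}
import struct

# 258 == 0x0102, packed as a 2-byte short under each explicit prefix.
packed = {prefix: struct.pack(prefix + 'h', 258) for prefix in '<>!'}
print(packed['<'])   # b'\x02\x01'  (little-endian)
print(packed['>'])   # b'\x01\x02'  (big-endian)
print(packed['!'])   # b'\x01\x02'  (network order == big-endian)

# Without a prefix, '@' is assumed.
assert struct.pack('h', 258) == struct.pack('@h', 258)
\end{verbatim}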

Native byte order is big-endian or little-endian, depending on the
host system (e.g.\ Motorola and Sun are big-endian; Intel and DEC are
little-endian).
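
Later Python versions expose the native order as \code{sys.byteorder}
(\code{'little'} or \code{'big'}); a Python 3 sketch relating it to
the \code{'='} prefix:

\begin{verbatim}
import struct
import sys

# '=' uses native byte order, so it must match '<' or '>' accordingly.
native = struct.pack('=i', 1)
if sys.byteorder == 'little':
    assert native == struct.pack('<i', 1)
else:
    assert native == struct.pack('>i', 1)
print(sys.byteorder, native)
\end{verbatim}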

Native size and alignment are determined using the C compiler's
\code{sizeof} expression.  This is always combined with native byte
order.
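
One way to check this, assuming Python 3 with the \code{ctypes} module
(a later addition to the library, not part of the module described
here), is to compare native sizes against \code{ctypes.sizeof()}; the
two numbers printed on each line should agree:

\begin{verbatim}
import ctypes
import struct

# Native ('@') sizes should agree with sizeof() of the matching C types,
# as reported by ctypes.
pairs = [('h', ctypes.c_short), ('i', ctypes.c_int),
         ('l', ctypes.c_long), ('f', ctypes.c_float),
         ('d', ctypes.c_double)]
for code, ctype in pairs:
    print(code, struct.calcsize('@' + code), ctypes.sizeof(ctype))
\end{verbatim}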

Standard size and alignment are as follows: no alignment is required
for any type (so you have to add pad bytes explicitly where needed);
char is 1 byte; short is 2 bytes; int and long are 4 bytes.  Float and
double are 32-bit and 64-bit IEEE floating-point numbers, respectively.
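
A Python 3 sketch listing the standard sizes, which do not change from
one machine to another:

\begin{verbatim}
import struct

# With a '<', '>', '!' or '=' prefix the sizes are fixed, whatever the host.
standard = {code: struct.calcsize('<' + code) for code in 'cbBhHiIlLfd'}
print(standard)

# No alignment either: a char followed by a long takes 1 + 4 bytes.
assert struct.calcsize('<cl') == 5
\end{verbatim}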

Note the difference between \code{'@'} and \code{'='}: both use native
byte order, but the size and alignment of the latter are standardized.
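
A Python 3 sketch of the difference; the native size printed depends
on the platform (on common machines an int is 4-byte aligned, so
\code{'@ci'} comes out larger than \code{'=ci'}):

\begin{verbatim}
import struct

# Same byte order for both prefixes (assuming a 2-byte native short)...
assert struct.pack('@h', 258) == struct.pack('=h', 258)

# ...but only '@' inserts pad bytes to align the int after the char.
print(struct.calcsize('@ci'), struct.calcsize('=ci'))
\end{verbatim}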

The form \code{'!'} is available for those poor souls who claim they
can't remember whether network byte order is big-endian or
little-endian.

There is no way to indicate non-native byte order (i.e.\ force
byte-swapping); use the appropriate choice of \code{'<'} or
\code{'>'}.
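
In practice the opposite byte order is picked explicitly.  The helper
\code{swapped\_prefix()} below is hypothetical, written only for this
Python 3 illustration:

\begin{verbatim}
import struct
import sys


def swapped_prefix():
    """Hypothetical helper: the explicit prefix for the non-native order."""
    return '>' if sys.byteorder == 'little' else '<'


value = 0x01020304
native = struct.pack('=I', value)
swapped = struct.pack(swapped_prefix() + 'I', value)

# For a single field, the forced opposite order is the byte-reversed form.
assert swapped == native[::-1]
print(native, swapped)
\end{verbatim}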

Examples (all using native byte order, size and alignment, on a
big-endian machine with 2-byte shorts and 4-byte longs):

\begin{verbatim}
>>> from struct import *
>>> pack('hhl', 1, 2, 3)
'\000\001\000\002\000\000\000\003'
>>> unpack('hhl', '\000\001\000\002\000\000\000\003')
(1, 2, 3)
>>> calcsize('hhl')
8
>>>
\end{verbatim}
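
The example above depends on the host: those bytes appear only on a
big-endian machine with 2-byte shorts and 4-byte longs, and on a
typical 64-bit Unix system, where a long is 8 bytes,
\code{calcsize('hhl')} is 16.  Prefixing the format with \code{'>'}
gives the same result everywhere (a Python 3 sketch):

\begin{verbatim}
import struct

# Big-endian byte order with standard sizes: identical on every machine.
data = struct.pack('>hhl', 1, 2, 3)
print(data)                          # b'\x00\x01\x00\x02\x00\x00\x00\x03'
print(struct.unpack('>hhl', data))   # (1, 2, 3)
print(struct.calcsize('>hhl'))       # 8
\end{verbatim}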
%
Hint: to align the end of a structure to the alignment requirement of
a particular type, end the format with the code for that type with a
repeat count of zero, e.g.\ the format \code{'llh0l'} specifies two
pad bytes at the end, assuming longs are aligned on 4-byte boundaries.
This only works when native size and alignment are in effect;
standard size and alignment do not enforce any alignment.
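
A Python 3 sketch of this hint.  The native size depends on the size of
a C long, so it only checks properties that hold on common platforms,
assuming longs are aligned to their own size:

\begin{verbatim}
import struct

# Native mode: the trailing '0l' pads up to a multiple of the long's
# alignment (assumed here to equal its size, as on common platforms).
size = struct.calcsize('llh0l')
assert size % struct.calcsize('l') == 0
assert size > struct.calcsize('llh')

# Standard mode enforces no alignment, so '0l' adds nothing.
assert struct.calcsize('<llh0l') == struct.calcsize('<llh') == 10
print(size)   # 12 with 4-byte longs, 24 with 8-byte longs
\end{verbatim}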

\begin{seealso}
\seemodule{array}{packed binary storage of homogeneous data}
\end{seealso}